-
Notifications
You must be signed in to change notification settings - Fork 970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds provisioner limits based on cpu and mem #817
Conversation
✔️ Deploy Preview for karpenter-docs-prod canceled. 🔨 Explore the source changes: 2591f0a 🔍 Inspect the deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/619ebf3aefc2cb0008c10508 |
Co-authored-by: Ellis Tarn <[email protected]>
Co-authored-by: Ellis Tarn <[email protected]>
Co-authored-by: Ellis Tarn <[email protected]>
Etarn fixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I really like this approach. Some quick comments.
@@ -29,6 +30,9 @@ type ProvisionerStatus struct { | |||
// its target, and indicates whether or not those conditions are met. | |||
// +optional | |||
Conditions apis.Conditions `json:"conditions,omitempty"` | |||
|
|||
// Resources is the list of resources that have been provisioned. | |||
Resources v1.ResourceList `json:"resources,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if we had a mechanism to express the amount of resources reserved by pods as well as the total amount of capacity in the current nodes. Not that you need to implement them both, but we should make sure it makes sense on the API.
Provisioner.Status.Capacity
Provisioner.Status.Reserved
Maybe we want to be cautious of the Reserved bit, since it will mean updating on every pod event.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice call. I think there's definite value in having that in ProvisionerSpec. Might even help show how much better the instance utilization is with Karpenter. I think in the API though, it should could maybe be Provisioner.Status.Requests?
Agreed though, that it might mean we update the provisioner on each pod event so I'm a little concerned from a scaling front.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Controller-runtime will batch the reconciles and has a serial execution guarantee. I think this will roughly translate into a linear stream of writes to the API Server as it detects new resources. AFAICT, it shouldn't be enough to overwhelm anything.
provisioner, err := l.updateState(ctx, provisioner) | ||
if err != nil { | ||
return fmt.Errorf("unable to determine status of provisioner") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why I didn't need this in my earlier revisions. Maybe I was doing something else that was forcing the provisioner to get updated or I wasn't testing my scale ups fast enough, but without this the state in this provisioner object is really off. Since it's hitting a cache, I don't mind moving this into the for loop below either so we reduce the chance of staleness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! We just need a unit test now.
Co-authored-by: Ellis Tarn <[email protected]>
Co-authored-by: Ellis Tarn <[email protected]>
Co-authored-by: Ellis Tarn <[email protected]>
Co-authored-by: Ellis Tarn <[email protected]>
Co-authored-by: Ellis Tarn <[email protected]>
1. Issue, if available:
N/A
2. Description of changes:
Adds limits to the Karpenter provisioner based on CPU and Memory. With the current implementation these are soft limits and aren't strictly enforced.
The way the limiting works is that we keep launching worker nodes until we have any amount of capacity yet. Only once we've actually hit / crossed our limits do we stop the provisioning. It's a pretty naive implementation, and can be problematic sometimes.
Scenario 1 -
You've got 2 CPU left, and a 10 new pods come in each asking for 1 CPU. In an ideal world, we should only provision a tiny instance with 2CPU that can host 2 pods and the other 8 remain stuck in pending. However, what this implementation will do is provision whatever instance type Karpenter would do anyway based on our bin packing logic and fit all 10 pods. The next batch of pending pods however, will be entirely stuck since we'd detect we've crossed our limits.
Scenario 2 -
If you set the CPU or Memory to 0 in the provisioner, Karpenter won't be able to launch any worker nodes at all. This is by design and can help when you're looking to block instance creation entirely. Will be more useful once I add GPUs in.
What is to come next?
Testing done so far
3. Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.