Make HPA tolerance a flag #52275

mattjmcnaughton
Contributor

@mattjmcnaughton mattjmcnaughton commented Sep 11, 2017

What this PR does / why we need it:
Make HPA tolerance configurable as a flag. This change allows us to use
different tolerance values in production/testing.

Which issue this PR fixes:
Fixes #18155

Release note:

Control HPA tolerance through the `horizontal-pod-autoscaler-tolerance` flag.

Signed-off-by: mattjmcnaughton [email protected]

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 11, 2017
@k8s-ci-robot
Contributor

Hi @mattjmcnaughton. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Sep 11, 2017
@ncdc
Member

ncdc commented Sep 11, 2017

/unassign
/assign @DirectXMan12

@k8s-ci-robot k8s-ci-robot assigned DirectXMan12 and unassigned ncdc Sep 11, 2017
@xiangpengzhao
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Sep 12, 2017
@mattjmcnaughton mattjmcnaughton force-pushed the mattjmcnaughton/18155-hpa-tolerance-should-be-flag branch from c949523 to 1cf1e0c Compare September 12, 2017 02:33
Contributor

@DirectXMan12 DirectXMan12 left a comment

Couple nits, otherwise looks good

tolerance = 0.1
const (
// DefaultTolerance used to calculating when to scale up/scale down.
DefaultTolerance = 0.1
Contributor

Move this to replica_calculator.go, since we actually use it there.

Contributor Author

Yup, good point!

@@ -164,6 +166,7 @@ func (s *CMServer) AddFlags(fs *pflag.FlagSet, allControllers []string, disabled
fs.DurationVar(&s.HorizontalPodAutoscalerSyncPeriod.Duration, "horizontal-pod-autoscaler-sync-period", s.HorizontalPodAutoscalerSyncPeriod.Duration, "The period for syncing the number of pods in horizontal pod autoscaler.")
fs.DurationVar(&s.HorizontalPodAutoscalerUpscaleForbiddenWindow.Duration, "horizontal-pod-autoscaler-upscale-delay", s.HorizontalPodAutoscalerUpscaleForbiddenWindow.Duration, "The period since last upscale, before another upscale can be performed in horizontal pod autoscaler.")
fs.DurationVar(&s.HorizontalPodAutoscalerDownscaleForbiddenWindow.Duration, "horizontal-pod-autoscaler-downscale-delay", s.HorizontalPodAutoscalerDownscaleForbiddenWindow.Duration, "The period since last downscale, before another downscale can be performed in horizontal pod autoscaler.")
fs.Float64Var(&s.HorizontalPodAutoscalerTolerance, "horizontal-pod-autoscaler-tolerance", s.HorizontalPodAutoscalerTolerance, "Desired pod tolerance outside of which controller will upscale/downscale.")
Contributor

This needs to be reworded slightly for clarity. People get easily confused about the mechanics of the HPA controller, so we need to make sure things are crystal clear.

Contributor Author

Ah yes, I was thinking about that - I will take another stab - please let me know if you have any suggestions :)

@mattjmcnaughton mattjmcnaughton force-pushed the mattjmcnaughton/18155-hpa-tolerance-should-be-flag branch 2 times, most recently from 3f1627d to cfb8c0a Compare September 13, 2017 14:04
@mattjmcnaughton
Contributor Author

@DirectXMan12 - thanks for the feedback - pushed an update incorporating your changes whenever you get the chance to take a look.

Contributor

@DirectXMan12 DirectXMan12 left a comment

one last change, and then this is good

@@ -77,6 +78,7 @@ func NewCMServer() *CMServer {
HorizontalPodAutoscalerSyncPeriod: metav1.Duration{Duration: 30 * time.Second},
HorizontalPodAutoscalerUpscaleForbiddenWindow: metav1.Duration{Duration: 3 * time.Minute},
HorizontalPodAutoscalerDownscaleForbiddenWindow: metav1.Duration{Duration: 5 * time.Minute},
HorizontalPodAutoscalerTolerance: podautoscaler.DefaultTolerance,
Contributor

whoops, I missed this the last time around. Normally, this location is the place for "defaults". We actually probably just want the default value here (i.e. just have the value here, and don't reference podautoscaler), and then rename DefaultTolerance to something like defaultTestingTolerance...

Contributor Author

Ah gotcha, I wasn't sure about that. Thanks for catching.
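
For readers following the review, the pattern suggested above (keep the literal default in the constructor, and let the flag registration reuse whatever value the struct already holds) can be sketched roughly as follows. This is only an illustration, not the code from this PR: `serverOptions`, `newServerOptions`, `addFlags`, and `parseTolerance` are hypothetical names, and the standard library `flag` package stands in for `pflag` so the sketch is self-contained.

```go
package main

import (
	"flag"
	"fmt"
)

// serverOptions is a hypothetical, cut-down stand-in for CMServer.
type serverOptions struct {
	HorizontalPodAutoscalerTolerance float64
}

// newServerOptions is the single place where default values live.
// The literal 0.1 is written here instead of referencing another package.
func newServerOptions() *serverOptions {
	return &serverOptions{HorizontalPodAutoscalerTolerance: 0.1}
}

// addFlags registers the tolerance flag. The value already stored in the
// struct is passed as the flag default, so the default is defined only once.
func (s *serverOptions) addFlags(fs *flag.FlagSet) {
	fs.Float64Var(
		&s.HorizontalPodAutoscalerTolerance,
		"horizontal-pod-autoscaler-tolerance",
		s.HorizontalPodAutoscalerTolerance,
		"The minimum change (from 1.0) in the desired-to-actual metrics ratio for the horizontal pod autoscaler to consider scaling.",
	)
}

// parseTolerance builds the options, parses args, and returns the
// tolerance that results from them.
func parseTolerance(args []string) (float64, error) {
	s := newServerOptions()
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	s.addFlags(fs)
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return s.HorizontalPodAutoscalerTolerance, nil
}

func main() {
	def, _ := parseTolerance(nil)
	custom, _ := parseTolerance([]string{"--horizontal-pod-autoscaler-tolerance=0.2"})
	fmt.Println("default:", def, "overridden:", custom)
}
```

With this layout, changing the default means editing one line in the constructor, and the flag help text and parsed value follow automatically.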

@@ -164,6 +166,7 @@ func (s *CMServer) AddFlags(fs *pflag.FlagSet, allControllers []string, disabled
fs.DurationVar(&s.HorizontalPodAutoscalerSyncPeriod.Duration, "horizontal-pod-autoscaler-sync-period", s.HorizontalPodAutoscalerSyncPeriod.Duration, "The period for syncing the number of pods in horizontal pod autoscaler.")
fs.DurationVar(&s.HorizontalPodAutoscalerUpscaleForbiddenWindow.Duration, "horizontal-pod-autoscaler-upscale-delay", s.HorizontalPodAutoscalerUpscaleForbiddenWindow.Duration, "The period since last upscale, before another upscale can be performed in horizontal pod autoscaler.")
fs.DurationVar(&s.HorizontalPodAutoscalerDownscaleForbiddenWindow.Duration, "horizontal-pod-autoscaler-downscale-delay", s.HorizontalPodAutoscalerDownscaleForbiddenWindow.Duration, "The period since last downscale, before another downscale can be performed in horizontal pod autoscaler.")
fs.Float64Var(&s.HorizontalPodAutoscalerTolerance, "horizontal-pod-autoscaler-tolerance", s.HorizontalPodAutoscalerTolerance, "Horizontal pod autoscaler will not attempt upscaling/downscaling if desired replicas/current replicas is within tolerance of 1.")
Contributor

now that I'm more awake, I'd probably go for something like "The minimum change (from 1.0) in the desired-to-actual metrics ratio for the horizontal pod autoscaler to consider scaling".

Contributor Author

Perfect :)
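
To make the agreed wording concrete, here is a minimal sketch of how a tolerance around 1.0 might gate a scaling decision. It assumes the ratio is the current metric value divided by the target value and that the new replica count is the ceiling of ratio times current replicas; `withinTolerance` and `desiredReplicas` are hypothetical helpers written for this illustration, not the functions in replica_calculator.go.

```go
package main

import (
	"fmt"
	"math"
)

// defaultTolerance mirrors the 0.1 default discussed in this PR.
const defaultTolerance = 0.1

// withinTolerance reports whether the usage ratio is close enough to 1.0
// that scaling should not be considered. (Hypothetical helper.)
func withinTolerance(usageRatio, tolerance float64) bool {
	return math.Abs(1.0-usageRatio) <= tolerance
}

// desiredReplicas returns the current replica count when the ratio is
// within tolerance; otherwise it returns ceil(usageRatio * current).
// The ratio definition (current/target) is an assumption of this sketch.
func desiredReplicas(current int32, currentMetric, targetMetric, tolerance float64) int32 {
	usageRatio := currentMetric / targetMetric
	if withinTolerance(usageRatio, tolerance) {
		return current
	}
	return int32(math.Ceil(usageRatio * float64(current)))
}

func main() {
	// Ratio 1.08 is inside the 0.1 band, so the count is left alone.
	fmt.Println(desiredReplicas(4, 54, 50, defaultTolerance))
	// Ratio 1.5 is outside the band, so the count grows.
	fmt.Println(desiredReplicas(4, 75, 50, defaultTolerance))
	// Ratio 0.4 is outside the band, so the count shrinks.
	fmt.Println(desiredReplicas(4, 20, 50, defaultTolerance))
}
```

A larger tolerance widens the band around 1.0 in which nothing happens, which is why different values can be useful in production and in testing.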

@mattjmcnaughton mattjmcnaughton force-pushed the mattjmcnaughton/18155-hpa-tolerance-should-be-flag branch from cfb8c0a to d7a29f0 Compare September 14, 2017 13:51
@mattjmcnaughton
Contributor Author

@DirectXMan12 updated with your suggested changes - this is ready for another look whenever you get the chance. Thanks!

@DirectXMan12
Contributor

DirectXMan12 commented Sep 14, 2017

/lgtm
/ok-to-test

Looks like the bazel failure is #52433. Retest once master re-opens.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 14, 2017
@mattjmcnaughton
Contributor Author

/test pull-kubernetes-e2e-kops-aws

@mattjmcnaughton
Contributor Author

mattjmcnaughton commented Sep 16, 2017

/assign @mikedanese (re suggested approvers)

@mattjmcnaughton
Contributor Author

/test all

@mattjmcnaughton
Contributor Author

/test pull-kubernetes-e2e-gce-etcd3

@mattjmcnaughton
Contributor Author

Ping @mikedanese whenever you get a chance to look at this - thanks :)

Fix kubernetes#18155

Make HPA tolerance configurable as a flag. This change allows us to use
different tolerance values in production/testing.

Signed-off-by: mattjmcnaughton <[email protected]>
@mattjmcnaughton mattjmcnaughton force-pushed the mattjmcnaughton/18155-hpa-tolerance-should-be-flag branch from 42d7507 to abd4668 Compare September 29, 2017 02:03
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 29, 2017
@mattjmcnaughton
Contributor Author

@DirectXMan12 would you mind giving another LGTM? My rebase cancelled your previous one.

/assign @lavalamp (cmd/kube-controller-manager/OWNERS)
/assign @vishh (pkg/apis/componentconfig/OWNERS)

Thanks so much :)

@DirectXMan12
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 3, 2017
@mikedanese
Member

Why aren't we making this an HPA spec element? Was that discussed?

@kubernetes/sig-autoscaling-api-reviews

@k8s-ci-robot k8s-ci-robot added sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Oct 3, 2017
@mattjmcnaughton
Contributor Author

Great question @mikedanese - the original TODO in the code base was // TODO: make it a flag or HPA spec element., so it mentioned both as options.

In my mind, tolerance conceptually fits with sync period and upscale/downscale forbidden window, which are also flags. But I'm definitely open to discussion :)

@mattjmcnaughton
Contributor Author

Hi @mikedanese, just wanted to follow up on this - what's the best process for doing an auto-scaling API review, if you think it's necessary (I'm pretty new to k8s contributing, so it's a bit of a foreign process to me haha)? Thanks!

@DirectXMan12
Contributor

@mattjmcnaughton @mikedanese we've discussed things like this in the past, and generally tended to err on the side of "that seems like an implementation detail, and implementation details shouldn't be in the API". While I'm certainly willing to have that discussion, I think making it a flag is probably fine.

@mikedanese
Member

/approve

@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: DirectXMan12, mattjmcnaughton, mikedanese

Associated issue: 18155

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 16, 2017
@mikedanese
Member

/retest

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to @fejta).

Review the full test history for this PR.

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 03cb11f into kubernetes:master Oct 16, 2017
@mattjmcnaughton mattjmcnaughton deleted the mattjmcnaughton/18155-hpa-tolerance-should-be-flag branch October 17, 2017 01:02
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update HPA tolerance to be a flag