fix occasional restarts of kruise controller manager #2382

Open
PersistentJZH wants to merge 1 commit into openkruise:master from PersistentJZH:feat/fix-occasional-restarts-of-kruise-controller-manager

Conversation

@PersistentJZH
Contributor

Ⅰ. Describe what this PR does

fix occasional restarts of kruise controller manager

Root cause: during kruise-controller-manager startup, CRD installation could finish after the field indexes for those CRDs had already been registered.

Ⅱ. Does this pull request fix one issue?

Fix: #2380

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

Copilot AI review requested due to automatic review settings February 28, 2026 12:36
@kruise-bot kruise-bot requested review from veophi and zmberg February 28, 2026 12:36
@kruise-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign furykerry for approval by writing /assign @furykerry in a comment. For more information see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot AI left a comment

Pull request overview

This PR addresses intermittent kruise-controller-manager restarts during startup by preventing field index registration for ImagePullJob (v1alpha1) until the corresponding CRD/version is discoverable, avoiding “no matches for kind” errors when CRDs are established slightly later than the controller starts.

Changes:

  • Guard ImagePullJob (apps.kruise.io/v1alpha1) ownerReference field index registration behind utildiscovery.DiscoverObject.
  • Align v1alpha1 ImagePullJob ownerRef index registration behavior with other CRD-gated index registrations already present in RegisterFieldIndexes.
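
The guard described in the changes above can be sketched in isolation. This is a minimal, stdlib-only model rather than the Kruise code: `fakeCache`, `indexField`, `discover`, `registerUnguarded`, and `registerGuarded` are hypothetical stand-ins for the controller-runtime field indexer and `utildiscovery.DiscoverObject`, and the error text only imitates the "no matches for kind" failure.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeCache is a hypothetical stand-in for the manager cache and the API
// server's discovery data. It is not a controller-runtime type.
type fakeCache struct {
	served  map[string]bool // kinds whose CRD is currently served
	indexes []string        // "kind/field" pairs registered so far
}

// indexField imitates the failure mode: indexing a kind that is not yet
// served returns an error instead of registering the index.
func (c *fakeCache) indexField(kind, field string) error {
	if !c.served[kind] {
		return errors.New("no matches for kind " + kind)
	}
	c.indexes = append(c.indexes, kind+"/"+field)
	return nil
}

// discover is a stand-in for a discovery check such as
// utildiscovery.DiscoverObject; here it is a plain map lookup.
func (c *fakeCache) discover(kind string) bool {
	return c.served[kind]
}

// registerUnguarded registers the index unconditionally, so it fails when
// the CRD has not been installed yet.
func registerUnguarded(c *fakeCache, kind string) error {
	return c.indexField(kind, "ownerRefUID")
}

// registerGuarded skips registration when the kind is not discoverable,
// which is the behavior the PR introduces for ImagePullJob v1alpha1.
func registerGuarded(c *fakeCache, kind string) error {
	if !c.discover(kind) {
		return nil // CRD not ready: skip instead of failing startup
	}
	return c.indexField(kind, "ownerRefUID")
}

func main() {
	c := &fakeCache{served: map[string]bool{}} // CRD not installed yet
	fmt.Println("unguarded:", registerUnguarded(c, "ImagePullJob"))
	fmt.Println("guarded:", registerGuarded(c, "ImagePullJob"))

	c.served["ImagePullJob"] = true // CRD becomes available
	fmt.Println("guarded after install:", registerGuarded(c, "ImagePullJob"), c.indexes)
}
```

In this model the guarded path trades a startup error for a skipped index, so any code that later queries that index has to tolerate its absence.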

Comment on lines +70 to 74
if utildiscovery.DiscoverObject(&appsv1alpha1.ImagePullJob{}) {
	if err = c.IndexField(context.TODO(), &appsv1alpha1.ImagePullJob{}, IndexNameForOwnerRefUID, ownerIndexFunc); err != nil {
		return
	}
}
Copilot AI Feb 28, 2026

DiscoverObject(&appsv1alpha1.ImagePullJob{}) is now called here and again later in RegisterFieldIndexes (for the isActive index). Because DiscoverObject can retry with exponential backoff, calling it multiple times for the same GVK can noticeably slow startup in CRD-not-ready scenarios and can also lead to inconsistent registration (e.g., ownerRef index registered but isActive skipped, or vice versa). Consider computing the discovery result once (e.g., hasImagePullJobV1Alpha1 := utildiscovery.DiscoverObject(...)) and reusing it for all ImagePullJob v1alpha1 index registrations (and similarly for v1beta1).
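
One way to read this suggestion, as a minimal stdlib-only sketch: run the discovery check once, keep the result in a local variable, and gate every index registration for that kind on it. The `registrar` type, its `discover` and `index` methods, and `registerImagePullJobIndexes` are hypothetical names used for illustration; they do not reproduce `RegisterFieldIndexes` or the real `utildiscovery.DiscoverObject`.

```go
package main

import "fmt"

// registrar is a hypothetical model of the registration code. It counts
// discovery calls so the "compute once" property can be observed.
type registrar struct {
	available     bool     // whether the CRD is discoverable
	discoverCalls int      // how many times discovery was attempted
	registered    []string // index names that were registered
}

// discover stands in for a discovery call that may be slow (for example,
// one that retries with backoff), so each extra call has a cost.
func (r *registrar) discover() bool {
	r.discoverCalls++
	return r.available
}

// index records that an index was registered.
func (r *registrar) index(name string) {
	r.registered = append(r.registered, name)
}

// registerImagePullJobIndexes runs discovery once and reuses the result,
// so both indexes are registered together or skipped together.
func registerImagePullJobIndexes(r *registrar) {
	hasImagePullJob := r.discover() // single discovery call
	if !hasImagePullJob {
		return
	}
	r.index("ownerRefUID")
	r.index("isActive")
}

func main() {
	ready := &registrar{available: true}
	registerImagePullJobIndexes(ready)
	fmt.Println(ready.discoverCalls, ready.registered)

	notReady := &registrar{available: false}
	registerImagePullJobIndexes(notReady)
	fmt.Println(notReady.discoverCalls, notReady.registered)
}
```

Because both registrations depend on the same boolean, the mixed outcome described in the comment (one index registered, the other skipped) cannot occur in this model.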

@codecov
codecov bot commented Feb 28, 2026

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.68%. Comparing base (bba2621) to head (669dbb7).

Files with missing lines           Patch %   Lines
pkg/util/fieldindex/register.go    0.00%     3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2382      +/-   ##
==========================================
+ Coverage   48.66%   48.68%   +0.01%     
==========================================
  Files         324      324              
  Lines       27920    27921       +1     
==========================================
+ Hits        13587    13592       +5     
+ Misses      12794    12791       -3     
+ Partials     1539     1538       -1     
Flag        Coverage Δ
unittests   48.68% <0.00%> (+0.01%) ⬆️


Signed-off-by: PersistentJZH <zhihao.kan17@gmail.com>

- fix occasional restarts of kruise-controller-manager
@PersistentJZH PersistentJZH force-pushed the feat/fix-occasional-restarts-of-kruise-controller-manager branch from efaf2dc to 669dbb7 Compare February 28, 2026 13:10
@PersistentJZH
Contributor Author

/cc @zmberg @furykerry

@kruise-bot kruise-bot requested a review from furykerry March 6, 2026 02:11

Labels

size/XS: 0-9

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] kruise-controller-manager restarts when installing kruise

3 participants