Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a pod stage for failing an initContainer #1019

Merged
merged 2 commits into from
Apr 16, 2024

Conversation

yuanchen8911
Copy link
Member

@yuanchen8911 yuanchen8911 commented Apr 3, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR adds a pod stage that injects an error to initContainer specified by a label and annotations, including the initContainer's name and optional failure message, reason, exit code and delay parameters.

labels:
   pod-init-container-running-failed.stage.kwok.x-k8s.io: 'true'
annotations:
   pod-init-container-running-failed.stage.kwok.x-k8s.io/container-name: <failed initContainer name>
   // optional
   pod-init-container-running-failed.stage.kwok.x-k8s.io/reason: <failure reason, the default is "initContainerError">
   pod-init-container-running-failed.stage.kwok.x-k8s.io/message: <failure message, the default is "initContainer reported errors" > 
   pod-init-container-running-failed.stage.kwok.x-k8s.io/exit-code: <exit code, the default is 1>
   pod-init-container-running-failed.stage.kwok.x-k8s.io/delay: <delay in milliseconds> 
   pod-init-container-running-failed.stage.kwok.x-k8s.io/jitter-delay: <jitter delay in milliseconds> 

Which issue(s) this PR fixes:

Fixes #
None

Special notes for your reviewer:

Questions:

  1. How to set the default values for durationFrom and jitterDurationFrom
  2. Do we need to set the statues of the other initContainers after the failed initContainer?

Does this PR introduce a user-facing change?

Support the failure of an initContainer in a pod.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels Apr 3, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 3, 2024
Copy link

netlify bot commented Apr 3, 2024

Deploy Preview for k8s-kwok canceled.

Name Link
🔨 Latest commit d7d3929
🔍 Latest deploy log https://app.netlify.com/sites/k8s-kwok/deploys/661d4a7404994a000853919b

@yuanchen8911 yuanchen8911 marked this pull request as draft April 3, 2024 18:36
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 3, 2024
@yuanchen8911
Copy link
Member Author

/cc @wzshiming

@k8s-ci-robot k8s-ci-robot requested a review from wzshiming April 3, 2024 18:39
@yuanchen8911 yuanchen8911 force-pushed the chaos branch 4 times, most recently from afe6c57 to f50a28e Compare April 4, 2024 17:22
apiVersion: kwok.x-k8s.io/v1alpha1
kind: Stage
metadata:
name: pod-init-container-running-failed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename the file name to match this one.

IMO, one should distinguish between creating and running failed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed the file to pod-init-container-running-failed.yaml.

@yuanchen8911
Copy link
Member Author

yuanchen8911 commented Apr 8, 2024

Thanks for the fixes, @wzshiming! All the changes were committed. I still have two questions.

  1. Would it be possible to use some default values if the duration/jitterDuration annotations are not specified?
  2. The statuses of those initContainers defined after the failed initContainers should NOT be terminated with Completed, right?

@yuanchen8911 yuanchen8911 requested a review from wzshiming April 8, 2024 15:46
@wzshiming
Copy link
Member

wzshiming commented Apr 9, 2024

Would it be possible to use some default values if the duration/jitterDuration annotations are not specified?

Sure, set both durationMilliseconds and durationFrom.expressionFrom, the durationMilliseconds will like a default for it.

The statues of those initContainers after the failed initContainers should NOT be terminated with Completed, right?

Yes, the Completed reason is only for the 0 exit code

https://github.com/containerd/containerd/blob/406e9e84b4fb0a6bf2912f56ff1dfe78a56ed5ee/internal/cri/server/container_status.go#L89-L95

Change labels to annotations

Replace labels with annotations

Rename label and annotation names

Change label value True to true

Rename label and annotations

Update kustomize/stage/pod/chaos/pod-init-container-failed.yaml

Co-authored-by: Shiming Zhang <[email protected]>

rename and set default duration values
@yuanchen8911
Copy link
Member Author

yuanchen8911 commented Apr 9, 2024

Would it be possible to use some default values if the duration/jitterDuration annotations are not specified?

Sure, set both durationMilliseconds and durationFrom.expressionFrom, the durationMilliseconds will like a default for it.

Set both durationMilliseconds and jitterDurationMillisecons to 1000 by default (no jitter).

The statues of those initContainers after the failed initContainers should NOT be terminated with Completed, right?

Yes, the Completed reason is only for the 0 exit code

https://github.com/containerd/containerd/blob/406e9e84b4fb0a6bf2912f56ff1dfe78a56ed5ee/internal/cri/server/container_status.go#L89-L95

My question was whether we should distinguish an initContainers defined before the failed initContainers and one after the failed initContainer. The former's state is terminated while the latter's is waiting.

@yuanchen8911 yuanchen8911 changed the title Add a pod stage file for failing an initContainer (WIP) Add a pod stage file for failing an initContainer Apr 12, 2024
Copy link
Member

@wzshiming wzshiming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thank you, I think it's a good start for chaos testing

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 15, 2024
@yuanchen8911 yuanchen8911 changed the title Add a pod stage file for failing an initContainer Add a pod stage for failing an initContainer Apr 15, 2024
@yuanchen8911 yuanchen8911 marked this pull request as ready for review April 15, 2024 14:09
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 15, 2024
@yuanchen8911 yuanchen8911 requested a review from wzshiming April 15, 2024 14:10
@yuanchen8911
Copy link
Member Author

/lgtm

Thank you, I think it's a good start for chaos testing

Removed the WIP label. Can you approve it again? Thanks!

@yuanchen8911
Copy link
Member Author

/retest

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 15, 2024
@yuanchen8911
Copy link
Member Author

New changes are detected. LGTM label has been removed.

Fixed the linkyaml issues (trailing spaces). PTAL, @wzshiming. Thanks!

@wzshiming
Copy link
Member

/approve
/lgtm
/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Apr 16, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wzshiming, yuanchen8911

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Apr 16, 2024
@k8s-ci-robot k8s-ci-robot merged commit e2a2a9b into kubernetes-sigs:main Apr 16, 2024
26 checks passed
mohamedasifs123 pushed a commit to mohamedasifs123/kwok that referenced this pull request May 21, 2024
* Add a pod stage file for failing an initContainer

Change labels to annotations

Replace labels with annotations

Rename label and annotation names

Change label value True to true

Rename label and annotations

Update kustomize/stage/pod/chaos/pod-init-container-failed.yaml

Co-authored-by: Shiming Zhang <[email protected]>

rename and set default duration values

* Fix the yamllint issues (trailing spaces)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants