New CRD StackConfigPolicy to declaratively configure multiple Elasticsearch clusters #6148

Merged: 32 commits, Nov 22, 2022

@thbkrkr thbkrkr commented Nov 7, 2022

New custom resource StackConfigPolicy to configure cluster settings, snapshot repositories, and snapshot lifecycle policies for a list of Elasticsearch clusters matching a label selector.

apiVersion: stackconfigpolicy.k8s.elastic.co/v1alpha1
kind: StackConfigPolicy
metadata:
  name: config-staging
  namespace: elastic-system
spec:
  resourceSelector:
    matchLabels:
      env: staging
  elasticsearch:
    clusterSettings:
      indices.recovery.max_bytes_per_sec: "100mb"
    snapshotRepositories:
      backup:
        type: gcs
        settings:
          bucket: "gcs-bucket"
    snapshotLifecyclePolicies:
      test-snapshots:
        schedule: "0 1 2 3 4 ?"
        name: "<staging-snap-{now/d}>"
        repository: "backup"
        config:
          indices: ["*"]  
          ignore_unavailable: true
          include_global_state: false
        retention:
          expire_after: "30d"
          min_count: 1
          max_count: 50
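
The `resourceSelector.matchLabels` field in the manifest above follows the usual Kubernetes label-selector semantics: a resource matches when it carries every key/value pair listed in the selector. A minimal Go sketch of that check, using a hypothetical `matchesLabels` helper rather than the operator's actual code:

```go
package main

import "fmt"

// matchesLabels reports whether every key/value pair in selector is present
// in labels. This mirrors matchLabels semantics; the function name is
// hypothetical and not taken from the operator.
func matchesLabels(selector, labels map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{"env": "staging"}

	// A cluster labelled env=staging matches, extra labels are ignored.
	fmt.Println(matchesLabels(selector, map[string]string{"env": "staging", "team": "search"}))

	// A cluster labelled env=production does not match.
	fmt.Println(matchesLabels(selector, map[string]string{"env": "production"}))
}
```

An empty selector matches every resource under this rule, since there is no pair left to check.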

Notable details:

The namespaces searched for Elasticsearch clusters to configure depend on the namespace of the StackConfigPolicy. If it is the operator namespace, all namespaces managed by the operator are searched; otherwise, only the StackConfigPolicy's own namespace is searched.
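
The namespace rule can be sketched in a few lines of Go. This is only an illustration of the rule as described here; `namespacesToSearch` and its parameters are hypothetical names, and the namespace values in `main` are sample data.

```go
package main

import "fmt"

// namespacesToSearch returns the namespaces in which to look for
// Elasticsearch clusters for a policy living in policyNamespace.
// Hypothetical helper: the names are assumptions for illustration.
func namespacesToSearch(policyNamespace, operatorNamespace string, managedNamespaces []string) []string {
	if policyNamespace == operatorNamespace {
		// A policy in the operator namespace applies to all managed namespaces.
		return managedNamespaces
	}
	// Any other policy is scoped to its own namespace.
	return []string{policyNamespace}
}

func main() {
	managed := []string{"elastic-system", "team-a", "team-b"}

	fmt.Println(namespacesToSearch("elastic-system", "elastic-system", managed))
	fmt.Println(namespacesToSearch("team-a", "elastic-system", managed))
}
```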

A new FileSettings Secret, <esName>-es-file-settings, owned by the Elasticsearch resource, is created empty and mounted as a data volume by the Elasticsearch controller. The StackConfigPolicy controller soft-owns the Secret and only updates it. On deletion of the StackConfigPolicy, the soft owner labels are used to find the Secrets to reset.
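
A rough Go sketch of the soft-ownership idea: the Secret name is derived from the Elasticsearch name, and labels on the Secret record which policy soft-owns it, so the Secrets to reset can be found by filtering on those labels. The label keys, the `secret` type, and the helper names below are all assumptions made for illustration; they are not the operator's real identifiers.

```go
package main

import "fmt"

// Hypothetical label keys recording the soft owner of a Secret.
const (
	softOwnerKindLabel      = "example.org/owner-kind"
	softOwnerNamespaceLabel = "example.org/owner-namespace"
	softOwnerNameLabel      = "example.org/owner-name"
)

// secret is a simplified stand-in for a Kubernetes Secret.
type secret struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// fileSettingsSecretName follows the <esName>-es-file-settings pattern.
func fileSettingsSecretName(esName string) string {
	return esName + "-es-file-settings"
}

// softOwnedBy returns the Secrets whose soft owner labels point at the given owner.
func softOwnedBy(secrets []secret, kind, namespace, name string) []secret {
	var owned []secret
	for _, s := range secrets {
		if s.Labels[softOwnerKindLabel] == kind &&
			s.Labels[softOwnerNamespaceLabel] == namespace &&
			s.Labels[softOwnerNameLabel] == name {
			owned = append(owned, s)
		}
	}
	return owned
}

func main() {
	secrets := []secret{
		{
			Namespace: "team-a",
			Name:      fileSettingsSecretName("es1"),
			Labels: map[string]string{
				softOwnerKindLabel:      "StackConfigPolicy",
				softOwnerNamespaceLabel: "elastic-system",
				softOwnerNameLabel:      "config-staging",
			},
		},
		{Namespace: "team-b", Name: fileSettingsSecretName("es2")}, // no soft owner yet
	}

	for _, s := range softOwnedBy(secrets, "StackConfigPolicy", "elastic-system", "config-staging") {
		fmt.Printf("reset %s/%s\n", s.Namespace, s.Name)
	}
}
```

Because the relationship lives in labels rather than in an owner reference, deleting the policy does not delete the Secret; the controller has to look the Secrets up and reset them itself.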

For the SecureSettings Secrets defined in the StackConfigPolicy, the StackConfigPolicy controller writes the Secret namespaces and names to an annotation on the FileSettings Secret. The Elasticsearch controller watches and reads the Secrets listed in that annotation and merges their content with the user-provided Secrets of the Elasticsearch resource into the existing Secret <esName>-es-secure-settings.
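
One way to pass Secret references through an annotation is to serialise them as JSON. The sketch below shows such a round trip under stated assumptions: the annotation key, the `secretRef` type, its JSON field names, and the sample Secret name are all hypothetical, and the actual encoding used by the operator may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical annotation key; the real key is not given in this description.
const secureSettingsAnnotation = "example.org/secure-settings-secrets"

// secretRef identifies a Secret by namespace and name (assumed shape).
type secretRef struct {
	Namespace  string `json:"namespace"`
	SecretName string `json:"secretName"`
}

// setSecretRefs stores the references as JSON in the annotations map,
// which must be non-nil.
func setSecretRefs(annotations map[string]string, refs []secretRef) error {
	raw, err := json.Marshal(refs)
	if err != nil {
		return fmt.Errorf("encoding secret references: %w", err)
	}
	annotations[secureSettingsAnnotation] = string(raw)
	return nil
}

// getSecretRefs reads the references back; a missing annotation yields no references.
func getSecretRefs(annotations map[string]string) ([]secretRef, error) {
	raw, ok := annotations[secureSettingsAnnotation]
	if !ok {
		return nil, nil
	}
	var refs []secretRef
	if err := json.Unmarshal([]byte(raw), &refs); err != nil {
		return nil, fmt.Errorf("decoding secret references: %w", err)
	}
	return refs, nil
}

func main() {
	annotations := map[string]string{}
	refs := []secretRef{{Namespace: "elastic-system", SecretName: "gcs-credentials"}}
	if err := setSecretRefs(annotations, refs); err != nil {
		panic(err)
	}

	decoded, err := getSecretRefs(annotations)
	if err != nil {
		panic(err)
	}
	for _, ref := range decoded {
		fmt.Printf("%s/%s\n", ref.Namespace, ref.SecretName)
	}
}
```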

Discuss:

  • cleanStacktrace

TODO:

@thbkrkr thbkrkr added the >feature label (Adds or discusses adding a feature to the product) Nov 7, 2022
@thbkrkr thbkrkr force-pushed the stackconfigpolicy branch 2 times, most recently from 23d1510 to 0af90a6 Compare November 7, 2022 22:47
@thbkrkr thbkrkr force-pushed the stackconfigpolicy branch 2 times, most recently from e554722 to d052970 Compare November 8, 2022 13:39
@thbkrkr thbkrkr marked this pull request as ready for review November 8, 2022 17:04
@pebrc pebrc self-assigned this Nov 9, 2022

@pebrc pebrc left a comment

Nice work! I did a first pass just by looking at the code. Will do some testing next.

@pebrc pebrc left a comment

I did some tests. Looks really good. One oddity I noticed during testing is that errors during snapshot repository creation are not surfaced through the reserved cluster state, only through the Elasticsearch logs. That's not something we can fix here, but it may be worth discussing with the Elasticsearch team. I tried both GKE Workload Identity and explicit service account credentials, and both worked fine. The former is quite compelling, as you don't have to configure any credentials inside the cluster and the policy just works with minimal delay (no rolling restarts!).

- getclusterstate only supported in v8 es client
- adjust godoc
- helper struct to simplify collecting telemetry
- rename UpdateResourceStatusInPhase to AddPolicyErrorFor
- refactor status.Update()
- take into account es version errors in policy status
- no finalizers
- group var declaration
- lookup in a nil map is safe
- remove useless getters
- simplification
- remove dead code
- better assertions in Test_newSettingsSecret
- fix comment typos
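
The "lookup in a nil map is safe" item in the list above refers to a general Go property: indexing a nil map returns the zero value of the element type, so a nil check before a read is redundant. A small standalone illustration, with a hypothetical `countErrors` helper and made-up sample data:

```go
package main

import "fmt"

// countErrors returns how many errors are recorded for a resource name.
// Indexing a nil map yields the zero value (a nil slice here), so no nil
// check is needed before the lookup.
func countErrors(errorsByResource map[string][]string, name string) int {
	return len(errorsByResource[name])
}

func main() {
	var errorsByResource map[string][]string // nil: declared but never initialised

	// Reading from the nil map is safe.
	fmt.Println(countErrors(errorsByResource, "backup"))

	// Writing to a nil map panics, so initialise it before the first write.
	errorsByResource = make(map[string][]string)
	errorsByResource["backup"] = append(errorsByResource["backup"], "sample error")
	fmt.Println(countErrors(errorsByResource, "backup"))
}
```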

@naemono naemono left a comment

I haven't finished the code review and haven't tested yet, but here are some initial comments.

@pebrc pebrc left a comment

Almost LGTM!

}
requests := make([]reconcile.Request, 0)
for _, stackConfig := range stackConfigList.Items {
	stackConfig := stackConfig

Just to clarify, I consider this a nice-to-have optimisation and not critical for this to be merged.

@barkbay barkbay left a comment

I did a quick review as I wanted to give it a try and understand how the feature is implemented.
Overall LGTM 👍. I only left a few cosmetic comments.

- typo: s/Error/Errors/
- typo: s/Human/human
- comment phaseOrder map
- ensure AddPolicyErrorFor is called once per named resource
- wrap error from GetSecureSettingsSecretSources
- add unit tests in TestGarbageCollectSoftOwnedSecrets and TestGarbageCollectAllSoftOwnedOrphanSecrets
- improve error messages with types

thbkrkr commented Nov 21, 2022

If you would like to take another look, I've taken into account all your feedback.

@pebrc pebrc left a comment

LGTM!

@pebrc pebrc left a comment

🚀

thbkrkr commented Nov 22, 2022

Thanks pebrc, naemono, barkbay!

Labels
>feature Adds or discusses adding a feature to the product v2.6.0
4 participants