lvm volume deletion is not idempotent leading to leak of lvmvolumes objects #15

Closed
iyashu opened this issue Jan 29, 2021 · 1 comment · Fixed by #16
Labels
bug Something isn't working
Milestone
v0.2

Comments

@iyashu
Contributor

iyashu commented Jan 29, 2021

What steps did you take and what happened:

Steps to reproduce:

  1. Create a StatefulSet of size 5 with a volume claim template & wait for all pods to be running.
  2. Delete the StatefulSet & wait for deletion of all pods.
  3. Delete all PVCs using kubectl delete pvc {claim-name}-{sf-name}-{index} and wait until they are removed.
  4. List all lvmvolume objects in the openebs namespace using kubectl get lvmvol -n openebs.

You will observe that a few lvmvolume objects are hanging around, although the corresponding PVs have been deleted.

What did you expect to happen:
Deletion of the bound PVC (PV) should lead to eventual deletion of the corresponding lvmvolume object.

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other Pastebin is fine.)

  • kubectl logs -f openebs-lvm-controller-0 -n kube-system -c openebs-lvm-plugin
  • kubectl logs -f openebs-lvm-node-[xxxx] -n kube-system -c openebs-lvm-plugin
I0129 16:52:49.015466       1 grpc.go:72] GRPC call: /csi.v1.Node/NodeUnpublishVolume requests {"target_path":"/var/lib/kubelet/pods/ae9be57b-ca49-4515-a315-468875b0b02f/volumes/kubernetes.io~csi/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345/mount","volume_id":"pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345"}
I0129 16:52:49.134790       1 mount.go:112] umount done pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345 path /var/lib/kubelet/pods/ae9be57b-ca49-4515-a315-468875b0b02f/volumes/kubernetes.io~csi/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345/mount
I0129 16:52:49.134811       1 agent.go:164] hostpath: volume pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345 path: /var/lib/kubelet/pods/ae9be57b-ca49-4515-a315-468875b0b02f/volumes/kubernetes.io~csi/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345/mount has been unmounted.
I0129 16:52:49.134817       1 grpc.go:81] GRPC response: {}
I0129 16:53:47.160657       1 volume.go:128] Got update event for deleted Vol lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345
I0129 16:53:47.176698       1 volume.go:128] Got update event for deleted Vol lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345
I0129 16:53:47.247680       1 lvm_util.go:107] lvm: destroyed volume lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345
E0129 16:53:47.252755       1 volume.go:243] error syncing 'openebs/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345': Operation cannot be fulfilled on lvmvolumes.local.openebs.io "pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345": the object has been modified; please apply your changes to the latest version and try again, requeuing
E0129 16:53:47.295536       1 lvm_util.go:101] lvm: could not destroy volume lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345 cmd [-y /dev/lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345] error:   Failed to find logical volume "lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345"
E0129 16:53:47.295587       1 volume.go:243] error syncing 'openebs/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345': exit status 5, requeuing
E0129 16:53:47.339509       1 lvm_util.go:101] lvm: could not destroy volume lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345 cmd [-y /dev/lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345] error:   Failed to find logical volume "lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345"
E0129 16:53:47.339550       1 volume.go:243] error syncing 'openebs/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345': exit status 5, requeuing
E0129 16:53:47.379639       1 lvm_util.go:101] lvm: could not destroy volume lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345 cmd [-y /dev/lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345] error:   Failed to find logical volume "lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345"
E0129 16:53:47.379683       1 volume.go:243] error syncing 'openebs/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345': exit status 5, requeuing
E0129 16:53:47.419544       1 lvm_util.go:101] lvm: could not destroy volume lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345 cmd [-y /dev/lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345] error:   Failed to find logical volume "lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345"
E0129 16:53:47.419587       1 volume.go:243] error syncing 'openebs/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345': exit status 5, requeuing
E0129 16:53:47.467631       1 lvm_util.go:101] lvm: could not destroy volume lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345 cmd [-y /dev/lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345] error:   Failed to find logical volume "lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345"
E0129 16:53:47.467678       1 volume.go:243] error syncing 'openebs/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345': exit status 5, requeuing
E0129 16:53:47.667561       1 lvm_util.go:101] lvm: could not destroy volume lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345 cmd [-y /dev/lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345] error:   Failed to find logical volume "lvmvg/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345"
  • kubectl get pods -n kube-system
  • kubectl get lvmvol -A -o yaml

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
When I delete the bound PVC object, sometimes the corresponding lvmvolume object (a custom resource provided by openebs under local.openebs.io/v1alpha1) does not get deleted. The actual logical volume created on the physical node is deleted (as per lsblk or lvs output), but the lvmvolume object gets stuck.

As one can see in the node agent logs above, the controller is trying to delete a non-existent lv without first checking whether it exists. Since the lv deletion fails (the lv doesn't exist), the finalizer lvm.openebs.io/finalizer on the lvmvol never gets removed.

err = lvm.DestroyVolume(Vol)
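
For illustration, a minimal sketch of what an idempotent destroy path could look like is below; the helper names are hypothetical rather than the driver's actual API, and the point is simply to treat a missing logical volume as already deleted so a retried sync can still reach the finalizer removal.

// Sketch only: hypothetical helpers, not the actual lvm-localpv code.
package lvm

import (
	"os/exec"
	"strings"
)

// lvExists probes the LV with `lvs`; output containing
// "Failed to find logical volume" means it is already gone.
func lvExists(vg, lv string) (bool, error) {
	out, err := exec.Command("lvs", vg+"/"+lv).CombinedOutput()
	if err != nil {
		if strings.Contains(string(out), "Failed to find logical volume") {
			return false, nil
		}
		return false, err
	}
	return true, nil
}

// destroyVolume removes the LV only if it still exists, so repeated
// delete attempts succeed instead of failing with exit status 5.
func destroyVolume(vg, lv string) error {
	exists, err := lvExists(vg, lv)
	if err != nil {
		return err
	}
	if !exists {
		return nil // already removed by a previous attempt
	}
	return exec.Command("lvremove", "-y", "/dev/"+vg+"/"+lv).Run()
}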

Now you may be wondering how we ended up in a situation where the lv doesn't exist yet the controller is trying to delete it. The reason is the failure of finalizer removal during the first syncVol call. As one can see from the node agent logs above, while removing the finalizer (RemoveVolFinalizer func) the apiserver responds with a conflict error (the object has been modified; please apply your changes to the latest version), i.e. someone has updated the lvmvol object since it was queried, so the apiserver can't apply updates against an older resource version (optimistic locking).
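
(As an aside, a conflict like this is normally absorbed by re-reading the latest object and retrying the update. The fix proposed in this issue is the existence check on the lv, but for completeness here is a minimal sketch of that retry pattern using client-go's retry helper; LVMVolume and volClient are simplified stand-ins for the driver's generated CR type and clientset, not the real code.)

// Sketch only: simplified stand-in types, illustrating the retry pattern.
package lvm

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/util/retry"
)

// LVMVolume is a trimmed-down stand-in for the real custom resource.
type LVMVolume struct {
	metav1.ObjectMeta
}

// volClient mirrors the two clientset calls this sketch needs.
type volClient interface {
	Get(ctx context.Context, name string, opts metav1.GetOptions) (*LVMVolume, error)
	Update(ctx context.Context, vol *LVMVolume, opts metav1.UpdateOptions) (*LVMVolume, error)
}

// removeVolFinalizer drops lvm.openebs.io/finalizer, re-fetching and
// retrying whenever the object was modified concurrently (for example
// by the garbage collector stripping foregroundDeletion).
func removeVolFinalizer(ctx context.Context, c volClient, name string) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		vol, err := c.Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		kept := vol.Finalizers[:0]
		for _, f := range vol.Finalizers {
			if f != "lvm.openebs.io/finalizer" {
				kept = append(kept, f)
			}
		}
		vol.Finalizers = kept
		_, err = c.Update(ctx, vol, metav1.UpdateOptions{})
		return err
	})
}

Even with such a retry, the destroy path still needs to be idempotent, since the requeued sync runs lvremove again after the LV is already gone.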

Another question that may arise is who is updating the lvmvol object, since only openebs (the CSI controller and node agents) manages its lifecycle. It's the garbage collection controller removing the foregroundDeletion finalizer from the object. Here are the relevant kube-apiserver audit logs:

{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"7c8f1475-3281-4c90-8046-412407ee2d45","stage":"ResponseComplete","requestURI":"/apis/local.openebs.io/v1alpha1/namespaces/openebs/lvmvolumes/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345","verb":"delete","user":{"username":"system:serviceaccount:kube-system:openebs-lvm-controller-sa","uid":"95300a2c-253f-4370-84cf-3b946909b24f","groups":["system:serviceaccounts","system:serviceaccounts:kube-system","system:authenticated"]},"sourceIPs":["10.51.35.67"],"userAgent":"lvm-driver/v0.0.0 (linux/amd64) kubernetes/$Format","objectRef":{"resource":"lvmvolumes","namespace":"openebs","name":"pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345","apiGroup":"local.openebs.io","apiVersion":"v1alpha1"},"responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2021-01-29T16:53:47.152986Z","stageTimestamp":"2021-01-29T16:53:47.160389Z","annotations":{"authentication.k8s.io/legacy-token":"system:serviceaccount:kube-system:openebs-lvm-controller-sa","authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"openebs-lvm-provisioner-binding\" of ClusterRole \"openebs-lvm-provisioner-role\" to ServiceAccount \"openebs-lvm-controller-sa/kube-system\""}}
{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"b2673239-4b9e-4cec-a51d-ad546ac4cc28","stage":"ResponseComplete","requestURI":"/apis/local.openebs.io/v1alpha1/namespaces/openebs/lvmvolumes/pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345","verb":"patch","user":{"username":"system:serviceaccount:kube-system:generic-garbage-collector","uid":"076e8568-3c2e-4423-9ba7-8034f42d51db","groups":["system:serviceaccounts","system:serviceaccounts:kube-system","system:authenticated"]},"sourceIPs":["10.50.147.64"],"userAgent":"kube-controller-manager/v1.20.1 (linux/amd64) kubernetes/c4d7527/system:serviceaccount:kube-system:generic-garbage-collector","objectRef":{"resource":"lvmvolumes","namespace":"openebs","name":"pvc-6f3e71c1-e4d2-4e00-9823-8efece3c3345","apiGroup":"local.openebs.io","apiVersion":"v1alpha1"},"responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2021-01-29T16:53:47.171712Z","stageTimestamp":"2021-01-29T16:53:47.179197Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"system:controller:generic-garbage-collector\" of ClusterRole \"system:controller:generic-garbage-collector\" to ServiceAccount \"generic-garbage-collector/kube-system\""}}

In a nutshell, we need to check whether the lv exists before trying to delete it, thus making the controller idempotent. If this looks good, let me know and I can create a pull request to fix it.

Environment:

  • LVM Driver version: 4.35.0
  • Kubernetes version (use kubectl version): v1.20.1
  • Kubernetes installer & version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): Debian 9
@pawanpraka1
Contributor

pawanpraka1 commented Jan 29, 2021

In a nutshell, we need to check whether the lv exists before trying to delete it, thus making the controller idempotent. If this looks good, let me know and I can create a pull request to fix it.

I had planned to do this. Thanks for reporting this issue. Please go ahead and raise the pull request.

iyashu added commits to iyashu/lvm-localpv that referenced this issue Feb 1–2, 2021
@pawanpraka1 pawanpraka1 added the bug Something isn't working label Feb 2, 2021
@pawanpraka1 pawanpraka1 added this to the v0.2 milestone Feb 2, 2021