
Slow PersistentVolume creation for multiple PVCs (e.g. created through an STS template) when using multiple local storage nodes #347

Open
polskomleko opened this issue Nov 11, 2024 · 1 comment

@polskomleko

What steps did you take and what happened:
We're testing the OpenEBS LVM Local PV provider as a local storage solution for specific workloads in our OpenShift environment.
We've added a VMDK to multiple nodes to be used for logical volumes and created the same VG on each node.
The provider is deployed using the Helm chart with a custom image registry (we have to use OpenEBS 3.10 because we're running an outdated OpenShift deployment in a restricted environment).
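
For reference, the per-node LVM setup is roughly the following; the VG name ebsvg matches the StorageClass below, while /dev/sdb is just an assumed device name for the attached VMDK and differs per environment:

# On each local-storage worker node: initialise the attached VMDK as an
# LVM physical volume and create the volume group the StorageClass uses.
# /dev/sdb is an assumed device path for the VMDK.
pvcreate /dev/sdb
vgcreate ebsvg /dev/sdb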

StorageClass used for the PVCs (hostnames are redacted, but the list contains only the nodes used for local storage):

allowVolumeExpansion: true
allowedTopologies:
- matchLabelExpressions:
  - key: kubernetes.io/hostname
    values:
    - worker01.example.com
    - worker02.example.com
    - worker03.example.com
    - worker04.example.com
    - worker10.example.com
    - worker11.example.com
    - worker12.example.com
    - worker13.example.com
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-lvmpv
parameters:
  storage: lvm
  volgroup: ebsvg
provisioner: local.csi.openebs.io
reclaimPolicy: Delete
volumeBindingMode: Immediate

Here's an excerpt with the PVC template from a StatefulSet we're using for testing:

  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: pv-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 4Gi
      storageClassName: openebs-lvmpv
      volumeMode: Filesystem

When we create a number of PVCs by scaling the STS, it takes a while for the PVs to be provisioned. While a subset of them are created and successfully bound to their PVCs almost immediately, the other PVCs take more time than expected, and the time to provision the PVs increases with the number of nodes needed to accommodate all the requests.
According to our tests, it can take up to 5 minutes to provision the storage to satisfy the requests, depending on the number of nodes needed (we've tested with up to 8).
This behaviour also spreads to newly created storage requests, as the provisioner throws the same "ResourceExhausted" error in the logs until it gets to a node with free space in its volume group.
My main question is: is this behaviour expected? It seems like the provider controller doesn't aggregate information about the space left in the volume groups of the provisioning nodes, or does not respect the LVMNode custom resource information at all, and instead goes sequentially through the nodes in the StorageClass list, trying to provision a PV on each one until it hits some kind of timeout and moves on to the next node.
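
For completeness, this is roughly how we trigger and observe the behaviour (test-sts is a placeholder name for our test StatefulSet, and the replica count is just an example):

# Scale the StatefulSet so the volumeClaimTemplates fan out over more
# nodes than a single VG can accommodate (test-sts is a placeholder).
kubectl scale statefulset test-sts --replicas=16

# Watch the claims: some bind almost immediately, the rest stay Pending
# for several minutes while provisioning works through the node list.
kubectl get pvc -w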

What did you expect to happen:
All persistent volumes are provisioned and bound to their corresponding claims at the same time, or at least within a reasonable amount of time.

The output of the following commands will help us better understand what's going on:
(Pasting long output into a GitHub gist or other Pastebin is fine.)

I don't believe there is any relevant info in the logs except for errors like:

E1107 13:50:36.474552       1 grpc.go:79] GRPC error: rpc error: code = ResourceExhausted desc = no vg available to serve volume request having regex="^ebsvg$" & capacity="4294967296"
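
For context, the per-node capacity that the driver itself reports can be checked via its LVMNode custom resources and, on a node, via vgs; the openebs namespace below is the chart default and may differ per installation:

# Free VG capacity as reported by the driver per node (namespace is the
# chart default; adjust to where the LVM driver is installed).
kubectl get lvmnodes -n openebs -o yaml

# Directly on a worker node, the VG the StorageClass points at:
vgs ebsvg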

Anything else you would like to add:
If any other logs are needed, I can provide them on request.

Environment:

  • LVM Driver version:
  LVM version:     2.03.11(2)-RHEL8 (2021-01-28)
  Library version: 1.02.175-RHEL8 (2021-01-28)
  Driver version:  4.43.0
  Configuration:   ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-default-dm-run-dir=/run --with-default-run-dir=/run/lvm --with-default-pid-dir=/run --with-default-locking-dir=/run/lock/lvm --with-usrlibdir=/usr/lib64 --enable-fsadm --enable-write_install --with-user= --with-group= --with-device-uid=0 --with-device-gid=6 --with-device-mode=0660 --enable-pkgconfig --enable-cmdlib --enable-dmeventd --enable-blkid_wiping --with-cluster=internal --enable-cmirrord --with-udevdir=/usr/lib/udev/rules.d --enable-udev_sync --with-thin=internal --with-cache=internal --enable-lvmpolld --enable-lvmlockd-dlm --enable-lvmlockd-dlmcontrol --enable-lvmlockd-sanlock --enable-dbus-service --enable-notify-dbus --enable-dmfilemapd --with-writecache=internal --with-vdo=internal --with-vdo-format=/usr/bin/vdoformat --with-integrity=internal --disable-silent-rules
  • Kubernetes and OpenShift version:
Server Version: 4.10.67
Kubernetes Version: v1.23.17+26fdcdf
  • Cloud provider or hardware configuration: we're using a VMware vCenter 7.0.3.01700 cluster as the virtualization provider for the OpenShift cluster and virtualized storage
  • OS (e.g. from /etc/os-release): Red Hat CoreOS 4.10
@abhilashshetty04
Contributor

Hi @polskomleko, thanks for reporting the issue. We will reproduce this and let you know our findings.
