Smart MIG Placement #932

@Pavel-Okruhlica-SZN

Description

TL;DR:
When starting multiple pods that each use a MIG device, the devices are not packed onto one GPU first and then the next; instead they are placed randomly across the GPUs.

Full story:
I have a deployment that requests 7 × 1g.10gb MIG devices on a node with NVIDIA H100 GPUs:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: abstract-mig-claim
  namespace: nvidia-dra-driver-gpu
  labels:
    app: abstract-mig-claim
spec:
  replicas: 7
  selector:
    matchLabels:
      app: abstract-mig-claim
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: abstract-mig-claim
    spec:
      restartPolicy: Always
      containers:
        - name: abstract-mig-claiming-pod
          image: docker.ops.iszn.cz/ftxt-gpu/cuda:13.1.1-runtime-ubuntu24.04
          command: ["sleep", "6000"]
          resources:
            claims:
            - name: mig-device
              request: mig-10gb
      resourceClaims:
        - name: mig-device
          resourceClaimTemplateName: at-least-10gb-mig-template

with ResourceClaimTemplate:

apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: at-least-10gb-mig-template
spec:
  spec:
    devices:
      requests:
      - name: mig-10gb
        exactly:
          deviceClassName: mig.nvidia.com
          selectors:
          - cel:
              expression: |
                device.capacity['gpu.nvidia.com'].multiprocessors.isGreaterThan(quantity("10"))
                &&
                device.capacity['gpu.nvidia.com'].memory.isGreaterThan(quantity("9Gi"))

      constraints:
      - requests: []
        matchAttribute: "gpu.nvidia.com/parentUUID"
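For clarity, the CEL selector above roughly corresponds to the following filter, sketched in Python over a hypothetical device dict (the real driver evaluates the CEL expression against the device's published capacity, so the field names here are stand-ins):

```python
GIB = 1024 ** 3  # 1 Gi in bytes

def matches_selector(device: dict) -> bool:
    """Rough Python equivalent of the CEL selector above.

    `device` is a hypothetical dict standing in for a DRA device;
    `multiprocessors` and `memory_bytes` mirror the capacity keys
    used in the CEL expression.
    """
    cap = device["capacity"]["gpu.nvidia.com"]
    # multiprocessors > 10 AND memory > 9Gi
    return cap["multiprocessors"] > 10 and cap["memory_bytes"] > 9 * GIB
```

Any 1g.10gb slice (just under 10 GiB of memory) passes this filter, which is why all seven replicas match the template.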

When I apply the deployment, the MIG devices are assigned to GPUs randomly instead of filling up one GPU before moving on to the next:

GPU 0: NVIDIA H100 PCIe (UUID: GPU-c5dc08af-14bd-8444-4b8c-807a2b927bfc)
  MIG 1g.10gb     Device  0: (UUID: MIG-7bfd1ff3-36c9-5943-801e-ee320b69b080)
  MIG 1g.10gb     Device  1: (UUID: MIG-289084bd-9d24-58a1-b108-bd165d812ba0)
  MIG 1g.10gb     Device  2: (UUID: MIG-d5da8058-8391-52ad-9a82-4d71b789ff88)
GPU 1: NVIDIA H100 PCIe (UUID: GPU-26feb2ed-98d7-c1b0-85cc-9742bde90813)
GPU 2: NVIDIA H100 PCIe (UUID: GPU-406c8304-24fb-4e0f-ac82-8cc39e5deabe)
  MIG 1g.10gb     Device  0: (UUID: MIG-0eb84170-f5b5-5480-bb0b-1a68ec838b22)
GPU 3: NVIDIA H100 PCIe (UUID: GPU-75578d92-6462-98be-dc5e-9d90ec4d5fed)
  MIG 1g.10gb     Device  0: (UUID: MIG-ca1bcc65-fefc-5bb1-9009-76b161bc870b)
  MIG 1g.10gb     Device  1: (UUID: MIG-bc74463a-73bc-5c66-bb5e-78d95757f444)
  MIG 1g.10gb     Device  2: (UUID: MIG-83a994de-db16-5943-9cc4-92eb9951048e)

This effectively makes most of the GPUs unusable for pods that require a full GPU (only 1 full GPU remains available when it could have been 3), so you would need to separate "MIG-able" pods from full-GPU pods.

I was thinking the logic behind this could be:

When a new MIG device request comes in, check whether any GPU already has a MIG device on it. If yes, and the requested device fits on that GPU, create the device there. If the answer is no at any point, move on to the next GPU, checking the whole cluster before creating a MIG device on a GPU that has no MIG devices yet.
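The logic described above is essentially a best-fit packing heuristic. A minimal sketch, assuming each H100 exposes 7 × 1g slices and that per-GPU free/used slice counts are tracked (the dict fields here are made up for illustration, not the driver's actual data model):

```python
def pick_gpu(gpus: list, requested_slices: int):
    """Pick a GPU for a new MIG request, packing onto already-MIG'd GPUs.

    `gpus` is a list of hypothetical dicts with `free_slices` and
    `used_slices` counts. Prefer GPUs that already host MIG devices,
    and among those the one with the least remaining free capacity
    that still fits (best fit), so untouched GPUs stay available
    for full-GPU workloads.
    """
    candidates = [g for g in gpus if g["free_slices"] >= requested_slices]
    if not candidates:
        return None  # request cannot be placed anywhere
    used = [g for g in candidates if g["used_slices"] > 0]
    pool = used if used else candidates
    # best fit: smallest free capacity that still satisfies the request
    return min(pool, key=lambda g: g["free_slices"])
```

With one GPU at 3 used / 4 free slices and another completely free, a 1-slice request lands on the partially used GPU, leaving the other one fully available.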
