
NeuronModelForSentenceTransformers and NeuronStableDiffusionPipeline are not compiling with Neuron SDK 2.19 and onward #710

Closed
yahavb opened this issue Oct 6, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@yahavb
Contributor

yahavb commented Oct 6, 2024

System Info

docker run -it --device=/dev/neuron0 -d -v /home/ubuntu/:/home/ubuntu/ 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference-neuronx:1.13.1-neuronx-py310-sdk2.20.0-ubuntu20.04  bash
root@7938e622e5ed:/# python --version
Python 3.10.12
root@7938e622e5ed:/# pip list | grep neuron
aws-neuronx-runtime-discovery 2.9
libneuronxla                  2.0.4115.0
neuronx-cc                    2.14.227.0+2d4f85be
neuronx-distributed           0.8.0
optimum-neuron                0.0.24
torch-neuronx                 2.1.2.2.2.0
transformers-neuronx          0.11.351

When attempting to compile a model as in https://huggingface.co/docs/optimum-neuron/en/inference_tutorials/sentence_transformers#sentence-transformers-on-aws-inferentia-with-optimum-neuron or https://huggingface.co/docs/optimum-neuron/en/inference_tutorials/stable_diffusion, the compilation aborts with this error log:

2024-10-06 15:59:34.984051: F external/xla/xla/parse_flags_from_env.cc:224] Unknown flags in XLA_FLAGS: --xla_gpu_simplify_all_fp_conversions=false  --xla_gpu_force_compilation_parallelism=8


### Who can help?

@JingyaHuang @dacorvo

Thanks for the help!!!!

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction (minimal, reproducible, runnable)

root@7938e622e5ed:/# python
Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

from optimum.neuron import NeuronModelForSentenceTransformers
model_id = "BAAI/bge-small-en-v1.5"
input_shapes = {"batch_size": 1, "sequence_length": 384}
model = NeuronModelForSentenceTransformers.from_pretrained(model_id, export=True, **input_shapes)


and 

root@7938e622e5ed:/# python
Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

from optimum.neuron import NeuronStableDiffusionPipeline
compiler_args = {"auto_cast": "none", "auto_cast_type": "bf16", "inline_weights_to_neff": True}
input_shapes = {"batch_size": 1, "height": 512, "width": 512}
pipe = NeuronStableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", export=True, **compiler_args, **input_shapes)


In both cases the process terminated with a core dump, producing no output other than the log line:

2024-10-06 16:27:46.795083: F external/xla/xla/parse_flags_from_env.cc:224] Unknown flags in XLA_FLAGS: --xla_gpu_simplify_all_fp_conversions=false --xla_gpu_force_compilation_parallelism=8


### Expected behavior

The model graph is compiled and ready to be pushed to the HF model repo.
@yahavb yahavb added the bug Something isn't working label Oct 6, 2024
@dacorvo
Collaborator

dacorvo commented Oct 7, 2024

@yahavb thank you for your feedback. optimum-neuron requires PyTorch 2.1.2: this is why you got this error message (although I agree it is not crystal clear).
To make sure you have all the correct prerequisites, if you have the ability to switch to a different region, you could use the Hugging Face Deep Learning AMI which is available in us-east-1 (ami-0271953de6aa28bdb).
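Since the version pin is easy to miss, a quick pre-flight check can catch it before compiling. This is a minimal sketch (the 2.1.2 requirement comes from the comment above; the helper name is hypothetical) that strips local build suffixes like `+cu121` before comparing:

```python
def check_torch_version(version: str, required=(2, 1, 2)) -> bool:
    """Return True if a torch version string matches the required pin.

    Local build suffixes such as '+cu121' are stripped first, so
    '2.1.2+cu121' still satisfies a 2.1.2 requirement.
    """
    base = version.split("+")[0]
    parts = tuple(int(p) for p in base.split(".")[:3])
    return parts == required

# In practice you would pass torch.__version__; literals shown here
# so the sketch runs without torch installed.
print(check_torch_version("2.1.2+cu121"))  # True
print(check_torch_version("2.0.1"))        # False
```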

@yahavb
Contributor Author

yahavb commented Oct 7, 2024

Thanks for the speedy response. I upgraded PyTorch to 2.1.2+cu121 but that was not enough. I also had to unset the XLA flags first.

Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from optimum.neuron import NeuronStableDiffusionPipeline
>>> import os
>>> 
>>> # Unset GPU-specific flags
>>> os.environ["XLA_FLAGS"] = ""
>>> os.environ["TF_XLA_FLAGS"] = ""
>>> 
>>> compiler_args = {"auto_cast": "none", "auto_cast_type": "bf16","inline_weights_to_neff": "True"}
>>> input_shapes = {"batch_size": 1, "height": 512, "width": 512}
>>> pipe=NeuronStableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", export=True, **compiler_args, **input_shapes)
Keyword arguments {'subfolder': '', 'use_auth_token': None, 'trust_remote_code': False} are not expected by StableDiffusionPipeline and will be ignored.
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 15.16it/s]
Applying optimized attention score computation for stable diffusion.
2024-10-07 16:36:41.000556:  376  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.14.227.0+2d4f85be/MODULE_de2af93f5bde2beab207 not found in aws-neuron/optimum-neuron-cache: 404 Client Error. (Request ID: Root=1-67040e19-24c026d559cf363e7d04f12a;7affd799-aacb-4e62-97cd-86934d0ab690)

Entry Not Found for url: https://huggingface.co/api/models/aws-neuron/optimum-neuron-cache/tree/main/neuronxcc-2.14.227.0%2B2d4f85be%2FMODULE_de2af93f5bde2beab207?recursive=True&expand=False.
neuronxcc-2.14.227.0+2d4f85be/MODULE_de2af93f5bde2beab207 does not exist on "main" 
The model will be recompiled.
Keyword arguments {'subfolder': '', 'use_auth_token': None, 'trust_remote_code': False} are not expected by StableDiffusionPipeline and will be ignored.
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 15.78it/s]
Applying optimized attention score computation for stable diffusion.
***** Compiling text_encoder *****
Using Neuron: --auto-cast none
Using Neuron: --auto-cast-type bf16
...
Compiler status PASS
[Compilation Time] 68.27 seconds.
***** Compiling unet *****
Using Neuron: --auto-cast none
Using Neuron: --auto-cast-type bf16
..............
Compiler status PASS
[Compilation Time] 297.54 seconds.
***** Compiling vae_encoder *****
Using Neuron: --auto-cast none
Using Neuron: --auto-cast-type bf16
.........
Compiler status PASS
[Compilation Time] 181.68 seconds.
***** Compiling vae_decoder *****
Using Neuron: --auto-cast none
Using Neuron: --auto-cast-type bf16
..............
Compiler status PASS
[Compilation Time] 274.88 seconds.
[Total compilation Time] 822.37 seconds.
2024-10-07 16:50:25.000242:  376  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-07 16:50:25.000243:  376  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
Model cached in: /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_de2af93f5bde2beab207.
Loading only U-Net into both Neuron Cores...

Do we know who sets these flags? I tried with older DLC (Deep Learning Container) images and we did not need to unset those flags.
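One way to narrow down where the flags come from is to inspect the environment before any XLA-backed import, then clear them. A minimal sketch (the variable names are taken from the error log and workaround above, not from DLC documentation):

```python
import os

# GPU-oriented XLA variables that trip up the Neuron build of XLA.
SUSPECT_VARS = ("XLA_FLAGS", "TF_XLA_FLAGS")

# Print what (if anything) the container shipped with. If a value shows
# up here that you never exported yourself, it was baked into the image
# or injected by an entrypoint script.
for name in SUSPECT_VARS:
    print(f"{name} = {os.environ.get(name)!r}")

# Clearing them *before* importing optimum.neuron avoids the
# parse_flags_from_env abort; popping removes the variable entirely,
# which is safer than setting it to an empty string.
for name in SUSPECT_VARS:
    os.environ.pop(name, None)
```

Running `env | grep XLA` in the container shell before starting Python would answer the same question from outside the interpreter.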

@dacorvo
Collaborator

dacorvo commented Oct 8, 2024

Gently pinging @JingyaHuang on this.

@JingyaHuang
Collaborator

Would it be possible that you set these flags while debugging (e.g. for snapshotting HLO), @yahavb? These flags are not set by Optimum Neuron, and in my dev environment they are not set when I install optimum-neuron in a clean environment.

@yahavb
Contributor Author

yahavb commented Oct 9, 2024

I didn't. They could come from the DLC. No idea, but I will close this as I am unblocked.

@yahavb yahavb closed this as completed Oct 9, 2024