
NeuronModelForSentenceTransformers and NeuronStableDiffusionPipeline are not compiling with Neuron SDK 2.19 and onward #710

Closed
yahavb opened this issue Oct 6, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@yahavb
Contributor

yahavb commented Oct 6, 2024

System Info

docker run -it --device=/dev/neuron0 -d -v /home/ubuntu/:/home/ubuntu/ 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference-neuronx:1.13.1-neuronx-py310-sdk2.20.0-ubuntu20.04  bash
root@7938e622e5ed:/# python --version
Python 3.10.12
root@7938e622e5ed:/# pip list | grep neuron
aws-neuronx-runtime-discovery 2.9
libneuronxla                  2.0.4115.0
neuronx-cc                    2.14.227.0+2d4f85be
neuronx-distributed           0.8.0
optimum-neuron                0.0.24
torch-neuronx                 2.1.2.2.2.0
transformers-neuronx          0.11.351

When attempting to compile a model as in https://huggingface.co/docs/optimum-neuron/en/inference_tutorials/sentence_transformers#sentence-transformers-on-aws-inferentia-with-optimum-neuron or https://huggingface.co/docs/optimum-neuron/en/inference_tutorials/stable_diffusion, the compilation aborts with this error log:

2024-10-06 15:59:34.984051: F external/xla/xla/parse_flags_from_env.cc:224] Unknown flags in XLA_FLAGS: --xla_gpu_simplify_all_fp_conversions=false  --xla_gpu_force_compilation_parallelism=8


### Who can help?

@JingyaHuang @dacorvo

Thanks for the help!!!!

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction (minimal, reproducible, runnable)

root@7938e622e5ed:/# python
Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

from optimum.neuron import NeuronModelForSentenceTransformers
model_id = "BAAI/bge-small-en-v1.5"
input_shapes = {"batch_size": 1, "sequence_length": 384}
model = NeuronModelForSentenceTransformers.from_pretrained(model_id, export=True, **input_shapes)


and 

root@7938e622e5ed:/# python
Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

from optimum.neuron import NeuronStableDiffusionPipeline
compiler_args = {"auto_cast": "none", "auto_cast_type": "bf16", "inline_weights_to_neff": True}
input_shapes = {"batch_size": 1, "height": 512, "width": 512}
pipe = NeuronStableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", export=True, **compiler_args, **input_shapes)


In both cases the process terminated with a core dump, producing no output other than the log line:

2024-10-06 16:27:46.795083: F external/xla/xla/parse_flags_from_env.cc:224] Unknown flags in XLA_FLAGS: --xla_gpu_simplify_all_fp_conversions=false --xla_gpu_force_compilation_parallelism=8


### Expected behavior

The model graph is compiled and ready to be pushed to the HF model repo.
@yahavb yahavb added the bug Something isn't working label Oct 6, 2024
@dacorvo
Collaborator

dacorvo commented Oct 7, 2024

@yahavb thank you for your feedback. optimum-neuron requires PyTorch 2.1.2: this is why you got this error message (although I agree it is not crystal clear).
To make sure you have all the correct prerequisites, if you have the ability to switch to a different region, you could use the Hugging Face Deep Learning AMI which is available in us-east-1 (ami-0271953de6aa28bdb).
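Since the version pin is easy to miss, a quick pre-flight check can catch it before compiling. This is a minimal sketch (the 2.1.2 requirement comes from the comment above; the helper name is hypothetical) that strips local build suffixes like `+cu121` before comparing:

```python
def check_torch_version(version: str, required=(2, 1, 2)) -> bool:
    """Return True if a torch version string matches the required pin.

    Local build suffixes such as '+cu121' are stripped first, so
    '2.1.2+cu121' still satisfies a 2.1.2 requirement.
    """
    base = version.split("+")[0]
    parts = tuple(int(p) for p in base.split(".")[:3])
    return parts == required

# In practice you would pass torch.__version__; literals shown here
# so the sketch runs without torch installed.
print(check_torch_version("2.1.2+cu121"))  # True
print(check_torch_version("2.0.1"))        # False
```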

@yahavb
Contributor Author

yahavb commented Oct 7, 2024

Thanks for the speedy response. I upgraded PyTorch to 2.1.2+cu121 but that was not enough. I also had to unset the XLA flags first.

Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from optimum.neuron import NeuronStableDiffusionPipeline
>>> import os
>>> 
>>> # Unset GPU-specific flags
>>> os.environ["XLA_FLAGS"] = ""
>>> os.environ["TF_XLA_FLAGS"] = ""
>>> 
>>> compiler_args = {"auto_cast": "none", "auto_cast_type": "bf16","inline_weights_to_neff": "True"}
>>> input_shapes = {"batch_size": 1, "height": 512, "width": 512}
>>> pipe=NeuronStableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", export=True, **compiler_args, **input_shapes)
Keyword arguments {'subfolder': '', 'use_auth_token': None, 'trust_remote_code': False} are not expected by StableDiffusionPipeline and will be ignored.
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 15.16it/s]
Applying optimized attention score computation for stable diffusion.
2024-10-07 16:36:41.000556:  376  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.14.227.0+2d4f85be/MODULE_de2af93f5bde2beab207 not found in aws-neuron/optimum-neuron-cache: 404 Client Error. (Request ID: Root=1-67040e19-24c026d559cf363e7d04f12a;7affd799-aacb-4e62-97cd-86934d0ab690)

Entry Not Found for url: https://huggingface.co/api/models/aws-neuron/optimum-neuron-cache/tree/main/neuronxcc-2.14.227.0%2B2d4f85be%2FMODULE_de2af93f5bde2beab207?recursive=True&expand=False.
neuronxcc-2.14.227.0+2d4f85be/MODULE_de2af93f5bde2beab207 does not exist on "main" 
The model will be recompiled.
Keyword arguments {'subfolder': '', 'use_auth_token': None, 'trust_remote_code': False} are not expected by StableDiffusionPipeline and will be ignored.
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 15.78it/s]
Applying optimized attention score computation for stable diffusion.
***** Compiling text_encoder *****
Using Neuron: --auto-cast none
Using Neuron: --auto-cast-type bf16
...
Compiler status PASS
[Compilation Time] 68.27 seconds.
***** Compiling unet *****
Using Neuron: --auto-cast none
Using Neuron: --auto-cast-type bf16
..............
Compiler status PASS
[Compilation Time] 297.54 seconds.
***** Compiling vae_encoder *****
Using Neuron: --auto-cast none
Using Neuron: --auto-cast-type bf16
.........
Compiler status PASS
[Compilation Time] 181.68 seconds.
***** Compiling vae_decoder *****
Using Neuron: --auto-cast none
Using Neuron: --auto-cast-type bf16
..............
Compiler status PASS
[Compilation Time] 274.88 seconds.
[Total compilation Time] 822.37 seconds.
2024-10-07 16:50:25.000242:  376  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-10-07 16:50:25.000243:  376  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
Model cached in: /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_de2af93f5bde2beab207.
Loading only U-Net into both Neuron Cores...

Do we know who sets these flags? I tried with older DLC (Deep Learning Container) images and we did not need to unset those flags.
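One way to narrow down where the flags come from is to inspect the environment before any XLA-backed import, then clear them. A minimal sketch (the variable names are taken from the error log and workaround above, not from DLC documentation):

```python
import os

# GPU-oriented XLA variables that trip up the Neuron build of XLA.
SUSPECT_VARS = ("XLA_FLAGS", "TF_XLA_FLAGS")

# Print what (if anything) the container shipped with. If a value shows
# up here that you never exported yourself, it was baked into the image
# or injected by an entrypoint script.
for name in SUSPECT_VARS:
    print(f"{name} = {os.environ.get(name)!r}")

# Clearing them *before* importing optimum.neuron avoids the
# parse_flags_from_env abort; popping removes the variable entirely,
# which is safer than setting it to an empty string.
for name in SUSPECT_VARS:
    os.environ.pop(name, None)
```

Running `env | grep XLA` in the container shell before starting Python would answer the same question from outside the interpreter.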

@dacorvo
Collaborator

dacorvo commented Oct 8, 2024

Gently pinging @JingyaHuang on this.

@JingyaHuang
Collaborator

Would it be possible that you set these flags while debugging (e.g. for snapshotting HLO), @yahavb? These flags are not set by Optimum Neuron, and in my dev environment they are not set when I install optimum-neuron in a clean environment.

@yahavb
Contributor Author

yahavb commented Oct 9, 2024

I didn't. They could come from the DLC. No idea, but I will close this as I am unblocked.

@yahavb yahavb closed this as completed Oct 9, 2024