optimum neuron optimum-cli compile fails in newest TGI Neuron #698

@jimburtoft

System Info

!docker run -p 8080:80 \
-v $(pwd):/data \
--device=/dev/neuron0 \
--device=/dev/neuron1 \
--device=/dev/neuron2 \
--device=/dev/neuron3 \
--device=/dev/neuron4 \
--device=/dev/neuron5 \
-ti \
--entrypoint "optimum-cli" neuronx-tgi:latest \
env


Platform:

- Platform: Linux-5.15.0-1056-aws-x86_64-with-glibc2.35
- Python version: 3.10.12


Python packages:

- `optimum-neuron` version: 0.0.25.dev0
- `neuron-sdk` version: 2.20.0
- `optimum` version: 1.21.4
- `transformers` version: 4.43.2
- `huggingface_hub` version: 0.25.0
- `torch` version: 2.1.2+cu121
- `aws-neuronx-runtime-discovery` version: 2.9
- `libneuronxla` version: 2.0.4115.0
- `neuronx-cc` version: 2.15.128.0+56dc5a86
- `neuronx-distributed` version: NA
- `neuronx-hwm` version: NA
- `torch-neuronx` version: 2.1.2.2.3.0
- `torch-xla` version: 2.1.4
- `transformers-neuronx` version: 0.12.313


Neuron Driver:


WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

aws-neuronx-collectives/now 2.22.26.0-17a033bc8 amd64 [installed,local]
aws-neuronx-dkms/now 2.18.12.0 amd64 [installed,local]
aws-neuronx-runtime-lib/now 2.22.14.0-6e27b8d5b amd64 [installed,local]
aws-neuronx-tools/now 2.19.0.0 amd64 [installed,local]

Who can help?

@dacorvo

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

When I compile a model with optimum-cli inside the newly merged TGI image, the compilation fails with the error shown below.

I haven't been able to test outside the TGI image because I am having trouble upgrading my own image to 2.20.

!git clone https://github.com/huggingface/optimum-neuron.git && cd optimum-neuron && make neuronx-tgi

REPOSITORY    TAG           IMAGE ID       CREATED        SIZE
neuronx-tgi   0.0.25.dev0   165727a580ea   14 hours ago   11.6GB
neuronx-tgi   latest        165727a580ea   14 hours ago   11.6GB
nginx         alpine        c7b4f26a7d93   5 weeks ago    43.2MB

!docker run -p 8080:80 \
-v $(pwd):/data \
--device=/dev/neuron0 \
--device=/dev/neuron1 \
--device=/dev/neuron2 \
--device=/dev/neuron3 \
--device=/dev/neuron4 \
--device=/dev/neuron5 \
-ti \
--entrypoint "optimum-cli" neuronx-tgi:latest \
export neuron --model NousResearch/Llama-2-7b-chat-hf \
--sequence_length 4096 \
--batch_size 4 \
--num_cores 8 \
/data/exportedmodel/

Error:

Downloading shards: 100%|█████████████████████████| 2/2 [00:11<00:00,  5.81s/it]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:00<00:00,  7.93it/s]
generation_config.json: 100%|██████████████████| 200/200 [00:00<00:00, 2.17MB/s]
2024-09-20 13:33:14.000539:  136  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
  warnings.warn(SyntaxWarning(
2024-09-20 13:33:37.000915:  766  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
  warnings.warn(SyntaxWarning(
2024-09-20 13:33:37.000945:  767  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.15.128.0+56dc5a86/MODULE_b65d76e18d6cf6a5fd70+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_b65d76e18d6cf6a5fd70+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_b65d76e18d6cf6a5fd70+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-20 13:33:38.000478:  766  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_4bcb2a4fdd83da295490+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_4bcb2a4fdd83da295490+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_4bcb2a4fdd83da295490+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-20 13:33:38.000848:  767  INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/3d7a7a87-e699-48ea-9d8c-4efe21bcfd19/model.MODULE_4bcb2a4fdd83da295490+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/3d7a7a87-e699-48ea-9d8c-4efe21bcfd19/model.MODULE_4bcb2a4fdd83da295490+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
................
2024-09-20 13:36:05.000005:  766  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-20T13:35:59Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

2024-09-20 13:36:05.000005:  766  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.hlo_module.pb after 0 retries.
2024-09-20 13:36:05.000006:  766  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
...
2024-09-20 13:37:22.000177:  767  ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/3d7a7a87-e699-48ea-9d8c-4efe21bcfd19/model.MODULE_4bcb2a4fdd83da295490+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/3d7a7a87-e699-48ea-9d8c-4efe21bcfd19/model.MODULE_4bcb2a4fdd83da295490+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-20T13:37:16Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory

2024-09-20 13:37:22.000177:  767  ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/3d7a7a87-e699-48ea-9d8c-4efe21bcfd19/model.MODULE_4bcb2a4fdd83da295490+39f12043.hlo_module.pb after 0 retries.
2024-09-20 13:37:22.000178:  767  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/compiler.py", line 500, in compile
    self.build(num_exec_repetition)
  File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/compiler.py", line 507, in build
    self.neff_bytes = compile_hlo_module(self.hlo_module, self.tag, num_exec_repetition)
  File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/compiler.py", line 144, in compile_hlo_module
    neff_bytes = neuron_xla_compile(module_bytes, flags, input_format="hlo", platform_target="trn1",
  File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 210, in neuron_xla_compile
    neuron_xla_compile_impl(
  File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 269, in neuron_xla_compile_impl
    return compile_cache_entry(output, entry, execution_mode,
  File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 186, in compile_cache_entry
    raise (e)
  File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 165, in compile_cache_entry
    ret = call_neuron_compiler(
  File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 109, in call_neuron_compiler
    raise subprocess.CalledProcessError(res.returncode, cmd, stderr=error_info)
subprocess.CalledProcessError: Command '['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']' returned non-zero exit status 70.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/neuron/__main__.py", line 737, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/neuron/__main__.py", line 690, in main
    decoder_export(
  File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/neuron/__main__.py", line 655, in decoder_export
    model = NeuronModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/optimum/modeling_base.py", line 420, in from_pretrained
    return from_pretrained_method(
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling_decoder.py", line 331, in _from_transformers
    return cls._export(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling_decoder.py", line 382, in _export
    return cls(new_config, checkpoint_dir, generation_config=generation_config)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling.py", line 1254, in __init__
    super().__init__(config, checkpoint_dir, compiled_dir=compiled_dir, generation_config=generation_config)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling_decoder.py", line 215, in __init__
    neuronx_model.to_neuron()
  File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/base.py", line 85, in to_neuron
    self.compile()
  File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/base.py", line 64, in compile
    kernel.neff_bytes = neff_bytes_futures[hash_hlo(kernel.hlo_module)].result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
subprocess.CalledProcessError: Command '['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']' returned non-zero exit status 70.
Traceback (most recent call last):
  File "/usr/local/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/optimum/commands/optimum_cli.py", line 208, in main
    service.run()
  File "/usr/local/lib/python3.10/dist-packages/optimum/commands/export/neuronx.py", line 298, in run
    subprocess.run(full_command, shell=True, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m optimum.exporters.neuron --model NousResearch/Llama-2-7b-chat-hf --sequence_length 4096 --batch_size 4 --num_cores 8 /data/exportedmodel/' returned non-zero exit status 1.
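
The Python tracebacks above only report the non-zero exit status. The underlying failure is the dynamic loader message for `walrus_driver`, which cannot open `libxml2.so.2`. Here is a minimal sketch for pulling that kind of message out of a long log and checking whether the library is visible in the current environment. The helper names `missing_shared_libraries` and `is_library_visible` are hypothetical and not part of optimum-neuron or the Neuron SDK. The check uses `ctypes.util.find_library` and is only best-effort, so it does not prove the library is absent from the image.

```python
import ctypes.util
import re

# Matches the dynamic loader message that appears in the log above.
_MISSING_LIB = re.compile(
    r"error while loading shared libraries: (?P<lib>\S+): "
    r"cannot open shared object file"
)


def missing_shared_libraries(log_text):
    """Return distinct library names the loader failed to open, in first-seen order."""
    seen = {}
    for match in _MISSING_LIB.finditer(log_text):
        seen.setdefault(match.group("lib"), None)
    return list(seen)


def is_library_visible(soname):
    """Best-effort check: can the standard lookup find this library here?

    Assumption: a soname such as "libxml2.so.2" maps to the short name "xml2".
    """
    short = soname.split(".so")[0]
    if short.startswith("lib"):
        short = short[3:]
    return ctypes.util.find_library(short) is not None


# One error line copied from the log above.
sample = (
    "/usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: "
    "error while loading shared libraries: libxml2.so.2: "
    "cannot open shared object file: No such file or directory"
)

for lib in missing_shared_libraries(sample):
    print(lib, "visible here:", is_library_visible(lib))
```

To reflect the container rather than the host, the visibility check would have to run inside the `neuronx-tgi` image.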

Expected behavior

I expect the command to compile the model successfully.

Labels

bug: Something isn't working