Closed
Labels
bug (Something isn't working)
Description
System Info
!docker run -p 8080:80 \
-v $(pwd):/data \
--device=/dev/neuron0 \
--device=/dev/neuron1 \
--device=/dev/neuron2 \
--device=/dev/neuron3 \
--device=/dev/neuron4 \
--device=/dev/neuron5 \
-ti \
--entrypoint "optimum-cli" neuronx-tgi:latest \
env
Platform:
- Platform: Linux-5.15.0-1056-aws-x86_64-with-glibc2.35
- Python version: 3.10.12
Python packages:
- `optimum-neuron` version: 0.0.25.dev0
- `neuron-sdk` version: 2.20.0
- `optimum` version: 1.21.4
- `transformers` version: 4.43.2
- `huggingface_hub` version: 0.25.0
- `torch` version: 2.1.2+cu121
- `aws-neuronx-runtime-discovery` version: 2.9
- `libneuronxla` version: 2.0.4115.0
- `neuronx-cc` version: 2.15.128.0+56dc5a86
- `neuronx-distributed` version: NA
- `neuronx-hwm` version: NA
- `torch-neuronx` version: 2.1.2.2.3.0
- `torch-xla` version: 2.1.4
- `transformers-neuronx` version: 0.12.313
Neuron Driver:
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
aws-neuronx-collectives/now 2.22.26.0-17a033bc8 amd64 [installed,local]
aws-neuronx-dkms/now 2.18.12.0 amd64 [installed,local]
aws-neuronx-runtime-lib/now 2.22.14.0-6e27b8d5b amd64 [installed,local]
aws-neuronx-tools/now 2.19.0.0 amd64 [installed,local]
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction (minimal, reproducible, runnable)
When I use the newly merged TGI image to compile a model with optimum-cli, the compilation fails with the error shown under "Error:".
I haven't been able to test this outside the TGI image because I am having trouble upgrading my own image to 2.20.
!git clone https://github.com/huggingface/optimum-neuron.git && cd optimum-neuron && make neuronx-tgi
REPOSITORY TAG IMAGE ID CREATED SIZE
neuronx-tgi 0.0.25.dev0 165727a580ea 14 hours ago 11.6GB
neuronx-tgi latest 165727a580ea 14 hours ago 11.6GB
nginx alpine c7b4f26a7d93 5 weeks ago 43.2MB
!docker run -p 8080:80 \
-v $(pwd):/data \
--device=/dev/neuron0 \
--device=/dev/neuron1 \
--device=/dev/neuron2 \
--device=/dev/neuron3 \
--device=/dev/neuron4 \
--device=/dev/neuron5 \
-ti \
--entrypoint "optimum-cli" neuronx-tgi:latest \
export neuron --model NousResearch/Llama-2-7b-chat-hf \
--sequence_length 4096 \
--batch_size 4 \
--num_cores 8 \
/data/exportedmodel/
Error:
Downloading shards: 100%|█████████████████████████| 2/2 [00:11<00:00, 5.81s/it]
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:00<00:00, 7.93it/s]
generation_config.json: 100%|██████████████████| 200/200 [00:00<00:00, 2.17MB/s]
2024-09-20 13:33:14.000539: 136 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
warnings.warn(SyntaxWarning(
2024-09-20 13:33:37.000915: 766 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py:198: SyntaxWarning: str format compiler_flags is discouraged as its handling involves repeated joining and splitting, which can easily make mistakes if something is quoted or escaped. Use list[str] instead. Refer to documentation of the Python subprocess module for details.
warnings.warn(SyntaxWarning(
2024-09-20 13:33:37.000945: 767 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
neuronxcc-2.15.128.0+56dc5a86/MODULE_b65d76e18d6cf6a5fd70+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_b65d76e18d6cf6a5fd70+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_b65d76e18d6cf6a5fd70+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-20 13:33:38.000478: 766 INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
neuronxcc-2.15.128.0+56dc5a86/MODULE_4bcb2a4fdd83da295490+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_4bcb2a4fdd83da295490+39f12043/model.done not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
neuronxcc-2.15.128.0+56dc5a86/MODULE_4bcb2a4fdd83da295490+39f12043/model.log not found in aws-neuron/optimum-neuron-cache: the corresponding graph will be recompiled. This may take up to one hour for large models.
2024-09-20 13:33:38.000848: 767 INFO ||NEURON_CC_WRAPPER||: Call compiler with cmd: neuronx-cc compile --target=trn1 --framework=XLA /tmp/no-user/neuroncc_compile_workdir/3d7a7a87-e699-48ea-9d8c-4efe21bcfd19/model.MODULE_4bcb2a4fdd83da295490+39f12043.hlo_module.pb --output /tmp/no-user/neuroncc_compile_workdir/3d7a7a87-e699-48ea-9d8c-4efe21bcfd19/model.MODULE_4bcb2a4fdd83da295490+39f12043.neff --model-type=transformer --auto-cast=none --execute-repetition=1 --verbose=35
................
2024-09-20 13:36:05.000005: 766 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-20T13:35:59Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory
2024-09-20 13:36:05.000005: 766 ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.hlo_module.pb after 0 retries.
2024-09-20 13:36:05.000006: 766 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
...
2024-09-20 13:37:22.000177: 767 ERROR ||NEURON_CC_WRAPPER||: Failed compilation with ['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/3d7a7a87-e699-48ea-9d8c-4efe21bcfd19/model.MODULE_4bcb2a4fdd83da295490+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/3d7a7a87-e699-48ea-9d8c-4efe21bcfd19/model.MODULE_4bcb2a4fdd83da295490+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']: 2024-09-20T13:37:16Z /usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: error while loading shared libraries: libxml2.so.2: cannot open shared object file: No such file or directory
2024-09-20 13:37:22.000177: 767 ERROR ||NEURON_CC_WRAPPER||: Compilation failed for /tmp/no-user/neuroncc_compile_workdir/3d7a7a87-e699-48ea-9d8c-4efe21bcfd19/model.MODULE_4bcb2a4fdd83da295490+39f12043.hlo_module.pb after 0 retries.
2024-09-20 13:37:22.000178: 767 INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.10/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/compiler.py", line 500, in compile
self.build(num_exec_repetition)
File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/compiler.py", line 507, in build
self.neff_bytes = compile_hlo_module(self.hlo_module, self.tag, num_exec_repetition)
File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/compiler.py", line 144, in compile_hlo_module
neff_bytes = neuron_xla_compile(module_bytes, flags, input_format="hlo", platform_target="trn1",
File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 210, in neuron_xla_compile
neuron_xla_compile_impl(
File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 269, in neuron_xla_compile_impl
return compile_cache_entry(output, entry, execution_mode,
File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 186, in compile_cache_entry
raise (e)
File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 165, in compile_cache_entry
ret = call_neuron_compiler(
File "/usr/local/lib/python3.10/dist-packages/libneuronxla/neuron_cc_wrapper.py", line 109, in call_neuron_compiler
raise subprocess.CalledProcessError(res.returncode, cmd, stderr=error_info)
subprocess.CalledProcessError: Command '['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']' returned non-zero exit status 70.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/neuron/__main__.py", line 737, in <module>
main()
File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/neuron/__main__.py", line 690, in main
decoder_export(
File "/usr/local/lib/python3.10/dist-packages/optimum/exporters/neuron/__main__.py", line 655, in decoder_export
model = NeuronModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/optimum/modeling_base.py", line 420, in from_pretrained
return from_pretrained_method(
File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling_decoder.py", line 331, in _from_transformers
return cls._export(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling_decoder.py", line 382, in _export
return cls(new_config, checkpoint_dir, generation_config=generation_config)
File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling.py", line 1254, in __init__
super().__init__(config, checkpoint_dir, compiled_dir=compiled_dir, generation_config=generation_config)
File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/utils/require_utils.py", line 51, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/optimum/neuron/modeling_decoder.py", line 215, in __init__
neuronx_model.to_neuron()
File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/base.py", line 85, in to_neuron
self.compile()
File "/usr/local/lib/python3.10/dist-packages/transformers_neuronx/base.py", line 64, in compile
kernel.neff_bytes = neff_bytes_futures[hash_hlo(kernel.hlo_module)].result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
subprocess.CalledProcessError: Command '['neuronx-cc', 'compile', '--target=trn1', '--framework=XLA', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.hlo_module.pb', '--output', '/tmp/no-user/neuroncc_compile_workdir/204e2c1a-846c-4f4d-b20a-6ae1d1f46bef/model.MODULE_b65d76e18d6cf6a5fd70+39f12043.neff', '--model-type=transformer', '--auto-cast=none', '--execute-repetition=1', '--verbose=35']' returned non-zero exit status 70.
Traceback (most recent call last):
File "/usr/local/bin/optimum-cli", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/optimum/commands/optimum_cli.py", line 208, in main
service.run()
File "/usr/local/lib/python3.10/dist-packages/optimum/commands/export/neuronx.py", line 298, in run
subprocess.run(full_command, shell=True, check=True)
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m optimum.exporters.neuron --model NousResearch/Llama-2-7b-chat-hf --sequence_length 4096 --batch_size 4 --num_cores 8 /data/exportedmodel/' returned non-zero exit status 1.
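The tracebacks are long, but the root cause appears to be the single loader message from `walrus_driver` about `libxml2.so.2`. As a rough sketch (not part of optimum-neuron or the Neuron SDK), a small helper like the one below could pull such messages out of a captured log. The function name `missing_shared_libraries` and the `sample` string are illustrative assumptions; the regex only targets the loader wording seen in this log.

```python
import re

# Wording emitted by the dynamic loader when a shared object cannot be found,
# as seen in the walrus_driver error in this log.
MISSING_LIB = re.compile(
    r"error while loading shared libraries: (?P<lib>\S+): "
    r"cannot open shared object file"
)


def missing_shared_libraries(log_text: str) -> list[str]:
    """Return the distinct library names reported missing, in first-seen order."""
    seen: dict[str, None] = {}
    for match in MISSING_LIB.finditer(log_text):
        seen.setdefault(match.group("lib"), None)
    return list(seen)


# Hypothetical sample: one line copied from the error output in this report.
sample = (
    "/usr/local/lib/python3.10/dist-packages/neuronxcc/starfish/bin/walrus_driver: "
    "error while loading shared libraries: libxml2.so.2: "
    "cannot open shared object file: No such file or directory"
)
print(missing_shared_libraries(sample))
```

Running this over the full log would list each missing library once, even though the same failure is reported by both compiler worker processes (766 and 767).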
Expected behavior
I expect the command to compile the model successfully.
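One way to check whether a given image can even reach a successful compile might be to test, inside the container, whether the dynamic loader can open `libxml2.so.2`. The snippet below is a minimal sketch using only the Python standard library; the helper name `can_load` is an assumption for illustration, and whether adding the library to the image resolves the failure has not been verified here.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the given shared object."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library is not found or cannot be loaded.
        return False
    return True


# The library name is taken from the loader error in this report.
print("libxml2.so.2 loadable:", can_load("libxml2.so.2"))
```

If this prints `False` when run with the image's Python, it would be consistent with the `walrus_driver` error in the log.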