AttributeError: can't set attribute 'deepspeed_plugin' #735

anushka0415 · 2024-11-14T12:38:29Z

System Info

accelerate                    1.1.1
neuronx-cc                    2.14.227.0+2d4f85be
neuronx-distributed           0.8.0
neuronx-distributed-training  1.0.0
optimum                       1.22.0
optimum-neuron                0.0.25
torch                         2.1.2
torch-neuronx                 2.1.2.2.3.1
torch-xla                     2.1.4
torchvision                   0.16.2
triton                        2.1.0
trl                           0.12.1

Who can help?

@michaelbenayoun @JingyaHuang

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

set -ex

export NEURON_FUSE_SOFTMAX=1
export NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=3
export MALLOC_ARENA_MAX=64
export NEURON_CC_FLAGS="--model-type=transformer --distribution-strategy=llm-training --enable-saturate-infinity --cach>
PROCESSES_PER_NODE=2

NUM_EPOCHS=1
TP_DEGREE=2
PP_DEGREE=1

BS=1
GRADIENT_ACCUMULATION_STEPS=8
LOGGING_STEPS=1
MODEL_NAME="meta-llama/Meta-Llama-3-8B"
OUTPUT_DIR=output-$SLURM_JOB_ID

if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))
else
    MAX_STEPS=-1
fi

XLA_USE_BF16=1 neuron_parallel_compile torchrun --nproc_per_node $PROCESSES_PER_NODE train.py \
    --model_id $MODEL_NAME \
    --num_train_epochs $NUM_EPOCHS \
    --do_train \
    --learning_rate 5e-5 \
    --warmup_ratio 0.03 \
    --max_steps $MAX_STEPS \
    --per_device_train_batch_size $BS \
    --per_device_eval_batch_size $BS \
    --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
    --gradient_checkpointing true \
    --bf16 \
    --zero_1 false \
    --tensor_parallel_size $TP_DEGREE \
    --pipeline_parallel_size $PP_DEGREE \
    --logging_steps $LOGGING_STEPS \
    --save_total_limit 1 \
    --output_dir $OUTPUT_DIR \
    --lr_scheduler_type "constant" \
    --overwrite_output_dir

Expected behavior

Compilation should pass.

anushka0415 · 2024-11-14T12:39:25Z

Traceback (most recent call last):
  File "/home/ubuntu/bobble-poc/train_example/train.py", line 112, in <module>
    main()
  File "/home/ubuntu/bobble-poc/train_example/train.py", line 108, in main
    training_function(script_args, training_args)
  File "/home/ubuntu/bobble-poc/train_example/train.py", line 76, in training_function
    trainer = NeuronSFTTrainer(
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 1753, in __init__
    super().__init__(
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 179, in __init__
    super().__init__(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 1514, in __init__
    return Trainer.__init__(self, *args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 165, in wrapped_func
    return func(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/transformers/trainer.py", line 430, in __init__
    self.create_accelerator_and_postprocess()
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 279, in create_accelerator_and_postprocess
    self.accelerator = NeuronAccelerator(
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/accelerate/accelerator.py", line 153, in __init__
    super().__init__(**full_kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/accelerate/accelerator.py", line 415, in __init__
    self.state = AcceleratorState(
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/accelerate/state.py", line 151, in __init__
    self.deepspeed_plugin = None
AttributeError: can't set attribute 'deepspeed_plugin'

vedant123454 · 2024-11-14T13:04:01Z

Issue: Incorrect Variable Name in `state.py`

In the file state.py, at line 151, the code currently sets:

self.deepspeed_plugin = None

This should be corrected to:

self.deepspeed_plugins = None

Make this change in the repo and build it from source.
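For readers wondering why the assignment fails at all: Python raises this kind of AttributeError when code assigns to a name that is defined as a property without a setter. The sketch below reproduces that mechanism in isolation. It assumes the installed accelerate exposes `deepspeed_plugin` as a read-only property on the parent class, which is a plausible reading of the traceback but is not confirmed in this thread. `BaseState`, `ChildState`, and `try_build` are hypothetical names, not classes from accelerate or optimum-neuron.

```python
class BaseState:
    """Hypothetical stand-in for a parent class with a read-only property."""

    def __init__(self):
        # Plain instance attribute: assignment works as usual.
        self.deepspeed_plugins = None

    @property
    def deepspeed_plugin(self):
        # Property with no setter: reading works, assigning raises.
        return self.deepspeed_plugins


class ChildState(BaseState):
    """Hypothetical subclass that still assigns the property name."""

    def __init__(self):
        super().__init__()
        # Fails because the parent defines deepspeed_plugin as a property.
        self.deepspeed_plugin = None


def try_build():
    """Return the error message if construction fails, else None."""
    try:
        ChildState()
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # The exact wording of the message differs between Python versions.
    print(try_build())
```

Under that assumption, assigning to the plain `deepspeed_plugins` attribute instead avoids the error, which is consistent with the change suggested above.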

michaelbenayoun · 2024-11-19T14:27:51Z

@vedant123454's solution might work.

As accelerate is a fast-moving library, and we extend it quite a bit in optimum-neuron to make everything work, we bump the supported version with every release. Right now, the officially supported version of accelerate is 0.29.2, but 1.1.1 is installed on your system.
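A quick way to spot this kind of mismatch before launching a long compile is to compare the installed accelerate version with the supported one. This is only a minimal sketch: the value 0.29.2 is taken from the comment above and will change between optimum-neuron releases, and `check_accelerate` is a hypothetical helper, not part of either library.

```python
from importlib.metadata import PackageNotFoundError, version

# Supported version as quoted in this thread; an assumption for any other release.
SUPPORTED_ACCELERATE = "0.29.2"


def check_accelerate(supported=SUPPORTED_ACCELERATE):
    """Return (installed_version_or_None, matches_supported)."""
    try:
        installed = version("accelerate")
    except PackageNotFoundError:
        # accelerate is not installed in this environment.
        return None, False
    return installed, installed == supported


if __name__ == "__main__":
    installed, ok = check_accelerate()
    if not ok:
        print(f"accelerate {installed} installed, expected {SUPPORTED_ACCELERATE}")
```

If the versions differ, reinstalling the supported accelerate version in the same environment is likely the less invasive fix compared with patching `state.py`.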

anushka0415 added the bug Something isn't working label Nov 14, 2024
