AttributeError: can't set attribute 'deepspeed_plugin' #735

Open
2 of 4 tasks
anushka0415 opened this issue Nov 14, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@anushka0415

System Info

accelerate                    1.1.1
neuronx-cc                    2.14.227.0+2d4f85be
neuronx-distributed           0.8.0
neuronx-distributed-training  1.0.0
optimum                       1.22.0
optimum-neuron                0.0.25
torch                         2.1.2
torch-neuronx                 2.1.2.2.3.1
torch-xla                     2.1.4
torchvision                   0.16.2
triton                        2.1.0
trl                           0.12.1

Who can help?

@michaelbenayoun @JingyaHuang

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

set -ex

export NEURON_FUSE_SOFTMAX=1
export NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=3
export MALLOC_ARENA_MAX=64
export NEURON_CC_FLAGS="--model-type=transformer --distribution-strategy=llm-training --enable-saturate-infinity --cach>
PROCESSES_PER_NODE=2

NUM_EPOCHS=1
TP_DEGREE=2
PP_DEGREE=1

BS=1
GRADIENT_ACCUMULATION_STEPS=8
LOGGING_STEPS=1
MODEL_NAME="meta-llama/Meta-Llama-3-8B"
OUTPUT_DIR=output-$SLURM_JOB_ID

if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
    MAX_STEPS=$((LOGGING_STEPS + 5))
else
    MAX_STEPS=-1
fi

XLA_USE_BF16=1 neuron_parallel_compile torchrun --nproc_per_node $PROCESSES_PER_NODE train.py \
    --model_id $MODEL_NAME \
    --num_train_epochs $NUM_EPOCHS \
    --do_train \
    --learning_rate 5e-5 \
    --warmup_ratio 0.03 \
    --max_steps $MAX_STEPS \
    --per_device_train_batch_size $BS \
    --per_device_eval_batch_size $BS \
    --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
    --gradient_checkpointing true \
    --bf16 \
    --zero_1 false \
    --tensor_parallel_size $TP_DEGREE \
    --pipeline_parallel_size $PP_DEGREE \
    --logging_steps $LOGGING_STEPS \
    --save_total_limit 1 \
    --output_dir $OUTPUT_DIR \
    --lr_scheduler_type "constant" \
    --overwrite_output_dir

Expected behavior

Compilation should complete without errors.

@anushka0415 anushka0415 added the bug Something isn't working label Nov 14, 2024
@anushka0415
Author

Traceback (most recent call last):
  File "/home/ubuntu/bobble-poc/train_example/train.py", line 112, in <module>
    main()
  File "/home/ubuntu/bobble-poc/train_example/train.py", line 108, in main
    training_function(script_args, training_args)
  File "/home/ubuntu/bobble-poc/train_example/train.py", line 76, in training_function
    trainer = NeuronSFTTrainer(
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 1753, in __init__
    super().__init__(
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 179, in __init__
    super().__init__(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 1514, in __init__
    return Trainer.__init__(self, *args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/transformers/utils/deprecation.py", line 165, in wrapped_func
    return func(*args, **kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/transformers/trainer.py", line 430, in __init__
    self.create_accelerator_and_postprocess()
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/trainers.py", line 279, in create_accelerator_and_postprocess
    self.accelerator = NeuronAccelerator(
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/accelerate/accelerator.py", line 153, in __init__
    super().__init__(**full_kwargs)
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/accelerate/accelerator.py", line 415, in __init__
    self.state = AcceleratorState(
  File "/opt/aws_neuronx_venv_pytorch_2_1/lib/python3.10/site-packages/optimum/neuron/accelerate/state.py", line 151, in __init__
    self.deepspeed_plugin = None
AttributeError: can't set attribute 'deepspeed_plugin'

@vedant123454

Issue: Incorrect Variable Name in state.py

In the file state.py, at line 151, the code currently sets:

self.deepspeed_plugin = None

This should be corrected to:

self.deepspeed_plugins = None

Make this change in the repo and build it from source.

@michaelbenayoun
Member

@vedant123454's solution might work.

As accelerate is a fast-moving library, and we extend it quite a bit in optimum-neuron to make everything work, we bump the supported version with every release. Right now, the officially supported version of accelerate is 0.29.2, but 1.1.1 is installed on your system.
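
A quick way to spot this kind of mismatch before launching a job is to compare the installed version with the supported one. The following is a minimal sketch using only the standard library; the helper names (`parse_version`, `matches_supported`, `installed_version`) are hypothetical, and the value 0.29.2 is taken from the comment above, so it would need updating for other optimum-neuron releases.

```python
from importlib import metadata

# Supported accelerate version as quoted in this thread (an assumption for
# any other optimum-neuron release).
SUPPORTED_ACCELERATE = "0.29.2"


def parse_version(text):
    """Turn '1.1.1' into (1, 1, 1); stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def matches_supported(installed, supported=SUPPORTED_ACCELERATE):
    """True when the installed version equals the supported one."""
    return parse_version(installed) == parse_version(supported)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("accelerate")
if found is None:
    print("accelerate is not installed")
elif matches_supported(found):
    print(f"accelerate {found} matches the supported version")
else:
    print(f"accelerate {found} differs from supported {SUPPORTED_ACCELERATE}")
```

When the versions differ, installing the version that the optimum-neuron release declares in its own requirements is likely the safer route than patching state.py by hand.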
