AttributeError: can't set attribute 'deepspeed_plugin' #735
Comments
(Comment thread truncated in this capture: it includes a posted traceback, a comment titled "Issue: Incorrect Variable Name in …", and a reply noting that @vedant123454's solution might work.)
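For context, this AttributeError is what Python raises (in 3.10's message format) when code assigns to a property that defines no setter. A minimal sketch of the failure mode — FakeTrainingArguments is a hypothetical stand-in, not the actual transformers/optimum-neuron class:

class FakeTrainingArguments:
    @property
    def deepspeed_plugin(self):
        # Read-only: no @deepspeed_plugin.setter is defined.
        return None

args = FakeTrainingArguments()
args.deepspeed_plugin = None  # AttributeError: can't set attribute 'deepspeed_plugin'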
System Info
Who can help?
@michaelbenayoun @JingyaHuang
Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction (minimal, reproducible, runnable)
set -ex
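# Compilation/runtime tuning (one-line descriptions are approximate, based on
# the AWS Neuron documentation for these variables):
# - NEURON_FUSE_SOFTMAX=1 enables softmax fusion in the compiled graph.
# - NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=3 caps in-flight asynchronous runtime requests.
# - MALLOC_ARENA_MAX=64 limits glibc malloc arenas to reduce host memory overhead.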
export NEURON_FUSE_SOFTMAX=1
export NEURON_RT_ASYNC_EXEC_MAX_INFLIGHT_REQUESTS=3
export MALLOC_ARENA_MAX=64
export NEURON_CC_FLAGS="--model-type=transformer --distribution-strategy=llm-training --enable-saturate-infinity --cach>
PROCESSES_PER_NODE=2
NUM_EPOCHS=1
TP_DEGREE=2
PP_DEGREE=1
BS=1
GRADIENT_ACCUMULATION_STEPS=8
LOGGING_STEPS=1
MODEL_NAME="meta-llama/Meta-Llama-3-8B"
OUTPUT_DIR=output-$SLURM_JOB_ID
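# Note: neuron_parallel_compile runs the script with NEURON_EXTRACT_GRAPHS_ONLY=1
# to extract graphs for ahead-of-time compilation, so only a few steps beyond the
# first logging step are needed; otherwise max_steps=-1 defers to NUM_EPOCHS.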
if [ "$NEURON_EXTRACT_GRAPHS_ONLY" = "1" ]; then
MAX_STEPS=$((LOGGING_STEPS + 5))
else
MAX_STEPS=-1
fi
XLA_USE_BF16=1 neuron_parallel_compile torchrun --nproc_per_node $PROCESSES_PER_NODE train.py \
  --model_id $MODEL_NAME \
  --num_train_epochs $NUM_EPOCHS \
  --do_train \
  --learning_rate 5e-5 \
  --warmup_ratio 0.03 \
  --max_steps $MAX_STEPS \
  --per_device_train_batch_size $BS \
  --per_device_eval_batch_size $BS \
  --gradient_accumulation_steps $GRADIENT_ACCUMULATION_STEPS \
  --gradient_checkpointing true \
  --bf16 \
  --zero_1 false \
  --tensor_parallel_size $TP_DEGREE \
  --pipeline_parallel_size $PP_DEGREE \
  --logging_steps $LOGGING_STEPS \
  --save_total_limit 1 \
  --output_dir $OUTPUT_DIR \
  --lr_scheduler_type "constant" \
  --overwrite_output_dir
Expected behavior
The neuron_parallel_compile precompilation run should complete without raising an error.
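For reference, the usual Neuron workflow once precompilation passes is to rerun the same command without the neuron_parallel_compile wrapper, so the actual training run reuses the populated compiler cache (the line below simply drops the wrapper from the script above):

XLA_USE_BF16=1 torchrun --nproc_per_node $PROCESSES_PER_NODE train.py ... # same arguments as above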