When trying to load a model after training that was consolidated with `optimum-cli neuron consolidate dolly_llama/tensor_parallel_shards dolly_llama`, you get a tensor size mismatch error.
I tried to fine-tune Llama on the Dolly dataset; the training succeeds with TP=8. Afterwards I consolidated the weights and tried to load the model to make sure it learned the Dolly format. For this I tried to load the model with the NeuronModelForCausalLM and AutoModelForCausalLM classes, e.g. as in the sketch below.
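A minimal sketch of the loading call (the `dolly_llama` path matches the consolidate command above; the exact arguments in the original attempt may have differed):

```python
from transformers import AutoModelForCausalLM
# from optimum.neuron import NeuronModelForCausalLM  # fails the same way

# Load the consolidated checkpoint produced by `optimum-cli neuron consolidate`
model = AutoModelForCausalLM.from_pretrained("dolly_llama")
```

This leads to the error: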
Error(s) in loading state_dict for LlamaForCausalLM:
    size mismatch for model.embed_tokens.weight: copying a param with shape torch.Size([32000, 4096]) from checkpoint, the shape in current model is torch.Size([4000, 4096]).
    size mismatch for lm_head.weight: copying a param with shape torch.Size([32000, 4096]) from checkpoint, the shape in current model is torch.Size([4000, 4096]).
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
This suggests there is some error when consolidating the weights from the sharded ones, since 32000 / 8 = 4000.
I was able to reproduce this. The vocab_size in the fine-tuned model's config.json is incorrectly set to 4000 and is not adjusted during consolidation. I manually changed this to 32000 and was able to load the checkpoint.
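A minimal sketch of that manual workaround, assuming the consolidated checkpoint lives in `dolly_llama/`:

```python
import json

# Workaround: patch the consolidated checkpoint's config.json so that
# vocab_size reflects the full (unsharded) vocabulary again.
config_path = "dolly_llama/config.json"

with open(config_path) as f:
    config = json.load(f)

config["vocab_size"] = 32000  # was 4000, i.e. 32000 / TP degree of 8

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```

After this change the checkpoint loads normally, which points at the consolidation step not restoring vocab_size rather than at the weights themselves being wrong.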