Zero Loss: The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values #58

Description

@akjindal53244

Hi GaLore Team, congratulations on the interesting work!

I am trying to fine-tune the Llama-3 8B model using GaLore, but I am getting this error:
torch._C._LinAlgError: linalg.svd: The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values.

  • Interestingly, the loss for the first batch is non-zero, and all subsequent losses are zero before training is automatically terminated.

    Full Error Log

    Activated GaLoRE fine-tuning, depending on your model size and hardware, the training might take a while before starting. Please be patient !
    model.layers.0.self_attn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
    model.layers.0.self_attn.rotary_emb has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
    model.layers.0.mlp has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
    model.layers.0.mlp.act_fn has been matched but ignored as GaLore only supports linear layers. Please double check your `optim_target_modules`!
    [... the same four warnings repeat for model.layers.1 through model.layers.31 ...]
      0%| | 0/6719320 [00:00<?, ?it/s]
    You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
      0%| | 1/6719320 [06:15<701609:55:03, 375.90s/it]
    [2024-07-23 07:19:54,094] [INFO] [axolotl.callbacks.on_step_end:128] [PID:148509] [RANK:0] GPU memory usage while training: 17.607GB (+15.215GB cache, +1.482GB misc)
    {'loss': 1.694, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}
    {'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}
    {'loss': 0.0, 'grad_norm': 0.0, 'learning_rate': 0.001, 'epoch': 0.0}
    [... the zero-loss line repeats at every logging step through step 200 ...]
      0%| | 200/6719320 [08:51<1455:09:36, 1.28it/s]
    /home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/galore_torch/galore_projector.py:83: UserWarning: torch.linalg.svd: During SVD computation with the selected cusolver driver, batches 0 failed to converge. A more accurate method will be used to compute the SVD as a fallback. Check doc at https://pytorch.org/docs/stable/generated/torch.linalg.svd.html (Triggered internally at ../aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp:697.)
      U, s, Vh = torch.linalg.svd(matrix, full_matrices = False)
    Traceback (most recent call last):
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/home/minimalist/work/projects/sota/axolotl/src/axolotl/cli/train.py", line 72, in <module>
        fire.Fire(do_cli)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
        component_trace = _Fire(component, args, parsed_flag_args, context, name)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
        component, remaining_args = _CallAndUpdateTrace(
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
        component = fn(*varargs, **kwargs)
      File "/home/minimalist/work/projects/sota/axolotl/src/axolotl/cli/train.py", line 39, in do_cli
        return do_train(parsed_cfg, parsed_cli_args)
      File "/home/minimalist/work/projects/sota/axolotl/src/axolotl/cli/train.py", line 67, in do_train
        return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
      File "/home/minimalist/work/projects/sota/axolotl/src/axolotl/train.py", line 191, in train
        trainer.train(resume_from_checkpoint=resume_from_checkpoint)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/transformers/trainer.py", line 1932, in train
        return inner_training_loop(
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/transformers/trainer.py", line 2268, in _inner_training_loop
        tr_loss_step = self.training_step(model, inputs)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/transformers/trainer.py", line 3324, in training_step
        self.accelerator.backward(loss, **kwargs)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/accelerate/accelerator.py", line 2151, in backward
        loss.backward(**kwargs)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/_tensor.py", line 525, in backward
        torch.autograd.backward(
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
        _engine_run_backward(
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
        return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/transformers/trainer.py", line 1398, in optimizer_hook
        optimizer_dict[param].step()
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 75, in wrapper
        return wrapped(*args, **kwargs)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/optim/optimizer.py", line 391, in wrapper
        out = func(*args, **kwargs)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/galore_torch/adamw8bit.py", line 58, in step
        grad = state["projector"].project(p.grad, state["step"])
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/galore_torch/galore_projector.py", line 21, in project
        self.ortho_matrix = self.get_orthogonal_matrix(full_rank_grad, self.rank, type='left')
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/galore_torch/galore_projector.py", line 83, in get_orthogonal_matrix
        U, s, Vh = torch.linalg.svd(matrix, full_matrices = False)
    torch._C._LinAlgError: linalg.svd: The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated singular values (error code: 1023).
      0%| | 200/6719320 [13:19<7465:36:33, 4.00s/it]
    Traceback (most recent call last):
      File "/home/minimalist/miniconda3/envs/comps/bin/accelerate", line 8, in <module>
        sys.exit(main())
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
        args.func(args)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
        simple_launcher(args)
      File "/home/minimalist/miniconda3/envs/comps/lib/python3.10/site-packages/accelerate/commands/launch.py", line 703, in simple_launcher
        raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
    subprocess.CalledProcessError: Command '['/home/minimalist/miniconda3/envs/comps/bin/python', '-m', 'axolotl.cli.train', 'examples/llama-3/qlora.yml']' returned non-zero exit status 1.
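
Two observations on the log above. The zero losses (with grad_norm 0.0) begin at the second step, long before the SVD error at step 200, so the crash looks like a downstream symptom: once the gradients collapse, the matrix handed to torch.linalg.svd is degenerate and cusolver fails to converge. For anyone who just needs to get past the hard crash, the failing call is the SVD inside galore_projector.get_orthogonal_matrix (line 83 in the traceback). Below is a minimal sketch of a guard that upcasts to float32 and retries on CPU when the GPU solver fails; robust_svd is a hypothetical helper name of mine, not part of galore_torch:

import torch

def robust_svd(matrix: torch.Tensor):
    # GaLore runs torch.linalg.svd directly on the (bf16) gradient;
    # cusolver is much less robust in low precision, so upcast first.
    mat32 = matrix.to(torch.float32)
    try:
        U, s, Vh = torch.linalg.svd(mat32, full_matrices=False)
    except torch.linalg.LinAlgError:
        # If the GPU solver still fails to converge, retry on CPU,
        # where LAPACK's SVD is slower but more forgiving.
        U, s, Vh = torch.linalg.svd(mat32.cpu(), full_matrices=False)
        U, s, Vh = (t.to(matrix.device) for t in (U, s, Vh))
    # Return in the original dtype so the projector code is unaffected.
    return U.to(matrix.dtype), s.to(matrix.dtype), Vh.to(matrix.dtype)

Even with a guard like this in place of the call at galore_projector.py:83, it would not explain the zero losses, which start well before the SVD failure.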
    

    Hyperparams

base_model: meta-llama/Meta-Llama-3-8B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

datasets:
  - path: <dataset>
    type: sharegpt
    conversation: llama-3
    field_human: human
    field_model: gpt

dataset_prepared_path:
val_set_size: 0.01
output_dir: ./outputs/galore-out

sequence_len: 2048
sample_packing: false
eval_sample_packing: true
pad_to_sequence_len: true

gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 4
optimizer: galore_adamw_8bit_layerwise
lr_scheduler: cosine
learning_rate: 0.000001

optim_target_modules:
  - self_attn
  - mlp

train_on_inputs: false
group_by_length: false
bf16: true
tf32: false

bfloat16: true

logging_steps: 4
flash_attention: true
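
A side note on the warnings at the top of the log: matching `self_attn` and `mlp` also catches the container modules and their `rotary_emb`/`act_fn` children, which GaLore skips with a warning, while the nn.Linear sublayers (whose names contain the same substrings) should still be picked up. If the warnings are unwanted, one option, assuming the substring matching that the log suggests, is to list the Llama projection layers explicitly:

optim_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
  - gate_proj
  - up_proj
  - down_proj

This should target the same linear weights while avoiding the non-linear matches.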
