Checking for overtraining? #30
Sometimes the output looks too good, but "Ctrl+F" some parts and you should see that it is not copied from the dataset. If that is not the case, for instance if you have trained your model for too many iterations relative to the size of the fine-tuning dataset, I suggest that you simply increase the temperature:

temperature=1 : Float value controlling randomness in Boltzmann distribution. Lower temperature results in less random completions. As the temperature approaches zero, the model will become deterministic and repetitive. Higher temperature results in more random completions.
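For illustration only (this is not the sampling code from the repo, and the function name and use of NumPy are my own assumptions), a minimal sketch of how the temperature parameter reshapes the softmax/Boltzmann distribution over next tokens:

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0):
    """Sample a token index from logits scaled by temperature.

    Lower temperature sharpens the distribution (more deterministic and
    repetitive, more likely to echo memorized text); higher temperature
    flattens it (more random completions).
    """
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    # Numerically stable softmax: the "Boltzmann distribution" over tokens.
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

# Example: the same logits sampled at low vs. high temperature.
logits = [2.0, 1.0, 0.2]
print(sample_with_temperature(logits, temperature=0.2))  # almost always index 0
print(sample_with_temperature(logits, temperature=1.5))  # noticeably more varied
```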
I downloaded the training corpus for 345M (approx. 765 MB), output-dataset_v1_medium-345M-k40.train.jsonl, and searched it, but I'm not seeing some of the vocabulary popping up in my generated text...
Sorry, I thought you were talking about your own dataset, the one you wanted to fine-tune the model on. As mentioned in this blog post, the whole training dataset was not released. You only have 250k samples in this repository, instead of the full 8M.
@david-sg I'm getting really impressive results too... and searching the fine-tuning dataset doesn't bring up anything similar. It's amazing how it blends the new material with the original model.
Is there an a priori method to know how many steps you should train for when fine-tuning on your dataset? Want to know if there are any good heuristics out there. Thank you.
I found it was over-training at a loss below 0.10. Interestingly, it was pulling sentences from the main corpus, not the fine-tuning corpus. Increasing the temperature did not seem to help.
the avg. loss? |
yes |
Some of the text that is being generated seems too good... I'm wondering if I might have overtrained the model.
I can quickly check my own dataset manually.
Any suggestions for checking against the original dataset?
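One simple check is to scan the released .jsonl corpus for word n-grams that also appear in your generated samples. A hedged sketch, assuming the corpus is JSON Lines with a "text" field and that your samples are in a plain text file; the paths, helper name, and n-gram length are hypothetical, and a full pass over the 765 MB file will be slow:

```python
import json

def overlapping_ngrams(generated_path, dataset_path, n=8):
    """Return word n-grams from the generated samples that occur verbatim
    in the released output dataset (.jsonl, one JSON object per line)."""
    with open(generated_path, encoding="utf-8") as f:
        words = f.read().split()
    ngrams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    hits = set()
    with open(dataset_path, encoding="utf-8") as f:
        for line in f:
            text = json.loads(line).get("text", "")
            hits.update(gram for gram in ngrams if gram in text)
    return hits

# Hypothetical usage:
# matches = overlapping_ngrams("samples.txt",
#                              "output-dataset_v1_medium-345M-k40.train.jsonl")
# print(f"{len(matches)} generated 8-grams found verbatim in the corpus")
```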