
Checking for overtraining? #30

Open

david-sg opened this issue May 7, 2019 · 8 comments

Comments

@david-sg

david-sg commented May 7, 2019

Some of the text being generated seems too good... I'm wondering if I might have overtrained the model.
I can quickly check my own dataset manually.

Any suggestions for checking the original dataset?

@woctezuma
Contributor

woctezuma commented May 7, 2019

Sometimes the output looks too good, but if you Ctrl+F some parts of it, you should see that it is not copied from the dataset. If it has been copied, for instance because you trained your model for too many iterations relative to the size of the fine-tuning dataset, I suggest that you simply increase the temperature.

    temperature=1 : Float value controlling randomness in the Boltzmann
     distribution. Lower temperature results in less random completions. As the
     temperature approaches zero, the model will become deterministic and
     repetitive. Higher temperature results in more random completions.
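
For context, here is a minimal sketch (not this repository's code) of what the temperature does: the logits are divided by the temperature before the softmax, so a low temperature concentrates probability on the top token and a high temperature flattens the distribution toward more random sampling.

    # Hedged sketch of temperature-scaled sampling; NumPy only, no GPT-2 code.
    import numpy as np

    def sample_with_temperature(logits, temperature=1.0, rng=None):
        """Sample a token index after dividing the logits by `temperature`."""
        rng = rng or np.random.default_rng()
        if temperature <= 0:
            # As T -> 0 the distribution collapses onto the argmax, which is why
            # very low temperatures give deterministic, repetitive completions.
            return int(np.argmax(logits))
        scaled = np.asarray(logits, dtype=np.float64) / temperature
        scaled -= scaled.max()                         # numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()  # Boltzmann / softmax
        return int(rng.choice(len(probs), p=probs))

    logits = [2.0, 1.0, 0.1]
    print(sample_with_temperature(logits, temperature=0.2))  # almost always 0
    print(sample_with_temperature(logits, temperature=1.5))  # more varied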

@david-sg
Author

david-sg commented May 8, 2019

I downloaded the training corpus for 345M (approx. 765 MB), output-dataset_v1_medium-345M-k40.train.jsonl, and searched it, but I'm not finding some of the vocabulary that shows up in my generated text...
Is this the correct corpus that the model was trained on?
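
(If Ctrl+F over a ~765 MB file is impractical, a small script can do the search. The sketch below assumes each line of the .jsonl file is a JSON object with a "text" field; adjust the key if your copy of the file is laid out differently.)

    # Hedged sketch: scan a .jsonl corpus for a phrase from the generated text.
    # Assumes each line is a JSON object with a "text" field.
    import json

    def find_phrase(jsonl_path, phrase):
        """Yield (line_number, snippet) for every sample containing `phrase`."""
        with open(jsonl_path, encoding="utf-8") as f:
            for i, line in enumerate(f, 1):
                text = json.loads(line).get("text", "")
                pos = text.find(phrase)
                if pos != -1:
                    yield i, text[max(0, pos - 40): pos + len(phrase) + 40]

    for hit in find_phrase("output-dataset_v1_medium-345M-k40.train.jsonl",
                           "some suspiciously good sentence"):
        print(*hit)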

@woctezuma
Contributor

woctezuma commented May 8, 2019

Sorry, I thought you were talking about your own dataset, the one you wanted to fine-tune the model on.

As mentioned in this blog post, the full training dataset was not released. You only get 250K samples in that repository, rather than the roughly 8M documents the model was trained on.

@robclouth

@david-sg I'm getting really impressive results too... and searching the fine-tuning dataset doesn't bring up anything similar. It's amazing how it blends the new stuff with the original model.

@ghost

ghost commented Jun 17, 2019

Is there an a priori way to know how many steps you should train for when fine-tuning on your dataset? I want to know if there are any good heuristics out there. Thank you.

@david-sg
Author

I found it overtraining at a loss below 0.10. Interestingly, it was pulling sentences from the main corpus, not from the fine-tuning corpus. Increasing the temperature did not seem to help.

@ghost

ghost commented Jun 20, 2019

The avg. loss?

@david-sg
Author

Yes.
