
What exactly happens in one training step? #245

Open
bjoernhommel opened this issue Dec 15, 2020 · 1 comment
Comments

@bjoernhommel

I'm using gpt-2-simple for model fine-tuning and wonder what exactly happens in one "training step"? Is the entire fine-tuning data fed into the model or is only one unit (i.e. row in my training file) fed?

@minimaxir
Owner

In a training step, a batch of 1024 tokens (about 2-3 paragraphs of text) is fed into the model; the model does a forward pass, and the weights are updated via the gradients from the backward pass.
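As a toy illustration of the forward-pass/backward-pass loop described above (not gpt-2-simple's actual code), here is one training step on a single linear layer with a squared-error loss; the model, data shapes, and learning rate are all made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))          # toy model weights
batch_x = rng.normal(size=(8, 4))    # one batch of inputs
batch_y = rng.normal(size=(8, 2))    # targets for that batch

def training_step(W, x, y, lr=0.1):
    pred = x @ W                                 # forward pass
    loss = np.mean((pred - y) ** 2)              # scalar loss for the batch
    grad = 2 * x.T @ (pred - y) / pred.size      # backward pass (analytic gradient)
    return W - lr * grad, loss                   # gradient update

W, loss_before = training_step(W, batch_x, batch_y)
_, loss_after = training_step(W, batch_x, batch_y)
assert loss_after < loss_before   # one step lowered the loss on this batch
```

In gpt-2-simple the model is of course the full GPT-2 transformer and the loss is cross-entropy over next-token predictions, but each step has this same shape: one batch in, one forward pass, one backward pass, one weight update.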

If a row of data is shorter than 2-3 paragraphs, the batch will contain several consecutive rows of data concatenated together (hence why randomizing the order of the rows is recommended).
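A minimal sketch of how short rows end up sharing one training sample: rows are shuffled, joined, tokenized, and sliced into fixed-length chunks. For illustration only, a whitespace split stands in for GPT-2's BPE encoder, the rows are made up, and the chunk length is 8 instead of the real 1024:

```python
import random

CHUNK_LEN = 8  # stands in for the real 1024-token window

rows = [f"row {i} short text" for i in range(10)]  # hypothetical training rows

random.seed(0)
random.shuffle(rows)            # randomize row order before concatenating

tokens = " ".join(rows).split() # join all rows, then "tokenize"
chunks = [tokens[i:i + CHUNK_LEN]
          for i in range(0, len(tokens), CHUNK_LEN)]

# Each chunk spans parts of several consecutive (post-shuffle) rows,
# which is why a single training step can see more than one row.
assert all(len(c) == CHUNK_LEN for c in chunks[:-1])
```

Without the shuffle, neighboring rows in the original file would always land in the same chunk, so the model would learn spurious ordering effects from the file layout.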

I tried implementing row-by-row training when working on aitextgen, but the implementation here in gpt-2-simple performed much better for whatever reason.
