
What exactly happens in one training step? #245

Open
bjoernhommel opened this issue Dec 15, 2020 · 1 comment
Comments

@bjoernhommel

I'm using gpt-2-simple for model fine-tuning and wonder what exactly happens in one "training step"? Is the entire fine-tuning data fed into the model or is only one unit (i.e. row in my training file) fed?

@minimaxir
Owner

In a training step, a batch of 1024 tokens (about 2-3 paragraphs of text) is fed into the model; the model does a forward pass, and the weights are updated via the gradients from the backward pass.
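As a toy illustration of the forward-pass/backward-pass loop described above (not gpt-2-simple's actual code), here is one training step on a single linear layer with a squared-error loss; the model, data shapes, and learning rate are all made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))          # toy model weights
batch_x = rng.normal(size=(8, 4))    # one batch of inputs
batch_y = rng.normal(size=(8, 2))    # targets for that batch

def training_step(W, x, y, lr=0.1):
    pred = x @ W                                 # forward pass
    loss = np.mean((pred - y) ** 2)              # scalar loss for the batch
    grad = 2 * x.T @ (pred - y) / pred.size      # backward pass (analytic gradient)
    return W - lr * grad, loss                   # gradient update

W, loss_before = training_step(W, batch_x, batch_y)
_, loss_after = training_step(W, batch_x, batch_y)
assert loss_after < loss_before   # one step lowered the loss on this batch
```

In gpt-2-simple the model is of course the full GPT-2 transformer and the loss is cross-entropy over next-token predictions, but each step has this same shape: one batch in, one forward pass, one backward pass, one weight update.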

If a row of data is shorter than 2-3 paragraphs, the batch will contain several consecutive rows of data concatenated together (hence why randomizing the order of the rows is recommended).
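A minimal sketch of how short rows end up sharing one training sample: rows are shuffled, joined, tokenized, and sliced into fixed-length chunks. For illustration only, a whitespace split stands in for GPT-2's BPE encoder, the rows are made up, and the chunk length is 8 instead of the real 1024:

```python
import random

CHUNK_LEN = 8  # stands in for the real 1024-token window

rows = [f"row {i} short text" for i in range(10)]  # hypothetical training rows

random.seed(0)
random.shuffle(rows)            # randomize row order before concatenating

tokens = " ".join(rows).split() # join all rows, then "tokenize"
chunks = [tokens[i:i + CHUNK_LEN]
          for i in range(0, len(tokens), CHUNK_LEN)]

# Each chunk spans parts of several consecutive (post-shuffle) rows,
# which is why a single training step can see more than one row.
assert all(len(c) == CHUNK_LEN for c in chunks[:-1])
```

Without the shuffle, neighboring rows in the original file would always land in the same chunk, so the model would learn spurious ordering effects from the file layout.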

I tried implementing row-by-row training when working on aitextgen, but the implementation here in gpt-2-simple performed much better for whatever reason.
