I'm using gpt-2-simple for model fine-tuning, and I'm wondering what exactly happens in one "training step". Is the entire fine-tuning dataset fed into the model, or only one unit (i.e., one row of my training file)?
In a training step, a batch of 1024 tokens (about 2-3 paragraphs) of text is fed into the model; it does a forward pass, and the gradients are updated in the backward pass.
If a row of data is shorter than that, the sample will contain several consecutive rows of data concatenated together (hence why randomizing the order of the rows is recommended).
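As a rough sketch of the packing behavior described above (this is not gpt-2-simple's actual code, and whitespace splitting stands in for GPT-2's BPE tokenizer): shuffle the rows, concatenate them into one token stream, and slice that stream into fixed-size training samples, so short rows end up packed together in a single 1024-token sample.

```python
import random

def make_training_samples(rows, context_length=1024, seed=42):
    """Illustrative sketch only: shuffle the rows, concatenate them,
    and slice the resulting token stream into fixed-size samples.
    Whitespace tokenization is a stand-in for GPT-2's BPE."""
    rng = random.Random(seed)
    rows = rows[:]
    rng.shuffle(rows)  # randomizing the row order, as recommended above
    # Short rows get packed together into one sample.
    tokens = " ".join(rows).split()
    return [tokens[i:i + context_length]
            for i in range(0, len(tokens), context_length)]

rows = ["short row one.", "short row two.", "short row three."]
# 9 whitespace tokens total, so a context of 8 yields 2 samples
samples = make_training_samples(rows, context_length=8)
print(len(samples))
```

With the real 1024-token context, each sample spans a few paragraphs of the shuffled file, which is why adjacent rows bleed into the same training sample.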
I had tried to implement row-by-row training when working on aitextgen, but for whatever reason the implementation here in gpt-2-simple performed much better.