In this project we reproduce the GPT-2 architecture and train it from scratch to match OpenAI's 2019 results. For us, this project serves as a proof of concept (PoC) of a distributed pretraining experiment.
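Since the run is distributed across two GPUs, the training script is typically launched with `torchrun` and the model wrapped in DistributedDataParallel. The snippet below is a minimal sketch of that setup, not the exact code used in this repo:

```python
# Minimal DDP setup sketch (illustrative; the actual training script may differ).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp() -> int:
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return local_rank

def wrap_model(model: torch.nn.Module, local_rank: int) -> DDP:
    # Move the model to this process's GPU and wrap it for gradient syncing.
    model = model.to(local_rank)
    return DDP(model, device_ids=[local_rank])

# Launched with, e.g.:
#   torchrun --standalone --nproc_per_node=2 train.py
```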
The dataset originally chosen for pretraining was a 10B-token subset of the SlimPajama dataset. As the first results were disappointing, we switched to the FineWeb-Edu dataset (15B tokens), a high-quality educational corpus. Training is performed on 2x A100 80GB GPUs. We match, and slightly exceed, GPT-2 (small) performance.
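For reference, FineWeb-Edu is available on the Hugging Face Hub and can be streamed and tokenized with the GPT-2 tokenizer roughly as sketched below. The `sample-10BT` config is only an example; how the 15B-token subset used here was sampled is not shown in this sketch:

```python
# Hedged sketch: stream FineWeb-Edu and tokenize it with tiktoken's GPT-2 encoding.
import tiktoken
from datasets import load_dataset

enc = tiktoken.get_encoding("gpt2")

# "sample-10BT" is one of the published configs; the project's 15B-token
# subset may have been assembled differently.
ds = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                  split="train", streaming=True)

def tokenize(example):
    # Append the end-of-text token so documents are delimited, as in GPT-2.
    tokens = enc.encode_ordinary(example["text"])
    tokens.append(enc.eot_token)
    return {"tokens": tokens}

tokenized = ds.map(tokenize)
```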
This project is inspired by https://github.com/karpathy/build-nanogpt/tree/master