Build a GPT-2 model from scratch, initialize it properly, then train it!
The basic idea is to create a GPT-2 model and train it. Along the way we will explore how to observe the training metrics and keep the training process smooth. We will also build a fastai-flavored mini training framework that lets us adjust the training process more easily.
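As a rough idea of what "fastai-flavored" means here, below is a minimal sketch (not the project's actual code): a `Learner` that owns the model, data, and loss, and fires `Callback` hooks at fixed points in the loop so behavior can be changed without rewriting the loop itself. The names `Learner`, `Callback`, and `PrintLoss` are assumptions that mirror fastai's design.

```python
# Minimal sketch of a callback-driven training loop (assumed design,
# mirroring fastai's Learner/Callback pattern; not the project's real code).
import torch
from torch import nn

class Callback:
    # Hooks a callback may override; the Learner calls them at fixed points.
    def before_fit(self, learn): pass
    def after_batch(self, learn): pass
    def after_fit(self, learn): pass

class Learner:
    def __init__(self, model, dls, loss_fn, lr=3e-4, cbs=()):
        self.model, self.dls, self.loss_fn = model, dls, loss_fn
        self.lr, self.cbs = lr, list(cbs)

    def fit(self, epochs):
        opt = torch.optim.AdamW(self.model.parameters(), lr=self.lr)
        for cb in self.cbs: cb.before_fit(self)
        for _ in range(epochs):
            for xb, yb in self.dls:
                self.loss = self.loss_fn(self.model(xb), yb)
                self.loss.backward()
                opt.step(); opt.zero_grad()
                for cb in self.cbs: cb.after_batch(self)
        for cb in self.cbs: cb.after_fit(self)

class PrintLoss(Callback):
    # Example callback: observe the metric we care about after every batch.
    def after_batch(self, learn): print(f"loss: {learn.loss.item():.4f}")

# Toy usage: a linear model on random data stands in for GPT-2 here.
data = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(5)]
learner = Learner(nn.Linear(4, 1), data, nn.MSELoss(), cbs=[PrintLoss()])
learner.fit(1)
```

The point of this pattern is that things like logging, learning-rate scheduling, or gradient clipping become small callbacks rather than edits to the training loop, which is what makes the training process easy to adjust.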
This project is inspired by Let's reproduce GPT-2 (124M) by Andrej Karpathy and Practical Deep Learning Part 2 by Jeremy Howard.