These examples replicate the experiments in the original paper. The training code base is derived from the AWD-LSTM code base, which is in turn derived from the official PyTorch language model example.
Install PyTorch and alstm, then download the data you want to use (p for Penn Treebank, w for Wikitext-2):
-pw
To train the aLSTM on Penn Treebank, run
python --model ALSTM --epochs 190 --emsize 400 --nhid 1150 --nlayers 2 --npar 100 --dropouth 0.25 --dropoute 0.16 --dropouti 0.6 --dropouto 0.6 --dropouta 0.1 --wdecay 1e-6 --var-seq --seq-len 70 --batch_size 20 --cut-steps 100 160 --cut-rate 10 --save
This will give you validation / test perplexities of 58.7 / 56.5.
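To make the reported numbers easier to compare with a training log, here is a minimal sketch of how perplexity relates to the average cross-entropy loss. It assumes the scores above are word-level perplexities and that the loss is measured in nats per token; the function names are hypothetical and not part of the alstm or AWD-LSTM code.

```python
import math


def perplexity(total_nll: float, num_tokens: int) -> float:
    """Perplexity from a summed negative log-likelihood (in nats)."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    # Perplexity is the exponential of the mean per-token loss.
    return math.exp(total_nll / num_tokens)


def loss_for_perplexity(ppl: float) -> float:
    """Mean per-token loss (in nats) that corresponds to a given perplexity."""
    if ppl <= 0:
        raise ValueError("perplexity must be positive")
    return math.log(ppl)


if __name__ == "__main__":
    # Mean losses implied by the validation / test scores quoted above.
    for name, ppl in [("val", 58.7), ("test", 56.5)]:
        print(f"{name}: perplexity {ppl} <-> loss {loss_for_perplexity(ppl):.3f} nats")
```

Under these assumptions, a validation loss slightly above 4 nats per token corresponds to the Penn Treebank score quoted above.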
To train the aLSTM on Wikitext-2, run
python --model ALSTM --epochs 187 --emsize 400 --nhid 1500 --nlayers 2 --npar 100 --dropouth 0.25 --dropoute 0.16 --dropouti 0.6 --dropouto 0.6 --dropouta 0.1 --wdecay 1e-6 --var-seq --seq-len 70 --batch_size 20 --cut-steps 80 160 200 --cut-rate 10 --save --data data/wikitext-2
This will give you validation / test perplexities of 67.5 / 64.5.
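The two commands differ mainly in their --cut-steps values. The sketch below shows one plausible reading of these flags: that --cut-steps lists the epochs at which the learning rate is cut, and that --cut-rate is the factor it is divided by. This interpretation, the function name, and the base learning rate of 1.0 are all assumptions for illustration; check the training script for the actual behaviour.

```python
from typing import Sequence


def learning_rate_at(epoch: int, base_lr: float,
                     cut_steps: Sequence[int], cut_rate: float) -> float:
    """Step schedule: divide base_lr by cut_rate once per cut step reached.

    Assumption: a cut takes effect from the listed epoch onwards.
    """
    cuts = sum(1 for step in cut_steps if epoch >= step)
    return base_lr / (cut_rate ** cuts)


if __name__ == "__main__":
    base_lr = 1.0  # hypothetical value, not taken from the commands above
    schedules = {
        "Penn Treebank": [100, 160],
        "Wikitext-2": [80, 160, 200],
    }
    for name, steps in schedules.items():
        for epoch in (0, 80, 100, 160, 200):
            lr = learning_rate_at(epoch, base_lr, steps, cut_rate=10)
            print(f"{name:13s} epoch {epoch:3d}: lr = {lr:g}")
```

Note that under this reading the third Wikitext-2 cut step (200) lies beyond the 187 epochs given in the command, so it would never be reached.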
The language model exposes the same API as the AWD-LSTM, so you can use any of the AWD-LSTM post-processing scripts, such as fine-tuning, adding a neural cache, or generating samples.