An implementation of the Hierarchical Attention Model and the baselines described in the paper Hierarchical Attention Model for Improved Comprehension of Spoken Content by Wei Fang, Juei-Yang Hsu, Hung-Yi Lee, and Lin-Shan Lee.

The following dependencies are required:
- Torch7
- penlight
- nn
- nngraph
- optim
- rnn
- Java >= 8 (for Stanford CoreNLP utilities)
- Python2 >= 2.7
- Python3 >= 3.5
The Torch/Lua dependencies can be installed using luarocks. For example:
luarocks install nngraph
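The remaining rocks listed above can be installed the same way. As a convenience one-liner (assuming Torch7 and luarocks are already set up):

for rock in penlight nn nngraph optim rnn; do luarocks install "$rock"; done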
First run the following script:
sh preprocess.sh
This downloads the following data:
- TOEFL Listening Comprehension Test Dataset
- GloVe word vectors (Common Crawl, 840B tokens) -- Warning: this is a 2GB download!
as well as the Stanford CoreNLP utilities needed for preprocessing (hence the Java requirement above).

The preprocessing script then generates dependency parses of the TOEFL dataset using the Stanford Neural Network Dependency Parser.
Alternatively, the download and preprocessing scripts can be called individually.
TOEFL is an English examination that tests the academic English knowledge and skills of non-native English learners. Each example consists of an audio story, a question, and four answer choices, of which one or two are correct. Given the manual or ASR transcription of the audio story and the question, the machine has to select the correct answer(s) out of the four choices.
To train models for the TOEFL Listening Comprehension Test, run:
th toefl/main.lua --model <ham|lstm|bilstm|treelstm|memn2n> --task <manual|ASR> --level <phrase|sentence> --dim <sentence_representation_dim> --internal <memn2n_dim> --hops <memn2n_hops> --layers <num_layers> --epochs <num_epochs> --prune <pruning_rate>
where:
- `model`: the model to train (default: `ham`, i.e. the Hierarchical Attention Model)
- `task`: the transcription set to train on (default: `manual`)
- `level`: the attention level of the HAM (default: `phrase`; ignored for other models)
- `dim`: the dimension of the sentence/phrase representations (default: 75)
- `internal`: the dimension of the memory module in the HAM or in MemN2N (default: 75; ignored for other models)
- `hops`: the number of hops for the HAM or MemN2N (default: 1; ignored for other models)
- `layers`: the number of layers for the LSTM or BiLSTM (default: 1; ignored for other models)
- `epochs`: the number of training epochs (default: 10)
- `prune`: the preprocessing prune rate (default: 1, i.e. no pruning)
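For example, to train a sentence-level HAM on the ASR transcriptions for 20 epochs, leaving all other options at their defaults (an illustrative invocation):

th toefl/main.lua --model ham --task ASR --level sentence --epochs 20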
Trained model parameters are saved to the `trained_models` directory.
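A saved checkpoint can be inspected from the Torch REPL or a short script. The snippet below is a minimal sketch, not part of the repo: the filename is illustrative, and the actual name and contents written by main.lua depend on the chosen model and options.

```lua
-- Minimal sketch (assumed): load a saved checkpoint for inspection.
-- The model classes must be required before torch.load can deserialize them.
require 'torch'
require 'nn'
require 'nngraph'
require 'rnn'

-- Hypothetical filename; check trained_models/ for the actual file.
local checkpoint = torch.load('trained_models/ham_manual_phrase.t7')
print(checkpoint)
```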