PyTorch implementation of the Meta-BiLSTM sequence tagger from this paper https://arxiv.org/pdf/1805.08237v1.pdf, along with an added GRU-cell variant for comparing the performance of the two.
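A minimal sketch of the LSTM-vs-GRU comparison, assuming a generic bidirectional tagger rather than the exact meta-model from the paper or the code in this repo (the class and argument names here are hypothetical): the recurrent cell type is a constructor argument, so the LSTM and GRU variants share everything else.

```python
import torch
import torch.nn as nn

class BiRNNTagger(nn.Module):
    """Hypothetical bidirectional tagger; `rnn_cell` switches between LSTM and GRU."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_tags, rnn_cell="lstm"):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        rnn_cls = nn.LSTM if rnn_cell == "lstm" else nn.GRU
        self.rnn = rnn_cls(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        outputs, _ = self.rnn(embedded)        # (batch, seq_len, 2 * hidden_dim)
        return self.proj(outputs)              # per-token tag scores

# Same interface, different recurrent cell:
lstm_tagger = BiRNNTagger(vocab_size=10_000, embed_dim=100, hidden_dim=128, num_tags=17)
gru_tagger = BiRNNTagger(vocab_size=10_000, embed_dim=100, hidden_dim=128, num_tags=17, rnn_cell="gru")
scores = gru_tagger(torch.randint(0, 10_000, (2, 12)))   # shape: (2, 12, 17)
```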
- conllu
The dataset download links and instructions are in the scripts directory.
The rar file needs to be extracted into the data/embeddings directory, and the CoNLLU dataset needs to be placed in the root of the directory.
- Name ID - 1
- Title, affiliation of authors - 1
- Description - Aim, Methodology, Outcome - 3/4
- Concepts - 3/4
- Dataset Details - 2/3
- Allotted tasks and progress - 1/2
- Implementation details, pseudocode - 4/6
- Results/discussions - 3/4
- Comparison of results - 1/2
- Challenges - 2/3
- Scope - 1
- Experience/Learning Outcomes - 1/2
Ref 1: Embeddings
- Need for char based models? Some languages lack word segmentation, and they help with handling informal language.
- Benefits of char based models:
- Generate embeddings for unknown words.
- Similar words have similar embeddings.
- Subword Models (see the BPE sketch after this list):
  - BYTE PAIR ENCODING: repeatedly look for the most frequent pair of adjacent symbols (initially single bytes or characters) and add that pair as a new element of the vocabulary. Essentially character n-grams: the most frequent n-gram pairs are encapsulated into new n-grams.
  - WORDPIECE / SENTENCEPIECE: a greedy approximation to maximizing language-model log-likelihood chooses the pieces, adding the n-gram that maximally reduces perplexity. WordPiece tokenizes inside words: it tokenizes into words first, then applies BPE. In the SentencePiece model, whitespace is retained as a special token and grouped like any other symbol.
- Hybrid character and word level models
- The main issue with one-hot encodings is that the transformation does not rely on any supervision. Embeddings can be greatly improved by learning them with a neural network on a supervised task: the embeddings become parameters (weights) of the network and are adjusted to minimize the loss on that task (a minimal sketch follows below).
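A toy illustration of the BPE merge loop described above (not the tokenizer used in this repo): count adjacent symbol pairs over a word-frequency table and repeatedly merge the most frequent pair into a new vocabulary symbol.

```python
from collections import Counter

def bpe_merges(word_freqs, num_merges):
    """Toy BPE: treat each word as a sequence of characters and repeatedly
    merge the most frequent adjacent pair into a new vocabulary symbol."""
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])  # fuse into one symbol
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges, vocab

# "low" appears 5 times and "lower" twice; ('l', 'o') is merged first, then ('lo', 'w').
merges, vocab = bpe_merges({"low": 5, "lower": 2}, num_merges=3)
print(merges)   # e.g. [('l', 'o'), ('lo', 'w'), ('low', 'e')]
```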
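And a minimal sketch of the point about learned embeddings (all sizes and data here are made-up placeholders): the `nn.Embedding` weights are ordinary parameters, so training on a supervised objective updates the embedding table along with the rest of the network.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup: 1000-word vocabulary, 50-dim embeddings, 5 tag classes.
embedding = nn.Embedding(num_embeddings=1000, embedding_dim=50)
classifier = nn.Linear(50, 5)
model = nn.Sequential(embedding, classifier)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

token_ids = torch.randint(0, 1000, (32,))  # a batch of 32 token ids (random stand-ins)
gold_tags = torch.randint(0, 5, (32,))     # their (random) gold labels

logits = model(token_ids)      # embedding lookup followed by a linear projection
loss = loss_fn(logits, gold_tags)
loss.backward()                # gradients flow into the embedding table
optimizer.step()               # the embedding weights themselves are updated
```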