- hallucinations Public
  Code for "The Curious Case of Hallucinations in Neural Machine Translation".
- Finding-Memo Public
  Code for "Extractive Memorization in Constrained Sequence Generation Tasks".
- literalness Public
  Code for "Do GPTs Produce Less Literal Translations?"
- dlp Public
  Code for "On Dimensional Linguistic Properties of the Word Embedding Space".
- fairseq Public
  Forked from facebookresearch/fairseq. Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
  Python, MIT License, Updated Apr 8, 2022
- awesome-align Public
  Forked from neulab/awesome-align. A word aligner based on multilingual encoders.
- wmt21-news-systems Public
  Forked from wmt-conference/wmt21-news-systems.
  Smalltalk, Updated Oct 7, 2021
- wmt-format-tools Public
  Forked from wmt-conference/wmt-format-tools. Tools for formatting WMT hypothesis and test sets in XML.
  Python, Apache License 2.0, Updated Oct 6, 2021
- BIG-bench Public
  Forked from r-barnes/BIG-bench. Beyond the Imitation Game collaborative benchmark for enormous language models.
  Jupyter Notebook, Apache License 2.0, Updated Jun 15, 2021
- Megalodon Public
  Various ML/DL Resources organised at a single place.
- bert_score Public
  Forked from Tiiiger/bert_score. BERT score for text generation.
  Jupyter Notebook, MIT License, Updated Apr 26, 2021
- transformers Public
  Forked from huggingface/transformers. 🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.
  Python, Apache License 2.0, Updated Apr 23, 2021
- wit Public
  Forked from google-research-datasets/wit. WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.
  Other, Updated Apr 23, 2021
- xtreme Public
  Forked from google-research/xtreme. XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 typologically diverse languages and includes nine tasks.
  Shell, Apache License 2.0, Updated Apr 21, 2021
- Half-Size Public
  Code for "Effective Dimensionality Reduction for Word Embeddings".
- bleurt Public
  Forked from google-research/bleurt. BLEURT is a metric for Natural Language Generation based on transfer learning.
  Python, Apache License 2.0, Updated Apr 13, 2021
- long-tailed Public
  Code for "On Long-Tailed Phenomena in NMT".
- cookbook Public
  Forked from unicode-cookbook/cookbook. The Unicode Cookbook for Linguists.
  TeX, Updated Nov 21, 2020
- biaffine-ner Public
  Forked from juntaoy/biaffine-ner. Named Entity Recognition as Dependency Parsing.
  Python, Apache License 2.0, Updated Nov 13, 2020
- Wikilingua Public
  Forked from esdurmus/Wikilingua. Multilingual abstractive summarization dataset extracted from WikiHow.
  Creative Commons Zero v1.0 Universal, Updated Oct 8, 2020
- LM_NE_bias Public
  Forked from vered1986/LM_NE_bias. Named Entity Biases in Pre-trained Language Models.
  Jupyter Notebook, Other, Updated Sep 15, 2020
- espresso Public
  Forked from freewym/espresso. Espresso: A Fast End-to-End Neural Speech Recognition Toolkit.
  Python, Other, Updated May 19, 2020
- TVCaption Public
  Forked from jayleicn/TVCaption. PyTorch implementation of MMT on TVCaption dataset.
  Python, MIT License, Updated Jan 27, 2020
- CMC Public
  Forked from HobbitLong/CMC. PyTorch implementation of "Contrastive Multiview Coding", "Momentum Contrast for Unsupervised Visual Representation Learning", and "Unsupervised Feature Learning via Non-Parametric Instance-level D…
  Python, Updated Jan 5, 2020