Out-of-Distribution Generalization in Natural Language Processing: Past, Present, and Future

Linyi Yang, Yaoxian Song, Xuan Ren, Chenyang Lyu, Yidong Wang, Jingming Zhuo, Lingqiao Liu, Jindong Wang, Jennifer Foster, Yue Zhang


Abstract
Machine learning (ML) systems in natural language processing (NLP) face significant challenges in generalizing to out-of-distribution (OOD) data, where the test distribution differs from the training data distribution. This poses important questions about the robustness of NLP models and their high accuracy, which may be artificially inflated due to their underlying sensitivity to systematic biases. Despite these challenges, there is a lack of comprehensive surveys on the generalization challenge from an OOD perspective in natural language understanding. Therefore, this paper aims to fill this gap by presenting the first comprehensive review of recent progress, methods, and evaluations on this topic. We further discuss the challenges involved and potential future research directions. By providing convenient access to existing work, we hope this survey will encourage future research in this area.
Anthology ID:
2023.emnlp-main.276
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
4533–4559
URL:
https://aclanthology.org/2023.emnlp-main.276
DOI:
10.18653/v1/2023.emnlp-main.276
Cite (ACL):
Linyi Yang, Yaoxian Song, Xuan Ren, Chenyang Lyu, Yidong Wang, Jingming Zhuo, Lingqiao Liu, Jindong Wang, Jennifer Foster, and Yue Zhang. 2023. Out-of-Distribution Generalization in Natural Language Processing: Past, Present, and Future. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 4533–4559, Singapore. Association for Computational Linguistics.
Cite (Informal):
Out-of-Distribution Generalization in Natural Language Processing: Past, Present, and Future (Yang et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.276.pdf