Assessing the Linguistic Knowledge in Arabic Pre-trained Language Models Using Minimal Pairs

Wafa Abdullah Alrajhi; Hend Al-Khalifa; Abdulmalik AlSalman

doi:10.18653/v1/2022.wanlp-1.17

Abstract

Despite the noticeable recent progress in Arabic pre-trained language models (PLMs), the linguistic knowledge captured by these models remains unclear. In this paper, we evaluate available Arabic PLMs in terms of their linguistic knowledge. BERT-based language models (LMs) are evaluated using minimal pairs (MPs), where each pair consists of a grammatical sentence and its ungrammatical counterpart. Each MP isolates a specific linguistic phenomenon in order to test the model's sensitivity to that phenomenon. We cover nine major Arabic phenomena: Verbal sentences, Nominal sentences, Adjective Modification, and Idafa construction. The experiments compare fifteen Arabic BERT-based PLMs. Among all tested models, CAMeL-CA achieved the highest overall accuracy.
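The minimal-pair evaluation described in the abstract can be illustrated with a short sketch. It is an assumption-laden illustration, not the authors' implementation: the abstract does not say how a BERT-based model scores a sentence, so the scorer here is a hypothetical stand-in (`toy_score`) and the sentence pairs are placeholders rather than data from the paper. The only idea taken from the text is that a model is counted as correct on a pair when it prefers the grammatical sentence, and that accuracy is the fraction of such pairs.

```python
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Fraction of (grammatical, ungrammatical) pairs where the scorer
    assigns a strictly higher score to the grammatical sentence."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no minimal pairs given")
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)


def toy_score(sentence: str) -> float:
    # Hypothetical stand-in for a model-based sentence score (assumption):
    # each "*" marks an injected error and lowers the score by one.
    return -float(sentence.count("*"))


# Placeholder pairs (not from the paper): (grammatical, ungrammatical).
toy_pairs = [
    ("a b c", "a *b c"),  # scorer prefers the grammatical sentence
    ("d e", "d e"),       # tie, counted as incorrect
    ("f g", "f *g"),      # scorer prefers the grammatical sentence
]

accuracy = minimal_pair_accuracy(toy_pairs, toy_score)
print(f"accuracy = {accuracy:.3f}")
```

In a real setting, `toy_score` would be replaced by a function that queries one of the evaluated PLMs. Ties are counted as errors here, which is a design choice of this sketch rather than something stated in the source.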

Anthology ID: 2022.wanlp-1.17
Volume: Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
Month: December
Year: 2022
Address: Abu Dhabi, United Arab Emirates (Hybrid)
Editors: Houda Bouamor, Hend Al-Khalifa, Kareem Darwish, Owen Rambow, Fethi Bougares, Ahmed Abdelali, Nadi Tomeh, Salam Khalifa, Wajdi Zaghouani
Venue: WANLP
SIG: SIGARAB
Publisher: Association for Computational Linguistics
Pages: 185–193
URL: https://aclanthology.org/2022.wanlp-1.17/
DOI: 10.18653/v1/2022.wanlp-1.17
Cite (ACL): Wafa Abdullah Alrajhi, Hend Al-Khalifa, and Abdulmalik AlSalman. 2022. Assessing the Linguistic Knowledge in Arabic Pre-trained Language Models Using Minimal Pairs. In Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP), pages 185–193, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal): Assessing the Linguistic Knowledge in Arabic Pre-trained Language Models Using Minimal Pairs (Alrajhi et al., WANLP 2022)
PDF: https://aclanthology.org/2022.wanlp-1.17.pdf
