Shahriar Golchin


2025

pdf bib
Grading Massive Open Online Courses Using Large Language Models
Shahriar Golchin | Nikhil Garuda | Christopher Impey | Matthew Wenger
Proceedings of the 31st International Conference on Computational Linguistics

Massive open online courses (MOOCs) offer free education globally. Despite this democratization of learning, the massive enrollment in these courses makes it impractical for an instructor to assess every student’s writing assignment. As a result, peer grading, often guided by a straightforward rubric, is the method of choice. While convenient, peer grading often falls short in terms of reliability and validity. In this study, we explore the feasibility of using large language models (LLMs) to replace peer grading in MOOCs. To this end, we adapt the zero-shot chain-of-thought (ZCoT) prompting technique to automate the feedback process once the LLM assigns a score to an assignment. Specifically, to instruct LLMs for grading, we use three distinct prompts based on ZCoT: (1) ZCoT with instructor-provided correct answers, (2) ZCoT with both instructor-provided correct answers and rubrics, and (3) ZCoT with instructor-provided correct answers and LLM-generated rubrics. We tested these prompts in 18 different scenarios using two LLMs—GPT-4 and GPT-3.5—across three MOOCs: Introductory Astronomy, Astrobiology, and the History and Philosophy of Astronomy. Our results show that ZCoT, when augmented with instructor-provided correct answers and rubrics, produces grades that are more aligned with those assigned by instructors compared to peer grading. Finally, our findings indicate a promising potential for automated grading systems in MOOCs, especially in subjects with well-defined rubrics, to improve the learning experience for millions of online learners worldwide.

2024

pdf bib
Data Contamination Report from the 2024 CONDA Shared Task
Oscar Sainz | Iker García-Ferrero | Alon Jacovi | Jon Ander Campos | Yanai Elazar | Eneko Agirre | Yoav Goldberg | Wei-Lin Chen | Jenny Chim | Leshem Choshen | Luca D’Amico-Wong | Melissa Dell | Run-Ze Fan | Shahriar Golchin | Yucheng Li | Pengfei Liu | Bhavish Pahwa | Ameya Prabhu | Suryansh Sharma | Emily Silcock | Kateryna Solonko | David Stap | Mihai Surdeanu | Yu-Min Tseng | Vishaal Udandarao | Zengzhi Wang | Ruijie Xu | Jinglin Yang
Proceedings of the 1st Workshop on Data Contamination (CONDA)

The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant aspects of data contamination in natural language processing, where data contamination is understood as situations where evaluation data is included in pre-training corpora used to train large scale models, compromising evaluation results. The workshop fostered a shared task to collect evidence on data contamination in current available datasets and models. The goal of the shared task and associated database is to assist the community in understanding the extent of the problem and to assist researchers in avoiding reporting evaluation results on known contaminated resources. The shared task provides a structured, centralized public database for the collection of contamination evidence, open to contributions from the community via GitHub pool requests. This first compilation paper is based on 566 reported entries over 91 contaminated sources from a total of 23 contributors. The details of the individual contamination events are available in the platform. The platform continues to be online, open to contributions from the community.

2023

pdf bib
Intermediate Domain Finetuning for Weakly Supervised Domain-adaptive Clinical NER
Shilpa Suresh | Nazgol Tavabi | Shahriar Golchin | Leah Gilreath | Rafael Garcia-Andujar | Alexander Kim | Joseph Murray | Blake Bacevich | Ata Kiapour
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

Accurate human-annotated data for real-worlduse cases can be scarce and expensive to obtain. In the clinical domain, obtaining such data is evenmore difficult due to privacy concerns which notonly restrict open access to quality data but also require that the annotation be done by domain experts. In this paper, we propose a novel framework - InterDAPT - that leverages Intermediate Domain Finetuning to allow language models to adapt to narrow domains with small, noisy datasets. By making use of peripherally-related, unlabeled datasets,this framework circumvents domain-specific datascarcity issues. Our results show that this weaklysupervised framework provides performance improvements in downstream clinical named entityrecognition tasks.

pdf bib
Do not Mask Randomly: Effective Domain-adaptive Pre-training by Masking In-domain Keywords
Shahriar Golchin | Mihai Surdeanu | Nazgol Tavabi | Ata Kiapour
Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023)

We propose a novel task-agnostic in-domain pre-training method that sits between generic pre-training and fine-tuning. Our approach selectively masks in-domain keywords, i.e., words that provide a compact representation of the target domain. We identify such keywords using KeyBERT (Grootendorst, 2020). We evaluate our approach using six different settings: three datasets combined with two distinct pre-trained language models (PLMs). Our results reveal that the fine-tuned PLMs adapted using our in-domain pre-training strategy outperform PLMs that used in-domain pre-training with random masking as well as those that followed the common pre-train-then-fine-tune paradigm. Further, the overhead of identifying in-domain keywords is reasonable, e.g., 7-15% of the pre-training time (for two epochs) for BERT Large (Devlin et al., 2019).