Date of Graduation

5-2025

Document Type

Thesis

Degree Name

Bachelor of Science in Data Science

Degree Level

Undergraduate

Department

Data Science

Advisor/Mentor

Gauch, Susan

Committee Member

Sullivan, Kelly

Second Committee Member

Yang, Song

Third Committee Member

Schubert, Karl

Abstract

Natural language processing (NLP) models often rely on human-labeled data for training and evaluation. Many approaches crowdsource this data from a large number of annotators with varying skills, backgrounds, and motivations, resulting in conflicting annotations. These conflicts have traditionally been resolved by aggregation methods that assume disagreements are errors. Recent work has argued that for many tasks annotators may have genuine disagreements and that variation should be treated as signal rather than noise. However, limited work has combined the two frameworks to separate signal from noise in human-labeled data. In this work, we introduce NUTMEG, a new Bayesian model that incorporates information about annotator backgrounds to remove noisy annotations from human-labeled training data while preserving systematic disagreements. We then use a synthetic data evaluation framework to show that NUTMEG is more effective than traditional aggregation methods at recovering the ground truth from annotations with systematic disagreement. We provide further analysis characterizing how differences in subpopulation sizes, rates of disagreement, and rates of spam affect the performance of our model. Finally, we demonstrate that downstream models trained on data aggregated by NUTMEG significantly outperform both models trained on traditionally aggregated data and models trained on the full set of disaggregated annotations. Our results highlight the importance of accounting for both annotator competence and systematic disagreements when training on human-labeled data.

Keywords

NLP; AI; annotator disagreement; crowdsourcing; item-response; learning from disagreement

Available for download on Thursday, October 22, 2026
