Date of Graduation

12-2025

Document Type

Thesis

Degree Name

Bachelor of Science in Computer Science

Degree Level

Undergraduate

Department

Computer Science and Computer Engineering

Advisor/Mentor

Dr. Lu Zhang

Committee Member

Dr. Alexander Nelson

Second Committee Member

Dr. Susan Gauch

Abstract

This thesis investigates demographic bias in large language models (LLMs) by evaluating both outcome disparities in decision-making tasks and the underlying associations that could contribute to those disparities. Using profiles from the Adult dataset, we analyze how Gemini 2.0 Flash performs on an income prediction task under zero-shot and few-shot prompting. Our findings show that the model exhibits measurable differences in demographic parity and false positive rates across groups, and that few-shot prompting reduces these disparities. In parallel, we tested associational bias in Qwen 2.5 using probability-based association tests that compare the log probabilities of occupations and adjectives in prompts containing gendered pronouns. These tests revealed measurable gender-linked associations: male-pronoun prompts assigned higher probability to high-status occupations and competence-related adjectives, while female-pronoun prompts assigned higher probability to caregiving occupations and warmth-related adjectives. Taken together, the two evaluations show that outcome disparities align with the models' internal associations, suggesting that these patterns could affect the downstream decision-making tasks in which LLMs are deployed. While changes in prompting strategy mitigated some bias, the persistence of gender-based patterns underscores the importance of evaluating LLMs for fairness as they are increasingly used in applications with social consequences.
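
To make the outcome-disparity evaluation concrete, below is a minimal sketch of the two fairness metrics the abstract names, demographic parity difference and the false positive rate gap, computed over toy per-group predictions. The function names and example data are illustrative assumptions, not taken from the thesis.

```python
# Minimal sketch of the two outcome-disparity metrics named in the
# abstract. The toy data below is illustrative only.

def demographic_parity_diff(preds_a, preds_b):
    """Difference in positive-prediction rates between two groups."""
    rate_a = sum(preds_a) / len(preds_a)
    rate_b = sum(preds_b) / len(preds_b)
    return rate_a - rate_b

def false_positive_rate(preds, labels):
    """FPR = false positives / actual negatives."""
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    negatives = sum(1 for y in labels if y == 0)
    return fp / negatives if negatives else 0.0

if __name__ == "__main__":
    # Toy binary predictions (1 = predicted income > $50K) and true labels.
    preds_m, labels_m = [1, 1, 0, 1], [1, 0, 0, 1]
    preds_f, labels_f = [0, 1, 0, 0], [1, 0, 0, 0]
    print("DP diff:", demographic_parity_diff(preds_m, preds_f))
    print("FPR gap:", false_positive_rate(preds_m, labels_m)
          - false_positive_rate(preds_f, labels_f))
```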
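
Likewise, the following sketch illustrates one plausible form of the probability-based association test described above: scoring a target word's log probability under prompts that differ only in a gendered pronoun. The checkpoint name, prompt template, and word list are assumptions for illustration; the thesis's exact templates and scoring procedure may differ.

```python
# A sketch of a log-probability association test, assuming a Hugging Face
# causal LM. The Qwen checkpoint, prompts, and occupations are examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-0.5B"  # assumed checkpoint; the thesis uses Qwen 2.5
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def continuation_logprob(prompt: str, continuation: str) -> float:
    """Sum of token log probabilities of `continuation` given `prompt`.

    Assumes the prompt's tokens are a prefix of the full string's tokens,
    which holds for typical BPE tokenizers when the continuation starts
    with a space.
    """
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # Score each continuation token; logits at position i-1 predict token i.
    for i in range(prompt_ids.shape[1], full_ids.shape[1]):
        total += log_probs[0, i - 1, full_ids[0, i]].item()
    return total

for occupation in ["engineer", "nurse"]:
    lp_he = continuation_logprob("He works as a", " " + occupation)
    lp_she = continuation_logprob("She works as a", " " + occupation)
    # Positive gap => the occupation is more probable after "He".
    print(f"{occupation}: log P(he) - log P(she) = {lp_he - lp_she:+.3f}")
```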

Keywords

Artificial Intelligence; Large Language Models; Bias; Classification; Gender Associations
