Date of Graduation

5-2024

Document Type

Thesis

Degree Name

Bachelor of Science in Data Science

Degree Level

Undergraduate

Department

Data Science

Advisor/Mentor

Schubert, Karl

Committee Member/Reader

Buttle, Casey

Committee Member/Second Reader

Mitchell, Rachael

Abstract

This thesis explores the application of multiprocessing and multithreading techniques in Python to optimize runtime efficiency in the analysis of retail data. As the volume of retail data processed by a program grows, so does the program's runtime. When this processing runs on only a single core, even one gigabyte of data can take upwards of half an hour to process, while larger datasets of 100 GB or more could take days, severely limiting the amount of retail data that can be processed in a reasonable amount of time. By employing multithreading and multiprocessing architectures in Python, a programming language commonly used in data analysis, this study evaluates their efficacy and feasibility in reducing the runtime of retail data processing to more manageable levels. The results of this study show the importance of concurrent computing paradigms in addressing the computational challenges posed by ever-expanding volumes of retail data.
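The abstract contrasts single-core processing with multithreaded and multiprocess execution in Python. The sketch below is a minimal illustration under stated assumptions, not the method or data used in the thesis: it generates hypothetical synthetic retail transactions (store id, quantity, unit price in cents), splits them into chunks, and aggregates revenue per store three ways: on a single core, with a thread pool, and with a process pool from the standard `concurrent.futures` module. The helper names `make_transactions`, `revenue_by_store`, `split`, and `run` are hypothetical.

```python
"""Sketch: single-core vs. thread pool vs. process pool aggregation of
hypothetical retail transactions (not the thesis's actual pipeline)."""
import random
import time
from collections import Counter
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def make_transactions(n, n_stores=5, seed=0):
    # Hypothetical synthetic records: (store_id, quantity, unit_price_cents)
    rng = random.Random(seed)
    return [(rng.randrange(n_stores), rng.randint(1, 10), rng.randint(100, 5000))
            for _ in range(n)]


def revenue_by_store(chunk):
    # CPU-bound work: sum quantity * unit price per store for one chunk
    totals = Counter()
    for store_id, qty, price in chunk:
        totals[store_id] += qty * price
    return totals


def split(records, n_chunks):
    # Contiguous slices of roughly equal size (ceiling division, at least 1)
    size = max(1, -(-len(records) // n_chunks))
    return [records[i:i + size] for i in range(0, len(records), size)]


def run(records, executor_cls=None, workers=4):
    if executor_cls is None:
        return revenue_by_store(records)  # single-core baseline
    totals = Counter()
    with executor_cls(max_workers=workers) as pool:
        # Each worker aggregates one chunk; partial results are merged here
        for partial in pool.map(revenue_by_store, split(records, workers)):
            totals.update(partial)  # Counter.update adds values per key
    return totals


if __name__ == "__main__":  # guard required when worker processes are spawned
    data = make_transactions(500_000)
    for label, cls in [("single core", None),
                       ("threads", ThreadPoolExecutor),
                       ("processes", ProcessPoolExecutor)]:
        start = time.perf_counter()
        result = run(data, cls)
        elapsed = time.perf_counter() - start
        print(f"{label:12s} {elapsed:.2f}s  total={sum(result.values())}")
```

In standard CPython builds, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so the thread pool is not expected to speed up this pure-Python, CPU-bound loop; threads tend to help more with I/O-bound steps such as reading files. The process pool sidesteps the GIL by running separate interpreters, at the cost of pickling each chunk to a worker, so its benefit depends on how much computation each chunk carries relative to that transfer.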

Keywords

Concurrent Processing, Multiprocessing, Multithreading

Available for download on Friday, May 09, 2025

Included in

Data Science Commons