Date of Graduation
5-2024
Document Type
Thesis
Degree Name
Bachelor of Science in Data Science
Degree Level
Undergraduate
Department
Data Science
Advisor/Mentor
Schubert, Karl
Committee Member
Buttle, Casey
Second Committee Member
Mitchell, Rachael
Abstract
This thesis explores the application of multiprocessing and multithreading techniques in Python to optimize runtime efficiency in the analysis of retail data. As the volume of retail data processed by a program increases, so does the runtime of the program. When this processing runs on only a single core, even a gigabyte of data can take upwards of half an hour to process, while larger datasets of 100 GB or more could take days, which heavily limits the amount of retail data that can be processed in a reasonable amount of time. By employing multithreading and multiprocessing architectures in Python, a programming language commonly used in data analysis, this study evaluates their efficacy and feasibility in reducing the runtime of retail data processing to more manageable levels. The results of this study show the importance of utilizing concurrent computing paradigms to address the computational challenges posed by the ever-expanding volumes of retail data.
Keywords
Concurrent Processing; Multiprocessing; Multithreading
Citation
Slavin, B. (2024). Concurrent Processing of Retail Data in Python to Optimize Runtime. Data Science Undergraduate Honors Theses. Retrieved from https://scholarworks.uark.edu/dtscuht/12