Date of Graduation


Document Type


Degree Name

Bachelor of Science in Data Science

Degree Level



Data Science


Schubert, Karl

Committee Member/Reader

Sims, Paul

Committee Member/Second Reader

Shepherd, Sabrina


In the rapidly evolving consumer-packaged goods (CPG) retail landscape, understanding the true values of the factors that influence sales performance is essential for strategic decision-making and effective resource allocation. To ensure the accuracy of these data points, this project uses CatBoost, a state-of-the-art gradient boosting technique, to predict the true attribution values in datasets sourced from CPG industry retailers.

By leveraging CatBoost’s built-in handling of categorical data and its robustness against overfitting, the models are optimized to predict the true attribution values for various items. Model performance is evaluated through rigorous cross-validation and compared against baseline models. The results demonstrate that CatBoost predicts true attribution values accurately, providing insights that CPG retailers can use to optimize their marketing strategies, promotional activities, and pricing tactics. More broadly, this research supports better decision-making and a better understanding of the items in the database.

This project was completed during an internship with Nuqleous, a retail intelligence software company. Over the course of a year, it was refined so that Nuqleous customers could determine whether any of their data attributes were incorrect and, if so, obtain the predicted value for that instance.


Consumer-Packaged Goods (CPG) industry, internship, machine learning techniques, CatBoost, multiclass datasets

Available for download on Sunday, May 04, 2025

Included in

Data Science Commons