Date of Graduation
5-2026
Document Type
Thesis
Degree Name
Bachelor of Science in Data Science
Degree Level
Undergraduate
Department
Data Science
Advisor/Mentor
Karl D Schubert
Committee Member
Karl D Schubert
Second Committee Member
Eric Specking
Third Committee Member
Kelly M Sullivan
Abstract
This thesis examines substitutability within Walmart apparel as a foundation for modular category optimization. Using large-scale item-level data, I develop an attribute-based framework that aggregates products to the fineline level, constructs a structured feature space, and identifies candidate substitute relationships through similarity-based matching within relevant merchandise groupings. The results show that Walmart item master data contains sufficient structure to support scalable substitute generation across a high-variety assortment. However, substitutability is not uniform: many item pairs exhibit high similarity but low observed demand transfer, indicating that structural similarity alone does not guarantee substitution. To address this, the framework is positioned within a two-stage approach that combines similarity-based candidate generation with sales-based validation. Empirical results show that only a small subset of relationships demonstrates meaningful substitution, with observed lift of up to 11% and high similarity among the strong pairs.
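The two-stage approach described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the fineline records, attribute names, similarity measure (share of matching attributes), thresholds, and lift values below are all hypothetical and chosen only to show how similarity-based candidate generation within a merchandise grouping could be followed by a sales-based filter.

```python
from itertools import combinations

# Hypothetical fineline records: (fineline_id, merchandise_group, attributes).
# None of these values come from the thesis data.
FINELINES = [
    ("F1", "mens_tees", {"color": "black", "fit": "regular", "sleeve": "short"}),
    ("F2", "mens_tees", {"color": "black", "fit": "regular", "sleeve": "long"}),
    ("F3", "mens_tees", {"color": "black", "fit": "slim", "sleeve": "long"}),
    ("F4", "womens_jeans", {"color": "black", "fit": "regular", "sleeve": "short"}),
]


def attribute_similarity(a, b):
    """Share of attributes on which two items agree (an assumed measure)."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    matches = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return matches / len(keys)


def candidate_pairs(finelines, threshold):
    """Stage 1: keep similar pairs, comparing only within the same group."""
    pairs = []
    for (id_a, grp_a, att_a), (id_b, grp_b, att_b) in combinations(finelines, 2):
        if grp_a != grp_b:
            continue  # never match across merchandise groupings
        sim = attribute_similarity(att_a, att_b)
        if sim >= threshold:
            pairs.append((id_a, id_b, sim))
    return pairs


def validate(pairs, lift_by_pair, min_lift):
    """Stage 2: keep candidates whose observed lift reaches min_lift."""
    kept = []
    for id_a, id_b, sim in pairs:
        lift = lift_by_pair.get((id_a, id_b), 0.0)  # missing pairs count as no lift
        if lift >= min_lift:
            kept.append((id_a, id_b, sim, lift))
    return kept


# Made-up lift values standing in for a sales-based measurement.
OBSERVED_LIFT = {("F1", "F2"): 0.08, ("F2", "F3"): 0.01}

candidates = candidate_pairs(FINELINES, threshold=0.5)
validated = validate(candidates, OBSERVED_LIFT, min_lift=0.05)
```

In this toy data, both F1/F2 and F2/F3 pass the similarity stage, but only F1/F2 survives validation, mirroring the abstract's point that high structural similarity alone does not guarantee substitution.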
Keywords
Substitutability Analysis; Apparel Analytics; Business Analytics
Citation
Sankaran, M. A. (2026). Modular Category Optimization for Substitutability: An Item-Level Approach. Data Science Undergraduate Honors Theses. Retrieved from https://scholarworks.uark.edu/dtscuht/28