Date of Graduation

5-2025

Document Type

Thesis

Degree Name

Bachelor of Science in Data Science

Degree Level

Undergraduate

Department

Data Science

Advisor/Mentor

Schubert, Karl

Committee Member

Brown, Jamelle

Second Committee Member

Davis, EmmaLe

Abstract

This honors thesis presents the development of a user-friendly data science application built with Streamlit, designed to facilitate data exploration and analysis for users with little to no prior knowledge of data science. The application is structured into five interactive pages, guiding users through the essential stages of data analysis: a welcome page, data cleaning, data visualization, modeling, and interpretation.

The welcome page serves as the entry point, allowing users to upload their data, which will be used throughout the subsequent pages. It also features the company logo for Bentley Ave Data Labs, the industry partner that requested the development of this application. This collaboration ensures that the app meets real-world industry needs while providing an accessible tool for data exploration.

The data cleaning page offers intuitive, clickable functions for basic preprocessing tasks such as handling missing values, removing duplicates, and dropping unnecessary columns. Once the data is cleaned, users can move to the data visualization page, where they can generate two customizable graphs by selecting variables to display, alongside a heatmap highlighting data outliers and a z-score graph for deeper insights. Users are also given the option to remove outliers directly from this interface.
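The preprocessing and outlier steps described above can be sketched in a few lines of pandas. This is a minimal illustration, not the application's actual code: the function names `clean_data` and `remove_outliers`, the median fill strategy, and the z-score threshold of 3 are all assumptions, and the Streamlit widgets that would call these functions are omitted.

```python
import pandas as pd


def clean_data(df, drop_columns=(), fill_missing=False):
    """Drop chosen columns, remove duplicate rows, and handle missing values."""
    cleaned = df.drop(columns=list(drop_columns)).drop_duplicates()
    if fill_missing:
        # Assumed strategy: fill numeric gaps with each column's median.
        cleaned = cleaned.fillna(cleaned.median(numeric_only=True))
    else:
        # Otherwise discard any row that still has a missing value.
        cleaned = cleaned.dropna()
    return cleaned.reset_index(drop=True)


def remove_outliers(df, threshold=3.0):
    """Keep rows whose numeric values all lie within `threshold` z-scores."""
    numeric = df.select_dtypes(include="number")
    z = (numeric - numeric.mean()) / numeric.std(ddof=0)
    # Constant columns produce NaN z-scores; treat those as non-outliers.
    mask = (z.abs().fillna(0) <= threshold).all(axis=1)
    return df[mask].reset_index(drop=True)


# Hypothetical example data, for illustration only.
raw = pd.DataFrame(
    {
        "id": [1, 2, 2, 3],
        "x": [1.0, 2.0, 2.0, None],
        "junk": ["a", "b", "b", "c"],
    }
)
dropped = clean_data(raw, drop_columns=["junk"])
filled = clean_data(raw, drop_columns=["junk"], fill_missing=True)

spiky = pd.DataFrame({"value": [10.0] * 20 + [1000.0]})
trimmed = remove_outliers(spiky)
```

Keeping each operation in a small function that takes and returns a DataFrame would let a button or checkbox on the page trigger it independently of the others.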

The modeling page lets users choose among three categorical models and three regression models to fit their data. This streamlined approach allows users to build and evaluate machine learning models without writing any code, making data science accessible to a wider audience.

The interpretation page ties together the insights gathered throughout the application by presenting the results of the chosen machine learning model in an easy-to-understand format. Users are shown key performance metrics such as accuracy, R-squared, and feature importance. Additionally, the page displays a confusion matrix or residual plot to help users visualize how well the model performed. This final step not only reinforces the outcomes of their analysis but also supports informed decision-making by translating complex results into clear, actionable information.
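The modeling and interpretation steps described above can be sketched with scikit-learn. This is a hedged illustration only: the abstract does not name the six models, so the two per task listed in the hypothetical `MODELS` registry are assumptions, as are the `fit_and_score` helper, the 25% test split, and the synthetic data.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical registry: the thesis does not say which models it offers.
MODELS = {
    "classification": {
        "Logistic Regression": lambda: LogisticRegression(max_iter=1000),
        "Random Forest": lambda: RandomForestClassifier(random_state=0),
    },
    "regression": {
        "Linear Regression": lambda: LinearRegression(),
        "Random Forest": lambda: RandomForestRegressor(random_state=0),
    },
}


def fit_and_score(X, y, task, model_name, test_size=0.25, random_state=0):
    """Fit the selected model and return the metrics for its task type."""
    model = MODELS[task][model_name]()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    if task == "classification":
        return {
            "accuracy": accuracy_score(y_test, predictions),
            "confusion_matrix": confusion_matrix(y_test, predictions),
        }
    # Regression: R-squared plus the residuals a residual plot would show.
    return {
        "r2": r2_score(y_test, predictions),
        "residuals": y_test - predictions,
    }


# Synthetic data, for illustration only.
X_cls, y_cls = make_classification(n_samples=200, n_features=5, random_state=0)
cls_result = fit_and_score(X_cls, y_cls, "classification", "Logistic Regression")

X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)
reg_result = fit_and_score(X_reg, y_reg, "regression", "Linear Regression")
```

A lookup table keyed by task and model name maps naturally onto a pair of dropdown menus, so a user's two selections are enough to pick the estimator and the matching metrics.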

Through this application, the thesis aims to bridge the gap between complex data analysis techniques and users who lack programming experience, offering an accessible platform for understanding and interpreting data with ease and confidence.

Keywords

Data Science; Web Application; Machine Learning Modeling; No Code

Available for download on Monday, May 15, 2028
