Date of Graduation
5-2025
Document Type
Thesis
Degree Name
Bachelor of Science in Mathematics
Degree Level
Undergraduate
Department
Mathematical Sciences
Advisor/Mentor
Jiahui Chen
Abstract
Influenza is one of the most common viral infections each year, and the virus can mutate rapidly. Viral mutations pose significant threats to public health by increasing infectivity and strengthening vaccine resistance. To track these evolving patterns, agencies like the CDC annually evaluate thousands of virus strains to understand viral mutagenesis and evolution in depth. A computational method for analyzing high-dimensional, noisy virus data could therefore aid in the rapid identification of antigens essential for an effective influenza vaccine for the upcoming season. By integrating genomic analysis, clustering, and dimensionality reduction, this study aims to develop such a method for tracking influenza virus mutation patterns. The method can also be applied to investigate differences in influenza mutagenesis before and after the COVID-19 pandemic, which may stem from preventative measures such as quarantining and wearing masks. More specifically, K-means clustering is applied to pre- and post-COVID-19 influenza datasets that are reduced by principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP), then labeled by collection time. Post-COVID-19 influenza data is transformed onto the embedding learned from pre-COVID-19 influenza data to identify unique clustering of the most recent influenza sequences. Furthermore, the latent space of a variational graph autoencoder is explored as a dimensionality reduction method. This method can embed similarities between sequences in both collection time and specific mutations, supporting future mutation prediction. Findings indicate that clustering and dimensionality reduction provide insight into the complex dynamics of viral mutation, informing both future research directions and strategies for public health intervention.
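The PCA branch of the pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis code: the one-hot encoding, the toy sequences, the helper names (`one_hot`, `fit_pca`, `project`, `assign`, `kmeans`), and the choice of two components and two clusters are all assumptions made for illustration. The sketch fits PCA on the pre-COVID-19 set only, projects the post-COVID-19 set onto that same embedding, and runs a plain Lloyd-style K-means on the reduced pre-COVID-19 points.

```python
import numpy as np

def one_hot(seqs, alphabet="ACGT"):
    """Encode equal-length sequences as a flat one-hot matrix."""
    idx = {ch: i for i, ch in enumerate(alphabet)}
    X = np.zeros((len(seqs), len(seqs[0]) * len(alphabet)))
    for r, s in enumerate(seqs):
        for p, ch in enumerate(s):
            if ch in idx:  # symbols outside the alphabet stay all-zero
                X[r, p * len(alphabet) + idx[ch]] = 1.0
    return X

def fit_pca(X, n_components=2):
    """Return the training mean and the top principal axes (as rows)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, axes):
    """Map data onto an embedding that was fitted on other data."""
    return (X - mean) @ axes.T

def assign(Z, centers):
    """Index of the nearest center for every row of Z."""
    d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd iterations with randomly chosen initial centers."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(Z, centers)
        new = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(Z, centers), centers

# Hypothetical toy sequences, not real influenza data.
pre_seqs = ["ACGTACGT", "ACGTACGA", "ACGTACGC",
            "TTGTACCC", "TTGTACCA", "TTGTACCG"]
post_seqs = ["ACGTACGT", "TTGAACCC"]

mean, axes = fit_pca(one_hot(pre_seqs))            # fit on pre-COVID-19 only
Z_pre = project(one_hot(pre_seqs), mean, axes)
Z_post = project(one_hot(post_seqs), mean, axes)   # reuse the same embedding

pre_labels, centers = kmeans(Z_pre, k=2)
post_labels = assign(Z_post, centers)              # nearest pre-COVID-19 cluster
```

Fitting on one dataset and projecting another is straightforward for PCA because the embedding is a linear map. t-SNE has no comparable built-in map for new points, so that branch would need a different strategy, which this sketch does not attempt.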
Keywords
influenza; dimensionality reduction
Citation
Walden, E. (2025). Mathematics-AI Based Phylogenetic Analysis of Influenza Virus Mutation Data. Mathematical Sciences Undergraduate Honors Theses. Retrieved from https://scholarworks.uark.edu/mascuht/10