Date of Graduation

5-2026

Description

Frequent mutations in influenza virus surface proteins can increase infectivity while evading human and vaccine-induced immunity, causing seasonal epidemics. Each year, the CDC evaluates thousands of virus strains to predict which mutated sequences are likely to dominate the next season, which creates a need for methods that better capture how viral mutations evolve over time. In this study, we represent influenza protein sequences as a graph, connecting two sequences if they were collected a week apart and differ by only one mutation. This construction explicitly treats collection time as part of the evolution, whereas other existing methods analyze mutated sequences without time information. We use a machine learning model called a Variational Graph Autoencoder (VGAE) to learn a compact mathematical representation of the flu sequences, and then apply the learned model to new data to determine whether new sequences are close enough to be connected. First, each data point collects information from neighboring sequences to update its own representation, becoming more similar to the sequences near it. Then, the data point is mapped to a range of likely mutations, formally defined as a probability distribution in a lower-dimensional space, whose randomness can help represent the randomness of mutations. To evaluate unseen sequences, the model takes in pairs of new sequences and maps them to representations in the lower-dimensional space. Sequences that are more closely related are mapped closer together by the learned model and are therefore more likely to be connected to each other. This framework allows us to see how well unseen sequences align with the learned distribution of influenza mutations. When evaluated on H1N1 data, the model achieves an AUC of 0.696 on classifying connections between sequences, compared with an AUC of 0.5 for random guessing. Our findings indicate that checking new sequences against the learned distribution can reveal relationships between new and old sequences, supporting influenza surveillance and vaccine strain selection.
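
The graph construction rule described above (connect two sequences when they were collected about a week apart and differ by one mutation) can be illustrated with a minimal Python sketch. This is not the thesis code. The function names `hamming` and `build_edges`, the toy sequences, and the dates are hypothetical. The sketch also assumes that sequences are aligned to equal length, that "one mutation" means exactly one differing position, and that "a week apart" means collection dates at most 7 days apart; the thesis may define these differently.

```python
from datetime import date
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def build_edges(samples, max_gap_days=7):
    """Return index pairs (i, j), i < j, that satisfy both rules.

    samples: list of (sequence, collection_date) tuples.
    max_gap_days: assumed reading of "a week apart" (hypothetical parameter).
    """
    edges = []
    for (i, (seq_i, day_i)), (j, (seq_j, day_j)) in combinations(enumerate(samples), 2):
        gap = abs((day_j - day_i).days)
        # Connect only if close in time AND exactly one position differs.
        if gap <= max_gap_days and hamming(seq_i, seq_j) == 1:
            edges.append((i, j))
    return edges


# Toy data, made up for illustration only.
samples = [
    ("MKAIL", date(2024, 1, 1)),
    ("MKAIV", date(2024, 1, 8)),   # one substitution, 7 days after sample 0
    ("MKTIV", date(2024, 1, 15)),  # one substitution, 7 days after sample 1
    ("MKAIL", date(2024, 3, 1)),   # same as sample 0, but collected months later
]
edges = build_edges(samples)
print(edges)
```

In this toy example, samples 0 and 2 are not connected because they differ at two positions and were collected 14 days apart, and sample 3 stays isolated because its collection date is too far from the others. An edge list of this kind is the sort of input a graph autoencoder such as a VGAE would be trained on.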

Publication Date

2026

Document Type

Book

Degree Name

Bachelor of Science in Mathematics

Degree Level

Undergraduate

Department

Mathematical Sciences

Advisor/Mentor

Chen, Jiahui

Disciplines

Mathematics

Keywords

Natural Science

Temporal Variational Graph Autoencoder for Influenza Evolution

Included in

Mathematics Commons
