Following the Flu

Standing next to my poster at the Research Symposium

Author: Emilee Walden | Major: Applied Mathematics and Biology | Semester: Spring 2024

Throughout the Spring 2024 semester, I worked alongside my mentor, Dr. Jiahui Chen, to evaluate trends in influenza (flu) virus mutation patterns. Dr. Chen arrived at the University of Arkansas in the Fall of 2024 as a professor in the Department of Mathematical Sciences with research experience in mathematical biology. This combination of interests was a perfect fit for my majors, Applied Mathematics and Biology. I had already reached out to 11 professors across the math and biology departments before the beginning of my junior year, hoping to find an honors thesis advisor, and I was losing hope of finding a project that combined both majors. So of course I rejoiced when I heard the news of Dr. Chen’s arrival at the UofA, which came at the perfect time, and I began meeting with him over Zoom before he even made it to Arkansas.

Inspired by Dr. Chen’s previous work predicting the mutational trends of COVID-19, my goal was to develop a computational technique to predict mutational patterns of the flu virus before they arise. The flu virus evolves continuously, and it threatens human immunity whenever antibodies from vaccines or previous infections can no longer recognize it. A technique like this could therefore aid in the development of targeted vaccines and antiviral therapies for the flu. We first applied a Jaccard distance metric, which measures the dissimilarity of each flu sequence in the dataset to every other sequence, to H1N1 and H3N2 sequence data collected in the USA from 2018 to 2023 and sourced from the NCBI database. Then we combined each of three dimensionality reduction methods, principal component analysis (PCA), uniform manifold approximation and projection (UMAP), and t-distributed stochastic neighbor embedding (t-SNE), with k-means clustering to better visualize the most common clusters, or trends, in the high-dimensional data. Each point on the plots was colored by the date the sequence was collected, and we found that the method preserves temporal patterns. We also fit PCA on the 2018-2019 data and then applied the resulting embedding to the 2020-2023 data. This revealed mutation trends over time that were not apparent when k-means clustering was applied directly to the 2020-2023 data.
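For readers who want to see the idea in code, here is a minimal sketch of the Jaccard-then-PCA step. It is not the code from my project. It rests on several assumptions of my own: each sequence is turned into a set of k-mers (short substrings), every sequence is described by its Jaccard distances to the "early" sequences, and the toy strings, function names, and the choice of k=3 are all made up for illustration. The UMAP, t-SNE, and k-means steps are left out to keep it short.

```python
import numpy as np

def kmer_set(seq, k=3):
    """Return the set of length-k substrings of a sequence (an assumed encoding)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard_distance(a, b):
    """Jaccard distance: 1 - |A and B| / |A or B|. Two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def distance_features(seqs, anchors, k=3):
    """Row i holds the Jaccard distance from seqs[i] to every anchor sequence."""
    anchor_sets = [kmer_set(s, k) for s in anchors]
    return np.array(
        [[jaccard_distance(kmer_set(s, k), t) for t in anchor_sets] for s in seqs]
    )

def fit_pca(X, n_components=2):
    """Fit PCA with an SVD of the centered data; return the mean and the top axes."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(X, mean, components):
    """Project data onto axes that were fit on other data."""
    return (X - mean) @ components.T

# Hypothetical toy sequences standing in for "early" and "late" samples.
early = ["ACGTACGTAC", "ACGTACGTAA", "ACGTTCGTAC", "ACGAACGTAC"]
late = ["ACGTTCGAAC", "TCGTTCGAAC"]

X_early = distance_features(early, early)  # pairwise distance matrix
X_late = distance_features(late, early)    # later samples vs. the early ones

mean, components = fit_pca(X_early)        # "train" on the early data only
emb_early = apply_pca(X_early, mean, components)
emb_late = apply_pca(X_late, mean, components)  # reuse the same embedding
```

Because the later samples are projected onto axes learned from the earlier ones, any drift away from the early clusters shows up as movement in the same coordinate system, which is the intuition behind training on one period and applying the embedding to the next.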

I presented my results in a poster at the UofA’s Undergraduate Research Symposium, where it won first place in my category! My progress and success are largely due to Dr. Chen’s guidance. We met in person twice a week and emailed far more often than that. Having a mentor who is so available, patient, and consistent kept me motivated to keep taking the next step in my project, even when my classes were busy. Our meetings often looked like a small lesson at the whiteboard, with pictures of the mathematical and computational concepts I was struggling with, after which I was given a list of small tasks to try over the next few days. This structure made progress manageable nearly every day.

Moving forward, I will continue my research throughout the summer until I defend my honors thesis in the fall. I also hope to publish my work before I graduate. After that, I plan to begin graduate school to pursue a PhD in Bioinformatics and Genomics and become a research professor.