Computational Advances in Comparative Biology for Modeling Messy Trait Data

Jenniffer Roa Lozano, University of Arkansas, Fayetteville

Abstract

As comparative trait data continue to grow in scale and complexity, the need for flexible and efficient tools for evolutionary simulation and model selection has become increasingly critical. This thesis presents two complementary advances in the study of continuous trait evolution that address this demand. In Chapter 1, we introduce TraitTrainR, an R package designed to perform fast, large-scale simulations under complex evolutionary models. TraitTrainR supports a variety of trait data transformations, supports multi-trait evolution, and allows users to define flexible parameter spaces, including model stacking and measurement error. Its design helps bridge the gap between evolutionary theory and practical inference by enabling well-organized simulation pipelines for testing statistical power, exploring model behavior, and evaluating the effects of measurement error. We demonstrate the utility of TraitTrainR by applying it to three empirical phylogenetic datasets (Primates, Fungi, Arthropods) to assess how trait imprecision affects model selection accuracy, even when standard error is estimated during fitting. In Chapter 2, we explore an alternative strategy for evolutionary model selection based on supervised learning. We introduce Evolutionary Discriminant Analysis (EvoDA), a classification framework that predicts evolutionary models from trait data using discriminant functions. EvoDA is evaluated using simulated and empirical case studies involving fungal gene expression, targeting challenging scenarios with noisy traits and multiple candidate models. Compared to conventional approaches such as the Akaike information criterion (AIC), EvoDA achieved higher classification accuracy, especially under measurement error. Most genes were classified under Ornstein-Uhlenbeck (OU) models, reflecting known patterns of stabilizing selection in gene expression. Together, these two contributions, TraitTrainR and EvoDA, offer powerful tools for advancing comparative biology.
TraitTrainR enables large-scale simulation, while EvoDA extends the phylogenetic comparative methods (PCM) toolkit with a novel classification approach capable of handling complex and noisy trait data. These methods provide new opportunities for understanding evolutionary processes and improving model-based inference across diverse biological systems.