Research overview

    We are a biological data science research group that focuses on three major areas: human evolutionary genomics, statistical population genomics, and mathematical and algorithmic phylogenomics. Specifically, we work in human evolutionary genomics by employing information from ancient and modern DNA samples to elucidate the evolutionary history of populations in the Americas. We also develop statistical approaches, including likelihood and machine learning methods, for identifying genomic regions undergoing natural selection. Moreover, we design and theoretically assess algorithms for inferring phylogenies when genomic signals conflict. For more detailed information, please read the sections below.

Statistical population genomics

    Population genetics is the study of how various evolutionary processes affect allele frequencies within and among populations over time. Some evolutionary processes that can influence allele frequencies are mutation, migration, genetic drift, and natural selection. Mathematical models enable us to make predictions about patterns of genetic variation expected under different evolutionary processes, which we can use to develop statistics for making inferences about evolutionary processes from genetic data. Our specific interests in this area include constructing evolutionary models to study how population history shapes genetic variation, developing quantitative techniques for inferring population history and adaptation, and designing statistics to assess differences in genetic variation within and among populations. An example of our work in statistical population genetics can be found here, which depicts some of the evolutionary models that we explored to investigate modern human origins.
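
    To make the idea of a model-based prediction concrete, the sketch below simulates genetic drift alone, one of the processes named above, under the textbook Wright-Fisher model. It is only an illustration and not one of the models from our own work; the function name `wright_fisher_drift` and its parameters are hypothetical, and the population is assumed to be diploid, of constant size, and randomly mating, with no mutation, migration, or selection.

```python
import random


def wright_fisher_drift(n_diploid, p0, generations, seed=None):
    """Track an allele frequency under pure genetic drift.

    Assumes a constant-size, randomly mating diploid population
    (2 * n_diploid gene copies) with no mutation, migration, or selection.
    Returns the list of frequencies, starting with p0.
    """
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be between 0 and 1")
    rng = random.Random(seed)
    copies = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each gene copy in the next generation is drawn independently
        # from the current generation (binomial sampling).
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    # Smaller populations are expected to drift faster on average.
    for n in (10, 1000):
        path = wright_fisher_drift(n_diploid=n, p0=0.5, generations=50, seed=1)
        print(n, round(path[-1], 3))
```

    Running several replicates with different seeds and comparing small and large values of `n_diploid` gives a feel for the kind of expectation that a summary statistic can then be designed to detect.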

Human evolutionary genomics

    Understanding the evolutionary processes that shaped the distribution of human genetic variation is central to the study of human population genetics. Novel high-throughput sequencing methods and increased computational power have provided geneticists with the tools needed to investigate the evolutionary processes driving human diversity. In particular, the availability of whole-genome data from a variety of modern and archaic human populations enables geneticists to answer questions about modern human origins by testing hypotheses of human evolutionary history, as well as questions about how humans have adapted to their environments by searching for genomic regions that display signatures of natural selection. An example of our work in human evolutionary genomics can be found here, which shows our top candidate gene for high-altitude adaptation in Ethiopians revealed through a genome-wide scan.

Mathematical and algorithmic phylogenomics

    Phylogenetics is the study of evolutionary relationships among species. A common hurdle when using genetic data to estimate the branching pattern of a set of species (known as the species tree topology) is that the branching patterns at different genomic regions (known as gene tree topologies) can differ. Reconciling conflicting gene tree topologies into a single species tree topology is becoming particularly important due to the rapid growth in genetic datasets, which increases the probability of observing loci with conflicting gene tree topologies. A number of factors can cause conflicting gene tree topologies, including incomplete lineage sorting, recombination, and mutation. We employ retrospective mathematical models to study the evolution of gene trees embedded in model species trees and to investigate the processes shaping distributions of gene tree topologies. An example of our work in phylogenetics can be found here, which depicts gene tree discordance due to incomplete lineage sorting, a phenomenon that occurs when genetic lineages fail to find a common ancestor in the most recent ancestral population in which they are able to do so.
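
    As a small worked case of incomplete lineage sorting, consider the standard three-species result under the multispecies coalescent, with one gene copy sampled per species. For a species tree ((A,B),C) whose internal branch has length T in coalescent units, the lineages from A and B fail to coalescent on that branch with probability exp(-T); when that happens, the three possible gene tree topologies are equally likely. The sketch below evaluates these probabilities. The function name and the topology labels are hypothetical, and the calculation ignores recombination within loci and any other source of discordance.

```python
import math


def three_taxon_gene_tree_probs(t_coalescent):
    """Gene tree topology probabilities for the species tree ((A,B),C).

    t_coalescent is the internal branch length in coalescent units.
    Assumes one sampled lineage per species and that incomplete lineage
    sorting is the only source of gene tree discordance.
    """
    if t_coalescent < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the A and B lineages do not coalesce on the
    # internal branch of the species tree.
    p_no_coalescence = math.exp(-t_coalescent)
    # Given no coalescence, each of the three topologies has probability 1/3.
    discordant_each = p_no_coalescence / 3.0
    concordant = 1.0 - 2.0 * discordant_each
    return {
        "((A,B),C)": concordant,
        "((A,C),B)": discordant_each,
        "((B,C),A)": discordant_each,
    }


if __name__ == "__main__":
    for t in (0.1, 1.0, 5.0):
        print(t, three_taxon_gene_tree_probs(t))
```

    With a branch length of zero the three topologies are equally likely, and the concordant topology becomes increasingly dominant as the internal branch lengthens.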