Annual Review of Statistics and Its Application

Publication date: 2023-01-01
Volume: 10 Pages: 353 - 377
Publisher: Annual Reviews

Author:

Hassler, Gabriel W ; Magee, Andrew F ; Zhang, Zhenyu ; Baele, Guy ; Lemey, Philippe ; Ji, Xiang ; Fourment, Mathieu ; Suchard, Marc A

Keywords:

Science & Technology, Physical Sciences, Mathematics, Interdisciplinary Applications, Statistics & Probability, Mathematics, Bayesian networks, continuous-time Markov processes, Gaussian processes, phylogenetic comparative methods, phylogeography, TIME MARKOV-CHAINS, EVOLUTIONARY TREES, MAXIMUM-LIKELIHOOD, DNA-SEQUENCES, INFERENCE, MODELS, COALESCENT, BOOTSTRAP, PROPOSALS, DYNAMICS, G0E1420N#55517644, G098321N#56127204, G051322N#56762601, C14/18/094#54689608, 0103 Numerical and Computational Mathematics, 0104 Statistics, 4905 Statistics

Abstract:

Researchers studying the evolution of viral pathogens and other organisms increasingly encounter and use large and complex data sets from multiple different sources. Statistical research in Bayesian phylogenetics has risen to this challenge. Researchers use phylogenetics not only to reconstruct the evolutionary history of a group of organisms, but also to understand the processes that guide its evolution and spread through space and time. To this end, it is now the norm to integrate numerous sources of data. For example, epidemiologists studying the spread of a virus through a region incorporate data including genetic sequences (e.g. DNA), time, location (both continuous and discrete) and environmental covariates (e.g. social connectivity between regions) into a coherent statistical model. Evolutionary biologists routinely do the same with genetic sequences, location, time, fossil and modern phenotypes, and ecological covariates. These complex, hierarchical models readily accommodate both discrete and continuous data and have enormous combined discrete/continuous parameter spaces including, at a minimum, phylogenetic tree topologies and branch lengths. The increased size and complexity of these statistical models have spurred advances in computational methods to make them tractable. We discuss both the modeling and computational advances below, as well as unsolved problems and areas of active research.