Joint Statistical Meetings 2019, Date: 2019/07/27 - 2019/08/01, Location: Denver, Colorado, United States

Publication date: 2019-07


Baele, Guy
Gill, Mandev ; Lemey, Philippe ; Suchard, Marc ; Rambaut, Andrew


Reconstructing pathogen dynamics from genetic data as they become available during an epidemic represents an important statistical scenario in which observations arrive sequentially in time and one is interested in performing inference in an ‘online’ fashion. Widely used Bayesian phylogenetic inference packages are not set up for this purpose, generally requiring one to recompute trees and evolutionary model parameters de novo when new data arrive. To accommodate increasing data flow in a Bayesian phylogenetic framework, we introduce a novel approach to efficiently update the posterior distribution with newly available genetic data. Our procedure inserts new taxa into the current phylogeny and imputes plausible values for new model parameters to accommodate growing dimensionality. This augmentation creates informed starting values and uses optimally tuned transition kernels for posterior exploration of growing data sets, reducing the time necessary to converge to target posterior distributions. We use this approach to demonstrate a considerable reduction in time required to obtain posterior estimates at different time points of the recent West African Ebola virus epidemic.
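
As a purely illustrative sketch of the augmentation step (not the implementation presented in this work), the Python snippet below inserts one new taxon into an existing tree and imputes a starting value for its new branch-specific parameter. It rests on assumptions the abstract does not state: the attachment point is chosen as the existing tip with the smallest Hamming distance to the new sequence, and the new branch rates are copied from that tip. The function names `hamming` and `insert_taxon` and the dictionary-based tree encoding are hypothetical.

```python
from typing import Dict, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def insert_taxon(
    parent: Dict[str, Optional[str]],   # node -> parent node (root maps to None)
    rate: Dict[str, float],             # node -> rate on the branch above it
    seqs: Dict[str, str],               # tip name -> aligned sequence
    new_name: str,
    new_seq: str,
) -> Tuple[Dict[str, Optional[str]], Dict[str, float], Dict[str, str], str]:
    """Return augmented copies of the tree, rates and sequences.

    Assumption (not from the abstract): the new taxon is attached as a
    sibling of the genetically closest existing tip, and the two new
    branches inherit that tip's rate as an informed starting value.
    """
    # Closest existing tip; sorting makes tie-breaking deterministic.
    nearest = min(sorted(seqs), key=lambda tip: hamming(seqs[tip], new_seq))

    # Work on copies so the current state is left untouched.
    parent, rate, seqs = dict(parent), dict(rate), dict(seqs)

    # A new internal node splits the branch above the nearest tip.
    internal = f"anc_{nearest}_{new_name}"
    parent[internal] = parent[nearest]
    parent[nearest] = internal
    parent[new_name] = internal

    # Impute starting values for the parameters of the new branches.
    rate[internal] = rate[nearest]
    rate[new_name] = rate[nearest]

    seqs[new_name] = new_seq
    return parent, rate, seqs, nearest
```

In an online analysis, the augmented state returned here would serve only as the starting point for further posterior sampling; the imputed values are subsequently updated by the sampler rather than treated as estimates.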