Molecular dating of phylogenies by likelihood methods of execution

Mutations that accumulate in the genome of cells or viruses can be used to infer their evolutionary history. In the case of rapidly evolving organisms, genomes can reveal their detailed spatiotemporal spread.

Such phylodynamic analyses are particularly useful to understand the epidemiology of rapidly evolving viral pathogens. As the number of genome sequences available for different pathogens has increased dramatically in recent years, phylodynamic analysis with traditional methods has become challenging because these methods scale poorly with growing datasets. Here, we present TreeTime, a Python-based framework for phylodynamic analysis using an approximate Maximum Likelihood approach.

TreeTime can estimate ancestral states, infer evolution models, reroot trees to maximize temporal signals, estimate molecular clock phylogenies and population size histories. The runtime of TreeTime scales linearly with dataset size.

Molecular Biology and Evolution CodonPhyML:...

Phylogenetics uses differences between homologous sequences to infer the history of the sample and learn about the evolutionary processes that gave rise to the observed diversity.

In the absence of recombination, this history is a tree along which sequences descend from ancestors with modification.

In general, the reconstruction of phylogenetic trees is a computationally difficult problem, but efficient heuristics often produce reliable reconstructions in polynomial time (Felsenstein; Price, Dehal, and Arkin; Stamatakis). Such heuristics become indispensable for large datasets of hundreds or thousands of sequences.

Again, exact inference from large datasets is computationally expensive since it requires high-dimensional optimization of complex likelihood functions or extensive sampling of the posterior distribution. Efficient heuristics are needed to cope with the growing datasets available today. One particularly common inference problem is estimating the time of historical events from sequence data. Molecular clock methods have been used to date the divergence of ancient proteins billions of years ago as well as the spread of viruses on time scales of less than a year (Langley and Fitch; Rambaut; Yoder and Yang; Sanderson). Beyond dating individual divergence events or a common ancestor, algorithms have been developed to infer trees where branch lengths correspond directly to elapsed time and each node is placed such that its position reflects its known or inferred date.

Such trees are known as time trees, molecular clock phylogenies, or time-stamped phylogenies. These methods have been generalized to allow for variation in substitution rates between different branches of the tree and between sites along a sequence. For a review of such methods, see Kumar and Hedges. In addition to questions regarding natural history, time trees are useful to study epidemiology and pathogen evolution (Gardy, Loman, and Rambaut). In outbreak scenarios such as the recent Ebola virus (EBOV) or Zika virus outbreaks, rapid near real-time analysis of large numbers of viral genomes has the potential to assist epidemiological analysis and containment efforts, provided sample collection, sequencing, and analysis are sufficiently rapid (Gardy, Loman, and Rambaut). BEAST samples many possible histories to evaluate posterior distributions of divergence times, evolutionary rates, and many other parameters.

BEAST implements a large number of different phylogenetic and phylogeographic models. The sampling of trees, however, results in run-times of days to weeks for moderately large datasets of a few hundred sequences.

We developed a new tool called TreeTime that combines efficient heuristics with probabilistic sequence evolution models. TreeTime infers maximum likelihood time trees of a few thousand tips within a few minutes.

TreeTime was designed for applications in molecular epidemiology and analysis of rapidly evolving heterochronous viral sequences (Volz, Koelle, and Bedford). It is already in use as an integral component of the real-time outbreak tracking tools nextstrain and nextflu (Neher and Bedford). The main applications of TreeTime are ancestral state inference, evolutionary model inference, and time tree estimation.

We discuss the core algorithms briefly below. Iteration is used on multiple levels, for example by iterating optimization of branch lengths, ancestral sequences, parameters of the relaxed clock, or coalescent models. Such an iterative procedure typically converges quickly when the branch lengths of the tree are short such that ancestral sequence inference has little ambiguity.

Ancestral sequences or node positions can be determined to optimize the joint or marginal likelihood. A joint maximum-likelihood assignment corresponds to the global configuration with highest likelihood. In a marginal maximum-likelihood assignment, individual parameters are assigned to the most likely value after summing or integrating over all other unknown states.
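The difference between the two assignments can be seen in a toy example. The sketch below uses a made-up joint probability table for two unknown binary states (the numbers and variable names are purely illustrative and not taken from TreeTime) and shows that the joint and marginal maximum-likelihood assignments need not agree.

```python
import numpy as np

# Hypothetical joint probabilities P(a, b) of two unknown binary states,
# e.g. the states of two neighbouring internal nodes (rows: a, columns: b).
joint = np.array([
    [0.40, 0.00],   # a = 0
    [0.30, 0.30],   # a = 1
])

# Joint assignment: the single configuration (a, b) with the highest probability.
joint_assignment = tuple(
    int(i) for i in np.unravel_index(np.argmax(joint), joint.shape)
)

# Marginal assignment: each state is chosen after summing over the other one.
marginal_a = joint.sum(axis=1)
marginal_b = joint.sum(axis=0)
marginal_assignment = (int(np.argmax(marginal_a)), int(np.argmax(marginal_b)))

print("joint assignment:   ", joint_assignment)
print("marginal assignment:", marginal_assignment)
```

In this table the most likely configuration as a whole sets a = 0, while a = 1 is the more likely value once b has been summed over.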

However, when branch lengths are short and only a minority of sites change on a given branch, a joint optimization of branch lengths and ancestral sequences can be achieved by iteratively inferring branch lengths and ancestral sequences, since corrections due to recurrent substitutions are negligible.

Likewise, maximum-likelihood branch lengths given the parent and offspring sequences are easy to optimize. We use this iterative optimization scheme to rapidly optimize branch lengths and ancestral sequences. For more divergent sequences, however, subleading states of internal nodes make a substantial contribution and the iterative optimization will underestimate the branch lengths.

In this case, TreeTime can use branch lengths provided in the input tree.
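To illustrate why branch lengths are easy to optimize once parent and offspring sequences are fixed, the following sketch uses the Jukes-Cantor model, for which the maximum-likelihood branch length has the closed form d = -(3/4) ln(1 - 4p/3), with p the fraction of differing sites. The function name and the example sequences are hypothetical; TreeTime itself supports more general models and is not limited to this formula.

```python
import math

def jc_branch_length(parent, child):
    """Maximum-likelihood Jukes-Cantor branch length (substitutions per site)
    for a fixed pair of parent and child sequences of equal length."""
    if len(parent) != len(child):
        raise ValueError("sequences must have the same length")
    n_diff = sum(a != b for a, b in zip(parent, child))
    p = n_diff / len(parent)
    if p >= 0.75:
        return math.inf          # saturated: no finite maximum-likelihood estimate
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

parent = "ACGT" * 25             # 100 sites
child = "TT" + parent[2:]        # 2 of 100 sites differ

p_observed = 0.02
d = jc_branch_length(parent, child)
print(f"observed fraction of differences: {p_observed:.4f}")
print(f"Jukes-Cantor branch length:       {d:.4f}")
```

For a short branch like this one, the correction for recurrent substitutions is a small fraction of the observed difference, which is the regime in which the iterative scheme works well.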

For a fixed tree topology, TreeTime infers ancestral sequences maximizing the joint sequence likelihood (see above). This approach is similar to the approach by Rambaut, but the dynamic programming technique avoids computationally expensive numerical optimization of the branch lengths.

In analogy to maximum-likelihood inference of ancestral sequences, the algorithm proceeds via a post-order tree traversal propagating the maximum-likelihood assignments of subtrees towards the root, and a pre-order traversal selecting the optimal subtree given the placement of the parent node. Specifically, a quantity is calculated in post-order for each node n in which E_n(t) accounts for external constraints imposed on the date of the node.

The time t is measured as time before present. Temporal information is propagated along the branches of the tree via a distribution that is conditional on the sequences assigned to node n and its parent.

Here, C_n(t_p) is the distribution of the date t_p of the parent of node n, given the constraints from the tips descending from node n and the substitutions that accumulated on the branch to the parent node.

The different quantities are illustrated in the accompanying figure. Terminal nodes in the tree are either associated with exact dates or date ranges (node c_2 in this example). Once the post-order traversal arrives at the root, the marginal distribution of the time t of the root node r is available. The corresponding marginal distributions of other nodes are then calculated during a pre-order traversal.

The result of the marginal reconstruction is a probability distribution of the node date given the tree, the ancestral sequence assignment, and the evolutionary model while the unknown times of other nodes are traced out.
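As a rough illustration of the post-order step, the sketch below computes the marginal date distribution of a root with two dated tips on a discrete time grid. It is a simplified stand-in, not TreeTime's implementation: the function names are hypothetical, the branch model is assumed to be a Poisson likelihood for the number of substitutions on each branch, and the clock rate and substitution counts are invented.

```python
import numpy as np

def tip_constraint(grid, sampling_date):
    """E_n(t) for a tip with an exact date: all weight on the nearest grid point."""
    H = np.zeros_like(grid)
    H[np.argmin(np.abs(grid - sampling_date))] = 1.0
    return H

def message_to_parent(H, n_subs, subs_per_year, grid):
    """C_n(t_p): sum over child dates t of H_n(t) * b_n(t_p - t)."""
    tau = grid[:, None] - grid[None, :]      # tau[p, c] = t_p - t_c
    older = tau >= 0                         # a parent cannot be younger than its child
    tau = np.clip(tau, 0.0, None)
    # Assumed branch model: Poisson likelihood of n_subs substitutions in time tau.
    b = np.where(older,
                 (subs_per_year * tau) ** n_subs * np.exp(-subs_per_year * tau),
                 0.0)
    return b @ H

grid = np.linspace(0.0, 10.0, 501)           # years before present
subs_per_year = 2.0                          # hypothetical clock rate times sequence length

# Two tips below the root: (sampling date in years before present, substitutions on branch)
tips = [(0.0, 4), (1.0, 2)]

root = np.ones_like(grid)                    # no external constraint on the root date
for date, n_subs in tips:
    root *= message_to_parent(tip_constraint(grid, date), n_subs, subs_per_year, grid)
root /= root.sum()                           # marginal distribution of the root date

mode = grid[np.argmax(root)]
cdf = np.cumsum(root)
interval = (grid[np.searchsorted(cdf, 0.05)], grid[np.searchsorted(cdf, 0.95)])
print(f"most likely root date: {mode:.2f} years before present")
print(f"90% interval: {interval[0]:.2f} to {interval[1]:.2f} years before present")
```

With only two tips the root is the only undated node, so no pre-order pass is needed; in a larger tree the same messages would be passed back down to obtain the marginal distributions of the internal nodes.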

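From this distribution, confidence intervals of node dates can be computed in a straightforward manner. TreeTime allows one to compute joint or marginal maximum-likelihood dates, but the algorithm described above can be used for any continuous character on the tree. We will use an analogous algorithm below to estimate parameters of relaxed molecular clock models.

The fraction of variance in root-to-tip (RTT) distance explained by a linear regression on sampling date is given by

$$r^2 = \frac{\left(\langle t_i d_i\rangle - \langle t_i\rangle\langle d_i\rangle\right)^2}{\left(\langle t_i^2\rangle - \langle t_i\rangle^2\right)\left(\langle d_i^2\rangle - \langle d_i\rangle^2\right)},$$

where t_i is the sampling date of tip i and d_i is its RTT distance.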

The distances d_i are measured as the sum of lengths of all branches from the root to the tip, that is, the expected number of substitutions since the root divided by the length of the sequence. The angular brackets denote the sample average. The regression and r^2 depend on the choice of root since the d_i depend on the root. In the absence of an outgroup, the root is often chosen to maximize r^2 or minimize the squared residuals of a linear fit to the RTT distance.
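For a fixed root, r^2 is the squared correlation between sampling dates and RTT distances, and it can be written entirely in terms of sample averages. The sketch below evaluates it with NumPy for a handful of invented tips; the function name and the data are hypothetical. The slope of the same regression is the clock-rate estimate.

```python
import numpy as np

def rtt_regression(dates, distances):
    """Return (r_squared, slope) of a linear regression of RTT distance on date,
    expressed through sample averages."""
    t = np.asarray(dates, dtype=float)
    d = np.asarray(distances, dtype=float)
    cov_td = np.mean(t * d) - t.mean() * d.mean()
    var_t = np.mean(t ** 2) - t.mean() ** 2
    var_d = np.mean(d ** 2) - d.mean() ** 2
    return cov_td ** 2 / (var_t * var_d), cov_td / var_t

# Invented example: sampling dates (years) and RTT distances (substitutions per site).
dates = [2010, 2011, 2012, 2013, 2014]
distances = [0.010, 0.013, 0.015, 0.019, 0.021]

r_squared, clock_rate = rtt_regression(dates, distances)
print(f"r^2 = {r_squared:.3f}, clock rate = {clock_rate:.4f} substitutions per site per year")
```

Because only averages over tips enter, the same quantity can be re-evaluated for a different root as soon as the averages have been updated for the changed distances.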

This search for the optimal root can be achieved in linear time in the number of sequences N by first calculating a set of auxiliary quantities on the tree. With these quantities at hand, r^2 can be calculated for any choice of root on the tree as detailed in the Appendix.

Hence two tree traversals are sufficient to determine the optimal root. The root position that minimizes the mean squared residual can be calculated analogously. In general, the optimal position of the root will not be an internal node, but a position between two nodes on a branch of the tree.

Such optimal position on internal branches of the tree can be determined from the quantities calculated above by solving a quadratic equation without any numerical optimization. The required algebra is described in the Appendix.

Phylogenetic trees of many very similar sequences are often poorly resolved and contain multifurcating nodes also known as polytomies. Tree building software often randomly resolves these polytomies into a series of bifurcations. However, the order of bifurcations will often be inconsistent with the temporal structure of the tree resulting in poor approximations. To overcome this problem, TreeTime can prune all branches of length zero and resolve the resulting polytomies in a manner consistent with the sampling dates.

For each pair of nodes, TreeTime calculates by how much the likelihood would increase when grouping this pair of nodes into a clade of size two. The polytomy is then resolved iteratively by always grouping the pair corresponding to the highest gain.

The likelihood of observing a particular genealogical tree depends on the size of the population, its geographic structure, and fitness variation in the population (Kingman; Nordborg; Neher). Hence parameters of models describing the ensemble of genealogies can be estimated from the data.

In the simplest case of a panmictic population without fitness variation, the ensemble of genealogies is described by the Kingman coalescent, possibly with a population size that changes over time. Here, the population size N(t) defines a time scale measured in units of generation time, and we will more generally refer to this time scale by T_c(t) and measure it in units of the inverse clock rate.

The contribution of a branch between time points t_0 (child) and t_1 (parent) in the tree to the likelihood then follows from the coalescent model. TreeTime can estimate population sizes or coalescent time scales by maximizing the coalescent likelihood for a fixed tree.

The latter can be evaluated in one tree traversal by summing contributions from branches and merger events. In addition to a constant T_c, TreeTime can model T_c as a piecewise linear function and optimize the parameters of that function. As part of the iterative optimization by TreeTime, the next round of optimization of branch lengths and dates of ancestral nodes will account for the coalescent likelihood.
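For a constant T_c, the standard Kingman coalescent likelihood can be written down compactly. The sketch below sums over time intervals rather than over branches (the two decompositions are equivalent, since k lineages that each merge at rate (k-1)/(2 T_c) give a total rate of k(k-1)/(2 T_c)). The function names and the event list are hypothetical, and this is a textbook formulation rather than TreeTime's code.

```python
import math

def coalescent_log_likelihood(events, Tc):
    """Log-likelihood of a dated genealogy under a Kingman coalescent with
    constant time scale Tc. `events` holds (time_before_present, kind) pairs
    with kind 'sample' (a tip appears) or 'merger' (two lineages coalesce)."""
    log_lh = 0.0
    k = 0                # number of lineages present going back in time
    prev_t = None
    for t, kind in sorted(events):
        if prev_t is not None:
            # no merger among k lineages during the interval prev_t..t
            log_lh -= k * (k - 1) / (2.0 * Tc) * (t - prev_t)
        if kind == "sample":
            k += 1
        elif kind == "merger":
            log_lh -= math.log(Tc)   # a specific pair merges at rate 1/Tc
            k -= 1
        else:
            raise ValueError(f"unknown event kind: {kind}")
        prev_t = t
    return log_lh

def ml_constant_Tc(events):
    """Closed-form maximizer: total pair-time divided by the number of mergers."""
    pair_time = -coalescent_log_likelihood(events, 1.0)   # log(1) = 0 for the merger terms
    n_mergers = sum(kind == "merger" for _, kind in events)
    return pair_time / n_mergers

# Invented genealogy: three tips (two sampled at present, one 0.5 earlier), two mergers.
events = [(0.0, "sample"), (0.0, "sample"), (0.5, "sample"),
          (1.0, "merger"), (2.5, "merger")]

Tc_hat = ml_constant_Tc(events)
print(f"maximum-likelihood Tc: {Tc_hat:.2f}")
```

A piecewise linear T_c would replace the constant rate in each interval by an integral over the interval and would require numerical optimization of the parameters of that function.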

The newly inferred dates will in turn be used to update the parameters of the coalescent model as described earlier.

Large phylogenies typically contain many substitutions and thus provide enough information to infer substitution models from the data. TreeTime first reconstructs ancestral sequences using a standard substitution model specified by the user (Jukes-Cantor by default).

From this reconstruction, TreeTime determines the time T_i spent in different states i across the tree, and the number of substitutions n_ij between any pair of states i and j. The substitution model is then re-estimated from these counts and the procedure is repeated. This algorithm typically converges in a few iterations.

Substitution rates can vary across the tree and models that assume constant clock rates may give inaccurate inferences.
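The counting step of the substitution model inference described above can be sketched as follows. The function name, the input format (parent sequence, child sequence, branch length), and the choice to split each branch's length equally between the parent and child states are assumptions made for this illustration; they are not taken from TreeTime.

```python
import numpy as np

ALPHABET = "ACGT"

def count_states_and_substitutions(branches):
    """Accumulate T[i], the branch length spent in state i, and n[i, j],
    the number of substitutions from state i (parent) to state j (child)."""
    index = {char: i for i, char in enumerate(ALPHABET)}
    T = np.zeros(len(ALPHABET))
    n = np.zeros((len(ALPHABET), len(ALPHABET)))
    for parent, child, length in branches:
        for a, b in zip(parent, child):
            i, j = index[a], index[b]
            # Assumption: half of the branch is spent in each end state.
            T[i] += 0.5 * length
            T[j] += 0.5 * length
            if i != j:
                n[i, j] += 1
    return T, n

# Invented branches: (reconstructed parent sequence, child sequence, branch length)
branches = [
    ("AACG", "AACG", 0.1),
    ("AACG", "GACG", 0.2),   # A -> G at the first site
    ("AACG", "AATG", 0.3),   # C -> T at the third site
]

T, n = count_states_and_substitutions(branches)
print("time per state:", dict(zip(ALPHABET, T.round(2))))
print("substitution counts:\n", n)
```

These two arrays are the summary statistics from which rates and equilibrium frequencies of a substitution model would then be estimated.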

Relaxed clock models regularize clock rate variation through a prior and penalize rapid changes of the rate by coupling the rate along branches, an approach known as an autocorrelated or local molecular clock (Thorne, Kishino, and Painter; Aris-Brosou, Yang, and Huelsenbeck). TreeTime implements an autocorrelated molecular clock with a normal prior on variation in clock rates.

Other priors could be implemented, but would require numerical optimization or approximations. TreeTime is implemented in Python version 2.

Computationally costly operations are cast into array operations executed by numpy whenever possible. TreeTime is organized as a hierarchy of classes. TreeAnc performs maximum-likelihood inference of ancestral sequences, ClockTree infers a time scaled phylogeny given a tree topology, and TreeTime adds an additional layer of functionality including rerooting, polytomy resolution, coalescent models, and relaxed clocks. The substitution model is implemented in the class GTR.

This structure allows TreeTime to be used in a modular fashion in Python-based phylogenetic analysis pipelines.
