Bayesian inference in phylogeny


Bayesian inference in phylogeny generates a posterior distribution for a parameter, consisting of a phylogenetic tree and a model of evolution, by combining the prior for that parameter with the likelihood of the data, here a multiple sequence alignment. The Bayesian approach has become more popular owing to advances in computational machinery, especially Markov chain Monte Carlo (MCMC) algorithms. Bayesian inference has a number of applications in molecular phylogenetics, for example, estimation of species phylogenies and species divergence times.

Basic Bayesian Theory

Recall that for Bayesian inference, with parameter $\theta$ and data $D$:

$$p(\theta \mid D) = \frac{p(D \mid \theta)\,p(\theta)}{p(D)}$$

The denominator $p(D)$ is the marginal probability of the data: the likelihood averaged over all possible parameter values, weighted by their prior distribution. Formally,

$$p(D) = \int_{\Theta} p(D \mid \theta)\,p(\theta)\,d\theta$$

where $\Theta$ is the parameter space for $\theta$.

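In the original Metropolis algorithm, given a current $\theta$-value $x$ and a new $\theta$-value $y$ drawn from a symmetric proposal distribution, the new value is accepted with probability $\min\{1, h(y)/h(x)\}$, where $h(\theta) = p(D \mid \theta)\,p(\theta)$ is the unnormalized posterior:

$$\frac{h(y)}{h(x)} = \frac{p(D \mid y)\,p(y)}{p(D \mid x)\,p(x)}$$

Because the denominator $p(D)$ cancels in this ratio, the posterior can be sampled without ever computing it.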
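The Metropolis rule can be sketched in a few lines of Python. This is a minimal illustration, not code from any of the cited programs: it assumes a symmetric Gaussian random-walk proposal (so no Hastings correction is needed), works with $\log h$, where $h(\theta) = p(D \mid \theta)\,p(\theta)$, to avoid numerical underflow, and uses an unnormalized standard normal density as a stand-in target. The function name `metropolis` and its parameters are hypothetical.

```python
import math
import random

def metropolis(log_h, x0, step, n_iter, seed=0):
    """Random-walk Metropolis sampler for a density proportional to exp(log_h)."""
    rng = random.Random(seed)
    x, lx = x0, log_h(x0)
    samples = []
    for _ in range(n_iter):
        y = x + rng.gauss(0.0, step)  # symmetric proposal: q(y | x) = q(x | y)
        ly = log_h(y)
        # Accept with probability min(1, h(y)/h(x)); any normalizing constant cancels.
        if ly >= lx or rng.random() < math.exp(ly - lx):
            x, lx = y, ly
        samples.append(x)  # on rejection the current value is repeated
    return samples

# Illustrative target: unnormalized standard normal, log h(x) = -x^2 / 2
draws = metropolis(lambda x: -0.5 * x * x, x0=3.0, step=1.0, n_iter=20000)
burned = draws[2000:]  # discard an initial burn-in period
print(sum(burned) / len(burned))
```

Only differences of $\log h$ are ever evaluated, which is why the sampler never needs $p(D)$.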

The LOCAL algorithm of Larget and Simon

The LOCAL algorithm begins by selecting an internal branch of the tree at random. The nodes at the ends of this branch are each connected to two other branches, and one of each pair is chosen at random. Imagine taking these three selected edges (the internal branch and one neighboring branch on each side) and stringing them like a clothesline from left to right, where the left/right orientation is also chosen at random. The two endpoints of the first branch selected each have a subtree hanging from them like a piece of clothing pegged to the line. The algorithm then multiplies the three selected branch lengths by a common random factor, akin to stretching or shrinking the clothesline. Finally, the leftmost of the two hanging subtrees is disconnected and reattached to the clothesline at a location selected uniformly at random. This is the candidate tree.

Suppose we began by selecting the internal branch with length $t_8$ (in Figure (a) (to be added)) that separates taxa A and B from the rest. Suppose also that we have (randomly) selected branches with lengths $t_1$ and $t_9$ from each side, and that we oriented these branches as shown in Figure (b). Let $m = t_1 + t_8 + t_9$ be the current length of the clothesline. We select the new length to be $m^{*} = m \exp(\lambda(U_1 - 0.5))$, where $U_1$ is a uniform random variable on $(0,1)$ and $\lambda > 0$ is a tuning parameter. Then for the LOCAL algorithm, the acceptance probability can be computed to be:

$$\min\left\{1,\ \frac{h(y)}{h(x)} \times \left(\frac{m^{*}}{m}\right)^{3}\right\}$$
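The branch-length arithmetic of the LOCAL move can be sketched as follows. This sketch assumes the three branches have already been selected and oriented, and it omits the bookkeeping needed to rewire an actual tree data structure; the function name `local_clothesline_move` is hypothetical. It returns the log of the factor $(m^{*}/m)^3$ so that a caller can add it to $\log h(y) - \log h(x)$ when deciding whether to accept.

```python
import math
import random

def local_clothesline_move(t_left, t_mid, t_right, lam, rng):
    """Branch-length part of a LOCAL-style proposal (tree rewiring omitted).

    t_left, t_mid, t_right: lengths of the three selected branches, ordered
    left to right along the clothesline (e.g. t_1, t_8, t_9 in the text).
    Returns the new lengths, whether the two hanging subtrees swapped order,
    and log((m*/m)^3) for use in the acceptance probability.
    """
    m = t_left + t_mid + t_right
    # Stretch or shrink the whole line: m* = m * exp(lam * (U1 - 0.5))
    m_star = m * math.exp(lam * (rng.random() - 0.5))
    scale = m_star / m
    # Position (from the left end) of the subtree that stays attached
    y = (t_left + t_mid) * scale
    # Detach the leftmost hanging subtree and reattach it uniformly on (0, m*)
    x_new = rng.uniform(0.0, m_star)
    lo, hi = min(x_new, y), max(x_new, y)
    new_lengths = (lo, hi - lo, m_star - hi)
    swapped = x_new > y  # the subtrees now hang in the opposite order
    log_factor = 3.0 * math.log(m_star / m)
    return new_lengths, swapped, log_factor

rng = random.Random(42)
lengths, swapped, log_factor = local_clothesline_move(0.1, 0.05, 0.2, lam=0.5, rng=rng)
print(lengths, swapped, log_factor)
```

When `swapped` is true the two hanging subtrees exchange their positions relative to the line's endpoints, so the candidate tree has a different topology; otherwise only the three branch lengths change.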

Assessing Convergence

Suppose we want to estimate the branch length $t$ of a two-taxon tree under the Jukes–Cantor (JC) model, given an alignment in which $n_1$ sites are unvaried and $n_2$ are variable. Assume an exponential prior distribution with rate $\lambda$, whose density is $p(t) = \lambda e^{-\lambda t}$. The probabilities of the possible site patterns are:

$$\frac{1}{4}\left(\frac{1}{4} + \frac{3}{4}e^{-4t/3}\right)$$

for unvaried sites, and

$$\frac{1}{4}\left(\frac{1}{4} - \frac{1}{4}e^{-4t/3}\right)$$

for variable sites.

Thus the unnormalized posterior distribution is:

$$h(t) = \left(\frac{1}{4}\right)^{n_1+n_2}\left(\frac{1}{4} + \frac{3}{4}e^{-4t/3}\right)^{n_1}\left(\frac{1}{4} - \frac{1}{4}e^{-4t/3}\right)^{n_2}\lambda e^{-\lambda t}$$
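As a numerical check of the formulas above, the sketch below evaluates $\log h(t)$ and confirms that the probabilities of the 4 constant and 12 variable two-taxon site patterns sum to 1. The prior rate `lam = 1.0` is an assumption (the text leaves $\lambda$ unspecified), and the comparison with the JC maximum-likelihood distance $-\tfrac{3}{4}\ln(1 - \tfrac{4}{3}\hat p)$, with $\hat p = n_2/(n_1+n_2)$, is an illustration added here rather than part of the original example.

```python
import math

def site_pattern_probs(t):
    """JC probabilities of one specific constant and one specific variable pattern."""
    e = math.exp(-4.0 * t / 3.0)
    p_const = 0.25 * (0.25 + 0.75 * e)
    p_var = 0.25 * (0.25 - 0.25 * e)
    return p_const, p_var

def log_h(t, n1, n2, lam):
    """Log of the unnormalized posterior h(t) under an exponential(lam) prior."""
    if t <= 0.0:
        return -math.inf  # h(0) = 0 whenever n2 > 0
    p_const, p_var = site_pattern_probs(t)
    return n1 * math.log(p_const) + n2 * math.log(p_var) + math.log(lam) - lam * t

n1, n2, lam = 70, 30, 1.0  # lam = 1.0 is an assumed prior rate
pc, pv = site_pattern_probs(0.3)
print(4 * pc + 12 * pv)    # equals 1 up to floating-point rounding

# Posterior mode on a grid versus the JC maximum-likelihood distance
grid = [i / 1000 for i in range(1, 3001)]
t_mode = max(grid, key=lambda t: log_h(t, n1, n2, lam))
t_mle = -0.75 * math.log(1.0 - (4.0 / 3.0) * n2 / (n1 + n2))
print(t_mode, t_mle)
```

With an exponential prior, the posterior mode is pulled slightly below the maximum-likelihood estimate.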

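The branch length is updated by choosing a new value uniformly at random from a window of half-width $w$ centered at the current value:

$$t^{*} = |t + U|$$

where $U$ is uniformly distributed between $-w$ and $w$; taking the absolute value reflects proposals that fall below zero back into the valid range. Because this reflected proposal is symmetric, the acceptance probability is simply:

$$\min\left\{1,\ \frac{h(t^{*})}{h(t)}\right\}$$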

Example: $n_1 = 70$, $n_2 = 30$. We will compare results for two values of $w$, $w = 0.1$ and $w = 0.5$. In each case, we begin with an initial length of 5 and update the length 2000 times. (See Figure 3.2 (to be added) for results.)
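The example can be reproduced approximately with the following sketch. The prior rate $\lambda = 1$ and the random seed are assumptions, since the text specifies neither, so its output will not match the figure exactly. The sampler implements the reflected sliding window $t^{*} = |t + U|$ with Metropolis acceptance; the names `log_h` and `sliding_window_chain` are hypothetical.

```python
import math
import random

def log_h(t, n1=70, n2=30, lam=1.0):
    """Log unnormalized posterior for the two-taxon JC example (lam is assumed)."""
    if t <= 0.0:
        return -math.inf
    e = math.exp(-4.0 * t / 3.0)
    return (n1 * math.log(0.25 * (0.25 + 0.75 * e))
            + n2 * math.log(0.25 * (0.25 - 0.25 * e))
            + math.log(lam) - lam * t)

def sliding_window_chain(t0, w, n_iter, seed=1):
    """Metropolis updates with the reflected window proposal t* = |t + U|, U ~ U(-w, w)."""
    rng = random.Random(seed)
    t, lt = t0, log_h(t0)
    trace, accepted = [t], 0
    for _ in range(n_iter):
        t_star = abs(t + rng.uniform(-w, w))
        lt_star = log_h(t_star)
        # Accept with probability min(1, h(t*)/h(t)); the reflected proposal is symmetric
        if lt_star >= lt or rng.random() < math.exp(lt_star - lt):
            t, lt = t_star, lt_star
            accepted += 1
        trace.append(t)
    return trace, accepted / n_iter

for w in (0.1, 0.5):
    trace, acc = sliding_window_chain(t0=5.0, w=w, n_iter=2000)
    tail = trace[1000:]
    print(f"w={w}: acceptance={acc:.2f}, mean of last 1000 = {sum(tail) / len(tail):.3f}")
```

Plotting `trace` for each window size shows the behavior the example is meant to illustrate: the narrow window takes many small steps to drift down from the starting value of 5, whereas the wider window typically reaches the region of high posterior density quickly but rejects a larger share of its proposals.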

Metropolis-coupled MCMC (Geyer)

If the target distribution has multiple peaks separated by low valleys, the Markov chain may have difficulty moving from one peak to another. As a result, the chain may get stuck on one peak, and the resulting samples will not approximate the posterior density correctly. This is a serious practical concern for phylogeny reconstruction, as multiple local peaks are known to exist in tree space during heuristic tree search under the maximum parsimony (MP), maximum likelihood (ML), and minimum evolution (ME) criteria, and the same can be expected for stochastic tree search using MCMC. Many strategies have been proposed to improve the mixing of Markov chains in the presence of multiple local peaks in the posterior density. One of the most successful is Metropolis-coupled MCMC ($\mathrm{MC}^3$).

In this algorithm, $m$ chains are run in parallel, with different stationary distributions $\pi_j(\cdot)$, $j = 1, 2, \ldots, m$, where the first one, $\pi_1 = \pi$, is the target density, while $\pi_j$, $j = 2, 3, \ldots, m$, are chosen to improve mixing. For example, one can choose incremental heating of the form:

$$\pi_j(\theta) = \pi(\theta)^{1/[1+\lambda(j-1)]}, \quad \lambda > 0,$$

so that the first chain is the cold chain with the correct target density, while chains $2, 3, \ldots, m$ are heated chains. Note that raising the density $\pi(\cdot)$ to the power $1/T$ with $T > 1$ has the effect of flattening out the distribution, similar to heating a metal. In such a distribution, it is easier to traverse between peaks (separated by valleys) than in the original distribution. After each iteration, a swap of states between two randomly chosen chains is proposed through a Metropolis-type step. Let $\theta^{(j)}$ be the current state in chain $j$, $j = 1, 2, \ldots, m$. A swap between the states of chains $i$ and $j$ is accepted with probability:

$$\alpha = \min\left\{1,\ \frac{\pi_i(\theta^{(j)})\,\pi_j(\theta^{(i)})}{\pi_i(\theta^{(i)})\,\pi_j(\theta^{(j)})}\right\}$$
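For the incremental heating scheme, write $T_k = 1 + \lambda(k-1)$ so that $\pi_k = \pi^{1/T_k}$. Substituting into the swap ratio (before truncation at 1) shows that it depends only on the untempered target density at the two states:

```latex
\frac{\pi_i(\theta^{(j)})\,\pi_j(\theta^{(i)})}{\pi_i(\theta^{(i)})\,\pi_j(\theta^{(j)})}
= \frac{\pi(\theta^{(j)})^{1/T_i}\,\pi(\theta^{(i)})^{1/T_j}}
       {\pi(\theta^{(i)})^{1/T_i}\,\pi(\theta^{(j)})^{1/T_j}}
= \left(\frac{\pi(\theta^{(j)})}{\pi(\theta^{(i)})}\right)^{1/T_i - 1/T_j}
```

When the two temperatures are close, the exponent $1/T_i - 1/T_j$ is small, so the ratio stays near 1 even if the two states have quite different posterior densities.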

At the end of the run, only the output from the cold chain is used; the output from the hot chains is discarded. Heuristically, the hot chains visit the local peaks rather easily, and swapping states between chains lets the cold chain occasionally jump across valleys, leading to better mixing. However, if $\pi_i(\theta)/\pi_j(\theta)$ varies strongly over the parameter space (that is, if the two chains' distributions differ too much), proposed swaps will seldom be accepted. This is the reason for using several chains that differ only incrementally. (See Figure 3.3 (to be added).)
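A compact sketch of $\mathrm{MC}^3$ follows. It uses the incremental heating scheme with temperatures $T_k = 1 + \lambda(k-1)$ and evaluates the swap step on the log scale; for this scheme the swap ratio reduces to $\left(\pi(\theta^{(j)})/\pi(\theta^{(i)})\right)^{1/T_i - 1/T_j}$. The bimodal one-dimensional target (an equal mixture of normal densities centered at $\pm 3$ with standard deviation 0.5), the number of chains, the heating increment, the step size, and all function names are illustrative assumptions, chosen so that an unheated random-walk chain with the same step size would almost never cross the valley between the peaks.

```python
import math
import random

def log_target(x):
    """Illustrative bimodal log-density: mixture of N(-3, 0.5^2) and N(3, 0.5^2)."""
    a = -0.5 * ((x + 3.0) / 0.5) ** 2
    b = -0.5 * ((x - 3.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # log-sum-exp

def mc3(log_pi, x0, n_chains=5, lam=2.0, step=1.0, n_iter=30000, seed=0):
    """Metropolis-coupled MCMC with incremental heating T = 1 + lam * (k - 1)."""
    rng = random.Random(seed)
    temps = [1.0 + lam * j for j in range(n_chains)]  # index 0 is the cold chain
    states = [x0] * n_chains
    logp = [log_pi(x0)] * n_chains                     # untempered log pi at each state
    cold = []
    for _ in range(n_iter):
        # Within-chain Metropolis update targeting pi^(1/T) for every chain
        for k in range(n_chains):
            y = states[k] + rng.gauss(0.0, step)
            ly = log_pi(y)
            if rng.random() < math.exp(min(0.0, (ly - logp[k]) / temps[k])):
                states[k], logp[k] = y, ly
        # Propose swapping the states of two randomly chosen chains
        i, j = rng.sample(range(n_chains), 2)
        log_alpha = (1.0 / temps[i] - 1.0 / temps[j]) * (logp[j] - logp[i])
        if rng.random() < math.exp(min(0.0, log_alpha)):
            states[i], states[j] = states[j], states[i]
            logp[i], logp[j] = logp[j], logp[i]
        cold.append(states[0])                         # keep only the cold chain
    return cold

cold = mc3(log_target, x0=-3.0)
frac_right = sum(x > 0 for x in cold) / len(cold)
print(f"fraction of cold-chain samples in the right-hand mode: {frac_right:.2f}")
```

Because the target is symmetric, roughly half of the cold-chain samples should fall in each mode. All chains start in the left-hand mode, so cold-chain samples on the right arrive only through the heated chains and the swap moves.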

An obvious disadvantage of the algorithm is that $m$ chains are run but only one is used for inference. For this reason, $\mathrm{MC}^3$ is ideally suited for implementation on parallel machines, since each chain will in general require the same amount of computation per iteration.

References

* Geyer, C.J. (1991) Markov chain Monte Carlo maximum likelihood. In "Computing Science and Statistics: Proceedings of the 23rd Symposium of the Interface" (ed. E.M. Keramidas), pp. 156-163. Interface Foundation, Fairfax Station, VA.
* Yang, Z. and B. Rannala. (1997) Bayesian phylogenetic inference using DNA sequences: A Markov chain Monte Carlo method. "Molecular Biology and Evolution", 14, 717-724.
* Larget, B. and D.L. Simon. (1999) Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. "Molecular Biology and Evolution", 16, 750-759.
* Huelsenbeck, J.P. and F. Ronquist. (2001) MrBayes: Bayesian inference in phylogenetic trees. "Bioinformatics", 17, 754-755.
* Ronquist, F. and J.P. Huelsenbeck. (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. "Bioinformatics", 19, 1572-1574.
* Rannala, B. and Z. Yang. (2003) Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. "Genetics", 164, 1645-1656.


Wikimedia Foundation. 2010.
