Inferring Phylogenies
Joseph Felsenstein
2004
580 pages
paper
About This Title
Phylogenies (evolutionary trees) are basic to thinking about and analyzing differences between species. Statistical, computational, and algorithmic work on them has been ongoing for four decades, with great advances in understanding, yet until now no book has summarized this work. Inferring Phylogenies clearly explains the assumptions and logic of inferring phylogenies and of using them to make inferences about evolutionary processes. It is an essential text and reference for anyone who wants to understand how phylogenies are reconstructed and how they are used.
Phylogenies are inferred from many kinds of data; this book concentrates on the central ones: discretely coded characters, molecular sequences, gene frequencies, and quantitative traits. It also covers restriction sites, RAPDs, and microsatellites.
Inferring Phylogenies is intended for graduate-level courses, assuming some knowledge of statistics, mathematics (calculus and fundamental matrix algebra), molecular sequences, and quantitative genetics.
About the Author
Joe Felsenstein is Professor in the Department of Genome Sciences at the University of Washington, Seattle, where he has taught for more than thirty years. He earned a B.S. (Honors) in Zoology from the University of Wisconsin, Madison, and a Ph.D. in Zoology from the University of Chicago. Dr. Felsenstein is the author of the widely used PHYLIP package of programs for inferring phylogenies. He served as President of the Society for the Study of Evolution in 1993 and has received numerous awards and honors, including election to membership in the American Academy of Arts and Sciences (1992); the Sewall Wright Award, American Society of Naturalists (1993); election to membership in the National Academy of Sciences (1999); and the Weldon Memorial Prize, Oxford University (2000). His work has ranged from theoretical evolutionary genetics to statistical methods for inferring phylogenies. His current research interests include the development of coalescent-based Markov chain Monte Carlo methods for computing likelihoods under models of evolution within species, and of statistical methods for inference about quantitative characters within and between species.
The Author at Work
Joe presenting at the 2004 SSE meeting in Fort Collins, CO.
Companion Webpages
Errata: http://evolution.gs.washington.edu/book/typos.html
Datasets: http://evolution.gs.washington.edu/book/datasets.html
Reviews and Commentary
“Joe Felsenstein has had more positive influence on the statistical revolution of phylogenetics than any other researcher in the field. For that reason, many biologists view him as the father of statistical phylogenetics. It was with this in mind that I finally got my hands on his long-awaited book, Inferring Phylogenies. The short answer is: it delivers. . . . Inferring Phylogenies is quite simply an instant classic.”
—AJ Drummond, Heredity
“The book is full of expert insights, as one would expect from an author who has made important original contributions to many of the areas he covers. Felsenstein provides beautiful and creative accounts of many topics. . . . It will be a long time before there will be a comparable book; perhaps the field is now growing too fast for there to ever be one. The publication of Inferring Phylogenies is a milestone for evolutionary biology in general and phylogenetics in particular.”
—Fredrik Ronquist, Science
“The author certainly sets out with an ambitious goal: to survey, in one book, the field of phylogenetics since computational methods entered the arena 40 years ago, and he amply delivers on this promise. . . . For researchers new to this area, the book describes contemporary methodology in a way that is both accessible and authoritative. . . . For ‘old hands,’ it provides a wealth of background and commentary.”
—Mike Steel, TRENDS in Ecology and Evolution
“Occasionally a book is a classic by the time it is published, and this is it. . . . The breadth is very wide with all the main expected topics. . . . It is hard to imagine how any lab could function without this book.”
—David Penny, Systematic Biology
“Felsenstein’s book . . . represents a truly majestic discussion of the inference and applications of phylogenetic trees. The power of this volume lies in its unique combination of an accessible style with undoubted intellectual authority. . . . Over 30 years ago Crow and Kimura produced what has become the cornerstone of theoretical population genetics. Felsenstein has now given us the definitive resource for anyone interested in phylogenetics. This volume is an outstanding achievement.”
—Edward C. Holmes, The Quarterly Review of Biology
Table of Contents
PREFACE
1. Parsimony methods
- A simple example
- Evaluating a particular tree
- Rootedness and unrootedness
- Methods of rooting the tree
- Branch lengths
- Unresolved questions
2. Counting evolutionary changes
- The Fitch algorithm
- The Sankoff algorithm
- Connection between the two algorithms
- Using the algorithms when modifying trees
- Views
- Using views when a tree is altered
- Further economies
3. How many trees are there?
- Rooted bifurcating trees
- Unrooted bifurcating trees
- Multifurcating trees
- Unrooted trees with multifurcations
- Tree shapes
- Rooted bifurcating tree shapes
- Rooted multifurcating tree shapes
- Unrooted shapes
- Labeled histories
- Perspective
4. Finding the best tree by heuristic search
- Nearest-neighbor interchanges
- Subtree pruning and regrafting
- Tree bisection and reconnection
- Other tree rearrangement methods
- Tree-fusing
- Genetic algorithms
- Tree windows and sectorial search
- Speeding up rearrangements
- Sequential addition
- Star decomposition
- Tree space
- Search by reweighting of characters
- Simulated annealing
- History
5. Finding the best tree by branch and bound
- A nonbiological example
- Finding the optimal solution
- NP-hardness
- Branch and bound methods
- Phylogenies: Despair and hope
- Branch and bound for parsimony
- Improving the bound
- Using still-absent states
- Using compatibility
- Rules limiting the search
6. Ancestral states and branch lengths
- Reconstructing ancestral states
- Accelerated and delayed transformation
- Branch lengths
7. Variants of parsimony
- Camin-Sokal parsimony
- Parsimony on an ordinal scale
- Dollo parsimony
- Polymorphism parsimony
- Unknown ancestral states
- Multiple states and binary coding
- Dollo parsimony and multiple states
- Polymorphism parsimony and multiple states
- Transformation series analysis
- Weighting characters
- Successive weighting and nonlinear weighting
- Successive weighting
- Nonsuccessive algorithms
8. Compatibility
- Testing compatibility
- The Pairwise Compatibility Theorem
- Cliques of compatible characters
- Finding the tree from the clique
- Other cases where cliques can be used
- Where cliques cannot be used
- Perfect phylogeny
- Using compatibility on molecules anyway
9. Statistical properties of parsimony
- Likelihood and parsimony
- The weights
- Unweighted parsimony
- Limitations of this justification of parsimony
- Farris's proofs
- No common mechanism
- Likelihood and compatibility
- Parsimony versus compatibility
- Consistency and parsimony
- Character patterns and parsimony
- Observed numbers of the patterns
- Observed fractions of the patterns
- Expected fractions of the patterns
- Inconsistency
- When inconsistency is not a problem
- The nucleotide sequence case
- Other situations where consistency is guaranteed
- Does a molecular clock guarantee consistency?
- The Farris zone
- Some perspective
10. A digression on history and philosophy
- How phylogeny algorithms developed
- Sokal and Sneath
- Edwards and Cavalli-Sforza
- Camin and Sokal and parsimony
- Eck and Dayhoff and molecular parsimony
- Fitch and Margoliash popularize distance matrix methods
- Wilson and Le Quesne introduce compatibility
- Jukes and Cantor and molecular distances
- Farris and Kluge and unordered parsimony
- Fitch and molecular parsimony
- Further work
- What about Willi Hennig and Walter Zimmermann?
- Different philosophical frameworks
- Hypothetico-deductive
- Logical parsimony
- Logical probability?
- Criticisms of statistical inference
- The irrelevance of classification
11. Distance matrix methods
- Branch lengths and times
- The least squares methods
- Least squares branch lengths
- Finding the least squares tree topology
- The statistical rationale
- Generalized least squares
- Distances
- The Jukes-Cantor model: an example
- Why correct for multiple changes?
- Minimum evolution
- Clustering algorithms
- UPGMA and least squares
- A clustering algorithm
- An example
- UPGMA on nonclocklike trees
- Neighbor-joining
- Performance
- Using neighbor-joining with other methods
- Relation of neighbor-joining to least squares
- Weighted versions of neighbor-joining
- Other approximate distance methods
- Distance Wagner method
- A related family
- Minimizing the maximum discrepancy
- Two approaches to error in trees
- A puzzling formula
- Consistency and distance methods
- A limitation of distance methods
12. Quartets of species
- The four point metric
- The split decomposition
- Short quartets methods
- The disk-covering method
- Challenges for the short quartets and DCM methods
- Three-taxon statement methods
- Other uses of quartets with parsimony
- Consensus supertrees
- Neighborliness
- De Soete's search method
- Quartet puzzling and searching tree space
- Perspective
13. Models of DNA evolution
- Kimura's two-parameter model
- Calculation of the distance
- The Tamura-Nei model, F84, and HKY
- The general time-reversible model
- Distances from the GTR model
- The general 12-parameter model
- LogDet distances
- Other distances
- Variance of distance
- Rate variation between sites or loci
- Different rates at different sites
- Distances with known rates
- Distribution of rates
- Gamma- and lognormally distributed rates
- Distances from gamma-distributed rates
- Models with nonindependence of sites
14. Models of protein evolution
- Amino acid models
- The Dayhoff model
- Other empirically-based models
- Models depending on secondary structure
- Codon-based models
- Inequality of synonymous and nonsynonymous substitutions
- Protein structure and correlated change
15. Restriction sites, RAPDs, AFLPs, and microsatellites
- Restriction sites
- Nei and Tajima's model
- Distances based on restriction sites
- Issues of ascertainment
- Parsimony for restriction sites
- Modeling restriction fragments
- Parsimony with restriction fragments
- RAPDs and AFLPs
- The issue of dominance
- Unresolved problems
- Microsatellite models
- The one-step model
- Microsatellite distances
- A Brownian motion approximation
- Models with constraints on array size
- Multi-step and heterogeneous models
- Snakes and Ladders
- Complications
16. Likelihood methods
- Maximum likelihood
- Computing the likelihood of a tree
- Economizing on the computation
- Handling ambiguity and error
- Unrootedness
- Finding the maximum likelihood tree
- Inferring ancestral sequences
- Rates varying among sites
- Hidden Markov models
- Autocorrelation of rates
- HMMs for other aspects of models
- Estimating the states
- Models with clocks
- Relaxing molecular clocks
- Models for relaxed clocks
- Covarions
- Empirical approaches to change of rates
- Are ML estimates consistent?
- Comparability of likelihoods
- A nonexistent proof?
- A simple proof
- Misbehavior with the wrong model
- Better behavior with the wrong model
17. Hadamard methods
- The edge length spectrum and conjugate spectrum
- The closest tree criterion
- DNA models
- Computational effort
- Extensions of Hadamard methods
18. Bayesian inference of phylogenies
- Bayes' theorem
- Bayesian methods for phylogenies
- Markov chain Monte Carlo methods
- The Metropolis algorithm
- Its equilibrium distribution
- Bayesian MCMC
- Bayesian MCMC for phylogenies
- Proposal distributions
- Computing the likelihoods
- Summarizing the posterior
- Priors on trees
- Controversies over Bayesian inference
- Universality of the prior
- Flat priors and doubts about them
- Applications of Bayesian methods
19. Testing models, trees, and clocks
- Likelihood and tests
- Likelihood ratios near asymptopia
- Multiple parameters
- Some parameters constrained, some not
- Conditions
- Curvature or height?
- Interval estimates
- Testing assertions about parameters
- Coins in a barrel
- Evolutionary rates instead of coins
- Choosing among nonnested hypotheses: AIC and BIC
- An example using the AIC criterion
- The problem of multiple topologies
- Interior branch tests
- Interior branch tests using parsimony
- A multiple-branch counterpart of interior branch tests
- Testing the molecular clock
- Parsimony-based methods
- Distance-based methods
- Likelihood-based methods
- The relative rate test
- Simulation tests based on likelihood
- More exact tests and confidence intervals
- Tests for three species with a clock
- Bremer support
- Zander's conditional probability of reconstruction
- More generalized confidence sets
20. Bootstrap, jackknife, and permutation tests
- The bootstrap and the jackknife
- Bootstrapping and phylogenies
- The delete-half jackknife
- The bootstrap and jackknife for phylogenies
- The multiple-tests problem
- Independence of characters
- Identical distribution: a problem?
- Invariant characters and resampling methods
- Biases in bootstrap and jackknife probabilities
- P values in a simple normal case
- Methods of reducing the bias
- The drug testing analogy
- Alternatives to P values
- Probabilities of trees
- Using tree distances
- Jackknifing species
- Parametric bootstrapping
- Advantages and disadvantages of the parametric bootstrap
- Permutation tests
- Permuting species within characters
- Permuting characters
- Skewness of tree length distribution
21. Paired-sites tests
- An example
- Multiple trees
- The SH test
- Other multiple-comparison tests
- Testing other parameters
- Perspective
22. Invariants
- Symmetry invariants
- Three-species invariants
- Lake's linear invariants
- Cavender's quadratic invariants
- The K invariants
- The L invariants
- Generalization of Cavender's L invariants
- Drolet and Sankoff's k-state quadratic invariants
- Clock invariants
- General methods for finding invariants
- Fourier transform methods
- Gröbner bases and other general methods
- Expressions for all the 3ST invariants
- Finding all invariants empirically
- All linear invariants
- Special cases and extensions
- Invariants and evolutionary rates
- Testing invariants
- What use are invariants?
23. Brownian motion and gene frequencies
- Brownian motion
- Likelihood for a phylogeny
- What likelihood to compute?
- Assuming a clock
- The REML approach
- Multiple characters and Kronecker products
- Pruning the likelihood
- Maximizing the likelihood
- Inferring ancestral states
- Gene frequencies and Brownian motion
- Using approximate Brownian motion
- Distances from gene frequencies
- A more exact likelihood method
- Gene frequency parsimony
24. Quantitative characters
- Neutral models of quantitative characters
- Changes due to natural selection
- Selective correlation
- Covariances of multiple characters in multiple lineages
- Selection for an optimum
- Brownian motion and selection
- Correcting for correlations
- Punctuational models
- Inferring phylogenies and correlations
- Chasing a common optimum
- The character-coding "problem"
- Continuous-character parsimony methods
- Manhattan metric parsimony
- Other parsimony methods
- Threshold models
25. Comparative methods
- An example with discrete states
- An example with continuous characters
- The contrasts method
- Correlations between characters
- When the tree is not completely known
- Inferring change in a branch
- Sampling error
- The standard regression and other variations
- Generalized least squares
- Phylogenetic autocorrelation
- Transformations of time
- Should we use the phylogeny at all?
- Paired-lineage tests
- Discrete characters
- Ridley's method
- Concentrated-changes tests
- A paired-lineages test
- Methods using likelihood
- Advantages of the likelihood approach
- Molecular applications
26. Coalescent trees
- Kingman's coalescent
- Bugs in a box—an analogy
- Effect of varying population size
- Migration
- Effect of recombination
- Coalescents and natural selection
- Neuhauser and Krone's method
27. Likelihood calculations on coalescents
- The basic equation
- Using accurate genealogies—a reverie
- Two random sampling methods
- A Metropolis-Hastings method
- Griffiths and Tavaré's method
- Bayesian methods
- MCMC for a variety of coalescent models
- Single-tree methods
- Slatkin and Maddison's method
- Fu's method
- Summary-statistic methods
- Watterson's method
- Other summary-statistic methods
- Testing for recombination
28. Coalescents and species trees
- Methods of inferring the species phylogeny
- Reconciled tree parsimony approaches
- Likelihood
29. Alignment, gene families, and genomics
- Alignment
- Why phylogenies are important
- Parsimony method
- Approximations and progressive alignment
- Probabilistic models
- Bishop and Thompson's method
- The minimum message length method
- The TKF model
- Multibase insertions and deletions
- Tree HMMs
- Trees
- Inferring the alignment
- Gene families
- Reconciled trees
- Reconstructing duplications
- Rooting unrooted trees
- A likelihood analysis
- Comparative genomics
- Tandemly repeated genes
- Inversions
- Inversions in trees
- Inversions, transpositions, and translocations
- Breakpoint and neighbor-coding approximations
- Synteny
- Probabilistic models
- Genome signature methods
30. Consensus trees and distances between trees
- Consensus trees
- Strict consensus
- Majority-rule consensus
- Adams consensus tree
- A dismaying result
- Consensus using branch lengths
- Other consensus tree methods
- Consensus subtrees
- Distances between trees
- The symmetric difference
- The quartets distance
- The nearest-neighbor interchange distance
- The path-length-difference metric
- Distances using branch lengths
- Are these distances truly distances?
- Consensus trees and distances
- Trees significantly the same? different?
- What do consensus trees and tree distances tell us?
- The total evidence debate
- A modest proposal
31. Biogeography, hosts, and parasites
- Component compatibility
- Brooks parsimony
- Event-based parsimony methods
- Relation to tree reconciliation
- Randomization tests
- Statistical inference
32. Phylogenies and paleontology
- Stratigraphic indices
- Stratophenetics
- Stratocladistics
- Controversies
- A not-quite-likelihood method
- Stratolikelihood
- Making a full likelihood method
- More realistic fossilization models
- Fossils within species: Sequential sampling
- Between species
33. Tests based on tree shape
- Using the topology only
- Harding's probabilities of tree shapes
- Tests from shapes
- Measures of overall asymmetry
- Choosing a powerful test
- Tests using times
- Lineage plots
- Likelihood formulas
- Other likelihood approaches
- Other statistical approaches
- A time transformation
- Characters and key innovations
- Work remaining
34. Drawing trees
- Issues in drawing rooted trees
- Placement of interior nodes
- Shapes of lineages
- Unrooted trees
- The equal-angle algorithm
- n-Body algorithms
- The equal-daylight algorithm
- Challenges
35. Phylogeny software
- Trees, records, and pointers
- Declaring records
- Traversing the tree
- Unrooted tree data structures
- Tree file formats
- Widely used phylogeny programs and packages
REFERENCES
INDEX