Estimating Species Trees

Practical and Theoretical Aspects

About this book

Recent computational and modeling advances have produced methods for estimating species trees directly, avoiding the problems and limitations of the traditional phylogenetic paradigm where an estimated gene tree is equated with the history of species divergence. The overarching goal of the volume is to increase the visibility and use of these new methods by the entire phylogenetic community by specifically addressing several challenges: (i) firm understanding of the theoretical underpinnings of the methodology, (ii) empirical examples demonstrating the utility of the methodology as well as its limitations, and (iii) attention to technical aspects involved in the actual software implementation of the methodology. As such, this volume will not only be poised to become the quintessential guide to training the next generation of researchers, but it will also be instrumental in ushering in a new phylogenetic paradigm for the 21st century.


Information

Year: 2011
Print ISBN: 9780470526859
Edition: 1
eBook ISBN: 9781118211403
CHAPTER 1
ESTIMATING SPECIES TREES: AN INTRODUCTION TO CONCEPTS AND MODELS
L. Lacey Knowles
Laura S. Kubatko
1.1 INTRODUCTION
The estimation of relationships among species in an evolutionary context broadly falls within the purview of the discipline of systematics. However, as the central framework in evolutionary (and some ecological) study, the enormous impact of this single endeavor—phylogenetic estimation—is unquestionable. How, and whether, species relationships are accurately inferred are, consequently, issues of broad and far-reaching concern.
The goal of this book is to provide an overview of several recently developed methods for phylogenetic estimation that focus explicitly on the challenges and strengths inherent in the analysis of multilocus data while giving practical guidelines on implementing these approaches. Decreased sequencing costs and increased access to primer sets enhance the relative ease of data collection, providing unprecedented amounts of multilocus sequence for molecular phylogenetic analysis across all of biodiversity (e.g., Goldman and Yang 2008; Hughes et al. 2006; Wiens et al. 2008). Detailed suggestions and discussion throughout the chapters focus on both conceptual and methodological issues, addressing such topics as how results should be interpreted and how to recognize the signs of a problem with an analysis. The combination of theoretical and empirical studies contained herein serves to identify both the strengths and the limitations of these new methods under not only idealized situations with simulated data but also with empirical sequence data. The guidelines also serve to draw attention to the impact that sampling design, marker choice, and taxon sampling will have on the performance of the new methods.
1.1.1 Different Tree Types and Their Relationship to Phylogeny
As a characterization of the history of species divergence (including both the pattern and relative timing of lineage splitting), a phylogeny is a tree where both the topology and branch lengths portray information about the evolutionary history of species (Fig. 1.1). While molecular data predominate in the pursuit of estimating the evolutionary history of species, the trees estimated from DNA sequences are clearly distinct from, and are not synonymous with, the underlying species history, that is, the species tree (Maddison 1997; Slowinski and Page 1999). In contrast to the differing genealogical histories (i.e., gene trees) that might characterize a locus (or a nonrecombining DNA fragment), there is only one species history, whether that history is strictly bifurcating (i.e., a species tree) or involves reticulations, which may or may not obscure species relationships.
Figure 1.1 Species trees contain information on both the pattern (topology) and timing (branch lengths) of species diversification. This phylogenetic history can be inferred from the gene trees that are embedded within the species lineages, which may or may not be concordant with the species tree (e.g., the deep coalescence of gene lineages marked with the red dots). By incorporating a model of gene lineage coalescence (in addition to the models of nucleotide substitution), the phylogenetic history of species (i.e., the species tree) can be estimated, despite widespread incomplete lineage sorting (i.e., sequences from multiple individuals per species, three for this locus in this case, do not form monophyletic clades). (Illustration by John Megahan.)
The patterns of similarity and differences in the DNA sequences of organisms related by descent from common ancestors implicitly contain information about species relationships. That is, there is an intimate link between gene trees and the species tree in which they are embedded. This link means that gene trees are informative about species phylogenies, yet it is clear that a gene tree should not be equated with a species phylogeny since the evolutionary processes that determine the structure of gene trees differ from those governing species trees. The structure of a species tree is determined by the process of speciation, extinction, and in some cases, hybridization, whereas the gene tree structure reflects not only the proliferation and loss of species lineages but also the population genetic process of mutation and gene lineage coalescence within species lineages, and in some cases, the locus-specific effects of migration between species lineages.
Enormous attention has been dedicated to understanding the theoretical and computational challenges associated with estimating gene trees from molecular data, as well as the practical complications that arise with empirical investigations. For example, in addition to the development of very sophisticated methods for estimating a gene tree from DNA sequences (e.g., accommodating complex models of nucleotide evolution and evaluating the full probability of the data for a set of tree topologies and branch lengths; reviewed in Felsenstein 2004), the impact of various data properties on tree accuracy is also well studied (e.g., the number of base pairs analyzed and taxon sampling; Flynn et al. 2005; Graybeal 1998; Rannala et al. 1998; Rosenberg and Kumar 2001; Wiens 2003; Zwickl and Hillis 2002). In contrast, we are only beginning to understand the theoretical and computational challenges, as well as the practical complications of empirical data, when the target is to obtain an estimate of the species tree. For example, multiple processes may determine the relationship between species and their contained loci (e.g., gene lineage coalescence alone or in combination with gene flow). Moreover, the collection of possible bifurcating trees (i.e., the tree space) becomes enormous even for a moderate number of species. Even if only bifurcating processes are considered, and ignoring differences in branch lengths, there are approximately 2 × 10^6 trees for 10 taxa. The difficulties posed by such issues, as well as strategies for contending with these challenges, are discussed in the following sections that trace the steps from species tree estimation back to the collection of DNA sequence data.
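To make the size of tree space concrete, the count for 10 taxa can be reproduced with the standard double-factorial formulas for labeled bifurcating topologies: (2n − 5)!! unrooted and (2n − 3)!! rooted trees for n taxa. The sketch below is illustrative only. The function names are ours rather than the chapter's, and we assume the figure quoted in the text refers to unrooted topologies, since that is the count that comes to roughly two million for 10 taxa.

```python
def double_factorial(k: int) -> int:
    """Return k * (k - 2) * (k - 4) * ... down to 1 or 2; 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, labeled, bifurcating topologies: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    return double_factorial(2 * n_taxa - 5)


def num_rooted_trees(n_taxa: int) -> int:
    """Number of rooted, labeled, bifurcating topologies: (2n - 3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted tree")
    return double_factorial(2 * n_taxa - 3)


if __name__ == "__main__":
    # Show how quickly the number of candidate topologies grows.
    for n in (4, 6, 8, 10, 20):
        print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

Because the count grows super-exponentially, exhaustive evaluation of every topology is feasible only for very small numbers of taxa, which is why heuristic or sampling-based searches are generally needed.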
While much of the research on obtaining direct estimates of species trees has been driven by computational developments, these methodological changes do not represent the inception of new core phylogenetic concepts. The recent advances (paradoxically) provide a practical means of returning to the systematic tradition of estimating species relationships. Thus, in spite of the fact that estimating species trees involves a fundamental shift in how molecular data are used and interpreted, the target is still the phylogeny. Estimation of a species tree, in addition to putting the focus on the object of systematic interest, also provides a framework for studying the processes generating a set of contained gene trees because of the explicit distinction between the species tree and gene trees. For example, the discord among gene trees may be biologically meaningful (as opposed to being due to tree-building errors, for example; Jeffroy et al. 2006). The different gene trees may provide insights about the diversification process (e.g., the population size of the taxa relative to the divergence time separating speciation events, or the extent of gene flow among taxa), or whether species trees are meaningful if there is significant horizontal gene transfer, a question that requires empirical evaluation (e.g., Galtier and Daubin 2008).
1.2 THE RELATIONSHIP BETWEEN GENE TREES AND SPECIES TREES
Gene trees and species trees are different from one another for a variety of reasons. The most important of these is the possibility that evolutionary processes such as horizontal gene transfer, hybridization, gene duplication, or incomplete lineage sorting lead to differences in the underlying histories of each gene for a given species phylogeny. Understanding these evolutionary processes and their effect on the relationship between gene trees and species trees is thus a problem of central importance to the development of methods for estimating species phylogenies: the goal is estimation of species trees; the data available to do this come in the form of DNA sequences arising from the histories of individual genes. We must therefore strive to understand and effectively model the process by which sequence data arise on the individual gene trees, conditional on the overall species-level relationships.
The methods described and illustrated in this book incorporate one or more of the evolutionary processes mentioned above, and many of these models are common to several of the subsequent chapters on species tree estimation. For this reason, we will devote the next few sections to giving a relatively broad overview of the common models used to relate gene trees to species trees, with ample references to which the reader is directed to obtain a more detailed explanation. Section 1.2.1 defines the processes of horizontal gene transfer, gene duplication, hybridization, and incomplete lineage sorting, and briefly describes their effects on relationships between gene trees and species trees. Section 1.2.2 gives a more detailed description of the coalescent process because it is fundamental to several of the methods included in this book (e.g., Chapters 2, 4, 5, and 6). Section 1.3 then builds on this by describing methods for modeling nucleotide sequence evolution along gene trees.
1.2.1 Evolutionary Mechanisms for Gene Tree Discord
Maddison (1997) provides a very comprehensive description of the processes mentioned below, with explicit discussion of the effects of these processes on individual gene histories. Here we provide the following brief descriptions:
  • Horizontal gene transfer is a term used to describe a process by which genetic material is transferred from one species to another at a given point in time (thus corresponding to genetic exchange that occurs “horizontally” across a phylogeny), rather than from parent to offspring (which occurs “vertically” on a phylogeny). This could happen, for instance, when a vector such as a virus carries DNA from one species to another and this genetic material is subsequently integrated into the genome of the infected organism. Horizontal gene transfer events are known to occur commonly in the bacteria (Medigue et al. 1991; Syvanen 1994; Valdez and Pinero 1992). Horizontally transferred genes will, at least initially, be more closely related to the ancestors of the organism from which they were derived than to those in which they currently reside, thus leading to gene trees that differ from the species tree.
  • Gene duplication refers to the event that a copy of a particular gene is inserted into the genome, followed by the subsequent (and separate) evolution of the two copies. If a single copy of the gene is sampled from each organism, the sampling of a duplicated gene might result in the observation of a gene tree that differs from the species tree. Gene duplication events are prevalent in plants, fish, and insects.
  • Hybridization between species occurs when two distinct species interbreed, with the resulting formation of hybrid organisms that share some genetic material from each of the parental organisms. When hybridization occurs without formation of a new taxonomic lineage that is distinct from the parental lineages from which it was formed, the process is often referred to as introgression or introgressive hybridization. Hybridization is ubiquitous in nature, with current estimates that approximately 25% of plants and 10% of animals hybridize (Mallet 2007).
  • Incomplete lineage sorting occurs when multiple gene lineages persist through speciation events. Following a speciation event, some forms of the gene may be lost, while others are maintained and continue to evolve. This process is illustrated in Figure 1.2a, which shows a species tree for three taxa (outlined in bold, black lines) with several embedded gene trees (thinner, colored lines). For example, in the green gene tree, gene lineage C fails to find a most recent common ancestor with gene lineage B during time interval t, and instead finds a most recent common ancestor with gene lineage A above the root of the species tree. This leads to a gene tree that differs from the species tree (Fig. 1.2b). It is clear that the possibility of such events can result in gene trees that differ in substantial and important ways from the species tree. This process is commonly modeled by the coalescent.
Figure 1.2 Topology probabilities under the coalescent model for three-taxon trees. (a) The species tree is shown outlined in black. The time interval between the two speciation events is t, and should be interpreted in coalescent units (number of 2N generations). The four embedded trees are the four possible gene histories when deep coalescent events are allowed. (b) The four possible gene histories from (a) are shown separately, with their probabilities under the coalescent model given beneath. Note that the two gene histories in the first row are the same when only the topology is considered, so that the probability of this gene tree topology under the coalescent model is the sum of these two probabilities. Thus, there are only three distin...
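The probabilities referred to in the Figure 1.2 caption can be sketched from the standard three-taxon coalescent result. With one sampled lineage per species and an internal branch of length t in coalescent units, the two lineages entering that branch fail to coalesce within it with probability e^(-t). In that case all three lineages reach the ancestral population, where each of the three pairs is equally likely to coalesce first. The code below is a minimal illustration under those assumptions. The function names and dictionary keys are ours, not the chapter's, and the taxon labels follow the example in the text, in which B and C are sister species.

```python
import math


def three_taxon_history_probs(t: float) -> dict:
    """Probabilities of the four gene histories for species tree (A,(B,C)).

    t is the internal branch length in coalescent units (assumed >= 0).
    """
    if t < 0:
        raise ValueError("branch length t must be non-negative")
    p_deep = math.exp(-t)  # B and C lineages fail to coalesce within t
    return {
        "B,C coalesce within t": 1.0 - p_deep,
        "B,C coalesce first above root": p_deep / 3.0,
        "A,C coalesce first above root": p_deep / 3.0,
        "A,B coalesce first above root": p_deep / 3.0,
    }


def three_taxon_topology_probs(t: float) -> dict:
    """Collapse the four histories into the three distinct topologies."""
    h = three_taxon_history_probs(t)
    return {
        # Two histories share the topology that matches the species tree.
        "(A,(B,C))": h["B,C coalesce within t"]
        + h["B,C coalesce first above root"],
        "((A,C),B)": h["A,C coalesce first above root"],
        "((A,B),C)": h["A,B coalesce first above root"],
    }


if __name__ == "__main__":
    for t in (0.1, 0.5, 1.0, 2.0, 5.0):
        print(t, three_taxon_topology_probs(t))
```

Under this model the topology matching the species tree has probability 1 − (2/3)e^(-t), so short internal branches (small t) make discordant gene trees nearly as likely as the concordant one.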

Table of contents

  1. Cover
  2. Title Page
  3. Copyright
  4. PREFACE
  5. CONTRIBUTORS
  6. CHAPTER 1 ESTIMATING SPECIES TREES: AN INTRODUCTION TO CONCEPTS AND MODELS
  7. CHAPTER 2 BAYESIAN ESTIMATION OF SPECIES TREES: A PRACTICAL GUIDE TO OPTIMAL SAMPLING AND ANALYSIS
  8. CHAPTER 3 RECONSTRUCTING CONCORDANCE TREES AND TESTING THE COALESCENT MODEL FROM GENOME-WIDE DATA SETS
  9. CHAPTER 4 PROBABILITIES OF GENE TREE TOPOLOGIES WITH INTRASPECIFIC SAMPLING GIVEN A SPECIES TREE
  10. CHAPTER 5 INFERENCE OF PARSIMONIOUS SPECIES TREE FROM MULTILOCUS DATA BY MINIMIZING DEEP COALESCENCES
  11. CHAPTER 6 ACCOMMODATING HYBRIDIZATION IN A MULTILOCUS PHYLOGENETIC FRAMEWORK
  12. CHAPTER 7 THE INFLUENCE OF HYBRID ZONES ON SPECIES TREE INFERENCE IN MANAKINS
  13. CHAPTER 8 SUMMARIZING GENE TREE INCONGRUENCE AT MULTIPLE PHYLOGENETIC DEPTHS
  14. CHAPTER 9 SPECIES TREE ESTIMATION FOR COMPLEX DIVERGENCE HISTORIES: A CASE STUDY IN NEODIPRION SAWFLIES
  15. CHAPTER 10 SAMPLING STRATEGIES FOR SPECIES TREE ESTIMATION
  16. CHAPTER 11 DEVELOPING NUCLEAR SEQUENCES FOR SPECIES TREE ESTIMATION IN NONMODEL ORGANISMS: INSIGHTS FROM A CASE STUDY OF BOTTAE’S POCKET GOPHER, THOMOMYS BOTTAE
  17. CHAPTER 12 ESTIMATING SPECIES RELATIONSHIPS AND TAXON DISTINCTIVENESS IN SISTRURUS RATTLESNAKES USING MULTILOCUS DATA
  18. INDEX
