Phylogenetic Resolution Across Data Partitions
By dividing our data into exonic (Supplemental Figure 1A, 4.7 kb), intronic (Supplemental Figure 1B, 8.6 kb), and noncoding (Supplemental Figure 1C, 9.5 kb, comprising all intronic plus UTR and other noncoding sequence) partitions, we can begin to see which topologies are well supported by which types of data. For example, the Indriidae+Lepilemuridae+Cheirogaleidae relationship (node 8) is well supported by the combined nuclear data (MP bootstrap 94, ML bootstrap 99, and Bayesian PP 1.0) and is resolved, yet not well supported, by separate analyses of the exonic, intronic, and noncoding data (MP bootstraps 77, 70, and 75; ML bootstraps 76, 90, and 97, respectively). The exonic partition yields the greatest reduction in overall branch support compared to the total nuclear dataset, indicating a substantial reduction in the number of informative sites. Some terminal nodes also show differing topologies with different data partitions (see node 15 in Supplemental Figure 1A and nodes 21-25 in Supplemental Figures 1A-C).
Estimating Concordance among Gene Trees
We used a Bayesian gene-to-tree map approach to assess the level of concordance of our individual nuclear gene trees relative to the total nuclear concatenated phylogenetic tree (Ane et al. 2007). By placing a prior distribution on the number of distinct gene trees that should exist across all genes (controlled by the parameter α), the posterior distribution of trees from one single-gene Bayesian analysis can influence the posterior distribution of trees in another single-gene analysis. With a prior of α=0, all genes are expected to give the same tree; a prior of α=∞ implies that every gene has a different tree. An important component of this analysis is the posterior distribution of clade concordance, which measures the number of genes that support a particular branch. High levels of clade concordance indicate that a branch in the total evidence tree is resolved because of phylogenetic information present in multiple genes and not through the overwhelming influence of a single gene partition or a small number of gene partitions.
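As a rough illustration of what clade concordance measures, the sketch below counts the proportion of genes whose tree contains a given clade. It is a deliberate simplification: it uses one point-estimate tree per gene, whereas the method of Ane et al. (2007) integrates over the joint posterior of gene-to-tree maps. The gene names and clade sets are hypothetical toy values, not results from this study.

```python
def concordance_factor(clade, gene_tree_clades):
    """Return (proportion of genes whose tree contains `clade`, their names).

    `gene_tree_clades` maps a gene name to the set of clades (each a
    frozenset of taxon names) present in that gene's tree.
    """
    target = frozenset(clade)
    supporting = sorted(
        gene for gene, clades in gene_tree_clades.items() if target in clades
    )
    return len(supporting) / len(gene_tree_clades), supporting


# Hypothetical toy input: four genes, each reduced to the clades it resolves.
ILC = frozenset({"Indriidae", "Lepilemuridae", "Cheirogaleidae"})
gene_tree_clades = {
    "gene_A": {ILC, frozenset({"Indriidae", "Lepilemuridae"})},
    "gene_B": {ILC, frozenset({"Cheirogaleidae", "Indriidae"})},
    "gene_C": {ILC, frozenset({"Cheirogaleidae", "Lepilemuridae"})},
    "gene_D": {frozenset({"Lemuridae", "Indriidae"})},
}

cf_clade, genes_clade = concordance_factor(ILC, gene_tree_clades)
cf_pair, genes_pair = concordance_factor(
    {"Indriidae", "Lepilemuridae"}, gene_tree_clades
)
```

In this toy case the three-family clade is found in three of four gene trees even though the genes disagree about its internal branching, which is the kind of pattern a clade-level concordance measure is meant to expose.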
We performed the Bayesian analyses described in Ane et al. (2007) on a 13-gene nuclear subset of our data. This method requires equal taxon sampling in each individual dataset. Five of the datasets generated from previously published primers were missing key taxa and were excluded from the analysis (FGA, SLC11A1, RAG1, TTR, and VWF). Because relationships among some of the more terminal branches varied substantially (especially within Eulemur), we pruned the following taxa from the datasets: Eulemur coronatus, E. fulvus albifrons, E. f. collaris, E. f. rufus, E. f. sanfordi, E. macaco flavifrons, E. m. macaco, E. mongoz, Microcebus murinus, and Propithecus tattersalli. Pruned single-gene datasets were analyzed in MrBayes as described above to generate a posterior tree distribution. These posterior distributions were subsequently analyzed using BUCKy v1.1 (Larget 2006). We performed analyses using values of α=0.1 (where the median of the prior probability distribution is 1 distinct gene tree), α=1 (where the median is 3 trees), and α=10 (where the median is 9 trees). All BUCKy MCMCMC analyses were run for 1 million generations following a burn-in of 100,000 generations. Four chains were run (3 heated, 1 cold), with an exchange attempted every 100 generations.
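The relationship between α and the prior median number of distinct gene trees can be sketched with a short calculation. The code below assumes a plain Dirichlet process (Chinese restaurant process) prior over 13 genes and ignores the finite number of possible topologies, so it approximates the prior rather than reimplementing BUCKy. The function names are ours and are not part of any BUCKy interface.

```python
def prior_distinct_tree_counts(alpha, n_genes):
    """Return a list p where p[k] = P(k distinct gene trees among n_genes).

    Under a Chinese restaurant process, gene number i+1 (with i genes
    already placed) takes a new tree with probability alpha / (alpha + i).
    """
    probs = [0.0] * (n_genes + 1)
    probs[0] = 1.0  # no genes placed yet, so zero distinct trees
    for i in range(n_genes):
        p_new = alpha / (alpha + i)
        updated = [0.0] * (n_genes + 1)
        for k in range(i + 1):
            updated[k] += probs[k] * (1.0 - p_new)  # gene reuses a tree
            updated[k + 1] += probs[k] * p_new      # gene gets a new tree
        probs = updated
    return probs


def prior_median(alpha, n_genes):
    """Smallest k whose cumulative prior probability reaches 0.5."""
    cumulative = 0.0
    for k, p in enumerate(prior_distinct_tree_counts(alpha, n_genes)):
        cumulative += p
        if cumulative >= 0.5:
            return k
    return n_genes


medians = {alpha: prior_median(alpha, 13) for alpha in (0.1, 1, 10)}
```

Under these assumptions the medians for 13 genes are 1, 3, and 9 for α=0.1, 1, and 10, matching the values quoted above.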
Our concordance analyses identify eight loci whose joint posterior densities are focused almost exclusively on the primary concordance topology (Supplemental Table 4). A ninth locus (CFTR-pair B) also places a modest PP on the primary concordance topology (PP=0.4173), only slightly less than that of its highest-PP tree, which places the Cheirogaleidae as sister to the Indriidae (PP=0.4832). Interestingly, single-locus Bayesian analysis of a tenth locus (ADORA) resolves a tree nearly identical to the Bayesian primary concordance tree except for the paraphyletic resolution of the genus Microcebus (Supplemental Table 4). However, our concordance analysis shifts the joint posterior density of this locus towards a topology that places the Indriidae and Lepilemuridae as sister taxa. Nonetheless, the joint posterior densities of these latter two loci are consistent with the primary concordance tree in placing the Cheirogaleidae, Indriidae, and Lepilemuridae in a clade, regardless of its internal branching structure.
Two of the five loci excluded from the concordance analyses because of missing taxa (SLC11A1 and TTR) lack sequence data from too many taxa to draw conclusions about relationships among the Cheirogaleidae, Indriidae, Lemuridae, and Lepilemuridae. However, the remaining three loci (FGA, RAG1, and VWF) are consistent in placing the Cheirogaleidae, Indriidae, and Lepilemuridae in a clade, as do the genes used in the concordance analyses that support the primary concordance tree.
As an additional assessment of the influence that a single gene partition may have on the resolution of the total evidence tree, we conducted a series of analyses in which a single partition was removed ('leave-one-out' test). Our complete dataset was first run through the parsimony analysis as previously described. Each partition was then removed individually and the parsimony analysis was run again. We recorded all parsimony bootstrap values (1000 replicates) for each node to verify that support for the nodes did not change substantially when one partition was removed. With the exception of nodes 23-25, all permutations yielded trees with an identical topology and similar measures of branch support (Supplemental Table 3). Only the support values for nodes 8 and 18 (see Figure 1) changed substantially after removal of a single partition, consistent with our Bayesian concordance results indicating that certain loci contribute more to the branch support for these nodes than do others.
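The bookkeeping behind the leave-one-out test can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the partition names, sequences, and bootstrap values are hypothetical, and the 10-point threshold for a "substantial" change is an assumption, since no numerical cutoff is stated. The parsimony bootstrap itself would be run in external software on each reduced alignment.

```python
def leave_one_out(partitions):
    """Yield (omitted_name, concatenated_alignment) for each partition.

    `partitions` maps a partition name to {taxon: aligned sequence}.
    Every partition must contain the same taxa, and at least two
    partitions are required.
    """
    for omitted in partitions:
        kept = [name for name in partitions if name != omitted]
        taxa = partitions[kept[0]].keys()
        concatenated = {
            taxon: "".join(partitions[name][taxon] for name in kept)
            for taxon in taxa
        }
        yield omitted, concatenated


def flag_changed_nodes(full_support, reduced_support, threshold=10):
    """Nodes whose bootstrap support shifts by at least `threshold` points.

    A node missing from the reduced analysis is treated as support 0.
    The default threshold is an illustrative assumption.
    """
    return sorted(
        node
        for node, value in full_support.items()
        if abs(value - reduced_support.get(node, 0)) >= threshold
    )


# Hypothetical toy partitions (names and sequences are placeholders).
partitions = {
    "locus1": {"taxonA": "ACGT", "taxonB": "ACGA"},
    "locus2": {"taxonA": "GG", "taxonB": "GC"},
    "locus3": {"taxonA": "T", "taxonB": "A"},
}
reduced_datasets = dict(leave_one_out(partitions))

# Hypothetical bootstrap values keyed by node number.
changed = flag_changed_nodes({5: 100, 8: 95, 18: 90}, {5: 100, 8: 70, 18: 88})
```

Each entry of `reduced_datasets` is the concatenated alignment with one partition omitted, ready to be written out for a separate bootstrap run.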
PCR and Sequencing
PCR assays were conducted in a final volume of 10 or 20µl using 1µl template DNA (approximately 50-150ng DNA), 25µM each dNTP (Genesee, San Diego, CA), 1µM each primer, and 0.625U Platinum™ Taq High Fidelity (Invitrogen, Carlsbad, CA) in a standard 1x reaction buffer. Typical amplification conditions were as follows: an initial denaturation of 2 min at 94°C, followed by 35 cycles of 30s at 94°C, 30s at 55°C, and 45s at 68°C. A final extension of 7 min was performed at 68°C. See Supplemental Table 1 for conditions specific to each primer pair and Supplemental Table 5 for amplification results. All cycling was performed on MJ Research, Inc. (Waltham, MA) thermocyclers.
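For readers who want to encode the typical cycling program, the sketch below represents it as data and sums the programmed hold times. It covers only the typical conditions given above (primer-specific conditions are in Supplemental Table 1) and ignores ramp time between temperatures, so the real run time on a thermocycler will be longer. The `Step` class and function name are illustrative, not part of any instrument software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: int   # hold temperature in degrees Celsius
    seconds: int  # hold time in seconds


INITIAL = Step("initial denaturation", 94, 120)
CYCLE = (
    Step("denaturation", 94, 30),
    Step("annealing", 55, 30),
    Step("extension", 68, 45),
)
N_CYCLES = 35
FINAL = Step("final extension", 68, 7 * 60)


def total_hold_seconds(initial, cycle, n_cycles, final):
    """Sum of programmed hold times, excluding temperature ramping."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds


total = total_hold_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
```

With these values the programmed holds add up to 4215 s, or about 70 minutes.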
PCR products were directly sequenced using both forward and reverse PCR primers and BigDye® Terminator v3.1 (Applied Biosystems, Foster City, CA). Prior to sequencing, 8µl PCR product (2/5 total PCR volume) was treated with 1.5U exonuclease I and 0.3U shrimp alkaline phosphatase to eliminate deoxynucleotide triphosphates and excess single-stranded DNA. These reactions were incubated at 37°C for 5 minutes, followed by 72°C for 15 minutes to inactivate the enzymes. Alternatively, some PCR products were purified using DNA Clean & Concentrator™-5 columns (Zymo Research, Orange, CA). Cycle sequencing was performed in a total volume of 10µl including 5µl Exo/SAP-treated product (or an equivalent amount if column purified), 2µM primer, 0.5µl BDv3.1, and water to 10µl. Cycle sequencing was carried out for 25 cycles of 95°C for 10 sec, 55°C for 5 sec, and 60°C for 2 min, followed by a final hold at 10°C. Fluorescent traces were analyzed using an Applied Biosystems 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA).
PCR products failing to produce high-quality sequence by direct sequencing were ligated using the pGEM-T Easy Vector System II (Promega, Madison, WI) with approximately 20-40 ng PCR product as insert. Transformations were conducted as recommended by the manufacturer (Promega, Madison, WI). At least three independent colonies were picked, and inserts were verified to be the correct size by restriction digest. Miniprep DNA was isolated using the QIAprep Spin Miniprep Kit (Qiagen, Valencia, CA). Approximately 100-300 ng plasmid DNA was used in each sequencing reaction with 0.5µl BDv3.1, 2µM primer (primer 780: GTAAAACGACGGCCAGT or primer 545: CAGGAAACAGCTATGAC), and 1µl BD buffer in a total volume of 10µl. Cycle sequencing conditions were as above except that the 60°C extension was conducted for 4 min instead of 2 min.
Sequence quality was assessed using either PHRED/PHRAP/CONSED (Gordon et al. 1998) or Sequencher (Gene Codes Corporation, Ann Arbor, MI). Base calls with a PHRED quality score of less than 30 were scored as Ns. A final sequence for each taxon was obtained by comparing forward and reverse PCR product sequences and/or aligning all plasmid insert sequences. All sequences were deposited in GenBank; accession numbers are listed in Supplemental Table 6. Outgroup sequences (human, chimpanzee, macaque) were obtained using PSL map (http://hgdownload.cse.ucsc.edu/downloads.html) based on the human coordinates.
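The quality-masking rule stated above (base calls with a PHRED score below 30 scored as N) can be sketched in a few lines. A PHRED score Q corresponds to an estimated error probability of 10^(-Q/10), so a cutoff of 30 retains calls with an estimated error rate of at most 1 in 1000. The function below is an illustrative stand-in, not the PHRED/CONSED or Sequencher implementation, and the example read is made up.

```python
def mask_low_quality(bases, qualities, min_quality=30):
    """Replace base calls whose PHRED score is below `min_quality` with N."""
    if len(bases) != len(qualities):
        raise ValueError("bases and qualities must have the same length")
    return "".join(
        base if quality >= min_quality else "N"
        for base, quality in zip(bases, qualities)
    )


# Hypothetical six-base read with per-base PHRED scores.
masked = mask_low_quality("ACGTAC", [45, 29, 30, 12, 38, 5])
```

A score of exactly 30 is kept, because only scores of less than 30 are masked.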
SUPPLEMENTAL FIGURE LEGENDS
Supplemental Figure 1: Lemur Partitioned Bayesian Phylograms
Phylograms for the A) exonic (4.7 kb), B) intronic (8.6 kb), and C) noncoding (9.5 kb) partitions are shown. The nodes are numbered according to those in Figure 1, with filled black circles indicating nodes with strong support (PP=1.0, MP bootstrap >90%, ML bootstrap >90%). Node numbers in red indicate nodes that show strong support in the combined nuclear dataset (Figure 1) but have dropped below the threshold for the particular partition. See Table 3 for branch support values for each partitioned dataset. Node 23 was not resolved for the exonic partition (A), and nodes 24 and 25 were not resolved for the intronic (B) and noncoding (C) partitions; these nodes have been removed from the corresponding panels.
Supplemental Figure 2: Divergence Estimates in Present Study Compared to Previous Analyses
The BEAST tree (Figure 3) is shown with previously published divergence estimates superimposed. Gray bars span the 95% highest posterior density of divergence time estimates from our present study. Green bars span the divergence times estimated by Yoder and Yang (2004), blue bars span the ranges from Roos et al. (2004), and black bars span the ranges estimated by Poux et al. (2005). The colored numbers in brackets give the numerical value of each range, and a short line perpendicular to each bar marks the date calculated in the corresponding study. Brackets from other studies that do not overlap nodes from the present study have been extended beyond the divergence range by dashed lines.
REFERENCES
Ane, C., Larget, B., Baum, D.A., Smith, S.D., and Rokas, A. 2007. Bayesian estimation of concordance among gene trees. Mol Biol Evol 24: 412-426.
Deinard, A. and Smith, D.G. 2001. Phylogenetic relationships among the macaques: evidence from the nuclear locus NRAMP1. J Hum Evol 41: 45-59.
Flynn, J.J. and Nedbal, M.A. 1998. Phylogeny of the Carnivora (Mammalia): congruence vs incompatibility among multiple data sets. Mol Phylogenet Evol 9: 414-426.
Gordon, D., Abajian, C., and Green, P. 1998. Consed: a graphical tool for sequence finishing. Genome Res 8: 195-202.
Heckman, K.L., Mariani, C.L., Rasoloarison, R., and Yoder, A.D. 2007. Multiple nuclear loci reveal patterns of incomplete lineage sorting and complex species history within western mouse lemurs (Microcebus). Mol Phylogenet Evol 43: 353-367.
Irwin, D.M., Kocher, T.D., and Wilson, A.C. 1991. Evolution of the cytochrome b gene of mammals. J Mol Evol 32: 128-144.
Larget, B. 2006. Bayesian untangling of Concordance Knots (BUCKy), version 1.1. Department of Statistics, University of Wisconsin.
Mancuso, D.J., Tuley, E.A., Westfield, L.A., Worrall, N.K., Shelton-Inloes, B.B., Sorace, J.M., Alevy, Y.G., and Sadler, J.E. 1989. Structure of the gene for human von Willebrand factor. J Biol Chem 264: 19514-19527.
Murphy, W.J., Eizirik, E., O'Brien, S.J., Madsen, O., Scally, M., Douady, C.J., Teeling, E., Ryder, O.A., Stanhope, M.J., de Jong, W.W. et al. 2001. Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science 294: 2348-2351.
Poux, C., Madsen, O., Marquard, E., Vieites, D.R., de Jong, W.W., and Vences, M. 2005. Asynchronous colonization of Madagascar by the four endemic clades of primates, tenrecs, carnivores, and rodents as inferred from nuclear genes. Syst Biol 54: 719-730.
Roos, C., Schmitz, J., and Zischler, H. 2004. Primate jumping genes elucidate strepsirrhine phylogeny. Proc Natl Acad Sci U S A 101: 10650-10654.
Stanhope, M.J., Czelusniak, J., Si, J.S., Nickerson, J., and Goodman, M. 1992. A molecular perspective on mammalian evolution from the gene encoding interphotoreceptor retinoid binding protein, with convincing evidence for bat monophyly. Mol Phylogenet Evol 1: 148-160.
Yoder, A.D. and Yang, Z. 2004. Divergence dates for Malagasy lemurs estimated from multiple gene loci: geological and evolutionary context. Mol Ecol 13: 757-773.