SUPPLEMENTARY MATERIALS – DETAILED METHODS AND RESULTS

Detailed Methods

Chloroplast DNA assembly and SNP detection

An in silico assembly of cpDNA was conducted using the libraries of Endiandra globosa to construct the chloroplast genome. Each library was first assembled de novo and independently in Velvet 1.2.1 [1] to aid assembly of the contigs into a full chloroplast genome. The paired reads were assembled into contigs using the default parameters, except that the hash length (k-mer length) was set to 81. The next steps were similar to the approach previously outlined in McPherson et al. [2] and Van der Merwe et al. [3].

To isolate chloroplast sequences from the contigs, each set of de novo contigs was BLASTed against a database of whole chloroplast genomes (168 seed plants downloaded from NCBI, http://www.ncbi.nlm.nih.gov/, 28 May 2012) in CLC Bio Genomics Workbench 7.5 (http://www.clcbio.com) (CLC) with default parameters. All contigs with an E-value of zero were exported into Geneious Pro v6.1.6 (Biomatters Ltd., http://www.geneious.com) for further assembly using the approach of McPherson et al. [2] and Van der Merwe et al. [3]. Because no full genome of a close relative was available as a reference, a scaffold was generated by mapping the contigs to the chloroplast genome sequence of Liriodendron tulipifera (NC_008326.1). The gaps between the remaining contigs, including IR junctions, were resolved by manual editing. Quality-trimmed reads available for E. globosa were mapped onto the sequence to verify that the assembly was correct. The cpDNA of Endiandra discolor was then constructed by mapping the available reads to this reference.

To simplify the SNP detection analysis, each cpDNA was maintained as one continuous sequence with one copy of the inverted repeat (IR) region removed. Pooling of eight individuals during the preparation of the shotgun library enabled the detection of within- and between-population SNPs.
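
The E-value filter used above to isolate chloroplast contigs can be illustrated with a minimal Python sketch. The original analysis ran BLAST inside CLC Genomics Workbench; this sketch instead assumes the hits are available as standard 12-column tabular BLAST+ output (-outfmt 6), in which the E-value is the eleventh column. The function name, contig names and example rows are hypothetical.

```python
from typing import Iterable, Set

# In 12-column tabular BLAST+ output (-outfmt 6) the fields are:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
QUERY_COLUMN = 0
EVALUE_COLUMN = 10


def contigs_with_zero_evalue(blast_rows: Iterable[str]) -> Set[str]:
    """Return the IDs of query contigs that have at least one hit with E-value 0."""
    keep: Set[str] = set()
    for row in blast_rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.split("\t")
        if float(fields[EVALUE_COLUMN]) == 0.0:
            keep.add(fields[QUERY_COLUMN])
    return keep


# Hypothetical hits for three de novo contigs (values invented for illustration).
example_rows = [
    "contig_1\tNC_008326.1\t98.5\t1200\t18\t0\t1\t1200\t5000\t6199\t0.0\t2100",
    "contig_2\tNC_008326.1\t85.0\t90\t13\t0\t1\t90\t700\t789\t3e-20\t95.3",
    "contig_3\tNC_008326.1\t99.1\t4500\t40\t0\t1\t4500\t90000\t94499\t0.0\t8000",
]

selected = contigs_with_zero_evalue(example_rows)
print(sorted(selected))
```

Only contigs whose best evidence is an exact zero E-value are kept, so short or weakly matching contigs such as the hypothetical contig_2 are excluded before manual assembly.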
SNP detection was conducted for each Endiandra species using an approach similar to that of McPherson et al. [2], Van der Merwe et al. [3] and Rossetto et al. [4]. Trimmed reads were mapped onto the new cpDNAs with all default settings except for a length fraction of 0.9 and a similarity fraction of 0.8. Within-population SNPs were detected using Basic Variant Detection in CLC with the default parameters except ploidy = 2, minimum coverage = 40 and minimum frequency = 10%. SNP comparison between populations was conducted with the parameters ploidy = 1, minimum coverage = 20 and minimum frequency = 10%; the resulting alleles labelled as homozygous were retained as the between-population SNPs. All SNPs were manually checked and only those with paired reads for both variants were accepted. SNPs and low-coverage areas (defined as coverage of 10x or less) were then annotated, and each mapping consensus file, along with its annotations, was imported into Geneious for alignment following the approach of Van der Merwe et al. [3].

For each species we calculated genomic diversities and between-population genomic distances as summary statistics of chloroplast genome variation, following Rossetto et al. [4]. Within-population genomic diversity was calculated using SNP counts, total number of nucleotide sites and sample size per population [5]. Within-species genomic diversity was calculated in the same way, except using total SNP counts, total number of nucleotide sites and sample size per species. Between-population genomic distance was calculated as the proportion of nucleotide sites at which the consensus sequences of the two populations differed, following Nei and Kumar [6].

Detailed Results

Sequencing, assembly and SNP detection

Using a similar approach to McPherson et al. [2], we were able to assemble the chloroplast genomes of Endiandra globosa (158,585 bp) and Endiandra discolor (158,567 bp) from Illumina 150 bp paired-end sequencing data with an average coverage of 172x and 100x respectively. The lengths of the IR regions were found to be 25,464 bp and 25,466 bp respectively. A total of eight whole-genome shotgun sequencing libraries representing two Endiandra species produced approximately 2.23 Gbp of sequence data per library. Table S1 summarizes the sequencing outcomes for each library. On average for E. globosa and E. discolor, the number of reads after initial filtering and before trimming was 16,238,070 (16,129,183 after trimming) and 14,751,390 (14,662,281 after trimming) respectively, with an average read length after trimming of 144.33 bp and 144.78 bp respectively. In each library, approximately one percent of the total number of reads was estimated to be chloroplast (Table S1). The average phred score was 37 for both E. globosa and E. discolor.

De novo assembly using Velvet produced an average of 362 contigs (Table S2). Of these, an average of 31 Velvet contigs had an E-value of zero when a BLASTn search against a whole chloroplast genome database was performed. A total of 122 overlapping contigs were used for whole genome assembly. The final cpDNA for each Endiandra species contained a gap, identified as a non-coding region between trnT and trnL, with a length ranging from 300 to 800 bp. The average number of chloroplast reads for E. globosa and E. discolor was 191,543 and 110,983 respectively, with an average minimum coverage of 14x and 10x respectively.

cpDNA diversity within two Endiandra species

Table S3 shows within-population diversity across all sites tested for E. globosa and E. discolor. E. discolor generally shows higher diversity (with Coastal Byron in NNSW being the most diverse population) than E. globosa (with Hogan’s Scrub in NNSW being the least diverse population), for which AWT populations are more diverse than NNSW ones.

Table S1: A summary of the raw reads, quality-trimmed reads and variant detection mapping of each of the Illumina sequence libraries used (CB=Coastal Byron, HC=Harvey Creek, HS=Hogan’s Scrub, T=Tcupala, BS=Big Scrub, Blue=Bluewater Orange Track and Keelbottom Ck).

                                                  Endiandra globosa                           Endiandra discolor
                                                  CB         HC         HS          T         BS         MS         CB       Blue
Number of Gbp                                   0.41       3.13       2.70       3.14       2.96       2.56       2.57       0.82
Number of reads                            5,468,644 20,747,342 17,907,910 20,828,384 19,617,466 16,964,266 17,010,214  5,413,612
Avg. read length                                 151        151        151        151        151        151        151        151
Number of reads after trim                 5,437,138 20,634,335 17,754,914 20,690,344 19,506,696 16,860,108 16,899,656  5,382,664
Percentage trimmed                             99.43      99.46      99.15      99.34          1          1          1          1
Avg. length after trim                        144.65     143.95     144.65     144.05     144.85     144.95     144.65     144.65
Average coverage of SNP mapping                   62         85          *         76        127         71          *          *
Percentage of paired reads mapped             0.0097     0.0098     0.0207     0.0069      0.013     0.0051     0.0044     0.0112
Mean read length                              147.38     146.58     146.72     145.99     146.62     145.59     145.23     146.96
Minimum coverage without the gap region            6          8         34          9          1         17         15          5
Maximum coverage                                 156        516        742        527        714        514        511        153
Average coverage                                46.9     180.35      333.4     128.03     227.06       77.4      66.83      54.28
Standard deviation                             37.94     107.21      66.68      46.62      48.52      22.31      19.53      16.57

*No SNP found

Table S2: Summary of de novo assemblies and BLASTn results for E. globosa Illumina reads.

                                 CB        HC        HS         T
no. of contigs                  449       385       273       343
mean contig length (bp)       1,693     3,712     5,149     3,643
longest contig (bp)          14,548    31,454    68,645    23,601
N50                             214       239       222       201
number of cp contigs             23        27        42        30
longest cp contig (bp)       14,628    11,534    10,520    17,716
sum of cp contigs (bp)      110,500   113,663   100,227   123,936

Table S3: Within-population SNPs across both Endiandra species (minimum frequency > 10%).

                            Endiandra globosa       Endiandra discolor
Population                  HS    CB     T    HC    BS    MS    CB  Blue
Within population SNPs      29    63   116   100   158   189   282   126

Fig. S1: Biodiverse outputs comparing a) richness of fleshy-fruited species, b) richness of species with fleshy fruits >30 mm and c) weighted endemism (WE) across all 2306 rainforest woody species for 50 km × 50 km grid cells. The Tropic of Capricorn line is included for reference.

References

1. Zerbino DR, Birney E. 2008 Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821-829. (doi:10.1101/gr.074492.107)
2. McPherson H, Van der Merwe M, Delaney SK, Edwards MA, Henry RJ, McIntosh E, Rymer PD, Milner ML, Siow J. 2013 Capturing chloroplast variation for molecular ecology studies: a simple next generation sequencing approach applied to a rainforest tree. BMC Ecol. 13, 8. (doi:10.1186/1472-6785-13-8)
3. Van der Merwe M, McPherson H, Siow J, Rossetto M. 2014 Next-Gen phylogeography of rainforest trees: exploring landscape-level cpDNA variation from whole-genome sequencing. Mol. Ecol. Resour. 14, 199-208. (doi:10.1111/1755-0998.12176)
4. Rossetto M, McPherson H, Siow J, Kooyman R, van der Merwe M, Wilson PD. 2015 Where did all the trees come from? A novel multidisciplinary approach reveals the impacts of biogeographic history and functional diversity on rain forest assembly. J. Biogeogr. (doi:10.1111/jbi.12571)
5. Nei M, Li WH. 1979 Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. USA 76, 5269-5273.
6. Nei M, Kumar S. 2000 Molecular evolution and phylogenetics. New York: Oxford University Press.
7. Kooyman R, Rossetto M, Sauquet H, Laffan SW. 2013 Landscape patterns in rainforest phylogenetic signal: isolated islands of refugia or structured continental distributions? PLoS ONE 8, e80685. (doi:10.1371/journal.pone.0080685)
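
As a supplementary illustration of the between-population genomic distance described in the Detailed Methods (the proportion of nucleotide sites at which two population consensus sequences differ), a minimal Python sketch is given here. It assumes that the two consensus sequences are already aligned and of equal length. The function name `genomic_distance`, the toy sequences and the choice to skip sites containing a gap or an N are assumptions made for this illustration and are not taken from the original analysis.

```python
def genomic_distance(seq_a: str, seq_b: str, skip: str = "-N") -> float:
    """Proportion of compared sites at which two aligned sequences differ.

    Sites where either sequence has a character listed in `skip`
    (gap or N by default, an assumption of this sketch) are not compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue  # ignore sites without a confident base in both sequences
        compared += 1
        if a != b:
            differing += 1

    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Toy consensus sequences for two hypothetical populations (one differing site in ten).
pop_1 = "ACGTACGTAC"
pop_2 = "ACGTTCGTAC"
distance = genomic_distance(pop_1, pop_2)
print(distance)
```

For whole chloroplast consensus sequences the same function would be applied to each pair of populations in the alignment, giving one distance per pair.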
