SUPPLEMENTARY MATERIALS – DETAILED METHODS AND RESULTS
Detailed Methods
Chloroplast DNA assembly and SNP detection
An in silico assembly of cpDNA was conducted using the libraries of Endiandra globosa to
construct the chloroplast genome. Each library was first assembled de novo and
independently in Velvet 1.2.1 [1] to aid assembly of the contigs into a full chloroplast
genome. The paired reads were assembled into contigs using the default parameters, except
that the hash length (k-mer length) was set to 81.
The next steps were similar to the approach previously outlined in McPherson et al. [2] and
Van der Merwe et al. [3]. To isolate chloroplast sequences, each set of de novo contigs was
BLASTed against a database of whole chloroplast genomes (168 seed plants downloaded from
NCBI, http://www.ncbi.nlm.nih.gov/, on 28 May 2012) in CLC Bio Genomics Workbench 7.5
(http://www.clcbio.com) (CLC) with default parameters. All contigs with an E-value of zero
were exported into Geneious Pro v6.1.6 (Biomatters Ltd., http://www.geneious.com) for
further assembly using the approach of McPherson et al. [2] and Van der Merwe et al. [3].
Because no closely related full genome reference was available, a scaffold was generated by
mapping the contigs to the chloroplast genome sequence of Liriodendron tulipifera
(NC_008326.1). The gaps between the remaining contigs, including IR junctions, were closed
by manual editing. Quality-trimmed reads available for E. globosa were mapped onto the
sequence to verify that the assembly was correct. The cpDNA of Endiandra discolor was then
constructed by mapping the available reads to this reference. To simplify the SNP detection
analysis, each cpDNA was maintained as one continuous sequence with one copy of the
inverted repeat (IR) region removed.
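As a rough illustration of the contig selection step, the sketch below keeps only contigs that have a BLAST hit with an E-value of exactly zero. It assumes the hits have been exported as tab-separated text in the standard 12-column BLAST+ tabular layout (E-value in the 11th column). The analysis described here used the BLAST tool within CLC, so the file layout, the function name and the example rows are illustrative assumptions only.

```python
import csv
import io


def contigs_with_zero_evalue(blast_tabular_text):
    """Return query (contig) IDs that have at least one hit with E-value 0.

    Assumes the 12-column tabular layout: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    kept = []
    seen = set()
    reader = csv.reader(io.StringIO(blast_tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        contig_id = row[0]
        evalue = float(row[10])
        if evalue == 0.0 and contig_id not in seen:
            seen.add(contig_id)
            kept.append(contig_id)
    return kept


# Hypothetical hits against the Liriodendron tulipifera reference.
example_hits = "\n".join([
    "contig_1\tNC_008326.1\t98.5\t5000\t75\t0\t1\t5000\t100\t5099\t0.0\t9000",
    "contig_2\tNC_008326.1\t85.0\t200\t30\t0\t1\t200\t400\t599\t1e-50\t200",
    "contig_3\tNC_008326.1\t99.0\t12000\t120\t0\t1\t12000\t300\t12299\t0.0\t21000",
])

selected = contigs_with_zero_evalue(example_hits)
print(selected)
```

In this toy input only contig_1 and contig_3 are retained, because contig_2 has a small but non-zero E-value.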
Pooling of eight individuals during preparation of the shotgun library enabled the
detection of within- and between-population SNPs. SNP detection was conducted for each
Endiandra species using an approach similar to that of McPherson et al. [2], Van der Merwe
et al. [3] and Rossetto et al. [4]. Trimmed reads were mapped onto the new cpDNAs with all
default settings except for a length fraction of 0.9 and a similarity fraction of 0.8.
Within-population SNPs were detected using Basic Variant Detection in CLC, with the
default parameters except ploidy=2, minimum coverage=40 and minimum frequency=10%.
Between-population SNP comparison was conducted with the parameters ploidy=1, minimum
coverage=20 and minimum frequency=10%; the resulting alleles labelled as homozygous were
retained as the between-population SNPs.
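The within-population thresholds can be illustrated with a simplified per-site filter. This is a minimal sketch and not the CLC Basic Variant Detection algorithm: it assumes per-site allele read counts are already available, and the function names and example counts are hypothetical.

```python
def alleles_passing(allele_counts, min_coverage, min_frequency):
    """Return {allele: frequency} for alleles meeting both thresholds at one site.

    allele_counts maps each observed base to its number of supporting reads.
    """
    coverage = sum(allele_counts.values())
    if coverage < min_coverage:
        return {}
    return {
        allele: count / coverage
        for allele, count in allele_counts.items()
        if count / coverage >= min_frequency
    }


def is_within_population_snp(allele_counts, min_coverage=40, min_frequency=0.10):
    """Treat a site as variable within the pool if two or more alleles pass."""
    return len(alleles_passing(allele_counts, min_coverage, min_frequency)) >= 2


# Hypothetical sites from one pooled library.
site_variable = {"A": 44, "G": 6}   # coverage 50, minor allele at 12%
site_low_cov = {"A": 30, "G": 5}    # coverage 35, below the minimum of 40
site_rare = {"A": 49, "G": 1}       # coverage 50, minor allele at 2%

results = [
    is_within_population_snp(site)
    for site in (site_variable, site_low_cov, site_rare)
]
print(results)
```

Only the first site is reported as a within-population SNP; the second fails the coverage threshold and the third fails the frequency threshold.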
All SNPs were manually checked and only those with paired reads for both variants were
accepted. SNPs and low-coverage areas (defined as coverage of 10x or less) were then
annotated, and each mapping consensus file, along with its annotations, was imported into
Geneious for alignment following the approach of Van der Merwe et al. [3].
For each species we calculated genomic diversities and between-population genomic
distances as summary statistics of chloroplast genome variation, following Rossetto et
al. [4]. Within-population genomic diversity was calculated using SNP counts, total number
of nucleotide sites and sample size per population [5]. Within-species genomic diversity was
calculated in the same way except using total SNP counts, total number of nucleotide sites
and sample size per species. Between-population genomic distance was calculated as the
proportion of nucleotide sites at which the consensus sequences of the two populations
differed, following Nei and Kumar [6].
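The between-population distance is a proportion of differing sites, which the following sketch computes for two aligned consensus sequences. Skipping positions that contain a gap or an ambiguous base is an assumption made here for illustration, and the function name and toy sequences are hypothetical. The within-population diversity estimator of [5] is not reproduced.

```python
def proportion_of_differing_sites(consensus_a, consensus_b, skip="N-"):
    """Proportion of compared sites at which two aligned consensus sequences differ.

    Sites where either sequence has a character listed in `skip` are ignored
    (an assumption of this sketch, not a documented step of the analysis).
    """
    if len(consensus_a) != len(consensus_b):
        raise ValueError("consensus sequences must be aligned to equal length")
    compared = 0
    differences = 0
    for base_a, base_b in zip(consensus_a.upper(), consensus_b.upper()):
        if base_a in skip or base_b in skip:
            continue
        compared += 1
        if base_a != base_b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy aligned consensus sequences for two populations.
pop_1 = "ACGTACGTAC"
pop_2 = "ACGTTCGTAC"
distance = proportion_of_differing_sites(pop_1, pop_2)
print(distance)
```

Here one of the ten aligned sites differs, so the distance is one tenth.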
Detailed Results
Sequencing, assembly and SNP detection
Using an approach similar to that of McPherson et al. [2], we assembled the chloroplast
genomes of Endiandra globosa (158,585bp) and Endiandra discolor (158,567bp) from
Illumina 150bp paired-end sequencing data with an average coverage of 172x and 100x,
respectively. The lengths of the IR regions were 25,464bp and 25,466bp, respectively.
A total of eight whole-genome shotgun sequencing libraries representing two Endiandra
species produced approximately 2.23Gbp of sequence data per library. Table S1 summarizes
the sequencing outcomes for each library. For E. globosa and E. discolor respectively, the
average number of reads after initial filtering but before trimming was 16,238,070
(16,129,183 after trimming) and 14,751,390 (14,662,281 after trimming), with an average
read length after trimming of 144.33bp and 144.78bp.
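The per-species read averages can be recomputed from the per-library read counts listed in Table S1. The short sketch below does this; the variable names are illustrative, and the grouping of libraries by species (CB, HC, HS, T for E. globosa; BS, MS, CB, Blue for E. discolor) follows the table's column order.

```python
from statistics import mean

# Reads per library before trimming (Table S1, "Number of reads").
raw_reads = {
    "E. globosa": [5_468_644, 20_747_342, 17_907_910, 20_828_384],
    "E. discolor": [19_617_466, 16_964_266, 17_010_214, 5_413_612],
}

# Reads per library after trimming (Table S1, "Number of reads after trim").
trimmed_reads = {
    "E. globosa": [5_437_138, 20_634_335, 17_754_914, 20_690_344],
    "E. discolor": [19_506_696, 16_860_108, 16_899_656, 5_382_664],
}

# Mean reads per species, before and after trimming.
averages = {
    species: (mean(raw_reads[species]), mean(trimmed_reads[species]))
    for species in raw_reads
}

for species, (avg_raw, avg_trimmed) in averages.items():
    print(f"{species}: {avg_raw:,.2f} raw, {avg_trimmed:,.2f} trimmed")
```

To the nearest read, these means agree with the averages quoted in the text.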
In each library, approximately one percent of the total number of reads was estimated to be
chloroplast (Table S1). The average phred score was 37 for both E. globosa and E.
discolor. De novo assembly using Velvet produced an average of 362 contigs per library
(Table S2). Of these, an average of 31 Velvet contigs had an E-value of zero in a BLASTn
search against the whole chloroplast genome database. A total of 122 overlapping contigs
were used for whole genome assembly. The final cpDNA for each Endiandra species contained a
gap identified as a non-coding region between trnT and trnL with a length ranging from 300
to 800bp. The average number of chloroplast reads for E. globosa and E. discolor was
191,543 and 110,983, respectively, with an average minimum coverage of 14x and 10x.
cpDNA diversity within two Endiandra species
Table S3 shows within-population diversity across all sites tested for E. globosa and E.
discolor. E. discolor shows consistently higher diversity than E. globosa; its most diverse
population is Coastal Byron in NNSW. In E. globosa, AWT populations are more diverse than
NNSW ones, and Hogan’s Scrub in NNSW is the least diverse population.
Table S1: Summary of the raw reads, quality-trimmed reads and variant detection mapping for each of the Illumina sequence libraries used
(CB=Coastal Byron, HC=Harvey Creek, HS=Hogan’s Scrub, T=Tcupala, BS=Big Scrub, Blue=Bluewater Orange Track and Keelbottom Ck).
                                                       Endiandra globosa                               Endiandra discolor
                                                  CB          HC          HS           T          BS          MS          CB        Blue
Number of Gbp                                   0.41        3.13        2.70        3.14        2.96        2.56        2.57        0.82
Number of reads                            5,468,644  20,747,342  17,907,910  20,828,384  19,617,466  16,964,266  17,010,214   5,413,612
Avg. read length (bp)                            151         151         151         151         151         151         151         151
Number of reads after trim                 5,437,138  20,634,335  17,754,914  20,690,344  19,506,696  16,860,108  16,899,656   5,382,664
Percentage trimmed                             99.43       99.46       99.15       99.34           1           1           1           1
Avg. length after trim (bp)                   144.65      143.95      144.65      144.05      144.85      144.95      144.65      144.65
Average coverage of SNP mapping                   62          85           *          76         127          71           *           *
Percentage of paired reads mapped             0.0097      0.0098      0.0207      0.0069       0.013      0.0051      0.0044      0.0112
Mean read length (bp)                         147.38      146.58      146.72      145.99      146.62      145.59      145.23      146.96
Minimum coverage without the gap region            6           8          34           9           1          17          15           5
Maximum coverage                                 156         516         742         527         714         514         511         153
Average coverage                                46.9      180.35       333.4      128.03      227.06        77.4       66.83       54.28
Standard deviation                             37.94      107.21       66.68       46.62       48.52       22.31       19.53       16.57
*No SNP found
Table S2: Summary of de novo assemblies and BLASTn results for E. globosa Illumina reads.
                                 CB       HC       HS        T
Number of contigs               449      385      273      343
Mean contig length (bp)       1,693    3,712    5,149    3,643
Longest contig (bp)          14,548   31,454   68,645   23,601
N50                             214      239      222      201
Number of cp contigs             23       27       42       30
Longest cp contig (bp)       14,628   11,534   10,520   17,716
Sum of cp contigs (bp)      110,500  113,663  100,227  123,936
Table S3: Within-population SNPs across both Endiandra species (minimum frequency > 10%).
                           Endiandra globosa       Endiandra discolor
Population                  HS    CB     T    HC    BS    MS    CB  Blue
Within population SNPs      29    63   116   100   158   189   282   126
Fig. S1: Biodiverse outputs comparing: a) richness of fleshy-fruited species, b) richness of species with
fleshy fruits >30mm and c) weighted endemism across all 2306 rainforest woody species (WE) for 50
km × 50 km grid cells. The Tropic of Capricorn line is included for reference.
References
1. Zerbino DR, Birney E. 2008 Velvet: algorithms for de novo short read assembly using de
Bruijn graphs. Genome research 18, 821-829. (doi:10.1101/gr.074492.107)
2. McPherson H, Van der Merwe M, Delaney SK, Edwards MA, Henry RJ, McIntosh E, Rymer
PD, Milner ML, Siow J. 2013 Capturing chloroplast variation for molecular ecology studies: a
simple next generation sequencing approach applied to a rainforest tree. BMC Ecol. 13, 8. (doi:10.1186/1472-6785-13-8)
3. Van der Merwe M, McPherson H, Siow J, Rossetto M. 2014 Next‐Gen phylogeography of
rainforest trees: exploring landscape‐level cpDNA variation from whole‐genome sequencing.
Mol. Ecol. Resour. 14, 199-208. (doi:10.1111/1755-0998.12176)
4. Rossetto M, McPherson H, Siow J, Kooyman R, van der Merwe M, Wilson PD. 2015 Where
did all the trees come from? A novel multidisciplinary approach reveals the impacts of
biogeographic history and functional diversity on rain forest assembly. J. Biogeogr.
(doi:10.1111/jbi.12571)
5. Nei M, Li WH. 1979 Mathematical model for studying genetic variation in terms of
restriction endonucleases. Proc. Natl. Acad. Sci. USA 76, 5269-5273.
6. Nei M, Kumar S. 2000 Molecular evolution and phylogenetics. New York: Oxford University
Press.
7. Kooyman R, Rossetto M, Sauquet H, Laffan SW. 2013 Landscape patterns in rainforest
phylogenetic signal: isolated islands of refugia or structured continental distributions? PLoS
ONE 8, e80685. (doi:10.1371/journal.pone.0080685)