mec12366-sup-0001-DataS1

Supplementary Methods M1
Genomic Library Preparation by Protocols Adapted from Son & Taylor (2011) and
NEXTflex DNA Barcodes (Bioo Scientific, Austin TX)
Library preparation involved random shearing of ~0.2 to 3 μg of genomic DNA into
~350-400 bp fragments using the Bioruptor (Diagenode Inc., Sparta, NJ) for 6 rounds of
7 min each. Fragmented DNA was then cleaned and concentrated using a MinElute PCR
Purification Kit (Qiagen) to 12 μl, then blunt-end repaired using a cocktail of T4 DNA
Polymerase, T4 Polynucleotide Kinase, and DNA Polymerase I, Large (Klenow) Fragment (NEB)
enzymes, dNTPs, ATP, BSA and buffers (see Son & Taylor, 2011) before clean-up with
AMPure XP beads (Beckman Coulter, Brea CA) using a 1.8:1 ratio of beads to sample.
DNA libraries were then 3'-adenylated with Klenow fragment 3'-5' exo minus (NEB) in
buffer containing 1 mM dATP, followed immediately by NEXTflex adapter ligation using
the T4 Quick Ligation Kit (NEB). Adapter-ligated libraries were then cleaned using AMPure
beads in a 1.1:1 ratio of beads to sample to remove ~120 bp adapter dimers. Libraries
were then further cleaned and size-selected to ~450 bp on 2% low-melt agarose gels in 1X
TAE, and DNA was recovered from the gel with the MinElute Gel Purification Kit (Qiagen).
Libraries were quantified using a NanoDrop (Thermo Scientific) spectrophotometer and
also by a PicoGreen dye-based standard curve as described earlier. Then ~20 to 100 ng of
library template was PCR-amplified using Phusion High-Fidelity DNA Polymerase
(NEB) and the NEXTflex Primer Mix, with an initial denaturation at 98 °C for 2 min, then 12-15
cycles of 98 °C for 30 s, 65 °C for 30 s, and 72 °C for 30 s, and a final extension at 72 °C for 4 min. Amplified
libraries were again AMPure bead-cleaned to remove remaining small-sized fragments
and ~120 bp dimers using a 1.1:1 ratio of beads to sample, then quantity and fragment
size were assessed on the Agilent 2100 Bioanalyzer (Agilent Technologies) with the
Agilent DNA 1000 Kit, following the manufacturer's instructions. Barcoded libraries
were then combined in equimolar ratios, based on relative symbiont (rather than total)
DNA as assessed earlier using qPCR, for a single lane of paired-end 100 bp Illumina
HiSeq sequencing. Libraries that consisted of pools from 15 insects were first combined
with one another in equimolar ratios based on symbiont-to-insect DNA ratios, then added
to the single-insect libraries in a ratio of 1:5 (single-insect to pool-of-15).
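The equimolar pooling by symbiont DNA described above can be sketched as follows. This is an illustration only: the protocol does not state how molarity was computed, so the conversion below assumes the common approximation of 660 g/mol per base pair of double-stranded DNA, and the function names, the target amount, and the example values are all hypothetical.

```python
# Sketch only: helper names and the 660 g/mol-per-bp constant are assumptions,
# not values taken from the protocol above.
AVG_BP_MASS_G_PER_MOL = 660.0


def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration (ng/ul) to nM (equivalently fmol/ul)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def pooling_volumes_ul(libraries, target_symbiont_fmol=10.0):
    """Volume of each library that supplies the same amount of symbiont template.

    libraries maps a name to (conc_ng_per_ul, mean_fragment_bp, symbiont_fraction),
    where symbiont_fraction is the qPCR-estimated share of symbiont DNA (0 to 1).
    """
    volumes = {}
    for name, (conc, bp, symbiont_fraction) in libraries.items():
        # Molarity attributable to symbiont DNA, in fmol per microliter.
        symbiont_nM = library_molarity_nM(conc, bp) * symbiont_fraction
        volumes[name] = target_symbiont_fmol / symbiont_nM
    return volumes


# Hypothetical libraries: same concentration and size, different symbiont share.
example = {
    "single_insect_A": (10.0, 450, 0.50),
    "single_insect_B": (10.0, 450, 0.25),
}
volumes = pooling_volumes_ul(example)
```

In this sketch a library with half the symbiont fraction receives twice the volume, which is the intent of pooling by symbiont rather than total DNA. The 1:5 weighting between single-insect and pool-of-15 libraries is not modeled here.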
Barcoded Library Demultiplexing, Quality Filtering, and Assembly
Barcoded Illumina HiSeq 2000 reads were demultiplexed in CASAVA v1.8.0 at the
Vincent J. Coates Genomics Sequencing Laboratory (University of California,
Berkeley). This generated ~379 million reads divided among 25 barcodes, with on
average 92% of bases above a Q30 quality score. Filtering and trimming steps included (1)
adapter removal using fastq-mcf from ea-utils 1.1.2-301, (2) removal of sequences not
passing the Illumina filter and trimming of low-quality ends using a custom Perl script, and (3)
removal of reads with <90% of bases at quality score Q25 or higher using FASTX-Toolkit
0.0.13. Assembly and alignment pipelines involved both mapping to a reference sequence
and de novo assembly in order to better predict small indels. Velvet 1.2.06 (Zerbino &
Birney 2008) was used for de novo assembly of a consensus sequence for each barcoded
sample, optimizing k-mer size, expected coverage and coverage cutoff to produce the
longest possible contigs with BLAST 2.2.25 matches to Ishikawaella genes. Contigs were
sorted and binned with the help of custom Perl scripts and then oriented and aligned using
MUMmer 3.22 with the show-tiling option. In most cases, Velvet generated breaks between
contigs at 9 regions of non-unique sequence: 3 ribosomal RNA operons, 2 copies of
elongation factor Tu (EF-Tu), and 4 tandem repeat regions (position 73568: 6 repeats
of a 9 bp sequence; position 222097: 8 repeats of a 77 bp sequence; position 260111:
20 repeats of a 21 bp sequence; and position 697630: 12 repeats of a 116 bp
sequence). These regions were not included in downstream analyses.
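The third filtering step (discarding reads in which fewer than 90% of bases reach Q25) can be illustrated with a short sketch. This is not the FASTX-Toolkit implementation: the function names are hypothetical, and the code assumes Phred+33 quality encoding and standard four-line FASTQ records.

```python
# Sketch only: function names are hypothetical; Phred+33 encoding is assumed.
def keep_read(quality_string, min_q=25, min_percent=90, phred_offset=33):
    """True if at least min_percent of bases have quality >= min_q."""
    if not quality_string:
        return False
    passing = sum((ord(ch) - phred_offset) >= min_q for ch in quality_string)
    # Integer comparison avoids floating-point edge cases at the threshold.
    return passing * 100 >= min_percent * len(quality_string)


def filter_fastq(lines, min_q=25, min_percent=90):
    """Yield (header, sequence, separator, quality) records that pass the filter."""
    stripped = (line.rstrip("\n") for line in lines)
    # Group the line stream into consecutive four-line FASTQ records.
    for header, seq, sep, qual in zip(*[stripped] * 4):
        if keep_read(qual, min_q, min_percent):
            yield header, seq, sep, qual


# Two hypothetical reads: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
example_lines = [
    "@read1", "ACGTACGTAC", "+", "IIIIIIIII5",  # 9 of 10 bases >= Q25: kept
    "@read2", "ACGTACGTAC", "+", "IIIIIIII55",  # 8 of 10 bases >= Q25: dropped
]
kept = list(filter_fastq(example_lines))
```

Here `kept` contains only the first record, since exactly 90% of its bases meet the threshold.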
References
Zerbino DR, Birney E (2008) Velvet: Algorithms for de novo short read assembly using
de Bruijn graphs. Genome Research, 18, 821–829.
Population Genetic Parameter Estimation
Although most sample locations were represented by single insects, an abundance of
polymorphic differences within single-insect samples, together with a lack of fixed
differences between them, led us to treat each sample as a "pool" of individuals for
population genetic parameter estimation using PoPoolation/PoPoolation2 (Kofler et al.
2011a; Kofler et al. 2011b; Futschik & Schlötterer 2010). This approach was used
successfully by Holt et al. (2009) on bacterial samples, with variants called using Maq.
Pool size for each sample was determined from allele frequencies calculated
by the UnifiedGenotyper tool in GATK after duplicate removal: it was set to the reciprocal of
the lowest alternate allele frequency above an error threshold of 1.0% for any
variant across locations (i.e. alleles below this frequency were not considered). This
likely underestimates the number of unique variant genotypes in a sample; thus,
it is a conservative estimate of the number of evolutionarily distinct individuals. Pool
sizes were used to estimate mean pairwise nucleotide difference
(Pi), Watterson's Theta, and Tajima's D (Tajima 1989) using PoPoolation/PoPoolation2 for
the complete genomes, analyzed in 10 kb and 50 kb windows. Results were plotted using
RStudio v0.96.122 (2012).
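The pool size rule described above (the reciprocal of the lowest alternate allele frequency that clears the 1.0% error threshold) can be sketched as follows. The function name and example frequencies are hypothetical, and two details are assumptions because the text does not specify them: frequencies exactly at the threshold are retained, and the reciprocal is rounded to the nearest integer.

```python
# Sketch only: the function name, the >= comparison at the threshold, and the
# rounding to the nearest integer are assumptions, not stated in the methods.
def estimate_pool_size(alt_allele_freqs, error_threshold=0.01):
    """Pool size as the reciprocal of the lowest retained alternate allele frequency."""
    retained = [f for f in alt_allele_freqs if f >= error_threshold]
    if not retained:
        raise ValueError("no alternate allele frequency at or above the threshold")
    return round(1.0 / min(retained))


# Hypothetical per-variant alternate allele frequencies for one sample.
# The 0.5% variant falls below the error threshold and is ignored.
example_freqs = [0.50, 0.04, 0.005, 0.20]
pool_size = estimate_pool_size(example_freqs)
```

With these example values the lowest retained frequency is 4%, giving a pool size of 25. Rounding down instead would make the estimate more conservative still.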
References
Futschik A, Schlötterer C (2010) Massively parallel sequencing of pooled DNA samples
- the next generation of molecular markers. Genetics, 186, 207–218.
Holt KE, Teo YY, Li H, Nair S, Dougan G et al. (2009) Detecting SNPs and estimating
allele frequencies in clonal bacterial populations by sequencing pooled DNA.
Bioinformatics, 25, 2074–2075.
Kofler R, Orozco-terWengel P, De Maio N, Pandey RV, Nolte V et al. (2011a)
PoPoolation: A Toolbox for Population Genetic Analysis of Next Generation Sequencing
Data from Pooled Individuals. PLoS ONE, 6, e15925.
Kofler R, Pandey RV, Schlötterer C (2011b) PoPoolation2: Identifying differentiation
between populations using sequencing of pooled DNA samples (Pool-Seq).
Bioinformatics, 27, 3435–3436.
RStudio (2012) RStudio: Integrated development environment for R (Version 0.96.122)
[Computer software]. Boston, MA.
Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA
polymorphism. Genetics, 123, 585–595.