mec12366-sup-0001-DataS1

Supplementary Methods

M1 Genomic Library Preparation by Protocols Adapted from Son & Taylor (2011) and NEXTflex DNA Barcodes (Bioo Scientific, Austin TX)

Library preparation began with random shearing of ~0.2 to 3 μg of genomic DNA into ~350-400 bp fragments using the Bioruptor (Diagenode Inc., Sparta, NJ) for 6 rounds of 7 min each. Fragmented DNA was then cleaned and concentrated to 12 μl using a MinElute PCR Purification Kit (Qiagen). It was then blunt-end repaired using a cocktail of T4 DNA Polymerase, T4 Polynucleotide Kinase and DNA Polymerase I, Large (Klenow) Fragment (NEB) enzymes, dNTPs, ATP, BSA and buffers (see Son & Taylor, 2011), before clean-up with AMPure XP beads (Beckman Coulter, Brea CA) at a 1.8:1 ratio of beads to sample. DNA libraries were then 3'-adenylated with Klenow Fragment (3'→5' exo-) (NEB) in buffer with 1 mM dATP, followed immediately by NEXTflex adapter ligation using the T4 Quick Ligation Kit (NEB). Adapter-ligated libraries were cleaned with AMPure beads at a 1.1:1 ratio of beads to sample to remove ~120 bp adapter dimers. Libraries were further cleaned and size-selected to ~450 bp on 2% low-melt agarose 1X TAE gels, and DNA was recovered from the gel using the MinElute Gel Purification Kit (Qiagen). Libraries were quantified using a NanoDrop (Thermo Scientific) spectrophotometer and a PicoGreen dye-based standard curve as described earlier. Then ~20 to 100 ng of library template was PCR-amplified using Phusion High-Fidelity DNA Polymerase (NEB) and the NEXTflex Primer Mix. Cycling conditions were an initial denaturation at 98 °C for 2 min; 12-15 cycles of 98 °C for 30 s, 65 °C for 30 s and 72 °C for 30 s; and a final extension at 72 °C for 4 min. Amplified libraries were again cleaned with AMPure beads at a 1.1:1 ratio of beads to sample to remove remaining small fragments and ~120 bp dimers. Library quantity and fragment size were then assessed on the Agilent 2100 Bioanalyzer (Agilent Technologies) with the Agilent DNA 1000 Kit, following the manufacturer's instructions.
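The bead ratios and template amounts above reduce to simple arithmetic. The same arithmetic converts a Bioanalyzer concentration and mean fragment size into molarity, which is the usual basis for equimolar pooling. The sketch below is illustrative only and makes several assumptions. The function names are hypothetical. The 50 μl sample volume and the 10 ng/μl concentration are made-up inputs. The figure of 660 g/mol per base pair is the common approximation for dsDNA, not a value stated in these methods.

```python
# Illustrative library-prep arithmetic; all function names are hypothetical.

# Approximate mass of one base pair of double-stranded DNA (g/mol); a common
# rule of thumb, not a value taken from these methods.
DSDNA_G_PER_MOL_PER_BP = 660.0


def bead_volume_ul(sample_ul: float, ratio: float) -> float:
    """Volume of AMPure XP beads to add for a given bead:sample ratio."""
    return sample_ul * ratio


def dsdna_nanomolar(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert dsDNA concentration (ng/ul) to molarity (nM) for a mean fragment size."""
    # ng/ul equals mg/l (1e-3 g/l); mol/l to nM is 1e9; the net factor is 1e6.
    return conc_ng_per_ul * 1e6 / (DSDNA_G_PER_MOL_PER_BP * mean_fragment_bp)


def dsdna_pmol(mass_ng: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass (ng) into picomoles of fragments."""
    # ng to g is 1e-9; mol to pmol is 1e12; the net factor is 1e3.
    return mass_ng * 1e3 / (DSDNA_G_PER_MOL_PER_BP * mean_fragment_bp)


if __name__ == "__main__":
    # Hypothetical 50 ul sample: 1.8:1 after end repair, 1.1:1 for dimer removal.
    print(bead_volume_ul(50, 1.8), bead_volume_ul(50, 1.1))
    # 20 and 100 ng of a ~450 bp library, the template range used for PCR.
    print(dsdna_pmol(20, 450), dsdna_pmol(100, 450))
    # A hypothetical 10 ng/ul library with a ~450 bp mean size on the Bioanalyzer.
    print(dsdna_nanomolar(10, 450))
```

At this fragment size, the 20 to 100 ng template range corresponds to roughly 0.07 to 0.34 pmol of library molecules.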
Barcoded libraries were then pooled for a single lane of paired-end 100 bp Illumina HiSeq sequencing in equimolar ratios based on relative symbiont (rather than total) DNA, as assessed earlier by qPCR. Libraries made from pools of 15 insects were first combined in equimolar ratios based on their symbiont-to-insect DNA ratios. They were then combined with the single-insect libraries at a ratio of 1:5 (single-insect to pool-of-15).

Barcoded Library Demultiplexing, Quality Filtering, and Assembly

Barcoded Illumina HiSeq 2000 reads were demultiplexed with CASAVA v1.8.0 at the Vincent J. Coates Genomics Sequencing Laboratory (University of California, Berkeley). This generated ~379 million reads divided among 25 barcodes, with on average 92% of bases above a Q30 quality score. Filtering and trimming involved three steps:
(1) adapter removal using fastq-mcf from ea-utils 1.1.2-301;
(2) removal of sequences not passing the Illumina filter, and trimming of low-quality ends, using a custom Perl script;
(3) removal of reads in which fewer than 90% of bases had a quality score of at least Q25, using FASTX-Toolkit 0.0.13.

The assembly and alignment pipelines combined mapping to a reference sequence with de novo assembly, in order to better predict small indels. Velvet 1.2.06 (Zerbino & Birney 2008) was used for de novo assembly of a consensus sequence for each barcoded sample. K-mer size, expected coverage and coverage cutoff were optimized to produce the longest possible contigs with BLAST 2.2.25 matches to Ishikawaella genes. Contigs were sorted and binned with custom Perl scripts, then oriented and aligned using the show-tiling utility of MUMmer 3.22.
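Filtering step (3) keeps a read only if at least 90% of its bases reach Q25. The Python below is a stand-alone sketch of that criterion, not the FASTX-Toolkit tool itself. It assumes that the FASTX-Toolkit step (its `fastq_quality_filter` program, with the `-q` minimum-quality and `-p` minimum-percent options) applies the same rule. It also assumes Phred+33 quality encoding. The function names are hypothetical.

```python
# Sketch of the "at least 90% of bases >= Q25" read filter; names are hypothetical.
from typing import Iterable, Iterator, List, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding (CASAVA 1.8 and later)


def passes_quality(qual: str, min_q: int = 25, min_fraction: float = 0.90) -> bool:
    """True if at least min_fraction of the bases have Phred quality >= min_q."""
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - PHRED_OFFSET >= min_q)
    return good / len(qual) >= min_fraction


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from well-formed four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def filter_fastq(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Keep only the records whose quality string passes the threshold."""
    return [rec for rec in read_fastq(lines) if passes_quality(rec[2])]


if __name__ == "__main__":
    # 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
    example = [
        "@read1", "ACGTACGTAC", "+", "IIIIIIIII5",  # 9/10 bases >= Q25: kept
        "@read2", "ACGTACGTAC", "+", "IIIIIIII55",  # 8/10 bases >= Q25: dropped
    ]
    for header, _, _ in filter_fastq(example):
        print(header)  # prints "@read1"
```

The threshold is inclusive: a read with exactly 90% of bases at or above Q25 is kept.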
In most cases, Velvet generated breaks between contigs at 9 regions of non-unique sequence:
- 3 ribosomal RNA operons;
- 2 copies of elongation factor Tu (EF-Tu);
- 4 tandem repeat regions: position 73568 (six repeats of a 9 bp sequence), position 222097 (eight repeats of a 77 bp sequence), position 260111 (20 repeats of a 21 bp sequence) and position 697630 (12 repeats of a 116 bp sequence).

These regions were excluded from downstream analyses.

References
Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research, 18, 821–829.

Population Genetic Parameter Estimation

Most sample locations were represented by single insects. However, single-insect samples contained many polymorphic differences, and no fixed differences separated them. We therefore treated each sample as a "pool" of individuals for population genetic parameter estimation using PoPoolation/PoPoolation2 (Kofler et al. 2011a; Kofler et al. 2011b; Futschik & Schlötterer 2010). Holt et al. (2009) used this approach successfully for pooled bacterial samples, with variants called using Maq.

Pool size for each sample was estimated from allele frequencies calculated by the UnifiedGenotyper tool in GATK after duplicate removal. It was taken as the reciprocal of the lowest alternate allele frequency, for any variant across locations, that exceeded an error-threshold filter of 1.0% (i.e. alleles below this frequency were not considered). This value underestimates the number of unique variant genotypes in a sample. It is therefore a conservative estimate of the number of evolutionarily distinct individuals.

Pool sizes were used to estimate mean pairwise nucleotide difference (π), Watterson's θ and Tajima's D (Tajima 1989) with PoPoolation/PoPoolation2, across the complete genomes in 10 kb and 50 kb windows. Results were plotted using RStudio v0.96.122 (RStudio 2012).

References
Futschik A, Schlötterer C (2010) Massively parallel sequencing of pooled DNA samples - the next generation of molecular markers. Genetics, 186, 207–218.
Holt KE, Teo YY, Li H, Nair S, Dougan G et al. (2009) Detecting SNPs and estimating allele frequencies in clonal bacterial populations by sequencing pooled DNA. Bioinformatics, 25, 2074–2075.
Kofler R, Orozco-terWengel P, De Maio N, Pandey RV, Nolte V et al. (2011a) PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals. PLoS ONE, 6, e15925.
Kofler R, Pandey RV, Schlötterer C (2011b) PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics, 27, 3435–3436.
RStudio (2012) RStudio: integrated development environment for R (Version 0.96.122) [Computer software]. Boston, MA.
Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics, 123, 585–595.
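The assembly section excludes four tandem-repeat arrays. The extent of each follows from its start position, copy number and unit length. The sketch below assumes that each listed position is the 1-based start of a perfect tandem array with no partial copies. The rRNA operon and EF-Tu coordinates are not given in these methods, so they are omitted. The names are hypothetical.

```python
# Sketch: genomic spans of the excluded tandem repeats; names are hypothetical.
# Assumption: each position is the 1-based start of a perfect tandem array.
from typing import List, Tuple

# (start position, number of repeat copies, unit length in bp), as listed in the text
TANDEM_REPEATS: List[Tuple[int, int, int]] = [
    (73568, 6, 9),
    (222097, 8, 77),
    (260111, 20, 21),
    (697630, 12, 116),
]


def repeat_intervals(repeats: List[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """Inclusive 1-based (start, end) interval spanned by each repeat array."""
    return [(start, start + copies * unit - 1) for start, copies, unit in repeats]


def is_excluded(pos: int, intervals: List[Tuple[int, int]]) -> bool:
    """True if a 1-based genome position falls inside any excluded interval."""
    return any(start <= pos <= end for start, end in intervals)


intervals = repeat_intervals(TANDEM_REPEATS)
spans = [end - start + 1 for start, end in intervals]

if __name__ == "__main__":
    print(spans)       # [54, 616, 420, 1392]
    print(sum(spans))  # 2482
```

Under these assumptions, the four arrays together cover 2482 bp. Any variant position flagged by `is_excluded` would be dropped before computing the population genetic statistics.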
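The sketch below illustrates the pool-size rule from the Population Genetic Parameter Estimation section and the classical statistics that pool size feeds into. It is not PoPoolation's implementation. PoPoolation/PoPoolation2 apply corrections for pool size and sequencing coverage, and those corrections are omitted here. Several choices are assumptions for illustration: rounding 1/f to the nearest integer, the function names, the `window_index` helper for assigning positions to 10 kb or 50 kb windows, and the example frequencies. The Watterson's θ and Tajima's D expressions are the standard ones (Tajima 1989), with the sample size n set to the estimated pool size.

```python
# Sketch: pool-size estimate plus classical (uncorrected) diversity statistics.
# Not PoPoolation's pool/coverage-corrected estimators; names are hypothetical.
import math
from typing import Iterable, Tuple


def pool_size(alt_freqs: Iterable[float], error_threshold: float = 0.01) -> int:
    """Reciprocal of the lowest alternate-allele frequency above the error threshold.

    Assumption: the reciprocal is rounded to the nearest integer.
    """
    kept = [f for f in alt_freqs if f > error_threshold]
    if not kept:
        raise ValueError("no alternate allele frequency above the error threshold")
    return round(1.0 / min(kept))


def harmonic_sums(n: int) -> Tuple[float, float]:
    """a1 = sum of 1/i and a2 = sum of 1/i**2 for i = 1 .. n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    return a1, a2


def watterson_theta(seg_sites: int, n: int) -> float:
    """Watterson's theta: number of segregating sites divided by a1."""
    a1, _ = harmonic_sums(n)
    return seg_sites / a1


def tajimas_d(pi: float, seg_sites: int, n: int) -> float:
    """Classical Tajima's D; pi is the mean number of pairwise differences."""
    if n < 4:
        raise ValueError("the variance constants c1 and c2 are zero for n < 4")
    if seg_sites == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


def window_index(pos: int, window_bp: int) -> int:
    """0-based index of the non-overlapping window holding a 1-based position."""
    return (pos - 1) // window_bp


if __name__ == "__main__":
    n = pool_size([0.5, 0.25, 0.125, 0.005])  # 0.005 is below 1%, so n = 8
    print(n, watterson_theta(12, n), tajimas_d(3.5, 12, n))
    print(window_index(73568, 10_000), window_index(73568, 50_000))
```

Per-window values would come from applying these functions to the variants that `window_index` assigns to each window. When π is smaller than Watterson's θ, D is negative.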
