Gene discovery, linkage mapping and comparative genomics in Lepidoptera using next-generation RAD sequencing

Simon W. Baxter, Chris D. Jiggins, Mark L. Blaxter and John W. Davey
Department of Zoology, University of Cambridge
Institute of Evolutionary Biology, University of Edinburgh

Lepidoptera: systems for studying evolution
Butterfly mimicry: Heliconius melpomene, Heliconius erato
Insecticide resistance: diamondback moth, Plutella xylostella

Method in brief
1. Isolate gDNA (200 ng/individual)
2. Restriction digest
3. Ligate P1 adapter
4. Pool all barcoded individuals and shear gDNA
5. Add "paired end" adapter, PCR amplification, Illumina sequencing
[Figure: restriction-site overhangs (CCTG/GG), P1 adapter ligation (AGCTGTGCA / TCGACACGT), and gel of sheared gDNA with 1000, 500 and 100 bp markers]

SNPs between RAD alleles
Presence/absence of RAD allele
[Figure: example reads shown on the slide]
ATTCGATGCACGACACGGC
GTTCGATGCACGACACGGC
ACACATGCAGGCTACACGCTGAAAGACCCAT
GTGTGTGCAGGCTACACGCTGAAAGACCCAT
ACACATGCAGGCGACTTGCATGCAAGTTACGATGATTTCTGATGCATGTA

Sheared gDNA and paired-end RAD alleles
[Figure: gel of sheared gDNA for Individual 1 and Individual 2 (1000, 500 and 100 bp markers), with stacked reads beginning GTGTGTGCAGG and ACACATGCAGG]
Labels: 11 bp, 39 bp, 200-600 bp consensus for BLAST (assemble using VelvetOptimiser)

How long are the PE RAD-alleles?
[Figure: length of the assembled paired-end read (bp) vs count]

Project
Perform paired-end Illumina RAD sequencing using a diamondback moth backcross segregating for a known insecticide resistance mutation (spinosad).
Photo credit: Heiko Vogel

Aims
1. Identify all 31 linkage groups, including the chromosome with a resistance mutation
2. Create linkage maps of each linkage group
3.
Compare sequenced RAD tags with the silkworm genome, Bombyx mori
Baxter et al. (2010) PLoS Genetics

RAD library material and sequencing
Library has 24 individuals: father, mother, 22 progeny
Spinosad susceptible; spinosad resistant

How many RAD alleles per individual?
339 Mb haploid genome, ~66% AT
~2000-5000 RAD sites/individual if the SbfI enzyme is used (CC//TGCAGG)
Expect each RAD allele to be sequenced 50-100 times

Sequencing coverage
10 million sequenced clusters (50 bp, PE) on a single lane of Illumina GAII
Average number of RAD tags/individual = 425,000 (2 progeny had low coverage and were excluded)
Total RAD alleles identified = 17,000
How many times was each RAD allele sequenced?

What do segregating patterns look like?
progeny 1 2 3 4 5 6 7 8 9 10
GCTCATGGTTATTTAAAAATGAGCTT / ATATTTTTTTCATCAAAAACAGTCTA
  42 31 52 23 57 0 54 0 117 49 153 19 0 25 98 0 0 26 63 32 37 26 37 7
CCGAAGCGGCCTTAGTCCTCAGGCTT / CAAGGGCGTCAGCTGTATCTCTGCTT
  0 0 0 0 0 3 0 0 0 6 0 0 0 0 0 0 3 0 0 0 0 0 0 0
CCATAGAATTGGAAACTCTTTTTAC / CATCATTATGAGAACACATAGACGC
  54 31 0 0 62 0 0 0 50 62 0 0 81 57 0 0 57 0 39 0 37 48 0 0
AACTTATTAACAAGCTTCCCTGTTGC / ATATTTTTTTCATCAAAAACAGCCTA
  0 0 23 28 0 23 18 26 32 0 15 49 0 39 0 71 17 20 0 29 0 0 6 0
ATGGGTCTGCGAATAAACCGACGCAA
  7882 6738 5103 3433 6124 5640 7450 6778 4230 5732 6422 6637 1 0 6361

Convert to binary format for analysis
ATATTTTTTTCATCAAAAACAGCCTA  0 1 1 1 0 1 1 1 1 0 0

Aim 1.
Identify all 31 chromosome pairs (including resistance) by analyzing RAD alleles inherited from the MOTHER
[Figure: cross diagram, mother x father, with progeny 1 to 20 scored 1 or 0 for a maternal RAD allele: 1 0 0 1 1 0 1 0 0 1 1 0 0 1 0 1]
Expect 62 patterns (31 chromosomes x 2 phases)

2583 RAD-alleles map to 31 linkage groups

LG    Pattern               Count  Complementary pattern  Count
LG01  00101011001001010111  16     11010100110110101000   17
LG02  10101110111101000100  14     01010001000010111011   19
LG03  01101010011000010101  18     10010101100111101010   19
LG04  00111000011000011001  17     11000111100111100110   32
LG05  00001101010100111000  25     11110010101011000111   32
LG06  01110010101100010000  29     10001101010011101111   33
LG07  00000111111010111010  29     11111000000101000101   33
LG08  11101100010011010000  33     00010011101100101111   35
LG09  11100011100000011110  22     00011100011111100001   36
LG10  11101110111000110110  26     00010001000111001001   37
LG11  10011000001101000001  32     01100111110010111110   37
LG12  01110010011110111010  32     10001101100001000101   39
LG13  00111111001001010100  35     11000000110110101011   39
LG14  10001010010011101001  35     01110101101100010110   42
LG15  10001111011100101010  37     01110000100011010101   43
LG16  01001111100001100110  41     10110000011110011001   43
LG17  10110001011001110001  35     01001110100110001110   45
LG18  01100010100010111001  29     10011101011101000110   47
LG19  01000000111010010010  17     10111111000101101101   47
LG20  01010001101000110011  39     10101110010111001100   48
LG21  00101111000000111111  35     11010000111111000000   54
LG22  10100011100111000000  17     01011100011000111111   57
LG23  11001010110101111101  45     00110101001010000010   62
LG24  10011101110011110001  23     01100010001100001110   66
LG25  10110001111001011001  65     01001110000110100110   66
LG26  01000101010101010111  48     10111010101010101000   67
LG27  10010110101111110010  64     01101001010000001101   70
LG28  11100011011100111000  48     00011100100011000111   79
LG29  00100000111100010100  25     11011111000011101011   80
LG30  11111010100010010000  49     00000101011101101111   90
LG31  01111100111111111111  75     10000011000000000000   99
Total                       1055                          1513

Annotated on the slide: W and Z; Spinosad Resistance

Aim 2.
Create linkage maps of each linkage group
Crossing-over in males during spermatogenesis
[Figure: female x male cross, with progeny 1 to 20 scored 1 or 0 for a paternal RAD allele]

Diamondback moth genome linkage map (n=31)
4041 RAD alleles segregated from the father

Diamondback moth genome linkage map (n=31)
135 RAD alleles on LG22a

Aim 3. Compare sequenced RAD alleles with the silkworm genome, Bombyx mori
Lepidoptera show a high degree of conserved synteny (Bombyx, diamondback moth)
BLAST RAD-alleles against the Bombyx genome (Expect<1e-20)

Diamondback moth 31 LGs vs Bombyx mori 28 chromosomes
[Figure: synteny plot; axis labels as extracted]
LG: 22 1 16 11 30 31 5 17 7 23 12 6 28 13 15 21 2 29 27 25 14 24 10 20 26 19 18 3 4 8 9
Bombyx chromosomes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
24 1 3 7 9 20 23 5 14 5 12 3 4 14 17 8 10 6 22 13 10 - 6 15 9 16 17 2 5 -
= Px31 Bm3 Px5

Current work: BLAST diamondback moth RAD-alleles against 454-EST contigs
[Figure: tissue for EST library]

Heliconius melpomene genome
RAD tags generated from crosses will assist with genome scaffold assembly
[Figure: RAD linkage map aligned to assembled genomic scaffolds of 220 kb, 113 kb, 400 kb, 500 kb, 350 kb and 200 kb]

Thanks to…
Paul Etter and Eric Johnson, University of Oregon (protocol advice, adapters)
Tony Shelton, Cornell University (diamondback moths)
GenePool team, University of Edinburgh
Funding
John’s scripts available here: https://www.wiki.ed.ac.uk/display/RADSequencing

Digest genomic DNA with a restriction enzyme
Common cutter (e.g. PstI) = many RAD tags, low sequence coverage per tag
Rare cutter (e.g. SbfI) = fewer RAD tags, high sequence coverage per tag

Spinosad Resistance Cross

SAMPLE      READS      TAGS
BC_father   297795     33558
F1_mother   495190     51123
C01         474048     42147
C02         381912     34926
C03         467094     51663
C04         496749     49053
C05         16959      6996
C06         510666     48554
C07         842036     76469
C08         414966     38031
C09         462595     41883
C10         325581     34691
S7          510086     45641
S8          609811     59572
S9          374328     36786
S10         168157     21829
S11         375236     38573
S12         373667     39223
S29         513757     46542
S30         463518     44070
S31         638261     69112
S32         482003     45827
S33         476704     45548
S34         45955      9805
total       10217074   1011622
Average     425711     42151

10 million clusters sorted into individuals using 5 bp index
How many RAD tags per individual? 17K to 638K, average 425K
Group tags with identical sequence for each individual
Expect ~4000; observed 42000; filtered ~7000

Expect to see 2x31 complementary patterns (from the mother)
However… 1293 unique binary patterns identified
[Figure: pattern count (0 to 120) vs number of unique patterns (1 to 1249)]

Pattern                 Count
0111110010101011000111  32
0101110010011110111010  32
0101100111110010111110  37
0101010001101000110011  39
0101111100111111111111  75
0100011100100011000111  79
0111011111000011101011  80
0100000101011101101111  90
0110000011000000000000  99

1049 patterns appear once (errors)
68 patterns occur more than 10 times

How are the 31 linkage groups identified from the mother matched to the 31 linkage maps constructed from the father?
Mother - 31 linkage groups

LG    Pattern               Count  Complementary pattern  Count
LG01  00101011001001010111  16     11010100110110101000   17
LG02  10101110111101000100  14     01010001000010111011   19
LG03  01101010011000010101  18     10010101100111101010   19
LG04  00111000011000011001  17     11000111100111100110   32
LG05  00001101010100111000  25     11110010101011000111   32
LG06  01110010101100010000  29     10001101010011101111   33

Father - linkage maps
Mother LG01, RAD tag 1
Mother LG01, RAD tag 2
Father RAD tag 3
Assigned to Linkage Map 1

ATTCGATGCACGACACGG CTACACGCTGAAAGACCCAT
CTTCGATGCACGACACGG CTACACGCTGAAAGACCCAT
GTTCGATGCACGACACGG CTACACGCTGAAAGACCCAT
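
The slide does not spell out the matching rule. The three reads differ only at their first base and share the same paired-end sequence, so one plausible reading is that a paternal RAD tag inherits the linkage group of maternal tags from the same RAD locus. The Python sketch below illustrates that idea only. The function names, the one-mismatch threshold, and the assignment of the first two reads to the mother and the third to the father are assumptions made for illustration, not the method used in John's scripts.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_paternal_tags(maternal_tags, paternal_tags, max_mismatches=1):
    """Assign paternal tags to maternal linkage groups by sequence similarity.

    maternal_tags: dict mapping read sequence -> linkage group name
    paternal_tags: iterable of read sequences
    A paternal tag is assigned only when every near-identical maternal tag
    points to the same linkage group; ambiguous or unmatched tags are skipped.
    """
    assignments = {}
    for tag in paternal_tags:
        groups = {
            lg
            for seq, lg in maternal_tags.items()
            if len(seq) == len(tag) and hamming(seq, tag) <= max_mismatches
        }
        if len(groups) == 1:
            assignments[tag] = groups.pop()
    return assignments


# Reads taken from the slide; which parent each read came from is assumed.
maternal = {
    "ATTCGATGCACGACACGG": "LG01",  # assumed: Mother LG01, RAD tag 1
    "CTTCGATGCACGACACGG": "LG01",  # assumed: Mother LG01, RAD tag 2
}
paternal = ["GTTCGATGCACGACACGG"]  # assumed: Father RAD tag 3

print(assign_paternal_tags(maternal, paternal))
```

Skipping tags that match more than one linkage group keeps the sketch conservative: a repeat sequence present on several chromosomes would otherwise be placed arbitrarily.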