Gene discovery, linkage mapping and comparative genomics in Lepidoptera using next-generation RAD sequencing

Simon W. Baxter, Chris D. Jiggins,
Mark L. Blaxter and John W. Davey
Department of Zoology, University of Cambridge
Institute of Evolutionary Biology, University of Edinburgh
Lepidoptera: systems for studying evolution
Butterfly mimicry: Heliconius melpomene, Heliconius erato
Insecticide resistance: diamondback moth, Plutella xylostella
Method in brief
1. Isolate gDNA (200 ng/individual)
2. Restriction digest
3. Ligate P1 adapter
4. Pool all barcoded individuals and shear gDNA
5. Add “paired end” adapter, PCR amplification, Illumina sequencing
[Diagram: digest leaving nnnCCTGCA / GGnnn ends; P1 adapter (AGCTGTGCA / TCGAC) ligated to the cut end, giving AGCTGTGCAnnnnn / TCGACACGTnnnnn; gel with 1000, 500 and 100 bp markers]
SNPs between RAD alleles
ATTCGATGCACGACACGGC
ACACATGCAGGCTACACGCTGAAAGACCCAT
GTGTGTGCAGGCTACACGCTGAAAGACCCAT
GTTCGATGCACGACACGGC
Presence/Absence of RAD allele
ACACATGCAGGCGACTTGCATGCAAGTTACGATGATTTCTGATGCATGTA
Paired end RAD alleles
Sheared gDNA, Individual 1 and Individual 2 (gel markers: 1000, 500, 100 bp)
[Diagram: stacks of reads beginning GTGTGTGCAGG and ACACATGCAGG; segments labeled 11 bp and 39 bp]
200-600 bp consensus for BLAST (assemble using VelvetOptimiser)
How long are the PE RAD-alleles?
[Histogram: length of the assembled paired-end read (bp) vs. count]
Project
Perform paired-end Illumina RAD sequencing on a diamondback moth backcross segregating for a known insecticide resistance mutation (spinosad)
Photo credit, Heiko Vogel
Aims
1. Identify all 31 linkage groups, including the chromosome with a resistance mutation
2. Create linkage maps of each linkage group
3. Compare sequenced RAD tags with the silkworm genome, Bombyx mori
Baxter et al. (2010) PLoS Genetics
RAD Library material and sequencing
Library has 24 individuals
Father, Mother, 22 progeny
How many RAD alleles per individual?
339 Mb haploid genome, ~66% AT
Spinosad susceptible / Spinosad resistant
~2000-5000 RAD sites/individual if the SbfI enzyme is used (CCTGCA//GG)
Expect each RAD allele to be sequenced 50-100 times
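The expected coverage can be checked with simple arithmetic. The sketch below is only a back-of-envelope estimate: it assumes reads are spread evenly over individuals and tags, and that each cut site yields two RAD tags (one on each side of the cut). Neither assumption is stated on the slides, and the function name is made up for illustration.

```python
def expected_coverage(reads_per_individual: int, cut_sites: int,
                      tags_per_site: int = 2) -> float:
    """Mean reads per RAD tag, assuming reads are spread evenly (an idealisation)."""
    if cut_sites <= 0 or tags_per_site <= 0:
        raise ValueError("cut_sites and tags_per_site must be positive")
    return reads_per_individual / (cut_sites * tags_per_site)


# 10 million clusters shared by 24 barcoded individuals (figures from the slides).
reads_per_individual = 10_000_000 // 24

# Bracket the estimate with the lower and upper site counts quoted above.
for sites in (2000, 5000):
    print(sites, round(expected_coverage(reads_per_individual, sites)))
```

Real libraries are rarely this even, so the numbers are best read as a rough planning range.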
Sequencing Coverage
10 million sequenced clusters (50 bp, PE) on a single lane of Illumina GAII
Average number of RAD tags/individual = 425,000 (2 progeny had low coverage and were excluded)
Total RAD alleles identified = 17,000
How many times was each RAD allele sequenced?
What do segregating patterns look like?
progeny: 1 2 3 4 5 6 7 8 9 10
[Table of read counts per RAD allele; row and column alignment was lost in extraction, so values are listed in extracted order]
GCTCATGGTTATTTAAAAATGAGCTT
ATATTTTTTTCATCAAAAACAGTCTA
42 31 52 23 57 0 54 0 117 49 153 19 0 25 98 0 0 26 63 32 37 26 37 7
CCGAAGCGGCCTTAGTCCTCAGGCTT
CAAGGGCGTCAGCTGTATCTCTGCTT
0 0 0 0 0 3 0 0 0 6 0 0 0 0 0 0 3 0 0 0 0 0 0 0
CCATAGAATTGGAAACTCTTTTTAC
CATCATTATGAGAACACATAGACGC
54 31 0 0 62 0 0 0 50 62 0 0 81 57 0 0 57 0 39 0 37 48 0 0
AACTTATTAACAAGCTTCCCTGTTGC
ATATTTTTTTCATCAAAAACAGCCTA
0 0 23 28 0 23 18 26 32 0 15 49 0 39 0 71 17 20 0 29 0 0 6 0
ATGGGTCTGCGAATAAACCGACGCAA
7882 6738 5103 3433 6124 5640 7450 6778 4230 5732 6422 6637 1 0 6361
Convert to binary format for analysis
ATATTTTTTTCATCAAAAACAGCCTA  0 1 1 1 0 1 1 1 1 0 0
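The conversion to binary format can be sketched as a simple threshold on read counts. The cutoff used here (`min_reads=10`) is an assumption chosen for illustration, not a value given on the slides, and the counts in the example are made up.

```python
def to_binary(counts, min_reads=10):
    """Score a RAD allele as present (1) or absent (0) in each individual.

    min_reads is a hypothetical cutoff: counts below it are treated as
    absent, on the assumption that very low counts are sequencing errors.
    """
    return [1 if count >= min_reads else 0 for count in counts]


def as_pattern(counts, min_reads=10):
    """Join the 0/1 scores into a pattern string such as '0110'."""
    return "".join(str(bit) for bit in to_binary(counts, min_reads))


# Made-up read counts for one allele across eight individuals.
print(as_pattern([0, 39, 23, 71, 3, 20, 18, 0]))
```

The choice of cutoff matters: set too low, stray reads are scored as presence; set too high, poorly covered individuals lose real alleles.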
Aim 1. Identify all 31 chromosome pairs (including resistance) by analyzing RAD alleles inherited from the mother
[Diagram: Mother x Father cross; progeny 1, 2, 3 ... 20 each scored 1 or 0 for maternal RAD alleles]
Expect 62 patterns (31 chromosomes x 2 phases)
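Because each chromosome contributes two phases, a pattern and its complement belong to the same linkage group. One way to sketch this is to map every pattern to a canonical form (the smaller of the pattern and its complement) and count markers per canonical form. The function names are illustrative; the example uses the LG01 patterns and counts shown on the linkage group slide.

```python
from collections import Counter

_FLIP = str.maketrans("01", "10")


def complement(pattern: str) -> str:
    """Swap 0 and 1 at every position."""
    return pattern.translate(_FLIP)


def canonical(pattern: str) -> str:
    """Pick one representative for a pattern and its complement."""
    return min(pattern, complement(pattern))


def count_by_group(patterns):
    """Count markers per linkage group, merging the two phases."""
    return Counter(canonical(p) for p in patterns)


# LG01 on the slides: 16 markers in one phase, 17 in the other.
markers = ["00101011001001010111"] * 16 + ["11010100110110101000"] * 17
print(count_by_group(markers))
```

This only merges exact complements; patterns that differ by a scoring error would need a separate step.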
2583 RAD-alleles map to 31 linkage groups
LG    Pattern               Count  Complementary pattern  Count
LG01  00101011001001010111  16     11010100110110101000   17
LG02  10101110111101000100  14     01010001000010111011   19
LG03  01101010011000010101  18     10010101100111101010   19
LG04  00111000011000011001  17     11000111100111100110   32
LG05  00001101010100111000  25     11110010101011000111   32
LG06  01110010101100010000  29     10001101010011101111   33
LG07  00000111111010111010  29     11111000000101000101   33
LG08  11101100010011010000  33     00010011101100101111   35
LG09  11100011100000011110  22     00011100011111100001   36
LG10  11101110111000110110  26     00010001000111001001   37
LG11  10011000001101000001  32     01100111110010111110   37
LG12  01110010011110111010  32     10001101100001000101   39
LG13  00111111001001010100  35     11000000110110101011   39
LG14  10001010010011101001  35     01110101101100010110   42
LG15  10001111011100101010  37     01110000100011010101   43
LG16  01001111100001100110  41     10110000011110011001   43
LG17  10110001011001110001  35     01001110100110001110   45
LG18  01100010100010111001  29     10011101011101000110   47
LG19  01000000111010010010  17     10111111000101101101   47
LG20  01010001101000110011  39     10101110010111001100   48
LG21  00101111000000111111  35     11010000111111000000   54
LG22  10100011100111000000  17     01011100011000111111   57
LG23  11001010110101111101  45     00110101001010000010   62
LG24  10011101110011110001  23     01100010001100001110   66
LG25  10110001111001011001  65     01001110000110100110   66
LG26  01000101010101010111  48     10111010101010101000   67
LG27  10010110101111110010  64     01101001010000001101   70
LG28  11100011011100111000  48     00011100100011000111   79
LG29  00100000111100010100  25     11011111000011101011   80
LG30  11111010100010010000  49     00000101011101101111   90
LG31  01111100111111111111  75     10000011000000000000   99
Total                       1055                          1513
Labeled in the figure: W and Z; Spinosad Resistance
Aim 2. Create linkage maps of each linkage group
Crossing-over in males during spermatogenesis
[Diagram: female x male cross; progeny 1, 2, 3 ... 20 each scored 1 or 0 for paternal RAD alleles]
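Since crossing-over occurs in the male, paternal markers on the same chromosome can differ between progeny, and the share of progeny in which two markers disagree gives a rough recombination fraction. The sketch below is a minimal illustration, not the mapping method used in the project: it assumes phase is unknown (so a marker and its complement are treated alike), and the example patterns are made up.

```python
def recombination_fraction(marker_a: str, marker_b: str) -> float:
    """Fraction of progeny recombinant between two 0/1 marker patterns."""
    if len(marker_a) != len(marker_b) or not marker_a:
        raise ValueError("patterns must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(marker_a, marker_b))
    # Phase is unknown, so a marker and its complement are treated as the
    # same haplotype: take the smaller of the two possible counts.
    recombinants = min(mismatches, len(marker_a) - mismatches)
    return recombinants / len(marker_a)


# Made-up patterns for 10 progeny (not data from the slides).
print(recombination_fraction("1111100000", "1111000000"))
```

With only 20 informative progeny, each recombinant changes the estimate by 5%, so map distances from a family of this size are coarse.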
Diamondback moth genome linkage map (n=31)
4041 RAD alleles segregated from the father
135 RAD alleles on LG22a
Aim 3. Compare sequenced RAD alleles with
the silkworm genome, Bombyx mori
Lepidoptera show a high degree of conserved synteny
Bombyx
Diamondback moth
BLAST RAD-alleles against the Bombyx genome (Expect<1e-20)
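The filtering step can be sketched as follows, assuming the BLAST results are in the standard 12-column tabular layout (query, subject, ..., e-value in column 11, bit score in column 12). The query and subject names in the usage note are hypothetical; only the 1e-20 cutoff comes from the slide.

```python
import csv
import io


def best_hits(blast_tsv: str, max_evalue: float = 1e-20) -> dict:
    """Return {query: subject} for each query's lowest e-value hit below the cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue >= max_evalue:
            continue  # keep only hits with Expect < cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}
```

For example, a RAD allele named `LG05_tag1` with hits to `Bm_chr3` at 1e-50 and `Bm_chr7` at 1e-10 would be assigned to `Bm_chr3`, and a query whose only hit has an e-value of 1e-5 would be dropped. Tallying assigned chromosomes per linkage group then gives the synteny comparison.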
Diamondback moth 31 LGs
Bombyx mori 28 Chromosomes
[Synteny figure; axis labels listed in extracted order]
LG: 22 1 16 11 30 31 5 17 7 23 12 6 28 13 15 21 2 29 27 25 14 24 10 20 26 19 18 3 4 8 9
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
24 1 3 7 9 20 23 5 14 5 12 3 4 14 17 8 10 6 22 13 10 - 6 15 9 16 17 2 5 - =
Px31, Bm3, Px5
Current work: BLAST diamondback moth RAD-alleles against 454-EST contigs
Tissue for EST library
Heliconius melpomene genome
RAD tags generated from crosses will assist with genome scaffold assembly
[Diagram: Heliconius melpomene RAD linkage map aligned to assembled genomic scaffolds of 220 kb, 113 kb, 400 kb, 500 kb, 350 kb and 200 kb]
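One simple way mapped RAD tags could order scaffolds is to place each scaffold at the mean map position of the markers that hit it. This is a hypothetical sketch of that idea, not the assembly pipeline itself; the scaffold names and positions in the test data are invented.

```python
from collections import defaultdict
from statistics import mean


def order_scaffolds(marker_hits):
    """Order scaffolds along a linkage group.

    marker_hits: iterable of (scaffold_name, map_position) pairs, one per
    mapped RAD tag that matches a scaffold. Scaffolds are sorted by the
    mean map position of their markers.
    """
    positions = defaultdict(list)
    for scaffold, position in marker_hits:
        positions[scaffold].append(position)
    return sorted(positions, key=lambda name: mean(positions[name]))
```

A scaffold needs at least two markers at different map positions before its orientation can also be inferred.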
Thanks to…
Paul Etter and Eric Johnson, University of Oregon (Protocol advice, adapters)
Tony Shelton, Cornell University (Diamondback moths)
GenePool team, University of Edinburgh
Funding
John’s scripts are available here:
https://www.wiki.ed.ac.uk/display/RADSequencing
Digest genomic DNA with a restriction enzyme
Common cutter (e.g. PstI) = many RAD tags, low sequence coverage per tag
Rare cutter (e.g. SbfI) = fewer RAD tags, high sequence coverage per tag
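The difference between a common and a rare cutter can be seen by counting recognition sites in a sequence. The sketch below uses the PstI site (CTGCAG) and the SbfI site (CCTGCAGG); the toy sequence is made up, and a real estimate would scan a genome assembly instead.

```python
def count_sites(sequence: str, site: str) -> int:
    """Count occurrences of a recognition site, allowing overlaps."""
    sequence, site = sequence.upper(), site.upper()
    count = start = 0
    while True:
        index = sequence.find(site, start)
        if index == -1:
            return count
        count += 1
        start = index + 1  # step one base so overlapping sites are found


PSTI = "CTGCAG"    # 6 bp site: cuts more often
SBFI = "CCTGCAGG"  # 8 bp site containing the PstI site: cuts less often

toy = "AACCTGCAGGTTCTGCAGAA"  # made-up sequence for illustration
print(count_sites(toy, PSTI), count_sites(toy, SBFI))
```

Both sites are palindromic, so scanning one strand is enough. Every SbfI site also contains a PstI site, which is why SbfI can never produce more tags than PstI on the same genome.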
Spinosad Resistance Cross
SAMPLE      READS      TAGS
BC_father   297795     33558
F1_mother   495190     51123
C01         474048     42147
C02         381912     34926
C03         467094     51663
C04         496749     49053
C05         16959      6996
C06         510666     48554
C07         842036     76469
C08         414966     38031
C09         462595     41883
C10         325581     34691
S7          510086     45641
S8          609811     59572
S9          374328     36786
S10         168157     21829
S11         375236     38573
S12         373667     39223
S29         513757     46542
S30         463518     44070
S31         638261     69112
S32         482003     45827
S33         476704     45548
S34         45955      9805
total       10217074   1011622
Average     425711     42151
10 million clusters sorted into individuals using 5 bp index
How many RAD tags per individual? 17K to 638K, average 425K
Group tags with identical sequence for each individual
Expect ~4000; observed 42000; filtered ~7000
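Sorting reads by index and grouping identical tags can be sketched in a few lines. This assumes the 5 bp index is the first five bases of each read and that only exact index matches are accepted; the index sequences and sample assignments in the usage note are hypothetical.

```python
from collections import Counter, defaultdict


def demultiplex(reads, barcodes, index_len=5):
    """Sort reads into individuals and count identical tag sequences.

    reads: iterable of read sequences that start with the index.
    barcodes: dict mapping index sequence -> sample name (hypothetical values).
    Returns {sample: Counter({tag_sequence: read_count})}.
    """
    tags = defaultdict(Counter)
    for read in reads:
        sample = barcodes.get(read[:index_len])
        if sample is None:
            continue  # index not recognised: discard the read
        tags[sample][read[index_len:]] += 1
    return tags
```

For example, with `barcodes = {"AAAAA": "C01", "CCCCC": "C02"}`, two identical reads starting with `AAAAA` end up as a single tag with a count of 2 under `C01`. The gap between expected and observed tag numbers is what the later filtering step addresses, since sequencing errors create many low-count tags.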
Expect to see 2x31 complementary patterns (from the Mother)
However… 1293 unique binary patterns identified
[Bar chart: pattern count (0 to 120) for each of the unique patterns (1 to 1249)]
Pattern                 Count
0111110010101011000111  32
0101110010011110111010  32
0101100111110010111110  37
0101010001101000110011  39
0101111100111111111111  75
0100011100100011000111  79
0111011111000011101011  80
0100000101011101101111  90
0110000011000000000000  99
1049 patterns appear once (errors)
68 patterns occur more than 10 times
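Separating frequent patterns from one-off errors amounts to counting how often each pattern occurs and keeping those above a cutoff. The sketch below keeps patterns seen more than 10 times, matching the figure on the slide; treating everything below that as unreliable is an assumption made here for illustration.

```python
from collections import Counter


def frequent_patterns(patterns, min_count=11):
    """Keep patterns seen at least min_count times (default: more than 10)."""
    counts = Counter(patterns)
    return {pattern: n for pattern, n in counts.items() if n >= min_count}


def singletons(patterns):
    """Patterns seen exactly once, which are likely scoring errors."""
    counts = Counter(patterns)
    return [pattern for pattern, n in counts.items() if n == 1]
```

Patterns between the two extremes are the awkward cases; one option is to compare them to the frequent patterns and assign them only when they differ at a single progeny.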
How are the 31 linkage groups identified from the mother
matched to the 31 linkage maps constructed from the father?
Mother - 31 linkage groups
LG    Pattern               Count  Complementary pattern  Count
LG01  00101011001001010111  16     11010100110110101000   17
LG02  10101110111101000100  14     01010001000010111011   19
LG03  01101010011000010101  18     10010101100111101010   19
LG04  00111000011000011001  17     11000111100111100110   32
LG05  00001101010100111000  25     11110010101011000111   32
LG06  01110010101100010000  29     10001101010011101111   33
Father - linkage maps
Mother LG01, RAD tag 1
Mother LG01, RAD tag 2
Father, RAD tag 3
Assigned to Linkage Map 1
ATTCGATGCACGACACGG
CTACACGCTGAAAGACCCATCTTCGATGCACGACACGG
CTACACGCTGAAAGACCCATGTTCGATGCACGACACGG
CTACACGCTGAAAGACCCAT