Welcome to UW-Madison, the WNPRC, and the O’Connor Lab!
MHC Genotyping Workshop
November 7th – 11th, 2011
Madison, Wisconsin

Introductions
• Trainers (WNPRC Genetics Service)
– Roger Wiseman
– Julie Karl
– Simon Lank
– Gabe Starrett
– Francesca Norante
• Participants
– Wendy Garnica
– Mark Garthwaite
– Julie Holister-Smith
– Suzanne Queen
– Premeela Rajakumar
– Yuko Yuki

Schedule of Events
• Monday
– Welcome and overview presentation
– Begin bench work: cDNA synthesis & PCR (run #1)
• Tuesday
– PCR product purification, quantification & pooling (run #1)
– Begin emulsion PCR (run #1)
– Begin bench work (run #2)
• Wednesday
– Break & enrich DNA beads (run #1)
– Run Roche/454 GS Junior instrument (run #1)
– emPCR (run #2)
• Thursday
– View run #1 results
– Continue work on run #2
– Informatics presentation
– Data analysis
• Friday
– Run #2 results
– Continue data analysis & wrap-up

Overview of Presentation
• Our lab & research focus
• Evolution of DNA sequencing technology
• Discussion of Roche/454 technology & sample multiplexing
• MHC genotyping method overview
– NHP immunogenetics
– Genotyping strategy
– Workflow
• Genotyping results

Welcome to Madison!
The Wisconsin National Primate Research Center (WNPRC)
• Only federally funded National Primate Research Center in the Midwest
• Center holds ~1,100 rhesus macaques, 200 marmosets, and 100 cynomolgus macaques
• Research strengths:
– Immunogenetics & Virology
– Aging & Metabolism
– Reproductive & Regenerative Medicine

The O’Connor Laboratory: Genetics Services Members

The O’Connor Laboratory: Research
• NHP immunogenetics (MHC class I, class II, KIR)
– Cynomolgus macaque (Mauritian, Indonesian, SE Asian)
– Rhesus macaque (Indian & Chinese)
– Japanese macaque, vervet, sooty mangabey
• SIV pathogenesis (immunology) and viral evolution
• Human immunogenetics (HLA) and HIV variation

Sequencing Technology is Changing
• Miniaturized sequencing reactions
– Pyrosequencing
– Single-molecule sequencing
• Higher throughput
– Millions of sequences per day
• Lower cost
– $10,000 human genome (original HGP = $3 billion)

Sequencing Technology: Overview
• 1st Generation (previous): Sanger sequencing
– Applied Biosystems 3730xl: 1 x 10^3 reads/day; 500 to 1,000 bp read length
• 2nd Generation (current): 454, Illumina, SOLiD, Ion Torrent
– Roche/454: 1 x 10^6 reads/day; 500 to 800 bp read length
– Illumina: 2 x 10^9 reads/week; 100 or 200 bp read length
• 3rd Generation (future): Pacific Biosciences, nanopore sequencing, Complete Genomics
– Pacific Biosciences: 1 x 10^5 sequences/hour; 1,000 to 10,000 bp reads (?); single-molecule sequencing
– Goal = $1,000 genome!
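To put the throughput figures above on a common scale, the short sketch below converts each platform's quoted rate to reads per day and compares it with the Sanger baseline. It uses only the numbers on the slides; the conversion assumes (as a simplification, not a claim from the slides) that instruments run continuously, 24 hours a day and 7 days a week.

```python
# Compare the sequencing throughputs quoted on the overview slide.
# Assumption (not from the slide): instruments run continuously,
# 24 h/day and 7 days/week, so weekly and hourly figures scale linearly.
HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7

reads_per_day = {
    "Sanger (ABI 3730xl)": 1e3,                  # 1 x 10^3 reads/day
    "Roche/454": 1e6,                            # 1 x 10^6 reads/day
    "Illumina": 2e9 / DAYS_PER_WEEK,             # 2 x 10^9 reads/week
    "Pacific Biosciences": 1e5 * HOURS_PER_DAY,  # 1 x 10^5 sequences/hour
}

baseline = reads_per_day["Sanger (ABI 3730xl)"]
for platform, rate in reads_per_day.items():
    # Print each rate and its fold-increase over the Sanger baseline.
    print(f"{platform:<22}{rate:>15,.0f} reads/day  ({rate / baseline:,.0f}x Sanger)")

# Cost figures from the "Sequencing Technology is Changing" slide (USD).
hgp_cost = 3e9              # original Human Genome Project
current_genome_cost = 1e4   # quoted current cost of a human genome
print(f"Cost reduction vs. original HGP: {hgp_cost / current_genome_cost:,.0f}-fold")
```

On these assumptions Roche/454 comes out about 1,000-fold above the Sanger baseline, and the quoted genome cost is a 300,000-fold reduction from the original HGP.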
Sequencing Technology: Overview
• 1st Generation (previous): Sanger
– Slow, expensive, not clonal, easy to analyze
• 2nd Generation (current): 454, Illumina, SOLiD, Ion Torrent
– Faster, cheaper, clonal, hard to analyze
• 3rd Generation (future): Pacific Biosciences, nanopore sequencing, Complete Genomics, Helicos
– Very fast, very cheap, impossible to analyze

Roche/454 Sequencing: How does it work?
• Output is a flowgram (instead of a Sanger chromatogram)

O’Connor Laboratory Sequencing, 2005–2010
• Sanger sequencing: NHP MHC class I genotyping with E. coli-based cloning and Sanger sequencing; throughput of ~8 animals per week
• Pilot with Roche sequencing center: MHC class I genotyping pilot project, ~24 samples per week
• GS FLX at UIUC: MHC class I genotyping at UIUC, ~48 samples per week
• Titanium pilot with Roche sequencing center: MHC class I full-length sequencing project with Roche using Titanium chemistry
• GS Junior in lab: MHC class I and viral sequencing projects run in-house (>48 samples per week)

Roche/454 Sequencing Advantages
• Inherently clonal (no bacterial cloning needed)
• Far cheaper per base than Sanger (3–4 orders of magnitude)
• Reliable read numbers and consistent data
• Easy protocol: many people trained

GS Junior 5-Month Run Summary
• MHC class I 568 bp amplicon (9 runs)
– Average: 70,848 HQ reads; 523 bp median length
– Highest: 101,711 HQ reads; 526 bp median length
– Lowest: 33,552 HQ reads; 521 bp median length
• SIV whole genome (16 runs)
– Average: 101,846 HQ reads; 360 bp median length
– Highest: 177,642 HQ reads; 494 bp median length
– Lowest: 42,949 HQ reads; 147 bp median length
• SIV epitope amplicons, various sizes (5 runs)
– Average: 80,244 HQ reads; 369 bp median length
– Highest: 107,605 HQ reads; 388 bp median length
– Lowest: 37,066 HQ reads; 356 bp median length

Ease of Use
• Access to instrument since Jan 2010
• 34 different fully trained operators to date
• 7 additional people have begun training but have not yet completed a solo run

Ultra-Deep vs. Ultra-Wide Sequencing
• 2nd & 3rd generation = thousands/millions of sequences per run
• Cost per run is high ($1,000s)
• Can examine a polymorphic target at high depth (ultra-deep): expensive
• Can sequence many samples at the same time (ultra-wide): cheap per sample
• Significantly improves sensitivity over traditional Sanger-based sequencing (500x vs. 2x coverage)
• Example ultra-deep and ultra-wide applications: HLA typing, allele frequencies, SNP detection, low-frequency ARV resistance, TCR sequencing, antibody sequencing

Multiplexed (Ultra-Wide) Amplicon Sequencing
• Multiplex Identifier (MID) tag
• Methods to increase multiplexing:
1. Physically subdividing the plate (gasket)
2. Sample-specific MID sequence tags
3. Uniquely combining 5’ and 3’ MID tags

Method 2 example, one MID per patient:
Patient  MID
1        ATCGTAGTCA
2        TCCGATCGA
3        GTGTAACGT
4        CCATGGATC
5        TGGATGCAG
6        TAGTAGCCA
7        GTAGTCTAA
8        AACGATGCA
9        GCGCTAGCA

Method 3 example, combinations of three 5’ and three 3’ MIDs:
Patient  1  2  3  4  5  6  7  8  9
5’ MID   1  1  1  2  2  2  3  3  3
3’ MID   1  2  3  1  2  3  1  2  3

O’Connor Lab Sequencing Projects
• NHP comprehensive MHC genotyping & allele discovery (amplicons)

Importance of MHC Class I: Host Immune Genetics
[Figure source: modified from Yewdell et al., Nature Reviews Immunology 2003]
• MHC class I molecules present peptides to CD8+ T cells and so shape immunity to disease
• High degree of polymorphism within the MHC class I peptide-binding domain
• Specific MHC alleles are associated with superior control of HIV infection

NHP MHC Class I Allele Libraries (total # alleles in GenBank)
• Rhesus macaque: 663
• Cynomolgus macaque: 460
• Pig-tailed macaque: 156
• Vervet: 9
• Sooty mangabey: 0
• For comparison, human HLA class I = 5,400 alleles

Human HLA vs. NHP MHC Class I
• Human HLA class I: a single A, C, and B locus on each haplotype
• Nonhuman primate MHC class I: multiple A loci (A1–A4) and B loci (B1–BN) on each haplotype

MHC Genotyping Design
[Figure: MHC class I transcript (leader peptide, α1, α2, and α3 domains, transmembrane, cytoplasmic) showing the 568 bp amplicon and forward (F) / reverse (R) primers against % MHC class I variability by amino acid position]
• The 568 bp amplicon captures the highly variable peptide-binding region, flanked by conserved sequences
• Amplifies in multiple primate species
• Longer reads provide better resolution of alleles
• Primer = adapter (A or B) + MID + sequence-specific primer
[Figures: 568 bp amplicon within a single nonhuman primate sample, and within an MHC class I amplicon genotyping pool]

Roche/454 MHC Workflow
• Total RNA isolation and cDNA synthesis
– RNA isolation ~4 hrs; cDNA synthesis ~2 hrs
• Primary PCR amplification
– Plus SPRI purification, quantification, and pooling ~3 hrs
• emPCR
– Set-up ~1 hr; run ~5.5 hrs
• Breaking and enrichment
– ~3 hrs
• GS Junior run
– Set-up ~1.5 hrs; run time ~10 hrs
• Data processing and analysis
– Run processing ~2 hrs; analysis time varies
www.454.com

GS Junior Run Metrics – MHC Reads per Sample
Samples Monkey001–Monkey096 carry MIDs 1–96 (MonkeyNNN = MID NNN). Read counts by MID:
MID 1–8:   525, 392, 1,023, 504, 450, 722, 622, 489
MID 9–16:  344, 635, 660, 796, 653, 731, 1,342, 628
MID 17–24: 76, 481, 503, 633, 573, 463, 390, 723
MID 25–32: 739, 560, 1,672, 559, 801, 590, 548, 748
MID 33–40: 583, 374, 226, 791, 618, 558, 438, 666
MID 41–48: 250, 451, 612, 673, 570, 207, 604, 180
MID 49–56: 585, 504, 673, 565, 893, 581, 623, 955
MID 57–64: 698, 792, 655, 1,203, 428, 8, 391, 663
MID 65–72: 411, 386, 625, 637, 367, 391, 585, 808
MID 73–80: 594, 391, 578, 728, 612, 283, 475, 527
MID 81–88: 27, 226, 113, 481, 52, 612, 733, 800
MID 89–96: 647, 1,094, 522, 756, 624, 912, 610, 514

Allele Calls & Transcript Profiles
[Figure: % total reads per MHC class I allele; series ChRh10, ChRh11, ChRh12]

Lymphocyte-Specific Expression
[Figure: % total reads per MHC class I allele; series CD4, CD8, CD14, CD16, CD20]

Same methods applicable to HLA typing
• We have developed a similar assay to genotype human samples: HLA class I and DRB loci
• Cheaper, higher-resolution, and higher-throughput than existing methods
• Can genotype up to 96 individuals per GS Junior run

High Resolution HLA Genotyping
[Figure: variability along the HLA class I coding sequence (positions 1–1,079 bp) across the leader peptide (LP), α1, α2, and α3 domains, transmembrane (TM), and cytoplasmic tail (CT); Amplicon 1 = 1kb-F / 581-R, Amplicon 2 = 581-F / 1kb-R; SBT region marked]

High-resolution Typing for 40 Reference Cell Lines
[Table: HLA-A, -B, and -C allele calls for reference cell lines HLA-Ref01 to HLA-Ref38]

Example High-Resolution HLA Genotypes with DRB
Reads = total reads assigned to the allele, equal to the sum of its per-primer columns (1kbF, 581F, 581R, 1kbR for class I; DRB-F, DRB-R for DRB); "." = 0.

Sample   Allele                 Reads  1kbF  581F  581R  1kbR  DRB-F  DRB-R
HIV_114  A*36:01                122    35    41    23    23
HIV_114  A*68:01:01             150    50    45    50    5
HIV_114  B*41:02:01             74     16    24    25    9
HIV_114  B*53:01:01             223    36    87    61    39
HIV_114  C*04:01:01             99     14    52    13    20
HIV_114  C*17:01:01 (primer)    45     2     32    2     9
HIV_114  DRB1*01:02:01          163                            83     80
HIV_114  DRB1*16:02:01          127                            65     62
HIV_114  DRB5*02-novel?         60                             60     .
HIV_115  A*03:01:01             60     24    16    7     13
HIV_115  A*11:01:01             70     32    16    9     13
HIV_115  B*07:02:01             120    28    48    12    32
HIV_115  B*51:01:01             177    53    53    35    36
HIV_115  C*07:02:01             62     30    15    16    1
HIV_115  C*15:02:01             109    60    20    19    10
HIV_115  DRB1*04:04:01          165                            86     79
HIV_115  DRB1*07:01:01          228                            114    114
HIV_115  DRB4*01:01:01:01       93                             75     18
HIV_115  DRB4*01:03:01:01       99                             75     24
HIV_116  A*01:01:01             122    37    31    49    5
HIV_116  A*02:01:01             97     40    17    31    9
HIV_116  B*08:01:01             213    57    71    63    22
HIV_116  B*15:01:01             129    21    58    32    18
HIV_116  C*03:04:01             103    27    43    21    12
HIV_116  C*07:01:01             114    46    22    41    5
HIV_116  DRB1*03:01:01          471                            244    227
HIV_116  DRB1*04:01:01          429                            221    208
HIV_116  DRB3*01:01:02          137                            74     63
HIV_116  DRB4*01:03:01:01       176                            101    75
HIV_117  A*26:01:01             167    24    74    40    29
HIV_117  A*29:02:01             96     24    31    24    17
HIV_117  B*44:03:01 (putative)  286    112   53    59    62
HIV_117  B*44:10 (putative)     210    113   51    46    .
HIV_117  C*04:01:01             245    38    130   26    51
HIV_117  DRB1*03:01:01          173                            94     79
HIV_117  DRB1*07:01:01          171                            81     90
HIV_117  DRB3*02:02:01          50                             25     25
HIV_117  DRB4*01:03:01:01       44                             29     15
HIV_118  A*02:01:01             117    33    46    24    14
HIV_118  A*23:01:01             156    42    61    39    14
HIV_118  B*40:01:02             113    13    50    35    15
HIV_118  B*44:03:01             206    51    81    63    11
HIV_118  C*03:04:01             84     7     47    15    15
HIV_118  C*14:03                142    28    61    31    22
HIV_118  DRB1*04:01:01          151                            80     71
HIV_118  DRB1*10:01:01          195                            96     99
HIV_118  DRB4*01:03:01:01       57                             33     24
HIV_119  A*29:01:01:01          36     13    7     10    6
HIV_119  A*68:01:02             73     36    12    20    5
HIV_119  B*07:05:01             48     12    11    7     18
HIV_119  B*44:02:01:01          86     41    15    26    4
HIV_119  C*05:01:01             47     25    5     10    7
HIV_119  C*15:05:01/02          63     26    15    11    11
HIV_119  DRB1*04:04:01          233                            89     144
HIV_119  DRB1*07:01:01          250                            105    145
HIV_119  DRB4*01:03:01:01       77                             33     44
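The combinatorial tagging idea (method 3 under Multiplexed Amplicon Sequencing: uniquely combining 5’ and 3’ MID tags) can be illustrated with a small demultiplexing sketch. This is not the Roche software or the lab's own pipeline. The tag set reuses three of the example MIDs from the sample-specific MID table, and the read layout is an assumption: adapters and keys are already trimmed, the read spans the whole amplicon, it starts with the 5’ MID and ends with the reverse complement of the 3’ MID. Only exact matches are accepted. The names `MIDS`, `PAIR_TO_PATIENT`, `assign_patient`, and `count_reads` are hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical tag set: the first three example MIDs from the
# sample-specific MID table, reused here as both 5' and 3' tags.
MIDS = {1: "ATCGTAGTCA", 2: "TCCGATCGA", 3: "GTGTAACGT"}

# Method 3: each (5' MID, 3' MID) pair maps to one patient,
# so 3 + 3 tags distinguish 3 x 3 = 9 patients (same layout as the slide table).
PAIR_TO_PATIENT = {
    (five, three): 3 * (five - 1) + three
    for five in MIDS
    for three in MIDS
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def assign_patient(read: str) -> Optional[int]:
    """Return the patient number for a read, or None if either tag is missing.

    Assumes the read starts with the 5' MID and ends with the reverse
    complement of the 3' MID; no mismatches are tolerated.
    """
    five = next((i for i, mid in MIDS.items() if read.startswith(mid)), None)
    three = next((i for i, mid in MIDS.items() if read.endswith(revcomp(mid))), None)
    if five is None or three is None:
        return None
    return PAIR_TO_PATIENT[(five, three)]


def count_reads(reads) -> Counter:
    """Tally reads per patient, skipping reads that cannot be assigned."""
    return Counter(p for p in map(assign_patient, reads) if p is not None)


# Usage: a read tagged with 5' MID 2 and 3' MID 3 is assigned to patient 6.
amplicon = "ACGTACGTAC"  # hypothetical placeholder insert sequence
read = MIDS[2] + amplicon + revcomp(MIDS[3])
print(assign_patient(read))
print(count_reads([read, read, "ACGT"]))
```

Because each read carries both tags, three 5’ and three 3’ MIDs are enough to separate nine samples. Tallies like those from `count_reads` are the kind of per-sample read counts reported in the GS Junior run metrics. A production pipeline would also need to tolerate sequencing errors in the tags, which this exact-match sketch does not.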