NGS Data Consortium
October 8, 2012

HLA Genotyping Data Generated by 454 Sequencing
Cherie Holcomb, Ph.D.
Roche Molecular Systems

Medium, High, and Very High Resolution HLA Genotyping Systems

Targeted amplicon sequencing
– Primers target up to 9 loci (formatted as fusion primers or as a “4 primer system” with use of the Fluidigm Access Array™)
  • NOTE: After amplification of gDNA, amplicons contain all adapters, MIDs, etc. for NGS sequencing
– Workflow: amplicons processed either individually or pooled (fusion primers), or pooled (“4 primer system”)
– 454 Life Sciences GS FLX or GS Junior for NGS
– Conexio Assign ATF 454 software (the commercially available version will perform MR and HR; for VHR, limited early access to Conexio Assign MPS v1.0)

Amplicon Sequencing using 454 GS FLX*

Possible/Future Applications: unrelated bone marrow donor registry screening; research; clinical

Resolution           # Primer Pairs   Loci
"Medium" (MR)        8                A, B, C, DQB1, DRB1 (also get DRB3/4/5)
"High" (HR)          14               As above & DQA1, DPB1
"Very High" (VHR)    22               As above & DPA1

Method of amplicon generation and # samples per GS FLX run achieved:
• "Medium" (MR)
  – 454 fusion primer plates: 88 (limited by MID/PTP region combination if using the commercial product)
    Advantages: commercially available (simplification of workflow by pooling amplicons possible)
    Disadvantages: storing primer plates at 4°C; limited to 11 MIDs, which limits # of samples per run
  – Fluidigm Access Array™∆: 192
    Advantages: more MIDs → higher throughput; less sample consumption & PCR reagent consumption
    Disadvantages: need to purchase Fluidigm instrumentation & disposables
• "High" (HR)
  – 454 fusion primer plates (workflow as above): 88 (limitation as above)
  – Fluidigm Access Array™∆: 96
    Advantages and disadvantages: as above
• "Very High" (VHR)
  – 454 fusion primer plates: 40-48 (conservative est)
    Advantages: MR & HR plates commercially available (VHR primers could be added on by lab)
    Disadvantages: storing plates at 4°C; need to store VHR primers in a different format (liquid) unless made commercially available
  – Fluidigm Access Array™ǂ: as above (workflow as above)
    Advantages: less sample consumption & PCR reagent consumption, as above

*GS Junior can also be used, ~8x fewer samples per run
∆Abstract #1025-LB
ǂAbstract #ORO1-02
NOTE: All current products are for Research Use Only

Primer Set Comparison
GS GType MR, HR with added VHR* amplicons
454 GS GType HLA Primer Sets

Locus                 MR Primers   HR Primers   VHR
CLASS I
• HLA-A               2, 3         2, 3, 4      1, 2, 3, 4-5
• HLA-B               2, 3         2, 3, 4      1, 2, 3, 4, 5
• HLA-C               2, 3         2, 3, 4      1, 2, 3, 4, 5, 6-7
CLASS II
• DPA1 exon           N/A          N/A          2
• DPB1 exon           N/A          2            2
• DQA1 exon           N/A          2            2
• DQB1 exons          2            2, 3         2, 3
• DRB1 exon           2            2            2, 3
• DRB 3, 4, 5 exon    2            2            2, 3

*Numbers shown are exons, “-” indicates intron

Primer Plate Layout for Very High Resolution Sequencing
9 loci, 22 primer pairs, 11 MIDs, 10 samples per set (3 plates)

Each plate row (A-H) holds one primer pair across columns 1-10. Column 11 holds the negative control, Neg (-), for every primer pair; column 12 has no entry.

Row   MR Plate    HR Plate    VHR Plate
A     A2          A4-5        A1
B     A3          B4          B1
C     B2          C4          C1
D     B3          DPB1        B5
E     C2          DQA1        C5
F     C3          DQB1 E3     C6-7
G     DQB1 E2     (empty)     DPA1
H     DRB1 E2     (empty)     DRB E3

MR and HR plates are commercially available from 454.

454 Amplicon Sequencing File Generation

454 Pyrosequencing
– Image Acquisition → PNG
– Image Processing → Image Processed CWF
– Signal Processing → Signal Processed CWF, FNA (FASTA), SFF
– Consolidation (454 AVA software) → Consolidated FNA (FASTA); examine seq
– Genotyping (Conexio Assign) → Genotype Report + Sequence Export

Conexio Assign ATF 454 Interface
• Genotypes automatically assigned, sequences visible

Conexio Assign ATF 454 Interface: Genotype Report
• Allele name format and output format can be chosen

Conexio Assign MPS v1.0 Genotyping Report
• All fields, MS Excel format
• A CWD filter OR “highlighting” of CWD alleles in report (preferred) has been requested
• 454 GS GType HLA + VHR primers; only part of report shown; assay includes DQA1, DQB1, DRB1 and DRB3/4/5

GS GType HLA HR primers, Conexio Assign MPS v1.0
• References for gDNA; noncoding sequence can be considered
• NC seq not activated
• HLA-A genotype: ambiguity string includes the null allele A*03:01:01:02N

GS GType HLA VHR primers, Conexio MPS v1.0
• Noncoding sequence is activated, ambiguity string greatly reduced
• HLA-A genotype: A*02:01:01:01/02L, A*03:01:01:01
• NC seq activated; null resolved
• CHALLENGE: How could/should these sequences be reported?

Reporting of Sequence Information: Currently
• Can report out (combined) consensus exon sequence that has given rise to the list of possible genotypes for a given sample/locus. Can do this easily for all samples. (Q: If the community decides sequence is necessary for publications, is this sufficient?)
  – Cannot report component (consensus) sequences (with exons matched) to give individual allele(s)
  – Doesn’t include intronic sequence (can report consensus of each intron individually, but this is too laborious to be practical)
  – FASTA format in Notepad (Q: Sufficient for publication?)
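The genotype report lets the allele name format be chosen, and an ambiguity string such as the one containing A*03:01:01:02N collapses when names are reported at fewer fields. The following is a minimal Python sketch of that reduction, not Conexio Assign's implementation. The function names `reduce_fields` and `reduce_ambiguity` are hypothetical, and two behaviours are assumptions of this sketch: the expression suffix (N, L, etc.) is dropped whenever a name is truncated, and an optional "HLA-" prefix is removed.

```python
import re

# Locus, '*', colon-separated numeric fields, optional expression suffix.
_ALLELE = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*(\d+(?::\d+)*)([NLSCAQ]?)$")


def reduce_fields(allele, n_fields=None):
    """Return the allele name at n_fields fields (None = all fields).

    Assumption of this sketch: truncating a name drops its expression
    suffix, because the suffix describes the full-length allele only.
    """
    if n_fields is not None and n_fields < 1:
        raise ValueError("n_fields must be at least 1")
    match = _ALLELE.match(allele.strip())
    if match is None:
        raise ValueError(f"not a recognised allele name: {allele!r}")
    locus, digits, suffix = match.groups()
    fields = digits.split(":")
    if n_fields is None or n_fields >= len(fields):
        return f"{locus}*{digits}{suffix}"
    return f"{locus}*{':'.join(fields[:n_fields])}"


def reduce_ambiguity(alleles, n_fields=None):
    """Reduce every allele in an ambiguity list and drop duplicates, keeping order."""
    seen = []
    for allele in alleles:
        reduced = reduce_fields(allele, n_fields)
        if reduced not in seen:
            seen.append(reduced)
    return seen
```

For example, `reduce_ambiguity(["A*03:01:01:01", "A*03:01:01:02N"], 2)` yields a single entry, `A*03:01`, while the all-field form keeps both names, including the null allele.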
Reporting of Sequence Information: Preferred
• Option to report component sequences (with exons and introns matched) to give allele calls; important for reporting new alleles
• For (combined or individual-allele) consensus sequence
  – Option to report coding only OR coding plus noncoding
  – Format options including XML (accepted by IMGT); important for reporting new alleles

Works in Progress

GS GType HLA VHR primers, Conexio MPS v1.0: New allele can be identified
• Genotype has 1 mismatch w/IMGT database; can determine in which allele

GS GType HLA VHR primers, Conexio MPS v1.0
• DRB1*12 allele is a perfect match with IMGT database

GS GType HLA VHR primers, Conexio MPS v1.0: New allele is identifiable
• DRB1*07:01 allele has 1 mismatch w/IMGT database (A at b259 instead of G)
• Confirmed by Sanger sequencing
• Proposed acceptance criteria:
  – Sequenced multiple times (2 different runs)
  – Minimum read depth of 25 for each direction, each allele, for (all) amplicons of prospective new allele
  – Mutation(s) defining new allele observed in both F and R direction

Reporting Sequences for New Allele
• Using info from Res Layers, manually harvest sequences and assemble 1 allele
• “Copy Sequence” output
  – Simple text file: copy into WordPad, Excel, BioEdit
  – Not in FASTA or XML format (currently no way to convert to the latter)
• Assume XML is most appropriate for submission to IMGT database
• In discussion with Conexio

Gaps in IMGT database create ambiguity in typing
• Gaps indicated by “orange bar” in user interface but not in Genotyping Report etc.
• Issue: Would be good if alleles lacking sequence were flagged in Genotyping Report
• In discussion with Conexio

Additional info & Summary
• Using HR or VHR 454 sequencing HLA genotyping system including Conexio Assign ATF 454 or MPS v1.0 software, respectively:
  – Ambiguity string lengths are reduced to a practically reportable size
  – Genotype/allele ambiguity strings (in various formats using combinations of delineation in columns, “+”, “or”, “,”) can be reported in Excel, text and XML(??) format at 1, 2, or all field level
  – NMDP codes supported
  – Most recent IMGT nomenclature and references supported (updated periodically, every 6 months); version of references used is reported
  – Export of consensus sequence used to make genotype calls for all loci/all samples is easily accomplished in FASTA format; currently doesn’t include NC sequence
  – New alleles are readily identifiable; however, reporting of amplicon sequences is currently only possible by manual “harvesting” into a text file

Acknowledgements
• Roche Molecular Systems – Henry Erlich
• Conexio Genomics – Damian Goodridge

Back-up slides

Ambiguity: C*03:03/03:20N (GS GType HLA HR primers)

Ambiguity Resolution: C*03:03/03:20N (VHR primers)
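The proposed acceptance criteria for a new allele (2 different runs, minimum read depth of 25 in each direction, defining mutation seen in both F and R reads) can be written as a simple check. This is a sketch under stated assumptions, not a validated rule set: the `AmpliconCall` record and its field names are hypothetical, depths are taken to be the reads supporting the prospective new allele, and the depth threshold is applied to every supplied amplicon in every run.

```python
from dataclasses import dataclass
from typing import Iterable

MIN_RUNS = 2    # "sequenced multiple times (2 different runs)"
MIN_DEPTH = 25  # minimum read depth per direction


@dataclass(frozen=True)
class AmpliconCall:
    """One amplicon of the prospective new allele in one run (hypothetical record)."""
    run_id: str
    amplicon: str
    forward_depth: int
    reverse_depth: int
    has_defining_mutation: bool = False
    mutation_seen_forward: bool = False
    mutation_seen_reverse: bool = False


def meets_new_allele_criteria(calls: Iterable[AmpliconCall]) -> bool:
    """Apply the three proposed criteria to all calls for one candidate allele."""
    calls = list(calls)
    # Criterion 1: observed in at least two different runs.
    if len({c.run_id for c in calls}) < MIN_RUNS:
        return False
    # Criterion 2: depth threshold met in both directions for every amplicon.
    if any(c.forward_depth < MIN_DEPTH or c.reverse_depth < MIN_DEPTH for c in calls):
        return False
    # Criterion 3: each defining mutation is seen in both F and R reads.
    defining = [c for c in calls if c.has_defining_mutation]
    if not defining:
        return False
    return all(c.mutation_seen_forward and c.mutation_seen_reverse for c in defining)
```

A candidate supported by two runs with adequate depth and bidirectional evidence for the defining mutation passes; dropping any one of those conditions makes the function return `False`.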