Questions about LOINCs for gene sequences from Jerry Sable

Discussion points (from C McDonald)

Tests for human mutations of a specific gene or genes are usually reported by exception: in other words, by noting what is different from “normal”, and not by reporting the whole sequence.
o This variation can be (and commonly is) reported in HGVS format, a standard syntax for describing mutations (http://www.hgvs.org/dblist/dblist.html). Ideally, such reports would also include a LOINC-coded entry for the reference sequence (identified using the code system from NCBI) so the position of the variation is crystal clear.
o The variation can also be reported using NCBI’s dbSNP database ID (http://www.ncbi.nlm.nih.gov/SNP/). dbSNP assigns a unique ID to each short sequence variation that occurs frequently enough in a population to be termed polymorphic. (In order to emphasize the comprehensive nature of dbSNP’s content, the full name of the database was changed from “database of Single Nucleotide Polymorphism” to the more inclusive “database of Short Genetic Variation” in July of 2011. http://www.ncbi.nlm.nih.gov/books/NBK174586/)
o Because HL7 v2 allows two coding systems to be reported in one OBX-5, both the HGVS notation and the dbSNP ID could be reported in one OBX. (See the HL7 LOINC-based specification for reporting genetic sequences: HL7 Version 2 Implementation Guide: Clinical Genomics; Fully LOINC-Qualified Genetic Variation Model.)

A different nomenclature exists for deletions and duplications, and such abnormalities should be reported in ISCN (Shaffer LG, McGowan-Jordan J, Schmid M, Editors. ISCN 2013: An International System for Human Cytogenetic Nomenclature. Karger:2013. ISBN-13: 978-3318022537), which is a syntax for reporting every kind of abnormality that can be observed in any kind of cytogenetic study, from simple Q-bands to FISH and ArrCGH. It reports simple chromosome findings in the familiar way, e.g.
XY, XX, or XXY, but can go much farther and provide detail at every level of cytopathology precision. If every cytopathology test result included one structured variable (e.g. LOINC # 62356-1, Chromosome analysis result in ISCN expression….) that described the findings in ISCN syntax in addition to what is currently reported, receivers would have a fully computable result. If needed, LOINC could create additional LOINC codes that deliver ISCN nomenclature but are more specific to the tissue or purpose of the exam. For details on an HL7 LOINC-based specification for reporting everything in a cytopathology report, see HL7 Version 2 Implementation Guide: Clinical Genomics; Fully LOINC-Qualified Cytogenetics Model.

Mutation analyses for a given gene can be done in multiple ways.
o Historically (and still today), a set of probes for the variations known to be clinically important is run against the sample. So a report would list the mutations found (in HGVS format and/or as dbSNP codes) but, ideally, would also report the mutations tested for. (Well more than 1000 variations are known to cause cystic fibrosis, but only 30-80 are common enough to be tested. In LOINC such tests would be Prid tests. A parallel LOINC code for the same gene, but with the name “mutations tested for”, would ideally be reported.)
o It can be done by sequencing. The reporting of the variations found would be the same as for the probe case, but in contrast to probe testing, one could discover variations of unknown significance. The field is still discovering how and whether to report these. The HL7-approved LOINC panel(s) for human sequencing provide LOINC terms (and appropriate answer lists) for describing the importance of such variations. Mutation analysis by sequencing should also describe the parts of the gene that were sequenced. This is usually described in text narrative.
o It can be done by other, less common methods, which we won’t go into here.
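The point made earlier, that one OBX-5 can carry both an HGVS expression and a dbSNP ID, can be sketched in a few lines. This is a minimal illustration only, assuming the value is sent as an HL7 v2 coded element (CWE) whose primary and alternate triplets hold the two codes. The coding-system labels "HGVS" and "dbSNP", the LOINC placeholder "XXXXX-X", the variant "NM_000000.0:c.100A>G" and the ID "rs0000000" are all made-up placeholders, not values taken from the implementation guide or from any real report.

```python
# Sketch only: field layout follows the generic HL7 v2 OBX segment and the
# CWE data type; every code value used below is a made-up placeholder.


def build_cwe(code, text, system, alt_code="", alt_text="", alt_system=""):
    """Join the six CWE components; trailing empty components are dropped."""
    components = [code, text, system, alt_code, alt_text, alt_system]
    return "^".join(components).rstrip("^")


def build_obx(set_id, observation_id, value, status="F"):
    """Return one OBX segment with OBX-2 = CWE and the value in OBX-5."""
    fields = ["OBX", str(set_id), "CWE", observation_id, "", value,
              "", "", "", "", "", status]  # OBX-6 through OBX-10 left empty
    return "|".join(fields)


# Primary triplet: HGVS expression. Alternate triplet: dbSNP ID.
variant = build_cwe(
    "NM_000000.0:c.100A>G", "", "HGVS",  # placeholder HGVS expression
    "rs0000000", "", "dbSNP",            # placeholder dbSNP ID
)
obx = build_obx(1, "XXXXX-X^Hypothetical variant term^LN", variant)
print(obx)
```

A receiver that understands only one of the two coding systems can still read its half of OBX-5, which is the practical benefit of sending both.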
Mutation analysis of infectious organisms
o NCBI’s dbSNP accepts reports of variations from any organism, and it now carries genetic sequences for over 800 different organisms, including multiple independent submissions, frequency data, genotype data, and allele observations.

NCBI has specialized databases of genetic sequences for Influenza virus, West Nile Virus, and Dengue virus, with plans for Rotavirus. We still have to research what CDC has. They do have special databases for Salmonella and related organisms, but until recently those used an old technology; I believe they are going to pure sequencing now.

A lot is known about variation among HIV, Hep C, and Hep B (at least), but each virus tends to use a different nomenclature for variation. Some have suggested using the HGVS system, which sounds like a good suggestion for reporting individual mutations in viruses and bacteria, though we would also want a way to report a reference sequence for each such organism.

LOINC does have a mechanism for reporting actual sequences, as illustrated below. Seq as the property defines a LOINC term that carries a sequence. We have used Nom as the scale because these are long strings of letters (text, if you will). Public Health needed these for the literal sequence of viruses that had not yet been given a unique absolute ID, but the same approach would work for other uses, including those listed below.

Figure 1

From Jerry Sable

Hi Dan,
I've attached a PDF with questions about LOINCs for gene sequences. This has come up in the context of ELR 2.5.1 messaging. The PDF includes examples from messages. Cindy Johns has reviewed them already. Can you put this topic on the agenda for the next LOINC meeting? Let me know if you have any questions.
Thanks,

A LabCorp example message:
This message was discussed on an ELR call.
MSH|^~\&|LABCORP-CORP|LABCORP^34D0655205^CLIA|CA|CA-DOH|20120806152852||ORU^R01|2012080615285210001|P|2.3
PID||415183^^^^^TWIN CITIES COMMUNITY HOSPITAL&04238340|15309706470^^^^^CMBP Infectious Disease Dept.|342735|TEST^PATIENT^A||190101010000|M||U||||||||||||||||||||
OBR|1||15309706470|551697^GenoSure (R) MG^L|||2012060100001322|||||||||WRIGHT|||||||||F
ZLR||TWIN CITIES COMMUNITY HOSPITAL, INTERFACE ACCOUNT|1100 LAS TABLAS RD^^TEMPLETON^CA^93465|8054344501^^^
OBX|1|TX|33630-5^HIV protease gene mutations detected [Identifier]^LN^551657^HIV GenoSure(R)^L||CCTCARATCACTCTTTGGCAACGACCCCTAGTCACAATAAARATAGGGGGGCAACTAAGGGAAGCTCTATTAGATACAGGRGCAGATGATACAGTATTAGAAGAAATAAATTTGCCAGGRAGATGGAAACCRAAAATGATAGGGGGAATTGGAGGTTTTATCAAAGTAAGACAGTATGATCAGRTAYTAGTAGAAATYTGYGGACAWARAGCCATAGGTACAGTATTAGTAGGACCTACACCTGTCAACATAATTGGAAGAAATCTGTTGACTCAGCTTGGCTGYACTTTAAATTTT||||||F|||2012060100001322

The LOINC used in OBX-3 is:
loinc_num: 33630-5
component: HIV protease gene mutations detected [Identifier]
property: Prid
time_aspct: Pt
system: Isolate
scale_typ: Nom
method_typ:
class: ABXBACT

Question: Is Prid/Nom the best choice for this kind of result?
CM: No, this term is intended to identify the specific mutations of importance via some coding system (could be local), not the whole sequence. We would need a LOINC term with the same component but with a property of Seq.

Another LabCorp example message
MSH|^~\&|LABCORP-CORP|LABCORP^34D0655205^CLIA|CA|CA-DOH|20120806152852||ORU^R01|2012080615285210002|P|2.3
PID||415183^^^^^TWIN CITIES COMMUNITY HOSPITAL&04238340|15309706470^^^^^CMBP Infectious Disease Dept.|342735|TEST^PATIENT^A||190101010000|M||U||||||||||||||||||||
OBR|1||15309706470|815959^HIV-1 Genotypic Drug Resist, Panel^L|||2012060100001322|||||||||WRIGHT|||||||||F
ZLR||TWIN CITIES COMMUNITY HOSPITAL, INTERFACE ACCOUNT|1100 LAS TABLAS RD^^TEMPLETON^CA^93465|8054344501^^^
OBX|1|TX|30554-0^HIV reverse transcriptase gene mutations detected [Identifier]^LN^815962^Reverse Transcriptase Mutation^L||CCTATTAGTCCTATTGAAACTGTACCAGTAARATTAAARCCAGGAATGGATGGCCCAAAAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTAGAAATTTGYACAGAAATGGAAAAGGAAGGAAARATTTCAAAAATTGGGCCTGAAAATCCATACAATACTCCAGTATTTGCCATAAAGAAAAARGATAGTACTAAATGGAGRAARTTAGTAGATTTCAGAGAACTTAACAAGAGAACTCAAGACTTYTGGGAAGTTCAATTAGGAATACCACATCCTGCAGGRTTRAAAAAGAAAAAATCAGTAACAGTAGTGGATGTGGGTGATGCATATTTTTCAGTRCCCTTAGACAARGACTTCAGGAAATATACTGCATTTACCATACCTAGTATAAACAATGAGACACCAGGGATTAGATATCAGTACAATGTGCTTCCACAGGGATGGAARGGATCACCAGCAATATTYCAAAGTAGCATGACAAAAATCTTAGAACCTTTTAGAAAACAAAATCCAGACATAGTTATCTACCAGTATATGGATGATTTGTATGTAGGWTCTGACTTAGAAATAGGGCAGCATAGARCAAAAATAGAGGAACTGAGAGAACATCTGTTGAGGTGGGGATTYACCACCCCAGACAAAAARCAYCAGAAAGAACCYCCATTCCTTTGGATGGGTTATGAACTCCATCCTGATAAATGGACAGTACAGCCTATCATGCTRCCAGAAAAAGACAGCTGGACTGTCAATGACATACAGAAGTTAGTRGGRAAATTRAATTGGGCAAGTCAAATYTATCCAGGGATTAARGTAAAGCAATTATGTAAACTCCTTAGAGGARCCAAAGCACTAACAGAAGTAGTACCACTAACRGAAGAAGCAGAGCTAGAACTAGCAGAAAACAGGGARATTCTAARAGARCCAGTGCATGGAGTGTATTATGAYCCATCAAAAGACTTAATAGCAGAAATACAGAAGCAGGGGYACGGCCAATGGACATATCARATTTATCAAGAGCCATTTAAAAATCTGAARACAGGRAAGTATGCAAGAATGAGGGGTGCCCACACTAATGATGTAAAACAATTAACAGAGGCAGTGCAAAAAGTRGCCACAGAAAGCATAGTAATATGGGGAAAGACTCCCAAATTTAGATTACCCATACAAAAGGAAACATGGGAATCA||||||F|||2012060100001322

LabCorp's LOINC for HIV sequencing:
loinc_num: 30554-0
component: HIV reverse transcriptase gene mutations detected
property: Prid
time_aspct: Pt
system: Isolate
scale_typ: Nom
method_typ:

The OBX-5 result above contains 1201 characters. Prid/Nom might not be the best choice for a result like this.
CM: Same comment as above; this term would be used to report a discrete mutation. We need a different term with a property of Seq and a method of Sequencing to report the whole sequence.
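The mismatch discussed above (a Prid term whose OBX-5 actually holds a literal sequence) is something a receiver could detect mechanically. The following is a rough sketch, not part of any ELR specification: it assumes the segment uses the default `|` field separator, and the 50-character threshold and the function names are arbitrary choices for illustration. The alphabet is the standard IUPAC nucleotide code set, which covers the ambiguity letters (R, Y, W and so on) seen in the LabCorp results.

```python
# Sketch only: flags an OBX-5 value that looks like a literal nucleotide
# sequence rather than a list of named mutations.

# Standard IUPAC nucleotide codes, including ambiguity codes such as R and Y.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def obx_value(segment):
    """Return OBX-5 from a pipe-delimited OBX segment."""
    fields = segment.split("|")
    if fields[0] != "OBX" or len(fields) < 6:
        raise ValueError("not an OBX segment with an OBX-5 value")
    return fields[5]


def looks_like_sequence(value, min_length=50):
    """True when the value is long and uses only IUPAC nucleotide letters.

    min_length is an arbitrary illustrative threshold, not a standard.
    """
    compact = "".join(value.split()).upper()  # ignore wrapping whitespace
    return len(compact) >= min_length and set(compact) <= IUPAC_NUCLEOTIDES


# First 60 bases of the protease result shown above, truncated for brevity.
sample = ("OBX|1|TX|33630-5^HIV protease gene mutations detected [Identifier]^LN||"
          "CCTCARATCACTCTTTGGCAACGACCCCTAGTCACAATAAARATAGGGGGGCAACTAAGG||||||F")

print(looks_like_sequence(obx_value(sample)))
```

A check like this only suggests that a sequence was sent under a term meant for discrete mutations; deciding which LOINC term should have been used remains a human judgment.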
A possible new LOINC request for the test LabCorp is sending in this message:
component: HIV reverse transcriptase gene sequence
property: Seq
time_aspct: Pt
system: XXX
scale_typ: Nar
method_typ: Sequencing
class: ABXBACT
exmpl_answers: Listing of any mutations found

Comments:
Component: Another choice for Component could be the existing one: "HIV reverse transcriptase gene mutations detected"
Property: Seq is probably a better choice than Prid.
Scale: Nar seems like a better choice than Nom for long sequences.
Method: Sequencing is probably an improvement over methodless.
CM: Yes to all

Your feedback and questions are requested on this.

Some reference tables

LOINCs with Property = Seq
loinc_num | component | property | time_aspct | system | scale_typ | method_typ | class | status
48021-0 | Reverse primer | Seq | Pt | Bld/Tiss | Nom | Molgen | MOLPATH.MISC | ACTIVE
48020-2 | Forward primer | Seq | Pt | Bld/Tiss | Nom | Molgen | MOLPATH.MISC | ACTIVE
74311-2 | Porcine reproductive and respiratory syndrome virus ORF5 gene sequence | Seq | Pt | Isolate | Nom | Sequencing | MICRO | ACTIVE

comments:
48021-0: Example answer: tagttagtgtgaaagaagc. Definition: Carries the literal string that represents the reverse primer for the sequencing. Use the usual abbreviation for the four nucleotides.
48020-2: Example answer: cctcattagtcgtt. Definition: Carries the literal string that represents the forward primer used to perform one portion of the sequencing. The nucleotides are specified with the standard one-letter codes.
74311-2: Sequencing of ORF5 gene in PRRSV and reporting the actual base sequence. The submitter's lab performs reverse transcriptase PCR on various specimens, including tissue, serum, oral fluid, and semen, to detect and isolate the ORF5 gene. PCR products are then used for sequencing.
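The rule running through the comments above, that only a term with Property = Seq should carry a literal sequence in OBX-5, can be written down as a small lookup. This is an illustrative sketch using a handful of the terms quoted in this document; the class name, field names and helper functions are invented for the example and do not correspond to any LOINC tooling.

```python
# Sketch only: a tiny in-memory excerpt of the LOINC terms quoted above.
from dataclasses import dataclass


@dataclass(frozen=True)
class LoincTerm:
    loinc_num: str
    component: str
    prop: str       # LOINC Property, e.g. Seq or Prid
    scale_typ: str


TERMS = [
    LoincTerm("48021-0", "Reverse primer", "Seq", "Nom"),
    LoincTerm("48020-2", "Forward primer", "Seq", "Nom"),
    LoincTerm("74311-2", "Porcine reproductive and respiratory syndrome "
              "virus ORF5 gene sequence", "Seq", "Nom"),
    LoincTerm("33630-5", "HIV protease gene mutations detected [Identifier]",
              "Prid", "Nom"),
    LoincTerm("30554-0", "HIV reverse transcriptase gene mutations detected",
              "Prid", "Nom"),
]


def carries_literal_sequence(term):
    """Only Property = Seq terms are meant to hold the sequence itself."""
    return term.prop == "Seq"


def is_mismatch(term, value_is_sequence):
    """True when the kind of value sent disagrees with the term's Property."""
    return carries_literal_sequence(term) != value_is_sequence


seq_terms = [t.loinc_num for t in TERMS if carries_literal_sequence(t)]
print(seq_terms)
```

Under this rule the two LabCorp messages would be flagged, since each sends a full sequence under a Prid term.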
LOINCs with Method = Sequencing
CM: those that are not defined as "Seq" have answers that are either HGVS or some class of organism, not the actual sequence recorded in OBX-5

loinc_num | component | property | time_aspct | system | scale_typ | method_typ | class
61101-2 | Influenza virus A neuraminidase RNA | Type | Pt | XXX | Nom | Sequencing | MICRO
67569-4 | Streptococcus pyogenes M protein (emm) gene | Type | Pt | Isolate | Nom | Sequencing | MICRO
74311-2 | Porcine reproductive and respiratory syndrome virus ORF5 gene sequence | Seq | Pt | Isolate | Nom | Sequencing | MICRO
71357-8 | PDGFRA gene exon 18 mutation analysis | Prid | Pt | Bld/Tiss | Nar | Sequencing | MOLPATH.MUT

loinc_num | component | property | time_aspct | system | scale_typ | method_typ | class
72487-2 | TF gene full mutation analysis | Prid | Pt | Bld/Tiss | Nar | Sequencing | MOLPATH.MUT
69487-7 | TNFRSF13B gene full mutation analysis | Prid | Pt | Bld/Tiss | Nar | Sequencing | MOLPATH.MUT
73735-3 | ACADVL gene full mutation analysis | Prid | Pt | Bld/Tiss | Nom | Sequencing | MOLPATH.MUT
49122-5 | Anaplasma sp identified | Prid | Pt | XXX | Nom | Sequencing | MICRO
49123-3 | Bartonella sp identified | Prid | Pt | XXX | Nom | Sequencing | MICRO
49124-1 | Coxiella burnetii identified | Prid | Pt | XXX | Nom | Sequencing | MICRO
70907-1 | Cryptococcus sp rDNA | Prid | Pt | XXX | Nom | Sequencing | MICRO
49125-8 | Ehrlichia sp identified | Prid | Pt | XXX | Nom | Sequencing | MICRO
70294-4 | Entamoeba sp DNA | Prid | Pt | XXX | Nom | Sequencing | MICRO
39025-2 | Influenza virus A hemagglutinin cDNA | Prid | Pt | XXX | Nom | Sequencing | MICRO
49126-6 | Orientia tsutsugamushi identified | Prid | Pt | XXX | Nom | Sequencing | MICRO
74310-4 | Porcine reproductive and respiratory syndrome virus strain identified | Prid | Pt | Isolate | Nom | Sequencing | MICRO
70867-7 | Rabies virus strain identified | Prid | Pt | XXX | Nom | Sequencing | MICRO
49127-4 | Rickettsia sp identified | Prid | Pt | XXX | Nom | Sequencing | MICRO
49128-2 | Rickettsia typhus group identified | Prid | Pt | XXX | Nom | Sequencing | MICRO
71760-3 | Rotavirus identified | Prid | Pt | Stool | Nom | Sequencing | MICRO
71761-1 | Rotavirus identified | Prid | Pt | Isolate | Nom | Sequencing | MICRO
74306-2 | Porcine reproductive and respiratory syndrome virus ORF5 gene sequence homology to Fostera | NFr | Pt | Isolate | Qn | Sequencing | MICRO
74308-8 | Porcine reproductive and respiratory syndrome virus ORF5 gene sequence homology to IngelvacATP | NFr | Pt | Isolate | Qn | Sequencing | MICRO
74309-6 | Porcine reproductive and respiratory syndrome virus ORF5 gene sequence homology to IngelvacMLV | NFr | Pt | Isolate | Qn | Sequencing | MICRO
74307-0 | Porcine reproductive and respiratory syndrome virus ORF5 gene sequence homology to Lelystad | NFr | Pt | Isolate | Qn | Sequencing | MICRO
72767-7 | Influenza virus A hemagglutinin segment sequence identifier | ID | Pt | Isolate | Nom | Sequencing | MICRO
72201-7 | Influenza virus A matrix protein segment sequence identifier | ID | Pt | Isolate | Nom | Sequencing | MICRO
72200-9 | Influenza virus A neuraminidase segment sequence identifier | ID | Pt | Isolate | Nom | Sequencing | MICRO

Jerry Sable, APHL 2014-0527 jsable@tsjg.com