Standardised Nomenclature and Definitions for Human Genotypic Variants
Zbigniew Rudzki
Department of Pathology, University of Melbourne

Standardised Nomenclature: Why do we need it?
• When mutations (sequence variants) were first being published, authors used a variety of nomenclatures for their particular variant
• However, it soon became obvious that different authors were describing the same variant in different ways
• This became a real problem when submitting variants to the rapidly proliferating locus (gene) specific databases in the 1980s and 1990s
• It became essential to develop a standardised, internationally recognised nomenclature that accurately and unambiguously describes the variant

Correct Gene Name
• Before you can start on the nomenclature of human gene variation, you need to ensure that you have the correct gene name
• As new genes were discovered, the authors came up with a gene name. Frequently a different group would independently find the same gene but give it a different name
• The Human Genome Organisation (HUGO) took up the task of overseeing the standardisation of gene names

Use HGNC gene nomenclature
• The HUGO Gene Nomenclature Committee (HGNC) maintains the list of approved names and symbols for human genes
• This is necessary to ensure investigators are talking about the same gene without ambiguity
• This list can be found at http://www.genenames.org
• Human gene symbols are written in capitals, e.g.
– DMD: Dystrophin, muscular dystrophy gene
– F8: Factor 8 (haemophilia gene)
– HBB: Beta globin, the gene involved in beta thalassaemia
– BRCA1: Breast cancer 1 gene
• Ideally any sequence variant description should be preceded by the approved gene symbol

History of current standardised nomenclature
• Two seminal papers on a standardised nomenclature were published in 1996.
– Beaudet et al. Hum Mut 8:197-202, 1996
– Beutler et al. Hum Mut 8:203-206, 1996
• This led to the formation of an International Nomenclature Working Group, led by Stylianos Antonarakis, which came up with a suggested nomenclature
– Antonarakis et al. Hum Mut 11:1-3, 1998
– Den Dunnen and Antonarakis Hum Mut 15:7-12, 2000
• These recommendations have been continually refined and updated over the years and are now generally accepted

Current Nomenclature situation
• Responsibility for maintaining and refining the guidelines has now been taken over by the Human Genome Variation Society
– http://www.hgvs.org/mutnomen/
• It is important to note that these guidelines are in a constant state of evolution
• Always use the latest version
• This talk has been taken directly from the recommendations on the HGVS website

Some definitions
• The word “mutation” is a very old genetic term used to describe a disease-causing variant, while “polymorphism” is used to describe a harmless change occurring in about 1% or more of the population
• However, as our knowledge of the natural variation within the genome has increased, so has the number and type of variants. Many sequence variants simply don't fit into those two categories
• Thus the guidelines suggest the use of neutral terms such as “sequence variant” or “allelic variant”

Recommendations for Sequence Variants
• Sequence variations are best described at the DNA level. Either genomic DNA or coding DNA (cDNA) can be used
• When describing any DNA sequence variant, you should always use a Reference Sequence, whether it is genomic DNA or cDNA
• www.ncbi.nlm.nih.gov/RefSeq/.
• Any description of the variant should include the RefSeq accession number
– NG_000007.3: Homo sapiens beta globin region (HBB)
– NM_000518.4: Homo sapiens haemoglobin beta (HBB) mRNA
• To avoid confusion between genomic and coding DNA numbering, the nucleotide number is preceded by “g.” when a genomic reference sequence is used, or by “c.” when a cDNA reference sequence is used

Nomenclature Recommendations
• Nucleotides
– These are designated by bases in upper case: A (adenine), C (cytosine), G (guanine) and T (thymine)
• Nucleotide numbering: Genomic sequence
– The nucleotides are numbered consecutively from nucleotide 1 of the reference sequence
– Introns are described in the annotation as occurring between certain nucleotide numbers
• Nucleotide numbering: Coding sequence
– Nucleotide 1 is the A of the ATG translation initiation codon
– There is no nucleotide 0
– The nucleotide 5’ of the ATG is -1
– The nucleotide 3’ of the translation stop codon is *1
– Beginning of the intron: the number of the last nucleotide of the preceding exon, a plus sign and the position in the intron, e.g. c.77+1G, c.77+2T
– End of the intron: the number of the first nucleotide of the following exon, a minus sign and the position upstream in the intron, e.g. c.78-1G, c.78-2A

Part of gene | Nucleotide numbering, genomic Reference Sequence | Nucleotide numbering, coding DNA Reference Sequence | Codon numbering, protein Reference Sequence
5' gene flanking region | 1 to 270 | (-300 to -31) |
exon 1, 5'UTR | 271 to 300 | -30 to -1 |
exon 1, coding region | 301 to 312 | 1 to 12 | 1 to 4
intron 1 | 313 to 412 | 12+1 ... 12+50, 13-50 ... 13-1 |
exon 2 | 413 to 488 | 13 to 88 | 5 to 29 (30)
intron 2 | 489 to 688 | 88+1 ... 88+100, 89-100 ... 89-1 |
exon 3 | 689 to 723 | 89 to 123 | 30 to 41
intron 3 | 724 to 1023 | 123+1 ... 123+150, 124-150 ... 124-1 |
exon 4 | 1024 to 1200 | 124 to 300 | 42 to 100
intron 4 | 1201 to 1600 | 300+1 ... 300+200, 301-200 ... 301-1 |
exon 5, coding region | 1601 to 1630 | 301 to 330 | 101 to 109
exon 5, 3'UTR | 1631 to 1850 | *1 to *220 |
3' gene flanking region | 1851 to 2000 | (*221 to *370) |

Remarks:
– intron 3 contains a rare alternatively spliced exon from 800 to 859 (coding DNA 123+77 to 123+136)
– the 3'UTR contains a (CA)7 stretch from nts 1700 to 1713 (coding DNA *70 to *83) and a polyA addition site at 1825 (coding DNA *195)

Recommendations for Substitutions Part 1
• Nucleotide substitutions start with the
nucleotide number followed by the change.
• Example: a G to A substitution at position 303 of the genomic RefSeq is described as
– g.303G>A
• However, the same nucleotide is also number 3 of the coding sequence. Thus it would be
– c.3G>A

Recommendations for Substitutions Part 2
• 5’ part of intron: T to C substitution in the second nucleotide of the intron positioned between coding nucleotides 88 and 89
• c.88+2T>C or g.490T>C
• 3’ part of intron: G to T substitution in the last nucleotide of the intron positioned between coding nucleotides 88 and 89
• c.89-1G>T or g.688G>T

Recommendations for Deletions
• Deletions are designated by “del” after the description of the deleted segment
– Single nucleotide deletion: deletion of nucleotide 13 of the coding sequence
• g.413del (g.413delG)
• c.13del (c.13delG)
– Several nucleotides deleted: deletion of nucleotides 92 to 94 of the coding sequence
• g.692_694del (g.692_694delGAC)
• c.92_94del (c.92_94delGAC)
• For deletions in mononucleotide or short tandem repeats, the most 3’ copy is arbitrarily assigned to have been changed

Recommendations for Insertions
• Insertions are designated by “ins” after the nucleotides flanking the insertion
– Single nucleotide insertion of a T between nucleotides 51 and 52 of the coding sequence
• g.451_452insT
• c.51_52insT
– Several nucleotides inserted, e.g. GAGA inserted between nucleotides 51 and 52 of the coding sequence
• g.451_452insGAGA
• c.51_52insGAGA

Recommendations for Duplications
• Duplications are designated by “dup” after a description of the duplicated segment
– Single nucleotide duplication: duplication of nucleotide 13 of the coding region
• g.413dup (g.413dupT)
• c.13dup (c.13dupT)
– Several nucleotide duplication: duplication of nucleotides 92 to 94 of the coding sequence
• g.692_694dup (g.692_694dupGAC)
• c.92_94dup (c.92_94dupGAC)

Recommendations for Inversions
• Inversions are designated by “inv” after the nucleotide numbers of the inverted nucleotides
– Short inversion: inversion of nucleotides 77 to 80 of the coding sequence
• g.477_480inv (g.477_480invCTAG)
• c.77_80inv (c.77_80invCTAG)

Recommendations for variable Short Sequence Repeats
• Short sequence repeats such as ATGCGATGTGTGCC
should be described as
– c.123+74TG(3_6). The c.123+74 indicates the position of the first nucleotide of the repeat, TG indicates the sequence of the repeat unit, and (3_6) indicates that the repeat is found 3 to 6 times in the population
• Another example is the variable repeat region of 5 to 9 T nucleotides in intron 9 of the CFTR gene. The cDNA reference sequence contains a stretch of 7 T’s. The recommendation is to describe individual alleles differing from this reference sequence, e.g.
– c.1210-12T[5] or c.1210-12T[9]

Help with Nomenclature
• A computer program has been developed to assist with getting the HGVS nomenclature correct
• This program is called Mutalyzer and it handles most variation types: substitution, deletion, duplication, insertion, indel, and splice-site changes following current recommendations of the Human Genome Variation Society (HGVS)
• Input is a GenBank accession number or an uploaded reference sequence file in GenBank format, an HGNC gene symbol, and the variant (single or in a batch file)
• Mutalyzer generates variant descriptions at DNA level, at the level of all annotated transcripts, and the deduced outcome at protein level

Improving Sequence Variant Descriptions in Mutation Databases and Literature Using the Mutalyzer Sequence Variation Nomenclature Checker
Martin Wildeman, Ernest van Ophuizen, Johan T. den Dunnen, and Peter E.M. Taschner
Department of Human Genetics, Center of Human and Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
Human Mutation 29(1), 6-13, 2008

Unambiguous and correct sequence variant descriptions are of utmost importance, not in the least since mistakes and uncertainties may lead to undesired errors in clinical diagnosis. We developed the Mutation Analyzer (Mutalyzer) sequence variation nomenclature checker (www.lovd.nl/mutalyzer; last accessed 13 September 2007) for automated analysis and correction of sequence variant descriptions using reference sequences from any organism. Mutalyzer handles most variation types: substitution, deletion, duplication, insertion, indel, and splice-site changes following current recommendations of the Human Genome Variation Society (HGVS). Input is a GenBank accession number or an uploaded reference sequence file in GenBank
format with user-modified annotation, an HGNC gene symbol, and the variant (single or in a batch file). Mutalyzer generates variant descriptions at DNA level, the level of all annotated transcripts and the deduced outcome at protein level. To validate Mutalyzer’s performance and to investigate the sequence variant description quality in locus-specific mutation databases (LSDBs), more than 11,000 variants in the PAH, BIC BRCA2, and HbVar databases were analyzed, showing that 87%, 25%, and 38%, respectively, were error-free and following the recommendations. Low recognition rates in BIC and HbVar (38% and 51%, respectively) were due to lack of a well-annotated genomic reference sequence (HbVar) or noncompliance to the guidelines (BRCA2). Provided with well-annotated genomic reference sequences, Mutalyzer is very effective for the curation of newly discovered sequence variation descriptions and existing LSDB data. Mutalyzer will be linked to the Leiden Open-source Variation Database (LOVD) (www.LOVD.nl; last accessed 13 September 2007) and is the first module of a sequence variant effect prediction package. Hum Mutat 29(1), 6–13, 2008
http://www.mutalyzer.nl/2.0/

Nomenclature does not imply function
• The nomenclature system is intended for the unambiguous description of how a sequence variant differs from the reference sequence
• It was never intended to ascribe the consequence of the variant
• That is a whole new analytical process

Summary
• Before you start on describing your sequence variant
– Ensure you have the correct gene name
– Ensure you use a reference sequence for that gene
• Obtain a copy of the latest version of the HGVS nomenclature and use the examples provided to describe your variant
• Complex variants are a challenge to the nomenclature system. This is where third-party software may help
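
Worked example: genomic to coding DNA numbering
The numbering rules for the coding DNA reference sequence (no nucleotide 0, negative numbers 5’ of the ATG, *-numbers 3’ of the stop codon, and +/- offsets inside introns) can be sketched in a few lines of Python. This is only an illustrative sketch, not how Mutalyzer or any HGVS tool works internally. The exon coordinates are read from the example gene table in this talk, with intron 2 taken as 489 to 688 so that it agrees with the g.688G>T example. The names EXONS, CDS_START, CDS_END and g_to_c are hypothetical and were chosen for this example.

```python
# Exons of the example gene as (genomic start, genomic end), 1-based inclusive.
# Assumption: coordinates are read from the example table in this talk.
EXONS = [(271, 312), (413, 488), (689, 723), (1024, 1200), (1601, 1850)]
CDS_START = 301   # genomic position of the A of the ATG
CDS_END = 1630    # genomic position of the last nucleotide of the coding region


def _tx_pos(g):
    """1-based position of an exonic genomic coordinate in the spliced
    transcript, or None if g does not lie in an exon."""
    offset = 0
    for start, end in EXONS:
        if start <= g <= end:
            return offset + g - start + 1
        offset += end - start + 1
    return None


TX_CDS_START = _tx_pos(CDS_START)
TX_CDS_END = _tx_pos(CDS_END)


def _label(tx):
    """Format a transcript position in c. numbering (there is no nucleotide 0)."""
    if tx < TX_CDS_START:
        return str(tx - TX_CDS_START)        # -1 is the nucleotide 5' of the ATG
    if tx > TX_CDS_END:
        return "*" + str(tx - TX_CDS_END)    # *1 is the nucleotide 3' of the stop codon
    return str(tx - TX_CDS_START + 1)


def g_to_c(g):
    """Convert a genomic position of the example gene to a c. position."""
    first_start, last_end = EXONS[0][0], EXONS[-1][1]
    if g < first_start:                      # 5' gene flanking region
        return "c." + _label(_tx_pos(first_start) - (first_start - g))
    if g > last_end:                         # 3' gene flanking region
        return "c." + _label(_tx_pos(last_end) + (g - last_end))
    tx = _tx_pos(g)
    if tx is not None:                       # exonic position
        return "c." + _label(tx)
    # Intronic position: count from the nearer exon boundary; the central
    # nucleotide of an odd-length intron is counted from the upstream exon.
    for (_, up_end), (down_start, _) in zip(EXONS, EXONS[1:]):
        if up_end < g < down_start:
            d_up, d_down = g - up_end, down_start - g
            if d_up <= d_down:
                return f"c.{_label(_tx_pos(up_end))}+{d_up}"
            return f"c.{_label(_tx_pos(down_start))}-{d_down}"


print(g_to_c(303))   # c.3
print(g_to_c(490))   # c.88+2
print(g_to_c(688))   # c.89-1
print(g_to_c(1825))  # c.*195
```

The printed positions match the substitution examples (g.303 and c.3, g.490 and c.88+2, g.688 and c.89-1) and the polyA addition site remark (g.1825 and c.*195). For real genes, the exon boundaries come from the annotation of the chosen reference sequence, and a dedicated tool such as Mutalyzer should be preferred over a hand-written conversion.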