Standardised Nomenclature and Definitions for Human Genotypic Variants
Zbigniew Rudzki

Department of Pathology
University of Melbourne
Standardised Nomenclature
Why do we need it?
• When mutations (sequence variants) were first being published, authors used a variety of nomenclatures for their particular variant
• However, it soon became obvious that different authors were describing the same variant in different ways
• This became a real problem when submitting variants to the rapidly proliferating locus (gene) specific databases in the 1980s and 1990s
• It became essential to develop a standardised, internationally recognised nomenclature that accurately and unambiguously describes the variant
Correct Gene Name
• Before you can start on the nomenclature of human gene variation, you need to ensure that you have the correct gene name
• As new genes were discovered, the authors came up with a gene name. Frequently a different group would independently find the same gene but give it a different name
• The Human Genome Organisation (HUGO) took up the task of overseeing the standardisation of gene names
Use HGNC gene nomenclature
• The HUGO Gene Nomenclature Committee (HGNC) maintains the list of approved names for all human genes
• This is necessary to ensure investigators are talking about the same gene without ambiguity
• This list can be found at http://www.genenames.org
• Human genes are designated in capitals, e.g.
– DMD – Dystrophin, muscular dystrophy gene
– F8 – Factor 8 or haemophilia gene
– HBB – Beta globin gene in beta thalassaemia
– BRCA1 – Breast cancer 1 gene
• Ideally any sequence variant should be preceded by the approved gene name
History of current standardised nomenclature
• Two seminal papers on a standardised nomenclature were published in 1996
– Beaudet et al, Hum Mut 8:197-202, 1996
– Beutler et al, Hum Mut 8:203-206, 1996
• This led to the formation of an International Nomenclature Working Group led by Stylianos Antonarakis, which came up with a suggested nomenclature
– Antonarakis et al, Hum Mut 11:1-3, 1998
– Den Dunnen and Antonarakis, Hum Mut 15:7-12, 2000
• These recommendations have been continually refined and updated over the years and are now generally accepted
Current Nomenclature situation
• The responsibility for the maintenance and refinement of the guidelines has now been taken over by the Human Genome Variation Society
– http://www.hgvs.org/mutnomen/
• It is important to note that these guidelines are in a constant state of evolution
• Always use the latest version
• This talk has been taken directly from the recommendations on the HGVS website
Some definitions
• The word mutation is a very old genetic term and is used to describe a disease-causing variant, while polymorphism is used to describe a harmless change occurring in 1% or more of the population
• However, as our knowledge of the natural variation within the genome has increased, so has the number and type of variants. Many sequence variants simply don't fit into those two categories
• Thus the guidelines suggest the use of neutral terms such as “sequence variant” or “allelic variant”
Recommendations for Sequence Variants
• Sequence variations are best described at the DNA level. Either genomic DNA or copy DNA (cDNA) can be used
• When describing any DNA sequence variant, you should always use a Reference Sequence, whether it is genomic DNA or cDNA
• www.ncbi.nlm.nih.gov/RefSeq/
• Any description of the variant should include the RefSeq accession number
– NG_000007.3: Homo sapiens beta globin region (HBB)
– NM_000518.4: Homo sapiens haemoglobin beta (HBB) mRNA
• To avoid confusion between DNA and RNA sequences, the nucleotide number is preceded by “g.” when a genomic or “c.” when a cDNA reference sequence is used
Nomenclature Recommendations
• Nucleotides
– These are designated by bases in upper case: A (adenine), C (cytosine), G (guanine) and T (thymine)
• Nucleotide numbering – Genomic sequence
– The nucleotides are numbered consecutively from nucleotide 1 of the reference sequence
– Introns are described in the annotation as occurring between certain nucleotide numbers
• Nucleotide numbering – Coding sequence
– Nucleotide 1 is the A of the ATG translation initiating codon
– There is no nucleotide 0
– The nucleotide 5’ of the ATG is -1
– The nucleotide 3’ of the translation stop codon is *1
– Beginning of the intron: the number of the last nucleotide of the preceding exon, a plus sign and the position in the intron, e.g. c.77+1G, c.77+2T
– End of the intron: the number of the first nucleotide of the following exon, a minus sign and the position upstream in the intron, e.g. c.78-1G, c.78-2A
Part of gene               Genomic numbering   Coding DNA numbering                    Codon numbering
                           (genomic RefSeq)    (coding DNA RefSeq)                     (protein RefSeq)
5' gene flanking region    1 to 270            (-300 to -31)
exon 1, 5' UTR             271 to 300          -30 to -1
exon 1, coding region      301 to 312          1 to 12                                 1 to 4
intron 1                   313 to 412          12+1 ... 12+50, 13-50 ... 13-1
exon 2                     413 to 488          13 to 88                                5 to 29 (30)
intron 2                   489 to 688          88+1 ... 88+100, 89-100 ... 89-1
exon 3                     689 to 723          89 to 123                               30 to 41
intron 3 (a)               724 to 1023         123+1 ... 123+150, 124-150 ... 124-1
exon 4                     1024 to 1200        124 to 300                              42 to 100
intron 4                   1201 to 1600        300+1 ... 300+200, 301-200 ... 301-1
exon 5, coding region      1601 to 1630        301 to 330                              101 to 109
exon 5, 3' UTR (b)         1631 to 1850        *1 to *220
3' gene flanking region    1851 to 2000        (*221 to *370)

(a) Intron 3 contains a rare alternatively spliced exon from 800 to 859 (coding DNA 123+77 to 123+136)
(b) The 3' UTR contains a (CA)7 stretch from nts 1700 to 1713 (coding DNA *70 to *83) and a polyA addition site at 1825 (coding DNA *195)
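The numbering rules and the example table can be tied together with a short sketch that converts a genomic position into coding DNA numbering. The exon coordinates come from the table, but the function names (`transcript_pos`, `exonic_label`, `genomic_to_coding`) are hypothetical and are not part of any HGVS tool. The sketch assumes a single transcript on the forward strand, and it places the central nucleotide of an odd-length intron on the "+" side.

```python
# Exon coordinates (genomic, 1-based, inclusive) taken from the example table.
EXONS = [(271, 312), (413, 488), (689, 723), (1024, 1200), (1601, 1850)]
ATG_GENOMIC = 301    # genomic position of the A of the ATG
STOP_GENOMIC = 1630  # genomic position of the last nucleotide of the stop codon


def transcript_pos(g: int) -> int:
    """Position of an exonic or flanking genomic nucleotide in the spliced transcript."""
    if g < EXONS[0][0]:
        # 5' flank: extend the transcript numbering to the left (values <= 0).
        return g - EXONS[0][0] + 1
    offset = 0
    for start, end in EXONS:
        if start <= g <= end:
            return offset + g - start + 1
        offset += end - start + 1
    if g > EXONS[-1][1]:
        # 3' flank: extend the transcript numbering to the right.
        return offset + g - EXONS[-1][1]
    raise ValueError(f"g.{g} is intronic")


T_ATG = transcript_pos(ATG_GENOMIC)
T_STOP = transcript_pos(STOP_GENOMIC)


def exonic_label(g: int) -> str:
    """Coding DNA label of a non-intronic nucleotide."""
    t = transcript_pos(g)
    if t < T_ATG:
        return str(t - T_ATG)      # 5' of the ATG: -1, -2, ... (there is no 0)
    if t > T_STOP:
        return f"*{t - T_STOP}"    # 3' of the stop codon: *1, *2, ...
    return str(t - T_ATG + 1)      # coding region: 1, 2, ...


def genomic_to_coding(g: int) -> str:
    """Coding DNA label of any genomic position, including intronic ones."""
    for (_, donor_end), (acceptor_start, _) in zip(EXONS, EXONS[1:]):
        if donor_end < g < acceptor_start:
            from_donor = g - donor_end
            to_acceptor = acceptor_start - g
            if from_donor <= to_acceptor:
                return f"{exonic_label(donor_end)}+{from_donor}"
            return f"{exonic_label(acceptor_start)}-{to_acceptor}"
    return exonic_label(g)


for g in (303, 490, 688, 1700):
    print(f"g.{g} -> c.{genomic_to_coding(g)}")
```

Running the loop reproduces positions from the table: g.303 is c.3, g.490 is c.88+2, g.688 is c.89-1 and g.1700 is c.*70.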
Recommendations for Substitutions
Part 1
• Nucleotide substitutions start with the nucleotide number followed by the change
• Example: a G to A substitution at position 303 of the genomic RefSeq is described as
– g.303G>A
• However, the same nucleotide is also number 3 of the coding sequence. Thus it would be
– c.3G>A
Recommendations for Substitutions
Part 2
• 5’ part of intron: T to C substitution in the second nucleotide of the intron positioned between coding nucleotides 88 and 89
– c.88+2T>C or g.490T>C
• 3’ part of intron: G to T substitution in the last nucleotide of the intron positioned between coding nucleotides 88 and 89
– c.89-1G>T or g.688G>T
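As a minimal sketch of the substitution format (prefix, position, reference base, ">", new base), the helper below assembles a description string and rejects obviously malformed input. The name `describe_substitution` is hypothetical. The position is passed as text so that intronic positions such as 88+2 can be used unchanged. No check against a real reference sequence is attempted.

```python
VALID_BASES = set("ACGT")


def describe_substitution(prefix: str, position: str, ref: str, alt: str) -> str:
    """Build a substitution description such as g.303G>A or c.88+2T>C."""
    if prefix not in ("g", "c"):
        raise ValueError("prefix must be 'g' (genomic) or 'c' (coding DNA)")
    if ref not in VALID_BASES or alt not in VALID_BASES:
        raise ValueError("bases must be one of A, C, G, T in upper case")
    if ref == alt:
        raise ValueError("a substitution must change the nucleotide")
    return f"{prefix}.{position}{ref}>{alt}"


print(describe_substitution("g", "303", "G", "A"))
print(describe_substitution("c", "88+2", "T", "C"))
```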
Recommendations for Deletions
• Deletions are designated by “del” after the description of the deleted segment
– Single nucleotide deletion – deletion of nucleotide 13 of the coding sequence
• g.413del (g.413delG)
• c.13del (c.13delG)
– Several nucleotides deleted – deletion of nucleotides 92 to 94 of the coding sequence
• g.692_694del (g.692_694delGAC)
• c.92_94del (c.92_94delGAC)
• For deletions in mononucleotide or short tandem repeats, the most 3’ copy is arbitrarily assigned to have been changed
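The 3’ rule for repeats can be sketched as follows. A deletion may be moved one position towards the 3’ end whenever the nucleotide just after it equals its first nucleotide, because the resulting sequence is identical. The function names and the toy sequences are assumptions made for illustration. The sketch uses plain genomic numbering only and ignores exon boundaries.

```python
def shift_deletion_3prime(seq: str, start: int, end: int) -> tuple[int, int]:
    """Slide a deletion (1-based, inclusive) to its most 3' equivalent position."""
    # Moving the window by one leaves the result unchanged when the base
    # entering on the 3' side equals the base leaving on the 5' side.
    while end < len(seq) and seq[start - 1] == seq[end]:
        start += 1
        end += 1
    return start, end


def describe_deletion(prefix: str, start: int, end: int) -> str:
    """Build a deletion description such as g.413del or g.692_694del."""
    span = str(start) if start == end else f"{start}_{end}"
    return f"{prefix}.{span}del"


# Toy sequence with a run of four T nucleotides at positions 4 to 7.
seq = "ACGTTTTGA"
start, end = shift_deletion_3prime(seq, 4, 4)
print(describe_deletion("g", start, end))
```

Deleting any one T from the run gives the same sequence, so the description always points at the last T of the run (position 7 in the toy sequence).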
Recommendations for Insertions
• Insertions are designated by “ins” after the nucleotides flanking the insertion
– Single nucleotide insertion of a T between nucleotides 51 and 52 of the coding sequence
• g.451_452insT
• c.51_52insT
– Several nucleotides inserted, e.g. GAGA inserted between nucleotides 51 and 52 of the coding sequence
• g.451_452insGAGA
• c.51_52insGAGA
Recommendations for Duplications
• Duplications are designated by “dup” after a description of the duplicated segment
– Single nucleotide duplication – duplication of nucleotide 13 of the coding region
• g.413dup (g.413dupG)
• c.13dup (c.13dupG)
– Several nucleotide duplication – duplication of nucleotides 92 to 94 of the coding sequence
• g.692_694dup (g.692_694dupGAC)
• c.92_94dup (c.92_94dupGAC)
Recommendations for Inversions
• Inversions are designated by “inv” after the nucleotide numbers of the inverted nucleotides
– Short inversion – inversion of nucleotides 77 to 80 of the coding sequence
• g.477_480inv (g.477_480invCTAG)
• c.77_80inv (c.77_80invCTAG)
Recommendations for variable Short Sequence Repeats
• Short sequence repeats such as ATGCGATGTGTGCC should be described as
– c.123+74TG(3_6). The c.123+74 indicates the position of the first nucleotide of the repeat, TG indicates the sequence of the repeat unit, and (3_6) indicates that the repeat is found 3 to 6 times in the population
• Another example is the variable repeat region of 5 to 9 T nucleotides in intron 9 of the CFTR gene. The cDNA reference sequence contains a stretch of 7 T’s. The recommendation is to describe individual alleles differing from this reference sequence, e.g.
– c.1210-12T[5] or c.1210-12T[9]
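The two repeat notations above can be illustrated with a short sketch: one helper counts the repeat units in a sequence, and two others format the population range and an individual allele. All function names are hypothetical, and the position labels are passed in as text without being checked against a reference sequence.

```python
def count_repeat_units(seq: str, start: int, unit: str) -> int:
    """Count consecutive copies of `unit` beginning at 1-based position `start`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    index = start - 1
    copies = 0
    while seq.startswith(unit, index):
        copies += 1
        index += len(unit)
    return copies


def describe_repeat_range(position: str, unit: str, low: int, high: int) -> str:
    """Population-level range, e.g. c.123+74TG(3_6)."""
    return f"{position}{unit}({low}_{high})"


def describe_repeat_allele(position: str, unit: str, copies: int) -> str:
    """Individual allele, e.g. c.1210-12T[5]."""
    return f"{position}{unit}[{copies}]"


# The TG repeat in the slide's example sequence starts at nucleotide 7.
print(count_repeat_units("ATGCGATGTGTGCC", 7, "TG"))
print(describe_repeat_range("c.123+74", "TG", 3, 6))
print(describe_repeat_allele("c.1210-12", "T", 5))
```

In the example sequence ATGCGATGTGTGCC the TG unit occurs three times in tandem from nucleotide 7, which falls within the 3 to 6 range given in the description.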
Help with Nomenclature
• A computer program has been developed to assist with getting the HGVS nomenclature correct
• This program is called Mutalyzer and it handles most variation types: substitution, deletion, duplication, insertion, indel, and splice-site changes following current recommendations of the Human Genome Variation Society (HGVS)
• Input is a GenBank accession number or an uploaded reference sequence file in GenBank format, an HGNC gene symbol, and the variant (single or in a batch file)
• Mutalyzer generates variant descriptions at DNA level, the level of all annotated transcripts and the deduced outcome at protein level
Improving Sequence Variant Descriptions in Mutation Databases and Literature Using the Mutalyzer Sequence Variation Nomenclature Checker
Martin Wildeman, Ernest van Ophuizen, Johan T. den Dunnen, and Peter E.M. Taschner
Department of Human Genetics, Center of Human and Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands
Human Mutation 29(1), 6-13, 2008
Unambiguous and correct sequence variant descriptions are of utmost importance, not in the least since mistakes and uncertainties may lead to undesired errors in clinical diagnosis. We developed the Mutation Analyzer (Mutalyzer) sequence variation nomenclature checker (www.lovd.nl/mutalyzer; last accessed 13 September 2007) for automated analysis and correction of sequence variant descriptions using reference sequences from any organism. Mutalyzer handles most variation types: substitution, deletion, duplication, insertion, indel, and splice-site changes following current recommendations of the Human Genome Variation Society (HGVS). Input is a GenBank accession number or an uploaded reference sequence file in GenBank format with user-modified annotation, an HGNC gene symbol, and the variant (single or in a batch file). Mutalyzer generates variant descriptions at DNA level, the level of all annotated transcripts and the deduced outcome at protein level. To validate Mutalyzer’s performance and to investigate the sequence variant description quality in locus-specific mutation databases (LSDBs), more than 11,000 variants in the PAH, BIC BRCA2, and HbVar databases were analyzed, showing that 87%, 25%, and 38%, respectively, were error-free and following the recommendations. Low recognition rates in BIC and HbVar (38% and 51%, respectively) were due to lack of a well-annotated genomic reference sequence (HbVar) or noncompliance to the guidelines (BRCA2). Provided with well-annotated genomic reference sequences, Mutalyzer is very effective for the curation of newly discovered sequence variation descriptions and existing LSDB data. Mutalyzer will be linked to the Leiden Open-source Variation Database (LOVD) (www.LOVD.nl; last accessed 13 September 2007) and is the first module of a sequence variant effect prediction package. Hum Mutat 29(1), 6-13, 2008
http://www.mutalyzer.nl/2.0/
Nomenclature does not imply function
• The nomenclature system is intended for the unambiguous description of a sequence variant from the reference sequence
• It was never intended to ascribe the consequence of the variant
• That is a whole new analytical process
Summary
• Before you start describing your sequence variant
– Ensure you have the correct gene name
– Ensure you use a reference sequence for that gene
• Obtain a copy of the latest version of the HGVS nomenclature and use the examples provided to describe your variant
• Complex variants are a challenge to the nomenclature system. This is where third-party software may help