1st July 2008

HGVbaseG2P Genotype and Allele representations
(where n represents the flanking sequence)

General rules for Alleles

1) Alleles are always presented in the context of 10 bases upstream and 10 bases downstream, so that the DNA strand is clear
2) Alleles MUST ALWAYS be delimited with (), e.g., nnnnnnnnnn(T)nnnnnnnnnn for the ‘T’ allele of a SNP
3) Allele sequences can be compressed with the appropriate nomenclature, e.g., ATTTTTTAAAAAAAAA becomes (A(T)6(A)9)
4) Text-based rather than sequence-based alleles must exactly match the representation given in dbSNP (e.g., LARGEDELETION), but with the text string additionally enclosed in double quotes “” and (), e.g., nnnnnnnnnn(“LARGEDELETION”)nnnnnnnnnn

General rules for Genotypes

1) Genotypes are always presented in the context of 10 bases upstream and 10 bases downstream, so that the DNA strand is clear
2) Genotypes MUST ALWAYS be delimited with [], e.g., nnnnnnnnnn[(T)]nnnnnnnnnn for a ‘T’ allele homozygote
3) For genotypes of type ‘QUALITATIVE’ (i.e., where the assay technology used only allows alleles to be seen or not seen), the detected alleles are listed with a ‘+’ separator, e.g., nnnnnnnnnn[(G)+(T)]nnnnnnnnnn for a ‘T’ & ‘G’ allele heterozygote
   Note: virtually all genotype data today are of type ‘QUALITATIVE’
4) For genotypes of type ‘RATIO’ (i.e., where the assay technology used allows alleles to be quantified with respect to each other when more than one allele is detected), the detected alleles are listed with a ‘+’ separator and a ratio number in front of the opening curved bracket for each allele, e.g., nnnnnnnnnn[2(P)+1(Q)]nnnnnnnnnn for a genotype where alleles P and Q are measured to be in the ratio of 2:1
5) For genotypes of type ‘QUANTITATIVE’ (i.e., where the assay technology used allows alleles to be quantified in absolute terms per cell), the detected alleles are listed with a ‘+’ separator and a quantification number plus ‘x’ immediately after the opening curved bracket for each
allele, e.g., nnnnnnnnnn[(1xP)+(0.5xQ)]nnnnnnnnnn for a genotype where allele P is single copy per cell and allele Q is deleted from 50% of cells

See the Nomenclature tables at the end of this document for further examples of allele and genotype conventions

Single Nucleotide Polymorphism (SNP)

Definition: single base substitutions involving A, T, C, or G

Example allele representations:
  dbSNP:      G/T
  HGVbaseG2P: nnnnnnnnnn(G)nnnnnnnnnn
              nnnnnnnnnn(T)nnnnnnnnnn

Example qualitative genotype representations:
  homozygote:   nnnnnnnnnn[(G)]nnnnnnnnnn
  homozygote:   nnnnnnnnnn[(T)]nnnnnnnnnn
  heterozygote: nnnnnnnnnn[(G)+(T)]nnnnnnnnnn

Multi-Nucleotide Polymorphism (MNP)

Definition: multi-base variations in which all alleles are the same length

Example allele representations:
  dbSNP:      ACG/TTC
  HGVbaseG2P: nnnnnnnnnn(ACG)nnnnnnnnnn
              nnnnnnnnnn(TTC)nnnnnnnnnn

Example qualitative genotype representations:
  homozygote:   nnnnnnnnnn[(ACG)]nnnnnnnnnn
  homozygote:   nnnnnnnnnn[(TTC)]nnnnnnnnnn
  heterozygote: nnnnnnnnnn[(ACG)+(TTC)]nnnnnnnnnn

No-variation (none)

Definition: Segments of sequence that are assayed and determined to be invariant in a set of samples

  dbSNP:      NOVARIATION
  HGVbaseG2P: nnnnnnnnnn(“NOVARIATION”)nnnnnnnnnn

Insertion/Deletion Polymorphism (in-del)

Definition: An insertion of one or more nucleotides in one version of a sequence relative to another. Since the molecular event that gave rise to this observation cannot be determined from the alleles alone (i.e., was it an insertion or a deletion), both events are incorporated into the name of this polymorphism type. In dbSNP, indels are designated using the full sequence of the insertion as one allele, and a "-" character to specify the deleted allele.

Note: In HGVbaseG2P the deleted allele is specified as “_” so that it cannot be misread as the numeric operator “-”. Also, “_” more appropriately reflects the alphabetical nature of the sequences.
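As a rough illustration of the allele rules described so far (delimiting with (), replacing the dbSNP “-” with “_”, and quoting text-based alleles), a minimal Python sketch might look like the following. The function names, the plain ACGT test used to recognise sequence alleles, and the use of straight double quotes are assumptions made for illustration only and are not part of HGVbaseG2P; compressed repeat forms such as (A)10 are not handled.

```python
import re

FLANK = "n" * 10  # placeholder for the 10-base flanking sequence


def to_g2p_allele(dbsnp_allele, upstream=FLANK, downstream=FLANK):
    """Return one allele in the flank(allele)flank form (hypothetical helper)."""
    allele = dbsnp_allele.strip()
    if allele == "-":
        # Deleted allele: dbSNP "-" is written as "_"
        core = "_"
    elif re.fullmatch(r"[ACGT]+", allele):
        # Plain sequence allele: used as given
        core = allele
    else:
        # Assumed to be a text-based allele: drop dbSNP's outer
        # parentheses, if present, and enclose the text in double quotes
        text = allele
        if text.startswith("(") and text.endswith(")"):
            text = text[1:-1]
        core = '"' + text + '"'
    return f"{upstream}({core}){downstream}"


def convert_dbsnp(alleles):
    """Split a slash-separated dbSNP allele string and convert each allele."""
    return [to_g2p_allele(a) for a in alleles.split("/")]


for line in convert_dbsnp("-/T"):
    print(line)
```

Calling convert_dbsnp("(LARGEDELETION)/-/G/T") would, under the same assumptions, give one quoted text-based allele followed by the “_”, G, and T alleles.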
Example allele representations:
  dbSNP:      -/T
  HGVbaseG2P: nnnnnnnnnn(_)nnnnnnnnnn
              nnnnnnnnnn(T)nnnnnnnnnn

Example qualitative genotype representations:
  homozygote:   nnnnnnnnnn[(_)]nnnnnnnnnn
  homozygote:   nnnnnnnnnn[(T)]nnnnnnnnnn
  heterozygote: nnnnnnnnnn[(_)+(T)]nnnnnnnnnn

Microsatellite or short tandem repeat (STR)

Definition: Alleles consist of a repeated sequence motif and the number of tandem copies of this motif. Expansion of the motif into full-length sequence will be only an approximation of the true genomic sequence because microsatellite markers are typically not fully sequenced and are resolved as size variants only.

Example_1 allele representations:
  dbSNP:      (TAGATCATGCTGGAGCTTCTGGTGGG)28/41/49
  HGVbaseG2P: nnnnnnnnnn((TAGATCATGCTGGAGCTTCTGGTGGG)28)nnnnnnnnnn
              nnnnnnnnnn((TAGATCATGCTGGAGCTTCTGGTGGG)41)nnnnnnnnnn
              nnnnnnnnnn((TAGATCATGCTGGAGCTTCTGGTGGG)49)nnnnnnnnnn
  ...note: the () enclosing brackets around each allele

Example_2 allele representations:
  dbSNP:      (A)1/10/11/T
  HGVbaseG2P: nnnnnnnnnn(A)nnnnnnnnnn
              nnnnnnnnnn((A)10)nnnnnnnnnn
              nnnnnnnnnn((A)11)nnnnnnnnnn
              nnnnnnnnnn(T)nnnnnnnnnn
  ...note: the () enclosing brackets around each allele

Example_3 allele representations:
  dbSNP:      (CA)11CGCACA(CG)6(CA)8/(CA)13CGCACA(CG)6(CA)8/
              (CA)14CGCACA(CG)6(CA)8/(CA)14CGCACA(CG)7(CA)8/
              (CA)15CGCACA(CG)6(CA)8/(CA)15CGCACA(CG)7(CA)8/
              (CA)17CGCA/(CA)18CGCACA(CG)7(CA)8/
              (CA)20CGCACA(CG)7(CA)8/(CA)20CGCACA(CG)7(CA)9/
              (CA)21CGCACA(CG)7(CA)8
  HGVbaseG2P: nnnnnnnnnn((CA)11CGCACA(CG)6(CA)8)nnnnnnnnnn
              nnnnnnnnnn((CA)13CGCACA(CG)6(CA)8)nnnnnnnnnn
              nnnnnnnnnn((CA)14CGCACA(CG)6(CA)8)nnnnnnnnnn
              nnnnnnnnnn((CA)14CGCACA(CG)7(CA)8)nnnnnnnnnn
              nnnnnnnnnn((CA)15CGCACA(CG)6(CA)8)nnnnnnnnnn
              nnnnnnnnnn((CA)15CGCACA(CG)7(CA)8)nnnnnnnnnn
              nnnnnnnnnn((CA)17CGCA)nnnnnnnnnn
              nnnnnnnnnn((CA)18CGCACA(CG)7(CA)8)nnnnnnnnnn
              nnnnnnnnnn((CA)20CGCACA(CG)7(CA)8)nnnnnnnnnn
              nnnnnnnnnn((CA)20CGCACA(CG)7(CA)9)nnnnnnnnnn
              nnnnnnnnnn((CA)21CGCACA(CG)7(CA)8)nnnnnnnnnn
  ...note: the () enclosing brackets around each allele

Mixed variants

Definition: Markers that are
composed of alleles of different variation classes

Example_1 allele representations:
  dbSNP:      -/A/ATTTA/T
  HGVbaseG2P: nnnnnnnnnn(_)nnnnnnnnnn
              nnnnnnnnnn(A)nnnnnnnnnn
              nnnnnnnnnn(ATTTA)nnnnnnnnnn
              nnnnnnnnnn(T)nnnnnnnnnn
  ...note: the ‘-’ allele is replaced by ‘_’

Example_2 allele representations:
  dbSNP:      -/G/T/TG/TTTTTTTG/TTTTTTTTTTTTTTG
  HGVbaseG2P: nnnnnnnnnn(_)nnnnnnnnnn
              nnnnnnnnnn(G)nnnnnnnnnn
              nnnnnnnnnn(T)nnnnnnnnnn
              nnnnnnnnnn(TG)nnnnnnnnnn
              nnnnnnnnnn((T)7G)nnnnnnnnnn
              nnnnnnnnnn((T)14G)nnnnnnnnnn
  ...note: the ‘-’ allele is replaced by ‘_’

Named variants (Named)

Definition: insertion/deletion polymorphisms of longer sequence features, such as retroposons (presence or absence), Alus or LINEs. These variations frequently include a deletion "-" indicator for the absent allele.

Example_1 allele representations:
  dbSNP:      (LARGEDELETION)/-/G/T
  HGVbaseG2P: nnnnnnnnnn(“LARGEDELETION”)nnnnnnnnnn
              nnnnnnnnnn(_)nnnnnnnnnn
              nnnnnnnnnn(G)nnnnnnnnnn
              nnnnnnnnnn(T)nnnnnnnnnn
  ...note: quotation marks added to text-based allele
  ...note: the ‘-’ allele is replaced by ‘_’

Example_2 allele representations:
  dbSNP:      ([CT]+[CA]+[CT]STR, LENGTH 204)/([CT]+[CA]+[CT]STR, LENGTH194)/([CT]+[CA]+[CT]STR, LENGTHS 190-208)
  HGVbaseG2P: nnnnnnnnnn(“[CT]+[CA]+[CT]STR, LENGTH 204”)nnnnnnnnnn
              nnnnnnnnnn(“[CT]+[CA]+[CT]STR, LENGTH194”)nnnnnnnnnn
              nnnnnnnnnn(“[CT]+[CA]+[CT]STR, LENGTHS 190-208”)nnnnnnnnnn
  ...note: quotation marks added to text-based alleles

HGVbaseG2P System For Genotype And Allele Nomenclature: Usage Examples

GENOTYPE NOMENCLATURE

GenotypeDef (given 3 known alleles): P Obs, P Ratio, P Count, Q Obs, Q Ratio, Q Count, R Obs, R Ratio, R Count

GenotypeLabel | Implies that a genotyping experiment detected... | P Obs | Q Obs | R Obs
nnnnn[(P)]nnnnn | allele P only | 1 | 0 | 0
nnnnn[(P)+(Q)]nnnnn | alleles P and Q | 1 | 1 | 0
nnnnn[(P)+(Q)+(R)]nnnnn | alleles P, Q, and R | 1 | 1 | 1
nnnnn[(P)+(?)]nnnnn | allele P, plus an indeterminate allele | 1 | ? | ?
GenotypeLabel | Implies that a genotyping experiment detected... | P Ratio | Q Ratio | R Ratio
nnnnn[1(P)+1(Q)]nnnnn | alleles P and Q; with signals in the ratio 1:1 | 1 | 1 | 0
nnnnn[2(P)+1(Q)]nnnnn | alleles P and Q; with signals in the ratio 2:1 | 2 | 1 | 0
nnnnn[5(P)+4(Q)]nnnnn | alleles P and Q; with signals in the ratio 5:4 | 5 | 4 | 0
nnnnn[1(P)+2(Q)+1(R)]nnnnn | alleles P, Q, and R; with signals in the ratio 1:2:1 | 1 | 2 | 1
nnnnn[1(P)+1(?)]nnnnn | allele P, plus an indeterminate allele; with signals in the ratio 1:1 | 1 | 1? | 1?
[note: only integers permitted in front of curved brackets, and all alleles must have a numeric value]

GenotypeLabel | Implies that a genotyping experiment detected... | P Count | Q Count | R Count
nnnnn[(1xP)]nnnnn | allele P only; quantified at 1 copy of P (e.g., haploid or a deletion mosaic for a unique-sequence marker) | 1 | 0 | 0
nnnnn[(2xP)]nnnnn | allele P only; quantified at 2 copies of P (e.g., diploid for a unique-sequence marker) | 2 | 0 | 0
nnnnn[(0.5xP)]nnnnn | allele P only; quantified at 0.5 copies of P (e.g., a deletion mosaic for a marker) | 0.5 | 0 | 0
nnnnn[(1-2xP)]nnnnn | allele P only; quantified at 1-2 copies of P (e.g., as determined by a semi-quantitative method) | 1-2 | 0 | 0
nnnnn[(>6xP)]nnnnn | allele P only; quantified at >6 copies of P (e.g., non-unique marker at >6 copies per cell) | >6 | 0 | 0
nnnnn[(1xP)+(1xQ)]nnnnn | alleles P and Q; quantified at 1 copy of P and 1 copy of Q (i.e., unique-sequence marker at 2 copies per cell) | 1 | 1 | 0
nnnnn[(2xP)+(1xQ)+(1xR)]nnnnn | alleles P, Q, and R; quantified at 2 copies of P, 1 copy of Q, and 1 copy of R (e.g., non-unique marker at 4 copies per cell) | 2 | 1 | 1
nnnnn[(1xP)+(1x?)]nnnnn | allele P, plus an indeterminate allele; quantified at 1 copy of P and 1 copy of the indeterminate allele | 1 | 1? | 1?
[note: non-integer values and range values permitted between curved brackets, and all alleles must have a numeric value]

ALLELE NOMENCLATURE

nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn(T)nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn | T
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn(TTT)nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn | TTT
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn((T)3)nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn | TTT
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn(ATTT)nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn | ATTT
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn(A(T)3)nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn | ATTT
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn(ATTTAA)nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn | ATTTAA
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn(A(T)3(A)2)nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn | ATTTAA
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn(?)nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn | One of the known alleles for this marker
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn(X)nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn | Named allele that is an unknown allele
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn(TTTorT)nnnnnnnnnnnnnnnnnnnnnnnnnnnnnn | TTT or T
[note: permitted allele characters are all IUPAC codes, plus ?, plus X]

USING COMPLEX ALLELES IN GENOTYPES

nnnnn[1(A(T)3(A)2)+1(C)]nnnnn | alleles ATTTAA and C; with signals in the ratio 1:1
nnnnn[(1xA(T)3(A)2)+(1xC)]nnnnn | alleles ATTTAA and C; quantified at 1 copy of ATTTAA and 1 copy of C
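
The compressed allele notation and the three genotype forms tabulated above can be checked mechanically. The following Python sketch expands a compressed allele such as A(T)3(A)2 into its full sequence and assembles QUALITATIVE, RATIO, and QUANTITATIVE genotype strings. All function names and the 5-base placeholder flank are hypothetical choices made for illustration, not part of HGVbaseG2P, and the sketch does not validate alleles against the permitted character set.

```python
import re

# A repeat unit in compressed notation: (MOTIF)COUNT, e.g. (T)3 or (CA)11
_REPEAT = re.compile(r"\(([A-Za-z]+)\)(\d+)")


def expand_allele(compressed):
    """Expand compressed repeat units, e.g. 'A(T)3(A)2' -> 'ATTTAA'."""
    return _REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), compressed)


def qualitative(alleles, flank="nnnnn"):
    """Detected alleles joined by '+', each in (), the whole in []."""
    body = "+".join(f"({a})" for a in alleles)
    return f"{flank}[{body}]{flank}"


def ratio(pairs, flank="nnnnn"):
    """pairs: (integer ratio, allele); the ratio precedes the opening bracket."""
    for value, _ in pairs:
        if not isinstance(value, int):
            raise ValueError("only integers are permitted in front of the brackets")
    body = "+".join(f"{value}({allele})" for value, allele in pairs)
    return f"{flank}[{body}]{flank}"


def quantitative(pairs, flank="nnnnn"):
    """pairs: (copy number, allele); copy number may be text such as '0.5', '1-2', '>6'."""
    body = "+".join(f"({value}x{allele})" for value, allele in pairs)
    return f"{flank}[{body}]{flank}"


print(expand_allele("A(T)3(A)2"))
print(qualitative(["G", "T"]))
print(ratio([(1, "A(T)3(A)2"), (1, "C")]))
print(quantitative([("1", "A(T)3(A)2"), ("1", "C")]))
```

Copy numbers are passed as text in the quantitative case because the notation permits ranges and inequalities (1-2, >6) that do not map onto a single numeric type.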