
SUPPLEMENTARY DATA
Four different categories of GGT-related sequences
The loci identified as having homology to human GGT1 can be subdivided into four
categories (see Supplementary Table 1). Category 1 contains four members that have
substantial nucleic acid identity over the full length of the test GGT1 cDNA: GGT1,
GGT2, GGT3P and GGT4P. The status of the GGT2 gene as an actively transcribed locus is
currently not clear. Pawlak et al. (1998) cloned three cDNAs from a human kidney cDNA library and
reported the sequence of a 0.8 kb clone, which was designated type II. We searched the human
genome (build 36.1) and EST databases with this 800 bp sequence and found that it matches
GGT2 and GGTLC3 most closely, but has 100% identity with neither sequence. In addition,
there are no ESTs with 100% identity to this type II RNA. No other authentic
mRNAs are currently listed in the databases for either GGT2 or GGTLC3. Both GGT2 (not shown)
and GGTLC3 (Suppl. Fig. 1) have mutations in residues S451 (S→L) and D423 (D→T), which are
proposed to interact with glutathione in GGT1 (Han et al., 2007; Okada et al., 2006). For these
reasons, the status of GGT2 and GGTLC3 as active loci is currently unclear. GGT3P and GGT4P are
described in more detail below.
Category 2 includes the light chain genes, of which only two (GGTLC1 and GGTLC2) are associated
with mRNAs. They are also represented on microarrays (see Table II, GEO Profile records). Category
3 includes one gene that contains only sequences homologous to GGT1 coding exons 1, 4, and 5.
However, the reading frame of that gene encounters a stop codon at amino acid residue 25.
Although an mRNA (NR_003503) was reported, it does not appear to be able to encode a protein with
any GGT function, and this gene is therefore considered to be a pseudogene (GGT8P).
Category 4 contains genes whose deduced amino acid sequences show varying degrees of
similarity to GGT1, namely GGT5 (formerly GGTLA1/GGT-rel, GGL), GGT6 and GGT7
(formerly GGTL3 or GGTL5). We also performed database searches with the GGT5 cDNA sequence.
However, apart from the locus that encodes it on chr 22: 22.9 Mb, the human genome does not
contain sequences with substantial nucleotide identity to this gene or parts of it. Puente et al.
(2005) reported that chimpanzee does not contain a GGT5 ortholog, but a database search with GGTLA1
against the Pan troglodytes genome (build 2.1) now shows the presence of such a gene on
chromosome 22. A database search with the GGT7 cDNA sequence showed that it also
represents a single gene, with no other sequences of significant identity in the human genome.
Finally, GGT6 is also a single-copy gene, located on chromosome 17p13.2.
Examination of genes with a frame shift in coding exon 9: GGT3P and GGT4P
GGT3P and GGT4P have very substantial nucleotide sequence homology over their entire length
to the bona fide GGT1 gene. However, although the portions encoding the heavy chain have an
apparently open reading frame consistent with GGT1, the light chains have a frame shift that would
render the amino acid sequence quite different. As shown in Supplementary Fig. 2, the GGT3P and
GGT4P genes both lack one nucleotide in exon 9, which causes a frame shift. The shifted reading
frame nevertheless remains open and could encode a substantial number of additional amino acid
residues.
Frame shifts are also present in GGTLC4P and GGTLC5P. GGTLC4P has one nucleotide missing
in coding exon 9 after the -FGSKVRSPVSGILFNDEMDDFSSPNITNEFGVPP- string, causing a
frame shift.
GGTLC5P lacks one nucleotide after the sequence FGSKVCSPVSGILFNNEWTTSALPA-, leading
to a terminal amino acid sequence similar to that of GGT3P and GGT4P.
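The effect of a single missing nucleotide on the reading frame can be illustrated with a minimal Python sketch. The DNA sequence below is a hypothetical toy example chosen for illustration; it is not the GGT3P, GGT4P or GGTLC sequence, and the helper names (`translate`, `delete_nt`) are likewise assumptions. Translation uses the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1, stopping at the first stop codon.

    Trailing nucleotides that do not fill a codon are ignored.
    """
    protein = []
    for start in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[start:start + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def delete_nt(dna, index):
    """Return the sequence with the nucleotide at a 0-based index removed."""
    return dna[:index] + dna[index + 1:]


# Hypothetical toy coding sequence (NOT taken from any GGT locus).
toy = "ATGGCTTTCACCAATGAATTTGGT"
shifted = delete_nt(toy, 6)  # remove one nucleotide from the third codon

print(translate(toy))      # intact reading frame
print(translate(shifted))  # every residue after the deletion changes
```

In this toy case the residues upstream of the deletion are unchanged, while all downstream residues differ, which is the pattern described above for the light chain portion of GGT3P and GGT4P.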
Supplementary Table 1. Categories of GGT1-related sequences in the human genome

Category                                   Gene       Chrom. location (chr: Mb)
1. substantial nucleic acid identity to    GGT1       22: 23.3
   GGT1 over the entire length             GGT2       22: 19.89
                                           GGT3P      22: 17.15
                                           GGT4P      13
2. substantial nucleic acid identity to    GGTLC1     20: 23.92
   the light chain encoding part of        GGTLC2     22: 21.31
   GGT1                                    GGTLC3     22: 18.75
                                           GGTLC4P    22: 22.97
                                           GGTLC5P    22: 18.95
3. substantial nucleic acid identity to    GGT8P      2: 91.3
   some GGT1 exons
4. similarity in deduced amino acid        GGT5       22: 22.95
   sequence to GGT1                        GGT6       17: 4.4
                                           GGT7       20: 32.9
Supplementary Figure 1.

Sites marked in coding exon 8: T381, N401
GGTLC1  MTSEFFSAQLRAQISDDTTHPISYYKPEFYMPDDGGTAHLSVVAEDGSAVSATSTINLY  8
GGTLC2  MTSEFFAAQLRAQISDDTTHPISYYKPEFYTPVDGGTAHLSVVAEDGSAVSATSTINLY  8
GGTLC3  MTSEFFAAQLRSQISDHTTHPISYYKPEFYTPDDGGTAHLSVVAEDGSAVSATSTINLY  8

Sites marked in coding exon 9: E420, D423
GGTLC1  FGSKVRSPVSGILLNNEMDDFSSTSITNEFGVPPSPANFIQP  9
GGTLC2  FGSKVRSPVSEILFNDEMDDFSSPNITNEFGVPPSPANFIQP  9
GGTLC3  FGSKVCSPVSGILFNNEWTTSALPAFTNEFGAPPSPANFIQP  9

Sites marked in coding exon 10: S451, S452, G473, G474
GGTLC1  GKQPLSSMCPTIMVGQDGQVRMVVGAAGGTQITMATAL  10
GGTLC2  GKQPLSSMCPTIMVGQDGQVRMVVGAAGGTQITTATAL  10
GGTLC3  GKQPLLSMCPTIMVGQDGQVRMVVGAAGGTQITTDTAL  10

GGTLC1  AIIYNLWFGYDVKWAVEEPRLHNQLLPNVTTVERNIDQ  11
GGTLC2  AIIYNLWFGYDVKRAVEEPRLHNQLLPNVTTVERNIDQ  11
GGTLC3  AIIYNLWFGYDVKRAVEEPRLHNKLLPNVTTVERNIDQ  11

GGTLC1  EVTAALETRHHHTQITSTFIAVVQAIVRMAGGWAAASDSRKGGEPAGY  12
GGTLC2  AVTAALETRHHHTQIASTFIAVVQAIVRTAGGWAAASDSRKGGEPAGY  12
GGTLC3  AVTAALETRHHHTQIASTFIAVVQAIVRTAGGWAAASDSRKGGEPAGY  12

Supplementary Fig. 1. Alignment of light chain only genes (GGTLC) that have a deduced amino acid
sequence similar to the frame of GGT1 and GGT2. The amino acid residues that would differ from those
found in the light chain of GGT1 are highlighted. The sequences are divided into segments encoded by
different exons according to that of GGT1. Exon numbering indicated at the end of each line is that of
the corresponding protein coding exons of the GGT1 gene. Han et al. (2007) proposed that residues
T381, N401, E420, D423, G473, G474, S451 and S452 in the active site of human GGT1 light chain
interact with glutathione, based on analysis of the crystal structure of E. coli GGT residues T391, D433,
S462 and S463 (Okada et al., 2006). Relevant sites are indicated above and below the sequence. The
sequence underlined in exon 9 is one hallmark difference between GGT1 (similar to GGTLC1 and
GGTLC2) and GGT2 (similar to GGTLC3). The GGTLC1 sequence is NP_842563 from NM_178311
(chr 20: 23.92 Mb; Wetmore et al., 1993); the GGTLC2 sequence is NP_543029 from NM_080839 (chr
22: 21.3 Mb, locus 129026, gene 1, GGTL4); the GGTLC3 sequence is predicted XP_001128310 from
predicted XM_001128310 (chr 22: 18.75 Mb, gene 11). There are variant cDNAs for both GGTLC2
(NM_199127) and GGTLC3 (predicted, XM_001128302) that encode additional amino acid residues
between exons 10 and 11 as the result of an in-frame read-through of the intron.
For GGTLC2 this sequence is
-ICVTPFLPGRAHPAQPPSHADHTPMQP- and for GGTLC3 it is
-VCVTPFLPGPAHSAQPPSHADHTPMPQ-. The type III amino acid sequence reported by Leh et al.
(1996) is similar to the GGTLC2 sequence with inclusion of the read-through of the nucleotide sequence
between exons 10 and 11; however, this type III cDNA has a C-terminal end (…GGVPATECSPGGQG*)
that differs from that of GGTLC2 (…GGEPAGY). An additional difference is an E at position 414 in
GGTLC2 and a G at that position in the clone reported by Leh et al. (1996).
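
The residue numbering in this figure can be checked with a short script. The following Python sketch is an illustration only and was not part of the original analysis. It assumes that the first residue of the coding exon 8 segment corresponds to GGT1 position 345, an offset inferred here from the marked sites (for example T381 and N401) rather than stated in the figure, and it uses GGTLC1 as a stand-in for GGT1 at the marked positions. The names `FIRST_RESIDUE`, `residue_at` and `site_differences` are hypothetical.

```python
# Segments for coding exons 8, 9 and 10, copied from Supplementary Fig. 1.
GGTLC1 = (
    "MTSEFFSAQLRAQISDDTTHPISYYKPEFYMPDDGGTAHLSVVAEDGSAVSATSTINLY"
    "FGSKVRSPVSGILLNNEMDDFSSTSITNEFGVPPSPANFIQP"
    "GKQPLSSMCPTIMVGQDGQVRMVVGAAGGTQITMATAL"
)
GGTLC3 = (
    "MTSEFFAAQLRSQISDHTTHPISYYKPEFYTPDDGGTAHLSVVAEDGSAVSATSTINLY"
    "FGSKVCSPVSGILFNNEWTTSALPAFTNEFGAPPSPANFIQP"
    "GKQPLLSMCPTIMVGQDGQVRMVVGAAGGTQITTDTAL"
)

# Assumption: GGT1 position of the first residue in the exon 8 segment.
FIRST_RESIDUE = 345

# Positions proposed to interact with glutathione (Han et al., 2007).
SITES = [381, 401, 420, 423, 451, 452, 473, 474]


def residue_at(seq, position):
    """Return the residue at a GGT1-numbered position of an aligned segment."""
    return seq[position - FIRST_RESIDUE]


def site_differences(reference, other):
    """Map each marked position to (reference, other) where the residues differ."""
    return {
        pos: (residue_at(reference, pos), residue_at(other, pos))
        for pos in SITES
        if residue_at(reference, pos) != residue_at(other, pos)
    }


for pos, (ref, alt) in site_differences(GGTLC1, GGTLC3).items():
    print(f"{ref}{pos} -> {alt}")
```

Under these assumptions the only marked positions at which GGTLC3 differs from GGTLC1 are 423 and 451, in agreement with the D→T and S→L substitutions described in the text.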
Supplementary Figure 2.

GGT3P
MKKKLVVLGLLAVVLVLVIVGLCLWLPSASKEPDNHVYTRAVVAADAKQCLEIGR  1
DTLRDGGSAVDAAIAALLCVGLMNAHSMGIGVGLSSTIYNSTT  2
RKAEVINAREVAPRLAFASMFNSSEQSQK  3
GGLSVAVPGEIRGYELAHQRHGRLPWARLFQPSIQLARQGFPVGKGLAAVLENKRTVIEQQPVLC  4
EVFCRDRKVLREGERLTLPRLADTYEMLAIEGAQAFYNGSLMAQIVKDIQAA  5
GGIVTAEDLNNYCAELIEHPLNISLGDAVLYMPSARLSGPVLALILNILK  6
GYNFSRESVETPEQKGLTYHRIVEAFRAYAKRTLLGDPKFVDVTE  7
VVRNMTSEFFAAQLRSQISDHTTHPISYYKPEFYTPDDGGTAHLSVVAEDGSAVSATSTINLY  8
FGSKVCSPVSGILFNNEWTTSALPA^SPMSLGHPPHLPISSSQGSSRSCPCARRSWWARTARSGW
WWELLGARRSPQTLHWPSSTTSGSAMT*  9a
^ ¾ of the cDNAs lack a nt here
FGSKVCSPVSGILFNNEWTTSALPAFTNEFGAPPSPANFIQP  9b (NM_002058)
GKQPLLSMCLTIMVGQDGQVRMVVGAAGGTQITTDTAL  10 (NM_002058)
AIIYNLWFGYDVKRAVEEPRLHNKLLPNVTTVERNIDQ  11 (NM_002058)
AVTAALETRHHHTQIASTFIAVVQAIVRTAGGWAAASDSRKGGEPAGY*  12 (NM_002058)

GGT4P
MKKKLVVLGLLAVVLVLVIVNLCLWLPSASKEPDNHVYTRAAVAADAKQCSEIGR  1
DTLRDGGSAVDAAIAALLCVGLMNAHSMSIGGGLFLTIYNSTS  2
GKAEVINAREVAPRLAFASMFNSLEQSQK  3
GGLSVAVPGEIRGYELAHQRHGRLPWARLFQPSIQLARQGFPVGKGLAAVLENKRTVIEQQPVLW  4
HVCGEVFCRDRKVLREGERLTLPRVADTYETLAIEGAQAFYNGSLMAQIVKDIQAA  5
VMVQPHPSAHSSCCPVAGGIVTAEDLNNYCAELIEHPLNISLGDAVLYMPSAPLSGPVLALILNILK  6
GYNFSWESVETPEQKGLTYHRIVEAFWFAYAKRTLLGDPKFVNVTE  7
VVRNMTSEFFAAQLWAQISDNTTHTISYYKPKFYTPDDRGTAHLSVITEDGSAVSATSTINLY  8
FGSKVCSPVSGILFNNEWTTSALPA^SPMSLGYPPHLPISSSQGSSRSRPCSQRSWWARTARSGW
WWELLGARRSPQPLHWPSSTTSGSAMT*
^ frame shift, one nt missing

Supplementary Fig. 2. GGT3P and GGT4P genes: deduced amino acid sequences. Numbers at the
end of each line indicate the coding exon in GGT1 which encodes these residues. For GGT3P, all exons
and intron-exon junctions were compared to those in GGT1. Residues in green indicate amino acids in
the deduced sequence that differ from those in GGT1. The boxed area in GGT3P indicates a one
nucleotide discrepancy between one cDNA record (NM_002058) and the sequence of genomic DNA
and three other cDNAs. It is unclear whether this reflects a polymorphism or an error. Only the
sequence that includes this residue (NM_002058) would encode a light chain with homology to that of
GGT1. The other cDNAs have a shifted reading frame, and the amino acid residues thereafter are unlike
those of GGT1. The GGT3P gene sequence is from the genomic DNA and from several cDNAs
(NR_003267, BC108264), whereas the GGT4P sequence is from a predicted mRNA, XR_016938.