Lab3_key

CS 251 Introduction to Bioinformatics:
Laboratory 3:
Working with Eukaryotic sequences:
Today, we will continue to examine a universal DNA surveillance protein, mutS/hMSH2. This week
we will focus on eukaryotic sequences, using the hMSH2 gene as an example.
Please refer to the beginning of last week’s laboratory for gene terminology, gene abbreviations, etc.
Procedure:
Follow pp. 85-95 and pp. 105-110 in C & N/BFD.
Objective 1:
Locate and study the hMSH2 gene from H. sapiens (that’s us!)
Go to the GenBank entry tool at http://www.ncbi.nlm.nih.gov/entrez/
a. From the “HOTSPOTS” list, point your browser to “Human genome resources”.
b. Under the subheading “The Genomic Sequence”, point your browser to “BLAST the
genome”. You will now BLAST the human genome, all 30,000 or so of its proteins, with the
Escherichia coli mutS protein sequence that you discovered and studied last week. Let’s see
whether the mutS protein of a bacterium is similar enough to find its human equivalent.
NOTE: The common ancestor of humans and bacteria diverged into separate lineages
approximately 1.5 billion years ago. If this common ancestor of humans and bacteria
possessed a mutS gene, is it possible that the homologous genes in bacteria and
humans have remained sufficiently conserved over this vast evolutionary distance that they
still resemble each other? Soon you will know the answer...
c. Obtain the 853 aa E. coli mutS protein sequence from your lab report, or from this GenBank
ID: AAM84420. Paste the aa sequence into the search window. Under “Database”,
choose “RefSeq protein”. With this input, you are instructing GenBank to compare the E. coli
mutS protein not with the entire 3 billion base pairs of DNA sequence, but rather with a much
smaller databank of known human proteins. Under “Program”, choose blastp (this is a
protein-protein search, as opposed to blastn, which is a nucleotide-nucleotide search).
This blastp search of “RefSeq protein” is much faster than the alternative, which is to ask
GenBank to translate all six reading frames of the entire genome and then compare every
resulting ORF to your query sequence (this would be a tblastn search).
For the output from this search, choose 50 descriptions and 10 alignments.
Click “Begin Search”. Be patient; the search may take a few minutes to complete.
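To make the blastp/tblastn distinction concrete, the "translate all six reading frames" step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not NCBI's implementation: it assumes the standard genetic code (NCBI translation table 1) and a/c/g/t input, and the helper names are hypothetical.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("acgt", "tgca")


def translate(dna: str) -> str:
    """Translate codon by codon; an incomplete trailing codon is ignored."""
    dna = dna.lower()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for codons containing ambiguous bases
        for i in range(0, len(dna) - 2, 3)
    )


def reverse_complement(dna: str) -> str:
    return dna.lower().translate(COMPLEMENT)[::-1]


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translations of forward frames +1..+3 and reverse frames -1..-3."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frame_translation("atgaaatag"))
```

For example, frame +1 of "atgaaatag" is "MK*". A tblastn search performs this kind of translation across the whole genome before aligning the protein query against every frame, which is why it is so much slower than a blastp search against RefSeq protein.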
d. A standard BLAST page will appear, same as last week. On this new screen, click
“FORMAT”. This will bring up a new window, and within 1-4 minutes the completed BLAST
search report should appear.
Q1: In the space below, copy/paste the second element of the BLAST report, the single line
entries under
“Sequences producing significant alignments:”
                                                                  Score      E
                                                                  (bits)   Value
ref|NP_002430.1| mutS homolog 3; mutS (E. coli) homolog 3 [...     273     5e-73
ref|NP_000242.1| mutS homolog 2; mutS (E. coli) homolog 2; ...     265     8e-71
ref|NP_000170.1| mutS homolog 6; G/T mismatch-binding prote...     230     3e-60
ref|NP_002431.2| mutS homolog 4; mutS (E. coli) homolog 4 [...     225     1e-58
ref|NP_079535.3| mutS homolog 5 isoform a; mutS (E. coli) h...     174     3e-43
ref|NP_751898.1| mutS homolog 5 isoform c; mutS (E. coli) h...     174     3e-43
gb|AAP35864.1| mutS homolog 5 (E. coli) [Homo sapiens] >gi|...     169     8e-42
Q2: So, do we humans appear to possess a gene related to bacterial mutS?
In fact, how many mutS-related genes do we appear to possess?
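Yes. The search returned seven significant hits, but they represent five distinct genes
(MSH2, MSH3, MSH4, MSH5 and MSH6); MSH5 appears three times (isoforms a and c, plus a GenBank entry).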
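The counting behind Q2 can be made explicit: seven BLAST hits are not necessarily seven genes, because several records can describe isoforms of one gene. A minimal Python sketch, assuming the gene can be identified from the number after "mutS homolog" in each description (a heuristic for this table, not an NCBI rule); the function name is hypothetical.

```python
import re
from collections import defaultdict

# The seven hits from the Q1 table: (accession, description as printed, bits, E value).
HITS = [
    ("NP_002430.1", "mutS homolog 3; mutS (E. coli) homolog 3 [...", 273, 5e-73),
    ("NP_000242.1", "mutS homolog 2; mutS (E. coli) homolog 2; ...", 265, 8e-71),
    ("NP_000170.1", "mutS homolog 6; G/T mismatch-binding prote...", 230, 3e-60),
    ("NP_002431.2", "mutS homolog 4; mutS (E. coli) homolog 4 [...", 225, 1e-58),
    ("NP_079535.3", "mutS homolog 5 isoform a; mutS (E. coli) h...", 174, 3e-43),
    ("NP_751898.1", "mutS homolog 5 isoform c; mutS (E. coli) h...", 174, 3e-43),
    ("AAP35864.1", "mutS homolog 5 (E. coli) [Homo sapiens] >gi|...", 169, 8e-42),
]


def group_hits_by_gene(hits):
    """Map a gene symbol such as 'MSH5' to the accessions of every hit that names it."""
    genes = defaultdict(list)
    for accession, description, _bits, _evalue in hits:
        match = re.search(r"mutS homolog (\d+)", description)
        gene = f"MSH{match.group(1)}" if match else "unassigned"
        genes[gene].append(accession)
    return dict(genes)


for gene, accessions in sorted(group_hits_by_gene(HITS).items()):
    print(gene, accessions)
```

Grouping by gene rather than counting rows is the distinction between "how many hits" and "how many related genes".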
Q3: Discussion question: How did the human genome come to contain multiple, related
copies of a gene?
Repeated duplication of an original (ancestral) gene, followed by divergence of the copies.
During this discussion, the following terms will be introduced, and you will henceforth be
responsible for understanding each of them:
Gene family:
Homolog/homologous:
Ortholog/orthologous:
Gene duplication:
Paralog/paralogous:
e. Analysis of the alignment between mutS and hMSH3:
Q4: In the space below, paste the first alignment of your BLAST report (10 pt Courier).
ref|NP_002430.1| mutS homolog 3; mutS (E. coli) homolog 3 [Homo sapiens]
          Length = 1128

 Score = 273 bits (697), Expect = 5e-73
 Identities = 243/883 (27%), Positives = 410/883 (46%), Gaps = 108/883 (12%)

Query: 10   HTPMMQQYLKLKAQHPEILLFYRMGDFYELFYDDAKRASQLLDISLTKRGASAGEPIPMA 69
            +TP+  QY+++K QH + +L    G  Y  F +DA+ A++ L+I               A
Sbjct: 220  YTPLELQYIEMKQQHKDAVLCVECGYKYRFFGEDAEIAARELNIY-----CHLDHNFMTA 274

Query: 70   GIPYHAVENYLAKLVNQGESVAICEQ--------IGDPATSKGPVERKVVRIVTPGTISD 121
             IP H +  ++ +LV +G  V + +Q        IGD  +S     RK+  + T  T+
Sbjct: 275  SIPTHRLFVHVRRLVAKGYKVGVVKQTETAALKAIGDNRSSL--FSRKLTALYTKSTLIG 332

Query: 122  E---------------ALLQERQDNLLAAIWQDSKG----------FGYATLDISSGRF- 155
            E                ++ +   + L  I ++ +            G   +  ++G
Sbjct: 333  EDVNPLIKLDDAVNVDEIMTDTSTSYLLCISENKENVRDKKKGNIFIGIVGVQPATGEVV 392

Query: 156  --RLSEPADRETMAAELQRTNPAELLYAEDFAE-----------MSLIEGRRGLRRRPLW 202
                 + A R  +   +    P ELL     +E           +S+ + R  + R
Sbjct: 393  FDSFQDSASRSELETRMSSLQPVELLLPSALSEQTEALIHRATSVSVQDDRIRVERMDNI 452

Query: 203  EFEIDTARQQLNLQFGTRDLVGF-------GVENAPRG-LCAAGCLLQYAKD--TQRTTL 252
             FE   A Q +  +F  +D V         G+ N  +  +C+   +++Y K+   ++
Sbjct: 453  YFEYSHAFQAVT-EFYAKDTVDIKGSQIISGIVNLEKPVICSLAAIIKYLKEFNLEKMLS 511

Query: 253  PHIRSITMEREQDSIIMDAATRRNLEITQNLAG-GAENTLASVLDCTVTPMGSRMLKRWL 311
                   +  + + + ++  T RNLEI QN      + +L  VLD T T  G R LK+W+
Sbjct: 512  KPENFKQLSSKMEFMTINGTTLRNLEILQNQTDMKTKGSLLWVLDHTKTSFGRRKLKKWV 571

Query: 312  HMPVRDTRVLLERQQTIGAL----QDFTAELQPVLRQVGDLERILARLALRTARPRDLAR 367
              P+   R + R   +  +         +++  LR++ D+ER L  +  +    ++
Sbjct: 572  TQPLLKLREINARLDAVSEVLHSESSVFGQIENHLRKLPDIERGLCSIYHKKCSTQEFFL 631

Query: 368  MRHAFQQLP-ELRAQLETVDS-APVQALREKMGEFAELRDLLER--AIIDTPPVLVRDGG 423
            +      L  E +A +  V+S      LR  + E  EL   +E    I++     V D
Sbjct: 632  IVKTLYHLKSEFQAIIPAVNSHIQSDLLRTVILEIPELLSPVEHYLKILNEQAAKVGDKT 691

Query: 424  VIASGYNE------ELDEWRALADGATDYLERLEVRERERTGLDTLKVGFNAVHG--YYI 475
             +    ++        DE + + D    +L+ +      R  L      +  V G  + I
Sbjct: 692  ELFKDLSDFPLIKKRKDEIQGVIDEIRMHLQEI------RKILKNPSAQYVTVSGQEFMI 745

Query: 476  QISRGQSHLAPINYMRRQTLKNAERYIIPELKEYEDKVLTSKGKALALEKQL-YEELFDL 534
            +I        P ++++  + K   R+  P + E   + L    + L L+    + +  +
Sbjct: 746  EIKNSAVSCIPTDWVKVGSTKAVSRFHSPFIVE-NYRHLNQLREQLVLDCSAEWLDFLEK 804

Query: 535  LLPHLEALQQSASALAELDVLVNLAERAYTLNYTCPTFIDKPGIRITEGRHPVVEQVLNE 594
               H  +L ++   LA +D + +LA+ A   +Y  PT  ++  I I  GRHPV++ +L E
Sbjct: 805  FSEHYHSLCKAVHHLATVDCIFSLAKVAKQGDYCRPTVQEERKIVIKNGRHPVIDVLLGE 864

Query: 595  P--FIANPLNLSPQ-RRMLIITGPNMGGKSTYMRQTALIALMAYIGSYVPAQKVEIGPID 651
               ++ N  +LS    R++IITGPNMGGKS+Y++Q ALI +MA IGSYVPA++  IG +D
Sbjct: 865  QDQYVPNNTDLSEDSERVMIITGPNMGGKSSYIKQVALITIMAQIGSYVPAEEATIGIVD 924

Query: 652  RIFTRVGAADDLASGRSTFMVEMTETANILHNATEYSLVLMDEIGRGTSTYDGLSLAWAC 711
             IFTR+GAAD++  GRSTFM E+T+TA I+  AT  SLV++DE+GRGTST+DG+++A+A
Sbjct: 925  GIFTRMGAADNIYKGRSTFMEELTDTAEIIRKATSQSLVILDELGRGTSTHDGIAIAYAT 984

Query: 712  AENLANKIKALTLFATHYFELTQLPEK-MEGVANVHLDAL--------------EHGDTI 756
             E     +K+LTLF THY  + +L +     V N H+  L              +  D +
Sbjct: 985  LEYFIRDVKSLTLFVTHYPPVCELEKNYSHQVGNYHMGFLVSEDESKLDPGAAEQVPDFV 1044

Query: 757  AFMHSVQDGAASKSYGLAVAALAGVPKEVIKRARQKLRELESI 799
             F++ +  G A++SYGL VA LA VP E++K+A  K +ELE +
Sbjct: 1045 TFLYQITRGIAARSYGLNVAKLADVPGEILKKAAHKSKELEGL 1087
Q5: Relatedness of the proteins: How much identity and similarity are shared by the two
proteins?
Identity = 245/897
Similarity = 420/897
How does this compare to last week’s analysis of E. coli vs. Y. pestis mutS genes?
Identity = 27% vs. 80%
Similarity = 46% vs. 88%
In simple terms, what does this comparison reveal about evolutionary relatedness
among these three species?
All three are related, but, as expected, the two bacterial proteins are far more similar to
each other than either is to the human protein.
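The Identities and Positives figures come from the middle line of each alignment block: BLAST prints the residue where query and subject are identical, "+" where the substitution score is positive, and a blank otherwise. A minimal Python sketch of that rule, assuming the BLOSUM62 matrix (the blastp default) and hard-coding only its positive off-diagonal pairs; the function names are hypothetical and this is not NCBI's code.

```python
# Off-diagonal BLOSUM62 pairs with a positive score; BLAST prints "+" for these.
POSITIVE_PAIRS = {
    frozenset(pair)
    for pair in ["AS", "RQ", "RK", "ND", "NH", "NS", "DE", "QE", "QK", "EK",
                 "HY", "IL", "IM", "IV", "LM", "LV", "MV", "FW", "FY", "ST", "WY"]
}


def midline(query: str, subject: str) -> str:
    """Build a BLAST-style middle line for two aligned rows of equal length."""
    chars = []
    for q, s in zip(query, subject):
        if q == s and q != "-":
            chars.append(q)          # identical residue
        elif frozenset((q, s)) in POSITIVE_PAIRS:
            chars.append("+")        # conservative substitution
        else:
            chars.append(" ")        # mismatch or gap
    return "".join(chars)


def identities_and_positives(query: str, subject: str) -> tuple[int, int, int]:
    """Return (identities, positives, columns); positives include identities."""
    line = midline(query, subject)
    identities = sum(ch.isalpha() for ch in line)
    positives = identities + line.count("+")
    return identities, positives, len(line)


query = "AFMHSVQDGAASKSYGLAVAALAGVPKEVIKRARQKLRELESI"    # E. coli mutS 757-799
subject = "TFLYQITRGIAARSYGLNVAKLADVPGEILKKAAHKSKELEGL"  # hMSH3 1045-1087
print(midline(query, subject))
print(identities_and_positives(query, subject))
```

Applied to the last block of the mutS/hMSH3 alignment, it reproduces that block's middle line and counts 20 identities and 30 positives over 43 columns; summing over every block and dividing by the alignment length gives the percentages in the BLAST header.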
Q6: Comparing the lengths of the bacterial and human proteins: How do the lengths of
E. coli mutS and hMSH3 differ?
E. coli mutS = 853 aa
hMSH3 = 1128 aa
Does the alignment show the full length sequence of each protein? If not, why not?
No. BLAST is a local alignment method: it reports only the regions that score above a
minimum threshold, so the ends of each protein that fall outside the high-scoring region are not shown.
Interpret the “Gaps” portion of the BLAST report for this alignment. In order to create
an optimal alignment, how many locations containing gaps were introduced into the
sequence?
The report counts gap characters: 132 gap characters were introduced, but they fall into
only 18 separate gaps.
Explain what the “Gaps” report means to say “Gaps = 132/897 (14%)”
To obtain the highest-scoring alignment between the two sequences, 132 of the 897 aligned
positions had to be gap characters (insertions in one sequence relative to the other); that is,
14% of the best alignment consists of gaps.
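The distinction drawn in Q6 between gap characters and gaps (gap openings) can be stated precisely. A minimal Python sketch, assuming each aligned row is available as a string with "-" marking gap positions; the helper names are hypothetical.

```python
import re


def gap_stats(row: str) -> tuple[int, int]:
    """Return (gap characters, gap openings) for one row of a pairwise alignment."""
    gap_characters = row.count("-")
    gap_openings = len(re.findall(r"-+", row))   # each run of '-' counts as one gap
    return gap_characters, gap_openings


def alignment_gap_summary(rows: list[str]) -> tuple[int, int]:
    """Sum gap characters and gap openings over all query and subject rows."""
    totals = [gap_stats(row) for row in rows]
    return sum(c for c, _ in totals), sum(o for _, o in totals)


# Query and subject rows from one block of the mutS/hMSH3 alignment (query 122-155, subject 333-392).
block = [
    "E---------------ALLQERQDNLLAAIWQDSKG----------FGYATLDISSGRF-",
    "EDVNPLIKLDDAVNVDEIMTDTSTSYLLCISENKENVRDKKKGNIFIGIVGVQPATGEVV",
]
print(alignment_gap_summary(block))   # (26, 3)
```

Summed over every row of the full alignment, the first number is the count BLAST reports after "Gaps =", and the second is the number of separate gaps.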
f. Study one of the human mutS homologs: point your browser to the first human gene from the
BLAST report, and open the annotation for this gene (hMSH3). This will bring up a page titled
“LocusLink”. This page serves as a gateway to a variety of sources of information about this
gene.
g. Start by scrolling to “Annotation for this locus”, and open the hyperlink associated with
“mRNA”. Paste the output here.
LOCUS       NM_002439               4374 bp    mRNA    linear   PRI 23-AUG-2004
DEFINITION  Homo sapiens mutS homolog 3 (E. coli) (MSH3), mRNA.
ACCESSION   NM_002439
VERSION     NM_002439.1  GI:4505248
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 4374)
  AUTHORS   Plaschke,J., Kruger,S., Jeske,B., Theissig,F., Kreuz,F.R.,
            Pistorius,S., Saeger,H.D., Iaccarino,I., Marra,G. and
            Schackert,H.K.
  TITLE     Loss of MSH3 protein expression is frequent in MLH1-deficient
            colorectal cancer and is associated with disease progression
  JOURNAL   Cancer Res. 64 (3), 864-870 (2004)
   PUBMED   14871813
  REMARK    GeneRIF: MSH3 abrogation may be a predictor of metastatic disease
            or even favor tumor cell spread in MLH1-deficient colorectal
            cancers.
REFERENCE   2  (bases 1 to 4374)
  AUTHORS   Mazurek,A., Berardini,M. and Fishel,R.
  TITLE     Activation of human MutS homologs by 8-oxo-guanine DNA damage
  JOURNAL   J. Biol. Chem. 277 (10), 8260-8266 (2002)
   PUBMED   11756455
  REMARK    GeneRIF: hMSH2-hMSH3 did not appear to bind any of the 8-oxo-G
            containing DNA substrates nor was there enhanced ATPase or ADP -->
            ATP exchange activities.
REFERENCE   3  (bases 1 to 4374)
  AUTHORS   Arzimanoglou,I.I., Hansen,L.L., Chong,D., Li,Z., Psaroudi,M.C.,
            Dimitrakakis,C., Jacovina,A.T., Shevchuk,M., Reid,L., Hajjar,K.A.,
            Vassilaros,S., Michalas,S., Gilbert,F., Chervenak,F.A. and
            Barber,H.R.
  TITLE     Frequent LOH at hMLH1, a highly variable SNP in hMSH3, and
            negligible coding instability in ovarian cancer
  JOURNAL   Anticancer Res. 22 (2A), 969-975 (2002)
   PUBMED   12014680
  REMARK    GeneRIF: Frequent LOH at hMLH1, a highly variable SNP in hMSH3, and
            negligible coding instability occur in ovarian cancer.
REFERENCE   4  (bases 1 to 4374)
  AUTHORS   Ceccotti,S., Ciotta,C., Fronza,G., Dogliotti,E. and Bignami,M.
  TITLE     Multiple mutations and frameshifts are the hallmark of defective
            hPMS2 in pZ189-transfected human tumor cells
  JOURNAL   Nucleic Acids Res. 28 (13), 2577-2584 (2000)
   PUBMED   10871409
REFERENCE   5  (bases 1 to 4374)
  AUTHORS   Risinger,J.I., Umar,A., Boyd,J., Berchuck,A., Kunkel,T.A. and
            Barrett,J.C.
  TITLE     Mutation of MSH3 in endometrial cancer and evidence for its
            functional role in heteroduplex repair
  JOURNAL   Nat. Genet. 14 (1), 102-105 (1996)
   PUBMED   8782829
REFERENCE   6  (bases 1 to 4374)
  AUTHORS   Watanabe,A., Ikejima,M., Suzuki,N. and Shimada,T.
  TITLE     Genomic organization and expression of the human MSH3 gene
  JOURNAL   Genomics 31 (3), 311-318 (1996)
   PUBMED   8838312
REFERENCE   7  (bases 1 to 4374)
  AUTHORS   Fujii,H. and Shimada,T.
  TITLE     Isolation and characterization of cDNA clones derived from the
            divergently transcribed gene in the region upstream from the human
            dihydrofolate reductase gene
  JOURNAL   J. Biol. Chem. 264 (17), 10057-10064 (1989)
   PUBMED   2722860
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from U61981.1.
FEATURES             Location/Qualifiers
     source          1..4374
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="5"
                     /map="5q11-q12"
     gene            1..4374
                     /gene="MSH3"
                     /db_xref="GeneID:4437"
                     /db_xref="LocusID:4437"
                     /db_xref="MIM:600887"
     CDS             17..3403
                     /gene="MSH3"
                     /function="putative mismatch repair/binding protein"
                     /note="mutS (E. coli) homolog 3;
                     go_function: ATP binding [goid 0005524] [evidence IEA];
                     go_function: damaged DNA binding [goid 0003684] [evidence
                     IEA];
                     go_process: mismatch repair [goid 0006298] [evidence TAS]
                     [pmid 8782829]"
                     /codon_start=1
                     /product="mutS homolog 3"
                     /protein_id="NP_002430.1"
                     /db_xref="GI:4505249"
                     /db_xref="GeneID:4437"
                     /db_xref="LocusID:4437"
                     /db_xref="MIM:600887"
                     /translation="MSRRKPASGGLAASSSAPARQAVLSRFFQSTGSLKSTSSSTGAA
                     DQVDPGAAAAAAPPAPAFPPQLPPHVATEIDRRKKRPLENDGPVKKKVKKVQQKEGGS
                     DLGMSGNSEPKKCLRTRNVSKSLEKLKEFCCDSALPQSRVQTESLQERFAVLPKCTDF
                     DDISLLHAKNAVSSEDSKRQINQKDTTLFDLSQFGSSNTSHENLQKTASKSANKRSKS
                     IYTPLELQYIEMKQQHKDAVLCVECGYKYRFFGEDAEIAARELNIYCHLDHNFMTASI
                     PTHRLFVHVRRLVAKGYKVGVVKQTETAALKAIGDNRSSLFSRKLTALYTKSTLIGED
                     VNPLIKLDDAVNVDEIMTDTSTSYLLCISENKENVRDKKKGNIFIGIVGVQPATGEVV
                     FDSFQDSASRSELETRMSSLQPVELLLPSALSEQTEALIHRATSVSVQDDRIRVERMD
                     NIYFEYSHAFQAVTEFYAKDTVDIKGSQIISGIVNLEKPVICSLAAIIKYLKEFNLEK
                     MLSKPENFKQLSSKMEFMTINGTTLRNLEILQNQTDMKTKGSLLWVLDHTKTSFGRRK
                     LKKWVTQPLLKLREINARLDAVSEVLHSESSVFGQIENHLRKLPDIERGLCSIYHKKC
                     STQEFFLIVKTLYHLKSEFQAIIPAVNSHIQSDLLRTVILEIPELLSPVEHYLKILNE
                     QAAKVGDKTELFKDLSDFPLIKKRKDEIQGVIDEIRMHLQEIRKILKNPSAQYVTVSG
                     QEFMIEIKNSAVSCIPTDWVKVGSTKAVSRFHSPFIVENYRHLNQLREQLVLDCSAEW
                     LDFLEKFSEHYHSLCKAVHHLATVDCIFSLAKVAKQGDYCRPTVQEERKIVIKNGRHP
                     VIDVLLGEQDQYVPNNTDLSEDSERVMIITGPNMGGKSSYIKQVALITIMAQIGSYVP
                     AEEATIGIVDGIFTRMGAADNIYKGRSTFMEELTDTAEIIRKATSQSLVILDELGRGT
                     STHDGIAIAYATLEYFIRDVKSLTLFVTHYPPVCELEKNYSHQVGNYHMGFLVSEDES
                     KLDPGAAEQVPDFVTFLYQITRGIAARSYGLNVAKLADVPGEILKKAAHKSKELEGLI
                     NTKRKRLKYFAKLWTMHNAQDLQKWTEEFNMEETQTSLLH"
ORIGIN
        1 gggcacgagc cctgccatgt ctcgccggaa gcctgcgtcg ggcggcctcg ctgcctccag
       61 ctcagcccct gcgaggcaag cggttttgag ccgattcttc cagtctacgg gaagcctgaa
      121 atccacctcc tcctccacag gtgcagccga ccaggtggac cctggcgctg cagcggccgc
      181 agcgccccca gcgcccgcct tcccgcccca gctgccgccg cacgtagcta cagaaattga
      241 cagaagaaag aagagaccat tggaaaatga tgggcctgtt aaaaagaaag taaagaaagt
      301 ccaacaaaag gaaggaggaa gtgatctggg aatgtctggc aactctgagc caaagaaatg
      361 tctgaggacc aggaatgttt caaagtctct ggaaaaattg aaagaattct gctgcgattc
      421 tgcccttcct caaagtagag tccagacaga atctctgcag gagagatttg cagttctgcc
      481 aaaatgtact gattttgatg atatcagtct tctacacgca aagaatgcag tttcttctga
      541 agattcgaaa cgtcaaatta atcaaaagga cacaacactt tttgatctca gtcagtttgg
      601 atcatcaaat acaagtcatg aaaatttaca gaaaactgct tccaaatcag ctaacaaacg
      661 gtccaaaagc atctatacgc cgctagaatt acaatacata gaaatgaagc agcagcacaa
      721 agatgcagtt ttgtgtgtgg aatgtggata taagtataga ttctttgggg aagatgcaga
      781 gattgcagcc cgagagctca atatttattg ccatttagat cacaacttta tgacagcaag
      841 tatacctact cacagactgt ttgttcatgt acgccgcctg gtggcaaaag gatataaggt
      901 gggagttgtg aagcaaactg aaactgcagc attaaaggcc attggagaca acagaagttc
      961 actcttttcc cggaaattga ctgcccttta tacaaaatct acacttattg gagaagatgt
     1021 gaatccccta atcaagctgg atgatgctgt aaatgttgat gagataatga ctgatacttc
     1081 taccagctat cttctgtgca tctctgaaaa taaggaaaat gttagggaca aaaaaaaggg
     1141 caacattttt attggcattg tgggagtgca gcctgccaca ggcgaggttg tgtttgatag
     1201 tttccaggac tctgcttctc gttcagagct agaaacccgg atgtcaagcc tgcagccagt
     1261 agagctgctg cttccttcgg ccttgtccga gcaaacagag gcgctcatcc acagagccac
     1321 atctgttagt gtgcaggatg acagaattcg agtcgaaagg atggataaca tttattttga
     1381 atacagccat gctttccagg cagttacaga gttttatgca aaagatacag ttgacatcaa
     1441 aggttctcaa attatttctg gcattgttaa cttagagaag cctgtgattt gctctttggc
     1501 tgccatcata aaatacctca aagaattcaa cttggaaaag atgctctcca aacctgagaa
     1561 ttttaaacag ctatcaagta aaatggaatt tatgacaatt aatggaacaa cattaaggaa
     1621 tctggaaatc ctacagaatc agactgatat gaaaaccaaa ggaagtttgc tgtgggtttt
     1681 agaccacact aaaacttcat ttgggagacg gaagttaaag aagtgggtga cccagccact
     1741 ccttaaatta agggaaataa atgcccggct tgatgctgta tcggaagttc tccattcaga
     1801 atctagtgtg tttggtcaga tagaaaatca tctacgtaaa ttgcccgaca tagagagggg
     1861 actctgtagc atttatcaca aaaaatgttc tacccaagag ttcttcttga ttgtcaaaac
     1921 tttatatcac ctaaagtcag aatttcaagc aataatacct gctgttaatt cccacattca
     1981 gtcagacttg ctccggaccg ttattttaga aattcctgaa ctcctcagtc cagtggagca
     2041 ttacttaaag atactcaatg aacaagctgc caaagttggg gataaaactg aattatttaa
     2101 agacctttct gacttccctt taataaaaaa gaggaaggat gaaattcaag gtgttattga
     2161 cgagatccga atgcatttgc aagaaatacg aaaaatacta aaaaatcctt ctgcacaata
     2221 tgtgacagta tcaggacagg agtttatgat agaaataaag aactctgctg tatcttgtat
     2281 accaactgat tgggtaaagg ttggaagcac aaaagctgtg agccgctttc actctccttt
     2341 tattgtagaa aattacagac atctgaatca gctccgggag cagctagtcc ttgactgcag
     2401 tgctgaatgg cttgattttc tagagaaatt cagtgaacat tatcactcct tgtgtaaagc
     2461 agtgcatcac ctagcaactg ttgactgcat tttctccctg gccaaggtcg ctaagcaagg
     2521 agattactgc agaccaactg tacaagaaga aagaaaaatt gtaataaaaa atggaaggca
     2581 ccctgtgatt gatgtgttgc tgggagaaca ggatcaatat gtcccaaata atacagattt
     2641 atcagaggac tcagagagag taatgataat taccggacca aacatgggtg gaaagagctc
     2701 ctacataaaa caagttgcat tgattaccat catggctcag attggctcct atgttcctgc
     2761 agaagaagcg acaattggga ttgtggatgg cattttcaca aggatgggtg ctgcagacaa
     2821 tatatataaa ggacggagta catttatgga agaactgact gacacagcag aaataatcag
     2881 aaaagcaaca tcacagtcct tggttatctt ggatgaacta ggaagaggga cgagcactca
     2941 tgatggaatt gccattgcct atgctacact tgagtatttc atcagagatg tgaaatcctt
     3001 aaccctgttt gtcacccatt atccgccagt ttgtgaacta gaaaaaaatt actcacacca
     3061 ggtggggaat taccacatgg gattcttggt cagtgaggat gaaagcaaac tggatccagg
     3121 cgcagcagaa caagtccctg attttgtcac cttcctttac caaataacta gaggaattgc
     3181 agcaaggagt tatggattaa atgtggctaa actagcagat gttcctggag aaattttgaa
     3241 gaaagcagct cacaagtcaa aagagctgga aggattaata aatacgaaaa gaaagagact
     3301 caagtatttt gcaaagttat ggacgatgca taatgcacaa gacctgcaga agtggacaga
     3361 ggagttcaac atggaagaaa cacagacttc tcttcttcat taaaatgaag actacatttg
     3421 tgaacaaaaa atggagaatt aaaaatacca actgtacaaa ataactctcc agtaacagcc
     3481 tatctttgtg tgacatgtga gcataaaatt atgaccatgg tatattccta ttggaaacag
     3541 agaggttttt ctgaagacag tctttttcaa gtttctgtct tcctaacttt tctacgtata
     3601 aacactcttg aatagacttc cactttgtaa ttagaaaatt ttatggacag taagtccagt
     3661 aaagccttaa gtggcagaat ataattccca agcttttgga gggtgatata aaaatttact
     3721 tgatattttt atttgtttca gttcagataa ttggcaactg ggtgaatctg gcaggaatct
     3781 atccattgaa ctaaaataat tttattatgc aaccagttta tccaccaaga acataagaat
     3841 tttttataag tagaaagaat tggccaggca tggtggctca tgcctgtaat cccagcactt
     3901 tgggaggcca aggtaggcag atcacctgag gtcaggagtt caagaccagc ctggccaaca
     3961 tggcaaaacc ccatctttac taaaaatata aagtacatct ctactaaaaa tacgaaaaaa
     4021 ttagctgggc atggtggcgc acacctgtag tcccagctac tccggaggct gaggcaggag
     4081 aatctcttga acctgggagg cggaggttgc aatgagccga gatcacgtca ctgcactcca
     4141 gcttgggcaa cagagcaaga ctccatctca aaaaagaaaa aagaaaagaa atagaattat
     4201 caagctttta aaaactagag cacagaagga ataaggtcat gaaatttaaa aggttaaata
     4261 ttgtcatagg attaagcagt ttaaagattg ttggatgaaa ttatttgtca ttcattcaag
     4321 taataaatat ttaatgaata cttgctataa aaaaaaaaaa aaaaaaaaaa aaaa
//
h. Using this report, follow pp. 86-90 in BFD, making sure that you understand how to navigate
the information in this report page (skip the section titled “Retrieving GenBank entries without
accession numbers”, pp. 90-91).
Q7: What is the length of the nucleotide sequence in this report?
4,374 bp
Q8: Does it represent a DNA or RNA sequence?
RNA: this is an mRNA record (GenBank writes it with t rather than u).
Q9: On what chromosome does this gene reside?
Chromosome 5
Q10: Where does the coding region begin and end (at what nucleotides)?
Nucleotides 17 to 3403
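The Q10 coordinates need a little care in code, because GenBank locations are 1-based and inclusive while Python slices are 0-based and half-open. A minimal sketch with hypothetical helper names; it handles only simple `start..end` locations (not `join(...)` or `complement(...)`), and it uses only the first 30 nt of the ORIGIN sequence for illustration.

```python
import re


def parse_location(location: str) -> tuple[int, int]:
    """Parse a simple GenBank location such as '17..3403' (1-based, both ends inclusive)."""
    match = re.fullmatch(r"(\d+)\.\.(\d+)", location.strip())
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    return int(match.group(1)), int(match.group(2))


def extract_cds(mrna: str, location: str) -> str:
    """Return the CDS substring; start - 1 converts the start to a 0-based index."""
    start, end = parse_location(location)
    return mrna[start - 1:end]


def cds_summary(location: str) -> tuple[int, int]:
    """Return (CDS length in nt, protein length in aa), assuming the CDS ends with a stop codon."""
    start, end = parse_location(location)
    length = end - start + 1
    return length, length // 3 - 1


mrna_start = "gggcacgagccctgccatgtctcgccggaa"   # nt 1-30 of NM_002439 (ORIGIN section)
print(extract_cds(mrna_start, "17..30"))      # atgtctcgccggaa
print(cds_summary("17..3403"))                # (3387, 1128)
```

The protein length of 1128 aa agrees with the "Length = 1128" line of the BLAST alignment, a useful cross-check that the CDS coordinates include the stop codon.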
i. Using a Gene-Centric database: read and follow C & N/BFD, pp. 91-95, for this next exercise.
The LocusLink tool provides a gene-centered view, in which information about a particular gene
is integrated into one site, so that you can get hold of everything about that gene in one fell
swoop.
Use the “Back” button to return to the LocusLink page.
j. Gene structure of hMSH3: at this point you know the length of the protein, and of the mRNA
from which the hMSH3 protein is translated. But what about the length of the gene itself? To
learn about the overall structure of the gene, click on the link titled
“Click to Display mRNA-Genomic Alignments”.
A new window, Evidence Viewer, will open. From here, choose “Go to full display with alignments”.
k. Answer the following questions using information from the full display page in Evidence
Viewer:
l.
Q11: How long is the hMSH3 gene? At what nt positions does the gene start and end?
222,896 bp, from nt 30,544,497 to nt 30,767,392
Q12: How many exons and introns does this gene have?
25 exons and 24 introns
Q13: What percentage of the gene is codegenic, i.e., what percentage of the gene is
devoted to exons, and what percentage to introns?
From Q10, the coding region consists of 3,387 nt (3403 - 17 + 1), and Q11 gives the total gene
length as 222,896 nt, so only about 1.52% of the gene is coding sequence. Counting the whole
4,374-nt mRNA (all exons, including the untranslated regions), about 1.96% of the gene is exonic
and about 98% is intronic.
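The Q13 arithmetic, written out so each number can be traced to its source. A minimal sketch, assuming the 4,374-nt RefSeq mRNA equals the summed length of all exons (reasonable for a full-length mRNA record, but an assumption here).

```python
def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


gene_length = span_length(30544497, 30767392)   # genomic span from Q11
cds_length = span_length(17, 3403)              # coding region from Q10
mrna_length = 4374                              # NM_002439 length: exons including UTRs (assumed)

coding_pct = 100 * cds_length / gene_length
exonic_pct = 100 * mrna_length / gene_length
print(f"gene: {gene_length:,} bp")
print(f"coding: {coding_pct:.2f}%  exonic: {exonic_pct:.2f}%  intronic: {100 - exonic_pct:.2f}%")
```

Using the mRNA rather than only the CDS matters because the question asks about exons, and exons also contain the 5' and 3' untranslated regions.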
Would you like a summary of all that is known about this gene’s relevance to human disease?
For this body of information, select OMIM, the Online Mendelian Inheritance in Man
database. See if you can find the answers to the following questions:
Q14: The hMSH2 gene is known to be responsible for a hereditary cancer known as Type I
Hereditary Nonpolyposis Colon Cancer (HNPCC). Is the hMSH3 gene associated with
a particular cluster or family of cancers? If so, which one(s)?
Yes: malignancies of blood cells (hematological malignancies).
Q15: What happens to mice that have been genetically engineered to lack the mouse
version of hMSH3 (mMSH3)? Does loss of the mMSH3 gene cause cancer
predisposition?
Mice lacking mMSH3 alone showed no predisposition to cancer. However, mice deficient in
both MSH3 and MSH6 showed intestinal tumorigenesis.
Q16: The removal of introns and joining of exons during RNA processing follows some very
specific rules, which occasionally can be bent or sidestepped in small but important
ways to achieve alternative splicing. One of the most closely followed tenets is
known as the “GT-AG Rule”, in which the first two bases of every intron are GT, and
the last two bases of every intron are AG.
Does every intron in hMSH3 obey this rule? If not, which one(s) deviates and how?
No. Intron 6 begins with AT and ends with AA instead of the expected GT...AG.
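The GT-AG check in Q16 is easy to automate once each intron's sequence is in hand (for example, copied from the mRNA-genomic alignment). A minimal Python sketch with hypothetical helper names; the intron sequences below are made-up placeholders, not hMSH3 data.

```python
def splice_sites(intron: str) -> tuple[str, str]:
    """Return the first two (donor) and last two (acceptor) bases of an intron."""
    seq = intron.upper()
    return seq[:2], seq[-2:]


def follows_gt_ag(intron: str) -> bool:
    donor, acceptor = splice_sites(intron)
    return donor == "GT" and acceptor == "AG"


def introns_breaking_rule(introns: dict[str, str]) -> list[str]:
    """Names of introns whose boundaries are not GT...AG."""
    return [name for name, seq in introns.items() if not follows_gt_ag(seq)]


# Hypothetical toy introns, NOT real hMSH3 sequence.
toy_introns = {
    "intron A": "gtaagtatcttttcccag",
    "intron B": "atgagtttttctaa",
}
print(introns_breaking_rule(toy_introns))   # ['intron B']
```

Reporting `splice_sites` for each flagged intron shows how it deviates, which is the second half of the question.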
m. Making sense of the entire human genome: follow pp. 105-110 of C & N/BFD as you learn
how to navigate the Ensembl Project tool, using hMSH3 as an example. e! is the symbol for
the Ensembl Project, which is located at http://www.ensembl.org.
Follow the procedure on pp. 106-107 to begin navigating the Ensembl website.
At the top of page 108, begin your search for the hMSH3 gene by browsing the appropriate
chromosome.
Q17: How long is this chromosome, and how many confirmed (known) genes are encoded
on this chromosome?
Chromosome 5 is 181,034,922 bp long. Known genes: 831/1317
(Wouldn’t you love to know what a pseudogene is? Stay tuned; we’ll talk about these
fascinating relics of our past when we get to phylogenetics and evolution.)
n. From the MapView page, use Find to Lookup a Gene named MSH3.
o. From the TextView page, click on the Ensembl Gene hyperlink and move forward to the
GeneView (Ensembl Gene Report) page. Scroll down this page. Under Transcript
Structure, choose Exon information.
Q18: Can you discover the lengths of both the shortest and the longest introns in this gene?
What are the identities of these introns, and what are their lengths?
Shortest: Intron 5-6, 380 bp
Longest: Intron 8-9, 46,359 bp
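Intron lengths in Q18 follow directly from the exon table: each intron runs from one base after an exon's end to one base before the next exon's start. A minimal Python sketch, assuming 1-based inclusive exon coordinates listed in exon order on the plus strand; the coordinates below are hypothetical, not hMSH3's.

```python
def intron_lengths(exons: list[tuple[int, int]]) -> list[tuple[str, int]]:
    """Intron lengths from (start, end) exon coordinates, 1-based and inclusive, in exon order."""
    introns = []
    for i in range(len(exons) - 1):
        _, prev_end = exons[i]
        next_start, _ = exons[i + 1]
        introns.append((f"Intron {i + 1}-{i + 2}", next_start - prev_end - 1))
    return introns


# Hypothetical exon coordinates for illustration only (not hMSH3).
toy_exons = [(1_000, 1_120), (1_501, 1_600), (48_000, 48_150), (48_400, 48_520)]
introns = intron_lengths(toy_exons)
print(min(introns, key=lambda item: item[1]))   # ('Intron 3-4', 249)
print(max(introns, key=lambda item: item[1]))   # ('Intron 2-3', 46399)
```

For a minus-strand gene the exons would be numbered from the highest coordinate down, so the input order would need to be reversed first.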
p. Click the appropriate hyperlink to View gene in genomic location. This will take you to
a ContigView page. Scroll up and down the page for a few minutes to get the feel of this
complex and comprehensive report, then answer the following questions:
Q19: Chromosomes are subdivided into a short arm, or p arm, and a long arm, called the
q arm. These arms are further subdivided into numbered regions, e.g., 4p13.2.
In what subregion does hMSH3 reside?
5q14.1
Where did you find this information on the ContigView page?
In the “Overview” box.
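Band names such as 4p13.2 pack several fields together: chromosome, arm, region digit, band digit and an optional sub-band after the decimal point. A minimal Python sketch of splitting them apart, assuming the standard cytogenetic naming pattern; it does not handle ranges such as "5q11-q12", and the function name is hypothetical.

```python
import re
from typing import Optional

# Chromosome, arm (p or q), region digit, optional band digit, optional sub-band.
BAND = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)?(?:\.(\d+))?$")


def parse_band(band: str) -> dict[str, Optional[str]]:
    match = BAND.match(band)
    if match is None:
        raise ValueError(f"unrecognized band: {band!r}")
    chromosome, arm, region, band_no, sub_band = match.groups()
    return {
        "chromosome": chromosome,
        "arm": "short (p)" if arm == "p" else "long (q)",
        "region": region,
        "band": band_no,
        "sub_band": sub_band,
    }


print(parse_band("4p13.2"))
print(parse_band("5q14.1"))
```

Reading "5q14.1" this way: chromosome 5, long arm, region 1, band 4, sub-band 1.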
Q20: In the vicinity of MSH3, are there any genes of unknown function, i.e., hypothetical
genes? How are they distinguished from known genes in this schematic overview of
the genome?
Yes. They are labeled “novel” genes and are drawn as black bars, whereas known genes are
drawn as red bars.
Q21: Post-lab assignment: next door to the MSH3 gene is a known gene named DHFR.
Click on the reddish bar representing the gene to get started on the road to
understanding more about this neighboring gene. Use your skills at navigating
genome databases to answer the following questions about DHFR [HINT: from this
starting point, you should be no more than 2-3 clicks (at most) from the answers to the
questions below].
(1) What enzyme is encoded by the DHFR gene, and (briefly) what is its function?
Dihydrofolate reductase, which reduces dihydrofolate to tetrahydrofolate, a cofactor needed
to synthesize purines, thymidylate and several amino acids.
(2) Is this gene associated with any diseases when it is lost or defective? Explain
briefly.
Yes. Loss of DHFR function (DHFR deficiency) causes megaloblastic anemia.
(3) Is this gene associated in any way with cancer? If so, briefly describe.
Yes. DHFR is the target of the chemotherapy drug methotrexate, and mutations in DHFR can make
cells resistant to methotrexate.