CS 251 Introduction to Bioinformatics
Laboratory 3: Working with Eukaryotic Sequences

Today we will continue to examine a universal DNA surveillance protein, mutS/hMSH2. This week we will focus on eukaryotic sequences, using the hMSH2 gene as an example. Please refer to the beginning of last week’s laboratory for gene terminology, gene abbreviations, etc.

Procedure: follow pp. 85-95 and pp. 105-110 in C & N/BFD.

Objective 1: Locate and study the hMSH2 gene from H. sapiens (that’s us!)

Go to the GenBank entry tool at http://www.ncbi.nlm.nih.gov/entrez/

a. From the “HOTSPOTS” list, point your browser to “Human genome resources”.

b. Under the subheading “The Genomic Sequence”, point your browser to “BLAST the genome”. You will now BLAST the human genome, all 30,000 or so proteins, with the Escherichia coli mutS protein sequence that you discovered and studied last week. Let’s see if the mutS protein of a bacterium is similar enough that it can find the human gene equivalent.

NOTE: The lineages leading to humans and bacteria diverged from their common ancestor approximately 1.5 billion years ago. If this common ancestor possessed a mutS gene, is it possible that the homologous genes in bacteria and humans have remained sufficiently conserved over this vast evolutionary distance that they still resemble each other? Soon you will know the answer.

c. Obtain the 853 aa E. coli mutS protein sequence from your lab report, or from this GenBank ID: AAM84420. Paste the aa sequence into the search window. Under “Database”, choose “RefSeq protein”. With this input, you are instructing GenBank to compare the E. coli mutS protein not with the entire 3 billion base pairs of DNA sequence, but rather with a much smaller databank of known human proteins. Under “Program”, choose blastp (this is a protein-protein search, as opposed to blastn, which is a nucleotide-nucleotide search).
This blastp search of “RefSeq protein” is much faster than the alternative, which is to ask GenBank to translate all six reading frames of the entire genome and then compare every resulting ORF to your query sequence (this would be a tblastn search). For the output from this search, choose 50 descriptions and 10 alignments. Click “Begin Search”. Be patient; it may take a few minutes to complete the search.

d. A standard BLAST page will appear, same as last week. On this new screen, click “FORMAT”. This will bring up a new window, and within 1-4 minutes the completed BLAST search report should appear.

Q1: In the space below, copy/paste the second element of the BLAST report, the single line entries under “Sequences producing significant alignments:”

ref|NP_002430.1| mutS homolog 3; mutS (E. coli) homolog 3 [...   273   5e-73
ref|NP_000242.1| mutS homolog 2; mutS (E. coli) homolog 2; ...   265   8e-71
ref|NP_000170.1| mutS homolog 6; G/T mismatch-binding prote...   230   3e-60
ref|NP_002431.2| mutS homolog 4; mutS (E. coli) homolog 4 [...   225   1e-58
ref|NP_079535.3| mutS homolog 5 isoform a; mutS (E. coli) h...   174   3e-43
ref|NP_751898.1| mutS homolog 5 isoform c; mutS (E. coli) h...   174   3e-43
gb|AAP35864.1| mutS homolog 5 (E. coli) [Homo sapiens] >gi|...   169   8e-42

Q2: So, do we humans appear to possess a gene related to bacterial mutS? In fact, how many mutS-related genes do we appear to possess?
Yes - 7 hits (representing 5 distinct genes, since mutS homolog 5 is listed three times)

Q3: Discussion question: How did the human genome come to contain multiple, related copies of a gene?
Repeated duplication of the original (ancestral) gene

During this discussion, the following terms will be introduced, and you will henceforth be responsible for understanding each of them:
Gene family:
Homolog/homologous:
Ortholog/orthologous:
Gene duplication:
Paralog/paralogous:

e. Analysis of the alignment between mutS and hMSH3:

Q4: In the space below, paste the first alignment of your BLAST report (10 pt Courier).

ref|NP_002430.1| mutS homolog 3; mutS (E.
coli) homolog 3 [Homo sapiens]
Length = 1128

Score = 273 bits (697), Expect = 5e-73
Identities = 243/883 (27%), Positives = 410/883 (46%), Gaps = 108/883 (12%)

Query: 10   HTPMMQQYLKLKAQHPEILLFYRMGDFYELFYDDAKRASQLLDISLTKRGASAGEPIPMA 69
            +TP+  QY+++K QH + +L    G  Y  F +DA+ A++ L+I               A
Sbjct: 220  YTPLELQYIEMKQQHKDAVLCVECGYKYRFFGEDAEIAARELNIY-----CHLDHNFMTA 274

Query: 70   GIPYHAVENYLAKLVNQGESVAICEQ--------IGDPATSKGPVERKVVRIVTPGTISD 121
             IP H +  ++ +LV +G  V + +Q        IGD  +S     RK+  + T  T+
Sbjct: 275  SIPTHRLFVHVRRLVAKGYKVGVVKQTETAALKAIGDNRSSL--FSRKLTALYTKSTLIG 332

Query: 122  E---------------ALLQERQDNLLAAIWQDSKG----------FGYATLDISSGRF- 155
            E                ++ +   + L  I ++ +            G   +  ++G
Sbjct: 333  EDVNPLIKLDDAVNVDEIMTDTSTSYLLCISENKENVRDKKKGNIFIGIVGVQPATGEVV 392

Query: 156  --RLSEPADRETMAAELQRTNPAELLYAEDFAE-----------MSLIEGRRGLRRRPLW 202
                 + A R  +   +    P ELL     +E           +S+ + R  + R
Sbjct: 393  FDSFQDSASRSELETRMSSLQPVELLLPSALSEQTEALIHRATSVSVQDDRIRVERMDNI 452

Query: 203  EFEIDTARQQLNLQFGTRDLVGF-------GVENAPRG-LCAAGCLLQYAKD--TQRTTL 252
             FE   A Q +  +F  +D V         G+ N  +  +C+   +++Y K+   ++
Sbjct: 453  YFEYSHAFQAVT-EFYAKDTVDIKGSQIISGIVNLEKPVICSLAAIIKYLKEFNLEKMLS 511

Query: 253  PHIRSITMEREQDSIIMDAATRRNLEITQNLAG-GAENTLASVLDCTVTPMGSRMLKRWL 311
                   +  + + + ++  T RNLEI QN      + +L  VLD T T  G R LK+W+
Sbjct: 512  KPENFKQLSSKMEFMTINGTTLRNLEILQNQTDMKTKGSLLWVLDHTKTSFGRRKLKKWV 571

Query: 312  HMPVRDTRVLLERQQTIGAL----QDFTAELQPVLRQVGDLERILARLALRTARPRDLAR 367
              P+   R +  R   +  +         +++  LR++ D+ER L  +  +    ++
Sbjct: 572  TQPLLKLREINARLDAVSEVLHSESSVFGQIENHLRKLPDIERGLCSIYHKKCSTQEFFL 631

Query: 368  MRHAFQQLP-ELRAQLETVDS-APVQALREKMGEFAELRDLLER--AIIDTPPVLVRDGG 423
            +      L  E +A +  V+S      LR  + E  EL   +E    I++     V D
Sbjct: 632  IVKTLYHLKSEFQAIIPAVNSHIQSDLLRTVILEIPELLSPVEHYLKILNEQAAKVGDKT 691

Query: 424  VIASGYNE------ELDEWRALADGATDYLERLEVRERERTGLDTLKVGFNAVHG--YYI 475
             +    ++        DE + + D    +L+ +      R  L      +  V G  + I
Sbjct: 692  ELFKDLSDFPLIKKRKDEIQGVIDEIRMHLQEI------RKILKNPSAQYVTVSGQEFMI 745

Query: 476  QISRGQSHLAPINYMRRQTLKNAERYIIPELKEYEDKVLTSKGKALALEKQL-YEELFDL 534
            +I        P ++++  + K   R+  P + E   + L    + L L+    + +  +
Sbjct: 746  EIKNSAVSCIPTDWVKVGSTKAVSRFHSPFIVE-NYRHLNQLREQLVLDCSAEWLDFLEK 804

Query: 535  LLPHLEALQQSASALAELDVLVNLAERAYTLNYTCPTFIDKPGIRITEGRHPVVEQVLNE 594
               H  +L ++   LA +D + +LA+ A   +Y  PT  ++  I I  GRHPV++ +L E
Sbjct: 805  FSEHYHSLCKAVHHLATVDCIFSLAKVAKQGDYCRPTVQEERKIVIKNGRHPVIDVLLGE 864

Query: 595  P--FIANPLNLSPQ-RRMLIITGPNMGGKSTYMRQTALIALMAYIGSYVPAQKVEIGPID 651
               ++ N  +LS    R++IITGPNMGGKS+Y++Q ALI +MA IGSYVPA++  IG +D
Sbjct: 865  QDQYVPNNTDLSEDSERVMIITGPNMGGKSSYIKQVALITIMAQIGSYVPAEEATIGIVD 924

Query: 652  RIFTRVGAADDLASGRSTFMVEMTETANILHNATEYSLVLMDEIGRGTSTYDGLSLAWAC 711
             IFTR+GAAD++  GRSTFM E+T+TA I+  AT  SLV++DE+GRGTST+DG+++A+A
Sbjct: 925  GIFTRMGAADNIYKGRSTFMEELTDTAEIIRKATSQSLVILDELGRGTSTHDGIAIAYAT 984

Query: 712  AENLANKIKALTLFATHYFELTQLPEK-MEGVANVHLDAL--------------EHGDTI 756
             E     +K+LTLF THY  + +L +     V N H+  L              +  D +
Sbjct: 985  LEYFIRDVKSLTLFVTHYPPVCELEKNYSHQVGNYHMGFLVSEDESKLDPGAAEQVPDFV 1044

Query: 757  AFMHSVQDGAASKSYGLAVAALAGVPKEVIKRARQKLRELESI 799
             F++ +  G A++SYGL VA LA VP E++K+A  K +ELE +
Sbjct: 1045 TFLYQITRGIAARSYGLNVAKLADVPGEILKKAAHKSKELEGL 1087

Q5: Relatedness of the proteins: How much identity and similarity are shared by the two proteins?
Identity = 245/897
Similarity = 420/897
How does this compare to last week’s analysis of E. coli vs. Y. pestis mutS genes?
Identity = 27% vs. 80%
Similarity = 46% vs. 88%
In simple terms, what does this comparison reveal about evolutionary relatedness among these three species?
The three are related; however, as could be expected, the two bacterial proteins are more similar to each other than either is to the human protein.

Q6: Comparing the lengths of the bacterial and human proteins: How do the lengths of E. coli mutS and hMSH3 differ?
E. coli mutS = 853 aa
hMSH3 = 1128 aa
Does the alignment show the full length sequence of each protein? If not, why not?
No. BLAST is a local alignment method that shows only the regions that meet a certain minimum score.
Interpret the “Gaps” portion of the BLAST report for this alignment.
In order to create an optimal alignment, how many locations containing gaps were introduced into the sequence?
The report counts gap characters: 132 gap characters were introduced. However, there are only 18 gap locations.
Explain what the “Gaps” report means to say “Gaps = 132/897 (14%)”
Basically, 132 gap characters (indels) were required to attain the highest scoring alignment between the two sequences. This means that 14% of the best alignment consists of gap characters.

f. Study one of the human mutS homologs: point your browser to the first human gene from the BLAST report, and open the annotation for this gene (hMSH3). This will bring up a page titled “LocusLink”. This page serves as a gateway to a variety of sources of information about this gene.

g. Start by scrolling to “Annotation for this locus”, and open the hyperlink associated with “mRNA”. Paste the output here.

LOCUS       NM_002439    4374 bp    mRNA    linear   PRI 23-AUG-2004
DEFINITION  Homo sapiens mutS homolog 3 (E. coli) (MSH3), mRNA.
ACCESSION   NM_002439
VERSION     NM_002439.1  GI:4505248
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 4374)
  AUTHORS   Plaschke,J., Kruger,S., Jeske,B., Theissig,F., Kreuz,F.R.,
            Pistorius,S., Saeger,H.D., Iaccarino,I., Marra,G. and
            Schackert,H.K.
  TITLE     Loss of MSH3 protein expression is frequent in MLH1-deficient
            colorectal cancer and is associated with disease progression
  JOURNAL   Cancer Res. 64 (3), 864-870 (2004)
  PUBMED    14871813
  REMARK    GeneRIF: MSH3 abrogation may be a predictor of metastatic disease
            or even favor tumor cell spread in MLH1-deficient colorectal
            cancers.
REFERENCE   2  (bases 1 to 4374)
  AUTHORS   Mazurek,A., Berardini,M. and Fishel,R.
  TITLE     Activation of human MutS homologs by 8-oxo-guanine DNA damage
  JOURNAL   J. Biol. Chem. 277 (10), 8260-8266 (2002)
  PUBMED    11756455
  REMARK    GeneRIF: hMSH2-hMSH3 did not appear to bind any of the 8-oxo-G
            containing DNA substrates nor was there enhanced ATPase or
            ADP --> ATP exchange activities.
REFERENCE   3  (bases 1 to 4374)
  AUTHORS   Arzimanoglou,I.I., Hansen,L.L., Chong,D., Li,Z., Psaroudi,M.C.,
            Dimitrakakis,C., Jacovina,A.T., Shevchuk,M., Reid,L.,
            Hajjar,K.A., Vassilaros,S., Michalas,S., Gilbert,F.,
            Chervenak,F.A. and Barber,H.R.
  TITLE     Frequent LOH at hMLH1, a highly variable SNP in hMSH3, and
            negligible coding instability in ovarian cancer
  JOURNAL   Anticancer Res. 22 (2A), 969-975 (2002)
  PUBMED    12014680
  REMARK    GeneRIF: Frequent LOH at hMLH1, a highly variable SNP in hMSH3,
            and negligible coding instability occur in ovarian cancer.
REFERENCE   4  (bases 1 to 4374)
  AUTHORS   Ceccotti,S., Ciotta,C., Fronza,G., Dogliotti,E. and Bignami,M.
  TITLE     Multiple mutations and frameshifts are the hallmark of defective
            hPMS2 in pZ189-transfected human tumor cells
  JOURNAL   Nucleic Acids Res. 28 (13), 2577-2584 (2000)
  PUBMED    10871409
REFERENCE   5  (bases 1 to 4374)
  AUTHORS   Risinger,J.I., Umar,A., Boyd,J., Berchuck,A., Kunkel,T.A. and
            Barrett,J.C.
  TITLE     Mutation of MSH3 in endometrial cancer and evidence for its
            functional role in heteroduplex repair
  JOURNAL   Nat. Genet. 14 (1), 102-105 (1996)
  PUBMED    8782829
REFERENCE   6  (bases 1 to 4374)
  AUTHORS   Watanabe,A., Ikejima,M., Suzuki,N. and Shimada,T.
  TITLE     Genomic organization and expression of the human MSH3 gene
  JOURNAL   Genomics 31 (3), 311-318 (1996)
  PUBMED    8838312
REFERENCE   7  (bases 1 to 4374)
  AUTHORS   Fujii,H. and Shimada,T.
  TITLE     Isolation and characterization of cDNA clones derived from the
            divergently transcribed gene in the region upstream from the
            human dihydrofolate reductase gene
  JOURNAL   J. Biol. Chem. 264 (17), 10057-10064 (1989)
  PUBMED    2722860
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. The reference sequence was derived from U61981.1.
FEATURES             Location/Qualifiers
     source          1..4374
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /chromosome="5"
                     /map="5q11-q12"
     gene            1..4374
                     /gene="MSH3"
                     /db_xref="GeneID:4437"
                     /db_xref="LocusID:4437"
                     /db_xref="MIM:600887"
     CDS             17..3403
                     /gene="MSH3"
                     /function="putative mismatch repair/binding protein"
                     /note="mutS (E. coli) homolog 3;
                     go_function: ATP binding [goid 0005524] [evidence IEA];
                     go_function: damaged DNA binding [goid 0003684] [evidence IEA];
                     go_process: mismatch repair [goid 0006298] [evidence TAS]
                     [pmid 8782829]"
                     /codon_start=1
                     /product="mutS homolog 3"
                     /protein_id="NP_002430.1"
                     /db_xref="GI:4505249"
                     /db_xref="GeneID:4437"
                     /db_xref="LocusID:4437"
                     /db_xref="MIM:600887"
                     /translation="MSRRKPASGGLAASSSAPARQAVLSRFFQSTGSLKSTSSSTGAA
                     DQVDPGAAAAAAPPAPAFPPQLPPHVATEIDRRKKRPLENDGPVKKKVKKVQQKEGGS
                     DLGMSGNSEPKKCLRTRNVSKSLEKLKEFCCDSALPQSRVQTESLQERFAVLPKCTDF
                     DDISLLHAKNAVSSEDSKRQINQKDTTLFDLSQFGSSNTSHENLQKTASKSANKRSKS
                     IYTPLELQYIEMKQQHKDAVLCVECGYKYRFFGEDAEIAARELNIYCHLDHNFMTASI
                     PTHRLFVHVRRLVAKGYKVGVVKQTETAALKAIGDNRSSLFSRKLTALYTKSTLIGED
                     VNPLIKLDDAVNVDEIMTDTSTSYLLCISENKENVRDKKKGNIFIGIVGVQPATGEVV
                     FDSFQDSASRSELETRMSSLQPVELLLPSALSEQTEALIHRATSVSVQDDRIRVERMD
                     NIYFEYSHAFQAVTEFYAKDTVDIKGSQIISGIVNLEKPVICSLAAIIKYLKEFNLEK
                     MLSKPENFKQLSSKMEFMTINGTTLRNLEILQNQTDMKTKGSLLWVLDHTKTSFGRRK
                     LKKWVTQPLLKLREINARLDAVSEVLHSESSVFGQIENHLRKLPDIERGLCSIYHKKC
                     STQEFFLIVKTLYHLKSEFQAIIPAVNSHIQSDLLRTVILEIPELLSPVEHYLKILNE
                     QAAKVGDKTELFKDLSDFPLIKKRKDEIQGVIDEIRMHLQEIRKILKNPSAQYVTVSG
                     QEFMIEIKNSAVSCIPTDWVKVGSTKAVSRFHSPFIVENYRHLNQLREQLVLDCSAEW
                     LDFLEKFSEHYHSLCKAVHHLATVDCIFSLAKVAKQGDYCRPTVQEERKIVIKNGRHP
                     VIDVLLGEQDQYVPNNTDLSEDSERVMIITGPNMGGKSSYIKQVALITIMAQIGSYVP
                     AEEATIGIVDGIFTRMGAADNIYKGRSTFMEELTDTAEIIRKATSQSLVILDELGRGT
                     STHDGIAIAYATLEYFIRDVKSLTLFVTHYPPVCELEKNYSHQVGNYHMGFLVSEDES
                     KLDPGAAEQVPDFVTFLYQITRGIAARSYGLNVAKLADVPGEILKKAAHKSKELEGLI
                     NTKRKRLKYFAKLWTMHNAQDLQKWTEEFNMEETQTSLLH"
ORIGIN
        1 gggcacgagc cctgccatgt ctcgccggaa gcctgcgtcg ggcggcctcg ctgcctccag
       61 ctcagcccct gcgaggcaag cggttttgag ccgattcttc cagtctacgg gaagcctgaa
      121 atccacctcc tcctccacag gtgcagccga ccaggtggac cctggcgctg cagcggccgc
      181 agcgccccca gcgcccgcct tcccgcccca gctgccgccg cacgtagcta cagaaattga
      241 cagaagaaag aagagaccat tggaaaatga tgggcctgtt aaaaagaaag taaagaaagt
      301 ccaacaaaag gaaggaggaa gtgatctggg aatgtctggc aactctgagc caaagaaatg
      361 tctgaggacc aggaatgttt caaagtctct ggaaaaattg aaagaattct gctgcgattc
      421 tgcccttcct caaagtagag tccagacaga atctctgcag gagagatttg cagttctgcc
      481 aaaatgtact gattttgatg atatcagtct tctacacgca aagaatgcag tttcttctga
      541 agattcgaaa cgtcaaatta atcaaaagga cacaacactt tttgatctca gtcagtttgg
      601 atcatcaaat acaagtcatg aaaatttaca gaaaactgct tccaaatcag ctaacaaacg
      661 gtccaaaagc atctatacgc cgctagaatt acaatacata gaaatgaagc agcagcacaa
      721 agatgcagtt ttgtgtgtgg aatgtggata taagtataga ttctttgggg aagatgcaga
      781 gattgcagcc cgagagctca atatttattg ccatttagat cacaacttta tgacagcaag
      841 tatacctact cacagactgt ttgttcatgt acgccgcctg gtggcaaaag gatataaggt
      901 gggagttgtg aagcaaactg aaactgcagc attaaaggcc attggagaca acagaagttc
      961 actcttttcc cggaaattga ctgcccttta tacaaaatct acacttattg gagaagatgt
     1021 gaatccccta atcaagctgg atgatgctgt aaatgttgat gagataatga ctgatacttc
     1081 taccagctat cttctgtgca tctctgaaaa taaggaaaat gttagggaca aaaaaaaggg
     1141 caacattttt attggcattg tgggagtgca gcctgccaca ggcgaggttg tgtttgatag
     1201 tttccaggac tctgcttctc gttcagagct agaaacccgg atgtcaagcc tgcagccagt
     1261 agagctgctg cttccttcgg ccttgtccga gcaaacagag gcgctcatcc acagagccac
     1321 atctgttagt gtgcaggatg acagaattcg agtcgaaagg atggataaca tttattttga
     1381 atacagccat gctttccagg cagttacaga gttttatgca aaagatacag ttgacatcaa
     1441 aggttctcaa attatttctg gcattgttaa cttagagaag cctgtgattt gctctttggc
     1501 tgccatcata aaatacctca aagaattcaa cttggaaaag atgctctcca aacctgagaa
     1561 ttttaaacag ctatcaagta aaatggaatt tatgacaatt aatggaacaa cattaaggaa
     1621 tctggaaatc ctacagaatc agactgatat gaaaaccaaa ggaagtttgc tgtgggtttt
     1681 agaccacact aaaacttcat ttgggagacg gaagttaaag aagtgggtga cccagccact
     1741 ccttaaatta agggaaataa atgcccggct tgatgctgta tcggaagttc tccattcaga
     1801 atctagtgtg tttggtcaga tagaaaatca tctacgtaaa ttgcccgaca tagagagggg
     1861 actctgtagc atttatcaca aaaaatgttc tacccaagag ttcttcttga ttgtcaaaac
     1921 tttatatcac ctaaagtcag aatttcaagc aataatacct gctgttaatt cccacattca
     1981 gtcagacttg ctccggaccg ttattttaga aattcctgaa ctcctcagtc cagtggagca
     2041 ttacttaaag atactcaatg aacaagctgc caaagttggg gataaaactg aattatttaa
     2101 agacctttct gacttccctt taataaaaaa gaggaaggat gaaattcaag gtgttattga
     2161 cgagatccga atgcatttgc aagaaatacg aaaaatacta aaaaatcctt ctgcacaata
     2221 tgtgacagta tcaggacagg agtttatgat agaaataaag aactctgctg tatcttgtat
     2281 accaactgat tgggtaaagg ttggaagcac aaaagctgtg agccgctttc actctccttt
     2341 tattgtagaa aattacagac atctgaatca gctccgggag cagctagtcc ttgactgcag
     2401 tgctgaatgg cttgattttc tagagaaatt cagtgaacat tatcactcct tgtgtaaagc
     2461 agtgcatcac ctagcaactg ttgactgcat tttctccctg gccaaggtcg ctaagcaagg
     2521 agattactgc agaccaactg tacaagaaga aagaaaaatt gtaataaaaa atggaaggca
     2581 ccctgtgatt gatgtgttgc tgggagaaca ggatcaatat gtcccaaata atacagattt
     2641 atcagaggac tcagagagag taatgataat taccggacca aacatgggtg gaaagagctc
     2701 ctacataaaa caagttgcat tgattaccat catggctcag attggctcct atgttcctgc
     2761 agaagaagcg acaattggga ttgtggatgg cattttcaca aggatgggtg ctgcagacaa
     2821 tatatataaa ggacggagta catttatgga agaactgact gacacagcag aaataatcag
     2881 aaaagcaaca tcacagtcct tggttatctt ggatgaacta ggaagaggga cgagcactca
     2941 tgatggaatt gccattgcct atgctacact tgagtatttc atcagagatg tgaaatcctt
     3001 aaccctgttt gtcacccatt atccgccagt ttgtgaacta gaaaaaaatt actcacacca
     3061 ggtggggaat taccacatgg gattcttggt cagtgaggat gaaagcaaac tggatccagg
     3121 cgcagcagaa caagtccctg attttgtcac cttcctttac caaataacta gaggaattgc
     3181 agcaaggagt tatggattaa atgtggctaa actagcagat gttcctggag aaattttgaa
     3241 gaaagcagct cacaagtcaa aagagctgga aggattaata aatacgaaaa gaaagagact
     3301 caagtatttt gcaaagttat ggacgatgca taatgcacaa gacctgcaga agtggacaga
     3361 ggagttcaac atggaagaaa cacagacttc tcttcttcat taaaatgaag actacatttg
     3421 tgaacaaaaa atggagaatt aaaaatacca actgtacaaa ataactctcc agtaacagcc
     3481 tatctttgtg tgacatgtga gcataaaatt atgaccatgg tatattccta ttggaaacag
     3541 agaggttttt ctgaagacag tctttttcaa gtttctgtct tcctaacttt tctacgtata
     3601 aacactcttg aatagacttc cactttgtaa ttagaaaatt ttatggacag taagtccagt
     3661 aaagccttaa gtggcagaat ataattccca agcttttgga gggtgatata aaaatttact
     3721 tgatattttt atttgtttca gttcagataa ttggcaactg ggtgaatctg gcaggaatct
     3781 atccattgaa ctaaaataat tttattatgc aaccagttta tccaccaaga acataagaat
     3841 tttttataag tagaaagaat tggccaggca tggtggctca tgcctgtaat cccagcactt
     3901 tgggaggcca aggtaggcag atcacctgag gtcaggagtt caagaccagc ctggccaaca
     3961 tggcaaaacc ccatctttac taaaaatata aagtacatct ctactaaaaa tacgaaaaaa
     4021 ttagctgggc atggtggcgc acacctgtag tcccagctac tccggaggct gaggcaggag
     4081 aatctcttga acctgggagg cggaggttgc aatgagccga gatcacgtca ctgcactcca
     4141 gcttgggcaa cagagcaaga ctccatctca aaaaagaaaa aagaaaagaa atagaattat
     4201 caagctttta aaaactagag cacagaagga ataaggtcat gaaatttaaa aggttaaata
     4261 ttgtcatagg attaagcagt ttaaagattg ttggatgaaa ttatttgtca ttcattcaag
     4321 taataaatat ttaatgaata cttgctataa aaaaaaaaaa aaaaaaaaaa aaaa
//

h. Using this report, follow pp. 86-90 in BFD, making sure that you understand how to navigate the information in this report page (skip the section titled “Retrieving GenBank entries without accession numbers”, pp. 90-91).

Q7: What is the length of the nucleotide sequence in this report?
4,374 bp
Q8: Does it represent a DNA or RNA sequence?
mRNA
Q9: On what chromosome does this gene reside?
5
Q10: Where does the coding region begin and end (at what nucleotides)?
Nucleotides 17 to 3403

i. Using a Gene-Centric database: read and follow C & N/BFD, pp. 91-95, for this next exercise. The LocusLink tool provides a gene-centered view, in which information about a particular gene is integrated into one site, so that you can get hold of everything about that gene in one fell swoop. Use the “Back” button to return to the LocusLink page.

j.
Gene structure of hMSH3: at this point you know the length of the protein, and of the mRNA from which the hMSH3 protein is translated. But what about the length of the gene itself? To learn about the overall structure of the gene, click on the link titled “Click to Display mRNA-Genomic Alignments”. A new window, Evidence Viewer, will open. From here, choose “Go to full display with alignments”.

k. Answer the following questions using information from the full display page in Evidence Viewer:

Q11: How long is the hMSH3 gene? At what nt positions does the gene start and end?
222,896 base pairs, from 30544497 to 30767392
Q12: How many exons and introns does this gene have?
25 exons and 24 introns
Q13: What percentage of the gene is codegenic, i.e., what percentage of the gene is devoted to exons, and what percentage to introns?
From Q10, the coding region consists of 3387 nt (nucleotides 17 to 3403). Q11 gives the total gene length as 222,896 nt. Thus, only 1.52% of the gene is codegenic.

l. Would you like a summary of all that is known about this gene’s relevance to human disease? For this body of information, select OMIM, the Online Mendelian Inheritance in Man database. See if you can find the answers to the following questions:

Q14: The hMSH2 gene is known to be responsible for a hereditary cancer known as Type I Hereditary Nonpolyposis Colon Cancer (HNPCC). Is the hMSH3 gene associated with a particular cluster or family of cancers? If so, which one(s)?
Malignancies in blood cells
Q15: What happens to mice that have been genetically engineered to lack the mouse version of hMSH3 (mMSH3)? Does loss of the mMSH3 gene cause cancer predisposition?
They showed no predisposition to cancer. However, MSH3/MSH6 deficiencies showed infinitesimal tumorigenesis.
Q16: The removal of introns and joining of exons during RNA processing follows some very specific rules, which occasionally can be bent or sidestepped in small but important ways to achieve alternative splicing.
One of the most closely followed tenets is known as the “GT-AG Rule”, in which the first two bases of every intron are GT, and the last two bases of every intron are AG. Does every intron in hMSH3 obey this rule? If not, which one(s) deviates and how?
Intron 6 has an AT-AA combination instead of a GT-AG combination.

m. Making sense of the entire human genome: follow pp. 105-110 of C & N/BFD as you learn how to navigate the Ensembl Project tool, using hMSH3 as an example. e! is the symbol for the Ensembl Project, which is located at http://www.ensembl.org. Follow the procedure on pp. 106-107 to begin navigating the Ensembl website. At the top of page 108, begin your search for the hMSH3 gene by browsing the appropriate chromosome.

Q17: How long is this chromosome, and how many confirmed (known) genes are encoded on this chromosome?
The chromosome is 181,034,922 bp long. Known genes = 831/1317
(Wouldn’t you love to know what a pseudogene is? Stay tuned, we’ll talk about these fascinating relics of our past when we get to phylogenetics and evolution.)

n. From the MapView page, use Find to Lookup a Gene named MSH3.

o. From the TextView page, click on the Ensembl Gene hyperlink and move forward to the GeneView (Ensembl Gene Report) page. Scroll down this page. Under Transcript Structure, choose Exon information.

Q18: Can you discover the lengths of both the shortest and the longest introns in this gene? What are the identities of these introns, and what are their lengths?
Shortest: Intron 5-6, 380 bp
Longest: Intron 8-9, 46,359 bp

p. Click the appropriate hyperlink to View gene in genomic location. This will take you to a ContigView page. Scroll up and down the page for a few minutes to get the feel of this complex and comprehensive report, then answer the following questions:

Q19: Chromosomes are subdivided into a short arm, or p arm, and a long arm, called the q arm. These arms are further subdivided into regions that are numbered, e.g., 4p13.2.
In what subregion does hMSH3 reside?
q14.1
Where did you find this information on the ContigView page?
In the “Overview” box

Q20: In the vicinity of MSH3, are there any genes of unknown function, i.e., hypothetical genes? How are they distinguished from known genes in this schematic overview of the genome?
Yes. They are labeled as novel genes and are drawn as black bars, whereas known genes are drawn as red bars.

Q21: Post-lab assignment: next door to the MSH3 gene is a known gene named DHFR. Click on the reddish bar representing the gene to get started on the road to understanding more about this neighboring gene. Use your skills at navigating genome databases to answer the following questions about DHFR [HINT: from this starting point, you should be no more than 2-3 clicks (at most) from the answers to the questions below].
(1) What enzyme is encoded by the DHFR gene, and (briefly) what is its function?
Dihydrofolate reductase
(2) Is this gene associated with any diseases when it is lost or defective? Explain briefly.
Megaloblastic anemia
(3) Is this gene associated in any way with cancer? If so, briefly describe.
In chemotherapy, mutations in DHFR can cause resistance to methotrexate.
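
The arithmetic behind Q6, Q10, Q11 and Q13, and the GT-AG rule from Q16, can be checked with a few lines of Python. This is only a minimal sketch: the function names are illustrative and not part of any NCBI or Ensembl tool, the numbers are the ones reported in the answers above, and the two short intron strings are hypothetical toy sequences rather than real hMSH3 introns. Note that Q13 uses the coding region (CDS) as a stand-in for the exons; because exons also contain untranslated sequence, the sketch also shows the share obtained from the full 4374 nt mRNA.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


def span_length(start: int, end: int) -> int:
    """Length of an inclusive coordinate range such as 17..3403."""
    return end - start + 1


def obeys_gt_ag(intron: str) -> bool:
    """True if an intron sequence starts with GT and ends with AG."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Figures copied from the answers above.
cds_length = span_length(17, 3403)             # CDS feature (Q10)
gene_length = span_length(30544497, 30767392)  # genomic span (Q11)
mrna_length = 4374                             # LOCUS line (Q7)

coding_percent = percent(cds_length, gene_length)
mrna_percent = percent(mrna_length, gene_length)

print(f"CDS length:  {cds_length} nt")
print(f"Gene length: {gene_length} nt")
print(f"CDS share of the gene:  {coding_percent:.2f}%")
print(f"mRNA share of the gene: {mrna_percent:.2f}%")
print(f"Gaps 132/897: {percent(132, 897):.1f}%")

# Hypothetical toy introns, not taken from hMSH3.
print(obeys_gt_ag("GTAAGTCCTTTCAG"))  # begins GT, ends AG
print(obeys_gt_ag("ATAAGTCCTTTCAA"))  # begins AT, ends AA
```

The gap fraction 132/897 works out to about 14.7%, so the 14% quoted in the report is consistent with the percentage being truncated rather than rounded; that reading is an assumption based on these numbers alone.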