2008 Spring Biological Database, Homework 1
944295 姜俊成

This problem set is due by 2 PM, March 25, 2008. You shall upload your answers to your web site as instructed by your TA. For all questions, please provide a reference, such as a screenshot, to indicate the source of your answer.

1. Here is a nucleotide sequence:

CTCCAGGCCCGTGGGGCTGGCCCTGCACCGCCGAGCTTCCCGGGATGAGGGCCCCCGGTGTGGTCACCCG
GCGCGCCCCAGGTCGCTGAGGGACCCCGGCCAGGCGCGGAGATGGGGGTGCACGAATGTCCTGCCTGGCT
GTGGCTTCTCCTGTCCCTGCTGTCGCTCCCTCTGGGCCTCCCAGTCCTGGGCGCCCCACCACGCCTCATC
TGTGACAGCCGAGTCCTGGAGAGGTACCTCTTGGAGGCCAAGGAGGCCGAGAATATCACGACGGGCTGTG
CTGAACACTGCAGCTTGAATGAGAATATCACTGTCCCAGACACCAAAGTTAATTTCTATGCCTGGAAGAG
GATGGAGGTCGGGCAGCAGGCCGTAGAAGTCTGGCAGGGCCTGGCCCTGCTGTCGGAAGCTGTCCTGCGG
GGCCAGGCCCTGTTGGTCAACTCTTCCCAGCCGTGGGAGCCCCTGCAGCTGCATGTGGATAAAGCCGTCA
GTGGCCTTCGCAGCCTCACCACTCTGCTTCGGGCTCTGGGAGCCCAGAAGGAAGCCATCTCCCCTCCAGA
TGCGGCCTCAGCTGCTCCACTCCGAACAATCACTGCTGACACTTTCCGCAAACTCTTCCGAGTCTACTCC
AATTTCCTCCGGGGAAAGCTGAAGCTGTACACAGGGGAGGCCTGCAGGACAGGGGACAGATGACCAGGTG
TGTCCACCTGGGCATATCCACCACCTCCCTCACCAACATTGCTTGTGCCACACCCTCCCCCGCCACTCCT
GAACCCCGTCGAGGGGCTCTCAGCTCAGCGCCAGCCTGTCCCATGGACACTCCAGTGCCAGCAATGACAT
CTCAGGGGCCAGAGGAACTGTCCAGAGAGCAACTCTGAGATCTAAGGATGTCACAGGGCCAACTTGAGGG
CCCAGAGCAGGAAGCATTCAGAGAGCAGCTTTAAACTCAGGGACAGAGCCATGCTGGGAAGACGCCTGAG
CTCACTCGGCACCCTGCAAAATTTGATGCCAGGACACGCTTTGGAGGCGATTTACCTGTTTTCGCACCTA
CCATCAGGGACAGGATGACCTGGAGAACTTAGGTGGCAAGCTGTGACTTCTCCAGGTCTCACGGGCATGG

Please use database mining tools of your choice to tell me as much as you can about this sequence.

i. What gene does this sequence represent in human? Erythropoietin
What is its GI number? 89026252
GenBank accession number? NW_923574
Gene symbol? EPO
UniGene ID? Hs.2303
ii. What database(s) did you search, and what tool(s) did you use in your search? NCBI BLAST, and the Nucleotide and UniGene databases
What parameter settings did you use? Sequence and gene symbol
iii.
Retrieve one ortholog of this gene's complete mRNA sequence and protein sequence in FASTA format.

>gi|113931667|ref|NM_007942.2| Mus musculus erythropoietin (Epo), mRNA
GATGAAGACTTGCAGCGTGGACACTGGCCCAGCCCCGGGTCGCTAAGGAGCTCCGGCAGCTAGGCGCGGA
GATGGGGGTGCCCGAACGTCCCACCCTGCTGCTTTTACTCTCCTTGCTACTGATTCCTCTGGGCCTCCCA
GTCCTCTGTGCTCCCCCACGCCTCATCTGCGACAGTCGAGTTCTGGAGAGGTACATCTTAGAGGCCAAGG
AGGCAGAAAATGTCACGATGGGTTGTGCAGAAGGTCCCAGACTGAGTGAAAATATTACAGTCCCAGATAC
CAAAGTCAACTTCTATGCTTGGAAAAGAATGGAGGTGGAAGAACAGGCCATAGAAGTTTGGCAAGGCCTG
TCCCTGCTCTCAGAAGCCATCCTGCAGGCCCAGGCCCTGCTAGCCAATTCCTCCCAGCCACCAGAGACCC
TTCAGCTTCATATAGACAAAGCCATCAGTGGTCTACGTAGCCTCACTTCACTGCTTCGGGTACTGGGAGC
TCAGAAGGAATTGATGTCGCCTCCAGATACCACCCCACCTGCTCCACTCCGAACACTCACAGTGGATACT
TTCTGCAAGCTCTTCCGGGTCTACGCCAACTTCCTCCGGGGGAAACTGAAGCTGTACACGGGAGAGGTCT
GCAGGAGAGGGGACAGGTGACATGCTGCTGCCACCGTGGTGGACCGACGAACTTGCTCCCCGTCACTGTG
TCATGCCAACCCTCC

>gi|21389309|ref|NP_031968.1| erythropoietin [Mus musculus]
MGVPERPTLLLLLSLLLIPLGLPVLCAPPRLICDSRVLERYILEAKEAENVTMGCAEGPRLSENITVPDT
KVNFYAWKRMEVEEQAIEVWQGLSLLSEAILQAQALLANSSQPPETLQLHIDKAISGLRSLTSLLRVLGA
QKELMSPPDTTPPAPLRTLTVDTFCKLFRVYANFLRGKLKLYTGEVCRRGDR

From this website, there are two links to get the mRNA and protein sequences.
Compare the results obtained by blastn vs. blastp.
Blastn: Identities = 496/623 (79%)
Blastp: Identities = 133/166 (80%)
iv. Retrieve at least 5 homologs of this gene and perform a multiple sequence alignment. The human sequence is most similar to what organism? Most similar to P. troglodytes (chimpanzee).
v. Is the secondary structure of this protein known? If so, how many "helical folds" are there in its 3D protein structure? How did you determine the exact amino acid number of each helical region? Yes, there are four helical folds. On this website, choose the "Sequence Details" button at the top of the page.
vi. Is the function of this protein known? If so, what does it do? Yes: extracellular region, erythropoietin receptor binding, and hormone activity.
vii.
Which normal human tissues is this gene mainly expressed in? How did you determine this? Prostate
viii. Is this protein involved in any biological pathway(s)? If so, what does the pathway do?
KEGG pathway: Cytokine-cytokine receptor interaction (04060)
KEGG pathway: Hematopoietic cell lineage (04640)
KEGG pathway: Jak-STAT signaling pathway (04630)
ix. Do any other databases contain information about the superfamily of this target gene product? Which superfamily? How did you find out? Yes, PDB: 4-helical cytokines > EPO/TPO family
x. Look for publications relevant to the function(s) of this protein in the biomedical literature. Show one abstract of a relevant article.
To investigate the role of erythropoietin (EPO) as genetic determinant in the susceptibility to sporadic amyotrophic lateral sclerosis (SALS). We sequenced a 259-bp region spanning the 3' hypoxia-responsive element of the EPO gene in 222 Italian SALS patients and 204 healthy subjects, matched for age and ethnic origin. No potentially causative variation was detected in SALS subjects; in addition, two polymorphic variants (namely C3434T and G3544T) showed the same genotype and haplotype frequencies in patients and controls. Conversely, a weak but significant association between G3544T and age of disease onset was observed (p=0.04). Overall, our data argue against the hypothesis of EPO as a genetic risk factor for motor neuron dysfunction, at least in Italian population. However, further studies on larger cohort of patients are needed to confirm the evidence of EPO gene as modifier factor.
xi. Show the protein 3-D structure if there is any.

2. Find the zebrafish homolog of the above gene and answer the following questions:
i. On which chromosome is the zebrafish homolog located? And in human? Both the human and the zebrafish genes are on chromosome 7.
ii. Perform a cDNA and polypeptide sequence alignment between the human and zebrafish versions of this gene.
iii. How many exons does this gene have in zebrafish?
How did you determine this? Five exons
iv. What is the expression pattern of this gene in zebrafish? In human? In mouse? erythropoietin
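For question 1.i, a quick offline check on the query is to find and translate its longest open reading frame before looking it up. The Python sketch below is not part of the original answer; it assumes the forward strand only, ATG as the only start codon, and the standard genetic code (NCBI table 1), and the function names `translate` and `longest_orf` are hypothetical. As a check, the fragment ATGGGGGTGCACGAATGTCCT, taken from the query, translates to MGVHECP.

```python
# Standard genetic code (NCBI translation table 1); codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon from position 0, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[pos:pos + 3], "X")  # "X" for codons containing N or other non-ACGT symbols
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def longest_orf(dna: str) -> str:
    """Return the longest ATG...stop ORF (stop codon included) in the three forward frames."""
    dna = "".join(dna.split()).upper()  # drop spaces/newlines left over from pasting
    best = ""
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first ATG after the previous stop opens an ORF
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if pos + 3 - start > len(best):
                    best = dna[start:pos + 3]
                start = None  # ORFs without a stop codon are ignored
    return best


query = "CCATGAAATAGCC"  # toy input; paste the question 1 sequence here instead
print(translate(longest_orf(query)))  # MK
```

The translated ORF of the real query can then be compared with the protein record that BLAST reports. Tools such as NCBI ORF Finder also scan the reverse strand and alternative start codons, which this sketch deliberately skips.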
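The FASTA records retrieved for question 1.iii use the older NCBI header style `gi|<GI>|ref|<accession>| <description>`. A minimal sketch for reading such records offline is shown below; the helper names `parse_fasta` and `gi_and_accession` are hypothetical, and the header split assumes exactly that `gi|...|ref|...|` layout (newer NCBI headers no longer include GI numbers).

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without the leading '>') to its concatenated sequence."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped at any width
    if header is not None:
        records[header] = "".join(chunks)
    return records


def gi_and_accession(header: str) -> tuple[str, str]:
    """Split an old-style NCBI header 'gi|<GI>|ref|<accession>| <description>'."""
    fields = header.split("|")
    return fields[1], fields[3]


example = """>gi|21389309|ref|NP_031968.1| erythropoietin [Mus musculus]
MGVPERPTLLLLLSLLLIPLGLPVLCAPPRLICDSRVLERYILEAKEAENVTMGCAEGPRLSENITVPDT
KVNFYAWKRMEVEEQAIEVWQGLSLLSEAILQAQALLANSSQPPETLQLHIDKAISGLRSLTSLLRVLGA
QKELMSPPDTTPPAPLRTLTVDTFCKLFRVYANFLRGKLKLYTGEVCRRGDR
"""
for header, seq in parse_fasta(example).items():
    print(gi_and_accession(header), len(seq))
```

For the mouse protein record above, this prints the GI 21389309, the accession NP_031968.1, and a sequence length of 192 residues.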
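The blastn and blastp figures in question 1.iii are identical positions divided by alignment length: 496/623 is about 79.6% and 133/166 is about 80.1%. Because 79.6% is displayed as 79%, BLAST appears to truncate rather than round; that is an inference from these two numbers, not a documented rule. The same identity measure can rank the homologs in question 1.iv by similarity to the human sequence. In the sketch below, `percent_identity`, `blast_style_percent`, and `most_similar` are hypothetical helpers, and the identity definition (gap columns counted in the denominator) is one of several in common use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns where both rows carry the same non-gap residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def blast_style_percent(identities: int, length: int) -> int:
    """Whole-number percentage, truncated (assumed to match BLAST's 'Identities' display)."""
    return 100 * identities // length


def most_similar(alignment: dict[str, str], reference: str) -> tuple[str, float]:
    """Return the aligned sequence (other than the reference) with the highest identity to it."""
    ref = alignment[reference]
    scores = {name: percent_identity(ref, seq) for name, seq in alignment.items() if name != reference}
    best = max(scores, key=scores.get)
    return best, scores[best]


print(blast_style_percent(496, 623), blast_style_percent(133, 166))  # 79 80
```

For example, the first seven residues of human EPO (MGVHECP) and mouse EPO (MGVPERP) share 5 of 7 positions, about 71.4% identity. Given rows copied from a multiple sequence alignment, `most_similar` returns the species whose row agrees most often with the chosen reference row.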
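For question 1.v, the exact residue range of each helix can be read off a per-residue secondary-structure string, such as the one shown on a PDB sequence details page or produced by DSSP (H = alpha helix, G = 3-10 helix, I = pi helix, E = strand). The sketch below assumes one character per residue, with the first character being residue 1. PDB author numbering can start at another value, in which case an offset must be added. The function name `helix_ranges` and the toy string are hypothetical, not real EPO data.

```python
import re


def helix_ranges(ss: str, codes: str = "H", min_length: int = 1) -> list[tuple[int, int]]:
    """1-based inclusive (start, end) positions of runs of helix codes in a secondary-structure string."""
    runs = []
    for m in re.finditer(f"[{re.escape(codes)}]+", ss):
        if m.end() - m.start() >= min_length:
            runs.append((m.start() + 1, m.end()))  # 0-based half-open span -> 1-based inclusive
    return runs


# Toy string: '-' coil, 'H' alpha helix, 'E' strand, 'G' 3-10 helix.
print(helix_ranges("--HHHHH---EEE--GGGH-"))  # [(3, 7), (19, 19)]
```

The `min_length` filter is there because single-residue "helices" in DSSP-style output usually should not be counted as separate helical folds; the right threshold is a judgment call.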
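The searches behind questions 1.ii and 1.x can also be scripted against NCBI's public URL interfaces instead of the web forms. The sketch below only builds request URLs (no network access), using the E-utilities ESearch/EFetch endpoints and the BLAST URL API `CMD=Put` request, which returns a request ID (RID) to poll later. The helper names and default values (`retmax=20`, `blastn` against `nt`) are assumptions for illustration; check NCBI's documentation for usage limits, and note that very long queries may need to be sent with POST rather than in the URL.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """ESearch request returning up to `retmax` PubMed IDs that match `term`."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode({"db": "pubmed", "term": term, "retmax": retmax})


def pubmed_abstract_url(pmids: list[str]) -> str:
    """EFetch request for plain-text abstracts of the given PubMed IDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "rettype": "abstract", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params)


def blast_submit_url(query: str, program: str = "blastn", database: str = "nt") -> str:
    """BLAST URL API 'Put' request; the response contains an RID used to fetch results later."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    return f"{BLAST_URL}?" + urlencode(params)


print(pubmed_search_url("erythropoietin AND amyotrophic lateral sclerosis"))
```

The URLs can be opened in a browser or fetched with any HTTP client; spaces in the search term are encoded as `+` by `urlencode`.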
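For question 2.ii, the human-zebrafish cDNA and polypeptide comparisons are global alignments. Web tools such as BLAST 2 Sequences or EMBOSS Needle would normally be used. As a sketch of the underlying technique, here is a plain Needleman-Wunsch alignment with a simple match/mismatch score and a linear gap penalty. The scoring values are arbitrary assumptions; protein alignments are usually scored with a substitution matrix such as BLOSUM62 and affine gap penalties, which this toy omits. The function name `needleman_wunsch` is hypothetical.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment with linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The table uses memory proportional to len(a) × len(b), which is fine for sequences of a few hundred to a few thousand residues such as EPO cDNAs.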
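For question 2.iii, the exon count can be read from the gene's annotation rather than counted by eye, for instance from a GFF3 file downloaded from NCBI or Ensembl. GFF3 lines have nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), and exon lines name their transcript in a `Parent=` attribute. The sketch below groups exons by transcript; the function name and the toy annotation are hypothetical. Different transcripts of the same gene can have different exon counts.

```python
from collections import defaultdict


def exons_per_transcript(gff_text: str) -> dict[str, list[tuple[int, int]]]:
    """Group GFF3 'exon' features by Parent transcript, as sorted (start, end) pairs."""
    exons: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(field.split("=", 1) for field in cols[8].split(";") if "=" in field)
        for parent in attrs.get("Parent", "").split(","):  # an exon may belong to several transcripts
            if parent:
                exons[parent].append((int(cols[3]), int(cols[4])))
    return {tx: sorted(coords) for tx, coords in exons.items()}


toy = "\n".join([
    "##gff-version 3",
    "\t".join(["chr1", "toy", "gene", "1", "500", ".", "+", ".", "ID=g1"]),
    "\t".join(["chr1", "toy", "exon", "300", "400", ".", "+", ".", "ID=e2;Parent=tx1"]),
    "\t".join(["chr1", "toy", "exon", "1", "100", ".", "+", ".", "ID=e1;Parent=tx1"]),
])
for tx, coords in exons_per_transcript(toy).items():
    print(tx, len(coords), coords)  # tx1 2 [(1, 100), (300, 400)]
```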