HW1

2008 Spring Biological database Homework 1
944295 姜俊成
This problem set is due by 2 PM, March 25, 2008. Upload your answers to your
web site as instructed by your TA. For all questions, please include a reference, such as
a screenshot, to indicate the source of your answer.
1. Here is a nucleotide sequence:
CTCCAGGCCCGTGGGGCTGGCCCTGCACCGCCGAGCTTCCCGGGATGAGGGCCCCCGGTGTGGTCACCCG
GCGCGCCCCAGGTCGCTGAGGGACCCCGGCCAGGCGCGGAGATGGGGGTGCACGAATGTCCTGCCTGGCT
GTGGCTTCTCCTGTCCCTGCTGTCGCTCCCTCTGGGCCTCCCAGTCCTGGGCGCCCCACCACGCCTCATC
TGTGACAGCCGAGTCCTGGAGAGGTACCTCTTGGAGGCCAAGGAGGCCGAGAATATCACGACGGGCTGTG
CTGAACACTGCAGCTTGAATGAGAATATCACTGTCCCAGACACCAAAGTTAATTTCTATGCCTGGAAGAG
GATGGAGGTCGGGCAGCAGGCCGTAGAAGTCTGGCAGGGCCTGGCCCTGCTGTCGGAAGCTGTCCTGCGG
GGCCAGGCCCTGTTGGTCAACTCTTCCCAGCCGTGGGAGCCCCTGCAGCTGCATGTGGATAAAGCCGTCA
GTGGCCTTCGCAGCCTCACCACTCTGCTTCGGGCTCTGGGAGCCCAGAAGGAAGCCATCTCCCCTCCAGA
TGCGGCCTCAGCTGCTCCACTCCGAACAATCACTGCTGACACTTTCCGCAAACTCTTCCGAGTCTACTCC
AATTTCCTCCGGGGAAAGCTGAAGCTGTACACAGGGGAGGCCTGCAGGACAGGGGACAGATGACCAGGTG
TGTCCACCTGGGCATATCCACCACCTCCCTCACCAACATTGCTTGTGCCACACCCTCCCCCGCCACTCCT
GAACCCCGTCGAGGGGCTCTCAGCTCAGCGCCAGCCTGTCCCATGGACACTCCAGTGCCAGCAATGACAT
CTCAGGGGCCAGAGGAACTGTCCAGAGAGCAACTCTGAGATCTAAGGATGTCACAGGGCCAACTTGAGGG
CCCAGAGCAGGAAGCATTCAGAGAGCAGCTTTAAACTCAGGGACAGAGCCATGCTGGGAAGACGCCTGAG
CTCACTCGGCACCCTGCAAAATTTGATGCCAGGACACGCTTTGGAGGCGATTTACCTGTTTTCGCACCTA
CCATCAGGGACAGGATGACCTGGAGAACTTAGGTGGCAAGCTGTGACTTCTCCAGGTCTCACGGGCATGG
Please use database mining tools of your choice to tell me as much as you can
about this sequence.
i.
What gene does this sequence represent in human? Erythropoietin
What is its GI number? 89026252
GenBank Accession number? NW_923574
Gene symbol? EPO
Unigene ID? Hs.2303
ii.
What database(s) did you search, and what tool(s) did you use in your search?
NCBI BLAST, with the Nucleotide and UniGene databases.
What parameter settings did you use?
The sequence and the gene symbol as queries.
iii.
Retrieve one ortholog of this gene’s complete mRNA sequence and Protein
sequence in FASTA format.
>gi|113931667|ref|NM_007942.2| Mus musculus erythropoietin (Epo), mRNA
GATGAAGACTTGCAGCGTGGACACTGGCCCAGCCCCGGGTCGCTAAGGAGCTCCGGCAGCTAGGCGCGGA
GATGGGGGTGCCCGAACGTCCCACCCTGCTGCTTTTACTCTCCTTGCTACTGATTCCTCTGGGCCTCCCA
GTCCTCTGTGCTCCCCCACGCCTCATCTGCGACAGTCGAGTTCTGGAGAGGTACATCTTAGAGGCCAAGG
AGGCAGAAAATGTCACGATGGGTTGTGCAGAAGGTCCCAGACTGAGTGAAAATATTACAGTCCCAGATAC
CAAAGTCAACTTCTATGCTTGGAAAAGAATGGAGGTGGAAGAACAGGCCATAGAAGTTTGGCAAGGCCTG
TCCCTGCTCTCAGAAGCCATCCTGCAGGCCCAGGCCCTGCTAGCCAATTCCTCCCAGCCACCAGAGACCC
TTCAGCTTCATATAGACAAAGCCATCAGTGGTCTACGTAGCCTCACTTCACTGCTTCGGGTACTGGGAGC
TCAGAAGGAATTGATGTCGCCTCCAGATACCACCCCACCTGCTCCACTCCGAACACTCACAGTGGATACT
TTCTGCAAGCTCTTCCGGGTCTACGCCAACTTCCTCCGGGGGAAACTGAAGCTGTACACGGGAGAGGTCT
GCAGGAGAGGGGACAGGTGACATGCTGCTGCCACCGTGGTGGACCGACGAACTTGCTCCCCGTCACTGTG
TCATGCCAACCCTCC
>gi|21389309|ref|NP_031968.1| erythropoietin [Mus musculus]
MGVPERPTLLLLLSLLLIPLGLPVLCAPPRLICDSRVLERYILEAKEAENVTMGCAEGPRLSENITVPDT
KVNFYAWKRMEVEEQAIEVWQGLSLLSEAILQAQALLANSSQPPETLQLHIDKAISGLRSLTSLLRVLGA
QKELMSPPDTTPPAPLRTLTVDTFCKLFRVYANFLRGKLKLYTGEVCRRGDR
From this website, there are two links to get the mRNA and protein sequences.
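As a rough illustration of how the retrieved FASTA records can be read programmatically, here is a minimal sketch in Python. The helper names `parse_fasta` and `split_ncbi_header` are made up for this example, and the header splitting assumes the old-style `gi|...|ref|...|` layout shown in the records above.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_ncbi_header(header):
    """Split a 'gi|<gi>|ref|<accession>| description' header into parts.

    Assumption: the header follows the layout used in the records above.
    """
    fields = header.split("|", 4)
    return {
        "gi": fields[1],
        "accession": fields[3],
        "description": fields[4].strip(),
    }


# The mouse protein record retrieved above.
record = "\n".join([
    ">gi|21389309|ref|NP_031968.1| erythropoietin [Mus musculus]",
    "MGVPERPTLLLLLSLLLIPLGLPVLCAPPRLICDSRVLERYILEAKEAENVTMGCAEGPRLSENITVPDT",
    "KVNFYAWKRMEVEEQAIEVWQGLSLLSEAILQAQALLANSSQPPETLQLHIDKAISGLRSLTSLLRVLGA",
    "QKELMSPPDTTPPAPLRTLTVDTFCKLFRVYANFLRGKLKLYTGEVCRRGDR",
])

header, protein = next(parse_fasta(record))
info = split_ncbi_header(header)
print(info["accession"], len(protein))
```

The same parser works on the mRNA record, since it only relies on the `>` header line and the sequence lines that follow it.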
Compare the results obtained by blastn vs. blastp.
Blastn: Identities = 496/623 (79%)
Blastp: Identities = 133/166 (80%)
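The two identity figures are simply matches divided by alignment length. The short sketch below recomputes them from the counts reported above; `percent_identity` is a name chosen for this example. Judging from the 496/623 case, the reported percentage appears to be truncated rather than rounded.

```python
def percent_identity(matches, aligned_length):
    """Return identity as a percentage of the aligned length."""
    return 100.0 * matches / aligned_length


blastn = percent_identity(496, 623)  # nucleotide alignment (human vs. mouse mRNA)
blastp = percent_identity(133, 166)  # protein alignment (human vs. mouse protein)

print(f"blastn: {blastn:.1f}%  (reported as {int(blastn)}%)")
print(f"blastp: {blastp:.1f}%  (reported as {int(blastp)}%)")
```

Note that the two alignments are measured in different units (623 nucleotides versus 166 amino acids), so the percentages are comparable only as ratios, not as counts.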
iv.
Retrieve at least 5 homologenes of this gene. Perform a multiple sequence
alignment? The human sequence is most similar to what organism?
The human sequence is most similar to that of P. troglodytes (chimpanzee).
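One way to decide which organism is "most similar" from a multiple sequence alignment is to compute pairwise identity against the human row. The sketch below shows the idea on toy data: the aligned sequences and the species labels are invented for illustration and are not the actual EPO alignment, and `pairwise_identity` is a hypothetical helper.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


# Toy alignment (hypothetical rows, all the same length).
reference = "MKTAYIAKQR"
others = {
    "species_a": "MKTAYIAKQR",
    "species_b": "MKTGYIVKQR",
    "species_c": "MR-AYLAK-R",
}

scores = {name: pairwise_identity(reference, seq) for name, seq in others.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Ignoring gapped columns is one of several possible conventions; counting gaps as mismatches would give lower scores for rows with many gaps.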
v.
Is the secondary structure of this protein known? If so, how many “helical
folds” are there in its 3D protein structure? How did you determine the exact
amino acid number of each helical region?
Yes, there are four helical folds. On this website, choose the sequence detail
button at the top of the page.
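The residue range of each helix can also be read from the HELIX records of a PDB-format file, which store the start and end residue numbers in fixed columns. The following is a minimal sketch under that assumption; `parse_helix_records` is a name chosen here, and the two sample records are made up for illustration, not taken from the erythropoietin structure.

```python
def parse_helix_records(pdb_text):
    """Extract helix ID, chain, and residue range from PDB HELIX records."""
    helices = []
    for line in pdb_text.splitlines():
        if not line.startswith("HELIX "):
            continue
        helices.append({
            "helix_id": line[11:14].strip(),  # columns 12-14
            "chain": line[19],                # column 20
            "start": int(line[21:25]),        # columns 22-25
            "end": int(line[33:37]),          # columns 34-37
        })
    return helices


# Hypothetical sample records (not from a real PDB entry).
sample = "\n".join([
    "REMARK made-up example records",
    "HELIX    1  H1 ALA A   10  LEU A   25  1",
    "HELIX    2  H2 GLY A   40  SER A   58  1",
])

helices = parse_helix_records(sample)
for h in helices:
    length = h["end"] - h["start"] + 1
    print(h["helix_id"], h["chain"], h["start"], h["end"], length)
```

Fixed-column slicing is used instead of splitting on whitespace because fields in PDB records can run together when values fill their full column width.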
vi.
Is the function of this protein known? If so, what does it do?
Yes. Extracellular region, erythropoietin receptor binding, and hormone activity.
vii.
Which normal human tissues is this gene mainly expressed in? How did you
determine this?
prostate
viii.
Is this protein involved in any biological pathway(s)? If so, what does the
pathway do?
KEGG pathway: Cytokine-cytokine receptor interaction (04060)
KEGG pathway: Hematopoietic cell lineage (04640)
KEGG pathway: Jak-STAT signaling pathway (04630)
ix.
Do any other databases contain information about the superfamily of this
target gene product? Which superfamily? How did you find out?
Yes, PDB.
4-helical cytokines > EPO/TPO family
x.
Look for publications relevant to the function(s) of this protein in the
biomedical literature. Show one abstract of a relevant article.
To investigate the role of erythropoietin (EPO) as genetic determinant in the
susceptibility to sporadic amyotrophic lateral sclerosis (SALS). We sequenced a
259-bp region spanning the 3'hypoxia-responsive element of the EPO gene in 222
Italian SALS patients and 204 healthy subjects, matched for age and ethnic origin.
No potentially causative variation was detected in SALS subjects; in addition, two
polymorphic variants (namely C3434T and G3544T) showed the same genotype
and haplotype frequencies in patients and controls. Conversely, a weak but
significant association between G3544T and age of disease onset was observed
(p=0.04). Overall, our data argue against the hypothesis of EPO as a genetic risk
factor for motor neuron dysfunction, at least in Italian population. However,
further studies on larger cohort of patients are needed to confirm the evidence of
EPO gene as modifier factor.
xi.
Show the protein 3-D structure if there is any.
2. Find the zebrafish homolog of the above gene, and answer the following
questions:
i.
The zebra fish homolog is located on which chromosome? And in Human?
In both human and zebrafish, the gene is on chromosome 7.
ii.
Perform a cDNA and Polypeptide sequence alignment between human and
zebra fish of this gene.
iii.
How many exons does this gene have in zebrafish? How did you determine
this?
Five exons
iv.
What is the expression pattern of this gene in zebrafish? In human? In mouse?
erythropoietin