
1. (a) Go to BLASTn at NCBI and submit the sequence; the result is NM_143133
(b) The source field on the same page gives Drosophila melanogaster (fruit fly)
(c) Following the MEDLINE or PubMed link on the same page gives the abstract
The genome sequence of Drosophila melanogaster
Science 287 (5461), 2185-2195 (2000)
The fly Drosophila melanogaster is one of the most intensively studied
organisms in biology and serves as a model system for the investigation of
many developmental and cellular processes common to higher eukaryotes,
including humans. We have determined the nucleotide sequence of nearly all
of the approximately 120-megabase euchromatic portion of the Drosophila
genome using a whole-genome shotgun sequencing strategy supported by
extensive clone-based sequence and a high-quality bacterial artificial
chromosome physical map. Efforts are under way to close the remaining gaps;
however, the sequence is of sufficient accuracy and contiguity to be declared
substantially complete and to support an initial analysis of genome structure
and preliminary gene annotation and interpretation. The genome encodes
approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis
elegans genome, but with comparable functional diversity.
2. (a) In BLAST the Genome under Human Genome Resources, submit the sequence and select
ref|NT_026437.10|Hs14_26604, which gives
LocusID: 6035, cytogenetic location: 14q11.1
(b) Clicking through to seq gives 1189643 - 1191188
>gi|29736559:1189643-1191188 Homo sapiens chromosome 14 genomic contig
TCAGGTGTTTGCAACTGCGTTTTATTGAAAGAAAGAGTGGAGGGGTTAACATGGGGCCCACCTCACAACC
CACTCTTCACCCCCAAAATCACGCAGGGATGGGACTCAGGAAAGGGAAGCATGTGTGTGTTGAATAGGAG
CCCTAACTGTAGTTACTTCTTTCACAGCAGGGAAGGAAGAGGGAAGAGGCAGCTGTGGAGAGGATGAGGT
TGAGGGAGGTGGGGTATCTCGCTGCTCTGACCTTAGGTAGAGTCCTCCACAGAAGCATCAAAGTGGACTG
GCACATATGGGCTCCCTTCACAGGCCACAATGATGTGTCTCTCCTTCGGGCTGGTCCGGTATGCACAGTT
GGGGTACCTGGAGCCGTTTGTCAGGCGGCAGTCTGTGATGTGCATGCTGGAGTTGCTCTTGTAGCAGTTG
CCCTGCCCGTTCTTGCAGGTGACCTTTTCCTGGAAACAGACATTCTGGACATCTACCAGGGGCTCGTGCA
CAAAGGTGTTCACTGGTTTGCACCGCCCCTGTGTCATATTCCGGCGCCTCATCATTTGGTTACAGTAGGT
GGAGCTGCTGCTGGGGGAACTGTCTGAGTCCATATGCTGCCGCTGGAATTTCTTGGCCCGGGATTCCTTG
CCCAGGGAAGGCTGGACCCAGCCCAGCACCAGCAGTATCAGGACAAGCAGAAGGAGCCGGACAAGAGACT
TCTCCAGAGCCATGGTGGCCTCACTTTCCCAGAAAAGCCTAATTGAGAAAAGGAGAGAGAAGGAAAAGAG
CCCTCTTAGAAAAAGAACTCCAGCTCCCACCTATGGCTTGCCTGGTTTACACTTTCTGTTTCAGGTCAGC
CAGAAAGACGTCCATCCTACTTCCTGAGGTTTTTCGTGTAATGAGACACTCACGTGTACCCTTTTCCACG
AGCGTTTTATTCCCACTCTCCAAAGCGAGGTCTTCCTCCCCTCTAGGTGGTGACCTACAAAGTGACCTAA
AATCCTGTCAGTTCCTATCTTTGAATACTGGAGTTTCTTTGATCCAAAATACATTTGTACCCAGTGCTCA
AATGGCTCATCGAAACATCAGTTTGAATAAATCATGAACCAAATTGTATCCCCGACATCCAGGGCCATGG
3. (a) Submit the sequence to ORF Finder; the result is frame: -3, DNA length: 471
(b) BLAST the frame -3 ORF to obtain the protein (FASTA)
protein length: 156
>lcl|Sequence 1 ORF:243..713 Frame -3
MALEKSLVRLLLLVLILLVLGWVQPSLGKESRAKKFQRQHMDSDSSPSSSSTYCNQMM
RRRNMTQGRCKPVNTFVHEPLVDVQNVCFQEKVTCKNGQGNCYKSNSSMHITDCRLT
NGSRYPNCAYRTSPKERHIIVACEGSPYVPVHFDASVEDST
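The frame -3 result can be spot-checked by hand: a minus-strand ORF is read from the reverse complement of the contig. The sketch below is only an illustration, not ORF Finder's implementation. It assumes the standard genetic code and that the contig excerpt in 2(b) is laid out as 70 bases per line, which is how the two short forward-strand fragments (bases 699..713 and 243..257) were located; the function names are made up for this example.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon from the first base; '*' marks a stop."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Forward-strand bases 699..713 of the excerpt: the 5' end of the minus-strand ORF.
orf_start_fwd = "CTTCTCCAGAGCCAT"
# Forward-strand bases 243..257 of the excerpt: the 3' end of the minus-strand ORF.
orf_end_fwd = "TTAGGTAGAGTCCTC"

print(translate(reverse_complement(orf_start_fwd)))  # MALEK
print(translate(reverse_complement(orf_end_fwd)))    # EDST*
```

The two outputs agree with the first and last residues of the 156-residue protein above, and 243..713 spans 471 bases, i.e. 156 codons plus the stop codon.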
(c) The same page as in (b) gives the name and accession number
DEFINITION ribonuclease, RNase A family, 1 (pancreatic) [Homo sapiens].
ACCESSION NP_002924
4. (a) In Human Genome Resources, search Maps for rnase; RNASE1 is the 14th gene in the results.
Open it, click UniGene, and find M. musculus, which gives
DEFINITION Ribonuclease pancreatic precursor (RNase 1) (RNase A).
ACCESSION P00683
(b) In BLAST 2 Sequences, select blastp and enter the two protein sequences
Identities = 105/152 (69%), Positives = 123/152 (80%)
(c) From the same results page as in (b)
NOTE: The statistics (bitscore and expect value) is calculated based on the size of nr database
Score = 226 bits (576), Expect = 7e-59
Identities = 105/152 (69%), Positives = 123/152 (80%), Gaps = 3/152 (1%)
Query: 1   MALEKSLVRLLLLVLILLVLGWVQPSLGKESRAKKFQRQHMDSDSSPSSSSTYCNQMMRR 60
           M LEKSL+   L  L  L+LGWVQPSLG+ES A+KFQRQHMD D S  +S TYCNQMM+R
Sbjct: 1   MGLEKSLI---LFPLFFLLLGWVQPSLGRESAAQKFQRQHMDPDGSSINSPTYCNQMMKR 57
Query: 61  RNMTQGRCKPVNTFVHEPLVDVQNVCFQEKVTCKNGQGNCYKSNSSMHITDCRLTNGSRY 120
           R+MT G CKPVNTFVHEPL DVQ VC QE VTCKN + NCYKS+S++HITDC L   S+Y
Sbjct: 58  RDMTNGSCKPVNTFVHEPLADVQAVCSQENVTCKNRKSNCYKSSSALHITDCHLKGNSKY 117
Query: 121 PNCAYRTSPKERHIIVACEGSPYVPVHFDASV 152
           PNC Y+T+  ++HIIVACEG+PYVPVHFDA+V
Sbjct: 118 PNCDYKTTQYQKHIIVACEGNPYVPVHFDATV 149
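The identity and gap counts in the summary line can be recomputed directly from the two aligned rows. This is a minimal sketch, not BLAST code: the helper name is invented for the example, and the use of integer truncation for the percentages is an assumption that happens to reproduce the figures reported here. Positives are left out because they depend on the scoring matrix.

```python
# Aligned rows copied from the BLAST 2 Sequences output ('-' marks a gap).
query = (
    "MALEKSLVRLLLLVLILLVLGWVQPSLGKESRAKKFQRQHMDSDSSPSSSSTYCNQMMRR"
    "RNMTQGRCKPVNTFVHEPLVDVQNVCFQEKVTCKNGQGNCYKSNSSMHITDCRLTNGSRY"
    "PNCAYRTSPKERHIIVACEGSPYVPVHFDASV"
)
sbjct = (
    "MGLEKSLI---LFPLFFLLLGWVQPSLGRESAAQKFQRQHMDPDGSSINSPTYCNQMMKR"
    "RDMTNGSCKPVNTFVHEPLADVQAVCSQENVTCKNRKSNCYKSSSALHITDCHLKGNSKY"
    "PNCDYKTTQYQKHIIVACEGNPYVPVHFDATV"
)


def alignment_stats(a: str, b: str) -> tuple[int, int, int]:
    """Return (identities, gap columns, alignment length) for two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    identities = sum(x == y and x != "-" for x, y in zip(a, b))
    gaps = sum(x == "-" or y == "-" for x, y in zip(a, b))
    return identities, gaps, len(a)


identities, gaps, length = alignment_stats(query, sbjct)
# Integer division truncates, which matches the percentages shown above.
print(f"Identities = {identities}/{length} ({100 * identities // length}%)")
print(f"Gaps = {gaps}/{length} ({100 * gaps // length}%)")
```

This prints 105/152 (69%) and 3/152 (1%), in agreement with the summary line.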
(d) Pancreatic ribonuclease. Ribonucleases. Members include pancreatic RNAase A
and angiogenins. Structure is an alpha+beta fold -- long curved beta sheet and
three helices.
pfam00074
5. (a) In the SIB BLAST Network Service on ExPASy, submit the protein sequence.
The first hit is RNP_HUMAN Ribonuclease pancreatic precursor (EC
3.1.27... 277 3e-74
P07998
(b) Compute pI/Mw
Molecular weight: 17644.24
Theoretical pI: 9.10
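The molecular weight can be approximated by summing average residue masses and adding one water for the free termini. The sketch below is not the Compute pI/Mw implementation; the mass table is the commonly used set of average isotopic residue masses, typed in by hand for this example, and should be checked against the tool's documentation before being relied on. The theoretical pI is not attempted here.

```python
# Average isotopic masses of amino acid residues in Da (assumed values,
# i.e. each amino acid minus one water).
AVERAGE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # average mass of H2O, added once per chain

# The 156-residue precursor from question 3(b).
PRECURSOR = (
    "MALEKSLVRLLLLVLILLVLGWVQPSLGKESRAKKFQRQHMDSDSSPSSSSTYCNQMM"
    "RRRNMTQGRCKPVNTFVHEPLVDVQNVCFQEKVTCKNGQGNCYKSNSSMHITDCRLT"
    "NGSRYPNCAYRTSPKERHIIVACEGSPYVPVHFDASVEDST"
)


def molecular_weight(protein: str) -> float:
    """Average molecular weight of an unmodified linear peptide chain."""
    return sum(AVERAGE_MASS[aa] for aa in protein) + WATER


print(f"{molecular_weight(PRECURSOR):.2f}")       # 17644.24 (full precursor)
print(f"{molecular_weight(PRECURSOR[28:]):.2f}")  # 14574.33 (chain 29-156)
```

The first value matches the molecular weight reported above; the second matches the average mass that PeptideMass reports for the chain at positions 29 - 156 in part (d).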
(c) ProtScale gives
2.
(d) PeptideMass with chymotrypsin cleavage gives 5 peptides
Chain RIBONUCLEASE PANCREATIC at positions 29 - 156 [Theoretical pI: 8.98 /
Mw (average mass): 14574.33 / Mw (monoisotopic mass): 14564.78]
mass       position  #MC  peptide sequence
2622.3609  126-148   0    RTSPKERHIIVACEGSPYVPVHF
1571.6941  88-101    0    QEKVTCKNGQGNCY
1250.6310  64-74     0    TQGRCKPVNTF
1245.5317  115-125   0    TNGSRYPNCAY
1219.4597  42-53     0    DSDSSPSSSSTY
    1          11         21         31         41         51
    |          |          |          |          |          |
  1                               ke srakkfqrqh mDSDSSPSSS STYcnqmmrr   60
 61 rnmTQGRCKP VNTFvheplv dvqnvcfQEK VTCKNGQGNC Yksnssmhit dcrlTNGSRY  120
121 PNCAYRTSPK ERHIIVACEG SPYVPVHFda svedst
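The masses in the peptide table are consistent with singly protonated monoisotopic masses, [M+H]+, of unmodified peptides. The sketch below recomputes them under that assumption. The residue mass table holds standard monoisotopic values rounded to five decimals and was entered by hand for this example; it makes no attempt to reproduce the cleavage rules PeptideMass applies.

```python
# Monoisotopic residue masses in Da (assumed standard values, 5 decimals).
MONO_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.01056  # monoisotopic H2O
PROTON = 1.00728       # mass of one proton


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(MONO_MASS[aa] for aa in peptide) + WATER_MONO + PROTON


# Peptides and masses as listed in the PeptideMass table.
REPORTED = {
    "RTSPKERHIIVACEGSPYVPVHF": 2622.3609,
    "QEKVTCKNGQGNCY": 1571.6941,
    "TQGRCKPVNTF": 1250.6310,
    "TNGSRYPNCAY": 1245.5317,
    "DSDSSPSSSSTY": 1219.4597,
}

for peptide, reported in REPORTED.items():
    computed = mh_plus(peptide)
    print(f"{peptide:<24}{computed:10.4f}  reported {reported:.4f}")
```

Each computed value agrees with the reported mass to within about 0.001 Da, which is the rounding error expected from the five-decimal table.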
(e)
(i) ScanProsite finds 5 patterns
(ii) A Prosite search returns 4 results
(iii)
>PDOC00001 PS00001 ASN_GLYCOSYLATION N-glycosylation site [pattern]
[Warning: pattern with a high probability of occurrence].
62 - 65 NMTQ
104 - 107 NSSM
116 - 119 NGSR
>PDOC00005 PS00005 PKC_PHOSPHO_SITE Protein kinase C phosphorylation
site [pattern] [Warning: pattern with a high probability of occurrence].
92 - 94 TcK
128 - 130 SpK
>PDOC00006 PS00006 CK2_PHOSPHO_SITE Casein kinase II phosphorylation
site [pattern] [Warning: pattern with a high probability of occurrence].
128 - 131 SpkE
151 - 154 SveD
>PDOC00008 PS00008 MYRISTYL N-myristoylation site [pattern] [Warning:
pattern with a high probability of occurrence].
96 - 101 GQgnCY
>PDOC00118 PS00127 RNASE_PANCREATIC Pancreatic ribonuclease family
signature [pattern].
68 - 74 CKpvNTF
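The pattern hits listed above can be reproduced with ordinary regular expressions. This is a rough sketch rather than ScanProsite itself: the five PROSITE patterns are written from memory and translated to Python regex syntax by hand ({X} becomes [^X], x(2) becomes .{2}), so they should be checked against the current PROSITE entries. The function name is invented for the example.

```python
import re

# The 156-residue precursor from question 3(b).
PROTEIN = (
    "MALEKSLVRLLLLVLILLVLGWVQPSLGKESRAKKFQRQHMDSDSSPSSSSTYCNQMM"
    "RRRNMTQGRCKPVNTFVHEPLVDVQNVCFQEKVTCKNGQGNCYKSNSSMHITDCRLT"
    "NGSRYPNCAYRTSPKERHIIVACEGSPYVPVHFDASVEDST"
)

# Assumed PROSITE patterns, hand-translated to regex.
PATTERNS = {
    "PS00001 ASN_GLYCOSYLATION": r"N[^P][ST][^P]",             # N-{P}-[ST]-{P}
    "PS00005 PKC_PHOSPHO_SITE": r"[ST].[RK]",                  # [ST]-x-[RK]
    "PS00006 CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",               # [ST]-x(2)-[DE]
    "PS00008 MYRISTYL": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",      # G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}
    "PS00127 RNASE_PANCREATIC": r"CK.{2}NTF",                  # C-K-x(2)-N-T-F
}


def scan(pattern: str, protein: str) -> list[tuple[int, int, str]]:
    """Return (start, end, match) for every hit, 1-based and inclusive.

    The pattern is wrapped in a lookahead so overlapping hits are not skipped.
    """
    regex = re.compile(f"(?=({pattern}))")
    return [
        (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
        for m in regex.finditer(protein)
    ]


for name, pattern in PATTERNS.items():
    print(name)
    for start, end, match in scan(pattern, PROTEIN):
        print(f"  {start} - {end} {match}")
```

Run on the precursor sequence, this reports the same positions as the listing above (for example 62 - 65, 104 - 107 and 116 - 119 for the N-glycosylation pattern), though it prints matches in upper case throughout.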