midterm.

1.
(a). Use blastn.
ACCESSION NM_143133
(b).
SOURCE      Drosophila melanogaster (fruit fly)
  ORGANISM  Drosophila melanogaster
            Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;
            Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha;
            Ephydroidea; Drosophilidae; Drosophila.
(c). Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
The fly Drosophila melanogaster is one of the most intensively studied organisms in
biology and serves as a model system for the investigation of many developmental
and cellular processes common to higher eukaryotes, including humans. We have
determined the nucleotide sequence of nearly all of the approximately 120-megabase
euchromatic portion of the Drosophila genome using a whole-genome shotgun
sequencing strategy supported by extensive clone-based sequence and a high-quality
bacterial artificial chromosome physical map. Efforts are under way to close the
remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be
declared substantially complete and to support an initial analysis of genome structure
and preliminary gene annotation and interpretation. The genome encodes
approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans
genome, but with comparable functional diversity.
MeSH Terms:
Animal
Biological Transport/genetics
Chromatin/genetics
Cloning, Molecular
Computational Biology
Contig Mapping
Cytochrome P-450 Enzyme System/genetics
DNA Repair/genetics
DNA Replication/genetics
Drosophila melanogaster/metabolism
Drosophila melanogaster/genetics*
Euchromatin
Gene Library
Genes, Insect
Genome*
Heterochromatin/genetics
Insect Proteins/physiology
Insect Proteins/genetics
Insect Proteins/chemistry
Nuclear Proteins/genetics
Sequence Analysis, DNA*
Support, Non-U.S. Gov't
Support, U.S. Gov't, P.H.S.
Transcription, Genetic
Translation, Genetic
Substances:
Cytochrome P-450 Enzyme System
Nuclear Proteins
Insect Proteins
Heterochromatin
Euchromatin
Chromatin
Grant support:
P50-HG00750/HG/NHGRI
PMID: 10731132 [PubMed - indexed for MEDLINE]
2.
(a). LocusID: 6035, Cytogenetic: 14q11.1
(b). 14201796-14211796
RefSeq
3.
(a). >lcl|Sequence 1 ORF:243..713 Frame –3, Length=471
(b).
MALEKSLVRLLLLVLILLVLGWVQPSLGKESRAKKFQRQHMDSDSSPSSSSTYCNQMMRRRNMTQGRCKP
VNTFVHEPLVDVQNVCFQEKVTCKNGQGNCYKSNSSMHITDCRLTNGSRYPNCAYRTSPKERHIIVACEG
SPYVPVHFDASVEDST
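The ORF coordinates in (a) and the protein in (b) can be cross-checked with simple arithmetic: 243..713 spans 471 nucleotides, that is 157 codons, or 156 amino acids plus one stop codon. The Python sketch below does that check and shows how a minus-strand ORF could be translated. It assumes the ORF Finder coordinates are 1-based, inclusive positions on the forward strand and that the standard genetic code applies. The function names and the toy DNA string are illustrative only; the actual nucleotide sequence is not reproduced here.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate_minus_strand_orf(dna, start, end):
    """Translate an ORF given by 1-based inclusive forward-strand
    coordinates that is read on the reverse strand (frames -1, -2, -3)."""
    orf = reverse_complement(dna[start - 1:end])
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))


# Length check for the ORF reported in (a): 243..713.
orf_length = 713 - 243 + 1      # nucleotides
codons = orf_length // 3        # includes the stop codon
protein_length = codons - 1     # amino acids without the stop

# Toy example (hypothetical sequence): the reverse strand of
# positions 3..11 reads ATG AAA TAA.
toy = translate_minus_strand_orf("GGTTATTTCATCC", 3, 11)
print(orf_length, codons, protein_length, toy)
```

The protein length of 156 agrees with the "156 aa" in the RNASE1 LOCUS line of (c).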
(c).
LOCUS       RNASE1        156 aa        linear   PRI 07-APR-2003
DEFINITION ribonuclease, RNase A family, 1 (pancreatic) [Homo sapiens].
ACCESSION NP_002924
4.
(a).
LOCUS       Rnase1        149 aa        linear   ROD 07-APR-2003
DEFINITION ribonuclease, RNase A family, 1 (pancreatic); ribonuclease 1,
pancreatic [Mus musculus].
ACCESSION NP_035401
(b).
Protein in mouse
MGLEKSLILFPLFFLLLGWVQPSLGRESAAQKFQRQHMDPDGSSINSPTYCNQMMKRRDMTNGSCKPVNT
FVHEPLADVQAVCSQENVTCKNRKSNCYKSSSALHITDCHLKGNSKYPNCDYKTTQYQKHIIVACEGNPY
VPVHFDATV
Use BLAST2
Score = 226 bits (576), Expect = 7e-59
Identities = 105/152 (69%), Positives = 123/152 (80%), Gaps = 3/152 (1%)
(c).
BLAST 2 SEQUENCES RESULTS VERSION BLASTP 2.2.5 [Nov-16-2002]
Matrix: BLOSUM62
-------------------------------------------------------------------------------
Sequence 1 lcl|seq_1 Length 149 (1 .. 149)
Sequence 2 lcl|seq_2 Length 156 (1 .. 156)
NOTE: The statistics (bitscore and expect value) is calculated based on the size of nr database
Score = 226 bits (576), Expect = 7e-59
Identities = 105/152 (69%), Positives = 123/152 (80%), Gaps = 3/152 (1%)
Query: 1   MGLEKSLI---LFPLFFLLLGWVQPSLGRESAAQKFQRQHMDPDGSSINSPTYCNQMMKR 57
           M LEKSL+   L  L  L+LGWVQPSLG+ES A+KFQRQHMD D S  +S TYCNQMM+R
Sbjct: 1   MALEKSLVRLLLLVLILLVLGWVQPSLGKESRAKKFQRQHMDSDSSPSSSSTYCNQMMRR 60

Query: 58  RDMTNGSCKPVNTFVHEPLADVQAVCSQENVTCKNRKSNCYKSSSALHITDCHLKGNSKY 117
           R+MT G CKPVNTFVHEPL DVQ VC QE VTCKN + NCYKS+S++HITDC L   S+Y
Sbjct: 61  RNMTQGRCKPVNTFVHEPLVDVQNVCFQEKVTCKNGQGNCYKSNSSMHITDCRLTNGSRY 120

Query: 118 PNCDYKTTQYQKHIIVACEGNPYVPVHFDATV 149
           PNC Y+T+  ++HIIVACEG+PYVPVHFDA+V
Sbjct: 121 PNCAYRTSPKERHIIVACEGSPYVPVHFDASV 152
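The Identities and Gaps figures in the BLAST summary can be recomputed directly from the aligned rows. The sketch below is a minimal check, not part of BLAST itself: the two strings are the Query and Sbjct rows of the alignment above joined end to end, and the helper names are made up for illustration. Positives are left out because they depend on the BLOSUM62 matrix, which is not reproduced here. Integer division is used for the percentages because the report appears to round down (3/152 is shown as 1%).

```python
# Aligned rows from the three alignment blocks, concatenated.
query = (
    "MGLEKSLI---LFPLFFLLLGWVQPSLGRESAAQKFQRQHMDPDGSSINSPTYCNQMMKR"
    "RDMTNGSCKPVNTFVHEPLADVQAVCSQENVTCKNRKSNCYKSSSALHITDCHLKGNSKY"
    "PNCDYKTTQYQKHIIVACEGNPYVPVHFDATV"
)
sbjct = (
    "MALEKSLVRLLLLVLILLVLGWVQPSLGKESRAKKFQRQHMDSDSSPSSSSTYCNQMMRR"
    "RNMTQGRCKPVNTFVHEPLVDVQNVCFQEKVTCKNGQGNCYKSNSSMHITDCRLTNGSRY"
    "PNCAYRTSPKERHIIVACEGSPYVPVHFDASV"
)


def alignment_stats(q, s):
    """Return (alignment length, identities, gap columns) for two aligned rows."""
    if len(q) != len(s):
        raise ValueError("aligned rows must have equal length")
    identities = sum(a == b and a != "-" for a, b in zip(q, s))
    gaps = sum(a == "-" or b == "-" for a, b in zip(q, s))
    return len(q), identities, gaps


def percent(n, total):
    """Percentage rounded down, matching how the report prints 3/152 as 1%."""
    return 100 * n // total


length, identities, gaps = alignment_stats(query, sbjct)
print(f"Identities = {identities}/{length} ({percent(identities, length)}%)")
print(f"Gaps = {gaps}/{length} ({percent(gaps, length)}%)")
```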
CPU time: 0.04 user secs. 0.03 sys. secs 0.07 total secs.

Lambda     K      H
   0.320    0.133    0.419

Gapped
Lambda     K      H
   0.267   0.0410    0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 275
Number of Sequences: 0
Number of extensions: 11
Number of successful extensions: 1
Number of sequences better than 10.0: 1
Number of HSP's better than 10.0 without gapping: 1
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 1
length of query: 149
length of database: 456,953,620
effective HSP length: 125
effective length of query: 24
effective length of database: 456,953,495
effective search space: 10966883880
effective search space used: 10966883880
T: 9
A: 40
X1: 16 ( 7.4 bits)
X2: 129 (49.7 bits)
X3: 129 (49.7 bits)
S1: 41 (21.8 bits)
S2: 66 (30.0 bits)
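The search-space figures and the reported score and E-value are tied together by the Karlin-Altschul relations: bit score S' = (lambda*S - ln K) / ln 2 and E = m*n*2^(-S'), where m*n is the effective search space. The sketch below plugs in the gapped lambda and K and the lengths printed in the report. It is a consistency check under those assumptions, not a reimplementation of BLAST; in particular, subtracting the HSP length from the database length once is done only because it reproduces the numbers shown above.

```python
import math

# Values copied from the BLAST 2 Sequences report above.
GAPPED_LAMBDA = 0.267
GAPPED_K = 0.0410
RAW_SCORE = 576
QUERY_LENGTH = 149
DATABASE_LENGTH = 456_953_620
EFFECTIVE_HSP_LENGTH = 125

# Effective lengths and search space as printed in the report.
effective_query = QUERY_LENGTH - EFFECTIVE_HSP_LENGTH
effective_database = DATABASE_LENGTH - EFFECTIVE_HSP_LENGTH
search_space = effective_query * effective_database

# Karlin-Altschul: convert the raw score to bits, then to an E-value.
bit_score = (GAPPED_LAMBDA * RAW_SCORE - math.log(GAPPED_K)) / math.log(2)
e_value = search_space * 2.0 ** (-bit_score)

print(f"effective search space = {search_space}")
print(f"bit score = {bit_score:.1f}")
print(f"E-value   = {e_value:.1e}")
```

With lambda and K given to only three significant figures, the result lands close to the reported 226 bits and 7e-59 rather than matching to every digit.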
(d).
RPS-BLAST 2.2.6 [Apr-09-2003]
Query= local sequence:
(149 letters)
Database: cdd.v1.62
11,088 PSSMs; 2,717,223 total columns
                                                                                Score     E
PSSMs producing significant alignments:                                         (bits)  value
gnl|CDD|14893 smart00092, RNAse_Pc, Pancreatic ribonuclease ;                     189   9e-50
gnl|CDD|5327 cd00163, RNAse_Pc, Pancreatic ribonucleases (RNAse) are pyrimi...    179   1e-46
gnl|CDD|7438 pfam00074, rnaseA, Pancreatic ribonuclease. Ribonucleases. Mem...    173   8e-45
-------------------------------------------------------------------------------
gnl|CDD|14893, smart00092, RNAse_Pc, Pancreatic ribonuclease ;
CD-Length = 123 residues, 100.0% aligned
Score = 189 bits (482), Expect = 9e-50
Query: 26  RESAAQKFQRQHMDPDGSSINSPTYCNQMMKRRDMTNGSCKPVNTFVHEPLADVQAVCSQ 85
Sbjct: 1   QETRAQKFLRQHIDSTPSS-ASSNYCNQMMKRRNMTQGRCKPVNTFIHESLANVKAVCSN 59

Query: 86  ENVTCKNRKSNCYKSSSALHITDCHLKGNSKYPNCDYKTTQYQKHIIVACEGNPYVPVHF 145
Sbjct: 60  KNVTCKNGRTNCHQSNSRFQLTDCRLTGGSKYPNCRYKTTQANKFIIVACEGNPYVPVHF 119

Query: 146 DATV 149
Sbjct: 120 DGSV 123
-------------------------------------------------------------------------------
gnl|CDD|5327, cd00163, RNAse_Pc, Pancreatic ribonucleases (RNAse) are
pyrimidine-specific endonucleases found in high quantity in the pancreas of certain
mammals and of some reptiles. Involved in endonucleolytic cleavage of
3'-phosphomononucleotides and 3'-phosphooligonucleotides ending in C-P or U-P
with 2',3'-cyclic phosphate intermediates. Catalytic mechanism is a
transphosphorylation of P-O 5' bonds on the 3' side of pyrimidines and subsequent
hydrolysis to generate 3' phosphate groups. Other family members include: bovine
seminal vesicle and brain ribonucleases; kidney non-secretory ribonucleases;
liver-type ribonucleases; angiogenin, which induces vascularization of normal and
malignant tissues; eosinophil cationic protein A cytotoxin and helminthotoxin with
ribonuclease activity; and frog liver ribonuclease and frog sialic acid-binding lectin
CD-Length = 119 residues, 100.0% aligned
Score = 179 bits (456), Expect = 1e-46
Query: 28  SAAQKFQRQHMDPDGSSINSPTYCNQMMKRRDMTNGSCKPVNTFVHEPLADVQAVCSQEN 87
Sbjct: 1   TRAQKFLRQHIDSTPSG-SSSNYCNQMMKRRNMTQGRCKPVNTFVHESLADVKAVCSQKN 59

Query: 88  VTCKNRKSNCYKSSSALHITDCHLKGNSKYPNCDYKTTQYQKHIIVACEGNPYVPVHFDA 147
Sbjct: 60  VTCKNGRNNCHQSNSSFQITDCRLTGGSKYPNCRYRTTQSNKHIIVACEGNPGVPVHFDG 119
-------------------------------------------------------------------------------
gnl|CDD|7438, pfam00074, rnaseA, Pancreatic ribonuclease. Ribonucleases.
Members include pancreatic RNAase A and angiogenins. Structure is an alpha+beta
fold -- long curved beta sheet and three helices.
CD-Length = 121 residues, 99.2% aligned
Score = 173 bits (440), Expect = 8e-45
Query: 27  ESAAQKFQRQHMDPDGSSINSPTYCNQMMKRRDMTNGSCKPVNTFVHEPLADVQAVCSQE 86
Sbjct: 2   ETRAQKFQRQHIDP-NTSSSSPNYCNQMMKRRNMTQGRCKPVNTFVHESLADVKAVCSQK 60

Query: 87  NVTCKNRKSNCYKSSSALHITDCHLKGNSKYPNCDYKTTQYQKHIIVACEGNPYVPVHFD 146
Sbjct: 61  NVTCKNGQTNCYLSTSSFQLTDCRLTGGSKYPNCRYRTTPSTKRIIVACEGN--VPVHFD 118

Query: 147 ATV 149
Sbjct: 119 GSV 121
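The "% aligned" value on each CD-Length line follows from the Sbjct coordinates of the hit: the number of PSSM positions covered divided by the length of the domain model. A small sketch of that calculation, using the first and last Sbjct positions from the three alignments above (the function and dictionary names are illustrative):

```python
def percent_aligned(sbjct_start, sbjct_end, cd_length):
    """Share of the conserved-domain model covered by the alignment, in percent."""
    return 100.0 * (sbjct_end - sbjct_start + 1) / cd_length


# (first Sbjct position, last Sbjct position, CD-Length) for each hit.
hits = {
    "smart00092": (1, 123, 123),
    "cd00163": (1, 119, 119),
    "pfam00074": (2, 121, 121),
}

coverage = {name: round(percent_aligned(*span), 1) for name, span in hits.items()}
for name, value in coverage.items():
    print(f"{name}: {value}% aligned")
```

The pfam00074 hit starts at model position 2, so 120 of 121 positions are covered, which gives the 99.2% shown in the report.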
pfam00074
5.
(a). Primary accession number P07998
(b).
DE Ribonuclease pancreatic precursor (EC 3.1.27.5) (RNase 1) (RNase A)
DE (RNase UpI-1) (RIB-1).
OS Homo sapiens (Human).
The computation has been carried out on the complete sequence.
-------------------------------------------------------------------------------
Molecular weight: 17644.24
Theoretical pI: 9.10
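The molecular weight can be reproduced by summing average residue masses over the 156-residue sequence and adding one water for the free termini. The sketch below assumes the commonly tabulated average residue masses (the table is typed in here, not taken from the original answer) and treats the chain as unmodified, with cysteines reduced and the signal peptide still attached, matching "computed on the complete sequence". The theoretical pI is not recomputed because it depends on a specific set of pK values.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Human pancreatic ribonuclease precursor (P07998), as given in 3(b).
SEQ = (
    "MALEKSLVRLLLLVLILLVLGWVQPSLGKESRAKKFQRQHMDSDSSPSSSSTYCNQMMRRRNMTQGRCKP"
    "VNTFVHEPLVDVQNVCFQEKVTCKNGQGNCYKSNSSMHITDCRLTNGSRYPNCAYRTSPKERHIIVACEG"
    "SPYVPVHFDASVEDST"
)


def molecular_weight(sequence):
    """Average molecular weight of an unmodified linear peptide."""
    return sum(AVERAGE_MASS[aa] for aa in sequence) + WATER


mw = molecular_weight(SEQ)
print(f"{len(SEQ)} residues, Mw = {mw:.2f}")
```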
(c).
Weights for window positions 1,..,7, using linear weight variation model:
   1     2     3     4     5     6     7
 0.80  0.87  0.93  1.00  0.93  0.87  0.80
 edge              center            edge
MIN: 0.192
MAX: 0.947
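The seven window weights listed above are what a linear ramp from 0.80 at the window edges to 1.00 at the center produces. A minimal sketch of that ramp, assuming an odd window size and an edge weight of 0.80 as printed in the output (the function name is illustrative and is not ProtScale's own code):

```python
def linear_window_weights(window_size, edge_weight):
    """Weights rising linearly from edge_weight at both edges to 1.0 at the center.

    Assumes an odd window_size greater than 1.
    """
    half = (window_size - 1) / 2
    return [
        edge_weight + (1.0 - edge_weight) * (half - abs(i - half)) / half
        for i in range(window_size)
    ]


weights = linear_window_weights(7, 0.80)
rounded = [round(w, 2) for w in weights]
print(rounded)
```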
(d).
FindPept tool
The entered sequence is:
MALEKSLVRL LLLVLILLVL GWVQPSLGKE SRAKKFQRQH MDSDSSPSSS STYCNQMMRR
RNMTQGRCKP VNTFVHEPLV DVQNVCFQEK VTCKNGQGNC YKSNSSMHIT DCRLTNGSRY
PNCAYRTSPK ERHIIVACEG SPYVPVHFDA SVEDST
156 Amino Acids.
Theoretical pI/Mw: 9.10 / 17644.24
Entered peptide masses: 1000.000
Tolerance: ±0.5 daltons
Using monoisotopic masses of the occurring amino acid residues and interpreting
your peptide masses as [M+H]+.
Enzyme: Chymotrypsin (C-term to F/Y/W/M/L, not before P) (P00766).
Cysteine in reduced form.
--------------------------------------------------------------------------------
Matching peptides for unspecific cleavage:
User mass   DB mass    Δmass (daltons)  peptide          position  modifications  missed cleavages
1000.000    1000.415   0.415            (G)QGNCYKSNS(S)  97-105                   0
1000.000    1000.473   0.473            (V)PVHFDASVE(D)  145-153                  0
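The two "DB mass" values can be checked by hand: sum the monoisotopic residue masses of the peptide, add one water, and add one proton for the [M+H]+ ion. The sketch below does this for both matched peptides and confirms their positions in the sequence. The mass table is the standard monoisotopic set typed in for this example, cysteine is taken as reduced (as stated in the FindPept settings), and the function name is illustrative.

```python
# Monoisotopic residue masses in daltons.
MONO_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

SEQ = (
    "MALEKSLVRLLLLVLILLVLGWVQPSLGKESRAKKFQRQHMDSDSSPSSSSTYCNQMMRRRNMTQGRCKP"
    "VNTFVHEPLVDVQNVCFQEKVTCKNGQGNCYKSNSSMHITDCRLTNGSRYPNCAYRTSPKERHIIVACEG"
    "SPYVPVHFDASVEDST"
)


def mh_plus(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(MONO_MASS[aa] for aa in peptide) + WATER + PROTON


for start, end in [(97, 105), (145, 153)]:
    peptide = SEQ[start - 1:end]          # 1-based inclusive positions
    mass = mh_plus(peptide)
    print(f"{start}-{end} {peptide} [M+H]+ = {mass:.3f} "
          f"(off by {abs(mass - 1000.0):.3f} from the entered mass)")
```

Both peptides fall inside the ±0.5 dalton tolerance around the entered mass of 1000.000.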
(e).
(i).
ScanProsite
Search a sequence against PROSITE
Sequence:
MALEKSLVRL LLLVLILLVL GWVQPSLGKE SRAKKFQRQH MDSDSSPSSS STYCNQMMRR
RNMTQGRCKP VNTFVHEPLV DVQNVCFQEK VTCKNGQGNC YKSNSSMHIT DCRLTNGSRY
PNCAYRTSPK ERHIIVACEG SPYVPVHFDA SVEDST
PROSITE Release 17.44, of 26-Apr-2003
>PDOC00001 PS00001 ASN_GLYCOSYLATION N-glycosylation site [pattern]
[Warning: pattern with a high probability of occurrence].
62 - 65 NMTQ
104 - 107 NSSM
116 - 119 NGSR
>PDOC00005 PS00005 PKC_PHOSPHO_SITE Protein kinase C phosphorylation
site [pattern] [Warning: pattern with a high probability of occurrence].
92 - 94 TcK
128 - 130 SpK
>PDOC00006 PS00006 CK2_PHOSPHO_SITE Casein kinase II phosphorylation site
[pattern] [Warning: pattern with a high probability of occurrence].
128 - 131 SpkE
151 - 154 SveD
>PDOC00008 PS00008 MYRISTYL N-myristoylation site [pattern] [Warning:
pattern with a high probability of occurrence].
96 - 101 GQgnCY
>PDOC00118 PS00127 RNASE_PANCREATIC Pancreatic ribonuclease family
signature [pattern].
68 - 74 CKpvNTF
9 hits with 5 PROSITE entries
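PROSITE patterns map almost one-to-one onto regular expressions, so individual hits can be reproduced locally. The sketch below converts a small subset of the PROSITE syntax (single residues, `x`, `[..]`, `{..}` and `(n)` / `(n,m)` repeats) and scans the sequence for two of the entries above. The pattern strings `N-{P}-[ST]-{P}` for PS00001 and `C-K-x(2)-N-T-F` for PS00127 are written from memory and should be treated as assumptions; the helper names are illustrative and this is not ScanProsite's own code.

```python
import re

SEQ = (
    "MALEKSLVRLLLLVLILLVLGWVQPSLGKESRAKKFQRQHMDSDSSPSSSSTYCNQMMRRRNMTQGRCKP"
    "VNTFVHEPLVDVQNVCFQEKVTCKNGQGNCYKSNSSMHITDCRLTNGSRYPNCAYRTSPKERHIIVACEG"
    "SPYVPVHFDASVEDST"
)


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern to a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                       # any residue
            core = "."
        elif core.startswith("{"):            # excluded residues
            core = "[^" + core[1:-1] + "]"
        if low:                               # repeat count or range
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def scan(pattern, sequence):
    """Return (start, end, match) tuples with 1-based inclusive positions.

    A lookahead is used so that overlapping matches are also reported.
    """
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [
        (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
        for m in regex.finditer(sequence)
    ]


glycosylation = scan("N-{P}-[ST]-{P}", SEQ)   # PS00001 (assumed pattern)
rnase_signature = scan("C-K-x(2)-N-T-F", SEQ)  # PS00127 (assumed pattern)
print(glycosylation)
print(rnase_signature)
```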
(ii).
4
(iii)
ScanProsite
Search a sequence against PROSITE
Sequence:
MALEKSLVRL LLLVLILLVL GWVQPSLGKE SRAKKFQRQH MDSDSSPSSS STYCNQMMRR
RNMTQGRCKP VNTFVHEPLV DVQNVCFQEK VTCKNGQGNC YKSNSSMHIT DCRLTNGSRY
PNCAYRTSPK ERHIIVACEG SPYVPVHFDA SVEDST
PROSITE Release 17.44, of 26-Apr-2003
>PDOC00001 PS00001 ASN_GLYCOSYLATION N-glycosylation site [pattern]
[Warning: pattern with a high probability of occurrence].
62 - 65 NMTQ
104 - 107 NSSM
116 - 119 NGSR
>PDOC00005 PS00005 PKC_PHOSPHO_SITE Protein kinase C phosphorylation
site [pattern] [Warning: pattern with a high probability of occurrence].
92 - 94 TcK
128 - 130 SpK
>PDOC00006 PS00006 CK2_PHOSPHO_SITE Casein kinase II phosphorylation site
[pattern] [Warning: pattern with a high probability of occurrence].
128 - 131 SpkE
151 - 154 SveD
>PDOC00008 PS00008 MYRISTYL N-myristoylation site [pattern] [Warning:
pattern with a high probability of occurrence].
96 - 101 GQgnCY
>PDOC00118 PS00127 RNASE_PANCREATIC Pancreatic ribonuclease family
signature [pattern].
68 - 74 CKpvNTF
9 hits with 5 PROSITE entries