Ch. 6 Working with a single protein sequence
- Predicting physicochemical properties, protease digestion patterns, coiled-coil domains, and post-translational modifications
- Finding out what protein domains can tell you about protein functions
EXPASY http://tw.expasy.org/ or http://www.expasy.org
Seq. = P32851
- compute the molecular weight, extinction coefficient, instability, half-life
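A minimal sketch of two of the properties listed above, assuming average residue masses and the 280 nm coefficients commonly used for this calculation (Trp 5500, Tyr 1490, cystine 125). The function names are hypothetical, and this is not the ExPASy implementation; instability index and half-life depend on lookup tables that are not reproduced here.

```python
# Average residue masses in Da (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.0153  # one water is added back for the free N- and C-termini


def molecular_weight(sequence):
    """Average molecular weight (Da) of an unmodified linear peptide.

    Raises KeyError for non-standard residue codes such as X or B.
    """
    seq = sequence.upper()
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def extinction_coefficient(sequence, cystines_formed=False):
    """Estimated molar extinction coefficient at 280 nm (M^-1 cm^-1).

    Assumption: only Trp, Tyr and (optionally) disulfide-bonded Cys pairs
    contribute; every pair of Cys is counted as one cystine when
    cystines_formed is True.
    """
    seq = sequence.upper()
    coefficient = 5500 * seq.count("W") + 1490 * seq.count("Y")
    if cystines_formed:
        coefficient += 125 * (seq.count("C") // 2)
    return coefficient


if __name__ == "__main__":
    peptide = "GG"  # glycylglycine, a small sanity check
    print(round(molecular_weight(peptide), 2))
    print(extinction_coefficient("WYCC", cystines_formed=True))
```

Small differences from a server's output are expected, because servers may use slightly different mass tables or rounding.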
Designing a protein on a computer
PeptideCutter – protease digestions
- separate the domains in your protein
- identify potential post-translational modification by mass spectrometry
- remove a tag protein when you express a fusion protein
- make sure the protein you are cloning isn’t sensitive to some endogenous proteases
- protein is sliced, diced and chopped
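To make the idea of a protease digestion concrete, here is a minimal sketch of an in-silico trypsin digest. It assumes the simplified textbook rule (cut after K or R unless the next residue is P); PeptideCutter applies more detailed cleavage models, so its output can differ. The function name is hypothetical.

```python
def digest_trypsin(sequence):
    """Split a protein sequence into peptides using a simplified trypsin rule.

    Assumed rule: cleave on the C-terminal side of K or R,
    except when the following residue is P.
    """
    seq = sequence.upper()
    peptides = []
    start = 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    # Whatever remains after the last cleavage site is the C-terminal peptide.
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


if __name__ == "__main__":
    # The R at position 3 is followed by P, so it is not cleaved.
    print(digest_trypsin("AKRPGKT"))
```

The same loop can be adapted to other proteases by changing the residue set and the exception rule.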
http://www.expasy.org/tools/#proteome
Doing primary structure analysis
- sliding window approach, hydrophobic regions, coiled-coil regions, hydrophilic regions
- strong signals, robustness, window size problem
- window size ~ 21 a.a. for transmembrane domains
- window size ~ 7 – 11 a.a. for globular proteins
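The sliding window approach above can be sketched as follows, assuming the Kyte-Doolittle hydropathy scale (one of several scales a tool such as ProtScale offers). The function name and the default window of 21 are illustrative choices, not the server's implementation.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=21):
    """Return (center_position, mean_hydropathy) for every full window.

    Positions are 1-based. A window of about 21 suits transmembrane
    segments; about 7 to 11 suits globular proteins.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        return []
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        mean = sum(scores[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


if __name__ == "__main__":
    # Toy sequence: a hydrophobic stretch flanked by charged residues.
    toy = "KKDE" + "L" * 21 + "DEKK"
    for position, score in hydropathy_profile(toy, window=21):
        print(position, round(score, 2))
```

Changing `window` shows the window size problem directly: a small window gives a noisy profile, while a large one smooths out short hydrophobic stretches.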
Looking for transmembrane segments
- hydrophobic regions are characteristic of transmembrane proteins
ProtScale
Seq. = P78588
TMHMM http://www.cbs.dtu.dk/services
TMHMM identifies 5 segments and fails to predict another 2 segments
Looking for coiled-coil regions
COILS server http://www.ch.embnet.org/software/COILS_form.html
- protein-protein interactions often involve coiled-coil regions
Predicting post-translational modification site
PROSITE http://www.expasy.ch/tools/scanprosite
- a database of short sequence motifs that are associated with some biological function
- regular expression representation, e.g. [RK]-x-[ST] → RGT, KCS or KET
- compare your protein seq. with the list of patterns in PROSITE
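Because PROSITE patterns are regular expressions in a different notation, they can be translated for Python's `re` module. The sketch below assumes only the common syntax elements (`x`, `[...]`, `{...}`, `(n)` or `(n,m)` repeats, and `<` / `>` anchors outside brackets); the function names are hypothetical and this is not how ScanProsite itself is implemented.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as '[RK]-x-[ST]' to a Python regex."""
    regex = pattern.rstrip(".")        # some patterns end with a period
    regex = regex.replace("-", "")     # '-' only separates pattern elements
    regex = regex.replace("x", ".")    # x = any amino acid
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} = anything but P
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4) = 2 to 4 times
    regex = regex.replace("<", "^").replace(">", "$")   # terminus anchors
    return regex


def find_motifs(pattern, sequence):
    """Return (1-based start, matched text) for all hits, overlaps included."""
    regex = prosite_to_regex(pattern)
    # The lookahead lets matches that overlap each other all be reported.
    scanner = re.compile("(?=(" + regex + "))")
    return [(m.start() + 1, m.group(1)) for m in scanner.finditer(sequence.upper())]


if __name__ == "__main__":
    print(prosite_to_regex("[RK]-x-[ST]"))
    print(find_motifs("[RK]-x-[ST]", "AARGTKCSKET"))
```

A three-residue pattern like this one matches very often by chance, which is why hits from short patterns should be treated with caution.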
Finding domains in your protein
- domains are independent globular folding units
- a domain may interact with other proteins, bind an ion like calcium or zinc, or contain an active site
- InterProScan, CD server, Pfscan
InterProScan http://www.ebi.ac.uk/interpro/scan.html
Seq. = FOSB_HUMAN or P53539
All agree on the presence of leucine zippers
Understanding the InterProScan report
- yellow box, domain name, IPR###, PS00138,
- inconsistency → consult all the dbs
- domain collection library
- BLOCKS contains many short multiple alignments (blocks)
- ProDom, a collection of domains built through seq. comparison on NR
- Pfam, high-quality domains, manually curated (Pfam-A) or machine generated (Pfam-B)
Finding domains with CD server
- report a score in the output
- CD doesn’t integrate as many dbs as InterProScan
- NCBI → BLAST → use RPS-BLAST → P53539
- deselect the low-complexity box, because many domains are rich in certain amino acids, e.g. leucine zippers and glycine-rich domains
- E-value < 0.01 means significant, the lower the better
- Red domains are from SMART, blue domains are from Pfam, ragged ends indicate partial matches
- Watch out for false positive (FP) and false negative (FN) results
Finding domains with Pfscan http://hits.isb-sib.ch/cgi-bin/PFSCAN
Seq. = P53539
Only scores > 7 are considered
Green/red or bar above/below → the amino acid at this position is conserved / not expected to be here
- Pfscan found two domains not detected by InterProScan and CD, Proline-rich and Arginine-rich
Exercise
Name: _______________ Class:________________
Student ID:_______________
Point your browser to SWISS-PROT, http://tw.expasy.org/, or http://www.expasy.ch, or http://ca.expasy.org,
or http://us.expasy.org.
1. We need to predict the physicochemical properties of a well-known protein called calmodulin
(SWISS-PROT ID P06787), which mediates the control of a large number of enzymes by Ca(++).
Among the enzymes stimulated by the calmodulin-Ca(++) complex are a number of protein
kinases and phosphatases. Use the ProtParam tool, which you can find on the SWISS-PROT web
page, to predict the following properties of this protein:
(i) the theoretical pI value ______________
(ii) instability index ______________
(iii) is this protein stable? (when the index is below 40, the protein is considered to be stable)
______________
(iv) Grand average of hydropathicity value ______________
2. In this question, we will study the pattern and domain structures of calmodulin, using the PROSITE
(http://www.expasy.org/tools/scanprosite/), InterProScan (http://www.ebi.ac.uk/interpro/scan.html),
and Pfscan (http://hits.isb-sib.ch/cgi-bin/PFSCAN) servers.
For the PROSITE search, remember to deselect the “scan profiles and rules” boxes. You will obtain
several patterns after the search.
- Do you think short patterns are reliable? Yes, they are reliable. / No, they are not reliable.
- What is the domain name of the longest pattern? _________________________________
- How long is the longest domain pattern? _________________________________________
- Write down the sequence positions of all the longest domains.
_________________________________________________________________________
For InterProScan search, what is the name of the identified domain ? _______________________
For Pfscan search, how many predicted domains sequences have a score larger than 7 ? _________