The pattern

Corrections
N-linked glycosylation (GlcNAc):
Look at the Swiss-Prot annotation (in a random ‘glycosylated’ entry)
Query:
annotation:(type:carbohyd "N-linked (GlcNAc...)" confidence:experimental) reviewed:yes
Taxonomic distribution
TPNLINDTME
Multiple alignment (ClustalW)
-[LAPIQ]-N-[HAYRCS]-[ST]-[KLESGM]
N-glycosylation is rare in Bacteria: …false positives!
301 proteins (within the set of 1000 proteins) are N-glycosylated
according to the UniProtKB annotation…!
Scan Prosite with the official pattern
The official pattern also matches bacterial sequences (false positives)
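A minimal sketch of how such a scan can be reproduced locally. It assumes that the "official pattern" is PROSITE PS00001, N-{P}-[ST]-{P}, which is not spelled out on the slide. The helper names `prosite_to_regex` and `scan` are illustrative and are not part of ScanProsite. Only the basic syntax is handled (residues, [..] classes, {..} exclusions, x, repetitions and the < > anchors).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        if not element:
            continue
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if m:                                # x(3) or x(2,4)
            element = m.group(1)
            upper = "," + m.group(3) if m.group(3) else ""
            repeat = "{" + m.group(2) + upper + "}"
        if element in ("x", "X"):            # any residue
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"  # excluded residues
        else:
            core = element                   # single residue or [ABC] class
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def scan(pattern: str, sequence: str):
    """Return (1-based position, matched segment) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


print(scan("N-{P}-[ST]-{P}", "TPNLINDTME"))
```

On the peptide TPNLINDTME this reports a single site, NDTM at position 6. A pattern this short matches any sequence by chance, whatever the organism, which is why taxonomy has to be checked separately.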
PRATT pattern with 20 sequences
D-K-T-G-T-[IL]-T-x(3)-[ILMV]-x-[FILV]
AT31_HUMAN:
SIMILARITY: Belongs to the cation transport ATPase (P-type) family.
Type V subfamily.
The pattern is a discriminator for the cation-transporting ATPase family
C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
Pattern scan
The pattern misses some Zn fingers in the same protein
e.g. Q24174
Pattern
Profile
Not found with the pattern
The pattern:
C-X(2,4)-C-X(3)-[LIVMFYWC]-X(8)-H-X(3,5)-H
Should include:
YRCVLCGTVAKSRNSLHSHMSrQHRGIST
C-X(2,4)-C-X(3)-[LIVMFYWCA]-X(8)-H-X(3,5)-H
Yes !
But:
The pattern becomes less restrictive.
You get more sequences which should not be here.
(As the results are limited to 1000, the number of hits is not the
same…)
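The effect of adding A to the residue class can be checked directly on the missed finger. This is a small sketch in which the two patterns are hand-translated to regular expressions; the names `ORIGINAL`, `RELAXED` and `first_match` are illustrative only.

```python
import re

# C-X(2,4)-C-X(3)-[LIVMFYWC]-X(8)-H-X(3,5)-H and its relaxed variant (A added)
ORIGINAL = r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H"
RELAXED = r"C.{2,4}C.{3}[LIVMFYWCA].{8}H.{3,5}H"

# Missed zinc finger from the slide (upper-cased before matching)
MISSED_FINGER = "YRCVLCGTVAKSRNSLHSHMSrQHRGIST".upper()


def first_match(regex: str, sequence: str):
    """Return (1-based start, matched segment) or None if there is no hit."""
    m = re.search(regex, sequence)
    return None if m is None else (m.start() + 1, m.group(0))


print("original:", first_match(ORIGINAL, MISSED_FINGER))
print("relaxed: ", first_match(RELAXED, MISSED_FINGER))
```

The original pattern fails because the residue three positions after the second Cys is an Ala. The relaxed pattern matches CVLCGTVAKSRNSLHSHMSRQH starting at position 3.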
Discriminators (Signatures, descriptors) for the
Zinc finger C2H2 type domain can be found in Prosite (Pattern and
Profile) and Pfam (HMM)
Step 1: scan UniProtKB/Swiss-Prot with the pattern
Use the ‘scanprosite’ tool at http://www.expasy.org/tools/scanprosite/
Step 2: Retrieve the matched human entries @ UniProt
(go to the end of the Scan Prosite result page: click on ‘Matched UniProtKB entries’)
Step 3: Retrieve the sequences annotated as being
‘phosphorylated on a Thr’
-> 19 candidates to be manually checked ….
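The three steps amount to successive filters on the list of pattern hits. The sketch below shows that logic on made-up records: the `Entry` class, the `TOY_…` accessions and their annotations are hypothetical placeholders, not real UniProtKB data. In practice the filtering is done with the UniProt query interface.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    accession: str
    organism: str
    modified_residues: list = field(default_factory=list)


def select_candidates(matched, organism="Homo sapiens", modification="Phosphothreonine"):
    """Keep pattern-matched entries from one organism carrying a given modification."""
    return [
        e.accession
        for e in matched
        if e.organism == organism and modification in e.modified_residues
    ]


# Hypothetical pattern hits (step 1), already retrieved as records
hits = [
    Entry("TOY_1", "Homo sapiens", ["Phosphoserine"]),
    Entry("TOY_2", "Homo sapiens", ["Phosphothreonine", "Phosphoserine"]),
    Entry("TOY_3", "Mus musculus", ["Phosphothreonine"]),
]

print(select_candidates(hits))  # steps 2 and 3 combined
```

The entries that survive both filters still have to be checked by hand: an annotated phosphothreonine somewhere in the protein does not mean that it lies inside the pattern match.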
InterPro scan results
InterPro: other schema (Graphical view from UniProtKB)
InterPro schema
Pfam Graphical view
Prosite Graphical view
Blast @ NCBI against Swiss-Prot
NCBI: Color key for alignment scores
NCBI Swiss-Prot does not contain the alternative sequences (e.g. P28175-2)
!! NCBI gives the ‘version number’ of the Swiss-Prot sequence (e.g. Q8BU25.2)….
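The two identifier styles are easy to confuse when comparing hit lists from both sites. Below is a small sketch of a parser that separates the accession from an isoform suffix (-2) and a sequence version suffix (.2). The function name `parse_identifier` and the regular expression are illustrative assumptions, not an official UniProt or NCBI format definition.

```python
import re
from typing import NamedTuple, Optional


class SeqId(NamedTuple):
    accession: str
    isoform: Optional[int]   # e.g. 2 in P28175-2 (alternative sequence)
    version: Optional[int]   # e.g. 2 in Q8BU25.2 (sequence version at NCBI)


_ID_RE = re.compile(r"^([A-Z0-9]+)(?:-(\d+))?(?:\.(\d+))?$")


def parse_identifier(identifier: str) -> SeqId:
    """Split an identifier into accession, isoform number and version number."""
    m = _ID_RE.match(identifier.strip())
    if m is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    accession, isoform, version = m.groups()
    return SeqId(
        accession,
        int(isoform) if isoform else None,
        int(version) if version else None,
    )


print(parse_identifier("P28175-2"))
print(parse_identifier("Q8BU25.2"))
```

Comparing only the `accession` field avoids treating Q8BU25.2 and Q8BU25 as two different proteins.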
UniProt: Color code for identity scores (not alignment !)
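To make the difference concrete, here is a sketch of a percent identity calculation on a pairwise alignment. It uses one common convention (identical residues divided by the number of columns without a gap), which is an assumption and not necessarily the definition UniProt uses. An alignment score would also reward similar residues and penalise gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":   # skip columns containing a gap
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy alignment (made up for illustration)
print(percent_identity("ACDE-G", "ACEEFG"))
```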
ProDom database
List of proteins sharing at least a common domain…
1) BLAST at www.uniprot.org
2) PROSITE tools
You are lucky: it is rare for a domain not to be annotated in the different
domain/family databases!
3) Construct a profile with My hits at SIB
Use PSI Blast
Do a PSI BLAST against UniProtKB
Select the sequences with an E-value < 0.001 and do a second cycle
Look at the MSA
Construct a profile with the MSA
The profile
The profile hits
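As a rough illustration of what "constructing a profile" means, the sketch below turns a gap-free toy alignment into a position-specific log-odds matrix and scores a segment against it. It is a simplification: real PROSITE profiles also use sequence weighting, substitution matrices and gap penalties. The toy sequences are made up to agree with the first positions of the PRATT pattern D-K-T-G-T-[IL]-T, and the uniform background frequency is an assumption.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(msa, pseudocount=1.0):
    """Return one {residue: log-odds score} dictionary per alignment column."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)   # assumed uniform background
    profile = []
    for col in range(length):
        # Gap characters and non-standard residues are ignored
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, segment):
    """Sum the column scores of a segment that is as long as the profile."""
    if len(segment) != len(profile):
        raise ValueError("segment and profile must have the same length")
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, segment))


toy_msa = ["DKTGTLT", "DKTGTIT", "DKTGTLT"]   # hypothetical aligned segments
profile = build_profile(toy_msa)
print(score(profile, "DKTGTLT"), score(profile, "AAAAAAA"))
```

Unlike a pattern, the profile gives every residue a score at every position, so a segment with one unusual residue loses some score without being rejected outright.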
Construct an HMM with the MSA
The HMM
The HMM hits
- Look at the GoLoco data in InterPro.
How many proteins (and/or hits) are found by the different methods ?
http://www.ebi.ac.uk/interpro/
According to InterPro: the GoLoco domain is described by at least one of the different
methods (Pfam, Prosite, SMART)
Pfam: 167 proteins
Prosite: 192 proteins
SMART: 1 protein
These different numbers are a consequence of the interval between the
releases of the different databases (including the sequence database, UniProtKB). They
may also be due to the different methods used (HMM, profile…)
Look for the HMM for the GoLoco domain in Pfam
Download the HMM matrix