Duncan Legge, EMBL-EBI
Introduction to Protein Signatures & InterPro
Introduction to InterPro
http://www.ebi.ac.uk/interpro

Protein Signatures
Protein signature = an amino acid sequence (not necessarily consecutive) associated with a protein characteristic.

Foundations of InterPro
• Integration of signatures
• Manual curation

InterPro Consortium
Consortium of 11 major signature databases

What value are signatures?
• Better at finding proteins with common function
  - Find more distant homologues than BLAST
• Classification of proteins
  - Associate proteins that share: function, domains, sequence, structure
• Annotation of protein sequences
  - Define conserved regions of a protein, e.g. location and type of domains, key structural or functional sites

Protein Signature Methods

How are protein signatures made?
[Workflow diagram: Protein family/domain -> Multiple sequence alignment -> Build model -> Search -> Significant matches -> Refine -> Protein signature]

Example matches with E-values:
ITWKGPVCGLDGKTYRNECALL   E-value 1e-49
AVPRSPVCGSDDVTYANECELK   E-value 3e-42
SVPRSPVCGSDGVTYGTECDLK   E-value 5e-39
HPPPGPVCGTDGLTYDNRCELR   E-value 6e-10

Types of Protein signatures (sequence based)
All types start from a multiple protein alignment.
• Single motif methods
  - Regular expression patterns, e.g. C - C - {P} - x(2) - C - [STDNEKPI] - C
    a letter = must be this AA
    x = any AA
    ( ) = number of AAs
    { } = cannot be any of these
    [ ] = any of these
• Multiple motif methods
  - Identity matrices
  - Fingerprints
• Full domain alignment methods
  - Profiles (Profile Library)
  - Hidden Markov Models: mathematical model of amino acid probability
    [Diagram: HMM with match (M1-M4), insert (I1-I3) and delete (D2, D3) states]

Contributing Member Databases
• Models built on either sequence or structural alignments
• Each MDB has its own focus
[Diagram labels: Hidden Markov Models, FingerPrints, Profiles, Patterns, Sequence Clusters; Structural Domains, Protein features (active sites…), Functional annotation of families/domains, Prediction of conserved domains]

Database | Basis | Institution | Built from | Focus | URL
Pfam | HMM | Sanger Institute | Sequence alignment | Family & Domain based on conserved sequence | http://pfam.sanger.ac.uk/
Gene3D | HMM | UCL | Structure alignment | Structural Domain | http://gene3d.biochem.ucl.ac.uk/Gene3D/
Superfamily | HMM | Uni. of Bristol | Structure alignment | Evolutionary domain relationships | http://supfam.cs.bris.ac.uk/SUPERFAMILY/
SMART | HMM | EMBL Heidelberg | Sequence alignment | Functional domain annotation | http://smart.embl-heidelberg.de/
TIGRFAM | HMM | J. Craig Venter Inst. | Sequence alignment | Microbial Functional Family Classification | http://www.jcvi.org/cms/research/projects/tigrfams/overview/
Panther | HMM | Uni. S. California | Sequence alignment | Family functional classification | http://www.pantherdb.org/
PIRSF | HMM | PIR, Georgetown, Washington D.C. | Sequence alignment | Functional classification | http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml
PRINTS | Fingerprints | Uni. of Manchester | Sequence alignment | Family functional classification | http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php
PROSITE | Patterns & Profiles | SIB | Sequence alignment | Functional annotation | http://expasy.org/prosite/
HAMAP | Profiles | SIB | Sequence alignment | Microbial protein family classification | http://expasy.org/sprot/hamap/

A Closer Look at InterPro

Foundations of InterPro
• Integration of signatures
• Manual curation

InterPro Curation Principles
- To represent the MDBs' signatures as closely as possible to what they intended
- To reflect biological reality as accurately as possible in the entries we create, by using types, relationships and GO mapping
- To provide as much information as possible about the signature to the end user, by annotating signatures and providing links to other databases
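
Worked example: the regular expression pattern from the "Single motif methods" slide, C - C - {P} - x(2) - C - [STDNEKPI] - C, can be translated into an ordinary regular expression to see how each symbol behaves. The sketch below is illustrative only and is not how PROSITE or InterPro scan sequences. The function name prosite_to_regex and the two test sequences are hypothetical, and only the notation listed on the slide is handled (plus the (n,m) range form of the repeat count); anchors and other pattern syntax are ignored.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex string.

    Handles fixed residues, x (any residue), [..] (any of),
    {..} (none of) and repeat counts written as (n) or (n,m).
    """
    pieces = []
    for element in pattern.replace(" ", "").split("-"):
        # Separate the residue specification from an optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"cannot parse element: {element!r}")
        core, repeat = match.groups()

        if core == "x":
            piece = "."                       # x = any amino acid
        elif core.startswith("[") and core.endswith("]"):
            piece = core                      # [ ] = any of these
        elif core.startswith("{") and core.endswith("}"):
            piece = "[^" + core[1:-1] + "]"   # { } = cannot be any of these
        elif len(core) == 1 and core.isalpha():
            piece = core                      # a letter = must be this residue
        else:
            raise ValueError(f"unsupported element: {element!r}")

        if repeat:
            piece += "{" + repeat + "}"       # ( ) = number of residues
        pieces.append(piece)
    return "".join(pieces)


regex = prosite_to_regex("C - C - {P} - x(2) - C - [STDNEKPI] - C")
print(regex)  # CC[^P].{2}C[STDNEKPI]C

# Hypothetical sequences, made up for illustration.
hit = re.search(regex, "AACCGAACKCAA")
print(hit.group() if hit else None)  # CCGAACKC
print(re.search(regex, "AACCPAACKCAA"))  # None: P is excluded at position 3
```

The second sequence differs from the first only by a proline at the excluded position, which shows that a single-motif pattern is an all-or-nothing test: one disallowed residue removes the match entirely.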

InterPro Entry
• Links related signatures
• Groups similar signatures together
• Adds extensive annotation
• Linked to other databases
• Structural information and viewers

Link related signatures - relationships
1) Parent - Child (subgroup of more closely related proteins)
[Diagram: parent entry Protein kinase (100); child entries Serine kinase (75) and Tyrosine kinase (25), with no proteins in common; signatures shown come from SMART, PFAM and PROSITE]
Applies to domains and families

The InterPro entry types
• Family: proteins share a common evolutionary origin, as reflected in their related functions, sequences or structure
• Domain: biological units with defined boundaries
• Repeat: short sequences typically repeated within a protein
• Sites: PTM, Active Site, Binding Site, Conserved Site

Searching InterPro
• Protein ID
• Paste in unknown sequence

InterPro Search Results
[Screenshot labels: Family; Domains and sites; Unintegrated signatures; Structural data; Link to PDBe; Link to InterPro entry; Links to signature databases]

https://www.ebi.ac.uk/Tools/pfa/iprscan/
Select member databases

Caveats
InterPro entries are based on signatures supplied to us by our member databases
• ...this means no signature, no entry!

We need your feedback!
• missing/additional references
• reporting problems
• requests

ACKNOWLEDGEMENTS
InterPro Team:
Sarah Hunter
Phil Jones
Siew-Yit Yong
Alex Mitchell
Amaia Sangrador
Craig McAnulla
Matthew Fraser
Maxim Scheremetjew
Sebastien Pesseat