Using CATH-Gene3D to study the evolution of your protein and find its function
CATH-Gene3D classifies 3D structures from the Protein Data Bank (PDB) into domain
superfamilies. CATH-Gene3D currently classifies ~120,000 domain structures (~80% of PDB
structures) into 2600 evolutionary superfamilies. By building hidden Markov models (HMMs) to
represent each family, we can assign ~16 million domain sequences from ~2000 completed genomes
and UniProt to these superfamilies.
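As an illustration of what assigning domain sequences to superfamilies involves, the sketch below resolves a set of HMM hits on one protein into non-overlapping domain assignments. It assumes the hits have already been produced by an HMM search. The `Hit` record, the greedy best-score-first rule, and the example scores are hypothetical choices made for this sketch, not a description of the CATH-Gene3D pipeline.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    superfamily: str  # identifier in CATH's four-level numbering (example values)
    start: int        # 1-based inclusive start on the query sequence
    end: int          # 1-based inclusive end on the query sequence
    score: float      # HMM bit score; higher is better (invented numbers)


def overlaps(a: Hit, b: Hit) -> bool:
    """True if the two hits share at least one residue position."""
    return a.start <= b.end and b.start <= a.end


def assign_domains(hits: List[Hit]) -> List[Hit]:
    """Greedily keep the best-scoring hits that do not overlap a kept hit."""
    chosen: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, kept) for kept in chosen):
            chosen.append(hit)
    # Report the assignments in sequence order.
    return sorted(chosen, key=lambda h: h.start)


# Hypothetical hits for a single two-domain protein.
hits = [
    Hit("3.40.50.720", 5, 140, 85.2),
    Hit("3.40.50.300", 10, 150, 40.1),
    Hit("3.90.180.10", 160, 300, 60.7),
]

for hit in assign_domains(hits):
    print(hit.superfamily, hit.start, hit.end)
```

Here the weaker hit on residues 10-150 is discarded because it overlaps a higher-scoring hit, leaving one assignment per region of the sequence.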
To understand the functions of superfamilies and how these evolved, we have integrated functional
information from many public resources, e.g. GO, EC and IntAct. Information on the structures,
sequences and functions observed in each superfamily is presented on the web.
CATH-Gene3D is widely used, with more than one million web page accesses and 10,000 unique
visitors per month. In this technology track we will demo some valuable new features.
We have recently sub-classified the superfamilies into functional families (FunFams), in which
relatives have very similar functions. CATH-Gene3D now provides alignments of FunFams, and
shows highly conserved residue sites projected onto representative 3D structures. Our functional
families reveal shifts in catalytic residues and the emergence of varied protein interaction sites
across diverse superfamilies. They can be used to predict functions for new proteins and were found
to perform competitively in the CAFA function prediction assessment held at ISMB last year.
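To make the idea of highly conserved residue sites in a family alignment concrete, the sketch below scores each alignment column by one minus its normalised Shannon entropy and reports the columns above a cut-off. The scoring scheme, the 0.9 threshold, and the toy alignment are assumptions chosen for illustration. The abstract does not state which conservation measure CATH-Gene3D uses.

```python
import math
from collections import Counter
from typing import List


def column_conservation(column: str) -> float:
    """Return 1 - normalised Shannon entropy of a column, ignoring gaps.

    1.0 means a single residue type; lower values mean more variation.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # Normalise by the maximum entropy over the 20 standard amino acids.
    return 1.0 - entropy / math.log2(20)


def conserved_sites(alignment: List[str], threshold: float = 0.9) -> List[int]:
    """Return 1-based column indices whose conservation meets the threshold."""
    columns = ("".join(col) for col in zip(*alignment))
    return [
        i for i, col in enumerate(columns, start=1)
        if column_conservation(col) >= threshold
    ]


# Toy alignment of four hypothetical relatives (equal-length rows).
toy_alignment = [
    "GHSLK",
    "GHTLR",
    "GHALK",
    "GH-LQ",
]

print(conserved_sites(toy_alignment))
```

A simple entropy score like this treats all substitutions as equally disruptive and can overrate columns with few non-gap residues, so it is best read as a first-pass filter rather than a functional-site predictor.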
Because structure tends to be much more highly conserved than sequence, CATH-Gene3D
superfamilies trace further back in evolution. The emergence of novel functions within CATH
enzyme superfamilies is now tracked by FunTree, developed by Nick Furnham (EBI) and powered
by CATH-Gene3D data and information on enzyme substrates and mechanisms from the group of
Janet Thornton (EBI). FunTree generates phylogenetic trees for each enzyme family and displays
information on structural variability across the CATH superfamily, linked to changes in substrates
and chemical mechanisms.
In this technology track we will:
• Briefly present the philosophy underpinning the classification of superfamilies and families.
• Demo the CATH-Gene3D website, highlighting special features, e.g. for performing
comparative genomics and displaying protein functional networks.
• Demo web pages displaying alignments of functional subfamilies and the 3D viewer
illustrating conserved functional sites.
• Demo the FunTree pages displaying phylogenetic trees for enzyme families and showing the
emergence of novel functions in diverse superfamilies.
• Demo the CATH-Gene3D web servers, which allow you to search these resources with your
favourite proteins.
CATH-Gene3D is one of the 8 partners in the InterPro resource, which integrates domain and
protein family information. More recently we have been collaborating with 5 other groups
generating domain family classifications (SCOP) or domain structure predictions
(SUPERFAMILY, PHYRE, Genomic Threading Database, Fugue) to provide the Genome3D
portal.
Genome3D provides consensus information on known and predicted structures in model
organisms. The project started in January 2012 and the web portal will be launched in July 2012. We
will briefly demo the new Genome3D web site.