Protein Structure: Threading

Park, Jong Hwa
MRC-DUNN, Hills Road, Cambridge CB2 2XY, England
Bioinformatics in Biosophy: Next
02/06/2001

1. Threading

In 1994, the protein structure predictors who gathered at Asilomar, California, saw that the threading method was very promising. Since then it has become a widely used method for detecting very distant protein structure relationships, which are often missed even by sensitive sequence search methods.

What is Threading?
• Sequence search: Sequence <-> Sequence
• Threading: Sequence <-> Structure

Threading is a protein fold recognition technique. It aligns a query sequence against a library of protein structures and ranks the library structures using energy functions. The functions can be diverse and incorporate 3D information.

Threading
The main idea of threading is to see how "happy", in terms of energy, each amino acid residue of the query is at its position in the template structure. If the residues in the template are very happy, the score will be high, which suggests the correct fold for the query sequence.

Threading methods: 1D-3D profile
• Bowie et al. (1991) described each position of a protein as being in one of eighteen environments. Other researchers have developed similar methods, e.g. Ouzounis et al. (1993) and Yi & Lander (1994). The environments in these methods are characterized by properties such as exposed atomic areas and the type of residue-residue contacts.
• The principle of 1D-3D profiles is as follows:
1. Reduction of the three-dimensional structure to a one-dimensional string of residue environments. Bowie defined these environments by measuring the area of the side chain that is buried in the protein, the fraction of the side chain area that is covered by polar atoms, and the local secondary structure.
2. A scoring matrix is generated from the probabilities of finding each of the twenty amino acids in each of the environment classes, as observed in a database of known structures and related sequences.
3.
Generation of a position-dependent comparison matrix known as the 3D profile, i.e. defining the probability of finding a given amino acid at a given position of a given protein.
4. Alignment of a sequence with the 3D profile. The resulting alignment score measures the compatibility of the sequence with the structure described by the 3D profile.

Threading Methods:
• 3D-PSSM (ICNET). Based on sequence profiles, solvation potentials and secondary structure.
• TOPITS (PredictProtein server). Based on coincidence of secondary structure and accessibility.
• UCLA-DOE Structure Prediction Server (UCLA). Executes various threading programs and reports a consensus.
• 123D+. Combines a substitution matrix, secondary structure prediction, and contact capacity potentials.
• SAM/HMM (UCSC). Based on hidden Markov models of alignments of crystallized proteins.
• FFAS (Burnham Institute). Based on profile-profile matching of the query sequence against sequences from a clustered PDB database.
• PSIPRED-GenTHREADER (Brunel).
• FUGUE (Cambridge Univ.). Profile library search against the HOMSTRAD homologous structure alignment database. Uses structural environment-specific substitution tables and structure-dependent gap penalties.
• THREADER2 (Brunel). Based on solvation potentials and contacts obtained from crystallized proteins.
• ProFIT (CAME, Salzburg).

Threader: David Jones
• First, a library of unique protein folds is derived from the database of protein structures. Each fold is considered as a chain tracing through space; the original sequence is ignored completely.
• The test sequence is then optimally fitted to each library fold (allowing for relative insertions and deletions in loop regions), and the "energy" of each possible fit (or threading) is calculated by summing the proposed pairwise interactions.
• The library of folds is then ranked in ascending order of total energy, and the lowest-energy fold is taken as the most probable match.
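As a rough illustration of the fit-and-rank idea on the Threader slide, the sketch below sums a pairwise energy over the residue contacts of each library fold and sorts the folds by ascending total energy. It is not the Threader program: the pair potential, the contact lists, the fold names and the function names are all hypothetical, and the sequence is threaded without gaps, whereas the method described above allows insertions and deletions in loop regions.

```python
# Toy fold ranking by summed pairwise energy (ungapped threading).
# Everything here is illustrative: the potential and the fold library are made up.

HYDROPHOBIC = set("AVLIMFWC")


def pair_energy(a, b):
    """Hypothetical pair potential: negative values are favourable."""
    a_h, b_h = a in HYDROPHOBIC, b in HYDROPHOBIC
    if a_h and b_h:
        return -1.0   # hydrophobic-hydrophobic contact
    if a_h != b_h:
        return 0.5    # hydrophobic-polar contact
    return 0.0        # polar-polar contact


def threading_energy(sequence, contacts):
    """Sum pair energies over the fold's contacts, placing residue i at fold position i."""
    n = len(sequence)
    return sum(pair_energy(sequence[i], sequence[j])
               for i, j in contacts if i < n and j < n)


def rank_folds(sequence, fold_library):
    """Return (energy, fold_name) pairs in ascending order of energy."""
    return sorted((threading_energy(sequence, contacts), name)
                  for name, contacts in fold_library.items())


# Each hypothetical fold is reduced to a list of contacting position pairs.
toy_library = {
    "fold_a": [(0, 3), (1, 4)],
    "fold_b": [(0, 1), (2, 3)],
}

for energy, name in rank_folds("LKKLA", toy_library):
    print(f"{name}: {energy:+.1f}")
```

The first entry of the ranked list is the lowest-energy fold, which in this scheme would be reported as the most probable match.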
Threader Steps

Threader output

Jones, D. T., Taylor, W. R. and Thornton, J. M., "A new approach to protein fold recognition," Nature, vol. 358 (1992), pp. 86-89.
Jones, D. T., Taylor, W. R. and Thornton, J. M., "The rapid generation of mutation data matrices from protein sequences," Computer Applications in the Biological Sciences (CABIOS), vol. 8 (1992), pp. 275-282.

GenTHREADER
Threading can be used for structural assignment of whole genomes.
Mycoplasma genitalium genome.

Topits

Threading

Still bad alignment?

Genome-Scale Structure Assignment
Genome-level assignment is important for structural genomics.
1. Profile-based assignment
2. Intermediate sequence library assignment
3. Hidden Markov model assignment
4. Threading-based assignment

ProFIT: from Manfred Sippl's group
• Mean pair potential: knowledge-based force fields in which energy potentials are derived for atomic interactions between amino acid residue pairs as a function of the distance between all atoms.

Ab initio
Uses physical energy functions to predict the short- and long-range interactions between amino acid residues in proteins => massive calculation.
By adding secondary structure information and using small fragments found in the PDB database, it is possible to predict structures reasonably well (as in CASP3).
The challenge: finding a fast way to calculate all the possible conformations.
• Baker -> mini-threading: match small fold fragments in the PDB and assemble them.
• Levitt -> sampling of plausible structures from randomly produced folds -> evaluate -> build.
• Skolnick -> tertiary contacts: integrate restraint information from multiple sequence alignments (MSA) into the tertiary structure -> assemble.
• Avbelj -> hierarchical condensation: minimization of free energy (Monte Carlo).
• Osguthorpe -> some complicated equations -> secondary structure information -> tertiary structure.
Limitations: too slow, and the simulations cannot be applied to more complicated problems, including modified proteins and multidomain proteins.

CASP
• What is CASP?
– Critical Assessment of Techniques for Protein Structure Prediction
– A community-wide competition and conference for assessing the techniques of protein structure prediction
– Biennial (held every two years)
– Since 1994 (the last was CASP4)

Summary of CASP
• CASP1 (1994): Uncertainty
• CASP2 (1996): Confidence
• CASP3 (1998): Real progress from sequence search/alignment
• CASP4 (2000): Mini-threading really takes off?

"Everything is beating its purpose of existence, including Science."
• CASP is not an exception.
• It is a passionate gathering of religious egos.
• However, it is not boring and you do see some visible progress.
• The categories:
– 1. Comparative modelling
– 2. Threading (fold recognition)
– 3. Ab initio
– 4. Docking

CASP
CASP1: the real winner was multiple sequence iterative search.
• Sequence-based multiple sequence iterative search (Intermediate Sequence Search) using HMMs and other methods. (Nobody knew what was going on.)
• Threading was regarded as promising because it could detect homology beyond the level of sequence search (not really true). People liked this approach!
• The real information in threading came from the NNN (Natural Neural Network).

CASP2: the real winner was the Natural Neural Network.
• NNN-based fold recognition with a good template library was shown to be successful.
• People did not like it that much, as the NNN was not very artificial.

CASP3: the real winner was PSI-BLAST.
• Targets became more difficult (for threading and ab initio).
• Virtually all groups performed better (WHY??):
– PSI-BLAST based alignment and search.
– A larger template library due to a larger PDB.
– Progress in ab initio using simpler energy terms.
– Progress in threading using smaller fragments for building topology (mini-threading).
– Progress in secondary structure prediction based on more templates and better multiple sequence search algorithms.
• However, nothing essentially new happened.

CASP4:
• Targets just as difficult as in CASP3.
• David Baker's group's mini-threading algorithm was good.
• NNN by Alexey Murzin performed well.
• Not much improvement in multiple sequence search algorithms.
• A focus on large-scale automatic methods (CAFASP).

PDB_ISL: fast and reliable structural assignment using an intermediate sequence library (ISL)

Introduction to PDB_ISL
The bioinformatics space for a protein family containing structures A and B.

The transitivity in homology
• An intermediate sequence can link distant A and B, IF homology in biology is transitive.

Methods
(1) Use of structural classification information to assess homology detection algorithms.
(2) Building the intermediate sequence library (PDB_ISL).

The use of structural classification

ISS procedure

ISS

Testing PDB_ISL
For protein sequences of known structure:
• The evolutionary relationships are apparent from structure even when the sequences have diverged beyond the point where they can be recognised by sequence comparison.
• How well does PDB_ISL recognise the relationships between proteins that are known from structure?

PDB_ISL performance

Practical: assign some protein structures using PDB_ISL
http://stash.mrc-lmb.cam.ac.uk/PDB_ISL/
The sequences must be important and not easily found by NCBI PSI-BLAST! Use the project sequences, or use UCP3_HUMAN.

UCP3_HUMAN
mvglkpsdvp ptmavkflga gtaacfadlv tfpldtakvr
lqiqgenqav qtarlvqyrg vlgtiltmvr tegpcspyng
lvaglqrqms fasiriglyd svkqvytpkg adnsslttri
lagcttgama vtcaqptdvv kvrfqasihl gpsrsdrkys
gtmdayrtia reegvrglwk gtlpnimrna ivncaevvty
dilkeklldy hlltdnfpch fvsafgagfc atvvaspvdv
vktrymnspp gqyfspldcm ikmvaqegpt afykgftpsf
lrlgswnvvm fvtyeqlkra lmkvqmlres pf

What is the structure of this sequence?

The steps:
• 1. Read the GenBank and Swiss-Prot texts to learn more about the protein itself.
– What is it? What does it do?
• 2. Do PSI-BLAST or any sensitive sequence search.
• 3. Do secondary structure prediction.
• 4. Do transmembrane prediction.
– Hydrophobic regions?
– Accessibility?
• 5. Do threading:
– 3DPSSM
– PSIPRED (Threader)
– SAM T99 (any sensitive HMM)
– PredictProtein server
• 6.
Do ab initio prediction.
– Make your own.
– Look at the secondary structures and fold the protein in your head.
– Just take your pick?
• 7. If you are really desperate, use X-ray crystallography.

What can we learn from the text?
It is mitochondrial, so it is likely to have a signal peptide and likely to be a membrane protein. Run a transmembrane prediction.

1: P55916 MITOCHONDRIAL UNCOUPLING PROTEIN 3 (UCP 3)

LOCUS       UCP3_HUMAN    312 aa    PRI    01-OCT-2000
DEFINITION  MITOCHONDRIAL UNCOUPLING PROTEIN 3 (UCP 3).
ACCESSION   P55916
PID         g2497983
VERSION     P55916 GI:2497983
DBSOURCE    swissprot: locus UCP3_HUMAN, accession P55916;
            class: standard.
            extra accessions: O60475, created: Nov 1, 1997.
            sequence updated: Nov 1, 1997.
            annotation updated: Oct 1, 2000.
            xrefs: gi: 2183020, gi: 2183021, gi: 2183017, gi: 2183018,
            gi: 2198812, gi: 2198813, gi: 2440012, gi: 2440013,
            gi: 2522401, gi: 2522403, gi: 2522396, gi: 2522397,
            gi: 2522398, gi: 2522399, gi: 2522400, gi: 3176758,
            gi: 3176760, gi: 3176756, gi: 3176757
            xrefs (non-sequence databases): MIM 602044, InterPro IPR002030,
            InterPro IPR001993, Pfam PF00153, PRINTS PR00784, PROSITE PS00215
KEYWORDS    Mitochondrion; Inner membrane; Repeat; Transmembrane; Transport;
            Alternative splicing; Disease mutation; Diabetes.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (residues 1 to 312)
  AUTHORS   Boss,O., Samec,S., Paoloni-Giacobino,A., Rossier,C., Dulloo,A.,
            Seydoux,J., Muzzin,P. and Giacobino,J.P.
  TITLE     Uncoupling protein-3: a new member of the mitochondrial carrier
            family with tissue-specific expression
  JOURNAL   FEBS Lett. 408 (1), 39-42 (1997)
  MEDLINE   97324095
  REMARK    SEQUENCE FROM N.A.
            TISSUE=Skeletal muscle
REFERENCE   2 (residues 1 to 312)
  AUTHORS   Solanes,G., Vidal-Puig,A., Grujic,D., Flier,J.S. and Lowell,B.B.
  TITLE     The human uncoupling protein-3 gene. Genomic structure,
            chromosomal localization, and genetic basis for short and long
            form transcripts
  JOURNAL   J. Biol. Chem. 272 (41), 25433-25436 (1997)
  MEDLINE   97467322
  REMARK    SEQUENCE FROM N.A.
REFERENCE   3 (residues 1 to 312)
  AUTHORS   Gong,D.W., He,Y., Karas,M. and Reitman,M.
  TITLE     Uncoupling protein-3 is a mediator of thermogenesis regulated by
            thyroid hormone, beta3-adrenergic agonists, and leptin
  JOURNAL   J. Biol. Chem. 272 (39), 24129-24132 (1997)
  MEDLINE   97450925
  REMARK    SEQUENCE FROM N.A.
REFERENCE   4 (residues 1 to 312)
  AUTHORS   Urhammer,S.A., Dalgaard,L.T., Sorensen,T.I., Tybjaerg-Hansen,A.,
            Echwald,S.M., Andersen,T., Clausen,J.O. and Pedersen,O.
  TITLE     Organisation of the coding exons and mutational screening of the
            uncoupling protein 3 gene in subjects with juvenile-onset obesity
  JOURNAL   Diabetologia 41 (2), 241-244 (1998)
  MEDLINE   98158426
  REMARK    SEQUENCE FROM N.A.
REFERENCE   5 (residues 1 to 312)
  AUTHORS   Argyropoulos,G., Brown,A.M., Willi,S.M., Zhu,J., He,Y.,
            Reitman,M., Gevao,S.M., Spruill,I. and Garvey,W.T.
  TITLE     Effects of mutations in the human uncoupling protein 3 gene on
            the respiratory quotient and fat oxidation in severe obesity and
            type 2 diabetes
  JOURNAL   J. Clin. Invest. 102 (7), 1345-1351 (1998)
  MEDLINE   98443224
  REMARK    VARIANT OBESITY ILE-102.
REFERENCE   6 (residues 1 to 312)
  AUTHORS   Brown,A.M., Willi,S.M., Argyropoulos,G. and Garvey,W.T.
  TITLE     A novel missense mutation, R70W, in the human uncoupling protein
            3 gene in a family with type 2 diabetes
  JOURNAL   Hum. Mutat. 13, 506-506 (1999)
  REMARK    VARIANT OBESITY TRP-70.
COMMENT     ------------------------------------------------------------
            This SWISS-PROT entry is copyright. It is produced through a
            collaboration between the Swiss Institute of Bioinformatics and
            the EMBL outstation - the European Bioinformatics Institute.
            The original entry is available from http://www.expasy.ch/sprot
            and http://www.ebi.ac.uk/sprot
            ------------------------------------------------------------
            [FUNCTION] UCP ARE MITOCHONDRIAL TRANSPORTER PROTEINS THAT CREATE
            PROTON LEAKS ACROSS THE INNER MITOCHONDRIAL MEMBRANE, THUS
            UNCOUPLING OXYDATIVE PHOSPHORYLATION. AS A RESULT, ENERGY IS
            DISSIPATED IN THE FORM OF HEAT. MAY PLAY A ROLE IN THE MODULATION
            OF TISSUE RESPIRATORY CONTROL. PARTICIPATES IN THERMOGENESIS AND
            ENERGY BALANCE.
            [SUBCELLULAR LOCATION] INTEGRAL MEMBRANE PROTEIN. MITOCHONDRIAL
            INNER MEMBRANE (BY SIMILARITY).
            [ALTERNATIVE PRODUCTS] 2 ISOFORMS; UCP3L (SHOWN HERE) AND UCP3S;
            ARE PRODUCED BY ALTERNATIVE SPLICING.
            [TISSUE SPECIFICITY] ONLY IN SKELETAL MUSCLE AND HEART. IS MORE
            EXPRESSED IN GLYCOLYTIC THAN IN OXIDATIVE SKELETAL MUSCLES.
            [DISEASE] DEFECTS IN UCP3 COULD BE INVOLVED IN SEVERE OBESITY.
            [SIMILARITY] BELONGS TO THE MITOCHONDRIAL CARRIER FAMILY.
FEATURES             Location/Qualifiers
     source          1..312
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
     Protein         1..312
                     /product="MITOCHONDRIAL UNCOUPLING PROTEIN 3"
     Region          11..32
                     /region_name="Transmembrane region"
                     /note="POTENTIAL."
     Region          70
                     /region_name="Variant"
                     /note="R -> W (IN SEVERE OBESITY WITH TYPE 2 DIABETES).
                     /FTId=VAR_004407."
     Region          77..99
                     /region_name="Transmembrane region"
                     /note="POTENTIAL."
     Region          102
                     /region_name="Variant"
                     /note="V -> I (IN OBESITY). /FTId=VAR_004408."
     Region          120..136
                     /region_name="Transmembrane region"
                     /note="POTENTIAL."
     Region          184..200
                     /region_name="Transmembrane region"
                     /note="POTENTIAL."
     Region          193..194
                     /region_name="Conflict"
                     /note="NC -> KS (IN REF. 4)."
     Region          218..237
                     /region_name="Transmembrane region"
                     /note="POTENTIAL."
     Region          272..294
                     /region_name="Transmembrane region"
                     /note="POTENTIAL."
     Region          276..312
                     /region_name="Splicing variant"
                     /note="MISSING (IN ISOFORM UCP3S)."
     Region          279..301
                     /region_name="Domain"
                     /note="PURINE NUCLEOTIDE BINDING (BY SIMILARITY)."
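The transmembrane regions in the entry above are marked POTENTIAL, so it can be instructive to compare them with a simple hydropathy scan of the kind suggested in step 4 of the practical. The sketch below is one common approach and not the method used to annotate the entry: it assumes the Kyte-Doolittle hydropathy scale, a 19-residue sliding window and a mean-hydropathy cutoff of 1.6, and the function names are made up for illustration.

```python
# Sliding-window hydropathy scan (Kyte-Doolittle scale).
# Window size and cutoff are conventional choices, treated here as assumptions.

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def hydrophobic_segments(sequence, window=19, cutoff=1.6):
    """Merge windows whose mean is at or above the cutoff into 1-based (start, end) spans."""
    segments = []
    for i, mean in enumerate(hydropathy_profile(sequence, window)):
        if mean < cutoff:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], end)   # extend the previous span
        else:
            segments.append((start, end))
    return segments


# Toy input: a hydrophobic stretch flanked by charged residues.
toy = "D" * 10 + "L" * 20 + "D" * 10
print(hydrophobic_segments(toy))
```

To try it on the practical, pass the UCP3_HUMAN sequence as a single string with the spaces removed and compare the reported spans with the annotated regions. Spans found this way are only candidates, since a window mean says nothing about helix orientation or topology.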
PDB_ISL result

Z-score  E-value  SeqID  From  To   Query            From  To   InterSeq and SCOP superfamily
105      1.6      0.282  79    211  Q714112_79-211   11    144  E1259894_7-154_1ldm_d1ldm_1
104      2.3      0.250  153   289  Q714112_153-289  1     147  Q9Y1U1_157-330_1bdm_d1bdma2
98       7.4      0.225  131   285  Q714112_131-285  27    179  AAF31952_1-209_1gc1_d1gc1g_
98       7.5      0.221  111   273  Q714112_111-273  8     169  Q9WLI5_1-210_1gc1_d1gc1g_
99       7.7      0.266  107   232  Q714112_107-232  76    217  AAF40853_82-331_1uag_d1uag_3
102      7.9      0.208  73    273  Q714112_73-273   140   339  Q71144_61-456_1gc1_d1gc1g_
99       7.9      0.328  84    196  Q714112_84-196   16    127  O25284_1-256_1dd8_d1dd8a1
100      9.2      0.216  116   268  Q714112_116-268  118   268  YFD0_YEAST_11-348_1bjn_d1bjna_
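The intermediate-sequence idea behind PDB_ISL (an intermediate can link distant A and B if homology is transitive) can be sketched in a few lines. This is only a toy model under stated assumptions, not the PDB_ISL implementation: the hit table is assumed to already contain the significant pairwise matches from some sequence search, and the identifiers and the function name are hypothetical.

```python
# Toy intermediate sequence search: link a query to a known structure
# either directly or through one intermediate sequence.

def assign_structure(query, hits, known_structures):
    """Return (structure, intermediate) pairs; intermediate is None for a direct hit.

    hits maps a sequence id to the set of ids it matches significantly.
    known_structures is the set of ids that have a solved structure.
    """
    assignments = []
    for first in sorted(hits.get(query, ())):
        if first in known_structures:
            assignments.append((first, None))        # direct match to a structure
            continue
        for second in sorted(hits.get(first, ())):
            if second in known_structures:
                assignments.append((second, first))  # linked through an intermediate
    return assignments


# Hypothetical data: A matches I1 only, and I1 matches the structure B.
toy_hits = {
    "A": {"I1"},
    "I1": {"A", "B"},
}
print(assign_structure("A", toy_hits, known_structures={"B"}))
```

Restricting the search to a single intermediate step keeps the sketch short; whether longer chains are acceptable depends on how far the transitivity assumption can be trusted.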