Protein Structure: Threading
Park, Jong Hwa
MRC-DUNN
Hills Road Cambridge
CB2 2XY
England
Bioinformatics in Biosophy
02/06/2001
1. Threading
In 1994, the protein structure predictors gathered at
Asilomar, California, saw that the threading method
was very promising.
Since then, it has become a widely used method for detecting
very distant protein structural relationships (often missed
by sensitive sequence search methods).
What is Threading?
• Sequence search: Sequence -> Sequence
• Threading: Sequence -> Structure
Threading is a protein fold recognition
technique that aligns a query sequence to
each structure in a library of protein
structures and ranks the library structures
using an energy function. These functions can be
diverse and incorporate 3D information.
Threading
The main idea of threading is to see how "happy"
each amino acid residue of the query is, in terms of
energy, when placed on the template structure.
If the residues are very happy in the template,
the score will be high -> correct fold for the
query sequence.
Threading methods: 1D-3D profile
• Bowie et al. (1991) described each position of a
protein as being in one of eighteen environments.
Other researchers have developed similar methods
(e.g. Ouzounis et al., 1993; Yi & Lander, 1994). The
environments in these methods are characterized
by properties such as exposed atomic areas and
type of residue-residue contacts.
• The principle of 1D-3D profiles is as follows:
• 1. Reduction of the three-dimensional structure to a one-dimensional string of residue environments. Bowie defined these
environments by measuring the area of the side chain that is
buried in the protein, the fraction of the side chain area that is
exposed to polar atoms, and the local secondary structure.
• 2. A scoring matrix is generated from the probabilities of finding
each of the twenty amino acids in each of the environment
classes as observed in a database of known structures and related
sequences.
• 3. Generation of a position-dependent comparison matrix known
as the 3D profile, i.e. defining the probability of finding a certain
amino acid at a certain position of a given protein.
• 4. Alignment of a sequence with the 3D profile. The resulting
alignment score is a measure of the compatibility of the sequence
with the structure described by the 3D profile.
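The four steps above can be sketched in Python. This is a toy illustration under stated assumptions, not Bowie et al.'s implementation: it uses three hypothetical environment classes (B = buried, P = partly buried, E = exposed) instead of eighteen, a made-up score table instead of database-derived statistics, a linear gap penalty, and a global (Needleman-Wunsch) alignment.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")

def env_score(aa: str, env: str) -> float:
    # Step 2 (toy version, hypothetical values): score for residue `aa` in environment `env`.
    # Real 1D-3D profiles derive these from residue frequencies in known structures.
    if aa in HYDROPHOBIC:
        return {"B": 2.0, "P": 0.5, "E": -1.0}[env]
    return {"B": -1.0, "P": 0.0, "E": 1.0}[env]

def build_profile(env_string: str) -> List[Dict[str, float]]:
    # Step 3: the 3D profile, one row per template position, one score per amino acid.
    # `env_string` is the step-1 output, e.g. "BBEE" (hypothetical environment string).
    return [{aa: env_score(aa, env) for aa in AMINO_ACIDS} for env in env_string]

def align_score(seq: str, profile: List[Dict[str, float]], gap: float = -2.0) -> float:
    # Step 4: global alignment of the query sequence against the profile.
    n, m = len(seq), len(profile)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + profile[j - 1][seq[i - 1]],  # residue i at position j
                dp[i - 1][j] + gap,  # residue i not placed on the structure
                dp[i][j - 1] + gap,  # structure position j left empty
            )
    return dp[n][m]
```

For the environment string "BBEE", the sequence "LLKK" (hydrophobic residues buried, charged residues exposed) scores higher than "KKLL", which is the compatibility signal step 4 describes.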
Threading Methods:
• 3D-PSSM (ICNET). Based on sequence profiles, solvation potentials and secondary
structure.
• TOPITS (PredictProtein server). Based on coincidence of secondary structure and
accessibility.
• UCLA-DOE Structure Prediction Server (UCLA). Executes various threading
programs and reports a consensus.
• 123D+. Combines substitution matrix, secondary structure prediction, and contact
capacity potentials.
• SAM/HMM (UCSC). Based on hidden Markov models of alignments of crystallized proteins.
• FAS (Burnham Institute). Based on profile-profile matching algorithms of the query
sequence with sequences from a clustered PDB database.
• PSIPRED-GenThreader (Brunel)
• FUGUE: Profile library search against the HOMSTRAD homologous structure
alignment database (Cambridge Univ.). Structural environment-specific substitution
tables and structure-dependent gap penalties.
• THREADER2 (Brunel). Based on solvation potentials and contacts obtained from
crystallized proteins.
• ProFIT CAME (Salzburg).
Threader: David Jones
• Firstly, a library of unique protein folds is derived from
the database of protein structures. Each fold is
considered as a chain tracing through space; the original
sequence being ignored completely.
• The test sequence is then optimally fitted to each library
fold (allowing for relative insertions and deletions in loop
regions), with the 'energy' of each possible fit (or
threading) being calculated by summing the proposed
pairwise interactions.
• The library of folds is then ranked in ascending order of
total energy, with the lowest energy fold being taken as
the most probable match.
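A toy version of the Threader procedure described above can be sketched as follows. Assumptions: each library fold is given only as its length and a contact map (the original sequence is ignored), the pair potential is hypothetical (hydrophobic-hydrophobic contacts favourable), and only ungapped placements are tried, whereas Threader allows insertions and deletions in loop regions.

```python
from typing import Dict, List, Tuple

Contact = Tuple[int, int]  # pair of template positions that are in contact
HYDROPHOBIC = set("AVILMFWC")

def pair_energy(a: str, b: str) -> float:
    # Hypothetical pair potential; lower (more negative) is more favourable.
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return -1.0
    if a in HYDROPHOBIC or b in HYDROPHOBIC:
        return 0.5
    return 0.0

def threading_energy(seq: str, contacts: List[Contact], offset: int) -> float:
    # Query residue offset+i sits on template position i; sum energies of all contacts.
    return sum(pair_energy(seq[offset + i], seq[offset + j]) for i, j in contacts)

def best_energy(seq: str, length: int, contacts: List[Contact]) -> float:
    # Slide the fold along the query (ungapped) and keep the lowest-energy fit.
    return min(threading_energy(seq, contacts, k) for k in range(len(seq) - length + 1))

def rank_folds(seq: str, library: Dict[str, Tuple[int, List[Contact]]]) -> List[Tuple[str, float]]:
    # Skip folds longer than the query, then rank in ascending order of total energy.
    scored = [(name, best_energy(seq, n, c)) for name, (n, c) in library.items() if n <= len(seq)]
    return sorted(scored, key=lambda item: item[1])
```

With the query "LKL", a hypothetical fold whose ends touch (contact (0, 2)) brings the two leucines together and ranks first, ahead of a fold with contact (0, 1).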
Threader Steps
Threader output
Jones, D. T., Taylor, W. R. and Thornton, J. M., "A new approach to protein fold recognition," Nature, vol. 358 (1992), pp. 86-89.
Jones, D. T., Taylor, W. R. and Thornton, J. M., "The rapid generation of mutation data matrices from protein sequences," Computer Applications in the Biological Sciences (CABIOS), vol. 8 (1992), pp. 275-282.
GenThreader
Threading can be used for structural assignment of
whole genomes.
Mycoplasma genitalium genome.
TOPITS
Threading -> still bad alignment?
Genome-scale structure assignment
Genome-level assignment is important for
structural genomics.
1. Profiles assignment
2. Intermediate sequence library assignment
3. Hidden Markov Model assignment
4. Threading based assignment
ProFIT: from Manfred Sippl's group
• Mean pair potential:
• knowledge-based force fields in which
energy potentials are derived for atomic
interactions between amino acid residue
pairs as a function of the distance between
all atoms.
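Knowledge-based pair potentials of this kind are usually obtained by inverse Boltzmann statistics, E(r) = -kT ln[f_obs(r) / f_ref(r)], where f_obs is the observed distance distribution for one pair type and f_ref a reference distribution. The sketch below applies this to distance histograms; the choice of "all residue pairs" as the reference state, the pseudocount and the example counts are assumptions, not ProFIT's exact parameterisation.

```python
import math
from typing import List

def mean_force_potential(pair_counts: List[int], all_counts: List[int],
                         kT: float = 1.0, pseudo: float = 1.0) -> List[float]:
    """Inverse-Boltzmann energy per distance bin for one residue-pair type.

    pair_counts[k]: how often this pair type occurs in distance bin k.
    all_counts[k]:  how often any residue pair occurs in bin k (assumed reference state).
    Energies are returned in units of kT.
    """
    if len(pair_counts) != len(all_counts):
        raise ValueError("both histograms must use the same distance bins")
    n_bins = len(pair_counts)
    pair_total = sum(pair_counts) + pseudo * n_bins
    all_total = sum(all_counts) + pseudo * n_bins
    energies = []
    for observed, reference in zip(pair_counts, all_counts):
        f_obs = (observed + pseudo) / pair_total   # pseudocount avoids log(0)
        f_ref = (reference + pseudo) / all_total
        energies.append(-kT * math.log(f_obs / f_ref))  # enriched bins get negative energy
    return energies
```

A bin where the pair is seen more often than expected from the reference gets a negative (favourable) energy; a depleted bin gets a positive one.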
Ab initio
Uses physical energy functions to predict the short- and
long-range interactions between amino acid residues in
proteins => massive calculation.
By adding secondary structure information and using
small fragments found in the PDB, it is possible
to predict structures reasonably well (as in CASP3).
The challenge: finding a fast way to calculate all the
possible conformations.
Baker -> Mini-threading: match small fold fragments in
the PDB and assemble them.
Levitt -> Sampling of plausible structures from randomly
produced folds -> Evaluate -> Build
Ab initio
Skolnick -> Tertiary contacts: integrate MSA restraint
info into tertiary structure -> assemble
Avbelj -> Hierarchical condensation: minimization of
free energy (Monte Carlo)
Osguthorpe -> some complicated equations -> sec.
str. info -> tertiary structure.
Limitations: too slow, and simulation cannot be
applied to more complicated problems (including
protein modification and multidomain proteins).
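Several of the ab initio approaches above (Avbelj's free-energy minimization is listed as Monte Carlo) rely on the Metropolis rule: always accept a move that lowers the energy, and accept an uphill move with probability exp(-dE/T). The sketch below applies that rule to a one-dimensional toy energy standing in for a conformation; the energy function, step size and temperature are assumptions, and this is not any of the listed groups' methods.

```python
import math
import random
from typing import Callable, Tuple

def metropolis_accept(delta_e: float, temperature: float, u: float) -> bool:
    # Downhill moves are always accepted; uphill moves with probability exp(-dE/T).
    return delta_e <= 0 or u < math.exp(-delta_e / temperature)

def monte_carlo_minimize(energy: Callable[[float], float], x0: float, steps: int = 5000,
                         temperature: float = 1.0, step_size: float = 0.5,
                         seed: int = 0) -> Tuple[float, float]:
    # Random local moves on a single coordinate (a toy stand-in for a conformation).
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)
        trial_e = energy(trial)
        if metropolis_accept(trial_e - e, temperature, rng.random()):
            x, e = trial, trial_e
            if e < best_e:  # remember the lowest-energy state visited
                best_x, best_e = x, e
    return best_x, best_e
```

The cost noted under Limitations shows up directly here: each step needs a full energy evaluation, and a real conformation has hundreds of coordinates rather than one.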
CASP
• What is CASP?
– Critical Assessment of Techniques for Protein
Structure Prediction
– A community-wide competition and conference for
assessing the techniques of protein structure
prediction
– Biennial
– Since 1994 (the most recent was CASP4)
Summary of CASP
• Casp1 (1994): Uncertainty
• Casp2 (1996): Confidence
• Casp3 (1998): Real Progress from Sequence
Search/Alignment
• Casp4 (2000): Mini-Threading really takes
off?
"Everything is beating its purpose of existence,
including Science."
• Casp is not an exception.
• It is a passionate gathering of religious egos.
• However, it is not boring and you do see some
visible progress.
• The categories:
  1. Comparative modelling
  2. Threading (fold recognition)
  3. Ab initio
  4. Docking
CASP
• CASP1: (the real winner was multiple sequence
iterative search)
Sequence-based multiple sequence iterative search
(Intermediate Sequence Search) using HMMs
and other methods. (Nobody knew what was
going on.)
Threading was regarded as promising as it could
detect homology beyond the sequence search level
(not really true): people liked this approach! The
real information in threading came from NNN.
• CASP2: (the real winner was Natural Neural
Network)
Natural Neural Network based fold
recognition with a good template library
was shown to be successful. People did not
like it that much as the NNN was not very
artificial.
CASP3: The real winner was PSI-BLAST.
• Targets became more difficult (for threading and ab
initio).
• Virtually all groups performed better (WHY??)
PSI-BLAST based alignment and search.
Larger template library due to larger PDB.
Progress in ab initio using simpler energy terms.
Progress in threading using smaller fragments for
building topology (mini-threading).
Progress in secondary structure prediction based on
more templates and better multiple
sequence search algorithms.
However, nothing essentially new happened.
Casp4:
• Targets were just as difficult as in CASP3
• David Baker's group's mini-threading
algorithm was good.
• NNN by Alexey Murzin performed well.
• Not much improvement in multiple
sequence search algorithms
• A focus on large scale automatic methods
(CAFASP)
PDB_ISL: fast and reliable structural assignment
using an intermediate sequence library (ISL)
Introduction to PDB_ISL
The Bioinformatics Space for a protein
family containing A and B structures.
Transitivity of homology
• An intermediate sequence can link distant A and B, IF
homology in biology is transitive.
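The transitive linking idea can be sketched as a graph search: sequences are nodes, significant hits are edges, and A is linked to B if a chain of edges joins them through intermediates. The hit list, the E-value cut-off and the function names are hypothetical; a real intermediate sequence search runs a sensitive search (e.g. PSI-BLAST) for every intermediate, and this only illustrates the linking step.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Hit = Tuple[str, str, float]  # (sequence A, sequence B, E-value of the hit)

def build_links(hits: List[Hit], max_evalue: float) -> Dict[str, List[str]]:
    """Keep only significant hits and treat each as an undirected homology link."""
    links: Dict[str, List[str]] = {}
    for a, b, evalue in hits:
        if evalue <= max_evalue:
            links.setdefault(a, []).append(b)
            links.setdefault(b, []).append(a)
    return links

def intermediate_path(links: Dict[str, List[str]], start: str, target: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest chain of links from start to target."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            path: List[str] = []
            step: Optional[str] = node
            while step is not None:  # walk back to the start
                path.append(step)
                step = previous[step]
            return path[::-1]
        for neighbour in links.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None
```

With a strict cut-off, A and B are not linked directly but are joined through I, which is exactly the case the slide describes; the IF matters because every extra link is another chance of error.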
Methods
(1) Use of structural classification info
to assess homology detection algorithms.
(2) Building an intermediate sequence library
(PDB_ISL)
The use of Structural Classification
ISS procedure
ISS
Testing PDB_ISL
For the protein sequences of known structure,
• The evolutionary relationships are apparent from
structure even when they have diverged beyond
the point where they can be recognised from
sequence comparison.
• How well does PDB_ISL recognise the
relationships of proteins known from structure?
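The test described above can be sketched as counting, at a given E-value cut-off, how many detected pairs fall in the same structural superfamily (true relationships) and how many do not (errors). Assumptions: two domains in the same SCOP superfamily count as homologous and all other detected pairs as false positives; the domain labels and hits are hypothetical, and this is a generic benchmark sketch rather than PDB_ISL's exact protocol.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Hit = Tuple[str, str, float]  # (query domain, hit domain, E-value)

def evaluate_hits(hits: List[Hit], superfamily: Dict[str, str],
                  max_evalue: float) -> Tuple[int, int, float]:
    """Return (true pairs found, false pairs found, coverage) at one E-value cut-off."""
    homologous = {frozenset(pair) for pair in combinations(superfamily, 2)
                  if superfamily[pair[0]] == superfamily[pair[1]]}
    true_pairs, false_pairs = set(), set()
    for a, b, evalue in hits:
        if evalue > max_evalue or a == b or a not in superfamily or b not in superfamily:
            continue  # not significant, self-hit, or no structural label
        pair = frozenset((a, b))  # (a, b) and (b, a) count once
        (true_pairs if pair in homologous else false_pairs).add(pair)
    coverage = len(true_pairs) / len(homologous) if homologous else 0.0
    return len(true_pairs), len(false_pairs), coverage
```

Repeating the call over a range of cut-offs gives the usual coverage-versus-errors curve used to compare homology detection methods.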
PDB_ISL performance
Practical: Assign some protein structures using PDB_ISL
http://stash.mrc-lmb.cam.ac.uk/PDB_ISL/
The sequences must be important and not
easily found by NCBI PSI-BLAST!
Use the project sequences,
or use UCP3_HUMAN.
UCP3_HUMAN
mvglkpsdvp ptmavkflga gtaacfadlv tfpldtakvr lqiqgenqav qtarlvqyrg
vlgtiltmvr tegpcspyng lvaglqrqms fasiriglyd svkqvytpkg adnsslttri
lagcttgama vtcaqptdvv kvrfqasihl gpsrsdrkys gtmdayrtia reegvrglwk
gtlpnimrna ivncaevvty dilkeklldy hlltdnfpch fvsafgagfc atvvaspvdv
vktrymnspp gqyfspldcm ikmvaqegpt afykgftpsf lrlgswnvvm fvtyeqlkra
lmkvqmlres pf
What is the structure of this sequence??
The steps :
• 1. Read the GenBank and Swiss-Prot entries to
learn more about the protein itself.
– What is it? What does it do?
• 2. Do PSI-BLAST or any sensitive sequence
search.
• 3. Do Secondary Structure prediction
• 4. Do Transmembrane prediction
– Hydrophobicity regions?
– Accessibility?
• 5. Do Threading:
– 3D-PSSM
– PSIPRED (Threader)
– SAM T99 (any sensitive HMM)
– PredictProtein server
• 6. Do Ab initio prediction
– Make your own.
– Look at the secondary structures and fold the
protein in your head.
– Just make a guess?
• 7. If you are really desperate, use X-ray
crystallography
What can we learn from the text?
It is mitochondrial, so it is likely to have a signal peptide
and to be a membrane protein.
Run transmembrane prediction.
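A first-pass transmembrane check can be sketched with a sliding-window hydropathy plot. This uses the Kyte & Doolittle hydropathy scale with the commonly quoted window of 19 residues and a cut-off of 1.6; dedicated transmembrane predictors are more reliable, and the window, cut-off and function names here are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Kyte & Doolittle (1982) hydropathy values.
KD: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def clean_sequence(text: str) -> str:
    # Join a multi-line, lower-case sequence listing into one upper-case string.
    return "".join(ch for ch in text.upper() if ch.isalpha())

def hydropathy(seq: str, window: int = 19) -> List[float]:
    # Mean hydropathy of each window; entry k covers residues k+1 .. k+window (1-based).
    return [sum(KD[aa] for aa in seq[k:k + window]) / window
            for k in range(len(seq) - window + 1)]

def candidate_tm_segments(seq: str, window: int = 19,
                          cutoff: float = 1.6) -> List[Tuple[int, int]]:
    # Merge overlapping above-cut-off windows into 1-based (start, end) segments.
    segments: List[Tuple[int, int]] = []
    for k, value in enumerate(hydropathy(seq, window)):
        if value > cutoff:
            start, end = k + 1, k + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)
            else:
                segments.append((start, end))
    return segments

UCP3_HUMAN = clean_sequence("""
mvglkpsdvp ptmavkflga gtaacfadlv tfpldtakvr lqiqgenqav qtarlvqyrg
vlgtiltmvr tegpcspyng lvaglqrqms fasiriglyd svkqvytpkg adnsslttri
lagcttgama vtcaqptdvv kvrfqasihl gpsrsdrkys gtmdayrtia reegvrglwk
gtlpnimrna ivncaevvty dilkeklldy hlltdnfpch fvsafgagfc atvvaspvdv
vktrymnspp gqyfspldcm ikmvaqegpt afykgftpsf lrlgswnvvm fvtyeqlkra
lmkvqmlres pf
""")
```

Calling `candidate_tm_segments(UCP3_HUMAN)` lists the hydrophobic stretches, which can then be compared with the "Transmembrane region" features in the entry below.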
1: P55916
MITOCHONDRIAL UNCOUPLING PROTEIN 3 (UCP 3)
LOCUS       UCP3_HUMAN  312 aa  PRI  01-OCT-2000
DEFINITION  MITOCHONDRIAL UNCOUPLING PROTEIN 3 (UCP 3).
ACCESSION   P55916
PID         g2497983
VERSION     P55916  GI:2497983
DBSOURCE    swissprot: locus UCP3_HUMAN, accession P55916;
            class: standard.
            extra accessions:O60475,created: Nov 1, 1997.
            sequence updated: Nov 1, 1997.
            annotation updated: Oct 1, 2000.
            xrefs: gi: gi: 2183020, gi: gi: 2183021, gi: gi: 2183017, gi: gi:
            2183018, gi: gi: 2198812, gi: gi: 2198813, gi: gi: 2440012, gi: gi:
            2440013, gi: gi: 2522401, gi: gi: 2522403, gi: gi: 2522396, gi: gi:
            2522397, gi: gi: 2522398, gi: gi: 2522399, gi: gi: 2522400, gi: gi:
            3176758, gi: gi: 3176760, gi: gi: 3176756, gi: gi: 3176757
            xrefs (non-sequence databases): MIM 602044, InterPro IPR002030,
            InterPro IPR001993, Pfam PF00153, PRINTS PR00784, PROSITE PS00215
KEYWORDS    Mitochondrion; Inner membrane; Repeat; Transmembrane; Transport;
            Alternative splicing; Disease mutation; Diabetes.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (residues 1 to 312)
  AUTHORS   Boss,O., Samec,S., Paoloni-Giacobino,A., Rossier,C., Dulloo,A.,
            Seydoux,J., Muzzin,P. and Giacobino,J.P.
  TITLE     Uncoupling protein-3: a new member of the mitochondrial carrier
            family with tissue-specific expression
  JOURNAL   FEBS Lett. 408 (1), 39-42 (1997)
  MEDLINE   97324095
  REMARK    SEQUENCE FROM N.A.
            TISSUE=Skeletal muscle
REFERENCE   2  (residues 1 to 312)
  AUTHORS   Solanes,G., Vidal-Puig,A., Grujic,D., Flier,J.S. and Lowell,B.B.
  TITLE     The human uncoupling protein-3 gene. Genomic structure, chromosomal
            localization, and genetic basis for short and long form transcripts
  JOURNAL   J. Biol. Chem. 272 (41), 25433-25436 (1997)
  MEDLINE   97467322
  REMARK    SEQUENCE FROM N.A.
REFERENCE   3  (residues 1 to 312)
  AUTHORS   Gong,D.W., He,Y., Karas,M. and Reitman,M.
  TITLE     Uncoupling protein-3 is a mediator of thermogenesis regulated by
            thyroid hormone, beta3-adrenergic agonists, and leptin
  JOURNAL   J. Biol. Chem. 272 (39), 24129-24132 (1997)
  MEDLINE   97450925
  REMARK    SEQUENCE FROM N.A.
REFERENCE   4  (residues 1 to 312)
  AUTHORS   Urhammer,S.A., Dalgaard,L.T., Sorensen,T.I., Tybjaerg-Hansen,A.,
            Echwald,S.M., Andersen,T., Clausen,J.O. and Pedersen,O.
  TITLE     Organisation of the coding exons and mutational screening of the
            uncoupling protein 3 gene in subjects with juvenile-onset obesity
  JOURNAL   Diabetologia 41 (2), 241-244 (1998)
  MEDLINE   98158426
  REMARK    SEQUENCE FROM N.A.
REFERENCE   5  (residues 1 to 312)
  AUTHORS   Argyropoulos,G., Brown,A.M., Willi,S.M., Zhu,J., He,Y., Reitman,M.,
            Gevao,S.M., Spruill,I. and Garvey,W.T.
  TITLE     Effects of mutations in the human uncoupling protein 3 gene on the
            respiratory quotient and fat oxidation in severe obesity and type 2
            diabetes
  JOURNAL   J. Clin. Invest. 102 (7), 1345-1351 (1998)
  MEDLINE   98443224
  REMARK    VARIANT OBESITY ILE-102.
REFERENCE   6  (residues 1 to 312)
  AUTHORS   Brown,A.M., Willi,S.M., Argyropoulos,G. and Garvey,W.T.
  TITLE     A novel missense mutation, R70W, in the human uncoupling protein 3
            gene in a family with type 2 diabetes
  JOURNAL   Hum. Mutat. 13, 506-506 (1999)
  REMARK    VARIANT OBESITY TRP-70.
COMMENT     ------------------------------------------------------------------
            This SWISS-PROT entry is copyright. It is produced through a
            collaboration between the Swiss Institute of Bioinformatics and
            the EMBL outstation - the European Bioinformatics Institute.
            The original entry is available from http://www.expasy.ch/sprot
            and http://www.ebi.ac.uk/sprot
            ------------------------------------------------------------------.
            [FUNCTION] UCP ARE MITOCHONDRIAL TRANSPORTER PROTEINS THAT CREATE
            PROTON LEAKS ACROSS THE INNER MITOCHONDRIAL MEMBRANE, THUS
            UNCOUPLING OXYDATIVE PHOSPHORYLATION. AS A RESULT, ENERGY IS
            DISSIPATED IN THE FORM OF HEAT. MAY PLAY A ROLE IN THE MODULATION
            OF TISSUE RESPIRATORY CONTROL. PARTICIPATES IN THERMOGENESIS AND
            ENERGY BALANCE.
            [SUBCELLULAR LOCATION] INTEGRAL MEMBRANE PROTEIN. MITOCHONDRIAL
            INNER MEMBRANE (BY SIMILARITY).
            [ALTERNATIVE PRODUCTS] 2 ISOFORMS; UCP3L (SHOWN HERE) AND UCP3S;
            ARE PRODUCED BY ALTERNATIVE SPLICING.
            [TISSUE SPECIFICITY] ONLY IN SKELETAL MUSCLE AND HEART. IS MORE
            EXPRESSED IN GLYCOLYTIC THAN IN OXIDATIVE SKELETAL MUSCLES.
            [DISEASE] DEFECTS IN UCP3 COULD BE INVOLVED IN SEVERE OBESITY.
            [SIMILARITY] BELONGS TO THE MITOCHONDRIAL CARRIER FAMILY.
FEATURES             Location/Qualifiers
     source          1..312
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
     Protein         1..312
                     /product="MITOCHONDRIAL UNCOUPLING PROTEIN 3"
     Region          11..32
                     /region_name="Transmembrane region"
                     /note="POTENTIAL."
     Region          70
                     /region_name="Variant"
                     /note="R -> W (IN SEVERE OBESITY WITH TYPE 2 DIABETES).
                     /FTId=VAR_004407."
     Region          77..99
                     /region_name="Transmembrane region"
                     /note="POTENTIAL."
     Region          102
                     /region_name="Variant"
                     /note="V -> I (IN OBESITY). /FTId=VAR_004408."
     Region          120..136
                     /region_name="Transmembrane region"
                     /note="POTENTIAL."
     Region          184..200
                     /region_name="Transmembrane region"
                     /note="POTENTIAL."
     Region          193..194
                     /region_name="Conflict"
                     /note="NC -> KS (IN REF. 4)."
     Region          218..237
                     /region_name="Transmembrane region"
                     /note="POTENTIAL."
     Region          272..294
                     /region_name="Transmembrane region"
                     /note="POTENTIAL."
     Region          276..312
                     /region_name="Splicing variant"
                     /note="MISSING (IN ISOFORM UCP3S)."
     Region          279..301
                     /region_name="Domain"
                     /note="PURINE NUCLEOTIDE BINDING (BY SIMILARITY)."
PDB_ISL result
Z-score  E-value  SeqID  From  To   Query            From  To   InterSeq and SCOP superfamily
105      1.6      0.282   79   211  Q714112_79-211     11  144  E1259894_7-154_1ldm_d1ldm_1
104      2.3      0.250  153   289  Q714112_153-289     1  147  Q9Y1U1_157-330_1bdm_d1bdma2
98       7.4      0.225  131   285  Q714112_131-285    27  179  AAF31952_1-209_1gc1_d1gc1g_
98       7.5      0.221  111   273  Q714112_111-273     8  169  Q9WLI5_1-210_1gc1_d1gc1g_
99       7.7      0.266  107   232  Q714112_107-232    76  217  AAF40853_82-331_1uag_d1uag_3
102      7.9      0.208   73   273  Q714112_73-273    140  339  Q71144_61-456_1gc1_d1gc1g_
99       7.9      0.328   84   196  Q714112_84-196     16  127  O25284_1-256_1dd8_d1dd8a1
100      9.2      0.216  116   268  Q714112_116-268   118  268  YFD0_YEAST_11-348_1bjn_d1bjna_
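The InterSeq labels in the PDB_ISL result appear to encode an intermediate sequence name, its residue range, a PDB code and a SCOP domain identifier (for example `d1gc1g_`). Assuming that pattern, which is inferred from the labels themselves and not from PDB_ISL documentation, a short script can parse the hits and count which structures recur; the E-values are those listed in the result above.

```python
import re
from collections import Counter
from typing import List, NamedTuple, Tuple

# Assumed label layout: <intermediate>_<start>-<end>_<pdb code>_<SCOP domain>
LABEL = re.compile(
    r"^(?P<seq>.+)_(?P<start>\d+)-(?P<end>\d+)_(?P<pdb>[0-9][0-9a-z]{3})_(?P<domain>d\w+)$")

class InterSeq(NamedTuple):
    intermediate: str
    start: int
    end: int
    pdb: str
    domain: str

def parse_label(label: str) -> InterSeq:
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognised label: {label}")
    return InterSeq(m["seq"], int(m["start"]), int(m["end"]), m["pdb"], m["domain"])

HITS: List[Tuple[float, str]] = [  # (E-value, InterSeq label) from the PDB_ISL result
    (1.6, "E1259894_7-154_1ldm_d1ldm_1"),
    (2.3, "Q9Y1U1_157-330_1bdm_d1bdma2"),
    (7.4, "AAF31952_1-209_1gc1_d1gc1g_"),
    (7.5, "Q9WLI5_1-210_1gc1_d1gc1g_"),
    (7.7, "AAF40853_82-331_1uag_d1uag_3"),
    (7.9, "Q71144_61-456_1gc1_d1gc1g_"),
    (7.9, "O25284_1-256_1dd8_d1dd8a1"),
    (9.2, "YFD0_YEAST_11-348_1bjn_d1bjna_"),
]

def pdb_counts(hits: List[Tuple[float, str]], max_evalue: float) -> Counter:
    # Count how often each PDB code is reached among hits at or below the E-value cut-off.
    return Counter(parse_label(label).pdb for evalue, label in hits if evalue <= max_evalue)
```

Changing `max_evalue` shows how the set of candidate structures shrinks as the cut-off is tightened.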