A dimensionality reduction
approach to modeling
protein flexibility
By Miguel L. Teodoro, George N. Phillips Jr.*
and Lydia E. Kavraki
Rice University and University of Wisconsin-Madison*
Presented by Zhang Jingbo
NUS CS5247
Outline
Motivation, Background and Our goal
 Protein flexibility
 The problems in current methods and the benefit
of our methods in this paper
 Dimensionality reduction techniques
 Obtaining conformational Data
 Application to Specific Systems
 Summary

Motivation

Introduce a method to obtain a reduced basis
representation of protein flexibility.
Background




Proteins are involved either directly or indirectly in all
biological processes in living organisms.
Conformational changes of proteins can critically affect
their ability to bind other molecules.
Any progress in modeling protein motion and flexibility
will contribute to the understanding of key biological
functions.
Today there is a large body of knowledge available on
protein structure and function and this amount of
information is expected to grow even faster in the future.
Our method and goal


Method:
A dimensionality reduction technique — Principal
Component Analysis
Goal:
1. Transform the original high dimensional representation
of protein motion into a lower dimensional
representation that captures the dominant modes of
motion of the protein.
2. Obtain conformations that have been observed in
laboratory experiments.
The focus of this paper

How to obtain a reduced representation of
protein flexibility from raw protein structural data
What is protein flexibility?
Definition: A crucial aspect of the relation
between protein structure and function.
 Proteins change their three-dimensional shapes
when binding to or unbinding from other molecules.

Why do we want to model protein flexibility?
 Several applications for our work:
1. Pharmaceutical drug development
2. To model conformational changes that occur during
protein-protein and protein-DNA/RNA interactions.
[Figure: RII molecular "handshake" (donut with two holes).]
[Figure: Backbone of the RII dimer showing glycan binding sites.]
[Figure: Models for the binding of RII to the glycophorin A receptor on red blood cells (erythrocytes).]
The problems in current methods
The computational complexity of explicitly
modeling all the degrees of freedom of a protein
is too high.
 Modeling proteins as rigid structures limits the
effectiveness of currently used molecular
docking methods.

The benefit of our method in this paper (1)
Using the approximation
 Makes it computationally efficient to include protein
flexibility in the drug development process.

The two most common structural biology
experimental methods in use today
Protein X-ray crystallography
 Nuclear magnetic resonance (NMR)


Limits:
An alternative to experimental methods

Computational methods based on classical or
quantum mechanics to approximate protein
flexibility.

Limits:
The benefit of our method in this paper (2)
Transform the basis of representation of
molecular motion.
 The new degrees of freedom will be linear
combinations of the original variables.
 Some degrees of freedom are significantly more
representative of protein flexibility than others.
 Consider only the most significant degrees of freedom;
the transformed degrees of freedom are collective motions
affecting the entire configuration of the protein.
 There is a tradeoff between the loss of information
and the ability to model protein flexibility effectively in a
subspace of greatly reduced dimensionality.
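In symbols, the change of basis described above can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the slides: q is a 3N-dimensional conformation vector, \bar{q} is the mean structure, u_k is the k-th new basis vector (itself a linear combination of the original coordinates), and m is the number of degrees of freedom that are kept.

```latex
\mathbf{q} \;\approx\; \bar{\mathbf{q}} + \sum_{k=1}^{m} \alpha_k \,\mathbf{u}_k ,
\qquad
\alpha_k = \mathbf{u}_k^{\mathsf{T}} \left( \mathbf{q} - \bar{\mathbf{q}} \right),
\qquad
m \ll 3N .
```

Keeping fewer terms (smaller m) loses more information but gives a lower dimensional search space; this is the tradeoff mentioned in the last bullet.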

What do we actually do in this paper?




Start from initial coordinate information from different
data sources
Apply the principal component analysis method of
dimensionality reduction.
Obtain a new structural representation using collective
degrees of freedom.
Here, we will focus on
a. the interpretation of the principal components as
biologically relevant motions
b. how combinations of a reduced number of these
motions can approximate alternative conformations of
the protein.
Dimensionality reduction techniques
Aim: find a mapping between data in a high dimensional
space and a lower dimensional subspace.
 Two methods:
a. Multidimensional scaling (MDS)
b. Principal component analysis (PCA)

Merits:
Limits:
PCA of conformational data

Merits:
1. The most established method.
2. The most efficient algorithms.
3. Guaranteed convergence of the computation.
4. An upper bound on how much we can
reduce the representation of conformational
flexibility in proteins.
5. The principal components have a direct
physical interpretation.
6. High dimensional data can readily be projected
to a low dimensional space, and the projection can
be inverted to recover a representation of the
original data with minimal reconstruction error.
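The last merit (projecting to a low dimensional space and mapping back) can be illustrated with a minimal sketch. It assumes NumPy, synthetic data, and a layout with one conformation vector per column; the helper names `fit_basis`, `project`, and `reconstruct` are hypothetical and do not come from the paper.

```python
import numpy as np

def fit_basis(A, k):
    """A: (D x M) array, one conformation vector per column.
    Returns the mean structure (D x 1) and the top-k principal components (D x k)."""
    mean = A.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(A - mean, full_matrices=False)
    return mean, U[:, :k]

def project(A, mean, basis):
    """Map D-dimensional conformations to k collective coordinates."""
    return basis.T @ (A - mean)          # shape (k x M)

def reconstruct(Z, mean, basis):
    """Map k collective coordinates back to D-dimensional conformations."""
    return mean + basis @ Z              # shape (D x M)

rng = np.random.default_rng(1)
# Synthetic data (an assumption): 12 coordinates driven by 3 hidden
# collective motions, plus a small amount of noise.
A = rng.normal(size=(12, 3)) @ rng.normal(size=(3, 100))
A += 0.01 * rng.normal(size=(12, 100))

errors = []
for k in (1, 2, 3):
    mean, basis = fit_basis(A, k)
    A_hat = reconstruct(project(A, mean, basis), mean, basis)
    errors.append(float(np.sqrt(np.mean((A - A_hat) ** 2))))
print(errors)  # RMS reconstruction error shrinks as k grows
```

Because the basis columns are orthonormal, the inverse map is just a multiplication by the same matrix, which is what makes moving in both directions cheap.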
PCA of conformational data (continued)

Linear and non-linear
PCA of conformational data (continued)

Conformational Data
1. The input data for PCA: Several atomic
displacement vectors (3N dimension)
corresponding to different structural conformations,
each of which has the form
$[x_1, y_1, z_1, \ldots, x_N, y_N, z_N]^T$, where $(x_i, y_i, z_i)$
corresponds to Cartesian coordinate information
for the ith atom.
2. All atomic displacement vectors constitute the
conformational vector set.
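As a minimal sketch of how such a conformational vector set might be assembled, the snippet below flattens each conformation (an N x 3 array of Cartesian coordinates) into a 3N-dimensional vector and stacks the vectors as columns of a matrix. The function name `build_conformation_matrix`, the column-per-conformation layout, and the subtraction of the mean structure are illustrative assumptions, and the sketch also assumes the conformations have already been aligned to a common reference frame.

```python
import numpy as np

def build_conformation_matrix(conformations):
    """Stack conformations as columns of a (3N x M) matrix of displacements.

    conformations: sequence of M arrays, each of shape (N, 3), holding the
    Cartesian coordinates of the same N atoms in the same order.
    Returns (A, mean), where A[:, j] is conformation j minus the mean structure.
    """
    coords = np.asarray(conformations, dtype=float)   # shape (M, N, 3)
    if coords.ndim != 3 or coords.shape[2] != 3:
        raise ValueError("expected M conformations, each of shape (N, 3)")
    m = coords.shape[0]
    flat = coords.reshape(m, -1)      # (M, 3N): x1, y1, z1, ..., xN, yN, zN
    mean = flat.mean(axis=0)          # average structure, length 3N
    A = (flat - mean).T               # (3N, M) displacement vectors
    return A, mean

# Toy example: 3 conformations of a 2-atom "molecule" (made-up numbers).
toy = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 2.0, 0.0]],
]
A, mean = build_conformation_matrix(toy)
print(A.shape)  # (6, 3)
```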
Singular value decomposition (SVD)


We use the singular value decomposition (SVD) as an
efficient computational method to calculate the principal
components.
The SVD of a matrix, A, is defined as:
$A = U \Sigma V^T$
where U and V are orthonormal matrices and $\Sigma$
is a nonnegative diagonal matrix whose diagonal elements
are the singular values of A.
The columns of matrices U and V are called the left and right
singular vectors, respectively.
The square of each singular value is proportional to the
variance of the data in A along the corresponding principal component.
The SVD of matrix A was computed using the ARPACK library.
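The relationship between the SVD and the principal components can be illustrated with a short sketch. It uses NumPy's dense `numpy.linalg.svd` instead of ARPACK, purely to stay self-contained; the synthetic data, the helper name `principal_components`, and the assumption that A stores one mean-centered conformation vector per column are all illustrative and not taken from the paper.

```python
import numpy as np

def principal_components(A):
    """Return left singular vectors, singular values, and variance fractions.

    A: (3N x M) matrix whose columns are mean-centered conformation vectors.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
    variance = s ** 2                 # proportional to variance per component
    fraction = variance / variance.sum()
    return U, s, fraction

rng = np.random.default_rng(0)
# Synthetic data (an assumption): 30 coordinates, 200 conformations,
# dominated by two collective motions plus small noise.
modes = np.linalg.qr(rng.normal(size=(30, 2)))[0]      # two orthonormal directions
amplitudes = rng.normal(size=(2, 200)) * np.array([[5.0], [2.0]])
A = modes @ amplitudes + 0.05 * rng.normal(size=(30, 200))
A -= A.mean(axis=1, keepdims=True)                     # center each coordinate

U, s, fraction = principal_components(A)
print(fraction[:3])  # share of total variance captured by the first three components
```

In this toy case almost all of the variance falls on the first two components, which is the situation in which truncating the basis loses little information.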
Obtaining conformational Data

The most common data sources:
1. experimental laboratory methods:
a. X-ray crystallography
b. NMR
2. computational sampling methods based on a force field,
such as molecular dynamics.
laboratory methods vs. computational methods:
- laboratory methods generate less data
- computational methods have lower accuracy.

Now let's turn to the
Application to Specific Systems
HIV-1 Protease
The advantages of using the PCA
methodology to analyze protein flexibility

Can be used at different levels of detail:
1. the overall motion of the backbone.
2. the simplified flexibility of the protein as a
whole.
3. the flexibility of only the atoms that constitute the
binding site.
The first situation
The second situation

The advantage of PCA
The last situation

As a validation of our method.
Another application: Aldose Reductase
Summary
Thank you