Supplementary Methods
Note on physical relevance of druggability equation parameters
The parameters we obtained for $\gamma(\infty)$ and $C$ using our calibration set have an
interesting physical interpretation that may warrant further study. Our $\gamma(\infty)$ value of
45 cal/mol/Å² is smaller than the 72 cal/mol/Å² typically cited (reference 12
in article) but close to the value of 43 cal/mol/Å² derived from measurements of
hydrocarbon partitioning by De Young and Dill (reference 15 in article). Use of $C = 0$
implies fortuitous cancellation of energy terms, which is not uncommon but requires
further investigation.
Details of computational algorithms
Here we present further details of the algorithms used to compute the surface area
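and curvature values required for the MAPPOD equation. The equation,
$$\Delta G_{\mathrm{MAPPOD}} = -\gamma(r)\,\frac{A^{\mathrm{target}}_{\mathrm{nonpolar}}}{A^{\mathrm{target}}_{\mathrm{total}}}\,A^{\mathrm{target}}_{\mathrm{druglike}} + C; \qquad \gamma(r) = \gamma(\infty)\left(1 - \frac{1.4}{r}\right),$$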
has three parameters that are measured from the binding site: the total solvent-accessible
surface area (SASA), represented by $A^{\mathrm{target}}_{\mathrm{total}}$; the non-polar SASA, represented by
$A^{\mathrm{target}}_{\mathrm{nonpolar}}$; and the global curvature of the binding site, represented by $r$. The
remaining parameters are defined as described in the main text
($A^{\mathrm{target}}_{\mathrm{druglike}}$ = 300 Å², $C$ = 0, and $\gamma(\infty)$ = 45 cal/mol/Å²). The goal here is to
detail the algorithms for defining the binding site and the subsequent computation of the
surface areas and curvature. The process is depicted in Figure S1.
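As a worked illustration, the sketch below evaluates the MAPPOD equation from the three measured quantities, taking the curvature term as $\gamma(r) = \gamma(\infty)(1 - 1.4/r)$. The function and constant names are hypothetical and chosen only for this example. The sign convention for r (for instance, whether a concave pocket carries a negative radius) is not restated in this section, so it is treated here as an assumption left to the caller.

```python
GAMMA_INF = 45.0           # cal/mol/A^2, value quoted in the text
A_DRUGLIKE = 300.0         # A^2, value quoted in the text
C_OFFSET = 0.0             # cal/mol, value quoted in the text
CURVATURE_CONSTANT = 1.4   # A, the constant in the curvature term


def gamma_of_r(r, gamma_inf=GAMMA_INF):
    """Curvature-dependent coefficient gamma(r) = gamma(inf) * (1 - 1.4 / r)."""
    if r == 0:
        raise ValueError("radius of curvature must be non-zero")
    return gamma_inf * (1.0 - CURVATURE_CONSTANT / r)


def delta_g_mappod(a_nonpolar, a_total, r):
    """Return the MAPPOD estimate in cal/mol.

    a_nonpolar, a_total: non-polar and total SASA of the binding site (A^2)
    r: signed radius of curvature (A); the sign convention is an assumption
    """
    if a_total <= 0:
        raise ValueError("total SASA must be positive")
    nonpolar_fraction = a_nonpolar / a_total
    return -gamma_of_r(r) * nonpolar_fraction * A_DRUGLIKE + C_OFFSET
```

For example, `delta_g_mappod(200.0, 300.0, 10.0)` gives about -7740 cal/mol, and the value approaches -9000 cal/mol as the radius grows and the surface becomes flat.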
Figure S1. Overview of computational approach. a, From the crystal structure and the
set of atoms defining the binding site, computational geometry algorithms are used to
define non-overlapping tetrahedra (shown in blue) for the binding site (see also Figure S2
for further details), from which an analytical representation of the surface is derived. The
analytical representation includes sphere sections and torus sections that represent the
molecular surface of the protein. b, From this surface, the total solvent-accessible surface
area (SASA) can be directly computed by summing up surface areas of all the sphere or
torus sections. The non-polar SASA is computed by defining surface patches as polar or
non-polar based on atom types and summing up the surface areas of the non-polar
patches. The outline represents the solvent accessible surface, while the colors depict
polar (green) or non-polar (brown) surfaces. c, The curvature is then measured by
finding the least-squares fitted sphere to the whole binding site. The yellow outline
represents the geometric representation of the binding site molecular surface, while the
orange sphere represents the least-squares fitted sphere to the binding site.
Computational representation of binding site.
Crystal structures were downloaded from the PDB and inspected for completeness in
the binding site. All structures have a resolution better than 2.5 Å, except 1QMF
(2.8 Å), and have a co-crystallized ligand (peptide, small molecule, or substrate). PDB
structures were not modified, although the programs we use to calculate surface areas and
curvature ignore all heteroatoms and hydrogens (as is customary for biological surface
area calculations). Ligand binding sites were further filled in using MOE SiteFinder
alpha-spheres (version 2004.03; Chemical Computing Group, Montreal, Canada). Binding
sites were defined by atoms within 5.0 Å of the ligand or alpha spheres and trimmed at
the edges to define a contiguous binding site surface of approximately 300 Å².
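The distance-based selection step can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function name is hypothetical, alpha spheres are assumed to be represented by their centers, and the subsequent trimming to a contiguous surface of about 300 Å² is not reproduced.

```python
import math


def binding_site_atoms(protein_atoms, probe_points, cutoff=5.0):
    """Return indices of protein atoms within `cutoff` A of any probe point.

    protein_atoms: list of (x, y, z) coordinates of protein heavy atoms
    probe_points:  list of (x, y, z) ligand atoms and/or alpha-sphere centers
                   (using sphere centers is an assumption of this sketch)
    """
    selected = []
    for index, atom in enumerate(protein_atoms):
        # Keep the atom as soon as one probe point lies within the cutoff.
        if any(math.dist(atom, probe) <= cutoff for probe in probe_points):
            selected.append(index)
    return selected
```

The brute-force double loop is adequate for a single binding site; a spatial grid would be a natural refinement for whole-protein scans.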
Figure S2. Computational definition of binding site. The Delaunay tessellation of the
biomolecule is followed by definition of the alpha shape complex. Subtraction of the
alpha shape complex from the Delaunay tessellation results in identification of pockets.
The exact pocket of interest is identified by the list of binding site atoms.

Analytic representation of the macromolecular surface involves first constructing the
three-dimensional weighted Delaunay tessellation of the biomolecule and then
subtracting the alpha shape complex, as depicted in Figure S2 and described previously
(Liang et al. 1998; Edelsbrunner and Koehl 2003). We used the POCKET program
(Edelsbrunner and Koehl 2003) to generate the Delaunay and alpha shape complexes, and
we modified the program to output coordinates for all tetrahedra and alpha shapes.
Tetrahedra representing the protein pocket of interest are identified using the list of
binding site atoms, where tetrahedra are retained if all four vertices are defined in the
atom list. The largest set of connected non-alpha shape tetrahedra is retained as the
computational definition of the pocket. In the case shown in the right-most panel of
Figure S2, the process results in two tetrahedral clusters, and the smaller cluster (in
purple above the larger cluster) is removed. The main advantage of this approach for our
application is the accurate definition of a physically reasonable pocket surface. By
defining the pocket using the tetrahedra that fill the pocket, we only use the portions of
the surface that face into the pocket and directly participate in small-molecule binding.
Use of standard available software to perform a simple additive summation of the surface
areas of atoms in a binding site results in overestimation of binding surface areas by
about 40%, largely from contributions from the “lip” outer edge of the pocket, which
contributes minimally to binding affinity (A.C.C., R.G.C., unpublished observation). A
reasonable pocket definition is also essential to the curvature calculation detailed below.
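The tetrahedron selection described above can be sketched as a filter followed by a connected-component search. The function name is hypothetical, and two details are assumptions of this sketch rather than statements from the text: tetrahedra are taken to be connected when they share a triangular face, and the "largest" cluster is taken to be the one with the most tetrahedra.

```python
from collections import defaultdict
from itertools import combinations


def pocket_tetrahedra(tetrahedra, site_atoms):
    """Return the largest face-connected cluster of binding-site tetrahedra.

    tetrahedra: 4-tuples of atom indices (non-alpha-shape tetrahedra only)
    site_atoms: iterable of atom indices defining the binding site
    """
    site = set(site_atoms)
    # Retain a tetrahedron only if all four vertices are binding-site atoms.
    kept = [tuple(sorted(t)) for t in tetrahedra if set(t) <= site]

    # Map each triangular face to the tetrahedra that contain it.
    face_owners = defaultdict(list)
    for index, tet in enumerate(kept):
        for face in combinations(tet, 3):
            face_owners[face].append(index)

    # Two tetrahedra are neighbours when they share a face.
    neighbours = defaultdict(set)
    for owners in face_owners.values():
        for a, b in combinations(owners, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Depth-first search over components, keeping the largest one.
    seen, best = set(), []
    for start in range(len(kept)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            current = stack.pop()
            component.append(current)
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(component) > len(best):
            best = component
    return [kept[i] for i in sorted(best)]
```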
Surface area calculation
Solvent-accessible surface areas are calculated analytically using the collection of
tetrahedra representing the binding site. The Delaunay tessellation generates
non-overlapping tetrahedra that divide up the space of the pocket. Because the four corners
of each tetrahedron represent atom centers and atoms are represented as spheres, surface
areas for each atom present in the tetrahedron can be calculated as portions of spheres.
For the purpose of calculating the hydrophobic surface area, we define carbon and sulfur
atoms as hydrophobic, and nitrogen and oxygen atoms as polar.
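Given per-patch areas, the polar and non-polar bookkeeping reduces to a classification by element. The sketch below assumes each surface patch has already been assigned an element symbol and an area; the function name and the choice to reject other elements are illustrative assumptions.

```python
NONPOLAR_ELEMENTS = {"C", "S"}   # treated as hydrophobic in the text
POLAR_ELEMENTS = {"N", "O"}      # treated as polar in the text


def sasa_totals(patches):
    """Sum total and non-polar SASA from (element, area) pairs.

    Returns (total_area, nonpolar_area) in the same units as the input areas.
    """
    total = 0.0
    nonpolar = 0.0
    for element, area in patches:
        symbol = element.strip().upper()
        if symbol in NONPOLAR_ELEMENTS:
            nonpolar += area
        elif symbol not in POLAR_ELEMENTS:
            # Hydrogens and heteroatoms are expected to be excluded upstream.
            raise ValueError(f"unexpected element: {element!r}")
        total += area
    return total, nonpolar
```

The ratio of the two returned values is the non-polar fraction that enters the MAPPOD equation.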
Because spheres are overlapping, calculation of surface areas involves an
“inclusion-exclusion” algorithm developed by Liang et al. (1998). Finding the solvent
accessible (SA) surface area within one tetrahedron involves:
• adding the surface area of each SA sphere section that is within the tetrahedron,
• subtracting the surface area of the overlap between pairs of spheres within the
  tetrahedron, and
• adding back the surface area of the overlap between sets of three spheres within
  the tetrahedron.
If a set of four spheres overlaps, there is no accessible surface area within the tetrahedron.
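The three steps can be written compactly as an alternating sum. The symbols below are introduced only for this illustration: $A_i$ is the area of the SA sphere section of atom $i$ inside tetrahedron $T$, and $A_{ij}$ and $A_{ijk}$ are the areas attributed to the overlap of two and three spheres inside $T$, with indices running over the four vertices.

```latex
A^{\mathrm{SA}}_{T} \;=\; \sum_{i} A_{i} \;-\; \sum_{i<j} A_{ij} \;+\; \sum_{i<j<k} A_{ijk}
```

The binding-site SASA is then the sum of $A^{\mathrm{SA}}_{T}$ over all retained tetrahedra, with the four-sphere overlap case contributing zero as stated above.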
We have validated the surface areas calculated by our implementation with results from
NACCESS (Laskowski 1995) and POCKET (Edelsbrunner and Koehl 2003). Curves
representing boundaries between atoms are then calculated to allow reconstruction of the
surface. The solvent accessible surface is a union of sphere sections, while the molecular
surface additionally includes torus and sphere sections that map the reentrant surface (in
Figure S3, lines between the dark gray spheres). We do not calculate the molecular
surface area, but we need to generate the molecular surface for the curvature algorithm
detailed below.
Figure S3. Definition of solvent accessible surface,
molecular surface, and van der Waals surface. The two
light gray circles represent water probe spheres, and the
dark gray area represents the area occupied by protein
atoms.
Curvature algorithm
A natural way to measure protein surface curvature is to generate the least squares
fitted (LSF) sphere to a surface patch and use the radius as the curvature measure. While
the concept is simple, the sphere-fitting problem is not trivial and most known
approaches to protein surface curvature measurement use alternative approaches that are
arguably less straightforward in terms of a physical interpretation. We have previously
developed an approach to solve the LSF sphere problem by turning the sphere-fitting
problem into a solvable plane-fitting problem using a transformation known as geometric
inversion (Coleman et al. 2005). The approach works on any arbitrary surface patch, and
returns a radius of curvature that has direct physical interpretation. This radius of
curvature is the radius, r, used in the MAPPOD druggability equation.
Finding the best-fitted sphere to a surface requires simultaneous minimization of
distances and definition of a center given the restriction to a sphere. We use a
transformation known as geometric inversion in generating a least squares fit (LSF)
sphere to the binding site surface. An inversive sphere of radius $k$ can be defined for
any inversive point $(p, q, r)$, and all other points $(x_i, y_i, z_i)$ can be transformed around
the inversive sphere as follows (here $r$ is a coordinate of the inversive point, not the
radius of curvature):
$$x_i' = \frac{k^2 (x_i - p)}{(x_i - p)^2 + (y_i - q)^2 + (z_i - r)^2} + p$$
$$y_i' = \frac{k^2 (y_i - q)}{(x_i - p)^2 + (y_i - q)^2 + (z_i - r)^2} + q$$
$$z_i' = \frac{k^2 (z_i - r)}{(x_i - p)^2 + (y_i - q)^2 + (z_i - r)^2} + r$$
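A minimal sketch of this transformation follows; the function name is hypothetical. Applying the function twice with the same center and radius returns the original point, which offers a quick sanity check on any implementation.

```python
def invert(point, center, k=1.0):
    """Invert `point` about the sphere of radius k centered at `center`.

    Implements x' = k^2 (x - p) / |x - c|^2 + p for each coordinate.
    The inversion center itself has no finite image and is rejected.
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    dz = point[2] - center[2]
    dist_sq = dx * dx + dy * dy + dz * dz
    if dist_sq == 0.0:
        raise ValueError("the inversion center maps to infinity")
    scale = k * k / dist_sq
    return (center[0] + scale * dx,
            center[1] + scale * dy,
            center[2] + scale * dz)
```

For a unit inversion sphere at the origin, `invert((2.0, 0.0, 0.0), (0.0, 0.0, 0.0))` returns `(0.5, 0.0, 0.0)`.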
Because the inversive transformation is performed on points, a dot surface
representation of the molecular surface is required. A number of available programs can
be used to generate this dot surface, including the Connolly approach available from
biohedron.com (Connolly 1986) and one based on icosahedrons from the Honig lab
(Sridharan et al. 1992). Here, we painted each surface piece evenly with points using a
spiral dot placement algorithm and subsequently removed points outside the boundary
curves. These points can then be used to find the LSF sphere using the inversive
transformation. Depicted in Figure S4a and explained below is the two-dimensional case
of finding a least-squares fitted circle. The three-dimensional case that we care about for
fitting a binding site is depicted in Figure S4b, and involves an inversion sphere that is
used to transform points on the binding site surface to points that can be fitted to a plane
in inversive space.
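The text does not specify which spiral placement was used, so the sketch below shows one common choice, a golden-angle spiral, purely as an assumption. It covers a full sphere nearly evenly; restricting the dots to a surface piece would then require a membership test against the boundary curves, represented here by a hypothetical caller-supplied predicate.

```python
import math


def spiral_sphere_points(center, radius, count):
    """Place `count` points nearly evenly on a sphere using a golden-angle spiral."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(count):
        z = 1.0 - (2.0 * i + 1.0) / count      # evenly spaced heights in (-1, 1)
        ring = math.sqrt(1.0 - z * z)          # radius of the circle at height z
        theta = golden_angle * i               # rotate by the golden angle each step
        points.append((center[0] + radius * ring * math.cos(theta),
                       center[1] + radius * ring * math.sin(theta),
                       center[2] + radius * z))
    return points


def keep_inside(points, is_inside):
    """Drop points rejected by `is_inside`, a hypothetical boundary-curve test."""
    return [p for p in points if is_inside(p)]
```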
Figure S4. Inversive transformation used to generate the least-squares fitted sphere to a
binding site. a, In the two-dimensional case (used for clarity), a circle, C, in normal
space becomes a line, L’, in inverted space because it passes through the inversion circle
center. The dashed line represents the diameter of the circle of interest, C, in normal
space, and is the distance between the inversion circle center and the furthest point on
circle C. b, For a binding site surface patch in red, the transformation using an inversion
sphere results in points that can be fitted to a plane in inverted space. Transforming the
LSF plane back to normal space results in the candidate LSF sphere solution shown.
Since the transformation takes the inversion point (p, q, r) to infinity, this point
must be treated in practice as a special case. For purposes of measuring curvature we can
ignore this point since it receives a zero weighting in the LSF fit (see below). We use a
unit inversion sphere (k=1) here. We make use of the geometric inversion property of
being self-dual, meaning that a point transformed twice returns to the same point.
Another property we take advantage of is that inversion of points that lie on a sphere that
passes through the inversion point results in a set of points that lies on a plane (depicted
as L’ for the two-dimensional case in Figure S4a). It follows that a set of points that lie
approximately on a sphere, where the LSF sphere passes through the inversion point, will
lie near a plane under inversion. We use inversion to find an LSF sphere, where the fit is
determined by the sum of the smallest distances from the ideal sphere to each data point.
The best-fit line to points in two dimensions, or the best-fit plane to points in three
dimensions, can be determined by finding the smallest eigenvalue and corresponding
eigenvector of a symmetric positive semi-definite matrix (Pearson 1901). Under the
inversive transformation, the point on the fitted plane closest to the origin becomes the
furthest point from the origin when inverted back to normal space. The origin and this
furthest point define the diameter of the sphere, since they both lie on the sphere
(depicted as a dashed line in Figure S4a). Because the inverted space shifts the spatial
relationships between points, we use a weight of $d_i^4$ for the plane LSF, where $d_i$ is the
distance of each point $i$ from the inversion point in normal space.
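The properties used in this paragraph can be checked numerically. The sketch below, with hypothetical helper names, samples a sphere of radius 3 that passes through the origin, inverts the samples about a unit sphere at the origin, and confirms that the images lie on the plane $x = 1/(2 \cdot 3)$. One way to read the $d_i^4$ weight is that inversion scales lengths near a point at distance $d_i$ by about $k^2/d_i^2$, so squared residuals shrink by about $1/d_i^4$; this reading is our interpretation rather than a statement from the text.

```python
import math


def invert(point, center=(0.0, 0.0, 0.0), k=1.0):
    """Geometric inversion of `point` about a sphere of radius k at `center`."""
    delta = [point[i] - center[i] for i in range(3)]
    dist_sq = sum(c * c for c in delta)
    return tuple(center[i] + k * k * delta[i] / dist_sq for i in range(3))


def sample_sphere_through_origin(radius, count=50):
    """Points on the sphere centered at (radius, 0, 0), which touches the origin."""
    points = []
    for i in range(1, count + 1):
        polar = math.pi * i / (count + 1)    # strictly between 0 and pi
        azimuth = 2.4 * i                    # arbitrary spread around the x axis
        points.append((radius + radius * math.cos(polar),
                       radius * math.sin(polar) * math.cos(azimuth),
                       radius * math.sin(polar) * math.sin(azimuth)))
    return points


radius = 3.0
points = sample_sphere_through_origin(radius)
inverted = [invert(p) for p in points]
weights = [math.dist(p, (0.0, 0.0, 0.0)) ** 4 for p in points]   # d_i^4

# Every image shares the x coordinate k^2 / (2 * radius), so the set is planar.
plane_offset = 1.0 / (2.0 * radius)
max_deviation = max(abs(t[0] - plane_offset) for t in inverted)

# The plane point closest to the origin inverts back to the far end of the diameter.
far_point = invert((plane_offset, 0.0, 0.0))
```

Here `far_point` lies at a distance of twice the radius from the origin, so the sphere center is the midpoint between the origin and `far_point`.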
Making the reasonable assumption that the LSF sphere passes through at least one
of the data points, the set of surface points is then fitted around each surface point to
generate a set of possible solutions, and the fitted sphere with the least sum of squares is
kept as the best fitted sphere solution. The sphere radius is mathematically referred to as
the “radius of curvature”. To determine whether the surface is convex or concave, we
calculate the distance between the center of the LSF sphere and the relevant atom center,
and assign the surface as concave if the distance is less than the diameter of the LSF
sphere, and convex otherwise. The complete algorithm that includes the transformation,
plane fitting, and inversion about each point, is detailed in Listing 1 below, and further
details on the method and its validation are available in Coleman et al. (2005). All
algorithms described in these supplementary materials were implemented in Java, and use
Java3D libraries for vector math and JAMA libraries for solving eigenvalue problems.
The curvature used here is the “global” curvature, where a single sphere is fit to the
molecular surface. Our curvature calculation is intuitive but admittedly simple, and we
are investigating whether localized curvatures can improve the model.
1. Define the set of points P to which the least sum of squares sphere is fitted.
2. For each point p_i ∈ P:                                                        O(n)
   a. Let p_i be the inversion point (p, q, r) and let the points
      {x_i, y_i, z_i} be all other points in P.                                   O(1)
   b. Invert {x_i, y_i, z_i} using the inversion defined in the methods
      to generate points t_i.                                                     O(n)
   c. Find the least sum of squares plane fit to the points t_i.                  O(n)
   d. Find the point on the plane closest to p_i. Call this point a.              O(1)
   e. Transform a using the inversion defined in the methods to generate a'.      O(1)
   f. Define the sphere center, c_i, as the average of p_i and a'.                O(1)
   g. Define the radius for the sphere given center c_i.                          O(1)
   h. If the least sum of squares is lower than the previous best fit,
      keep c_i and the radius.                                                    O(n)
3. Output the best found center and radius.
Listing 1. Listing of the algorithm developed for finding the least squares-fitted sphere
to a set of points. Algorithmic complexity for each step is given on the right.
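The steps of Listing 1 can be sketched in Python as follows. This is an illustrative reading, not the Java implementation described above: all function names are hypothetical, a small Jacobi rotation routine stands in for the JAMA eigenvalue solver, the radius in step g is taken as the distance from the center to the inversion point, and the input points are assumed to be distinct.

```python
import math


def invert(pt, center, k=1.0):
    """Geometric inversion of pt about a sphere of radius k at center."""
    d = [pt[i] - center[i] for i in range(3)]
    d2 = sum(c * c for c in d)
    return [center[i] + k * k * d[i] / d2 for i in range(3)]


def smallest_eigenvector(m):
    """Unit eigenvector for the smallest eigenvalue of a symmetric 3x3 matrix,
    found by cyclic Jacobi rotations (a stand-in for a library eigensolver)."""
    a = [row[:] for row in m]
    v = [[float(i == j) for j in range(3)] for i in range(3)]
    for _ in range(30):
        for p, q in ((0, 1), (0, 2), (1, 2)):
            if a[p][q] == 0.0:
                continue
            diff = a[q][q] - a[p][p]
            theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2 * a[p][q] / diff)
            c, s = math.cos(theta), math.sin(theta)
            for mat in (a, v):                 # column update: M <- M J
                for k in range(3):
                    mp, mq = mat[k][p], mat[k][q]
                    mat[k][p], mat[k][q] = c * mp - s * mq, s * mp + c * mq
            for k in range(3):                 # row update: A <- J^T A
                ap, aq = a[p][k], a[q][k]
                a[p][k], a[q][k] = c * ap - s * aq, s * ap + c * aq
    idx = min(range(3), key=lambda i: a[i][i])
    return [v[k][idx] for k in range(3)]


def fit_plane(points, weights):
    """Weighted least-squares plane; returns (point_on_plane, unit_normal)."""
    wsum = sum(weights)
    mean = [sum(w * p[i] for p, w in zip(points, weights)) / wsum for i in range(3)]
    scatter = [[sum(w * (p[i] - mean[i]) * (p[j] - mean[j])
                    for p, w in zip(points, weights))
                for j in range(3)] for i in range(3)]
    return mean, smallest_eigenvector(scatter)


def lsf_sphere(points):
    """Try every point as the inversion point and keep the best-fitting sphere."""
    best = None
    for i, p in enumerate(points):
        others = [x for j, x in enumerate(points) if j != i]
        weights = [math.dist(x, p) ** 4 for x in others]        # d_i^4 weighting
        inverted = [invert(x, p) for x in others]               # step b
        mean, normal = fit_plane(inverted, weights)             # step c
        offset = sum(normal[k] * (mean[k] - p[k]) for k in range(3))
        if abs(offset) < 1e-12:      # plane through p: sphere of infinite radius
            continue
        a = [p[k] + offset * normal[k] for k in range(3)]       # step d
        a_back = invert(a, p)                                   # step e
        center = [(p[k] + a_back[k]) / 2 for k in range(3)]     # step f
        radius = math.dist(center, p)                           # step g (assumed)
        sse = sum((math.dist(x, center) - radius) ** 2 for x in points)
        if best is None or sse < best[0]:                       # step h
            best = (sse, center, radius)
    if best is None:
        raise ValueError("no finite sphere could be fitted")
    return best[1], best[2]
```

Each pass through the loop costs O(n), so the sketch runs in O(n²) overall, in line with the per-step complexities given in Listing 1.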
References
1. Coleman, R.G., Burr, M.A., Souvaine, D.L. & Cheng, A.C. An intuitive approach to
measuring protein surface curvature. Proteins 61, 1068–1074 (2005).
2. Connolly, M.L. Measurement of protein surface shape by solid angles. J. Mol.
Graphics 4, 3–6 (1986).
3. Edelsbrunner, H. & Koehl, P. The weighted-volume derivative of a space-filling
diagram. Proc. Natl. Acad. Sci. U.S.A. 100, 2203–2208 (2003).
4. Laskowski, R.A. SURFNET: A program for visualizing molecular surfaces, cavities,
and intermolecular interactions. J. Mol. Graph. 13, 323–330 (1995).
5. Liang, J., Edelsbrunner, H., Fu, P., Sudhakar, P.V. & Subramaniam, S. Analytical shape
computation of macromolecules: I. Molecular area and volume through alpha shape.
Proteins 33, 1–17 (1998).
6. Pearson, K. On lines and planes of closest fit to systems of points in space. The
Philosophical Magazine 2, 559–572 (1901).
7. Sridharan, S., Nicholls, A. & Honig, B. A new vertex algorithm to calculate solvent
accessible surface area. Biophys. J. 61, A174 (1992).