Segmentation and Interpretation of 3D Protein Images

From: ISMB-94 Proceedings. Copyright © 1994, AAAI (www.aaai.org). All rights reserved.
Laurence Leherte*
Laboratoire de Physico-Chimie Informatique
Facultes Universitaires Notre-Dame de la Paix
Rue de Bruxelles, 61, B-5000 Namur, Belgium
leherte@scf.fundp.ac.be
fax: 32-81-72.45.30
Kim Baxter, Janice Glasgow, Suzanne Fortier
Departments of Computing and Information Science and Chemistry
Queen's University
Kingston, Ontario, Canada K7L 3N6
baxter@qucis.queensu.ca; janice@qucis.queensu.ca; fortiers@qucdn.queensu.ca
fax: (613) 545-6513
Abstract
The segmentation and interpretation of three-dimensional images of proteins is considered. A topological approach is used to represent a protein structure as a spanning tree of critical points, where each critical point corresponds to a residue or the connectivity between residues. The critical points are subsequently analyzed to recognize secondary structure motifs within the protein. Results of applying the approach to ideal and experimental images of proteins at medium resolution are presented.
Introduction
Modern crystallographic studies are at the forefront of current efforts to characterize and understand molecular structures and molecular recognition processes. The information derived from such studies provides a precise and detailed depiction of a molecular scene, an essential starting point for unraveling the complex rules of structural organization and molecular interactions in biological systems. However, despite recent technological advances, protein structure determination remains a lengthy and complex task. As a result, only a small fraction of the currently known proteins have been fully characterized.
*The authors thank C.K. Johnson for sharing the ORCRIT program and for many helpful discussions. They also thank M. Fraser for providing experimental data. The research described in this paper has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the National Belgian Foundation for Scientific Research (FNRS), IBM Belgium, Facultes Universitaires Notre-Dame de la Paix, the NATO Scientific Division and the Cambridge Crystallographic Data Center.
The determination of crystal structures from their
diffraction data belongs to the general class of image
reconstruction exercises from incomplete and/or noisy
data. In the case of protein structures, a major hurdle in the image reconstruction process is the so-called
"phase problem", i.e., the extraction of phase information from the measured experimental data. Current
solutions to this problem rely on gathering extensive
experimental data and on considerable input from experts during the image interpretation process.
The goal of the research described in this paper is to facilitate the image reconstruction processes for protein crystals. Towards this goal, techniques from artificial intelligence, machine vision and crystallography are being integrated in a computational approach to the interpretation of electron density maps of proteins. Crucial to this interpretation process is the ability to locate and identify meaningful features of a protein structure at multiple levels of resolution. This requires a simplified representation of a structure, one that preserves shape, connectivity and distance information. In the proposed approach, molecular scenes are represented as three-dimensional (3D) spanning trees, where nodes of the tree correspond to critical points (peaks and passes) in the image data. The methodology is currently being applied to electron density maps of proteins at medium (~ 3 Å) resolution. For such images, the critical points correspond to amino acid residues (peaks) and their adjacency in the primary sequence (passes). Initial results suggest that at medium resolution the electron density maps can successfully be segmented into protein and solvent regions, into main and side chains, and into individual residues along the main chain. Furthermore, algorithms have been developed to analyze the spanning trees so as to determine secondary structure motifs in the molecule.
The paper presents an overview of the protein structure determination problem in the context of scene
analysis in machine vision. The processes of segmentation and recognition of secondary structure motifs
in spanning tree representations of proteins are also
described, along with some preliminary experimental
results. The paper concludes with a discussion of ongoing research in the area.
Analysis of Visual Scenes
Research in machine vision has long been concerned
with the problems involved in automatic image interpretation. Marr (1982) defines computational vision
as "the process of discovering what is present in the
world, and where it is". Similar to visual scene analysis, molecular scene analysis is concerned with the
processes of reconstruction, classification and understanding of complex images. Such analyses rely on the
availability of a priori information, in the form of structural templates and in the form of rules and heuristics,
to locate and identify features in a scene. This section
presents the problem of molecular scene analysis in the
context of related research in machine vision.
Early vision systems consist of a set of processes that
determine physical properties of three-dimensional surfaces from two-dimensional (2D) arrays. These arrays
contain pixel values that denote properties such as light
intensity, tissue density, depth, etc. Unlike input for
the vision problem, the crystallographic experiment
can yield 3D data which allow for the construction of
a 3D array of voxels (volume elements). Each voxel
contains a value representing the height of the electron
density distribution function at the given location.
A 3D image of the atomic arrangement in a crystal
is readily accessible for small molecules from data generated using X-ray diffraction techniques. Given the
magnitudes of the diffracted waves and prior knowledge about the physical behavior of electron density
distributions, probability theory is applied to retrieve
phase information. Once magnitudes and phases are
known, the spatial arrangement of the atoms within
the crystal can be obtained using a Fourier transform
procedure. The function that is obtained, ρ(r), is a
scalar field visualized as a 3D grid of real values (electron density map) in which high density centers are
associated with atoms.
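The Fourier step described above can be illustrated with a minimal sketch. It assumes a toy 8x8x8 grid with two point-like "atoms" and uses NumPy's FFT as a stand-in for a crystallographic Fourier synthesis; the function name and the toy data are hypothetical and are not part of any program mentioned in this paper. The sketch shows only that magnitudes together with phases recover the map, whereas magnitudes alone do not.

```python
import numpy as np

def density_from_structure_factors(magnitudes, phases):
    """Combine magnitudes and phases (radians) into complex structure
    factors and invert them to a real-valued density grid."""
    structure_factors = magnitudes * np.exp(1j * phases)
    # The inverse FFT plays the role of the Fourier synthesis; for a
    # consistent set of factors the imaginary part is numerically zero.
    return np.fft.ifftn(structure_factors).real

# Toy "crystal" (illustrative data): two point-like atoms on a small grid.
rho_true = np.zeros((8, 8, 8))
rho_true[2, 3, 4] = 6.0
rho_true[5, 1, 6] = 8.0

# Forward transform: density -> complex factors -> magnitudes and phases.
F = np.fft.fftn(rho_true)
magnitudes, phases = np.abs(F), np.angle(F)

# With both magnitudes and phases the map is recovered.
rho_map = density_from_structure_factors(magnitudes, phases)
peak = np.unravel_index(np.argmax(rho_map), rho_map.shape)

# Without phases (all set to zero) the map is not recovered.
rho_no_phase = density_from_structure_factors(magnitudes, np.zeros_like(phases))
```

In a real experiment only the magnitudes are measured, which is why the phases must be estimated before such a synthesis can be carried out.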
For proteins, the construction of a 3D image from the diffraction data is much more complex and time-consuming. It usually involves many iterations of calculation, map interpretation and model building, which rely extensively on input from an expert. It has been suggested, however, that the process could be significantly enhanced by combining mathematical and AI strategies, and rephrased as a hierarchical and iterative scene analysis exercise (Fortier et al. 1993). The goal of the exercise would be to reconstruct and interpret images of progressively higher resolution. Thus, in an initial low resolution map, where the protein appears as a simple object outlined by its molecular envelope, the goal would be to locate and identify protein and solvent regions. At medium resolution, where the protein appears as a more complex object, the goal would be to locate and identify main and side chains, recognize secondary structure motifs and possibly locate individual residues along the main chain. At higher resolution, the analysis would attend to the identification of amino acid residues and, possibly, the location and identification of individual atoms.
A primary step in low level scene analysis is to automatically partition (segment) an image into disjoint regions that can be given a symbolic description. Ideally,
each region will correspond to a semantically meaningful component or object of the scene. These parts can
be used as input to a high level recognition task. The
nature of the partition and symbolic description depend on the type of processing to be applied. When
model-based recognition is the next step in the analysis, the description should be in a form that is easily
comparable with models in the database. The quality of the final output is dependent on the quality of
the segmentation. Although these processes may appear sequential - first segmentation then recognition
- in practice they are often interdependent. General
purpose, domain independent segmentation techniques
may be a necessary first step, but domain knowledge,
in the form of a partial interpretation, is often useful
for assessing and guiding further segmentation.
Several approaches to image segmentation have been considered in the vision literature.¹ Thresholding has proven effective for separating a small number of objects from a contrasting background, while edge detection has been used to separate regions by locating differences between the regions. One operator used in the latter approach is the zero-crossing of the second derivative (Marr 1982). In region extraction, segmentation is carried out by determining similarity within a region. Typically, a seed region is chosen, and then expanded by adding adjoining similar regions. Topological approaches have been used to provide initial estimates for segmentation in range images and in some medical applications. Besl and Jain (1986) apply a topological approach which evaluates the surface curvature and sign of the Gaussian for each point on the surface of range images, and uses the derived primitives (peak, pit, ridge, saddle ridge,...) to perform the initial splitting. Gauch and Pizer (1993) identify ridges and valley bottoms in 2D images (a ridge is defined as a point where intensity falls off sharply in two directions, a valley bottom is a point where the intensity increases sharply in two directions) and follow their behaviour through scale space. As the resolution is reduced with Gaussian blurring, ridges and valleys are annihilated; the resulting hierarchy can be used for several analysis tasks including segmentation.
¹ See (Arman & Aggarwal 1993; Pal & Pal 1993) for a detailed overview of these approaches.
As will be discussed in the next section, a topological approach is being used in the segmentation and recognition of molecular scenes. Similar to the approach of Gauch and Pizer, critical points are used to delineate a skeletal image of a protein and segment it into meaningful parts (secondary structure, residues, atoms, etc.). These critical points are analyzed (using domain rules) to aid in the recognition of the segmented parts. This approach has some similarity with the skeletonization method which has been described by Hilditch (1969), and applied in protein crystallography by Greer (1974). However, unlike Greer's algorithm, which "thins" an electron density map to a set of connected points that trace the main and secondary chains of the molecule, the proposed representation preserves the original volumetric shape information by retaining the curvatures of electron density at the critical points. A methodology for outlining the envelope of a protein molecule in its crystallographic environment has previously been proposed by Wang (1985), while Jones et al. (1991) have achieved significant advances in approaches for the interpretation of medium to high resolution protein maps.
In summary, the analysis of molecular scenes can be considered in the general class of scene analysis problems. However, the representation, segmentation and recognition of molecular images differ from vision applications in a number of ways. Most significantly, diffraction data are often 3D in nature, which simplifies or eliminates many of the problems faced in low level vision (e.g., occlusion, shading). The complexity that does exist in the crystallographic domain relates to the incompleteness of data due to the phase problem.
Analysis of Electron Density Maps
In the development of a computational methodology for the analysis of protein structures, methods from machine vision and crystallography were considered. Among the methods studied, the topological approach seemed the most natural way to capture the fluctuations of the density function ρ(r) in the molecular image. In this section we overview a methodology that transforms a three-dimensional electron density map into a spanning tree of critical points that trace the main chain of the protein structure. Experimental results from applying the approach to the segmentation and interpretation of medium resolution maps are also presented.
Representation of Protein Structures
The topological approach to protein image interpretation is based on the representation of a scene in terms of the critical points of the electron density function, i.e., the points where the gradient of ρ(r) vanishes. At such points, maxima and minima are defined by computing second derivatives which adopt negative or positive values respectively. For a 3D function, three principal second derivatives, or eigenvalues, are computed at each position vector r. Four possible cases are considered depending upon the number of negative eigenvalues, nE. When nE = 3, the critical point corresponds to a local maximum or peak; a point where nE = 2 is a saddle point or pass. nE = 1 corresponds to a saddle point or pale, while nE = 0 characterizes a pit. The use of critical point mapping as a method for analyzing protein electron density maps was first proposed by Johnson (1977b) and later used in Crysalis (Terry 1983), an expert system designed for the automated interpretation of high resolution protein electron density maps. Within the framework of the Molecular Scene Analysis project (Fortier et al. 1993; Glasgow, Fortier, & Allen 1993), the topological approach is being extended for the analysis of medium and low resolution maps of proteins.
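The four cases above can be sketched as a small classification routine. This is a minimal illustration, assuming the 3x3 matrix of second derivatives (Hessian) at a critical point is already available; the function name is hypothetical and the sketch is not taken from ORCRIT. Degenerate points with zero eigenvalues are not treated.

```python
import numpy as np

# Number of negative eigenvalues (nE) -> type of critical point.
LABELS = {3: "peak", 2: "pass", 1: "pale", 0: "pit"}

def classify_critical_point(hessian):
    """Classify a critical point from the symmetric 3x3 matrix of second
    derivatives of the density, by counting negative eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    n_negative = int(np.sum(eigenvalues < 0))
    return LABELS[n_negative]

# Illustrative Hessians (diagonal for readability).
example_peak = classify_critical_point(np.diag([-1.0, -2.0, -3.0]))
example_pass = classify_critical_point(np.diag([-1.0, -2.0, 1.0]))
```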
Topological analysis has been implemented by Johnson in the computer program ORCRIT (Johnson 1977a). By first locating and then connecting the critical points, this program generates a graph representation for an electron density map of a protein. The occurrence probability of a connection between two critical points i and j is determined by following the density gradient vector ∇ρ(r). For each pair of critical points, the program calculates a weight wij, which is inversely proportional to the occurrence probability of the connection. The collection of critical points and their linkage is represented as a set of minimal spanning trees (connected acyclic graphs of minimal weight). In the earlier Crysalis project, the ORCRIT program was used to segment a high resolution electron density map into critical points where peaks correspond to the location of atomic parts and passes correspond to the bonds between atoms. More recently, we have determined that at medium resolution peaks correspond to amino acid residues along the main chain of the protein and passes to the connectivity determined by the primary structure of the protein (Leherte et al. 1994). As illustrated in Figure 1, the topological approach produces a skeleton of a protein backbone as a sequence of alternating peaks (solid circles) and passes (open circles), where each peak is associated with one residue of the protein. For larger residues, the side chains are also included in the tree.
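The idea of a set of minimal spanning trees built from connection weights can be sketched as follows. The sketch uses Kruskal's algorithm with a union-find structure and takes the weight as the reciprocal of an assumed connection probability; both choices are illustrative assumptions, since the construction actually used inside ORCRIT is not described here. The function name and the example probabilities are hypothetical.

```python
def minimal_spanning_forest(n_points, connections):
    """Kruskal-style sketch. `connections` is an iterable of
    (i, j, probability) triples; the weight of a connection is taken
    as 1 / probability (an assumption for illustration)."""
    parent = list(range(n_points))

    def find(x):
        # Find the representative of x, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    weighted = sorted((1.0 / p, i, j) for i, j, p in connections if p > 0)
    tree = []
    for w, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:          # accept the edge only if it closes no cycle
            parent[ri] = rj
            tree.append((i, j, w))
    return tree

# Four critical points; the unlikely 0-2 connection would close a cycle.
edges = [(0, 1, 0.9), (1, 2, 0.8), (0, 2, 0.1), (2, 3, 0.7)]
tree = minimal_spanning_forest(4, edges)
```

Points that are never joined by an accepted connection stay in separate trees, so the result is in general a forest rather than a single tree.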
It should be noted that the electron density distribution function is a smooth function with no sudden changes. Its zero-crossings are detected by locating the points where the gradient, or first derivative, of the function vanishes. The second derivatives provide information on the characteristics of the zero-crossings and, in particular, identify whether they are peaks, passes, pales or pits. In 2D images, such as those considered by Marr (1982), sudden changes in intensity are present. They give rise to a peak or a pit in the first derivative and, therefore, the contours of images are detected at points where the second derivative vanishes.
Figure 1: Planar representation of the critical point spanning tree for protein structure 1RNT.
Segmentation of Electron Density Maps
This section presents experimental studies that have been carried out on electron density maps at 3 Å resolution. Computations were first performed on calculated maps reconstructed from available structural data in order to generate a procedure for the further analysis of experimental maps. Three protein structures, Phospholipase A2 (1BP2), Ribonuclease T1 complex (1RNT) and Trypsin inhibitor (4PTI), retrieved from the Brookhaven Protein Databank (PDB) (Bernstein et al. 1977), were considered. These structures are composed of 123, 104 and 53 residues, respectively. The electron density maps for the proteins were constructed using the XTAL program package (Hall & Stewart 1990), and were then analyzed using ORCRIT.
High density peaks and passes were the only critical points considered in this study. Low density critical points are less significant since the electron density distribution is modulated by experimental noise and/or errors due to the fast Fourier transform process. In addition, the analysis levels some low density points (those with negative values) to zero. High density peaks and passes were considered by imposing a cut-off value below which the critical point search procedure is not applied.
The results obtained from the analysis of the three calculated density maps led to the following observations:
• The main branch of the spanning tree traces out the backbone of the protein molecule.
• Each peak of the main branch of the tree is associated with a single residue of the primary sequence for the protein. Furthermore, the peaks are located close to the CαCO centres of charge for the residue.
• Side chains are often observable, particularly for the larger residues. These chains are represented as side branches that link to the main branch of the spanning tree.
The result of applying the ORCRIT program is thus a partitioning of the electron density map into two main regions: the protein region represented by a chain of connected critical points, and a solvent region which is characterized by low density values and non-connected critical points.
As was illustrated in Figure 1, the ideal critical point representation of a protein at medium (3 Å) resolution can be depicted as a tree composed of a long principal branch built on alternating peaks and passes with small side branches jutting out of it. In practice, however, such a representation may include some errors originating from the presence of connections between critical points associated with non-adjacent residues. Figure 2 presents a comparison between the backbone of a protein structure and its critical point representation. In the constructed main chain of the spanning tree, jumps or bridges occur because of the presence of disulfide bridges (S-S), heteroatoms (Ca++), or bonds between close residues. These connections can often be detected by applying further analysis to the critical points.
² Further details of the experimental process are reported in (Leherte et al. 1994).
Neglecting the passes located between peaks, geometrical parameters were computed for short fragments composed of four adjacent peaks in the main branch of the spanning trees. Before carrying out this geometrical analysis, some preprocessing work was done in order to fit the spanning trees to the ideal model described above. Symmetry-coincident critical points were removed. Distances were computed for sets of adjacent peaks, and peaks separated by a distance smaller than 1.95 Å were merged into a single point. The critical point linkage was then checked: if two adjacent peaks were separated by a distance of < 7 Å then the peaks were assumed to be connected. Considering three peaks at a time, if the distance between the first and third peak was larger than 4 Å, then the middle peak was considered to be a side chain peak.
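Two of the preprocessing steps above, merging of very close peaks and the linkage check, can be sketched as follows. The thresholds (1.95 Å and 7 Å) come from the text; replacing two merged peaks by their midpoint and walking along the chain in order are assumptions made for this illustration, as are the function names and coordinates.

```python
import numpy as np

MERGE_DISTANCE = 1.95  # Å, peaks closer than this are merged
LINK_DISTANCE = 7.0    # Å, adjacent peaks closer than this are connected

def merge_close_peaks(peaks, threshold=MERGE_DISTANCE):
    """Walk along a sequence of peak coordinates and merge each peak into
    the previous one when they are closer than `threshold`. The merged
    position is the midpoint (an assumption, not stated in the text)."""
    merged = [np.asarray(peaks[0], dtype=float)]
    for p in peaks[1:]:
        p = np.asarray(p, dtype=float)
        if np.linalg.norm(p - merged[-1]) < threshold:
            merged[-1] = (merged[-1] + p) / 2.0
        else:
            merged.append(p)
    return merged

def linked(a, b, threshold=LINK_DISTANCE):
    """Adjacent peaks are assumed connected if closer than `threshold`."""
    distance = np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return bool(distance < threshold)

# Hypothetical coordinates (Å): the first two peaks are 1 Å apart.
peaks = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.8, 0.0, 0.0), (13.0, 0.0, 0.0)]
merged = merge_close_peaks(peaks)
links = [linked(merged[k], merged[k + 1]) for k in range(len(merged) - 1)]
```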
A statistical analysis of the geometry of critical point sequences further showed that the most useful parameters for the identification of helices and extended motifs (β-sheet segments) were the torsion angles and the distances between peaks Pi and Pi+3, while bond angle values were less discriminating. In the next section we discuss how these criteria were used to determine secondary structure motifs in a protein.
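The two discriminating parameters named above can be computed for a fragment of four consecutive peaks as sketched below. The torsion angle uses the standard atan2 formulation of a dihedral angle; the function names and the test coordinates are hypothetical, and the sign convention (clockwise positive when looking from the second point towards the third) is an assumption, since the convention used in the study is not stated.

```python
import numpy as np

def torsion_angle(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0, b1, b2 = p1 - p0, p2 - p1, p3 - p2   # successive "bond" vectors
    n1 = np.cross(b0, b1)                    # normal to the first plane
    n2 = np.cross(b1, b2)                    # normal to the second plane
    x = np.dot(n1, n2)
    y = np.linalg.norm(b1) * np.dot(b0, n2)
    return float(np.degrees(np.arctan2(y, x)))

def distance_1_4(p0, p3):
    """Distance between the first and fourth peak of a fragment."""
    return float(np.linalg.norm(np.asarray(p3, dtype=float) - np.asarray(p0, dtype=float)))

# Illustrative fragment with a right-angle twist about the z axis.
fragment = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
angle = torsion_angle(*fragment)
d14 = distance_1_4(fragment[0], fragment[3])
```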
Secondary Structure Recognition
From our experiments on ideal electron density maps of proteins it was concluded that the topological approach was able to segment the protein structure into main and side chains and capture the conformation of its main chain. The recognition of secondary structure features from medium resolution electron density maps could thus be based on pattern matching of the critical point networks with templates of critical point networks for idealized secondary structure motifs. A set of IF-THEN rules was derived that compares the angles and distances for an uninterpreted spanning tree of critical points with those derived from the previously determined maps. Table 1 summarizes the geometrical parameters that form the basis of the rules applied for the classification of protein segments.
The parameters in Table 1 provide a basis for the calculation of measures that represent the quality of fit between a critical point segment and a helical or sheet motif. Two degrees of belief for a critical point c can be calculated: the degree of belief that critical point c belongs to a β-sheet (dbs(c)), and the degree of belief that c belongs to a helix (dbh(c)). A degree of belief falls between the values 0 and 100, where the larger values denote greater confidence in the classification. An ideal sequence of critical points (either depicting a helix or β-sheet segment) would be characterized by a sequence of large belief measures (dbh or dbs respectively). For example, a sequence <26, 53, 80, 100, ..., 100, 80, 53, 26> of dbh values³ would denote an idealized helix.

Geometrical parameter      Helix      β-sheet
Torsion angle (degrees)    30-90      |110-180|
Distance 1-4 (Å)           4.4-6.3    > 6.9
Bond angle (degrees)       60-110     > 90

Table 1. Ranges of angle and distance values considered for the identification of secondary structure motifs in critical point spanning trees.
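The ranges of Table 1 can be turned into simple IF-THEN tests, as in the sketch below. It uses only the torsion angle and the 1-4 distance, which were found to be the most discriminating parameters, and it reads the β-sheet torsion range as an absolute value between 110 and 180 degrees; that reading, the function names and the "unassigned" label are assumptions. The sketch does not compute the degrees of belief dbh and dbs, whose formula is not given here.

```python
def matches_helix(torsion, distance_1_4):
    """Helix ranges from Table 1: torsion 30-90 degrees, distance 4.4-6.3 Å."""
    return 30.0 <= torsion <= 90.0 and 4.4 <= distance_1_4 <= 6.3

def matches_sheet(torsion, distance_1_4):
    """Sheet ranges from Table 1, assuming the torsion range applies to
    the absolute value of the angle: |torsion| 110-180, distance > 6.9 Å."""
    return 110.0 <= abs(torsion) <= 180.0 and distance_1_4 > 6.9

def classify_fragment(torsion, distance_1_4):
    """Label a four-peak fragment from its torsion angle and 1-4 distance."""
    if matches_helix(torsion, distance_1_4):
        return "helix"
    if matches_sheet(torsion, distance_1_4):
        return "sheet"
    return "unassigned"

# Hypothetical fragments: (torsion in degrees, 1-4 distance in Å).
labels = [classify_fragment(t, d) for t, d in [(60.0, 5.5), (-150.0, 10.0), (100.0, 6.5)]]
```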
An additional test, which considers the environment of the extended system, can be applied to the recognition of β-sheets. Each pair of segments is considered in this test. When a pair of segments is characterized by at least three pairs of peaks having interdistances ranging between 4 and 7 Å, then the level of belief that they are parts of a β-sheet is increased.
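The pairing test can be sketched by counting, for two segments, the peak pairs whose separation lies between 4 and 7 Å. Counting every cross-segment pair (so that one peak may take part in several pairs) is an assumption of this sketch, as are the function names and the coordinates; the amount by which the level of belief is raised is not specified in the text and is therefore not modelled.

```python
import numpy as np

def count_close_pairs(segment_a, segment_b, low=4.0, high=7.0):
    """Count pairs (one peak from each segment) whose distance lies in
    the closed interval [low, high], in Å."""
    a = np.asarray(segment_a, dtype=float)
    b = np.asarray(segment_b, dtype=float)
    # Broadcasting gives the full matrix of pairwise distances.
    distances = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return int(np.sum((distances >= low) & (distances <= high)))

def supports_sheet(segment_a, segment_b, min_pairs=3):
    """True when at least `min_pairs` peak pairs fall in the 4-7 Å range."""
    return count_close_pairs(segment_a, segment_b) >= min_pairs

# Two hypothetical extended segments running side by side, 5 Å apart,
# and a third one far away.
seg_a = [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0), (7.0, 0.0, 0.0)]
seg_b = [(0.0, 5.0, 0.0), (3.5, 5.0, 0.0), (7.0, 5.0, 0.0)]
seg_c = [(0.0, 20.0, 0.0), (3.5, 20.0, 0.0), (7.0, 20.0, 0.0)]
n_pairs = count_close_pairs(seg_a, seg_b)
```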
The results of the application of the above rules to the protein 1BP2 are illustrated in Figure 3. In this figure, the horizontal axis represents the peak sequences aligned next to each other, while the degree of belief values resulting from the application of the helix (dbh) and β-sheet (dbs) segment recognition rules are reported along the vertical axis. Figure 3 demonstrates that, when a peak sequence effectively corresponds to an existing helix (H), the rules yield degrees of belief larger than 46. The locations corresponding to turns (T) have lower confidence levels. All β-sheet segments found to be parallel (S) were effectively associated with a β-sheet.
Application to an Experimental Map of Penicillopepsin
The proposed approach has also been applied to an experimental map of penicillopepsin, which was calculated using the Groningen BIOMOL crystallographic program package. In a previous paper (Leherte et al. 1994) we reported results that were derived by considering the peaks as independent objects, i.e., the success rate of the recognition method was estimated by considering percentages of correctly identified peaks. The results indicated that 82% of the identified peaks were correctly recognized. However, secondary structure motifs are built on sequences of residues (or peaks). It is thus important to also consider the degree of belief associated with the neighbouring peaks. Using the previously described geometry-based rules, 59 probable β-sheet segments were obtained. The results were grouped into five classes, based on various combinations of the following conditions:
• c1: the segment under study has at least one dbs value equal to 100;
• c2: the segment is parallel to another one; and
• c3: the sequence of the dbs values follows the values reported in the previous section.
Figure 2: Perspective view depicting the superimposition of the Cα chain for protein structure 1BP2 (solid line) and the main chain of the corresponding spanning tree (dashed line).
³ Note that the maximum degree of belief for end points of a segment is less than 100. This is because there are fewer torsion angles that can be measured to raise the confidence.
Table 2 summarizes the experimental results for the
five classes considered.
The first class (c1 ∧ c2 ∧ c3) reported on in Table 2 yielded accurate results. All the segments are effectively associated with a real β-sheet. However, it is observed that the percentage of correctly identified peaks (success rate) is not 100%. This is due to the fact that most of the recognized β-sheet segments are usually shifted by one residue with respect to the definition given in the PDB file.⁵ It is thus concluded that at medium resolution the results of a fully successful recognition procedure lead to a success rate of about 90% when compared to secondary structure assignment results obtained at high resolution.
⁵ The shift of the segment by one residue is due to the ambiguity in recognizing the extremity points: they may be part of two different possible secondary structure motifs.
In the second class, only one segment does not correspond to a real β-sheet segment. Its maximum dbs value is equal to 53. The success rate of this class is still impressive (86%), but is characterized by a large variation. Not surprisingly, the first two classes, which correspond to the highest degrees of belief, also involve the longest segments. The results worsen when either condition c2 or condition c3 is not observed. The most important condition appears to be the parallelism criterion. Effectively, in the third class where parallelism alone is considered, 67% of the peaks are correctly identified. Unfortunately, only three occurrences were observed for this class, so the statistics are not as reliable as for the previous classes. For the fourth class, in which condition c3 alone is considered, only 57% of the peaks are correctly identified. In the last class, the three segments which are really parts of a β-sheet contain only two peaks. They are at the origin of the 30% success rate. Such small segments do not occur in any of the other four classes.
Due to their low occurrence frequency and size in the penicillopepsin structure, the number of recognized helical motifs was quite low. The application of the recognition procedure led to the identification of 8 helical segments, all characterized by dbh values less than 100. However, as concluded from the analysis of reconstructed maps, the 6 segments having a dbh value larger than 46 reflected the presence of a real helix.
Figure 3: Helix degree of belief (dbh) and β-sheet degree of belief (dbs), plotted against peak number, calculated from the application of the secondary structure recognition rules to the peak sequences obtained from the topological analysis of the reconstructed maps of protein 1BP2 at 3 Å resolution using ORCRIT.
In this experiment, 5 of the 32 β-sheet segments and 5 of the 10 helices were not discovered. Jumps in the critical point sequences were responsible for the non-detection of 4 motifs (2 sheet segments and 2 helices), and breaks for the non-detection of 6 motifs (3 sheet segments and 3 helices).
Discussion
It was reported in this paper that the topological approach can effectively segment medium resolution electron density maps of proteins. Furthermore, it was shown that secondary structure motifs could be recognized in the map through the use of simple geometry-based rules. The application of these rules yields a measure of the degree of confidence in the recognition of a given motif. This is important since the proposed methodology can serve not only as an aid to expert crystallographers in their interpretation and model building tasks, but also more actively in the structure determination process. Required levels of confidence would clearly depend on the use that is made of the results of the topological analysis.
The work described here is now being extended for applications in both lower (5 Å) and higher (2.7 Å) resolution maps. In particular, experiments are being conducted at low resolution to assess the usefulness of the topological approach to the definition of the protein envelope. At higher resolution, the goal is to determine the direction of the main chain and to attend to the identification of individual residues.
In addition, the tree construction algorithm of ORCRIT is being altered to output multiple plausible skeletons. Additional methods for evaluating these skeletons are also being considered. One promising approach borrows from research in protein structure prediction and, in particular, from its formulation as an inverse folding problem (Lathrop & Smith 1994). Given an amino acid sequence and a set of core segments (pieces of secondary structure forming the tightly packed internal protein core), this approach evaluates each possible alignment (threading) of the sequence onto possible core templates. The problem of identifying individual residues in a critical point map constructed at medium to high resolution can be addressed in a similar manner, i.e., by threading a sequence onto a core structure. However, the problem is simpler than in protein structure prediction since it is reduced to threading a sequence onto its own experimentally determined structure, rather than onto templates retrieved from a library of possible models.
In the threading approach proposed by Lathrop and Smith, a scoring function is used. This function considers the sum of singleton terms, which depends only on the threading of single core segments, and the sum of pairwise interactions between neighboring core elements. (These functions represent the amino acids' statistical preference for certain environments.) Additions to the scoring function, such as statistical bulk properties, are being considered to take full advantage of the information provided by ORCRIT.

Class               Total number   Mean       Mean length   # of segments associated   % of correctly identified
                    of segments    max. dbs   (Ns)          with a real β-sheet        peaks (calc. over Ns)
c1 ∧ c2 ∧ c3        11             100        9±2           11                         90±8
¬c1 ∧ c2 ∧ c3       14             64±15      6±1           13                         86±21
¬c1 ∧ c2 ∧ ¬c3      4              55±4       5±1           3                          67±27
¬c1 ∧ ¬c2 ∧ c3      11             62±26      5±2           7                          57±43*
¬c1 ∧ ¬c2 ∧ ¬c3     19             50±23      4±2           3                          30±36*

Table 2. Classification of the recognized β-sheet segments of penicillopepsin.
A long-term goal of our research in molecular scene
analysis is to develop a computational methodology
that can aid in the reconstruction of protein structures from their initial low resolution electron density
maps so as to resolve the map until a high-resolution
fully interpreted image emerges. The topological approach presented here is an important component of
this methodology. Further research is required, however, to extend it to low and high resolution maps, and
to incorporate more domain knowledge into the analyses.
References
Arman, F., and Aggarwal, J. 1993. Model-based object recognition in dense-range images - a review. ACM Computing Surveys 25(1):5-43.
Bernstein, F. C.; Koetzle, T. F.; Williams, J. B.; Meyer Jr., E. F.; Brice, M. D.; Rodgers, J. R.; Kennard, O.; Shimanouchi, T.; and Tasumi, M. 1977. The Protein Data Bank: A computer-based archival file for macromolecular structures. Journal of Molecular Biology 112:535-542.
Besl, P., and Jain, R. 1986. Invariant surface characteristics for 3D object recognition in range images. CVGIP 33:33-80.
Fortier, S.; Castleden, I.; Glasgow, J.; Conklin, D.; Walmsley, C.; Leherte, L.; and Allen, F. 1993. Molecular scene analysis: The integration of direct methods and artificial intelligence strategies for solving protein crystal structures. Acta Crystallographica D1.
Gauch, J., and Pizer, S. 1993. Multiresolution analysis of ridges and valleys in grey-scale images. IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-15(6):635-646.
Glasgow, J.; Fortier, S.; and Allen, F. 1993. Molecular scene analysis: crystal structure determination through imagery. In Hunter, L., ed., Artificial Intelligence and Molecular Biology. AAAI Press.
Greer, J. 1974. Three-dimensional pattern recognition: an approach to automated interpretation of electron density maps of proteins. Journal of Molecular Biology 82:279-301.
Hall, S. R., and Stewart, J. M., eds. 1990. XTAL 3.0 User's Manual.
Hilditch, C. 1969. Linear skeletons from square cupboards. Machine Intelligence 4:403-420.
Johnson, C. K. 1977a. ORCRIT. The Oak Ridge critical point network program. Technical report, Chemistry Division, Oak Ridge National Laboratory, USA.
Johnson, C. K. 1977b. Peaks, passes, pales and pits: a tour through the critical points of interest in density maps. In Proceedings of the American Crystallographic Association Meeting. Abstract JQ6.
Jones, T.; Zou, J.; Cowan, S.; and Kjeldgaard, M. 1991. Improved methods for building protein models in electron-density maps and the location of errors in those models. Acta Crystallographica A47:110-119.
Lathrop, R., and Smith, T. 1994. A branch-and-bound algorithm for optimal protein threading with pairwise (contact potential) amino acid interactions. In Proceedings of the 27th Hawaii International Conference on System Science.
Leherte, L.; Fortier, S.; Glasgow, J.; and Allen, F. 1994. Molecular scene analysis: A topological approach to the automated interpretation of protein electron density maps. Acta Crystallographica D50:155-166.
Marr, D. 1982. Vision. W.H. Freeman and Company: San Francisco.
Pal, N., and Pal, S. 1993. A review on image segmentation techniques. Pattern Recognition 26(9):1277-1294.
Terry, A. 1983. The Crysalis Project: Hierarchical Control of Production Systems. Ph.D. Dissertation, Stanford Heuristic Programming Project, Stanford University, California, USA.
Wang, B. 1985. Resolution of phase ambiguity in macromolecular crystallography. In Wyckoff, H.; Hirs, C.; and Timasheff, S., eds., Diffraction Methods for Biological Macromolecules. Academic Press, New York.