Spring 2003 SCBMB Status Report for Daniel H. Morgan Part II

Spring 2003 SCBMB Status Report
for Daniel H. Morgan
Part II: Evolutionary Trace Viewer in Java
PI:
Olivier Lichtarge
Committee:
Wah Chiu
Austin Cooney
Aleksander Milosavljevic
Timothy Palzkill
Evolutionary Trace Viewer in Java
Introduction
Advances in crystallization and sequencing technology have led to an explosion in
protein sequence and structure data. However, our understanding of the biological role of these
data remains limited. Our work has shown that with appropriate manipulation of these data,
functionally important residues in proteins can be identified based on specific evolutionary
patterns using the Evolutionary Trace (ET) method. These regions form statistically significant
clusters [1] in protein structures that overlap known functional sites [2-5], predict specific
determinants of function, and can have structural importance. Even when protein function has
been identified through conventional means, localization of functional sites is crucial to smart
drug design.
A common technique for identifying protein function is the exhaustive mutational screen.
The ET method makes use of nature’s inherent mutational experiments to
identify important regions within a protein family. Amino acid positions that are conserved are
thought to represent functionally important regions within the structure of the protein [4]. ET
correlates variance with evolutionary divergence within protein families and assigns a rank to
each alignment position [2, 4]. When analyzing global sequence alignments, any position that is
invariant will have the highest rank, while positions that can accept any amino acid will
have the lowest rank. While this information is interesting, it has little practical value unless
mapped onto a protein structure that is representative of the family. ET converts the
2-dimensional data that ranks each alignment position in the protein family into 3-dimensional data
that can be seen to cluster, indicating regions of functional importance.
One goal in bioinformatics is to collect, process, and view large amounts of data in a
biologically relevant manner. ET can be used to collect and process appropriate data; however,
its output can be difficult to analyze. Older forms of ET output a RasMol script for
each rank, and it was up to the user to load and visualize each rank manually. Also,
each trace’s associated tree was in PostScript format, had to be viewed separately, and was
difficult to associate with the respective ranks. Moreover, ET’s primary input, a multiple sequence
alignment, had to be viewed using other software in order to determine whether the alignment was
satisfactory. While using multiple software packages to view ET data is
cumbersome, it was adequate for expert users of the ET method. However, through recent
collaborations, we have increasingly had requests from non-experts to view the results of our
trace analysis. All of these factors contributed to our decision to develop user-friendly,
platform-independent software that all users could use to view ET results.
Results
The Evolutionary Trace Viewer (ETV) was implemented in an object-oriented fashion
using the Java 2 SDK. The Java programming language was used because of its platform
independence and ease of implementation. Java3D, which is based on OpenGL, was chosen to render the
molecule on the screen because of its reputation as a stable and fast real-time 3D rendering
technology.
Figure 1 depicts the general organization of ETV. The main graphical layout is contained
within the ETV Frame. The ETV Frame holds the GUI, including the menu bar, title bar, text
status, rank slider, and the ETV panel. Since a goal of the ETV project was to tie several of the
data elements output by ET into one easy-to-use piece of software, many of these
components are linked together, so that interaction with one part of the GUI will affect one or
more other parts. For example, moving the rank slider will move the rank bar across the ET
phylogenetic tree and display the current rank’s significant residues on the structure.
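The linkage described above, where one slider movement updates several views, can be sketched with a small listener pattern. This is a minimal illustration only, not ETV's actual code: the names `RankListener`, `RankModel`, and `RankLinkDemo` are hypothetical, and the sketch uses current Java syntax rather than the JDK 1.3.1 features available to ETV.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback for any component that follows the current rank. */
interface RankListener {
    void rankChanged(int rank);
}

/** Hypothetical shared model: the slider sets the rank, linked views are notified. */
class RankModel {
    private final List<RankListener> listeners = new ArrayList<>();
    private int rank = 1; // the slider starts at the highest rank (rank 1)

    void addListener(RankListener listener) {
        listeners.add(listener);
    }

    int getRank() {
        return rank;
    }

    void setRank(int newRank) {
        if (newRank == rank) {
            return; // nothing changed, so no view needs to redraw
        }
        rank = newRank;
        for (RankListener listener : listeners) {
            listener.rankChanged(rank);
        }
    }
}

class RankLinkDemo {
    public static void main(String[] args) {
        RankModel model = new RankModel();
        // Stand-ins for the tree viewer and the 3D panel.
        model.addListener(r -> System.out.println("tree: move rank bar to rank " + r));
        model.addListener(r -> System.out.println("structure: show trace residues at rank " + r));
        model.setRank(12); // simulates the user dragging the slider
    }
}
```

Keeping the current rank in one shared object means that a new view, such as the planned alignment viewer, would only need to register one more listener.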
Evolutionary Trace Viewer Frame
The ETV Frame serves as the entry point for ETV and acts as a container for all the other
objects that the program may output. Some of these objects are contained within the actual
frame of the ETV Frame and others are launched as daughter windows. This module starts up and
organizes all of the components for the GUI in their default configuration (i.e. no molecule
loaded). At the top of the frame is the menu bar containing the various menu items followed by
informative text describing the loaded file and rank statistics. The majority of the frame consists
of the ETV panel which can display a loaded protein in one of two view modes (bonds and space
fill). The bottom displays text messages, updating the user on file loading and descriptions of
picked atoms (Figure 1).
The ETV frame is also responsible for control of most of the action listeners (i.e. mouse
listening events) that take place within the environment. All of the menu bar items are tied to
action listeners that then take appropriate action based on user selection. The rank slider action
listener is also tied to the ETV frame. Molecular manipulation action listeners are handled via
the ETV panel in the 3D environment.
Graphic User Interface (GUI)
The GUI is the front end that the user interacts with to perform tasks in ETV. As
described above, the GUI is tied to the program via action listeners that translate mouse and
keyboard inputs to commands that perform the requisite tasks within the program. The current
implementation handles GUI interaction in two forms. Menu bar interaction, slider movement, and
general window manipulations are handled in 2D. The ETV panel, however, exists in 3D and
therefore has different GUI requirements that are controlled separately.
Menu Bar (Table 1)
The menu bar consists of three simple items: File, View, and Help (Table 1). This
version of ETV does not yet have a help file associated with it. The File menu consists of
three choices: Open ETV File, Close ETV File, and Exit. The View menu offers
three choices: Bonds, Spacefill, and ET Tree. The View menu also has two inactive items that
are associated with future plans to add a multiple sequence file (MSF) viewer.
When the user selects Open ETV File, they are asked to select an ETV format file from a
file dialog whose appearance varies between operating systems. The file dialog defaults to filtering
the local directory hierarchy for files ending in .etv (Figure 3). When a file of the correct format is
chosen, an input stream is opened and the data are fed to the program. The Close ETV File item
returns ETV to its default configuration and invokes the Java garbage collector to return
memory to the system. Exit shuts ETV down.
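A filter that accepts only files ending in .etv could look like the sketch below. It assumes the standard `java.io.FilenameFilter` interface, which `java.awt.FileDialog` accepts through `setFilenameFilter`. The report does not say which dialog class ETV uses, so the class name `EtvFilenameFilter` and the case-insensitive match are assumptions.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Locale;

/** Hypothetical filter that accepts only names ending in ".etv" (case-insensitive by assumption). */
class EtvFilenameFilter implements FilenameFilter {
    @Override
    public boolean accept(File dir, String name) {
        return name.toLowerCase(Locale.ROOT).endsWith(".etv");
    }

    public static void main(String[] args) {
        FilenameFilter filter = new EtvFilenameFilter();
        File here = new File(".");
        System.out.println(filter.accept(here, "1lbd.etv")); // accepted
        System.out.println(filter.accept(here, "1lbd.pdb")); // rejected
    }
}
```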
If Bonds is chosen from the View menu and a molecule is loaded, the structure displays
all of the bonds except for trace residues, which are displayed as red spheres of Van der Waals
radii (Figure 7). Selecting Spacefill will display all of the atoms that are trace residues as red
spheres and non-trace amino acids as white spheres (Figure 4). Future versions will allow the
user to change the color schemes to suit their needs. The ET Tree selection opens an additional
frame that displays the Evolutionary Trace phylogenetic tree (Figure 5). This version of ETV
uses portions of Forester (ATV) to display the tree. Many trees consist of more than 100 leaves,
which makes close examination difficult; therefore, a zoom feature is included. The ‘=’ key
zooms in (only the y-axis is magnified) while the ‘-’ key zooms back out (Figure 6).
Table 1.
File
    Open ETV File     Pops up an Open File dialog in the user’s home directory. A file
                      filter is defaulted to <filename.etv>.
    Close ETV File    Removes the current file from memory and resets ETV to its default
                      configuration.
    Exit              Shuts ETV down.
View
    Bonds             Shifts view mode so that the protein residues are represented as
                      bond lines if they are not at the current rank. Residues that are
                      important at the current rank are shown as red spheres.
    Spacefill         All atoms are shown as spheres with the trace residues colored red
                      and the non-trace residues colored white.
    ET Tree           Selecting this toggles the Evolutionary Trace tree viewer frame on
                      and off.
    Set MSF File      Inactive
    View MSF          Inactive
Help                  Inactive
Rank Slider
When an ETV file is loaded, it provides the program with the rank of each residue in the
protein. Tick marks are displayed for each rank and the slider position
begins at the highest rank (rank 1) (Figures 4 and 7). As the user moves the slider through
the ranks, the molecule display is updated concurrently to reflect the current position of the
slider. Text output (above the slider) also changes with the slider position to provide information
on current rank, percent coverage, and percent sequence similarity. Percent coverage is a
mathematical approximation that gives the ratio of trace residues to the total number of residues.
Percent sequence similarity is an average of the sequence similarity of all the sub-branches in the
tree at a given rank. The highest rank typically has about 25% sequence similarity, while the
lowest rank always has 100% sequence similarity because each sub-branch consists of a single
sequence. The rank slider also updates the display of the ET tree
when it is visible. This display changes in two ways. First, a vertical blue line marks
the location of the branch division for the current rank. Second, each sub-branch
that has a lower rank than the one currently selected is displayed in red (Figures 5 and 6).
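As a rough illustration of the coverage statistic, the sketch below computes the percentage of residues that count as trace residues at a given slider position. It assumes that a residue is a trace residue once its rank number is less than or equal to the current rank (rank 1 being the highest). The report does not spell out this rule or any rounding, and the class and method names are hypothetical.

```java
/** Hypothetical helper for the coverage text shown above the rank slider. */
class Coverage {
    /**
     * Returns 100 * (trace residues) / (all residues), where a residue is
     * counted as a trace residue if its rank number is <= currentRank.
     */
    static double percentCoverage(int[] residueRanks, int currentRank) {
        if (residueRanks.length == 0) {
            return 0.0; // no molecule loaded
        }
        int traceResidues = 0;
        for (int rank : residueRanks) {
            if (rank <= currentRank) {
                traceResidues++;
            }
        }
        return 100.0 * traceResidues / residueRanks.length;
    }

    public static void main(String[] args) {
        int[] ranks = {1, 3, 5, 2}; // made-up ranks for four residues
        System.out.println(percentCoverage(ranks, 2) + "% coverage at rank 2");
    }
}
```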
ET Viewer Panel
The ET Viewer panel is positioned in the ET Viewer frame and is designated to perform
all of the 3D operations. Although it appears as a flat, black plane, it can be properly thought of
as a box. When an ETV file is loaded, the molecule is displayed within this box. The user can
manipulate the protein by mouse as described in Table 2.
Table 2.
Left mouse button (hold down)     Rotates the protein.
Left mouse button + Alt key       Moving the mouse up or down zooms the molecule in or out.
Left mouse button (click)         When positioned over an atom, a left mouse button click
                                  elicits descriptive text in the status text box.
Right mouse button (hold down)    Translates the protein along the x and y axes.
ET Tree Viewer
The ET Tree viewer is adapted from Forester for use in displaying the Evolutionary Trace
phylogenetic tree. ET constructs the tree so that each binary branch node represents a rank
where evolutionary divergence occurs [4]. As the rank slider is moved (ET Viewer frame), the
ET tree is continuously updated so that the user can visualize the node where the current rank
diverges. The display also changes the color of lower ranked sub-branches to red to indicate
where further evolutionary divergence occurs within the protein family.
Protein Structure
As described above, the protein structure is visualized in the ET Viewer panel when an
ETV file is loaded. Data describing the protein is read in from the ETV file (Protein Data Bank
format), and is maintained for each available atom. Atom data includes Van der Waals radius,
atom name, associated residue, residue rank, residue sequence number, and atom center in
Cartesian coordinates. As the protein image is built, bond data is also determined and
maintained as pairs of 3D points in an array. Bond connectivity is determined by comparing
residue atoms to those in a library file (pdb.lib) that was adapted from Amber94.lib. Both the
space fill and bond view modes are rendered simultaneously; however, the bonds are occluded by
the space fill atoms. Switching to Bond view mode merely converts the white spheres (non-trace
residues) to transparent objects so that the bonds become visible. This strategy results in a loss of
performance, as evidenced by less smooth molecule manipulations. However, this loss of
performance is outweighed by the gains made in not having to re-render the protein every time the
view mode is changed. For this same reason, the current version of ETV does not yet include
options that let the user change resolution factors in an attempt to increase
performance. Attempts by the author to lower the resolution to gain performance on older
graphics cards met with little success. Again, the small performance gain of rendering at
lower resolution did not outweigh the time it took to redraw proteins at that resolution
(see requirements section).
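The per-atom data and the template-based bond lookup described above might be organized as in the following sketch. It is an illustration under stated assumptions: the `Atom` and `BondBuilder` names are hypothetical, the template is passed in as plain atom-name pairs because the layout of pdb.lib is not given in this report, and only bonds within one residue are handled (not the peptide bond between residues).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-atom record holding the fields listed in the text. */
class Atom {
    final String name;        // atom name, e.g. "CA"
    final String residueName; // associated residue, e.g. "SER"
    final int residueNumber;  // residue sequence number
    final int residueRank;    // ET rank of the residue
    final double vdwRadius;   // van der Waals radius
    final double x, y, z;     // atom center in Cartesian coordinates

    Atom(String name, String residueName, int residueNumber, int residueRank,
         double vdwRadius, double x, double y, double z) {
        this.name = name;
        this.residueName = residueName;
        this.residueNumber = residueNumber;
        this.residueRank = residueRank;
        this.vdwRadius = vdwRadius;
        this.x = x;
        this.y = y;
        this.z = z;
    }
}

class BondBuilder {
    /**
     * Returns one double[6] per bond: {x1, y1, z1, x2, y2, z2}.
     * The template lists bonded atom-name pairs for one residue type.
     */
    static List<double[]> bondsForResidue(List<Atom> residueAtoms, String[][] template) {
        Map<String, Atom> byName = new HashMap<>();
        for (Atom atom : residueAtoms) {
            byName.put(atom.name, atom);
        }
        List<double[]> bonds = new ArrayList<>();
        for (String[] pair : template) {
            Atom a = byName.get(pair[0]);
            Atom b = byName.get(pair[1]);
            if (a == null || b == null) {
                continue; // atom missing from the structure file
            }
            bonds.add(new double[] {a.x, a.y, a.z, b.x, b.y, b.z});
        }
        return bonds;
    }
}
```

For serine, for example, a template of N-CA, CA-C, C-O, CA-CB, and CB-OG would yield five bonds when all six atoms are present in the file.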
Deployment
To ease deployment, Java Web Start technology is utilized. This technology allows Java
applications to be downloaded from the remote server and run locally on the client. The ETV
software is packaged and compressed in a JAR file. When compressed, the ETV program is
only 250 KB in size. Once users receive the software, it can be run again locally without a
network connection. When a network connection is available, Java Web Start will check the
server for updates and automatically download the new version.
Requirements
Although ETV has not been tested on all possible operating system and hardware
configurations, we recommend a few minimum requirements. ETV requires the Java 2
Runtime Environment (JRE, version 1.3 or newer) and Java3D version 1.3. The JRE has to be
installed before Java3D. Mac OS X does not currently support Java3D and therefore
cannot run ETV. Other software requirements include the use of ETV formatted files
(Figure 8). These files are generated automatically when Evolutionary Trace is run. Since
Java3D is built on the OpenGL architecture, we recommend a 3D hardware-accelerated graphics
card that supports OpenGL and has at least 16 MB of memory. Windows operating systems have
OpenGL libraries already installed and should not need any additional resources. Linux and
Unix systems, however, may need to have these libraries installed.
Methods
All programming took place using JBuilder 5 build 5.0.296.0. The Java Software
Development Kit used was version 1.3.1. The Forester New Hampshire Tree viewer package
used was from the year 2001 (originally developed in 1999). The Evolutionary Trace version
was ETC January 2003. The ET example shown in Figures 4-7 is from a trace run on the ligand
binding domain (LBD) of nuclear hormone receptors (NHR). The results of the trace were
mapped onto retinoid X receptor alpha (RXR alpha, PDB code 1LBD). The trace was run
using a similarity matrix to count similar residues as identical. Before the trace was run, BLAST
[6] was used with default parameters to obtain 287 similar sequences from the NHR
superfamily. These sequences were aligned using CLUSTALW [7] (default parameters) and the
alignment was used as input to ET.
Discussion
This project was initially defined as a continuation of an effort to complete the ET
Viewer. Initial phases brought up increasing graphics hardware requirements that would
possibly make the software unusable to some users. To address this issue, we made an effort to
adjust resolution parameters so that users with low end hardware would still be able to use the
software. These efforts will continue in the future, but as of now we have not been able to get
around the issue of having to rebuild the rendering tree every time the resolution is changed. In
addition, when the proteins were displayed in low resolution, there was no significant increase in
performance. The reasons for this are unclear and need to be investigated further. The problem
may lie in the Java platform, or the appropriate software optimizations may not yet have been applied.
Another remaining task is completion of the multiple sequence
alignment viewer. This viewer will be linked to the ET Viewer frame in a manner similar to the
ET tree viewer. This will allow the rank slider position to communicate with the alignment
viewer so that corresponding alignment positions will be colored according to the current rank.
Importantly, this will also allow the user to quickly identify how certain sequences may be
affecting the overall alignment and thus affecting the trace results. The user may then decide to
delete the offending sequences, realign the rest, and rerun the trace. Ultimately, we hope to be able
to do all of this from the ET Viewer.
Significant improvements have been made with this version of the ET Viewer. Users are
now able to visualize internal trace residues through the use of the Bond view. The user
interface is better organized with the addition of the menu bar. This program can be launched
from our website at:
http://imgen.bcm.tmc.edu/molgen/labs/lichtarge/traceview/ETViewerHome.html.
Figure 1: This flowchart shows the general organization of the Evolutionary Trace Viewer.
The rank slider is a central feature and is used to interact with many of the visual
components.
Figure 2: When the user initially opens ETV, this is the default screen
with no .etv file loaded.
Figure 3: Using the mouse to select File-Open ETV File brings up
this Open file dialog that includes a filter to select only .etv
files.
Figure 4: When a file is loaded, the default view mode is space fill.
At rank 143 out of 285, average sequence similarity is 95%
with 15% coverage. Trace residues are shown as red spheres
while the remainder of the molecule is white.
Figure 5: Selecting View-ET Tree from the menu will bring up the
current trace’s evolutionary tree. The rank bar (blue) aligns
with the rank shown in Figure 4. The red sub-branches
show where lower ranked divisions occur. Due to the
number of sequences in this trace, the user cannot see the
sequence identifications.
Figure 6: By using the = and - keys the user can zoom
in and out of the tree.
Figure 7: When the user selects View-Bonds, ETV makes the
non-trace residues invisible. The protein (PDB code 1LBD)
is the same as the one shown in Figure 4.
blah
~pdb
REMARK access: $Revision: 3.1 $, $Date: 1995/05/22 20:18:50 $
REMARK Args: -v -i 1bik.pdb -o pt_1bik.pdb
REMARK algorithm: Richards + qsort/inline-arclap
REMARK radii: Richards static VdW
ATOM 1 N SER 25 14.378 34.269 -4.694 3.10 38.03
ATOM 2 CA SER 25 13.819 32.972 -4.199 3.40 0.00
ATOM 3 C SER 25 12.960 33.147 -2.964 3.10 0.00
ATOM 4 O SER 25 11.822 32.695 -2.943 2.80 0.00
ATOM 5 CB SER 25 14.932 31.984 -3.896 3.40 13.03
ATOM 6 OG SER 25 15.614 31.662 -5.087 3.00 30.47
ATOM 7 N CYS 26 13.515 33.813 -1.953 3.10 0.19
ATOM 8 CA CYS 26 12.824 34.074 -0.690 3.40 0.00
ATOM 9 C CYS 26 11.461 34.703 -0.924 3.10 0.09
ATOM 10 O CYS 26 10.553 34.571 -0.101 2.80 1.89
ATOM 11 CB CYS 26 13.666 35.005 0.177 3.40 19.50
ATOM 12 SG CYS 26 15.289 34.323 0.637 3.25 13.41
ATOM 13 N GLN 27 11.339 35.400 -2.048 3.10 0.32
ATOM 14 CA GLN 27 10.100 36.068 -2.420 3.40 11.81
ATOM 15 C GLN 27 9.100 35.118 -3.081 3.10 0.07
ATOM 16 O GLN 27 7.896 35.361 -3.058 2.80 26.12
ATOM 17 CB GLN 27 10.406 37.244 -3.331 3.40 65.06
END
~ET_ranks
% Note: in this file % is a comment sign.
%
%
%
RESIDUE RANKS:
% alignment# residue# type rank variability
1 25 4 .HPA
2 12 6 SAFVIG
3 25 2 CL
4 25 7 QRNKALT
72 15 4 .LKE
73 16 5 .EGKH
74 18 6 .RKIVQ
119 1 1 N
120 23 5 KQNRV
121 6 2 FY
122 10 6 YEVPSQ
123 16 5 STYDE
124 24 5 EQKRL
125 18 6 KRAEDS
126 15 7 EQDATNI
127 1 1 C
128 22 7 KREQHML
129 18 7 ELKNRGV
130 26 7 YVITAFL
131 1 1 C
132 11 6 GKEAVQ
133 22 2 .V
~tree
(((((((((((P00978-1:0.082569 ,P13371-1:0.082569 )22:0.032110 ,pt_1bik:0.114679 )21:0.007645 ,CAA36306-1:0.122324 )19:0.003823
,BAA25305-1:0.126147 )18:0.011468 ,((((NP_031469-1:0.036697 ,NP_037033-1:0.036697 )28:0.013761 ,Q62577-1:0.050459
)27:0.016820 ,AAB50851-1:0.067278 )26:0.010703 ,P04365-1:0.077982 )23:0.059633 )16:0.137615 ,BAA13453-1:0.275229
)12:0.145538 ,(JC2556-1:0.302752 ,P36992-1:0.302752 )11:0.118015 )6:0.135222 ,((CAC82582-1:0.024793 ,CAC82583-1:0.024793
)29:0.433368 ,(((AAD01586-1:0.008065 ,O54819-1:0.008065 )31:0.125000 ,NP_058896-1:0.133065 )17:0.111022 ,(((AAD01700-1:0.072581 ,Q28864-2:0.072581 )25:0.112903 ,(P19761-2:0.008065 ,S12143-2:0.008065 )30:0.177419 )15:0.048387 ,S53325-1:0.233871
)14:0.010215 )13:0.214075 )5:0.097828 )4:0.032029 ,(NP_006519-2:0.380531 ,NP_033390-1:0.380531 )7:0.207487 )3:0.016573
,(NP_006519-1:0.327434 ,NP_033390-2:0.327434 )9:0.277158 )2:0.072805 ,((((AAG00547-1:0.074766 ,NP_065131-1:0.074766
)24:0.046729 ,AAK31336-2:0.121495 )20:0.196262 ,Q9DA01-1:0.317757 )10:0.025701 ,AAK31337-1:0.343458 )8:0.333939
)1:0.322603
Figure 8: ET produces a ranks file and a tree file which are combined with the associated PDB
file to make a .etv file. The sample shown here has been greatly truncated for illustrative
purposes.
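Based only on the sample in Figure 8, a reader of this format could split the file into its three parts by looking for lines that start with `~`. The sketch below is one hedged way to do that. It assumes that each section marker sits at the start of its own line and that `%` begins a comment line, as the sample notes. The class and method names are hypothetical and this is not ETV's own loader.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical splitter for the "~pdb", "~ET_ranks", and "~tree" sections of an .etv file. */
class EtvSections {
    /** Groups lines under the most recent "~name" marker; lines before any marker are ignored. */
    static Map<String, List<String>> split(String text) {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        List<String> current = null;
        for (String line : text.split("\\r?\\n")) {
            if (line.startsWith("~")) {
                current = new ArrayList<>();
                sections.put(line.substring(1).trim(), current);
            } else if (current != null) {
                current.add(line);
            }
        }
        return sections;
    }

    /** Drops blank lines and lines whose first non-space character is '%'. */
    static List<String> withoutComments(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("%")) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String sample = "~pdb\nATOM 1 N SER 25 14.378 34.269 -4.694 3.10 38.03\nEND\n"
                + "~ET_ranks\n% Note: in this file % is a comment sign.\nRESIDUE RANKS:\n"
                + "~tree\n(A:0.1 ,B:0.1 )1:0.2\n"; // tiny made-up tree, not from a real trace
        for (Map.Entry<String, List<String>> e : split(sample).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " line(s)");
        }
    }
}
```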
Bibliography