Spring 2003 SCBMB Status Report for Daniel H. Morgan
Part II: Evolutionary Trace Viewer in Java
PI: Olivier Lichtarge
Committee: Wah Chiu, Austin Cooney, Aleksander Milosavljevic, Timothy Palzkill

Evolutionary Trace Viewer in Java

Introduction

Advances in crystallization and sequencing technology have led to an explosion in protein sequence and structure data. However, our understanding of the biological role of these data remains limited. Our work has shown that, with appropriate manipulation of these data, functionally important residues in proteins can be identified from specific evolutionary patterns using the Evolutionary Trace (ET) method. These residues form statistically significant clusters [1] in protein structures that overlap known functional sites [2-5], predict specific determinants of function, and can have structural importance. Even when protein function has been identified through conventional means, localization of functional sites is crucial to smart drug design.

A common technique for identifying protein function is the exhaustive mutational screen. The ET method instead makes use of nature’s inherent mutational experiments to identify important regions within a protein family. Amino acid positions that are conserved are thought to represent functionally important regions within the structure of the protein [4]. ET correlates variance with evolutionary divergence within protein families and assigns a rank to each alignment position [2, 4]. In a global sequence alignment, any invariant position has the highest rank, while positions that can accept any amino acid have the lowest rank. While this information is interesting, it has little practical value unless mapped onto a protein structure that is representative of the family.
ET converts the 2-dimensional data that rank each alignment position in the protein family into 3-dimensional data that can be seen to cluster, indicating regions of functional importance.

One goal in bioinformatics is to collect, process, and view large amounts of data in a biologically relevant manner. ET can be used to collect and process appropriate data; however, its output can be difficult to analyze. Older versions of ET wrote a RasMol script for each rank, and it was up to the user to load and visualize each rank manually. In addition, each trace’s associated tree was in PostScript format, had to be viewed separately, and was difficult to associate with the respective ranks. Moreover, ET’s primary input, a multiple sequence alignment, had to be viewed with other software to determine whether the alignment was satisfactory. While using multiple software packages to view ET data is cumbersome, it was adequate for expert users of the ET method. Through recent collaborations, however, we have increasingly received requests from non-experts to view the results of our trace analyses. All of these factors contributed to our decision to develop user-friendly, platform-independent software that all users could use to view ET results.

Results

The Evolutionary Trace Viewer (ETV) was implemented in an object-oriented fashion using the Java 2 SDK. The Java programming language was used because of its platform independence and ease of implementation. Java3D, which is built on OpenGL, was chosen to render the molecule on screen because of its reputation as a stable and fast real-time 3D rendering technology. Figure 1 depicts the general organization of ETV. The main graphical layout is contained within the ETV Frame, which holds the GUI, including the menu bar, title bar, status text, rank slider, and ETV panel.
Since a goal of the ETV project was to bring several of the data elements output by ET together in one easy-to-use program, many of these components are linked, so that interaction with one part of the GUI affects one or more other parts. For example, moving the rank slider moves the rank bar across the ET phylogenetic tree and displays the current rank’s significant residues on the structure.

Evolutionary Trace Viewer Frame

The ETV Frame serves as the entry point for ETV and acts as a container for all the other objects that the program may display. Some of these objects are contained within the ETV Frame itself, and others are launched as daughter frames. This module starts up and organizes all of the GUI components in their default configuration (i.e., no molecule loaded). At the top of the frame is the menu bar containing the various menu items, followed by informative text describing the loaded file and rank statistics. The majority of the frame consists of the ETV panel, which can display a loaded protein in one of two view modes (bonds and space fill). The bottom of the frame displays text messages that update the user on file loading and describe picked atoms (Figure 1).

The ETV Frame is also responsible for most of the action listeners (i.e., mouse listening events) in the environment. All of the menu bar items are tied to action listeners that take appropriate action based on the user’s selection. The rank slider action listener is also tied to the ETV Frame. Action listeners for molecular manipulation are handled by the ETV panel in the 3D environment.

Graphical User Interface (GUI)

The GUI is the front end that the user interacts with to perform tasks in ETV. As described above, the GUI is tied to the program via action listeners that translate mouse and keyboard input into commands that perform the requisite tasks within the program. The current implementation handles GUI interaction in two forms.
Menu bar interaction, slider movement, and general window manipulation are handled in 2D. The ETV panel, however, exists in 3D and therefore has different GUI requirements that are controlled separately.

Menu Bar (Table 1)

The menu bar consists of three simple items: File, View, and Help (Table 1). This version of ETV does not yet have an associated help file. The File menu offers three choices: Open ETV File, Close ETV File, and Exit. The View menu offers Bonds, Spacefill, and ET Tree. The View menu also has two inactive items that are reserved for a planned multiple sequence file (MSF) viewer.

When the user selects Open ETV File, they are asked to select an ETV format file from a file dialog whose appearance varies between operating systems. By default, the file dialog filters the local directory hierarchy for files ending in .etv (Figure 3). When a file of the correct format is chosen, an input stream is opened and the data are fed to the program. Close ETV File returns ETV to its default configuration and invokes the Java garbage collector to return memory to the system. Exit shuts ETV down.

If Bonds is chosen from the View menu and a molecule is loaded, the structure is displayed as bonds, except for trace residues, which are displayed as red spheres of van der Waals radius (Figure 7). Selecting Spacefill displays all atoms as spheres, with trace residues in red and non-trace residues in white (Figure 4). Future versions will allow the user to change the color scheme.

The ET Tree selection opens an additional frame that displays the Evolutionary Trace phylogenetic tree (Figure 5). This version of ETV uses portions of Forester (ATV) to display the tree. Many trees consist of more than 100 leaves, which makes close examination difficult; a zoom feature is therefore included.
The ‘=’ key zooms in (only the y-axis is magnified), while the ‘-’ key zooms back out (Figure 6).

Table 1.

Menu   Item             Description
File   Open ETV File    Pops up an Open File dialog in the user’s home directory. The file filter defaults to <filename.etv>.
File   Close ETV File   Removes the current file from memory and resets ETV to its default configuration.
File   Exit             Shuts ETV down.
View   Bonds            Shifts the view mode so that protein residues that are not at the current rank are represented as bond lines. Residues that are important at the current rank are shown as red spheres.
View   Spacefill        All atoms are shown as spheres, with the trace residues colored red and the non-trace residues colored white.
View   ET Tree          Toggles the Evolutionary Trace tree viewer frame on and off.
View   Set MSF File     Inactive
View   View MSF         Inactive
Help                    Inactive

Rank Slider

When an ETV file is loaded, it provides the program with the rank of each residue in the protein. Tick marks are displayed for each rank, and the slider starts at the highest rank (rank 1) (Figures 4 and 7). As the user moves the slider through the ranks, the molecule display is updated concurrently to reflect the current slider position. Text output above the slider also changes with the slider position to report the current rank, percent coverage, and percent sequence similarity. Percent coverage is an approximation given by the ratio of trace residues to the total number of residues. Percent sequence similarity is the average sequence similarity of all the sub-branches in the tree at a given rank. The highest rank typically has about 25% sequence similarity, while the lowest rank always has 100% sequence similarity because each sub-branch consists of a single sequence.

The rank slider also updates the display of the ET tree when it is visible. This display changes in two ways.
The first is a vertical blue line that marks the location of the branch division for the current rank. The second is that each sub-branch with a lower rank than the one currently selected is displayed in red (Figures 5 and 6).

ET Viewer Panel

The ET Viewer panel is positioned in the ET Viewer frame and performs all of the 3D operations. Although it appears as a flat, black plane, it is properly thought of as a box. When an ETV file is loaded, the molecule is displayed within this box. The user can manipulate the protein with the mouse as described in Table 2.

Table 2.

Mouse action                     Effect
Left mouse button (hold down)    Rotates the protein.
Left mouse button + Alt key      Moving the mouse up or down zooms the molecule in or out.
Left mouse button (click)        When positioned over an atom, prints descriptive text about that atom in the status text box.
Right mouse button (hold down)   Translates the protein along the x and y axes.

ET Tree Viewer

The ET Tree viewer is adapted from Forester to display the Evolutionary Trace phylogenetic tree. ET constructs the tree so that each binary branch node represents a rank at which evolutionary divergence occurs [4]. As the rank slider in the ET Viewer frame is moved, the ET tree is continuously updated so that the user can see the node where the current rank diverges. The display also colors lower-ranked sub-branches red to indicate where further evolutionary divergence occurs within the protein family.

Protein Structure

As described above, the protein structure is visualized in the ET Viewer panel when an ETV file is loaded. Data describing the protein are read from the ETV file (Protein Data Bank format) and maintained for each available atom. Atom data include van der Waals radius, atom name, associated residue, residue rank, residue sequence number, and atom center in Cartesian coordinates.
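The per-atom data listed above could be held in a small value class along the following lines. This is a minimal sketch, not ETV's actual source: the class name TraceAtom, its field names, and the rule in isTraceAt (a residue is treated as a trace residue once the slider rank reaches the residue's own rank) are all assumptions made for illustration.

```java
/** Hypothetical per-atom record; names are illustrative, not taken from ETV. */
final class TraceAtom {
    final String atomName;     // atom name, e.g. "CA"
    final String residueName;  // associated residue, e.g. "SER"
    final int residueNumber;   // residue sequence number
    final int residueRank;     // ET rank of the parent residue (1 = invariant)
    final double vdwRadius;    // van der Waals radius
    final double x, y, z;      // atom center in Cartesian coordinates

    TraceAtom(String atomName, String residueName, int residueNumber,
              int residueRank, double vdwRadius,
              double x, double y, double z) {
        this.atomName = atomName;
        this.residueName = residueName;
        this.residueNumber = residueNumber;
        this.residueRank = residueRank;
        this.vdwRadius = vdwRadius;
        this.x = x;
        this.y = y;
        this.z = z;
    }

    /** Assumption: residues ranked at or above the slider position are trace residues. */
    boolean isTraceAt(int currentRank) {
        return residueRank <= currentRank;
    }

    /** Short description of the kind a status text box might show for a picked atom. */
    String describe() {
        return atomName + " " + residueName + residueNumber
                + " (rank " + residueRank + ")";
    }
}
```

With a structure like this, moving the rank slider only requires re-evaluating isTraceAt for each atom to decide whether it is drawn as a trace residue; the coordinates and radii never change.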
As the protein image is built, bond data are also determined and maintained as pairs of 3D points in an array. Bond connectivity is determined by comparing residue atoms to those in a library file (pdb.lib) that was adapted from Amber94.lib. Both the space fill and bond view modes are rendered simultaneously; however, the bonds are occluded by the space fill atoms. Switching to Bond view mode merely converts the white spheres (non-trace residues) to transparent objects so that the bonds become visible. This strategy costs some performance, as evidenced by less smooth molecule manipulation. However, this cost is outweighed by the gain from not having to re-render the protein every time the view mode is changed. For the same reason, the current version of ETV does not yet let the user change resolution settings to try to improve performance. Attempts by the author to lower the resolution to gain performance on older graphics cards met with little success. Again, the small performance gain from rendering at lower resolution did not justify the time it took to redraw the protein (see the Requirements section).

Deployment

To ease deployment, Java Web Start technology is used. This technology allows Java applications to be downloaded from a remote server and run locally on the client. The ETV software is packaged and compressed in a JAR file. When compressed, the ETV program is only 250 KB in size. Once users have received the software, it can be run again locally without a network connection. When a network connection is available, Java Web Start checks the server for updates and automatically downloads the new version.

Requirements

Although ETV has not been tested on all possible operating system and hardware configurations, we recommend a few minimum requirements. ETV requires the Java 2 Runtime Environment (JRE, version 1.3 or newer) and Java3D version 1.3.
The JRE has to be installed before Java3D. Mac OS X does not currently support Java3D and therefore cannot run ETV. Other software requirements include the use of ETV formatted files (Figure 8). These files are generated automatically when Evolutionary Trace is run. Since Java3D is built on the OpenGL architecture, we recommend a 3D hardware-accelerated graphics card that supports OpenGL and has at least 16 MB of memory. Windows operating systems have OpenGL libraries already installed and should not need any additional resources. Linux and Unix systems, however, may need to have these libraries installed.

Methods

All programming was done using JBuilder 5 build 5.0.296.0. The Java Software Development Kit used was version 1.3.1. The Forester New Hampshire Tree viewer package used was from the year 2001 (originally developed in 1999). The Evolutionary Trace version was ETC January 2003.

The ET example shown in Figures 4-7 is from a trace run on the ligand binding domain (LBD) of nuclear hormone receptors (NHR). The results of the trace were mapped onto retinoid X receptor alpha (RXR alpha, PDB code 1LBD). The trace was run using a similarity matrix to count similar residues as identical. Before the trace was run, BLAST [6] was used with default parameters to obtain 287 similar sequences from the NHR superfamily. These sequences were aligned using CLUSTALW [7] (default parameters), and the alignment was used as input to ET.

Discussion

This project was initially defined as a continuation of an effort to complete the ET Viewer. Initial phases revealed increasing graphics hardware requirements that could make the software unusable for some users. To address this issue, we tried to adjust resolution parameters so that users with low-end hardware would still be able to use the software.
These efforts will continue, but so far we have not been able to avoid rebuilding the rendering tree every time the resolution is changed. In addition, displaying proteins at low resolution gave no significant increase in performance. The reasons for this are unclear and need to be investigated further. The problem may lie in the Java platform, or the appropriate software optimizations may not yet have been applied.

Another remaining task is to complete the multiple sequence alignment viewer. This viewer will be linked to the ET Viewer frame in a manner similar to the ET tree viewer. The rank slider position will then be communicated to the alignment viewer so that corresponding alignment positions are colored according to the current rank. Importantly, this will also allow the user to quickly identify how certain sequences may be affecting the overall alignment and thus the trace results. The user may then decide to remove the offending sequences, realign, and rerun the trace. Ultimately, we hope to be able to do all of this from the ET Viewer.

Significant improvements have been made in this version of the ET Viewer. Users are now able to visualize internal trace residues through the Bond view. The user interface is better organized with the addition of the menu bar. This program can be launched from our website at: http://imgen.bcm.tmc.edu/molgen/labs/lichtarge/traceview/ETViewerHome.html.

Figure 1: This flowchart shows the general organization of the Evolutionary Trace Viewer. The rank slider is a central feature and is used to interact with many of the visual components.

Figure 2: When the user initially opens ETV, this is the default screen with no .etv file loaded.

Figure 3: Using the mouse to select File-Open ETV File brings up this Open file dialog, which includes a filter to select only .etv files.

Figure 4: When a file is loaded, the default view mode is space fill.
At rank 143 out of 285, average sequence similarity is 95% with 15% coverage. Trace residues are shown as red spheres while the remainder of the molecule is white.

Figure 5: Selecting View-ET Tree from the menu brings up the current trace’s evolutionary tree. The rank bar (blue) aligns with the rank shown in Figure 4. The red sub-branches show where lower-ranked divisions occur. Due to the number of sequences in this trace, the user cannot see the sequence identifications.

Figure 6: By using the = and - keys, the user can zoom in and out of the tree.

Figure 7: When the user selects View-Bonds, ETV makes the non-trace residues invisible. The protein (PDB code 1LBD) is the same as the one shown in Figure 4.

blah
~pdb
REMARK access: $Revision: 3.1 $, $Date: 1995/05/22 20:18:50 $
REMARK Args: -v -i 1bik.pdb -o pt_1bik.pdb
REMARK algorithm: Richards + qsort/inline-arclap
REMARK radii: Richards static VdW
ATOM 1 N SER 25 14.378 34.269 -4.694 3.10 38.03
ATOM 2 CA SER 25 13.819 32.972 -4.199 3.40 0.00
ATOM 3 C SER 25 12.960 33.147 -2.964 3.10 0.00
ATOM 4 O SER 25 11.822 32.695 -2.943 2.80 0.00
ATOM 5 CB SER 25 14.932 31.984 -3.896 3.40 13.03
ATOM 6 OG SER 25 15.614 31.662 -5.087 3.00 30.47
ATOM 7 N CYS 26 13.515 33.813 -1.953 3.10 0.19
ATOM 8 CA CYS 26 12.824 34.074 -0.690 3.40 0.00
ATOM 9 C CYS 26 11.461 34.703 -0.924 3.10 0.09
ATOM 10 O CYS 26 10.553 34.571 -0.101 2.80 1.89
ATOM 11 CB CYS 26 13.666 35.005 0.177 3.40 19.50
ATOM 12 SG CYS 26 15.289 34.323 0.637 3.25 13.41
ATOM 13 N GLN 27 11.339 35.400 -2.048 3.10 0.32
ATOM 14 CA GLN 27 10.100 36.068 -2.420 3.40 11.81
ATOM 15 C GLN 27 9.100 35.118 -3.081 3.10 0.07
ATOM 16 O GLN 27 7.896 35.361 -3.058 2.80 26.12
ATOM 17 CB GLN 27 10.406 37.244 -3.331 3.40 65.06
END
~ET_ranks
% Note: in this file % is a comment sign.
%
%
% RESIDUE RANKS:
% alignment# residue# type rank variability
1 25 4 .HPA
2 12 6 SAFVIG
3 25 2 CL
4 25 7 QRNKALT
72 15 4 .LKE
73 16 5 .EGKH
74 18 6 .RKIVQ
119 1 1 N
120 23 5 KQNRV
121 6 2 FY
122 10 6 YEVPSQ
123 16 5 STYDE
124 24 5 EQKRL
125 18 6 KRAEDS
126 15 7 EQDATNI
127 1 1 C
128 22 7 KREQHML
129 18 7 ELKNRGV
130 26 7 YVITAFL
131 1 1 C
132 11 6 GKEAVQ
133 22 2 .V
~tree
(((((((((((P00978-1:0.082569 ,P13371-1:0.082569 )22:0.032110 ,pt_1bik:0.114679 )21:0.007645 ,CAA36306-1:0.122324 )19:0.003823 ,BAA25305-1:0.126147 )18:0.011468 ,((((NP_031469-1:0.036697 ,NP_037033-1:0.036697 )28:0.013761 ,Q62577-1:0.050459 )27:0.016820 ,AAB50851-1:0.067278 )26:0.010703 ,P04365-1:0.077982 )23:0.059633 )16:0.137615 ,BAA13453-1:0.275229 )12:0.145538 ,(JC2556-1:0.302752 ,P36992-1:0.302752 )11:0.118015 )6:0.135222 ,((CAC82582-1:0.024793 ,CAC82583-1:0.024793 )29:0.433368 ,(((AAD01586-1:0.008065 ,O54819-1:0.008065 )31:0.125000 ,NP_058896-1:0.133065 )17:0.111022 ,(((AAD017001:0.072581 ,Q28864-2:0.072581 )25:0.112903 ,(P19761-2:0.008065 ,S12143-2:0.008065 )30:0.177419 )15:0.048387 ,S53325-1:0.233871 )14:0.010215 )13:0.214075 )5:0.097828 )4:0.032029 ,(NP_006519-2:0.380531 ,NP_033390-1:0.380531 )7:0.207487 )3:0.016573 ,(NP_006519-1:0.327434 ,NP_033390-2:0.327434 )9:0.277158 )2:0.072805 ,((((AAG00547-1:0.074766 ,NP_065131-1:0.074766 )24:0.046729 ,AAK31336-2:0.121495 )20:0.196262 ,Q9DA01-1:0.317757 )10:0.025701 ,AAK31337-1:0.343458 )8:0.333939 )1:0.322603

Figure 8: ET produces a ranks file and a tree file, which are combined with the associated PDB file to make a .etv file. The sample shown here has been greatly truncated for illustrative purposes.
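The .etv layout in Figure 8 suggests a simple way to separate the three parts of the file before handing each to its own reader. The following is a minimal sketch under stated assumptions, not ETV's actual loader: it assumes that each section marker (~pdb, ~ET_ranks, ~tree) sits on its own line, that anything before the first marker can be ignored, and that a leading % marks a comment line. The class and method names (EtvSections, split, withoutComments) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that splits the lines of an .etv file into named sections. */
final class EtvSections {
    private EtvSections() {}

    /**
     * Groups lines under the most recent "~name" marker.
     * Lines before the first marker and blank lines are dropped.
     */
    static Map<String, List<String>> split(List<String> lines) {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        List<String> current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("~")) {
                // A marker line opens a new section keyed by its name.
                current = new ArrayList<>();
                sections.put(line.substring(1).trim(), current);
            } else if (current != null && !line.isEmpty()) {
                current.add(line);
            }
        }
        return sections;
    }

    /** Returns the section lines with '%' comment lines removed. */
    static List<String> withoutComments(List<String> sectionLines) {
        List<String> kept = new ArrayList<>();
        for (String line : sectionLines) {
            if (!line.startsWith("%")) {
                kept.add(line);
            }
        }
        return kept;
    }
}
```

Given the lines of a file read into a list, split(lines).get("pdb") would hold the REMARK and ATOM records, withoutComments(split(lines).get("ET_ranks")) the rank rows, and split(lines).get("tree") the tree string. The column meanings of the rank rows are deliberately not interpreted here, since the truncated sample does not settle them.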