Interactive Poster: Enlightenment: An Integrated Visualization and Analysis Tool for Drug Discovery

Christopher E. Mueller
Array BioPharma, 3200 Walnut St., Boulder, CO 80301
cmueller@arraybiopharma.com

Abstract

Commercial software tools for interpreting analytical chemistry data provide basic views but offer few domain specific enhancements for exploring the data. Gaining an understanding of the results for an individual compound and a large set of compounds requires examining multiple data sets in multiple applications for each compound. In this poster, we present Enlightenment, a new tool that takes the traditional look and feel of an analytical application and significantly enhances the utility of the visualizations. Using Enlightenment, analytical chemists can review large sets of compounds quickly and explore the data from a single, unified interface. Enlightenment demonstrates how applying domain knowledge can enhance the usefulness of traditional displays.

Keywords: Visualization, Chromatography, HPLC, Mass Spec, High Throughput Synthesis

1 High Throughput Synthesis

High Throughput Synthesis is the process of using combinatorial chemistry to create large numbers of related but diverse compounds quickly. The main vessel for handling compounds is a plate. A plate consists of wells arrayed in an m x n matrix, where m x n is typically 12 x 8 yielding 96 wells. To confirm that the correct products are created, each plate is analyzed using a high performance liquid chromatography (HPLC) instrument with UV and mass spectrometric (MS) detection to confirm purity and identity, respectively. An algorithm is applied to the data to make the first determination as to whether or not the compound was created properly. These results are then reviewed by an analytical chemist who either confirms or amends them.
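As a rough illustration of the plate layout described above, the following sketch models a plate as a grid of wells. The `Plate` class, the row-major ordering, and the letter-plus-number labels (A1 through H12) are assumptions made for this example; the poster does not specify how wells are indexed or named.

```python
from dataclasses import dataclass
from string import ascii_uppercase


@dataclass(frozen=True)
class Plate:
    """Hypothetical plate model: wells in a rows x cols grid (8 x 12 = 96)."""

    rows: int = 8
    cols: int = 12

    @property
    def well_count(self) -> int:
        return self.rows * self.cols

    def label(self, index: int) -> str:
        """Map a 0-based, row-major well index to a label such as 'A1'.

        The letter/number labeling is an assumed convention, not taken
        from the poster.
        """
        if not 0 <= index < self.well_count:
            raise IndexError(f"well index {index} is outside the plate")
        row, col = divmod(index, self.cols)
        return f"{ascii_uppercase[row]}{col + 1}"

    def index(self, label: str) -> int:
        """Inverse of label(): map a label such as 'B3' to its well index."""
        row = ascii_uppercase.index(label[0].upper())
        col = int(label[1:]) - 1
        if row >= self.rows or not 0 <= col < self.cols:
            raise ValueError(f"label {label!r} is outside the plate")
        return row * self.cols + col
```

With the default dimensions, `Plate().well_count` is 96, and `label` and `index` convert between positions and names for a single well.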
Interpreting the results algorithmically is non-trivial and often produces incorrect results, requiring human intervention to determine if a compound passes or fails. The manual process consists of using a collection of vendor-supplied tools to explore the data, each task requiring a separate application: one for viewing the plate and algorithmic results, one for viewing raw data for each well, one for viewing compound structures, and a spreadsheet for tracking observations. Finnigan's Xcalibur/Discovery [1,2] and Waters’ OpenLynx [3] system are examples of such commercial systems.

2 Enlightenment

Enlightenment provides a unified interface to all plate, structure, and analytical data. It applies information visualization techniques to enable the analytical chemist to understand the results quickly and increase the data density of the visualizations. When data exploration is required, a series of data-aware, linked plots allow the chemist to drill down into the data from a single application.

Figure 1 - Enlightenment

Enlightenment is designed to be immediately familiar to analytical chemists but provides a more information-rich view of the data than commercially available tools. The main views integrated into the UI are the plate view with its linked tree and compound structure views as well as an analytical data view that shows the processing results, linked to plots of the raw data.

3 Plate View

The plate view in commercial applications displays a grid of color-coded circles for each well, with the color denoting the status of the well. By default, Finnigan’s Discovery Browser [2] uses four colors denoting pass (green), found but not pure (yellow), pure but not found (pink), and fail (red). However, other data items exist that can be displayed at the well level to give the chemist a better idea of what is happening in the plate. It is often the case that the chemist will step through each well to acquire these, just to get a better view of the big picture.
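The four-color status scheme can be read as a mapping from two yes/no determinations (was the target found, and is the sample pure) to a color. The sketch below assumes that "pass" means both found and pure and that "fail" means neither; the poster names the four states but does not define them this explicitly, and the names `WellStatus` and `classify_well` are hypothetical.

```python
from enum import Enum


class WellStatus(Enum):
    """The four well states and default colors named in the text."""

    PASS = "green"
    FOUND_NOT_PURE = "yellow"
    PURE_NOT_FOUND = "pink"
    FAIL = "red"


def classify_well(found: bool, pure: bool) -> WellStatus:
    """Map the identity (found) and purity (pure) flags to a status.

    Assumption: PASS requires both flags, FAIL means neither holds.
    """
    if found and pure:
        return WellStatus.PASS
    if found:
        return WellStatus.FOUND_NOT_PURE
    if pure:
        return WellStatus.PURE_NOT_FOUND
    return WellStatus.FAIL
```

For example, `classify_well(found=True, pure=False).value` gives the color for a well whose compound was found but is not pure.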
Enlightenment uses the Finnigan color scheme to maintain familiarity, but replaces pink with blue since some displays made it hard to distinguish pink and red. The intensity of the colors was also adjusted using the guidelines in [4, p. 164] so that no single color stood out.

Figure 2 – Icons, Colors and Overlays

Enlightenment uses overlays and size to show clearly three extra dimensions of data: HPLC signal strength, channel used, and percent BPI (MS signal strength). These values are typically used to understand problems with a plate and are only available through analysis of multiple plots per well in commercial applications.

Signal strength is illustrated by the size of the circle: smaller for low signals and larger for signals that are too strong. Size alone was hard to distinguish on small displays, so a “noisy” border was added to give the appearance of a deviant signal.

Selected channel and percent BPI use overlays to highlight cases that occur infrequently. Generally, channel 1 is selected and the BPI is 100%. If a different channel was used, the channel's number is overlaid in the upper left corner of the well. If the BPI is below a threshold (e.g. 80%), a bar appears on the left edge of the well, its height relative to the BPI. By using the overlay only in these cases, wells that exhibit these behaviors stand out.

Enlightenment's plate view uses different levels of detail (LODs) to display more or less information about each well, depending on the audience. For instance, business development staff can select a LOD that only displays green/red to determine which compounds can be sold, whereas an analytical chemist would select the most detailed LOD.

The plate view is linked to a tree view that displays detailed information for each compound and a structure view that displays the structure of the selected compound (Figure 1, top row). The analytical views are also linked to the selected well.
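The size and overlay rules in this section can be summarized as a small decision function. This is only a sketch: the `WellGlyph` structure, the `low_signal` and `high_signal` cutoffs, and the choice to draw the noisy border for both weak and overly strong signals are assumptions for illustration. The 80% BPI threshold and the default channel 1 follow the examples given in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WellGlyph:
    """Hypothetical description of how one well is drawn."""

    size: str                        # "small", "normal" or "large"
    noisy_border: bool               # drawn when the signal is deviant
    channel_label: Optional[int]     # overlay only for a non-default channel
    bpi_bar_height: Optional[float]  # fraction of well height, or None


def well_glyph(
    signal: float,
    channel: int,
    bpi_percent: float,
    *,
    low_signal: float,       # assumed cutoff, not given in the poster
    high_signal: float,      # assumed cutoff, not given in the poster
    bpi_threshold: float = 80.0,
    default_channel: int = 1,
) -> WellGlyph:
    """Choose circle size and overlays so that only unusual wells stand out."""
    if signal < low_signal:
        size = "small"
    elif signal > high_signal:
        size = "large"
    else:
        size = "normal"

    return WellGlyph(
        size=size,
        noisy_border=size != "normal",
        # Overlays appear only in the infrequent cases.
        channel_label=channel if channel != default_channel else None,
        bpi_bar_height=bpi_percent / 100.0 if bpi_percent < bpi_threshold else None,
    )
```

A typical well (normal signal, channel 1, 100% BPI) yields no overlays at all, which is what lets the exceptional wells stand out in the plate view.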
4 Analytical Results View

The analytical results views are located beneath the plate view (Figure 1, bottom three rows). There are four different channels of analytical information used to characterize a compound, three displayed by default. Applying the concept of multiples in space and time [5], each channel has an identical results view and a set of plots. Because the results view is linked to the plate view, changing the status of a well in the results view also changes the color and overlays for that well in the plate view.

5 Analytical Plot Views

HPLC and MS data are represented by line and stick plots, respectively. HPLC data consists of a time-series trace with distinct peaks. Each peak corresponds to some amount of material passing through the detector and comparing peak areas gives the purity for each peak. Each peak has a start and end point, and the MS data is sub-sampled to show data in the range for each peak. Selecting a peak in a HPLC trace displays the subsample of the MS data in the MS plot. MS plots show the mass-to-charge (m/z) ratio on the x-axis and relative intensity on the y-axis.

Figure 3 - Chromatogram and Mass Spec Plots

Applying the principle of maximizing data ink [6], the HPLC and MS plots were redesigned to display more information than the simple scientific plots used in commercial tools. The axes on all plots were replaced with range-frame axes with carefully selected tick marks.

Signal strength is important for HPLC traces; too low or strong a signal leads to incorrect purity results. The y-axis range-frame starts with the minimum good value and ends with the maximum observed value. If the signal is low, a single tick-mark with no axis denotes the maximum value (Figure 1, middle plot). Thus, a quick glance can tell a chemist if the signal was strong enough for proper evaluation. Signals that are too strong lead to obviously distorted traces and have no special marking.

Often, all data prior to a certain time will be excluded from analysis. The x-axis range-frame spans only the time range used in processing and includes a single tick mark showing the time for the currently selected peak. Labels on the peaks denote the purity of each peak. If the target compound was found for a given peak, its mass is displayed alongside the purity value.

For an MS intensity to be useful, it should be above 20%. This is displayed by the y-axis range-frame on the MS plot, which spans 20-100%. The x-axis range-frame spans the entire length of the plot with ticks at either end displaying the min and max m/z values. Sticks are labeled with the m/z value.

The peaks in the HPLC plot are dynamically linked to the MS plot. Changing the endpoints of a peak or drawing a new peak sub-samples the MS data in real time to display the mass spec for the new peak.

All plots feature interactive panning, zooming, and arbitrary value picking. Zooming is accomplished by drawing a rectangular region around a plot area to define the new view or by scrolling the ends of the PanBar controls (Figure 4). PanBars are similar to Spotfire's Range Sliders [7] and allow both panning and zooming. Originally, only the PanBars and Zoom Controls were available for zooming, but user feedback led to the addition of the zoom box and a button in the lower-left corner of the view that zooms out completely. If no mouse button is pressed, the current x/y value below the mouse cursor is displayed in the status bar in data coordinates.

Figure 4 – PanBars

6 Conclusions

Enlightenment is similar to commercial analytical chemistry applications. However, careful analysis of the domain and the chemist’s usage patterns has led to several enhancements. By combining the functionality of multiple applications into one, we have eliminated redundant features and provided better linking among views. Using information visualization techniques, the views build on familiar displays but show significantly more information and allow chemists to draw conclusions more effectively.

References

[1] Finnigan (2000). Xcalibur 1.2. [Software]
[2] Finnigan (2000). Xcalibur Discovery Browser 1.2. [Software]
[3] Waters (2003). OpenLynx Application Manager - Processing & Reporting (Retrieved June 17, 2003). www.waters.com.
[4] Kosslyn, S. M. (1994). Elements of Graph Design. US: W. H. Freeman and Company.
[5] Tufte, E. R. (2002). Visual Explanations. Conn: Graphics Press.
[6] Tufte, E. R. (2001, 2nd Ed.). The Visual Display of Quantitative Information. Conn: Graphics Press.
[7] Spotfire, Inc (2001). Spotfire DecisionSite 6.3.0.349 [Software]