Running head: LAB 1 – UNVERIFIED JOIN VIEWER

Lab 1 – Unverified Join Viewer
Joseph R. Cooper
Old Dominion University
CS 411W
Janet Brunelle
October 25, 2015
Version 2

Table of Contents
1 Introduction
2 Unverified Join Viewer Product Description
    2.1 Key Product Features and Capabilities.
    2.2 Major Components (Hardware/Software).
3 Identification of Case Study
4 Unverified Join Viewer Product Prototype Description
    4.1 Prototype Architecture (Hardware/Software)
    4.2 Prototype Features and Capabilities
    4.3 Prototype Development Challenges and Risks
Glossary
References

List of Figures
Figure 1 - Major Functional Components Diagram
Figure 2 - Current process for verifying joins in a genome
Figure 3 - Improved process for verifying joins
Figure 4 - Prototype Screenshot

List of Tables
Table 1 - Differences between prototype and real world product

1 Introduction

In the Chesapeake Bay, mycobacteriosis is afflicting large numbers of striped bass ("David Gauthier, Ph.D."). Mycobacteriosis is an infectious disease caused by bacteria in the genus Mycobacterium ("In Focus - Striped Bass Health"). It causes inflammation, tissue destruction, and formation of scar tissue in one or more organs ("In Focus - Striped Bass Health"). To further understand the characteristics, evolution, and pathogenicity of the Mycobacterium species plaguing the striped bass in the Chesapeake Bay, their genomes must be studied.

Unfortunately, the current process for studying the genomes of Mycobacterium species is time-consuming and costly. This process involves splitting a genome into many smaller segments, determining the genetic makeup of those segments, and then joining all of the segments back together ("What's a Genome?"). The sequencers that join the segments back together are prone to errors, and genome viewers do not currently have the ability to show join evidence. If a join appears faulty, a biologist must manually check the join in the sequencer output and then conduct lab work to find the correct join. To avoid this extra work, the join evidence must be readily available in the genome viewer being used.

The Unverified Join Viewer (UJV) is a proposed solution that will allow biologists to inspect and search for genome features, in particular joins, within a genome sequence by means of a graphical user interface (GUI).
By presenting all this information in a GUI, the UJV will let biologists quickly navigate through vast amounts of genome information to determine whether joins were made correctly, or how to correctly create a join that appears to be improperly formed. The UJV saves biologists the time of tracking down why and how bad joins were made by making all this information readily available.

2 Unverified Join Viewer Product Description

The UJV is a GUI application. It reads genome sequence information stored in GenBank format and displays it in an easily readable form. Users can quickly locate specific genome information by filtering genome features and zooming in and out of regions of the genome. Users can also hover over genome features in the viewer with a mouse to view tooltips that provide more information about those features.

2.1 Key Product Features and Capabilities.

The UJV can display genome information that is stored in GenBank format. To be rendered, a GenBank file must be loaded using the menus provided in the UJV’s GUI. The UJV will parse the loaded file and create a graphical, circular representation of the genome’s information that can be read and searched. The joins and other features in the genome will be represented on the circle by points labeled by feature type and placed at positions that correspond to their positions in the genome sequence.

Since a GenBank file may contain enough features to cause them to overlap in the genome viewer, the user may zoom in and out. When zooming in, the distance between genome features increases; features move away from the focal point of the zoom and may disappear from the genome view if they are close enough to the extremities of the viewer.
When zooming out, genome features move closer to the focal point of the zoom and may appear in the genome view if they lay just outside the extremities of the viewer before the user zoomed out. The user can zoom in far enough to view the nucleotide sequence of the genome. To further reduce overlap, the user can also choose not to render certain feature types present in the GenBank file, such as joins.

If a user sees a feature that has been improperly sequenced, the user can fix the feature in the viewer. The viewer’s toolbox provides menus and dialogs for editing join positions and sequence information. The UJV will allow users to save these edits and write the new genome to a GenBank file. It will also allow users to upload their modified genomes to GenBank ("Public nucleic acid sequence repository").

High-resolution screenshots of the viewer can be generated and saved to secondary storage, with a filename, image format, and location of the user’s choice, to facilitate collaboration on verifying that a genome is correct. These images will have a resolution of 600 dots per inch (DPI). The user can choose from the following image formats: Tagged Image File Format (TIFF), Graphics Interchange Format (GIF), Portable Network Graphics (PNG), and Joint Photographic Experts Group (JPEG).

The user can change the colors of the components in the application. This is of particular interest when creating screenshots of the viewer: consumers of screenshots may have color requirements, so being able to change the colors of the UJV to meet them is necessary.
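The circular layout and zoom behavior described earlier in this section could be realized in many ways. The following is a minimal sketch under stated assumptions, not the UJV's actual code: the class and method names are hypothetical, position 1 is assumed to sit at the top of the circle with positions increasing clockwise, zooming is assumed to be a uniform scaling about the focal point, and the viewer is assumed to be a rectangle with its origin at the top-left corner. It uses plain Java arithmetic only, so it does not depend on JavaFX.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: circular feature layout with zoom about a focal point. */
class CircularLayoutSketch {

    /** Angle in radians for a 1-based sequence position; position 1 sits at 12 o'clock (assumed). */
    static double angleOf(int position, int genomeLength) {
        return 2.0 * Math.PI * (position - 1) / genomeLength;
    }

    /** Unzoomed screen point {x, y} on a circle centered at (cx, cy). */
    static double[] pointOf(int position, int genomeLength,
                            double cx, double cy, double radius) {
        double angle = angleOf(position, genomeLength);
        // Screen y grows downward, so subtracting the cosine puts angle 0 at the top
        // and makes increasing positions run clockwise.
        return new double[] { cx + radius * Math.sin(angle), cy - radius * Math.cos(angle) };
    }

    /** Scales a point about the focal point (fx, fy): factor > 1 zooms in, factor < 1 zooms out. */
    static double[] zoom(double[] point, double fx, double fy, double factor) {
        return new double[] { fx + (point[0] - fx) * factor, fy + (point[1] - fy) * factor };
    }

    /** True if the point lies inside a viewport spanning (0, 0) to (width, height). */
    static boolean isVisible(double[] point, double width, double height) {
        return point[0] >= 0 && point[0] <= width && point[1] >= 0 && point[1] <= height;
    }

    /** Returns the positions whose zoomed points are still inside the viewport. */
    static List<Integer> visiblePositions(int[] positions, int genomeLength,
                                          double width, double height, double radius,
                                          double fx, double fy, double factor) {
        List<Integer> visible = new ArrayList<>();
        for (int position : positions) {
            // The circle is centered in the viewport before any zoom is applied.
            double[] p = pointOf(position, genomeLength, width / 2, height / 2, radius);
            if (isVisible(zoom(p, fx, fy, factor), width, height)) {
                visible.add(position);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        int[] joins = { 1, 251, 501, 751 };  // made-up positions in a 1000-base genome
        // Zoom in 3x around the top of the circle in a 400 x 400 viewport.
        System.out.println(visiblePositions(joins, 1000, 400, 400, 100, 200, 100, 3.0));
    }
}
```

With a factor greater than 1, points far from the focal point are pushed past the viewport edges and drop out of the visible list; with a factor less than 1, they move back toward the focal point and reappear, which matches the zoom behavior described above.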
To make this software as widely available as possible and further advance the field of bioinformatics, the UJV will be open source and able to run on any computer that can run a Java Virtual Machine (JVM). Open-sourcing the UJV code will allow users around the world to contribute new features, and improvements to existing features, to the UJV codebase. Genome sequencing could become less expensive if this application gains attention and is used in the industry.

2.2 Major Components (Hardware/Software).

The UJV has a single component: an executable Java GUI application. No database is required for this software to function, and an Internet connection is required only if the user wants to export a genome sequence directly to GenBank. To use this software, the user must run the UJV GUI on a computer that has an operating system able to load the Java 8 Runtime Environment and run in graphical mode, a monitor, a graphics card or chip, and secondary storage to hold the GenBank file(s) to be loaded. The main Java library needed to implement the GUI is JavaFX ("1 JavaFX Overview"). The GUI consists of a pane that details the currently visible features, a central viewer for displaying the current region of the genome, and an inspection pane. Figure 1 outlines the major functional components.

Figure 1 - Major Functional Components Diagram

3 Identification of Case Study

The UJV was designed to aid Dr. Gauthier’s studies of the Mycobacterium species afflicting striped bass in the Chesapeake Bay. Figure 2 details the current process of validating the sequencing of Mycobacterium genomes.
Figure 2 - Current process for verifying joins in a genome

If a join does not look correct, the reason for the join must be located in the sequencer results by means other than the genome viewer. This subroutine for fixing invalid joins is circled in red in Figure 2. Having been unable to find existing software that shows join reasoning inside a genome viewer, Dr. Gauthier approached the Computer Science department at Old Dominion University and asked for a software solution that will display join reasoning alongside GenBank genome information. Figure 3 shows the proposed way to meet Dr. Gauthier’s needs.

Figure 3 - Improved process for verifying joins

The new process will reduce the time spent correcting incorrect joins. The time required to determine why a join was made will be reduced because this information will be present in the genome viewer.

4 Unverified Join Viewer Product Prototype Description

The prototype will focus on viewing genome information. The requirements that pertain to manipulating and exporting genome information will not be implemented. All of the other requirements will be implemented.

4.1 Prototype Architecture (Hardware/Software)

The UJV prototype will be a GUI application implemented using JavaFX. It will display genome information when a user loads a GenBank file within the application. Figure 4 shows how the UJV will display genome information.

Figure 4 - Prototype Screenshot

4.2 Prototype Features and Capabilities

Only loading, zooming, filtering, changing color schemes, feature inspection, and screenshots will be implemented. The manipulation of genome information and the saving of such edits will not be in the prototype due to time constraints. Table 1 shows the differences in features provided by the prototype and the real world product.
Table 1 - Differences between prototype and real world product

Features                              UJV Prototype    UJV Real World Product
Zoom                                  yes              yes
Color Options                         yes              yes
Load a GenBank File                   yes              yes
View Genome in a circle               yes              yes
View Features inside the circle       yes              yes
Select a Feature as a Join            yes              yes
View Joins Outside of circle          yes              yes
View Feature Data                     yes              yes
View Join Data                        yes              yes
Multiple Tabs for Different Files     yes              yes
High Resolution Screen Shots          yes              yes
Select a Join for Editing             no               yes
Edit Join Information                 no               yes
Save Edited File in the GBK format    no               yes
Upload the File to the GenBank        no               yes

4.3 Prototype Development Challenges and Risks

The challenges the prototype development team faces are limited time and working with new technology. The team has to finish designing, documenting, and developing a working prototype by the end of the fall 2015 semester. To reduce the risk posed by the time constraint, the prototype has fewer requirements than the real world product. The risk of using a new framework appears to be minimal, since the team has shown in earlier prototypes that it can use the JavaFX framework effectively.

Glossary

Bioinformatics: Study of biological data.
Color Scheme: Unique identifying colors that denote a fragment, a join, and an individual A, C, T, or G in a sequence.
DNA: Deoxyribonucleic acid. A molecule that carries most of the genetic instructions used in the development, functioning, and reproduction of all organisms.
Eco-Epidemiology: Study of ecological influences on human health.
Ecology: Study of interactions among organisms and their environment.
Etiology: Study of origination.
Feature: Specific information about genomic data.
Fragment: A DNA strand fragment. Current sequencers are unable to process DNA strands as a whole and must break the genome up into fragments that can be processed and then joined back together.
GenBank: An open-access sequence database: an annotated collection of all publicly available nucleotide sequences and their protein translations. This database is produced and maintained by the National Center for Biotechnology Information.
Geneious: A powerful and comprehensive suite of molecular biology tools.
Genome: The genetic material of an organism.
Join: A place where two fragments are recombined, or joined again.
Join Evidence: The means of determining the position where two fragments are joined.
M. marinum: A free-living bacterium that causes opportunistic infections in humans.
M. pseudoshottsii, M. shottsii: Mycobacterium species isolated from Chesapeake Bay striped bass.
Mycobacteriosis: Disease caused by bacteria in the genus Mycobacterium.
PCR: Polymerase chain reaction. A process used to amplify a region of DNA for further study. It allows DNA fragments from a genome to be isolated for study and analysis.
PCR Primer: In the PCR process, a primer is needed to isolate a fragment of DNA. DNA is not able to build another strand for a length of DNA that has lost its partner; it needs a starting point to build from. In PCR, the strands are separated and a primer is used to start the rebuilding. By using specific primers at specific points, parts of a DNA strand can be selected for study.
Scale: The measure of when to display the sequence or join as a line or as its full sequence pattern.

References

1 JavaFX Overview. (n.d.). Retrieved September 20, 2015, from http://docs.oracle.com/javase/8/javafx/get-started-tutorial/jfx-overview.htm#JFXST784
David Gauthier, Ph.D. (n.d.). Retrieved September 21, 2015, from http://sci.odu.edu/biology/directory/gauthier.shtml
In Focus - Striped Bass Health. (n.d.). Retrieved September 20, 2015, from http://www.dnr.state.md.us/dnrnews/infocus/striped_bass_health.asp
Public nucleic acid sequence repository. (n.d.). Retrieved September 20, 2015, from http://www.ncbi.nlm.nih.gov/genbank/submit
What's a Genome? (n.d.). Retrieved September 20, 2015, from http://www.genomenewsnetwork.org/resources/whats_a_genome/Chp2_2.shtml