Running head: LAB 2 – UNVERIFIED JOIN VIEWER

Lab 2 Sections 1 and 2 – Unverified Join Viewer
Joseph R. Cooper
Old Dominion University
CS 411W
Janet Brunelle
November 22, 2015
Version 1

Table of Contents
1 Introduction
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms, and Abbreviations
1.4 References
1.5 Overview
2 General Description
2.1 Prototype Architecture Description
2.2 Prototype Functional Description
2.3 External Interfaces

List of Figures
Figure 1. Current process for verifying joins in a genome
Figure 2. Improved process for verifying joins
Figure 3. Prototype Major Functional Components Diagram

List of Tables
Table 1. Differences between prototype and real-world product

1 Introduction
In the Chesapeake Bay, mycobacteriosis is afflicting large numbers of striped bass (David Gauthier, Ph.D.). Mycobacteriosis is an infectious disease caused by bacteria in the genus Mycobacterium (In Focus - Striped Bass Health). It causes inflammation, tissue destruction, and the formation of scar tissue in one or more organs (In Focus - Striped Bass Health). To further understand the characteristics, evolution, and pathogenicity of the Mycobacterium species plaguing striped bass in the Chesapeake Bay, their genomes must be studied.

Unfortunately, the current process for studying the genomes of Mycobacterium species is time consuming and costly. This process involves splitting a genome into many smaller pieces, determining the genetic makeup of those segments, and then joining all of the segments back together (What's a Genome?). Sequencers that rejoin the segments are prone to errors, and current genome viewers cannot show the evidence behind a join. If a join appears faulty, a biologist must manually check the join in the sequencer output and then conduct lab work to find the correct join. To eliminate this extra work, the join evidence must be readily available in the genome viewer being used.

The Unverified Join Viewer (UJV) is a proposed solution that will allow biologists to inspect and search for genome features, in particular joins, within a genome sequence by means of a graphical user interface (GUI). By presenting all of this information in a GUI, the UJV will let biologists quickly navigate through vast amounts of genome information to determine whether joins were made correctly or, if a join appears improperly formed, how to form it correctly.
The UJV saves biologists the time of tracking down why and how bad joins were made by making all of this information readily available.

1.1 Purpose
The UJV was designed to aid Dr. Gauthier's studies of the Mycobacterium species afflicting striped bass in the Chesapeake Bay. Figure 1 shows the current process of validating the sequencing of Mycobacterium genomes.

Figure 1. Current process for verifying joins in a genome

If a join does not look correct, the reason for the join must be located in the sequencer results by means other than the genome viewer. This invalid-join-fixing subroutine of the join validation process is circled in red in Figure 2. Having been unable to find an existing software solution that shows join reasoning inside a genome viewer, Dr. Gauthier approached the Computer Science department at Old Dominion University and asked for software to be developed that will display join reasoning alongside GenBank genome information. Figure 2 shows the proposed way to meet Dr. Gauthier's needs.

Figure 2. Improved process for verifying joins

The new process will reduce the time spent correcting incorrect joins, because the reason a join was made will be available directly in the genome viewer.

1.2 Scope
The UJV will be used by biologists to verify and, if necessary, fix the results of genome sequencing. Biologists will be able to collaborate on a genome by saving their edits and exporting them to a GenBank file. The final product will allow users to upload a GenBank file directly to the central GenBank database (Public nucleic acid sequence repository).

1.3 Definitions, Acronyms, and Abbreviations
Bioinformatics: Study of biological data.
Color Scheme: Unique identifying colors that denote a fragment, a join, and each individual nucleotide (A, C, T, G) in a sequence.
DNA: Deoxyribonucleic acid. The molecule that carries most of the genetic instructions used in the development, functioning, and reproduction of all organisms.
Eco-Epidemiology: Study of ecological influences on human health.
Ecology: Study of interactions among organisms and their environment.
Etiology: Study of origination, such as the causes of a disease.
Feature: Specific annotated information about a region of genomic data.
Fragment: A DNA strand fragment. Current sequencers are unable to process DNA strands as a whole and must break the genome up into fragments that can be processed and then joined back together.
GenBank: An open-access, annotated collection of all publicly available nucleotide sequences and their protein translations. This database is produced and maintained by the National Center for Biotechnology Information.
Geneious: A comprehensive suite of molecular biology tools.
Genome: The genetic material of an organism.
Join: A place where two fragments are recombined, or joined again.
Join Evidence: The means of determining the position where two fragments are joined.
M. marinum: A free-living bacterium that causes opportunistic infections in humans.
M. pseudoshottsii, M. shottsii: Mycobacterium species isolated from Chesapeake Bay striped bass.
Mycobacteriosis: Disease caused by bacteria of the genus Mycobacterium.
PCR: Polymerase chain reaction, a process used to amplify a region of DNA so that DNA fragments from a genome can be isolated for study and analysis.
PCR Primer: The starting point needed in PCR to isolate a fragment of DNA. A new strand cannot be built along a length of DNA that has lost its partner strand unless there is a starting point to build from, so in PCR the strands are separated and a primer is used to start the rebuilding. By using specific primers at specific points, researchers can select the parts of a DNA strand to be studied.
Scale: The threshold that determines whether a sequence or join is displayed as a line or as its full sequence pattern.

1.4 References
1 JavaFX Overview. (n.d.). Retrieved September 20, 2015, from http://docs.oracle.com/javase/8/javafx/get-started-tutorial/jfx-overview.htm#JFXST784
David Gauthier, Ph.D. (n.d.). Retrieved September 21, 2015, from http://sci.odu.edu/biology/directory/gauthier.shtml
In Focus - Striped Bass Health. (n.d.). Retrieved September 20, 2015, from http://www.dnr.state.md.us/dnrnews/infocus/striped_bass_health.asp
Public nucleic acid sequence repository. (n.d.). Retrieved September 20, 2015, from http://www.ncbi.nlm.nih.gov/genbank/submit
What's a Genome? (n.d.). Retrieved September 20, 2015, from http://www.genomenewsnetwork.org/resources/whats_a_genome/Chp2_2.shtml

1.5 Overview
This product specification explains the problems the UJV will solve and identifies the target audience that needs them solved. It also gives an overview of how the prototype will be used. The specific requirements for the prototype are not enumerated in this document; they are provided in Lab 3, Section 1.

2 General Description
The UJV is a GUI application. It reads genome sequence information stored in GenBank format and displays it in an easily readable form. Users can quickly locate specific genome information by filtering genome features and by zooming in and out of regions of the genome. Users can also hover the mouse over genome features in the viewer to see tooltips that provide more information about those features. Joins can be edited by clicking on them and changing their information in a popup window.

2.1 Prototype Architecture Description
The UJV prototype will be a GUI application implemented using JavaFX (1 JavaFX Overview). No internet connection is required to use the prototype.
The GUI consists of a pane that details the currently visible features, a central viewer that displays the current region of the genome, and an inspection pane. The UJV will display genome information when a user loads a GenBank file within the application, and the user can then inspect and edit genome features using the GUI. Figure 3 shows how the UJV will present genome information.

Figure 3. Prototype Major Functional Components Diagram

2.2 Prototype Functional Description
The prototype will do everything the real-world product will do except upload GenBank files to the central GenBank database and change the GUI's color palette. Table 1 lists the functions that the prototype and the real-world product will provide.

| Features                           | UJV Prototype | UJV Real-World Product |
|------------------------------------|---------------|------------------------|
| Zoom                               | Yes           | Yes                    |
| Color Options                      | No            | Yes                    |
| Load a GenBank File                | Yes           | Yes                    |
| View Genome in a Circle            | Yes           | Yes                    |
| View Features Inside the Circle    | Yes           | Yes                    |
| Select a Feature as a Join         | Yes           | Yes                    |
| View Joins Outside of the Circle   | Yes           | Yes                    |
| View Feature Data                  | Yes           | Yes                    |
| View Join Data                     | Yes           | Yes                    |
| Multiple Tabs for Different Files  | Yes           | Yes                    |
| High-Resolution Screenshots        | Yes           | Yes                    |
| Select a Join for Editing          | Yes           | Yes                    |
| Edit Join Information              | Yes           | Yes                    |
| Save Edited File in GBK Format     | Yes           | Yes                    |
| Upload the File to GenBank         | No            | Yes                    |

Table 1. Differences between prototype and real-world product

The UJV can display genome information that is stored in GenBank format. For a GenBank file to be rendered, it must first be loaded into the UJV through the menus provided in its GUI. The UJV will parse the loaded file and create a graphical, circular representation of the genome's information that can be read and searched.
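The specification does not say how a sequence position is turned into a point on the circle. A common approach, assumed here rather than taken from the UJV, is to let the whole genome span 360 degrees, place base 1 at the top, and advance clockwise. The sketch uses a hypothetical `CircularLayout` class (not part of JavaFX or the UJV code) and only `java.lang.Math`, so it runs without JavaFX. It also measures the shortest arc between two features, which is one way to decide whether their labels would overlap.

```java
/**
 * Hypothetical helper for a circular genome view (an assumption, not the UJV's
 * actual code): maps 1-based GenBank base-pair positions onto a circle.
 */
final class CircularLayout {

    private CircularLayout() {}

    /** Angle in degrees, measured clockwise from 12 o'clock, for a 1-based position. */
    static double positionToDegrees(long position, long genomeLength) {
        if (genomeLength <= 0 || position < 1 || position > genomeLength) {
            throw new IllegalArgumentException("position must be in 1.." + genomeLength);
        }
        // Base 1 sits at the top of the circle; later bases advance clockwise.
        return 360.0 * (position - 1) / genomeLength;
    }

    /** Screen point {x, y} for a position on a circle with the given centre and radius. */
    static double[] positionToPoint(long position, long genomeLength,
                                    double centerX, double centerY, double radius) {
        double radians = Math.toRadians(positionToDegrees(position, genomeLength));
        // Screen y grows downward, so the top of the circle is centerY - radius.
        return new double[] {
            centerX + radius * Math.sin(radians),
            centerY - radius * Math.cos(radians)
        };
    }

    /**
     * Shortest on-screen arc length between two positions. The genome is circular,
     * so the gap can be measured either way round; the smaller gap is used.
     * If the result is smaller than a label's width, the two labels would overlap.
     */
    static double arcDistance(long a, long b, long genomeLength, double radius) {
        positionToDegrees(a, genomeLength); // validates a
        positionToDegrees(b, genomeLength); // validates b
        long gap = Math.abs(a - b);
        long shortest = Math.min(gap, genomeLength - gap);
        return radius * 2.0 * Math.PI * shortest / genomeLength;
    }
}
```

Under this assumption, zooming in can be modelled as drawing the circle with a larger `radius`: because `arcDistance` grows linearly with the radius, doubling it doubles the gap between two feature labels, so fewer but larger features fit in the visible region without overlapping.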
The joins and other features in the genome will be represented on the circle by points that are labeled by feature type and placed at positions that correspond to their positions in the genome sequence. Because a GenBank file may contain enough features to make them overlap in the genome viewer, the user may zoom in and out. When zooming in, fewer features are displayed, but those displayed become larger and clearer. When zooming out, more features are displayed, but they become smaller and less clear. To prevent overlapping, the user may also choose not to render certain feature types present in the GenBank file, such as joins.

If a user sees a feature that has been improperly sequenced, the user can fix the feature in the viewer. The viewer's toolbox presents menus and dialogs that facilitate editing join positions and sequence information. The UJV will allow users to save these edits and export the edited genome to a GenBank file.

To facilitate collaboration on finalizing a genome, high-resolution screenshots of the viewer can be generated and saved to disk with a filename, image format, and location of the user's choice. These images will have a resolution of 600 dots per inch (DPI). The user can choose from the following image formats: Tagged Image File Format (TIFF), Graphics Interchange Format (GIF), Portable Network Graphics (PNG), and Joint Photographic Experts Group (JPEG).

The prototype will neither allow the user to change the GUI's color palette nor allow GenBank files to be uploaded directly to the central GenBank database. These features will not be included due to time constraints.

2.3 External Interfaces
The UJV prototype will not have any external interfaces. The final product will have one external interface, the central GenBank database.
Users will be able to upload their GenBank files directly from the real-world product.