Running Head: Lab 2 – Unverified Join Viewer Product Specification

Lab 2 – Unverified Join Viewer Product Specification
Team Red
James Ord
CS411W
Professor Janet Brunelle and Dr. Abishek Biswas
November 22, 2015
Version 1.1

Table of Contents
1. Introduction
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms, and Abbreviations
1.4 References
1.5 Overview
2. General Description
2.1 Prototype Architecture Description
2.2 Prototype Functional Description

List of Tables
Table 1. Prototype vs. Product Feature Comparison

List of Figures
Figure 1. Prototype Major Functional Component Diagram

1. Introduction

Striped bass, which are of major commercial and recreational importance in the Chesapeake Bay, are affected by mycobacteriosis caused by Mycobacterium spp., namely M. pseudoshottsii and M. shottsii (Old Dominion University, 2011). Combating this disease requires extensive research into the genetic sequences of the bacterial species present in the bay. Sequencing a genome is a complicated process that cannot be done all at once. M. pseudoshottsii, for example, is a relatively primitive bacterial organism, yet it has almost 1,500 base pairs in its genome (Global Catalogue of Microorganisms, n.d.). More complex organisms can have genomes that are millions of base pairs in length. Because a genetic sequencer cannot process that much data in a single pass, the genome must be split into fragments before sequencing and recombined afterward. Recombination creates another problem in the form of repeated sequences: not every sequence along a genome is unique, so a sequencer can have difficulty placing a repeated sequence in the correct position.

A solution to the problem of repeat sequences comes in the form of the Corrective Algorithm for Repeat Placement (CARP) software developed by Abishek Biswas. This gene sequencer attempts to place each join in the correct order along the genome during sequencing, and it annotates each join with the reasons for it. To study a genome fully, however, a researcher must also be able to view it. The current market of genome viewers consists of software that is expensive, difficult to use, or lacking in specialized capabilities. The current industry standard, Geneious, is not capable of displaying joins and instead leaves repeat sequences out of the displayed genome (Geneious, n.d.).
The Unverified Join Viewer (UJV), designed as an open-source GUI accompaniment to CARP, is a system capable of displaying a complete bacterial genome, its repeat sequences, and the evidence used to create each join, all in one viewing panel.

1.1 Purpose

UJV is a standalone graphical user interface that displays the results of the genome sequencing process carried out by CARP. Developed with close input from Dr. Gauthier, UJV is a system designed for biologists by a biologist, with the intent of creating a free and easy-to-use genome viewer that includes an important feature not found on the current market. In addition to the genome features commonly shown by other viewers, UJV is capable of displaying the location of, and relevant evidence for, the joins produced by CARP. Showing the join evidence allows the user to view a join and analyze the justification for it without resorting to time-consuming and expensive lab processes. UJV is also an open-source project that welcomes input from the development community. The project is open to community scrutiny and improvement, as well as to specializations that depend on a developer's specific requirements.

1.2 Scope

The product version of UJV will be a completely independent system capable of running on Windows, Mac OS, and Linux machines. The system itself will not depend on internet connectivity, but a user who has no valid files on hand may need an internet connection to obtain them. UJV will accept genome files that are properly formatted according to the standards defined by the National Center for Biotechnology Information (NCBI) GenBank (GenBank, 2015). Upon opening a file, the user will be able to view the genome within the main viewing pane of the system.
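To make the file-format requirement above more concrete, the following is a minimal sketch of reading one piece of a GenBank record: the location string attached to a feature, such as `190..255` or `complement(190..255)`. The class name `FeatureLocation` and its fields are hypothetical and are not taken from the UJV design; the sketch also assumes only simple spans and ignores compound locations such as `join(...)`, which a real viewer would need to handle.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; the name and design are
// assumptions, not part of UJV's documented architecture.
final class FeatureLocation {
    // Accepts "190..255" or "complement(190..255)". Compound locations
    // such as join(...) are deliberately out of scope for this sketch.
    private static final Pattern SIMPLE = Pattern.compile(
        "^(?:complement\\((\\d+)\\.\\.(\\d+)\\)|(\\d+)\\.\\.(\\d+))$");

    final int start;          // 1-based, inclusive
    final int end;            // 1-based, inclusive
    final boolean complement; // true when the feature lies on the complementary strand

    private FeatureLocation(int start, int end, boolean complement) {
        this.start = start;
        this.end = end;
        this.complement = complement;
    }

    // Parses a simple location string, rejecting anything it does not understand.
    static FeatureLocation parse(String text) {
        Matcher m = SIMPLE.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported location: " + text);
        }
        boolean onComplement = m.group(1) != null;
        int s = Integer.parseInt(onComplement ? m.group(1) : m.group(3));
        int e = Integer.parseInt(onComplement ? m.group(2) : m.group(4));
        return new FeatureLocation(s, e, onComplement);
    }

    // Number of base pairs covered by the feature.
    int length() {
        return end - start + 1;
    }

    public static void main(String[] args) {
        FeatureLocation loc = parse("complement(190..255)");
        System.out.println(loc.start + " to " + loc.end
            + ", complement strand: " + loc.complement);
    }
}
```

Rejecting unrecognized location strings outright, rather than guessing, is one way to keep an improperly formatted file from being drawn incorrectly.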
The system will be capable of zooming in on a particular portion of the genome to allow for detailed inspection of a specific area. Additionally, the features defined within the imported genome file will be represented as dots in their appropriate locations along the genome. The user will be able to click on these features to display the information that the genome file defines for that particular feature. Finally, UJV will allow the user to edit a genome by adding annotations to a feature, redefining the location of a feature, or changing the placement of a repeat sequence. Any changes made to a genome within UJV will be saved to the originally imported genome file.

The prototype for UJV will be a fully functional application that is capable of using real-world data for demonstration. All of the viewing and zoom features will be fully functional within the prototype version. The only features that will be missing are editing and saving a genome. The development team feels that the other features, such as zoom, color customization, and viewing join data, will be enough of a challenge for the semester and are more important to the core functionality of UJV. The prototype will require no simulation, as all of the required data is readily available through GenBank.

1.3 Definitions, Acronyms, and Abbreviations

Bioinformatics: Study of biological data
Color Scheme: Unique identifying colors that denote a fragment, a join, and the individual A, C, T, and G bases in a sequence
DNA: Deoxyribonucleic acid. A molecule that carries most of the genetic instructions used in the development, functioning, and reproduction of all organisms
Eco-Epidemiology: Study of ecological influences on human health
Ecology: Study of interactions among organisms and their environment
Etiology: Study of origination
Feature: Specific information about genomic data
Fragment: A DNA strand fragment. Current sequencers are unable to process DNA strands as a whole and must break the genome up into fragments, or pieces, that can be processed and then joined back together.
GenBank: An open-access, annotated sequence database of all publicly available nucleotide sequences and their protein translations. This database is produced and maintained by the National Center for Biotechnology Information.
Geneious: A powerful and comprehensive suite of molecular biology tools
Genome: The genetic material of an organism
Join: A place where two fragments are recombined, or joined again
Join Evidence: The means of determining the position where two fragments are joined
M. marinum: A free-living bacterium that causes opportunistic infections in humans
M. pseudoshottsii, M. shottsii: Mycobacterium species isolated from Chesapeake Bay striped bass
Mycobacteriosis: Diseases caused by bacteria of the genus Mycobacterium
Scale: The measure of when to display a sequence or join as a line or as its full sequence pattern
PCR: Polymerase chain reaction. A process used to amplify a region of DNA for further study. It allows DNA fragments to be isolated from a genome for study and analysis.
PCR Primer: The starting point for the PCR process along a strand of DNA. A primer marks the specific location on a DNA strand that the PCR process is to amplify.

1.4 References

GenBank. (2015). GenBank. Retrieved October 25, 2015, from http://www.ncbi.nlm.nih.gov/genbank/
Geneious. (n.d.). Geneious.
Retrieved September 21, 2015, from http://www.geneious.com
Global Catalogue of Microorganisms. (n.d.). Mycobacterium shottsii. Retrieved September 21, 2015, from http://gcm.wfcc.info/speciesPage.jsp?strain_name=Mycobacterium shottsii
Lab 1 – Unverified Join Viewer Product Description. Version 2. (2015, October). Red Team. CS411W: James Ord.
Old Dominion University. (2011). David Gauthier, Ph.D. Retrieved September 21, 2015, from http://sci.odu.edu/biology/directory/gauthier.shtml

1.5 Overview

This product specification provides the hardware and software configuration, capabilities, and features of the Unverified Join Viewer prototype. The remaining sections of this document give a detailed description of the hardware and software architecture, capabilities, and key features. The specific requirements for the product can be found separately in Lab 2 Section 3.1.

2. General Description

Unverified Join Viewer is an operating-system-agnostic program whose only dependency is an installation of version 8 of Oracle's Java. The system consists of three primary components: the viewing panel, the side panel, and the top panel.

The viewing panel is the user's main working space. It contains the genome drawing as well as the features described in the imported genome file. The viewing panel is also where the user can make any required edits to the genome.

The side panel is the primary means of organizing the information presented to the user. Through this panel, the user can enable or disable the display of a particular feature or group of features on the main panel. The side panel also contains a complete list of features for the user to browse.

The top panel is used to give the user a sense of navigation while viewing the genome.
While the viewing panel draws the genome in a circular form, the top panel displays a straight line with markers that denote particular sections of the genome. When the user zooms in on a section of the genome, the top panel shows where the current view lies in relation to the genome at large.

2.1 Prototype Architecture Description

Unverified Join Viewer is a self-contained GUI application that requires few major components in order to work. The first component is the biologist who wants to study a genome. The user will control all aspects of the system, including importing a genome, zooming, choosing which features to display, and verifying the validity of the joins that are presented. The next required component is the output from a gene sequencer. Ideally this output should come from CARP so that join evidence can be included in the displayed genome, but any genome from GenBank is valid. The last major component is UJV itself. UJV is where the information from the imported genome is extracted and displayed for the user to inspect.

Figure 1. Prototype Major Functional Component Diagram

It should be noted that while the prototype for UJV has no file output, the product version of UJV will include file output as an additional major component, used after the user makes edits to a genome. Table 1 below describes the key differences between the product and prototype versions of UJV.
Features                              UJV Prototype    UJV Real World Product
Zoom                                  yes              yes
Color Options                         yes              yes
Load a GenBank File                   yes              yes
View Genome in a Circle               yes              yes
View Features Inside the Circle       yes              yes
Select a Feature as a Join            yes              yes
View Joins Outside of Circle          yes              yes
View Feature Data                     yes              yes
View Join Data                        yes              yes
Multiple Tabs for Different Files     yes              yes
High-Resolution Screenshot            yes              yes
Select a Join for Editing             no               yes
Edit Join Information                 no               yes
Save Edited File in the GBK Format    no               yes
Upload the File to GenBank            no               yes

Table 1. Prototype vs. Product Feature Comparison

2.2 Prototype Functional Description

The prototype version of UJV is able to import a genome file selected from the user's local system for display in the main viewing panel. UJV is currently focused on the display of bacterial genomes, which are circular in nature. When a genome file is imported, UJV will display the genome in the viewing panel as a circle. Once the genome is displayed, the user is able to zoom in and out of a particular portion of the genome, allowing for detailed inspection of specific features. The viewing panel also displays the GenBank features included in the imported file as dots next to the corresponding part of the genome. The user is able to click on these dots to display additional information about each feature, such as its location, its feature type, and its specific genetic sequence. From the side (left) panel, the user is able to select either a specific feature from a comprehensive list of features or a set of features. This panel controls which features are displayed on the viewing panel. Finally, UJV is capable of producing high-resolution screen captures of the genome currently being viewed.
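As a rough illustration of how feature dots could be positioned on the circular drawing, and how the same position could be marked on the straight line of the top panel, the sketch below maps a 1-based base-pair position to screen coordinates. It is not the prototype's actual code: the class name `CircularLayout`, the method names, and the convention that position 1 sits at the top of the circle with positions increasing clockwise are all assumptions made for the example.

```java
import java.awt.geom.Point2D;

// Hypothetical layout helper; names and drawing conventions are assumptions
// made for illustration, not UJV's documented implementation.
final class CircularLayout {

    // Angle in radians for a 1-based position; position 1 maps to angle 0.
    static double angleFor(int position, int genomeLength) {
        return 2.0 * Math.PI * (position - 1) / genomeLength;
    }

    // Screen point on a circle centered at (cx, cy). Angle 0 is drawn at the
    // top and angles grow clockwise; screen y increases downward, hence the
    // subtraction of the cosine term.
    static Point2D.Double pointFor(int position, int genomeLength,
                                   double cx, double cy, double radius) {
        double a = angleFor(position, genomeLength);
        return new Point2D.Double(cx + radius * Math.sin(a),
                                  cy - radius * Math.cos(a));
    }

    // Horizontal offset of the same position on a top-panel bar of the
    // given pixel width.
    static double linearOffset(int position, int genomeLength, double barWidth) {
        return barWidth * (position - 1) / genomeLength;
    }

    public static void main(String[] args) {
        int genomeLength = 1000; // made-up length for the demonstration
        Point2D.Double dot = pointFor(251, genomeLength, 100.0, 100.0, 50.0);
        System.out.println("Dot at x=" + dot.x + ", y=" + dot.y);
        System.out.println("Top-panel offset: "
            + linearOffset(251, genomeLength, 400.0));
    }
}
```

Under these assumptions, zooming would amount to changing the radius and the visible angular range, while the top panel keeps using the unzoomed linear offset to show where the current view sits along the whole genome.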