Running Head: Lab 1 – Unverified Join Viewer Description

Lab 1 – Unverified Join Viewer Product Description
James Ord
Old Dominion University
CS 411W
Janet Brunelle
October 25, 2015
Version 2

Table of Contents
1 Introduction
2 Unverified Join Viewer Product Description
2.1 Key Product Features and Capabilities
2.2 Major Components (Hardware/Software)
3 Identification of Case Study
4 Unverified Join Viewer Prototype Description
4.1 Prototype Architecture (Hardware/Software)
4.2 Prototype Features and Capabilities
4.3 Prototype Development Challenges
5 Glossary
References

Figures and Tables
Figure 1. Product Major Functional Component Diagram
Figure 2. Prototype Major Functional Component Diagram
Table 1. Prototype vs. Real World Product Features

1 Introduction

The modern market for genetic sequence viewers is populated with software that is expensive, difficult to use, or lacking in specialized features. Industry-standard software such as Geneious cannot display repeated genome sequences; it leaves such strands out of the genome view, and it is up to the user to interpret them (Geneious, n.d.). Verifying where a repeat sequence belongs involves expensive and time-consuming lab work, sometimes costing thousands of dollars for a single genome.

Genome sequencing is a massively complicated process because of the amount of data being processed. Mycobacterium shottsii, for example, is a simple bacterial organism, yet it has almost 1,500 base pairs in its genome (Global Catalogue of Microorganisms, n.d.). More complex organisms can have genomes that are millions of base pairs in length. The amount of data is too large for a genome sequencer to process all at once, so the genome must be split into a number of sequences for easier processing and later recombined.

A common issue for gene sequencers is the repeated sequence. Repeats are not rare: a genome may contain numerous instances of different repeat sequences. When a genome sequencer encounters a repeated sequence, it cannot know exactly where to place that portion of the genetic code. Modern genome viewing software reflects this issue in that viewers have no way to display repeated sequences along with the rest of the genome.

A solution to the placement problem exists in the form of the Corrective Algorithm for Repeat Placement (CARP) software developed by Abishek Biswas.
This gene sequencer attempts to place the repeat sequences where they belong along the genome and annotates each placement with the reasons for it. As helpful as CARP may be, its output is simply a text file for each base pair. This kind of data is hard to decipher without a visual representation, hence the Unverified Join Viewer (UJV). Designed to be an open-source accompaniment to CARP, UJV will display an entire genome, including the repeat sequences placed by CARP, along with the justification annotations for each join. This will make studying a genome with repeats faster, easier, and less expensive by greatly reducing the amount of lab work required.

2 Unverified Join Viewer Product Description

UJV is designed to be a simple, easy-to-use viewer for the results of the gene sequencing process. The user will be able to import a genome from a file that is formatted according to the standards defined by the National Center for Biotechnology Information (NCBI) GenBank (GenBank, 2015). UJV will initially display the whole genome in the main viewing pane. The GenBank features defined in the genome file will be available by clicking on an indicated position on the sequence. The user will be able to zoom into any portion of the genome for more detailed inspection. UJV will also allow the user to edit the genome within the viewer, by adding annotations or changing repeat sequence placements, and to save those edits to the raw text file.

2.1 Key Product Features and Capabilities

Unverified Join Viewer will be the first software able to view and edit ambiguous joins within a genome. The zoom function of UJV will let the user inspect any particular part of the genome in detail.
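A minimal sketch of how a zoom window over the genome could map base-pair positions to horizontal pixels in a viewing pane. The class name ZoomWindow, its methods, and the linear layout are assumptions made for illustration and are not taken from the UJV design; Java is used because UJV targets Java 8.

```java
/** Hypothetical helper: an immutable window of visible base pairs over a genome. */
final class ZoomWindow {
    private final long startBp;   // first visible base pair (1-based, inclusive)
    private final long endBp;     // last visible base pair (inclusive)
    private final double widthPx; // width of the viewing pane in pixels

    ZoomWindow(long startBp, long endBp, double widthPx) {
        if (startBp < 1 || endBp < startBp || widthPx <= 0) {
            throw new IllegalArgumentException("invalid window");
        }
        this.startBp = startBp;
        this.endBp = endBp;
        this.widthPx = widthPx;
    }

    /** Window showing the whole genome, the assumed initial view. */
    static ZoomWindow wholeGenome(long genomeLength, double widthPx) {
        return new ZoomWindow(1, genomeLength, widthPx);
    }

    long startBp() { return startBp; }
    long endBp() { return endBp; }
    long visibleBasePairs() { return endBp - startBp + 1; }

    /** Left pixel edge of the given base pair within the pane. */
    double toPixel(long bp) {
        return (bp - startBp) * widthPx / visibleBasePairs();
    }

    /**
     * Returns a new window zoomed by the given factor (greater than 1 zooms in),
     * centered near centerBp and clamped to the bounds of the genome.
     */
    ZoomWindow zoom(double factor, long centerBp, long genomeLength) {
        if (factor <= 0) {
            throw new IllegalArgumentException("factor must be positive");
        }
        long span = Math.max(1, Math.round(visibleBasePairs() / factor));
        span = Math.min(span, genomeLength);
        long start = centerBp - span / 2;
        start = Math.max(1, Math.min(start, genomeLength - span + 1));
        return new ZoomWindow(start, start + span - 1, widthPx);
    }

    public static void main(String[] args) {
        ZoomWindow whole = ZoomWindow.wholeGenome(1000, 500.0);
        ZoomWindow zoomed = whole.zoom(2.0, 500, 1000);
        System.out.println(zoomed.startBp() + ".." + zoomed.endBp()); // prints 250..749
    }
}
```

Keeping the window immutable means each zoom step yields a new window, which would make a history of previous views easy to keep. That is a design choice of this sketch, not a stated UJV requirement.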
The final product will also be capable of verifying that edits made to repeat sequence placements adhere to the justifications outlined by CARP for a join's existence. Being able to reorganize a genome on the computer without having to verify it in a laboratory saves both time and money, potentially thousands of dollars and weeks of work.

2.2 Major Components (Hardware/Software)

UJV is designed as a stand-alone viewer, meaning it will not depend on outside databases or internet connectivity unless the user needs to download genome files (.gbk) from GenBank. If the user already has genome files available on the system, no internet connection will be required. The hardware requirements to run UJV are a minimum of 2 GB of memory and, at minimum, an ATI Radeon HD 2400, GMA 4500, or GeForce 8 GPU. The software requirements are a Windows, Linux, or OS X operating system and Java 8. UJV will also be designed as an open-source project, which will allow community innovation, improvements, and customization according to individual requirements. Figure 1 below outlines the major components required for UJV.

Figure 1. Product Major Functional Component Diagram

3 Identification of Case Study

Dr. David Gauthier of Old Dominion University (ODU) is the primary biologist contact for the UJV project. The purpose of the CARP and UJV systems is to make genome research more feasible through convenient genome inspection and reduced laboratory costs. Dr. Gauthier's research into bacteria that affect Striped Bass in the Chesapeake Bay involves close inspection of the genomes of Mycobacterium spp. (Old Dominion University, 2011). His experience with the current market of genome viewers provides invaluable insight into the design process of UJV, since it is a product designed by a biologist for biologists.
The open-source nature of the project also opens it to community updates and scrutiny from software developers. A group that requires a comprehensive viewer with a different feature set could add its own features to the UJV framework, expanding its value to the biological community even further.

4 Unverified Join Viewer Prototype Description

The prototype for UJV will be a fully functional application capable of using real-world data for demonstration. All of the viewing and zoom features will be fully functional in the prototype version. The only features that will be missing are editing and saving a genome. The development team feels that the other features, such as zoom, color customization, and viewing join data, will be enough of a challenge for the semester and are more important to the core functionality of UJV. The prototype will require no simulation, as all of the data required is readily available through GenBank.

4.1 Prototype Architecture (Hardware/Software)

The structure of the prototype will follow the previously described structure of the final product. The user will be able to load a genome file that is either created by a sequencer on the user's computer or downloaded from GenBank. UJV will then translate and display the imported genome and allow detailed inspection of join evidence and other feature annotations included in the particular genome file. Figure 2 below outlines the major components required for the prototype version of UJV.

Figure 2. Prototype Major Functional Component Diagram

4.2 Prototype Features and Capabilities

The prototype will demonstrate the viewing and inspection features of UJV. These are the most essential parts of being able to verify whether a genome was assembled correctly by CARP.
As long as the system shows the joins made by CARP and the justifications for each join, and the zoom function works as designed, the user will have a far easier time analyzing a genome, which would indicate a successful prototype. Table 1 below details the differences between the prototype and end product versions of UJV.

Features                              UJV Prototype   UJV Real World Product
Zoom                                  yes             yes
Color Options                         yes             yes
Load a GenBank File                   yes             yes
View Genome in a circle               yes             yes
View Features inside the circle       yes             yes
Select a Feature as a Join            yes             yes
View Joins Outside of circle          yes             yes
View Feature Data                     yes             yes
View Join Data                        yes             yes
Multiple Tabs for Different Files     yes             yes
Hi Resolution Screen Shot             yes             yes
Select a Join for Editing             no              yes
Edit Join Information                 no              yes
Save Edited File in the GBK format    no              yes
Upload the File to the GenBank        no              yes

Table 1. Prototype vs. Real World Product Features

4.3 Prototype Development Challenges

As with any new project, there will be a number of challenges to overcome during the development phase. First, the platforms the project is being developed on, Java 8 and JavaFX, are brand new, so the team is unfamiliar with the features these development tools introduce. This is partially mitigated by the fact that two team members are Oracle-certified Java programmers with extensive knowledge. It is further mitigated by the wealth of documentation and tutorials available to familiarize the development team with these tools.

Another challenge will be limited domain experience within the team. As a whole, the team lacks experience in the biological field, which increases the risk of a disconnect between what the system actually needs and what the team thinks the system needs. This will be mitigated by having Dr.
Gauthier directly involved in each phase of development to make sure the system is designed in a way that is not only appealing to biologists but useful as well.

5 Glossary

Bioinformatics: Study of biological data
Color Scheme: Unique identifying colors that denote a Fragment, a Join, and an individual base (A, C, T, or G) in a sequence
DNA: Deoxyribonucleic acid. A molecule that carries most of the genetic instructions used in the development, functioning, and reproduction of all organisms
Eco-Epidemiology: Study of ecologic influences on human health
Ecology: Study of interactions among organisms and their environment
Etiology: Study of origination
Feature: Specific information about genomic data
Fragment: A DNA strand fragment. Current sequencers are unable to process DNA strands as a whole and must break the genome up into fragments that can be processed and then joined back together.
GenBank: An open-access, annotated collection of all publicly available nucleotide sequences and their protein translations. This database is produced and maintained by the National Center for Biotechnology Information.
Geneious: A powerful and comprehensive suite of molecular biology tools
Genome: The genetic material of an organism
Join: A place where two fragments are recombined, or joined again
Join Evidence: The means of determining the position where two fragments are joined
M. marinum: A free-living bacterium that causes opportunistic infections in humans
M. pseudoshottsii, M. shottsii: Mycobacterium species isolated from Chesapeake Bay striped bass
Mycobacteriosis: Diseases caused by a group of bacteria
Scale: The measure that determines whether a sequence or join is displayed as a line or as its full sequence pattern
PCR: Polymerase chain reaction. A process used to amplify a region of DNA for further study. It allows DNA fragments to be isolated from a genome for study and analysis.
PCR Primer: In the PCR process, a primer is needed to isolate a fragment of DNA. DNA cannot build a new strand along a length of DNA that has lost its partner; it needs a starting point to build from. In PCR the strands are separated and a primer is used to start the rebuilding. By using specific primers at specific points, we can select the parts of a DNA strand we wish to study.

References

GenBank. (2015). GenBank. Retrieved October 25, 2015, from http://www.ncbi.nlm.nih.gov/genbank/
Geneious. (n.d.). Geneious. Retrieved September 21, 2015, from http://www.geneious.com
Global Catalogue of Microorganisms. (n.d.). Mycobacterium shottsii. Retrieved September 21, 2015, from http://gcm.wfcc.info/speciesPage.jsp?strain_name=Mycobacterium shottsii
Old Dominion University. (2011). David Gauthier, Ph.D. Retrieved September 21, 2015, from http://sci.odu.edu/biology/directory/gauthier.shtml