Running head: LAB 2 – UNVERIFIED JOIN VIEWER
Lab 2 Sections 1 and 2 – Unverified Join Viewer
Joseph R. Cooper
Old Dominion University
CS 411W
Janet Brunelle
November 22, 2015
Version 1
Table of Contents
1 Introduction
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms, and Abbreviations
1.4 References
1.5 Overview
2 General Description
2.1 Prototype Architecture Description
2.2 Prototype Functional Description
2.3 External Interfaces
List of Figures
Figure 1. Current process for verifying joins in a genome
Figure 2. Improved process for verifying joins
Figure 3. Prototype Major Functional Components Diagram
List of Tables
Table 1. Differences between prototype and real-world product
1 Introduction
In the Chesapeake Bay, mycobacteriosis is afflicting large numbers of striped bass (David
Gauthier, Ph.D.). Mycobacteriosis is an infectious disease caused by bacteria in the genus
Mycobacterium (In Focus - Striped Bass Health). It causes inflammation, tissue destruction, and
the formation of scar tissue in one or more organs (In Focus - Striped Bass Health). To better
understand the characteristics, evolution, and pathogenicity of the Mycobacterium species
plaguing striped bass in the Chesapeake Bay, the genomes of these prevalent species must be
studied.
Unfortunately, the current process for studying the genomes of Mycobacterium species is
time-consuming and costly. The process involves splitting a genome into many smaller pieces,
determining the genetic makeup of those segments, and then joining all of the segments back
together (What's a Genome?). The sequencers that join the fragments back together are prone to
errors, and current genome viewers cannot show the evidence behind a join. If a join appears
faulty, a biologist must manually check the join in the sequencer output and then conduct lab
work to find the correct join. Eliminating this extra work requires that the join evidence be
readily available in the genome viewer itself.
The Unverified Join Viewer (UJV) is a proposed solution that will allow biologists to
inspect and search for genome features, in particular joins, within a genome sequence through a
graphical user interface (GUI). With all of this information presented in a GUI, biologists
will be able to navigate quickly through large amounts of genome information to determine
whether joins were made correctly and, if a join appears to be improperly formed, how to
correct it. By making this information readily available, the UJV saves biologists the time
otherwise spent tracking down why and how bad joins were made.
1.1 Purpose
The UJV was designed to aid Dr. Gauthier's studies of the Mycobacterium species afflicting
striped bass in the Chesapeake Bay. Figure 1 shows the current process for validating the
sequencing of Mycobacterium genomes.
Figure 1. Current process for verifying joins in a genome
If a join does not look correct, the reason for the join must be located in the sequencer
results by means other than the genome viewer. This invalid-join fixing subroutine of the join
validation process is circled in red in Figure 2. Having been unable to find an existing software
solution that displays join reasoning inside a genome viewer, Dr. Gauthier approached the
Computer Science Department at Old Dominion University and asked for a software solution to be
developed that will display join reasoning alongside GenBank genome information. Figure 2
shows the proposed process for meeting Dr. Gauthier's needs.
Figure 2. Improved process for verifying joins
The new process will reduce the time spent correcting incorrect joins. Determining why a
join was made will take less time because that information will be present in the genome
viewer.
1.2 Scope
The UJV will be used by biologists to verify and, if necessary, fix the results of genome
sequencing. Biologists will be able to collaborate on a genome by saving their edits and
exporting them to a GenBank file. The final product will also allow users to upload a GenBank
file directly to the central GenBank database (Public nucleic acid sequence repository).
1.3 Definitions, Acronyms, and Abbreviations
Bioinformatics: Study of biological data.
Color Scheme: Unique identifying colors that denote a fragment, a join, and each individual
nucleotide (A, C, T, G) in a sequence.
DNA: Deoxyribonucleic acid. The molecule that carries most of the genetic instructions used in
the development, functioning, and reproduction of all organisms.
Eco-Epidemiology: Study of ecological influences on human health.
Ecology: Study of interactions among organisms and their environment.
Etiology: Study of causation or origination, particularly of disease.
Feature: Specific information about genomic data.
Fragment: A DNA strand fragment. Current sequencers are unable to process DNA strands as a
whole and must break the genome up into fragments, or pieces, that can be processed and
then reassembled.
GenBank: An open-access, annotated sequence database containing all publicly available
nucleotide sequences and their protein translations. This database is produced and
maintained by the National Center for Biotechnology Information.
Geneious: A powerful and comprehensive suite of molecular biology tools.
Genome: The genetic material of an organism.
Join: A place where two fragments are recombined, or joined again.
Join Evidence: The evidence used to determine the position at which two fragments are joined.
M. marinum: A free-living bacterium that causes opportunistic infections in humans.
M. pseudoshottsii, M. shottsii: Mycobacterium species isolated from Chesapeake Bay striped bass.
Mycobacteriosis: Diseases caused by bacteria of the genus Mycobacterium.
Scale: The zoom threshold that determines whether a sequence or join is displayed as a line or
as its full sequence pattern.
PCR: Polymerase chain reaction. A process used to amplify a region of DNA for further study,
allowing DNA fragments from a genome to be isolated for study and analysis.
PCR Primer: The starting point needed in the PCR process to isolate a fragment of DNA. A
length of DNA that has lost its partner strand cannot have a new partner built without a
starting point to build from, so in PCR the strands are separated and a primer is used to
start the rebuilding. By using specific primers at specific points, selected parts of a DNA
strand can be studied.
1.4 References
1 JavaFX Overview. (n.d.). Retrieved September 20, 2015, from
http://docs.oracle.com/javase/8/javafx/get-started-tutorial/jfx-overview.htm#JFXST784
David Gauthier, Ph.D. (n.d.). Retrieved September 21, 2015, from
http://sci.odu.edu/biology/directory/gauthier.shtml
In Focus - Striped Bass Health. (n.d.). Retrieved September 20, 2015, from
http://www.dnr.state.md.us/dnrnews/infocus/striped_bass_health.asp
Public nucleic acid sequence repository. (n.d.). Retrieved September 20, 2015, from
http://www.ncbi.nlm.nih.gov/genbank/submit
What's a Genome? (n.d.). Retrieved September 20, 2015, from
http://www.genomenewsnetwork.org/resources/whats_a_genome/Chp2_2.shtml
1.5 Overview
This product specification explains the problems the UJV will solve and the target audience
that needs them solved. It also gives an overview of how the prototype will be used. The
specific requirements for the prototype are not enumerated in this document; they are provided
in Lab 3, Section 1.
2 General Description
The UJV is a GUI application. It reads genome sequence information stored in GenBank format
and displays it in an easily readable form. Users can quickly locate specific genome
information by filtering genome features and by zooming in and out of regions of the genome.
Users can also hover the mouse over genome features in the viewer to see tooltips that provide
more information about those features. Joins can be edited by clicking on them and changing
their information in a popup window.
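The specification does not describe how the UJV reads GenBank files. As a rough illustration only, the plain Java sketch below parses one line from the FEATURES section of a GenBank flat file. Java is used because the prototype is built on JavaFX, although no JavaFX classes are needed here. In that section the feature key starts in column 6 and the location starts in column 22. The class names `GenBankFeature` and `FeatureLineParser` are hypothetical. The sketch handles only simple spans such as `3300..4037` or `complement(3300..4037)`.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One feature read from a GenBank FEATURES section (hypothetical model class). */
final class GenBankFeature {
    final String key;          // feature key, e.g. "gene" or "CDS"
    final long start;          // first base, 1-based and inclusive
    final long end;            // last base, 1-based and inclusive
    final boolean complement;  // true when the feature lies on the reverse strand

    GenBankFeature(String key, long start, long end, boolean complement) {
        this.key = key;
        this.start = start;
        this.end = end;
        this.complement = complement;
    }
}

/** Hypothetical helper: parses a single feature line, not a whole GenBank file. */
final class FeatureLineParser {
    // A simple span such as "3300..4037"; "<" and ">" mark partial ends and are ignored here.
    private static final Pattern SPAN = Pattern.compile("<?(\\d+)\\.\\.>?(\\d+)");
    private static final String COMPLEMENT = "complement(";

    static Optional<GenBankFeature> parse(String line) {
        // Feature lines have 5 leading spaces, the key in columns 6-21 and the location
        // from column 22; qualifier and continuation lines have a blank column 6.
        if (line.length() < 22 || !line.startsWith("     ") || line.charAt(5) == ' ') {
            return Optional.empty();
        }
        String key = line.substring(5, 21).trim();
        String location = line.substring(21).trim();

        // Strip a surrounding complement(...) and remember the strand.
        boolean complement = location.startsWith(COMPLEMENT) && location.endsWith(")");
        String span = complement
                ? location.substring(COMPLEMENT.length(), location.length() - 1)
                : location;

        Matcher m = SPAN.matcher(span);
        if (!m.matches()) {
            return Optional.empty(); // join(...), order(...), single bases, etc. are not handled
        }
        long start = Long.parseLong(m.group(1));
        long end = Long.parseLong(m.group(2));
        return Optional.of(new GenBankFeature(key, start, end, complement));
    }
}
```

A real reader would also need to handle several other cases: multi-part locations, locations wrapped onto continuation lines, and qualifiers such as `/gene`. An established library such as BioJava, which provides GenBank readers, may be a better starting point than hand-written parsing.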
2.1 Prototype Architecture Description
The UJV prototype will be a GUI application implemented using JavaFX (1 JavaFX
Overview). No internet connection is required to use the prototype. The GUI consists of a pane
that lists the currently visible features, a central viewer that displays the current region of
the genome, and an inspection pane. The UJV displays genome information once a user loads a
GenBank file within the application, and the user can then inspect and edit genome features
through the GUI. Figure 3 shows how the UJV will present genome information.
Figure 3. Prototype Major Functional Components Diagram
2.2 Prototype Functional Description
The prototype will do everything the real-world product will do except upload GenBank files
to the central GenBank database and allow the GUI's color palette to be changed. Table 1 lists
the functions that the prototype and the real-world product will provide.
Feature                             UJV Prototype   UJV Real-World Product
Zoom                                Yes             Yes
Color Options                       No              Yes
Load a GenBank File                 Yes             Yes
View Genome in a Circle             Yes             Yes
View Features Inside the Circle     Yes             Yes
Select a Feature as a Join          Yes             Yes
View Joins Outside of the Circle    Yes             Yes
View Feature Data                   Yes             Yes
View Join Data                      Yes             Yes
Multiple Tabs for Different Files   Yes             Yes
High-Resolution Screenshots         Yes             Yes
Select a Join for Editing           Yes             Yes
Edit Join Information               Yes             Yes
Save Edited File in GBK Format      Yes             Yes
Upload the File to GenBank          No              Yes
Table 1. Differences between prototype and real-world product
The UJV can display genome information that is stored in GenBank format. To be rendered,
a GenBank file must first be loaded through the menus in the UJV's GUI. The UJV will parse the
loaded file and create a graphical, circular representation of the genome's information that
can be read and searched. Joins and other features in the genome will be represented by points
on the circle, labeled by feature type and placed at positions that correspond to their
positions in the genome sequence.
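One way to place features on the circle as described above is to map each base position p in a genome of length L to an angle proportional to its offset, θ = 2π(p − 1)/L. The sketch below makes three assumptions: position 1 sits at the 12 o'clock point, positions advance clockwise, and screen y grows downward (as in JavaFX). `CircularLayout` and its methods are hypothetical names, not part of the UJV design.

```java
/**
 * Hypothetical helper that places a genome position on a circular view.
 * Assumes position 1 is at 12 o'clock, positions advance clockwise, and screen y grows downward.
 */
final class CircularLayout {
    /** Angle in radians, clockwise from 12 o'clock, for a 1-based position. */
    static double angleFor(long position, long genomeLength) {
        if (genomeLength <= 0 || position < 1 || position > genomeLength) {
            throw new IllegalArgumentException("position must be in 1.." + genomeLength);
        }
        return 2.0 * Math.PI * (position - 1) / genomeLength;
    }

    /** Screen coordinates {x, y} of a position on a circle with the given center and radius. */
    static double[] pointFor(long position, long genomeLength,
                             double centerX, double centerY, double radius) {
        double theta = angleFor(position, genomeLength);
        return new double[] {
            centerX + radius * Math.sin(theta),  // clockwise from the top
            centerY - radius * Math.cos(theta)   // subtract because screen y points down
        };
    }
}
```

Table 1 lists features inside the circle and joins outside it. With this helper, that layout could be produced by calling `pointFor` with `radius - offset` for features and `radius + offset` for joins.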
Because a GenBank file may contain enough features to make them overlap in the genome viewer,
the user may zoom in and out. When zooming in, fewer and fewer features are displayed, but what
is displayed becomes larger and clearer. When zooming out, more and more
features are displayed, but what is displayed becomes smaller and less clear. Certain feature
types present in the GenBank file, such as joins, can also be hidden from the GUI to prevent
overlapping.
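The zooming and type filtering above can be sketched as two checks. The first asks whether a feature overlaps the visible window; the second asks whether the user has hidden its type. The sketch also models the glossary's Scale term as a pixels-per-base threshold that decides between drawing a line and drawing individual bases. Several details are assumptions for illustration: the halving-per-zoom-level rule, the threshold value, and the names `ViewFeature` and `ZoomFilter`. The window is also treated as a linear range that does not wrap past position 1.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Minimal feature record used for filtering (hypothetical). */
final class ViewFeature {
    final String type;
    final long start;  // 1-based, inclusive
    final long end;    // 1-based, inclusive

    ViewFeature(String type, long start, long end) {
        this.type = type;
        this.start = start;
        this.end = end;
    }
}

/** Hypothetical zoom and filter helpers; halving the window per zoom level is an assumption. */
final class ZoomFilter {
    /** Number of bases visible at a zoom level, where each level halves the window. */
    static long windowLength(long genomeLength, int zoomLevel) {
        if (genomeLength <= 0 || zoomLevel < 0) {
            throw new IllegalArgumentException("genome length must be positive, zoom level non-negative");
        }
        int shift = Math.min(zoomLevel, 62);        // avoid shift wrap-around on a long
        return Math.max(1, genomeLength >> shift);  // never show fewer than one base
    }

    /** Features that overlap [windowStart, windowEnd] and whose type the user has not hidden. */
    static List<ViewFeature> visible(List<ViewFeature> all, long windowStart, long windowEnd,
                                     Set<String> hiddenTypes) {
        return all.stream()
                .filter(f -> !hiddenTypes.contains(f.type))
                .filter(f -> f.start <= windowEnd && f.end >= windowStart)
                .collect(Collectors.toList());
    }

    /** "Scale": draw individual bases only when each base gets at least minPxPerBase pixels. */
    static boolean drawIndividualBases(long windowLength, double viewWidthPx, double minPxPerBase) {
        return viewWidthPx / windowLength >= minPxPerBase;
    }
}
```

For example, on a 1000-pixel-wide view with a threshold of 8 pixels per base, individual bases would be drawn only once 125 or fewer bases are visible.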
If a user sees a feature that has been improperly sequenced, the user can fix the feature in
the viewer. The viewer's toolbox presents menus and dialogs for editing join positions and
sequence information. The UJV will allow users to save these edits and export the edited
genome to a GenBank file.
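Exporting edits means writing features back in GenBank's fixed-column layout. The sketch below formats two kinds of line: a feature line, with the key from column 6 and the location from column 22, and a qualifier line, which starts at column 22. This document does not specify how the UJV records a join in the file. Storing it as a `misc_feature` with a `/note` qualifier is an assumption, and so is the class name `GenBankFeatureWriter`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical writer for one edited join. Representing a join as a "misc_feature"
 * with a "/note" qualifier is an assumption, not part of the UJV specification.
 */
final class GenBankFeatureWriter {
    private static final String QUALIFIER_INDENT = " ".repeat(21); // qualifiers start in column 22

    /** Feature line: 5 spaces, key padded through column 21, location from column 22. */
    static String featureLine(String key, String location) {
        if (key.isEmpty() || key.length() > 15) {
            throw new IllegalArgumentException("feature keys are 1-15 characters long");
        }
        return String.format("     %-16s%s", key, location);
    }

    /** Qualifier line; an embedded double quote is written as two double quotes. */
    static String qualifierLine(String name, String value) {
        return QUALIFIER_INDENT + "/" + name + "=\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Lines for an edited join; long values are not wrapped onto continuation lines here. */
    static List<String> joinFeature(long start, long end, boolean complement, String note) {
        String span = start + ".." + end;
        List<String> lines = new ArrayList<>();
        lines.add(featureLine("misc_feature", complement ? "complement(" + span + ")" : span));
        lines.add(qualifierLine("note", note));
        return lines;
    }
}
```

A full writer would also wrap long lines and leave the rest of the record unchanged. That includes the header, the other features, and the ORIGIN sequence block.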
High-resolution screenshots of the viewer can be generated and saved to disk, with a
filename, image format, and location of the user's choice, to facilitate collaboration on
finalizing a genome. These images will have a resolution of 600 dots per inch (DPI). The user
can choose from the following image formats: Tagged Image File Format (TIFF), Graphics
Interchange Format (GIF), Portable Network Graphics (PNG), and Joint Photographic Experts Group
(JPEG).
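The 600 DPI requirement reduces to simple arithmetic. A printed dimension of w inches needs w × 600 pixels, and a view rendered at s DPI must be scaled by 600 / s. The example assumes a nominal screen density of 96 DPI, which gives a scale factor of 6.25; actual densities vary. At 600 DPI, an 8-inch-wide figure needs 4800 pixels. The sketch also checks whether the running JDK has an `ImageIO` writer for a chosen format. `ScreenshotSizing` is a hypothetical name.

```java
import java.util.Locale;
import javax.imageio.ImageIO;

/** Hypothetical sizing helpers for 600 DPI screenshots. */
final class ScreenshotSizing {
    static final int TARGET_DPI = 600;

    /** Pixels needed to cover a printed length (in inches) at 600 DPI. */
    static int pixelsFor(double inches) {
        return (int) Math.round(inches * TARGET_DPI);
    }

    /** Factor by which a view rendered at screenDpi must be scaled to reach 600 DPI. */
    static double scaleFactor(double screenDpi) {
        if (screenDpi <= 0) {
            throw new IllegalArgumentException("screen DPI must be positive");
        }
        return TARGET_DPI / screenDpi;
    }

    /** True if the running JDK has an ImageIO writer for the format name, e.g. "png". */
    static boolean canWrite(String formatName) {
        return ImageIO.getImageWritersByFormatName(formatName.toLowerCase(Locale.ROOT)).hasNext();
    }
}
```

In JavaFX, one plausible approach is to pass a scale transform through `SnapshotParameters` when calling `Node.snapshot`. Two JDK limitations also apply. The JPEG writer may not accept images with an alpha channel, so an ARGB snapshot may need converting to RGB before it is saved as JPEG. TIFF writing is built into the JDK only from Java 9 onward.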
The prototype will neither allow the user to change the GUI's color palette nor allow
GenBank files to be uploaded directly to the central GenBank database. These features will not
be included due to time constraints.
2.3 External Interfaces
The UJV prototype will not have any external interfaces. The final product will have one
external interface, the central GenBank database. Users will be able to upload their GenBank
files directly from the real-world product.