Running head: LAB 1 – UNVERIFIED JOIN VIEWER
Lab 1 – Unverified Join Viewer
Joseph R. Cooper
Old Dominion University
CS 411W
Janet Brunelle
October 25, 2015
Version 2
Table of Contents
1 Introduction
2 Unverified Join Viewer Product Description
2.1 Key Product Features and Capabilities
2.2 Major Components (Hardware/Software)
3 Identification of Case Study
4 Unverified Join Viewer Product Prototype Description
4.1 Prototype Architecture (Hardware/Software)
4.2 Prototype Features and Capabilities
4.3 Prototype Development Challenges and Risks
Glossary
References
List of Figures
Figure 1 - Major Functional Components Diagram
Figure 2 - Current process for verifying joins in a genome
Figure 3 - Improved process for verifying joins
Figure 4 - Prototype Screenshot
List of Tables
Table 1 - Differences between prototype and real world product
1 Introduction
In the Chesapeake Bay, mycobacteriosis is afflicting large numbers of striped bass
("David Gauthier, Ph.D."). Mycobacteriosis is an infectious disease caused by bacteria in the
genus Mycobacterium (“In Focus - Striped Bass Health”). Mycobacteriosis causes inflammation,
tissue destruction and formation of scar tissue in one or more organs ("In Focus - Striped Bass
Health"). To further understand the characteristics, evolution, and pathogenicity of the
Mycobacterium plaguing the striped bass in the Chesapeake Bay, the pervasive Mycobacterium
genomes must be studied.
Unfortunately, the current process for studying the genomes of Mycobacterium species is
time-consuming and costly. This process involves splitting a genome into many smaller
pieces, determining the genetic makeup of those segments, and then joining all of the segments
back together ("What's a Genome?"). The sequencers that join the segments back together are
prone to errors, and genome viewers do not currently have the ability to show join evidence.
If a join appears faulty, a biologist must manually check the join in the sequencer output and
then conduct lab work to find the correct join. To avoid this extra work, the join evidence
must be readily available in the genome viewer being used.
The Unverified Join Viewer (UJV) is a proposed solution that will allow biologists to
inspect and search for genome features, in particular joins, within a genome sequence by means
of a graphical user interface (GUI). By presenting all of this information in a GUI, the UJV
lets biologists quickly navigate through vast amounts of genome information to determine
whether joins were made correctly and, if a join appears to be improperly formed, how to
create it correctly. The UJV saves the biologist the time of tracking down why and how bad
joins were made by making all of this information readily available.
2 Unverified Join Viewer Product Description
The UJV is a GUI application. It reads in genome sequence information stored in
GenBank format and then displays this information in such a way that it can be easily read. Users
are able to quickly locate specific genome information by filtering genome features and zooming
in and out of regions of the genome. Users can also hover over genome features in the viewer
with a mouse to view tooltips which provide more information about features.
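The first step described above, reading feature information out of a GenBank file, can be illustrated with a minimal sketch. The class and method names here are hypothetical and are not taken from the UJV codebase, and the sketch assumes only the simplest feature table lines: a feature key followed by a single start..end range, optionally wrapped in complement(...). Compound locations and qualifier lines are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the UJV codebase.
final class FeatureLineParser {

    /** A feature key with a 1-based, inclusive start and end position. */
    static final class Feature {
        final String key;
        final int start;
        final int end;

        Feature(String key, int start, int end) {
            this.key = key;
            this.start = start;
            this.end = end;
        }
    }

    // Matches feature table lines such as "     gene            687..3158"
    // or "     gene            complement(3300..4037)".
    private static final Pattern FEATURE_LINE = Pattern.compile(
            "^ {5}(\\S+)\\s+(?:complement\\()?(\\d+)\\.\\.(\\d+)\\)?\\s*$");

    /** Returns the features on simple single-range lines; all other lines are skipped. */
    static List<Feature> parse(List<String> lines) {
        List<Feature> features = new ArrayList<>();
        for (String line : lines) {
            Matcher m = FEATURE_LINE.matcher(line);
            if (m.matches()) {
                features.add(new Feature(m.group(1),
                        Integer.parseInt(m.group(2)),
                        Integer.parseInt(m.group(3))));
            }
        }
        return features;
    }
}
```

A real parser would also need to handle qualifiers, multi-range locations, and the sequence section. This sketch only shows how a feature's type and position might be recovered for display.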
2.1 Key Product Features and Capabilities.
The UJV can display genome information that is stored in GenBank format. In order for
GenBank files to be rendered in the UJV, they must be loaded into the UJV using menus
provided in the UJV’s GUI. The UJV will parse the loaded file and create a graphical, circular
representation of the genome’s information that can be read and searched. The joins and other
features in the genome will be represented on the circle by points labeled by feature type and
located on the circle in positions that coincide with their positions in the genome sequence.
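Placing a feature on the circle at a point that coincides with its position in the sequence comes down to a proportional mapping from sequence position to angle. The following is a minimal sketch under stated assumptions: position 1 sits at the top of the circle, positions advance clockwise, and the y axis grows downward as it does in JavaFX. The class and method names are hypothetical.

```java
// Hypothetical helper for illustration; not part of the UJV codebase.
final class CircularLayout {

    /** Angle in radians, clockwise from the top of the circle, for a 1-based position. */
    static double angleFor(int position, int genomeLength) {
        return 2.0 * Math.PI * (position - 1) / genomeLength;
    }

    /** Screen coordinates {x, y} on the circle; y grows downward. */
    static double[] pointFor(int position, int genomeLength,
                             double centerX, double centerY, double radius) {
        double angle = angleFor(position, genomeLength);
        double x = centerX + radius * Math.sin(angle);
        double y = centerY - radius * Math.cos(angle);
        return new double[] {x, y};
    }
}
```

Under these assumptions, a feature one quarter of the way through the genome lands at the rightmost point of the circle, and one halfway through lands at the bottom.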
Since a GenBank file may contain enough features to cause them to overlap
in the genome viewer, the user may zoom in and out. When zooming in,
the distance between genome features increases; features move away from the focal
point of the zoom and may disappear from the genome view if they are close enough to the
extremities of the viewer. When zooming out, genome features move closer to the focal point of
the zoom, and features that were just beyond the extremities of the viewer before the zoom
may appear in the genome view. The user can zoom in all the
way to view the nucleotide sequence of the genome. The user may also choose not to render
certain feature types present in the GenBank file, such as joins, in order to
prevent overlapping.
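The zoom behavior described above matches a scaling about the focal point. The sketch below works on a single axis for brevity; the same arithmetic would apply to each screen axis. The names and the visibility rule (a coordinate is visible when it lies between 0 and the viewer size) are assumptions made for illustration, not details of the UJV implementation.

```java
// Hypothetical helper for illustration; not part of the UJV codebase.
final class ZoomModel {

    /** New coordinate after scaling about the focal point; factor > 1 zooms in. */
    static double zoom(double coordinate, double focus, double factor) {
        return focus + (coordinate - focus) * factor;
    }

    /** True if the coordinate lies within the viewer extent [0, viewSize]. */
    static boolean isVisible(double coordinate, double viewSize) {
        return coordinate >= 0.0 && coordinate <= viewSize;
    }
}
```

With a focal point at 400 in an 800-unit-wide viewer, zooming in by a factor of 2 moves a feature at 700 out to 1000, past the edge of the viewer, while zooming out by a factor of 0.5 brings a feature at 1000 back in to 700.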
If a user sees a feature that has been improperly sequenced, the user can fix the feature in
the viewer. The viewer’s toolbox presents menus and dialogs that
facilitate the editing of join positioning and sequence information. The UJV will allow users to
save these edits and write the new genome to a GenBank file. It will also allow users to upload
versions of their modified genomes to GenBank (“Public nucleic acid sequence repository”).
High resolution screenshots of the viewer can be generated and saved to secondary
storage, with a filename, image format, and location of the user’s choice, in order to facilitate
collaboration on verifying that a genome is correct. These images will have a
resolution of 600 dots per inch (DPI). The user can choose from the following image formats:
Tagged Image File Format (TIFF), Graphics Interchange Format (GIF), Portable Network
Graphics (PNG), and Joint Photographic Experts Group (JPEG).
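The 600 DPI requirement implies a simple calculation of how large the saved image must be. The sketch below shows that arithmetic only. The class name is hypothetical, and the 96 DPI screen density used in the example is an assumption, since the document does not state the density at which the viewer is rendered on screen.

```java
// Hypothetical helper for illustration; not part of the UJV codebase.
final class ScreenshotScale {

    /** Target print resolution stated in the product description. */
    static final double TARGET_DPI = 600.0;

    /** Factor by which the on-screen view must be enlarged to reach the target DPI. */
    static double scaleFactor(double screenDpi) {
        return TARGET_DPI / screenDpi;
    }

    /** Pixel length needed to cover the given physical length at the target DPI. */
    static int pixelsFor(double inches) {
        return (int) Math.round(inches * TARGET_DPI);
    }
}
```

For an assumed 96 DPI screen, the factor is 6.25, and an image meant to print 5 inches wide needs 3000 pixels across. In JavaFX, one way to apply such a factor is to set a scale transform on a SnapshotParameters object before calling Node.snapshot; whether the UJV takes this approach is not stated.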
The user can change the colors of the components in the application. This is
of particular interest when creating screenshots of the viewer:
consumers of the screenshots may have color requirements, so the UJV must be able to change
its colors to meet them.
To make this software as widely available as possible and to further advance the field of
bioinformatics, the UJV will be open-source and will run on any computer that can run a
Java Virtual Machine (JVM). Open-sourcing the UJV code will allow users around the
world to contribute new features to the UJV
codebase and to improve its existing features. Genome
sequencing could become less expensive if this application gains attention and is adopted in
industry.
2.2 Major Components (Hardware/Software).
The UJV has a single component, which is an executable Java GUI application. No
database is required for this software to function, and an Internet connection
is required only if the user wants to export a genome sequence directly to GenBank. To
use this software, the user must run the UJV GUI on a computer that has an operating system
that can load the Java 8 Runtime Environment and run in graphical mode, a monitor,
a graphics card or chip, and secondary storage to hold the GenBank file(s) to be loaded.
The main Java library needed to implement the GUI is JavaFX ("1 JavaFX Overview"). The GUI
consists of a pane that details the currently visible features, a central viewer for displaying the
current region of the genome, and an inspection pane. Figure 1 outlines the major functional
components.
Figure 1 - Major Functional Components Diagram
3 Identification of Case Study
The UJV was designed as a way to aid Dr. Gauthier’s studies on the Mycobacterium
species afflicting striped bass in the Chesapeake Bay. Figure 2 details the current process of
validating the sequencing of Mycobacterium genomes.
Figure 2 - Current process for verifying joins in a genome
If a join does not look correct, the join reason must be located in the sequencer results by
means other than the genome viewer. This invalid-join-fixing subroutine of the join validation
process is circled in red in Figure 2. Unable to find an existing software solution for
viewing join reasoning inside a genome viewer, Dr. Gauthier approached the Computer
Science department at Old Dominion University and asked for a software solution to be
developed that will display join reasoning alongside GenBank genome information. Figure 3
demonstrates the proposed way to meet Dr. Gauthier’s needs.
Figure 3 - Improved process for verifying joins
The new process will allow less time to be spent on correcting incorrect joins. The time
required to determine why a join was made will be reduced since this information will be present
in the genome viewer.
4 Unverified Join Viewer Product Prototype Description
The prototype developed will focus on viewing genome information. The requirements
that pertain to the manipulation of genome information and exporting will not be implemented.
All of the other requirements will be implemented.
4.1 Prototype Architecture (Hardware/Software)
The UJV prototype will be a GUI application implemented using JavaFX. The UJV will
display genome information when a user loads a GenBank file within the application. Figure 4
shows how the UJV will display genome information.
Figure 4 - Prototype Screenshot
4.2 Prototype Features and Capabilities
Only loading, zooming, filtering, color scheme changes, feature inspection, and
screenshots will be implemented. The manipulation of genome information and the saving of these
edits will not be in the prototype due to time constraints. Table 1 shows the differences in
features provided by the prototype and the real world product.
Table 1 - Differences between prototype and real world product
Features                              UJV Prototype    UJV Real World Product
Zoom                                  yes              yes
Color Options                         yes              yes
Load a GenBank File                   yes              yes
View Genome in a circle               yes              yes
View Features inside the circle       yes              yes
Select a Feature as a Join            yes              yes
View Joins Outside of circle          yes              yes
View Feature Data                     yes              yes
View Join Data                        yes              yes
Multiple Tabs for Different Files     yes              yes
High Resolution Screen Shots          yes              yes
Select a Join for Editing             no               yes
Edit Join Information                 no               yes
Save Edited File in the GBK format    no               yes
Upload the File to GenBank            no               yes
4.3 Prototype Development Challenges and Risks
The prototype development team faces two challenges: limited time and working
with new technology. The team has to finish designing, documenting, and developing a working
prototype by the end of the fall 2015 semester. To reduce the risk posed by the time constraint,
the prototype has fewer requirements than the real world product. The risk of using a new
framework appears to be minimal, since the team has shown in earlier prototypes that
it can use the JavaFX framework effectively.
Glossary
Bioinformatics: Study of biological data.
Color Scheme: Unique identifying colors that denote a Fragment, Join, and Individual ACTG in
a sequence.
DNA: Deoxyribonucleic acid. A molecule that carries most of the genetic instructions used in the
development, functioning and reproduction of all organisms.
Eco-Epidemiology: Study of ecologic influences on human health.
Ecology: Study of interactions among organisms and their environment.
Etiology: Study of origination.
Feature: Specific information about genomic data.
Fragment: A DNA strand fragment. Current sequencers are unable to process DNA strands as a
whole and must break the genome up into fragments or pieces that can be processed and
then formed back together.
GenBank: An open-access, annotated sequence database of all publicly available
nucleotide sequences and their protein translations. This database is produced and
maintained by the National Center for Biotechnology Information.
Geneious: A powerful and comprehensive suite of molecular biology tools.
Genome: The genetic material of an organism.
Join: A place where two fragments are recombined, or joined again.
Join Evidence: The means of determining the position where two fragments are joined.
M. marinum: A free living bacterium, which causes opportunistic infections in humans.
M. pseudoshottsii, M. shottsii: Mycobacterium species isolated from Chesapeake Bay striped bass.
Mycobacteriosis: Diseases caused by a group of bacteria.
Scale: The measure of when to display the sequence or join as a line or its full sequence pattern.
PCR: Polymerase chain reaction. A process used to amplify a region of DNA for
further study. It allows DNA fragments to be isolated from a genome for study and analysis.
PCR Primer: In the PCR process, a primer is needed to isolate a fragment of DNA. DNA cannot
build another strand for a length of DNA that has lost its partner; it needs a
starting point to build from. In PCR, the strands are isolated and a primer is used to start
the rebuilding. By using specific primers at specific points, it is possible to select
parts of a DNA strand to be studied.
References
1 JavaFX Overview. (n.d.). Retrieved September 20, 2015, from
http://docs.oracle.com/javase/8/javafx/get-started-tutorial/jfx-overview.htm#JFXST784
David Gauthier, Ph.D. (n.d.). Retrieved September 21, 2015, from
http://sci.odu.edu/biology/directory/gauthier.shtml
In Focus - Striped Bass Health. (n.d.). Retrieved September 20, 2015, from
http://www.dnr.state.md.us/dnrnews/infocus/striped_bass_health.asp
Public nucleic acid sequence repository. (n.d.). Retrieved September 20, 2015, from
http://www.ncbi.nlm.nih.gov/genbank/submit
What's a Genome? (n.d.). Retrieved September 20, 2015, from
http://www.genomenewsnetwork.org/resources/whats_a_genome/Chp2_2.shtml