Running head: LAB 1 – UNVERIFIED JOIN VIEWER
Lab 1 – Unverified Join Viewer
Joseph R. Cooper
CS 411W
Janet Brunelle
September 20, 2015
Version 1
Table of Contents
2 UNVERIFIED JOIN VIEWER PRODUCT DESCRIPTION
2.2 Major Components (Hardware/Software)
4 UNVERIFIED JOIN VIEWER PRODUCT PROTOTYPE DESCRIPTION
List of Figures
List of Tables
1 INTRODUCTION
In the Chesapeake Bay, large numbers of striped bass are afflicted by mycobacteriosis ("David Gauthier, Ph.D."). Mycobacteriosis is an infectious disease caused by bacteria in the genus Mycobacterium ("In Focus - Striped Bass Health"). It causes inflammation, tissue destruction, and formation of scar tissue in one or more organs ("In Focus - Striped Bass Health"). To further understand the characteristics, evolution, and pathogenicity of the mycobacteria plaguing the striped bass in the Chesapeake Bay, the pervasive Mycobacterium genomes must be studied.
Unfortunately, the current process for studying the genomes of Mycobacterium species is time consuming and costly. This process involves splitting a genome into many smaller pieces, determining the genetic makeup of those segments, and then joining all of the segments back together ("What's a Genome?"). The sequencing software that joins the segments back together is prone to errors.
Genome viewers do not currently have the ability to show join evidence. If a join appears faulty, a biologist must manually check the join in the sequence and then conduct lab work to find the correct join. If there were a way to keep track of and view why and how joins are created, less lab work, which is expensive in both materials and time, would have to be done.
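As a rough illustration of the kind of join evidence discussed above, the sketch below finds the longest suffix of one fragment that exactly matches a prefix of the next fragment. The class name OverlapEvidence and its methods are hypothetical and are not part of the UJV. The exact-match comparison is a simplifying assumption; real sequence assembly software generally has to tolerate sequencing errors.

```java
/** Hypothetical helper, not part of the UJV: exact-match overlap between two fragments. */
final class OverlapEvidence {

    private OverlapEvidence() { }

    /**
     * Returns the length of the longest suffix of left that is also a
     * prefix of right. A result of 0 means no exact overlap supports the join.
     */
    static int longestOverlap(String left, String right) {
        int max = Math.min(left.length(), right.length());
        // Try the longest candidate first so the first hit is the longest overlap.
        for (int len = max; len > 0; len--) {
            // Compare the last len characters of left with the first len of right.
            if (left.regionMatches(left.length() - len, right, 0, len)) {
                return len;
            }
        }
        return 0;
    }

    /** Joins two fragments, writing the shared overlap only once. */
    static String join(String left, String right) {
        int overlap = longestOverlap(left, right);
        return left + right.substring(overlap);
    }
}
```

Under these assumptions, a viewer could present the overlapping region and its length as the evidence for a join, and a join with no overlap would be an obvious candidate for manual review.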
The Unverified Join Viewer (UJV) is a proposed solution that will allow biologists to interact with a sequence using a graphical user interface (GUI) and see the various sequence features, in particular the different joins. By presenting all this information in a GUI, the biologists will be able to quickly navigate through vast amounts of genome information to determine if joins were made correctly or how to correctly create a join if it does appear to be improperly formed. The UJV saves the biologist the time of having to track down why and how bad joins were made by making all this information readily available.
2 UNVERIFIED JOIN VIEWER PRODUCT DESCRIPTION
The UJV is a self-contained GUI application. It reads in genome sequence information and then displays this information in such a way that it can be easily interpreted and navigated through by a human. Users are able to quickly navigate genome information by filtering specific genome features and zooming in and out of regions of the genome. Users can also interact with different genome features in the viewer in order to be presented with more specific information.
2.1 Key Product Features and Capabilities.
In order for the UJV to display a genome, it must be supplied with a GenBank file that represents the genome to be analyzed by the user. The UJV will parse this file and create an interactive circular representation of the genome. The different features in the genome will be represented on the circle by labels. These labels will become more informative as the user zooms in on the circular representation or clicks on them. The software will also allow edits to the genome and will allow users to save their progress and export the new genome to a GenBank file. It will also allow users to upload versions of their modified genomes to the GenBank ("Public nucleic acid sequence repository"). High resolution screenshots of the viewer can be made to further facilitate collaboration on verifying that a genome is correct.
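The parsing and circular layout described above could be sketched roughly as follows. This is a minimal sketch, not the UJV's actual implementation: the class CircularLayout, the nested Feature class, and both methods are hypothetical names. It assumes simple feature locations of the form start..end or complement(start..end) as they appear in the feature table of a GenBank flat file, and it deliberately ignores compound locations such as join(...) and partial-location markers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch, not the UJV's code: reads simple feature lines and places them on a circle. */
final class CircularLayout {

    // Matches an indented feature line such as
    // "     gene            190..255" or "     CDS             complement(337..2799)".
    private static final Pattern FEATURE_LINE = Pattern.compile(
            "^\\s+(\\S+)\\s+(complement\\()?(\\d+)\\.\\.(\\d+)\\)?\\s*$");

    private CircularLayout() { }

    /** A feature with 1-based, inclusive start and end positions. */
    static final class Feature {
        final String type;
        final int start;
        final int end;
        final boolean complement;

        Feature(String type, int start, int end, boolean complement) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.complement = complement;
        }
    }

    /** Returns the parsed feature, or null when the line is not a simple feature line. */
    static Feature parseFeatureLine(String line) {
        Matcher m = FEATURE_LINE.matcher(line);
        if (!m.matches()) {
            return null; // qualifier lines and compound locations are skipped in this sketch
        }
        return new Feature(m.group(1),
                Integer.parseInt(m.group(3)),
                Integer.parseInt(m.group(4)),
                m.group(2) != null);
    }

    /** Angle in degrees around the circle for a 1-based position; position 1 maps to 0 degrees. */
    static double angleOf(int position, int genomeLength) {
        return 360.0 * (position - 1) / genomeLength;
    }
}
```

With this mapping, a feature's label would span the arc between angleOf(start, genomeLength) and angleOf(end, genomeLength). Where 0 degrees sits on screen and which direction is positive are drawing decisions left to the GUI layer.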
When a user zooms in on the genome, information in the user’s focus becomes more visible while information outside of the user’s focus is obscured. When zooming out, specific information in the user’s focus becomes more obscure, while information that was previously out of the field of view is moved in. So, information becomes more detailed but less abundant when zooming in, and less detailed but more abundant when zooming out. The viewer can zoom all the way down to the nucleotides of the genome. If a user determines that too much information is cluttering a certain zoom level, they may filter out different features. A user may specify in a features pane which features will be visible in the viewer.
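The zooming and filtering behavior described above can be illustrated with a small sketch. The names ViewFilter, Feature, visible, and windowWidth are hypothetical and do not come from the UJV. The sketch also assumes a visible window that does not wrap around the origin of the circular genome, which a real viewer would need to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch, not the UJV's code: which features to draw for a zoom window and filter. */
final class ViewFilter {

    private ViewFilter() { }

    /** A feature with 1-based, inclusive start and end positions. */
    static final class Feature {
        final String type;
        final int start;
        final int end;

        Feature(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns features whose type is enabled and that overlap [windowStart, windowEnd]. */
    static List<Feature> visible(List<Feature> all, Set<String> enabledTypes,
                                 int windowStart, int windowEnd) {
        List<Feature> result = new ArrayList<>();
        for (Feature f : all) {
            boolean overlapsWindow = f.start <= windowEnd && f.end >= windowStart;
            if (overlapsWindow && enabledTypes.contains(f.type)) {
                result.add(f);
            }
        }
        return result;
    }

    /**
     * Width of the visible window in bases. A zoom of 1.0 shows the whole genome;
     * larger values show fewer bases, down to a single nucleotide.
     */
    static int windowWidth(int genomeLength, double zoom) {
        if (zoom < 1.0) {
            throw new IllegalArgumentException("zoom must be at least 1.0");
        }
        return Math.max(1, (int) Math.round(genomeLength / zoom));
    }
}
```

In this sketch the features pane would supply enabledTypes, and the zoom control would determine the window bounds from windowWidth and the base at the center of the user's focus.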
The colors of the different components in the application can also be customized. This is of particular interest when creating screenshots of the viewer. Different destinations of the screenshots may have certain color requirements, so being able to change the colors of the UJV to meet these requirements is necessary.
In order to make this software as available as possible and to further advance the field of bioinformatics, the UJV will be open-source and work cross-platform. This will allow contributors from all across the world on many platforms to use and improve the UJV. If this application gains a foothold in the industry, the cost of verifying sequenced genomes could decrease, making genome sequencing less expensive overall.
2.2 Major Components (Hardware/Software).
This product has a single component, which is a Java graphical user interface (GUI) application. No Internet connection or database is required for this software to function. An Internet connection is only required if the user wants to export a genome sequence directly to the GenBank. The application is packaged with all of its dependencies besides its requirement for the Java 8 Runtime Environment. The main Java library that is used to implement the GUI is JavaFX ("1 JavaFX Overview"). The GUI consists of a pane detailing the features currently visible, a central viewer for displaying the current region of the genome, and an inspection pane.
The major functional components are outlined in Figure 1.
Figure 1 - Major Functional Components Diagram
3 IDENTIFICATION OF CASE STUDY
The UJV was originally conceived as a way to aid Dr. Gauthier’s studies on the Mycobacterium species afflicting striped bass in the Chesapeake Bay. The current process of identifying key parts in the bacterial genomes is roughly outlined in the introduction of this paper and is detailed further in Figure 2.
Figure 2 - Current process for verifying joins in a genome
Currently available genome viewers, such as Geneious, do not have the capability to view join evidence ("Geneious"). If a join does not look correct, the biologist must break attention away from the genome viewer, manually track down the join outside of the viewer, and figure out how and why the join was created. To help reduce the time and resources required to track down join evidence, the solution shown in Figure 3 was devised.
Figure 3 - Improved process for verifying joins
With the new process, less time will be spent correcting joins that were made incorrectly. This process will be facilitated through interaction with a GUI, which is what the Unverified Join Viewer provides.
4 UNVERIFIED JOIN VIEWER PRODUCT PROTOTYPE DESCRIPTION
The prototype will have the same software architecture and all of the features of the real-world product except the join manipulation and exporting features.
4.1 Prototype Architecture (Hardware/Software)
The UJV prototype will also be a Java GUI implemented using JavaFX that will be used to inspect the genome and its features. Even if an Internet connection is available, it will not be used, since the prototype will not export GenBank files to the GenBank.
Figure 4 - Prototype Screenshot
4.2 Prototype Features and Capabilities
The only features not present in the prototype that will be in the real world product involve modifying the genome and exporting it. These differences are outlined in Table 1.
Table 1 - Differences between prototype and real world product
Features                              UJV Prototype    UJV Real World Product
Zoom                                  yes              yes
Color Options                         yes              yes
Load a GenBank File                   yes              yes
View Genome in a circle               yes              yes
View Features inside the circle       yes              yes
Select a Feature as a Join            yes              yes
View Joins Outside of circle          yes              yes
View Feature Data                     yes              yes
View Join Data                        yes              yes
Multiple Tabs for Different Files     yes              yes
High Resolution Screen Shots          yes              yes
Select a Join for Editing             no               yes
Edit Join Information                 no               yes
Save Edited File in the GBK format    no               yes
Upload the File to the GenBank        no               yes
4.3 Prototype Development Challenges
The prototype development team faces two challenges: limited time and working with new technology. Regarding time, the team must deliver a working prototype even though a considerable portion of its time goes to creating design documents and labs rather than writing code. Regarding the new technology, a new Java framework, the developers have already created small prototypes with it, which suggests they can use the framework effectively, so this challenge appears to be minimal. The main challenge is therefore the limited time available to get a JavaFX GUI to interact with genomes.
GLOSSARY
Join : A join is a place where two fragments are recombined, or joined again, after having been split for genome sequencing. Special software is used to determine joins based on the characteristics of the different splits generated before sequence analysis.
Join Evidence : The overlap that supports a join. A certain amount of the beginning and end of the join's sequence correctly overlaps the two ends of the known sequence parts.
Fragment : A fragment of a DNA strand. Current sequencers are unable to process DNA strands as a whole and must break the genome up into fragments, or pieces, that can be processed and then joined back together.
GenBank : An open-access, annotated sequence database of all publicly available nucleotide sequences and their protein translations. This database is produced and maintained by the National Center for Biotechnology Information (NCBI) as part of the International Nucleotide Sequence Database Collaboration (INSDC).
PCR : Stands for polymerase chain reaction. It is a process used to amplify a region of DNA for further study. It allows us to isolate DNA fragments from a genome for study and analysis.
PCR Primer : In the PCR process, a primer is needed to isolate a fragment of DNA. A new strand cannot be built along a length of DNA that has lost its partner unless there is a starting point to build from. In PCR, the strands are separated and a primer is used to start the rebuilding. Through the use of specific primers at specific points, we are able to select the parts of a DNA strand we wish to study.
REFERENCES
1 JavaFX Overview. (n.d.). Retrieved September 20, 2015, from http://docs.oracle.com/javase/8/javafx/get-started-tutorial/jfx-overview.htm#JFXST784
David Gauthier, Ph.D. (n.d.). Retrieved September 21, 2015, from http://sci.odu.edu/biology/directory/gauthier.shtml
Geneious. (n.d.). Retrieved September 20, 2015, from http://www.geneious.com/features
In Focus - Striped Bass Health. (n.d.). Retrieved September 20, 2015, from http://www.dnr.state.md.us/dnrnews/infocus/striped_bass_health.asp
Public nucleic acid sequence repository. (n.d.). Retrieved September 20, 2015, from http://www.ncbi.nlm.nih.gov/genbank/submit
What's a Genome? (n.d.). Retrieved September 20, 2015, from http://www.genomenewsnetwork.org/resources/whats_a_genome/Chp2_2.shtml