
Running head: LAB 1 – UNVERIFIED JOIN VIEWER

Lab 1 – Unverified Join Viewer

Joseph R. Cooper

CS 411W

Janet Brunelle

September 20, 2015

Version 1


Table of Contents

1 INTRODUCTION

2 UNVERIFIED JOIN VIEWER PRODUCT DESCRIPTION

2.1 Key Product Features and Capabilities.

2.2 Major Components (Hardware/Software).

3 IDENTIFICATION OF CASE STUDY

4 UNVERIFIED JOIN VIEWER PRODUCT PROTOTYPE DESCRIPTION

4.1 Prototype Architecture (Hardware/Software)

4.2 Prototype Features and Capabilities

4.3 Prototype Development Challenges

GLOSSARY

REFERENCES

List of Figures

Figure 1 - Major Functional Components Diagram

Figure 2 - Current process for verifying joins in a genome

Figure 3 - Improved process for verifying joins

Figure 4 - Prototype Screenshot

List of Tables

Table 1 - Differences between prototype and real world product

1 INTRODUCTION

In the Chesapeake Bay, large numbers of striped bass are being afflicted by mycobacteriosis ("David Gauthier, Ph.D."). Mycobacteriosis is an infectious disease caused by bacteria in the genus Mycobacterium ("In Focus - Striped Bass Health"). Mycobacteriosis causes inflammation, tissue destruction and formation of scar tissue in one or more organs ("In Focus - Striped Bass Health"). In order to further understand the characteristics, evolution and pathogenicity of the Mycobacterium species plaguing the striped bass in the Chesapeake Bay, the pervasive Mycobacterium genomes must be studied.

Unfortunately, the current process for studying the genomes of Mycobacterium species is very time-consuming and costly. This process involves splitting up a genome into many smaller pieces, determining the genetic makeup of those segments, and then joining all of the segments back together ("What's a Genome?"). The software that joins the segments back together is prone to errors.

Genome viewers do not currently have the ability to show join evidence. So, if a join appears faulty, a biologist must manually check the join in the sequence and then conduct lab work to find the correct join. If there were a way to keep track of and view why and how joins are created, less lab work, which is expensive in both materials and time, would have to be done.

The Unverified Join Viewer (UJV) is a proposed solution that will allow biologists to interact with a sequence using a graphical user interface (GUI) and see the various sequence features, in particular the different joins. By presenting all this information in a GUI, the biologists will be able to quickly navigate through vast amounts of genome information to determine if joins were made correctly or how to correctly create a join if it does appear to be improperly formed. The UJV saves the biologist the time of having to track down why and how bad joins were made by making all this information readily available.


2 UNVERIFIED JOIN VIEWER PRODUCT DESCRIPTION

The UJV is a self-contained GUI application. It reads in genome sequence information and then displays this information in such a way that it can be easily interpreted and navigated through by a human. Users are able to quickly navigate genome information by filtering specific genome features and zooming in and out of regions of the genome. Users can also interact with different genome features in the viewer in order to be presented with more specific information.

2.1 Key Product Features and Capabilities.

In order for the UJV to display a genome, it must be supplied a GenBank file that contains information that represents the genome to be analyzed by the user. The UJV will parse this file and create a circular representation of the genome that can be interacted with. The different features in the genome will be represented on the circle by labels. These labels will become more and more informative by means of zooming in on the circular representation of the genome or by clicking on them. The software also allows edits to the genome and will allow users to save their progress and export the new genome to a GenBank file. It also allows the user to upload versions of their modified genomes to the GenBank ("Public nucleic acid sequence repository"). High resolution screenshots of the viewer can be made in order to further facilitate collaboration on verifying a genome is correct.
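The circular representation described above can be illustrated with a small coordinate mapping. The following is a minimal sketch, not the UJV's actual implementation: the class name CircularLayout and its methods are hypothetical, positions are assumed to be 1-based as in GenBank feature locations, and features that wrap around the origin are not handled.

```java
/** Hypothetical helper (not part of the UJV source): maps base-pair positions onto a circle. */
final class CircularLayout {
    private final int genomeLength;   // total number of base pairs in the genome
    private final double centerX;     // circle center, screen coordinates
    private final double centerY;
    private final double radius;

    CircularLayout(int genomeLength, double centerX, double centerY, double radius) {
        if (genomeLength <= 0) {
            throw new IllegalArgumentException("genome length must be positive");
        }
        this.genomeLength = genomeLength;
        this.centerX = centerX;
        this.centerY = centerY;
        this.radius = radius;
    }

    /** Angle in degrees, measured clockwise from 12 o'clock, of a 1-based position. */
    double angleOf(int position) {
        return 360.0 * (position - 1) / genomeLength;
    }

    /** Arc extent in degrees covered by the inclusive range start..end (no origin wrap). */
    double extentOf(int start, int end) {
        return 360.0 * (end - start + 1) / genomeLength;
    }

    /** Screen coordinates (y grows downward) of a position on the circle. */
    double[] pointOf(int position) {
        double theta = Math.toRadians(angleOf(position));
        return new double[] {
            centerX + radius * Math.sin(theta),
            centerY - radius * Math.cos(theta)
        };
    }

    public static void main(String[] args) {
        // Assumed example: a 1,000 bp genome drawn on a circle of radius 100.
        CircularLayout layout = new CircularLayout(1000, 0.0, 0.0, 100.0);
        System.out.println("start angle: " + layout.angleOf(251));
        System.out.println("arc extent:  " + layout.extentOf(251, 500));
    }
}
```

A feature label could then be anchored at pointOf(start), and the feature itself drawn as an arc beginning at angleOf(start) and spanning extentOf(start, end) degrees.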

When a user zooms in on the genome, information in the user’s focus becomes more visible while information outside of the user’s focus is obscured. When zooming out, specific information in the user’s focus becomes more obscure, while information that was previously outside the field of view moves in. So, information becomes more detailed but less abundant when zooming in, and less detailed but more abundant when zooming out. The viewer can zoom all the way down to the nucleotides of the genome. If a user determines that too much information is cluttering a certain zoom level, they may filter out different features. A user may specify in a features pane which features will be visible on the viewer.
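The zooming and filtering behavior above can be sketched as two small checks: which features fall inside the current view and are enabled in the features pane, and whether the view is zoomed in far enough to draw individual nucleotides. This is only an illustration under stated assumptions: the class FeatureFilter, the Feature fields, and the pixels-per-base threshold are all hypothetical and are not taken from the UJV design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of feature filtering and zoom-dependent detail. */
final class FeatureFilter {

    /** Illustrative feature record; the field names are assumptions. */
    static final class Feature {
        final String type;   // e.g. "gene" or "CDS"
        final String label;
        final int start;     // 1-based, inclusive
        final int end;       // 1-based, inclusive

        Feature(String type, String label, int start, int end) {
            this.type = type;
            this.label = label;
            this.start = start;
            this.end = end;
        }
    }

    /** Returns the features whose type is enabled and that overlap the visible window. */
    static List<Feature> visible(List<Feature> all, Set<String> enabledTypes,
                                 int windowStart, int windowEnd) {
        List<Feature> result = new ArrayList<>();
        for (Feature f : all) {
            boolean typeEnabled = enabledTypes.contains(f.type);
            boolean overlaps = f.start <= windowEnd && f.end >= windowStart;
            if (typeEnabled && overlaps) {
                result.add(f);
            }
        }
        return result;
    }

    /** True when each base in view gets at least minPixelsPerBase pixels (assumed rule). */
    static boolean showNucleotides(int windowStart, int windowEnd,
                                   double viewportPixels, double minPixelsPerBase) {
        int basesInView = windowEnd - windowStart + 1;
        return viewportPixels / basesInView >= minPixelsPerBase;
    }

    public static void main(String[] args) {
        List<Feature> features = Arrays.asList(
                new Feature("gene", "geneA", 100, 900),
                new Feature("CDS", "cdsA", 150, 850),
                new Feature("gene", "geneB", 5000, 6200));
        Set<String> enabled = new HashSet<>(Arrays.asList("gene"));
        // Only enabled feature types inside the window 1..1000 are listed.
        for (Feature f : visible(features, enabled, 1, 1000)) {
            System.out.println(f.label);
        }
    }
}
```

Zooming in narrows the window, so fewer features pass the overlap test while more pixels become available per base; zooming out does the opposite, which matches the trade-off between detail and abundance described above.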

The colors of the different components in the application can also be customized. This is of particular interest when creating screenshots of the viewer. Different destinations of the screenshots may have certain color requirements, so being able to change the colors of the UJV to meet these requirements is necessary.

In order to make this software as available as possible to further advance the field of bioinformatics, the UJV will be open-source and work cross-platform. This will allow contributors from all across the world on many platforms to use and improve the UJV. Genome sequencing will become less expensive if this application gains a foothold in the industry.

2.2 Major Components (Hardware/Software).

This product has a single component, which is a Java graphical user interface (GUI) application. No Internet connection or database is required for this software to function. An Internet connection is only required if the user wants to export a genome sequence directly to the GenBank. The application is packaged with all of its dependencies besides its requirement for the Java 8 Runtime Environment. The main Java library that is used to implement the GUI is JavaFX ("1 JavaFX Overview"). The GUI consists of a pane detailing the features currently visible, a central viewer for displaying the current region of the genome, and an inspection pane.

The major functional components are outlined in Figure 1.


Figure 1 - Major Functional Components Diagram

3 IDENTIFICATION OF CASE STUDY

The UJV was originally conceived as a way to aid Dr. Gauthier’s studies of the Mycobacterium species afflicting striped bass in the Chesapeake Bay. The current process of identifying key parts in the bacterial genomes is roughly outlined in the introduction of this paper and is detailed further in Figure 2.

Figure 2 - Current process for verifying joins in a genome

Currently available genome viewers, such as Geneious, do not have the capability to view join evidence ("Geneious"). If a join does not look correct, the biologist must go through the tedious process of breaking attention away from the genome viewer, manually tracking down the join outside of the viewer, and figuring out how and why the join was created. To help reduce the time and resources required to track down join evidence, the solution shown in Figure 3 was devised.

Figure 3 - Improved process for verifying joins

With the new process, less time will be spent on correcting joins that were incorrectly made. This process will be facilitated through interaction with a GUI, which is what the Unverified Join Viewer provides.

4 UNVERIFIED JOIN VIEWER PRODUCT PROTOTYPE DESCRIPTION

The prototype developed will have the same software architecture and have all of the features of the real world product sans the join manipulation and exporting features.

4.1 Prototype Architecture (Hardware/Software)

The UJV prototype will also be a Java GUI implemented using JavaFX that will be used to inspect the genome and its features. Even if an Internet connection is available, it will not be used since there will be no exporting GenBank files to the GenBank.


Figure 4 - Prototype Screenshot

4.2 Prototype Features and Capabilities

The only features not present in the prototype that will be in the real world product involve modifying the genome and exporting it. These differences are outlined in Table 1.


Table 1 - Differences between prototype and real world product

Features                              UJV Prototype    UJV Real World Product
Zoom                                  yes              yes
Color Options                         yes              yes
Load a GenBank File                   yes              yes
View Genome in a circle               yes              yes
View Features inside the circle       yes              yes
Select a Feature as a Join            yes              yes
View Joins Outside of circle          yes              yes
View Feature Data                     yes              yes
View Join Data                        yes              yes
Multiple Tabs for Different Files     yes              yes
High Resolution Screen Shots          yes              yes
Select a Join for Editing             no               yes
Edit Join Information                 no               yes
Save Edited File in the GBK format    no               yes
Upload the File to the GenBank        no               yes

4.3 Prototype Development Challenges

The development challenges the prototype development team faces are time and working with new technology. As for the time hurdle, the team has to develop a working prototype even though a considerable portion of the team’s time is spent creating design documents and labs, which takes time away from writing code. As for the second challenge, working with a new Java framework, the developers have shown that they can use this framework effectively, as evidenced by the prototypes they have created, so this challenge appears to be minimal. So, the main challenge is the lack of time to get a JavaFX GUI to interact with genomes.

GLOSSARY

Join : A join is a place where two fragments are recombined, or joined again, after having been split for genome sequencing. Special software is used to determine joins based on the characteristics of the different splits generated before sequence analysis.

Join Evidence : Evidence that a join is correct: a certain amount of the beginning and end of the join's sequence correctly overlaps the two ends of the known sequence.

Fragment : A piece of a DNA strand. Current sequencers are unable to process DNA strands as a whole and must break the genome up into fragments that can be processed and then joined back together.

GenBank : An open-access, annotated sequence database that collects all publicly available nucleotide sequences and their protein translations. This database is produced and maintained by the National Center for Biotechnology Information (NCBI) as part of the International Nucleotide Sequence Database Collaboration (INSDC).

PCR : Polymerase chain reaction. A process used to amplify a region of DNA for further study. It allows DNA fragments from a genome to be isolated for study and analysis.

PCR Primer : In the PCR process, a primer is needed to isolate a fragment of DNA. DNA is not able to build another strand for a length of DNA that has lost its partner; it needs a starting point to build from. In PCR, the strands are isolated and a primer is used to start the rebuilding. Through the use of specific primers at specific points, we are able to select the parts of a DNA strand we wish to study.
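The Join Evidence entry above can be made concrete with a simple overlap check. The sketch below is one possible reading of that definition, not the method used by the UJV or by any sequencing software: the class JoinEvidence, its method names, and the minimum-overlap parameter are hypothetical, and only exact matches are considered, whereas real sequence data would need to tolerate mismatches.

```java
/** Hypothetical sketch: does a join sequence overlap both known flanking sequences? */
final class JoinEvidence {

    /** Length of the longest suffix of left that exactly equals a prefix of right. */
    static int overlapLength(String left, String right) {
        int max = Math.min(left.length(), right.length());
        for (int len = max; len > 0; len--) {
            if (left.regionMatches(left.length() - len, right, 0, len)) {
                return len;
            }
        }
        return 0;
    }

    /**
     * True when the beginning of the join sequence overlaps the end of the left
     * known sequence, and the end of the join sequence overlaps the beginning of
     * the right known sequence, each by at least minOverlap bases.
     */
    static boolean supportsJoin(String leftKnown, String joinSequence,
                                String rightKnown, int minOverlap) {
        return overlapLength(leftKnown, joinSequence) >= minOverlap
                && overlapLength(joinSequence, rightKnown) >= minOverlap;
    }

    public static void main(String[] args) {
        // Made-up toy sequences, for illustration only.
        String left = "AAACGT";
        String join = "CGTTTGGA";
        String right = "GGACCC";
        System.out.println(supportsJoin(left, join, right, 3));
    }
}
```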


REFERENCES

1 JavaFX Overview. (n.d.). Retrieved September 20, 2015, from http://docs.oracle.com/javase/8/javafx/get-started-tutorial/jfx-overview.htm#JFXST784

David Gauthier, Ph.D. (n.d.). Retrieved September 21, 2015, from http://sci.odu.edu/biology/directory/gauthier.shtml

Geneious. (n.d.). Retrieved September 20, 2015, from http://www.geneious.com/features

In Focus - Striped Bass Health. (n.d.). Retrieved September 20, 2015, from http://www.dnr.state.md.us/dnrnews/infocus/striped_bass_health.asp

Public nucleic acid sequence repository. (n.d.). Retrieved September 20, 2015, from http://www.ncbi.nlm.nih.gov/genbank/submit

What's a Genome? (n.d.). Retrieved September 20, 2015, from http://www.genomenewsnetwork.org/resources/whats_a_genome/Chp2_2.shtml
