Lab 1 – Unverified Join Viewer Product Description James Ord

Running Head: Lab 1 – Unverified Join Viewer Description
Lab 1 – Unverified Join Viewer Product Description
James Ord
Old Dominion University
CS 411W
Janet Brunelle
October 25, 2015
Version 2
Table of Contents
1 Introduction
2 Unverified Join Viewer Product Description
2.1 Key Product Features and Capabilities
2.2 Major Components (Hardware/Software)
3 Identification of Case Study
4 Unverified Join Viewer Prototype Description
4.1 Prototype Architecture (Hardware/Software)
4.2 Prototype Features and Capabilities
4.3 Prototype Development Challenges
5 Glossary
References
Figures and Tables
Figure 1. Product Major Functional Component Diagram
Figure 2. Prototype Major Functional Component Diagram
Table 1. Prototype vs. Real World Product Features
1 Introduction
The modern market for genetic sequence viewers is populated with software that is expensive,
difficult to use, or lacking in specialized development. Industry-standard software, such as Geneious,
cannot display repeated genome sequences; it leaves such strands out of the genome view and leaves
their placement for the user to interpret (Geneious, n.d.). Verifying where a repeat sequence belongs
involves expensive and time-consuming lab work, sometimes costing thousands of dollars for a single
genome.
Genome sequencing is a massively complicated process because of the amount of data being
processed. Mycobacterium shottsii, for example, is a simple bacterial organism, yet it has almost 1,500
base pairs in its genome (Global Catalogue of Microorganisms, n.d.). More complex organisms can
contain genomes that are millions of base pairs in length. The amount of data that must be processed by
a genome sequencer is too large to handle all at once, so the genome must be split into a number of
sequences for easier processing and later recombined. A common issue for gene sequencers is the
repeated sequence. Repeats are not rare, and a single genome may contain many different repeat
sequences. When a genome sequencer encounters a repeated sequence, it cannot tell exactly where
that portion of the genetic code belongs. Modern genome viewing software reflects this issue: viewers
have no way to display repeated sequences along with the rest of the genome. A solution to the first
problem exists in the form of the Corrective Algorithm for Repeat Placement (CARP) software developed
by Abishek Biswas. This gene sequencer attempts to place repeat sequences where they belong along the
genome and annotates each placement with the reasons for it. As helpful as CARP may be, its output is
simply a text file for each base pair. Data in this form is very hard to interpret without a visual
representation, hence the Unverified Join Viewer (UJV). Designed to be an open-source accompaniment to
CARP, UJV will display an entire genome, including the repeat sequences placed by CARP, along with
the justification
annotations for each join. This will make studying a genome with repeats faster, easier, and less
expensive by greatly reducing the amount of lab work required.
2 Unverified Join Viewer Product Description
UJV is designed to be a simple, easy-to-use viewer for the results of the gene sequencing
process. The user will be able to import a genome from a file formatted according to the
standards defined by the National Center for Biotechnology Information (NCBI) GenBank (GenBank,
2015). UJV will initially display the whole genome in the main viewing pane. The GenBank features
defined in the genome file will be available by clicking on an indicated position on the sequence.
The user will be able to zoom into any portion of the genome for more detailed inspection. UJV will
also allow the user to edit the genome within the viewer, by adding annotations or changing repeat
sequence placements, and to save those edits to the raw text file.
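The import step described above can be illustrated with a minimal Java sketch. It is not UJV's actual loader: the class and method names are hypothetical, the sample record in main is invented for illustration, and the sketch assumes the usual GenBank flat-file layout in which a feature key starts in column 6 and qualifier lines begin with a slash. Continuation lines for long locations or qualifier values are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of UJV: reads feature keys and locations from GenBank flat-file lines. */
final class GenBankFeatureSketch {

    /** One entry of the feature table, e.g. key "CDS" with location "190..255". */
    static final class Feature {
        final String key;
        final String location;
        final List<String> qualifiers = new ArrayList<>();

        Feature(String key, String location) {
            this.key = key;
            this.location = location;
        }
    }

    /** Collects features between the FEATURES header and the next top-level keyword. */
    static List<Feature> parseFeatures(List<String> lines) {
        List<Feature> features = new ArrayList<>();
        boolean inTable = false;
        Feature current = null;
        for (String line : lines) {
            if (line.startsWith("FEATURES")) {
                inTable = true;
                continue;
            }
            if (!inTable || line.trim().isEmpty()) {
                continue;
            }
            if (!Character.isWhitespace(line.charAt(0))) {
                break; // a new top-level keyword such as ORIGIN ends the table
            }
            String body = line.trim();
            // Assumption: feature keys start in column 6; qualifiers are indented further.
            boolean isKeyLine = line.length() > 5 && line.charAt(5) != ' ';
            if (isKeyLine) {
                String[] parts = body.split("\\s+", 2);
                current = new Feature(parts[0], parts.length > 1 ? parts[1] : "");
                features.add(current);
            } else if (current != null && body.startsWith("/")) {
                current.qualifiers.add(body);
            }
            // Continuation lines of long locations or qualifier values are skipped in this sketch.
        }
        return features;
    }

    public static void main(String[] args) {
        // Invented sample record, used only to exercise the parser.
        List<String> sample = Arrays.asList(
            "LOCUS       EXAMPLE                 300 bp    DNA     circular BCT 01-JAN-2015",
            "FEATURES             Location/Qualifiers",
            "     source          1..300",
            "                     /organism=\"Example organism\"",
            "     CDS             complement(190..255)",
            "                     /product=\"hypothetical protein\"",
            "ORIGIN"
        );
        for (Feature f : parseFeatures(sample)) {
            System.out.println(f.key + " " + f.location + " " + f.qualifiers);
        }
    }
}
```

A viewer built this way could attach each parsed feature to a clickable position on the displayed sequence; how CARP encodes join evidence within these features is not specified here and is left out of the sketch.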
2.1 Key Product Features and Capabilities
Unverified Join Viewer will be the first software that is able to view and edit ambiguous joins
within a genome. The zoom function of UJV will be important for the user to inspect any particular part
of the genome in detail. The final product will also be capable of verifying that edits being made to
repeat sequence placements adhere to the proper justifications outlined by CARP for a join's existence.
Being able to reorganize a genome on the computer without having to verify each change in a laboratory
saves both time and money, potentially thousands of dollars and weeks of work.
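The zoom function implies a decision about how much detail to draw at each zoom level. The following Java sketch shows one possible rule, assuming the viewer switches from a plain line to individual base letters once each base has enough horizontal room. The class name, the enum, and the 8-pixel threshold are all hypothetical and are not taken from the UJV design.

```java
/** Hypothetical sketch, not UJV code: picks how to draw a region at a given zoom level. */
final class ZoomScaleSketch {

    enum RenderMode { LINE, BASE_LETTERS }

    /** Assumed threshold: letters stay legible only with at least this many pixels per base. */
    static final double MIN_PIXELS_PER_BASE = 8.0;

    static RenderMode chooseMode(long visibleBases, double paneWidthPixels) {
        if (visibleBases <= 0 || paneWidthPixels <= 0) {
            throw new IllegalArgumentException("visibleBases and paneWidthPixels must be positive");
        }
        double pixelsPerBase = paneWidthPixels / visibleBases;
        return pixelsPerBase >= MIN_PIXELS_PER_BASE ? RenderMode.BASE_LETTERS : RenderMode.LINE;
    }

    public static void main(String[] args) {
        System.out.println(chooseMode(6_000_000L, 1200.0)); // prints LINE
        System.out.println(chooseMode(100L, 1200.0));       // prints BASE_LETTERS
    }
}
```

Expressing the rule in pixels per base rather than as a fixed zoom percentage keeps it independent of window size, which matters if the viewing pane can be resized.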
2.2 Major Components (Hardware/Software)
UJV is designed as a stand-alone viewer, meaning it will not depend on outside databases or
internet connectivity unless the user needs to download genome files (.gbk) from GenBank. If the user
already has genome files available on the system then no internet connection will be required. The
hardware requirements to run UJV are a minimum of 2 GB of memory and, at minimum, an ATI Radeon
HD 2400, GMA 4500, or GeForce 8 GPU. The software requirements are a Windows, Linux, or OS X
operating system and Java 8. UJV will also be designed as an open-source project, which will allow
community innovation, improvements, and customization according to individual requirements. Figure
1 below outlines the major components required for UJV.
Figure 1. Product Major Functional Component Diagram
3 Identification of Case Study
Dr. David Gauthier of Old Dominion University (ODU) is the primary biologist contact for the
UJV project. The purpose of the CARP and UJV systems is to make genome research
more feasible through convenient genome inspection and reduced laboratory costs. Dr. Gauthier's
research into bacteria that affect striped bass in the Chesapeake Bay involves close inspection of the
genomes of Mycobacterium spp. (Old Dominion University, 2011). His experience with the current
market of genome viewers provides invaluable insight into the design process of UJV, since it is a product
designed by a biologist for biologists. The open-source nature of the project also opens it to community
updates and scrutiny from software developers. A group that requires a comprehensive viewer for a
different feature set could potentially add their own features into the UJV framework, expanding its
value to the biological community even further.
4 Unverified Join Viewer Prototype Description
The prototype for UJV will be a fully functional application that is capable of using real-world
data for demonstration. All of the viewing and zoom features will be on full display and fully functional
within the prototype version. The only features that will be missing are editing and saving a genome.
The development team feels that the other features such as zoom, color customization, and viewing join
data will be enough of a challenge for the semester and are more important to the core functionality of
UJV. The prototype will require no simulation as all of the data required is readily available through
GenBank.
4.1 Prototype Architecture (Hardware/Software)
The structure of the prototype will follow the previously described structure of the final
product. The user will be able to load a genome file that is either created by a sequencer present on the
user’s computer or downloaded from GenBank. UJV will then translate and display the imported
genome for the user and allow for detailed inspection of join evidence and other feature annotations as
included in the particular genome file. Figure 2 below outlines the major components required for the
prototype version of UJV.
Figure 2. Prototype Major Functional Component Diagram
4.2 Prototype Features and Capabilities
The prototype will demonstrate the viewing and inspection features of UJV. These are the most
essential parts of being able to verify whether a genome was assembled correctly by CARP. As long as the
system shows the joins made by CARP and the justifications for each join, and the zoom function works as
designed, the user will have a far easier time analyzing a genome, which would indicate a successful prototype.
Table 1 below details the differences between the prototype and end product versions of UJV.
Features                              UJV Prototype    UJV Real World Product
Zoom                                  yes              yes
Color Options                         yes              yes
Load a GenBank File                   yes              yes
View Genome in a circle               yes              yes
View Features inside the circle       yes              yes
Select a Feature as a Join            yes              yes
View Joins Outside of circle          yes              yes
View Feature Data                     yes              yes
View Join Data                        yes              yes
Multiple Tabs for Different Files     yes              yes
Hi Resolution Screen Shot             yes              yes
Select a Join for Editing             no               yes
Edit Join Information                 no               yes
Save Edited File in the GBK format    no               yes
Upload the File to the GenBank        no               yes
Table 1. Prototype vs. Real World Product Features
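Several of the features listed in Table 1 depend on placing base positions around a circle. The Java sketch below shows one way such a mapping could work, assuming position 1 sits at the top of the circle, positions advance clockwise, and the y axis grows downward as it does in JavaFX. The class and method names are hypothetical and the sketch avoids any JavaFX dependency so that it runs on its own.

```java
/** Hypothetical sketch, not UJV code: maps a base position to a point on a circular genome view. */
final class CircularLayoutSketch {

    /** Angle in radians, measured clockwise from 12 o'clock, for a 1-based base position. */
    static double angleFor(long position, long genomeLength) {
        if (genomeLength <= 0 || position < 1 || position > genomeLength) {
            throw new IllegalArgumentException("position must be within 1..genomeLength");
        }
        return 2.0 * Math.PI * (position - 1) / genomeLength;
    }

    /** Screen point {x, y} on a circle of the given radius; y grows downward. */
    static double[] pointFor(long position, long genomeLength,
                             double centerX, double centerY, double radius) {
        double angle = angleFor(position, genomeLength);
        double x = centerX + radius * Math.sin(angle);
        double y = centerY - radius * Math.cos(angle);
        return new double[] { x, y };
    }

    public static void main(String[] args) {
        // Illustrative values only: a 1,000-base genome drawn on a circle of radius 100.
        double[] start = pointFor(1, 1000, 200.0, 200.0, 100.0);
        double[] quarter = pointFor(251, 1000, 200.0, 200.0, 100.0);
        System.out.printf("start:   (%.1f, %.1f)%n", start[0], start[1]);
        System.out.printf("quarter: (%.1f, %.1f)%n", quarter[0], quarter[1]);
    }
}
```

Under this scheme, features drawn inside the circle and joins drawn outside it could share the same angle and differ only in the radius passed to pointFor.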
4.3 Prototype Development Challenges
As with any new project, there will be a number of challenges to overcome during the
development phase. First, the platforms that the project is being developed on, Java 8 and JavaFX, are
new, so the team is unfamiliar with the features these development tools introduce.
This is partially mitigated by the fact that two team members are Oracle-certified Java programmers
with extensive knowledge. This challenge is further mitigated by the wealth of documentation and
tutorials available to familiarize the development team with these new tools. Another challenge will be
limited domain experience within the team. As a whole, the team lacks experience in the
biological field, increasing the risk of a disconnect between what the system actually needs
and what the team thinks the system needs. This will be mitigated by having Dr. Gauthier directly
involved in each phase of development to make sure the system is designed in a way that is not only
appealing to biologists but useful as well.
5 Glossary
Bioinformatics: Study of biological data
Color Scheme: Unique identifying colors that denote a Fragment, Join, and Individual ACTG in a
sequence
DNA: Deoxyribonucleic acid. The molecule that carries most of the genetic instructions used in the
development, functioning, and reproduction of all organisms
Eco-Epidemiology: Study of ecologic influences on human health
Ecology: Study of interactions among organisms and their environment
Etiology: Study of origination
Feature: Specific information about genomic data
Fragment: A DNA strand fragment. Current sequencers are unable to process DNA strands as a whole
and must break the genome up into fragments or pieces that can be processed and then formed
back together.
GenBank: An open-access, annotated collection of all publicly available nucleotide sequences and
their protein translations. This database is produced and maintained by the National Center for
Biotechnology Information
Geneious: A powerful and comprehensive suite of molecular biology tools
Genome: The genetic material of an organism
Join: A place where two fragments are recombined, or joined again
Join Evidence: The means of determining the position where two fragments are joined
M. marinum: A free-living bacterium that causes opportunistic infections in humans
M. pseudoshottsii, M. shottsii: Mycobacterium species isolated from Chesapeake Bay striped bass
Mycobacteriosis: Diseases caused by a group of bacteria
Scale: The measure of when to display the sequence or join as a line or its full sequence pattern
PCR: Polymerase chain reaction. A process used to amplify a region of DNA for further study. Allows us to
isolate DNA fragments from a genome for study and analysis.
PCR Primer: In the PCR process, a primer is needed to isolate a fragment of DNA. DNA cannot build
another strand for a length of DNA that has lost its partner; it needs a starting point to build from. In
PCR, the strands are isolated and a primer is used to start the rebuilding. Through the use of specific
primers at specific points, we are able to select the parts of a DNA strand we wish to study.
References
Geneious. (n.d.). Geneious. Retrieved September 21, 2015, from
http://www.geneious.com
Global Catalogue of Microorganisms. (n.d.). Mycobacterium shottsii. Retrieved September 21, 2015,
from http://gcm.wfcc.info/speciesPage.jsp?strain_name=Mycobacterium shottsii
GenBank. (2015). GenBank. Retrieved October 25, 2015, from
http://www.ncbi.nlm.nih.gov/genbank/
Old Dominion University. (2011). David Gauthier, Ph.D. Retrieved September 21, 2015, from
http://sci.odu.edu/biology/directory/gauthier.shtml