Running Head: Lab 2 – Unverified Join Viewer Product Specification

Lab 2 – Unverified Join Viewer Product Specification
Team Red
James Ord
CS411W
Professor Janet Brunelle and Dr. Abishek Biswas
November 22, 2015
Version 1.1
Table of Contents
1. Introduction
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms, and Abbreviations
1.4 References
1.5 Overview
2. General Description
2.1 Prototype Architecture Description
2.2 Prototype Functional Description
List of Tables
Table 1. Prototype vs. Product Feature Comparison
List of Figures
Figure 1. Prototype Major Functional Component Diagram
1. Introduction
Striped bass, which are of major commercial and recreational importance in the
Chesapeake Bay, are afflicted with mycobacteriosis caused by Mycobacterium spp., namely M.
pseudoshottsii and M. shottsii (Old Dominion University, 2011). Combating this sickness
requires extensive research into the very genetic sequences of the various species of bacteria
present in the bay. The process of sequencing a genome is massively complicated and cannot be
done all at once. M. pseudoshottsii, for example, is a relatively primitive bacterial organism, yet
it has almost 1,500 base pairs in its genome (Global Catalogue of Microorganisms, n.d.). More
complex organisms can have genomes that are millions of base pairs in length. The amount of
data that must be processed by a genetic sequencer is too much to do all at once, so the genome
must be split before sequencing. After the sequencing process the genome is recombined. This
creates another problem for gene sequencers in the form of repeated sequences. Not every
sequence along a genome is completely unique, meaning that a genetic sequencer could have
difficulty placing the sequence in the correct order. A solution to the problem of repeat
sequences comes in the form of the Corrective Algorithm for Repeat Placement software
(CARP) developed by Abishek Biswas. This gene sequencer attempts to correctly place a join in
order along the genome upon sequencing, while providing the reasons for a join with
annotations. In order to fully study a genome, however, a researcher must be able to view the
genome. The current market of genome viewers consists of software that is expensive, difficult
to use, or lacking in specialized capabilities. The current industry standard, Geneious, is not
capable of displaying joins and instead leaves repeat sequences out of the displayed genome
(Geneious, n.d.). The Unverified Join Viewer (UJV), designed as an open source GUI
accompaniment to CARP, is a system capable of displaying a complete bacterial genome, repeat
sequences, and the evidence used to create a join all in one viewing panel.
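The repeat-placement problem described above can be illustrated with a minimal Java sketch. It is not how CARP works; it only shows, under the simplifying assumption that a genome and a fragment are plain strings, why a fragment that matches more than one position cannot be placed by its content alone. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a repeated sequence matches several positions in a genome. */
final class RepeatPlacementDemo {

    /** Returns every 0-based index at which fragment occurs in genome (overlaps allowed). */
    static List<Integer> occurrences(String genome, String fragment) {
        if (fragment.isEmpty()) {
            throw new IllegalArgumentException("fragment must not be empty");
        }
        List<Integer> hits = new ArrayList<>();
        int from = genome.indexOf(fragment);
        while (from >= 0) {
            hits.add(from);
            from = genome.indexOf(fragment, from + 1);
        }
        return hits;
    }

    /** A fragment is ambiguous when it matches more than one position. */
    static boolean isAmbiguous(String genome, String fragment) {
        return occurrences(genome, fragment).size() > 1;
    }
}
```

In the toy genome ACGTTTGACGTA, the fragment ACGT matches at two positions, so additional evidence is needed to decide where it belongs.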
1.1 Purpose
UJV is a standalone graphical user interface that displays the results of the genome
sequencing process carried out by CARP. Developed with close input from Dr. Gauthier, UJV is
a system designed for biologists by a biologist, with the intent of creating a free and easy-to-use
genome viewer that includes an important feature not found on the current market. In addition to
genome features commonly found in other viewers, UJV is capable of displaying the
location and relevant evidence for joins produced by CARP. Showing the join evidence allows
the user to view a join and analyze the justification used to make it without resorting to
time-consuming and expensive lab processes. UJV is also an open-source project that welcomes input
from the development community. The project is open to community scrutiny and improvement,
as well as to specializations that depend on a developer’s specific requirements.
1.2 Scope
The product version of UJV will be a completely independent system capable of running
on Windows, Mac OS, and Linux machines. While the system will not be inherently dependent
on internet connectivity, the availability of valid files to use with UJV may require an internet
connection if the user has no such files available. UJV will accept
genome files that are properly formatted according to the standards defined by the National
Center for Biotechnology Information (NCBI) GenBank (GenBank, 2015). Upon opening a file,
the user will be able to view the genome within the main viewing pane of the system. The system
will be capable of zooming in on a particular portion of the genome to allow for detailed
inspection of a specific area. Additionally, the features defined within the imported genome file
will be represented as dots in their appropriate locations along the genome. The user will be able
to click on these features to display relevant information defined within the genome file about
that particular feature. Finally, UJV will allow the user to make edits to a genome by creating
additional annotations to a feature, redefining the location of a feature, or changing the
placement of a repeat sequence. Any changes made to a genome within UJV will be saved to the
originally imported genome file.
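To place a feature along the genome, a viewer needs the feature's location from the GenBank file. The sketch below is a minimal, hypothetical helper and not UJV's actual parser. It assumes only the two simplest GenBank location forms, a plain range such as 190..255 and the same range wrapped in complement(...), and rejects everything else, including join(...) locations.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses "190..255" or "complement(190..255)" only. */
final class SimpleLocation {
    private static final Pattern RANGE = Pattern.compile("^(\\d+)\\.\\.(\\d+)$");
    private static final String COMPLEMENT_PREFIX = "complement(";

    final int start;           // 1-based, inclusive, as written in the file
    final int end;             // 1-based, inclusive
    final boolean complement;  // true when the feature is on the reverse strand

    private SimpleLocation(int start, int end, boolean complement) {
        this.start = start;
        this.end = end;
        this.complement = complement;
    }

    /** Number of bases covered by the feature. */
    int length() {
        return end - start + 1;
    }

    static SimpleLocation parse(String text) {
        String s = text.trim();
        boolean complement = s.startsWith(COMPLEMENT_PREFIX) && s.endsWith(")");
        if (complement) {
            // Strip the wrapper so only "start..end" remains.
            s = s.substring(COMPLEMENT_PREFIX.length(), s.length() - 1);
        }
        Matcher m = RANGE.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported location: " + text);
        }
        return new SimpleLocation(
            Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), complement);
    }
}
```

A full implementation would also need to handle compound and partial locations, which this sketch deliberately leaves out.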
The prototype for UJV will be a fully functional application that is capable of using
real-world data for demonstration. All of the viewing and zoom features will be on full display and
fully functional within the prototype version. The only features that will be missing are editing
and saving a genome. The development team feels that the other features such as zoom, color
customization, and viewing join data will be enough of a challenge for the semester and are more
important to the core functionality of UJV. The prototype will require no simulation as all of the
data required is readily available through GenBank.
1.3 Definitions, Acronyms, and Abbreviations
Bioinformatics: Study of biological data
Color Scheme: Unique identifying colors that denote a Fragment, Join, and Individual ACTG in
a sequence
DNA: Deoxyribonucleic acid. A molecule that carries most of the genetic instructions used in the
development, functioning and reproduction of all organisms
Eco-Epidemiology: Study of ecologic influences on human health
Ecology: Study of interactions among organisms and their environment
Etiology: Study of origination
Feature: Specific information about genomic data
Fragment: A DNA strand fragment. Current sequencers are unable to process DNA strands as a
whole and must break the genome up into fragments or pieces that can be processed and then
formed back together.
GenBank: An open-access sequence database containing an annotated collection of all publicly available
nucleotide sequences and their protein translations. This database is produced and maintained by
the National Center for Biotechnology Information
Geneious: A powerful and comprehensive suite of molecular biology tools
Genome: The genetic material of an organism
Join: A place where two fragments are recombined, or joined again
Join Evidence: The means of determining the position where two fragments are joined
M. marinum: A free living bacterium, which causes opportunistic infections in humans
M. pseudoshottsii, M. shottsii: Mycobacterium species isolated from Chesapeake Bay striped bass
Mycobacteriosis: Diseases caused by a group of bacteria
Scale: The measure of when to display the sequence or join as a line or its full sequence pattern
PCR: Polymerase chain reaction. A process used to amplify a region of DNA for further study.
Allows for the isolation of DNA fragments from a genome for study and analysis.
PCR Primer: The starting point for the PCR process along a strand of DNA. A primer is used to
mark the specific locations on a DNA strand for the PCR process to amplify.
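The Scale definition above amounts to a display threshold. A minimal Java sketch of one possible rule follows. The class name, the method name, and the 8-pixel threshold are assumptions made for illustration and are not taken from the UJV design.

```java
/** Hypothetical rule for switching between a line and the full sequence pattern. */
final class ScaleRule {
    /** Assumed minimum on-screen width, in pixels, for one base letter to be legible. */
    static final double MIN_PIXELS_PER_BASE = 8.0;

    /**
     * Returns true when each visible base gets enough pixels to be drawn as a
     * letter (A, C, T, or G); otherwise the region would be drawn as a line.
     */
    static boolean showLetters(double viewWidthPixels, long visibleBases) {
        if (visibleBases <= 0) {
            throw new IllegalArgumentException("visibleBases must be positive");
        }
        return viewWidthPixels / visibleBases >= MIN_PIXELS_PER_BASE;
    }
}
```

With these assumed values, an 800-pixel view showing 100 bases would draw letters, while the same view showing 3,000 bases would fall back to a line.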
1.4 References
GenBank. (2015). GenBank. Retrieved October 25, 2015 from
http://www.ncbi.nlm.nih.gov/genbank/
Geneious. (n.d.). Geneious. Retrieved September 21, 2015, from
http://www.geneious.com
Global Catalogue of Microorganisms. (n.d.). Mycobacterium shottsii. Retrieved September 21,
2015, from http://gcm.wfcc.info/speciesPage.jsp?strain_name=Mycobacterium shottsii
Ord, J. (2015, October). Lab 1 – Unverified Join Viewer Product Description (Version 2).
CS411W, Red Team.
Old Dominion University. (2011). David Gauthier, Ph.D. Retrieved September 21, 2015, from
http://sci.odu.edu/biology/directory/gauthier.shtml
1.5 Overview
This product specification provides the hardware and software configuration, capabilities
and features of the Unverified Join Viewer prototype. The information provided in the remaining
sections of this document includes a detailed description of the hardware and software
architecture, capabilities, and key features. The specific requirements for the product can be
found separately in Lab 2 Section 3.1.
2. General Description
Unverified Join Viewer is an operating-system-agnostic program that depends only on
version 8 of Oracle’s Java being installed on the system. The system consists of three
primary components: the viewing panel, the side panel, and the top panel. The viewing panel is
the user’s main working space. It contains the genome drawing as well as the necessary features
as described in the imported genome file. The viewing panel is also where the user can make any
edits that may be required to the genome. The side panel is the primary means of organizing the
information presented to the user. Through this panel, the user can choose whether a
particular feature or group of features is displayed on the viewing panel. The side panel also
contains a complete list of features for the user to browse through. The top panel is used to give
the user a sense of navigation while viewing the genome. While the viewing panel draws the
genome in a circular form, the top panel displays a straight line with markers along the line to
denote particular sections of the genome. When the user zooms in on a section of the genome,
the top panel displays where the user is viewing in relation to the genome at large.
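The relationship between the circular drawing in the viewing panel and the straight line in the top panel can be sketched as two mappings from the same genome position. The Java below is a minimal illustration and not UJV's actual drawing code. It assumes 1-based positions, places position 1 at the top of the circle with positions increasing clockwise, and uses screen coordinates in which y grows downward. All names are hypothetical.

```java
/** Hypothetical layout helper for a circular view and a linear overview. */
final class GenomeLayout {

    /** Angle in radians, measured clockwise from the top of the circle. */
    static double angleFor(long position, long genomeLength) {
        if (genomeLength <= 0 || position < 1 || position > genomeLength) {
            throw new IllegalArgumentException("position out of range");
        }
        double fraction = (position - 1) / (double) genomeLength;
        return 2.0 * Math.PI * fraction;
    }

    /** Screen point {x, y} on a circle centered at (cx, cy). */
    static double[] pointOnCircle(long position, long genomeLength,
                                  double cx, double cy, double radius) {
        double a = angleFor(position, genomeLength);
        // sin moves right, cos moves up; y is subtracted because screen y grows downward.
        return new double[] { cx + radius * Math.sin(a), cy - radius * Math.cos(a) };
    }

    /** X offset of the same position on a straight line of the given pixel width. */
    static double linearX(long position, long genomeLength, double lineWidth) {
        return angleFor(position, genomeLength) / (2.0 * Math.PI) * lineWidth;
    }
}
```

Because both mappings start from the same fraction of the genome, a zoomed region on the circle can be marked on the top panel by calling linearX for the first and last visible positions.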
2.1 Prototype Architecture Description
Unverified Join Viewer is a self-contained GUI application that requires few major
components in order to work. The first component is the biologist who wants to study a genome.
The user will control all aspects of the system including importing a genome, zooming, choosing
which features to display, and verifying the validity of the joins that are presented. The next
required component is the output from a gene sequencer. Ideally this output should come from
CARP so that join evidence can be included in the displayed genome, but any genome from
GenBank is valid. The last major component is UJV itself. UJV is where the information from
the imported genome is extracted and displayed for the user to inspect.
Figure 1. Prototype Major Functional Component Diagram
It should be noted that while the prototype for UJV has no file output, the product version of
UJV will include file output as an additional major component after the user makes edits to a
genome. Table 1 below describes the key differences between the product and prototype versions
of UJV.
Features                              UJV Prototype    UJV Real World Product
Zoom                                  yes              yes
Color Options                         yes              yes
Load a GenBank File                   yes              yes
View Genome in a circle               yes              yes
View Features inside the circle       yes              yes
Select a Feature as a Join            yes              yes
View Joins Outside of circle          yes              yes
View Feature Data                     yes              yes
View Join Data                        yes              yes
Multiple Tabs for Different Files     yes              yes
Hi Resolution Screen Shot             yes              yes
Select a Join for Editing             no               yes
Edit Join Information                 no               yes
Save Edited File in the GBK format    no               yes
Upload the File to the GenBank        no               yes
Table 1. Prototype vs. Product Feature Comparison
2.2 Prototype Functional Description
The prototype version of UJV is able to import a genome file selected from the user’s
local system for display in the main viewing panel. UJV is currently focused on the display of
bacterial genomes, which are circular in nature. When a genome file is imported, UJV will
display the genome in the viewing panel as a circle. Once the genome is displayed, the user is
able to zoom in and out of a particular portion of the genome, allowing for detailed inspection of
specific features. The viewing panel also displays GenBank features included in the imported file
in the form of dots next to a part of the genome. The user is able to click on these dots to display
additional information about each feature such as the location, feature type, and the specific
genetic sequence of that feature. From the side panel, the user is able to select either a specific
feature from a comprehensive list of features or a set of features. The side panel controls which
features are displayed on the viewing panel. Finally, UJV is capable of producing high-resolution
screen captures of the genome currently being viewed.
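The click-to-inspect behavior described above, combined with the panel's control over which features are shown, can be sketched as a simple hit test. The Java below is a hypothetical illustration and not UJV's actual code. It assumes each feature has already been drawn as a dot at a known screen coordinate and that the set of enabled feature types comes from the panel. It uses only Java 8 library calls.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical hit test: finds the visible feature dot nearest to a mouse click. */
final class FeaturePicker {

    /** Assumed on-screen record for one feature dot. */
    static final class Dot {
        final String type;   // GenBank feature key, for example "gene" or "CDS"
        final String label;  // text shown when the dot is selected
        final double x;
        final double y;

        Dot(String type, String label, double x, double y) {
            this.type = type;
            this.label = label;
            this.x = x;
            this.y = y;
        }
    }

    /** Returns the nearest enabled dot within maxDistance pixels of the click, if any. */
    static Optional<Dot> pick(List<Dot> dots, Set<String> enabledTypes,
                              double clickX, double clickY, double maxDistance) {
        Dot best = null;
        double bestSquared = maxDistance * maxDistance;
        for (Dot d : dots) {
            if (!enabledTypes.contains(d.type)) {
                continue; // hidden by the user, so it cannot be clicked
            }
            double dx = d.x - clickX;
            double dy = d.y - clickY;
            double squared = dx * dx + dy * dy;
            if (squared <= bestSquared) {
                best = d;
                bestSquared = squared;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Comparing squared distances avoids a square root per dot. A linear scan is assumed to be adequate for illustration; a genome with many thousands of features might call for a spatial index instead.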