Running Head: Lab 1 – SCORPION Product Description
SCORPION Product Description
SCORPION – Blue Team
Old Dominion University
CS411W – Janet Brunelle
Author: David Eason
Last Modified: October 7, 2014
Version: 2.0
TABLE OF CONTENTS
1 INTRODUCTION
2 SCORPION PRODUCT DESCRIPTION
2.1 Key Product Features and Capabilities
2.2 Major Components (Hardware/Software)
3 IDENTIFICATION OF CASE STUDY
4 SCORPION PRODUCT PROTOTYPE DESCRIPTION
4.1 Prototype Architecture (Hardware/Software)
4.2 Prototype Features and Capabilities
4.3 Prototype Development Challenges
Glossary
References
LIST OF FIGURES
Figure 1 - STING hardware
Figure 2 - STING API
Figure 3 - Real World vs Prototype
1 INTRODUCTION
In the United States, many sicknesses and diseases afflict the population. In 2010, statistics reported by the United States Centers for Disease Control (USCDC) showed that cancer was the second leading cause of death in the United States, and the USCDC projected that deaths caused by cancer would soon exceed those caused by heart disease (Murphy, 2013). With the death rate from cancer increasing, research institutes have been formed to find treatments and cures. Institutes such as the National Cancer Institute spend $4.9 billion a year funding and supporting cancer research ("Cancer Research Funding", 2013). Because the workings of the human body, from organ systems down to cellular interdependence, are complex and incompletely understood, it is important that researchers and medical doctors receive all available resources in the search for a cure. Over the last decade, advances in computing and other technologies have played a role in expanding medical knowledge.
The human body relies on proteins: chains of amino acids, usually written as strings of one-letter codes, that can be upwards of one thousand characters long. A protein can play different roles in the body depending on the ordering of the amino acids within its chain (Yaseen, 2012). The order in which the amino acids are sequenced causes the protein chain to fold in a specific manner, and that folding pattern determines the job the protein performs in the human body.
Typical methods of cancer treatment require knowledge of the precise locations where afflicted cells are forming in the human body. After this information is gained, the cells are manually studied one at a time to determine the sequences of amino acids that make up the protein folds. With this knowledge, medical treatments can be tailored to specific individuals, so being able to map protein sequences quickly would speed up the process of treating patients. Unfortunately, finding the specific protein sequences for a patient is very difficult and can take many years. Terminal cancer patients may have only a short time left and cannot afford to spend it waiting years for treatment.
Bioinformaticians, researchers who specialize in applying scientific computing to biology, have developed a method of statistically predicting protein folds. This method is very accurate and can reduce the time needed to map protein folds. It uses a neural network, a model loosely inspired by the human brain. Just as the human brain can recognize an object from only a partial view (a toothbrush head from a single bristle, or a soda can from its pop top), a neural network can be trained to recognize an entire protein fold from a small amount of information. The local folding pattern predicted this way is known as the protein's secondary structure.
At Old Dominion University (ODU), advances have been made in using neural networks to predict secondary structures. ODU's neural network, SCORPION (SeCOndaRy structure PredictION), currently has the highest accuracy in the world among neural networks that use secondary structure prediction algorithms. As of September 2014, SCORPION is publicly available online for anyone to use. SCORPION is associated with ODU and partly funded by the United States Federal Government through grants; therefore, government requirements must be met. As SCORPION becomes more popular, having only a single method of access limits its use.
The product, Sting, will be a two-part solution. The first part will be a complete redesign of the SCORPION website. Because the project is partly funded by Federal grants, the website must conform to specific guidelines involving format and accessibility (“Section 508”, 2014). To meet them, the entire website will be redesigned to be more user-friendly as well as Section 508 compliant. The website will also have a newer, more professional look that will help support the ODU image. In addition, new client-side code will move a portion of the input processing onto the user’s computer. The second part of the solution will be an addition to the SCORPION prediction system that allows new methods of access: an Application Programming Interface (API). The API will allow access to SCORPION over Hypertext Transfer Protocol (HTTP), which will enable researchers to use the neural network as if it were their own. With this software package added to the SCORPION system, researchers and medical professionals will be able to compute much-needed information faster, increasing the popularity of SCORPION and improving the image of ODU.
2 SCORPION PRODUCT DESCRIPTION
The primary goal of a system designed to aid cancer research is the ability to identify protein sequences. SCORPION does this by receiving a known sequence of amino acids from the user; with this short sequence, SCORPION can then predict the remaining parts. Before SCORPION can use the submitted sequence, it is processed by a tool widely used by computational biologists, the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST). PSI-BLAST prepares the submission for SCORPION by converting the sequence into a Position-Specific Scoring Matrix (PSSM), a data format the neural network can process. Because maintaining SCORPION requires specialized knowledge, it should be used as efficiently as possible.
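To make the PSSM concrete: it has one row per residue position and one score for each of the 20 standard amino acids at that position. The Python sketch below builds a toy log-odds PSSM from a small, hand-supplied alignment, assuming a uniform background frequency of 1/20 and a simple additive pseudocount. PSI-BLAST itself derives its matrix by iteratively searching a sequence database, with sequence weighting and more elaborate pseudocounts, so this is not how PSI-BLAST or SCORPION computes the input; it only shows the shape of the data the neural network receives.

```python
import math
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_pssm(alignment, pseudocount=1.0):
    """Build a toy log-odds PSSM from equal-length aligned sequences.

    Returns one dict per alignment column mapping each amino acid to
    log2(observed frequency / background frequency). Assumptions for
    illustration only: a uniform background of 1/20, an additive
    pseudocount so unseen residues never give log(0), and characters
    outside the 20 standard codes (e.g. gaps) are ignored.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        matrix.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix

if __name__ == "__main__":
    pssm = toy_pssm(["ACD", "ACE", "GCD"])
    # Column 1 is always C, so C scores high there and unseen residues score low.
    print(round(pssm[1]["C"], 2), round(pssm[1]["W"], 2))
```

Conserved positions produce large positive scores for the conserved residue and small negative scores elsewhere, which is the kind of evolutionary signal a secondary structure predictor can learn from.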
An API will give anyone the ability to access SCORPION programmatically, allowing developers to streamline the sequence submission process. The API will also be language agnostic, meaning that it can be used from any programming language capable of making HTTP requests. To mitigate the chance of overwhelming the system, each submission will be queued for processing.
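The queuing idea can be illustrated with a first-in, first-out queue that caps how many jobs run at once. This is a minimal Python sketch under stated assumptions (the product itself is planned in PHP): the class name and the `max_running` limit are hypothetical, and a real deployment would also need persistence and actual job execution.

```python
from collections import deque

class JobQueue:
    """First-in, first-out job queue that limits how many jobs run at once.

    Hypothetical sketch: job IDs are plain strings and "running" only
    means "handed to a worker"; no prediction is actually performed.
    """

    def __init__(self, max_running=2):
        self.max_running = max_running   # assumed cap on simultaneous jobs
        self.waiting = deque()           # jobs not yet started, oldest first
        self.running = set()             # jobs currently being processed

    def submit(self, job_id):
        """Add a job to the back of the line, starting it if a slot is free."""
        self.waiting.append(job_id)
        self._fill_slots()

    def finish(self, job_id):
        """Record that a running job completed and promote the next waiting job."""
        self.running.discard(job_id)
        self._fill_slots()

    def position(self, job_id):
        """Return 0 if the job is running, its 1-based place in line, or None if unknown."""
        if job_id in self.running:
            return 0
        if job_id in self.waiting:
            return list(self.waiting).index(job_id) + 1
        return None

    def _fill_slots(self):
        # Start waiting jobs, oldest first, until every slot is busy.
        while self.waiting and len(self.running) < self.max_running:
            self.running.add(self.waiting.popleft())

if __name__ == "__main__":
    q = JobQueue(max_running=2)
    for job in ["job-1", "job-2", "job-3"]:
        q.submit(job)
    print(sorted(q.running), q.position("job-3"))  # ['job-1', 'job-2'] 1
    q.finish("job-1")
    print(sorted(q.running), q.position("job-3"))  # ['job-2', 'job-3'] 0
```

No matter how many jobs arrive, at most `max_running` are handed to the neural network at once; the rest simply wait their turn.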
The other facet of the product is a completely redesigned website with a user-friendly interface. Because Federal funding is a primary source of money for SCORPION's research and maintenance, it is very important that the website follow all laws that apply to sites supported by such grants. In particular, the website must be accessible to vision-impaired users by supporting screen readers.
To be Section 508 compliant, the site must always provide alternative text for images and media, as well as help for using the web forms. A separate feature to be added to the website is an in-place amino acid validation algorithm, with which the website will perform light computations on the sequences that are part of a query to SCORPION. The user's computer (the client) will spend a small amount of computational resources so that the hardware and software that make up SCORPION can focus solely on secondary protein structure prediction.
Changes to the website will also include administrative resources for the customer. The website will come with tracking tools that log user locations and any information users are willing to provide. Logged-in administrators will have access to a page that compiles user statistics, including page visits, user information, and geographical IP locations. Anonymous users can still use the website to submit queries, but general users will also be able to log in; a logged-in user can view previous submissions and results history. Both general users and administrators will log in with access credentials from third-party services operated by companies such as Google and Facebook. The customer will not be responsible for user information, because no information such as names or credentials will be stored in a database within the customer’s domain.
2.1 KEY PRODUCT FEATURES AND CAPABILITIES
The purpose of Sting is to bring more visibility to ODU’s professional image and highlight the contributions that have been made in cancer research. A very good way for institutions to gain popularity is to offer advanced services to the public, and companies that offer APIs tend to be large and known for cutting-edge software. For example, Amazon uses an API to manage shopping carts, and Twitter offers an API for data mining and account manipulation. With such tools available to the public, more people are likely to use or connect to SCORPION, increasing its popularity. A SCORPION API will therefore connect researchers with computational capabilities not offered elsewhere. APIs enable collaborative programming and software integration and have become very popular in industry. Because the API communicates over standard HTTP, any programmer with access to the Internet will be able to harness the predictive accuracy that SCORPION provides.
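To illustrate what programmatic access over HTTP could look like from a researcher's side, the Python sketch below builds (without sending) a job-submission request and a status query using only the standard library. The base URL, the `/jobs` paths, and the `sequence` and `email` form fields are hypothetical placeholders, since the final API layout is not specified here.

```python
import urllib.parse
import urllib.request

# Hypothetical base URL and paths; the real SCORPION API layout may differ.
API_BASE = "https://scorpion.example.org/api"

def build_submission(sequence, email):
    """Build (but do not send) a POST request that submits a sequence as a new job."""
    body = urllib.parse.urlencode({"sequence": sequence, "email": email}).encode("ascii")
    return urllib.request.Request(
        API_BASE + "/jobs",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

def build_status_query(job_id):
    """Build a GET request for one job's status or results."""
    return urllib.request.Request(
        API_BASE + "/jobs/" + urllib.parse.quote(str(job_id)),
        method="GET",
    )

if __name__ == "__main__":
    req = build_submission("MKTAYIAKQR", "researcher@example.org")
    print(req.get_method(), req.full_url)  # POST https://scorpion.example.org/api/jobs
    # Sending it would be: urllib.request.urlopen(req)
```

Any language with an HTTP client could issue the same two requests, which is what makes the API language agnostic.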
Two approaches to web API design are widely recognized in industry. The first is SOAP, which stands for “Simple Object Access Protocol”, an XML-based messaging protocol in which a client invokes operations on the server. The SCORPION API will instead use a RESTful design; REST stands for “Representational State Transfer” and treats each item, such as a job, as a resource identified by a URI and manipulated with standard HTTP methods such as GET and POST. In the SCORPION API, submitted processes will be queued rather than run immediately, which puts less strain on the network and the server.
The website will feature a redesigned interface. The first thing a user will see is a professional, visually pleasing website. Helpful documentation and functionality that warns the user of invalid character input will streamline the overall experience, and responsive functionality will add ease of use. For instance, if a user enters an invalid character or blank spaces, the website should make the user aware of it: human errors will always occur, but they can only be fixed if the user notices them. Input validation will first check for invalid characters and, if any are found, ask the user whether they should be removed. If so, the input will be sanitized; otherwise, the user may correct it manually. In addition, users who wish to may log into the website through an open identity authentication service. A user who logs in as an administrator will have access to statistics on all website users, including average visits and geographical IP location. Normal website users will also be able to log in, where they can post a sequence and view past submissions and results.
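The check-then-ask flow described above can be sketched in a few lines. The product implements this in JavaScript in the browser; the Python version below only illustrates the logic. It assumes that the valid alphabet is the 20 standard one-letter amino acid codes and that lowercase input is accepted and uppercased; neither detail is stated in this document.

```python
# Assumption: only the 20 standard one-letter amino acid codes are valid.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")

def find_invalid(sequence):
    """Return (index, character) pairs for every character that is not a valid residue."""
    return [(i, ch) for i, ch in enumerate(sequence.upper()) if ch not in VALID_RESIDUES]

def sanitize(sequence):
    """Drop blank spaces and invalid characters, keeping only valid residues (uppercased)."""
    return "".join(ch for ch in sequence.upper() if ch in VALID_RESIDUES)

def prepare_submission(sequence, user_accepts_cleanup):
    """Check the input; if problems exist, sanitize it or report them for manual fixing.

    Returns (sequence_to_use, problems). An empty problems list means the
    sequence is ready to submit.
    """
    problems = find_invalid(sequence)
    if not problems:
        return sequence.upper(), []
    if user_accepts_cleanup:
        return sanitize(sequence), []
    # The user declined automatic removal, so hand back the problems to fix by hand.
    return sequence, problems
```

For example, `prepare_submission("AC D", False)` returns the input unchanged along with `[(2, " ")]`, which the page could use to highlight the offending blank space.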
2.2 MAJOR COMPONENTS (HARDWARE/SOFTWARE)
The SCORPION API will consist of three major hardware components. The first component, in order of use, is the user's personal computer (PC), which can be either a desktop or a laptop and must have a network card. The second hardware component is a high-speed network connection. The third is the dedicated PHP web server, which, together with its disk drives, will physically store all files, folders, and assorted data. The web server requires a very powerful processor, such as a Graphics Processing Unit (GPU), and at least 8 GB of Random Access Memory (RAM). All hardware components are illustrated in Figure 1.
FIGURE 1 - STING HARDWARE
Three software components make up the API, as illustrated in Figure 2; each is written in PHP, HTML, or XML. The first major software component is the URI folder structure. In true REST style, developers will use URIs to call specific methods. This URI structure will encompass all of the algorithms except the SCORPION binary, which is provided by the customer. In order of use, the first algorithm routes user requests to the other appropriate algorithms. It handles two kinds of requests, routing each according to whether it is a GET or a POST. If the request is a POST, the algorithm checks that it includes an amino acid sequence together with a valid email address, and strips the sequence of invalid characters and blank spaces. Once these two items are verified, the API sends the user back a unique ID associated with the posted job. As jobs are posted through the API, this algorithm queues them to prevent too many from running at the same time, ensuring that the entire system functions as efficiently as possible. If the request is a GET, the algorithm returns a list of job IDs. With a specific job ID, the user can then request that job's information. At this level of the URI, the algorithm first checks whether a job with the supplied ID exists. If it does, the algorithm returns either the results predicted by the SCORPION binary or a message that the job has not finished.
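The routing behavior described above can be sketched as a small dispatcher. This Python sketch keeps jobs in memory and only places them in a FIFO list rather than running the SCORPION binary; the `/api/jobs` and `/api/jobs/<id>` paths, the form field names, the simple email pattern, and the HTTP status codes are all assumptions made for illustration, not the planned PHP implementation.

```python
import itertools
import re

# Assumptions for this sketch: the paths, field names, email pattern, and
# status codes are illustrative; the real API is planned in PHP.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

class ApiRouter:
    """Route GET and POST requests for /api/jobs and /api/jobs/<id>."""

    def __init__(self):
        self.jobs = {}                 # job id -> job record
        self.queue = []                # ids waiting for the SCORPION binary, oldest first
        self._ids = itertools.count(1)

    def handle(self, method, path, form=None):
        """Dispatch one request and return (HTTP status code, response body dict)."""
        parts = [p for p in path.strip("/").split("/") if p]
        if parts[:2] != ["api", "jobs"] or len(parts) > 3:
            return 404, {"error": "not found"}
        if method == "POST" and len(parts) == 2:
            return self._post_job(form or {})
        if method == "GET" and len(parts) == 2:
            return 200, {"jobs": sorted(self.jobs)}
        if method == "GET" and len(parts) == 3:
            return self._get_job(parts[2])
        # Everything other than posting or getting jobs is refused.
        return 405, {"error": "method not allowed"}

    def _post_job(self, form):
        email = form.get("email", "").strip()
        # Strip invalid characters and blank spaces from the sequence.
        sequence = "".join(c for c in form.get("sequence", "").upper() if c in VALID_RESIDUES)
        if EMAIL_PATTERN.fullmatch(email) is None or not sequence:
            return 400, {"error": "a valid email address and amino acid sequence are required"}
        job_id = next(self._ids)
        self.jobs[job_id] = {"sequence": sequence, "email": email,
                             "status": "queued", "result": None}
        self.queue.append(job_id)      # a separate worker (not shown) would run queued jobs
        return 201, {"id": job_id}

    def _get_job(self, raw_id):
        if not raw_id.isdigit() or int(raw_id) not in self.jobs:
            return 404, {"error": "no job with that id"}
        job_id = int(raw_id)
        job = self.jobs[job_id]
        if job["status"] != "done":
            return 200, {"id": job_id, "status": job["status"], "message": "job has not finished"}
        return 200, {"id": job_id, "status": "done", "result": job["result"]}
```

With this sketch, `ApiRouter().handle("POST", "/api/jobs", {"email": "a@b.org", "sequence": "mk ta y"})` returns `(201, {"id": 1})`, and a later GET on `/api/jobs/1` reports the job as not finished until a worker marks it done and stores the result. Unsupported methods receive 405.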
FIGURE 2 - STING API
The second part of the product is a new website with additional functionality. The website will share the hardware components used by the API. Although its functional requirements differ, it will offer capabilities similar to those the API provides, such as sequence submission and job tracking. As also illustrated in Figure 1, the first hardware component is the user's PC, which will support an Internet browser and storage. As with the API, the PC requires a network card with access to a high-speed Internet connection. The second hardware component is the PHP web server that will host the website. This web server must have enough storage space for assorted images and files, as well as a lightweight database. The web server's network connection must also be high-speed to support many users accessing the website at the same time.
The next two components are PSI-BLAST and the SCORPION neural network. PSI-BLAST will reformat user submissions, and the neural network will calculate the predicted result. Both components run separately on their own dedicated hardware, because without it they could not perform calculations efficiently enough to return results in a timely manner.
For the software portion of the website, four components are required. In order of use, the first component is the web browser, which is not provided. The second software component is the website itself, consisting of multiple web pages. Each page will be written in HTML and will use scripting languages such as PHP and JavaScript. PHP will be used partly for design purposes, such as sharing the same navigation code across multiple pages by pulling it from a single file, so that one future change updates every page. JavaScript will be used when a user submits amino acid sequences: it will ensure that the user enters a valid email address and will validate the amino acid sequence. If the user wishes, the sequence-checking algorithm will strip the string of invalid characters and blank spaces. Separate from the other JavaScript algorithms, there will be one algorithm specific to logged-in users. JavaScript will also be used to pull information from the third software component, the database. The database will store user logins as well as any information users provide. Associated with each user's information will be sequence information, such as the amino acid submission, the date of submission, and results, including whether the job was fully processed.
3 IDENTIFICATION OF CASE STUDY
SCORPION was designed and built in 2012 as a project by Dr. Ashraf S. Yaseen. Leading the effort in the SCORPION project is Dr. Yaohang Li, a professor in the Computational Science department and leader of the computational biology team at ODU. Dr. Yaseen created a neural network that predicts secondary protein structure with higher accuracy than any before it. With algorithms that draw context-based clues from other known protein chains, SCORPION has gained international interest for its unmatched secondary protein structure prediction accuracy.
Dr. Li currently maintains SCORPION and plans to continue doing so for some time. Maintaining SCORPION involves basic supporting software updates and possibly re-training the neural network. Neural network training is the most laborious and time-consuming part of maintaining SCORPION. The purpose of retraining is to improve prediction accuracy by using newly known sequences. The cycle is built on this shared information: new sequences are predicted and verified, then used to retrain the neural network. Initiating and maintaining an undertaking as complex and specialized as this neural network requires substantial resources and determination from the ODU computational biology team. Given that effort, ODU needs to get the most use possible out of SCORPION.
From the beginning of the SCORPION project, the public website has been the only method that researchers and doctors can use to access it. The website was created under tight deadlines, effectively cutting out all non-required development. Unfortunately, some of the development that was cut is in fact required by the federal government. These U.S. government requirements are covered in the Section 508 compliance standards that apply to all federally funded or assisted websites (“Section 508”, 2014). Since SCORPION is partially funded by National Science Foundation (NSF) grant number 1066471, all Section 508 compliance standards must be followed.
SCORPION's largest user audience is pharmaceutical companies. These companies are trying to shorten the decade-long drug testing process, which results from the medical efficacy studies required by the Food and Drug Administration (FDA), a federal administrative body under Health and Human Services (HHS). The current method for seeing how proteins fold relies on expensive X-ray crystallography. If years of development can be skipped for each specific drug, the entire drug manufacturing schedule will be shortened, saving money and lives. With the potential to reduce costs, investors are also more likely to be interested in supporting SCORPION.
4 SCORPION PRODUCT PROTOTYPE DESCRIPTION
The product prototype will include a redesigned, more efficient website intended to attract new users. By reusing many of the original components, such as PSI-BLAST and the intact neural network, the prototype aims to make the entire system more popular than ever before. The website will feature professional-quality functionality that improves the user experience. Logging in will show previous submissions and results, helping researchers track and audit their own submissions. When administrators log in, they will be able to view not only prior submissions but also user statistics and locations. This will help the customer understand usage patterns and may guide future performance improvements.
The prototype will also include a tool that is cutting-edge in industry today: the API, which will allow users to send submissions programmatically. Instead of manually navigating to and clicking through the SCORPION website, users can write software that interfaces with SCORPION, using it just as if it were their own. With the RESTful API design, jobs and submissions can be queued, improving the performance and reliability of secondary protein structure prediction.
4.1 PROTOTYPE ARCHITECTURE (HARDWARE/SOFTWARE)
The prototype hardware and software will be very similar for both the API and the website, as illustrated in Figures 1 and 2. The website will be developed using HTML and CSS, which all modern Internet browsers support. JavaScript will be used to sanitize amino acid sequence submissions and to check for valid email addresses. PHP has been chosen to handle user log-in, mainly because of its available open-source libraries for OpenID, an open authentication method. OpenID is used by multiple companies such as Google and Facebook and lets users reuse the credentials from one website on a variety of others. The prototype will support OpenID log-in through Google only, since Google is the least complicated to integrate compared with other current providers such as Twitter and Facebook. The full version would also use other providers for authentication, such as Yahoo, Facebook, Twitter, and Amazon, as illustrated in Figure 3.
Component        Real World Product                            Prototype
Authenticator    Google, Yahoo, Facebook, Twitter, Amazon      Google
Code Libraries   PHP-Google Analytics, PHP-Google oAuth2       PHP-Google Analytics, PHP-Google oAuth2
Database         SQLite3                                       SQLite3
FIGURE 3 - REAL WORLD VS PROTOTYPE
Because the prototype includes user authentication, it must also include a database. The database management system will be SQLite3, because it requires few resources and is easy to use. The database will store user logins, submissions, and results. All queries to the database will be made by PHP code embedded in the web pages.
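A minimal sketch of such a database, using Python's built-in `sqlite3` module rather than the planned PHP code, might look like the following. The table and column names are hypothetical. Each user is identified only by an identifier returned by the login provider, in keeping with not storing credentials locally, and parameterized queries (`?` placeholders) keep user-supplied sequences out of the SQL text.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    id        INTEGER PRIMARY KEY,
    identity  TEXT NOT NULL UNIQUE,        -- identifier returned by the login provider
    is_admin  INTEGER NOT NULL DEFAULT 0,
    name      TEXT,                        -- optional, only if the user provides it
    country   TEXT
);
CREATE TABLE IF NOT EXISTS submissions (
    id           INTEGER PRIMARY KEY,
    user_id      INTEGER REFERENCES users(id),  -- NULL for anonymous submissions
    sequence     TEXT NOT NULL,
    submitted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status       TEXT NOT NULL DEFAULT 'queued',
    result       TEXT
);
"""

def open_db(path=":memory:"):
    """Open (or create) the database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def record_submission(conn, identity, sequence):
    """Store a submission for a logged-in user, creating the user row on first use."""
    conn.execute("INSERT OR IGNORE INTO users (identity) VALUES (?)", (identity,))
    (user_id,) = conn.execute("SELECT id FROM users WHERE identity = ?", (identity,)).fetchone()
    cur = conn.execute("INSERT INTO submissions (user_id, sequence) VALUES (?, ?)",
                       (user_id, sequence))
    conn.commit()
    return cur.lastrowid

def history(conn, identity):
    """Return (submission id, sequence, status) rows for one user's past submissions."""
    return conn.execute(
        "SELECT s.id, s.sequence, s.status FROM submissions s "
        "JOIN users u ON u.id = s.user_id WHERE u.identity = ? ORDER BY s.id",
        (identity,),
    ).fetchall()
```

The `history` query is what a logged-in user's "previous submissions" page would need; an administrator view would query across all users instead.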
The SCORPION API will also be written in PHP. The API will consist of a specific URI, appended to the website’s root path, that handles only two types of HTTP requests (GET and POST). The top level of the API will show the locations of all the jobs in the queue; as API users go farther down the path, they will have access to specific jobs.
4.2 PROTOTYPE FEATURES AND CAPABILITIES
The prototype will give SCORPION programmatic access and will allow the website to pass an audit. The website's submission form will strip invalid characters using resources on the client computer, freeing computational load on the SCORPION system. The website will allow general users or an administrator to log in using Google credentials. A typical user will be able to log in and submit an amino acid sequence just like an anonymous user, and will also be able to view all previous submissions and results. This log will include all job information, such as submission date, submission string, submission result, and job status. Anonymous users will not have access to these capabilities.
Logged-in administrators will be able to see SCORPION usage statistics and traffic patterns. All logged-in users will have the option to provide other information, such as their name and country. For anonymous user tracking, a script placed on each web page will log user IP addresses. From these statistics, the administrator will be able to see views for each page as well as time spent. User locations will also be displayed using a standard geographical IP (GeoIP) location lookup.
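Compiling these statistics amounts to grouping logged page views. In the hypothetical Python sketch below, each visit record is assumed to already carry a country filled in by some GeoIP lookup (not shown). The record fields are illustrative, and the prototype may instead rely on the PHP-Google Analytics library listed in Figure 3.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass
class Visit:
    """One logged page view (hypothetical record format)."""
    ip: str
    page: str
    seconds_on_page: float
    country: str  # assumed to come from a GeoIP lookup, not shown here

def summarize(visits):
    """Compile per-page and per-country statistics for the administrator page."""
    views = Counter(v.page for v in visits)
    seconds = defaultdict(float)
    for v in visits:
        seconds[v.page] += v.seconds_on_page
    # Count each visitor (IP address) once per country, not once per page view.
    visitors_by_country = Counter(country for _ip, country in {(v.ip, v.country) for v in visits})
    return {
        "page_views": dict(views),
        "avg_seconds_on_page": {page: seconds[page] / views[page] for page in views},
        "unique_visitors": len({v.ip for v in visits}),
        "visitors_by_country": dict(visitors_by_country),
    }
```

Counting unique IP addresses rather than raw views keeps one busy visitor from inflating a country's numbers, though IP addresses are only a rough proxy for individual people.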
The administrator can also submit an amino acid sequence in the same way as a logged-in user, with all the same functionality, such as submission history and job status. All users will have the same functionality for submitting an amino acid sequence via the website: if there are invalid characters or blank spaces in the submission, the user will be asked whether the characters should be removed automatically. For security purposes, the submission page will also check for valid email addresses. This will help stop denial-of-service attacks in which a program visits a web page and submits a massive amount of work to the system. Only users with valid email addresses will be able to submit amino acid sequences.
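A format check of this kind can be as simple as a regular expression. The pattern below is a deliberately loose assumption (exactly one `@`, no whitespace, a dot in the domain), not the site's actual rule, and a format check alone cannot confirm that the address exists or belongs to the submitter.

```python
import re

# Deliberately loose pattern (an assumption, not the site's actual rule):
# exactly one "@", no whitespace, and at least one dot in the domain part.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def is_valid_email(address):
    """Return True if the address has a plausible user@domain.tld shape."""
    return EMAIL_PATTERN.fullmatch(address.strip()) is not None
```

With this pattern, `is_valid_email("researcher@odu.edu")` is True, while `"no-at-sign.org"` and `"user@localhost"` are rejected.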
The entire website will also be Section 508 compliant. To that end, all images and media will be accompanied by text that can be read by a screen reader, and the entire website will be coded with text-to-speech directions in mind for users who need assistance viewing and using it. For users who do not need assistance, this information and sub-text will be invisible. Instructions on how to use SCORPION will still be provided.
The API will be capable of receiving an amino acid submission programmatically through HTTP. The API will queue each job, preventing the SCORPION neural network from launching too many instances of itself to process the jobs. Each job submitted to SCORPION will also be checked for a valid email address, and its amino acid sequence will be stripped of invalid characters and blank spaces. Through HTTP, users will be able to send GET requests to receive lists of jobs and information about specific jobs. The data will be available in both XML and JSON for ease of use in user-developed software; with JSON, the user has an object to work with instead of a node tree to parse. Along with the API, there will be a landing page on the PHP web server with directions for using the API, such as input and return types and where certain algorithms exist in the URI.
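The difference between the two formats for a client can be seen by serializing the same job record both ways with Python's standard library. The record's field names and the `result` string are hypothetical, and the XML form assumes every key is a valid XML tag name.

```python
import json
import xml.etree.ElementTree as ET

def job_to_json(job):
    """Serialize a job record as JSON text (keys sorted for stable output)."""
    return json.dumps(job, sort_keys=True)

def job_to_xml(job):
    """Serialize the same record as flat XML: <job><field>value</field>...</job>."""
    root = ET.Element("job")
    for key, value in sorted(job.items()):
        # Every value becomes text; None is written as an empty element.
        ET.SubElement(root, key).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # Hypothetical job record; the field names are illustrative only.
    job = {"id": 7, "status": "done", "sequence": "MKTAY", "result": "CCHHE"}
    print(job_to_json(job))
    print(job_to_xml(job))
```

A JSON client gets the record back as a dictionary with `json.loads`, numbers included. An XML client has to walk the element tree (for example `{child.tag: child.text for child in ET.fromstring(text)}`), and every value comes back as a string.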
4.3 PROTOTYPE DEVELOPMENT CHALLENGES
Some development challenges relate specifically to the website and others to the API. The API will need to be intuitive, secure, and properly documented. A major development challenge for the API is keeping it secure against attacks. POST requests should be accepted only with very specific arguments, such as an email address and an amino acid string. When handling these requests, the API should properly dispose of data and other objects and withstand buffer overflow attacks. For GET requests, no sensitive data should be accessible to the PHP algorithms at all, so that a scripting attack cannot take place. The API should deny all requests other than those for the specified functionality of getting and posting jobs to SCORPION.
There will also be development challenges in interfacing the API with the current SCORPION setup. The API must be designed so that it can access resources by paths relative to the root directory. If these paths are not exactly correct, the entire system could crash, slowing research and hurting SCORPION’s reputation. To prevent this, a significant portion of the API work will involve modifying the system architecture to place resources in a central folder or location.
The final development challenge for the API will be documentation. Because the API will follow the RESTful style, there will be no Web Services Description Language (WSDL) document, which SOAP-based services use to describe their operations to the programmers who consume them. As is standard practice in API development, proper documentation must be created and made accessible to the public. The documentation serves as a reference for developers consuming the web services, providing a central landing point that describes the methods for programmatically accessing them, their locations, and their return types.
For the website, Section 508 compliance will take a significant amount of time. After the initial development of each web page, the developer will need to verify that every HTML tag in the document supports sub-text. For images, each tag must have alternative text to display in place of a broken image or to be read by a screen reader. Another development challenge related to Section 508 compliance is testing. The Department of Health and Human Services provides an online checklist of what to test to ensure proper compliance, and for some of the tests, free tools are provided to the public by W3.org.
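Part of this verification can be automated. The sketch below uses Python's standard `html.parser` module to list `<img>` tags that lack an `alt` attribute. It covers only that one checklist item, treats `alt=""` (the usual marker for decorative images) as acceptable, and does not replace the HHS checklist or the W3.org tools.

```python
from html.parser import HTMLParser

class MissingAltChecker(HTMLParser):
    """Collect <img> tags that have no alt attribute at all.

    alt="" is accepted, since an empty alt is the usual way to mark a purely
    decorative image that screen readers should skip. Self-closing <img ... />
    tags also reach handle_starttag through the default handle_startendtag.
    """

    def __init__(self):
        super().__init__()
        self.problems = []  # (line number, src) for each offending <img>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            line, _column = self.getpos()
            self.problems.append((line, attributes.get("src", "(no src)")))

def find_missing_alt(html_text):
    """Return the images in html_text that a screen reader could not describe."""
    checker = MissingAltChecker()
    checker.feed(html_text)
    checker.close()
    return checker.problems

if __name__ == "__main__":
    page = (
        '<p>Welcome</p>\n'
        '<img src="odu-logo.png" alt="ODU logo">\n'
        '<img src="chart.png">\n'
        '<img src="spacer.gif" alt="" />\n'
    )
    print(find_missing_alt(page))
```

Running a check like this over every page after each change catches the most common omission early, leaving the manual checklist for issues a parser cannot judge, such as whether the alt text is actually meaningful.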
Another major development challenge for the website will be determining an accurate estimated time of completion for each job submission. For users who need results within a specified time, or who have scheduled work around receiving them, the estimated completion time is crucial. If the estimates have a large relative error, users may become frustrated and possibly abandon SCORPION. Testing has already been performed on average wait times for protein sequences submitted through PSI-BLAST and the SCORPION neural network, but that data alone may not be representative, because the number of jobs being processed on the server will drastically affect wait time. The website will need to know not only about the jobs being submitted through the web form but also the number of jobs queued by API users. To mitigate some of the prediction error, an estimated time range will be shown.
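One simple way to produce such a range, assuming jobs run in fixed-size batches, is sketched below. The inputs are hypothetical: `durations` stands for run times measured in the earlier testing, `jobs_ahead` must count jobs from both the web form and the API, and `slots` is the number of jobs the server runs at once. Treating jobs that are already running as if they were just starting makes the estimate somewhat pessimistic.

```python
def estimate_wait_range(durations, jobs_ahead, slots):
    """Return a (low, high) estimate, in seconds, of when a newly queued job will finish.

    durations  -- past job run times in seconds (hypothetical timing-test data)
    jobs_ahead -- jobs already queued or running, from BOTH the web form and the API
    slots      -- number of jobs the server runs at the same time
    """
    if not durations or slots < 1 or jobs_ahead < 0:
        raise ValueError("need at least one duration, one slot, and jobs_ahead >= 0")
    # Simplification: jobs run in batches of `slots`, and this job finishes with its batch.
    batches = jobs_ahead // slots + 1
    # The fastest and slowest observed run times bound each batch.
    return batches * min(durations), batches * max(durations)
```

For example, `estimate_wait_range([120, 300, 180], jobs_ahead=5, slots=2)` returns `(360, 900)`, which the page could show as "about 6 to 15 minutes".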
Information leakage and the theft of private and protected information are constant problems in the Information Technology industry. Since colleges and universities are generally considered less secure than big businesses and medical facilities when handling sensitive information, SCORPION may appear to be an easy target. The SCORPION website will therefore use a third party to store user information and credentials, reducing the risk of ODU being responsible for data leakage. With an open and secure off-premise solution such as OpenID, the likelihood of sensitive data leakage is minimal.
The final development challenge for the website involves user satisfaction. Since user satisfaction
will be critical to product success, considerable time must be spent designing a pleasant, easy-to-use
interface. The design must be 508 compliant and must accommodate users with vision impairments. If the
color scheme does not meet this requirement, the site should be able to fall back to the browser's
compatibility mode so that a user who needs assistance can still use the website. If the website is not
508 compliant, the customer will be at risk of losing grants from the National Science Foundation and
may be subject to fines and other penalties. To mitigate these risks, accessibility tools will be used
to check that the site remains usable for people who rely on assistive aids.
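Color contrast is one check that accessibility tools perform. The sketch below substitutes the W3C's WCAG 2.0 definitions for whatever criteria the team's chosen tools apply. It computes the WCAG 2.0 contrast ratio between a text color and a background color and compares it against the 4.5:1 level AA threshold for normal-size text. The function names are hypothetical.

```python
def _linearize(channel_8bit: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.0 formula."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """WCAG 2.0 relative luminance of a color written as '#RRGGBB'."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError("expected a color of the form #RRGGBB")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Ratio (L_lighter + 0.05) / (L_darker + 0.05), ranging from 1 to 21."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(foreground: str, background: str) -> bool:
    """Level AA requires at least 4.5:1 for normal-size text."""
    return contrast_ratio(foreground, background) >= 4.5
```

For example, black text on white yields the maximum ratio of 21:1. Gray `#777777` on white falls just short, at about 4.48:1, while the slightly darker `#767676` passes. A check like this could run over every foreground/background pair in the site's color scheme before deployment.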
GLOSSARY
Amino Acids/Residues: The building blocks of proteins
API: Application Programming Interface (abstract way for services to communicate)
Cross-validation Training: The process of dividing training data into k mutually exclusive subsets
(folds) of roughly equal size, where some subsets are used for training, some for validation, and some
for testing. The process is repeated k times.
Data cleansing: The process of removing non-representative instances from the data set.
Dunbrack Lab: Part of the Fox Chase Cancer Center. Recognized for normalizing data from the RCSB.
ETL: Extract, Transform, and Load. Refers to the process of extracting, reshaping, and loading data
FASTA: A text format widely adopted in bioinformatics for representing sequences so that they are easy to manipulate and parse
GeoIP: A lookup table that maps Internet Protocol addresses to known municipalities and providers, used
to determine the origin of an IP address
GUI: Graphical User Interface
NSF: National Science Foundation
Protein Fold: The functional three-dimensional shape into which a protein's chain of amino acids folds; proteins sharing the same arrangement are grouped into the same fold.
PSI-BLAST: Position-Specific Iterative Basic Local Alignment Search Tool used for deriving the PSSM
PSSM: Position-Specific Scoring Matrix which includes information about evolutionary relatives of the
original protein sequence
RCSB Protein Data Bank: Research Collaboratory for Structural Bioinformatics database. The database
holds experimentally determined structures of known proteins, together with their sequences.
REST: Representational State Transfer. A REST API is a set of operations invoked through the four HTTP
verbs (GET, POST, PUT, and DELETE), with the URI identifying the resource each operation acts on
SCORPION: SeCOndaRy structure PredictION
STING: Streamlined Training In Neural-network GUI
Training set: Set of instances from the problem domain used to train the algorithm
508 Compliance: Adhering to guidelines established to make website content equally accessible to
people with disabilities
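The fold rotation described in the Cross-validation Training entry can be sketched as follows. This sketch illustrates one common scheme and is not necessarily the procedure STING uses. In each of the k rounds, one fold is held out for testing, the next fold is held out for validation, and the remaining folds are used for training. The function name and the optional shuffling seed are hypothetical.

```python
import random
from typing import Iterator, List, Optional, Tuple

# (training indices, validation indices, test indices)
Split = Tuple[List[int], List[int], List[int]]


def k_fold_splits(n_instances: int, k: int, seed: Optional[int] = None) -> Iterator[Split]:
    """Yield one (train, validation, test) split per round, k rounds in total."""
    if k < 3:
        raise ValueError("need at least 3 folds for separate train/validation/test sets")
    if n_instances < k:
        raise ValueError("need at least one instance per fold")
    indices = list(range(n_instances))
    if seed is not None:
        # Shuffle so that folds do not simply follow the order of the data file.
        random.Random(seed).shuffle(indices)
    # Deal indices round-robin: folds are mutually exclusive and differ in size by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_fold = (i + 1) % k
        test = folds[i]
        validation = folds[val_fold]
        train = [idx for j, fold in enumerate(folds) if j not in (i, val_fold) for idx in fold]
        yield train, validation, test
```

With 10 instances and k = 5, the first round tests on instances 0 and 5, validates on 1 and 6, and trains on the other six. Over all five rounds, every instance is used for testing exactly once.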
REFERENCES
Biological Macromolecular Resource. (n.d.). RCSB Protein Data Bank. Retrieved February 20, 2014, from
http://www.rcsb.org/pdb/home/home.do
Blue Team. (n.d.). SCORPION Protein Prediction Timed Experiment. Retrieved February 11, 2014,
from www.cs.odu.edu/~410blue/CS410SCORPIONProteinPredictionTimeExperiment.xlsx
Cancer Research Funding - National Cancer Institute. (2013, August 23). Cancer Research Funding -
National Cancer Institute. Retrieved May 8, 2014, from
http://www.cancer.gov/cancertopics/factsheet/NCI/research-funding
Freitas, R. (1998, January 1). Nanomedicine. Chapter 3 page 1. Retrieved May 8, 2014, from
http://www.foresight.org/Nanomedicine/Ch03_1.html
Murphy, S. (2013, May 8). Deaths: Final Data for 2010. Retrieved May 8, 2014, from
http://www.cdc.gov/nchs/data/nvsr/nvsr61/nvsr61_04.pdf
RCSB PDB - Histograms. (n.d.). RCSB PDB - Histograms. Retrieved May 8, 2014, from
http://www.rcsb.org/pdb/statistics/histogram.do?mdcat=mvStructure&mditem=residueCount&name=Residue%20Count
Section 508. (n.d.). United States Department of Health and Human Services. Retrieved March 15,
2014, from http://www.hhs.gov/web/508/index.html
Section 508 Of The Rehabilitation Act. (n.d.). Section 508 Home. Retrieved March 15, 2014, from
http://www.section508.gov/Section-508-Of-The-Rehabilitation-Act
Yaseen, A., & Li, Y. (n.d.). Context-based Features Enhance Protein Secondary Structure Prediction Accuracy.