Running Head: Lab 2 – Sting Product Description

Sting Prototype Product Specification
SCORPION – Blue Team
Old Dominion University
CS411W – Janet Brunelle
Author: David Eason
Last Modified: November 12, 2014
Version: 2.0

Lab 1 – SCORPION Product Description

TABLE OF CONTENTS

1 Introduction
  1.1 Purpose
  1.2 Scope
  1.3 Definitions, Acronyms, and Abbreviations
  1.4 References
  1.5 Overview
2 General Description
  2.1 Prototype Architecture Description
  2.2 Prototype Functional Description
  2.3 External Interfaces
3 Specific Requirements

LIST OF FIGURES

Figure 1- Website GUI
Figure 2- Using the API

LIST OF TABLES

Table 1 - RWP vs Prototype
Table 2- User Capabilities

1 INTRODUCTION

1.1 PURPOSE

At Old Dominion University, a program named SCORPION is used to predict the secondary structure of a protein, and it is gaining popularity. This program is currently being used by researchers around the world in an effort to aid in cancer research. A very important part of cancer research and treatment involves the study of protein folding. A protein fold is made of a sequence of proteins, and the proteins are made of a sequence of amino acids.
Currently, the most popular method of studying these sequences is slow and expensive. SCORPION gives a researcher the ability to accurately predict the sequences of amino acids, resulting in what is known as the protein’s secondary structure. With SCORPION as a tool, researchers can conduct studies more efficiently and at lower cost. Sting, the product being developed by the fall 2014 Blue team, will provide a new method of accessing SCORPION and will address issues with the current website.

SCORPION was originally designed by Dr. Ashraf Yaseen as a graduate project, and it became popular with cancer researchers and computational biologists because of its high accuracy. As a result, a growing number of people use SCORPION to predict secondary structures, but they have only one way of using it. Currently, a user visits the website, manually enters a series of characters that represent amino acids, enters an email address, and waits for the results to be sent. Sting will streamline the use of SCORPION by providing an Application Programming Interface (API). The API will allow users to submit queries to SCORPION programmatically, so that it can be used quickly and efficiently.

Sting will also provide a redesigned public website that addresses issues of compliance and professional image. The most important issue is that the current website does not meet government accessibility requirements. SCORPION is partly funded by federal grants and must comply with the Rehabilitation Act of 1973. Specifically, SCORPION must comply with Section 508 of this Act, which was put in place so that persons with accessibility needs can use electronic media such as video and text. Sting will ensure compliance by providing alternative text for images, a color scheme friendly to color-blind users, and descriptive text for screen readers.
The new Section 508 compliant website will have a user-friendly and professional look while still providing the intended functionality.

Both the product and the prototype will incorporate user logins and provide usage data to administrators. Website users will be able to log in to see all of their previously submitted sequences and the corresponding results, which will help them track and organize queries. Logged-in administrators will also have access to previous submissions and results, as well as current website statistics. These statistics will include page views and user locations by country, allowing the customer to better understand the scope of the website's user base.

1.2 SCOPE

The API will provide a new method of accessing SCORPION and will interface with the existing software. So that more people can use it, the API will be language agnostic, meaning that it can be used from any programming language. The goal is for users to be able to send submissions to SCORPION for processing efficiently and effortlessly. This also allows other software to interface with SCORPION and take advantage of its secondary structure prediction accuracy.

Sting will give SCORPION a new, professional look and a new means of using it. Because SCORPION is closely associated with Old Dominion University, the website affects the institution’s image. As SCORPION continues to grow in popularity, it should professionally represent the organizations that own it. So that SCORPION can be funded by federal grants and pass an audit, the website will be Section 508 compliant. The website will also incorporate login functionality, which helps users organize submissions and feel more connected to SCORPION. The login functionality will include administrator-specific information about website usage. For both the prototype and the final product, a super-user account will be provided.
All statistical information, as well as user credentials, will come from third-party sources. Doing so will keep security concerns minimal and website maintenance low. For the prototype, the user will log in with Google, and submissions will be stored in a SQLite 3 database. The unique ID that Google provides for each user ties every submission to that user in the database.

1.3 DEFINITIONS, ACRONYMS, AND ABBREVIATIONS

Amino Acids- Building blocks of proteins.
API- Application Programming Interface (an abstract way for services to communicate).
Data cleansing- The process of removing non-representative instances from the data set.
GeoIP- A lookup table that maps Internet Protocol (IP) addresses to known municipalities and providers in order to determine where a request originated.
GUI- Graphical User Interface.
oAuth2- Open standard for delegated authorization, used here for logging in with a third-party account.
Predict time- The calculated amount of time for SCORPION to receive input, predict a secondary protein structure, and send results back to the user.
Protein Fold- A group of proteins made of amino acids that are formed into a functional shape.
Sanitize- Removing invalid amino acid characters from input.
SCORPION- SeCOndaRy structure PredictION.
Section 508 Compliance- Guidelines established to make website content equally accessible to people with disabilities. Section 508 is part of the Rehabilitation Act of 1973.

1.4 REFERENCES

Biological Macromolecular Resource. (n.d.). RCSB Protein Data Bank. Retrieved February 20, 2014, from http://www.rcsb.org/pdb/home/home.do

Blue Team. (n.d.). SCORPION Protein Prediction Timed Experiment. Retrieved February 11, 2014, from www.cs.odu.edu/~410blue/CS410SCORPIONProteinPredictionTimeExperiment.xlsx

Cancer Research Funding - National Cancer Institute. (2013, August 23). Cancer Research Funding National Cancer Institute.
Retrieved May 8, 2014, from http://www.cancer.gov/cancertopics/factsheet/NCI/research-funding

Freitas, R. (1998, January 1). Nanomedicine. Chapter 3 page 1. Retrieved May 8, 2014, from http://www.foresight.org/Nanomedicine/Ch03_1.html

Lab 1 - SCORPION. Version 2. (2014, October). STING. Blue Team. CS411W: David Eason

Murphy, S. (2013, May 8). Deaths: Final Data for 2010. Retrieved May 8, 2014, from http://www.cdc.gov/nchs/data/nvsr/nvsr61/nvsr61_04.pdf

RCSB PDB - Histograms. (n.d.). RCSB PDB - Histograms. Retrieved May 8, 2014, from http://www.rcsb.org/pdb/statistics/histogram.do?mdcat=mvStructure&mditem=residueCount&name=Residue%20Count

Section 508. (n.d.). United States Department of Health and Human Services. Retrieved March 15, 2014, from http://www.hhs.gov/web/508/index.html

Section 508 Of The Rehabilitation Act. (n.d.). Section 508 Home. Retrieved March 15, 2014, from http://www.section508.gov/Section-508-Of-The-Rehabilitation-Act

Yaseen, A., & Li, Y. Context-based Features Enhance Protein Secondary Structure Prediction Accuracy.

1.5 OVERVIEW

This product specification describes the software configuration, required program libraries, interface, and features of the Sting prototype. The remaining sections of this document provide a detailed description of the tasks and items needed to meet the functional requirements. The product specification requirements provided in Lab II Section 3.1 can be found in a separate document.

2 GENERAL DESCRIPTION

Sting is a two-part solution that allows individuals to access SCORPION via the Internet. Most users will access SCORPION through the new website, which will replace the existing one. The new website will automatically check user input for errors, which will improve the overall performance of SCORPION by eliminating wasted computation time. The new website will also feature optional login capabilities for users and administrators.
The second part of Sting is a public API for anyone wishing to streamline SCORPION sequence submission. The public API follows a REST format, which requires less communication over the network than other API formats. The API will handle sequence submission to SCORPION as well as retrieval of submission results. The documentation for the API endpoints will be simple and concise, allowing new users to learn quickly how to use the services.

2.1 PROTOTYPE ARCHITECTURE DESCRIPTION

The SCORPION website will be redesigned, and a public API will be integrated into the system. The website will feature a navigation menu, user logins, Section 508 compliance, and sequence submission cleansing. For website administrators, website statistics and user information will be displayed. Users will have access to all of their previously submitted sequences and the corresponding results. When a user submits a sequence through the website, embedded JavaScript code will check it for invalid characters and remove any that are found.

The integrated API will be hosted on the same server as the website and will be accessible to anyone with access to the Internet. The API will allow streamlined submissions to SCORPION, reducing the time it takes researchers to map protein folds. After each submission, a unique ID is returned to the API user, who can then use it to retrieve the results of the completed job.

The differences between the product and the prototype are functional and are illustrated in Table 1. The prototype will use only Google for user authentication, whereas the product would use Yahoo, Facebook, and Amazon as well. For the prototype, submissions are sent through a simulated version of SCORPION for demonstration purposes. The API would function the same way in both the prototype and the product.

Function                             Real World Product   Prototype
Google Authentication                x                    x
Yahoo Authentication                 x
Facebook Authentication              x
Amazon Authentication                x
Administrator Profiles               x                    x
User Profiles                        x                    x
Automatic Gathering of User Data*    x
Website Statistics                   x                    x
Public API                           x                    x

* Users will manually enter locations with the prototype

TABLE 1 - RWP VS PROTOTYPE

2.2 PROTOTYPE FUNCTIONAL DESCRIPTION

The Sting prototype will give SCORPION a new website with a new look and feel and more functionality for sequence submissions. The home page, illustrated in Figure 1, shows the layout that every web page will share. This format will be consistent, and the contents will be filled in dynamically with information for a logged-in user or administrator. When a user enters an amino acid sequence, the sequence will be checked for invalid characters. If any are found, they will be removed and the user will be alerted.

FIGURE 1- WEBSITE GUI

Another feature of the prototype website will be user logins. Authentication will be provided by the Google oAuth API, which allows the user to authenticate with an existing Google account. After the user logs in, Google sends the user’s name, email address, and unique ID back to the web page. With this information, users will be greeted on the home page by the name associated with their Google account. When submitting a protein sequence, users will not be required to enter an email address manually; if none is entered, the email address associated with the login will be used by default. Upon submission, the database will be updated with the email address, unique ID, and submitted sequence. Logged-in users will also be able to view their previous submissions and results, which are stored during submission.
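
The sanitize-then-store step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Sting implementation: the specification places the character check in embedded JavaScript, and the same logic is shown here in Python. The table layout, the function names (`sanitize`, `store_submission`, `previous_submissions`), the choice of the 20 standard one-letter amino acid codes as the valid alphabet, and the upper-casing of input are all hypothetical.

```python
import sqlite3

# Assumption: the 20 standard one-letter amino acid codes are the valid input.
VALID_CODES = set("ACDEFGHIKLMNPQRSTVWY")


def sanitize(raw):
    """Return (clean_sequence, removed_characters) for a raw submission."""
    upper = raw.upper()  # assumption: lowercase input is accepted
    clean = "".join(ch for ch in upper if ch in VALID_CODES)
    removed = [ch for ch in upper if ch not in VALID_CODES]
    return clean, removed


def store_submission(conn, google_id, login_email, raw_sequence, email=None):
    """Sanitize a sequence and record it against the user's Google ID."""
    # Hypothetical schema: one row per submission.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS submissions ("
        "id INTEGER PRIMARY KEY, google_id TEXT NOT NULL, "
        "email TEXT NOT NULL, sequence TEXT NOT NULL)"
    )
    clean, removed = sanitize(raw_sequence)
    # Fall back to the login email when no address was entered.
    conn.execute(
        "INSERT INTO submissions (google_id, email, sequence) VALUES (?, ?, ?)",
        (google_id, email or login_email, clean),
    )
    conn.commit()
    return clean, removed


def previous_submissions(conn, google_id):
    """List the stored sequences for one user, oldest first."""
    rows = conn.execute(
        "SELECT sequence FROM submissions WHERE google_id = ? ORDER BY id",
        (google_id,),
    )
    return [row[0] for row in rows]
```

With an in-memory database (`sqlite3.connect(":memory:")`), submitting the raw input `mkt9av!` stores `MKTAV` and reports `9` and `!` as removed, which is the information the website would need in order to alert the user.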
Logged-in administrators will have a section on the website showing user statistics as well as previous submission data. Administrators will also be able to log in and use the website as all other users do. Each web page will contain a section of code that sends data about each visitor to Google Analytics. This data will later be collected through the Google Analytics API, which will return information about page views and user locations. Table 2 illustrates the abilities of logged-in users, users who are not logged in, and administrators.

                                   User Type
Function                           Not Logged In   Logged In   Administrator
Submit sequence                    x               x           x
Greet message                                      x           x
View previous submissions                          x           x
View previous submission results                   x           x
Access to page views                                           x
Access to user locations                                       x

TABLE 2- USER CAPABILITIES

The website will be Section 508 compliant. The current website does not conform to the government standards required for federally funded projects, and an audit could result in financial penalties for Old Dominion University. For compliance, the prototype will contain accompanying text that allows users with disabilities who rely on accessibility software to use the website. Users who are visually impaired will be able to use a screen reader, which will read the accompanying text and website directions aloud so that they can navigate the page and submit sequences.

The public API prototype will send submissions through the same system that the website uses. The user will be able to submit an amino acid sequence, accompanied by a subject and an email address, through Internet protocols. Upon submitting, users will receive a unique ID that can be used to retrieve results programmatically. The API will be accessed via the Internet and will use the Hypertext Transfer Protocol (HTTP) methods GET and POST. The process for sending data to and receiving data from the Sting API is illustrated in Figure 2. The POST method will be used to submit a sequence and will return a unique ID.
The GET method will be used to retrieve submission results based upon the ID. All other HTTP methods will be refused, which improves the security of the system.

FIGURE 2- USING THE API

The prototype website and API will submit sequences to a mock SCORPION algorithm. For prototyping purposes, the real version of SCORPION would take too long to process submissions and return results for testing. Instead, Sting will use an algorithm that takes input like the SCORPION neural network but returns a random sequence of amino acid characters. The prototype will be architected in such a way that the real SCORPION system can be easily integrated.

2.3 EXTERNAL INTERFACES

No external interfaces are needed to construct the prototype. The final product will not require any external interfaces either.

3 SPECIFIC REQUIREMENTS

All requirements for the Sting prototype exist in a separate document named “Lab 2 Section 3”. The functional requirements are explicitly listed in section 3.1, and all aspects of the prototype are detailed. Specifications on how to configure external applications are also listed in that document.
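
The submit-and-retrieve flow from Section 2.2 (POST returns a unique ID, GET returns the result for that ID, every other method is refused, and a mock algorithm stands in for SCORPION) can be sketched in Python as follows. This is an in-memory illustration only, not the Sting API. The names `mock_scorpion`, `handle_request`, and `JOBS`, the use of `uuid4` for the unique ID, the 20-letter amino acid alphabet, the choice to return a random string of the same length as the input, and the specific status codes are all assumptions made for the example.

```python
import random
import uuid

# Assumption: the 20 standard one-letter amino acid codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical in-memory job store: unique ID -> result string.
JOBS = {}


def mock_scorpion(sequence):
    """Stand-in for SCORPION: return a random string of amino acid
    characters with the same length as the input (length is an assumption)."""
    return "".join(random.choice(AMINO_ACIDS) for _ in sequence)


def handle_request(method, sequence=None, job_id=None):
    """Dispatch one API call and return (status_code, body)."""
    if method == "POST":
        if not sequence:
            return 400, {"error": "sequence is required"}
        new_id = str(uuid.uuid4())
        JOBS[new_id] = mock_scorpion(sequence)
        return 201, {"id": new_id}
    if method == "GET":
        if job_id not in JOBS:
            return 404, {"error": "unknown id"}
        return 200, {"id": job_id, "result": JOBS[job_id]}
    # Every other HTTP method is refused.
    return 405, {"error": "method not allowed"}
```

A caller would first issue `handle_request("POST", sequence="ACDEF")`, keep the returned `id`, and then pass it as `job_id` in a later `handle_request("GET", ...)` call. Because the mock sits behind a single function, replacing `mock_scorpion` with a call to the real system would leave the dispatch logic unchanged, which is the kind of easy integration the prototype aims for.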