Software Requirements Specification

1. Introduction

1.1 Purpose

This software requirements specification (SRS) describes the functions and requirements of the web-based ESTMD system. The document is used during the specification process and serves as the baseline for developing, validating, and testing the software. The intended audience for the SRS is the faculty and researchers of the Biology department of Kansas State University.

1.2 Scope

This SRS defines the requirements of the web-based ESTMD system. The system focuses on efficiently searching sequences at different levels (such as raw sequences, cleaned sequences, and assembled sequences) and related information (such as Gene Ontology and pathways).

1.3 Definitions

ESTMD – Expressed Sequence Tag Model Database.
Java Servlets – Java programs that run within a web server environment.
JSP – JavaServer Pages. It provides a simplified, fast way to create dynamic web content.
XML – Extensible Markup Language. A subset of SGML constituting a particular text markup language for interchange of structured data. The Unicode Standard is the reference character set for XML content.
XSLT – Extensible Stylesheet Language Transformations. An XSLT style sheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses a formatting vocabulary, such as (X)HTML or XSL-FO.

1.4 References

IEEE Std 830-1998, “IEEE Recommended Practice for Software Requirements Specifications”, 1998 Edition, IEEE, 1998.
Marty Hall, “Core Servlets and JavaServer Pages”, Prentice Hall PTR, 2000.
ESTAP, http://www.vbi.vt.edu/~estap

1.5 Overview

This document describes the requirements for the web-based ESTMD system. Section 2 gives an overall description of the system, including its major components and product design. Section 3 provides the specific requirements of the different components and the performance criteria.

2. Overall Description

This section provides an overview of the key requirements of the web-based ESTMD system. It is intended for general information only and does not describe all the details of the various items.

2.1 Product perspective

This product is a web-based database system. Its main function is to let users query information by entering keywords or IDs through web interfaces. Users may also download sequences from the database or submit data to it.

2.1.1 System Interfaces

Access data from the database
Handle data submission

2.1.2 User Interfaces

All the user interfaces are web-based.

Main page
Figure 1. Snapshot of the main page, showing all the functions and search tools.

Search in Detail
Figure 2. Snapshot of the “Search in Detail” page.

Search by Keyword
Figure 3. Snapshot of the “Search by Keyword” page.

Gene Ontology
Figure 4. Snapshot of the “Gene Ontology” page.

GO Classification
Figure 5. Snapshot of the “Gene Ontology Classification” page.

Pathway
Figure 6. Snapshot of the “Pathway” page.

Downloads
Figure 7. Snapshot of the “Downloads” page.

Data Submission
Figure 8. Snapshot of the “Data Submission” page.

2.1.3 Hardware Interfaces

Server side:
    Speed: Pentium 4 Processor at 2.8 GHz
    CPU Architecture: x86
    Network/connection architecture: TCP/IP, HTTP protocol
    Storage: 120 GB Ultra ATA/100 Hard Drive
    Memory: 1 GB PC 1066 RDRAM
Client side:
    Network connection

2.1.4 Software Interfaces

Server side:
    Operating system: The software can be run on multiple platforms such as Microsoft Windows, Linux, and UNIX systems.

    Name               Mnemonic  Version   Source
    Microsoft Windows  Windows   2000, XP  Microsoft Corp.
    UNIX               UNIX      V1.1.7    Sun Corp.
    Red Hat Linux      Linux     V9.0      Red Hat Corp.

    Web server: Apache 2.0
    Database server: MySQL 4.0
Client side:
    Internet browser: Internet Explorer or Netscape

2.2 Product functions (use case)

Figure 9. Use Case for ESTMD System (diagram author: Yinghua Dong). Use cases shown: User Login, Search in Detail, Search by Keyword, Contig View, Gene Ontology, Tree View, GO Classification, Pathway, Download, Data Submission.

The product has the following main functions, as shown in Figure 9:
Search detailed information on different levels of sequences
Search general information on different levels of sequences
Search gene ontology information
Classify the gene ontology of the sequences
Search pathway information
Download or submit data

2.3 User characteristics

Users need to know how to use a mouse, a keyboard, and an Internet browser. The user interface will be friendly enough to guide the users. Basic knowledge of biology is preferred.

2.4 Constraints

The main constraint of the project is the MySQL 4.0 database management system. MySQL is faster than Oracle on small to medium-sized databases and is easy to administer, but it is less powerful on complex queries. Another constraint is that some data are not yet available: some related databases need to be downloaded from other web sites or obtained from the labs of the Biology department.

3. Specific Requirements

3.1 External Interfaces

There are 7 main web pages as user interfaces in this system. The detailed descriptions of all inputs and outputs of the system are as follows:

3.1.0 Login
Inputs: user name and password.
Outputs: if correct, show the main page; otherwise, show an error message.

3.1.1 Search in Detail
Inputs: choose items from drop-down boxes; type a gene symbol, gene name, or any type of ID (such as unique sequence ID, clone ID, FlyBase ID, GenBank ID, or accession ID); and check the check boxes of the features that users expect in the results.
Outputs: the corresponding features according to users' selections.
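
As a rough illustration of the Search in Detail inputs and outputs described in 3.1.1, the sketch below keeps only the features a user has checked. It is a minimal sketch, not the ESTMD implementation: the class name, the method name, the feature names used as keys, and the sample values are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of the ESTMD code base.
final class SearchInDetailSketch {

    // Returns only the checked features, in the order the user checked them.
    // Checked features that the record does not contain are skipped.
    static Map<String, String> selectFeatures(Map<String, String> record,
                                              List<String> checkedFeatures) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String feature : checkedFeatures) {
            String value = record.get(feature);
            if (value != null) {
                result.put(feature, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up record standing in for one database hit.
        Map<String, String> record = new LinkedHashMap<>();
        record.put("clone ID", "CLONE-1");
        record.put("unique sequence ID", "SEQ0001");
        record.put("gene symbol", "abc");

        // The user checked two of the three available features.
        System.out.println(selectFeatures(record, Arrays.asList("gene symbol", "clone ID")));
    }
}
```

Keeping the selection in insertion order means the result columns follow the order of the user's choices, which is one possible design and not something the specification requires.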

3.1.2 Search by Keyword
Inputs: choose items from drop-down boxes and type a keyword.
Outputs: a table that includes clone ID, raw sequence length, cleaned sequence length, unique sequence ID, unique sequence length, gene name, and gene symbol.

3.1.3 Gene Ontology Search
Inputs: choose items from drop-down boxes; type a single gene symbol/name or ID, or choose a local file containing a batch of sequence IDs or FlyBase IDs; and select the gene ontology type and “sort by” options with radio buttons.
Outputs: a results table that includes GO ID, term, type, sequence ID, hit ID (FlyBase ID), and gene symbol. The hyperlinks on terms show the Gene Ontology tree structure.

3.1.4 Gene Ontology Classification
Inputs: type a batch of gene symbols/names, or choose a local file containing sequence IDs; and check the check boxes of the gene ontology types to classify.
Outputs: a table that includes gene ontology type, subtype, sequence count, and percentage of sequences.

3.1.5 Pathway Search
Inputs: choose items from drop-down boxes; type a single gene symbol/name, ID, EC number, or pathway name, or choose a local file containing a batch of sequence IDs or FlyBase IDs; and select the “sort by” and “search scope” options with radio buttons.
Outputs: a results table that includes pathway name, category, sequence ID, EC number, and enzyme count.

3.1.6 Downloads
Inputs: click the item to download.
Outputs: the corresponding sequence information.

3.1.7 Data Submission
Inputs: data information and user information.
Outputs: a success or failure message.

3.2 Functions

The validity of inputs will be checked on the client side. Errors and exceptions will be handled on the server side.

3.3 Logical Database Requirements

Figure 10 shows the Entity-Relationship model of ESTMD.

3.4 Software System Attributes

3.4.1 Efficiency

With traditional CGI, a new process is started for each HTTP request.
However, with servlets, the Java virtual machine stays running and handles each request with a lightweight Java thread. If there are N requests to the same CGI program, the code for the CGI program is loaded into memory N times. With servlets, only a single copy of the servlet class is loaded. This approach reduces server memory requirements and saves time by instantiating fewer objects. Servlets remain in memory even after they complete a response, so it is straightforward to store arbitrarily complex data between client requests.

3.4.2 Platform-independence

Servlets are the Java platform technology of choice for extending and enhancing web servers. They provide a component-based, platform-independent method for building web-based applications.

Figure 10. E-R Model for ESTMD

3.4.3 Convenience

Web interfaces make the system easy to use. Users only need to know how to use a web browser and do not need to download, install, or learn any special software.

3.4.4 Reliability

HTML with JavaScript will validate user input on the client side. Exceptions and errors on the server side will be handled by Java exception handling.

3.4.5 Security

In traditional CGI, the programs are often executed by operating system shells and written in languages that do not automatically check array or string bounds. Servlets suffer from neither of these problems. Even if a servlet executes a system call to invoke a program on the local operating system, it does not use a shell to do so. Array bounds checking and other memory protection features are a central part of the Java programming language.

A three-tier structure helps keep the data safe. The client tier does not communicate directly with the database. To send or receive data, it must communicate with the application-server tier, which in turn communicates with the data-server tier.
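
The three-tier separation described above can be illustrated with a small sketch. This is not the ESTMD implementation: the class names (`DataTier`, `ApplicationTier`), the in-memory map standing in for the MySQL database, the ID format check, and the sample ID and gene name are all assumptions made for illustration. It only shows a client calling the application tier, which validates the request and is the sole caller of the data tier.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the three tiers; the main method plays the client tier.
final class ThreeTierSketch {

    // Data-server tier: an in-memory map stands in for the database.
    static final class DataTier {
        private final Map<String, String> geneNameBySequenceId = new HashMap<>();

        DataTier() {
            // Made-up sample row, for illustration only.
            geneNameBySequenceId.put("SEQ0001", "example gene");
        }

        Optional<String> findGeneName(String sequenceId) {
            return Optional.ofNullable(geneNameBySequenceId.get(sequenceId));
        }
    }

    // Application-server tier: the only code that holds a reference to DataTier.
    static final class ApplicationTier {
        private final DataTier dataTier;

        ApplicationTier(DataTier dataTier) {
            this.dataTier = dataTier;
        }

        // Validates the client's input before any data access, then formats a reply.
        String handleQuery(String sequenceId) {
            if (sequenceId == null || !sequenceId.matches("[A-Za-z0-9]+")) {
                return "error: invalid sequence ID";
            }
            return dataTier.findGeneName(sequenceId)
                    .map(name -> "gene name: " + name)
                    .orElse("no match");
        }
    }

    public static void main(String[] args) {
        // The client sees the application tier only, never the data tier.
        ApplicationTier app = new ApplicationTier(new DataTier());
        System.out.println(app.handleQuery("SEQ0001")); // known ID
        System.out.println(app.handleQuery("SEQ0002")); // unknown ID
        System.out.println(app.handleQuery("x'; --"));  // rejected before data access
    }
}
```

Because every request passes through `handleQuery`, malformed input is rejected in one place before it can reach the data tier, which is the safety property the three-tier paragraph relies on.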