Lab 2 – Prototype Product Specification
Running Head: LAB 2 – READ REQUIREMENTS

CS 411W Lab II
Prototype Product Specification For READ
Prepared by: Jacob Phillmon, Black Group
Date: April 8, 2013

Table of Contents
1 Introduction
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms, and Abbreviations
1.4 References
1.5 Overview
2 General Description
2.1 Prototype Architecture Description
2.2 Prototype Functional Description
2.3 External Interfaces
2.3.1 Hardware Interfaces
2.3.2 Software Interfaces
2.3.3 User Interfaces
2.3.4 Communication Protocols and Interfaces

1 Introduction

In the United States there are over 4,700 research institutions (Digest of Education Statistics). These institutions publicize their research through publications and upload them to the Internet in order to share them with the online community. Uploading these documents can consume a large amount of time because in most organizations it is a manual process. The workload can be so extensive that some groups, like Old Dominion University’s Computer Science Department, are unable to keep their systems up to date; the latest publication uploaded dates back to 2008. Outdated systems of this nature are a poor representation of a group’s findings; the system should display the information in a way that advertises the organization to the general public.

The organization’s research projects are funded by numerous grants that have been awarded by external funding agencies. Like publications, grants must also be stored in the system, and they are just as tedious to upload as the publications they are associated with.

READ is a repository for electronic aggregation of documents developed by Old Dominion University’s Computer Science Department. It is designed specifically for the needs of that department, but it can be integrated into any online system that displays an organization’s publications and grants. It is designed to automate the process of adding publications and grants and organizing them into a filterable format. Users can filter the stored documents to narrow down topics and locate the ones of relevant interest.
The prototype will provide basic functionality including a user interface, a fully constructed database, user functions such as editing or adding publications or grants, and the automation of publication and grant submissions to the system. It will not include features that are mainly aesthetic, such as graphs that illustrate the number of publications a person has created over the past few years or the amount of grant money a person has earned.

1.1 Purpose

READ, a repository for electronic aggregation of documents, is a system designed specifically for Old Dominion University’s Computer Science Department. The department is composed of a group of faculty members, most of whom produce numerous publications detailing their research every year. In an attempt to organize the faculty’s publications into a single viewable location, the department had a system in place where publications were manually submitted by the faculty and later added to the system manually by the system’s administrator. Updating the system cost so much time that most of the faculty stopped submitting publications altogether. This can be seen in the system’s display itself, as the last submitted publication dates back to the year 2008 (Recent Publications). The page also lacks any filter capabilities; all publications are displayed with those most recently published at the top and older ones toward the bottom. The display page is no longer linked to the department’s homepage because it is out of date and no longer in use. The READ system is designed to encourage use of the new system by automating the process of updating it and by adding browsing capabilities.
Eventually the system may be expanded to other departments at Old Dominion University as well as to other organizations that require a system to organize their publications and grants.

Figure 1 – Major Functional Component Diagram

Figure 1 illustrates the components that will be used within the READ system. The system will be stored on a server owned by Old Dominion University. Major software components within the system include a graphical user interface, a database, and a Scraper.

The system interface will be split into two sections: a public section and a private section. The public section will allow anyone to browse and filter publications or grants stored within the system and to view author profile pages. The private section allows authors to edit or remove the publications they own. The private section will require a login interface that verifies whether the user is a registered author.

A database will be used to house all publication and grant data as well as the authors that are registered within the system. The user interface will communicate with the database in order to display the publications and grants stored within it, and whenever an author submits changes to their profile information or to the publications or grants they own.

The Schaefer Scraper will search specific sites on the Internet and extract publications that are associated with authors stored within the database. It will run on a schedule set by system administrators and will update the database with the most recent publications automatically.

A module called the Prediction Algorithm will be provided on the READ main webpage to determine whether a company has enough storage space to use READ in a way that meets its standards.
The Prediction Algorithm will require the average amount of storage consumed by an author, the average number of uploaded files per author, and the average size of an upload.

READ will allow authors to have their publications and grants automatically added to the system. Only extra information, such as a single thumbnail image, will require manual input from the author. The system will still allow authors to upload publications directly into the system without using the system’s automated features. It will also allow people to view any publications or grants that are currently in the system. Additional information about authors in the READ system will be listed on their designated profile pages, along with any publications or grants they are associated with.

1.2 Scope

The READ prototype aims to store publications and grants created by the department’s faculty in a single location. The system is designed to minimize the amount of time needed to update the system with the most recent publications produced by the department’s faculty. Overall, the system is intended to generate greater interest in Old Dominion University’s Computer Science Department.

The prototype is designed to integrate the Schaefer Scraper into a working database system and display environment. It will be used to demonstrate the functionality of the system to the Old Dominion University Computer Science Department; this demonstration will allow the department to decide on any changes it may want made to the system before it is fully developed. The prototype will use actual publications and grants created by Old Dominion University’s Computer Science faculty in order to demonstrate the effectiveness of the Schaefer Scraper. Additional user interface functionality will also be implemented in order to demonstrate the system’s usage.
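The specification names the three inputs of the Prediction Algorithm (Section 1.1) but not how they are combined. The following is a minimal sketch under stated assumptions: per-author storage is taken to be the average number of files times the average upload size, plus any other per-author storage such as profile data. The names `StorageInputs`, `estimated_storage_mb`, and `has_enough_space`, the `author_count` and `available_mb` parameters, and the use of megabytes are all hypothetical and are not taken from the READ design.

```python
from dataclasses import dataclass


@dataclass
class StorageInputs:
    """Per-author averages named in the specification (units are an assumption)."""
    avg_files_per_author: float        # average number of uploaded files per author
    avg_upload_size_mb: float          # average size of one upload, in megabytes
    avg_other_storage_mb: float = 0.0  # other storage consumed per author (assumed additive)


def estimated_storage_mb(inputs: StorageInputs, author_count: int) -> float:
    """Estimate total storage for author_count authors (hypothetical formula)."""
    per_author = (
        inputs.avg_files_per_author * inputs.avg_upload_size_mb
        + inputs.avg_other_storage_mb
    )
    return author_count * per_author


def has_enough_space(inputs: StorageInputs, author_count: int, available_mb: float) -> bool:
    """Return True if the estimate fits within the storage the organization has available."""
    return estimated_storage_mb(inputs, author_count) <= available_mb
```

For example, with 50 authors, 20 files per author, 2 MB per upload, and 5 MB of other data per author, the estimate is 50 × (20 × 2 + 5) = 2,250 MB, so an organization with 4,096 MB available would pass the check.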
1.3 Definitions, Acronyms, and Abbreviations

Administrator/Administrative User: A user with increased privileges for editing database content.

Author: A person who publishes in an academic journal or other academic venue.

BibTeX: A plain-text file format for bibliographic reference information. It will be used to automatically fill in key information when uploading or editing publications and grants.

Computer Science (CS): An academic discipline based on advancing computing theory and algorithm development, which sometimes includes theory about software engineering methods.

Client application: In a client/server architecture, the module that takes input, creates queries to be processed by a server, and receives the results from the server.

Client/Server Architecture: A software engineering paradigm that separates functionality into a “client” application and a “server” application that interact.

CSS: A style sheet language used to specify the presentation of HTML pages.

Data Mining: The act of going through a source of input to find specific information.

Database Schema: A description of the structure of a database.

Funding Agency: The source of funds for research grants. These organizations usually have a limited amount of money to award to principal investigators who submit an accepted application for research funds.

Git: A software system for controlling and organizing software versioning.

GoogleScholar (http://scholar.google.com): Google Scholar provides a simple way to broadly search for scholarly literature. From one place, you can search across many disciplines and sources: articles, theses, books, abstracts and court opinions, from academic publishers, professional societies, online repositories, universities and other web sites. Google Scholar helps you find relevant work across the world of scholarly research.
Graphical User Interface (GUI): A computer interface composed of icons, text fields, menus, etc. that a user operates with a mouse and keyboard in order to interact with a software application. The term is used to differentiate from a “command-line interface”, in which a user interacts with a software application solely through a text terminal.

Internet Scraper (Web Scraper): Web scraping focuses on the transformation of unstructured data on the web, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet (Wikipedia).

jQuery Sparklines: A development library for the visualization of data.

MicrosoftAcademic (http://academic.research.microsoft.com/): Microsoft Academic Search is a free service developed by Microsoft Research to help scholars, scientists, students, and practitioners quickly and easily find academic content, researchers, institutions, and activities. Microsoft Academic Search indexes not only millions of academic publications, it also displays the key relationships between and among subjects, content, and authors, highlighting the critical links that help define scientific research. Microsoft Academic Search makes it easy for you to direct your search experience in interesting and heretofore hidden directions with its suite of unique features and visualizations.

MySQL: A relational database management system that is queried using SQL.

ODU: Old Dominion University.

Parse: A technical term usually used to describe the processing of a statement written in a programming language. It may be used more generally to describe the processing of any statement for specific meaning.

Perl: A widely used programming language on the server side of web applications.

PHP: A widely used programming language on the server side of web applications.
Principal Investigator (PI): The primary researcher to whom a research grant is awarded, responsible for documenting the work and publishing research results.

Publication or Academic Publication: A document created by a faculty member to share research. Publications usually appear in academic journals, technical reports, and records of conference proceedings.

Query: A request sent to the database either to change its contents or to retrieve results.

READ: Repository for Electronic Aggregation of Documents.

RSS: A system for subscribing to and distributing news.

Scraper: An automated application designed to scan a source of input, such as a document or a website, for pertinent information.

Server application: In a client/server architecture, the module that takes queries or requests from a client module, processes them, and returns the result to the client.

Software Compatibility: A description of whether different software products, or versions of software, can communicate and interact.

SQL: A widely used programming language used to query databases.

SQL injection: An attack that performs unauthorized queries on a database for malicious purposes.

User Authentication: The process of verifying the access credentials of a user of an automated system, usually accomplished by requesting a username and password combination.

Viewer: In the scope of this project, an outside person who wishes to query the information contained in the READ database.

Version Control: A method for organizing and recording different versions of documents that have been created over time.

Virtual Private Server (VPS): A software version of a hardware server. Used to create independent servers (....) on a single piece of hardware.

Webserver: A group of applications run on a computer or VPS in order to serve webpages and provide server-side computation for browser-based client applications.
A web server is a constantly “on” resource whose sole or main job is to respond to HTTP requests from browsers.

XML: Extensible Markup Language.

1.4 References

Digest of Education Statistics. 2011. National Center for Education Statistics. Web. 19 Nov. 2012. <http://nces.ed.gov/programs/digest/d11/tables/dt11_001.asp?referrer=report>.

"Recent Publications." Department of Computer Science. N.p., n.d. Web. 13 Feb. 2013. <http://www.cs.odu.edu/recent_publications.shtml>.

1.5 Overview

This product specification provides the hardware and software configuration, external interfaces, capabilities, and features of the READ prototype. The information provided in the remaining sections of this document includes a detailed description of the hardware, software, and external interface architecture of the READ prototype; the key features of the prototype; the parameters that will be used to control, manage, or establish these features; and the performance characteristics of these features in terms of inputs, outputs, and user interaction.

2 General Description

2.1 Prototype Architecture Description

Figure 2 – Prototype Major Functional Component Diagram

The major hardware and software component structure of the READ prototype is illustrated in Figure 2. The READ system is stored on a Debian virtual machine. Access to the system will require a computer with the ability to browse the Internet. The main software components built within the system are the database, the web-based interface, and the Schaefer Scraper. The database shall be created using MySQL, as the READ team has extensive experience working with it.
All publication and grant data stored in the database will be based on actual publications and grants owned by Old Dominion University’s Computer Science faculty and graduate students. The web-based interface shall be written using PHP and standard HTML, as well as AJAX in order to create a type-ahead publication and grant filter and query system. The Schaefer Scraper is a prebuilt module provided by Andrew Schaefer. It will provide all of the Scraper functionality needed except for the ability to add grants automatically into the READ system.

Table 1 – Features and Capabilities list

Feature: Browsing Capabilities
    Real World Project: Ability to browse all grants and publications.
    Prototype: Ability to browse all grants and publications.

Feature: Publication Filtering Capabilities
    Real World Project: Filtered by title, publisher, authors, publication date, date added, and keywords.
    Prototype: Filtered by title, publisher, authors, publication date, date added, and keywords.

Feature: Grant Filtering Capabilities
    Real World Project: Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state.
    Prototype: Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state.

Feature: Add, edit, and delete publications and grants
    Real World Project: Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document.
    Prototype: Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document.

Feature: Faculty page
    Real World Project: Lists faculty and provides a link to each person’s profile page.
    Prototype: Not included.

Feature: Login interface
    Real World Project: Linked to Old Dominion University Computer Science accounts.
    Prototype: Linked to Old Dominion University Computer Science accounts.

Feature: Profile Page
    Real World Project: Displays the author’s profile picture, job title, email address, personal webpage link, and the author’s publications and grants. Displays graphs.
    Prototype: Displays the author’s profile picture, job title, email address, personal webpage link, and the author’s publications and grants. Graphs not included.

Feature: Scraper
    Real World Project: Will update the system with new publications and grants and alert users when one is added to the system under their name.
    Prototype: Will update the system with publications only and alert users when one is added to the system under their name.

Feature: Prediction algorithm
    Real World Project: Predicts if the consumer has enough space to use the READ system.
    Prototype: Not included.

Feature: Administrative Privileges
    Real World Project: Administrators are able to edit, add, or remove anything in the system.
    Prototype: Administrators are able to edit, add, or remove anything in the system.

Table 1 details the differences between the real world project and the READ prototype. The prototype includes most of the capabilities and features of the real world project, except for a few that are primarily aesthetic or that the department does not need. First, the profile page will not display graphs detailing information about the author’s contributions. The Prediction Algorithm will not be included in the prototype, as it would only be used as a guideline for other groups that may wish to use the READ system. The faculty page will also not be included, as the Computer Science Department already has one on its main page. The department may choose to incorporate links to the profile pages from its own faculty page in the future.

2.2 Prototype Functional Description

The major functional components are shown in Figure 2, and an in-depth description of the system’s interface privileges is illustrated in Figure 3. When a user first visits the READ interface, they will have access only to the system’s viewer privileges, including the ability to view publications, grants, and profile information on each author in the system. The user can then choose to log in to the system.
If the user is authenticated, he or she will be logged in as an author; if not, he or she will still have access only to the viewer privileges. Authors can add publications and grants to the system manually, edit the publications and grants they own, and edit their own profile information. If the account is designated as an administrative account, the user will have access to the following administrative privileges: the ability to remove publications and grants from the system, the ability to edit any publication or grant in the system, the ability to edit anyone’s profile information, and the ability to set the system’s default settings (such as the number of publications or grants that are displayed on a single page). Administrators and authors will still have access to all viewer privileges.

Figure 3: READ interface privileges diagram.

A detailed illustration of the flow of the Schaefer Scraper can be found in Figure 4. The scraper will search for publications created by ODU CS faculty members using predefined publication websites. If a publication is found, the scraper will check whether it has already been added to the system. If the publication is not in the system, the Schaefer Scraper will add it to the READ database and assign ownership of it to the author it had searched for. If the publication is already in the system, the scraper will check whether the author it is searching for already has ownership of it; if the author does not, it will add the author as an owner of the publication. After it has either added the publication to the system or given the specified author ownership privileges, the Schaefer Scraper will send an email informing the author that a publication has been added to the system under their name. When the author checks the email, he or she will be able to indicate whether or not the publication actually belongs to him or her.
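The scraper flow of Figure 4, together with the accept and deny handling described in the remainder of this section, could be sketched as follows. This is an illustrative in-memory model rather than the Schaefer Scraper's actual code: the `Repository` and `Publication` classes, the `record_scraped` and `respond` methods, matching publications by title, and the `outbox` list standing in for email are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    title: str
    # Maps each owning author to True once that author has accepted ownership.
    owners: dict = field(default_factory=dict)

    @property
    def visible(self) -> bool:
        """Assumption: a publication is shown once at least one owner accepted it."""
        return any(self.owners.values())


class Repository:
    """Hypothetical in-memory stand-in for the READ database."""

    def __init__(self):
        self.publications = {}  # title -> Publication (title matching is an assumption)
        self.outbox = []        # (author, title) pairs standing in for notification emails

    def record_scraped(self, title: str, author: str) -> bool:
        """Handle one scraped publication; return True if the author was notified."""
        pub = self.publications.get(title)
        if pub is None:
            # Not in the system yet: add it.
            pub = Publication(title)
            self.publications[title] = pub
        elif author in pub.owners:
            # Already in the system and already owned by this author: nothing to do.
            return False
        # Give the author (unconfirmed) ownership and send the notification.
        pub.owners[author] = False
        self.outbox.append((author, title))
        return True

    def respond(self, title: str, author: str, accepts: bool) -> None:
        """Apply the author's answer to the notification email."""
        pub = self.publications[title]
        if accepts:
            pub.owners[author] = True
            return
        # Denied: drop this owner, and drop the publication if no owner remains.
        del pub.owners[author]
        if not pub.owners:
            del self.publications[title]
```

Keeping unconfirmed owners in the same mapping as confirmed ones is one simple way to let a denial either remove a single owner or remove the publication entirely, depending on whether anyone else is still attached to it.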
If the author denies ownership of the publication, the system will check whether anyone else has ownership of it. If no one else does, the system removes the publication altogether; if someone else does, the system only removes the author from the ownership list. If the author accepts ownership of the publication, the system either authorizes the publication to be shown in the system or authorizes the user as an accepted owner of the publication.

Figure 4: Scraper Flow Diagram

2.3 External Interfaces

External interfaces will be limited to standard PC hardware and freely available software. The only custom interface will be the READ interface.

2.3.1 Hardware Interfaces

No hardware interfaces will be built for this prototype. A PC will be used to demonstrate the READ system. The READ system will be hosted on an ODU Debian virtual machine.

2.3.2 Software Interfaces

Group members will interact with the READ MySQL database through the PuTTY terminal client. The READ web-based interface will be built using PHP, AJAX, JavaScript, and XML. The code will be developed using standard text editing tools such as Notepad++ and Emacs. The login interface will use the ODU CS department’s login system for authentication.

2.3.3 User Interfaces

Figure 5 represents the site map of the READ user interface. From the READ homepage it is possible to reach one’s own user profile page, as well as the publication and grant query pages. Selecting one of the authors associated with a specific grant or publication on a query page opens that author’s profile page. From one’s own profile page it is possible to add grants and publications to the system and to edit existing ones.
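Which of these pages' actions a user may perform depends on the privilege tiers of Section 2.2 (Figure 3). The check could be sketched as below; this is only an illustration, and the `Role` enum, the `is_allowed` function, and the action names are hypothetical. Following Section 2.2, removal is treated here as an administrative privilege.

```python
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"                # anyone who has not logged in
    AUTHOR = "author"                # authenticated ODU CS account
    ADMINISTRATOR = "administrator"  # account designated as administrative


def is_allowed(role: Role, action: str, user=None, owners=()) -> bool:
    """Decide whether a user with the given role may perform an action.

    `owners` is the collection of users who own the publication, grant,
    or profile being acted on (hypothetical representation).
    """
    if action == "view":
        # Viewer privileges are available to every role.
        return True
    if role is Role.ADMINISTRATOR:
        return action in {"add", "edit", "remove", "configure"}
    if role is Role.AUTHOR:
        if action == "add":
            return True
        if action == "edit":
            # Authors may edit only what they own, including their own profile.
            return user in owners
    return False
```

For instance, `is_allowed(Role.AUTHOR, "edit", user="alice", owners={"alice"})` holds, while the same call with `owners={"bob"}` does not.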
Figure 5: Site Map (pages shown: READ Homepage, Publication, Grant, Administration, User Profile, Add Publications, Add Grants, Edit Publications, Edit Grants)

2.3.4 Communication Protocols and Interfaces

HTTPS, rather than plain HTTP, will be used in order to create a secure connection with the READ system. The only additional external interface used with the system will be the author’s CS email system.