Running Head: Troy M. Connor Lab 2 1 CS411W Lab II Lab 2 – Prototype Specification For READ Prepared by: Troy M. Connor 07/26/2016 Running Head: Troy M. Connor Lab 2 2 TABLE OF CONTENTS 1. INTRODUCTION 3 1.1 Purpose 4 1.2 Scope 5 1.3 5 1.4 Definitions, Acronyms and Abbreviations REFERENCES 11 1.5 Overview 11 2. 12 General Description 2.1 Prototype Architecture Description 12 2.2 Prototype Functional Description 14 2.3 External Interfaces 17 2.3.1 Hardware Interfaces 17 2.3.2 Software Interfaces 17 2.3.3 User Interfaces 18 2.3.4 Communications Protocols and Interfaces 18 FIGURES FIGURE 1 READ PROTOTYPE MAJOR FUNCTIONAL COMPONENT DIAGRAM FIGURE 2. READ USER INTERFACE FIGURE 3. SCHAFER'S SCRAPER FIGURE 4 SITE MAP 13 15 16 16 Running Head: Troy M. Connor Lab 2 3 1. Introduction In the United States there are over 4,700 research institutions (Digest of Education and Statistics). The way that the public views the research by these institutions are done by an online list. Usually the process is manually done and can cause problems when the interested public is looking for specific research interests. These web pages are rarely kept up to date and do not provide an adequate representation of what research or publication was added. Old Dominion University’s Computer Science Department (ODUCS) is among the research institutions with this issue. They face the problem of having a web page dedicated to the listing of publications that has not been updated since 2008. Research grants are also not currently being kept up to date. The problem is that the process of entering grant information and publications into the present system becomes a single person’s responsibility. This makes it difficult for people interested in grant information and publications from Old Dominion’s Computer Science Department. The Repository for Electronic Aggregation of Documents (READ) is being developed by members of ODUCS. Its design will not only serve the Computer Science Department, but other departments as well. It is designed to automate the adding of publication and research grants for the public to view on an online web page. This can resolve the need of finding publications or grants to promote the interest in any research institution. (This section intentionally left blank.) Running Head: Troy M. Connor Lab 2 4 1.1 Purpose READ is an online database that will house grants, publications, or links that will provide a means for viewers to see them using a number of specific filters. The goal is to minimize the need for an author to manually manage their publications through the features that READ will employ. READ will advertise ongoing research, and available grant information to whomever requires it. READ will have the users to work less in order to READ more. READ will search through different websites using a tool known as the web scraper for publications that match a registered authors credentials. The web scraper tool searches the Internet for Old Dominion University’s Computer Science recent publications and research grants and records the necessary data into a database. READ will store the information as a link to the document. The author will receive an email notification when publications have been added to the database that they published. This allows for an efficient way to gather all of an author’s publications into a single location seamlessly, with little extra work on their behalf. The system will allow viewers to browse through READ using filters for grants, multiple types of publications, and authors. Viewing author’s profiles will show personal information. It will also show a graphical representation of both publications they have created, and funding they received through grants they participated in along with a list of publications. This statistical information may inform viewers about a specific author’s area of expertise, level of activity, and a point of contact should they be interested in research of similar topics. (This space is intentionally left blank.) Running Head: Troy M. Connor Lab 2 5 1.2 Scope READ is an online database that will house grants, publications, or links that will provide a means for viewers to see them using a number of specific filters. The goal is to minimize the need for an author to manually manage their publications through the features that READ will employ. READ will advertise ongoing research and available grant. READ will help with the issue of having unorganized lists of publications and grant information. READ will search through different websites using a tool known as the web scraper for publications that match a registered author’s credentials. The web scraper tool searches the Internet for current faculty’s recent publications to store in the database. READ will store the information in a table in the database. The system will allow viewers to browse through READ using filters for grant information, multiple types of publications, and authors. Viewing author’s profiles will show personal information, publications they have created, and funding they received through grants they participated in. This statistical information may inform viewers about a specific author’s area of expertise, level of activity, and a point of contact should they be interested in research of similar topics. (This section intentionally left blank.) Running Head: Troy M. Connor Lab 2 6 1.3 Definitions, Acronyms, and Abbreviations Administrator/Administrative User: a user with increased privileges for editing database content Author: A person that is able to add and edit publications and grants to the system under their name. BibTeX: A file format for reference information in XML format. Computer Science (CS): An academic discipline based on advancing computing theory and algorithm development, that sometimes includes theory about software engineering methods. Client application: In a client/server architecture, the module that takes input and creates queries to be processed by a server, and receives the results from the server. Client/Server Architecture: A software engineering paradigm that separates functionality into a “client” application and a “server” application that interact. CSS: A programming language used to specify presentation of HTML pages Data Mining: The act of going through a source of input to find specific information. Database Schema: A description of the structure of database Running Head: Troy M. Connor Lab 2 7 Funding Agency: The source of funds for research grants. These organizations usually have a limited amount of money to (pass out) principle investigator’s that submit an accepted application for research funds. GIT: A software system for controlling and organizing software versioning. GoogleScholar (http://scholar.google.com): A website that stores academic publications. Graphical User Interface (GUI): A computer interface composed of icons, text fields, menus, etc that can be interacted with via a mouse and keyboard, through which a user interacts with a software application. Internet scraper: A program that is designed to sort through data that is stored online Joomla!: A content management system for designing web interfaces. JQuery Sparklines: A development library for the visualization of data. ODU: Old Dominion University. Running Head: Troy M. Connor Lab 2 8 MicrosoftAcademic (http://academic.research.microsoft.com/): A website that stores information on academic publications MySQL: An implementation of SQL that is open source. Parse: A technical term usually used to describe the processing of a statement written in a programming language. Perl: A widely used programming language on the server-side of web applications. PHP: A widely used programming language on the server-side of web applications. Principle Investigator (PI): The primary researcher that a research grant is bestowed upon, responsible for documenting the work and publishing research results. Publication or Academic Publication: A document created by a faculty member to share research. They are usually published in an academic journals, technical reports, and records of conference proceedings. Query: An algorithm sent to the database to either change the database or get back results READ: Repository for Electronic Aggregation of Documents Running Head: Troy M. Connor Lab 2 9 RSS: A specification for subscribing to and distributing news. Scraper: An automated application designed to scan a source of input such as a document or a website for pertinent information. Server application: In a client/server architecture, the module that takes queries or requests from a client module, process them, and returns the result to the client. Software Compatibility: A description of whether different software, or versions of software, can communicate/interact. SQL: A widely used programming language used to manipulate databases. SQL injection: Performing unauthorized queries on a database for malicious purposes. User Authentication: The process of verifying the access credentials of a user of an automated system, usually accomplished by requesting a username and password combination. Viewer: an outside person who wishes to query the information contained in the READ database. Version Control: A method for organizing and recording different versions of documents that have been created over time. Running Head: Troy M. Connor Lab 2 10 Virtual Private Server (VPS): A software version of a hardware server, used to create independent servers on a single piece of hardware. Webserver: A group of applications run on a computer or VPS in to serve webpages and provide server-side computation for browser-based client applications. XML: Extensible markup language. (This section intentionally left blank.) Running Head: Troy M. Connor Lab 2 11 1.4 References “Delta Cost Project Data.” The Delta project on Postsecondary Education Cost, Productivity, and Accounatablilty. The Delta project n.d. Web 9 Feb 2013 Connor, Troy M. CS 411 Lab I: Product Description for READ. 25 March 2013. 1.5 Overview The READ Prototype will contain a webserver, a web interface, a login interface, and the web scraper tool. It will be modeled using the Old Dominion University Computer Science Department’s publications hosted on a virtual machine. It will be running a Linux-based operating system. It will allow for authors to log on, edit, and upload data, give functional control of the web application to administrators, utilize the Schafer Scraper to find publications, and insert them into the READ database. These functions of the prototype will allow for easy access to numerous publications, grants, and tech reports written by Old Dominion University faculty. The software installed on this virtual machine will search the Internet for publications for ODU’s faculty members to insert into the database. The website that will display the publications and grant information will have filters. These filters will return the intended results to the user. The major components of the READ prototype will act as the Real World Product’s solution to demonstrate the use of the system. (This section intentionally left blank.) Running Head: Troy M. Connor Lab 2 2. 12 General Description The READ prototype will be capable of storing and finding various kinds of publications and research grants. The READ prototype will be able to display this information to various users. It will also be capable of informing the authors of the publications found in order to confirm the authenticity of the publication. The system will allow authors to upload publications or research grants data themselves. The READ prototype will also reduce the various risks that are inherent in this kind of application. The legal risks of displaying copyrighted papers will be reduced by having a READ administrator go through the database and removing any papers that ODU does not have the legal right to display. It will also address various compatibility risks by making alternate versions of the page that should work with various web browsers. Security risks, such as SQL injection, will be taken care of with encryption. It will handle the risk of hard copy publication submission by making users scan documents before uploading them. 2.1 Prototype Architecture Description READ major components are, a link database, a web interface, the web scraper tool, and the Internet. The components READ has makes it able to be implemented to other areas as well as servicing the Old Dominion University Computer Science Department. The link database will be the container that stores all of the authors, publications and grant information. The web interface is the user interface that allows anyone to come to the website and search for results. The web scraper tool is the software generated to search for publications for faculty members and place them in the database. Running Head: Troy M. Connor Lab 2 13 The major components of READ listed in Figure 1 display a representation of what it will look like. Figure 1 READ Prototype Major Functional Component Diagram (This section intentionally left blank.) Running Head: Troy M. Connor Lab 2 14 2.2 Prototype Functional Description The functions that READ provides are searching and filtering for publications and grant information, scraping the Internet for publications, and login interface to allows author controls. Each function provided by READ all correspond with each other. For instance, the web scraper tool finds publications by the authors that are in an external database, and adds them to the READ database. The login interface allows authors to log in, edit their profile, add publications, or add grant information. The search and filter for the viewers allows users to find publications and grant information. Figure 2 displays the flow of how the author can utilize the user interface. Figure 3 shows the implementation of the web scraper tool that finds publications on the Internet for authors in the database. This web scraper will have the publications in the database kept up to date when it is run on intervals. The site map for all users shown in Figure 4 allows users to navigate the site to their liking. These filters will allow any user to navigate through the publications and grant information with ease. (This section intentionally left blank.) Running Head: Troy M. Connor Lab 2 Figure 2. READ User Interface (This section intentionally left blank.) 15 Running Head: Troy M. Connor Lab 2 Figure 3. Schafer's Scraper Figure 4 Site Map 16 Running Head: Troy M. Connor Lab 2 17 2.3 External Interfaces The external interfaces with READ consist of the hardware interface, the software interface, the user interface and the communications protocol. These interfaces are vital to the operation of READ. Each interface is dependent on another to make READ fully functional. If one interface is down, it’s likely that the other interfaces will fall short as well. 2.3.1 Hardware Interfaces The hardware interfaces for READ consists of the link database and the Apache webserver. These interfaces allow for the publications to be stored. These interfaces allow the grant information to be stored and processed. The hardware interfaces not only houses the publications but if it not utilized will affect the whole READ system. READ has to stay online in order to be effective, and these hardware interfaces allow that. The hardware associated with the READ system has to maintain its integrity in order to function fully. 2.3.2 Software Interfaces The software interfaces include the Linux-based virtual machine with all the software installed on it. This software includes but not limited to, PHP, PERL, MYSQL, and PYTHON. This software along with the Schafer scraper allows READ to perform actions that manipulate the data in the database. It allows for READ to add, delete, search, filter, and create anything the author desires. The publications can be searched online with the scraper. The scraper can and will use the hardware interface to find publications and then store them in the database. The software interface is a vital component in READ as everything that functions is dependent on it. Running Head: Troy M. Connor Lab 2 18 2.3.3 User Interfaces The user interface is any computer that is hooked to the Internet. The user can access the web interface that will be the display for all the publications and grant information at a domain we chose. Right not the domain is https://411black.cs.odu.edu and it brings you to a webpage. This webpage allows users or authors to search for publications, while being able to perform a myriad of other functions. The functions that the user will be able to perform are all dedicated to the software we have installed on the virtual machine. Each time a user performs a query, the software performs an action and brings the user back the desired results. The user interface displays the information to the user in a way that seem very static. Each query or function is a dynamic operation. These operations flow by the use of the software coupled by the hardware. 2.3.4 Communications Protocols and Interfaces The communications protocols and interfaces used for READ will be TCP/IP. This protocol allows READ to function online. The functions being performed could be done locally but would serve no purpose. The purpose of READ is to reach users not just at the ODU level, but to everyone who shares an interest with specific research. (This section intentionally left blank.) Running Head: Troy M. Connor Lab 2 19