Running Head: Troy M. Connor Lab 2 1 CS411W Lab II

advertisement
Running Head: Troy M. Connor Lab 2
1
CS411W Lab II
Lab 2 – Prototype Specification
For
READ
Prepared by: Troy M. Connor
07/26/2016
Running Head: Troy M. Connor Lab 2
2
TABLE OF CONTENTS
1.
INTRODUCTION
3
1.1 Purpose
4
1.2 Scope
5
1.3
5
1.4
Definitions, Acronyms and Abbreviations
REFERENCES
11
1.5 Overview
11
2.
12
General Description
2.1 Prototype Architecture Description
12
2.2 Prototype Functional Description
14
2.3 External Interfaces
17
2.3.1 Hardware Interfaces
17
2.3.2 Software Interfaces
17
2.3.3 User Interfaces
18
2.3.4 Communications Protocols and Interfaces
18
FIGURES
FIGURE 1 READ PROTOTYPE MAJOR FUNCTIONAL COMPONENT DIAGRAM
FIGURE 2. READ USER INTERFACE
FIGURE 3. SCHAFER'S SCRAPER
FIGURE 4 SITE MAP
13
15
16
16
Running Head: Troy M. Connor Lab 2
3
1. Introduction
In the United States there are over 4,700 research institutions (Digest of Education and
Statistics).
The way that the public views the research by these institutions are done by an
online list. Usually the process is manually done and can cause problems when the interested
public is looking for specific research interests. These web pages are rarely kept up to date and
do not provide an adequate representation of what research or publication was added. Old
Dominion University’s Computer Science Department (ODUCS) is among the research
institutions with this issue. They face the problem of having a web page dedicated to the listing
of publications that has not been updated since 2008. Research grants are also not currently
being kept up to date. The problem is that the process of entering grant information and
publications into the present system becomes a single person’s responsibility. This makes it
difficult for people interested in grant information and publications from Old Dominion’s
Computer Science Department.
The Repository for Electronic Aggregation of Documents (READ) is being developed by
members of ODUCS. Its design will not only serve the Computer Science Department, but other
departments as well. It is designed to automate the adding of publication and research grants for
the public to view on an online web page. This can resolve the need of finding publications or
grants to promote the interest in any research institution.
(This section intentionally left blank.)
Running Head: Troy M. Connor Lab 2
4
1.1 Purpose
READ is an online database that will house grants, publications, or links that will provide
a means for viewers to see them using a number of specific filters. The goal is to minimize the
need for an author to manually manage their publications through the features that READ will
employ. READ will advertise ongoing research, and available grant information to whomever
requires it. READ will have the users to work less in order to READ more.
READ will search through different websites using a tool known as the web scraper for
publications that match a registered authors credentials. The web scraper tool searches the
Internet for Old Dominion University’s Computer Science recent publications and research
grants and records the necessary data into a database. READ will store the information as a link
to the document. The author will receive an email notification when publications have been
added to the database that they published. This allows for an efficient way to gather all of an
author’s publications into a single location seamlessly, with little extra work on their behalf.
The system will allow viewers to browse through READ using filters for grants, multiple
types of publications, and authors. Viewing author’s profiles will show personal information. It
will also show a graphical representation of both publications they have created, and funding
they received through grants they participated in along with a list of publications. This statistical
information may inform viewers about a specific author’s area of expertise, level of activity, and
a point of contact should they be interested in research of similar topics.
(This space is intentionally left blank.)
Running Head: Troy M. Connor Lab 2
5
1.2 Scope
READ is an online database that will house grants, publications, or links that will provide
a means for viewers to see them using a number of specific filters. The goal is to minimize the
need for an author to manually manage their publications through the features that READ will
employ. READ will advertise ongoing research and available grant. READ will help with the
issue of having unorganized lists of publications and grant information.
READ will search through different websites using a tool known as the web scraper for
publications that match a registered author’s credentials. The web scraper tool searches the
Internet for current faculty’s recent publications to store in the database. READ will store the
information in a table in the database.
The system will allow viewers to browse through READ using filters for grant
information, multiple types of publications, and authors. Viewing author’s profiles will show
personal information, publications they have created, and funding they received through grants
they participated in. This statistical information may inform viewers about a specific author’s
area of expertise, level of activity, and a point of contact should they be interested in research of
similar topics.
(This section intentionally left blank.)
Running Head: Troy M. Connor Lab 2
6
1.3 Definitions, Acronyms, and Abbreviations
Administrator/Administrative User: a user with increased privileges for editing database content
Author: A person that is able to add and edit publications and grants to the system under their
name.
BibTeX: A file format for reference information in XML format.
Computer Science (CS): An academic discipline based on advancing computing theory and
algorithm development, that sometimes includes theory about software engineering
methods.
Client application: In a client/server architecture, the module that takes input and creates
queries to be processed by a server, and receives the results from the server.
Client/Server Architecture: A software engineering paradigm that separates functionality into a
“client” application and a “server” application that interact.
CSS: A programming language used to specify presentation of HTML pages
Data Mining: The act of going through a source of input to find specific information.
Database Schema: A description of the structure of database
Running Head: Troy M. Connor Lab 2
7
Funding Agency: The source of funds for research grants. These organizations usually have a
limited amount of money to (pass out) principle investigator’s that submit an accepted
application for research funds.
GIT: A software system for controlling and organizing software versioning.
GoogleScholar (http://scholar.google.com): A website that stores academic publications.
Graphical User Interface (GUI): A computer interface composed of icons, text fields, menus, etc
that can be interacted with via a mouse and keyboard, through which a user interacts with
a software application.
Internet scraper: A program that is designed to sort through data that is stored online
Joomla!: A content management system for designing web interfaces.
JQuery Sparklines: A development library for the visualization of data.
ODU: Old Dominion University.
Running Head: Troy M. Connor Lab 2
8
MicrosoftAcademic (http://academic.research.microsoft.com/): A website that stores information
on academic publications
MySQL: An implementation of SQL that is open source.
Parse: A technical term usually used to describe the processing of a statement written in a
programming language.
Perl: A widely used programming language on the server-side of web applications.
PHP: A widely used programming language on the server-side of web applications.
Principle Investigator (PI): The primary researcher that a research grant is bestowed upon,
responsible for documenting the work and publishing research results.
Publication or Academic Publication: A document created by a faculty member to share
research. They are usually published in an academic journals, technical reports, and records of
conference proceedings.
Query: An algorithm sent to the database to either change the database or get back results
READ: Repository for Electronic Aggregation of Documents
Running Head: Troy M. Connor Lab 2
9
RSS: A specification for subscribing to and distributing news.
Scraper: An automated application designed to scan a source of input such as a document or a
website for pertinent information.
Server application: In a client/server architecture, the module that takes queries or requests from
a client module, process them, and returns the result to the client.
Software Compatibility: A description of whether different software, or versions of software, can
communicate/interact.
SQL: A widely used programming language used to manipulate databases.
SQL injection: Performing unauthorized queries on a database for malicious purposes.
User Authentication: The process of verifying the access credentials of a user of an automated
system, usually accomplished by requesting a username and password combination.
Viewer: an outside person who wishes to query the information contained in the READ database.
Version Control: A method for organizing and recording different versions of documents that
have been created over time.
Running Head: Troy M. Connor Lab 2
10
Virtual Private Server (VPS): A software version of a hardware server, used to create
independent servers on a single piece of hardware.
Webserver: A group of applications run on a computer or VPS in to serve webpages and provide
server-side computation for browser-based client applications.
XML: Extensible markup language.
(This section intentionally left blank.)
Running Head: Troy M. Connor Lab 2
11
1.4 References
“Delta Cost Project Data.” The Delta project on Postsecondary Education Cost, Productivity,
and Accounatablilty. The Delta project n.d. Web 9 Feb 2013
Connor, Troy M. CS 411 Lab I: Product Description for READ. 25 March 2013.
1.5 Overview
The READ Prototype will contain a webserver, a web interface, a login interface, and the
web scraper tool. It will be modeled using the Old Dominion University Computer Science
Department’s publications hosted on a virtual machine. It will be running a Linux-based
operating system. It will allow for authors to log on, edit, and upload data, give functional
control of the web application to administrators, utilize the Schafer Scraper to find publications,
and insert them into the READ database. These functions of the prototype will allow for easy
access to numerous publications, grants, and tech reports written by Old Dominion University
faculty. The software installed on this virtual machine will search the Internet for publications
for ODU’s faculty members to insert into the database. The website that will display the
publications and grant information will have filters. These filters will return the intended results
to the user. The major components of the READ prototype will act as the Real World Product’s
solution to demonstrate the use of the system.
(This section intentionally left blank.)
Running Head: Troy M. Connor Lab 2
2.
12
General Description
The READ prototype will be capable of storing and finding various kinds of publications
and research grants. The READ prototype will be able to display this information to various
users. It will also be capable of informing the authors of the publications found in order to
confirm the authenticity of the publication. The system will allow authors to upload publications
or research grants data themselves.
The READ prototype will also reduce the various risks that are inherent in this kind of
application. The legal risks of displaying copyrighted papers will be reduced by having a READ
administrator go through the database and removing any papers that ODU does not have the
legal right to display. It will also address various compatibility risks by making alternate versions
of the page that should work with various web browsers. Security risks, such as SQL injection,
will be taken care of with encryption. It will handle the risk of hard copy publication submission
by making users scan documents before uploading them.
2.1 Prototype Architecture Description
READ major components are, a link database, a web interface, the web scraper tool, and
the Internet. The components READ has makes it able to be implemented to other areas as well
as servicing the Old Dominion University Computer Science Department. The link database will
be the container that stores all of the authors, publications and grant information. The web
interface is the user interface that allows anyone to come to the website and search for results.
The web scraper tool is the software generated to search for publications for faculty members
and place them in the database.
Running Head: Troy M. Connor Lab 2
13
The major components of READ listed in Figure 1 display a representation of what it will look
like.
Figure 1 READ Prototype Major Functional Component Diagram
(This section intentionally left blank.)
Running Head: Troy M. Connor Lab 2
14
2.2 Prototype Functional Description
The functions that READ provides are searching and filtering for publications and grant
information, scraping the Internet for publications, and login interface to allows author controls.
Each function provided by READ all correspond with each other. For instance, the web scraper
tool finds publications by the authors that are in an external database, and adds them to the
READ database. The login interface allows authors to log in, edit their profile, add publications,
or add grant information. The search and filter for the viewers allows users to find publications
and grant information. Figure 2 displays the flow of how the author can utilize the user interface.
Figure 3 shows the implementation of the web scraper tool that finds publications on the Internet
for authors in the database. This web scraper will have the publications in the database kept up to
date when it is run on intervals. The site map for all users shown in Figure 4 allows users to
navigate the site to their liking. These filters will allow any user to navigate through the
publications and grant information with ease.
(This section intentionally left blank.)
Running Head: Troy M. Connor Lab 2
Figure 2. READ User Interface
(This section intentionally left blank.)
15
Running Head: Troy M. Connor Lab 2
Figure 3. Schafer's Scraper
Figure 4 Site Map
16
Running Head: Troy M. Connor Lab 2
17
2.3 External Interfaces
The external interfaces with READ consist of the hardware interface, the software
interface, the user interface and the communications protocol. These interfaces are vital to the
operation of READ. Each interface is dependent on another to make READ fully functional. If
one interface is down, it’s likely that the other interfaces will fall short as well.
2.3.1 Hardware Interfaces
The hardware interfaces for READ consists of the link database and the Apache webserver.
These interfaces allow for the publications to be stored. These interfaces allow the grant information to
be stored and processed. The hardware interfaces not only houses the publications but if it not utilized
will affect the whole READ system. READ has to stay online in order to be effective, and these hardware
interfaces allow that. The hardware associated with the READ system has to maintain its integrity in
order to function fully.
2.3.2 Software Interfaces
The software interfaces include the Linux-based virtual machine with all the software
installed on it. This software includes but not limited to, PHP, PERL, MYSQL, and PYTHON.
This software along with the Schafer scraper allows READ to perform actions that manipulate
the data in the database. It allows for READ to add, delete, search, filter, and create anything the
author desires. The publications can be searched online with the scraper. The scraper can and
will use the hardware interface to find publications and then store them in the database. The
software interface is a vital component in READ as everything that functions is dependent on it.
Running Head: Troy M. Connor Lab 2
18
2.3.3 User Interfaces
The user interface is any computer that is hooked to the Internet. The user can access the
web interface that will be the display for all the publications and grant information at a domain
we chose. Right not the domain is https://411black.cs.odu.edu and it brings you to a webpage.
This webpage allows users or authors to search for publications, while being able to perform a
myriad of other functions. The functions that the user will be able to perform are all dedicated to
the software we have installed on the virtual machine. Each time a user performs a query, the
software performs an action and brings the user back the desired results. The user interface
displays the information to the user in a way that seem very static. Each query or function is a
dynamic operation. These operations flow by the use of the software coupled by the hardware.
2.3.4 Communications Protocols and Interfaces
The communications protocols and interfaces used for READ will be TCP/IP. This
protocol allows READ to function online. The functions being performed could be done locally
but would serve no purpose. The purpose of READ is to reach users not just at the ODU level,
but to everyone who shares an interest with specific research.
(This section intentionally left blank.)
Running Head: Troy M. Connor Lab 2
19
Download