Lab 2 – Prototype Product Specification for READ
CS 411W Lab II
Version 1
Prototype Product Specification
For
READ
Prepared by: Marcus Zehr – Black Team
Date: 04/8/2013
Table of Contents
1. Introduction
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms, and Abbreviations
1.4 References
1.5 Overview
2. General Description
2.1 Prototype Architecture Description
2.2 Prototype Functional Description
2.3 External Interfaces
2.3.1 Hardware Interfaces
2.3.2 Software Interfaces
2.3.3 User Interfaces
2.3.4 Communication Protocols / Interfaces
List of Figures
Figure 1. READ’s MFCD
Figure 2. READ’s Web Layout
List of Tables
Table 1. READ Prototype Vs. RWP
1. Introduction
There are more than 4,700 research institutions in the United States (Digest of Education
Statistics). These institutions can display their research through abundant publications and
upload them to the Internet. However, many of the organizations associated with these
institutions lack an efficient method or procedure for uploading and maintaining documents such
as these publications. This is a problem for research institutions and is important to fix so that
institutions may be appropriately recognized for work which is completed. In doing this, research
institutions may further advertise specific areas of research being performed at any given time.
The current process for many organizations to present their publications online is non-automated, slow, and tedious, which leaves many areas to improve upon in order to properly
display and advertise an institution’s publications. One main reason for this is that the
responsibility to update the system may rest upon a sole individual or administrator. This is the
issue that will be addressed through the development of the READ web application for research
institutions.
The Repository for Electronic Aggregation of Documents, READ, aims to automate the
currently manual process of submitting an organization’s publications and grant information. It
will also keep this information better organized and allow publications and grants to be
searchable using filters while being displayed in an easy-to-read format. In addition,
READ will provide the ability for users to verify that their grant and publication information is
correct. Ultimately, READ will ease the burden of keeping up with numerous publications,
allowing researchers to spend more time working and less time managing files.
1.1 Purpose
READ is an online database with a range of capabilities. It stores information about
publications and grants, along with links to publications hosted on outside sites. It provides a way
for users to browse and search through publications using a number of filters such as author,
publish date, keywords, and type of document. It allows a user to advertise their research, both
past and current, as well as information about any grants which apply to them. It minimizes the
need for a user to manually organize their publications by utilizing the Schaefer Scraper to
automatically find publications on the Internet. READ, however, will not provide access to
copyrighted material or gather research material for anyone outside of Old Dominion University.
The initial consumer for READ is Old Dominion University’s Computer Science Department
(ODUCS). Dr. Michele Weigle, a professor at ODU with a Ph.D. in Computer Science,
requested a solution for this particular issue and is acting as group mentor for this project. The
ODUCS Department features 37 faculty members, 11 currently enrolled Ph.D. students, and 11
Master’s students according to its website. These are all individuals who would be able to take
advantage of a system which could make it easier to discover relevant and up to date research,
but they do not begin to cover the number of people who would find READ to be an
indispensable resource in the future.
Once testing is complete at ODUCS, READ may be utilized by other departments within
Old Dominion University. Potentially, READ could then be used at other universities, government
institutions, research institutions, and libraries to help with their organizational and research
needs.
1.2 Scope
The Schaefer Scraper will be used by READ to search for publications matching a registered
user’s credentials and extract pertinent information found in the publications such as the title,
author or authors, publication date, and type of publication. This information is then inserted into
the database along with a link to that specific publication and a notification email is sent to that
user to authorize the newly uploaded publication information. Based on the actions of the user
and patterns of denied publications, READ will learn when not to notify a user to authorize
certain publications when they are found.
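The specification does not say how READ learns from denied publications. One simple way to approximate the behavior is to count denials per user for a given kind of match and stop notifying once a threshold is reached. The sketch below is illustrative only: the class name `DenialTracker`, the `threshold` parameter, and the idea of keying denials on a venue name are all assumptions, not part of the READ design.

```python
from collections import Counter


class DenialTracker:
    """Hypothetical helper: counts denied publications per (user, key) pair."""

    def __init__(self, threshold=3):
        # threshold is an assumed tuning parameter, not taken from the specification
        self.threshold = threshold
        self.denials = Counter()

    def record_denial(self, user, key):
        # key could be a venue, a co-author name, or a title pattern (assumption)
        self.denials[(user, key)] += 1

    def should_notify(self, user, key):
        # Keep notifying until this kind of match has been denied `threshold` times
        return self.denials[(user, key)] < self.threshold


tracker = DenialTracker(threshold=2)
tracker.record_denial("jdoe", "Journal of Unrelated Work")
print(tracker.should_notify("jdoe", "Journal of Unrelated Work"))  # True
tracker.record_denial("jdoe", "Journal of Unrelated Work")
print(tracker.should_notify("jdoe", "Journal of Unrelated Work"))  # False
```

A counter with a fixed threshold is the simplest possible rule; a production system might weigh recent denials more heavily or allow the user to reset the behavior.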
With READ in place, any user will be able to browse the database using an assortment of
filters for publications. Publications may be searched by publish date, author or keywords, and
whether or not the full text is available. Grants may be searched by total amount, grant status,
funding agency, or investigators.
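The publication filters described above could be combined into a single database query. The following is a minimal sketch, not the READ implementation: it uses Python with an in-memory SQLite database as a stand-in for the PHP and MySQL stack named later in this document, and the table name, column names, and sample rows are invented for illustration. Filter values are passed as bound parameters rather than concatenated into the SQL text, which is the usual guard against the SQL injection defined in Section 1.3.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database named in the specification.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publications ("
    "title TEXT, author TEXT, publish_date TEXT, keywords TEXT, full_text INTEGER)"
)
conn.executemany(
    "INSERT INTO publications VALUES (?, ?, ?, ?, ?)",
    [
        ("Example Paper A", "J. Doe", "2012-05-01", "web archiving", 1),
        ("Example Paper B", "J. Doe", "2010-09-15", "networking", 0),
        ("Example Paper C", "A. Smith", "2012-11-20", "web archiving", 1),
    ],
)


def search_publications(conn, author=None, keyword=None, year=None, full_text=None):
    """Build a WHERE clause from whichever filters were supplied."""
    clauses, params = [], []
    if author is not None:
        clauses.append("author = ?")
        params.append(author)
    if keyword is not None:
        clauses.append("keywords LIKE ?")
        params.append(f"%{keyword}%")
    if year is not None:
        clauses.append("substr(publish_date, 1, 4) = ?")
        params.append(str(year))
    if full_text is not None:
        clauses.append("full_text = ?")
        params.append(1 if full_text else 0)
    sql = "SELECT title FROM publications"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY title"
    # Values travel as bound parameters, never as part of the SQL string
    return [row[0] for row in conn.execute(sql, params)]


print(search_publications(conn, author="J. Doe", year=2012))  # ['Example Paper A']
```

A grant search by funding agency, status, or investigator would follow the same pattern against a grants table.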
Each user will have a profile which will display information including the user’s name, job
title, personal photo, email address, affiliated organization, and homepage. The user’s profile
page will also include graphical representations of the number of publications they have authored,
when those publications were published, and any funding associated with those
publications. In addition to the graphical representations, the profile page for each user will
include a list of that specific user’s publications.
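The profile graphs need a per-year count of a user's publications. A small sketch of that aggregation is shown here, assuming publication dates are stored as ISO-style strings; the function name `publications_per_year` is hypothetical. The resulting list of counts is the kind of numeric series a charting library such as jQuery Sparklines (listed in Section 1.3) could plot.

```python
from collections import Counter


def publications_per_year(publish_dates):
    """Return (year, count) pairs for every year between the first and last publication."""
    if not publish_dates:
        return []
    # Dates are assumed to be ISO-style strings such as "2012-05-01"
    years = [int(date[:4]) for date in publish_dates]
    counts = Counter(years)
    # Include years with zero publications so the chart has no gaps
    return [(year, counts[year]) for year in range(min(years), max(years) + 1)]


series = publications_per_year(["2010-09-15", "2012-05-01", "2012-11-20"])
print(series)  # [(2010, 1), (2011, 0), (2012, 2)]
```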
The READ prototype will be able to store and view numerous publications, grants, and types
of publications. The publications which are stored on the READ database will also be searchable
as well as any grant information pertaining to those publications. It will also have the ability to
inform its users of any publications found using the Schaefer Scraper in order to confirm the
authenticity of the publications in the database. The table below will display the differences
between the final version of READ and its prototype.
| Features | Real World Project | Prototype |
| --- | --- | --- |
| Browsing Capabilities | Ability to browse all grants and publications | Ability to browse all grants and publications |
| Publication Filtering Capabilities | Filtered by title, publisher, authors, publication date, date added, and keywords. | Filtered by title, publisher, authors, publication date, date added, and keywords. |
| Grant Filtering Capabilities | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state. | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state. |
| Add, edit, and delete publications and grants | Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document. | Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document. |
| Faculty page | Lists faculty and provides a link to each person’s profile page | Not included. |
| Login interface | Linked to Old Dominion University Computer Science accounts | Linked to Old Dominion University Computer Science accounts |
| Profile Page | Displays author’s profile picture, job title, email address, personal webpage link, and the author’s publications and grants. Displays graphs. | Displays author’s profile picture, job title, email address, personal webpage link, and the author’s publications and grants. Graphs not included. |
| Scraper | Will update the system with new publications and grants and alert users when one is added to the system under their name. | Will update the system with publications only and alert users when one is added to the system under their name. |
| Prediction algorithm | Predicts if the consumer has enough space to use the READ system. | Not included. |
| Administrative Privileges | Administrators are able to edit, add, or remove anything in the system. | Administrators are able to edit, add, or remove anything in the system. |
Table 1 – READ Prototype Vs. RWP
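Table 1 notes that form fields can be filled in automatically from a BibTeX document. The sketch below shows one way such pre-filling could work for simple entries. It is an illustration under stated assumptions, not the READ implementation: the function name `parse_bibtex_entry` is hypothetical, and the parser handles only flat entries whose values are wrapped in braces or quotes or are bare numbers. Nested braces and `@string` macros are not supported.

```python
import re


def parse_bibtex_entry(text):
    """Parse one simple BibTeX entry into a dict (no nested braces or @string macros)."""
    header = re.search(r"@(\w+)\s*\{\s*([^,\s]+)\s*,", text)
    if header is None:
        raise ValueError("not a BibTeX entry")
    fields = {"type": header.group(1).lower(), "key": header.group(2)}
    # Matches: name = {value}   or   name = "value"   or   name = 2012
    pattern = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)"|(\d+))')
    for match in pattern.finditer(text, header.end()):
        name = match.group(1).lower()
        # Exactly one of the three value groups is set for each match
        value = next(g for g in match.groups()[1:] if g is not None)
        fields[name] = value.strip()
    return fields


sample = """@article{doe2012example,
  author = {Doe, Jane and Smith, Alex},
  title = {An Example Title},
  journal = "Example Journal",
  year = 2012
}"""

fields = parse_bibtex_entry(sample)
print(fields["title"])                 # An Example Title
print(fields["author"].split(" and ")) # ['Doe, Jane', 'Smith, Alex']
```

For real-world BibTeX files, a dedicated parsing library would be a safer choice than a regular expression.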
1.3 Definitions, Acronyms, and Abbreviations
Administrator/Administrative User: A user with increased privileges for editing database content.
Author: A person who is able to add publications and grants to the system under their
name and edit them.
BibTeX: A plain-text file format for bibliographic reference information. It will be used to automatically
fill in key information when uploading or editing publications and grants.
Computer Science (CS): An academic discipline based on advancing computing theory and
algorithm development that sometimes includes theory about software engineering
methods.
Client application: The module that takes input and creates queries to be processed by a server,
and receives the results from the server.
Client/Server Architecture: A software engineering paradigm that separates functionality into a
“client” application and a “server” application that interact.
CSS: A style sheet language used to specify the presentation of HTML pages.
Data Mining: The act of going through a source of input to find specific information.
Database Schema: A description of the structure of a database.
Funding Agency: The source of funds for research grants.
Git: A software system for controlling and organizing software versioning.
Google Scholar: A search engine primarily used to find academic literature.
Graphical User Interface (GUI): A computer interface composed of icons, text fields, menus, etc.,
through which a user interacts with a software application via a mouse and keyboard.
Internet scraper: A program which takes unstructured data on the web and puts it into structured
data that can be stored and analyzed in a central local database or spreadsheet.
jQuery Sparklines: A jQuery plugin for generating small inline charts to visualize data.
ODU: Old Dominion University.
Microsoft Academic: A free service developed by Microsoft Research to help scholars, scientists,
students, and practitioners quickly and easily find academic content, researchers,
institutions, and activities.
MySQL: An open source relational database management system.
Parse: To process a statement for specific meanings.
Perl: A general-purpose scripting language widely used for text processing and server-side web applications.
PHP: A widely used programming language on the server side of web applications.
Principal Investigator (PI): The primary researcher that a research grant is bestowed upon,
responsible for documenting the work and publishing research results.
Publication or Academic Publication: A document published in an academic journal, technical
report, or record of conference proceedings.
Query: A command sent to the database to either change the database or get back results.
READ: Repository for Electronic Aggregation of Documents
RSS: A dialect of XML for subscribing to and distributing news.
RWP: Real World Project.
Scraper: An automated application designed to scan a source of input such as a document or a
website for pertinent information.
Server application: In a client/server architecture, the module that takes queries or requests from
a client module, processes them, and returns the result to the client.
Software Compatibility: A description of whether different software, or versions of software, can
communicate/interact.
SQL: A widely used language for querying and modifying databases.
SQL injection: Performing unauthorized queries on a database for malicious purposes.
User Authentication: The process of verifying the access credentials of a user of an automated
system, usually accomplished by requesting a username and password combination.
Viewer: An outside person who wishes to query the information contained in the READ
database.
Version Control: A method for organizing and recording different versions of documents that
have been created over time.
Virtual Private Server (VPS): A software version of a hardware server used to create
independent servers on a single piece of hardware.
Web server: A constantly “on” resource, made up of a group of applications, whose sole or main job is to
respond to HTTP requests from browsers.
XML: Extensible Markup Language.
1.4 References
Digest of Education Statistics. 2011. National Center for Education Statistics. Web. 19 Nov
2012.
<http://nces.ed.gov/programs/digest/d11/tables/dt11_001.asp?referrer=report>.
1.5 Overview
This product specification provides the hardware and software configuration, external interfaces,
capabilities and features of the READ prototype. The information which is provided in the remainder of
this document includes a detailed description of the hardware, software, and internal design of the READ
prototype; the key features of the prototype; and the parameters that will be used to control, manage, or
establish those features.
2. General Description
READ will combine simple hardware and software components, integrated so that users can
carry out its functions with ease.
Figure 1 below illustrates the major functional components of READ. The READ solution
consists of a single server which will house three main software components: a web interface, a
publication link database, and the Schaefer Scraper.
2.1 Prototype Architecture Description
Figure 1 – READ’s MFCD
The web interface itself will have both public and private areas available for access to users.
The private areas will require a user to log on to their account in order to access and will allow
for the user to perform various tasks. The user will then be allowed access to their own profile
page and administrative abilities. Inside the web interface the user may access the search filters
for publications and grants as well as other users’ public profiles.
The publication link database’s main function is to house and provide links to any external
publications and grant information to the users. This database will also contain files which have
been uploaded by the users including publications, grants, and other files which may be related
to either.
The last internal component is the Schaefer Scraper, an automated tool that will
search specific external web sites for new publications by the authors listed in an XML file
provided as input. The scraper will do this by looking for publications by the included
authors, collecting and parsing the results, and then exporting them into the READ link database for
further use.
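The scraper flow described above (read authors from XML, collect results, parse them, export them) can be outlined in a few lines. This is a sketch under assumptions, not the Schaefer Scraper itself: the XML layout, the function names `load_authors` and `run_scraper`, and the record fields are all hypothetical, and the site-specific scraping step is represented by a caller-supplied `fetch` function.

```python
import xml.etree.ElementTree as ET

# Assumed input layout; the actual XML schema is not given in this specification.
AUTHORS_XML = """<authors>
  <author><name>Jane Doe</name></author>
  <author><name>Alex Smith</name></author>
</authors>"""


def load_authors(xml_text):
    """Read author names from the input XML."""
    root = ET.fromstring(xml_text)
    names = [node.findtext("name", default="").strip() for node in root.findall("author")]
    return [name for name in names if name]


def run_scraper(xml_text, fetch):
    """Collect records for every author, skipping duplicate titles per author.

    fetch(name) is a placeholder for the real site-specific scraping and is
    expected to return a list of dicts with at least a "title" key.
    """
    records, seen = [], set()
    for name in load_authors(xml_text):
        for record in fetch(name):
            key = (name, record["title"].lower())
            if key not in seen:
                seen.add(key)
                records.append({"author": name, **record})
    return records  # ready to be inserted into the link database


def fake_fetch(name):
    # Stand-in results, including a duplicate that differs only in letter case
    return [
        {"title": "Example Paper", "url": "http://example.org/paper.pdf"},
        {"title": "example paper", "url": "http://example.org/paper-copy.pdf"},
    ]


records = run_scraper(AUTHORS_XML, fake_fetch)
print(len(records))  # 2
```

Passing the fetch step in as a function keeps the author loading and de-duplication logic testable without network access.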
2.2 Prototype Functional Description
The READ prototype is intended to organize publications and the grant information
associated with them for Old Dominion University. The prototype
will be modeled using the Old Dominion University Computer Science Department’s
publications and hosted on a virtual machine running a Linux-based operating system. It will
include most of the features of the real world project, as summarized in Table 1, and will help
maintain publications and grants for the university.
The READ prototype will have the ability to search through the database and filter the results
based on user queries, provide RSS feeds, allow users to log on, edit, and upload data, and
give functional control of the web application to administrators. By utilizing the Schaefer Scraper,
it will also find publications and insert them into the READ database. The users of this system
will be able to log on and access their own publication and grant information and have the ability
to edit this information. They will also be able to upload personal information and upload files to
the READ database. These functions of the prototype will allow for easy access to numerous
publications, grants, and tech reports written by Old Dominion University faculty and students.
This will all be located in a well-organized and easy to navigate user interface which will utilize
a filter and search page, displays for publications and grants, a profile display system, and an RSS
feed.
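The RSS feed mentioned above is an XML document with a channel element that contains one item per entry. A minimal sketch of producing such a feed for newly added publications follows. The function name `build_rss`, the channel description text, and the example URLs are assumptions made for illustration; the specification does not define the feed contents.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, items):
    """Build a minimal RSS 2.0 document as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Recently added publications"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")


feed = build_rss(
    "READ: New Publications",
    "http://example.org/read",
    [{"title": "Example Paper A", "link": "http://example.org/read/pub/1"}],
)
print(feed)
```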
2.3 External Interfaces
This section goes over the software and devices used within the READ prototype. READ
requires the use of particular hardware and software to operate and contains a graphical user
interface.
2.3.1 Hardware Interfaces
READ will require a computer with an active connection to the Internet in order to perform
tasks. The user will reach the web interface from this computer by opening a web browser and
navigating to the correct URL from the ODU Computer Science Department home page.
2.3.2 Software Interfaces
The READ prototype will allow a user to log onto the system via a
web-based interface. This interface will then let the user search for or add publications to the
database, which is publicly viewable. The Schaefer Scraper will also run on a pre-defined
schedule in order to populate the database with information, both initially and on a regular basis,
as defined by the administrators of the READ program.
READ will be programmed using PHP and will use MySQL, as this language and open source
database software are better suited to the needs of this project than the alternatives and will allow a large and
detailed database structure to be accessed with ease on the Internet. This prototype also
incorporates an off-the-shelf piece of software named the Schaefer Scraper, which was
built using PHP and returns search results in HTML format. For the purpose of the prototype, the
Schaefer Scraper will be scraping information from Google Scholar, Microsoft Academic,
Arnetminer, Scopus, and Google Citation.
2.3.3 User Interfaces
Figure 2 – READ’s Web Layout
The user interface for the READ prototype is designed for use on a desktop computer but can
also be used on mobile devices with access to the Internet. Figure 2 displays the flow of
the web site’s GUI design and navigation links.
2.3.4 Communication Protocols / Interfaces
READ will use Transmission Control Protocol/Internet Protocol (TCP/IP) in order to
ensure that data found by the Schaefer Scraper is delivered reliably to the
link database on the server.