Lab2 Version 1 - ODU Computer Science

Lab 2 – Prototype Product Specification 1
Running Head: LAB 2 – READ REQUIREMENTS
CS 411W Lab II
Prototype Product Specification
For
READ
Prepared by: Jacob Phillmon, Black Group
Date: April 8, 2013
1 Introduction
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms, and Abbreviations
1.4 References
1.5 Overview
2 General Description
2.1 Prototype Architecture Description
2.2 Prototype Functional Description
2.3 External Interfaces
2.3.1 Hardware Interfaces
2.3.2 Software Interfaces
2.3.3 User Interfaces
2.3.4 Communication Protocols and Interfaces
1 Introduction
In the United States there are over 4,700 research institutions (Digest of Education
Statistics). These institutions publicize their research through publications and upload them to
the Internet in order to share them with the online community. Uploading these
documents can consume a large amount of time because in most organizations it is a manual
process. The workload can be so extensive that some groups, like Old Dominion University’s
Computer Science Department, are unable to keep their systems up to date; the latest publication
uploaded dates back to 2008. Outdated systems of this nature are a poor representation of a
group’s findings; the system should display the information in a way that advertises the
organization to the general public. Funding the organization’s research projects are numerous
grants that have been awarded by external funding agencies. Like publications, grants must also
be stored in the system, and are just as tedious to upload as the publications they are associated
with.
READ is a repository for electronic aggregation of documents developed by Old
Dominion University’s Computer Science department. It is designed specifically for the needs of
Old Dominion University’s Computer Science department, but it can be integrated into any
online system that displays an organization’s publications and grants. It is designed to automate the
process of adding and organizing publications and grants into a filterable format. It will also give
users the option to filter what they are looking for, allowing them to narrow down topics and
locate ones of relevant interest. The prototype will provide basic functionality including a user
interface, a fully constructed database, user functions such as adding or editing publications or
grants, and the automation of publication and grant submissions to the system. It will not include
features that mainly provide aesthetic functionality, such as graphs that illustrate the number of
publications a person has created over the past few years or the amount of grant money a person
has earned.
1.1 Purpose
READ, a repository for electronic aggregation of documents, is a system designed
specifically for Old Dominion University’s Computer Science Department. The department is
composed of a group of faculty members, most of whom produce numerous publications
detailing their research every year. In an attempt to organize the faculty’s publications into a
single viewable location, the department had a system in place where publications were manually
submitted by the faculty and later added to the system manually by the system’s administrator.
Updating the system this way took so much time that most of the faculty
stopped submitting publications altogether. This can be seen in the system’s display itself, as the
last submitted publication dates back to the year 2008 (Recent Publications). The page also lacks
any filter capabilities; all publications are displayed with those most recently published at the top
and older ones toward the bottom. The display page is no longer linked to the department’s homepage
because it is out of date and no longer in use. The READ system is designed to encourage use of
the new system by automating the process of updating it as well as by adding
browsing capabilities. Eventually the system may be expanded to other
departments at Old Dominion University as well as other organizations that require a system to
organize their publications and grants.
Figure 1 – Major Functional Component Diagram
Figure 1 illustrates the components that will be used within the READ system. The system
will be stored on a server owned by Old Dominion University. Major software components
within the system include a graphical user interface, a database, and a Scraper. The system
interface will be split up into two sections: a public section and a private section. The public
section will allow anyone to browse and filter publications or grants stored within the system as
well as allow the user to view author profile pages. The private section allows authors to edit or
remove publications over which they have ownership. The private section
will require a login interface that will validate whether or not the user is a valid author. A
database will be used to house all publication and grant data as well as the authors registered
within the system. The user interface will communicate with the database in order to display
the publications and grants stored within it, or when an author submits changes to their profile
information or to publications or grants they have ownership over. The Schaefer Scraper will search
specific sites on the Internet and extract publications that are associated with authors stored
within the database. It will run on a schedule set by system administrators and will update the
database with the most recent publications automatically. A module called the Prediction
Algorithm will be provided on the READ main webpage to determine whether an organization has enough
storage space to use READ for its needs. The Prediction Algorithm will
require the average amount of storage consumed by an author, the average number of uploaded
files per author, and the average size of an upload.
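The specification names the Prediction Algorithm’s inputs but not its formula. The Python sketch below shows one plausible way to combine them, offered only as an illustration. It assumes that “storage consumed by an author” means fixed per-author overhead (profile data, a thumbnail) kept separate from uploaded files, that sizes are in megabytes, and that the number of authors is also known, an input the estimate needs even though the list above omits it. All class, field, and function names are hypothetical, and the example numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class StorageProfile:
    """Inputs for the estimate; units (MB) and field names are assumptions."""
    num_authors: int
    avg_author_storage_mb: float  # assumed: per-author overhead such as profile data
    avg_files_per_author: float
    avg_upload_size_mb: float


def required_storage_mb(p: StorageProfile) -> float:
    """Estimate total storage: per-author overhead plus that author's uploads."""
    per_author = p.avg_author_storage_mb + p.avg_files_per_author * p.avg_upload_size_mb
    return p.num_authors * per_author


def has_enough_storage(p: StorageProfile, available_mb: float) -> bool:
    """True when the estimated requirement fits in the available space."""
    return required_storage_mb(p) <= available_mb


if __name__ == "__main__":
    # Made-up example values, not measurements from ODU.
    profile = StorageProfile(num_authors=40, avg_author_storage_mb=2.0,
                             avg_files_per_author=25, avg_upload_size_mb=1.5)
    print(required_storage_mb(profile))                    # 1580.0
    print(has_enough_storage(profile, available_mb=2048))  # True
```

A real version would likely add a growth margin, since the averages describe current usage rather than future uploads.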
READ will allow authors to have their publications and grants automatically added to the
system. Only extra information such as a single thumbnail image will require manual input from
the author. The system will still allow authors to upload publications
directly into the system without having to use the system’s automated features. It will also allow
people to view any publications or grants that are currently in the system. Additional information
for authors in the READ system will be listed on their designated profile pages, as well as any
publications or grants they are associated with.
1.2 Scope
The READ prototype is intended to store publications and grants created by the department’s
faculty in a single location. The system is designed to minimize the amount of time needed to
update it with the most recent publications produced by the department’s faculty.
Overall, the system is intended to generate greater interest in Old Dominion University’s Computer
Science Department. The prototype is designed to integrate the Schaefer Scraper into a
working database system and display environment. It will be used to demonstrate the
functionality of the system to the Old Dominion University Computer Science department; this
demonstration will allow the department to decide on any changes it may want made to the system
before it is fully developed. The prototype will use actual publications and grants created by Old
Dominion University’s Computer Science faculty in order to demonstrate the effectiveness of the
Schaefer Scraper. Additional user interface functionality will also be implemented in order to
demonstrate the system’s usage.
1.3 Definitions, Acronyms, and Abbreviations
Administrator/Administrative User: a user with increased privileges for editing database
content
Author: a person who publishes in an academic journal or other academic venue
BibTeX: A plain-text file format for bibliographic reference information. It will be used to
automatically fill in key information when uploading or editing publications and grants.
Computer Science (CS): An academic discipline based on advancing computing theory and
algorithm development, that sometimes includes theory about software engineering
methods.
Client application: In a client/server architecture, the module that takes input and creates
queries to be processed by a server, and receives the results from the server.
Client/Server Architecture: A software engineering paradigm that separates functionality into a
“client” application and a “server” application that interact.
CSS: A style sheet language used to specify the presentation of HTML pages
Data Mining: The act of going through a source of input to find specific information.
Database Schema: A description of the structure of a database
Funding Agency: The source of funds for research grants. These organizations usually have a
limited amount of money to award to principal investigators who submit an accepted
application for research funds.
Git: A software system for controlling and organizing software versioning.
GoogleScholar (http://scholar.google.com): Google Scholar provides a simple way to broadly
search for scholarly literature. From one place, you can search across many disciplines
and sources: articles, theses, books, abstracts and court opinions, from academic
publishers, professional societies, online repositories, universities and other web sites.
Google Scholar helps you find relevant work across the world of scholarly research.
scholar.google.com
Graphical User Interface (GUI): A computer interface composed of icons, text fields, menus,
etc. that can be interacted with via a mouse and keyboard, through which a user interacts
with a software application. Used to differentiate from a “command-line interface”, in
which a user interacts with a software application solely through a text terminal.
Internet scraper (web scraper): a program that performs web scraping, which (per Wikipedia)
focuses on the transformation of unstructured data on the web, typically in HTML format, into
structured data that can be stored and analyzed in a central local database or spreadsheet.
jQuery Sparklines: A development library for the visualization of data.
ODU: Old Dominion University.
MicrosoftAcademic (http://academic.research.microsoft.com/): Microsoft Academic Search is a
free service developed by Microsoft Research to help scholars, scientists, students, and
practitioners quickly and easily find academic content, researchers, institutions, and
activities. Microsoft Academic Search indexes not only millions of academic
publications, it also displays the key relationships between and among subjects, content,
and authors, highlighting the critical links that help define scientific research. Microsoft
Academic Search makes it easy for you to direct your search experience in interesting
and heretofore hidden directions with its suite of unique features and visualizations.
MySQL: A relational database management system that is queried using SQL.
Parse: A technical term usually used to describe the processing of a statement written in a
programming language. May be used generally to describe the processing of any
statement for specific meaning.
Perl: A widely-used programming language on the server-side of web applications.
PHP: A widely-used programming language on the server-side of web applications.
Principal Investigator (PI): The primary researcher to whom a research grant is awarded,
responsible for documenting the work and publishing research results.
Publication or Academic Publication: A document created by a faculty member to share
research. Publications usually appear in academic journals, technical reports, and
records of conference proceedings.
Query: A request sent to the database either to change the database or to retrieve results
READ: Repository for Electronic Aggregation of Documents
RSS: A system for subscribing to and distributing news.
Scraper: An automated application designed to scan a source of input such as a document or a
website for pertinent information.
Server application: In a client/server architecture, the module that takes queries or requests
from a client module, processes them, and returns the results to the client.
Software Compatibility: A description of whether different software packages, or versions of software,
can communicate/interact.
SQL: A widely-used programming language used to query databases.
SQL injection: Performing unauthorized queries on a database for malicious purposes.
User Authentication: The process of verifying the access credentials of a user of an automated
system, usually accomplished by requesting a username and password combination.
Viewer: In the scope of this project, an outside person who wishes to query the information
contained in the READ database.
Version Control: A method for organizing and recording different versions of documents that
have been created over time.
Virtual Private Server (VPS): A software version of a hardware server. Used to create
independent servers (....) on a single piece of hardware.
Webserver: A group of applications run on a computer or VPS in order to serve webpages and
provide server-side computation for browser-based client applications. A web server is a
constantly “on” resource whose sole or main job is to respond to HTTP requests from
browsers.
XML: Extensible Markup Language.
1.4 References
Digest of Education Statistics: 2011. National Center for Education Statistics. Web. 19 Nov. 2012.
<http://nces.ed.gov/programs/digest/d11/tables/dt11_001.asp?referrer=report>.
"Recent Publications." Department Of Computer Science. N.p., n.d. Web. 13 Feb. 2013.
<http://www.cs.odu.edu/recent_publications.shtml>.
1.5 Overview
This product specification provides the hardware and software configuration, external
interfaces, capabilities and features of the READ prototype. The information provided in the
remaining sections of this document includes a detailed description of the hardware, software,
and external interface architecture of the READ prototype; the key features of the prototype; the
parameters that will be used to control, manage, or establish these features; and the performance
characteristics of these features in terms of inputs, outputs, and user interaction.
2 General Description
2.1 Prototype Architecture Description
Figure 2 – Prototype Major Functional Component Diagram
The major hardware and software component structure of the READ prototype is
illustrated in Figure 2. The READ system is stored on a Debian virtual machine. Access to the
system will require a computer with the ability to browse the Internet. The main software components
built within the system are the database, the Web-Based interface, and the Schaefer Scraper. The
database shall be created using MySQL, as it is software the READ team
has extensive experience working with. All publication and grant data stored in the database will
be based on actual publications and grants owned by Old Dominion University’s Computer
Science faculty and graduate students. The Web-Based interface shall be written using PHP and
standard HTML, as well as AJAX in order to create a type-ahead publication and grant filter and
query system. The Schaefer Scraper is a prebuilt module provided by Andrew Schaefer. It will
provide all the functional capability of the Scraper needed except for the ability to add grants
automatically into the READ system.
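A type-ahead filter needs a server-side routine that returns matches as the user types. The prototype specifies PHP for this; the Python sketch below only illustrates the matching logic, under stated assumptions: a case-insensitive substring match over the title, publisher, author, and keyword fields listed for publication filtering, with the newest results first and at most a hypothetical `limit` of suggestions. The `Publication` class, the `type_ahead` function, and the sample records are invented for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Publication:
    title: str
    publisher: str
    authors: List[str]
    published: date
    keywords: List[str] = field(default_factory=list)


def type_ahead(pubs: List[Publication], query: str, limit: int = 5) -> List[str]:
    """Return up to `limit` titles whose searchable fields contain `query`.

    Matching is a case-insensitive substring test over title, publisher,
    authors, and keywords; results are ordered newest first.
    """
    q = query.strip().lower()
    if not q:
        return []  # an empty search box shows no suggestions
    matches = [
        p for p in pubs
        if any(q in text.lower() for text in (p.title, p.publisher, *p.authors, *p.keywords))
    ]
    matches.sort(key=lambda p: p.published, reverse=True)
    return [p.title for p in matches[:limit]]


if __name__ == "__main__":
    # Invented sample records for illustration only.
    pubs = [
        Publication("Web Archiving at Scale", "ACM", ["A. Smith"], date(2012, 5, 1), ["archives"]),
        Publication("Scraping Scholarly Metadata", "IEEE", ["B. Jones", "A. Smith"],
                    date(2013, 1, 15), ["scraper"]),
        Publication("Graph Databases", "Springer", ["C. Lee"], date(2011, 3, 2)),
    ]
    print(type_ahead(pubs, "smi"))  # ['Scraping Scholarly Metadata', 'Web Archiving at Scale']
    print(type_ahead(pubs, "scr"))  # ['Scraping Scholarly Metadata']
```

In the AJAX design, the browser would send the current contents of the search box on each keystroke and render the returned titles as suggestions. With a large repository, the linear scan would give way to an indexed database query.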
| Feature | Real World Project | Prototype |
|---|---|---|
| Browsing Capabilities | Ability to browse all grants and publications | Ability to browse all grants and publications |
| Publication Filtering Capabilities | Filtered by title, publisher, authors, publication date, date added, and keywords. | Filtered by title, publisher, authors, publication date, date added, and keywords. |
| Grant Filtering Capabilities | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state. | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state. |
| Add, edit, and delete publications and grants | Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document. | Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document. |
| Faculty page | Lists faculty and provides a link to each person’s profile page | Not included. |
| Login interface | Linked to Old Dominion University Computer Science accounts | Linked to Old Dominion University Computer Science accounts |
| Profile Page | Displays authors’ profile picture, job title, email address, personal webpage link, and the author’s publications and grants. Displays graphs. | Displays authors’ profile picture, job title, email address, personal webpage link, and the author’s publications and grants. Graphs not included. |
| Scraper | Will update the system with new publications and grants and alert users when one is added to the system under their name. | Will update the system with publications only and alert users when one is added to the system under their name. |
| Prediction algorithm | Predicts if the consumer has enough space to use the READ system. | Not included. |
| Administrative Privileges | Administrators are able to edit, add, or remove anything in the system. | Administrators are able to edit, add, or remove anything in the system. |

Table 1 – Features and Capabilities list
Table 1 details the differences between the real world project and the READ prototype.
The prototype itself includes most of the capabilities and features of the real world project
except for a few that are primarily aesthetic. First, the profile page will not display graphs
detailing information about the author’s contributions. The Prediction Algorithm will not be
included in the prototype, as it would only be used as a guideline for other groups that may wish
to use the READ system. The faculty page will also not be included, as the Computer Science
department already has one on its main page. The department may choose to incorporate links
to the profile pages from its own faculty page in the future.
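Table 1 lists BibTeX autofill for the add and edit forms in both versions. The Python sketch below illustrates that step; it is not the prototype’s PHP code. It handles a single entry whose values are wrapped in braces, wrapped in double quotes, or given as bare words. It deliberately does not handle nested braces (for example `{The {GPU} Era}`), string macros, or `#` concatenation, all of which a full BibTeX parser must support. The form field names returned by `to_publication_form` are hypothetical.

```python
import re

# One entry: @type{key, ...fields...}
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$", re.DOTALL)
# One field: name = {value} | "value" | bareword   (nested braces are NOT supported)
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)"|(\w+))\s*,?')


def parse_bibtex_entry(text: str) -> dict:
    """Parse one simple BibTeX entry into a dict keyed by lowercase field name."""
    m = ENTRY_RE.match(text.strip())
    if m is None:
        raise ValueError("input does not look like a single BibTeX entry")
    entry_type, key, body = m.groups()
    fields = {"entry_type": entry_type.lower(), "key": key}
    for fm in FIELD_RE.finditer(body):
        # Exactly one of the three value groups matched; take it.
        value = next(g for g in fm.group(2, 3, 4) if g is not None)
        fields[fm.group(1).lower()] = " ".join(value.split())  # collapse line breaks
    return fields


def to_publication_form(fields: dict) -> dict:
    """Map BibTeX fields onto hypothetical READ form fields."""
    return {
        "title": fields.get("title", ""),
        # BibTeX separates multiple authors with the word "and".
        "authors": [a.strip() for a in fields.get("author", "").split(" and ") if a.strip()],
        "publisher": fields.get("publisher") or fields.get("journal") or fields.get("booktitle", ""),
        "year": fields.get("year", ""),
    }


if __name__ == "__main__":
    # Invented sample entry.
    sample = """@inproceedings{smith2013,
  title     = {Scraping Scholarly Metadata},
  author    = {Smith, A. and Jones, B.},
  booktitle = "Proceedings of an Example Workshop",
  year      = 2013
}"""
    print(to_publication_form(parse_bibtex_entry(sample)))
```

Where no venue field is present, the form falls back to empty strings, so the author can still complete the remaining fields by hand.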
2.2 Prototype Functional Description
The major functional components are shown in Figure 2, and an in-depth description of the
system’s interface privileges is illustrated in Figure 3. When a user first visits the READ
interface, they will have access only to the system’s viewer privileges, including the ability to
view publications, grants, and profile information on each author in the system. The user can
then choose to log in to the system. If he or she is an authenticated user, he or she will be logged in as
an author; if not, he or she will still only have access to the viewer privileges. Authors can add
publications and grants to the system manually, edit publications and grants they have ownership
of, and edit their own profile information. If an author’s account is designated as an administrative
account, he or she will have access to the following administrative privileges: the ability to
remove publications and grants from the system, the ability to edit any publication or grant in the
system, the ability to edit anyone’s profile information, and the ability to set the system’s default
settings (such as the number of publications or grants that are displayed on a single page).
Administrators and authors will still have access to all viewer privileges.
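The privilege rules above amount to a small role-based access check. The Python sketch below encodes them as this section states them, so only administrators may remove documents. The `Role` names, action strings, and function signatures are all hypothetical. The prototype itself would presumably perform such checks in its PHP code after the ODU CS login has authenticated the user.

```python
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"
    AUTHOR = "author"
    ADMIN = "admin"


# Hypothetical action names. Each role keeps every privilege of the role below it,
# matching "Administrators and authors will still have access to all viewer privileges".
VIEWER_ACTIONS = frozenset({"view"})
AUTHOR_ACTIONS = VIEWER_ACTIONS | {"add", "edit_owned", "edit_own_profile"}
ADMIN_ACTIONS = AUTHOR_ACTIONS | {"remove_any", "edit_any", "edit_any_profile", "set_defaults"}
ROLE_ACTIONS = {Role.VIEWER: VIEWER_ACTIONS, Role.AUTHOR: AUTHOR_ACTIONS, Role.ADMIN: ADMIN_ACTIONS}


def role_after_login(authenticated: bool, is_admin: bool) -> Role:
    """A failed login leaves the visitor with viewer privileges only."""
    if not authenticated:
        return Role.VIEWER
    return Role.ADMIN if is_admin else Role.AUTHOR


def can_edit_document(role: Role, user: str, owners: set) -> bool:
    """Authors may edit publications/grants they own; administrators may edit any."""
    actions = ROLE_ACTIONS[role]
    return "edit_any" in actions or ("edit_owned" in actions and user in owners)


def can_remove_document(role: Role) -> bool:
    """Removal is listed here as an administrative privilege."""
    return "remove_any" in ROLE_ACTIONS[role]


def can_edit_profile(role: Role, user: str, profile_owner: str) -> bool:
    """Authors may edit their own profile; administrators may edit anyone's."""
    actions = ROLE_ACTIONS[role]
    return "edit_any_profile" in actions or ("edit_own_profile" in actions and user == profile_owner)


def can_set_defaults(role: Role) -> bool:
    """Default settings, e.g. the number of publications or grants shown per page."""
    return "set_defaults" in ROLE_ACTIONS[role]
```

Because each role’s set is built from the set below it, the rule that authors and administrators keep all viewer privileges holds by construction rather than needing to be checked case by case.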
Figure 3: READ interface privileges diagram.
A detailed illustration of the flow of the Schaefer Scraper can be found in Figure 4. The
scraper will search for publications created by ODU CS faculty members using predefined
publication websites. If a publication is found, it will check if it has already been added to the
system. If the publication is not in the system, the Schaefer Scraper will add it to the READ
database and assign ownership of it to the author it searched for. If the publication is
already in the system, it will check whether the author it is searching for already has ownership
of it; if the author does not, it will add the author as an owner of the publication. After it has
either added the publication to the system or given the specified author ownership privileges, the
Schaefer Scraper will send an email notifying the author that a publication has been added to the
system under their name. When the author checks the email, he or she will be able to select
whether or not the publication actually belongs to him or her. If the author denies ownership of
the publication, the system will check if anyone else has ownership of it; if no one else has
ownership over it, then the system removes the publication from the system altogether, but if
someone else does have ownership over it then the system will only remove the author from the
ownership list. If the author accepts ownership of the publication, the system either authorizes
the publication to be shown in the system or it authorizes the user as an accepted owner of the
publication.
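The scraper flow just described can be modeled in a few lines of Python. This is an illustrative in-memory sketch, not the Schaefer Scraper’s actual code. It makes three assumptions: a publication is identified by its case- and whitespace-normalized title, no email is sent when the author already owns the publication, and a publication becomes visible once at least one owner has accepted it. The `Repository` class, its methods, and the `outbox` list standing in for outgoing email are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


def normalize_key(title: str) -> str:
    """Hypothetical duplicate check: compare titles ignoring case and spacing."""
    return " ".join(title.lower().split())


@dataclass
class Repository:
    owners: Dict[str, Set[str]] = field(default_factory=dict)    # key -> owners
    accepted: Dict[str, Set[str]] = field(default_factory=dict)  # key -> confirmed owners
    outbox: List[Tuple[str, str]] = field(default_factory=list)  # (author, key) notifications

    def record_scraped(self, title: str, author: str) -> bool:
        """Handle one scraped (publication, author) pair; return True if an email was queued."""
        key = normalize_key(title)
        if key not in self.owners:
            # New publication: add it and give the searched-for author ownership.
            self.owners[key] = {author}
            self.accepted[key] = set()
        elif author in self.owners[key]:
            # Already in the system with this owner: nothing to do (assumed).
            return False
        else:
            # Already in the system: add the author as an owner.
            self.owners[key].add(author)
        self.outbox.append((author, key))
        return True

    def respond(self, key: str, author: str, accepts: bool) -> None:
        """Apply the author's answer to the notification email."""
        if accepts:
            self.accepted[key].add(author)
            return
        self.owners[key].discard(author)
        self.accepted[key].discard(author)
        if not self.owners[key]:
            # No remaining owners: remove the publication altogether.
            del self.owners[key]
            del self.accepted[key]

    def is_visible(self, key: str) -> bool:
        """Assumed rule: shown once at least one owner has accepted it."""
        return bool(self.accepted.get(key))
```

Keeping `owners` and `accepted` separate mirrors the two outcomes of an acceptance in the flow: the publication becomes visible, or the author becomes a confirmed owner of an already visible publication.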
Figure 4: Scraper Flow Diagram
2.3 External Interfaces
External interfaces will be limited to standard PC hardware and freely available software.
The only custom interface will be the READ interface.
2.3.1 Hardware Interfaces
No hardware interfaces will be built for this prototype. A PC will be used to demonstrate
the READ system. The READ system will be hosted on an ODU Debian virtual machine.
2.3.2 Software Interfaces
Group members will interact with the READ MySQL database through a PuTTY terminal session.
The READ web-based interface will be built using PHP, AJAX, JavaScript, and XML.
The code will be developed using standard text editing tools such as Notepad++ and Emacs. The
login interface will use the ODU CS department’s login system for authentication purposes.
2.3.3 User Interfaces
Figure 5 represents the site map of the READ user interface. From the READ homepage
it is possible to reach one’s own user profile page, as well as the publication and grant query
pages. Selecting one of the authors associated with a specific grant or publication on a query
page opens that author’s profile page. From one’s own profile page it is possible to add grants
and publications to the system and to edit them.
Figure 5: Site Map (pages shown: READ Homepage, Publication, Grant, Administration, User Profile, Add Publications, Add Grants, Edit Publications, Edit Grants)
2.3.4 Communication Protocols and Interfaces
HTTPS, rather than plain HTTP, will be used in order to create a secure
connection with the READ system. The only other external interface used with the system will
be an author’s CS email account.