Running head: LAB1 – READ DESCRIPTION

Lab 1 – READ Product Description
Jacob Phillmon
CS411
Janet Brunelle
February 18, 2013
1 Introduction
2 READ Product Description
2.1 Key Product Features and Capabilities
2.2 Major Components
3 Identification of Case Study
READ Product Prototype Description
3.1 Prototype Architecture
3.2 Prototype Features and Capabilities
3.3 Prototype Development Challenges
Glossary
References
1 Introduction
In the United States there are over 4,700 research institutions (Digest of Education
Statistics). These institutions publicize their research through publications and upload them to
the Internet in order to share them with the online community. Uploading these
documents can consume a large amount of time because in most organizations it is a manual
process. The workload can be so extensive that some groups, like Old Dominion University’s
Computer Science Department, are unable to keep their systems up to date; the latest publication
uploaded dates back to 2008. Outdated systems of this nature are a poor representation of
a group’s findings; the system should display the information in a way that advertises the
organization to the general public. The organization’s research projects are funded by numerous
grants that have been awarded by external funding agencies. Like publications, grants must also
be stored in the system, and they are just as tedious to upload as the publications they are
associated with.
READ is a repository for electronic aggregation of documents developed by Old
Dominion University’s Computer Science department. It is designed specifically for the needs of
Old Dominion University’s Computer Science department, but it can be integrated into any
online system that displays an organization’s publications and grants. It is designed to automate
the process of adding publications and grants and organizing them into a filterable format, so
that users can narrow down the displayed documents and locate those of relevant interest. The
prototype will provide basic functionality including a user
interface, a fully constructed database, user functions such as editing or adding publications or
grants, and the automation of publication and grant submissions to the system. It will not include
features that mainly provide aesthetic functionality, such as graphs that illustrate the number of
publications a person has created over the past few years or the amount of grant money a person
has earned.
2 READ Product Description
READ is an online system that stores publications and grants that are associated with
members of Old Dominion University’s Computer Science department. The system is designed
to minimize the amount of time needed to update the system with the most recent publications
produced by the department’s faculty. Only extra information such as a single thumbnail image
will require manual input from the author. Authors will still be able to upload publications
directly into the system without using the system’s automated features.
2.1 Key Product Features and Capabilities
The system interface will allow anyone to browse through documents stored within the
system. It will also allow the user to filter the displayed documents into a much
smaller, more manageable set of results. Filter values are compared against information stored
with the document, such as the title, the authors associated with the publication, the publication
date, or specific keywords manually set by the author. If the filter values match the document
information, then the document will be displayed in the filtered results. Filter
parameters will vary depending on the type of document currently displayed.
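The matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Publication` record, its field names, and the `filter_publications` helper are hypothetical and are not part of the READ design, which is to be implemented in PHP and MySQL.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Publication:
    """Hypothetical record; the field names are illustrative, not READ's schema."""
    title: str
    authors: list[str]
    published: date
    keywords: list[str] = field(default_factory=list)


def matches(pub: Publication, *, title: str = "", author: str = "",
            year: int | None = None, keyword: str = "") -> bool:
    """Return True when the publication satisfies every filter that was supplied."""
    if title and title.lower() not in pub.title.lower():
        return False
    if author and not any(author.lower() in a.lower() for a in pub.authors):
        return False
    if year is not None and pub.published.year != year:
        return False
    if keyword and keyword.lower() not in [k.lower() for k in pub.keywords]:
        return False
    return True


def filter_publications(pubs: list[Publication], **filters) -> list[Publication]:
    """Keep only the publications that match all of the given filters."""
    return [p for p in pubs if matches(p, **filters)]
```

A filter that is left empty is ignored, so supplying more filters only ever narrows the result set, which is the behavior the paragraph above describes.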
A personal profile will also be included within the system interface. Profile information,
such as a profile picture and additional descriptive information about the author, will be
displayed here; profile information can be altered by the profile’s owner at any time through an
editing interface. Additionally, any publications in which the profile’s owner is listed as an
author, or grants in which the profile’s owner is listed as a principal or co-principal investigator,
will be displayed here. An author who is logged into the system and viewing their own profile page
can edit the publications and grants listed there, because they are associated with them. While
editing a publication or grant, an author can submit a BibTeX document in order to fill in various
fields automatically. Authors will also be able to upload files to be associated with their
publications; these files can be downloaded by viewers of the READ system. Graphs detailing the
number of publications and the amount of grant money earned each year will also be displayed. Each
author’s profile page can be viewed by any user, but only the author, when logged in and viewing
their own profile page, or a system administrator will have rights to edit anything within it.
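As a rough illustration of how a submitted BibTeX document could fill in form fields, the following Python sketch extracts the fields of one simply formatted entry. It is an assumption-laden example rather than the READ implementation: the function name `parse_bibtex_entry` is hypothetical, and the regular expression handles only `name = {value}` and `name = "value"` pairs without nested braces or bare numbers.

```python
import re

# Matches `name = {value}` or `name = "value"`.
# Nested braces and unquoted values (e.g. `year = 2012`) are not handled.
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)")')


def parse_bibtex_entry(text: str) -> dict:
    """Extract the fields of a single, simply formatted BibTeX entry."""
    fields = {}
    for name, braced, quoted in FIELD_RE.findall(text):
        fields[name.lower()] = (braced or quoted).strip()
    # BibTeX separates multiple author names with the word "and".
    if "author" in fields:
        fields["authors"] = [
            a.strip() for a in re.split(r"\s+and\s+", fields.pop("author"))
        ]
    return fields


# Made-up sample entry, used only to show the call.
SAMPLE = """@article{doe2012,
  title = {Automated Aggregation of Documents},
  author = {Jane Doe and John Roe},
  year = "2012"
}"""

form_fields = parse_bibtex_entry(SAMPLE)
```

The resulting dictionary could then be used to pre-populate the title, author, and date fields of the editing interface, leaving the author to correct anything the parser missed.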
The system will use an external module called the Schaefer Scraper to search predefined
sites that contain publications created by authors. It will automatically extract publication and
grant data from each site searched, including the title of the publication, the authors
associated with it, and a link to the page on which the publication is displayed. The information
will then be stored in the READ database. Whenever a new publication or grant is added to the
database, an alert will be emailed to the authors under whose names it was added. Authors who
believe a publication was added under their name in error will be able to remove it from the
system, either by clicking on a link displayed within the alert or through their
own profile interface. Otherwise, they have the option to add extra information to the publication
and correct any mistakes that may exist through an editing interface. Over time the Scraper will
learn to avoid alerting specific users based on the publication removal patterns of authors. If for
some reason a publication is manually added to the system and the scraper finds a copy of it on
an external site, a duplicate copy will not be added to the system.
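The text does not say how the system recognizes that a scraped publication is already stored. One simple possibility, shown here purely as an assumption, is to compare normalized titles before inserting. The `store` dictionary stands in for the READ database, and both function names are hypothetical.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace to single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def add_if_new(store: dict, title: str, authors: list, link: str) -> bool:
    """Insert a scraped record unless an equivalent title is already stored.

    `store` stands in for the READ database. Returns True when a record
    was added and False when it was skipped as a duplicate.
    """
    key = normalize_title(title)
    if key in store:
        return False
    store[key] = {"title": title, "authors": authors, "link": link}
    return True
```

Matching on the title alone would merge distinct publications that share a title, so a real check would likely also compare authors or publication dates.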
2.2 Major Components
Figure 1 – Major Functional Component Diagram
Figure 1 illustrates the components that will be used within the READ system. The system
will be hosted on a server owned by Old Dominion University. Major software components
within the system include a graphical user interface, a database, and a Scraper. The system
interface will be split into two sections: a public section and a private section. The public
section will allow anyone to browse and filter publications or grants stored within the system and
to view author profile pages. The private section will allow authors to edit or
remove the publications that they own. The private section
will require a login interface that verifies that the user is a registered author. A
database will be used to house all publication and grant data as well as authors that are registered
within the system. The user interface will communicate with the database in order to display
the publications and grants stored within it and to save the changes that authors submit to their
profile information, publications, or grants. The Schaefer Scraper will search
specific sites on the Internet and extract publications that are associated with authors stored
within the database. It will run on a schedule set by system administrators and will update the
database with the most recent publications automatically. A module called the Prediction
Algorithm will be provided on the READ main webpage to determine whether an organization has enough
storage space to use READ. The Prediction Algorithm will
require the average amount of storage consumed by an author, the average number of uploaded
files per author, and the average size of an uploaded file.
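The description names the inputs of the Prediction Algorithm but not its formula. The sketch below shows one plausible way to combine them and is an assumption, not the documented algorithm: the number of authors and the available space are extra hypothetical inputs, and all sizes are taken to be in bytes.

```python
def estimated_storage_bytes(num_authors: int, avg_author_storage: float,
                            avg_files_per_author: float,
                            avg_file_size: float) -> float:
    """Estimate total storage as per-author record data plus uploaded files."""
    per_author = avg_author_storage + avg_files_per_author * avg_file_size
    return num_authors * per_author


def has_enough_space(available_bytes: float, **inputs) -> bool:
    """Return True when the estimate fits within the available space."""
    return estimated_storage_bytes(**inputs) <= available_bytes
```

For instance, 10 authors who each use 1,000 bytes of record data and upload 5 files of 200 bytes on average would need 10 × (1,000 + 5 × 200) = 20,000 bytes under this assumed formula.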
3 Identification of Case Study
The READ system is designed specifically for Old Dominion University’s Computer
Science Department. The department is composed of a group of faculty members, most of whom
produce numerous publications detailing their research every year. In an attempt to organize the
faculty’s publications into a single viewable location, the department had a system in place
in which publications were submitted by the faculty and later added to the system
manually by the system’s administrator. Updating the system took so much time
that most of the faculty stopped submitting publications altogether. This can
be seen in the system’s display itself, as the last submitted publication dates back to the year 2008
(Recent Publications). The page also lacks any filter capabilities; all publications are displayed
with those most recently published at the top and older ones toward the bottom. The display page is no
longer linked to the department’s homepage because it is out of date and no longer in use. The
READ system is designed to encourage use of the new system by automating the
process of updating the system as well as adding additional browsing capabilities. Eventually the
system may be expanded to be included in other departments at Old Dominion University as well
as other organizations that require a system to organize their publications and grants.
READ Product Prototype Description
The READ prototype is designed to integrate the use of the Schaefer Scraper into a working
database system and display environment. It will be used to demonstrate the functionality of the
system to the Old Dominion University Computer Science department; this demonstration will
allow them to decide on any changes they may want made to the system before it is fully
developed. The prototype will use actual publications and grants created by Old Dominion
University’s Computer Science faculty in order to demonstrate the effectiveness of the Schaefer
Scraper. Additional user interface functionality will also be implemented in order to demonstrate
the system’s usage.
3.1 Prototype Architecture
Figure 2 – Prototype Major Function Component Diagram
The major hardware and software component structure of the READ prototype is
illustrated in Figure 2. The READ system is hosted on a Debian virtual machine. Access to the
system will require a computer with the ability to browse the Internet. The main software components
built within the system are the database, the Web-based interface, and the Schaefer Scraper. The
database shall be created using MySQL, as it is a database system the READ team
has extensive experience working with. All publication and grant data stored in the database will
be based on actual publications and grants owned by Old Dominion University’s Computer
Science faculty and graduate students. The Web-based interface shall be written using PHP and
standard HTML, as well as AJAX in order to create a type-ahead publication and grant filter and
query system. The Schaefer Scraper is a prebuilt module provided by Andrew Schaefer. It will
provide all of the Scraper functionality needed except for the ability to add grants
automatically into the READ system.
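To make the type-ahead query concrete, here is a small Python sketch that uses the standard `sqlite3` module as a stand-in for MySQL. Everything in it is an assumption made for illustration: the `publication` table, its columns, and the `type_ahead` function are hypothetical, and the prototype itself is to be written in PHP against a MySQL database.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a minimal, hypothetical publication table."""
    conn.execute(
        """CREATE TABLE publication (
               id INTEGER PRIMARY KEY,
               title TEXT NOT NULL,
               link TEXT
           )"""
    )


def type_ahead(conn: sqlite3.Connection, prefix: str, limit: int = 5) -> list:
    """Return up to `limit` titles that start with `prefix`."""
    # Escape LIKE wildcards so user input is matched literally. The value is
    # passed as a bound parameter rather than spliced into the SQL text.
    escaped = (prefix.replace("\\", "\\\\")
                     .replace("%", "\\%")
                     .replace("_", "\\_"))
    rows = conn.execute(
        "SELECT title FROM publication "
        "WHERE title LIKE ? ESCAPE '\\' ORDER BY title LIMIT ?",
        (escaped + "%", limit),
    )
    return [row[0] for row in rows]
```

In a browser, an AJAX request would send the partially typed text to a server-side handler that runs a query of this shape and returns the matching titles. Using bound parameters also guards against the SQL injection risk defined in the glossary.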
3.2 Prototype Features and Capabilities
| Features | Real World Project | Prototype |
| --- | --- | --- |
| Browsing Capabilities | Ability to browse all grants and publications | Ability to browse all grants and publications |
| Publication Filtering Capabilities | Filtered by title, publisher, authors, publication date, date added, and keywords. | Filtered by title, publisher, authors, publication date, date added, and keywords. |
| Grant Filtering Capabilities | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state. | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state. |
| Add, edit, and delete publications and grants | Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document. | Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document. |
| Faculty page | Lists faculty and provides a link to each person’s profile page | Not included. |
| Login interface | Linked to Old Dominion University Computer Science accounts | Linked to Old Dominion University Computer Science accounts |
| Profile Page | Displays authors’ profile picture, job title, email address, personal webpage link, and the author’s publications and grants. Displays graphs. | Displays authors’ profile picture, job title, email address, personal webpage link, and the author’s publications and grants. Graphs not included. |
| Scraper | Will update the system with new publications and grants and alert users when one is added to the system under their name. | Will update the system with publications only and alert users when one is added to the system under their name. |
| Prediction algorithm | Predicts if the consumer has enough space to use the READ system. | Not included. |
| Administrative Privileges | Administrators are able to edit, add, or remove anything in the system. | Administrators are able to edit, add, or remove anything in the system. |

Table 1 – Features and Capabilities list
Table 1 details the differences between the real world project and the READ prototype. The
prototype includes most of the capabilities and features of the real world project except for a few
that are primarily aesthetic. First, the profile page will not display graphs detailing information
about the author’s contributions. The Prediction Algorithm will not be included in the prototype as it
would only be used as a guideline for other groups that may wish to use the READ system. The faculty
page will also not be included as the Computer Science department already has one on its main page.
The department may choose to incorporate links to the profile pages from its own faculty page in the
future.
3.3 Prototype Development Challenges
There are a number of challenges and risks that may appear during the development of the
READ system. First, the Schaefer Scraper may need to be modified
in order to be compatible with the READ system. In addition, the format of the data extracted from
various websites might not match the format specifications of the database we develop. This
problem is probably unavoidable and must be overcome, so the group will start studying the
code of the Schaefer Scraper early in development.
Secondly, there is the possibility that the prototype may not meet all of the user
requirements and specifications. A deadline has been set for the production of the prototype,
so it might not be finished by the due date. It is also possible that it might not be finished because
the group lacks knowledge needed to develop the system. To reduce these risks, the task of coding the
prototype will be split up between group members. Any technical skills needed to develop the
prototype will also be researched ahead of time.
There is also the possibility that the interface may be incompatible with certain browsers.
A page may be displayed differently in a browser such as Google Chrome than in
one such as Firefox or Internet Explorer. Google Chrome will be the main target browser
for the prototype, but later on the interface will be expanded to be fully compatible with most
browsers and possibly even portable devices such as smartphones.
GLOSSARY
Administrator/Administrative User: a user with increased privileges for editing database
content
Author: a person who publishes in an academic journal or other academic
BibTeX: A plain-text file format for bibliographic reference information. It will be used to
automatically fill in key information when uploading or editing publications and grants.
Computer Science (CS): An academic discipline based on advancing computing theory and
algorithm development, that sometimes includes theory about software engineering
methods.
Client application: In a client/server architecture, the module that takes input and creates
queries to be processed by a server, and receives the results from the server.
Client/Server Architecture: A software engineering paradigm that separates functionality into a
“client” application and a “server” application that interact.
CSS: A style sheet language used to specify the presentation of HTML pages.
Data Mining: The act of going through a source of input to find specific information.
Database Schema: A description of the structure of a database.
Funding Agency: The source of funds for research grants. These organizations usually have a
limited amount of money to award to principal investigators who submit an accepted
application for research funds.
GIT: A software system for controlling and organizing software versioning.
GoogleScholar (http://scholar.google.com): Google Scholar provides a simple way to broadly
search for scholarly literature. From one place, you can search across many disciplines
and sources: articles, theses, books, abstracts and court opinions, from academic
publishers, professional societies, online repositories, universities and other web sites.
Google Scholar helps you find relevant work across the world of scholarly research.
scholar.google.com
Graphical User Interface (GUI): A computer interface composed of icons, text fields, menus,
etc. that can be interacted with via a mouse and keyboard, through which a user interacts
with a software application. Used to differentiate from a “command-line interface”, in
which a user interacts with a software application solely through a text terminal.
internet scraper: internet scraper / web scraper - (wikipedia) web scraping focuses more on the
transformation of unstructured data on the web, typically in HTML format, into
structured data that can be stored and analyzed in a central local database or spreadsheet.
JQuery Sparklines: A development library for the visualization of data.
ODU: Old Dominion University.
MicrosoftAcademic (http://academic.research.microsoft.com/): Microsoft Academic Search is a
free service developed by Microsoft Research to help scholars, scientists, students, and
practitioners quickly and easily find academic content, researchers, institutions, and
activities. Microsoft Academic Search indexes not only millions of academic
publications, it also displays the key relationships between and among subjects, content,
and authors, highlighting the critical links that help define scientific research. Microsoft
Academic Search makes it easy for you to direct your search experience in interesting
and heretofore hidden directions with its suite of unique features and visualizations.
MySQL: A relational database management system that is queried using SQL.
Parse: A technical term usually used to describe the processing of a statement written in a
programming language. May be used generally to describe the processing of any
statement for specific meaning.
Perl: A widely-used programming language on the server-side of web applications.
PHP: A widely-used programming language on the server-side of web applications.
Principal Investigator (PI): The primary researcher that a research grant is bestowed upon,
responsible for documenting the work and publishing research results.
Publication or Academic Publication: A document created by a faculty member to share
research. Publications usually appear in academic journals, technical reports, and
records of conference proceedings.
Query: A statement sent to the database to either change the database or get back results.
READ: Repository for Electronic Aggregation of Documents
RSS: A system for subscribing to and distributing news.
Scraper: An automated application designed to scan a source of input such as a document or a
website for pertinent information.
Server application: In a client/server architecture, the module that takes queries or requests
from a client module, processes them, and returns the result to the client.
Software Compatibility: A description of whether different software products, or versions of
software, can communicate/interact.
SQL: A widely-used programming language used to query databases.
SQL injection: Performing unauthorized queries on a database for malicious purposes.
User Authentication: The process of verifying the access credentials of a user of an automated
system, usually accomplished by requesting a username and password combination.
Viewer: In the scope of our project an outside person who wishes to query the information
contained in the READ database.
Version Control: A method for organizing and recording different versions of documents that
have been created over time.
Virtual Private Server (VPS): A software version of a hardware server. Used to create
independent servers (....) on a single piece of hardware.
Webserver: A group of applications run on a computer or VPS to serve webpages and
provide server-side computation for browser-based client applications. A web server is a
constantly “on” resource whose sole or main job is to respond to HTTP requests from
browsers.
XML: Extensible markup language.
REFERENCES
Digest of Education Statistics. 2011. National Center for Education Statistics.
Web. 19 Nov. 2012.
<http://nces.ed.gov/programs/digest/d11/tables/dt11_001.asp?referrer=report>.
"Recent Publications." Department Of Computer Science. N.p., n.d. Web. 13 Feb. 2013.
<http://www.cs.odu.edu/recent_publications.shtml>.