Running Head: Troy M. Connor Lab 1

Lab 1 – READ product description
Troy M. Connor
CS411W
Janet Brunelle
February 18, 2013
TABLE OF CONTENTS
1 INTRODUCTION
2 READ PRODUCT DESCRIPTION
2.1 Key Product Features and Capabilities
2.2 Major Components (Hardware/Software)
2.3 Target Market/Customer Base
3 READ PRODUCT PROTOTYPE DESCRIPTION
3.1 Prototype Functional Goals and Objectives
3.2 Prototype Architecture (Hardware/Software)
3.3 Prototype Features and Capabilities
3.4 Prototype Development Challenges
4 PROTOTYPE DESCRIPTION
4.1 Major Functional Components (Hardware and Software)
4.2 Features and Capabilities
GLOSSARY
FIGURES
Figure 1
Figure 2
Table 1
Introduction
In the United States there are over 4,700 research institutions (Digest of Education and
Statistics). The public typically learns about the research done by these institutions through an
online list. Maintaining that list is usually a manual process, which causes problems when members
of the public look for specific research interests. These web pages are rarely kept up to date
and do not adequately represent which research or publications have been added.
Old Dominion University’s Computer Science Department (ODUCS) is among the research
institutions with this issue. Its web page dedicated to listing publications has not been updated
since 2008. Records of research grants awarded to the University are also in disarray and are not
being kept up to date. The root of the problem is that entering grants and publications into the
system is a single person’s responsibility. As a result, people interested in Old Dominion
University’s research cannot find up-to-date publications or grants from the Computer Science
Department.
The Repository for Electronic Aggregation of Documents (READ) is being developed by
members of ODUCS. Its design will serve not only the Computer Science Department but other
departments as well. It is designed to automate the addition of publications and research grants
to a web page that the public can view. This will make publications and grants easy to find and
so promote interest in the research of any institution that uses it.
READ Product Description
READ is an online database system that will house grants, publications, and links, and will
let viewers browse them using a number of specific filters. The goal is to minimize the need for
authors to manage their publications manually. READ will more efficiently advertise ongoing
research and available grants to whoever needs them. READ will let users work less in order to
READ more.
2.1 Key Product Features and Capabilities
READ will search through different websites using a tool known as the Schaefer
Scraper™ for publications that match a registered author’s credentials. The Schaefer Scraper tool
searches the Internet for the recent publications and research grants of current faculty and
records the necessary data in a database. READ will store the information as a link to the
document. The author will receive an email notification asking them to authorize the publication.
The system will eventually learn when not to notify a user, based on patterns in the author’s
earlier responses. This gathers all of an author’s publications into a single location
seamlessly, with little extra work on the author’s part.
The system will allow viewers to browse through READ using filters for grants, multiple
types of publications, and authors. An author’s profile will show personal information, a list of
the author’s publications, and a graphical representation of both the publications they have
created and the funding they have received through grants they participated in. This statistical
information may inform viewers about a specific author’s area of expertise and level of activity,
and gives them a point of contact should they be interested in research on similar topics.
2.2 Major Components (Hardware/Software)
The READ solution will consist of three major components: a web interface, a database,
and the Schaefer Scraper. The web interface contains both public and private sections. The
section that allows authors to manipulate publication data will be private. Authentication will
allow authors to log in and out of the system.
The database will store the publications of the authors using the system. Its
primary purpose is to make those publications available for display. This will allow the
viewing public to see the publications and grants that Old Dominion University faculty and staff
currently have available.
The Schaefer Scraper is the last part of READ. This software will automatically search
websites for publications by authors in the Old Dominion University Computer Science
Department. At regularly scheduled intervals, the Schaefer Scraper will export the data it finds
for those authors to a text file. This data will then be parsed to add
publications to the database.
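The export-then-parse step described above can be sketched as follows. This is only an illustration under stated assumptions: the export is assumed to be a tab-separated text file with the fields author, title, year, and URL (the scraper's real output format is not specified here), SQLite stands in for the production database, and the names `parse_export`, `load_publications`, and the `publications` table are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical export: one publication per line, tab-separated as
# author, title, year, url. The real scraper output may look different.
SAMPLE_EXPORT = (
    "A. Author\tExample Paper One\t2012\thttp://example.org/one\n"
    "A. Author\tExample Paper Two\t2013\thttp://example.org/two\n"
)


def parse_export(text):
    """Turn exported text into a list of (author, title, year, url) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(fields) != 4:
            continue  # skip malformed or empty lines rather than guessing
        author, title, year, url = (field.strip() for field in fields)
        if not year.isdigit():
            continue  # a non-numeric year means the line is not usable
        rows.append((author, title, int(year), url))
    return rows


def load_publications(conn, rows):
    """Insert parsed rows and return the total number of stored publications."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS publications ("
        "author TEXT, title TEXT, year INTEGER, url TEXT UNIQUE)"
    )
    # The UNIQUE url plus INSERT OR IGNORE keeps repeated scraper runs
    # from storing the same link twice.
    conn.executemany(
        "INSERT OR IGNORE INTO publications (author, title, year, url) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM publications").fetchone()[0]
```

Storing only a link per publication matches the description of READ keeping the information as a link to the document; treating the URL as the duplicate key is a design assumption of this sketch.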
2.3 Target Market/Customer Base
The initial customer for READ is Old Dominion University's Computer Science
Department. The need was identified when Dr. Weigle, who publishes frequently, saw that her
work was going unrecognized. According to the Computer Science website, the department has
37 faculty members, 11 currently enrolled Ph.D. students, and 111 currently enrolled Master's
students. All of these potential authors could benefit from this service once it is implemented.
Once testing at ODUCS is successful, READ could then be used by other departments at Old
Dominion University. READ could potentially be utilized by other universities, government
facilities, non-profit research institutions, or even libraries.
3 READ Product Prototype Description
The READ prototype will be modeled using the Old Dominion University Computer
Science Department’s publications and hosted on a virtual machine running a Linux-based
operating system. The prototype will include all of the features of the Real World Product
(RWP) except for the prediction algorithm that calculates the size and storage needs of an
expanding database. The prototype will act as a template for the Real World Solution provided
to ODUCS. Once it is completed, the prototype will be a usable tool for
organizing publications and grants for the University.
3.1 Prototype Functional Goals and Objectives
The READ prototype will be able to search the database and filter the results
based on user queries, and it will implement RSS feeds. It will also allow users to log on, edit,
and upload data; give administrators functional control of the web application; and use the
Schaefer Scraper to find publications and insert them into the READ database. These functions
will allow easy access to the numerous publications, grants, and tech reports
written by Old Dominion University faculty.
3.2 Prototype Architecture
Figure 1
The READ prototype will allow a user to log into the system through a web-based interface.
Once logged in, the user can search for publications or add them to the database. The public
will be able to view all grants and publications from any computer connected to the Internet.
The Schaefer Scraper will run at a fixed interval in order to populate the database with
information. The administrators of READ will define this interval to control how often grants
and publications are added.
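The administrator-defined interval can be illustrated with a small loop. This is a minimal sketch, not the prototype's scheduler: `run_on_interval` and its parameters are hypothetical names, and a real deployment on a Linux host might rely on a system scheduler such as cron instead.

```python
import time


def run_on_interval(task, interval_seconds, max_runs, sleep=time.sleep):
    """Call task() max_runs times, pausing interval_seconds between runs.

    The sleep function is a parameter so the pause can be replaced in tests.
    """
    results = []
    for run in range(max_runs):
        results.append(task())
        if run < max_runs - 1:
            sleep(interval_seconds)  # wait before the next scrape
    return results


def scrape_once():
    """Hypothetical stand-in for one scraper run; returns a status string."""
    return "scrape finished"
```

For example, `run_on_interval(scrape_once, 24 * 60 * 60, max_runs=7)` would run the stand-in task once a day for a week; the interval value itself is whatever the administrators choose.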
3.3 Prototype Features and Capabilities
The READ prototype will be capable of storing and finding various kinds of publications
and research grants, and of displaying this information to various
users. It will also notify authors of the publications it finds so that they can confirm each
publication's authenticity. The system will also allow authors to upload publication or research
grant data themselves. This will significantly reduce the effort
authors must spend to get their publications into the system.
The READ prototype will also reduce the various risks that are inherent in this kind of
application. The legal risks of displaying copyrighted papers will be reduced by having a READ
administrator go through the database and remove any papers that ODU does not have the
legal right to display. Compatibility risks will be addressed by providing alternate versions
of the page that should work with various web browsers. Security risks such as SQL injection
will be mitigated with parameterized queries and input validation, with encryption used to
protect sensitive data. The risk posed by hard-copy publication submissions will be handled
by requiring users to scan documents before uploading them.
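SQL injection in particular is commonly mitigated by binding user input as query parameters, so that the input is never interpreted as SQL. The following is a minimal sketch of that technique, using SQLite as a stand-in for the production database; the `publications` table, its columns, and the function names are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with two sample rows (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE publications (author TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO publications (author, title) VALUES (?, ?)",
        [("A. Author", "Example Paper One"), ("B. Writer", "Example Paper Two")],
    )
    return conn


def find_titles_by_author(conn, author):
    """Look up titles for one author using a bound parameter.

    The ? placeholder makes the driver treat `author` strictly as data,
    even when it contains quotes or SQL keywords.
    """
    cursor = conn.execute(
        "SELECT title FROM publications WHERE author = ? ORDER BY title",
        (author,),
    )
    return [row[0] for row in cursor.fetchall()]
```

With this approach, a malicious search string such as `x' OR '1'='1` is compared literally against the author column and matches nothing, instead of altering the query.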
3.4 Prototype Development Challenges
Some challenges the READ prototype will face include learning the architecture and
implementing the Schaefer Scraper. Most of the group members have looked over the Schaefer
Scraper, and implementing it will not be as simple as installing it. To use the tool
properly, its final stages must still be implemented; in its current state, the tool
cannot retrieve the data needed to populate the database. Another challenge will be
getting someone from the systems group to act as an administrator for the system. This challenge
is resolved by the fact that a group member is part of the systems group.
4 Prototype Description
The prototype for READ will include populated test data and an access control
interface for logging in as an administrator or faculty member. The prototype will serve as a
representation of the Real World Problem and will demonstrate what can be achieved with a simple
Real World Solution. It will simulate the full product by allowing all levels (user, faculty,
admin) to see the publications and grants. Faculty members will be able to add citations and
thumbnail images. The administrator will be allowed to promote members, make accounts,
suspend accounts, delete publications, and delete grants. All users will be allowed to view the
publications by filtering the queries with certain options. These options include the year, the
author, and the type of publication. The user will be able to view all of the publications of a
particular author and use the date range slider to filter through them. The user will also be
able to see the statistics of each faculty member, along with sparklines that show the
statistics visually.
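The filtering by year, author, and publication type described above can be sketched as follows. This is an illustration only: the `Publication` record, its field names, and the sample entries are hypothetical, and the year range plays the role of the date range slider.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Publication:
    title: str
    author: str
    year: int
    kind: str  # hypothetical type label, e.g. "journal" or "tech report"


def filter_publications(
    publications: List[Publication],
    author: Optional[str] = None,
    kind: Optional[str] = None,
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
) -> List[Publication]:
    """Return the publications that match every filter that was supplied."""
    matches = []
    for pub in publications:
        if author is not None and pub.author != author:
            continue
        if kind is not None and pub.kind != kind:
            continue
        if year_from is not None and pub.year < year_from:
            continue
        if year_to is not None and pub.year > year_to:
            continue
        matches.append(pub)
    return matches


# Illustrative sample data, not real publications.
SAMPLE = [
    Publication("Example Paper A", "A. Author", 2009, "journal"),
    Publication("Example Paper B", "A. Author", 2012, "conference"),
    Publication("Example Report C", "B. Writer", 2012, "tech report"),
]
```

Leaving a filter as `None` means "do not restrict on this field", so calling `filter_publications(SAMPLE)` with no options returns every publication.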
4.1 Major Functional Components (Hardware and Software)
The major functional components of READ are the user interface, a database, and
software to scrape sites (the Schaefer Scraper). These components will allow the prototype of
READ to be implemented on the ODU CS server.
Figure 2
4.2 Features and Capabilities
The following features and capabilities will be implemented in the READ
prototype. The search and filter will allow users to specify how they would like to look for
publications and grants. The profile-edit page will allow authors to customize their page to
their liking. The RSS feed will show the publications and grants most recently added to the
database. The Schaefer Scraper will pull data into the database at timed intervals to populate
it automatically. Access controls will allow authors and users to have different roles
when using the system. The prototype for READ will accomplish everything set out for the
Real World Problem’s solution with the exception of sparklines.
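An RSS feed of the most recently added items could be generated along the following lines. This is a sketch under assumptions: the entry fields (`title`, `link`, `added`), the channel text, and the example.org address are placeholders, and only the basic RSS 2.0 channel and item elements are produced.

```python
import xml.etree.ElementTree as ET


def build_feed(entries, limit=10):
    """Build an RSS 2.0 document listing the newest entries first.

    entries: list of dicts with 'title', 'link', and 'added', where 'added'
    is an ISO date string (YYYY-MM-DD) so that string order is date order.
    """
    newest = sorted(entries, key=lambda entry: entry["added"], reverse=True)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "READ: recently added"
    ET.SubElement(channel, "link").text = "http://example.org/read"  # placeholder
    ET.SubElement(channel, "description").text = "Newest publications and grants"
    for entry in newest[:limit]:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
    return ET.tostring(rss, encoding="unicode")
```

The `limit` parameter caps how many recent items appear in the feed; the value of 10 is an arbitrary default chosen for this sketch.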
Table 1

| Features | Real World Project | Prototype |
|---|---|---|
| Browsing Capabilities | Ability to browse all grants and publications | Ability to browse all grants and publications |
| Publication Filtering Capabilities | Filtered by title, publisher, authors, publication date, date added, and keywords. | Filtered by title, publisher, authors, publication date, date added, and keywords. |
| Grant Filtering Capabilities | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state. | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state. |
| Add, edit, and delete publications and grants | Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document. | Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document. |
| Faculty page | Lists faculty and provides a link to each person’s profile page | Not included. |
| Login interface | Linked to Old Dominion University Computer Science accounts | Linked to Old Dominion University Computer Science accounts |
| Profile Page | Displays authors’ profile picture, job title, email address, personal webpage link, and the author’s publications and grants. Displays graphs. | Displays authors’ profile picture, job title, email address, personal webpage link, and the author’s publications and grants. Graphs not included. |
| Scraper | Will update the system with new publications and grants and alert users when one is added to the system under their name. | Will update the system with publications only and alert users when one is added to the system under their name. |
| Prediction algorithm | Predicts if the consumer has enough space to use the READ system. | Not included. |
| Administrative Privileges | Administrators are able to edit, add, or remove anything in the system. | Administrators are able to edit, add, or remove anything in the system. |
Glossary:
Administrator/Administrative User: A user with increased privileges for editing database
content.
Author: A person who is able to add publications and grants to the system under their
name and edit them.
BibTeX: A plain-text file format for bibliographic reference information. It will be used to
automatically fill in key information when uploading or editing publications and grants.
Computer Science (CS): An academic discipline based on advancing computing theory and
algorithm development, that sometimes includes theory about software engineering methods.
Client application: In a client/server architecture, the module that takes input and creates
queries to be processed by a server, and receives the results from the server.
Client/Server Architecture: A software engineering paradigm that separates functionality into a
“client” application and a “server” application that interact.
CSS: A style sheet language used to specify the presentation of HTML pages.
Data Mining: The act of going through a source of input to find specific information.
Database Schema: A description of the structure of a database.
Funding Agency: The source of funds for research grants. These organizations usually have a
limited amount of money to award to principal investigators who submit an accepted
application for research funds.
Git: A software system for controlling and organizing software versioning.
GoogleScholar (http://scholar.google.com): Google Scholar provides a simple way to broadly
search for scholarly literature. From one place, you can search across many disciplines and
sources: articles, theses, books, abstracts and court opinions, from academic publishers,
professional societies, online repositories, universities and other web sites. Google Scholar helps
you find relevant work across the world of scholarly research.
Graphical User Interface (GUI): A computer interface composed of icons, text fields, menus,
etc., that can be interacted with via a mouse and keyboard, through which a user interacts with a
software application. Used to differentiate from a “command-line interface”, in which a user
interacts with a software application solely through a text terminal.
Internet scraper: Also called a web scraper. According to Wikipedia, web scraping focuses on
the transformation of unstructured data on the web, typically in HTML format, into structured
data that can be stored and analyzed in a central local database or spreadsheet.
jQuery Sparklines: A jQuery plugin for drawing small inline charts (sparklines) to visualize data.
ODU: Old Dominion University.
MicrosoftAcademic (http://academic.research.microsoft.com/): Microsoft Academic Search is a
free service developed by Microsoft Research to help scholars, scientists, students, and
practitioners quickly and easily find academic content, researchers, institutions, and activities.
Microsoft Academic Search indexes not only millions of academic publications, it also displays
the key relationships between and among subjects, content, and authors, highlighting the critical
links that help define scientific research. Microsoft Academic Search makes it easy for you to
direct your search experience in interesting and heretofore hidden directions with its suite of
unique features and visualizations.
MySQL: An open-source relational database management system that uses SQL as its query language.
Parse: A technical term usually used to describe the processing of a statement written in a
programming language. May be used generally to describe the processing of any statement for
specific meaning.
Perl: A widely used programming language on the server side of web applications.
PHP: A widely used programming language on the server side of web applications.
Principal Investigator (PI): The primary researcher upon whom a research grant is bestowed.
“Delta Cost Project Data.” The Delta Project on Postsecondary Education Costs, Productivity,
and Accountability. The Delta Project, n.d. Web. 9 Feb. 2013.