Lab2 –Prototype Specification

CS 411 W Lab II

Prototype Product Specification

For

READ

Version 1

Prepared by: Andrew Sprague, Black Team

Date: 03/31/2013

Table of Contents

1 INTRODUCTION
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms and Abbreviations
1.4 References
1.5 Overview
2 GENERAL DESCRIPTION
2.1 Prototype Architecture
2.2 Prototype Functional Description
2.3 External Interfaces
2.3.1 Hardware Interfaces
2.3.2 Software Interfaces
2.3.3 User Interfaces
2.3.4 Communication Protocols and Interfaces

List of Figures

Figure 1. Major Functional Components
Figure 2. Scraper Process Flow
Figure 3. Site Map

List of Tables

Table 1. RWP VS Prototype

1. Introduction

According to the Digest of Education Statistics, there are 4,706 research institutions in the United States (Digest of Education Statistics). The primary way these institutions attract both clients and new talent is by disseminating information on what they and their employees have accomplished, usually through papers that employees publish. Universities, one of the largest groups of research institutions, receive twenty percent of their annual income from federal contracts and grants (freeby50.com). These universities do not have a good online tool for sharing the work they have done with prospective students or faculty.

Currently, the systems that many institutions use to share publications are slow and tedious. As a result, much of the work that universities and faculty accomplish goes without proper recognition. Students who wish to attend a university whose professors specialize in a particular research area can have trouble distinguishing between universities. The work universities have done is not as well known as it could be, and the faculty lose recognition for their work.

1.1. Purpose

The Repository for the Electronic Aggregation of Documents, or READ system, is an online application consisting of a database and a web scraper designed to automate the process of gathering and sharing faculty publications. READ will allow faculty to organize all of their publications and make any required corrections before sharing them with the public. The public may then access and browse the publications using READ.

READ is an online system that will collect and store information on the publications and grants obtained by various authors. READ will retrieve publication information from various online sources, including a link to where the actual publication is stored. The system will also allow authors to add their publications to READ manually.

READ will also allow people outside the system to view the publication information that has been stored. People will be able to browse publications and grants through a number of filters such as author, publication date, and keywords. Viewers will also be able to sort the filtered results by relevance.
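The browsing behavior described above can be illustrated with a minimal Python sketch that filters publication records by author, publication date, and keyword, and then ranks the matches. The `Publication` record, its field names, and the term-counting relevance measure are assumptions made for illustration only; this specification does not define how relevance is computed.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Publication:
    # Hypothetical record layout; the actual READ schema is not given here.
    title: str
    authors: List[str]
    published: date
    keywords: List[str] = field(default_factory=list)


def filter_publications(pubs, author=None, published_after=None, keyword=None):
    """Return the publications that match every filter that was supplied."""
    results = []
    for pub in pubs:
        if author is not None and author not in pub.authors:
            continue
        if published_after is not None and pub.published < published_after:
            continue
        if keyword is not None and keyword.lower() not in (k.lower() for k in pub.keywords):
            continue
        results.append(pub)
    return results


def sort_by_relevance(pubs, query_terms):
    """Rank by how many query terms appear in the title or keywords (assumed measure)."""
    terms = [t.lower() for t in query_terms]

    def score(pub):
        text = (pub.title + " " + " ".join(pub.keywords)).lower()
        return sum(1 for t in terms if t in text)

    # sorted() is stable, so publications with equal scores keep their input order.
    return sorted(pubs, key=score, reverse=True)
```

Filters that are left as `None` are simply skipped, which mirrors a search form in which the viewer fills in only some of the fields.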

1.2. Scope

The READ prototype will be implemented in the Computer Science Department of Old Dominion University. A prototype is needed because the scope of this project is larger than the timeframe allotted to create it. Some of the functionality of the READ system must be left out of the prototype.

All of the specified user types (viewer, author, and administrator) will be implemented in the READ prototype, and the functions of each user type will remain unchanged. After the prototype has been implemented, an administrator will be chosen from the faculty or the systems group.
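As a rough illustration of how the three user types differ, the following Python sketch encodes one possible edit policy: viewers may only browse, authors may change records under their own name, and administrators may change anything. The function name, the `Role` enumeration, and the use of a name comparison to establish ownership are assumptions for illustration and are not part of the READ design.

```python
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"
    AUTHOR = "author"
    ADMINISTRATOR = "administrator"


def can_edit(role, user_name, record_owner):
    """Decide whether a user may add or edit a record (assumed policy).

    Viewers can only browse, authors can change records under their own
    name, and administrators can change anything.
    """
    if role is Role.ADMINISTRATOR:
        return True
    if role is Role.AUTHOR:
        return user_name == record_owner
    return False
```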

1.3. Definitions, Acronyms, and Abbreviations

Administrator/Administrative User: A user with increased privileges for editing database content.

Author: A person that is able to add and edit publications and grants to the system under their name.

BibTeX: A plain-text file format for bibliographic reference information.

Computer Science (CS): An academic discipline based on advancing computing theory and algorithm development, that sometimes includes theory about software engineering methods.

Client application: In a client/server architecture, the module that takes input and creates queries to be processed by a server, and receives the results from the server.

Client/Server Architecture: A software engineering paradigm that separates functionality into a "client" application and a "server" application that interact.

CSS (Cascading Style Sheets): A style sheet language used to specify the presentation of HTML pages.

Data Mining: The act of going through a source of input to find specific information.

Database Schema: A description of the structure of a database.

Funding Agency: The source of funds for research grants. These organizations usually have a limited amount of money to award to principal investigators who submit an accepted application for research funds.

Git: A software system for controlling and organizing software versioning.

Google Scholar (http://scholar.google.com): A website that indexes academic publications.

Graphical User Interface (GUI): A computer interface composed of icons, text fields, menus, etc., through which a user interacts with a software application via a mouse and keyboard.

Internet scraper: A program that is designed to sort through data that is stored online.

Joomla!: A content management system for designing web interfaces.

jQuery Sparklines: A development library for the visualization of data.

ODU: Old Dominion University.

Microsoft Academic (http://academic.research.microsoft.com/): A website that stores information on academic publications.

MySQL: An open-source relational database management system that uses SQL.

Parse: A technical term usually used to describe the processing of a statement written in a programming language.

Perl: A widely used programming language on the server-side of web applications.

PHP: A widely used programming language on the server-side of web applications.

Principal Investigator (PI): The primary researcher that a research grant is bestowed upon, responsible for documenting the work and publishing research results.

Publication or Academic Publication: A document created by a faculty member to share research. Publications usually appear in academic journals, technical reports, and records of conference proceedings.

Query: A request sent to the database to either change its contents or retrieve results.

READ: Repository for the Electronic Aggregation of Documents.

RSS: A specification for subscribing to and distributing news.

Scraper: An automated application designed to scan a source of input such as a document or a website for pertinent information.

Server application: In a client/server architecture, the module that takes queries or requests from a client module, processes them, and returns the result to the client.

Software Compatibility: A description of whether different software, or versions of software, can communicate/interact.

SQL (Structured Query Language): A widely used language for querying and manipulating relational databases.

SQL injection: An attack in which malicious SQL is inserted into an application's input so that unauthorized queries are performed on a database.

User Authentication: The process of verifying the access credentials of a user of an automated system, usually accomplished by requesting a username and password combination.

Viewer: An outside person who wishes to query the information contained in the READ database.

Version Control: A method for organizing and recording different versions of documents that have been created over time.

Virtual Private Server (VPS): A software version of a hardware server, used to create independent servers on a single piece of hardware.

Webserver: A group of applications run on a computer or VPS in order to serve webpages and provide server-side computation for browser-based client applications.

XML: Extensible Markup Language.

1.4. References

Digest of Education Statistics. 2011. National Center for Education Statistics. Web. 19 Nov 2012. <http://nces.ed.gov/programs/digest/d11/tables/dt11_001.asp?referrer=report>.

"Where Do Universities Get their Money From?" Free By 50. N.p., 13 2011. Web. 19 Nov 2012. <http://www.freeby50.com/2011/11/where-do-universities-get-their-money.html>.

Lab 1 – READ Prototype Description. Version 3. Repository for the Electronic Aggregation of Documents.

1.5. Overview

This product specification explains the components involved in the READ prototype. The remainder of the specification describes the architecture, the included features, and the product interfaces.

2. General Description

READ is a system for gathering and storing information on publications. It does not necessarily store the actual text of the documents involved. READ will be deployed by the Old Dominion University Computer Science Department for use by its faculty.

2.1. Prototype Architecture Description

The prototype will consist of three main components, the same as those of the finished product, as shown in Figure 1. The prototype will include a basic user interface and an implementation of the database. The Schaefer Scraper will be included to mine data from websites.

The prototype will be implemented on the Computer Science Department's servers. The database will be implemented using MySQL. The web-based interface will be implemented using Joomla!. Using this content management system will make author login easier to implement, because one of the team members working on READ has already implemented a login method for Computer Science faculty in another project using Joomla!. All queries to the database will be made through PHP scripts that interface between MySQL and the web-based interface. An interface between the Schaefer Scraper and the author information in the database will be written in Python.
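A minimal sketch of the Python interface between the scraper and the author information might look like the following. It uses `sqlite3` from the Python standard library as a stand-in for MySQL so that the example runs without a database server, and the `authors` table, its columns, and both function names are hypothetical; the actual READ schema is not given in this document.

```python
import sqlite3


def fetch_author_names(conn):
    """Return the author names the scraper should search for."""
    rows = conn.execute("SELECT name FROM authors ORDER BY name").fetchall()
    return [name for (name,) in rows]


def build_demo_database():
    """Create an in-memory database with a hypothetical `authors` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO authors (name) VALUES (?)",
        [("Jane Doe",), ("Alan Smith",)],  # placeholder names, not real faculty
    )
    conn.commit()
    return conn
```

The scraper would then iterate over the returned names and search for each author's publications.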

Figure 1. Prototype MFCD

2.2. Prototype Functional Description

The prototype will include many of the features that are planned for the final product, as defined in Table 1. The prototype will allow viewers to search and filter the database through the website. It will also allow for minimal user-profile control. An RSS feed and email system will also be implemented so that people can stay informed of what is contained within the database. Access control will be a priority to prevent unauthorized users from updating author papers. The Schaefer Scraper will automate much of the process of updating the publication lists. In the prototype, the Schaefer Scraper will search online for publications by one fourth of the authors every week.
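One way to realize the weekly schedule is to split the author list into four batches and select a batch by week number. The sketch below assumes a round-robin assignment over the alphabetically sorted author list; the specification states only that one fourth of the authors are searched each week, so the assignment rule and the function name are illustrative.

```python
def authors_for_week(authors, week_number, groups=4):
    """Return the batch of authors to scrape in the given week.

    Authors are assigned round-robin to `groups` batches, and the batch is
    chosen by the week number modulo `groups`, so every author is visited
    once per cycle of `groups` weeks.
    """
    batch = week_number % groups
    return [a for i, a in enumerate(sorted(authors)) if i % groups == batch]
```

Sorting first keeps the assignment stable from week to week as long as the author list itself does not change.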

The prototype will not implement every feature of the finished product. The prototype will not include a learning algorithm that ensures an incorrect paper is not resubmitted. The prototype will also not include any visual representations of the data, such as graphs and jQuery Sparklines, to display author statistics.

| Feature | Real World Project | Prototype |
| --- | --- | --- |
| Browsing Capabilities | Ability to browse all grants and publications | Ability to browse all grants and publications |
| Publication Filtering Capabilities | Filtered by title, publisher, authors, publication date, date added, and keywords. | Filtered by title, publisher, authors, publication date, date added, and keywords. |
| Grant Filtering Capabilities | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state. | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state. |
| Add, edit, and delete publications and grants | Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document. | Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document. |
| Faculty page | Lists faculty and provides a link to each person's profile page | Not included. |
| Login interface | Linked to Old Dominion University Computer Science accounts | Linked to Old Dominion University Computer Science accounts |
| Profile Page | Displays authors' profile picture, job title, email address, personal webpage link, and the author's publications and grants. Displays graphs. | Displays authors' profile picture, job title, email address, personal webpage link, and the author's publications and grants. Graphs not included. |
| Scraper | Will update the system with new publications and grants and alert users when one is added to the system under their name. | Will update the system with publications only and alert users when one is added to the system under their name. |
| Prediction algorithm | Predicts if the consumer has enough space to use the READ system. | Not included |
| Administrative Privileges | Administrators are able to edit, add, or remove anything in the system. | Administrators are able to edit, add, or remove anything in the system. |

Table 1. Key Prototype Features

2.3. External Interfaces

Interfaces for the READ system will be implemented using a client/server architecture. Because of this, most of the actual interfacing with the system will be done from the users' computers and not physically with the system. All changes to the server will be made from a client system.

2.3.1. Hardware Interfaces

The READ system will include no hardware interfaces other than the hardware on the client's computer. Since READ is accessed in a web browser, the hardware required to run the client will include a screen and an internet connection. The system will not be directly accessible from the server.

2.3.2. Software Interfaces

The software interface will communicate SQL queries between the database and the user interface. Some level of security is necessary between this interface and the user interface so that users cannot make unauthorized changes to the database. A BibTeX parser will parse the BibTeX data retrieved by the Schaefer Scraper and place it into the database.
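One common safeguard at this layer is to pass user input to the database as bound parameters rather than building SQL strings by concatenation, which guards against the SQL injection defined in Section 1.3. The sketch below shows the idea in Python with `sqlite3` standing in for MySQL; the prototype itself specifies PHP scripts, and the `publications` table and function names here are hypothetical.

```python
import sqlite3


def find_publications_by_title(conn, title_fragment):
    """Search titles using a bound parameter instead of string concatenation."""
    # The `?` placeholder makes the driver treat user input as data, never as SQL.
    pattern = "%" + title_fragment + "%"
    rows = conn.execute(
        "SELECT title FROM publications WHERE title LIKE ? ORDER BY title",
        (pattern,),
    ).fetchall()
    return [title for (title,) in rows]


def build_publications_demo():
    """Create an in-memory database with a hypothetical `publications` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE publications (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO publications (title) VALUES (?)",
        [("Scraping the Web",), ("Query Planning",)],
    )
    conn.commit()
    return conn
```

With this approach, an input such as `'; DROP TABLE publications; --` is matched literally against titles and finds nothing, and the table is left intact.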

The Schaefer Scraper will interface with Microsoft Academic in the prototype. It will gather information on author publications from the site in BibTeX format. The Schaefer Scraper process is displayed in Figure 2.

Figure 2. Scraper Process Flow
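To make the BibTeX parsing step concrete, the following Python sketch turns one simple BibTeX entry into a dictionary that could then be written to the database. It is a deliberately limited illustration and not the parser used by READ: it handles only flat fields written as `name = {value}` or `name = "value"`, and the function name and sample entry are invented for the example.

```python
import re

# Matches "@type{key, ...body... }" for a single entry.
ENTRY_RE = re.compile(r'@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$', re.DOTALL)
# Matches "name = {value}" or 'name = "value"' without nested braces.
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)")')


def parse_bibtex_entry(text):
    """Parse one simple BibTeX entry into a dict.

    Nested braces, string macros, and files with multiple entries are out
    of scope for this sketch.
    """
    match = ENTRY_RE.match(text.strip())
    if match is None:
        raise ValueError("not a recognizable BibTeX entry")
    entry_type, cite_key, body = match.groups()
    record = {"type": entry_type.lower(), "key": cite_key}
    for name, braced, quoted in FIELD_RE.findall(body):
        # Exactly one of the two value groups is non-empty for a matched field.
        record[name.lower()] = braced if braced else quoted
    return record
```

A production system would more likely rely on an established BibTeX parsing library, since real entries often contain nested braces and special characters.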

2.3.3. User Interfaces

The user interface will consist of a website hosted on ODU's CS web servers. The webpages will allow users to log in, edit publication and grant information, and view publication and grant information. Publications will be kept on a separate page from grants. A welcome page will display recent publications and grants when the user first comes to the website. Each author will have a user profile page, which will be accessible through the faculty list page. The faculty list page will also include, at the bottom, the profiles of graduate students who are authors. The site map is displayed in Figure 3.

[Site map diagram; node labels: READ Homepage, Publication, Grant Administration, User Profile, Add Publications, Add Grants, Edit Publications, Edit Grants]

Figure 3. Site Map

An automated email system and RSS feed will also be in place to inform users about publications added to the database. The email system will inform users that they have publications that need approval. Each email will include a link that directs the user to the publication that needs review.
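The approval notice described above could be composed along the following lines. This Python sketch only builds the message and does not send it; the sender address, subject line, wording, and function name are placeholders rather than details taken from the READ design.

```python
from email.message import EmailMessage


def build_review_email(author_email, publication_title, review_url,
                       sender="read-noreply@example.edu"):
    """Compose (but do not send) a notice that a publication needs approval."""
    msg = EmailMessage()
    msg["From"] = sender  # placeholder address
    msg["To"] = author_email
    msg["Subject"] = "READ: publication awaiting your approval"
    msg.set_content(
        "A publication was added to READ under your name and needs review:\n\n"
        f"    {publication_title}\n\n"
        f"Review it here: {review_url}\n"
    )
    return msg
```

The resulting message could be handed to `smtplib.SMTP.send_message` for delivery once a mail server is available.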

2.3.4. Communications Protocols and Interfaces

Hypertext Transfer Protocol (HTTP) will be used to interface with web browsers. Transmission Control Protocol (TCP) and Internet Protocol (IP) will also be used. No other communication protocols are currently planned.
