Running Head: Troy M. Connor Lab 1

Lab 1 – READ Product Description
Troy M. Connor
CS411W
Janet Brunelle
February 18, 2013

TABLE OF CONTENTS
1 INTRODUCTION
2 READ PRODUCT DESCRIPTION
  2.1 Key Product Features and Capabilities
  2.2 Major Components (Hardware/Software)
  2.3 Target Market/Customer Base
3 READ PRODUCT PROTOTYPE DESCRIPTION
  3.1 Prototype Functional Goals and Objectives
  3.2 Prototype Architecture (Hardware/Software)
  3.3 Prototype Features and Capabilities
  3.4 Prototype Development Challenges
4 PROTOTYPE DESCRIPTION
  4.1 Major Functional Components (Hardware and Software)
  4.2 Features and Capabilities
5 GLOSSARY

FIGURES
Figure 1
Figure 2
Table 1

1 Introduction
In the United States there are over 4,700 research institutions (Digest of Education and Statistics). The public usually learns about the research done at these institutions through online lists. These lists are typically maintained by hand, which causes problems for people looking for specific research interests: the pages are rarely kept up to date and do not adequately show which research or publications have been added. Old Dominion University's Computer Science Department (ODUCS) is among the institutions with this issue. Its web page listing publications has not been updated since 2008, and its records of research grants awarded to the University are likewise disorganized and out of date. The underlying problem is that entering grants and publications into the system is a single person's responsibility. As a result, people interested in Old Dominion University's research cannot find current publications or grants.
The Repository for Electronic Aggregation of Documents (READ) is being developed by members of ODUCS. It is designed to serve not only the Computer Science Department but other departments as well. READ automates adding publications and research grants to an online web page for the public to view, making publications and grants easier to find and promoting interest in any research institution that uses it.

2 READ Product Description
READ is an online database system that will house grants, publications, and links, and will let viewers browse them using a number of specific filters. The goal is to minimize how much authors must manage their publications by hand. READ will advertise ongoing research and available grants more efficiently to anyone who needs them. READ will let users work less in order to READ more.

2.1 Key Product Features and Capabilities
READ will use a tool known as the Schaefer Scraper™ to search different websites for publications that match a registered author's credentials. The Schaefer Scraper searches the Internet for current faculty members' recent publications and research grants and records the necessary data in a database. READ will store each publication as a link to the document. The author will receive an email notification asking them to authorize the publication. Over time, the system will learn when not to notify a user based on patterns provided by the author. This gathers all of an author's publications into a single location with little extra work on the author's part. Viewers will be able to browse READ using filters for grants, multiple types of publications, and authors.
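The matching and notification steps in Section 2.1 can be sketched as below. This is a minimal illustration under stated assumptions, not the Schaefer Scraper's actual behavior: the record format (`ScrapedRecord`), the matching rule (a case- and whitespace-insensitive name comparison standing in for "credentials"), and the notification wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Author:
    """A registered READ author (hypothetical fields for illustration)."""
    name: str
    email: str


@dataclass
class ScrapedRecord:
    """One publication reported by the scraper (hypothetical format)."""
    title: str
    url: str
    authors: list[str]


@dataclass
class PendingPublication:
    """A publication stored as a link and awaiting the author's approval."""
    title: str
    url: str
    author_email: str
    approved: bool = False


def normalize(name: str) -> str:
    # Assumption: compare names ignoring case and extra whitespace.
    # Real credential matching would likely need more than a name.
    return " ".join(name.lower().split())


def match_records(records, authors):
    """Pair each scraped record with every registered author it lists."""
    by_name = {normalize(a.name): a for a in authors}
    pending = []
    for rec in records:
        for listed in rec.authors:
            author = by_name.get(normalize(listed))
            if author is not None:
                # Store only the link; the entry stays unapproved until the author confirms.
                pending.append(PendingPublication(rec.title, rec.url, author.email))
    return pending


def notification_text(p: PendingPublication) -> str:
    # Body of the hypothetical email asking the author to authorize the entry.
    return f"READ found '{p.title}' ({p.url}). Please confirm it is yours."
```

Matching on a normalized name alone would confuse authors who share a name, which is one reason each publication waits for the author's authorization before it is shown.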
Viewing an author's profile will show personal information and a graphical representation of both the publications they have written and the funding they have received through grants they participated in, along with a list of those publications. This statistical information can inform viewers about an author's area of expertise and level of activity, and provides a point of contact for anyone interested in research on similar topics.

2.2 Major Components (Hardware/Software)
The READ solution will consist of three major components: a web interface, a database, and the Schaefer Scraper. The web interface contains both public and private sections. The section that allows authors to manipulate publication data will be private, and authentication will allow authors to log in and out of the system. The database will store the publications of the authors using the system. Its primary focus will be to present the publications that authors have produced, allowing the public to see the publications and grants that Old Dominion University faculty and staff currently have available. The Schaefer Scraper is the last part of READ. This software will automatically search websites for publications by authors in the Old Dominion University Computer Science Department. At regularly scheduled intervals, the Schaefer Scraper will export the data it finds for Old Dominion University Computer Science faculty to a text file. This data will then be parsed to add publications to the database.

2.3 Target Market/Customer Base
The initial customer for READ is Old Dominion University's Computer Science Department. The issue arose when Dr. Weigle, who publishes frequently, saw her work going unrecognized.
According to the Computer Science website, the department has 37 faculty members, 11 currently enrolled Ph.D. students, and 111 currently enrolled Master's students. All of these potential authors could benefit from this service once it is implemented. Once testing at ODUCS succeeds, READ could be used by other departments at Old Dominion University, and potentially by other universities, government facilities, non-profit research institutions, or even libraries.

3 READ Product Prototype Description
The READ prototype will be modeled using the Old Dominion University Computer Science Department's publications and hosted on a virtual machine running a Linux-based operating system. The prototype will include all of the features of the Real World Product (RWP) except the prediction algorithm that calculates the size and storage needs of an expanding database. It will serve as a template for the Real World Solution provided to ODUCS. Once completed, the prototype will be a vital and usable tool for organizing the University's publications and grants.

3.1 Prototype Functional Goals and Objectives
The READ prototype will be able to search through the database, filter the results based on user queries, and provide RSS feeds. It will also allow users to log on, edit, and upload data; give administrators functional control of the web application; and use the Schaefer Scraper to find publications and insert them into the READ database. Together, these functions will give easy access to the many publications, grants, and technical reports written by Old Dominion University faculty.

3.2 Prototype Architecture (Hardware/Software)
Figure 1
The READ prototype will allow a user to log into the system through a web-based interface. The user can then search for publications or add them to the database.
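The login behavior in Sections 3.1 and 3.2 implies a small role model: the public can view and search, authors can add, upload, and edit entries under their own name, and administrators can change anything in the system (as Table 1 states). The sketch below encodes that as a role-to-permission mapping. It is a minimal sketch under assumptions: the role names, the action strings, and the `can` helper are hypothetical, and a real deployment would tie roles to the Old Dominion University Computer Science accounts the login interface uses.

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    PUBLIC = "public"   # anonymous visitor
    AUTHOR = "author"   # logged-in faculty member
    ADMIN = "admin"     # administrator with full control


# Hypothetical action names; each role lists the actions it may perform.
PERMISSIONS = {
    Role.PUBLIC: {"view", "search"},
    Role.AUTHOR: {"view", "search", "add", "upload", "edit_own"},
    Role.ADMIN: {"view", "search", "add", "upload", "edit_own",
                 "edit_any", "delete_any", "manage_accounts"},
}


def can(role: Role, action: str,
        owner: Optional[str] = None, user: Optional[str] = None) -> bool:
    """Return True if `role` may perform `action`.

    Editing requires either the edit_any permission, or edit_own plus
    the entry's owner being the user who is logged in.
    """
    allowed = PERMISSIONS[role]
    if action == "edit":
        if "edit_any" in allowed:
            return True
        return "edit_own" in allowed and owner is not None and owner == user
    return action in allowed
```

Keeping the rules in one mapping means each page asks a single question (`can(role, action)`) instead of repeating role checks, and new administrator actions can be added in one place.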
The public will be able to view all grants and publications from any computer connected to the Internet. The Schaefer Scraper will run at an interval, defined by the READ administrators, to populate the database with grants and publications.

3.3 Prototype Features and Capabilities
The READ prototype will be able to store and find various kinds of publications and research grants and to display this information to different users. It will also notify users of the publications found so that they can confirm each publication's authenticity, and it will let users upload publication or research grant data themselves. This addresses the problem by significantly reducing the effort that authors must put into keeping the system current.

The READ prototype will also reduce several risks inherent in this kind of application. The legal risk of displaying copyrighted papers will be reduced by having a READ administrator review the database and remove any papers that ODU does not have the legal right to display. Compatibility risks will be addressed by providing alternate versions of pages that work with different web browsers. Security risks will be addressed with encryption, while SQL injection in particular will be guarded against by handling user input safely, for example with parameterized queries. The risk posed by publications that exist only in hard copy will be handled by requiring users to scan documents before uploading them.

3.4 Prototype Development Challenges
Challenges the READ prototype will face include learning the architecture of the Schaefer Scraper and implementing it. Most of the group members have examined the Schaefer Scraper, and putting it to use will not be as simple as installing it: the final stages of the tool still have to be implemented before it can be used properly.
As it stands, the tool cannot retrieve the data needed to populate the database. Another challenge will be getting someone from the systems group to act as an administrator for the system. This challenge is resolved because one group member is part of the systems group.

4 Prototype Description
The READ prototype will include populated test data and an access control interface for logging in as an administrator or a faculty member. The prototype will represent the Real World Problem and demonstrate what a simple Real World Solution can achieve. It will simulate the full product by allowing all levels of users (user, faculty, and administrator) to see the publications and grants. Faculty members will be able to add citations and thumbnail images. Administrators will be able to promote members, create accounts, suspend accounts, delete publications, and delete grants. All users will be able to filter publications by year, author, and type of publication. A user will be able to view all of a particular author's publications and use a date range slider to filter them. Users will also be able to see each faculty member's statistics, along with sparklines that visualize them.

4.1 Major Functional Components (Hardware and Software)
The major functional components of READ are the user interface, a database, and software to scrape sites (the Schaefer Scraper). These components will allow the READ prototype to be implemented on the ODU CS server.

Figure 2

4.2 Features and Capabilities
The following features and capabilities will be implemented in the READ prototype. The search and filter feature will allow users to specify how they want to look for publications and grants.
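The filters described in Section 4 (author, type of publication, and a date range) map naturally onto a parameterized SQL query, which also shows the usual defense against the SQL injection risk noted in Section 3.3: user-supplied values are passed as bound parameters instead of being pasted into the SQL text. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name, columns, sample rows, and the use of whole years for the slider are assumptions for illustration; the real READ database may use a different engine and schema.

```python
import sqlite3
from typing import Optional


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical publications table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE publications (title TEXT, author TEXT, pub_type TEXT, year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO publications VALUES (?, ?, ?, ?)",
        [
            ("Paper A", "Jane Doe", "conference", 2010),
            ("Paper B", "Jane Doe", "journal", 2012),
            ("Report C", "John Roe", "tech report", 2011),
        ],
    )
    return conn


def filter_publications(conn: sqlite3.Connection,
                        author: Optional[str] = None,
                        pub_type: Optional[str] = None,
                        year_from: Optional[int] = None,
                        year_to: Optional[int] = None) -> list:
    """Return titles matching every filter that was supplied, oldest first.

    Only fixed SQL fragments are concatenated; every user value goes into
    `params`, so input such as "x' OR '1'='1" is compared as plain text.
    """
    clauses, params = [], []
    if author is not None:
        clauses.append("author = ?")
        params.append(author)
    if pub_type is not None:
        clauses.append("pub_type = ?")
        params.append(pub_type)
    if year_from is not None:   # lower end of the date range slider
        clauses.append("year >= ?")
        params.append(year_from)
    if year_to is not None:     # upper end of the date range slider
        clauses.append("year <= ?")
        params.append(year_to)
    sql = "SELECT title FROM publications"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY year"
    return [row[0] for row in conn.execute(sql, params)]
```

For example, `filter_publications(make_demo_db(), author="Jane Doe", year_from=2011)` returns `['Paper B']`. Building the WHERE clause only from filters the user actually set keeps a single function usable for the search page, an author's profile, and the slider.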
The profile-edit page will let authors customize their pages. The RSS feed will show the publications and grants most recently added to the database. The Schaefer Scraper will pull data into the database at timed intervals to populate it automatically. Access controls will give authors and other users different roles when using the system. The READ prototype will accomplish everything set out for the Real World Problem's solution except sparklines.

Table 1
| Feature | Real World Project | Prototype |
|---|---|---|
| Browsing capabilities | Ability to browse all grants and publications | Ability to browse all grants and publications |
| Publication filtering capabilities | Filtered by title, publisher, authors, publication date, date added, and keywords | Filtered by title, publisher, authors, publication date, date added, and keywords |
| Grant filtering capabilities | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state |
| Add, edit, and delete publications and grants | Included. A thumbnail image and files may be associated with the document. Fields can be filled in automatically from a BibTeX document. | Included. A thumbnail image and files may be associated with the document. Fields can be filled in automatically from a BibTeX document. |
| Faculty page | Lists faculty and provides a link to each person's profile page | Not included |
| Login interface | Linked to Old Dominion University Computer Science accounts | Linked to Old Dominion University Computer Science accounts |
| Profile page | Displays the author's profile picture, job title, email address, personal webpage link, publications, and grants. Displays graphs. | Displays the author's profile picture, job title, email address, personal webpage link, publications, and grants. Graphs not included. |
| Scraper | Updates the system with new publications and grants and alerts users when one is added under their name | Updates the system with publications only and alerts users when one is added under their name |
| Prediction algorithm | Predicts whether the consumer has enough space to use the READ system | Not included |
| Administrative privileges | Administrators are able to edit, add, or remove anything in the system | Administrators are able to edit, add, or remove anything in the system |

5 Glossary
Administrator/Administrative User: A user with increased privileges for editing database content.
Author: A person who is able to add and edit publications and grants in the system under their name.
BibTeX: A plain-text file format for bibliographic reference information, commonly used with LaTeX. It will be used to automatically fill in key information when uploading or editing publications and grants.
Computer Science (CS): An academic discipline based on advancing computing theory and algorithm development, which sometimes includes theory about software engineering methods.
Client application: In a client/server architecture, the module that takes input, creates queries to be processed by a server, and receives the results from the server.
Client/Server Architecture: A software engineering paradigm that separates functionality into a "client" application and a "server" application that interact.
CSS (Cascading Style Sheets): A style sheet language used to specify the presentation of HTML pages.
Data Mining: The act of going through a source of input to find specific information.
Database Schema: A description of the structure of a database.
Funding Agency: The source of funds for research grants. These organizations usually have a limited amount of money to award to principal investigators who submit an accepted application for research funds.
Git: A version control system for tracking and organizing changes to software.
Google Scholar (http://scholar.google.com): Google Scholar provides a simple way to broadly search for scholarly literature. From one place, users can search across many disciplines and sources, including articles, theses, books, abstracts, and court opinions from academic publishers, professional societies, online repositories, universities, and other websites. Google Scholar helps users find relevant work across the world of scholarly research.
Graphical User Interface (GUI): A computer interface composed of icons, text fields, menus, and similar elements, through which a user interacts with a software application using a mouse and keyboard. The term distinguishes it from a "command-line interface", in which a user interacts with a software application solely through a text terminal.
Internet scraper (web scraper): Web scraping focuses on transforming unstructured data on the web, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet (Wikipedia).
jQuery Sparklines: A jQuery plugin for generating sparklines, small inline charts used to visualize data.
ODU: Old Dominion University.
Microsoft Academic (http://academic.research.microsoft.com/): Microsoft Academic Search is a free service developed by Microsoft Research to help scholars, scientists, students, and practitioners quickly and easily find academic content, researchers, institutions, and activities. It not only indexes millions of academic publications but also displays the key relationships between and among subjects, content, and authors, highlighting the links that help define scientific research. Its features and visualizations let users take their searches in new directions.
MySQL: A relational database management system that is queried using SQL (Structured Query Language).
Parse: A technical term usually used to describe the processing of a statement written in a programming language. More generally, it may describe processing any statement for its specific meaning.
Perl: A widely used programming language, often used on the server side of web applications.
PHP: A widely used programming language on the server side of web applications.
Principal Investigator (PI): The primary researcher upon whom a research grant is bestowed.

References
"Delta Cost Project Data." The Delta Project on Postsecondary Education Cost, Productivity, and Accountability. The Delta Project, n.d. Web. 9 Feb. 2013.