Project Proposal Outline

i. Title Page - student does
ii. Approval Page - student does
iii. Table of Contents - student does
1. Introduction and Background
   1.1 Problem Statement - student does
   1.2 Previous Work - student extracts from materials
   1.3 Background - student extracts from materials
   1.4 Glossary - student extracts from materials
2. Project Description
   2.1 Functional Specification
       2.1.1 Functions Performed - customize from below
       2.1.2 Limitations and Restrictions - customize from below
       2.1.3 User Interface Design [if required] - n/a
       2.1.4 Other User Inputs [if required] - n/a
       2.1.5 Other User Outputs [if required] - n/a
       2.1.6 System Data Files - provided below
   2.2 Design Specification
       2.2.1 System Data Flow Diagrams - customize from this document
       2.2.2 System Structure Chart - student does
       2.2.3 System Data Dictionary - provided below
       2.2.4 Equipment Configuration - provided below
       2.2.5 Implementation Languages - provided below
   2.3 Implementation Plan
       2.3.1 Deliverable Items - student does, notes provided below
       2.3.2 Milestone Descriptions - student does
       2.3.3 Milestone Completion Criteria - customize from below
       2.3.4 Schedule of Milestone Completion - provided below
3. References - student does
4. Qualifications - student does
   4.1 Personal Background
   4.2 Courses Taken
   4.3 Programs Written
   4.4 Investigations
   4.5 Projects
5. Grading Criteria - provided below

Detailed Description of Outline Sections

You will be integrating a library system (“XXX” in the text below) into the IntegraL Digital Library Integration infrastructure. Your masters project will be writing this “XXX Integrator” (also called the “IntegraL Plugin”, wrappers, and document schema mappers). Your integrator will run in the background. You do not need to write any interface. Instead you will be creating the linking rules, parameters and document input that the IntegraL infrastructure uses to create the link anchors and links.
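
As a rough illustration of what “document input” can look like, the sketch below marks elements-of-interest in an HTML fragment and gives each one a unique ID. Everything in it is an assumption made for illustration: the class name ElementMarker, the ISBN pattern, and the span markup are hypothetical and are not the IntegraL plugin API or its actual markup format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of IntegraL: marks ISBNs in an HTML string.
class ElementMarker {
    // Assumed pattern: the label "ISBN " followed by a 10-character ISBN.
    private static final Pattern ISBN = Pattern.compile("ISBN (\\d{9}[\\dX])");

    /** Wraps each ISBN in a span that carries a unique ID and a semantic type. */
    static String markIsbns(String html) {
        Matcher m = ISBN.matcher(html);
        StringBuffer out = new StringBuffer();
        int nextId = 1;
        while (m.find()) {
            // The span format is an assumption; a real integrator would emit
            // whatever markup the IntegraL infrastructure expects.
            String marked = "<span id=\"eoi-" + nextId + "\" class=\"isbn\">"
                    + m.group(1) + "</span>";
            nextId++;
            m.appendReplacement(out, "ISBN " + Matcher.quoteReplacement(marked));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(markIsbns("<p>ISBN 0131103628 and ISBN 020161622X</p>"));
    }
}
```

The point of the sketch is only that the integrator adds machine-readable markers (location, type, unique ID) to a page; the links themselves are produced later by the linking rules.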
from Computer Science Project Guide - Page 2

You will need to customize each of these descriptions for your XXX Integrator.

1. Introduction and Background
1.1 Problem Statement - student does
1.2 Previous Work - student extracts from materials
1.3 Background - student extracts from materials
1.4 Glossary - student extracts from materials

Your four sections need to be different from everybody else’s, including your project partner’s. Please write this from scratch yourself. The glossary especially should match your project.

2. Project Description
The purpose of this section is to describe the proposed project in detail: what will you do, how will you do it, and when will you do it.

2.1 Functional Specification
This is a detailed specification of functions performed by the proposed system, from an external or user perspective, not from an internal or programmer viewpoint. Thus, the system is regarded as a black box with various inputs and outputs related by the functions performed by the system. The description should be sufficient for another programmer to implement the system.

2.1.1 Functions Performed
List and briefly describe each of the functions which the system will be designed to perform for its user: what the system will do.

The XXX Integrator ensures that IntegraL can automatically add link anchors to the pages generated by the XXX system. The XXX Integrator will perform the following functions.

1. Parsing pages
Whenever a page of information is about to be displayed to the user, the XXX Integrator will parse the page to identify the elements-of-interest, determine their location and assign each a unique ID. The XXX Integrator will create an XML message that includes the page, its elements-of-interest, and their locations and IDs. It will also pass the page to the IntegraL lexical analyzer to gather keyword elements-of-interest and receive the results.
To distinguish your report from your partner’s, you should list the pages you will be integrating here.
State that more details will be given in section 2.3.1.

2. Pass commands to the XXX system
Whenever the user selects a link to the XXX system from the list of links generated by IntegraL, the XXX Integrator will pass that command to the XXX system. (These links are generated by the linking rules described later, also called mapping or relationship rules.)

2.1.2 Limitations and Restrictions
List and describe each of the internal (self) and external (environment) limitations and/or restrictions on the range of system functions: what the system will not do. DO NOT INSULT THE READER BY INCLUDING ITEMS THAT WOULD NOT BE A SURPRISE.

1. There is no user interface design involved in this project. IntegraL provides the user interface. The XXX Integrator will only provide background functionality.
2. The XXX Integrator only parses documents of certain types. Currently these are HTML documents, but a final determination will be made in September (based on work outside this masters project) whether to also parse PDF and MS Word documents. If so, the appropriate parsing tools will be provided.

2.1.3 User Interface Design
Give a detailed description of the system user interface including diagrams of all the “work” windows (or screens or panes), a table of operations for each work window, and precise descriptions of each operation that the user would regard as unfamiliar. A work window is one that contains data the user is editing, browsing or viewing. This section is required for all programs that engage the user interactively. Refer to the sample in Section 3.4 of this document.

2.1.4 Other User Inputs
Give a precise description of the other inputs to the system including source (human or storage), syntax (format) and semantics (meaning). Give examples. This section is required for all programs that obtain input from their environment non-interactively.

2.1.5 Other User Outputs
Give a precise description of the other outputs of the system including syntax and semantics. Correlate the outputs with the inputs and the functions performed. Give examples. This section is required for all programs that obtain input from their environment non-interactively.

Sections 2.1.3, 2.1.4 and 2.1.5: n/a

2.1.6 System Data Files
Give a precise description of the data files created or maintained by the system. Thus, for example, you would include files in a database and you would exclude executable files and text files.

configuration file: Configuration options for the module, including all URLs, parsing constants, database connection parameters (if any), logging options, etc.
log4j.properties file: IntegraL uses log4j as the logging framework. All logging statements must conform to this specification. Configuration options are also read from the log4j.properties file, which must be present in the classpath of the application.
Database table structure (if any) used by the module.

NOTE: No option shall be hard-coded into the source code. If the student delivers code that has hard-coded constants or configuration parameters, this will result in a significant reduction in grade. List all module-specific configuration options here.

System-wide configuration options:
o Configuration for the Mapping Rules Engine.
o Mapping Rules registry: an XML file that contains a dictionary of types and their applicable mapping rules.
o Logging options for IntegraL
o Database connection parameters

2.2 Design Specification
This is a top-level preliminary or provisional indication of the proposed system architecture and flow. You should correlate system functions with system structure and interface specifications.

2.2.1 System Data Flow Diagrams
This is a hierarchical (or leveled) set of diagrams showing the flow of data elements into and out of the functional units of the program, data stores and environmental sources and sinks.
Labeled arrows denote data flows. This diagram is complementary to the structure chart described next. Refer to the sample in Section 3.4 of this document.

IntegraL is a loosely coupled system built on the IBM Web-Based Intermediary (WBI) proxy platform. IntegraL is a proxy server that allows users to browse the WWW through it.

[Figure: IntegraL data flow. Components shown: IntegraL Request Editor, Web, HTMLTokenizer, Plugin, Mapping Rules Engine]

When the user browses to a page, the browser sends an HTTP request to the website via the IntegraL proxy server. IntegraL modifies the request and stores user parameters for user-browsing analysis. As the destination web server returns its response, the response passes through the IntegraL tokenizer, which parses the HTML stream into HTML tokens. The XXX Integrator (a.k.a. the IntegraL plugin) then receives this tokenized stream and marks up its elements of interest. This marked-up stream of HTML tokens is then sent to the Mapping Rules Engine (MRE). The MRE parses these marked-up tokens and supplements them with links based on the semantic type of the elements of interest (specified by the mapping rules, also called linking or relationship rules).

2.2.2 System Structure Chart(s)
This is a (set of) chart(s) showing the functional units of the system hierarchically organized to show which units call, use or contain other units. Each interface between two units (a call) is annotated with small arrows and data item labels to show the data exchanged between the units. Refer to the sample in Section 3.4 of this document.

[The student should add a flowchart describing the core units of the XXX Integrator]

To distinguish your report from your partner’s, you should include the modules from the XXX system that your project will be integrating within the structure charts.

2.2.3 System Data Dictionary
This is a comprehensive dictionary of all the data items that appear in the system data flow diagrams and the structure charts.
At a minimum it contains, for each data item, its identifier, any abbreviation used instead of the identifier, the name of the type of the data, and a definition of the data item in the form of either a symbolic expression or a precise description. Refer to the sample in Section 3.4 of this document.

IntegraL's data model is based on the HTTP request/response structure. Each request and each response consists of a structured part and a stream part. The structured part corresponds to the header and the stream part corresponds to the body.

    HEADER (DocumentInfo)
        POST http://www.ibm.com/java HTTP/1.0
        User-agent: MyBrowser
        Accept: text/html
        Content-length: 15
    BODY (MegInputStream)
        My name is Paul

Figure 1. An HTTP request containing both a header (structured) part and a body (stream) part.

Figure 1 shows an HTTP request that contains both a header and a body. When IntegraL receives an HTTP request, it is parsed into these two parts. The header information is stored in an object of class DocumentInfo and the body information is made available through an object of class MegInputStream. The body information can then be read from the MegInputStream using its read(...) methods.

    HEADER (DocumentInfo)
        HTTP/1.0 200 Ok
        Server: MyWebServer
        Content-type: text/html
        Content-length: 36
    BODY (MegInputStream)
        <html>
        <h1>Hello, world</h1>
        </html>

Figure 2. An HTTP response containing both a header (structured) part and a body (stream) part.

Figure 2 shows a typical HTTP response. When IntegraL receives this response, it is parsed in the same way as a request. The header information is stored in a DocumentInfo object and, as with the request, the body is made available through a MegInputStream object.

To produce new requests and responses, an IntegraL plugin is given a DocumentInfo object and a MegOutputStream object to manipulate. One may set a property of the DocumentInfo object using either the setRequestHeader(...) or setResponseHeader(...) methods. The HttpHeader, HttpRequest and HttpResponse classes are designed to make producing such headers easier. When the header information has been set appropriately, the IntegraL plugin may begin writing the body content to the MegOutputStream using its write(...) methods.

2.2.4 Equipment Configuration
Describe the equipment you will use to support the operation and development of your system.

All development will be on standard PCs or laptops, using a Java editing environment. Code will be checked into CVS. The final product will run on Linux with an Apache server. Development work can be conducted on any platform, however, and ported to the test and production server.

2.2.5 Implementation Languages
List the programming languages you plan to use for the implementation of your project and give reasons for choosing each language.

Implementation will be in HTML, XML and Java. Pages are displayed to Web browsers in HTML. XML is a standard industry message-passing format and the prescribed format for the IntegraL system. Java is the standard programming language used within the IntegraL system and thus required for compatibility with the system.

2.3 Implementation Plan
This is a description of the plan for implementing the project. Here you commit yourself to a course of action and specify the criteria by which your performance is to be judged. Your final grade will depend, in large measure, upon your success in achieving the goals agreed upon between you and your project advisor.

2.3.1 Deliverable Items
List and describe each of the items you will submit in fulfillment of the project requirements. Deliverable items include, but are not limited to, program executable file(s), program data file(s), program listings, program documentation, user manual and sample program runs.

1. XXX Integrator system that parses the following types of pages generated by the XXX system:
Here list the types of pages in the XXX system that you will integrate. Carefully go through the XXX system and list each kind of page the system generates, including the home page, query pages, query results, content pages, help pages, etc. Note that most systems have a fixed set of between 8 and 30 page types; most screens will fall under one of these types. For people working in pairs on their system, split these pages between the two project proposals. Also list the kinds of elements on each page that IntegraL could place links on (using linking rules, see #2 below). Note that these elements may receive links to services from other systems, so don’t worry if you cannot think of any services from the XXX system for a given element type. Include it anyway.
To distinguish your report from your partner’s, you should provide a detailed description (1 paragraph or more) of each of these pages. After the page description, list the actual elements on that page that you will place links on.

2. Linking rules for the following services and elements within the XXX system:
Here identify the kinds of services that the XXX system can provide for each type of element-of-interest or object. Each service for each type of element will be one linking rule (also called a mapping rule). For example, if a library system were presented with a keyword, it could search for all documents with that keyword. If a library system were presented with an author, it could search for all documents written by that author. If a library system were presented with an ISBN, it could check whether it has that ISBN available. Figure out all the kinds of services your system provides for any kind of object. For people working in pairs on their system, split these services between the two project proposals.
To distinguish your report from your partner’s, you should provide a detailed description of each linking rule. Note from the PowerPoint presentation and some of the documentation that each linking rule has 6 parameters. Give your first cut at the values of these 6 parameters. For the relationship metadata, you can give a short “semantic description” of the link. For the condition, you can state “only for authenticated users” if XXX is a subscription database that the library pays for, or “none” if it is a generally available system.

3. The names and locations of any glossaries and thesauri the XXX system provides:
See if any kind of glossary or thesaurus is available within your system and list these here. If so, state that you will provide the IntegraL team with details about these glossaries/thesauri. Not all systems will have glossaries/thesauri. For people working in pairs on their system, only one of you should list this. Only one partner in a team will work on glossary/thesauri integration. This is a fairly trivial task. This should not appear anywhere in the proposal of the other partner.
Include a description of any glossaries or thesauri the XXX system has: their location and what they contain. Write this yourself. It should not be the same as the description for the integrations that other teams are doing.

4. Search integration for the XXX Integrator.
Most systems provide a search API. You will identify the search API that the XXX system uses and provide it to the IntegraL team. If the XXX system has no search API, then you will need to write a search wrapper, which would be similar in detail to 2 page parsers. (We’ll help you do this.) Determine for your proposal whether your system has a search API. For people working in pairs on their system, one of you must do this. If your system has a search API, then this will be no additional work for you once identified.
If your system needs a search wrapper, one partner should plan to integrate 3 fewer pages (item #1) than the other partner. Only one partner in a team will work on search integration. This should not appear anywhere in the proposal of the other partner.
Include a description of the integration you will have to perform. Write this yourself. It should not be the same as the description for the integrations that other teams are doing.

5. Program code for the XXX Integrator.
6. Detailed documentation for the XXX Integrator.
7. Detailed documentation for each linking rule.
8. Detailed documentation of search interface. Only one partner in a team will work on search integration.
9. others?
Only include this if you know what “others” you are going to do, and then list them explicitly!

2.3.2 Milestone Identification
Identify each of the milestones or check points that mark the completion of some phase of project implementation. Milestones include, but are not limited to, detailed system analysis, system design, file design, module design, system test design, module coding, working breadboard with stubs, working system with stubs, system testing and documentation.

List each of the deliverables from section 2.3.1 in the following items.

1. Detailed analysis of XXX system.
2. Detailed analysis of how to parse each type of page.
3. Detailed analysis of each linking rule.
4. Implementation of parsing each type of page.
5. Implementation of each linking rule.
6. Integration of search into IntegraL’s MetaFed environment.
7. whatever else is relevant (e.g., from paragraph description above)
Only include #7 if you know “whatever else” you are going to do, and then list them explicitly!

2.3.3 Milestone Completion Criteria
List the criteria by which the completion of each milestone is to be judged. If an objective measure is available then it should be specified.
If a personal judgment is required then indicate who will make the determination. This information may be given in tabular form if desired.

Read this and only include what is relevant to your project. Don’t just copy the text directly.

Deliver a technical specification describing the class structure, data flow diagram and database structure (if any) by Week 5.
During development, review with the Project Leader any issues or problems you may be having.
Deliver a working version of your project by Week 12. If your project cannot be executed, this should have been made clear to the Project Leader during the development phase; there should not be any surprises!
Deliver a Project Report in the departmental format by the end of Week 13.

2.3.4 Schedule of Milestone Completion
Prepare a diagram or table giving the proposed completion date for each of the milestones listed in the previous two sections. See the sample in section 3.4 of this document.

List each of the deliverables from section 2.3.1 in the following items.

Weeks 1-3: Research & Analysis
Weeks 3-5: Preparing Technical Specification, obtaining signoff from the Project Leader
Weeks 6-11: Development
Week 12: Testing and final review
Week 13: Final changes, if any.

5. Grading Criteria
In this section you establish and define the criteria governing the grading of your project. Here you specify the relative emphasis you wish to be placed upon the different phases of your project. Assign a weight to each of the deliverable items and/or milestones listed in section 2.3 of the proposal so that the weights sum to one. Display this information in a table which your advisor will use in determining your grade for the project. Refer to the sample in section 3.4 of this document.

You can note that the following criteria have been assigned by Professor Bieber.

Understanding of the basic principles of IntegraL (this must be clear from the documentation to be delivered): 40%
Is the project functional: 10%
Programming style: 10%
Documentation (Tech. Spec, Project Report, code documentation, comments, etc.): 40%
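
The requirement that the weights sum to one can be checked mechanically. The sketch below is only an illustration of that arithmetic using the four weights listed above; the class name GradingWeights, the short criterion labels, and the idea of per-criterion scores on a 0-100 scale are assumptions, not part of the departmental guide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for checking a grading table; not part of the guide.
class GradingWeights {
    /** The four weights listed above, written as fractions of one. */
    static Map<String, Double> defaultWeights() {
        Map<String, Double> w = new LinkedHashMap<>();
        w.put("Understanding of IntegraL", 0.40);
        w.put("Project functional", 0.10);
        w.put("Programming style", 0.10);
        w.put("Documentation", 0.40);
        return w;
    }

    /** Returns the weighted score after verifying that the weights sum to one. */
    static double weightedScore(Map<String, Double> weights, Map<String, Double> scores) {
        double total = 0.0;
        for (double w : weights.values()) {
            total += w;
        }
        // Compare with a small tolerance because of floating-point rounding.
        if (Math.abs(total - 1.0) > 1e-9) {
            throw new IllegalArgumentException("Weights sum to " + total + ", expected 1.0");
        }
        double result = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            // A criterion with no score counts as zero.
            result += e.getValue() * scores.getOrDefault(e.getKey(), 0.0);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (String criterion : defaultWeights().keySet()) {
            scores.put(criterion, 80.0); // assumed example scores
        }
        System.out.println(weightedScore(defaultWeights(), scores));
    }
}
```

If you change any weight in your own table, the same check tells you immediately whether the table still sums to one.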