Legal Data Markup Software CS501 Requirements Presentation October 4, 2000

Project Team
Sponsors
Developers
 Professor William Arms
 Ju Joh
 Professor Thomas Bruce
 Sylvia Kwakye
 Jason Lee
 Nidhi Loyalka
Reviewer
 Omar Mehmood
 Amy Siu
 Charles Shagong
 Brian Williams
Introduction



Objective: US Code (ASCII) -> well-formed, valid XML output
XML output used as input to other applications
Goal of end use: making law available for general public use
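
A minimal sketch of what "well-formed" means in practice. It is written in Python purely for illustration (the project itself targets Perl), and the function name `is_well_formed` and the sample element names are assumptions, not part of LDMS. Checking validity against a DTD needs a validating parser and is not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical element names, used only to exercise the check.
good = "<title><section>Text</section></title>"
bad = "<title><section>Text</title>"  # <section> is never closed
```

Well-formedness only guarantees that tags nest and close correctly; validity additionally requires that the output follows the element structure declared in the DTD.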
References

Current version of code: US Code -> HTML
XML tutorials and FAQs
Tasmanian SGML DTDs (EnAct)
W3C XML draft specification
The Perl CD Bookshelf



Overview





Functional Requirements
Usability Requirements
Minimum Performance Requirements
Design Constraints
Supportability Requirements
US Code

Acts of Congress (Law)



50 Titles (e.g. Armed Forces, Bankruptcy, Copyrights, Labor, Patents, Transportation)
Constantly updated by Congress
Each Title posted online with revisions in ASCII format
Legal Information Institute

Associated with Cornell’s Law School

Founded in part by Thomas Bruce

Goal: Publish the US Code on the Web in a presentable format for the general public
Problems

Current version has difficulties with:



Overall US Code structure variations
Tables, footnotes, appendices
HTML lacks the archival qualities of XML, since it fails to show structural relationships.
Title 1 (LII HTML)
Title 1 (ASCII from Congress)
Title 26 (LII HTML)
Title 26 (ASCII from Congress)
Title 50 (LII HTML)
Title 50 (ASCII from Congress)
Solution

LDMS will have to:
Maintain structural layout of US Code
Generate cascading table of contents
Allow title or full text search
Mark up and preserve notes
Link cross-references
Preserve catchlines
Generate appendices
Highlight reserved words
Functionality


Directly follows from client-specified qualities
Functional requirements

Table of Contents Generation
Direct representation of the hierarchy inherent to the structure of the US Code
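
One way to picture the hierarchy requirement is the sketch below, which nests a flat list of headings into a tree. It is Python for illustration only (the project language is Perl), and the `(depth, text)` input shape, the function name `build_toc`, and the sample headings are all assumptions rather than the LDMS design.

```python
def build_toc(headings):
    """Nest (depth, text) pairs into a tree of {'text', 'children'} nodes."""
    root = {"text": None, "children": []}
    stack = [(-1, root)]  # (depth, node) path from the root to the latest node
    for depth, text in headings:
        node = {"text": text, "children": []}
        # Pop back up until the top of the stack is a shallower heading.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root["children"]


# Hypothetical headings: depth 0 = title, 1 = chapter, 2 = section.
sample = [
    (0, "TITLE 1"),
    (1, "CHAPTER 1"),
    (2, "Sec. 1"),
    (2, "Sec. 2"),
    (1, "CHAPTER 2"),
]
toc = build_toc(sample)
```

Each node's `children` list mirrors the cascade a reader would expand, so the same tree can drive both the XML nesting and a rendered table of contents.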
Functionality

Functional requirements

Appendices Generation
LDMS will recognize appendix sections and mark up their constituent elements

Catchline Handling
LDMS will recognize short headers in US Code and mark them appropriately
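
Catchline recognition could be approached with pattern matching, as in this hedged Python sketch (the project language is Perl). The heading shape `Sec. <number>. <text>`, the `<catchline>` element, its `section` attribute, and the function name are assumptions made for illustration; the real ASCII layout and DTD may differ.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumed heading shape: "Sec. 101. Short header text"
SECTION_HEAD = re.compile(r"^Sec\. (\S+?)\. (.+)$")


def mark_catchline(line):
    """Return a <catchline> element for a section heading, or None."""
    match = SECTION_HEAD.match(line.strip())
    if match is None:
        return None
    number, catchline = match.groups()
    # quoteattr adds the surrounding quotes; escape protects &, < and >.
    return "<catchline section=%s>%s</catchline>" % (
        quoteattr(number),
        escape(catchline),
    )
```

Returning `None` for non-matching lines lets the caller fall through to other recognizers instead of guessing.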
Functionality

Functional Requirements
Preservation of Cross-references
LDMS will recognize self-referential links by establishing anchors and links between text sections

Table Handling
LDMS will recognize tabular data in US Code, marking up and organizing data elements into proper dimensions and indices
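
The anchor-and-link idea for cross-references can be sketched as follows. This is illustrative Python rather than the project's Perl, and the phrase pattern `section <number>`, the `<ref>` element, and the `sec-<number>` anchor scheme are assumptions, not the LDMS DTD.

```python
import re

# Assumed reference wording: "section 101", "section 7a", ...
XREF = re.compile(r"\bsection (\d+[a-z]*)\b")


def link_cross_references(text, known_sections):
    """Wrap references to known sections in <ref> links; leave others alone."""

    def replace(match):
        number = match.group(1)
        if number not in known_sections:
            return match.group(0)  # unresolved reference stays plain text
        return '<ref target="sec-%s">%s</ref>' % (number, match.group(0))

    return XREF.sub(replace, text)
```

Only references whose target anchor is known are linked, so a mistaken match degrades to ordinary text instead of a dangling link.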
Functionality

Functional Requirements
Preservation of Notes
Critical for references, background information, and sources
LDMS will recognize notes

Reserved Words Recognition
Reserved words attach critical attributes to entire subdivisions of text
LDMS will mark up applicable text
Functionality

Functional Requirements

Graceful Failures
LDMS will mark up unrecognizable variations in US Code titles as such. If at all possible, LDMS will maintain readability despite the graceful failure.

Special Character Handling
Non-standard characters have different meanings
LDMS will recognize, mark up and represent non-conventional characters
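
A hedged sketch of the graceful-failure behavior: when no recognizer claims a line, the line is kept, escaped, and flagged. It is Python for illustration (the project language is Perl); the `<unrecognized>` element and the recognizer interface are assumptions. The escaping shown covers only the XML-reserved characters `&`, `<` and `>`; mapping the Code's non-conventional characters is a separate step not shown here.

```python
from xml.sax.saxutils import escape


def mark_up_line(line, recognizers):
    """Try each recognizer in turn; flag the line if none accepts it."""
    for recognize in recognizers:
        result = recognize(line)
        if result is not None:
            return result
    # Graceful failure: keep the text readable and mark it as unrecognized.
    return "<unrecognized>%s</unrecognized>" % escape(line)


def note_recognizer(line):
    """Hypothetical recognizer: treats lines starting with 'NOTE' as notes."""
    if line.startswith("NOTE"):
        return "<note>%s</note>" % escape(line)
    return None
```

Because the fallback still escapes its content, the output stays well-formed even when the structure could not be identified.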
Functionality

Functional Requirements

Navigational Aids
LDMS will facilitate next/previous reference links.

Known Data Input Path
Raw ASCII US Code input located in known directory
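
Next/previous links reduce to pairing each section with its neighbors in document order. The Python sketch below is illustrative only (the project language is Perl), and the function name and the section identifiers are assumptions.

```python
def neighbours(section_ids):
    """Map each section id to its (previous, next) ids; None at either end."""
    links = {}
    for index, section_id in enumerate(section_ids):
        previous_id = section_ids[index - 1] if index > 0 else None
        next_id = section_ids[index + 1] if index + 1 < len(section_ids) else None
        links[section_id] = (previous_id, next_id)
    return links
```

The first and last sections get `None` on the open side, which a renderer can use to omit the corresponding link.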
House of Quality (HOQ)
[Figure: matrix relating client requirements to engineering requirements]
Client requirements (rows): Appendices, Special Characters, Cross Ref., Structural Layout, Tables, TOC, Catch Line, Notes, Next/Prev, Graceful Failure, Magic Word
Engineering requirements (columns): XSL, ASCII -> Unicode, Word Pattern Matching, Special DTD Tags, White Space Pattern Matching, State Machine
Difficulty and Importance ranked least to most (1 to 6)
“+” Positive correlation between two requirements.
Usability

Development and Application Environment



Red Hat Linux running on Leda
Cron daemon will execute software at client-specified intervals
Two levels of users for human operation of LDMS
Normal users
Power users
Normal Users





Computer literacy assumed
Familiarity with Linux operating system
Required to start and/or stop program from Linux command line window
Application Manual provided for training
30-60 minutes expected training time
Power Users

Familiarity with:
Linux operating system
Perl programming language
XML, DTDs and US Code
LDMS source code

Standard development directory with source code documentation, help files, and manual page will be provided
One week expected training time
Time Estimation for Measurable Tasks

Given specifications of Leda, estimates for conversion of all fifty titles of US Code to XML:
30 minutes to read US Code in its entirety
12-24 hours for conversion processing
Status Messages

During execution of LDMS, display status messages at client-specified intervals, notifying the user of the progress within the current title.
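
A minimal sketch of interval-based status reporting, in Python for illustration (the project language is Perl). It assumes the interval is counted in input lines; the slide does not say whether the client-specified interval is a line count or a time period. The function name, message wording, and `report` callback are likewise assumptions.

```python
def process_with_status(lines, title, interval, report=print):
    """Process lines, reporting progress every `interval` lines and at the end."""
    total = len(lines)
    for count, line in enumerate(lines, start=1):
        # ... conversion of `line` would happen here ...
        if count % interval == 0 or count == total:
            percent = 100 * count // total
            report("Title %s: %d of %d lines (%d%%)" % (title, count, total, percent))
```

Passing the reporter as a parameter keeps the same loop usable from the command line (printing) and from a cron job (appending to a log).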
Reliability

Availability
Available for use 100% of the time

Mean Time Between Failures (MTBF)
Product designed to fail gracefully
Exceptional errors should not occur within useful lifetime of 3 years
Reliability

Mean Time To Repair (MTTR)
In case of product failure, MTTR depends on nature of fault

Cause: Transient error in underlying platform
MTTR: Time taken for the job to be restarted
Cause: Fatal error in underlying platform
MTTR: Time taken to restart the system
Cause: Semantic error within program
MTTR: Requires repair by reprogramming offending part of product
Cause: Error in input
MTTR: Time required to correct input and/or output manually
Reliability

Accuracy
Paramount to success of project
Must generate XML that reproduces original structure within defined tolerances
Validation and integrity testing performed using XSL stylesheet to view generated XML
Various components and tolerance levels of accuracy are:
Structure represented by XML output: 95% accuracy
Table of Contents: 95% accuracy
Reliability







Reserved Words: 95% accuracy
Cross-references: 75% accuracy
Appendices: 75% accuracy
Catchlines: 95% accuracy
Preservation of Notes: 75% accuracy
Handling Tables: 75% accuracy
Handling Special Characters: 75% accuracy
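
The tolerance levels listed on the accuracy slides can be written down as a lookup table and checked mechanically. In this Python sketch (illustrative; the project language is Perl) the thresholds come from the slides, while the key names and the function `failing_components` are assumptions. How each accuracy figure is measured is not specified in the requirements and is left to the caller.

```python
# Minimum acceptable accuracy per component, as stated on the slides.
TOLERANCES = {
    "structure": 0.95,
    "table_of_contents": 0.95,
    "reserved_words": 0.95,
    "cross_references": 0.75,
    "appendices": 0.75,
    "catchlines": 0.95,
    "notes": 0.75,
    "tables": 0.75,
    "special_characters": 0.75,
}


def failing_components(measured):
    """Return, sorted, the components whose measured accuracy is below tolerance.

    A component missing from `measured` counts as 0.0 and therefore fails.
    """
    return sorted(
        name
        for name, minimum in TOLERANCES.items()
        if measured.get(name, 0.0) < minimum
    )
```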
Reliability

Acceptable Bugs
Delivering a perfect program is impossible
Bugs and defects not directly affecting usability of program or accuracy of output will be deemed tolerable
Supportability

Output file naming convention
Take input filename, attach “.xml” extension

Source Level Documentation
All code; use peer review
Standard Unix Manual (Man) Page
Program Design Document (PDD)
DTD Design Document (DDD)
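
Read literally, the naming convention appends ".xml" to the whole input filename instead of replacing an existing extension. The Python sketch below (illustrative; the project language is Perl) follows that literal reading. The function names and sample filenames are assumptions.

```python
def output_name(input_filename):
    """Attach the .xml extension to the input filename, keeping the original name."""
    return input_filename + ".xml"


def plan_outputs(input_filenames):
    """Pair each input filename with the output filename it would produce."""
    return [(name, output_name(name)) for name in input_filenames]
```

Keeping the input name intact makes it easy to trace any XML file back to the ASCII file it was generated from.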
Performance

Transaction Response Time
Average per US Code Title: 30 min ± 10 min
Capacity: 1 transaction at a time

Resource Utilization
12 MB System Memory
2 MB – Interpreted Perl code
5 MB – Input data buffer
5 MB – Output data buffer
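
The per-title figure and the one-at-a-time capacity imply a total run time for all fifty titles, and the memory breakdown should add up to the stated total. The short Python calculation below works both out; the variable and function names are assumptions.

```python
TITLES = 50  # number of US Code titles, from the slides


def total_hours(minutes_per_title, titles=TITLES):
    """Hours to convert every title when titles are processed one at a time."""
    return minutes_per_title * titles / 60.0


nominal = total_hours(30)       # 30-minute average per title
fastest = total_hours(30 - 10)  # every title at the low end of the range
slowest = total_hours(30 + 10)  # every title at the high end of the range

# Interpreted Perl code + input buffer + output buffer, in MB.
memory_mb = 2 + 5 + 5
```

At the 30-minute average the full run comes to 25 hours, slightly above the top of the 12-24 hour estimate given under Time Estimation, and the three memory components sum to the stated 12 MB.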
Design Constraints




OS: Red Hat Linux (on Leda)
Development Language: Perl
File Input: ASCII
File Output: XML
Development System

leda.law.cornell.edu
233 MHz Pentium II
128 MB RAM
28 GB HDD
Software Interfaces
ASCII
LDMS
DTD
XML
Licensing Requirements

Extendable by Client
Possible Future Revenues
Might use downloaded Library Code
Joint Authorship Agreement written to address Licensing
Joint Authorship Agreement
The undersigned agree to the following:
1. That all code, documentation and other copyright-protected material produced in the course of this CS501 project (PROJECT MATERIAL) shall be understood by all to be the work of joint authors and not as a work made for hire;
2. That the joint authors shall include all the undersigned, the CS501 students working on the project and Thomas R. Bruce;
3. That despite joint authorship there will be no duty on the part of the student authors, individually or as a group, to account for any return on subsequent commercial use or development of the PROJECT MATERIAL;
4. That, in contrast, should Thomas R. Bruce or the Legal Information Institute realize royalties or other direct financial return from licensing any of the PROJECT MATERIAL there will be a duty to account to the other joint authors for any such revenue net of costs; and
5. That the undersigned will use care to assure that the PROJECT MATERIAL does not incorporate code covered by copyright and licensed on terms that are inconsistent with unlimited noncommercial distribution.
Legal, Copyright, and other Notices

No warranty; however, developers will do their best to fulfill requirements, but have no legal duty to do so
Applicable Standards
XML
