Legal Data Markup Software CS501 Design Presentation November 9, 2000

Legal Data Markup Software
CS501 Design Presentation
November 9th, 2000
Project Team
Sponsors
Developers
 Professor William Arms
 Ju Joh
 Professor Thomas Bruce
 Sylvia Kwakye
 Jason Lee
 Nidhi Loyalka
Reviewer
 Omar Mehmood
 Amy Siu
 Charles Shagong
 Brian Williams
Introduction



Objective: convert US Code (ASCII) to well-formed, valid XML output
XML output used as input to other applications
Goal of end use: making law available for general public use
Overview





Development Environment
Execution Environment
Software Design
DTD Design
Packaging
Development Environment

Hardware

Server




233 MHz Intel PII processor
128 MB memory
28 GB hard disk
Notebook Computers



400 MHz Intel Celeron processor
96 MB memory
4.7 GB hard disk
Development Environment

Software






Red Hat Linux 6.2
Perl 5.6
SSH Secure Shell 2.3
CVS 1.10.7
Emacs 20.5.1
VIM 5.6
Execution Environment

Caveat
Client upgrades execution hardware and
software environment at own risk.
LDMS not guaranteed to work under
new conditions.
Execution Environment

Naming Standards

General Rule



Filename Naming Convention


Must start with a word in lower case.
First letter of additional words in upper case.
Example: thePerlFile.pl
File Name Length

Maximum of 20 characters.
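The file naming rules above can be checked mechanically. The following is a minimal sketch in Python (the project itself is written in Perl). The function name `is_valid_file_name` is hypothetical, and the sketch assumes that the 20-character limit includes the extension and that names use only letters and digits before a single extension.

```python
import re

# Assumption: a lower-case first word, then zero or more capitalized words,
# then one extension such as ".pl".
_FILE_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*\.[A-Za-z0-9]+$")
MAX_FILE_NAME_LENGTH = 20  # assumed to include the extension


def is_valid_file_name(name: str) -> bool:
    """Return True if name follows the file naming rules from the slide."""
    return len(name) <= MAX_FILE_NAME_LENGTH and bool(_FILE_NAME_PATTERN.match(name))


for candidate in ("thePerlFile.pl", "ThePerlFile.pl", "the_perl_file.pl"):
    print(candidate, is_valid_file_name(candidate))
```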
Execution Environment

Naming Standards

Function Names



Must begin with a verb
Example: initializeModule
Variable names


Must begin with qualifiers
Example: $error_LastErrorMessage
Execution Environment

Naming Standards

Filehandle Names
Must be all capital letters
XML Output File Names
Same as input file name with “.xml” extension
DTD Element Names
Element names in capital letters
Nested element names start with DIV
Execution Environment

Coding Standards





A function shall not exceed 100 lines.
A function shall have preceding comments on its purpose, preconditions, and postconditions.
A variable shall have a purpose comment.
Each loop shall have begin and end comments.
A 3-space indentation shall be used for each block of code.
Execution Environment

Coding Standards




Perl contractions shall not be used.
Each file shall have a modification history log.
Each file shall include a copyright and license notice.
Version number shall correspond to major and minor revisions to software.
Software Design

System Architectural Components





Modules and their descriptions
Design Constraints
Error Handling
Application Environment
User Interfaces
System Architecture
Figure 1: Top-level diagram of major architectural components.
(Diagram labels: Program; Read and Parse File; Language Parsing; Output.)
UML Component Diagram
(Diagram labels: LDMS Main; File Parser with IH and SM; Natural Language with WsPM and WPM; Output with EMH, FC, StM, and XMH.)
File Parser Component
File Parser: Input Handler, State Machine
Natural Language Component
Natural Language: Whitespace Pattern Matching, Word Pattern
Output Component
Output: Error Message, File Creator, Status Message, XML Output Handler
Figure 2: UML class diagram for LDMS
(Classes shown: WhiteSpacePatternMatching, StateMachine, WordPatternMatching, Input, StoreAndOutputErrors, StoreAndOutputFile, CreateFile, Status.)
Design Constraints



8-bit ASCII input files.
Non-uniform title structure.
Unattended operation.
Title Variation Example
-CITE
11 USC Sec. 506
01/23/00
-EXPCITE
TITLE 11 - BANKRUPTCY
CHAPTER 5 - CREDITORS, THE DEBTOR, AND THE ESTATE
SUBCHAPTER I - CREDITORS AND CLAIMS
-HEAD
Sec. 506. Determination of secured status
Title Variation (cont’d)
-CITE
46 USC Sec. 13102
-EXPCITE
TITLE 46 - SHIPPING
Subtitle II - Vessels and Seamen
Part I - State Boating Safety Programs
CHAPTER 131 - RECREATIONAL BOATING SAFETY
-HEAD
Sec. 13102. Program acceptance
01/05/99
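The two excerpts show why the hierarchy cannot be hard-coded: Title 11 runs TITLE, CHAPTER, SUBCHAPTER, while Title 46 runs TITLE, Subtitle, Part, CHAPTER. The following is a minimal Python sketch (the project itself uses Perl) that reads the level names from the catchlines themselves. The function name `parse_catchlines` is hypothetical, and the sketch assumes every catchline has the form "KIND NUMBER - NAME".

```python
def parse_catchlines(lines):
    """Split each 'KIND NUMBER - NAME' catchline into a (kind, number, name) tuple."""
    parsed = []
    for line in lines:
        label, _, name = line.partition(" - ")
        kind, _, number = label.strip().partition(" ")
        # Upper-case the kind so "Subtitle" and "SUBTITLE" compare equal.
        parsed.append((kind.upper(), number, name.strip()))
    return parsed


TITLE_11 = [
    "TITLE 11 - BANKRUPTCY",
    "CHAPTER 5 - CREDITORS, THE DEBTOR, AND THE ESTATE",
    "SUBCHAPTER I - CREDITORS AND CLAIMS",
]
TITLE_46 = [
    "TITLE 46 - SHIPPING",
    "Subtitle II - Vessels and Seamen",
    "Part I - State Boating Safety Programs",
    "CHAPTER 131 - RECREATIONAL BOATING SAFETY",
]

for sample in (TITLE_11, TITLE_46):
    # The depth and the level names differ per title, so both come from the data.
    print([kind for kind, _, _ in parse_catchlines(sample)])
```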
Error Handling



Handled at topmost level.
Processed by StoreAndOutputErrors module.
Standard report format:
<date> <time> <input filename> <user id> <line number> <error message>

Four main categories of errors.
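The standard report format can be illustrated with a short formatter. This is a Python sketch only, not the StoreAndOutputErrors module itself. The function name `format_error_report` is hypothetical, and the date and time layouts are assumptions, since the slide names the fields but not their formats.

```python
from datetime import datetime


def format_error_report(when, input_filename, user_id, line_number, message):
    """Build one report line: date, time, input filename, user id, line number, message."""
    # Assumption: ISO-style date and 24-hour time; the slide does not fix these formats.
    date_part = when.strftime("%Y-%m-%d")
    time_part = when.strftime("%H:%M:%S")
    return f"{date_part} {time_part} {input_filename} {user_id} {line_number} {message}"


print(format_error_report(datetime(2000, 11, 9, 14, 30, 0),
                          "title11.txt", "jdoe", 42, "Unrecognized field tag"))
```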
Error Categories
Error                          Resolution
Improper command.              Print brief usage help, exit.
Output file already exists.    Exit and log error message unless overwrite flag is set.
Linux system error.            Log to standard error, exit.
Non-critical data error.       Tag region as unprocessed, continue.
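As an illustration of the "output file already exists" rule, the check below refuses an existing path unless overwriting was requested. This is a Python sketch under assumptions: the function name `check_output_path` and the choice to signal the error with an exception are hypothetical, and logging is left to the caller.

```python
import os


def check_output_path(path, force=False):
    """Return path if it may be written; raise FileExistsError otherwise."""
    # An existing file is only acceptable when the overwrite flag is set.
    if os.path.exists(path) and not force:
        raise FileExistsError(f"Output file already exists: {path}")
    return path
```

A caller would catch `FileExistsError` at the topmost level, log the message, and exit.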
Application Environment

Preconditions




Input files must exist in a known path.
Required hardware and software must be available.
Sufficient system resources must be free.
Postconditions

A valid, well-formed XML document conforming to our DTD will be produced.
User Interface Design



Very little runtime interactivity required.
Command-line operation.
Allows batch processing.
Command-Line Arguments
Parameter        Effect
-O <filename>    Output XML to <filename>.
-F               Force overwriting of existing file.
-V               Verbose error and status messages.
-L#              Status messages every # lines processed.
-?               Display help message.
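The options above map directly onto a small argument parser. The sketch below uses Python's `argparse` for illustration (the project itself is a Perl script). The program name `ldms`, the attribute names, and the positional `input_file` argument are assumptions; the slide does not say how the input file is passed.

```python
import argparse


def build_parser():
    """Build a parser for the options listed on the slide."""
    # add_help=False frees "-?" to be the help switch instead of the default "-h".
    parser = argparse.ArgumentParser(prog="ldms", add_help=False)
    parser.add_argument("-?", action="help", help="Display help message.")
    parser.add_argument("-O", dest="output", metavar="<filename>",
                        help="Output XML to <filename>.")
    parser.add_argument("-F", dest="force", action="store_true",
                        help="Force overwriting of existing file.")
    parser.add_argument("-V", dest="verbose", action="store_true",
                        help="Verbose error and status messages.")
    # A default of 0 is used here to mean "no status reporting".
    parser.add_argument("-L", dest="status_interval", type=int, default=0, metavar="#",
                        help="Status messages every # lines processed.")
    # Assumption: the input file is given as a positional argument.
    parser.add_argument("input_file", help="U.S. Code ASCII input file.")
    return parser


args = build_parser().parse_args(["-O", "title11.xml", "-F", "-L50", "title11.txt"])
print(args.output, args.force, args.verbose, args.status_interval, args.input_file)
```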
Status Reporting


Frequency of status reports controlled by -L parameter.
Default is no status reporting.
Module Diagrams

Diagrams can be divided into two categories:

Structural diagrams.
   Flow diagram.
Behavioral diagrams.
   Culture diagram.
   Context diagram.
Flow Diagram
(Diagram labels: House; Cornell LII; LDMS; Public; U.S. Code (ASCII); U.S. Code (XML).)
Culture Diagram
(Diagram labels: House; Cornell LII; LDMS; Public. Annotations: “Format of code is not negotiable.”; “Why does publishing take so long?”; “Seriously faulty input must be manually resolved.”; “XML should be double-checked.”)
Context Diagram
(Diagram labels: House of Representatives; Legal Data Markup System; Cornell Legal Information Institute; U.S. Code; XML. Relationship labels: Produces; Uses as Input; Produces; Executes; Downloads; Publishes.)
DTD Schema
STRUCTDIV
TITLEDATA
NAVGROUP
CITE
HEAD
EXPCITE
SOURCE
DIVEXPCITE
DIVSOURCE
STATUTE
(FIELD TAGS)
STATAMEND
DATATEXT
DATATEXTNAME
XREF
The <STRUCTDIV> Tag
Generic tag to define structural divisions. May contain <TITLEDATA>, parsed character data (#PCDATA), or another <STRUCTDIV>.




NAME - Label of division.
VLEVEL - Depth of division.
HLEVEL - Sequential order of division.
EID - Globally unique identifier.
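The nesting and the four attributes can be shown with a short sketch that builds STRUCTDIV elements using Python's `xml.etree.ElementTree`. The function name `make_structdiv`, the counter used for EID, and the sample NAME values are assumptions for illustration. Attribute names are written in upper case as listed above.

```python
import itertools
import xml.etree.ElementTree as ET

# Assumption: a simple running counter is unique enough for a sketch.
_next_eid = itertools.count(1)


def make_structdiv(name, vlevel, hlevel, parent=None):
    """Create a STRUCTDIV element, optionally nested inside another STRUCTDIV."""
    attributes = {
        "NAME": name,                 # label of division
        "VLEVEL": str(vlevel),        # depth of division
        "HLEVEL": str(hlevel),        # sequential order of division
        "EID": str(next(_next_eid)),  # globally unique identifier
    }
    if parent is None:
        return ET.Element("STRUCTDIV", attributes)
    return ET.SubElement(parent, "STRUCTDIV", attributes)


title = make_structdiv("TITLE", 1, 1)
chapter = make_structdiv("CHAPTER", 2, 1, parent=title)
section = make_structdiv("Sec.", 3, 1, parent=chapter)
print(ET.tostring(title, encoding="unicode"))
```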
The <TITLEDATA> Tag
A container for sequences of fields (dashline-tagged text). May contain <NAVGROUP>, <STATUTE>, #PCDATA, or any of the field tags (MISC1-MISC8, REFTEXT, COD, CHANGE, TRANS, EXEC, CROSS, SECREF).
Navigational Tags





<NAVGROUP> - Container for navigational information, such as <CITE>, <HEAD>, and <EXPCITE>.
<CITE> - Label, section number, and title.
<EXPCITE> - Hierarchy of catchlines.
<DIVEXPCITE> - Individual catchline.
<HEAD> - Name of current TOC section.
Content Tags




<STATUTE> - Container for actual legal data.
<SOURCE> - List of relevant sources.
<DIVSOURCE> - Individual sources within a <SOURCE> tag.
<STATAMEND> - Amendments to a statute.
Data Tags



<DATATEXT> - Text that consists of a centered header, followed by content.
<DATATEXTNAME> - Header of the current data.
<XREF> - Cross-reference: a link to another area of the USC.
LDMS Tags in Action
-CITE
1 USC Sec. 1
-EXPCITE
TITLE 1 - GENERAL PROVISIONS
CHAPTER 1 - RULES OF CONSTRUCTION
-HEAD
Sec. 1. Words denoting number, gender, and so forth
…
01/23/00
LDMS Tags in Action
<STRUCTDIV name="Sec." vlevel="3" hlevel="1" eid="112358">
   <TITLEDATA>
      <NAVGROUP>
         <CITE titlenumber="1">
            1 USC Sec. 1
         </CITE>
         <EXPCITE level="3">
            TITLE 1 - GENERAL PROVISIONS
            CHAPTER 1 - RULES OF CONSTRUCTION
         </EXPCITE>
         <HEAD>
            Sec. 1. Words denoting number, gender, and so forth
         </HEAD>
…
01/23/00
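The step from dashline fields to the navigational tags can be sketched as follows. This is a Python illustration, not the project's Perl implementation. The function names `read_fields` and `build_navgroup` are hypothetical. The sketch assumes each field tag sits on its own line as "-NAME" (a trailing dash is tolerated), wraps each catchline in <DIVEXPCITE> as an assumption based on the tag descriptions, and omits the attributes shown in the example output.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: each field starts on its own line as "-NAME" (optional trailing dash).
FIELD_LINE = re.compile(r"^-([A-Z0-9]+)-?$")


def read_fields(text):
    """Group the lines of a section under the dashline field that precedes them."""
    fields, current = {}, None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = FIELD_LINE.match(line)
        if match:
            current = fields.setdefault(match.group(1), [])
        elif line and current is not None:
            current.append(line)
    return fields


def build_navgroup(fields):
    """Build a NAVGROUP element holding CITE, EXPCITE, and HEAD."""
    navgroup = ET.Element("NAVGROUP")
    ET.SubElement(navgroup, "CITE").text = " ".join(fields.get("CITE", []))
    expcite = ET.SubElement(navgroup, "EXPCITE")
    for catchline in fields.get("EXPCITE", []):
        # Assumption: one DIVEXPCITE per catchline.
        ET.SubElement(expcite, "DIVEXPCITE").text = catchline
    ET.SubElement(navgroup, "HEAD").text = " ".join(fields.get("HEAD", []))
    return navgroup


SAMPLE = """\
-CITE
1 USC Sec. 1
-EXPCITE
TITLE 1 - GENERAL PROVISIONS
CHAPTER 1 - RULES OF CONSTRUCTION
-HEAD
Sec. 1. Words denoting number, gender, and so forth
"""

navgroup = build_navgroup(read_fields(SAMPLE))
print(ET.tostring(navgroup, encoding="unicode"))
```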
Packaging

Release package will include:




Documentation
Source Code
Executable Files
Data Files
Documentation



Source-level documentation.
Program design document.
DTD design document.
Source-Level Documentation



Required for inclusion in each build.
Source code comments.
Separate text files.
Program Design Document





Intended as developer/maintainer resource.
High-level view of processing engine.
Individual processing components.
Component interfaces.
Updated as development progresses.
DTD Design Document




Resource for DTD developers and maintainers.
List of all elements and use.
List of all attributes and use.
Modified as development progresses.
Source Code



Source code for prototypes will not be considered deliverables.
Testing harnesses will not be considered deliverables.
All source code for release version will be provided.
Executables and Data Files



One executable script file.
No other executables will be included.
DTD will be considered a deliverable.
Installation





No installation script is planned.
Path to Perl binary must be specified at head of executable script.
Project directory must be copied in its entirety to desired location.
Relative paths within directory must remain unchanged.
User must have write permission