Erich Schweighofer

On the Way to Semantic Legal
Knowledge Systems
NII Shonan Meeting Seminar 057
Towards Explanation Production Combining Natural
Language Processing and Logical Reasoning
Shonan-EXPCOLL2014 - November 26-30, 2014
Erich Schweighofer
http://rechtsinformatik.univie.ac.at
Outline
Legal knowledge challenge
 What lawyers have?

◦ 6 views of a legal information system
What lawyers need?
 Semantic legal knowledge system
 Some theory
 Dynamic Electronic Legal Commentary

◦ Main tools

Conclusions
Erich
Schweighofer (2014)
2
Legal knowledge challenge (1)
 Knowledge is the main production factor for law
◦ Model of a legal system
◦ Huge (gigabytes (GB), millions of documents, about 100,000 rules,
about 300,000 words, more than 10,000 legal concepts)
◦ All or nothing … every document may be relevant (no “toy
system”)
◦ Highly relevant networks of documents
◦ Dynamic (daily changes!) – real time information system
◦ Complex (many document types, advanced structure, legal
processes)
 Problem: How to master the body of knowledge of the legal order?
◦ Media: papyrus, paper, hard disk, DVD, memory disk, etc.
◦ Representation: rolls, books, journals, DVDs, online services, etc.
Legal knowledge challenge (2)
◦ Search: reading, grepping, browsing, retrieving,
knowing
◦ Costs: representation (maintaining a library) and
search (time to find relevant information)
 Options
◦ Brain, notes, files, library, database, retrieval system,
internet, archive, knowledge system
◦ Public service vs. private investment
◦ Better man/machine co-operation
 Goal
◦ Higher efficiency
◦ Lower costs
Data, information and knowledge


Concepts of data, information and knowledge are vaguely defined;
different definitions exist
Data: syntactic representation; collection of numbers, characters
and images in a (ICT) digital (binary) character set; everything that is
not computer code
◦ Law: prints of books and journals of a library, source code of
documents in a legal retrieval system or of web documents

Information: syntactic representation with semantic meaning,
message, output, (sensory) input
◦ Law: laws, judgements, regulations, directives, decisions, facts, advisory
opinions, etc. as structured documents in a printed or electronic text
corpus

Knowledge: what is known; expertise & skills, either as an
abstraction of all available knowledge or a personal capacity acquired
through experience or education
◦ Law: team of highly qualified lawyers, e.g. high courts, law faculties, law
firms, etc.; in the future: legal knowledge systems
©
Erich Schweighofer (2014)
5
EU Law-making and law-implementing process
 Each step in the process creates particular documents.
 COM, EP, Council document series
+ national level!
+ legal redress procedures
www.laquadrature.net
What lawyers have … legal text retrieval (1)

Information retrieval (since 1958!)
◦ Text corpus
◦ Index (dictionary) of all words (without stop words)
◦ Boolean search with proximity operations
◦ Information need has to be represented as a Boolean query
◦ Good query: vocabulary & meta knowledge
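
A minimal Python sketch of the classical approach listed above: an index of all words except stop words, Boolean AND/OR, and a proximity operator. The stop-word list, the three-document corpus and all function names are illustrative assumptions, not the interface of any particular legal retrieval system.

```python
import re
from collections import defaultdict

# Assumed, deliberately tiny stop-word list.
STOP_WORDS = {"the", "of", "a", "an", "and", "or", "in", "to", "is", "are", "on"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(corpus):
    """Map each non-stop word to {doc_id: [token positions]}.

    Positions count every token (stop words included), so distances
    reflect the original word order.
    """
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        for pos, word in enumerate(tokenize(text)):
            if word not in STOP_WORDS:
                index[word].setdefault(doc_id, []).append(pos)
    return index


def search_and(index, *terms):
    """Documents containing all terms."""
    postings = [set(index.get(t, {})) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index, *terms):
    """Documents containing at least one term."""
    return set().union(*(index.get(t, {}) for t in terms))


def search_near(index, a, b, distance):
    """Documents where a and b occur within `distance` tokens of each other."""
    hits = set()
    for doc_id in search_and(index, a, b):
        pos_a, pos_b = index[a][doc_id], index[b][doc_id]
        if any(abs(i - j) <= distance for i in pos_a for j in pos_b):
            hits.add(doc_id)
    return hits


# Made-up example corpus.
corpus = {
    "d1": "Measures having equivalent effect to quantitative restrictions are prohibited.",
    "d2": "Quantitative restrictions on imports are prohibited between Member States.",
    "d3": "The customs union covers all trade in goods.",
}
index = build_index(corpus)
print(search_and(index, "restrictions", "prohibited"))
print(search_near(index, "quantitative", "prohibited", 3))
```

The user still has to know the right vocabulary ("equivalent", "effect") and the right operator; that is the "vocabulary & meta knowledge" burden of a good Boolean query.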

Legal Open Data
◦ Official Gazettes
◦ Public legal information systems (e.g. EUR-Lex)
◦ Legal Information Institutes (e.g. AustLII)
◦ High standard
◦ XML: Akoma Ntoso, LegalXML
©
Erich Schweighofer (2014)
7
What lawyers have … legal text retrieval (2)

Advanced information retrieval
◦ Vector Space Model (Smith, Schweighofer/Winiwarter etc.)
◦ Connectionist IR (Belew/Rose, Merkl/Schweighofer etc.)
◦ Probabilistic IR (Inference Networks) (Croft/Turtle etc.)
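
As a rough illustration of the vector space model named above, the following Python sketch ranks documents by TF-IDF cosine similarity to a query. The weighting formula (a smoothed IDF), the three sample texts and the function names are assumptions made for this example; it does not reproduce the cited systems.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return (vectors, idf) with one sparse TF-IDF dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Return document indices ordered by decreasing similarity to the query."""
    vectors, idf = tfidf_vectors(docs)
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [(cosine(q, vec), i) for i, vec in enumerate(vectors)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


# Made-up sample documents and query.
DOCS = [
    "free movement of goods and measures of equivalent effect",
    "judicial review of administrative decisions",
    "customs duties on goods",
]
QUERY = "measures of equivalent effect on goods"
print(rank(QUERY, DOCS))
```

Unlike a Boolean query, the result is a ranked list: every document receives a score, and partial matches are still returned.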

E-Discovery
◦ Extraction of relevant information from electronic text corpora
(electronically stored information or ESI)
◦ Pre-trial discovery (USA)
◦ Analysis of unstructured data
◦ Electronic Discovery Reference Model (EDRM)
 http://edrm.net
What lawyers have … legal text retrieval (3)
◦ NIST’s Text REtrieval Conference (TREC)
◦ DESI (Discovery of Electronically Stored Information) Workshop
◦ Keyword search, machine learning, clustering, document
categorisation, predictive coding etc.
 Conrad, E-Discovery revisited: the need for artificial intelligence beyond
information retrieval, AI & Law (2010) 18:321 – 345
“Google” vs. legal search
 “Google”
◦ Best information taken from the web
◦ Method: information retrieval + ranking
◦ Some redundancy
◦ Recall: should be only sufficient; original information desired but not required
◦ Easy vocabulary: all (most) terms exist
 Legal search
◦ Exact references to relevant norms, court decisions or literature
◦ Method: Boolean search (proximity operators) information retrieval
◦ No or uncontrollable redundancy
◦ Recall: should be 100%; original information required
◦ Difficult vocabulary: only legal concepts
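
The recall requirement in the comparison above can be made concrete with the standard definitions: precision is the share of retrieved items that are relevant, recall the share of relevant items that were retrieved. A small Python sketch follows; the two result lists are invented for illustration only.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result list against a relevance set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall


# Assumed ground truth: the sources a lawyer must not miss.
relevant = {"Article 34 TFEU", "C-267/91", "120/78", "8/74"}

# Invented result lists for the two search styles.
web_style = ["120/78", "news article", "blog post"]
legal_style = ["Article 34 TFEU", "C-267/91", "120/78", "8/74", "Article 36 TFEU"]

print(precision_recall(web_style, relevant))
print(precision_recall(legal_style, relevant))
```

In this toy setting the web-style list finds one of four required sources (recall 0.25), which may be "sufficient" for a layperson but not for legal work, where the target is recall 1.0.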
Status: text-corpus based approach

Text-corpus
◦ Task of LIIs (Legal Information Institutes) or publishers or
official legal information providers to deliver a
comprehensive legal text corpus (multimedia corpus)
◦ Identification and storage of all legal sources
◦ Bibliographic data


Search engine (e.g. Westlaw, LexisNexis, Sino, Oracle)
Meta data
◦ Manual work
 Degree of quality depends on public (e.g. Switzerland) or private
investment (Westlaw)
◦ Automatic generation of meta data
 Citations (AustLII)
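
Automatic generation of citation meta data can be sketched with pattern matching. The Python example below extracts EU case numbers and treaty article references from a sentence; the two regular expressions are simplified assumptions written for this example and do not describe how AustLII or any other provider implements citation extraction.

```python
import re

# Simplified, assumed patterns: "C-267/91", "T-1/89", "120/78", "8/74".
CASE_RE = re.compile(r"\b(?:[CT]-)?\d{1,4}/\d{2}\b")
# "Article 34 TFEU", "Article 28 EC", "Article 30 EECT", ...
ARTICLE_RE = re.compile(r"\bArticle\s+\d+\s+(?:TFEU|EECT|ECT|EC)\b")


def extract_citations(text):
    """Return the case and article citations found in the text, in order."""
    return {
        "cases": CASE_RE.findall(text),
        "articles": ARTICLE_RE.findall(text),
    }


sample = (
    "Source: Article 34 TFEU, cases C-267/91 Keck and Mithouard, "
    "120/78 Cassis de Dijon, 8/74 Dassonville"
)
print(extract_citations(sample))
```

Real citation formats vary widely between jurisdictions and over time, so a production extractor needs many more patterns and a normalisation step; the sketch only shows the principle.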
5 views of a legal text corpus (1)
 Qiang Lu and Jack Conrad (2014)

Document view
◦ Text retrieval of a document structure

Annotation view
◦ Meta data
 e.g. key number system (Westlaw)

Citation view
◦ Citing or cited
◦ Passages of text (e.g. paragraph, sentence)
5 views of a legal text corpus (2)

User view
◦ Session data
◦ Clicks, reading time, downloads, prints, citation
checks
◦ Subjective perspective
◦ Generalisation of user data (up to 1000 relevant
data sets)

Validity and applicability view
◦ E.g. KeyCite on WestlawNext

World view
◦ Factual situation of a case and its relations to the
legal system
Jack Conrad (2014)
What lawyers need …

Semantic knowledge system
◦ Structured meta representation of a legal order with rapid
access to a text retrieval system
◦ 6 views: document, annotation, network, user, applicability, facts
◦ Hybrid knowledge model (Schweighofer 1999)

Present: text of a legal commentary/legal handbook (mostly
print, now also electronically available)
◦ Intellectual product of experienced legal writers
◦ Not updated regularly
Why not link these semantic representation techniques to
text corpora and use knowledge acquisition techniques?
 Idea of a Dynamic Electronic Legal Commentary

◦ Schweighofer (Festschrift Seipel 2006, AI & Law 2007)
Semantic legal knowledge system (1)

Machine has to do more …
◦ There are too many rules, statutes, court
decisions, administrative decisions, literature
texts, grey materials, soft information pieces …
◦ Retrieval is too difficult in times of some semantic
retrieval by Google (too much training required,
impossible trade-off of legal retrieval)
◦ Finding the document or document part within
millions of documents: ranking problem
◦ Clients no longer accept that it is so
difficult to know everything in the law; they also
do legal search … with some results
Semantic legal knowledge system (2)

New co-operation model
Support
◦ Semantic representation
◦ Meta data
◦ Semi-automated tools of text analysis

Use of excellence of lawyers
◦ Determining relevant parts of a legal decision even if it
changes over time or depends on a particular
jurisdiction or court
◦ Respect and challenge of views of authorities (Haft)
Pragmatic approach of legal
knowledge representation (1)


Legal text corpora & file archives
Textual structure
◦ Facts, rules and arguments

Cases
◦ Easy cases (standard cases, eligible for automation), hard
cases (fight for the best legal solution, legal argumentation
skills required), curious cases (legal theory)

Evidence
◦ Easy evidence, hard evidence, automatically generated
evidence, customer-generated evidence, self-collected,
intelligence-based

Some order with logic
◦ John F. Sowa, Knowledge Representation (2000), p. XII
Pragmatic approach of legal
knowledge representation (2)



“Without logic, a knowledge representation is vague, with no
criteria for determining whether statements are redundant or
contradictory. Without ontology, the terms and symbols are
ill-defined, confused, and confusing. And without computable
models, the logic and ontology cannot be implemented in
computer programs. Knowledge representation is the application
of logic and ontology to the task of constructing computable
models for some domain.”
Relations – a better logic model required
Hybrid model
◦ Being helpful in a man/machine co-operation using knowledge-based techniques
◦ Erich Schweighofer, Legal Knowledge Representation (1999)
Pragmatic approach of legal
knowledge representation (3)
◦ “Knowledge representation in law is the challenge of how
knowledge and information on legal norms, judgements and
literature can be represented and how relevant information can
be gained for concrete case solutions. This question is at this
time above all pursued as special discipline of legal informatics
where naturally the emphasis is on automated forms.”
◦ Multimedia representation of knowledge pieces
 Facts: text, all kinds of things, pictures, videos, intelligent forms,
big data (electronic discovery)
 Rules: text, graphics, visualisations, computer programmes
 Arguments: speeches, submissions, videos, graphics, semantic
argumentation models (Bart Verheij)
Theoretically sound? (1)

Standard cases (“easy cases”)
◦ Relevant facts and their legal assessment are well established
 Goal: semantic structure of facts (e.g. picture of a speed violation, a tax
web form)
◦ Legal practice, not yet dominant but coming due to efficiency
concerns
◦ Production systems, first order logic
 E.g. Oracle Business Rule Engine, SPINdle, Java (Johannes Scharf)

Hard cases
◦ (Some) logical reasoning is a constituent principle of law (e.g.
basic rules of thinking)
◦ Logic of Aristotle still relevant
 Theory of the syllogism still in high regard
 Modus Barbara
 No other modi, e.g. Baroco (thanks to famous German logician Lothar
Philipps who died this week)
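
For reference, the two syllogistic moods named above can be written in first-order notation. This is the standard textbook rendering, given here as an illustration; in the usual legal reading of Barbara, the major premise is the norm, the minor premise the established facts, and the conclusion the legal consequence.

```latex
% Barbara (AAA-1): all M are P; all S are M; therefore all S are P.
\[
\frac{\forall x\,\bigl(M(x) \rightarrow P(x)\bigr) \qquad \forall x\,\bigl(S(x) \rightarrow M(x)\bigr)}
     {\forall x\,\bigl(S(x) \rightarrow P(x)\bigr)}
\]

% Baroco (AOO-2): all P are M; some S are not M; therefore some S are not P.
\[
\frac{\forall x\,\bigl(P(x) \rightarrow M(x)\bigr) \qquad \exists x\,\bigl(S(x) \wedge \neg M(x)\bigr)}
     {\exists x\,\bigl(S(x) \wedge \neg P(x)\bigr)}
\]
```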
Theoretically sound? (2)
◦ Conceptual structure („Begriffsjurisprudenz“)
 Constant improvement is an important goal of interpretation and of the dynamic
development of the legal system
◦ Wilburg’s “flexible system” („bewegliches System“) (Bydlinski et
al.)
 Interaction of organic co-operative forces in law
◦ Human rights
 Proportionality between the goal of an action and its intrusion into other
rights
 Fair and just procedure
◦ More use of legal logic and legal ontologies required
but so far neglected or ignored
◦ Language use highly important as representation of
thoughts of authorities
 Not many rules but established practice (like English language)
Dynamic Electronic Legal Commentary (1)
Abstract representation of law in a conceptual & logical-systematic structure; like a printed commentary but in a
machine-usable format
 Legal information system
 Conceptual structure

◦ Description of the world ([possible] facts)
◦ Description of the law ([possible] rules)
The core: links between possible facts (situations) and legal
consequences
 Strong use of knowledge acquisition techniques to ensure a
daily update

◦ Long research practice in legal informatics
 Smith, Schweighofer/Winiwarter/Merkl/Dietenbach, Moens, Daniels,
Brünninghaus, Wyner, Quaresma etc.
Dynamic Electronic Legal Commentary (2)

Challenge
◦ World ontologies have still some way to improve sufficiently,
legal formalisation has to move from small environments to the
real big world

Next step
◦ Tools like a navigator [time and document types, layers of the
legal order, consolidated texts] (e.g. PreLex), citator or
terminologist; e.g. a semantic representation of the 6 views

Near future
◦ Some automated support for legal subsumption, e.g. helping in
the real game of applying legal provisions (could that also be called
legal reasoning or a legal expert system?)
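
What automated support for subsumption could look like in a standard case is sketched below as a tiny forward-chaining production system in Python. The speed-violation scenario follows the example used earlier in the talk; the rule set, the fact names and the `Rule` class are hypothetical and do not correspond to any of the rule engines mentioned.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[dict], bool]  # tested against the current facts
    conclusion: str                    # fact that is added when the rule fires


def forward_chain(facts, rules):
    """Fire rules until nothing new is derived; return conclusions in firing order."""
    state = dict(facts)
    derived = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conclusion not in state and rule.condition(state):
                state[rule.conclusion] = True
                derived.append(rule.conclusion)
                changed = True
    return derived


# Hypothetical rules; the order in the list does not matter.
RULES = [
    Rule("fine", lambda f: f.get("speed_violation", False), "fine_due"),
    Rule(
        "violation",
        lambda f: f["measured_speed_kmh"] > f["speed_limit_kmh"],
        "speed_violation",
    ),
]

print(forward_chain({"measured_speed_kmh": 72, "speed_limit_kmh": 50}, RULES))
print(forward_chain({"measured_speed_kmh": 45, "speed_limit_kmh": 50}, RULES))
```

The sketch covers only the easy part: once the facts are available in a semantic structure, applying the rule is mechanical. Establishing those facts and handling hard cases remain with the lawyer.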
Tools of a Dynamic Electronic Legal Commentary
• Classification: document categorisation
• Thesaurus: semi-automatic generation of thesaurus descriptors (e.g.
work of Madori Ikeda and Akihiro Yamamoto)
• Citations: automatic generation of hypertext links
• Temporal relations: automatic generation of temporal relations
• Ranking: document vs. search request, document in the text corpus,
document in the citations network, document in the time line

◦ Use of textual entailment (e.g. work of Bernardo Magnini, Yusuke
Miyao) or Open Information Extraction (e.g. work of Ido Dagan)
• Text summarisation: semi-automatic generation of summaries of
documents
• Multilingualism: automatic translation of documents (e.g. Google
Translate)
 Free text search like in Westlaw, LexisNexis or in the work of Yu
Asano
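
The ranking tool in the list above combines several signals. One simple way to do this, shown here only as an assumed sketch, is a weighted sum of a text-match score, a citation-network score and a time-line score. The weights, the 50-year horizon, the normalisation and the three candidate documents are invented for illustration; the fourth signal from the list (document in the text corpus) is left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    text_score: float  # match against the search request, assumed to lie in [0, 1]
    cited_by: int      # incoming citations in the citation network
    year: int          # position in the time line


def combined_score(c, max_cited_by, current_year,
                   weights=(0.6, 0.3, 0.1), horizon=50):
    """Weighted sum of text, citation and recency signals (all in [0, 1])."""
    citation_score = c.cited_by / max_cited_by if max_cited_by else 0.0
    age = max(0, current_year - c.year)
    recency_score = max(0.0, 1.0 - age / horizon)
    w_text, w_cit, w_time = weights
    return w_text * c.text_score + w_cit * citation_score + w_time * recency_score


def rank(candidates, current_year):
    """Return candidates ordered by decreasing combined score."""
    max_cited_by = max((c.cited_by for c in candidates), default=0)
    return sorted(
        candidates,
        key=lambda c: combined_score(c, max_cited_by, current_year),
        reverse=True,
    )


# Invented candidates: a recent good text match, an old leading case, a weak match.
candidates = [
    Candidate("A", text_score=0.9, cited_by=2, year=2012),
    Candidate("B", text_score=0.7, cited_by=40, year=1979),
    Candidate("C", text_score=0.2, cited_by=5, year=2013),
]
print([c.doc_id for c in rank(candidates, current_year=2014)])
```

With these assumed weights the heavily cited older decision B outranks the better text match A, which is the behaviour a lawyer would expect for a leading case.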
Some formalisation (1)

Legal concept:
◦ Header: Measures of equivalent effect (L)
◦ Definition: Discriminatory and non-discriminatory rules of Member
States hindering trade between Member States are illegal.
◦ Source: Article 34 TFEU, cases C-267/91 Keck and Mithouard, 120/78
Cassis de Dijon, 8/74 Dassonville
◦ Relations: BT customs, measures of equivalent effect (A), freedom of
goods (A)
◦ Classification: 02.40
◦ Legal conceptual structure: customs union, freedom of goods
◦ Other information: none

Fact concept:
◦ Header: Liqueur in Germany (F)
◦ Definition: The minimum amount of alcohol which should exist in
liqueurs was 25% (up to 1978).
◦ Relations: Measures having equivalent effect
Some formalisation (2)
◦ Source: DE Branntweinmonopolgesetz (German liquor
monopoly act)
◦ Classification: 02.40
◦ Legal conceptual structure: customs
◦ Links: Measures having equivalent effect
◦ Other information: none


Anchor (link):
◦ Header: Measures having equivalent effect (A)
◦ Links: Liqueur in Germany (F), selling arrangements (F), Edam
cheese in France (F), vinegar in Italy (F), beer in Germany (F),
resale at a loss (F), advertising restrictions (F), distribution
restrictions (F) , measures having equivalent effect (L), Article 34
TFEU, Article 28 EC, Article 30 ECT, Article 30 EECT
◦ etc.
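
The three record types above (legal concept L, fact concept F, anchor A) map naturally onto a small data structure. The following Python sketch is one possible encoding, assuming that every link is stored as a (header, kind) pair and that the anchor is the only bridge between facts and law. The class and method names are hypothetical, and the header is normalised to "Measures having equivalent effect" for all three record types.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    header: str
    kind: str  # "L" legal concept, "F" fact concept, "A" anchor
    definition: str = ""
    sources: list = field(default_factory=list)
    classification: str = ""
    links: list = field(default_factory=list)  # (header, kind) pairs


class Commentary:
    """Stores concepts and follows links of the form start -> anchor -> target."""

    def __init__(self):
        self.concepts = {}

    def add(self, concept):
        self.concepts[(concept.header, concept.kind)] = concept

    def linked(self, header, kind, target_kind):
        """Headers of all target_kind concepts reachable via an anchor."""
        start = self.concepts[(header, kind)]
        found = []
        for anchor_key in start.links:
            anchor = self.concepts.get(anchor_key)
            if anchor is None or anchor.kind != "A":
                continue
            for key in anchor.links:
                # Links to concepts that are not stored yet are skipped.
                if key[1] == target_kind and key in self.concepts:
                    found.append(self.concepts[key].header)
        return found


MEE = "Measures having equivalent effect"
commentary = Commentary()
commentary.add(Concept(
    MEE, "L",
    definition="Discriminatory and non-discriminatory rules of Member States "
               "hindering trade between Member States are illegal.",
    sources=["Article 34 TFEU", "C-267/91 Keck and Mithouard",
             "120/78 Cassis de Dijon", "8/74 Dassonville"],
    classification="02.40",
    links=[(MEE, "A")],
))
commentary.add(Concept(
    "Liqueur in Germany", "F",
    definition="The minimum amount of alcohol which should exist in "
               "liqueurs was 25% (up to 1978).",
    sources=["DE Branntweinmonopolgesetz"],
    classification="02.40",
    links=[(MEE, "A")],
))
commentary.add(Concept(
    MEE, "A",
    links=[("Liqueur in Germany", "F"), ("Selling arrangements", "F"), (MEE, "L")],
))

print(commentary.linked("Liqueur in Germany", "F", "L"))
print(commentary.linked(MEE, "L", "F"))
```

Keeping the anchor as a separate record means that a new fact situation only has to be attached to the anchor once; it then becomes reachable from every legal concept and source linked to the same anchor.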
A lot of work to be done
 Powerful legal thesaurus (e.g. Switzerland)
 Better knowledge model with more logic
 Better extraction rules
◦ Probabilistic retrieval techniques not sufficient
◦ Textual entailment
◦ Open Information Extraction
◦ More NLP
 Legal authors writing in semantic structure
(e.g. better semantic representations that
can be updated semi-automatically)
Conclusions
 Example of Big Data research
 Move to semantic knowledge systems requires more
logic of text analysis and of knowledge representation
 Knowledge model
 Knowledge acquisition tool linking text corpora and
knowledge model
 Result: some sort of a Dynamic Electronic Legal
Commentary
 More research necessary to have a better data basis
of unsolved practical problems
 Stronger co-operation between logicians, knowledge
engineers and lawyers required
Thank you for your attention!
Erich Schweighofer
University of Vienna
Centre for Computers and Law
Vienna Centre for Legal Informatics
erich.schweighofer@univie.ac.at
http://rechtsinformatik.univie.ac.at
Jusletter IT
http://www.jusletter-it.eu
Thank you for your attention! (2)
 JURIX 2014, The 27th International Conference
on Legal Knowledge and Information Systems,
10-12 December 2014, Jagiellonian University,
Kraków, PL
 IRIS International Conference on Legal
Informatics, 26-28 February 2015, Salzburg, AT
 ICAIL 2015, The 15th International Conference
on Artificial Intelligence and Law,
University of San Diego School of Law, from
Monday, June 8 to Friday, June 12, 2015, USA