On the Way to Semantic Legal Knowledge Systems
NII Shonan Meeting Seminar 057
Towards Explanation Production Combining Natural Language Processing and Logical Reasoning
Shonan-EXPCOLL2014 - November 26-30, 2014
Erich Schweighofer
http://rechtsinformatik.univie.ac.at

Outline
Legal knowledge challenge
What do lawyers have?
◦ 6 views of a legal information system
What do lawyers need?
Semantic legal knowledge system
Some theory
Dynamic Electronic Legal Commentary
◦ Main tools
Conclusions

Erich Schweighofer (2014) 2

Legal knowledge challenge (1)
Knowledge is the main production factor for law
◦ Model of a legal system
◦ Huge (gigabytes (GB), millions of documents, about 100,000 rules, about 300,000 words, more than 10,000 legal concepts)
◦ All or nothing: every document may be relevant (no “toy system”)
◦ Highly relevant networks of documents
◦ Dynamic (daily changes!): a real-time information system
◦ Complex (many document types, advanced structure, legal processes)
Problem: How to master the body of knowledge of the legal order?
◦ Media: papyrus, paper, hard disk, DVD, memory disk, etc.
◦ Representation: rolls, books, journals, DVDs, online services, etc.

Legal knowledge challenge (2)
◦ Search: reading, grepping, browsing, retrieving, knowing
◦ Costs: representation (maintaining a library) and search (time to find relevant information)
Options
◦ Brain, notes, files, library, database, retrieval system, internet, archive, knowledge system
◦ Public service vs. private investment
◦ Better man/machine co-operation
Goal
◦ Higher efficiency
◦ Lower costs

Data, information and knowledge
The concepts of data, information and knowledge are vaguely defined; different definitions exist
Data: syntactic representation; a collection of numbers, characters and images in a digital (binary) ICT character set; everything that is not computer code
◦ Law: printed books and journals in a library, source code of documents in a legal retrieval system or of web documents
Information: syntactic representation with semantic meaning; message, output, (sensory) input
◦ Law: laws, judgements, regulations, directives, decisions, facts, advisory opinions, etc. as structured documents in a printed or electronic text corpus
Knowledge: what is known; expertise and skills, either as an abstraction of all available knowledge or as a personal capacity acquired through experience or education
◦ Law: teams of highly qualified lawyers, e.g. high courts, law faculties, law firms, etc.; in the future: legal knowledge systems

EU law-making and law-implementing process
Each step in the process creates particular documents.
COM, EP, Council document series + national level! + legal redress procedures
(www.laquadrature.net)

What lawyers have … legal text retrieval (1)
Information retrieval (since 1958!)
◦ Text corpus
◦ Index (dictionary) of all words (without stop words)
◦ Boolean search with proximity operators
◦ Information need has to be represented as a Boolean query
◦ A good query requires knowledge of the vocabulary and meta knowledge
Legal Open Data
◦ Official Gazettes
◦ Public legal information systems (e.g. EUR-Lex)
◦ Legal Information Institutes (e.g. AustLII)
◦ High standard
◦ XML: Akoma Ntoso, LegalXML

What lawyers have … legal text retrieval (2)
Advanced information retrieval
◦ Vector Space Model (Smith, Schweighofer/Winiwarter etc.)
◦ Connectionist IR (Belew/Rose, Merkl/Schweighofer etc.)
◦ Probabilistic IR (Inference Networks) (Croft/Turtle etc.)
E-Discovery
◦ Extraction of relevant information from electronic text corpora (electronically stored information, ESI)
◦ Pre-trial discovery (USA)
◦ Analysis of unstructured data
◦ Electronic Discovery Reference Model (EDRM) http://edrm.net

What lawyers have … legal text retrieval (3)
◦ NIST’s Text REtrieval Conference (TREC)
◦ DESI (Discovery of Electronically Stored Information) Workshop
◦ Keyword search, machine learning, clustering, document categorisation, predictive coding etc.
Conrad, E-Discovery revisited: the need for artificial intelligence beyond information retrieval, AI & Law (2010) 18:321-345

“Google” vs. legal search
Google
◦ Best information taken from the web
◦ Method: information retrieval + ranking
◦ Some redundancy
◦ Recall: should only be sufficient; original information desired but not required
◦ Easy vocabulary: all (most) terms exist
Legal search
◦ Exact references to relevant norms, court decisions or literature
◦ Method: Boolean search (proximity operators), information retrieval
◦ No or uncontrollable redundancy
◦ Recall: should be 100%; original information required
◦ Difficult vocabulary: only legal concepts

Status: text-corpus based approach
Text corpus
◦ Task of LIIs (Legal Information Institutes), publishers or official legal information providers: delivering a comprehensive legal text corpus (multimedia corpus)
◦ Identification and storage of all legal sources
◦ Bibliographic data
Search engine (e.g. Westlaw, LexisNexis, Sino, Oracle)
Meta data
◦ Manual work: degree of quality depends on public (e.g. Switzerland) or private investment (e.g. Westlaw)
◦ Automatic generation of meta data: citations (AustLII)

5 views of a legal text corpus (1)
Qiang Lu and Jack Conrad (2014)
Document view
◦ Text retrieval of a document structure
Annotation view
◦ Meta data, e.g. key number system (Westlaw)
Citation view
◦ Citing or cited
◦ Passages of text (e.g. paragraph, sentence)

5 views of a legal text corpus (2)
User view
◦ Session data
◦ Clicks, reading time, downloads, prints, citation checks
◦ Subjective perspective
◦ Generalisation of user data (up to 1000 relevant data sets)
Validity and applicability view
◦ E.g. KeyCite on WestlawNext
World view
◦ Factual situation of a case and its relations to the legal system

[Figure: Jack Conrad (2014)]

What lawyers need …
Semantic knowledge system
◦ Structured meta representation of a legal order with rapid access to a text retrieval system
◦ 6 views: document, annotation, network, user, applicability, facts
◦ Hybrid knowledge model (Schweighofer 1999)
Present: text of a legal commentary or legal handbook (mostly print, now also electronically available)
◦ Intellectual product of experienced legal writers
◦ Not updated regularly
Why not link these semantic representation techniques to text corpora and use knowledge acquisition techniques?
Idea of a Dynamic Electronic Legal Commentary
◦ Schweighofer (Festschrift Seipel 2006, AI & Law 2007)

Semantic legal knowledge system (1)
The machine has to do more …
◦ There are too many rules, statutes, court decisions, administrative decisions, literature texts, grey materials, soft information pieces …
◦ Retrieval is too difficult in times of semantic retrieval by Google (too much training required, impossible trade-off of legal retrieval)
◦ Finding the document or document part within millions of documents: a ranking problem
◦ Clients no longer accept that it is so difficult to know everything in the law; they also do legal search … with some results

Semantic legal knowledge system (2)
New co-operation model
Support
◦ Semantic representation
◦ Meta data
◦ Semi-automated tools of text analysis
Use of the excellence of lawyers
◦ Determining relevant parts of a legal decision even if it changes over time or depends on a particular jurisdiction or court
◦ Respect and challenge of views of authorities (Haft)

Pragmatic approach of legal knowledge representation (1)
Legal text corpora & file archives
Textual structure
◦ Facts, rules and arguments
Cases
◦ Easy cases (standard cases, eligible for automation), hard cases (fight for the best legal solution, legal argumentation skills required), curious cases (legal theory)
Evidence
◦ Easy evidence, hard evidence, automatically generated evidence, customer-generated evidence, self-collected, intelligence-based
Some order with logic
◦ John F. Sowa, Knowledge Representation (2000), p. XII

Pragmatic approach of legal knowledge representation (2)
“Without logic, a knowledge representation is vague, with no criteria for determining whether statements are redundant or contradictory. Without ontology, the terms and symbols are ill-defined, confused, and confusing. And without computable models, the logic and ontology cannot be implemented in computer programs. Knowledge representation is the application of logic and ontology to the task of constructing computable models for some domain.”
Relations: a better logic model is required
Hybrid model
◦ Being helpful in a man/machine co-operation using knowledge-based techniques
◦ Erich Schweighofer, Legal Knowledge Representation (1999)

Pragmatic approach of legal knowledge representation (3)
◦ “Knowledge representation in law is the challenge of how knowledge and information on legal norms, judgements and literature can be represented and how relevant information can be gained for concrete case solutions. This question is at this time above all pursued as special discipline of legal informatics where naturally the emphasis is on automated forms.”
◦ Multimedia representation of knowledge pieces
Facts: text, all kinds of things, pictures, videos, intelligent forms, big data (electronic discovery)
Rules: text, graphics, visualisations, computer programmes
Arguments: speeches, submissions, videos, graphics, semantic argumentation models (Bart Verheij)

Theoretically sound? (1)
Standard cases (“easy cases”)
◦ Relevant facts and their legal assessment are well established; goal: semantic structure of facts (e.g. picture of a speed violation, a tax web form)
◦ Legal practice, not yet dominant but coming due to efficiency concerns
◦ Production systems, first-order logic, e.g. Oracle Business Rule Engine, SPINdle, Java (Johannes Scharf)
Hard cases
◦ (Some) logical reasoning is a constituent principle of law (e.g. basic rules of thinking)
◦ Logic of Aristotle still relevant: theory of the syllogism still in high regard; modus Barbara; no other modi, e.g. Baroco (thanks to the famous German logician Lothar Philipps, who died this week)

Theoretically sound? (2)
◦ Conceptual structure (“Begriffsjurisprudenz”): constant improvement is an important goal of interpretation and of the dynamic development of the legal system
◦ Wilburg’s “flexible system” (“bewegliches System”) (Bydlinski et al.): interaction of organic co-operative forces in law
◦ Human rights: proportionality between the goal of an action and its intrusion into other rights; fair and just procedure
◦ More use of legal logic and legal ontologies required but so far neglected or ignored
◦ Language use highly important as representation of thoughts of authorities: not many rules but established practice (like the English language)

Dynamic Electronic Legal Commentary (1)
Abstract representation of law in a conceptual and logical-systematic structure; like a printed commentary but in a machine-usable format
Legal information system
Conceptual structure
◦ Description of the world ([possible] facts)
◦ Description of the law ([possible] rules)
The core: links between possible facts (situations) and legal consequences
Strong use of knowledge acquisition techniques to ensure a daily update
◦ Long research practice in legal informatics: Smith, Schweighofer/Winiwarter/Merkl/Dietenbach, Moens, Daniels, Brüninghaus, Wyner, Quaresma etc.

Dynamic Electronic Legal Commentary (2)
Challenge
◦ World ontologies still have some way to go; legal formalisation has to move from small environments to the real big world
Next step
◦ Tools like a navigator [time and document types, layers of the legal order, consolidated texts] (e.g. PreLex), citator or terminologist; e.g. a semantic representation of the 6 views
Near future
◦ Some automated support for legal subsumption, e.g. helping in the real game of applying legal provisions (could that also be called legal reasoning or a legal expert system?)

Tools of a Dynamic Electronic Legal Commentary
• Classification: document categorisation
• Thesaurus: semi-automatic generation of thesaurus descriptors (e.g. work of Madori Ikeda and Akihiro Yamamoto)
• Citations: automatic generation of hypertext links
• Temporal relations: automatic generation of temporal relations
• Ranking: document vs. search request, document in the text corpus, document in the citation network, document in the time line
◦ Use of textual entailment (e.g. work of Bernardo Magnini, Yusuke Miyao) or Open Information Extraction (e.g. work of Ido Dagan)
• Text summarisation: semi-automatic generation of summaries of documents
• Multilingualism: automatic translation of documents (e.g. Google Translate)
• Free text search like in Westlaw, LexisNexis or in the work of Yu Asano

Some formalisation (1)
Legal concept:
◦ Header: Measures having equivalent effect (L)
◦ Definition: Discriminatory and non-discriminatory rules of Member States hindering trade between Member States are illegal.
◦ Source: Article 34 TFEU, cases C-267/91 Keck and Mithouard, 120/78 Cassis de Dijon, 8/74 Dassonville
◦ Relations: BT customs, measures having equivalent effect (A), freedom of goods (A)
◦ Classification: 02.40
◦ Legal conceptual structure: customs union, freedom of goods
◦ Other information: none
Fact concept:
◦ Header: Liqueur in Germany (F)
◦ Definition: The minimum alcohol content required for liqueurs was 25% (up to 1978).
◦ Relations: Measures having equivalent effect

Some formalisation (2)
Fact concept (continued):
◦ Source: DE Branntweinmonopolgesetz (German liquor monopoly act)
◦ Classification: 02.40
◦ Legal conceptual structure: customs
◦ Links: Measures having equivalent effect
◦ Other information: none
Anchor (link):
◦ Header: Measures having equivalent effect (A)
◦ Links: Liqueur in Germany (F), selling arrangements (F), Edam cheese in France (F), vinegar in Italy (F), beer in Germany (F), resale at a loss (F), advertising restrictions (F), distribution restrictions (F), measures having equivalent effect (L), Article 34 TFEU, Article 28 EC, Article 30 ECT, Article 30 EECT
◦ etc.
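The legal concept, fact concept and anchor entries above could be held in a small data structure along the following lines. This is only a minimal sketch, not the representation used in the Dynamic Electronic Legal Commentary: the `Concept` class, the `build_index` and `legal_concepts_for` helpers and the "header (kind)" key format are hypothetical names introduced for illustration, and the header spelling is normalised to "Measures having equivalent effect" so that the links resolve.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """One commentary entry. kind: "L" = legal, "F" = fact, "A" = anchor.

    Hypothetical structure for illustration; field names follow the slide.
    """
    header: str
    kind: str
    definition: str = ""
    sources: list[str] = field(default_factory=list)
    links: list[str] = field(default_factory=list)  # "header (kind)" keys

    @property
    def key(self) -> str:
        return f"{self.header} ({self.kind})"


def build_index(concepts: list[Concept]) -> dict[str, Concept]:
    """Index entries by their "header (kind)" key."""
    return {c.key: c for c in concepts}


def legal_concepts_for(fact_key: str, index: dict[str, Concept]) -> list[Concept]:
    """Follow fact -> anchor -> legal concept links.

    Links that are not entries in the index (e.g. treaty articles) are skipped.
    """
    found: list[Concept] = []
    for anchor_key in index[fact_key].links:
        anchor = index.get(anchor_key)
        if anchor is None or anchor.kind != "A":
            continue
        for target_key in anchor.links:
            target = index.get(target_key)
            if target is not None and target.kind == "L" and target not in found:
                found.append(target)
    return found


MEE = "Measures having equivalent effect"

INDEX = build_index([
    Concept(
        header=MEE, kind="L",
        definition=("Discriminatory and non-discriminatory rules of Member "
                    "States hindering trade between Member States are illegal."),
        sources=["Article 34 TFEU", "C-267/91 Keck and Mithouard",
                 "120/78 Cassis de Dijon", "8/74 Dassonville"],
        links=[f"{MEE} (A)"],
    ),
    Concept(
        header="Liqueur in Germany", kind="F",
        definition="Minimum alcohol content for liqueurs was 25% (up to 1978).",
        sources=["DE Branntweinmonopolgesetz"],
        links=[f"{MEE} (A)"],
    ),
    Concept(
        header=MEE, kind="A",
        # Only a subset of the anchor's links from the slide is listed here.
        links=["Liqueur in Germany (F)", f"{MEE} (L)", "Article 34 TFEU"],
    ),
])

if __name__ == "__main__":
    for concept in legal_concepts_for("Liqueur in Germany (F)", INDEX):
        print(concept.key, "<-", ", ".join(concept.sources))
```

The anchor acts as the hub between the description of the world (fact concepts) and the description of the law (legal concepts), so a newly extracted fact only needs one link to the anchor to become reachable from the legal concept and its sources.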
A lot of work to be done
Powerful legal thesaurus (e.g. Switzerland)
Better knowledge model with more logic
Better extraction rules
◦ Probabilistic retrieval techniques not sufficient
◦ Textual entailment
◦ Open Information Extraction
◦ More NLP
Legal authors writing in semantic structure (e.g. better semantic representations that can be updated semi-automatically)

Conclusions
Example of Big Data research
Move to semantic knowledge systems requires more logic in text analysis and in knowledge representation
◦ Knowledge model
◦ Knowledge acquisition tool linking text corpora and knowledge model
Result: some sort of a Dynamic Electronic Legal Commentary
More research necessary to have a better data basis of unsolved practical problems
Stronger co-operation between logicians, knowledge engineers and lawyers required

Thank you for your attention!
Erich Schweighofer
University of Vienna
Centre for Computers and Law
Vienna Centre for Legal Informatics
erich.schweighofer@univie.ac.at
http://rechtsinformatik.univie.ac.at
Jusletter IT http://www.jusletter-it.eu

Thank you for your attention! (2)
JURIX 2014, The 27th International Conference on Legal Knowledge and Information Systems, 10-12 December 2014, Jagiellonian University, Kraków, PL
IRIS International Conference on Legal Informatics, 26-28 February 2015, Salzburg, AT
ICAIL 2015, The 15th International Conference on Artificial Intelligence and Law, University of San Diego School of Law, from Monday, June 8 to Friday, June 12, 2015, USA