The Open World Assumption
or
Sometimes it’s nice to know what we don’t know

Nick Drummond, Rob Shearer
© 2006, The University of Manchester

About us
‣ Rob Shearer
  ‣ Automated reasoning for description logics
  ‣ Scalability of reasoning to very large ABox fact sets
  ‣ Integration of relational database technology with DL automated reasoning techniques
  ‣ UoM Information Management Group
‣ Nick Drummond
  ‣ OWL and DL knowledge modeling
  ‣ Ontology authoring tools (Protégé)
  ‣ UoM Bio-Health Informatics Group

Lots of relevant issues...
‣ Data versus knowledge
‣ Formal metadata semantics versus natural-language or ad-hoc annotations
‣ Automated reasoning versus custom-engineered solutions
‣ Model checking versus consistency checking
‣ Open-world versus closed-world interpretation

Data versus knowledge
‣ Data needs to be interpreted to derive meaning (knowledge)
‣ The same data can be interpreted many different ways with different semantics

  Name   Age
  Rob    29
  Nick   85

‣ The same table can be read as:
  ‣ Manchester researchers
  ‣ All Manchester researchers
  ‣ All Manchester researchers at this workshop
  ‣ All Manchester researchers at this workshop who are giving presentations
  ‣ Some Manchester researchers
‣ Interpretation is often performed through querying and result-set processing
‣ Data is (hopefully!) encoded with respect to a particular interpretation
  ‣ Even in databases, the encoding interpretation is often not “closed-world”
‣ Lots of KR formalisms for recording metadata

Interpretation implementation
‣ Clear definitions of interpretation semantics are Good Things
‣ Formal knowledge representations have well-understood semantics and provide unambiguous interpretations
‣ Data can be interpreted with respect to KR metadata by existing general-purpose tools
‣ Custom code is required to interpret natural-language or ad-hoc metadata

Model checking versus consistency checking
‣ Different tools do different jobs!
‣ Integrity constraint and RDB query answering systems check whether a single model defined by the data satisfies some criteria
‣ “Reasoning systems” consider a space of “possible models”
  ‣ Facts or instance data are constraints which must be true in all models
  ‣ Additional semantic restrictions (axioms) constrain the set of models
Note that the terms “constraints” and “restrictions” are used ambiguously

Open world versus closed world
‣ Open-world interpretation
  ‣ If fact C is true in every model of KB, then C is a consequence of KB: KB ⊨ C
  ‣ If C is true in no model of KB, then its negation is true in every model of KB: KB ⊨ ¬C
  ‣ If C is true in some models but false in others, neither C nor its negation is a consequence of KB: KB ⊭ C; KB ⊭ ¬C
‣ Closed-world interpretation
  ‣ If a fact C is not a consequence of KB, assume that its negation is a consequence of KB
  ‣ KB ⊭ C implies KB ⊨ ¬C

In the beginning...
‣ Closed World Systems require a place to put everything
‣ You can’t say anything until there’s somewhere to say it
  ‣ Slot on a frame, field on an OO class, column in a DB
‣ We state what is possible

In the beginning...
‣ When we have an empty OWL ontology, everything is possible
‣ We then constrain an ontology iteratively, making it more restrictive as we go
‣ We state what is not possible
  Pig → Animal and (hasLimbs only Leg)

Negation as Failure (NaF)

  Animal        Can Fly?
  Penguin       No
  Shark         No
  Hummingbird   Yes

‣ Can pigs fly?
‣ In the CWA, because the data doesn’t contain this fact, we assume false
‣ In the OWA, unless we have a statement (or we can infer) “pigs can/cannot fly”, we return “don’t know”
‣ NaF: only false if “not(pigs can fly)”

What is the Semantic Web?
‣ A vision of a computer-understandable web
‣ Distributed knowledge and data in reusable form

Semantic Web Languages
‣ On the Semantic Web, we expect people to extend our models
‣ But we don’t want to worry in advance about how

Incomplete Information
‣ The OWA assumes incomplete information by default
‣ We can intentionally underspecify and allow others to reuse and extend
‣ e.g. All sharks liveInHabitat some WaterHabitat
  ‣ Are there freshwater/seawater sharks?
  ‣ Do we care? Someone might
  ‣ It can be useful to reuse

Reuse is good
‣ Be more specific when the application demands it
‣ In OWL, we extend an ontology by adding statements, i.e. we cannot take any away
‣ By only committing to an answer if there is a statement to back it up, OWL remains monotonic
  ‣ If we extend an ontology, all existing true statements remain true

Interpreting Knowledge
‣ Is there a speaker at tea/coffee?
‣ Are there going to be biscuits at this meeting?

  Time          Activity                                            Speaker
  09:00         Welcome                                             Jessie Kennedy
  09:10         Data webs: new visions for research                 David Shotton
  09:40         Closed World Assumption                             Chris Date
  10:25         Open World Assumption                               Nick Drummond
  10:40-11:00   Tea/Coffee
  11:00         The Semantic Gap between Databases and Ontologies   Catherine Dolbear
  11:30         Nullogy                                             Chris Date

CWA says “No”
OWA says “Don’t know”, unless a blank is interpreted as “Activity and not(hasSpeaker)”

Interpreting Knowledge
‣ I want to treat my patient with a painkiller that is not an anticoagulant

  Drug          Effect
  Aspirin       Painkiller
  Warfarin      Anticoagulant
  Paracetamol   Painkiller

CWA says “Aspirin”, “Paracetamol”
OWA can’t give this answer unless we make explicit “Paracetamol is not an anticoagulant”

How do we choose?
‣ Not always clear cut
‣ Many problem domains have aspects of both

Open World Problem
‣ Does Nick Drummond know Chris Date? Rob Shearer?
‣ Do we bomb/trust this battlefield unit?
‣ Is drug X suitable for treating disease X?

Closed World Problem
‣ Is there a train from Manchester to Edinburgh today? (only x trains and y destinations)
‣ Find me drugs that are not licensed for Y? (would need closure for each)
‣ Has my package been delivered yet?

Why the Open World?
‣ Underspecification
  ‣ Abstract, nested and unnamed entities
‣ Easily reusable (and extendable)
‣ Good at knowledge level (ontology)
‣ Good at “schema”-“schema” mapping
  ‣ e.g. asserting/inferring equivalents
‣ Naturally deals with incomplete information
  ‣ e.g. domain knowledge (such as science), where we don’t know all of the answers yet

Why not(Open World)?
‣ Paradigm shift
  ‣ Involves technology/experience catch-up
‣ Some problems are inherently closed world (often those where we ask “which are not...” or that have a finite number of elements)
  ‣ But it is possible to close the open world
‣ Dealing with defaults/exceptions
‣ CWA good at dealing with schema-data mapping
  ‣ Integrity constraints, validation (parsing, form generation)
  ‣ Data structures are typically closed
‣ Meta-query
  ‣ What do we know???

Conclusion
‣ OWA is good for describing knowledge in a way that is extensible
‣ CWA is good for constraining and validating data
‣ OWA and CWA are different ways of interpreting data and can be used alongside each other
‣ To overcome the difficulties of queries in each, perhaps we need decent explanation support for entailments?

Thank you
Thanks to Alan Rector, Robert Stevens, Boris Motik, Héctor Pérez-Urbina, Bijan Parsia and BHIG at Manchester

Questions?
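The open-world and closed-world readings defined earlier can be made concrete with a tiny propositional sketch. In the open-world reading, KB ⊨ C means C is true in every model of KB. In the closed-world reading, KB ⊭ C is taken to mean ¬C. This is a minimal illustration under stated assumptions, not how an OWL reasoner works. The atoms (`penguin_flies`, `pig_flies`, ...) and the helper functions are hypothetical names. The KB is a plain list of Python predicates over truth assignments, and models are found by brute-force enumeration, which only scales to a handful of atoms.

```python
from itertools import product

# Hypothetical atoms: "X can fly" for the animals in the NaF slide's table, plus pigs.
ATOMS = ["penguin_flies", "shark_flies", "hummingbird_flies", "pig_flies"]

# The KB is a list of constraints over a truth assignment (dict: atom -> bool).
# Every constraint must hold in every model of the KB.
KB = [
    lambda m: not m["penguin_flies"],
    lambda m: not m["shark_flies"],
    lambda m: m["hummingbird_flies"],
    # Nothing is stated about pigs.
]


def models(kb, atoms):
    """Yield every truth assignment over `atoms` that satisfies all of `kb`."""
    for values in product([False, True], repeat=len(atoms)):
        m = dict(zip(atoms, values))
        if all(constraint(m) for constraint in kb):
            yield m


def entails(kb, atoms, query):
    """KB entails query iff the query holds in every model of the KB.
    (An inconsistent KB has no models and so vacuously entails everything.)"""
    return all(query(m) for m in models(kb, atoms))


def ask_open_world(kb, atoms, atom):
    """Three-valued answer: yes if KB entails atom, no if KB entails its negation."""
    if entails(kb, atoms, lambda m: m[atom]):
        return "yes"
    if entails(kb, atoms, lambda m: not m[atom]):
        return "no"
    return "don't know"


def ask_closed_world(kb, atoms, atom):
    """Negation as failure: if KB does not entail the atom, answer no."""
    return "yes" if entails(kb, atoms, lambda m: m[atom]) else "no"


print(ask_open_world(KB, ATOMS, "pig_flies"))    # don't know
print(ask_closed_world(KB, ATOMS, "pig_flies"))  # no

# Monotonicity: adding a statement only removes models, so open-world answers only sharpen.
print(ask_open_world(KB + [lambda m: not m["pig_flies"]], ATOMS, "pig_flies"))  # no
# The closed-world "no" above is withdrawn once the fact is asserted.
print(ask_closed_world(KB + [lambda m: m["pig_flies"]], ATOMS, "pig_flies"))    # yes
```

The last two calls show the trade-off from the “Reuse is good” slide. Extending the KB never retracts an open-world answer. The closed-world answer for `pig_flies`, however, changes from “no” to “yes”, so negation as failure is non-monotonic.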
Other Issues
‣ Over/under constraining
‣ SPARQL, RDQL
‣ Single vs multi-model?

Terminology note: “Constraints”
‣ Much confusion
‣ Can mean...
  ‣ Integrity constraints
    ‣ prevent “incorrect” values from being asserted in a model
    ‣ used for validation/parsing/data input
    ‣ single model (usually) that contains only the facts asserted
  ‣ Logical axioms
    ‣ e.g. restrictions, property domain/range
    ‣ everything can be true unless proven otherwise
    ‣ multiple possible models can satisfy the axioms
    ‣ this may cause some unintuitive inferences

Unique Name Assumption (UNA)
‣ If 2 things have different names (IDs) they are, by default, different
‣ But...
  Gnashers, Teeth, Dientes, Dents, Pearly Whites, Zähne, Denti

Unique Name Assumption
‣ CWA typically makes the UNA
  ‣ Useful for counting
‣ OWA doesn’t (always) make the UNA
  ‣ To allow later assertion that two things are the same or different (or this may be inferred)
  ‣ Note: negation is required for distinctness
    ‣ RDF cannot make assertions about things being different
    ‣ OWL and many other logics can

Closure of Open World
‣ Common or garden closure
  ‣ disjoints, universals, covering & closure axioms
‣ Domain, Concept, Role closure
  ‣ K operator, query set subtraction etc.
‣ Where to handle non-monotonicity
  ‣ (in query/app level, once-only transform?)

This talk is not...
‣ OWL vs Databases
‣ Databases deal with HOW data is stored: you can store OWL in databases
‣ OWL is about representing knowledge with machine-understandable semantics
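The counting point on the UNA slides can be illustrated with a small sketch.

‣ It assumes hypothetical helpers (`count_under_una`, `count_without_una`).
‣ It treats `same_as`/`different_from` as plain pairs of names, loosely modelled on OWL’s owl:sameAs and owl:differentFrom. This is not an OWL API.
‣ Asserted equalities are merged with a union-find structure, and nothing else is inferred.

Without the UNA, the number of merged groups is only an upper bound. It becomes exact only when every pair of groups is asserted different, which echoes the slide’s note that distinctness needs negation.

```python
from itertools import combinations

# The seven names for the same thing from the UNA slide.
TEETH = ["Gnashers", "Teeth", "Dientes", "Dents", "Pearly Whites", "Zähne", "Denti"]


def count_under_una(names):
    """Under the UNA every distinct name denotes a distinct individual."""
    return len(set(names))


def count_without_una(names, same_as=(), different_from=()):
    """Return (upper_bound, exact) for the number of individuals the names denote.

    Names linked by same_as pairs are merged with union-find. The number of
    merged groups is only an upper bound; it is exact when every pair of groups
    is asserted to be different. Inconsistencies (a different_from pair inside
    one merged group) are not detected in this sketch.
    """
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the group's representative, halving the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in same_as:
        parent[find(a)] = find(b)

    groups = {find(n) for n in names}
    different = {frozenset((find(a), find(b))) for a, b in different_from}
    exact = all(frozenset(pair) in different for pair in combinations(groups, 2))
    return len(groups), exact


print(count_under_una(TEETH))    # 7
print(count_without_una(TEETH))  # (7, False): anywhere from 1 to 7 individuals
print(count_without_una(TEETH, same_as=[(n, "Teeth") for n in TEETH]))  # (1, True)
```

Under the UNA the seven names simply count as seven things. Without it, the open-world answer stays “at most 7” until sameAs or differentFrom statements are added.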