Knowledge Representation for Information Systems: 1. Introduction

"Information modeling is concerned with the construction of computer-based symbol structures which model some part of the real world." (Mylopoulos 98, p. 128)

Application: a part of the real world.

[Figure: denotation mappings from the information base to the application (a real world extract); labels: Information base, Application, Real world extract, Individuals, Atoms, Categories, Denotation mappings, …]

Terms …
- Ontology: the real world extract.
- The organization of the information base reflects its content (the nature of the application), not the history of the information base.
- The locality principle: organize information by subject matter (Brodie). Compare: indexed structures; a stream of statements.

Knowledge Levels
Epistemological (knowledge/conceptual) level: what an agent knows.
  1. "There is at least one smart undergraduate student enrolled in class-IS3456."
  2. "Process P1 ends before process P2 starts."
- The knowledge base can be told and asked at the knowledge level.
- Database views provide knowledge-level information.
- This level corresponds to (denotes) the ontology:
  1. The ontology includes individuals that can be classified into categories (undergraduate-student, smart) and related to each other (enrolled-in).
  2. The ontology includes individual processes whose start and end points can be compared.

Logical (conceptual) level: the level at which knowledge is encoded in information structures or sentences.
- "Exists X, smart(X) and undergraduate-student(X) and in(class-IS3456, X)"
- "class-IS3456 belongs-to exists(in, and(undergraduate-student, smart))"
- DB view constraint: the table of all smart undergraduates enrolled in class-IS3456 is not empty.
- "before(end(P1), start(P2))"
- "precedes(P1, P2)"

Implementation level: the physical representation of logical-level sentences in architecture-level data structures.
  1. Indexed tables for the undergraduate, smart, and enrolled-in entities.
  2. Class objects for undergraduate and smart objects and for the enrolled-in relation.
  3. List encoding for logic sentences.
  4. Graphical encoding of time intervals.
  …
- The choice of implementation affects efficiency; it is irrelevant to the logical level.
- The choice of logical level affects expressive power (and efficiency): what can be said and what can NOT be said (e.g., whether incomplete information can be represented).

Evolution of Information Models
- Physical information models: machine oriented. Implementation data structures: variable names, arrays, records, B-trees.
- Logical information models: abstract symbol structures such as tables, sets, relations, tuples. Examples: relational DB, OO-DB. Not sensitive to modeling.
- Conceptual information models: structure information in cognitively meaningful ways.
  - Semantic terms: entities, associations, activities, agents, goals, constraints, time intervals, time points, events, distances, locations.
  - Abstraction mechanisms: categorization (classification), generalization, aggregation.
  - Support: psychological grounds; engineering grounds (efficient implementations).

Information Model: Definition
- A collection of formal structures + mappings to applications.
- Operations: management, retrieval, reasoning.
- Integrity rules.

The relational model as an example:

  Ontology                   Information base
  Entities (finite)          Tuples
  Attributes (mappings)      Attribute symbols
  Value domains              Domain symbols
  Relations on entities      Tables

- Management: tuples: add, delete, modify; tables: relational algebra.
- Integrity rules: keys, attribute domains.

Language
- Syntax: includes ontological commitments.
- Semantics.

Language vs. information base, with ERD as an example:
[Figure: ERD as an example of language vs. information base; labels: Visual language, Application, Information base, Entities (finite), Entity types, Value-domains, Relationship types, Many, Single, …]
- Visual languages are harder than symbolic ones: their syntax includes topological considerations.

Conceptual Modeling vs. Knowledge Representation
- Conceptual modeling: abstraction mechanisms; meta-modeling, …; ontology investigation.
- Knowledge representation (knowledge agents): more reasoning, including Tell and Ask operations, algebraic algorithms, goal-based reasoning, planning, and logic. More philosophy-based ontologies: temporal, common sense.
- Knowledge representation and conceptual modeling are getting closer to each other.

History of Conceptual and KR Models
1. Semantic networks
- Start: Quillian 1966, a model for the structure of human memory.
- Ontology:
  - Concepts (word senses, in Quillian's model).
  - Associations between the concepts.
  - Attributes of concepts.
  - Some concepts are organized in a hierarchy; attributes and associations are inherited through the hierarchical associations.
- Representation (information base):
  - Concept elements: Animate, Plant, University, Robin.
  - Association elements: isa, has, eats, owned, defined as binary relations among the concept elements.
  - Attribute elements: associated with concept elements.
  - The isa association element stands for the concept hierarchy.

The representation is captured by a labeled directed graph:
- Labeled nodes: concept elements.
- Labeled arcs: associations.
- Tags on nodes: attributes.
- The isa-labeled sub-graph: the concept hierarchy. It must be a DAG (directed acyclic graph).
Visual language: a visualization of the labeled directed graph.

[Figure: example semantic network; nodes include Clyde, Robin, Bird, Penguin, Fish, Animate, Own1, Ownership, Situation, Activity, Flying, Referencing, Wings, Name; arcs labeled isa, owner, has, used-for; node tags Can fly, Can't fly, Can swim.]

Inference in semantic networks
- Quillian: spreading activation procedure.
  Given: the word pair "horse food".
  Find paths:
    horse -isa-> animal -eats-> food
    horse -isa-> animal -madeOf-> meat -isa-> food
  The paths stand for meaning.
- Standard semantic net reasoning: matching network graphs.
  Query: "Find someone that has wings and is the owner of something."
  Query graph: variables ?-X and ?-Y, the node Wings, and arcs labeled has and owner.
  Match: ?-X = Clyde, ?-Y = Own1.

Drawbacks of semantic networks
- No ontological commitments! Only concepts, associations, and true/false attributes:
    My CAR is white.
    A CAR has 4 wheels.
    A CAR is a sign of status.
- "Wild" inference! Compare with databases, and with semi-structured data on the web.
- Limitations: no complex statements (OR, NEGATION, CONDITIONS, PARAMETERS).
- Usage: very popular in the 70's, with much use in natural language processing (NLP).

Object-Oriented Programming
- Simula (1966), Smalltalk.
- Ontology: classes, objects.
- Class: common properties and behaviors.
- Subclass hierarchies; inheritance.

Entity Relationship Data Model (Chen, 1976)
- Ontology:
  - Entities, organized in types.
  - Relationships: relations on entities.
  - Value domains.
  - Attributes: mappings from an entity/relationship type to a value domain.
- Integrity constraints: keys; cardinality constraints.
- Extension: generalization (an entity type hierarchy).
- Not appropriate for: fluids, signals, temporal events, state changes.

Activity-Based Ontologies
- Ross 1977, SADT (Structured Analysis and Design Technique): used for specifying requirements for software systems.
  Ontology: activities; data. Hierarchy of activities. Representation: visual.
- DFD (data flow diagrams): information flow within an organization.
  Ontology: processes; data; data sources. Hierarchy of processes.
- Harel, Statecharts 1987: used for specifying complex systems.
  Ontology: activities, states. Hierarchy and depth of activities. Concurrency of states.

Conceptual Models in Databases
- Enrich the relational model: tools for entity modeling; hierarchies of relations.
- Organize the conceptual schema by generalization, aggregation, and grouping.
- Organize exceptions in generalization hierarchies.
- Object-oriented databases.
- Meta-modeling: model the meaning and structure of an information source.

Conceptual Modeling in Knowledge Representation
- Minsky: Frames (1975). A structured representation for common sense reasoning that combines typical structure with common sense inference.
- Schank and Abelson: Scripts (1977). Typical sequences of events.
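Minsky's frames combine typical structure (slots with defaults) with common sense inference (procedures that fill slots on demand). The following is a minimal Python sketch under stated assumptions, not a reconstruction of any particular frame system: the `Frame` class, its lookup order (own value, then if-needed procedure, then default, then the more generic frame), the `clues` slot, and the instances `diner-1` and `bistro-1` are all hypothetical. The slot names, defaults, and the Types rule are taken from the generic RESTAURANT frame example.

```python
from __future__ import annotations

from typing import Any, Callable, Dict, Optional


class Frame:
    """A hypothetical frame: slots with stored values, defaults, and if-needed procedures."""

    def __init__(self, name: str, specialization_of: Optional[Frame] = None) -> None:
        self.name = name
        self.specialization_of = specialization_of  # the more generic frame
        self.values: Dict[str, Any] = {}
        self.defaults: Dict[str, Any] = {}
        self.if_needed: Dict[str, Callable[[Frame], Any]] = {}

    def get(self, slot: str, instance: Optional[Frame] = None) -> Any:
        """Look up a slot: own value, then if-needed, then default, then inherit."""
        if instance is None:
            instance = self  # the frame the question is actually about
        if slot in self.values:
            return self.values[slot]
        if slot in self.if_needed:
            computed = self.if_needed[slot](instance)
            if computed is not None:
                return computed
        if slot in self.defaults:
            return self.defaults[slot]
        if self.specialization_of is not None:
            return self.specialization_of.get(slot, instance)
        return None


def restaurant_types(frame: Frame) -> Optional[str]:
    """If-needed rule for the Types slot, following the RESTAURANT frame."""
    clues = frame.values.get("clues")  # hypothetical slot holding observations
    if clues is None:
        return None  # assumption: nothing observed, so let the default apply
    if "plastic-orange-counter" in clues:
        return "Fast-food"
    if "stack-of-trays" in clues:
        return "Cafeteria"
    if "wait-for-waitress-sign" in clues or "reservations-made" in clues:
        return "Wait-to-be-seated"
    return "Seat-yourself"


business = Frame("Business-establishment")
restaurant = Frame("RESTAURANT", specialization_of=business)
restaurant.defaults["Types"] = "Wait-to-be-seated"
restaurant.defaults["Food-Style"] = "American"
restaurant.if_needed["Types"] = restaurant_types

# Two hypothetical restaurant instances.
diner = Frame("diner-1", specialization_of=restaurant)
diner.values["clues"] = {"stack-of-trays"}
bistro = Frame("bistro-1", specialization_of=restaurant)
bistro.values["Food-Style"] = "French"

print(diner.get("Types"))        # Cafeteria (computed by if-needed)
print(bistro.get("Types"))       # Wait-to-be-seated (no clues, so the default)
print(bistro.get("Food-Style"))  # French (the instance's own value)
print(diner.get("Food-Style"))   # American (default inherited from RESTAURANT)
```

Passing the original `instance` down the specialization chain is the design choice that lets a procedure stored on the generic frame reason about the specific restaurant being asked about.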
Generic RESTAURANT frame
  Specialization-of: Business-establishment
  Types:
    range: (Cafeteria, Seat-Yourself, Wait-to-be-seated)
    default: Wait-to-be-seated
    if-needed: IF plastic-orange-counter THEN Fast-food,
               IF stack-of-trays THEN Cafeteria,
               IF wait-for-waitress-sign OR reservations-made THEN Wait-to-be-seated,
               OTHERWISE Seat-yourself.
  Location:
    range: an ADDRESS
    if-needed: (Look at the MENU)
  Name:
    if-needed: (Look at the MENU)
  Food-Style:
    range: (Burgers, Chinese, American, Seafood, French)
    default: American
    if-added: (Update Alternatives of RESTAURANT)
  Times-of-Operation:
    range: a Time-of-Day
    default: open evenings except Mondays
  Payment-Form:
    range: (Cash, Credit, Check, Washing-Dishes-Script)
  Event-Sequence:
    default: Eat-at-Restaurant Script
  Alternatives:
    range: all restaurants with the same Food-Style
    if-needed: (Find all Restaurants with the same Food-Style)

Description Logics: CLASSIC (1989)
- A formalization of semantic nets using logic.
- Inference algorithms.
- Structured definitions.

Terminology of definitions (TBOX):
  american-assoc-company := and(company, exists(associate, american))
  foreign-assoc-company := and(company, exists(associate, not american))
  allied-company := and(company, or(american, american-assoc-company))
  assoc-company := and(company, atleast(1, associate))
  conglomerate := and(company, atleast(2, associate))

Assertions of contingent knowledge (ABOX):
  foreign-assoc-company(C1)
  allied-company(C2)
  associate(C1, C2)
  company(C3)
  not american(C3)
  associate(C2, C3)

Infer (≤ reads "is subsumed by"):
  and(american-assoc-company, foreign-assoc-company) ≤ conglomerate
  or(conglomerate(C1), conglomerate(C2))

Why: a company that is both an american-assoc-company and a foreign-assoc-company has an american and a non-american associate, which must be distinct, so it has at least 2 associates. For the individuals: C2 is allied, so either C2 is american, and then C1 has the american associate C2 plus some non-american associate, or C2 is an american-assoc-company, and then C2 has an american associate plus the non-american associate C3. In both cases one of them has at least 2 associates.

Requirement Engineering
- Ontologies include:
  - Entities.
  - Events.
  - Constraints, including temporal constraints.
  - Activities, organized in generalization hierarchies.
- KAOS (1993): a framework for requirements modeling:
  - Modeling concepts: goals, agents, alternatives, events, actions, existence modalities, agent responsibilities.
  - Meta-modeling.
  - Extensible modeling framework.
  - Methodology for constructing requirements.
- UML (Unified Modeling Language): integrates object-oriented analysis and design models.

Data Integration
- Needed in complex systems with multiple data sources: relational DBs, texts, pictures, sound files, scientific data sources.
- Essential in web-based systems.
- Such systems are termed heterogeneous systems, characterized by:
  - Heterogeneity of data.
  - Autonomy of data sources.
  - Distribution.
- Federated (multi) databases: a common data model and schema integration.
- The main notion for building integrated systems is a mapping between data models. The mapping must be formal, based on well-defined semantics.

Architectures for Data Integration
Mediator and wrapper architectures:
- Wrapper: defines and restricts access to a system through an abstract interface.
- Mediator: forwards queries to data sources and integrates the results:
  - Query plan.
  - Query rewrite (mapping).
  - Execution plan.
- Common data model (CDM): wrappers and mediators are built on it (defined by mappings).
- A popular CDM: semi-structured data. Specification language: XML (Extensible Markup Language).
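To make the division of labor between wrappers and the mediator concrete, here is a minimal Python sketch, assuming plain dictionaries as a stand-in for a semi-structured common data model (not XML). All class names, the two sample sources, and their mapping dictionaries are hypothetical. Each wrapper exposes the same abstract interface and rewrites a CDM selection into its source's native names through its mapping; the mediator builds a simple query plan (the wrappers whose mapping covers the queried attribute), executes it, and integrates the results by merging records on a key attribute.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple

# Common data model (CDM): a semi-structured record, i.e., attribute -> value.
Record = Dict[str, str]


class Wrapper(Protocol):
    """Abstract interface that every wrapper exposes to the mediator."""

    def answers(self, attr: str) -> bool: ...

    def query(self, attr: str, value: str) -> List[Record]: ...


@dataclass
class TableWrapper:
    """Wraps a relational table whose column names differ from the CDM."""

    columns: List[str]
    rows: List[Tuple[str, ...]]
    mapping: Dict[str, str]  # CDM attribute -> source column

    def answers(self, attr: str) -> bool:
        return attr in self.mapping

    def query(self, attr: str, value: str) -> List[Record]:
        # Query rewrite: translate the CDM attribute into a column position.
        col = self.columns.index(self.mapping[attr])
        to_cdm = {src: cdm for cdm, src in self.mapping.items()}
        return [
            {to_cdm[c]: v for c, v in zip(self.columns, row) if c in to_cdm}
            for row in self.rows
            if row[col] == value
        ]


@dataclass
class TextWrapper:
    """Wraps a text source with one 'key=value;key=value' line per item."""

    lines: List[str]
    mapping: Dict[str, str]  # CDM attribute -> source key

    def answers(self, attr: str) -> bool:
        return attr in self.mapping

    def query(self, attr: str, value: str) -> List[Record]:
        to_cdm = {src: cdm for cdm, src in self.mapping.items()}
        results = []
        for line in self.lines:
            fields = dict(pair.split("=", 1) for pair in line.split(";"))
            if fields.get(self.mapping[attr]) == value:
                results.append({to_cdm[k]: v for k, v in fields.items() if k in to_cdm})
        return results


class Mediator:
    """Forwards a CDM query to the relevant wrappers and integrates the results."""

    def __init__(self, wrappers: List[Wrapper], key: str) -> None:
        self.wrappers = wrappers
        self.key = key  # CDM attribute used to merge records from different sources

    def query(self, attr: str, value: str) -> List[Record]:
        # Query plan: only the wrappers whose mapping covers the attribute.
        plan = [w for w in self.wrappers if w.answers(attr)]
        merged: Dict[str, Record] = {}
        for wrapper in plan:  # execution: run the sub-queries in plan order
            for record in wrapper.query(attr, value):
                merged.setdefault(record[self.key], {}).update(record)
        return list(merged.values())


# Hypothetical sources describing students in different native formats.
registry = TableWrapper(
    columns=["sid", "sname", "dept"],
    rows=[("s1", "Dana", "IS"), ("s2", "Lee", "CS")],
    mapping={"id": "sid", "name": "sname", "dept": "dept"},
)
notes = TextWrapper(
    lines=["student=s1;dept=IS;remark=smart", "student=s3;dept=IS;remark=new"],
    mapping={"id": "student", "dept": "dept", "remark": "remark"},
)
mediator = Mediator([registry, notes], key="id")
for record in mediator.query("dept", "IS"):
    print(record)
```

Merging on a shared key is only one simple integration strategy; this sketch ignores conflicting values and differing value representations across sources, which a real mediator would have to reconcile through its mappings.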