Knowledge representation for information Systems:

1. Introduction
“Information modeling is concerned with the construction
of computer-based symbol structures which model some
part of the real world.” (Mylopoulos 98, p. 128)

Application: part of the real world.
[Figure: an Application (real world extract) and its Information base, linked by Denotation. Labels: Individuals, Atoms, Categories, Mappings, Terms, …]
Ontology: The real world extract.
 The organization of the information base
reflects its content – the nature of the
Application.

Not the history of the information base.
The locality principle: Organize information by
subject matter (Brodie).
Compare: indexed structures.
Stream of statements.
Knowledge Levels:

Epistemological (knowledge/conceptual)
level:
What an agent knows –
1. “There is at least one smart undergraduate
student enrolled in class-IS3456”
2. “Process P1 ends before process P2
starts”
The knowledge base can be told and asked at
the knowledge level.
Database views – knowledge level information.
Corresponds to (denotes) the ontology:
1. Ontology includes individuals
that can be classified into categories
(undergraduate-student, smart), and
can be related (enrolled-in).
2. Ontology includes individual
processes that have start and end
points that can be compared.

Logical (conceptual) level:
Level of encoding of the knowledge in information
structures / sentences –
“Exists X, smart(X) and
undergraduate-student(X) and
in(class-IS3456, X)”
“class-IS3456 belongs-to
exists(in, and(undergraduate-student,
smart))”
DB-view-constraint: The table of all smart
undergraduates enrolled in class-IS3456 is
not empty.
“before(end(P1), start(P2))”
“precedes(P1, P2)”
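As a rough illustration of the logical level, the two example sentences can be evaluated against a tiny fact base. This is only a sketch: the individuals (alice, bob, carol), the contents of the sets, and the process time points are invented for illustration and do not come from the slides.

```python
# Hypothetical fact base: unary predicates as sets, a binary relation as pairs.
smart = {"alice", "bob"}
undergraduate_student = {"alice", "carol"}
enrolled = {("class-IS3456", "alice"), ("class-IS3456", "bob")}  # in(class, X)


def exists_smart_undergraduate(cls):
    """Exists X: smart(X) and undergraduate-student(X) and in(cls, X)."""
    return any(
        x in smart and x in undergraduate_student
        for (c, x) in enrolled
        if c == cls
    )


# Hypothetical processes, each given as (start point, end point).
processes = {"P1": (0, 5), "P2": (7, 9)}


def precedes(p, q):
    """before(end(p), start(q)): p ends before q starts."""
    return processes[p][1] < processes[q][0]


print(exists_smart_undergraduate("class-IS3456"), precedes("P1", "P2"))
```

The same knowledge-level statement could be encoded quite differently (for example as a non-empty view over tables); the sets above are just one possible encoding.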
 Implementation level:
Physical representation of the sentences of the
logical level in architecture level data structures –
1. Indexed tables for undergraduates,
smart, and enrolled-in entities.
2. Class objects for undergraduates
and smart objects and for the enrolled-in
relation.
3. List encoding for logic sentences.
4. Graphical encoding of time
intervals.
5. …
The choice of implementation – impact on
efficiency. Irrelevant to the logical level.
The choice of logical level – impact on
expressive power (and efficiency):

What can be said.

What can NOT be said – represent
incomplete information.
Evolution of information models
 Physical information models: Machine oriented.
Implementation data structures –
variable names, arrays, records, B-trees.
 Logical information models: Abstract
symbol structures –
tables, sets, relations, tuples.
Relational DB, OO-DB.
Not sensitive to modeling.
 Conceptual information models: Structure
information in cognitively meaningful ways –
Semantic terms:
Entities, associations, activities, agents,
goals, constraints, time intervals, time
points, events, distances, locations.
Abstraction mechanisms:
 Categorization (classification).
 Generalization.
 Aggregation.
Support:


Psychological grounds.
Engineering grounds – efficient
implementations.
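The three abstraction mechanisms listed above can be sketched with ordinary classes. The class and attribute names below (Person, Student, Course, participants) are hypothetical, and mapping each mechanism onto a Python feature is an interpretation rather than a definition.

```python
from dataclasses import dataclass, field


@dataclass
class Person:              # categorization: each Person object is classified under this category
    name: str


@dataclass
class Student(Person):     # generalization: Person generalizes Student
    student_id: str = ""


@dataclass
class Course:              # aggregation: a Course is built from its participants
    title: str
    participants: list = field(default_factory=list)


ann = Student(name="Ann", student_id="s1")
course = Course(title="IS3456", participants=[ann])
print(isinstance(ann, Person), course.participants[0].name)
```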
Information model – definition:
A collection of formal structures +

Mappings to applications.

Operations:
Management, retrieval, reasoning.

Integrity rules.
The relational model – as an example:
Ontology element          Represented by
Entities (finite)         Tuples
Attributes (mappings)     Attribute-symbols
Value-domains             Domain-symbols
Relations on entities     Tables
Management:
Tuples: Add, delete, modify
Tables: Relational algebra
Integrity rules: Key, att-domains.
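A minimal sketch of these pieces, assuming a single-attribute key and Python types standing in for value domains. The Table class and the students example are hypothetical. Only two relational algebra operations (selection and projection) are shown.

```python
class Table:
    def __init__(self, attributes, key, domains):
        self.attributes = attributes   # attribute symbols
        self.key = key                 # key attribute (assumed to be a single attribute)
        self.domains = domains         # attribute -> Python type, standing in for a domain
        self.rows = {}                 # key value -> tuple, stored as a dict

    def add(self, **row):
        """Add a tuple, enforcing the attribute-domain and key integrity rules."""
        if set(row) != set(self.attributes):
            raise ValueError("wrong attributes")
        for attr, value in row.items():
            if not isinstance(value, self.domains[attr]):
                raise ValueError(f"{attr} is outside its domain")
        if row[self.key] in self.rows:
            raise ValueError("duplicate key")
        self.rows[row[self.key]] = row

    def delete(self, key_value):
        self.rows.pop(key_value, None)

    def select(self, predicate):
        """Relational selection: the tuples that satisfy the predicate."""
        return [r for r in self.rows.values() if predicate(r)]

    def project(self, *attrs):
        """Relational projection: a set, so duplicates disappear."""
        return {tuple(r[a] for a in attrs) for r in self.rows.values()}


students = Table(["id", "name", "year"], key="id",
                 domains={"id": int, "name": str, "year": int})
students.add(id=1, name="Ann", year=2)
students.add(id=2, name="Bob", year=3)
print(students.project("name"))
```

Rejecting the insert at `add` time is one design choice. A real system might instead check integrity rules at transaction commit.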
LANGUAGE:

Syntax -- Includes ontological commitments.

Semantics.
Language vs. Information base –
ERD as an example:
[Figure: ERD as a visual language. Labels: Application, Information base, Entities (finite), Entity types, Value-domains, Relationship types, …, Many, Single.]

A visual language is harder than a symbolic one – its syntax includes
topological considerations.
Conceptual modeling vs. Knowledge
representation

Conceptual modeling –
 Abstraction mechanisms.
 Meta-modeling,…
 Ontology investigation.

Knowledge representation, knowledge
agents –

More reasoning –
Tell – Ask operations.
Algebraic algorithms.
Goal based.
Planning.
Logic.

More Philosophy based ontologies:
Temporal.
Common sense.
 Knowledge representation and conceptual
modeling are getting close to each other.
History of Conceptual and KR models
1.
Semantic networks:
Start – Quillian 1966: A model for the structure of
human memory.

Ontology:
 Concepts (word senses – Quillian).
 Associations between the concepts.
 Attributes of concepts.
 Some concepts are organized in a
hierarchy.
 Attributes and associations are inherited
through hierarchical associations.

Representation (information base):
 Concept elements: Animate, Plant,
University, Robin.
 Association elements: isa, has, eats,
owned -- defined as binary relations
among the concept elements.
 Attribute elements – associated with
concept elements.
 The isa association element – stands for
the concept hierarchy.
The representation is captured by a Labeled
Directed Graph:
 Labeled Nodes – Concept elements.
 Labeled arcs – Associations.
 Tags on nodes – attributes.
 ISA labeled sub-graph – Concept
hierarchy. Must be a DAG (Directed Acyclic
Graph).

Visual language: visualization of a labeled
directed graph.
[Figure: example semantic network. Nodes: Clyde, Robin, Bird, Penguin, Fish, Animate, Own1, Ownership, Situation, Wings, Name, Flying, Referencing, Activity. Arc labels: isa, owner, has, Used-for. Node tags: Can fly, Can’t fly, Can swim.]
Inference in semantic networks:
Quillian: Spreading activation procedure:
Given: A word pair
“horse food”
Find paths:
 Horse –isa animal –eats food
 Horse –isa animal –madeOf meat –isa
food
The paths stand for meaning.
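The path search for the "horse food" pair can be sketched as a depth-first enumeration of cycle-free paths. This is a simplification, not Quillian's actual spreading activation procedure, and the NET dictionary only encodes the arcs named in the two paths above.

```python
# Labeled directed graph: node -> list of (arc label, target node).
NET = {
    "horse": [("isa", "animal")],
    "animal": [("eats", "food"), ("madeOf", "meat")],
    "meat": [("isa", "food")],
}


def find_paths(start, goal, path=None):
    """Yield every cycle-free path from start to goal as a list of (node, label, node) arcs."""
    path = path or []
    if start == goal:
        yield path
        return
    visited = {start} | {arc[0] for arc in path}
    for label, target in NET.get(start, []):
        if target not in visited:
            yield from find_paths(target, goal, path + [(start, label, target)])


for p in find_paths("horse", "food"):
    print(" ".join(f"{a} -{label}-> {b}" for a, label, b in p))
```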
Standard Semantic Nets reasoning:
Matching network graphs.
“Find someone that has wings and is the
owner of something”
[Figure: query graph with variable nodes ?-X and ?-Y, the node Wings, and arcs labeled has and owner.]
Match: ?-X → Clyde
?-Y → Own1
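Graph matching of this kind can be sketched as backtracking over query arcs that contain variables. The arc set and the arc directions below are assumptions (the original figure is not recoverable), and has arcs are assumed to be inherited along isa arcs.

```python
# Hypothetical arcs (source, label, target); the directions are assumptions.
ARCS = {
    ("Clyde", "isa", "Robin"),
    ("Robin", "isa", "Bird"),
    ("Bird", "has", "Wings"),
    ("Own1", "isa", "Ownership"),
    ("Own1", "owner", "Clyde"),
}
NODES = sorted({node for s, _, t in ARCS for node in (s, t)})


def ancestors(node):
    """The node itself plus everything reachable from it through isa arcs."""
    result, frontier = {node}, [node]
    while frontier:
        current = frontier.pop()
        for source, label, target in ARCS:
            if source == current and label == "isa" and target not in result:
                result.add(target)
                frontier.append(target)
    return result


def holds(source, label, target):
    """An arc holds if it is stated for the source or for one of its isa ancestors."""
    return any((a, label, target) in ARCS for a in ancestors(source))


def candidates(term, binding, nodes):
    """A constant stands for itself; a variable for its binding or, if unbound, any node."""
    if not term.startswith("?"):
        return [term]
    return [binding[term]] if term in binding else nodes


def match(query, nodes, binding=None):
    """Yield every variable binding under which all query arcs hold."""
    binding = binding or {}
    if not query:
        yield dict(binding)
        return
    (s, label, t), rest = query[0], query[1:]
    for sv in candidates(s, binding, nodes):
        for tv in candidates(t, binding, nodes):
            if s == t and sv != tv:
                continue  # one variable cannot take two values
            if holds(sv, label, tv):
                new = dict(binding)
                if s.startswith("?"):
                    new[s] = sv
                if t.startswith("?"):
                    new[t] = tv
                yield from match(rest, nodes, new)


QUERY = [("?X", "has", "Wings"), ("?Y", "owner", "?X")]
print(list(match(QUERY, NODES)))
```

Under these assumed arcs the only binding found is ?X = Clyde and ?Y = Own1, which agrees with the match stated above.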
Drawback of semantic networks:
No ontological commitments!
Only: Concepts, associations, true/false attributes:
 My CAR is white.
 A CAR has 4 wheels.
 A CAR is a sign of status.
“Wild” inference!
Compare with semi-structured data – web databases.
Limitations: no complex statements –
 OR,
 NEGATION,
 CONDITIONS,
 PARAMETERS.
Usage: Very popular in the 70’s.
Much Natural Language Processing (NLP).
Object-Oriented Programming
Simula – 1966.
Smalltalk.
Ontology: Classes, objects.
Class – common properties, behaviors.
Subclass hierarchies.
Inheritance.
Entity Relationship Data Model
Chen, 1976.
Ontology:
 Entities – organized in types.
 Relationships – relations on entities.
 Value domains.
 Attributes – mappings:
entity/relationship type → Value domain.
 Integrity constraints:
 Keys;
 cardinality constraints.
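The two integrity constraints can be sketched as checks over small entity and relationship sets. The students, courses, and enrolled_in data and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical entity sets: each entity is a dict of attribute values.
students = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
courses = [{"code": "IS3456"}]
# Hypothetical relationship set enrolled-in: pairs (student id, course code).
enrolled_in = [(1, "IS3456"), (2, "IS3456")]


def key_holds(entities, key):
    """Key constraint: no two entities of the type share the key value."""
    values = [e[key] for e in entities]
    return len(values) == len(set(values))


def cardinality_holds(pairs, position, entity_keys, low, high):
    """Every listed entity takes part in between low and high relationship instances."""
    counts = Counter(pair[position] for pair in pairs)
    return all(low <= counts[k] <= high for k in entity_keys)


print(key_holds(students, "id"))
print(cardinality_holds(enrolled_in, 0, [s["id"] for s in students], 1, 1))
```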

Extension:
Generalization -- Entity type hierarchy.
Not appropriate:
 Fluids.
 Signals.
 temporal events.
 state changes.
Activity Based Ontologies
Ross 1977 –
SADT (Structured Analysis and Design
Technique):
Used for specifying requirements for software
systems.
Ontology: Activities; data.
Hierarchy of activities.
Representation: Visual.
DFD – Information flow within organization.
Ontology: Processes; data; data sources.
Hierarchy of processes.
Harel – Statecharts 1987.
Used for specifying complex systems.
Ontology: Activities, states.
Hierarchy and depth of activities.
Concurrency of states.
Conceptual models in Databases

Enrich the relational model:
 Tools for entity modeling.
 Hierarchies of relations.
 Organize conceptual schema by:
Generalization; aggregation; grouping.
 Organize exceptions – generalization
hierarchies.

Object-oriented databases.

Meta-modeling – model the meaning and
structure of an information source.
Conceptual modeling in Knowledge
representation

Minsky – Frames (1975).
Structured representation.
Common sense reasoning.
 Combine: typical structure and
common sense inference.

Schank and Abelson – Scripts (1977).
Typical sequences of events.
Generic RESTAURANT frame
Specialization-of: Business-establishment
Types:
    range:      (Cafeteria, Seat-Yourself, Wait-to-be-seated)
    default:    wait-to-be-seated
    if-needed:  IF plastic-orange-counter THEN fast-food,
                IF stack-of-trays THEN Cafeteria,
                IF wait-for-waitress-sign OR reservations-made
                THEN Wait-to-be-seated,
                OTHERWISE Seat-yourself.
Location:
    range:      an ADDRESS
    if-needed:  (Look at the MENU)
Name:
    if-needed:  (Look at the MENU)
Food-Style:
    range:      (Burgers, Chinese, American, Seafood, French)
    default:    American
    if-added:   (Update Alternatives of RESTAURANT)
Times-of-Operation:
    range:      a Time-of-Day
    default:    open evenings except Mondays
Payment-Form:
    range:      (Cash, Credit, Check, Washing-Dishes-Script)
Event-Sequence:
    default:    Eat-at-Restaurant Script
Alternatives:
    range:      all restaurants with same FoodStyle
    if-needed:  (Find all Restaurants with the same FoodStyle)
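Slot lookup in a frame like RESTAURANT can be sketched as follows. The Frame class, the observed slot, and the lookup order (stored value, then if-needed procedure, then default, then the more general frame) are assumptions; frame systems differ on that order.

```python
class Frame:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent   # parent plays the role of Specialization-of
        self.values, self.defaults, self.if_needed = {}, {}, {}

    def get(self, slot, asker=None):
        """Look up a slot: stored value, if-needed procedure, default, then parent frame."""
        asker = asker or self
        if slot in self.values:
            return self.values[slot]
        if slot in self.if_needed:
            value = self.if_needed[slot](asker)
            if value is not None:
                return value
        if slot in self.defaults:
            return self.defaults[slot]
        return self.parent.get(slot, asker) if self.parent else None


def infer_type(frame):
    """If-needed procedure for Types; returns None when nothing has been observed."""
    observed = frame.values.get("observed")   # hypothetical slot holding observations
    if observed is None:
        return None
    if "plastic-orange-counter" in observed:
        return "fast-food"
    if "stack-of-trays" in observed:
        return "Cafeteria"
    if observed & {"wait-for-waitress-sign", "reservations-made"}:
        return "Wait-to-be-seated"
    return "Seat-yourself"


restaurant = Frame("RESTAURANT")
restaurant.defaults["Types"] = "Wait-to-be-seated"
restaurant.defaults["Food-Style"] = "American"
restaurant.if_needed["Types"] = infer_type

joes = Frame("Joe's", parent=restaurant)       # hypothetical instance frame
joes.values["observed"] = {"stack-of-trays"}
print(joes.get("Types"), joes.get("Food-Style"))
```

An instance with no observations falls back to the default for Types, which is how defaults and if-needed procedures are combined in this sketch.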

Description logics – CLASSIC 1989.

formalization of semantic nets using
logic.

Inference algorithms.

Structured definitions.
Terminology of definitions (TBOX):
american-assoc-company := and(company,
exists(associate, american))
foreign-assoc-company := and(company,
exists(associate, not american))
allied-company := and(company,
or(american, american-assoc-company))
assoc-company := and(company,
atleast(1, associate))
conglomerate := and(company,
atleast(2, associate))
Assertions of contingent knowledge (ABOX):
foreign-assoc-company(C1)
company(C3)
allied-company(C2)
not american(C3)
associate(C2, C3)
associate(C1, C2)
Infer:
and(american-assoc-company, foreign-assoc-company) ≤ conglomerate
or(conglomerate(C1), conglomerate(C2))
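The subsumption can be sanity-checked by brute force: a company with both an american associate and a non-american associate must have at least two associates. The sketch below enumerates every interpretation over a three-element domain and looks for a counterexample. It is a bounded check under stated assumptions, not a description logic reasoner and not how CLASSIC computes subsumption. The function names are hypothetical.

```python
from itertools import product

DOMAIN = range(3)  # a small, arbitrary domain size


def american_assoc(x, company, american, assoc):
    """and(company, exists(associate, american))"""
    return x in company and any((x, y) in assoc and y in american for y in DOMAIN)


def foreign_assoc(x, company, american, assoc):
    """and(company, exists(associate, not american))"""
    return x in company and any((x, y) in assoc and y not in american for y in DOMAIN)


def conglomerate(x, company, american, assoc):
    """and(company, atleast(2, associate))"""
    return x in company and sum((x, y) in assoc for y in DOMAIN) >= 2


def subsets(items):
    """Yield every subset of items as a set."""
    items = list(items)
    for bits in product([False, True], repeat=len(items)):
        yield {item for item, keep in zip(items, bits) if keep}


def find_counterexample():
    """Return an individual and interpretation violating the subsumption, or None."""
    pairs = [(x, y) for x in DOMAIN for y in DOMAIN]
    for company in subsets(DOMAIN):
        for american in subsets(DOMAIN):
            for assoc in subsets(pairs):
                for x in DOMAIN:
                    args = (x, company, american, assoc)
                    if (american_assoc(*args) and foreign_assoc(*args)
                            and not conglomerate(*args)):
                        return args
    return None


print(find_counterexample())
```

No counterexample exists because the american associate and the non-american associate are necessarily distinct individuals.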
Requirement Engineering
Ontologies include:
 Entities.
 Events.
 Constraints. Also, temporal constraints.
 Activities – organized in generalization
hierarchies.
KAOS – 1993: A framework for requirements
modeling:
 Modeling concepts:
 Goals.
 Agents.
 Alternatives.
 Events.
 Actions.
 Existence modalities.
 Agent responsibilities.
 Meta-modeling.
 Extensible modeling framework.
 Methodology for constructing
requirements.
UML (Unified Modeling Language):
Integrates object-oriented analysis and design
models.
Data Integration

Needed in complex systems –
multiple data sources:
 Relational DBs.
 Texts.
 Pictures.
 Sound files.
 Scientific data sources.
 Essential in Web based systems.
Such systems are termed:
 Heterogeneous systems.
 Heterogeneity of data.
 Autonomy of data sources
 Distribution.
 Federated (multi) databases.
 Common data model – schema
integration.
Main notion for building integrated systems:
Mapping between data models.
 The mapping must be formal – based
on well defined semantics.
Architectures for Data Integration

Mediator and wrapper architectures:
 Wrapper – Defining and restricting access
to a system through an
abstract interface.
 Mediator – forwards queries to data
sources and integrates results:
 Query plan.
 Query rewrite -- mapping.
 Execution plan.
 Common data model (CDM) – wrappers and
mediators are built in (defined by mappings).
 Popular CDM – Semi-structured
data.
 Specification language -- XML
(Extensible Markup Language).
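The mediator and wrapper roles can be sketched with two toy sources. All class names, the keyword query form, and the use of plain dicts as the common data model are assumptions made for illustration. Query planning and rewriting are reduced here to forwarding the same keyword to every source.

```python
class Wrapper:
    """Abstract interface: every source answers with records in the common data model."""

    def query(self, keyword):
        raise NotImplementedError


class TableWrapper(Wrapper):
    def __init__(self, rows):
        self.rows = rows            # hypothetical relational source: list of dicts

    def query(self, keyword):
        return [{"source": "table", "text": r["title"]}
                for r in self.rows if keyword in r["title"].lower()]


class TextWrapper(Wrapper):
    def __init__(self, lines):
        self.lines = lines          # hypothetical text source: list of strings

    def query(self, keyword):
        return [{"source": "text", "text": line}
                for line in self.lines if keyword in line.lower()]


class Mediator:
    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, keyword):
        """Forward the query to every wrapped source and merge the results."""
        results = []
        for wrapper in self.wrappers:
            results.extend(wrapper.query(keyword.lower()))
        return results


mediator = Mediator([
    TableWrapper([{"title": "Semantic Networks"}, {"title": "Frames"}]),
    TextWrapper(["Notes on semantic data models"]),
])
print(mediator.query("Semantic"))
```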