Knowledge Modeling, use of information sources in the study of relationships

Knowledge Modeling, use of
information sources in the study of
domains and inter-domain
relationships
- A Learning Paradigm
by
Sanjeev Thacker
Introduction
• Introduction
– Background
– ADEPT
– Problems
– Contributions
Background
• Web is an ever-increasing
source of information
• Information of interest to the user is
distributed across multiple
heterogeneous sources
• Need for integration to provide a
single point of access for querying
ADEPT
• Besides querying, use the data
sources to extract useful knowledge
• Provide an environment for studying
domains
• Provide means to study and explore
complex inter-domain relationships
• Ability to pose complex information
requests across multiple domains
Problems
• Diverse and distributed sources
• Web sources are unlike databases
– Unstructured or semi-structured
– Inconsistencies and information
overlapping
• Heterogeneities
– Semantic
– Structural
– Syntactic
Problems
• Representation of complex
relationships
• Use of Knowledge Model for
complex information request
capability with embedded semantic
information
Contribution
• Knowledge Model
• Information Scape Model
• Learning Paradigm
• Visual Interfaces
Outline
• Knowledge Modeling
• Information Scapes
• Learning Paradigm
• Visual Interfaces
• Related Work
• Future Work
• Demo
Knowledge Modeling
• Approach to source modeling
– Global model and source model
– Source centric / query centric
Source Centric
Advantages
– Global model independent of source
model
– Modeling a source is independent of
other sources
– Dynamic addition, removal and
modification of sources
– Global view remains unaffected
– No source mapping required during
information integration
– More suitable for sources other than
database sources (e.g., web sources)
Knowledge Base
• Comprises
– Ontologies (Domain model)
– Resources
– Relationships
– Operations
Domain Hierarchy
Ontology
• Standardize meaning, description,
representation of involved attributes
• Capture the semantics involved via
domain characteristics
• Allow knowledge sharing and reuse
• Resolve resource model differences
by mapping them to the global
model of the ontology they represent
• Global interface
Ontology
• Description includes
– Attributes
– Domain Rules
– Functional Dependencies
Resource
• Desirable characteristics:
– Add, modify and delete resources for an
ontology dynamically without affecting
the system's knowledge
– Specify the sources in a manner such
that one can declaratively query them
– Since the number of resources is large
there is a need to identify the exact
usefulness of resources from the query
viewpoint and prune the others
Resource
• Description includes
– Attributes
– Binding Patterns
– Data Characteristics
– Local Completeness
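One way to picture how a resource description supports pruning is the minimal sketch below. It covers only attributes and binding patterns; data characteristics and local completeness are left out. It assumes a binding pattern is a set of attributes that must be bound before the source can be queried. The class `Resource`, the functions `is_useful` and `prune`, and the source names in the usage note are hypothetical and are not taken from ADEPT.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Resource:
    """Hypothetical description of one source registered under an ontology."""
    name: str
    ontology: str
    attributes: FrozenSet[str]
    # Each pattern lists attributes that must be bound to query the source.
    # An empty tuple is taken to mean "no binding required" (assumption).
    binding_patterns: Tuple[FrozenSet[str], ...] = ()


def is_useful(resource: Resource, ontology: str,
              wanted: Set[str], bound: Set[str]) -> bool:
    """True if the resource can contribute to the request."""
    if resource.ontology != ontology:
        return False
    if not (resource.attributes & wanted):
        return False  # exposes none of the requested attributes
    if not resource.binding_patterns:
        return True
    # At least one binding pattern must be fully covered by the bound attributes.
    return any(pattern <= bound for pattern in resource.binding_patterns)


def prune(resources: Iterable[Resource], ontology: str,
          wanted: Set[str], bound: Set[str]) -> List[Resource]:
    """Keep only the resources that are useful for the request."""
    return [r for r in resources if is_useful(r, ontology, wanted, bound)]
```

For example, a source that requires `eventDate` to be bound would be pruned from a request that supplies no date, while a source with no binding patterns would be kept.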
Relationships
• Simple relationships:
– equals, less-than, like, is-a, is-part-of
• Are hierarchical or similarity based
• Complex relationships
– “Earthquakes cause tsunamis”, “Nuclear
explosions cause earthquakes”, “Air pollution affects vegetation”
Relationships
• Characteristics
– Involves multiple ontologies
– Requires understanding the semantics
involved in their interaction
– Cannot be expressed by simple
relational and logical operators alone
– Involves use of complex operations like
functions and simulations
Relationship
• Example
– “Nuclear explosion causes Earthquakes”
• NuclearTest Causes Earthquake:
dateDifference(NuclearTest.eventDate,
Earthquake.eventDate)<30
AND
distance(NuclearTest.latitude,
NuclearTest.longitude,
Earthquake.latitude,
Earthquake.longitude)<10000
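The relationship above can be sketched as an executable predicate. The slides do not give the units or the implementations of `dateDifference` and `distance`, so this sketch assumes days and kilometres, an absolute (undirected) date difference, and a haversine great-circle distance. The `Event` class and the Python function names are illustrative only.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    """Minimal stand-in for a NuclearTest or Earthquake instance."""
    eventDate: date
    latitude: float   # degrees
    longitude: float  # degrees


def date_difference(d1: date, d2: date) -> int:
    """Absolute difference in days (unit is an assumption)."""
    return abs((d2 - d1).days)


def distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km via the haversine formula (assumed metric)."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * earth_radius_km * math.asin(min(1.0, math.sqrt(a)))


def nuclear_test_causes_earthquake(test: Event, quake: Event) -> bool:
    """Both conditions of the relationship must hold (logical AND)."""
    return (date_difference(test.eventDate, quake.eventDate) < 30
            and distance(test.latitude, test.longitude,
                         quake.latitude, quake.longitude) < 10000)
```

A directional variant that only accepts earthquakes occurring after the test would drop the `abs` and require a non-negative difference.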
Operations
• Functions, Simulations
• Functions
– user defined
– used to model the semantics
involved in the relationships
– used in post processing of result data
– examples: distance, dateDifference
• Simulations
– independent programs
– used for post processing of result data
– example: Clarke urban growth model
Information Scape (IScape)
• Representation of an information
request across multiple domains
• Can be deployed and executed
• Sources are not explicitly specified,
unlike in a query
• System is aware of the sources and
is able to identify the useful sources
• Semantic correlation across domains
is embedded within the information
request
Information Scape
• Definition
– An IScape may be defined as an
information request over distributed
heterogeneous sources of information
involving multiple ontologies and the
relationships between them that
contains meta-information constructed
to facilitate the bridging of semantic
relationships between individual
sources.
Information Scape
• Ontologies
• Relationships
• Constraint
– Conjunctive Boolean expression
• Runtime configurable constraint
– Conceptually different
• Grouping and group constraint
– Similar to the HAVING clause in SQL
• Projection list
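The components listed on this slide could be held in a structure like the one below. This is a sketch under several assumptions, not ADEPT's actual representation (the slides state only that IScapes are stored in XML): constraints are modelled as Python predicates that are ANDed together, the runtime configurable constraint is a predicate that takes a value supplied at execution time, and evaluation runs over rows that have already been retrieved and correlated. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


@dataclass
class IScape:
    ontologies: List[str]
    relationships: List[str]
    # Conjunctive constraint: every predicate must hold for a row.
    constraints: List[Callable[[Row], bool]] = field(default_factory=list)
    # Runtime configurable constraint: receives a value chosen at execution time.
    runtime_constraint: Optional[Callable[[Row, Any], bool]] = None
    group_by: Optional[str] = None
    # Group constraint: filters whole groups, like HAVING in SQL.
    group_constraint: Optional[Callable[[List[Row]], bool]] = None
    projection: List[str] = field(default_factory=list)

    def evaluate(self, rows: List[Row], runtime_value: Any = None) -> List[Row]:
        kept = [r for r in rows if all(p(r) for p in self.constraints)]
        if self.runtime_constraint is not None:
            kept = [r for r in kept if self.runtime_constraint(r, runtime_value)]
        if self.group_by is not None:
            groups: Dict[Any, List[Row]] = {}
            for r in kept:
                groups.setdefault(r[self.group_by], []).append(r)
            if self.group_constraint is not None:
                groups = {k: g for k, g in groups.items()
                          if self.group_constraint(g)}
            kept = [r for g in groups.values() for r in g]
        # Projection list: keep only the requested attributes.
        return [{attr: r[attr] for attr in self.projection} for r in kept]
```

Keeping the runtime value out of the stored definition means the same IScape can be re-executed with a different setting without being rebuilt.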
Learning Paradigm
• Study of domain
• Use IScapes to study the domain
interaction by using relationships
• Relationships could lead to transitive
findings
• Explore the hypothetical
relationships to validate and
establish them or invalidate them
Learning Paradigm
• Data mining
– Age and breast cancer
• Relationships
– Nuclear Explosion causes Earthquakes
• Post processing
– Functions
– Simulations
– Charting tool
Learning Paradigm
• Find the earliest recorded Nuclear
test conducted
• Plot a graph of the average number
of Earthquakes of magnitude greater
than 5.8 per year starting from 1900
• Find the average number of
Earthquakes of magnitude greater
than 5.8 between 1900-1949 and
between 1950-present
Learning Paradigm
• Find the average number of
Earthquakes of magnitude greater
than 7 between 1900-1949 and
between 1950-present
• Find pairs of Nuclear tests and
Earthquakes that occurred within a
certain radius and a certain time
period of the explosion
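The "average number of Earthquakes per year" requests above reduce to a count divided by the length of the period. The sketch below assumes the result data arrives as `(year, magnitude)` pairs and that both year bounds are inclusive. The function name `average_per_year` is illustrative.

```python
from typing import Iterable, Tuple


def average_per_year(quakes: Iterable[Tuple[int, float]],
                     min_magnitude: float,
                     start_year: int,
                     end_year: int) -> float:
    """Average yearly count of quakes with magnitude > min_magnitude.

    Both start_year and end_year are treated as inclusive (assumption).
    """
    if end_year < start_year:
        raise ValueError("end_year must not precede start_year")
    count = sum(1 for year, magnitude in quakes
                if start_year <= year <= end_year and magnitude > min_magnitude)
    return count / (end_year - start_year + 1)
```

Calling it once for 1900 to 1949 and once for 1950 to the current year, with the same magnitude threshold, gives the two averages to compare. The pairing request (tests and earthquakes within a radius and time period) corresponds to the relationship predicate rather than to this function.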
Visual Interfaces
• Knowledge Builder
• IScape Builder
• Web Interface
• IScape Processing Monitor
Knowledge Builder
• GUI to build the knowledge base
– fast and easy to use
– Manually creating the knowledge could
be arduous and error prone
• Knowledge is stored in the standard
XML format
• Abstraction from the underlying
format and other technical details
Knowledge Builder
• Assists in the creation, deletion
and modification of the
knowledge base
• Automatically creates a
knowledge tree that assists in
relating the knowledge in a
better manner
Knowledge Builder
Knowledge
Hierarchy
IScape Builder
• GUI to create, deploy and execute
IScapes in a step-by-step manner
• IScape stored in XML format
• Abstracts the underlying structure
from the user
• Validity checks implemented
• Integrated tools
– the charting tool to plot charts with the
result data
IScape Builder
Web Interface
• Web accessible
– Knowledge Base
– Existing IScapes
• Set the runtime configurable
constraint
• Execute existing IScapes
• View the tabulated results
• Cannot create new IScapes
Web Interface
Result Screen
IScape Processing Monitor
• Color-coded log entries describing the
IScape processing are generated
– Brief message along with agent name
– Time stamp
– detailed description and associated
data, if any
– IScape plan for the existing sources
– Intermediate results
• High level debugging tool
– Understand execution, locate failures
• Not available with the web interface
Monitor GUI
Related Work
• State of the art
– SIMS, TSIMMIS, Information Manifold,
OBSERVER, InfoSleuth
• Mainly focused on a single point of
access for querying the integrated
data of a domain
• What makes ADEPT unique
– Relationships, IScapes and the learning
paradigm distinguish our system
from prior work
Future Work
• Support rules of type “if-then” and
use of induction learning to speed up
the processing
• Recursive query capability required
• IScape over IScape support required
• Simulations are currently supported as
specialized functions in our framework
• Statistical analysis tools like SAS for
time series analysis, logistic
regression