Knowledge Modeling, use of information sources in the study of domains and inter-domain relationships - A Learning Paradigm
by Sanjeev Thacker

Introduction
• Introduction
  – Background
  – ADEPT
  – Problems
  – Contributions

Background
• The Web is an ever-increasing source of information
• Information of interest to the user is distributed across multiple heterogeneous sources
• Need for integration to provide one-point access for querying

ADEPT
• Besides querying, use the data sources to extract useful knowledge
• Provide an environment for studying domains
• Provide means to study and explore complex inter-domain relationships
• Ability to pose complex information requests across multiple domains

Problems
• Diverse and distributed sources
• Web sources are unlike databases
  – Unstructured or semi-structured
  – Inconsistencies and overlapping information
• Heterogeneities
  – Semantic
  – Structural
  – Syntactic

Problems (cont.)
• Representation of complex relationships
• Use of the Knowledge Model for complex information request capability with embedded semantic information

Contribution
• Knowledge Model
• Information Scape Model
• Learning Paradigm
• Visual Interfaces

Outline
• Knowledge Modeling
• Information Scapes
• Learning Paradigm
• Visual Interfaces
• Related Work
• Future Work
• Demo

Knowledge Modeling
• Approach to source modeling
  – Global model and source model
  – Source centric / query centric

Source Centric Advantages
  – Global model is independent of the source model
  – Modeling a source is independent of other sources
  – Dynamic addition, removal and modification of sources
  – Global view remains unaffected
  – No source mapping required during information integration
  – More suitable for sources other than database sources (web sources)

Knowledge Base
• Comprises
  – Ontologies (Domain model)
  – Resources
  – Relationships
  – Operations

Domain Hierarchy

Ontology
• Standardize the meaning, description and representation of the involved attributes
• Capture the semantics involved via domain characteristics
• Allow knowledge sharing and reuse
• Resolve resource model differences by mapping them to the global model of the ontology they represent
• Global interface

Ontology (cont.)
• Description includes
  – Attributes
  – Domain Rules
  – Functional Dependencies

Resource
• Desirable characteristics:
  – Add, modify and delete resources for an ontology dynamically without affecting the system's knowledge
  – Specify the sources in a manner such that one can declaratively query them
  – Since the number of resources is large, there is a need to identify the exact usefulness of resources from the query viewpoint and prune the others

Resource (cont.)
• Description includes
  – Attributes
  – Binding Patterns
  – Data Characteristics
  – Local Completeness

Relationships
• Simple relationships:
  – equals, less-than, like, is-a, is-part-of
• Are hierarchical or similarity based
• Complex relationships
  – “Earthquakes cause Tsunami”, “Nuclear explosions cause earthquakes”, “Air pollution affects vegetation”

Relationships (cont.)
• Characteristics
  – Involve multiple ontologies
  – Require understanding the semantics involved in their interaction
  – Cannot be expressed by simple relational and logical operators alone
  – Involve the use of complex operations like functions and simulations

Relationship
• Example
  – “Nuclear explosion causes Earthquakes”
    • NuclearTest Causes Earthquake: dateDifference(NuclearTest.eventDate, Earthquake.eventDate) < 30 AND distance(NuclearTest.latitude, NuclearTest.longitude, Earthquake.latitude, Earthquake.longitude) < 10000

Operations
• Functions, Simulations
• Functions
  – user defined
  – used to model the semantics involved in the relationships
  – used in post processing of result data
  – examples: distance, dateDifference
• Simulations
  – independent programs
  – used for post processing of result data
  – example: Clarke urban growth model

Information Scape (IScape)
• Representation of an information request across multiple domains
• Can be deployed and executed
• Sources are not explicitly specified as they are in a query
• System is aware of the sources and is able to identify the useful sources
• Semantic correlation across domains is embedded within the information request

Information Scape (cont.)
• Definition
  – An IScape may be defined as an information request over distributed heterogeneous sources of information, involving multiple ontologies and the relationships between them, that contains meta-information constructed to facilitate the bridging of semantic relationships between individual sources.

Information Scape (cont.)
• Ontologies
• Relationships
• Constraint
  – Conjunctive boolean expression
• Runtime configurable constraint
  – Conceptually different
• Grouping and group constraint
  – Similar to the HAVING clause in SQL
• Projection list

Learning Paradigm
• Study of a domain
• Use IScapes to study the domain interaction by using relationships
• Relationships could lead to transitive findings
• Explore hypothetical relationships to validate and establish them, or to invalidate them

Learning Paradigm (cont.)
• Data mining
  – Age and breast cancer
• Relationships
  – Nuclear Explosion causes Earthquakes
• Post processing
  – Functions
  – Simulations
  – Charting tool

Learning Paradigm (cont.)
• Find the earliest recorded Nuclear test conducted
• Plot a graph of the average number of Earthquakes of magnitude greater than 5.8 per year starting from 1900
• Find the average number of Earthquakes of magnitude greater than 5.8 between 1900-1949 and between 1950-present

Learning Paradigm (cont.)
• Find the average number of Earthquakes of magnitude greater than 7 between 1900-1949 and between 1950-present
• Find pairs of Nuclear tests and Earthquakes that occurred within a certain radius and a certain time period of the explosion

Visual Interfaces
• Knowledge Builder
• IScape Builder
• Web Interface
• IScape Processing Monitor

Knowledge Builder
• GUI to build the knowledge base
  – fast and easy to use
  – Manually creating the knowledge could be arduous and error prone
• Knowledge is stored in the standard XML format
• Abstraction from the underlying format and other technical details

Knowledge Builder (cont.)
• Assists in the creation, deletion and modification of the knowledge base
• Automatically creates a knowledge tree that assists in relating the knowledge in a better manner

Knowledge Builder: Knowledge Hierarchy

IScape Builder
• GUI to create, deploy and execute IScapes in a step-by-step manner
• IScapes are stored in XML format
• Abstracts the user from the underlying structure
• Validity checks implemented
• Integrated tools
  – the charting tool to plot charts with the result data

IScape Builder (cont.)

Web Interface
• Web accessible
  – Knowledge Base
  – Existing IScapes
• Set the runtime configurable constraint
• Execute existing IScapes
• View the tabulated results
• Cannot create new IScapes

Web Interface: Result Screen

IScape Processing Monitor
• Color coded log entries describing the IScape processing are generated
  – Brief message along with agent name
  – Time stamp
  – Detailed description and associated data, if any
  – IScape plan for the existing sources
  – Intermediate results
• High level debugging tool
  – Understand execution, locate failures
• Not available with the web interface

Monitor GUI

Related Work
• State of the art
  – SIMS, TSIMMIS, Information Manifold, Observer, Infosleuth
• Mainly focused on one-point access for querying of integrated data of a domain
• What makes ADEPT unique
  – Relationships, IScapes and the learning paradigm distinguish our system from any prior work

Future Work
• Support rules of type “if-then” and use of induction learning to speed up the processing
• Recursive query capability required
• IScape over IScape support required
• Simulations currently supported as specialized functions in our framework
• Statistical analysis tools like SAS for time series analysis, logistic regression
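
Appendix: sketch of the “Nuclear explosion causes Earthquakes” relationship

The relationship example from the Relationship slide combines two user-defined functions, dateDifference and distance, in a conjunctive constraint. The following is a minimal Python sketch of that idea, not ADEPT's actual implementation. The slides do not state units or sign conventions, so the sketch assumes dateDifference returns an absolute number of days and distance returns kilometres (computed here with the haversine formula). The `Event` class, the function names in snake_case, and `related_pairs` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for an ontology instance (NuclearTest or Earthquake)."""
    event_date: date
    latitude: float
    longitude: float


def date_difference(first: date, second: date) -> int:
    """Absolute difference in days (unit and sign convention are assumptions)."""
    return abs((second - first).days)


def distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres via the haversine formula (assumed unit)."""
    earth_radius_km = 6371.0
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * earth_radius_km * asin(min(1.0, sqrt(a)))


def causes(test: Event, quake: Event,
           max_days: int = 30, max_distance: float = 10000) -> bool:
    """Conjunctive constraint mirroring the slide: both conditions must hold."""
    return (
        date_difference(test.event_date, quake.event_date) < max_days
        and distance(test.latitude, test.longitude,
                     quake.latitude, quake.longitude) < max_distance
    )


def related_pairs(tests, quakes, **thresholds):
    """All (test, quake) pairs that satisfy the relationship."""
    return [(t, q) for t in tests for q in quakes if causes(t, q, **thresholds)]
```

Keeping the thresholds as parameters loosely mirrors the runtime configurable constraint of an IScape: the same relationship can be re-evaluated with a different radius or time window, as in the "find pairs of Nuclear tests and Earthquakes" request from the Learning Paradigm slides.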