Semantics Enabled Industrial and Scientific Applications: Research, Technology and Deployed Applications

Keynote - the First Online Metadata and Semantics Research Conference
http://www.metadata-semantics.org
Part I: Industrial Applications
November 23, 2005
Amit Sheth, CTO, Semagix Inc
http://www.semagix.com
2005 SEMAGIX All rights reserved.

Outline

I will drive the talk with applications. In the process, we will review underlying processes, technologies and research challenges.
• Part I: Industrial Semantic Technology Applications in Risk and Compliance
• Part II: Health-care Semantic Web Application
• Part III: Bioinformatics Semantic Web Applications
Part I relates to applications developed for Semagix’s customers using a technology that commercialized research at the University of Georgia’s LSDIS lab. Many slides have notes which provide additional material and pointers to related documents, papers and talks for further information.

Things to Consider About the Semantic (Web) Technologies

Build Ontology
• Build Schema (model-level representation)
• Populate with Knowledgebase (people, locations, organizations, events)
Automatic Semantic Annotation (Extract Semantic Metadata)
• Any type of document, multiple sources of documents
• Metadata can be stored with or separately from documents
Applications: search (ranked list of documents of interest, i.e. semantic search), integrate/portal, summarize/explain, analyze, make decisions
• Reasoning techniques: graph analysis, inferencing
Types of content/documents
Use of standards
Scalability
Performance
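The build, populate and annotate steps listed above can be illustrated with a minimal sketch. This is not the Semagix implementation: the `Ontology` class, the `annotate` function and the exact-name matching strategy are assumptions made for illustration only. The annotations are returned as a separate list of (name, class, offset) tuples, which corresponds to storing metadata separately from the document.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical toy ontology: a schema (classes) plus a populated knowledge base."""
    classes: set = field(default_factory=set)        # schema level
    instances: dict = field(default_factory=dict)    # instance name -> class

    def add_class(self, name: str) -> None:
        self.classes.add(name)

    def add_instance(self, name: str, cls: str) -> None:
        # Population is constrained by the schema: the class must already exist.
        if cls not in self.classes:
            raise ValueError(f"unknown class: {cls}")
        self.instances[name] = cls


def annotate(text: str, ontology: Ontology) -> list:
    """Return stand-off annotations (name, class, start offset), ordered by offset.

    Assumption: entities are found by exact, whole-word name matching only.
    """
    found = []
    for name, cls in ontology.instances.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((name, cls, match.start()))
    return sorted(found, key=lambda item: item[2])


# Build the schema, then populate the knowledge base.
ontology = Ontology()
for cls in ("Person", "Organization"):
    ontology.add_class(cls)
ontology.add_instance("Amit Sheth", "Person")
ontology.add_instance("Semagix", "Organization")

# Extract semantic metadata from a piece of text.
annotations = annotate("Amit Sheth is CTO of Semagix.", ontology)
print(annotations)
```

A production system would also need name variants, disambiguation and relationships between instances; the sketch only shows how a populated ontology can drive annotation.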
Semantic (Web) Technology State of the Art

Ontology-driven Information System Lifecycle
Building a scalable and high-performance system with support for:
• Ontology creation and maintenance
• Ontology-driven semantic metadata extraction/annotation
• Utilizing semantic metadata and ontology
  • Semantic search/querying/browsing
  • Information and application integration: normalization
  • Analysis/mining/discovery: relationships
[Figure: lifecycle diagram. Labels: Schema Creation, Ontology Population, KB, Metadata Extraction, MB, Ontology API, BSBQ Application Creation, Analytic Application Creation]

Types of Ontologies (or things close to ontology)

• Upper ontologies: modeling of time, space, process, etc.
• Broad-based or general-purpose ontologies/nomenclatures: Cyc, WordNet
• Domain-specific or industry-specific ontologies
  • News: politics, sports, business, entertainment (also see TAP and SWETO)
  • Financial Market
  • Terrorism
  • Biology: Open Biomedical Ontologies, GlycO, PropeO
  • Clinical (see Open Clinical)
  • GO (nomenclature), NCI (schema), UMLS (knowledgebase), …
• Application-specific and task-specific ontologies
  • Anti-money laundering, NeedToKnow, (Employee or Vendor Vetting)
  • Equity Research
  • Repertoire Management
Fundamentally different approaches in developing ontologies: schema vs populated; community efforts vs reusing knowledge sources

Evolution of Metadata

More sophisticated semantic technologies exploit ontologies and
• Provide scalability and flexibility
• Handle all types of data (unstructured, semi-structured, structured)
• Create SmartData: enhancing raw data with context and relationships
• Accommodate SmartQuerying: flexible, intelligent querying
• Enable powerful enterprise decision making

Automatic Semantic Metadata Extraction from Unstructured Data

Semagix Semantic Enhancement Engine [Hammond, Sheth, Kochut 2002]
Semantic Annotation/Metadata Extraction + Enhancement

Automatic Semantic Annotation

• COMTEX Tagging: limited tagging (mostly syntactic)
• Value-added Semagix Semantic Tagging: content ‘enhancement’, rich semantic metatagging
Value-added relevant metatags added by Semagix to existing COMTEX tags:
• Private companies
• Type of company
• Industry affiliation
• Sector
• Exchange
• Company execs
• Competitors

Semagix Freedom Architecture for Building Ontology-driven Information Systems

Global Bank

Aim
• Legislation (PATRIOT Act) requires banks to identify ‘who’ they are doing business with
Problem
• Volume of internal and external data needed to be accessed
• Complex name matching and disambiguation criteria
• Requirement to ‘risk score’ certain attributes of this data
Approach
• Creation of a ‘risk ontology’ populated from trusted sources (OFAC etc.); sophisticated entity disambiguation
• Semantic querying, rules specification and processing
Solution
• Rapid and accurate KYC checks
• Risk scoring of relationships allowing for prioritisation of results
• Full visibility of sources and trustworthiness

The Process

Ahmed Yaseer:
• Appears on Watchlist ‘FBI’
• Works for Company ‘WorldCom’
• Member of organization ‘Hamas’
[Figure: relationship graph. Ahmed Yaseer → appears on Watchlist → FBI Watchlist; Ahmed Yaseer → member of organization → Hamas (Organization); Ahmed Yaseer → works for Company → WorldCom (Company)]
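The relationship-based risk scoring shown in The Process can be sketched as a weighted sum over an entity's relationships. The three triples come from the slide; the weights, the `PREDICATE_WEIGHTS` table and the `risk_score` function are hypothetical and chosen only for illustration, not taken from the deployed system.

```python
# Relationships from the slide, written as (subject, predicate, object) triples.
TRIPLES = [
    ("Ahmed Yaseer", "appears on watchlist", "FBI Watchlist"),
    ("Ahmed Yaseer", "member of organization", "Hamas"),
    ("Ahmed Yaseer", "works for company", "WorldCom"),
]

# Hypothetical risk weights per relationship type (integers to keep sums exact).
PREDICATE_WEIGHTS = {
    "appears on watchlist": 60,
    "member of organization": 30,
    "works for company": 5,
}


def risk_score(entity, triples, weights):
    """Return (score, evidence) for an entity.

    The evidence lists every relationship that contributed to the score,
    so that the sources behind a result stay visible to the analyst.
    """
    evidence = [
        (predicate, obj, weights.get(predicate, 0))
        for subject, predicate, obj in triples
        if subject == entity
    ]
    score = sum(weight for _, _, weight in evidence)
    return score, evidence


score, evidence = risk_score("Ahmed Yaseer", TRIPLES, PREDICATE_WEIGHTS)
print(score)
for predicate, obj, weight in evidence:
    print(f"{predicate} -> {obj}: {weight}")
```

Sorting entities by such a score gives the prioritisation of results mentioned for the Global Bank solution. A realistic scorer would also weight the object of each relationship (which organization, which watch list) and follow indirect links; this sketch scores direct relationships only.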
Global Investment Bank

Example of a fraud prevention application used in financial services
[Figure: data sources feeding the application]
• Sources: watch lists, law enforcement, regulators, public records, World Wide Web content, blogs, RSS
• Data types: semi-structured government data; unstructured text; semi-structured data
Establishing New Account
• User will be able to navigate the ontology using a number of different interfaces
• Scores the entity based on the content and entity relationships

Law Enforcement Agency

Aim
• Provision of an overarching intelligence system that provides a unified view of people and related information
Problem
• Need to create unique entities from across multiple disparate, non-standardised databases; requirement to disambiguate ‘dirty’ data
• Need to extract insight from unstructured text
Approach
• Multiple database extractors to disambiguate data and form relevant relationships
• Modelling of behaviours/patterns within a very large ontology (6 million+ entities)
Solution
• Merged and linked case data from multiple sources using effective identification, disambiguation, and link analysis
• Dynamic annotation of documents
• Single query across multiple datasets
• 360-degree view of an individual and relevant associations

Profile Creation, Complex Querying, Summary of Results, Investigation

Gisondi, white ford expedition, main street, assault, traffic offences
Free text searching across aggregated information sources

Profile Creation, Complex Querying, Summary of Results, Investigation

Unified view of direct and indirect results that best match the complex query and the profile
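The merging of records from disparate, non-standardised databases described for the law enforcement system can be sketched as follows. This is a simplified illustration, not the deployed method: the record layout, the sample records, the `normalize_name` and `merge_records` helpers, and the choice of (normalized name, date of birth) as the matching key are all assumptions.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Reduce a 'dirty' name to a canonical form.

    Lower-cases the name, drops punctuation and extra whitespace, and sorts the
    tokens so that 'Smith, John' and 'John  SMITH' compare equal.
    """
    tokens = re.findall(r"[a-z]+", name.lower())
    return " ".join(sorted(tokens))


def merge_records(records):
    """Group records that appear to describe the same person.

    Assumption: two records match when their normalized name and date of
    birth are identical. Each merged entity keeps its name variants and the
    databases it was found in.
    """
    merged = defaultdict(lambda: {"names": set(), "sources": set()})
    for record in records:
        key = (normalize_name(record["name"]), record["dob"])
        merged[key]["names"].add(record["name"])
        merged[key]["sources"].add(record["source"])
    return dict(merged)


# Hypothetical records from two source databases.
records = [
    {"name": "Smith, John", "dob": "1970-01-01", "source": "db_a"},
    {"name": "John  SMITH", "dob": "1970-01-01", "source": "db_b"},
    {"name": "John Smith", "dob": "1982-05-05", "source": "db_b"},
]

entities = merge_records(records)
for key, entity in entities.items():
    print(key, sorted(entity["sources"]))
```

Exact-key matching keeps the example short. Real disambiguation of dirty data would typically add fuzzy name matching and use further attributes and relationships as evidence.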
Profile Creation, Complex Querying, Summary of Results, Investigation

• Direct and indirect relationship scoring driven by risk weightings
• Aggregated knowledge from disparate sources
• Annotation of known entities from within free text

Technical Capabilities

Ontology-driven Information Systems
• Ontology quality and freshness: trusted knowledge sources, weekly to daily update
• Populated ontology size: millions of assertions, sometimes exceeding 10 million
• Data type and amount: structured, semi-structured, unstructured
• Metadata extraction: automatic extraction of semantic metadata
• Computation: query expressiveness (over metadata and ontology), rules, ranking
• Visualization
• Scalability and performance: main-memory vs database based

QUESTIONS?

http://www.semagix.com
A relevant article: http://68.236.189.240/article/stoy-20050401-05.html
A relevant conference: