Semantics-Enabled Industrial and Scientific Applications: Research, Technology and Deployed Applications
Keynote, First Online Metadata and Semantics Research Conference, http://www.metadata-semantics.org
Part I: Industrial Applications
November 23, 2005
Amit Sheth, CTO, Semagix Inc http://www.semagix.com
7/1/2016
2005 SEMAGIX All rights reserved.
Outline
I will drive the talk with applications. In the process, we will
review underlying processes, technologies and research
challenges.
Part I: Industrial Semantic Technology Applications in Risk
and Compliance
Part II: Health-care Semantic Web Application
Part III: Bioinformatics Semantic Web applications
Part I covers applications developed for Semagix’s customers using technology that commercialized research from the University of Georgia’s LSDIS lab.
Many slides have notes which provide additional material
and pointers to related documents/papers and talks for
further information.
Things to Consider About Semantic (Web) Technologies
• Build Ontology
  • Build Schema (model-level representation)
  • Populate with Knowledgebase (people, locations, organizations, events)
• Automatic Semantic Annotation (Extract Semantic Metadata)
  • Any type of document, multiple sources of documents
  • Metadata can be stored with or separately from documents
• Applications: search (semantic search: a ranked list of documents of interest), integration/portals, summarize/explain, analyze, make decisions
  • Reasoning techniques: graph analysis, inferencing
• Types of content/documents
• Use of standards
• Scalability
• Performance
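The first step above, building an ontology as a schema plus a populated knowledgebase, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Semagix Freedom stores ontologies: the class names `Schema` and `KnowledgeBase`, the relationship names and the example instances are all hypothetical. What it shows is that every populated assertion is checked against the schema's domain and range.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Model-level representation: entity classes and typed relationships."""
    classes: set[str] = field(default_factory=set)
    # relationship name -> (domain class, range class)
    relations: dict[str, tuple[str, str]] = field(default_factory=dict)


@dataclass
class KnowledgeBase:
    """Populated part of the ontology: instances plus assertions between them."""
    schema: Schema
    entities: dict[str, str] = field(default_factory=dict)  # instance -> class
    assertions: set[tuple[str, str, str]] = field(default_factory=set)

    def add_entity(self, name: str, cls: str) -> None:
        if cls not in self.schema.classes:
            raise ValueError(f"unknown class: {cls}")
        self.entities[name] = cls

    def add_assertion(self, subj: str, rel: str, obj: str) -> None:
        if rel not in self.schema.relations:
            raise ValueError(f"unknown relationship: {rel}")
        domain, rng = self.schema.relations[rel]
        # Reject assertions whose endpoints are unknown or of the wrong class.
        if self.entities.get(subj) != domain or self.entities.get(obj) != rng:
            raise ValueError(f"({subj}, {rel}, {obj}) violates {domain} -> {rng}")
        self.assertions.add((subj, rel, obj))


# Usage: define the schema first, then populate it (hypothetical example data).
schema = Schema(
    classes={"Person", "Organization", "Location"},
    relations={"works_for": ("Person", "Organization"),
               "located_in": ("Organization", "Location")},
)
kb = KnowledgeBase(schema)
kb.add_entity("Jane Doe", "Person")
kb.add_entity("Acme Corp", "Organization")
kb.add_assertion("Jane Doe", "works_for", "Acme Corp")
```

In this sketch, keeping the schema separate from the knowledgebase lets population from external sources be automated while the model-level definitions act as a fixed check on what may be asserted.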
Semantic (Web) Technology State of the Art
Ontology-driven Information System Lifecycle
Building a scalable, high-performance system with support for:
• Ontology creation and maintenance
• Ontology-driven semantic metadata extraction/annotation
• Utilizing semantic metadata and ontology
[Figure: lifecycle diagram with the stages Schema Creation, Ontology Population, Metadata Extraction, Application Creation, Analytic Application Creation, Semantic search/querying/browsing, Information and application integration (normalization), and Analysis/Mining/Discovery (relationships); components labeled Ontology API, KB, MB and BSBQ]
Types of Ontologies (or things close to ontologies)
• Upper ontologies: modeling of time, space, process, etc.
• Broad-based or general-purpose ontologies/nomenclatures: Cyc, WordNet
• Domain-specific or industry-specific ontologies
  • News: politics, sports, business, entertainment (also see TAP and SWETO)
  • Financial market
  • Terrorism
  • Biology: Open Biomedical Ontologies, GlycO, ProPreO
  • Clinical (see Open Clinical)
  • GO (nomenclature), NCI (schema), UMLS (knowledgebase), …
• Application-specific and task-specific ontologies
  • Anti-money laundering, NeedToKnow, (employee or vendor vetting)
  • Equity research
  • Repertoire management
Fundamentally different approaches to developing ontologies: schema vs. populated; community efforts vs. reusing knowledge sources
Evolution of Metadata
More sophisticated semantic technologies exploit ontologies, and they:
• Provide scalability and flexibility
• Handle all types of data (unstructured, semi-structured, structured)
• Create SmartData: enhance raw data with context and relationships
• Accommodate SmartQuerying: flexible, intelligent querying
• Enable powerful enterprise decision making
Automatic Semantic Metadata Extraction from Unstructured Data
Semagix Semantic Enhancement Engine
[Hammond, Sheth, Kochut 2002]
Semantic Annotation/Metadata Extraction + Enhancement
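At its simplest, ontology-driven annotation of unstructured text can be approximated by spotting names and aliases from the knowledgebase in the text and tagging each hit with its entity and class. The Python sketch below assumes a hypothetical `gazetteer` dictionary built from the knowledgebase; it is a plain dictionary lookup with longest-match preference, not the Semantic Enhancement Engine's actual algorithm, which would also have to disambiguate entities that share a name.

```python
import re


def annotate(text: str, gazetteer: dict[str, tuple[str, str]]) -> list[dict]:
    """Spot known knowledgebase entities in free text.

    gazetteer maps a surface form (name or alias) to (entity id, entity class).
    """
    if not gazetteer:
        return []
    # Longest forms first: regex alternation tries alternatives left to right,
    # so a full name wins over a shorter alias that starts at the same position.
    forms = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(f) for f in forms) + r")\b", re.IGNORECASE
    )
    by_lower = {f.lower(): ids for f, ids in gazetteer.items()}
    annotations = []
    for m in pattern.finditer(text):
        entity_id, cls = by_lower[m.group(0).lower()]
        annotations.append(
            {"start": m.start(), "end": m.end(), "entity": entity_id, "class": cls}
        )
    return annotations
```

Each annotation records character offsets, so the resulting metadata can be stored either with the document or separately from it.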
Automatic Semantic Annotation
COMTEX Tagging vs. Value-added Semagix Semantic Tagging
• COMTEX: limited tagging (mostly syntactic)
• Semagix: content ‘enhancement’ with rich semantic metatagging
Value-added relevant metatags added by Semagix to existing COMTEX tags:
• Private companies
• Type of company
• Industry affiliation
• Sector
• Exchange
• Company Execs
• Competitors
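The ‘enhancement’ step can be illustrated as a knowledgebase lookup: for every company already tagged in a document, attach further attributes that the ontology holds about it. In this sketch the tag layout, the `ENHANCEMENT_FIELDS` names and the knowledgebase entries are hypothetical; they mirror the metatag list above but are not COMTEX's or Semagix's actual formats.

```python
# Attributes assumed (hypothetically) to be stored per company in the knowledgebase.
ENHANCEMENT_FIELDS = ("type", "industry", "sector", "exchange", "executives", "competitors")


def enhance_tags(existing: dict[str, list[str]], kb: dict[str, dict]) -> dict[str, list[str]]:
    """Return a copy of a document's tags with knowledgebase-derived metatags added."""
    enhanced = {key: list(values) for key, values in existing.items()}
    for company in existing.get("company", []):
        facts = kb.get(company)
        if facts is None:
            continue  # unknown company: keep only the original (syntactic) tag
        for name in ENHANCEMENT_FIELDS:
            value = facts.get(name)
            if value is None:
                continue
            values = value if isinstance(value, list) else [value]
            bucket = enhanced.setdefault(name, [])
            for v in values:
                if v not in bucket:  # avoid duplicate tags across companies
                    bucket.append(v)
    return enhanced
```

The original tags are copied rather than modified, so the syntactic tags supplied with the feed remain available alongside the added semantic ones.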
Semagix Freedom Architecture for Building Ontology-driven Information Systems
Global Bank
Aim
• Legislation (USA PATRIOT Act) requires banks to identify ‘who’ they are doing business with
Problem
• Large volume of internal and external data that must be accessed
• Complex name-matching and disambiguation criteria
• Requirement to ‘risk score’ certain attributes of this data
Approach
• Creation of a ‘risk ontology’ populated from trusted sources (OFAC, etc.); sophisticated entity disambiguation
• Semantic querying; rules specification and processing
Solution
• Rapid and accurate KYC (Know Your Customer) checks
• Risk scoring of relationships, allowing results to be prioritised
• Full visibility of sources and their trustworthiness
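Complex name matching is one of the problems listed above. A minimal screening sketch in Python normalizes names (case, accents, punctuation, word order), compares them with `difflib.SequenceMatcher` from the standard library, and, when both sides have one, uses a date of birth to rule out same-sounding names. The 0.85 threshold, the record fields and the sample watch-list entries are assumptions for illustration; a production KYC system would likely need richer name-matching rules.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from typing import Optional


def normalize(name: str) -> str:
    """Lower-case, strip accents and punctuation, and sort tokens so word order is ignored."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    tokens = re.findall(r"[a-z0-9]+", ascii_name.lower())
    return " ".join(sorted(tokens))


def screen(name: str, watch_list: list[dict], dob: Optional[str] = None,
           threshold: float = 0.85) -> list[tuple[float, str]]:
    """Return (similarity, listed name) for watch-list entries that fuzzily match, best first."""
    key = normalize(name)
    hits = []
    for entry in watch_list:
        score = SequenceMatcher(None, key, normalize(entry["name"])).ratio()
        if score < threshold:
            continue
        # Disambiguation: a conflicting date of birth rules out a same-sounding name.
        listed_dob = entry.get("dob")
        if dob and listed_dob and dob != listed_dob:
            continue
        hits.append((round(score, 3), entry["name"]))
    return sorted(hits, reverse=True)
```

With a hypothetical watch list, `screen("Yasir, Ahmed", watch_list)` still finds an entry listed as "Ahmed Yaseer", because token sorting removes the word-order difference and the spelling variant stays above the threshold.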
The Process
Ahmed Yaseer:
• Appears on Watchlist ‘FBI’
• Works for Company ‘WorldCom’
• Member of organization ‘Hamas’
[Figure: relationship graph linking Ahmed Yaseer to the FBI Watchlist (appears on Watchlist), the Organization Hamas (member of organization) and the Company WorldCom (works for Company)]
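Relationships like those in this example are the kind of input that rules specification and processing operate on. As a hypothetical sketch, the facts can be stored as (subject, relationship, object) triples and a few hand-written rules forward-chained until nothing new is derived. The rule set, the relationship names and the `listed_orgs` parameter are assumptions, not Semagix's rule language.

```python
Triple = tuple[str, str, str]


def apply_rules(facts: set[Triple], listed_orgs: set[str]) -> set[Triple]:
    """Forward-chain three illustrative rules until no new triple is derived.

    R1: appears_on(X, L)                     -> flagged(X, L)
    R2: member_of(X, O), O in listed_orgs    -> flagged(X, O)
    R3: works_for(X, C), flagged(X, _)       -> linked_to_flagged(C, X)
    """
    derived = set(facts)
    while True:
        new: set[Triple] = set()
        for s, r, o in derived:
            if r == "appears_on" or (r == "member_of" and o in listed_orgs):
                new.add((s, "flagged", o))
        flagged = {s for s, r, _ in derived | new if r == "flagged"}
        for s, r, o in derived:
            if r == "works_for" and s in flagged:
                new.add((o, "linked_to_flagged", s))
        if new <= derived:  # fixpoint: nothing new was derived
            return derived
        derived |= new
```

Applied to the three facts on this slide with `listed_orgs={"Hamas"}`, the sketch flags the person twice (watch list and organization) and derives an indirect link for the employing company.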
Global Investment Bank
Establishing a new account
Sources: watch lists; law enforcement; regulators; public records; World Wide Web content; blogs and RSS
Data types: semi-structured government data, unstructured text, semi-structured data
• Users can navigate the ontology through a number of different interfaces
• The system scores the entity based on the content and entity relationships
Example of a fraud prevention application used in financial services
Law Enforcement Agency
Aim
• Provision of an overarching intelligence system that provides a unified view of people and related information
Problem
• Need to create unique entities from across multiple disparate, non-standardised databases; requirement to disambiguate ‘dirty’ data
• Need to extract insight from unstructured text
Approach
• Multiple database extractors to disambiguate data and form relevant relationships
• Modelling of behaviours/patterns within a very large ontology (6 million+ entities)
Solution
• Merged and linked case data from multiple sources using effective identification, disambiguation, and link analysis
• Dynamic annotation of documents
• Single query across multiple datasets
• 360-degree view of an individual and relevant associations
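Creating unique entities from several non-standardised databases is an entity-resolution problem. One common technique, shown here as a sketch rather than the agency system's actual method, links records that share a strong identifier and then merges the links transitively with a union-find structure. The field names (`ssn`, `license_no`, `name`, `dob`) and the sample records are hypothetical.

```python
def resolve_entities(records: list[dict],
                     id_fields: tuple[str, ...] = ("ssn", "license_no")) -> list[set[int]]:
    """Group record indices that refer to the same person across source databases.

    Records are linked if they share a value in any identifier field, or the same
    case/whitespace-normalized name plus date of birth. Union-find makes links transitive.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(j)] = find(i)

    first_seen: dict[tuple, int] = {}  # linking key -> first record that had it
    for idx, rec in enumerate(records):
        link_keys = [(f, rec[f]) for f in id_fields if rec.get(f)]
        if rec.get("name") and rec.get("dob"):
            name = " ".join(rec["name"].lower().split())
            link_keys.append(("name+dob", name, rec["dob"]))
        for key in link_keys:
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    groups: dict[int, set[int]] = {}
    for idx in range(len(records)):
        groups.setdefault(find(idx), set()).add(idx)
    return sorted(groups.values(), key=min)
```

Transitivity is what lets a record with only a licence number join a record found through name and date of birth; in practice, ‘dirty’ data would also call for fuzzier keys than exact matches.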
Profile Creation / Complex Querying / Summary of Results / Investigation
Example query: “Gisondi, white ford expedition, main street, assault, traffic offences”
Free-text searching across aggregated information sources
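Free-text searching across aggregated sources can be sketched as matching each comma-separated phrase of a query against records from every source, then ranking entities by how many distinct phrases they match. The source names, record layout and ranking rule below are assumptions for illustration; only the example query comes from the slide.

```python
def search(query: str, sources: dict[str, list[dict]]) -> list[tuple[str, int, list[str]]]:
    """Rank entities by the number of distinct query phrases found in records about them.

    sources maps a source name to records shaped like {"entity": ..., "text": ...}.
    Returns (entity, phrases matched, contributing sources), best match first.
    """
    phrases = [p.strip().lower() for p in query.split(",") if p.strip()]
    matched: dict[str, set[str]] = {}
    contributing: dict[str, set[str]] = {}
    for source, records in sources.items():
        for rec in records:
            text = rec["text"].lower()
            hits = {p for p in phrases if p in text}  # case-insensitive substring match
            if hits:
                matched.setdefault(rec["entity"], set()).update(hits)
                contributing.setdefault(rec["entity"], set()).add(source)
    ranked = sorted(matched, key=lambda e: (-len(matched[e]), e))
    return [(e, len(matched[e]), sorted(contributing[e])) for e in ranked]
```

Because matches are pooled per entity, a person who matches different phrases in different databases can outrank one who matches a single phrase, which is the benefit of searching the aggregated view rather than each source alone.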
Unified view of direct and indirect results that best match the complex query and the profile
• Direct and indirect relationship scoring driven by risk weightings
• Aggregated knowledge from disparate sources
• Annotation of known entities from within free text
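Relationship scoring driven by risk weightings can be illustrated by propagating risk along relationships: a directly linked risky entity counts in full, and each additional hop is discounted. In this hypothetical Python sketch the relationship weights, the `decay` factor and the hop limit are made-up parameters, and paths are found breadth-first (fewest hops), so it is one plausible scoring rule rather than the one used in the deployed system.

```python
from collections import deque


def relationship_risk(graph: dict[str, list[tuple[str, str]]], start: str,
                      base_risk: dict[str, float], weights: dict[str, float],
                      decay: float = 0.5, max_hops: int = 3) -> float:
    """Score `start` by the risky entities reachable from it.

    graph:     entity -> [(relationship, neighbour), ...]
    base_risk: entity -> intrinsic risk in [0, 1], e.g. 1.0 for a watch list
    weights:   relationship -> risk weighting in [0, 1]
    A reachable entity contributes base_risk * (product of weights along its
    breadth-first, i.e. fewest-hop, path) * decay ** (hops - 1).
    """
    reached = {start: (1.0, 0)}  # entity -> (path weight, hops from start)
    queue = deque([start])
    score = 0.0
    while queue:
        node = queue.popleft()
        path_weight, hops = reached[node]
        if hops == max_hops:
            continue
        for rel, nbr in graph.get(node, []):
            if nbr in reached:
                continue
            w = path_weight * weights.get(rel, 0.0)
            reached[nbr] = (w, hops + 1)
            queue.append(nbr)
            # nbr is hops + 1 away, so its discount is decay ** hops.
            score += base_risk.get(nbr, 0.0) * w * decay ** hops
    return score


# Hypothetical example based on the slide's relationships.
graph = {
    "WorldCom": [("employs", "Ahmed Yaseer")],
    "Ahmed Yaseer": [("appears_on", "FBI Watchlist"), ("member_of", "Hamas"),
                     ("works_for", "WorldCom")],
}
base_risk = {"FBI Watchlist": 1.0, "Hamas": 1.0}
weights = {"employs": 0.6, "works_for": 0.6, "appears_on": 1.0, "member_of": 0.9}
```

With these made-up weights, the person scores about 1.9 from two direct links, while the employing company scores about 0.57 through the same links one hop further away.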
Technical Capabilities
• Ontology-driven Information Systems
• Ontology Quality and Freshness: trusted knowledge sources, weekly to daily updates
• Populated Ontology Size: millions of assertions, sometimes exceeding 10 million
• Data Type and Amount: structured, semi-structured, unstructured
• Metadata Extraction: automatic extraction of semantic metadata
• Computation: query expressiveness (over metadata and ontology), rules, ranking
• Visualization
• Scalability and Performance: main-memory vs. database-based
http://www.semagix.com
QUESTIONS?
A relevant article: http://68.236.189.240/article/stoy-20050401-05.html
A relevant conference: