Presentation on Metadata

Metadata, Enabling techniques and technologies
A CSCI8350 lecture
Amit P. Sheth
Metadata, Enabling techniques and technologies
• What is Metadata?
• Metadata Descriptions and Standards
• Metadata Storage/Exchange/Infrastructure
• (Automated) Metadata Creation/Extraction/Tagging
• Metadata Usage/Applications
What is Metadata?
• Data about data
– Statements, contexts
– Recursive – data about “data about data”
• Applications
– Content management
– Cataloguing
– Information retrieval, search
–…
"A Web content repository without metadata is like a library without an
index," - Jack Jia, IWOV
Information Interoperability: key metadata objective and benefit
• System
• Syntax
• Structure
• Semantics
[Figure: Protocols, Metadata, Domain Modeling/Ontologies: a continuum from data to knowledge]
Types of Metadata for digital media
• Media type-specific metadata
– e.g., texture of images, font size…
• Media processing-specific metadata
– e.g., search, retrieval, personalized filtering
• Content Specific metadata
– e.g., rocket-related video and documents
Dublin Core Metadata Initiative
• Simple element set designed for resource description
• International, interdisciplinary, W3C community consensus
• “Semantic” interface among resource description communities (very limited form of semantics)
Source: www.desire.org
Dublin Core RDF
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.mysite.com/mydoc.html"
      dc:title="I've Never Metadata I've Never Liked"
      dc:creator="Mary Crystal"
      dc:subject="Metadata, Dublin Core, Stuff"/>
</rdf:RDF>
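A minimal sketch of how an application might read such a Dublin Core record, assuming it is serialized in standard RDF/XML with the Dublin Core element namespace `http://purl.org/dc/elements/1.1/`. It uses only Python's standard `xml.etree.ElementTree`; the function name `read_dublin_core` is hypothetical, and a real system would more likely use a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

RECORD = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.mysite.com/mydoc.html"
      dc:title="I've Never Metadata I've Never Liked"
      dc:creator="Mary Crystal"
      dc:subject="Metadata, Dublin Core, Stuff"/>
</rdf:RDF>"""


def read_dublin_core(xml_text):
    """Return {resource_uri: {dc_element: value}} for each rdf:Description."""
    root = ET.fromstring(xml_text)
    dc_prefix = "{" + DC_NS + "}"
    records = {}
    for desc in root.iter("{" + RDF_NS + "}Description"):
        uri = desc.get("{" + RDF_NS + "}about")
        fields = {}
        # Abbreviated syntax: Dublin Core properties written as attributes.
        for name, value in desc.attrib.items():
            if name.startswith(dc_prefix):
                fields[name[len(dc_prefix):]] = value
        # Expanded syntax: Dublin Core properties written as child elements.
        for child in desc:
            if child.tag.startswith(dc_prefix):
                fields[child.tag[len(dc_prefix):]] = (child.text or "").strip()
        records[uri] = fields
    return records


if __name__ == "__main__":
    print(read_dublin_core(RECORD))
```

Handling both the attribute and the child-element forms matters because RDF/XML allows either for the same statement.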
Metadata for Digital Data
Metadata | Data Type | Metadata Type
Q-Features [Jain and Hampapur] | Image, Video | Domain Specific
R-Features [Jain and Hampapur] | Image, Video | Domain Independent
Meta-Features [Jain and Hampapur] | Image, Video | Content Independent
Impression Vector [Kiyoki et al.] | Image | Content Descriptive
NDVI, Spatial Registration [Anderson and Stonebraker] | Image | Domain Specific
Speech Feature Index [Glavitsch et al.] | Audio | Direct Content Based
Topic Change Indices [Chen et al.] | Audio | Direct Content Based
Document Vectors [Deerwester et al.] | Text | Direct Content Based
Inverted Indices [Kahle and Medlar] | Text | Direct Content Based
Content Classification Metadata [Bohm and Rakow] | MultiMedia | Domain Specific
Document Composition Metadata [Bohm and Rakow] | MultiMedia | Domain Independent
Metadata Templates [Ordille and Miller] | Media Independent | Domain Specific
Land Cover, Relief [Sheth and Kashyap] | Media Independent | Domain Specific
Parent Child Relationships [Shklar et al.] | Text | Domain Independent
Contexts [Sciore et al., Kashyap and Sheth] | Structured | Domain Specific
Concepts from Cyc [Collet et al.] | Structured | Domain Specific
User’s Data Attributes [Shoens et al.] | Text, Structured | Domain Specific
Domain Specific Ontologies [Mena et al.] | Media Independent | Domain Specific
Sheth, Klas: Multimedia Data Management 1998
Multiple heterogeneous metadata models with different tag names for the same data in the same GIS domain
Kansas State
FGDC Metadata Model | UDK Metadata Model
Theme keywords: digital line graph, hydrography, transportation... | Search terms: digital line graph, hydrography, transportation...
Title: Dakota Aquifer | Topic: Dakota Aquifer
Online linkage: http://gisdasc.kgs.ukans.edu/dasc/ | Address Id: http://gisdasc.kgs.ukans.edu/dasc/
Direct Spatial Reference Method: Vector | Measuring Techniques: Vector
Horizontal Coordinate System Definition: Universal Transverse Mercator | Co-ordinate System: Universal Transverse Mercator
… … … ... | … … … ...
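One common way to reconcile such models is a crosswalk that maps each model's tag names onto a shared vocabulary. The sketch below is illustrative only: the FGDC and UDK tag names come from the example above, but the target names (`keywords`, `title`, ...) and the `to_common` helper are hypothetical.

```python
# FGDC/UDK tag names are from the example above; the shared target
# vocabulary ("keywords", "title", ...) is a hypothetical choice.
FGDC_TO_COMMON = {
    "Theme keywords": "keywords",
    "Title": "title",
    "Direct Spatial Reference Method": "spatial_reference_method",
    "Horizontal Coordinate System Definition": "coordinate_system",
}
UDK_TO_COMMON = {
    "Search terms": "keywords",
    "Topic": "title",
    "Measuring Techniques": "spatial_reference_method",
    "Co-ordinate System": "coordinate_system",
}


def to_common(record: dict, crosswalk: dict) -> dict:
    """Rename known tags to the shared vocabulary; unknown tags pass through."""
    return {crosswalk.get(tag, tag): value for tag, value in record.items()}


fgdc_record = {
    "Title": "Dakota Aquifer",
    "Direct Spatial Reference Method": "Vector",
    "Horizontal Coordinate System Definition": "Universal Transverse Mercator",
}
udk_record = {
    "Topic": "Dakota Aquifer",
    "Measuring Techniques": "Vector",
    "Co-ordinate System": "Universal Transverse Mercator",
}

if __name__ == "__main__":
    # Both records describe the same data, so they agree once translated.
    print(to_common(fgdc_record, FGDC_TO_COMMON))
    print(to_common(udk_record, UDK_TO_COMMON))
```

A flat name-to-name map only resolves syntactic differences; it cannot capture cases where one model's field splits or merges another's, which is where richer semantic mappings come in.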
Different views of Metadata
• Domain Independent Specifications (RDF)
• Frameworks/Infrastructures (XCM)
• Application Specific (ICE)
• Media Specific (MPEG7, VoiceXML)
• Domain Specific (NewsML, FGDC/UDK)
Creating and Serving Metadata to Power the Life-cycle of Content
[Figure: Taalee Infrastructure Services and Taalee Content Applications, built on the Taalee Semantic MetaBase and delivered over Broadcast, Wireline, Wireless and Interactive TV]
• Life-cycle stages: Produce, Aggregate, Catalog/Index, Integrate, Syndicate, Personalize, Interactive Marketing
• Questions metadata helps answer: Where is the content? Whose is it? What is this content about? What other content is it related to? What is the right content for this user? What is the best way to monetize this interaction?
Types of Specs and Standards (or MetaModels)
• Domain Independent: (MCF), RDF, MOF, DublinCore
• Media Specific: MPEG4, MPEG7, VoiceXML
• Domain/Industry Specific (metamodels): MARC (Library), FGDC
and UDK (Geographic), NewsML (News), PRISM (Publishing)
• Application Specific: ICE (Syndication)
• Exchange/Sharing: XCM, XMI
• Orthogonal/(Other): RDFS, namespaces, conceptual models (UML), ontologies (OWL)
What RDF can do for metadata
• Designed to impose structural constraints on syntax to support
consistent encoding, exchange and processing of metadata.
• A domain-independent metadata standard.
Metadata extraction from heterogeneous content/data
[Figure: sources (WWW and enterprise repositories; feeds/documents such as Nexis, UPI and AP; digital videos, audios, images and maps; data stores) feed EXTRACTORS, which produce METADATA]
Create/extract as much (semantic) metadata automatically as possible, from:
• Any format (HTML, XML, RDB, text, docs)
• Many media
• Push, pull
• Proprietary, Deep Web, Open Source
Alternatives for Metadata Extraction (from shallow to deeper understanding)
• Statistical methods/Cluster Analysis; Learning/AI and Collab. Filtering: by word or phrase
• Reference data/Concept-terms/Dictionary/Thesaurus: by topic/industry/subject/domain
• Ontologies/Domain Models, KnowledgeBase: by entities and relationships
Extracting a Text Document: Syntactic approach
INCIDENT MANAGEMENT SITUATION REPORT
Friday August 1, 1997 - 0530 MDT
NATIONAL PREPAREDNESS LEVEL II
CURRENT SITUATION: Alaska continues to experience large fire activity. Additional fires have been
staffed for structure protection.
SIMELS, Galena District, BLM. This fire is on the east side of the Innoko Flats, between Galena and McGr
The fire is active on the southern perimeter, which is burning into a continuous stand of black spruce. The
fire has increased in size, but was not mapped due to thick smoke. The slopover on the eastern perimeter is
35% contained, while protection of the historic cabin continues.
CHINIKLIK MOUNTAIN, Galena District, BLM. A Type II Incident Management Team (Wehking) is
assigned to the Chiniklik fire. The fire is contained. Major areas of heat have been mopped up. All crews and overhead will mop-up where the fire
burned beyond the meadows. No flare-ups occurred today. Demobilization is planned for this weekend,
depending on the results of infrared scanning.
Slide annotations: LAYOUT; extraction rule Date => day month int ‘,’ int
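The syntactic rule `Date => day month int ',' int` can be sketched as a regular expression. This is an illustration, not the extractor used in the original system; it assumes the first integer is a one- or two-digit day of the month and the second a four-digit year, and the names `DATE_RULE` and `extract_dates` are hypothetical.

```python
import re

DAYS = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Date => day month int ',' int
# Assumption: first int = 1-2 digit day of month, second int = 4-digit year.
DATE_RULE = re.compile(rf"\b({DAYS})\s+({MONTHS})\s+(\d{{1,2}}),\s*(\d{{4}})\b")


def extract_dates(text: str) -> list:
    """Return (weekday, month, day_of_month, year) for every match of the rule."""
    return [(day, month, int(dom), int(year))
            for day, month, dom, year in DATE_RULE.findall(text)]


if __name__ == "__main__":
    header = "INCIDENT MANAGEMENT SITUATION REPORT\nFriday August 1, 1997 - 0530 MDT"
    print(extract_dates(header))  # [('Friday', 'August', 1, 1997)]
```

Such rules rely purely on surface patterns: they find the date reliably in this report layout, but say nothing about what the date refers to.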
Organizing Information: Automatic Classification
Traditional Text Categorization
[Figure: a customer training set and statistical/AI techniques are used to classify an incoming article and place it in a taxonomy, which drives routing/distribution to the customer]
Example: Article Feed 4715 → Classification of Article 4715
Standard Metadata
Feed Source: iSyndicate
Posted Date: 11/20/2000
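As one concrete example of a "statistical/AI technique" trained on a customer training set, the sketch below implements multinomial naive Bayes with add-one smoothing. The slide does not say which algorithm was used; naive Bayes is swapped in here for illustration, and the tiny training set and category names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, training_set):
        # training_set: list of (text, category) pairs
        self.doc_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, category in training_set:
            self.doc_counts[category] += 1
            for word in tokenize(text):
                self.word_counts[category][word] += 1
                self.vocab.add(word)
        self.total_docs = sum(self.doc_counts.values())
        return self

    def classify(self, text):
        """Return the category with the highest log posterior score."""
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            score = math.log(n_docs / self.total_docs)  # log prior
            total_words = sum(self.word_counts[category].values())
            for word in tokenize(text):
                count = self.word_counts[category][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


# Hypothetical customer training set.
TRAINING_SET = [
    ("quarterly earnings beat analyst estimates", "Earnings"),
    ("company reports earnings and revenue growth", "Earnings"),
    ("executives hold conference call with analysts", "Conference Calls"),
    ("conference call scheduled to discuss results", "Conference Calls"),
]

if __name__ == "__main__":
    model = NaiveBayesCategorizer().fit(TRAINING_SET)
    print(model.classify("earnings rose on revenue"))
```

A purely statistical categorizer like this assigns a category but adds no domain facts, which is the gap the knowledge-base approach on the next slide targets.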
Taalee’s Categorization & Automatic Metadata Creation
[Figure: knowledge-base and statistical/AI techniques, using a Taalee training set and a customer training set, classify Article Feed 4715 and place it in a taxonomy; Automated Content Enrichment (ACE) then adds semantic metadata to the catalog metadata]
Article 4715 Metadata
Standard metadata:
Feed Source: iSyndicate
Posted Date: 11/20/2000
Semantic metadata:
Company Name: France Telecom, Equant
Ticker Symbol: FTE, ENT
Exchange: NYSE
Topic: Company News
Classification of Article 4715 (taxonomy nodes):
• FTE: Company Analysis, Conference Calls, Earnings, Stock Analysis
• ENT: Company Analysis, Conference Calls, Earnings, Stock Analysis
• NYSE: Member Companies, Market News, IPOs
Taalee Enterprise Content Manager / Customization Suite: precise syndication/filtering, routing/distribution, map to another taxonomy
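The enrichment step can be sketched as a knowledge-base lookup: once a company is recognized in the text, facts stored about it (ticker, exchange) are attached as semantic metadata alongside the standard metadata. This is a simplified illustration, not Taalee's ACE implementation; the knowledge-base entries mirror the Article 4715 example, while the `enrich` function, the plain substring matching, and the sample sentence are assumptions.

```python
# Hypothetical knowledge base; entries mirror the Article 4715 example.
KNOWLEDGE_BASE = {
    "France Telecom": {"ticker": "FTE", "exchange": "NYSE"},
    "Equant": {"ticker": "ENT", "exchange": "NYSE"},
}


def enrich(standard_metadata, text, kb=KNOWLEDGE_BASE):
    """Copy the standard metadata and add semantic metadata for every
    knowledge-base company whose name appears in the text (substring match)."""
    companies = [name for name in kb if name in text]
    enriched = dict(standard_metadata)
    if companies:
        enriched["Company Name"] = companies
        enriched["Ticker Symbol"] = [kb[c]["ticker"] for c in companies]
        enriched["Exchange"] = sorted({kb[c]["exchange"] for c in companies})
    return enriched


if __name__ == "__main__":
    standard = {"Feed Source": "iSyndicate", "Posted Date": "11/20/2000"}
    print(enrich(standard, "France Telecom raised its stake in Equant."))
```

The tickers and exchange never appear in the article text; they come from the knowledge base, which is what distinguishes enrichment from extraction alone.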
Automatic Categorization & Metadata Tagging (Taalee, Inc.)
Video Segment
with Associated Text
ABSOLUTE CONTROL OF THE SENATE IS
STILL IN QUESTION. AS OF TONIGHT, THE
REPUBLICANS HAVE 50 SENATE SEATS AND
THE DEMOCRATS 49. IN WASHINGTON STATE,
THE SENATE RACE REMAINS TOO CLOSE TO
CALL. IF THE DEMOCRATIC CHALLENGER
UNSEATS THE REPUBLICAN INCUMBENT, THE
SENATE WILL BE EVENLY DIVIDED. IN
MISSOURI, REPUBLICAN SENATOR JOHN
ASHCROFT SAYS HE WILL NOT CHALLENGE
HIS LOSS TO GOVERNOR MEL CARNAHAN
WHO DIED IN A CRASH THREE WEEKS AGO.
GOVERNOR CARNAHAN'S WIFE IS EXPECTED
TO TAKE HIS PLACE. IN THE HIGHEST PROFILE
SENATE EVENT OF THE NIGHT, HILLARY
CLINTON WON THE NEW YORK SENATE SEAT.
SHE IS THE FIRST FIRST LADY TO RUN MUCH
LESS WIN.
[Figure labels: Segment Description; Auto Categorization; Semantic Metadata]
Taalee Inc, 2000
Automatic Categorization & Metadata Tagging (Web page)
[Figure labels: Video with Editorialized Text on the Web; Auto Categorization; Semantic Metadata]
Automatic Categorization & Metadata Tagging (Feed)
[Figure labels: Text from Bloomberg; Auto Categorization; Semantic Metadata]
Taalee Extraction and Knowledgebase Enhancement
[Figure labels: Web Page; Extraction Agent; Enhanced Metadata Asset]
Taalee, Inc. 1999-2002
Sheth et al, 2002 Managing Semantic Content for the Web
Semantic Enhancement Server
The Semantic Enhancement Server classifies content into the appropriate topic/category (if not already pre-classified), and subsequently performs entity extraction and content enhancement with semantic metadata from the Semagix Freedom Ontology.
How does it work?
• Uses a hybrid of statistical, machine learning and knowledge-base techniques for classification
• Not only classifies, but also enhances semantic metadata with associated domain knowledge
© Semagix, Inc.
Ambiguity Resolution during Metadata Extraction from content text
[Figure labels: Document; Ontology lookup; Entity Candidate; SES]
1. Find entity candidates in the document:
 • Names and synonyms
 • Common variations (Jr, Sr, III, PLC, .com, etc.)
 • ...
 Note: Entity candidates can be restricted to a relevant subset of the ontology.
2. Were multiple matches found during entity lookup?
 • No: there is no ambiguity to resolve.
 • Yes: resolve ambiguities for the entity using any/all of these criteria:
 – Direct/indirect relationships with other entities found
 – Proximity analysis of related entities
 – Entity refinement using subset analysis (‘Doe’ vs. ‘John Doe’)
3. Once the ambiguity is resolved:
 • List relationships between identified entities in the same document (optional in output)
 • List relationship trails, e.g.
 – CompExec → position → CompanyName
 – Politician → party → country → watchList
Overcoming the key issue of resolving ambiguities in facts & evidence
• Aggregation and normalization of any type of fact and evidence into the domain ontology
– Resolution of issues over terminology
• e.g., “Benefit number” is an alias of “SSN”
– Resolution of issues over identity
• e.g., is executive “Larry Levy” an existing entity or a new entity?
– Enabling decisions to be made on the trustworthiness of existing facts
• Which source did the data originate from?
• How much supporting evidence was there?
– Validating and enforcing constraints, e.g. cardinality
• President of the United States (has cardinality) = Single
• Terrorist (has cardinality) = Multiple
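Cardinality enforcement can be sketched as a check over extracted (role, entity) assertions: a role declared "single" may have at most one distinct holder. The two constraints come from the slide; the function name and the placeholder entities are hypothetical, and time is ignored here (a real check would apply per time period).

```python
from collections import defaultdict

# Cardinality constraints from the slide.
CARDINALITY = {
    "President of the United States": "single",
    "Terrorist": "multiple",
}


def violations(assertions):
    """assertions: iterable of (role, entity) pairs.
    Return the roles whose 'single' cardinality is violated."""
    holders = defaultdict(set)
    for role, entity in assertions:
        holders[role].add(entity)
    return sorted(role for role, entities in holders.items()
                  if CARDINALITY.get(role) == "single" and len(entities) > 1)


if __name__ == "__main__":
    facts = [
        ("President of the United States", "Person A"),
        ("President of the United States", "Person B"),  # conflicting evidence
        ("Terrorist", "Person X"),
        ("Terrorist", "Person Y"),  # allowed: multiple
    ]
    print(violations(facts))
```

A violation does not say which assertion is wrong; it flags the conflict so that provenance and supporting evidence can be used to decide.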
Overcoming the key issue of resolving ambiguities in facts & evidence (contd.)
• Managing temporal aspects of the domain
– Expiration of entity instances
– E.g., “Hillary Clinton” is no longer the First Lady of the United States but was until “January 20, 2001”
• Providing auditing capabilities
– Stamping evidence with date, time and source
– E.g., Terrorist: “Seamus Monaghan”; date extracted: “2003-01-30”; time extracted: “16:45:27”; source: FBI Watch list
• Ontological relationships make for a more expressive model and provide better semantic description (compared to taxonomies)
– Information can be presented in natural language format
– E.g., “Bob Scott” is a founder member of business entity “AIX LLP” that has traded in “Iran” that is on “FATF watch-list”
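Temporal expiration and auditing can be sketched by stamping every fact with its source and extraction time and giving it an optional expiration date. The field names, the `Fact` class, and the sample values below are hypothetical; they illustrate the idea rather than any particular product's data model.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    source: str                         # auditing: where the evidence came from
    extracted_at: datetime              # auditing: when it was extracted
    valid_until: Optional[date] = None  # None means no known expiration

    def holds_on(self, day: date) -> bool:
        """Temporal check: an expired instance no longer holds after valid_until."""
        return self.valid_until is None or day <= self.valid_until


def current_facts(facts, day):
    """Keep only the facts that still hold on the given day."""
    return [f for f in facts if f.holds_on(day)]


if __name__ == "__main__":
    stamp = datetime(2003, 1, 30, 16, 45, 27)
    expired = Fact("Person X", "holdsRole", "Office Y", "Source Z", stamp,
                   valid_until=date(2001, 1, 20))
    ongoing = Fact("Person X", "memberOf", "Group W", "Source Z", stamp)
    print(current_facts([expired, ongoing], date(2002, 1, 1)))
```

Keeping expired facts (rather than deleting them) preserves the history that queries such as "was until" depend on.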
Example Scenario 1
Sample content text
Have you ever been to Athens?
How about Japan?
Ontology Matches:
- A: Athens[, Greece, Europe ]
- B: Athens[, Georgia, United States of America, North America ]
- C: Athens[, Ohio, United States of America, North America ]
- D: Athens[, Tennessee, United States of America, North America ]
- E: Japan[, Asia ]
Scores:
A, B, C, D and E all scored equally, so no ambiguity resolution is possible.
Example Scenario 2
Sample content text
Have you ever been to Athens?
Or anywhere else in Georgia?
How about Japan?
Ontology Matches:
- A: Athens[, Greece, Europe ]
- B: Athens[, Georgia, United States of America, North America ]
- C: Athens[, Ohio, United States of America, North America ]
- D: Athens[, Tennessee, United States of America, North America ]
- E: Georgia[, Asia ]
- F: Georgia[, United States of America, North America ]
- G: Georgia On My Mind, Inc.
- H: Japan[, Asia ]
Scores:
B and F scored highest because of an exact text match and the relationship between them (Athens, Georgia is located in Georgia, USA).
Result:
Entity Ambiguity Resolved
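The two scenarios can be reproduced with a small scoring sketch: each candidate gets one point for matching the text and one more for every direct "located in" link to a candidate of another mention. The slide does not give the actual scoring formula, so this rule, the mini-ontology, and the entity ids are assumptions chosen to mirror scenarios 1 and 2.

```python
# Hypothetical mini-ontology: id -> (label, located_in id or None).
ONTOLOGY = {
    "Athens_GR": ("Athens", "Greece"),
    "Athens_GA": ("Athens", "Georgia_US"),
    "Athens_OH": ("Athens", "Ohio"),
    "Athens_TN": ("Athens", "Tennessee"),
    "Georgia_Country": ("Georgia", "Asia"),
    "Georgia_US": ("Georgia", "USA"),
    "Georgia_On_My_Mind_Inc": ("Georgia On My Mind, Inc.", None),
    "Japan": ("Japan", "Asia"),
}


def candidates_for(mention):
    """Ontology entities whose label starts with the mention."""
    return [eid for eid, (label, _) in ONTOLOGY.items() if label.startswith(mention)]


def linked(a, b):
    """Direct located-in relationship in either direction."""
    return ONTOLOGY[a][1] == b or ONTOLOGY[b][1] == a


def resolve(mentions):
    """Map each mention to its unique top-scoring entity, or None on a tie."""
    cands = {m: candidates_for(m) for m in mentions}
    result = {}
    for mention, ids in cands.items():
        others = [e for other, es in cands.items() if other != mention for e in es]
        # Assumed scoring: 1 for the text match + 1 per link to another mention's candidate.
        scores = {e: 1 + sum(linked(e, o) for o in others) for e in ids}
        if not scores:
            result[mention] = None
            continue
        best = max(scores.values())
        winners = [e for e, s in scores.items() if s == best]
        result[mention] = winners[0] if len(winners) == 1 else None
    return result


if __name__ == "__main__":
    print(resolve(["Athens", "Japan"]))             # scenario 1: Athens stays ambiguous
    print(resolve(["Athens", "Georgia", "Japan"]))  # scenario 2: Athens_GA, Georgia_US
```

Siblings (Georgia the country and Japan, both in Asia) are deliberately not counted as related; only direct links add evidence in this sketch.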
Automatic Semantic Annotation of Text: Entity and Relationship Extraction
[Figure labels: KB, statistical and linguistic techniques; Semantic Enhancement Engine, 2002]
Metadata Extraction and Semantic Enhancement [Hammond, Sheth, Kochut 2002]
Automatic Semantic Annotation
COMTEX Tagging vs. Value-added Semagix Semantic Tagging
• COMTEX: limited tagging (mostly syntactic)
• Semagix content ‘enhancement’: rich semantic metatagging
Value-added relevant metatags added by Semagix to existing COMTEX tags:
• Private companies
• Type of company
• Industry affiliation
• Sector
• Exchange
• Company Execs
• Competitors
Metadata Usage: Keyword, Attribute and Content Based Access
Keyword Search vs Attribute Search with Semantic metadata
Metadata from typical cataloging of football assets:
Rich Media Reference Page
Baltimore 31, Pit 24
http://www.nfl.com
Virage search on "football touchdown":
• Brian Griese Interview Part Four: Brian Griese talks about the first touchdown he ever threw. URL: http://cbs.sportsline...
• Jimmy Smith Interview Part Seven: Jimmy Smith explains his philosophy on showboating. URL: http://cbs.sportsline...
Taalee metadata on football assets:
Qadry Ismail and Tony Banks hook up for their third long touchdown, this time on a 76-yarder to extend the Ravens’ lead to 31-24 in the third quarter.
League: Professional
Teams: Ravens, Steelers
Score: Bal 31, Pit 24
Players: Qadry Ismail, Tony Banks
Event: Touchdown
Produced by: NFL.com
Posted date: 2/02/2000
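The contrast between keyword and attribute search can be sketched over a tiny catalog. The descriptions are taken from the slide, but the data structure, the title of the highlight clip, and both search functions are hypothetical simplifications.

```python
# Descriptions are from the slide; titles and attribute sets are simplified,
# and the highlight's title is a hypothetical label.
ASSETS = [
    {
        "title": "Brian Griese Interview Part Four",
        "text": "Brian Griese talks about the first touchdown he ever threw.",
        "attrs": {},  # free text only, no structured metadata
    },
    {
        "title": "Jimmy Smith Interview Part Seven",
        "text": "Jimmy Smith explains his philosophy on showboating.",
        "attrs": {},
    },
    {
        "title": "Ravens-Steelers touchdown highlight",
        "text": "Third long touchdown, this time on a 76-yarder, "
                "to extend the lead to 31-24 in the third quarter.",
        "attrs": {"League": "Professional", "Teams": "Ravens, Steelers",
                  "Score": "Bal 31, Pit 24", "Event": "Touchdown"},
    },
]


def keyword_search(assets, *words):
    """Match assets whose free text contains every keyword (case-insensitive)."""
    return [a["title"] for a in assets
            if all(w.lower() in a["text"].lower() for w in words)]


def attribute_search(assets, **criteria):
    """Match assets whose structured metadata has every attribute=value pair."""
    return [a["title"] for a in assets
            if all(a["attrs"].get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    print(keyword_search(ASSETS, "touchdown"))  # also returns the interview
    print(attribute_search(ASSETS, Event="Touchdown", Teams="Ravens, Steelers"))
```

The keyword search matches any mention of the word, while the attribute search matches only assets whose metadata says the event is a touchdown. Structured metadata also allows sorting and filtering by any field.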
Taalee’s Semantic Search
• Highly customizable, precise and freshest A/V search
• Delightful, relevant information, exceptional targeting opportunity
• Context and domain-specific attributes
• Uniform metadata for content from multiple sources, sortable by any field
• Creating a Web of related information
What can a context do?
[Screenshots: Taalee Directory for "Georgia Bulldogs" (the system recognizes the ENTITY & CATEGORY) and Taalee Directory for "Careless Whisper" (semantic relationships)]
Metadata Application Example
Semantic applications for highly relevant and fresh content:
• Personalization
• Targeting/interactive marketing
Please contact Taalee for live demonstrations
Personalized Directory
[Screenshot: "Change Context" control with the prompt: "Obtain a whole universe of information (that you may not even have thought of) about some entities that have always been of interest to you. Please enter such semantic keywords below."]
Personalized Queries & Hot Topics
PERSONALIZATION: Personalized Queries
1. My Stock Portfolio: Microsoft suffers serious hack attack; Cisco Systems Inc; Analyst Safa Rashtchy on Yahoo!; PeopleSoft, Inc; AT&T Corp.; more…
2. My Football Fantasy Team: Gators' Spurrier ready for 'big' game; Tech's Vick looks to become complete QB; Bucs excited about Hamilton; Jasper Sanks rumbles into the end zone…; Edwards explains reasons for leaving BYU; more…
3. Julia Roberts Collection: Movie Trailer: "Notting Hill"; Trailer - Runaway Bride; Movie Trailer: "Stepmom"; Conspiracy Theory; more…
4. Pink Floyd Collection: Set the Controls for the Heart of the Sun…; Wish You Were Here; Round And Around; Keep Talking; The Post War Dream; more…
HOT Topics!!!
1. Election 2000: Video: Explaining the electoral map; Race for White House hots up; Seniors Give Gore Florida Edge; more…
2. Middle East Peace Conflict: More die as Israel steps up security; Israel braces for suicide bombs; Pentagon probes Cole's security; more…
3. Napster Controversy: The Brain Behind Napster; Napster Lawsuit; Creative Nomad II; more…
Metadata: Targeting
Semantic/Interactive Targeting
Buy Al Pacino Videos
Buy Russell Crowe Videos
Buy Christopher Plummer Videos
Buy Diane Venora Videos
Buy Philip Baker Hall Videos
Buy The Insider Video
Precisely targeted through the use of Structured Metadata and integration from multiple sources
Web: Extreme Personalization
[Figure: Realtime Feeds, Web Sites and Pages, and Content Databases flow through a Time-Shifted Content Aggregator into the Semantic Engine™, backed by a Structured, Hi-Quality Semantic Metabase; user Interests and Preferences drive delivery of Personalized Content]
Application of Semantic Metadata and Automatic Content Enrichment
[Screenshot: MyMedia wireless home page with the categories MyStocks, News, Sports and Music]
The user has already completed Web-based registration and personalization at Voquette’s Enterprise Customer site.
The user’s “Wireless Home page” shows the categories for his interests. There is an alert (new content) for his stock and sports categories.
Application of Semantic Metadata and Automatic Content Enrichment
[Screenshot: My Stocks on the MyMedia page, listing CSCO, NT, IBM and Market]
Clicking on MyStocks brings down the user’s Personal Portfolio list. The user wants to see news items about Cisco (see next slide).
The search at the bottom is a semantic search that understands the financial domain and knows the user’s portfolio. Typically, a search can be done by typing one word or selecting from a dynamic, personalized menu.
Application of Semantic Metadata and Automatic Content Enrichment
[Screenshot: My Stocks on the MyMedia page, with CSCO expanded into Analyst Call, Conf Call and Earnings; NT, IBM and Market also listed]
Different types of recent audio content about Cisco are available. The user clicks to see a listing of Analyst Calls on Cisco (next slide).
Icons at the bottom of the screen enable contextually relevant functions: listen, set alert on story, add to playlist.
Application of Semantic Metadata and Automatic Content Enrichment
[Screenshot: CSCO Analysis, listing Cisco Analyst Calls: 11/08 ON24 Payne; 11/07 ON24 H&Q CC; 11/06 CBS Langlesis]
Clicking on the link for Cisco Analyst Calls displays a listing sorted by date. Semantic filtering uses just the right metadata to meet screen and other constraints. E.g., the Analyst Call listing focuses on the source and the analyst name or company. The icons denote additional metadata, such as “Strong Buy” by an H&Q analyst.
iTV: Taalee’s Extreme Personalization
[Figure: a Content Provider (DBS, DISH, Wink, AOL-TV) supplies content and “programs”; the Semantic Engine™, backed by a Structured, Hi-Quality Semantic Metabase, produces Meta-Data Tagged Content; combined with immediate interests and preferences, this yields Personalized Content Capsules, Redirects and Programming]
Metadata for Automatic Content Enrichment
Interactive Television
This screen is customizable with interactivity features that use metadata, such as whether there is a new Conference Call video on CSCO.
Part of the screen can be automatically customized to show conference-call-specific information, including transcript, participation, etc., all of which are relevant metadata.
The Conference Call itself can have embedded metadata to support personalization and interactivity.
This segment has embedded or referenced metadata that is used by the personalization application to show only the stocks that the user is interested in.
Metadata in Enterprise Apps
[Figure: Collection (Sony Network Content, Affiliate Feeds, Public Sources) → Processing (Categorize, Catalog, Integrate) → Production Support, built on a Rich Data Metabase; uses: Filter, Search, Consolidate, Personalize, Archive, Licensing, Syndication]
Description
Produced by: CNN
Posted Date: 12/07/2000
Reporter: David Lewis
Event: Election 2000
Location: Tallahassee, Florida, USA
People: Al Gore
(1.33) – 12/06/00 - ABC
(2.53) - 12/06/00 - CBS
(5.16) - 12/06/00 - ABC
(2.46) - 12/06/00 - FOX
(1.33) - 12/06/00 - NBC
-- Breaking News -Gore Demands That Recount Restart
(5.33) - 12/06/00
(1.33) - 12/06/00 - CBS
(1.33) - 12/06/00 - ABC
Gore Says Fla. Can't Name Electors
(3.57) - 12/06/00 - CBS
(2.33) - 12/06/00 - CBS
Bush Meets Colin Powell at Ranch
(4.27) - 12/06/00 - ABC
(3.12) - 12/06/00 - NNS
Market Tumbles on Earnings Warning
(3.44) - 12/06/00 - FOX
(0.32) - 12/06/00 - CBS
Barak Outlines His Peace Plan
(1.33) - 12/06/00 - CBS
(7.24) - 12/06/00 - CBS
TALLAHASSEE, Florida (CNN) –
Though the two presidential candidates
have until noon Wednesday to file briefs in
Al Gore's appeal to the Florida Supreme
Court, the outcome of two trials set on the
same day in Leon County, Florida, may
offer Gore his best hope for the presidency.
Democrats in Seminole County are seeking
to have 15,000 absentee ballots thrown out
in that heavily Republican jurisdiction -- a
move that would give Gore a lead of up to
5,000 votes statewide.
Lawyers for the plaintiff, Harry Jacobs, claim
the ballots should be rejected because they
say County Elections Supervisor Sandra
Goard allowed Republican workers to fill out
voter identification numbers on 2,126
incomplete absentee ballot applications sent
in by GOP voters, while refusing to allow
Democratic workers to do the same thing for
Democratic voters.
The GOP says that suit, and one similar to it
from Martin County, demonstrates
Democratic Party politics at its most
desperate. Gore is not a party to either of
those lawsuits. On Tuesday, the judge in the
Metadata’s role in emerging iTV infrastructure
[Figure: Video → MPEG Encoder (create Scene Description Tree) → Enhanced Digital Cable (MPEG-2/4/7) → MPEG Decoder (retrieve Scene Description Track) → GREAT USER EXPERIENCE]
• Channel sales through Video Server Vendors, Video App Servers, and Broadcasters
• License metadata decoder and semantic applications to device makers
Scene Description Tree: Node = AVO Object. The Taalee Semantic Engine turns a node (e.g., “Cisco Systems”) into a metadata-rich, value-added node with an enhanced XML description (Object Content Information, OCI). Example OCI:
Produced by: Fox Sports
Creation Date: 12/05/2000
League: NFL
Teams: Seattle Seahawks, Atlanta Falcons
Players: Jon Kitna
Coaches: Mike Holmgren, Dan Reeves
Location: Atlanta
Ontology Design – Fundamental Principles
• There is no one correct way to model a domain: there are
always viable alternatives.
• The best solution almost always depends on the application
that you have in mind and the extensions that you
anticipate.
• Ontology development is necessarily an iterative process.
• Concepts in the ontology should be close to objects
(physical or logical) and relationships in your domain of
interest.
• These are most likely to be nouns (objects) or verbs
(relationships) in sentences that describe your domain.
Ontology Development 101: A Guide to Creating Your First Ontology
Natalya F. Noy and Deborah L. McGuinness
Semantic (Web) Technology – State of the Art
Semantic Technology – Key Features
• Design ontology schema
• Automatically populate the ontology with domain knowledge (at
enterprise scale)
• Maintain freshness of the ontology (almost) automatically
• Processing of heterogeneous information (structured, semi-structured and unstructured)
• Automatic Semantic Metadata Extraction using lexical,
statistical or NLP techniques*
• Automatic Semantic Metadata Extraction using populated
ontology (Knowledgebase approach)
• Logic based reasoning (inferencing)
• Graph/relationship traversal based reasoning
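Graph/relationship traversal based reasoning can be sketched as a breadth-first search that returns the shortest relationship trail between two entities, in the spirit of trails such as CompExec → position → CompanyName. The triples and relation names below are hypothetical placeholders, and traversal in both directions (with inverse labels) is a design assumption.

```python
from collections import deque

# Hypothetical triples (subject, relation, object) for illustration.
TRIPLES = [
    ("Exec A", "ceoOf", "Company B"),
    ("Company B", "tradesIn", "Country C"),
    ("Country C", "onList", "Watch List D"),
    ("Exec E", "ceoOf", "Company F"),
]


def relationship_trail(start, goal, triples=TRIPLES):
    """Breadth-first search over relationships in both directions.
    Returns the shortest trail as alternating entities and relations, or None."""
    adjacency = {}
    for s, r, o in triples:
        adjacency.setdefault(s, []).append((r, o))
        adjacency.setdefault(o, []).append((f"inverse({r})", s))
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [relation, neighbor])
    return None


if __name__ == "__main__":
    print(relationship_trail("Exec A", "Watch List D"))
    print(relationship_trail("Exec E", "Watch List D"))  # no connecting trail
```

Breadth-first order guarantees the first trail found is one of the shortest, which keeps explanations of "why these entities are connected" as direct as possible.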
Ontology-driven Information System Lifecycle
[Figure labels: Schema Creation; Ontology Population; KB; Metadata Extraction; MB; Ontology API; Analytic Application Creation; BSBQ Application Creation; Semantic Visualization]
Semagix Freedom Architecture: for building ontology-driven information systems
Sheth et al, 2002 Managing Semantic Content for the Web
Ontology Creation and Maintenance Steps
1. Ontology Model Creation (Description)
2. Knowledge Agent Creation
3. Automatic aggregation of Knowledge
4. Querying the Ontology
[Figure labels: Ontology; Semantic Query Server]