Multimedia Semantic Web

The Multimedia Semantic Web
Bill Grosky
Multimedia Information Systems Laboratory
University of Michigan-Dearborn
Dearborn, Michigan
Contents

Introduction





CBR – Where are we?
Multimedia annotation
Context-rich environments
Semantic web
Our work






Anglograms
Finding latent semantics
Using text for improved image search
Using images for improved text search
Web page structure
A cross-modal theory of linked document semantics
CBR – Where are We?


Development of feature-based techniques for
content-based retrieval is a mature area, at least
for images
CBR researchers should now concentrate on
extracting semantics from multimedia
documents so that retrievals using concept-based
queries can be tailored to individual users


The semantic gap
(Semi)-automated multimedia annotation
Multimedia Annotation

Multimedia annotations should be
semantically rich



Multiple semantics
A social theory based on how multimedia
information is used
This can be discovered by placing
multimedia information in a natural,
context-rich environment
Context-Rich Environments

Structural context – Author’s contribution



Document’s author places semantically
similar pieces of information close to each
other
User can cluster together semantically similar
pieces of information
Dynamic context – User’s contribution

Short browsing sub-paths are semantically
coherent
Context-Rich Environments


The WEB is a perfect example of a
context-rich environment
Develop multimedia annotations through
cross-modal techniques




Audio
Images
Text
Video
Semantic Web



This program overlaps another very important
current research topic, the semantic web
Web page annotations are the backbone of this
research effort
We have something very important to offer to
this area



Multimedia documents
Deriving multiple semantics for a single document
Combining our efforts will enrich both
communities
Semantic Web

“The Semantic Web is a new initiative to
transform the web into a structure that supports
more intelligent querying and browsing, both by
machines and by humans. This transformation is
to be supported through the generation and use
of metadata constructed via web annotation
tools using user-defined ontologies that can be
related to one another.”
Somewhere on the web
Semantic Web
[Figure: Semantic Web architecture, based on www.semanticweb.org. Components: End User, Community Portal, Agents, Inference Engine, Ontologies, Ontology Construction Tool, Ontology Articulation Toolkit, Web-Page Annotation Tool, Annotated Web Pages, Metadata Repository]
Semantic Web

Plan a vacation within the next month

Bill instructed his semantic web agent through
his handheld browser.

An agent retrieved Bill’s vacation profile from his
travel agent, retrieved Bill’s availability from his
calendar, checked availability of airlines, hotels
and restaurants, and made all the necessary
arrangements.
Semantic Web

Multimedia semantic web

Plan a vacation close to where [image]
is being exhibited.
Anglograms

Image object


Entire image
Some meaningful portion of an image


semcon
Point-based features


corner points
color histograms
Anglograms
[Figure: point feature map for shape]
Anglograms
[Figure: point feature map for color]
Anglograms
[Figure: Voronoi diagram of n = 18 sites]
Anglograms
[Figure: Delaunay triangulation of n = 18 sites, the dual graph of the Voronoi diagram]
Anglograms

Delaunay triangulation of a set of n points


O(n log n) algorithm
Invariance of Delaunay triangles of a set
of points to



translation
rotation
scaling
Anglograms

Spatial layout of point set

Anglogram

Computed by discretizing and counting the angles
of the Delaunay triangles


Which angles are counted?
O(max(n, #bins)) algorithm

What is bin size?
A set of 26 points
Delaunay triangulations of the point set and its
two transformed variants
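
The anglogram computation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses a brute-force empty-circumcircle test (roughly O(n^4)) rather than the O(n log n) algorithm named on the slide, it assumes distinct points in general position (no four points on a common circle), and the function names, the 5° bin size, and the choice of counting the two largest angles per triangle are assumptions borrowed from the experiments later in the deck.

```python
import math
from itertools import combinations


def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, d):
    """True if d lies strictly inside the circumcircle of ccw triangle abc."""
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0


def delaunay_triangles(points):
    """Brute force: keep every triangle whose circumcircle contains no other point."""
    triangles = []
    for a, b, c in combinations(points, 3):
        o = orient(a, b, c)
        if o == 0:
            continue  # collinear triple, not a triangle
        if o < 0:
            b, c = c, b  # make the triangle counter-clockwise
        others = (d for d in points if d not in (a, b, c))
        if not any(in_circumcircle(a, b, c, d) for d in others):
            triangles.append((a, b, c))
    return triangles


def interior_angles(a, b, c):
    """Interior angles of triangle abc, in degrees."""
    angles = []
    for p, q, r in ((a, b, c), (b, c, a), (c, a, b)):
        ux, uy = q[0] - p[0], q[1] - p[1]
        vx, vy = r[0] - p[0], r[1] - p[1]
        cross = ux * vy - uy * vx
        dot = ux * vx + uy * vy
        angles.append(math.degrees(math.atan2(abs(cross), dot)))
    return angles


def anglogram(points, bin_size=5.0, largest=2):
    """Histogram of the `largest` biggest angles of each Delaunay triangle."""
    n_bins = int(math.ceil(180.0 / bin_size))
    hist = [0] * n_bins
    for tri in delaunay_triangles(points):
        for angle in sorted(interior_angles(*tri), reverse=True)[:largest]:
            hist[min(int(angle // bin_size), n_bins - 1)] += 1
    return hist


points = [(0, 0), (5, 0), (1, 3), (6, 4)]
# Rotate by 90 degrees, scale by 2, and translate: the anglogram is unchanged.
moved = [(10 - 2 * y, 7 + 2 * x) for x, y in points]
print(anglogram(points) == anglogram(moved))
```

Because only angles are counted, the histogram inherits the translation, rotation, and scaling invariance of the Delaunay triangles, which the last lines check on a four-point example.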
Anglograms

Computation of color anglogram of an
image


Divide image evenly into a number of M*N
non-overlapping blocks
Each individual block is abstracted as a
unique feature point labeled with its spatial
location and dominant colors
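
The block abstraction step can be illustrated as follows. This is a sketch under stated assumptions: the image is given as one channel (for example hue or saturation) in a 2-D list with values in [0, 1), the block's quantized average stands in for its "dominant color", and the name `block_feature_points` is hypothetical. Block centers are placed on a unit grid, so neighboring feature points are already at a fixed distance.

```python
def block_feature_points(channel, m, n, levels):
    """Group block centers by the quantized mean value of each block.

    channel: 2-D list of values in [0, 1); m x n: number of blocks
    (rows x columns); levels: number of quantization levels.
    Returns {level: [(x, y), ...]} with block centers on a unit grid.
    """
    height, width = len(channel), len(channel[0])
    block_h, block_w = height // m, width // n
    points = {}
    for r in range(m):
        for c in range(n):
            block = [channel[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            mean = sum(block) / len(block)
            level = min(int(mean * levels), levels - 1)
            points.setdefault(level, []).append((c + 0.5, r + 0.5))
    return points


# Toy 4x4 channel: top half dark (0.15), bottom half bright (0.85).
channel = [[0.15] * 4, [0.15] * 4, [0.85] * 4, [0.85] * 4]
print(block_feature_points(channel, 2, 2, 10))
```

Each list of points sharing a level is one point set to be triangulated separately.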
Anglograms

Computation of color anglogram of an
image

Point feature map


Normalized feature points, after adjusting any two
neighboring feature points to a fixed distance
Construct Delaunay triangulation for each set
of feature points labeled with identical color
Anglograms

Computation of color anglogram of an
image


Compute anglogram based on each Delaunay
triangulation
Color anglogram for image

Concatenating all the anglograms together
Anglograms
Pyramid image
Anglograms
Anglograms
Hue component
Anglograms
Saturation component
Anglograms
Point feature map
Anglograms
Feature points of
hue 2
Anglograms
Delaunay triangulation
of hue 2
Anglograms
Delaunay triangulation
of saturation 5
Anglograms
[Figure: anglogram of saturation 5. x-axis: bin number (1 to 19); y-axis: number of angles (0 to 60)]
Finding Latent Semantics


We want to transform low-level features to
a higher level of meaning
Used for dimension reduction in QBIC


Searching in high-dimensional spaces
More importantly, it creates clusters of co-occurring features

So-called concepts
Finding Latent Semantics


Latent Semantic Analysis (LSA) was introduced
to overcome a fundamental problem in textual
information retrieval
Users want to retrieve on the basis of
conceptual content

Individual words provide unreliable evidence about
conceptual meanings

Synonymy


Many ways to refer to the same object
Polysemy

Most words have more than one distinct meaning
Finding Latent Semantics

Searching for documents concerning
automobiles



Tend to use the keyword automobile
A statistical analysis determines that the keywords automobile and car tend to co-occur
LSA will then also retrieve documents in which the keyword car appears but the keyword
automobile does not
Finding Latent Semantics

Term-document association




It is assumed that there exists some underlying latent
semantic structure in the data that is partially obscured by
the randomness of term choice
By semantic structure we mean the correlation structure in
which individual terms appear in documents
Semantic implies only the fact that terms in a document
may be taken as referents to the document itself or to its
topic
Statistical techniques are used to estimate this latent
semantic structure, and to get rid of obscuring noise
Finding Latent Semantics

Singular-value decomposition (SVD)






Take a large matrix of term-document association
Construct a semantic space wherein terms and documents that
are closely associated are placed near to each other
SVD allows the arrangement of the space to reflect the major
associative patterns and ignore smaller, less important influences
As a result, terms that did not actually appear in a document
may still end up close to the document, if that is consistent with
the major patterns of association
Position in the space serves as the semantic indexing
Retrieval proceeds by using the terms in a query to identify a
point in the semantic space, and documents in its neighborhood
are returned as relevant results
Finding Latent Semantics

Term-document matrix



d documents
t terms
Represented by a $t \times d$ term-document matrix $A$

Each document is represented by a column


document vector
Each term is represented by a row

term vector
Finding Latent Semantics
The terms (t = 6)
t1: bak(e,ing)
t2: recipes
t3: bread
t4: cake
t5: pastr(y,ies)
t6: pie
The document titles (d = 5)
d1: How to Bake Bread Without Recipes
d2: The Classic Art of Viennese Pastry
d3: Numerical Recipes: The Art of Scientific Computing
d4: Breads, Pastries, Pies and Cakes: Quantity Baking Recipes
d5: Pastry: A Book of Best French Recipes
Finding Latent Semantics
10010 
10111 


10010 
  

00010 
01011 


00010 
0.5774
0.5774

0.5774
A
 0
 0

 0
0 0 0.4082
0 1 0.4082
0 0 0.4082
0 0 0.4082
1 0 0.4082
0 0 0.4082

0.7071
0 

0 
0.7071

0 
0
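
The construction of this term-document matrix from the titles can be reproduced with a short script. It is a minimal sketch: matching terms by lowercase substring on the stems ("bak", "pastr", and so on) is an assumption that happens to work for these five titles, and the function names are illustrative.

```python
import math

# Stems for t1..t6 and titles for d1..d5, taken from the slide.
stems = ["bak", "recipe", "bread", "cake", "pastr", "pie"]
titles = [
    "How to Bake Bread Without Recipes",
    "The Classic Art of Viennese Pastry",
    "Numerical Recipes: The Art of Scientific Computing",
    "Breads, Pastries, Pies and Cakes: Quantity Baking Recipes",
    "Pastry: A Book of Best French Recipes",
]


def term_document_matrix(stems, titles):
    """Rows are terms, columns are documents; 1 if the stem occurs in the title."""
    return [[1 if stem in title.lower() else 0 for title in titles]
            for stem in stems]


def normalize_columns(matrix):
    """Scale every column (document vector) to unit Euclidean length."""
    n_cols = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    return [[row[j] / norms[j] if norms[j] else 0.0 for j in range(n_cols)]
            for row in matrix]


raw = term_document_matrix(stems, titles)
A = normalize_columns(raw)
for row in A:
    print(" ".join(f"{value:.4f}" for value in row))
```

The printed rows contain the values 0.5774, 0.4082, and 0.7071 shown on the slide, which are 1/√3, 1/√6, and 1/√2 for documents with three, six, and two matching terms.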
Finding Latent Semantics

SVD is a dimension reduction technique



Reduced-rank approximation to both column
space and row space
Find a rank-k approximation to matrix A with
minimal change to that matrix for a given
value of k
This decomposition exists for any matrix A
Finding Latent Semantics

SVD of a term-document matrix $A$

$A = U \Sigma V^T$

$A$ is $t \times d$
$U$ is a $t \times r$ matrix with orthonormal columns, where $r$ is rank($A$)

The columns of $U$ are a basis for the column space of $A$
$U$ is the matrix of eigenvectors of the matrix $A A^T$
$\Sigma$ is an $r \times r$ diagonal matrix having the singular values
$\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r$ of $A$ in order along its diagonal
$\Sigma^2$ is the matrix of (nonzero) eigenvalues of $A A^T$ or $A^T A$
$V^T$ is an $r \times d$ matrix with orthonormal rows

The rows of $V^T$ are a basis for the row space of $A$
$V$ is the matrix of eigenvectors of the matrix $A^T A$
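Finding Latent Semantics
[Figure: shapes of the SVD factors: $A$ ($t \times d$) = $U$ ($t \times r$) · $\Sigma$ ($r \times r$) · $V^T$ ($r \times d$)]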
Finding Latent Semantics

A special rank-k approximation, $A_k$

$A_k = U_k \Sigma_k V_k^T$
$U_k$: first k columns of $U$
$\Sigma_k$: first k diagonal values of $\Sigma$
$V_k^T$: first k rows of $V^T$
Finding Latent Semantics
0.5774
0.5774

0.5774
A
 0
 0

 0
0 0 0.4082
0 1 0.4082
0 0 0.4082
0 0 0.4082
1 0 0.4082
0 0 0.4082

0.7071
0 

0 
0.7071

0 
0
0
0
0
1.6950
 0
1.1158
0
0

 0
0
0.8403
0

0
0
0.4195
 0
 0
1
0
0

0
0
0
 0
0.2670
0.7479

0.2670
U 
0.1182
0.5198

0.1182
 0.2567
0.5308
 0.3981  0.5249
 0.2847  0.7071
0.0816
 0.2567
 0.0127
0.5308
0.2774
 0.2847
0.6394
0.8423
0.0838
 0.1158
 0.0127
0.2774
0.6394
0
0.4366  0.4717 0.3688
0
0.3067 0.7549
0.0998

0


0 V  0.4412  0.3568  0.6247


0
0.4909  0.0346 0.5711

0.5288 0.2815  0.3712
0

0
0 
0.7071
0 

0
 0.7071
0
0 

0
0.7071 
0
 0.6715
0

 0.2760  0.5000
0.1945  0.5000

0.6571
0

 0.0577 0.7071 
Finding Latent Semantics

Reduce the rank to 3

$$
A = \begin{pmatrix}
0.5774 & 0 & 0 & 0.4082 & 0 \\
0.5774 & 0 & 1 & 0.4082 & 0.7071 \\
0.5774 & 0 & 0 & 0.4082 & 0 \\
0 & 0 & 0 & 0.4082 & 0 \\
0 & 1 & 0 & 0.4082 & 0.7071 \\
0 & 0 & 0 & 0.4082 & 0
\end{pmatrix}
$$

$$
A_3 = \begin{pmatrix}
0.4971 & -0.0330 & 0.0232 & 0.4867 & -0.0069 \\
0.6003 & 0.0094 & 0.9933 & 0.3858 & 0.7091 \\
0.4971 & -0.0330 & 0.0232 & 0.4867 & -0.0069 \\
0.1801 & 0.0740 & -0.0522 & 0.2320 & 0.0155 \\
-0.0326 & 0.9866 & 0.0094 & 0.4402 & 0.7043 \\
0.1801 & 0.0740 & -0.0522 & 0.2320 & 0.0155
\end{pmatrix}
$$
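
As a check on the arithmetic, the rank-3 matrix can be recomputed from the SVD factors of this example. The sketch below hard-codes the four-digit singular values and the first four columns of U and V as read from the slides (the remaining columns multiply zero singular values), so results agree with the slide only to about three decimal places. The helper name `rank_k` is illustrative.

```python
# Singular values and the first r = 4 columns of U (6 x 4) and V (5 x 4).
sigma = [1.6950, 1.1158, 0.8403, 0.4195]
U = [
    [0.2670, -0.2567, 0.5308, -0.2847],
    [0.7479, -0.3981, -0.5249, 0.0816],
    [0.2670, -0.2567, 0.5308, -0.2847],
    [0.1182, -0.0127, 0.2774, 0.6394],
    [0.5198, 0.8423, 0.0838, -0.1158],
    [0.1182, -0.0127, 0.2774, 0.6394],
]
V = [
    [0.4366, -0.4717, 0.3688, -0.6715],
    [0.3067, 0.7549, 0.0998, -0.2760],
    [0.4412, -0.3568, -0.6247, 0.1945],
    [0.4909, -0.0346, 0.5711, 0.6571],
    [0.5288, 0.2815, -0.3712, -0.0577],
]


def rank_k(U, sigma, V, k):
    """A_k = U_k Sigma_k V_k^T, written as a sum of k rank-one terms."""
    return [[sum(sigma[i] * U[r][i] * V[c][i] for i in range(k))
             for c in range(len(V))]
            for r in range(len(U))]


A3 = rank_k(U, sigma, V, 3)  # the rank-3 approximation
A4 = rank_k(U, sigma, V, 4)  # all four singular values: reproduces A
for row in A3:
    print(" ".join(f"{value:8.4f}" for value in row))
```

Dropping the smallest singular value (0.4195) changes each entry only slightly, which is the "minimal change" property of the truncated SVD.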
Finding Latent Semantics
Documents w/o SVD

Term      Doc 1  Doc 2  Doc 3  Doc 4  Query
Mark      15     0      0      0      1
Twain     15     0      20     0      1
Samuel    0      10     5      0      0
Clemens   0      20     10     0      0
Purple    0      0      0      20     0
Lion      0      0      0      15     0
Score     30     0      20     0
Finding Latent Semantics
Documents with SVD

Term      Doc 1  Doc 2  Doc 3  Doc 4  Query
Mark      3.7    3.5    5.5    0      1
Twain     11.0   10.3   16.1   0      1
Samuel    4.1    3.9    6.1    0      0
Clemens   8.3    7.8    12.2   0      0
Purple    0      0      0      20     0
Lion      0      0      0      15     0
Score     14.7   13.8   21.6   0
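
The scores in both tables are inner products of the query vector with each document column. The short sketch below recomputes them from the table values; the variable and function names are illustrative.

```python
# Rows: Mark, Twain, Samuel, Clemens, Purple, Lion; columns: documents 1..4.
without_svd = [
    [15, 0, 0, 0],
    [15, 0, 20, 0],
    [0, 10, 5, 0],
    [0, 20, 10, 0],
    [0, 0, 0, 20],
    [0, 0, 0, 15],
]
with_svd = [
    [3.7, 3.5, 5.5, 0],
    [11.0, 10.3, 16.1, 0],
    [4.1, 3.9, 6.1, 0],
    [8.3, 7.8, 12.2, 0],
    [0, 0, 0, 20],
    [0, 0, 0, 15],
]
query = [1, 1, 0, 0, 0, 0]  # the query "Mark Twain"


def scores(matrix, query):
    """Inner product of the query with every document column."""
    n_docs = len(matrix[0])
    return [sum(q * row[j] for q, row in zip(query, matrix))
            for j in range(n_docs)]


print(scores(without_svd, query))
print([round(s, 1) for s in scores(with_svd, query)])
```

Document 2 contains only "Samuel" and "Clemens", so it scores zero on the raw counts, but it receives a score close to that of document 1 once the matrix has been smoothed by the SVD.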
Using Text for Improved Image Search

10 sets of 5 similar images
Using Text for Improved Image Search


Color anglogram
Each image is divided into 64 non-overlapping blocks





Extract average hue and average saturation values of
each block
Hue and saturation each quantized into 10 values
Generate Delaunay triangles for each hue value and each
saturation value
Count two largest angles and quantize them into 36 bins,
each of 5°
Feature vector has 720 elements (20 triangulations × 36 bins)
Using Text for Improved Image Search

Annotations

Extra 15 elements

Category positions


sky, sun, land, water, boat, grass, horse, rhino, bird,
human, pyramid, column, tower, sphinx, snow
Each image annotated with appropriate
keywords and the area coverage of each of
these keywords

e.g., sky (0.55), sun (0.15), water (0.30)
Using Text for Improved Image Search
Raw color global histogram data
  ↓ 0.3% improvement
Raw color global histogram data using LSA
  ↓ 0.5% improvement
Annotated color global histogram data using LSA
Using Text for Improved Image Search
Raw color anglogram data
  ↓ 0.5% improvement
Raw color anglogram data using LSA
  ↓ 1% improvement
Annotated color anglogram data using LSA
Using Images for Improved Text Search

Using documents collected from news Web sites




News headlines are often used as URL anchors and
document titles
Topic can be represented easily and clearly by a
group of keywords in the headline
News web sites often have extensive coverage of the
same topic during a certain period of time
News documents often include multimedia
components which are closely related to the topic
Using Images for Improved Text Search


Discover the semantic correlation between
keywords and images in the same document
A collection of 20 documents from cnn.com



4 semantic categories of 5 documents each
43 keywords
Select 1 image from each document

Color anglogram
Using Images for Improved Text Search
1. Bush, in first address as president
2. Education, tax cuts top Bush's Washington agenda
3. Campaign promises could prove troublesome for Bush
4. Bush's to-do list: Set tone for next four years
5. George W. Bush: The 43rd President
6. Rescue mission for crippled Russian sub enters second day
7. Russian official says chances not good for rescue of trapped crew aboard sunken nuclear sub
8. Kursk salvage raises questions
9. Russia to start recovering Kursk bodies
10. Russian navy begins attempt to evacuate sailors from sunken sub
11. Clinton acquitted; president apologizes again
12. Clinton apologizes to nation
13. Clinton's evolving apology for the Lewinsky affair
14. Clinton will not address impeachment in State of the Union
15. Clinton says 'presidents are people, too'
16. MIR prepares for risky plunge
17. Mir positioned for fiery descent
18. A Mir risk
19. Mir demise causes international high anxiety
20. New Zealand issues Mir warning
Using Images for Improved Text Search
Using Images for Improved Text Search

Integrated feature vector $F = [f_1, f_2, \ldots, f_{143}]^T$

Textual feature vector $K = [k_1, k_2, \ldots, k_{43}]^T$
Image feature vector $I = [i_1, i_2, \ldots, i_{100}]^T$
Feature-document matrix $A = [F_1, F_2, \ldots, F_{20}]$

$A = U \Sigma V^T$

$U$ is $143 \times 143$, $\Sigma$ is $143 \times 20$, and $V$ is $20 \times 20$
$k = 12$

$A_k = U_k \Sigma_k V_k^T$
$U_k$ is $143 \times 12$, $\Sigma_k$ is $12 \times 12$, and $V_k$ is $20 \times 12$
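
The bookkeeping behind these dimensions can be sketched as follows. The example assumes the integrated vector is a plain concatenation of the keyword and image vectors (no relative weighting is stated on the slide) and uses all-zero placeholder vectors in place of real features. The helper names are hypothetical.

```python
def integrate(keyword_vec, image_vec):
    """F = [K; I]: concatenate textual and image features of one document."""
    return list(keyword_vec) + list(image_vec)


def feature_document_matrix(feature_vectors):
    """Place each document's feature vector F_j in column j."""
    return [list(row) for row in zip(*feature_vectors)]


def truncated_shapes(n_features, n_docs, k):
    """Shapes of U_k, Sigma_k, and V_k for a rank-k truncation."""
    return (n_features, k), (k, k), (n_docs, k)


# Placeholder data only: 20 documents, 43 keywords, 100 image features.
documents = [integrate([0] * 43, [0.0] * 100) for _ in range(20)]
A = feature_document_matrix(documents)
print(len(A), len(A[0]))             # rows x columns of A
print(truncated_shapes(143, 20, 12))
```

With a 720-element anglogram in place of the 100-element image vector, the same concatenation gives 43 + 720 = 763 rows.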
Using Images for Improved Text Search




Each image is normalized to 192 × 128,
and then divided into 64 non-overlapping
blocks
Extract average hue and saturation values of
each block
Hue and saturation each quantized into 10
values
Generate Delaunay triangles for each hue value
and each saturation value
Using Images for Improved Text Search



Count two largest angles and quantize
them into 36 bins, each of 5°
Image feature vector has 720 elements
Feature-document matrix A is 763 × 20


SVD
k = 12
Using Images for Improved Text Search
Keywords only
  ↓ 1% improvement
Keywords using LSA
  ↓ 3% improvement
Image (global color histogram) annotated keywords using LSA
  ↓ 21% improvement
Image (anglogram) annotated keywords using LSA
Web Page Structure


Genre detection
We do the following:






Display the web page in the program
Get the tag hierarchy with area coordinates
Normalize the web page to size 512 * 512
Divide the page into 16*16 blocks
Calculate the area covered by each tag in each block,
considering the level of the tag in the tag hierarchy
For each feature tag, get the center coordinates of the
blocks where it covers the maximum area
compared with other tags on the same level
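
The block-coverage step above can be sketched like this. Several things are assumptions for illustration only: "16*16 blocks" is read as a 16 × 16 grid of 32-pixel blocks, every tag is treated as being on the same hierarchy level, tag regions are axis-aligned rectangles `(x0, y0, x1, y1)` in the normalized 512 × 512 page, and the names `overlap` and `dominant_tag_points` are hypothetical.

```python
def overlap(a, b):
    """Overlap area of two rectangles given as (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def dominant_tag_points(tag_boxes, page=512, grid=16):
    """For each tag, the centers of the blocks where it covers the most area.

    tag_boxes: list of (tag_name, rectangle) pairs on one hierarchy level.
    """
    size = page // grid
    points = {}
    for row in range(grid):
        for col in range(grid):
            block = (col * size, row * size, (col + 1) * size, (row + 1) * size)
            areas = {}
            for tag, box in tag_boxes:
                areas[tag] = areas.get(tag, 0) + overlap(block, box)
            if not areas:
                continue
            best = max(areas, key=areas.get)
            if areas[best] > 0:  # skip blocks that no tag touches
                center = (block[0] + size / 2, block[1] + size / 2)
                points.setdefault(best, []).append(center)
    return points


# Toy layout: a table on the top half of the page, an image on the bottom half.
layout = [("table", (0, 0, 512, 256)), ("img", (0, 256, 512, 512))]
points = dominant_tag_points(layout)
print(len(points["table"]), len(points["img"]))
```

The resulting point set for each tag plays the same role as a per-color point feature map, so it can be triangulated and turned into an angle histogram in the same way.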
Web Page Structure
Web Page Structure
Web Page Structure

Histogram


36 bins, counting the two largest angles
Tags independent of level

Try approach where tag on lower level overrides
upper-level tag
Web Page Structure

Set of tags defined

Initially, a large set of feature tags (52) is
defined to ensure a powerful set of
independent features for the discrimination of
web pages
A second set of tags (3) is defined based on
the histograms created for the initial set of tags so
that these tags will better differentiate web
pages
Web Page Structure

Experiment # 1

Categories defined are
Detroit News
Times of India
Tribune India
Esakal
Amazon.com
Buy.com
Web Page Structure

Cluster category based on closest page

          Matches  Failures
52 tags   26       10
3 tags    27       9
Web Page Structure

Experiment # 2

Categories defined are

Newspaper environment





Detroit News
Times of India
Tribune India
Esakal
E-commerce environment


Amazon.com
Buy.com
Web Page Structure
          Matches  Failures
52 tags   33       3
3 tags    33       3
A Cross-Modal Theory of Linked
Document Semantics

Environment

Suppose one has a linked set of multimedia
documents
Web
Content-based hypermedia


This provides a rich context for individual
chunks of information
The structure of individual multimedia documents
The link structure

A Cross-Modal Theory of Linked
Document Semantics

Goal

Derive document semantics based on user
browsing behavior

The same document has multiple semantics


Different people see different meanings in the same
document
Over short browsing paths, an individual user’s
wants and needs are uniform

The pages visited over these short paths exhibit
semantics in congruence with these wants and needs
A Cross-Modal Theory of Linked
Document Semantics

Questions




How can the semantics of a web page be derived
given a set of user browsing paths that end at that
page?
How can we characterize the semantics of a user
browsing path?
How can web page semantics help us in navigating
the web more efficiently?
How can our approach actually be implemented in the
real web world?
A Cross-Modal Theory of Linked
Document Semantics

Our approach

We use actual browsing paths to find the
latent semantics of web pages
Textual features
Image features
Structural features


We hope to find general concepts comprising
various textual and image features which
frequently co-occur
A Cross-Modal Theory of Linked
Document Semantics

We believe that a user’s browsing path
exhibits semantic coherence

While the user’s entire path exhibits multiple
semantics, especially pages far from each
other on the path, neighboring pages,
especially the portions close to the links
taken, are semantically close to each other
A Cross-Modal Theory of Linked
Document Semantics

We would like to characterize the
contiguous sub-paths of a user’s browsing
path that exhibit similar semantics and
detect the semantic break points along the
path where the semantics appreciably
change

Collect these sub-paths into a multiset
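
One way such break points might be detected is sketched below. The slides do not specify a method, so everything here is an assumption for illustration: each visited page is represented by a feature vector, a break is declared when the cosine similarity of consecutive pages falls below a hypothetical threshold, and the resulting sub-paths are collected into a multiset with `collections.Counter`.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def split_at_breaks(page_vectors, threshold=0.5):
    """Split a path into contiguous sub-paths (as index lists) at semantic breaks."""
    if not page_vectors:
        return []
    subpaths, current = [], [0]
    for i in range(1, len(page_vectors)):
        if cosine(page_vectors[i - 1], page_vectors[i]) < threshold:
            subpaths.append(current)  # semantics changed: close the sub-path
            current = []
        current.append(i)
    subpaths.append(current)
    return subpaths


# Toy path: two pages about one topic, then two pages about another.
pages = ["p1", "p2", "p3", "p4"]
vectors = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
subpaths = split_at_breaks(vectors)
multiset = Counter(tuple(pages[i] for i in sp) for sp in subpaths)
print(subpaths, multiset)
```

Counting identical sub-paths in the multiset preserves how often each one was taken, which is one of the factors used later when the path matrices are built.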
A Cross-Modal Theory of Linked
Document Semantics



We categorize the semantics of each web page
based on a history of the semantically-coherent
browsing paths of all users which end at that
page
A browsing path will be represented by a high-dimensional vector
The various positions of the vector correspond
to the presence of



textual keywords
image features (visual keywords)
structural features (structural keywords)
A Cross-Modal Theory of Linked
Document Semantics


From the complete set of web pages
under consideration, we extract a set of
textual, visual, and structural keywords
For each multiset, M, of sub-paths that we
are to analyze, we form three matrices



term-path matrix
image-path matrix
structure-path matrix
A Cross-Modal Theory of Linked
Document Semantics

The (i,j)th element of each of these matrices is
determined by

Strength of the presence of the ith keyword along the jth
browsing path

Determined by

How many times the ith keyword occurs on the pages along the path
How much time the user spends examining these pages
How close each occurrence of the ith keyword is to both the
outgoing and incoming anchor positions
How many times this browsing path occurs in M
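
The slides list the factors that determine a matrix entry but not how they are combined. The sketch below shows one purely hypothetical combination so that the roles of the four factors are concrete: each keyword occurrence contributes the dwell time of its page, discounted by its distance from the nearest anchor, and the sum is multiplied by the number of times the path occurs in M. The formula and the name `keyword_strength` are assumptions, not the authors' definition.

```python
def keyword_strength(occurrences, path_count):
    """Hypothetical strength of one keyword along one browsing path.

    occurrences: one (dwell_seconds, anchor_distance) pair per occurrence of
    the keyword on the pages of the path, where anchor_distance is the
    distance to the nearest incoming or outgoing anchor (0 = at the anchor).
    path_count: how many times this path occurs in the multiset M.
    """
    total = sum(dwell / (1.0 + distance) for dwell, distance in occurrences)
    return path_count * total


# Two occurrences on pages viewed for 10 s each, one at the anchor and one
# a unit away from it, on a path that occurs twice in M.
print(keyword_strength([(10.0, 0.0), (10.0, 1.0)], path_count=2))
```

More occurrences, longer viewing times, closer anchors, and more frequent paths all increase the entry, which matches the qualitative description above.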
A Cross-Modal Theory of Linked
Document Semantics



These matrices may be concatenated
together in various ways to produce an
overall keyword-path matrix
Perform latent-semantic analysis to get
concepts
A page is then represented by a set of
concept classes
Conclusions




Researchers in CBR should now be
concentrating on extracting semantics from
multimedia documents
The web is a perfect testbed for studying (semi-)automated techniques for multimedia
annotation due to its contextual richness
CBR + Semantic Web = The Multimedia
Semantic Web
Get Involved!!!