The Multimedia Semantic Web
Bill Grosky
Multimedia Information Systems Laboratory
University of Michigan-Dearborn
Dearborn, Michigan

Contents
  Introduction
  CBR – Where are we?
  Multimedia annotation
  Context-rich environments
  Semantic web
  Our work
  Anglograms
  Finding latent semantics
  Using text for improved image search
  Using images for improved text search
  Web page structure
  A cross-modal theory of linked document semantics

CBR – Where are We?
  Development of feature-based techniques for content-based retrieval (CBR) is a mature area, at least for images
  CBR researchers should now concentrate on extracting semantics from multimedia documents, so that retrievals using concept-based queries can be tailored to individual users
    The semantic gap
    (Semi-)automated multimedia annotation

Multimedia Annotation
  Multimedia annotations should be semantically rich
    Multiple semantics
    A social theory based on how multimedia information is used
  This can be discovered by placing multimedia information in a natural, context-rich environment

Context-Rich Environments
  Structural context – Author’s contribution
    Document’s author places semantically similar pieces of information close to each other
    User can cluster together semantically similar pieces of information
  Dynamic context – User’s contribution
    Short browsing sub-paths are semantically coherent

Context-Rich Environments
  The web is a perfect example of a context-rich environment
  Develop multimedia annotations through cross-modal techniques
    Audio
    Images
    Text
    Video

Semantic Web
  This program overlaps another very important current research topic, the semantic web
    Web page annotations are the backbone of this research effort
  We have something very important to offer to this area
    Multimedia documents
    Deriving multiple semantics for a single document
  Combining our efforts will enrich both communities

Semantic Web
  “The Semantic Web is a new initiative to transform the web into a structure that supports more intelligent querying and browsing, both
  by machines and by humans. This transformation is to be supported through the generation and use of metadata constructed via web annotation tools using user-defined ontologies that can be related to one another.”
    Somewhere on the web

Semantic Web
  [Figure: semantic web architecture, based on www.semanticweb.org. Labeled components: End User, Ontology Articulation Toolkit, Agents, Ontology Construction Tool, Ontologies, Community Portal, Web-Page Annotation Tool, Inference Engine, Annotated Web Pages, Metadata Repository]

Semantic Web
  “Plan a vacation within the next month,” Bill instructed his semantic web agent through his handheld browser.
  An agent retrieved Bill’s vacation profile from his travel agent, retrieved Bill’s availability from his calendar, checked availability of airlines, hotels and restaurants, and made all the necessary arrangements.

Semantic Web
  Multimedia semantic web
    Plan a vacation close to where is being exhibited.

Contents
  Introduction
  CBR – Where are we?
  Multimedia annotation
  Context-rich environments
  Semantic web
  Our work
  Anglograms
  Finding latent semantics
  Using text for improved image search
  Using images for improved text search
  Web page structure
  A cross-modal theory of linked document semantics

Anglograms
  Image object
    Entire image
    Some meaningful portion of an image
      semcon
  Point-based features
    corner points
    color histograms

Anglograms
  [Figure: point feature map for shape]

Anglograms
  [Figure: point feature map for color]

Anglograms
  [Figure: Voronoi diagram of n = 18 sites]

Anglograms
  Dual graph of a Voronoi diagram
  [Figure: Delaunay triangulation of n = 18 sites]

Anglograms
  Delaunay triangulation of a set of n points
    O(n log n) algorithm
  Invariance of Delaunay triangles of a set of points to
    translation
    rotation
    scaling

Anglograms
  Spatial layout of point set
  Anglogram
    Computed by discretizing and counting the angles of the Delaunay triangles
      Which angles are counted?
    O(max(n, #bins)) algorithm
      What is bin size?
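The anglogram computation described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it swaps the O(n log n) Delaunay algorithm for a brute-force empty-circumcircle test (fine only for small point sets), and it assumes the choices stated on later slides, namely counting the two largest angles of each triangle in 36 bins of 5° each. All function names are hypothetical.

```python
import math
from itertools import combinations


def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_triangles(points):
    """Brute force: keep every triangle whose circumcircle holds no other point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
               for m, (px, py) in enumerate(points) if m not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles


def triangle_angles(a, b, c):
    """Interior angles of triangle abc, in degrees."""
    def angle_at(p, q, r):
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        return math.degrees(math.atan2(abs(cross), dot))
    return angle_at(a, b, c), angle_at(b, c, a), angle_at(c, a, b)


def anglogram(points, bin_size=5.0):
    """Histogram of the two largest angles of every Delaunay triangle."""
    n_bins = int(round(180.0 / bin_size))
    bins = [0] * n_bins
    for i, j, k in delaunay_triangles(points):
        angles = sorted(triangle_angles(points[i], points[j], points[k]))
        for angle in angles[1:]:  # skip the smallest angle
            bins[min(int(angle // bin_size), n_bins - 1)] += 1
    return bins


points = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0), (1.0, 2.0)]
print(anglogram(points))
```

Because only angles are counted, translating, rotating, or uniformly scaling the point set leaves the histogram unchanged, which is the invariance property listed on the Delaunay slide.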
  [Figure: a set of 26 points]
  [Figure: Delaunay triangulations of the point set and its two transformed variants]

Anglograms
  Computation of color anglogram of an image
    Divide image evenly into M*N non-overlapping blocks
    Each individual block is abstracted as a unique feature point labeled with its spatial location and dominant colors

Anglograms
  Computation of color anglogram of an image
    Point feature map
    Normalized feature points, after adjusting any two neighboring feature points to a fixed distance
    Construct Delaunay triangulation for each set of feature points labeled with identical color

Anglograms
  Computation of color anglogram of an image
    Compute anglogram based on each Delaunay triangulation
    Color anglogram for image
      Concatenating all the anglograms together

Anglograms
  [Figures, one per slide: pyramid image; hue component; saturation component; point feature map; feature points of hue 2; Delaunay triangulation of hue 2; Delaunay triangulation of saturation 5]

Anglograms
  [Figure: anglogram of saturation 5; x-axis: bin number (1 to 19), y-axis: number of angles (0 to 60)]

Contents
  Introduction
  CBR – Where are we?
  Multimedia annotation
  Context-rich environments
  Semantic web
  Our work
  Anglograms
  Finding latent semantics
  Using text for improved image search
  Using images for improved text search
  Web page structure
  A cross-modal theory of linked document semantics

Finding Latent Semantics
  We want to transform low-level features to a higher level of meaning
  Used for dimension reduction in QBIC
    Searching in high-dimensional spaces
  More importantly, it creates clusters of co-occurring features
    So-called concepts

Finding Latent Semantics
  Latent Semantic Analysis (LSA) was introduced to overcome a fundamental problem in textual information retrieval
    Users want to retrieve on the basis of conceptual content
    Individual words provide unreliable evidence about conceptual meanings
      Synonymy
        Many ways to refer to the same object
      Polysemy
        Most words have more than one distinct meaning

Finding Latent Semantics
  Searching for documents concerning automobiles
    Tend to use the keyword automobile
  A statistical analysis determines that the keywords automobile and car tend to co-occur
  LSA will also retrieve documents in which the keyword car appears even though the keyword automobile does not

Finding Latent Semantics
  Term-document association
    It is assumed that there exists some underlying latent semantic structure in the data that is partially obscured by the randomness of term choice
    By semantic structure we mean the correlation structure in which individual terms appear in documents
    Semantic implies only the fact that terms in a document may be taken as referents to the document itself or to its topic
  Statistical techniques are used to estimate this latent semantic structure, and to get rid of obscuring noise

Finding Latent Semantics
  Singular-value decomposition (SVD)
    Take a large matrix of term-document association
    Construct a semantic space wherein terms and documents that are closely associated are placed near to each other
    SVD allows the arrangement of space to reflect the major associative patterns and ignore
    smaller, less important influences
    As a result, terms that did not actually appear in a document may still end up close to the document, if that is consistent with the major patterns of association
    Position in the space serves as the semantic indexing
    Retrieval proceeds by using the terms in a query to identify a point in the semantic space, and documents in its neighborhood are returned as relevant results

Finding Latent Semantics
  Term-document matrix
    d documents
    t terms
    Represented by a t × d term-document matrix A
      Each document is represented by a column document vector
      Each term is represented by a row term vector

Finding Latent Semantics
  The terms (t = 6)
    t1: bak(e,ing)
    t2: recipes
    t3: bread
    t4: cake
    t5: pastr(y,ies)
    t6: pie
  The document titles (d = 5)
    d1: How to Bake Bread Without Recipes
    d2: The Classic Art of Viennese Pastry
    d3: Numerical Recipes: The Art of Scientific Computing
    d4: Breads, Pastries, Pies and Cakes: Quantity Baking Recipes
    d5: Pastry: A Book of Best French Recipes

Finding Latent Semantics
  Raw term-document matrix Â
          d1  d2  d3  d4  d5
    t1     1   0   0   1   0
    t2     1   0   1   1   1
    t3     1   0   0   1   0
    t4     0   0   0   1   0
    t5     0   1   0   1   1
    t6     0   0   0   1   0
  Matrix A (each column of Â scaled to unit length)
          d1      d2      d3      d4      d5
    t1    0.5774  0       0       0.4082  0
    t2    0.5774  0       1       0.4082  0.7071
    t3    0.5774  0       0       0.4082  0
    t4    0       0       0       0.4082  0
    t5    0       1       0       0.4082  0.7071
    t6    0       0       0       0.4082  0

Finding Latent Semantics
  SVD is a dimension reduction technique
    Reduced-rank approximation to both column space and row space
  Find a rank-k approximation to matrix A with minimal change to that matrix for a given value of k
  This decomposition exists for any matrix A

Finding Latent Semantics
  SVD of a term-document matrix A
    $A = U \Sigma V^T$
    A is t × d
    U is a t × r orthogonal matrix, where r is rank(A)
      The columns of U are a basis for the column space of A
      U is the matrix of eigenvectors of the matrix $A A^T$
    Σ is an r × r diagonal matrix having the singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r$ of A in order along its diagonal
      $\Sigma^2$ is the matrix of eigenvalues of $A A^T$ or $A^T A$
    $V^T$ is an r × d orthogonal matrix
      The rows of $V^T$ are a basis for the row space of A
      V is the matrix of eigenvectors of the matrix $A^T A$

Finding Latent Semantics
  [Figure: shapes of the factors, (t × d) = (t × r)(r × r)(r × d)]

Finding Latent Semantics
  A special rank-k approximation, $A_k$
    $A_k = U_k \Sigma_k V_k^T$
    $U_k$: first k columns of U
    $\Sigma_k$: first k diagonal values of Σ
    $V_k^T$: first k rows of $V^T$

Finding Latent Semantics
  [Matrices: $A = U \Sigma V^T$ for the example matrix A above, with singular values 1.6950, 1.1158, 0.8403, 0.4195; entries of U and V not reproduced]

Finding Latent Semantics
  Reduce the rank to 3
  [Matrices: A alongside its rank-3 approximation $A_3$; entries of $A_3$ not reproduced]

Finding Latent Semantics
  Documents w/o SVD
    Term      Doc 1  Doc 2  Doc 3  Doc 4  Query
    Mark         15      0      0      0      1
    Twain        15      0     20      0      1
    Samuel        0     10      5      0      0
    Clemens       0     20     10      0      0
    Purple        0      0      0     20      0
    Lion          0      0      0     15      0
    Score        30      0     20      0

Finding Latent Semantics
  Documents with SVD
    Term      Doc 1  Doc 2  Doc 3  Doc 4  Query
    Mark        3.7    3.5    5.5      0      1
    Twain      11.0   10.3   16.1      0      1
    Samuel      4.1    3.9    6.1      0      0
    Clemens     8.3    7.8   12.2      0      0
    Purple        0      0      0     20      0
    Lion          0      0      0     15      0
    Score      14.7   13.8   21.6      0

Contents
  Introduction
  CBR – Where are we?
  Multimedia annotation
  Context-rich environments
  Semantic web
  Our work
  Anglograms
  Finding latent semantics
  Using text for improved image search
  Using images for improved text search
  Web page structure
  A cross-modal theory of linked document semantics

Using Text for Improved Image Search
  10 sets of 5 similar images

Using Text for Improved Image Search
  Color anglogram
    Each image is divided into 64 non-overlapping blocks
    Extract average hue and average saturation values of each block
    Hue and saturation each quantized into 10 values
    Generate Delaunay triangles for each hue value and each saturation value
    Count two largest angles and quantize them into 36 bins, each of 5°
    Feature vector has 720 elements (20 quantized values × 36 bins)

Using Text for Improved Image Search
  Annotations
    Extra 15 elements
      Category positions
        sky, sun, land, water, boat, grass, horse, rhino, bird, human, pyramid, column, tower, sphinx, snow
    Each image annotated with appropriate keywords and the area coverage of each of these keywords
      e.g., sky (0.55), sun (0.15), water (0.30)

Using Text for Improved Image Search
  Raw color global histogram data
    0.3% improvement
  Raw color global histogram data using LSA
    0.5% improvement
  Annotated color global histogram data using LSA

Using Text for Improved Image Search
  Raw color anglogram data
    0.5% improvement
  Raw color anglogram data using LSA
    1% improvement
  Annotated color anglogram data using LSA

Contents
  Introduction
  CBR – Where are we?
  Multimedia annotation
  Context-rich environments
  Semantic web
  Our work
  Anglograms
  Finding latent semantics
  Using text for improved image search
  Using images for improved text search
  Web page structure
  A cross-modal theory of linked document semantics

Using Images for Improved Text Search
  Using documents collected from news Web sites
    News headlines are often used as URL anchors and document titles
    Topic can be represented easily and clearly by a group of keywords in the headline
    News web sites often have extensive coverage of the same topic during a certain period of time
    News documents often include multimedia components which are closely related to the topic

Using Images for Improved Text Search
  Discover the semantic correlation between keywords and image in the same document
    A collection of 20 documents from cnn.com
      4 semantic categories of 5 documents each
    43 keywords
    Select 1 image from each document
      Color anglogram

Using Images for Improved Text Search
   1. Bush, in first address as president
   2. Education, tax cuts top Bush's Washington agenda
   3. Campaign promises could prove troublesome for Bush
   4. Bush's to-do list: Set tone for next four years
   5. George W. Bush: The 43rd President
   6. Rescue mission for crippled Russian sub enters second day
   7. Russian official says chances not good for rescue of trapped crew aboard sunken nuclear sub
   8. Kursk salvage raises questions
   9. Russia to start recovering Kursk bodies
  10. Russian navy begins attempt to evacuate sailors from sunken sub
  11. Clinton acquitted; president apologizes again
  12. Clinton apologizes to nation
  13. Clinton's evolving apology for the Lewinsky affair
  14. Clinton will not address impeachment in State of the Union
  15. Clinton says 'presidents are people, too'
  16. MIR prepares for risky plunge
  17. Mir positioned for fiery descent
  18. A Mir risk
  19. Mir demise causes international high anxiety
  20. New Zealand issues Mir warning

Using Images for Improved Text Search
  Integrated feature vector
    $F = [f_1, f_2, \dots, f_{143}]^T$
  Textual feature vector
    $K = [k_1, k_2, \dots, k_{43}]^T$
  Image feature vector
    $I = [i_1, i_2, \dots, i_{100}]^T$
  Feature document matrix
    $A = [F_1, F_2, \dots, F_{20}]$
  $A = U \Sigma V^T$
    U is 143 × 143, Σ is 143 × 20, and V is 20 × 20
  k = 12
    $A_k = U_k \Sigma_k V_k^T$
    $U_k$ is 143 × 12, $\Sigma_k$ is 12 × 12, and $V_k$ is 20 × 12

Using Images for Improved Text Search
  Each image is normalized to 192 × 128, and then divided into 64 non-overlapping blocks
  Extract average hue and saturation values of each block
  Hue and saturation each quantized into 10 values
  Generate Delaunay triangles for each hue value and each saturation value

Using Images for Improved Text Search
  Count two largest angles and quantize them into 36 bins, each of 5°
  Image feature vector has 720 elements
  Feature document matrix A is 763 × 20
  SVD
    k = 12

Using Images for Improved Text Search
  Keywords only
    1% improvement
  Keywords using LSA
    3% improvement
  Image (global color histogram) annotated keywords using LSA
    21% improvement
  Image (anglogram) annotated keywords using LSA

Contents
  Introduction
  CBR – Where are we?
  Multimedia annotation
  Context-rich environments
  Semantic web
  Our work
  Anglograms
  Finding latent semantics
  Using text for improved image search
  Using images for improved text search
  Web page structure
  A cross-modal theory of linked document semantics

Web Page Structure
  Genre detection
  We do the following:
    Display web page in the program
    Get tag hierarchy with area coordinates
    Normalize the web page to size 512 * 512
    Divide page into 16*16 blocks
    Calculate area covered by each tag in each block, considering the level of the tag in the tag hierarchy
    For each feature tag, get the center coordinates of the blocks where it covers the maximum area compared with other tags on the same level

Web Page Structure
  Histogram
    36 bins with the two largest angles
  Tags independent of level
    Try approach where tag on lower level overrides upper-level tag

Web Page Structure
  Set of tags defined
    Initially, a large set of feature tags (52) is defined to ensure a powerful set of independent features for the discrimination of web pages
    A second set of tags (3) is defined based on histograms created for the initial set of tags, so that these tags will better differentiate web pages

Web Page Structure
  Experiment #1
    Categories defined are
      Detroit News
      Times of India
      Tribune India
      Esakal
      Amazon.com
      Buy.com

Web Page Structure
  Cluster category based on closest page
               Matches  Failures
    52 tags         26        10
    3 tags          27         9

Web Page Structure
  Experiment #2
    Categories defined are
      Newspaper environment
        Detroit News
        Times of India
        Tribune India
        Esakal
      E-commerce environment
        Amazon.com
        Buy.com

Web Page Structure
               Matches  Failures
    52 tags         33         3
    3 tags          33         3

Contents
  Introduction
  CBR – Where are we?
  Multimedia annotation
  Context-rich environments
  Semantic web
  Our work
  Anglograms
  Finding latent semantics
  Using text for improved image search
  Using images for improved text search
  Web page structure
  A cross-modal theory of linked document semantics

A Cross-Modal Theory of Linked Document Semantics
  Environment
    Suppose one has a linked set of multimedia documents
      Web
      Content-based hypermedia
    This provides a rich context for individual chunks of information
      The structure of individual multimedia documents
      The link structure

A Cross-Modal Theory of Linked Document Semantics
  Goal
    Derive document semantics based on user browsing behavior
      The same document has multiple semantics
      Different people see different meanings in the same document
      Over short browsing paths, an individual user’s wants and needs are uniform
      The pages visited over these short paths exhibit semantics in congruence with these wants and needs

A Cross-Modal Theory of Linked Document Semantics
  Questions
    How can the semantics of a web page be derived given a set of user browsing paths that end at that page?
    How can we characterize the semantics of a user browsing path?
    How can web page semantics help us in navigating the web more efficiently?
    How can our approach actually be implemented in the real web world?
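One building block for the implementation question is the rank-k LSA step from the Finding Latent Semantics slides. The sketch below, which assumes NumPy is available, reruns the Mark Twain / Samuel Clemens example from those slides. The choice k = 2 is an assumption: the slides do not state it, but it reproduces the reported scores. Function names are illustrative only.

```python
import numpy as np

TERMS = ["Mark", "Twain", "Samuel", "Clemens", "Purple", "Lion"]

# Term-document matrix from the "Documents w/o SVD" slide (rows: terms, columns: documents 1 to 4).
A = np.array([
    [15, 0, 0, 0],
    [15, 0, 20, 0],
    [0, 10, 5, 0],
    [0, 20, 10, 0],
    [0, 0, 0, 20],
    [0, 0, 0, 15],
], dtype=float)


def rank_k_approximation(matrix, k):
    """Keep only the k largest singular values of the matrix."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def score(query, matrix):
    """Score each document as the dot product of the query with its column."""
    return query @ matrix


# Query containing the terms "Mark" and "Twain".
query = np.array([1, 1, 0, 0, 0, 0], dtype=float)

raw_scores = score(query, A)
lsa_scores = score(query, rank_k_approximation(A, k=2))

print("without SVD:", np.round(raw_scores, 1))
print("with SVD:   ", np.round(lsa_scores, 1))
```

Without SVD, document 2 (which mentions only Samuel and Clemens) scores 0. After the rank reduction it receives a score close to the 13.8 reported on the slide, because Samuel/Clemens and Mark/Twain co-occur in document 3.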

A Cross-Modal Theory of Linked Document Semantics
  Our approach
    We use actual browsing paths to find the latent semantics of web pages
      Textual features
      Image features
      Structural features
    We hope to find general concepts comprising various textual and image features which frequently co-occur

A Cross-Modal Theory of Linked Document Semantics
  We believe that a user’s browsing path exhibits semantic coherence
    While the user’s entire path exhibits multiple semantics, especially pages far from each other on the path, neighboring pages, especially the portions close to the links taken, are semantically close to each other

A Cross-Modal Theory of Linked Document Semantics
  We would like to characterize the contiguous sub-paths of a user’s browsing path that exhibit similar semantics and detect the semantic break points along the path where the semantics appreciably change
    Collect these sub-paths into a multiset

A Cross-Modal Theory of Linked Document Semantics
  We categorize the semantics of each web page based on a history of the semantically coherent browsing paths of all users which end at that page
  A browsing path will be represented by a high-dimensional vector
    The various positions of the vector correspond to the presence of
      textual keywords
      image features (visual keywords)
      structural features (structural keywords)

A Cross-Modal Theory of Linked Document Semantics
  From the complete set of web pages under consideration, we extract a set of textual, visual, and structural keywords
  For each multiset, M, of sub-paths that we are to analyze, we form three matrices
    term-path matrix
    image-path matrix
    structure-path matrix

A Cross-Modal Theory of Linked Document Semantics
  The (i,j)th element of each of these matrices is determined by
    Strength of the presence of the ith keyword along the jth browsing path
      Determined by
        How many times this term occurs on the pages along the path
        How much time the user spends examining these pages
        How close each occurrence of the ith keyword is to both the
        outgoing and incoming anchor positions
    How many times this browsing path occurs in M

A Cross-Modal Theory of Linked Document Semantics
  These matrices may be concatenated together in various ways to produce an overall keyword-path matrix
  Perform latent-semantic analysis to get concepts
  A page is then represented by a set of concept classes

Conclusions
  Researchers in CBR should now be concentrating on extracting semantics from multimedia documents
  The web is a perfect testbed for studying (semi-)automated techniques for multimedia annotation due to contextual richness
  CBR + Semantic Web = The Multimedia Semantic Web
  Get Involved!!!
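
For readers who want to get involved, the concatenation and LSA steps of the cross-modal theory can be sketched as below. This is only an illustration under stated assumptions: all strengths are made-up toy values, row-wise stacking is just one of the "various ways" of concatenating the three matrices, NumPy is assumed to be available, and the function names are hypothetical.

```python
import numpy as np


def keyword_path_matrix(term_path, image_path, structure_path):
    """Stack the three modality matrices row-wise; each column is one sub-path."""
    return np.vstack([term_path, image_path, structure_path])


def path_concepts(matrix, k):
    """Coordinates of every sub-path in a k-dimensional concept space."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return np.diag(s[:k]) @ Vt[:k, :]


# Toy keyword strengths for 4 sub-paths (columns); values are invented for illustration.
term_path = np.array([[2.0, 2.0, 0.0, 0.0],
                      [1.0, 1.0, 0.0, 1.0],
                      [0.0, 0.0, 3.0, 2.0]])
image_path = np.array([[1.0, 1.0, 0.0, 0.0],
                       [0.0, 0.0, 2.0, 2.0]])
structure_path = np.array([[0.5, 0.5, 0.0, 0.0],
                           [0.0, 0.0, 1.0, 1.0]])

combined = keyword_path_matrix(term_path, image_path, structure_path)
concepts = path_concepts(combined, k=2)

print(combined.shape)  # (7, 4): 7 keywords across modalities, 4 sub-paths
print(np.round(concepts, 2))
```

Sub-paths with identical keyword strengths (the first two columns here) land on the same point in concept space, so grouping nearby columns is one plausible way to arrive at the concept classes mentioned above.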