A Matrix Factorization Framework for Jointly Analyzing Multiple Nonnegative Data Sources

Sunil Kumar Gupta, Dinh Phung*, Brett Adams, Svetha Venkatesh
Institute for Multi-sensor Processing & Content Analysis (IMPCA)
Curtin University of Technology, Perth, Australia
Text Mining Workshop 2011, Arizona, USA
30th April, 2011
* Presenting author

Outline
Motivation
Shared Subspace Learning
Applications
Experimental Results
Conclusion

Problem
Aim: Joint modelling of multiple data sources to exploit their collective strength while retaining their variability or differences.

Motivation
The research community has mainly focused its efforts on analyzing a single data source.
Subspace learning across multiple data sources can capture information that cannot be obtained by analyzing the sources independently.
CCA-based methods cannot be applied in general scenarios because they need correspondences across the data sources.
Can we develop a model which can systematically capture the collective strengths of data from multiple sources that share some underlying structure?
Can we tell what is going on: jointly and individually?
(*Ack: some graphics are from images.google.com)

Nonnegative Data Sources
In this work, we confine ourselves to modelling only nonnegative data sources.
Nonnegative data sources are important and widely encountered in the real world, for example:
Text
Image
Video
Counting data

NMF
NMF approximates a nonnegative data matrix by a product of nonnegative factors, X ≈ WH.

Multiple Nonnegative Shared Subspace Learning (MS-NMF)
Let us denote the data matrix for the i-th source by X_i, with dimension M x N_i, and write its decomposition as

$$X_i \approx \sum_{v \in S(n,i)} W_v H_{i,v}$$

where $S(n,i) = \{ v \in S(n) \mid i \in v \}$ and $S(n)$ is the power set of $\{1, \ldots, n\}$.

For n = 3 data sources:

$$X_1 \approx [\, W_1 \mid W_{12} \mid W_{13} \mid W_{123} \,]\,[\, H_{1,1}^T \mid H_{1,12}^T \mid H_{1,13}^T \mid H_{1,123}^T \,]^T$$
$$X_2 \approx [\, W_2 \mid W_{12} \mid W_{23} \mid W_{123} \,]\,[\, H_{2,2}^T \mid H_{2,12}^T \mid H_{2,23}^T \mid H_{2,123}^T \,]^T$$
$$X_3 \approx [\, W_3 \mid W_{13} \mid W_{23} \mid W_{123} \,]\,[\, H_{3,3}^T \mid H_{3,13}^T \mid H_{3,23}^T \mid H_{3,123}^T \,]^T$$

[Figure: sharing configurations. (a) Chain sharing (b) Pair-wise sharing (c) Full sharing]
We have the freedom to specify which sharing configuration is used.

MS-NMF (continued): Multiplicative updates
We minimize the joint decomposition error computed across all data matrices:
[equation not recovered from the slide]
where $\|\cdot\|_F$ is the Frobenius norm and $\lambda_i$ is defined as:
[equation not recovered from the slide]
We propose an iterative solution for the above problem; details can be found in the paper.

Social Media Applications
Improving social media retrieval in a target medium with the help of other auxiliary social media sources.
Cross-media retrieval, i.e. retrieval across multiple social media sources.

Social Media Retrieval
MS-NMF based retrieval algorithm
Inputs: query set (Q), vocabulary (V), number of items to be retrieved (N), learned factors W_v and H_{i,v}
1. Form the query vector q_x using vocabulary V and Q.
2. Project q_x onto the subspace W_i to get q_h, where q_x ≈ W_i q_h.
3. Compute the cosine similarity between the projected query vector and the items in the subspace W_i.
4. Rank the similarities in decreasing order.
Output: {retrieved items}

Cross-Social Media Retrieval
MS-NMF based cross-media retrieval algorithm
Inputs: query set (Q), vocabulary (V), learned factors W_v and H_{i,v}
1. Form the query vector q_x using vocabulary V and Q.
2. Use the subspace W_v for the cross-media configuration v, e.g. W_12 for retrieval across media 1 and 2; similarly, use W_123 for retrieval across media 1, 2 and 3.
3. Project q_x onto the subspace to get q_h, where q_x ≈ W_v q_h.
4. Compute the cosine similarity between q_h and the items of the involved media (e.g. H_{i,v} and H_{j,v}) in the subspace W_v.
5. Rank the similarities in decreasing order.
Output: {retrieved items}

Experiments
Data collection
We created a cross-social-media dataset by crawling the textual tags of three disparate social media genres:
Text (from the BlogSpot website)
Image (from the Flickr website)
Video (from the YouTube website)

Dataset | Size | Concepts | Avg. tags per item (rounded)
BlogSpot | 10000 | ‘Academy Awards’, ‘Australian Open’, ‘Olympic Games’, ‘US Election’, ‘Christmas’, ‘Earthquake’, ‘Cricket World Cup’ | 6
Flickr | 20000 | ‘Academy Awards’, ‘Australian Open’, ‘Olympic Games’, ‘US Election’, ‘Christmas’, ‘Terror Attacks’, ‘Holi’ | 8
YouTube | 7000 | ‘Academy Awards’, ‘Australian Open’, ‘Olympic Games’, ‘US Election’, ‘Terror Attacks’, ‘Earthquake’, ‘Global Warming’ | 7

Data Set: Concept Distribution
[Figure: concept distribution; labels: Christmas, Holi, Academy Awards, Australian Open, Olympic Games, US Election, Earthquake, Terror Attacks, Global Warming]

Choice of Subspace Dimensions (K_v)
Find the number of common features (tags in our case) between the two datasets, say M_v.
Use the "rule of thumb" suggested by [K.V. Mardia et al. 1979, Multivariate Analysis]: $K_v \approx \sqrt{M_v / 2}$.
Initialize using the above heuristic and then perform cross-validation based on retrieval precision performance.
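The heuristic above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `initial_subspace_dim`, the example tag sets, and the choice to round to the nearest integer with a floor of 1 are all assumptions made here. The slides only give the rule K_v ≈ sqrt(M_v / 2) as a starting point for cross-validation.

```python
import math


def initial_subspace_dim(tags_a, tags_b):
    """Return (M_v, K_v) for two tag vocabularies.

    M_v is the number of tags the two datasets have in common and
    K_v = sqrt(M_v / 2) is the rule-of-thumb starting dimension.
    Rounding to the nearest integer and the floor of 1 are assumptions.
    """
    m_v = len(set(tags_a) & set(tags_b))
    k_v = max(1, round(math.sqrt(m_v / 2)))
    return m_v, k_v


# Hypothetical vocabularies, for illustration only.
blog_tags = {"oscars", "olympics", "election", "tennis", "cricket",
             "earthquake", "christmas", "vote"}
flickr_tags = {"oscars", "olympics", "election", "tennis", "holi",
               "christmas", "vote", "attack", "medal", "santa"}

m_v, k_v = initial_subspace_dim(blog_tags, flickr_tags)
print(m_v, k_v)  # 6 2
```

The returned K_v is only an initial value; the final dimension would still be chosen by cross-validating retrieval precision around it.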
Experiment-I (Improving Social Media Retrieval in Transfer Learning Setting)
Baselines: NMF (no sharing), JSNMF [7] with BlogSpot as auxiliary, JSNMF [7] with Flickr, and tag-based matching.
[Figure: Precision-Scope and MAP plots]
[Figure: 11-point Precision-Recall plots]

Experiment-II (Retrieving Items across Multiple Social Media Sources)
Precision and Recall measures for the cross-media scenario are defined as:
[equations not recovered from the slide]
where n is the number of media involved in retrieval; for a particular query, A_i and G_i are the answer set and ground-truth set from the i-th medium.
Baselines:
Baseline-I: Tag-based matching
Baseline-II: Lin et al. [12]
Baseline-III: JSNMF [7]

Cross-media retrieval results across BlogSpot/Flickr
[Figure: 11-point Precision-Recall (BlogSpot/Flickr)]
[Figure: Precision-Scope and MAP (BlogSpot/Flickr)]

Cross-media retrieval results across BlogSpot/YouTube
[Figure: 11-point Precision-Recall (BlogSpot/YouTube)]
[Figure: Precision-Scope and MAP (BlogSpot/YouTube)]

Cross-media retrieval results across Flickr/YouTube
[Figure: 11-point Precision-Recall (Flickr/YouTube)]
[Figure: Precision-Scope and MAP (Flickr/YouTube)]

Cross-media retrieval results across BlogSpot/Flickr/YouTube
[Figure: 11-point Precision-Recall]
[Figure: Precision-Scope and MAP]
JSNMF [7] cannot be applied in this case, as it is limited to two data sources.

Topical Analysis
Entropy is defined in the usual way, whereas the Impurity of a topic is defined as:
[equation not recovered from the slide]
where NGD(t_x, t_y) is the normalized Google distance [4] between two terms t_x and t_y, and L is the number of "significant" words in a topic.
[Figure: distribution of Entropy and Impurity values computed across various topics. (a) Entropy Distribution (b) Impurity Distribution]

Conclusion
We presented a novel framework for jointly modelling data from multiple nonnegative sources with arbitrary sharing topologies.
We demonstrated its application on two social media problems: (1) improved tag-based social media retrieval within one domain; (2) cross-social media retrieval.
We empirically demonstrated that controlled sharing is crucial to avoid any negative knowledge transfer from auxiliary data sources.
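As a closing illustration of the joint factorization idea summarized above, the following is a rough sketch for the simplest case of n = 2 sources with one shared subspace W12 and individual subspaces W1 and W2. It is not the authors' algorithm: the function name `ms_nmf_two_sources`, the weighted squared-Frobenius objective, the weights lam_i = 1 / ||X_i||_F^2, and the update rules (standard NMF-style multiplicative updates applied block by block) are assumptions made here, since the slides defer those details to the paper.

```python
import numpy as np


def ms_nmf_two_sources(X1, X2, k_shared, k1, k2, n_iter=200, seed=0, eps=1e-9):
    """Sketch: X1 ~ [W12 | W1] H1 and X2 ~ [W12 | W2] H2, all factors >= 0.

    Assumed objective: lam1*||X1 - A1 H1||_F^2 + lam2*||X2 - A2 H2||_F^2,
    with lam_i = 1 / ||X_i||_F^2 (an assumption, not taken from the slides).
    Returns the factors and the objective value after every iteration.
    """
    rng = np.random.default_rng(seed)
    M = X1.shape[0]
    W12 = rng.random((M, k_shared))          # basis shared by both sources
    W1 = rng.random((M, k1))                 # basis specific to source 1
    W2 = rng.random((M, k2))                 # basis specific to source 2
    H1 = rng.random((k_shared + k1, X1.shape[1]))
    H2 = rng.random((k_shared + k2, X2.shape[1]))
    lam1 = 1.0 / np.sum(X1 ** 2)
    lam2 = 1.0 / np.sum(X2 ** 2)
    history = []
    for _ in range(n_iter):
        # Update the encodings of each source with its full basis held fixed.
        A1 = np.hstack([W12, W1])
        A2 = np.hstack([W12, W2])
        H1 *= (A1.T @ X1) / (A1.T @ A1 @ H1 + eps)
        H2 *= (A2.T @ X2) / (A2.T @ A2 @ H2 + eps)
        # Split the encodings into shared and individual rows.
        H1s, H1i = H1[:k_shared], H1[k_shared:]
        H2s, H2i = H2[:k_shared], H2[k_shared:]
        # The shared basis receives contributions from both sources.
        R1, R2 = A1 @ H1, A2 @ H2
        W12 *= (lam1 * X1 @ H1s.T + lam2 * X2 @ H2s.T) / (
            lam1 * R1 @ H1s.T + lam2 * R2 @ H2s.T + eps)
        # Each individual basis only sees its own source.
        R1 = np.hstack([W12, W1]) @ H1
        W1 *= (X1 @ H1i.T) / (R1 @ H1i.T + eps)
        R2 = np.hstack([W12, W2]) @ H2
        W2 *= (X2 @ H2i.T) / (R2 @ H2i.T + eps)
        err = (lam1 * np.sum((X1 - np.hstack([W12, W1]) @ H1) ** 2)
               + lam2 * np.sum((X2 - np.hstack([W12, W2]) @ H2) ** 2))
        history.append(err)
    return W12, W1, W2, H1, H2, history


# Synthetic nonnegative data with a known shared part (illustration only).
rng = np.random.default_rng(1)
W_true = rng.random((30, 5))  # columns 0-1 shared, 2-3 source 1, 4 source 2
X1 = W_true[:, [0, 1, 2, 3]] @ rng.random((4, 40))
X2 = W_true[:, [0, 1, 4]] @ rng.random((3, 25))
W12, W1, W2, H1, H2, history = ms_nmf_two_sources(X1, X2, k_shared=2, k1=2, k2=1)
print(history[0], history[-1])
```

Setting k1 = k2 = 0 would force everything to be shared, while k_shared = 0 reduces to two independent NMF problems; the sizes of the shared and individual blocks are what control how much is transferred between sources in this sketch.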