Multimedia Data Mining

Arvind Balasubramanian
arvind@utdallas.edu
Multimedia Lab
The University of Texas at Dallas

Me and My Research
• Research Interests:
  – Machine Learning
  – Data Mining
  – Statistical Analysis
  – Applications of the above in Multimedia
• I am currently working on
  – Optimizing index and retrieval structures for human motion data
  – Analysis of Tongue motion data to identify baseline characteristics of pronunciations (classification, speech therapy)

Data Mining and Multimedia
• Uncovering hidden information from data.
• Exploiting data to obtain new knowledge and interpret results.
• Immense applications in Multimedia.

Data Mining Techniques
• Classification
• Prediction
• Cluster Analysis & Class Discovery
• Extraction and Retrieval
• Statistical Analysis

Ideas for Projects: Text Mining
• Information Extraction from Domain-specific documents
  – involves extracting data from free text pieces and populating a database
  – Serves to organize required information available in unorganized form
  – Not enough in itself; combine with class discovery

Ideas for Projects: Text Mining
• New Class Discovery using Clustering techniques
  – identifying groups of keywords that do not fall into known categories
  – creating new categories and validating them
  – Possibly employ clustering algorithms with proper similarity measure or distance functions

Ideas for Projects: Text Mining (contd.)
• Query-based document retrieval system
  – Employ one of several base models, such as a probabilistic model or a vector space model
  – Design an efficient indexing system
  – Include a relevance ranking feature
  – Possibly make the system intelligent using machine learning techniques

Ideas for Projects: Pattern Recognition in Multimedia Data
• Scope
  – Analyze and identify interrelationships within multimedia data sets
  – Derive a composite score from several different subscores
• Methods
  – Classic techniques like Principal Component Analysis (PCA) and Factor Analysis (FA)
  – Statistical methods such as regression analysis

Ideas for Projects: Pattern Recognition in Multimedia Data (contd.)
• Methods
  – Principal Component Analysis (PCA)
    (a) Dimensionality reduction
    (b) Efficient storage and retrieval of media data
    (c) Applications in any multi-dimensional media: images (noise reduction), video (content analysis), audio (voice signature recognition)

Ideas for Projects: Pattern Recognition in Multimedia Data (contd.)
• Methods
  – Factor Analysis (FA)
    (a) Minimize data redundancy
    (b) Reveal hidden patterns
    (c) Combine attributes into a single attribute by determining the importance and contribution of each attribute
    (d) Applications: medical analysis, IQ tests, personality tests, software measurement, multimedia content analysis, motion capture data analysis

Ideas for Projects: Pattern Recognition in Multimedia Data (contd.)
• Methods
  – Statistical Analysis
    (a) Correlation analysis to bring out interrelationships between data attributes
    (b) Regression analysis to analyze how well a set of data attributes predicts other data attributes

Ideas for Projects: Prediction and Suggestion Systems
• An intelligent shopping application or a movie review application that
  – Learns from user ratings or purchases, and suggests other products or options
  – Examples: Netflix & Amazon
  – Many machine learning techniques could be employed: Bayesian reasoning and classification algorithms such as AdaBoost

Ideas for Projects: Prediction and Suggestion Systems
• An intelligent media hosting application that
  – Learns from user queries and requests, and accordingly suggests other media items
  – Suggested items would be retrieved by querying on media features and metadata
  – Examples: Esnips music hosting
  – Many machine learning techniques could be employed: Bayesian reasoning and classification algorithms

Ideas for Projects
• Ideas for alternative projects having to do with applications of machine learning, data mining and statistical analysis in the domain of multimedia are welcome.
• Tools
  – Weka, MATLAB, statistical software packages (even Excel helps a lot!)

Thank You

Arvind Balasubramanian
arvind@utdallas.edu
Multimedia Lab
The University of Texas at Dallas
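
Backup: Sketch for the Query-based Retrieval Idea

The query-based document retrieval idea under Text Mining (a vector space model plus relevance ranking) could start from something like the minimal sketch below. It is only an illustration under stated assumptions, not a prescribed design: the function names (`tokenize`, `build_index`, `weight`, `cosine`, `search`), the smoothed TF-IDF weighting, and the toy documents are all hypothetical choices made here, and the "index" is a plain in-memory dictionary rather than the efficient indexing system the project would need.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on any non-alphanumeric character."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def weight(tokens, idf):
    """Turn a token list into a sparse TF-IDF vector (dict term -> weight).

    Terms that never occur in the indexed collection are dropped.
    """
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def build_index(docs):
    """Build IDF weights and TF-IDF vectors for {doc_id: text} (assumed input shape)."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each term once per document
    # Smoothed IDF: terms present in every document keep a small positive weight.
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }
    vectors = {doc_id: weight(tokens, idf) for doc_id, tokens in tokenized.items()}
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, idf, vectors, top_k=3):
    """Return up to top_k (doc_id, score) pairs, highest cosine score first."""
    q_vec = weight(tokenize(query), idf)
    scored = [(cosine(q_vec, vec), doc_id) for doc_id, vec in vectors.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0.0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by doc_id
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


if __name__ == "__main__":
    # Toy collection, invented purely for illustration.
    docs = {
        "d1": "principal component analysis for image noise reduction",
        "d2": "clustering keywords to discover new document classes",
        "d3": "regression analysis of motion capture data",
    }
    idf, vectors = build_index(docs)
    for doc_id, score in search("motion capture analysis", idf, vectors):
        print(doc_id, round(score, 3))
```

Cosine similarity is used for ranking because it is the usual similarity measure for the vector space model. A probabilistic base model, an inverted index, or a learned ranking function could replace the corresponding pieces without changing the overall shape.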