Music Information Retrieval With Condor
Scott McCaulay, Joe Rinkovsky
Pervasive Technology Institute, Indiana University

Overview
- PFASC is a suite of applications developed at IU to perform automated similarity analysis of audio files
- Potential applications include organization of digital libraries, recommender systems, playlist generators, and audio processing
- PFASC is a project in the MIR field: an extension and adaptation of traditional Text Information Retrieval techniques to sound files
- Elements of PFASC, specifically the file-by-file similarity calculation, have proven to be a very good fit with Condor

What We'll Cover
- Condor at Indiana University
- Background on Information Retrieval and Music Information Retrieval
- The PFASC project
- PFASC and Condor: experience to date and results
- Summary

Condor at IU
- Initiated in 2003
- Uses 2,350 Windows Vista machines from IU's Student Technology Clusters
- Minimum 2 GB memory, 100 Mb network
- Available to students at 42 locations on the Bloomington campus, 24 x 7
- Student use is the top priority; Condor jobs are suspended immediately on use

Costs to Support Condor at IU
- Annual marginal cost to support the Condor pool at IU is < $15K
- Includes system administration, head nodes, and file servers
- Purchase and support of STC machines are funded from Student Technology Fees

Challenges to Making Good Use of Condor Resources at IU
- Windows environment: the research computing environment at IU is geared to Linux, or to exotic architectures
- Ephemeral resources: machines are moderately to heavily used at all hours, so longer jobs are likely to be preempted
- Availability of other computing resources: local users are far from starved for cycles, so there is limited motivation to port

Examples of Applications Supported on Condor at IU
- Hydra Portal (2003): job submission portal; suite of bio apps including BLAST, MEME, fastDNAml
- Condor Render Portal (2006): Maya and Blender video rendering
- PFASC (2008): similarity analysis of audio files

Information Retrieval: Background
- The science of organizing documents for search and retrieval
- Dates back to the 1880s (Hollerith)
- Vannevar Bush, first US presidential science advisor, presaged hypertext in "As We May Think" (1945)
- The concept of automated text document analysis, organization, and retrieval was met with a good deal of skepticism until the 1990s; some critics now grudgingly concede that it might work

Calculating Similarity: The Vector Space Model
- Each feature found in a file is assigned a weight based on the frequency of its occurrence in the file and how common that feature is in the collection
- Similarity between files is calculated from common features and their weights; if two files share features not common to the entire collection, their similarity value will be very high
- This vector space model (Salton) is the basis of many text search engines, and also works well with audio files
- For text files, features are words or character strings; for audio files, features are prominent frequencies within frames of audio, or sequences of frequencies across frames

Some Digital Audio History
- Uploaded to Compuserve 10/1985: one of the most popular downloads at the time!
- 10 seconds of digital audio
- Time to download (300 baud): 20 minutes
- Time to load: 20 minutes (tape), 2 minutes (disk)
- Storage space: 42K
- From this to Napster in less than 15 years

Explosion of Digital Audio
[Chart: RIAA sales figures in millions, physical vs. digital, 1998-2008]
- Digital audio today is where text was 15 years ago
- Poised for a 2nd phase of the digital audio revolution? Ubiquitous, easy to create, access, and share, but lacking tools to analyze, search, or organize
- How can we organize this enormous and growing volume of digital audio data for discovery and retrieval?
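The vector space similarity described above can be sketched in a few lines of Python. This is a minimal illustration, not PFASC's actual code: the talk does not give PFASC's exact weighting formula, so the standard tf-idf weighting and cosine measure are used here as stand-ins, and the function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Weight each feature by its frequency in the file (tf) and its
    rarity across the collection (idf), per Salton's vector space model."""
    n = len(docs)
    df = Counter()                       # document frequency of each feature
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({f: c * math.log(n / df[f]) for f, c in tf.items()})
    return vectors

def cosine_similarity(u, v):
    """Similarity in [0.0, 1.0]: high when files share features that are
    rare in the collection, 1.0 for identical weight vectors."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Note how features present in every file get an idf weight of zero, so only distinctive features drive the score: two files sharing a collection-wide feature score 0.0, while two files sharing a rare feature score near 1.0, exactly the behavior the slide describes.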
What's Done Today
- Pandora (Music Genome Project): expert manual classification of ~400 attributes
- Allmusic: manual artist similarity classification by critics
- last.fm (Audioscrobbler): collaborative filtering from user playlists
- iTunes Genius: collaborative filtering from user playlists

What's NOT Done Today
- Any analysis (outside of research) of similarity or classification based on the actual audio content of song files

Possible Hybrid Solution
- Automated analysis, user behavior, manual metadata
- A classification/retrieval system could use elements of all three methods to improve performance

Music Information Retrieval
- Applying traditional IR techniques for classification, clustering, similarity analysis, pattern matching, etc. to digital audio files
- A recent field of study, which has accelerated since the inception of the ISMIR conference in 2000 and the MIREX evaluation in 2004

Common Basis of an MIR System
- Select a very small segment of audio data, 20-40 ms
- Use the fast Fourier transform (FFT) to convert it to frequency data
- This 'frame' of audio becomes the equivalent of a word in a text file for similarity analysis
- The output of this 'feature extraction' process is input to various analysis or classification processes
- PFASC additionally combines prominent frequencies from adjacent frames to create temporal sequences as features

PFASC as an MIR Project
- Parallel Framework for Audio Similarity Clustering
- Initiated at IU in 2008
- Team includes the School of Library and Information Science (SLIS), Cognitive Science, the School of Music, and the Pervasive Technology Institute (PTI)
- Has developed an MPI-based feature extraction algorithm, SVM classification, vector space similarity analysis, and some preliminary visualization
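The frame-based feature extraction outlined above (20-40 ms frames, frequency analysis, keep the prominent bins, pair adjacent frames) might look roughly like this sketch. The frame length, sample rate, and function names are assumptions for illustration; a real implementation would use an FFT library (e.g. numpy.fft.rfft) rather than the naive DFT used here to keep the example self-contained.

```python
import cmath

FRAME_MS = 30        # frame length, within the 20-40 ms range the talk describes
SAMPLE_RATE = 8000   # assumed sample rate for this sketch

def dft_magnitudes(frame):
    """Magnitude spectrum of one frame (naive DFT; production code
    would use an FFT for speed)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def frame_features(samples, top=3):
    """Split audio into frames and keep each frame's most prominent
    frequency bins -- the 'words' used for similarity analysis."""
    frame_len = SAMPLE_RATE * FRAME_MS // 1000
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        mags = dft_magnitudes(samples[start:start + frame_len])
        bins = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:top]
        features.append(tuple(sorted(bins)))
    return features

def temporal_sequences(features):
    """Pair adjacent frames' features, echoing PFASC's use of
    temporal sequences across frames."""
    return list(zip(features, features[1:]))
```

Treating each frame's prominent bins (or each adjacent-frame pair) as a token makes the audio file look like a text document, so the vector space machinery applies unchanged.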
- Wish list includes a graphical workflow, a job submission portal, and use in MIR classes

PFASC Philosophy and Methodology
- Provide an end-to-end framework for MIR, from workflow to visualization
- Recognize temporal context as a critical element of audio and a necessary part of feature extraction
- Simple concept, simple implementation: one highly configurable algorithm for feature extraction
- Dynamic combination and tuning of results from multiple runs, with user-controlled weighting
- Make good use of available cyberinfrastructure
- Support education in MIR

PFASC Feature Extraction Example
[Chart: summary of 450 files classified by genre (folk, hip hop, rock), showing the most prominent frequencies across the spectrum]

PFASC Similarity Matrix Example

            Hip Hop   Folk    Rock
  Hip Hop   0.115     0.049   0.042
  Folk      0.049     0.087   0.024
  Rock      0.042     0.024   0.168

- Each audio file is summarized as a vector of feature values; similarity is calculated between vectors
- Values fall between 0.0 and 1.0: 0.0 = no commonality, 1.0 = files are identical
- In the above example, same-genre files had similarity scores 3.352 times higher than different-genre files

Classification vs. Clustering
- Most work in MIR involves classification, e.g. genre classification, an exercise that may be arbitrary and limited in value
- Calculating similarity values among all songs in a library may be more practical for music discovery, playlist generation, and grouping by combinations of selected features
- Calculating similarity is MUCH more computationally intensive than categorization: comparing all songs in a library of 20,000 files requires ~200 million comparisons

Using Condor for Similarity Analysis
- A good fit for IU Condor resources: a very large number of short-duration jobs
- Jobs are independent, can be restarted, and can run in any order
- The large number of available machines provides a great wall-clock performance advantage over IU supercomputers

PFASC Performance and Resources
- A recent run of 450 jobs completed in 16 minutes
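The scale claim above follows from all-pairs counting: n files need n*(n-1)/2 comparisons, and because each pair is independent, the work splits cleanly into restartable chunks of the kind Condor handles well. The helper names and the chunking scheme below are illustrative assumptions, not PFASC's actual job layout.

```python
from itertools import combinations

def comparison_count(n_files):
    """All-pairs similarity requires n * (n - 1) / 2 comparisons."""
    return n_files * (n_files - 1) // 2

def pair_chunks(n_files, n_jobs):
    """Split the list of file pairs into independent chunks, one per
    Condor-style job. Chunks share no state, so they can run, fail,
    and be restarted in any order."""
    pairs = list(combinations(range(n_files), 2))
    size = -(-len(pairs) // n_jobs)   # ceiling division
    return [pairs[i:i + size] for i in range(0, len(pairs), size)]

# 20,000 files -> 199,990,000 comparisons, the talk's "~200 million"
print(comparison_count(20_000))
# 3,245 files -> 5,263,390 comparisons, the talk's "over 5 million"
print(comparison_count(3_245))
```

This independence is exactly why the workload fits an opportunistic pool: a preempted chunk costs only that chunk's work, not the whole run.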
- Time to run in serial on a desktop machine would have been about 19 hours
- The largest run to date contained 3,245 files, over 5 million song-to-song comparisons; it completed in less than eight hours, versus over 11 days on a desktop
- Queue wait time for 450 processors on IU's Big Red is typically several days; for 3,000+ processors it would be up to a month

Porting to Windows

Visualizing Results

PFASC Contributors
- Scott McCaulay (Project Lead)
- Ray Sheppard (MPI Programming)
- Eric Wernert (Visualization)
- Joe Rinkovsky (Condor)
- Steve Simms (Storage & Workflow)
- Kiduk Yang (Information Retrieval)
- John Walsh (Digital Libraries)
- Eric Isaacson (Music Cognition)

Thank you!