DO MUSIC SIMILARITY MEASURES MAKE SENSE? USING MSD TO EXTRACT TRENDS AND RELATIONSHIPS IN MUSIC SIMILARITY SPACE
By: Asma Rafiq, PhD Student, Centre for Digital Music

Content
Introduction
Motivation
Challenges
Research Questions
Research Plan
Conclusion

My Background
Top of the MS Software Engineering class as a EURECA scholar (Sep 09 – July 10), and scholarship holder under the President Talent Farming Scheme during the Bachelor of IT.
Completed the master’s thesis within one month and produced a publication from it.
Produced another publication based on a project done as part of a master’s-level course module.
Developed a SIS for SU and a WMS as a group project.
Worked as an intern on the Year Book Profile Survey and performed other tasks such as content viewing, case study analysis, and success story collection and review at Pakistan Software Export Board (Govt) Ltd., Ministry of IT.
Poster presentations at the C4DM 10th Anniversary and the 3rd EECS PG Conference.

Publications
Publication based on master’s thesis: Rafiq, A. and Georgieva, O., “Combined Search Trends”, International Conference on Automatics and Informatics, pp. 1–4, Oct 2010, Sofia, Bulgaria.
Publication based on IT Entrepreneurship coursework: Wahid, A., Rafiq, A., Ahmad, F. and Ruskuv, P., “Discovering business opportunities via Search Trends”, International Conference for Entrepreneurship, Innovation and Regional Development, pp. 818–826, May 2010, Novi Sad, Serbia.
Publication on social networks: Mesnage, C.S., Rafiq, A., Dixon, S. and Brixtel, R., “Music Discovery with Social Networks”, Workshop on Music Recommendation and Discovery, ACM RecSys, October 2011, Chicago, IL, USA.

Then, why did I reach this stage?
Started with “Modeling Users' Intentions for the Enhancement of Music Recommendation Systems”.
Online social network profiles seemed like a good starting point for gathering information from which to extract user intent.
The idea was to make “Music recommendations for events using Online Social Networks”.
Garg et al. (2011) estimated empirically that music discovery increases 6 times due to peer influence in SNS.
Various unexpected issues beyond my control delayed the development of the project: it was launched in June instead of May.
Overall, working with social networks without a prior agreement can be extremely problematic.

Contribution to PhD Thesis
A chapter on “Music Recommendation and Discovery using Online Social Networks”.
Two papers will back this work: one is already published at WOMRAD 2011, and another will be submitted next year to a relevant conference such as ACM RecSys 2012.
Next two chapters: using the Million Song Dataset (a recently released, freely available collection of detailed audio features and metadata for a million contemporary popular music tracks) to explore trends and relationships in music similarity space.

Why MSD?
No dependence on data gathering: it is a fixed dataset, and a copy is stored at QMUL (although it needs to be updated with the new datasets from Lastfm and Echonest (Taste Profile), and MusixMatch (lyrics)).
A very large dataset that supports statistically significant results.

Music Data Mining
Music data mining focuses on extracting valuable information from large datasets of music-related data in order to fulfil different user needs, such as retrieval and classification of music.

Why Music Similarity Space?
To investigate whether music similarity is really useful, and to find relationships between items already identified as similar, e.g.:
Do similar artists (according to Echonest) play similar tracks (Last.fm)?
Can we predict whether future relations will appear in a similar fashion?

Motivation
Developing a better understanding of music and of the artists who perform it.
The large dataset can be used to design automated algorithms that replace and support human decision-making about similarity between different items.
Sophisticated analytics of a large dataset can reveal valuable insights that would otherwise remain hidden.

Challenges
Many of the best-performing music similarity estimation techniques suffer from very high computational complexity, as they are based on techniques such as the Kullback-Leibler divergence, Monte Carlo sampling, or the Earth Mover's Distance (EMD).
These techniques are also difficult to scale with standard indexing techniques, as they produce non-metric similarity spaces.
Levy and Sandler (2006) proposed a modified approach based on the Mahalanobis distance, which produces a metric similarity space at a lower computational cost but results in a lower level of performance.

Research Questions
What insights about music similarity can we gain from mining music data?
What are the relationships between similarity metrics in various systems?
What are some of the challenges in processing these extremely large datasets?
How likely are users of a different system (i.e. an undisclosed partner of Echonest) to play the similar songs precomputed by last.fm?

The Echonest Taste Profile user dataset and the Lastfm precomputed similar songs will be used for this purpose.
Each user-track-playcount triple will be matched against Lastfm's similar songs for that track, and the same user will then be looked up again to check whether he/she has listened to any of those similar songs.
Association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases.
Association rule generation is usually split into two separate steps:
First, a minimum support threshold is applied to find all frequent itemsets in a database.
Second, these frequent itemsets and the minimum confidence constraint are used to form rules.
The Apriori algorithm is the best-known algorithm for mining association rules.
It uses a breadth-first search strategy to count the support of itemsets, and a candidate generation function that exploits the downward closure property of support.
It attempts to find subsets which are common to at least a minimum number C of the itemsets.
Apriori uses a "bottom up" approach, where frequent subsets are extended one item at a time (a step known as candidate generation), and groups of candidates are tested against the data.
The algorithm terminates when no further successful extensions are found.

Continued from previous slide…
Apriori will be used to find relationships between the similar songs a user listened to in the two different systems.
The results of this experiment will show whether these two systems (Echonest and Lastfm) recommend similar songs, and how many times users played similar songs.
Why would the user listen to these tracks? What factors make them similar?
We can expand this research by incorporating other features from the MSD, such as year of release, danceability, beat, energy, loudness, tempo, etc., and determine which of these features plays the most important role.
A demo will be presented as a proof of concept.

Research Plan
WORKPLAN
WP1: Comparison of song similarity between Echonest and Lastfm
WP2: Relationship between similar artists (Echonest) and similar tracks (Lastfm)
WP3: Relationship between similar tracks and related tags in Lastfm and MusicBrainz
WP4: Finding relationships between lyrics and tags related to the same track
WP4: Stage 2
WP5: Development of a similarity space which utilises these relationships
[Gantt chart: work packages plotted against months 1 to 8, with bars labelled "Complete" …]

Conclusion
A better understanding of, and new insights into, music similarity with the help of large-scale music-related data.
Development of algorithms that could scale up for commercial systems.
Revealing relationships between various music-related features to determine what makes music sound similar.

Thank you for your attention! Questions?

References
Garg, R., Smith, M.D. and Telang, R., “Discovery of Music through Peers in an Online Community”, 44th Hawaii International Conference on System Sciences (HICSS), pp. 1–10, Jan 2011. doi: 10.1109/HICSS.2011.168. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5718904&isnumber=5718420
Levy, M. and Sandler, M., “Lightweight measures for timbral similarity of musical audio”, Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia, pp. 27–36, 2006.
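
Backup: Apriori sketch
The two-step association rule procedure described under Research Questions (frequent itemsets by minimum support, then rules by minimum confidence) can be sketched roughly as below. This is a minimal illustration only, not the project's implementation: the function names `apriori` and `association_rules`, the toy `histories` data and the song identifiers are all hypothetical, and each "transaction" is assumed to be the set of songs one user has played.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets, and keep a
        # k-itemset only if all its (k-1)-subsets are frequent
        # (downward closure property of support).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Test the whole group of candidates against the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1  # the loop stops when no extension survived
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above threshold."""
    rules = []
    for itemset, support in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support count is always available here.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


# Hypothetical listening histories: one set of song ids per user.
histories = [
    {"song_a", "song_b", "song_c"},
    {"song_a", "song_b"},
    {"song_a", "song_c"},
    {"song_b", "song_c"},
    {"song_a", "song_b", "song_c"},
]

if __name__ == "__main__":
    frequent = apriori(histories, min_support=3)
    for ante, cons, conf in association_rules(frequent, min_confidence=0.75):
        print(sorted(ante), "->", sorted(cons), round(conf, 2))
```

In the planned experiment, a rule such as "users who played song X also played song Y" could then be compared against the Lastfm similar-song lists. At the scale of the Taste Profile data, a plain in-memory version like this would probably need to be replaced by a more scalable implementation.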