International Journal of Engineering Trends and Technology (IJETT) – Volume 8 Number 5 - Feb 2014

A Novel Co-Clustering Mechanism for Generation of Optimal Clusters

Kalpana Palla1, P. Rajasekhar2
1 M.Tech Scholar, 2 Assistant Professor
1,2 Dept. of CSE, Avanthi Institute of Engineering and Technology, Visakhapatnam.

Abstract: Co-clustering is an important research issue in the field of data mining, and musical data is one example domain for this proposal. We propose an efficient data clustering mechanism based on an incremental clustering algorithm with a genetic feature. Clustering is a mostly unsupervised procedure, and the majority of clustering algorithms depend on certain assumptions in order to define the subgroups present in a data set. In our approach we perform incremental clustering on a music dataset consisting of artists and tags to obtain optimal clusters, in contrast to the traditional approaches.

I. INTRODUCTION

Clustering is the process of grouping similar objects based on the similarity between the data objects. This similarity can be measured by distance, semantic similarity, or any other non-trivial measure that captures the relation between the data objects. Grouping musical data has attracted growing interest in recent years, based on the feature sets of the musical data objects. Even though various approaches have been released over years of research, every approach has its own advantages and disadvantages, such as random selection of the centroids, local optima, and implementation complexity when dealing with tree structures, among others. In real scenarios, data objects need not all have the same dimensionality. Music artist similarity has been an active research topic in music information retrieval for a long time, since it is especially useful for music recommendation and organization [1, 2].
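The notion that similarity can be measured by distance between data objects can be sketched as follows. This is a minimal illustration, not the paper's method; the artist feature vectors are hypothetical placeholder values (e.g., tag counts or audio features), and Euclidean distance is used only as one example of a distance measure.

```python
from math import sqrt

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical artist feature vectors (illustrative values only).
artist_a = [1.0, 0.0, 2.0]
artist_b = [1.0, 1.0, 2.0]
artist_c = [5.0, 4.0, 0.0]

# A smaller distance means greater similarity: artist_a is more
# similar to artist_b than to artist_c under this measure.
print(euclidean(artist_a, artist_b))  # 1.0
print(euclidean(artist_a, artist_c))
```

Any other non-trivial measure (Manhattan distance, semantic similarity between tags, and so on) can be substituted for `euclidean` without changing the grouping logic built on top of it.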
Many characteristics can be brought into consideration for defining similarity, e.g., sound, lyrics, genre, style, and mood. Methods for calculating artist similarity include recent proposals based on the similarity information provided by the All Music Guide website (http://www.allmusic.com) as well as those based on the user access history (e.g., see [10]). Although there has been considerable effort towards developing effective and efficient methods for calculating artist similarity, several challenges still exist. First, artist similarity varies considerably when considering different aspects of artists such as genre, mood, style, culture, and acoustics. Second, the user access history data are often very sparse and hard to acquire. Third, even if we can obtain the categorical descriptions of two artists using All Music Guide, comparing the descriptions is not trivial, since there are semantic similarities among different descriptions. For example, given two mood terms witty and thoughtful, we cannot simply quantify their similarity as 0 just because they are different words, or as 1 because they are synonyms.
ISSN: 2231-5381

There are a number of key concepts to consider when comparing these approaches. The cold start problem refers to the fact that songs that are not annotated cannot be retrieved. This problem is related to popularity bias, in that popular songs (in the short head) tend to be annotated more thoroughly than unpopular songs (in the long tail) [11]. This often leads to a situation in which a short-head song is ranked above a long-tail song despite the fact that the long-tail song may be more semantically relevant. We prefer an approach that avoids the cold start problem (e.g., autotagging).
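The point that mood-term similarity should be graded rather than binary can be made concrete with a toy lookup. The similarity scores below are invented for illustration only; in practice such values would have to come from a semantic resource or from tag co-occurrence statistics, not a hand-written table.

```python
# Hand-specified, purely illustrative similarity scores between mood terms.
MOOD_SIMILARITY = {
    frozenset(["witty", "thoughtful"]): 0.4,
    frozenset(["witty", "humorous"]): 0.9,
    frozenset(["thoughtful", "aggressive"]): 0.1,
}

def mood_similarity(a, b):
    """Graded similarity in [0, 1]; identical terms score 1.0."""
    if a == b:
        return 1.0
    # Unordered pair lookup, defaulting to 0.0 for unrelated terms.
    return MOOD_SIMILARITY.get(frozenset([a, b]), 0.0)

# Neither 0 ("different words") nor 1 ("synonyms"), but somewhere in between.
print(mood_similarity("witty", "thoughtful"))  # 0.4
```

The `frozenset` key makes the lookup symmetric, so `mood_similarity("thoughtful", "witty")` returns the same score.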
If this is not possible, we prefer approaches in which we can explicitly control which songs are annotated (e.g., survey, games), rather than an approach in which only the more popular songs are annotated (e.g., social tags, web documents). A strong labeling [3] is when a song has been explicitly labeled or not labeled with a tag, depending on whether or not the tag is relevant. This is opposed to a weak labeling, in which the absence of a tag from a song does not necessarily indicate that the tag is not relevant. For example, a song may feature drums but not be explicitly labeled with the tag "drum". Weak labeling is a problem if we want to design an MIR system with high recall, or if our goal is to collect a training data set for a supervised autotagging system that uses discriminative classifiers (e.g., [4, 7]). It is also important to consider the size, structure, and extensibility of the tag vocabulary. In the context of text-based music retrieval, we consider a large and diverse set of semantic tags, where each tag describes some meaningful attribute or characterization of music. In this paper, we limit our focus to tags that can be used consistently by a large number of individuals when annotating novel songs based on the audio content alone. This does not include tags that are personal (e.g., "seen live"), judgmental (e.g., "horrible"), or that represent external knowledge about the song (e.g., geographic origins of an artist). It should be noted that these tags are also useful for retrieval (and recommendation) and merit additional attention from the MIR community.
http://www.ijettjournal.org Page 246

II.
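The strong/weak labeling distinction above can be sketched as two ways of storing a song-tag matrix. This is an illustrative data layout of my own, not a structure from the paper: under strong labeling every (song, tag) pair carries an explicit 0 or 1, whereas under weak labeling only positive labels are recorded and absence means "unknown".

```python
# Strong labeling: every (song, tag) pair is explicitly 1 (relevant) or 0 (not).
strong = {
    ("song1", "drum"): 1,
    ("song1", "jazz"): 0,
}

# Weak labeling: only positive labels are recorded; absence is "unknown".
weak = {
    ("song1", "jazz"): 1,
}

def weak_lookup(labels, song, tag):
    """Return 1 for a recorded positive label, or None when unknown."""
    return labels.get((song, tag))  # None: not necessarily irrelevant

# The song may well feature drums even though the tag is absent.
print(weak_lookup(weak, "song1", "drum"))  # None
```

This is exactly why weak labeling hurts a discriminative classifier: the missing pairs cannot be used as reliable negative training examples.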
RELATED WORK

Music data usually consist of artists and their respective styles, and our method uses the relevance matrix between the songs related to each artist. It initially computes the distances between the data objects, i.e., the data retrieved from the relevance matrix, and places the data points in the nearest clusters until the termination condition is met. Existing constrained clustering methods include constrained K-means clustering, constrained spectral clustering, and constrained clustering using non-negative matrix factorizations; little has been done on utilizing constraints for hierarchical clustering. Recently, a few works have incorporated constraints into hierarchical clustering (e.g., by extending a partially known hierarchy with the constraints to a full hierarchy, or by modifying the order of the cluster merging process).

III. PROPOSED SYSTEM

Co-clustering clusters more than one type of data simultaneously, such as documents and URLs in text clustering, or styles and artists in music data clustering. Our hierarchical approach merges similar data based on the relevance matrix and the similarity matrix between the data points. The hierarchical approach generates optimal clusters by merging similar data objects after forming the data relations between the objects. Hierarchical clustering is the generation of tree-like cluster structures without user supervision. Hierarchical clustering algorithms organize input data either bottom-up (agglomerative) or top-down (divisive) [12]. In general, hierarchical agglomerative clustering is used more frequently than hierarchical divisive clustering. Co-clustering refers to clustering of more than one data type. Dhillon proposes bipartite spectral graph partitioning approaches to co-cluster words and documents.
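The compute-distance-then-place-in-nearest-cluster step described above can be sketched as follows. The relevance matrix values are hypothetical, and Manhattan (L1) distance is used since that is the measure named in the proposed algorithm; this is a sketch of the assignment step only, not the full iterative procedure.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two rows of the relevance matrix."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Toy artist-by-style relevance matrix: entry j of each row indicates how
# strongly that artist relates to style j (hypothetical values).
relevance = {
    "artist1": [3, 0, 1],
    "artist2": [2, 0, 1],
    "artist3": [0, 4, 0],
}

def assign_to_nearest(points, centroids):
    """Place each data point in the cluster of its nearest centroid."""
    clusters = {c: [] for c in centroids}
    for name, row in points.items():
        nearest = min(centroids, key=lambda c: manhattan(row, centroids[c]))
        clusters[nearest].append(name)
    return clusters

centroids = {"c0": [3, 0, 1], "c1": [0, 4, 0]}
print(assign_to_nearest(relevance, centroids))
# artist1 and artist2 land in c0; artist3 lands in c1
```

In the full procedure this assignment would be repeated, with centroids reselected, until the termination condition is met.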
Long et al. propose a general principled model, called relation summary network, for co-clustering heterogeneous data presented as a k-partite graph. While hierarchical clustering deals with only one type of data, and co-clustering produces only one level of data organization, hierarchical co-clustering aims at simultaneously constructing hierarchical structures for two or more data types; that is, it attempts to achieve the function of both hierarchical clustering and co-clustering. Because of this unique nature, hierarchical co-clustering is receiving special attention from researchers [14], [15]. Xu et al. proposed a hierarchical divisive co-clustering algorithm [17] to simultaneously find document clusters and the associated word clusters. Shao et al. [16] incorporated this hierarchical divisive co-clustering algorithm into a novel artist similarity quantifying framework, for the purpose of assisting artist similarity quantification by utilizing the style and mood cluster information. In their framework, the artist similarity is based on style similarity and mood similarity. Even though this hierarchical divisive co-clustering method exists, to the best of our knowledge few researchers have studied hierarchical agglomerative co-clustering methods. In recent years, much work has been done on constrained clustering: integrating various forms of background knowledge into the clustering process. Existing constrained clustering methods have focused on the use of background information in the form of instance-level "must-link" and "cannot-link" constraints, which, as the naming suggests, assert that a pair of data instances must be in the same cluster or should be in distinct clusters, respectively.
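The must-link/cannot-link constraints just described can be checked against a candidate clustering with a few lines of code. This is a generic illustration of the constraint semantics, not an algorithm from the surveyed work; the artist names and cluster ids are hypothetical.

```python
def satisfies_constraints(assignment, must_link, cannot_link):
    """Check a cluster assignment against instance-level constraints.

    assignment maps each instance to a cluster id; must_link pairs must
    share a cluster, cannot_link pairs must be in distinct clusters.
    """
    for a, b in must_link:
        if assignment[a] != assignment[b]:
            return False
    for a, b in cannot_link:
        if assignment[a] == assignment[b]:
            return False
    return True

# Hypothetical assignment of four artists to two clusters.
assignment = {"a1": 0, "a2": 0, "a3": 1, "a4": 1}
print(satisfies_constraints(assignment, [("a1", "a2")], [("a1", "a3")]))  # True
print(satisfies_constraints(assignment, [("a1", "a3")], []))              # False
```

A constrained clustering algorithm uses such a check (or a soft-penalty variant of it) to steer cluster assignments or merge decisions toward assignments where the function returns True.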
Most of the constrained clustering algorithms in the literature are designed for partitional clustering methods, e.g., constrained K-means clustering.

In this paper we propose an efficient clustering algorithm for clustering music data with respect to artists and tags. In the proposed approach we combine the k-means algorithm with a genetic algorithm to obtain optimal clusters; genetic algorithms provide solutions for problems such as NP-hard problems. In place of the traditional random selection of centroids, we propose a novel approach, i.e., intra-cluster variances, for specifying the number of centroids. Initially we construct the relevance matrix between the artists and the styles or moods of the music objects, then compute the intra-cluster variance (distance) between the data objects, compute the minimum distance with respect to the number of clusters specified by the user, and then forward the result to the evolutionary approach for optimal clusters.

Algorithm: Musical Co-Clustering
Step 1: Initialize the musical data with artists and styles or moods
Step 2: Read the input number of clusters
Step 3: Compute the relevance matrix between artists and styles/moods
Step 4: for i := 0 to Max_Number_Of_Iterations
Step 5: Compute the Manhattan distance between data points
Step 6: Store the minimum distance and minimum index
Step 7: Group the similar data points
Step 8: Return the clusters
Step 9: Terminate

Compute the relevance matrix between artists and styles/moods, which indicates the number of relations between data points; compute the Manhattan distance between the data points and specify the minimum number of clusters. Initially, select the centroids at random and compute the Manhattan distance between all the data points, then compute the minimum distance and minimum index of the data points and place each point in its respective cluster. Continue the same process, selecting another centroid (it should not repeat a previous centroid), until the maximum number of
iterations is reached.

For optimal performance, apply an agglomerative step by merging data points that share the same set of data points: combine the data points that have the same distance and again cluster the data in a hierarchical manner (a tree structure). The following diagram shows the clusters formed after merging the data points.

IV. CONCLUSION

Finally, we conclude that we have proposed an efficient novel co-clustering of musical data. The user need not specify the number of clusters, since it can be derived through intra-cluster variances, and for the distance measure we used the genetic feature for best fitness, i.e., optimal clusters.

REFERENCES
[1] J. J. Aucouturier, F. Pachet, P. Roy, and A. Beurive. Signal + context = better classification. In ISMIR, 2007.
[2] L. Barrington, M. Yazdani, D. Turnbull, and G. Lanckriet. Combining feature kernels for semantic music retrieval. In ISMIR, 2008.
[3] G. Carneiro, A. B. Chan, P. J. Moreno, and N. Vasconcelos. Supervised learning of semantic classes for image annotation and retrieval. IEEE PAMI, 29(3):394–410, 2007.
[4] O. Celma, P. Cano, and P. Herrera. Search sounds: An audio crawler focused on weblogs. In ISMIR, 2006.
[5] S. Clifford. Pandora's long strange trip. Inc.com, 2007.
[6] J. S. Downie. Music information retrieval evaluation exchange (MIREX), 2005.
[7] D. Eck, P. Lamere, T. Bertin-Mahieux, and S. Green. Automatic generation of social tags for music recommendation. In Neural Information Processing Systems Conference (NIPS), 2007.
[8] W. Glaser, T. Westergren, J. Stearns, and J. Kraft. Consumer item matching method and system. US Patent Number 7003515, 2006.
[9] P. Knees, T. Pohle, M. Schedl, D. Schnitzer, and K. Seyerlehner. A document-centered approach to a natural language music search engine. In ECIR, 2008.
[10] P. Knees, T. Pohle, M. Schedl, and G. Widmer.
A music search engine built upon audio-based and web-based similarity measures. In ACM SIGIR, 2007.
[11] P. Lamere and O. Celma. Music recommendation tutorial notes. ISMIR Tutorial, September 2007.
[12] E. L. M. Law, L. von Ahn, and R. Dannenberg. Tagatune: a game for music and sound annotation. In ISMIR, 2007.
[13] M. Levy and M. Sandler. A semantic space for music derived from social tags. In ISMIR, 2007.
[14] M. Mandel and D. Ellis. A web-based game for collecting music metadata. In ISMIR, 2007.
[15] C. McKay and I. Fujinaga. Musical genre classification: Is it worth pursuing and how can it be improved? In ISMIR, 2006.
[16] F. Miller, M. Stiksel, and R. Jones. Last.fm in numbers. Last.fm press material, February 2008.
[17] M. Sordo, C. Laurier, and O. Celma. Annotating music collections: How content-based similarity helps to propagate labels. In ISMIR, 2007.
[18] D. Turnbull. Design and Development of a Semantic Music Discovery Engine. PhD thesis, UC San Diego, 2008.
[19] D. Turnbull, L. Barrington, D. Torres, and G. Lanckriet. Semantic annotation and retrieval of music and sound effects. IEEE TASLP, 16(2), 2008.
[20] D. Turnbull, R. Liu, L. Barrington, and G. Lanckriet. Using games to collect semantic information about music. In ISMIR '07, 2007.

BIOGRAPHIES

P. Rajasekhar completed his M.Tech at GITAM University. He is working as an Assistant Professor in the Dept. of Computer Science and Engineering, Avanthi Institute of Engineering and Technology, affiliated to JNTU, Kakinada. He has ten years of teaching experience. His areas of interest are Software Engineering and Data Modeling.

Kalpana Palla completed her B.Tech (Information Technology) at Sri Vasavi Engineering College, affiliated to JNTU, Kakinada. She is pursuing an M.Tech (Software Engineering) at Avanthi Institute of Engineering and Technology, affiliated to JNTU, Kakinada. Her areas of interest are Software Engineering and Bioinformatics.