STAT 503 Case Study: Clustering of music clips

1 Description

This data was collected by Dr Cook from her own CDs. Using a Mac, she read each track into the music editing software Amadeus II, then snipped and saved the first 40 seconds as a WAV file. (WAV is an audio format developed by Microsoft and IBM, commonly used on Windows, but it is becoming less popular.) These files were read into R using the package tuneR, which converts the audio file into numeric data. All of the CDs contained left and right channels, and variables were calculated on both channels. The resulting data has 62 rows (cases) and 7 columns (variables).

• LVar, LAve, LMax: variance, average and maximum of the frequencies of the left channel.
• LFEner: an indicator of the amplitude or loudness of the sound.
• LFreq: median of the locations of the 15 highest peaks in the periodogram.

There are 11 tracks by Abba, 11 by the Beatles and 10 by the Eels, which would be considered Rock, and 13 tracks by Vivaldi, 6 by Mozart and 8 by Beethoven, considered Classical. There are also 3 tracks by Enya, considered New Wave.

The main question we want to answer is: "Can we group the tracks into a small number of clusters according to their similarity on audio characteristics?" This information might be used to arrange tracks on a digital music player. Other questions of interest might be:

• Do the rock tracks have different characteristics than classical tracks?
• How does Enya compare to rock and classical tracks?
• Are there differences between the tracks of different artists?

2 Plan for Analysis

Approach: Summary statistics (marginal and conditional)
  Reason: Extract location/scale information.
  Type of questions addressed: "How are rock tracks different on average than classical tracks?" "What is the average LAve for Abba relative to other Artists?"

Approach: Plots
  Reason: Explore data distributions.
  Type of questions addressed: "Are there unusual tracks?" "Is there any obvious clustering of the tracks?"

Approach: Numerical clustering
  Reason: Grouping the tracks into clusters of similar audio attributes. Use hierarchical, k-means, model-based and self-organizing maps.
  Type of questions addressed: "Which tracks might be considered alike?"

3 Results

3.1 Summary Statistics

        LVar        LAve     LMax        LFEner   LFreq
Mean    1.99×10^7   -7.81    2.25×10^4   104.03   231.39
SD      2.64×10^7   47.22    8.76×10^3     5.48   176.69

Table 1: Overall means and standard deviations of the variables.

Artist      LVar        LAve     LMax        LFEner   LFreq
Abba        8.52×10^6   -81.5    2.35×10^4   103      135
Beatles     4.45×10^7   -5.99    2.76×10^4   108      147
Eels        5.11×10^7    4.59    3.13×10^4   108      181
Beethoven   7.61×10^6   -0.74    2.11×10^4   101      350
Mozart      4.69×10^6   -5.94    1.89×10^4   101      396
Vivaldi     3.00×10^6   39.1     1.45×10^4   102      305
Enya        5.03×10^7   -11.8    1.61×10^4   103       95

Table 2: Means of the variables by artist.

The classical tracks in general have lower LVar than rock tracks. Abba has substantially lower LAve on average than all other artists, and Vivaldi has substantially higher values on average. The LMax values are similar on average for all artists. Beatles and Eels have higher LFEner values on average. Classical tracks have higher LFreq values on average than rock tracks.

3.2 Plots

The dotplots in Figure 1 show the distribution of values for each artist. Abba tracks have unusually low values of LAve. Two Eels tracks have unusually large LVar values. One Beatles track has an unusually low LFEner value.

[Figure 1: Dotplots of each variable by Artist. Panels: LVar, LAve, LMax, LFEner, LFreq; rows: Abba, Beatles, Beethoven, Eels, Enya, Mozart, Vivaldi.]

This is a snapshot from the tour that reveals a number of features in the data. Saturday Morning and V6 are two unusual tracks that are simply outliers. Several tracks are different from their type of music: Hey Jude, B4, B8. There is some obvious clustering.
The Abba tracks are distinguishable from the tracks of other artists, mostly due to LAve. There is a cluster of rock tracks, a mixture of Eels and Beatles tracks.

4 Cluster Analysis

4.1 Hierarchical

[Figure: Dendrograms from hclust on the raw data (music.dist). Panels: Ward, Single, Complete.]

Ward's linkage suggests two clusters are suitable to summarize the data. This would result in one cluster of 14 purely rock tracks, and a second cluster of 48 mixed tracks. A three cluster solution would break the large cluster into two, one with 12 tracks (8 rock, 3 classical, 1 new wave) and the other with 36 tracks (10 rock, 24 classical, 2 new wave).

With single linkage, individual tracks are sequentially peeled off the pack, illustrating the skewed nature of the data. Saturday Morning and Girl are singleton clusters. The other 12 tracks from the first Ward's linkage cluster are grouped together by single linkage, too.

Saturday Morning is peeled off into a singleton cluster by complete linkage, and the other 13 tracks from the first Ward's linkage cluster are grouped together by complete linkage. The second cluster of 12 tracks (8 rock, 3 classical, 1 new wave) from Ward's linkage is also grouped together by complete linkage.

    music.dist<-dist(d.music[,-c(1:2)])
    music.hc1<-hclust(music.dist,method="ward")
    music.hc2<-hclust(music.dist,method="single")
    music.hc3<-hclust(music.dist,method="complete")
    par(mfrow=c(3,1),mar=c(1,2,2,2))
    plot(music.hc1,main="Ward",xlab=" ")
    text(music.hc1)

Comparison of methods, using confusion tables

    cl.14<-cutree(music.hc1,4)
    cl.24<-cutree(music.hc2,4)
    cl.34<-cutree(music.hc3,4)
    table(cl.14,cl.24)
    table(cl.14,cl.34)
    table(cl.34,cl.24)

When the dendrograms are cut into 4 clusters the results are as follows for the different methods.

              Single
Ward        1    2    3    4
   1       12    0    0    0
   2       36    0    0    0
   3        0   12    0    1
   4        0    0    1    0

              Complete
Ward        1    2    3    4
   1       12    0    0    0
   2       36    0    0    0
   3        0   10    0    3
   4        0    0    1    0

              Single
Complete    1    2    3    4
   1       48    0    0    0
   2        0   10    0    0
   3        0    0    1    0
   4        0    2    0    1

It looks like there is a big difference between the methods, but it is due mostly to the singleton clusters peeled off in single and complete linkage.

When the dendrograms are cut into 5 clusters the results are as follows for the different methods.

              Single
Ward        1    2    3    4    5
   1       12    0    0    0    0
   2       36    0    0    0    0
   3        0    4    0    0    0
   4        0    0    1    0    0
   5        0    0    0    1    8

              Complete
Ward        1    2    3    4    5
   1       12    0    0    0    0
   2        0   36    0    0    0
   3        0    0    4    0    0
   4        0    0    0    1    0
   5        0    0    6    0    3

              Single
Complete    1    2    3    4    5
   1       12    0    0    0    0
   2       36    0    0    0    0
   3        0    4    0    0    6
   4        0    0    1    0    0
   5        0    0    0    1    2

Ward's linkage and complete linkage agree the most. There are two sets of tracks which both methods agree are cohesive clusters (labeled 1 and 2 by both methods) with 12 and 36 tracks respectively.

Because there is a difference in variance between the variables, and we are using Euclidean distance, we really should standardize the data before clustering it. These are the results for standardized data.

[Figure: Dendrograms from hclust on the standardized data (music.dist). Panels: Ward, Single, Complete.]

Ward's linkage suggests five clusters are suitable to summarize the data. Two of the clusters would be exclusively rock tracks. The remaining clusters would contain a mix of rock, classical and new wave.

Single linkage peels a number of individual tracks off the pack, finding the outliers in the data. V6, Saturday Morning, V2, V7, V5, Hey Jude, V3 and V1 are peeled off before larger groups are formed.

Complete linkage also peels off some of the unusual tracks: V6, Saturday Morning, V2, V7, V5, Hey Jude.

When the dendrograms are cut into 5 clusters the results are as follows for the different methods.

              Single
Ward        1    2    3    4    5
   1        8    0    0    0    0
   2       11    0    0    0    0
   3       12    0    1    0    0
   4       14    1    0    1    0
   5       13    0    0    0    1

              Complete
Ward        1    2    3    4    5
   1        8    0    0    0    0
   2       11    0    0    0    0
   3       12    0    0    1    0
   4       12    2    2    0    0
   5        4    0    0    0   10

              Single
Complete    1    2    3    4    5
   1       47    0    0    0    0
   2        0    1    0    1    0
   3        2    0    0    0    0
   4        0    0    1    0    0
   5        9    0    0    0    1

There is little agreement at 5 clusters, because single and complete linkage have mostly peeled off singletons and small clusters up to this point. There is one group of 10 tracks where Ward's and complete linkage agree.

We investigate the solution by linking the cluster identities with a tour plot. The 10 tracks where Ward's linkage and complete linkage agree form a tight group in the 5D data space.

4.2 k-means

k-means clustering, with k = 2, ..., 14, is computed on the data. The 5 cluster solution is given below.

              k-means
Wards       1    2    3    4    5
   1        0    0    0    8    0
   2        0    0    9    2    0
   3        3    1    5    1    3
   4        0    0    0    0   16
   5       14    0    0    0    0

The results match for roughly 47 out of 62 cases.
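The count of 47 can be checked directly from the confusion table. The sketch below is only an illustration: it retypes the table by hand, and the names conf, row_best, col_best, mutual and matched are introduced here rather than taken from the appendix. It pairs a Ward cluster with a k-means cluster only when each is the other's largest overlap, then adds up those overlaps.

```r
# Ward (rows) by k-means (columns) confusion table for the 5-cluster solutions,
# retyped from the table in the text.
conf <- matrix(c( 0, 0, 0, 8,  0,
                  0, 0, 9, 2,  0,
                  3, 1, 5, 1,  3,
                  0, 0, 0, 0, 16,
                 14, 0, 0, 0,  0),
               nrow = 5, byrow = TRUE,
               dimnames = list(Ward = 1:5, kmeans = 1:5))

# Largest overlap for each Ward cluster, and for each k-means cluster.
row_best <- apply(conf, 1, which.max)
col_best <- apply(conf, 2, which.max)

# Ward clusters whose best k-means partner also picks them back.
mutual <- which(col_best[row_best] == 1:5)

# Number of tracks on which the paired clusters agree.
matched <- sum(conf[cbind(mutual, row_best[mutual])])

mutual      # Ward clusters with a clear k-means partner
matched     # tracks in agreement
sum(conf)   # total number of tracks
```

Ward cluster 3 has no clear partner under this rule, because its largest overlap is with a k-means cluster that is already a better match for Ward cluster 2.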
The mapping from Wards to k-means is as follows:

    1 → 4 (8 tracks)
    2 → 3 (9 tracks)
    4 → 5 (16 tracks)
    5 → 1 (14 tracks)

This allows us to rearrange the confusion table.

              k-means
Wards       1    2    3    4    5
   5       14    0    0    0    0
   3        3    1    5    1    3
   2        0    0    9    2    0
   1        0    0    0    8    0
   4        0    0    0    0   16

The 5-cluster results of k-means agree substantially with Ward's linkage hierarchical clustering. The similar groupings are:

Wards 5, k-means 1: All in a Days Work, Saturday Morning, Love of the Loveless, Girl, Rock Hard Times, Lone Wolf, Wrong About Bobby, I Want to Hold Your Hand, Cant Buy Me Love, I Feel Fine, Ticket to Ride, Help, Yellow Submarine, Penny Lane
Wards 2, k-means 3: The Winner, V4, V8, B2, The Memory of Trees, Pax Deorum, V11, V12, V13
Wards 1, k-means 4: Dancing Queen, Knowing Me, Take a Chance, Mamma Mia, Lay All You, Super Trouper, Money, Waterloo
Wards 4, k-means 5: V2, V5, V7, V10, M1, M2, M3, M4, M5, M6, B1, B3, B4, B5, B6, HeyJude

The results are interesting: 8 of the 11 Abba tracks are in one cluster (W1,K4), one cluster contains purely rock tracks (W5,K1), and one cluster consists entirely of classical tracks except for the unusual Beatles track Hey Jude (W4,K5).

    library(mva)   # mva was merged into stats in R 1.9.0; omit this line in later versions
    music.km5<-kmeans(d.music.std[,-c(1,2)],5)
    table(cl.15,music.km5$cluster)

4.3 SOM

It is important to use standardized data, because SOM uses Euclidean distance. After trying several sizes of SOM, we settled on a 6 × 6 grid. This is about the largest size map we can fit with 62 data points. We used the linear initialization, both gaussian and bubble neighborhoods, and differing numbers of iterations. The favorite model is summarized below.
[Figure: SOM Map. Track names plotted (jittered) at their node positions on the 6 × 6 grid; axes x and y.]

    music.som<-som(d.music.std[,-c(1:2)],6,6,
      neigh="bubble",rlen=1000)
    xmx<-jitter(music.som$visual$x,factor=3)
    xmy<-jitter(music.som$visual$y,factor=3)
    par(mfrow=c(1,1),pty="s")
    plot(xmx,xmy,type="n",pch=16,xlab="x",
      ylab="y",main="SOM Map",
      xlim=c(-0.5,6),ylim=c(-0.5,6))
    text(xmx,xmy,dimnames(d.music)[[1]])

The rock tracks are mostly in the upper right, and the classical tracks are mostly in the lower left. Unusual tracks Girl and Saturday Morning are at one corner. The Abba tracks are mostly clustered together. Vivaldi tracks seem to be the most different classical tracks, on the lower left fringe.

4.4 Model-based

It isn't important to standardize the data before model-based clustering because the model accounts for different variances between variables. We explore the fit for five variance-covariance parametrizations, EII, VII, EEE, EEV and VVV, for the number of clusters ranging from 1 to 36. The upper limit matches the number of clusters used in SOM. The BIC values for the different models are plotted below.
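For reading the BIC values, it may help to recall how the criterion is defined. This is a sketch under the assumption that the values follow the sign convention of the mclust software listed in the references, under which larger (less negative) values indicate a better model; the symbols M, G, k and \hat{L} are notation introduced here.

```latex
\[
\mathrm{BIC}_{M,G} \;=\; 2 \log \hat{L}_{M,G} \;-\; k_{M,G} \, \log n
\]
```

Here \hat{L}_{M,G} is the maximized likelihood for parametrization M with G clusters, k_{M,G} is the number of free parameters in that model, and n = 62 is the number of tracks.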
[Figure: BIC against number of clusters. Top panel: all five models (1 EII, 2 VII, 3 EEE, 4 EEV, 5 VVV). Bottom panel: elliptical models only (1 EEE, 2 EEV).]

According to BIC the elliptical models are much better than the spherical models (top plot). Examining only the elliptical models (bottom plot) would suggest that EEE with 14 clusters is the best model. The BIC values for EEE are fairly flat from 1 to 13 clusters, peak at 14 and then drop. For the EEE model, 14 clusters looks best, followed by 10-12, 7, and 1-2. The EEV model has an interesting range of BIC values for different numbers of clusters: 1 cluster is as good as 9 clusters and much better than any other choice.

    music.mc<-EMclust(d.music[,-c(1:2)],1:36,c("EII","VII","EEE","EEV","VVV"))
    par(pty="m",mfrow=c(2,1))
    plot(music.mc)
    legend(1,-6300,col=c(1:5),lty=c(1:5),
      legend=c("1 EII","2 VII","3 EEE","4 EEV","5 VVV"),bg="white")
    music.mc<-EMclust(d.music[,-c(1:2)],1:15,c("EEE","EEV"))
    plot(music.mc)
    abline(h=seq(-5610,-5400,by=10),col="gray80")
    legend(1,-5550,col=c(1:2),lty=c(1:2),
      legend=c("1 EEE","2 EEV"),bg="white")
    box()

The 14 clusters for the EEE model are (roughly organized by similarity of cluster mean):

           Cluster Means
Cluster    LVar        LAve     LMax        LFEner   LFreq
11         1.3×10^8     50.2    3.3×10^4    114       41
12         8.3×10^7     -2.8    3.1×10^4    112      246
10         4.2×10^7     -4.3    3.2×10^4    108      108
13         5.9×10^7     -3.5    3.1×10^4    111      160
14         2.4×10^7    -20.1    2.9×10^4    107      232
 1         1.6×10^7    -24.2    2.7×10^4    106      169
 2         1.4×10^7    216.2    3.0×10^4    105      198
 4         7.2×10^6    -83.0    2.7×10^4    102       92
 3         6.5×10^6     17.5    2.3×10^4    102      274
 5         4.7×10^6     14.2    1.8×10^4     86      209
 6         3.1×10^6      1.6    2.1×10^4     98      552
 9         2.4×10^6    -21.3    1.3×10^4    103      233
 7         5.3×10^5      7.7    8.1×10^3     99      566
 8         5.8×10^5     -9.2    5.7×10^3    105      222

Names of tracks in each cluster:

11: Saturday Morning
12: Girl, Cant Buy Me Love
10: All in a Days Work, Love of the Loveless, Wrong About Bobby, Yellow Submarine
13: Rock Hard Times, Lone Wolf, I Want to Hold Your Hand, I Feel Fine, Ticket to Ride, Help, Penny Lane
14: The Good Old Days, Love Me Do, Yesterday, B4, Waterloo
 1: Dancing Queen, Agony, Eleanor Rigby, B8, Anywhere Is
 2: V6
 4: Knowing Me, Take a Chance, Mamma Mia, Lay All You, Super Trouper, Money
 3: V1, V3, V9, M3, M6, Restraining, B1, B3, B7, V13
 5: V5, HeyJude
 6: V7, M4, B5
 9: I Have A Dream, SOS, M1, M2, M5, The Memory of Trees, Pax Deorum, V11
 7: V2, V10, B6
 8: The Winner, V4, V8, B2, V12

The common variance-covariance matrix (rows and columns in the order LVar, LAve, LMax, LFEner, LFreq) is

\[
S = \begin{pmatrix}
8.5 \times 10^{12} & -1.6 \times 10^{7} & 7.2 \times 10^{8} & 1.2 \times 10^{6} & 5.1 \times 10^{7} \\
-1.6 \times 10^{7} & 5.9 \times 10^{2} & -2.3 \times 10^{4} & 2.8 \times 10^{0} & 2.2 \times 10^{2} \\
7.2 \times 10^{8} & -2.3 \times 10^{4} & 5.0 \times 10^{6} & -2.0 \times 10^{3} & 8.4 \times 10^{4} \\
1.2 \times 10^{6} & 2.8 \times 10^{0} & -2.0 \times 10^{3} & 3.4 \times 10^{0} & 2.4 \times 10^{1} \\
5.1 \times 10^{7} & 2.2 \times 10^{2} & 8.4 \times 10^{4} & 2.4 \times 10^{1} & 1.6 \times 10^{4}
\end{pmatrix}
\]

The unusual tracks discovered earlier, V6 and Saturday Morning, are in singleton clusters. Hey Jude, the unusual Beatles song, is grouped with the classical tracks. There are some similar groupings to previous methods. All in a Days Work, Love of the Loveless, Wrong About Bobby, Yellow Submarine, Rock Hard Times, Lone Wolf, I Want to Hold Your Hand, I Feel Fine, Ticket to Ride, Help, Penny Lane, The Good Old Days, Love Me Do and Yesterday are grouped close together. The Abba songs for the most part are grouped together, although not as neatly as in SOM.

There is some disagreement with SOM. V6 is considered a singleton by model-based clustering, but is more closely aligned with the classical tracks in SOM. Agony is seen to be similar to Dancing Queen and Eleanor Rigby by model-based clustering, but closer to the classical tracks by SOM. Perhaps the biggest difference is that SOM puts the Abba tracks all very close to each other, whereas model-based clustering breaks them up over five clusters, although SOM would also spread the Abba tracks across 4 nodes.
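One way to judge whether 14 clusters is plausible for 62 tracks is to count the free parameters each model has to estimate. The sketch below uses the usual parameter counts for the five parametrizations with d variables and G clusters. The function names n_cov_params and n_params are hypothetical helpers introduced here for illustration; they are not part of mclust and do not appear in the appendix.

```r
# Free parameters in the covariance structure of each parametrization,
# for d variables and G clusters.
n_cov_params <- function(model, d, G) {
  switch(model,
    EII = 1,                                  # one common spherical variance
    VII = G,                                  # one spherical variance per cluster
    EEE = d * (d + 1) / 2,                    # one common full covariance matrix
    EEV = 1 + (d - 1) + G * d * (d - 1) / 2,  # common volume and shape, own orientations
    VVV = G * d * (d + 1) / 2,                # one full covariance matrix per cluster
    stop("unknown model: ", model))
}

# Total count: cluster means, mixing proportions, covariance parameters.
n_params <- function(model, d, G) {
  G * d + (G - 1) + n_cov_params(model, d, G)
}

models <- c("EII", "VII", "EEE", "EEV", "VVV")
counts_14 <- sapply(models, n_params, d = 5, G = 14)
counts_14                    # parameters needed for 14 clusters
n_params("EEE", d = 5, G = 1)  # parameters needed for a single EEE cluster
62 * 5                       # number of observed values in the data
```

With d = 5 and G = 14 the EEE model needs 98 parameters, set against 310 observed values, which gives some context for treating the 14-cluster solution with caution.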
On closer inspection, the model-based clustering does not look so appealing.

4.5 Comparing Methods

The SOM map is a nice way to compare the tracks on a continuum, rather than as discrete clusters, but digesting the 36 resulting clusters separately could be dizzying. On the map we want to draw bounding lines between apparent clusters. There are other methods which can produce similar types of displays to the SOM map: principal component analysis (left plot below) and multidimensional scaling (right plot below).

[Figure: principal component analysis (left) and multidimensional scaling (right).]

The k-means (EII in model-based) 14 cluster solution, for comparison with the best EEE model solution, is:

14: V6
1 and 3: V1, V3, V9, B7, V13, V10, M1, M2, M4, M5, B5, B6 (the division between these two clusters is not shown)
 5: V2, V7
 6: M3, M6, Agony, B1, B3, B4
 7: V5, Hey Jude
12: V4, V8, B2, The Memory of Trees, Pax Deorum, V11, V12
 2: The Good Old Days, Restraining, Love Me Do, Yesterday, Eleanor Rigby, B8, Anywhere Is
 4: I Have A Dream, The Winner, SOS
 8: Dancing Queen, Waterloo
 9: Knowing Me, Take a Chance, Mamma Mia, Lay All You, Super Trouper, Money
10: All in a Days Work, Love of the Loveless, Lone Wolf, Wrong About Bobby, Cant Buy Me Love, I Feel Fine, Ticket to Ride, Help, Yellow Submarine, Penny Lane
13: Girl, Rock Hard Times, I Want to Hold Your Hand
11: Saturday Morning

This looks better in terms of distinguishing between rock and classical tracks. The Abba tracks are mostly together, and the other rock tracks that have been found to be similar by other methods are grouped together. The classical tracks are mostly grouped together, along with the few unusual rock tracks (Agony, Hey Jude) and the new wave tracks.

Let's examine a smaller number of clusters. Here is the solution for 9 clusters using k-means.
6: V6
9: V2, V7
5: V10, M1, M2, M4, M5, B5, B6
7: V1, V9, M3, M6, Agony, Restraining, B1, B3, B4, B7
2: V5, HeyJude
3: The Winner, V3, V4, V8, B2, The Memory of Trees, Pax Deorum, V11, V12, V13
1: Dancing Queen, Knowing Me, Take a Chance, Mamma Mia, Lay All You, Super Trouper, I Have A Dream, Money, SOS, Waterloo
4: All in a Days Work, The Good Old Days, Love of the Loveless, Wrong About Bobby, Love Me Do, Yesterday, Yellow Submarine, Eleanor Rigby, B8, Anywhere Is
8: Saturday Morning, Girl, Rock Hard Times, Lone Wolf, I Want to Hold Your Hand, Cant Buy Me Love, I Feel Fine, Ticket to Ride, Help, Penny Lane

BUT Saturday Morning is not in its own cluster. Here is the solution for 10 clusters:

10: V6
 2: V2, V7, V10, M2
 5: V1, M3, M6, Agony, B1, B3, B4
 1: V5, M4, M5, B5, B6, B7, HeyJude
 4: V3, V4, V8, M1, B2, The Memory of Trees, Pax Deorum, V11, V12, V13
 8: I Have A Dream, The Winner, SOS
 3: Dancing Queen, Knowing Me, Take a Chance, Mamma Mia, Lay All You, Super Trouper, Money, Waterloo
 6: V9, The Good Old Days, Restraining, Love Me Do, Yesterday, Eleanor Rigby, B8, Anywhere Is
 7: All in a Days Work, Love of the Loveless, Girl, Rock Hard Times, Lone Wolf, Wrong About Bobby, I Want to Hold Your Hand, Cant Buy Me Love, I Feel Fine, Ticket to Ride, Help, Yellow Submarine, Penny Lane
 9: Saturday Morning

5 Conclusions

• Here are my final clusters:

           Cluster Means
Cluster    LVar        LAve     LMax        LFEner   LFreq
10         1.4×10^7    216.2    3.0×10^4    105      198
 2         1.9×10^6      7.4    1.2×10^4    101      690
 5         1.1×10^7      7.5    2.7×10^4    102      402
 1         3.0×10^6      2      1.8×10^4     94      287
 4         1.7×10^6     12      1.0×10^4    104      168
 8         2.7×10^6    -71      1.3×10^4    104      243
 3         1.1×10^7    -85      2.7×10^4    103       95
 6         1.7×10^7     -3.2    2.5×10^4    106      165
 7         5.7×10^7     -3.6    3.1×10^4    110      157
 9         1.3×10^8     50.2    3.3×10^4    114       41

Tracks in each cluster:

10: V6
 2: V2, V7, V10, M2
 5: V1, M3, M6, Agony, B1, B3, B4
 1: V5, M4, M5, B5, B6, B7, HeyJude
 4: V3, V4, V8, M1, B2, The Memory of Trees, Pax Deorum, V11, V12, V13
 8: I Have A Dream, The Winner, SOS
 3: Dancing Queen, Knowing Me, Take a Chance, Mamma Mia, Lay All You, Super Trouper, Money, Waterloo
 6: V9, The Good Old Days, Restraining, Love Me Do, Yesterday, Eleanor Rigby, B8, Anywhere Is
 7: All in a Days Work, Love of the Loveless, Girl, Rock Hard Times, Lone Wolf, Wrong About Bobby, I Want to Hold Your Hand, Cant Buy Me Love, I Feel Fine, Ticket to Ride, Help, Yellow Submarine, Penny Lane
 9: Saturday Morning

• The unusual tracks are:
  - V6 because it has a very high LAve value.
  - Saturday Morning because it has very high LAve and LVar values.
  - Hey Jude and Agony are more similar to classical tracks.
  - V9 and B8 are more similar to rock tracks.
  - Abba tracks because they have very low LAve.
• New wave tracks are similar to both rock and classical.
• Below is a plot of the cluster means, and the location of the clusters on the SOM map.

[Figure: Cluster means. One panel per cluster (1 to 10), showing the means of LVar, LAve, LMax, LFEner and LFreq on a common scale from -4 to 4; and the location of clusters 1 to 10 on the SOM map.]

References

Fraley, C. and Raftery, A. E. (2002) "Model-based Clustering, Discriminant Analysis, and Density Estimation", Journal of the American Statistical Association, 97, 611–631, http://www.stat.washington.edu/mclust.

Hastie, T., Tibshirani, R., and Friedman, J. (2001) "The Elements of Statistical Learning: Data Mining, Inference, and Prediction", Springer, New York, ISBN 0 387 95284 5.

Kohonen, T. (2000) "Self-organizing Maps" (3rd ed), Springer, Berlin, ISBN 3 540 67921 9.

Ripley, B.D. (1996) "Pattern Recognition and Neural Networks", Cambridge University Press, ISBN 0 521 46086 7.

Venables, W.N. and Ripley, B.D. (2002) "Modern Applied Statistics with S", Springer, New York, ISBN 0 387 95457 0.
Appendix

    d.music<-read.csv("music-plusnew-sub-full.csv",row.names=1)
    apply(d.music[,-c(1,2)],2,mean)
    apply(d.music[,-c(1,2)],2,sd)
    d.music.std<-cbind(d.music[,c(1,2)],apply(d.music[,-c(1,2)],2,f.std.data))

    # Summary statistics
    apply(d.music[,-c(1,2)],2,mean)
    apply(d.music[,-c(1,2)],2,sd)
    apply(d.music[d.music[,1]=="Abba",-c(1,2)],2,mean)
    apply(d.music[d.music[,1]=="Beatles",-c(1,2)],2,mean)
    apply(d.music[d.music[,1]=="Eels",-c(1,2)],2,mean)
    apply(d.music[d.music[,1]=="Beethoven",-c(1,2)],2,mean)
    apply(d.music[d.music[,1]=="Mozart",-c(1,2)],2,mean)
    apply(d.music[d.music[,1]=="Vivaldi",-c(1,2)],2,mean)
    apply(d.music[d.music[,1]=="Enya",-c(1,2)],2,mean)

    # Plots
    library(lattice)
    d.music.df<-data.frame(Artist=factor(rep(d.music[,1],5)),
      y=as.vector(as.matrix(d.music[,3:7])),
      meas=factor(rep(1:5, rep(62,5)), labels=names(d.music[,-c(1,2)])))
    postscript("music-dotplot.ps",width=8.0,height=8.0,horizontal=FALSE,
      paper="special",family="URWHelvetica")
    par(pty="s",mar=c(2,1,1,1))
    plt.bg<-trellis.par.get("background")
    plt.bg$col<-"grey90"
    trellis.par.set("background",plt.bg)
    stripplot(Artist~y|meas, data=d.music.df, scales=list(x="free"),
      strip=function(...) strip.default(style=1,...),
      panel=function(x,y){panel.grid(h=-1,v=5,col="white")
        panel.stripplot(x,y,col=1,pch=16)},
      xlab="",pch=16,col=1,layout=c(3,2),aspect=1, as.table=T)
    dev.off()

    # Hierarchical clustering
    # (in R >= 3.1.0 the method "ward" is named "ward.D")
    music.dist<-dist(d.music[,-c(1:2)])
    music.dist<-dist(d.music.std[,-c(1:2)])
    music.hc1<-hclust(music.dist,method="ward")
    music.hc2<-hclust(music.dist,method="single")
    music.hc3<-hclust(music.dist,method="complete")
    postscript("music-hclust.ps",width=5.0,height=10.0,horizontal=FALSE,
      paper="special",family="URWHelvetica")
    par(mfrow=c(3,1),mar=c(1,2,2,2))
    plot(music.hc1,main="Ward",xlab=" ")
    text(music.hc1)
    plot(music.hc2,main="Single",ylab=" ")
    text(music.hc2)
    plot(music.hc3,main="Complete",ylab=" ")
    text(music.hc3)
    dev.off()

    cl.12<-cutree(music.hc1,2); cl.22<-cutree(music.hc2,2); cl.32<-cutree(music.hc3,2)
    cl.13<-cutree(music.hc1,3); cl.23<-cutree(music.hc2,3); cl.33<-cutree(music.hc3,3)
    cl.14<-cutree(music.hc1,4); cl.24<-cutree(music.hc2,4); cl.34<-cutree(music.hc3,4)
    cl.15<-cutree(music.hc1,5); cl.25<-cutree(music.hc2,5); cl.35<-cutree(music.hc3,5)
    cl.16<-cutree(music.hc1,6); cl.26<-cutree(music.hc2,6); cl.36<-cutree(music.hc3,6)
    cl.17<-cutree(music.hc1,7); cl.27<-cutree(music.hc2,7); cl.37<-cutree(music.hc3,7)
    cl.18<-cutree(music.hc1,8); cl.28<-cutree(music.hc2,8); cl.38<-cutree(music.hc3,8)
    cl.19<-cutree(music.hc1,9); cl.29<-cutree(music.hc2,9); cl.39<-cutree(music.hc3,9)
    cl.110<-cutree(music.hc1,10); cl.210<-cutree(music.hc2,10); cl.310<-cutree(music.hc3,10)
    cl.111<-cutree(music.hc1,11); cl.211<-cutree(music.hc2,11); cl.311<-cutree(music.hc3,11)
    cl.112<-cutree(music.hc1,12); cl.212<-cutree(music.hc2,12); cl.312<-cutree(music.hc3,12)
    cl.113<-cutree(music.hc1,13); cl.213<-cutree(music.hc2,13); cl.313<-cutree(music.hc3,13)
    cl.114<-cutree(music.hc1,14); cl.214<-cutree(music.hc2,14); cl.314<-cutree(music.hc3,14)

    table(cl.12,cl.22); table(cl.12,cl.32); table(cl.32,cl.22)
    table(cl.13,cl.23); table(cl.13,cl.33); table(cl.33,cl.23)
    table(cl.14,cl.24); table(cl.14,cl.34); table(cl.34,cl.24)
    table(cl.15,cl.25); table(cl.15,cl.35); table(cl.35,cl.25)
    table(cl.16,cl.26); table(cl.16,cl.36); table(cl.36,cl.26)
    table(cl.17,cl.27); table(cl.17,cl.37); table(cl.37,cl.27)
    table(cl.18,cl.28); table(cl.18,cl.38); table(cl.38,cl.28)
    table(cl.19,cl.29); table(cl.19,cl.39); table(cl.39,cl.29)
    table(cl.110,cl.210); table(cl.110,cl.310); table(cl.310,cl.210)
    table(cl.111,cl.211); table(cl.111,cl.311); table(cl.311,cl.211)
    table(cl.112,cl.212); table(cl.112,cl.312); table(cl.312,cl.212)
    table(cl.113,cl.213); table(cl.113,cl.313); table(cl.313,cl.213)
    table(cl.114,cl.214); table(cl.114,cl.314); table(cl.314,cl.214)

    for (i in 1:5)
      cat(i,",",dimnames(d.music)[[1]][cl.15==i],"\n")
    for (i in 1:5)
      cat(i,",",dimnames(d.music)[[1]][music.km5$cluster==i],"\n")
    dimnames(d.music)[[1]][music.km5$cluster==3&cl.15==2]

    library(genegobitree)
    library(Rggobi)
    ggobi()
    setup.gobidend(music.hc1,d.music)
    color.click.dn(music.hc1,d.music)

    library(mva)   # mva was merged into stats in R 1.9.0; omit this line in later versions
    music.km2<-kmeans(d.music.std[,-c(1,2)],2)
    music.km3<-kmeans(d.music.std[,-c(1,2)],3)
    music.km4<-kmeans(d.music.std[,-c(1,2)],4)
    music.km5<-kmeans(d.music.std[,-c(1,2)],5)
    music.km6<-kmeans(d.music.std[,-c(1,2)],6)
    music.km7<-kmeans(d.music.std[,-c(1,2)],7)
    music.km8<-kmeans(d.music.std[,-c(1,2)],8)
    music.km9<-kmeans(d.music.std[,-c(1,2)],9)
    music.km10<-kmeans(d.music.std[,-c(1,2)],10)
    music.km11<-kmeans(d.music.std[,-c(1,2)],11)
    music.km12<-kmeans(d.music.std[,-c(1,2)],12)
    music.km13<-kmeans(d.music.std[,-c(1,2)],13)
    music.km14<-kmeans(d.music.std[,-c(1,2)],14)
    table(cl.15,music.km5$cluster)

    d.music.clust1<-cbind(d.music,cl.12,cl.22,cl.32,cl.13,cl.23,cl.33,
      cl.14,cl.24,cl.34,cl.15,cl.25,cl.35,cl.16,cl.26,cl.36,cl.17,cl.27,cl.37,
      cl.18,cl.28,cl.38,
      music.km2$cluster,music.km3$cluster,music.km4$cluster,music.km5$cluster,
      music.km6$cluster,music.km7$cluster,music.km8$cluster)
    dimnames(d.music.clust1)[[2]][8]<-"HC-W2"
    dimnames(d.music.clust1)[[2]][9]<-"HC-S2"
    dimnames(d.music.clust1)[[2]][10]<-"HC-C2"
    dimnames(d.music.clust1)[[2]][11]<-"HC-W3"
    dimnames(d.music.clust1)[[2]][12]<-"HC-S3"
    dimnames(d.music.clust1)[[2]][13]<-"HC-C3"
    dimnames(d.music.clust1)[[2]][14]<-"HC-W4"
    dimnames(d.music.clust1)[[2]][15]<-"HC-S4"
    dimnames(d.music.clust1)[[2]][16]<-"HC-C4"
    dimnames(d.music.clust1)[[2]][17]<-"HC-W5"
    dimnames(d.music.clust1)[[2]][18]<-"HC-S5"
    dimnames(d.music.clust1)[[2]][19]<-"HC-C5"
    dimnames(d.music.clust1)[[2]][20]<-"HC-W6"
    dimnames(d.music.clust1)[[2]][21]<-"HC-S6"
    dimnames(d.music.clust1)[[2]][22]<-"HC-C6"
    dimnames(d.music.clust1)[[2]][23]<-"HC-W7"
    dimnames(d.music.clust1)[[2]][24]<-"HC-S7"
    dimnames(d.music.clust1)[[2]][25]<-"HC-C7"
    dimnames(d.music.clust1)[[2]][26]<-"HC-W8"
    dimnames(d.music.clust1)[[2]][27]<-"HC-S8"
    dimnames(d.music.clust1)[[2]][28]<-"HC-C8"
    dimnames(d.music.clust1)[[2]][29]<-"KM-2"
    dimnames(d.music.clust1)[[2]][30]<-"KM-3"
    dimnames(d.music.clust1)[[2]][31]<-"KM-4"
    dimnames(d.music.clust1)[[2]][32]<-"KM-5"
    dimnames(d.music.clust1)[[2]][33]<-"KM-6"
    dimnames(d.music.clust1)[[2]][34]<-"KM-7"
    dimnames(d.music.clust1)[[2]][35]<-"KM-8"
    f.writeXML(d.music.clust1,"music-clust1.xml",data.num=1)
    for (i in 1:6)
      cat(i,",",dimnames(d.music)[[1]][music.km6$cluster==i],"\n")

    # SOM
    library(som)   # assumed: som() with these arguments comes from the som package
    music.som<-som(d.music.std[,-c(1:2)],6,6,rlen=100)
    music.som<-som(d.music.std[,-c(1:2)],6,6,rlen=200)
    music.som<-som(d.music.std[,-c(1:2)],6,6,rlen=300)
    music.som<-som(d.music.std[,-c(1:2)],6,6,rlen=400)
    music.som<-som(d.music.std[,-c(1:2)],6,6,rlen=1000)
    music.som<-som(d.music.std[,-c(1:2)],6,6,neigh="bubble",rlen=100)
    music.som<-som(d.music.std[,-c(1:2)],6,6,neigh="bubble",rlen=200)
    music.som<-som(d.music.std[,-c(1:2)],6,6,neigh="bubble",rlen=300)
    music.som<-som(d.music.std[,-c(1:2)],6,6,neigh="bubble",rlen=400)
    music.som<-som(d.music.std[,-c(1:2)],6,6,neigh="bubble",rlen=1000)
    music.som<-som(d.music.std[,-c(1:2)],6,6,init="random",neigh="bubble",rlen=100)
    music.som<-som(d.music.std[,-c(1:2)],6,6,init="random",neigh="bubble",
      rlen=1000)
    xmx<-jitter(music.som$visual$x,factor=3)
    xmy<-jitter(music.som$visual$y,factor=3)
par(mfrow=c(1,1),pty="s") plot(xmx,xmy,type="n",pch=16,xlab="x",ylab="y",main="SOM Map", xlim=c(-0.5,6),ylim=c(-0.5,6)) text(xmx,xmy,dimnames(d.music)[[1]]) dimnames(music.som$code)<-list(NULL,names(d.music[,-c(1,2)])) d.music.clust<-cbind(d.music.std,xmx,xmy) dimnames(d.music.clust)[[2]][8]<-"Map 1" dimnames(d.music.clust)[[2]][9]<-"Map 2" d.music.grid<-cbind(rep("0",36),rep("0",36),music.som$code, music.som$code.sum[,1:2]) dimnames(d.music.grid)[[2]][1]<-"Artist" dimnames(d.music.grid)[[2]][2]<-"Type" dimnames(d.music.grid)[[2]][8]<-"Map 1" dimnames(d.music.grid)[[2]][9]<-"Map 2" d.music.clust<-rbind(d.music.grid,d.music.clust) f.writeXML(d.music.clust, "music-SOM.xml",data.num=2,dat1.id<-c(1:dim(d.music.clust)[1]), dat2=cbind(c(1:60),c(1:60)), dat2.source=x33.l[,1], dat2.destination=x33.l[,2], dat2.name="SOM",dat2.id=paste(rep("l",60),c(1:60))) # Favorite model music.som<-som(d.music[,-c(1:2)],6,6,neigh="bubble",rlen=1000) music.som<-som(d.music.std[,-c(1:2)],6,6,neigh="bubble",rlen=1000) xmx<-jitter(music.som$visual$x,factor=3) xmy<-jitter(music.som$visual$y,factor=3) postscript("music-som.ps",width=8.0,height=8.0,horizontal=FALSE, paper="special",family="URWHelvetica") par(mfrow=c(1,1),pty="s") plot(xmx,xmy,type="n",pch=16,xlab="x",ylab="y",main="SOM Map", xlim=c(-0.5,6),ylim=c(-0.5,6)) text(xmx,xmy,dimnames(d.music)[[1]]) dev.off() # Setting up the net lines 21 n.nodes<-6 x33.l<-NULL for (i in 1:n.nodes) { for (j in 1:n.nodes) { if (j<n.nodes) x33.l<-rbind(x33.l,c((i-1)*n.nodes+j,(i-1)*n.nodes+j+1)) if (i<n.nodes) x33.l<-rbind(x33.l,c((i-1)*n.nodes+j,i*n.nodes+j)) }} # Model-based postscript("music-mc1.ps",width=8.0,height=8.0,horizontal=FALSE, paper="special",family="URWHelvetica") music.mc<-EMclust(d.music[,-c(1:2)],1:36,c("EII","VII","EEE","EEV","VVV")) par(pty="m",mfrow=c(2,1)) plot(music.mc) legend(1,-6300,col=c(1:5),lty=c(1:5), legend=c("1 EII","2 VII","3 EEE","4 EEV","5 VVV"),bg="white") music.mc<-EMclust(d.music[,-c(1:2)],1:15,c("EEE","EEV")) 
plot(music.mc)
abline(h=seq(-5610,-5400,by=10),col="gray80")
legend(1,-5550,col=c(1:2),lty=c(1:2),
  legend=c("1 EEE","2 EEV"),bg="white")
box()
dev.off()
smry<-summary(music.mc,d.music[,-c(1:2)])
cl<-smry$classification
cl.mat<-matrix(0,62,2)
cl.mat[cl==1,1]<-1
cl.mat[cl==2,2]<-1
prm<-mstepEEV(d.music[,-c(1:2)],cl.mat)
d.music[cl==1,1:2]
d.music[cl==2,1:2]
for (i in 1:14)
  cat(i,",",dimnames(d.music)[[1]][cl==i],"\n")
smry<-summary(music.mc,d.music[,-c(1:2)])
smry
t(smry$mu)
smry$sigma
mc.clust.dist<-dist(t(smry$mu))
mc.clust.mean<-hclust(mc.clust.dist,method="single")
plot(mc.clust.mean)
music.mc<-EMclust(d.music[,-c(1:2)],11,"EEV")
music.mc2<-EMclust(d.music[,-c(1:2)],2,"EEE")
summary(music.mc2,d.music[,-c(1:2)])
music.mc3<-EMclust(d.music[,-c(1:2)],7,"EEE")
summary(music.mc3,d.music[,-c(1:2)])
mccl<-summary(music.mc3,d.music[,-c(1:2)])$classification
for (i in 1:7)
  cat(i,",",dimnames(d.music)[[1]][mccl==i],"\n")

# Generate the ellipses in 5D
vc<-smry$sigma[,,1]
evc<-eigen(vc)
vc2<-(evc$vectors)%*%diag(sqrt(evc$values))%*%t(evc$vectors)
y1<-f.gen.sphere(500,5)
y1<-y1%*%vc2
y1[,1]<-y1[,1]+smry$mu[1,1]
y1[,2]<-y1[,2]+smry$mu[2,1]
y1[,3]<-y1[,3]+smry$mu[3,1]
y1[,4]<-y1[,4]+smry$mu[4,1]
y1[,5]<-y1[,5]+smry$mu[5,1]
vc<-smry$sigma[,,2]
evc<-eigen(vc)
vc2<-(evc$vectors)%*%diag(sqrt(evc$values))%*%t(evc$vectors)
y2<-f.gen.sphere(500,5)
y2<-y2%*%vc2
y2[,1]<-y2[,1]+smry$mu[1,2]
y2[,2]<-y2[,2]+smry$mu[2,2]
y2[,3]<-y2[,3]+smry$mu[3,2]
y2[,4]<-y2[,4]+smry$mu[4,2]
y2[,5]<-y2[,5]+smry$mu[5,2]
y<-cbind(rep(0,1000),c(rep(1,500),rep(2,500)),
  rbind(y1,y2))
y[,1]<-factor(y[,1])
y[,2]<-factor(y[,2])
dimnames(y)<-list(NULL,names(d.music))
d.music.mc<-rbind(d.music,y)
f.writeXML(d.music.mc,"music-mclust.xml",data.num=1)

# Set up a full data set
kmcl6<-music.km6$cluster
dimnames(music.som$code)<-list(NULL,names(d.music[,-c(1,2)]))
d.music.clust<-cbind(d.music,cl6,kmcl6,xmx,xmy)
dimnames(d.music.clust)[[2]][8]<-"HC-W6"
dimnames(d.music.clust)[[2]][9]<-"KM-6"
dimnames(d.music.clust)[[2]][10]<-"Map 1"
dimnames(d.music.clust)[[2]][11]<-"Map 2"
d.music.grid<-cbind(rep("0",36),rep("0",36),music.som$code,rep(0,36),rep(0,36),
  music.som$code.sum[,1:2])
dimnames(d.music.grid)[[2]][1]<-"Artist"
dimnames(d.music.grid)[[2]][2]<-"Type"
dimnames(d.music.grid)[[2]][8]<-"HC-W6"
dimnames(d.music.grid)[[2]][9]<-"KM-6"
dimnames(d.music.grid)[[2]][10]<-"Map 1"
dimnames(d.music.grid)[[2]][11]<-"Map 2"
d.music.clust<-rbind(d.music.grid,d.music.clust)
f.writeXML(d.music.clust,
  "SOM-music.xml",data.num=2,dat1.id<-c(1:dim(d.music.clust)[1]),
  dat2=cbind(c(1:60),c(1:60)),
  dat2.source=x33.l[,1],
  dat2.destination=x33.l[,2],
  dat2.name="SOM",dat2.id=paste(rep("l",60),c(1:60)))

# Utility functions
f.gen.sphere<-function(n=100,p=5) {
  x<-matrix(rnorm(n*p),ncol=p)
  xnew<-t(apply(x,1,norm.vec))
  xnew
}
norm.vec<-function(x) {
  x<-x/norm(x)
  x
}
norm<-function(x) { sqrt(sum(x^2))}

# Output data for ggvis
edges<-NULL
k<-1
for (i in 1:61)
  for (j in (i+1):62) {
    edges<-rbind(edges,c(i,j,music.dist[k]))
    k<-k+1
  }
f.writeXML(d.music.std,"music-MDS.xml",data.num=2,dat1.id<-c(1:62),
  dat1.name="Music",
  dat2=cbind(1:1891,edges[,3]),
  dat2.source=edges[,1],dat2.destination=edges[,2],
  dat2.name="dist",dat2.id=c(1:1891))

# Comparison of clusters
for (i in 1:14)
  cat(i,",",dimnames(d.music)[[1]][music.km14$cluster==i],"\n")

# Check smaller number of clusters
for (i in 1:9)
  cat(i,",",dimnames(d.music)[[1]][music.km9$cluster==i],"\n")
for (i in 1:10)
  cat(i,",",dimnames(d.music)[[1]][music.km10$cluster==i],"\n")
for (i in 1:11)
  cat(i,",",dimnames(d.music)[[1]][music.km11$cluster==i],"\n")
options(digits=2)
for (i in 1:10)
  cat(i,",",apply(d.music[music.km10$cluster==i,-c(1,2)],2,mean),"\n")
km10.mn<-NULL
for (i in 1:10)
  km10.mn<-rbind(km10.mn,apply(d.music[music.km10$cluster==i,-c(1,2)],2,mean))
km10.mn.std<-apply(km10.mn,2,f.std.data)
range(km10.mn.std)
postscript("music-means.ps",width=5.0,height=8.0,horizontal=FALSE,
  paper="special",family="URWHelvetica")
plot(c(1,5),c(-2.5,2.8),type="n",axes=F,xlab="Variables",ylab="")
rect(0.8,-2.8,5.2,3.1,col="gray80")
abline(v=c(1:5),col="white")
abline(h=seq(-2.5,2.5,by=0.5),col="white")
clrs<-c(1:7,9:11)
for (i in 1:10) {
  lines(c(1:5),km10.mn.std[i,],lty=i,col=clrs[i])
  points(c(1:5),km10.mn.std[i,],pch=i,col=clrs[i])
}
axis(side=1,at=c(1:5),labels=names(d.music[,-c(1,2)]))
legend(1,-0.8,lty=c(1:10),pch=c(1:10),col=clrs,
  legend=c(1:10),bg="gray80")
title("Cluster means")
box()
dev.off()

library(lattice)
d.music.df<-data.frame(cluster=factor(rep(music.km10$cluster,5)),
  y=as.vector(as.matrix(d.music.std[,3:7])),
  meas=factor(rep(1:5, rep(62,5)),
  labels=names(d.music[,-c(1,2)])))
postscript("music-means2.ps",width=3.0,height=10.0,horizontal=FALSE,
  paper="special",family="URWHelvetica")
plt.bg<-trellis.par.get("background")
plt.bg$col<-"grey90"
trellis.par.set("background",plt.bg)
xyplot(y~meas|cluster,data=d.music.df,xlab="",ylab="",
  box.ratio=1,layout=c(1,10),col=1,pch=16,
  panel=function(x,y){
    panel.grid(h=-1,v=5,col="white")
    panel.stripplot(x,y,col=1,pch=16)
  })
dev.off()