Cluster analysis of USArrests using R

ANG ZHI NUO SIT3010 T3

Background of data

The "USArrests" dataset ships with R. Its documentation gives the World Almanac and Book of Facts 1975 as the source of the crime figures and the Statistical Abstract of the United States 1975 as the source of the urban population figures. The dataset has 50 rows, one per U.S. state, and 4 columns: Murder, Assault and Rape, each recorded as arrests per 100,000 residents in 1973, and UrbanPop, the percentage of the population living in urban areas. It supports analysis of regional differences in violent crime and of how the arrest rates relate to one another and to urbanization.

Results

Figure 1. Plot of Euclidean distance matrix

The distance matrix plot shows how similar or dissimilar each pair of states is. Darker squares mark larger distances (greater dissimilarity) and lighter squares mark smaller distances (greater similarity). The plot already hints at groupings in the data: Vermont and North Dakota are both far from Nevada, which suggests at least two distinct clusters. Wisconsin, on the other hand, is close to Maine, so the two are likely to fall into the same cluster.

Figure 2. Cluster plots for different numbers of clusters

Figure 3. Elbow plot for different numbers of clusters

K-means cluster plots were generated for several values of k to determine the optimal number of clusters. For k=4, the data is partitioned into four distinct, non-overlapping clusters. From k=5 onward the clusters begin to overlap, which indicates a poorer separation of the states.
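How the within-cluster sum of squares (WSS) changes with k can be sketched in a few lines. The report itself was produced in R; the block below is a Python sketch of Lloyd's k-means run on a handful of made-up two-dimensional points, not on the USArrests data and not the author's code. The helper names (standardize, kmeans, wss_by_k), the evenly spaced starting centers and the toy points are all assumptions chosen for illustration.

```python
import math

def standardize(rows):
    """Scale each column to mean 0 and sample standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; starting centers are evenly spaced rows (an assumption)."""
    n = len(points)
    centers = [list(points[i * n // k]) for i in range(k)]
    for _ in range(max_iter):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(c) / len(members) for c in zip(*members)])
            else:
                updated.append(centers[j])  # keep the old center if a cluster empties
        if updated == centers:
            break
        centers = updated
    labels = assign(points, centers)
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss

# Hypothetical toy points forming three visible groups (NOT USArrests values).
toy = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
       (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
       (9.0, 1.0), (9.1, 1.2), (8.9, 0.9)]

scaled = standardize(toy)
wss_by_k = {k: kmeans(scaled, k)[2] for k in range(1, 6)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 4))
```

On these toy points the WSS falls sharply up to k=3, the number of groups built into the data, and changes little afterwards. An elbow plot is read the same way: the bend marks the point beyond which additional clusters bring little reduction in WSS.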
The elbow method supports the choice of k=4: increasing k beyond 4 does not substantially reduce the within-cluster sum of squares (WSS), so a 4-cluster solution is appropriate.

Table 1. Cluster means from k-means clustering with k=4

Cluster      Murder     Assault    UrbanPop        Rape
1         1.4118898   0.8743346  -0.8145211  0.01927104
2        -0.9615407  -1.106601   -0.9301069  -0.9667633
3        -0.4894375  -0.3826001   0.5758298  -0.2616538
4         0.6950701   1.0394414   0.722637    1.27693964

Table 2. States in each cluster

Group 1 (8 states): Alabama, Arkansas, Georgia, Louisiana, Mississippi, North Carolina, South Carolina, Tennessee

Group 2 (13 states): Idaho, Iowa, Kentucky, Maine, Minnesota, Montana, Nebraska, New Hampshire, North Dakota, South Dakota, Vermont, West Virginia, Wisconsin

Group 3 (16 states): Connecticut, Delaware, Hawaii, Indiana, Kansas, Massachusetts, New Jersey, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, Utah, Virginia, Washington, Wyoming

Group 4 (13 states): Alaska, Arizona, California, Colorado, Florida, Illinois, Maryland, Michigan, Missouri, Nevada, New Mexico, New York, Texas

The tables show the composition of each cluster and the cluster means for Murder, Assault, UrbanPop and Rape. The means are reported for standardized variables, so positive values are above the overall average and negative values are below it. Cluster 1, with 8 states, has above-average Murder and Assault but below-average UrbanPop. Cluster 2, with 13 states, is below average on all variables. Cluster 3, the largest with 16 states, has below-average Murder and Assault but above-average UrbanPop. Cluster 4, with 13 states, is above average on all variables, indicating a higher prevalence of violent crime.
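The reading of the cluster means above can be reproduced mechanically. The following Python sketch copies the values from Table 1 and labels each variable by its sign. It is an illustration only, not the author's R code: the function name describe and the 0.1 cutoff for "near average" are assumptions, and the cutoff in particular has no basis in the report.

```python
# Cluster means copied from Table 1 (0 corresponds to the overall mean).
centers = {
    1: {"Murder": 1.4118898, "Assault": 0.8743346, "UrbanPop": -0.8145211, "Rape": 0.01927104},
    2: {"Murder": -0.9615407, "Assault": -1.106601, "UrbanPop": -0.9301069, "Rape": -0.9667633},
    3: {"Murder": -0.4894375, "Assault": -0.3826001, "UrbanPop": 0.5758298, "Rape": -0.2616538},
    4: {"Murder": 0.6950701, "Assault": 1.0394414, "UrbanPop": 0.722637, "Rape": 1.27693964},
}

def describe(center, tol=0.1):
    """Label each variable relative to the overall mean.

    tol is an illustrative cutoff (an assumption): values within +/- tol
    of zero are reported as "near average".
    """
    labels = {}
    for name, z in center.items():
        if z > tol:
            labels[name] = "above average"
        elif z < -tol:
            labels[name] = "below average"
        else:
            labels[name] = "near average"
    return labels

profiles = {cluster: describe(center) for cluster, center in centers.items()}
for cluster, profile in profiles.items():
    print(cluster, profile)
```

With this cutoff, Rape in Cluster 1 is labelled near average, which is consistent with the summary above leaving it unmentioned for that cluster.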
Discussion

The k-means analysis partitions the states into four clusters with distinct profiles.

Cluster 1 groups less urbanized states (low UrbanPop) that nevertheless have high murder rates and above-average assault rates. One possible explanation is that law enforcement and social services are less accessible in less urban settings, which could contribute to higher crime rates.

Cluster 2 also contains less urbanized states, but here the murder, assault and rape rates are all well below average. This may reflect less crowded living conditions and possibly stronger community ties.

Cluster 3 combines high urbanization (high UrbanPop) with below-average crime rates. Greater urbanization might bring more efficient law enforcement and social programs, contributing to the lower rates.

Cluster 4 combines high urbanization with high rates of murder, assault and rape. The contrast with Cluster 3 highlights the complexity of urban environments, where factors such as population density, economic disparities and social dynamics can create conditions conducive to higher crime rates.

Taken together, the clusters suggest that urbanization alone does not determine crime levels: both the less urbanized and the more urbanized states split into a higher-crime and a lower-crime cluster.
