Video Games and the Communities They Foster
A statistical study of the video gamersphere: its games, gamers, and societal gaming communities.
Colin Bayer, Aaron Miller, Zak Dehlawi, Patrick Carroll
The University of Washington
June 2008

INTRODUCTION

With the advent of Microsoft's Xbox Live online gaming platform, gaming has become a social activity more than ever before. It is therefore reasonable to hypothesize that communities of gamers exist within today's video gamersphere, and that the members of each community share characteristic traits, such as the games they play in common or the genres they prefer. The same data also supports general descriptive statistics about the games themselves: which are the most challenging and the easiest, and which are the most and least popular.

Data clustering is the practice of partitioning a dataset into a finite number of subsets, called clusters. The members of each cluster share a common trait, typically their proximity to the cluster's other members under some mathematical measure of distance.

Clustering gamers therefore requires a distance metric. Microsoft requires each Xbox Live-enabled game to provide a set of achievements: in-game goals that award Gamerscore points to users who complete, or "unlock", them. A user's Gamerscore is the sum of the Gamerscore points of every achievement that user has unlocked. The fraction of a user's Gamerscore contributed by each game is a simple yet useful basis for a distance metric: two gamers whose Gamerscores come from the same games in similar proportions lie close together.

METHODS

Clustering gamers requires a dataset containing every achievement each gamer has unlocked during his or her gaming career, along with its Gamerscore value. To reduce computation, the dataset should also include each gamer's total Gamerscore. The dataset was obtained in two stages: first a list of gamer usernames, then each gamer's unlocked achievements and their Gamerscore values.
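The Gamerscore-fraction metric can be made concrete with a minimal Python sketch (not the authors' code). It assumes each gamer's data has already been reduced to a mapping from game title to the Gamerscore earned in that game; the function names `gamer_point` and `euclidean`, the game names, and the example numbers are all hypothetical.

```python
import math


def gamer_point(per_game_score, games):
    """Return a gamer's coordinates: the fraction of their total Gamerscore
    earned in each game, in the order given by `games`."""
    total = sum(per_game_score.values())
    if total == 0:
        # A gamer with no unlocked achievements has no defined fractions;
        # this sketch places them at the origin (an assumption).
        return tuple(0.0 for _ in games)
    return tuple(per_game_score.get(game, 0) / total for game in games)


def euclidean(p, q):
    """Euclidean distance between two gamer-points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical example: Gamerscore earned per game by two gamers.
games = ["Game A", "Game B", "Game C"]
alice = gamer_point({"Game A": 750, "Game B": 250}, games)
bob = gamer_point({"Game A": 500, "Game C": 500}, games)
print(alice)  # (0.75, 0.25, 0.0)
print(bob)    # (0.5, 0.0, 0.5)
print(euclidean(alice, bob))
```

Here the total is summed from the per-game scores; reading it from the stored Gamerscore instead, as the METHODS section suggests, gives the same fractions only if every unlocked achievement was captured.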
Usernames were collected by crawling MyGamerCard.net's leaderboard and live tracker (which shows who is currently online on Xbox Live), as well as a few other sites such as 360voice.com. The achievements and associated Gamerscore values for each collected username were then scraped from the Xbox Live website (xbox.live.com) using regular-expression matching. The results of the crawling and scraping were stored in a MySQL database, which makes it easy to select data for analysis.

To cluster gamers by the fraction of their Gamerscore earned in each game, each gamer was treated as a point (a "gamer-point") in an n-dimensional space, where n is the number of distinct games in the dataset. The ith coordinate of a gamer's n-tuple is the fraction of that gamer's Gamerscore contributed by the ith game.

We implemented three common clustering methods: hierarchical, k-means, and model-based.

Hierarchical (agglomerative) clustering follows this algorithm:
1. Merge the two gamer-points that are closest together, i.e., that have the smallest Euclidean distance between them.
2. Repeat step 1, except that either point may be replaced by the centroid of an existing cluster.
3. Stop when a predetermined number of clusters remains.

K-means clustering assigns each gamer-point to the cluster whose centroid lies nearest, where a cluster's centroid is computed by averaging each of the n dimensions separately over the cluster's members. The k-means algorithm follows five core steps:
1. Choose the number of clusters, k.
2. Randomly generate each cluster's centroid.
3. Assign each gamer-point to the centroid at the smallest Euclidean distance.
4. Recompute each cluster's centroid from its assigned gamer-points.
5. Repeat steps 3 and 4 until no cluster assignment changes.

//TODO: Describe model-based clustering and how it chooses a k value on which to run k-means.
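The hierarchical and k-means procedures described above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation. The hierarchical version uses centroid linkage, matching step 2 of that algorithm. For k-means step 2, this sketch draws the k initial centroids as k distinct gamer-points chosen at random (Forgy initialization), a swapped-in choice that avoids the empty clusters that uniformly random centroids often produce; the function names and test points are hypothetical.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def hierarchical(points, target_k):
    """Agglomerative clustering with centroid linkage: repeatedly merge the
    two clusters whose centroids are closest until target_k clusters remain."""
    clusters = [[p] for p in points]  # start with one cluster per gamer-point
    while len(clusters) > target_k:
        cents = [centroid(c) for c in clusters]
        # Find the pair of clusters with the smallest centroid distance.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: dist(cents[ab[0]], cents[ab[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i still refers to the merged cluster
    return clusters


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's k-means: assign each point to its nearest centroid, recompute
    the centroids, and repeat until the assignment stops changing."""
    rng = random.Random(seed)
    # Step 2 (assumed variant): k distinct gamer-points as initial centroids.
    cents = list(rng.sample(points, k))
    assign = None
    for _ in range(max_iter):
        # Step 3: nearest-centroid assignment.
        new_assign = [min(range(k), key=lambda c: dist(p, cents[c])) for p in points]
        if new_assign == assign:  # Step 5: stop when nothing changes
            break
        assign = new_assign
        # Step 4: recompute centroids; an empty cluster keeps its old centroid.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                cents[c] = centroid(members)
    return assign, cents


# Hypothetical 2-D gamer-points forming two obvious groups.
pts = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
print(hierarchical(pts, 2))
print(kmeans(pts, 2)[0])
```

The hierarchical loop recomputes every pairwise centroid distance on each merge, so it is only practical for small inputs. For a dataset of real size, library routines such as SciPy's `scipy.cluster.hierarchy.linkage` with `method='centroid'` or scikit-learn's `sklearn.cluster.KMeans` are the more likely choice.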
For the game descriptive statistics, we examined summary statistics and created histograms to convey the distributions visually.
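The section does not specify which statistics were computed, so the following is only a sketch of one way to produce summary statistics and a simple equal-width histogram with Python's standard `statistics` module. The per-game measure (how many gamers in the dataset unlocked at least one achievement in each game, as a rough popularity proxy), the helper names, and the input numbers are hypothetical and illustrative, not results.

```python
import statistics


def summarize(values):
    """Summary statistics for a list of per-game numeric values."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


def histogram(values, bins):
    """Count values into `bins` equal-width bins spanning [min, max].

    Returns (counts, edges); the maximum value is counted in the last bin.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # all values equal: avoid a zero width
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


def render(counts, edges):
    """Text bar chart: one line per bin, one '#' per value."""
    return [
        f"{edges[i]:8.1f} - {edges[i + 1]:8.1f} | {'#' * c}"
        for i, c in enumerate(counts)
    ]


# Hypothetical input: gamers with at least one achievement in each of eight
# games (illustrative values only).
players_per_game = [120, 45, 300, 80, 15, 60, 210, 95]
print(summarize(players_per_game))
counts, edges = histogram(players_per_game, 3)
print("\n".join(render(counts, edges)))
```

A plotting library would normally draw the histogram; the text chart simply keeps the sketch dependency-free.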