Draft11 - Statistics - University of Washington

Video Games and the
Communities They Foster
A statistical study of the video gamersphere: its
games, gamers, and societal gaming communities.
Colin Bayer
Aaron Miller
Zak Dehlawi
Patrick Carroll
The University of Washington
June 2008
INTRODUCTION
With the advent of Microsoft’s Xbox Live online gaming platform,
gaming has become a social event now more than ever. One can
therefore hypothesize that communities of gamers exist inside
today’s video gamersphere, and that each community shares
characteristic traits: games played in common, game genres, etc.
Furthermore, we can extract general descriptive statistics regarding
the games in our dataset: which are the most challenging, which are
the easiest, and which are the most and least popular.
Data clustering is the practice of segmenting a dataset into a
finite number of subsets. These subsets are called clusters, and the
data points that belong to each cluster share a common trait: often,
closeness to the cluster’s other members under a mathematical
measure of proximity.
Clustering gamers requires a metric of distance. Microsoft
requires each Xbox Live-enabled game to provide some number of
achievements, in-game goals that award Gamerscore points to users
who complete them. A user's Gamerscore is equal to the sum of the
Gamerscore points provided by each achievement that they have
completed, or "unlocked". The fraction of a user’s Gamerscore that
each game has contributed is a simplistic, yet useful, basis for a
distance metric.
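As a rough illustration of the two quantities just described, the sketch below computes a Gamerscore and the per-game fractions from a list of unlocked achievements. The sample data and the function names (`gamerscore`, `game_fractions`) are hypothetical and are not taken from the study's dataset or code.

```python
from collections import defaultdict

# Hypothetical sample data: (game title, points) for each unlocked achievement.
unlocked = [
    ("Halo 3", 50),
    ("Halo 3", 25),
    ("Gears of War", 25),
]

def gamerscore(achievements):
    """Total Gamerscore: the sum of points over all unlocked achievements."""
    return sum(points for _, points in achievements)

def game_fractions(achievements):
    """Fraction of the total Gamerscore contributed by each game."""
    total = gamerscore(achievements)
    if total == 0:
        return {}  # a gamer with no points has no meaningful fractions
    per_game = defaultdict(int)
    for game, points in achievements:
        per_game[game] += points
    return {game: points / total for game, points in per_game.items()}

fractions = game_fractions(unlocked)
```

By construction, the fractions for a gamer with a nonzero Gamerscore sum to 1.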
METHODS
Clustering gamers requires that the dataset contain every
achievement each gamer has unlocked during his/her gaming career,
along with its Gamerscore value. Additionally, to reduce computation,
the dataset should include each gamer’s Gamerscore. The dataset was
obtained in two pieces: first a list of gamer usernames, then their
achievements and associated Gamerscores. The usernames were
pulled by crawling MyGamerCard.net’s leaderboard and live-tracker
(which shows who is currently online in the Xbox Live system) as well
as a few other sites such as 360voice.com. The achievements and
associated Gamerscores per gamer were scraped from the Xbox Live
website (xbox.live.com) by applying regular expression matching to
the pages of the collected usernames. The results of the crawling and
scraping were inserted into a MySQL database, which affords easy data
selection for analysis.
To cluster gamers using the fraction of Gamerscore earned per game,
each gamer was treated as a point in the gamersphere’s n-dimensional
space, where n is the number of distinct games in the dataset. The ith
coordinate of the gamer’s n-tuple is the fraction of the gamer’s
Gamerscore contributed by the ith game.
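To make the representation concrete, the sketch below turns per-game fractions into gamer-points and measures the Euclidean distance between two of them. The game list, the two example gamers, and the helper names (`to_point`, `euclidean`) are hypothetical; games a gamer has never played are assumed to contribute a coordinate of 0.

```python
import math

def to_point(fractions, games):
    """Build the gamer's n-tuple: one coordinate per game, 0.0 if unplayed."""
    return [fractions.get(game, 0.0) for game in games]

def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical fixed ordering of the n distinct games in the dataset.
games = ["Gears of War", "Halo 3", "Viva Pinata"]

alice = to_point({"Halo 3": 0.75, "Gears of War": 0.25}, games)
bob = to_point({"Viva Pinata": 1.0}, games)

distance = euclidean(alice, bob)
```

Two gamers who split their Gamerscore across games in the same proportions end up at distance 0, regardless of how large their totals are.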
We implemented three common clustering methods: hierarchical,
k-means, and model-based. Hierarchical clustering follows this
algorithm:
1. Cluster the two gamers whose gamer-points are closest together,
i.e. have the smallest Euclidean distance between them.
2. Repeat the initial step; however, either of the gamers may be
replaced by the centroid of an existing cluster.
3. Stop when the number of clusters reaches a decided-upon
target.
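The steps above can be sketched as a simple bottom-up merge loop. This is a minimal illustration rather than the implementation used in the study: it assumes every gamer starts as a singleton cluster, that clusters are compared by the distance between their centroids, and that merging stops at a caller-supplied `target_clusters` (a hypothetical parameter name).

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points):
    """Average each dimension separately over the given points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def hierarchical(points, target_clusters):
    """Merge the two closest clusters (by centroid) until the target is reached."""
    clusters = [[p] for p in points]  # assumption: start from singletons
    while len(clusters) > target_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclidean(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Hypothetical gamer-points in a two-game space.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
result = hierarchical(points, target_clusters=2)
```

The pairwise search is recomputed on every pass, which keeps the sketch short but is slow for large datasets; caching distances would be the obvious refinement.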
K-Means clustering assigns each n-dimensional gamer-point to the
cluster whose centroid lies nearest. A cluster’s centroid is computed
by averaging each of the n dimensions separately over the cluster’s
members. The K-Means algorithm follows five core steps:
1. Assume a finite number of clusters, k, exists.
2. Randomly generate each cluster’s centroid.
3. Assign each gamer-point to the centroid at the minimum Euclidean
distance from it.
4. Recompute each cluster’s centroid.
5. Repeat Steps 3 and 4 until no change in cluster assignment
occurs.
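The five steps can be sketched as follows. This is an illustrative version under stated assumptions, not the study's code: initial centroids are drawn at random from the data points themselves (one common way to realize Step 2), a cluster that loses all its members keeps its previous centroid, and `seed` and `max_iter` are hypothetical parameters added for reproducibility and safety.

```python
import math
import random

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points):
    """Average each dimension separately over the given points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 2 (assumption): pick k distinct data points as starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Step 5: stop once no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: recompute centroids; empty clusters keep their old centroid.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = centroid(members)
    return labels, centroids

# Hypothetical gamer-points in a two-game space.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels, centroids = k_means(points, k=2)
```

Because the starting centroids are random, different seeds can yield different label numbers, and on less cleanly separated data they can yield different clusterings as well.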
//TODO: Describe model-based clustering and how it chooses a k-value for you to k-means cluster on.
For descriptive statistics on the games, we examined summary
statistics and created histograms that convey them visually.
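A minimal sketch of this kind of summary is shown below. The measured quantity (number of gamers in the dataset who have played each game), the values, and the bin width are all hypothetical, chosen only to show the mechanics; the text histogram stands in for a plotted one.

```python
import statistics

# Hypothetical values: gamers in the dataset who have played each game.
players_per_game = [3, 8, 12, 15, 21, 22, 35, 40]

summary = {
    "min": min(players_per_game),
    "max": max(players_per_game),
    "mean": statistics.mean(players_per_game),
    "median": statistics.median(players_per_game),
}

def histogram(values, bin_width):
    """Count values per bin; keys are the inclusive lower edge of each bin."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

bin_width = 10
bins = histogram(players_per_game, bin_width)
for start, count in bins.items():
    # One '#' per game whose player count falls in this bin.
    print(f"{start:>3}-{start + bin_width - 1:<3} {'#' * count}")
```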