April 8, 2020
Data Analysis on
Massive Online Game Logs
Dora Cai – NCSA, Univ. of Illinois
Growing Popularity of Online Games
• 135 million gamers are playing
worldwide
• Thousands of game titles have
been developed
• Enormous game logs have been
generated and collected
• Game logs are a unique resource
for Social Science studies
• Many researchers are working
on game log analysis
The Research Team
Started in 2007, about 20 members
University of Illinois at Urbana-Champaign
Professor Marshall Scott Poole, post-doctoral scholars and PhD students
Northwestern University
Professor Noshir Contractor, post-doctoral scholars and PhD students
University of Southern California
Professor Dmitri Williams and PhD students
University of Minnesota - Twin Cities
Professor Jaideep Srivastava and PhD students
Project Data Flow
Gordon Cluster
Internet Players
Game Logs
UIUC Database
Analysis Software
Research Issues in Game Log Analysis
Are there social networks behind the
scene?
What are the characteristics of the
social networks in game play?
Is a player’s behavior predictable?
Does a player’s behavior reflect his/her
personality?
What is the relationship between the
virtual world and the real world?
What is the impact of game play on
a player’s personal life?
Does team assembly improve play
performance?
Project Achievement
Project has been funded by NSF, ARI, AFRL, and ARL
More than 40 conference and journal papers have been published
More than 30 graduate students have been trained
8 PhD students who worked on this project have graduated
A comprehensive game log database has been constructed
Project has attracted collaborations from many academic institutions
and game companies
A spinoff company has been created by two of the PIs
My Involvement in the Project
Joined the project in 2008
Construct and maintain a game log database
(4.5TB)
Integrate game logs in 3 languages (English,
Chinese and Japanese) from 4 online games
(EverQuest II, Chevalier’s Romance 3, Dragon’s Nest,
EVE Online) into a single database
Help researchers effectively use HPC and
databases in their research
Work with the research team:
Build prediction models based on players’
behavior
Design and implement the algorithms for group
detection
Visualize the social networks in online games
A New Tool: SocialMapExplorer
A web-based application for visualizing the social networks of
online games
An application implemented using the Google Maps API, HTML, and
JavaScript
A highly interactive tool: Users can choose analysis variables,
aggregation levels, time periods, and location regions
A tool using visual features (color, size, shape, weight and
font) to represent various network features
A tool for visualizing data on a real map and tightly combining
time and spatial information with other study attributes
A tool capable of processing a terabyte-scale dataset with
a complex data structure
3 modules: NetViewer, GroupDetector, and
CorrelationFinder
Work Flow for SocialMapExplorer
Step 1: Data summarization
Apply data-mining/data-warehouse techniques to
construct materialized views on data cubes
Step 2: Geocoding
Match players’ zip-codes against an official USA zip-code book and assign latitude/longitude
coordinates to each player
Step 3: Data visualization
Visualize data on real maps
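Step 1 can be sketched as follows. This is a minimal illustration, not the project's actual schema: the table and column names (game_log, daily_summary, zip_code, ts) are hypothetical, and because SQLite has no materialized views, a precomputed summary table stands in for one.

```python
import sqlite3

def build_daily_summary(rows):
    """Aggregate raw log rows into a precomputed (zip_code, day) summary table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical raw log layout: one row per logged event.
    conn.execute(
        "CREATE TABLE game_log (player_id INTEGER, zip_code TEXT, ts TEXT, event TEXT)"
    )
    conn.executemany("INSERT INTO game_log VALUES (?, ?, ?, ?)", rows)
    # SQLite lacks materialized views, so the aggregate is stored as a table.
    conn.execute(
        """
        CREATE TABLE daily_summary AS
        SELECT zip_code,
               substr(ts, 1, 10) AS day,
               COUNT(*) AS n_events,
               COUNT(DISTINCT player_id) AS n_players
        FROM game_log
        GROUP BY zip_code, substr(ts, 1, 10)
        """
    )
    result = conn.execute(
        "SELECT zip_code, day, n_events, n_players "
        "FROM daily_summary ORDER BY zip_code, day"
    ).fetchall()
    conn.close()
    return result

# Made-up sample rows, for illustration only.
sample_rows = [
    (1, "61801", "2010-03-01T10:00:00", "chat"),
    (1, "61801", "2010-03-01T10:05:00", "trade"),
    (2, "61801", "2010-03-01T11:00:00", "chat"),
    (3, "90007", "2010-03-02T09:30:00", "chat"),
]
summary = build_daily_summary(sample_rows)
```

Later queries read the small summary table instead of rescanning the raw logs, which is the point of precomputing aggregates per time and location level.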
Player     Zip-Code   Latitude     Longitude
1234567    15603      -122.26252   37.90194
2345678    44327      -56.77754    23.78321
…          …          …            …
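The geocoding step amounts to a lookup of each player's zip-code in the zip-code book. A minimal sketch, assuming the book is already loaded into a dictionary; the function name, the ZIP+4 normalization, and the single sample entry (approximate coordinates) are illustrative assumptions.

```python
def geocode_players(players, zip_book):
    """Attach (latitude, longitude) to each player via a zip-code lookup."""
    located = {}
    unmatched = []
    for player_id, zip_code in players:
        # Normalize whitespace and ZIP+4 forms such as "61801-2345".
        key = zip_code.strip()[:5]
        coords = zip_book.get(key)
        if coords is None:
            unmatched.append(player_id)  # keep for manual review
        else:
            located[player_id] = coords
    return located, unmatched

# Hypothetical zip-code book with one approximate entry.
zip_book = {"61801": (40.11, -88.20)}
players = [(101, "61801"), (102, " 61801-2345"), (103, "00000")]
located, unmatched = geocode_players(players, zip_book)
```

With a dictionary, each lookup is constant time on average, so the cost grows with the number of players plus the size of the book rather than their product.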
Module: NetViewer
Designed for analyzing network dynamics by
visualizing social networks in time series
Trace networking events and make the linkage
between involved parties
Able to choose different data sets based on
user’s interest
Display networks at different intervals:
minute/hour/day
Run in two modes: dynamic and static
Use AJAX to automatically reload
parts of the display
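Displaying networks at minute/hour/day intervals can be sketched as bucketing events into time snapshots. This is an assumed approach, not NetViewer's actual code; the event tuple layout (timestamp, sender, receiver) and all names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Truncate a timestamp to the chosen interval by formatting it.
INTERVAL_FORMATS = {
    "minute": "%Y-%m-%d %H:%M",
    "hour": "%Y-%m-%d %H",
    "day": "%Y-%m-%d",
}

def snapshot_edges(events, interval="hour"):
    """Group (timestamp, sender, receiver) events into per-interval edge counts."""
    fmt = INTERVAL_FORMATS[interval]
    snapshots = defaultdict(lambda: defaultdict(int))
    for ts, sender, receiver in events:
        bucket = datetime.fromisoformat(ts).strftime(fmt)
        edge = tuple(sorted((sender, receiver)))  # treat links as undirected
        snapshots[bucket][edge] += 1
    return {bucket: dict(edges) for bucket, edges in sorted(snapshots.items())}

# Made-up chat events, for illustration only.
events = [
    ("2010-03-01T10:05:00", "a", "b"),
    ("2010-03-01T10:40:00", "b", "a"),
    ("2010-03-01T11:15:00", "a", "c"),
]
hourly = snapshot_edges(events, "hour")
daily = snapshot_edges(events, "day")
```

Each snapshot is one frame of the time series; a dynamic mode could step through the frames while a static mode shows a single one.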
NetViewer - Chat Network
Module: GroupDetector
Designed to detect groups and visualize group
evolution
Scan game logs and identify the trigger events
for group reorganization
Able to choose game tasks and time periods
Display single group or multiple groups
Can run in two modes: dynamic and static
Use AJAX to automatically reload
parts of the display
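Group evolution can be sketched as replaying membership events in order. The slides do not say which log events trigger a group reorganization, so the "join" and "leave" actions and the event layout below are assumptions for illustration.

```python
def group_evolution(events):
    """Replay membership events and record a group's members after each change."""
    groups = {}
    history = []
    for ts, action, group_id, player in events:
        if action not in ("join", "leave"):
            continue  # assumed: only join/leave reorganize a group
        members = groups.setdefault(group_id, set())
        if action == "join":
            members.add(player)
        else:
            members.discard(player)
        # Store an immutable copy so later changes do not alter the history.
        history.append((ts, group_id, frozenset(members)))
    return history

# Made-up events: (timestamp, action, group id, player).
events = [
    (1, "join", "g1", "a"),
    (2, "join", "g1", "b"),
    (3, "chat", "g1", "a"),
    (4, "leave", "g1", "a"),
]
history = group_evolution(events)
```

Each history entry is a snapshot that a viewer could draw in sequence to show how one or several groups change during a task.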
GroupDetector - Group evolution in a task
Module: CorrelationFinder
Designed to discover the correlation between census
data and game play
Visualize census variables as the background colors at
the county level, and visualize the players’ behaviors as
the foreground marker and links
Reveal hidden correlations by overlapping two-layer
graphs
Able to choose analysis variables from census data and
game behavior data
Able to select location and regions based on user’s
interest
Visualize variables in a quantitative manner
Verify correlation by statistical methods
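The slides do not name the statistical method used for verification. As one possibility, a Pearson correlation coefficient between a census variable and a game-behavior variable could be computed per region; the function below is a plain sketch under that assumption, with made-up inputs.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up values: a census variable and a play variable for four regions.
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```

A value near +1 or -1 would support a pattern seen on the map, while a value near 0 would suggest the visual overlap is coincidental.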
Is there a correlation
between them?
CorrelationFinder – Overlapping Technique
Two layers:
Each county of California is filled using gradient colors based on the
population density
Player volume (aggregated to the zip-code level) is represented as
markers with gradient colors
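Filling a county or marker with a gradient color can be sketched as linear interpolation between two endpoint colors. The endpoint colors, the clamping behavior, and the function name are assumptions, not the tool's actual palette.

```python
def gradient_color(value, vmin, vmax, low=(255, 255, 204), high=(189, 0, 38)):
    """Map a value in [vmin, vmax] to a hex color between two RGB endpoints."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Clamp so out-of-range values take the nearest endpoint color.
    t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    rgb = [round(lo + t * (hi - lo)) for lo, hi in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)

# Hypothetical density range of 0 to 100.
lowest = gradient_color(0, 0, 100)
highest = gradient_color(100, 0, 100)
clamped = gradient_color(250, 0, 100)
```

The same function can color both layers, for example population density per county in the background and player volume per zip-code in the foreground, each with its own value range.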
Two layers:
CorrelationFinder:
Median Age with Conversation Volume
Computation Complexity
Major computation cost:
Data Summarization
m – number of rows (R) in game logs
n – number of time and location attributes (A)
p – number of aggregation levels (L)
Geocoding
m – number of players (P) in game logs
n – number of zip-codes in the zip-code book (Z)
Data Visualization
x – number of snapshots in time series (T)
m – number of edges (E) in drawing
n – number of markers (R) in drawing
p – number of links (L) in drawing
Data Analysis on Gordon
Gordon’s many compute nodes with large memory speed up the
data processing
On a standalone server with 8 CPUs and 12 GB RAM, data summarization
and geocoding took over 500 hours
On Gordon: 8 parallel jobs, each using 16 cores, all finished within 48
hours
The software stack supported on Gordon, especially R, allows the
project to run lengthy and complex data analyses
The system support group and consulting office at SDSC always
provide prompt services
We appreciate the effort of the SDSC’s Gordon team