April 8, 2020

Data Analysis on Massive Online Game Logs
Dora Cai, NCSA, Univ. of Illinois

Growing Popularity of Online Games
• 135 million gamers are playing worldwide
• Thousands of game titles have been developed
• Enormous volumes of game logs have been generated and collected
• Game logs are a unique resource for social science studies
• Many researchers are working on game log analysis

The Research Team
Started in 2007, about 20 members
• University of Illinois at Urbana-Champaign: Professor Marshall Scott Poole, post-doctoral scholars and PhD students
• Northwestern University: Professor Noshir Contractor, post-doctoral scholars and PhD students
• University of Southern California: Professor Dmitri Williams and PhD students
• University of Minnesota - Twin Cities: Professor Jaideep Srivastava and PhD students

Project Data Flow
(Diagram: Players, Internet, Game Logs, UIUC Database, Gordon Cluster, Analysis Software)

Research Issues in Game Log Analysis
• Are there social networks behind the scenes?
• What are the characteristics of the social networks in game play?
• Is a player's behavior predictable?
• Does a player's behavior reflect his/her personality?
• What is the relationship between the virtual world and the real world?
• What is the impact of game play on a player's personal life?
• Does team assembly improve play performance?
Project Achievements
• The project has been funded by NSF, ARI, AFRL, and ARL
• More than 40 conference and journal papers have been published
• More than 30 graduate students have been trained
• 8 PhD students who worked on this project have graduated
• A comprehensive game log database has been constructed
• The project has attracted collaborations from many academic institutions and game companies
• A spinoff company has been created by two of the PIs

My Involvement in the Project
• Joined the project in 2008
• Construct and maintain a game log database (4.5 TB)
• Integrate game logs in 3 languages (English, Chinese, and Japanese) from 4 online games (EverQuest II, Chevalier's Romance 3, Dragon's Nest, Eve Online) into one single database
• Help researchers use HPC and databases effectively in their research
• Work with the research team to:
  - Build prediction models based on players' behavior
  - Design and implement algorithms for group detection
  - Visualize the social networks in online games

A New Tool: SocialMapExplorer
• A web-based application for visualizing the social networks of online games
• Implemented using the Google Maps API, HTML, and JavaScript
• Highly interactive: users can choose analysis variables, aggregation levels, time periods, and location regions
• Uses visual features (color, size, shape, weight, and font) to represent various network features
• Visualizes data on a real map, tightly combining time and spatial information with other study attributes
• Capable of processing a terabyte-scale dataset with a complex data structure
• 3 modules: NetViewer, GroupDetector, and CorrelationFinder

Work Flow for SocialMapExplorer
• Step 1: Data summarization. Apply data-mining/data-warehouse techniques to construct materialized views on data cubes
• Step 2: Geocoding. Match players' zip codes against an official USA zip-code book and assign latitude/longitude coordinates to each player
• Step 3: Data visualization. Visualize data on real maps

Example geocoding output:
Player     Zip-Code   Longitude    Latitude
1234567    15603      -122.26252   37.90194
2345678    44327      -56.77754    23.78321
……         …..        ……           ……

Module: NetViewer
• Designed for analyzing network dynamics by visualizing social networks as a time series
• Traces networking events and links the parties involved
• Lets users choose different data sets based on their interests
• Displays networks at different intervals: minute/hour/day
• Runs in two modes: dynamic and static
• Uses AJAX to automatically reload part of the display

(Figure: NetViewer - Chat Network)

Module: GroupDetector
• Designed to detect groups and visualize group evolution
• Scans game logs and identifies the trigger events for group reorganization
• Lets users choose game tasks and time periods
• Displays a single group or multiple groups
• Runs in two modes: dynamic and static
• Uses AJAX to automatically reload part of the display

(Figure: GroupDetector - Group evolution in a task)

Module: CorrelationFinder
• Designed to discover correlations between census data and game play
• Visualizes census variables as background colors at the county level, and players' behaviors as foreground markers and links
• Reveals hidden correlations by overlapping two-layer graphs
• Lets users choose analysis variables from census data and game behavior data
• Lets users select locations and regions based on their interests
• Visualizes variables quantitatively
• Verifies correlations using statistical methods

Is there a correlation between them?
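Step 1 of the workflow (data summarization via materialized views on data cubes) can be sketched as precomputing one count table per combination of time level and location level, so that the visualization layer only looks up small aggregated tables instead of rescanning the raw logs. This is a minimal sketch, not the project's implementation: the record layout `(player_id, timestamp, zip, county, state)`, the level names, and the function `build_cube` are hypothetical.

```python
from collections import Counter
from datetime import datetime
from itertools import product

# Hypothetical log layout: (player_id, ISO-8601 timestamp, zip, county, state).
# The real game-log schema is not described in the slides.
TIME_LEVELS = {
    "hour": "%Y-%m-%d %H:00",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
}
LOCATION_LEVELS = {"zip": 2, "county": 3, "state": 4}  # column index per level


def build_cube(logs):
    """Precompute one count table (a 'materialized view') for every
    combination of time level and location level."""
    # Parse each timestamp once instead of once per view.
    parsed = [(datetime.fromisoformat(rec[1]), rec) for rec in logs]
    views = {}
    for (t_name, t_fmt), (l_name, l_idx) in product(
        TIME_LEVELS.items(), LOCATION_LEVELS.items()
    ):
        counts = Counter()
        for ts, rec in parsed:
            counts[(ts.strftime(t_fmt), rec[l_idx])] += 1
        views[(t_name, l_name)] = counts
    return views


if __name__ == "__main__":
    sample = [
        ("p1", "2008-03-01T10:15:00", "12345", "County A", "CA"),
        ("p2", "2008-03-01T10:45:00", "12346", "County A", "CA"),
        ("p3", "2008-03-01T14:05:00", "54321", "County B", "PA"),
    ]
    cube = build_cube(sample)
    print(cube[("day", "state")])
```

Each raw row is visited once per view, so the precomputation cost grows with the number of rows times the number of level combinations; that cost is paid once, up front, and interactive queries afterwards touch only the small tables.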
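Step 2 (geocoding) amounts to a join between the players' zip codes and a zip-code book. The sketch below, under stated assumptions, indexes the book in a hash table so the join costs roughly m + n lookups (m players, n zip codes) rather than the m × n of a nested scan; the slides do not say which join strategy the project used. The helper names, the ZIP normalization rule (keep the first five digits, restore leading zeros lost when ZIPs were stored as integers), and the sample coordinates are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)


def normalize_zip(raw: str) -> str:
    """Keep the 5-digit ZIP and restore leading zeros lost when ZIPs were
    stored as integers (e.g. '2138' -> '02138'). Assumed cleaning rule."""
    return raw.strip()[:5].zfill(5)


def load_zip_book(rows: Iterable[Tuple[str, float, float]]) -> Dict[str, Coord]:
    """Index a zip-code book given as (zip, latitude, longitude) rows."""
    return {normalize_zip(z): (float(lat), float(lon)) for z, lat, lon in rows}


def geocode_players(
    players: Iterable[Tuple[str, str]], zip_book: Dict[str, Coord]
) -> Dict[str, Optional[Coord]]:
    """Attach (latitude, longitude) to each player; None if the ZIP is unknown."""
    return {pid: zip_book.get(normalize_zip(z)) for pid, z in players}


if __name__ == "__main__":
    # Hypothetical book entries, not real coordinates for these ZIPs.
    book = load_zip_book([("12345", 40.0, -75.0), ("02138", 42.0, -71.0)])
    print(geocode_players([("p1", "12345-6789"), ("p2", "2138"), ("p3", "99999")], book))
```

Returning `None` for unmatched zip codes keeps those players visible, so they can be counted and reported rather than silently dropped from the map.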
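NetViewer displays a network at minute, hour, or day intervals. One way to prepare such time-series snapshots is to truncate each chat event's timestamp to the chosen interval and accumulate weighted, undirected edges per bucket. This is a hedged sketch: the event layout `(timestamp, sender, receiver)` and the function `chat_snapshots` are hypothetical, and treating chat links as undirected is a design assumption.

```python
from collections import Counter, defaultdict
from datetime import datetime

# strftime patterns used to truncate a timestamp to each display interval.
INTERVAL_FORMATS = {
    "minute": "%Y-%m-%d %H:%M",
    "hour": "%Y-%m-%d %H:00",
    "day": "%Y-%m-%d",
}


def chat_snapshots(events, interval="hour"):
    """Group chat events into one weighted edge list per time bucket.

    events: iterable of (ISO-8601 timestamp, sender, receiver).
    Returns {bucket_label: Counter({(player_a, player_b): message_count})},
    ordered by bucket label.
    """
    fmt = INTERVAL_FORMATS[interval]
    snapshots = defaultdict(Counter)
    for ts, sender, receiver in events:
        if sender == receiver:
            continue  # ignore self-messages; they add no link between parties
        bucket = datetime.fromisoformat(ts).strftime(fmt)
        edge = tuple(sorted((sender, receiver)))  # undirected: (a, b) == (b, a)
        snapshots[bucket][edge] += 1
    return dict(sorted(snapshots.items()))
```

Because the bucket labels sort chronologically, the returned dictionary can be replayed in order to drive a dynamic display, or a single bucket can be drawn for the static mode.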
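GroupDetector identifies trigger events for group reorganization. The slides do not describe its algorithm, so the following is only a simple stand-in: it compares consecutive membership snapshots of one group and reports each join and leave as a candidate trigger event. The snapshot format and the function `membership_changes` are hypothetical.

```python
def membership_changes(snapshots):
    """Diff consecutive membership snapshots of a single group.

    snapshots: time-ordered list of (timestamp, iterable_of_member_ids).
    Returns a list of (timestamp, "join" | "leave", member_id); within a
    timestamp, joins come first, and members are listed in sorted order.
    """
    events = []
    previous = set()
    for ts, members in snapshots:
        current = set(members)
        for player in sorted(current - previous):
            events.append((ts, "join", player))
        for player in sorted(previous - current):
            events.append((ts, "leave", player))
        previous = current
    return events
```

A real detector would likely also need to handle splits and merges across several groups; this sketch covers only membership turnover within one group.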
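CorrelationFinder verifies correlations "using statistical methods" without naming them. As one plausible choice, the sketch below joins a county-level census variable with county-level player volume and computes the Pearson correlation coefficient. Pearson is an assumption here, the dictionary inputs and function names are hypothetical, and no significance test is included.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


def county_correlation(census, volume):
    """Correlate a census variable with player volume over shared counties.

    census, volume: {county_name: numeric value}. Counties present in only
    one source are skipped. Returns (r, number_of_counties_used).
    """
    counties = sorted(census.keys() & volume.keys())
    r = pearson([census[c] for c in counties], [volume[c] for c in counties])
    return r, len(counties)
```

Returning the number of counties actually used makes it easy to spot when the census and game data overlap on only a few regions, which would make the coefficient unreliable.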
CorrelationFinder: Overlapping Technique
Two layers:
• Each county of California is filled with gradient colors based on population density
• Player volume (aggregated to the zip-code level) is represented as markers with gradient colors

(Figure: CorrelationFinder: Median Age with Conversation Volume)

Computation Complexity
Major computation costs:
• Data summarization
  - m: number of rows (R) in game logs
  - n: number of time and location attributes (A)
  - p: number of aggregation levels (L)
• Geocoding
  - m: number of players (P) in game logs
  - n: number of zip codes in the zip-code book (Z)
• Data visualization
  - x: number of snapshots in the time series (T)
  - m: number of edges (E) in the drawing
  - n: number of markers (R) in the drawing
  - p: number of links (L) in the drawing

Data Analysis on Gordon
• Gordon's many memory-rich compute nodes sped up the data processing
• On a standalone server with 8 CPUs and 12 GB RAM, data summarization and geocoding took over 500 hours
• On Gordon, 8 parallel jobs, each using 16 cores, all finished within 48 hours
• The software stack supported on Gordon, especially R, allowed the project to run lengthy and complex data analyses
• The system support group and consulting office at SDSC always provided prompt service
• We appreciate the effort of SDSC's Gordon team
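The timing figures for Gordon imply a lower bound on the speedup: the standalone run took more than 500 hours and the parallel run finished within 48 hours. Comparing that with the growth in processor count gives a rough sense of scaling, assuming each of the server's "8 CPUs" counts as one core (the slides do not say).

```latex
\text{speedup} \;\ge\; \frac{500\ \text{h}}{48\ \text{h}} \approx 10.4,
\qquad
\frac{8\ \text{jobs} \times 16\ \text{cores}}{8\ \text{CPUs}} = 16
```

Because both timings are bounds rather than exact measurements, the true speedup may be higher than 10.4; differences in memory per node and I/O between the two systems also affect the comparison.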