On Statistical Approaches to Online Game Bot Detection
P9592205 廖宜財
Advisor: Dr. 朱浩華
Co-advisor: Dr. 陳昇瑋
Date: 2009/7/21

Outline
• Introduction
• Related Work
• FPS Bot Detection
• Rhythm Bot Detection
• Future Work
• Conclusion
2016/6/28 2/80

MMOG
• MMOGs (Massively Multiplayer Online Games) have become one of the most popular Internet activities
Source: www.mic.iii.org.tw

MMOG Genres
• RPG (Role Playing Game)
• SLG (Simulation Game)
• FPS (First-Person Shooter) Game
• Gaming
• Rhythm Game
• …

Major Challenge: Cheating
• Why cheat?
  – To profit from the game
  – Self-transcendence
• Game cheating types
  – Hacking / game exploits
    • Require expert programmers
    • Serious, but can be prevented by hardware or game architecture
  – Game bots
    • AI programs that can perform some tasks in place of gamers
    • Anyone can download and use them!

Game Bots
• Unfair competition
  – Bots accumulate rewards efficiently, 24 hours a day
• Economy problem
  – Easy money → currency inflation
  – Fewer newcomers join
  – The game provider will close the game

The Challenge of Bot Detection
• Hard to detect
  – Bots obey the game rules perfectly
  – No general detection method is available today

Our Goal for Bot Detection
• Propose bot detection approaches that are as general as possible
• Passive detection
  – No intrusion into players' gaming experience
• No client software is required

Related Work - Bot Prevention (1/2)
• Human intelligence
  – Detection: bots may show regular or peculiar behavior
  – Confirmation: most bots go offline immediately when they detect a GM (Game Master)
• Pros and cons
  – Easy
  – Many GMs mean high salary costs
  – Judgment relies on the GM's subjective feeling

Related Work - Bot Prevention (2/2)
• CAPTCHA tests (Completely Automated Public Turing test to tell Computers and Humans Apart)
  – Distorted images are hard for bots to recognize
• Pros and cons
  – Accurate
  – Annoys innocent players

Related Work - Bot Detection
• Process monitoring at the client side
  – Can be evaded by constantly changing the bot program's signature
• Traffic analysis at the network
  – Can be evaded by removing the regularity of bot traffic with heavy-tailed random delays
• Aiming-bot detection using a Dynamic Bayesian Network
  – Specific to aiming bots, which help aim at the target accurately

Progress
• Introduction
• Related Work
• FPS Bot Detection
• Rhythm Bot Detection
• Future Work
• Conclusion

Quake 2
• A classic FPS game
  – “The Best Game Ever” by PC Gamer in 1998
  – Sold over one million copies
• Many real-life human traces are available on the Internet
  – The game has in-game recording functions
• Open source (including Quake 3)
  – Scholarly research
  – A lot of bots

A Screen Shot of Quake 2

Data Collection (1/2)
• Human traces can be downloaded from websites
  – Competition records
  – Shared experience
  – Players showing off
• Traces downloaded from:
  – GotFrag Quake http://www.gotfrag.com/
  – Planet Quake http://planetquake.gamespy.com/
  – Demo Squad http://q2scene.net/ds/
  – Revilla Quake Sites http://www.revilla.nildram.co.uk/

Data Collection (2/2)
• Bot traces collected on our own Quake server
  – CR BOT 1.14 (http://arton.cunst.net/quake/crbot/)
  – Eraser Bot 1.01 (http://impact.frag.com)
  – ICE Bot 1.0 (http://planetquake.com/ice)
• For automation, we implemented a bot-launcher program
  – Launch Quake 2 and the bot program
  – Monitor the time spent
  – Create bots
  – Select the bot's AI
  – …

Screen Shot of “Q2BotLauncher”
[Screenshot annotations: bot selection, game map name, in-game trace recording file name]

Trace Data Description
• Quake 2 official trace format
  – DM2
• Types of DM2 file format
  – Client-side recording
    • Player's point of view
    • Replayable
  – Server-side recording
    • Server's point of view
    • Not replayable
[Diagram labels: players' traces, bots' traces]

DM2 Format Reference
• Source packages from id Software
  – The newest release, from the end of November 1998, with both Mission Packs
  – Plain game
    • Linux version: ftp://ftp.idsoftware.com/pub/quake2/source/q2src320.shar.Z
    • Windows version: ftp://ftp.idsoftware.com/pub/quake2/source/q2src320.exe
  – Xatrix Quake II Mission Pack 1
    • Linux version: ftp://ftp.idsoftware.com/pub/quake2/source/xatrixsrc320.shar.Z
    • Windows version: ftp://ftp.idsoftware.com/pub/quake2/source/xatrixsrc320.exe
  – Rogue Quake II Mission Pack 2
    • Linux version: ftp://ftp.idsoftware.com/pub/quake2/source/roguesrc320.shar.Z
    • Windows version: ftp://ftp.idsoftware.com/pub/quake2/source/roguesrc320.exe

Parsing Example of DM2
[Figure: hex dump of a DM2 file annotated with block size, MsgID, Frame Move, and EOF]

Example: Frame Move
[Figure: annotated Frame Move message. Labels: server frame, delta frame, area bits (read n bytes), position, view angle, weapon state, read size * 2 bytes of state, player tag, player flags (must be parsed to read other players' data), package entity tag, total entities, entity number (0x80 = next byte needed), entity data]

Tools We Developed
• DM2 Parser
  – DM2 is inconvenient to parse with statistical tools
  – Filters out the information we don't need
  – Converts traces to CSV format
• 3D trace viewer
  – Quake 2 is a 3D game, so we need to observe traces in 3D space to identify trace features

DM2 Parser

DM2 Parser Output
[Screenshot: output columns (Time) and (X, Y, Z)]

3D Viewer

Data Summary
• Each trace is cut into 1,000-second segments
• In total, 143.8 hours of traces were collected
• ICE Bot tends to stay in one place and try to ambush other players

Our Solution: Trajectory-based Detection
• Based on the avatar's movement trajectory in the game
• Applicable to all game genres where players control the avatar's movement directly
• The avatar's trajectory is high-dimensional (in both the time and spatial domains)

The Rationale behind Our Scheme
• The trajectory of an avatar controlled by a human player is hard to simulate for two reasons:
  – Complex context information: Players control the movement of avatars based on their knowledge, experience,
    intuition, and a great deal of information provided in the game.
  – Humans are not always logical or optimal
• How to model and simulate realistic movements is still an open question in the AI field

Data Representation
• We discard the z value of each trace
  – Avatars move on the ground; unless they fall or climb, the z-value does not change
[Figure: trajectory as a sequence of (X, Y) points over time t]

Trails of CRBot
[Figure annotations: buildings and obstacles; color density = visit frequency]

Trails of Eraser Bot

Trails of ICE Bot

Trails of Human Players
• Humans tend to explore all areas, to find items
• Human players normally avoid open areas, for safety
• Human players spend more time in narrow areas, for safety

Individual Trajectory
• Humans tend to change direction continuously and slightly
  – Searching
  – Quick movement
[Figure panels: CRBot, Human player, Eraser Bot, ICE Bot]

Detection Scheme
• Feature based
  – Given a segment of a trajectory, $\{x_t, y_t\}$, $1 \le t \le T$, we extract the following features from this two-dimensional time series
    • ON/OFF Activity
    • Pace
    • Path
    • Turn
• SVM classifier

ON/OFF Activity
• Move (ON) or stop (OFF)
• We define ON periods as consecutive periods of movement longer than 1 second, and OFF periods as the remaining time frames

Pace
• Avatars are allowed to move at different speeds
  – Running, slow walking, step-by-step walking, lateral shifting, and moving backwards
  – Fast move
    • The SD of the pace > 10 units
• Teleportation frequency
  – Some players like to jump into teleporters for quick, long-distance moves; some do not
  – Bots do so with their own probability
  – Avatar death
  – Offset in one second >= 60 units

Path: Linger
• Why linger?
  – Waiting for an item to spawn
  – Dog-fighting with enemies
  – Subconscious habit
• Linger definition: for two samples $(t_1, x_1, y_1)$ and $(t_2, x_2, y_2)$,
  $|t_1 - t_2| > 30$ sec. and $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} < 300$ units

Path: Smoothness
• The avatar moves in straight or zig-zag patterns
  – In a dog-fight, movement is clearly more zig-zag
  – When following cover, movement is more zig-zag
  – Subconscious habit
• Smoothness definition: for two samples $(t_1, x_1, y_1)$ and $(t_2, x_2, y_2)$, the number of times the avatar moves across the line $(x_1, y_1)$-$(x_2, y_2)$ during the period $(t_1, t_2)$

Path: Detourness
• The effectiveness of the avatar's movements
• Detourness definition: for two samples $(t_1, x_1, y_1)$ and $(t_2, x_2, y_2)$,
  $\text{detourness} = \dfrac{\text{length of the movement}}{\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}}$

Turn
• The frequency and amplitude with which avatars change direction
• Turn definition: for three samples $(t_1, x_1, y_1)$, $(t_2, x_2, y_2)$, $(t_3, x_3, y_3)$ with $t_3 - t_2 = t_2 - t_1$, $\theta$ is the change of direction at $(x_2, y_2)$; turns are counted at the thresholds $\theta \ge 30°$, $\theta \ge 60°$, and $\theta \ge 90°$

Feature Summary (1/2)
Feature | Observation | Summary
ON/OFF | ON: players' mean and SD are the highest | Humans tend to move all the time
ON/OFF | OFF: players' mean is higher than bots', and the SD is longer than bots' | Humans tend to wait longer after a long move
Pace | The four player types behave differently | Great discriminability

Feature Summary (2/2)
Feature | Observation | Summary
Path | Players' linger frequency is the lowest | Lingering in one place for a long time is dangerous
Path | Players' smoothness is the lowest | Players' movements are irregular and unpredictable
Path | Movements of human players are relatively more efficient | Human players tend to move efficiently from their current position to another place
Turn | Players' turn frequency is the highest at 30° and relatively lower at 90°; the average turn angle of human players is the lowest | Human players tend to adjust their direction continuously and slightly

Evaluation (1/2): Bot Detection
• Accuracy is > 90%: good accuracy
• Time > 800: result is OK

Evaluation (2/2): Player-Type Classification
• Accuracy > 90%
• Accuracy > 98%: good accuracy

Progress
• Introduction
• Related Work
• FPS Bot Detection
• Rhythm Bot Detection
• Future Work
• Conclusion

Casual Game
• Simple and easy gameplay
  – Poker, puzzle…
• Short sessions
  – The final stage can be reached quickly during a work break
• Free (item mall)

Market Scale of Casual Games in Taiwan
Source: www.mic.iii.org.tw

Rhythm Game
• One of the most popular genres of casual games

Let's Watch Some Dances

Why Rhythm Bot?
• Game bots: automated AI programs that can perform certain tasks in place of gamers
• Player performance is based on timing information, which is provided by the client
  – Easy to cheat
[Diagram: game client, Internet, game server]

The Challenges
• It is easy to construct a classifier to detect game bots, because
  – 1) human behavior contains more variance,
  – 2) each player has her own tendency to make errors and her own skill level
• BUT it is easy for bots to fight back by learning human behavior

Dancing Online (唯舞獨尊)
• Why DO (Dancing Online)?
  – One of the ten most-discussed games in the casual game forum
    • www.gamer.com.tw
  – We know the game designers
    • We can query some game rules from them

DO Bot
• Only one bot
  – DCO(舞林至尊)
    • http://www.play55.net/
  – Monthly fee: NT$300
    • The game itself is free!
• Some features
  – Autorun
  – Random song selection
  – ADV
    • Anti-Detection Value

Data Collection Challenges
• DO was not designed with any log preservation mechanism
• Adding this feature is not possible
  – Modifying the source code would require re-testing and reduce the profit
  – Contractual agreement
• We implemented a bot-like recording program
  – Records players' and/or the bot's traces from the client side

Recording Programming Principle
• Hook process
  – Inject our code into the address space of DO
    • http://www.codeproject.com/KB/DLL/DLL_Injection_tutorial.aspx/
• Code rewrite
  – Detour DO's original code to our code
    • http://research.microsoft.com/en-us/projects/detours/

Recording Program Architecture
[Diagram: components Detours, DO, and Recording Codes; relations: Inject, Monitor, Detours]

Screen Shots of Recording Program
[Screenshot annotations: recording program, state indicator]

Prediction
• The bot can finish all keys in a very short time
  – The probability distribution is centered
• The bot almost never makes an erroneous keypress
  – The error rate is very low (≒0)
• The bot's keypresses are not affected by key combinations
  – The probability distribution patterns are similar

Keystroke Mapping

The Sorted Traces

Data Collection: Real Players
• Recorded by our recording program
• Training data
  – Six real players playing songs of different bpm and/or different difficulty levels
• Player-type classification
  – 10 real players
  – 20 songs assigned in each of the 4 game levels

Data Collection: Bot
• Recorded by our recording program
• Autorun enabled
• Random song selection enabled
• For each run, ADV is set manually from 0 to 700ns and the difficulty level is selected from easy to top

Data Summary

Scheme: Inter-Keypress-Time Based Approach (1/3)
• Key combinations (16 combinations)
  – (↑、↓、←、→) x (↑、↓、←、→)
• Inter-keypress time
[Figure: keypress timeline marking the inter-keypress times t1, t2, t3, t4]

Scheme: Inter-Keypress-Time Based Approach (2/3)
• Conditional probability distributions
  – CDF (Cumulative Distribution Function)
    • $f(t \mid \text{current input: key}, \text{previous input: key})$
  – Example:
    • Current arrow key: ←
    • Previous arrow key: ↑
[Figure: plot of $f(t \mid \text{current input}, \text{previous input})$]

Scheme: Inter-Keypress-Time Based Approach (3/3)
• Classifier
  – Kolmogorov-Smirnov test (KS test)
    • $D_{1,2} = \sup_x |F_1(x) - F_2(x)|$, where 1 and 2 denote two players and $F$ is the CDF
  – Multinomial experiment
    • Bonferroni method
    • $\alpha = 0.05 / 16 \approx 0.003$
      – If the error rate of each trial remained 5%, the overall error rate would be
      – $1 - (1 - 0.05)^{16} \approx 0.5599$
    • If the p-value of any combination is < α
      – The two samples are from different distributions

Probability Distribution of 16 Key Combinations

Player-Type Consistency Check
• Players behave differently at different times
  – Players progress by learning and practicing over the long term
  – Can this cause our scheme to misjudge?

Consistency Check: Result
• All p-values > α (0.003): the two samples are from the same population

Simulated Bots
• Real bot: DCO
  – Too easy to classify
  – The distribution is centered and approximately 0
• Simulation bot 1
  – Fixes the inter-keypress time and adds a uniform distribution to simulate real players
  – Example: 0.2 ± 0.05 sec
• Simulation bot 2
  – The bot learns players' average inter-keypress time for each of the 4 combinations
  – Calculates their mean and standard deviation
  – Uses a normal distribution to simulate the probability distribution

Simulated Bot 1: Result
• All p-values < α (0.003): the two samples are from different populations

Simulated Bot 2: Result
• We can still identify the bot

Future Work
• Bots can fight back against the scheme if the bot designer:
  – Finds out the CDF
  – Is familiar with probability and statistics
• Idea: pressure
  – Why do players make mistakes?
    • Probability: easy to measure and simulate
    • Pressure: abstract and difficult to quantify
• Every player behaves similarly under the same pressure
  – There is no existing pressure formula
• The bot designer cannot fight back without our formula

Scheme: Pressure-Error Based Approach
• We try to use the conditional probability function to learn the model
  – Interval time of indicators
  – Key combinations
  – Inter-keypress time

Quantify the Pressure

Users' Error Rate under Different Pressures

Conclusion (1/2)
• We propose a trajectory-based approach for game bot detection
  – It is a general technique
    • Applicable whenever the avatar's movement is controlled by the player directly
  – It achieves a detection accuracy of 95% or higher
    • When the trace length is 200 seconds or longer
  – It can distinguish between real players and bots
    • Human players' movement is difficult to simulate

Conclusion (2/2)
• We propose an inter-keypress-time based approach for rhythm game bot detection
  – It is a general technique
  – The first paper on rhythm game bot detection

Thank You for Your Attention!
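
Backup: Sketch of the Inter-Keypress-Time Classifier
The classifier slide (KS test per key combination, Bonferroni-corrected α = 0.05/16) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `ks_statistic`, `ks_pvalue`, and `same_player` are hypothetical, the p-value uses a standard asymptotic approximation of the Kolmogorov distribution rather than whatever statistical tool the study used, and the demo data at the bottom is synthetic.

```python
import math
import random
from bisect import bisect_right

ALPHA = 0.05 / 16  # Bonferroni-corrected level from the slides (about 0.003)


def ks_statistic(a, b):
    """Two-sample KS statistic D = sup_x |F1(x) - F2(x)| over empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are right-continuous step functions, so the
    # supremum is reached at one of the observed sample points.
    for x in a + b:
        f1 = bisect_right(a, x) / len(a)
        f2 = bisect_right(b, x) / len(b)
        d = max(d, abs(f1 - f2))
    return d


def ks_pvalue(d, n, m):
    """Asymptotic p-value for the two-sample KS statistic (an approximation)."""
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series below is numerically unreliable here; p is ~1
    total = 0.0
    for k in range(1, 101):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return max(0.0, min(1.0, 2.0 * total))


def same_player(samples_a, samples_b, alpha=ALPHA):
    """Compare two traces combination by combination.

    samples_*: dict mapping (previous_key, current_key) to a list of
    inter-keypress times in seconds. Returns False as soon as any shared
    key combination has p < alpha, i.e. the traces differ.
    """
    for combo in samples_a.keys() & samples_b.keys():
        a, b = samples_a[combo], samples_b[combo]
        if ks_pvalue(ks_statistic(a, b), len(a), len(b)) < alpha:
            return False
    return True


if __name__ == "__main__":
    # Synthetic demo only: a "human" with normally distributed timing versus
    # a bot in the style of simulation bot 1 (0.2 +/- 0.05 sec, uniform).
    random.seed(0)
    keys = ["up", "down", "left", "right"]
    combos = [(p, c) for p in keys for c in keys]  # 16 key combinations
    human = {k: [random.gauss(0.30, 0.08) for _ in range(300)] for k in combos}
    bot = {k: [random.uniform(0.15, 0.25) for _ in range(300)] for k in combos}
    print("human vs bot judged the same player:", same_player(human, bot))
```

The early return mirrors the decision rule on the slide: one rejected combination is enough to declare the two traces different, while all 16 combinations must pass for them to count as the same population.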