Race Detection for Event-driven Mobile Applications

Chun-Hung Hsiao (University of Michigan), Jie Yu (University of Michigan / Twitter), Satish Narayanasamy (University of Michigan), Ziyun Kong (University of Michigan), Cristiano Pereira (Intel), Gilles Pokam (Intel), Peter Chen (University of Michigan), Jason Flinn (University of Michigan)

Rise of Event-Driven Systems
• Mobile apps, web apps, data centers
• We lack tools for finding concurrency errors in these systems

Why an Event-Driven Programming Model?
• Apps must process asynchronous input from a rich set of sources

Events and Threads in Android
[Figure: a looper thread executes events from its event queue (e.g., onClick() { ... }, onServiceConnected() { ... }); regular threads send events to the queue, synchronize with signal(m)/wait(m), and access shared data with wr(x)/rd(x).]

Conventional Race Detection (e.g., FastTrack [PLDI'09])
• Causal order: happens-before (→), defined by synchronization operations
• Conflict: read-write or write-write data accesses to the same location
• Race: conflicts that are not causally ordered

Conventional Race Detection: Problem
[Figure: on the looper thread, onClick() { send(…); }, onReceive() { *p; }, and onDestroy() { p = null; }, with regular threads alongside.]
• If onDestroy() sets p = null before onReceive() runs *p, the app crashes with a NullPointerException!
• Conventional race detectors cannot find such errors in Android
• Problem: the causality model is too strict; it should not assume program order between events

Model Events as Threads?
[Figure: each event (onClick() { send(…); }, onReceive() { *p; }, onDestroy() { p = null; }) is modeled as its own thread alongside the regular threads, so the conflicting *p and p = null are reported as a race.]

Events as Threads: Problem
[Figure: regular threads send two events, onServiceConnected() { *p; } and onDestroy() { p = null; }; with events modeled as threads, their accesses are reported as a race.]
• This is a false race: the model is missing a causal order!
• Problem: the causality model is too weak; the Android system guarantees certain causal orders between events

Challenge 1: Modeling Causality
• Goal: precisely infer the causal order between events that programmers can assume
• Example on the looper thread: A = onClick() { send(B); }, B = onReceive() { *p; }, C = onDestroy() { p = null; }
– A → B
– C || B (not causally ordered)

Challenge 2: Not All Races Are Bugs
• Races between events are common (e.g., ~9000 in ConnectBot)
• Order violations: p = null; in one event and *p; in another
• Atomicity violations: p = null; from one event interleaving between p = new T; and *p; in another. Not a problem in Android events, because one looper thread executes all events non-preemptively
• Solution: commutativity analysis identifies races that cause order violations

Outline
• Causality Model
• Commutativity Analysis
• Implementation & Results

Causality Model
• Android uses both thread-based and event-based models
• Causal order is derived from the following rules:
1. Conventional causal order: the order in the thread-based model
2. Event atomicity
3. Event queue order

Rule 1: Conventional Causal Order
• Program order
• Fork-join: fork(thread) → begin(thread); end(thread) → join(thread)
• Signal-wait: signal(m) → wait(m)
• Send: send(event) → begin(event)
[Figure: event A on the looper thread forks a regular thread, which sends event B; edges illustrate program order, fork-join, send, and signal-wait.]

Rule 2: Event Atomicity
• One looper thread executes all events non-preemptively, so events are atomic
• If begin(A) → end(B), then end(A) → begin(B)
• Example: event A forks a regular thread that sends B, so begin(A) → end(B); by event atomicity, end(A) → begin(B)

Rule 3: Event Queue Order
• Events are dequeued in FIFO order: if send(A) → send(B), then end(A) → begin(B)
• Example: a regular thread calls send(A) and then send(B); the queue holds A before B, so A ends before B begins

It's Not That Simple…
• Special send APIs can overrule the FIFO order
– Sending an event with an execution delay
– Prioritizing an event, e.g., sendAtFront(event) inserts the event at the front of the queue
• Special event queue rules handle these APIs (see the paper for details)

Event Orders due to External Input
• Assume all events generated by the external environment are ordered
[Figure: the same A/B/C example (onClick(), onReceive(), onDestroy()) on the looper thread.]

What is External Input?
[Figure: the app receives input via IPC from the external environment: surfaceflinger, context_manager, and system_server.]

Problem: Not All Races Are Bugs
• Races between events can be:
– Order violations
– Atomicity violations: not a problem in Android events!
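The causality rules above (conventional causal order including send, event atomicity, event queue order, and ordered external input) can be sketched as a happens-before graph that is closed to a fixpoint and then queried by reachability. This is a minimal sketch under stated assumptions, not CAFA's implementation: it assumes a single looper thread with one FIFO queue, plain sends only (no delayed or sendAtFront events), and operations named by strings such as begin(A), end(A), and send(B). The class LooperCausality and all of its methods are hypothetical.

```java
import java.util.*;

/**
 * Hypothetical sketch of the Android causality rules on one looper thread.
 * Assumes plain FIFO sends only (no delayed or sendAtFront events).
 */
final class LooperCausality {
    private final Map<String, Set<String>> succ = new HashMap<>(); // happens-before edges
    private final List<String> events = new ArrayList<>();         // events run by the looper

    static String begin(String e) { return "begin(" + e + ")"; }
    static String end(String e) { return "end(" + e + ")"; }
    static String send(String e) { return "send(" + e + ")"; }

    /** Adds one happens-before edge, e.g. fork(t) -> begin(t). */
    void edge(String from, String to) {
        succ.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        succ.computeIfAbsent(to, k -> new HashSet<>());
    }

    /** Program order: each operation happens before the next one in the same thread. */
    void chain(String... ops) {
        for (int i = 0; i + 1 < ops.length; i++) edge(ops[i], ops[i + 1]);
    }

    /** Registers a looper event whose body performs innerOps in program order. */
    void event(String e, String... innerOps) {
        events.add(e);
        List<String> ops = new ArrayList<>();
        ops.add(begin(e));
        ops.addAll(Arrays.asList(innerOps));
        ops.add(end(e));
        chain(ops.toArray(new String[0]));
    }

    /** External input: events from the environment are ordered as they arrive. */
    void externalOrder(String... es) {
        for (int i = 0; i + 1 < es.length; i++) edge(end(es[i]), begin(es[i + 1]));
    }

    /** Graph reachability: is there a path of happens-before edges from -> to? */
    boolean reaches(String from, String to) {
        Deque<String> work = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        work.push(from);
        seen.add(from);
        while (!work.isEmpty()) {
            String n = work.pop();
            if (n.equals(to)) return true;
            for (String m : succ.getOrDefault(n, Set.of())) {
                if (seen.add(m)) work.push(m);
            }
        }
        return false;
    }

    /** Applies the send rule, then event atomicity and FIFO queue order until a fixpoint. */
    void close() {
        for (String e : events) {
            if (succ.containsKey(send(e))) edge(send(e), begin(e)); // send(e) -> begin(e)
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String a : events) {
                for (String b : events) {
                    if (a.equals(b) || reaches(end(a), begin(b))) continue;
                    boolean atomicity = reaches(begin(a), end(b)); // begin(A) -> end(B)
                    boolean fifo = reaches(send(a), send(b));      // send(A) -> send(B)
                    if (atomicity || fifo) {
                        edge(end(a), begin(b));                    // => end(A) -> begin(B)
                        changed = true;
                    }
                }
            }
        }
    }

    boolean happensBefore(String a, String b) { return reaches(end(a), begin(b)); }

    boolean concurrent(String a, String b) { return !happensBefore(a, b) && !happensBefore(b, a); }

    public static void main(String[] args) {
        // A = onClick() sends B = onReceive(); A and C = onDestroy() are treated as external input.
        LooperCausality g = new LooperCausality();
        g.event("A", send("B"));
        g.event("B");
        g.event("C");
        g.externalOrder("A", "C");
        g.close();
        System.out.println("A -> B: " + g.happensBefore("A", "B"));
        System.out.println("B || C: " + g.concurrent("B", "C"));
    }
}
```

On the A/B/C example, the sketch orders A before B (send rule plus event atomicity) and A before C (external input), but leaves B and C unordered, which is the pair whose use and free can race. Re-running a graph search after every added edge keeps the sketch short at the cost of speed; it only illustrates the rules.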
Order Violations in Events
• onReceive() { *p; } and onDestroy() { p = null; } on the looper thread
• A race between non-commutative events => an order violation

Races in Commutative Events
• onLayout() { if (!flag) return; resize(); } and onPause() { flag = false; } on the looper thread
• The racy events are commutative => not a race bug
• Hard to determine whether events are commutative!

Solution: Commutativity Analysis
• Report races between known non-commutative operations: uses and frees
• Example: A = onClick() { send(B); }, B = onReceive() { *p; } (use), C = onDestroy() { p = null; } (free)
• Heuristics handle commutative events that contain uses and frees (see the paper for details)

CAFA: Race Detection Tool for Android
[Figure: the app and the system processes (surfaceflinger, context_manager, system_server) run on Java libs, the Dalvik VM, and native libs above the Android kernel, which provides the IPC Binder and the logger; the CAFA analyzer runs offline on the logs.]
• Logs synchronization operations for causality inference, and data access operations related to uses and frees
• Also logs the system service processes for complete causality
• Logger device in the kernel for trace collection
• Offline race detector based on a graph reachability test

Tested Applications
[Table: tested applications]

Use-after-Free Races
• 115 races reported; 69 race bugs (67 previously unknown)
• Benign: 32 races (27.8%), due to imprecise commutativity analysis
• False positives: 14 races (12.2%), due to imprecise causal order from an imperfect implementation
[Figure: breakdown of the reported races by category (races in the conventional causality model, races in the Android causality model, between events, between threads); values shown: 31 (27.0%), 46 (40.0%), 13 (11.3%), 38 (33.0%), 25 (21.7%).]

Performance Overhead
• Trace collection
– 2x to 6x; avg. ~3.2x
– Interactive performance is fair
• Offline analysis
– Time depends on the number of events
– 30 min. to 16 hrs. to analyze ~3000 to ~7000 events

Summary
• Races due to asynchronous events are widespread
• Contributions
– A causality model for Android events
– Commutativity analysis that identifies races that can cause order violations
– Found 67 previously unknown race bugs with 60% precision
• Future work
– Commutativity analysis for finding a broader set of order violations
– Performance optimization
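The commutativity analysis described earlier reports only races between known non-commutative operations: uses and frees. The following is a minimal sketch of that reporting filter, assuming each logged access has already been labeled USE (e.g., *p) or FREE (e.g., p = null) and that a concurrency oracle derived from the causality model answers whether two events are causally unordered. UseFreeFilter, Access, Kind, and report are hypothetical names, and the paper's heuristics for commutative events that contain uses and frees are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical sketch: report only use/free races between concurrent events. */
final class UseFreeFilter {
    /** Access kinds; USE and FREE are assumed to be labeled by instrumentation. */
    enum Kind { READ, WRITE, USE, FREE }

    /** One logged access: which event performed it, what kind, and on which location. */
    static final class Access {
        final String event;
        final Kind kind;
        final String location;

        Access(String event, Kind kind, String location) {
            this.event = event;
            this.kind = kind;
            this.location = location;
        }
    }

    /**
     * Reports every (use, free) pair on the same location from two different
     * events that the oracle says are causally unordered. Other conflicts,
     * such as plain read-write races, are deliberately not reported.
     */
    static List<String> report(List<Access> accesses, BiPredicate<String, String> concurrent) {
        List<String> races = new ArrayList<>();
        for (Access use : accesses) {
            if (use.kind != Kind.USE) continue;
            for (Access free : accesses) {
                boolean candidate = free.kind == Kind.FREE
                        && free.location.equals(use.location)
                        && !free.event.equals(use.event);
                if (candidate && concurrent.test(use.event, free.event)) {
                    races.add(use.event + " uses " + use.location + ", " + free.event + " frees it");
                }
            }
        }
        return races;
    }

    public static void main(String[] args) {
        List<Access> log = List.of(
                new Access("onReceive", Kind.USE, "p"),      // *p
                new Access("onDestroy", Kind.FREE, "p"),     // p = null
                new Access("onLayout", Kind.READ, "flag"),   // if (!flag) return;
                new Access("onPause", Kind.WRITE, "flag"));  // flag = false
        // Assumed oracle for this demo: all distinct events are causally unordered.
        BiPredicate<String, String> allConcurrent = (a, b) -> !a.equals(b);
        for (String race : report(log, allConcurrent)) {
            System.out.println(race);
        }
    }
}
```

On the slide examples, the use in onReceive() and the free in onDestroy() are reported, while the read-write race on flag between onLayout() and onPause() is not, since it involves no use or free.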