MUVI: Automatically Inferring Multi-Variable Access Correlations and Detecting Related Semantic and Concurrency Bugs

Shan Lu (shanlu@cs.uiuc.edu)
Shan Lu, Soyeon Park, Chongfeng Hu, Xiao Ma, Weihang Jiang, Zhenmin Li, Raluca A. Popa, and Yuanyuan Zhou
University of Illinois
http://opera.cs.uiuc.edu

Bugs are bad!
- Software bugs are costly!
  - Account for 40% of system failures [Marcus2000]
  - Cost the US economy $59.5 billion annually [NIST]
- Techniques to improve program correctness are desired

Software bug categories
- Memory bugs
  - Improper memory accesses and usage
  - A lot of study and effective detection tools
- Semantic bugs
  - Violation of the design requirements or programmer intentions
  - Biggest part (~80%*) of software bugs
  - No silver bullet
- Concurrency bugs
  - Wrong synchronization in concurrent execution
  - Increasingly important with the growing trend toward concurrent programs
  - Hard to detect
* Have Things Changed Now? An Empirical Study of Bug Characteristics in Modern Open Source Software [ASID'06]

An important type of semantic information: Variable Access Correlation
- Software programs contain many variables
- Variables are NOT isolated
  - Semantic bonds exist among variables
  - Correct programs consistently access correlated variables

Variable correlation in programs
- Semantic correlation widely exists among variables
- Constraint specification (MySQL):
    class THD {
      …
      char* db;        /* e.g. "MYDB" */
      int db_length;   /* e.g. 4 */
    }
- Different representation (Linux):
    struct net_device_stats {
      …
      long rv_packets;
      long rv_bytes;
    }
- Different aspects (Linux):
    struct fb_var_screeninfo {
      …
      int red_msb;
      int blue_msb;
      int green_msb;
      int transp_msb;
    }
- Implementation demand (MySQL):
    struct st_test_file *cur_file;
    struct st_test_file *file_stack;

Variable access correlation (constraint)
- Maintaining correlation usually needs consistent access
  - write(db) ⇒ access*(db_length)
  - write(rv_packets) ⇒ write(rv_bytes)
  - access(red/…/transp) ⇒ access(red/…/transp)
  - write(file_stack) ⇒ write(cur_file)
- General form: A1(x) ⇒ A2(y), where A1 and A2 are each read, write, or access*
(Variable access correlation; *access means read or write)

Violating the correlations leads to bugs (i)
- Programmers may forget to access correlated variables
- Correlated variables:
    struct fb_var_screeninfo {
      …
      int red_msb;
      int blue_msb;
      int green_msb;
      int transp_msb;
    }
- Mostly consistent access: correct
    int imsttfb_check_var ( … ) {
      ...
      var->red_msb = 0;
      var->green_msb = 0;
      var->blue_msb = 0;
      var->transp_msb = 0;
      …
    }
- Inconsistent access: BUG! (confirmed by Linux developers)
    int neofb_check_var (...) {
      ...
      var->red_msb = 0;
      var->green_msb = 0;
      var->blue_msb = 0;
      /* forgets transp_msb!! */
      ...
    }
- Inconsistent-update bugs: a type of semantic bug not handled by previous tools
- More examples of inconsistent-update bugs are in our paper

Violating the correlations leads to bugs (ii)
- Programmers may forget to synchronize concurrent accesses to correlated variables
- Correlated variables (Mozilla):
    struct JSCache {
      …
      JSEntry table[SIZE];
      bool empty;
    }
- Thread 1:
    js_FlushPropertyCache ( … ) {
      lock ( T )
      memset ( cache->table, 0, SIZE );
      unlock ( T )
      …
      lock ( E )
      cache->empty = TRUE;
      unlock ( E )
    }
- Thread 2:
    js_PropertyCacheFill ( … ) {
      lock ( T )
      cache->table[indx] = obj;
      unlock ( T )
      …
      lock ( E )
      cache->empty = FALSE;
      unlock ( E )
    }
- BUG: the two threads can interleave so that table and empty end up inconsistent
- This is NOT a traditional data race bug
  - The bug occurs even if accesses to each single variable are well synchronized
- Multi-variable concurrency bugs

Our contribution
- A technique to automatically infer variable access correlation
- Bug detection based on variable access correlation
  - Inconsistent-update semantic bugs
  - Multi-variable concurrency bugs
- Disclose correlations and new bugs from real-world applications (Linux-device_driver, Mozilla, MySQL, Httpd)
  - > 6000 variable correlations
  - 39 new inconsistent-update semantic bugs
  - 4 new multi-variable concurrency bugs from Mozilla

Outline
- Motivation
  - What is variable access correlation
- MUVI variable access correlation inference
- MUVI bug detection
  - Inconsistent-update semantic bug detection
  - Multi-variable concurrency bug detection
- Evaluation
- Conclusions

Basic idea of correlation inference
- Our target: access correlation A1(x) ⇒ A2(y)
- Our inference method: statistically infer access correlation based on variable access patterns in source code
  - Assumption: mature program, mostly correct
  - Access correlation holds when:
    - x and y appear together many times
    - x and y seldom appear separately
- How to judge "together"?
  - Our metric: static code distance, within a function scope
  - Our paper talks about other potential metrics
- How to do this efficiently?

Frequent itemset mining
- A common data mining technique
- Itemset: a set of items (no order), e.g. (v, w, x, y, z)
- Sub-itemset: e.g. (w, y)
- Goal: find frequent sub-itemsets in an itemset database
  - Support: number of appearances, e.g. support of (w, y) is 3
  - Frequent: support > threshold
- Itemset database:
    (v, w, x, y, z)
    (v, w, y, z, s)
    (v, w, y, t)
    (v, x, m, n)

Flowchart of variable correlation inference
- Source files → Pre-processing (how?) → Itemset database → Mining → Frequent variable sets → Post-processing (how?) → Variable access correlation

MUVI inference algorithm (pre-process)
- Program source code → Itemset database
- What is an item? A variable
- What is an itemset? A function
- What to put into an itemset?
  - Accessed variables
  - Access type (read/write)

MUVI inference algorithm (pre-process)
- Input: program
- Output: an itemset database
- Flow-insensitive, inter-procedural analysis
  - Consider global variables and structure-typed variables
  - Also consider variables accessed in callee functions
- Example (f3 calls f1 and f2):
    int x;
    f1 ( ) {
      read x;
    }

    S t;
    f2 ( ) {
      write t.y;
    }

    int z;
    f3 ( ) {
      read z;
      f1 ( );
      f2 ( );
    }
- Database:
    f1  {read, x}
    f2  {write, S::y}
    f3  {read, z} …
    ……

MUVI inference algorithm (post-process)
- Input: frequent variable sets (x, y), which appear together in many functions
- Pruning
  - What if x and y appear separately many times? Prune out low-confidence (conditional probability) pairs
  - What if x is too popular, e.g. stderr, stdout?
- Categorize based on access type: write(x) ⇒ write(y)? Or write(x) ⇒ read(y)? etc.
- Output: variable correlation A1(x) ⇒ A2(y)

Outline
- Motivation
- MUVI variable access correlation inference
- MUVI bug detection
  - Inconsistent-update semantic bug detection
  - Multi-variable concurrency bug detection
- Evaluation
- Conclusions

Inconsistent-update bug detection
- Step 1: get all write(x) ⇒ acc(y) correlations
- Step 2: get all violations of the above correlations
- Step 3: prune out unlikely bugs
  - Code analysis to check caller and callee functions
- Example:
    int neofb_check_var (...) {
      ...
      var->red_msb = 0;
      var->green_msb = 0;
      var->blue_msb = 0;
      /* forgets transp_msb!! */
      ...
    }
  - Correlation: write(fb_var_screeninfo::blue_msb) ⇒ access(fb_var_screeninfo::transp_msb)
  - #support = 11
  - #violation = 1 (function neofb_check_var)
  - Result: inconsistent-update bug

Multi-variable concurrency bug detection: MUVI Lock-set algorithm
- Original Lock-Set algorithm
  - Look for common locks among conflicting accesses to each shared variable
  - For conflicting accesses A1(x) and A2(x) in two threads: L(A1) ∩ L(A2) = ∅ ?
- MV Lock-Set algorithm
  - Look for common locks among conflicting accesses to each shared variable and their correlated accesses
  - With a correlated access A3(y): L(A1) ∩ L(A2) ∩ L(A3) = ∅ ?
- Example:
    Thread 1                              Thread 2
    Lock ( T )                            Lock ( T )
    memset ( cache->table, 0, SIZE );     cache->table[indx] = obj;
    Unlock ( T )                          Unlock ( T )
    Lock ( E )                            Lock ( E )
    cache->empty = TRUE;                  cache->empty = FALSE;
    Unlock ( E )                          Unlock ( E )

Multi-variable concurrency bug detection: other MUVI extension algorithms
- MUVI happens-before algorithm
  - Original: check the happens-before relation among conflicting accesses to each single variable
  - MUVI: check the happens-before relation among conflicting accesses to each single variable and correlated accesses
- Other extensions
  - Extending hybrid race detection
  - Extending atomicity violation bug detection

Outline
- Motivation
- MUVI variable access correlation inference
- MUVI bug detection
  - Inconsistent-update semantic bug detection
  - Multi-variable concurrency bug detection
- Evaluation
- Conclusions

Methodology
- For variable correlation and inconsistent-update bug detection:
  - Linux (device driver), Mozilla, MySQL, PostgreSQL
  - All latest versions
- For multi-variable concurrency bug detection:
  - Five existing real bugs from Mozilla and MySQL
  - Found four new multi-variable concurrency bugs during the detection process

Results on correlation inference

    App.         #Access-Correlation   #Involved Variables   %False Positives   Analysis Time
    Mozilla      1431                  1380                  16%                157m
    MySQL        726                   703                   13%                19m
    Linux        3353                  3038                  19%                175m
    PostgreSQL   939                   833                   15%                98m

- Causes of false positives: macros, inline functions, coincidence

Inconsistent-update bug detection results

    App.         # of MUVI bug reports   # of new bugs found   # of bad programming   # of false positives
    Linux        40                      22 (12)               5                      13
    Mozilla      30                      7 (0)                 8                      15
    MySQL        20                      9 (5)                 3                      8
    PostgreSQL   10                      1 (0)                 4                      5

- Causes of false positives: semantic exceptions, wrong correlations, no future read access

Multi-variable concurrency bug detection results

    MV-Lockset
    Bug          Detect Bug?   False Positive
    Moz-js1      Y             1
    Moz-js2      Y             2
    Moz-imap     Y             0
    MySQL-log    Y             3
    MySQL-blog   N             0

- MySQL-blog: the variables are conditionally correlated, and the correlation is missed by MUVI
- MV-Happens-Before has similar results

Multi-variable concurrency bug detection results
- Correlated variables (Mozilla jscntxt.h):
    struct JSRuntime {
      int totalStrings;   /* # of allocated strings */
      double lengthSum;   /* Total length of allocated strings */
    }
- Thread 1:
    js_NewString ( … ) {
      // allocate a new string
      JS_ATOMIC_INCREMENT (&(rt->totalStrings));
      PR_Lock(rtLock);
      rt->lengthSum += length;
      PR_Unlock(rtLock);
    }
- Thread 2:
    printJSStringStats ( ... ) {
      count = rt->totalStrings;
      mean = rt->lengthSum / count;
      printf ( …… );
    }
- Source: Mozilla jsstr.h and jsstr.c
- Wrong result!
- 4 new multi-variable concurrency bugs detected!

Conclusion
- Variable access correlations can be inferred
- Variable access correlation is important
  - Helps detect two types of bugs
- Other usage
  - Provide specifications to ease programming
  - Provide hints for assigning locks or TMs, e.g. AtomicSet, AutoLocker, Colorama

Related works
- Program specification inference
  - [ErnstICSE00], [EnglerSOSP01], [KremenekOSDI06], [LiblitPLDI03], [WhaleyISSTA02], [YangICSE06], etc.
- Code pattern mining
  - [LiOSDI04], [LiFSE05], [LivshitsFSE05], etc.
- Concurrency bug detection
  - [ChoiPLDI02], [EnglerSOSP03], [FlanaganPOPL04], [SavageTOCS97], [Praun01], [XuPLDI05], [YuSOSP05], etc.
- Techniques for easing concurrent programming
  - [Harris03], [HerlihyISCA93], [McCloskeyPOPL06], [Rajwar02], [Hammond04], [Moore06], [Rossbach07], etc.

Acknowledgement
- Prof. Stefan Savage (shepherd)
- Anonymous reviewers
- Prof. Liviu Iftode
- GOOGLE student travel grant
- NSF, DOE, Intel research grants

Thanks!
http://opera.cs.uiuc.edu
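
Sketch: correlation check and inconsistent-update detection

The inference and inconsistent-update detection steps from the earlier slides (one itemset per function, support counting, confidence pruning, then looking for functions that write x but never access y) might be sketched roughly as below. This is a minimal illustration, not MUVI's implementation: simple pairwise counting stands in for a real frequent itemset miner, the bitmask encoding, the `Itemset` type, and all function names are hypothetical, and the `min_support` and `min_conf_pct` thresholds are assumed parameters that the slides do not specify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding: each variable is a bit index in an unsigned mask. */
enum { RED_MSB, GREEN_MSB, BLUE_MSB, TRANSP_MSB };

/* One itemset per function: the variables it (or its callees) touches. */
typedef struct {
    unsigned written;   /* bit v set: the function writes variable v */
    unsigned accessed;  /* bit v set: the function reads or writes variable v */
} Itemset;

/* Number of functions that write x. */
int count_writes(const Itemset *db, size_t n, int x)
{
    int c = 0;
    for (size_t i = 0; i < n; i++)
        if ((db[i].written >> x) & 1u)
            c++;
    return c;
}

/* Support of write(x) => access(y): functions that write x and access y. */
int support(const Itemset *db, size_t n, int x, int y)
{
    int c = 0;
    for (size_t i = 0; i < n; i++)
        if (((db[i].written >> x) & 1u) && ((db[i].accessed >> y) & 1u))
            c++;
    return c;
}

/* Keep the pair only if it is frequent and confident.
 * Both thresholds are assumptions made for this sketch. */
int is_correlated(const Itemset *db, size_t n, int x, int y,
                  int min_support, int min_conf_pct)
{
    assert(x >= 0 && x < 32 && y >= 0 && y < 32);
    int s = support(db, n, x, y);
    int w = count_writes(db, n, x);
    /* confidence = s / w, compared in integer arithmetic */
    return s >= min_support && s * 100 >= min_conf_pct * w;
}

/* Violations: functions that write x but never access y.
 * Returns the number found; stores up to cap function indices in out. */
size_t find_violations(const Itemset *db, size_t n, int x, int y,
                       size_t *out, size_t cap)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (((db[i].written >> x) & 1u) && !((db[i].accessed >> y) & 1u)) {
            if (found < cap)
                out[found] = i;
            found++;
        }
    }
    return found;
}
```

For a database of 12 functions in which 11 write all four `*_msb` fields and one writes only red, green, and blue, `support(db, 12, BLUE_MSB, TRANSP_MSB)` is 11 and `find_violations` reports the single remaining function, mirroring the #support = 11, #violation = 1 example for neofb_check_var. The caller/callee pruning of Step 3 is not modeled here.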