Shan Lu's slides on MUVI [ppt]

MUVI: Automatically Inferring
Multi-Variable Access Correlations and
Detecting Related Semantic and Concurrency Bugs
Shan Lu (shanlu@cs.uiuc.edu)
Shan Lu, Soyeon Park, Chongfeng Hu, Xiao Ma, Weihang Jiang,
Zhenmin Li, Raluca A. Popa, and Yuanyuan Zhou
University of Illinois
http://opera.cs.uiuc.edu
Bugs are bad!

Software bugs are costly!



Account for 40% of system failures [Marcus2000]
Cost US economy $59.5 billion annually [NIST]
Techniques to improve program correctness are
desired
Software bug categories

Memory bugs
 Improper memory accesses and usage
 A lot of study and effective detection tools
Semantic bugs
 Violation of the design requirements or programmer intentions
 Biggest part (~80%*) of software bugs
 No silver bullet
Concurrency bugs
 Wrong synchronization in concurrent execution
 Increasingly important with the pervasive trend toward concurrent programs
 Hard to detect
* Have Things Changed Now? -- An Empirical Study of Bug Characteristics in Modern Open Source Software [ASID’06]
An important type of semantic information
Variable Access Correlation


Software programs contain many variables
Variables are NOT isolated


Semantic bond exists among variables
Correct programs consistently access
correlated variables
[Figure: program variables t, x, y, z, s, v, u, w]
Variable correlation in programs

Semantic correlation widely exists among variables
class THD {
  …
  char* db;
  int db_length;
}
MySQL: constraint specification

struct net_device_stats {
  …
  long rv_packets;
  long rv_bytes;
}
Linux: different representation

struct fb_var_screeninfo {
  …
  int red_msb;
  int blue_msb;
  int green_msb;
  int transp_msb;
}
Linux: different aspects

struct st_test_file * cur_file;
struct st_test_file * file_stack;
MySQL: implementation demand
Variable access correlation ( constraint )

Maintaining correlation usually needs consistent access
 write ( db ) => access* ( db_length )
 write ( rv_packets ) => write ( rv_bytes )
 access ( red/…/transp ) => access ( red/…/transp )
 write ( file_stack ) => write ( cur_file )
Variable access correlation: A1 ( x ) => A2 ( y )
 A1, A2: access, read, or write
*access: read or write
Violating the correlations leads to bugs

Programmers may forget to access correlated variables
Correlated variables:
struct fb_var_screeninfo
{ …
int red_msb;
int blue_msb;
int green_msb;
int transp_msb;
}

Mostly consistent access
--- correct
int imsttfb_check_var ( … )
{
...
var->red_msb = 0;
var->green_msb = 0;
var->blue_msb = 0;
var->transp_msb = 0;
…
}
Inconsistent access
--- BUG!
int neofb_check_var (...)
{
...
var->red_msb=0;
var->green_msb=0;
var->blue_msb=0;
/* forget transp_msb!!*/
...
}
Confirmed by Linux developers
More examples of inconsistent-update bugs are in our paper.
A type of semantic bug not handled by previous tools: inconsistent-update bugs
Violating the correlations leads to bugs (ii)

Programmers may forget to synchronize concurrent
accesses to correlated variables
struct JSCache {
  …
  JSEntry table[SIZE];
  bool empty;
}
Mozilla

Thread 1
js_FlushPropertyCache ( … ) {
  lock ( T )
  memset ( cache->table, 0, SIZE);
  unlock ( T )
  …
  lock ( E )
  cache->empty = TRUE;
  unlock ( E )
}

Thread 2
js_PropertyCacheFill ( … ) {
  lock ( T )
  cache->table[indx] = obj;
  unlock ( T )
  …
  lock ( E )
  cache->empty = FALSE;
  unlock ( E )
}

BUG: table and empty are updated in separate critical sections, so the two threads can interleave and leave empty inconsistent with table.
This is NOT a traditional data race bug

Bug occurs even if accesses to each single variable are well
synchronized
Multi-variable concurrency bugs
Our contribution


A technique to automatically infer variable access
correlation
Bug detection based on variable access correlation



Inconsistent-update semantic bugs
Multi-variable concurrency bugs
Discovered correlations and new bugs in real-world applications (Linux device drivers, Mozilla, MySQL, PostgreSQL)



> 6000 variable correlations
39 new inconsistent-update semantic bugs
4 new multi-variable concurrency bugs from Mozilla
Outline

Motivation
 What is variable access correlation
MUVI variable access correlation inference
MUVI bug detection
 Inconsistent-update semantic bug detection
 Multi-variable concurrency bug detection
Evaluation
Conclusions
Basic idea of correlation inference
Our target: access correlation A1 ( x ) => A2 ( y )
Our inference method: statistically infer access correlation based on variable access patterns in source code
 Assumption: mature program, mostly correct
 Access correlation:
  x and y appear together many times
  x and y seldom appear separately
How to judge "together"?
 Our metric: static code distance within a function scope
 Our paper talks about other potential metrics
How to do this efficiently?
Frequent itemset mining

A common data mining technique




 Itemset: a set of items (no order)
  E.g. (v, w, x, y, z)
 Sub-itemset:
  E.g. (w, y)
Goal: find frequent sub-itemsets in an itemset database
 Support: number of appearances
  E.g. support of (w, y) is 3
 Frequent: support > threshold
Itemset database
 (v, w, x, y, z)
 (v, w, y, z, s)
 (v, w, y, t)
 (v, x, m, n)
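A minimal sketch of the support count on the itemset database above. It enumerates sub-itemsets by brute force, which is an assumption made for brevity and not the mining algorithm MUVI uses; the function name frequent_subitemsets and the threshold value are illustrative.

```python
from collections import Counter
from itertools import combinations


def frequent_subitemsets(database, size, threshold):
    """Return {sub-itemset: support} for every sub-itemset of the given
    size whose support is strictly greater than threshold."""
    support = Counter()
    for itemset in database:
        # Itemsets are unordered, so sort to get one canonical tuple per subset.
        for sub in combinations(sorted(set(itemset)), size):
            support[sub] += 1
    return {sub: n for sub, n in support.items() if n > threshold}


# The four itemsets from the slide.
database = [
    ("v", "w", "x", "y", "z"),
    ("v", "w", "y", "z", "s"),
    ("v", "w", "y", "t"),
    ("v", "x", "m", "n"),
]

# Threshold 2 is an assumed value: keep pairs that appear more than twice.
pairs = frequent_subitemsets(database, size=2, threshold=2)
print(pairs[("w", "y")])  # support of (w, y)
```

Brute-force enumeration grows quickly with itemset size, which is why a real tool would rely on a dedicated frequent itemset miner.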
Flowchart of variable correlation inference
Source files -> Pre-processing (How?) -> Itemset database -> Mining -> Frequent variable sets -> Post-processing (How?) -> Variable access correlation
MUVI Inference algorithm (pre-process)

What is an item?
 A variable
What is an itemset?
 A function
What to put into an itemset?
 Accessed variables
 Access type (read/write)
Program source code -> Itemset database
MUVI Inference algorithm (pre-process)



Input: program
Output: an itemset database
Flow-insensitive, inter-procedural analysis
 Consider global variables and structure-typed variables
 Also consider variables accessed in callee functions

int x;
f1 ( ) {
  read x;
}
S t;
f2 ( ) {
  write t.y;
}
int z;
f3 ( ) {
  read z;
  f1 ( );
  f2 ( );
}
Call graph: f3 calls f1 and f2

Database
f1 {read, x}
f2 {write, S::y}
f3 {read, z} …
……
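The pre-processing step can be sketched as follows, using the f1/f2/f3 example. This is only an illustration: the names build_itemset_database and reachable are hypothetical, and the sketch assumes callees are followed transitively, which the slide does not state.

```python
def reachable(func, calls):
    """Return func plus every function it calls, directly or indirectly."""
    seen, stack = set(), [func]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(calls.get(current, ()))
    return seen


def build_itemset_database(direct_accesses, calls):
    """One itemset per function: the (access type, variable) items of the
    function itself and of its callees (flow-insensitive)."""
    return {
        func: set().union(
            *(direct_accesses.get(f, set()) for f in reachable(func, calls))
        )
        for func in direct_accesses
    }


# Direct accesses and call graph from the slide's example.
direct_accesses = {
    "f1": {("read", "x")},
    "f2": {("write", "S::y")},
    "f3": {("read", "z")},
}
calls = {"f3": ["f1", "f2"]}

db = build_itemset_database(direct_accesses, calls)
print(sorted(db["f3"]))
```

Under that assumption, the itemset of f3 also contains the accesses made by f1 and f2.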
MUVI Inference algorithm (post-process)

Input: frequent variable sets
 (x, y), which appear together in many functions
Pruning
 What if x and y appear separately many times?
  Prune out low-confidence (conditional probability) pairs
 What if x is too popular, e.g. stderr, stdout?
Categorize based on access type
 write (x) => write (y)? Or write (x) => read (y)? etc.
Output: variable correlation
 A1 ( x ) => A2 ( y )
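A small sketch of the pruning step, under stated assumptions: confidence is taken as the fraction of functions accessing x that also access y, and "too popular" variables are dropped with a popularity cutoff. The cutoff, both threshold values, and the toy database are hypothetical; the slide only raises the popularity question without giving the rule.

```python
def confidence(database, x, y):
    """Conditional probability that a function accessing x also accesses y."""
    with_x = [items for items in database if x in items]
    if not with_x:
        return 0.0
    return sum(1 for items in with_x if y in items) / len(with_x)


def prune(database, pairs, min_confidence, max_popularity):
    """Keep (x, y) only if x is not too popular and confidence is high."""
    kept = []
    for x, y in pairs:
        popularity = sum(1 for items in database if x in items) / len(database)
        if popularity > max_popularity:
            continue  # x appears almost everywhere (like stderr): uninformative
        if confidence(database, x, y) >= min_confidence:
            kept.append((x, y))
    return kept


# Toy database: one set of accessed variables per function (made-up data).
database = [
    {"db", "db_length"},
    {"db", "db_length", "stderr"},
    {"db", "db_length"},
    {"db", "stderr"},
    {"x", "stderr"},
    {"y", "stderr"},
    {"z", "stderr"},
    {"x", "y", "stderr"},
]
candidates = [("db", "db_length"), ("db_length", "db"), ("stderr", "db"), ("x", "y")]
kept = prune(database, candidates, min_confidence=0.7, max_popularity=0.6)
print(kept)
```

Note that confidence is directional: db_length always appears with db here, while db appears with db_length in three of its four functions.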
Inconsistent-update bug detection



Step 1: get all write(x) => acc(y) correlations
Step 2: get all violations of the above correlations
Step 3: prune out unlikely bugs
 Code analysis to check caller and callee functions
int neofb_check_var (...)
{ ...
var->red_msb=0;
var->green_msb=0;
var->blue_msb=0;
/* forget transp_msb!!*/
...
}
write (fb_var_screeninfo::blue_msb) => access (fb_var_screeninfo::transp_msb)
#support = 11
#violation = 1 (function neofb_check_var)
inconsistent-update bug
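Steps 1 and 2 can be sketched as a scan over the itemset database. This is an illustrative sketch, not MUVI's implementation: find_violations is a hypothetical name, and apart from imsttfb_check_var and neofb_check_var the function names are placeholders invented to reproduce the support count of 11.

```python
def find_violations(database, x, y):
    """For a correlation write(x) => access(y), return (support, violators):
    support counts functions that write x and also read or write y;
    violators are functions that write x without touching y."""
    support, violators = 0, []
    for func in sorted(database):
        items = database[func]
        if ("write", x) not in items:
            continue
        if ("read", y) in items or ("write", y) in items:
            support += 1
        else:
            violators.append(func)
    return support, violators


fields = ["red_msb", "green_msb", "blue_msb", "transp_msb"]
all_four = {("write", "fb_var_screeninfo::" + name) for name in fields}

# Ten placeholder functions plus imsttfb_check_var update all four fields.
database = {"hypothetical_check_var_%d" % i: set(all_four) for i in range(10)}
database["imsttfb_check_var"] = set(all_four)
# neofb_check_var forgets transp_msb.
database["neofb_check_var"] = all_four - {("write", "fb_var_screeninfo::transp_msb")}

support, violators = find_violations(
    database, "fb_var_screeninfo::blue_msb", "fb_var_screeninfo::transp_msb"
)
print(support, violators)
```

Step 3 (checking callers and callees before reporting) is not covered by this sketch.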
Multi-variable concurrency bug detection
-- MUVI Lock-set algorithm

Original algorithm
 Look for common locks among conflicting accesses to each shared variable
 L (A1) ∩ L (A2) = ∅ ?
MV Lock-Set algorithm
 Look for common locks among conflicting accesses to each shared variable and their correlated accesses
 L (A1) ∩ L (A2) ∩ L (A3) = ∅ ?
A1 ( x ), A2 ( x ): conflicting accesses to x; A3 ( y ): a correlated access to y

Thread 1
  Lock ( T )
  memset (cache->table,0,SIZE) ;
  Unlock ( T )
  Lock ( E )
  cache->empty = TRUE;
  Unlock ( E )

Thread 2
  Lock ( T )
  cache->table[indx] = obj;
  Unlock ( T )
  Lock ( E )
  cache->empty = FALSE;
  Unlock ( E )
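The difference between the two checks can be sketched with a simplified lock-set intersection. The sketch assumes every recorded access is a conflicting write from one of the two threads, and it ignores the state machine a real lock-set detector uses; common_locks is a hypothetical helper, and table and empty stand for the two JSCache fields.

```python
def common_locks(accesses, variables):
    """Locks held at every recorded access to any variable in `variables`.
    Returns None when no access to those variables was recorded."""
    locksets = [set(locks) for var, locks in accesses if var in variables]
    if not locksets:
        return None
    return set.intersection(*locksets)


# (variable, locks held) for the four writes in the example.
accesses = [
    ("table", {"T"}),  # Thread 1: memset of the table
    ("empty", {"E"}),  # Thread 1: empty = TRUE
    ("table", {"T"}),  # Thread 2: table[indx] = obj
    ("empty", {"E"}),  # Thread 2: empty = FALSE
]

# Original lock-set: each variable on its own has a common lock.
print(common_locks(accesses, {"table"}))
print(common_locks(accesses, {"empty"}))

# Multi-variable lock-set: table and empty are correlated, so their
# accesses are checked together and no common lock remains.
print(common_locks(accesses, {"table", "empty"}))
```

An empty result for the correlated pair is what flags the multi-variable bug, even though each single variable looks well protected.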
Multi-variable concurrency bug detection
-- Other MUVI extension algorithm

MUVI happens-before algorithm
 Original: check the happens-before relation among conflicting accesses to each single variable
 MUVI: check the happens-before relation among conflicting accesses to each single variable and correlated accesses
Other extension
 Extending hybrid race detection
 Extending atomicity violation bug detection
Methodology

For variable correlation and inconsistent-update
bug detection:





Linux (device driver)
Mozilla
MySQL
PostgreSQL
All latest versions
For multi-variable concurrency bug detection:

Five existing real bugs from Mozilla and MySQL
Found four new multi-variable concurrency bugs during the detection process
Results on correlation inference
App.        #Access-Correlation  #Involved Variables  %False Positives  Analysis Time
Mozilla     1431                 1380                 16%               157m
MySQL       726                  703                  13%               19m
Linux       3353                 3038                 19%               175m
PostgreSQL  939                  833                  15%               98m
False positives: macros, inline functions, coincidence
Inconsistent-update bug detection results
App.        # of MUVI bug reports  # of new bugs found  # of bad programming  # of false positives
Linux       40                     22 (12)              5                     13
Mozilla     30                     7 (0)                8                     15
MySQL       20                     9 (5)                3                     8
PostgreSQL  10                     1 (0)                4                     5
False positives: semantic exceptions, wrong correlations, no future read access
Multi-variable concurrency bug detection results
MV-Lockset
Bug         Detect Bug?  False Positive
Moz-js1     Y            1
Moz-js2     Y            2
Moz-imap    Y            0
MySQL-log   Y            3
MySQL-blog  N            0
MySQL-blog: variables are conditionally correlated; the correlation is missed by MUVI
MV-Happens-Before has similar results
Multi-variable concurrency bug detection results
struct JSRuntime {
  int totalStrings;   /* # of allocated strings */
  double lengthSum;   /* Total length of allocated strings */
}
Mozilla jscntxt.h

Thread 1
js_NewString( … )
{
  // allocate a new string
  JS_ATOMIC_INCREMENT(&(rt->totalStrings));
  PR_Lock(rtLock);
  rt->lengthSum += length;
  PR_Unlock(rtLock);
}

Thread 2
printJSStringStats ( ... )
{
  count = rt->totalStrings;
  mean = rt->lengthSum / count;   /* Wrong result! */
  printf ( …… );
}
Mozilla jsstr.h, jsstr.c
4 new multi-variable concurrency bugs detected!
Conclusion

Variable access correlations can be inferred
Variable access correlation is important
 Help detect two types of bugs
 Other usage
  Provide specifications to ease programming
  Provide hints for assigning locks or TMs
   E.g. AtomicSet, AutoLocker, Colorama
Related works

Program specification inference
 [ErnstICSE00], [EnglerSOSP01], [KremenekOSDI06], [LiblitPLDI03], [WhaleyISSTA02], [YangICSE06], etc.
Code pattern mining
 [LiOSDI04], [LiFSE05], [LivshitsFSE05], etc.
Concurrency bug detection
 [ChoiPLDI02], [EnglerSOSP03], [FlanaganPOPL04], [SavageTOCS97], [Praun01], [XuPLDI05], [YuSOSP05], etc.
Techniques for easing concurrent programming
 [Harris03], [HerlihyISCA93], [McCloskeyPOPL06], [Rajwar02], [Hammond04], [Moore6], [Rossbach07], etc.
Acknowledgement





Prof. Stefan Savage (shepherd)
Anonymous reviewers
Prof. Liviu Iftode
GOOGLE student travel grant
NSF, DOE, Intel research grants
Thanks!
http://opera.cs.uiuc.edu