Dynamic Software Update Testing: Framework and Empirical Study

Christopher M. Hayden, Eric A. Hardisty,
Michael Hicks, Jeffrey S. Foster
University of Maryland, College Park

Dynamic Software Updating (DSU)
• Performing updates to software at runtime has clear benefits:
  • Increased software availability
  • No need to terminate active connections or computations
• … but can we trust updated software?
  • Critical to ensure updates are safe

Our Contributions
• Verification of DSU through testing:
  • Testing procedure
  • Test minimization algorithm
• Empirical study:
  • Effectiveness of minimization
  • Update safety / effectiveness of safety checks

DSU Safety
• DSU creates the opportunity for new sources of bugs:
  • Faulty state transformation
  • Unsafe update timing
• Safety checks restrict when updates may be applied:
  • Activeness Safety (AS) / Con-freeness Safety (CFS)

Activeness Safety (AS)
• AS prevents updates to active code
• In this example, no patch updating main or foo is allowed:

    main() {
        foo();
        …
        baz();
    }
    foo() {
        …
        bar();
    }

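The activeness check can be sketched in C as a scan of the call stack. This is a minimal sketch under stated assumptions, not Ginseng's implementation: representing the stack and the patch as arrays of function names, and the name activeness_safe, are hypothetical choices made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true when no function changed by the patch
 * is currently on the call stack, i.e. the patch touches no active code.
 * stack   - names of the active functions, outermost first
 * changed - names of the functions the patch updates */
bool activeness_safe(const char *const *stack, size_t depth,
                     const char *const *changed, size_t num_changed)
{
    for (size_t i = 0; i < depth; i++) {
        for (size_t j = 0; j < num_changed; j++) {
            if (strcmp(stack[i], changed[j]) == 0)
                return false;   /* the patch updates an active function */
        }
    }
    return true;
}
```

With main and foo on the stack, as in the example above, a patch that changes only baz passes the check, while a patch that changes foo is rejected.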
Con-freeness Safety (CFS)
• CFS (Stoyle et al. '05) allows updates to active code only when type safety can be ensured
• In this example, no patch updating the signature of baz or bar is allowed:

    main() {
        foo();
        …
        baz();
    }
    foo() {
        …
        bar();
    }

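Unsafe Timing: Type Safety
Version 0:

    int foo(int x, int y) {
        return x + y;
    }
    void bar() {
        int z = 0;
        …
        z = foo(z, 5);
    }

Version 1 (patch):

    void foo(int *x, int y) {
        *x += y;
    }
    void bar() {
        int z = 0;
        …
        foo(&z, 5);
    }

Crash: if the patch is applied while Version 0's bar is still running, the old bar passes the int z to the new foo, which dereferences it as a pointer.
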
DSU Testing
• Safety checks offer limited guarantees:
  • CFS and AS ensure type-safe execution
  • AS ensures that you never return to old code following an update
  • Neither of these properties ensures safe update timing
• We propose testing to verify the correctness of allowed update points:
  • Use the existing suite of application system tests
  • Ensure that updating anywhere during the execution of those tests results in an execution that passes the test

Testing Procedure
• Approach:
  • Instrument the application to trace update points
  • Execute the system test and gather an initial trace (from the trace start through each potential update point)
  • For each update point in the initial trace, perform an update test: force an update at that point while executing the system test

Figure: the initial trace passes the system test (✔); each update test forces an update at one update point of that trace and is marked as passing (✔) or failing (✘).

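The testing procedure amounts to one run of the system test per update point in the initial trace. The following C sketch illustrates that loop under stated assumptions: the callback type update_test_fn, the function run_update_tests, and the stand-in callback fails_only_at are hypothetical names, and a real harness would launch the instrumented program instead of calling a function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical harness callback: runs the system test once while forcing
 * the update at update point `point` (an index into the initial trace) and
 * reports whether the test still passes. */
typedef bool (*update_test_fn)(size_t point, void *ctx);

/* Runs one update test per update point of the initial trace.
 * failed[i] is set to true when updating at point i makes the test fail.
 * Returns the number of failing update points. */
size_t run_update_tests(size_t num_points, update_test_fn run_test,
                        void *ctx, bool *failed)
{
    size_t failures = 0;
    for (size_t point = 0; point < num_points; point++) {
        failed[point] = !run_test(point, ctx);
        if (failed[point])
            failures++;
    }
    return failures;
}

/* Stand-in callback for illustration only: the test fails at exactly one
 * update point, whose index is passed through ctx. */
bool fails_only_at(size_t point, void *ctx)
{
    const size_t *bad_point = ctx;
    return point != *bad_point;
}
```

Calling run_update_tests with four update points and fails_only_at configured for index 2 reproduces the pass, pass, fail, pass pattern of the figure.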
Update Test Minimization
• Program traces may have thousands or millions of update points
• Many update tests have the same behavior for a given patch
  • We can eliminate redundant tests

Version 0:

    void main() {
        foo();
        bar();
        baz();
    }

Patch A: baz() {…}
    All update points yield the same behavior
Patch B: foo() {…}  bar() {…}  baz() {…}
    All update points yield distinct behavior

Minimization Algorithm
• Execution events are traced if they have the potential to conflict with a patch
  • An event conflicts with a patch p if applying p before the event might produce a different result than applying p after the event
  • Examples: function calls, global variable accesses
• Trace the execution of a test T on P0
• Iterate through the trace, noting the last update point each time we reach a conflicting trace element
• Run only the identified update tests T_n^p, one per noted update point n

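The trace walk described above can be sketched in C as follows. This is a simplified illustration, not the authors' implementation: the types trace_event and patch, the rule that an event conflicts when it touches a function or global the patch changes, and the numeric ids are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { EV_UPDATE_POINT, EV_CALL, EV_GLOBAL_ACCESS } event_kind;

typedef struct {
    event_kind kind;
    int id;  /* update point number, or id of the function/global touched */
} trace_event;

typedef struct {
    const int *changed;   /* ids of functions/globals the patch changes */
    size_t num_changed;
} patch;

/* Assumed conflict rule: an event conflicts with the patch when it touches
 * something the patch changes. */
static bool conflicts(const trace_event *ev, const patch *p)
{
    for (size_t i = 0; i < p->num_changed; i++)
        if (p->changed[i] == ev->id)
            return true;
    return false;
}

/* Walks the trace, remembering the last update point seen; each time a
 * conflicting event is reached, that update point is selected (once).
 * Writes the selected update point numbers to out (capacity >= n) and
 * returns how many were selected. */
size_t minimize_update_points(const trace_event *trace, size_t n,
                              const patch *p, int *out)
{
    size_t count = 0;
    bool have_point = false;  /* has any update point been seen yet? */
    bool selected = false;    /* is the last update point already in out? */
    int last_point = 0;

    for (size_t i = 0; i < n; i++) {
        const trace_event *ev = &trace[i];
        if (ev->kind == EV_UPDATE_POINT) {
            last_point = ev->id;
            have_point = true;
            selected = false;
        } else if (have_point && !selected && conflicts(ev, p)) {
            out[count++] = last_point;
            selected = true;
        }
    }
    return count;
}
```

On the trace Update?(1), Call(foo), Update?(2), Call(bar), Update?(3), Call(baz), a patch that changes only baz selects update point 3, while a patch that changes foo, bar, and baz selects points 1, 2, and 3.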
Empirical Results
Experimental Setup
• Built the testing infrastructure on top of the Ginseng DSU system (Neamtiu et al.):
  • Modified to support tracing and updating at pre-selected update points
  • Inserted explicit update points before each function call to approximate more liberal systems
  • Disabled safety checking (CFS) for experiments
• Tested 3 years of patches to OpenSSH and vsftpd (only OpenSSH is reported in this talk)

Program Modifications
• Identify long-running loops
• Add a manually selected update point
• Perform loop body extraction
• Perform continuation extraction

    foo() {
        while (1) {          // main loop (long-running loop)
            update();        // manually selected update point
            extract {        // loop body extraction
                ... // main loop body
            }
        }
        extract {            // continuation extraction
            ... // after main loop
        }
    }

Experiments: Update Test Suite
• How many update tests must be run to test real-world updates to real-world applications?
• How effective is minimization at eliminating redundant tests?

Update Test Suite Size: OpenSSH
Sig, Fun, Type: Δ to next version. Reduction columns: update tests before → after minimization (reduction).

#      Tests  Sig  Fun  Type  All Points                   Activeness-Safe Points
0         75    3   98     5  580,871 → 31,791 (95%)       35,314 → 3,027 (91%)
1         75    0    6     0  705,322 → 1,795 (~100%)      587,578 → 1,717 (~100%)
2         76    5  238    11  638,720 → 63,011 (90%)       20,902 → 2,353 (89%)
3         91    0   18     0  772,198 → 4,324 (99%)        638,803 → 3,775 (99%)
4         91   13  172    10  773,086 → 27,399 (96%)       21,343 → 1,564 (93%)
5        104    0   24     1  878,235 → 17,398 (98%)       111,950 → 1,723 (98%)
6        104    6  257    10  879,668 → 47,092 (95%)       44,278 → 2,139 (95%)
7        104    4  179    12  918,717 → 89,601 (90%)       100,854 → 4,141 (96%)
8        105    0   72     3  973,364 → 34,293 (96%)       61,724 → 2,070 (97%)
9        104   10  157     7  933,514 → 52,356 (94%)       61,051 → 2,891 (95%)
Total                         8,053,695 → 369,060 (95%)    1,683,797 → 25,400 (98%)

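The reduction percentages in the table follow from the two test counts in each cell. As a worked check, this small C helper computes the share of update tests eliminated by minimization; the function name reduction_percent is hypothetical, and the slide figures appear to be rounded to whole percentages.

```c
/* Percentage of update tests eliminated by minimization:
 * 100 * (total - minimized) / total.
 * Returns 0 for an empty suite or inconsistent input. */
double reduction_percent(unsigned long total, unsigned long minimized)
{
    if (total == 0 || minimized > total)
        return 0.0;
    return 100.0 * (double)(total - minimized) / (double)total;
}
```

For patch 0 over all points, reduction_percent(580871, 31791) is about 94.5, which the table reports as 95%.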
Empirical Study of Update Safety
• How many failures occur when applying updates arbitrarily?
• How many failures occur when applying updates subject only to the AS and CFS safety checks?

Safety: OpenSSH
Sig, Fun, Type: Δ to next version. Update tests: failed and total, per set of update points.

                              All Points               CFS Points              AS Points
#      Tests  Sig  Fun  Type  Failed     Total         Failed   Total          Failed  Total
0         75    3   98     5  19,715     580,871       0        68,044         0       35,314
1         75    0    6     0  0          705,322       0        705,322        0       587,578
2*        76    5  238    11  306,965    638,720       1,688    75,307         4       20,902
3         91    0   18     0  0          772,198       0        772,198        0       638,803
4*        91   13  172    10  565,681    773,086       609      110,633        380     21,343
5        104    0   24     1  10,703     878,235       0        130,000        0       111,950
6        104    6  257    10  163,333    879,668       44,461   96,183         110     44,278
7        104    4  179    12  11,380     918,717       1        80,070         1       100,854
8        105    0   72     3  3          973,364       0        261,885        0       61,724
9        104   10  157     7  357,919    933,514       24       121,337        0       61,051
Total                         1,435,699  8,053,695     46,783   2,420,979      495     1,683,797

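Unsafe Timing: Version Inconsistency
Version 0:

    void foo() {
        bar();
        …
        baz();
    }
    void bar() { … }
    void baz() { dig(); … }

Version 1 (patch):

    void foo() {
        bar();
        …
        baz();
    }
    void bar() { dig(); … }
    void baz() { … }

If the update takes effect between the calls to bar and baz, the old bar and the new baz both run, and dig is never called.
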
Manually Selected Update Points
Sig, Fun, Type: Δ to next version. Reduction: update tests before → after minimization (reduction). Safety: failed and total update tests.

                              Reduction                Safety
#      Tests  Sig  Fun  Type                           Failed  Total
0         75    3   98     5  566 → 566 (0%)           0       566
1         75    0    6     0  630 → 592 (6%)           0       630
2         76    5  238    11  568 → 568 (0%)           0       568
3         91    0   18     0  783 → 770 (2%)           0       783
4         91   13  172    10  782 → 782 (0%)           0       782
5        104    0   24     1  860 → 841 (2%)           0       860
6        104    6  257    10  859 → 859 (0%)           0       859
7        104    4  179    12  850 → 850 (0%)           0       850
8        105    0   72     3  868 → 823 (5%)           0       868
9        104   10  157     7  833 → 833 (0%)           0       833
Total                         7,599 → 7,484 (2%)       0       7,599

Summary
• We have argued that verification is necessary to prevent unsafe updates
  • Provided empirical evidence that AS/CFS cannot prevent all unsafe updates
• We have presented an approach for testing dynamic updates
• We have presented and evaluated a minimization strategy to make update testing more practical

Additional Slides
Reduction: vsftpd
Sig, Fun, Type: Δ to next version. Reduction columns: update tests before → after minimization (reduction).

#      Sig  Fun  Type  All Points                     Activeness-Safe Points
0        0    6     0  210,142 → 26 (~100%)           102,307 → 26 (~100%)
1        1   12     0  210,142 → 516 (~100%)          69,775 → 166 (~100%)
2        0   21     0  215,223 → 1,122 (99%)          55,555 → 553 (99%)
3        0   76     0  220,564 → 3,866 (98%)          37,265 → 1,912 (95%)
4        0   10     1  218,586 → 19,893 (91%)         2,123 → 301 (86%)
5        0   25     1  223,098 → 15,910 (93%)         67,330 → 3,567 (95%)
6        0  100     2  233,199 → 200,653 (14%)        7,437 → 2,742 (63%)
7        0   93     2  222,296 → 10,371 (95%)         3,098 → 275 (91%)
Total                  1,753,250 → 252,357 (86%)      344,890 → 9,542 (97%)

Safety: vsftpd
Sig, Fun, Type: Δ to next version. Update tests: failed and total, per set of update points.

                       All Points            CFS Points          AS Points           Manual Points
#      Sig  Fun  Type  Failed   Total        Failed  Total       Failed  Total       Failed  Total
0        0    6     0  0        210,142      0       210,142     0       102,307     0       80
1        1   12     0  2,462    210,142      558     90,073      0       69,775      0       80
2        0   21     0  0        215,223      0       215,223     0       55,555      0       80
3        0   76     0  0        220,564      0       220,564     0       37,265      0       80
4        0   10     1  43,233   218,586      546     4,478       0       2,123       0       80
5        0   25     1  58       223,098      0       24,924      0       67,330      0       80
6        0  100     2  2,115    233,199      0       3,737       0       7,437       0       82
7        0   93     2  234      222,296      0       1,993       0       3,098       0       80
Total                  48,102   1,753,250    1,104   771,134     0       344,890     0       642

Which Tests?
Figure: overlapping behaviors of P0 and P1.
• P0 only: old behavior (bugs and deprecated features)
• P0 and P1: unchanged behavior
• P1 only: new behavior (bug fixes and new features)

Nondeterminism
• Program traces may differ between runs:
  • Timing of signal handlers
  • Number of iterations of loops performing I/O
  • Dependence on random numbers, system time, memory addresses, etc.
• Handling nondeterminism:
  • Ensure that traces match up to the update point
  • Annotate regions of execution whose trace is ignored for matching purposes

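One way to picture the matching step is a comparison of two traces that skips events inside annotated regions. The C sketch below is an illustration under stated assumptions, not the authors' implementation: the traced_event type, the ignored flag, and expressing the update point as a count of non-ignored events (limit) are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int id;        /* identifies the traced event, e.g. a call site */
    bool ignored;  /* true inside an annotated "ignore" region */
} traced_event;

/* Returns true when the first `limit` non-ignored events of both traces
 * agree. Events flagged as ignored are skipped in either trace, so ignored
 * regions may differ in content and length between runs. */
bool traces_match(const traced_event *a, size_t na,
                  const traced_event *b, size_t nb, size_t limit)
{
    size_t i = 0, j = 0, matched = 0;
    while (matched < limit) {
        while (i < na && a[i].ignored) i++;
        while (j < nb && b[j].ignored) j++;
        if (i == na || j == nb)
            return false;          /* a trace ended before the update point */
        if (a[i].id != b[j].id)
            return false;          /* the runs diverged */
        i++;
        j++;
        matched++;
    }
    return true;
}
```

Here limit stands for the number of non-ignored events that precede the update point in the initial trace; an update test whose trace diverges earlier would not be a valid test of that update point.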
Program Versions
Sig, Fun, Type: Δ to next version.

OpenSSH
#      Version   LoC     Tests  Sig  Fun  Type
0      3.5p1     46,735     75    3   98     5
1      3.6.1p1   48,459     75    0    6     0
2      3.6.1p2   48,473     76    5  238    11
3      3.7.1p1   50,448     91    0   18     0
4      3.7.1p2   50,460     91   13  172    10
5      3.8p1     51,822    104    0   24     1
6      3.8.1p1   51,838    104    6  257    10
7      3.9p1     53,260    104    4  179    12

vsftpd
#      Version   LoC     Tests  Sig  Fun  Type
0      2.0.0     13,048     13    0    6     0
1      2.0.1     13,059     13    1   12     0
2      2.0.2p2   13,114     13    0   21     0
3      2.0.2p3   14,293     13    0   76     0
4      2.0.2     16,870     13    0   10     1
5      2.0.3     12,977     13    0   25     1
6      2.0.4     14,427     14    0  100     2
7      2.0.5     14,482     13    0   93     2

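Unsafe Timing: Version Inconsistency (vsftpd)
Version 0:

    void handle_upload_common() {
        ret = do_file_recv();
        if (ret == SUCCESS)
            write(226, "OK.");
    }
    int do_file_recv() {
        … // receive file
        return ret;
    }

Version 1 (patch):

    void handle_upload_common() {
        ret = do_file_recv();
    }
    int do_file_recv() {
        … // receive file
        if (ret == SUCCESS)
            write(226, "OK.");
        return ret;
    }

If the update takes effect while Version 0's handle_upload_common is active but before it calls do_file_recv, both the new do_file_recv and the old handle_upload_common write the 226 reply.
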
Unsafe Timing: Version Inconsistency (OpenSSH)
Version 0:

    void maincont() {
        extracted();
        …
        serverloop2();
    }
    void extracted() { … }
    void serverloop2() {
        global_ptr = init;
        tmp = (*global_ptr).pw;
    }

Version 1 (patch):

    void maincont() {
        extracted();
        …
        serverloop2();
    }
    void extracted() {
        global_ptr = init;
    }
    void serverloop2() {
        tmp = (*global_ptr).pw;
    }

If the update takes effect between the calls to extracted and serverloop2, the new serverloop2 dereferences global_ptr without it ever having been initialized.

Activeness Safety (AS)
• AS prevents updates to active code
• In this example, no patch updating main or foo is allowed:

    main() {
        extracted();
        foo();
        …
        baz();
    }
    extracted() {
        // initialization code
        …
    }
    foo() {
        …
        bar();
    }

Minimization Algorithm (patch A)
Patch A changes: baz() {…}

Initial Trace    Algorithm State
                 Last Update Pt   Points To Test
Update? (1)      1                {}
…
Call(foo)        1                {}
Update? (2)      2                {}
…
Call(bar)        2                {}
Update? (3)      3                {}
…
Call(baz)        3                { 3 }

Minimization Algorithm (patch B)
Patch B changes: foo() {…}, bar() {…}, baz() {…}

Initial Trace    Algorithm State
                 Last Update Pt   Points To Test
Update? (1)      1                {}
…
Call(foo)        1                { 1 }
Update? (2)      2                { 1 }
…
Call(bar)        2                { 1, 2 }
Update? (3)      3                { 1, 2 }
…
Call(baz)        3                { 1, 2, 3 }