Dynamic Software Update Testing: Framework and Empirical Study
Christopher M. Hayden, Eric A. Hardisty, Michael Hicks, Jeffrey S. Foster
University of Maryland, College Park

Dynamic Software Updating (DSU)
Performing updates to software at runtime has clear benefits:
  - Increased software availability
  - No need to terminate active connections or computations
… but can we trust updated software?
  - It is critical to ensure that updates are safe

Our Contributions
Verification of DSU through testing:
  - Testing procedure
  - Test minimization algorithm
Empirical study:
  - Effectiveness of minimization
  - Update safety / effectiveness of safety checks

DSU Safety
DSU creates the opportunity for new sources of bugs:
  - Faulty state transformation
  - Unsafe update timing
Safety checks restrict when updates may be applied:
  - Activeness Safety (AS)
  - Con-freeness Safety (CFS)

Activeness Safety (AS)
AS prevents updates to active code.
In this example, no patch updating main or foo is allowed:
    main() { … foo(); baz(); }
    foo()  { … bar(); }

Con-freeness Safety (CFS)
CFS (Stoyle et al. '05) allows updates to active code only when type safety can be ensured.
In this example, no patch updating the signature of baz or bar is allowed:
    main() { … foo(); baz(); }
    foo()  { … bar(); }

Unsafe Timing: Type Safety
Version 0:
    int foo(int x, int y) { return x + y; }
    void bar() { int z = 0; … z = foo(z, 5); }
Version 1 (patch):
    void foo(int *x, int y) { *x += y; }
    void bar() { int z = 0; … foo(&z, 5); }
Crash: if the update is applied while Version 0 bar is active, the old bar calls the new foo and passes an int where a pointer is expected.

DSU Testing
Safety checks offer limited guarantees:
  - CFS and AS ensure type-safe execution
  - AS ensures that you never return to old code following an update
  - Neither of these properties ensures safe update timing
We propose testing to verify the correctness of allowed update points:
  - Use the application's existing suite of system tests
  - Ensure that updating anywhere during the execution of those tests results in an execution that passes the test
Testing Procedure
Approach:
  1. Instrument the application to trace update points
  2. Execute the system test and gather the initial trace of potential update points
  3. For each update point in the initial trace, perform an update test: force an update at that point while executing the system test
[Diagram: the initial trace (✔) from start through the potential update points, followed by one update test per update point, each marked passed (✔) or failed (✘).]

Update Test Minimization
  - Program traces may have thousands or millions of update points
  - Many update tests have the same behavior for a given patch, so we can eliminate redundant tests
Version 0:
    void main() { foo(); bar(); baz(); }
Patch A: baz() {…}
    All update points yield the same behavior
Patch B: foo() {…} bar() {…} baz() {…}
    All update points yield distinct behavior

Minimization Algorithm
Execution events are traced if they have the potential to conflict with a patch:
  - An event conflicts with a patch p if applying p before the event might produce a different result than applying p after the event
  - Examples: function calls, global variable accesses
Algorithm:
  1. Trace the execution of a test T on P0
  2. Iterate through the trace, noting the last update point each time we reach a conflicting trace element
  3. Run only the identified update tests T_n^p

Empirical Results

Experimental Setup
Built the testing infrastructure on top of the Ginseng DSU system (Neamtiu et al.):
  - Modified to support tracing and updating at pre-selected update points
  - Inserted explicit update points before each function call to approximate more liberal systems
  - Disabled safety checking (CFS) for experiments
Tested 3 years of patches to OpenSSH and vsftpd (only OpenSSH is reported in this talk)

Program Modifications
    foo() {
      while (1) {          // main loop
        update();
        extract { ... }    // main loop body
      }
      extract { ... }      // after main loop
    }
  1. Identify long-running loops
  2. Add a manually selected update point
  3. Perform loop body extraction
  4. Perform continuation extraction

Experiments: Update Test Suite
  - How many update tests must be run to test real-world updates to real-world applications?
  - How effective is minimization at eliminating redundant tests?

Update Test Suite Size: OpenSSH
(Sig, Fun, Type: Δ to next version. Reduction: number of update tests before → after minimization.)

 #   Tests  Sig  Fun  Type   Reduction: All Points           Reduction: Activeness-Safe Points
 0     75     3   98     5     580,871 →  31,791 (95%)          35,314 →  3,027 (91%)
 1     75     0    6     0     705,322 →   1,795 (~100%)       587,578 →  1,717 (~100%)
 2     76     5  238    11     638,720 →  63,011 (90%)          20,902 →  2,353 (89%)
 3     91     0   18     0     772,198 →   4,324 (99%)         638,803 →  3,775 (99%)
 4     91    13  172    10     773,086 →  27,399 (96%)          21,343 →  1,564 (93%)
 5    104     0   24     1     878,235 →  17,398 (98%)         111,950 →  1,723 (98%)
 6    104     6  257    10     879,668 →  47,092 (95%)          44,278 →  2,139 (95%)
 7    104     4  179    12     918,717 →  89,601 (90%)         100,854 →  4,141 (96%)
 8    105     0   72     3     973,364 →  34,293 (96%)          61,724 →  2,070 (97%)
 9    104    10  157     7     933,514 →  52,356 (94%)          61,051 →  2,891 (95%)
 Total                       8,053,695 → 369,060 (95%)       1,683,797 → 25,400 (98%)

Empirical Study of Update Safety
  - How many failures occur when applying updates arbitrarily?
  - How many failures occur when applying updates subject only to the AS and CFS safety checks?
Safety: OpenSSH
(Sig, Fun, Type: Δ to next version. Failed / Total: update tests.)

 #   Tests  Sig  Fun  Type   All Points              CFS Points             AS Points
                             Failed      Total       Failed     Total       Failed    Total
 0     75     3   98     5      19,715    580,871         0      68,044        0      35,314
 1     75     0    6     0           0    705,322         0     705,322        0     587,578
 2*    76     5  238    11     306,965    638,720     1,688      75,307        4      20,902
 3     91     0   18     0           0    772,198         0     772,198        0     638,803
 4*    91    13  172    10     565,681    773,086       609     110,633      380      21,343
 5    104     0   24     1      10,703    878,235         0     130,000        0     111,950
 6    104     6  257    10     163,333    879,668    44,461      96,183      110      44,278
 7    104     4  179    12      11,380    918,717         1      80,070        1     100,854
 8    105     0   72     3           3    973,364         0     261,885        0      61,724
 9    104    10  157     7     357,919    933,514        24     121,337        0      61,051
 Total                       1,435,699  8,053,695    46,783   2,420,979      495   1,683,797

Unsafe Timing: Version Inconsistency
Version 0:
    void foo() { bar(); … baz(); }
    void bar() { … }
    void baz() { dig(); … }
Version 1 (patch):
    void foo() { bar(); … baz(); }
    void bar() { dig(); … }
    void baz() { … }

Manually Selected Update Points
(Sig, Fun, Type: Δ to next version.)

 #   Tests  Sig  Fun  Type   Reduction             Safety
                                                   Failed  Total
 0     75     3   98     5     566 →   566 (0%)       0      566
 1     75     0    6     0     630 →   592 (6%)       0      630
 2     76     5  238    11     568 →   568 (0%)       0      568
 3     91     0   18     0     783 →   770 (2%)       0      783
 4     91    13  172    10     782 →   782 (0%)       0      782
 5    104     0   24     1     860 →   841 (2%)       0      860
 6    104     6  257    10     859 →   859 (0%)       0      859
 7    104     4  179    12     850 →   850 (0%)       0      850
 8    105     0   72     3     868 →   823 (5%)       0      868
 9    104    10  157     7     833 →   833 (0%)       0      833
 Total                       7,599 → 7,484 (2%)       0    7,599

Summary
  - We have argued that verification is necessary to prevent unsafe updates
  - We have provided empirical evidence that AS/CFS cannot prevent all unsafe updates
  - We have presented an approach for testing dynamic updates
  - We have presented and evaluated a minimization strategy to make update testing more practical

Additional Slides

Reduction: vsftpd
(Sig, Fun, Type: Δ to next version. Reduction: number of update tests before → after minimization.)

 #   Sig  Fun  Type   Reduction: All Points           Reduction: Activeness-Safe Points
 0     0    6     0     210,142 →      26 (~100%)       102,307 →    26 (~100%)
 1     1   12     0     210,142 →     516 (~100%)        69,775 →   166 (~100%)
 2     0   21     0     215,223 →   1,122 (99%)          55,555 →   553 (99%)
 3     0   76     0     220,564 →   3,866 (98%)          37,265 → 1,912 (95%)
 4     0   10     1     218,586 →  19,893 (91%)           2,123 →   301 (86%)
 5     0   25     1     223,098 →  15,910 (93%)          67,330 → 3,567 (95%)
 6     0  100     2     233,199 → 200,653 (14%)           7,437 → 2,742 (63%)
 7     0   93     2     222,296 →  10,371 (95%)           3,098 →   275 (91%)
 Total                1,753,250 → 252,357 (86%)         344,890 → 9,542 (97%)

Safety: vsftpd
(Sig, Fun, Type: Δ to next version. Failed / Total: update tests.)

 #   Sig  Fun  Type   All Points            CFS Points          AS Points          Manual Points
                      Failed     Total      Failed    Total     Failed    Total    Failed  Total
 0     0    6     0        0    210,142         0    210,142       0     102,307      0      80
 1     1   12     0    2,462    210,142       558     90,073       0      69,775      0      80
 2     0   21     0        0    215,223         0    215,223       0      55,555      0      80
 3     0   76     0        0    220,564         0    220,564       0      37,265      0      80
 4     0   10     1   43,233    218,586       546      4,478       0       2,123      0      80
 5     0   25     1       58    223,098         0     24,924       0      67,330      0      80
 6     0  100     2    2,115    233,199         0      3,737       0       7,437      0      82
 7     0   93     2      234    222,296         0      1,993       0       3,098      0      80
 Total                48,102  1,753,250     1,104    771,134       0     344,890      0     642

Which Tests?
[Venn diagram of the behaviors of P0 and P1. P0 only: old behavior (bugs and deprecated features). Overlap: unchanged behavior. P1 only: new behavior (bug fixes and new features).]

Nondeterminism
Program traces may differ between runs:
  - Timing of signal handlers
  - Number of iterations of loops performing I/O
  - Dependence on random numbers, system time, memory addresses, etc.
Handling nondeterminism:
  - Ensure that traces match up to the update point
  - Annotate regions of execution whose trace is ignored for matching purposes

Program Versions
(Sig, Fun, Type: Δ to next version.)

OpenSSH
 #   Version    LoC      Tests  Sig  Fun  Type
 0   3.5p1      46,735     75     3   98     5
 1   3.6.1p1    48,459     75     0    6     0
 2   3.6.1p2    48,473     76     5  238    11
 3   3.7.1p1    50,448     91     0   18     0
 4   3.7.1p2    50,460     91    13  172    10
 5   3.8p1      51,822    104     0   24     1
 6   3.8.1p1    51,838    104     6  257    10
 7   3.9p1      53,260    104     4  179    12

vsftpd
 #   Version    LoC      Tests  Sig  Fun  Type
 0   2.0.0      13,048     13     0    6     0
 1   2.0.1      13,059     13     1   12     0
 2   2.0.2p2    13,114     13     0   21     0
 3   2.0.2p3    14,293     13     0   76     0
 4   2.0.2      16,870     13     0   10     1
 5   2.0.3      12,977     13     0   25     1
 6   2.0.4      14,427     14     0  100     2
 7   2.0.5      14,482     13     0   93     2

Unsafe Timing: Version Inconsistency (vsftpd)
Version 0:
    void handle_upload_common() {
      ret = do_file_recv();
      if (ret == SUCCESS) write(226, "OK.");
    }
    void do_file_recv() {
      … // receive file
      return ret;
    }
Version 1 (patch):
    void handle_upload_common() {
      ret = do_file_recv();
    }
    void do_file_recv() {
      … // receive file
      if (ret == SUCCESS) write(226, "OK.");
      return ret;
    }

Unsafe Timing: Version Inconsistency (OpenSSH)
Version 0:
    void maincont() { extracted(); … serverloop2(); }
    void extracted() { … }
    void serverloop2() { global_ptr = init; tmp = (*global_ptr).pw; }
Version 1 (patch):
    void maincont() { extracted(); … serverloop2(); }
    void extracted() { global_ptr = init; }
    void serverloop2() { tmp = (*global_ptr).pw; }

Activeness Safety (AS)
AS prevents updates to active code.
In this example, no patch updating main or foo is allowed:
    main() { extracted(); foo(); … baz(); }
    extracted() { // initialization code
      … }
    foo() { … bar(); }

Minimization Algorithm (patch A: baz() {…})
Initial trace: Update? (1) … Call(foo)  Update? (2) … Call(bar)  Update? (3) … Call(baz)

 Trace element   Last Update Pt   Points To Test
 Update? (1)     1                {}
 Call(foo)       1                {}
 Update? (2)     2                {}
 Call(bar)       2                {}
 Update? (3)     3                {}
 Call(baz)       3                { 3 }

Minimization Algorithm (patch B: foo() {…} bar() {…} baz() {…})
Initial trace: Update? (1) … Call(foo)  Update? (2) … Call(bar)  Update? (3) … Call(baz)

 Trace element   Last Update Pt   Points To Test
 Update? (1)     1                {}
 Call(foo)       1                { 1 }
 Update? (2)     2                { 1 }
 Call(bar)       2                { 1, 2 }
 Update? (3)     3                { 1, 2 }
 Call(baz)       3                { 1, 2, 3 }