Automated Whitebox Fuzz Testing
Patrice Godefroid (Microsoft Research)
Michael Y. Levin (Microsoft Center for Software Excellence)
David Molnar (UC Berkeley, visiting MSR)
Presented by Yuyan Bao

Acknowledgement
• This presentation is extended and modified from
  – The presentation by David Molnar, "Automated Whitebox Fuzz Testing"
  – The presentation by Corina Păsăreanu (Kestrel), "Symbolic Execution for Model Checking and Testing"

Agenda
• Fuzz Testing
• Symbolic Execution
• Dynamic Test Generation
• Generational Search
• SAGE Architecture
• Conclusion
• Contribution
• Weakness
• Improvement

Fuzz Testing
• An effective technique for finding security vulnerabilities in software
• Applies invalid, unexpected, or random data to the inputs of a program
• If the program fails (e.g., it crashes or raises an access violation exception), the defect can be noted

Whitebox Fuzzing
• This paper presents an alternative, whitebox fuzz testing approach inspired by recent advances in symbolic execution and dynamic test generation:
  – Run the code with some initial well-formed input
  – Collect constraints on the input with symbolic execution
  – Generate new constraints (by negating the collected constraints one by one)
  – Solve the new constraints with a constraint solver
  – Synthesize new inputs from the solutions

Symbolic Execution
• A method for deriving test cases that satisfy a given path
• Performed as a simulation of the computation along the path
• Initial path function = identity function; initial path condition = true
• Each vertex on the path induces a symbolic composition on the path function and a logical constraint on the path condition:
  – If an assignment X := g(X) was made, the path function f is composed with g: f := g ∘ f
  – If a conditional decision was made, the branch condition (expressed over the inputs via f) is conjoined: path condition := path condition ∧ branch condition[X := f(X)]
• Output: the path function and path condition for the given path
(Slide from "White Box Testing and Symbolic Execution" by Michael Beder)

Symbolic Execution
• "Simulate" the code using symbolic values instead of concrete numeric data (PC = "path condition")

Code:
  int x, y;
  if (x > y) {
    x = x + y;
    y = x - y;
    x = x - y;
    if (x - y > 0)
      assert(false);
  }

Symbolic execution tree (x = X, y = Y initially):
  x: X, y: Y                      PC: true
  if (x > y), false branch:       x: X,   y: Y   PC: X <= Y
  if (x > y), true branch:        x: X,   y: Y   PC: X > Y
    after x = x + y:              x: X+Y, y: Y   PC: X > Y
    after y = x - y:              x: X+Y, y: X   PC: X > Y
    after x = x - y:              x: Y,   y: X   PC: X > Y
    if (x - y > 0), true branch:  PC: X > Y ∧ Y - X > 0 (FALSE, unsatisfiable: assert(false) is not reachable)
    if (x - y > 0), false branch: x: Y,   y: X   PC: X > Y ∧ Y - X <= 0

Dynamic Test Generation
• Execute the program starting with some initial inputs
• Perform a dynamic symbolic execution to collect constraints on the inputs
• Use a constraint solver to infer variants of the previous inputs in order to steer the next executions

Dynamic Test Generation
input = "good"

  void top(char input[4]) {
    int cnt = 0;
    if (input[0] == 'b') cnt++;
    if (input[1] == 'a') cnt++;
    if (input[2] == 'd') cnt++;
    if (input[3] == '!') cnt++;
    if (cnt >= 3) crash();
  }

• Running the program with random values for the 4 input bytes is unlikely to discover the error: the probability of hitting the crash is about 1/2^30
• With traditional fuzz testing it is difficult to generate input values that drive the program through all of its possible execution paths

Depth-First Search
• Run top() on input "good"; the trace yields the path constraints:
    I0 != 'b', I1 != 'a', I2 != 'd', I3 != '!'

Depth-First Search
• Negate the deepest constraint and solve: I0 != 'b', I1 != 'a', I2 != 'd', I3 == '!'
• New test input: "good" becomes "goo!"

Practical limitation
• Backtrack and negate the next constraint: I0 != 'b', I1 != 'a', I2 == 'd', I3 != '!'
• New test input: "good" becomes "godd"
• Every re-execution expands only one constraint: low efficiency!
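To make the limitation concrete, here is a minimal sketch in C of this one-constraint-at-a-time search on the toy top() example. It is illustrative only: the path constraints are hard-coded rather than recovered by real symbolic execution, and "solving" a negated byte-equality constraint is simulated by writing the compared character into the input (next_input() is a hypothetical helper, not part of SAGE).

  #include <stdio.h>
  #include <string.h>

  /* One path constraint recorded from running top() on "good": at the
     branch guarding position `pos`, the comparison against `val`
     failed, i.e. the trace implies input[pos] != val. */
  struct constraint { int pos; char val; };

  static const struct constraint trace[4] = {
      {0, 'b'}, {1, 'a'}, {2, 'd'}, {3, '!'}
  };

  /* Negate the single constraint at `depth` (turn != into ==) and
     "solve" it by forcing input[pos] == val. Hypothetical helper. */
  static void next_input(const char parent[5], int depth, char out[5]) {
      memcpy(out, parent, 5);
      out[trace[depth].pos] = trace[depth].val;
  }

  int main(void) {
      char test[5];
      /* Depth-first order: flip the deepest constraint first, then
         backtrack: good -> goo!, then good -> godd, and so on. A real
         DFS would rerun the program and recurse on each new test. */
      for (int depth = 3; depth >= 0; depth--) {
          next_input("good", depth, test);
          printf("next test: %s\n", test);
      }
      return 0;
  }

Each printed test still requires a fresh execution and symbolic trace of the program before the next constraint can be flipped, which is why plain depth-first expansion is slow on large programs.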
Generational Search
• BFS with a heuristic to maximize block coverage
• Score returns the number of new blocks covered
(Slide from "Symbolic Execution" by Kevin Wallace, CSE504, 2010-04-28)

Generational Search
• Key idea: one trace, many tests
• From the single trace of "good", negate each of the four constraints in turn (I0 == 'b', I1 == 'a', I2 == 'd', I3 == '!') and solve each one
• This yields the "Generation 1" test cases: bood, gaod, godd, goo!

The Generational Search Space
• Repeated expansion eventually explores all 16 candidate inputs for top(); the number attached to each test case is its generation, i.e., how many constraints were negated on its path:
  – generation 0: good
  – generation 1: goo!, godd, gaod, bood
  – generation 2: god!, gao!, gadd, boo!, bodd, baod
  – generation 3: gad!, bao!, bod!, badd
  – generation 4: bad! (the crashing input)
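The expansion step can be sketched the same way, again in C with the hard-coded trace of top(). This is a sketch, not SAGE's actual code: expand() is an illustrative stand-in for the paper's ExpandExecution, and the per-child bound bookkeeping is simplified to passing j + 1.

  #include <stdio.h>
  #include <string.h>

  /* Same hard-coded trace as before: running top() on "good" records
     input[pos] != val at each of the four branches. */
  struct constraint { int pos; char val; };

  static const struct constraint trace[4] = {
      {0, 'b'}, {1, 'a'}, {2, 'd'}, {3, '!'}
  };

  /* Expand one parent test case: negate every constraint at index >=
     bound, producing one child per negation. The paper tracks a bound
     per child precisely so that a child's own expansion never
     re-creates an ancestor's or sibling's tests; we approximate that
     here by giving the j-th child bound j + 1. */
  static void expand(const char parent[5], int bound) {
      for (int j = bound; j < 4; j++) {
          char child[5];
          memcpy(child, parent, 5);
          child[trace[j].pos] = trace[j].val;   /* flip != into == */
          /* SAGE would run the program on `child` here, score it by
             the number of new basic blocks covered, and queue it for
             later expansion in best-first order. */
          printf("child with bound %d: %s\n", j + 1, child);
      }
  }

  int main(void) {
      expand("good", 0);   /* generation 1: bood, gaod, godd, goo! */
      expand("godd", 3);   /* expanding one bound-3 child gives god! */
      return 0;
  }

The contrast with the depth-first sketch above: a single traced execution now yields a whole generation of children, and the coverage-based Score heuristic decides which child to expand next.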
SAGE Architecture (Scalable Automated Guided Execution)
• Starting from Input0, four tasks run in a loop:
  – Tester: check for crashes (AppVerifier)
  – Tracer: trace the program, producing a trace file (Nirvana)
  – CoverageCollector: gather constraints (TruScan)
  – SymbolicExecutor: solve constraints (Disolver)
  – The solutions become Input1, Input2, …, InputN, which feed back into the Tester

Initial Experiences with SAGE
• Since the first MS-internal release in April 2007: dozens of new security bugs found (most missed by blackbox fuzzers and static analysis)
• Apps: image processors, media players, file decoders, …
• Many bugs found were rated "security critical, severity 1, priority 1"
• Now used by several teams regularly as part of the QA process

Experiments: ANI Parsing (MS07-017)

Initial input:
  RIFF...ACONLIST
  B...INFOINAM....
  3D Blue Alternat
  e v1.1..IART....
  ................
  1996..anih$...$.
  ................
  ................
  ..rate..........
  ..........seq ..
  ................
  ..LIST....framic
  on......... ..

Crashing test case:
  RIFF...ACONB
  B...INFOINAM....
  3D Blue Alternat
  e v1.1..IART....
  ................
  1996..anih$...$.
  ................
  ................
  ..rate..........
  ..........seq ..
  ................
  ..anih....framic
  on......... ..

• Only a 1 in 2^32 chance of hitting this at random!
• Initial input: 341 branch constraints, 1,279,939 total instructions
• The crashing test case was found after 7,706 test cases, in 7 hrs 36 mins

Different Crashes From Different Initial Input Files
• Crash buckets found for Media 1 from seven different seed files; an X marks each seed file that exposed the bug:

  Bug ID        Seeds exposing it
  1867196225    X X X X X
  2031962117    X X X X X X X
  612334691
  1061959981    X
  1212954973    X
  1011628381    X
  842674295     X
  1246509355    X X X X X X X
  1527393075    X
  1277839407    X
  1951025690    X (zero-bytes seed)

• Media 1 totals: 60 machine-hours, 44,598 total tests, 357 crashes, 12 bugs

Conclusion: Blackbox vs. Whitebox Fuzzing
• Cost/precision tradeoffs
  – Blackbox fuzzing is lightweight, easy, and fast, but has poor coverage
  – Whitebox fuzzing is smarter, but complex and slower
  – Recent "semi-whitebox" approaches are less smart but more lightweight: Flayer (taint-flow analysis, may generate false alarms), Bunny-the-fuzzer (taint-flow, source-based, heuristics to fuzz based on input usage), autodafe, etc.
• Which is more effective at finding bugs? It depends…
  – Many apps are so buggy that any form of fuzzing finds bugs!
  – Once the low-hanging bugs are gone, fuzzing must become smarter: use whitebox techniques and/or user-provided guidance (grammars, etc.)
• Bottom line: in practice, use both!

Contribution
• A novel search algorithm with a coverage-maximizing heuristic
• Symbolic execution of program traces performed at the x86 binary level
• Tests larger applications than previously done in dynamic test generation

Weakness
• No systematic way to gather good initial inputs
• The class of inputs characterized by each symbolic execution is determined by the dependence of the program's control flow on its inputs
• Not a general solution:
  – Only for x86 Windows applications
  – Focuses only on file-reading applications
  – Nirvana simulates a given processor's architecture

Improvement
• Portability
  – Translate collected traces into an intermediate language
  – Reproduce bugs on different platforms without re-simulating the program
• A better way to find initial inputs efficiently

Thank you! Questions?