WEIZZ: Automatic Grey-Box Fuzzing for Structured Binary Formats
Andrea Fioraldi, Daniele Cono D’Elia and Emilio Coppa
@andreafioraldi andreafioraldi@gmail.com

Format-aware Fuzzing
[Diagram: Input Format Model, Input Generation, Program Under Test, Crashes]

Format-aware Fuzzing
● LangFuzz
● Peach
● Spike
● CSmith
● ...

Problems
● Impossible if the input structure is unknown
● May fail to find bugs related to syntactically invalid inputs in parsers
● Parser implementations do not always closely mirror format specifications
● Models take some time to be written by a human (and contain simplifications)
● Wrong models make fuzzing ineffective

Solutions?
● Automatically learn the model from the actual implementation of the parser
  ○ (Approximation of) Taint Tracking
    ■ [Tupni] [Autogram] [Polyglot] [Grimoire]
  ○ Machine Learning
    ■ [Learn&Fuzz] [REINAM]
  ○ Oracle based
    ■ [GLADE]
● Generate inputs that are not always syntactically valid

Coverage-guided Fuzzing
[Diagram: Corpus, Input Mutation, Program Under Test, Coverage, Crashes]

Problems
● Fail to explore deep paths behind parsers
● Affected by roadblocks (multi-byte comparisons, checksums, hashes, …)

  if (hash(input[0:8]) != input[8:12]) exit(1)
  if (input[12:16] == 0xABADCAFE) bug()

Structured Fuzzing
[Diagram: Corpus, Coverage, Input Format Model, Input Mutation, Program Under Test, Crashes]

Structured Fuzzing
● AFLSmart
● Nautilus
● Superion
● Libprotobuf-Mutator
● Zest
● ...
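The two roadblock checks shown on the coverage-guided fuzzing "Problems" slide can be made concrete with a small toy target. This is only a sketch under stated assumptions: the slide does not say which hash is used, so CRC32 stands in for `hash`, and the names `program_under_test` and `make_valid` are hypothetical helpers, not part of Weizz.

```python
import struct
import zlib

MAGIC = 0xABADCAFE  # constant taken from the slide


def program_under_test(data: bytes) -> str:
    """Toy parser with the two roadblocks from the slide (CRC32 is an assumed hash)."""
    if len(data) < 16:
        return "reject: too short"
    # Checksum roadblock: bytes 8..12 must equal a hash of bytes 0..8.
    (stored,) = struct.unpack("<I", data[8:12])
    if zlib.crc32(data[0:8]) != stored:
        return "reject: bad checksum"
    # Magic-value roadblock: bytes 12..16 must equal a 4-byte constant.
    (magic,) = struct.unpack("<I", data[12:16])
    if magic == MAGIC:
        return "bug"
    return "ok"


def make_valid(payload: bytes, magic: int) -> bytes:
    """Build an input whose checksum field matches its 8-byte payload."""
    assert len(payload) == 8
    return payload + struct.pack("<I", zlib.crc32(payload)) + struct.pack("<I", magic)


seed = make_valid(b"AAAABBBB", 0)
flipped = bytes([seed[0] ^ 1]) + seed[1:]  # a typical blind single-bit mutation

print(program_under_test(seed))                              # ok
print(program_under_test(flipped))                           # reject: bad checksum
print(program_under_test(make_valid(b"AAAABBBB", MAGIC)))    # bug
```

A blind mutation of the payload invalidates the checksum, and guessing the 4-byte constant by random mutation is very unlikely, which is why both checks stop a purely coverage-guided fuzzer.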
Bypass Roadblocks
● Concolic Fuzzing
  ○ [Driller] [QSYM] [Eclipser]
● (Approximation of) Taint Tracking
  ○ [TaintScope] [Vuzzer] [Angora] [Redqueen]
● Sensitive feedback
  ○ [LAF-Intel] [CompareCoverage] [FuzzFactory] [IJON]

Idea #1
● Reuse the expensive roadblock-bypassing analyses explored in past works to also enable structure-aware mutations

Bypass Roadblocks [Redqueen]
● Mutations targeting magic byte comparisons (Input-To-State, I2S)

  input: AAAABBBBCCCCBBBB
  cmp eax, FFFF → eax = BBBB

  input: AAAABBBBDDCCDDCC (equivalent in coverage)
  cmp eax, FFFF → eax = BBBB

  new input: AAAAFFFFDDCCDDCC

● Patch out checksum checks

Formats as an AST [Grimoire]
[Diagram: AST with nodes +, /, =, 12, 5, 3]

Not all formats are parsed into an AST

Comparisons for validation

  if (chunk->size_field > SIZE_MAX) error("Invalid Chunk Size");

Idea #2
● Instead of using memory accesses to reconstruct the format ([Tupni] [Autogram]), use the comparison instructions that are likely validation checks

Idea #3
● Don’t learn a model and then use it to guide the fuzzer; instead, reconstruct the structure each time and apply mutations. This avoids the problem of errors in the learning process.

Weizz
● Based on AFL 2.52b
● Binary-only (QEMU)
● Approximate taint tracking to bypass roadblocks and learn information about validation checks
● Structural mutations based on that information (inspired by [AFLSmart])

Architecture
[Diagram: Weizz architecture, shown over three slides]

GetDeps: Approximating Taint Tracking

  Input: AAAABBBBCCCCDDDD
  cmp eax, FFFF → eax = AAAA

  Bitflip #1: BAAABBBBCCCCDDDD
  cmp eax, FFFF → eax = BAAA

Detect Checksum Checks
● One operand is I2S
● The other operand is not I2S and GetDeps revealed dependencies on some input bytes
● The sets of their byte dependencies are disjoint

Input Tags
● Comparison ID
● Timestamp
● Parent ID
● Number of tags with the same ID
● The Comparison ID of the inner checksum that guards this byte
● Flags (which CMP operand, whether this is a checksum field, …)

Many Comparisons affected by the same byte
1. Prioritize checksum fields
2. Prioritize comparisons that appeared earlier in time (possible validation checks)
3. Prioritize comparisons that are influenced by a low number of bytes

Fixing Checksum
● Late-stage repair
● Topological sort (tags have the info for this)
● Unpatch false positives

Locating Fields
[Figure]
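The "GetDeps" and "Detect Checksum Checks" slides can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `traced_program` is a hypothetical toy target that simply returns its comparison operands (standing in for binary instrumentation), CRC32 is an assumed checksum, and the I2S test only recognizes verbatim copies of contiguous input bytes.

```python
import struct
import zlib


def traced_program(data: bytes) -> dict:
    """Hypothetical toy target: returns {cmp_id: (operand_a, operand_b)}."""
    return {
        "magic": (data[0:4], b"WEIZ"),
        "cksum": (struct.pack("<I", zlib.crc32(data[4:8])), data[8:12]),
    }


def get_deps(data: bytes):
    """Approximate taint: flip each input bit and record which operands change."""
    base = traced_program(data)
    deps = {(cid, side): set() for cid in base for side in (0, 1)}
    for off in range(len(data)):
        for bit in range(8):
            mutated = bytearray(data)
            mutated[off] ^= 1 << bit
            run = traced_program(bytes(mutated))
            for cid, ops in run.items():
                for side in (0, 1):
                    if ops[side] != base[cid][side]:
                        deps[(cid, side)].add(off)
    return base, deps


def is_i2s(data: bytes, operand: bytes, offsets: set) -> bool:
    """Simplified input-to-state test: operand is a verbatim copy of its input bytes."""
    if not offsets:
        return False
    lo, hi = min(offsets), max(offsets) + 1
    return hi - lo == len(offsets) and data[lo:hi] == operand


def looks_like_checksum(data: bytes, base: dict, deps: dict, cid: str) -> bool:
    """Apply the three conditions listed on the 'Detect Checksum Checks' slide."""
    for i2s_side in (0, 1):
        other = 1 - i2s_side
        d_i2s, d_other = deps[(cid, i2s_side)], deps[(cid, other)]
        if (is_i2s(data, base[cid][i2s_side], d_i2s)
                and d_other
                and not is_i2s(data, base[cid][other], d_other)
                and d_i2s.isdisjoint(d_other)):
            return True
    return False


data = b"WEIZ" + b"ABCD" + struct.pack("<I", zlib.crc32(b"ABCD"))
base, deps = get_deps(data)
for cid in base:
    print(cid, sorted(deps[(cid, 0)]), sorted(deps[(cid, 1)]),
          looks_like_checksum(data, base, deps, cid))
```

In this toy run the `magic` comparison has one I2S operand and one constant, so it is not flagged, while the `cksum` comparison has an I2S operand (bytes 8 to 11) and a computed operand that depends on a disjoint set of bytes (4 to 7), so it is flagged as a likely checksum check.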
Locating Chunks

  struct { int type; int x, y; int cksm; };

1. Pick a tag type
2. Recurse if next Timestamp (ts) > current
3. Go forward if next ID = current Parent
4. With some probability, take the untagged part and recurse again

Mutating Chunks [AFLSmart]
● Addition
● Deletion
● Splicing

Mutating Chunks [Weizz]
● Addition
  ○ Select a chunk A and add, before or after A, a chunk B from another input in the queue whose first tag has the same parent ID as the first tag of A

    Current input:   A
    Other input:     B
    Generated input: B A

● Deletion
  ○ Select a chunk and remove it

    Current input:   A
    Generated input: (A removed)

● Splicing
  ○ Select a chunk A and replace it with a chunk B from another input in the queue that has the same comparison ID in its first tag

    Current input:   A
    Other input:     B
    Generated input: B

Evaluation
1. Comparison with popular fuzzers over chunk-oriented programs
2. New bugs found by Weizz
3. Role of structural mutations and roadblock bypassing?

Evaluation
[Figures: results with 60% conf. intervals; one panel labeled "w/o I2S"]

Bugs
[Figure]

Evaluation
[Figure: panels labeled "w/o I2S" and "w/o struct. mut."]

Future Directions
● Taint Tracking for large inputs
● More chunk location heuristics
  ○ Exclude types of tags as starting point for a chunk
  ○ Apply traditional file-format reverse engineering algorithms based on memory accesses to tags
● Port to other OSes

Thank You
https://github.com/andreafioraldi/weizz-fuzzer
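Backup sketch: the three mutations from the "Mutating Chunks [Weizz]" slides, written over byte strings. This is an illustration under assumptions rather than the tool's implementation: the `Chunk` record and the function names are hypothetical, and chunk boundaries are given by hand here, whereas Weizz derives them heuristically from tags.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    start: int      # offset of the first byte of the chunk
    end: int        # offset one past the last byte of the chunk
    cmp_id: int     # comparison ID in the chunk's first tag
    parent_id: int  # parent ID in the chunk's first tag


def delete_chunk(data: bytes, a: Chunk) -> bytes:
    """Deletion: remove chunk A from the current input."""
    return data[:a.start] + data[a.end:]


def splice_chunk(data, a, other, other_chunks, rng):
    """Splicing: replace A with a chunk from another input with the same comparison ID."""
    candidates = [c for c in other_chunks if c.cmp_id == a.cmp_id]
    if not candidates:
        return data
    b = rng.choice(candidates)
    return data[:a.start] + other[b.start:b.end] + data[a.end:]


def add_chunk(data, a, other, other_chunks, rng):
    """Addition: insert, before or after A, a chunk with the same parent ID as A."""
    candidates = [c for c in other_chunks if c.parent_id == a.parent_id]
    if not candidates:
        return data
    b = rng.choice(candidates)
    pos = rng.choice((a.start, a.end))  # before or after A
    return data[:pos] + other[b.start:b.end] + data[pos:]


rng = random.Random(0)
current = b"HDRaaaaTAIL"
chunk_a = Chunk(start=3, end=7, cmp_id=10, parent_id=1)
other = b"HDRbbTAIL"
other_chunks = [Chunk(start=3, end=5, cmp_id=10, parent_id=1)]

print(delete_chunk(current, chunk_a))                        # b'HDRTAIL'
print(splice_chunk(current, chunk_a, other, other_chunks, rng))  # b'HDRbbTAIL'
print(add_chunk(current, chunk_a, other, other_chunks, rng))
```

Matching on the comparison ID for splicing and on the parent ID for addition follows the slide text; when no compatible chunk exists in the other input, the sketch simply leaves the input unchanged.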