CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs
George C. Necula, Scott McPeak, S. P. Rahul, Westley Weimer
University of California, Berkeley
Proc. of Conference on Compiler Construction, 2002

INDEX
Author
Questions
Overview
Introduction
Evaluation

1st AUTHOR
George C. Necula
Scott McPeak
S. P. Rahul
Westley Weimer

George C. Necula
George C. Necula, Philip Wadler: Proceedings of the 35th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2008, San Francisco, California, USA, January 7-12, 2008. ACM 2008
Westley Weimer, George C. Necula: Exceptional situations and program reliability. ACM Trans. Program. Lang. Syst. 30(2): (2008)
François Pottier, George C. Necula: Proceedings of TLDI'07: 2007 ACM SIGPLAN International Workshop on Types in Languages Design and Implementation, Nice, France, January 16, 2007. ACM 2007
Jeremy Condit, Matthew Harren, Zachary R. Anderson, David Gay, George C. Necula: Dependent Types for Low-Level Programming. ESOP 2007: 520-535
Bor-Yuh Evan Chang, Xavier Rival, George C. Necula: Shape Analysis with Structural Invariant Checkers. SAS 2007: 384-401
Ajay Chander, David Espinosa, Nayeem Islam, Peter Lee, George C. Necula: Enforcing resource bounds via static verification of dynamic checks. ACM Trans. Program. Lang. Syst. 29(5): (2007)
Jens Knoop, George C. Necula, Wolf Zimmermann: Preface. Electr. Notes Theor. Comput. Sci. 176(3): 1-2 (2007)
Sumit Gulwani, George C. Necula: A polynomial-time algorithm for global value numbering. Sci. Comput. Program. 64(1): 97-114 (2007)
George C. Necula: Using Dependent Types to Port Type Systems to Low-Level Languages. CC 2006: 1
Feng Zhou, Jeremy Condit, Zachary R. Anderson, Ilya Bagrak, Robert Ennals, Matthew Harren, George C. Necula, Eric A. Brewer: SafeDrive: Safe and Recoverable Extensions Using Language-Based Techniques. OSDI 2006: 45-60
Úlfar Erlingsson, Martín Abadi, Michael Vrable, Mihai Budiu, George C. Necula: XFI: Software Guards for System Address Spaces. OSDI 2006: 75-88
Bor-Yuh Evan Chang, Matthew Harren, George C. Necula: Analysis of Low-Level Code Using Cooperating Decompilers. SAS 2006: 318-335
Bor-Yuh Evan Chang, Adam J. Chlipala, George C. Necula: A Framework for Certified Program Analysis and Its Applications to Mobile-Code Safety. VMCAI 2006: 174-189

Scott McPeak
Scott McPeak, George C. Necula: Data Structure Specifications via Local Equality Axioms. CAV 2005: 476-490
George C. Necula, Jeremy Condit, Matthew Harren, Scott McPeak, Westley Weimer: CCured: type-safe retrofitting of legacy software. ACM Trans. Program. Lang. Syst. 27(3): 477-526 (2005)
Scott McPeak, George C. Necula: Elkhound: A Fast, Practical GLR Parser Generator. CC 2004: 73-88
Jeremy Condit, Matthew Harren, Scott McPeak, George C. Necula, Westley Weimer: CCured in the real world. PLDI 2003: 232-244
George C. Necula, Scott McPeak, Shree Prakash Rahul, Westley Weimer: CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs. CC 2002: 213-228
George C. Necula, Scott McPeak, Westley Weimer: CCured: type-safe retrofitting of legacy code. POPL 2002: 128-139
Dan Bonachea, Eugene Ingerman, Joshua Levy, Scott McPeak: An Improved Adaptive MultiStart Approach to Finding Near-Optimal Solutions to the Euclidean TSP. GECCO 2000: 143-150

S. P. Rahul
George C. Necula, Scott McPeak, Shree Prakash Rahul, Westley Weimer: CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs. CC 2002: 213-228
George C. Necula, Shree Prakash Rahul: Oracle-based checking of untrusted software. POPL 2001: 142-154

Westley Weimer
Stephanie Forrest, ThanhVu Nguyen, Westley Weimer, Claire Le Goues: A genetic programming approach to automated software repair. GECCO 2009: 947-954
Raymond P. L. Buse, Westley Weimer: The road not taken: Estimating path execution frequency statically. ICSE 2009: 144-154
Westley Weimer, ThanhVu Nguyen, Claire Le Goues, Stephanie Forrest: Automatically finding patches using genetic programming. ICSE 2009: 364-374
Pieter Hooimeijer, Westley Weimer: A decision procedure for subset constraints over regular languages. PLDI 2009: 188-198
Tamim I. Sookoor, Timothy W. Hnat, Pieter Hooimeijer, Westley Weimer, Kamin Whitehouse: Macrodebugging: global views of distributed program execution. SenSys 2009: 141-154
Claire Le Goues, Westley Weimer: Specification Mining with Few False Positives. TACAS 2009: 292-306
Nicholas Jalbert, Westley Weimer: Automated duplicate detection for bug tracking systems. DSN 2008: 52-61
Kinga Dobolyi, Westley Weimer: Changing Java's Semantics for Handling Null Pointer Exceptions. ISSRE 2008: 47-56
Raymond P. L. Buse, Westley Weimer: A metric for software readability. ISSTA 2008: 121-130
Raymond P. L. Buse, Westley Weimer: Automatic documentation inference for exceptions. ISSTA 2008: 273-282
Xiang Yin, John C. Knight, Elisabeth A. Nguyen, Westley Weimer: Formal Verification by Reverse Synthesis. SAFECOMP 2008: 305-319
Timothy W. Hnat, Tamim I. Sookoor, Pieter Hooimeijer, Westley Weimer, Kamin Whitehouse: MacroLab: a vector-based macroprogramming framework for cyber-physical systems. SenSys 2008: 225-238
Westley Weimer, George C. Necula: Exceptional situations and program reliability. ACM Trans. Program. Lang. Syst. 30(2): (2008)

2nd QUESTIONS
Q1: What does the recursive structure transformation look like in CIL?
Q2: What is the implementation of integrating a CFG into the intermediate language?
Q3: How do they achieve the goal of making code immune to stack-smashing attacks?
Q4: What are the difficulties in designing the whole-program merger, and how is it implemented?
Q5: How does the merger deal with .lib and .dll?

Drawbacks in C
Phenomenon: the same syntax can have different meanings.
What about a low-level representation?
A low-level representation has no ambiguities, but it loses structural information about types, loops, and other high-level constructs.

3rd OVERVIEW
CIL
CIL is both lower-level than abstract-syntax trees, by clarifying ambiguous constructs and removing redundant ones, and higher-level than typical intermediate languages designed for compilation, by maintaining types and a close relationship with the source program.
Feature
The main advantage of CIL is that it compiles all valid C programs into a few core constructs with very clean semantics. Translating from CIL to C is fairly easy.

Q1: What does the recursive structure transformation look like in CIL?

Q2: What is the implementation of integrating a CFG into the intermediate language?
After transformation, call Cil.computeCFGInfo, which processes all statements, finds the successors and predecessors of each one, and returns a list of the statements.

4th INTRODUCTION
Basic components
Compilation (C ---> CIL)
A whole-program merger
Representative application

BASIC COMPONENTS
Lvalue
An lvalue is expressed as a pair of a base plus an offset. The base address can be either the starting address of the storage for a variable (local or global) or any pointer expression.

BASIC COMPONENTS
Expression & Instruction
Note: Casts are inserted explicitly to make the program conform to CIL's type system.

BASIC COMPONENTS
Statement

BASIC COMPONENTS
Types
CIL moves all type declarations to the beginning of the program and gives them global scope. All anonymous composite types are given unique names in CIL, and every composite type has its own declaration at the top level.

BASIC COMPONENTS
Attributes
It is often useful to have a mechanism for the programmer to communicate additional information to the program analysis.
The type attributes for a base type must be specified immediately following the type.
The type attributes for a pointer type must be specified immediately after the * symbol.
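The two placement rules above can be illustrated with the standard C qualifiers, which follow the same positions. This is only a sketch: it assumes qualifiers such as const are handled like type attributes, and all identifiers (limit, counter, fixed_ptr, ro_view, bump_until_limit) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Base type: the qualifier is written immediately after the type name,
   so 'const' applies to the int itself. */
static int const limit = 8;

static int counter = 0;

/* Pointer type: the qualifier is written immediately after the '*',
   so 'const' applies to the pointer, not to the int it points to. */
static int * const fixed_ptr = &counter;

/* For contrast: here 'const' follows the base type, so the pointee is
   read-only while the pointer itself may be reassigned. */
static int const * ro_view = &limit;

/* Increments the counter through the const pointer until it reaches
   the read-only limit, then returns the final counter value. */
int bump_until_limit(void) {
    assert(fixed_ptr == &counter);   /* the pointer itself never changes */
    while (*fixed_ptr < *ro_view) {
        (*fixed_ptr)++;
    }
    return counter;
}
```

Calling bump_until_limit() leaves counter equal to limit; writing through ro_view or reassigning fixed_ptr would be rejected by the compiler, which shows that the position of the qualifier decides what it annotates.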
The attributes for a function type or for an array type can be specified using parenthesized declarators.

COMPILATION
One of the most significant transformations is that expressions containing side effects are separated into statements.
Type specifiers are interpreted and normalized.
Nested structure tag definitions are pulled apart. This means that all structure tag definitions can be found by a simple scan of the globals.
Prototypes are added for functions that are called before being defined. Furthermore, if a prototype exists but does not specify the types of the parameters, that omission is fixed.
Initializers are normalized to include explicit initialization for the missing elements.
CIL removes from the source file the type declarations, local variables, and inline functions that are not used in the file. This means that your analysis does not have to see all the unused clutter that comes from the header files.
Local variables in inner scopes are pulled to function scope (with appropriate renaming). Local scopes thus disappear. This makes it easy to find and operate on all local variables in a function.

A WHOLE-PROGRAM MERGER
A tool that merges all of a program's compilation units into a single compilation unit, with proper renaming to preserve semantics. It exists because many analyses are most effective when applied to the whole program.

Q4: What are the difficulties in designing the whole-program merger, and how is it implemented?
File-scope identifiers must be renamed properly to avoid clashes with globals and with similar identifiers in different files.
Solution: structural equivalence vs. name equivalence.
For each file there are two merging phases. In the first phase we merge the types and tags. In the second phase we rewrite the variable declarations and function bodies.

REPRESENTATIVE APPLICATION
Q3: How do they achieve the goal of making code immune to stack-smashing attacks?
CIL modifies the program to maintain a separate stack for return addresses.
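A toy model of that idea in plain C may help. It is a sketch under stated assumptions, not CIL's actual transformation: portable C cannot read or write real return addresses, so the activation record is modeled as a struct, and shadow_push, shadow_pop, struct frame, and simulated_call are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHADOW_DEPTH 64

/* The separate stack: it holds return addresses only, away from buffers. */
static uintptr_t shadow_stack[SHADOW_DEPTH];
static size_t shadow_top = 0;

static void shadow_push(uintptr_t ret) {
    assert(shadow_top < SHADOW_DEPTH);
    shadow_stack[shadow_top++] = ret;
}

static uintptr_t shadow_pop(void) {
    assert(shadow_top > 0);
    return shadow_stack[--shadow_top];
}

/* Toy activation record: a small buffer followed by the saved return address. */
struct frame {
    char buf[8];
    uintptr_t saved_ret;
};

/* Simulates one call. The copy deliberately ignores sizeof f.buf, so a long
   input overwrites f.saved_ret, as a stack-smashing attack would.
   *slot_after_copy reports what the ordinary stack slot holds afterwards;
   the function result is the address actually used for the return. */
uintptr_t simulated_call(uintptr_t ret, const char *input, size_t len,
                         uintptr_t *slot_after_copy) {
    struct frame f;
    memset(&f, 0, sizeof f);
    f.saved_ret = ret;
    shadow_push(ret);                 /* inserted prologue */

    if (len > sizeof f) {
        len = sizeof f;               /* keep the toy model within the struct */
    }
    memcpy(&f, input, len);           /* unchecked with respect to f.buf */

    *slot_after_copy = f.saved_ret;
    return shadow_pop();              /* inserted epilogue: ignore f.saved_ret */
}
```

Feeding simulated_call an input that fills the whole struct corrupts the in-frame slot, while the value popped from the shadow stack is still the address that was pushed on entry.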
Even if a buffer overrun attack occurs, the correct return address is taken from this separate stack.

5th EVALUATION
CIL has been tested very extensively. It is able to process the SPECINT95 benchmarks, the Linux kernel, GIMP, and other open-source projects.
CIL was tested against GCC's c-torture test suite, and it passes most of the tests (excluding those involving complex numbers and inner functions, which CIL does not currently implement).
Specifically, CIL fails 23 of the 904 c-torture tests that it should pass. GCC itself fails 19 tests.

Thank you!
More information at http://hal.cs.berkeley.edu/cil/