CIL

CIL:
Intermediate Language and
Tools for Analysis and
Transformation of C programs
George C. Necula
Scott McPeak
S. P. Rahul
Westley Weimer
University of California, Berkeley
Proc. of Conference on Compiler Construction, 2002
INDEX

Author

Questions

Overview

Introduction

Evaluation
1st AUTHOR

George C. Necula

Scott McPeak

S. P. Rahul

Westley Weimer
George C. Necula
George C. Necula, Philip Wadler: Proceedings of the 35th ACM SIGPLAN-SIGACT
Symposium on Principles of Programming Languages, POPL 2008, San Francisco,
California, USA, January 7-12, 2008 ACM 2008
Westley Weimer, George C. Necula: Exceptional situations and program reliability. ACM
Trans. Program. Lang. Syst. 30(2): (2008)
François Pottier, George C. Necula: Proceedings of TLDI'07: 2007 ACM SIGPLAN
International Workshop on Types in Languages Design and Implementation, Nice, France,
January 16, 2007 ACM 2007
Jeremy Condit, Matthew Harren, Zachary R. Anderson, David Gay, George C. Necula:
Dependent Types for Low-Level Programming. ESOP 2007: 520-535
Bor-Yuh Evan Chang, Xavier Rival, George C. Necula: Shape Analysis with Structural
Invariant Checkers. SAS 2007: 384-401
Ajay Chander, David Espinosa, Nayeem Islam, Peter Lee, George C. Necula: Enforcing
resource bounds via static verification of dynamic checks. ACM Trans. Program. Lang. Syst.
29(5): (2007)
CONTINUED







Jens Knoop, George C. Necula, Wolf Zimmermann: Preface. Electr. Notes Theor. Comput.
Sci. 176(3): 1-2 (2007)
Sumit Gulwani, George C. Necula: A polynomial-time algorithm for global value numbering.
Sci. Comput. Program. 64(1): 97-114 (2007)
George C. Necula: Using Dependent Types to Port Type Systems to Low-Level Languages.
CC 2006: 1
Feng Zhou, Jeremy Condit, Zachary R. Anderson, Ilya Bagrak, Robert Ennals, Matthew
Harren, George C. Necula, Eric A. Brewer: SafeDrive: Safe and Recoverable Extensions
Using Language-Based Techniques. OSDI 2006: 45-60
Úlfar Erlingsson, Martín Abadi, Michael Vrable, Mihai Budiu, George C. Necula: XFI:
Software Guards for System Address Spaces. OSDI 2006: 75-88
Bor-Yuh Evan Chang, Matthew Harren, George C. Necula: Analysis of Low-Level Code
Using Cooperating Decompilers. SAS 2006: 318-335
Bor-Yuh Evan Chang, Adam J. Chlipala, George C. Necula: A Framework for Certified
Program Analysis and Its Applications to Mobile-Code Safety. VMCAI 2006: 174-189
Scott McPeak







Scott McPeak, George C. Necula: Data Structure Specifications via Local Equality Axioms.
CAV 2005: 476-490
George C. Necula, Jeremy Condit, Matthew Harren, Scott McPeak, Westley Weimer:
CCured: type-safe retrofitting of legacy software. ACM Trans. Program. Lang. Syst. 27(3):
477-526 (2005)
Scott McPeak, George C. Necula: Elkhound: A Fast, Practical GLR Parser Generator. CC
2004: 73-88
Jeremy Condit, Matthew Harren, Scott McPeak, George C. Necula, Westley Weimer:
CCured in the real world. PLDI 2003: 232-244
George C. Necula, Scott McPeak, Shree Prakash Rahul, Westley Weimer: CIL: Intermediate
Language and Tools for Analysis and Transformation of C Programs. CC 2002: 213-228
George C. Necula, Scott McPeak, Westley Weimer: CCured: type-safe retrofitting of legacy
code. POPL 2002: 128-139
Dan Bonachea, Eugene Ingerman, Joshua Levy, Scott McPeak: An Improved Adaptive Multi-Start Approach to Finding Near-Optimal Solutions to the Euclidean TSP. GECCO 2000: 143-150
S. P. Rahul


George C. Necula, Scott McPeak, Shree Prakash Rahul, Westley Weimer: CIL: Intermediate
Language and Tools for Analysis and Transformation of C Programs. CC 2002: 213-228
George C. Necula, Shree Prakash Rahul: Oracle-based checking of untrusted software.
POPL 2001: 142-154
Westley Weimer






Stephanie Forrest, ThanhVu Nguyen, Westley Weimer, Claire Le Goues: A genetic
programming approach to automated software repair. GECCO 2009: 947-954
Raymond P. L. Buse, Westley Weimer: The road not taken: Estimating path execution
frequency statically. ICSE 2009: 144-154
Westley Weimer, ThanhVu Nguyen, Claire Le Goues, Stephanie Forrest: Automatically
finding patches using genetic programming. ICSE 2009: 364-374
Pieter Hooimeijer, Westley Weimer: A decision procedure for subset constraints over regular
languages. PLDI 2009: 188-198
Tamim I. Sookoor, Timothy W. Hnat, Pieter Hooimeijer, Westley Weimer, Kamin Whitehouse:
Macrodebugging: global views of distributed program execution. SenSys 2009: 141-154
Claire Le Goues, Westley Weimer: Specification Mining with Few False Positives. TACAS
2009: 292-306
CONTINUED







Nicholas Jalbert, Westley Weimer: Automated duplicate detection for bug tracking systems.
DSN 2008: 52-61
Kinga Dobolyi, Westley Weimer: Changing Java's
Semantics for Handling Null Pointer Exceptions. ISSRE 2008: 47-56
Raymond P. L. Buse, Westley Weimer: A metric for software readability. ISSTA 2008: 121-130
Raymond P. L. Buse, Westley Weimer: Automatic documentation inference for exceptions.
ISSTA 2008: 273-282
Xiang Yin, John C. Knight, Elisabeth A. Nguyen, Westley Weimer: Formal Verification by
Reverse Synthesis. SAFECOMP 2008: 305-319
Timothy W. Hnat, Tamim I. Sookoor, Pieter Hooimeijer, Westley Weimer, Kamin Whitehouse:
MacroLab: a vector-based macroprogramming framework for cyber-physical systems.
SenSys 2008: 225-238
Westley Weimer, George C. Necula: Exceptional situations and program reliability. ACM
Trans. Program. Lang. Syst. 30(2): (2008)
2nd Questions

Q1: What does the recursive structure transformation look like
in CIL?

Q2: How is a CFG integrated into the intermediate language?



Q3: How do they achieve the goal of making code
immune to stack-smashing attack?
Q4: What are the difficulties in designing the whole-program merger, and how is it implemented?
Q5: How does the merger deal with .lib and .dll?
Drawbacks of C
Problem: the same syntax can have different meanings.

What about a low-level representation?
It removes these ambiguities, but loses structural information about types,
loops, and other high-level constructs.
3rd OVERVIEW
CIL
CIL is both lower-level than abstract-syntax trees, by clarifying
ambiguous constructs and removing redundant ones, and also
higher-level than typical intermediate languages designed for
compilation, by maintaining types and a close relationship with
the source program.

Feature
The main advantage of CIL is that it compiles all valid C programs
into a few core constructs with a very clean semantics.
Translating from CIL to C is fairly easy.
Q1: What does the recursive structure transformation look like in CIL?
Q2: How is a CFG integrated into the intermediate language?
(After the transformation, call Cil.computeCFGInfo, which collects all
statements of a function, computes the successors and predecessors of
each statement, and returns a list of the statements.)
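A simplified C sketch of the last step of that computation: deriving every predecessor list from already-known successor lists by reversing each edge. The struct stmt layout, MAX_EDGES, and compute_preds are hypothetical; CIL's real stmt record is an OCaml type with succs and preds fields in a similar spirit.

```c
#include <stddef.h>

#define MAX_EDGES 4   /* hypothetical fixed bound, to keep the sketch simple */

/* Hypothetical, simplified statement node. */
struct stmt {
    int id;
    int nsuccs, npreds;
    struct stmt *succs[MAX_EDGES];
    struct stmt *preds[MAX_EDGES];
};

/* Fill in preds by reversing every succ edge. All successors must point
   into the same array, since only those nodes have npreds reset here.
   Returns 0 on success, -1 if a node has more than MAX_EDGES preds. */
int compute_preds(struct stmt *stmts, size_t n) {
    for (size_t i = 0; i < n; i++)
        stmts[i].npreds = 0;
    for (size_t i = 0; i < n; i++) {
        for (int k = 0; k < stmts[i].nsuccs; k++) {
            struct stmt *s = stmts[i].succs[k];
            if (s->npreds == MAX_EDGES)
                return -1;
            s->preds[s->npreds++] = &stmts[i];
        }
    }
    return 0;
}
```

For an if-statement whose two branches rejoin, the join node ends up with two predecessors and the entry node with none.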
4th INTRODUCTION

Basic components

Compilation (C → CIL)

A whole-program merger

Representative application
BASIC COMPONENTS

Lvalue
An lvalue is expressed as a pair of a base and an offset. The
base can be either the starting address of the storage
for a variable (local or global) or any pointer expression.
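A small C illustration of the base-plus-offset view. The comments give, in an informal notation loosely modeled on CIL's Var/Mem bases and Field/Index offsets, the pair each access would map to; the exact form CIL prints is an assumption. The final check confirms that p->y and (*p).y, two different spellings, name the same storage.

```c
struct point { int x; int y; };

int lvalue_demo(void) {
    struct point pts[4] = {{0, 0}};
    struct point *p = &pts[2];

    /* (Var pts, Index(2, Field(y, NoOffset))): base is a variable */
    int *a = &pts[2].y;
    /* (Mem p, Field(y, NoOffset)): base is a pointer expression */
    int *b = &p->y;
    /* (*p).y is spelled differently but is the same lvalue as p->y */
    int *c = &(*p).y;

    *b = 7;
    /* All three addresses coincide, so reading through a sees the 7. */
    return (a == b && b == c) ? *a : -1;
}
```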
BASIC COMPONENTS

Expression & Instruction
Note:
Casts are inserted explicitly to make the program conform to our
type system.
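A hand-written before/after pair showing what "explicit casts" means. The second function spells out the conversions that C applies implicitly in the first (integer promotion of char and unsigned short, then conversion of the int result to long); the exact cast placement CIL would choose is assumed here, not taken from its output.

```c
/* Source form: the usual arithmetic conversions happen silently. */
long implicit_form(char c, int i, unsigned short u) {
    long r = c + i * u;
    return r;
}

/* Explicit form: every conversion is written as a cast, so each
   subexpression has exactly the type the type system assigns it. */
long explicit_form(char c, int i, unsigned short u) {
    long r;
    r = (long)((int)c + i * (int)u);
    return r;
}
```

Both functions compute the same value; only the explicitness differs.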
BASIC COMPONENTS

Statement
BASIC COMPONENTS

Types
CIL moves all type declarations to the beginning of the program
and gives them global scope.
All anonymous composite types are given unique names in CIL,
and every composite type has its own declaration at the top level.
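A sketch of the naming rule for an anonymous struct declared inside a function. The generated name anonstruct_pair_1 is hypothetical; CIL chooses its own names, but the shape of the result (a named, top-level declaration) is what the slide describes.

```c
/* Source form: an anonymous struct type local to a function. */
int anon_source_form(void) {
    struct { int key; int value; } pair = { 1, 2 };
    return pair.key + pair.value;
}

/* CIL-style form: the type gets a unique name and a top-level
   declaration; the function body then refers to it by name. */
struct anonstruct_pair_1 {
    int key;
    int value;
};

int anon_cil_form(void) {
    struct anonstruct_pair_1 pair = { 1, 2 };
    return pair.key + pair.value;
}
```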
BASIC COMPONENTS

Attributes
It is often useful to have a mechanism for the programmer to
communicate additional information to the program analysis.

The type attributes for a base type must be specified
immediately following the type.

The type attributes for a pointer type must be specified
immediately after the * symbol.

The attributes for a function type or for an array type can be
specified using parenthesized declarators.
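A sketch of the three placement rules. CIL reads attributes written in GCC's __attribute__((...)) syntax; to keep the file buildable with an ordinary compiler, this sketch assumes a hypothetical ANNOT macro that a CIL-based tool could define as __attribute__((a)) and that otherwise expands to nothing. The attribute names nonneg, nonnull, and bounded are also hypothetical.

```c
#include <stddef.h>

/* Hypothetical annotation macro: empty unless an analysis defines it. */
#ifndef ANNOT
#define ANNOT(a)
#endif

/* Base type: the attribute immediately follows the type. */
static int ANNOT(nonneg) count = 3;

/* Pointer type: the attribute immediately follows the '*'. */
static int * ANNOT(nonnull) first = &count;

/* Array type: a parenthesized declarator gives the attribute a
   position that attaches it to the array itself. */
static int (ANNOT(bounded) table)[4] = { 1, 2, 3, 4 };

int annotated_sum(void) {
    int s = 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        s += table[i];
    return s + *first;   /* 10 from the table plus 3 from count */
}
```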
COMPILATION

One of the most significant transformations is that expressions
that contain side-effects are separated into statements.
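A before/after sketch of side-effect separation, written by hand rather than produced by CIL; the temporaries tmp and tmp___0 are hypothetical names. In the separated form each side effect (the increment and the call) is its own statement, and the final expression is side-effect free.

```c
static int scale(int v) { return 10 * v; }

/* Source form: one expression increments i and calls a function. */
int with_side_effects(const int *a, int i, int j, int *next_i) {
    int x = a[i++] + scale(j);
    *next_i = i;
    return x;
}

/* Separated form: side effects pulled out into statements. */
int separated(const int *a, int i, int j, int *next_i) {
    int tmp, tmp___0, x;
    tmp = i;             /* value of i before the increment */
    i = i + 1;           /* the increment, now a plain assignment */
    tmp___0 = scale(j);  /* the call, now its own instruction */
    x = a[tmp] + tmp___0;
    *next_i = i;
    return x;
}
```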

Type specifiers are interpreted and normalized.

Nested structure tag definitions are pulled apart. This means
that all structure tag definitions can be found by a simple scan
of the globals.
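A sketch of hoisting a nested tag definition. In C, a struct tag defined inside another struct already has the scope of the enclosing declaration, which is why pulling it out to the top level preserves meaning. The _src/_cil suffixes are only there so both forms can coexist in one file.

```c
/* Source form: the tag inner_src is defined inside outer_src. */
struct outer_src {
    struct inner_src { int a; } in;
    int b;
};

/* CIL-style form: every tag definition stands alone at top level,
   so a scan of the globals finds all of them. */
struct inner_cil { int a; };
struct outer_cil {
    struct inner_cil in;
    int b;
};

int nested_sum(void) {
    struct outer_src s = { { 1 }, 2 };
    struct outer_cil t = { { 1 }, 2 };
    struct inner_src lone = { 5 };   /* the nested tag is visible here */
    return s.in.a + s.b + t.in.a + t.b + lone.a;
}
```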

Prototypes are added for those functions that are called before
being defined. Furthermore, if a prototype exists but does not
specify the parameter types, that omission is fixed.

Initializers are normalized to include specific initialization for
the missing elements.
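A sketch of initializer normalization: the second pair of declarations spells out the zero values that C already guarantees for the elements and fields the first pair omits. The struct config type is hypothetical.

```c
struct config { int level; int flags; const char *name; };

/* Source form: trailing array elements and struct fields omitted. */
static int arr_src[4] = { 1, 2 };
static struct config cfg_src = { 3 };

/* Normalized form: every element and field initialized explicitly. */
static int arr_cil[4] = { 1, 2, 0, 0 };
static struct config cfg_cil = { 3, 0, 0 };

/* Returns 1 when both forms produce identical values. */
int initializers_match(void) {
    for (int i = 0; i < 4; i++)
        if (arr_src[i] != arr_cil[i])
            return 0;
    return cfg_src.level == cfg_cil.level &&
           cfg_src.flags == cfg_cil.flags &&
           cfg_src.name == cfg_cil.name;
}
```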

CIL will remove from the source file those type declarations,
local variables and inline functions that are not used in the file.
This means that your analysis does not have to see all the ugly
stuff that comes from the header files.

Local variables in inner scopes are pulled to function scope
(with appropriate renaming). Local scopes thus disappear. This
makes it easy to find and operate on all local variables in a
function.
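A before/after sketch of scope flattening. The inner x moves to function scope under a new name; the suffix ___0 imitates the style of CIL's renaming but is only illustrative here.

```c
/* Source form: an inner block declares its own x, shadowing the outer x. */
int scoped(int n) {
    int x = n;
    {
        int x = n * 2;
        n += x;
    }
    return x + n;
}

/* Flattened form: all locals live at function scope, renamed apart. */
int flattened(int n) {
    int x;
    int x___0;
    x = n;
    x___0 = n * 2;
    n += x___0;
    return x + n;
}
```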
A WHOLE-PROGRAM MERGER

A tool that merges all of a program’s compilation
units into a single compilation unit, with proper
renaming to preserve semantics. This matters because
many analyses are most effective when applied
to the whole program.
Q4: What are the difficulties in designing the whole-program
merger, and how is it implemented?

File-scope identifiers must be renamed properly
to avoid clashes with globals and with similar
identifiers in different files.
Solution:


Structural Equivalence vs. Name Equivalence
For each file there are two merging phases. In
the first phase we merge the types and tags. Then,
in the second phase, we rewrite the variable
declarations and function bodies.
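A hand-written sketch of what a merged unit could look like, assuming two hypothetical files a.c and b.c that each define a file-scope static int counter(void) and each declare the same struct node. The names and the ___0 suffix are illustrative, not CIL output.

```c
/* Phase 1 (types and tags): the two struct node declarations have the
   same name and the same structure, so a single definition survives. */
struct node { int value; struct node *next; };

/* Phase 2 (variables and bodies): both static functions are kept;
   the one from b.c is renamed to avoid the clash. */
static int counter(void) { return 1; }       /* from a.c */
static int counter___0(void) { return 2; }   /* from b.c, renamed */

/* Formerly in a.c: still calls its own counter. */
int a_entry(struct node *n) { return n->value + counter(); }

/* Formerly in b.c: its call now targets the renamed copy. */
int b_entry(struct node *n) { return n->value + counter___0(); }
```

Each former file keeps calling its own static function, which is exactly the semantics the renaming must preserve.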
REPRESENTATIVE APP
Q3: How do they achieve the goal of making code
immune to stack-smashing attack?
CIL modifies the program to maintain a separate stack for
return addresses. Even if a buffer overrun attack occurs, the
correct return address is taken from this special
stack.
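A simplified shadow-stack sketch in plain C, assuming GCC or Clang for __builtin_return_address. Unlike the transformation described above, which restores the saved address, this variant only compares the live return address with the saved copy and aborts on a mismatch, because portable C cannot rewrite a return slot. SHADOW_ENTER, SHADOW_LEAVE, shadow_depth, and the fixed depth are hypothetical; a CIL-based tool would insert the equivalent code at every function entry and return.

```c
#include <stdio.h>
#include <stdlib.h>

#define SHADOW_MAX 1024            /* hypothetical fixed depth */
static void *shadow_stack[SHADOW_MAX];
static int shadow_top = 0;

/* On entry: save the caller's return address on the shadow stack. */
#define SHADOW_ENTER()                                            \
    do {                                                          \
        if (shadow_top >= SHADOW_MAX) abort();                    \
        shadow_stack[shadow_top++] = __builtin_return_address(0); \
    } while (0)

/* Before returning: pop the saved copy and compare it with the live one. */
#define SHADOW_LEAVE()                                            \
    do {                                                          \
        void *saved = shadow_stack[--shadow_top];                 \
        if (saved != __builtin_return_address(0)) {               \
            fputs("return address corrupted\n", stderr);          \
            abort();                                              \
        }                                                         \
    } while (0)

int shadow_depth(void) { return shadow_top; }

/* Copies at most n - 1 characters (n >= 1) and always terminates dst. */
int protected_copy(char *dst, const char *src, int n) {
    SHADOW_ENTER();
    int i = 0;
    while (i < n - 1 && src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    SHADOW_LEAVE();
    return i;
}
```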
5th EVALUATION


CIL has been tested very extensively. It is able to process
the SPECINT95 benchmarks, the Linux kernel, GIMP and
other open-source projects.
CIL was tested against GCC’s c-torture test suite and (except
for the tests involving complex numbers and inner functions,
which CIL does not currently implement) passes most of
the tests. Specifically, CIL fails 23 of the 904 c-torture
tests that it should pass; GCC itself fails 19 tests.
Thank you!
More information at http://hal.cs.berkeley.edu/cil/