Functions and stack frames [1em] C

advertisement
Functions and stack frames
LLVM
C −→ x86
Hayo Thielecke
September 21, 2015
Contents
Big idea 1: stack for function calls
Big idea 2: names compiled into indices
Target architecture
Compiling functions
Adding pointers
Optimizations: inlining and tail calls
Compiling structure and array access
Objects in C++
Aims and overview
I
We will see some typical C code compiled to x86 assembly by
LLVM
I
Emphasise general principles used in almost all compilers
I
Use LLVM on C and x86 for example and concreteness
I
What LLVM does, not details of how it does it internally
I
Enough to compile some C code by hand line by line
I
C language features 7→ soup of mainly mov instructions
I
Various language features on top of vanilla functions
I
Optimizations
Two big ideas in compiling functions
stack ↔ recursion
compare: parsing stack
many abstract and not so abstract machines use stacks
including JVM
In C: one stack frame per function call
Names → indices
Names can be compiled into indices, discovered many times
deBruijn indices: lambda calculus without variables
cartesian closed categories, CAM machine for CAML
In C: variables become small integers to be added to the base
pointer
Using LLVM
Please do experiments yourself for seeing how LLVM compiles C.
If you do not have LLVM on your computer:
ssh into a lab machine and type
module load llvm/3.3
To compile, type
clang -S test.c
Then the assembly code will be in test.s
Function frodo will be labelled frodo: in test.s
For optimization, use
clang -S -O3 test.c
Stack frame details
The details differ between architectures (e.g., x86, ARM, SPARC)
Ingredients of stack frames, in various order, some may be missing:
return address
parameters
local vars
saved frame pointer
caller or callee saved registers
static link (in Pascal and Algol, but not in C)
this pointer for member functions (in C++)
Naive calling convention: push args on stack
Push parameters
Then call function; this pushes the return address
This works.
It makes it very easy to have variable number of arguments, like
printf in C.
But: stack is slow; registers are fast.
Compromise: use registers when possible, “spill” into stack
otherwise
Optimzation (-O flags) often lead to better register usage
Call stack: used by C at run time for function calls
Convention: we draw the stack growing downwards on the page.
Suppose function g calls function f.
...
parameters for g
frame for g
return address for g
automatic variables in g
parameters for f
frame for f
return address for f
automatic variables in f
There may be more in the frame, e.g. saved registers
Call stack: one frame per function call
Recursion example: fac(n) calls fac(n - 1)
...
2
argument
frame for fac(2)
return address
1
argument
frame for fac(1)
return address
0
frame for fac(0)
return address
argument
Clang function idiom
http://llvm.org/docs/LangRef.html#calling-conventions
f:
pushq %rbp
movq %rsp, %rbp
... body of function f
popq %rbp
ret
parameters are passed in registers rdi, rsi
return value is passed in register rax
Parameters compiled into indices
Compare Latex: parameters are numbered rather than named.
\newcommand\demand[2]{Dear #1, please pay \pounds #2.}
\demand{Bob}{7}
Dear Bob, please pay £7.
DeBruijn lambda calculus
compare array access
Typical C code to compile
long f(long x, long y)
{
long a, b;
a = x + 42;
b = y + 23;
return a * b;
}
Parameters/arguments:
x and y
Local/automatic variables
a and b
More precisely, x and y are formal parameters.
In a call f(1,2), 1 and 2 are the actual parameters.
Computing the index
Simple in principle:
walk over the syntax tree and keep track of declarations
The declarations tell us the size: long x means x needs 8 bytes
That is why C has type declarations in the first place
long f(int x, int y) // put y at -8 and x at -16
{
int a;
// put a at -24
int b;
// put b at -32
a = x;
// now we know where a and x are
// relative to rbp
}
Exercise: what happens if we also have char and float declarations?
Stacks and architectures
The view that C has of the hardware is a kind of abstract stack
machine
Burroughs mainframes were stack machines
https://en.wikipedia.org/wiki/Burroughs_large_systems
modern architectures have many registers, particularly RISC
SPARC had register windows
Here: stack first (easy to understand), then registers for
optimization
Other approach: assume there are unlimited registers, then “spill ”
them into the stack as needed
Target architecture
We will only need a tiny subset of assembly.
Quite readable.
Instruction we will need:
mov push pop call ret jmp add mul test be
The call instruction pushes the current instruction pointer onto the
stack as the return address
ret pops the return address from the stack and makes it the new
instruction pointer
A nice target architecture should have lots of general-purpose
registers with indexed addressing.
Like RISC, but x86 is getting there in the 64-bit architecture
x86 in AT&T syntax
mov syntax is target last
mov x y is like y = x ;
Assembly generated by clang version 3.3
r prefix on registers means 64 bit register
movq etc: q suffix means quadword = 64 bits
% register
$ constant
indexed addressing -24(%rbp)
Leaf functions
leaf functions
If you draw function calls, these functions are the leaves (= tree
nodes without any children)
There is less work to do with the frame in a leaf function.
The stack pointer need not be adjusted.
Spilling registers
Registers are limited, but the stack can always grow
each function call gets a new frame
If a function calls another one, some registers may have to be
saved (spilled) into the stack frame.
LLVM may insert comments “spill” and “reload”
Similarly for complicated expressions that require more
intermediate values then we have registers.
Conditionals
if-then-else is broken down into
test
testq x x tests whether x is zero
branch
be L branch to label L if previous test was equal to zero
Frame and stack pointers
return address
The frame pointer %fp points at the current stack frame
Also called base pointer on x86 or %rbp
The stack pointer %rsp points at the top of the stack (which we
draw at the bottom of the page)
Some compilers use both a frame pointer and a stack pointer.
Sometimes only one pointer is used.
Stack frame in naive calling convention
push parameters in reverse order, then jump
On function entry, the stack looks like this:
parameter n
..
.
parameter 1
return addr
stack pointer
Stack frame in clang C calling convention on x86
The stack grows down on the page
lower addresses are lower on the page
return addr
old bp
frame/base pointer
parameter n
..
.
parameter 1
local var
..
.
local var
stack pointer, if used
Clang stack frame example
The stack grows down on the page
lower addresses are lower on the page
parameters are passed in registers, may be saved into frame if
needed
return addr
old bp
base pointer
y
bp - 8
x
bp - 16
a
bp - 24
b
bp - 32
Compiled with clang -S
long f(long x, long y)
{
long a, b;
a = x + 42;
b = y + 23;
return a * b;
}
y7→ rbp-8
x7→ rbp-16
a7→ rbp-24
b7→ rbp-32
f:
pushq %rbp
movq %rsp, %rbp
movq %rdi, -8(%rbp)
movq %rsi, -16(%rbp)
movq -8(%rbp), %rsi
addq $42, %rsi
movq %rsi, -24(%rbp)
movq -16(%rbp), %rsi
addq $23, %rsi
movq %rsi, -32(%rbp)
movq -24(%rbp), %rsi
imulq -32(%rbp), %rsi
movq %rsi, %rax
popq %rbp
ret
Compiled with clang -S -O3
long f(long x, long y)
{
long a, b;
a = x + 42;
b = y + 23;
return a * b;
}
f:
addq $42, %rdi
leaq 23(%rsi), %rax
imulq %rdi, %rax
ret
Many arguments
Some passed on the stack, not in registers. These have positive
indices. Why?
long a(long x1, long x2,
long x3, long x4, long x5,
long x6, long x7, long x8)
{
return x1 + x7 + x8;
}
a:
addq
8(%rsp), %rdi
addq
16(%rsp), %rdi
movq %rdi, %rax
ret
Calling another function
long f(long x)
{
return g(x + 2) - 7;
}
f:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movq %rdi, -8(%rbp)
movq -8(%rbp), %rdi
addq $2, %rdi
callq g
subq $7, %rax
addq $16, %rsp
popq %rbp
ret
Calling another function -O3
long f(long x)
{
return g(x + 2) - 7;
}
Call by reference in C = call by value + pointer
void f(int *p)
{
*p = *p + 2; // draw stack after this statement
}
void g()
{
int x = 10;
f(&x);
}
...
x
12
...
...
p
•
...
Escaping variables
The compiler normally tries to place variable in registers.
The frame slot may be ignored.
However, if we apply & to a variable, its value must be kept in the
frame slot.
For comparison: call by value modifies only local copy
void f(int y)
{
y = y + 2; // draw stack after this statement
}
void g()
{
int x = 10;
f(x);
}
...
x
10
...
...
y
12
...
Call with pointer: calling function
void f(long x, long *p)
{
*p = x;
}
long g()
{
long a = 42;
f(a + 1, &a);
return a;
}
g:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
leaq -8(%rbp), %rsi
movq $42, -8(%rbp)
movq -8(%rbp), %rax
addq $1, %rax
movq %rax, %rdi
callq f
movq -8(%rbp), %rax
addq $16, %rsp
popq %rbp
ret
Call with pointer: called function
void f(long x, long *p)
{
*p = x;
}
long g()
{
long a = 42;
f(a + 1, &a);
return a;
}
f:
pushq %rbp
movq %rsp, %rbp
movq %rdi, -8(%rbp)
movq %rsi, -16(%rbp)
movq -8(%rbp), %rsi
movq -16(%rbp), %rdi
movq %rsi, (%rdi)
popq %rbp
ret
Call with pointer: optimized with -O3
void f(long x, long *p)
{
*p = x;
}
f:
movq %rdi, (%rsi)
ret
Function pointer as function parameter
void g(void (*h)(int))
{
int x = 10;
h(x + 2);
}
void f(int y) { ... }
... g(f) ...
...
•
h
10
x
12
y
frame for g
frame for f
code for f
Function pointer as a parameter
long f(long (*g)(long))
{
return g(42);
}
f:
pushq %rbp
movq %rsp, %rbp
movabsq $42, %rax
movq %rdi, -8(%rbp)
movq %rax, %rdi
callq *-8(%rbp)
addq $16, %rsp
popq %rbp
ret
Function inlining
Function inlining = do a function call at compile time
Saves the cost of a function call at runtime
Copy body of called function + fill in arguments
Be careful about side effects in the arguments
Function inlining example
long sq(long x)
{
return x * x;
}
long f(long y)
{
long z = sq(++y);
return z;
}
Note: cannot just do ++y * ++y
f:
incq %rdi
imulq %rdi, %rdi
movq %rdi, %rax
ret
Tail call optimization
Tail position = the last thing that happens inside a function body
before the return f is in tail position
return f(E);
f is not in tail position
return 5 + f(E);
or
return f(E) - 6;
A function call in tail position can be compiled as a jump.
Function as a parameter, non-tailcall
long f(long (*g)(long))
{
return g(42) - 6;
}
f:
pushq %rax
movq %rdi, %rax
movl $42, %edi
callq *%rax
addq $-6, %rax
popq %rdx
ret
Function as a parameter, tailcall optimized
long f(long (*g)(long))
{
return g(42);
}
f:
movq %rdi, %rax
movl $42, %edi
jmpq
*%rax # TAILCALL
Factorial optimized
The recursive function call is optimzed away and replaced by
branches
common subexpression elimination of 1
eax is half of rax
long factorial(long n)
{
if(n == 0)
return 1;
else
return n *
factorial(n - 1);
}
factorial:
movl $1, %eax
testq %rdi, %rdi
je .LBB0_2
.LBB0_1:
imulq %rdi, %rax
decq %rdi
jne .LBB0_1
.LBB0_2:
ret
CPS and SSA
malloc and sharing
p = malloc ( N ) ;
q = p;
Stack
Heap
...
p
•
q
•
...
N bytes
malloc two different objects
p = malloc ( N ) ;
q = malloc ( N ) ;
Stack
Heap
...
p
•
q
•
...
N bytes
N bytes
Use after free: memory may be reused and overwritten
int *p = malloc(sizeof(int));
*p = 100;
free(p);
char *q = malloc(4);
strcpy(q, "abc");
printf("%s\n", q);
printf("%d\n", *p);
Stack
Heap
...
p
•
q
•
...
abc\0
Compiling structures and objects
Same idea as in stack frames:
access in memory via pointer + index
Structure definition tells the compiler the size and indices of the
member
No code is produced for a struct.
But the compiler’s symbol table is extended: it knows about the
member names and their types
Structure access then uses indexed addressing using those indices
Structure access
struct S {
int x;
int y;
};
void s(struct S *p)
{
p->x = 23;
p->y = 45;
}
s:
pushq %rbp
movq %rsp, %rbp
movq %rdi, -8(%rbp)
movq -8(%rbp), %rdi
movq $23, (%rdi)
movq -8(%rbp), %rdi
movq $45, 8(%rdi)
popq %rbp
ret
Array access
void f(long a[])
{
long b[5];
long j = 2;
b[j] = a[1];
}
f:
pushq %rbp
movq %rsp, %rbp
movq %rdi, -8(%rbp)
movq $2, -56(%rbp)
movq -8(%rbp), %rdi
movq 8(%rdi), %rdi
movq -56(%rbp), %rax
movq
%rdi, -48(%rbp,%rax,8)
popq %rbp
ret
Exercise: where do the indexes in the code come from? Draw the
stack and explain.
Calling virtual function
Code for g:
class C {
public:
virtual long f();
};
void g(C *p)
{
long a = p->f();
}
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movq
%rdi, -8(%rbp)
movq
-8(%rbp), %rdi
movq
(%rdi), %rax
callq *(%rax)
movq
%rax, -16(%rbp)
addq $16, %rsp
popq %rbp
ret
Member function access to object
Code for f:
class C {
long n;
public:
long f()
{
return n;
}
};
pushq %rbp
movq %rsp, %rbp
movq
%rdi, -8(%rbp)
movq
-8(%rbp), %rdi
movq
(%rdi), %rax
popq %rbp
ret
Inlining and C++
optimizing virtual function calls
template parameters and inlining
Download