Functions and stack frames
LLVM C → x86

Hayo Thielecke
September 21, 2015

Contents
- Big idea 1: stack for function calls
- Big idea 2: names compiled into indices
- Target architecture
- Compiling functions
- Adding pointers
- Optimizations: inlining and tail calls
- Compiling structure and array access
- Objects in C++

Aims and overview
- We will see some typical C code compiled to x86 assembly by LLVM
- Emphasise general principles used in almost all compilers
- Use LLVM on C and x86 for example and concreteness
- What LLVM does, not details of how it does it internally
- Enough to compile some C code by hand, line by line
- C language features ↦ soup of mainly mov instructions
- Various language features on top of vanilla functions
- Optimizations

Two big ideas in compiling functions
stack ↔ recursion
- compare: parsing stack
- many abstract and not so abstract machines use stacks, including the JVM
- In C: one stack frame per function call
Names → indices
- Names can be compiled into indices, discovered many times
- deBruijn indices: lambda calculus without variables
- cartesian closed categories, CAM machine for CAML
- In C: variables become small integers to be added to the base pointer

Using LLVM
Please do experiments yourself to see how LLVM compiles C.
If you do not have LLVM on your computer: ssh into a lab machine and type
    module load llvm/3.3
To compile, type
    clang -S test.c
Then the assembly code will be in test.s
Function frodo will be labelled frodo: in test.s
For optimization, use
    clang -S -O3 test.c

Stack frame details
The details differ between architectures (e.g., x86, ARM, SPARC)
Ingredients of stack frames, in various order, some may be missing:
- return address
- parameters
- local vars
- saved frame pointer
- caller or callee saved registers
- static link (in Pascal and Algol, but not in C)
- this pointer for member functions (in C++)

Naive calling convention: push args on stack
- Push parameters
- Then call function; this pushes the return address
- This works.
- It makes it very easy to have a variable number of arguments, like printf in C.
- But: stack is slow; registers are fast.
- Compromise: use registers when possible, "spill" into stack otherwise
- Optimization (-O flags) often leads to better register usage

Call stack: used by C at run time for function calls
Convention: we draw the stack growing downwards on the page.
Suppose function g calls function f.

    ...
    parameters for g            frame for g
    return address for g
    automatic variables in g
    parameters for f            frame for f
    return address for f
    automatic variables in f

There may be more in the frame, e.g. saved registers

Call stack: one frame per function call
Recursion example: fac(n) calls fac(n - 1)

    ...
    2    argument               frame for fac(2)
         return address
    1    argument               frame for fac(1)
         return address
    0    argument               frame for fac(0)
         return address

Clang function idiom
http://llvm.org/docs/LangRef.html#calling-conventions

    f:
        pushq %rbp
        movq %rsp, %rbp
        ... body of function f
        popq %rbp
        ret

parameters are passed in registers rdi, rsi
return value is passed in register rax

Parameters compiled into indices
Compare LaTeX: parameters are numbered rather than named.

    \newcommand\demand[2]{Dear #1, please pay \pounds #2.}
    \demand{Bob}{7}

    Dear Bob, please pay £7.

DeBruijn lambda calculus
compare array access

Typical C code to compile

    long f(long x, long y)
    {
        long a, b;
        a = x + 42;
        b = y + 23;
        return a * b;
    }

Parameters/arguments: x and y
Local/automatic variables: a and b
More precisely, x and y are formal parameters. In a call f(1,2), 1 and 2 are the actual parameters.

Computing the index
Simple in principle: walk over the syntax tree and keep track of declarations
The declarations tell us the size: long x means x needs 8 bytes
That is why C has type declarations in the first place

    long f(long x, long y) // put x at -8 and y at -16
    {
        long a;            // put a at -24
        long b;            // put b at -32
        a = x;             // now we know where a and x are
                           // relative to rbp
    }

Exercise: what happens if we also have char and float declarations?

Stacks and architectures
- The view that C has of the hardware is a kind of abstract stack machine
- Burroughs mainframes were stack machines
  https://en.wikipedia.org/wiki/Burroughs_large_systems
- modern architectures have many registers, particularly RISC
- SPARC had register windows
- Here: stack first (easy to understand), then registers for optimization
- Other approach: assume there are unlimited registers, then "spill" them into the stack as needed

Target architecture
We will only need a tiny subset of assembly. Quite readable.
Instructions we will need:
    mov push pop call ret jmp add mul test je
- The call instruction pushes the current instruction pointer onto the stack as the return address
- ret pops the return address from the stack and makes it the new instruction pointer
- A nice target architecture should have lots of general-purpose registers with indexed addressing.
- Like RISC, but x86 is getting there in the 64-bit architecture

x86 in AT&T syntax
- mov syntax is target last: mov x, y is like y = x;
- Assembly generated by clang version 3.3
- r prefix on registers means 64 bit register
- movq etc: q suffix means quadword = 64 bits
- % register
- $ constant
- indexed addressing: -24(%rbp)

Leaf functions
- If you draw function calls as a tree, leaf functions are the leaves (= tree nodes without any children)
- There is less work to do with the frame in a leaf function.
- The stack pointer need not be adjusted.

Spilling registers
- Registers are limited, but the stack can always grow
- each function call gets a new frame
- If a function calls another one, some registers may have to be saved (spilled) into the stack frame.
- LLVM may insert comments "spill" and "reload"
- Similarly for complicated expressions that require more intermediate values than we have registers.

Conditionals
if-then-else is broken down into
- test: testq x, x tests whether x is zero
- branch: je L branches to label L if the previous test was equal to zero

Frame and stack pointers
- The frame pointer %fp points at the current stack frame
- Also called base pointer on x86, or %rbp
- The stack pointer %rsp points at the top of the stack (which we draw at the bottom of the page)
- Some compilers use both a frame pointer and a stack pointer. Sometimes only one pointer is used.

Stack frame in naive calling convention
push parameters in reverse order, then jump
On function entry, the stack looks like this:

    parameter n
    ...
    parameter 1
    return addr       <- stack pointer

Stack frame in clang C calling convention on x86
The stack grows down on the page
lower addresses are lower on the page

    return addr
    old bp            <- frame/base pointer
    parameter n
    ...
    parameter 1
    local var
    ...
    local var         <- stack pointer, if used

Clang stack frame example
The stack grows down on the page
lower addresses are lower on the page
parameters are passed in registers, may be saved into frame if needed

    return addr
    old bp            <- base pointer
    x                 bp - 8
    y                 bp - 16
    a                 bp - 24
    b                 bp - 32

Compiled with clang -S

    long f(long x, long y)
    {
        long a, b;
        a = x + 42;
        b = y + 23;
        return a * b;
    }

    x ↦ rbp-8
    y ↦ rbp-16
    a ↦ rbp-24
    b ↦ rbp-32

    f:
        pushq %rbp
        movq %rsp, %rbp
        movq %rdi, -8(%rbp)
        movq %rsi, -16(%rbp)
        movq -8(%rbp), %rsi
        addq $42, %rsi
        movq %rsi, -24(%rbp)
        movq -16(%rbp), %rsi
        addq $23, %rsi
        movq %rsi, -32(%rbp)
        movq -24(%rbp), %rsi
        imulq -32(%rbp), %rsi
        movq %rsi, %rax
        popq %rbp
        ret

Compiled with clang -S -O3

    long f(long x, long y)
    {
        long a, b;
        a = x + 42;
        b = y + 23;
        return a * b;
    }

    f:
        addq $42, %rdi
        leaq 23(%rsi), %rax
        imulq %rdi, %rax
        ret

Many arguments
Some are passed on the stack, not in registers.
These have positive indices. Why?

    long a(long x1, long x2, long x3, long x4,
           long x5, long x6, long x7, long x8)
    {
        return x1 + x7 + x8;
    }

    a:
        addq 8(%rsp), %rdi
        addq 16(%rsp), %rdi
        movq %rdi, %rax
        ret

Calling another function

    long f(long x)
    {
        return g(x + 2) - 7;
    }

    f:
        pushq %rbp
        movq %rsp, %rbp
        subq $16, %rsp
        movq %rdi, -8(%rbp)
        movq -8(%rbp), %rdi
        addq $2, %rdi
        callq g
        subq $7, %rax
        addq $16, %rsp
        popq %rbp
        ret

Calling another function -O3

    long f(long x)
    {
        return g(x + 2) - 7;
    }

Call by reference in C = call by value + pointer

    void f(int *p)
    {
        *p = *p + 2; // draw stack after this statement
    }

    void g()
    {
        int x = 10;
        f(&x);
    }

    ...
    x    12
    ...
    ...
    p    •    (points to x)
    ...

Escaping variables
- The compiler normally tries to place variables in registers.
- The frame slot may be ignored.
- However, if we apply & to a variable, its value must be kept in the frame slot.

For comparison: call by value modifies only local copy

    void f(int y)
    {
        y = y + 2; // draw stack after this statement
    }

    void g()
    {
        int x = 10;
        f(x);
    }

    ...
    x    10
    ...
    ...
    y    12
    ...

Call with pointer: calling function

    void f(long x, long *p)
    {
        *p = x;
    }

    long g()
    {
        long a = 42;
        f(a + 1, &a);
        return a;
    }

    g:
        pushq %rbp
        movq %rsp, %rbp
        subq $16, %rsp
        leaq -8(%rbp), %rsi
        movq $42, -8(%rbp)
        movq -8(%rbp), %rax
        addq $1, %rax
        movq %rax, %rdi
        callq f
        movq -8(%rbp), %rax
        addq $16, %rsp
        popq %rbp
        ret

Call with pointer: called function

    void f(long x, long *p)
    {
        *p = x;
    }

    long g()
    {
        long a = 42;
        f(a + 1, &a);
        return a;
    }

    f:
        pushq %rbp
        movq %rsp, %rbp
        movq %rdi, -8(%rbp)
        movq %rsi, -16(%rbp)
        movq -8(%rbp), %rsi
        movq -16(%rbp), %rdi
        movq %rsi, (%rdi)
        popq %rbp
        ret

Call with pointer: optimized with -O3

    void f(long x, long *p)
    {
        *p = x;
    }

    f:
        movq %rdi, (%rsi)
        ret

Function pointer as function parameter

    void g(void (*h)(int))
    {
        int x = 10;
        h(x + 2);
    }

    void f(int y) { ... }

    ... g(f) ...

    ...
    h    •    (points to code for f)    frame for g
    x    10
    y    12                             frame for f

Function pointer as a parameter

    long f(long (*g)(long))
    {
        return g(42);
    }

    f:
        pushq %rbp
        movq %rsp, %rbp
        subq $16, %rsp
        movabsq $42, %rax
        movq %rdi, -8(%rbp)
        movq %rax, %rdi
        callq *-8(%rbp)
        addq $16, %rsp
        popq %rbp
        ret

Function inlining
- Function inlining = do a function call at compile time
- Saves the cost of a function call at runtime
- Copy body of called function + fill in arguments
- Be careful about side effects in the arguments

Function inlining example

    long sq(long x)
    {
        return x * x;
    }

    long f(long y)
    {
        long z = sq(++y);
        return z;
    }

Note: cannot just do ++y * ++y

    f:
        incq %rdi
        imulq %rdi, %rdi
        movq %rdi, %rax
        ret

Tail call optimization
Tail position = the last thing that happens inside a function body before the return
f is in tail position:
    return f(E);
f is not in tail position:
    return 5 + f(E);
or
    return f(E) - 6;
A function call in tail position can be compiled as a jump.

Function as a parameter, non-tailcall

    long f(long (*g)(long))
    {
        return g(42) - 6;
    }

    f:
        pushq %rax
        movq %rdi, %rax
        movl $42, %edi
        callq *%rax
        addq $-6, %rax
        popq %rdx
        ret

Function as a parameter, tailcall optimized

    long f(long (*g)(long))
    {
        return g(42);
    }

    f:
        movq %rdi, %rax
        movl $42, %edi
        jmpq *%rax # TAILCALL

Factorial optimized
- The recursive function call is optimized away and replaced by branches
- common subexpression elimination of 1
- eax is the lower half of rax

    long factorial(long n)
    {
        if(n == 0) return 1;
        else return n * factorial(n - 1);
    }

    factorial:
        movl $1, %eax
        testq %rdi, %rdi
        je .LBB0_2
    .LBB0_1:
        imulq %rdi, %rax
        decq %rdi
        jne .LBB0_1
    .LBB0_2:
        ret

CPS and SSA

malloc and sharing

    p = malloc(N);
    q = p;

    Stack         Heap
    ...
    p    •        N bytes    (p and q point to the same block)
    q    •
    ...

malloc two different objects

    p = malloc(N);
    q = malloc(N);

    Stack         Heap
    ...
    p    •        N bytes
    q    •        N bytes
    ...

Use after free: memory may be reused and overwritten

    int *p = malloc(sizeof(int));
    *p = 100;
    free(p);
    char *q = malloc(4);
    strcpy(q, "abc");
    printf("%s\n", q);
    printf("%d\n", *p);

    Stack         Heap
    ...
    p    •
    q    •
    ...
                  abc\0    (heap block that both p and q may point to)

Compiling structures and objects
- Same idea as in stack frames: access in memory via pointer + index
- Structure definition tells the compiler the size and indices of the members
- No code is produced for a struct.
- But the compiler's symbol table is extended: it knows about the member names and their types
- Structure access then uses indexed addressing using those indices

Structure access

    struct S {
        long x;
        long y;
    };

    void s(struct S *p)
    {
        p->x = 23;
        p->y = 45;
    }

    s:
        pushq %rbp
        movq %rsp, %rbp
        movq %rdi, -8(%rbp)
        movq -8(%rbp), %rdi
        movq $23, (%rdi)
        movq -8(%rbp), %rdi
        movq $45, 8(%rdi)
        popq %rbp
        ret

Array access

    void f(long a[])
    {
        long b[5];
        long j = 2;
        b[j] = a[1];
    }

    f:
        pushq %rbp
        movq %rsp, %rbp
        movq %rdi, -8(%rbp)
        movq $2, -56(%rbp)
        movq -8(%rbp), %rdi
        movq 8(%rdi), %rdi
        movq -56(%rbp), %rax
        movq %rdi, -48(%rbp,%rax,8)
        popq %rbp
        ret

Exercise: where do the indices in the code come from? Draw the stack and explain.

Calling virtual function

    class C {
    public:
        virtual long f();
    };

    void g(C *p)
    {
        long a = p->f();
    }

Code for g:

        pushq %rbp
        movq %rsp, %rbp
        subq $16, %rsp
        movq %rdi, -8(%rbp)
        movq -8(%rbp), %rdi
        movq (%rdi), %rax
        callq *(%rax)
        movq %rax, -16(%rbp)
        addq $16, %rsp
        popq %rbp
        ret

Member function access to object

    class C {
        long n;
    public:
        long f()
        {
            return n;
        }
    };

Code for f:

        pushq %rbp
        movq %rsp, %rbp
        movq %rdi, -8(%rbp)
        movq -8(%rbp), %rdi
        movq (%rdi), %rax
        popq %rbp
        ret

Inlining and C++
- optimizing virtual function calls
- template parameters and inlining