Lecture 4: Advanced Instructions, Control, and Branching cont.
EEN 312: Processors: Hardware, Software, and Interfacing
Department of Electrical and Computer Engineering
Spring 2014, Dr. Rozier (UM)

COURSE PLAN FOR TODAY
1. Control Instructions
2. Basic Memory Access

CONTROL INSTRUCTIONS

Branching
• Branches allow us to transfer control of the program to a new address.
  – b(<suffix>) <label>
  – bl(<suffix>) <label>

    b  start
    bl start

Branch (b)
• Branch, possibly conditionally, to a new address.

    beq subroutine    @ If Z=1, branch

• Good practice to use bal instead of b.

Branch with link (bl)
• Branch, possibly conditionally, to a new address.
  – Before the branch is taken, store the return address (the address of the instruction after the bl) in the LR.
  – Allows easy return from the branch.

    bleq subroutine   @ If Z=1, branch, saving the return address in the LR

• How do we get back once we’ve saved the PC?

    mov pc, lr

• Moves the contents of the link register to the program counter.

Goto vs. Subroutine
• b – Allows us to transfer control of the program without returning.
• bl – Allows us to transfer control of the program, with returning.

Implementing If Statements
• C code:

    if (i == j)
        f = g + h;
    else
        f = g - h;

• ARM code (i in r0, j in r1, f in r2, g in r3, h in r4):

          cmp r0, r1        @ Set flags via r0-r1 and discard
          bne Else
          add r2, r3, r4    @ r2 = r3 + r4
          bal Exit
    Else: sub r2, r3, r4    @ r2 = r3 - r4
    Exit:

Implementing Loop Statements
• C code:

    while (i < j)
        i += 1;

• ARM code (i in r0, j in r1):

    Loop: cmp r0, r1        @ i < j?
          bge Exit          @ i >= j: leave the loop
          add r0, r0, #1    @ i < j: i = i + 1
          bal Loop
    Exit:

What about a Case Statement?
• Say we have a case statement:

    switch (x) {
        case 0: foo(); break;
        case 1: bar(); break;
        case 2: baz(); break;
        case 3: qux(); break;
    }

Jump Tables
• Set up a portion of memory as such:

    Memory Location    Contents
    0x???? + 0         address of foo
    0x???? + 4         address of bar
    0x???? + 8         address of baz
    0x???? + 12        address of qux

• If our case variable is stored in r0…
  – r0 << 2 (that is, r0 × 4) is the byte offset into our jump table of the function address.
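The table-of-addresses idea can be sketched in C as an array of function pointers. This is only an illustration, not the lecture's code: the handler bodies, the returned tags, and the names handler_fn, jumptable, and dispatch are assumptions made for the example. On a 32-bit ARM each table entry is 4 bytes wide, which is why the assembly version shifts the index left by 2; in C the compiler applies that scaling when the array is indexed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical case bodies: each returns a tag so the caller can see which ran. */
static int foo(void) { return 0; }
static int bar(void) { return 1; }
static int baz(void) { return 2; }
static int qux(void) { return 3; }

typedef int (*handler_fn)(void);

/* The jump table: entry i holds the address of the handler for case i. */
static const handler_fn jumptable[] = { foo, bar, baz, qux };

#define NUM_CASES (sizeof jumptable / sizeof jumptable[0])

/* Dispatch on x. Returns the handler's tag, or -1 when x has no case. */
int dispatch(unsigned x)
{
    if (x >= NUM_CASES)
        return -1;                    /* plays the role of a default: label */
    assert(jumptable[x] != NULL);     /* every slot in the table must be filled */
    return jumptable[x]();            /* load the address at table + x*4, jump to it */
}
```

Calling dispatch(2) runs baz, and dispatch(4) falls through to the bounds check. A bare table lookup has no such check, so an out-of-range index would load whatever word happens to follow the table.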
Case Statements as Jump Tables
(assume r0 holds the switch variable)

    ldr r1, =jumptable
    ldr pc, [r1, r0, lsl #2]

Wasn’t that easy?

Jump Table Exercise
• Convert the following C-style code into ARMv6 instructions:

    x = getPrime();
    switch (x) {
        case 1: firstPrime(); break;
        case 3: secondPrime(); break;
        case 5: thirdPrime(); break;
        case 7: fourthPrime(); break;
        default: notFirstFourPrimes();
    }

Pseudo-Instructions
• Notice the use of:
  – ldr r1, =jumptable
  – We saw this before in helloworld.s
• What is really going on here?

Code as we wrote it:

    ldr r1, =string
    swi 0
    mov r7, #1
    swi 0

Disassembled code:

    0x8080  ldr   r1, [pc, #8]
    0x8084  svc   0x0
    0x8088  mov   r7, #1
    0x808c  svc   0x0
    0x8090  muleq r1, r4, r0

This is weird…
• Let’s play with gdb…

    x/x 0x8090
    0x8090 <_exit+8>:   0x00010094
    x/s 0x10094
    0x10094 <string>:   "Hello World!\nA\025"

Looking back at the assembled code…
• While at instruction 0x8080 (ldr r1, [pc, #8]):

    p/x $pc          →  0x8080
    p/x $pc+8        →  0x8088   (the real value of pc: two instructions ahead, +4 +4)
    p/x ($pc+8)+8    →  0x8090   (the address that [pc, #8] loads from)

So why does it show up as muleq?
• Representing instructions

    Field    Cond  000000 A S   Rd    Rn    Rs    1001  Rm
    Instruc  0000  000000 0 0   ????  ????  ????  1001  ????
    Hex      0     0    0       1     0     0     9     4
    Bin      0000  0000 0000    0001  0000  0000  1001  0100

  – Condition Field
    • 0000 – EQ

    mul{<cond>}{S} rd, rm, rs
    mul r1, r4, r0

• The word stored at 0x8090 is the address 0x00010094; read as an instruction, its bits happen to match the mul pattern with condition EQ, so the disassembler prints muleq r1, r4, r0.

So why does it show up as muleq? (cont.)
• Representing instructions

    Field    Cond  000000 A S   Rd    Rn    Rs    1001  Rm
    Instruc  0000  000000 0 0   ????  ????  ????  1001  ????
    Hex      0     0    0       1     0     0     9     4
    Bin      0000  0000 0000    0001  0000  0000  1001  0100

    mul{<cond>}{S} rd, rm, rs
    mul 0001, 0100, 0000        (Rd = 0001, Rm = 0100, Rs = 0000; Rn = 0000)
    mul r1, r4, r0

So what is this?

    0x8080  ldr   r1, [pc, #8]
    0x8084  svc   0x0
    0x8088  mov   r7, #1
    0x808c  svc   0x0
    0x8090  muleq r1, r4, r0

The problem with immediates
• Every instruction, including all of its arguments, must fit in 32 bits, which limits an immediate to an 8-bit value (plus a small rotation field).
  – Range 0 – 255 before rotation.
  – Hello world was at 0x10094
  – PC was at 0x8088
  – Max offset with an 8-bit immediate value?
    • 0x8088 + 0xFF = 0x8187
  – The offset field of ldr is wider (12 bits, 0 – 4095), but 0x8088 + 0xFFF = 0x9087 still falls short of 0x10094.

Enter, the Literal Pool

    0x8080  ldr r1, [pc, #8]
    0x8084  svc 0x0
    0x8088  mov r7, #1
    0x808c  svc 0x0          <- last instruction in basic block
    0x8090  00 01 00 94      <- literal pool

Basic Blocks
• A basic block is a sequence of instructions with
  – No embedded branches (except at end)
  – No branch targets (except at beginning)
• A compiler identifies basic blocks for optimization.
• An advanced processor can accelerate execution of basic blocks.

So how do we use the Literal Pool?
• We’ve been using it all along. The assembler turns

    ldr r1, =string

  into

    ldr r1, [pc, #8]

What about procedures?
• Implement the following:

    void foo(int x) {
        x = x + 1;
    }

    int main() {
        int x = 1;
        foo(x);
    }

What about procedures?
• Implement the following:

    int foo(int x) {
        if (x < 5)
            x = foo(x + 1);
        return x;
    }

    int main() {
        int x = 1;
        x = foo(x);
    }

The problem with Procedures
• When we never branch back, who cares what state the registers are in?
• If we branch and return, we expect our registers to be in the state that we left them!
  – Procedures need to use registers too!

A Call Chain
[Figure: a call chain. Image by David Thomas]

Procedure Calling
1. Place parameters for procedure in registers
2. Transfer control to procedure
3. Procedure acquires storage
4. Procedure performs function
5. Procedure places return value in appropriate register
6. Return control

The Stack
• Region of memory managed with stack discipline.
• Accessed with push and pop

Procedure Call Example

    0x000  push {lr}
    0x004  bl   procedure_2
    0x008  pop  {pc}

• State of the stack memory and registers at each step (Step 1: before push {lr}; Step 2: after push {lr}; Step 3: after procedure_2 has returned; Step 4: after pop {pc}):

    Address   Step 1   Step 2   Step 3   Step 4
    0x100     -        0x080    0x080    0x080
    0x104     -        -        0x008    0x008
    0x108     -        -        0xa4f    0xa4f
    0x10c     -        -        0xfff    0xfff
    0x110     -        -        0xcc4    0xcc4
    0x114     -        -        0xbeef   0xbeef
    0x118     -        -        0xdead   0xdead

    Register  Step 1   Step 2   Step 3   Step 4
    lr        0x080    0x080    0x008    0x080
    sp        0x100    0x104    0x104    0x100
    pc        x        x        x        x

• Note: the slides draw sp moving toward higher addresses. On ARM, push actually decrements sp by 4 before storing (a full descending stack).

Stack Frames
• Stack frames “belong” to a procedure.
• Store local variables here (they go out of scope automatically).
• Can communicate with other procedures through a stack frame.

Communicating with Procedures
• Caller sets up the stack frame.
• Callee stores values before return.

    Address   Contents
    0x100     lr
    0x104     return 0
    0x108     return 1
    0x10c
    0x110
    0x114
    0x118

Procedures and Register Use
• What if we want to use some registers in the procedure?
• Caller could have data in them!

Stack Discipline and ABI
• Stack discipline is important.
• Define an Application Binary Interface.
  – How should procedures communicate? How should they be called?
• Consistency, following standards, is the key.

Conventions and Discipline
Caller
• Caller saves temporary values in its frame before the call.
• Caller restores values in its frame after the call.
Callee
• Callee saves temporary values in its frame before using.
• Callee restores values in its frame after using.

WRAP UP

For next time
• More on stacks, procedures, and calling.
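
As a recap of the procedure call example, the push {lr} / bl procedure_2 / pop {pc} sequence can be modelled with a toy machine in C. This is a sketch under stated assumptions, not the lecture's code: the machine struct, machine_init, push, pop, and run_procedure_1 are hypothetical names, the stack is an array of eight words indexed by sp rather than real memory, and procedure_2 is assumed to be a leaf that returns at once with mov pc, lr. The model uses ARM's full descending convention, so sp moves down on a push, which differs from the direction drawn in the slides.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 8

/* Toy machine state; field names mirror the ARM registers in the slides. */
typedef struct {
    uint32_t mem[STACK_WORDS]; /* stack storage, one 32-bit word per slot */
    uint32_t sp;               /* index of the last pushed word */
    uint32_t lr;
    uint32_t pc;
} machine;

void machine_init(machine *m, uint32_t lr)
{
    for (int i = 0; i < STACK_WORDS; i++)
        m->mem[i] = 0;
    m->sp = STACK_WORDS;       /* empty stack: sp sits just past the top slot */
    m->lr = lr;
    m->pc = 0;
}

void push(machine *m, uint32_t value)
{
    assert(m->sp > 0);         /* refuse to overflow the toy stack */
    m->sp -= 1;                /* descending: move sp down, then store */
    m->mem[m->sp] = value;
}

uint32_t pop(machine *m)
{
    assert(m->sp < STACK_WORDS);  /* refuse to pop an empty stack */
    uint32_t value = m->mem[m->sp];
    m->sp += 1;                   /* load, then move sp back up */
    return value;
}

/* Replays: 0x000 push {lr}; 0x004 bl procedure_2; 0x008 pop {pc}. */
uint32_t run_procedure_1(machine *m)
{
    push(m, m->lr);            /* 0x000: save our own return address */
    m->lr = 0x008;             /* 0x004: bl overwrites lr with the address after the bl */
    m->pc = m->lr;             /* assumed leaf procedure_2 returns with mov pc, lr */
    m->pc = pop(m);            /* 0x008: pop {pc} returns to procedure_1's caller */
    return m->pc;
}
```

Starting with lr = 0x080, run_procedure_1 ends with pc = 0x080 even though the bl overwrote lr with 0x008 along the way. Without the push, the original return address would have been lost.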