MIPS ISA-II: Procedure Calls & Program Assembly

Module Outline

Review ISA and understand instruction encodings

• Arithmetic and Logical Instructions

• Review memory organization

• Memory (data movement) instructions

• Control flow instructions

• Procedure/Function calls

• Program assembly, linking, & encoding

(2)

Reading

• Reading 2.8, 2.12

• Appendix A: A.1 - A.6

• Practice Problems: 10, 14, 23

• Goals

 Understand the binary encoding of complete program executables

o How can procedures be independently compiled and linked (e.g., libraries)?

o What makes up an executable?

o How do libraries become part of the executable?

o What is the role of the ISA in encoding programs?

o What constitutes the hardware/software interface?

(3)

Procedure Calls

• Basic functionality

 Transfer of parameters & control to procedure

 Transfer of results & control back to the calling program

 Support for nested procedures

• What is so hard about this?

 Consider independently compiled code modules

o Where are the inputs?

o Where should I place the outputs?

o Recall: What do you need to know when you write procedures in C?

(4)

Specifics

• Where do we pass data?

 Preferably registers → make the common case fast

 Memory as an overflow area

• Nested procedures

 The stack, $fp, $sp and $ra

 Saving and restoring machine state

• Set of rules that developers/compilers abide by

 Which registers am I permitted to use with no consequence?

 Caller and callee save conventions for MIPS

(5)

Basic Parameter Passing

• Register usage

• What about nested calls?

• What about excess arguments?

        .data
arg1:   .word 22, 20, 16, 4
arg2:   .word 33, 34, 45, 8
        .text
        addi $t0, $0, 4
        move $t3, $0
        move $t1, $0
        move $t2, $0
loop:   beq  $t0, $0, exit
        addi $t0, $t0, -1
        lw   $a0, arg1($t1)
        lw   $a1, arg2($t2)
        jal  func
        add  $t3, $t3, $v0
        addi $t1, $t1, 4
        addi $t2, $t2, 4
        j    loop
func:   sub  $v0, $a0, $a1
        jr   $ra
exit:   ---

[Figure: jal copies PC + 4 into $31 ($ra); jr $ra copies $31 back into the PC]

(6)

Leaf Procedure Example

• C code:

int leaf_example (int g, int h, int i, int j)
{
    int f;
    f = (g + h) - (i + j);
    return f;
}

 Arguments g, …, j are passed in $a0, …, $a3

 f in $s0 (we need to save $s0 on stack – we will see why later)

 Results are returned in $v0, $v1

[Figure: the argument registers $a0-$a3 feed the procedure; the result registers $v0-$v1 carry values back]

(7)

Procedure Call Instructions

• Procedure call: jump and link

jal ProcedureLabel

 Address of following instruction put in $ra

 Jumps to target address

• Procedure return: jump register

jr $ra

 Copies $ra to program counter

 Can also be used for computed jumps

o e.g., for case/switch statements

Example:
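The two instructions can be modelled in a few lines of Python. This is only a sketch of the register-transfer behaviour described above; the addresses and the `state` dictionary are made up for illustration and are not taken from the slides.

```python
# Toy model of jal / jr $ra. Instructions are 4 bytes; addresses are hypothetical.

def jal(state, target):
    """jal: $ra <- address of the following instruction, then PC <- target."""
    state["ra"] = state["pc"] + 4
    state["pc"] = target

def jr_ra(state):
    """jr $ra: PC <- $ra."""
    state["pc"] = state["ra"]

state = {"pc": 0x00400020, "ra": 0}   # assume the jal sits at 0x00400020
jal(state, 0x00400034)                # call a procedure assumed to start at 0x00400034
called_at = state["pc"]
jr_ra(state)                          # procedure returns
returned_to = state["pc"]
print(hex(called_at), hex(returned_to))
```

After the return, the PC holds the address of the instruction that follows the jal, which is why a nested call must save $ra before executing its own jal.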

(8)

Leaf Procedure Example

• MIPS code:

leaf_example:
        addi $sp, $sp, -4      # save $s0 on stack
        sw   $s0, 0($sp)
        add  $t0, $a0, $a1     # procedure body
        add  $t1, $a2, $a3
        sub  $s0, $t0, $t1
        add  $v0, $s0, $zero   # result
        lw   $s0, 0($sp)       # restore $s0
        addi $sp, $sp, 4
        jr   $ra               # return

(9)

Procedure Call Mechanics

System Wide Memory Map (High Address at top, Low Address at bottom):

stack ($sp)
dynamic data
static data ($gp)
text (PC)
reserved

[Figure: the Old Stack Frame and the New Stack Frame sit on the stack, each bounded by $fp and $sp; the new frame holds arg registers, return address, saved registers, and local variables. Side labels: compiler, ISA, HW; compiler addressing]

(10)

Example of the Stack Frame

[Figure: stack frame from $fp down to $sp, holding arg 1, arg 2, ..; callee saved registers ($s0-$s7, $fp, $ra); caller saved registers ($a0-$a3, $t0-$t9); local variables]

Call Sequence

1. place excess arguments

2. save caller save registers ($a0-$a3, $t0-$t9)

3. jal

4. allocate stack frame

5. save callee save registers ($s0-$s7, $fp, $ra)

6. set frame pointer

Return

1. place function result in $v0

2. restore callee save registers

3. restore $fp

4. pop frame

5. jr $31

(11)

Policy of Use Conventions

Name       Register number   Usage
$zero      0                 the constant value 0
$v0-$v1    2-3               values for results and expression evaluation
$a0-$a3    4-7               arguments
$t0-$t7    8-15              temporaries
$s0-$s7    16-23             saved
$t8-$t9    24-25             more temporaries
$gp        28                global pointer
$sp        29                stack pointer
$fp        30                frame pointer
$ra        31                return address

(12)

Summary: Register Usage

• $a0 – $a3 : arguments (regs 4 – 7)

• $v0, $v1: result values (regs 2 and 3)

• $t0 – $t9 : temporaries

 Can be overwritten by callee

• $s0 – $s7: saved

 Must be saved/restored by callee

• $gp : global pointer for static data (reg 28)

• $sp : stack pointer (reg 29)

• $fp : frame pointer (reg 30)

• $ra : return address (reg 31)

(13)

Non-Leaf Procedures

• Procedures that call other procedures

• For nested call, caller needs to save on the stack:

 Its return address

 Any arguments and temporaries needed after the call

• Restore from the stack after the call

(14)

Non-Leaf Procedure Example

• C code:

int fact (int n)
{
    if (n < 1) return 1;
    else return n * fact(n - 1);
}

 Argument n in $a0

 Result in $v0

(15)

Template for a Procedure

1. Allocate stack frame ( decrement stack pointer )

2. Save any registers ( callee save registers )

3. Procedure body ( remember some arguments may be on the stack! )

4. Restore registers ( callee save registers )

5. Pop stack frame ( increment stack pointer )

6. Return ( jr $ra )

(16)

Non-Leaf Procedure Example

int fact (int n)
{                                      /* callee save */
    if (n < 1) return 1;
    else return n * fact(n - 1);       /* restore */
}

(17)

Non-Leaf Procedure Example

• MIPS code:

fact:
        # Callee save
        addi $sp, $sp, -8      # adjust stack for 2 items
        sw   $ra, 4($sp)       # save return address
        sw   $a0, 0($sp)       # save argument
        # Termination check
        slti $t0, $a0, 1       # test for n < 1
        beq  $t0, $zero, L1
        # Leaf node
        addi $v0, $zero, 1     # if so, result is 1
        addi $sp, $sp, 8       # pop 2 items from stack
        jr   $ra               # and return
L1:     addi $a0, $a0, -1      # else decrement n
        # Recursive call
        jal  fact              # recursive call
        # Intermediate node
        lw   $a0, 0($sp)       # restore original n
        lw   $ra, 4($sp)       # and return address
        addi $sp, $sp, 8       # pop 2 items from stack
        mul  $v0, $a0, $v0     # multiply to get result
        jr   $ra               # and return
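To see how the stack grows and shrinks during the recursion, the MIPS routine can be mirrored with an explicit stack. The Python below is a rough model rather than a translation: the name `fact_with_stack` and the 8-bytes-per-frame bookkeeping are assumptions made for illustration, with each list entry standing in for one frame (saved $ra plus saved $a0).

```python
def fact_with_stack(n):
    """Return (n!, peak stack bytes), mimicking the frame pushes of the MIPS code."""
    stack = []          # each entry models one 8-byte frame: saved $ra and saved $a0
    peak = 0
    a0 = n
    while True:
        stack.append(a0)                  # addi $sp,$sp,-8 ; sw $ra,4($sp) ; sw $a0,0($sp)
        peak = max(peak, 8 * len(stack))
        if a0 < 1:                        # slti / beq: termination check
            v0 = 1                        # leaf node: result is 1
            stack.pop()                   # addi $sp,$sp,8 ; jr $ra
            break
        a0 -= 1                           # L1: decrement n, then jal fact
    while stack:                          # each return lands just after a jal
        a0 = stack.pop()                  # lw $a0,0($sp) ; lw $ra,4($sp) ; addi $sp,$sp,8
        v0 = a0 * v0                      # mul $v0,$a0,$v0
    return v0, peak

print(fact_with_stack(3))
```

In this model a call with argument n >= 0 touches n + 1 frames, so the peak stack use grows linearly with n.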

(18)


The Complete Picture

Reading: 2.12, A2, A3, A4, A5

[Figure: C program → compiler → Assembly → assembler → Object module → linker (which also takes the Object library) → executable → loader → memory]

(20)

The Assembler

• Create a binary encoding of all native instructions

 Translation of all pseudo-instructions

 Computation of all branch offsets and jump addresses

 Symbol table for unresolved (library) references

• Create an object file with all pertinent information

Example:

 Header (information)

 Text segment

 Data segment

 Relocation information

 Symbol table
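As a concrete illustration of "binary encoding of native instructions", the sketch below packs the standard MIPS R-type and I-type fields into 32-bit words. The helper names `encode_r` and `encode_i` are invented for this example; the three instructions encoded are ones that appear in the SPIM listing of the assembly example in these slides.

```python
def encode_r(rs, rt, rd, shamt, funct):
    """R-type: opcode(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(opcode, rs, rt, imm):
    """I-type: opcode(6) | rs(5) | rt(5) | immediate(16, two's complement)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

ADD_FUNCT, ADDI_OP, LW_OP = 0x20, 0x08, 0x23

add_word = encode_r(rs=10, rt=0, rd=10, shamt=0, funct=ADD_FUNCT)  # add  $10, $10, $0
addi_word = encode_i(ADDI_OP, rs=9, rt=9, imm=-1)                  # addi $9, $9, -1
lw_word = encode_i(LW_OP, rs=8, rt=11, imm=0)                      # lw   $11, 0($8)

for word in (add_word, addi_word, lw_word):
    print(f"{word:08x}")
```

Masking the immediate with 0xFFFF is what turns -1 into the 16-bit pattern ffff.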

(21)

Assembly Process

• One pass vs. two pass assembly

• Effect of fixed vs. variable length instructions

• Time, space and one pass assembly

• Local labels, global labels, external labels and the symbol table

 What does it mean when a symbol is unresolved?

• Absolute addresses and re-location
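A minimal two-pass sketch makes the points above concrete. It assumes fixed-length 4-byte instructions, one native instruction per line, and a text base of 0x00400000; the function `two_pass` and the sample program are hypothetical. Pass 1 assigns addresses and fills the symbol table; pass 2 resolves label references and reports any that stay unresolved (for example, a procedure defined in another module).

```python
def two_pass(lines, base=0x00400000):
    """Pass 1 builds the symbol table; pass 2 resolves label references."""
    symbols, program, addr = {}, [], base
    for line in lines:                         # pass 1: assign addresses
        if ":" in line:
            label, line = line.split(":", 1)
            symbols[label.strip()] = addr
        line = line.strip()
        if line:
            program.append((addr, line))
            addr += 4                          # fixed-length instructions
    resolved, unresolved = [], set()
    for addr, text in program:                 # pass 2: look up targets
        op, *rest = text.replace(",", " ").split()
        if op in ("beq", "bne", "j", "jal"):
            target = rest[-1]
            if target in symbols:
                resolved.append((addr, op, symbols[target]))
            else:
                unresolved.add(target)         # left for the linker
    return symbols, resolved, unresolved

src = [
    "main: addi $t1, $zero, 4",
    "loop: addi $t1, $t1, -1",
    "      bne  $t1, $zero, loop",
    "      jal  func_add",
    "      j    exit",
    "exit: jr   $ra",
]
symbols, resolved, unresolved = two_pass(src)
```

The forward reference to exit is the reason a single pass is awkward: when `j exit` is read, the address of exit is not yet known.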

(22)

Example

Assembly Program:

        .data
L1:     .word 0x44,22,33,55    # array
        .text
        .globl main
main:   la   $t0, L1
        li   $t1, 4
        add  $t2, $t2, $zero
loop:   lw   $t3, 0($t0)
        add  $t2, $t2, $t3
        addi $t0, $t0, 4
        addi $t1, $t1, -1
        bne  $t1, $zero, loop
        bgt  $t2, $0, then
        move $s0, $t2
        j    exit
then:   move $s1, $t2
exit:   li   $v0, 10
        syscall

Address      Assembled Binary   Native Instructions
[00400000]   3c081001           lui $8, 4097 [L1]
[00400004]   34090004           ori $9, $0, 4
[00400008]   01405020           add $10, $10, $0
[0040000c]   8d0b0000           lw $11, 0($8)
[00400010]   014b5020           add $10, $10, $11
[00400014]   21080004           addi $8, $8, 4
[00400018]   2129ffff           addi $9, $9, -1
[0040001c]   1520fffc           bne $9, $0, -16 [loop-0x0040001c]
[00400020]   000a082a           slt $1, $0, $10
[00400024]   14200003           bne $1, $0, 12 [then-0x00400024]
[00400028]   000a8021           addu $16, $0, $10
[0040002c]   0810000d           j 0x00400034 [exit]
[00400030]   000a8821           addu $17, $0, $10
[00400034]   3402000a           ori $2, $0, 10
[00400038]   0000000c           syscall

What changes when you relocate code?
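One way to approach the relocation question: branches such as bne hold PC-relative offsets, so they survive a move unchanged, whereas j holds an absolute word address and must be patched. The sketch below patches a j/jal word for a text segment shifted by `delta` bytes. The helper `relocate_j` and the 0x1000 shift are assumptions for illustration, and the sketch assumes the move stays within the same 256 MB region and is a multiple of 4.

```python
def relocate_j(word, delta):
    """Shift the 26-bit word-address field of a j/jal instruction by delta bytes."""
    opcode = word >> 26
    if opcode not in (2, 3):                 # 2 = j, 3 = jal
        raise ValueError("only j and jal carry absolute targets")
    field = ((word & 0x03FFFFFF) + (delta >> 2)) & 0x03FFFFFF
    return (opcode << 26) | field

old_j = 0x0810000d                  # j 0x00400034, taken from the listing
new_j = relocate_j(old_j, 0x1000)   # hypothetical: text moved up by 0x1000 bytes
print(f"{new_j:08x}")
```

The lui generated for la $t0, L1 also embeds an absolute address (of data), so it would need patching if the data segment moved.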

(23)

Linker & Loader

• Linker

 “Links” independently compiled modules

 Determines “real” addresses

 Updates the executables with real addresses

• Loader

 Copies the executable from disk into memory and starts it running

 Specifics are operating system dependent

(24)

Linking

[Figure: Program A and Program B are assembled separately into Assembly A and Assembly B; the linker cross-references labels between them. Each object file holds: header, text, static data, reloc, symbol table, debug]

• Why do we need independent compilation?

• What are the issues with respect to independent compilation?

 references across files ( can be to data or code! )

 absolute addresses and relocation

Study: Example on pg. 127

(25)

Example:

# separate file
        .text
        addi $4, $0, 4         # 0x20040004
        addi $5, $0, 5         # 0x20050005
        jal  func_add          # 000011 + 26-bit target, unknown until link time
        done                   # 0x3402000a, 0x0000000c

# separate file
        .text
        .globl func_add
func_add:
        add  $2, $4, $5        # 0x00851020
        jr   $31               # 0x03e00008

After linking:

Address      Contents
0x00400000   0x20040004
0x00400004   0x20050005
0x00400008   ?
0x0040000c   0x3402000a
0x00400010   0x0000000c
0x00400014   0x00851020
0x00400018   0x03e00008

Ans: 0x0c100005
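The answer can be checked mechanically. Assuming func_add ends up at 0x00400014 (the sixth word in the address column), the jal word is the opcode 000011 followed by bits 27..2 of the target. The helper names below are made up for this sketch.

```python
def encode_jal(target):
    """jal: opcode 000011 followed by bits 27..2 of the target byte address."""
    return (0x03 << 26) | ((target >> 2) & 0x03FFFFFF)

def decode_jump_target(word, pc):
    """Rebuild the byte address: top 4 bits of PC + 4, the 26-bit field, then 00."""
    return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)

jal_word = encode_jal(0x00400014)                      # func_add after linking
target = decode_jump_target(jal_word, pc=0x00400008)   # the jal sits at 0x00400008
print(f"{jal_word:08x} -> {target:08x}")
```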

(26)

Loading a Program

• Load from image file on disk into memory

1. Read header to determine segment sizes

2. Create virtual address space ( later )

3. Copy text and initialized data into memory

o Or set page table entries so they can be faulted in

4. Set up arguments on stack

5. Initialize registers (including $sp, $fp, $gp)

6. Jump to startup routine

o Copies arguments to $a0, … and calls main

o When main returns, do exit syscall

(27)

Dynamic Linking

• Static Linking

 All labels are resolved at link time

 Link all procedures that may be called by the program

 Size of executables?

• Dynamic Linking: Only link/load library procedure when it is called

 Requires procedure code to be relocatable

 Avoids image bloat caused by static linking of all ( transitively ) referenced libraries

 Automatically picks up new library versions

(28)

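Lazy Linkage

[Figure: calls go through an Indirection table; an entry initially points to a Stub that loads the routine ID and jumps to the linker/loader; the Linker/loader code maps the routine, after which the entry points at the Dynamically mapped code]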
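The idea can be sketched in Python, with a dictionary standing in for the indirection table and a closure standing in for the stub. Everything here (the class `LazyLinker`, the `library` mapping, the `loads` counter) is hypothetical scaffolding, not how any particular operating system implements it.

```python
class LazyLinker:
    """Toy model of lazy linkage through an indirection table."""

    def __init__(self, library):
        self.library = library      # routine ID -> callable; stands in for code on disk
        self.table = {}             # indirection table
        self.loads = 0              # how many times the linker/loader ran

    def _stub(self, routine_id):
        def stub(*args):
            self.loads += 1
            real = self.library[routine_id]    # linker/loader maps the routine
            self.table[routine_id] = real      # patch the table entry
            return real(*args)                 # then run the real code
        return stub

    def call(self, routine_id, *args):
        if routine_id not in self.table:       # first use: entry points at the stub
            self.table[routine_id] = self._stub(routine_id)
        return self.table[routine_id](*args)

linker = LazyLinker({"func_add": lambda a, b: a + b})
first = linker.call("func_add", 4, 5)
second = linker.call("func_add", 1, 2)
```

Only the first call pays the linking cost; later calls go straight through the patched table entry.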

(29)

The Computing Model Revisited

Program Execution and the von Neumann model

[Figure: processor and memory map. Processor: Register File (Programmer Visible State, registers 0x00 to 0x1F), Arithmetic Logic Unit (ALU), Processor Internal Buses, Memory Interface, and Programmer Invisible State (Program Counter, Instruction register, Kernel registers). Memory Map, from 0xFFFFFFFF downward: stack, Dynamic Data, Data segment (static), Text Segment, Reserved]

(30)

Instruction Set Architectures (ISA)

• Instruction set architectures are characterized by several features

1. Operations

 Types, precision, size

2. Organization of internal storage

 Stack machine

 Accumulator

 General Purpose Registers (GPR)

3. Memory addressing

 Operand location and addressing

(31)

Instruction Set Architectures

4. Memory abstractions

 Segments, virtual address spaces (more later)

 Memory mapped I/O (later)

5. Control flow

 Condition codes

 Types of control transfers – conditional vs. unconditional

• ISA design is the result of many tradeoffs

 Decisions determine hardware implementation

 Impact on time, space, and energy

• Check out ISAs for PowerPC, ARM, x86, SPARC, etc.

(32)

ARM & MIPS Similarities

• ARM: the most popular embedded core

• Similar basic set of instructions to MIPS

                         ARM              MIPS
Date announced           1985             1985
Instruction size         32 bits          32 bits
Address space            32-bit flat      32-bit flat
Data alignment           Aligned          Aligned
Data addressing modes    9                3
Registers                15 × 32-bit      31 × 32-bit
Input/output             Memory mapped    Memory mapped

(33)

Compare and Branch in ARM

• Uses condition codes for result of an arithmetic/logical instruction

 Negative, zero, carry, overflow

 Compare instructions to set condition codes without keeping the result

• Each instruction can be conditional

 Top 4 bits of instruction word: condition value

 Can avoid branches over single instructions

[Figure: CPU/Core with registers $0, $1, …, $31 and an ALU that sets the condition-code flags Z, V, C, N]
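For intuition, here is a small model of how a compare sets the four flags for a 32-bit subtraction a - b without keeping the result. The function `cmp_flags` is invented for this sketch; it follows the ARM convention that the carry flag is set when the subtraction does not borrow.

```python
def cmp_flags(a, b, bits=32):
    """Flags produced by comparing a with b (computing a - b, discarding the result)."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    result = (a - b) & mask
    sign = bits - 1
    n = result >> sign                      # Negative: top bit of the result
    z = int(result == 0)                    # Zero
    c = int(a >= b)                         # Carry: set when no borrow occurs
    # Overflow: operands have different signs and the result's sign differs from a's
    v = int((a >> sign) != (b >> sign) and (result >> sign) != (a >> sign))
    return {"N": n, "Z": z, "C": c, "V": v}

equal = cmp_flags(5, 5)
less = cmp_flags(3, 5)
overflow = cmp_flags(0x80000000, 1)
```

A conditional instruction then only has to test these bits against the condition value in its top 4 bits.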

(34)

Instruction Encoding

Differences?

(35)

The Intel x86 ISA

• Evolution with backward compatibility

 8080 (1974): 8-bit microprocessor

o Accumulator, plus 3 index-register pairs

 8086 (1978): 16-bit extension to 8080

o Complex instruction set (CISC)

 8087 (1980): floating-point coprocessor

o Adds FP instructions and register stack

 80286 (1982): 24-bit addresses, MMU

o Segmented memory mapping and protection

 80386 (1985): 32-bit extension (now IA-32)

o Additional addressing modes and operations

o Paged memory mapping as well as segments

(36)

The Intel x86 ISA

• Further evolution…

 i486 (1989): pipelined, on-chip caches and FPU

 Pentium (1993): superscalar, 64-bit datapath

o Later versions added MMX (Multi-Media eXtension) instructions

o The infamous FDIV bug

 Pentium Pro (1995), Pentium II (1997)

o New microarchitecture (see Colwell, The Pentium Chronicles)

 Pentium III (1999)

o Added SSE (Streaming SIMD Extensions) and associated registers

 Pentium 4 (2001)

o New microarchitecture

o Added SSE2 instructions

(37)

The Intel x86 ISA

• And further…

 AMD64 (2003): extended architecture to 64 bits

 EM64T – Extended Memory 64 Technology (2004)

o AMD64 adopted by Intel (with refinements)

o Added SSE3 instructions

 Intel Core (2006)

o Added SSE4 instructions, virtual machine support

 AMD64 (announced 2007): SSE5 instructions

 Intel Advanced Vector Extension (AVX announced 2008)

• If Intel didn't extend with compatibility, its competitors would!

 Technical elegance ≠ market success

• Commonly thought of as a Complex Instruction Set Architecture (CISC)

(38)

Basic x86 Registers

(39)

Basic x86 Addressing Modes

• Two operands per instruction

Source/dest operand    Second source operand
Register               Register
Register               Immediate
Register               Memory
Memory                 Register
Memory                 Immediate

• Memory addressing modes

 Address in register

 Address = R_base + displacement

 Address = R_base + 2^scale × R_index (scale = 0, 1, 2, or 3)

 Address = R_base + 2^scale × R_index + displacement
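All four addressing modes are special cases of one formula, which a short function can make explicit. The name `effective_address` and the sample register values are assumptions for illustration; the result is wrapped to 32 bits.

```python
def effective_address(base, index=0, scale=0, displacement=0):
    """Address = base + 2**scale * index + displacement, wrapped to 32 bits."""
    if scale not in (0, 1, 2, 3):
        raise ValueError("scale must be 0, 1, 2, or 3")
    return (base + (index << scale) + displacement) & 0xFFFFFFFF

in_register = effective_address(0x1000)                                 # address in register
base_disp = effective_address(0x1000, displacement=8)                   # base + displacement
scaled = effective_address(0x1000, index=3, scale=2)                    # base + 4 * index
scaled_disp = effective_address(0x1000, index=3, scale=2, displacement=8)
```

With scale = 2 the index is multiplied by 4, which is how an array of 32-bit words can be indexed by element number in a single instruction.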

(40)

x86 Instruction Encoding

• Variable length encoding

 Postfix bytes specify addressing mode

 Prefix bytes modify operation

o Operand length, repetition, locking, …

(41)

Implementing IA-32

• Complex instruction set makes implementation difficult

 Hardware translates instructions to simpler microoperations

o Simple instructions: 1–1

o Complex instructions: 1–many

 Microengine similar to RISC

 Market share makes this economically viable

• Comparable performance to RISC

 Compilers avoid complex instructions

• Better code density

(42)

Fallacies

• Powerful instruction → higher performance

 Fewer instructions required

 But complex instructions are hard to implement

o May slow down all instructions, including simple ones

 Compilers are good at making fast code from simple instructions

• Use assembly code for high performance

 But modern compilers are better at dealing with modern processors

 More lines of code → more errors and less productivity

(43)

Fallacies

• Backward compatibility → instruction set does not change

 But they do accrete more instructions

[Figure: x86 instruction set]

(44)

Summary

• Instruction complexity is only one variable

 lower instruction count vs. higher CPI / lower clock rate

• Design Principles:

 simplicity favors regularity

 smaller is faster

 good design demands compromise

 make the common case fast

• Instruction set architecture

 a very important abstraction indeed!

(45)

Study Guide

• Compute number of bytes to encode a SPIM program

• What does it mean for a code segment to be relocatable?

• Identify addresses that need to be modified when a program is relocated.

 Given the new start address modify the necessary addresses

• Given the assembly of an independently compiled procedure, ensure that it follows the MIPS calling conventions, modifying it if necessary

(46)

Study Guide (cont.)

• Given a SPIM program with nested procedures, ensure that you know what registers are stored in the stack as a consequence of a call

• Encode/disassemble jal and jr instructions

• Computation of jal encodings for independently compiled modules

• How can I make procedure calls faster?

 Hint: What is it about a call that takes time?

• How are independently compiled modules linked into a single executable? (assuming one calls a procedure located in another)

(47)

Glossary

• Argument registers

• Caller save registers

• Callee save registers

• Disassembly

• Frame pointer

• Independent compilation

• Labels: local, global, external

• Linker/loader

• Linking: static vs. dynamic vs. lazy

• Native instructions

• Nested procedures

• Object file

• One/two pass assembly

• Procedure invocation

• Pseudo instructions

• Relocatable code

• Stack frame

• Stack pointer

• Symbol table

• Unresolved symbol

(48)
