Course Notes
Rick Parent (parent@cse.ohio-state.edu) http://www.cse.ohio-state.edu/~parent
Wayne Heym (w.heym@ieee.org) http://www.cse.ohio-state.edu/~heym
Copyright © 1998-2005 by Rick Parent, Todd Whittaker, Bettina Bair, Pete Ware, Wayne Heym
CSE360 1
Positional Number Systems: position of character in string indicates a power of the base (radix).
Common bases: 2, 8, 10, 16. (What base are we using to express the names of these bases?)
– Base ten (decimal): digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 form the alphabet of the decimal system.
E.g., $316_{10}$ =
– Base eight (octal): digits 0, 1, 2, 3, 4, 5, 6, 7 form the alphabet.
E.g., $474_8$ =
– Base 16 (hexadecimal): digits 0-9 and A-F.
E.g., $13C_{16}$ =
– Base 2 (binary): digits (called “bits”) 0, 1 form the alphabet.
E.g., $100110_2$ =
– In general, radix r representations use the first r chars in {0…9, A…Z} and have the form $d_{n-1} d_{n-2} \ldots d_1 d_0$.
Summing $d_{n-1} r^{n-1} + d_{n-2} r^{n-2} + \ldots + d_0 r^0$ will convert to base 10. Why to base 10?
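The positional sum above can be sketched in Python. This is only an illustration: the names `DIGITS` and `to_decimal` are made up for these notes, and the digit alphabet {0…9, A…Z} limits the radix to 36.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # first r characters are the base-r alphabet


def to_decimal(numeral: str, radix: int) -> int:
    """Return the value of `numeral`, summing digit * radix**position."""
    if not 2 <= radix <= len(DIGITS):
        raise ValueError("radix must be between 2 and 36")
    total = 0
    # Position 0 is the rightmost character, so walk the string backwards.
    for position, char in enumerate(reversed(numeral.upper())):
        digit = DIGITS.index(char)  # raises ValueError for unknown characters
        if digit >= radix:
            raise ValueError(f"{char!r} is not a base-{radix} digit")
        total += digit * radix ** position
    return total


print(to_decimal("316", 10))    # 3*100 + 1*10 + 6
print(to_decimal("474", 8))     # 4*64 + 7*8 + 4
print(to_decimal("13C", 16))    # 1*256 + 3*16 + 12
print(to_decimal("100110", 2))  # 32 + 4 + 2
```

The arithmetic inside the function is carried out by Python's own integers, which are printed in base 10; that is one way to think about the "Why to base 10?" question.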
Base Conversions
– Convert to base 10 by multiplication of powers
E.g., $10012_5 = (\qquad)_{10}$
– Convert from base 10 by repeated division
E.g., $632_{10} = (\qquad)_8$
– Converting base x to base y : convert base x to base 10 then convert base 10 to base y
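Repeated division can be sketched as follows. The helper names `from_decimal` and `convert` are hypothetical; `convert` leans on Python's built-in `int(text, base)` for the "to base 10" half of the trip.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def from_decimal(value: int, radix: int) -> str:
    """Convert a nonnegative integer to base `radix` by repeated division."""
    if value < 0 or not 2 <= radix <= len(DIGITS):
        raise ValueError("need value >= 0 and 2 <= radix <= 36")
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, radix)  # remainder is the next digit, low order first
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))


def convert(numeral: str, from_radix: int, to_radix: int) -> str:
    """Base x to base y: go through an integer value in between."""
    return from_decimal(int(numeral, from_radix), to_radix)


print(from_decimal(632, 8))       # remainders 0, 7, 1, 1 read backwards
print(convert("10012", 5, 10))
print(convert("474", 8, 16))
```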
– Special case: converting among binary, octal, and hexadecimal is easier
Go through the binary representation, grouping in sets of 3 or 4.
E.g., $11011001_2$ = 11 011 001 = $331_8$
$11011001_2$ = 1101 1001 = $D9_{16}$
E.g., $C3B_{16} = (\qquad)_8$
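A small sketch of the grouping trick, assuming the bits arrive as a string. The function name `regroup` is made up for illustration; groups are formed from the right, so the string is left-padded with zeros first.

```python
def regroup(bits: str, group: int) -> str:
    """Rewrite a binary string using one digit per `group` bits (3 = octal, 4 = hex)."""
    bits = bits.replace(" ", "")
    width = -(-len(bits) // group) * group  # round the length up to a multiple of group
    padded = bits.zfill(width)              # pad on the left so grouping starts at the right
    chunks = [padded[i:i + group] for i in range(0, width, group)]
    return "".join("0123456789ABCDEF"[int(chunk, 2)] for chunk in chunks)


print(regroup("11011001", 3))  # groups 011 011 001
print(regroup("11011001", 4))  # groups 1101 1001

# Hex to octal: expand each hex digit to 4 bits, then regroup in threes.
c3b_bits = format(int("C3B", 16), "b")
print(c3b_bits, regroup(c3b_bits, 3))
```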
What is special about binary?
– The basic component of a computer system is a transistor (transfer resistor): a two-state device which switches between logical “1” and “0” (actually represented as voltages in the range 5V to 0V).
– Octal and hexadecimal are bases that are powers of 2, and are used as a shorthand way of writing binary. A hexadecimal digit represents 4 bits, half of a byte.
1 byte = 8 bits. A bit is a binary digit.
– Get comfortable converting among decimal, binary, octal, hexadecimal. Converting from decimal to hexadecimal (or binary) is easier going through octal.
Decimal  Binary  Hex
0        0000    0
1        0001    1
2        0010    2
3        0011    3
4        0100    4
5        0101    5
6        0110    6
7        0111    7
8        1000    8
9        1001    9
10       1010    A
11       1011    B
12       1100    C
13       1101    D
14       1110    E
15       1111    F
Ranges of values
– Q: Given k positions in base n , how many values can you represent?
– A: $n^k$ values over the range $(0 \ldots n^k - 1)_{10}$
n=10, k=3: $10^3 = 1000$; range is $(0 \ldots 999)_{10}$
n=2, k=8: $2^8 = 256$; range is $(0 \ldots 255)_{10}$
n=16, k=4: $16^4 = 65536$; range is $(0 \ldots 65535)_{10}$
– Q: How are negative numbers represented?
Integer representation:
– Value and representation are distinct. E.g., 12 may be represented as XII, $C_{16}$, $12_{10}$, and $1100_2$. Note: -12 may be represented as $-C_{16}$, $-12_{10}$, and $-1100_2$.
– Simple and efficient use of hardware implies using a specific number of bits, e.g., a 32-bit string, in a binary encoding. Such an encoding is “fixed width.”
– Four methods: (fixed-width) simple binary, signed magnitude, binary coded decimal, and 2’s complement.
– Simple binary: as seen before, all numbers are assumed to be positive, e.g., 8-bit representation of $66_{10}$ = 0100 0010$_2$ and $194_{10}$ = 1100 0010$_2$
– Signed magnitude: simple binary with leading sign bit.
0 = positive, 1 = negative. E.g., 8-bit signed mag.:
$66_{10}$ = 0 100 0010$_2$
$-66_{10}$ = 1 100 0010$_2$
What ranges of numbers may be expressed in 8 bits?
Largest:
Smallest:
Extend 1100 0010 to 12 bits:
Problems: (1) Compare the signed magnitude numbers 1000 0000 and 0000 0000. (2) Must have “subtraction” hardware in addition to “addition” hardware.
– Binary Coded Decimal (BCD): use a 4 bit pattern to express each digit of a base 10 number
0000 = 0 0001 = 1 0010 = 2 0011 = 3
0100 = 4 0101 = 5 0110 = 6 0111 = 7
1000 = 8 1001 = 9 1010 = + 1011 = -
E.g., 123 : 0000 0001 0010 0011
+123 : 1010 0001 0010 0011
-123 : 1011 0001 0010 0011
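The digit-by-digit encoding can be sketched like this. The function name `to_bcd` and the fixed field of 4 nibbles (16 bits, matching the examples above) are assumptions for illustration; real BCD formats differ in where they put the sign.

```python
SIGN_NIBBLES = {"+": "1010", "-": "1011"}  # sign patterns from the table above


def to_bcd(text: str, nibbles: int = 4) -> str:
    """Encode a decimal string such as "123", "+123" or "-123" as BCD nibbles."""
    sign = text[0] if text[0] in SIGN_NIBBLES else ""
    digits = text[len(sign):]
    if not digits.isdigit():
        raise ValueError("expected decimal digits after the optional sign")
    room = nibbles - (1 if sign else 0)  # a sign uses up the leading nibble
    if len(digits) > room:
        raise ValueError("too many digits for the field")
    groups = [SIGN_NIBBLES[sign]] if sign else []
    groups += ["0000"] * (room - len(digits))            # leading zero digits
    groups += [format(int(d), "04b") for d in digits]    # 4 bits per decimal digit
    return " ".join(groups)


for sample in ("123", "+123", "-123"):
    print(f"{sample:>4} : {to_bcd(sample)}")
```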
BCD Disadvantages:
– Takes more memory. 32 bit simple binary can represent more than 4 billion discrete values. 32 bit BCD can hold a sign and
7 digits (or 8 digits for unsigned values) for a maximum of
110 million values, a 97% reduction.
– More difficult to do arithmetic. Essentially, we must force the
Base 2 computer to do Base 10 arithmetic.
BCD Advantages:
– Used in business machines and languages, e.g., in COBOL for precise decimal math.
– Can have arrays of BCD numbers for essentially arbitrary precision arithmetic.
– Two’s Complement
Used by most machines and languages to represent integers. Fixes the -0 in the signed magnitude, and simplifies machine hardware arithmetic.
Divides bit patterns into a positive half and a negative half (with zero considered positive); n bits create a range of $[-2^{n-1} \ldots 2^{n-1}-1]$.
CODE  Simple  Signed  2’s comp
0000  0       +0      0
0001  1       1       1
0010  2       2       2
0011  3       3       3
0100  4       4       4
0101  5       5       5
0110  6       6       6
0111  7       7       7
1000  8       -0      -8
1001  9       -1      -7
1010  10      -2      -6
1011  11      -3      -5
1100  12      -4      -4
1101  13      -5      -3
1110  14      -6      -2
1111  15      -7      -1
– Representation in 2’s complement; i.e., represent i in n-bit 2’s complement, where $-2^{n-1} \le i \le +2^{n-1}-1$
Nonnegative numbers: same as simple binary
Negative numbers:
– Obtain the n -bit simple binary equivalent of | i |
– Obtain its negation as follows:
• Invert the bits of that representation
• Add 1 to the result
Ex.: convert $-320_{10}$ to 16-bit 2’s complement
Ex.: extend the 12-bit 2’s complement number
1101 0111 1000 to 16 bits.
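The invert-and-add-1 recipe and sign extension can be sketched on bit strings. The function names below are made up for illustration; the sketch follows the steps in these notes rather than any particular machine's hardware.

```python
def twos_complement(value: int, bits: int) -> str:
    """Return the `bits`-wide 2's complement bit string for `value`."""
    if not -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1:
        raise ValueError("value does not fit in the requested width")
    if value >= 0:
        return format(value, f"0{bits}b")            # same as simple binary
    magnitude = format(-value, f"0{bits}b")          # n-bit simple binary of |i|
    inverted = "".join("1" if b == "0" else "0" for b in magnitude)
    return format(int(inverted, 2) + 1, f"0{bits}b")  # add 1 to the inverted bits


def sign_extend(pattern: str, bits: int) -> str:
    """Widen a 2's complement pattern by copying its sign bit to the left."""
    pattern = pattern.replace(" ", "")
    return pattern[0] * (bits - len(pattern)) + pattern


def decode(pattern: str) -> int:
    """Interpret a bit string as a 2's complement integer."""
    pattern = pattern.replace(" ", "")
    raw = int(pattern, 2)
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


print(twos_complement(-320, 16))
print(sign_extend("1101 0111 1000", 16))
print(decode("1101 0111 1000"), decode(sign_extend("1101 0111 1000", 16)))
```

Sign extension leaves the value unchanged, which is what the last line checks by decoding both widths.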
Binary Arithmetic
– Addition and subtraction only for now
– Rules: similar to standard addition and subtraction, but only working with 0 and 1.
0 + 0 = 0
1 + 0 = 1
0 + 1 = 1
0 - 0 = 0
1 - 0 = 1
1 - 1 = 0
1 + 1 = 10 10 - 1 = 1
– Must be aware of possible overflow.
Ex.: 8-bit signed magnitude 0101 0110 + 0110 0011 =
Ex.: 8-bit signed magnitude 0101 0110 - 0110 0011 =
2’s Complement binary arithmetic
– Addition and subtraction are the same operation
– Still must be aware of overflow.
Ex.: 8 bit 2’s complement: $23_{10} + 45_{10}$ =
Ex.: 8 bit 2’s complement: $100_{10} + 45_{10}$ =
Ex.: 8 bit 2’s complement: $23_{10} - 45_{10}$ =
– 2’s Complement overflow
Opposite signs on operands can’t overflow
If operand signs are same, but result’s sign is different, must have overflow
Can two positives sum to positive and still have overflow?
Can two negatives?
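The sign rule for overflow can be tried out with a short sketch. The function `add_8bit` is hypothetical; it keeps only the low 8 bits of the sum, as an 8-bit adder would, and then applies the rule "same operand signs, different result sign".

```python
def add_8bit(a: int, b: int):
    """Add two 8-bit 2's complement values; return (wrapped result, overflow flag)."""
    raw = (a + b) & 0xFF                          # keep the low 8 bits
    result = raw - 256 if raw & 0x80 else raw     # reinterpret bit 7 as the sign
    same_signs = (a < 0) == (b < 0)
    overflow = same_signs and (result < 0) != (a < 0)
    return result, overflow


print(add_8bit(23, 45))     # fits
print(add_8bit(100, 45))    # two positives, negative result
print(add_8bit(23, -45))    # subtraction done as addition of the negation
print(add_8bit(-100, -45))  # two negatives, positive result
```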
Characters and Strings
– EBCDIC, Extended Binary Coded Decimal Interchange Code
Used by IBM in mainframes (360 architecture and descendants).
Earliest system
– ASCII, American Standard Code for Information Interchange.
Most common system
– Unicode, http://www.unicode.org
New international standard
Variable length encoding scheme with either 8- or 16-bit minimum
“a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.”
ASCII
– see table 1.7 on pg. 18.
In Unix, run “man ascii”.
– 7 bit code
Printable characters for human interactions
Control characters for non-human communication (computer-computer, computer-peripheral, etc.)
– 8-bit code: most significant bit may be set
Extended ASCII (IBM), includes graphical symbols and lines
ISO 8859, several international standards
Unicode’s UTF-8, variable length code with 8-bit minimum
Easy to decode
– But takes up a predictable amount of space
Upper and lower case characters are 0x20 ($32_{10}$) apart
ASCII representation of ‘3’ is not the same as the binary representation of 3.
– To convert ASCII to binary (an integer), ‘3’-‘0’ = 3
Line feed (LF) character
– 000 1010$_2$ = 0x0a = $10_{10}$
– ‘\n’ = 0xa
Character  ASCII Binary  ASCII Hex
‘ ’        010 0000      0x20
‘A’        100 0001      0x41
‘a’        110 0001      0x61
‘R’        101 0010      0x52
‘r’        111 0010      0x72
‘0’        011 0000      0x30
‘3’        011 0011      0x33
String: definition is programming language dependent.
– C, C++: strings are arrays of characters terminated by a null byte.
Decode:
1000001, 1010011, 1000011, 1001001, 1001001, 0100000, 1101001,
1110011, 0100000, 1100101, 1100001, 1110011, 1111001, 0000000
– Or (in hex):
41 53 43 49 49 20 69 73 20 65 61 73 79 00
How many bytes is this?
What’s the use of the ’00’ byte at the end?
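The decoding exercise can be checked with a few lines of Python. This is a sketch: `c_string` is a made-up helper that mimics how C finds the end of a string, by scanning for the first null byte.

```python
raw = bytes.fromhex("41 53 43 49 49 20 69 73 20 65 61 73 79 00")


def c_string(buffer: bytes) -> str:
    """Decode the ASCII characters that precede the first null byte."""
    end = buffer.index(0)  # position of the terminator; raises ValueError if absent
    return buffer[:end].decode("ascii")


print(len(raw), "bytes, including the terminator")
print(repr(c_string(raw)))

# Character arithmetic from the ASCII slides:
print(ord("3") - ord("0"))       # digit character to integer
print(hex(ord("a") - ord("A")))  # distance between lower and upper case
```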
Simple data compression
– ASCII codes are fixed length.
– Huffman codes are variable length and based on statistics of the data to be transmitted.
Assign the shortest encoding to the most common character.
– In English, the letter ‘e’ is the most common.
– Either establish a Huffman code for an entire class of messages,
– Or create a new Huffman code for each message, sending/storing both the coding scheme and the message.
“a widely used and very effective technique for compressing data; savings of 20% to 90% are typical, depending on the characteristics of the file being compressed.” (Cormen, p. 337)
Char  Fixed len encoding  Freq  Var len encoding  # bits  Expected # bits
      00                  .5    1                 1       .5
      01                  .25   01                2       .5
      10                  .15   001               3       .45
      11                  .10   000               3       .3
Avg len  2 (fixed)  1.75 (variable)
Huffman Tree for “a man a plan a canal panama”
– Examine data set and determine frequencies of letters (example ignores spaces, normally significant)
      Count  Frequency
‘a’   10     0.476190
‘c’   1      0.047619
‘l’   2      0.095238
‘m’   2      0.095238
‘n’   4      0.190476
‘p’   2      0.095238
– Create a forest of single node trees. Choose the two trees having the smallest total frequencies (the two “smallest” trees), and merge them together (lesser frequency as the left subtree, for definiteness, to make grading easier). Continue merging until only one tree remains.
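The merge loop can be sketched with a priority queue. This is one possible implementation, not the one used for grading: ties between equal frequencies are broken by insertion order here, so the individual codes may differ from the tree in these notes, although the total encoded length comes out the same. The names `huffman_code` and `tiebreak` are made up for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text: str) -> dict:
    """Build a Huffman code for `text`; '1' means left branch, '0' means right."""
    tiebreak = count()  # unique sequence numbers so heap entries never compare subtrees
    heap = [(n, next(tiebreak), ch) for ch, n in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # smallest tree becomes the left subtree
        w2, _, right = heapq.heappop(heap)   # next smallest becomes the right subtree
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, str):            # leaf: a single character
            codes[node] = prefix or "1"      # lone-symbol input still needs one bit
        else:
            walk(node[0], prefix + "1")
            walk(node[1], prefix + "0")

    walk(heap[0][2], "")
    return codes


message = "a man a plan a canal panama".replace(" ", "")  # spaces ignored, as in the notes
codes = huffman_code(message)
total_bits = sum(len(codes[ch]) for ch in message)
print(codes)
print(total_bits, "bits with Huffman vs", 3 * len(message), "bits with a 3-bit code")
```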
Reading a ‘1’ calls for following the left branch.
Reading a ‘0’ calls for following the right branch.
Decoding using the tree:
To decode ‘0001’, start at root and follow r_child, r_child, r_child, l_child, revealing encoded ‘m’.
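[Figure: Huffman tree for "a man a plan a canal panama". The root (1.0) has left child ‘a’ (.4762) and right child .5238; .5238 has left child ‘n’ (.1905) and right child .3333; .3333 has left child .1428, with leaves ‘c’ (.0476) and ‘l’ (.0952), and right child .1905, with leaves ‘m’ (.0952) and ‘p’ (.0952).]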
Comparison of Huffman and 3-bit code example
– 3-bit: 000 011000100 000 101010000100 000
001000100000010 101000100000011000 = 63 bits
– Huffman: 1 0001101 1 00000010101 1
001110110010 0000101100011 = 46 bits
– Savings of 17 bits, or 27% of original message
        3-bit code  Huffman Code  Count  H length  3 length
‘a’     000         1             10     10        30
‘c’     001         0011          1      4         3
‘l’     010         0010          2      8         6
‘m’     011         0001          2      8         6
‘n’     100         01            4      8         12
‘p’     101         0000          2      8         6
Totals                                   46        63
Data transmission, aging media, static interference, dust on media, etc. demand the ability to detect errors.
Single bit errors detected by using parity checking.
Parity, here, is “the state of being odd or even.”
– How to detect a 1-bit error:
Ex.: send ASCII ‘S’: send 1010011, but receive 1010010?
Add a 1-bit parity to make an odd or even number of bits per byte.
‘S’ ‘E’
ASCII 101 0011 100 0101
Even parity 0101 0011 1100 0101
Odd Parity 1101 0011 0100 0101
Parity bit is stripped by hardware after checking.
Sender/receiver both agree to odd or even parity.
2 flipped bits in the same encoding are not detected.
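A sketch of parity generation and checking on bit strings. The function names are made up; the parity bit is placed in the most significant position, as in the table above.

```python
def add_parity(bits7: str, even: bool = True) -> str:
    """Prefix a 7-bit code with a parity bit so the count of 1s is even (or odd)."""
    bits7 = bits7.replace(" ", "")
    ones = bits7.count("1")
    parity = ones % 2 if even else (ones + 1) % 2
    return str(parity) + bits7


def parity_ok(bits8: str, even: bool = True) -> bool:
    """Check a received byte against the agreed parity."""
    ones = bits8.replace(" ", "").count("1")
    return ones % 2 == (0 if even else 1)


sent = add_parity("101 0011")           # 'S' with even parity
print(sent, parity_ok(sent))
print("01010010", parity_ok("01010010"))  # one bit flipped: detected
print("01010000", parity_ok("01010000"))  # two bits flipped: not detected
```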
Two meanings for Hamming distance. The 2nd is a generalization of the 1st. The 1st is the distance between two encodings of the same length.
1. A count of the number of bits different in encoding 1 vs. encoding 2.
E.g., dist(1100, 1001) =          dist(0101, 1101) =
2. Generalize to an entire code by taking the minimum over all distinct pairs (2nd meaning).
– The ASCII encoding scheme has a Hamming distance of 1.
– A simple parity encoding scheme has a Hamming distance of 2.
Hamming distance serves as a measure of the robustness of error checking (as a measure of the redundancy of the encoding).
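Both meanings can be sketched in a few lines. To keep the output small, the sketch uses all 3-bit patterns as a stand-in for a code with no redundancy (like 7-bit ASCII) and then appends an even parity bit; the names `distance` and `code_distance` are made up.

```python
from itertools import combinations


def distance(a: str, b: str) -> int:
    """First meaning: number of bit positions in which two encodings differ."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def code_distance(codewords) -> int:
    """Second meaning: minimum distance over all distinct pairs of codewords."""
    return min(distance(a, b) for a, b in combinations(codewords, 2))


print(distance("1100", "1001"), distance("0101", "1101"))

plain = [format(i, "03b") for i in range(8)]             # every pattern is a codeword
with_parity = [w + str(w.count("1") % 2) for w in plain]  # even parity bit appended
print(code_distance(plain), code_distance(with_parity))
```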
Editing, Assembling, Linking, and Loading
– There are three components to the Instructional SPARC Emulator
(ISEM) package that we use for this class:
the assembler,
the linker, and
the emulator/debugger.
Editing
– There are a number of programs that you can use to create your source files.
Emacs is probably the most popular;
vi is also available, but its command syntax is difficult to learn and use;
using the pine program, you can use the pico editor, which combines many features of Emacs into a simple menu-driven facility.
– Start Emacs by “xemacs sourcefile.s &”, which creates the file called sourcefile.s.
– Use the tutorial, accessed by typing "Ctrl-H Ctrl-H t".
– For other editors, you are on your own.
% type xmp0.s
        .data                   ! Assembler directive: data starts here. A_m, B_m, and
A_m:    .word   '?'             ! C_m are symbolic constants. Furthermore, each
B_m:    .word   0x30            ! is an address of a certain-sized chunk of memory. Here,
C_m:    .word   0               ! each chunk is four bytes (one word) long. When the
                                ! program gets loaded, each of these chunks stores a
                                ! number in 2's complement encoding, as follows: At
                                ! address C_m, zero; at B_m, 48; at A_m, 0x3F = 077 = 63.
        .text                   ! Assembler directive, instructions start here
start:                          ! Label (symbolic constant) for this address
        set     A_m, %r2        ! Put address A_m into register 2
        ld      [%r2], %r2      ! Use r2 as an indirect address for a load (read)
        set     B_m, %r3        ! Put address B_m into register 3
        ld      [%r3], %r3      ! Read from B_m and replace r3 w/ value at addr B_m
        sub     %r2, %r3, %r2   ! Subtract r3 from r2, save in r2
        set     C_m, %r4        ! Put address C_m into register 4
        st      %r2, [%r4]      ! Store (write) r2 to memory at address C_m
terminate:                      ! Label for address where 'ta 0' instruction stored
        ta      0               ! Stop the program
beyond_end:                     ! Label for address beyond the end of this program
Assembling
– The assembler is called "isem-as", and is the GNU Assembler
(GAS), configured to cross-assemble to a SPARC object format.
– It is used to take your source code, and produce object code that may be linked and run on the ISEM emulator.
– The syntax for invoking the assembler is: isem-as [-a[ls]] sourcefile.s -o objectfile.o
– The input is read from sourcefile.s, and the output is written to objectfile.o.
– The option "-a" tells the assembler to produce a listing file. The sub-options "l" and "s" tell the assembler to include the assembly source in the listing file and produce a symbol table, respectively.
The listing file
– Will identify all the syntactic errors in your program, and it will warn you if it identifies "suspicious" behavior in your source file.
– Column 1 identifies a line number in your source file.
– Column 2 is an offset for where this instruction or data resides in memory.
– Column 3 is the image of what is put in memory, either the machine instructions or the representation of the data.
– The final column is the source code that produced the line.
– At the bottom of the file you will find the symbol table.
– Again, the symbols are represented as offsets that are relocated when the program is loaded into memory.
1 .data
2 0000 0000003F A_m: .word ’?’
3 0004 00000030 B_m: .word 0x30
4 0008 00000000 C_m: .word 0
5 000c 00000000 .text
6 start:
7 0000 05000000 set A_m, %r2
7 8410A000
8 0008 C4008000 ld [%r2], %r2
9 000c 07000000 set B_m, %r3
9 8610E000
10 0014 C600C000 ld [%r3], %r3
11 0018 84208003 sub %r2, %r3, %r2
12 001c 09000000 set C_m, %r4
12 88112000
13 0024 C4210000 st %r2, [%r4]
14 terminate:
15 0028 91D02000 ta 0
16 002c 01000000 beyond_end:
DEFINED SYMBOLS
xmp0.s:2 .data:00000000 A_m
xmp0.s:3 .data:00000004 B_m
xmp0.s:4 .data:00000008 C_m
xmp0.s:6 .text:00000000 start
xmp0.s:14 .text:00000028 terminate
xmp0.s:16 .text:0000002c beyond_end
NO UNDEFINED SYMBOLS
Listing columns, left to right: line in source file (.s); offset to address in memory; contents at address in memory; source text. Labels are symbolic offsets.
Linking
– Linking turns a set of raw object file(s) into an executable program.
– From the manual page, "ld combines a number of object and archive files, relocates their data and ties up symbol references. Often the last step in building a new compiled program to run is a call to ld."
– Several object files are combined into one executable using ld; the separate files could reference symbols from one another.
– The output of the linker is an executable program.
– The syntax for the linker is as follows: isem-ld objectfile.o [-o execfile]
Examples
% isem-ld foo.o -o foo Links foo.o into the executable foo.
% isem-ld foo.o Links foo.o into the executable a.out.
Loading/Running
– Execute the program and test it in the emulation environment.
– The program "isem" is used to do this, and the majority of its features are covered in your lab manual.
– Invoke isem as follows: isem [execfile]
Examples
% isem foo Invokes the emulator, loads the program foo
% isem Invokes the emulator, no program is loaded
– Once you are in the emulator, you can run your program by typing "run" at the prompt.
% isem xmp0
Instructional SPARC Emulator
Copyright 1993 - Computer Science Department
University of New Mexico
ISEM comes with ABSOLUTELY NO WARRANTY
ISEM Ver 1.00d : Mon Jul 27 16:29:45 EDT 1998
Loading File: xmp0
2000 bytes loaded into Text region at address 8:2000
2000 bytes loaded into Data region at address a:4000
PC: 08:00002020 nPC: 00002024 PSR: 0000003e N:0 Z:0 V:0 C:0
start : sethi 0x10, %g2
ISEM> run
Program exited normally.
Assembly language programs are not notoriously chatty.
reg
– Gives values of all 32 general registers
– Also PC
ISEM> reg
----0--- ----1--- ----2--- ----3--- ----4--- ----5--- ----6--- ----7---
G 00000000 00000000 0000000f 00000030 00004008 00000000 00000000 00000000
O 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
L 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
I 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
PC: 08:0000204c nPC: 00002050 PSR: 0000003e N:0 Z:0 V:0 C:0
beyond_end : sethi 0x0, %g0
symb
– Shows the resolved values of all symbolic constants
ISEM> symb
Symbol List
A_m : 00004000
B_m : 00004004
.
.
.
terminate : 00004028
dump [addr]
– Either symbol or hex address
– Gives the values stored in memory
ISEM> dump A_m
0a:00004000 00 00 00 3f 00 00 00 30 00 00 00 0f 00 00 00 00 ...?...0........
0a:00004010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0a:00004020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
break [addr]
– Set breakpoints in execution
– Once execution is stopped, you can look at the contents of registers and memory.
trace
– Causes one (or more) instruction(s) to be executed
– Registers are displayed
– Handy for sneaking up on an error when you’re not sure where it is.
For the all-time “most wanted” list of errors (and their fixes)
– http://www.cse.ohio-state.edu/~heym/360/common/faq.html
Terminology from Ch. 2:
– Flip flop: basic storage device that holds 1 bit
– D flip flop: special flip flop that outputs the last value that was input to it (a data signal).
– Clock: two different meanings: (1) a control signal that oscillates (low to high voltage) every x nanoseconds;
(2) the “write select” line for a flip flop.
[Figure: D flip flop with Data In, Data Out, and Clock lines; the clock waveform marks one cycle.]
– Register: collection of flip flops with parallel load.
Clock (or “write select”) signal controlled. Stores instructions, addresses, operands, etc.
– Bus: Collection of related data lines (wires).
[Figure: 8-bit register with an 8-line input bus (d7 … d0), a Clock line, and an 8-line output bus.]
– Combinational circuits: implement Boolean functions.
No feedback in the circuit, output is strictly a function of input.
Gates: and, or, not, xor
[Figure: gate symbols for AND, OR, NOT, and XOR.]
E.g., xy + z
[Figure: circuit and truth table with columns x, y, z, f.]
– Gates can be used in combination to implement a simple (half) adder.
Addition creates a value, plus a carry-out.
Z = X XOR Y
CO = X AND Y
X Y Z CO
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1
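The truth table above can be reproduced with Python's bitwise operators standing in for the gates (`^` for XOR, `&` for AND). The function name `half_adder` is made up for illustration.

```python
def half_adder(x: int, y: int):
    """Return (Z, CO): the sum bit and the carry-out of two one-bit inputs."""
    z = x ^ y    # XOR gate: 1 when exactly one input is 1
    co = x & y   # AND gate: 1 only when both inputs are 1
    return z, co


print("X Y Z CO")
for x in (0, 1):
    for y in (0, 1):
        z, co = half_adder(x, y)
        print(x, y, z, co)
```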
[Figure: half adder circuit with inputs X, Y and outputs Z, CO.]
– Sequential Circuits: introduce feedback into the circuit.
Outputs are functions of input and current state.
[Figure: D flip flop with inputs D and C and output Q.]
– Multiplexers: combinational circuits that use n bits to select an output from $2^n$ input lines.
[Figure: 4-to-1 MUX with inputs i0, i1, i2, i3, select lines s0, s1, and output f.]
Von Neumann Architecture
– Can access either instructions or data from memory in each cycle.
– One path to memory (von Neumann bottleneck)
– Stored program system. No distinction between programs and data
[Figure: block diagram with the Main Memory System, Address Pathway, Data and Instruction Pathway, Operational Registers, Arithmetic and Logic Unit, Program Counter, Control Unit, and Input/Output System.]
Examples of Von Neumann architecture to be explored in this course:
SAM: tiny, good for learning architecture
MIPS: text’s example assembly language
SPARC: labs
M68HC11: used in ECE 567 (taken by CSE majors)
Roughly, the order of presentation in this course is as follows:
A couple of days on the Main Memory System
Weeks on the Central Processing Unit (CPU)
Finish the course with the I/O System
Memory: Can be viewed as an array of storage elements.
– The index of each element is called the address .
– Each element holds the same number of bits. How many bits per element? 8, 16, 32, 64?
[Figure: memory drawn as arrays of elements at addresses 0, 1, 2, …, n-1, for element widths of 8 bits (= 1 byte), 16 bits, 32 bits, and 64 bits.]
• If a machine’s memory is 5-bit addressable, then, at each distinct address, 5 bits are stored. The contents at each address are represented by 5 bits.
• If 3 bits are used to represent memory addresses, then the memory can have at most $2^3 = 8$ distinct addresses.
• Such a memory can store at most $8 \times 5 = 40$ bits of data.
• If the data bus is 10 bits wide, then up to 10 bits at a time can be transferred between memory and processor; this is a 10-bit word.
Address (Decimal)  Address (Binary)  Contents
0                  000               00011
1                  001               01111
2                  010               01110
3                  011               10100
4                  100               00101
5                  101               01110
6                  110               10100
7                  111               10011
Let’s look deeper.
– Suppose each memory element is stored in a bank and given a relative address.
– You could have several such banks in your memory.
– The GLOBAL address of each element would be:
[relative address] & [bank address] .
– To get two elements at a time, start reading from bank 0 (don’t start from bank 1; this would be a
“memory address not aligned” error).
[Figure: two banks, Bank 0 and Bank 1, each holding relative addresses 000 through 101.]
Relative address  Global address in Bank 0  Global address in Bank 1
000               000 0                     000 1
001               001 0                     001 1
010               010 0                     010 1
011               011 0                     011 1
100               100 0                     100 1
101               101 0                     101 1
Global addresses, not contents. Think of the contents as being underneath the global addresses.
– Memory alignment: Assume a byte addressable machine with 4-byte words. Where are operands of various sizes positioned?
bytes: on a byte boundary (any address)
half words: on half word boundary (even addresses)
words: on word boundary (addresses divisible by 4)
double words: on double word boundary (addresses divisible by 8)
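The boundary rules reduce to a divisibility test, sketched below. The names `SIZES` and `is_aligned` are made up, and the sizes assume the byte addressable machine with 4-byte words described above.

```python
SIZES = {"byte": 1, "half word": 2, "word": 4, "double word": 8}  # sizes in bytes


def is_aligned(address: int, kind: str) -> bool:
    """An operand is aligned when its address is divisible by its size in bytes."""
    return address % SIZES[kind] == 0


for address in (0, 2, 4, 6, 8):
    ok = [kind for kind in SIZES if is_aligned(address, kind)]
    print(address, ok)
```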
Byte ordering: how numeric data is stored in memory
– Ex.: $247896511_{10}$ = $0EC699BF_{16}$
– Stored at address 0
Byte address  Big Endian  Little Endian
0             0E          BF
1             C6          99
2             99          C6
3             BF          0E
Big Endian: high order (big end) is at byte 0
Little Endian: low order (little end) is at byte 0
Contrast with bit ordering:
Bit    7 6 5 4 3 2 1 0
Value  1 0 1 1 1 1 1 1
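Python's `int.to_bytes` can show both layouts for the example value. The helper `byte_layout` is made up for illustration; index 0 of the returned list corresponds to byte address 0.

```python
def byte_layout(value: int, order: str, width: int = 4):
    """List the bytes of `value` in memory order for 'big' or 'little' endian storage."""
    return [f"{b:02X}" for b in value.to_bytes(width, order)]


value = 247896511
print(hex(value))
print("big   :", byte_layout(value, "big"))
print("little:", byte_layout(value, "little"))

# Reading the little-endian bytes back with the matching order recovers the value.
print(int.from_bytes(bytes.fromhex("BF99C60E"), "little"))
```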
Read/Write operations: must know the address to read or write.
(read = fetch = load, write = store)
CPU puts address on address bus
CPU sends read signal
– (R/W=1, CS=1)
– (Read/don’t Write, Chip Select)
Wait
Memory puts data on data bus
– reset (CS=0)
[Figure: memory chip with address lines A0, A1, …, A(m-1), control lines CS and R/W, and data lines D0, D1, …, D(n-1).]
– Types of memory:
ROM: Read Only Memory: non-volatile (doesn’t get erased when powered down; it’s a combinational circuit!)
PROM: Programmable ROM: use a ROM burner to write data to it initially. Can’t be re-written.
EPROM: Erasable PROM. Uses UV light to erase.
EEPROM: Electrically Erasable PROM.
RAM: Random access memory. Can efficiently read/write any location (unlike sequential access memory). Used for main memory.
– Many variations (types) of RAM, all volatile
• SDRAM, DDR SDRAM
• RDRAM
• www.tomshardware.com
CPU: executes instructions -- primitive operations that the computer can perform.
– E.g.,
arithmetic: A+B
data movement: A := B
control: if expr goto label
logical: AND, OR, XOR…
Instructions specify both the operation and the operands . An encoded operand is often a location in memory where the value of interest may be found (address of value of interest).
– Instruction set: all instructions for a machine.
Instruction format specifies number and type of operands.
Ex.: Could have an instruction like
ADD A, B, R
Where A , B , and R are the addresses of operands in memory.
The result is R := A+B .
Addr  Memory  Label
0     8       A
4     9       B
8     17      R
C
– Actually, the “instruction” might be represented in a source file as:
0x41444420412C20422C20520A . …
A D D A , B , R
As such, it is an assembly language instruction.
– An assembler might translate it to, say, 0x504C , the machine’s representation of the instruction.
As such, it is a machine language instruction.
Simple instruction set: the Accumulator machine.
– Simplify instruction set by only allowing one operand.
Accumulator implied to be the second operand.
– Accumulator is a special register. Similar to a simple calculator.
ADD addr     ACC <- ACC + M[addr]
SUB addr     ACC <- ACC - M[addr]
MPY addr     ACC <- ACC * M[addr]
DIV addr     ACC <- ACC / M[addr]
LOAD addr    ACC <- M[addr]
STORE addr   M[addr] <- ACC
Ex.: C = A*B + C*D
LOAD 20     ! 1) Acc<-M[20]
MPY 21      ! 2) Acc<-Acc*M[21]
STORE 30    !    M[30]<-Acc
LOAD 22     ! 3) Acc<-M[22]
MPY 23      ! 4) Acc<-Acc*M[23]
ADD 30      ! 5) Acc<-Acc+M[30]
STORE 22    !    M[22]<-Acc
Memory:
20   A
21   B
22   C
23   D
...
30   temp
Accumulator after each numbered step:
1)
2)
3)
4)
5)
– Machine language: Converting from assembly language to machine language is called assembling.
Assume 8-bit architecture. Each instruction may be 8 bits. 3 bits hold the op-code and 5 bits hold the operand.
Bits 7-5: op-code; bits 4-0: operand
How much memory can we address?
How many op-codes can we have?
Convert the mnemonic op-codes into binary codes.
Operation  Code
ADD        000
SUB        001
MPY        010
DIV        011
LOAD       100
STORE      101
Hand assemble our program:
LOAD 20     100 10100
MPY 21      010 10101
STORE 30    101 11110
...
Instructions are stored in consecutive memory:
Addr  Memory     Mnemonic
0     100 10100  LOAD A
1     010 10101  MPY B
2     101 11110  STORE temp
3     100 10110  LOAD C
4     010 10111  MPY D
5     000 11110  ADD temp
6     101 10110  STORE C
…
20    A
21    B
22    C
23    D
…
30    temp
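The hand-assembled program can be run on a tiny simulator. This is a sketch under stated assumptions, not the course's SAM tool: memory is 32 words of 8 bits, arithmetic wraps as unsigned 8-bit, the sample data values 2, 3, 4, 5 are invented, and because the instruction set above has no halt instruction the caller says how many instructions to execute.

```python
OPCODES = {0b000: "ADD", 0b001: "SUB", 0b010: "MPY",
           0b011: "DIV", 0b100: "LOAD", 0b101: "STORE"}


def run(memory: list, steps: int) -> int:
    """Execute `steps` instructions starting at address 0; return the final ACC."""
    acc, pc = 0, 0
    for _ in range(steps):
        instruction = memory[pc]         # fetch
        pc += 1
        op = OPCODES[instruction >> 5]   # top 3 bits: op-code
        addr = instruction & 0b11111     # low 5 bits: operand address
        if op == "LOAD":
            acc = memory[addr]
        elif op == "STORE":
            memory[addr] = acc
        elif op == "ADD":
            acc = (acc + memory[addr]) & 0xFF
        elif op == "SUB":
            acc = (acc - memory[addr]) & 0xFF
        elif op == "MPY":
            acc = (acc * memory[addr]) & 0xFF
        elif op == "DIV":
            acc = acc // memory[addr]
    return acc


memory = [0] * 32
program = [0b10010100, 0b01010101, 0b10111110, 0b10010110,
           0b01010111, 0b00011110, 0b10110110]
memory[0:len(program)] = program
memory[20:24] = [2, 3, 4, 5]  # A, B, C, D (sample values)
run(memory, len(program))
print("temp =", memory[30], " C =", memory[22])  # C becomes A*B + C*D
```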
[Figure: accumulator machine datapath. PC (with INC), IR (Op and Addr fields, with Decode), Timing and Control, MAR, MDR, ACC, ALU, and Memory share a common Bus; the control points are numbered 0 through 14.]
– Control signals: control functional units to determine order of operations, access to bus, loading of registers, etc.
Number  Operation
0       ACC -> bus
1       load ACC
2       PC -> bus
3       load PC
4       load IR
5       load MAR
6       MDR -> bus
7       load MDR
8       ALU -> ACC
9       INC -> PC
10      ALU operation
11      ALU operation
12      Addr -> bus
13      CS
14      R/W
State  Control signals
Fetch
0      PC to bus, load MAR, INC to PC, load PC
1      CS, R/W
2      MDR to bus, load IR
3      Addr to bus, load MAR
Execute
If OP=store:
4      ACC to bus, load MDR
5      CS
Otherwise:
6      CS, R/W
7      If OP=load: MDR to bus, load ACC
8      Otherwise: MDR to bus, ALU op, ALU to ACC, load ACC
Put the address of the next instruction in the Addr Register and Inc. PC.
[Figure: datapath and fetch/execute state diagram, as above.]
Fetch the word of memory at Address, and load into Data Register.
[Figure: datapath and fetch/execute state diagram, as above.]
Send the word from the Data Register to the Instruction Register.
[Figure: datapath and fetch/execute state diagram, as above.]
Put the address from the instruction in the Address Register.
[Figure: datapath and fetch/execute state diagram, as above.]
PC
MAR
MDR
IR
ACC
Take the value from the ACCumulator and store it in the Data Register.
[Figure: datapath and fetch/execute state diagram, as above.]
Write the data from the Data Register to the address stored in the MAR.
[Figure: datapath and fetch/execute state diagram, as above.]
Load the word at the Address from the Addr Reg into the Data Register.
[Figure: datapath and fetch/execute state diagram, as above.]
PC
MAR
MDR
IR
ACC
Load the word from Data Register into the ACCumulator.
[Figure: datapath and fetch/execute state diagram, as above.]
Use word from the Data Register for Arith Op and put result in ACC.
[Figure: datapath and fetch/execute state diagram, as above.]
• What is necessary to implement a new instruction?
• New states?
• New control signals?
• New fetch/execute cycle?
• An Example:
• SWAP: Exchange value in Accumulator with value at Address
• SWAP addr ! Acc <- #M[addr], M[addr] <- #Acc
What changes to fetch/execute cycle?
– The fetch part of the cycle usually remains the same.
– Recall the values stored in registers after each state
E.g., After State 6,
what values are in each register?
– PC
– MAR
– MDR
– IR
– ACC
Handy to have #M[addr] in MDR
– Start after state 6 then…
[Figure: fetch/execute state diagram, as above.]
Save the Data value from the MDR in the Address Register.
MDR -> bus, load MAR
[Figure: datapath, as above.]
Send the ACCumulator value to the Data Register.
ACC -> bus, load MDR
[Figure: datapath, as above.]
Put the saved value from the MAR into the ACCumulator.
MAR -> bus, load ACC
Note: there is no control signal in the current architecture opposite of 5 (Load MAR), so we would have to create a new control signal (MAR to bus) in addition to creating these new states.
[Figure: datapath, as above.]
Put (reload) the address from the instruction in the Address Register.
Control signals: Addr -> bus, load MAR.
[Figure: datapath diagram with these signals highlighted.]
CSE360 82
New State 13 (Old 5): Control Signals 13
Write the data from the Data Register to the address stored in the MAR.
Control signal: CS.
[Figure: datapath diagram with this signal highlighted.]
CSE360 83
Changes to States, added 9 thru 13
Changes to Signals, added 15: MAR -> bus
Changes to Fetch/Execute, new register transfer language (RTL)
PC -> bus, load MAR, INC -> PC, Load PC
CS, R/w
MDR -> bus, load IR
Addr -> bus, load MAR
CS, R/w
MDR -> bus, load MAR
ACC -> bus, load MDR
MAR -> bus, load ACC
Addr -> bus, load MAR
CS
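The SWAP execute states above can be sketched as a short simulation. This is only an illustration on a toy model: the function name swap_execute, the dictionary used as memory, and the sample values are assumptions, not part of the course's machine description.

```python
def swap_execute(memory, acc, addr):
    """Run the SWAP execute states on a toy accumulator machine."""
    mar = addr           # left by the fetch state "Addr -> bus, load MAR"
    mdr = memory[mar]    # CS, R/W: read M[addr] into MDR
    mar = mdr            # MDR -> bus, load MAR: park the old M[addr] in MAR
    mdr = acc            # ACC -> bus, load MDR
    acc = mar            # MAR -> bus, load ACC (needs the new signal)
    mar = addr           # Addr -> bus, load MAR: restore the address
    memory[mar] = mdr    # CS: write MDR to M[addr]
    return acc


memory = {100: 7}        # hypothetical memory: address -> value
acc = swap_execute(memory, acc=3, addr=100)
print(acc, memory[100])
```

The MAR doubles as a temporary register here, which is why the address has to be reloaded from the IR before the final write.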
CSE360 84
RISC vs. CISC
– Complex Instruction Set Computer (CISC): many, powerful instructions. Grew out of the need for high code density. Instructions have varying lengths, number of operands, formats, and clock cycles in execution.
– Reduced Instruction Set Computer (RISC): fewer, less powerful, optimized instructions. Grew out of opportunity for simpler, faster hardware. Instructions have fixed length, number of operands, formats, and similar number of clock cycles in execution.
CSE360 85
Motivation: memory is comparatively slow.
– 10x to 20x slower than processor.
– Need to minimize number of trips to memory.
Provide faster storage in the processor: registers.
Registers (16, 32, 64 bits wide) are used for intermediate storage for calculations, or repeated operands.
Accumulator machine
– One data register -- ACC.
– 2 memory accesses per instruction -- one for the instruction and one for the operand.
Add more registers (R0, R1, R2, …, Rn)
CSE360 86
How many addresses to specify?
– With binary operations, need to know two source operands, a destination, and the operation.
E.g., op (dest_operand) (src_op1) (src_op2)
– Based on number of operands, could have:
3 addr. machine: both sources and dest are named.
2 addr. machine: both sources named, dest is a source.
1 addr. machine: one source named, other source and dest. is the accumulator.
0 addr. machine: all operands implicit and available on the stack .
CSE360 87
1-address architecture: a := a*b + c*d*e
– Memory only
Code           # mem refs
LOAD  100      2
MPY   104      2
STORE 100      2
LOAD  108      2
MPY   112      2
MPY   116      2
ADD   100      2
STORE 100      2
– Using registers
Code           # mem refs
LOAD  100      2
MPY   104      2
STORE R2       1
LOAD  108      2
MPY   112      2
MPY   116      2
ADD   R2       1
STORE 100      2
1½-address architecture: at least one operand must always be a register. (½ address is register, 1 address is the memory operand: LOAD 100, R1).
– Like an accumulator machine, but with many accumulators.
CSE360 88
3-address architecture: a := a*b + c*d*e
– Using memory only:
Code                                 # mem refs
MPY 100, 100, 104    ;a:=a*b
MPY 200, 108, 112    ;t:=c*d
MPY 200, 116, 200    ;t:=e*t
ADD 100, 200, 100    ;a:=t+a
– Using registers:
Code                                 # mem refs
MPY R2, 100, 104     ;t1:=a*b
MPY R3, 108, 112     ;t2:=c*d
MPY R3, 116, R3      ;t2:=e*t2
ADD 100, R3, R2      ;a:=t1+t2
Memory
100 (a)
104 (b)
108 (c)
112 (d)
116 (e)
...
200 (t)
– What about instruction size?
CSE360 89
2-address architecture: a := a*b + c*d*e
– Using memory only:
Code                             # mem refs
MPY  100, 104    ;a:=a*b         4
MOVE 200, 108    ;t:=c           3
MPY  200, 112    ;t:=t*d         4
MPY  200, 116    ;t:=t*e         4
ADD  100, 200    ;a:=t+a         4
– Using registers:
Code                             # mem refs
MPY  100, 104    ;a:=a*b         4
MOVE R2, 108     ;R2:=c          2
MPY  R2, 112     ;R2:=R2*d       2
MPY  R2, 116     ;R2:=R2*e       2
ADD  100, R2     ;a:=R2+a        3
Memory
100 (a)
104 (b)
108 (c)
112 (d)
116 (e)
...
200 (t)
– Most CISC arch. this way, making 1 operand implicit
CSE360 90
0-address architecture: a := a*b + c*d*e
– Stack machine: All operands are implicit. Only push and pop touch memory. All other operands are pulled from the top of stack, and result is pushed on top.
E.g., HP calculators.
Code       # mem refs
PUSH A     2
PUSH B     2
MPY        1
PUSH C     2
PUSH D     2
PUSH E     2
MPY        1
MPY        1
ADD        1
POP A      2
[Figure: stack with slots numbered 0 (bottom) to 4.]
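The stack-machine program above can be traced with a small interpreter. This is a minimal sketch, assuming one memory reference per instruction fetch plus one more for each PUSH or POP operand; the function name and the sample values for A through E are made up for illustration.

```python
def run_stack_program(program, memory):
    """Execute a 0-address program and count memory references."""
    stack = []
    mem_refs = 0
    for line in program:
        parts = line.split()
        op = parts[0]
        mem_refs += 1                      # instruction fetch
        if op == "PUSH":
            stack.append(memory[parts[1]])
            mem_refs += 1                  # operand read
        elif op == "POP":
            memory[parts[1]] = stack.pop()
            mem_refs += 1                  # operand write
        elif op in ("MPY", "ADD"):
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right if op == "MPY" else left + right)
        else:
            raise ValueError(f"unknown instruction: {line}")
    return mem_refs


program = ["PUSH A", "PUSH B", "MPY", "PUSH C", "PUSH D",
           "PUSH E", "MPY", "MPY", "ADD", "POP A"]
memory = {"A": 2, "B": 3, "C": 4, "D": 5, "E": 6}   # sample values only
refs = run_stack_program(program, memory)
print(memory["A"], refs)
```

With these sample values the program leaves a*b + c*d*e in A, and the reference count matches the column total in the table.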
CSE360 91
Load/Store Architectures -- RISC
– Use of registers is simple and efficient. Therefore, the only instructions that can access memory are load and store . All others reference registers.
Code                                               # mem refs
LOAD  R2, 100       ;R2 <- a                       2
LOAD  R3, 104       ;R3 <- b                       2
LOAD  R4, 108       ;R4 <- c                       2
LOAD  R5, 112       ;R5 <- d                       2
LOAD  R6, 116       ;R6 <- e                       2
MPY   R2, R2, R3    ;R2 <- a*b                     1
MPY   R3, R4, R5    ;R3 <- c*d                     1
MPY   R3, R3, R6    ;R3 <- (c*d)*e                 1
ADD   R2, R2, R3    ;R2 <- a*b+(c*d)*e             1
STORE 100, R2       ;a <- a*b+(c*d)*e              2
CSE360 92
Why load/store architectures?
– Number of instructions (hence, memory references to fetch them) is high, but can work without waiting on memory.
– Claim: overall execution time is lower. Why?
Clock cycle time is lower (no micro code interpretation).
More room in CPU for registers and memory cache.
Easier to overlap instruction execution through pipelining .
– Side effects:
Register interlock: delaying execution until memory read completes.
Instruction scheduling: rearranging instructions to prevent register interlock (loads on SPARC) and to avoid wasting the results of pipelined execution (branches on SPARC).
CSE360 93
SPARC ( S calable P rocessor ARC hitecture)
– Used in Sun workstations, descended from RISC-II developed at UC Berkeley
– General Characteristics:
32-bit word size (integer, address, register size, etc.)
Byte-addressable memory
RISC load/store architecture, 32-bit instruction, few addressing modes
Many registers (32 general purpose, 32 floating point, various special purpose registers)
– ISEM: Instructional SPARC Emulator - nicer than a real machine for learning to write assembly language programs.
CSE360 94
Structure
– Line oriented: 4 types of lines
Blank - Ignored
Labeled - Any line may be labeled. Creates a symbol in listing. Labels must begin with a letter (other than ‘L’), followed by any alphanumeric characters. A label must end with a colon “:”. A label just assigns a name to an address.
Assembler Directives - E.g., .data, .word, .text, etc.
Instructions
– Comments start after the “!” character and go to the end of the line.
        .data
x_m:    .word 0x42
y_m:    .word 0x20
z_m:    .word 0
        .text
start:  set x_m, %r2        ! Load x into reg 2
        ld [%r2], %r2
        set y_m, %r3        ! Load y into reg 3
        ld [%r3], %r3
CSE360 95
Directives: Instructions to the assembler
– Not executed by the machine
.data -- following section contains declarations
– Each declaration reserves and initializes a certain number of bits of storage for each of zero or more operands in the declaration.
• .word -- 32 bits
• .half -- 16 bits
• .byte -- 8 bits
E.g.,
        .data
w:      .half 27000
x:      .byte 8
y:      .byte 'm', 0x6e, 0x0, 0, 0
z:      .word 0x3C5F
.text -- following section contains executable instructions
CSE360 96
Registers -- 32 bits wide
– 32 general purpose integer registers, known by several names to the assembler
%r0-%r7 also known as %g0-%g7 global registers -- Note,
%r0 always contains value 0.
%r8-%r15 also known as %o0-%o7 output registers
%r16-%r23 also known as %l0-%l7 local registers
%r24-%r31 also known as %i0-%i7 input registers
Use the %r0-%r31 names for now. Other names are used in procedure calls.
– 32 floating point registers %f0-%f31 . Each reg. is single precision. Double prec. uses reg. pairs.
CSE360 97
Assembly language
– 3-address operations - format different from book
        op src1, src2, dest          ! opposite of text
E.g.,   add %r1, %r2, %r3            ! %r3 <- %r1 + %r2
        or %r2, 0x0004, %r2          ! %r2 <- %r2 bitwise-or 0x0004
– Contrast SPARC with MIPS (used in the book)
indirect address notation: @addr vs [addr]
operand order, especially the destination register
register notation: R2 vs. %r2
branches
CSE360 98
CSE360
– 2-address operations: load and store
        ld [addr], %r2      ! %r2 <- M[addr]
        st %r2, [addr]      ! M[addr] <- %r2
Often use set to put an address (a label, a symbolic constant) into a register, followed by ld to load the data itself.
        set x_m, %r1        ! put addr x_m into %r1
        ld [%r1], %r2       ! use addr in %r1 to load %r2
– Immediate values: instruction itself contains some data to be used in execution.
99
CSE360
– Immediate values (continued)
E.g., add %rs, siconst13, %rd      ! %rd <- %rs + const
Constant is coded into instruction itself, therefore available after fetching the instruction (no extra trip to memory for an operand).
On SPARC, no special notation for differentiating constants from addresses because no ambiguity in a load/store architecture.
Immediate value is coded as a 13-bit sign-extended value. Range is, then, -2^12 … 2^12 - 1, or -4096 to 4095.
Immediate values can be specified in decimal, hexadecimal, octal, or binary.
E.g., add %r2, 0x1A, %r2           ! %r2 <- %r2 + 26
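The 13-bit range quoted above can be checked with a short sketch. The helper names sign_extend_13 and fits_simm13 are hypothetical; they only model how a 13-bit two's-complement field is widened, not any assembler API.

```python
def sign_extend_13(field):
    """Interpret the low 13 bits of field as a two's-complement integer."""
    field &= 0x1FFF                 # keep 13 bits
    if field & 0x1000:              # bit 12 is the sign bit
        return field - 0x2000
    return field


def fits_simm13(value):
    """True if value can be encoded as a 13-bit signed immediate."""
    return -4096 <= value <= 4095


print(sign_extend_13(0x1FFF), sign_extend_13(0x0FFF), sign_extend_13(0x1000))
print(fits_simm13(0x1A), fits_simm13(0x4004))
```

A constant such as 0x4004 does not fit, which is one reason a synthetic instruction like set is needed for full 32-bit values.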
100
CSE360
– Synthetic Instructions: assembler translates one “instruction” into several machine instructions.
set : used to load a 32-bit signed integer constant into a register. Has 2 operands - 32 bit value and register number.
How does that fit into a 32 bit instruction?
E.g.,   set iconst32, %rd
        set -10, %r3
        set x_m, %r4
        set '=', %r8
clr %rd : used to set all bits in a register to 0. How?
mov %rs, %rd : copies a register.
neg %rs, %rd : copies the negation of a register.
101
CSE360
– Operand sizes
double word = 8 bytes, word = 4 bytes, half word = 2 bytes, byte = 8 bits. Recall memory alignment issues.
        set x_m, %r2        ! Put addr x_m in %r2
        ld [%r2], %r1       ! load word
        ldsb [%r2], %r1     ! load byte, sign extended
        ldub [%r2], %r1     ! load byte, extend with 0's
        st %r1, [%r2]       ! store word, addr is mult of 4
        stb %r1, [%r2]      ! store byte, any address
        sth %r1, [%r2]      ! store half word, address is even
– Characters use 8 bits
        ldub to load a character
        stb to store a character
102
CSE360
– Traps : provides initial help with I/O, also used in operating systems programming.
ta 0 : terminate program
ta 1 : output ASCII character from %r8
ta 2 : input ASCII character into %r8
ta 4 : output integer from %r8 in unsigned hexadecimal
ta 5 : input integer into %r8, can be decimal, octal, or hex
E.g.,   set '=', %r8        ! put '=' in %r8
        ta 1                ! output the '='
        ta 5                ! read in value into %r8
        mov %r8, %r1        ! copy %r8 into %r1
        set 0x0a, %r8       ! load a newline into %r8
        ta 1                ! output the newline
103
CSE360
– More assembler directives (.asciz and .ascii):
Each of the following two directives is equivalent:
– msg01: .asciz "a phrase"
– msg01: .byte 'a', ' ', 'p', 'h', 'r'
.byte 'a', 's', 'e', 0
Note that .asciz generates one byte for each character between the quote (") marks in the operand, plus a null byte at the end.
The .ascii directive does not generate that extra byte. Each of the following three directives is equivalent:
– digits: .ascii "0123456789"
– digits: .byte '0', '1', '2', '3', '4', '5'
.byte '6', '7', '8', '9'
– digits: .byte 0x30, 0x31, 0x32, 0x33, 0x34
.byte 0x35, 0x36, 0x37, 0x38, 0x39
104
CSE360
– Quick review of instructions so far:
ld [addr], %rd              ! %rd <- M[addr]
st %rd, [addr]              ! M[addr] <- %rd
op %rs1, %rs2, %rd          ! op is ALU op
op %rs, siconst13, %rd      ! %rd <- %rs op const
set siconst32, %rd          ! %rd <- const
ta #                        ! trap signal
– Have actually seen many more variants, e.g., ldub , ldsb , sth , clr , mov , neg , add , sub , smul , sdiv , umul , udiv , etc. Can evaluate just about any simple arithmetic expression.
105
        .data
x_m:    .word 0xa1b2c3d4
        .skip 12
        .text
        set x_m, %r2
        ld [%r2], %r3
        ldsb [%r2], %r4
        ldub [%r2], %r5
        st %r3, [%r2+4]
        sth %r3, [%r2+8]
        stb %r3, [%r2+12]
        ta 0
After this runs, what values are in %r2-5, and memory locations starting at byte address x_m?
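One way to check an answer to this exercise is to model the 16 bytes starting at x_m. This is a sketch under two assumptions: SPARC stores words big-endian (most significant byte at the lowest address), and the .skip area starts out zeroed. Offsets into the bytearray stand in for x_m+0 through x_m+15.

```python
# x_m word followed by the 12 skipped bytes (assumed zero-filled)
memory = bytearray.fromhex("a1b2c3d4") + bytearray(12)

r3 = int.from_bytes(memory[0:4], "big")                        # ld
r4 = int.from_bytes(memory[0:1], "big", signed=True) & 0xFFFFFFFF  # ldsb
r5 = memory[0]                                                 # ldub

memory[4:8] = r3.to_bytes(4, "big")                 # st  %r3, [%r2+4]
memory[8:10] = (r3 & 0xFFFF).to_bytes(2, "big")     # sth %r3, [%r2+8]
memory[12:13] = (r3 & 0xFF).to_bytes(1, "big")      # stb %r3, [%r2+12]

print(hex(r3), hex(r4), hex(r5))
print(memory.hex(" ", 4))
```

Register %r2 itself simply holds the address x_m; the half-word and byte stores write the low-order bits of %r3.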
CSE360 106
In addition to sequential execution, need ability to repeatedly and conditionally execute program fragments.
– High level language has: while , for , do , repeat , case , if-then-else , etc.
– Assembler has if , goto .
– Compare: high level vs. pseudo-assembler, implementation of f=n!
High level:
        f = 1;
        i = 2;
        while (i <= n)
        {
            f = f * i;
            i = i + 1;
        }
Pseudo-assembler:
        f = 1
        i = 2
loop:   if (i > n) goto done
        f = f * i
        i = i + 1
        goto loop
done:   ...
CSE360 107
CSE360
– Branch -- put a new address in the program counter.
Next instruction comes from the new address, effectively, a “goto”.
– Unconditional branch
(book)   BRANCH addr         ! PC <- addr
(SPARC)  ba addr             ! PC <- addr
– Conditional branch
(book) BRcc R1, R2, target
“if R1 cc R2 then PC <- target” and cc is comparison operation (e.g., LT is <, GE is >=, etc.)
108
CSE360
– Evaluating conditional branches
Evaluate condition
If condition is true, then PC <- target, else PC <- PC+1
– Consider changes to the fetch-execute cycle given earlier for accumulator machine. What needs to change?
[Figure: fetch/execute flowchart. Fetch ("PC to bus, etc.") is followed in Execute by decision boxes OP=BRANCH, OP=BRcc, and Cond=T, each with Yes/No exits, leading to the state "Addr to bus, load PC".]
109
CSE360
Other conditions (from text, very similar to MIPS)
BRLT Rn, Rm, target     ; if Rn < Rm then PC <- target
BRLE Rn, Rm, target     ; if Rn <= Rm then PC <- target
BREQ Rn, Rm, target     ; if Rn = Rm then PC <- target
BRNE Rn, Rm, target     ; if Rn != Rm then PC <- target
BRGE Rn, Rm, target     ; if Rn >= Rm then PC <- target
BRGT Rn, Rm, target     ; if Rn > Rm then PC <- target
Can implement high level control structures now. Back to the factorial example using the book’s assembly language:
        LOAD   R1, #1           ; R1 = f = 1
        LOAD   R2, #2           ; R2 = i = 2
        LOAD   R3, n            ; R3 = n
loop:   BRGT   R2, R3, done     ; branch if i > n
        MPY    R1, R1, R2       ; f = f * i
        ADD    R2, R2, #1       ; i = i + 1
        BRANCH loop             ; goto loop
done:   STORE  f, R1            ; f = n!
110
CSE360
– Condition Codes
Book’s assembly language has 3-address branches. SPARC uses 1-address branches. Must use condition codes.
Non-MIPS machines use condition codes to evaluate branches.
Condition Code Register (CCR) holds these bits. SPARC has
4-bit CCR.
N Z V C
N: Negative, Z: Zero, V: Overflow, C: Carry. All are shown in a trace , or in the reg command under ISEM.
Condition codes are not changed by normal ALU instructions.
Must use special instructions ending with cc , e.g., addcc .
111
        .text
start:  set 1, %r2
        set 0xFFFFFFFE, %r1         ! -2 in 32-bit 2's comp
cc_set: subcc %r1, %r2, %r3         ! r3 <= -2-1
end:    ta 0
ISEM> reg
----0--- ----1--- ----2--- ----3--- ----4--- ----5--- ----6--- ----7---
G 00000000 fffffffe 00000001 00000000 00000000 00000000 00000000 00000000
O 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
L 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
I 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
PC: 08:00002028 nPC: 0000202c PSR: 0000003e N:0 Z:0 V:0 C:0 cc_set : subcc %g1, %g2, %g3
ISEM> trace
----0--- ----1--- ----2--- ----3--- ----4--- ----5--- ----6--- ----7---
G 00000000 fffffffe 00000001 fffffffd 00000000 00000000 00000000 00000000
O 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
L 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
I 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
PC: 08:0000202c nPC: 00002030 PSR: 00b0003e N:1 Z:0 V:0 C:0
CSE360 112
CSE360
– Setting the condition codes
Regular ALU operations don’t set condition codes.
Use addcc , subcc , smulcc , sdivcc , etc., to set condition codes.
E.g., Suppose %r1 contains -4 and %r2 contains 5. Fill in the condition codes after each instruction:
                              N Z V C
        addcc %r1, %r2, %r3
        subcc %r1, %r2, %r3
        subcc %r2, %r1, %r3
        subcc %r1, %r1, %r3
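Answers to this exercise can be checked with a small model of the two instructions. This is a sketch, not ISEM's implementation: it assumes 32-bit registers, that addcc sets C to the carry out of bit 31, and that subcc sets C when an unsigned borrow occurs. The function names mirror the mnemonics for readability only.

```python
MASK = 0xFFFFFFFF


def addcc(a, b):
    """Return (result, (N, Z, V, C)) for a 32-bit add."""
    a &= MASK
    b &= MASK
    full = a + b
    r = full & MASK
    v = ((a ^ r) & (b ^ r)) >> 31      # operands agree in sign, result differs
    return r, (r >> 31, int(r == 0), v, full >> 32)


def subcc(a, b):
    """Return (result, (N, Z, V, C)) for a 32-bit subtract a - b."""
    a &= MASK
    b &= MASK
    r = (a - b) & MASK
    v = ((a ^ b) & (a ^ r)) >> 31      # operands differ in sign, result flips
    return r, (r >> 31, int(r == 0), v, int(a < b))


for name, op, x, y in [("addcc %r1,%r2", addcc, -4, 5),
                       ("subcc %r1,%r2", subcc, -4, 5),
                       ("subcc %r2,%r1", subcc, 5, -4),
                       ("subcc %r1,%r1", subcc, -4, -4)]:
    print(name, op(x, y)[1])
```

The same model reproduces the ISEM trace shown earlier: subtracting 1 from 0xFFFFFFFE gives N=1 with Z, V, and C clear.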
113
How does a computer add?
– Design a circuit that adds three single-digit binary numbers (x, y, and a carry in, c_in). Results in a sum, and a carry out (c_out).
x  y  c_in | Sum  c_out
0  0  0    | 0    0
0  0  1    | 1    0
0  1  0    | 1    0
0  1  1    | 0    1
1  0  0    | 1    0
1  0  1    | 0    1
1  1  0    | 0    1
1  1  1    | 1    1
[Figure: full adder (FA) block with inputs x, y, c_in and outputs Sum, c_out.]
CSE360 114
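Now cascade the full adder hardware
[Figure: register x and register y feed a chain of full adders (FA); the sums go to register z, the carry in of the rightmost FA is 0, and the leftmost FA produces c_out.]
How are CCR bits set? (Above is a ripple-carry adder.)
– C-bit = C_out
– V-bit = C_out XOR C_(n-1)
– Z-bit = NOT (rz_(n-1) OR rz_(n-2) OR rz_(n-3) OR ... OR rz_0)
– N-bit = rz_(n-1)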
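The full adder and the ripple-carry chain can be sketched bit by bit. This is an illustrative model only: the function names are made up, the sum-of-products form of c_out is one standard realization of the truth table, and V is taken as the carry out XOR the carry into the most significant bit.

```python
def full_adder(x, y, c_in):
    """One-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ c_in
    c_out = (x & y) | (c_in & (x ^ y))
    return s, c_out


def ripple_add(x, y, n=32):
    """Add two n-bit values with a chain of full adders; return result and CCR bits."""
    carry = 0
    carry_into_msb = 0
    result = 0
    for i in range(n):                     # bit 0 is the rightmost adder
        if i == n - 1:
            carry_into_msb = carry
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    flags = {"N": result >> (n - 1),
             "Z": int(result == 0),
             "V": carry ^ carry_into_msb,
             "C": carry}
    return result, flags


print(ripple_add(0xFFFFFFFC, 5))       # -4 + 5
print(ripple_add(0x7FFFFFFF, 1))       # largest positive + 1 overflows
```

The carry has to ripple through every stage in turn, which is why this adder's delay grows with the word size.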
CSE360 115
CSE360
– Branches use logic to evaluate CCR (SPARC)
Operation                         Assembler Syntax    Branch Condition
Branch always                     ba target           1 (always)
Branch never                      bn target           0 (never)
Branch not equal                  bne target          NOT Z
Branch equal                      be target           Z
Branch greater                    bg target           NOT (Z OR (N XOR V))
Branch less or equal              ble target          Z OR (N XOR V)
Branch greater or equal           bge target          NOT (N XOR V)
Branch less                       bl target           N XOR V
Branch greater, unsigned          bgu target          NOT (C OR Z)
Branch less or equal, unsigned    bleu target         C OR Z
Branch carry clear                bcc target          NOT C
Branch carry set                  bcs target          C
Branch positive                   bpos target         NOT N
Branch negative                   bneg target         N
Branch overflow clear             bvc target          NOT V
Branch overflow set               bvs target          V
116
CSE360
– Setting Condition Codes (continued)
Synthetic instruction cmp %rs1, %rs2
– Sets CCR, but doesn't modify any registers.
– Implemented as subcc %rs1, %rs2, %g0
Back to the factorial example (SPARC)
        set 1, %r1              ! %r1 = f = 1
        set 2, %r2              ! %r2 = i = 2
        set n, %r3              ! Get loc of n
        ld [%r3], %r3           ! Put n in %r3
loop:   cmp %r2, %r3            ! Set CCR (i?n)
        bg done                 ! i > n done
        nop                     ! Branch delay
        umul %r1, %r2, %r1      ! f = f * i
        add %r2, 1, %r2         ! i = i + 1
        ba loop                 ! Goto loop
        nop                     ! Branch delay
done:   set f, %r3              ! Get loc of f
        st %r1, [%r3]           ! f = n!
117
CSE360
– Branch delay slots: unique to RISC architecture
Non-technical explanation: processor is running so fast, it can’t make a quick turn.
– Instruction following branch is always executed.
Technical explanation: the efficiency advantage of pipelining is greater if the following instruction, which has almost completed execution, is allowed to complete.
Compilers take advantage of branch delay slots by putting a useful instruction there if possible.
For our purposes, use the nop (no operation) instruction to fill branch delay slots. Beware! Forgetting the nop will be a large source of errors in your programs!
118
Converting high level control structures
– You get to be the “compiler”.
Some compilers convert the source language (C, Pascal, Modula 2, etc.) into assembly language and then assemble the result to an object file. GNU C and C++ do this, targeting GAS (the GNU Assembler).
– if-then-else, while-do, repeat-until are all possible to create in a structured way in assembly language.
CSE360 119
General guidelines
– Break down into independent (or nested) logical units
– Convert to if/goto pseudo-code.
High level:
        f = 1;
        for (i=2; i<=n; i++)
            f = f * i;
if/goto:
        f = 1
        i = 2
loop:   if (i>n) goto done
        f = f*i
        i = i+1
        goto loop
done:   ...
– Mechanical, step-by-step, non-creative process
CSE360 120
CSE360
if-then-else
        if (a<b)
            c = d + 1;
        else
            c = 7;
if/goto:
        if (a >= b) goto else
        c = d + 1
        goto end
else:   c = 7
end:
SPARC:
init:   set a, %r2          ! get &a into r2
        ld [%r2], %r2       ! get a into r2
        set b, %r3          ! get &b into r3
        ld [%r3], %r3       ! get b into r3
if:     cmp %r2, %r3        ! a ?? b (want >=)
        bge else            ! a >= b, skip to else part
        nop
        set d, %r5          ! get &d into r5
        ld [%r5], %r5       ! get d into r5
        add %r5, 1, %r4     ! r4 <- d+1
        ba end
        nop
else:   set 7, %r4          ! get 7 into r4
end:    set c, %r5          ! get &c into r5
        st %r4, [%r5]       ! c <- r4
121
while loops:
        while (a<b)
            a = a+1;
        c = d;
if/goto:
whle:   if (a>=b) goto done
body:   a = a+1
        goto whle
done:   c = d
SPARC:
init:   set a, %r4          ! get &a into r4
        ld [%r4], %r2       ! get a into r2
        set b, %r3          ! get &b into r3
        ld [%r3], %r3       ! get b into r3
whle:   cmp %r2, %r3        ! a ?? b (want >=)
        bge done            ! a >= b skip body
        nop
body:   add %r2, 1, %r2     ! r2 = a + 1
        st %r2, [%r4]       ! a = a + 1
        ba whle             ! repeat loop body
        nop
done:   set c, %r5          ! get &c into r5
        ...
CSE360 122
CSE360
repeat-until loops:
        repeat
            …
        until (a>b)
if/goto:
repeat: …
        if (a<=b) goto repeat
SPARC:
rpt:    ...
        ...
        set a, %r2          ! get &a into r2
        ld [%r2], %r2       ! get a into r2
        set b, %r3          ! get &b into r3
        ld [%r3], %r3       ! get b into r3
        cmp %r2, %r3        ! a <= b?
        ble rpt             ! do body again
        nop
123
Complex condition
        if ((a<b) and (b>=c)) …
Primitive Language
        if (a>=b) then goto skip
        if (b<c) then goto skip
body:   ...
        ...
skip:   ...

        if ((a<b) or (b>=c)) …
Primitive Language
        if (a<b) then goto body
        if (b<c) then goto skip
body:   ...
        ...
skip:   ...
These can be combined and used in if/else or while loops.
CSE360 124
CSE360
– Optimizing code: change order of instructions, combine instructions, take advantage of branch delay slots.
Factorial example again. ( for i:=n downto 1 do… )
        set 1, %r1              ! %r1=f=1
        set n, %r2              ! Get loc of n
        ld [%r2], %r2           ! Put n in %r2
loop:   umul %r1, %r2, %r1      ! f=f*n
        subcc %r2, 1, %r2       ! Decrement n
        bg loop                 ! Repeat
        nop                     ! Branch delay
        set f, %r3              ! Get loc of f
        st %r1, [%r3]           ! f=n!
Reduced 7 instructions in loop to just 4.
(You gain no advantage if you optimize code in your labs.)
125
Remember lab0?
        .data
x_m:    .word 0x42
y_m:    .word 0x20
z_m:    .word 0
        .text
start:  set x_m, %r2
        ld [%r2], %r2
        set y_m, %r3
        ld [%r3], %r3
        and so on…
Suppose you gave this command to ISEM (after loading):
ISEM> dump start
start 05 00 00 10 84 10 a0 00 c4 00 80 00 07 00 00 10
Could you find the set instruction?
CSE360 126
First, Instruction Encoding is how instructions are assembled
– All instructions must fit into 32 bits.
Register-register: op=10, i=0
bits:    31-30  29-25  24-19  18-14  13  12-5  4-0
field:   op     rd     op3    rs1    i   asi   rs2
Register-immediate: op=10, i=1
bits:    31-30  29-25  24-19  18-14  13  12-0
field:   op     rd     op3    rs1    i   simm13
Floating point: op=10, i=0
field:   op     rd     op3    rs1    i   opf   rs2
CSE360 127
CSE360
Call instructions: op=01
bits:    31-30  29-0
field:   op     disp30
Branch instructions: op=00, op2=010
bits:    31-30  29          28-25  24-22  21-0
field:   op     a (annul)   cond   op2    disp22
SETHI instructions: op=00, op2=100
bits:    31-30  29-25  24-22  21-0
field:   op     rd     op2    imm22
Ex.: add %r2, %r3, %r4
bits:    31-30  29-25  24-19   18-14  13  12-5      4-0
value:   10     00100  000000  00010  0   00000000  00011
in hexadecimal: 88008003
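The field layout above can be turned into a small encoder to check the hexadecimal result. This is a sketch covering only the two op=10 formats shown; the function names are hypothetical, and the op3 values used (000000 for add, 000010 for or) are the ones visible in the worked encodings in these notes.

```python
def encode_reg_reg(op3, rd, rs1, rs2):
    """op=10, i=0: op | rd | op3 | rs1 | i | asi (zero) | rs2."""
    return (0b10 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14) | rs2


def encode_reg_imm(op3, rd, rs1, simm13):
    """op=10, i=1: op | rd | op3 | rs1 | i | simm13."""
    return ((0b10 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14)
            | (1 << 13) | (simm13 & 0x1FFF))


ADD, OR = 0b000000, 0b000010

print(f"{encode_reg_reg(ADD, rd=4, rs1=2, rs2=3):08x}")    # add %r2, %r3, %r4
print(f"{encode_reg_imm(OR, rd=3, rs1=3, simm13=4):08x}")  # or %r3, 4, %r3
```

Note that the destination register sits in the high-order bits of the word even though it is written last in SPARC assembly syntax.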
128
Usually used to put the value of an address in memory into a register.
For example, set 0x4004, %r3
Can do neither ‘add %r0, 0x4004, %r3’ nor ‘or %r0, 0x4004, %r3’. Why not?
SET is a synthetic instruction which may be implemented in two steps.
bit positions  31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0   hex value
#1 sethi 0x10, %r3 ! Puts 0x10 in the Most Significant 22 bits
%r3 (before)   0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0   0x12481248
sethi 0x10     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 x x x x x x x x x x   0x10
%r3 (after)    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0   0x4000
#2 or %r3, 0x0004, %r3 ! Puts 0x0004 in the least significant bits
%r3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0x4000
0x0004
OR
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0x00000004
%r3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0x4004
Machine language encoding for 'set 0x4004, %r3' sethi 0x10, %r3 or %r3, 4, %r3
0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0x 07 00 00 10
1 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0x 86 10 E0 04
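The two-step expansion of set can be sketched as plain bit arithmetic. The names expand_set and apply_set are hypothetical; the split into a 22-bit high part and a 10-bit low part follows the sethi and or steps shown above.

```python
def expand_set(value):
    """Split a 32-bit constant into the sethi operand and the or operand."""
    value &= 0xFFFFFFFF
    hi22 = value >> 10          # goes into the most significant 22 bits
    lo10 = value & 0x3FF        # OR-ed into the least significant 10 bits
    return hi22, lo10


def apply_set(hi22, lo10):
    """Model sethi followed by or on a register."""
    reg = (hi22 << 10) & 0xFFFFFFFF   # sethi clears the low 10 bits
    return reg | lo10                 # or fills them in


hi22, lo10 = expand_set(0x4004)
print(hex(hi22), hex(lo10), hex(apply_set(hi22, lo10)))
```

Because the low part is at most 10 bits wide, it always fits in the 13-bit immediate field of the or instruction without being sign-extended into a negative value.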
CSE360 129
05 00 00 10 (base 16) = 0000 0101 0000 0000 0000 0000 0001 0000 (base 2)
Instruction Group (bits 30:31) = 00
Destination Register (bits 25:29) = 00010
Op Code (bits 22:24) = 100
Constant (bits 0:21) = 0000000000000000010000
Meaning: sethi 0x10, %r2
%r2 <-- 0000000000000000010000 0000000000 (0x4000)
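The manual decoding above can be reproduced by shifting and masking. This is a minimal sketch that recognizes only the SETHI format (op=00, op2=100); the function name decode_sethi is an assumption for illustration.

```python
def decode_sethi(word):
    """Decode a 32-bit SETHI word into (rd, imm22, value loaded into rd)."""
    op = word >> 30                 # bits 31:30, instruction group
    rd = (word >> 25) & 0x1F        # bits 29:25, destination register
    op2 = (word >> 22) & 0x7        # bits 24:22, op code
    imm22 = word & 0x3FFFFF         # bits 21:0, constant
    if op != 0b00 or op2 != 0b100:
        raise ValueError("not a SETHI instruction")
    return rd, imm22, imm22 << 10


rd, imm22, loaded = decode_sethi(0x05000010)
print(f"sethi {imm22:#x}, %r{rd}  ->  %r{rd} = {loaded:#x}")
```

Feeding it the fourth word of the dump, 07 00 00 10, gives the same constant with %r3 as the destination.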
CSE360 130
Hex            Binary                                     Group  OP  Rd  Rs1  Rs2  SICONST
84 10 A0 00    1000 0100 0001 0000 1010 0000 0000 0000
C4 00 80 00
07 00 00 10
86 10 E0 04
CSE360 131
set iconst, rd
        sethi %hi(iconst), rd
        or    rd, %lo(iconst), rd
--or--
        sethi %hi(iconst), rd
--or--
        or    %g0, iconst, rd
CSE360 132
Bit Manipulation Instructions
– Bitwise logical operations
and %rs1, %rs2, %rd
        10010011… (32 bits)
        01111001…
        x y | x AND y
        0 0 | 0
        0 1 | 0
        1 0 | 0
        1 1 | 1
or %rs1, %rs2, %rd
        10010011… (32 bits)
        01111001…
        x y | x OR y
        0 0 | 0
        0 1 | 1
        1 0 | 1
        1 1 | 1
xor %rs1, %rs2, %rd
        10010011… (32 bits)
        01111001…
        x y | x XOR y
        0 0 | 0
        0 1 | 1
        1 0 | 1
        1 1 | 0
CSE360 133
CSE360
andn %rs1, %rs2, %rd
        10010011… (32 bits)
        01111001…
        x y | x AND (NOT y)
        0 0 | 0
        0 1 | 0
        1 0 | 1
        1 1 | 0
orn %rs1, %rs2, %rd
        10010011… (32 bits)
        01111001…
        x y | x OR (NOT y)
        0 0 | 1
        0 1 | 0
        1 0 | 1
        1 1 | 1
not %rs, %rd
        10010011… (32 bits)
        x | NOT x
        0 | 1
        1 | 0
Recall the cc operations, so andcc , orcc , etc. are available.
(However, there is no notcc ; use xnorcc .)
134
CSE360
For what kinds of things are these bit level operations used?
Recall the synthetic operations clr and mov:
        clr %r2             is      or %r0, %r0, %r2
        mov %r2, %r3        is      or %r0, %r2, %r3
Masking operations: Want to select a bit or group of bits from a set of 32. E.g., convert lower (or upper) to upper case:
‘a’ in binary is 01100001
‘A’ in binary is 01000001
All we need to do is “turn off” the bit in position 5.
and %r1, 0b11011111, %r1 will turn off that bit!
What if we subtract 32 (0b100000) from %r1?
What about converting upper to lower case?
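The masking idea, and the two questions above, can be explored with a short sketch. The helper names are hypothetical; they mirror an and with 0b11011111 and an or with 0b00100000 on a single ASCII character.

```python
def to_upper_mask(ch):
    """Turn off bit 5, as 'and %r1, 0b11011111, %r1' would."""
    return chr(ord(ch) & 0b11011111)


def to_lower_mask(ch):
    """Turn on bit 5, as an 'or' with 0b00100000 would."""
    return chr(ord(ch) | 0b00100000)


print(to_upper_mask("a"), to_upper_mask("A"))   # masking is safe on both cases
print(repr(chr(ord("A") - 32)))                 # subtracting 32 from 'A' is not
print(to_lower_mask("A"), to_lower_mask("a"))
```

Masking leaves a letter that is already upper case unchanged, whereas subtracting 32 from it produces a different character.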
135
CSE360
– Bitwise shifting operations
Shift logical left: sll %rs1, %rs2, %rd
%rs1 : data to be shifted
%rs2 : shift count
%rd : destination register
E.g.,   set 0xABCD1234, %r2
        sll %r2, 3, %r3
%r2: 1010 1011 1100 1101 0001 0010 0011 0100
%r3: 0101 1110 0110 1000 1001 0001 1010 0000
sll is equivalent to multiplying by a power of 2 (barring overflow). (In the decimal system, what’s a shortcut for multiplying by a power of ten?)
136
CSE360
Shift Logical Right: srl %rs1, %rs2, %rd
– Shifts right instead of left, inserting zeros.
Arithmetic shifts: propagate the sign bit when shifting right, e.g., sra . (Left shift doesn't change.)
– Almost equivalent to dividing by a power of 2.
Rotating shifts: Bits that would have gone into the bit bucket are shifted in at the other end instead. (E.g., rr, rl)
– Rotate is not implemented in SPARC.
[Figure: Rotate Right and Rotate Left diagrams.]
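The shift variants can be compared side by side on 32-bit values. This is a sketch with made-up function names; the rotate helpers show one way to simulate a rotate with two logical shifts and an or, which is an assumption about how one might do it rather than something these notes prescribe.

```python
MASK = 0xFFFFFFFF


def sll(x, n):
    return (x << n) & MASK                 # zeros enter on the right


def srl(x, n):
    return (x & MASK) >> n                 # zeros enter on the left


def sra(x, n):
    x &= MASK
    if x & 0x80000000:                     # negative: treat as signed
        x -= 1 << 32
    return (x >> n) & MASK                 # sign bit is propagated


def rotate_left(x, n):
    n %= 32
    if n == 0:
        return x & MASK
    return sll(x, n) | srl(x, 32 - n)      # bits shifted out re-enter on the right


def rotate_right(x, n):
    return rotate_left(x, 32 - (n % 32))


print(f"{sll(0xABCD1234, 3):08x} {srl(0xABCD1234, 4):08x} {sra(0xABCD1234, 4):08x}")
print(f"{rotate_left(0xABCD1234, 4):08x} {rotate_right(0xABCD1234, 4):08x}")
```

The only difference between srl and sra shows up when the top bit is 1, which is exactly the case of a negative two's-complement value.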
137
Assembler directives
Are not encoded as machine instructions
Memory alignment: .align 4
– Used when mixing allocations of bytes, words, halfwords, etc. and need word boundary alignment
Reserve bytes of space: .skip 20
– Useful for allocating large amounts of space (e.g., arrays)
Create a symbolic constant: .set mask, 0x0f
– Can now use the word “mask” anywhere we could use the constant 0x0f previously
All this is leading to additional addressing modes , which help us work with pointers, arrays, and records in assembly language.
CSE360 138
Addressing Modes
– How do we specify operand values?
In a register, location is encoded in the instruction.
As a constant, immediate value is in the instruction.
In memory, operand is somewhere in memory, location may only be known at runtime.
– Memory operands:
Effective address: actual location of operand in memory. This may be calculated implicitly (e.g., by a displacement in the instruction) or may be calculated by the programmer in code.
CSE360 139
– Summary of addressing modes:
Mode                 Example                    Loc. of Operand                Suitable for              SPARC?
Immediate            add %r1, 100, %r1          instruction                    Constants                 Yes
Register Direct      add %r1, %r2, %r1          %r2                            Integers, constants       Yes
Memory Direct        add %r1, [2000], %r2       mem[2000]                      Integers, constants       No
Memory Indirect      add %r1, [[2000]], %r2     mem[mem[2000]]                 Pointers                  No
Register Indirect    ld [%r1], %r2              mem[%r1]                       Pointers                  Yes
Register Indexed     st %r1, [%r2+%r3]          mem[%r2+%r3]                   Arrays                    Yes
Register Displaced   st %r1, [%r2+x]            mem[%r2+x]                     Records                   Yes
Post Increment       ld [%r1]+, %r2             mem[%r1], increment %r1        Arrays, strings, stacks   No
Pre Decrement        ld -[%r1], %r2             decrement %r1, mem[%r1]        Arrays, strings, stacks   No
CSE360 140
CSE360
– Memory Direct addressing
Entire address is in the instruction (not in SPARC).
E.g., accumulator machine: each instruction had an opcode and a hard address in memory.
– Can’t be done on SPARC because an address is 32 bits, which is the length of an instruction. No room for opcodes, etc. Can be done in CISC because multi-word instructions are permitted.
– Memory Indirect addressing
Pointer to operand is in memory. Instruction specifies location of pointer. Requires three memory fetches (one each for instruction, pointer, and data). Not in RISC machines because instruction is too slow; such an instruction would cause its own register interlock!
141
CSE360
– Register Indirect addressing
Register has address of operand (a pointer). Instruction specifies register number, effective address is contents of register.
Ex.:
        .data
n_m:    .word 5             ! initialize n to 5
        .text
        set n_m, %r1        ! %r1 has n_m, pointer to n
        ld [%r1], %r3       ! fetch n into %r3
142
Ex.: sum up array of integers:
        .data
n_m:    .word 5                 ! Size of array
a_m:    .word 4,2,5,8,3         ! 5 word array
sum_m:  .word 0                 ! Sum of elements
b_m:    .skip 5*4               ! another 5 word array
        .text
        clr %r2                 ! r2 will hold sum
        set n_m, %r3            ! r3 points to n
        ld [%r3], %r3           ! r3 gets array size
        set a_m, %r4            ! r4 points to array a
loop:   ld [%r4], %r5           ! Load element of a into r5
        add %r5, %r2, %r2       ! sum = sum + element
        add %r4, 4, %r4         ! Incr ptr by word size
        subcc %r3, 1, %r3       ! Decrement counter
        bg loop                 ! Loop until count = 0
        nop                     ! Branch delay slot
        set sum_m, %r1          ! r1 points to sum
        st %r2, [%r1]           ! Store sum
        ta 0                    ! done
[Figure: memory layout with n_m = 5, then 4, 2, 5, 8, 3 at a_m, a_m+4, a_m+8, a_m+12, a_m+16, then sum_m.]
[Table: trace of r2, r3, r4, r5 for loop through loop+4; r2 starts at 0, r3 counts down 5, 4, 3, 2, 1, and r4 steps through a_m, a_m+4, a_m+8, a_m+12, a_m+16.]
CSE360 143
C-style example of pointer data type
        char x;         // object of type character
        char * ptr;     // pointer to character type
        ptr = &x;       // ptr has address of x (points to x)
        *ptr = 'a';     // store 'a' at address in ptr
[Figure: memory cell x_m holds 'a'; memory cell ptr_m holds x_m, i.e., addr of x; registers r1 = x_m, r2 = ptr_m, r3 = 'a'.]
Assembly language equivalent
        .data
x_m:    .byte 0             ! reserve character space; x_m = &x; [x_m] = x
        .align 4            ! align to word boundary
ptr_m:  .word 0             ! pointer variable; [ptr_m] = ptr
        .text
        set x_m, %r1        ! get address x_m into %r1
        set ptr_m, %r2      ! get address ptr_m into %r2
        st %r1, [%r2]       ! make [ptr_m] point to [x_m]
        set 'a', %r3        ! put character 'a' into r3
        set ptr_m, %r2      ! get address ptr_m into %r2
        ld [%r2], %r1       ! get address [ptr_m], i.e. x_m, into %r1
        stb %r3, [%r1]      ! store 'a' at address [ptr_m], i.e., ptr
CSE360 144
– Register Indexed addressing
Suitable for accessing successive elements of the same type in a data structure.
Ex.: Swap elements A[i] and A[k] in array
        .data
A:      .skip 24*4              ! reserve array[0..23] of int
                                ! assume i is in %r2 and k is in %r3
        .text
        set A, %r4              ! beginning of array ptr.
        sll %r2, 2, %r2         ! “multiply” i by 4
        sll %r3, 2, %r3         ! “multiply” k by 4
        ld [%r2+%r4], %r7       ! r7 <- a[i]
        ld [%r3+%r4], %r8       ! r8 <- a[k]
        st %r8, [%r2+%r4]       ! a[i] <- r8
        st %r7, [%r3+%r4]       ! a[k] <- r7
[Figure: registers r2, r3, r4, r7, r8 and array elements at A, A+4, A+8, A+12; after sll, i = 001 becomes 100 and k = 0010 becomes 1000.]
Effective address calculations!
CSE360 145
CSE360
Simulating Register Indirect addressing on SPARC
–
SPARC doesn't truly have register indirect addressing. We can write st %r2, [%r1] but assembler converts this automatically into st %r2, [%r1+%r0]
Array mapping functions: used by compilers to determine addresses of array elements. Must know upper bound, lower bound, and size of elements of array.
– Total storage = (upper - lower + 1)*element_size
– Address offset for element at index k = (k - lower)*element_size
– Address (byte) offset for A[3] = (3-0)*4 = 12
– This is for 1 dimensional arrays only!
146
CSE360
1D array mapping functions: Want an array of n elements, each element is 4 bytes in size, array starts at address arr .
– Total storage is
4n bytes
– First element is at arr+0
– Last element is at arr+4(n-1)
– k th ( k can range from 0…n-1 ) element is at arr+4k . Array uses zero-based indexing .
arr+0 k=0 arr+4 k=1 arr+8 k=2 arr+12 k=3 arr+16 k=4 array of 6 elements, 4 bytes each arr+20 k=5
147
CSE360
2D array mapping functions: must linearize the 2D concept; e.g., map the 2D structure into 1D memory.
0 1 2 3 4
0 0,0 0,1 0,2 0,3 0,4
1 1,0 1,1 1,2 1,3 1,4
3 Rows
(0...2)
2 2,0 2,1 2,2 2,3 2,4
5 Columns (0...4)
– Convert into 1D array in memory
0,0 0,1 0,2 0,3 0,4 1,0 1,1 .....
2,3 2,4
148
CSE360
2 ways to convert to 1D
– Row major order (Pascal, C, Modula-2) stores first by rows, then by columns. E.g.,
0,0 0,1 0,2 0,3 0,4 1,0 1,1 .....
2,3 2,4
– Column major order (FORTRAN) stores first by columns then by rows. E.g.,
0,0 1,0 2,0 0,1 1,1 2,1 0,2 .....
1,4 2,4
– Row major 2D array mapping function: Given an array starting at address arr that is x rows by y columns, each element is m bytes in size, and indices start at zero, then element (i, j) may be found at location: arr + (y*i + j)*m
149
CSE360
3D array mapping function: natural extension of 2D function.
Store by row, then column, then depth.
+1 +3 +5 +7 +9
0,0,1 0,1,1 0,2,1 0,3,1 0,4,1
+0 +2
1,0,1 1,1,1 1,2,1 1,3,1 1,4,1
0,0,0 0,1,0 0,2,0 0,3,0 0,4,0
+10
1,0,0
+12
2,0,1 2,1,1 2,2,1 2,3,1 2,4,1
0,1,0 1,1,0 1,2,0 1,3,0 1,4,0
2,0,0 2,1,0 2,2,0 2,3,0 2,4,0
3 Rows, 5 Columns, 2 Depth
– Array starting at arr with x rows, y columns, depth z, m element size. Element (i, j, k) is found at location: arr + (z*(y*i + j) + k)*m
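The 1D, 2D, and 3D mapping functions can be written out directly. This is a sketch with hypothetical function and parameter names (cols for y, depth for z); the sample checks use the 3-row, 5-column, depth-2 layout pictured above with an element size of 1 so that the results are element offsets.

```python
def offset_1d(k, lower, element_size):
    """Byte offset of element k in a 1D array with the given lower bound."""
    return (k - lower) * element_size


def address_2d(arr, i, j, cols, m):
    """Row-major address of element (i, j): arr + (y*i + j)*m."""
    return arr + (cols * i + j) * m


def address_3d(arr, i, j, k, cols, depth, m):
    """Address of element (i, j, k): arr + (z*(y*i + j) + k)*m."""
    return arr + (depth * (cols * i + j) + k) * m


print(offset_1d(3, 0, 4))                        # A[3] with 4-byte elements
print(address_2d(0, 2, 4, cols=5, m=1))          # last element of a 3x5 array
print(address_3d(0, 1, 0, 0, cols=5, depth=2, m=1))
```

Note that the number of rows x never appears in the address calculation; it only matters for total storage and bounds checking.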
150
CALCULATE: total storage offset for A(i,j,k) address for A(i,j,k)
1D element size (#bytes) 4
# rows (x)
# cols (y)
7
1
# depth (z) starting addr (0) i= j= k=
1
0
1
4
0
1
1
1
8
0
3
5
2D
2
0
1
2
12
1
3
5
3D
1
CSE360 151
! Example that adds 1 to every element of columns 1 and 2, not 0, of a 5 by 3 array
        .data
arr_m:
        .set    rows, 5             ! define symbolic constants
        .set    cols, 3
        .skip   rows * cols * 4     ! allocate space (.skip 60 same)
        .text
        ...
        set     arr_m, %r3          ! get address of array
        clr     %r1                 ! %r1 is i (row)
loop1:  cmp     %r1, rows
        bge     done                ! done if i >= rows
        nop
        set     1, %r2              ! %r2 is j (col); start at one (skip col zero)
loop2:  cmp     %r2, cols
        bge     inc1                ! if at last column, done with row
        nop
        umul    %r1, cols, %r4      ! # elements to skip for current row
        add     %r4, %r2, %r4       ! then which column being accessed
        umul    %r4, 4, %r4         ! change from element to byte offset
        ld      [%r3+%r4], %r5      ! get arr[i][j]
        add     %r5, 1, %r5         ! add 1 to the element value
        st      %r5, [%r3+%r4]      ! store it back to arr[i][j]
inc2:   add     %r2, 1, %r2         ! next column
        ba      loop2               ! continue inner loop over columns
        nop
inc1:   inc     %r1                 ! next row
        ba      loop1               ! continue outer loop over rows
        nop
done:   ...
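For comparison, the same loop nest might look like this in C. The function name is hypothetical; the compiler generates the row-major address arithmetic that the assembly performs by hand.

```c
enum { ROWS = 5, COLS = 3 };

/* Add 1 to every element of columns 1 and 2 (column 0 is skipped). */
void add_one_to_cols_1_and_2(int arr[ROWS][COLS])
{
    for (int i = 0; i < ROWS; i++) {        /* i plays the role of %r1 */
        for (int j = 1; j < COLS; j++) {    /* j plays the role of %r2 */
            arr[i][j] += 1;                 /* address: arr + (COLS*i + j)*4 */
        }
    }
}
```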
– Displacement Addressing
Suitable for accessing the individual fields of record data structures. Each field can be of a different type.
Logical view of a record:
  Name   20 Characters
  Age    Integer
  DOB    Integer
Use the .set directive to establish offsets to fields within records. Then use displacement addressing to access those fields.
Actual layout of record in memory:
  person+0    Name   20 bytes
  person+20   Age    4 bytes
  person+24   DOB    4 bytes
Ex.: Add 1 to the age field in a person record
.data
.set name, 0 ! offset to name field
.set age, 20 ! offset to age field
.set dob, 24 ! offset to date of birth
person: .skip 28 ! size of a person record
.text
....
set person, %r1 ! get addr of person record
ld [%r1+age], %r2 ! get the age of the person
add %r2, 1, %r2 ! increment age by 1
st %r2, [%r1+age] ! store back to record
Problem: alignment in memory. May have to waste some space in the person record in order to have the integer fields align on a word boundary.
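A C struct shows the same record and the alignment issue. This is a sketch; the struct and function names are hypothetical, and the offsets in the comments assume a 4-byte int, as on 32-bit SPARC.

```c
#include <stddef.h>

struct person {
    char name[20];  /* offset 0 */
    int  age;       /* offset 20 when int is 4 bytes */
    int  dob;       /* offset 24 */
};

/* With a 21-byte name the compiler must insert padding so that age
   still starts on an int boundary: this is the "wasted space". */
struct padded_person {
    char name[21];
    int  age;
};

/* Same effect as the ld / add / st sequence on [%r1+age]. */
void add_one_to_age(struct person *p)
{
    p->age += 1;
}
```

offsetof(struct person, age) reports the displacement the compiler chose, which is the value the .set directive supplies by hand in the assembly version.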
– Auto-increment and Auto-decrement addressing
SPARC does not support these modes. They may be simulated using register indirect addressing followed by an add or subtract of the size of the element on that register.
Useful for traversing arrays forward (auto-increment) and backward (auto-decrement). Also useful for stacks and queues of data elements.
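In C, the post-increment and pre-decrement pointer idioms express the same access patterns. The sketch below uses hypothetical helper names and assumes a stack that grows toward lower addresses.

```c
/* Auto-increment style traversal: use the element, then advance. */
int sum_forward(const int *p, int n)
{
    int sum = 0;
    while (n-- > 0) {
        sum += *p++;        /* load via pointer, then add the element size */
    }
    return sum;
}

/* Push: auto-decrement the stack pointer, then store. */
void push(int **sp, int value)
{
    *--(*sp) = value;
}

/* Pop: load, then auto-increment the stack pointer. */
int pop(int **sp)
{
    return *(*sp)++;
}
```

On SPARC each of these becomes a register indirect ld or st plus a separate add or sub on the pointer register.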
– Subroutines and subroutine linkage
Subroutines: programming mechanism to facilitate repeated computations and modularization.
– Use of subroutines
Basis for structured and disciplined programming
Compact code (no need to write monolithic loops)
Relatively easy to debug (no cut-and-paste errors)
Requires little hardware support, mostly protocols and conventions to handle parameters.
– Terminology
Caller: the code (which could be a subroutine itself) which invokes the subroutine of interest
Callee: the subroutine being invoked by the caller
Function: subroutine that returns one or more values back to the caller and exactly one of these values is distinguished as the return value
Return value: the distinguished value returned by a function
– Terminology (continued)
Procedure: a subroutine that may return values to the caller (through the subroutine’s parameter(s)), but none of these values is distinguished as the return value
Return address: address of the subroutine call instruction (on SPARC, execution resumes at this address + 8, past the call and its delay slot)
Parameters: information passed to/from a subroutine (a.k.a. arguments )
Subroutine linkage: a protocol for passing parameters between the caller and the callee
– Subroutine linkage
Calling a subroutine
– Assembly language syntax for calling a subroutine:
        call    label
        nop
– Must change the program counter (as in a branch instruction); however, we must also keep track of where to resume execution after the subroutine finishes. The call instruction handles this atomically (i.e., without interruption) by:
        %r15 ← PC
        (PC ← nPC)
        nPC ← label
Returning from a subroutine
– Assembly language syntax for returning from a subroutine:
        retl
        nop
Returning from a subroutine (continued)
– Again, must change the program counter to return to an instruction after the one that called the subroutine. The address of the instruction that called it was saved in %r15, and we must skip over the branch delay slot as well. So, this is accomplished by:
        nPC ← %r15+8
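The PC/nPC bookkeeping of call and retl can be traced with a small C model. This is an illustrative sketch only: the struct and function names are invented, and just the three registers involved in the transfer are modeled.

```c
#include <stdint.h>

struct cpu {
    uint32_t pc;    /* address of the instruction being executed */
    uint32_t npc;   /* address of the instruction to execute next */
    uint32_t r15;   /* receives the address of the call instruction */
};

/* Ordinary instruction (e.g. the nop in a delay slot). */
void step(struct cpu *c)
{
    c->pc = c->npc;
    c->npc += 4;
}

/* call label: %r15 <- PC, PC <- nPC, nPC <- label */
void call(struct cpu *c, uint32_t label)
{
    c->r15 = c->pc;
    c->pc = c->npc;
    c->npc = label;
}

/* retl: PC <- nPC, nPC <- %r15 + 8 */
void retl(struct cpu *c)
{
    c->pc = c->npc;
    c->npc = c->r15 + 8;
}
```

If the call sits at address 0x100, the delay slot at 0x104 executes before the subroutine is entered, and after retl (plus its own delay slot) execution resumes at 0x108.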
Parameter passing: 2 approaches
– Register based linkage: pass parameters solely through registers.
Has the advantage of speed, but can only pass a few parameters, and it won’t support nested subroutine calls. Such a subroutine is called a leaf subroutine.
– Stack based linkage: pass parameters through the run-time stack .
Not as fast, but can pass more parameters and have nested subroutine calls (including recursion).
– Subroutine linkage :
Startup Sequence: load parameters and return address into registers, branch to subroutine.
Prologue: if non-leaf procedure then save return address to memory, save registers used by callee.
Epilogue: place return parameters into registers, restore registers saved in prologue, restore saved return address, return.
Cleanup Sequence: work with returned values
(Diagram: the Caller runs its Startup Sequence and issues call; the Callee runs Prologue, Body, and Epilogue and issues retl; the Caller then runs its Cleanup Sequence.)
– Example: Print subroutine.
        .text
main:   set     1, %r1              ! Initialize r1 and r2
        set     3, %r2
        mov     %r1, %r8            ! Print %r1
        call    print
        nop
        mov     %r2, %r8            ! Print %r2
        call    print
        nop
        add     %r1, %r2, %r8       ! Do our calculation
        call    print               ! Print the result (expect '4')
        nop
        ta      0
print:  set     '0', %r1            ! Ascii value of zero
        or      %r8, %r1, %r2       ! Treat r8 as parameter
        mov     %r2, %r8            ! Move into output register
        ta      1                   ! Output character
        mov     '\n', %r8           ! Output end of line (newline)
        ta      1
        retl                        ! Return
        nop
What’s wrong with the above code?
– Which registers can leaf subroutines change?
Convention for optimized leaf procedures:
  Register(s)        Use                  Mentionable?
  %r0                Zero                 Yes
  %r1                Temporary            Yes
  %r2-%r7            Caller’s variables   No
  %r8                Return value         Yes
  %r8-%r13           Parameters           Yes
  %r14               Stack pointer        No
  %r15               Return address       Yes, but preserve
  %r30               Frame pointer        No
  %r16-%r29, %r31    Caller’s variables   No
The subroutine must not use the value in any other register except to save it to memory somewhere and restore it before returning to the caller.
Problem: how can a subroutine call another subroutine? How can a subroutine call itself?
– Example: procedure to print linked list of ints.
(Figure: head -> 5 -> 7 -> 4 -> 1 -> nil)
        .data
        .set    dta, 0              ! offset in record to data
        .set    ptr, 4              ! offset in record to next pointer
head:   .word   0
        .text
main:   . . . .                     ! does all init and allocation of list
        set     head, %r8           ! prepare parameter to traverse proc
        ld      [%r8], %r8          ! follow head pointer to first node
        call    trav                ! call subroutine
        nop                         ! branch delay
        . . . .
trav:   mov     %r8, %r1            ! copy pointer to %r1
loop:   cmp     %r1, 0              ! check for null pointer
        be      done                ! null pointer means we are done
        nop                         ! branch delay
        ld      [%r1+dta], %r8      ! follow pointer and get data field
        ta      4                   ! print data field
        ld      [%r1+ptr], %r1      ! get pointer to next record
        ba      loop
        nop                         ! branch delay
done:   retl
        nop
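The traversal corresponds to the following C sketch. The struct layout mirrors the dta and ptr offsets; returning a node count is an addition made here for illustration and is not part of the assembly version.

```c
#include <stdio.h>
#include <stddef.h>

struct node {
    int data;            /* field at offset dta */
    struct node *next;   /* field at offset ptr; NULL plays the role of nil */
};

/* Print each data field in order and return how many nodes were visited. */
int trav(const struct node *p)
{
    int count = 0;
    while (p != NULL) {          /* cmp %r1, 0 ; be done */
        printf("%d\n", p->data); /* ld [%r1+dta] ; ta 4 */
        p = p->next;             /* ld [%r1+ptr], %r1 */
        count++;
    }
    return count;
}
```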
– Review of parameter passing mechanisms:
Pass by value copy: parameters to subroutine are copies upon which the subroutine acts.
Pass by result copy: parameters are copies of results produced by the subroutine.
Pass by reference copy: parameters to subroutine are (copies of) addresses of values upon which the subroutine acts. Callee is responsible for saving each result to memory at the location referred to by the appropriate parameter.
Hybrid: some parameters passed by value copy, some by result copy, and/or some by reference copy. Callee is responsible for saving results for reference parameters.
– Parameter passing notes:
Array or record parameters typically are passed by reference copy (efficiency reasons). Primitive data types may be passed either way.
Conventions among languages allow any language to call functions in any other language:
– Pascal: VAR parameters are passed by reference copy; all others are passed by value copy.
– C: all parameters are passed by value copy. Must explicitly pass a pointer if you want a reference parameter.
– C++: like Pascal, can pass by value or reference copy.
– FORTRAN: all things passed by reference copy (even constants).
– ADA: pass by value/result copy.
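The C rule in the list above (everything is passed by value copy; pass a pointer to get reference behavior) can be seen in a two-function sketch. Both function names are hypothetical.

```c
/* Receives a copy: the caller's variable is unchanged. */
void inc_by_value(int x)
{
    x = x + 1;
}

/* Receives (a copy of) an address: the callee stores the result
   back to the memory that the parameter refers to. */
void inc_by_reference(int *x)
{
    *x = *x + 1;
}
```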
.text ! Example 10.1 of Lab Manual
! pr_str – print a null terminated string
! Parameters: %r8 – pointer to string (initially)
!
! Temporaries: %r8 – the character to be printed
! %r9 – pointer to string
!
pr_str: mov     %r8, %r9            ! we need %r8 for the "ta 1" below
pr_lp:  ldub    [%r9], %r8          ! load character
        cmp     %r8, 0              ! check for null
        be      pr_dn
        nop
        ta      1                   ! print character
        ba      pr_lp
        inc     %r9                 ! increment the pointer (in branch delay slot)
pr_dn:  retl
        nop
Summary from text (p. 220)
– Pass by value copy: For small “in” parameters. Subroutines cannot alter the originals whose copies are passed as parameters.
– Pass by value/result copy: For small “in/out” parameters.
Caller’s cleanup sequence stores values of any “in/out” parameters.
– Pass by reference copy: for “in/out” parameters of all sizes, and large “in” parameters. “Out” values are provided by changing memory at those addresses. (Note: pass by reference copy is passing an address by value copy.)
– Write Sparc code for the caller and callee for the following subroutine using register based parameter passing
! global_function Integer subchr (A, B, C)
! Substitutes character C for each B in string [A],
! and returns count of changes.
!
! // In comments, "[A+index]" is denoted by "ch".
! index = 0
! count = 0
! LOOP: if [A+index]=0 go to END // while (ch != 0) {
! if [A+index]≠B go to INC // if (ch == B) {
! [A+index]=C // ch = C;
! count=count+1 // count++; }
! INC: index=index+1 // index++;
! go to LOOP // }
! END:
Assume
        .data                       ! data section
C_m:    .byte   'I'                 ! parameter C
B_m:    .byte   'i'                 ! parameter B
A_m:    .asciz  "i will tip"        ! parameter A
        .align  4
R_m:    .word   0                   ! for storing result count
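As a reference for checking the SPARC solution, the pseudocode translates to C roughly as follows. This is a sketch of the algorithm only; it says nothing about which registers the assembly version should use.

```c
/* Substitute c for each b in the null-terminated string a;
   return the number of substitutions made. */
int subchr(char *a, char b, char c)
{
    int count = 0;
    for (int index = 0; a[index] != 0; index++) {
        if (a[index] == b) {
            a[index] = c;
            count++;
        }
    }
    return count;
}
```

Applied to "i will tip" with b = 'i' and c = 'I', the string becomes "I wIll tIp" and the returned count is 3.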
Stack based linkage
– Advantages
Permits subroutines to call others.
Allows a larger number of parameters to be passed.
Permits records and arrays to be passed by value copy.
Saving of registers by callee is “built-in”.
A way for callee to reserve memory for other uses is “built-in”, too.
– Disadvantages
Slower than register based
More complex protocol
– Why a stack?
Subroutine calls and returns happen in a last-in first-out order (LIFO).
Also known as a runtime stack, parameter stack, or subroutine stack.
Items “saved” on the stack in one activation record
– Parameters to the subroutine
– Old values of registers used in the subroutine
– Local memory variables used in subroutine
– Return value and return address
Say A() calls B(), B() calls C(), and C() calls A().
Runtime Stack (most recent frame first):
  2nd stack frame for A
  1st stack frame for C
  1st stack frame for B
  1st stack frame for A
Expanded View of one stack frame:
  Local variables
  Saved general purpose registers
  Return addresses
  Return values
  Parameters
– Stack based linkage parameter passing convention
Startup sequence:
– Push parameters
– Push space for return value
Prologue
– Push registers that are changed
(including return address)
– Allocate space for local variables
Epilogue
– Restore general purpose registers
– Free local variable space
– Use return address to return
Cleanup Sequence
– Pop and save returned values
– Pop parameters
(Diagram: the Caller runs its Startup Sequence and issues call; the Callee runs Prologue, Body, and Epilogue and issues retl; the Caller then runs its Cleanup Sequence.)
– Stack based parameter passing example:
Register %r14 = %sp (stack pointer)
– Invariant: Always indicates the top of the stack (it has the address in memory of the last item on stack, usually a word).
– Moved when items are “pushed” onto the stack.
– Due to interruptions (system interrupts (I/O) and exceptions), values stored above %sp (at addresses less than %sp) can change at any time! Hence, any access above %sp is unsafe!
Register %r30 = %fp (frame pointer)
– Indicates the previous stack pointer. Activation record is from (some subroutine-specific number of words before) the %fp to the %sp.
– Invariant: %fp is constant within a subroutine (after prologue).
– Stack based parameter passing example:
Want to implement the following subroutine (also a caller):
! global_function Integer subchr (A, B, C)
! Substitutes character C for all B in string A,
! and returns count of changes.
!
! // In comments, "*(A+index)" is denoted by "ch".
! index = 0
! count = 0
! LOOP: if *(A+index)=0 go to END // while (ch != 0) {
! if *(A+index)≠B go to INC // if (ch == B) {
! *(A+index)=C // ch = C;
! count=count+1 // count++; }
! INC: index=index+1 // index++;
! go to LOOP // }
! END:
        .data                       ! data section
C_m:    .byte   'I'                 ! parameter C
B_m:    .byte   'i'                 ! parameter B
A_m:    .asciz  "i will tip"        ! parameter A
        .align  4
R_m:    .word   0                   ! for storing result count
        .data                       ! data section
C_m:    .word   'I'                 ! parameter C
B_m:    .word   'i'                 ! parameter B
A_m:    .asciz  "i will tip"        ! parameter A
        .align  4                   ! align to word address
stack:  .skip   250*4               ! allocate 250 word stack
bstak:                              ! point to bottom of stack
R_m:    .word   0                   ! reserve for count
        .text
! Program's one-time initialization
start:  set     bstak, %sp          ! set initial stack ptr
        mov     %sp, %fp            ! set initial frame ptr
! STARTUP SEQUENCE to call subchr()
        sub     %sp, 16, %sp        ! move stack ptr
        set     A_m, %r1            ! A is passed by reference
        st      %r1, [%sp+4]        ! push address on stack
        set     B_m, %r1            ! B is passed by value
        ld      [%r1], %r1          ! get value of B
        st      %r1, [%sp+8]        ! push parameter B on stack
        set     C_m, %r1            ! C is passed by value
        ld      [%r1], %r1          ! get value of C
        st      %r1, [%sp+12]       ! push parameter C on stack
! SUBROUTINE CALL
        call    subchr              ! make subroutine call
        nop                         ! branch delay slot
! CLEANUP SEQUENCE
        ld      [%sp], %r1          ! pop return value off stack
        add     %sp, 16, %sp        ! pop stack
        set     R_m, %r2            ! get address of R
        st      %r1, [%r2]          ! store R
        . . .                       ! the rest of the program
stack after the startup sequence (lowest address first):
  %sp ->  Return value
          addr (a)
          b
          c
  %fp ->
! SUBROUTINE PROLOGUE
subchr: sub     %sp, 32, %sp        ! open 8 words on stack
        st      %fp, [%sp+28]       ! Save old frame pointer
        add     %sp, 32, %fp        ! old sp is new fp
        st      %r15, [%fp-8]       ! save return address
        st      %r8, [%fp-12]       ! Save gen. Register
        …                           ! Save r9-r13, omitted
! SUBROUTINE BODY
ld_reg: ld      [%fp+4], %r8        ! “pop” (load) addr of A
        ld      [%fp+8], %r9        ! “pop” (load) value of B
        ld      [%fp+12], %r10      ! “pop” (load) value of C
        clr     %r12                ! count
        clr     %r13                ! index
loop:   ldub    [%r8+%r13], %r11    ! load a string chr
        cmp     %r11, 0x0           ! is chr=null?
        be      done                ! then go to done
        cmp     %r11, %r9           ! is chr<>B? (branch delay)
        bne     inc                 ! then go to inc
        nop                         ! branch delay slot
        stb     %r10, [%r8+%r13]    ! change chr to C
        add     %r12, 1, %r12       ! increment count
inc:    add     %r13, 1, %r13       ! increment index
        ba      loop                ! do next chr
        nop                         ! branch delay slot
done:   st      %r12, [%fp+0]       ! “push” (store) count on stack
! EPILOGUE
        …                           ! Restore r9-r13, omitted
        ld      [%fp-12], %r8       ! Restore r8
        ld      [%fp-8], %r15       ! get saved return address
        ld      [%fp-4], %fp        ! Get old value of frame ptr
        add     %sp, 32, %sp        ! Restore stack pointer
        retl                        ! return to caller
        nop                         ! branch delay slot
stack inside subchr (lowest address first):
  %sp ->  ...
          %r9
          %r8
          return addr
          old frame ptr
  %fp ->  Return value
          addr (a)
          b
          c
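The parameter area of this protocol can be imitated in C with an explicit array standing in for the run-time stack. This is a loose sketch under stated assumptions: the names are invented, indices count words rather than bytes (so fp[1] corresponds to [%fp+4]), and register saving in the prologue and epilogue is not modeled.

```c
#include <stdint.h>

enum { STACK_WORDS = 250 };
static intptr_t stack[STACK_WORDS];   /* simulated run-time stack */

/* Callee: on entry its fp equals the caller's sp. */
static void subchr_on_stack(intptr_t *fp)
{
    char *a = (char *)fp[1];          /* [%fp+4]  address of A */
    char b = (char)fp[2];             /* [%fp+8]  value of B */
    char c = (char)fp[3];             /* [%fp+12] value of C */
    intptr_t count = 0;
    for (int index = 0; a[index] != 0; index++) {
        if (a[index] == b) {
            a[index] = c;
            count++;
        }
    }
    fp[0] = count;                    /* [%fp+0] return value slot */
}

/* Caller: startup sequence, call, cleanup sequence. */
int call_subchr(char *a, char b, char c)
{
    intptr_t *sp = stack + STACK_WORDS;   /* bottom of stack (bstak) */
    sp -= 4;                              /* open 4 words: result + 3 params */
    sp[1] = (intptr_t)a;                  /* A by reference */
    sp[2] = b;                            /* B by value */
    sp[3] = c;                            /* C by value */
    subchr_on_stack(sp);
    intptr_t result = sp[0];              /* pop return value */
    sp += 4;                              /* pop parameters */
    return (int)result;
}
```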
General Guidelines
– Keep Startups, Cleanups, Prologues, and Epilogues standard (but not necessarily identical); easy to cut, paste, and modify.
– Caller: leave space for return value on the TOP of the stack.
– Callee: always save and restore locally used registers.
– Pass data structures and arrays by reference, all others by value (efficiency).
Motorola M68HC11
Called “HC11” for short
Used in ECE 567, a course required of CSE majors
References:
– Data Acquisition and Process Control with the
M68HC11 Microcontroller, 2nd Ed., by F. F. Driscoll,
R. F. Coughlin, and R. S. Villanucci, Prentice-Hall,
2000.
– http://www.cse.ohio-state.edu/~heym/360/common/e_series.pdf
Late in an academic term (such as now), you can hope to access on-line lecture notes from the
Electrical and Computer Engineering course,
ECE 265.
Visit http://www.ece.osu.edu
Under “Academic Program”, click on the link
“ECE Course Listings”.
Find 265 and click on the link “Syllabus of this quarter”.
HC11
– CISC
– Instruction encoding lengths vary (8 to 32 bits)
– About 316 instructions
– 4 16-bit user registers, one of which is divided into two 8-bit registers
Sparc
– RISC, Load/Store
– Instruction encoding lengths constant (32 bits)
– About 175 instructions
– 32 32-bit user integer registers
HC11
8-bit data bus
16-bit address bus
8-bit addressable
Instruction execution not overlapped
Sparc
32-bit data bus
32-bit address bus
8-bit addressable
Instruction execution overlapped in a pipeline
A Strange Fact: The HC11 architecture “allows accessing an operand from an external memory location with no execution-time penalty.”
[p. 27, M68HC11 Processor Manual, http://www.cse.ohio-state.edu/~heym/360/common/e_series.pdf]
Reason: The HC11 requirements state that the
CPU cycle must be kept long enough to accommodate a memory access within one cycle.
This seeming miracle is accomplished by keeping processor speed slow enough.
HC11 registers:
– Accumulator A (bits 7-0) and Accumulator B (bits 7-0), which together form the 16-bit Accumulator D (bits 15-0)
– X Index Register (bits 15-0)
– Y Index Register
– Stack Pointer (SP)
– Program Counter (PC)
Condition Code Register (CCR)
  Bit 7   S   Stop
  Bit 6   X   X Interrupt Mask
  Bit 5   H   Half-Carry
  Bit 4   I   I Interrupt Mask
  Bit 3   N   Negative
  Bit 2   Z   Zero
  Bit 1   V   Overflow
  Bit 0   C   Carry/Borrow
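To make the bit positions concrete, the sketch below defines masks for the CCR bits and computes the arithmetic flags for an 8-bit addition. The mask names and the function are hypothetical; the flag rules are the usual two's complement definitions and are offered as an illustration, not as a transcription of the HC11 manual.

```c
#include <stdint.h>

enum {
    CCR_C = 0x01, CCR_V = 0x02, CCR_Z = 0x04, CCR_N = 0x08,
    CCR_I = 0x10, CCR_H = 0x20, CCR_X = 0x40, CCR_S = 0x80
};

/* Add two 8-bit values; store the sum and return the H, N, Z, V, C bits. */
uint8_t add8_flags(uint8_t a, uint8_t b, uint8_t *sum)
{
    unsigned wide = (unsigned)a + b;
    uint8_t r = (uint8_t)wide;
    uint8_t ccr = 0;

    if (wide > 0xFF) ccr |= CCR_C;                        /* carry out of bit 7 */
    if (~(a ^ b) & (a ^ r) & 0x80) ccr |= CCR_V;          /* same signs in, different out */
    if (r == 0) ccr |= CCR_Z;
    if (r & 0x80) ccr |= CCR_N;
    if (((a & 0x0F) + (b & 0x0F)) > 0x0F) ccr |= CCR_H;   /* carry out of bit 3 */

    *sum = r;
    return ccr;
}
```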
Like Sparc, it is line-oriented.
A line may:
– Be blank (containing no printable characters),
– Be a comment line, the first printable character being either a semicolon (‘;’) or an asterisk (‘*’), or
– Have the following format (“[] means an optional field”):
[Label] Operation [Operand field] [Comment field]
Label:
– begins in column 1, ending either with a space or a colon (‘:’)
– Contains 1 to 15 characters
– Case sensitive
– The first character may not be a decimal digit (0-9)
– Characters may be upper- or lowercase letter, digits 0-9, period (‘.’), dollar sign (‘$’), or underscore (‘_’)
Operation:
– Cannot begin in column 1
– Contains:
Instruction mnemonic,
Assembler directive, or
Macro call (we haven’t studied macro expansion in this course)
Operand field:
– Terminated by a space or tab character,
– So multiple operands are separated by commas (‘,’) without using any spaces or tabs
Comment field:
– Begins with the first space character following the operand field (or following the operation, if there is no operand field)
– So no special printable character is required to begin a comment field
– But it appears to be conventional to begin a comment field with a semicolon (‘;’)
  Encoding       HC11         Sparc
  Decimal        No symbol    No symbol
  Hexadecimal    $            0x
  Octal          @            0
  Binary         %            0b
  HC11   Sparc             Meaning
  ORG    .data or .text    Set location counter (origin)
  END    Doesn’t have      End of source
  EQU    .set              Equate symbol to a value
  FCB    .byte             Form constant byte
  HC11   Sparc     Meaning
  FDB    .half     Form double byte
  FCC    .ascii    Form character string constant
  RMB    .skip     Reserve memory byte or bytes
Immediate (IMM)
Extended (EXT)
Direct (DIR)
Inherent (INH)
Relative (REL)
Indexed (INDX, INDY)
Assembler interprets the # symbol to mean the immediate addressing mode
Examples
– LDAA #10
– LDAA #$1C
– LDAA #@17
– LDAA #%11100
– LDAA #’C’
– LDAA #LABEL
Lack of # symbol indicates extended or direct addressing mode. These are forms of memory direct addressing, like SAM.
“Extended” means full 16-bit address, whereas
“Direct” means directly to a low address, specified using only the least significant 8 bits of the address.
Examples
– LDAA $2025
– LDAA LABEL
Examples
– LDAA $C2
– LDAA LABEL
All operands are implicit (i.e., inherent in the instruction)
Examples: ABA, SBA, DAA
ABA means add the contents of register B to the contents of A, placing the sum in A (A + B → A)
SBA means A – B → A
DAA means to adjust the sum that got placed in A by the previous instruction to the correct BCD result; e.g., $09 + $26 yields $2F in A, then DAA changes this to $35.
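The adjustment can be sketched in C using the common decimal-adjust-after-addition rule. Whether this matches the HC11's DAA in every corner case is an assumption; the function name and signature are hypothetical, and the half-carry and carry inputs stand for the H and C bits left by the preceding add.

```c
#include <stdint.h>

/* Correct a binary sum of two packed BCD bytes to packed BCD.
   carry_out receives the decimal carry (the hundreds digit). */
uint8_t bcd_adjust(uint8_t a, int half_carry, int carry, int *carry_out)
{
    unsigned adjust = 0;

    if (half_carry || (a & 0x0F) > 0x09) {
        adjust |= 0x06;            /* low digit overflowed past 9 */
    }
    if (carry || a > 0x99) {
        adjust |= 0x60;            /* high digit overflowed past 9 */
        carry = 1;
    }
    *carry_out = carry;
    return (uint8_t)(a + adjust);
}
```

For the example above, $09 + $26 leaves $2F in A with H and C clear; the low digit F exceeds 9, so $06 is added and the result is $35.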
Used only for branch instructions
Relative to the address of the following instruction
(the new value of the PC)
Signed offset from -128 to +127 bytes
Examples
– BGE -18
– BHS 27
– BGT LABEL
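The offset arithmetic for relative addressing can be sketched as follows. The function names are hypothetical, and next_pc is assumed to be the address of the instruction after the branch, as stated above.

```c
#include <stdint.h>

/* Effective branch target: signed 8-bit offset added to the new PC,
   wrapping within the 16-bit address space. */
uint16_t branch_target(uint16_t next_pc, int8_t offset)
{
    return (uint16_t)(next_pc + offset);
}

/* What an assembler does for "BGT LABEL": compute the offset and
   report failure (0) if the target is out of the -128..+127 range. */
int encode_branch(uint16_t next_pc, uint16_t target, int8_t *offset)
{
    int32_t distance = (int32_t)target - (int32_t)next_pc;
    if (distance < -128 || distance > 127) {
        return 0;
    }
    *offset = (int8_t)distance;
    return 1;
}
```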
Uses the contents of either the X or Y register and adds it to a (positive, unsigned) offset contained in the instruction to calculate the effective address
Example
– LDAA 4,X
When an interrupt is acknowledged, the CPU’s hardware saves the registers’ contents on the stack.
An interrupt service routine ends with a(n) RTI instruction. This instruction automatically restores the CPU register values from the copies on the stack.
It’s reasonably safe to say that every instruction that changes a register (A, B, D, X, Y, SP) affects the CCR appropriately. Unlike Sparc, there are no arithmetic instructions that do not set condition codes.
There do exist instructions that compare a register to a memory location by subtracting the memory contents from the register and throwing the result away, but setting the CCR (CMPA, CMPB, CPD,
CPX, CPY).
Problem: Produce the following waveforms on the three least significant bits (LSBs) of parallel 8-bit output Port B (mapped to $1004), where we name the bits X, Y, and Z in increasing order of significance (X is bit 0; Y is bit 1; Z is bit 2).
(Waveform figure: X is high for 10 ms, then Y is high for 20 ms, then Z is high for 15 ms.)
STACK: EQU $00FF ; set stack pointer
PORTB: EQU $1004 ; set address of Port B
ORG 0
DELAY1: FCB 10 ; set the waveform times
DELAY2: FCB 20 ; for X, Y, and Z
DELAY3: FCB 15
ORG $E000 ; program starts at $E000
MAIN: LDS #STACK ; initialize stack pointer
L0: LDAA #1 ; set X on Port B to 1
STAA PORTB
LDAB DELAY1 ; delay for 10 ms
L1: JSR DELAY_1MS
DECB
BNE L1
LDAA #%00000010 ; set Y on Port B to 1
STAA PORTB
LDAB DELAY2 ; delay for 20 ms
L2: JSR DELAY_1MS
DECB
BNE L2
LDAA #%00000100 ; set Z on Port B to 1
STAA PORTB
LDAB DELAY3 ; delay for 15 ms
L3: JSR DELAY_1MS
DECB
BNE L3
BRA L0 ; continue to cycle
; subr. to delay for 1 ms
DELAY_1MS: PSHB
LDAB #198
DELAY: DECB
BRN DELAY
NOP
BNE DELAY
PULB
RETURN: RTS
ORG $FFFE ; initialize reset vector
RESET: FDB MAIN
END
Traps, Exceptions, and Extended Operations
– Other side of low level programming -- the interface between applications and peripherals
– OS provides access and protocols
– BIOS: Basic Input/Output System
Subroutines that control I/O
No need for you to write them as application programmer
OS interfaces application with BIOS through traps (extended operations (XOPs))
Applications software
BIOS
Keyboard Screen Mouse Disk
– Where are OS traps kept? Two approaches:
Transient monitor: traps kept in a library that is copied into the application at link-time
(Diagram: Appl 1, Appl 2, Appl 3, and Appl 4 each carry their own copy of the OS rtns.)
Resident monitor: always keep OS in main memory; applications share the trap routines.
(Diagram: Appl 1 through Appl 6 all share a single copy of the OS rtns.)
OS routines monitor devices. Frequently used routines kept resident; others loaded as needed.
– (Assuming a res. monitor) How to find I/O routines?
Store routines in memory, and make a call to a hard address.
E.g., call 256
– When new OS is released, need to recompile all application programs to use different addresses.
Use a dispatcher
– Dispatcher is a subroutine that takes a parameter (the trap number). Dispatcher knows where all routines actually are in memory, and makes the branch for you. Dispatcher subroutine must always exist in the same location.
BIOS 1
Application Dispatcher
BIOS n
Use vectored linking
– Branch table exists at a well known location. The address of each trap subroutine is stored in the table, indexed by the trap number.
– On RISC, usually about 4 words reserved in the table. If the trap routine is larger than 4 words, can call the actual routine.
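Table of addresses (one word per entry):
  100       Addr of trap 0
  104       Addr of trap 1
  108       Addr of trap 2
  100+4n    Addr of trap n
Table with 4 words reserved per entry:
  100       trap 0
  116       trap 1
  132       trap 2
  100+16n   trap n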
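In C, a branch table is an array of function pointers indexed by trap number, and a dispatcher is a function that performs the lookup. The sketch below is illustrative only: the two handlers and all names are made up.

```c
#include <stddef.h>

typedef int (*trap_handler)(int arg);

/* Two stand-in trap routines. */
static int trap_exit(int arg) { (void)arg; return 0; }
static int trap_echo(int arg) { return arg; }

/* The branch table: entry n holds the address of trap n's routine. */
static const trap_handler trap_table[] = { trap_exit, trap_echo };

/* Dispatcher: applications only need to know this one entry point. */
int dispatch(size_t trap_number, int arg)
{
    size_t entries = sizeof trap_table / sizeof trap_table[0];
    if (trap_number >= entries) {
        return -1;                         /* no such trap */
    }
    return trap_table[trap_number](arg);   /* indexed, indirect call */
}
```

A new OS release can move the routines freely; only the table contents change, so applications need not be recompiled.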
– Levels of privilege
Supervisor mode - can access every resource
User mode - limited access to resources
OS routines operate in supervisor mode, access is determined by bit in PSW (processor status word).
XOP (book’s notation) can always be executed, sets privilege to supervisor mode ( ta )
RTX (book’s notation) can only be executed by the OS, and returns privilege to user mode ( rett )
– Exceptions
Caused by invalid use of resource. E.g., divide by zero, invalid address, illegal operation, protection violation, etc.
Control transferred automatically to exception handler routine.
Similar to trap or XOP transfer.
Exceptions vs. XOPs
– XOPs explicit in code, exceptions are implicit
– XOPs service request and return to application; exceptions print message and abort (unless masked ).
– Trap example: non-blocking read ta 3
If there is nothing in the keyboard buffer, return with a message that nothing is there. Otherwise, put the character into register 8.
Status of the keyboard is kept in a memory location, as is the (one-character) keyboard buffer. Memory mapped devices.
! ta 3 returns character if one is there, otherwise
! it returns 0x8000000 into %r8
        set     0x8000000, %r8      ! set default return val
        set     KbdStatus, %r1      ! KbdStatus is memory loc
        ld      [%r1], %r1          ! read status (1 is ready)
        andcc   %r1, 1, %r1         ! check status
        be      rtn                 ! can’t read anything
        nop                         ! branch delay
        set     KbdBuff, %r1        ! KbdBuff is memory loc
        ld      [%r1], %r8          ! get character
rtn:    rett                        ! return to caller
On SPARC, trap table has 256 entries. 0-127 are reserved for exceptions and external interrupts. 128-255 are used for XOPs. Trap table begins at address 0x0000. Each entry is 4 instructions (16 bytes) long.
Trap execution: ta 3
– Calculate trap address: 3 * 16 + 0x0800 = 16 * (3 + 0x080)
– Save nPC and PSW to memory
• SPARC uses register windows
• Assumes local registers are available
– Set privilege level to supervisor mode
– Update PC with trap address (and make nPC = PC + 4) (jumps to trap table)
– Trap table has instruction ba ta3_handler
– rett
• Restores PC (from saved nPC value) and PSW (resets to user mode)
• Returns to application program
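The trap address calculation in the first step is small enough to check in C. The constant and function names are hypothetical; the numbers come from the table layout described above (16-byte entries, XOPs starting at entry 128).

```c
#include <stdint.h>

enum { TRAP_ENTRY_BYTES = 16, FIRST_XOP_ENTRY = 0x80 };

/* Address of the trap table entry used by "ta n". */
uint32_t trap_address(uint32_t ta_number)
{
    return TRAP_ENTRY_BYTES * (ta_number + FIRST_XOP_ENTRY);
}
```

For ta 3 this gives 16 * (3 + 0x80) = 0x830, which equals 3 * 16 + 0x0800.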
Programmed I/O
– Early approach: Isolated I/O
Special instructions to do input and output, using two operands: a register and an I/O address.
CPU puts device address on address bus, and issues an I/O instruction to load from or store to the device.
(Diagram, isolated I/O: the CPU has one set of addr bus, data bus, and read/write lines to Memory and a separate set to I/O.)
– Memory mapped I/O
No special I/O instructions. Treat the I/O device like a memory address. Hardware checks to see if the memory address is in the I/O device range, and makes the adjustment.
Use high addresses (not “real” memory) for I/O memory maps.
E.g., 0xFFFF0000 through 0xFFFFFFFF.
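(Diagram, memory mapped I/O: CPU, Memory, and I/O share one addr bus, data bus, and read/write line; Memory leaves the I/O address range unused, and I/O leaves the memory address range unused.)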
– Advantages of each
Memory mapped: reduced instruction set, reduced redundancy in hardware.
Isolated: don’t have to give up memory address space on machines with little memory
UARTs
– Universal Asynchronous Receiver Transmitter
(Diagram: the Keyboard sends bits serially, e.g. 01101010, to the UART, which presents them in parallel to the CPU.)
– Asynchronous = not on the same clock.
– Handshake coordinates communication between two devices.
– A kind of programmed I/O.
UART registers
– Control: set up at init, speed, parity, etc.
– Status: transmit empty, receive ready, etc.
– Transmit: output data
– Receive: input data
– All four needed for bidirectional communications.
– Status/control, transmit/receive often combined. Why?
(Diagram: the Control Reg, Status Reg, Transmit Reg, and Receive Reg connect to the Control bus, Address bus, and Data bus on one side and to the Transmit Logic and Receive Logic on the other.)
Memory mapped UARTs
– Both memory and I/O “listen” to the address bus. The appropriate device will act based on the addresses.
– Keyboards and Printers require three addresses (when addresses are not combined).
– Modems require four.
– (why?)
(Diagram: CPU, Memory, UART1, and UART2 all attach to the Address bus, Control bus, and Data bus.)
  FFFF 0000   UART 1 data
  FFFF 0004   UART 1 status
  FFFF 0008   UART 1 control
  FFFF 000C   UART 2 xmit
  FFFF 0010   UART 2 recv
  FFFF 0014   UART 2 status
  FFFF 0018   UART 2 control
  FFFF 001C   UART 3 xmit
  and so on
Programmed I/O Characteristics:
– Used to determine if device is ready (can it be read or written).
– Each device has a status register in addition to the data register.
– Like previous trap example, must check status before getting data.
– Involves polling loops .
Ex.: ta 2 handler (blocking keyboard input)
ta_2_handler:
        set     KbdBuff, %r1        ! get addr of kbd buffer
        set     KbdStatus, %r9      ! get addr of kbd status
wait:   ld      [%r9], %r10         ! get status
        andcc   %r10, 1, %r10       ! check if ready
        be      wait                ! loop until ready
        nop                         ! branch delay
        ld      [%r1], %r8          ! get data
        rett                        ! return from trap
Nope ..
Not yet..
Hang on..
Are you ready?...
Are you ready now?...
How about NOW?...
Can’t afford to wait like this. Computer is millions of times faster than a typist. Also, multi-tasking operating systems can’t wait.
Special purpose computers can wait. E.g., microwave oven controllers.
Must have a better way! Interrupts are the answer!
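The difference between the non-blocking read (the earlier ta 3 example) and the blocking, busy-waiting read can be sketched in C. The struct stands in for the memory-mapped KbdStatus and KbdBuff locations; its layout, the function names, and the poll counter are assumptions made for illustration.

```c
#include <stdint.h>

/* Stand-in for the memory-mapped keyboard registers. volatile tells the
   compiler that the device may change them at any time. */
struct kbd {
    volatile uint32_t status;   /* bit 0 set means a character is ready */
    volatile uint32_t buffer;   /* the one-character keyboard buffer */
};

#define KBD_NOTHING 0x8000000u  /* "nothing there" value from the ta 3 example */

/* Non-blocking: check the status once and return immediately. */
uint32_t read_nonblocking(const struct kbd *k)
{
    if ((k->status & 1u) == 0) {
        return KBD_NOTHING;
    }
    return k->buffer;
}

/* Blocking: poll until ready. *polls counts the wasted iterations. */
uint32_t read_blocking(const struct kbd *k, unsigned long *polls)
{
    while ((k->status & 1u) == 0) {
        ++*polls;               /* busy waiting: no useful work is done */
    }
    return k->buffer;
}
```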
Programmed (polled) I/O uses busy waiting.
– Advantages: simpler hardware
– Disadvantages: wastes time
Interrupts (IRQs on PCs)
– I/O device “requests” service from CPU.
– CPU can execute program code until interrupted.
Solves busy waiting problems.
– Interrupt handlers are run (like traps) whenever an interrupt occurs. Current application program is suspended.
Servicing an interrupt
– I/O controller generates interrupt, sets request line “high”.
– CPU detects interrupt at beginning of fetch/execute cycle
(for interrupts “between” instructions).
– CPU saves state of running program, invokes intrpt. handler.
– Handler services request; sets the request line “low”.
– Control is returned to the application program.
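(Diagram: the Application Program runs until *Interrupt Detected*; control passes to the Interrupt Handler, which performs Service Request and then Clear Interrupt; the Application Program then continues.)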
Changes to fetch/execute cycle
Problems
– Requires additional hardware in
Timing & Control.
– Queuing of interrupts
– Interrupting an interrupt handler
(solution: priorities and maskable interrupts)
– Interrupts that must be serviced within an instruction
– How to find address of interrupt handler
(Flowchart: test “Interrupt Pending?”. If Y: Save PC, Save PSW, PSW=new PSW, PC=handler_addr. If N, or after those steps: PC -> bus, load MAR, INC to PC, load PC.)
– Want to print a string without busy waiting.
– Want to return to the application as fast as possible
I’m ready!
Install trap handler into trap table
– Buffer is like circular queue
– only outputs, at most, one character
disp_buf:  .skip 256 ! buffers string to print
disp_frnt: .byte 0 ! offset to front of queue
disp_bck:  .byte 0 ! offset to back of queue
ta_6_handler:
! Copy str from mem[%r8] to mem[disp_buf+disp_bck]
! disp_bck = (disp_bck+len(str)) mod 256
! If display is ready
! If first char is not null, then output it
! disp_frnt = (disp_frnt+1) mod 256
rett ! Return from trap
(Figure: circular buffer with disp_frnt at the oldest undisplayed byte and disp_bck at the newest byte.)
This too outputs only one character at most, but when display becomes ready again, it generates another interrupt which invokes this routine!
display_IRQ_handler:
! Save any registers used
! If disp_frnt != disp_bck (queue is not empty)
! Get char at mem[disp_buf+disp_frnt]
! If char is not null, then output it
! disp_frnt = (disp_frnt+1) mod 256
! Restore registers and set the request line “low”
rett ! Return from trap
Uses the UART for transmission.
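The 256-byte circular queue shared by the trap handler and the interrupt handler can be sketched in C. The names are hypothetical; the 8-bit offsets wrap around on their own, which gives the "mod 256" arithmetic for free. Like the pseudocode, this sketch does not guard against overfilling the buffer.

```c
#include <stdint.h>

enum { DISP_BUF_SIZE = 256 };

struct disp_queue {
    uint8_t buf[DISP_BUF_SIZE];
    uint8_t frnt;   /* offset of the oldest undisplayed byte */
    uint8_t bck;    /* offset where the next byte will be stored */
};

/* Trap handler side: append the string (without its null terminator). */
void enqueue_str(struct disp_queue *q, const char *s)
{
    while (*s != 0) {
        q->buf[q->bck++] = (uint8_t)*s++;   /* bck wraps from 255 to 0 */
    }
}

/* Interrupt handler side: take one character, or -1 if the queue is empty. */
int dequeue_char(struct disp_queue *q)
{
    if (q->frnt == q->bck) {
        return -1;
    }
    return q->buf[q->frnt++];               /* frnt wraps from 255 to 0 */
}
```

Each "display ready" interrupt would call dequeue_char once and send the result to the device, so the application never has to busy wait.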
Problems with interrupt driven I/O
CPU is involved with each interrupt
Each interrupt corresponds to transfer of a single byte
Lots of overhead for large amounts of data (blocks of 512 bytes)
(Diagram: the Device Controller transfers one byte of data and raises an Interrupt; the CPU executes 10s or 100s of instructions per byte and transfers one word of data to Memory.)
DMA (Direct Memory Access)
Want I/O without CPU intervention
Want larger than one byte data transfers
Solution: add a new device that can talk to both I/O devices and memory without the CPU; a “specialized” CPU strictly for data transfers.
(Diagram: CPU, Memory, Device Controller, and DMA Controller.)
Steps to a DMA transfer
– CPU specifies a memory address, the operation (read/write), byte count, and disk block location to the DMA controller (or specify other I/O device).
– DMA controller initiates the I/O, and transfers the data to/from memory directly
– DMA controller interrupts the CPU when the entire block transfer is completed.
Problem
– Conflicts accessing memory. Can either arbitrate access or get a more expensive dual ported memory system.