ECE369: Fundamentals of Computer Architecture

ECE 369
Chapter 3
ECE369
Multiplication
• More complicated than addition
– Accomplished via shifting and addition
• More time and more area
Multiplication: Implementation
[Figure: first-version multiply hardware: 64-bit Multiplicand register (shift left), 32-bit Multiplier register (shift right), 64-bit ALU, 64-bit Product register (write), and a control test unit]
Start
1. Test Multiplier0
– Multiplier0 = 1: 1a. Add multiplicand to product and place the result in Product register
– Multiplier0 = 0: skip step 1a
2. Shift the Multiplicand register left 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
– No (< 32 repetitions): return to step 1
– Yes (32 repetitions): Done
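The flowchart above can be sketched in Python. This is a minimal software model, not the hardware: it assumes unsigned 32-bit operands and models each register as a Python integer masked to its width. `multiply_v1` is an illustrative name.

```python
def multiply_v1(multiplicand: int, multiplier: int) -> int:
    """First-version shift-and-add multiply of two unsigned 32-bit values."""
    MASK64 = (1 << 64) - 1
    mcand = multiplicand & 0xFFFFFFFF   # 64-bit Multiplicand register, value starts in the low half
    mplier = multiplier & 0xFFFFFFFF    # 32-bit Multiplier register
    product = 0                         # 64-bit Product register
    for _ in range(32):                 # 32 repetitions
        if mplier & 1:                  # 1. Test Multiplier0
            product = (product + mcand) & MASK64   # 1a. Product = Product + Multiplicand (64-bit ALU)
        mcand = (mcand << 1) & MASK64   # 2. Shift the Multiplicand register left 1 bit
        mplier >>= 1                    # 3. Shift the Multiplier register right 1 bit
    return product

print(multiply_v1(2, 3))  # 6
```

Because the multiplicand moves left across all 64 bits, this arrangement needs a 64-bit ALU even though only 32 bits change per addition.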
Example
[Figure: worked example on the first-version multiply hardware (64-bit Multiplicand, 32-bit Multiplier, 64-bit ALU, 64-bit Product)]
Second version
[Figure: the first-version hardware repeated for comparison beside the second version: 32-bit Multiplicand register, 32-bit Multiplier register (shift right), 32-bit ALU, 64-bit Product register (shift right, write), and a control test unit]
Start
1. Test Multiplier0
– Multiplier0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
– Multiplier0 = 0: skip step 1a
2. Shift the Product register right 1 bit
3. Shift the Multiplier register right 1 bit
32nd repetition?
– No (< 32 repetitions): return to step 1
– Yes (32 repetitions): Done
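A sketch of the second version, assuming unsigned 32-bit operands and registers modeled as masked Python integers (`multiply_v2` is an illustrative name). One detail the flowchart leaves implicit is assumed here: the 32-bit ALU's carry-out is kept and shifted into the top of the Product register in step 2; without it, large products would lose their top bit.

```python
def multiply_v2(multiplicand: int, multiplier: int) -> int:
    """Second-version multiply: a 32-bit ALU adds into the left half of Product."""
    mcand = multiplicand & 0xFFFFFFFF   # 32-bit Multiplicand register (never shifts)
    mplier = multiplier & 0xFFFFFFFF    # 32-bit Multiplier register
    product = 0                         # 64-bit Product register
    for _ in range(32):
        if mplier & 1:                  # 1. Test Multiplier0
            # 1a. Left half + Multiplicand; the sum can be 33 bits wide
            # (assumed: the ALU carry-out is kept as an extra top bit)
            upper = (product >> 32) + mcand
            product = (upper << 32) | (product & 0xFFFFFFFF)
        product >>= 1                   # 2. Shift Product right; the carry-out moves into bit 63
        mplier >>= 1                    # 3. Shift the Multiplier register right 1 bit
    return product

print(multiply_v2(7, 6))  # 42
```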
Example
[Figure: worked example on the second-version multiply hardware (32-bit Multiplicand, 32-bit Multiplier, 32-bit ALU, 64-bit Product)]
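Final version
[Figure: final multiply hardware: 32-bit Multiplicand register, 32-bit ALU, 64-bit Product register (shift right, write), and a control test unit; there is no separate Multiplier register, because the multiplier starts in the right half of the Product register]
Start
1. Test Product0
– Product0 = 1: 1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register
– Product0 = 0: skip step 1a
2. Shift the Product register right 1 bit
32nd repetition?
– No (< 32 repetitions): return to step 1
– Yes (32 repetitions): Done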
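With the multiplier in the right half of the Product register, each right shift both moves the partial product and exposes the next multiplier bit in Product0. A sketch, assuming unsigned 32-bit operands and that the adder's carry-out is kept as an extra top bit before the shift; `multiply_final` is an illustrative name.

```python
def multiply_final(multiplicand: int, multiplier: int) -> int:
    """Final-version multiply: the multiplier lives in the right half of Product."""
    mcand = multiplicand & 0xFFFFFFFF   # 32-bit Multiplicand register
    product = multiplier & 0xFFFFFFFF   # right half = multiplier, left half = 0
    for _ in range(32):
        if product & 1:                 # 1. Test Product0
            # 1a. Left half + Multiplicand (33-bit sum; carry-out assumed kept)
            upper = (product >> 32) + mcand
            product = (upper << 32) | (product & 0xFFFFFFFF)
        product >>= 1                   # 2. Shift Product right, consuming one multiplier bit
    return product

print(multiply_final(3, 5))  # 15
```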
Example
[Figure: worked example on the final multiply hardware (32-bit Multiplicand, 32-bit ALU, 64-bit Product)]
Division
• Even more complicated
– Can be accomplished via shifting and addition/subtraction
• More time and more area
• Negative numbers: even more difficult
• There are better techniques; we won’t look at them
Division (7÷2)
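The 7÷2 example can be reproduced with a small restoring-division model. This is a sketch under stated assumptions: unsigned n-bit operands (n = 4 here, so 7 = 0111 and 2 = 0010), a 2n-bit Divisor register that starts with the divisor in its left half and shifts right, a Remainder register initialized to the dividend, a quotient that shifts left, n + 1 repetitions, and a nonzero divisor. Python's signed integers stand in for the sign test on the remainder; `divide_restoring` is an illustrative name.

```python
def divide_restoring(dividend: int, divisor: int, n: int = 4) -> tuple[int, int]:
    """Restoring division of unsigned n-bit values; returns (quotient, remainder)."""
    div_reg = divisor << n          # 2n-bit Divisor register: divisor in the left half
    remainder = dividend            # Remainder register starts as the dividend
    quotient = 0                    # n-bit Quotient register
    for _ in range(n + 1):          # n + 1 repetitions
        remainder -= div_reg        # 1. Remainder = Remainder - Divisor
        if remainder >= 0:          # 2a. non-negative: shift a 1 into the quotient
            quotient = (quotient << 1) | 1
        else:                       # 2b. negative: restore, shift a 0 into the quotient
            remainder += div_reg
            quotient <<= 1
        div_reg >>= 1               # 3. Shift the Divisor register right 1 bit
    return quotient, remainder

print(divide_restoring(7, 2))  # (3, 1)
```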
Improved Division
Number Systems
• Fixed point: the binary point of a real number sits in a fixed position
– Can treat real numbers as integers and do the addition or subtraction normally
– Converting 9.8125 to fixed point (4 binary digits after the point):
• Integer part: addition or division rule (9 = 1001)
• Fraction part: keep multiplying the fraction by 2; whenever there is a carry out of the binary point insert 1, otherwise insert 0, then shift left (0.8125 = .1101)
• Result: 1001.1101
• Scientific notation:
– Normalized form keeps one nonzero digit before the point: 3.56*10^8 (not 35.6*10^7)
– May have any number of fraction digits (floating)
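The conversion rules above can be checked with a short sketch. `to_fixed_point` is a hypothetical helper; it assumes a non-negative input, uses repeated division by 2 for the integer part and repeated doubling for the fraction, and truncates any fraction bits beyond `frac_bits` instead of rounding.

```python
def to_fixed_point(value: float, frac_bits: int = 4) -> str:
    """Convert a non-negative real number to a binary fixed-point string."""
    integer = int(value)
    fraction = value - integer
    int_digits = ""
    while integer > 0:                  # division rule: remainders give the bits, LSB first
        int_digits = str(integer % 2) + int_digits
        integer //= 2
    frac_digits = ""
    for _ in range(frac_bits):
        fraction *= 2                   # multiply the fraction by 2
        if fraction >= 1:               # carry out of the binary point: insert 1
            frac_digits += "1"
            fraction -= 1
        else:                           # no carry: insert 0
            frac_digits += "0"
    return f"{int_digits or '0'}.{frac_digits}"

print(to_fixed_point(9.8125))  # 1001.1101
```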
Floating point (a brief look)
• We need a way to represent
– Numbers with fractions, e.g., 3.1416
– Very small numbers, e.g., 0.000000001
– Very large numbers, e.g., 3.15576 x 10^9
• Representation:
– Sign, exponent, fraction: (–1)^sign x fraction x 2^exponent
– More bits for fraction gives more accuracy
– More bits for exponent increases range
• IEEE 754 floating point standard:
– single precision: 8-bit exponent, 23-bit fraction
– double precision: 11-bit exponent, 52-bit fraction
IEEE 754 floating-point standard
• 1.f x 2^e, i.e., 1.s1s2s3s4…sn x 2^e
• Leading “1” bit of significand is implicit
• Exponent is “biased” to make sorting easier
– All 0s is smallest exponent, all 1s is largest
– Bias of 127 for single precision and 1023 for double precision
If exponent bits are all 0s and mantissa bits are all 0s, then zero
If exponent bits are all 1s and mantissa bits are all 0s, then +/- infinity
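The field layout, the special patterns and the sorting benefit of the biased exponent can be seen on real single-precision bit patterns with Python's standard `struct` module. The helper names `single_bits` and `fields` are illustrative, and the sorting check is limited to non-negative values, since negative numbers use a separate sign bit.

```python
import struct

def single_bits(x: float) -> int:
    """32-bit IEEE 754 single-precision pattern of x, read as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def fields(bits: int) -> tuple[int, int, int]:
    """Split a single-precision pattern into (sign, biased exponent, fraction)."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

print(fields(single_bits(0.0)))           # (0, 0, 0): zero
print(fields(single_bits(float("inf"))))  # (0, 255, 0): +infinity
print(fields(single_bits(1.0)))           # (0, 127, 0): exponent 0 stored as 0 + 127

# With a biased exponent placed above the fraction, non-negative floats sort
# in the same order as their bit patterns compared as plain unsigned integers.
values = [1e30, 0.75, 0.0, 3.5, 1e-30, 1.0]
print(sorted(values, key=single_bits) == sorted(values))  # True
```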
Single Precision
– Summary: (–1)^sign x (1 + significand) x 2^(exponent – bias)
• Example:
• 11/100 = 11/10^2 = 0.11 = 1.1 x 10^-1
– Decimal: –.75 = –3/4 = –3/2^2
– Binary: –.11 = –1.1 x 2^-1
– IEEE single precision: 1 01111110 10000000000000000000000
– exponent – bias = –1 => exponent = 126 = 01111110
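The same encoding can be done by hand in code: normalize to 1.xxx, record the exponent, add the bias of 127, and drop the implicit leading 1. `encode_single` is a hypothetical helper that only handles finite, nonzero, normalized values that fit exactly in 23 fraction bits (it does no rounding); the last lines cross-check the result against the pattern produced by `struct`.

```python
import struct

BIAS = 127

def encode_single(x: float) -> str:
    """Hand-encode x as 'sign exponent fraction' (normalized values only, no rounding)."""
    sign = 1 if x < 0 else 0
    mag, exponent = abs(x), 0
    while mag >= 2:                  # normalize to 1.xxx: shift right ...
        mag /= 2
        exponent += 1
    while mag < 1:                   # ... or shift left
        mag *= 2
        exponent -= 1
    fraction = mag - 1               # the leading 1 is implicit, so drop it
    frac_bits = ""
    for _ in range(23):              # 23 fraction bits, by repeated doubling
        fraction *= 2
        bit = int(fraction)
        frac_bits += str(bit)
        fraction -= bit
    return f"{sign} {exponent + BIAS:08b} {frac_bits}"

print(encode_single(-0.75))   # 1 01111110 10000000000000000000000

# Cross-check against the bit pattern Python's struct module produces.
bits = struct.unpack(">I", struct.pack(">f", -0.75))[0]
print(f"{bits >> 31} {(bits >> 23) & 0xFF:08b} {bits & 0x7FFFFF:023b}")
```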
Opposite Way
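Pattern: 1 10000001 01000000000000000000000
Sign: – (sign bit = 1)
Exponent: 129, so exponent – bias = 129 – 127 = 2
Fraction: 0 x 2^-1 + 1 x 2^-2 = 0.25
Value: –(1 + 0.25) x 2^2 = –1.25 x 4 = –5.0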
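Going the opposite way, the value is (–1)^sign x (1 + fraction) x 2^(exponent – 127). A minimal decoder, assuming a normalized pattern (exponent field neither all 0s nor all 1s) written as space-separated sign, exponent and fraction fields; `decode_single` is an illustrative name.

```python
def decode_single(pattern: str) -> float:
    """Decode 'sign exponent fraction' (normalized values only) to a float."""
    sign_s, exp_s, frac_s = pattern.split()
    sign = int(sign_s)
    exponent = int(exp_s, 2)                          # biased exponent, e.g. 129
    fraction = sum(int(bit) * 2.0 ** -(i + 1)         # f1 x 2^-1 + f2 x 2^-2 + ...
                   for i, bit in enumerate(frac_s))
    return (-1) ** sign * (1 + fraction) * 2.0 ** (exponent - 127)

print(decode_single("1 10000001 01000000000000000000000"))   # -5.0
```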
Floating point addition
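1.610 x 10^-1 + 9.999 x 10^1
= 0.01610 x 10^1 + 9.999 x 10^1 (shift the smaller number right to match exponents; with a four-digit significand this is 0.016 x 10^1)
= 10.015 x 10^1 (add the significands: 0.016 + 9.999)
= 1.0015 x 10^2 (normalize)
= 1.002 x 10^2 (round to four digits)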
Floating point addition
[Figure: floating-point adder datapath: a small ALU computes the exponent difference that controls the right shift of the smaller significand; multiplexors (0/1) under Control select the operands; a big ALU adds the significands; the sum is normalized (shift left or right, increment or decrement the exponent) and passed through rounding hardware]
Start
1. Compare the exponents of the two numbers. Shift the smaller number to the right until its exponent would match the larger exponent
2. Add the significands
3. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent
Overflow or underflow?
– Yes: Exception
– No: continue with step 4
4. Round the significand to the appropriate number of bits
Still normalized?
– No: return to step 3
– Yes: Done
Add 0.5 and –0.4375 (decimal)
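The four steps of the addition flowchart can be traced for this example with a small integer model, taking 0.5 = 1.000 x 2^-1 and –0.4375 = –1.110 x 2^-2 with 3 fraction bits. `fp_add` and `to_float` are hypothetical helpers; operands are (sign, exponent, significand) tuples, and the sketch keeps two extra low bits while aligning, drops anything shifted past them, and rounds half up, so it is not a full IEEE 754 implementation.

```python
def fp_add(x, y, frac_bits=3, extra=2):
    """Add two (sign, exponent, significand) values; significand = 1.f scaled by 2**frac_bits.

    Assumes nonzero normalized inputs and extra >= 1.
    """
    (s1, e1, m1), (s2, e2, m2) = x, y
    if e1 < e2:                                   # make x the operand with the larger exponent
        (s1, e1, m1), (s2, e2, m2) = (s2, e2, m2), (s1, e1, m1)
    m1 <<= extra
    m2 = (m2 << extra) >> (e1 - e2)               # 1. shift the smaller number right
    exp = e1
    total = (-m1 if s1 else m1) + (-m2 if s2 else m2)   # 2. add the significands
    if total == 0:
        return (0, 0, 0)                          # exact zero (no special encoding here)
    sign, mag = (1, -total) if total < 0 else (0, total)
    lead = frac_bits + extra                      # bit position of the leading 1 when normalized
    while mag >= 2 << lead:                       # 3. normalize: too big, shift right
        mag >>= 1
        exp += 1
    while mag < 1 << lead:                        #    too small, shift left
        mag <<= 1
        exp -= 1
    mag = (mag + (1 << (extra - 1))) >> extra     # 4. round off the extra bits (half up)
    if mag >= 2 << frac_bits:                     # rounding overflowed: normalize again
        mag >>= 1
        exp += 1
    return (sign, exp, mag)

def to_float(v, frac_bits=3):
    sign, exp, mag = v
    return (-1) ** sign * mag / 2 ** frac_bits * 2.0 ** exp

result = fp_add((0, -1, 0b1000), (1, -2, 0b1110))   # 1.000 x 2^-1 + (-1.110 x 2^-2)
print(result, to_float(result))   # (0, -4, 8) 0.0625
```

The result 1.000 x 2^-4 = 0.0625 shows step 3 shifting left three times after the subtraction cancels the leading bits.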
Multiplication
Floating point multiply
• To multiply two numbers
– Add the two exponents (remember excess-127 notation: the sum contains the bias twice, so subtract it once)
– Produce the result sign as the XOR of the two signs
– Multiply the significands
– The result will be 1x.xxxxx… or 01.xxxx…
– In the first case, shift the result right and adjust the exponent
– Round off the result
– This may require another normalization step
Multiplication of 0.5 and –0.4375 (decimal)
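The multiply steps can be sketched on real single-precision fields, using the standard `struct` module to split and reassemble the bits. Assumptions: normalized inputs; no handling of overflow, underflow, zero, infinity or NaN; and round-half-up instead of IEEE round-to-nearest-even. The helper names are illustrative.

```python
import struct

BIAS = 127

def to_fields(x: float) -> tuple[int, int, int]:
    """(sign, biased exponent, 23-bit fraction) of x as an IEEE 754 single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def from_fields(sign: int, exp: int, frac: int) -> float:
    """Reassemble single-precision fields into a Python float."""
    bits = (sign << 31) | (exp << 23) | frac
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_multiply(x: float, y: float) -> float:
    s1, e1, f1 = to_fields(x)
    s2, e2, f2 = to_fields(y)
    exp = e1 + e2 - BIAS                         # both are excess-127: remove one bias
    sign = s1 ^ s2                               # result sign = XOR of the signs
    prod = ((1 << 23) | f1) * ((1 << 23) | f2)   # 1.f x 1.f, scaled by 2**46
    if prod >> 47:                               # 1x.xxx: shift right, adjust exponent
        prod >>= 1
        exp += 1
    sig = (prod + (1 << 22)) >> 23               # round to 23 fraction bits (half up)
    if sig >> 24:                                # rounding carried out: normalize again
        sig >>= 1
        exp += 1
    return from_fields(sign, exp, sig & 0x7FFFFF)

print(fp_multiply(0.5, -0.4375))   # -0.21875
```

For this example the exponents are 126 and 125, so the result exponent is 126 + 125 – 127 = 124 (that is, 2^-3), and 1.000 x 1.110 = 1.110 needs no extra shift.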
Floating point divide
• To divide two numbers
– Subtract the divisor’s exponent from the dividend’s exponent (remember excess-127 notation: the difference cancels the bias, so add it back)
– Produce the result sign as the XOR of the two signs
– Divide the dividend’s significand by the divisor’s significand
– The result will be 1.xxxxx… or 0.1xxxx…
– In the second case, shift the result left and adjust the exponent
– Round off the result
– This may require another normalization step
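A matching sketch for division on single-precision fields. Assumptions: normalized operands, no special values or exceptions, round-half-up, and the remainder of the significand division is ignored when rounding. The helper names are illustrative.

```python
import struct

BIAS = 127

def to_fields(x: float) -> tuple[int, int, int]:
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def from_fields(sign: int, exp: int, frac: int) -> float:
    bits = (sign << 31) | (exp << 23) | frac
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp_divide(x: float, y: float) -> float:
    s1, e1, f1 = to_fields(x)
    s2, e2, f2 = to_fields(y)
    exp = e1 - e2 + BIAS                 # subtracting cancels the bias, so add it back
    sign = s1 ^ s2                       # result sign = XOR of the signs
    num = (1 << 23) | f1                 # dividend significand 1.f, scaled by 2**23
    den = (1 << 23) | f2                 # divisor significand 1.f, scaled by 2**23
    q = (num << 26) // den               # quotient in (0.5, 2), scaled by 2**26
    if q < (1 << 26):                    # result is 0.1xxx: shift left, decrement exponent
        q <<= 1
        exp -= 1
    sig = (q + (1 << 2)) >> 3            # round 26 fraction bits down to 23 (half up)
    if sig >> 24:                        # rounding carried out: normalize again
        sig >>= 1
        exp += 1
    return from_fields(sign, exp, sig & 0x7FFFFF)

print(fp_divide(3.0, 0.75))   # 4.0
```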
Floating point complexities
• Operations are somewhat more complicated (see text)
• In addition to overflow we can have “underflow”
• Accuracy can be a big problem
– IEEE 754 keeps two extra bits, guard and round
– Four rounding modes
– Positive divided by zero yields “infinity”
– Zero divided by zero yields “not a number”
– Other complexities
• Implementing the standard can be tricky
• Not using the standard can be even worse
– See text for description of 80x86 and Pentium bug!
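As an illustration of why the extra bits matter, the sketch below drops two extra low bits (standing in for guard and round) using round-to-nearest-even, the IEEE 754 default rounding mode. It is a simplification: full IEEE 754 rounding also uses a sticky bit, and a round-up that carries out of the significand needs another normalization step. `round_nearest_even` is a hypothetical helper.

```python
def round_nearest_even(value: int, extra_bits: int = 2) -> int:
    """Drop `extra_bits` low bits (here: guard and round) with round-to-nearest-even."""
    kept = value >> extra_bits
    dropped = value & ((1 << extra_bits) - 1)
    half = 1 << (extra_bits - 1)             # dropped pattern 10...0 is exactly one half
    if dropped > half or (dropped == half and kept & 1):
        kept += 1                            # round up; ties go to the even neighbor
    return kept

for v in (0b10001, 0b10010, 0b10011, 0b10110):
    print(f"{v:05b} -> {round_nearest_even(v):03b}")
```

The two ties (10010 and 10110) round in opposite directions, toward whichever neighbor ends in 0.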
Let’s Build a Processor, Introduction to Instruction Set Architecture
• First Step Into Your Project!!!
• How could we build a 1-bit ALU for add, and, or?
• Need to support the set-on-less-than instruction (slt)
– slt is an arithmetic instruction
– produces a 1 if a < b and 0 otherwise
– use subtraction: (a – b) < 0 implies a < b
• Need to support the test-for-equality operation (beq $t5, $t6, Label)
– use subtraction: (a – b) = 0 implies a = b
• How could we build a 32-bit ALU?
[Figure: 32-bit ALU symbol with 32-bit inputs a and b and a 32-bit result]
Must Read Appendix
One-bit adder
• Takes three input bits and generates two output bits
• Multiple bits can be cascaded
cout = a.b + a.cin + b.cin
sum = a <xor> b <xor> cin
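The two equations translate directly into a 1-bit full adder. The sketch below treats bits as 0/1 integers (`full_adder` is an illustrative name) and prints the full truth table.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, cout) for bits a, b, cin."""
    s = a ^ b ^ cin                              # sum = a xor b xor cin
    cout = (a & b) | (a & cin) | (b & cin)       # cout = a.b + a.cin + b.cin
    return s, cout

for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, "->", *full_adder(a, b, cin))
```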
Building a 32 bit ALU
[Figure: left, a 1-bit ALU with inputs a and b, CarryIn and CarryOut, and a multiplexor controlled by Operation (inputs 0, 1, 2) that selects Result; right, a 32-bit ALU built from ALU0 … ALU31, where each CarryOut feeds the next ALU’s CarryIn, with inputs a0 … a31 and b0 … b31 and outputs Result0 … Result31]
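A software model of the ripple-carry arrangement, assuming Operation 0 = AND, 1 = OR, 2 = add as the multiplexor inputs of the 1-bit ALU. The names `alu1` and `alu32_basic` are illustrative; the loop visits the bits in order, mirroring how the carry ripples from ALU0 to ALU31.

```python
def alu1(a: int, b: int, carry_in: int, operation: int) -> tuple[int, int]:
    """1-bit ALU: Operation 0 = AND, 1 = OR, 2 = add; returns (Result, CarryOut)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (a & b, a | b, s)[operation]    # the multiplexor
    return result, carry_out

def alu32_basic(a: int, b: int, operation: int) -> tuple[int, int]:
    """Chain 32 copies of alu1; each CarryOut feeds the next CarryIn."""
    result, carry = 0, 0
    for i in range(32):                      # ALU0 ... ALU31
        bit, carry = alu1((a >> i) & 1, (b >> i) & 1, carry, operation)
        result |= bit << i
    return result, carry                     # carry is the CarryOut of ALU31

print(alu32_basic(5, 3, 2))   # (8, 0)
```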
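What about subtraction (a – b)?
• Two’s complement approach: just negate b and add.
• How do we negate?
• A very clever solution: invert b (Binvert) and set the CarryIn of the least significant bit to 1, since a + ~b + 1 = a – b
000 = and
001 = or
010 = add
110 = subtract
[Figure: 1-bit ALU with inputs a and b, a Binvert multiplexor that selects b or its complement, CarryIn and CarryOut, and an Operation-controlled multiplexor (inputs 0, 1, 2) that selects Result]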
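The subtraction trick can be checked in software: with Binvert = 1 every bit of b is inverted and the CarryIn of the least significant bit is 1, so the adder computes a + ~b + 1 = a – b. In this sketch the 3-bit control string is read as Binvert followed by the 2-bit Operation, which matches the codes listed above; `alu32_sub` is a hypothetical name and results wrap modulo 2^32.

```python
def alu32_sub(a: int, b: int, control: str) -> int:
    """32-bit ALU with a Binvert mux; control = Binvert bit + 2-bit Operation."""
    binvert = int(control[0])
    operation = int(control[1:], 2)          # 0 = AND, 1 = OR, 2 = add
    carry = binvert                          # CarryIn of ALU0 = 1 when subtracting
    result = 0
    for i in range(32):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ binvert        # Binvert mux: b or its complement
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= (ai & bi, ai | bi, s)[operation] << i
    return result

print(alu32_sub(7, 2, "110"))   # 5
```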
Supporting Slt
• Can we figure out the idea?
000 = and
001 = or
010 = add
110 = subtract
111 = slt
Test for equality
• Notice control lines: Bnegate and Operation
000 = and
001 = or
010 = add
110 = subtract
111 = slt
[Figure: 32-bit ALU built from ALU0 … ALU31; each ALUi takes ai, bi, CarryIn and a Less input (0 for ALU1 … ALU31, the Set output of ALU31 for ALU0); ALU31 also produces Set and Overflow; the Result bits are NORed to produce Zero]
• Note: Zero is a 1 if result is zero!
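Putting the pieces together, the sketch below models the 32-bit ALU with a combined Bnegate line, the Less/Set path for slt, and the Zero output used for beq. Assumptions: the control string is Bnegate followed by the 2-bit Operation (3 selects Less); Less of ALU0 is the Set output (sum bit 31 of a – b) and Less of every other bit is 0; and the overflow correction a real slt needs is ignored. `alu32_full` is an illustrative name.

```python
def alu32_full(a: int, b: int, control: str) -> tuple[int, int]:
    """Returns (Result, Zero). control = Bnegate bit + 2-bit Operation."""
    bnegate = int(control[0])
    operation = int(control[1:], 2)          # 0 AND, 1 OR, 2 add/subtract, 3 Less
    carry = bnegate                          # Bnegate drives Binvert and CarryIn of ALU0
    result, set_bit = 0, 0
    for i in range(32):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ bnegate
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        if i == 31:
            set_bit = s                      # Set: sign bit of the adder output
        # Less input is 0 for every bit here; ALU0's Less is patched in below
        result |= (ai & bi, ai | bi, s, 0)[operation] << i
    if operation == 3:
        result = set_bit                     # Less of ALU0 = Set, all other bits 0
    zero = int(result == 0)                  # Zero = NOR of all Result bits
    return result, zero

print(alu32_full(2, 7, "111"))   # (1, 0): 2 < 7
print(alu32_full(5, 5, "110"))   # (0, 1): equal, so beq would branch
```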
How about “a nor b”
000 = and
001 = or
010 = add
110 = subtract
111 = slt
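One possible answer uses De Morgan’s law: a nor b = ~(a | b) = (~a) & (~b), so inverting both inputs and selecting the AND path yields NOR. The sketch assumes a hypothetical Ainvert control alongside Binvert; neither the name nor any control code for NOR comes from the slide.

```python
MASK = 0xFFFFFFFF

def alu_and_path(a: int, b: int, ainvert: int = 0, binvert: int = 0) -> int:
    """AND path of a 32-bit ALU with (hypothetical) Ainvert and Binvert muxes."""
    if ainvert:
        a = ~a & MASK                        # Ainvert mux selects the complement of a
    if binvert:
        b = ~b & MASK                        # Binvert mux selects the complement of b
    return a & b

def nor(a: int, b: int) -> int:
    # De Morgan: ~(a | b) == (~a) & (~b)
    return alu_and_path(a, b, ainvert=1, binvert=1)

print(bin(nor(0b1100, 0b1010) & 0b1111))   # 0b1
```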
Big Picture
Conclusion
• We can build an ALU to support an instruction set
– key idea: use a multiplexor to select the output we want
– we can efficiently perform subtraction using two’s complement
– we can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware
– all of the gates are always working
– speed of a gate is affected by the number of inputs to the gate
– speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”)
• Our primary focus: comprehension; however,
– Clever changes to organization can improve performance (similar to using better algorithms in software)
• How about my instruction smt (set if more than)???
ALU Summary
• We can build an ALU to support addition
• Our focus is on comprehension, not performance
• Real processors use more sophisticated techniques for arithmetic
• Where performance is not critical, hardware description languages allow designers to completely automate the creation of hardware!
Optional Reading
Overflow
Formulation
A Simpler Formula?
Problem: Ripple carry adder is slow!
• Is a 32-bit ALU as fast as a 1-bit ALU?
• Is there more than one way to do addition?
• Can you see the ripple? How could you get rid of it?
c1 = a0b0 + a0c0 + b0c0
c2 = a1b1 + a1c1 + b1c1
c3 = a2b2 + a2c2 + b2c2
c4 = a3b3 + a3c3 + b3c3
• Rewrite without the ripple, using only the a’s, b’s and c0:
c2 =
c3 =
c4 =
Not feasible! Why?
Carry Bit
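$c_{out} = ab + a\,c_{in} + b\,c_{in}$
$c_{i+1} = a_i b_i + a_i c_i + b_i c_i$
$c_1 = a_0 b_0 + a_0 c_0 + b_0 c_0$
$c_2 = a_1 b_1 + a_1 c_1 + b_1 c_1$
$\quad = a_1 b_1 + a_1 (a_0 b_0 + a_0 c_0 + b_0 c_0) + b_1 (a_0 b_0 + a_0 c_0 + b_0 c_0)$
$\quad = a_1 b_1 + a_1 a_0 b_0 + a_1 a_0 c_0 + a_1 b_0 c_0 + b_1 a_0 b_0 + b_1 a_0 c_0 + b_1 b_0 c_0$
$c_3 = a_2 b_2 + a_2 c_2 + b_2 c_2$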
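To see why flattening the carries is not feasible, the sketch below expands c1 … c4 symbolically into sums of products over the a’s, b’s and c0, removing only duplicate literals and duplicate terms (no further simplification). Each product term is a set of variable names; `expand_carries` is a hypothetical helper. The term count roughly doubles per bit (3, 7, 15, 31), and the 7 terms of c2 match the expansion above.

```python
def expand_carries(n: int) -> list[set[frozenset[str]]]:
    """Expand c1..cn into sum-of-products form over a_i, b_i and c0."""
    c = {frozenset({"c0"})}                     # c0 as a single product term
    carries = []
    for i in range(n):
        a, b = f"a{i}", f"b{i}"
        nxt = {frozenset({a, b})}               # a_i b_i
        nxt |= {term | {a} for term in c}       # a_i c_i, distributed over c_i's terms
        nxt |= {term | {b} for term in c}       # b_i c_i, distributed over c_i's terms
        carries.append(nxt)
        c = nxt
    return carries

for i, terms in enumerate(expand_carries(4), start=1):
    print(f"c{i}: {len(terms)} product terms")
```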
Generate/Propagate
$c_1 = a_0 b_0 + a_0 c_0 + b_0 c_0$
$c_1 = a_0 b_0 + (a_0 + b_0) c_0$
$c_2 = a_1 b_1 + a_1 c_1 + b_1 c_1$
$\quad = a_1 b_1 + (a_1 + b_1) c_1$
$\quad = a_1 b_1 + (a_1 + b_1)(a_0 b_0 + (a_0 + b_0) c_0)$
Common terms: $\{a_i b_i,\ a_i + b_i\}$

ai  bi  ci+1 (ci = 0)  ci+1 (ci = 1)
0   0   0              0
0   1   0              1
1   0   0              1
1   1   1              1

generate: $g_i = a_i b_i$
propagate: $p_i = a_i + b_i$
Generate/Propagate (Ctd.)
generate: $g_i = a_i b_i$
propagate: $p_i = a_i + b_i$
$c_1 = a_0 b_0 + (a_0 + b_0) c_0$
$\quad = g_0 + p_0 c_0$
$c_2 = a_1 b_1 + (a_1 + b_1) c_1$
$\quad = g_1 + p_1 (g_0 + p_0 c_0)$
$\quad = g_1 + p_1 g_0 + p_1 p_0 c_0$
$c_{i+1} = g_i + p_i c_i$
Carry-look-ahead adder
• Motivation:
– If we didn't know the value of carry-in, what could we do?
– When would we always generate a carry? gi = ai . bi
– When would we propagate the carry? pi = ai + bi
• Did we get rid of the ripple?
For operands a3 a2 a1 a0 and b3 b2 b1 b0:
c1 = g0 + p0c0
c2 = g1 + p1c1     c2 = g1 + p1g0 + p1p0c0
c3 = g2 + p2c2     c3 = g2 + p2g1 + p2p1g0 + p2p1p0c0
c4 = g3 + p3c3     c4 = g3 + p3g2 + p3p2g1 + p3p2p1g0 + p3p2p1p0c0
• Feasible! Why?
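The flattened equations can be checked against the ripple-carry recurrence for every 4-bit input. In the sketch, `cla_carries` builds each carry as the sum of products g(i-1) + p(i-1)g(i-2) + … + p(i-1)…p0c0, so no carry depends on the previous carry; the loops only enumerate terms. Function names are illustrative.

```python
def ripple_carries(a: int, b: int, c0: int, n: int = 4) -> list[int]:
    """Reference: c(i+1) = ai.bi + ai.ci + bi.ci, one bit after another."""
    carries, c = [], c0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        c = (ai & bi) | (ai & c) | (bi & c)
        carries.append(c)
    return carries

def cla_carries(a: int, b: int, c0: int, n: int = 4) -> list[int]:
    """c1..cn from generate/propagate terms only (no use of earlier carries)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]   # gi = ai . bi
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(n)]   # pi = ai + bi
    carries = []
    for i in range(1, n + 1):
        # ci = g(i-1) + p(i-1)g(i-2) + ... + p(i-1)...p1 g0 + p(i-1)...p0 c0
        ci, prefix = 0, 1             # prefix = product of the p's seen so far
        for j in range(i - 1, -1, -1):
            ci |= prefix & g[j]
            prefix &= p[j]
        ci |= prefix & c0
        carries.append(ci)
    return carries

print(cla_carries(0b0111, 0b0001, 0), ripple_carries(0b0111, 0b0001, 0))   # [1, 1, 1, 0] [1, 1, 1, 0]
```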
A 4-bit carry look-ahead adder
• Generate g and p terms for each bit
• Use the g’s, p’s and carry in to generate all C’s
• Also use them to generate block G and P
• The CLA principle can be used recursively
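A sketch of the recursive use: four 4-bit groups each produce a block G and P, and the same lookahead equations applied to those G’s and P’s give the carries into each group. This is a software model of a 16-bit two-level CLA under the definitions above (gi = ai . bi, pi = ai + bi); the helper names are hypothetical, and it checks only logical correctness, not gate delay.

```python
def gen_prop(a: int, b: int, n: int) -> tuple[list[int], list[int]]:
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]   # gi = ai . bi
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(n)]   # pi = ai + bi
    return g, p

def lookahead(g: list[int], p: list[int], c0: int) -> list[int]:
    """c1..c4 from four (g, p) pairs and c0; used at both levels."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]

def block_gp(g: list[int], p: list[int]) -> tuple[int, int]:
    """Block generate and propagate for one 4-bit group."""
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P

def cla16(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """16-bit two-level carry-lookahead add; returns (sum, carry out)."""
    g, p = gen_prop(a, b, 16)
    blocks = [block_gp(g[4 * k:4 * k + 4], p[4 * k:4 * k + 4]) for k in range(4)]
    Gs = [G for G, _ in blocks]
    Ps = [P for _, P in blocks]
    C = [c0] + lookahead(Gs, Ps, c0)   # C[k] = carry into group k, C[4] = carry out
    total = 0
    for k in range(4):
        cs = [C[k]] + lookahead(g[4 * k:4 * k + 4], p[4 * k:4 * k + 4], C[k])
        for j in range(4):             # cs[j] = carry into bit j of group k
            i = 4 * k + j
            total |= (((a >> i) & 1) ^ ((b >> i) & 1) ^ cs[j]) << i
    return total, C[4]

print(cla16(0xFFFF, 1))   # (0, 1)
```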
16 Bit CLA
Gate Delay for 16 bit Adder
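[Figure: gate-delay annotations on the 16-bit adder: generate $g_i = a_i b_i$ and propagate $p_i = a_i + b_i$ take 1 gate delay; later levels are marked 1 + 2 and 1 + 2 + 2 gate delays]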
64-bit carry lookahead adder