Emulating ‘MUL’ How multiplication of unsigned integers can be performed in software

```
Hardware support absent?
• The earliest microprocessors (from Intel,
Zilog, and others) did not implement
multiplication instructions, so that operation
had to be done in software
• Even with our Core-2 Quad processor,
there is an ultimate limit on the sizes of
integers that can be multiplied using the
processor’s built-in ‘multiply’ instructions
Multiplying with Base Ten
• Here’s how we learn to multiply multi-digit
integers using ordinary ‘decimal’ notation
          8765    <- multiplicand
    x     4321    <- multiplier
    ----------
          8765    <- partial product
        17530     <- partial product
       26295      <- partial product
    + 35060       <- partial product
    ----------
      37873565    <- product
Analogy with Base Ten
• When multiplying multi-digit integers using
‘binary’ notation, we apply the same idea
         1001    <- multiplicand (=9)
    x    1011    <- multiplier (=11)
    ---------
         1001    <- partial product
        1001     <- partial product
       0000      <- partial product
    + 1001       <- partial product
    ---------
      1100011    <- product (=99)
Some observations…
• With ‘binary’ multiplication of two N-digit
values, the product can require up to 2N digits
• Each ‘partial product’ is either zero or is
equal to the value of the ‘multiplicand’
• Each succeeding partial product is ‘shifted’
one place further to the left
• So if we want to ‘emulate’ multiplication of
unsigned binary integers using software,
we must implement these observations
8-bit case
• The smallest case to consider is using the
‘mul’ instruction to compute the product of
8-bit values, say in registers AL and BL
        .section .data
number1: .byte  100
number2: .byte  200
product: .word  0

        .section .text
        mov     number1, %al
        mov     number2, %bl
        mul     %bl
        mov     %ax, product
Doing it by hand
100 = 0x64 = 01100100 (binary)      AL = 01100100
200 = 0xC8 = 11001000 (binary)      BL = 11001000

              00000000    (BL x 0)
             00000000     (BL x 0)
            11001000      (BL x 1)
           00000000       (BL x 0)
          00000000        (BL x 0)
         11001000         (BL x 1)
        11001000          (BL x 1)
    +  00000000           (BL x 0)
    ------------------
      0100111000100000

20000 = 0x4E20 = 0100111000100000 (binary)      AX = 01001110 00100000
Using x86 instructions
        .section .text
        .globl  softmulb
softmulb:
        push    %rcx            # save caller’s count-register
        sub     %ah, %ah        # zero-extend AL to (CF:AH:AL)
        mov     $9, %rcx        # number of bits in (CF:AH)
nxbit8: rcr     $1, %ax         # next multiplier-bit to Carry-Flag
        jnc     1f              # skip addition if CF-bit is zero
        add     %bl, %ah        # add multiplicand to AH
1:      loop    nxbit8          # go back to shift in next CF-bit
        sub     %ah, %cl        # set CF-bit if 8-bits exceeded
        pop     %rcx            # recover caller’s count-register
        ret
Visual depiction
RCR $1, %AX

     CF         AH            AL
    [ 0 ]   [00000000]   [multiplier]
    17-bit value (CF:AH:AL) gets ‘rotated’ 1-place to the right

     BL
    [multiplicand]
    then multiplicand is added to AH (unless CF = 0)
Exhaustive testing
• We can insert ‘inline assembly language’
in a C++ program to construct a loop that
checks our software multiply operation
against every case of the CPU’s ‘mul’
• Our test-program is named ‘multest.cpp’
• You can compile it like this:
$ g++ multest.cpp softmulb.s -o multest
In-class exercises
• Can you write a ‘software’ emulation for
the CPU’s 16-bit multiply operation?
mul %bx
• Can you write a ‘software’ emulation for
the CPU’s 32-bit multiply operation?
mul %ebx
```
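The shift-and-add idea from the slides, together with the exhaustive check over all 8-bit operand pairs, can be sketched in plain C++. This is only an illustrative sketch and not the course's `multest.cpp`: the names `Mul8Result`, `soft_mul8`, and `count_mismatches` are hypothetical, the loop adds first and then shifts (rather than rotating through the carry flag as `softmulb` does), and the reference value comes from the C++ `*` operator instead of an inline-assembly `mul`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: the 16-bit product (what MUL leaves in AX) and
// the carry flag MUL would report (set when the upper 8 bits are nonzero).
struct Mul8Result {
    std::uint16_t product;
    bool carry;
};

// Shift-and-add emulation of an 8-bit unsigned multiply (AX = AL * BL).
// 'acc' models the 17-bit quantity CF:AH:AL used on the slides:
// bits 0-7 start out as the multiplier, bits 8-16 hold the running sum.
Mul8Result soft_mul8(std::uint8_t al, std::uint8_t bl) {
    std::uint32_t acc = al;
    for (int i = 0; i < 8; ++i) {
        if (acc & 1u) {
            // Current multiplier bit is 1: add the multiplicand to the high byte.
            acc += static_cast<std::uint32_t>(bl) << 8;
        }
        assert(acc < (1u << 17));   // never wider than CF:AH:AL
        acc >>= 1;                  // move on to the next multiplier bit
    }
    Mul8Result r;
    r.product = static_cast<std::uint16_t>(acc);
    r.carry = (acc >> 8) != 0;
    return r;
}

// Exhaustive check: compare the emulation with the built-in multiply for
// all 256 x 256 operand pairs and return how many pairs disagree.
int count_mismatches() {
    int bad = 0;
    for (unsigned a = 0; a < 256; ++a) {
        for (unsigned b = 0; b < 256; ++b) {
            Mul8Result r = soft_mul8(static_cast<std::uint8_t>(a),
                                     static_cast<std::uint8_t>(b));
            std::uint16_t expected = static_cast<std::uint16_t>(a * b);
            if (r.product != expected || r.carry != (expected > 0xFF)) {
                ++bad;
            }
        }
    }
    return bad;
}
```

Calling `soft_mul8(100, 200)` reproduces the worked example from the slides (product 20000 = 0x4E20, carry set), and `count_mismatches()` should return 0 if the emulation agrees with the built-in multiply in every case. Widening the operand and accumulator types in the same pattern is one possible starting point for the 16-bit and 32-bit exercises.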