CSE240 Homework 2

Huaxia Xia, 10/11/2000
H&P2.1 Answer:
From Figure 2.26, we get:
The fraction of instructions that are data references (i.e., "load", "store", and "load imm") is:
26% + 9% + 4% = 39%
The fraction of instructions that are branches (cond branch, jump, call, return) is:
17% + 1% + 1% + 1% = 20%
a. For data references, 17% need a 0-bit offset, so the instruction length is
16 bits. Since one bit is needed for the sign, only offsets of 1 to 7 bits can
be encoded in the 8-bit mode; their share is 57% - 17% = 40%, with an
instruction length of 24 bits. The remaining 100% - 57% = 43% need a 16-bit
offset, with an instruction length of 32 bits.
As for branches, 0% need a 0-bit offset; 93% need an 8-bit offset, with an
instruction length of 24 bits; and 7% need a 16-bit offset, with an instruction
length of 32 bits.
The remaining 100% - 39% - 20% = 41% of instructions carry no offset and take
16 bits. Thus the average instruction length is:
16*41% + (16*17% + 24*40% + 32*43%)*39% + (24*93% + 32*7%)*20%
= 21.64 bits
b. For data references, 43% of them need more than 8 bits; for branches, 7% of
them need more than 8 bits.
The average instruction length is:
24*(1-43%*39% - 7%*20%) + 48*(43%*39% + 7%*20%) = 28.36 bits
c. For an instruction set with a fixed offset length of 16 bits, all branch and
data reference instructions need 32 bits. The average instruction length is:
16*(1 - 39% - 20%) + 32*(39% + 20%) = 25.44 bits
Summary: If we consider only code length, (a) is the best, followed by (c) and
then (b).
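The three averages can be re-derived with a short Python sketch. The percentages are the ones quoted above from Figure 2.26; the constant and function names are made up for illustration, and the 24/48-bit split in (b) is read off the formula in the text rather than from the problem statement.

```python
# Instruction mix quoted in the text (fractions of all instructions).
DATA_REF = 0.39                      # load + store + load imm
BRANCH = 0.20                        # cond branch + jump + call + return
OTHER = 1.0 - DATA_REF - BRANCH      # instructions with no offset

# Offset-size shares quoted in the text.
DATA_0BIT, DATA_8BIT, DATA_16BIT = 0.17, 0.40, 0.43
BRANCH_8BIT, BRANCH_16BIT = 0.93, 0.07


def avg_length_a():
    """(a) 16-bit instruction plus a 0-, 8-, or 16-bit offset."""
    data = 16 * DATA_0BIT + 24 * DATA_8BIT + 32 * DATA_16BIT
    branch = 24 * BRANCH_8BIT + 32 * BRANCH_16BIT
    return 16 * OTHER + data * DATA_REF + branch * BRANCH


def avg_length_b():
    """(b) 24 bits, or 48 bits when the offset needs more than 8 bits."""
    long_fraction = DATA_16BIT * DATA_REF + BRANCH_16BIT * BRANCH
    return 24 * (1 - long_fraction) + 48 * long_fraction


def avg_length_c():
    """(c) 16 bits without an offset, 32 bits with a fixed 16-bit offset."""
    return 16 * OTHER + 32 * (DATA_REF + BRANCH)


for name, fn in (("a", avg_length_a), ("b", avg_length_b), ("c", avg_length_c)):
    print(f"({name}) {fn():.2f} bits")
```

Changing any of the shares at the top shows how sensitive the ranking is to the offset distribution.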
H&P2.2 Answer:
a. Assume we need to eliminate x% of the instructions to get the same performance.
(1-x%) * IC * CPI * (1+10%)*ClockCycle = IC * CPI * ClockCycle
 x% = 9.09%
From Figure 2.26, we know that the frequency of loads is 22.8%, so the
percentage of loads that must be eliminated is:
9.09% / 22.8% = 39.9%
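As a cross-check of part a, here is a minimal Python sketch that solves the break-even equation for x and converts it to a share of loads. The function names are hypothetical, and the load frequency is passed in as a parameter so the figure from the text can be substituted.

```python
def required_reduction(clock_slowdown):
    """Fraction of instructions to remove so CPU time is unchanged.

    Solves (1 - x) * (1 + clock_slowdown) = 1 for x, with CPI held fixed.
    """
    return 1.0 - 1.0 / (1.0 + clock_slowdown)


def fraction_of_loads_eliminated(clock_slowdown, load_frequency):
    """Share of all loads that must disappear to reach the break-even point."""
    return required_reduction(clock_slowdown) / load_frequency


x = required_reduction(0.10)
print(f"x = {x:.2%}")   # x = 9.09%
print(f"loads eliminated = {fraction_of_loads_eliminated(0.10, 0.228):.1%}")
```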
b. For example:
LOAD R1, 0(Rb)
ADD R2, R1, R1
Since a register-memory instruction can take only one memory operand, we cannot
substitute the memory operand for both uses of "R1" at the same time.
H&P2.6 Answer:
The program code:
SW R0, 2000(R0)  ; store 0 to i
loop: LW R1, 2000(R0)  ; load i to R1
SLEI R10, R1, #100  ; if i<=100, R10 <- 1; else R10 <- 0
BEQZ R10, endloop  ; if i>100, jump to endloop
SLLI R2, R1, #2  ; R2 <- R1*4, the offset into the array
LW R3, 5000(R2)  ; load B[i]
LW R4, 1500(R0)  ; load C
ADD R5, R3, R4  ; R5 <- B[i] + C
SW R5, 0(R2)  ; store R5 to A[i]
ADDI R1, R1, #1  ; i <- i+1
SW R1, 2000(R0)  ; store i back to memory
J loop  ; jump to next iteration
endloop: …
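In total, the number of instructions executed is 1 + 11 * 101 = 1112, plus 3
more (LW, SLEI, BEQZ) for the final test that exits the loop, giving 1115; the
number of memory-data accesses is 1 + 5 * 101 = 506, plus 1 for the final load
of i, giving 507; the code size is 12 * 4 = 48 bytes.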
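The counts for this listing can be tallied mechanically. The Python sketch below marks each instruction as touching data memory or not, then multiplies out the 101 iterations; it also reports separately the final loop test (LW, SLEI, BEQZ) that runs once more when i reaches 101, which the expression 1 + 11 * 101 does not include. All names are illustrative.

```python
# Each entry is (mnemonic, accesses_data_memory), copied from the listing above.
INIT = [("SW", True)]                                   # store 0 to i
LOOP_TEST = [("LW", True), ("SLEI", False), ("BEQZ", False)]
LOOP_BODY = [
    ("SLLI", False), ("LW", True), ("LW", True), ("ADD", False),
    ("SW", True), ("ADDI", False), ("SW", True), ("J", False),
]
ITERATIONS = 101            # i = 0 .. 100
BYTES_PER_INSTRUCTION = 4


def count(instrs):
    """Return (instruction count, data-memory access count) for a list."""
    return len(instrs), sum(1 for _, touches_memory in instrs if touches_memory)


def tally(include_exit_test=False):
    """Dynamic totals: initialization plus ITERATIONS full passes of the loop."""
    init_n, init_m = count(INIT)
    iter_n, iter_m = count(LOOP_TEST + LOOP_BODY)
    total_n = init_n + iter_n * ITERATIONS
    total_m = init_m + iter_m * ITERATIONS
    if include_exit_test:
        # The failing test at i = 101 still executes LW, SLEI, BEQZ.
        exit_n, exit_m = count(LOOP_TEST)
        total_n += exit_n
        total_m += exit_m
    return total_n, total_m


code_size = (len(INIT) + len(LOOP_TEST) + len(LOOP_BODY)) * BYTES_PER_INSTRUCTION

print(tally())                         # (1112, 506)
print(tally(include_exit_test=True))   # (1115, 507)
print(code_size)                       # 48
```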
H&P2.8 Answer:
The program code: Assume i, C, the base address of A, and the base address of B
are held in R1, R2, R3, and R4, respectively.
ADD R1, R0, R0  ; store 0 to i
loop: SLEI R10, R1, #100  ; if i<=100, R10 <- 1; else R10 <- 0
BEQZ R10, endloop  ; if i>100, jump to endloop
SLLI R5, R1, #2  ; R5 <- R1*4, the offset into the array
ADD R6, R4, R5  ; R6 <- address of B[i]
LW R7, 0(R6)  ; R7 <- B[i]
ADD R7, R7, R2  ; R7 <- B[i] + C
ADD R6, R3, R5  ; R6 <- address of A[i]
SW R7, 0(R6)  ; store R7 to A[i]
ADDI R1, R1, #1  ; i <- i+1
J loop  ; jump to next iteration
endloop: …
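In total, the number of instructions executed is 1 + 10 * 101 = 1011, plus 2
more (SLEI, BEQZ) for the final test that exits the loop, giving 1013; the
number of memory-data accesses is 2 * 101 = 202; the code size is 11 * 4 = 44 bytes.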
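To check the register version end to end, the following Python sketch interprets just the handful of DLX-like opcodes used in the listing and counts what it executes. It is a toy interpreter written for this note, not a DLX simulator. The base addresses (A at 0, B at 5000), the value of C, and the contents of B are assumptions chosen only to make the run concrete. Because it actually executes the final failing loop test (SLEI, BEQZ), its instruction count is 2 higher than 1 + 10 * 101.

```python
def run(program, labels, regs, mem):
    """Execute the program; return (instructions executed, data accesses)."""
    pc = executed = accesses = 0
    while pc < len(program):
        op, *args = program[pc]
        executed += 1
        pc += 1
        if op == "ADD":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "ADDI":
            dst, a, imm = args
            regs[dst] = regs[a] + imm
        elif op == "SLLI":
            dst, a, imm = args
            regs[dst] = regs[a] << imm
        elif op == "SLEI":
            dst, a, imm = args
            regs[dst] = 1 if regs[a] <= imm else 0
        elif op == "BEQZ":
            reg, target = args
            if regs[reg] == 0:
                pc = labels[target]
        elif op == "J":
            pc = labels[args[0]]
        elif op == "LW":
            dst, offset, base = args
            regs[dst] = mem.get(regs[base] + offset, 0)
            accesses += 1
        elif op == "SW":
            src, offset, base = args
            mem[regs[base] + offset] = regs[src]
            accesses += 1
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return executed, accesses


# The listing above, with registers written as plain integers.
PROGRAM = [
    ("ADD", 1, 0, 0),          # i <- 0
    ("SLEI", 10, 1, 100),      # loop: R10 <- (i <= 100)
    ("BEQZ", 10, "endloop"),
    ("SLLI", 5, 1, 2),         # R5 <- i * 4
    ("ADD", 6, 4, 5),          # R6 <- address of B[i]
    ("LW", 7, 0, 6),           # R7 <- B[i]
    ("ADD", 7, 7, 2),          # R7 <- B[i] + C
    ("ADD", 6, 3, 5),          # R6 <- address of A[i]
    ("SW", 7, 0, 6),           # A[i] <- R7
    ("ADDI", 1, 1, 1),         # i <- i + 1
    ("J", "loop"),
]
LABELS = {"loop": 1, "endloop": len(PROGRAM)}

A_BASE, B_BASE, C_VALUE = 0, 5000, 7     # assumed values for the demo
regs = [0] * 32
regs[2], regs[3], regs[4] = C_VALUE, A_BASE, B_BASE
mem = {B_BASE + 4 * i: i for i in range(101)}   # B[i] = i

executed, accesses = run(PROGRAM, LABELS, regs, mem)
print(executed, accesses)   # 1013 202
```

After the run, `mem[A_BASE + 4 * i]` holds `i + C_VALUE` for every i from 0 to 100, which confirms that the loop computes A[i] = B[i] + C.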
H&P2.12 Answer:
a. From Figure 2.26 we know that the percentage of loads and stores is
26% + 9% = 35%. With 10% of them eliminated, the instruction count decreases
by 35% * 10% = 3.5%. The ratio of the IC on the enhanced DLX to the IC on the
original DLX is:
1 - 3.5% = 96.5%
b. CPUTime_new = IC_new * CPI * ClockCycle_new
= 96.5% * IC_old * CPI * 105% * ClockCycle_old
= 1.013 * CPUTime_old
So the original machine is faster by about 1.3%.
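The same arithmetic as a small Python sketch. The 10% share of loads and stores that the new addressing mode removes is inferred from the step from 35% to 3.5% above and is labeled as an assumption; the function name is made up for illustration.

```python
def relative_cpu_time(ic_ratio, cpi_ratio, cycle_ratio):
    """CPU time of the new machine divided by that of the old one."""
    return ic_ratio * cpi_ratio * cycle_ratio


LOAD_STORE_FRACTION = 0.26 + 0.09   # loads + stores, from Figure 2.26
ELIMINATED_SHARE = 0.10             # assumption implied by 35% -> 3.5%

ic_ratio = 1.0 - LOAD_STORE_FRACTION * ELIMINATED_SHARE   # 0.965
ratio = relative_cpu_time(ic_ratio, cpi_ratio=1.0, cycle_ratio=1.05)

print(f"IC_new / IC_old = {ic_ratio:.3f}")
print(f"CPUTime_new / CPUTime_old = {ratio:.3f}")   # 1.013
```

A ratio above 1 means the enhanced machine takes longer, so the original design wins under these numbers.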