Homework

Some Homework Questions 2016
Others may be discussed during class hours. This file will be updated as we progress
in the course.
Q1) Create a 3-4 slide PPT that compares the MIPS ISA with a recent ARM ISA.
Present it the same way we presented the MIPS ISA in the lecture, focusing on the
types of instructions and their encodings.
Q2) Run the Geekbench 3 benchmark suite on your computer to measure CPU
performance. Compare your results with those of your project colleagues and discuss
the speedups and the reasons for the differences.
Q3) Assume two embedded processors that are identical except for their memory
systems.
One uses a CAM-tag cache and the other a 2-way RAM-tag cache, with the same block
size and overall size. Assume that in the I$ one can eliminate 60% of all tag checks,
and that tag power is otherwise 65% of total read power in the CAM but only 35% in
the 2-way cache. There are no cache misses. The CAM and the tag array have 32 lines,
as in the lecture version. The power of a CAM lookup is proportional to the number of
rows it needs to check. Assume that the power of a tag lookup is proportional to the
associativity (how many rows you check). Which CPU is more efficient and why? Under
what assumptions? Make up other assumptions if you need to.
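One way to start on Q3 is to write the power model down explicitly. The sketch below is only one possible reading of the question: it assumes the 60% elimination applies to both caches, and it normalizes each cache's read power to its own baseline of 1.0. The function name and parameters are hypothetical. Comparing the two CPUs still requires an additional assumption about how the absolute baseline read power of the CAM relates to that of the 2-way cache, which this sketch deliberately leaves open.

```python
def relative_read_power(tag_share, checks_eliminated):
    """Read power after skipping some tag checks, relative to the
    same cache's own baseline read power of 1.0 (an assumed model)."""
    data_share = 1.0 - tag_share                      # part unaffected by tag checks
    remaining_tag = tag_share * (1.0 - checks_eliminated)
    return data_share + remaining_tag

# Shares taken from the question; applying 60% to both caches is an assumption.
cam = relative_read_power(tag_share=0.65, checks_eliminated=0.60)
two_way = relative_read_power(tag_share=0.35, checks_eliminated=0.60)
print(f"CAM: {cam:.2f} of its baseline, 2-way: {two_way:.2f} of its baseline")
```

The two results are fractions of different baselines, so they cannot be compared directly until you state how the baselines relate.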
Q4) A single-issue processor fetching one instruction per cycle consumes 25% of its
energy in the instruction memory system; all accesses in the application considered
are from the instruction cache, i.e., there is no DRAM access, and only one
instruction is fetched per cycle. Now assume a branch predictor that has roughly the
same power-per-access as a single instruction cache fetch due to a similar
associative organization. 20% of all instructions executed are branches. A single
instruction fetch and a branch prediction each take one cycle to complete.
(i) How much energy is consumed (in processor energy %) in the branch predictor?
Assume that all branches are predicted correctly and CPI is 1. (Hint: this is an easy
question; don’t overthink it).
(ii) Now assume that all branches are predicted Taken in the Decode stage. The
Decode stage has the ability to calculate the target address but you will know
whether the branch was correctly predicted in the Execute stage only. The pipeline
is 5 stages long with Fetch, Decode, Execute, Memory, and Write-back. Assume that
60% of the branches are conditional taken, 30% are conditional untaken, and 10%
are unconditional. 20% of all instructions are branches. No branch delay slots could
be filled.
First list the penalty in cycles (if any) for conditional taken, conditional
untaken, and unconditional branches. Then calculate the CPI, assuming that the CPI
is 1 without branch penalty. (Hint: for the CPI calculation, imagine 100 instructions.
Think about the execution time with and without branch penalty).
Penalty conditional taken:
Penalty conditional untaken:
Penalty unconditional branch:
New CPI:
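The hint about imagining 100 instructions amounts to adding the average stall cycles per instruction to the base CPI. A minimal sketch of that bookkeeping follows; the function name is hypothetical, and the penalty values in the usage example are placeholders chosen only to exercise the code, not the answer to the question.

```python
def cpi_with_branch_penalty(base_cpi, branch_fraction, mix_and_penalty):
    """mix_and_penalty: list of (share_of_all_branches, penalty_cycles)."""
    total_share = sum(share for share, _ in mix_and_penalty)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("branch shares must sum to 1")
    # Average stall cycles per branch, weighted by how common each kind is.
    avg_penalty = sum(share * penalty for share, penalty in mix_and_penalty)
    # Only branch_fraction of all instructions pay that penalty.
    return base_cpi + branch_fraction * avg_penalty

# Branch mix from the question; the penalties (3 cycles each) are PLACEHOLDERS.
placeholder = [(0.6, 3), (0.3, 3), (0.1, 3)]
print(cpi_with_branch_penalty(1.0, 0.20, placeholder))
```

Replace the placeholder penalties with the values you derive for the 5-stage pipeline.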
Q5) (i) Assuming the following 2-bit predictor (see below) please identify a pattern
of 6 branches (e.g., use T and N sequence notation) that would be poorly predicted
(less than 10% accuracy). Assume that the predictor is in the T*N state originally
and your first branch is not taken (N).
Propose a different 2-bit scheme that would handle the original pathological pattern
better than this predictor by at least 40%. You also need to point out the weakness
of your new scheme -- if any, e.g., its pathological case. To get full credit you need to
show (a) the new state machine, and (b) patterns. (Hint: we studied in class one
example of a 2-bit scheme that would also work).
Baseline scheme: (above)
Your scheme (show above):
Pathological pattern:
Weakness of new scheme, e.g., pattern:
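To check a candidate pattern against any 2-bit scheme, it can help to simulate the state machine. The sketch below is a generic simulator that takes a transition table and a per-state prediction. The example tables describe a standard 2-bit saturating counter, which is an assumption chosen for illustration and is not necessarily the baseline scheme from the figure. All names are hypothetical.

```python
def accuracy(transitions, predictions, start, pattern):
    """Fraction of outcomes in `pattern` ('T'/'N') predicted correctly."""
    state, correct = start, 0
    for outcome in pattern:
        if predictions[state] == outcome:
            correct += 1
        state = transitions[state][outcome]   # move on the actual outcome
    return correct / len(pattern)

# Illustrative scheme only: saturating counter, states 0..3, predict T when >= 2.
SAT_PRED = {0: "N", 1: "N", 2: "T", 3: "T"}
SAT_NEXT = {s: {"T": min(s + 1, 3), "N": max(s - 1, 0)} for s in range(4)}

# Starting in state 2 (weakly taken), a mostly-taken pattern is predicted well.
print(accuracy(SAT_NEXT, SAT_PRED, start=2, pattern="TTTTNT"))
```

To evaluate the baseline or your own scheme, encode its states and arrows in the same two dictionaries and try different 6-branch patterns.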
(ii) Describe how you can save the energy attributed to branch prediction. Mention
at least two methods we discussed in class. (Hint: think about how you can make
branch prediction more efficient).
Method 1:
Method 2: