Uploaded by Rishika Kushwah


CMPEN 431 Practice Exam 2A
1. (11) Caching
a) (6) Consider a cache for a system that does not support virtual memory (i.e. no paging and no
translation). The byte-addressable address space consists of Z bytes, the cache has S sets, A
ways, and a block size of K bytes.
i) If we want to address compulsory misses, and we do not change the capacity of the cache,
what parameters will we change and how? Will the bit length of the tag become larger,
smaller, or stay the same? Why?
ii) If we want to address conflict misses, and we do not change the capacity of the cache, what
parameters will we change and how? Will the bit length of the tag become larger, smaller,
or stay the same? Why?
iii) If block size decreases by 2x, associativity triples, and cache capacity goes up by 12x, how
many bits are in each of the tag, index and block offset fields, in terms of the given
parameters of the original cache?
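The field widths asked about in part a) follow directly from the cache geometry. The sketch below is only an illustration, assuming Z, S, and K are all powers of two; the function names `field_bits` and `num_sets` and the numeric values are hypothetical and not part of the exam.

```python
from math import log2

def field_bits(Z, S, K):
    """Return (tag, index, offset) bit widths for a physically addressed cache.

    Z: bytes in the address space, S: number of sets, K: block size in bytes.
    Assumes all three are powers of two (an assumption of this sketch).
    """
    offset = int(log2(K))
    index = int(log2(S))
    tag = int(log2(Z)) - index - offset
    return tag, index, offset

def num_sets(capacity, A, K):
    """Number of sets = capacity / (ways * block size)."""
    return capacity // (A * K)

# Hypothetical example: 4 GiB address space, 64 sets, 4 ways, 64-byte blocks.
Z, S, A, K = 2**32, 64, 4, 64
print(field_bits(Z, S, K))        # (20, 6, 6)

# Same capacity with doubled associativity: half as many sets, one more tag bit.
S2 = num_sets(S * A * K, 2 * A, K)
print(S2, field_bits(Z, S2, K))   # 32 (21, 5, 6)
```

Changing one parameter at fixed capacity always moves bits between the tag, index, and offset fields, which is a quick way to sanity-check an answer.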
b) (5) There are three programs, FOO, BAR, and BAZ, a reference microbenchmark “ALWAYS-HITS”
and an architecture SHODAN-N where each processor in the SHODAN-N series varies only by the total size
of its (highly associative) single-level cache. The size of the cache doubles with each increment of N (i.e.
SHODAN-3 has 8 times as much cache as SHODAN-0). Assume that the access patterns of all three
programs are not pathological with respect to cache parameters (i.e., as a simplifying assumption,
changing among reasonable replacement policies would not have significant effects on performance,
nor would the relative behaviors of the programs change significantly if the associativity or block size were
modestly increased or decreased).
As FOO, BAR, and BAZ are run across the SHODAN series of processors in order from a SHODAN-0 to a
SHODAN-7, the following behaviors are observed:
The performance of FOO is very good (similar to ALWAYS-HITS) on a SHODAN-0, but gets progressively
worse by SHODAN-7, although it still performs reasonably well.
The performance of BAR is initially poor, then increases greatly between SHODAN-2 and SHODAN-3,
and then declines.
The performance of BAZ is initially poor, and gets progressively better from SHODAN-0 to SHODAN-7,
but the rate of improvement decreases with every step.
i.) (3) Describe the properties of FOO, BAR, and BAZ that would produce these
respective behaviors (you may assume that each program has uniform behavior if
it simplifies your answer).
ii.) (2) Assume that you are designing a follow-on to the SHODAN series, the POLITO
processors. Describe a cache memory system that will effectively serve all three
programs and justify your answers. Note any compromises in the design where
one program suffers at the expense of the others, and how.
This study source was downloaded by 100000826153891 from CourseHero.com on 11-28-2023 22:05:56 GMT -06:00
https://www.coursehero.com/file/80945504/CMPEN431-Exam2A-Practice-1pdf/
2. (10) Caching in Virtual Memory Systems
Assume that you have a byte-addressable MIPS system with the following properties and configuration:

Physical address space: 26 bits; Virtual address space: 32 bits; Page size: 16KB; word = 32 bits

8-entry, 2-way associative ITLB and DTLB

VIPT L1 D-cache = 16-entry, 2-way associative, with 8-byte blocks; write-allocate/write-back

VIPT L1 I-cache = 32-entry, 2-way associative, with 4-byte blocks

Each TLB or D$ entry consists of {valid, tag, data}; tag and data are in hexadecimal.
I$ entries are shown as {valid, tag, decoded instruction} for ease of exposition.

Cache-block Representation Endianness: If the data block containing address 0x0006 was 0x0123456789ABCDEF,
the byte loaded from 0x0006 would have integer value = 0xCD.
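The block representation above can be checked with a short sketch. It assumes, as the example states, that the leftmost byte of the written block value sits at offset 0 within the block; the helper name `byte_at` is hypothetical.

```python
def byte_at(block, address, block_bytes=8):
    """Return the byte at `address` from a block written as one hex number.

    The leftmost (most significant) byte of the written value is offset 0,
    matching the representation described in the exam text.
    """
    offset = address % block_bytes           # position of the byte in the block
    shift = 8 * (block_bytes - 1 - offset)   # bits to the right of that byte
    return (block >> shift) & 0xFF

print(hex(byte_at(0x0123456789ABCDEF, 0x0006)))  # 0xcd
```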
DTLB:
SET 0:  1,0x1234,0x000   1,0x0000,0x0A2
SET 1:  1,0x0000,0x003   1,0xC234,0x002
SET 2:  1,0x7234,0x023   1,0x0000,0x100
SET 3:  1,0x0000,0x001   1,0x2234,0x009

L1 D$:
SET 0:  1, 0x00000, 0x1234567887654321   1, 0x00308, 0xC00CEF0990FEDCBA
SET 1:  1, 0x00100, 0xFEEDEEC5C0DEF00D   1, 0x00201, 0x1FEEDEE15DEADC0D
SET 2:  1, 0x00200, 0xBEE5BEE5BEE5BEE5   1, 0x00101, 0xDEAFBEEFDEADBEE5
SET 3:  1, 0x00387, 0xB0B0D0D01A2B3C4D   1, 0x00001, 0x1FEED515E1FF00D5
SET 4:  1, 0x00000, 0x0102030405060708   1, 0x00389, 0x0910111213141516
SET 5:  1, 0x00100, 0xFE99EE88C077F066   1, 0x00201, 0x54ED43E132EA210D
SET 6:  1, 0x00200, 0x1E1D1E15101E101D   1, 0x00108, 0x11E2D3E455E6D708
SET 7:  1, 0x00386, 0xFEEDEEC5C0DEF00D   1, 0x00001, 0x1FEEDEE15DEADC0D

ITLB:
SET 0:  1,0x0004,0x007   1,0x1234,0x040
SET 1:  1,0x0040,0x640   1,0x0004,0x068
SET 2:  1,0x012C,0x022   1,0x0040,0x00F
SET 3:  1,0x1234,0xD1E   1,0x0CA7,0x486

L1 I$:
SET 0:   1, 0x64000, LBU $13, 0x1238($0)   1, 0x04000, LBU $0,  0x9236($0)
SET 1:   1, 0x48600, LBU $5,  0x2236($0)   1, 0x06880, LBU $13, 0xA234($0)
SET 2:   1, 0x48604, LBU $3,  0x4238($0)   0, 0x21264, LBU $1,  0xB232($0)
SET 3:   1, 0xD1E10, LBU $2,  0x8232($0)   1, 0x64004, LBU $13, 0xC230($0)
SET 4:   1, 0x02211, LBU $13, 0x5230($0)   1, 0x640AD, LBU $2,  0xD22E($0)
SET 5:   1, 0x04040, LBU $13, 0x623E($0)   1, 0x48633, LBU $13, 0xE23C($0)
SET 6:   1, 0x00737, LBU $13, 0x723C($0)   1, 0x04000, LBU $3,  0xF23A($0)
SET 7:   1, 0x00FAD, LBU $13, 0x823A($0)   1, 0x64086, LBU $13, 0x0238($0)
SET 8:   1, 0x64000, LBU $13, 0x1238($0)   1, 0x04000, LBU $4,  0x9238($0)
SET 9:   1, 0x48600, LBU $13, 0x61DF($0)   1, 0x06880, LBU $13, 0xA236($0)
SET 10:  1, 0x48604, LBU $13, 0x6264($0)   0, 0x21264, LBU $5,  0xB234($0)
SET 11:  1, 0xD1E10, LBU $9,  0x8242($0)   1, 0x64004, LBU $13, 0xC232($0)
SET 12:  1, 0x02211, LBU $7,  0x5234($0)   1, 0x640AD, LBU $6,  0xD230($0)
SET 13:  1, 0x04040, LBU $5,  0x623C($0)   1, 0x48633, LBU $9,  0xE23E($0)
SET 14:  1, 0x00737, LBU $3,  0x7230($0)   1, 0x04000, LBU $7,  0xF23C($0)
SET 15:  1, 0x00FAD, LBU $2,  0x823A($0)   1, 0x64086, LBU $8,  0x023A($0)
If the result of executing the current instruction is 1) an I-cache & ITLB hit, and 2) a D-cache &
DTLB hit that puts 0x4D (circled) in register 13, what is the address (PC) of the current
instruction? Show work.
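The lookups in this question all rest on splitting addresses into fields. The sketch below derives the field boundaries from the stated configuration (16 KB pages, 8-entry 2-way TLBs, 16-entry 2-way D$ with 8-byte blocks, 32-entry 2-way I$ with 4-byte blocks), assuming "entry" means one block or one translation. The helper names, the sample address 0x12345, and the sample mapping to physical page 0x0A2 are hypothetical and chosen only for illustration.

```python
PAGE_OFFSET_BITS = 14                    # 16 KB pages
TLB_SETS = 8 // 2                        # 8 entries, 2-way
DCACHE_SETS, DCACHE_BLOCK = 16 // 2, 8   # 16 entries, 2-way, 8-byte blocks
ICACHE_SETS, ICACHE_BLOCK = 32 // 2, 4   # 32 entries, 2-way, 4-byte blocks

def split_virtual(va):
    """Split a virtual address into (virtual page number, page offset)."""
    return va >> PAGE_OFFSET_BITS, va & ((1 << PAGE_OFFSET_BITS) - 1)

def tlb_fields(vpn, sets=TLB_SETS):
    """Split a virtual page number into (TLB tag, TLB set)."""
    return vpn // sets, vpn % sets

def physical(ppn, page_offset):
    """Join a physical page number and a page offset."""
    return (ppn << PAGE_OFFSET_BITS) | page_offset

def cache_fields(pa, sets, block_bytes):
    """Split a physical address into (tag, set index, block offset)."""
    offset = pa % block_bytes
    index = (pa // block_bytes) % sets
    tag = pa // (block_bytes * sets)
    return tag, index, offset

# Hypothetical walk-through, not the exam's answer.
vpn, off = split_virtual(0x12345)
print(vpn, hex(off))                     # 4 0x2345
print(tlb_fields(vpn))                   # (1, 0)
tag, index, offset = cache_fields(physical(0x0A2, off), DCACHE_SETS, DCACHE_BLOCK)
print(hex(tag), index, offset)           # 0xa28d 0 5
```

Because index plus block offset is only 6 bits for both caches, the set index lies entirely inside the 14-bit page offset, so the virtual and physical index are identical.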
3. (6) Cache Performance
The base CPI of a system, excluding memory stalls, is QUUX.
Loads and stores collectively constitute A% of all instructions.
Accessing the L1 data cache takes P cycles (accounted for in base CPI).
Accessing the L1 instruction cache takes D cycles (accounted for in base CPI).
Misses to main memory take an average of K cycles.
The L1 D-cache miss rate per access is MD. The L1 I-cache miss rate per access is MI.
The L1 D-cache miss rate per access in a double-capacity cache is MDX.
The L1 I-cache miss rate per access in a double-capacity cache is MIX.
Your team is considering either i) adding a unified L2 cache for both data and instruction accesses or ii)
doubling the size of the existing L1 caches at a penalty of 1 extra cycle per access.
SOLELY from an AMAT optimization perspective, and assuming that the L2 miss rate for instructions is
half that of the miss rate for data, what relationship would have to hold true for the L2 cache to be a
better option?
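The comparison can be explored numerically with the standard AMAT recurrence, AMAT = hit time + miss rate x miss penalty. The sketch below is only an illustration: the problem statement does not name an L2 hit time or an L2 miss rate, so `l2_hit` and `l2_local_miss_rate` are hypothetical parameters, and all numeric values are made up.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time for a single cache level."""
    return hit_time + miss_rate * miss_penalty

def amat_with_l2(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, mem_penalty):
    """Option i): L1 misses go to an L2; L2 misses go to main memory."""
    l1_miss_penalty = amat(l2_hit, l2_local_miss_rate, mem_penalty)
    return amat(l1_hit, l1_miss_rate, l1_miss_penalty)

def amat_bigger_l1(l1_hit, bigger_miss_rate, mem_penalty, extra_cycles=1):
    """Option ii): double-capacity L1 that costs extra cycles on every access."""
    return amat(l1_hit + extra_cycles, bigger_miss_rate, mem_penalty)

# Made-up data-side values: P = 1, MD = 0.10, MDX = 0.06, K = 100,
# plus a hypothetical L2 with a 10-cycle hit time and 20% local miss rate.
with_l2 = amat_with_l2(1, 0.10, 10, 0.20, 100)
bigger = amat_bigger_l1(1, 0.06, 100)
print(with_l2, bigger, with_l2 < bigger)
```

The same two functions apply to the instruction side with D, MI, and MIX, using an L2 instruction miss rate that is half the data miss rate as the question stipulates.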
4. (8) Multi-threading
Consider a 2-wide, in-order, 5-stage pipeline in which all functional units are replicated 2x, except data
memory ports. It initially supports only 1 thread and has the schedule below. Assume that the
branch to FOO is taken.
Instruction \ CYCLE          1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
FOO: lw $2, 0($4)            F  D  E  M  W
     lw $3, 0($5)            F  d  D  E  M  W
     addu $2, $2, $3            f  F  d  D  E  M  W
     sw $2, 0($4)               f  F  d  D  E  M  W
     lw $4, 4($4)                     f  F  D  E  M  W
     addi $5, $5, 4                   f  F  D  E  M  W
     bne $4, $0, FOO                        F  d  d  D  E  M  W
<MISFETCH> sll $0, $0, $0                   F  d  d  D  -  -  -
a) (5) Assume that two copies of the same code were running on a two-wide, 2-thread FGMT
(Fine-Grained-Multi-Threaded) processor with simple round-robin scheduling. In what cycle
would the first thread to execute FOO fetch the second dynamic instance of that
instruction?
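Round-robin fetch ownership can be sketched as follows. This is a simplified model, assuming the fetch stage strictly alternates between threads every cycle regardless of stalls and that thread 0 owns cycle 1; both assumptions, and the helper names, are hypothetical and not stated in the exam.

```python
def fetching_thread(cycle, num_threads=2, first_cycle=1):
    """Thread that owns the fetch stage in `cycle` under strict round-robin."""
    return (cycle - first_cycle) % num_threads

def fetch_cycles(thread, count, num_threads=2, first_cycle=1):
    """First `count` cycles in which `thread` is allowed to fetch."""
    return [first_cycle + thread + i * num_threads for i in range(count)]

print([fetching_thread(c) for c in range(1, 7)])  # [0, 1, 0, 1, 0, 1]
print(fetch_cycles(0, 3))                          # [1, 3, 5]
print(fetch_cycles(1, 3))                          # [2, 4, 6]
```

Under this model each thread sees the fetch stage only every other cycle, so stalls that occupy back-to-back cycles in the single-thread schedule can be partly hidden by the other thread's slots.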
b) (3) Consider two possible implementations of designs that support two threads:
i) A single-core, 4-issue, 2-way simultaneous multi-threading, dynamically scheduled
(OoO) processor
ii) A multiprocessor with two 2-issue, single-threaded OoO cores
Assume that both designs have a two-level cache hierarchy, and that the L1 cache size per core
is identical.
Describe a two-threaded workload (either multi-threaded or multi-process) where each thread
has a fixed amount of work to perform that would be expected to reach a point where both
threads have completed execution significantly faster on i) than ii) and explain why.