
CS 152
Computer Architecture & Engineering
Section 7
Spring 2010
Andrew Waterman
University of California, Berkeley
Mystery Die
• RISC II: 41K transistors, 4 micron NMOS @ 12 MHz
• 2.2x faster than a VAX-11/780 (1500 TTL chips @ 5MHz)
Agenda
• Quiz 2 Post-Mortem
– Mean: 53.1
– Standard Deviation: 9.0
Quiz 2, Q1
• N=1024. Store/load miss rates for a 4KB, 2-way cache (32B lines) with LRU replacement?
• LRU => no conflicts between loads and stores
• Loads are unit-stride with no reuse
• All load misses are compulsory => miss rate 1/8 (one miss per 8-word line)
• All stores miss: capacity misses, since each pass over B touches 1024 distinct lines
for(i = 0; i < N; i++)
  for(j = 0; j < N; j++)
    B[j*N+i] = A[i*N+j];
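A quick way to sanity-check these rates is to simulate the cache directly. This is my own sketch, not part of the quiz solution; it models the 4KB, 2-way, 32-byte-line LRU cache on the loop above, with A and B laid out back-to-back in a flat word address space:

#include <stdio.h>

#define N      1024
#define SETS   64            /* 4KB / (32B line * 2 ways) */
#define WAYS   2
#define LINE_W 8             /* words per 32B line (4B words) */

static long tags[SETS][WAYS];   /* line address held by each way */
static int  lru[SETS];          /* index of the LRU way (enough for 2 ways) */

/* Touch one word address; return 1 on a miss, 0 on a hit. */
static int touch(long word)
{
    long line = word / LINE_W;
    int  set  = line % SETS;
    for (int w = 0; w < WAYS; w++)
        if (tags[set][w] == line) {
            lru[set] = 1 - w;            /* the other way is now LRU */
            return 0;
        }
    int victim = lru[set];               /* evict LRU, fill, flip LRU */
    tags[set][victim] = line;
    lru[set] = 1 - victim;
    return 1;
}

int main(void)
{
    for (int s = 0; s < SETS; s++)
        for (int w = 0; w < WAYS; w++)
            tags[s][w] = -1;
    long lmiss = 0, smiss = 0;
    long A = 0, B = (long)N * N;         /* word base addresses */
    for (long i = 0; i < N; i++)
        for (long j = 0; j < N; j++) {
            lmiss += touch(A + i*N + j); /* load  A[i*N+j] */
            smiss += touch(B + j*N + i); /* store B[j*N+i] */
        }
    printf("load miss rate:  %.4f\n", (double)lmiss / ((double)N*N));
    printf("store miss rate: %.4f\n", (double)smiss / ((double)N*N));
    return 0;
}

It should report a load miss rate of 0.125 and a store miss rate of 1.0, matching the argument above.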
Quiz 2, Q1
• What about FIFO replacement?
• Stores and loads can now conflict. When?
• Stores always use set (i/8) % 64
• Loads always use set (j/8) % 64
• Conflicts occur when these are equal: a store miss can then evict the still-live A line, since FIFO ignores the load’s recent use
for(i = 0; i < N; i++)
  for(j = 0; j < N; j++)
    B[j*N+i] = A[i*N+j];
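For the FIFO case, a one-function change to the sketch above is enough to see the conflicts (again my own illustration, not the quiz solution): hits no longer update the replacement state, so a store miss can evict an A line that was just used.

/* FIFO replacement: no promotion on hit; the fill pointer only
   advances when a line is brought in (round-robin = FIFO for 2 ways). */
static int touch_fifo(long word)
{
    long line = word / LINE_W;
    int  set  = line % SETS;
    for (int w = 0; w < WAYS; w++)
        if (tags[set][w] == line)
            return 0;                    /* hit: replacement state untouched */
    tags[set][lru[set]] = line;          /* fill the oldest way */
    lru[set] = 1 - lru[set];
    return 1;
}

Swapping touch_fifo in for touch shows the extra load misses concentrated in the sets where (i/8) % 64 == (j/8) % 64.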
Quiz 2, Q1
• Is Write-Allocate a good idea for this code?
for(i = 0; i < N; i++)
  for(j = 0; j < N; j++)
    B[j*N+i] = A[i*N+j];
Quiz 2, Q1
• Is Write-Allocate a good idea for this code?
• On every store miss, a 32-byte line is read into the cache, only 4 of its bytes are overwritten, and it is evicted before any reuse, so no
for(i = 0; i < N; i++)
  for(j = 0; j < N; j++)
    B[j*N+i] = A[i*N+j];
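To put a number on it (my arithmetic, using the parameters above): all N^2 = 2^20 stores miss, so write-allocate fetches 2^20 * 32 B = 32 MB from memory just to overwrite 4 B of each line; with no-write-allocate, none of that fetch traffic occurs.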
Quiz 2, Q1
• Is Write-Back a good idea for this code?
for(i = 0; i < N; i++)
  for(j = 0; j < N; j++)
    B[j*N+i] = A[i*N+j];
Quiz 2, Q1
• Is Write-Back a good idea for this code?
• With Write-Allocate, bad: each 4-byte store causes a 32-byte line fetch plus a 32-byte writeback (64 bytes of traffic per store)
• Otherwise, OK, except the Write-Through alternative had a write buffer, which will dramatically reduce the store miss penalty
for(i = 0; i < N; i++)
  for(j = 0; j < N; j++)
    B[j*N+i] = A[i*N+j];
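Rough per-store traffic under the same assumptions (my arithmetic): write-back + write-allocate moves 32 B in (the line fetch) and 32 B out (the eventual writeback) for every 4-byte store, which is the 64 bytes above; write-through + no-write-allocate moves only the 4 bytes themselves, and the write buffer hides even that latency from the pipeline.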
Quiz 2, Q1
• If the cache were fully associative, how could we improve the code’s performance?
for(i = 0; i < N; i++)
  for(j = 0; j < N; j++)
    B[j*N+i] = A[i*N+j];
Quiz 2, Q1
• If the cache were fully associative, how could we improve the code’s performance?
• Block the transpose
• FA makes this easier; lots of solutions
• Here’s one; let K = 8 (words in a cache line)
• Now each line of B receives all 8 of its words before eviction, so store misses drop to ~1/8 too
for(i = 0; i < N; i+=K)
  for(j = 0; j < N; j++)
    for(k = 0; k < K; k++)
      B[j*N+(i+k)] = A[(i+k)*N+j];
for(i = 0; i < N; i++)
  for(j = 0; j < N; j++)
    B[j*N+i] = A[i*N+j];
Quiz 2, Q1
• What about TLB misses?
• 4KB pages, 1024-entry DM TLB
• Compulsory misses first
• 2 matrices * (1024^2 words)/(1024 words/page) = 2048
for(i = 0; i < N; i++)
  for(j = 0; j < N; j++)
    B[j*N+i] = A[i*N+j];
Quiz 2, Q1
• What about TLB misses?
• 4KB pages, 1024-entry DM TLB
• Now consider some iteration 0 ≤ i < N-1
• After iteration i, TLB[i] = A_i, and TLB[k] = B_k, k ≠ i
• During iteration i+1, the store to B_i will miss
• Then the store to B_{i+1} will miss, kicking out A_{i+1}
• The next load to A_{i+1} will miss
• 3 conflicts/iteration
• 3072 conflict + 2048 compulsory = 5120 misses total
for(i = 0; i < N; i++)
  for(j = 0; j < N; j++)
    B[j*N+i] = A[i*N+j];
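The same kind of sanity check works for the TLB (again my own sketch, not the quiz solution; with 4B words a 4KB page holds 1024 words, so A occupies pages 0-1023 and B pages 1024-2047):

#include <stdio.h>

#define N       1024
#define ENTRIES 1024
#define PAGE_W  1024           /* words per 4KB page (4B words) */

int main(void)
{
    static long tlb[ENTRIES];
    for (int e = 0; e < ENTRIES; e++)
        tlb[e] = -1;
    long misses = 0;
    long A = 0, B = (long)N * N;
    for (long i = 0; i < N; i++)
        for (long j = 0; j < N; j++) {
            long ref[2] = { A + i*N + j,    /* load  A[i*N+j] */
                            B + j*N + i };  /* store B[j*N+i] */
            for (int r = 0; r < 2; r++) {
                long page = ref[r] / PAGE_W;
                int  e    = page % ENTRIES;  /* direct-mapped */
                if (tlb[e] != page) {
                    tlb[e] = page;
                    misses++;
                }
            }
        }
    printf("TLB misses: %ld\n", misses);
    return 0;
}

It should report about 5120 misses, matching the count above.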
Quiz 2, Q2
• Basic idea of microtags: SA caches put the tag check on the critical path (data-out)
• Reduce the critical path by using a subset of the tag bits to select the way
• In this cache, microtag check -> data-out remains the critical path, but it is 1/6 faster
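In pseudo-hardware terms, the way select works something like this sketch (my own illustration; the microtag width and interface are made up):

/* Pick the data-out way by comparing only the short microtags;
   microtags must be unique within a set for this to be unambiguous.
   The full tag comparison then completes off the data-out path. */
static int select_way(const unsigned utag[], int ways, unsigned addr_utag)
{
    for (int w = 0; w < ways; w++)
        if (utag[w] == addr_utag)
            return w;        /* drive data out of way w */
    return -1;               /* no microtag match: treat as a miss */
}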
Quiz 2, Q2
• AMAT = hit time + miss rate * miss penalty
• Hit time not multiplied by hit rate
• You have to pay the hit time even on a miss
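With made-up numbers (mine, not the quiz’s): a 1-cycle hit time, 5% miss rate, and 20-cycle miss penalty give AMAT = 1 + 0.05 * 20 = 2 cycles; the 1 cycle of hit time is paid on every access, hit or miss.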
Quiz 2, Q2
• Microtag uniqueness affects conflict misses
• They increase compared to a true 4-way SA cache
• But are still far fewer than with DM
• Otherwise, why would we build a microtagged cache? Just use DM
Quiz 2, Q2
• Aliasing question was unintentionally tricky:
microtags are a red herring
• The aliasing problem is just the same as for any
virtually-indexed physically-tagged cache with
index+offset ≤ page offset
• Aliases always map to the same set, which would
be fine for DM, but with SA they can live in
different ways
Quiz 2, Q2
• Aliasing question was unintentionally tricky:
microtags are a red herring
• The aliasing problem is just the same as for any
virtually-indexed physically-tagged cache with
index+offset ≤ page offset
• Simple fix: on a miss, you already have the
physical tag and all physical tags in the set
• If and only if there’s a match, there’s an alias
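The check itself is cheap, along these lines (my sketch, not the official solution):

/* On a miss, the refill's physical tag is already available, as are the
   physical tags of every way in the indexed set. A match means the line
   is already resident under a different virtual alias, so reuse or
   invalidate that copy instead of filling a duplicate. */
static int find_alias(const unsigned ptag[], int ways, unsigned refill_ptag)
{
    for (int w = 0; w < ways; w++)
        if (ptag[w] == refill_ptag)
            return w;        /* alias lives in way w */
    return -1;               /* no alias: normal refill */
}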
Quiz 2, Q3
• 2x associativity, capacity & line size constant
– Increases hit time due to data-out muxing
– Reduces conflict misses
• Halving line size (associativity & #sets constant)
– Reduces hit time (capacity is halved)
– Increases miss rate (same reason: less capacity)
– Reduces miss penalty (shorter lines, less to fetch)
Quiz 2, Q3
• Physical -> virtual cache
– Hit time reduced (the only real reason to do this is to remove the TLB from the hit path)
– Effect on miss rate ambiguous:
– More misses for aliases
– More misses for context switches w/o ASIDs
– Fewer misses due to address space contiguity
– Increased miss penalty, because the TLB lookup moves to the miss path and anti-aliasing adds work
Quiz 2, Q3
• Write buffer
– Reduces both store miss penalty and hit time
• HW prefetching
– Prefetch HW isn’t on the hit path, so no effect on hit time
– Reduces miss rate (the main reason to prefetch)
– A hit in the prefetch buffer counts as a “slow hit”, not a miss
– Reduces miss penalty (prefetches can already be in flight when a miss occurs)
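Concretely, a write buffer is just a small FIFO between the cache and memory; here is a minimal sketch (mine; the depth and interface are made up) of how it takes store misses off the critical path:

#define WB_DEPTH 4

struct wbuf {
    unsigned long addr[WB_DEPTH];
    unsigned      data[WB_DEPTH];
    int head, tail, count;
};

/* CPU side: a store just enqueues and retires; it only stalls if the
   buffer is full. Returns 0 on success, 1 on a full-buffer stall. */
static int wb_put(struct wbuf *b, unsigned long a, unsigned d)
{
    if (b->count == WB_DEPTH)
        return 1;
    b->addr[b->tail] = a;
    b->data[b->tail] = d;
    b->tail = (b->tail + 1) % WB_DEPTH;
    b->count++;
    return 0;
}

/* Memory side: drain one entry per free memory cycle, in order.
   (A real design also checks loads against the pending entries.) */
static int wb_drain(struct wbuf *b, unsigned long *a, unsigned *d)
{
    if (b->count == 0)
        return 0;
    *a = b->addr[b->head];
    *d = b->data[b->head];
    b->head = (b->head + 1) % WB_DEPTH;
    b->count--;
    return 1;
}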