CS 152
Computer Architecture and Engineering
Lecture 26 -- Midterm II Review Session
2014-4-29
John Lazzaro
(not a prof - “John” is always OK)
TA: Eric Love
www-inst.eecs.berkeley.edu/~cs152/
CS 152 L16: Midterm I Review
UC Regents Spring 2014 © UCB
Today - Midterm II Review Session
Study Tips
HW 2, problem by problem
(if there is time)
HKN
CS152 Midterm II
May 1st, 2014

Name:
SSID:

#      Points
1      25
2      25
3      25
4      25
Tot    100

“All the work is my own. I have no prior knowledge of the exam contents, aside from guidance from class staff. I will not share the contents with others in CS152 who have not taken it yet.”
Signature:

Please write clearly, and put your name on each page. Please abide by word limits. Good luck!
Eric Love
John Lazzaro
What does it cover? Lectures 9 onward.
Focus will be on problems that require you to do a task (write a small program, trace through execution, etc.) that demonstrates that you understand a concept.
[...]
No transistor-level questions (DRAM and SRAM cells, etc.).
Time for a quick walk-through
...
Lecture 9 -- Memory (2014-2-18)
Latency is not the same as bandwidth!
[Figure: a DRAM array of 8192 rows x 16384 columns, 134 217 728 usable bits (the tester found the good bits in a bigger array). A 13-bit row address input feeds a 1-of-8192 decoder; 16384 bits are delivered by the sense amps; the requested bits are selected and sent off the chip.]
What if we want all of the 16384 bits?
In the row access time (55 ns) we can do 22 transfers at 400 MT/s.
16-bit chip bus -> 22 x 16 = 352 bits << 16384 bits.
Now the row access time looks fast! Thus, push to faster DRAM interfaces.
CS 152 L9: Memory
UC Regents Spring 2014 © UCB
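As a quick check of the slide's arithmetic, here is a short Python sketch. It takes the slide's numbers as exact; the time to stream one full 16384-bit row over the 16-bit bus (2.56 us at 400 MT/s) is a figure derived here, not one stated on the slide.

```python
# Slide figures (assumed exact): 8192 rows x 16384 columns, 55 ns row access,
# a 400 MT/s interface, and a 16-bit chip data bus.
ROWS = 8192
COLS = 16384
ROW_ACCESS_NS = 55
RATE_MTS = 400          # mega-transfers per second
BUS_BITS = 16

usable_bits = ROWS * COLS                                # bits in the array
transfers_in_row_time = ROW_ACCESS_NS * RATE_MTS // 1000  # ns * (transfers/us) / 1000
bits_in_row_time = transfers_in_row_time * BUS_BITS       # bits moved while one row opens

# Derived here: time to stream one full row out over the 16-bit bus.
row_stream_ns = (COLS // BUS_BITS) * 1000 / RATE_MTS

print(usable_bits, transfers_in_row_time, bits_in_row_time, row_stream_ns)
# 134217728 22 352 2560.0
```

Streaming a whole row takes roughly 2560 ns against a 55 ns row access, which is the sense in which the row access time "looks fast".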
Lecture 10 -- Cache I (2014-2-20)
Latency: A closer look
Read latency: Time to return first byte of a random access

Level              Reg    L1 Inst  L1 Data  L2     DRAM   Disk
Size               1K     64K      32K      512K   256M   80G
Latency (cycles)   1      3        3        11     160    1E+07
Latency (sec)      0.6n   1.9n     1.9n     6.9n   100n   12.5m
Hz (1/latency)     1.6G   533M     533M     145M   10M    80
Architect’s latency toolkit:
(1) Parallelism. Request data from N 1-bit-wide memories at the same time. This overlaps the latency cost for all N bits and provides N times the bandwidth. Requests to N memory banks (interleaving) have the potential of N times the bandwidth.
(2) Pipeline memory. If memory has N cycles of latency, issue a request each cycle, and receive it N cycles later.
CS 194-6 L8: Cache
UC Regents Fall 2008 © UCB
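The two toolkit entries can be compared with a small cycle-count model. This is a sketch under simplifying assumptions (fixed latency, at most one request issued per cycle, non-pipelined banks that stay busy for the full latency); the function names are hypothetical.

```python
def unpipelined_cycles(requests, latency):
    """One request outstanding at a time: each waits the full latency."""
    return requests * latency


def pipelined_cycles(requests, latency):
    """Issue one request per cycle; each completes `latency` cycles after issue."""
    return 0 if requests == 0 else latency + requests - 1


def interleaved_cycles(requests, latency, banks):
    """Round-robin across non-pipelined banks, at most one issue per cycle.

    A bank is busy for `latency` cycles after accepting a request.
    Returns the cycle at which the last request completes.
    """
    bank_free = [0] * banks   # earliest cycle each bank can accept a request
    next_issue = 0            # earliest cycle the next request may issue
    done = 0
    for i in range(requests):
        bank = i % banks
        issue = max(next_issue, bank_free[bank])
        bank_free[bank] = issue + latency
        done = issue + latency
        next_issue = issue + 1
    return done


# The slide's 160-cycle DRAM latency, 100 requests.
print(unpipelined_cycles(100, 160))       # 16000
print(pipelined_cycles(100, 160))         # 259
print(interleaved_cycles(100, 160, 160))  # 259: enough banks to hide latency
```

With one bank the interleaved model degenerates to the unpipelined case; with at least as many banks as cycles of latency it matches a pipelined memory.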
Lecture 11 -- Cache II (2014-2-25)
Issue #4: When to write to lower level ...
Write-Through policy: data written to the cache block is also written to lower-level memory.
Write-Back policy: write data only to the cache; update the lower level when a block falls out of the cache.

                                             Write-Through   Write-Back
Do read misses produce writes?               No              Yes
Do repeated writes make it to lower level?   Yes             No

Related issue: Do writes to blocks not in the cache get put in the cache (“write-allocate”) or not?
CS 152 L11: Cache II
UC Regents Spring 2014 © UCB
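A toy direct-mapped cache makes the two table rows concrete by counting writes that reach the lower level. Assumptions of this sketch: one word per block, write-allocate for both policies, and a hypothetical `Cache` class that tracks only tags and dirty bits.

```python
class Cache:
    """Direct-mapped cache that counts writes sent to the lower level."""

    def __init__(self, num_blocks, write_back):
        self.num_blocks = num_blocks
        self.write_back = write_back
        self.tags = [None] * num_blocks
        self.dirty = [False] * num_blocks
        self.lower_writes = 0

    def _access(self, addr):
        """Bring addr's block into the cache (on a miss), returning its index."""
        idx, tag = addr % self.num_blocks, addr // self.num_blocks
        if self.tags[idx] != tag:
            # Miss: a write-back cache must flush a dirty victim first.
            if self.write_back and self.dirty[idx]:
                self.lower_writes += 1
            self.tags[idx] = tag
            self.dirty[idx] = False
        return idx

    def read(self, addr):
        self._access(addr)

    def write(self, addr):
        idx = self._access(addr)        # write-allocate
        if self.write_back:
            self.dirty[idx] = True      # defer the lower-level write
        else:
            self.lower_writes += 1      # write-through: every write goes down


wt = Cache(num_blocks=4, write_back=False)
wb = Cache(num_blocks=4, write_back=True)
for cache in (wt, wb):
    for _ in range(10):
        cache.write(0)   # ten repeated writes to one block
    cache.read(4)        # read miss mapping to the same block (4 % 4 == 0)
print(wt.lower_writes, wb.lower_writes)  # 10 1
```

The write-through cache sends all ten repeated writes down; the write-back cache sends none of them, but its read miss produces one write when the dirty victim is evicted.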
Lecture 12 -- Virtual Memory (2014-2-27)
The TLB caches page table entries
[Figure: a virtual address (page, offset) for an ASID is looked up in the TLB, which caches page-table entries mapping pages to physical frames; the physical address is (frame, offset).]
In this example, physical and virtual pages must be the same size!
MIPS handles TLB misses in software (random replacement). Other machines use hardware.
V=0 pages either reside on disk or have not yet been allocated. OS handles V=0: “Page fault”.
CS 152 L15: Virtual Memory
UC Regents Fall 2006 © UCB
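The lookup path can be sketched in Python, assuming 4 KB pages, dict-based structures, and a hypothetical `translate` helper. The TLB refill below plays the role of the MIPS software miss handler (with random replacement), and a missing or V=0 entry raises a page fault for the OS to handle.

```python
import random

PAGE_SIZE = 4096   # assumption: 4 KB pages, same size for virtual and physical


class PageFault(Exception):
    """Raised when the page-table entry is missing or has V=0."""


def translate(vaddr, asid, tlb, page_table, tlb_capacity=4):
    """Translate a virtual address using a TLB backed by a page table.

    tlb: dict (asid, virtual page) -> physical frame
    page_table: dict (asid, virtual page) -> (valid bit, frame)
    """
    page, offset = divmod(vaddr, PAGE_SIZE)
    key = (asid, page)
    if key not in tlb:                           # TLB miss: walk the page table
        valid, frame = page_table.get(key, (False, None))
        if not valid:
            raise PageFault(f"ASID {asid}, page {page}")
        if len(tlb) >= tlb_capacity:             # random replacement
            del tlb[random.choice(list(tlb))]
        tlb[key] = frame
    return tlb[key] * PAGE_SIZE + offset         # frame number + unchanged offset


tlb = {}
page_table = {(1, 2): (True, 5), (1, 3): (False, None)}
print(translate(2 * PAGE_SIZE + 12, 1, tlb, page_table))  # 20492 (frame 5, offset 12)
```

The offset passes through untranslated, which is why the sketch (like the slide) needs virtual and physical pages of the same size.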
Lecture 13 -- Synchronization (2014-3-4)
Non-blocking consumer synchronization
Another atomic read-modify-write instruction:

Compare&Swap(Rt, Rs, m)
if (Rt == M[m])
then
    M[m] = Rs;
    Rs = Rt;          /* do swap */
else
    /* do not swap */

Assuming sequential consistency: MEMBARs not shown ...

try:   LW   R3, head(R0)               ; Load queue head into R3
spin:  LW   R4, tail(R0)               ; Load queue tail into R4
       BEQ  R4, R3, spin               ; If queue empty, wait
       LW   R5, 0(R3)                  ; Read x from queue into R5
       ADDI R6, R3, 4                  ; Shift head by one word
       Compare&Swap R3, R6, head(R0)   ; Try to update head
       BNE  R3, R6, try                ; If not success, try again

If R3 != R6, another thread got here first, so we must try again.
If the thread is swapped out before the Compare&Swap, there is no latency problem; ...
CS 152 L24: Multiprocessors
UC Regents Fall 2006 © UCB
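Here is the consumer loop transcribed into Python threads. Python has no Compare&Swap instruction, so a lock stands in for the hardware's atomicity (an assumption of this sketch); `AtomicWord`, `try_dequeue`, and `consume` are hypothetical names.

```python
import threading


class AtomicWord:
    """A memory word with an atomic Compare&Swap (simulated with a lock)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the word still holds `expected`; report success."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def try_dequeue(buffer, head, tail):
    """Non-blocking dequeue following the slide's loop.

    Unlike the slide, an empty queue returns None instead of spinning,
    so the demo terminates. Indices step by 1 instead of 4 bytes.
    """
    while True:
        h = head.load()                        # try: LW R3, head(R0)
        if h == tail:                          # BEQ R4, R3, spin (queue empty)
            return None
        x = buffer[h]                          # LW R5, 0(R3)
        if head.compare_and_swap(h, h + 1):    # Compare&Swap R3, R6, head(R0)
            return x
        # CAS failed: another thread got here first, so try again.


def consume(buffer, head, tail, out):
    while (x := try_dequeue(buffer, head, tail)) is not None:
        out.append(x)


N = 10_000
buffer = list(range(N))
head = AtomicWord(0)
results = [[] for _ in range(4)]
threads = [threading.Thread(target=consume, args=(buffer, head, N, r)) for r in results]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Every item is dequeued exactly once across all consumers.
print(sorted(x for r in results for x in r) == buffer)  # True
```

A successful swap claims index `h` for exactly one thread, because any competing thread that read the same `h` sees its swap fail and retries with the new head.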
Lecture 14 -- Cache Design and Coherence (2014-3-6)
Writes from 10,000 feet ... for write-thru L1
[Figure: CPU0 and CPU1, each with a cache and a snooper, share a memory bus to the shared main memory hierarchy.]
For write-thru caches ...
1. Writing CPU takes control of the bus.
2. Address to be written is invalidated in all other caches. Reads will no longer hit in cache and get stale data.
3. Write is sent to main memory. Reads will cache miss and retrieve the new value from main memory.
To first order, reads will “just work” if write-thru caches implement this policy.
A “two-state” protocol (cache lines are “valid” or “invalid”).
CS 152 L14: Cache Design and Coherency
UC Regents Spring 2014 © UCB
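The three steps can be modeled with two caches on a shared bus. This is a single-threaded sketch, so "taking control of the bus" is implicit in calling `Bus.write`; the class names are hypothetical, memory defaults to 0, and no-write-allocate is assumed.

```python
class Bus:
    """Shared bus plus main memory (single-threaded model)."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def write(self, writer, addr, value):
        # Step 1: the writing CPU holds the bus for this whole call.
        # Step 2: snoopers invalidate the address in every other cache.
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)          # valid -> invalid
        # Step 3: the write goes through to main memory.
        self.memory[addr] = value


class WriteThroughCache:
    """Two-state lines: an address present in `lines` is valid, absent is invalid."""

    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                   # miss: fetch from main memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.write(self, addr, value)
        if addr in self.lines:                       # no write-allocate
            self.lines[addr] = value


bus = Bus()
cpu0, cpu1 = WriteThroughCache(bus), WriteThroughCache(bus)
print(cpu1.read(0x40))   # 0
cpu0.write(0x40, 7)
print(cpu1.read(0x40))   # 7: the stale line was invalidated, so this read misses
```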
Lecture 15 -- Advanced CPUs (2014-3-11)
CS 152 L15: Superscalars and Scoreboards
UC Regents Spring 2014 © UCB
Split pipelines: a write-after-write hazard.
WAW hazard:
    DIV R1, R2, R3
    SUB R1, R2, R3
The pipeline splits after the RF stage, feeding functional units with different latencies. If the long-latency DIV and the short-latency SUB are sent to parallel pipes, SUB may finish first.
Solution: SUB detects the R1 clash in the decode stage and stalls, via a pipe-write scoreboard.
CS 194-6 L9: Advanced Processors I
UC Regents Fall 2008 © UCB
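A pipe-write scoreboard check can be sketched as follows, assuming in-order issue of one instruction per cycle and made-up latencies (10 cycles for DIV, 1 for SUB). Only the WAW check is modeled; the function name is hypothetical.

```python
def schedule(instrs, latency):
    """In-order issue (one per cycle) into pipes of different latency.

    instrs: list of (opcode, dest register).
    latency: dict opcode -> cycles from issue to register write.
    The scoreboard records the cycle each in-flight destination is written;
    decode stalls an instruction whose destination is still pending.
    Returns a list of (opcode, issue cycle, write cycle).
    """
    pending = {}          # scoreboard: dest register -> cycle of its pending write
    cycle = 0
    result = []
    for opcode, dest in instrs:
        if pending.get(dest, -1) > cycle:   # WAW clash: wait for the earlier write
            cycle = pending[dest]
        write = cycle + latency[opcode]
        pending[dest] = write
        result.append((opcode, cycle, write))
        cycle += 1
    return result


print(schedule([("DIV", "R1"), ("SUB", "R1")], {"DIV": 10, "SUB": 1}))
# [('DIV', 0, 10), ('SUB', 10, 11)]
```

Without the stall, SUB would write R1 at cycle 2 and DIV would then overwrite it at cycle 10, leaving the wrong final value.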
[Figure: Superscalar R machine datapath: two parallel IF (Fetch), ID (Decode), EX (ALU), MEM, WB pipelines sharing a register file with four read ports (rs1-rs4, rd1-rd4) and two write ports (ws1/wd1/WE1, ws2/wd2/WE2). Other labels: PC and Sequencer, Instr Mem, Instruction Issue Logic, Data Mem.]
Lecture 17 -- Networks, Routers, Google (2014-3-20)
Six key parameters scale across the dimensions of “by one server”, “by 80-server rack”, and “by array”.
To get more DRAM and disk capacity, you must work on a scale larger than a single server. But as you do, latency and bandwidth degrade, because network performance << a server bus, and because the array network is under-provisioned.
Exception: disk latency is roughly scale-independent.
Lecture 18 -- Dynamic Scheduling I (2014-4-1)
Given an endless supply of registers ...
Rename “architected registers” (Ri, Fi) to new “physical registers” (PRi, PFi) on each write.

Architected code (partly lost in extraction): ADDI R1,R0,64 ... SD F4,0(R1) ...
Mappings: R1→ PR01, F0→ PF00

Renamed code:
        ADDI  PR01, PR00, 64
        LD    PF00, 0(PR01)
        ADDD  PF04, PF00, PF02
        SD    PF04, 0(PR01)
        SUBI  PR11, PR01, 8
        BEQZ  PR11, ENDLOOP
ITER2:  LD    PF10, 0(PR11)
        ADDD  PF14, PF10, PF02
        SD    PF14, 0(PR11)
        SUBI  PR21, PR11, 8
        BEQZ  PR21, ENDLOOP
ITER3:  LD    PF20, 0(PR21)
        [...]

What was gained? An instruction may execute once all of its source registers have been written.
CS 152 L18: Dynamic Scheduling I
UC Regents Spring 2014 © UCB
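The renaming idea can be sketched in a few lines. The physical names (`P0`, `P1`, ...) are a hypothetical scheme rather than the slide's PRxx/PFxx numbering, and the supply of fresh registers is unbounded, matching the "endless supply" premise.

```python
from itertools import count


def rename(instrs):
    """Rename architected registers to a fresh physical register on each write.

    instrs: list of (opcode, dest or None, [sources]).
    Returns the same list with physical register names substituted.
    """
    fresh = count()
    table = {}                                 # architected -> current physical

    def lookup(reg):
        if reg not in table:                   # live-in value: give it a register
            table[reg] = f"P{next(fresh)}"
        return table[reg]

    renamed = []
    for opcode, dest, srcs in instrs:
        psrcs = [lookup(s) for s in srcs]      # read sources before the write
        pdest = None
        if dest is not None:
            table[dest] = pdest = f"P{next(fresh)}"
        renamed.append((opcode, pdest, psrcs))
    return renamed


loop = [("LD", "F0", ["R1"]), ("ADDD", "F4", ["F0", "F2"]),
        ("SUBI", "R1", ["R1"]), ("LD", "F0", ["R1"])]
for ins in rename(loop):
    print(ins)
# ('LD', 'P1', ['P0'])
# ('ADDD', 'P3', ['P1', 'P2'])
# ('SUBI', 'P4', ['P0'])
# ('LD', 'P5', ['P4'])
```

The second LD writes F0 into P5 rather than P1, so it no longer has to wait for the ADDD to read the old F0 value.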
Lecture 19 -- Dynamic Scheduling II (2014-4-3)
Rename stage close-up:
(1) Allocates new physical registers for destinations,
(2) Looks up physical register numbers for sources,
(3) Handles rename dependences within the 4 issuing instructions in one clock cycle!
Input: 4 instructions specifying architected registers.
Output: 12 physical register numbers: 1 destination and 2 sources for each of the 4 instructions to be issued.
Timestamped, for mis-speculation recovery.
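Below is a sequential sketch of one rename cycle, assuming every instruction has one destination and two sources. In hardware the intra-group check is done by parallel comparators rather than a loop; the timestamping for mis-speculation recovery is not modeled, and the function names are hypothetical.

```python
def _source(src, i, group, new_dests, snapshot):
    # (3) Intra-group dependence: the youngest earlier instruction in this
    # group that writes `src` supplies it; otherwise (2) use the map table.
    for j in range(i - 1, -1, -1):
        if group[j][0] == src:
            return new_dests[j]
    return snapshot[src]


def rename_group(group, map_table, free_list):
    """Rename up to 4 instructions in one cycle.

    group: list of (dest, src1, src2) architected registers.
    map_table: dict architected -> physical register number.
    free_list: list of free physical register numbers (consumed in order).
    Returns (renamed, new_map); renamed holds (pdest, psrc1, psrc2),
    i.e. 3 physical register numbers per instruction, 12 for a full group.
    """
    assert len(group) <= 4
    snapshot = dict(map_table)                       # table as of cycle start
    new_dests = [free_list.pop(0) for _ in group]    # (1) allocate destinations
    renamed = [
        (new_dests[i],
         _source(s1, i, group, new_dests, snapshot),
         _source(s2, i, group, new_dests, snapshot))
        for i, (_, s1, s2) in enumerate(group)
    ]
    new_map = dict(snapshot)
    for (dest, _, _), pdest in zip(group, new_dests):
        new_map[dest] = pdest                        # youngest writer wins
    return renamed, new_map


group = [("R1", "R2", "R3"), ("R2", "R1", "R1"), ("R1", "R1", "R2"), ("R3", "R3", "R3")]
renamed, new_map = rename_group(group, {"R1": 1, "R2": 2, "R3": 3}, [10, 11, 12, 13])
print(renamed)   # [(10, 2, 3), (11, 10, 10), (12, 10, 11), (13, 3, 3)]
print(new_map)   # {'R1': 12, 'R2': 11, 'R3': 13}
```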
Lecture 20 -- Dynamic Scheduling III (2014-4-8)
Micro-op translation example ...
ADC m32, r32:            // for a simple m32 address mode
Becomes:
    LD  T1 0(EBX);       // EBX register points to m32
    ADD T1, T1, CF;      // CF is carry flag from EFLAGS
    ADD T1, T1, r32;     // Add the specified register
    ST  0(EBX) T1;       // Store result back to m32
Instruction traces of IA-32 programs show most executed instructions require 4 or fewer micro-ops. Translations for these ops are cast into logic gates, often over several pipeline cycles.
CS 152 L20: Dynamic Scheduling III
UC Regents Fall 2006 © UCB
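The four micro-ops can be executed on a toy register/memory model to check that they compute m32 = m32 + r32 + CF. This sketch assumes 32-bit wraparound and does not model the EFLAGS updates (such as the new carry out) that a real ADC performs; `adc_m32_r32` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def adc_m32_r32(mem, regs, r32):
    """Run the slide's 4 micro-ops for ADC m32, r32 (m32 addressed by EBX)."""
    t1 = mem[regs["EBX"]]                   # LD  T1 0(EBX)
    t1 = (t1 + regs["CF"]) & MASK32         # ADD T1, T1, CF
    t1 = (t1 + regs[r32]) & MASK32          # ADD T1, T1, r32
    mem[regs["EBX"]] = t1                   # ST  0(EBX) T1


mem = {0x1000: 5}
regs = {"EBX": 0x1000, "CF": 1, "ECX": 10}
adc_m32_r32(mem, regs, "ECX")
print(mem[0x1000])   # 16 = 5 + 1 + 10
```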
Lecture 21 -- Dataflow (2014-4-10)
Dataflow stages of the 21264
Idea: Write dataflow programs that reference physical registers, to execute on this machine.
Input: Instructions that reference physical registers.
Scoreboard: Tracks writes to physical registers.
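A scoreboard-driven dataflow issue loop can be sketched as follows, assuming unlimited issue width and a uniform 1-cycle latency (both simplifications). The example reuses the renamed loop from the Dynamic Scheduling I slide, with stores and branches omitted; `dataflow_issue` is a hypothetical name.

```python
from math import inf


def dataflow_issue(instrs, live_in, latency=1):
    """Fire each instruction once all its physical source registers are written.

    instrs: list of (name, dest, [sources]) over physical registers.
    live_in: physical registers already written at cycle 0.
    Returns dict name -> issue cycle.
    """
    written_at = {r: 0 for r in live_in}       # scoreboard: cycle each reg is written
    issue = {}
    waiting = list(instrs)
    cycle = 0
    while waiting:
        ready = [ins for ins in waiting
                 if all(written_at.get(s, inf) <= cycle for s in ins[2])]
        if not ready and all(t <= cycle for t in written_at.values()):
            raise ValueError("deadlock: a source register is never written")
        for name, dest, _ in ready:
            issue[name] = cycle
            written_at[dest] = cycle + latency
        waiting = [ins for ins in waiting if ins[0] not in issue]
        cycle += 1
    return issue


loop = [
    ("LD PF00", "PF00", ["PR01"]),
    ("ADDD PF04", "PF04", ["PF00", "PF02"]),
    ("SUBI PR11", "PR11", ["PR01"]),
    ("LD PF10", "PF10", ["PR11"]),
    ("ADDD PF14", "PF14", ["PF10", "PF02"]),
]
print(dataflow_issue(loop, live_in=["PR01", "PF02"]))
```

Iteration 2's LD issues in the same cycle as iteration 1's ADDD: once registers are renamed, only true data dependences order execution.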
Lecture 22 -- GPU + SIMD + Vectors I (2014-4-15)
Pure data move opcode. Or, part of a math opcode.
Lecture 23 -- GPU + SIMD + Vectors II (2014-4-17)
Assume MacBook Air ... 1366 x 768 screen ...
We are all zoomed in on Google Maps.
Top pyramid image is 4K x 4K ...
Idea: Keep only a 1366 x 768 window of top images in RAM ...
Lets us cache a 1024 x 1024 window of the 11 PB Earth map in 34.7 MB!
Zoom all the way in ...
Bottom stack image shows the smallest part of the 1 mile sq. patch of the Earth of any stack image.
Graphics hardware displays the bottom stack image, which fills the MacBook Air display.
Hardware interpolation of stack levels.
[Figure axis labels: units of pixels; units of sq. miles; units of miles.]
Lecture 24 -- Voxel Processing (2014-4-22)
After processing ...
A 3-D matrix of cubes, in object space (X,Y,Z).
8-bit density value stored for each cube (0 = “air”).
256^3 voxels = 16 MB = a 10 inch cube (for 1 mm voxels).
0.125 mm voxels? 2048^3 voxels = 8 GB.
Interesting to computer architects because n^3 grows so quickly!
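The cubic growth is easy to check with a sketch that assumes 1 byte per voxel and binary MB/GB (a 256 mm edge is about 10 inches); `voxel_volume_bytes` is a hypothetical name.

```python
def voxel_volume_bytes(edge_mm, voxel_mm, bytes_per_voxel=1):
    """Bytes for a cube of side edge_mm sampled at voxel_mm (8-bit density)."""
    n = round(edge_mm / voxel_mm)        # voxels along each axis
    return n ** 3 * bytes_per_voxel


MB, GB = 2 ** 20, 2 ** 30
print(voxel_volume_bytes(256, 1) / MB)       # 16.0 (256^3 voxels)
print(voxel_volume_bytes(256, 0.125) / GB)   # 8.0  (2048^3 voxels)
```

Each halving of the voxel size multiplies storage by 8, so going from 1 mm to 0.125 mm costs a factor of 8^3 = 512.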
Lecture 25 -- Digital Imaging (2014-4-24)
Camera interface to the outside world
Simple power hookup.
Serial port to control the camera.
8-bit Dout port, 54 MHz Clk.
1280 x 1024 @ 15 fps, or 640 x 512 @ 30 fps.
YCrCb 4:2:2
CS 250 L12: CMOS Imagers
UC Regents Fall 2012 © UCB
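The slide's two modes can be checked against the port's capacity. This sketch assumes 8-bit samples (so YCrCb 4:2:2 averages 2 bytes per pixel) and one byte transferred per 54 MHz clock; `bytes_per_second` is a hypothetical name.

```python
def bytes_per_second(width, height, fps, bytes_per_pixel=2):
    """Pixel data rate; YCrCb 4:2:2 with 8-bit samples averages 2 bytes/pixel."""
    return width * height * fps * bytes_per_pixel


PORT_BYTES_PER_SEC = 54_000_000   # assumed: 8-bit Dout, one byte per 54 MHz clock

for w, h, fps in [(1280, 1024, 15), (640, 512, 30)]:
    rate = bytes_per_second(w, h, fps)
    print(f"{w}x{h} @ {fps} fps: {rate / 1e6:.1f} MB/s, "
          f"{rate / PORT_BYTES_PER_SEC:.0%} of the port")
```

Both modes fit within the 8-bit port under these assumptions, with the full-resolution mode using the larger share.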
AWARE-2: Array of 98 phone camera modules (14 M-pixel each): a 1.3 G-pixel camera @ 3 frames/sec.
On Thursday
Mid-term II ...
Ground rules ...
Mid-term: How to do well ...
Problem intro often features a lecture slide. If you have to teach yourself that slide during the test, you’re starting out behind.
Getting the problem correct requires thinking on your feet to do a new design or analyze one given to you.
There will not be “you can only get it if you do the reading” problems ... but the reading helps you understand how to think through the problem.
Mid-term: There may be math ...
No memorization: If we ask about Amdahl’s Law, we will show its definition lecture slide.
Understanding is needed: A problem may require you to apply an equation to a design, etc.
You may need to do: simple algebra and calculus, add a few numbers by hand, etc.
Cannot use electronic devices.
... more administrative info after we do some content.
When is it? Where is it? Ground rules.
9:30 AM sharp, Thursday, May 1st, 306 Soda.
Every-other-seat seating, except for the front rows, where every-seat is permitted.
No blue-books needed. We will be handing out a paper test. Pencil is preferred.
Pencils down @ 10:55 AM, so we can collect papers before the next class comes in.
When is it? Where is it? Ground rules.
No use of calculators, smartphones,
laptops, etc ... during the exam.
Closed-book, closed-notes. Just pencils,
erasers. No consulting with students.
Restroom breaks are OK, but you’ll still
need to hand in your exam @ 10:55.
Questions are reserved for serious
concerns about a bug in the question.
On Thursday
Mid-term II ...
See you there!