CS 152 Computer Architecture
and Engineering
Lecture 22: Final Lecture
Krste Asanovic
Electrical Engineering and Computer Sciences
University of California, Berkeley
http://www.eecs.berkeley.edu/~krste
http://inst.cs.berkeley.edu/~cs152
Today’s Lecture
• Review entire semester
– What you learned
• Follow-on classes
• What’s next in computer architecture?
The New CS152 Executive Summary
(what was promised in lecture 1)
[Figure: the processor your predecessors built in CS152, contrasted with what you'll understand and experiment with in the new CS152, plus the technology behind chip-scale multiprocessors (CMPs).]
From Babbage to IBM 650
IBM 360: Initial Implementations
                 Model 30           Model 70
Storage          8K - 64 KB         256K - 512 KB
Datapath         8-bit              64-bit
Circuit Delay    30 nsec/level      5 nsec/level
Local Store      Main Store         Transistor Registers
Control Store    Read only 1µsec    Conventional circuits

The IBM 360 instruction set architecture (ISA) completely hid the underlying technological differences between the various models.

Milestone: the first true ISA designed as a portable hardware-software interface! With minor modifications it still survives today!
Microcoded Microarchitecture
[Figure: a microcode controller (ROM) holds the fixed µcode instructions and sequences the datapath, observing status signals (busy?, zero?, opcode) and driving control signals (e.g., enMem, MemWrt). Main memory (RAM) holds the user program written in macrocode instructions (e.g., MIPS, x86, etc.), accessed via address and data buses.]
Implementing Complex Instructions
[Figure: bus-based microcoded datapath. A single 32-bit bus connects the IR, immediate extender, register file (32 GPRs + PC, with RegSel choosing among rd/rt/rs/32(PC)/31(Link)), ALU input latches A and B, memory address register MA, and memory. The microcontroller observes Opcode, zero?, and busy, and drives the control signals (ldIR, ldA, ldB, ldMA, RegWrt, enReg, enImm, enALU, enMem, MemWrt, ExtSel, OpSel, RegSel).]

Complex instructions execute as multi-step microcode sequences on this datapath:

  rd ← M[(rs)] op (rt)            Reg-Memory-src ALU op
  M[(rd)] ← (rs) op (rt)          Reg-Memory-dst ALU op
  M[(rd)] ← M[(rs)] op M[(rt)]    Mem-Mem ALU op
From CISC to RISC
• Use fast RAM to build a fast instruction cache of user-visible instructions, not fixed hardware microroutines
  – Can change contents of fast instruction memory to fit what the application needs right now
• Use a simple ISA to enable a hardwired pipelined implementation
  – Most compiled code only used a few of the available CISC instructions
  – Simpler encoding allowed pipelined implementations
• Further benefit with integration
  – In the early '80s, could fit 32-bit datapath + small caches on a single chip
  – No chip crossings in the common case allows faster operation
Nanocoding
Exploits recurring control signal patterns in µcode, e.g.,

  ALU0   A ← Reg[rs]
  ...
  ALUi0  A ← Reg[rs]
  ...

[Figure: the µPC (state) indexes a µcode ROM whose entries hold next-state addresses and nanoaddresses; the nanoaddress selects an entry in a nanoinstruction ROM, which supplies the actual control-signal data.]

• The MC68000 had 17-bit µcode containing either a 10-bit µjump or a 9-bit nanoinstruction pointer
  – Nanoinstructions were 68 bits wide, decoded to give 196 control signals
“Iron Law” of Processor Performance
Time/Program = (Instructions/Program) × (Cycles/Instruction) × (Time/Cycle)

– Instructions per program depends on source code, compiler technology, and ISA
– Cycles per instruction (CPI) depends upon the ISA and the microarchitecture
– Time per cycle depends upon the microarchitecture and the base technology

Microarchitecture             CPI    Cycle time
Microcoded                    >1     short
Single-cycle unpipelined      1      long
Pipelined                     1      short
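To make the Iron Law concrete, the C sketch below plugs assumed numbers into the formula; the instruction count, CPI values, and cycle times are invented for illustration, not measurements of real machines.

#include <stdio.h>

/* Iron Law: time/program = (insts/program) * (cycles/inst) * (time/cycle) */
int main(void) {
    double insts = 100e6;   /* assumed dynamic instruction count */
    struct { const char *name; double cpi; double cycle_ns; } uarch[] = {
        { "microcoded",               4.0, 10.0 },  /* CPI > 1, short cycle */
        { "single-cycle unpipelined", 1.0, 50.0 },  /* CPI = 1, long cycle  */
        { "pipelined",                1.0, 10.0 },  /* CPI ~ 1, short cycle */
    };
    for (int i = 0; i < 3; i++)
        printf("%-24s %8.3f s\n", uarch[i].name,
               insts * uarch[i].cpi * uarch[i].cycle_ns * 1e-9);
    return 0;
}

Running it shows why pipelining wins: it pairs the single-cycle design's CPI with the microcoded design's short cycle.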
5-Stage Pipelined Execution
[Figure: classic 5-stage datapath: PC + instruction memory (I-Fetch, IF); instruction register, register file read, and immediate extension (Decode/Reg. Fetch, ID); ALU (Execute, EX); data memory (Memory, MA); register write-back (Write-Back, WB).]

          time  t0  t1  t2  t3  t4  t5  t6  t7  . . . .
instruction1    IF1 ID1 EX1 MA1 WB1
instruction2        IF2 ID2 EX2 MA2 WB2
instruction3            IF3 ID3 EX3 MA3 WB3
instruction4                IF4 ID4 EX4 MA4 WB4
instruction5                    IF5 ID5 EX5 MA5 WB5
Pipeline Hazards
• Pipelining instructions is complicated by HAZARDS:
  – Structural hazards (two instructions want the same hardware resource)
  – Data hazards (earlier instruction produces a value needed by a later instruction)
  – Control hazards (instruction changes control flow, e.g., branches or exceptions)
• Techniques to handle hazards:
  – Interlock (hold newer instruction until older instructions drain out of the pipeline)
  – Bypass (transfer value from older instruction to newer instruction as soon as it is available somewhere in the machine)
  – Speculate (guess the effect of the earlier instruction)
• Speculation needs a predictor, a prediction check, and a recovery mechanism
Exception Handling 5-Stage Pipeline
[Figure: exception handling in the 5-stage pipeline. Each stage can raise its own exception: PC address exception at fetch (F), illegal opcode at decode (D), arithmetic overflow at execute (E), data address exceptions at memory (M); asynchronous interrupts enter at write-back (W). Exception flags (Exc D/E/M) and PCs (PC D/E/M) travel down the pipeline to the commit point, where the offending instruction's PC is latched into EPC, the Cause register is set, the younger stages (kill F, kill D, kill E) and the write-back are killed, and the handler PC is selected as the next fetch address.]
Processor-DRAM Gap (latency)
[Figure: performance on a log scale vs. year, 1980-2000. CPU performance ("Moore's Law") grows ~60%/year while DRAM improves only ~7%/year, so the processor-memory performance gap grows ~50%/year.]

A four-issue 2GHz superscalar accessing 100ns DRAM could execute 800 instructions during the time for one memory access!
Common Predictable Patterns
Two predictable properties of memory references:
– Temporal Locality: if a location is referenced, it is likely to be referenced again in the near future.
– Spatial Locality: if a location is referenced, it is likely that locations near it will be referenced in the near future.

Memory Reference Patterns

[Figure: memory address (one dot per access) vs. time; horizontal bands show temporal locality, diagonal streams show spatial locality. Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory. IBM Systems Journal 10(3): 168-192 (1971)]
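The two locality patterns map directly onto everyday loop code; a minimal C sketch (array names and sizes are illustrative):

#include <stddef.h>

/* Spatial locality: consecutive addresses -> whole cache lines are reused. */
long sum_array(const int *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];                 /* a[i], a[i+1], ... share cache lines */
    return sum;
}

/* Temporal locality: the same small table is touched again and again. */
long sum_repeated(const int *table, size_t n, int passes) {
    long sum = 0;
    for (int p = 0; p < passes; p++) /* table stays hot in the cache */
        for (size_t i = 0; i < n; i++)
            sum += table[i];
    return sum;
}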
Causes for Cache Misses
• Compulsory: first reference to a block, a.k.a. cold-start misses
  - misses that would occur even with an infinite cache
• Capacity: cache is too small to hold all data needed by the program
  - misses that would occur even under a perfect replacement policy
• Conflict: misses that occur because of collisions due to the block-placement strategy
  - misses that would not occur with full associativity
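A hedged sketch of counting such misses: the minimal direct-mapped cache model below (geometry assumed: 64 sets of 64-byte blocks) tallies misses over an address trace. With this geometry, addresses 64 × 64 = 4096 bytes apart collide in the same set, producing conflict misses a fully associative cache would avoid.

#include <stdint.h>
#include <string.h>

#define NSETS  64
#define BSHIFT 6                      /* 64-byte blocks */

static uint64_t tag[NSETS];
static int      valid[NSETS];

long count_misses(const uint64_t *addr, long n) {
    long misses = 0;
    memset(valid, 0, sizeof valid);
    for (long i = 0; i < n; i++) {
        uint64_t blk = addr[i] >> BSHIFT;   /* drop block offset */
        unsigned set = (unsigned)(blk % NSETS);
        uint64_t t   = blk / NSETS;
        if (!valid[set] || tag[set] != t) { /* compulsory, capacity, or conflict */
            misses++;
            valid[set] = 1;
            tag[set] = t;
        }
    }
    return misses;
}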
A Typical Memory Hierarchy c.2006
[Figure: CPU with a multiported register file (part of the CPU); split L1 instruction and data primary caches (on-chip SRAM); a large unified L2 cache (on-chip SRAM); multiple interleaved memory banks (DRAM).]
Modern Virtual Memory Systems
Illusion of a large, private, uniform store

• Protection & Privacy: several users, each with their private address space and one or more shared address spaces (page table = name space)
• Demand Paging: provides the ability to run programs larger than the primary memory, backed by a swapping store
• Hides differences in machine configurations

The price is address translation on each memory reference (VA → TLB mapping → PA).
Hierarchical Page Table
A 32-bit virtual address is split into three fields:

  bits 31:22   p1 (10-bit L1 index)
  bits 21:12   p2 (10-bit L2 index)
  bits 11:0    offset

[Figure: the root of the current page table (held in a processor register) points to the level-1 page table; entry p1 selects a level-2 page table; entry p2 points to the data page. Pages may reside in primary or secondary memory, and a PTE may mark a nonexistent page.]
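A minimal C sketch of the two-level walk above; the PTE layout (valid bit in bit 0, frame address in the upper 20 bits) and the toy flat physical memory are assumptions for illustration:

#include <stdint.h>

#define PTE_VALID 0x1u
typedef uint32_t pte_t;

/* Toy physical memory, word-addressed; page tables live inside it. */
static pte_t physmem[1u << 20];

uint32_t translate(uint32_t root_pa, uint32_t va, int *fault) {
    uint32_t p1     = (va >> 22) & 0x3FFu;  /* bits 31:22 -> L1 index */
    uint32_t p2     = (va >> 12) & 0x3FFu;  /* bits 21:12 -> L2 index */
    uint32_t offset =  va        & 0xFFFu;  /* bits 11:0  -> page offset */

    pte_t l1 = physmem[(root_pa >> 2) + p1];           /* L1 PTE */
    if (!(l1 & PTE_VALID)) { *fault = 1; return 0; }   /* L2 table absent */

    uint32_t l2_base = l1 & ~0xFFFu;                   /* L2 table address */
    pte_t l2 = physmem[(l2_base >> 2) + p2];           /* L2 PTE */
    if (!(l2 & PTE_VALID)) { *fault = 1; return 0; }   /* page fault */

    *fault = 0;
    return (l2 & ~0xFFFu) | offset;                    /* PPN | offset */
}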
Address Translation & Protection
Virtual address = virtual page number (VPN) + offset

[Figure: the VPN, together with the kernel/user mode and read/write access type, goes through a protection check and address translation, yielding either an exception or a physical address = physical page number (PPN) + offset.]

• Every instruction and data access needs address translation and protection checks

A good VM design needs to be fast (~ one cycle) and space-efficient => Translation Lookaside Buffer (TLB)
Address Translation in CPU Pipeline
[Figure: pipeline with translation: PC → instruction TLB → instruction cache → decode (D) → execute (E) → data TLB → data cache → write-back (W). Either TLB access can signal a TLB miss, page fault, or protection violation.]

• Software handlers need a restartable exception on page fault or protection violation
• Handling a TLB miss needs a hardware or software mechanism to refill the TLB
• Need mechanisms to cope with the additional latency of a TLB:
  – slow down the clock
  – pipeline the TLB and cache access
  – virtual address caches
  – parallel TLB/cache access
Concurrent Access to TLB & Cache
[Figure: the virtual address splits into a VPN and a k-bit page offset; the page offset provides the L-bit virtual index and b-bit block offset. A direct-mapped cache with 2^L blocks of 2^b bytes each is indexed with the virtual index while the TLB translates the VPN to a PPN; the physical tag is then compared to determine a hit.]

Index L is available without consulting the TLB
=> cache and TLB accesses can begin simultaneously
Tag comparison is made after both accesses are completed

Cases: L + b = k,  L + b < k,  L + b > k
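The case distinction reduces to a one-line check; a small C sketch (parameter names mirror the figure):

/* A direct-mapped cache can be indexed in parallel with the TLB only if
   the index + block-offset bits (L + b) fit within the page-offset bits
   (k), so the index comes entirely from untranslated address bits. */
int can_index_in_parallel(int L, int b, int k) {
    return L + b <= k;
}

/* Example: 4 KB pages (k = 12) and 64-byte blocks (b = 6) allow at most
   L = 6, i.e. a 4 KB direct-mapped cache; a larger cache needs higher
   associativity, page coloring, or a virtual-address cache. */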
CS152 Administrivia
• Lab 4 competition winners!
• Quiz 6 on Thursday, May 8
– L19-21, PS 6, Lab 6
• Last 15 minutes, course survey
– HKN survey
– Informal feedback survey for those who’ve not done it already
• Quiz 5 results
Complex Pipeline Structure
[Figure: complex in-order pipeline: IF → ID → Issue, reading GPRs and FPRs, feeding parallel functional units (ALU, Mem, Fadd, Fmul, and an unpipelined Fdiv) that write back (WB).]
Superscalar In-Order Pipeline
[Figure: dual-issue in-order pipeline. PC fetches two instructions per cycle through dual decode; an integer/memory pipe (X1, data memory, X3, W) runs alongside floating-point pipes (pipelined Fadd and Fmul through X1-X3, plus an unpipelined FDiv), sharing the GPRs and FPRs, with the commit point at the end.]

• Fetch two instructions per cycle; issue both simultaneously if one is integer/memory and the other is floating-point
• Inexpensive way of increasing throughput; examples include Alpha 21064 (1992) & MIPS R5000 series (1996)
• Same idea can be extended to wider issue by duplicating functional units (e.g., 4-issue UltraSPARC), but register file ports and bypassing costs grow quickly
Types of Data Hazards
Consider executing a sequence of rk ← (ri) op (rj) type of instructions:

Data dependence:
  r3 ← (r1) op (r2)
  r5 ← (r3) op (r4)      Read-after-Write (RAW) hazard

Anti-dependence:
  r3 ← (r1) op (r2)
  r1 ← (r4) op (r5)      Write-after-Read (WAR) hazard

Output dependence:
  r3 ← (r1) op (r2)
  r3 ← (r6) op (r7)      Write-after-Write (WAW) hazard
Phases of Instruction Execution
[Figure: instruction flow: PC → I-cache → fetch buffer → issue buffer → functional units → result buffer → architectural state.]

Fetch: instruction bits retrieved from cache.
Decode: instructions placed in the appropriate issue (aka "dispatch") stage buffer.
Execute: instructions and operands sent to execution units. When execution completes, all results and exception flags are available.
Commit: instruction irrevocably updates architectural state (aka "graduation" or "completion").
Pipeline Design with Physical Regfile
[Figure: out-of-order pipeline with a unified physical register file. In-order front end: branch prediction drives PC and fetch, then decode & rename; a reorder buffer tracks instructions through out-of-order execute (branch unit, ALU, MEM with store buffer and D$, all reading the physical register file) and back to in-order commit. Branch resolution sends kill signals to fetch, decode, and the reorder buffer, and updates the predictors.]
Reorder Buffer Holds
Active Instruction Window
[Figure: snapshot of the reorder buffer at cycle t and cycle t+1; the commit (oldest), execute, and fetch (newest) pointers each advance through the window between cycles.]

  … (older instructions)
  ld  r1, (r3)
  add r3, r1, r2
  sub r6, r7, r9
  add r3, r3, r6
  ld  r6, (r1)
  add r6, r6, r3
  st  r6, (r1)
  ld  r6, (r1)
  … (newer instructions)
Branch History Table
[Figure: the fetch PC (ignoring the low 00 bits) provides a k-bit index into a 2^k-entry branch history table (BHT) with 2 bits per entry, accessed in parallel with the I-Cache; the instruction's opcode and offset determine whether it is a branch and its target PC, while the BHT supplies the taken/¬taken prediction.]

A 4K-entry BHT, 2 bits/entry, gives ~80-90% correct predictions.
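A minimal C sketch of a 2-bit-counter BHT like the one above; the 4K-entry size matches the slide, while the indexing and counter encoding are conventional assumptions:

#include <stdint.h>

#define BHT_BITS 12                     /* 4K entries, as on the slide */
#define BHT_SIZE (1u << BHT_BITS)

/* Counters: 0,1 predict not-taken; 2,3 predict taken.
   Static storage zero-initializes to "strongly not-taken". */
static uint8_t bht[BHT_SIZE];

static unsigned index_of(uint32_t pc) {
    return (pc >> 2) & (BHT_SIZE - 1);  /* drop byte offset, take low bits */
}

int predict_taken(uint32_t pc) {
    return bht[index_of(pc)] >= 2;
}

void update(uint32_t pc, int taken) {   /* called after the branch resolves */
    uint8_t *c = &bht[index_of(pc)];
    if (taken  && *c < 3) (*c)++;       /* saturate at 3 (strongly taken) */
    if (!taken && *c > 0) (*c)--;       /* saturate at 0 (strongly not)   */
}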
Two-Level Branch Predictor
Pentium Pro uses the result from the last two branches
to select one of the four sets of BHT bits (~95% correct)
[Figure: the fetch PC indexes four parallel sets of 2-bit BHT entries; a 2-bit global branch history shift register, updated with the taken/¬taken result of each branch, selects which set supplies the taken/¬taken prediction.]
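A hedged C sketch of the two-level idea, with the global history register folded into the table index; the sizes and exact arrangement are illustrative, not the Pentium Pro's actual design:

#include <stdint.h>

#define HIST_BITS 2
#define PC_BITS   10

static uint8_t  counters[1u << (PC_BITS + HIST_BITS)];  /* 2-bit each */
static uint32_t ghr;                     /* last HIST_BITS branch outcomes */

static unsigned idx(uint32_t pc) {
    return (((pc >> 2) & ((1u << PC_BITS) - 1)) << HIST_BITS)
           | (ghr & ((1u << HIST_BITS) - 1));
}

int predict_taken2(uint32_t pc) { return counters[idx(pc)] >= 2; }

void update2(uint32_t pc, int taken) {
    uint8_t *c = &counters[idx(pc)];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
    ghr = (ghr << 1) | (taken & 1);      /* shift in the latest outcome */
}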
Branch Target Buffer (BTB)
[Figure: a 2^k-entry direct-mapped BTB (can also be associative), accessed with the PC in parallel with the I-Cache; each entry holds a valid bit, an entry PC (tag), and a predicted target PC. On a tag match, the predicted target becomes the next PC.]

• Keep both the branch PC and target PC in the BTB
• PC+4 is fetched if the match fails
• Only taken branches and jumps are held in the BTB
• Next PC is determined before the branch is fetched and decoded
Combining BTB and BHT
• BTB entries are considerably more expensive than BHT entries, but can redirect fetches at an earlier stage in the pipeline and can accelerate indirect branches (JR)
• BHT can hold many more entries and is more accurate

[Figure: front-end pipeline stages A (PC generation/mux), P (instruction fetch stage 1), F (instruction fetch stage 2), B (branch address calc/begin decode), I (complete decode), J (steer instructions to functional units), R (register file read), E (integer execute). The BTB redirects fetch in the earliest stages; the BHT, in a later pipeline stage, corrects the PC when the BTB misses a predicted taken branch. BTB/BHT are only updated after the branch resolves in the E stage.]
Sequential ISA Bottleneck
[Figure: sequential source code (e.g., "a = foo(b); for (i=0, i<…") goes through a superscalar compiler, which finds independent operations and schedules them into sequential machine code; a superscalar processor must then re-check instruction dependencies and schedule execution at run time.]
VLIW: Very Long Instruction Word
  [ Int Op 1 | Int Op 2 | Mem Op 1 | Mem Op 2 | FP Op 1 | FP Op 2 ]

Example machine: two integer units (single-cycle latency), two load/store units (three-cycle latency), two floating-point units (four-cycle latency).

• Multiple operations packed into one instruction
• Each operation slot is for a fixed function
• Constant operation latencies are specified
• Architecture requires guarantee of:
  – Parallelism within an instruction => no cross-operation RAW check
  – No data use before data ready => no data interlocks
Scheduling Loop Unrolled Code
Unroll 4 ways
loop: ld f1, 0(r1)
ld f2, 8(r1)
ld f3, 16(r1)
ld f4, 24(r1)
add r1, 32
fadd f5, f0, f1
fadd f6, f0, f2
fadd f7, f0, f3
fadd f8, f0, f4
sd f5, 0(r2)
sd f6, 8(r2)
sd f7, 16(r2)
sd f8, 24(r2)
add r2, 32
bne r1, r3, loop
Schedule:

          Int1     Int2    M1       M2      FP+       FPx
loop:                      ld f1
                           ld f2
                           ld f3
          add r1           ld f4
                                            fadd f5
                                            fadd f6
                                            fadd f7
                                            fadd f8
                                    sd f5
                                    sd f6
                                    sd f7
          add r2   bne              sd f8
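For reference, the unrolled assembly implements a loop of roughly this shape in C (names are assumptions; f0 plays the role of the loop-invariant addend c):

/* Source loop behind the unrolled schedule above (a guess at its shape):
   each element of x is incremented by a loop-invariant value and stored. */
void add_const(double *y, const double *x, long n, double c) {
    for (long i = 0; i < n; i++)
        y[i] = x[i] + c;        /* ld -> fadd -> sd, unrolled 4 ways above */
}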
Software Pipelining
Unroll 4 ways first:

loop: ld f1, 0(r1)
      ld f2, 8(r1)
      ld f3, 16(r1)
      ld f4, 24(r1)
      add r1, 32
      fadd f5, f0, f1
      fadd f6, f0, f2
      fadd f7, f0, f3
      fadd f8, f0, f4
      sd f5, 0(r2)
      sd f6, 8(r2)
      sd f7, 16(r2)
      add r2, 32
      sd f8, -8(r2)
      bne r1, r3, loop

[Figure: software-pipelined schedule on the Int1/Int2/M1/M2/FP+/FPx slots. A prolog issues the first iterations' loads (ld f1-f4, with add r1) and the first group of fadds; in the steady-state loop body, each cycle overlaps stores (sd f5-f8) from an older iteration, fadds from the previous iteration, and loads for the current one, with add r2 / add r1 / bne folded in; an epilog drains the remaining fadds and stores.]
Vector Programming Model
[Figure: vector programming model. Scalar registers r0-r15 sit alongside vector registers v0-v15, each holding elements [0] … [VLRMAX-1], plus a vector length register (VLR). A vector arithmetic instruction such as ADDV v3, v1, v2 adds elements [0] … [VLR-1] of v1 and v2 pairwise into v3. A vector load/store such as LV v1, r1, r2 moves a vector between memory and a vector register using base address r1 and stride r2.]
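In C terms, a vector machine executes a strip-mined loop like the sketch below: the outer loop sets VLR to min(remaining, MVL), and each inner loop corresponds to one LV/ADDV/SV sequence (the MVL value and names are illustrative):

#define MVL 64   /* assumed maximum vector length (VLRMAX) */

void vadd(double *c, const double *a, const double *b, long n) {
    for (long i = 0; i < n; i += MVL) {
        long vlr = (n - i < MVL) ? (n - i) : MVL;  /* set VLR */
        for (long j = 0; j < vlr; j++)             /* one ADDV instruction */
            c[i + j] = a[i + j] + b[i + j];
    }
}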
Vector Unit Structure
[Figure: vector unit structure. The vector registers and functional-unit pipelines are split into parallel lanes: lane 0 holds elements 0, 4, 8, …; lane 1 holds elements 1, 5, 9, …; lane 2 holds elements 2, 6, 10, …; lane 3 holds elements 3, 7, 11, …. Each lane has its own functional units and port to the memory subsystem.]
Vector Instruction Parallelism
Can overlap execution of multiple vector instructions
– example machine has 32 elements per vector register and 8 lanes
[Figure: overlapped execution of a load, a multiply, and an add vector instruction in the load, multiply, and add units; after the pipelines fill, each unit completes 8 element operations per cycle (one per lane) while instruction issue proceeds one instruction at a time.]

Complete 24 operations/cycle while issuing 1 short instruction/cycle.
Multithreading
How can we guarantee no dependencies between instructions in a pipeline?
– One way is to interleave execution of instructions from different program threads on the same pipeline

Interleave 4 threads, T1-T4, on a non-bypassed 5-stage pipe:

                       t0 t1 t2 t3 t4 t5 t6 t7 t8 t9
T1: LW r1, 0(r2)       F  D  X  M  W
T2: ADD r7, r1, r4        F  D  X  M  W
T3: XORI r5, r4, #12         F  D  X  M  W
T4: SW 0(r7), r5                F  D  X  M  W
T1: LW r5, 12(r1)                  F  D  X  M  W

The prior instruction in a thread always completes write-back before the next instruction in the same thread reads the register file.
Multithreaded Categories

[Figure: issue slots vs. time (processor cycles) for superscalar, fine-grained multithreading, coarse-grained multithreading, multiprocessing, and simultaneous multithreading, with each slot colored by thread (threads 1-5) or left as an idle slot.]
SMT in Power 5

[Figure: Power 4 vs. Power 5 pipelines. Power 5 adds SMT: 2 fetch (PC) and 2 initial decode paths, and 2 commits (two architected register sets).]
A Producer-Consumer Example
[Figure: a producer and a consumer share a circular buffer in memory, with head and tail pointers.]

Producer posting Item x:
  Load Rtail, (tail)
  Store (Rtail), x
  Rtail = Rtail + 1
  Store (tail), Rtail

Consumer:
  Load Rhead, (head)
spin:
  Load Rtail, (tail)
  if Rhead == Rtail goto spin
  Load R, (Rhead)
  Rhead = Rhead + 1
  Store (head), Rhead
  process(R)

The program is written assuming instructions are executed in order.
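A hedged C11 rendering of the same queue: sequentially consistent atomics make the hardware honor the program order the assembly version silently assumes. The fixed queue size, single producer, and single consumer are assumptions of this sketch:

#include <stdatomic.h>

#define QSIZE 1024
static int buffer[QSIZE];
static _Atomic long head, tail;     /* indices, only ever incremented */

/* Producer posting item x (single producer assumed, no fullness check,
   matching the slide's version). */
void produce(int x) {
    long t = atomic_load(&tail);
    buffer[t % QSIZE] = x;          /* Store (Rtail), x    */
    atomic_store(&tail, t + 1);     /* Store (tail), Rtail */
}

/* Consumer (single consumer assumed): spin until an item is available. */
int consume(void) {
    long h = atomic_load(&head);
    while (atomic_load(&tail) == h) /* spin: Load Rtail, (tail) */
        ;
    int x = buffer[h % QSIZE];      /* Load R, (Rhead)     */
    atomic_store(&head, h + 1);     /* Store (head), Rhead */
    return x;
}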
Sequential Consistency
A Memory Model
[Figure: several processors P sharing a single memory M.]

"A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program."
— Leslie Lamport

Sequential Consistency = arbitrary order-preserving interleaving of memory references of sequential programs
Sequential Consistency
Sequential consistency imposes more memory ordering constraints than those imposed by uniprocessor program dependencies. What are these in our example?

T1:
  Store (X), 1     (X = 1)
  Store (Y), 11    (Y = 11)

T2:
  Load R1, (Y)
  Store (Y'), R1   (Y' = Y)
  Load R2, (X)
  Store (X'), R2   (X' = X)

Additional SC requirements: the operations to different addresses must also stay in program order, e.g., T1's two stores must become visible in that order, and T2's two loads must occur in program order, even though no data dependence enforces either ordering.
Mutual Exclusion and Locks
Want to guarantee only one process is active in a critical section:

• Blocking atomic read-modify-write instructions
  e.g., Test&Set, Fetch&Add, Swap
vs.
• Non-blocking atomic read-modify-write instructions
  e.g., Compare&Swap, Load-reserve/Store-conditional
vs.
• Protocols based on ordinary Loads and Stores
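As an example of the first category, a spinlock built from a blocking atomic read-modify-write; this sketch uses C11 atomic_exchange to play the role of Test&Set:

#include <stdatomic.h>

typedef struct { atomic_int locked; } spinlock_t;  /* 0 = free, 1 = held */

void lock(spinlock_t *l) {
    /* Atomically swap in 1; if the old value was already 1, someone else
       holds the lock, so spin and retry. */
    while (atomic_exchange(&l->locked, 1) == 1)
        ;
}

void unlock(spinlock_t *l) {
    /* Release: the seq_cst store also makes the critical section's
       writes visible before the lock appears free. */
    atomic_store(&l->locked, 0);
}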
Snoopy Cache Protocols
[Figure: processors M1, M2, M3, each with a snoopy cache, share a memory bus with physical memory and a DMA engine connected to disks.]

Use the snoopy mechanism to keep all processors' view of memory coherent.
MESI: An Enhanced MSI protocol
increased performance for private data

Each cache line has a tag (address tag + state bits):

  M: Modified Exclusive
  E: Exclusive, unmodified
  S: Shared
  I: Invalid

[Figure: MESI state diagram for a line in processor P1's cache:
  I → E on a read miss when no other cache holds the line (read miss, not shared)
  I → S on a read miss when the line is shared (read miss, shared)
  I → M on a write miss
  E → M on a P1 write (no bus traffic needed)
  M → S and E → S when another processor reads the line (from M, P1 writes back)
  M/E/S → I on another processor's intent to write (from M, P1 writes back)
  P1 reads or writes in M, and P1 reads in E and S, hit with no state change.]
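A hedged C sketch of the next-state function for a line in P1's cache; the event names are illustrative, and bus actions (write-backs, intent-to-write broadcasts) are noted only in comments:

typedef enum { I, S, E, M } mesi_t;
typedef enum {
    P1_READ_NOT_SHARED, P1_READ_SHARED,  /* read miss, line (not) elsewhere */
    P1_READ_HIT, P1_WRITE,
    OTHER_READ, OTHER_INTENT_TO_WRITE
} event_t;

mesi_t next_state(mesi_t s, event_t e) {
    switch (e) {
    case P1_READ_NOT_SHARED: return (s == I) ? E : s;
    case P1_READ_SHARED:     return (s == I) ? S : s;
    case P1_READ_HIT:        return s;           /* M/E/S hit unchanged */
    case P1_WRITE:           return M;           /* from S or I, a bus
                                                    intent-to-write goes
                                                    out first */
    case OTHER_READ:         return (s == M || s == E) ? S : s;
                                                 /* from M, also write back */
    case OTHER_INTENT_TO_WRITE: return I;        /* from M, write back first */
    }
    return s;
}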
Basic Operation of Directory
[Figure: k processors, each with a cache, connected through an interconnection network to memory and its directory.]

• k processors
• With each cache block in memory: k presence bits, 1 dirty bit
• With each cache block in a cache: 1 valid bit and 1 dirty (owner) bit

• Read from main memory by processor i:
  – If dirty bit OFF then { read from main memory; turn p[i] ON; }
  – If dirty bit ON then { recall line from the dirty processor (its cache state goes to shared); update memory; turn dirty bit OFF; turn p[i] ON; supply recalled data to i; }
• Write to main memory by processor i:
  – If dirty bit OFF then { send invalidations to all caches that have the block; turn dirty bit ON; supply data to i; turn p[i] ON; ... }
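The read bullets translate almost line-for-line into code; a sketch with an assumed processor count and stub network/memory actions:

#include <stdio.h>

enum { K = 4 };                        /* assumed processor count */
typedef struct { int dirty; int p[K]; } dir_entry_t;

/* Stand-ins for the real network/memory actions. */
static void recall_and_update_memory(int owner) {
    printf("recall line from P%d, write back to memory\n", owner);
}
static void supply_data(int i) {
    printf("supply data to P%d\n", i);
}
static int find_owner(const dir_entry_t *d) {
    for (int j = 0; j < K; j++)
        if (d->p[j]) return j;         /* when dirty, exactly one bit is set */
    return -1;
}

/* Read from main memory by processor i (follows the bullets above). */
void dir_read(dir_entry_t *d, int i) {
    if (d->dirty) {
        int owner = find_owner(d);
        recall_and_update_memory(owner);  /* owner's cache state -> shared */
        d->dirty = 0;                     /* turn dirty bit OFF */
    }
    d->p[i] = 1;                          /* turn p[i] ON */
    supply_data(i);
}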
Directory Cache Protocol
(Handout 6)
[Figure: six CPUs, each with a cache, connected through an interconnection network to four directory controllers, each fronting a DRAM bank.]

• Assumptions: reliable network, FIFO message delivery between any given source-destination pair
Performance of Symmetric Shared-Memory
Multiprocessors
Cache performance is a combination of:
1. Uniprocessor cache miss traffic
2. Traffic caused by communication
   – Results in invalidations and subsequent cache misses
• Adds a 4th C: coherence miss
  – Joins Compulsory, Capacity, Conflict
  – (Sometimes called a Communication miss)
Intel “Nehalem”
(2008)
• 2-8 cores
• SMT (2 threads/core)
• Private L2$/core
• Shared L3$
• Initially in 45nm
Related Courses
[Figure: course relationships.]

• CS 61C (strong prerequisite): basic computer organization, first look at pipelines + caches
• CS 152: computer architecture, first look at parallel architectures
• CS 252: graduate computer architecture, advanced topics
• CS 258: parallel architectures, languages, systems
• CS 150: digital logic design
• CS 194-6: new FPGA-based architecture lab class
Advice: Get involved in research
E.g.,
• RADLab - data center
• ParLab - parallel clients
• Undergrad research experience is the most important part of an application to top grad schools.
End of CS152
• Thanks for being such patient guinea pigs!
– Hopefully your pain will help future generations of CS152 students
Acknowledgements
• These slides contain material developed and copyright by:
  – Arvind (MIT)
  – Krste Asanovic (MIT/UCB)
  – Joel Emer (Intel/MIT)
  – James Hoe (CMU)
  – John Kubiatowicz (UCB)
  – David Patterson (UCB)
• MIT material derived from course 6.823
• UCB material derived from course CS252