Lecture 1

ECE 5367
4436
Introduction to Computer Architecture and Design
Ji Chen
Section: T TH 1:00PM – 2:30PM
Prerequisites: ECE 4436
Instructor:
Ji Chen
Email: jchen18@uh.edu
Tel: (713)-743-4423
Office: W328
Office Hours: T TH 2:30-3:30 or by appointment
TA:
None
Course Contents
1. Introduction, basic computer organization
2. Instruction formats, instruction sets and their design
3. ALU design: Adders, subtracters, logic operations
4. Multiplication, division, floating point arithmetic
5. Datapath design
6. Control design: Hardwired control, microprogrammed control
7. Pipelining
8. Memory systems
9. I/O
Web: http://www.egr.uh.edu/courses/ece/ECE5367/
Grading
HW/Quiz/Lab   10 %
Project       15 %
Exam 1        25 %
Exam 2        25 %
Exam 3        25 %
Academic Honesty Statement
Computer Organization and Design: The Hardware/Software Interface
by David A. Patterson, John L. Hennessy, 3rd edition
Required
NOT REQUIRED
Homework/quiz:
There will be several graded homework/lab assignments. Homework turned in late will be accepted only under extraordinary circumstances.
Labs:
Laboratory assignments may be worked in teams of two (2); however, there should be no collaboration between teams.
Lab assignments turned in late will be penalized 25 points for each calendar day.
Both students in a team will receive the same grade for the project.
Projects:
Teams of four (4): describe the computer architecture of a modern technology.
Exams:
Two mid-term exams and one final exam.
A missed exam will result in a grade of zero. Let me know immediately if you have any situation.
Final Exam - TBD
Grading:
Your final grade will be computed as follows:
HW/Quiz/Lab   10 %
Project       15 %
Exam 1        25 %
Exam 2        25 %
Exam 3        25 %
• Since 1946 all computers have had 5 components
– Processor: Control and Datapath
– Memory
– Input
– Output
• TI SuperSPARC(tm) TMS390Z50 in Sun SPARCstation 20
[Figure: system block diagram. Labeled blocks: MBus Module containing the SuperSPARC processor (Floating-point Unit, Integer Unit, Inst Cache, Data Cache, Ref MMU, Store Buffer, Bus Interface), L2 $ and CC; Message Bus (MBus); L64852 MBus control and M-S Adapter; DRAM Controller; SBus with SBus DMA and SBus Cards; SCSI, Ethernet; STDIO (serial, kbd, mouse, audio, RTC, Floppy).]
Computer Architecture
[Figure: levels of abstraction, top to bottom: Application; Operating System; Compiler and Firmware; Instruction Set Architecture (Instr. Set Proc., I/O system); Datapath & Control; Digital Design; Circuit Design; Layout.]
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation
Forces on Computer Architecture
[Figure: Computer Architecture at the center, surrounded by the labels Technology, Programming Languages, Applications, Operating Systems, History, and Cleverness.]
Mixed-Signal
Where are We Going??
[Figure: course roadmap with five panels: Arithmetic (multiplier datapath with Booth Encoder, Multiplicand Register, 34-bit ALU, HI and LO registers, and Control Logic); Single/multicycle Datapaths; Pipelining (overlapped IFetch, Dcd, Exec, Mem, WB stages); Memory Systems; I/O.]
[Plot: “Moore’s Law” and the Processor-Memory Performance Gap (grows 50% / year). Axes: Performance (log scale, 1 to 1000) vs. Time (1980 to 2000). Curves: µProc CPU, 60%/yr. (2X/1.5 yr); DRAM, 9%/yr. (2X/10 yrs).]
ECE 5367, Spring 08

• Purchasing perspective
  – Given a collection of machines, which has the
    • Best performance?
    • Least cost?
    • Best performance / cost?
• Design perspective
  – Faced with design options, which has the
    • Best performance improvement?
    • Least cost?
    • Best performance / cost?
• Both require
  – basis for comparison
  – metric for evaluation
• Our goal: understand cost & performance implications of architectural choices
Two Notions of “Performance”
Plane        DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747   6.5 hours     610 mph    470          286,700
Concorde     3 hours       1350 mph   132          178,200
Which has higher performance?
• Time to do the task (Execution Time)
– execution time, response time, latency
• Tasks per day, hour, week, sec, ns, ... (Performance)
– throughput, bandwidth
Response time and throughput often are in opposition
Definitions
• Performance is in units of things-per-second
– bigger is better
• If we are primarily concerned with response time
– $\text{performance}(x) = \dfrac{1}{\text{execution\_time}(x)}$
"X is n times faster than Y" means
$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)}$$
Example
• Time of Concorde vs. Boeing 747?
  – Concorde is 1350 mph / 610 mph = 6.5 hours / 3 hours = 2.2 times faster
• Throughput of Concorde vs. Boeing 747?
  – Concorde is 178,200 pmph / 286,700 pmph = 0.62 “times faster”
  – Boeing is 286,700 pmph / 178,200 pmph = 1.60 “times faster”
• Boeing is 1.6 times (“60%”) faster in terms of throughput
• Concorde is 2.2 times (“120%”) faster in terms of flying time
We will focus primarily on execution time for a single job
Lots of instructions in a program => Instruction throughput important!
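The ratios in this example can be reproduced with a few lines of Python. This is a minimal sketch: the figures come from the plane table in this lecture, while the dictionary layout and function names are illustrative choices and not part of the course material.

```python
# Figures taken from the Boeing 747 / Concorde table in this lecture.
PLANES = {
    "Boeing 747": {"hours": 6.5, "mph": 610, "passengers": 470},
    "Concorde": {"hours": 3.0, "mph": 1350, "passengers": 132},
}


def throughput_pmph(plane):
    """Passenger-miles per hour: passengers carried times cruising speed."""
    return plane["passengers"] * plane["mph"]


def times_faster_by_time(x, y):
    """n = Performance(X) / Performance(Y) = time(Y) / time(X)."""
    return y["hours"] / x["hours"]


def times_faster_by_throughput(x, y):
    """Same ratio, but with throughput as the performance metric."""
    return throughput_pmph(x) / throughput_pmph(y)


concorde = PLANES["Concorde"]
boeing = PLANES["Boeing 747"]

print(f"Concorde vs 747, flying time: {times_faster_by_time(concorde, boeing):.2f}x")
print(f"747 vs Concorde, throughput:  {times_faster_by_throughput(boeing, concorde):.2f}x")
```

The two metrics rank the planes in opposite order, which is the point of the slide: which machine is "faster" depends on whether response time or throughput is being measured.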
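CPU Performance
$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$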
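As a quick illustration of how the three factors combine, the sketch below evaluates the product for one made-up workload. The function name and the sample numbers (10^9 instructions, a CPI of 2.2, a 500 MHz clock) are assumptions chosen only for the example.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


# Hypothetical workload: 1e9 instructions, average CPI of 2.2, 500 MHz clock.
example = cpu_time_seconds(instruction_count=1e9, cpi=2.2, clock_rate_hz=500e6)
print(f"CPU time: {example:.2f} s")
```

Halving any one factor (fewer instructions, lower CPI, or a shorter cycle) halves the CPU time, provided the other two factors stay the same.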
Amdahl's Law
Speedup due to enhancement E:
$$\text{Speedup}(E) = \frac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \frac{\text{Performance w/ } E}{\text{Performance w/o } E}$$
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:
$$\text{ExTime}(\text{with } E) = \left((1-F) + \frac{F}{S}\right) \times \text{ExTime}(\text{without } E)$$
$$\text{Speedup}(\text{with } E) = \frac{1}{(1-F) + F/S}$$
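The speedup formula translates directly into code. The following is a minimal sketch; the function name and the sample values of F and S are illustrative assumptions, not taken from the lecture.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("F must lie between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("S must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical case: 40% of the task is accelerated by a factor of 10.
print(f"F=0.4, S=10   -> {amdahl_speedup(0.4, 10):.4f}")

# Even an enormous S cannot push the overall speedup past 1 / (1 - F).
print(f"F=0.4, S=1e12 -> {amdahl_speedup(0.4, 1e12):.4f}")
```

The second call shows the practical lesson of the law: the unaffected fraction (1 - F) bounds the achievable speedup no matter how large S becomes.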
Base Machine
Op       Freq            Cycles   CPI(i)   % Time
ALU      50%             1        .5       23%
Load     20%             5        1.0      45%
Store    10%             3        .3       14%
Branch   20%             2        .4       18%
         (Typical Mix)            2.2
How much faster would the machine be if a better data cache
reduced the average load time to 2 cycles?
How does this compare with using branch prediction to save a
cycle off the branch time?
What if two ALU instructions could be executed at once?
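One way to work through these three questions is to recompute the average CPI for each change and compare it with the base value of 2.2. The sketch below does that under stated assumptions: the instruction count and clock cycle time are unchanged in every case, and "two ALU instructions at once" is modeled as an effective ALU cost of 0.5 cycles. The names are illustrative.

```python
# (frequency, cycles) per instruction class, from the Base Machine table.
BASE_MIX = {
    "ALU": (0.50, 1),
    "Load": (0.20, 5),
    "Store": (0.10, 3),
    "Branch": (0.20, 2),
}


def average_cpi(mix):
    """Average CPI = sum over classes of frequency x cycles."""
    return sum(freq * cycles for freq, cycles in mix.values())


def speedup_with(mix, op, new_cycles):
    """Speedup from changing one class's cycle count (same IC and clock assumed)."""
    changed = dict(mix)
    freq, _ = changed[op]
    changed[op] = (freq, new_cycles)
    return average_cpi(mix) / average_cpi(changed)


print(f"Base CPI:                    {average_cpi(BASE_MIX):.2f}")
print(f"Loads in 2 cycles:           {speedup_with(BASE_MIX, 'Load', 2):.3f}x")
print(f"Branches in 1 cycle:         {speedup_with(BASE_MIX, 'Branch', 1):.3f}x")
print(f"Two ALU ops per cycle (0.5): {speedup_with(BASE_MIX, 'ALU', 0.5):.3f}x")
```

Under these assumptions the new CPIs are 1.6, 2.0, and 1.95, so the better data cache helps most (about 1.38 times faster), followed by dual ALU issue (about 1.13) and branch prediction (1.10). Loads win because they account for the largest share of execution time in the base machine.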