CMPT 334 Computer Organization
Instructor: Tina Tian
General Information
• Email: tina.tian@manhattan.edu
• Office: RLC 203A
• Office Hours: Monday, Thursday 1:30 – 2:30 PM
Wednesday 12:00 – 1:00 PM
or by appointment
• Website: home.manhattan.edu/~tina.tian
About the Course
• Mon, Thur 3:00–4:15PM
• Textbook:
▫ David A. Patterson and John L. Hennessy, "Computer Organization and Design: The Hardware/Software Interface"
▫ 5th edition
▫ or 4th edition revised printing
About the Authors
• David A. Patterson
▫ pioneer of Reduced
Instruction Set Computer
(RISC)
• John L. Hennessy
▫ founder of MIPS
Computer Systems Inc
Grading
• 1st Midterm Exam (5th week): 15%
• 2nd Midterm Exam (10th week): 15%
• Final Exam: 30%
• Homework: 40%
Homework
• Homework assignments involve
▫ written problems
▫ assembly language programming
▫ logic design using tools
• Late work has points deducted.
• Homework is not accepted more than a week after the deadline.
Homework Policy
• You may discuss the homework with other
students.
• However, you must acknowledge the people you
worked with.
• And you must independently write up your own
solutions.
• Any written sources used (apart from the text)
must also be acknowledged.
What you will learn
• A study of the internal architecture of a computer:
▫ Digital logic
▫ Machine-level representation of data
▫ Assembly-level machine organization
▫ Memory system organization and architecture
▫ What determines program performance and how it can be improved
▫ ...
Software
• Assembly language simulator
▫ SPIM
▫ MARS assembler and runtime simulator
• Digital logic simulator
▫ Logisim
Advice
• Take notes
• Do NOT check emails, Facebook, etc.
▫ Turn the monitors off
• Don’t use the printer in class.
▫ Print after class, no points off
• Start the homework early
Chapter 1
Computer Abstractions
and Technology
[Adapted from Computer Organization and Design 5th Edition,
Patterson & Hennessy, © 2014, MK]
Classes of Computers
• Personal computers
▫ General purpose, variety of software
▫ Subject to cost/performance tradeoff
• Server computers
▫ Network based
▫ High capacity, performance, reliability
▫ Range from small servers to building sized
Classes of Computers
• Supercomputers
▫ High-end scientific and engineering calculations
▫ Hundreds to thousands of processors, terabytes of
memory and petabytes of storage
• Embedded computers
▫ Hidden as components of systems
▫ Stringent power/performance/cost constraints
Review: Some Basic Definitions
• Kilobyte – 2^10 or 1,024 bytes
• Megabyte – 2^20 or 1,048,576 bytes
▫ sometimes "rounded" to 10^6 or 1,000,000 bytes
• Gigabyte – 2^30 or 1,073,741,824 bytes
▫ sometimes rounded to 10^9 or 1,000,000,000 bytes
• Terabyte – 2^40 or 1,099,511,627,776 bytes
▫ sometimes rounded to 10^12 or 1,000,000,000,000 bytes
• Petabyte – 2^50 or 1,024 terabytes
▫ sometimes rounded to 10^15 or 1,000,000,000,000,000 bytes
• Exabyte – 2^60 or 1,024 petabytes
▫ sometimes rounded to 10^18 or 1,000,000,000,000,000,000 bytes
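As a quick check of the gap between the binary sizes and their decimal "roundings" above, here is a minimal C sketch (illustrative only, not part of the course materials):

#include <stdio.h>

int main(void) {
    /* Binary sizes from the definitions above vs. their decimal "roundings" */
    unsigned long long mega_bin = 1ULL << 20;            /* 2^20 bytes  */
    unsigned long long mega_dec = 1000000ULL;            /* 10^6 bytes  */
    unsigned long long tera_bin = 1ULL << 40;            /* 2^40 bytes  */
    unsigned long long tera_dec = 1000000000000ULL;      /* 10^12 bytes */

    printf("Megabyte: %llu vs %llu (%.1f%% larger)\n", mega_bin, mega_dec,
           100.0 * (mega_bin - mega_dec) / mega_dec);
    printf("Terabyte: %llu vs %llu (%.1f%% larger)\n", tera_bin, tera_dec,
           100.0 * (tera_bin - tera_dec) / tera_dec);
    return 0;
}

Note how the gap grows with the prefix: about 5% at the megabyte level and about 10% at the terabyte level.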
The PostPC Era
Embedded Processor Characteristics
The largest class of computers spanning the
widest range of applications and performance
▫ Often have minimum performance requirements.
▫ Often have stringent limitations on cost.
▫ Often have stringent limitations on power
consumption.
▫ Often have low tolerance for failure.
Below Your Program
• Application software
▫ Written in high-level language
• System software
▫ Compiler: translates HLL code to
machine code
▫ Operating System: service code
 Handling input/output
 Managing memory and storage
 Scheduling tasks & sharing resources
• Hardware
▫ Processor, memory, I/O controllers
Levels of Program Code
• High-level language
▫ Level of abstraction closer to
problem domain
▫ Provides for productivity
and portability
• Assembly language
▫ Textual representation of
instructions
• Hardware representation
▫ Binary digits (bits)
▫ Encoded instructions and
data
Under the Covers
• Five classic components of a computer – input, output, memory, datapath, and control
• datapath + control = processor (CPU)
Moore’s Law
• The number of transistors on integrated circuits
doubles every 18 – 24 months.
▫ Predicted by Gordon Moore, co-founder of Intel
Moore’s Law
• Moore predicted the growth rate of the number
of transistors per chip.
• A transistor is an on/off switch controlled by
electricity.
• The integrated circuit combines dozens to
hundreds of transistors into a single chip.
• Very large-scale integrated circuit (VLSI) –
millions of transistors
Technology Scaling Road Map
Year                                          2004   2006   2008   2010   2012   2014
Feature size (nm)                               90     65     45     32     22     16
Integration capacity (billions of transistors)   2      4      6     16     32     64
• Fun facts about 45nm transistors
▫ 30 million can fit on the head of a pin
▫ You could fit more than 2,000 across the width of a
human hair
▫ If car prices had fallen at the same rate as the price
of a single transistor has since 1968, a new car
today would cost about 1 cent
Another Example of Moore’s Law Impact
DRAM capacity growth over 3 decades
But What Happened to Clock Rates and Why?
• Clock rates hit a "power wall"
▫ We can't remove more heat
• How else can we improve performance?
A Sea Change is at Hand
• The power challenge has forced a change in the design
of microprocessors
▫ Since 2002 the rate of improvement in the response time
of programs on desktop computers has slowed from a
factor of 1.5 per year to less than a factor of 1.2 per year
• As of 2006 all desktop and server companies are shipping microprocessors with multiple processors – cores – per chip
▫ Plan of record is to double the number of cores per chip per generation (about every two years)
Uniprocessor Performance
Understanding Performance
• Algorithm
▫ Determines number of operations executed
• Programming language, compiler, architecture
▫ Determine number of machine instructions executed
per operation
• Processor and memory system
▫ Determine how fast instructions are executed
• I/O system (including OS)
▫ Determines how fast I/O operations are executed
Defining Performance
• Which airplane has the best performance?
[Figure: bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by passenger capacity, cruising range (miles), cruising speed (mph), and passengers × mph]
Throughput versus Response Time
• Response time (execution time) – the time between
the start and the completion of a task
▫ Important to individual users
• Throughput (bandwidth) – the total amount of work done in a given time
▫ Important to data center managers
• Will need different performance metrics as well as a different set of applications to benchmark embedded and desktop computers, which are more focused on response time, versus servers, which are more focused on throughput
Defining (Speed) Performance
• To maximize performance, need to minimize execution
time
performance_X = 1 / execution_time_X

If X is n times faster than Y, then
    performance_X / performance_Y = execution_time_Y / execution_time_X = n

• Decreasing response time almost always improves throughput
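The two definitions above are easy to mechanize. A minimal C sketch (helper names are just for illustration; the numbers match the Relative Performance Example that follows):

#include <stdio.h>

/* performance_X = 1 / execution_time_X */
double performance(double execution_time_s) {
    return 1.0 / execution_time_s;
}

/* "X is n times faster than Y":  n = performance_X / performance_Y
                                    = execution_time_Y / execution_time_X */
double times_faster(double time_x_s, double time_y_s) {
    return time_y_s / time_x_s;
}

int main(void) {
    /* A runs the program in 10 s, B in 15 s */
    printf("performance_A = %.2f, performance_B = %.2f\n",
           performance(10.0), performance(15.0));
    printf("A is %.1f times faster than B\n", times_faster(10.0, 15.0));
    return 0;
}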
Relative Performance Example
• If computer A runs a program in 10 seconds and
computer B runs the same program in 15 seconds,
how much faster is A than B?
We know that A is n times faster than B if
    performance_A / performance_B = execution_time_B / execution_time_A = n
The performance ratio is 15 / 10 = 1.5
So A is 1.5 times faster than B
Measuring Execution Time
• Elapsed time
▫ Total response time, including all aspects
 Processing, I/O, OS overhead, idle time
▫ Determines system performance
• CPU time
▫ Time spent processing a given job
 Discounts I/O time, other jobs’ shares
▫ Comprises user CPU time and system CPU time
▫ Different programs are affected differently by CPU
and system performance
CPU Clocking
• Operation of digital hardware governed by a
constant-rate clock
[Figure: clock cycles – clock period; data transfer and computation, then update state, within each cycle]
• Clock period: duration of a clock cycle
▫ e.g., 250 ps = 0.25 ns = 250×10^–12 s
• Clock frequency (rate): cycles per second
▫ e.g., 4.0 GHz = 4000 MHz = 4.0×10^9 Hz
CPU Time
CPU Time = CPU Clock Cycles × Clock Cycle Time
         = CPU Clock Cycles / Clock Rate
• Performance improved by
▫ Reducing number of clock cycles
▫ Increasing clock rate
▫ Hardware designer must often trade off clock rate
against cycle count
Improving Performance Example
• A program runs on computer A with a 2 GHz clock in 10
seconds. What clock rate must computer B run at to run
this program in 6 seconds? Unfortunately, to
accomplish this, computer B will require 1.2 times as
many clock cycles as computer A to run the program.
CPU time_A = CPU clock cycles_A / clock rate_A
CPU clock cycles_A = 10 sec × 2×10^9 cycles/sec = 20×10^9 cycles

CPU time_B = 6 sec = 1.2 × 20×10^9 cycles / clock rate_B
clock rate_B = 1.2 × 20×10^9 cycles / 6 seconds = 4 GHz
Clock Cycles per Instruction
• Not all instructions take the same amount of time to
execute
▫ One way to think about execution time is that it equals the
number of instructions executed multiplied by the average
time per instruction
# CPU clock cycles for a program = # Instructions for a program × Average clock cycles per instruction

• Clock cycles per instruction (CPI) – the average number of clock cycles each instruction takes to execute
• A way to compare two different implementations of the same ISA

Instruction class                 A   B   C
CPI for this instruction class    1   2   3
Instruction Count and CPI
Clock Cycles = Instruction Count × Cycles per Instruction
CPU Time = Instruction Count × CPI × Clock Cycle Time
         = Instruction Count × CPI / Clock Rate
• Instruction Count for a program
▫ Determined by program, ISA and compiler
• Average cycles per instruction
▫ Determined by CPU hardware
▫ If different instructions have different CPI
 Average CPI affected by instruction mix
Using the Performance Equation
• Computers A and B implement the same ISA. Computer
A has a clock cycle time of 250 ps and an effective CPI of
2.0 for some program and computer B has a clock cycle
time of 500 ps and an effective CPI of 1.2 for the same
program. Which computer is faster and by how much?
Each computer executes the same number of instructions, I,
so
CPU time_A = I × 2.0 × 250 ps = 500 × I ps
CPU time_B = I × 1.2 × 500 ps = 600 × I ps
Clearly, A is faster … by the ratio of execution times
    performance_A / performance_B = execution_time_B / execution_time_A = 600 × I ps / 500 × I ps = 1.2
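As a cross-check of the example above, a small C sketch of CPU Time = Instruction Count × CPI / Clock Rate (function and variable names are illustrative only):

#include <stdio.h>

/* CPU Time = Instruction Count x CPI x Clock Cycle Time
            = Instruction Count x CPI / Clock Rate        */
double cpu_time_seconds(double instruction_count, double cpi, double clock_rate_hz) {
    return instruction_count * cpi / clock_rate_hz;
}

int main(void) {
    double I  = 1.0e9;                              /* any common instruction count I */
    double tA = cpu_time_seconds(I, 2.0, 4.0e9);    /* 250 ps cycle time -> 4 GHz */
    double tB = cpu_time_seconds(I, 1.2, 2.0e9);    /* 500 ps cycle time -> 2 GHz */
    printf("CPU time A = %.2f s, CPU time B = %.2f s\n", tA, tB);
    printf("A is %.1f times faster than B\n", tB / tA);   /* 600/500 = 1.2 */
    return 0;
}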
THE Performance Equation
• Our basic performance equation is then
CPU time = Instruction_count × CPI × clock_cycle
or
CPU time = Instruction_count × CPI / clock_rate

These equations separate the three key factors that affect performance:
• Can measure the CPU execution time by running the program
• The clock rate is usually given
• Can measure overall instruction count by using profilers/simulators without knowing all of the implementation details
• CPI varies by instruction type and ISA implementation, for which we must know the implementation details
Determinants of CPU Performance
CPU time = Instruction_count × CPI × clock_cycle

                       Instruction_count   CPI   clock_cycle
Algorithm                      X            X
Programming language           X            X
Compiler                       X            X
ISA                            X            X         X
Example
• A given application written in Java runs in 15 seconds on a desktop processor. A new Java compiler is released that requires only 0.6 times as many instructions as the old compiler. Unfortunately, it increases the CPI by a factor of 1.1. How fast can we expect the application to run using this new compiler?
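One way to work this out with the performance equation, assuming the clock rate does not change: the instruction count scales by 0.6 and the CPI by 1.1, so new CPU time ≈ 15 s × 0.6 × 1.1 = 9.9 s, i.e., roughly 1.5 times faster than before.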
Effective (Average) CPI
• Computing the overall effective CPI is done by
looking at the different types of instructions and
their individual cycle counts and averaging
Overall effective CPI = Σ (CPI_i × IC_i),  for i = 1 to n

• where IC_i is the count (percentage) of the number of instructions of class i executed
• CPI_i is the (average) number of clock cycles per instruction for that instruction class
• n is the number of instruction classes
• The overall effective CPI varies by instruction mix – a measure of the dynamic frequency of instructions across one or many programs
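A minimal C sketch of the weighted-average CPI above; the sample data matches the class mix used in the CPI Example on the next slide:

#include <stdio.h>

/* Overall effective CPI = sum over classes of (CPI_i x IC_i) / total IC
   (equivalently, total clock cycles / total instruction count)          */
double effective_cpi(const double cpi[], const double ic[], int n) {
    double cycles = 0.0, instructions = 0.0;
    for (int i = 0; i < n; i++) {
        cycles       += cpi[i] * ic[i];
        instructions += ic[i];
    }
    return cycles / instructions;
}

int main(void) {
    double cpi[]  = {1, 2, 3};    /* classes A, B, C                  */
    double seq1[] = {2, 1, 2};    /* instruction counts in sequence 1 */
    double seq2[] = {4, 1, 1};    /* instruction counts in sequence 2 */
    printf("Sequence 1: CPI = %.1f\n", effective_cpi(cpi, seq1, 3));   /* 2.0 */
    printf("Sequence 2: CPI = %.1f\n", effective_cpi(cpi, seq2, 3));   /* 1.5 */
    return 0;
}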
CPI Example
• Alternative compiled code sequences using
instructions in classes A, B, C

Class              A   B   C
CPI for class      1   2   3
IC in sequence 1   2   1   2
IC in sequence 2   4   1   1

Sequence 1: IC = 5
• Clock Cycles = 2×1 + 1×2 + 2×3 = 10
• Avg. CPI = 10/5 = 2.0

Sequence 2: IC = 6
• Clock Cycles = 4×1 + 1×2 + 1×3 = 9
• Avg. CPI = 9/6 = 1.5
Turbo Mode
• Clock cycle time has traditionally been fixed.
• Today’s processors can vary their clock rates.
▫ Intel Core i7 will temporarily increase clock rate
by about 10% until the chip gets too warm.
Amdahl’s Law
• A common pitfall is improving one aspect of a computer and expecting a proportional improvement in overall performance.
T_improved = T_affected / improvement factor + T_unaffected

• Example: multiply operations account for 80 s of a 100 s run
▫ How much improvement in multiply performance is needed to get a 5× overall speedup?
    20 = 80/n + 20
▫ Can't be done!
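A small C sketch of Amdahl's Law as written above, replaying the multiply example (80 s of a 100 s run); no matter how large the improvement factor, the total time never drops below the 20 s that is unaffected:

#include <stdio.h>

/* T_improved = T_affected / improvement_factor + T_unaffected */
double t_improved(double t_affected, double t_unaffected, double factor) {
    return t_affected / factor + t_unaffected;
}

int main(void) {
    double t_multiply = 80.0, t_rest = 20.0;   /* multiply: 80 s of a 100 s run */
    for (double n = 2.0; n <= 1024.0; n *= 4.0) {
        double t = t_improved(t_multiply, t_rest, n);
        printf("multiply %5.0fx faster -> %.2f s total (overall speedup %.2fx)\n",
               n, t, 100.0 / t);
    }
    /* The overall speedup approaches but never reaches 100/20 = 5x. */
    return 0;
}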
Corollary: make the common case fast
• Amdah’s Law, together with the CPU
performance equation, is a handy tool for
evaluation potential enhancements.
T_improved = T_affected / improvement factor + T_unaffected

Clock Cycles = Instruction Count × Cycles per Instruction
CPU Time = Instruction Count × CPI × Clock Cycle Time
         = Instruction Count × CPI / Clock Rate
MIPS as a Performance Metric
• MIPS: Millions of Instructions Per Second
▫ Faster computers have a higher MIPS rating.
▫ Doesn’t account for
 Differences in ISAs between computers
 Differences in complexity between instructions
MIPS = Instruction count / (Execution time × 10^6)
     = Instruction count / ((Instruction count × CPI / Clock rate) × 10^6)
     = Clock rate / (CPI × 10^6)

• We cannot compare computers with different instruction sets using MIPS.
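A minimal C sketch of the MIPS formula above (the helper name is illustrative); the numbers are taken from the example on the next slide:

#include <stdio.h>

/* MIPS = Instruction count / (Execution time x 10^6) = Clock rate / (CPI x 10^6) */
double mips(double clock_rate_hz, double cpi) {
    return clock_rate_hz / (cpi * 1.0e6);
}

int main(void) {
    /* Both machines run at 4 GHz */
    printf("Computer A (CPI 1.0): %.0f MIPS\n", mips(4.0e9, 1.0));
    printf("Computer B (CPI 1.1): %.0f MIPS\n", mips(4.0e9, 1.1));
    return 0;
}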
Example
• Which computer has the higher MIPS rating?
• Which computer is faster?
                    Computer A    Computer B
Instruction count   10 billion    8 billion
Clock rate          4 GHz         4 GHz
CPI                 1.0           1.1
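One way to check, using the formulas above: A has the higher MIPS rating (4×10^9 / (1.0 × 10^6) = 4000 MIPS versus about 3636 MIPS for B), yet B is faster (CPU time = IC × CPI / clock rate gives 10×10^9 × 1.0 / 4×10^9 = 2.5 s for A and 8×10^9 × 1.1 / 4×10^9 = 2.2 s for B), which is exactly the kind of mismatch MIPS can hide.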