Survey of Computer Architecture

Computer Architecture
Part II-D: Survey of Processor
Architecture
Microprocessors in the Market
What’s the difference?
Areas of Development

Below are technologies which can be
improved in CPU design:







System bus speed
Internal and external clock frequency
Casing
Cooling system
Instruction set
Material used for the die
End result: enhance speed of the CPU and
the system in general
The System Bus

Conduit for moving data between the processor and
other system components
[Diagram: the CPU and its caches connect through the system bus to
memory, adapters, bus controllers, and I/O devices (disks, displays,
keyboards, networks)]
System Bus Speeds


Intel Core 2 Quad/Duo have
CPU clocks of 2.66/3 GHz with system
bus speeds of 1066/1333 MHz
AMD: 2nd Generation Opteron (dual
core) processor has clock speed of 1.8
GHz with a 1000 MHz system bus
Split Clock Frequency

Internal clock frequency
Speed of data processing inside the CPU

External clock frequency
Speed of data transfer to and from the
CPU via the system bus

Intel 486DX2 25/50 was first to use
clock doubling to implement split clock
system
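
The relationship between the two clocks can be sketched in a few lines. This is a minimal illustration, assuming the internal clock is simply the external clock times a fixed multiplier; the function name `internal_clock_mhz` is made up for this example.

```python
def internal_clock_mhz(external_mhz, multiplier):
    """Internal (core) clock = external (bus) clock x multiplier."""
    return external_mhz * multiplier

# 486DX2 25/50: 25 MHz external clock, doubled inside the CPU.
dx2_internal = internal_clock_mhz(25, 2)
print(dx2_internal)  # 50
```

The bus keeps running at the slower external rate; only the work done inside the CPU speeds up.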
The GHz Race in CPU Frequency



June 1999: API (Alpha Processor Inc.) demonstrated
a 1 GHz chip
March 2000: AMD released Athlon 1 GHz; within
days Intel released 1 GHz Pentium III
2002: AMD and Intel use 0.13 micron technology
Athlon XP 2200+ (June)
Pentium 4 2.53 GHz (May), mobile Pentium 4 2 GHz (June)

2004
Pentium 4: 3.6 GHz, 800 MHz system bus
AMD: 3200+, 2.2 GHz, 400 MHz: same as 2003 32-bit
CPUs → now concentrating on 64-bit

2005
Pentium 4: 3.73 – 3.8 GHz, 800/1066 MHz system bus
AMD: Same as 2004
Is Moore’s Law Dead?




Intel’s vision of a 10 GHz CPU cannot
be realized due to heat problems
Some have pushed speed limits
through high-end cooling systems
Both Intel and AMD are no longer
concentrating on clock speed as
the performance driver
SIA says “Moore’s Law is still going
strong after 40 years”
Micron Technology

A micron is 1 millionth of a meter
Human hair strand about 100 microns

Objective: thinner wires
Allow CPU to operate at lower voltage
Results in CPU generating less heat and
operating at higher speeds

Currently, processors are in the range
of 0.065 microns (65 nm)
Intel’s Roadmap: 45 → 15 nm
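
The unit arithmetic on this slide can be checked with a short sketch. It only uses figures stated above (0.065 microns, a 100-micron hair strand); the helper name `microns_to_nm` is an assumption for illustration.

```python
NM_PER_MICRON = 1000  # 1 micron = 1000 nanometers

def microns_to_nm(microns):
    """Convert a feature size from microns to nanometers."""
    return microns * NM_PER_MICRON

current_nm = microns_to_nm(0.065)
hair_ratio = 100 / 0.065  # hair width divided by wire width

print(round(current_nm))  # 65
print(round(hair_ratio))  # 1538
```

So a 65 nm wire is roughly 1,500 times thinner than a human hair.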
Micron Technology Through the Years

Processor                     Year       Micron
4004 (first microprocessor)   1971       10
8080                          1974       6
8086                          1978       3
486 Intel                     1989       1
486 AMD                       1990       0.8
Pentium classic               1993       0.8
IDT Winchip                   1997       0.35
Pentium MMX                   1997       0.25
AMD K6-II                     1997       0.25
PIII/Athlon/Itanium           2001       0.18
P4/Athlon XP                  2002       0.13
(not listed)                  2003       0.13/0.09
(not listed)                  2004 - 05  0.09
(not listed)                  2006 - 07  0.065
Thinner Wires = Increased Transistors
[Chart: transistor count in thousands (0 to 45,000) by year (1984 to
2001). Processors plotted: 8086/8088, 286, 386DX/386SX, 486SX/486DX,
486DX2/486DX4, Pentium and Cyrix, AMD K5 and MMX, AMD K6, Athlon 1.4 GHz,
Pentium 4. Labeled counts, in increasing order: 22,000; 128,000; 250,000;
1,200,000; 3,100,000; 8,800,000; 37,000,000; 42,000,000.]
The Switch to Copper


Aluminum limits making chips smaller
Copper is a good choice because it





is a better conductor
consumes less energy, and
takes up less space than aluminum
Copper allowed processors to boost
speeds to the GHz range
IBM pioneered the use of copper on
September 1, 1998 (IBM PowerPC
740/750)
PC on a Chip



Integrates a number of key
components into one chip
Result: The chip replaces
a dozen or so separate chips
(memory, FPU, graphics,
video, etc.)
Applications: PDAs,
cellphones, set-top boxes,
embedded processors, etc.
Impact of PC-on-a-Chip



Smaller and quieter desktops
Battery of devices lasts longer
because of the low power drain
Proliferation of information appliances
CPU Receptacle

ZIF


Zero Insertion Force
socket - type of socket
designed for easy
insertion of chips that
have high density of pins
Socket 7 - popular
implementation of ZIF
CPU Receptacle

Slot 1

Consists of receptacle on the motherboard
that holds an Intel Single Edge Contact
(SEC) cartridge

Cartridge may contain up to two CPUs and
an L2 cache (runs at half the speed of CPU)
and plugs into 242-pin receptacle

Started with Pentium II
CPU Receptacle

A Pentium II mounted on Slot 1
CPU Receptacle

Slot 2





An enhanced Slot 1
Uses 330-pin SEC
Holds up to four CPUs
L2 cache runs at full
processor speed
First used in Intel's
Pentium II Xeon
CPU Receptacle

AMD’s Slot A


Receptacle on motherboard for K7 CPU
Physically similar to Slot 1, but has
different electrical requirements
Casing: FC-PGA (Flip-Chip)
Traditional Wiring
Flip-Chip (IBM)
Advantages of FC-PGA



Greater # of I/O pins available
Shorter electrical connections
Better manufacturing efficiency
Casing: FC-LGA
Bottom view of LGA/BGA-based CPU
LGA Socket 775
Advantages of FC-LGA


Lower voltage used (less distance
traveled, reduced signal loss)
Less heat dissipation
Cache


Works as buffer between CPU and memory
Two types:


Internal
External
Levels of Cache



Level 1
Level 2
Level 3
Cache Placement


Intel used to have
external L2 cache
Pentium Pro


Internal but CPU and L2
cache are separate
Result: larger chip that
requires a larger socket
Overclocking


Going beyond recommended clock
frequency settings
3 methods of overclocking




System bus frequency
CPU frequency multiplier
Change both of the above
Some CPUs have locked frequencies
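
The three methods can be compared with a small calculation. This is a sketch only: it assumes CPU frequency equals system bus frequency times the multiplier, and the stock and overclocked values (100 MHz, 112 MHz, 4.0, 4.5) are hypothetical numbers chosen for illustration, not settings for any real CPU.

```python
def cpu_mhz(bus_mhz, multiplier):
    """CPU frequency = system bus frequency x CPU frequency multiplier."""
    return bus_mhz * multiplier

# Hypothetical stock settings.
STOCK_BUS, STOCK_MULT = 100, 4.0
# Hypothetical overclocked settings.
FAST_BUS, FAST_MULT = 112, 4.5

stock = cpu_mhz(STOCK_BUS, STOCK_MULT)         # recommended setting
method_1 = cpu_mhz(FAST_BUS, STOCK_MULT)       # raise the system bus frequency
method_2 = cpu_mhz(STOCK_BUS, FAST_MULT)       # raise the multiplier
method_3 = cpu_mhz(FAST_BUS, FAST_MULT)        # change both

print(stock, method_1, method_2, method_3)  # 400.0 448.0 450.0 504.0
```

Raising the bus frequency also speeds up everything attached to the bus, which is why memory may not cope. On a CPU with a locked multiplier, only the first method is available.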
Overclocking: How to...



Done through BIOS
program
Older systems require
motherboard jumpers
Some motherboards
(e.g. ASUS TX97)
contain jumper codes
Overclocking Issues



Heat!
Can main memory cope?
Will the software still work?
Cooling Systems



CPUs get hotter as
they get faster
Developed to keep
the CPU from
overheating
Sophisticated
cooling systems
allow more reliable
CPU operation
Liquid Nitrogen: Extremely Cool!
CPU: Pentium 4 (Northwood)
Date: Christmas 2003
The CPU Gets Watered Down
Multimedia Processing

Multimedia applications require geometric
transformation




Re-computation of location and size of an image
to determine new position
Deals with FP
FPU handles all real number computations
Drawing landscapes (e.g. games) involves
lots of computations and CPU may not
handle it as fast as the player could react
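
A geometric transformation of this kind can be sketched as follows. The example assumes the simplest case, a uniform scale followed by a translation of 2D points; the function name `transform` and the sample triangle are made up for illustration.

```python
def transform(points, scale, dx, dy):
    """Scale each (x, y) point, then move it by (dx, dy).

    Every coordinate is a real number, so each point costs
    several floating-point operations.
    """
    return [(x * scale + dx, y * scale + dy) for x, y in points]

triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
moved = transform(triangle, 1.5, 10.0, -2.0)
print(moved)  # [(10.0, -2.0), (16.0, -2.0), (10.0, 2.5)]
```

A game scene repeats this for thousands of points per frame, which is where the FPU workload comes from.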
Ways of Handling Multimedia




Speed up the CPU
Improve the CPU’s FPU by adding
more pipelines
Use high-end 3D graphics cards
Add new multimedia instructions
Multimedia Innovations in CPUs



MMX
3DNow!
SSE
MMX




Introduced in January 1997 in the Pentium MMX processor
Had 57 new instructions for 3D graphics
Introduced SIMD (Single Instruction Multiple
Data) instructions: technique that processes
more than one integer simultaneously
Problems:


Only works with integers
CPU can only work with either MMX or FPU, not
both simultaneously because they share registers
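
The SIMD idea can be imitated in plain Python. This is a sketch, not how the hardware is built: it assumes a 64-bit register split into four 16-bit integer lanes, and the helper names (`pack`, `unpack`, `packed_add`) are invented for this example.

```python
LANE_BITS = 16
LANES = 4                          # 4 lanes x 16 bits = one 64-bit register
LANE_MASK = (1 << LANE_BITS) - 1   # 0xFFFF

def pack(values):
    """Place four small integers side by side in one 64-bit integer."""
    reg = 0
    for i, v in enumerate(values):
        reg |= (v & LANE_MASK) << (i * LANE_BITS)
    return reg

def unpack(reg):
    """Split a 64-bit integer back into its four 16-bit lanes."""
    return [(reg >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]

def packed_add(a, b):
    """One 'instruction': add lane by lane, wrapping inside each lane."""
    return pack([(x + y) & LANE_MASK for x, y in zip(unpack(a), unpack(b))])

a = pack([1, 2, 3, 65535])
b = pack([10, 20, 30, 1])
print(unpack(packed_add(a, b)))  # [11, 22, 33, 0]
```

The last lane wraps around to 0 without disturbing its neighbors, and every lane holds an integer, which matches the first problem listed above.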
3DNow!


Introduced summer of 1998 in the
AMD K6-2
Characteristics



Supports SIMD instructions
Improved handling of floating-point numbers
Successful!


Integrated in Windows, games, and
drivers
Does not use the same registers
SSE


Introduced in Pentium III (Katmai) 500 MHz
as Intel’s response to 3DNow!
Characteristics




8 new 128-bit registers (can hold four 32-bit #s)
Has Streaming SIMD Extensions
50 new instructions that perform advanced
calculations on several FP numbers at once
with a single instruction
New Media Instructions designed for coding and
decoding MPEGs
Problems with SSE



Pipelines can only handle two 32-bit
numbers at a time
To take advantage of 128-bit registers,
FPU pipeline should have been
doubled (would have pushed back
release date of Katmai)
Potentially, it could have enhanced 3D
graphics since registers can handle
four 32-bit numbers at a time
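
The mismatch between register width and pipeline width comes down to simple arithmetic. The sketch below uses the figures from the SSE slides (128-bit registers, 32-bit numbers, two numbers per pipeline pass); the constant names are assumptions for illustration.

```python
import math
import struct

REGISTER_BITS = 128
ELEMENT_BITS = 32
PIPELINE_WIDTH = 2  # 32-bit numbers the pipeline handles at a time

lanes = REGISTER_BITS // ELEMENT_BITS        # numbers per register
passes = math.ceil(lanes / PIPELINE_WIDTH)   # pipeline passes per register

# Four 32-bit floats really do fill exactly one 128-bit register.
packed = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)

print(lanes, passes, len(packed) * 8)  # 4 2 128
```

With a doubled pipeline (`PIPELINE_WIDTH = 4`) the same register would go through in a single pass.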
SSE Enhancements

SSE2
Started in Pentium 4
Has 144 new instructions (since SSE)
Data width now 64 bits

SSE3
13 additional SIMD instructions (since SSE2)
New instructions primarily designed to improve
thread synchronization and specific application
areas such as media and gaming

Supplemental SSE3 (Core 2) → SSE4
Other CPU Innovations

Data width




Internal: How many
bits can the CPU
process
simultaneously?
External: How many
bits can the CPU
receive simultaneously
for processing
Superscalar
architecture
Superpipelined
architecture
Superscalar processing
Intel Corporation


Produced biggest impact on
microprocessor technology
Main line of business is CPU but also
has other hardware products (e.g.
motherboards)
Short History of Intel

1968: Birth of Intel
Started in memory business
First product was 64-bit memory
1970s: Increase in market share
Early 1980s: Japanese manufacturers eat up the
memory market with 16 - 256 KB chips
1984: Business slowing down → “Get us
out of memory!”
1986: Exited from memory due to success
of 80386
Intel Processor Time Line
1971: 4004
Intel’s first microprocessor
(108 kHz, 4 bit bus width)

1978: 8086
First 16-bit CPU from Intel

1979: 8088
Reengineered CPU to fit
existing 8-bit hardware

1982: 286
16-bit processor
Optimized instruction handling

1985: 386
First 32-bit CPU
(32-bit system bus)

1988: 386SX
Cheaper version
of the 386DX

1989: 486
Built in math co-processor
L1 cache on-chip
Intel Processor Time Line
486SX
Discount chip
No math co-processor

486DX4
Triple the clock speed
From 25 MHz to 75 MHz
33 MHz to 100 MHz

1993: Pentium Classic
Superscalar (5x 486DX-33 MHz)
Width of system bus: 64 bit
Speed of system bus: 60 to 66 MHz
Initially produced a lot of heat

Nov 1, 1995: Pentium Pro
RISC Processor
32 bit processing
L2 cache is built in

Jan 8, 1997: Pentium MMX
New set of instructions for multimedia
32 KB L1 cache

May 7, 1997: Pentium II (Klamath)
512 KB L2
L1 cache of 32 KB
Intel Processor Time Line
Jan 26, 1998: Deschutes
333 MHz
0.25 micron technology

1Q 1998: Celeron (Covington)
Pentium II without
the L2 cache

July 26, 1998: Pentium II Xeon
450 MHz
Custom SRAM
Different L2 caches: 512 KB, 1/2 MB
Can have 4 - 8 Xeons in one server

1998: Celeron (Mendocino)
333 MHz
128 KB L2 internal cache

1999: Pentium III (Katmai)
Enhanced MMX2 graphics
instructions

1999: Pentium III Xeon (Tanner)

2000: Pentium 4
7th Generation
0.18 micron technology

2001: Itanium (formerly Merced)
64-bit CPU
0.18 micron technology
> 25 million transistors

Core (2005)
Current Intel CPU Innovations


Hyperthreading
Multi-core


Core
Core 2 (64-bit architecture)
Intel’s First 64-Bit Chip (Server):
Itanium


Was known as IA-64 (but IA-32 compatible)
EPIC (Explicitly Parallel Instruction
Computing) processor




Enables up to 20 operations/clock cycle
Employs branch prediction and speculation
Three levels of cache: 2 MB / 4 MB L3
cache, 96K L2 cache, and 32K L1 cache
128 integer registers, 128 FP registers
Itanium 2





Available from 1 - 1.66 GHz
Internal L3 cache (1.5 MB, 3 MB, 4 MB, 6 MB, or 9 MB)
System bus: 400/533/667 MHz, 128-bits wide
0.13 microns, 592 million transistors
Next version (“Montecito”) has 1.72 billion transistors, 26 MB on-die cache, 90 nm
Current Intel CPU Lineup

Mobile
Centrino (Core and Core 2)

Desktop
Core 2 Extreme
Core 2 (now used in Apple Mac Mini)

Servers and workstations
Xeon (now used in Apple Mac Pro)
Itanium 2
AMD (Advanced Micro Devices)




Incorporated in May 1969
Challenging Intel even before
Pentium-class processors
Offered their own technology
and cannot be considered as
producing clones
Achieved increased market
sales starting with K6 and K6-II
AMD Series (From Pentium Class)

K5




Similar to the classic Pentiums
16 KB L1 cache and no MMX
Not very impressive but much cheaper than
similar Pentium models
K6



Technology brought in from NexGen; put AMD
back in business
32 KB L1 cache & MMX
Pentium compatible but performed better than
the Pentium MMX
AMD Series (From Pentium Class)

K6-II: Chomper
0.25 micron, system bus speed of 100 MHz
Introduced 3DNow!
Also MMX-compatible; really challenged the
Pentium II and led to low-cost Celeron

K6-III: Sharptooth
Three levels of cache: L1 and L2 are in CPU; L3
is on motherboard up to 1 MB; 133 MHz system bus
Was not as successful as the K6-II

K7: Athlon
Last of AMD’s 32-Bit Processors

Athlon XP



Intel played catch-up to the Athlon XP on many
occasions, but now stagnant in 32-bit computing
Model 3200+ has a 2.2 GHz CPU, 3 FP
pipelines, 128 KB of L1 cache, 512 KB L2
cache, system bus speed of 400 MHz, and 0.13
micron technology
Sempron


Counterpart of Intel Celeron
Model 3300+ has 2 GHz CPU, 754-pins, 90 nm
technology, 128 KB L2 cache
AMD’s 64-Bit Chips

Varieties:







Athlon 64 (desktop)
Turion 64 (mobile)
Opteron (servers or
workstations)
Provides seamless
transition to 64-bit
System bus runs at
processor speed through
on-chip memory controller
Leads the Itanium 2 on
many benchmarks
AMD formed a partnership
with Sun
Current AMD 64-Bit CPU Innovations



HyperTransport
Dual core
Direct Connect Architecture
Transmeta’s Crusoe Processor


Transmeta, co-founded by David Ditzel, with
Linus Torvalds on staff and Paul Allen as an
investor, released “Crusoe” in January 2000
Architectural achievements



Only 25% the number of transistors compared to
current Pentiums
Needs only 1 or 2 watts of power for 400 MHz or
700 MHz chips running at full speed
Much less heat dissipated but can compete with
same category Intel and AMD chips
How Crusoe Pulled It Off



Efficient instruction
set bears no
resemblance to x86
Takes advantage of
latest and best in
hardware design
Software layer
(code morphing
software) in flash
ROM translates x86
commands
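
The idea of a software translation layer can be illustrated with a toy lookup table. This is only a sketch of the concept: the native operation names and the mapping below are entirely hypothetical and do not describe Transmeta's actual instruction set or its code morphing software.

```python
# Hypothetical mapping from x86-style mnemonics to made-up native operations.
NATIVE_OPS = {
    "mov": ["native_copy"],
    "add": ["native_add"],
    "push": ["native_sub_sp", "native_store"],
}

def translate(x86_program):
    """Translate each x86-style instruction into native operations."""
    native = []
    for instruction in x86_program:
        mnemonic = instruction.split()[0]   # e.g. "mov" from "mov eax, 1"
        native.extend(NATIVE_OPS[mnemonic])
    return native

program = ["mov eax, 1", "add eax, 2", "push eax"]
print(translate(program))
# ['native_copy', 'native_add', 'native_sub_sp', 'native_store']
```

Because the translation lives in software, the hardware underneath is free to use an instruction set that looks nothing like x86.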
Current Transmeta Processors

Crusoe TM5900





667 MHz – 1 GHz CPU speed
128 KB L1, 512 KB L2
133 MHz system bus
0.13 microns
Efficeon TM8800



Up to 1.7 GHz
128 KB L1 instruction cache, 64 KB L1 data
cache, 1 MB L2
400 MHz system bus
The PowerPC Microprocessor





Originally designed by
Apple, IBM, and Motorola
Based on IBM POWER
architecture used in IBM
RS/6000 (RISC based)
Provides seamless
transition to 64-bit
The PowerPC G5 is used in
Apple iMac G5
2.7 GHz CPU speed, 1.35
GHz system bus, 512 KB
on-chip L2 cache
Sun UltraSparc IV+









2nd generation dual core
processor design (1368-pin FC-LGA)
64-bit CPU, 90 nm, 295 million
transistors
CPU speeds of 1.95 / 2.1 GHz
2 MB L2 cache, 32 MB off-chip
On-chip memory controller
CMT (Chip Multi-Threading) with
2 threads per processor
14-stage non-stalling pipeline
4-way superscalar
Runs Solaris, Linux, FreeBSD,
and other UNIX versions
Sun UltraSparc T1








Available in 4-, 6- or 8-core
64 bits, 90 nm
4-way multithreaded core
14-stage non-stalling pipeline
4 integrated memory controllers
16 KB instruction, 8 KB data L1
cache per core, 3 MB unified L2
cache
Available in 1 and 1.2 GHz
Low power (72 – 79 watts)
Multiprocessor Systems



Combines two or more CPUs of the
same brand and model
Allows systems to scale up
Forms an N-way system
Future Trends


In Dec. 1997, the Semiconductor
Industry Association (SIA) provided
details about future requirements of
microprocessors.
Attempts to continue the pace
predicted by Moore’s Law
1999 SIA Roadmap for Microprocessors

Year                        1999         2000         2001        2002        2005        2008
MPU gate length (microns)   0.14         0.12         0.10        0.09        0.065       0.045
Transistors/sq. cm          6.6 million  9.4 million  13 million  18 million  44 million  109 million
Die size (sq. mm)           340          340          340         340         408         468
MHz                         1250         1486         1767        2100        3500        6000
Packaging (pins/balls)      740          821          912         1012        1384        1893
Wafer size (mm)             200          200          300         300         300         300
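
The density and die-size rows of the roadmap can be combined into a rough transistor count per chip. This is a back-of-the-envelope sketch, assuming the whole die is populated at the stated density; the names `ROADMAP` and `transistors_per_die_millions` are made up for this example.

```python
# year: (transistors per sq. cm in millions, die size in sq. mm), from the roadmap
ROADMAP = {
    1999: (6.6, 340),
    2000: (9.4, 340),
    2001: (13, 340),
    2002: (18, 340),
    2005: (44, 408),
    2008: (109, 468),
}
MM2_PER_CM2 = 100  # 1 sq. cm = 100 sq. mm

def transistors_per_die_millions(density_millions_per_cm2, die_mm2):
    """Density x die area, with the area converted from sq. mm to sq. cm."""
    return density_millions_per_cm2 * die_mm2 / MM2_PER_CM2

for year, (density, die) in ROADMAP.items():
    print(year, round(transistors_per_die_millions(density, die), 1))
```

Under that assumption the roadmap implies roughly 22 million transistors per die in 1999 and roughly 510 million in 2008.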
International Technology Roadmap for
Semiconductors