Status of Microprocessors Technology

Advanced Computer Architecture
Spring 2013, Kyushu University
Lecturer: Farhad Mehdipour
Email: farhad@ejust.kyushu-u.ac.jp
Web: http://www.c.csce.kyushu-u.ac.jp/~farhad
A Typical Computer Organization
CPU: Central Processing Unit
RF: Register File
ALU: Arithmetic & Logic Unit
I/O: Input/Output
Designing Computers
All computers are more or less based on the same basic design:
the Von Neumann Architecture!
The Von Neumann Architecture
• Model for designing and building computers,
based on the following three characteristics:
1) The computer consists of four main sub-systems:
• Memory
• ALU (Arithmetic/Logic Unit)
• Control Unit
• Input/Output System (I/O)
2) Program is stored in memory during execution.
3) Program instructions are executed sequentially.
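The stored-program and sequential-execution characteristics can be illustrated with a toy machine. This is only a sketch: the instruction names (LOAD, ADD, STORE, HALT), the single accumulator, and the memory layout are hypothetical choices for illustration, not a model of any real CPU.

```python
# Toy stored-program machine (illustrative sketch, hypothetical instruction set).

def run(memory):
    """Execute the program stored in `memory`, starting at address 0."""
    pc = 0    # program counter, part of the control unit
    acc = 0   # accumulator, the register the ALU works on
    while True:
        op, arg = memory[pc]   # fetch the next instruction from memory
        pc += 1                # sequential execution: move to the next address
        if op == "LOAD":
            acc = memory[arg]          # read a data word from memory
        elif op == "ADD":
            acc = acc + memory[arg]    # arithmetic done by the ALU
        elif op == "STORE":
            memory[arg] = acc          # write the result back to memory
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown instruction {op!r}")

# Program and data share one memory: addresses 0-3 hold instructions,
# addresses 4-6 hold data.
memory = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT", None),
    2,   # address 4: first operand
    3,   # address 5: second operand
    0,   # address 6: the result is stored here
]

result = run(memory)
print(result[6])  # prints 5
```

Because instructions and data live in the same memory, changing the program only means writing different words into memory, not rewiring the machine.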
The Von Neumann Architecture
• Processor (CPU)
– Control Unit: executes the program
– ALU: does arithmetic/logic operations
requested by the program
• Memory: stores data and program
• Input/Output: communicates with the
"outside world", e.g.
– Screen
– Keyboard
– Storage devices
– ...
• Bus: connects the processor, memory, and input/output
Classes of Computers
• 1960s - large mainframes
– Costing millions of dollars
– Stored in computer rooms
– Multiple operators
– Typical applications: business data processing and
large-scale scientific computing
• 1970s - the birth of the minicomputer
– A smaller-sized and cheaper computer
• Also the emergence of supercomputers
– High-performance computers for scientific computing
Classes of Computers
• 1980s - the rise of the desktop computer
based on microprocessors
– Personal computers
– Workstations
• 1990s - the emergence of
– The Internet and the World Wide Web
– The first successful handheld computing devices
(personal digital assistants or PDAs)
– High-performance digital consumer electronics
– Cell phones and smart phones
Personal Mobile Device (PMD)
• Wireless devices with multimedia interfaces, such as
cell phones, smartphones, and tablet computers
• Requirements
– Cost
– Energy efficiency
– Real-time performance
– Minimized memory
Desktop Computers
• One of the largest markets in dollar terms
• Low-end (<$500) to high-end ($5K) systems
• Optimized price-performance
– Performance measured in the no. of
calculations and graphic operations
– Price is what matters to customers
Servers
• Provide large-scale and more reliable file and
computing services (Web servers)
• Key requirements
– Dependability – effectively provide service 24/7/365 (Yahoo!,
Google, eBay)
– Scalability – server systems grow over time, so the ability to
scale up the computing capacity is crucial
– Performance – transactions per minute
Clusters/Warehouse-Scale Computers
• Software as a Service (SaaS)
– Search
– Social networking
– Video sharing
– Multiplayer games
• Each node runs its own OS, and nodes communicate
using a network protocol.
• The largest clusters are called Warehouse-Scale
Computers (WSCs); tens of thousands of servers can act
as one.
• Power: 80% of the $90M cost of a WSC is associated
with power and cooling
Google’s data center
• As clusters grow in popularity, the number of
conventional supercomputers is shrinking.
Embedded Computers
• Computers as parts of other devices, where their presence is
not obviously visible
– e.g., home appliances, printers, smart cards,
cell phones, set-top boxes, gaming consoles, network
routers
• Fastest growing portion of the market
• Wide range of processing power and cost
– $0.1 (8-bit, 16-bit processors), $10 (32-bit, capable of
executing 50M instructions per second), $100-$200 (high-end
video gaming consoles and network switches)
• Requirements
– Real-time performance
(e.g., time to process a video frame is limited)
– Minimized memory
– Minimized power
– Price, weight, size
Classes of Computers
• These changes in computer use have led to five
different computing markets:
Exciting Change
It impacts every aspect of human life.
ENIAC, 1946
Occupied a 17 x 10 m room,
weighed 30 tons,
contained 18,000 electronic valves (vacuum tubes), consumed
150 kW of electrical power;
capable of performing 5,000 additions per second
PlayStation Portable (PSP)
Approx. 170 mm (L) x 74 mm (W) x 23 mm (D)
Weight: Approx. 260 g (including battery)
CPU: PSP CPU (clock frequency 1~333MHz)
Main Memory: 32MB
Embedded DRAM: 4MB
Profile: Game, Audio, Video
Evolution of Computers
• First generation (1939-1954) - vacuum tube
• Second generation (1954-1959) - transistor
• Third generation (1959-1971) - IC
• Fourth generation (1971-present) - microprocessor
Technology Used in Computers
Transistors
Vacuum Tube
Integrated Circuit (IC)
Microprocessor, VLSI* chips
*VLSI: Very large-scale integration
Wafer & Die
Die
20~30 cm
X nm
(nanometer)
Wafer
x mm (e.g. 100 mm)
Evolution of Computers
• First Generation: ENIAC, 1946 (U of Penn) – Vacuum Tubes
– The first programmable electronic
digital computer
– 18,000 vacuum tubes
– 30 tons, 30 m x 2.5 m x 1 m
– 5,000 additions per second
– 20 words of 10 decimal digits each
– Programmed by 3,000 switches
– Cost: almost $500,000
(approximately $6,000,000 today)
– Became a stored-program machine in 1948,
following von Neumann's advice
Evolution of Computers
• Second generation (1954-1959) - Transistor
Manchester University Experimental Transistor Computer
http://history.acusd.edu/gen/recording/computer1.html
http://www.computer50.org/kgill/transistor/trans.html
Commercialization in the 50s
• UNIVAC, 1951, the first commercial computer
– contract price $400K, actual cost ~$1M, sold 48 copies
• IBM 701, 1952, shipped 19 copies
– leased at $12K per month
• IBM 650, 1953, mass produced ~2000 units
– $200K ~ 400K
• IBM System/360, 1964
– A family of binary-compatible computers
– 19 combinations of varying speed and memory capacity from $200K ~
$2M
– Still lives on today as the “highly-profitable” IBM z900 series
Evolution of Computers
• Third generation (1959-1971) - IC
PDP-8, Digital Equipment Corporation
• Thanks to the use of ICs, the DEC PDP-8
was the least expensive general-purpose small
computer of the 1960s
http://history.acusd.edu/gen/recording/computer1.html
http://www.piercefuller.com/collect/pdp8.html
Cheaper or Faster in 60s and 70s
• Minicomputers
– DEC PDP-8, 1965, $20K, size of large refrigerators
– Less powerful than “mainframes”, 10x cheaper
– Departmental computers--PDP-11 and VAXs enjoyed extreme
popularity in the 70s and 80s
• Supercomputers
– Performance at all cost!!
– Biggest customers: national security, nuclear weapons, cryptography,
(also aerospace, petroleum, automotive, pharmaceutical, sciences)
check out www.top500.org
Evolution of Computers
• Fourth generation (1971-present) - microprocessor
• In 1971, Intel developed the 4-bit 4004 chip for calculator
applications.
[Figure: block diagram of the Intel 4004 (ROM/RAM buffer, timing, reset, control logic, program counter, instruction decoder, ALU, registers, I/O, refresh logic, system bus) and the 4004 chip layout]
http://www.intel.com
A good review article: The History of The Microprocessor, Bell Labs Technical Journal, 1997.
Early Examples
DEC PDP-8, 1965
An early mini
Xerox Alto, 1973
An early “PC” with mouse
Cray 3, 1993
• Up to 16 processors and up to 2 gigawords (16 GB) of memory
• Power consumption: 90 kW
• 15 GFLOPS (1 sec on Cray 3 ≈ 67 years on ENIAC)
• $30,000,000
Microprocessor Generations
• First generation: 1971-78
– Behind the power curve
(16-bit, <50k transistors)
• Second generation: 1979-85
– Becoming “real” computers
(32-bit, >50k transistors)
• Third generation: 1985-89
– Challenging the “establishment”
(Reduced Instruction Set Computer/RISC,
>100k transistors)
• Fourth generation: 1990-
– Architectural and performance leadership
(64-bit, >1M transistors,
Intel/AMD translate into RISC internally)
Intel 4004 @ 70s
• Intel 4004, first single-chip CPU
– 4-bit processor for a calculator
– 2,300 transistors
– 16-pin DIP package
– 740 kHz (eight clock cycles per CPU
cycle of 10.8 microseconds)
– ~100K ops per second
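The timing figures on this slide follow from the clock rate. A quick check, assuming as a simplification that one operation completes per CPU cycle (real 4004 instructions may take more than one cycle):

```python
clock_hz = 740_000       # 740 kHz clock, from the slide
clocks_per_cycle = 8     # eight clock cycles per CPU cycle, from the slide

# Duration of one CPU cycle in microseconds.
cycle_time_us = clocks_per_cycle / clock_hz * 1e6

# CPU cycles per second; equals operations per second only under the
# one-operation-per-cycle assumption stated above.
cycles_per_second = clock_hz / clocks_per_cycle

print(f"CPU cycle time: {cycle_time_us:.1f} microseconds")   # 10.8
print(f"CPU cycles per second: {cycles_per_second:,.0f}")    # 92,500
```

The result, about 92,500 cycles per second, is consistent with the rounded "~100K ops per second" figure.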
Intel Itanium 9500 Series
• 64-bit processor
• 3.1 billion transistors
• 2.53 GHz, issue up to 12
instructions per cycle
• 8 Cores
• 54 MByte of cache!!
In ~40 years, about 1,000,000
times growth in transistor
count and performance!
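The transistor-count part of this claim can be checked against the numbers on the slides. The sketch below assumes the Itanium 9500 dates from 2012 (the year is not stated on the slide) and does not attempt to verify the performance part of the claim.

```python
import math

transistors_4004 = 2_300                # Intel 4004 (1971), from the slides
transistors_itanium = 3_100_000_000     # Itanium 9500, from the slides
years = 2012 - 1971                     # assumption: Itanium 9500 dated 2012

growth = transistors_itanium / transistors_4004
doublings = math.log2(growth)           # how many times the count doubled
years_per_doubling = years / doublings

print(f"growth factor: {growth:,.0f}x")
print(f"doublings: {doublings:.1f}")
print(f"years per doubling: {years_per_doubling:.2f}")
```

Under these assumptions the growth factor is a little over one million, and the implied doubling time is about two years, at the slow end of the 18-24 month range usually quoted for Moore's Law.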
Key Architectural Trends
• Increase performance at 1.6x per year (2X/1.5yr)
– True from 1985-present
• Combination of Technology and Architectural enhancements
– Technology provides faster transistors
  • Faster transistors lead to higher clock rates
– More transistors (“Moore’s Law”)
• Architectural ideas turn transistors into performance
– Responsible for about half the yearly performance growth
• Two key architectural directions
– Sophisticated memory hierarchies
– Exploiting instruction level parallelism
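The two ways of stating the performance trend above (1.6x per year and 2X per 1.5 years) are the same compound rate. A small sketch; the function name `performance_after` is made up for illustration:

```python
annual_rate = 1.6   # performance growth per year, from the slide

# Compound growth over 1.5 years should be close to 2x.
growth_over_18_months = annual_rate ** 1.5
print(f"growth over 1.5 years: {growth_over_18_months:.2f}x")

def performance_after(years, rate=annual_rate):
    """Relative performance after `years` of compound growth at `rate` per year."""
    return rate ** years

print(f"growth over 10 years: {performance_after(10):.0f}x")
```

At this rate, a decade of sustained growth corresponds to roughly a hundredfold increase in performance.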
Moore’s Law
Transistor count doubles every 18-24 months!
Transistor Count: Intel Processors
Transistor count doubles every 18-24 months
Processor Transistor Count
• Intel 4004: 2,300 transistors (1971)
• Intel P4: 55M transistors (2001)
• Intel McKinley: 221M transistors (2001)
• Intel Core 2 Extreme quad-core: 2 x 291M transistors (2006)
Microprocessors (Y2K-2014)
Year of 1st shipment    1997   1999   2002   2005   2008   2011   2014
Clock Frequency (GHz)   0.75   1.2    1.6    2      2.5    3      3.674
Chip Size (mm²)         300    340    430    520    620    750    901
Transistors per chip    11M    21M    76M    200M   520M   1.4B   3.62B
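The table can be used to separate how much of the transistor growth comes from larger chips and how much from denser chips. A sketch using only the table's values (variable names are illustrative):

```python
years = [1997, 1999, 2002, 2005, 2008, 2011, 2014]
chip_mm2 = [300, 340, 430, 520, 620, 750, 901]
transistors = [11e6, 21e6, 76e6, 200e6, 520e6, 1.4e9, 3.62e9]

# Transistors per square millimetre for each year.
density = [t / a for t, a in zip(transistors, chip_mm2)]
for y, d in zip(years, density):
    print(f"{y}: {d / 1e6:.2f} M transistors per mm^2")

# Overall growth from the first to the last column.
count_growth = transistors[-1] / transistors[0]
area_growth = chip_mm2[-1] / chip_mm2[0]
density_growth = density[-1] / density[0]

print(f"transistor count: {count_growth:.0f}x")
print(f"chip area:        {area_growth:.1f}x")
print(f"density:          {density_growth:.0f}x")
```

Chip area grows only about threefold over the period, so most of the increase in transistors per chip comes from higher density rather than larger dies.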
Towards RISCs
• Two significant changes:
– Virtual elimination of assembly language programming reduced
the need for object-code compatibility
– The creation of standardized, vendor-independent operating
systems (UNIX and Linux)
• These changes led to
– A new set of architectures with simpler instructions, called RISC
(Appendix I) (early 1980s).
• RISC-based machines focused on
– the exploitation of Pipelining (Appendix II) and Instruction Level
Parallelism (Appendix III)
– use of Caches
Growth in Processor Performance
• Advances in technology
• Innovations in computer design
Growth in Processor Performance
RISC
• ILP (pipelining, multiple instruction issue)
• Use of caches
Growth in Processor Performance
RISC
Forcing prior architectures to keep up or disappear
• Digital Equipment VAX was replaced by a RISC architecture
• Intel rose to the challenge, primarily by translating x86 (or IA-32) instructions into RISC-like
instructions internally
Growth in Processor Performance
RISC
• Little ILP left to exploit efficiently (ILP-Wall)
• Almost unchanged memory latency (Memory-Wall-Appendix IV)
• Maximum power dissipation of air-cooled chips (Power-Wall- Appendix V)
Growth in Processor Performance
Move to Multiprocessor
Multiprocessor
• “We are dedicating all of our future product development to
multicore designs. … This is a sea change in computing”
Paul Otellini, President, Intel (2005)
• All microprocessor companies switch to MP (2X CPUs / 2 yrs)
Manufacturer/Year    AMD/’05   Intel/’06   IBM/’04   Sun/’05
Processors/chip      2         2           2         8
Threads/Processor    1         2           2         4
Threads/chip         2         4           4         32
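The last row of the table is the product of the two rows above it: threads per chip = processors per chip x threads per processor. A short check using the table's values:

```python
# (processors per chip, threads per processor), from the table
chips = {
    "AMD/'05": (2, 1),
    "Intel/'06": (2, 2),
    "IBM/'04": (2, 2),
    "Sun/'05": (8, 4),
}

# Hardware threads per chip = processors per chip * threads per processor.
threads_per_chip = {
    name: processors * threads
    for name, (processors, threads) in chips.items()
}

for name, total in threads_per_chip.items():
    print(f"{name}: {total} threads/chip")
```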
Future of Computers
• End of Moore’s law
– Future of VLSI technology after 2015 is unknown
  • Transistor size will be measured in atoms, and node charge
will be measured in electrons!
  • It doesn’t mean VLSI is finished, just no more scaling
• Non-von Neumann architectures toward:
– Parallel and distributed processing
– Reconfigurable hardware computing
• Non-silicon technologies
– Nanotechnologies: carbon nanotubes, molecular switches
– Biological/cellular computers: DNA, proteins and enzymes
– Quantum computers: magnetic resonance and quantum dots.
• New ways of using computers!!!
Thank you!
Appendix I:
RISC (Reduced Instruction Set Computer) Architectures
• Properties of RISC architectures:
– All ops on data apply to data in registers and typically
change the entire register (32-bits or 64-bits).
– The only ops that affect memory are load/store
operations. Memory to register, and register to
memory.
– Load and store ops on data less than a full size of a
register (32, 16, 8 bits) are often available.
– Usually instructions are few in number (this can be
relative) and are typically one size.
Appendix II:
Pipelining
Single-Cycle CPU (all stages in one long cycle)
Load  IF Dec EX Mem WB
Multiple Cycle CPU
      Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
Load  IF       Dec      EX       Mem      WB
Pipelined CPU
      Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
Load  IF       Dec      EX       Mem      WB
Load           IF       Dec      EX       Mem      WB
Load                    IF       Dec      EX       Mem      WB
Load                             IF       Dec      EX       Mem      WB
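The pipelined timing diagram can be reproduced with a few lines of code. This is a sketch of an ideal pipeline, assuming no stalls or hazards and one cycle per stage; the function names are made up for illustration.

```python
STAGES = ["IF", "Dec", "EX", "Mem", "WB"]

def pipeline_schedule(n_instructions, stages=STAGES):
    """For each instruction, map cycle number -> stage (ideal pipeline, no stalls)."""
    schedule = []
    for i in range(n_instructions):
        # Instruction i enters the pipeline in cycle i + 1 and then
        # advances one stage per cycle.
        schedule.append({i + 1 + s: name for s, name in enumerate(stages)})
    return schedule

def total_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed to finish all instructions in an ideal pipeline."""
    return n_stages + n_instructions - 1

schedule = pipeline_schedule(4)
cycles = total_cycles(4)

for i, row in enumerate(schedule):
    cells = [row.get(c, "").ljust(4) for c in range(1, cycles + 1)]
    print(f"Load {i + 1}: " + " ".join(cells))

print("pipelined:", cycles, "cycles")
print("multi-cycle, not pipelined:", 4 * len(STAGES), "cycles")
```

Four loads finish in 8 cycles when pipelined, compared with 20 cycles if each load had to complete all five stages before the next one started.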
Appendix III:
Instruction Level Parallelism
• Architectural technique that allows the overlap of individual
machine operations (add, mul, load, store, ...)
• Multiple operations execute in parallel (simultaneously)
• Goal: speed up execution
• Example:
instr. 1: sub R1 ← R1, “1”
instr. 2: add R4 ← R1, R3
instr. 3: add R5 ← R3, R2
• Sequential execution (without ILP)
each instruction takes one cycle
Total execution time: 3 cycles
• ILP execution (overlapped execution)
instr. 2 reads the R1 written by instr. 1, so those two must stay in order;
instr. 1 or instr. 2 can run simultaneously with instr. 3
Total execution time: 2 cycles
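The 2-cycle result can be derived mechanically from the register dependences. The sketch below is a simplification: it assumes every instruction takes one cycle, there are unlimited functional units, and only read-after-write dependences matter. The function name `earliest_cycles` is made up for illustration.

```python
# (name, destination register, source registers), from the example above
instructions = [
    ("instr. 1: sub", "R1", ["R1"]),
    ("instr. 2: add", "R4", ["R1", "R3"]),
    ("instr. 3: add", "R5", ["R3", "R2"]),
]

def earliest_cycles(instrs):
    """Earliest cycle each instruction can execute, given read-after-write dependences."""
    produced_in = {}   # register -> cycle in which its latest value is computed
    cycles = []
    for _name, dest, srcs in instrs:
        # An instruction can run one cycle after the last of its inputs is
        # produced; registers not written yet are available from the start.
        start = 1 + max((produced_in.get(r, 0) for r in srcs), default=0)
        cycles.append(start)
        produced_in[dest] = start
    return cycles

cycles = earliest_cycles(instructions)
for (name, _dest, _srcs), c in zip(instructions, cycles):
    print(f"{name} -> cycle {c}")

print("total cycles with ILP:", max(cycles))         # 2
print("total cycles sequential:", len(instructions))  # 3
```

Instructions 1 and 3 have no dependence on each other and share cycle 1, while instruction 2 waits for the new value of R1 and runs in cycle 2.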
Appendix IV:
Memory Wall
Appendix V:
Power Wall