Meeting 08: Intel x86 (Course: H0162 / Microprocessors)

Course: H0162 / Microprocessors
Year: 2006
Version: 1/0
Learning Outcomes
By the end of this session, students are expected to be able to:
• explain the architecture of the Pentium family of microprocessors (C2)
Outline
• Pentium variants
• Additional features
• Branch Prediction
• Dual Independent Bus
• SpeedStep
Enhancing CPU Operation
• Local bus (system bus or front-side bus): connects the CPU to RAM and to the video card slot; it is the fastest bus in the system.
• L1 and L2 cache: the cache inside the CPU is called L1 cache and is divided into a data cache and an instruction cache. L2 cache is larger and slower than L1 and was originally mounted close to the processor rather than inside it. Both use static RAM, which is faster than the dynamic RAM used for main memory. Later Pentium III models incorporate both caches into the CPU.
• Math coprocessor: beginning with the 486, the math coprocessor (floating-point unit) was integrated into the CPU.
Processor Descriptive Features
• RISC (Reduced Instruction Set Computer): a small set of simple instructions; yields a CPU that is faster and cheaper, but the software (compilers) has to do more of the work.
• CISC (Complex Instruction Set Computer): the opposite of RISC; used by most PC processors, including the x86 family.
• MMX (Multimedia Extensions): 57 additional instructions for graphics and multimedia.
• Multiple branch prediction: guessing which instructions will be needed next; speeds up CPU operation and is over 90% accurate.
Processor Descriptive Features
• Superscalar technology: processing more than one instruction at a time; Pentium onwards, with 2 pipelines (paths along which instructions are executed).
• Dynamic Execution: enhanced superscalar and multiple branch prediction features; Pentium Pro onwards.
• Dual Independent Bus (DIB): Pentium Pro and Pentium II onwards; one bus to main memory and the other to the L2 cache.
Real, Protected and Virtual Modes
• Real mode: the mode in which the 286 and later processors start up; operates within the first 1 MB of memory, does not support multitasking, and behaves like an 8088 or 8086.
• Protected mode: the processor supports multitasking and can access more than 1 MB of memory. In protected mode each program is given its own section of memory.
• Virtual (virtual 8086) mode: the processor can run several real-mode programs at once, each within its own 1 MB address space, while the system as a whole uses memory above 1 MB.
Pentium Variants
• The Pentium family comprises the Pentium, Pentium Pro, Pentium II, Pentium III, and Pentium 4.
Progress @ Intel
From http://www.howstuffworks.com/microprocessor1.htm
Name          Date  Transistors  Microns  Clock speed  Data width             MIPS
8080          1974  6,000        6        2 MHz        8 bits                 0.64
8088          1979  29,000       3        5 MHz        16 bits, 8-bit bus     0.33
80286         1982  134,000      1.5      6 MHz        16 bits                1
80386         1985  275,000      1.5      16 MHz       32 bits                5
80486         1989  1,200,000    1        25 MHz       32 bits                20
Pentium       1993  3,100,000    0.8      60 MHz       32 bits, 64-bit bus    100
Pentium II    1997  7,500,000    0.35     233 MHz      32 bits, 64-bit bus    ~300
Pentium III   1999  9,500,000    0.25     450 MHz      32 bits, 64-bit bus    ~510
Pentium 4     2000  42,000,000   0.18     1.5 GHz      32 bits, 64-bit bus    ~1,700
Pentium 4 HT  2002  55,000,000   0.13     2.4 GHz      2x32 bits, 64-bit bus  -
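The growth in the "Progress @ Intel" figures can be summarized as an average doubling time. The sketch below uses only two data points from that slide (the 8080 and the Pentium 4); the function name is an illustrative choice, and the result is a rough average, not a statement about any individual generation.

```python
import math

# Transistor counts and introduction years from the "Progress @ Intel" data.
first_year, first_transistors = 1974, 6_000        # 8080
last_year, last_transistors = 2000, 42_000_000     # Pentium 4


def doubling_time_years(year0, count0, year1, count1):
    """Average number of years needed for the transistor count to double."""
    doublings = math.log2(count1 / count0)
    return (year1 - year0) / doublings


period = doubling_time_years(first_year, first_transistors,
                             last_year, last_transistors)
print(f"Average doubling time: {period:.2f} years")
```

Over these 26 years the count doubles roughly every two years.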
Pentium Family
• Pentium: 1993; up to 200 MHz; 64-bit bus; pipelined, superscalar architecture.
• Pentium OverDrive: 1995; 100 MHz; upgrade for the 486.
• Pentium Pro: 1995; up to 200 MHz.
• Pentium with MMX: 1997; 200 MHz; designed for multimedia applications.
• MMX OverDrive: 1997; 200 MHz; upgrade for the Pentium.
Pentium Family
• Pentium II: 1997; up to 450 MHz; Single Edge Contact (SEC) processor package.
• Pentium III: 1999; 70 new instructions for graphics, video, audio, and speech recognition; 1 GHz; up to 64 GB of memory.
• Pentium 4: 2000; 1.3 GHz and up; 32-bit processor; 400 MHz front-side bus; 144 new instructions to improve video, audio, and 3D applications.
• More details on each speed variant: http://www.xbitlabs.com/news/cpu/display/20030327134746.html
Pentium Pro Die Photo
5.5 million transistors
Pentium Die Photo
Pentium 4 Die Photo
42 million transistors
The Heat Problem
[Figure: power density in Watts/cm2 (log scale, 1 to 1000) versus process feature size (1.5 down to 0.07 microns, with increasing frequency). Processors plotted: i386, i486, Pentium, Pentium Pro, Pentium II, Pentium III, Pentium 4 (Willamette), Pentium 4 (Prescott). Reference levels: hot plate, nuclear reactor, rocket nozzle. Courtesy of Bob Colwell.]
Microarchitecture Trends
Adapted from Johan De Gelas, Quest for More Processing Power,
AnandTech, Feb. 8, 2005.
Moore’s Law Still Holds
[Figure: transistors per die (log scale, 10^0 to 10^11) versus year (1960 to 2010). Memory curve: 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 128M, 256M, 512M, 1G, 2G, 4G. Microprocessor curve: 4004, 8080, 8086, 80286, i386, i486, Pentium, Pentium II, Pentium III, Pentium 4, Itanium. Source: Intel]
Pentium
• 100% binary compatible with its ancestors.
• Enhancements and additions to the i486:
– Superscalar Architecture
– Dynamic Branch Prediction
– Pipelined Floating-Point Unit
– Improved Instruction Execution Time
– Separate 8K Code and Data Caches
– Writeback MESI Protocol (Data Cache)
– 64-Bit Data Bus
– Bus Cycle Pipelining
– Address Parity
– Internal Parity Checking
– Functional Redundancy Checking
– Execution Tracking
– Performance Monitoring
– IEEE 1149.1 Boundary Scan
– System Management Mode
– Virtual Mode Extensions
• New instructions to accommodate the additional functionality.
• The MMU is fully compatible with the i386 and i486.
• The floating-point unit is completely redesigned compared with the i486.
Pentium Block Diagram
Pentium Pro
• Additional features:
– out-of-order execution engine,
– dual integer pipelines, and
– improved floating-point unit
Pentium Pro Features
• Superpipelining: The Pentium Pro dramatically increases the number of execution steps, to 14, from the Pentium's 5.
• Integrated Level 2 Cache: The Pentium Pro features a dramatically higher-performance secondary cache compared to all earlier processors. Instead of using motherboard-based cache running at the speed of the memory bus, it uses an integrated level 2 cache with its own bus, running at full processor speed, typically three times the speed at which the cache runs on the Pentium. The Pentium Pro's cache is also non-blocking, which allows the processor to continue without waiting on a cache miss.
• 32-Bit Optimization: The Pentium Pro is optimized for running 32-bit code (which most modern operating systems and applications use) and so gives a greater performance improvement over the Pentium when using the latest software.
• Wider Address Bus: The address bus on the Pentium Pro is widened to 36 bits, giving it a maximum addressability of 64 GB of memory.
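The 64 GB figure for the Pentium Pro follows directly from the address bus width: each extra address line doubles the number of addressable bytes. The short sketch below works this out; the function and constant names are illustrative only, and the 32-bit comparison row assumes the Pentium's 32-bit address bus.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given bus width."""
    return 2 ** address_bits


GB = 2 ** 30  # bytes per gigabyte (binary)

for name, bits in [("Pentium, 32-bit address bus", 32),
                   ("Pentium Pro, 36-bit address bus", 36)]:
    print(f"{name}: {addressable_bytes(bits) // GB} GB")
```

The four additional address lines multiply the addressable range by 2^4 = 16, from 4 GB to 64 GB.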
Pentium Pro Features
• Greater Multiprocessing: Quad processor
configurations are supported with the Pentium Pro
compared to only dual with the Pentium.
• Out of Order Completion: Instructions flowing down the
execution pipelines can complete out of order.
• Superior Branch Prediction Unit: The branch target
buffer is double the size of the Pentium's and its
accuracy is increased.
• Register Renaming: This feature improves parallel
performance of the pipelines.
• Speculative Execution: The Pentium Pro uses speculative execution to reduce pipeline stall time in its RISC core.
Pentium II
• The Pentium II uses the features of the P6 microarchitecture (namely a multi-transaction bus and Dynamic Execution) and adds Intel MMX technology
• Dual Independent Bus architecture
• 66 MHz or 100 MHz system bus
• Single Edge Contact cartridge packaging technology
• 512 KB unified, non-blocking L2 cache
• Clock speeds from 233 MHz to 450 MHz
Pentium III
• Pentium III = Pentium II + SSE
• SSE: Internet Streaming SIMD Extensions
• Seventy new instructions
• Three categories:
– SIMD floating-point instructions
– New media instructions
– Streaming memory instructions
SIMD: Single Instruction Multiple Data
• MMX and SSE instructions
– provide a group of instructions that perform SIMD operations on packed integer and/or packed floating-point data elements held in the 64-bit MMX or the 128-bit XMM registers.
– enable increased performance in a wide variety of multimedia and communications applications.
SIMD-FP Instruction
• The SIMD-FP feature introduces a new register file containing eight 128-bit registers
– Each register can hold a vector of four IEEE single-precision FP data elements
– This allows four single-precision FP operations to be carried out by a single instruction
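To make the "four operations in one instruction" idea concrete, the sketch below emulates a 128-bit register as 16 bytes holding four single-precision values and adds two such registers element by element. This is only a software model: the helper names are invented for illustration, and real SSE hardware performs the four additions in parallel with one packed-add instruction.

```python
import struct


def pack_xmm(values):
    """Pack four floats into 16 bytes (128 bits), like one XMM register."""
    assert len(values) == 4
    return struct.pack("<4f", *values)


def packed_add(a, b):
    """Add two packed registers element by element (one SIMD 'instruction')."""
    xs = struct.unpack("<4f", a)
    ys = struct.unpack("<4f", b)
    return struct.pack("<4f", *(x + y for x, y in zip(xs, ys)))


a = pack_xmm([1.0, 2.0, 3.0, 4.0])
b = pack_xmm([10.0, 20.0, 30.0, 40.0])
result = packed_add(a, b)
print(len(result) * 8, "bits:", struct.unpack("<4f", result))
```

A scalar loop would need four separate add instructions for the same work.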
SIMD
Pentium 4 Features
• 400 MHz System Bus
• Hyper-Pipelined Technology
• Rapid Execution Engine
• Execution Trace Cache
• Advanced Transfer Cache
• Advanced Dynamic Execution
• Enhanced Floating Point
• Streaming SIMD Extensions 2 (SSE2) Instructions
Pentium 4
Pentium Pipeline
• The Pentium's basic integer pipeline is five stages long, with the stages broken down as follows:
– Prefetch/Fetch: Instructions are fetched from the instruction cache and aligned in prefetch buffers for decoding.
– Decode1: Instructions are decoded into the Pentium's internal instruction format. Branch prediction also takes place at this stage.
– Decode2: Same as above, and microcode ROM kicks in here, if necessary. Also, address computations take place at this stage.
– Execute: The integer hardware executes the instruction.
– Write-back: The results of the computation are written back to the register file.
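The five stages can be visualized by tracking which instruction occupies which stage in each clock cycle. The sketch below models a single, ideal pipeline with no stalls; the stage abbreviations and function name are chosen for this illustration, and the model ignores the Pentium's second integer pipeline.

```python
# Abbreviations for Prefetch/Fetch, Decode1, Decode2, Execute, Write-back.
STAGES = ["PF", "D1", "D2", "EX", "WB"]


def pipeline_chart(num_instructions):
    """Return one dict per cycle mapping stage name -> instruction index."""
    total_cycles = num_instructions + len(STAGES) - 1
    chart = []
    for cycle in range(total_cycles):
        occupancy = {}
        for s, stage in enumerate(STAGES):
            instr = cycle - s  # instruction i reaches stage s at cycle i + s
            if 0 <= instr < num_instructions:
                occupancy[stage] = instr
        chart.append(occupancy)
    return chart


for cycle, occupancy in enumerate(pipeline_chart(3), start=1):
    cells = [f"{st}:i{occupancy[st]}" if st in occupancy else f"{st}:--"
             for st in STAGES]
    print(f"cycle {cycle}: " + "  ".join(cells))
```

Three instructions finish in 3 + 5 - 1 = 7 cycles, because a new instruction enters the pipeline every cycle once it is full.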
[Figure: Pentium pipeline]
[Figure: P5 architecture]
[Figure: P6 architecture]
[Figure: NetBurst architecture]
http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/pentium4/optimization/44015.htm?page=1
Branch Prediction
• Imagine a simple microprocessor where all instructions are handled in two steps: decoding and execution. The microprocessor can save time by decoding one instruction while the preceding instruction is executing. This assembly-line principle is called pipelining. In advanced microprocessors, the pipeline may have many steps, so that many consecutive instructions are underway in the assembly line at the same time, one at each stage of the pipeline.
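The time saved by the assembly-line principle can be estimated with a simple count. The sketch below assumes every step takes one clock cycle and that nothing ever stalls, which is an idealization; the function names are illustrative.

```python
def cycles_without_pipelining(num_instructions, num_stages):
    """Each instruction passes through every stage before the next one starts."""
    return num_instructions * num_stages


def cycles_with_pipelining(num_instructions, num_stages):
    """The first instruction fills the pipeline; after that one finishes per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


n = 100
for stages in (2, 5):
    before = cycles_without_pipelining(n, stages)
    after = cycles_with_pipelining(n, stages)
    print(f"{stages} stages: {before} cycles unpipelined, {after} pipelined")
```

With the two-step decode/execute processor described above, 100 instructions take 101 cycles instead of 200.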
Branch Prediction
• The problem occurs when we meet a branch instruction. A branch instruction is the implementation of an if-then-else construct: if a condition is true, jump to some other location; if false, continue with the next instruction. This breaks the flow of instructions through the pipeline, because the processor doesn't know which instruction comes next until it has finished executing the branch instruction. The longer the pipeline, the longer it has to wait before it knows which instruction to feed into the pipeline next. As modern microprocessors tend to have longer and longer pipelines, there has been a growing need to do something about this problem.
Branch Prediction
• The solution is branch prediction. The
microprocessor tries to predict whether the
branch instruction will jump or not, based on a
record of what this branch has done previously.
If it has jumped the last four times then chances
are high that it will also jump this time. The
microprocessor decides which instruction to load
next into the pipeline based on this prediction,
before it knows for sure. This is called
speculative execution. If the prediction turns out
to be wrong, then it has to flush the pipeline and
discard all calculations that were based on this
prediction. But if the prediction was correct, then
it has saved a lot of time.
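One common textbook way to keep "a record of what this branch has done previously" is a two-bit saturating counter per branch. The sketch below shows that scheme as an assumption for illustration; it is not claimed to be the exact mechanism of any particular Pentium, and the class and function names are invented here.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counter = 0  # start in the "strongly not taken" state

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes):
    """Replay a list of branch outcomes and count wrong predictions."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1  # in hardware this would flush the pipeline
        predictor.update(taken)
    return misses


# A loop branch that jumps back nine times, then falls through; run twice.
history = ([True] * 9 + [False]) * 2
print(count_mispredictions(history), "mispredictions out of", len(history))
```

After a short warm-up the predictor is wrong only at each loop exit, so most iterations proceed without a flush.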
2-Level Branch Prediction
Branch Prediction
• The branch target buffer (BTB) is used to predict the outcome of branch instructions.
• The current instruction address in the D1 (Decode1) stage is applied to the BTB.
• On a hit, the assumption is that the branch will be taken (if the assumption is correct, execution continues without stalls or flushes).
• On a miss, the assumption is that the branch will not be taken.
• A mispredicted branch (whether on a BTB hit or a miss) causes the pipeline to be flushed.
• The number of delay clocks depends on the branch type.
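The hit/miss rule above can be sketched as a small lookup table keyed by branch address. This is a deliberately simplified model: the update policy (remember a branch when it is taken, forget it when it is not) and all names are assumptions made for the illustration, and real BTBs also keep history bits and have a limited number of entries.

```python
class BranchTargetBuffer:
    def __init__(self):
        self.entries = {}  # branch address -> last known target address

    def predict(self, branch_address):
        """Hit: predict taken and return the target. Miss: return None."""
        return self.entries.get(branch_address)

    def update(self, branch_address, taken, target):
        # Simplified policy for this sketch: keep taken branches, drop others.
        if taken:
            self.entries[branch_address] = target
        else:
            self.entries.pop(branch_address, None)


def run(trace):
    """trace: list of (branch_address, taken, target). Returns flush count."""
    btb = BranchTargetBuffer()
    flushes = 0
    for address, taken, target in trace:
        predicted_target = btb.predict(address)
        predicted_taken = predicted_target is not None
        if predicted_taken != taken or (taken and predicted_target != target):
            flushes += 1  # misprediction, whether on a hit or a miss
        btb.update(address, taken, target)
    return flushes


# The same branch is taken four times and then falls through.
trace = [(0x100, True, 0x80)] * 4 + [(0x100, False, 0x80)]
print(run(trace), "flushes")
```

The first encounter misses in the BTB and the last one hits but is not taken, so both cause a flush; the three in between are predicted correctly.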
Dual Independent Bus (DIB)
• A bus architecture that is part of Intel's Pentium Pro and Pentium II microprocessors.
• As its name implies, DIB uses two buses: one from the processor to main memory, and the other from the processor to the L2 cache.
• The processor can access both buses simultaneously, which increases throughput.
Dual Independent Bus
• The Pentium II processor bus architecture
addresses processor-to-memory bus bandwidth
limitations, offering up to three times the
performance bandwidth of the single-bus,
"socket 7" generation processors, such as the
Pentium processor. This translates into overall
faster system performance.
• Two buses make up the Dual Independent Bus
architecture: the L2 cache bus and the
processor-to-main-memory system bus. The
speed of the dedicated L2 cache bus on the
Pentium II processor scales with the speed of
the processor.
Dual Independent Bus
• The cache bus on the 300-MHz processor,
for instance, runs at 150 MHz, more than
twice as fast as the L2 cache on a
Pentium processor, which runs at a fixed
66 MHz.
• The processor-to-main-memory system
bus enables simultaneous parallel
transactions instead of single, sequential
transactions of previous generation
processors, further increasing
performance.
Dual Independent Bus
• Two buses make up the Dual Independent Bus architecture: the L2 cache bus and the system bus. Each is 8 bytes wide, thus doubling the available channels for data.
• As the L2 cache bus is integrated into the Single Edge Contact cartridge, it is not limited in speed by the constraints of motherboard routing.
• The L2 cache bus is therefore designed to run at 1/2 the processor core frequency on the Pentium II processor. Peak bandwidth for a Pentium II processor with Dual Independent Bus architecture can be calculated as 533 MB/s for the system bus (8 bytes at 66 MHz) plus 8 bytes times the L2 cache bus frequency.
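The peak-bandwidth rule can be worked through for a few clock speeds. The sketch below simply applies the figures quoted on this slide (533 MB/s for the system bus, an 8-byte L2 bus at half the core frequency); the constant and function names are illustrative, and the results are theoretical peaks rather than measured throughput.

```python
BUS_WIDTH_BYTES = 8        # each of the two buses is 8 bytes wide
SYSTEM_BUS_MB_PER_S = 533  # system bus figure quoted on the slide


def l2_bus_mhz(core_mhz):
    """The Pentium II L2 cache bus runs at half the core frequency."""
    return core_mhz / 2


def peak_bandwidth_mb_per_s(core_mhz):
    """System bus bandwidth plus 8 bytes per L2 bus cycle."""
    return SYSTEM_BUS_MB_PER_S + BUS_WIDTH_BYTES * l2_bus_mhz(core_mhz)


for core in (233, 300, 450):
    print(f"{core} MHz core: L2 bus {l2_bus_mhz(core):.1f} MHz, "
          f"peak {peak_bandwidth_mb_per_s(core):.0f} MB/s")
```

For the 300 MHz part, the L2 bus runs at 150 MHz and contributes 1200 MB/s, giving a combined peak of 1733 MB/s.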
Intel SpeedStep Technology
• SpeedStep was first seen helping to preserve battery life in Pentium III notebook computers by reducing the speed (and hence the power drain) of the processor when it had less work to do.
• The design allows notebooks to drop from 600 MHz or 650 MHz to 500 MHz when running on battery power.
• Enhanced Intel SpeedStep Technology (EIST) dynamically scales the speed of the processor between its default clock setting and a minimum speed (2.8 GHz at the time of writing), based on how much CPU performance is needed at that moment.