More on processors - La Salle University

More About Processors
CSIT 301 (Blum)
Pentium 4 Processor Specs
The above list of processor specifications includes such
aspects as
• CPU Speed, Bus Speed, Manufacturing Technology, Stepping,
Cache Size, Package Type
CPU Speed
• The activities of the processor are kept in sync by
the clock.
• The clock goes through a regular/repetitive action.
In a binary system, a cycle consists of a 1 and a 0
(a high followed by a low).
• The clock is usually a quartz oscillator that is
external to the microprocessor.
• So the CPU speed is not something built into the
chip, but rather the maximum rate at which the
chip can be expected to perform normally.
CPU Speed (Cont.)
• Sometimes differently rated chips are made
from the same manufacturing process, and
the CPU speed is determined by some
testing after the fact.
• Some people try to operate the processor
faster than the designated rate. This is
known as “overclocking.”
CPU Speed (Cont.)
• The speed is measured in Hertz, which are
cycles per second.
– Kilohertz, kHz, is thousands (10^3) of cycles per
second
– Megahertz, MHz, is millions (10^6) of cycles
per second
– Gigahertz, GHz, is billions (10^9) of cycles per
second
– What’s next?
CPU Speed (Cont.)
• The clock speed is also known as the clock’s
frequency (the number of cycles per second).
• A related quantity is the period, which is the
time required for one cycle (a.k.a. a clock tick).
• A clock’s frequency and period are reciprocals.
– f = 1/T or T = 1/f, where f is frequency and T is period
– E.g. a frequency of 60 Hertz (cycles per second)
corresponds to a period of 1/60 ≈ 0.0167 seconds per
cycle
CPU Speed (Cont.)
• A frequency of 1 kHz [a thousand cycles per second]
corresponds to a period (tick) of 1 millisecond (ms) [a
thousandth (10^-3) of a second per cycle].
• A frequency of 1 MHz [a million cycles per second]
corresponds to a period (tick) of 1 microsecond (µs) [a
millionth (10^-6) of a second per cycle].
• A frequency of 1 GHz [a billion cycles per second]
corresponds to a period (tick) of 1 nanosecond (ns) [a
billionth (10^-9) of a second per cycle].
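The reciprocal relationship between frequency and period can be checked with a few lines of Python. This is only a minimal sketch; the function names are made up for illustration and are not from the slides.

```python
def period_from_frequency(f_hz):
    """Return the period T = 1/f in seconds for a frequency in hertz."""
    if f_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / f_hz


def frequency_from_period(t_s):
    """Return the frequency f = 1/T in hertz for a period in seconds."""
    if t_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / t_s


# The frequencies used on the slides.
for label, f in [("60 Hz", 60.0), ("1 kHz", 1e3), ("1 MHz", 1e6), ("1 GHz", 1e9)]:
    print(f"{label}: period = {period_from_frequency(f):.3e} s")
```

Each factor of a thousand in frequency shrinks the period by a factor of a thousand, which is why kHz, MHz and GHz pair up with ms, µs and ns.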
Bus Speed
• There is a hierarchy of buses in a computer, but in
a discussion of processors, the buses of interest are
the front-side bus and the back-side bus.
• In early processors the CPU speed and bus speed
(and thus the speed of interactions with memory,
etc.) were the same. But a bottleneck (the von
Neumann bottleneck) arose because memory
speeds cannot keep up with processor speeds. And
so accessing the memory was holding the
processor back.
Front-side Bus (FSB)
• The Front-side Bus (a.k.a. the memory bus or
system bus) connects the processor to other parts
via the chipset.
• It allows communication between the processor
and main memory (RAM), the system chipset, PCI
devices, the AGP card, and other peripheral buses.
• When the “bus speed” is given as one of the
processor’s specs it refers to the front-side bus
speed.
The Northbridge
• A chipset is simply a group of chips that work
together to perform related functions.
• The Northbridge communicates with the
processor (using the FSB) and controls interaction
with memory, the PCI bus, and AGP.
• Northbridge’s partner in the chipset is the
Southbridge. The Southbridge handles the IO
functions.
– The Intel Hub Architecture (IHA) is replacing the
Northbridge/Southbridge chipset.
Backside Bus
• The back-side bus (a.k.a. the cache bus) connects
the processor to L2 cache. The term back-side bus
is reserved for cases in which the L2 cache is
packaged with the microprocessor.
– If the L2 cache is separate from the processor, the
front-side bus will connect the processor to the Level 2
cache.
• Cache (SRAM) operates faster than memory
(DRAM). The back-side bus operates at faster
speeds than the front-side bus; sometimes it runs
at the processor speed.
FSB Speeds
• The ratio between the CPU speed and bus speed is
a simple fraction.
– For example, a CPU speed of 3.2 GHz and bus speed
of 800 MHz has a ratio of 4.
• With the Pentium III, FSB speeds of 100 and
133 MHz became standard.
• That rate has been somewhat fixed for a few years
but what is changing is the amount of data
transferred each clock cycle.
• This is where one begins to talk of “DDR” or
“quad-pumped.”
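The CPU-to-bus ratio mentioned above is plain division. The sketch below uses Python's `Fraction` so the result stays a simple fraction. The 3.2 GHz / 800 MHz pair comes from the slide; the 1.5 GHz / 400 MHz pair is an assumed extra example, and `clock_ratio` is a name invented here.

```python
from fractions import Fraction


def clock_ratio(cpu_mhz, bus_mhz):
    """Return the CPU speed divided by the bus speed as an exact fraction."""
    return Fraction(cpu_mhz, bus_mhz)


print(clock_ratio(3200, 800))   # slide example: 3.2 GHz CPU, 800 MHz bus
print(clock_ratio(1500, 400))   # assumed example: 1.5 GHz CPU, 400 MHz bus
```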
Edge-triggering
• The clock keeps the various circuit elements
working in unison.
• Elements are typically designed to be active on the
“edge” of the clock – either
– when it is rising (the positive edge)
– Or when it is falling (the negative edge)
• Edge triggering is more precise than level activation,
where the action takes place whenever the clock has a
certain state or level (e.g. when the clock is high).
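To see the difference between edges and levels, one can treat the clock as a list of sampled 0/1 values. The following is only an illustrative sketch; the sample data and the name `find_edges` are assumptions, not part of the slides.

```python
def find_edges(samples):
    """Return (rising, falling): sample indices where the clock changes value."""
    rising, falling = [], []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev == 0 and cur == 1:
            rising.append(i)      # positive edge: low -> high
        elif prev == 1 and cur == 0:
            falling.append(i)     # negative edge: high -> low
    return rising, falling


clock = [0, 1, 1, 0, 0, 1, 1, 0]          # two full cycles, sampled coarsely
rising, falling = find_edges(clock)
high_level = [i for i, v in enumerate(clock) if v == 1]

print("rising edges at", rising)
print("falling edges at", falling)
print("clock high at", high_level)
```

An edge is a single instant per cycle, while the "high" level covers a whole stretch of samples, which is why edge-triggered elements have a sharper notion of when to act.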
DDR
• Double Data Rate (DDR) allows data to be
fetched on both the positive and negative
edges of the clock.
– Thus it is essentially the equivalent of doubling
the clock rate.
– E.g. a 100 MHz DDR bus transfers as much data as a
200 MHz SDR (single data rate) bus.
Quad pumped
• A quad pumped bus allows four signals to be
communicated per clock cycle. This is sometimes
called QDR (Quad Data Rate).
• The Pentium 4 uses a quad-pumped FSB.
– The 400MHz FSB is a 100MHz bus with four signals
per cycle.
– The 533MHz FSB is a quad-pumped 133MHz bus.
• Quad pumping is one of the features of the
Pentium 4 NetBurst micro-architecture.
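The quoted "bus speeds" for DDR and quad-pumped buses are the base clock multiplied by the number of transfers per cycle. A minimal sketch of that arithmetic follows; `effective_rate_mhz` is a hypothetical helper name. The nominal "133 MHz" clock is really 133.33 MHz, which is why four transfers per cycle is marketed as 533 rather than 532.

```python
def effective_rate_mhz(base_clock_mhz, transfers_per_cycle):
    """Return millions of transfers per second for a multi-pumped bus."""
    return base_clock_mhz * transfers_per_cycle


SDR, DDR, QDR = 1, 2, 4   # transfers per clock cycle

print(effective_rate_mhz(100, DDR))               # 100 MHz DDR
print(effective_rate_mhz(100, QDR))               # quad-pumped 100 MHz bus
print(round(effective_rate_mhz(400 / 3, QDR)))    # quad-pumped 133.33 MHz bus
```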
Manufacturing Technology
• The next specification found in the table is
manufacturing technology, which indicates the
size of the components (mainly transistors) and
thus the number of components that can be
placed on the chip.
• In earlier microprocessors, one used terms like
large-scale integration (LSI), very large-scale
integration (VLSI) and ultra large-scale
integration (ULSI).
– But as Moore’s Law continued to hold true, we ran out
of adjectives.
• Today the manufacturing technology is given in
terms of microns or nanometers (e.g. the 0.13-micron
or the 90-nm technology).
– A micron is a millionth of a meter (10^-6 m), so 0.13 micron is 130 nm.
– A nanometer (nm) is a billionth of a meter (10^-9 m).
• The same chip may be made using different
technologies; this is done to perfect the
newer technology so that more components can be
added to later chips.
Stepping
• As with software, mistakes (errata) in hardware
are found and revisions are needed. However,
hardware mistakes are more difficult to fix.
• The stepping refers to various fixes, so one wants
a higher stepping which presumably has fewer
bugs.
– AMD uses the term “revision number.”
• Although the circuitry cannot be changed on an existing
chip, it may be possible to work around a processor
bug by updating the BIOS, which can be rewritten
(flashed).
Pentium 4 Product Information
Document on Specification Update
(Stepping Levels)
Cache size
Cache
• Recall that there are three levels of cache
(L1, L2 and L3) associated with the
processor.
• The cache specification on the previous
slide refers to L2 cache.
• A more detailed set of specifications will
reveal the amount of L1 and L2 as well as
the amount of L3 that can be supported.
Package Type
Form Factor and Package
• The term form factor applies to many
devices including processors. It refers to
their size and shape. And in the case of
processors it also includes how they connect
to the motherboard.
– The motherboard has a slot or socket.
• A related term is the “package” — an
enclosure for a chip (integrated circuit).
Pinning
The pins or leads are how a
chip interfaces with the
outside world.
There are various ways to
arrange the pins on a chip.
Furthermore, several chips
can be brought together
into a unit called a module
(common in memory).
PGA/DIP/SIP
• PGA: pin grid array, chip in which the
pins are located on the bottom in concentric
squares.
– Used in some microprocessors.
• DIP: dual in-line package, rectangular
chip with two rows of pins, one on each
side.
• SIP: single in-line package, chip with pins
protruding from one side
SEPP
An out-dated processor
packaging scheme.
• Single-Edge Processor
Package
• With the S.E.P.P. form
factor, the processor is not
completely covered by the
black plastic (as in
S.E.C.C. and S.E.C.C.2).
• The circuit board
(substrate) can be seen
from the bottom side.
SECC
Another out-dated processor
packaging scheme.
• Single Edge Contact
Cartridge
• With the S.E.C.C. form
factor, processors have a
plastic shroud covering
with an active heatsink
and fan.
• Identifiable by the
goldfinger contacts which
in this case are inside of
the plastic housing.
Heat
• Recall that in the history of processors the number
of transistors continues to grow (Moore’s Law)
while the relative size of the chip stays fixed.
With more transistors carrying current, more
heat is produced.
• Various developments have occurred to deal with
the issue of heat. One is a reduction in the
working voltage (5 V → 3.3 V → 2 V). Another has
been the introduction of the heatsink and fan.
Heat Sink
• The computer has had a fan for some time to deal
with heat. But starting with the 486, the processor
needed special consideration.
• A heat sink is an element designed to take heat
away from the processor.
• In this case, heat is dissipated mainly via
convection: the heat is transferred to the nearby air
and is carried away with the air as it moves.
– Convection is why a breeze feels nice on a hot summer
day.
Desired Effects
• A heat sink should have a large surface area since
this is where the heat is transferred to the air.
• But the heat sink should not block the air flow
since this is how the heat is carried away.
• Heat sinks often have very strange shapes to try to
maximize these two competing effects.
– Typically made of aluminum
– May have “fins”
Heat Sinks
Passive and Active
• All modern processors have a heat sink. Some also
require a fan.
– Without a fan: passive heat sink
– With a fan: active heat sink
• Because the heat sink’s purpose is to dissipate
heat, it is important that the heat can get from the
processor to the heat sink. The material “gluing”
the heat sink to the processor must conduct heat
well.
• A heat slug is a piece of metal that connects the
processor core to the processor package and/or
heatsink.
SECC2
• As with SECC, SECC2
processors
have a plastic housing
with an active heatsink
(that is, one with a fan).
• It is distinct from
SECC in that the
goldfinger contacts are
exposed.
PPGA
• Plastic Pin Grid Array
• With PPGA the processors
have pins arranged in a
square pattern. They fit
into Socket 370
motherboards.
• Look for the square
pattern (Pin Grid Array)
on the bottom.
• Slot connectors do not
have pins.
FC-PGA
• Flip-Chip Pin Grid
Array
• The chip is designed so
that the “core” processor,
which is the part that gets
the hottest, is on top
(closer to the heat sink).
• Also fits into a Socket 370
motherboard, but it must
be an FC-PGA-compliant
motherboard for an FC-PGA
processor to work.
Pentium 4 Form Factors
• Pentium 4s also come in an FC-PGA form factor.
– The package uses 478 pins, which are 2.03 mm long
and 0.32 mm in diameter.
• FC-BGA (Flip Chip Ball Grid Array)
– Instead of pins, FC-BGA uses small balls, which act as
contacts for the processor. Pins bend; balls don’t.
– The package uses 479 balls, which are 0.78 mm in
diameter.
The LGA
• "Intel’s new LGA, or Land Grid Array, 775 processor
socket takes a step away from traditional implementations
in that the package no longer features pins, rather the
bottom of the LGA 775 processors only have small gold
contacts. With the LGA package, Intel has moved the pins
into the bottom portion of the processor socket, something
that will make installation of the processor easier in that
there is no need to watch for bent pins on the
package...although it will make it more difficult as well.
You no longer need to worry about bent or damaged pins
on the processor, rather now you have to worry twice as
much about bent pins within the processor socket itself."
• http://rootprompt.org/article.php3?article=7115
The previous specifications differentiated one
Pentium 4 from another. Now let us look at some of
the features that differentiate the Pentium 4 from
other Intel microprocessors.
Micro-architecture
• A processor’s architecture refers to its instruction
set, the number and type of registers, and
memory-resident data structures (e.g. stacks) that are available
to a programmer (at least at the assembly level).
• A processor’s micro-architecture refers to the
hardware implementation of the architecture (the
transistors).
• Backward compatibility is within the architecture
(which is more of a logical level). The
micro-architecture (implementation) may change
dramatically and is not necessarily compatible with
previous versions.
NetBurst Micro-architecture
• Features of the Pentium 4’s NetBurst micro-architecture include:
– Hyper Pipelined Technology
– Improved Branch Prediction
– Level 1 Execution Trace Cache
– Rapid Execution Engine
– 400 or 533 MHz System Bus (quad pumping)
• Actually even faster now
NetBurst
Pipelining
• Recall that to execute an instruction, one must
fetch it, decode it, fetch any data required, execute
the instruction, write the result to the appropriate
place and possibly check for any interrupt requests
that occurred during the previous steps.
• In pipelining a processor can begin executing a
second instruction before the first has been
completed.
• Thus, many instructions are in the pipeline at the
same time, though at various processing stages.
• The pipeline is divided into segments. Each
segment can perform its duty at the same time as
the other segments.
• When a segment completes its task, it passes the
result to the next segment and fetches the next
operation from the preceding segment.
• Once a feature of only high-end processors, now
pipelining is standard.
– A Pentium had up to six instructions in the pipeline.
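The benefit of overlapping instructions can be estimated with a simple cycle count. The sketch below assumes an idealized pipeline in which every stage takes one clock cycle and nothing ever stalls. The function names and the five-stage, 100-instruction example are illustrative assumptions.

```python
def cycles_unpipelined(n_instructions, stages):
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * stages


def cycles_pipelined(n_instructions, stages):
    """The first instruction fills the pipeline; after that one finishes per cycle."""
    if n_instructions == 0:
        return 0
    return stages + n_instructions - 1


n, k = 100, 5   # assumed workload: 100 instructions, 5 pipeline segments
slow = cycles_unpipelined(n, k)
fast = cycles_pipelined(n, k)
print(f"unpipelined: {slow} cycles, pipelined: {fast} cycles, "
      f"speedup about {slow / fast:.2f}x")
```

For long instruction streams the speedup approaches the number of segments, which is the motivation for deeper pipelines.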
Hyper-Pipelined Technology
• Pentium 4’s Hyper-pipelined technology
uses a 20-stage pipeline.
• Having so many instructions in the works
can be a problem if the program branches
and one has the wrong instructions in the
pipeline.
• For long pipelines to be effective there must
be good “branch prediction.”
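Why a long pipeline depends on good branch prediction can be sketched with a rough cost model. The numbers below are made up for illustration, and the model assumes that each wrong guess throws away the work in all earlier stages (stages minus one cycles). Real processors, including the Pentium 4, behave in more complicated ways.

```python
def pipeline_cycles(n_instructions, stages, n_branches=0,
                    mispredict_rate=0.0, flush_penalty=None):
    """Estimate cycles for an idealized pipeline with branch mispredictions."""
    if flush_penalty is None:
        # Assumption: a misprediction discards everything behind the branch.
        flush_penalty = stages - 1
    ideal = stages + n_instructions - 1 if n_instructions > 0 else 0
    wasted = n_branches * mispredict_rate * flush_penalty
    return ideal + wasted


# Hypothetical workload: 1000 instructions, 200 of them branches, 20 stages.
for rate in (0.10, 0.05):
    total = pipeline_cycles(1000, 20, n_branches=200, mispredict_rate=rate)
    print(f"misprediction rate {rate:.0%}: about {total:.0f} cycles")
```

In this toy model, halving the misprediction rate removes half of the wasted cycles, and the penalty per miss grows with the pipeline depth.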
BPU
• The Pentium 4’s Branch Prediction Unit
(BPU) is about 33% more efficient than that
of the Pentium III at predicting the
instruction one needs to line up.
• The improved BPU is part of what Intel
calls “Advanced Dynamic Execution.”
Rapid Execution Engine
• The Pentium 4 has two Arithmetic Logic
Units (ALUs) clocked at twice the core
processor frequency.
• This allows basic integer instructions such
as Add, Logical AND, etc. to execute in half
of a clock cycle.
• E.g. the Rapid Execution Engine on a 1.50
GHz Pentium 4 processor runs at 3 GHz.
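The arithmetic behind the double-clocked ALUs is short enough to check directly. In this sketch the helper names are hypothetical; the 1.50 GHz figure is the one from the slide.

```python
def alu_clock_ghz(core_ghz):
    """The Rapid Execution Engine ALUs run at twice the core frequency."""
    return 2 * core_ghz


def simple_op_time_ns(core_ghz):
    """Time for one simple integer operation: one ALU tick, i.e. half a core cycle."""
    return 1.0 / alu_clock_ghz(core_ghz)   # period in ns = 1 / (frequency in GHz)


core = 1.5
print(f"ALU clock: {alu_clock_ghz(core)} GHz")
print(f"core cycle: {1.0 / core:.3f} ns, simple op: {simple_op_time_ns(core):.3f} ns")
```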
Level 2 Advanced Transfer Cache
• L2 Advanced Transfer Cache (ATC) yields
a higher throughput between L2 cache and
processor.
– 256 KB in the 0.18 micron technology
– 512 KB in the 0.13 micron technology
– 1 MB in the 90-nm technology
– Now up to 2 MB
Manufacturing Technology and
Cache Correlation
If we can put more on the chip, one thing we will choose
to put on is more cache.
Advanced Transfer Cache
• Features of the ATC include:
– Non-Blocking, full speed, on-die level 2 cache
– 8-way set associativity
• We will explain that when we cover cache
– 512-bit or 256-bit data bus to the level 2 cache
– data clocked into and out of the cache every clock
cycle.
– The Data Prefetch Logic anticipates the data needed by
an application and pre-loads it into the Advanced
Transfer Cache, further increasing processor and
application performance.
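A wide data bus that moves data every clock cycle implies a simple upper bound on cache bandwidth: bus width in bytes times clock frequency. The sketch below computes that theoretical peak. The 1.5 GHz clock is an assumed example value and `peak_bandwidth_gb_per_s` is a name invented here.

```python
def peak_bandwidth_gb_per_s(bus_width_bits, clock_ghz):
    """Peak transfer rate in GB/s, assuming one full-width transfer per clock cycle."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_ghz   # bytes/cycle * 10^9 cycles/s = GB/s


for width in (256, 512):
    rate = peak_bandwidth_gb_per_s(width, 1.5)
    print(f"{width}-bit bus at 1.5 GHz: up to {rate:.0f} GB/s")
```

This is a ceiling, not a measurement; sustained throughput depends on how often the cache actually has useful data to deliver.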
Level 1 Execution Trace Cache
• Along with an 8-KB data cache, the Pentium 4 has
a 12-KB Execution Trace Cache that stores
decoded micro-instructions in the order of
program execution.
• Caching decoded micro-instructions saves on the
instruction decoding portion of execution.
• Storing them in execution order speeds things up
and prevents one from having to store instructions
that are “jumped over”.
Enhanced Floating Point and
Multimedia Unit
• The Pentium 4 has an expanded 128-bit
floating-point register and an additional
register for data movement.
• It improves performance on floating-point
operations and multimedia applications.
Internet Streaming SIMD
Extensions
• SSE is an acronym within an acronym: It stands
for Streaming SIMD Extensions, where SIMD is
Single Instruction Multiple Data
• SSE consists of 70 SIMD instructions for integer
and floating-point operations. It helps with high
resolution images, audio and video viewing,
speech recognition etc.
• Pentium 4 actually uses SSE2.
• SSE2 adds 144 new instructions.
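The idea of Single Instruction Multiple Data can be imitated in ordinary Python by counting how many "instructions" an element-wise addition needs. This is only a conceptual sketch: it does not use real SSE instructions, and it assumes four 32-bit floats per 128-bit register. The function names are invented for illustration.

```python
LANES = 4   # assumption: four 32-bit floats fit in one 128-bit register


def scalar_add(a, b):
    """Add element by element: one 'instruction' per pair of values."""
    out, ops = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        ops += 1
    return out, ops


def simd_add(a, b, lanes=LANES):
    """Add in groups of `lanes` values: one 'instruction' per group."""
    out, ops = [], 0
    for i in range(0, len(a), lanes):
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        ops += 1
    return out, ops


a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [10.0] * 8
scalar_result, scalar_ops = scalar_add(a, b)
simd_result, simd_ops = simd_add(a, b)
print(f"same result: {scalar_result == simd_result}")
print(f"scalar instructions: {scalar_ops}, SIMD instructions: {simd_ops}")
```

The results are identical; only the number of instructions issued differs, which is where the gains for images, audio and video come from.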
Hyperthreading
Special Compiler
• A compiler is a software tool that takes raw
source code and converts (or compiles) it
into a machine language a computer can
understand.
• Intel® compilers have additional features
that make code run more efficiently and
take advantage of Intel® NetBurst™
architecture.
VTune
• The Intel® VTune™ Performance Analyzer
is used to determine how software performs
when run on a specific processor such as the
Intel® Pentium® 4 processor. Software
developers can then optimize their software
to utilize a processor's features such as
SSE2.
References
• PC Hardware in a Nutshell, Thompson and
Thompson
• http://www.webopedia.com
• http://www.intel.com
• http://www.anandtech.com
• http://www.mbreview.com/lga775.php