TB/AMP/2010
The Pentium Microprocessors
The Pentium microprocessor signals an improvement to the architecture found in the 80486
microprocessor. The changes include an improved cache structure, a wider data bus width, a
faster numeric coprocessor, a dual integer processor, and branch prediction logic. The cache
has been reorganized to form two caches that are each 8K bytes in size, one for caching
data, and the other for instructions. The data bus width has been increased from 32 bits to 64
bits. The numeric coprocessor operates about five times faster than the 80486 numeric
coprocessor. A dual-integer processor often allows two instructions per clock. Finally, the
branch prediction logic allows programs that branch to execute more efficiently. Notice that
these changes are internal to the Pentium, which makes software upward-compatible from
earlier Intel 80X86 microprocessors. A later improvement to the Pentium was the addition
of the MMX instructions.
SALIENT FEATURES OF 80586 (PENTIUM)
A salient feature of Pentium is its superscalar, superpipelined architecture. It has two integer
pipelines U and V, where each one is a 5-stage pipeline. This enhances the speed of integer
arithmetic of Pentium to a large extent. Moreover, it has an on-chip floating-point unit,
which has increased the floating-point performance manifold compared to the floating-point performance of the 80386/486 processors.
Another feature of Pentium is that it contains two separate caches, viz. data cache and
instruction cache. In 80486 there was a single unified data/instruction cache.
The Intel CPU architectures up to the 80486 issue only one instruction to the execution unit per
cycle. This leads to comparatively slow decoding and execution. To
raise processor performance beyond one instruction per cycle, computer
architects employ the technique of multiple instruction issue (MII). A microprocessor
that is capable of issuing more than one instruction per processor cycle is termed
an MII microprocessor. Obviously, to execute more than one instruction in a cycle, the
microprocessor must have more than one execution channel. Thus there are two problems, viz.
(a) how to issue multiple instructions, and (b) how to execute them concurrently. Keeping in
view these two issues, MII architectures may be divided into two classes:
(i) Very Long Instruction Word (VLIW) architecture and (ii) superscalar architecture.
Fig. 5.1 Pentium CPU Architecture
In VLIW processors, the compiler reorders the sequential stream of code that is coming
from memory into a fixed size instruction group and issues them in parallel for execution.
On the other hand, in superscalar architecture the hardware decides which instructions are to
be issued concurrently at run time.
The Pentium CPU is based on superscalar architecture. The hardware, in case of the
superscalar architecture like Pentium, becomes enormously complex because in such a
processor multiple instructions have to be issued in each cycle to the execution unit.
Another important concept involved here is that of pipelining. Pipelining has been
implemented in all the processors from the 8086 onwards, at least in the limited sense that instructions
are prefetched and stored in a queue.
Superscalar Execution
The salient feature of Pentium is that it supports superscalar architecture. For execution of
multiple instructions concurrently, Pentium microprocessor issues two instructions in
parallel to the two independent integer pipelines known as U and V pipelines. Each of these
two pipelines has 5 stages, as shown in Fig. 5.2. These pipeline stages are similar to those
in the 80486 CPU. The functions of these stages are presented in brief below.
Fig. 5.2 Superscalar Organisation
1. In the prefetch stage of the pipeline, the CPU fetches the instructions from the
instruction cache, which stores the instructions to be executed. In this stage, the CPU
also aligns the codes appropriately. This is required since the instructions are of
variable length and the initial opcode bytes of each instruction should be
appropriately aligned. After the prefetch stage, there are two decode stages D1 and
D2.
2. In the D1 stage, the CPU decodes the instruction and generates a control word. For
simple RISC-like instructions involving register data transfer or arithmetic and
logical operations, a single control word may be sufficient to start
the execution. However, the x86 architecture also supports complex CISC
instructions that require microcoded control sequencing.
3. Thus a second decode stage D2 is required where the control word from D1 stage is
again decoded for final execution. Also the CPU generates addresses for data
memory references in this stage.
4. In the execution stage, known as E stage, the CPU either accesses the data cache for
data operands or executes the arithmetic/logic computations or floating-point
operations in the execution unit.
5. In the final stage of the five stage pipeline, which is the WB (writeback) stage, the
CPU updates the registers’ contents or the status in the flag register depending upon
the execution result.
Although, as mentioned, the Pentium pipeline structure is similar to the 80486
pipeline structure, the Pentium achieves considerable speed-up by integrating additional hardware in
each pipeline stage. Thus, while the 80486 may take two clock cycles to decode some
instructions, the Pentium takes only one.
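As a rough illustration of why the five-stage pipeline and dual issue matter, the following sketch counts clock cycles for an idealised pipeline. It assumes no stalls and that every pair of instructions can issue together, which real Pentium code does not always achieve. The function name `cycles` and its parameters are illustrative only.

```python
import math

STAGES = 5  # PF, D1, D2, E, WB


def cycles(n_instructions: int, issue_width: int = 1) -> int:
    """Clocks needed to finish n instructions in an ideal pipeline with no stalls."""
    if n_instructions == 0:
        return 0
    # The first group needs STAGES clocks; each later group completes one clock later.
    groups = math.ceil(n_instructions / issue_width)
    return STAGES + groups - 1


print(cycles(100, 1))  # single pipeline: 104
print(cycles(100, 2))  # U and V pipelines with ideal pairing: 54
```

Under these assumptions the dual pipeline approaches, but never quite reaches, twice the throughput of a single pipeline.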
Separate Code and Data Cache
Unlike the 80486 microprocessor’s unified code/data cache of 8K bytes, the Pentium has
two separate 8K-byte caches for code and data. From the fundamental principles
of cache operation, one may observe that a unified cache, as in the 80486, will generally have a
higher hit ratio than two separate caches. Why, then, does the Pentium use separate
caches? The answer probably lies in the fact that the superscalar organisation
demands more bandwidth than a unified cache can provide. Moreover, separate caches
support branch prediction more effectively.
The Memory System
The memory system for the Pentium microprocessor is 4G bytes in size, just as in the
80386DX and 80486 microprocessors. The difference lies in the width of the memory data
bus. The Pentium uses a 64-bit data bus to address memory organized in eight banks that
each contain 512M bytes of data. Figure 5.3 shows the organization of the Pentium physical
memory system.
The Pentium memory system is divided into eight banks that each stores a byte of data with
a parity bit. The Pentium, like the 80486, employs internal parity generation and checking
logic for the memory system’s data bus information. (Note that most Pentium systems do
not use parity checks, but they are available.) The 64-bit wide memory is important to double-precision floating-point data. Recall that a double-precision floating-point number is 64 bits
wide. Because of the change to a 64-bit wide data bus, the Pentium is able to retrieve
floating-point data with one read cycle, instead of two as in the 80486. This causes the
Pentium to function at a higher throughput than an 80486. As with earlier 32-bit Intel
microprocessors, the memory system is numbered in bytes from byte 00000000H to byte
FFFFFFFFH.
Fig. 5.3. The 8-byte wide memory banks of the Pentium microprocessor.
Memory selection is accomplished with the bank enable signals (BE7 through BE0). These
separate memory banks allow the Pentium to access any single byte, word, doubleword, or
quadword with one memory transfer cycle. As with earlier memory selection logic, we often
generate eight separate write strobes for writing to the memory system.
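The bank selection described above can be sketched as a small calculation. This is a minimal model, assuming the access lies within one 8-byte-wide row of memory; it returns a mask in which bit n stands for bank n, whereas the real bank enable pins are active-low. The function name `bank_enables` is hypothetical.

```python
def bank_enables(address: int, size: int) -> int:
    """Return an 8-bit mask; bit n set means bank n (signal BEn) is selected."""
    offset = address & 7  # byte position within the 8-byte-wide memory
    if offset + size > 8:
        # Assumption of this sketch: such an access needs a second transfer.
        raise ValueError("access crosses a quadword boundary")
    return ((1 << size) - 1) << offset


print(f"{bank_enables(0x1000, 8):08b}")  # quadword: 11111111, all eight banks
print(f"{bank_enables(0x1005, 2):08b}")  # word at byte offset 5: 01100000
```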
A new feature added to the Pentium is its capability to check and generate parity for the
address bus (A31-A5) during certain operations. The AP pin provides the system with
parity information and the APCHK pin indicates a bad parity check for the address bus. The
Pentium takes no action when an address parity error is detected. The error must be assessed
by the system and the system must take appropriate action (an interrupt), if so desired.
The Pentium can function with a 32-bit wide memory system by using a set of bidirectional buffers as a multiplexer to convert the Pentium’s 64-bit data
bus into a 32-bit data bus. Care must be taken when using this arrangement because
software could access a doubleword that crosses the boundary between the lower and upper
halves of the data bus. All doublewords must be stored at doubleword boundaries. Note that
a doubleword boundary is an address that is divisible by 4.
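The alignment rule can be checked with simple arithmetic. The sketch below uses hypothetical helper names and only illustrates the divisibility test; it does not model the multiplexer itself.

```python
def is_doubleword_aligned(address: int) -> bool:
    """A doubleword boundary is an address divisible by 4."""
    return address % 4 == 0


def crosses_doubleword_boundary(address: int, size: int = 4) -> bool:
    """True if an access of `size` bytes spills past the next 4-byte boundary."""
    return (address % 4) + size > 4


for addr in (0x1000, 0x1002, 0x1004):
    print(hex(addr), is_doubleword_aligned(addr), crosses_doubleword_boundary(addr))
```

Of the three sample addresses, only 1002H is misaligned, so only that doubleword would straddle two 32-bit halves.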
Input/output System
The input/output system of the Pentium is completely compatible with earlier Intel
microprocessors. The I/O port number appears on address lines A15-A3 with the bank
enable signals used to select the actual memory banks used for the I/O transfer.
Beginning with the 80386 microprocessor, I/O privilege information is added to the task state
segment (TSS) when the microprocessor is operated in the protected mode. This allows I/O ports to be
selectively inhibited. If a blocked I/O location is accessed, the Pentium generates a type 13
interrupt to signal an I/O privilege violation.
Special Pentium Registers
The Pentium is essentially the same microprocessor as the 80386 and 80486, except that
some features have been added and the control register set has changed.
Control Registers
Figure 5.4 shows the control register structure for the Pentium microprocessor. Note that a
new control register CR4 has been added to the control register array.
Fig. 5.4. The structure of the Pentium control registers.
PG
Selects page table translation of linear addresses into physical addresses
when PG = 1. Page table translation allows any linear address to be assigned
any physical memory location.
CD
Cache disable controls the internal cache. If CD = 1, the cache will not fill
with new data for cache misses, but it will continue to function for cache hits.
If CD = 0, misses will cause the cache to fill with new data.
NW
Not write-through selects the mode of operation for the data cache. If NW =
1, the data cache is inhibited from cache write-through.
AM
Alignment mask enables alignment checking when set. Note that alignment
checking only occurs for protected mode operation when the user is at
privilege level 3.
WP
Write protect protects read-only user-level pages against supervisor-level write
operations. When WP = 1, the supervisor cannot write to read-only user-level pages; when WP = 0, it can.
NE
Numeric error enables standard numeric coprocessor error detection. If NE =
1, the FERR pin becomes active for a numeric coprocessor error. If NE = 0,
any coprocessor error is ignored.
ET
Selects the 80287 coprocessor when ET =0 or the 80387 coprocessor when
ET=1. This bit was installed because there was no 80387 available when the
80386 first appeared. In most systems, ET is set to indicate that an 80387 is
present in the system.
TS
Indicates that the microprocessor has switched tasks (in protected mode, changing the
contents of TR places a 1 into TS). If TS = 1, a numeric coprocessor
instruction causes a type 7 (coprocessor not available) interrupt.
EM
Is set to cause a type 7 interrupt for each ESC instruction. (ESCape
instructions are used to encode instructions for the 80387 coprocessor.) We
often use this interrupt to emulate, with software, the function of the
coprocessor. Emulation reduces the system cost, but it often requires at least
100 times longer to execute the emulated coprocessor instructions.
MP
Is set to indicate that the arithmetic coprocessor is present in the system.
PE
Is set to select the protected mode of operation. It may also be
cleared to re-enter the real mode. In the 80286, this bit could only be set: the
80286 could not return to real mode without a hardware reset, which
precluded its use in most systems that use protected mode.
VME
Virtual mode extension enables support for the virtual interrupt flag in
virtual 8086 mode. If VME = 0, virtual interrupt support is disabled.
PVI
Protected mode virtual interrupt enables support for the virtual interrupt flag
in protected mode.
TSD
Time stamp disable controls the RDTSC instruction.
DE
Debugging extension enables I/O breakpoint debugging extensions when set.
PSE
Page size extension enables 4M-byte memory pages when set.
MCE
Machine check enable enables the machine checking interrupt.
The Pentium contains new features that are controlled by CR4 and a few bits in CR0.
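To make the bit descriptions concrete, the sketch below decodes sample CR0 and CR4 values. The bit positions are not given in the text above; they are taken from Intel's documented register layout. The helper name `set_flags` and the sample values are illustrative assumptions.

```python
# Bit positions per Intel's documented layout (assumed here; not listed in the text).
CR0_BITS = {"PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
            "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31}
CR4_BITS = {"VME": 0, "PVI": 1, "TSD": 2, "DE": 3, "PSE": 4, "MCE": 6}


def set_flags(value: int, layout: dict) -> list:
    """Names of the bits that are 1 in `value`, lowest bit first."""
    return [name for name, bit in layout.items() if (value >> bit) & 1]


print(set_flags(0x80000011, CR0_BITS))  # ['PE', 'ET', 'PG']
print(set_flags(0x00000010, CR4_BITS))  # ['PSE']
```

The first sample describes a processor in protected mode with paging enabled; the second has 4M-byte pages switched on.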
EFLAG Register
The extended flag (EFLAG) register has been changed in the Pentium microprocessor.
Figure 5.5 shows the contents of the EFLAG register. Four new flag bits have been added
to this register to control or indicate conditions about some of the new features in the
Pentium.
Fig. 5.5. The structure of the Pentium EFLAG register.
Following is a list of the four new flags and the function of each:
ID
The identification flag is used to test for the CPUID instruction. If a program can set
and clear the ID flag, the processor supports the CPUID instruction.
VIP
Virtual interrupt pending indicates that a virtual interrupt is pending.
VIF
Virtual interrupt flag is the virtual image of the interrupt flag IF, used with VIP.
AC
Alignment check indicates the state of the AM bit in control register 0.
VM
Virtual mode flag. If this flag is set, the microprocessor enters the virtual 8086 mode within
the protected mode. It is to be set only when the microprocessor is in protected mode. In
this mode, if any privileged instruction is executed, an exception 13 is generated.
This bit can be set using the IRET instruction or any task switch operation only in
the protected mode.
RF
Resume flag. This flag is used with the debug register breakpoints. It is checked at
the start of every instruction cycle and, if it is set, any debug fault is ignored
during the instruction cycle. The RF is automatically reset after successful execution
of every instruction, except for the IRET and POPF instructions. Also, it is not
automatically cleared after the successful execution of JMP, CALL and INT
instructions causing a task switch. These instructions are used to set the RF to the
value specified by the memory data available at the stack.
NT
Nested Task Flag
IOPL
I/O privilege level
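The flags listed above can be picked out of an EFLAG value with shifts and masks. The bit positions used here come from Intel's documented EFLAGS layout rather than from the text, and the function name and sample value are hypothetical.

```python
# Bit positions per Intel's documented EFLAGS layout (assumed; not listed in the text).
EFLAG_BITS = {"NT": 14, "RF": 16, "VM": 17, "AC": 18, "VIF": 19, "VIP": 20, "ID": 21}


def decode_eflags(eflags: int) -> dict:
    """Extract the upper EFLAG bits and the two-bit I/O privilege level."""
    flags = {name: (eflags >> bit) & 1 for name, bit in EFLAG_BITS.items()}
    flags["IOPL"] = (eflags >> 12) & 0b11  # bits 12 and 13
    return flags


sample = 0x00203000  # illustrative value: ID set, IOPL = 3
print(decode_eflags(sample))
```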
Built-In Self-Test (BIST)
The built-in self-test (BIST) is accessed on power-up by placing a logic 1 on INIT while the
RESET pin changes from 1 to 0. The BIST tests 70 percent of the internal structure of the
Pentium in approximately 150 μs. Upon completion of the BIST, the Pentium reports the
outcome in register EAX. If EAX = 0, the BIST passed and the Pentium is ready for
operation. If EAX contains any other value, the Pentium has malfunctioned.
PENTIUM MEMORY MANAGEMENT
The memory-management unit within the Pentium is upward-compatible with the 80386
and 80486 microprocessors. Many of the features of these earlier microprocessors are
basically unchanged in the Pentium. The main change is in the paging unit and a new
system memory-management mode.
Paging Unit
The paging mechanism functions with 4K-byte memory pages or, with a new extension
available to the Pentium, with 4M-byte memory pages. As detailed in Chapters 1 and 17, the
size of the paging table structure can become large in a system that contains a large memory.
Recall that to fully repage 4G bytes of memory, the microprocessor requires slightly over
4M bytes of memory just for the page tables. In the Pentium, with the new 4M-byte paging
feature, this is dramatically reduced to just a single page directory, with no page tables. The new 4M-byte page sizes
are selected by the PSE bit in control register 4.
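The saving in paging structures can be worked out directly. The sketch below assumes 32-bit (4-byte) entries and 1024 entries per page directory or page table, which is the standard 80386-style layout; the function name is illustrative.

```python
ENTRY_BYTES = 4   # each directory or table entry is 32 bits
ENTRIES = 1024    # entries per page directory or page table


def paging_overhead(use_4m_pages: bool) -> int:
    """Bytes of paging structures needed to map the full 4G-byte space."""
    directory = ENTRIES * ENTRY_BYTES          # one 4K-byte page directory
    if use_4m_pages:
        return directory                        # directory entries map 4M pages directly
    tables = ENTRIES * ENTRIES * ENTRY_BYTES    # 1024 page tables of 4K bytes each
    return directory + tables


print(paging_overhead(False))  # 4198400 bytes: 4M plus 4K
print(paging_overhead(True))   # 4096 bytes
```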
The main difference between 4K paging and 4M paging is that in the 4M paging scheme
there is no page table field in the linear address. See Figure 5.6 for the 4M paging system in
the Pentium microprocessor. Pay close attention to the way the linear address is used with
this scheme. Notice that the leftmost 10 bits of the linear address select an entry in the page
directory (just as with 4K pages). Unlike 4K pages, there are no page tables; instead, the
page directory addresses a 4M-byte memory page.
Fig. 5.6 The linear address 00200001H repaged to memory location
01200001H in 4M-byte pages. Note that there are no page tables.
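A minimal sketch of 4M-byte translation follows. The mapping is hypothetical: it assumes page directory entry 0 points to a 4M-byte page starting at 01000000H, and it models the directory as a plain dictionary rather than real directory entries with attribute bits.

```python
def translate_4m(linear: int, page_directory: dict) -> int:
    """Translate a linear address using 4M-byte pages (no page tables)."""
    index = linear >> 22           # leftmost 10 bits select the directory entry
    offset = linear & 0x3FFFFF     # remaining 22 bits are the offset within the page
    return page_directory[index] + offset


# Hypothetical mapping: entry 0 points at the 4M-byte page starting at 01000000H.
directory = {0: 0x01000000}
print(f"{translate_4m(0x00200001, directory):08X}H")  # 01200001H
```

Because the offset is 22 bits wide, the low 22 bits of the linear address always pass through unchanged.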
Memory-Management Mode
The system memory-management mode (SMM) is on the same level as protected mode, real
mode, and virtual mode, but it is provided to function as a manager. The SMM is not
intended to be used as an application or a system-level feature. It is intended for high-level
system functions such as power management and security, which most Pentiums use during
operation.
Access to the SMM is accomplished via a new external hardware interrupt applied to the
SMI pin on the Pentium. When the SMM interrupt is activated, the processor begins
executing system-level software in an area of memory called the system management RAM,
or SMMRAM, which also holds the SMM state dump record. The SMI interrupt disables all other
interrupts that are normally handled by user applications and the operating system. A return
from the SMM interrupt is accomplished with a new instruction, RSM, which returns from the
memory-management mode interrupt to the interrupted program at the point of
the interruption.
The SMM interrupt calls the software, initially stored at memory location 38000H, using
CS=3000H and EIP = 8000H. This initial state can be changed using a jump to any location
within the first 1M byte of memory. An environment similar to real-mode memory
addressing is entered by the management mode interrupt, but it is different because, instead
of being able to address the first 1M of memory, SMM mode allows the Pentium to treat the
memory system as a flat, 4G-byte system.
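The entry address follows from ordinary real-mode style address formation, in which the segment value is shifted left four bits and added to the offset. The helper name below is illustrative.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Form a physical address as segment * 16 + offset."""
    return (segment << 4) + offset


print(f"{real_mode_address(0x3000, 0x8000):05X}H")  # 38000H
```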
In addition to executing software that begins at location 38000H, the SMM interrupt also
stores the state of the Pentium in what is called a dump record. The dump record is stored at
memory locations 3FFA8H through 3FFFFH, with an area at locations 3FE00H through
3FEF7H that is reserved by Intel. The dump record allows a Pentium-based system to enter
a sleep mode and reactivate at the point of program interruption. This requires that the
SMMRAM be powered during the sleep period. Many laptop computers have a separate
battery to power the SMMRAM for many hours during sleep mode.
The halt auto restart and I/O trap restart entries are used when the SMM mode is exited by the
RSM instruction. These data allow the RSM instruction to return to the halt state or return to
the interrupted I/O instruction. If neither a halt nor an I/O operation is in effect upon entering
the mode, the RSM instruction reloads the state of the machine from the state dump and
returns to the point of interruption.
The SMM mode can be used by the system before the normal operating system is placed in
the memory and executed. It can also periodically be used to manage the system, provided
that normal software doesn’t exist at locations 38000H through 3FFFFH. If the system relocates the
SMRAM before booting the normal operating system, it becomes available for use in
addition to the normal system.
The base address of the SMM mode SMRAM is changed by modifying the value in the state
dump base address register (locations 3FEF8H through 3FEFBH) after the first memory-management mode interrupt. When the first RSM instruction is executed, returning control
back to the interrupted system, the new value from these locations changes the base address
of the SMM interrupt for all future uses. For example, if the state dump base address is
changed to 000E8000H, all subsequent SMM interrupts use locations E8000H through EFFFFH
for the Pentium state dump. These locations are compatible with DOS and Windows.
PENTIUM II
Pentium II is also a 32-bit processor, with a 64-bit data bus and a 36-bit address bus to address
up to 64G bytes of physical memory space. It is essentially a Pentium Pro processor with on-chip
MMX (Multimedia Extension). It is available with internal clock ratings from 233 MHz
to 450 MHz.
The features of the Pentium II processor are:
(i) Supports the INTEL architecture with dynamic execution.
(ii) Integrated primary (L1) 16K-byte instruction cache and 16K-byte write-back data cache.
(iii) Integrated 256K-byte second-level (L2) cache.
(iv) Fully compatible with previous microprocessors.
(v) Supports MMX technology.
(vi) Quick start and Deep sleep modes provide extremely low power dissipation.
(vii) Low-power GTL+ processor system bus interface (GTL: Gunning Transceiver Logic).
(viii) Integrated math co-processor.
(ix) Integrated thermal diode for measuring processor temperature.
Pentium II Software Changes
The Pentium II microprocessor core is a Pentium Pro. This means that the Pentium II and
the Pentium Pro are essentially the same device for software. This section lists the changes
to the CPUID instruction and describes the SYSENTER, SYSEXIT, FXSAVE, and FXRSTOR
instructions (the only modifications to the software).
CPUID Instruction
Table 5.1 lists the values passed between the Pentium II and the CPUID instruction. These
are changed from earlier versions of the Pentium microprocessor.
The version information is returned in EAX after executing the CPUID instruction with a logic 1 in
EAX. The family ID is returned in bits 8 to 11, the model ID
in bits 4 to 7, and the stepping ID in bits 0 to 3. For the Pentium II, the family
ID is 6 and the model number is 3 or 5, depending on the version. The stepping number refers to an update number: the
higher the stepping number, the newer the version.
TABLE 5.1 CPUID instruction.
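The bit fields of the version information can be separated with shifts and masks. The sketch below uses the field positions given in the text; the function name and the sample EAX value are chosen purely for illustration.

```python
def decode_version(eax: int) -> dict:
    """Split the CPUID version value into family, model and stepping fields."""
    return {
        "family": (eax >> 8) & 0xF,    # bits 8 to 11
        "model": (eax >> 4) & 0xF,     # bits 4 to 7
        "stepping": eax & 0xF,         # bits 0 to 3
    }


print(decode_version(0x0633))  # {'family': 6, 'model': 3, 'stepping': 3}
```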
The features are indicated in the EDX register after executing the CPUID instruction with a
1 in EAX. Only two new features are returned in EDX for the Pentium II. Bit position 11
indicates whether the microprocessor supports the two new fast call instructions
SYSENTER and SYSEXIT. Bit position 23 indicates whether the microprocessor supports
the MMX instruction set. Bit 16 indicates whether the microprocessor supports
the page attribute table or PAT. Bit 17 indicates whether the microprocessor supports the
page size extension found with the Pentium Pro and Pentium II microprocessors. The page
size extension allows memory above 4G through 64G to be addressed. Finally, bit 24
indicates whether the fast floating-point save and restore instructions are implemented.
The remaining bits are identical to earlier versions of the microprocessor and are not described.
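The feature bits named above can be tested individually. In this sketch the dictionary keys are short labels chosen for readability, and the sample EDX value is made up to show three of the bits set.

```python
# Bit positions as given in the text; the short labels are illustrative.
FEATURE_BITS = {"SEP": 11, "PAT": 16, "PSE36": 17, "MMX": 23, "FXSR": 24}


def decode_features(edx: int) -> dict:
    """Map each feature label to True if its bit is set in EDX."""
    return {name: bool((edx >> bit) & 1) for name, bit in FEATURE_BITS.items()}


sample_edx = (1 << 11) | (1 << 23) | (1 << 24)  # hypothetical value
print(decode_features(sample_edx))
```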
SYSENTER and SYSEXIT Instructions
The SYSENTER and SYSEXIT instructions use the fast call facility introduced in the
Pentium II microprocessor. Please note that these instructions function only in ring zero
(privilege level 0) in protected mode. Windows operates in ring 0, but does not allow
applications access to ring 0. These new instructions are meant for operating system
software.
The SYSENTER instruction uses some of the model-specific registers to store CS, EIP, and
ESP to execute a fast call to a procedure defined by the model-specific register. The fast call
is different from a regular call because it does not push the return address onto the stack as a
regular call does. Table 5.2 lists the model-specific registers used with SYSENTER and
SYSEXIT. Note that the model-specific registers are read with the RDMSR instruction and
written with the WRMSR instruction.
TABLE 5.2 The model-specific registers used with SYSENTER and SYSEXIT.
To use the RDMSR or WRMSR instructions, place the register number in the ECX register.
If the WRMSR is used, place the new data for the register in EDX:EAX. For the
SYSENTER instruction, you need use only the EAX register, but place a zero into EDX. If
the RDMSR instruction is used, the data are returned in the EDX:EAX register pair.
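The EDX:EAX pairing simply splits a 64-bit value into two 32-bit halves, with EDX holding the upper half. The sketch below models only that arithmetic; it does not execute RDMSR or WRMSR, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def split_msr(value: int) -> tuple:
    """Split a 64-bit value into the (EDX, EAX) halves that WRMSR expects."""
    return (value >> 32) & MASK32, value & MASK32


def join_msr(edx: int, eax: int) -> int:
    """Rebuild the 64-bit value that RDMSR returns in EDX:EAX."""
    return ((edx & MASK32) << 32) | (eax & MASK32)


edx, eax = split_msr(0x0000000000001234)
print(hex(edx), hex(eax))  # 0x0 0x1234
```

A value that fits in 32 bits, as with the SYSENTER registers, leaves EDX at zero.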
To use the SYSENTER instruction, first load the address
of the system entrance point into the SYSENTER_CS and SYSENTER_EIP model-specific registers. This
would normally be the address of the operating system such as Windows or Windows NT.
Note that this instruction is meant as a system instruction to access code or software in ring
0. The stack segment register is loaded with the value placed into SYSENTER_CS plus 8.
In other words, the selector pair addressed by the SYSENTER_CS selector value is loaded
into CS and SS. The stack pointer ESP is loaded with the value placed into SYSENTER_ESP.
The SYSEXIT instruction loads CS and SS with the selector pair addressed by
SYSENTER_CS plus 16 and 24. Table 5.3 illustrates the selectors from the global descriptor
table, as addressed by SYSENTER_CS. In addition to the code and stack segment selector
and the memory segments that they represent, the SYSEXIT instruction passes the value in
EDX to the EIP register and the value in ECX to the ESP register. The SYSEXIT instruction
returns control back to application ring 3. As mentioned, these instructions appear to have
been designed for quick entrance and return from the Windows or Windows NT operating
systems on the personal computer.
TABLE 5.3 Selectors addressed by the SYSENTER_CS select value.
To use SYSENTER and SYSEXIT, the SYSENTER instruction must pass the return
address to the system. This is accomplished by loading the EDX register with the return
offset and by placing the segment address in the global descriptor table at location
SYSENTER_CS+16. The stack segment is transferred by loading the stack segment selector
into SYSENTER_CS+24 and the ESP into the ECX.
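The four selectors used by the fast call facility are fixed offsets from the SYSENTER_CS value, as described above. The following sketch just performs that arithmetic; the function name and the sample selector value 0008H are assumptions, and privilege-level bits in the selectors are ignored.

```python
def fast_call_selectors(sysenter_cs: int) -> dict:
    """Selectors derived from SYSENTER_CS for entry (ring 0) and exit (ring 3)."""
    return {
        "SYSENTER CS": sysenter_cs,
        "SYSENTER SS": sysenter_cs + 8,
        "SYSEXIT CS": sysenter_cs + 16,
        "SYSEXIT SS": sysenter_cs + 24,
    }


for name, selector in fast_call_selectors(0x0008).items():
    print(f"{name}: {selector:04X}H")
```

Because the offsets are fixed, the four descriptors must sit in consecutive slots of the descriptor table.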
FXSAVE and FXRSTOR Instructions
The last two new instructions added to the Pentium II microprocessor are the FXSAVE and
FXRSTOR instructions, which are almost identical to the FSAVE and FRSTOR
instructions. The main difference is that the FXSAVE instruction is designed to properly
store the state of the MMX machine, while the FSAVE properly stores the state of the
floating-point coprocessor. The FSAVE instruction stores the entire tag field, while the
FXSAVE instruction only stores the valid bits of the tag field. The valid tag field is used to
reconstruct the restore tag field when the FXRSTOR instruction executes. This means that if
the MMX state of the machine is saved, use the FXSAVE instruction; if the floating-point
state of the machine is saved, use the FSAVE instruction. For new applications, it is
recommended that the FXSAVE and FXRSTOR instructions should be used to save the
MMX state and floating-point state of the machine. Do not use the FSAVE and FRSTOR
instructions in new applications.
THE PENTIUM III
The Pentium III microprocessor is an improved version of the Pentium II microprocessor.
Even though it is newer than the Pentium II, it is still based on the Pentium Pro architecture.
There are two versions of the Pentium III. One version is available with a non-blocking
512K-byte cache and packaged in the slot 1 cartridge, and the other version is available with
a 256K-byte advanced transfer cache and packaged in an integrated circuit. The slot 1 version cache runs at half the processor speed, and the integrated-cache version runs at the
processor clock frequency. As shown in most benchmarks of cache performance, increasing
the cache size from 256K bytes to 512K bytes only improves performance by a few percent.
The salient architectural features are:
1. P-III CPU has been developed using 0.25 micron technology and includes over 9.5
million transistors. It has three versions operating at 450 MHz, 500 MHz and 550 MHz
which are commercially available.
2. P-III incorporates multiple branch prediction algorithms.
3. Seventy new instructions have been added to Pentium III. These instructions are useful in
advanced imaging, speech processing and multimedia applications.
4. Dual independent bus architecture increases bandwidth.
5. P-III employs dynamic execution technology, which has already been discussed.
6. A 512Kbyte unified, non-blocking level 2 cache has been used.
7. Eight 64-bit wide Intel MMX registers along with a set of 57 instructions for multimedia
applications are available.
Chip Sets
The chip set for the Pentium III is different from the Pentium II. The Pentium III uses an
Intel 810, 815, or 820 chipset. The 815 is most commonly found in newer systems that use
the Pentium III. A few other vendor chip sets are available, but problems with drivers for
new peripherals, such as the video cards, have been reported. An 840 chip set also was
developed for the Pentium III, but Intel does not make it available.
Bus
The Coppermine version of the Pentium III increases the bus speed to either 100 MHz or
133MHz. The faster version allows transfers between the microprocessor and the memory at
higher speeds. Suppose that a 1-GHz microprocessor uses a 133-MHz memory bus. You
might think that the memory bus speed could be faster to improve performance. However,
the connections between the microprocessor and the memory preclude using a higher speed
for the memory. If it is decided to use a 200-MHz bus speed, we must recognize that a
wavelength at 200 MHz is 300,000,000/200,000,000 or 3/2 meter. An antenna is 1/4 of a
wavelength. At 200 MHz, an antenna is 14.8 inches. We do not want to radiate energy at
200 MHz, so we need to keep the printed circuit board connections shorter than 1/4 wavelength. In practice, we would keep the connections to no more than 1/10 of 1/4 wavelength. This means that the connections in a 200-MHz system should be no longer than
1.48 inches. This size would present the main board manufacturer with a problem when
placing the sockets for a 200 MHz memory system.
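The trace-length estimate above can be reproduced with a few lines of arithmetic. The sketch uses the same rounded propagation speed as the text (300,000,000 m/s) and the same rule of thumb of one tenth of a quarter wavelength; the function name is illustrative.

```python
SPEED = 300_000_000        # metres per second, the rounded figure used in the text
INCHES_PER_METRE = 39.37


def max_trace_inches(frequency_hz: float) -> float:
    """One tenth of a quarter wavelength, in inches."""
    wavelength_m = SPEED / frequency_hz
    quarter_wave_in = wavelength_m / 4 * INCHES_PER_METRE
    return quarter_wave_in / 10


print(round(max_trace_inches(200e6), 2))  # 1.48
```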
It is possible to approach or even exceed the 200 MHz memory system, if we develop a new
technology for interconnecting the microprocessor, chipset, and memory. At present the
memory functions in bursts of four 64-bit numbers each time we read the main memory.
This burst of 32 bytes is read into the cache. The main memory requires 3 wait states at 100
MHz to access the first 64-bit number and then zero wait states for each of the three
remaining 64-bit wide numbers for a total of seven 100 MHz bus clocks. This means we are
reading data at 70 ns / 32 = 2.1875 ns per byte, which is a bus speed of 457M bytes per
second. This is slower than the clock on a 1GHz microprocessor, but because most
programs are cyclic and the instructions are stored in an internal cache, we can and often do
approach the operating frequency of the microprocessor.
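The burst figures quoted above follow from a short calculation. This sketch assumes, as the text does, that only the first transfer of a burst pays the wait states; the function name and parameter names are illustrative.

```python
def burst_bandwidth(bus_mhz: float, wait_states: int,
                    transfers: int = 4, bytes_per_transfer: int = 8) -> float:
    """Effective rate in millions of bytes per second for one burst read."""
    clocks = (1 + wait_states) + (transfers - 1)   # first transfer pays the wait states
    time_ns = clocks * 1000 / bus_mhz
    total_bytes = transfers * bytes_per_transfer
    return total_bytes / time_ns * 1000            # bytes per ns -> Mbytes per second


print(round(burst_bandwidth(100, 3)))  # 457
```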
PENTIUM IV
The most recent version of the Pentium Pro architecture microprocessor is the Pentium 4
microprocessor from Intel. The Pentium 4 was released initially in November 2000 with a
speed of 1.3 GHz. It is currently available in speeds up to 2.0 GHz. There are two packages
available for this integrated microprocessor, the 423-pin PGA and the 478-pin FC-PGA2.
Both versions use the 0.18 micron technology for fabrication. As with earlier versions of the
Pentium, the Pentium 4 uses a 100-MHz memory bus speed, but because it is quad pumped,
the bus speed can approach 400 MHz.
Memory Interface
The memory interface to the Pentium 4 typically uses the Intel 850 chipset. The 850
provides a dual-pipe memory bus to the microprocessor with each pipe interfaced to a 32-bit
wide section of the memory. The two pipes function together to comprise the 64-bit wide
data path to the microprocessor. Because of the dual pipe arrangement, the memory must be
populated with pairs of RDRAM memory devices operating at either 600 MHz or 800 MHz.
According to Intel, this arrangement provides a 300% increase in speed over a memory
populated with PC-100 memory.
Hyper Pipelined Technology
The Pentium 4 incorporates a deeper pipelined architecture than prior versions of the
Pentium microprocessor. Not only does it queue instructions for execution, but it also
queues microinstructions for execution in a special cache for the microprocessor core. This
special microinstruction cache is 12K bytes deep. This technology excludes the execution
unit from the main cache path to the microinstruction stream to increase performance.
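The idea of caching decoded microinstructions can be illustrated with a toy model. The sketch below is not a model of the actual Pentium 4 hardware: the class, the decode table, and the microinstruction strings are all hypothetical, and the cache has no capacity limit. It only shows that a loop is decoded once and then replayed from the cache.

```python
# Hypothetical decode table: each instruction maps to one or more micro-ops.
TOY_MICRO_OPS = {
    "inc [counter]": ("load t0, [counter]", "add t0, 1", "store [counter], t0"),
    "dec ecx": ("sub ecx, 1",),
    "jnz loop": ("branch_if_not_zero loop",),
}


def toy_decode(instruction):
    """Stand-in for the instruction decoder (illustrative only)."""
    return TOY_MICRO_OPS[instruction]


class MicroOpCache:
    """Toy cache of decoded micro-ops, keyed by instruction address."""

    def __init__(self, decode):
        self._decode = decode
        self._entries = {}
        self.decode_count = 0  # how many times the decoder was actually used

    def fetch(self, address, instruction):
        if address not in self._entries:
            # Miss: decode once and remember the micro-ops.
            self.decode_count += 1
            self._entries[address] = self._decode(instruction)
        return self._entries[address]


def run_loop(cache, program, iterations):
    """Replay a loop body and return every micro-op that was issued."""
    issued = []
    for _ in range(iterations):
        for address, instruction in program:
            issued.extend(cache.fetch(address, instruction))
    return issued


if __name__ == "__main__":
    loop_body = [(0, "inc [counter]"), (1, "dec ecx"), (2, "jnz loop")]
    cache = MicroOpCache(toy_decode)
    issued = run_loop(cache, loop_body, iterations=3)
    print(len(issued), "micro-ops issued,", cache.decode_count, "decodes")
```

In this toy run the decoder is used only during the first pass through the loop; later iterations are served entirely from the cache.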
RISC Architecture
The complexity of the instructions supported by CISC processors kept increasing as
more and more sophisticated processors were designed and marketed. This resulted in an
increase of processor die size to accommodate the large microcode required by the complex
instructions. The larger die in turn meant more cost, since it consumes more silicon. Also,
as the chip size increases, the power consumption increases, resulting in more heating of
the chip. This in turn requires more elaborate cooling arrangements.
If we use a processor that supports a set of simpler instructions, which do not require
complex decoding, then the design of the processor becomes simple, with an associated
reduction in cost and power consumption. Also, the execution of these instructions becomes
very fast.
As the name implies, the Reduced Instruction Set Computer, or RISC as it is popularly known,
is a type of architecture that utilizes a small, highly optimized set of instructions, rather
than the more specialized set of instructions often found in other types of architectures.
Typically every instruction is executed in a single clock after it is fetched and decoded, so
these instructions are executed very fast. In a CISC design, a lot of chip area is consumed
by microcode, area that could otherwise be used for enhanced features. Because a RISC chip
can be smaller, it is possible to produce more RISC processors per silicon wafer. This makes
RISC processors smaller and cheaper, with lower energy consumption.
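The single-clock claim is best read as a statement about throughput in a pipelined machine. A minimal sketch, assuming an ideal pipeline with no stalls or hazards and a hypothetical stage count of five: the first instruction needs one clock per stage, and each later instruction completes one clock after the previous one.

```python
def pipeline_cycles(instructions, stages):
    """Clocks needed by an ideal pipeline (no stalls, no hazards)."""
    if instructions < 1 or stages < 1:
        raise ValueError("instructions and stages must be at least 1")
    # Fill the pipeline once, then retire one instruction per clock.
    return stages + instructions - 1


def cycles_per_instruction(instructions, stages):
    """Average clocks per instruction (CPI) over the whole run."""
    return pipeline_cycles(instructions, stages) / instructions


if __name__ == "__main__":
    for count in (1, 10, 100, 1000):
        print(count, "instructions:", cycles_per_instruction(count, stages=5), "CPI")
```

As the instruction count grows, the average CPI approaches one, even though each individual instruction still spends several clocks in the pipeline.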
THE ADVANTAGES OF RISC
Implementing a processor with a simplified instruction set design provides several
advantages over implementing a comparable CISC design. Some of the advantages are as below.
(i) RISC instructions, being simple, can be hard-wired, while CISC architectures may
have to use micro-programming in order to implement complex instructions.
(ii) A set of simple instructions results in reduced complexity of the control unit and
the data-path; as a consequence, the processor can work at a high clock frequency
and thus yields higher speed.
(iii) Because a simpler control unit takes up less chip area, several extra
functionalities, such as memory management units or floating point arithmetic units,
can also be placed on the same chip.
(iv) Smaller chips allow a semiconductor manufacturer to place more parts on a single
silicon wafer, which can lower the per-chip cost dramatically.
(v) High-level language compilers produce more efficient code for a RISC processor
than for its CISC counterpart, because they tend to use the smaller set of
instructions in a RISC computer.
(vi) Shorter design cycle: a new RISC processor can be designed and tested more
quickly, since RISC processors are simpler than corresponding CISC processors.
(vii) The application programmers who use the microprocessor's instructions will find
it easier to develop code with a smaller and optimum instruction set.
(viii) Another advantage is that the loading and decoding of instructions in a RISC
processor is simple and fast, since there is no need to wait until the length of an
instruction is known in order to start decoding the following one. Decoding is
simplified as opcode and address fields are located in the same position for all
instructions.
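The decoding advantage in point (viii) can be sketched in Python. The 32-bit format used here (6-bit opcode, two 5-bit register fields, 16-bit immediate) is a hypothetical layout chosen for illustration and is not the encoding of any particular processor. Because every instruction is four bytes long and every field sits at a fixed bit position, instruction boundaries are known before anything is decoded.

```python
INSTRUCTION_BYTES = 4  # every instruction has the same length (assumed format)


def encode(opcode, rd, rs, imm):
    """Pack fields into one 32-bit word: opcode[31:26] rd[25:21] rs[20:16] imm[15:0]."""
    return ((opcode & 0x3F) << 26) | ((rd & 0x1F) << 21) | ((rs & 0x1F) << 16) | (imm & 0xFFFF)


def decode(word):
    """Extract every field with fixed shifts and masks; no length lookup is needed."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rd": (word >> 21) & 0x1F,
        "rs": (word >> 16) & 0x1F,
        "imm": word & 0xFFFF,
    }


def split_instructions(code):
    """Cut a byte string into instruction words purely by position."""
    return [
        int.from_bytes(code[i:i + INSTRUCTION_BYTES], "big")
        for i in range(0, len(code), INSTRUCTION_BYTES)
    ]


if __name__ == "__main__":
    words = [encode(8, 1, 2, 100), encode(35, 3, 1, 4)]
    code = b"".join(w.to_bytes(INSTRUCTION_BYTES, "big") for w in words)
    for word in split_instructions(code):
        print(decode(word))
```

With variable-length instructions, by contrast, the start of the second instruction could not be found until the first had been at least partly decoded.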
BASIC FEATURES OF RISC PROCESSORS
(i) Simple instruction set: In a RISC machine, the instruction set contains simple,
basic instructions, from which more complex operations can be composed.
Thus instructions with lower latency are preferred.
(ii) Same length instructions: Each instruction is of the same length, so that it may be
fetched in a single operation. The traditional microprocessors from Intel or
Motorola support variable length instructions.
(iii) Single machine-cycle instructions: Most instructions complete in one machine
cycle, which allows the processor to handle several instructions at the same time.
RISC processors approach a CPI (clocks per instruction) of unity, owing to the
optimization of each instruction and the massive pipelining embedded in
a RISC processor.
(iv) Pipelining: Usually massive pipelining is embedded in a RISC processor.
Pipelining is the key to speeding up RISC machines.
(v) Very few addressing modes and formats: Unlike CISC processors, where the
number of addressing modes is very high, RISC processors have far fewer
addressing modes and support only a few instruction formats.
(vi) Large number of registers: The RISC design philosophy generally incorporates a
larger number of registers to avoid a large number of interactions with
memory.
(vii) Microcoding not required: Unlike in a CISC machine, in a RISC architecture
instruction microcoding is not required. This is because the instruction set is
simple, and simple instructions may be easily built into the
hardware.
(viii) Load and Store architecture: The RISC architecture is primarily a Load and
Store architecture, implying that all memory accesses take place using Load
or Store type operations.
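The load and store principle, together with the reliance on registers, can be illustrated with a toy interpreter. The three-instruction machine below is hypothetical and is not the instruction set of any real processor: only LOAD and STORE touch memory, while ADD works on registers alone.

```python
def run(program, memory, num_registers=8):
    """Execute a toy load/store program and return the final register file."""
    regs = [0] * num_registers
    for op, *args in program:
        if op == "LOAD":        # LOAD rd, addr : register <- memory
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STORE":     # STORE rs, addr : memory <- register
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":       # ADD rd, ra, rb : registers only, no memory access
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        else:
            raise ValueError(f"unknown operation: {op}")
    return regs


if __name__ == "__main__":
    memory = [5, 7, 0]
    program = [
        ("LOAD", 1, 0),     # r1 <- memory[0]
        ("LOAD", 2, 1),     # r2 <- memory[1]
        ("ADD", 3, 1, 2),   # r3 <- r1 + r2
        ("STORE", 3, 2),    # memory[2] <- r3
    ]
    run(program, memory)
    print(memory)
```

Adding two memory operands takes four simple instructions here, whereas a CISC machine might offer a single instruction that reads and writes memory directly.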