TB/AMP/2010
The Pentium Microprocessors
The Pentium microprocessor improves on the architecture of the 80486 microprocessor. The changes include an improved cache structure, a wider data bus, a faster numeric coprocessor, a dual integer processor, and branch prediction logic. The cache has been reorganized into two caches, each 8K bytes in size: one for data and the other for instructions. The data bus width has been increased from 32 bits to 64 bits. The numeric coprocessor operates about five times faster than the 80486 numeric coprocessor. The dual integer processor often allows two instructions to execute per clock. Finally, the branch prediction logic allows programs that branch to execute more efficiently. Notice that these changes are internal to the Pentium, so software written for earlier Intel 80X86 microprocessors runs unchanged (the Pentium is upward-compatible). A later improvement to the Pentium was the addition of the MMX instructions.
SALIENT FEATURES OF 80586 (PENTIUM)
A salient feature of the Pentium is its superscalar, superpipelined architecture. It has two integer pipelines, U and V, each of which is a 5-stage pipeline. This greatly increases the speed of integer arithmetic. Moreover, it has an on-chip floating-point unit, which improves floating-point performance many times over that of the 80386/486 processors. Another feature of the Pentium is that it contains two separate caches, a data cache and an instruction cache; the 80486 had a single unified data/instruction cache.
The Intel CPU architectures up to the 80486 issue only one instruction to the execution unit per cycle, which makes decoding and execution comparatively slow. To raise processor performance beyond one instruction per cycle, computer architects employ the technique of multiple instruction issue (MII).
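To see why issuing two instructions per clock matters, the following Python sketch counts clocks for an ideal 5-stage pipeline (matching the U and V pipelines) with single issue versus dual issue. It assumes no stalls, cache misses, or pairing restrictions, and that every adjacent pair of instructions can issue together, which real Pentium code achieves only for some instruction pairs. The function names are illustrative.

```python
import math

def single_issue_cycles(n: int, stages: int = 5) -> int:
    """Clocks to complete n instructions on one ideal pipeline (no stalls)."""
    if n <= 0:
        return 0
    # The first instruction needs `stages` clocks to fill the pipeline;
    # after that, one instruction completes every clock.
    return stages + n - 1

def dual_issue_cycles(n: int, stages: int = 5) -> int:
    """Clocks when every pair of instructions issues together (U and V)."""
    if n <= 0:
        return 0
    # Two instructions enter the U and V pipelines in the same clock,
    # so the pipelines see ceil(n / 2) issue slots instead of n.
    pairs = math.ceil(n / 2)
    return stages + pairs - 1

for n in (1, 10, 100):
    print(n, single_issue_cycles(n), dual_issue_cycles(n))
# 1 5 5
# 10 14 9
# 100 104 54
```

As n grows, the dual-issue count approaches half the single-issue count, which is the best two pipelines can do. Dependencies and non-pairable instructions pull real code back toward single-issue timing.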
A microprocessor that can issue more than one instruction per processor cycle is therefore termed an MII microprocessor. Obviously, to execute more than one instruction in a cycle, the microprocessor must have more than one execution channel. There are thus two problems: (a) how to issue multiple instructions, and (b) how to execute them concurrently. Depending on how these two issues are handled, MII architectures may be divided into two classes: (i) Very Long Instruction Word (VLIW) architecture and (ii) superscalar architecture.
Fig. 5.1 Pentium CPU Architecture
In VLIW processors, the compiler reorders the sequential stream of code into fixed-size instruction groups, which are then issued in parallel for execution. In a superscalar architecture, on the other hand, the hardware decides at run time which instructions are to be issued concurrently. The Pentium CPU is based on the superscalar architecture. The hardware of a superscalar processor like the Pentium becomes enormously complex, because multiple instructions have to be issued to the execution unit in each cycle.
Another important concept involved here is pipelining. Pipelining has been implemented in all processors from the 8086 onwards, in the limited sense that instructions are prefetched and stored in a queue.
Superscalar Execution
The salient feature of the Pentium is its superscalar architecture. To execute multiple instructions concurrently, the Pentium issues two instructions in parallel to two independent integer pipelines, known as the U and V pipelines. Each of these pipelines has 5 stages, as shown in Fig. 5.2. These pipeline stages are similar to those of the 80486 CPU; their functions are summarized below.
Fig. 5.2 Superscalar Organisation
1.
In the prefetch (PF) stage of the pipeline, the CPU fetches instructions from the instruction cache, which stores the instructions to be executed. In this stage the CPU also aligns the code appropriately. This is required because the instructions are of variable length, and the initial opcode bytes of each instruction must be properly aligned. After the prefetch stage there are two decode stages, D1 and D2.
2. In the D1 stage, the CPU decodes the instruction and generates a control word. For simple RISC-like instructions involving register data transfer or arithmetic and logical operations, a single control word may be sufficient to start execution. However, the x86 architecture also supports complex CISC instructions, which require microcoded control sequencing.
3. A second decode stage, D2, is therefore required, in which the control word from the D1 stage is decoded again for final execution. The CPU also generates the addresses for data memory references in this stage.
4. In the execution stage, known as the E stage, the CPU either accesses the data cache for data operands or performs the arithmetic/logic computations or floating-point operations in the execution unit.
5. In the final stage of the five-stage pipeline, the WB (writeback) stage, the CPU updates the register contents or the flags in the flag register according to the execution result.
Although, as mentioned, the Pentium pipeline structure is similar to that of the 80486, the Pentium achieves considerable speed-up by adding hardware to each pipeline stage. For example, while the 80486 may take two clock cycles to decode some instructions, the Pentium takes only one.
Separate Code and Data Cache
Unlike the 80486 microprocessor's unified 8K-byte code/data cache, the Pentium has two separate 8K-byte caches, one for code and one for data.
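Each 8K-byte cache locates an entry by splitting the address into fields. The sketch below assumes the organisation Intel documents for the Pentium L1 caches (two-way set associative with 32-byte lines), which the text above does not state; the constants and the helper name are illustrative.

```python
CACHE_SIZE = 8 * 1024  # bytes in each Pentium L1 cache (from the text)
LINE_SIZE = 32         # bytes per cache line (assumed, per Intel data)
WAYS = 2               # two-way set associative (assumed, per Intel data)

SETS = CACHE_SIZE // (LINE_SIZE * WAYS)    # 128 sets
OFFSET_BITS = LINE_SIZE.bit_length() - 1   # 5 bits select a byte in a line
INDEX_BITS = SETS.bit_length() - 1         # 7 bits select a set

def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, set_index, byte_offset) for a 32-bit physical address."""
    offset = addr & (LINE_SIZE - 1)
    set_index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, set_index, offset

print(SETS, split_address(0x00012345))  # 128 (18, 26, 5)
```

Because code and data live in different caches, an instruction fetch never competes with a data reference for the same set, which is one reason the split design suits two pipelines.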
From the fundamental principles of cache operation, one may observe that, for the same total size, a unified cache as in the 80486 generally has a higher hit ratio than two separate caches. Why, then, did the Pentium adopt separate caches? The answer probably lies in the fact that the superscalar organisation demands more bandwidth than a unified cache could provide, since instruction fetches and data accesses must proceed in the same clock. Moreover, separate caches allow branch prediction to be implemented more efficiently.
The Memory System
The memory system for the Pentium microprocessor is 4G bytes in size, just as in the 80386DX and 80486 microprocessors. The difference lies in the width of the memory data bus. The Pentium uses a 64-bit data bus to address memory organized in eight banks, each of which contains 512M bytes of data. Figure 5.3 shows the organization of the Pentium physical memory system.
The Pentium memory system is divided into eight banks, each of which stores a byte of data with a parity bit. The Pentium, like the 80486, employs internal parity generation and checking logic for the information on the memory system's data bus. (Note that most Pentium systems do not use parity checks, but the feature is available.) The 64-bit-wide memory is important for double-precision floating-point data. Recall that a double-precision floating-point number is 64 bits wide. Because of the change to a 64-bit-wide data bus, the Pentium is able to retrieve floating-point data with one read cycle, instead of two as in the 80486. This gives the Pentium a higher throughput than an 80486. As with earlier 32-bit Intel microprocessors, the memory system is numbered in bytes from byte 00000000H to byte FFFFFFFFH.
Fig. 5.3. The 8-byte-wide memory banks of the Pentium microprocessor.
Memory selection is accomplished with the bank enable signals (BE7–BE0). These separate memory banks allow the Pentium to access any single byte, word, doubleword, or quadword with one memory transfer cycle, provided that the data do not cross an 8-byte (quadword) boundary.
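The bank enables can be derived from the low three address bits and the transfer size. This sketch assumes the Pentium drives A31–A3 and encodes A2–A0 in BE7–BE0, and it reports an enabled bank as logic 1 even though the pins themselves are active low; the function name is hypothetical.

```python
def bank_enables(address: int, size: int) -> int:
    """Return an 8-bit mask where bit i set means bank i (BEi) is enabled.

    Raises ValueError if the transfer would cross a quadword (8-byte)
    boundary, because that needs a second bus cycle.
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    first = address & 7          # A2-A0 pick the starting bank
    last = first + size - 1
    if last > 7:
        raise ValueError("transfer crosses a quadword boundary")
    mask = 0
    for bank in range(first, last + 1):
        mask |= 1 << bank
    return mask

print(f"{bank_enables(0x1003, 4):08b}")  # 01111000 (banks 3 to 6)
print(f"{bank_enables(0x2000, 8):08b}")  # 11111111 (whole quadword)
```

A doubleword at 1006H, by contrast, would need banks 6, 7, 0 and 1 of two different quadwords, so the function rejects it.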
As with earlier memory selection logic, we often generate eight separate write strobes for writing to the memory system.
A new feature added to the Pentium is its ability to check and generate parity for the address bus (A31–A5) during certain operations. The AP pin provides the system with parity information, and the APCHK output indicates a bad parity check for the address bus. The Pentium takes no action when an address parity error is detected; the error must be assessed by the system, which must take appropriate action (such as an interrupt), if so desired.
The Pentium can function with a 32-bit-wide memory system by using a multiplexer to convert the 64-bit data bus to a 32-bit data bus. A set of bidirectional buffers, used as multiplexers, converts the Pentium's 64-bit data bus into a 32-bit data bus. Care must be taken with this arrangement, because software could access a doubleword that crosses the boundary between the lower and upper halves of the data bus. All doublewords must therefore be stored at doubleword boundaries; a doubleword boundary is an address that is divisible by 4.
Input/output System
The input/output system of the Pentium is completely compatible with earlier Intel microprocessors. The I/O port number appears on address lines A15–A3, with the bank enable signals used to select the actual memory banks used for the I/O transfer. Beginning with the 80386 microprocessor, I/O privilege information (the I/O permission bitmap) is added to the TSS segment when the processor operates in protected mode. This allows I/O ports to be selectively inhibited. If a blocked I/O location is accessed, the Pentium generates a type 13 (general protection) interrupt to signal an I/O privilege violation.
Special Pentium Registers
The Pentium is essentially the same microprocessor as the 80386 and 80486, except that some features have been added and the control register set has changed.
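Because the control registers described next are collections of single-bit flags, inspecting or changing them reduces to bit masking. The bit positions below are taken from Intel's published CR0 and CR4 layouts rather than from this text (Fig. 5.4 should be checked against them), and the helper names are hypothetical; real code would read and write the registers with MOV CRn at privilege level 0.

```python
# Bit positions from Intel's published layouts (verify against Fig. 5.4).
CR0_BITS = {"PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
            "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31}
CR4_BITS = {"VME": 0, "PVI": 1, "TSD": 2, "DE": 3, "PSE": 4, "MCE": 6}

def set_flags(value: int, layout: dict[str, int]) -> list[str]:
    """Names of the flags whose bits are 1 in `value`, lowest bit first."""
    return [name for name, bit in sorted(layout.items(), key=lambda kv: kv[1])
            if (value >> bit) & 1]

def with_flag(value: int, layout: dict[str, int], name: str) -> int:
    """Return `value` with the named flag set to 1."""
    return value | (1 << layout[name])

cr0 = 0x80000011                       # example: paging and protected mode on
cr4 = with_flag(0, CR4_BITS, "PSE")    # example: enable 4M-byte pages
print(set_flags(cr0, CR0_BITS))        # ['PE', 'ET', 'PG']
print(set_flags(cr4, CR4_BITS), hex(cr4))  # ['PSE'] 0x10
```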
Control Registers
Figure 5.4 shows the control register structure for the Pentium microprocessor. Note that a new control register, CR4, has been added to the control register array.
Fig. 5.4. The structure of the Pentium control registers.
PG Selects page table translation of linear addresses into physical addresses when PG = 1. Page table translation allows any linear address to be assigned any physical memory location.
CD Cache disable controls the internal cache. If CD = 1, the cache will not fill with new data on cache misses, but it will continue to function for cache hits. If CD = 0, misses cause the cache to fill with new data.
NW Not write-through selects the mode of operation for the data cache. If NW = 1, the data cache is inhibited from cache write-through.
AM Alignment mask enables alignment checking when set. Note that alignment checking occurs only when the AC flag in EFLAGS is also set and the program runs at privilege level 3 (user level).
WP Write protect protects read-only user-level pages against supervisor-level write operations. When WP = 1, the supervisor cannot write to read-only user-level pages; when WP = 0, supervisor-level writes ignore the read-only setting.
NE Numeric error selects how numeric coprocessor errors are reported. If NE = 1, errors are reported internally through a type 16 interrupt. If NE = 0, errors are reported externally through the FERR pin, and they are ignored if the IGNNE input is asserted.
ET Selects the 80287 coprocessor when ET = 0 or the 80387 coprocessor when ET = 1. This bit was installed because there was no 80387 available when the 80386 first appeared. In most systems, ET is set to indicate that an 80387 is present in the system.
TS Indicates that the microprocessor has switched tasks (in protected mode, changing the contents of TR places a 1 into TS). If TS = 1, a numeric coprocessor instruction causes a type 7 (coprocessor not available) interrupt.
EM Is set to cause a type 7 interrupt for each ESC instruction. (ESCape instructions are used to encode instructions for the 80387 coprocessor.) We often use this interrupt to emulate the function of the coprocessor in software.
Emulation reduces the system cost, but the emulated coprocessor instructions often take at least 100 times longer to execute.
MP Is set to indicate that the arithmetic coprocessor is present in the system.
PE Is set to select the protected mode of operation. It may also be cleared to re-enter real mode. In the 80286 this bit can only be set: the 80286 cannot return to real mode without a hardware reset, which precludes its use in most systems that use protected mode.
VME Virtual mode extension enables support for the virtual interrupt flag in virtual 8086 mode. If VME = 0, virtual interrupt support is disabled.
PVI Protected mode virtual interrupt enables support for the virtual interrupt flag in protected mode.
TSD Time stamp disable controls the RDTSC instruction. When TSD = 1, RDTSC can be executed only at privilege level 0.
DE Debugging extension enables I/O breakpoint debugging extensions when set.
PSE Page size extension enables 4M-byte memory pages when set.
MCE Machine check enable enables the machine check interrupt.
The Pentium contains new features that are controlled by CR4 and a few bits in CR0.
EFLAG Register
The extended flag (EFLAG) register has been changed in the Pentium microprocessor. Figure 5.5 pictures the contents of the EFLAG register. Four new flag bits have been added to this register to control or indicate conditions about some of the new features in the Pentium.
Fig. 5.5. The structure of the Pentium EFLAG register.
The four new flags and the function of each are as follows:
ID The identification flag is used to test for the CPUID instruction. If a program can set and clear the ID flag, the processor supports the CPUID instruction.
VIP Virtual interrupt pending indicates that a virtual interrupt is pending.
VIF Virtual interrupt flag is a virtual image of the interrupt flag IF and is used together with VIP.
AC Alignment check, together with the AM bit in control register 0, enables alignment checking of memory references made at privilege level 3.
VM Virtual mode flag. If this flag is set, the processor enters virtual 8086 mode within protected mode. It is to be set only when the processor is in protected mode. In this mode, if any privileged instruction is executed, exception 13 is generated. This bit can be set only in protected mode, by using the IRET instruction or by a task switch.
RF Resume flag. This flag is used with the debug register breakpoints. It is checked at the start of every instruction cycle, and if it is set, any debug fault is ignored during that instruction cycle. RF is automatically reset after the successful execution of every instruction, except for the IRET and POPF instructions. It is also not automatically cleared after the successful execution of JMP, CALL, and INT instructions that cause a task switch. These instructions set RF to the value specified by the flag image stored in memory.
NT Nested task flag. This flag is set when the current task is nested within (was invoked by) another task.
IOPL I/O privilege level. This 2-bit field specifies the privilege level required to execute I/O instructions: they are allowed when the current privilege level (CPL) is numerically less than or equal to IOPL; otherwise the I/O permission bitmap in the TSS is checked.
Built-In Self-Test (BIST)
The built-in self-test (BIST) is accessed on power-up by placing a logic 1 on INIT while the RESET pin changes from 1 to 0. The BIST tests 70 percent of the internal structure of the Pentium in approximately 150 μs. Upon completion of the BIST, the Pentium reports the outcome in register EAX. If EAX = 0, the BIST passed and the Pentium is ready for operation. If EAX contains any other value, the Pentium has malfunctioned and is faulty.
PENTIUM MEMORY MANAGEMENT
The memory-management unit within the Pentium is upward-compatible with those of the 80386 and 80486 microprocessors. Many of the features of these earlier microprocessors are basically unchanged in the Pentium. The main changes are in the paging unit and a new system management mode.
Paging Unit
The paging mechanism functions with 4K-byte memory pages or, with a new extension available in the Pentium, with 4M-byte memory pages.
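The two page sizes split a 32-bit linear address differently, and that difference also determines how much memory the translation tables need. The sketch below models only the address arithmetic: the dictionary standing in for the page directory is hypothetical, and real page directory entries also carry present, access, and page-size (PS) bits that are not modeled here.

```python
PAGE_4M = 4 << 20  # 4M bytes

def split_4k(linear: int) -> tuple[int, int, int]:
    """4K pages: (directory index, page-table index, 12-bit offset)."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF

def split_4m(linear: int) -> tuple[int, int]:
    """4M pages: (directory index, 22-bit offset); there is no table field."""
    return (linear >> 22) & 0x3FF, linear & 0x3FFFFF

def translate_4m(linear: int, directory: dict[int, int]) -> int:
    """Map a linear address through a hypothetical directory of 4M page bases."""
    index, offset = split_4m(linear)
    base = directory[index]  # a missing entry stands in for a page fault
    if base % PAGE_4M:
        raise ValueError("a 4M-byte page must start on a 4M-byte boundary")
    return base + offset

# Memory needed to map all 4G bytes with 4-byte entries.
table_bytes_4k = 1024 * 4 + 1024 * 1024 * 4  # directory + 1024 page tables
table_bytes_4m = 1024 * 4                    # page directory only

print(split_4k(0x00C01234))                            # (3, 1, 564)
print(hex(translate_4m(0x00C01234, {3: 0x08000000})))  # 0x8001234
print(table_bytes_4k, table_bytes_4m)                  # 4198400 4096
```

The first figure is the "slightly over 4M bytes" of tables needed with 4K pages; with 4M pages only the 4K-byte directory remains.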
As detailed in Chapters 1 and 17, the paging table structure can become large in a system that contains a large memory. Recall that to fully repage 4G bytes of memory, the microprocessor requires slightly over 4M bytes of memory just for the page tables. In the Pentium, with the new 4M-byte paging feature, this is dramatically reduced to a single 4K-byte page directory. The new 4M-byte page sizes are selected by the PSE bit in control register 4. The main difference between 4K paging and 4M paging is that in the 4M paging scheme the linear address has no page-table field. See Figure 5.6 for the 4M paging system in the Pentium microprocessor. Pay close attention to the way the linear address is used with this scheme. Notice that the leftmost 10 bits of the linear address select an entry in the page directory (just as with 4K pages). Unlike 4K pages, there are no page tables; instead, the page directory entry addresses a 4M-byte memory page, and the remaining 22 bits of the linear address are the offset within that page.
Fig. 5.6 The linear address 00200001H repaged to memory location 01000002H in 4M-byte pages. Note that there are no page tables.
System Management Mode
The system management mode (SMM) is on the same level as protected mode, real mode, and virtual mode, but it is provided to function as a manager. SMM is not intended to be used as an application or a system-level feature. It is intended for high-level system functions such as power management and security, which most Pentiums use during operation.
Access to SMM is accomplished via a new external hardware interrupt applied to the SMI pin on the Pentium. When the SMM interrupt is activated, the processor begins executing system-level software in an area of memory called the system management RAM (SMRAM) and saves its own state in a part of SMRAM called the SMM state dump record. The SMI interrupt disables all other interrupts that are normally handled by user applications and the operating system.
A return from the SMM interrupt is accomplished with a new instruction, RSM, which returns from the management mode interrupt to the interrupted program at the point of interruption.
The SMM interrupt calls the software initially stored at memory location 38000H, using CS = 3000H and EIP = 8000H. This initial state can be changed by using a jump to any location within the first 1M byte of memory. The management mode interrupt enters an environment similar to real-mode memory addressing, but it differs because, instead of being limited to the first 1M byte of memory, SMM allows the Pentium to treat the memory system as a flat, 4G-byte system.
In addition to executing software that begins at location 38000H, the SMM interrupt also stores the state of the Pentium in what is called a dump record. The dump record is stored at memory locations 3FFA8H through 3FFFFH, with an area at locations 3FE00H through 3FEF7H that is reserved by Intel. The dump record allows a Pentium-based system to enter a sleep mode and reactivate at the point of program interruption. This requires that the SMRAM be powered during the sleep period. Many laptop computers have a separate battery to power the SMRAM for many hours during sleep mode. The halt auto restart and I/O trap restart fields are used when SMM is exited by the RSM instruction. These data allow the RSM instruction to return to the halt state or to the interrupted I/O instruction. If neither a halt nor an I/O operation is in effect upon entering SMM, the RSM instruction reloads the state of the machine from the state dump and returns to the point of interruption.
SMM can be used by the system before the normal operating system is placed in memory and executed. It can also be used periodically to manage the system, provided that normal software does not occupy locations 38000H–3FFFFH.
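The SMM entry point follows the real-mode rule of segment * 16 + offset, and system software must keep ordinary code out of the default SMRAM area. A minimal sketch, assuming the default 38000H–3FFFFH region given above; the helper names are hypothetical.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Real-mode style address: segment * 16 + offset."""
    return (segment << 4) + offset

SMM_ENTRY = real_mode_address(0x3000, 0x8000)  # CS = 3000H, EIP = 8000H
SMRAM_START, SMRAM_END = 0x38000, 0x3FFFF      # default area (inclusive)

def overlaps_smram(start: int, length: int) -> bool:
    """True if the block [start, start + length) touches 38000H-3FFFFH."""
    return start <= SMRAM_END and start + length > SMRAM_START

print(hex(SMM_ENTRY))                  # 0x38000
print(overlaps_smram(0x3C000, 0x100))  # True
print(overlaps_smram(0x40000, 0x100))  # False
```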
If the system relocates the SMRAM before booting the normal operating system, the original area becomes available to the normal system as well. The base address of the SMRAM is changed by modifying the value in the state dump base address registers (locations 3FEF8H through 3FEFBH) after the first system management mode interrupt. When the first RSM instruction is executed, returning control to the interrupted system, the new value from these locations changes the base address of the SMM interrupt for all future uses. For example, if the state dump base address is changed to 000E8000H, all subsequent SMM interrupts use locations E8000H–EFFFFH for the Pentium state dump. These locations are compatible with DOS and Windows.
PENTIUM II
The Pentium II is also a 32-bit processor, with a 64-bit data bus and a 36-bit address bus that can address up to 64G bytes of physical memory. It is essentially a Pentium Pro processor with on-chip MMX (multimedia extension) technology. It is available with maximum internal clock ratings from 233 MHz to 450 MHz. The features of the Pentium II processor are:
(i) Supports the Intel architecture with dynamic execution.
(ii) Integrated primary (L1) 16K-byte instruction cache and 16K-byte write-back data cache.
(iii) Integrated 256K-byte second-level (L2) cache.
(iv) Fully compatible with previous microprocessors.
(v) Supports MMX technology.
(vi) Quick Start and Deep Sleep modes provide extremely low power dissipation.
(vii) Low-power GTL+ processor system bus interface (GTL: Gunning Transceiver Logic).
(viii) Integrated math coprocessor.
(ix) Integrated thermal diode for measuring processor temperature.
Pentium II Software Changes
The Pentium II microprocessor core is a Pentium Pro. This means that, for software, the Pentium II and the Pentium Pro are essentially the same device. This section lists the changes to the CPUID instruction and the new SYSENTER, SYSEXIT, FXSAVE, and FXRSTOR instructions (the only software modifications).
CPUID Instruction
Table 5.1 lists the values passed between the Pentium II and the CPUID instruction. These are changed from earlier versions of the Pentium microprocessor. The version information obtained by executing the CPUID instruction with a 1 in EAX is returned in EAX. The family ID is returned in bits 8 to 11, the model ID in bits 4 to 7, and the stepping ID in bits 0 to 3. For the Pentium II, the family ID is 6 and the model number is 3 (later Pentium II versions report model 5). The stepping number refers to an update number: the higher the stepping number, the newer the version.
TABLE 5.1 CPUID instruction.
The features are indicated in the EDX register after executing the CPUID instruction with a 1 in EAX. Only two new features are returned in EDX for the Pentium II. Bit position 11 indicates whether the microprocessor supports the two new fast call instructions, SYSENTER and SYSEXIT. Bit position 23 indicates whether the microprocessor supports the MMX instruction set. Most of the remaining bits are identical to earlier versions of the microprocessor and are not described here, but a few are worth noting. Bit 16 indicates whether the microprocessor supports the page attribute table (PAT). Bit 17 indicates whether the microprocessor supports the page size extension found with the Pentium Pro and Pentium II microprocessors, which allows memory above 4G bytes, up to 64G bytes, to be addressed. Finally, bit 24 indicates whether the fast floating-point save and restore instructions (FXSAVE and FXRSTOR) are implemented.
SYSENTER and SYSEXIT Instructions
The SYSENTER and SYSEXIT instructions use the fast call facility introduced in the Pentium II microprocessor. Note that these instructions operate only in protected mode, where they transfer control between application code at ring 3 and operating system code at ring 0 (privilege level 0); SYSEXIT itself can be executed only at ring 0. The Windows kernel operates in ring 0, but Windows does not allow applications access to ring 0. These new instructions are meant for operating system software.
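Decoding the CPUID results is a matter of extracting bit fields. Python cannot execute CPUID, so the register values below are made-up examples standing in for the EAX and EDX values returned with EAX = 1. The bit positions are those given above, and the labels (SEP, PAT, PSE-36, MMX, FXSR) are Intel's mnemonic names for those bits; the function names are hypothetical.

```python
FEATURE_BITS = {"SEP": 11, "PAT": 16, "PSE-36": 17, "MMX": 23, "FXSR": 24}

def decode_version(eax: int) -> dict[str, int]:
    """Split the CPUID (EAX = 1) version value into its ID fields."""
    return {
        "family": (eax >> 8) & 0xF,  # bits 8-11
        "model": (eax >> 4) & 0xF,   # bits 4-7
        "stepping": eax & 0xF,       # bits 0-3
    }

def decode_features(edx: int) -> list[str]:
    """Names of the selected feature bits that are set in EDX."""
    return [name for name, bit in FEATURE_BITS.items() if (edx >> bit) & 1]

eax = 0x00000633                # example value: family 6, model 3, stepping 3
edx = (1 << 11) | (1 << 23)     # example value: SEP and MMX reported
print(decode_version(eax))      # {'family': 6, 'model': 3, 'stepping': 3}
print(decode_features(edx))     # ['SEP', 'MMX']
```

An operating system would test the SEP bit in this way before programming the SYSENTER registers described next.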
The SYSENTER instruction uses some of the model-specific registers to hold CS, EIP, and ESP values, and executes a fast call to the procedure they define. The fast call differs from a regular call because it does not push the return address onto the stack. Table 5.2 lists the model-specific registers used with SYSENTER and SYSEXIT. Note that the model-specific registers are read with the RDMSR instruction and written with the WRMSR instruction.
TABLE 5.2 The model-specific registers used with SYSENTER and SYSEXIT.
To use the RDMSR or WRMSR instruction, place the register number in the ECX register. If WRMSR is used, place the new data for the register in EDX:EAX. For the SYSENTER registers, you need use only the EAX register, but place a zero into EDX. If the RDMSR instruction is used, the data are returned in the EDX:EAX register pair.
To use the SYSENTER instruction, first load the model-specific registers SYSENTER_CS and SYSENTER_EIP with the address of the system entrance point. This would normally be an entry point of the operating system, such as Windows or Windows NT. Note that this instruction is meant as a system instruction to access code or software in ring 0. The stack segment register is loaded with the value placed into SYSENTER_CS plus 8. In other words, the selector pair addressed by the SYSENTER_CS selector value is loaded into CS and SS. The stack offset is loaded into ESP from SYSENTER_ESP.
The SYSEXIT instruction loads CS and SS with the selector pair addressed by SYSENTER_CS plus 16 and plus 24. Table 5.3 illustrates the selectors from the global descriptor table, as addressed by SYSENTER_CS. In addition to loading the code and stack segment selectors and the memory segments that they represent, the SYSEXIT instruction copies the value in EDX into the EIP register and the value in ECX into the ESP register. The SYSEXIT instruction returns control to the application at ring 3.
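The selector arithmetic performed by SYSENTER and SYSEXIT can be modelled in a few lines. The MSR numbers 174H, 175H and 176H are the ones Intel documents for SYSENTER_CS, SYSENTER_ESP and SYSENTER_EIP (presumably what Table 5.2 lists), and the RPL adjustments (cleared to 0 on entry, forced to 3 on exit) come from Intel's instruction reference rather than the text above. The dictionary standing in for the model-specific registers and the function names are hypothetical.

```python
SYSENTER_CS_MSR = 0x174   # ECX value for SYSENTER_CS
SYSENTER_ESP_MSR = 0x175  # ECX value for SYSENTER_ESP
SYSENTER_EIP_MSR = 0x176  # ECX value for SYSENTER_EIP

def sysenter(msr: dict[int, int]) -> dict[str, int]:
    """Registers loaded by SYSENTER (application -> ring 0)."""
    cs = msr[SYSENTER_CS_MSR] & 0xFFFC   # RPL cleared to 0
    return {"CS": cs, "SS": cs + 8,      # SS comes from SYSENTER_CS + 8
            "EIP": msr[SYSENTER_EIP_MSR], "ESP": msr[SYSENTER_ESP_MSR]}

def sysexit(msr: dict[int, int], edx: int, ecx: int) -> dict[str, int]:
    """Registers loaded by SYSEXIT (ring 0 -> ring 3)."""
    cs = (msr[SYSENTER_CS_MSR] + 16) | 3  # SYSENTER_CS + 16, RPL forced to 3
    return {"CS": cs, "SS": cs + 8,       # SS is SYSENTER_CS + 24, RPL 3
            "EIP": edx, "ESP": ecx}       # return offset and stack from EDX, ECX

msr = {SYSENTER_CS_MSR: 0x0008,           # example GDT selector
       SYSENTER_ESP_MSR: 0x80001000,      # example kernel stack
       SYSENTER_EIP_MSR: 0x80400000}      # example kernel entry point
print({k: hex(v) for k, v in sysenter(msr).items()})
print({k: hex(v) for k, v in sysexit(msr, 0x00401000, 0x0012FF00).items()})
```

With SYSENTER_CS = 0008H, SYSENTER loads CS = 0008H and SS = 0010H, while SYSEXIT loads CS = 001BH and SS = 0023H, so the operating system must place four consecutive descriptors in the GDT starting at the SYSENTER_CS selector.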
As mentioned, these instructions appear to have been designed for quick entry to and return from the Windows or Windows NT operating systems on the personal computer.
TABLE 5.3 Selectors addressed by the SYSENTER_CS selector value.
To use SYSENTER and SYSEXIT, the SYSENTER instruction must pass the return address to the system. This is accomplished by loading the EDX register with the return offset and by placing the code segment descriptor in the global descriptor table at location SYSENTER_CS+16. The stack segment is transferred by placing the stack segment descriptor at SYSENTER_CS+24 and loading the return stack pointer (ESP) into ECX.
FXSAVE and FXRSTOR Instructions
The last two new instructions added to the Pentium II microprocessor are FXSAVE and FXRSTOR, which are almost identical to the FSAVE and FRSTOR instructions. The main difference is that the FXSAVE instruction is designed to properly store the state of the MMX machine, while FSAVE properly stores the state of the floating-point coprocessor. The FSAVE instruction stores the entire tag field, while the FXSAVE instruction stores only the valid bits of the tag field. The valid tag field is used to reconstruct the full tag field when the FXRSTOR instruction executes. This means that if the MMX state of the machine is saved, the FXSAVE instruction should be used; if the floating-point state of the machine is saved, the FSAVE instruction should be used. For new applications, it is recommended that the FXSAVE and FXRSTOR instructions be used to save the MMX state and floating-point state of the machine; do not use the FSAVE and FRSTOR instructions in new applications.
THE PENTIUM III
The Pentium III microprocessor is an improved version of the Pentium II microprocessor. Even though it is newer than the Pentium II, it is still based on the Pentium Pro architecture. There are two versions of the Pentium III.
One version is available with a non-blocking 512K-byte cache and packaged in the Slot 1 cartridge, and the other is available with a 256K-byte advanced transfer cache and packaged as an integrated circuit. The Slot 1 version's cache runs at half the processor speed, while the integrated-cache version's cache runs at the processor clock frequency. As shown by most cache-performance benchmarks, increasing the cache size from 256K bytes to 512K bytes improves performance by only a few percent.
The salient architectural features are:
1. The P-III CPU was developed using 0.25 micron technology and includes over 9.5 million transistors. It is commercially available in three versions, operating at 450 MHz, 500 MHz, and 550 MHz.
2. The P-III incorporates multiple branch prediction algorithms.
3. Seventy new instructions have been added to the Pentium III. These instructions are useful in advanced imaging, speech processing, and multimedia applications.
4. A dual independent bus architecture increases bandwidth.
5. The P-III employs dynamic execution technology, which has already been discussed.
6. A 512K-byte unified, non-blocking level 2 cache is used.
7. Eight 64-bit-wide Intel MMX registers, along with a set of 57 instructions for multimedia applications, are available.
Chip Sets
The chip set for the Pentium III is different from that of the Pentium II. The Pentium III uses an Intel 810, 815, or 820 chip set; the 815 is most commonly found in newer systems. A few chip sets from other vendors are available, but problems with drivers for new peripherals, such as video cards, have been reported. An 840 chip set was also developed for the Pentium III, but Intel does not make it available.
Bus
The Coppermine version of the Pentium III increases the bus speed to either 100 MHz or 133 MHz. The faster version allows transfers between the microprocessor and the memory at higher speeds. Suppose that a 1-GHz microprocessor uses a 133-MHz memory bus.
You might think that the memory bus speed could be made faster to improve performance. However, the connections between the microprocessor and the memory preclude using a higher speed for the memory. If a 200-MHz bus speed were chosen, we must recognize that a wavelength at 200 MHz is 300,000,000/200,000,000, or 3/2, meter. A quarter-wave antenna is 1/4 of a wavelength long, so at 200 MHz an antenna is about 14.8 inches (0.375 m). We do not want to radiate energy at 200 MHz, so we need to keep the printed circuit board connections shorter than 1/4 wavelength. In practice, we would keep the connections to no more than 1/10 of 1/4 wavelength. This means that the connections in a 200-MHz system should be no longer than 1.48 inches. This length would present the main board manufacturer with a problem when placing the sockets for a 200-MHz memory system.
It is possible to approach or even exceed a 200-MHz memory system if we develop a new technology for interconnecting the microprocessor, chip set, and memory. At present, the memory functions in bursts of four 64-bit numbers each time we read the main memory. This burst of 32 bytes is read into the cache. The main memory requires 3 wait states at 100 MHz to access the first 64-bit number and then zero wait states for each of the three remaining 64-bit numbers, for a total of seven 100-MHz bus clocks (70 ns). This means we are reading data at 70 ns / 32 = 2.1875 ns per byte, which corresponds to a transfer rate of about 457M bytes per second. This is slower than the clock of a 1-GHz microprocessor, but because most programs are cyclic and the instructions are stored in the internal cache, we can, and often do, approach the operating frequency of the microprocessor.
PENTIUM IV
The most recent version of the Pentium Pro architecture microprocessor is the Pentium 4 microprocessor from Intel. The Pentium 4 was initially released in November 2000 with a speed of 1.3 GHz. It is currently available in speeds up to 2.0 GHz.
There are two packages available for this microprocessor: the 423-pin PGA and the 478-pin FC-PGA2. Both versions use 0.18 micron technology for fabrication. As with earlier versions of the Pentium, the Pentium 4 uses a 100-MHz memory bus clock, but because the bus is quad pumped (four transfers per clock), the effective bus speed can approach 400 MHz.
Memory Interface
The memory interface to the Pentium 4 typically uses the Intel 850 chip set. The 850 provides a dual-pipe memory bus to the microprocessor, with each pipe interfaced to a 32-bit-wide section of the memory. The two pipes function together to form the 64-bit-wide data path to the microprocessor. Because of the dual-pipe arrangement, the memory must be populated with pairs of RDRAM memory devices operating at either 600 MHz or 800 MHz. According to Intel, this arrangement provides a 300% increase in speed over a memory populated with PC-100 memory.
Hyper Pipelined Technology
The Pentium 4 incorporates a deeper pipelined architecture than prior versions of the Pentium microprocessor. Not only does it queue instructions for execution, but it also queues decoded micro-operations for execution in a special cache for the microprocessor core. This special micro-operation cache (the trace cache) holds up to 12K micro-operations. Because decoded micro-operations are supplied directly from this cache, the instruction decoder is removed from the main path that feeds the execution unit, which increases performance.
RISC Architecture
The instructions supported by CISC processors grew steadily more complex as more sophisticated processors were designed and marketed. This increased the processor die size needed to accommodate the large microcode required by the complex instructions. The larger die in turn meant more cost, since it consumes more silicon. Also, as the chip size increases, the power consumption increases, resulting in more heating of the chip, which in turn requires more elaborate cooling arrangements.
If a processor supports a set of simpler instructions that do not require complex decoding, the processor design becomes simpler, with an associated reduction in cost and power consumption. The execution of these instructions also becomes very fast. As the name implies, a Reduced Instruction Set Computer, or RISC as it is popularly known, is a type of architecture that uses a small, highly optimized set of instructions rather than the more specialized set of instructions often found in other types of architectures. Typically every instruction is executed in a single clock after it is fetched and decoded, so these instructions execute very fast. In a CISC design a lot of die space is consumed by microcode that could otherwise be used for enhanced features. Without this microcode, RISC processors are smaller and consume less energy, so more of them can be produced per silicon wafer.

THE ADVANTAGES OF RISC

Implementing a processor with a simplified instruction set provides several advantages over implementing a comparable CISC design. Some of these advantages are listed below.

(i) RISC instructions, being simple, can be hard-wired, while CISC architectures may have to use micro-programming in order to implement complex instructions.
(ii) A set of simple instructions reduces the complexity of the control unit and the data path; as a consequence, the processor can work at a high clock frequency and thus yields higher speed.
(iii) Because the simpler core occupies less chip area, extra functional units, such as memory management units or floating-point arithmetic units, can also be placed on the same chip.
(iv) Smaller chips allow a semiconductor manufacturer to place more parts on a single silicon wafer, which can lower the per-chip cost dramatically.
(v) High-level language compilers produce more efficient code for a RISC processor than for its CISC counterpart, because they tend to use the smaller set of instructions of a RISC computer.
(vi) Shorter design cycle: a new RISC processor can be designed and tested more quickly, since RISC processors are simpler than the corresponding CISC processors.
(vii) Application programmers who use the microprocessor's instructions will find it easier to develop code with a smaller, optimized instruction set.
(viii) Loading and decoding of instructions in a RISC processor is simple and fast, as there is no need to wait until the length of one instruction is known before starting to decode the next one. Decoding is also simplified because the opcode and address fields are located in the same position in all instructions.

BASIC FEATURES OF RISC PROCESSORS

(i) Simple instruction set: In a RISC machine, the instruction set contains simple, basic instructions from which more complex operations can be composed. Thus instructions with low latency are preferred.
(ii) Same-length instructions: Every instruction is the same length, so that it can be fetched in a single operation. The traditional microprocessors from Intel or Motorola support variable-length instructions.
(iii) Single machine-cycle instructions: Most instructions complete in one machine cycle, which allows the processor to handle several instructions at the same time. RISC processors have a CPI (clocks per instruction) of about one (unity), owing to the optimization of each instruction on the CPU and the massive pipelining embedded in a RISC processor.
(iv) Pipelining: Massive pipelining is usually embedded in a RISC processor; pipelining is the key to speeding up RISC machines.
(v) Very few addressing modes and formats: Unlike CISC processors, which have a large number of addressing modes, RISC processors have far fewer addressing modes and support only a few instruction formats.
(vi) Large number of registers: The RISC design philosophy generally incorporates a large number of registers to avoid frequent interactions with memory.
(vii) Microcoding not required: Unlike in a CISC machine, instruction microcoding is not required in a RISC architecture, because the instructions are simple and can easily be built into the hardware.
(viii) Load and Store architecture: The RISC architecture is primarily a Load and Store architecture, implying that all memory accesses take place through Load or Store type operations.
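Several of the features listed above (same-length instructions with fields in fixed positions, register-to-register arithmetic, and memory access only through Load and Store) can be illustrated with a toy interpreter. This is a minimal sketch under stated assumptions: the 32-bit encoding, the opcode numbers, and the functions `encode`, `decode` and `run` are hypothetical and do not describe any real RISC instruction set, and instructions are executed one at a time without modelling a pipeline.

```python
from __future__ import annotations

# Hypothetical 32-bit encoding, invented for illustration only:
#   bits 31-24: opcode | 23-16: rd | 15-8: rs | 7-0: rt or immediate
HALT, LOAD, STORE, ADD, SUB = 0, 1, 2, 3, 4
NUM_REGS = 32


def encode(op: int, rd: int = 0, rs: int = 0, field: int = 0) -> int:
    """Pack four fixed-position fields into one 32-bit instruction word."""
    return (op << 24) | (rd << 16) | (rs << 8) | (field & 0xFF)


def decode(word: int) -> tuple[int, int, int, int]:
    """Every field sits at the same bit position in every instruction,
    so decoding needs only shifts and masks, with no length calculation."""
    return ((word >> 24) & 0xFF, (word >> 16) & 0xFF,
            (word >> 8) & 0xFF, word & 0xFF)


def run(program: list[int], memory: list[int]) -> list[int]:
    """Execute instructions one at a time (no pipelining modelled).
    Returns the final register file; memory is modified in place."""
    regs = [0] * NUM_REGS  # all registers start at 0
    pc = 0
    while True:
        op, rd, rs, field = decode(program[pc])
        pc += 1  # fixed length: the next instruction is always one word on
        if op == HALT:
            return regs
        if op == LOAD:        # only LOAD reads data memory
            regs[rd] = memory[regs[rs] + field]
        elif op == STORE:     # only STORE writes data memory
            memory[regs[rs] + field] = regs[rd]
        elif op == ADD:       # ALU operations work on registers only
            regs[rd] = regs[rs] + regs[field]
        elif op == SUB:
            regs[rd] = regs[rs] - regs[field]
        else:
            raise ValueError(f"unknown opcode {op}")


# Usage: mem[2] = mem[0] + mem[1]; r0 is never written, so it acts as a zero base
memory = [7, 5, 0]
program = [
    encode(LOAD, 1, 0, 0),    # r1 <- mem[r0 + 0]
    encode(LOAD, 2, 0, 1),    # r2 <- mem[r0 + 1]
    encode(ADD, 3, 1, 2),     # r3 <- r1 + r2
    encode(STORE, 3, 0, 2),   # mem[r0 + 2] <- r3
    encode(HALT),
]
run(program, memory)
print(memory)  # [7, 5, 12]
```

Because every instruction is one 32-bit word with its fields in fixed positions, `decode` is a handful of shifts and masks and the program counter simply advances by one word; with a variable-length CISC encoding the decoder would first have to work out where the current instruction ends before it could start on the next one.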