Advanced Higher Computing
Computer Architecture
Chapter 2: The internal architecture of the microprocessor

The memory address register (MAR)
The MAR holds the address of the memory location currently being accessed. This address is sent out from the processor along the address bus. Addresses never arrive in the MAR from outside the processor: the address bus is a one-way bus, unlike the data bus, which can carry data in either direction.

The memory data register (MDR) - sometimes called the memory buffer register (MBR)
Data transferred to the processor from anywhere else in the system arrives in the MDR. Data which is to be sent out from the processor is sent out from the MDR along the data bus. The MDR forms a tiny buffer between the internal bus and the data bus, which is why it is also known as the memory buffer register.

During an instruction fetch, the data being fetched from memory is a machine code instruction. It arrives in the MDR like any other item of data, and is then transferred to the instruction register.

The instruction register (IR)
The IR holds an instruction while it is being decoded by the control unit. At any instant, the IR holds the machine code instruction which is currently being decoded and executed. A machine code program consists of a series of machine code instructions, held in main memory; these are fetched one by one from memory.

The program counter (PC)
How does the processor know where to find the next instruction to be processed? The program counter holds the address of the next instruction.

In addition, there will be many general purpose registers (GP registers), which, as their name implies, can be used to store any item of data at any time, as required by the current program running in the processor.

Pupil task
Complete questions 10-14 on pages 30 & 31.

To execute a machine code program, it must first be loaded, together with any data that it needs, into main memory (RAM). Once loaded, it is accessible to the CPU, which fetches one instruction at a time, then decodes and executes it. Fetch, decode and execute are repeated until a program instruction to HALT is encountered. This is known as the fetch-execute cycle.

1. Fetch.
The instruction is fetched from the memory location whose address is contained in the Program Counter, and placed in the Instruction Register. The instruction will consist of an operation code and possibly some operands. The operation code determines which operation is carried out; the term opcode is usually used as a shorthand for operation code.

2. Decode.
The pattern of the opcode is interpreted by the Control Unit, and the appropriate actions are taken by the electronic circuitry of the CU. These actions may include the fetching of operands for the instruction from memory or from the general purpose registers.

3. Increment.
The Program Counter is incremented. The size of the increment will depend upon the length of the instruction in the IR. For example, if this instruction was a 3 byte instruction, then the PC would be incremented by 3.

4. Execute.
The instruction in the Instruction Register is executed. This may lead to operands being sent to the ALU for arithmetic processing, and the return of the result to one of the general purpose registers. When a HALT instruction is received, execution of the program ceases.

The fetch phase
1. The contents of the PC are copied into the MAR;
2. The contents of memory at the location designated by the MAR are copied into the MDR;
3. The PC is incremented;
4. The contents of the MDR are copied into the IR.

The execute phase
The execute phase consists of the following steps:
1. Decode the instruction in the IR;
2. Execute the instruction in the IR.

For convenience, we can write this series of steps as a pseudocode representation:

loop forever
    PC → MAR
    [MAR] → MDR
    PC + 1 → PC
    MDR → IR
    Decode IR
    Execute IR
end loop

Note that → means "is copied to", and that [X] means "the contents of the location pointed to by X".
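The fetch-execute steps above can be sketched as a toy simulator. The one-word opcodes here (HALT, LOAD, ADD, STORE) and the 2-byte instruction format are invented purely for illustration; they are not the instruction set of any real processor.

```python
# Toy simulator of the fetch-decode-execute cycle. The instruction
# format (opcode byte + address byte) and the opcodes are invented
# for illustration only.

memory = [0] * 64
# A tiny program: LOAD [20], ADD [21], STORE [22], HALT.
memory[0:8] = [1, 20, 2, 21, 3, 22, 0, 0]
memory[20] = 5
memory[21] = 7

pc = 0        # program counter
acc = 0       # accumulator (standing in for a general purpose register)

while True:
    # Fetch: PC -> MAR, [MAR] -> MDR, MDR -> IR, then increment the PC.
    mar = pc
    opcode, operand = memory[mar], memory[mar + 1]  # IR holds the instruction
    pc += 2                                         # 2-byte instructions

    # Decode and execute.
    if opcode == 0:        # HALT: stop the cycle
        break
    elif opcode == 1:      # LOAD: copy memory contents into the accumulator
        acc = memory[operand]
    elif opcode == 2:      # ADD: add memory contents to the accumulator
        acc += memory[operand]
    elif opcode == 3:      # STORE: copy the accumulator into memory
        memory[operand] = acc

print(memory[22])   # 5 + 7 = 12
```

Note how the loop body mirrors the pseudocode: fetch, increment the PC, then decode and execute, repeating until HALT.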
Pupil task
Web animation on Scholar for: Use of Registers in an Instruction Fetch, and Sequencing the Steps in an Instruction Fetch.

Improving performance
Computer and microprocessor designers are driven by the need to improve computer performance to meet the ever increasing demands of computer users. Early microprocessors had clock speeds measured in kHz (thousands of cycles per second), while modern processors such as the Pentium III are now achieving speeds of over 1 GHz (thousand million cycles per second). Obviously, clock speed is an important factor in determining performance. The table below shows clock speed versus the performance of Intel processors, as measured in Million Instructions Per Second (MIPS). MIPS is now an outdated way to measure performance, but it is the only measure applicable over the whole range.

Intel Processor   Clock Speed   MIPS
8086              8 MHz         0.8
80286             12.5 MHz      2.7
80386DX           20 MHz        6.0
80486DX           25 MHz        20
Pentium           60 MHz        100
Pentium Pro       200 MHz       440

This table shows that performance, as measured by MIPS, has gone up at a higher rate than the clock rate.

Pupil task (30 mins)
Complete the table below, then predict what clock speeds you would expect to be available in the next 5 years.

Intel Processor     Clock Speed   MIPS
Pentium 2 (1997)
Pentium 3 (1999)
Pentium 4 (2000)
Itanium (2001)
Pentium M (2003)

Go to www.intel.com/pressroom/kits/quckreffam.htm or www.intel.com and enter quckreffam into the search box.

Increasing data bus width
Increasing the clock speed will increase the number of data fetches that can be made per second; increasing the data bus width will increase the amount of data that can be fetched each time. A data bus width of only 4 bits meant that it took 2 fetches to fetch a byte from memory to the processor. The Intel 8008 processor (1972) used an 8 bit data bus, and clearly the internal registers (particularly the MDR) had to match this. Later, the Intel 8086 was developed, which used a 16 bit data bus and a matching set of internal registers.
This gave huge improvements in performance, and allowed the development of the first PCs. In 1985, Intel decided to increase the data bus width and internal registers of its processors again, so the 80386 was produced with a 32 bit data bus. 32 bits was the norm for the next 10 years, until the first 64 bit Pentium chip was introduced in 1995. All PC designs since then have made use of 64 bit technology.

NOTE: A similar development has taken place in the Motorola chips which are used in Apple computers, from the early 68000 16-bit architecture through to the current G5 64-bit architecture.

Increasing address bus width
The width of the address bus has no direct effect on performance; instead, it determines the maximum memory address. Address bus widths have also increased steadily over the last few decades, from 16-bit to 32-bit, and now 64-bit.

Multiple buses
The earliest computers had a single system bus connecting the processor with the main memory and peripheral interfaces. This system bus operated at the same speed as the processor. Since then, the data bus width has been stepped up from 8 to 16, 32 and now 64 bits wide. The number of different components within a system has also increased: a modern processor is likely to be connected to a range of peripherals as well as main memory, and these peripherals operate at lower speeds than the processor and main memory. As a result, designs have developed with multiple buses within the system: a very fast "frontside" system bus for main memory, and a slower bus for communication with peripheral devices. The PCI and PCI-X buses are connected to the main system bus by a bus bridge, which controls the traffic on and off the bus. The PCI and PCI-X buses are known as multipoint buses; this means they can have branches to any number of devices. The previous diagram shows how the components interact. The separation of (relatively slow) peripheral traffic on to the PCI bus means that fast data transfers between main memory and the processor are not slowed down.
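The relationships between bus widths and capacity described above are simple arithmetic: an address bus of n lines can address 2^n locations, and a narrow data bus needs several fetches to transfer one data item. A short sketch (the bus widths chosen are just worked examples):

```python
# An n-line address bus can distinguish 2**n different addresses.
for width in (16, 20, 32):
    print(f"{width}-bit address bus: {2 ** width:,} addressable locations")
# 16-bit gives 65,536 locations (64 Kbyte); 20-bit gives 1,048,576
# (1 Mbyte, the 8086); 32-bit gives 4,294,967,296 (4 Gbyte).

# A narrow data bus needs several transfers to move one data item:
# fetches = ceiling(item size / bus width).
def fetches_needed(item_bits, bus_bits):
    return -(-item_bits // bus_bits)    # ceiling division

print(fetches_needed(8, 4))    # 4-bit bus: 2 fetches per byte
print(fetches_needed(64, 16))  # 16-bit bus: 4 fetches per 64-bit item
print(fetches_needed(64, 64))  # 64-bit bus: 1 fetch
```

This is why doubling the data bus width (with matching registers) roughly doubles the amount of data moved per fetch, independently of any increase in clock speed.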
The PCI-X bus, as well as being faster and wider than the original PCI bus, also has a number of special features to maximise performance. These include the prioritisation of data from different devices, which particularly improves the performance of streaming audio and video.

Cache memory
The speeds quoted for data access to main memory sound quite impressive. However, current processors are able to process data even faster than that! One solution to this problem would be to increase the number of registers on the microprocessor itself, so that all the data required would be instantaneously available to the processor. However, this solution is impractical, leading to over-complex and large microprocessor chips. Instead, cache memory is used. Cache memory uses the faster but more expensive static RAM chips, rather than the less expensive, but slower, dynamic RAM chips which are used for most of the main memory. Cache memory is connected to the processor by the "backside" bus. Normally, whole blocks of data are transferred from main memory into cache, while single words are transferred along the backside bus from the cache to the processor.

L1 and L2 cache
Most modern chips also have level 1 (L1) cache. This is similar to L2 cache, but the cache is actually on the same chip as the processor. This means that it is even faster to access than L2 cache. Pentium processors have two L1 caches on the processor: one of these is for caching data, while the other is used for caching instructions (machine code). In the Pentium 4 processor, each of these is 8 Kbytes. Similarly, the PowerPC G4 processor has two 32 Kbyte L1 caches.

As we have seen, memory access is one of the major bottlenecks limiting the performance of computer systems. Many techniques have been devised to overcome this, including the use of SRAM, widening the data bus, using separate buses for memory and peripherals, and the use of L1 and L2 cache. Another technique which can be applied is called interleaving.
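The benefit of transferring whole blocks into cache can be shown with a very simple model. This sketch uses a direct-mapped cache, one common organisation (the block and cache sizes here are arbitrary illustrations, not those of any particular processor):

```python
# Minimal model of a direct-mapped cache. Whole blocks are copied from
# "main memory" into the cache, so later reads of nearby addresses are
# fast hits. The sizes below are arbitrary, for illustration only.

BLOCK_SIZE = 4      # words per block
NUM_LINES = 8       # number of cache lines

cache = {}          # line index -> tag of the block currently held there
hits = misses = 0

def read(address):
    global hits, misses
    block = address // BLOCK_SIZE      # which memory block holds this word
    line = block % NUM_LINES           # which cache line the block maps to
    tag = block // NUM_LINES           # identifies which block is in that line
    if cache.get(line) == tag:
        hits += 1                      # fast: word is already in the cache
    else:
        misses += 1                    # slow: fetch the whole block from RAM
        cache[line] = tag

# Sequential access has good locality: one miss per block of 4 words,
# then 3 hits from the cached block.
for addr in range(32):
    read(addr)

print(hits, misses)   # 24 hits, 8 misses
```

Only the 8 misses go out over the slow path to main memory; the other 24 reads are served from fast SRAM, which is the whole point of caching.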
Memory interleaving
The idea behind interleaving is that memory can be split up into 2 or 4 independent RAM chips. A memory read or write will normally take 3 or 4 clock cycles to perform, but data is actually being transferred along the data bus during only 1 of these clock cycles. The processor has to insert "wait" states to allow for this. If memory interleaving has been implemented, the processor can use a "wait" state to initiate the next memory access, so saving time. In effect, the processor can access the 4 memory chips almost simultaneously, which increases throughput significantly. To make best use of this, successive data items must be stored in different memory chips.

Memory interleaving is tricky to implement for memory fetches, as the processor has to deal with the data that arrives, which may require further processing steps. It is, therefore, more often used for memory writes, where the processor simply sends the data off to memory, and does not have to "worry" about what happens next. For a similar reason, memory interleaving is used to speed up access to video RAM. This is less problematic than main memory, as everything in video RAM is simply data, whereas in main memory, the "data" may be instructions!

Direct Memory Access (DMA)
Direct memory access (DMA) is used when data is being transferred to or from a peripheral device. There are two methods commonly used to transfer data without DMA:
• programmed I/O
• interrupt-driven I/O
The inefficiencies of programmed and interrupt-driven I/O are not too serious under most circumstances, but they become a serious issue when large blocks of data are to be transferred between main memory and a slow peripheral. DMA is a technique which overcomes this.

Diagram of DMA – the DMA controller (DMAC) may be required for exam papers.

Pupil task
Complete questions 18-25 on pages 42 & 43.
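The interleaving scheme described earlier relies on successive addresses landing in different chips. A common arrangement (assumed here for illustration) selects the bank using address mod number-of-banks:

```python
# With memory split across 4 independent banks, consecutive addresses
# map to different banks, so the processor can start the next access
# while an earlier bank is still busy. Bank selection is address mod 4.

NUM_BANKS = 4

def bank_for(address):
    return address % NUM_BANKS

# Consecutive writes each go to a different bank, so they can overlap;
# writes with a stride of 4 all hit bank 0 and must queue up.
print([bank_for(a) for a in range(8)])        # [0, 1, 2, 3, 0, 1, 2, 3]
print([bank_for(a) for a in (0, 4, 8, 12)])   # [0, 0, 0, 0]
```

This is why the notes stress that successive data items must be stored in different memory chips: a badly chosen stride defeats the interleaving entirely.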