CHAPTER 2 - COMPUTER SYSTEMS ORGANIZATION

2.1 PROCESSORS
CPU: the brain of the computer.
CPU = ALU + control unit + small high-speed memory (registers).
Most important registers: PC (Program Counter), IR (Instruction Register).
Data path: registers, ALU, and the buses connecting the pieces.
Instructions:
1. Register-register
2. Register-memory
Data path cycle: the process of running two operands through the ALU and storing the result back in one of the registers.

INSTRUCTION EXECUTION (Fetch-decode-execute cycle)
1. Fetch the next instruction from memory into the IR.
2. Change the PC to point to the following instruction.
3. Determine the type of instruction just fetched.
4. If the instruction uses a word in memory, determine where it is.
5. Fetch the word, if needed, into a CPU register.
6. Execute the instruction.
7. Go to step 1.

Should a program be executed by a "hardware" CPU?
Hardware processors ≈ interpreters. (Some hardware machine is still needed to run the interpreter.)
An interpreter breaks the instructions of its target machine into small steps. As a consequence, the machine on which the interpreter runs can be much simpler and less expensive than a hardware processor for the target machine would be. The saving comes from the fact that hardware is being replaced by software.

Simple instructions or complex instructions?
Complex instructions (e.g., direct support for accessing array elements, floating-point arithmetic) are better:
- More powerful, more complex instructions led to faster program execution, even though individual instructions might take longer to execute.
- Individual operations could be overlapped or executed in parallel using different hardware.
Tradeoff: cost.
Expensive, high-performance computers came to have many more instructions than lower-cost ones. However, the rising cost of software development and instruction compatibility requirements created the need to implement complex instructions even on low-cost computers, where cost was more important than speed.
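The fetch-decode-execute steps under INSTRUCTION EXECUTION, and the point that a processor can be replaced by an interpreter running on simpler hardware, can both be illustrated with a small software interpreter. This is a minimal sketch; the toy machine (one accumulator, the opcodes LOAD, ADD, STORE and HALT, and the (opcode, address) instruction format) is hypothetical and not a real instruction set.

```python
# Hypothetical toy machine: opcodes, instruction format and memory
# layout are illustrative assumptions, not a real instruction set.
LOAD, ADD, STORE, HALT = range(4)


def run(memory):
    """Interpret the program in `memory`; instructions are (opcode, address) tuples."""
    pc = 0  # program counter: address of the next instruction
    ac = 0  # accumulator: the single data register
    while True:
        ir = memory[pc]          # 1. fetch the next instruction into the IR
        pc += 1                  # 2. point the PC at the following instruction
        opcode, address = ir     # 3. decode: determine the instruction type
        if opcode in (LOAD, ADD):
            data = memory[address]  # 4-5. locate the memory word and fetch it
        if opcode == LOAD:       # 6. execute the instruction
            ac = data
        elif opcode == ADD:
            ac += data
        elif opcode == STORE:
            memory[address] = ac
        elif opcode == HALT:
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode}")
        # 7. go back to step 1


# Program computing mem[6] = mem[4] + mem[5]; words 4-6 hold data.
program = [(LOAD, 4), (ADD, 5), (STORE, 6), (HALT, 0), 20, 22, 0]
print(run(program)[6])  # prints 42
```

Each toy instruction turns into several steps of the host program, which is the trade described above: simpler hardware does the work, at the cost of speed.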
How can a low-cost computer execute all the complicated instructions of high-performance, expensive machines?
INTERPRETATION
In the 1960s, IBM introduced a family of computers (IBM System/360) with one architecture but many different implementations that could execute the same programs, differing only in price and speed. A direct hardware implementation was used only on the most expensive models; the cheaper models used interpretation.
An important benefit of interpretation: the opportunity to add new instructions at minimal cost.
Another factor in favor of interpretation: the existence of fast read-only memories (control stores) to hold the interpreter.
Example: a typical 68000 instruction took the interpreter 10 instructions, called microinstructions, at 100 nsec each, plus 2 references to main memory at 500 nsec each. The total execution time was then 10 x 100 + 2 x 500 = 2000 nsec. Without the control store (microinstructions fetched from main memory at 500 nsec each), it would have taken 10 x 500 + 2 x 500 = 6000 nsec.

RISC vs CISC
RISC: Reduced Instruction Set Computer
CISC: Complex Instruction Set Computer
In 1980, Patterson and Séquin designed VLSI CPU chips that did not use interpretation: RISC I -> RISC II -> SPARC.
In 1981, Hennessy designed MIPS.
- These differed from commercial processors: since the new CPUs did not have to be backward compatible with existing products, their designers were free to choose new instruction sets that would maximize performance.
Designing instructions that could be started quickly was the key to good performance. How long an instruction actually took mattered less than how many could be started per second.
RISC principle: a computer should have a small number of simple instructions, each executing in one cycle of the data path: fetch two registers, combine them somehow, and store the result back in a register.
- If a RISC machine takes 4 or 5 instructions to do what a CISC machine does in one instruction, but the RISC instructions are 10 times as fast (because they are not interpreted), RISC still wins by a factor of about 2 to 2.5.
- By this time, the speed of main memories had caught up to the speed of read-only control stores, so the interpretation penalty had greatly increased, strongly favoring RISC machines.
RISC -> DEC Alpha
CISC -> Intel
CISC is still preferred because of backward compatibility: billions of dollars have been invested in software for the Intel line.
Intel employed RISC ideas even within its CISC architecture. Starting with the 486, Intel CPUs contain a RISC core that executes the simplest (and usually most common) instructions in a single data path cycle, while interpreting the more complicated ones in the usual CISC way. Common instructions are fast; less common ones are slow.

DESIGN PRINCIPLES OF MODERN COMPUTERS - RISC design principles
1. All instructions are directly executed by hardware: they are not interpreted by microinstructions, which gives high speed.
2. Maximize the rate at which instructions are issued: use parallelism.
3. Instructions should be easy to decode: make instructions regular, fixed length, with a small number of fields. The fewer different formats, the better.
4. Only loads and stores should reference memory:
- operands for all other instructions must come from and return to registers.
- separate instructions move operands between memory and registers; since memory delays are unpredictable, these instructions can be overlapped with other instructions.
5. Provide plenty of registers.

PARALLELISM
1. Instruction level
2. Processor level

INSTRUCTION LEVEL PARALLELISM
1. Pipelining
2. Superscalar architectures: two pipelines are better than one.
Intel 486: one pipeline.
Pentium: two pipelines.
- u pipeline (main pipeline): could execute an arbitrary Pentium instruction.
- v pipeline: could execute only simple integer instructions.
Complex rules determined whether a pair of instructions was compatible, so that both could execute in parallel. Pentium-specific compilers that produced compatible pairs could produce faster-running programs.
Four pipelines are conceivable, but they duplicate too much hardware.
Solution: a superscalar architecture, with a single pipeline feeding multiple functional units.

PROCESSOR LEVEL PARALLELISM
Pipelining and superscalar operation rarely win more than a factor of 5 or 10, and heat and speed-of-light problems limit how much faster a single CPU can be made.
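The ceiling on pipelining gains shows up in simple timing arithmetic. This sketch assumes an idealized k-stage pipeline in which every stage takes one clock cycle and nothing ever stalls; the function names and the numbers (1000 instructions, 5 stages, 2 nsec cycle) are illustrative.

```python
def unpipelined_ns(n_instructions, stages, cycle_ns):
    # Each instruction passes through all stages before the next one starts.
    return n_instructions * stages * cycle_ns


def pipelined_ns(n_instructions, stages, cycle_ns):
    # The first instruction needs `stages` cycles to fill the pipeline;
    # after that, one instruction completes every cycle.
    return (stages + n_instructions - 1) * cycle_ns


def speedup(n_instructions, stages):
    # The cycle time cancels out, so 1 nsec is used for both timings.
    return unpipelined_ns(n_instructions, stages, 1) / pipelined_ns(n_instructions, stages, 1)


print(unpipelined_ns(1000, 5, 2))  # 10000
print(pipelined_ns(1000, 5, 2))    # 2008
print(round(speedup(1000, 5), 2))  # 4.98: approaches 5 but never exceeds it
```

Even in this ideal case the speedup is bounded by the number of stages; real programs also stall on branches and memory delays.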
To get gains of 50, 100 or more, the only way is to design computers with multiple CPUs.
1. Data Parallel Computers
Programs built around loops over arrays are an easy target for parallel computing.
1. SIMD processors: a parallel computer
2. Vector processors: an extension of a single processor
SIMD processors have one control unit and many processing units, which all execute the same instruction on different data.
Modern graphics processing units (GPUs) rely heavily on SIMD processing.
Ex: the Nvidia Fermi GPU has 16 SIMD stream multiprocessors (SMs), each containing 32 SIMD processors. Each cycle, the scheduler selects 2 threads to execute on the SIMD processors. The next instruction from each thread then executes on up to 16 SIMD processors, so each SM performs up to 2 x 16 = 32 operations per cycle. A fully loaded Fermi GPU with 16 SMs will perform 16 x 32 = 512 operations per cycle.
Unlike a SIMD processor, a vector processor performs its operations in a single, heavily pipelined functional unit. Both SIMD processors and vector processors work on arrays of data. A SIMD processor does this by having as many functional units as there are elements to process at once. The vector processor, on the other hand, has the concept of a vector register: a set of conventional registers that can be loaded from memory in a single instruction. For example, a vector addition instruction performs the pairwise addition of the elements of two such vectors by feeding them to a pipelined adder from the two vector registers.
The SSE (Streaming SIMD Extensions) instructions on the Intel Core architecture use this execution model to speed up multimedia and scientific calculations.
2. Multiprocessors
3. Multicomputers
- large number of interconnected computers, each with private memory
- communication by messages
- 2D, 3D, tree, and ring interconnections
4. Hybrid systems
Multiprocessor: easier to program
Multicomputer: easier to build

2.2 PRIMARY MEMORY
Bits:
BCD: 4 decimal digits in a 16-bit word (4 bits per digit).
A 16-bit pure binary number can store 2^16 = 65536 combinations; 16-bit BCD only 10^4 = 10000.
Memory addresses: if an address has m bits, there are 2^m addressable cells.
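The BCD packing described under Bits can be sketched directly: each decimal digit occupies one 4-bit group (nibble) of a 16-bit word, so the word covers only 0 to 9999. The helper names to_bcd and from_bcd are hypothetical.

```python
def to_bcd(n):
    """Pack a number 0..9999 into a 16-bit BCD word, one digit per nibble."""
    if not 0 <= n <= 9999:
        raise ValueError("a 16-bit BCD word holds only 0..9999")
    word, shift = 0, 0
    while n:
        word |= (n % 10) << shift  # lowest decimal digit -> lowest nibble
        n //= 10
        shift += 4
    return word


def from_bcd(word):
    """Unpack a BCD word back into an integer."""
    n, scale = 0, 1
    while word:
        digit = word & 0xF
        if digit > 9:
            raise ValueError("nibble is not a decimal digit")
        n += digit * scale
        word >>= 4
        scale *= 10
    return n


print(hex(to_bcd(1234)))  # 0x1234
print(from_bcd(0x9999))   # 9999
print(2 ** 16, 10 ** 4)   # 65536 10000: binary vs BCD combinations
```

Six of the sixteen patterns in every nibble (1010 to 1111) are never used, which is where BCD's lost combinations go.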
32-bit machine: 32-bit registers (and instructions that operate on 32-bit words).
Byte ordering: big endian, little endian.
- No problem with characters.
- Problem with integers during transfer between machines with different byte orders.
Solution:
- software reverses the bytes of integers, or
- include a header with each data item telling which kind of data it is and its length.

Error-correcting codes: Hamming code.

Cache memory: the most heavily used memory words are kept in the cache.
The locality principle forms the basis of all caching systems.
Main memories and caches are divided into fixed-size blocks: cache lines.
When a cache miss occurs, the entire cache line is loaded from main memory.
Cache design issues:
- cache size
- size of a cache line
- organization of the cache: how does the cache keep track of which memory words are currently being held?
- whether instructions and data are kept in the same cache: unified vs. split caches (split caches allow instructions and data to be accessed in parallel)
- number of caches

Memory Packaging and Types:
SIMM (Single Inline Memory Module): transfers 32 bits at once
DIMM (Dual Inline Memory Module): transfers 64 bits at once
SO-DIMM (Small Outline DIMM): used in notebook computers

2.3 SECONDARY MEMORY
Memory Hierarchies
As we move down the hierarchy, three key parameters increase:
1. The access time gets bigger.
2. Storage capacity increases.
3. Number of bits per dollar increases.

Magnetic Disks
Terms: track, sector, preamble, intersector gap, cylinder, seek, rotational latency.
Outer tracks have more linear distance around them than inner ones. In older drives, manufacturers used the maximum possible linear density on the innermost track and successively lower bit densities on tracks further out. If a disk had 18 sectors per track, each one occupied 20 degrees of arc, no matter which cylinder it was in.
Nowadays, a different strategy is used: cylinders are divided into zones, and the number of sectors per track increases from zone to zone, moving outward from the innermost track. This makes keeping track of the information harder but increases the drive capacity.
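The capacity gain from zones can be checked with a little arithmetic. The geometry below (1000 cylinders, four zones of 250 cylinders with 18, 22, 26 and 30 sectors per track) is invented for illustration; only the 18-sectors-per-track figure comes from the notes.

```python
def sectors_per_surface_uniform(cylinders, sectors_per_track):
    # Older drives: every track has the same number of sectors.
    return cylinders * sectors_per_track


def sectors_per_surface_zoned(zones):
    # zones: (cylinders_in_zone, sectors_per_track), innermost zone first.
    return sum(cylinders * spt for cylinders, spt in zones)


print(360 / 18)  # 20.0 degrees of arc per sector with 18 sectors per track

uniform = sectors_per_surface_uniform(1000, 18)
zoned = sectors_per_surface_zoned([(250, 18), (250, 22), (250, 26), (250, 30)])
print(uniform, zoned)             # 18000 24000
print(round(zoned / uniform, 2))  # 1.33
```

In this made-up geometry, zoning stores a third more sectors on the same surface; the cost is that the controller must know which zone a cylinder belongs to in order to locate a sector.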
Disk controller: a chip that controls the drive. Its tasks include:
- accepting commands from software (READ, WRITE, FORMAT, ...)
- controlling the arm motion
- detecting and correcting errors
- converting 8-bit bytes read from memory into a serial bit stream, and vice versa
- some controllers also handle buffering of multiple sectors, caching sectors, and remapping bad sectors.

IDE Disks (Integrated Drive Electronics)
1. First PC disk (IBM PC XT): a 10-MB Seagate disk. The controller was on a separate board. BIOS calling conventions were used.
2. IDE: the controller was closely integrated with the drive. Problem: sectors are numbered starting at 1, while heads and cylinders are numbered starting at 0. An IDE controller could control two drives.
3. EIDE (Extended IDE): supports LBA (Logical Block Addressing), which numbers sectors consecutively starting at 0. It can control four drives instead of two. The data transfer rate increased from 4 MBps to 16.67 MBps.
4. ATA-3 (AT Attachment)
5. ATAPI-4 (ATA Packet Interface): 33 MBps
ATAPI-5: 66 MBps
ATAPI-6: 100 MBps; 48-bit LBA instead of 28-bit
ATAPI-7: a radical break with the past: Serial ATA (SATA)
- transfers 1 bit at a time over a 7-pin connector
- 150 MBps, rising to 1.5 GBps
- uses 0.5 volts for signaling, which reduces power consumption
- round cables allow better airflow

SCSI Disks (Small Computer System Interface)
SCSI disks do not differ from IDE disks in how their cylinders, tracks, and sectors are organized, but they have a different interface and much higher transfer rates.
Because of their high transfer rates, SCSI disks are used in Sun, HP, and SGI workstations, in Macintoshes, and in high-end Intel PCs, especially network servers.
SCSI is more than just a hard disk interface: it is a bus to which a SCSI controller and up to seven devices can be attached. The controller issues commands to disks and other peripherals. Commands and responses occur in phases, using various control signals to delineate the phases and to arbitrate bus access when multiple devices try to use the bus at the same time.
SCSI allows all the devices to run at once; IDE and EIDE do not.
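The IDE addressing quirk (sectors numbered from 1, heads and cylinders from 0) and the switch to LBA are easiest to see in the usual cylinder/head/sector to LBA conversion. This is a sketch; the function names and the example geometry of 16 heads and 63 sectors per track are assumptions, and the capacity figures assume 512-byte sectors.

```python
def chs_to_lba(cylinder, head, sector, heads, sectors_per_track):
    """Map a (cylinder, head, sector) address to a 0-based logical block address.

    Cylinders and heads count from 0 but sectors count from 1,
    hence the `sector - 1`.
    """
    if not 1 <= sector <= sectors_per_track:
        raise ValueError("sector must be in 1..sectors_per_track")
    if not 0 <= head < heads:
        raise ValueError("head must be in 0..heads-1")
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def max_capacity_bytes(lba_bits, sector_bytes=512):
    # An n-bit LBA can name 2**n sectors.
    return 2 ** lba_bits * sector_bytes


print(chs_to_lba(0, 0, 1, heads=16, sectors_per_track=63))  # 0: first sector on disk
print(chs_to_lba(0, 1, 1, heads=16, sectors_per_track=63))  # 63
print(chs_to_lba(1, 0, 1, heads=16, sectors_per_track=63))  # 1008
print(max_capacity_bytes(28) // 2**30)  # 128 (GiB) with 28-bit LBA
print(max_capacity_bytes(48) // 2**50)  # 128 (PiB) with 48-bit LBA
```

With LBA the software sees one flat sequence of sectors starting at 0, and widening the address from 28 to 48 bits multiplies the addressable capacity by 2^20.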
RAID (Redundant Array of Inexpensive Disks)
- SCSI disks are a good basis for RAID because of their performance.
- Multiple disks appear to the software as a single disk.
- Striping: a technique for distributing data over multiple drives.

Solid-State Disks: nonvolatile flash memory
Advantage: faster than magnetic disks, since there are no moving parts and therefore zero seek time.
Disadvantages: cost, higher failure rate (each flash block can be written only a limited number of times before it wears out).
Wear leveling: every time a new block is written, the destination block is reassigned to a new SSD block that has not been recently written, so that writes are spread over the whole device.

CD-ROMs
CD-Recordables
CD-ROM XA: allows incremental writes
CD-Rewritables
DVD (Digital Versatile Disk): 4.7 GB / 17 GB, 1.4 MBps
Blu-ray: 25 GB / 50 GB, 4.5 MBps

2.4 INPUT/OUTPUT
1. BUSES
Motherboard:
- CPU chip
- some slots into which DIMM modules can be clicked
- a bus etched along its length
- sockets into which the edge connectors of I/O boards can be inserted
Each I/O device has two parts:
1. Controller
2. I/O device
The controller is usually contained on a board plugged into a free slot, except for controllers that are not optional (e.g., the keyboard controller), which are sometimes located on the motherboard. The controller connects to its device by a cable attached to a connector on the back of the box.
What does a controller do?
1. Controls its I/O device and handles bus access for it.
2. Breaks the bit stream coming from the device into units and writes each unit into memory.
3. A controller that reads or writes data to or from memory without CPU intervention is a DMA (Direct Memory Access) controller.
4. Interrupts the CPU when the transfer is complete.

What happens if the CPU and an I/O controller want to use the bus at the same time?
A bus arbiter decides who goes next. In general, I/O devices are given preference. Why? Disks and other moving devices cannot be stopped, and forcing them to wait would result in lost data.
Cycle stealing: when no I/O is in progress, the CPU can have all the bus cycles for itself to reference memory. When some I/O device is also running, that device requests the bus when it needs it and is granted it; those cycles are "stolen" from the CPU, slowing it down.
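The arbitration rule and cycle stealing can be sketched as a fixed-priority arbiter that decides, cycle by cycle, who owns the bus. The device names, the priority order and the per-cycle request sets are hypothetical; real arbiters use various schemes, and this sketch only models the rule that I/O devices beat the CPU.

```python
def arbitrate(requests_per_cycle, priority):
    """Grant the bus, cycle by cycle, to the highest-priority requester.

    requests_per_cycle: one set of requester names per bus cycle.
    priority: requester names from highest to lowest priority.
    """
    grants = []
    for requesters in requests_per_cycle:
        winner = next((dev for dev in priority if dev in requesters), None)
        grants.append(winner)  # None means the bus is idle this cycle
    return grants


# I/O devices outrank the CPU, which wants the bus in every cycle.
priority = ["disk", "printer", "cpu"]
requests = [
    {"cpu"},
    {"cpu", "disk"},
    {"cpu", "disk", "printer"},
    {"cpu", "printer"},  # the printer lost the previous cycle, so it asks again
    {"cpu"},
]
grants = arbitrate(requests, priority)
print(grants)  # ['cpu', 'disk', 'disk', 'printer', 'cpu']
stolen = sum(1 for g in grants if g not in ("cpu", None))
print(stolen)  # 3 cycles stolen from the CPU
```

The CPU only gets the bus in cycles where no I/O device asks for it, which is exactly why heavy I/O slows the CPU down.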
As CPUs, memories, and I/O devices got faster, a problem arose: the bus could no longer handle the load presented to it. Replacing the bus, however, was hard:
- People often upgraded their CPU but wanted to move their printer, scanner, and modem to the new system.
- A huge industry had grown up around providing a vast range of I/O devices for the IBM PC bus.
IBM nevertheless produced a new, faster bus for its PS/2 range of PCs.
Old bus: ISA (Industry Standard Architecture)

PCI and PCIe Buses
Despite the market pressure not to change anything, the old bus was really too slow. Other companies developed machines with multiple buses, one of which was the old ISA bus or its backward-compatible successor, the EISA (Extended ISA) bus. The winner was PCI (Peripheral Component Interconnect), designed by Intel, which put all the patents in the public domain.
PCI was replaced by PCIe (PCI Express), a radical change from the PCI bus:
- 1-bit-wide serial connections (at much higher speeds than 8-, 16-, 32-, or 64-bit parallel transfers)
- PCI: 66 MHz clock rate, 528 MBps transfer rate with 64-bit transfers
- PCIe: 8 GHz, about 1 GBps transfer rate per lane
- point-to-point links allow parallel transfers, whereas on PCI all devices share (and listen on) the same bus.

2. TERMINALS
A computer terminal consists of two parts: a keyboard and a monitor.
Mainframe world: these parts are often integrated into a single device and attached to the main computer by a serial line or over a telephone line.
PC world: these devices are independent.
Keyboards: an interrupt is generated when a key is depressed, and another when it is released.
Touch Screens: infrared, resistive, multitouch
Flat Panel Displays:
TFT (Thin Film Transistor) display: a tiny switching element at each pixel position.
Color displays: optical filters separate the light into its red, green, and blue components.
OLED (Organic Light Emitting Diode) display: consists of layers of electrically charged organic molecules sandwiched between two electrodes.
Video RAM: located on the display's controller card; it holds bitmaps for screen images. Pixels are stored using either the indexed (palette) model or the 3-byte RGB model.
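The difference between the indexed and 3-byte RGB models shows up directly in the size of the video RAM. This sketch assumes a 1920x1080 screen, 8-bit indices into a 256-entry palette, and 3 bytes per palette entry; the function names are hypothetical.

```python
def rgb_frame_bytes(width, height):
    # 3-byte RGB: every pixel stores its own red, green and blue intensities.
    return width * height * 3


def indexed_frame_bytes(width, height, palette_entries=256):
    # Indexed: 1 byte per pixel selects a palette entry; the palette
    # itself stores 3 bytes (R, G, B) per entry.
    return width * height + palette_entries * 3


def expand_indexed(pixels, palette):
    """Turn a row of palette indices into the (R, G, B) triples to display."""
    return [palette[i] for i in pixels]


print(rgb_frame_bytes(1920, 1080))      # 6220800
print(indexed_frame_bytes(1920, 1080))  # 2074368

palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0)]  # black, white, red
print(expand_indexed([2, 0, 1], palette))
# [(255, 0, 0), (0, 0, 0), (255, 255, 255)]
```

Under these assumptions, indexed color needs about a third of the memory, but it can show at most 256 distinct colors at once.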
Video RAM requires a lot of bandwidth: a 1920x1080 display with 3 bytes per pixel at 25 frames/sec needs about 155 MB/sec, more than the original (32-bit, 33 MHz) PCI bus could handle, but PCIe can handle it with ease.

3. MICE
Mechanical mice
Optical mice
Optomechanical mice

4. GAME CONTROLLERS
Based on accelerometers and cameras.

5. PRINTERS
5.1 Monochrome printers: matrix printers, inkjet printers, laser printers
5.2 Color printers: based on the CMYK principle. Color inkjet printers, solid ink printers, color laser printers, wax printers, dye sublimation printers.

6. TELECOMMUNICATION EQUIPMENT
Modems: amplitude modulation, frequency modulation, phase modulation.
Baud rate; simplex, half duplex, full duplex.

Digital Subscriber Line (DSL)
Due to the high bandwidth offered by satellite and cable TV, telcos needed a more competitive product than dialup lines.
On dialup lines, the data rate is limited by a filter at the telco end office that passes only the voice band. With ADSL, this filter is removed, and 4-8 Mbps is possible.
The ADSL modem acts like 250 modems operating in parallel at different frequencies.

Internet over Cable
Each telephone user has a private wire to the telco office, but with cable, hundreds of users share the same cable to the headend.
A cable modem is needed for Internet access.
When a cable modem is turned on, it scans the downstream channels for a packet from the headend. Upon finding such a packet, the new modem announces its presence on one of the upstream channels. The headend then assigns it downstream and upstream channels.
Upstream channels:
- Ranging: the modem determines its distance from the headend, which is needed for proper timing.
- Upstream channels are divided into minislots. Each upstream packet must fit in 1 or more consecutive minislots.
- The headend periodically announces the start of a new round of minislots. This "starting gun" is not heard at all modems simultaneously, because of the propagation time down the cable.
- Modems contend for the minislots.
- Packets go to the headend, which relays them over a dedicated channel to the cable company's main office and then to the ISP.
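Why ranging matters can be seen from a simplified timing model of the upstream minislots. The model assumes that the headend's start-of-round announcement reaches a modem after a one-way delay d and that the modem's packet needs another d to travel back, so a modem that wants its packet to arrive exactly at the start of minislot k must transmit k*S - 2d after hearing the announcement (S = minislot length). The function names, delays and the 16-byte minislot size are hypothetical, and processing delays and the details of real cable standards are ignored.

```python
import math


def send_delay_after_announcement(slot_index, slot_us, one_way_delay_us):
    """How long a modem waits after hearing the start-of-round announcement.

    The announcement arrives one_way_delay_us late, and the packet takes
    another one_way_delay_us to reach the headend, so the modem must
    compensate for the full round trip measured during ranging.
    """
    delay = slot_index * slot_us - 2 * one_way_delay_us
    if delay < 0:
        raise ValueError("slot starts too soon for this modem to reach it")
    return delay


def minislots_needed(packet_bytes, minislot_bytes):
    # A packet must occupy one or more whole, consecutive minislots.
    return math.ceil(packet_bytes / minislot_bytes)


# A modem whose one-way delay is 4 us targets minislot 3 (10 us per slot).
wait = send_delay_after_announcement(3, 10, 4)
heard_at = 4                  # announcement sent at t = 0, heard at t = 4
arrives_at = heard_at + wait + 4
print(wait, arrives_at)       # 22 30: arrives exactly when slot 3 starts
print(minislots_needed(100, 16))  # 7
```

A modem farther from the headend has a larger d and so must transmit earlier relative to the announcement it hears, which is why each modem needs to know its own distance.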
Downstream channels:
- single sender (the headend), so no contention
- time division
- statistical multiplexing
- protection with Reed-Solomon codes
- encryption for security

7. DIGITAL CAMERAS
The film in traditional cameras is replaced by CCDs (Charge-Coupled Devices). When light strikes a CCD, it acquires an electrical charge, which can be read off by an ADC (Analog-to-Digital Converter). One pixel is made up of four CCDs.
Autofocus: analyze the high-frequency information in the image and move the lens until it is maximized, giving the most detail.
Exposure: measure the light falling on the CCDs and adjust so that the light intensity falls in the middle of the CCDs' range.
White balance: color correction.
The CCD values are read out and stored as RGB values in the camera's RAM. The camera's software then applies white balance color correction, applies a noise reduction algorithm, and sharpens the image.
Compression.
Transfer to a PC over a USB or FireWire cable.

8. CHARACTER CODES
ASCII: American Standard Code for Information Interchange
IS 646: adds 128 characters to ASCII
IS 8859: code pages
UNICODE: code points (originally 16 bits)
UTF-8: variable-length codes (1-4 bytes)
- the dominant encoding on the Web
- ASCII characters take 1 byte
- self-synchronizing
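The UTF-8 properties listed above (1 to 4 bytes, ASCII in a single byte, self-synchronization) can be seen in a minimal encoder. This sketch handles one code point at a time and, for brevity, does not reject the surrogate range; the function names are hypothetical, and Python's built-in str.encode("utf-8") is what real code should use (it is used here only to check the sketch).

```python
def utf8_encode(cp):
    """Encode a single Unicode code point as 1-4 UTF-8 bytes."""
    if cp < 0:
        raise ValueError("negative code point")
    if cp < 0x80:       # 7 bits: plain ASCII, 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:      # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp < 0x110000:   # 21 bits: 11110xxx followed by three 10xxxxxx bytes
        return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    raise ValueError("code point beyond U+10FFFF")


def is_continuation(byte):
    # Continuation bytes always look like 10xxxxxx.
    return (byte & 0xC0) == 0x80


def next_char_start(data, i):
    """Self-synchronization: skip continuation bytes to reach a character start."""
    while i < len(data) and is_continuation(data[i]):
        i += 1
    return i


print([len(utf8_encode(cp)) for cp in (0x41, 0xE9, 0x20AC, 0x1F600)])  # [1, 2, 3, 4]
print(utf8_encode(0x20AC) == "\u20ac".encode("utf-8"))  # True
data = utf8_encode(0x20AC) + utf8_encode(0x41)  # euro sign, then "A"
print(next_char_start(data, 1))  # 3: starting mid-character, resync at "A"
```

Because a lead byte announces the length and continuation bytes are always recognizable, a decoder that starts in the middle of a stream loses at most one character.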