Operating Systems: Internals and Design Principles, 6/E
William Stallings
Chapter 1: Computer System Overview

Given Credits
• Most of the lecture notes are based on the slides from the textbook's companion website: http://williamstallings.com/OS/OS6e.html
• Some of the slides are from Dr. David Tarnoff at East Tennessee State University
• I have modified them and added new slides

Operating System
• Exploits hardware resources
  – one or more processors
  – main memory, disk, and other I/O devices
• Provides a set of services to system users
  – program development, program execution, access to I/O devices, controlled access to files and other resources, etc.

Computer Components: Top-Level View (figure)

Processor Registers
• User-visible registers
  – Enable the programmer to minimize main-memory references by optimizing register use
• Control and status registers
  – Used by the processor to control its own operation
  – Used by privileged OS routines to control the execution of programs

Control and Status Registers
• Program counter (PC)
  – Contains the address of the instruction to be fetched
• Instruction register (IR)
  – Contains the instruction most recently fetched
• Program status word (PSW)
  – Condition codes
  – Interrupt enable/disable
  – Kernel/user mode

Control and Status Registers
• Condition codes or flags
  – Bits set by processor hardware as a result of operations
  – Can be accessed by a program but not altered
  – Example: a condition code bit set following the execution of an arithmetic instruction: positive, negative, zero, or overflow

Instruction Execution
• Two steps:
  – Processor reads (fetches) instructions from memory
  – Processor executes each instruction

Basic Instruction Cycle (figure)

Instruction Fetch and Execute
• The processor fetches the instruction from memory
• The program counter (PC) holds the address of the instruction to be fetched next
• The PC is incremented after each fetch

Instruction Register
• The fetched instruction is loaded into the instruction register (IR)
• An instruction contains bits that specify the action the processor is to take
• Categories of actions: processor-memory, processor-I/O, data processing, control

Characteristics of a Hypothetical Machine (figure)

Example of Program Execution (figure)

Interrupts
• Interrupt the normal sequencing of the processor
• Why do we need interrupts?

Classes of Interrupts (figure)

Interrupts
• Most I/O devices are slower than the processor
  – Without interrupts, the processor has to pause to wait for the device

Program Flow of Control (figures)

Interrupt Stage
• Processor checks for interrupts
• If an interrupt is pending:
  – Suspend execution of the current program
  – Execute the interrupt-handler routine
• (see the code sketch at the end of this section)

Transfer of Control via Interrupts (figure)

Instruction Cycle with Interrupts (figure)

Simple Interrupt Processing (figure)

Changes in Memory and Registers for an Interrupt (figures)

Multiple Interrupts
• What to do if another interrupt happens while we are handling one interrupt?
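The fetch-execute cycle with an interrupt stage, as in the "Instruction Cycle with Interrupts" figure, can be sketched in a few lines of C. This is a minimal illustration, not the textbook's code: the memory size, opcode handling, handler address, and register widths are all assumptions made for the sketch.

    #include <stdint.h>
    #include <stdbool.h>

    #define MEM_SIZE 4096

    static uint16_t memory[MEM_SIZE];      /* main memory: one word per cell */
    static uint16_t pc = 0;                /* program counter: address of next instruction */
    static uint16_t ir;                    /* instruction register: most recently fetched instruction */
    static bool interrupt_pending = false; /* set by (simulated) I/O hardware */
    static bool interrupts_enabled = true; /* PSW interrupt enable/disable bit */

    /* Illustrative handler dispatch, as in "Simple Interrupt Processing":
     * save minimal state, run the handler, resume. A real CPU pushes the
     * PSW and PC onto a stack; the handler address 0x0100 is invented. */
    static void handle_interrupt(void) {
        uint16_t saved_pc = pc;            /* save return address */
        pc = 0x0100;                       /* jump to assumed handler location */
        /* ... interrupt-handler routine would run here ... */
        pc = saved_pc;                     /* restore and resume the interrupted program */
    }

    static void execute(uint16_t instr) {
        /* decode and perform the action: processor-memory, processor-I/O,
         * data processing, or control (details omitted in this sketch) */
        (void)instr;
    }

    void cpu_loop(void) {
        for (;;) {
            ir = memory[pc];               /* fetch stage: read instruction at PC */
            pc++;                          /* PC is incremented after each fetch */
            execute(ir);                   /* execute stage */
            /* interrupt stage: check for pending interrupts between instructions */
            if (interrupts_enabled && interrupt_pending) {
                interrupt_pending = false;
                handle_interrupt();
            }
        }
    }

Note that the interrupt check happens only at instruction boundaries: the current instruction always completes before control transfers to the handler, which is why the interrupted program can be resumed cleanly.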
Sequential Interrupt Processing (figure)

Nested Interrupt Processing (figure)

Multiprogramming
• The processor has more than one program to execute
• The sequence in which the programs are executed depends on their relative priority and on whether they are waiting for I/O
• After an interrupt handler completes, control may not return to the program that was executing at the time of the interrupt

Input/Output Techniques
• Programmed I/O
• Interrupt-driven I/O
• Direct memory access (DMA)
• What are they, and how do they rank in efficiency?

Input/Output Techniques
• Programmed I/O – poll and response
• Interrupt-driven I/O – the I/O module calls for the CPU when needed
• Direct memory access (DMA) – the module has direct access to a specified block of memory

I/O Module Structure (figure)

Programmed I/O
• The CPU has direct control over I/O
• The processor requests an operation with commands sent to the I/O module:
  – Control – telling a peripheral what to do
  – Test – used to check the condition of the I/O module or device
  – Read – obtains data from the peripheral so the processor can read it from the data bus
  – Write – sends data using the data bus to the peripheral
• The I/O module performs the operation
• When completed, the I/O module updates its status registers
• Sensing status involves polling the I/O module's status registers

Programmed I/O (continued)
• The I/O module does not inform the CPU directly
• The CPU may wait, or do something else and come back later
• This wastes CPU time because:
  – the CPU acts as a bridge for moving data between the I/O module and main memory, i.e., every piece of data goes through the CPU
  – the CPU waits for the I/O module to complete the operation
• (a polling sketch appears at the end of this section)

Interrupt-Driven I/O
• Overcomes CPU waiting
• Requires an interrupt service routine
• No repeated CPU checking of the device
• The I/O module interrupts the CPU when ready
• Still requires the CPU to act as a go-between for moving data between the I/O module and main memory

Interrupt-Driven I/O
• Consumes a lot of processor time because every word read or written passes through the processor

Direct Memory Access (DMA)
• Impetus behind DMA:
  – interrupt-driven and programmed I/O require active CPU intervention (all data must pass through the CPU)
  – the transfer rate is limited by the processor's ability to service the device
  – the CPU is tied up managing the I/O transfer

DMA (continued)
• An additional module (hardware) on the bus
• The DMA controller takes over the bus from the CPU for I/O
  – waiting for a time when the processor doesn't need the bus
  – cycle stealing – seizing the bus from the CPU (more common)

DMA Operation
• The CPU tells the DMA controller:
  – whether it will be a read or a write operation
  – the address of the device to transfer data from or to
  – the starting address of the memory block for the data transfer
  – the amount of data to be transferred
• The DMA controller performs the transfer while the CPU does other processing
• The DMA controller sends an interrupt when it completes
• (see the DMA sketch at the end of this section)

Cycle Stealing
• The DMA controller takes over the bus for a cycle
• Transfer of one word of data
• Not an interrupt to CPU operations
• The CPU is suspended just before it accesses the bus, i.e., before an operand or data fetch or a data write
• Slows down the CPU, but not as much as the CPU doing the transfer itself
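To make the polling behavior described under "Programmed I/O" above concrete, here is a minimal C sketch. The memory-mapped register addresses, command codes, and status bits are invented for illustration; a real I/O module defines its own interface.

    #include <stdint.h>

    /* Hypothetical memory-mapped I/O module registers (addresses are
     * illustrative assumptions, not a real device's layout). */
    #define IO_STATUS (*(volatile uint8_t *)0xFF00)
    #define IO_DATA   (*(volatile uint8_t *)0xFF04)
    #define IO_CMD    (*(volatile uint8_t *)0xFF08)

    #define CMD_READ   0x01   /* "Read" command: obtain data from the peripheral */
    #define STATUS_RDY 0x80   /* set by the module when the operation completes */

    /* Programmed I/O: the CPU issues a command, busy-waits (polls) on the
     * module's status register, and moves every byte itself. */
    void pio_read(uint8_t *buf, int n) {
        for (int i = 0; i < n; i++) {
            IO_CMD = CMD_READ;                   /* request the operation */
            while ((IO_STATUS & STATUS_RDY) == 0)
                ;                                /* poll: CPU time is wasted here */
            buf[i] = IO_DATA;                    /* every byte passes through the CPU */
        }
    }

The empty polling loop is exactly the inefficiency the slides point out: the CPU both waits for the device and ferries each byte between the module and memory.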
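The "DMA Operation" slide above lists exactly what the CPU hands to the DMA controller. The sketch below shows that hand-off in C; the register layout, base address, and names are illustrative assumptions, not a real controller's interface.

    #include <stdint.h>

    /* Hypothetical DMA controller registers (layout is an assumption). */
    struct dma_regs {
        volatile uint32_t dir;    /* 0 = device-to-memory (read), 1 = memory-to-device (write) */
        volatile uint32_t dev;    /* address/identifier of the I/O device */
        volatile uint32_t mem;    /* starting address of the memory block */
        volatile uint32_t count;  /* amount of data to transfer, in bytes */
        volatile uint32_t go;     /* writing 1 starts the transfer */
    };

    #define DMA ((struct dma_regs *)0xFF10)   /* assumed base address */

    /* The CPU programs the controller, then returns to other work; the
     * controller moves the block itself (stealing bus cycles) and raises
     * an interrupt when the whole transfer is done. */
    void dma_start_read(uint32_t device, void *dst, uint32_t nbytes) {
        DMA->dir   = 0;                        /* read: device-to-memory */
        DMA->dev   = device;                   /* which device */
        DMA->mem   = (uint32_t)(uintptr_t)dst; /* where in memory */
        DMA->count = nbytes;                   /* how much */
        DMA->go    = 1;                        /* kick off the transfer */
        /* ... CPU does other processing; a completion interrupt arrives later ... */
    }

Contrast this with the polling sketch: here the CPU is involved once per block, not once per byte, which is why DMA is the most efficient of the three techniques.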
Direct Memory Access
• Transfers a block of data directly to or from memory
• An interrupt is sent when the transfer is complete
• Most efficient of the three techniques

The Memory Hierarchy (figure)

Going Down the Hierarchy
• Decreasing cost per bit
• Increasing capacity
• Increasing access time
• Decreasing frequency of access to the memory by the processor

Cache Memory
• Processor speed is faster than memory access speed
• Exploit the principle of locality with a small, fast memory

Cache and Main Memory (figure)

Cache Principles
• The cache contains a copy of a portion of main memory
• The processor first checks the cache
• If the word is not found, a block of memory is read into the cache
• Because of locality of reference, future memory references are likely to fall in that block

Cache/Main-Memory Structure (figure)

Cache Read Operation (figure)

Cache Principles
• Cache size
  – Even small caches have a significant impact on performance
• Block size
  – The unit of data exchanged between cache and main memory
  – A larger block size yields more hits, until the probability of using the newly fetched data becomes less than the probability of reusing the data that has to be moved out of the cache

Cache Principles
• Mapping function
  – Determines which cache location the block will occupy
  – Direct-mapped, fully associative, or N-way set-associative cache
• Replacement algorithm
  – Chooses which block to replace
  – e.g., the least-recently-used (LRU) algorithm

Cache Principles
• Write policy
  – Dictates when the memory write operation takes place
  – Can occur every time the block is updated
  – Can occur only when the block is replaced
    • Minimizes write operations
    • Leaves main memory in an obsolete state

Example Problem
2. [25 pts] This problem concerns the performance of the cache memory in web applications that play media files. Consider a video streaming workload that accesses working sets of size 256 KB sequentially with the following byte-address stream: 0, 2, 4, 6, 8, 10, … Suppose the computer that processes the above stream has a 32 KB direct-mapped L1 cache with a block size of 32 bytes. (A short simulation that checks these answers follows the problem.)

a) What would be the cache miss rate of the address stream above? Show all calculations.
Each 32-byte block holds 32/2 = 16 of the two-byte-spaced addresses, so every 16th access is a miss; the miss rate is 1/16 = 6.25%.

b) If the cache size were changed to 64 KB, what would be the change in the miss rate? Justify your answer.
There would be no change. The stream sweeps the working set sequentially with no reuse, so each block is fetched exactly once; the miss rate depends only on the block size, not on the cache size.

c) If the cache organization were changed to two-way set associative, without changing the block size, would the cache miss rate change? Justify your answer.
There would be no change; the data is fetched in units of blocks from memory and never revisited, so the miss rate is the same.

d) If the cache block size were changed to 16 B, would the miss rate change? If so, what would be the new value?
Yes. Now each block holds only 16/2 = 8 of the accessed addresses, so every 8th access is a miss; the miss rate is 1/8 = 12.5%.

e) Prefetching is a technique that can be used effectively in streaming applications, such as the one described above. Describe how prefetching works and how it impacts the cache miss rate.
In prefetching, cache lines are brought in speculatively, in anticipation of future accesses. This works very well in streaming applications, where memory accesses are sequential and the preloading of a new block can be overlapped with the consumption of the current block. This effectively reduces the miss rate to zero.
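As a sanity check on parts (a), (b), and (d), here is a small, self-contained C program that simulates a direct-mapped cache over the given address stream. It is a generic direct-mapped model written for this problem, not code from the slides; the function and variable names are invented.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Simulate a direct-mapped cache over the stream 0, 2, 4, ... covering
     * a 256 KB working set, and return the miss rate. */
    static double miss_rate(uint32_t cache_bytes, uint32_t block_bytes) {
        uint32_t nsets = cache_bytes / block_bytes;  /* one block per set (direct-mapped) */
        static uint64_t tags[65536];                 /* large enough for the sizes used here */
        static uint8_t  valid[65536];
        memset(valid, 0, sizeof valid);

        uint64_t accesses = 0, misses = 0;
        for (uint32_t addr = 0; addr < 256 * 1024; addr += 2) {
            uint64_t block = addr / block_bytes;     /* block number */
            uint32_t set   = block % nsets;          /* index bits select the set */
            uint64_t tag   = block / nsets;          /* remaining bits form the tag */
            accesses++;
            if (!valid[set] || tags[set] != tag) {   /* miss: fetch the block */
                misses++;
                valid[set] = 1;
                tags[set]  = tag;
            }
        }
        return (double)misses / (double)accesses;
    }

    int main(void) {
        printf("32 KB cache, 32 B blocks: %.2f%% misses\n",
               100.0 * miss_rate(32 * 1024, 32));    /* expected: 6.25%  (part a) */
        printf("64 KB cache, 32 B blocks: %.2f%% misses\n",
               100.0 * miss_rate(64 * 1024, 32));    /* expected: 6.25%  (part b) */
        printf("32 KB cache, 16 B blocks: %.2f%% misses\n",
               100.0 * miss_rate(32 * 1024, 16));    /* expected: 12.50% (part d) */
        return 0;
    }

The block/set/tag decomposition in the loop is the mapping function from the "Cache Principles" slides; because the stream never revisits an address, the simulated miss rate tracks the block size alone, matching the justifications given in parts (b) and (c).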