STUDY GUIDE – FINAL EXAM

Major Topics

1. Storage management and file systems
2. Memory management, including the superpage paper
3. Synchronization: semaphores, mutual exclusion, transactions, data-race detection, etc.
4. Structure and principles of distributed systems: architectures, communication, naming, transparency, replication, fault tolerance
5. Kernel architectures and virtual machine monitors
6. Introductory topics: processes and threads, process/thread state diagram, etc.

Approximately 50% of the test will come from the new material in numbers 1 and 2. Other topics will be covered more or less equally. Don’t forget that similar concepts are discussed in several different places, so for example a question about client-server architecture could fit into several different categories.

Memory Management

1. Understand the basic principles of memory management: motivation, implementation, etc.
2. What are the main problems introduced by page tables? (storage consumption, poor performance). Be able to explain the problems and know the common solutions.
3. How are page tables used to perform address translation (virtual address to physical machine address)?
4. What is the purpose of a Translation Lookaside Buffer (TLB)?
5. What problem is addressed in the superpage paper?
6. What is the difference between a base page and a superpage?
7. Discuss the two main ways an operating system can implement superpages (relocation, reservation) and the advantages and disadvantages of each.
8. Be able to state an argument for and against the use of multiple superpage sizes in a system. What approach is adopted in the paper we read?
9. Understand superpage promotion and demotion, contiguity issues, superpage alignment issues.
10. In the superpage management system developed by Navarro et al., when is an initial reservation made?
11. What guidelines did the Navarro system use to choose an initial reservation size?
12. Explain Distributed Shared Memory and give several benefits.

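Practice sketch for Memory Management items 3 and 4: the code below is one possible illustration of address translation, assuming a single-level page table, 4 KiB pages, and a TLB modeled as a plain dictionary. The names translate, PageFault, PAGE_SIZE, and OFFSET_BITS are hypothetical and do not come from the course materials; real hardware uses multi-level tables and a fixed-size TLB with a replacement policy.

```python
PAGE_SIZE = 4096    # assumed page size (4 KiB); real systems vary
OFFSET_BITS = 12    # log2(PAGE_SIZE)


class PageFault(Exception):
    """Raised when a virtual page has no mapping in the page table."""


def translate(page_table, tlb, vaddr):
    """Translate a virtual address to a physical address.

    page_table and tlb are dicts mapping virtual page number -> frame number.
    The TLB is consulted first; on a miss the page table is walked and the
    result is cached in the TLB (this toy TLB never evicts entries).
    """
    vpn = vaddr >> OFFSET_BITS          # virtual page number (high bits)
    offset = vaddr & (PAGE_SIZE - 1)    # offset within the page (low bits)

    if vpn in tlb:                      # TLB hit: no page-table access needed
        frame = tlb[vpn]
    else:                               # TLB miss: consult the page table
        if vpn not in page_table:
            raise PageFault(f"no mapping for virtual page {vpn}")
        frame = page_table[vpn]
        tlb[vpn] = frame                # cache the translation for next time

    return (frame << OFFSET_BITS) | offset
```

For example, with page_table = {1: 9} and an empty TLB, translate(page_table, {}, 0x1234) splits the address into virtual page 1 and offset 0x234 and returns 0x9234. A second lookup on the same page would be served from the TLB without touching the page table, which is the performance problem the TLB is meant to address.
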
Storage Management and File Systems

1. Know the physical characteristics of a disk (sectors, tracks, cylinders) and the relation between a file block and a disk sector.
2. What are the three components of a disk access? (seek, rotational delay, data transmission time)
3. Which of these components would we seek to minimize if we want to improve performance (disk read/write times)?
4. What is the difference between sequential and random (direct) access patterns for files?
5. What is an i-node?
6. What are some techniques for improving read/write performance in a file system?
7. Define/describe buffering and caching in file systems.
8. Why did FFS introduce cylinder groups? (Consider performance and reliability.)
9. Give an argument for and against large block sizes in file systems. How did FFS address this issue? Compare to the large superpage versus small superpage issue in virtual memory management.
10. What are the major issues that must be addressed in a distributed file system? What is network transparency, and how can it be achieved?
11. Be able to describe the traditional client/server distributed file system architecture, as exemplified in NFS.
12. What is the role of the Virtual File System Layer in NFS?
13. How does the architecture of a cluster file system (for example, the Google File System) differ from a traditional client-server system?
14. Know the roles of the master and the chunk servers in GFS.
15. How is replication handled in GFS?
16. What are some differences between traditional client-server file systems and peer-to-peer file systems?
17. How is the Ivy file system different from most P2P file systems?
18. What is the difference between a stateless and a stateful file system? Be able to give an advantage and disadvantage of each approach.
19. What is a file handle?
20. Understand the meaning of UNIX semantics, session semantics, immutable semantics, and transaction semantics.
21. What measures might client-side software take to address file consistency issues?
22. How do reader locks differ from writer locks in a DFS?
23. What are the advantages and disadvantages of client-side caching in a distributed file system? Compare to the advantages and disadvantages of server-side replication.
24. NFS caching with server control versus open delegation.
25. How are callbacks used in distributed file systems? Consider NFS and Coda.
26. In Coda, how are file replicas maintained?

Introduction/Review

1. Be able to define system call, mode switch, and context switch. Know the purpose/use of each.
2. Be able to reproduce the process state transition diagram and describe the characteristics of each state and the kinds of events that cause state transitions.
3. What is the advantage of providing operating system support for multithreaded processes as opposed to having only a single thread?
4. In what ways are threads supported by user-level libraries better (or worse) than threads supported by the OS kernel?
5. Know the difference between deadlock and starvation.

Kernel Architectures & Virtual Machine Monitors

1. What is the motivation for developing microkernel and other extensible operating system architectures? (What problem do they address?)
2. Exokernel systems are based on the microkernel concept. What is the main mechanism this system uses to support extensibility?
3. SPIN provides a set of core operating system services. What is the main mechanism this system provides to support extensibility?
4. Be able to define virtual machine and virtual machine monitor. Know the difference between full virtualization and para-virtualization.
5. What are the reasons for implementing virtual machine technology?

Distributed Systems

1. What is middleware? How does it contribute to transparency (a single system image) in distributed systems?
2. Be able to list/briefly describe/identify four distributed system goals.
3. Be able to define transparency and identify various types.
4. Discuss scalability in distributed systems and be familiar with various scaling techniques.
5. State two major differences between cluster computer systems and grid computer systems.
6. Be able to discuss/describe centralized (client/server) and decentralized (peer-to-peer) architectures for distributed systems.
7. Be able to identify Distributed Hash Tables and the Chord algorithm; understand their purpose.
8. Be able to define code mobility and give some examples of when code migration would be useful.
9. What is the difference between reliable and unreliable communication?
10. Understand the Remote Procedure Call mechanism.
11. What is location-independent naming? How does the Chord algorithm support it?
12. What are two arguments in favor of data replication? What problems does it introduce?
13. Fault tolerance: definitions of fault tolerance, failure, and error.

Synchronization, Mutual Exclusion, Transactions, Data-race Detection

1. Definitions: mutual exclusion, critical section, data race, other relevant terms.
2. Know the characteristics of the producer/consumer and readers/writers problems.
3. Be able to state the P and V semaphore algorithms that were presented in class, and know how to apply semaphores in situations such as the homework problems or class examples.
4. What is Lamport’s happened-before relation? Know the three components of the definition. Understand the difference between causally related events and concurrent events. How does “happens-before” differ (in purpose or application) from total ordering?
5. What is the biggest shortcoming associated with Lamport’s virtual clocks?
6. What is the advantage of vector clocks over Lamport clocks?
7. Given the four algorithms for ensuring mutual exclusion in a distributed environment, be able to explain (not just state):
   a) how the algorithm works
   b) how a process knows when it can enter the critical section
   c) the main problems associated with the algorithm
   d) how fault tolerant the algorithm is (justify your answer)
   e) how the algorithms compare in terms of messages per entry/exit and synchronization delay
8. Be able to define “transaction”; list, explain, and demonstrate understanding of the four ACID properties (especially atomicity and isolation); and describe what it means for a transaction to “commit”.
9. Define data race.
10. What is the main problem addressed by RaceTrack and Eraser?
11. Be able to compare Eraser and RaceTrack with respect to their techniques (how they work) and their ability to detect data races: does either, or do both, issue false alarms? Does either, or do both, miss any potential data races?
12. Why might a distributed system need an election algorithm?

Question Types

1. Objective: multiple choice, fill-in-the-blank, true-false
2. Short answer: e.g., complete a sentence, write a paragraph, define a term
3. Problems: simple paging questions, clock values, event relations, semaphore usage – things we have seen in homework and on previous tests.
4. Discussion questions: Possibilities include discuss/explain a particular issue we have studied (for example: discuss improvements made by FFS) or evaluate some system for effectiveness (for example: discuss the relative merits of the SPIN operating system as a way of providing extensibility) or compare/contrast two concepts or …. In other words, a discussion question may ask you to recall, organize, and present facts, or it might ask you to contribute some original ideas in the form of drawing comparisons, evaluating, etc. Discussion questions might span several topics from the previous list.
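
Practice sketch for the "clock values" and "event relations" problem types listed above: the code below shows the standard textbook update rules for Lamport clocks and a comparison test for vector timestamps. It is only an illustration and may differ in detail from the version presented in class; the function names (lamport_local_event, lamport_receive, vc_happened_before, vc_concurrent) are hypothetical.

```python
def lamport_local_event(clock):
    """A local event or a send: advance the process's clock by one.

    For a send, the returned value is also the timestamp put on the message.
    """
    return clock + 1


def lamport_receive(clock, msg_timestamp):
    """A receive: jump past both the local clock and the message timestamp."""
    return max(clock, msg_timestamp) + 1


def vc_happened_before(a, b):
    """True if vector timestamp a causally precedes vector timestamp b.

    a precedes b when every component of a is <= the matching component
    of b and the two timestamps are not identical.
    """
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)


def vc_concurrent(a, b):
    """True if neither timestamp precedes the other (and they differ)."""
    return (
        tuple(a) != tuple(b)
        and not vc_happened_before(a, b)
        and not vc_happened_before(b, a)
    )
```

For example, lamport_receive(2, 7) yields 8, so the receive is timestamped later than the send. The vector timestamps (2, 0, 0) and (0, 1, 0) are concurrent because neither is component-wise less than or equal to the other; Lamport clocks alone cannot show this, since two Lamport values can always be compared numerically even when the events are unrelated.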