Systems Ph.D. Qualifying Exam
Spring 2010 (March 30, 2010)

NOTE: PLEASE ATTEMPT 6 OUT OF THE 8 QUESTIONS GIVEN BELOW.

Question 1 (OS Kernels and Hypervisors)

Systems such as Exokernel and minimal real-time operating systems have many commonalities with modern hypervisors, and hypervisor designers have also adopted micro-kernel techniques in their implementations.
(1) List three concrete similarities, each mapping to a specific mechanism or principle used in both types of systems.
(2) Then argue how the two types of systems differ, focusing on new principles or mechanisms provided by hypervisors. Be concrete here.
(3) Finally, consider the following issue: it is well known that the page structures used by hypervisors have direct effects on the performance of the operating systems running on them. There is, therefore, an interaction between OS and hypervisor page structures and organizations. Hypervisors should be built to adapt to this fact.

Question 2 (Distributed Systems)

Formal methods have played an important role in helping us understand, design, and implement distributed systems, a case in point being the extensive past work on voting and consensus protocols, failure detection, dealing with unusual failures, etc. In this question, you are asked to develop a rigorous approach to making adjustments in distributed applications so that atomicity properties (either all adjustments are made or none are) are guaranteed. The need arises because you have chosen to implement a distributed approach to monitoring and then managing large-scale systems that does not rely on a central controller but instead lets peers influence one another regarding the adjustments necessary to attain some desirable global system property.
(1) Describe a concrete use case, including desirable adjustments to improve some system property (you define it).
(2) Describe the basic method(s) by which monitoring and adjustments are carried out.
(3) Discuss where atomicity of adjustments may and may not be desired.
(4) Formulate the atomicity case more precisely, and describe the mechanism you choose to implement this property across a distributed message-based system.
(5) Discuss whether atomicity will hold for your mechanism in the presence of failures. More precisely, discuss which failures you can tolerate and which ones you cannot.

Question 3 (Distributed and High Performance Systems)

It appears that the only way we can scale systems or applications is by giving up something, such as the strong consistency properties provided by small-scale shared memory multiprocessors but not at all guaranteed in large distributed systems like those used for web applications.
(1) Provide specific examples from both the parallel and the distributed domains where giving up some strong global property can result in substantial performance improvement (at least one example from each domain). Be precise about the properties in question and how implementations do or do not guarantee them.
(2) A specific useful property (don't use it for part (1) above!) is eventual consistency, often used in web applications, yet it is unclear exactly how to formulate this property. Come up with at least two useful formulations, referring to specific examples of applications with which you are familiar. Then describe the tradeoffs among these formulations with respect to the potential performance gains from using them.
(3) Speculate on additional applications (at least two) where consistency support of this kind may be useful.

Question 4 (Embedded Systems)

Traditional embedded systems are typically isolated devices that provide specific functionality. In contrast, modern embedded systems usually have some kind of network access capability and become part of a larger distributed system. Smart phones are an example of such networked embedded systems; they have hardware capabilities comparable to “normal” computers.
Some smart phones run specialized OS kernels (e.g., Symbian) while others run specialized versions of “normal” OSs (e.g., Windows Embedded and Embedded Linux).
(1) Choose a concrete instance of a specialized OS kernel and an instance of a specialized “normal” OS. Compare their kernel call APIs (functionality) in terms of similarities and differences. Illustrate the comparison by choosing one or two major OS components (e.g., network protocols, file systems, memory management) and concretely comparing one or two of their kernel calls.
(2) If you compared two OS components in item (1) above, choose one component for this item. Compare the implementation of this OS component in (a) the specialized OS kernel (e.g., Symbian), (b) the specialized version of the “normal” OS (e.g., Embedded Linux), and (c) a normal version of the “normal” OS (e.g., a normal release of Linux). The comparison should be concrete (more detailed than Wikipedia explanations) and illustrative (not exhaustive), using concrete examples to explain similarities and differences (no more than 2 pages).
(3) For the 3 OSs chosen in item (2), briefly explain how battery management is enabled or supported in each OS as an example of its support for embedded systems. Use smart phones as illustrative environments for the specialized OS and the specialized version of the “normal” OS, and a laptop as an illustrative environment for the “normal” OS.

Question 5 (Autonomic System Management)

With the continued growth of data centers and cloud computing environments, there is a growing need for autonomic management of large parallel and distributed systems, at both the system level and the application level.
(1) Consider highly parallelizable applications such as MapReduce on Hadoop.
   (a) Explain how such applications may be decomposed and run on a large number of virtualized environments.
   (b) Explain what kinds of system-level facilities are needed for achieving service level agreements (SLAs) on performance (e.g., monitoring) and availability (e.g., recovery).
   (c) Choose a current OS kernel (e.g., Red Hat Linux) and explain whether it supports the above facilities (part (1)(b)) or lacks such support.
(2) Performance: Consider more complex applications such as the N-tier applications used in e-commerce, typically including web servers, application servers, and database servers. When workloads grow, explain how easy (or difficult) it is to scale each tier: (a) web servers, (b) application servers, and (c) database servers. [Hint: comparing with MapReduce may be helpful.] Explain the support (or lack of support) in current OS kernels (e.g., Linux) for dynamic system configuration adaptation driven by application scalability requirements. If you argue against OS-level adaptation, support your argument for application-level adaptation by stating what facilities are needed and at which system level.
(3) Availability: Explain the support (or lack of support) in current OS kernels (e.g., Linux) for dynamic system configuration adaptation driven by application recovery requirements such as business continuity. What system-level facilities would enable seamless recovery for “easily” scalable components such as web servers? How about a database server? How about a MapReduce application?
(4) Performance: Explain the difficulties of applying classic queuing theory (e.g., M/M/1) to describe N-tier applications. [Hint: consider the assumptions made in mean value analysis.] Explain two alternatives to classic queuing theory that may be able to model N-tier systems.

Question 6 (File Systems)

File systems are a well-researched topic.
(1) Trace at least three important research advances made in the evolution of file systems. Justify why these are important advances.
Be specific about the performance problem in the file system that each of these advances set out to solve.
(2) Building on your answer to part (1), identify how commercial file systems (drawing examples from Linux, Microsoft, IBM, and Apple) have chosen to incorporate these research advances.
(3) In a network file system using the client/server model, a centralized server performs a number of functions. Identify and discuss at least four sources of non-scalability with this approach.
(4) Discuss solution approaches to address these sources of non-scalability in a centralized network file system.

Question 7 (Parallel Systems)

Inter-process communication (IPC) and synchronization are at the core of an operating system for a parallel machine or, for that matter, a multi- or many-core processor. For this question, assume a cache-coherent NUMA shared memory multiprocessor. Each processor has its own TLB and its own first- and second-level caches. The processors share a third-level cache. The TLBs are internal to each processor, and the hardware does not maintain TLB consistency. The operating system on each processor is autonomous and coordinates with its counterparts on the other processors for IPC and synchronization.
(1) Enumerate and discuss the OS mechanisms needed for managing the shared memory of a parallel application. Your answer should be complete with respect to how the memory is allocated (statically and dynamically) and shared among the threads of the application, how the paging system works, how the OS on each processor maintains a consistent view of the page table for an application, how the data is kept consistent, and so on. In other words, your answer should address how the OS and the hardware work together to provide a consistent view of the whole memory hierarchy to the application.
(2) What are the factors that limit the scalability of the OS mechanisms for communication that you described in part (1)? Consider the NUMA aspect of the multiprocessor.
How can such sources of poor scalability be overcome?
(3) Consider a mutual exclusion lock. What are the typical concerns in implementing a mutual exclusion lock efficiently? How should such a lock be implemented in the OS to address these concerns?

Question 8 (Transactions in OS)

Time and again, OS researchers have taken a close look at supporting transactions either directly in the operating system or as a subsystem for building higher-level OS services. Quicksilver, Camelot/Mach, LRVM, and RioVista are all examples of such attempts in the past.
(1) Give compelling reasons for supporting transactions for building higher-level services. Give concrete examples of how they may be used in building at least two higher-level services.
(2) Give the pros and cons of a "library" approach to supporting transactions versus making transactions a fundamental unifying concept, as is done in Quicksilver. Your answer should be complete in identifying the strengths and weaknesses of each approach.
(3) Given the evolution of software systems, does the idea of supporting transactions in the OS assume more or less importance? Once again, your answer should be complete and justified, whichever stand you take.

--- END ---