Systems Ph.D. Qualifying Exam Spring 2010 (March 30, 2010)

NOTE: PLEASE ATTEMPT 6 OUT OF THE 8 QUESTIONS GIVEN BELOW.
Question 1 (OS Kernels and Hypervisors)
Systems like Exokernel and minimal real-time operating software have much in common
with modern hypervisors, and hypervisor designers have also adopted micro-kernel
techniques in their implementations.
(1) List three concrete similarities mapping to certain mechanisms or principles used
in both types of systems.
(2) Then argue how both types of systems differ from each other, by focusing on new
principles or mechanisms provided in hypervisors. Be concrete here.
(3) Finally, consider the following issue: it is well known that the page structures
used by hypervisors directly affect the performance of the operating systems
using them. There is therefore an interaction between OS and hypervisor page
structures and organizations, and hypervisors should be built so that they can
adapt to it.
Question 2 (Distributed Systems)
Formal methods have played an important role in helping us understand, design, and
implement distributed systems, a case in point being extensive past work on voting and
consensus protocols, failure detection, dealing with unusual failures, etc. In this question,
you are asked to develop a rigorous approach to making adjustments in distributed
applications so that atomicity (either all adjustments are made or none) properties are
guaranteed. This is because you have chosen to implement a distributed approach to
monitoring and then managing large-scale systems that does not rely on a central
controller but instead, permits peers to influence each other concerning the adjustments
necessary to attain some desirable global system property.
(1) Describe a concrete use case, including desirable adjustments to improve some
(you define it) system property.
(2) Describe the basic method(s) with which monitoring and adjustments are carried
out.
(3) Discuss where and where not atomicity of adjustments may be desired.
(4) Formulate the atomicity case more precisely, and describe the mechanism you
choose to implement this property across a distributed message-based system.
(5) Discuss whether atomicity will hold for your mechanism in the presence of
failures. More precisely, discuss which failures you can tolerate vs. which ones
are not tolerable.
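As a point of reference for the all-or-nothing property in parts (4) and (5), the following is a minimal sketch of a two-phase commit round among peers. It is only an illustration under strong assumptions, not a model answer: the `Peer` class, its `limit` field, and `adjust_atomically` are hypothetical names, message passing is replaced by direct method calls, and no failures or timeouts are modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Peer:
    """Hypothetical participant holding one tunable setting."""
    name: str
    value: int
    limit: int                      # largest value this peer will accept
    staged: Optional[int] = None    # adjustment prepared but not yet applied

    def prepare(self, new_value: int) -> bool:
        # Phase 1: stage the adjustment and vote yes only if it is acceptable.
        if new_value <= self.limit:
            self.staged = new_value
            return True
        return False

    def commit(self) -> None:
        # Phase 2 (all voted yes): make the staged adjustment visible.
        if self.staged is not None:
            self.value = self.staged
        self.staged = None

    def abort(self) -> None:
        # Phase 2 (some peer voted no): discard the staged adjustment.
        self.staged = None


def adjust_atomically(peers: List[Peer], new_value: int) -> bool:
    """Apply new_value at every peer, or at none of them."""
    votes = [p.prepare(new_value) for p in peers]
    if all(votes):
        for p in peers:
            p.commit()
        return True
    for p in peers:
        p.abort()
    return False
```

In this sketch a single "no" vote leaves every peer's `value` unchanged. What happens when the coordinator or a peer crashes between the two phases is exactly the kind of case part (5) asks about, and the sketch does not handle it.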
Question 3 (Distributed and High Performance Systems)
It appears that the only way to scale systems or applications is by giving up
something, such as the strong consistency properties provided by small-scale
shared memory multiprocessors but not at all guaranteed in large distributed
systems like those used for web applications.
(1) Provide specific examples from both the parallel and distributed domains
where giving up some global strong property can result in substantial
performance improvement (at least one example from each domain). Be
precise about the properties in question and how implementations do/do not
guarantee them.
(2) A specific useful property (don't use it for 1. above!) is eventual consistency,
often used in web applications, yet it is actually unclear how to exactly
formulate this property. Come up with at least two useful formulations,
referring to specific examples of applications with which you are familiar.
Then describe tradeoffs for these formulations with respect to potential
performance gains when using them.
(3) Speculate on additional applications (at least two) where consistency support
of this kind may be useful.
Question 4 (Embedded Systems)
Traditional embedded systems are typically isolated devices that provide specific
functionality. In contrast, modern embedded systems usually have some kind of network
access capability and become part of a larger distributed system. An example of such
networked embedded systems is smart phones, which have hardware capabilities
comparable to “normal” computers. Some smart phones run specialized OS kernels (e.g.,
Symbian) while others run specialized versions of “normal” OS’s (e.g., Windows
Embedded and Embedded Linux).
(1) Choose a concrete instance of a specialized OS kernel and an instance of a
specialized “normal” OS. Compare their kernel call APIs (functionality) in terms
of similarities and differences. Use an illustrative comparison by choosing one or
two major OS components (e.g., network protocols, file systems, memory
management) for a concrete comparison of one or two kernel calls.
(2) If you compared two OS components in item (1) above, choose one component
for this item. Compare the implementation of the OS component of (a) the
specialized OS kernel (e.g., Symbian), (b) the specialized version of “normal” OS
(e.g., Embedded Linux), and (c) a normal version of the “normal” OS (e.g., a
normal release of Linux). The comparison should be concrete (more detailed than
Wikipedia explanations), and illustrative (not exhaustive) using concrete
examples to explain similarities and differences (no more than 2 pages).
(3) For the 3 OS’s chosen in item (2) for discussion, explain briefly how battery
management is enabled or supported in each OS as an example of their support
for embedded systems. Use smart phones as illustrative environments for the
specialized OS and specialized version of “normal” OS, and a laptop as an
illustrative environment for the “normal” OS.
Question 5 (Autonomic System Management)
With the continued growth of data centers and cloud computing environments, there is a
growing need for autonomic management of large parallel and distributed systems, both
at the system level and at the application level.
(1) Consider highly parallelizable applications such as MapReduce on Hadoop. (a)
Explain how such applications may be decomposed and run on a large number of
virtualized environments. (b) Explain what kinds of system-level facilities are
needed for achieving service level agreements (SLA) on performance (e.g.,
monitoring) and availability (e.g., recovery). (c) Choose a current OS kernel (e.g.,
RedHat Linux) and explain whether it supports the above facilities (part 1.b) or is
lacking in such support.
(2) Performance: Consider more complex applications such as N-tier applications
used in e-commerce, typically including web servers, application servers, and
database servers. When workloads grow, explain the ease (or difficulties) of
scalability of each tier: (a) web servers, (b) application servers, and (c) database
servers. [Hint: comparing with MapReduce may be helpful.] Explain the support
(or lack of support) of current OS kernels (e.g., Linux) for facilitating dynamic
system configuration adaptation due to application scalability requirements. If you
are arguing against OS-level adaptation, you should give an answer (what
facilities are needed and at which system level) to support your arguments for
application-level adaptation.
(3) Availability: Explain the support (or lack of support) of current OS kernels (e.g.,
Linux) for facilitating dynamic system configuration adaptation due to application
recovery requirements such as business continuity. What system-level facilities
would enable seamless recovery for “easily” scalable components such as a web
server? How about a database server? How about a MapReduce application?
(4) Performance: Explain the difficulties of applying classic queuing theory (e.g.,
M/M/1) in describing N-tier applications. [Hint: consider the assumptions made in
mean value analysis.] Explain two alternatives to classic queuing theory that may
be able to model N-tier systems.
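For reference, the M/M/1 model named in part (4) assumes Poisson arrivals at rate $\lambda$, exponentially distributed service times with rate $\mu$, and a single server with an unbounded FIFO queue. Under those assumptions, and only for $\lambda < \mu$, the standard steady-state results are:

```latex
\rho = \frac{\lambda}{\mu}, \qquad
L = \frac{\rho}{1-\rho}, \qquad
W = \frac{1}{\mu-\lambda}, \qquad
L = \lambda W \quad \text{(Little's law)}
```

Here $\rho$ is the server utilization, $L$ the mean number of requests in the system, and $W$ the mean response time. Whether these assumptions hold for a multi-tier request path is the point the question asks you to examine.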
Question 6 (File systems)
File systems are a well-researched topic.
(1) Trace at least three important research advances made in the evolution of file
systems. Justify why these are important advances. Be specific about the
performance problem in the file system that each advance set out to solve.
(2) Building on your answer to part (1), identify how commercial file systems
(drawing examples from Linux, Microsoft, IBM, and Apple) have chosen to
incorporate these research advances.
(3) In a network file system using the client/server model, a centralized server
performs a number of functions. Identify and discuss at least four sources of
non-scalability with this approach.
(4) Discuss solution approaches to address these sources of non-scalability in a
centralized network file system.
Question 7 (Parallel Systems)
Inter-process communication (IPC) and synchronization are at the core of an operating
system for a parallel machine or for that matter a multi- or many-core processor. For this
question, assume a cache coherent NUMA shared memory multiprocessor. Each
processor has a separate TLB, first and second level caches. The processors share a third
level cache. The TLBs are internal to each processor and the hardware does not do TLB
consistency. The operating system on each processor is autonomous and coordinates
with its counterparts on the other processors for IPC and synchronization.
(1) Enumerate and discuss the OS mechanisms needed for managing the shared
memory of a parallel application. Your answer should be complete with respect
to how the memory is allocated (statically and dynamically) and shared among the
threads of the application, how the paging system works, how the OS on each
processor has a consistent view of the page table for an application, how the data
is kept consistent, and so on. In other words, your answer should address how the
OS and the hardware work together to provide a consistent view of the whole
memory hierarchy to the application.
(2) What are the factors that limit the scalability of the OS mechanisms for
communication that you described in part (1)? Consider the NUMA aspect of the
multiprocessor. How can such sources of poor scalability be overcome?
(3) Consider a mutual exclusion lock. What are the typical concerns in implementing a
mutual exclusion lock efficiently? How should such a lock be implemented in the
OS to overcome such concerns?
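To make one such concern concrete: when many processors spin on the same lock word, each failed attempt can generate coherence traffic. The sketch below shows one commonly cited mitigation, a spin lock with randomized exponential backoff. It is illustrative only and rests on loud assumptions: Python's `threading.Lock.acquire(blocking=False)` stands in for a hardware atomic test-and-set, the delay bounds are arbitrary, and `BackoffSpinLock` and `run_workers` are hypothetical names. A real OS lock would be written against the machine's atomic instructions.

```python
import random
import threading
import time


class BackoffSpinLock:
    """Spin lock with randomized exponential backoff (illustrative sketch)."""

    def __init__(self, min_delay=1e-6, max_delay=1e-3):
        # Assumption: a non-blocking acquire models an atomic test-and-set.
        self._flag = threading.Lock()
        self._min_delay = min_delay
        self._max_delay = max_delay

    def acquire(self):
        delay = self._min_delay
        # Each failed attempt waits a random time, then doubles the bound,
        # so contending threads spread out their retries.
        while not self._flag.acquire(blocking=False):
            time.sleep(random.uniform(0, delay))
            delay = min(2 * delay, self._max_delay)

    def release(self):
        self._flag.release()


def run_workers(num_threads=4, increments=200):
    """Increment a shared counter under the lock and return its final value."""
    lock = BackoffSpinLock()
    counter = {"value": 0}

    def worker():
        for _ in range(increments):
            lock.acquire()
            try:
                counter["value"] += 1
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

Backoff trades some latency on lock handoff for less contention. It does nothing for fairness or for NUMA locality, which are separate concerns a fuller answer would need to weigh.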
Question 8 (Transactions in OS)
Time and again, OS researchers have taken a close look at supporting transactions either
directly in the operating system, or as a subsystem for building higher-level OS
services. Quicksilver, Camelot/Mach, LRVM, RioVista are all examples of such
attempts in the past.
(1) Give compelling reasons for supporting transactions for building higher-level
services. Give concrete examples of how they may be used in building at least
two higher-level services.
(2) Give the pros and cons of a "library" approach to supporting transactions versus
making transactions a fundamental unifying concept as is done in
Quicksilver. Your answer should be complete in identifying the strengths and
weaknesses of each approach.
(3) Given the evolution of software systems, does the idea of supporting transactions
in the OS assume more importance or less importance? Once again your answer
should be complete with justification, whichever stand you take.
--- END ---