Systems Area Qualifier Written Exam (2008 Fall)

You should choose 6 out of 9 questions in this written exam. Good luck!
Question 1. Distributed system design
Consider an RPC subsystem.
a) Give a breakdown of the costs involved in performing an RPC in such a
subsystem.
b) As a system designer, what are the avenues available to you for shaving each of
the component costs you identified in part (a)? In discussing such avenues, you
have to clearly state what assumptions you are making about the execution
environment (OS and hardware) to shave the costs, and the pros and cons of your
design choices.
c) Given today’s multi-core platforms, come up with at least one new way of
reducing RPC cost you have not seen in prior RPC papers. Describe and defend it.
Question 2. OS structures
a) "Microkernel-based OSs will always be less efficient than monolithic
kernels." Based on your understanding of OS structures, is this statement True or
False? Explain your answer with technical detail about some specific microkernel
design.
b) To what extent has the idea of configurable operating system kernels (like SPIN)
influenced commercial operating systems? Compare that with the influence of
microkernels (e.g., L3) or ‘thin’ OSs (e.g., Exokernel). (Note: For the sake of this
question, any flavor of Unix and Microsoft Windows fall under commercial OS).
You have to clearly explain how they have been or have not been influential and
back such statements with facts and reasons.
Question 3. Event Processing in Distributed Systems
Researchers in distributed systems have posited that it may not be necessary to precisely
know the causal relationships between events in different processes, but instead, it is
sufficient to understand which set of events is concurrent (i.e., events that could not have
influenced each other and therefore, could not be the ones to cause bugs or give rise to
race conditions). Are algorithms that determine whether or not events are concurrent
less complex, more complex, or equally complex to implement compared with algorithms
that determine causality? Elaborate your answer.
Hint: a good way to start thinking about this point may be to draw time diagrams for sets
of representative events.
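For reference, the distinction between causality and concurrency can be made concrete with a minimal sketch that assumes each event carries a vector timestamp. Vector clocks are one common mechanism and are an assumption here; the question does not prescribe them. The names `happened_before` and `concurrent` and the four sample events are hypothetical, taken from an imagined time diagram with two processes P0 and P1.

```python
from typing import Tuple

# A vector timestamp: one counter per process (assumption: fixed process set).
VectorClock = Tuple[int, ...]


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if the event stamped `a` causally precedes the event stamped `b`."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if neither event could have influenced the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


# Hypothetical time diagram with processes P0 and P1:
e1 = (1, 0)  # P0: local event
e2 = (2, 0)  # P0: sends message m
e3 = (0, 1)  # P1: local event
e4 = (2, 2)  # P1: receives message m

print(happened_before(e2, e4))  # send precedes receive
print(concurrent(e1, e3))       # no message links these two events
```

In this sketch the concurrency test is defined in terms of the causality test, which is one possible starting point for the comparison the question asks for.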
Question 4. Parallel Systems
Multicore platforms have become unavoidable. An issue with future platforms with
hundreds of cores is how to program/organize them in order to attain high levels of
performance. Approaches advanced in the literature include i) forming multiple ‘cells’
comprised of smaller numbers of cores, where each cell has independent failure
properties and is isolated from other cells in terms of performance, ii) adding hardware
features like token busses for inter- and/or intra-cell coordination, and others. In this
question, you are asked to design a programmable collective communication/computation
construct that can be used for cross-cell program coordination, the assumption being that
within a single cell we used standard coordination (e.g., synchronization) primitives but
across different cells we use message-like synchronization using your construct.
a) Describe the design of your programmable coordination construct.
b) Illustrate its use with a simple example of a hypervisor-level service (i.e., the HV
enforces and manages cells). Hint: consider scheduling.
c) Speculate on useful hardware support for your construct.
Question 5. Virtualization
A key problem with system virtualization is I/O. Some recent proposals have resurrected
IBM’s idea of channels and channel processors to ‘fix’ some of the issues with I/O. This
question explores that solution.
a) Using a standard Xen system, explain why and to what extent I/O is a
performance issue. Make sure you discuss both full and para-virtualization
solutions to I/O.
b) Given I/O channels, what capabilities must they have in order to address the
I/O problems you have identified in (a)? Use this description to define your notion
of I/O channel.
c) Speculate on the hardware support needed to improve the viability of this
approach, for standard IA-based architectures. (Hints: Intel’s VT architecture is one
start on this. IBM’s z system has lots of support for this.)
Question 6. Real Time Systems
Real-time systems have time constraints that impose deadlines on task completion;
consequently, the allocation of system resources must take into account the task deadlines.
a) Explain the differences between hard real-time and soft real-time constraints.
b) Give an example of a CPU scheduler that can achieve hard real-time guarantees.
You are required to elaborate your example by answering the following questions:
i) Describe how the scheduler works.
ii) Give a detailed explanation of the assumptions made by the scheduler.
iii) Outline the performance of the scheduler in terms of worst-case achievable
utilization and discuss its overhead.
c) Explain the difficulties that may arise when trying to apply the scheduler you
described in (b) to a distributed system with heterogeneous network connections.
Question 7. Replication and Fault-Tolerance in Distributed Systems
Distributed state sharing is an important capability in distributed file systems with
replication or distributed shared memory systems. Consider the following problem that
would arise when we are concerned about when and how often certain state is accessed
by various nodes of the distributed system. Assume that in addition to reading and
writing common state, the users are also allowed to query when and on what nodes
certain objects were read or written. The returned information may be per node count or
more detailed information on resource utilization, including timestamps. This information
may be used to detect anomalous access to objects, to recover some state information due
to node failure, and so forth. We want any node to be able to access this information even
if some other nodes fail. You are asked to develop an algorithm for making such
information available to certain users under the following conditions:
(a) Node failures are not considered, but consistent results (correct number of reads
and writes) must be provided.
(b) Nodes may experience crash failures and results returned may be stale or may only
reflect operations that executed at non-faulty nodes.
(c) Nodes can experience Byzantine failures. In this case, returned results must be correct
with respect to all reads and writes that happen at non-faulty nodes.
You may have to limit the number of failures in cases (b) and (c). If you do, specify the
maximum number of failures that can be tolerated. If a solution cannot be developed in
the presence of certain kinds of failures, you need to explain why that is the case.
Question 8. Distributed File Systems
Network file systems such as the Andrew File System (AFS) and the World Wide Web
both provide users access to remote files, but the two systems have very different user
interfaces. In network file systems, the user only needs to mount the remote file system
onto the local machine. Then he or she can access remote files just as if they were local.
In WWW, a "browser" sends an "address" (a URL) of the file to a Web Server and
displays the result to the user.
(a) Discuss the major differences between the two systems, and elaborate your answers.
Hint: At a minimum, you need to discuss how the two systems differ in the granularity of
file accesses; the semantics of caching remote files on local disks; and the handling of
concurrent reads and writes of a file.
(b) Since the two systems provide different services to the end users, their
implementations are naturally different. Based on your understanding of the main features
of the AFS file system implementation, suggest an algorithm for implementing the WWW and
discuss why your algorithm is better than the current WWW.
Hint: You do not need to describe how a browser implements the display of the Web
documents, but you should suggest a framework for maintaining a cache of recently
accessed documents to decrease network traffic and for making sure that the documents
in the cache are up-to-date.
(c) If AFS were universally available, would that simplify the task of implementing the
WWW? Elaborate your answer.
Question 9. Specialization
(Read the entire question before starting to answer.) Program specialization is similar to
partial evaluation. A generic program can be specialized when there are _invariants_ that
make some sequence of instructions superfluous, since they always produce the same
results at the end. The idea of program specialization is to replace the sequence of
instructions with the result, thus improving program performance. The main difficulty in
OS specialization is that there are _quasi-invariants_ that remain true almost all of the
time, but that could be invalidated in rare situations.
(a) In their SOSP'95 paper, Pu et al. describe the specialization of the HP-UX file system.
They specialize the read system call using the exclusive sequential access quasi-invariant,
which is considered the most common case in Unix file systems. Give an example of a
quasi-invariant in another OS module that may improve system performance through
specialization. Hint: I/O subsystems are easier candidates.
(b) To guard against the rare cases when quasi-invariants may be invalidated, we need to
insert _guards_ into the system when applying program specialization. For example, in
the specialization of the HP-UX read system call, the authors inserted guards into the open
system call, which may invalidate the exclusive access quasi-invariant when a second process
opens the same file. A less obvious guard is inserted into the dup system call, which may
produce the same result. For the example of a quasi-invariant you gave in sub-question (a),
give two examples of places where you need to guard against quasi-invariant invalidation. If
you believe there is only one guard necessary, present an argument that you don't need
any other guards.
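For reference, the read/open/dup example described in the question can be illustrated with a toy model. This is only a sketch under stated assumptions, not the HP-UX implementation or the paper's code: the class `OpenFile`, the helpers `open_file`, `dup`, and `_guard`, and the `generic_calls` counter are all hypothetical names introduced for illustration. The model assumes that specialization is done by swapping the `read` routine, and that the quasi-invariant holds exactly when one descriptor refers to the file.

```python
class OpenFile:
    """Toy stand-in for a kernel open-file object (hypothetical)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.offset = 0
        self.sharers = 0               # descriptors referring to this file
        self.generic_calls = 0         # how often the slow path ran
        self.read = self._generic_read

    def _generic_read(self, n: int) -> bytes:
        # Generic path: stands in for the per-call locking and sharing
        # checks that the general-purpose routine must perform.
        self.generic_calls += 1
        chunk = self.data[self.offset:self.offset + n]
        self.offset += len(chunk)
        return chunk

    def _specialized_read(self, n: int) -> bytes:
        # Specialized path: valid only while access is exclusive and
        # sequential, so the checks above are omitted.
        chunk = self.data[self.offset:self.offset + n]
        self.offset += len(chunk)
        return chunk


def _guard(f: OpenFile) -> None:
    # Guard: re-check the quasi-invariant and swap in the matching routine.
    f.read = f._specialized_read if f.sharers == 1 else f._generic_read


def open_file(f: OpenFile) -> OpenFile:
    f.sharers += 1
    _guard(f)   # a second open may invalidate exclusive access
    return f


def dup(f: OpenFile) -> OpenFile:
    f.sharers += 1
    _guard(f)   # dup also creates a second reference to the same file
    return f


f = open_file(OpenFile(b"abcdef"))
first = f.read(2)    # served by the specialized routine
dup(f)               # guard fires and restores the generic routine
second = f.read(2)   # served by the generic routine
```

The guards sit in the operations that can break the quasi-invariant rather than in `read` itself, so the common case pays nothing for the check.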