EXAMPLE QUESTIONS (SINCE FIRST EXAM) Final Examination: SOME DATE

advertisement
EXAMPLE QUESTIONS (SINCE FIRST EXAM)
TDDD93/TEN2 – Large-scale distributed systems and networks
Final Examination: SOME DATE
Time: 240 minutes
Total Marks: 40 (IN THIS CASE THINGS MAY DIFFER FROM 40)
Grade Requirements: Three (20/40); four (28/40); and five (36/40).
Assistance: None (closed book, closed notes, and no electronics)
Instructor: Niklas Carlsson
Instructions:
 Read all instructions carefully (including these)!!!! Some questions have multiple
tasks/parts. Please make sure to address all of these.
 The total possible marks granted for each question are given in parentheses. The
entire test will be graded out of 40. This gives you 10 marks per hour, or six
minutes per mark, plan your time accordingly.
 This examination consists of a total of 9+1=10 questions. Check to ensure that
this exam is complete.
 When applicable, please explain how you derived your answers. Your final
answers should be clearly stated.
 Write answers legibly; no marks will be given for answers that cannot be read
easily.
 Where a discourse or discussion is called for, be concise and precise.
 If necessary, state any assumptions you made in answering a question. However,
remember to read the instructions for each question carefully and answer the
questions as precisely as possible. Solving the wrong question may result in
deductions! It is better to solve the right question incorrectly, than the wrong
question correctly.
 Please write your AID number, exam code, page numbers (even if the questions
indicate numbers as well), etc. at the top/header of each page. (This ensures that
marks always can be accredited to the correct individual, while ensuring that the
exam is anonymous.)
 Please answer in English to largest possible extent, and try to use Swedish or
”Swenglish” only as needed to support your answers.
 If needed, feel free to bring a dictionary from an official publisher. Hardcopy, not
electronic!! Also, your dictionary is not allowed to contain any notes; only the
printed text by the publisher.
 Good luck with the exam.
TDDD93/TEN2 – Large-scale distributed systems and networks
Final Exam: SOME DATE
Part A: Distributed Systems
Some example questions ...
7) Question: Mutual exclusion and coordinator selection (4)
Consider a simple scenario in which there are five nodes A, B, C, D, and E. Use a
sequence of figures to illustrate and explain the message sequences and coordination
between these nodes when (i) node A acts as a central coordinator for a shared memory
resources (that all five nodes can use) and both nodes B and C almost at the same time
decides that they want to write to the resource, (ii) the bully algorithm is used to elect a
new coordinator when node A crash.
8) Question: Lamport’s clock (4)
Assume that you have three processes p1, p2, and p3 which are implementing Lamport’s
clocks. There are many events that take place at these processes, including some
messages being sent between the processes. In the figure below we use circles and
arrows to specify in-processor events and messages being sent between processes,
respectively. Please provide the logical timestamps associated with each event. You can
assume that all three clocks start at zero, at the left-most point in time. (Also, explain
how the processes would adjust their clocks if using Lamport’s logical clocks.)
time @ p1
time @ p2
time @ p3
8) Question: Transparency (4)
Please give two concrete examples illustrating where and how transparency is
implemented in some example distributed system(s), and explain why transparency is
important in such a distributed system.
7) Question: Transparency and multi-tier systems (6)
Transparency plays a central role in some distributed systems. Consider a simple multitier system with three levels: a user interface, an application server, and two replicated
database servers. Assume these layers are implemented at different geographic locations
2
TDDD93/TEN2 – Large-scale distributed systems and networks
Final Exam: SOME DATE
and that the average round trip time (RTT) between the machines used in the consecutive
layers (starting with the top-tier layer) is 60ms and 20ms, respectively. Consider a fully
synchronize call in which the application server require 100ms total processing and the
database require 220ms processing to satisfy the request.
(a) How long time is the client process looked from the moment it makes the request
to the application server? You can assume that 6kB is transferred between the
database and the server before the server can respond, and this transfer require a
TCP connection establishment (done at the initial request by the server) and four
TCP messages transferred after the database is done its processing. You can
assume that no packets are lost in the transfer and that the transfer happens during
slow start phase. Please explain your answer and illustrate with a figure.
(b) Please explain how a cache could help reduce the response times. Give concrete
example and remember to explain your answers.
Part B: Methodology
Some example questions ...
Question X (4 points)
Consider a web proxy that serves a large population with web clients. Over a long time
period over which the system have operated in steady state, it has been observed that on
average 20 new user sessions start per minute, and that user sessions last on average 30
minutes. On average the proxy see a 1000 HTTP request per second (including
embedded images and other drag along content). Please estimate the average number of
requests per user session. (Hint: You may want to use Little’s law twice.)
Question X (4 points)
In class we discussed how simulations, analytic models, and experimental
instrumentation can be used to evaluate the performance of both large and small
computer systems. Please explain the advantages and disadvantages of each approach.
Also, please provide a concrete system example (e.g., an L1 cache, a webserver, a server
cluster, a datacenter, or a wide area distributed system) to discuss and illustrate these
tradeoffs.
Question X (4 points)
When designing experiments, it is important to carefully identify the most appropriate
factors, levels, and metrics to consider. Consider a researcher wanting to assess the
performance of a simple webserver. Please describe the process of selecting factors,
levels, and metrics. For this step, please list a set of factors and their levels that you think
would be reasonable to consider when wanting to understand how the system perform
under varying workloads (load, job sizes, and mix of two job types, for example).
Assuming that all metrics that you identified can be measured in parallel, please outline
3
TDDD93/TEN2 – Large-scale distributed systems and networks
Final Exam: SOME DATE
and explain how many experiments that you would need to perform if performing a full
factor experiment.
Question X (4 points)
Power law distributions (and other related “heavy tailed” distributions) are often
observed in nature and large distributed systems. How does “heavy tailed” distributions
relate to exponential distributions? Power law relationships are often illustrated using
log-log plots. Please use equations to explain why this can be an effective illustration of
these distributions. In the context of the Internet topology or a social network, please
explain the mapping of the “heavy tails” to routers or people. In other words, which
Autonomous Systems (ASes) and people would correspond to the heavy tail?
Part C: Multicore and Parallel Programming
Four classes of practice questions ...
Questions on Parallel Architecture Concepts
PAR-Q1:
Define the following technical terms:
 Heterogeneous multicore processor
 SIMD instructions
 Moore's Law
 Cluster (in high-performance computing)
Be thorough and general. An example is not a definition.
PAR-Q2:
There are three fundamental "walls" for traditional single-threaded CPU technology that
led to the currently prevailing concepts "multicore" and "hardware multithreading".
Which ones?
PAR-Q3:
Designs for interconnection networks differ in cost and in other properties such as
scalability. Name and briefly describe one type of interconnection network where
throughput does not scale up to large numbers of nodes. Motivate your answer.
Thread programming:
THR-Q1:
On a shared-memory system, (low-level) parallel applications could be written using
multiple processes (using shared-memory for inter-process communication) or using one
process with multiple threads. Which of these two is more suitable / more efficient, and
why?
THR-Q2:
4
TDDD93/TEN2 – Large-scale distributed systems and networks
Final Exam: SOME DATE
A naive implementation of spinlocks using test_and_set is likely to lead to performance
problems. Sketch one possible (software-controlled) way to resolve the problem.
THR-Q3:
A simple task pool data structure, as introduced in the lecture, could be a queue (e.g.,
cyclic buffer) containing pointers to task descriptors, along with push and pop operations
that are, traditionally, synchronized by a global mutex lock. Which performance problem
could result from this solution if tasks are small and the number of threads is high?
THR-Q4:
The function pthread_mutex_trylock() takes an address of a pthreads mutex variable as
its argument and returns an integer (basically, a boolean). What could it be used for?
THR-Q5:
Write a shared-memory parallel program using threads (pseudocode is fine, explain your
code) that determines the maximum value in a large array of N elements. Which parallel
algorithmic design pattern do you use? Derive the asymptotic worst-case parallel
execution time, parallel speed-up, parallel work, and parallel cost for your algorithm as
functions (big-O notation) in N and the number P of processors, using the (EREW)
PRAM model.
Questions on MPI:
MPI-Q1:
The \texttt{MPI\_Barrier} operation is a collective communication operation with the
following signature:
int MPI_Barrier ( MPI_Comm comm );
where the current communicator is passed as \texttt{comm}.
Once called, the function does not return control to the caller before all p MPI processes
belonging to the process group of communicator comm have called MPI_Barrier
(with this communicator).
1. Design an implementation of MPI_Barrier using only MPI_Send and
MPI_Recv operations. Use C, Fortran, or equivalent pseudocode (explain your
language constructs if you are not sure about the right syntax). Which parallel
algorithmic design pattern do you use? Explain your code.
(Note: There exist several possibilities. Try to design an algorithm that is both
time-optimal and work-optimal for large p.)
2. Show how your code behaves over time by drawing a processor-time diagram for
p=8 showing when messages are sent from which source to which destination,
and what they contain.
3. Analyze asymptotically the (worst-case) parallel execution time, the parallel work
and the parallel cost of your implementation (for arbitrary values of p) as a
5
TDDD93/TEN2 – Large-scale distributed systems and networks
Final Exam: SOME DATE
function in p. You may use the delay model (simple linear cost model for
communication delay) for this purpose. If you need to make further assumptions,
state them clearly.
Questions on Design and Analysis of Parallel Algorithms
ALG-Q1:
Derive Amdahl's Law and give its interpretation.
ALG-Q2:
What is the difference between relative and absolute parallel speed-up? Which of these is
expected to be higher?
ALG-Q3:
Give two different examples for parallel speedup anomalies.
ALG-Q4:
Describe Foster's PCAM method of designing parallel programs. What are the four main
steps, what is their outcome, and which of the steps are platform-specific?
ALG-Q5:
The PRAM (Parallel Random Access Machine) computation model has the simplestpossible parallel cost model. Which aspects of a real-world parallel computer does it
represent, and which aspects does it abstract from?
Part D: Embedded Systems
Some example questions ...
Question XX (4 points)
Consider an application modelled as the task graph below. Each task, when activated,
consumes one message on each input edge and generates, at termination, one message on
each output edge. The task graph is executed on the architecture shown in the figure.
Execution times of the tasks, when executed on the corresponding processor, are shown
in the table. All messages transmitted over the bus, between tasks mapped on different
processors, consume 2 time units to reach the destination. Communication between tasks
mapped to the same processor is considered to not consume any time.
Propose an efficient task mapping (indicate on which processor each task is executed)
and a corresponding static (nonpreemptive) schedule for the application. Illustrate your
schedule as a Gantt chart (similar to the way we captured schedules in Lecture 1&2).
Try to achieve a maximum delay (the time interval between the start of T1 and the
finishing of T7) of 46 time units!
6
TDDD93/TEN2 – Large-scale distributed systems and networks
Final Exam: SOME DATE
Question XX (3 points)
In the lectures we have particularly emphasized three design steps: architecture selection,
task mapping, elaboration of a schedule. Explain, in short, what each step is doing.
Illustrate the three steps by a small example.
Question XX (3 points)
Dynamic power management. How does it work? What knobs do we need from the
underlying hardware platform?
Bonus (only on original exam)
Bonus Question: XXXX (4)
Blah blah
Good luck!!
7
Download