EXAMPLE QUESTIONS (SINCE FIRST EXAM)

TDDD93/TEN2 – Large-scale distributed systems and networks
Final Examination: SOME DATE

Time: 240 minutes
Total Marks: 40 (IN THIS CASE THINGS MAY DIFFER FROM 40)
Grade Requirements: Three (20/40); four (28/40); and five (36/40).
Assistance: None (closed book, closed notes, and no electronics)
Instructor: Niklas Carlsson

Instructions:

Read all instructions carefully (including these)!!!!

Some questions have multiple tasks/parts. Please make sure to address all of these.

The total possible marks granted for each question are given in parentheses. The entire test will be graded out of 40. This gives you 10 marks per hour, or six minutes per mark; plan your time accordingly.

This examination consists of a total of 9+1=10 questions. Check to ensure that this exam is complete.

When applicable, please explain how you derived your answers. Your final answers should be clearly stated.

Write answers legibly; no marks will be given for answers that cannot be read easily.

Where a discourse or discussion is called for, be concise and precise.

If necessary, state any assumptions you made in answering a question. However, remember to read the instructions for each question carefully and answer the questions as precisely as possible. Solving the wrong question may result in deductions! It is better to solve the right question incorrectly than the wrong question correctly.

Please write your AID number, exam code, page numbers (even if the questions indicate numbers as well), etc. at the top/header of each page. (This ensures that marks can always be credited to the correct individual, while ensuring that the exam is anonymous.)

Please answer in English to the largest possible extent, and use Swedish or “Swenglish” only as needed to support your answers.

If needed, feel free to bring a dictionary from an official publisher. Hardcopy, not electronic!!
Also, your dictionary is not allowed to contain any notes; only the printed text by the publisher.

Good luck with the exam.

Part A: Distributed Systems

Some example questions ...

7) Question: Mutual exclusion and coordinator selection (4)
Consider a simple scenario in which there are five nodes A, B, C, D, and E. Use a sequence of figures to illustrate and explain the message sequences and coordination between these nodes when (i) node A acts as a central coordinator for a shared memory resource (that all five nodes can use) and nodes B and C decide, almost at the same time, that they want to write to the resource, and (ii) the bully algorithm is used to elect a new coordinator when node A crashes.

8) Question: Lamport’s clock (4)
Assume that you have three processes p1, p2, and p3, which are implementing Lamport’s clocks. Many events take place at these processes, including some messages being sent between the processes. In the figure below, circles specify in-processor events and arrows specify messages sent between processes. Please provide the logical timestamp associated with each event. You can assume that all three clocks start at zero, at the left-most point in time. (Also, explain how the processes would adjust their clocks if using Lamport’s logical clocks.)

[Figure: event timelines labeled “time @ p1”, “time @ p2”, and “time @ p3”.]

8) Question: Transparency (4)
Please give two concrete examples illustrating where and how transparency is implemented in some example distributed system(s), and explain why transparency is important in such a distributed system.

7) Question: Transparency and multi-tier systems (6)
Transparency plays a central role in some distributed systems. Consider a simple multi-tier system with three levels: a user interface, an application server, and two replicated database servers.
Assume these layers are implemented at different geographic locations and that the average round trip times (RTTs) between the machines used in consecutive layers (starting with the top-tier layer) are 60 ms and 20 ms, respectively. Consider a fully synchronous call in which the application server requires 100 ms of total processing and the database requires 220 ms of processing to satisfy the request.

(a) For how long is the client process locked from the moment it makes the request to the application server? You can assume that 6 kB is transferred between the database and the server before the server can respond, and that this transfer requires a TCP connection establishment (done at the initial request by the server) and four TCP messages transferred after the database has finished its processing. You can assume that no packets are lost in the transfer and that the transfer happens during the slow-start phase. Please explain your answer and illustrate it with a figure.

(b) Please explain how a cache could help reduce the response times. Give a concrete example and remember to explain your answers.

Part B: Methodology

Some example questions ...

Question X (4 points)
Consider a web proxy that serves a large population of web clients. Over a long time period during which the system has operated in steady state, it has been observed that on average 20 new user sessions start per minute, and that user sessions last on average 30 minutes. On average, the proxy sees 1000 HTTP requests per second (including embedded images and other drag-along content). Please estimate the average number of requests per user session. (Hint: You may want to use Little’s law twice.)

Question X (4 points)
In class we discussed how simulations, analytic models, and experimental instrumentation can be used to evaluate the performance of both large and small computer systems. Please explain the advantages and disadvantages of each approach.
Also, please provide a concrete system example (e.g., an L1 cache, a webserver, a server cluster, a datacenter, or a wide area distributed system) to discuss and illustrate these tradeoffs.

Question X (4 points)
When designing experiments, it is important to carefully identify the most appropriate factors, levels, and metrics to consider. Consider a researcher wanting to assess the performance of a simple webserver. Please describe the process of selecting factors, levels, and metrics. For this step, please list a set of factors and their levels that you think would be reasonable to consider when wanting to understand how the system performs under varying workloads (load, job sizes, and mix of two job types, for example). Assuming that all metrics that you identified can be measured in parallel, please outline and explain how many experiments you would need to perform in a full factorial experiment.

Question X (4 points)
Power law distributions (and other related “heavy tailed” distributions) are often observed in nature and in large distributed systems. How do “heavy tailed” distributions relate to exponential distributions? Power law relationships are often illustrated using log-log plots. Please use equations to explain why this can be an effective illustration of these distributions. In the context of the Internet topology or a social network, please explain the mapping of the “heavy tails” to routers or people. In other words, which Autonomous Systems (ASes) and people would correspond to the heavy tail?

Part C: Multicore and Parallel Programming

Four classes of practice questions ...

Questions on Parallel Architecture Concepts

PAR-Q1:
Define the following technical terms:
  Heterogeneous multicore processor
  SIMD instructions
  Moore's Law
  Cluster (in high-performance computing)
Be thorough and general. An example is not a definition.
PAR-Q2:
There are three fundamental "walls" for traditional single-threaded CPU technology that led to the currently prevailing concepts of "multicore" and "hardware multithreading". Which ones?

PAR-Q3:
Designs for interconnection networks differ in cost and in other properties such as scalability. Name and briefly describe one type of interconnection network where throughput does not scale up to large numbers of nodes. Motivate your answer.

Thread programming:

THR-Q1:
On a shared-memory system, (low-level) parallel applications could be written using multiple processes (using shared memory for inter-process communication) or using one process with multiple threads. Which of these two is more suitable / more efficient, and why?

THR-Q2:
A naive implementation of spinlocks using test_and_set is likely to lead to performance problems. Sketch one possible (software-controlled) way to resolve the problem.

THR-Q3:
A simple task pool data structure, as introduced in the lecture, could be a queue (e.g., a cyclic buffer) containing pointers to task descriptors, along with push and pop operations that are, traditionally, synchronized by a global mutex lock. Which performance problem could result from this solution if tasks are small and the number of threads is high?

THR-Q4:
The function pthread_mutex_trylock() takes the address of a pthreads mutex variable as its argument and returns an integer (basically, a boolean). What could it be used for?

THR-Q5:
Write a shared-memory parallel program using threads (pseudocode is fine, explain your code) that determines the maximum value in a large array of N elements. Which parallel algorithmic design pattern do you use? Derive the asymptotic worst-case parallel execution time, parallel speed-up, parallel work, and parallel cost for your algorithm as functions (big-O notation) of N and the number P of processors, using the (EREW) PRAM model.
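As a study aid for THR-Q5, the following is a minimal sketch of one possible approach, not a model solution. It assumes a block partitioning of the array followed by a pairwise (tree-shaped) combination of the partial maxima, i.e. a parallel reduction. The names `parallel_max` and `num_workers` are hypothetical and chosen only for illustration. Under the EREW PRAM model, this scheme would typically be analyzed as O(N/P + log P) time, O(N) work, and O(N + P log P) cost, giving a speed-up of O(N / (N/P + log P)); the exam answer should still derive these bounds.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_max(values, num_workers=4):
    """Return the maximum of a non-empty list using a block-wise reduction.

    num_workers is an illustrative parameter standing in for P.
    """
    if not values:
        raise ValueError("values must be non-empty")

    n = len(values)
    p = max(1, min(num_workers, n))   # never use more workers than elements
    chunk = (n + p - 1) // p          # ceiling of n / p

    # Phase 1: each worker scans its own disjoint block (exclusive reads).
    blocks = [values[i:i + chunk] for i in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=p) as pool:
        partial = list(pool.map(max, blocks))

    # Phase 2: combine partial maxima pairwise, halving the count per round.
    # In a PRAM analysis each round is one parallel step (about log P rounds);
    # here the rounds are executed by the calling thread for simplicity.
    while len(partial) > 1:
        pairs = [partial[i:i + 2] for i in range(0, len(partial), 2)]
        partial = [max(pair) for pair in pairs]

    return partial[0]


if __name__ == "__main__":
    data = [3, 17, 5, 42, 8, 23, 42, 1]
    print(parallel_max(data, num_workers=3))
```

Calling `parallel_max(data, num_workers=3)` on the list above splits it into three blocks, computes one maximum per block, and then combines them. In CPython the global interpreter lock limits the actual speed-up of such threads, so the sketch illustrates the structure of the algorithm and not its measured performance.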
Questions on MPI:

MPI-Q1:
The MPI_Barrier operation is a collective communication operation with the following signature:

  int MPI_Barrier ( MPI_Comm comm );

where the current communicator is passed as comm. Once called, the function does not return control to the caller before all p MPI processes belonging to the process group of communicator comm have called MPI_Barrier (with this communicator).

1. Design an implementation of MPI_Barrier using only MPI_Send and MPI_Recv operations. Use C, Fortran, or equivalent pseudocode (explain your language constructs if you are not sure about the right syntax). Which parallel algorithmic design pattern do you use? Explain your code. (Note: There exist several possibilities. Try to design an algorithm that is both time-optimal and work-optimal for large p.)

2. Show how your code behaves over time by drawing a processor-time diagram for p=8 showing when messages are sent from which source to which destination, and what they contain.

3. Analyze asymptotically the (worst-case) parallel execution time, the parallel work, and the parallel cost of your implementation (for arbitrary values of p) as a function of p. You may use the delay model (simple linear cost model for communication delay) for this purpose. If you need to make further assumptions, state them clearly.

Questions on Design and Analysis of Parallel Algorithms

ALG-Q1:
Derive Amdahl's Law and give its interpretation.

ALG-Q2:
What is the difference between relative and absolute parallel speed-up? Which of these is expected to be higher?

ALG-Q3:
Give two different examples of parallel speed-up anomalies.

ALG-Q4:
Describe Foster's PCAM method of designing parallel programs. What are the four main steps, what is their outcome, and which of the steps are platform-specific?
ALG-Q5:
The PRAM (Parallel Random Access Machine) computation model has the simplest possible parallel cost model. Which aspects of a real-world parallel computer does it represent, and which aspects does it abstract from?

Part D: Embedded Systems

Some example questions ...

Question XX (4 points)
Consider an application modelled as the task graph below. Each task, when activated, consumes one message on each input edge and generates, at termination, one message on each output edge. The task graph is executed on the architecture shown in the figure. Execution times of the tasks, when executed on the corresponding processor, are shown in the table. All messages transmitted over the bus, between tasks mapped on different processors, take 2 time units to reach the destination. Communication between tasks mapped to the same processor is considered to take no time. Propose an efficient task mapping (indicate on which processor each task is executed) and a corresponding static (non-preemptive) schedule for the application. Illustrate your schedule as a Gantt chart (similar to the way we captured schedules in Lectures 1 and 2). Try to achieve a maximum delay (the time interval between the start of T1 and the finishing of T7) of 46 time units!

Question XX (3 points)
In the lectures we have particularly emphasized three design steps: architecture selection, task mapping, and elaboration of a schedule. Explain, in short, what each step does. Illustrate the three steps with a small example.

Question XX (3 points)
Dynamic power management. How does it work? What knobs do we need from the underlying hardware platform?

Bonus (only on original exam)

Bonus Question: XXXX (4)
Blah blah

Good luck!!