Operating Systems (234123) – Spring-2013 (Homework 3 Wet)

Homework 3 Wet

Due date: Sunday, 9/06/2013 12:30 noon

Teaching assistants in charge: Anastasia Braginsky

Important: this semester the Q&A for the exercise will take place at a public forum only. Please note, the forum is a part of the exercise; any clarification/correction that is published in the forum is a *MUST*. A number of guidelines for using the forum:
- Read previous Q&A carefully before asking a question; repeated questions will probably go without answers.
- Be polite; remember that the course staff does this as a service for the students.
- You’re not allowed to post any kind of solution and/or source code in the forum as a hint for other students. In case you feel that you have to discuss such a matter, please come to the reception hour.
- When posting questions regarding hw3, put them in the hw3 folder.

Note: Start working on the assignment as soon as possible!!! This assignment involves algorithmic design, a lot of code writing, and extensive testing. There will be no postponement.

1. Introduction

Welcome to the world of concurrent data structures! Counting is one of the most basic and natural activities that computers do. However, for concurrent programs running on multi-core/multi-processor machines (particularly on very large shared-memory processors), designing a counting algorithm that scales well is not an easy task. We are going to give you an idea of how to exploit parallelism for counter updates, but you will need to work out many of the synchronization details yourself. We need concrete and correct results.

Pay attention: concurrent counting must deal with lock contention. In the traditional way of counting (a single lock around the counter's update), multiple threads compete with each other for access to a shared variable, and each of them tries to get into the critical section.
Thus, a tremendous amount of running time is wasted on lock contention (the waiting queue of the lock is very long). We want to avoid this problem elegantly by changing the critical region from one memory access point to multiple access points, which reduces the lock contention considerably. Many threads compete to be the one that updates the counter, but the winner thread updates the counter once on behalf of all the threads it was competing with.

This is a better approach because, although traditional counting lets multiple threads reach the same memory access point, only one worker can be inside the critical region at any particular time. This in essence loses the advantage of concurrency, because all threads are still forced to operate sequentially in the critical region. The algorithm presented below allows each thread to concentrate on a separate working field, so the threads can start working at the same time without worrying about each other. This method can achieve very good throughput even though a single operation may be slower.

In order to send bursts of commands to the counter, you will also implement a mechanism called Barrier.

Note that this exercise (in contrast to the previous two) is going to be implemented in User Mode (no kernel changes or compilations).

2. General Description

Counter interface

A counter is just a signed integer that can hold any value allowed for a signed integer. It can be increased by some value (+), decreased by some value (-), or just read. However, we want to truly allow those operations to happen simultaneously for a number of threads. Assume we may have N threads, where N is some power of two and 2 ≤ N ≤ 1024 (2^10).

void Initialize ()
  Name: Initialize
  Description: Initializes the global counter to zero and also builds everything needed for synchronization. The function should not be concurrent.
  Input Parameters: None
  Output Parameters: None
  Comments: This function should be called once, before any concurrent access to the counter is allowed.

void Destroy ()
  Name: Destroy
  Description: Destroys and frees everything. The function should not be concurrent.
  Input Parameters: None
  Output Parameters: None
  Comments: This function should be called once, when concurrent access to the counter structure is no longer possible.

int Increase (int value)
  Name: Increase
  Description: Adds the given value to the counter.
  Input Parameters: A value which is greater than or equal to zero and less than or equal to 32768 (2^15).
  Output Parameters: Old value of the counter.
  Comments: You can assume the input parameter is valid. This interface can be called in parallel with many others. The call should return to the user only after the operation has been applied to the counter.

int Decrease (int value)
  Name: Decrease
  Description: Subtracts the given value from the counter.
  Input Parameters: A value which is greater than or equal to zero and less than or equal to 32768 (2^15).
  Output Parameters: Old value of the counter.
  Comments: You can assume the input parameter is valid. This interface can be called in parallel with many others. The call should return to the user only after the operation has been applied to the counter.

int Read ()
  Name: Read
  Description: Returns a value of the counter that was correct at some point in time between the invocation of Read() and its return.
  Input Parameters: None
  Output Parameters: A counter value.
  Comments: This interface can be called in parallel with many others.

Note that this interface is for your convenience only. We are not going to call the interface directly; we will use the input commands instead (as explained below). However, you are required to implement this interface as requested here.
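The return-value convention of the interface above can be illustrated with a deliberately simple reference model. This is only a sketch: it is single-threaded, it is not thread-safe, and it does not satisfy the concurrency or complexity requirements of the exercise. The variable name counter_value is an assumption made for the illustration. A model of this kind may still be handy as an oracle when checking the results of a burst that contains one command.

```c
#include <assert.h>

/* Single-threaded reference model of the counter interface.
 * NOT thread-safe: it only illustrates what each call returns.
 * counter_value is a hypothetical name chosen for this sketch. */
static int counter_value;

void Initialize(void)
{
    counter_value = 0;
}

void Destroy(void)
{
    /* Nothing to free in this model. */
}

int Increase(int value)
{
    int old = counter_value;

    assert(value >= 0 && value <= 32768);
    counter_value = old + value;
    return old;              /* the OLD value is returned */
}

int Decrease(int value)
{
    int old = counter_value;

    assert(value >= 0 && value <= 32768);
    counter_value = old - value;
    return old;              /* the OLD value is returned */
}

int Read(void)
{
    return counter_value;
}
```

For example, starting from zero, Increase(3) returns 0, a following Decrease(5) returns 3, and a following Read() returns -2.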
In our input commands we are not going to give you any input sequence that might cause counter overflow.

The complexity of the Initialize() operation should be O(N). The complexity of each of the remaining operations should be O(log N), where N is the number of threads that can use the counter. Pay attention that a solution with a single lock is O(N), because all threads may end up waiting in the lock's queue. Such a solution is unacceptable and will be graded zero.

Executing Commands

You receive, through the standard input, several bursts of commands. Every burst contains increase, decrease and read commands. You should read every burst from the input as a whole, and then try to execute all of its commands simultaneously. For this purpose, you should assign a new thread to each command, and destroy the threads upon completion of all commands in the burst. Notice that you must let the threads work simultaneously: you should neither let the threads run one after another, nor let any of the threads breach the boundary of the burst. All threads should work in parallel inside the boundaries of a single burst and terminate before the next burst is read (at which point new threads will be allocated for the new burst).

To implement the burst you will need a barrier. In general, a Barrier synchronization structure is initialized to n, the number of threads it should wait for. Each of the threads to be synchronized on this barrier then calls the barrier() method, which blocks until n threads have reached the barrier (that is, have called barrier()). At that point all n threads are released together and may continue running in any order, according to the scheduler. After that the barrier can be reinitialized to any value.

The design should comply with the following guidelines:
- There should be one thread (called the main thread) that reads the burst from the standard input.
- The main thread creates exactly the needed number of threads (workers) to process the commands in this burst. The main thread initializes the barrier and supplies the input to the threads in the burst (for example, if the burst contains 7 commands then the main thread needs to prepare 7 threads).
- The worker threads wait (on the barrier) until all the workers have been created, and only then start processing their commands. For this you have to design the mechanism called Barrier, explained above.
- After its command has finished, each worker prints the output to the standard output and ends.
- When all the worker threads have finished, the main thread can start reading the next burst.

You should devise a synchronization mechanism that ensures the validity of the counter with the most possible concurrency. Notice again that using a single lock for the counter is not allowed; such a HW will receive 0 points automatically! Additionally, you have to synchronize the input and the output as explained above and below.

As a result, your output will not always be deterministic. Suppose the counter is zero and you have a burst with the following commands:
  (a) Increase (3)
  (b) Decrease (3)
  (c) Read()
Then it is possible that (a) gets the lead and (c) follows, in which case (c) should return “3”, or that (b) gets the lead and (c) follows, in which case (c) should return “-3”. Finally, (c) may get the lead (or come last), in which case it should return “0”. However, if we have:
  Increase (3)
  Decrease (3)
  Barrier
  Read()
then the Read() should always return 0.

3. Detailed description

Your program will consist of a number of threads:
1. The main thread, which reads the bursts from the input, allocates a thread for each command in a burst, and waits for their termination before it reads another burst.
2. The worker threads, each allocated for a single command.
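The Barrier described under Executing Commands could be built in several ways; the following is a minimal sketch using one mutex and one condition variable from the pthread library. All names (barrier_t, barrier_init, barrier_wait, barrier_destroy, run_demo_burst and the demo_* helpers) are assumptions made for this illustration, and error checking of the pthread and malloc calls is omitted for brevity. A round counter lets the same barrier be reused and protects against spurious wakeups.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical reusable barrier: one mutex plus one condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  released;
    int             expected;  /* n: number of threads to wait for */
    int             arrived;   /* threads blocked in the current round */
    unsigned        round;     /* incremented each time the barrier opens */
} barrier_t;

void barrier_init(barrier_t *b, int n)
{
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->released, NULL);
    b->expected = n;
    b->arrived = 0;
    b->round = 0;
}

void barrier_wait(barrier_t *b)
{
    unsigned my_round;

    pthread_mutex_lock(&b->lock);
    my_round = b->round;
    if (++b->arrived == b->expected) {
        /* Last arrival: open the barrier for everyone in this round. */
        b->arrived = 0;
        b->round++;
        pthread_cond_broadcast(&b->released);
    } else {
        /* Loop on the round number to tolerate spurious wakeups. */
        while (my_round == b->round)
            pthread_cond_wait(&b->released, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

void barrier_destroy(barrier_t *b)
{
    pthread_cond_destroy(&b->released);
    pthread_mutex_destroy(&b->lock);
}

/* Demo scaffolding (illustration only, not part of the exercise). */
static barrier_t       demo_barrier;
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int             demo_started;

static void *demo_worker(void *arg)
{
    int *saw_all = arg;

    pthread_mutex_lock(&demo_lock);
    demo_started++;                       /* "I have been created" */
    pthread_mutex_unlock(&demo_lock);

    barrier_wait(&demo_barrier);          /* wait for all workers */

    pthread_mutex_lock(&demo_lock);
    *saw_all = (demo_started == demo_barrier.expected);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Starts n workers; returns how many passed the barrier only after
 * all n workers had started. */
int run_demo_burst(int n)
{
    pthread_t *tids = malloc(n * sizeof *tids);
    int *flags = calloc(n, sizeof *flags);
    int i, ok = 0;

    demo_started = 0;
    barrier_init(&demo_barrier, n);
    for (i = 0; i < n; i++)
        pthread_create(&tids[i], NULL, demo_worker, &flags[i]);
    for (i = 0; i < n; i++) {
        pthread_join(tids[i], NULL);
        ok += flags[i];
    }
    barrier_destroy(&demo_barrier);
    free(flags);
    free(tids);
    return ok;
}
```

With a burst of 7 commands, as in the guideline above, run_demo_burst(7) creates 7 workers and every one of them should observe that all 7 had started before it was released. Whether the main thread itself also waits on the barrier is a design decision that this sketch does not settle.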
The threads must follow these rules:
- No communication is allowed between the main thread and the worker threads other than initialization and notification of completion.
- Only the main thread should access the standard input.
- The threads must not communicate with each other except through the synchronization mechanism defined below.

Synchronization Mechanism: Combining Tree

A combining tree is a synchronization structure that allows multiple threads to reach the critical section via different access points at the leaves. The threads progress from the bottom up. In addition, a thread that is able to progress up the combining tree is required to carry not only its own update, but also the updates of the threads that remained behind (if there were such threads).

A combining tree is a complete binary tree data structure with N/2 leaves, where N is the maximal possible number of threads. Again, instead of having one memory access point, which becomes a performance bottleneck as concurrency increases, a combining tree has many smaller access points. Each working thread is assigned to a leaf node, and each leaf node can have at most 2 threads assigned to it. The counter variable can be updated only by a thread that has progressed past the root of the tree. Figure 1 below shows a combining tree with 4 leaf nodes, which can be concurrently accessed by at most 8 threads.

In the combining tree, if a thread wants to increase the counter, it starts from its leaf node and works its way up the tree to the root. If two threads reach a node at approximately the same time, the thread that arrives first becomes the active (winner) thread and the thread that arrives second becomes the passive thread. The active thread combines its own update request with the passive thread’s request and carries the combined request up to the next level, while the passive thread waits for the active thread to return with the fetched result.
An active thread might become a passive thread later while climbing up the tree. When a thread reaches the root of the tree, it updates the counter’s value, fetches the old value, and passes this value down the tree. Thus, if multiple threads are updating the counter at approximately the same time, the maximum number of threads that compete for access to the counter variable itself is 2, no matter how many threads are making increment/decrement requests. In this way, a combining tree distributes a single access point among all the nodes in the tree; each node becomes a much smaller memory access point, which reduces memory contention.

Figure 1: General presentation of a combining tree. The annotations of the figure are:
- The root, on which the winners from AA and BB compete for updating the counter itself.
- Node AA, on which the winners from A and B compete. Each winner may carry an update for two threads.
- Node A, on which the threads with IDs 0 and 1 can compete. If T0 doesn't arrive, T1 can continue up with only its own update.
- Node BB, on which the winners from C and D can compete. If a competitor is late, the first one should ensure that the second waits for the first to come back down.
- Node B, on which the threads with IDs 2 and 3 compete.
- Node C, on which the threads with IDs 4 and 5 compete (if they arrive at close to the same time).
- Node D, on which the threads with IDs 6 and 7 compete.

More explanations:

The combining of the operations works as follows:
1. A thread Ti arrives at the leaf assigned to it and declares the update it needs to do.
2. After that, Ti needs to compete with another thread that possibly also wants to start from this leaf. Let us assume thread Tj wins the competition (Ti continues to wait).
3. Tj now continues up towards the root, but it needs to promote the combined update.
   For example, if Ti declared that it wants to do +8 and Tj wanted to do -7, then the winner (Tj) now needs to continue and do +1.
4. Tj repeats the procedure from Step 1, this time working on the next node up rather than on the leaf.

The thread that loses the competition (Ti) is in one of two situations:
1. Ti's update was taken by the winner: Ti now needs to wait for Tj's return and then return to the user.
2. Ti's update wasn't taken by the winner, because Ti arrived too late: Ti still needs to wait for Tj's return, but then it enters the competition again with someone else; it will either win, or at least this time its update will surely be taken.

A specific state of the combining tree, while the threads are still climbing up, is presented in Figure 2 below. Pay attention that on the winner's way back, all threads waiting for the winner to return with their updates get the same old counter value found by the root's winner. Also note that on purpose we do not give you all the details; you should think about the rest of the details yourself.

Figure 2: Specific presentation of a combining tree. The annotations of the figure are:
- T1 (with +6 update) and T7 (with -2 update) arrived here simultaneously. T7 won the competition and is updating the counter with +4. T1 should later realize that its update is already done.
- Threads T1 (with +1 update) and T2 (with +5 update) arrived here simultaneously. T1 won the competition and should continue with +6 update.
- Threads T0 (with +6 update) and T1 (with -5 update) arrived here simultaneously. T1 won the competition and should continue with +1 update.
- Thread T7 arrived with -2 update and found no competitor. T7 should continue with -2 update and ensure that another competitor is not going to go up before T7 returns.
- Thread T2 arrived with +5 update and found no competitor. T2 should continue with +5 update and ensure that T3 is not going to go up before T2 returns.
- No thread arrived here.
- T6 arrived here (with +3) after T7 had already left the node with -2 update. T6 should wait here until T7 returns.

Format of Input & Output

The input will be as follows:

  BEGIN
  COMMAND 1
  COMMAND 2
  ….
  BARRIER
  COMMAND 1
  COMMAND 2
  …..
  BARRIER
  END

The input always begins with “BEGIN” and ends with “END”. The “BARRIER” commands separate the different bursts. Between “BARRIER” commands, all commands must be executed simultaneously, and the bursts must be executed sequentially. The input lines are separated by “\n” characters. The bursts do not have to be of equal size. There can be an "empty burst", where two “BARRIER” commands come one after the other.

When each command is executed, a line is written to the output according to the result. The line always consists of the original command (this includes the “BARRIER” command) and the result, in the following manner:

  Input:      Output:
  COMMAND 1   COMMAND 1->RESULT

The possible commands and their return values are formatted as follows:
1. “INCREASE #value”: increases the counter by the given value. The result is the old counter value.
2. “DECREASE #value”: decreases the counter by the given value. The result is the old counter value.
3. “READ”: reads the counter value. The result is the counter value.

An example of a possible output:

  Input         Output
  BEGIN         BEGIN
  INCREASE 3    INCREASE 3->0
  INCREASE 5    INCREASE 5->3
  INCREASE 2    INCREASE 2->3
  READ          READ->8
  BARRIER       BARRIER
  INCREASE 3    READ->10
  DECREASE 5    INCREASE 3->10
  READ          DECREASE 5->10
  READ          READ->8
  END           END

Notice that you have to synchronize the output! When, for example, two threads are done and want to print "READ->8" and "INCREASE 2->0", they have to be able to print them separately, without any mixture of characters. Of course, the order of the output lines might differ from the order of the input commands inside any burst.
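One possible way to keep output lines from mixing is to build the whole line first and then write it while holding a dedicated output mutex. The sketch below assumes this approach; the names output_lock, format_result and print_result are hypothetical, and the 64-byte buffer is an assumption that comfortably fits the longest command ("INCREASE 32768") plus "->" and any int result. It covers only commands that carry a result; in the example above the BEGIN, BARRIER and END lines are echoed without one.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lock that serializes writes to the standard output. */
static pthread_mutex_t output_lock = PTHREAD_MUTEX_INITIALIZER;

/* Builds "COMMAND->RESULT\n" in buf, adding no spaces beyond those
 * already present in the command text. Returns snprintf's result. */
int format_result(char *buf, size_t size, const char *command, int result)
{
    return snprintf(buf, size, "%s->%d\n", command, result);
}

/* Prints one complete output line; lines from different threads
 * cannot interleave because the write happens under output_lock. */
void print_result(const char *command, int result)
{
    char line[64];

    format_result(line, sizeof line, command, result);
    pthread_mutex_lock(&output_lock);
    fputs(line, stdout);
    pthread_mutex_unlock(&output_lock);
}
```

A worker that executed "INCREASE 3" and received the old value 0 would call print_result("INCREASE 3", 0), which writes the line INCREASE 3->0. The output lock is held only for the duration of one write, so it is separate from the synchronization of the counter itself.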
Note that when writing the output line you should add no spaces other than those you read as part of the command.

Finally, you need to compile your program to an executable named “conc_cnt”. The executable takes one parameter, N, which is the maximal number of threads that may be used by this program (it may be that all N threads are never used simultaneously).

Remarks

- We are going to use special software to check for any act of cheating! Be sure you create everything yourself.
- The assignment should be implemented in C and should work in the VMWare running Red Hat Linux 8.0.
- You should use only the pthread library to work with threads.
- Carefully design the synchronization so that it allows the best parallelism possible and prevents deadlocks. If your implementation becomes deadlocked, the penalty in the grade for the whole assignment will be 30 points!
- You have to submit a detailed description of your algorithm. Make sure that you allow maximum parallelism, otherwise you will lose points. In addition, don't forget to write a description of the Barrier, the tree, and the whole system (pay special attention to the simultaneous command execution mechanism and the output printing).
- The suggested work plan: start with creating a design! Think about how you will implement everything before starting to code! Then continue with the implementation of the barrier mechanism, then the tree, and only after testing these start implementing the main thread (parsing of the input).
- We also strongly suggest that you use asserts and/or defensive checks for easy debugging. Pay attention that when you submit the HW, your Makefile should not compile the program for debugging, so the asserts will not slow down the execution. On the other hand, asserts are a very powerful tool for debugging concurrent programs.
- Don't print anything except the required output.
4. Submission

- You should electronically submit a zip file that contains the source files and the Makefile. Its name should be “Makefile”. The Makefile will create an executable named “conc_cnt”. Note that this Makefile should compile your whole code and not only the dispatcher.
- You should submit a printed detailed design of your program, including explanations of the chosen algorithms, synchronization mechanisms, etc. You should also submit a printed version of the source code.
- A file named submitters.txt which includes the ID, name and email of the participating students. The following format should be used:

  Bill Gates bill@t2.technion.ac.il 123456789
  Linus Torvalds linus@gmail.com 234567890
  Steve Jobs jobs@os_is_best.com 345678901

Important Note: Follow the outlined zip structure exactly. In particular, the zip should contain only the following files (no subdirectories):

  zipfile -+
           |
           +- all your source/header files
           |
           +- Makefile
           |
           +- submitters.txt
           |
           +- documentation.pdf

Good luck,
The course staff