Operating Systems (234123) – Spring-2013
(Homework 3 Wet)
Homework 3 Wet
Due date: Sunday, 9/06/2013 12:30 noon
Teaching assistants in charge:
• Anastasia Braginsky
Important: this semester the Q&A for the exercise will take place at a public forum only. Please note,
the forum is a part of the exercise, any clarification/correction that will be published in the forum
is a *MUST*. A number of guidelines to use the forum:
• Read previous Q&A carefully before asking the question; repeated questions will probably go
without answers
• Be polite, remember that course staff does this as a service for the students
• You’re not allowed to post any kind of solution and/or source code in the forum as a hint for other
students; in case you feel that you have to discuss such a matter, please come to the reception hour
• When posting questions regarding hw3, put them in the hw3 folder
Note: Start working on the assignment as soon as possible!!! This assignment involves algorithmic
design, a lot of code writing, and extensive testing. There will be no postponement.
1. Introduction
Welcome to the world of concurrent data structures! 
Counting is one of the most basic and natural activities that computers do. However, for concurrent
programs running on multi-core/multi-processor machines (particularly, on very large shared-memory
processors), designing a counting algorithm that scales well is not an easy task. We are going to give
you an idea of how to exploit parallelism for counter updates, but you will need to work out many of
the synchronization details yourself. We need concrete and correct results.
Note that concurrent counting must deal with lock contention. In the traditional way of counting
(a single lock around the counter's update), multiple threads compete with each other for access to
a shared variable, each of them trying to get into the critical section. Thus, a tremendous amount of
running time is wasted on lock contention (the waiting queue of the lock is very long). We want to
avoid this problem elegantly by spreading the critical region from one memory access point over
multiple points, which reduces the lock contention considerably. Many threads are going to compete to
be the one that updates the counter, but the winner thread is going to update the counter once on
behalf of all the threads it was competing with.
This is a better approach because, although traditional counting lets multiple threads try to access
the same memory access point, only one worker is able to get into the critical region at a particular
time. This in essence loses the advantage of concurrency, because all threads are still forced to
operate sequentially in the critical region. The algorithm presented below lets each thread work on a
separate part of the structure, so threads can start working at the same time without worrying about
each other. This method can achieve very good throughput even though a single operation may be
slower.
In order to send bursts of commands to the counter, you will also implement a mechanism called
Barrier. Note that this exercise (in contrast to the previous two) is going to be implemented in the User
Mode (no kernel changes and compilations).
2. General Description
Counter interface
A counter is just a signed integer that can hold any value representable by a signed int. It can be
increased by some value (+), decreased by some value (-), or just read. However, we want these
operations to truly happen simultaneously for a number of threads. Assume we may have N
threads, where N is some power of two and 2 ≤ N ≤ 1024 (2^10).
void Initialize ()
Name: Initialize
Description: Initializes the global counter to zero and also builds everything needed for
synchronization. The function should not be concurrent.
Input Parameters: None
Output Parameters: None
Comments: This function should be called once, before any concurrent access to the counter is
allowed.
void Destroy ()
Name: Destroy
Description: Destroys and frees everything. The function should not be concurrent.
Input Parameters: None
Output Parameters: None
Comments: This function should be called once, when any concurrent access to the counter
structure is no longer possible.
int Increase (int value)
Name: Increase
Description: Adds the given value to the counter
Input Parameters: A value which is greater than or equal to zero. Also, the value is less than or
equal to 32768 (2^15).
Output Parameters: Old value of the counter
Comments: You can assume the input parameter is valid. This interface can be called in parallel
with many others. The caller should return to user only after this operation is
applied on the counter.
int Decrease (int value)
Name: Decrease
Description: Subtracts the given value from the counter
Input Parameters: A value which is greater than or equal to zero. Also, the value is less than or
equal to 32768 (2^15).
Output Parameters: Old value of the counter
Comments: You can assume the input parameter is valid. This interface can be called in parallel
with many others. The caller should return to user only after this operation is
applied on the counter.
int Read ()
Name: Read
Description: Returns a value of the counter correct for some point in time, between invocation of
the Read() and the return time of the Read()
Input Parameters: None
Output Parameters: A counter value
Comments: This interface can be called in parallel with many others.
Note that this interface is for your convenience only. We are not going to call the interface directly and
we will use the input commands instead (as explained below). However, you are required to implement
this interface as requested here. In our input commands we are not going to give you any input
sequence that might cause counter overflow.
The complexity of the Initialize() operation should be O(N). The complexity of each of the remaining
operations should be O(log N), where N is the number of threads that can use the counter. Note
that a solution with a single lock is O(N), because all threads may wait in the lock's waiting queue.
Such a solution is unacceptable and will receive a grade of zero.
Executing Commands
You receive, through the standard input, several bursts of commands. Every burst contains the
increase, decrease and read commands.
You should read every burst from the input as a whole, and then try to execute all of its commands
simultaneously. To do so, you should assign a new thread to each command, and destroy the threads
upon completion of all commands in the burst. Notice that you must let the threads work
simultaneously. That is, you should neither let the threads run one after another, nor let any of the
threads cross the boundary of the burst. All threads should work in parallel inside the boundaries of a
single burst and terminate before the next burst is read (at which point new threads will be allocated
for the new burst). To implement the burst you will need a barrier.
In general, a Barrier synchronization structure is initialized to n, the number of threads it should
wait for. Each of the threads to be synchronized on this barrier then calls the barrier() method,
which blocks until n threads have reached the barrier (called barrier()). At that point all n threads
are released together and can continue running in any order, according to the scheduler. After that
the barrier can be reinitialized to any value.
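The barrier behaviour described above can be sketched with one pthread mutex and one condition variable. This is only a minimal illustration under stated assumptions, not the required design: the names barrier_t, barrier_init, barrier_wait, barrier_destroy and barrier_demo are hypothetical, and the "round" field is an added detail that protects waiters against spurious wakeups and lets the same barrier be reused.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical barrier type; the assignment only requires the behaviour. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  all_arrived;
    int expected;     /* n: number of threads the barrier waits for  */
    int arrived;      /* threads that have arrived in this round     */
    unsigned round;   /* incremented each time the barrier opens     */
} barrier_t;

void barrier_init(barrier_t *b, int n)
{
    assert(n > 0);
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_arrived, NULL);
    b->expected = n;
    b->arrived = 0;
    b->round = 0;
}

void barrier_wait(barrier_t *b)
{
    unsigned my_round;

    pthread_mutex_lock(&b->lock);
    my_round = b->round;
    if (++b->arrived == b->expected) {
        /* Last arrival: reset the count and release everybody. */
        b->arrived = 0;
        b->round++;
        pthread_cond_broadcast(&b->all_arrived);
    } else {
        /* Loop on the round number to survive spurious wakeups. */
        while (b->round == my_round)
            pthread_cond_wait(&b->all_arrived, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

void barrier_destroy(barrier_t *b)
{
    pthread_cond_destroy(&b->all_arrived);
    pthread_mutex_destroy(&b->lock);
}

/* Demo: every worker records its arrival, waits on the barrier, and then
 * checks that all n workers had arrived before it was released. */
#define DEMO_MAX 64
static barrier_t demo_barrier;
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_arrivals;   /* workers that reached the barrier     */
static int demo_ok;         /* workers that saw all arrivals after  */

static void *demo_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    demo_arrivals++;
    pthread_mutex_unlock(&demo_lock);

    barrier_wait(&demo_barrier);

    pthread_mutex_lock(&demo_lock);
    if (demo_arrivals == demo_barrier.expected)
        demo_ok++;
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns how many of the n workers observed a complete barrier. */
int barrier_demo(int n)
{
    pthread_t tid[DEMO_MAX];
    int i;

    assert(n > 0 && n <= DEMO_MAX);
    demo_arrivals = 0;
    demo_ok = 0;
    barrier_init(&demo_barrier, n);
    for (i = 0; i < n; i++)
        pthread_create(&tid[i], NULL, demo_worker, NULL);
    for (i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
    barrier_destroy(&demo_barrier);
    return demo_ok;
}
```

Calling barrier_demo(8) should return 8 if no worker passes the barrier early. Error checking of the pthread calls is omitted to keep the sketch short.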
The design should comply with the following guidelines:
• There should be one thread (called the main thread) that reads the burst from the standard
input.
• The main thread creates exactly the number of threads (workers) needed to process the
commands in this burst.
• The main thread initializes the barrier and supplies the input to the threads in the burst (for
example, if the burst contains 7 commands then the main thread needs to prepare 7 threads).
• The worker threads wait (on the barrier) until all the workers have been created, and only then
start processing their commands. For this you have to design a mechanism, called Barrier,
explained above.
• After its command has finished, each worker prints the output to the standard output and
ends.
• When all the worker threads have finished, the main thread can start reading the next burst.
You should devise a synchronization mechanism that ensures the validity of the counter with the
greatest possible concurrency. Notice again that using a single lock for the counter is not allowed:
such a HW will receive 0 points automatically! Additionally, you have to synchronize the input and
the output as explained above and below.
As a result, your output will not always be deterministic. Suppose the counter is zero and you have a
burst with the following commands:
(a) Increase (3)
(b) Decrease (3)
(c) Read()
Then it is possible that (a) takes the lead and (c) follows, in which case (c) should return “3”, or
that (b) takes the lead and (c) follows, in which case (c) should return “-3”. Finally, (c) may run
first (or last), in which case it should return “0”. However, if we have:
Increase (3)
Decrease (3)
Barrier
Read()
Then the Read() should always return 0.
3. Detailed description
Your program will consist of a number of threads:
1. The main thread, which reads the bursts from the input, allocates a thread for each
command in a burst, and waits for their termination before it reads another burst.
2. The worker threads, each allocated to a single command.
The threads must follow these rules:
• No communication is allowed between the main thread and the worker threads other than
initialization and notification of completion.
• Only the main thread should access the standard input.
• The threads must not communicate with each other except through the synchronization
mechanism defined below.
Synchronization Mechanism: Combining Tree
A combining tree is a synchronization structure used to allow multiple threads to access the critical
section via different access points on the leaves. The threads are progressing from the bottom up. In
addition, the thread which is able to progress up in the combining tree is requested to carry not only its
own update, but also an update of the threads that remained behind (if there were such threads). A
combining tree is a complete binary tree data structure having N/2 leaves, where N is the maximal
possible number of threads. Again, instead of having one memory access point, that becomes a
performance bottleneck when concurrency increases, a combining tree will have many smaller access
points.
Each working thread is assigned to a leaf node, and each leaf node can have at most 2 threads
assigned to it. The counter variable can be updated by a thread that has progressed past the root
of the tree. Figure 1 below shows a combining tree with 4 leaf nodes, which can be accessed
concurrently by at most 8 threads.
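One possible way to lay out such a tree is a heap-style array, sketched below. This is an assumption made for illustration, not something the assignment prescribes: the root sits at index 0, the N/2 leaves occupy indices N/2-1 to N-2, and threads 2k and 2k+1 share leaf k. All function names are hypothetical.

```c
#include <assert.h>

/* Returns 1 if n is a power of two in the allowed range 2..1024. */
int valid_thread_limit(int n)
{
    return n >= 2 && n <= 1024 && (n & (n - 1)) == 0;
}

/* A complete binary tree with N/2 leaves has N/2 + (N/2 - 1) = N - 1 nodes,
 * so building it is O(N). */
int tree_node_count(int n)
{
    assert(valid_thread_limit(n));
    return n - 1;
}

/* Index of the leaf shared by threads tid and tid^1 (heap-style layout). */
int leaf_of_thread(int n, int tid)
{
    assert(valid_thread_limit(n) && tid >= 0 && tid < n);
    return (n / 2 - 1) + tid / 2;
}

/* Parent of a non-root node in the heap-style layout. */
int parent_of(int node)
{
    assert(node > 0);
    return (node - 1) / 2;
}

/* Number of nodes a thread visits from its leaf up to the root, inclusive.
 * For a tree with N/2 leaves this is log2(N). */
int path_length(int n, int tid)
{
    int node = leaf_of_thread(n, tid);
    int visited = 1;

    while (node != 0) {
        node = parent_of(node);
        visited++;
    }
    return visited;
}
```

With N = 8 this gives 7 nodes, places threads 0 and 1 on the same leaf, and makes every thread visit 3 nodes on its way up, which is where the O(log N) bound per operation comes from.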
In the combining tree, if a thread wants to increase the counter, it starts from its leaf node and works its
way up the tree to the root. If two threads reach a node at approximately the same time, the thread that
arrived first becomes the active/winner thread and the thread that arrived second becomes the passive
thread. The active thread will combine its own update request with the passive thread’s request and carry
the combined request up to the next level, while the passive thread waits for the active thread to return
with the fetched result. An active thread might become a passive thread later while climbing up the tree.
When a thread reaches the root of the tree, it will update the counter’s value, fetch the old value, and
pass this value down the tree. Thus, if multiple threads are updating the counter at approximately the
same time, the maximum number of threads that will compete for accessing the counter variable itself
is 2, no matter how many threads are making increment/decrement requests. In this way, the combining
tree distributes a single access point among all the nodes in the tree, and each node becomes a much
smaller memory access point, thus reducing memory contention.
Figure 1: General presentation of a combining tree. Callouts in the figure:
• The root, on which winners from AA and BB are going to compete for updating the counter itself.
• A node AA, on which winners from A and B are going to compete. Each winner may carry an update
for two threads.
• A node A, on which threads with ID 0 and 1 can compete. If T0 doesn't arrive, T1 can continue up
with only its update.
• A node BB, on which winners from C and D can compete. If a competitor is late, the first one
should ensure the second will wait for the first to get down.
• A node B, on which threads with ID 2 and 3 are going to compete.
• A node C, on which threads with ID 4 and 5 are going to compete (if they arrive at nearly the
same time).
• A node D, on which threads with ID 6 and 7 are going to compete.
More explanations:
The combining of the operations works as follows:
1. A thread Ti arrives at the leaf assigned to it and declares the update it needs to do.
2. After that, Ti needs to compete with another thread that possibly also wants to start from this
leaf. Let us assume thread Tj wins the competition (Ti continues to wait).
3. Tj now continues up to the root, but it needs to promote the combined update. For example,
if Ti declared that it wants to do +8 and Tj wanted to do -7, then the winner (Tj) now needs to
continue and to do +1.
4. Tj repeats the procedure from Step 1, now working on the next node up instead of the leaf.
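The arithmetic in Step 3 can be looked at in isolation. The sketch below is deliberately single-threaded and contains no synchronization at all; it only shows, under that assumption, how a winner's carried update is formed and how the root's winner applies it once while keeping the old value to hand back down. The names signed_delta, combine and apply_at_root are hypothetical.

```c
#include <assert.h>

/* Turns a command into a signed update: a Decrease is a negative delta. */
int signed_delta(int is_decrease, int value)
{
    assert(value >= 0 && value <= 32768);   /* range stated in the interface */
    return is_decrease ? -value : value;
}

/* The winner carries the sum of its own update and the loser's update. */
int combine(int winner_delta, int loser_delta)
{
    return winner_delta + loser_delta;
}

/* What the root's winner does: apply the combined update once and return
 * the old counter value. No locking here: a real implementation must
 * protect this step, since two threads can compete at the root. */
int apply_at_root(int *counter, int combined_delta)
{
    int old = *counter;

    *counter += combined_delta;
    return old;
}
```

For the example in Step 3, combine(signed_delta(1, 7), signed_delta(0, 8)) yields +1, so a counter holding 10 becomes 11 and the old value 10 is what travels back down.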
The thread that loses the competition (Ti) is in one of two situations:
1. Ti's update was taken by the winner: Ti now needs to wait for Tj's return and then return to the
user.
2. Ti's update wasn't taken by the winner, because Ti arrived too late: Ti still needs to wait
for Tj's return, but then it enters the competition again with someone else, and either wins
or, at the very least, this time its update will surely be taken.
A specific state of the combining tree while the threads are still climbing up is presented in Figure 2
below. Note that on the winner's way back, all threads waiting for the winner to get back with
their updates get the same old counter value found by the root's winner. Also note that we
deliberately do not give you all the details; you should think about the rest of the details yourself.
Figure 2: Specific presentation of a combining tree. Callouts in the figure:
• Root: T1 (with +6 update) and T7 (with -2 update) arrived here simultaneously. T7 won the
competition and is updating the counter with +4. T1 should later realize that its update is already
done.
• Node AA: Threads T1 (with +1 update) and T2 (with +5 update) arrived here simultaneously. T1 won
the competition and should continue with +6 update.
• Node A: Threads T0 (with +6 update) and T1 (with -5 update) arrived here simultaneously. T1 won
the competition and should continue with +1 update.
• Node BB: Thread T7 arrived with -2 update and found no competitor. T7 should continue with -2
update and ensure that another competitor is not going to get up before T7 returns.
• Node B: Thread T2 arrived with +5 update and found no competitor. T2 should continue with +5
update and ensure that T3 is not going to get up before T2 returns.
• Node C: None arrived here.
• Node D: T6 arrived here (with +3) after T7 had already left the node with -2 update. T6 should
wait here till T7 returns.
Format of Input & Output
The input will be as follows:
BEGIN
COMMAND 1
COMMAND 2
….
BARRIER
COMMAND 1
COMMAND 2
…..
BARRIER
END
The input always begins with “BEGIN” and ends with “END”. The “BARRIER” commands separate
the different bursts. Between “BARRIER” commands, all commands must be executed
simultaneously, and the bursts must be executed sequentially. The input lines are separated with “\n”
characters. The size of each burst doesn't have to be equal. There can be "an empty burst" where two
“BARRIER” commands come one after the other.
When each command is executed, a line is written to the output according to the result. The line always
consists of the original command (that includes the “BARRIER” command), and the result in the
following manner:
Input:          Output:
COMMAND 1       COMMAND 1->RESULT
The possible commands and their return values are formatted as follows:
1. “INCREASE #value” – Increases the counter by the given value. The result is the old counter
value.
2. “DECREASE #value” – Decreases the counter by the given value. The result is the old counter
value.
3. “READ” – Reads the counter value. The result is the counter value.
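A minimal sketch of turning one input line into a command record is shown below. The types cmd_kind_t and command_t and the function parse_command are hypothetical names, and the sketch assumes well-formed input as described above; it does not check for trailing text after keywords that take no value.

```c
#include <stdio.h>
#include <string.h>

typedef enum {
    CMD_INVALID, CMD_BEGIN, CMD_END, CMD_BARRIER,
    CMD_INCREASE, CMD_DECREASE, CMD_READ
} cmd_kind_t;

typedef struct {
    cmd_kind_t kind;
    int value;          /* meaningful only for INCREASE / DECREASE */
} command_t;

/* Parses one input line (with or without its trailing "\n"). */
command_t parse_command(const char *line)
{
    command_t cmd;
    char word[16];
    char extra;
    int value;
    int consumed = 0;

    cmd.kind = CMD_INVALID;
    cmd.value = 0;

    /* Read the keyword and remember how many characters it used. */
    if (sscanf(line, "%15s%n", word, &consumed) != 1)
        return cmd;

    if (strcmp(word, "INCREASE") == 0 || strcmp(word, "DECREASE") == 0) {
        /* Exactly one integer must follow; anything after it is rejected. */
        if (sscanf(line + consumed, "%d %c", &value, &extra) == 1 &&
            value >= 0 && value <= 32768) {
            cmd.kind = (word[0] == 'I') ? CMD_INCREASE : CMD_DECREASE;
            cmd.value = value;
        }
    } else if (strcmp(word, "READ") == 0) {
        cmd.kind = CMD_READ;
    } else if (strcmp(word, "BARRIER") == 0) {
        cmd.kind = CMD_BARRIER;
    } else if (strcmp(word, "BEGIN") == 0) {
        cmd.kind = CMD_BEGIN;
    } else if (strcmp(word, "END") == 0) {
        cmd.kind = CMD_END;
    }
    return cmd;
}
```

The main thread could call parse_command on each line of a burst and hand each resulting record to its worker at creation time, so that no further communication with the worker is needed.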
An example of a possible output:
Input           Output
BEGIN           BEGIN
INCREASE 3      INCREASE 3->0
INCREASE 5      INCREASE 5->3
INCREASE 2      INCREASE 2->3
READ            READ->8
BARRIER         BARRIER
INCREASE 3      READ->10
DECREASE 5      INCREASE 3->10
READ            DECREASE 5->10
READ            READ->8
END             END
Notice that you have to synchronize the output! When, for example, two threads are done and want to
print "READ->8" and "INCREASE 2->0", they have to be able to print them separately, without
any mixture of characters. Of course, the order of the output might be different from the order of
the input inside any burst.
Note that when writing the output line you should add no spaces, other than those you read as part of
the command.
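One way to keep output lines from interleaving is to build the whole line first and write it under a dedicated mutex, as in the sketch below. The names output_lock, format_result and print_result are hypothetical, and the 64-byte line buffer is an assumption that comfortably fits the commands defined above.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Guards the standard output only; it is unrelated to the counter. */
static pthread_mutex_t output_lock = PTHREAD_MUTEX_INITIALIZER;

/* Builds "<command>-><result>\n" with no extra spaces.
 * 'command' is the line as read, without its trailing newline.
 * Returns the line length, or -1 if it does not fit in buf. */
int format_result(char *buf, size_t size, const char *command, int result)
{
    int n = snprintf(buf, size, "%s->%d\n", command, result);

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Writes one complete result line while holding the output lock. */
int print_result(FILE *out, const char *command, int result)
{
    char line[64];
    int n = format_result(line, sizeof line, command, result);

    if (n < 0)
        return -1;
    pthread_mutex_lock(&output_lock);
    fputs(line, out);
    fflush(out);
    pthread_mutex_unlock(&output_lock);
    return n;
}
```

A worker that executed "INCREASE 3" and got back 0 would call print_result(stdout, "INCREASE 3", 0). This lock is held only for the write itself, so it does not serialize the counter operations.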
Finally, you need to compile your program to an executable named “conc_cnt”, which takes one
parameter: N, the maximal number of threads that may be used by this program (it may be
that all N threads will never be used simultaneously).
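A small sketch of validating the N parameter is shown below, assuming it arrives as the first command-line argument and must satisfy the constraint given in the General Description (a power of two with 2 ≤ N ≤ 1024). The function name parse_thread_limit is hypothetical, and how the program should react to an invalid N is not specified by the assignment.

```c
#include <errno.h>
#include <stdlib.h>

/* Parses the N argument. Returns N, or -1 if the text is not a number,
 * is out of the range 2..1024, or is not a power of two. */
int parse_thread_limit(const char *arg)
{
    char *end;
    long n;

    errno = 0;
    n = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1;                 /* not a clean decimal number */
    if (n < 2 || n > 1024)
        return -1;                 /* outside the allowed range  */
    if ((n & (n - 1)) != 0)
        return -1;                 /* not a power of two         */
    return (int)n;
}
```

A main function could call parse_thread_limit(argv[1]) once at startup and pass the result to the initialization of the tree.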
Remarks
• We are going to use special software to check for any act of cheating! Be sure you create
everything yourself.
• The assignment should be implemented in C and should work in the VMWare running Red Hat
Linux 8.0.
• You should use only the pthread library to work with threads.
• Carefully design the synchronization so that it allows the best parallelism possible and prevents
deadlocks.
• If your implementation becomes deadlocked, then the penalty in the grade for the whole
assignment will be 30 points!
• You have to submit a detailed description of your algorithm. Make sure that you allow
maximum parallelism, otherwise you will lose points. In addition, don't forget to write a
description of the Barrier, the tree, and the whole system (pay special
attention to the simultaneous command execution mechanism and the output printing).
• The suggested work plan: start with creating a design! Think about how you will implement
everything before you start coding! Then continue with the implementation of the barrier
mechanism, then the tree, and only after testing these start implementing the main thread
(parsing of the input).
• We also strongly suggest that you use asserts and/or defensive checks for easy debugging. Note
that when you submit the HW, your Makefile should not compile the program for
debugging, so the asserts will not slow down the execution. On the other hand, they are a very
powerful tool for debugging concurrent programs.
• Don't print anything except the required output.
4. Submission
• You should electronically submit a zip file that contains the source files and the Makefile. Its
name should be “Makefile”. The makefile will create an executable with name “conc_cnt”.
Note that this Makefile should compile your whole code and not only the dispatcher.
• You should submit a printed detailed design of your program, including explanations on the
chosen algorithms, synchronization mechanisms, etc…
• You should also submit a printed version of the source code.
• A file named submitters.txt which includes the ID, name and email of the participating
students. The following format should be used:
Bill Gates bill@t2.technion.ac.il 123456789
Linus Torvalds linus@gmail.com 234567890
Steve Jobs jobs@os_is_best.com 345678901
Important Note: Follow the outlined zip structure exactly. In particular, the zip should contain only
the following files (no subdirectories):
zipfile -+
|
+- all your source/header files
|
+- Makefile
|
+- submitters.txt
|
+- documentation.pdf
Good luck,
The course staff