Complete Guide to C++ and Threads under NT
John M. Dlugosz
3/7/2016

Contents

Prelude ........................................................ 4
Starting A Thread .............................................. 5
    Win32 CreateThread API ..................................... 5
    Run-Time Library Issues .................................... 8
    Cutting it Down to Size ................................... 10
        Simplify the Signature ................................ 10
        Passing Parameters .................................... 10
        Returning Results — In brief .......................... 23
        Synchronize the Startup ............................... 23
        Reuse It .............................................. 29
    A higher-level C++ model .................................. 31
        The Launch Pad Model .................................. 31
        The Launch Pad — Starts a New Thread .................. 32
        Mission Control — Controls The Background Process ..... 33
        The Rocket — Is The Background Process ................ 33
Ending A Thread ............................................... 34
Software Models ............................................... 35
    Classification of Threaded Code ........................... 35
        OTO ................................................... 35
        OTT ................................................... 36
        R/W ................................................... 36
        Grp ................................................... 37
        FT .................................................... 38
    Applying Thread-Safety Classifications .................... 38
    Serialization ............................................. 39
    Servers ................................................... 39
    Worker Threads ............................................ 39
    GUI Threads ............................................... 39
Synchronization Issues ........................................ 40
    Race Conditions ........................................... 40
    Deadlocks ................................................. 40
    System and Library Details ................................ 40
        Handles ............................................... 40
Atomic Operations ............................................. 41
    Simple Instructions ....................................... 41
    The "Interlocked" suite ................................... 41
Kernel Synchronization Objects ................................ 42
    Common Info on Kernel Objects ............................. 42
    Mutex ..................................................... 43
        Mutex Semantics ....................................... 44
        Win32 Mutex API Summary ............................... 45
        Using Mutexes in C++ .................................. 46
    Semaphore ................................................. 50
        Semaphore Semantics ................................... 50
        Win32 Semaphore API Summary ........................... 51
        Using Semaphores in C++ ............................... 52
    Event ..................................................... 52
        Event Semantics ....................................... 53
        Win32 Event API Summary ............................... 53
        Using Events in C++ ................................... 54
    Timer ..................................................... 55
        Timer Semantics ....................................... 55
        Win32 Timer API Summary ............................... 55
        Using Timer in C++ .................................... 58
    Change Notification ....................................... 58
        Semantics of File Change Notification Objects ......... 59
        An alternative: ReadDirectoryChangesW ................. 59
        Semantics of Printer Change Notification Objects ...... 59
        Change Object API Summary ............................. 60
    Other Kernel Handles ...................................... 62
        Process and Thread .................................... 62
        File .................................................. 62
        Console-input ......................................... 63
Other Synchronization Primitives .............................. 64
    Win32 Critical Section .................................... 64
    Monitors .................................................. 64
    Condition Variables (used with Monitors) .................. 64
    Spin Locks ................................................ 64
    Reader/Writer and Group Locks ............................. 64
    Rendezvous ................................................ 64
    Conditional Semaphore ..................................... 64
Waiting and Blocking .......................................... 65
    Alertable States and APC's ................................ 65
    Windows Messages .......................................... 65
    A Survey of Wait Primitives ............................... 65
        WaitForSingleObject and WaitForSingleObjectEx ......... 65
        WaitForMultipleObjects and WaitForMultipleObjectsEx ... 65
        SignalObjectAndWait ................................... 66
        MsgWaitForMultipleObjects and MsgWaitForMultipleObjectsEx 66
    Thread Priorities ......................................... 66
Communicating Between Threads and Processes ................... 67
    Anonymous Pipes ........................................... 67
    Named Pipes ............................................... 67
    Mailslots ................................................. 67
    Sockets ................................................... 67
    Shared Memory ............................................. 67
    APC's ..................................................... 67
    Windows Messages .......................................... 67
Thread-Specific Data .......................................... 68
Overlapped I/O ................................................ 69
Dynamic Link Libraries (DLL's) ................................ 69
Fibers ........................................................ 71
Processes ..................................................... 72
A C++ Threading Library ....................................... 73

Prelude

Include info on why use threads.

Starting A Thread

The best place to start, I suppose, is at the beginning. The beginning of a thread, I should say. There are three Win32 API functions that create a new thread, and we will look into them in detail. There are also library functions (the compiler's, my own, and hopefully soon, your own) that start a thread by eventually calling one of the API functions. We will discuss their purpose for existing and when to use them.

Once you know the raw API calls, more important advice concerns creating threads in a higher-level meaning of the word. Besides just calling the actual function to start a new thread, you need to have code before and after that point to set things up and deal with the results. Furthermore, the newly born thread needs to cooperate in the process, too. We will go over all these details using C++ code, point out all the pitfalls, and present sound advice on the proper way to start threads in your program.

Win32 CreateThread API

The most primitive raw function is actually CreateRemoteThread. In NT4, CreateThread simply calls CreateRemoteThread with the current process as the first argument. Another way threads are created is with CreateProcess: starting a whole new program also creates a single thread to run that program. We will concentrate on CreateThread.
    HANDLE CreateThread (
        SECURITY_ATTRIBUTES*,
        ulong stack_commit_size,
        THREAD_START_ROUTINE* thread_start,
        void* parameter,   // passed to thread_start
        ulong flags,       // flag, anyway.
        ulong* lpThreadId
        );

Where the thread_start parameter is:

    ulong __stdcall THREAD_START_ROUTINE (void* parameter);

The first parameter specifies security attributes, and is the same as you find on any kernel object creation function. You only need to supply this if you need to give the new thread non-default attributes; namely, to allow other users to obtain a handle to your thread. This process (and any other process owned by the same user) can access the thread by default, so normally NULL will do here. You can also use the security attributes argument to make the thread's handle inheritable. However, there is also a function (SetHandleInformation) to change the handle's attributes, so you don't need to mess with the complex security structures if this is all you need. Security issues are covered under Named Pipes, page xx. Unless you are told otherwise, just use NULL for security attributes in all the functions presented.

The second parameter is commonly called the stack size. However, its exact meaning is seldom understood. As it turns out, this value has nothing to do with the size of the stack as far as your program logic is concerned. That is, changing this value won't affect how deep you can recurse without running out. Under Win32, the memory system allows reserved but uncommitted memory. A megabyte is reserved for the stack, and this parameter specifies how much is initially committed. As the stack grows deeper, more memory is automatically committed. Memory commitment is done in units of 1 page, where a page is typically 4K. If you specify a small value, even zero, you still get one page initially set up for you. If you overflow that one, the memory management system automatically adds another.
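As a concrete illustration of this parameter, here is a sketch (deep_worker, start_deep_worker, and the 64K figure are all illustrative, not from the text):

```cpp
#include <windows.h>

static DWORD __stdcall deep_worker (void*)
{
    // deeply recursive work would go here
    return 0;
}

void start_deep_worker()
{
    DWORD id;
    // Ask for 64K committed up front. The reserved megabyte is
    // unchanged; this only controls how much is backed by real
    // pages before the thread starts running.
    HANDLE h= ::CreateThread (0, 64*1024, deep_worker, 0, 0, &id);
    if (h)
        CloseHandle (h);
}
```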
Using a larger value for the stack parameter simply gives you a larger initial size. This can save the work of growing the stack later. But, under normal conditions, it takes an awful lot of function calls to chew up 4K, so the work of adding another page is insignificant compared to what it takes to need another page. Unless you have special needs, don't worry about this value—just use zero.

Besides a potential efficiency concern, there is another case where this may be useful. If you precommit memory for your stack, you'll know you have enough virtual memory ahead of time, before your thread starts doing work. If you let it grow as it goes, you may run out of memory when the memory manager tries to expand the stack. There may be cases where the extra degree of robustness is necessary.

Now, back to the total reserved capacity of one megabyte. This cannot be changed within the program. All thread stacks used by a process use the same size stack, specified as part of the process. You can override this value at link time.

The start address is where the action is. Literally, that's where action takes place. This function is to a thread what main is to a process. Throughout this book, this is called the thread-start function.

The next parameter is a value that is passed as the parameter to the thread-start function. So, it's not surprising that the thread-start function takes an argument. This single argument is declared as a void*, with the intention that you can pass a pointer to anything, presumably a structure containing as much information as you need. The return value is an unsigned long, with the idea that the thread exit value will be a status code of some kind. Actually, any 32-bit value will do in both places.

That's what the __stdcall modifier is for: This tells the compiler to use a certain calling convention to pass the parameter, expect the return value, and clean up after the call. The calling convention in C++ programs can be changed on the command line or with pragmas or project options, so it's an excellent idea to always declare the thread-start function with this modifier.

The generated code works just fine no matter what is passed and returned, as long as they are both 32 bits. But the C++ compiler is pickier, requiring that declarations on function pointers match exactly. So, don't do this:

    struct S; // not shown in the example

    unsigned long __stdcall foo (const S* param)
    {
        param->blah();
        //…
    }

    //… later
    S value;
    unsigned long ID;  //an output parameter
    ::CreateThread (0,0, &foo, &value, 0, &ID);

It may work in C, but C++ is much stricter about type checking. The compiler will object that &foo is the wrong type. Instead, write it this way:

    unsigned long __stdcall foo (void* raw)
    {
        const S* param= static_cast<S*>(raw);
        param->blah();
        //…
    }

That is, always declare the thread-start function as taking a void* and returning a ulong. You can't even differ by a const keyword. Then, inside the function, declare what you really wanted and initialize from the void* raw parameter.

The creation flags can be 0 or CREATE_SUSPENDED, which is 4. That's it; there is only one available flag. So why 4 instead of 1, or why not simply make it a bool instead? The same set of flags is used in the CreateProcess API function. It's just that only this one flag has any meaning when creating just a thread, not a whole new process.

If the CREATE_SUSPENDED flag is given, then the thread is created but never scheduled to run. It's as if the thread-start function began with a call to SuspendThread. You can make it go with a call to ResumeThread. These functions are covered on page xx. If the CREATE_SUSPENDED flag is not used, then the new thread can start at any time.
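A minimal sketch of the suspended start (worker and start_paused are illustrative names): the parent can finish any setup that must happen before the thread runs, then release it with ResumeThread.

```cpp
#include <windows.h>

static DWORD __stdcall worker (void*)
{
    // background work goes here
    return 0;
}

void start_paused()
{
    DWORD id;
    // Created but never scheduled...
    HANDLE h= ::CreateThread (0, 0, worker, 0, CREATE_SUSPENDED, &id);
    if (!h)
        return;
    // ...so the parent can safely record the handle, adjust the
    // priority, and so on, before the thread takes its first step.
    ::ResumeThread (h);
    ::CloseHandle (h);
}
```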
It may run for a while before the thread that called CreateThread (called the parent thread throughout this book, for simplicity) gets a time slice again.

The last parameter is an "out" parameter which receives the thread ID of the created thread. The thread ID is a unique value that identifies this thread on the system. All threads, regardless of which process they are running in, have unique ID's. The thread ID is rarely used for anything, so don't worry about it.

The return value is a HANDLE to the thread. If the function fails, 0 is returned and additional information is available via GetLastError. Remember to always close the handle (with CloseHandle) when you don't need it anymore. Closing the handle does not make the thread stop. The handle is used to call other functions that deal with the thread, such as getting the exit value, waiting for the thread to finish, performing some kinds of inter-thread communication, and suspending the thread. If you don't plan on doing any of that stuff, it's perfectly OK for the parent to close the handle immediately after calling CreateThread.

Even when the thread finishes (normally or abnormally), there is still a kernel object representing the thread. This object does not go away until the last handle to it is closed. Why would you want a handle to a valid object that represents a finished thread? To get the thread's exit value, to see if it's finished yet, to prevent the thread ID from being reused right away, or just because you like to leak memory by not closing all your handles.

Run-Time Library Issues

The ::CreateThread API call is how a thread is ultimately created. However, the compiler's run-time library has its own functions you are supposed to use for this purpose.
In Microsoft VC++ 4.2:

    _beginthread (startaddress, stack_size, argument)
        returns -1, not NULL, on error
    _beginthreadex
        called just like CreateThread

and Borland C++ 5.0:

    _beginthread (startaddress, stack_size, argument)
    _beginthreadNT (startaddress, stack_size, argument, security, flags, id)

Both compilers have a _beginthread function, which is a simplified form. It is fine for those typical cases when you don't need the other three parameters. Note however that Microsoft's form returns -1 for an error, rather than a 0 handle as with all the other functions under discussion. Microsoft's _beginthreadex is the simplest, because it is called exactly like ::CreateThread. Borland's version, _beginthreadNT, has the parameters in a different order. Borland also has a declaration for _beginthreadex, which is an inline function that rearranges the parameters, when the preprocessor symbol __MFC_COMPAT__ is defined.

These functions are provided as wrappers around the simple CreateThread API call so that the run-time library can do some work of its own every time a thread is created. In a future article, we'll show you the proper way to do this in your own libraries, so the user is not required to call your special function when creating threads. The run-time library code does it automatically to some extent, so the perils of using ::CreateThread directly have been more than a little exaggerated. However, the automatic behavior is not perfect. Here's the scoop:

Under both compilers, the reason for this code is to allocate memory for thread-local copies of variables that are "global" in the run-time library. For example, the standard function strtok() needs to hold an internal state between calls. Under the multithreaded library, strtok() can be used on different threads at the same time without confusion, since each thread maintains a different internal variable.
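As a concrete sketch of the Microsoft flavor (worker and start_worker are illustrative names; _beginthreadex takes ::CreateThread's parameters in the same order, and its thread-start function is declared as returning unsigned):

```cpp
#include <windows.h>
#include <process.h>

static unsigned __stdcall worker (void* raw)
{
    // Run-time library calls such as strtok() are safe here:
    // the wrapper arranged for this thread's private copies.
    return 0;
}

HANDLE start_worker (void* argument)
{
    unsigned id;
    // Same parameter order as ::CreateThread, but the return type
    // is unsigned long, hence the explicit cast to get a HANDLE.
    unsigned long h= _beginthreadex (0, 0, worker, argument, 0, &id);
    return reinterpret_cast<HANDLE>(h);
}
```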
Under the Microsoft compiler, if the DLL version of the run-time library is used, there is absolutely no problem with using CreateThread directly. The thread-specific data is allocated the first time it is needed (if ever), and the DLL thread-detach code frees it again. On the other hand, if you link with the LIB version of the run-time library, the cleanup is never done so there is a small memory leak.

The situation with Borland is similar. The thread-specific data structure is allocated the first time it is needed. However, the initialization is incomplete. If you call ::CreateThread directly and allow the thread data to be created on first use, a call to _ExceptInit is missed. If you throw something, you get an infinite recursion loop. Other than that, it works.

So most of the time, you are better off using the run-time library's version. One nit is that these functions return unsigned long instead of HANDLE, so you always have to use an explicit cast!

Although similar in nature, there is one interesting difference between the two compilers' implementations. The Microsoft version starts the thread suspended, then resumes it after the handle and id are stored in their variables. With Borland's version, it is possible for the thread to check its internal value (as recorded in the run-time library's variable) and see a wrong value for its handle or id, as the new thread may do that before the parent thread continues. Also, Microsoft's code puts a try block around the thread-start function, so an unhandled exception [1] will cause just the thread to terminate. With Borland, an unhandled exception in a thread causes the entire program to abort.

    [1] C++ exception, that is. The operating system's SEH is set up by the CreateThread function. In general, when I say "exception", I mean a C++ exception.

Cutting it Down to Size

We've seen the most basic functions for creating a new thread.
Now, just how is it best used in C++?

Simplify the Signature

There are six parameters to CreateThread, most of which are seldom used. In the vast majority of uses, one or two parameters is enough. So, I present an overloaded form that is easier to use:

    HANDLE CreateThread (THREAD_START_ROUTINE* start, void* parameter= 0)
    {
        ulong id;
        HANDLE retval= ::CreateThread (0,0, start, parameter, 0, &id);
        if (!retval)
            throw win_error (__FILE__, __LINE__, GetLastError());
        return retval;
    }

This requires one or two parameters only, and also automates error checking. In general, you can craft your own simpler form that does just what you need.

Passing Parameters

The thread-start function takes a single void* argument. This means that whatever information you really need to pass needs to be squirted through a single 32-bit value.

Simple Casting (everything that fits, except pointers)

If you only need to pass a single value, and that value fits in 32 bits, you can simply use a cast. To convert non-pointer types to/from a void*, use a reinterpret_cast. Old-style casts are deprecated, so I'll set a proper example and use the keyword casts. If you are unfamiliar with them, I'll tell you which kind of cast to use in which situation, cookbook style.

    ulong __stdcall thread_start (void* raw)
    {
        int value= reinterpret_cast<int>(raw);
        cout << "Thread got: " << value << endl;
        return 0;
    }

    void test1()
    {
        int value= 42;
        HANDLE h= CreateThread (thread_start, reinterpret_cast<void*>(value));
        waiton (h);
        CloseHandle(h);
    }

Here, the simplified CreateThread is as mentioned above: it fills in all the arguments I don't care about, and does error checking. The waiton function waits for the thread to terminate. It simply calls WaitForSingleObject, but could be more elaborate. For example, it could allow the user to cancel the operation, or it could time-out after a while and deal with a hung thread.
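For reference, the simplest possible waiton is just a wrapper around WaitForSingleObject; this is a sketch, with the elaborations just mentioned (cancellation, time-outs) left out:

```cpp
#include <windows.h>

// A bare-bones sketch: block until the thread ends. A fancier
// version could pass a time-out instead of INFINITE and then
// decide what to do about a hung thread.
void waiton (HANDLE thread)
{
    ::WaitForSingleObject (thread, INFINITE);
}
```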
These issues will be covered in full in "Rejoining Threads", on page xx. For now, think top-down-design and trust the waiton function to do what it's supposed to.

You can see the approach commonly used in Windows: a word is a word, and any integer or pointer will work. That's fine in assembly language, and cool enough in C, but in modern C++ it's something of a faux pas. In C++, it takes a leap of faith to know that an int value can be stuffed in a void* and subsequently retrieved unharmed. But we're not exactly talking about portable programs here—this code is specific to Win32 at the very least, and often specific to Windows NT. On these platforms, we know that this is a valid assumption. If it were not, a lot of things would break in the Windows header files, and that goes against Win32's creed of source level compatibility between supported platforms.

    void test2()
    {
        double value= 3.14159;
        void* r= (void*)value;
        void* p= reinterpret_cast<void*>(value);
    }

The function test2 will not compile. The compiler will reject the idea of casting a double to a void*, by either syntax. However, the compiler is not objecting because I asked to stuff an 8-byte value into 4 bytes. Rather, C++ doesn't allow casting between pointers and floating point values. Casting a float, which is in fact 4 bytes, won't work either.

Conversely, I can write the following:

    ulong __stdcall thread_start_3 (void* raw)
    {
        __int64 value= reinterpret_cast<__int64>(raw);
        cout << "Thread got: " << value << endl;
        return 0;
    }

    void test3()
    {
        __int64 value= 1;
        value <<= 60;
        value += 42;
        cout << "original value is: " << value << endl;
        HANDLE h= CreateThread (thread_start_3, reinterpret_cast<void*>(value));
        waiton (h);
        CloseHandle(h);
    }

This also tries to cast an 8-byte value into a void*, and the compiler takes it without complaint [2].
Run it, and I get this result:

    original value is: 1152921504606847018
    Thread got: 42

Clearly, I did not get the same value out that I put in. Because of the way the value was constructed, I know I ended up with the low-order bytes only, with the high-order bytes discarded. So, beware of casting things into a void*. Do so only for integral types, bools, and enumerations, and only when they are 4 bytes or smaller.

    [2] Actually, the compiler did complain, but not because of the casting. Microsoft's <iostream.h> doesn't have an output operator for the __int64 type. So I added my own to the test code.

For other things that are small enough to fit, you will need to resort to trickery other than casting. A union is a handy way to go. For example,

    union trix4 {
        void* p;
        float f;
        char s[4];
    };

    ulong __stdcall thread_start_4a (void* raw)
    // floating point value -- rejected in test 2.
    {
        trix4 yipes= {raw};
        cout << "Thread got: " << yipes.f << endl;
        return 0;
    }

    ulong __stdcall thread_start_4b (void* raw)
    // an array of 4 bytes packed as one value.
    {
        trix4 yipes= {raw};
        cout << "Thread got: " << yipes.s << endl;
        return 0;
    }

    void test4()
    {
        trix4 yipes;
        yipes.f= 3.14159;
        cout << "original value is: " << yipes.f << endl;
        HANDLE h= CreateThread (thread_start_4a, yipes.p);
        waiton (h);
        CloseHandle(h);

        char short_message[4]= "hi!";
        strcpy (yipes.s, short_message);
        cout << "original value is: " << short_message << endl;
        h= CreateThread (thread_start_4b, yipes.p);
        waiton (h);
        CloseHandle(h);
    }

This program demonstrates that a floating point value was indeed squeezed through the void* interface. It also shows how to do the same thing with an array of one-byte values. Basically, anything that does in fact fit into 4 bytes will work with this technique. But never do anything this ugly in public [3]. It could be just the technique you wanted to avoid dynamic memory management or synchronization issues.
That is, passing the needed data directly as the parameter has definite advantages over passing a pointer to the real data, due to lifetime issues of the thing being pointed to. But, don't write code like this—only write wrappers like this. Keep the actual ugly mechanism hidden inside nice functions. More on abstracting the marshalling of data later (page 16), and more on lifetime issues after that (page 23).

    [3] Any more than you would pass a kidney stone in public—another case of restrictive interfaces you just have to deal with.

Simple casting on pointers

For pointers, anything can be implicitly converted to a void*. Unless it's const or volatile, that is, in which case it still needs another step. To get the value out again, use the static_cast keyword. To remove const (or volatile), use a const_cast.

    ulong __stdcall thread_start_5a (void* raw)
    {
        double* value= static_cast<double*>(raw);
        cout << "Thread got: " << *value << endl;
        return 0;
    }

    ulong __stdcall thread_start_5b (void* raw)
    {
        const char* s= static_cast<char*>(raw);
        cout << "Thread got: " << s << endl;
        return 0;
    }

    void test5()
    {
        double pi= 3.14159;
        const char message[]= "Hello World!";
        HANDLE h= CreateThread (thread_start_5a, &pi);
        waiton (h);
        CloseHandle(h);
        h= CreateThread (thread_start_5b, const_cast<char*>(message));
        waiton (h);
        CloseHandle(h);
    }

For the double* in case 5a, the address of the variable was passed without any sort of casting.

    CreateThread (thread_start_5a, &pi)

On the other side, the function uses a static_cast to reverse the process. Although both static_cast and reinterpret_cast are accepted by the compiler to do this, the static_cast is correct in this case. That's because a static_cast is used to perform any implicit conversion explicitly, and also to reverse one. The implicit conversion of double* to void* is really a static_cast that goes unmentioned.
So you also should use a static_cast to reverse the process.

For the const char* in case 5b, the compiler can't do it implicitly, objecting because of the const. A const char* to const void* would be just fine, but const char* to plain void* is a no-no. Note that the current formal specification of C++ indicates that character string literals (stuff in double quotes, such as "hello") are of type "array of const char", when they used to be "array of char", as they still are in C. Compilers will vary for some time, and old compilers will eventually be updated. So if you write static_cast<void*>("hello"), your compiler may not catch it as an error, but upon upgrading, your code will suddenly give warnings. So in the example I used a declared array (where I specify the type I want) rather than a string literal, to avoid any confusion. This is the same technique you'll need on string literals, though contemporary compilers may not require it.

Back to the point. The const_cast is used to strip off the const, and then the resulting (non-const) char* is allowed to implicitly turn into a void*.

    CreateThread (thread_start_5b, const_cast<char*>(message))

To get it back out, use static_cast, and leave off the const. The result of the cast (type char*) will be implicitly turned into a const char* when assigned to s.

    const char* s= static_cast<char*>(raw)

Now, the compiler will not complain if I declare s as a plain char* rather than a const char*. But the calling function will be most annoyed if the thread modifies message! When squirting through the void* interface, you lose type information, including any const/volatile attributes. It's up to you to correctly take out exactly what you put in.

With pointers to class types (which includes structures and unions), there is another problem to watch out for.
The following code looks pretty much like &pi from case 5a above: implicit conversion to send, and static_cast to recover. Case 6a works as expected, but case 6b prints the wrong answer. Depending on the exact code, it may give the right answer after all, crash, or call the wrong function!

    class A {
       int x;
    public:
       virtual void print() const { cout << x; }
       A (int x) : x(x) {}
       };

    class B {
       int x;
    public:
       virtual void print() const { cout << x; }
       B (int x) : x(x) {}
       };

    class C : public A, public B {
    public:
       C (int a, int b) : A(a), B(b) {}
       void print() const { A::print(); cout << ','; B::print(); }
       };

    ulong __stdcall thread_start_6a (void* raw)
    {
    A* value= static_cast<A*>(raw);
    cout << "Thread got: ";
    value->print();
    cout << endl;
    return 0;
    }

    ulong __stdcall thread_start_6b (void* raw)
    {
    B* value= static_cast<B*>(raw);
    cout << "Thread got: ";
    value->print();
    cout << endl;
    return 0;
    }

    void test6()
    {
    C value (42,66);
    HANDLE h= CreateThread (thread_start_6a, &value);
    waiton (h);
    CloseHandle(h);
    h= CreateThread (thread_start_6b, &value);
    waiton (h);
    CloseHandle(h);
    }

See what's wrong? Let me reiterate a point: when squirting through the void* interface, you lose all type information. It's up to you to correctly take out exactly what you put in. In this example, the test code is putting in a C*, and taking out an A* or a B*. That is not the same type.

If you've used C++ for any length of time, you might not think anything of it. After all, thanks to the "isa" rule, a pointer to a derived class can be used anywhere a pointer to the base is expected. In an ordinary function call, you can pass a C* to a function expecting a B*. But this is not an ordinary function call. By casting through the void*, type information is lost, and the compiler can't figure out what was meant.
For a normal function call, the compiler sees that a C* is standing in where a B* was expected, and generates the proper pointer adjustment. Both base classes cannot be at the same address within the complete object of type C, so at least one of them lies at a nonzero offset. Casting through a void* skips that adjustment, so at least one of those test functions is going to get the wrong pointer.

Creating structures just for passing

Unlike the reinterpret_cast or other tricks for passing non-pointer types, the static_cast on pointers is guaranteed to be correct. That is, any pointer can be converted into a void* and back again, and you'll get out the same value you put in. The C++ specification says that this must be so. So, as a matter of principle, passing pointers to the real data is much less distasteful than coercing actual values into a void* representation. No matter what you need to pass, you can pass a pointer to it without problems. No worry about the value fitting into 4 bytes, or doing an end-run around the compiler's sense of duty to prevent you from making certain casts.

If you really want to pass more than one value to the thread-start function, why not create a structure just for the purpose? This gets into the concept of marshalling, which means recognizing that this passing problem is a job in itself, and can be tackled separately from the real point of the function.

Keep the marshalling code separate from the real functions

Here is an example that demonstrates a couple of new things at once. The function count counts from x to y, stepping by z. This is written as an ordinary function, without worrying about threads at all. Write it, run it, test it, and then come back for more.

    void count (int x, int y, int z)
    {
    for (int loop= x; loop <= y; loop+=z) {
       cout << loop << '\t' << flush;
       Sleep (250); //delay a quarter second
       }
    cout << endl;
    }

    void test7()
    {
    cout << "before call to count" << endl;
    count (1, 30, 2);
    cout << "after call to count" << endl;
    }

Notice that count is a perfectly normal function, with none of that void pointer crud. The test7 function calls it, and you can see by running it that it blocks for a few seconds before continuing. What we really want is to count in the background, so that the calling function continues right away while counting proceeds asynchronously. The count function is no good as a thread-start function, since it doesn't have the right signature. To remedy this, write another function to wrap it. Introduce a structure for the express purpose of passing the various arguments.

    struct count_args {
       int low;
       int high;
       int step;
       };

    ulong __stdcall count_thread (void* raw)
    {
    count_args* args= static_cast<count_args*>(raw);
    count (args->low, args->high, args->step);
    return 0;
    }

    void test8()
    {
    cout << "before call to count" << endl;
    count_args args= {1, 30, 2};
    HANDLE h= CreateThread (count_thread, &args);
    cout << "after call to count" << endl;
    waiton (h);
    }

The output from this version indicates that counting continues in the background:

    before call to count
    after call to count
    2  4  6  8  10  12  14  16  18  20  22  24  26  28  30

The function test8 does the now-familiar work to force the arguments to fit. But look again at count. It was not changed at all! The new function, count_thread, contains all the extra work required to recover the arguments, and no real code concerning the point of the function. Meanwhile, count contains the actual algorithm, and none of the extra work needed for threading. Sounds like a good idea: separate the parameter passing stuff from the algorithm logic.
This suggests that the parameter passing stuff is a problem in itself. Indeed, once recognized as such, it deserves a name. This process of transporting parameters across some kind of boundary is called marshalling. So, abstract the marshalling code.

With this new insight, we can see that test8 only comes halfway to meeting this new criterion. The unpacking of the parameters is separated, but the packing up is not. Well, easy enough to fix. Or is it? Introducing another function to accomplish the other half gives us:

    HANDLE detached_count (int x, int y, int z)
    {  // this doesn't work right.
    count_args args= {x,y,z};
    return CreateThread (count_thread, &args);
    }

    void test9()
    {
    cout << "before call to count" << endl;
    HANDLE h= detached_count (1, 30, 2);
    cout << "after call to count" << endl;
    waiton (h);
    }

Looks good at first inspection. The test9 function is about as simple as the original non-threaded version, with only a minimum of extra work needed to remember the handle and wait on the background thread. Only problem is, it doesn't work.

The problem is our first introduction to lifetime issues, a general problem with asynchronous functions. The args variable is local to detached_count, so it vanishes when detached_count returns. Meanwhile, count is executing in the background after detached_count does its stuff. Oops. The next section will revisit this issue in depth.

A simple way to make this work is to use the heap, not the stack. The parent thread allocates the structure, and the new thread gets rid of it only when it's finished. All the relevant code is within the marshalling functions, and neither count nor test10 care about this little implementation detail.
    struct count_args_10 {
       int low;
       int high;
       int step;
       count_args_10 (int x, int y, int z) : low(x), high(y), step(z) {}
       };

    ulong __stdcall count_thread_10 (void* raw)
    {
    count_args_10* args= static_cast<count_args_10*>(raw);
    count (args->low, args->high, args->step);
    delete args;
    return 0;
    }

    HANDLE detached_count_10 (int x, int y, int z)
    {
    count_args_10* args= new count_args_10 (x,y,z);
    return CreateThread (count_thread_10, args);
    }

    void test10()
    {
    cout << "before call to count" << endl;
    HANDLE h= detached_count_10 (1, 30, 2);
    cout << "after call to count" << endl;
    waiton (h);
    }

You'll notice that the marshalling implementation consists of a structure and two functions. Hmm… shouldn't that ring a few bells for object-oriented programmers? It sure sounds like a class to me. Both functions could be member functions, so there is now only one "thing" out there: a class representing a detached counter.

Let's take that idea and run with it. Given such an object, can we logically do other things with it? Waiting for it to finish ought to be a member as well, and that generalizes to a means of getting back results, too. I'll illustrate by changing the count function to report the number of loop iterations performed.

    int count (int x, int y, int z)
    // revised — demonstrates return value too.
    {
    int looped= 0;
    for (int loop= x; loop <= y; loop+=z) {
       cout << loop << '\t' << flush;
       Sleep (250); //delay a quarter second
       ++looped;
       }
    cout << endl;
    return looped;
    }
    class detached_counter_11 {
       int low;
       int high;
       int step;
       int result;
       static ulong __stdcall thread_start (void* raw);
       HANDLE h;
    public:
       detached_counter_11 (int x, int y, int z);
       int wait_for_result() const;
       };

    ulong __stdcall detached_counter_11::thread_start (void* raw)
    {
    detached_counter_11* args= static_cast<detached_counter_11*>(raw);
    args->result= count (args->low, args->high, args->step);
    return 0;
    }

    detached_counter_11::detached_counter_11 (int x, int y, int z)
    : low(x), high(y), step(z)
    {
    h= CreateThread (thread_start, this);
    }

    int detached_counter_11::wait_for_result() const
    {
    waiton(h);
    return result;
    }

    void test11()
    {
    cout << "before call to count" << endl;
    detached_counter_11 backgrounder (1, 30, 2);
    cout << "after call to count" << endl;
    int result= backgrounder.wait_for_result();
    cout << "thread finished. Result is " << result << endl;
    }

In test11, an object represents the background job, and the private implementation of that object is the marshalling mechanism and the thread creation logic. It uses count, which is still an ordinary function that can be used independently of this class as well. An advantage of this method is that the parameter packing mechanism is encapsulated, without the lifetime problems seen in test9. To complete the concept, the destructor should automatically wait if the background computation has not finished yet.

That's one way to package up the pieces we've already presented. Here is another way. Instead of making it an object, make it a single function. The other pieces can be hidden from the user. This example requires multiple source files to demonstrate.
Example 12 — header file

    void count (int x, int y, int z);
    HANDLE count_in_background (int x, int y, int z);

Example 12 — implementation .cpp file

    #include "chapter_1_12.h"
    #include <iostream.h>

    namespace {  //internal stuff

    struct thread_args {
       int low;
       int high;
       int step;
       thread_args (int x, int y, int z) : low(x), high(y), step(z) {}
       };

    ulong __stdcall thread_start (void* raw)
    {
    thread_args* args= static_cast<thread_args*>(raw);
    count (args->low, args->high, args->step);
    delete args;
    return 0;
    }

    } // end of unnamed namespace

    /* /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ */

    void count (int x, int y, int z)
    {
    for (int loop= x; loop <= y; loop+=z) {
       cout << loop << '\t' << flush;
       Sleep (250); //delay a quarter second
       }
    cout << endl;
    }

    HANDLE count_in_background (int x, int y, int z)
    {
    thread_args* args= new thread_args (x,y,z);
    return CreateThread (thread_start, args);
    }

Example 12 — main program file

    #include "chapter_1_12.h"
    #include <iostream.h>

    int main()
    {
    cout << "before call to count" << endl;
    HANDLE h= count_in_background (1, 30, 2);
    cout << "after call to count" << endl;
    waiton (h);
    cout << "thread finished." << endl;
    return 0;
    }

end of Example 12

Here, the header file gives two functions—the regular one and a background version. In the implementation file, the background version uses the concepts presented in this chapter to launch the first function in its own thread. The fact that the second function is implemented in terms of the first, and how it manages to do so, is not present in the header.

C++ Feature

You'll notice that the pieces of the marshalling implementation, a function and a structure, are defined inside an unnamed namespace. This is because those items are used inside this source file only. Declaring the thread_start function as static would accomplish the same thing for the function, but there is no such thing as a static class.
The names of all classes (including simple structures) have external linkage and must be unique in the program. Using an unnamed namespace is the only way to keep thread_args local to this translation unit. In practice, if you didn't protect it this way and then used the name thread_args in another file, the compiler and linker probably wouldn't notice this violation of the "one definition rule". Nothing bad will happen for plain structures with no member functions, as long as the struct is never used to specialize a template. Well, it might still mess up in really contrived situations, which is why the rule was changed so that all structure/class/union names have external linkage, even if they are simple and have no member functions.

Returning Results — In brief

The previous section was all about passing parameters. But what about return values? A brief survey is in order before continuing. The concepts mentioned here will be presented in full later in the book.

The background counting example is given information on what to do, and then has no need to communicate results back to the parent thread. Often, threads are not so autonomous, but require communication with other threads, including the parent thread.

There are several ways to return results from a thread to the parent. In test11, one of the parameters was an out-parameter. That is, the structure used to pass in the three counting parameters was also used to receive the result. Meanwhile, the parent has some way of knowing when the result is ready. In this case, when the thread is done, the result is ready. The background-counter object must live as long as the background thread.

This concept can be generalized in two different directions. First, the method used by overlapped I/O in NT is to specify where you want the result as one of the arguments, as well as providing some mechanism to signal "done".
The second is to use a future object, which represents the value and blocks only when you actually read from it, and not even then if the thread has already finished in the meantime.

Alternatively, a thread can use its exit value to communicate information. This works best when communicating simple status information, such as "print job finished OK" vs. "print job terminated abnormally."

In a more general case, a thread giving information to its parent is just another case of inter-thread communications. This can be done with callbacks (as seen with another form of overlapped I/O in NT), or data queues. This is the best mechanism for things modeled as a server loop.

Synchronize the Startup

In the test9 example program, we ran into a problem with the lifetime of the arguments passed into the new thread. This was worked around in test10, but the general problem keeps popping up again and again in different guises. For example, consider a thread that creates a window. The parent thread later tries to post a message to that window. Most of the time it works, but sometimes it doesn't. It seems that the new thread might be slow, and the other thread tries to use the window before it's been created!

In general, you are faced with a critical initialization phase problem. The parent creates a service or background object of some kind, and later tries to use it. How can it be sure that the newly created object is ready for use? In the case of objects, it's a good idea to let the constructor finish (running asynchronously in the new thread) before the parent continues.

To take a more abstract position, you want to specify a critical initialization phase in the new thread, so the parent thread waits for that to finish before continuing. The critical initialization is synchronous with the parent's call to create the thread. The two threads are not asynchronous until after this critical point is reached.
In many cases, making the parent wait right away is overkill. But it's simple, and addresses a wide variety of issues. Basically, the new thread's setup code is much easier to write if it doesn't have to consider multi-threading issues until it's all set up. Here is the general concept in pseudocode.

    void function (a,b,c);

    void function_in_background (a,b,c)
    {
    1. pack up {a,b,c} into structure.
    2. launch thread_start function in its own thread.
    3. wait for "ready" event from new thread.
    4. return.
    }

    ulong __stdcall thread_start (void* raw)
    {
    1. unpack parameters.
    2. signal "ready" to parent.
    3. function (a,b,c);
    }

The "ready" event is naturally modeled as an event flag, a synchronization primitive. They are discussed starting on page xx.

Here is test13, a demonstration using the background counter. Comparing this to test9, you'll find that it's the same principle, only without the bug. Simply adding the "ready" flag to the logic already present in test9 (which, you recall, is a straightforward attempt to move all the marshalling code out of the algorithm code) overcomes the synchronization problems.

    struct thread_args {
       int low;
       int high;
       int step;
       event_flag ready;
       thread_args (int x, int y, int z);
       };

    thread_args::thread_args (int x, int y, int z)
    : low(x), high(y), step(z), ready(event_flag::manual_reset, false)
    {}

C++ Feature

Some examples use an aggregate initializer for thread_args. That is, a list of values in braces. Here I use a constructor because the event_flag object needs constructor parameters. The aggregate form is a shortcut so I don't have to write a constructor for a simple structure. Since thread_args now has a member that's a full-blown class, it's not so simple anymore.

    ulong __stdcall thread_start (void* raw)
    {
    // 1. unpack the arguments.
    thread_args* args= static_cast<thread_args*>(raw);
    const int low= args->low;
    const int high= args->high;
    const int step= args->step;
    // 2. signal "ready" to parent
    args->ready.set();
    // 3. call the function
    count (low, high, step);
    return 0;
    }

    HANDLE count_in_background (int x, int y, int z)
    {
    thread_args args (x,y,z);
    HANDLE h= CreateThread (thread_start, &args);
    tasking::WaitForSingleObject (args.ready.h());
    return h;
    }

    void test13()
    {
    cout << "before call to count" << endl;
    HANDLE h= count_in_background (1, 30, 2);
    cout << "after call to count" << endl;
    waiton (h);
    }

The only reason I need critical initialization in test13 is to handle the lifetime of the arguments being passed in. However, in more complex cases, handling the critical initialization phase like this really does make a difference, and passing the arguments easily is just another side benefit. Since many threads can benefit from this technique, why not use it as a standard model, even in simple cases?

Launching threads around an existing normal function is fine for the case where you let that function complete in the background, and don't interact with that thread until it's finished. But what about more complex cases? Specifically, this function model doesn't lend itself well to marking a critical initialization phase. The function needs to be built with this "ready" indication in mind. So, it's not really a lonesome old function being endowed with its own thread of execution by totally independent code. But the concept still holds that the function should be written as a normal function, with normal C++ parameters. The marshalling code (which squirts everything through the void*) and the mechanics of creating the thread should be isolated into other functions. The algorithm is still in a function that just deals with the algorithm, but the algorithm knows to signal "ready" at a certain point.
This next example, test14, demonstrates the concept. Here, memory allocation represents the critical initialization phase, because it's the simplest resource to manipulate. Also, error checking and cleanup are not shown, to keep the example focused on the topic.

    void alg14 (int x, int y, event_flag& ready)
    {
    // phase 1 — setup
    int* array= new int[x];
    ready.set();
    // phase 2 — sustained background activity
    for (int loop= 0; loop < x; loop++) {
       array[loop]= y;
       cout << '.' << flush;
       Sleep (250);
       }
    }

It might seem silly to protect the memory allocation in such a manner, but for other resources it could be an issue. For example, the new thread might open a file, and the parent thread would work incorrectly if it proceeded before the file existed. Sometimes this work must be done by the thread needing the resources, and cannot be done in advance before starting the thread. We will see examples of this later. The point is that the background process can be separated into two distinct phases. The critical initialization phase is completed before the parent thread proceeds, and the second phase then executes concurrently with the parent thread. Why you would need to put stuff in phase one will be apparent in real code.

In order to signal the end of phase one, the function takes an event_flag as an argument. So, unlike the counter example, alg14 is aware that it's designed for background operation. But there is no hint of the clunky marshalling code here. The rest of the example should be familiar by now.

    struct thread_args_14 {
       int size;
       int val;
       event_flag ready;
       thread_args_14 (int x, int y);
       };

    thread_args_14::thread_args_14 (int x, int y)
    : size(x), val(y), ready(event_flag::manual_reset, false)
    {}

    ulong __stdcall thread_start_14 (void* raw)
    {
    thread_args_14* args= static_cast<thread_args_14*>(raw);
    alg14 (args->size, args->val, args->ready);
    return 0;
    }

    HANDLE background_14 (int x, int y)
    {
    thread_args_14 args (x,y);
    HANDLE h= CreateThread (thread_start_14, &args);
    tasking::WaitForSingleObject (args.ready.h());
    return h;
    }

    void test14()
    {
    cout << "before call to background task" << endl;
    HANDLE h= background_14 (30,'A');
    cout << "after call to background" << endl;
    waiton (h);
    }

But look how much simpler the thread_start_14 function is. The critical initialization phase of alg14 subsumes any need for thread_start_14 to protect a critical initialization of its own. Specifically, there is no need to copy the arguments into local variables. Instead, the event flag is passed in, and the function itself signals when it's good and ready.

Here is another variation on the idea. Instead of a single function split into two phases, why not have two different functions? A group of functions cooperating on a set of data is an object. So, we should be looking at a class with member functions. Initialization in C++ is the job of constructors, so phase one can be done by the constructor. Here is the general idea, in pseudocode:

    void start_server (a,b,c)
    {
    1. pack up {a,b,c} into structure.
    2. launch thread_start function in its own thread.
    3. wait for "ready" event from new thread.
    }

    ulong __stdcall thread_start (void* raw)
    {
    1. unpack parameters.
    2. server* p= new server (a,b,c);
    3. signal "ready" to parent.
    4. p->serving_loop();  // e.g.
       a message pump
    }

This works best with servers, where the background thread gets orders from other threads rather than just performing a single task and ending. We'll see this in more detail starting on page xx. But for the sake of not being satisfied with just pseudocode, here is the same example coded with this technique.

    class server15 {
       const int size;
       int* data;
    public:
       server15 (int size);
       ~server15() { delete[] data; }
       void go (int value);
       };

    server15::server15 (int size)
    : size(size), data(0)
    {
    data= new int[size];
    }

    void server15::go (int value)
    {
    // phase 2 - sustained background activity
    for (int loop= 0; loop < size; loop++) {
       data[loop]= value;
       cout << '.' << flush;
       Sleep (250);
       }
    }

    typedef thread_args_14 thread_args_15;  //no change

    ulong __stdcall thread_start_15 (void* raw)
    {
    thread_args_15* args= static_cast<thread_args_15*>(raw);
    server15* server= new server15 (args->size);
    const int val= args->val;  //must save this
    args->ready.set();
    server->go (val);
    delete server;
    return 0;
    }

    HANDLE background_15 (int x, int y)
    {
    thread_args_15 args (x,y);
    HANDLE h= CreateThread (thread_start_15, &args);
    tasking::WaitForSingleObject (args.ready.h());
    return h;
    }

    void test15()
    {
    cout << "before call to background task" << endl;
    HANDLE h= background_15 (30,'A');
    cout << "after call to background" << endl;
    waiton (h);
    }

Notice that the go member is phase two only, while the whole constructor is assumed to be phase one. The thread-start function must save args->val in a local variable before signaling "ready", as once the signal is sent, the code must assume that args is no longer a valid object. If the value is to be passed to go, it must be saved earlier. The args->size passed to the constructor had no such problem, and this is the fundamental difference between phase one and phase two. Phase two is asynchronous with the parent thread, and phase one is not.
Never underestimate the ramifications of this. Once the "ready" signal is sent, the code executes in a more hostile environment.

Reuse It

You may have noticed a strong similarity among the examples in this chapter. After all, the means to start a thread is being taught as a higher-level idiom, or "pattern" if you prefer. The point is to learn it in detail and then apply it over and over. So, can the compiler allow you to build reusable components to fill this role? Yes and no.

The marshalling code is different every time, due to the different signatures of whatever background function you are trying to start. But the two-piece solution of doing the marshalling separately from the caller and thread-start function sure looks like it ought to be reusable somehow. If thread-start functions always took the same arguments, it would be a simple matter, using either templates or function pointers. So, what we need is an element of uniformity.

The previous sample, test15, offers a hint. The test11 sample offers hints, too. We want to separate all the parameter packing and unpacking from the code that does the thread launching. So, put that in an object, which is supplied to the reusable launching code.

Here is a solution that handles the simplest case (that is, no critical-initialization phase).

    template <typename T>
    struct dummy {
       static ulong __stdcall launch_thread_helper (void* raw)
          {
          T* p= static_cast<T*>(raw);
          p->start();
          return 0;
          }
       };

    template <typename T>
    ratwin::types::HANDLE launch_thread (T& x)
    {
    ulong id;
    THREAD_START_ROUTINE start= &dummy<T>::launch_thread_helper;
    return CreateThread (id, start, &x);
    }

C++ Feature

The launch_thread_helper function is a static member of a dummy class, rather than simply being a template function. That is because the compilers don't support template functions that don't use all the template parameters in the function's argument list.
Also, the address of the function is assigned to a temporary variable, start, rather than being used directly in the parameter list to CreateThread. This makes the lines shorter, but it wasn't done in the name of clarity: Microsoft C++ couldn't handle the expression without the temporary.

To use it, supply an object that has a start member. The data needed for the new thread will already be inside the object.

    class detached_counter_16 {
       int low;
       int high;
       int step;
    public:
       detached_counter_16 (int x, int y, int z)
          : low(x), high(y), step(z) {}
       void start() { count (low, high, step); }
       };

    int main()
    {
    cout << "before call to count" << endl;
    detached_counter_16 backgrounder (1, 30, 2);
    HANDLE h= launch_thread (backgrounder);
    cout << "after call to count" << endl;
    waiton (h);
    return 0;
    }

The launch_thread facility ends up calling start() on backgrounder, through the pointer inside the helper. So, the code needs to provide such a function in a suitable object. It's pretty simple—start just calls the function I really wanted started. But, in order for start to do that, it has to know what data to pass to count. To accomplish this, the values are stored in members. To build that object, a constructor is used.

Thanks to the use of templates, you have full creativity in exactly how you arrange this. As long as calling start() on your object is a legal statement, it works. Alternatively, you could do without templates if you had a starter abstract base class that declared start as a virtual function. Then, all starters would have to be derived from that base class, and that base is what launch_thread would take.

The lifetime issues are the same as in test11. If you want backgrounder to represent the background task, it needs to outlive the thread. Or, if you want backgrounder to represent a launcher, you need to deal with the critical-initialization phase so the arguments can be saved within the thread before the launcher is destroyed.
A higher-level C++ model

The preceding discussions demonstrate a single general idea on how to start a thread in C++, with a large number of variations on the theme. It is possible to produce some reusable code to implement this model. With that as initial input, I have developed a higher-level C++ abstraction of threads, complete with reusable support code.

The Launch Pad Model

I've always thought that the "obvious" approach was to model a background process as a C++ object. In my early home-brew multitasking systems, I used an object to represent the thread itself. Today, with operating system support for threads, that object exists inside the kernel. It might seem that wrapping the HANDLE in a C++ object is the natural way to proceed. After all, that works fine with files, windows, and other resources. But this model has inherent flaws, and that prompted me to find a better approach. The result is my launch pad model, which uses three different abstractions to represent different aspects of the dynamic system.

Consider the launch_thread template function above. Just what is the backgrounder object representing? It does not meaningfully represent the background process. In fact, the object does not have to continue to exist once the new thread finishes its critical initialization phase. Clearly, this is something used to start an activity, not the activity itself.

Contrast this with test11. There, the detached_counter_11 object represented the background activity. The object had to live as long as the thread, and other code could use the object in order to operate on the thread (in that example, getting the result of the calculation). But reading this abstraction into test11 has its flaws. The object has more data in it than the thread actually needs. Some of the values are only needed to start the thread, not to sustain it.
An object that controls a thread should not have to carry around that extra baggage. Ah, what's that there? The object controls the thread. It is not the thread itself; it's a controller. The extra baggage indicates that two objects are needed—each fills a different role, and can have distinct lifetimes. But neither represents the background activity itself, and there are cases where such an abstraction is indeed useful. So that gives a grand total of three different objects, each representing a different concept. You can think of these roles as the launch pad, mission control, and the rocket.

cute picture here.

The Launch Pad — Starts a New Thread

Let's revisit test16 and the launch_thread function. The implementation of the backgrounder object bears a striking resemblance to a more general idiom called the Command Pattern4. The command object lets code pass around instructions to be executed by other code, by making all such commands self-contained. This has other uses in programming, and will in fact find much use later in this book. So, let's adopt a standard command object, and rewrite test16 to use it. Here is the class under discussion:

class command {
public:
    virtual void execute() =0;
    void operator()() { execute(); }
    virtual ~command() {}
};

Another flaw with the earlier example was that as a reusable mechanism, it's somewhat lacking in customizability. It assumes common defaults for lots of the thread-creation settings, and in some cases you may want to change these. Instead of adding more forms of launch_thread that take various other arguments, think objects. After all, we already decided that the launch pad should be an object, not a single function. If launching the thread is behavior of an instantiated launch pad, then it stands to reason that the launch pad can be configured (perhaps extensively) before use.

4 Put reference to the gang-of-four Patterns Book here.
Here is the general idea, to give a feel for what is wanted. Assume that backgrounder is an instance of something derived from command, and is similar to its namesake in the earlier example.

// simple case (as before)
launch_pad launcher;  //create with defaults
launcher.launch (backgrounder);  //use it.

// more subtle use
launch_pad slow_launcher;
slow_launcher.priority (low);
// … change other settings as desired.
// … later …
slow_launcher.launch (backgrounder);
// … later still …
slow_launcher.launch (backgrounder_2);

A launch pad object can be created and configured. Then that launch pad can be used to launch any number of background activities. The launch pad is not the background activity—rather, it is something which starts the background activity. There is a clear separation of roles between the background process itself and the launch pad.

A launch pad can be a complex thing in itself. In a rocket launch, look at all the support structures, gantries, ground vehicles, and personnel involved. That's all there to support the launch process, and is baggage that is left behind on the ground, never becoming part of the actual mission in space. Meanwhile, the launch pad can be used again as soon as the launch is finished. Many rockets in orbit could potentially trace their start back to the same launch pad.

Here is a first cut of code to implement a launch pad, with our good old counter example as the payload.

Mission Control — Controls The Background Process

The Rocket — Is The Background Process

Ending A Thread

Software Models

Intro goes here

Classification of Threaded Code

I've designed a classification system to help document and understand the threading issues associated with an object, class, or other system.
These models act as mental shorthand, so that once a class is pegged it is well understood. Why go to the trouble of explaining (and hopefully documenting) the same basic thing over and over again? Instead, learn these common classifications and just refer to them.

First, we need to specify what level of organization we are describing. Choose one of the following:

    I   Instance
    G   Group
    C   Class
    S   Subsystem

Then use one of the following ranks:

    1 (most restrictive)    OTO   One Thread Only
    2                       OTT   One Thread at a Time
    3                       R/W   Reader/Writer Grouped
    3                       Grp   Grouped members
    4 (least restrictive)   FT    Free Threaded

OTO

The most restrictive ranking, One Thread Only (OTO), generally covers things that were not designed with any thread safety in mind. It means that only one specific thread may manipulate the object (or system, or whatever). If a class is documented as being I-OTO (read: Instance, One Thread Only), it means that whatever thread created the object is the only thread which can subsequently use the object. All member function calls (which include the constructor and destructor) on a particular instance must use the same thread. However, different instances can use different threads without any issue. For example, if worker thread one creates object p1 and thread two creates object p2, then worker thread one can call p1->foo and worker thread two can call p2->bar. It would be a violation of the specification for thread one to call p2->bar, or to otherwise use p2 in any manner.

Note that the ranking documents the way a class is supposed to be used. If p1 and p2 automatically detect use in the wrong thread and handle it correctly, then the class would no longer be OTO (it would be FT). If p1 and p2 automatically detect the misuse in a debug compilation and throw an exception, then the OTO label still applies.

Now I-OTO is pretty simple to achieve.
Since a class's state is kept in member data and is separate from any other instance, each instance is independent in how it is used. And since OTO is the most restrictive rank, anything else the class's implementation calls will satisfy it.

But what if different instances share data? Consider the presence of a static data member. Two different instances use the same underlying value, so there is contention there. This would be classified as C-OTO, meaning the OTO rank applies across all instances of the class, rather than individually to each instance.

But sometimes that is overkill. Suppose instances point to each other, such as in a linked list. If two nodes are deleted from the same list at the same time, the link pointers could be corrupted. But deleting nodes from different lists at the same time is not a problem. So, this gets a G prefix, for Group. You must specify what the group is. In this example, the group is all instances in the same collection.

OTT

A step up from OTO is OTT. All uses of an object (I-OTT) or class (C-OTT) must still be serialized, but it doesn't matter which thread makes the call. As long as only one thread of execution at a time exists inside the component, it works fine. The component doesn't have any special affinity for a particular thread.

Again, this is fairly easy to achieve. Classes written using only pure C++ features would all be OTT, not OTO. A class fails at being OTT, being demoted to OTO, only if it uses a component that is OTO. So where did the first such class come from? Through the use of system objects (such as window handles) that are OTO, and from the explicit use of thread-local storage.

R/W

The Reader/Writer rank is a special case of Grouped. But it's so common that it's worth having its own classification. This means that the component can be used in two different ways: if one thread is writing to the component, then no other thread can use the component. But if there
are no writers, then any number of threads can read from the component at the same time.

For a ranking of I-R/W, the component is any instance. But all instances are independent, so writing to one object doesn't put any restrictions on what can be done with other objects. Rather, it just disallows any simultaneous access to the same object. For C-R/W, the component is the set of all instances of the class. Writing to one object means you can't use any other object at the same time.

Most C++ code can qualify for the R/W rank with just a little bit of attention during implementation. First of all, you can't use any components that are ranked as more restrictive. As far as C++ primitives go, reference counting and other data sharing among objects will cause a class to fail at being R/W safe.

Grp

More generally, you can describe groups of members and then describe the restrictions as resource limits. Here is how a R/W is actually a special case of Grp. Pretend we are describing a simple stack class. Start by describing the groups, and formally define a group by listing every operation that is a member of that group.

Group A — Readers
    peek() const
    count() const

Group B — Writers
    push (const T& value)
    pop()

For readers and writers, it's pretty simple to tell which members go where. Except for a few special cases, every const member is a reader, and all the others are writers. But sometimes the simple readers/writers division is too limited. Formally defined groups fall in a comfortable zone between total unorganizable chaos and the simplicity of treating all operations the same.

The key to making Grp meaningful and understandable is to have groups that correspond to the high-level semantics of the class. For example, consider a double-ended queue (also known as a deque). It may be designed so that operations on one end can be performed independently from operations on the other end.
You can do two things at once, provided the threads manipulate different ends. This would give us the following groups:

Group A — one end
    push_front (const T& value)
    pop_front()
    clear()

Group B — the other end
    push_back (const T& value)
    pop_back()
    clear()

Notice that it's perfectly correct to have one operation appear in multiple groups. Now that the groups have been identified, you state the restrictions in terms of how many operations in each group can be executed at the same time. In this example, I simply have:

{ 1, 1 }

meaning that at any one time, I can have at most one member from group A and one member from group B going on at the same time. In the readers/writers example, this would be expressed as:

{ n, 0 }   any number of readers but no writers
{ 0, 1 }   one writer alone

That is, give multiple cardinality lines. If a particular mix of operations can satisfy any of the lines, the situation is legal. It is a good idea to give an informal description of what each line represents, too. The key to making Grp useful is to have informal group descriptions that are easily understood and are meaningful for the semantics of the class, plus formal and rigorous (yet simple to produce) documentation on exactly what is being specified.

FT

The Free Threaded rank is the most relaxed. Basically, anything goes. This is what many people think of when they say "thread safe". You can call any member at any time on any thread. That's not to say that it won't block, but rather that the class won't malfunction if it's called this way. There is no meaningful distinction between I-FT and C-FT, so this can simply be called "Free Threaded" without further qualification.

Applying Thread-Safety Classifications
Serialization

Servers

Worker Threads

Contrast test16 with test11: there the object represented the background process.

GUI Threads

"Future" values

Producer/Consumer and Pipelines

Job threads (background tasks)

Job threads are more independent than worker threads. They don't have to be in the "background". A job thread isn't getting orders from someone, the way a server does; it is doing its own job, like an independent program.

Synchronization Issues

Race Conditions

Deadlocks

System and Library Details

Handles

Explain how Kernel, USER, and GDI handles are all different, and how each relates to threads.

Atomic Operations

Simple Instructions

The "Interlocked" suite

Kernel Synchronization Objects

A synchronization object is something that allows a thread to suspend itself and be awakened when something interesting happens. The core of the operating system, which is responsible for scheduling threads, has built-in support for this via kernel objects. Other synchronization techniques can be built on top of these, but ultimately one of these primitives is used to communicate the intent to wait to the scheduler deep inside the operating system.

The kernel synchronization objects provided by NT are known as Mutex, Mutant, Semaphore, Event, and Waitable Timer. The kernel object called a mutex is only used in kernel mode. What the Win32 API calls a Mutex is actually a Mutant kernel object. In this book, Mutex means the Win32 API Mutex.

Mutex, Semaphore, and Event are a pretty diverse group, in the sense that each can do something the others can't.
Mutexes remember which thread owns them and allow recursive locks; Semaphores can count; and Events allow multiple threads to resume on the same cue, or provide a way to gate a set of waiting threads without counting them.

Common Info on Kernel Objects

In general, kernel objects are all managed the same way, so I'm only going to explain it once. A kernel object is system-wide, and may be used by any process. Kernel objects are accessed through HANDLEs, which are reference counted. A HANDLE is local to a process, but different HANDLEs in different processes can refer to the same kernel object. The object is destroyed when the last HANDLE is closed (via CloseHandle, or implicitly by terminating the process).

To acquire a HANDLE to a kernel object, you can duplicate an existing handle: implicitly by inheriting it in a child process, or explicitly by calling DuplicateHandle (a function that has too many parameters for its own good). Besides that, you can call a function that begins with Create… to access a named object, or create it if it doesn't already exist. If you want to access an existing object only, without the possibility of creating it instead, call the corresponding function that begins with Open….

The security attributes structure is the first parameter. If you don't plan on using it to synchronize across processes, just use null. The name is a nul-terminated string. The designation tchar* here means it's either a char* or a wchar_t*, depending on whether you compiled for the ANSI or UNICODE character set.

The name can be up to 260 characters long, and may contain any characters except backslash ('\\'). Presumably it can't contain a nul ('\0') either, since that terminates the input string. Alternatively, you can supply a null pointer for the name, and the mutex is created without a name. [[[ so what's a name of "" (empty string) do?
]]]

If the Create… function finds an existing object with the same name, the existing object is used and no new object is created. If the existing object is of the wrong type, Create… returns null and GetLastError returns ERROR_INVALID_HANDLE. You can't have different kinds of kernel objects with the same name (e.g. a mutex and a semaphore). If the existing object is the proper kind of object, the parameters in the Create… call that specify characteristics of the object are ignored. After all, the object already exists and is taken as-is. In the security attributes parameter, the actual security information is ignored, but the inheritable flag is significant, as that applies to the new handle only, rather than to the object itself. When Create… accesses an existing object, all rights are requested.

If you want to specify access rights in detail, or if you want an existing object only and not a new object, use the corresponding Open… call. The Open… functions take three parameters. The first specifies the desired access to the object. Details of security are not covered in this book. If you are not interested in fine-tuning what can be done with the handle, use MUTEX_ALL_ACCESS, SEMAPHORE_ALL_ACCESS, etc. for the type of object you are opening. The middle parameter to Open… specifies whether you want the handle to be inheritable. The last parameter is the name of the object. If it is not found, or it is the wrong type, the function indicates failure by returning a null handle.

Mutex

A mutex models a resource that can be acquired by a single thread. Semaphores model resources that have multiple instances, but a mutex is more than just a binary semaphore. In general, a thread acquires a mutex before manipulating a shared resource. If the mutex is unused, the thread proceeds, and the mutex is marked as being in use. If the mutex is already in use, the thread blocks.
When the mutex is released, the waiting thread (or one waiting thread, if there are several) wakes up and is allowed to acquire the mutex. By writing your code so that a thread only manipulates a resource when it "has" the mutex, you are assured that only one thread at a time manipulates the resource.

Mutex Semantics

Besides only allowing one lock at a time, the mutex remembers which thread locked it, and considers itself affiliated with that thread. There are several ramifications to this feature:

    The same thread must do the unlocking.
    The same thread can harmlessly acquire the same mutex again.
    Other threads can sense "abandonment".

The most significant purpose for having mutexes remember who locked them is a feature known as recursive locks. For an illustration, consider an object that has a mutex among its member data. The member functions acquire the mutex before manipulating the object's state, to provide for thread safety in the object's use. That sounds simple, but consider what happens when one member calls another:

void C::foo()
{
    1. acquire the mutex
    2. do some calculations
    3. calculations include a call to bar()
    7. release the mutex
}

void C::bar()
{
    4. acquire the mutex
    5. do some calculations
    6. release the mutex
}

Now consider what happens when foo is called. In step 1, the mutex is acquired. But later, in step 4, the mutex is acquired again. If mutexes were equivalent to binary semaphores, the thread would block since, strictly speaking, it's trying to acquire a mutex that is already in use. The thread would deadlock on itself! This is a common enough issue that a way to deal with it was supplied as a primitive in the OS. Because the mutex remembers which thread got it, it can realize that the same thread is asking again. So, in step 4, the line proceeds without blocking.
When the thread asks for the mutex, instead of the system thinking "it's in use; gotta wait for it", it realizes "this thread already has it; proceed". Every lock has to be balanced with an unlock. That is, you don't want step 6 to indicate that this thread is done with the object. Instead, the mutex has to remember how many locks are pending, and not release until the last lock has been removed. So, step 7 really releases the mutex.

Win32 does not document the range limit of the mutex's lock counter, nor what might happen if you lock more times than the implementation can handle. I investigated this through empirical testing in NT4 and an examination of the structures found in the DDK (device-driver development kit) headers. The signaled state of any kernel object is stored as a signed long. Besides indicating signaled or not signaled, the actual value means different things to different objects. This value is used for the recursive acquisition count in mutexes. So, the count is limited to 31 bits, the positive values representable in a long. If you acquire a mutex something over two billion times, you get the blue screen of death, as the kernel throws an unhandled exception.

As a secondary benefit of keeping track of which thread locked a mutex, the system can perform error checking on the unlocking. The system insists that a mutex be released by the same thread that acquired it. So what happens if a thread terminates without releasing a mutex that it has acquired? For other synchronization objects, nothing in particular happens. But for mutexes, the system knows that the only thread that could have released the mutex didn't and never can. What happens is that the mutex is forcefully released, and a different code is returned to the waiting thread that acquires it, to indicate that the mutex was abandoned.
Win32 Mutex API Summary

A mutex is created using CreateMutex or OpenMutex, and destroyed using CloseHandle. A mutex is acquired by using any of the Wait functions, described on page 65. A mutex is released using the ReleaseMutex function.

The CreateMutex function

HANDLE CreateMutex (
    SECURITY_ATTRIBUTES*,
    bool grab_right_away,
    const tchar* name );

This returns a handle to a new or existing mutex, as explained on page 42. On error, it returns null and GetLastError provides more information. The grab_right_away parameter allows the mutex to be acquired immediately upon creation. It has the same result as:

HANDLE h= CreateMutex (security, false, name);
WaitForSingleObject (h,0);

but has the benefit of being an atomic operation. That is, no other thread is allowed to grab the mutex between the time it is created and the time WaitForSingleObject is executed. If the mutex has a name, that could otherwise happen.

The ReleaseMutex function

BOOL ReleaseMutex (HANDLE mutex);

Nothing to it—call this some time after a mutex is acquired, in order to release it.

Using Mutexes in C++

Mutexes are used to guard usage of a shared resource, or to otherwise make sure that only one thread at a time does something. Their proper use is to acquire the mutex, perform the operation that is to be protected, and then release the mutex again. Sounds simple? Well, look again.

void do_something()
{
    WaitForSingleObject (MX, INFINITE);
    // OK, do my work.
    int y= foo(3);
    cout << y << endl;
    // done with my work, punch out.
    ReleaseMutex (MX);
}

What's wrong with this simple code? The very nature of C++. It looks innocent enough until you discover that MX is never being released and your threads are hanging. What happens if foo throws an exception? The stack is unwound. Presumably a caller of do_something eventually handles the error, but ReleaseMutex is never called.
Execution jumps from foo all the way to some catch block, and the remainder of do_something is skipped! Don't write code like this in C++. Remember that resource acquisition is best modeled as initialization. The region where MX is held should be tracked using the lifetime of an object.

A Simple Solution

Here is a minimal annotated class to accomplish this.

class locker {
    HANDLE h;
public:
    explicit locker (HANDLE h);  //locks
    ~locker();  //unlocks
};

locker::locker (HANDLE h)
: h(h)
{
    WaitForSingleObject (h, INFINITE);
    // >> in real code, error checking would go here.
    cout << "acquired mutex " << h << endl;
}

locker::~locker()
{
    ReleaseMutex (h);
    // >> in real code, error checking would go here.
    cout << "released mutex " << h << endl;
}

C++ Feature
The explicit keyword on the constructor is used to prevent this constructor from providing an implicit conversion. That is, I can't accidentally use a HANDLE where a locker was expected.

Here is the do_something example using this locker class.

HANDLE MX= CreateMutex (0, false, 0);

int foo (int x)
{
    if (x==3) throw "invalid argument to foo";
    return 1000 / (x-3);
}

void do_something()
{
    locker _ (MX);
    int y= foo(3);
    cout << y << endl;
    // releasing MX here is implicit.
}

int main()
{
    try {
        do_something();
    }
    catch (const char* message) {
        cout << "exception caught: " << message << endl;
    }
    return 0;
}

A locker variable is declared, and while it is in scope, the thread holds MX. What is the name of the variable? I don't use it except to construct it, so what do I need with a name? C++ doesn't provide anonymous variables, so I just used a throw-away name. An underscore by itself is just as valid a name as an x by itself. However, don't use names that contain two consecutive underscores.
Eliminating the need to explicitly release the mutex is not just cosmetic, and not just a convenience to encourage multiple returns from the middle of a function. Due to exception handling, it's necessary. Don't even think about using mutexes without destructor semantics.

This program outputs (the handle id will vary from run to run):

acquired mutex 0x0000004C
released mutex 0x0000004C
exception caught: invalid argument to foo

So you can see that the mutex was properly released even though do_something did not complete normally. There is no need for special clean-up code, as it is written in a proper exception-safe manner.

Limitations of the Locker Class

Windows provides the ability to wait on more than one thing at a time. This is a very powerful feature to have built into the operating system at a low level, simply because it's difficult to do well using simpler primitives. Waiting for the first available of several synchronization objects, some of which are mutexes, is not an elegant thing to do in C++. Consider the pseudocode:

mutex A;
mutex B;
wait for A or B
if (A was acquired) {
    do something
    release A
}
else {
    do something else
    release B
}

If you want to handle situations like this, you need to separate the multiple-wait ability from the creation of the locker. That is, one branch needs a locker on A, and the other branch needs a locker on B. At least it's not spaghetti—there is still some organization, in that the structure matches the scope of two distinct lockers. I'm sure you realize that it could be far worse. Don't use mutexes as events. Use them only with resource acquisition semantics.

Waiting on multiple mutexes and then switching on the result is something that is better done using other means. Specifically, look at a server loop using callback routines instead. See page xx for an example using background I/O.
On the other hand, it does make sense to wait for multiple things when exactly one of them is a mutex. For example, code may need to use some resource, but while it waits for access, the user can click on a cancel button, or the wait can time out. This is easy to model in C++ by using exceptions. Constructing a locker represents resource acquisition. Failing to acquire the resource (because the user clicked cancel, or because it got tired of waiting) is failure to construct. The locker's constructor doesn't return, but throws an exception instead.

try {
    locker _ (construct on mutex A, or abort if user clicks cancel before A is available);
    // if code got this far, _ is constructed, resource is acquired
    do something
    // implicit release of A as locker goes out of scope
}
catch (cancelled) {
    recovery code
}

Often, such code does not need an explicit try block. Rather, a single try block is used around the entire operation. Any time a resource is acquired, it waits on the cancel button as well. That way, the user can always quit when the operation is blocked.

Another complex wait scenario is to wait on multiple mutexes at the same time, where you want all of them. This can be done by having the locker object take an array of mutexes. Construction succeeds only when all are acquired, and the destructor knows to free all of them. This is semantically different from acquiring one resource and then another. Consider:

void foo()
{
    locker L1 (A);
    locker L2 (B);
    // do stuff…
}

void bar()
{
    locker L2 (B);
    locker L1 (A);
    // do stuff
}

Here, foo and bar can deadlock if called at the same time. This particular case can be fixed by using a consistent ordering for resource acquisition, as explained on page xx under deadlocks. But there are still issues involved with sequential acquisition of resources. In general, you should avoid holding one thing while waiting on something else.
If you could instead write:

void foo()
{
    locker L (A,B);
    // do stuff…
}

void bar()
{
    locker L (B,A);
    // do stuff
}

such that A and B are simultaneously acquired, you would be better off. Windows supports a primitive to do this, such that A and B are both acquired at the same time, or neither A nor B is held while it is waiting. It's all or nothing, not a piecemeal acquisition. So, reasonable and proper uses of mutexes can be structured using a locker class, where the class supports a list of mutexes (1 is a degenerate case), plus other non-mutex synchronization objects which abort the construction of the locker. A full-blown solution, in the C++ Threading Library, is presented on page xx.

Semaphore

Semaphores can model resources that have multiple instances. For example, a public restroom can handle three patrons at a time. The line outside the door is made up of the people "blocked" on acquiring the resource. Rather than modeling each seat as a separate resource, the entire group of three is controlled by a single semaphore.

But as pointed out earlier, a mutex is not just a binary semaphore. A mutex has to be released by the same thread that acquired it. A semaphore has no such restriction, so something can be acquired by one thread and released by another. No, that's not only caused by serious pasta in the design—think of "consume" instead of "acquire", and "produce" instead of "release", and you can see another role for semaphores in modern C++ programs.

Semaphore Semantics

When used to model resource acquisition, like a mutex, acquiring the semaphore means that the resource may be used. However, it does not track the individual identity of a pool of resources. Think of a turnstile on the door of the public restroom. It is preset with the capacity, three in this case. Passing through the door turns the crank and decrements the counter.
If the counter shows zero, the gate will not turn at all. So, we get lines outside the door. On the way out, the gate is turned in the other direction, so the counter is incremented. That allows the gate to unlock and another person to enter. The mechanism in the gate doesn't keep track of which seat each person takes, or even which people turned the gate.

In comparison, the mutex is more like the key to the restroom at a road-side filling station. One person acquires the key, and that person possesses the key, which symbolizes the acquisition of the resource. That token (the key) must be turned back in afterwards. The only way to free the resource is for the key holder to give it up.

Unlike a mutex, a semaphore doesn't keep track of who "has" it. It's just a counter. If the same thread waits on a semaphore twice, the semaphore is decremented twice. Call it enough times, and the thread blocks.

The other way to use a semaphore is to model a renewable and consumable resource. Instead of acquiring something and then having to give it back, you acquire something and use it up. Think of a bakery kiosk that sells cakes. The patron goes in, buys a cake, and presumably takes it home and eats it. The counter indicates the number of cakes that may be acquired. When the counter hits zero, the line forms outside the door. But customers never bring the cakes back! How does the counter ever increase again? A different person (the baker) delivers new cakes. So, one thread (the consumer) decrements the counter, and a different thread (the producer) increments the counter. The counter represents a buffer, so that the producer and consumer can run asynchronously (though in the long run, they need the same average rate). Producer/consumer problems are discussed in more depth on page xx, and simple examples are given later in this section, on page xx.

Win32 Semaphore API Summary

A semaphore is created using CreateSemaphore or OpenSemaphore, and destroyed using CloseHandle.
A semaphore is acquired by using any of the Wait functions, described on page 65. A semaphore is released using the ReleaseSemaphore function.

The CreateSemaphore function

   HANDLE CreateSemaphore (
      SECURITY_ATTRIBUTES*,
      long initial_count,
      long max_count,
      const tchar* name );

This returns a handle to a new or existing semaphore, as explained on page 42. On error, it returns null and GetLastError provides more information.

The initial_count is the counter value given when the semaphore is created. It must be between 0 and max_count, inclusive. The max_count is an upper limit on the counter. This provides some error checking on releasing a semaphore: it is an error to release (increment the counter) past the maximum.

The ReleaseSemaphore function

   BOOL ReleaseSemaphore (
      HANDLE semaphore,
      long release_count,  //must be >0
      long* prev_count );

This function increments the counter of the semaphore by the value of release_count; that is, it is not limited to incrementing by one. If the current count plus release_count would exceed the maximum allowed for this semaphore, the function returns false and the counter is not changed. Specifically, note that changing the counter by any amount is an atomic all-or-nothing operation. By analogy, if the baker tries to deliver a dozen cakes at once and the display case only has room for ten, none of them are delivered, as a dozen won't fit. If you want partial delivery, call ReleaseSemaphore in a loop (deliver one cake a dozen times) instead.

The prev_count may be null, or may point to a variable that receives the previous count. [[ verify that prev_count works even on error ]]

There is no function to get the current value of a semaphore. However, you can obtain the value by calling ReleaseSemaphore with a release_count so large that it will always fail.
The counter will be unchanged, and *prev_count will indicate the old (and still current) value. (Current as of the failed ReleaseSemaphore call, that is; the value might change between the time ReleaseSemaphore is called and the time *prev_count is checked by the caller.)

Using Semaphores in C++

Event

An event synchronization object models "events" that you define; that is, the event's state is explicitly controlled. One thread can wait for an event, where that event represents anything you want (e.g. "buffer is not full"). Meanwhile, another thread sets and clears the event to implement the event's meaning (e.g. set it when items are removed from the buffer, clear it when inserting an item that reaches capacity).

Event Semantics

Events come in two flavors: manual-reset and auto-reset. The difference is that an auto-reset event is automatically reset to non-signaled when a thread waiting on the event continues. A manual-reset event, on the other hand, is only reset when you tell it to.

A noticeable functional difference between the two concerns multiple threads waiting on the same event. When you signal an auto-reset event, only one thread continues and the event is reset. When you signal a manual-reset event, all waiting threads proceed, and the event remains signaled.

When there are no waiting threads and the event is signaled, any number of threads may proceed on a manual-reset event. Any thread that tries to wait on it will see it as signaled, and not block; the event remains set. On an auto-reset event, as soon as one thread waits (and continues right away), the event is reset, and subsequent threads will block if they attempt to wait on it.

Win32 Event API Summary

An event is created using CreateEvent or OpenEvent, and destroyed using CloseHandle.
A thread waits for an event to be in the signaled state by using any of the Wait functions, described on page 65. An event can be set or reset using the SetEvent, ResetEvent, and PulseEvent functions.

The CreateEvent function

   HANDLE CreateEvent (
      SECURITY_ATTRIBUTES*,
      bool manual_reset,
      bool initial_state,
      const tchar* name );

This returns a handle to a new or existing event, as explained on page 42. On error, it returns null and GetLastError provides more information.

The manual_reset parameter, if true, indicates that a manual-reset event is to be created; if false, an auto-reset event is created. The initial_state parameter indicates the initial state of the event: signaled (set) if true, non-signaled (cleared, or reset) if false.

The SetEvent function

   BOOL SetEvent (HANDLE event);

This sets the state of the event to signaled, which can make waiting threads stop waiting.

For a manual-reset event, all waiting threads are released. Any threads that subsequently wait on the event (until it is reset by a call to ResetEvent or PulseEvent) also proceed without blocking.

For an auto-reset event, one waiting thread is released while the others keep waiting, and the event stays non-signaled (reset, or cleared). If there are no waiting threads, the event is set to signaled, and a thread which subsequently waits on it will set the event back to non-signaled and not block. That is, exactly one thread goes, regardless of whether there are already waiting threads.

The event has only two states: set (signaled) and reset (non-signaled). It does not count or otherwise save up signals. If a call to SetEvent has no immediate effect (changing the state to signaled and/or releasing a waiting thread), it is essentially lost. For example, if an event is signaled and has nothing waiting, another call to SetEvent has no effect, as the event is still signaled.
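The two flavors of event described above can be sketched in portable C++. This is a model of the semantics only, assuming just the standard library; it is not the real Win32 object, the class name is invented, and wait_for stands in for a Wait function with a timeout:

```cpp
#include <mutex>
#include <condition_variable>
#include <chrono>

// Sketch of manual-reset vs. auto-reset event semantics.
class event {
    std::mutex m;
    std::condition_variable cv;
    bool signaled;
    const bool manual_reset;
public:
    event(bool manual, bool initial) : signaled(initial), manual_reset(manual) {}

    void set() {                        // like SetEvent
        std::lock_guard<std::mutex> L(m);
        signaled = true;
        if (manual_reset) cv.notify_all();  // release every waiter
        else cv.notify_one();               // release exactly one
    }
    void reset() {                      // like ResetEvent
        std::lock_guard<std::mutex> L(m);
        signaled = false;
    }
    // Returns true if the event became signaled before the timeout.
    bool wait_for(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> L(m);
        if (!cv.wait_for(L, timeout, [this]{ return signaled; }))
            return false;               // timed out
        if (!manual_reset)
            signaled = false;           // auto-reset consumes the signal
        return true;
    }
};
```

A zero timeout makes the difference easy to see: a manual-reset event stays signaled through any number of waits, while an auto-reset event lets exactly one wait through and then blocks the rest.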
The ResetEvent function

   BOOL ResetEvent (HANDLE event);

This sets the state of the event to non-signaled. If the event was already non-signaled, there is no effect.

The PulseEvent function

   BOOL PulseEvent (HANDLE event);

This is like a SetEvent immediately followed by a ResetEvent, as one atomic operation. For auto-reset events, one waiting thread is released. This can be understood as follows: releasing the first waiting thread also causes the event to reset, as is the fundamental characteristic of auto-reset events; the ResetEvent built into PulseEvent is then redundant. For manual-reset events, all waiting threads are released and the event is left reset.

Using Events in C++

Timer

The kernel timer object is called a "waitable timer" in the API, so as not to confuse it with older features in Windows. CreateWaitableTimer, new in NT4, creates a kernel synchronization object, while SetTimer dates back to 16-bit Windows and works differently. In this discussion, a timer object is assumed to refer to the kernel synchronization object.

A timer, when used as a synchronization object, is signaled when the set time is reached. A timer can also perform callbacks. In this respect it's like overlapped I/O (see page 69), in that you have your choice of waiting for a result or being called back with an asynchronous procedure call.

Timer Semantics

Timers can have their behavior tuned in a few different respects:

   wait to be signaled, or be called back
   manual-reset timer or synchronization timer
   periodic timer or one-shot timer

A timer object is signaled when the time expires. You don't have to wait on the timer object (you could specify a callback function in SetWaitableTimer instead), but it is still signaled. How long it stays that way can be specified: a manual-reset timer stays signaled until SetWaitableTimer is called again.
It's kind of silly to define a manual-reset periodic timer. A synchronization timer is like an auto-reset event: when a thread completes a wait on this object, it is reset to non-signaled.

A periodic timer is signaled (and any callback performed) repeatedly, on some specified interval. A one-shot timer expires once, and then doesn't do anything else until SetWaitableTimer is called again. For example, setting "noon on Tuesday" is a one-shot. But "noon on every Tuesday" is periodic, with a period of one week.

Win32 Timer API Summary

A timer is created using CreateWaitableTimer or OpenWaitableTimer, and destroyed using CloseHandle. A thread waits for the timer to be in the signaled state by using any of the Wait functions, described on page 65. Alternatively, a callback function can be specified. The timer is programmed to signal (and call back) at a specified time with SetWaitableTimer, and an active timer can be deactivated using CancelWaitableTimer.

The CreateWaitableTimer function

   HANDLE CreateWaitableTimer (
      SECURITY_ATTRIBUTES*,
      bool manual_reset,
      const tchar* name );

This returns a handle to a new or existing timer, as explained on page 42. On error, it returns null and GetLastError provides more information. The manual_reset parameter, if true, indicates that a manual-reset timer is to be created; if false, a synchronization timer is created.

The SetWaitableTimer function

   bool SetWaitableTimer (
      HANDLE timer,
      const LARGE_INTEGER* due_time,
      long period,  //zero for one-shot
      TIMERAPCROUTINE* callback_function,
      void* callback_argument,  //passed to callback_function
      bool power_resume );

   typedef void __stdcall TIMERAPCROUTINE (
      void* callback_argument,
      unsigned long time_low,
      unsigned long time_high );

Is that enough parameters for you? Improving upon this function is covered under Using Timer in C++ on page 58.
The first parameter, timer, is the handle of the timer object you are setting.

The second parameter, due_time, specifies when the timer will expire. However, that's easier said than done. Just what is a LARGE_INTEGER, anyhow, and how is it used to specify a time? A LARGE_INTEGER is a struct containing one member of type LONGLONG. A LONGLONG is 8 bytes; it's typedef'ed as __int64 if 64-bit integers are supported by the compiler, or as a double otherwise. Clearly, this is meant to be an opaque type, defined just to take up 8 bytes and force proper alignment, since assigning a number to it could give surprising results if it turns out to be a double.

The value of these 64 bits is documented to be that of the FILETIME structure. A FILETIME structure is two unsigned long integers holding the low and high 32 bits of a single 64-bit number. There are API functions available for producing values of type FILETIME. For example, SystemTimeToFileTime:

   const SYSTEMTIME systime= {
      1997, 10, -1/*ignored*/, 27,  //October 27th, 1997
      13, 10, 30,  // 1:10:30 PM
      500  // milliseconds. Make that 13:10:30.5
      };
   union {
      FILETIME ftime;
      LARGE_INTEGER due_time;
      };
   bool OK= SystemTimeToFileTime (&systime, &ftime);
   if (!OK) throw "something is wrong";
   // finally!
   OK= SetWaitableTimer (timer, &due_time, …

The union is used, rather than casting &ftime to a LARGE_INTEGER*, because of alignment restrictions. The LARGE_INTEGER will be aligned on an 8-byte address, while the FILETIME structure only demands 4-byte alignment. This doesn't matter on the x86 architecture (you get a speed penalty but not an error), but on other CPUs it could be a fatal error.

The due_time can also be a relative time, indicated by using a negative number. For example: regardless of what time it is now, set the timer to go off in one hour, two minutes, and 3.13159 seconds.
To specify a relative time, express it in terms of 100-nanosecond (tenth of a microsecond) intervals, and multiply by -1. Note that this unit base is different from most other places in Win32, including the next parameter to the SetWaitableTimer function.

   __int64 value= (__int64) -10000000*62*60 - 31315900;
   OK= SetWaitableTimer (timer, reinterpret_cast<LARGE_INTEGER*>(&value), …

Note that this time I do use a cast, because the __int64 is aligned properly. If the compiler doesn't have a native 64-bit integer type, make a class to support 64-bit arithmetic (use the double trick, like FILETIME, to force the right alignment). The arithmetic to compute value begins with a cast so that all the arithmetic is done using 64-bit values: the numeric literals are of type int, and arithmetic with ints as input gives int results. Since __int64 is non-standard, there is no special suffix to indicate that a literal should be that long, analogous to the L suffix that specifies a long (as opposed to int) literal.

The third parameter to SetWaitableTimer, period, indicates a repeat length. If zero, the timer is one-shot. Otherwise, this value (in milliseconds) is added to the previous set time and the timer is ready to go again.

The actual accuracy of the waitable timer is unspecified. On the Intel platform, NT4 appears to keep track of time in hundredth-of-a-second intervals. However, that doesn't mean the kernel triggers the timer on any arbitrary tick. Furthermore, just because a timer is signaled doesn't mean that the waiting thread will be scheduled immediately.
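The relative-time arithmetic can be packaged as a small helper. This is a hypothetical function shown for the arithmetic only (no Win32 calls); note that the fractional part must be added to the magnitude before negating, so that the whole duration ends up negative:

```cpp
#include <cstdint>

// Convert a relative due time into the 100-nanosecond units SetWaitableTimer
// expects.  One second is 10,000,000 units; a negative value marks the time
// as relative rather than absolute.
std::int64_t relative_due_time(long whole_seconds, long fraction_in_100ns)
{
    return -(std::int64_t(10000000) * whole_seconds + fraction_in_100ns);
}
```

For the one-hour, two-minute, 3.13159-second example, relative_due_time(62*60, 31315900) yields -37231315900.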
For example, to set a timer to go off at the same time as before, but then go off again every five minutes and twenty seconds, the parameters would be:

   __int64 start_time= (__int64) -10000000*62*60 - 31315900;
   long period= (5*60+20)*1000;
   OK= SetWaitableTimer (timer, reinterpret_cast<LARGE_INTEGER*>(&start_time), period, …

Suppose you were just interested in the period, not in any specific start time; that is, go off the first time one period length from now. In this case, use the period itself as a relative starting time. Remember to convert milliseconds to decimicroseconds (units of 100 nanoseconds), changing the sign and being careful not to overflow 32-bit arithmetic.

   bool SetPeriodic (HANDLE timer, long period)
   {
      const __int64 conversion_factor= -10000;
      union {
         __int64 start_time_value;
         LARGE_INTEGER due_time;
         };
      start_time_value= conversion_factor * period;
      return SetWaitableTimer (timer, &due_time, period, 0,0,false);
   }

The callback_function, if not null, is queued as an APC (see page 65) when the timer goes off. The callback_argument, along with the current time, is passed to the callback function. If the callback_function is null, callback_argument is ignored.

The last argument, power_resume, if true, causes the computer to resume from power-conservation mode when the timer goes off. If the computer doesn't support such a thing, the timer is set normally (assuming nothing else is wrong with the arguments) and GetLastError returns ERROR_NOT_SUPPORTED.

The CancelWaitableTimer function

   bool CancelWaitableTimer (HANDLE timer);

This deactivates a timer. It does not change the signaled state of the timer object.

Using Timer in C++

Clean up the parameters and the 64-bit time values!

Change Notification

A timer can be thought of as a pre-defined "event": it's simply a flag that's signaled when something interesting happens, and the "it happened" code is also provided by the operating system. So it is with change notifications.
You can use a change notification object to wait for something to happen in a directory, or to a printer.

Semantics of File Change Notification Objects

A file change notification object is signaled when something changes in a specified directory. You can set it up to watch a single directory or a whole directory tree rooted at a specified directory. The things that can be monitored for changes are:

   Any file name change, including adding, removing, or renaming files.
   Any directory name change. Same as file name changes, only concerning names of subdirectories rather than ordinary files.
   Change in attributes, such as the read-only attribute, archive attribute, etc.
   Change in the size of a file.
   Change to the last-write time of a file.
   Change to security attributes.

Suppose you write a viewer program, and include a screen to display all files in a directory as thumbnails. You can set up a change notification object to know when something changes in that directory, so the display won't go out of date. Don't you hate it when a program doesn't automatically detect changes to files?

A change notification object applies to a whole directory, so it's easy to watch that directory for changes. However, when a change occurs, all you know is that a change occurred: the object becomes signaled, and no other information is present! This is fine if all you plan to do on such a notification is refresh the whole display. But if you want to know which file or files changed, you have to scan the directory information and compare it to what you had before. [[[ find my experiments with change notifications, give details here ]]]

An alternative: ReadDirectoryChangesW

Semantics of Printer Change Notification Objects
Change Object API Summary

[[[ Quick summary goes here ]]]

The FindFirstChangeNotification function

   HANDLE FindFirstChangeNotification (
      const tchar* directory_name,
      bool subtree,
      ulong filter );

The first parameter is the directory to watch for changes. If subtree is true, this is the root of the subtree to watch, and all subdirectories are included. If subtree is false, then only directory_name itself is watched. The filter is made of one or more of the following bits:

   FILE_NOTIFY_CHANGE_FILE_NAME     1
   FILE_NOTIFY_CHANGE_DIR_NAME      2
   FILE_NOTIFY_CHANGE_ATTRIBUTES    4
   FILE_NOTIFY_CHANGE_SIZE          8
   FILE_NOTIFY_CHANGE_LAST_WRITE    0x10
   FILE_NOTIFY_CHANGE_LAST_ACCESS   0x20    (not documented)
   FILE_NOTIFY_CHANGE_CREATION      0x40    (not documented)
   FILE_NOTIFY_CHANGE_SECURITY      0x100

The LAST_ACCESS and CREATION values are not documented in the Win32 API documentation, but can be found in WINNT.H with the other values. [[[ give details on what all the bits do, with empirical testing ]]]

The function returns null if it fails, or a HANDLE to a change object which becomes signaled when the indicated change takes place. To reset the object after a wait completes, use FindNextChangeNotification.

The FindFirstPrinterChangeNotification function

   HANDLE FindFirstPrinterChangeNotification (
      HANDLE printer,  //from call to OpenPrinter
      ulong filter,  //what to watch for
      ulong reserved,  //must be zero
      void* options );

The first parameter specifies the printer (or print server) to watch for changes. The second parameter, filter, is a set of bit flags which can be combined to watch for multiple things. The definitions are arranged hierarchically, meaning that categories are defined using all the bits of their individual items. The chart below reflects this organization. Naturally, what counts is the actual bits that get OR'ed together, and you don't have to use (or limit yourself to) the category symbols.
   PRINTER_CHANGE_ALL                            0x7777FFFF
   PRINTER_CHANGE_PRINTER                        0x000000FF
      PRINTER_CHANGE_ADD_PRINTER                 0x00000001
      PRINTER_CHANGE_SET_PRINTER                 0x00000002
      PRINTER_CHANGE_DELETE_PRINTER              0x00000004
      PRINTER_CHANGE_FAILED_CONNECTION_PRINTER   0x00000008
   PRINTER_CHANGE_JOB                            0x0000FF00
      PRINTER_CHANGE_ADD_JOB                     0x00000100
      PRINTER_CHANGE_SET_JOB                     0x00000200
      PRINTER_CHANGE_DELETE_JOB                  0x00000400
      PRINTER_CHANGE_WRITE_JOB                   0x00000800
   PRINTER_CHANGE_FORM                           0x00070000
      PRINTER_CHANGE_ADD_FORM                    0x00010000
      PRINTER_CHANGE_SET_FORM                    0x00020000
      PRINTER_CHANGE_DELETE_FORM                 0x00040000
   PRINTER_CHANGE_PORT                           0x00700000
      PRINTER_CHANGE_ADD_PORT                    0x00100000
      PRINTER_CHANGE_CONFIGURE_PORT              0x00200000
      PRINTER_CHANGE_DELETE_PORT                 0x00400000
   PRINTER_CHANGE_PRINT_PROCESSOR                0x07000000
      PRINTER_CHANGE_ADD_PRINT_PROCESSOR         0x01000000
      PRINTER_CHANGE_DELETE_PRINT_PROCESSOR      0x04000000
   PRINTER_CHANGE_PRINTER_DRIVER                 0x70000000
      PRINTER_CHANGE_ADD_PRINTER_DRIVER          0x10000000
      PRINTER_CHANGE_SET_PRINTER_DRIVER          0x20000000
      PRINTER_CHANGE_DELETE_PRINTER_DRIVER       0x40000000
   PRINTER_CHANGE_TIMEOUT                        0x80000000

The options pointer is declared as void* in the header, but should really be a PRINTER_NOTIFY_OPTIONS*. This structure is declared in WINSPOOL.H. It is basically an encapsulated array of PRINTER_NOTIFY_OPTIONS_TYPE structures, each of which describes a printer field. You can get change notifications based on these fields in addition to the bits in the filter parameter. The options parameter can be null if filter is non-zero. Likewise, filter can be zero if the options pointer is used. They can't both specify no notification events.

The FindNextChangeNotification function

   BOOL FindNextChangeNotification (HANDLE notifier);

After a file notification object has been used, meaning that it has been signaled by some change, this function will reuse it. The object is reset to non-signaled, and a subsequent or pending change will signal the object again. By pending change, I mean changes that occurred after the wait for the previous change completed.
The object doesn't function properly if you issue FindNextChangeNotification without first waiting for a change (from the FindFirstChangeNotification or the previous FindNextChangeNotification). [[[ give details from empirical testing, and show an example program ]]]

The FindNextPrinterChangeNotification function

   BOOL FindNextPrinterChangeNotification (
      HANDLE notifier,
      ulong* changed,  //same bits as filter in FindFirst
      void* options,
      void** info );

This is a bit fancier than the equivalent for directories. This function not only resets the state of the notification object (so it will watch for changes again), but also gives details on the change it noticed. The structures used with printer notifications are rather complex, and concern printing more than synchronization; this is not a book about Windows printers. For our purposes, it suffices to know that you can treat it like an event object, waiting for "something interesting" to signal the object.

The FindCloseChangeNotification and FindClosePrinterChangeNotification functions

   BOOL FindCloseChangeNotification (HANDLE notifier);
   BOOL FindClosePrinterChangeNotification (HANDLE notifier);

These close the handle to a change notification object. Unlike most kernel objects, which just use CloseHandle, handles to notification objects are closed in a special way. Why is an intriguing question.

Other Kernel Handles

Kernel objects have a "signaled" state, which implies that the wait functions should operate on any kind of object. In addition to the synchronization objects described above, here are the other object types and what "signaled" means to them.

Process and Thread

A thread object becomes signaled when the thread terminates. This makes it easy to wait for a thread to terminate, using the same Wait functions as for any other kernel object. The same applies to process objects.
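Standard C++ has no handle that becomes "signaled" on thread exit, but the idea of waiting for a background task to finish can be sketched with a future. This is an analogy only, under the assumption of standard-library-only code; the names here are invented and are not part of any Win32 API:

```cpp
#include <future>

// Stands in for real background work.
int background_task()
{
    return 42;
}

// Waiting on the future blocks until the task completes, much like
// waiting on a thread handle with one of the Wait functions.
int wait_for_completion()
{
    std::future<int> done = std::async(std::launch::async, background_task);
    done.wait();            // blocks until background_task returns
    return done.get();      // collect the result
}
```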
File

A kernel file object is signaled every time an I/O operation completes, and reset every time an I/O operation is initiated. However, this behavior is not well documented, so don't count on knowing exactly when the object is reset. Perhaps it even varies depending on the kind of special file (e.g. pipe, socket) being used (see Console-input below).

The use of file handles for synchronization is not recommended. Although you can be sure the handle is signaled when an operation completes, what if there is more than one outstanding request (perhaps on different threads) on the same kernel file object? For this purpose, use the event object in the OVERLAPPED structure instead. See page 69 for more information. [[[ note: change the xref to a more specific subsection after that chapter is written ]]]

Console-input

Console input is treated as a special file. You can obtain a handle to a console-input object by calling CreateFile with "CONIN$" as the file name. Consoles are the TTY-style terminal windows used by character-mode programs. The object is signaled when there is unread input in the console's input buffer.

Other Synchronization Primitives

Win32 Critical Section

Monitors

Condition Variables (used with Monitors)

Spin Locks

Reader/Writer and Group Locks

Rendezvous

Conditional Semaphore

Waiting and Blocking

Alertable States and APC's

Windows Messages

Discuss per-thread message queues and how SendMessage et al. implement complex blocking behavior.

A Survey of Wait Primitives

In general, the wait functions watch one or more kernel synchronization objects and return when the object or objects are signaled. Meanwhile, a time-out can be specified.
WaitForSingleObject and WaitForSingleObjectEx

   ulong WaitForSingleObject (HANDLE object, ulong timeout);
   ulong WaitForSingleObjectEx (HANDLE object, ulong timeout, bool alertable);

I prefer to simplify this by overloading the name WaitForSingleObject instead of having two different names, and also providing a default timeout of INFINITE. See page 73. [[[ replace page number with a more specific subsection later ]]]

This works exactly like WaitForMultipleObjects given an array of one item. So the only return values you can expect are WAIT_OBJECT_0, WAIT_ABANDONED, WAIT_IO_COMPLETION, or WAIT_TIMEOUT.

WaitForMultipleObjects and WaitForMultipleObjectsEx

   ulong WaitForMultipleObjects (
      ulong count,
      const HANDLE* array,
      bool wait_for_all,  //wait for "any", not "all", if false
      dword timeout );  //in milliseconds

   ulong WaitForMultipleObjectsEx (
      ulong count,
      const HANDLE* array,
      bool wait_for_all,  //wait for "any", not "all", if false
      dword timeout,  //in milliseconds
      bool alertable );

MAXIMUM_WAIT_OBJECTS is 64.

SignalObjectAndWait

MsgWaitForMultipleObjects and MsgWaitForMultipleObjectsEx

Thread Priorities

Communicating Between Threads and Processes

Anonymous Pipes

Named Pipes

Mailslots

Sockets

Shared Memory

APC's

Windows Messages

Thread-Specific Data

Overlapped I/O

Don't forget to cover the CancelIo function.

future-value model

callback model

completion ports
Dynamic Link Libraries (DLL's)

Fibers

Processes

C++ issues

compiler issues (compiler and linker switches, etc.)

volatile data

thread safety catalog (language constructs that are thread safe or unsafe)

A C++ Threading Library