Multi-threaded Lisp: Challenges and Solutions Roger Corman Corman Technologies International Lisp Conference 2002 Threads: Why Bother? Most modern applications use multiple threads of execution. Nearly all servers require threads for reasonable performance. New languages have threads support built into the language. Most modern operating systems support threads. Lisp is cool; threads are cool. Right Way To Think About Threads All threads run simultaneously. If they don’t, it doesn’t matter. If you have more than one processor, they sometimes do. This model leads you to consider, early, issues such as synchronization, data sharing, performance and messaging. Wrong Way to Think About Threads Multiple execution contexts which run one at a time. Thread-switching can be intercepted to perform arbitrary logic. Thread-switching can be forced to happen only at appropriate times. This model will never support multiple processors, and will not even perform as well on a single CPU system. It is easy to develop bad programming habits, such as WITHOUT-INTERRUPTS. Next up: Thread What a thread looks like: Personal Background Developed multi-threaded applications in C/C++/Java. Designed Corman Lisp to include all the capabilities of these other language platforms. Had to make a lot of decisions about how to implement multiple threads in Lisp. Threads Are an OS Function Just as the file system is a function of the operating system, threads should be allocated and scheduled by the operating system. While lisp-implemented threads can be useful, they will never be as powerful as operating system threads. Many Lisp implementations have supported lispimplemented threads, and a standard API has developed which is supported by several implementations. It implies lisp-implemented threads, with fine-grained control over task scheduling. Lisp Process: Lisp-Implemented Threads Operating System Thread 0 Lisp Scheduled Thread 0 Lisp Scheduled Thread 1 Lisp Scheduled Thread 2 Processor A Processor B Processor C Lisp Process: Operating System Threads Operating System Thread 0 Operating System Thread 1 Operating System Thread 2 Processor A Processor B Processor C Corman Lisp Threading Model In the Corman Lisp implementation, we chose to implement only operating system threads. These give Lisp applications the maximum capabilities provided by the operating system. The Win32 API allows the application very limited control over thread scheduling. We don’t view this as a OS limitation, but rather as an OS feature. Threads and the Common Lisp Standard While lightweight processes have been supported in many Lisp implementations, the standard does not include any discussion of threads. The standard contains language which could imply non-conformance of multi-threaded implementations. Thread Implementation Issues Special variables Heap access Garbage collection Foreign calls and callbacks Code generation Debugging Lisp Image saving and loading Foreign Threads From ANSI Specification: 3.1.2.1.1.2 Dynamic Variables The effect of binding a dynamic variable is to create a new binding to which all references to that dynamic variable in any program refer for the duration of the evaluation of the form that creates the dynamic binding. From ANSI Specification Glossary: dynamic extent n. an extent whose duration is bounded by points of establishment and disestablishment within the execution of a particular form. See indefinite extent. “Dynamic variable bindings have dynamic extent.” extent n. the interval of time during which a reference to an object, a binding, an exit point, a tag, a handler, a restart, or an environment is defined. Special Variables And Threads Top level bindings should be shared between threads. Dynamic bindings (via LET, etc.) should only apply to the thread the created the binding. Is there ever a reason for dynamic bindings to be visible to other threads? (I haven’t found one.) Implementation of Dynamic Binding Many Lisp implementations store special variable bindings in a symbol’s value slot. This is a problem in a multi-threaded system. Special variable bindings need to be stackspecific, yet efficient. “Deep binding” where bindings are kept in a list, can be inefficient. Dynamic Bindings in Corman Lisp Top level (shared) bindings are stored in symbol’s value slot. Dynamic bindings are stored in a per-thread symbol table. Only symbols which are actually dynamically bound (in loaded, compiled code) need a slot in the per-thread symbol table. Special Variable Bindings *X* binding *Y* binding *X* *Y* *Z* etc *Z* binding Master symbol table Special Variable Bindings *X* *Y* *Z* *X* *Y* *Z* etc binding binding binding Thread local binding *X* *Y* *Z* etc *X* *Y* *Z* etc Master symbol table Thread 0 Symbol Table Thread local binding Thread 1 Symbol Table Thread Local Variables If you implement efficient per-thread special variable binding, you end up with an implementation of thread local variables (à la C#, Java). Surprise: they are far easier to use, elegant, and fit into the language better than most other thread local variable models in other languages. Example: Java public class MyThread extends Thread { // static Hashtable stuff; public static ThreadLocal stuff = new ThreadLocal(); public void run() { // each thread must have a different hashtable stuff.set(new Hashtable); } } if (((Hashtable)stuff.get()).containsValue("Foo")) doSomething(); // run code which can refer to hash table Example: Common Lisp (defvar *hash-table*) (defun run () (let ((*hash-table* (make-hash-table))) (do-something))) ; run code which can refer to hash table Next Up: Binding Dynamic Binding Garbage Collection 2 Big Issues: 1) Do you stop the mutator, i.e. all threads, to perform garbage collection? 2) Impact on code generation. Garbage Collection (cont.) Stopping the mutator: Overall performance hit (higher cost) when multiple processors are available (unless your garbage collection algorithm is a parallel algorithm). Generally lower overall cost on single processors. Simpler algorithms. Garbage Collection (cont.) Parallel execution: Threads keep running. Scales better to many processors. Higher overall cost on single processor due to code generation, extra root scanning. Minimizes pauses for collection. Complex algorithms, more error-prone. Corman Lisp Garbage Collector We chose to use a generational collector which stops the mutator (all threads) but keeps pauses short (a few milliseconds, usually). Most target systems have only one processor. Windows systems do not support more than 4 (or 8) processors. Does not require a software write barrier, or any type of read barrier. Code generation is minimally impacted. Corman Lisp Garbage Collector (cont.) 1. 2. 3. 4. 5. Suspends all executing threads. If any thread is in an atomic block, resume it for a few milliseconds and go to step 1. Performs root analysis on all stacks, heaps, and static memory. Copies live data from generation 0 to 1 (triggering other levels if necessary). Resumes all executing threads. Garbage Collection and Code Generation In a single-threaded system, garbage collection will only occur at “safe states” (usually when a heap allocation is requested). In a multi-threaded system, garbage collection can occur at any time, potentially between any pair of instructions. Restricting it to safe states can be difficult and require code generator assistance. Some code optimizations are impractical. Next up: Me Me, after a long night debugging the garbage collector. Foreign Calls and Callbacks (FFI) Foreign Call: Lisp code calls operating system, C function or any external library. Foreign Callback: Operating system, C function, or any other code calls Lisp function directly. FFI and Threads A thread may be executing foreign code when the garbage collector is called. It may not be in a predictable state for the collector. Need a way for the collector to determine which threads were executing foreign code and which were executing Lisp code. Lisp code could have been called by foreign code. Tricky issue: “thunking” from Lisp-to-foreign or foreign-to-Lisp code. The collector can run any time during the process. Multi-threading and Code Generation The code generator must cooperate with the collector to ensure that executing code can be analyzed by the collector as necessary. Areas of interest: Register Conventions Tagging Conventions Stack frame links Function calling/returning Foreign code transitions Stack-allocated data Multi-threading and Code Generation Many common optimizations are difficult or impossible if the collector can run at any time. Implementation Strategies: a) Restrict the collector to only run at “safe states” and allow these optimizations at any other places in the code. b) Never allow optimizations or generated code which would trip up the collector at any point. c) Allow code generator to designate portions of the code which are non-safe. Corman Lisp Code Generation We implemented option (c) using a pair of instructions: BEGIN_ATOMIC END_ATOMIC These mark blocks of code which are protected from garbage collection. The atomic sections may never span calls or branches, and typically should only include a few instructions. Corman Lisp Code Generation (cont.) An atomic block is either: a) b) Instruction sequence between BEGIN_ATOMIC/END_ATOMIC instructions. Instructions that build up or tear down a stack frame (beginning or end of function call). The latter is not safe because the stack frame list may be invalid, and the collector could be tripped up. It is impractical to use BEGIN_ATOMIC/END_ATOMIC in all these places, so the collector has special logic to detect that the instruction pointer is at one of these points. Corman Lisp Code Generation (cont.) A safe state is any point during code execution in which the instruction pointer is not in an atomic block, or is executing foreign code (which cannot access the heap). The collector will delay collection until all threads are in safe states. Some sequences of instructions can only be performed in safe states. Next up: Code Generator Corman Lisp Code Generator Lisp Debugger The ANSI specification does not go into detail about what the common Lisp debugger does. Implementations commonly will enter a special READ-EVAL-PRINT loop when an error is signaled or a BREAK is executed. This works well for checking the stack of the current thread, but what should it do about other threads? Lisp Debugger (cont.) Issues Should other threads be halted? Should it be possible to examine stack frames of other threads, and to switch logical threads in the debugger? Which thread are you dropped into when you press control-break? Lisp Debugger (cont.) I believe the best debugger will do all these things, but to do so it really needs to run in a separate thread. I don’t think debugger behavior ought to be part of a standard, but it certainly is something an implementer has to deal with. Next Up: Debugger Corman Lisp Debugger Lisp Image Saving and Loading A non-standard feature which many Common Lisp implementations support is loading and saving of the lisp heap as an image. It is important that other threads be at least suspended when saving or loading an image file, and possibly you need to disallow other threads from even existing. I found the implementation of this to be a bit tricky. Foreign Threads When the Lisp system allocates a thread, it can initialize it with any necessary data for the garbage collector to use when scanning it. If you want to allow threads not allocated by the lisp system to call lisp threads, there must be some way for the collector to know about them and figure out what to do with them. If you don’t allow foreign-created threads to call into Lisp, then it impairs the ability to embed the Lisp engine in other applications. Corman Lisp DirectCall DirectCall allows foreign-created threads to call directly into Lisp code. BlessThread() UnblessThread() These must be called by any thread that calls into Lisp directly. DLLs compiled with Corman Lisp perform this transparently as needed. Next up: Foreign Thread Foreign Thread (Before being blessed) Common Lisp Standard Threading Suggestions I propose that threading be incorporated in the Common Lisp standard, at least as an optional feature specified by the standard. This involves amending some of the existing language. Mostly it involves adding a few new macros, functions and probably some standard classes. Standard Common Lisp Threading Suggested Additions Dynamic variable specification. Specify which objects/structures are safe for use by multiple threads (such as hash-tables, packages, readtables, etc.). Specify which language features are safe for multi-threaded operations (read, eval, print, etc.). Standard special variable bindings for process id, thread id. WITH-LOCK macro for synchronization on objects and structures. Standard Common Lisp Threading Suggested Additions (cont.) Start with some of the features of the Franz MP API. Do not include scheduling functions or macros, such as WITHOUT-SCHEDULING and WITHOUT- INTERRUPTS. Standard of classes for synchronization such as Mutex, Critical Section, Semaphore. Summary Threads are good. Threads, by design, run in parallel. There are many system implementation issues involved. Common Lisp standard should add a section on standard multiple thread support. Illustrations help to clarify abstract concepts. Original Illustrations by Emmett Corman Copyright (c) Corman Technologies 2002