Clojure 6
Concurrency
26-Jul-16

Two (or more) writers
Shared storage X
This Thread writes to X
This Thread writes to X
Problems:
    Indeterminacy: Who gets there first?
    Data corruption: Each may write parts of X

Readers and writers
Shared storage X
This Thread reads from X
This Thread writes to X
Problems:
    Indeterminacy: Who gets there first?
    Data corruption: May read something that’s partly rewritten

Two (or more) readers
Shared storage X
This Thread reads from X
This Thread reads from X
No problems!
But…how do Threads communicate?

The mailbox approach
This Thread sends mail
This Thread gets mail
Synchronization is handled by the language, not by the programmer
    (Like the way garbage collection is handled)

Concurrency
Clojure supports concurrency, and most values are immutable
Clojure also has a form of shared state
Clojure has agents, which are similar to Erlang’s actors
Concurrency and shared state don’t play well together
The Java approach, using locks (synchronization), is very error-prone
Clojure uses a new approach, Software Transactional Memory (STM), to manage concurrency
    This is similar to a database transaction

Atomicity
An action is atomic if it happens “all at once” from the point of view of other threads
    That is, the action either has happened or it hasn’t; it’s not in the middle of happening
Few actions are actually atomic
    For example, x++ consists of three parts: get the value of x, add 1 to the value, put the new value back in x
    If a spacecraft moves from point (x, y, z) to point (x', y', z'), these values must be updated “simultaneously”
    If another thread accesses x during the operation, the results are unpredictable
    The spacecraft is never at point (x', y, z)
In Java, the approach is to “lock” the variables being updated, thus denying access to other threads
    It is extremely difficult, with this model, to write correct programs
    Efficiency isn’t all that easy, either

Refs and STM
A ref is a mutable reference to an immutable value
To find the
value, you have to dereference the variable
    That is, the data remains immutable, but you can change what data the ref refers to
Basic syntax: (ref initial-state)
Typical use: (def ref-variable (ref initial-state))
Syntax: (deref ref-variable)
Syntactic sugar: @ref-variable
A reference variable is like an “ordinary” variable in an “ordinary” language: its value can be changed
    However, there are restrictions on where it can be changed

Updating a reference variable
Reference variables can only be updated in a transaction
Basic syntax: (dosync expression ... expression)
Typical use: (dosync (ref-set ref-variable new-value))
    ref-set is basically an “assignment” to a reference variable
A transaction is:
    Atomic -- From outside the transaction, the transaction appears instantaneous: It has either happened, or it hasn’t
    Consistent -- With some additional syntax, a reference can specify validation conditions
        If the conditions aren’t met, the transaction fails
    Isolated -- Transactions cannot see partial results from other transactions
A transaction is not:
    Durable -- Results are not saved for a future run of the program
Databases are ACID: Atomic, Consistent, Isolated, and Durable
    Software transactions are only “ACI”

alter
The alter function provides a somewhat more readable way to update a reference variable
Syntax: (alter ref-variable function args-to-function)
Typical usage: (dosync (alter ref-variable function args))
The return value of alter is the new value of ref-variable
Due to the way alter works, when you write an updating function, it should have the thing being updated as the first argument
    (defn function [thing-to-update other-arguments] function-body)
When you cons something to a list, the list is the second parameter, which is not what you need
    There is an additional function, conj, which is like cons with the arguments reversed
    For a list: (conj sequence item) == (cons item sequence)

How STM works
A transaction takes a private copy of any reference it needs
Since data
structures are persistent, this is not expensive
The transaction works with this private copy
If the transaction completes its work, and the original reference has not been changed (by some other transaction), then the new values are copied back atomically to the original reference
But if, during the transaction, the original data is changed, the transaction will automatically be retried with the changed data
However, if the transaction throws an exception, it will abort without a retry

Side effects
alter must:
    Be completely free of side effects
    Return a purely functional transformation of the ref value
alter can only be done inside a transaction
A transaction may be retried multiple times
Therefore: If the transaction updates state, the update may happen many times

Pessimistic locking
Java’s approach is pessimistic--access to shared state is always locked (if correctly done!)
Locking is expensive
In most scenarios, contention happens only occasionally
Therefore, the expense of locking is usually unnecessary
But…
Locking must be done anyway because another thread might try to access the shared state

Optimistic evaluation
Clojure’s approach is optimistic--it assumes contention might happen, but probably won’t happen
Transactions begin immediately, without locking
Because data is persistent (immutable), making a private copy is much less expensive than locking mutable data
Clojure guarantees that every transaction will eventually finish, so deadlock does not occur
When the transaction completes, the results are copied back atomically
Therefore, locking never happens unnecessarily
In a high concurrency, high contention scenario, a transaction could be tried many times before it finally succeeds or aborts
    In this situation, locking is a more efficient approach

Everybody’s first example
(def account1 (ref 1000))
(def account2 (ref 1500))
(defn transfer
  "Transfers amount of money from a to b"
  [a b amount]
  (dosync
    (alter a - amount)
    (alter b + amount)))
(transfer
account1 account2 300)
(transfer account2 account1 50)
    Practical Clojure, Luke Vanderhart and Stuart Sierra, p. 101

Atoms
Refs are used to coordinate multiple changes, which must happen all at once or not at all
Atoms are like refs, but only change one value, and updates need not occur within a transaction
Syntax: (atom initial-value)
Typical usage: (def atom-name (atom initial-value))
Atoms are accessed just like refs
    (deref atom-name) or @atom-name
(reset! atom-name new-value) will change the value of an atom, and return the new value
(swap! atom-name function additional-arguments) calls the function with the current value of the atom and any additional arguments, and returns the new value
    Like alter, swap! may be retried multiple times, so the function should have no side effects
Atoms are less powerful than refs, but also less expensive

Validation and metadata
The keyword :validator introduces a validation function, while the keyword :meta introduces a map of metadata
Syntax:
    (ref initial-value :validator validator-fn :meta metadata-map)
    (atom initial-value :validator validator-fn :meta metadata-map)
The validator function is used when a transaction is attempted
If a validator function fails, the ref or atom throws an IllegalStateException, and the transaction doesn’t happen
Metadata is data about the data, for example, the source of the data, or whether it is serializable
    Metadata is not considered in comparisons such as equality testing
    There are methods for working with metadata
    Metadata is outside the scope of this lecture

Agents and actors
Erlang has actors; Clojure has agents
    An actor holds a function, and you send it data
    An agent holds data, and you send it functions
Advantages of actors:
    Actors have better error reporting than agents
    Actors can be remote; agents cannot
Advantages of agents:
    You can directly retrieve a value from an agent, but not from an actor
    Actors can deadlock; agents cannot

Agents
An agent is a thread that holds a value
An agent is created with an
initial value:
    Syntax: (agent initial-state)
    Typical use: (def agent-name (agent initial-state))
You can send an agent a function to update its state:
    (send agent-name update-function arguments)
    The value of the send is the agent itself, not the value of the function (except, possibly, in the REPL)
You can check the current state of an agent with deref or @
You can wait for agents to complete:
    (await agent-name-1 ... agent-name-N)
        This is a blocking function, and it could block forever
    (await-for timeout-millis agent-name-1 ... agent-name-N)
        This is a blocking function; it returns logical false if it times out, otherwise logical true

More about agents
When you “create” an agent, you are getting a thread from a Clojure-managed thread pool
agent starts the agent running concurrently
send-off has the same syntax and semantics as send, but is optimized for slow processes (such as I/O)
If you use a send (or send-off) within a transaction, it is held until the transaction completes
    This prevents the send from occurring multiple times
I don’t think there is a way to stop an individual agent
Clojure will not terminate cleanly if there are still agents running
    The function (shutdown-agents) tells all agents to finish up their current tasks, refuse to accept any more tasks, and stop running

Agents and errors
Agents can have validating functions and metadata
    (agent initial-state :validator validator-fn :meta metadata-map)
    Example: (def counter (agent 0 :validator number?))
If you send bad data to an agent,
    The send returns immediately, without error
    The error occurs when you dereference the agent
    All further attempts to query the agent will give an error
(agent-errors agent-name) will return a sequence of errors encountered by the agent
(clear-agent-errors agent-name) will make the agent usable again
(set-validator!
agent-name validator-fn) adds a validator to an existing agent

Calling Java
Clojure has good support for calling Java
Our interest here is using Java Threads as an alternative to agents
The following syntax (notice the dots) creates a new Thread, passes it a function to execute, and starts it:
    user=> (def foo 10)
    user=> (.start (Thread. (fn [] (println foo))))
def and defn create vars (also called dynamic vars) with an initial value, or root binding
    The root binding is available to all Threads (which is why the above works)

vars
def and defn create vars with a root binding
    Usually, this is the only binding
You can use the binding macro to create thread-local bindings
    Syntax: (binding [var value ... var value] expression ... expression)
    The value of the binding is the value of the last expression
    Bindings can only be used on vars that have been defined by def at the top level
    The bindings have dynamic scope--within the expressions and all code called from those expressions
        This differs from let, which has lexical scope
Thread-local bindings can be updated with set!
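
The binding rules above can be illustrated with a small sketch. It rests on one assumption that goes beyond the slides: in Clojure 1.3 and later, a var has to be marked ^:dynamic before binding will accept it. The names *greeting*, current-greeting, and greeting-in-new-thread are hypothetical, chosen only for this illustration. The sketch shows a thread-local binding, an update with set!, and a plain Java Thread that still sees the root binding.

```clojure
;; Assumption: Clojure 1.3+, where binding only works on ^:dynamic vars.
;; All names here are hypothetical and exist only for this sketch.
(def ^:dynamic *greeting* "root")

(defn current-greeting
  "Reads the var; the value depends on the bindings of the calling thread."
  []
  *greeting*)

(defn greeting-in-new-thread
  "Runs current-greeting on a plain Java Thread and returns what it saw."
  []
  (let [seen (promise)
        t    (Thread. (fn [] (deliver seen (current-greeting))))]
    (.start t)
    (.join t)
    @seen))

(def results
  (binding [*greeting* "thread-local"]
    (let [before (current-greeting)]   ; called code sees the new binding
      (set! *greeting* "updated")      ; set! is legal only inside binding
      {:before     before
       :after-set  (current-greeting)
       ;; a plain Thread does not inherit the thread-local binding
       :new-thread (greeting-in-new-thread)})))
```

Once the binding form exits, (current-greeting) returns the root value again; only the thread that established the binding ever saw "thread-local" or "updated".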

A word about scopes
Most languages use lexical scope: The scope of a name is a certain area on the page
    Example: In Java, the scope of a variable in a method starts at the point it is declared and ends at the nearest closing brace
    Example: In Clojure, the scope of a parameter is the function body, and the scope of a variable defined by let is within the parentheses enclosing the let
With dynamic scope, the scope of a variable is its lexical scope plus all code executed within that scope
    In other words, it is available to all called methods and functions
    This is unlike using a global variable, as the same name may refer to different memory locations depending on where the function is called from
Dynamic scope is not standard in any modern language
Dynamic scope is available in Clojure for two purposes:
    Where it is necessary for reasons of efficiency
    For settings such as *out* (the output stream)

Watches
A watch is a function that is called whenever a state changes
A watch can be set on any identity (atoms, refs, agents, and vars)
    For vars, the watch is only called when the root binding changes
To add a watch: (add-watch identity key watch-function)
    The key may be any value that is different from the key for any other watcher on this same identity
To define a watch function: (defn function-name [key identity old-val new-val] expressions)
    When the watch function is called, it is given the old and new values of the identity of the update that caused the change
    Other updates may have happened in the meantime
To remove a watch function: (remove-watch identity key)

Automatic parallelism
The function pmap is just like map, except that the function is applied to the sequence values in parallel
    The number of threads used depends on the number of CPUs on the system
    pmap is “partially lazy”--it may run a bit ahead
pvalues takes any number of expressions, and returns a lazy sequence of their values
pcalls takes any number of zero-argument functions, and returns a lazy sequence of
their values

Futures and promises
A future is a computation, running in a single thread, whose value will be required sometime in the future
    Syntax: (future expressions)
    Typical use: (def future-name (future expressions))
You can check if a future has completed with (future-done? future-name)
You can get the value computed by a future with deref or @
    This will block until the future is completed
A promise is a result that may not yet exist
    Threads may ask for the result, and block until it exists
To create a promise, use (def promise-name (promise))
To deliver a value to a promise, use (deliver promise-name value)
The value can be retrieved with deref or @

An aside: memoization
Memoization is keeping track of previously computed values of a function, in case they are needed again
    Example: The Collatz function of 6 gives: 6 3 10 5 16 8 4 2 1
    The Collatz function of 7 gives: 7 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1
    Example 2: The factorial of 100 is trivial to compute if you already know the factorial of 99
In Clojure, functions can be memoized: (def faster-collatz (memoize collatz))
    This will keep track of all previously computed values of collatz
    If this requires too much memory, you can write a more sophisticated method to keep track of only some values
For obvious reasons, only pure functions can be memoized

Summary: When to use what
Refs
    Synchronous, coordinated updates, using STM
Atoms
    Synchronous, independent updates
    Especially useful for memoized values
Agents
    Introduce asynchronous concurrency
Vars
    When you absolutely must have mutable state
    Remember scope is dynamic, not lexical
Validator functions
    To maintain data integrity
Watches
    To trigger events when an identity’s value changes

Structs I
A struct is something like an object, more like a map
To define a struct: (defstruct name key ...
key)
    For example: (defstruct book :title :author)
A struct may be created by calling struct with the correct arguments in the correct order
    Example: (def witches (struct book "Witches Abroad" "Pratchett"))
A struct may be created by calling struct-map with key-value pairs in any order
    Example: (def witches (struct-map book :author "Pratchett" :title "Witches Abroad"))
When creating a struct with struct-map,
    It is not necessary to supply a value for every key
    Additional keys and values may be included

Structs II
Structs are maps, and may be accessed like maps
    user=> (witches :title)
    "Witches Abroad"
    user=> (:author witches)
    "Pratchett"
    user=> (get witches :date "unknown")
    "unknown"
Maps are, of course, immutable
assoc will return a new map based on an existing map, with new or replaced key-value pairs
    Syntax: (assoc map key value ... key value)
dissoc will return a new map with key-value pairs removed
    Syntax: (dissoc map key ... key)
(contains? map key) will test if the key occurs in the map

Records
A record definition is something like a Java class
To define a record type: (defrecord TypeName [fieldnames*])
    Example: (defrecord Tree [value left right])
    Don't use keywords as field names
It acts as a map but, because the “fields” are predefined, it can be more efficient
To create an instance: (TypeName. args*)
    Example: (def animals (Tree. "Water animal?" "frog" "horse"))
To refer to a field, use the keyword version of the fieldname as a function: for example, (:left animals)

The End
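
As a supplementary sketch for the Structs II and Records slides, the following shows that assoc and dissoc return new values and leave the originals untouched. It reuses the Tree record and the witches data from the slides, but stores witches as a plain map rather than a struct to keep the example short. The names updated, dated, and untitled, and the :date value, are illustrative assumptions.

```clojure
;; Record type and instance taken from the Records slide.
(defrecord Tree [value left right])

(def animals (Tree. "Water animal?" "frog" "horse"))

;; assoc on an existing field returns a new Tree; animals is unchanged.
(def updated (assoc animals :left "fish"))

;; Plain-map stand-in for the witches struct (an assumption of this sketch).
(def witches {:title "Witches Abroad" :author "Pratchett"})

;; Adding and removing keys produces new maps each time.
(def dated    (assoc witches :date 1991))
(def untitled (dissoc dated :title))
```

Here (:left animals) still yields "frog" after the assoc, and witches still has only its two original keys, which is what "Maps are, of course, immutable" means in practice.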