Clojure 4 Concurrency 26-Jul-16

Concurrency

Clojure supports concurrency, and most values are immutable
Clojure has agents, which are similar to Erlang’s actors
Clojure also has a form of shared state
Concurrency and shared state don’t play well together
The Java approach, using locks (synchronization), is very error-prone
Clojure uses a new approach, Software Transactional Memory (STM), to manage concurrency
This is similar to a database transaction
Atomicity

An action is atomic if it happens “all at once” from the point of view of other threads
That is, the action either has happened or it hasn’t; it’s not in the middle of happening
Few actions are actually atomic
For example, x++ consists of three parts: get the value of x, add 1 to the value, put the new value back in x
If another thread accesses x during the operation, the results are unpredictable
If a spacecraft moves from point (x, y, z) to point (x', y', z'), these values must be updated “simultaneously”
The spacecraft is never at point (x', y, z)
In Java, the approach is to “lock” the variables being updated, thus denying access to other threads
It is extremely difficult, with this model, to write correct programs
Efficiency isn’t all that easy, either
Refs and STM

A ref is a mutable reference to an immutable value
That is, the data remains immutable, but you can change what data the ref refers to
Basic syntax: (ref initial-state)
Typical use: (def ref-variable (ref initial-state))
To find the value, you have to dereference the variable
Syntax: (deref ref-variable)
Syntactic sugar: @ref-variable
A reference variable is like an “ordinary” variable in an “ordinary” language: its value can be changed
However, there are restrictions on where it can be changed
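A minimal sketch of creating and dereferencing a ref. The name `colors` is only an illustration; the point is that the vector inside the ref stays immutable while the ref can later be pointed at a different value.

```clojure
;; A ref holding an immutable vector (colors is an illustrative name)
(def colors (ref [:red :green]))

;; Both forms read the ref's current value
(deref colors) ; => [:red :green]
@colors        ; => [:red :green]

;; conj builds a new vector; the ref still points at the old one
(conj @colors :blue) ; => [:red :green :blue]
@colors              ; => [:red :green]
```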
Updating a reference variable

Reference variables can only be updated in a transaction
Basic syntax: (dosync expression ... expression)
Typical use: (dosync (ref-set ref-variable new-value))
ref-set is basically an “assignment” to a reference variable
A transaction is:
Atomic -- From outside the transaction, the transaction appears instantaneous: it has either happened, or it hasn’t
Consistent -- With some additional syntax, a reference can specify validation conditions
If the conditions aren’t met, the transaction fails
Isolated -- Transactions cannot see partial results from other transactions
A transaction is not:
Durable -- Results are not saved for a future run of the program
Databases are ACID: Atomic, Consistent, Isolated, and Durable
Software transactions are only “ACI”
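A small sketch of the rule that refs change only inside a transaction. The names `counter` and `outside-attempt` are illustrative; the `try` simply records whether the out-of-transaction `ref-set` was refused.

```clojure
(def counter (ref 0))

;; Inside a transaction, ref-set replaces the value the ref points to
(dosync (ref-set counter 10))

;; Outside a transaction, ref-set throws IllegalStateException
(def outside-attempt
  (try
    (ref-set counter 20)
    :changed
    (catch IllegalStateException _
      :refused)))

@counter        ; => 10
outside-attempt ; => :refused
```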
alter

The alter function provides a somewhat more readable way to update a reference variable
Syntax: (alter ref-variable function args-to-function)
Typical usage: (dosync (alter ref-variable function args))
The return value of alter is the new value of ref-variable
alter sets the ref to (function current-value args-to-function), so when you write an updating function, it should have the thing being updated as the first argument
(defn function [thing-to-update other-arguments] function-body)
When you cons something to a list, the list is the second parameter, which is not what you need
There is an additional function, conj, which is like cons with the arguments reversed
For lists, (conj list item) gives the same sequence as (cons item list); for vectors, conj adds the item at the end instead
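A sketch of alter with conj, plus a hypothetical updating function `add-item` written with the thing being updated as its first argument. The ref name `shopping-list` is illustrative.

```clojure
(def shopping-list (ref '()))

;; alter calls (conj current-list item), so conj's argument order fits
(dosync (alter shopping-list conj "milk"))
(dosync (alter shopping-list conj "bread"))
@shopping-list ; => ("bread" "milk")

;; Hypothetical updating function: the value being updated comes first
(defn add-item [items item]
  (conj items item))

;; alter returns the new value, and dosync returns that in turn
(dosync (alter shopping-list add-item "eggs")) ; => ("eggs" "bread" "milk")

;; For comparison, conj on a vector adds at the end
(conj [1 2] 3) ; => [1 2 3]
```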
How STM works

A transaction takes a private copy of any reference it needs
Since data structures are persistent, this is not expensive
The transaction works with this private copy
If the transaction completes its work, and the original reference has not been changed (by some other transaction), then the new values are copied back atomically to the original reference
But if, during the transaction, the original data is changed, the transaction will automatically be retried with the changed data
However, if the transaction throws an exception, it will abort without a retry
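A sketch of the “abort without a retry” case, assuming Clojure 1.4 or later (for `ex-info`). The `alter` inside the transaction runs, but because the body throws, the change is discarded and the ref keeps its old value.

```clojure
(def balance (ref 100))

;; The alter runs, but the exception aborts the whole transaction
(def outcome
  (try
    (dosync
      (alter balance - 30)
      (throw (ex-info "Abort the transfer" {:reason :demo})))
    (catch clojure.lang.ExceptionInfo _
      :aborted)))

outcome  ; => :aborted
@balance ; => 100, the in-transaction change was discarded
```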

Side effects

The function passed to alter must:
Be completely free of side effects
Return a purely functional transformation of the ref value
This is because alter can only be done inside a transaction, and a transaction may be retried multiple times
Therefore: if the transaction has side effects (such as I/O or updating other state), they may happen many times
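One safeguard Clojure offers here is the `io!` macro, which throws if its body is evaluated inside a transaction. A minimal sketch; `log-message` is a hypothetical helper.

```clojure
;; io! marks code that must not run inside a transaction
(defn log-message [msg]
  (io! (println msg)))

(log-message "outside a transaction") ; prints normally

(def attempt
  (try
    (dosync (log-message "inside a transaction"))
    :ran
    (catch IllegalStateException _
      :refused)))

attempt ; => :refused
```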
Pessimistic locking

Java’s approach is pessimistic--access to shared state is always locked (if correctly done!)
Locking is expensive
In most scenarios, contention happens only occasionally
Therefore, the expense of locking is usually unnecessary
But…
Locking must be done anyway because another thread might try to access the shared state
Optimistic evaluation

Clojure’s approach is optimistic--it assumes contention might happen, but probably won’t happen
Transactions begin immediately, without locking
Because data is persistent (immutable), making a private copy is much less expensive than locking mutable data
Clojure’s STM cannot deadlock: a transaction either eventually commits or, after reaching a retry limit, fails with an exception
When the transaction completes, the results are copied back atomically
Therefore, locking never happens unnecessarily
In a high concurrency, high contention scenario, a transaction could be tried many times before it finally succeeds or aborts
In this situation, locking may be a more efficient approach
Everybody’s first example

(def account1 (ref 1000))
(def account2 (ref 1500))

(defn transfer
  "Transfers amount of money from a to b"
  [a b amount]
  (dosync
    (alter a - amount)
    (alter b + amount)))

(transfer account1 account2 300)
(transfer account2 account1 50)

Practical Clojure, Luke VanderHart and Stuart Sierra, p. 101
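To see the coordination at work, here is a sketch that runs many transfers concurrently on futures. The counts (100 transfers each way of 3 and 2) are arbitrary choices for illustration. Because each transfer updates both refs in one transaction, the total stays at 2500 no matter how the threads interleave.

```clojure
(def account1 (ref 1000))
(def account2 (ref 1500))

(defn transfer
  "Transfers amount of money from a to b"
  [a b amount]
  (dosync
    (alter a - amount)
    (alter b + amount)))

;; 100 transfers in each direction, all running concurrently
(def workers
  (doall
    (concat
      (for [_ (range 100)] (future (transfer account1 account2 3)))
      (for [_ (range 100)] (future (transfer account2 account1 2))))))

;; Wait for every future to finish
(doseq [w workers] @w)

[@account1 @account2]   ; => [900 1600]
(+ @account1 @account2) ; => 2500, money is never created or lost
```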
Atoms





Refs are used to coordinate multiple changes, which must happen all at once or not at all
Atoms are like refs, but they manage a single, independent value, and updating them does not require a transaction
Syntax: (atom initial-value)
Typical usage: (def atom-name (atom initial-value))
Atoms are accessed just like refs: (deref atom-name) or @atom-name
(reset! atom-name new-value) will change the value of an atom, and return the new value
(swap! atom-name function additional-arguments) calls the function with the current value of the atom and any additional arguments, sets the atom to the result, and returns the new value
Like alter, swap! may be retried multiple times, so the function passed to it should have no side effects
Atoms are less powerful than refs, but also less expensive
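A short sketch of the atom operations above; the atom name `hits` is illustrative. Note that no `dosync` is needed.

```clojure
(def hits (atom 0))

(swap! hits inc)  ; => 1, computed as (inc 0)
(swap! hits + 10) ; => 11, computed as (+ 1 10)
(reset! hits 100) ; => 100
@hits             ; => 100
```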
Validation and metadata


The keyword :validator introduces a validation function, while the keyword :meta introduces a map of metadata
Syntax:
(ref initial-value :validator validator-fn :meta metadata-map)
(atom initial-value :validator validator-fn :meta metadata-map)
The validator function is called with the proposed new value before the change takes effect
If the validator function returns a false value (false or nil), the ref or atom throws an IllegalStateException, and the change (or transaction) doesn’t happen
Metadata is data about the data, for example, the source of the data, or whether it is serializable
Metadata is not considered in comparisons such as equality testing
There are functions for working with metadata
Metadata is outside the scope of this lecture
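A sketch of an atom with both a validator and metadata. The temperature range and the `{:units "celsius"}` map are illustrative assumptions, not anything special to Clojure.

```clojure
;; An atom that only accepts values between -50 and 60
(def temperature
  (atom 20
        :validator #(<= -50 % 60)
        :meta {:units "celsius"}))

(swap! temperature + 5) ; => 25

(def attempt
  (try
    (reset! temperature 1000)
    (catch IllegalStateException _
      :rejected)))

attempt            ; => :rejected
@temperature       ; => 25, the invalid value never took effect
(meta temperature) ; => {:units "celsius"}
```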
Agents


An agent is like an Erlang actor that holds (or is?) a value
An agent is created with an initial value:
Syntax: (agent initial-state)
Typical use: (def agent-name (agent initial-state))
You can send an agent a function to update its state:
(send agent-name update-function arguments)
The update function is called, asynchronously, with the agent’s current state followed by the arguments, and its result becomes the new state
The value of the send is the agent itself, not the value of the function (except, possibly, in the REPL)
You can check the current state of an agent with deref or @
You can wait for agents to complete:
(await agent-name-1 ... agent-name-N)
This is a blocking function, and it could block forever
(await-for timeout-millis agent-name-1 ... agent-name-N)
This is a blocking function; it returns logical false if it times out, otherwise logical true
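A minimal sketch of sending actions to an agent and waiting for them. Actions sent from one thread to the same agent run in the order they were sent, so the final state is predictable here.

```clojure
(def counter (agent 0))

;; Each action is called as (f current-state args...)
(send counter + 5)
(send counter inc)

;; Block until the actions sent so far (from this thread) have run
(await counter)
@counter ; => 6

;; With nothing pending, await-for returns logical true
(await-for 1000 counter)
```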
More about agents

An agent does not have a thread of its own: when you send it an action, the action runs concurrently on a thread from a Clojure-managed thread pool
I don’t think there is a way to stop an individual agent
send-off has the same syntax and semantics as send, but runs actions on a separate, growable thread pool, so it is better suited to actions that may block (such as I/O)
If you use a send (or send-off) within a transaction, it is held until the transaction completes
This prevents the send from occurring multiple times if the transaction is retried
Clojure will not terminate cleanly if there are still agents running
The function (shutdown-agents) tells all agents to finish up their current tasks, refuse to accept any more tasks, and stop running
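A sketch using send-off for an action that does I/O, followed by shutdown-agents so the program can exit promptly. The agent `log-agent` and function `append-line` are hypothetical names for illustration.

```clojure
(def log-agent (agent []))

;; An action that performs I/O, so send-off is the better fit
(defn append-line [lines line]
  (println "logging:" line)
  (conj lines line))

(send-off log-agent append-line "started")
(send-off log-agent append-line "finished")
(await log-agent)
@log-agent ; => ["started" "finished"]

;; Let the JVM exit cleanly once the agent threads are done
(shutdown-agents)
```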
Agents and errors

Agents can have validating functions and metadata
(agent initial-state :validator validator-fn :meta metadata-map)
Example: (def counter (agent 0 :validator number?))
If you send bad data to an agent,
The send returns immediately, without error
The error occurs later, when the action runs, and the agent becomes failed
In current Clojure versions, deref still returns the agent’s last valid state, but any further send to the failed agent throws an exception
(agent-error agent-name) returns the exception that made the agent fail (the older agent-errors, which returned a sequence of errors, is deprecated)
(restart-agent agent-name new-state) will make the agent usable again (the older clear-agent-errors is deprecated)
(set-validator! agent-name validator-fn) adds a validator to an existing agent
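A sketch of an agent failing validation and being restarted, assuming Clojure 1.2 or later (for `agent-error` and `restart-agent`). Since `await` itself sends to the agent, which a failed agent refuses, the sketch polls `agent-error` instead; the polling loop is just one illustrative way to wait.

```clojure
(def counter (agent 0 :validator number?))

;; The send itself returns immediately, without error
(send counter (fn [_] "not a number"))

;; Wait (up to about a second) for the failing action to run
(loop [tries 0]
  (when (and (nil? (agent-error counter)) (< tries 100))
    (Thread/sleep 10)
    (recur (inc tries))))

(def failure (agent-error counter)) ; the IllegalStateException from the validator
@counter                            ; => 0, the last valid state

;; restart-agent clears the failure so the agent accepts sends again
(restart-agent counter 0)
(send counter inc)
(await counter)
@counter ; => 1
```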
Calling Java



Clojure has good support for calling Java
Our interest here is using Java Threads as an alternative to agents
The following syntax (notice the dots) creates a new Thread, passes it a function to execute, and starts it:
user=> (def foo 10)
user=> (.start (Thread. (fn [] (println foo))))
def and defn create vars with an initial value, or root binding
The root binding is available to all Threads (which is why the above works)
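A variation on the example above that waits for the thread with `.join` and records what it saw in an atom, instead of printing. The atom `seen` and the var `t` are illustrative names.

```clojure
(def foo 10)

;; A Clojure fn implements Runnable, so it can be handed to a Thread
(def seen (atom nil))
(def t (Thread. (fn [] (reset! seen foo))))

(.start t) ; run the function on the new thread
(.join t)  ; wait for the thread to finish

@seen ; => 10, the new thread saw foo's root binding
```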
vars

def and defn create vars with a root binding
Usually, this is the only binding
You can use the binding macro to create thread-local bindings
Syntax: (binding [var value ... var value] expression ... expression)
The value of the binding is the value of the last expression
Bindings can only be used on vars that have been defined by def at the top level and (since Clojure 1.3) declared dynamic, as in (def ^:dynamic *var-name* value)
The bindings have dynamic scope--within the expressions and all code called from those expressions
This differs from let, which has lexical scope
Thread-local bindings can be updated with set!
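A sketch contrasting dynamic scope (binding) with lexical scope (let), assuming Clojure 1.3 or later. The var `*verbose*` and function `report` are illustrative names.

```clojure
;; Since Clojure 1.3, a var must be marked ^:dynamic to be rebound
(def ^:dynamic *verbose* false)

(defn report []
  (if *verbose* "verbose" "quiet"))

(report) ; => "quiet"

;; Dynamic scope: report is called from inside binding, so it sees true
(binding [*verbose* true]
  (report)) ; => "verbose"

;; Lexical scope: let only shadows the name locally; report still sees the var
(let [*verbose* true]
  (report)) ; => "quiet"

;; Inside binding, set! changes the thread-local value
(binding [*verbose* true]
  (set! *verbose* false)
  (report)) ; => "quiet"
```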
Watches


A watch is a function that is called whenever a state changes
A watch can be set on any identity (atoms, refs, agents, and vars)
For vars, the watch is only called when the root binding changes
To add a watch:
(add-watch identity key watch-function)
The key may be any value that is different from the key for any other watcher on this same identity
To define a watch function:
(defn function-name [key identity old-val new-val] expressions)
When the watch function is called, it is given the key, the identity, and the old and new values from the update that caused the change
Other updates may have happened in the meantime, so use old-val and new-val rather than dereferencing the identity
To remove a watch function:
(remove-watch identity key)
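A sketch of adding and removing a watch on an atom. The names `temperature`, `changes`, `record-change`, and the key `:recorder` are illustrative; for atoms, the watch function runs on the thread that made the change.

```clojure
(def temperature (atom 20))
(def changes (atom []))

;; Watch function: records each [old new] pair
(defn record-change [key identity old-val new-val]
  (swap! changes conj [old-val new-val]))

(add-watch temperature :recorder record-change)
(reset! temperature 25)
(swap! temperature inc)
@changes ; => [[20 25] [25 26]]

;; After removal, further changes are not recorded
(remove-watch temperature :recorder)
(reset! temperature 0)
@changes ; => [[20 25] [25 26]]
```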
Automatic parallelism

The function pmap is just like map, except that the function is applied to the sequence values in parallel
The number of threads used depends on the number of CPUs on the system
pmap is “partially lazy”--it may run a bit ahead
pvalues takes any number of expressions, evaluates them in parallel, and returns a lazy sequence of their values
pcalls takes any number of zero-argument functions, calls them in parallel, and returns a lazy sequence of their values
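A sketch of pmap, pvalues, and pcalls. `slow-square` is a hypothetical function with an artificial delay, so that running in parallel is worth the coordination overhead; results come back in the original order.

```clojure
;; A deliberately slow function, so parallelism pays off
(defn slow-square [n]
  (Thread/sleep 100)
  (* n n))

;; Same result as map, but the calls run in parallel
(pmap slow-square [1 2 3 4]) ; => (1 4 9 16)

;; pvalues evaluates each expression in parallel
(pvalues (+ 1 2) (* 3 4)) ; => (3 12)

;; pcalls calls each zero-argument function in parallel
(pcalls (fn [] :a) (fn [] :b)) ; => (:a :b)
```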
Futures and promises

A future is a computation, running in a separate thread, whose value will be required sometime in the future
Syntax: (future expressions)
Typical use: (def future-name (future expressions))
You can check if a future has completed with (future-done? future-name)
You can get the value computed by a future with deref or @
This will block until the future is completed
A promise is a result that may not yet exist
Threads may ask for the result, and block until it exists
To create a promise, use (def promise-name (promise))
To deliver a value to a promise, use (deliver promise-name value)
The value can be retrieved with deref or @
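A sketch of a future and a promise. The 100 ms sleep only makes the future take noticeable time; the names `answer` and `result` are illustrative.

```clojure
;; A future runs its body on another thread
(def answer (future (Thread/sleep 100) 42))

(future-done? answer) ; likely false this soon after creation
@answer               ; blocks until done, then => 42
(future-done? answer) ; => true

;; A promise starts empty; another thread delivers the value later
(def result (promise))
(future (deliver result :ready))
@result ; blocks until delivered, then => :ready
```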
An aside: memoization

Memoization is keeping track of previously computed values of a function, in case they are needed again
Example:
The Collatz function of 6 gives: 6 3 10 5 16 8 4 2 1
The Collatz function of 7 gives: 7 22 11 34 17 52 26 13 40 20 10 5 16 8 4 2 1
Once the sequence for 7 reaches 10, the rest is the same as for 6, which has already been computed
Example 2: The factorial of 100 is trivial to compute if you already know the factorial of 99
In Clojure, functions can be memoized:
(def faster-collatz (memoize collatz))
This will keep track of all previously computed values of collatz (for the arguments passed to faster-collatz)
If this requires too much memory, you can write a more sophisticated method to keep track of only some values
Only pure functions should be memoized, since a memoized function is not actually called again for arguments it has already seen
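memoize only caches calls that go through the memoized function itself, so to reuse the work for 10 while computing 7, the recursive call must also go through the cache. A sketch of that idiom; `collatz-step` is a hypothetical helper, and this `collatz` returns the whole sequence as a vector.

```clojure
;; One step of the Collatz rule
(defn collatz-step [n]
  (if (even? n) (quot n 2) (inc (* 3 n))))

;; The recursive call refers to the memoized var, so every
;; suffix of a sequence is cached once computed
(def collatz
  (memoize
    (fn [n]
      (if (= n 1)
        [1]
        (into [n] (collatz (collatz-step n)))))))

(collatz 6) ; => [6 3 10 5 16 8 4 2 1]
(collatz 7) ; reuses the cached result for 10
```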
Summary: When to use what

Refs: Synchronous, coordinated updates, using STM
Atoms: Synchronous, independent updates; especially useful for memoized values
Agents: Introduce asynchronous concurrency
Vars: When you absolutely must have mutable state; remember scope is dynamic, not lexical
Validator functions: To maintain data integrity
Watches: To trigger events when an identity’s value changes
Structs I


A struct is something like an object, but more like a map
To define a struct: (defstruct name key ... key)
For example: (defstruct book :title :author)
A struct may be created by calling struct with the correct arguments in the correct order
Example:
(def witches (struct book "Witches Abroad" "Pratchett"))
A struct may be created by calling struct-map with key-value pairs in any order
Example:
(def witches (struct-map book :author "Pratchett" :title "Witches Abroad"))
When creating a struct with struct-map,
It is not necessary to supply a value for every key (missing keys get the value nil)
Additional keys and values may be included
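A sketch of both ways of creating a struct. The second book, `mort`, and its `:series` key are illustrative; they show a missing key becoming nil and an extra key being accepted.

```clojure
(defstruct book :title :author)

;; struct: values in the order the keys were defined
(def witches (struct book "Witches Abroad" "Pratchett"))

;; struct-map: pairs in any order; missing keys become nil,
;; and extra keys are allowed
(def mort (struct-map book :title "Mort" :series "Discworld"))

(:author witches) ; => "Pratchett"
(:author mort)    ; => nil
(:series mort)    ; => "Discworld"
```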
Structs II

Structs are maps, and may be accessed like maps
user=> (witches :title)
"Witches Abroad"
user=> (:author witches)
"Pratchett"
user=> (get witches :date "unknown")
"unknown"
Maps are, of course, immutable
assoc will return a new map based on an existing map, with new or replaced key-value pairs
Syntax: (assoc map key value ... key value)
dissoc will return a new map with key-value pairs removed
Syntax: (dissoc map key ... key)
For a struct, dissoc can only remove keys that were not defined by defstruct; trying to remove a defined key throws an exception
(contains? map key) will test if the key occurs in the map
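A sketch of assoc, dissoc, and contains? on a struct. The added `:series` key is illustrative; the last form shows that a key defined by defstruct cannot be removed.

```clojure
(defstruct book :title :author)
(def witches (struct book "Witches Abroad" "Pratchett"))

;; assoc returns a new map; witches itself is unchanged
(def tagged (assoc witches :series "Discworld"))
(contains? witches :series) ; => false
(contains? tagged :series)  ; => true

;; dissoc can remove keys that were added later...
(contains? (dissoc tagged :series) :series) ; => false

;; ...but not the keys defined by defstruct
(try
  (dissoc witches :title)
  (catch RuntimeException _
    :cannot-remove-struct-key)) ; => :cannot-remove-struct-key
```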
The End