D4.1 - HIERATIC

HIERATIC
Hierarchical Analysis of Complex Dynamical Systems

Deliverable: D4.1 (revised version)
Title: Multi-scale simulation library featuring spatial compartmentalisation and fast/slow dynamics.
Authors: Jan Huwald, JENA
Date: 6 January 2014

WP4 Deliverable: Multi-scale simulation library
Jan Huwald
January 6, 2014
Contents
1 Extension to the MASON simulator
1.1 Temporal hierarchy of agents
1.2 Spatial hierarchy of compartments
2 Discretization of continuous particle systems
2.1 Envisioned approach
2.2 Enumeration of structures of MD-Simulation
2.3 Griephs
3 Efficient discretization of particle trajectories
4 Publications
1 Extension to the MASON simulator
An extension to the MASON simulator has been developed that supports
embedding agents into temporal and spatial hierarchies. The modified MASON
package accompanies this report in the file code/mason.tar.bz2. The design and
usage of the added library interface are described below.
1.1 Temporal hierarchy of agents
MASON manages the progression of time using a distinct scheduler object
(Schedule) that calls the step function of each agent once it is due. We added
a scheduler class (HierachicalSchedule) that implements a hierarchy of
timers, yielding a hierarchy of slow-fast dynamics: every timer has one subordinate timer. For every tick of the superior timer, the subordinate timer
executes many ticks.
[Figure: timer hierarchy with levels A, B and C; horizontal axis: time.]
The number of micro-ticks per macro-tick depends on a user-specified
mode and on the behavior of the fast agents (the ones running on the subordinate timer). Three modes have been implemented:
Constant: for every macro-tick, a user-specified constant number of micro-ticks occurs.
EquiOne: a macro-tick happens after a micro-tick once at least one agent
stepped during that micro-tick has signaled that it reached its equilibrium
state.
EquiAll: a macro-tick happens after a micro-tick once all agents stepped
during that micro-tick have signaled that they reached their equilibrium state.
Whether an agent reached its equilibrium state is determined by the
agent itself.
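The three modes can be illustrated with a small, self-contained Python sketch. It is not the MASON extension itself: the `Agent` class, the `settle` parameter and the function `micro_ticks_per_macro_tick` are hypothetical names chosen for illustration, and the sketch only models how many micro-ticks elapse before the next macro-tick.

```python
class Agent:
    """Hypothetical fast agent that reports equilibrium after `settle` steps."""

    def __init__(self, settle):
        self.settle = settle
        self.steps = 0

    def step(self):
        self.steps += 1

    def is_equilibrium(self):
        return self.steps >= self.settle


def micro_ticks_per_macro_tick(agents, mode, constant=1, max_ticks=10_000):
    """Run micro-ticks until the chosen mode triggers the next macro-tick.

    Returns the number of micro-ticks executed. `max_ticks` is a safety
    limit added for this sketch only.
    """
    ticks = 0
    while ticks < max_ticks:
        for agent in agents:  # one micro-tick steps every fast agent once
            agent.step()
        ticks += 1
        flags = [agent.is_equilibrium() for agent in agents]
        if mode == "Constant" and ticks >= constant:
            break
        if mode == "EquiOne" and any(flags):
            break
        if mode == "EquiAll" and all(flags):
            break
    return ticks
```

With two agents that settle after 3 and 5 steps, EquiOne ends the macro-tick as soon as the first agent settles, whereas EquiAll waits for the slower one.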
1.1.1 Usage
The hierarchical schedule is initialized with
HierachicalSchedule(EquiRelation[] hierarchy);
which is given an array of the above slow-fast timer relations, sorted from
slowest to fastest. This definition is constant during the run-time of the
scheduler.
Agents can be added to and removed from the scheduler by calling
void scheduleHierarchy(final EquiSteppable agent, int level);
void unscheduleHierarchy(final EquiSteppable agent, int level);
where level refers to the level in the slow-fast hierarchy given to the constructor. Level 0 denotes the slowest, top-most level. The maximal level is
hierarchy.length + 1.
The hierarchical scheduler extends the non-hierarchical one. Agents embedded in the time hierarchy are compatible with those added using calls
inherited from Schedule. During one time step of Schedule the lowest
(fastest) level of the timer hierarchy is stepped once.
So far, an agent is implemented in MASON via the Steppable interface.
To allow testing for the equilibrium condition we extend this interface to
EquiSteppable, adding the method
public boolean isEquillibrium(SimState state);
that returns whether an agent reached an equilibrium state. The hierarchical
scheduler expects all agents to implement this interface. If the equilibrium condition is irrelevant for an agent, the agent can be implemented
as a subclass of the abstract class DefaultEquiSteppable, which always reports
being in equilibrium.
1.2 Spatial hierarchy of compartments
In MASON the concept of space is implemented by registering an agent
with one or more field objects. Existing fields are regular grids, continuous
spaces and graphs. We added the CompartmentField to represent space as
a hierarchy of compartments.
A compartmentalized space is defined by a dimension $d \geq 1$ and a base $b \geq 2$:
the d-dimensional Euclidean unit space is equipartitioned into a grid of $b^d$ cubic
sub-spaces. This division is applied to each subspace recursively. The figure
below shows the first three recursion steps for the compartment space defined
by d = 2, b = 2:
[Figure: recursive subdivision of the unit square for d = 2, b = 2; axes x and y, shown for increasing level of detail.]
A qualified position in such a compartment space is given by a level $l$
and a d-tuple of strings in $(\Sigma^l)^d$ over the alphabet $\Sigma = \{0, \dots, b-1\}$. Each string
describes, for one dimension, which compartment to choose at each step of a descent
from level 0 to level $l$.
To decide whether two points of levels $l_1$ and $l_2$ have the same location, only the first $\min(l_1, l_2)$ characters of their position strings are compared.
Thus, in contrast to all other fields, the spatial identity relation $\approx_c$ in a
CompartmentField is non-transitive: for three points, $p_1 \approx_c p_2 \approx_c p_3 \not\approx_c p_1$
may be true.
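The prefix comparison and its non-transitivity can be checked with a short Python sketch. Positions are modelled here as tuples of digit strings, one string per dimension; this representation and the function name `same_location` are assumptions made for illustration and are not the interface of CompartmentField.

```python
def same_location(p, q):
    """Compare two positions on the first min(l1, l2) characters per dimension.

    p and q are tuples of d strings over the digits 0..b-1; the level of a
    position is the length of its strings.
    """
    level = min(len(p[0]), len(q[0]))
    return all(a[:level] == b[:level] for a, b in zip(p, q))


# Three positions for d = 2, b = 2:
p1 = ("11", "01")  # level 2
p2 = ("1", "0")    # level 1, the parent compartment of p1 and of p3
p3 = ("10", "00")  # level 2, a sibling region of p1

# p1 and p2 coincide, p2 and p3 coincide, but p3 and p1 do not.
non_transitive = (same_location(p1, p2)
                  and same_location(p2, p3)
                  and not same_location(p3, p1))
```

A coarse position thus coincides with every finer position inside it, while two fine positions inside the same coarse compartment may still differ.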
In the image above, the highlighted compartment in level 2 has the position (11, 01). All cells which belong to this location (for which $\approx_c$ holds)
are highlighted.
1.2.1 Usage
A field is initialized with dimension and base:
CompartmentField(int d, int b)
To insert an object into the field or update the position of one already
inserted, the method setObjectLocation is used. Its second parameter is
the destination position:
void setObjectLocation(Object obj, CompartmentPosition pos);
The CompartmentPosition is an immutable object storing an element of $(\Sigma^l)^d$. Operating
directly on the string representation of the position is discouraged, however.
Instead three methods of motion are offered:
CompartmentPosition up();
CompartmentPosition down(int[] subDir);
CompartmentPosition side(int dir, int off);
The effects of these methods follow from the figure above.
Moving downwards toward more spatial detail requires specifying into which
subspace of the current position to descend. Moving sideways requires the
dimension in which to move and the distance (and sign) of the movement.
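The following Python sketch mimics the three movement methods on an immutable position object. It is an illustration under assumptions, not the Java class: the class name `Position`, the digit-tuple representation, the decision to let sideways moves cross parent-compartment boundaries, and the error raised when a move would leave the unit space are all choices made for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Position:
    """Immutable position: one tuple of base-b digits per dimension."""
    base: int
    digits: Tuple[Tuple[int, ...], ...]

    def up(self):
        # Drop the last digit in every dimension (the root stays the root).
        return Position(self.base, tuple(d[:-1] for d in self.digits))

    def down(self, sub_dir):
        # Append one digit per dimension to select the sub-compartment.
        if any(not 0 <= s < self.base for s in sub_dir):
            raise ValueError("sub-compartment index out of range")
        return Position(self.base,
                        tuple(d + (s,) for d, s in zip(self.digits, sub_dir)))

    def side(self, dim, off):
        # Read the digits of one dimension as a base-b number and add `off`.
        level = len(self.digits[dim])
        value = 0
        for digit in self.digits[dim]:
            value = value * self.base + digit
        value += off
        if not 0 <= value < self.base ** level:
            raise ValueError("movement would leave the unit space")
        moved = []
        for _ in range(level):
            value, digit = divmod(value, self.base)
            moved.append(digit)
        new_digits = list(self.digits)
        new_digits[dim] = tuple(reversed(moved))
        return Position(self.base, tuple(new_digits))


root = Position(2, ((), ()))             # d = 2, b = 2, level 0
p = root.down([1, 0]).down([1, 1])       # the position (11, 01) from the text
```

Because every method returns a new object, positions can be shared freely between agents without defensive copying.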
Under standard compartment assumptions, an individual agent would
only use the side movement routine: it would move from one neighboring compartment to another, but not between levels of abstraction. The
methods up and down would be used by the programmer to specify relations
between agents that are not in the same level of abstraction, typically only
during the initialization of the simulation, to specify all components and
their places.
To start navigating using these functions the root position is acquired
from the compartment field via
CompartmentPosition getRoot();
This root is equivalent to the undivided unit space or a global variable scope.
To determine its environment, an agent uses the getLocals method of
a CompartmentField. It returns all objects of the field that are of identical
position with respect to the $\approx_c$ relation, that is, all agents that can be reached
by going up or down but not sideways in the compartment hierarchy.
2 Discretization of continuous particle systems
Particle-based models suffer from exorbitant computational demands once
particle count or the simulated time frame reaches biologically relevant scales.
Yet in those domains the precise wiggling and jiggling is irrelevant and only
abstract long-term behavior is sought. The current approach to overcome
those limits is to manually design coarser models with dynamics that are
arbitrary rather than based on physical first principles.
We worked towards automating this step and relating the coarse-scale
dynamics to (fine grained) grounding dynamics. The aim is to improve simulation efficiency by ignoring (spatial and temporal) small scale information,
but retaining the large scale behavior induced by those dynamics. Our approach is to discretize the particle system and then apply a hierarchical
coarse graining.
During the first year we developed the approach, did feasibility checks
and went several steps towards a working implementation. The approach
and some of the resulting prototypes are detailed below.
2.1 Envisioned approach
We assume a particle system governed by Newtonian dynamics: n particles
and d dimensions, yielding the phase space $\mathbb{R}^{2nd}$. The dynamics are induced
by pair-wise force terms that depend solely on the types of the particles and
their relative distances.
We assume biology-scale systems to be our primary use case: temperature and pressure are largely constant or irrelevant, whereas the relative
position and proximity of key proteins over large time frames is highly relevant. The systems are governed solely by short-range interaction.
The key to our approach is to identify repeating local configurations in
space and time. But as the system is continuous, identical situations have
probability 0. Thus we discretize the state into classes of similar behavior
before searching for patterns. Both steps are depicted below:
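[Figure: the continuous layer is discretized into the state graph, which is then coarse-grained into the grieph; state graph and grieph form the discrete layer.]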
The state graph is an undirected graph with labeled vertices and edges:
Each particle is a vertex annotated with its particle type (e.g. the chemical
element). Each edge is annotated with the quantized distance of its adjacent vertices in phase space (typically: distance and relative velocity). For
storage efficiency, the edges of the farthest quantization class are stored only
implicitly, effectively implementing a cut-off distance. For a given particle count and dimension, the resulting state graph has a finite number of
configurations.
The only possible event in this system is the change of an edge label,
corresponding to a changing phase space distance between particles. We
conjecture that with probability 1 at most one such change occurs at any
time. This allows us to use a Gillespie-style update scheme:
1. For each edge, the mean time to change and the probability distribution
of the future labels is computed.
2. The next edge to change and the time to this change is computed from
that distribution.
3. The state graph is updated accordingly.
4. The process is repeated indefinitely.
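The update loop can be sketched in Python as follows. The report only states that a Gillespie-style scheme is used, so the exponentially distributed waiting times (rate = 1 / mean time to change, as in Gillespie's direct method) are an assumption of this sketch. The function names and the `stats` callback, which stands in for step 1, are hypothetical.

```python
import random


def next_event(edges, rng):
    """Pick the next edge change.

    `edges` maps an edge to (mean_time_to_change, {future_label: probability}).
    Returns (edge, waiting_time, new_label).
    """
    rates = {edge: 1.0 / mean for edge, (mean, _) in edges.items()}
    total_rate = sum(rates.values())
    waiting_time = rng.expovariate(total_rate)           # time to the next change
    edge = rng.choices(list(rates), weights=list(rates.values()))[0]
    labels = edges[edge][1]
    new_label = rng.choices(list(labels), weights=list(labels.values()))[0]
    return edge, waiting_time, new_label


def run(state, stats, steps, rng):
    """Repeat steps 1-3 `steps` times; `state` maps edges to labels.

    `stats(state)` is a placeholder for step 1: it must return the per-edge
    mean times and label distributions for the current state.
    """
    time = 0.0
    for _ in range(steps):
        edge, waiting_time, new_label = next_event(stats(state), rng)
        time += waiting_time
        state = {**state, edge: new_label}               # step 3: update the graph
    return time, state
```

In the envisioned scheme the statistics of an edge depend on its local environment, so `stats` would have to be re-evaluated for the affected neighborhood after every change.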
To compute the future of an edge $(n_1, n_2)$, the local environment around it
is considered: the subgraph induced by all nodes adjacent to the nodes $n_1$, $n_2$
via non-implicit edges. This subgraph represents a system of inequalities that
constrains the phase space of the continuous system. Samples of this phase
space are generated and simulated according to the grounding dynamics of
the continuous layer.
[Figure: pipeline of sample generation, continuous simulation and discretization.]
The next step is to discover subgraphs in the state graph that repeat
in space or time. This is an algorithmically hard problem: the underlying
subgraph isomorphism problem is NP-hard.
Instead of an exact solution we rely on a (pluggable) heuristic. Via hierarchical clustering the state graph is transformed into a tree: a node is
either a leaf (corresponding to a particle) or consists of two subordinate
nodes, the assignment of connections between them and their environment,
and the transition distribution for internal edges (see the figure below). The
root node implicitly contains the entire simulation state. The heuristic is used to select the nodes to merge during the
clustering.
State update is implemented as a recursive function: the successor state
of a node is computed as either a change in one of its two sub-nodes or a change
of one of the edges between those sub-nodes. The choice is made randomly
according to the transition distributions of the individual elements.
This promises efficiency gains by three-fold application of memoization:
1. Instead of the usual representation of the tree using pointers, hash-consing
(see footnote 1) is used: Nodes are referenced by their hash value. For practical purposes hash values are identical if and only if the nodes are
identical. Thus identical nodes are discovered automatically during
the tree construction. We call the resulting directed acyclic graph a
grieph.
Note that a grieph exactly represents the state graph. The only information loss in the entire scheme occurs during the discretization from
continuous space into the state graph.
2. Memoization of the recursive state update function allows caching the
effect of micro-changes on the macro-structure. This way, behavior on ever
coarser scales can be derived from subordinate levels. By caching it, a
subsequent recursive descent can be omitted and the dynamics can be
evaluated on a macro-level.
3. An augmented analysis, in which desired properties are computed with a
divide-and-conquer strategy along the nodes of the grieph,
is amenable to memoization as well. This allows fast updates of
the properties relevant to the experimenter without iterating over the
whole simulation state.
We built several prototypes to assess the viability of the proposed approach.
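Hash-consing and a memoized divide-and-conquer analysis can be illustrated with a minimal Python sketch. It is deliberately simplified and is not the attached prototype: an inner node stores only one edge label instead of the connection assignment and transition distributions described above, SHA-256 is an arbitrary choice of hash function, and the class and method names are hypothetical.

```python
import hashlib


class Grieph:
    """Toy hash-consed tree: nodes are stored once and referenced by hash."""

    def __init__(self):
        self.nodes = {}        # hash value -> node content
        self._leaf_count = {}  # memo table for count_leaves

    def _intern(self, content):
        # Identical content yields an identical key, so duplicates collapse.
        key = hashlib.sha256(repr(content).encode()).hexdigest()
        self.nodes.setdefault(key, content)
        return key

    def leaf(self, particle_type):
        return self._intern(("leaf", particle_type))

    def merge(self, left, right, edge_label):
        return self._intern(("node", left, right, edge_label))

    def count_leaves(self, key):
        # Memoized analysis: each distinct node is evaluated only once.
        if key not in self._leaf_count:
            node = self.nodes[key]
            if node[0] == "leaf":
                self._leaf_count[key] = 1
            else:
                self._leaf_count[key] = (self.count_leaves(node[1])
                                         + self.count_leaves(node[2]))
        return self._leaf_count[key]


g = Grieph()
o, h = g.leaf("O"), g.leaf("H")
pair_a = g.merge(o, h, 1)
pair_b = g.merge(o, h, 1)       # same structure, hence the same key as pair_a
root = g.merge(pair_a, pair_b, 3)
```

The four particles are represented by only four stored nodes (two leaves, one shared pair and the root), and the leaf count of the shared pair is computed once and reused.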
2.2 Enumeration of structures of MD-Simulation
To be efficient, the proposed approach imposes two preconditions on the
coarse grained system:
1. The update frequency of an edge in the state graph should be significantly lower than that of position changes in the continuous layer.
2. A memoization-induced performance gain requires the recurrence of
configurations of particle neighborhoods.
To check those preconditions we simulated a system of Lennard-Jones
particles using velocity Verlet integration. For each time step and each particle we computed the local state graph, that is, the subgraph of the state graph
induced by all adjacent nodes. From that we computed the rate of change
in the state graph and the distribution, and thus the recurrence frequency, of
different configurations.
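The counting of local configurations can be sketched in a few lines of Python. This is a strongly simplified stand-in for the attached CUDA simulator: the signature of a particle covers only its own edges (type and distance class of each neighbor within the cut-off) and ignores edges among the neighbors, positions are given directly instead of being integrated, and SHA3-256 from the standard library replaces the custom Keccak-derived hash. All names and the class boundaries are assumptions.

```python
import hashlib
import math
from collections import Counter


def distance_class(r, bounds):
    """Index of the first bound exceeding r; len(bounds) is the cut-off class."""
    for index, bound in enumerate(bounds):
        if r < bound:
            return index
    return len(bounds)


def local_signatures(positions, types, bounds):
    """Return one hash per particle describing its quantized neighborhood."""
    cutoff = len(bounds)
    signatures = []
    for i, p in enumerate(positions):
        edges = []
        for j, q in enumerate(positions):
            if i == j:
                continue
            cls = distance_class(math.dist(p, q), bounds)
            if cls < cutoff:                 # farthest class is stored implicitly
                edges.append((types[j], cls))
        description = repr((types[i], sorted(edges)))
        signatures.append(hashlib.sha3_256(description.encode()).hexdigest())
    return signatures


# Two identical pairs and one isolated particle on a line.
positions = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0)]
types = ["A"] * 5
histogram = Counter(local_signatures(positions, types, bounds=[1.5, 3.0]))
```

Storing only the hash keeps the memory needed per observed configuration constant, at the cost of no longer being able to reconstruct the configuration from the histogram.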
Footnote 1: Hash-consing is memoization applied to constructors of data structures.
The simulator used is attached (see code/statecount.tar.bz2). It is
optimized for high throughput: it is implemented in C++, using CUDA
to offload all computations to the massively parallel processor of a graphics
card acquired within the project. To reduce the required amount of storage,
a hash value of the local graph is stored instead of the graph itself. To this
end a custom hash function derived from Keccak has been employed.
2.3 Griephs
Computing a grieph from a given continuous layer state is nontrivial: the
problem is as yet under-specified and exhibits a high degree of freedom in the
choice of algorithms and data structures. To allow rapid algorithm engineering, a prototype has been written in Haskell that is geared towards high
adaptability.
It computes a grieph from a given phase space point using
• a variable merging strategy,
• arbitrary phase spaces and quantizations, and
• a variable payload.
In addition it was used as test-bed to develop grieph traversal and generic
computation over griephs.
Quick iterations of the algorithm design are fostered by three aspects:
1. A lot of context information is encoded in the type system, allowing
the Haskell compiler to prevent most coding mistakes and point to
corner cases.
2. A set of automated tests that confirm desired high-level properties of
the implementation using automatically generated test cases. This includes for example
a test for the idempotence of $g \circ g^{-1}$ (with g being the state graph
to grieph conversion function). It leads to a programming style by
counter-example.
3. A tool to visualize the grieph data structure as displayed below. To
reconstruct a state graph edge from a grieph, a number of nodes have
to be traversed. In the graph, the edge's path through different grieph
nodes is shown in exact resemblance of the node-internal data structures being used. Those graphs can be used to quickly detect places
where grieph aggregation happened erroneously.
[Figure: visualization of a grieph, showing the paths of state graph edges through the grieph nodes.]
The software is attached (see code/grieph.tar.bz2).
3 Efficient discretization of particle trajectories
Compression, that is, efficient representation, requires a mechanized understanding of the subject. It is a first step towards coarse graining the underlying system: in the coarse graining diagram below it is an implementation
of the coarsening function $\pi : X \to Y$. For a full coarse graining it lacks the
coarse update function $g : Y \to Y$.
[Figure: coarse graining diagram relating the fine update $f : X \to X$ to the coarse update $g : Y \to Y$.]
We investigated efficient representations of time-discrete particle trajectories in Euclidean space. Mathematically they are described as points
$x \in \mathbb{R}^{N \cdot D \cdot T}$ (N particles, D dimensions, T time steps) with the additional
constraint that for a fixed particle and dimension ($n, d = \mathrm{const}$), small changes in time constrain position changes to be small as well: $\| x_{n,d,t} - x_{n,d,t+\Delta t} \| \leq c \, \Delta t$.
Such trajectories are generated by spatial agent-based simulations. Especially for molecular dynamics simulations, where numerical stability requires
very small time steps, those trajectories require large amounts of storage.
The de facto standard approach to reduce the storage requirements is to
take simulation snapshots at an arbitrary frequency. Improvements over
this approach would facilitate permanence and exchange of simulation data,
thus improving reproducibility of research.
Our compression approach rests on three pillars:
1. Representation of the trajectory by piece-wise composition of functions
from a predefined set of functions.
2. Quantization of the real valued input to enable efficient representation
of the functions using a compact variable length integer encoding.
3. A user-specified error bound that is entirely consumed: The approximation is as coarse as possible while still maintaining the error bound.
[Figure: two example functions f1 and f2, each plotted as position x over time t.]
To fulfill the additional requirements of MD simulations (online processing with high throughput and a small memory footprint) we use one-dimensional linear functions spanned by integer support points as templates
for composition: $\{ t \mapsto x_0 + t \, \frac{x_1 - x_0}{t_0} : x_0, x_1 \in \mathbb{Z},\ t_0 \in \mathbb{N} \}$. This allows us to encode the trajectory entirely by storing a stream of tuples $(x_{i+1} - x_i,\ t_{i+1} - t_i)$.
Those differences are small for typical trajectories and thus have a compact representation. For variable length integer encoding we use a state-of-the-art library. The user-specified error budget is equally divided between
quantization error and approximation error. The software is attached (see
code/mdtrajcomp.tar.bz2).
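The scheme can be illustrated with a compact Python sketch for a single coordinate of a single particle. It is not the attached software: the greedy segment search, the restriction of support points to quantized samples, and the function names are assumptions made for this sketch, and the variable length integer encoding of the resulting tuples is omitted. The quantization step equals the error bound eps, so rounding contributes at most eps / 2; the linear approximation is allowed to deviate from the quantized samples by at most half a quantization step, which is the other eps / 2.

```python
def fits(q, t0, t1):
    """True if the line from (t0, q[t0]) to (t1, q[t1]) stays within half a
    quantization step of every quantized sample in between (integer test)."""
    dq, dt = q[t1] - q[t0], t1 - t0
    return all(abs(2 * ((t - t0) * dq - (q[t] - q[t0]) * dt)) <= dt
               for t in range(t0 + 1, t1))


def compress(xs, eps):
    """Return the first quantized value and a list of (dx, dt) tuples.

    `xs` must be a non-empty sequence of samples taken at unit time steps.
    """
    q = [round(x / eps) for x in xs]          # quantization error <= eps / 2
    deltas = []
    t0 = 0
    while t0 < len(q) - 1:
        t1 = t0 + 1                            # a one-step segment is exact
        while t1 + 1 < len(q) and fits(q, t0, t1 + 1):
            t1 += 1                            # greedily extend the segment
        deltas.append((q[t1] - q[t0], t1 - t0))
        t0 = t1
    return q[0], deltas


def decompress(first, deltas, eps):
    """Rebuild one sample per time step by linear interpolation."""
    xs = [first * eps]
    value = first
    for dq, dt in deltas:
        for step in range(1, dt + 1):
            xs.append((value + step * dq / dt) * eps)
        value += dq
    return xs
```

A smooth trajectory sampled at small time steps yields long segments with small integer differences, which is what makes the subsequent variable length integer encoding effective.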
In our tests it was faster and compressed better than the state-of-the-art
approach. The graph below shows the compression rate of a kinetochore
simulation trajectory as a function of the specified positional error bound.
The compression rate can be significantly below one bit per sample. A
detailed publication is in preparation.
[Figure: compression rate in bits per sample (log scale) versus the positional error bound $\epsilon$ (log scale).]
The approach is amenable to hierarchisation in a number of different
ways:
• Polynomials of ever higher degrees capturing larger time spans can be
constructed from lower order polynomials capturing short time spans.
• Linear functions can be extended to splines capturing increasing time
spans. An extension using splines has been developed within a bachelor
thesis.
• Application knowledge can be included to select for special function
classes. For example, stationary oscillations might be expressed using
sinusoidal functions.
[Figure: hierarchy of function classes (constant, linear and higher order functions) approximating position x over time t.]
4 Publications
Ibrahim, B., Henze, R., Gruenert, G., Egbert, M., Huwald, J., & Dittrich,
P. (2013). Spatial Rule-Based Modeling: A Method and Its Application to the Human Mitotic Kinetochore. Cells, 2(3), 506-544.
This paper introduces the reader to rule-based modeling in space applied to biological systems, using the simulation software SRSim previously
developed in our group.
As test case a model of the kinetochore is used. This protein complex
takes part in the control of the cell cycle (WP7). A similarity metric based
on the discretization scheme introduced in section 2 is used to analyze the
acquired simulation data.