HIERATIC
Hierarchical Analysis of Complex Dynamical Systems

Deliverable: D4.1 (revised version)
Title: Multi-scale simulation library featuring spatial compartmentalisation and fast/slow dynamics.
Authors: Jan Huwald, JENA
Date: 6 January 2014

WP4 Deliverable: Multi-scale simulation library
Jan Huwald
January 6, 2014

Contents

1 Extension to the MASON simulator
  1.1 Temporal hierarchy of agents
  1.2 Spatial hierarchy of compartments
2 Discretization of continuous particle systems
  2.1 Envisioned approach
  2.2 Enumeration of structures of MD-Simulation
  2.3 Griephs
3 Efficient discretization of particle trajectories
4 Publications

1 Extension to the MASON simulator

An extension to the MASON simulator has been developed that supports embedding agents into temporal and spatial hierarchies. The modified MASON package is provided in the file code/mason.tar.bz2. The design and usage of the added library interface are described below.

1.1 Temporal hierarchy of agents

MASON manages the progression of time using a distinct scheduler object (Schedule) that calls the step function of each agent once it is due. We added a scheduler class (HierachicalSchedule) that implements a hierarchy of timers, yielding a hierarchy of slow-fast dynamics: every timer has one subordinate timer. For every tick of the superior timer, the subordinate timer executes many ticks.

[Figure: timelines labeled A, B and C; horizontal axis: time]

The number of micro-ticks per macro-tick depends on a user-specified mode and on the behavior of the fast agents (the ones running on the subordinate timer).
Three behaviors have been implemented:

Constant: for every macro-tick, a user-specified constant number of micro-ticks occurs.
EquiOne: a macro-tick happens after a micro-tick once at least one agent stepped during that micro-tick has signaled that it reached its equilibrium state.
EquiAll: a macro-tick happens after a micro-tick once all agents stepped during that micro-tick have signaled that they reached their equilibrium state.

Whether an agent has reached its equilibrium state is determined by the agent itself.

1.1.1 Usage

The hierarchical schedule is initialized with

    HierachicalSchedule(EquiRelation[] hierarchy);

which is given an array of the above slow-fast timer relations, sorted from slowest to fastest. This definition is constant during the run-time of the scheduler. Agents can be added to and removed from the scheduler by calling

    void scheduleHierarchy(final EquiSteppable agent, int level);
    void unscheduleHierarchy(final EquiSteppable agent, int level);

where level refers to the level in the slow-fast hierarchy given to the constructor. Level 0 denotes the slowest, top-most level. The maximal level is hierarchy.length + 1.

The hierarchical scheduler extends the non-hierarchical one. Agents embedded in the time hierarchy are compatible with those added using calls inherited from Schedule. During one time step of Schedule, the lowest (fastest) level of the timer hierarchy is stepped once.

So far, an agent is implemented in MASON via the Steppable interface. To allow testing for the equilibrium condition, we extend this interface to EquiSteppable, adding the method

    public boolean isEquillibrium(SimState state);

that returns whether an agent has reached an equilibrium state. The hierarchical scheduler expects all agents to implement this interface. If the equilibrium condition is irrelevant for an agent, the agent can be implemented as a subclass of the abstract class DefaultEquiSteppable, which always reports being in equilibrium.
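The three modes described above can be illustrated with a small, self-contained sketch. It does not use MASON or the attached library: the names FastAgent, CountdownAgent, TickMode, SlowFastSketch and the safety cap maxTicks are hypothetical and were chosen only for this illustration. The sketch merely shows how the number of micro-ticks per macro-tick could follow from the selected mode and the equilibrium signals of the fast agents.

```java
import java.util.List;

// Hypothetical stand-in for an agent on the fast timer (not the library's EquiSteppable).
interface FastAgent {
    void step();
    boolean isEquilibrium();
}

// Toy agent that reports equilibrium after a fixed number of steps.
final class CountdownAgent implements FastAgent {
    private int remaining;
    CountdownAgent(int stepsToEquilibrium) { this.remaining = stepsToEquilibrium; }
    public void step() { if (remaining > 0) remaining--; }
    public boolean isEquilibrium() { return remaining == 0; }
}

// The three behaviors named in the text.
enum TickMode { CONSTANT, EQUI_ONE, EQUI_ALL }

final class SlowFastSketch {
    // Executes micro-ticks until the condition for the next macro-tick holds
    // and returns the number of micro-ticks executed.
    // maxTicks is an assumed safety cap, not part of the described interface.
    static int microTicksUntilMacroTick(TickMode mode, int constant,
                                        List<FastAgent> agents, int maxTicks) {
        int ticks = 0;
        while (ticks < maxTicks) {
            for (FastAgent a : agents) a.step();   // one micro-tick steps every fast agent
            ticks++;
            boolean done;
            switch (mode) {
                case CONSTANT: done = ticks >= constant; break;
                case EQUI_ONE: done = agents.stream().anyMatch(FastAgent::isEquilibrium); break;
                default:       done = agents.stream().allMatch(FastAgent::isEquilibrium); break;
            }
            if (done) break;
        }
        return ticks;
    }
}
```

With two toy agents that need 2 and 5 steps to equilibrate, EQUI_ONE triggers the macro-tick after 2 micro-ticks and EQUI_ALL after 5, whereas CONSTANT ignores the agents' state entirely.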
1.2 Spatial hierarchy of compartments

In MASON the concept of space is implemented by registering an agent with one or more field objects. Existing fields are regular grids, continuous spaces and graphs. We added the CompartmentField to represent space as a hierarchy of compartments.

A compartmentalized space is defined by a dimension $d \geq 1$ and a base $b \geq 2$: the $d$-dimensional Euclidean unit space is equipartitioned into a grid of $b^d$ cubic subspaces. This division is applied to each subspace recursively. The figure below shows the first three recursion steps for the compartment space defined by $d = 2$, $b = 2$:

[Figure: first three recursion steps of the compartment space for d = 2, b = 2; axes: x, y, level of detail]

A qualified position in such a compartment space is given by a level $l$ and a $d$-tuple of strings from $(\Sigma^l)^d$ over the alphabet $\Sigma = \{0, \dots, b-1\}$. For each dimension, the string describes which compartment to choose during a descent from level 0 to level $l$. To decide whether two points of levels $l_1$ and $l_2$ have the same location, only the first $\min(l_1, l_2)$ characters of their position strings are compared. Thus, in contrast to all other fields, the spatial identity relation $\approx_c$ in a CompartmentField is non-transitive: for three points, $p_1 \approx_c p_2 \approx_c p_3 \not\approx_c p_1$ may be true.

In the figure above, the highlighted compartment in level 2 has the position (11, 01). All cells which belong to this location (for which $\approx_c$ holds) are highlighted.

1.2.1 Usage

A field is initialized with dimension and base:

    CompartmentField(int d, int b)

To insert an object into the field, or to update the position of one already inserted, the method setObjectLocation is used. Its second parameter is the destination position:

    void setObjectLocation(Object obj, CompartmentPosition pos);

The CompartmentPosition is an immutable object storing an element of $(\Sigma^l)^d$. Directly operating on the string representation of the position is, however, discouraged.
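For illustration only, the prefix comparison behind the spatial identity relation and its non-transitivity can be sketched on raw position strings. The class PosSketch below is hypothetical and is not the library's CompartmentPosition; it assumes that both positions have the same dimension and that all strings of one position have equal length.

```java
// Hypothetical illustration type; not the CompartmentPosition class of the library.
final class PosSketch {
    final String[] coords; // one string per dimension, all of equal length (= level)

    PosSketch(String... coords) { this.coords = coords; }

    // The level is the common length of the per-dimension strings.
    int level() { return coords.length == 0 ? 0 : coords[0].length(); }

    // Two positions share a location if, in every dimension, the first
    // min(l1, l2) characters of their strings agree.
    static boolean sameLocation(PosSketch p, PosSketch q) {
        int n = Math.min(p.level(), q.level());
        for (int i = 0; i < p.coords.length; i++) {
            if (!p.coords[i].regionMatches(0, q.coords[i], 0, n)) return false;
        }
        return true;
    }
}
```

Taking the position (11, 01) from the text, the root position (two empty strings) and a sideways neighbor (10, 01): the root shares a location with both others, yet the two level-2 positions do not share a location with each other, which is exactly the non-transitive case.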
Instead of manipulating position strings directly, three methods of motion are offered:

    CompartmentPosition up();
    CompartmentPosition down(int[] subDir);
    CompartmentPosition side(int dir, int off);

The methods' effects are self-explanatory in light of the figure above. Moving downwards toward more spatial detail requires specifying into which subspace of the current position to descend. Moving sideways requires the dimension in which to move and the distance (and sign) of the movement.

Under standard compartment assumptions, an individual agent would only use the side movement routine: it would move from one compartment to a neighboring one, but not between levels of abstraction. The methods up and down would be used by the programmer to specify relations between agents that are not in the same level of abstraction, typically only during the initialization of the simulation, to specify all components and their places.

To start navigating using these functions, the root position is acquired from the compartment field via

    CompartmentPosition getRoot();

This root is equivalent to the undivided unit space or a global variable scope.

To determine its environment, an agent uses the getLocals method of a CompartmentField. It returns all objects of the field that are of identical position with respect to the $\approx_c$ relation, that is, all agents that can be reached by going up or down but not sideways in the compartment hierarchy.

2 Discretization of continuous particle systems

Particle-based models suffer from exorbitant computational demands once the particle count or the simulated time frame reaches biologically relevant scales. Yet in those domains the precise wiggling and jiggling is irrelevant and only abstract long-term behavior is sought. The current approach to overcome those limits is to manually design coarser models with dynamics that are arbitrary rather than based on physical first principles. We worked towards automating this step and relating the coarse-scale dynamics to the (fine-grained) grounding dynamics.
The aim is to improve simulation efficiency by ignoring (spatial and temporal) small-scale information while retaining the large-scale behavior induced by those dynamics. Our approach is to discretize the particle system and then apply a hierarchical coarse graining.

During the first year we developed the approach, performed feasibility checks and went several steps towards a working implementation. The approach and some of the resulting prototypes are detailed below.

2.1 Envisioned approach

We assume a particle system governed by Newtonian dynamics: $n$ particles and $d$ dimensions yield the phase space $\mathbb{R}^{2nd}$. The dynamics are induced by pair-wise force terms, depending solely on the particle types and their relative distances.

We assume biology-scale systems to be our primary use case: temperature and pressure are largely constant or irrelevant, whereas the relative position and proximity of key proteins over large time frames is highly relevant. The systems are governed solely by short-range interaction.

The key to our approach is to identify repeating local configurations in space and time. But as the system is continuous, identical situations have probability 0. Thus we discretize the state into classes of similar behavior before searching for patterns. Both steps are depicted below:

[Figure: continuous layer and discrete layer, with state graph and grieph]

The state graph is an undirected graph with labeled vertices and edges: each particle is a vertex annotated with its particle type (e.g. the chemical element). Each edge is annotated with the quantized distance between its adjacent vertices in phase space (typically: distance and relative velocity). For storage efficiency, the edges of the farthest quantization class are stored only implicitly, effectively implementing a cut-off distance. For a given particle count and dimension, the resulting state graph has a finite number of configurations.
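As a minimal sketch of the discretization step, the following code derives edge labels of a state graph from particle positions. It makes several simplifying assumptions that are not taken from the deliverable: only the spatial distance is quantized (no relative velocity), vertex labels (particle types) are omitted, the quantization classes are given by an ascending array of thresholds, and edges are keyed by a string "i-j". The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class StateGraphSketch {
    // Quantization class of a distance: index of the first threshold it falls below.
    // The value thresholds.length denotes the farthest class (beyond the cut-off).
    // Assumes thresholds are sorted in ascending order.
    static int quantize(double dist, double[] thresholds) {
        for (int k = 0; k < thresholds.length; k++) {
            if (dist < thresholds[k]) return k;
        }
        return thresholds.length;
    }

    // Computes the edge labels for all particle pairs. Edges that fall into the
    // farthest class are not stored, i.e. they remain implicit.
    static Map<String, Integer> edges(double[][] pos, double[] thresholds) {
        Map<String, Integer> labels = new HashMap<>();
        for (int i = 0; i < pos.length; i++) {
            for (int j = i + 1; j < pos.length; j++) {
                double sq = 0.0;
                for (int k = 0; k < pos[i].length; k++) {
                    double diff = pos[i][k] - pos[j][k];
                    sq += diff * diff;
                }
                int label = quantize(Math.sqrt(sq), thresholds);
                if (label < thresholds.length) labels.put(i + "-" + j, label);
            }
        }
        return labels;
    }
}
```

For three particles at (0,0), (1,0) and (10,0) with thresholds 1.5 and 3.0, only the edge between the first two particles is stored explicitly; both edges to the distant third particle fall into the farthest class and stay implicit.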
The only possible event in this system is the change of an edge label, corresponding to a changing phase space distance between particles. We conjecture that with probability 1 at most one such change occurs at any time. This allows us to use a Gillespie-style update scheme:

1. For each edge, the mean time to change and the probability distribution of the future labels are computed.
2. The next edge to change and the time to this change are computed from those distributions.
3. The state graph is updated accordingly.
4. The process is repeated indefinitely.

To compute the future of an edge $(n_1, n_2)$, the local environment around it is considered: the subgraph induced by all nodes adjacent to the nodes $n_1$, $n_2$ via non-implicit edges. This subgraph represents a system of inequalities that constrains the phase space of the continuous system. Samples of this phase space are generated and simulated according to the grounding dynamics of the continuous layer.

[Figure: continuous sample generation, continuous simulation, discretization]

The next step is to discover subgraphs in the state graph that repeat in space or time. This is an algorithmically hard problem: the underlying subgraph isomorphism problem is NP-hard. Instead of an exact solution we rely on a (pluggable) heuristic. Via hierarchical clustering the state graph is transformed into a tree: a node is either a leaf (corresponding to a particle) or consists of two subordinate nodes, the assignment of connections between them and their environment, and the transition distribution for internal edges (see the figure below). The root node implicitly contains the entire simulation state. The heuristic is used to select the nodes to merge during the clustering.

State update is implemented as a recursive function: the successor state of a node is computed as either a change in one of its two sub-nodes or a change of one of the edges between those sub-nodes.
The choice is made randomly according to the transition distributions of the individual elements.

This promises efficiency gains by three-fold application of memoization:

1. Instead of the usual representation of the tree using pointers, hash-consing (see footnote 1) is used: nodes are referenced by their hash value. For practical purposes, hash values are identical if and only if the nodes are identical. Thus identical nodes are discovered automatically during the tree construction. We call the resulting directed acyclic graph a grieph. Note that a grieph exactly represents the state graph. The only information loss in the entire scheme occurs during the discretization from continuous space into the state graph.
2. Memoization of the recursive state update function allows caching the effect of micro-changes on the macro-structure. This way, behavior on ever coarser scales can be derived from subordinate levels. By caching it, a subsequent recursive descent can be omitted and the dynamics can be evaluated on a macro-level.
3. Augmented analyses, in which desired properties are computed by a divide-and-conquer strategy along the nodes of the grieph, are amenable to memoization as well. This allows fast updates of the properties relevant to the experimenter without iterating over the whole simulation state.

We built several prototypes to assess the viability of the proposed approach.

2.2 Enumeration of structures of MD-Simulation

To be efficient, the proposed approach imposes two preconditions on the coarse-grained system:

1. Edge updates in the state graph should occur significantly less frequently than position changes in the continuous layer.
2. A memoization-induced performance gain requires the recurrence of configurations of particle neighborhoods.

To check those preconditions we simulated a system of Lennard-Jones particles using velocity Verlet integration.
For each time step and each particle we computed the local state graph, that is, the subgraph of the state graph induced by all adjacent nodes. From that we computed the rate of change in the state graph and the distribution, and thus the recurrence frequency, of different configurations.

Footnote 1: Hash-consing is memoization applied to constructors of data structures.

The simulator used is attached (see code/statecount.tar.bz2). It is optimized towards high throughput: it is implemented in C++, using CUDA to offload all computations to the massively parallel processor of a graphics card acquired within the project. To reduce the required amount of storage, a hash value of the local graph is stored instead of the graph itself. To this end a custom hash function derived from Keccak has been employed.

2.3 Griephs

Computing a grieph from a given continuous layer state is nontrivial: the problem is (as yet) under-specified and exhibits a high degree of freedom in the choice of algorithms and data structures. To allow rapid algorithm engineering, a prototype geared towards high adaptability has been written in Haskell. It computes a grieph from a given phase space point using

• a variable merging strategy,
• arbitrary phase spaces and quantizations, and
• a variable payload.

In addition it was used as a test-bed to develop grieph traversal and generic computation over griephs. Quick iterations of the algorithm design are fostered by three aspects:

1. A lot of context information is encoded in the type system, allowing the Haskell compiler to prevent most coding mistakes and to point to corner cases.
2. A set of automated tests that confirm desired high-level properties using automatically generated test cases. This includes, for example, a test for the idempotence of $g \circ g^{-1}$ (with $g$ being the state graph to grieph conversion function). It leads to a programming style driven by counter-examples.
3. A tool to visualize the grieph data structure as displayed below.
To reconstruct a state graph edge from a grieph, a number of nodes have to be traversed. In the graph, the "edge's path" through different grieph nodes is shown in exact resemblance of the node-internal data structures being used. Those graphs can be used to quickly detect places where grieph aggregation happened erroneously.

[Figure: visualization of a grieph data structure]

The software is attached (see code/grieph.tar.bz2).

3 Efficient discretization of particle trajectories

Compression, that is, efficient representation, requires a mechanized understanding of the subject. It is a first step towards coarse graining the underlying system: in the coarse graining diagram below it is an implementation of the coarsening function $\pi : X \to Y$. For a full coarse graining it lacks the coarse update function $g : Y \to Y$.

[Diagram: coarse graining, with fine-scale update f on X, coarse update g on Y, and the coarsening function mapping X to Y]

We investigated efficient representations of time-discrete particle trajectories in Euclidean space. Mathematically they are described as points $x \in \mathbb{R}^{NDT}$ ($N$ particles, $D$ dimensions, $T$ time steps) with the additional constraint that for a fixed particle coordinate ($n, d$ = const), small changes in time constrain position changes to be small as well: $\lVert x_{n,d,t} - x_{n,d,t+\Delta t} \rVert \leq c \, \Delta t$.

Such trajectories are generated by spatial agent-based simulations. Especially for molecular dynamics simulation, where numerical stability requires very small time steps, those trajectories require large amounts of storage. The de-facto standard approach to reduce the storage requirements is to take simulation snapshots at an arbitrary frequency. Improvements over this approach would facilitate permanence and exchange of simulation data, thus improving reproducibility of research.

Our compression approach rests on three pillars:

1. Representation of the trajectory by piece-wise composition of functions from a predefined set of functions.
2. Quantization of the real-valued input to enable efficient representation of the functions using a compact variable-length integer encoding.
3. A user-specified error bound that is entirely consumed: the approximation is as coarse as possible while still maintaining the error bound.

[Figure: two example template functions f1 and f2, each plotted as x over t]

To fulfill the additional requirements of MD simulations (online processing with high throughput and a small memory footprint) we use one-dimensional linear functions spanned by integer support points as templates for composition: $\{ t \to x_0 + t \frac{x_1 - x_0}{t_0} : x_0, x_1 \in \mathbb{Z}, t_0 \in \mathbb{N} \}$. This allows us to encode the trajectory entirely by storing a stream of tuples $(x_{i+1} - x_i, t_{i+1} - t_i)$. Those differences are small for typical trajectories and thus have a compact representation. For variable-length integer encoding we use a state-of-the-art library. The user-specified error budget is equally divided between quantization error and approximation error.

The software is attached (see code/mdtrajcomp.tar.bz2). In our tests it was faster and compressed better than the state-of-the-art approach. The graph below shows the compression rate of a kinetochore simulation trajectory as a function of the specified positional error bound. The compression rate can be significantly below one bit per sample. A detailed publication is in preparation.

[Figure: bits per sample (log scale) versus positional error bound $\epsilon$ (log scale)]

The approach is amenable to hierarchisation in a number of different ways:

• Polynomials of ever higher degrees capturing larger time spans can be constructed from lower-order polynomials capturing short time spans.
• Linear functions can be extended to splines capturing increasing time spans. An extension using splines has been developed within a bachelor thesis.
• Application knowledge can be included to select for special function classes. For example, stationary oscillations might be expressed using sinusoidal functions.
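The piece-wise linear encoding described earlier in this section can be sketched as follows. This is a simple greedy illustration under stated assumptions, not the algorithm of the attached software: a single coordinate is sampled at unit time steps, the quantization step equals the error bound (so rounding costs at most half of the budget), the other half is passed as a tolerance of 0.5 in quantized units, and segments are extended greedily, which does not guarantee the coarsest possible approximation. The variable-length integer encoding of the tuple stream is omitted, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class TrajCompressSketch {
    // Quantizes samples to integer multiples of eps (rounding error at most eps/2).
    static long[] quantize(double[] x, double eps) {
        long[] q = new long[x.length];
        for (int i = 0; i < x.length; i++) q[i] = Math.round(x[i] / eps);
        return q;
    }

    // True if the line from (s, q[s]) to (e, q[e]) stays within tol of every sample in between.
    static boolean fits(long[] q, int s, int e, double tol) {
        for (int t = s + 1; t < e; t++) {
            double interp = q[s] + (double) (t - s) * (q[e] - q[s]) / (e - s);
            if (Math.abs(interp - q[t]) > tol) return false;
        }
        return true;
    }

    // Greedy segmentation: emits tuples (dx, dt) between consecutive integer support points.
    static List<long[]> encode(long[] q, double tol) {
        List<long[]> out = new ArrayList<>();
        int s = 0;
        while (s < q.length - 1) {
            int e = s + 1;
            while (e + 1 < q.length && fits(q, s, e + 1, tol)) e++;
            out.add(new long[] { q[e] - q[s], e - s });
            s = e;
        }
        return out;
    }

    // Reconstructs the (quantized) trajectory from the first value and the tuple stream.
    static double[] decode(long first, List<long[]> segments) {
        int n = 1;
        for (long[] seg : segments) n += (int) seg[1];
        double[] y = new double[n];
        y[0] = first;
        long x0 = first;
        int t0 = 0;
        for (long[] seg : segments) {
            int dt = (int) seg[1];
            for (int t = 1; t <= dt; t++) y[t0 + t] = x0 + (double) t * seg[0] / dt;
            x0 += seg[0];
            t0 += dt;
        }
        return y;
    }

    // Largest deviation between reconstruction and quantized input, in quantized units.
    static double maxAbsError(long[] q, double[] y) {
        double m = 0.0;
        for (int i = 0; i < q.length; i++) m = Math.max(m, Math.abs(y[i] - q[i]));
        return m;
    }
}
```

A trajectory that rests for two steps and then moves at constant speed, quantized to 0, 0, 0, 3, 6, 9, is encoded by two tuples, (0, 2) and (9, 3), instead of six samples. Multiplying the decoded values by the quantization step yields positions within the overall error bound.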
[Figure: hierarchy of constant functions, linear functions and higher-order functions; x plotted over t]

4 Publications

Ibrahim, B., Henze, R., Gruenert, G., Egbert, M., Huwald, J., & Dittrich, P. (2013). Spatial Rule-Based Modeling: A Method and Its Application to the Human Mitotic Kinetochore. Cells, 2(3), 506-544.

This paper introduces the reader to rule-based modeling in space, applied to biological systems using the simulation software SRSim previously developed in our group. As a test case, a model of the kinetochore is used. This protein complex takes part in the control of the cell cycle (WP7). A similarity metric based on the discretization scheme introduced in section 2 is used to analyze the acquired simulation data.