PGENESIS Tutorial – GUM’02
Greg Hood
Pittsburgh Supercomputing Center
What is PGENESIS?
• Library extension to GENESIS that supports
communication among multiple processes –
so nearly everything available in GENESIS is
available in PGENESIS
• Allows multiple processes to perform multiple
simulations in parallel
• Allows multiple processes to work together
cooperatively on a single simulation
• Runs on workstations or supercomputers
History
• PGENESIS developed by Goddard and Hood
at PSC (1993-1998)
• Current contact: pgenesis@psc.edu
Tutorial Outline
• Installation
• What PGENESIS provides
• Using PGENESIS for parallel parameter
searching
• Using PGENESIS for simulating large
networks more quickly
• Scaling up for large runs
• A comparison of PGENESIS with alternatives
PGENESIS Installation
Installation: Requirements
• At least 1 Unix-like computer on which GENESIS will run.
• Same account name on all computers.
• If multiple machines are to be used together, it is best if they are all on the same network segment (e.g., the same 100 Mbit/s Ethernet switch).
Installation: GENESIS
1. Install regular (serial) GENESIS:
a. Make sure you have configured serial
GENESIS to include all libraries that you
will ever want to use with PGENESIS.
b. make all; make install
c. make nxall; make nxinstall if you want an
Xodus-less version of PGENESIS
Installation: ssh
2. Configure ssh to allow process startup across
machines without password entry:
a. You probably already have ssh/sshd. If not,
download from http://www.openssh.org and
install according to instructions.
b. Run ssh-keygen -t rsa on each machine from which you will launch PGENESIS to generate private/public keys.
c. Append all of the public keys (stored in ~/.ssh/id_rsa.pub) to ~/.ssh/authorized_keys on all hosts on which you want to run PGENESIS processes.
d. Test: ssh remote_host_name remote_command
should not ask you for a password.
Installation: PVM
3. Install PVM message passing library
a. Download from http://www.csm.ornl.gov/pvm
b. Modify .bashrc to set PVM_ROOT to where
PVM was installed:
export PVM_ROOT=/usr/share/pvm3
c. Modify .bashrc to set PVM_RSH to the ssh
executable:
export PVM_RSH=/usr/bin/ssh
d. Build PVM ("cd $PVM_ROOT; make")
e. Test PVM
% pvm
pvm> add otherhost
pvm> halt
Installation: PGENESIS
4. Install PGENESIS package
a. Download from http://www.genesis-sim.org
b. cp Makefile.dist Makefile
c. Edit Makefile
d. make install
e. make nxinstall for Xodus-less version
Installation: Simple
• Cluster of similar machines
• Shared filesystem
• Home directory is located on the shared filesystem
Installation: Complex
• Heterogeneous cluster
• Novel processor/OS
• No shared filesystems
• Custom libraries linked into GENESIS
Recommended approach:
• Install on each machine independently and make sure PGENESIS works locally before trying to use all machines together
The "pgenesis" Startup Script (1)
Purpose: checks that the proper PVM files are
in place, starts the PVM daemon, then starts
the appropriate PGENESIS executable.
Basic syntax:
pgenesis scriptname.g
The "pgenesis" Startup Script (2)
Options:
-config <filename>   where <filename> contains a list of hosts to use
-debug <mode>        where <mode> is one of: tty, dbx, gdb
-nox                 do not use Xodus
-v                   verbose mode
-help                list the valid pgenesis script flags
An example invocation is sketched below.
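For example (hosts.txt is a hypothetical host-list file), a verbose, Xodus-less run might be launched as:
  pgenesis -v -nox -config hosts.txt scriptname.g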
PGENESIS Functionality
This lecture will introduce you to PGENESIS, which is a derivative of GENESIS that has been augmented with additional commands to support parallelism.
Contents
• What is PGENESIS?
• How PGENESIS Runs in Parallel
• Nodes and Zones
• Who am I?
• Threads
• Synchronization
• Remote Function Calls
• Asynchronous Calls
• Useful Commands for …
How PGENESIS Runs in Parallel
• Workstations: typically one process starts and then spawns n-1 other processes
• Mapping of processes to processors is often 1 to 1, but may be many to 1 during debugging
How PGENESIS Runs in Parallel
• Massively parallel machines:
  all n processes are started simultaneously by the operating system;
  mapping of processes to processors is nearly always 1 to 1
• On both:
  every process runs the same script (this is not a real limitation)
Nodes and Zones
• Each process is referred to as a "node".
• Nodes may be organized into "zones".
• A node is fully specified by a numeric string of the form "<node>.<zone>".
• Simulations within a zone are kept synchronized in simulation time.
• Each node joins the parallel platform using the paron command.
• Each node should gracefully terminate by calling paroff (see the sketch below).
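A minimal sketch of a per-node script built around these commands (the paron flags mirror the main scripts shown later in this tutorial; the echo is purely illustrative):
  // every node runs this same script
  paron -farm -silent 0 -nodes 4 -output o.out -executable nxpgenesis
  barrierall                 // wait until every node has joined
  echo hello from node {mynode} in zone {myzone}
  // ... build and run the simulation here ...
  barrierall                 // make sure no node exits early
  paroff                     // leave the parallel platform gracefully
  quit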
Every node in its own zone
• Simulations on each node are not coupled temporally.
• Useful for parameter searching.
• We refer to nodes as "0.0", "0.1", "0.2", …
All nodes in one zone
• Simulations on each node are coupled temporally.
• Useful for large network models.
• Zone numbers can be omitted since we are dealing with only one zone; we can thus refer to nodes as "0", "1", "2", …
Hybrid schemes
Parameter searching on large network models
Example: The network is partitioned over 8
nodes; we run 16 simulations in parallel to do
parameter searching on this model, thus
using a total of 128 nodes.
Nodes have distinct namespaces
/elem1 on node 0 refers to an element on node 0.
/elem1 on node 1 refers to an element on node 1.
To avoid confusion, we recommend that you use distinct names for elements on different nodes within a zone (see the sketch below).
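One hedged way to follow this advice is to embed the node number in element names:
  // hypothetical convention: node-specific element names
  create neutral /cell{mynode}   // /cell0 on node 0, /cell1 on node 1, ...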
GENESIS Terminology
GENESIS      Computer Science
Object    =  Class
Element   =  Object
Message   =  Connection
Value     =  Message
Who am I?
PGENESIS provides several functions that allow a script to determine its place in the overall parallel configuration (all numbering starts at 0):
mytotalnode  - # of this node in platform
mynode       - # of this node in this zone
myzone       - # of this zone
ntotalnodes  - # of nodes in platform
nnodes       - # of nodes in this zone
nzones       - # of zones
npvmcpu      - # of processors in configuration
mypvmid      - PVM task identifier for this node
A small example follows.
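An illustrative use of these functions (the role assignment is hypothetical):
  // e.g., let node 0 of each zone take a special role
  if ({mynode} == 0)
      echo zone {myzone}: node {mynode} of {nnodes} will act as master
  end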
Styles of Parallel Scripts
• Symmetric – Each node executes the same script commands.
• Master/Worker – One node (usually node 0) coordinates processing and issues commands to the other nodes.
A skeleton of each style is sketched below.
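A hedged sketch contrasting the two styles (do_setup is a hypothetical function defined on every node):
  // symmetric style: every node executes the same commands
  do_setup
  barrier

  // master/worker style: node 0 directs; the other nodes service
  // the remote calls while they wait at the barrier
  if ({mynode} == 0)
      do_setup               // master's own share of the work
      async do_setup@others  // dispatch to the workers
  end
  barrier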
Explicit Synchronization
barrier - causes the thread to block until all nodes within the zone have reached the corresponding barrier
  barrier           - wait at default barrier
  barrier 7         - wait at named barrier 7
  barrier 7 100000  - timeout is 100000 seconds
barrierall - causes the thread to block until all nodes in all zones have reached the corresponding barrier
  barrierall           - wait at default barrier
  barrierall 7         - wait at named barrier 7
  barrierall 7 100000  - timeout is 100000 seconds
Implicit Synchronization
Two commands implicitly execute a zone-wide barrier:
step - implicitly causes the thread to block until all nodes within the zone are ready to step (this behavior can be disabled with "setfield /post sync_before_step 0")
reset - implicitly causes the thread to block until all nodes have reset
These commands require that all nodes in the zone participate, hence the barrier.
Remote Function Calls (1)
An "issuing" node directs a procedure to run on
an "executing" node.
Examples:
some_function@2 params...
some_function@all params...
some_function@others params...
some_function@0.4 params...
some_function@1,3,5 params...
Remote Function Calls (2)
• Each remote function call causes the creation of a new thread on the executing node.
• All parameters are evaluated on the issuing node.
Example: if called from node 1, some_function@2 {mynode} will execute some_function 1 on node 2.
Remote Function Calls (3)
When does the executing node actually perform the remote function call, since we don't use hardware interrupts?
• While waiting at a barrier or barrierall.
• While waiting for its own remote operations to complete (e.g., func@node, raddmsg).
• When the simulator is sitting at the prompt waiting for user input.
• When the executing script calls clearthread or clearthreads.
Threads
A thread is a single flow of control within a PGENESIS script being executed.
• When a node starts, there is exactly one thread on it – the thread for the script.
• There may potentially be many threads per node. These are stacked up, with only the topmost actually executing at any moment.
clearthread - yield to one thread awaiting execution (if one exists)
clearthreads - yield to all threads awaiting execution
Asynchronous Calls (1)
The async command allows a script to dispatch
an operation on a remote node without
waiting for its completion.
Example:
async some_function@2 params...
Asynchronous Calls (2)
One may wait for an async call to complete, either individually:
  future = {async some_function@2 ...}
  ...
  // do some work locally
  waiton {future}
or for an entire set:
  async some_function@2 ...
  async some_function@5 ...
  ...
  waiton all
Asynchronous Calls (3)
Asynchronous calls may return a value. Example:
  int future = {async myfunc@1}   // start thread on node 1
  ...
  // do some work locally
  int result = {waiton {future}}  // wait for thread's result
Thus the term "future" – it is a promise of a value at some time in the future. waiton calls in that promise.
Asynchronous Calls (4)
• async returns a value which is only to be used as the parameter of a waiton call, and waiton must only be called with such a value.
• Remote function calls from a particular issuing node to a particular executing node are guaranteed to be performed in the sequence they were sent.
• There is no guaranteed order among calls involving multiple issuing or executing nodes.
Advice about Barriers (1)
• It is very easy to reach deadlock if barriers are not handled correctly. PGENESIS tries to warn you by printing a message that it is waiting at a barrier.
• Examples of incorrect barrier usage:
  Each node executes: barrier {mynode}
  Each node executes: barrier@all
  A single node executes: barrier@others; barrier
  However: async barrier@others; barrier will work!
Advice about Barriers (2)
• Guideline: if your script is operating in the symmetric style (all nodes execute all statements), never use barrier@.
• If your script is operating in the master/worker style, the master must ensure it calls a function on each worker that executes a barrier before it enters the barrier itself.
  barrier; async barrier@others will not work.
Commands for Network Creation
Several new commands permit the creation of "remote" (internode) messages:
raddmsg /local_element /remote_element@2 SPIKE
rvolumeconnect /local_elements \
    /remote_elements@2 \
    -sourcemask ... -destmask ... \
    -probability 0.5
rvolumedelay /local_elements -radial 10.0
rvolumeweight /local_elements -fixed 0.2
rshowmsg /local_elements
Parallel I/O: Display
How can one display from more than one node?
1. Use an xview object.
2. Add an index field to the displayed elements.
3. Use the ICOORDS and IVAL1 ... IVAL5 messages instead of the COORDS and VAL1 ... VAL5 messages:
   raddmsg /src_elems /xview_elem@0 \
       ICOORDS io_index_field x y z
   raddmsg /src_elems /xview_elem@0 \
       IVAL1 io_index_field Vm
Interaction with Xodus
Xodus introduces another degree of parallelism via the X11 event processing mechanism. PGENESIS periodically instructs the X server to process any X events. Some of those events may result in some script code being run.
• Race condition: processing order is unpredictable.
• Safe 1: ensure all affected nodes are at a barrier (or equivalent).
• Safe 2: ensure mouse/keyboard events do not cause remote operations that require the participation of another node.
Parallel I/O: Writing a File
How can one write a file from more than one node?
1. Use a par_asc_file or par_disk_out object.
2. Add an index field to the source elements.
3. raddmsg /src_elems \
       /par_asc_file_elem@0 \
       SAVE io_index_field Vm
Tips for Avoiding Deadlocks
• Use lots of echo statements.
• Use barrier IDs.
• Do not execute barriers remotely (e.g., barrier@all).
• Remember that step usually does an implicit barrier.
• Have each node do its own step command, or have one controlling node do a step@all (similarly for reset); see the sketch below.
• Do not use the stop command.
• Keep things simple.
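A minimal sketch of the single-controller pattern (mirroring the farm-style main scripts shown later; the barrier ID and step count are illustrative):
  if ({mynode} == 0)
      reset@all        // every node resets together
      step@all 100     // every node advances 100 steps together
  end
  barrier 7 1000000    // other nodes wait here, servicing the remote calls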
Motivation
• Parallel control of setup can be hard.
• Parallel control of simulation can be hard.
• Debugging parallel scripts is hard.
How PGENESIS Fits into Schedule
• The schedule controls the order in which GENESIS objects get updated.
• At the beginning of each step, all internode data is transferred.
• Results will be equivalent to serial GENESIS only if remote messages do not pass from earlier to later elements in the schedule.
How PGENESIS Fits into Schedule
addtask Simulate /##[CLASS=postmaster]    -action PROCESS
addtask Simulate /##[CLASS=buffer]        -action PROCESS
addtask Simulate /##[CLASS=projection]    -action PROCESS
addtask Simulate /##[CLASS=spiking]       -action PROCESS
addtask Simulate /##[CLASS=gate]          -action PROCESS
addtask Simulate /##[CLASS=segment][CLASS!=membrane]\
[CLASS!=gate][CLASS!=concentration]       -action PROCESS
addtask Simulate /##[CLASS=membrane]      -action PROCESS
addtask Simulate /##[CLASS=hsolver]       -action PROCESS
addtask Simulate /##[CLASS=concentration] -action PROCESS
addtask Simulate /##[CLASS=device]        -action PROCESS
addtask Simulate /##[CLASS=output]        -action PROCESS
Adding Custom "C" Code
Uses:
• data analysis
• interfacing
• custom objects
PGENESIS allows the user's custom libraries to be linked in, similarly to GENESIS.
We recommend that you first incorporate your custom library into serial GENESIS before trying to use it with PGENESIS.
Modifiable Parameters
• /post/sync_before_step - boolean (default 1)
• /post/remote_info - boolean (default 1) enables rshowmsg
• /post/perfmon - boolean (default 0) enables performance monitoring
• /post/msg_hang_time - float (default 120.0) seconds before giving up on a remote operation
• /post/pvm_hang_time - float (default 3.0) seconds between printing dots while waiting for a message
• /post/xupdate_period - float (default 0.01) seconds between checking for X events when at a barrier
Limitations of PGENESIS
• No rplanarweight, rplanardelay - use the corresponding 3-D routines rvolumeweight, rvolumedelay.
• Cannot delete remote messages.
• getsyncount, getsynindex, getsyndest no longer return the correct values.
Parameter Searching with PGENESIS
Model Characteristics
The following are prerequisites to using PGENESIS for optimization on a particular parameter searching problem:
• The model must be expressed in GENESIS.
• Decide on the parameter set.
• Have a way to evaluate a parameter set.
• Have some range for each of the parameter values.
• The evaluations over the parameter space should be reasonably well-behaved.
• Have a stopping criterion.
Trivial Model
• Rather than run a simulation, we will just optimize a function f of four parameters a, b, c, and d:
  f(a, b, c, d) = 10.0 - (a-1)*(a-1) - (b-2)*(b-2) - (c-3)*(c-3) - (d-4)*(d-4)
• Evaluation of the model: fitness = f(a, b, c, d)
• Range of parameters: -10 < a, b, c, d < 10
• Evaluation is definitely well-behaved.
• Stopping criterion: stop after 1000 individuals.
Master/Worker Paradigm (1)
Master/Worker Paradigm (2)
• Each node is in its own zone.
• Node 0.0 will control the search.
• Nodes 0.1 through 0.{n-1} will run the model and perform the evaluation.
Commands for Optimization
Typically these are organized in a master/worker fashion, with one node (the master) directing the search and all other nodes evaluating parameter sets. Remote function calls are useful in this context for:
• sending tasks to workers:
  async task@{worker} param1 ...
• having workers return evaluations to the master:
  return_result@{master} result
Choose a Search Strategy
• Genetic search
• Simulated annealing
• Monte Carlo (for very ill-behaved search spaces)
• Nelder-Mead (for well-behaved search spaces)
• Use as many constraints as you can to restrict the search space.
• Always do a sanity check on results.
A Parallel Genetic Algorithm
• We adopt a population-based approach as opposed to a generation-based one.
• We will keep a fixed population "alive" and use the workers to evaluate the fitness of candidate individuals.
• If a candidate turns out to be better than some member of the current population, then we replace the worst member of the current population with the new individual.
Parameter Representation
• We represent the set of parameters that define an individual as a string of bits. Each 16-bit string (one "gene") is interpreted as a signed integer and then divided by 1000.0 to yield the floating-point value. To generate a new candidate from the existing population:
  1. Pick a member of the population at random.
  2. Go through each bit of the bit string, and mutate it with some small probability.
Main Script
paron -farm -silent 0 -nodes {n_nodes} \
    -output o.out -executable nxpgenesis
barrierall
if ({mytotalnode} == 0)
    search
end
barrierall 7 1000000
paroff
quit
Master Conducts the Search
function search
    int i
    init_search
    init_farm
    for (i = 0; i < individuals; i = i + 1)
        if (i < population)
            init_individual
        else
            mutate_individual {rand 0 actual_population}
        end
        delegate_task {i} {bs_a} {bs_b} {bs_c} {bs_d}
    end
    finish
end
Master Conducts the Search
function delegate_task
    while (1)
        if (free_index >= 0)
            async worker@0.{getfield \
                /free[{free_index}] value} \
                {bs_a} {bs_b} {bs_c} {bs_d}
            free_index = free_index - 1
            return
        else
            clearthreads
        end
    end
end
Workers Evaluate Individuals
function worker (bs_a, bs_b, bs_c, bs_d)
    int bs_a, bs_b, bs_c, bs_d
    float a, b, c, d, fit
    a = (bs_a - 32768.0) / 1000.0
    b = (bs_b - 32768.0) / 1000.0
    c = (bs_c - 32768.0) / 1000.0
    d = (bs_d - 32768.0) / 1000.0
    fit = {evaluate {a} {b} {c} {d}}
    return_result@0.0 {mytotalnode} {bs_a} {bs_b} \
        {bs_c} {bs_d} {fit}
end
Workers Evaluate Individuals
function evaluate (a, b, c, d)
    float a, b, c, d, fit
    fit = 10.0 - (a-1)*(a-1) - (b-2)*(b-2) \
        - (c-3)*(c-3) - (d-4)*(d-4)
    return {fit}
end
Master Integrates the Results (1)
function return_result (node, bs_a, bs_b, bs_c, bs_d, fit)
    int node, bs_a, bs_b, bs_c, bs_d
    float fit
    if (actual_population < population)
        least_fit = actual_population
        min_fitness = -1e+10
        actual_population = actual_population + 1
    end
Master Integrates the Results (2)
    if (fit > min_fitness)
        setfield /population[{least_fit}] fitness {fit}
        setfield /population[{least_fit}] a_value {bs_a}
        setfield /population[{least_fit}] b_value {bs_b}
        setfield /population[{least_fit}] c_value {bs_c}
        setfield /population[{least_fit}] d_value {bs_d}
        if (actual_population == population)
            recompute_fitness_extremes
        end
    end
    free_index = free_index + 1
    setfield /free[{free_index}] value {node}
end
A More Realistic Model
• We have a one-compartment cell model of a spiking neuron. Dynamics are probably well-behaved.
• Parameters are the conductances for the Na, Kdr, Ka, and KM channels. We know a priori that the conductance values are in the range 0.1 to 10.0.
• We write spike times to a file, then compare this using a C function, spkcmp, to "experimental" data.
• Stop when our match fitness exceeds 20.0.
Improved Parameter Representation
• As before, we still represent the set of parameters that define an individual as a string of bits. However, now each 16-bit string will map logarithmically into the range 0.1 to 10.0, so that we have increased resolution at the low end of the scale (see the sketch below).
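A hedged sketch of such a decoding (the function name is hypothetical; the slide fixes only the endpoints 0.1 and 10.0 and the logarithmic scaling):
  // map a 16-bit gene (0..65535) logarithmically onto [0.1, 10.0]
  function decode_log(bits)
      int bits
      // 0.1 * 100^(bits/65535); ln(100) ~= 4.6052
      // bits = 0 -> 0.1, bits = 65535 -> 10.0
      return {0.1 * {exp {(bits / 65535.0) * 4.6052}}}
  end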
Crossover Mutations
1. Pick a member of the population at random.
2. Decide whether to do crossover according to the crossover probability. If we are doing crossover, pick another random member of the current population and combine the "genes" of those individuals. If we aren't doing crossover, just copy the bits of the original individual.
3. Go through each bit of the bit string, and mutate it with some small probability.
Main Script (1)
int n_nodes = 4
int individuals = 1000
int population = 10
float stopping_criterion = 20.0
float crossover_prob = 0.5
float bit_mutation_prob = 0.02
Main Script (2)
include population.g   // functions for GA population-based
                       // parameter searches
// model-specific files
include siminit.g      // defines parameters of simulation
include fI.g           // sets up table of currents
include channels.g     // defines the channels
include simcell.g      // functions to load in the cell model
include eval.g         // functions to evaluate the model
Main Script (3)
paron -farm -silent 0 -nodes {n_nodes} \
    -output o.out -executable nxpgenesis
barrierall
if ({mytotalnode} == 0)
    init_master
    pb_search {individuals} {population}
else
    init_worker
end
barrierall 7 1000000
paroff
Parameters Are Customizable
function init_params
    setfield /params[0] label "Na" scaling "log"
    setfield /params[0] min_value 0.1 max_value 10.0
    setfield /params[1] label "Kdr" scaling "log"
    setfield /params[1] min_value 0.1 max_value 10.0
    setfield /params[2] label "Ka" scaling "log"
    setfield /params[2] min_value 0.1 max_value 10.0
    setfield /params[3] label "KM" scaling "log"
    setfield /params[3] min_value 0.1 max_value 10.0
end
Worker Evaluates Individuals (1)
function evaluate
    float match, fitness
    // first run the simulation
    newsim {getfield /params[0] value} \
        {getfield /params[1] value} \
        {getfield /params[2] value} \
        {getfield /params[3] value}
    runfI
    call /out/{sim_output_file} FLUSH
Worker Evaluates Individuals (2)
    // then find the simulated spike times
    gen2spk {sim_output_file} {delay} \
        {current_duration} {total_duration}
    // then compare the simulated spike
    // times with the experimental data
    match = {spkcmp {real_spk_file} \
        {sim_spk_file} -pow1 0.4 -pow2 0.6 \
        -msp 0.5 -nmp 200.0}
    fitness = 1.0 / {sqrt {match}}
    return {fitness}
end
Tuning Search
• representation
• parameter selection
• generation- vs. population-based approach
• generation/population size
• crossover probability
• crossover method
• mutation probability
• initial ranges
Large Networks with PGENESIS
Parallel Network Creation
In parallel network creation, make sure elements exist before connecting them up, e.g.:
  create_elements(...)
  barrier
  create_messages(...)
A concrete sketch of this pattern follows.
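A minimal hedged sketch, assuming exactly two nodes in one zone (the element names are illustrative; spikegen and synchan are standard GENESIS objects):
  // phase 1: every node creates its own local elements
  create neutral /cell{mynode}
  create spikegen /cell{mynode}/spike   // spike source on this node
  create synchan /cell{mynode}/syn      // synapse target on this node
  barrier            // no node proceeds until all elements exist
  // phase 2: now internode messages can be created safely
  int target = 1 - {mynode}             // the other node
  raddmsg /cell{mynode}/spike /cell{target}/syn@{target} SPIKE
  barrier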
Goals of decomposition
• Keep all processors busy all the time on useful work.
• Use as many processors as are available.
• Key concepts are:
  load balancing
  minimizing communication
  minimizing synchronization
  scalable decomposition
  parallel I/O
Load balancing
• Attempt to parcel out the modeled cells such that each CPU takes the same amount of time to simulate one step.
• This is static load balancing - cells do not move.
• Dedicated access to the CPUs is required for effective decomposition.
• This is easier if the CPUs are identically configured.
• PGENESIS provides no automated load balancing, but there are some performance monitoring tools.
Minimizing communication
• Put highly connected clusters of cells on the same PGENESIS node.
• Think of each synapse with a presynaptic cell on a remote node as expensive.
• The same network distributed among more nodes will result in more of these expensive synapses; hence, more nodes can be counterproductive.
• The time spent communicating can overwhelm the time spent computing.
Orient_tut Example
Non-scalable decomposition
Scalable decomposition (1)
Goal: as the number of available processors
grows, your model naturally partitions into finer
divisions
Scalable decomposition (2)
Scalable decomposition (3)
To the extent that you can arrange your decomposition to scale with the number of processors, it is a very good idea to create the scripts using a function of the number of nodes anywhere that a node number must be explicitly specified. E.g.:
  createmap /library/rec /retina/recplane \
      {REC_NX / n_slices} {REC_NY} \
      -delta {REC_SEPX} {REC_SEPY} \
      -origin {-REC_NX * REC_SEPX / 2 + \
          slice * REC_SEPX * REC_NX / n_slices} \
          {-REC_NY * REC_SEPY / 2}
Case Study: Cerebellar Model
• Howell, D. F., Dyhrfjeld-Johnsen, J., Maex, R., Goddard, N., De Schutter, E., "A large-scale model of the cerebellar cortex using PGENESIS", Neurocomputing, 32/33 (2000), pp. 1041-1046.
• 16 Purkinje cells embedded in an environment of other simpler, but more numerous cells.
• Simulated on 128 processors of PSC's Cray T3E.
Cell Populations & Connectivity
3-D Representation of Network
Model Partitioning
Timings on 128 Processors of T3E
Timings vs. Model Size
Timings on Workstation Network
Significant Overhead on Cluster
Scaling Up
Getting Cycles
NSF-funded supercomputing centers:
• Pittsburgh Supercomputing Center (http://www.psc.edu)
  – PGENESIS installed on 512-processor Cray T3E
• NPACI (http://www.npaci.edu)
  – Worked on MPI-based PGENESIS
• Alliance (http://www.ncsa.uiuc.edu)
The High End
• 3000-processor Terascale computer at PSC
Parallel Script Development
1. Develop single cell prototypes using serial
GENESIS.
2. (a) For network models, decide partitioning
and develop scalable scripts. (b) For
parameter searches, develop scripts to run
and evaluate a single individual, and a
scalable script that will control the search.
3. Try out scripts on single processor using the
minimum number of nodes.
Parallel Script Development
4. Try out scripts on single processor but
increase the number of nodes.
5. Try out scripts on small multiprocessor
platform.
6. Try out scripts on large multiprocessor
platform.
Resource Limits and Other Tips
• On the Cray T3E, set PVM_SM_POOL to ensure adequate PVM buffer space. This should be set to the maximum number of messages that might arrive at any PE before it gets a chance to process them.
• On other machines, you may need to set PVMBUFSIZE to address similar issues.
• When debugging interactively, set the timeout so that other nodes do not time out:
  setfield /post msg_hang_time 10000.0
Reducing Synchronization Delay
• In network models, axonal delays L are large compared to the simulation time step.
• A spike generated at simulation time T on one node need not be physically delivered to the destination synapse on another node until simulation time T+L.
• PGENESIS can use this to reduce unnecessary waiting. Node B can get ahead of node A by the minimum of all the axonal delays on the connections from cells on A to synapses on B. This is called the lookahead of B with respect to A.
• You must set /post/sync_before_step to 0 to allow this looser synchronization.
Reducing Synchronization Delay
• A goal when you are partitioning a network across nodes is to make the lookahead between any pair of nodes large.
• PGENESIS provides the setlookahead command for you to inform it of the lookahead between nodes:
  setlookahead 0.01    // sets lookahead to 10 ms
  setlookahead 3 0.01  // sets lookahead to 10 ms w.r.t. node 3
• The getlookahead command reports the current setting with respect to a particular node, and the showlookahead command reports the minimum lookahead to all other nodes:
  getlookahead 3       // gets lookahead with respect to node 3
  showlookahead        // gets minimum lookahead over all nodes
Parallel I/O
• Currently the I/O facilities (disk elements and Xodus elements) are tightly synchronized with the simulation (no lookahead). Therefore sending messages to Xodus objects or disk objects on remote nodes usually slows the simulation to a crawl. Use Xodus only for post-processing.
• Try to arrange input and output to be via local elements (see the sketch below). On workstations it is preferable to access a local disk. If access is via a shared file system (e.g., NFS, AFS), use different output disk files for different nodes, and amalgamate the data after the simulation is over.
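A hedged sketch of node-local output (the element names and compartment path are illustrative; asc_file is the standard GENESIS ASCII-output object):
  // each node writes its own file; no internode I/O messages
  create asc_file /out{mynode}
  setfield /out{mynode} filename sim_output.{mynode}
  addmsg /mycell/soma /out{mynode} SAVE Vm   // purely local message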
Performance Monitoring (1)
• A script can turn performance monitoring on with setfield /post perfmon 1 and turn it off with setfield /post perfmon 0.
• Whenever performance monitoring is active, the categories listed below accumulate time.
• To ignore the time involved in construction of a model, do not activate performance monitoring until just prior to the first simulation step (see the sketch below).
• The accumulated time values can be dumped to a file with the command perfstats. This writes a file to /tmp (usually a local disk) called pgenesis.ppp.nnn.acct, where ppp is the process id and nnn is the node number. Each time perfstats is called it dumps the accumulated values, but it does not reset them.
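Putting those commands together (the step count is illustrative):
  // measure the run phase only, not model construction
  reset
  setfield /post perfmon 1   // start accumulating times
  step 10000
  setfield /post perfmon 0   // stop accumulating
  perfstats                  // write /tmp/pgenesis.<pid>.<node>.acct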
Performance Monitoring (2)
The monitoring package tracks the amount of time spent in various operations:
PGENESIS_PROCESS_SNDREC_SND
  time sending data to other nodes
PGENESIS_PROCESS_SNDREC_REC
  time receiving data from other nodes
PGENESIS_PROCESS_SNDREC_GETFIELD
  time spent gathering local data for transmission to other nodes
PGENESIS_PROCESS_SNDREC
  time spent in sending and receiving data not accounted for by the three preceding categories
PGENESIS_PROCESS_SYNC
  time spent explicitly synchronizing nodes prior to each step
Performance Monitoring (3)
PGENESIS_PROCESS
  time spent in parallel overhead of exchanging data with other nodes which is not accounted for by the preceding categories
PGENESIS_EVENT
  time spent handling incoming spikes
PGENESIS
  time spent in PGENESIS not accounted for by the preceding overhead categories (in other words, the time spent doing useful work)
Comparisons and Summary
Alternatives to PGENESIS (1)
• Batch scripts (Perl, Python, bash) for parameter searching
  – Incur GENESIS process startup and network setup overheads
  – If simulations are long, and the evaluation step is already done externally, may be simpler
• NEURON
  – Parallel parameter searching (talk with Mike Hines)
  – Vectorized NEURON if you happen to have a vector machine handy
Alternatives to PGENESIS (2)
• NEOSIM (http://www.neosim.org/)
  – Prototype stage (Java kernel released)
  – Integration with NEURON simulation engine
  – Supports automatic network partitioning
  – Modular architecture
  – Designed for scalability
• Hand-coded simulation (Java, C++, C, Fortran)
  – Very time-consuming (especially parallel coding)
  – Difficult to share models
  – Specialized code can run much faster
  – Possibly appropriate for large but simple models (e.g., connectionist-style approaches)
Summary
• PGENESIS is a GENESIS extension that lets you use multiple computers to:
  – perform large parameter searches much more quickly
  – simulate large network models more quickly
Discussion
References
• Goddard, N.H. and Hood, G., "Large-scale simulation using parallel GENESIS", The Book of GENESIS, 2nd ed., Bower, J.M. and Beeman, D. (Eds.), Springer-Verlag, 1998.
• Goddard, N.H. and Hood, G., "Parallel Genesis for large-scale modeling", Computational Neuroscience: Trends in Research 1997, Plenum Publishing, NY, 1997, pp. 911-917.
• Howell, D. F., Dyhrfjeld-Johnsen, J., Maex, R., Goddard, N., De Schutter, E., "A large-scale model of the cerebellar cortex using PGENESIS", Neurocomputing, 32/33 (2000), pp. 1041-1046.