The Indra Simulation Database


Bridget Falck (JHU), Tamás Budavári (JHU), Shaun Cole (Durham), Daniel Crankshaw (JHU), László Dobos (Eötvös),

Gerard Lemson (MPA), Mark Neyrinck (JHU), Alex Szalay (JHU), Jie Wang (Durham)

Summary

We present the Indra suite of cosmological N-body simulations and the design of its companion database. Indra consists of 512 different instances of a 1 Gpc/h-sided box, each with 512^3 dark matter particles and the same input cosmology, enabling a characterization of very large-scale modes of the matter power spectrum with ~10^12 M_Sun particle mass and an excellent handle on cosmic variance. We discuss the database design for the particle data, consisting of the positions and velocities of each particle, and the FOF halos, with links to the particle data so that halo properties can be calculated within the database.

The Indra Simulations

Key Features:

512 different random instances, each 1 Gpc/h

Dark Matter only, WMAP7 cosmology

Particle mass of ~10^12 M_Sun

512^3 particles and 64 snapshots per simulation

Over 100 TB worth of data

In addition to the snapshots of particle data and the FOF halos calculated as the simulation runs, we output directly the complex Fourier amplitudes for all the large wavelength modes at 256 time-steps in order to study the mildly non-linear regime.

Science Enabled:

Classification of topologies (walls, filaments, clusters)

Studies of the baryon acoustic features in redshift-space

Large scale structure statistics, correlation functions, etc.

Mildly non-linear mode statistics and in-fall patterns

And much more…

Indra refers to the Buddhist metaphor of Indra’s Net, in which each of the infinite jewels reflects every other jewel

Database Design

Particle Tables

The main feature of the database is that once the simulations have ended, it will contain all of the particle data for each of the 64 snapshots for each of the 512 simulation runs, totaling over 100 TB of data. We are currently testing particle tables with and without the use of SqlArrays (see below).

Schema without arrays:

for each run, 1 row per particle per snapshot

snapnum phkey id x y z vx vy vz

Schema with arrays:

for each run, 1 row per PH-index per snapshot

snapnum phkey numpart [ id ] [ pos ] [ vel ]
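As a sketch, the two candidate layouts could be declared as follows in T-SQL. The table names follow the sample queries; the column types (and the use of varbinary(max) for the SqlArray blobs) are illustrative assumptions, not the deployed schema:

```sql
-- Schema without arrays: one row per particle per snapshot.
create table snapnoarr (
    snapnum  tinyint  not null,  -- 64 snapshots per simulation
    phkey    int      not null,  -- Peano-Hilbert key of the particle's cell
    id       bigint   not null,  -- particle ID (512^3 particles per run)
    x real not null, y real not null, z real not null,    -- position
    vx real not null, vy real not null, vz real not null  -- velocity
);

-- Schema with arrays: one row per PH-index per snapshot,
-- with IDs, positions, and velocities packed into SqlArray blobs.
create table snaparr (
    snapnum  tinyint        not null,
    phkey    int            not null,
    numpart  int            not null,  -- number of particles in this PH cell
    id       varbinary(max) not null,  -- array of particle IDs
    pos      varbinary(max) not null,  -- array of (x,y,z) triples
    vel      varbinary(max) not null   -- array of (vx,vy,vz) triples
);
```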

Halo Tables

We will also include FOF halos that are being calculated as the simulation runs. Because we have all of the particle data, we can calculate any halo properties within the database itself or create new halo catalogs using different halo-finding algorithms.

Halo schema:

for each run, 1 row per halo per snapshot

snapnum haloid numpart [ particle ids ] (property A) (property B) …
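Since halo membership links back to the particle tables, halo properties can be derived entirely in SQL. A minimal sketch, assuming the no-array particle table and that a halo's particle-ID array has already been expanded into a temporary table #members(id) (a hypothetical name, mirroring the SqlArray decomposition shown in the sample queries):

```sql
-- Illustrative only: mean particle position of one halo at snapshot 63.
select avg(p.x) as cx, avg(p.y) as cy, avg(p.z) as cz
from snapnoarr p
inner join #members m on p.id = m.id
where p.snapnum = 63
```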

Spatial Indexing

The particle tables will be indexed according to snapshot number and Peano-Hilbert key, which describes the x-, y-, and z-coordinates of particles according to a space-filling curve. This will enable very fast spatial searches as well as light cones for mock galaxy catalogs generated on the fly.

Peano-Hilbert curves of different resolution
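One way this indexing scheme could be expressed, assuming the no-array table from the sample queries (the index name is illustrative):

```sql
-- Cluster the particle data on (snapnum, phkey) so that a spatial query,
-- once translated into PH-key ranges, becomes a set of contiguous range scans.
create clustered index ix_snap_phkey
on snapnoarr (snapnum, phkey);
```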

SqlArrays

Many tables will exploit SqlArray functions developed by László Dobos et al. for Microsoft SQL Server 2008. Some features of the SqlArrays include:

Flexible manipulation of arrays of the major data types (incl. complex)

Optimized storage for very small (point data) and very big (data grid) arrays

Data stored as binary with a short header

Array size up to 2 GB

Common math libraries (LAPACK, FFTW) are wrapped and callable from inside SQL

Sample Queries

-- Select particles with velocity > 1500 km/s:
select id, sqrt(vx*vx+vy*vy+vz*vz) as v
from snapnoarr
where sqrt(vx*vx+vy*vy+vz*vz) > 1500. and snapnum = 11
-- 1.2 million rows, 2 minutes

-- Select particles within a specified sphere:
select id from snapnoarr
where (x-500.)*(x-500.)+(y-500.)*(y-500.)+(z-500.)*(z-500.) <= 100.
and snapnum = 63
-- 1.5 minutes

-- As above, but accounting for PBCs and using spatial indexing:
declare @qshape Shape3D = Shape3D::newInstance('Sphere [@x,@y,@z,@r]');
with fs as (select * from dbo.fSimulationCoverShape(@sim,@qshape,6))
select * from snapnoarr p
inner join fs on p.phkey between fs.keymin and fs.keymax -- search using PH-key
where @qshape.ContainsPoint(x+fs.Shiftx,y+fs.Shifty,z+fs.Shiftz) = 1
and fs.FullOnly = 0 and p.snapnum = 63
-- seconds

-- Calculate the number of halos and halo particles in each snapshot:
select snapnum, count(*) as nhalo, sum(numpart) as npart
from foftable
group by snapnum order by snapnum
-- compare to reading 32*64 data files

-- Get initial positions of particles in a particular halo (uses SqlArrays):
with q as (
    select b.v from
        (select partid from foftable where snapnum = 63 and haloid = 320) a
        cross apply BigIntArrayMax.ToTable(a.partid) b -- decompose array of IDs
)
select p.id, p.x, p.y, p.z
from snapnoarr p
inner join q on p.id = q.v
where p.snapnum = 0
-- minutes

Ongoing Development

The database design and simulation runs are an ongoing process. For example, we are (or will be):

developing partitioning schemes that would allow the most common queries to be run on the data in parallel

creating example queries that demonstrate how to use SQL and SqlArrays, as was done for SDSS and Millennium (http://www.mpa-garching.mpg.de/millennium/)

designing an automated bulk-loading process

streamlining the use of SqlArray functions

incorporating on-the-fly data visualization

Finally, we plan to make the database available online.

Contact:

Bridget Falck bfalck@pha.jhu.edu
