SimDB and SimTAP
Dealing with a complex data model
Gerard Lemson, Nara, 2010-12-10
SimDB and SimDAL
Protocols to support
• describing simulations
– Simulation Data Model: Model for N-body 3+1D any simulations
http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/specification/uml/SimDB_DM.png
• publishing simulations
– Simulation Database (SimDB): protocol for accessing a database built according to SimDM.
• finding simulations
– SimDB/TAP
– queryData in SimDAL
– SimTAP
• retrieving simulation data, whole, in parts, manipulated
– SimDAL getData services (not in this talk)
• Btw: “simulation” can be
– simulation run
– simulation result
– simulation data
– post-processing of simulation results
SimDB/REST
• “simple” access to SimDB
• Uses XML representation of model
– XML schema http://code.google.com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/xsd
• Examples http://code.google.com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/examples
– PDR http://code.google.com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/examples/external/PDR
– Gadget2 http://volute.googlecode.com/svn-history/r1382/trunk/projects/theory/snapdm/specification/examples/external/Gadget2/Gadget2.xml
– TODO more (SVO)
• VO-URP
– validator http://www.g-vo.org/SimDB-browser/Validate.do
– upload
– download
http://www.g-vo.org/SimDB-browser
SimDB/TAP
• Model complex
– Too(?) complex for trivial (parameter based) query language
– Need special navigation tools (vo-urp@gavo)
– Need powerful query language
• Implement TAP on database built according to SimDM
• Map UML to RDB model
– TAP_SCHEMA for SimDM (vo-urp@gavo old)
http://code.google.com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/tap
– create table + inserts
– VODataService
• VO-URP SQL query
http://www.g-vo.org/SimDB-browser/Query.do
• Not always easy!
Model complex
• Normalised (see image)
• General → Abstract
– e.g. parameters must be fully defined, no assumptions
• Hard to deal with quantities with a priori unknown units
– ParameterSetting table has value AND unit attributes (Quantity datatype)
Example queries
• Find synthetic spectra of white dwarf stars
• Find cosmological simulations with Ω=0.9, ΩΛ=0.7 and Ωb=0.02
• Find all SPH simulations containing a galaxy cluster with mass around 10^14 Msun
select e.*
from experiment e
, targetObject t
, result r
, product p
where t.label = 'white_dwarf'
and t.containerid = e.id
and r.containerid = e.id
and r.targetId = t.id
and p.containerid = r.id
and p.productType = 'spectrum'
Example queries
• Find synthetic spectra of white dwarf stars
• Find (cosmological) simulations with Ω=0.9, ΩΛ=0.7 and Ωb=0.02
• Find all SPH simulations containing a galaxy cluster with mass around 10^14 Msun
select e.*
from Experiment e
, InputParameter ip1
, ParameterSetting ps1
, InputParameter ip2
, ParameterSetting ps2
, InputParameter ip3
, ParameterSetting ps3
where ps1.containerId = e.id
and ps1.parameterId = ip1.id
and ip1.label = 'omega_lambda'
and ps1.numericalValue_value = 0.7
and ps2.containerId = e.id
and ps2.parameterId = ip2.id
and ip2.label = 'omega_baryon'
and ps2.numericalValue_value = 0.02
and ps3.containerId = e.id
and ps3.parameterId = ip3.id
and ip3.label = 'omega'
and ps3.numericalValue_value = 0.9
Example queries
• Find synthetic spectra of white dwarf stars
• Find (cosmological) simulations with Ω=0.9, ΩΛ=0.7 and Ωb=0.02
• Find all SPH simulations containing a galaxy cluster with mass around 10^14 Msun
select e.*
from Experiment e
, ExperimentRepresentationObject ero
, RepresentationObjectType rot
, TargetObject t
, Property p
, StatisticalSummary s
where ero.containerId = e.id
and ero.typeId = rot.id
and rot.label = 'sph.particle'
and t.containerId = e.id
and t.label = 'galaxy.cluster'
and p.containerId = t.id
and p.label = 'mass'
and s.propertyId = p.id
and s.statistic = 'value'
and s.numericalValue_value = 1e14
and s.numericalValue_unit = 'M_sun'
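The query above matches a mass of exactly 1e14 in the unit 'M_sun', while the request is for masses "around" that value and units are not known a priori. The following is a minimal sketch of one possible way to handle both, not something SimDM prescribes. It assumes a hypothetical UnitConversion helper table, toy rows, and Python's sqlite3 module standing in for a real TAP service; only the column names numericalValue_value and numericalValue_unit come from the slides.

```python
import sqlite3

M_SUN_IN_KG = 1.989e30  # approximate solar mass in kg

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Toy stand-in for the SimDM table; only the two Quantity columns are modelled.
CREATE TABLE StatisticalSummary (
    id INTEGER PRIMARY KEY,
    numericalValue_value REAL,
    numericalValue_unit TEXT
);
-- Hypothetical helper table (not part of SimDM): factor converting a unit to kg.
CREATE TABLE UnitConversion (unit TEXT PRIMARY KEY, to_kg REAL);
""")
conn.executemany(
    "INSERT INTO UnitConversion VALUES (?, ?)",
    [("M_sun", M_SUN_IN_KG), ("kg", 1.0)],
)
# Toy data: the same cluster mass in two units, plus a much lighter object.
conn.executemany(
    "INSERT INTO StatisticalSummary VALUES (?, ?, ?)",
    [(1, 1e14, "M_sun"), (2, 1.989e44, "kg"), (3, 5e12, "M_sun")],
)


def masses_around(connection, target_msun, tolerance=0.2):
    """Return ids whose mass lies within +/- tolerance of target_msun."""
    lo = target_msun * (1 - tolerance) * M_SUN_IN_KG
    hi = target_msun * (1 + tolerance) * M_SUN_IN_KG
    rows = connection.execute(
        """
        SELECT s.id
        FROM StatisticalSummary s
        JOIN UnitConversion u ON u.unit = s.numericalValue_unit
        WHERE s.numericalValue_value * u.to_kg BETWEEN ? AND ?
        ORDER BY s.id
        """,
        (lo, hi),
    )
    return [row[0] for row in rows]


matching_ids = masses_around(conn, 1e14)
print(matching_ids)
```

Converting to a common unit inside the query costs one more join, which adds to the join count this talk is concerned with.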
An example from Paris.
Find typical values of mass, x, y, z properties in a given simulation result
SELECT r.id as id
, r.publisherdid as publisherdid
, s0.numericValue_value as mass
, s1.numericValue_value as x
, s2.numericValue_value as y
, s3.numericValue_value as z
FROM result r
, product o
, statisticalsummary s0
, property p0
, statisticalsummary s1
, property p1
, statisticalsummary s2
, property p2
, statisticalsummary s3
, property p3
WHERE r.containerid = 6
AND o.containerid = r.id
and s0.containerid = o.id
and s1.containerid = o.id
and s2.containerid = o.id
and s3.containerid = o.id
and p0.publisherdid = 'mass'
and s0.propertyid = p0.id
and s0.statistic = 'nominal'
and p1.publisherdid = 'x'
and s1.propertyid = p1.id
and s1.statistic = 'nominal'
and p2.publisherdid = 'y'
and s2.propertyid = p2.id
and s2.statistic = 'nominal'
and p3.publisherdid = 'z'
and s3.propertyid = p3.id
and s3.statistic = 'nominal'
SELECT r.id as id
, r.publisherdid
, max(case when p.publisherdid = 'mass' and s.statistic = 'nominal' then s.numericValue_value else null end) as mass
, max(case when p.publisherdid = 'x' and s.statistic = 'nominal' then s.numericValue_value else null end) as x
, max(case when p.publisherdid = 'y' and s.statistic = 'nominal' then s.numericValue_value else null end) as y
, max(case when p.publisherdid = 'z' and s.statistic = 'nominal' then s.numericValue_value else null end) as z
FROM result r
, product o
, statisticalsummary s
, property p
WHERE r.containerid = 6
AND o.containerid = r.id
and s.containerid = o.id
and p.id = s.propertyid
group by r.id, r.publisherdid, o.id
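To see what the max(case when ...) form returns, here is a minimal sketch that runs it on toy data with Python's sqlite3 module. The table layouts are reduced to the columns the query touches, and all row values (ids, 'toy-result-1', the numbers) are invented for illustration; a real SimDB/TAP service would have the full SimDM tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced, hypothetical versions of the SimDM tables used by the query.
CREATE TABLE result (id INTEGER, containerid INTEGER, publisherdid TEXT);
CREATE TABLE product (id INTEGER, containerid INTEGER);
CREATE TABLE property (id INTEGER, publisherdid TEXT);
CREATE TABLE statisticalsummary (
    containerid INTEGER, propertyid INTEGER,
    statistic TEXT, numericValue_value REAL
);
-- Toy rows: one result in experiment 6 with one product.
INSERT INTO result VALUES (1, 6, 'toy-result-1');
INSERT INTO product VALUES (10, 1);
INSERT INTO property VALUES (100, 'mass'), (101, 'x'), (102, 'y'), (103, 'z');
INSERT INTO statisticalsummary VALUES
    (10, 100, 'nominal', 2.5),
    (10, 100, 'max',     9.9),   -- not 'nominal', so the pivot ignores it
    (10, 101, 'nominal', 0.1),
    (10, 102, 'nominal', 0.2),
    (10, 103, 'nominal', 0.3);
""")

PIVOT_QUERY = """
SELECT r.id AS id
,      r.publisherdid
,      max(case when p.publisherdid = 'mass' and s.statistic = 'nominal'
                then s.numericValue_value else null end) AS mass
,      max(case when p.publisherdid = 'x' and s.statistic = 'nominal'
                then s.numericValue_value else null end) AS x
,      max(case when p.publisherdid = 'y' and s.statistic = 'nominal'
                then s.numericValue_value else null end) AS y
,      max(case when p.publisherdid = 'z' and s.statistic = 'nominal'
                then s.numericValue_value else null end) AS z
FROM result r, product o, statisticalsummary s, property p
WHERE r.containerid = 6
AND   o.containerid = r.id
AND   s.containerid = o.id
AND   p.id = s.propertyid
GROUP BY r.id, r.publisherdid, o.id
"""

rows = conn.execute(PIVOT_QUERY).fetchall()
print(rows)
```

Each product yields one row, with one column per property. The aggregate collapses the many statisticalsummary rows, so four tables suffice where the self-join form needs ten.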
Conclusions
• Some queries can be phrased nicely
• Others can be written in standard SQL, but the level of normalisation and abstraction means MANY joins are required
• Can we simplify this a bit?
ParameterSetting
containerId | value | unit | parameterId
...         | ...   | ...  | ...
123         | 0.02  |      | 456
123         | 0.7   |      | 457
123         | 0.9   |      | 458
345         | 0.04  |      | 456
345         | 0.7   |      | 457
345         | 1     |      | 458
...         | ...   | ...  | ...
+
InputParameter
id  | name    | label        | datatype | description
456 | omega_b | omega.baryon | real     | ...
457 | omega_l | omega.lambda | real     | ...
458 | omega   | omega        | real     | ...
... | ...     | ...          | ...      | ...
simtap.Experiment
id  | omega_b | omega_l | omega
123 | 0.02    | 0.7     | 0.9
345 | 0.04    | 0.7     | 1
...
SimTAP
• When Protocol is fixed, tap schema can be simplified
– parameters → columns in simtap.Experiment table
– property characterisation → columns in product specific characterisation table(s)
– ...
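A minimal sketch of the "parameters → columns" step, using the example rows from the ParameterSetting and InputParameter tables and Python's sqlite3 module. This is an illustration under assumptions, not the SimTAP implementation: the view is named simtap_Experiment because SQLite has no simtap schema, and the column types are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE InputParameter (
    id INTEGER PRIMARY KEY, name TEXT, label TEXT, datatype TEXT
);
CREATE TABLE ParameterSetting (
    containerId INTEGER, value REAL, unit TEXT, parameterId INTEGER
);
-- Example rows taken from the slide; unit is empty (dimensionless) there.
INSERT INTO InputParameter VALUES
    (456, 'omega_b', 'omega.baryon', 'real'),
    (457, 'omega_l', 'omega.lambda', 'real'),
    (458, 'omega',   'omega',        'real');
INSERT INTO ParameterSetting VALUES
    (123, 0.02, NULL, 456), (123, 0.7, NULL, 457), (123, 0.9, NULL, 458),
    (345, 0.04, NULL, 456), (345, 0.7, NULL, 457), (345, 1.0, NULL, 458);

-- Pivot: one row per experiment, one column per input parameter.
CREATE VIEW simtap_Experiment AS
SELECT ps.containerId AS id,
       MAX(CASE WHEN ip.name = 'omega_b' THEN ps.value END) AS omega_b,
       MAX(CASE WHEN ip.name = 'omega_l' THEN ps.value END) AS omega_l,
       MAX(CASE WHEN ip.name = 'omega'   THEN ps.value END) AS omega
FROM ParameterSetting ps
JOIN InputParameter ip ON ip.id = ps.parameterId
GROUP BY ps.containerId;
""")

all_rows = conn.execute(
    "SELECT * FROM simtap_Experiment ORDER BY id"
).fetchall()

# The three-parameter search now needs no joins at all.
matches = conn.execute(
    "SELECT id FROM simtap_Experiment "
    "WHERE omega_l = 0.7 AND omega_b = 0.02 AND omega = 0.9"
).fetchall()
print(all_rows, matches)
```

A view keeps the normalised tables as the single source of truth; a service could equally materialise the pivoted table if query speed matters more.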
Instead of this
select e.*
from Experiment e
, InputParameter ip1
, ParameterSetting ps1
, InputParameter ip2
, ParameterSetting ps2
, InputParameter ip3
, ParameterSetting ps3
where ps1.containerId = e.id
and ps1.parameterId = ip1.id
and ip1.label = 'omega_lambda'
and ps1.numericalValue_value = 0.7
and ps2.containerId = e.id
and ps2.parameterId = ip2.id
and ip2.label = 'omega_baryon'
and ps2.numericalValue_value = 0.02
and ps3.containerId = e.id
and ps3.parameterId = ip3.id
and ip3.label = 'omega'
and ps3.numericalValue_value = 0.9
this
select e.*
from simtap.Experiment e
where omegaLambda = 0.7
and omegaBaryon = 0.02
and omega = 0.9
Table definitions can be derived
• From a Protocol definition
– input parameters
– for each Representation object type
• a table with statistical summaries of properties
– target object type
• à la SimDM (units in ADQL required)
• pivoted per project?
– input data sets (urls)
• Pivoting queries can be generated
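As a rough sketch of how such a pivoting query could be generated from the input parameters of a Protocol: the function below takes a list of parameter names and emits the SQL text. The function name pivot_query and the simplified column names (ps.value, ip.name) are assumptions made for this illustration; they follow the example tables in this talk, not a published SimTAP specification.

```python
def pivot_query(parameter_names):
    """Build a pivoting SELECT with one output column per parameter name.

    Names are interpolated into the SQL text, so only plain identifiers
    are accepted; anything else raises ValueError.
    """
    for name in parameter_names:
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")
    columns = ",\n".join(
        f"       MAX(CASE WHEN ip.name = '{name}' THEN ps.value END) AS {name}"
        for name in parameter_names
    )
    return (
        "SELECT ps.containerId AS id,\n"
        f"{columns}\n"
        "FROM ParameterSetting ps\n"
        "JOIN InputParameter ip ON ip.id = ps.parameterId\n"
        "GROUP BY ps.containerId"
    )


# Parameter names as they appear in the InputParameter example table.
sql = pivot_query(["omega_b", "omega_l", "omega"])
print(sql)
```

The same approach would apply per Representation object type, generating one column per property statistic instead of one per parameter.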
Proposal
• SimDAL services MAY include a SimTAP service
• 1 SimTAP schema per Protocol
• Each such schema contains
– 1 Experiment table with columns for parameters
– >=1 Product tables with characterisation of properties
– Possibly other tables from SimDB/TAP