SimDB and SimTAP
Dealing with a complex data model
Gerard Lemson, Nara, 2010-12-10

SimDB and SimDAL
Protocols to support
• describing simulations
  – Simulation Data Model (SimDM): model for any simulations (no longer only N-body, 3+1D)
    http://volute.googlecode.com/svn/trunk/projects/theory/snapdm/specification/uml/SimDB_DM.png
• publishing simulations
  – Simulation Database (SimDB): protocol for accessing a database built according to SimDM
• finding simulations
  – SimDB/TAP
  – queryData in SimDAL
  – SimTAP
• retrieving simulation data, whole, in parts, manipulated
  – SimDAL getData services (not in this talk)
• Btw: “simulation” can be
  – simulation run
  – simulation result
  – simulation data
  – post-processing of simulation results

SimDB/REST
• “simple” access to SimDB
• Uses XML representation of model
  – XML schema
    http://code.google.com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/xsd
• Examples
  http://code.google.com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/examples
  – PDR
    http://code.google.com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/examples/external/PDR
  – Gadget2
    http://volute.googlecode.com/svn-history/r1382/trunk/projects/theory/snapdm/specification/examples/external/Gadget2/Gadget2.xml
  – TODO more (SVO)
• VO-URP
  – validator http://www.g-vo.org/SimDB-browser/Validate.do
  – upload, download http://www.g-vo.org/SimDB-browser

SimDB/TAP
• Model complex
  – Too(?) complex for trivial (parameter based) query language
  – Need special navigation tools (vo-urp@gavo)
  – Need powerful query language
• Implement TAP on database built according to SimDM
• Map UML to RDB model
  – TAP_SCHEMA for SimDM (vo-urp@gavo old)
    http://code.google.com/p/volute/source/browse/#svn/trunk/projects/theory/snapdm/specification/tap
  – create table + inserts
  – VODataService
• VO-URP SQL query http://www.g-vo.org/SimDB-browser/Query.do
• Not always easy!
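
A minimal sketch of why querying the normalised mapping is not always easy. It assumes a heavily reduced, hypothetical relational mapping of three SimDM classes, loaded into an in-memory SQLite database rather than a real TAP service. Table and column names follow the queries used in this talk; the helper find_experiments and the sample rows are illustrative only. The point is that every parameter constraint adds two more tables to the join.

```python
import sqlite3

# Hypothetical, reduced mapping of three SimDM classes (not the real TAP_SCHEMA).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Experiment (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE InputParameter (id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE ParameterSetting (
    containerId INTEGER REFERENCES Experiment(id),
    parameterId INTEGER REFERENCES InputParameter(id),
    numericalValue_value REAL,
    numericalValue_unit TEXT
);
INSERT INTO Experiment VALUES (123, 'run A'), (345, 'run B');
INSERT INTO InputParameter VALUES (456, 'omega_baryon'), (457, 'omega_lambda');
INSERT INTO ParameterSetting VALUES
    (123, 456, 0.02, NULL), (123, 457, 0.7, NULL),
    (345, 456, 0.04, NULL), (345, 457, 0.7, NULL);
""")


def find_experiments(conn, constraints):
    """Return ids of experiments matching every {label: value} constraint.

    Each constraint joins in one ParameterSetting row plus the
    InputParameter row that names it.
    """
    if not constraints:
        raise ValueError("at least one constraint is required")
    tables, where, args = ["Experiment e"], [], []
    for i, (label, value) in enumerate(constraints.items()):
        tables += [f"InputParameter ip{i}", f"ParameterSetting ps{i}"]
        where += [
            f"ps{i}.containerId = e.id",
            f"ps{i}.parameterId = ip{i}.id",
            f"ip{i}.label = ?",
            f"ps{i}.numericalValue_value = ?",
        ]
        args += [label, value]
    sql = (
        f"SELECT e.id FROM {', '.join(tables)} "
        f"WHERE {' AND '.join(where)} ORDER BY e.id"
    )
    return [row[0] for row in conn.execute(sql, args)]


matches = find_experiments(conn, {"omega_baryon": 0.02, "omega_lambda": 0.7})
print(matches)
```

With two constraints the generated statement already joins five tables; a third parameter would make it seven.
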
Model complex
• Normalised (see image)
• General, abstract
  – e.g. parameters must be fully defined, no assumptions
• Hard to deal with quantities with a priori unknown units
  – ParameterSetting table has value AND unit attributes (Quantity datatype)

Example queries

• Find synthetic spectra of white dwarf stars

    select e.*
      from experiment e
         , targetObject t
         , result r
         , product p
     where t.label = 'white_dwarf'
       and t.containerid = e.id
       and r.containerid = e.id
       and r.targetId = t.id
       and p.containerid = r.id
       and p.productType = 'spectrum'

• Find (cosmological) simulations with Ω=0.9, ΩΛ=0.7 and Ωb=0.02

    select e.*
      from Experiment e
         , InputParameter ip1
         , ParameterSetting ps1
         , InputParameter ip2
         , ParameterSetting ps2
         , InputParameter ip3
         , ParameterSetting ps3
     where ps1.containerId = e.id
       and ps1.parameterId = ip1.id
       and ip1.label = 'omega_lambda'
       and ps1.numericalValue_value = 0.7
       and ps2.containerId = e.id
       and ps2.parameterId = ip2.id
       and ip2.label = 'omega_baryon'
       and ps2.numericalValue_value = 0.02
       and ps3.containerId = e.id
       and ps3.parameterId = ip3.id
       and ip3.label = 'omega'
       and ps3.numericalValue_value = 0.9

• Find all SPH simulations containing a galaxy cluster with mass around 10^14 Msun

    select e.*
      from Experiment e
         , ExperimentRepresentationObject ero
         , RepresentationObjectType rot
         , TargetObject tgt
         , Property p
         , StatisticalSummary s
     where ero.containerId = e.id
       and ero.typeId = rot.id
       and rot.label = 'sph.particle'
       and tgt.containerId = e.id
       and tgt.label = 'galaxy.cluster'
       and p.containerId = tgt.id
       and p.label = 'mass'
       and s.propertyId = p.id
       and s.statistic = 'value'
       and s.numericalValue_value = 1e14
       and s.numericalValue_unit = 'M_sun'

An example from Paris
Find typical values of the mass, x, y, z properties in a given simulation result.

    SELECT r.id as id
         , r.publisherdid as publisherdid
         , s0.numericValue_value as mass
         , s1.numericValue_value as x
         , s2.numericValue_value as y
         , s3.numericValue_value as z
      FROM result r
         , product o
         , statisticalsummary s0
         , property p0
         , statisticalsummary s1
         , property p1
         , statisticalsummary s2
         , property p2
         , statisticalsummary s3
         , property p3
     WHERE r.containerid = 6
       AND o.containerid = r.id
       and s0.containerid = o.id
       and s1.containerid = o.id
       and s2.containerid = o.id
       and s3.containerid = o.id
       and p0.publisherdid = 'mass'
       and s0.propertyid = p0.id
       and s0.statistic = 'nominal'
       and p1.publisherdid = 'x'
       and s1.propertyid = p1.id
       and s1.statistic = 'nominal'
       and p2.publisherdid = 'y'
       and s2.propertyid = p2.id
       and s2.statistic = 'nominal'
       and p3.publisherdid = 'z'
       and s3.propertyid = p3.id
       and s3.statistic = 'nominal'

The same result with one statisticalsummary/property pair, pivoted with max(case ...):

    SELECT r.id as id
         , r.publisherdid
         , max(case when p.publisherdid = 'mass' and s.statistic = 'nominal'
                    then s.numericValue_value else null end) as mass
         , max(case when p.publisherdid = 'x' and s.statistic = 'nominal'
                    then s.numericValue_value else null end) as x
         , max(case when p.publisherdid = 'y' and s.statistic = 'nominal'
                    then s.numericValue_value else null end) as y
         , max(case when p.publisherdid = 'z' and s.statistic = 'nominal'
                    then s.numericValue_value else null end) as z
      FROM result r
         , product o
         , statisticalsummary s
         , property p
     WHERE r.containerid = 6
       AND o.containerid = r.id
       and s.containerid = o.id
       and p.id = s.propertyid
     group by r.id, r.publisherdid, o.id

Conclusions
• Some queries can be phrased nicely
• Others can be written in standard SQL, but the level of normalisation and abstraction means MANY joins are required
• Can we simplify this a bit?

ParameterSetting
    containerId  value  unit  parameterId
    ...          ...    ...   ...
    123          0.02         456
    123          0.7          457
    123          0.9          458
    345          0.04         456
    345          0.7          457
    345          1            458
    ...          ...    ...   ...
(the unit column is empty in these rows: the Ω parameters are dimensionless)
+ InputParameter
    id   name     label         datatype  description
    456  omega_b  omega.baryon  real      ...
    457  omega_l  omega.lambda  real      ...
    458  omega    omega         real      ...
    ...  ...      ...           ...       ...

= simtap.Experiment
    id   omega_b  omega_l  omega
    123  0.02     0.7      0.9
    345  0.04     0.7      1
    ...

SimTAP
• When the Protocol is fixed, the TAP schema can be simplified
  – parameters become columns in the simtap.Experiment table
  – property characterisations become columns in product-specific characterisation table(s)
  – ...

Instead of this

    select e.*
      from Experiment e
         , InputParameter ip1
         , ParameterSetting ps1
         , InputParameter ip2
         , ParameterSetting ps2
         , InputParameter ip3
         , ParameterSetting ps3
     where ps1.containerId = e.id
       and ps1.parameterId = ip1.id
       and ip1.label = 'omega_lambda'
       and ps1.numericalValue_value = 0.7
       and ps2.containerId = e.id
       and ps2.parameterId = ip2.id
       and ip2.label = 'omega_baryon'
       and ps2.numericalValue_value = 0.02
       and ps3.containerId = e.id
       and ps3.parameterId = ip3.id
       and ip3.label = 'omega'
       and ps3.numericalValue_value = 0.9

this

    select e.*
      from simtap.Experiment e
     where omegaLambda = 0.7
       and omegaBaryon = 0.02
       and omega = 0.9

Table definitions can be derived
• From a Protocol definition
  – input parameters
  – for each Representation object type
    • a table with statistical summaries of properties
  – target object type
    • ala SimDM (units in ADQL required)
    • pivoted per project?
  – input data sets (urls)
• Pivoting queries can be generated

Proposal
• SimDAL services MAY include a SimTAP service
• 1 SimTAP schema per Protocol
• Each such schema contains
  – 1 Experiment table with columns for parameters
  – >=1 Product tables with characterisation of properties
  – Possibly other tables from SimDB/TAP
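
The remark that pivoting queries can be generated can be sketched as follows. This is an illustration under stated assumptions, not the SimTAP implementation: it uses an in-memory SQLite database, the sample rows from the ParameterSetting and InputParameter tables shown earlier, a hypothetical helper pivot_sql, and a view named simtap_Experiment (SQLite has no simtap schema unless a database is attached under that name). The helper reads the parameter list and emits one max(case ...) column per parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE InputParameter (id INTEGER PRIMARY KEY, name TEXT, label TEXT);
CREATE TABLE ParameterSetting (
    containerId INTEGER, value REAL, unit TEXT, parameterId INTEGER
);
INSERT INTO InputParameter VALUES
    (456, 'omega_b', 'omega.baryon'),
    (457, 'omega_l', 'omega.lambda'),
    (458, 'omega',   'omega');
INSERT INTO ParameterSetting VALUES
    (123, 0.02, NULL, 456), (123, 0.7, NULL, 457), (123, 0.9, NULL, 458),
    (345, 0.04, NULL, 456), (345, 0.7, NULL, 457), (345, 1.0, NULL, 458);
""")


def pivot_sql(conn, view_name):
    """Build a CREATE VIEW statement with one column per input parameter."""
    columns = []
    query = "SELECT id, name FROM InputParameter ORDER BY id"
    for param_id, name in conn.execute(query):
        # Parameter names become column names, so accept plain identifiers only.
        if not name.isidentifier():
            raise ValueError(f"cannot use {name!r} as a column name")
        columns.append(
            f"MAX(CASE WHEN parameterId = {int(param_id)} THEN value END) AS {name}"
        )
    return (
        f"CREATE VIEW {view_name} AS "
        f"SELECT containerId AS id, {', '.join(columns)} "
        "FROM ParameterSetting GROUP BY containerId"
    )


conn.execute(pivot_sql(conn, "simtap_Experiment"))

# The three-parameter search now needs no joins at all.
rows = conn.execute(
    "SELECT id FROM simtap_Experiment "
    "WHERE omega_l = 0.7 AND omega_b = 0.02 AND omega = 0.9"
).fetchall()
print(rows)
```

A view keeps the pivoted table in step with the normalised data; a service could equally materialise it as a real table when the Protocol is fixed. Units are dropped in this sketch, which is only safe when every setting of a parameter uses the same unit.
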