DC2 Workshop – June 2005
GLAST

GLAST Collaboration Data Server and Data Catalog
Tony Johnson
DC2 Planning Meeting, June 2005

Contents
• What Exists
  – Ntuple Pruner/Peeler
  – Data Server (for Internal Collaboration Use)
• What is Planned
• What is Wanted?

Data Server Portal
• Web Portal provides access to existing data server functionality
  – Currently:
    • NTuple Pruner (Tom Glanzman)
      – Selection of Data via Cuts on Merit Tuple
        » Works with datasets in pipeline data catalog
      – Download of Data via FTP after submission of batch job
        » Allows access to Root Merit Tuple
    • Event Peeler (coming very soon) (Tom Glanzman)
      – Selection of Data via run/event number (uploaded file)
        » Works with datasets in pipeline data catalog
      – Download of Data via FTP after submission of batch job
        » Access to Root Merit tuple and/or full Root tuple
    • “Data Server” (Jean-Paul LeFevre)
      – Allows rapid selection of events based on
        » Energy, Origin (decl, ra), Time, Gamma Quality
        » Stored in “meta-data” database
      – Additional MeritTuple cuts
      – Supports adding cuts to personal “favorites” list
      – Download of Data via FTP after submission of batch job
        » Currently configured to work with DC1 Root merit tuple only

Screen Shots
http://glast-ground.slac.stanford.edu/DataServer/

Data Server Issues
• Currently trying both Oracle (10g) and MySQL (5) using “spatial” extensions
  – Do not fully support spherical geometry
    • Forced to make rectangular selections rather than circular
    • Problems at poles
  – Performance seems OK – at least for 50 million events
    • Selection performance scales by number of events selected, rather than total events in database
    • Indexes seem very slow to build – many hours to add 1,000,000 events – and this seems to scale by total database size
      – Still under investigation, maybe can be improved by tuning
  – Need to decide very soon how much effort to put into database vs. a custom solution

Data Catalog Plans
• Working on new “Glast Data Catalog”
  – Less tightly coupled to pipeline than current catalog
  – Allows domain specific, user-defined, hierarchical “metadata” to be associated with each dataset, e.g.
    • Simulation physics, test setup parameters
    • Pointers to pipeline task
    • Pointers to e-logbook entries
  – Web interface will allow browsing data hierarchy or searching based on meta-data
  – Implementation based on earlier “Grid” data catalog developed at SLAC.
    • Uses XML for import/export of data
      – (stored in Oracle XML database)

DC2 Data Catalog?

Data Server Plans
• Continue to enhance pruner/peeler
  – Add TCut capability to peeler
  – Add access to other data types (SVAC tuple)
• Data Server
  – Enhance ergonomics of web interface
  – Support search using sky catalog
  – Work on handling larger data volumes
  – Add ability to download events in different formats
    • FITS, run/event #, Different Root tuples
  – Add ability to browse events using event display
  – Use xrootd server to stream data
    • Eliminate waiting for batch job and FTP transfer
  – Experiment with SLAC “Peta-Cache” system
    • Initially use xrootd to serve existing Root tuples
    • Highest performance may require storing tuples in some other format

Data Pump – Streaming data directly to users
[Diagram labels: Data Server; Multiple Threads, each with a Format Converter and a TCut; xrootd; Root Files]

Conclusions
• Initial data server available
  – Would like some people to try it and give feedback
• Lots of work to do
  – Need to set goals/priorities for DC2 work
    • Understand timescales
    • Understand what data volume will be
    • Understand what typical queries will be

Hierarchical Data Catalog
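
The hierarchical, user-defined metadata described under Data Catalog Plans (metadata attached to each dataset, browsing the hierarchy, searching on metadata, XML import/export) can be pictured with a small Python sketch. This is not the Glast Data Catalog implementation: the Node class, the XML element names ("node", "meta"), and the example values (allGamma, task-A, entry-12) are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """A folder or dataset in a hypothetical hierarchical catalog."""
    name: str
    metadata: dict = field(default_factory=dict)  # user-defined key/value pairs
    children: list = field(default_factory=list)  # empty for leaf datasets

    def add(self, child):
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self, prefix=""):
        """Yield (path, node) for this node and everything below it."""
        path = f"{prefix}/{self.name}"
        yield path, self
        for child in self.children:
            yield from child.walk(path)

    def search(self, **criteria):
        """Return paths of nodes whose metadata matches every criterion."""
        return [path for path, node in self.walk()
                if all(node.metadata.get(k) == v for k, v in criteria.items())]

    def to_xml(self):
        """Export the subtree as an XML element (illustrative schema only)."""
        elem = ET.Element("node", name=self.name)
        for key, value in sorted(self.metadata.items()):
            ET.SubElement(elem, "meta", key=key, value=str(value))
        for child in self.children:
            elem.append(child.to_xml())
        return elem


def from_xml(elem):
    """Rebuild a Node tree from the XML produced by Node.to_xml()."""
    meta = {m.get("key"): m.get("value") for m in elem.findall("meta")}
    node = Node(elem.get("name"), meta)
    for child in elem.findall("node"):
        node.add(from_xml(child))
    return node


# Hypothetical example hierarchy: DC2 / simulation / run-0001
catalog = Node("DC2")
simulation = catalog.add(Node("simulation", {"physics": "allGamma"}))
simulation.add(Node("run-0001", {"pipelineTask": "task-A", "elog": "entry-12"}))

# Browse the hierarchy, search on metadata, and round-trip through XML.
paths = [path for path, _ in catalog.walk()]
hits = catalog.search(elog="entry-12")
xml_text = ET.tostring(catalog.to_xml(), encoding="unicode")
restored = from_xml(ET.fromstring(xml_text))
```

Here walk() corresponds to browsing the data hierarchy, search() to a metadata query, and to_xml()/from_xml() to XML export and import. A real catalog would keep this in a database rather than in memory; the sketch only shows how metadata at any level of the tree can be attached and queried.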