Glast Collaboration Data Server and Data Catalog Tony Johnson DC2 Planning Meeting

DC2 Workshop – June 2005
GLAST
Contents
• What Exists
– Ntuple Pruner/Peeler
– Data Server (for Internal Collaboration Use)
• What is Planned
• What is Wanted?
Data Server Portal
• Web portal provides access to existing data server functionality
– Currently:
• NTuple Pruner (Tom Glanzman)
– Selection of Data via Cuts on Merit Tuple
» Works with datasets in pipeline data catalog
– Download of Data via FTP after submission of batch job
» Allows access to Root Merit Tuple
• Event Peeler (coming very soon) (Tom Glanzman)
– Selection of Data via run/event number (uploaded file)
» Works with datasets in pipeline data catalog
– Download of Data via FTP after submission of batch job
» Access to Root Merit tuple and/or full Root tuple
• “Data Server” (Jean-Paul LeFevre)
– Allows rapid selection of events based on
» Energy, Origin (decl, ra), Time, Gamma Quality
» Stored in “meta-data” database
– Additional MeritTuple cuts
– Supports adding cuts to personal “favorites” list
– Download of Data via FTP after submission of batch job
» Currently configured to work with DC1 Root merit tuple only
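
The rapid selection described above can be pictured as a query against a small per-event meta-data table. The following is only a minimal sketch, assuming an in-memory SQLite table; the table name, column names, units, and sample rows are illustrative assumptions and not the actual Data Server schema.

```python
import sqlite3

# Hypothetical meta-data schema: one row per event, holding only the
# quantities used for fast selection (all names and units are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE event_meta (
           run INTEGER,
           event INTEGER,
           energy REAL,      -- assumed MeV
           ra REAL,          -- assumed degrees
           decl REAL,        -- assumed degrees
           evt_time REAL,    -- assumed seconds
           quality INTEGER   -- assumed gamma-quality flag
       )"""
)
conn.execute("CREATE INDEX idx_energy ON event_meta (energy)")

# Made-up sample rows, for illustration only.
rows = [
    (1, 10, 150.0, 83.6, 22.0, 100.0, 2),
    (1, 11, 5000.0, 83.9, 21.5, 101.5, 3),
    (2, 7, 900.0, 200.0, -45.0, 250.0, 1),
]
conn.executemany("INSERT INTO event_meta VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def select_events(conn, e_min, ra_range, decl_range, min_quality):
    """Return (run, event) pairs passing a rectangular sky cut plus energy and quality cuts."""
    query = """SELECT run, event FROM event_meta
               WHERE energy >= ?
                 AND ra BETWEEN ? AND ?
                 AND decl BETWEEN ? AND ?
                 AND quality >= ?
               ORDER BY run, event"""
    params = (e_min, *ra_range, *decl_range, min_quality)
    return conn.execute(query, params).fetchall()


selected = select_events(conn, 100.0, (80.0, 90.0), (20.0, 25.0), 2)
```

The resulting list of run/event numbers is the kind of input the Event Peeler accepts, so a selection like this could in principle feed a later download step.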
Screen Shots
http://glast-ground.slac.stanford.edu/DataServer/
Data Server Issues
• Currently trying both Oracle (10g) and MySQL (5) using “spatial” extensions
– Do not fully support spherical geometry
• Forced to make rectangular selections rather than circular
• Problems at poles
– Performance seems OK – at least for 50 million events
• Selection performance scales by number of events selected, rather than total events in database
• Indexes seem very slow to build
– many hours to add 1,000,000 events – and this seems to scale by total database size
– Still under investigation, maybe can be improved by tuning
– Need to decide very soon how much effort to put into database vs. a custom solution
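
One common workaround for the rectangular-selection limitation is to let the database do a rectangular pre-selection and then apply an exact circular (cone) cut in application code. The following is a sketch of that idea only, not the Data Server's implementation; the function names are hypothetical and all angles are assumed to be in degrees.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle angle in degrees between two sky positions (haversine form)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def bounding_box(ra0, dec0, radius):
    """Rectangular (ra, dec) box that fully contains the cone.

    Returns (ra_min, ra_max, dec_min, dec_max). If the cone reaches a pole,
    the box spans all right ascensions. ra_min > ra_max means the box wraps
    through ra = 0.
    """
    dec_min = max(-90.0, dec0 - radius)
    dec_max = min(90.0, dec0 + radius)
    if dec_min <= -90.0 or dec_max >= 90.0:
        return 0.0, 360.0, dec_min, dec_max
    # Largest ra offset reached by a small circle of this radius at dec0.
    half_width = math.degrees(
        math.asin(math.sin(math.radians(radius)) / math.cos(math.radians(dec0)))
    )
    return (ra0 - half_width) % 360.0, (ra0 + half_width) % 360.0, dec_min, dec_max


def in_box(ra, dec, box):
    """Rectangular test of the kind a database index could evaluate."""
    ra_min, ra_max, dec_min, dec_max = box
    if not (dec_min <= dec <= dec_max):
        return False
    if ra_min <= ra_max:
        return ra_min <= ra <= ra_max
    return ra >= ra_min or ra <= ra_max  # box wraps through ra = 0


def cone_select(events, ra0, dec0, radius):
    """Rectangular pre-selection followed by the exact circular cut."""
    box = bounding_box(ra0, dec0, radius)
    return [(ra, dec) for ra, dec in events
            if in_box(ra, dec, box)
            and angular_separation_deg(ra, dec, ra0, dec0) <= radius]


# Made-up positions near the north pole, for illustration only.
events = [(10.0, 89.0), (190.0, 89.5), (10.0, 80.0)]
near_pole = cone_select(events, 10.0, 89.0, 2.0)
```

In this example the position at ra = 190 lies on the far side of the pole yet only 1.5 degrees from the cone centre, so a naive ra ± radius rectangle would miss it. This is the pole problem noted above.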
Data Catalog Plans
• Working on new “Glast Data Catalog”
– Less tightly coupled to pipeline than current catalog
– Allows domain-specific, user-defined, hierarchical “metadata” to be associated with each dataset, e.g.
• Simulation physics, test setup parameters
• Pointers to pipeline task
• Pointers to e-logbook entries
– Web interface will allow browsing data hierarchy or searching based on meta-data
– Implementation based on earlier “Grid” data catalog developed at SLAC.
• Uses XML for import/export of data
– (stored in Oracle XML database)
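
To make the idea of hierarchical, user-defined meta-data concrete, the sketch below models a catalog tree in memory, searches it by meta-data value, and exports it as XML. It is an illustration under stated assumptions: the `Node` class, the XML element names, and every dataset name and meta-data value are hypothetical and do not reflect the real catalog schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """A folder or dataset in a hypothetical catalog hierarchy."""
    name: str
    metadata: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child


def search(node, key, value, path=""):
    """Yield the paths of all nodes whose meta-data has key == value."""
    here = f"{path}/{node.name}"
    if node.metadata.get(key) == value:
        yield here
    for child in node.children:
        yield from search(child, key, value, here)


def to_xml(node):
    """Convert a node and its subtree to an XML element (element names are assumptions)."""
    elem = ET.Element("node", name=node.name)
    for key, value in node.metadata.items():
        ET.SubElement(elem, "meta", name=key).text = str(value)
    for child in node.children:
        elem.append(to_xml(child))
    return elem


# Made-up hierarchy and meta-data values, for illustration only.
root = Node("DC2")
sim = root.add(Node("Simulation", {"physics": "allGamma"}))
sim.add(Node("run0001", {"physics": "allGamma", "pipelineTask": "task-A"}))
sim.add(Node("run0002", {"physics": "background", "elog": "entry-42"}))

matches = list(search(root, "physics", "allGamma"))
xml_text = ET.tostring(to_xml(root), encoding="unicode")
```

Keeping meta-data as free-form key/value pairs on each node is what lets users attach their own fields, such as pointers to a pipeline task or an e-logbook entry, without a schema change.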
DC2 Data Catalog?
Data Server Plans
• Continue to enhance pruner/peeler
– Add TCut capability to peeler
– Add access to other data types (SVAC tuple)
• Data Server
– Enhance ergonomics of web interface
– Support search using sky catalog
– Work on handling larger data volumes
– Add ability to download events in different formats
• FITS, run/event #, Different Root tuples
– Add ability to browse events using event display
– Use xrootd server to stream data
• Eliminate waiting for batch job and FTP transfer
– Experiment with SLAC “Peta-Cache” system
• Initially use xrootd to serve existing Root tuples
• Highest performance may require storing tuples in some other format
Data Pump – Streaming data directly to users
[Diagram. Labels: Data Server with Multiple Threads, each containing a TCut and a Format Converter; xrootd; Root Files]
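
The data pump idea, several threads each applying a cut and a format conversion before streaming results to the user, can be sketched as follows. This is a toy illustration only: plain Python predicates stand in for TCut expressions, an in-memory list stands in for Root files read via xrootd, and the converter names and sample events are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for events read from Root files (made-up values).
EVENTS = [
    {"run": 1, "event": 1, "energy": 50.0},
    {"run": 1, "event": 2, "energy": 500.0},
    {"run": 2, "event": 1, "energy": 5000.0},
]


def to_run_event(ev):
    """Format converter: run/event number only."""
    return f"{ev['run']} {ev['event']}"


def to_csv(ev):
    """Format converter: comma-separated values."""
    return f"{ev['run']},{ev['event']},{ev['energy']}"


CONVERTERS = {"run_event": to_run_event, "csv": to_csv}


def serve_request(cut, fmt):
    """Apply the cut, then convert each passing event, yielding results lazily."""
    convert = CONVERTERS[fmt]
    for ev in EVENTS:
        if cut(ev):
            yield convert(ev)


def handle(request):
    """Run one user request to completion inside a worker thread."""
    cut, fmt = request
    return list(serve_request(cut, fmt))


# Two independent user requests, each with its own cut and output format.
requests = [
    (lambda ev: ev["energy"] > 100.0, "run_event"),
    (lambda ev: ev["run"] == 1, "csv"),
]

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(handle, requests))
```

Because `serve_request` is a generator, a real server could write each converted event to the client as soon as it is produced rather than collecting the full list first, which is the point of streaming instead of a batch job followed by an FTP transfer.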
Conclusions
• Initial data server available
– Would like some people to try it and give feedback
• Lots of work to do
– Need to set goals/priorities for DC2 work
• Understand timescales
• Understand what data volume will be
• Understand what typical queries will be
Hierarchical Data Catalog