ATLAS Databases
2nd September 2008 - Richard Hawkings / Paul Laycock

Conditions data handling in FDR2c
 Tag hierarchies set up (largely by Paul) and communicated in advance
 No real problems uploading data to the correct tag
 Calibration experts starting to deal with ‘real’ IOVs (data valid only for the calibration period) - see the PyCool sketch after this list
 New POOL file registration scripts worked fine
 Calibration users need to be in AFS group atlcond:poolcond
 Consider doing calibration uploads from a ‘calibration’ account, not personal ones?
 No instances of data in COOL without a corresponding POOL file upload (or with a wrong one)
 No use of run-signoff database pages yet
 System was not yet ready and integrated (holidays; too busy with other things)
 But only one set of runs, and all calibrations were ‘accepted’ - no real test
 Handling of detector status information works technically
 Merging and transfer to LBSUMM folder (for ESD/AOD) still done by hand
 Limited mapping of DQ histograms to status flags restricts usefulness
 Need to make sure this improves for real data
 Need to clarify how detector status flags are dealt with in ES1, ES2 processing
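To make the ‘real’ IOV and tag-upload workflow above concrete, here is a minimal PyCool sketch of writing one calibration payload with a run-based IOV and tagging it. It assumes a standalone SQLite file so it can run anywhere; the folder path, payload field and leaf tag name are invented for illustration, not the real FDR2c folders, and real uploads go through the registration scripts against the production Oracle server.

from PyCool import cool

# Standalone SQLite file so the sketch is self-contained; real uploads
# target the production Oracle server via the registration scripts
dbSvc = cool.DatabaseSvcFactory.databaseService()
db = dbSvc.createDatabase('sqlite://;schema=demo_cond.db;dbname=DEMO')

# One float per channel; MULTI_VERSION so the folder can carry tags
spec = cool.RecordSpecification()
spec.extend('constant', cool.StorageType.Float)
fspec = cool.FolderSpecification(cool.FolderVersioning.MULTI_VERSION, spec)
folder = db.createFolder('/DEMO/Calib', fspec, 'demo calib folder', True)

# ATLAS encodes (run, lumiblock) in the 63-bit IOV key as run << 32 | LB,
# so a 'real' IOV covers exactly the calibration period, not IOVmin..max
run = 90210
since = run << 32
until = (run + 1) << 32

payload = cool.Record(spec)
payload['constant'] = 1.23
folder.storeObject(since, until, payload, 0)   # channel 0, into HEAD

# Tag the HEAD under the leaf tag agreed in advance in the tag hierarchy
folder.tagCurrentHead('DEMO-Calib-001', 'demo calibration tag')
db.closeDatabase()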
Conditions DB access problems
 Big problems with Tier-0 conditions DB access on Thursday night / Friday morning
 Combination of several factors
 2 of 4 Oracle server nodes got into trouble and restarted
 Kernel patch being applied this week, some interdependencies not fully understood yet
 Server full of ‘stuck’ connections which were never released or cleaned up - deadlock
 Very high load due to FDR2 bulk reprocessing and cosmics reprocessing going on in parallel, plus FCT, ATN, RTT, TCT tests, plus user jobs
 All jobs accessing Oracle directly, no use of SQLite replicas at present
 Replica only useful once the run is ended online - applicable to ES2, bulk reco only
 Vulnerability in that ALL Athena jobs accessing Oracle use the same reader account
 Limit of 800 concurrent sessions, now changed to 4 x 800
 Each Athena job holds O(10) connections in parallel (one per subdetector schema) until the end of the first event - typically for 5 minutes or so. Vulnerable to ‘deadlock’
 Further actions being pursued
 Deploy SQLite replica for bulk processing (but not for cosmics / express stream) - see the connection-policy sketch after this list
 Use a dedicated COOL reader account for Tier-0 jobs to guarantee a quota of connections
 Reduce connection load from Athena jobs (short/long term actions)
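As a hedged sketch of the replica-first policy in the bullets above: prefer a local SQLite replica where a job is allowed to use one (bulk/ES2 processing of ended runs), otherwise fall back to Oracle, ideally through a dedicated Tier-0 reader account with its own session quota. The paths, schema names and the helper function itself are illustrative assumptions, not the real ATLAS configuration (which is steered through CORAL database lookup files).

import os

def conditions_connect_string(schema, run_ended):
    # Hypothetical location of a locally deployed SQLite replica
    replica = '/cond/replicas/%s.db' % schema
    if run_ended and os.path.exists(replica):
        # Replica consumes no Oracle session at all
        return 'sqlite://;schema=%s;dbname=COMP200' % replica
    # Direct Oracle access; a dedicated Tier-0 reader account would
    # ring-fence these sessions from the shared 4 x 800 limit
    return 'oracle://ATLAS_COOLPROD/%s' % schema

# Express-stream (ES1) jobs run while the run is still ongoing, so they
# must read Oracle; bulk reco of an ended run can use the replica
print(conditions_connect_string('ATLAS_COOLONL_INDET', run_ended=True))
print(conditions_connect_string('ATLAS_COOLONL_INDET', run_ended=False))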
Next steps - discussion needed
 Work on conditions DB access problems
 Deployment of SQLite replicas to be used where possible
 Start to set up tag hierarchies for first data
 Separate top-level tags to be used by HLT, monitoring, Tier-0, reprocessing - see the tag-relation sketch after this list
 Define calibration loop model for first data
 Cosmics processing has no calibration loop, and several ‘express’ streams
 Same plan for single-beam running, or move to a ‘calibration loop’?
 A 24-hour calibration loop might be needed for code fixes even if no prompt calibration can be done yet; there might be multiple processings at Tier-0
 What to do for first collisions
 Sign-off tool and Tier-0/conditions integration to support all this...?
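To illustrate the separate top-level tags idea, here is a hedged PyCool sketch continuing the earlier example (same invented SQLite file and folder): the single leaf tag is related upwards into independent parent tags, one per client, so each can later be repointed to a better calibration without touching the others. All tag names are invented; in ATLAS the relations continue up the folderset tree to the global top-level tags.

from PyCool import cool

dbSvc = cool.DatabaseSvcFactory.databaseService()
db = dbSvc.openDatabase('sqlite://;schema=demo_cond.db;dbname=DEMO', False)

folder = db.getFolder('/DEMO/Calib')
for top in ('DEMO-HLT-001', 'DEMO-MONI-001', 'DEMO-T0-001', 'DEMO-REPRO-001'):
    # Parent tag lives in the parent folderset /DEMO; it is created
    # implicitly by the relation if it does not yet exist
    folder.createTagRelation(top, 'DEMO-Calib-001')
db.closeDatabase()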