Conditions data handling in FDR2c

- Tag hierarchies set up (largely by Paul) and communicated in advance
  - No real problems uploading data to the correct tag
  - Calibration experts starting to deal with ‘real’ IOVs (data valid for the calibration period)
- New POOL file registration scripts worked fine
  - Calibration users need to be in AFS group atlcond:poolcond
  - Consider doing calibration uploads from a ‘calibration’ account, not personal ones?
  - No instances of data in COOL with a missing or wrong POOL file upload
- No use of run-signoff database pages yet
  - System was not ready and integrated yet (holidays; too busy with other things)
  - But only one set of runs, and all calibrations were ‘accepted’ - no real test
- Handling of detector status information works technically
  - Merging and transfer to LBSUMM folder (for ESD/AOD) still done by hand
  - Limited mapping of DQ histograms to status flags restricts usefulness
    - Need to make sure this improves for real data
  - Need to clarify how detector status flags are dealt with in ES1, ES2 processing

2nd September 2008, Richard Hawkings / Paul Laycock

Conditions DB access problems

- Big problems in Tier-0 conditions DB access Thursday night / Friday morning
- Combination of several factors
  - 2 of the 4 Oracle server nodes got into trouble and restarted
    - Kernel patch being applied this week, some interdependencies not fully understood yet
    - Server full of ‘stuck’ connections which were never released or cleaned up - deadlock
  - Very high load due to FDR2 bulk reprocessing and cosmics reprocessing going on in parallel, plus FCT, ATN, RTT, TCT tests, plus user jobs
  - All jobs accessing Oracle directly, no use of SQLite replicas at present
    - Replica only useful once the run is ended online - applicable to ES2, bulk reco only
  - Vulnerability in that ALL Athena jobs accessing Oracle use the same reader account
    - Limit of 800 concurrent sessions, now changed to 4 x 800
    - Each Athena job holds O(10) connections in parallel until end of first event (one per subdetector schema) - typically for 5 minutes or so
    - Vulnerable to ‘deadlock’
- Further actions being pursued
  - Deploy SQLite replica for bulk processing (but not for cosmics / express stream)
  - Use a dedicated COOL reader account for Tier-0 jobs - guarantee number of connections
  - Reduce connection load from Athena jobs (short/long term actions)

Next steps - discussion needed

- Work on conditions DB access problems
  - Deployment of SQLite replicas to be used where possible
- Start to set up tag hierarchies for first data
  - Separate top-level tags to be used by HLT, monitoring, Tier-0, reprocessing
- Define calibration loop model for first data
  - Cosmics processing has no calibration loop, and several ‘express’ streams
  - Same plan for single beam running, or move to ‘calibration loop’?
    - Calibration 24 hrs might be needed for code fixes even if no prompt calibration can be done yet; might have multiple processings at Tier-0
  - What to do for first collisions?
- Sign-off tool and Tier-0/conditions integration to support all this ..?
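
A rough back-of-envelope sketch of what the session limits quoted under ‘Conditions DB access problems’ imply for job throughput. It is illustrative only and rests on assumptions not stated in the slides: ‘O(10)’ connections per job is taken as exactly 10, the hold time as exactly 5 minutes, and ‘4 x 800’ as a pooled total of 3200 sessions shared evenly by all jobs. The function names are hypothetical and not part of Athena or COOL.

```python
def max_jobs_in_startup(session_limit: int, conns_per_job: int) -> int:
    """Largest number of jobs that can hold their start-up connections at once."""
    return session_limit // conns_per_job


def sustainable_start_rate(session_limit: int, conns_per_job: int,
                           hold_minutes: float) -> float:
    """Steady-state job starts per minute before the session limit is hit.

    Uses Little's law: jobs in start-up = start rate x hold time.
    """
    return max_jobs_in_startup(session_limit, conns_per_job) / hold_minutes


# Assumed values, read off the slide figures (see the lead-in for caveats).
CONNS_PER_JOB = 10   # "O(10)" connections, one per subdetector schema
HOLD_MINUTES = 5.0   # connections held until the end of the first event

for label, limit in (("old limit", 800), ("new limit", 4 * 800)):
    jobs = max_jobs_in_startup(limit, CONNS_PER_JOB)
    rate = sustainable_start_rate(limit, CONNS_PER_JOB, HOLD_MINUTES)
    print(f"{label}: {limit} sessions, {jobs} jobs in start-up, "
          f"{rate:.0f} job starts per minute")
```

Under these assumptions the old limit allows about 80 jobs in start-up at once (roughly 16 job starts per minute), and the new one about 320 (roughly 64 per minute). Since all Athena jobs share one reader account, test and user jobs draw on the same budget, which is why a dedicated Tier-0 reader account and fewer connections per job both help.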