2012 Availability
B. Todd, A. Apollonio, A. Macpherson, J. Uythoven, D. Wollmann
+ Cryo + QPS + Powering + Machine Protection +…
Evian workshop - 1v2
benjamin.todd@cern.ch   Operations Workshop – Evian – December 2012


Outline…
Goal of this presentation: give you a summary of 2012 availability… subjective…
Three key actors for determining availability…
  • Post-Mortem data
  • Operations – TIMBER and the elogbook
  • Equipment – equipment group tracking
Putting these together for dumps above 450.1 GeV…


Post Mortem : Dump Cause – 2010
355 in total [1]


Post Mortem : Dump Cause – 2011
503 in total [2]


Post Mortem : Dump Cause – 2012
585 in total [3]
[Chart: 2010 in green (355); labelled values 3.7%, 22.8%, 12.7%]


Post Mortem : Dump Cause – 2012
585 dumps [3]; chart values: 11, 26, 74, 228, 246
345 dumps + 64 Test + 176 End of Fill [4]; chart values: 11, 26, 74, 228, 6
D. Wollmann – this session


Operations : Lost Physics & Fault Time
Impact on physics:
    Lost Physics = stable beams cut short
  + Fault Time   = waiting to re-start


Lost Physics
Lost Physics = stable beams cut short by faults
Average time in physics when reaching End of Fill = 9 hours … good turnaround = 3 hours
If a fill did not have 9 hours of stable beams: the dump cause is assigned up to 3 hours of lost physics
[Timeline diagram: Begin Injection, Start Physics, Lost Physics, Begin Injection]


Mean Stable Beams
9 Hours @ End of Fill [PM + TIMBER]


Operations : Lost Physics per Cause
345 causes = 812 h = 34 days lost physics [5]
Better metric = luminosity impact … A. Apollonio – HL LHC – IPAC


Fault Time
Fault Time = time to repair a faulty system
[Timeline diagram: Start Physics, Beam Abort, faults from Cause X, Cause Y and Cause Z with Fault Time X, Y and Z, Begin Injection, Start Physics]


Operations : Fault Time per Cause
1524 hours = 64 days = fault time [6]


Operations : Lost Physics + Fault Time
812 hours  = 34 days = lost physics [5]
1524 hours = 64 days = fault time [6]


Equipment
Fault Time = time to repair a faulty system
The cause is given the full fault time, even if it is shared: not a fair representation
Concentrate on top 3 systems:
  • Power converters
  • Quench protection
  • Cryogenics
Data from equipment groups…
  look into all fault time assigned by operations – compare with their records
Big point: not in the elogbook = not investigated in this presentation
  fault lists here could be (are) incomplete
  aiming to see things from the operations view.


Equipment : Power Converters
Post Mortem: 35 beam aborts
Operations: 59 faults = 106 hours

                                                                 Duration [hours]
Cause                                 #   Total [h]  Average [h]   ≤1  1-2  2-3  3-4  4-5   ≥5
External                              2      2.5        1.3         1    1
Internal                             38     64.8        1.7        17   10    7    2    1    1
Radiation Induced                    12     25.2        2.1         3    3    3    3
Combined (all causes)                52     92.4        1.8
Combined (Internal + Radiation)      50     89.9        1.8

remote reset..
+ long intervention can also come from late piquet calls, or linked problems (e.g. access)
+ some in the shadow of others
Legend: External / Internal / Radiation
[V. Montabonnet and Y. Thurel] [7]


Equipment : Quench Protection
Post Mortem: 56 beam aborts
Operations: 57 faults = 112 hours

Cause                              #   Total [h]  Average [h]
CMW / WorldFIP                     3      2.7        0.9
DFB                                6     17.3        2.9
EE (600A)                          6     18.9        3.2
EE (13kA)                          1      4.7        4.7
QPS DAQ failure                   10     27.4        2.7
QPS DAQ radiation induced          7     11.8        1.7
QPS Detector failure               7     12.0        1.7
QPS Detector radiation induced    15     14.2        0.9
Combined                          46     89          1.9

Duration [hours] bins: ≤1, 1-2, 2-3, 3-4, 4-5, ≥5
Per-cause counts (row alignment not recoverable):
  ≤1 to 3-4:  3 1 1 2 1 2 1 3 3 3 8 2 2 2 5 1 1 1 2 1
  4-5 and ≥5: 1 1 2 1 1 2 1 1

EE + QPS Protection Functions were 100% successful
+ some in the shadow of others
Legend: External / Internal / Radiation
[R. Denz, K. Dahlerup-Petersen and I. Romera] [8]


Equipment : Cryogenics
Post Mortem: 14 beam aborts
Operations: 37 faults = 358 hours

                                                Duration [hours]
Cause                           #   Total [h]    <8   8-30   >30
Supply (CV / EL / IT)          17      19        17
User                           28      25        28
Cryogenics failure             46     233        33     11     2
Cryogenics radiation induced    4      57         1      2     1
Combined                       95     334

≈14 days downtime of total time ≈263 days = 95% availability
downtime halved from 2010 to 2012 – proactive approach to improving availability
+ after LS1, a quench (user) at 6 TeV = 10-12h to recover…
Legend: External / Internal / Radiation
[S. Claudet and E. Duret] [9]


Considering Complexity
Can we compare systems by their complexity?
Asked each system to give “number of hardware signals which can provoke beam abort”…
Initial informal attempt:

System                                       Approximate Number   Reference
RF                                                   800          O. Brunner
Beam Interlock System                               2000          B. Todd
Cryogenics                                          3500          S. Claudet
Quench Protection                                  14000          R. Denz
BLM (surveillance of protection function)          18000          C. Zamantzas
BLM (protection function)                          48000          C. Zamantzas

Better metric for complexity… TE/MPE & AWG – student 2013
Study by A. Apollonio [10]


2005 Predictions…
2005 – Reliability Sub-Working Group
Predicted false dumps and safety of Machine Protection System
  safety: no events
  false dumps: used to determine whether predictions were accurate

System   Predicted 2005   Observed 2010   Observed 2011   Observed 2012
LBDS       6.8 ± 3.6            9              11               4
BIS        0.5 ± 0.5            2               1               0
BLM       17.0 ± 4.0            0               4              15
PIC        1.5 ± 1.2            2               5               0
QPS       15.8 ± 3.9           24              48              56
SIS            -                4               2               4

radiation induced effects are included in the figures above
false dumps – in line with expectations…
safety – therefore in line with expectations… if ratio false dumps to safety is ok.
Study by A. Apollonio [11]


Conclusions 1/3
Machine Protection:
  345 beam aborts
Operation:
  34 days lost physics due to beam aborts
  64 days due to equipment in fault
  Top 3 systems: Cryogenics, Power Converters, Quench Protection…
Equipment:
  operations logbook fault time and number of faults not consistent…
  …responsibility for faults needs to be correctly assigned
  …dependency between faults needs to be included (e.g. power piquet stuck at a broken access door)
Increased radiation post LS1: equipment needs to be more reliable to keep the same availability…
Good points:
  We can consolidate this information to a certain extent!
  Thanks to Daniel, Andrea, Jan, Alick… and all the equipment gurus
Less good:
  many sources, view points, need cross-checking, interpreting, integrating…
  The data from equipment groups doesn’t align to the elogbook – consolidation headache…
  Cannot always be done as no rigorous application of rules.
  It is not possible to have “correct” data


Conclusions 2/3
The future: four points worth considering…
1. Availability should be objective…
   + We need some metrics and rules…
   metric for luminosity impact … A. Apollonio – HL-LHC – IPAC
   better metric for complexity… TE/MPE & AWG – student 2013
2. Information capture should be easier and rigorous…
   e.g. eLogbook: tracking and understanding faults is inconsistent.
   + is it the central place to store fault information?
3. Dealing with parallel / hidden / dependent faults should be built in…
   + find one fault, fix it, find another, fix it, … etc…
   + Is there a way to better predict this? Big Sister? LASER? DIAMON?
4. Information analysis should be easier…
   + better tools needed
   + Simple, easy to use, make benefits obvious


Conclusions 3/3
With the right analysis:
  bottlenecks can be identified and corrected proactively…
  e.g. diodes in power converters
  considering change = predicted reliability following observations
If we understand the availability of the machine protection – we understand the safety…
Value this information: worth $$$$$ in the future.
In all of these cases:
  Availability Working Group AWG – small forum – eager to participate
  Maintenance Management Project MMP – plans in this area


Availability Working Group (AWG)
https://espace.cern.ch/LHC-Availability-Working-Group/Meetings/SitePages/Home.aspx
Goal: Improve LHC integrated luminosity via a transverse, strategic view
B. Todd [TE/EPC] / L. Ponce [BE/OP] co-chairs
A. Apollonio [TE/MPE] Scientific Secretary
  • sharing ideas / concepts, information, experience
  • discussing common metrics
  • information capture equipment
  • information capture operation
  • analysis techniques and bases
6 meetings to date, lots of ideas and information
Putting together the key players in a nice & open environment
Next Milestones:
  • Metrics…
  • Measurement of complexity…
  • reliability prediction for upgraded systems…
  • Tools for operation – and working with Alick on the stats page


Fin!
Thank you


References
[1] – PM database. Extracted from 23rd March – 6th December 2010
[2] – PM database. Extracted from 17th February – 13th December 2011
      • Fills above 450.1 GeV
      • Ignore “no input change”
[3] – PM database. Extracted from 1st March – 6th December 2012
[4] – Sort by MPS Dump Cause. Discard EOF, MD & MPS TEST
[5] – Calculate Stable Beams per fill from TIMBER, assign lost-physics by MPS Dump Cause from [4]
[6] – eLogbook extract from 1st March – 6th December 2012
      • duplicate entries suppressed
      • MISCELLANEOUS is ignored, except for 4 x BSRT entries
      • No correction for faults which “roll-over” between shifts
[7] – Data directly given from Power Converter experts – sorted by failure mode
[8] – Data directly given from QPS experts – sorted by failure mode
[9] – Data directly given from Cryogenics experts – up to 15/12/12, sorted by failure mode
[10] – Difficult to consolidate the concept of hardware signal… for example, is inside an FPGA hardware?
[11] – Data studied for 2012 run; baseline from table 1 of: http://accelconf.web.cern.ch/AccelConf/p05/PAPERS/TPAP011.PDF


Spare Slide : Parallel Faults
≈½ have >1 cause [eLogbook + TIMBER]
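
The Equipment slide notes that a cause is given the full fault time even when that time is shared, and the spare slide that about half of the faults have more than one cause. The sketch below shows one possible alternative. It assumes, and this is not a rule taken from the presentation, that overlapping fault time is split equally among the faults active at that moment. The function name, the tuple layout and the sample intervals are hypothetical.

```python
from collections import defaultdict


def split_fault_time(faults):
    """Share overlapping fault time equally between concurrent faults.

    faults: list of (cause, start_h, end_h) tuples, times in hours.
    Returns a dict mapping cause -> attributed hours.
    The equal split is an illustrative assumption, not an agreed rule.
    """
    # Every start and end time is a boundary; between two consecutive
    # boundaries the set of active faults cannot change.
    edges = sorted({t for _, start, end in faults for t in (start, end)})
    attributed = defaultdict(float)
    for left, right in zip(edges, edges[1:]):
        active = [cause for cause, start, end in faults
                  if start <= left and end >= right]
        for cause in active:
            attributed[cause] += (right - left) / len(active)
    return dict(attributed)


# Hypothetical example: a 10 h cryogenics fault with a 2 h QPS fault in its shadow.
example = [("Cryogenics", 0.0, 10.0), ("QPS", 4.0, 6.0)]
print(split_fault_time(example))
```

With this split the attributed hours add up to the elapsed fault time (10 h in the example) instead of the 12 h obtained when each cause is given its full duration.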
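
The Lost Physics slide states that a fill with fewer than 9 hours of stable beams has up to 3 hours of lost physics assigned to its dump cause. The exact formula is not given, so the sketch below assumes the lost time is the shortfall against the 9 hour mean, capped at 3 hours. The function name, parameter names and sample fills are hypothetical.

```python
def lost_physics_hours(stable_beams_h, target_h=9.0, cap_h=3.0):
    """Lost physics assigned to a dump cause, in hours.

    Assumption: lost time = shortfall against the 9 h mean stable-beams
    duration, capped at the 3 h "good turnaround" quoted on the slide.
    """
    shortfall = max(0.0, target_h - stable_beams_h)
    return min(cap_h, shortfall)


# Hypothetical fills: (label, hours of stable beams before the dump).
for label, hours in [("fill A", 9.5), ("fill B", 7.5), ("fill C", 2.0)]:
    print(label, lost_physics_hours(hours))
```

Under this assumption a dump can never be charged more than 3 hours, which is consistent with the quoted average of 812 h over 345 causes (about 2.4 h per dump).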
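
The hour-to-day conversions and the cryogenics availability figure quoted on the slides can be checked with a few lines of arithmetic. The sketch assumes that the "≈14 days downtime" refers to the 334 h combined cryogenics fault time and that the "≈263 days" is the total period considered. The helper names are hypothetical.

```python
HOURS_PER_DAY = 24.0


def hours_to_days(hours):
    """Convert a duration in hours to days."""
    return hours / HOURS_PER_DAY


def availability(downtime_h, total_days):
    """Fraction of the total period during which the system was not down."""
    return 1.0 - downtime_h / (total_days * HOURS_PER_DAY)


print(round(hours_to_days(812), 1))    # lost physics, quoted as 34 days
print(round(hours_to_days(1524), 1))   # fault time, quoted as 64 days
print(round(hours_to_days(334), 1))    # cryogenics downtime, quoted as about 14 days
print(round(100 * availability(334, 263), 1))  # quoted as 95% availability
```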