The LHC Beam Interlock System - Indico

2012 Availability
B. Todd, A. Apollonio, A. Macpherson, J. Uythoven, D. Wollmann
+ Cryo + QPS + Powering + Machine Protection +…
Evian workshop - 1v2
Outline…
CERN
Goal of this presentation:
Give you a summary of 2012 availability… subjective…
Three key actors for determining availability…
Post-Mortem data
Operations – TIMBER and the elogbook
Equipment – equipment group tracking
Putting these together for dumps above 450.1 GeV…
benjamin.todd@cern.ch
Operations Workshop – Evian – December 2012
Post Mortem : Dump Cause – 2010
355 in total
[1]
Post Mortem : Dump Cause – 2011
503 in total
[2]
Post Mortem : Dump Cause – 2012
355 → 585 in total (2010 in green)
[Chart: dump causes; labelled shares 3.7%, 22.8% and 12.7%]
[3]
Post Mortem : Dump Cause – 2012
[Chart: 585 dumps split into five groups of 11, 26, 74, 228 and 246]
[3]
Post Mortem : Dump Cause – 2012
[Chart: 345 dumps split into five groups of 11, 26, 74, 228 and 6]
345 dumps + 64 Test + 176 End of Fill
D. Wollmann – this session
[4]
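The step from 585 to 345 dumps can be checked with a few lines of arithmetic. This is only a sketch: the counts are copied from the slides, while the variable names are illustrative and the five chart groups are kept anonymous because their labels are not available here.

```python
# Counts copied from the 2012 Post Mortem slides.
all_dumps_2012 = 585
test_dumps = 64          # "+ 64 Test"
end_of_fill_dumps = 176  # "+ 176 End of Fill"

# Dumps that remain once tests and end-of-fill dumps are discarded.
fault_dumps = all_dumps_2012 - test_dumps - end_of_fill_dumps

# The five groups shown on the filtered chart (labels not recoverable).
filtered_groups = [11, 26, 74, 228, 6]

print(fault_dumps)           # 345
print(sum(filtered_groups))  # 345
```

Both figures agree, which suggests that the group of 246 on the unfiltered chart is the sum of 6, 64 and 176.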
Operations : Lost Physics & Fault Time
impact on physics:
Lost Physics = stable beams cut short
+
Fault Time = waiting to re-start
Lost Physics
Lost Physics = stable beams cut short by faults
Average time in physics when reaching End of Fill = 9 hours; a good turnaround = 3 hours.
If a fill did not have 9 hours of stable beams, the dump cause is assigned up to 3 hours of lost physics.
[Timeline: Begin Injection, Start Physics, Lost Physics, Begin Injection]
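The assignment rule can be sketched as a small function. The slide only says "up to 3 hours", so the exact formula below (shortfall against 9 hours, capped at 3 hours) is an assumption, and the function and constant names are hypothetical.

```python
TARGET_STABLE_BEAMS_H = 9.0  # mean stable-beams time at End of Fill (from the slide)
MAX_LOST_PHYSICS_H = 3.0     # cap per dump cause (from the slide)


def lost_physics_hours(stable_beams_h: float) -> float:
    """Hours of lost physics charged to the dump cause of one fill.

    ASSUMPTION: the shortfall against 9 h is charged, capped at 3 h.
    The slide does not spell out the formula.
    """
    shortfall = max(0.0, TARGET_STABLE_BEAMS_H - stable_beams_h)
    return min(MAX_LOST_PHYSICS_H, shortfall)


print(lost_physics_hours(2.0))   # 3.0 (shortfall of 7 h, capped)
print(lost_physics_hours(7.5))   # 1.5
print(lost_physics_hours(10.0))  # 0.0 (fill reached the 9 h target)
```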
Mean Stable Beams 9 Hours @ End of Fill
[PM + TIMBER]
Operations : Lost Physics per Cause
812 hours lost physics
Better metric = luminosity impact …
A. Apollonio – HL LHC – IPAC
345 causes = 812h = 34 days
[5]
Fault Time
Fault Time = time to repair a faulty system
[Timeline: Start Physics, Beam Abort, then faults (F) for Cause X, Cause Y and Cause Z with Fault Time X, Fault Time Y and Fault Time Z, then Begin Injection and Start Physics]
Operations : Fault Time per Cause
1524 hours = 64 days = fault time
[6]
Operations : Lost Physics + Fault Time
812 hours = 34 days = lost physics
1524 hours = 64 days = fault time
[5 + 6]
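The hour-to-day conversions quoted above are easy to reproduce. The sketch below uses only numbers from the slides; the mean lost physics per dump is an extra derived figure, not one quoted in the presentation.

```python
HOURS_PER_DAY = 24

lost_physics_h = 812  # from the slide
fault_time_h = 1524   # from the slide
fault_dumps = 345     # dump causes behind the 812 h

lost_physics_days = lost_physics_h / HOURS_PER_DAY
fault_time_days = fault_time_h / HOURS_PER_DAY

# Derived here for illustration only; not a figure from the slides.
mean_lost_h_per_dump = lost_physics_h / fault_dumps

print(f"lost physics: {lost_physics_days:.1f} days")
print(f"fault time:   {fault_time_days:.1f} days")
print(f"mean lost physics per dump: {mean_lost_h_per_dump:.2f} h")
```

The results are just under 34 days and 63.5 days, consistent with the rounded 34 and 64 days on the slide.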
Equipment
Fault Time = time to repair a faulty system
[Timeline repeated: Beam Abort, then faults (F) for Cause X, Cause Y and Cause Z with Fault Time X, Fault Time Y and Fault Time Z, then Begin Injection and Start Physics]
A cause is given the full fault time, even if that time is shared with other faults: not a fair representation.
Concentrate on top 3 systems:
Power converters
Quench protection
Cryogenics
Data from equipment groups…
Look into all fault time assigned by operations and compare it with the equipment groups' records.
Big point: not in the elogbook = not investigated in this presentation.
Fault lists here could be (are) incomplete; the aim is to see things from the operations view.
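To illustrate why charging the full fault time to every cause is not a fair representation, the sketch below compares that naive bookkeeping with one possible alternative: splitting each hour equally between the faults active during it. The equal-split rule, the function name and the example intervals are all assumptions made for illustration; the presentation does not propose a specific scheme.

```python
from collections import defaultdict


def attribute_fault_time(faults):
    """faults: list of (cause, start_h, end_h) tuples.

    Returns (naive, shared): hours per cause when each fault is charged in
    full, and when overlapping time is split equally (hypothetical rule).
    """
    naive = defaultdict(float)
    shared = defaultdict(float)
    for cause, start, end in faults:
        naive[cause] += end - start

    # Cut the timeline at every fault start/end and share each segment.
    edges = sorted({t for _, start, end in faults for t in (start, end)})
    for left, right in zip(edges, edges[1:]):
        active = [c for c, start, end in faults if start <= left and end >= right]
        for cause in active:
            shared[cause] += (right - left) / len(active)
    return dict(naive), dict(shared)


# Invented example: Y sits entirely in the shadow of X, Z overlaps X by 1 h.
example = [("X", 0.0, 6.0), ("Y", 2.0, 4.0), ("Z", 5.0, 8.0)]
naive, shared = attribute_fault_time(example)
print(naive)   # {'X': 6.0, 'Y': 2.0, 'Z': 3.0}  -> 11 h in total
print(shared)  # {'X': 4.5, 'Y': 1.0, 'Z': 2.5}  -> 8 h, the real downtime
```

In this example the naive totals add up to 11 hours although the machine was only down for 8, which is the kind of mismatch the slide warns about.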
Equipment : Power Converters
Post Mortem: 35 beam aborts
Operations: 59 faults = 106 hours

Cause                 #    Total [h]   Average [h]
External              2       2.5         1.3
Internal             38      64.8         1.7
Radiation Induced    12      25.2         2.1
Combined             52      92.4         1.8

[V. Montabonnet and Y. Thurel] [7]
Equipment : Power Converters
Post Mortem: 35 beam aborts
Operations: 59 faults = 106 hours

Cause                                      #    Total [h]   Average [h]
External                                   2       2.5         1.3
Internal                                  38      64.8         1.7
Radiation Induced                         12      25.2         2.1
Combined (Internal + Radiation Induced)   50      89.9         1.8

Number of faults per duration [hours]:
Cause                ≤1   1-2   2-3   3-4   4-5   ≥5
External              1     1     -     -     -     -
Internal             17    10     7     2     1     1
Radiation Induced     3     3     3     3     -     -

remote reset..
+ long intervention can also come from late piquet calls, or linked problems (e.g. access)
+ some in the shadow of others
[V. Montabonnet and Y. Thurel] [7]
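The Average column is simply total hours divided by number of faults, which can be recomputed as a sanity check. The sketch below uses the per-cause figures from the slide; the dictionary layout and names are illustrative.

```python
# cause: (number of faults, total hours, average quoted on the slide)
power_converter_faults = {
    "External": (2, 2.5, 1.3),
    "Internal": (38, 64.8, 1.7),
    "Radiation Induced": (12, 25.2, 2.1),
}

averages = {}
for cause, (count, total_h, quoted_avg_h) in power_converter_faults.items():
    averages[cause] = total_h / count
    print(f"{cause}: {averages[cause]:.2f} h (slide: {quoted_avg_h} h)")

# Internal + Radiation Induced reproduces the Combined count of 50.
combined_count = sum(power_converter_faults[c][0] for c in ("Internal", "Radiation Induced"))
combined_hours = sum(power_converter_faults[c][1] for c in ("Internal", "Radiation Induced"))
print(combined_count, round(combined_hours, 1))
```

The recomputed combined total is 90.0 hours against 89.9 on the slide, a difference that is presumably rounding in the per-cause totals.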
Equipment : Quench Protection
Post Mortem: 56 beam aborts
Operations: 57 faults = 112 hours

Cause                              #    Total [hours]   Average [hours]
CMW / WorldFIP                     3        2.7              0.9
DFB                                6       17.3              2.9
EE (600A)                          6       18.9              3.2
EE (13kA)                          1        4.7              4.7
QPS DAQ failure                   10       27.4              2.7
QPS DAQ radiation induced          7       11.8              1.7
QPS Detector failure               7       12.0              1.7
QPS Detector radiation induced    15       14.2              0.9
Combined                          46       89                1.9

Number of faults per duration [hours], cell values as extracted (assignment to causes not recoverable):
≤1, 1-2, 2-3, 3-4: 3 1 1 2 1 2 1 3 3 3 8 2 2 2 5 1 1 1 2 1
4-5, ≥5: 1 1 2 1 1 2 1 1

EE + QPS Protection Functions were 100% successful
+ some in the shadow of others
[R. Denz, K. Dahlerup-Petersen and I. Romera] [8]
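The Combined row (46 faults, 89 hours) is smaller than the sum of all eight causes. The arithmetic below suggests that it leaves out the first two rows (CMW / WorldFIP and DFB); this is inferred from the numbers, not stated on the slide. The sketch also derives the radiation-induced share of the combined figures, which is not quoted in the presentation.

```python
# (cause, number of faults, total hours) from the slide
qps_faults = [
    ("CMW / WorldFIP", 3, 2.7),
    ("DFB", 6, 17.3),
    ("EE (600A)", 6, 18.9),
    ("EE (13kA)", 1, 4.7),
    ("QPS DAQ failure", 10, 27.4),
    ("QPS DAQ radiation induced", 7, 11.8),
    ("QPS Detector failure", 7, 12.0),
    ("QPS Detector radiation induced", 15, 14.2),
]

# ASSUMPTION (inferred from the totals): these rows are not part of "Combined".
excluded = {"CMW / WorldFIP", "DFB"}

combined = [row for row in qps_faults if row[0] not in excluded]
combined_count = sum(n for _, n, _ in combined)
combined_hours = sum(h for _, _, h in combined)

radiation = [row for row in combined if "radiation induced" in row[0]]
radiation_count_share = sum(n for _, n, _ in radiation) / combined_count
radiation_hours_share = sum(h for _, _, h in radiation) / combined_hours

print(combined_count, round(combined_hours, 1))
print(f"radiation induced: {radiation_count_share:.0%} of faults, "
      f"{radiation_hours_share:.0%} of hours")
```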
Equipment : Cryogenics
Post Mortem: 14 beam aborts
Operations: 37 faults = 358 hours

Cause                            #    Total [h]
Supply (CV / EL / IT)           17       19
User                            28       25
Cryogenics failure              46      233
Cryogenics radiation induced     4       57
Combined                        95      334

Number of faults per duration [hours]:
Cause                           <8   8-30   >30
Supply (CV / EL / IT)           17     -      -
User                            28     -      -
Cryogenics failure              33    11      2
Cryogenics radiation induced     1     2      1

≈14 days downtime of total time ≈263 days = 95% availability
downtime halved 2010 → 2012 – proactive approach to improving availability
+ after LS1, a quench (user) at 6 TeV = 10-12h to recover…
[S. Claudet and E. Duret] [9]
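The 95% availability figure follows from the combined downtime and the length of the run. A minimal sketch of that calculation, using the 334 hours and ≈263 days quoted on the slide (variable names are illustrative):

```python
HOURS_PER_DAY = 24

cryo_downtime_h = 334  # "Combined" total from the table
run_length_days = 263  # approximate total time quoted on the slide

downtime_days = cryo_downtime_h / HOURS_PER_DAY
availability = 1 - downtime_days / run_length_days

print(f"downtime: {downtime_days:.1f} days")
print(f"availability: {availability:.1%}")
```

This gives a little under 14 days of downtime and an availability slightly below 95%, matching the rounded values on the slide.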
Considering Complexity
Can we compare systems by their complexity?
Asked each system to give “number of hardware signals which can provoke beam abort”…
initial informal attempt:

System                                       Approximate Number   Reference
RF                                                   800          O. Brunner
Beam Interlock System                               2000          B. Todd
Cryogenics                                          3500          S. Claudet
Quench Protection                                  14000          R. Denz
BLM (surveillance of protection function)          18000          C. Zamantzas
BLM (protection function)                          48000          C. Zamantzas

Better metric for complexity…
TE/MPE & AWG – student 2013
Study by A. Apollonio
[10]
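One way such signal counts could be used is to normalise beam aborts by complexity. The sketch below does this for the two systems where this presentation gives both a 2012 Post Mortem abort count and a signal count. Choosing "aborts per 1000 signals" as the metric is an assumption made for illustration, not the metric proposed by the authors.

```python
# system: (2012 Post Mortem beam aborts, approximate number of hardware signals)
systems = {
    "Quench Protection": (56, 14000),
    "Cryogenics": (14, 3500),
}


def aborts_per_1000_signals(aborts: int, signals: int) -> float:
    """Hypothetical complexity-normalised abort rate."""
    return 1000 * aborts / signals


for name, (aborts, signals) in systems.items():
    print(f"{name}: {aborts_per_1000_signals(aborts, signals):.1f} aborts per 1000 signals")
```

With these inputs both systems come out at 4.0 aborts per 1000 signals, although Quench Protection caused four times as many aborts in absolute terms.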
2005 Predictions…
2005 – Reliability Sub-Working Group
Predicted false dumps and safety of Machine Protection System
safety: no events
false dumps: used to determine whether predictions were accurate

System   Predicted 2005   Observed 2010   Observed 2011   Observed 2012
LBDS       6.8 ± 3.6            9              11               4
BIS        0.5 ± 0.5            2               1               0
BLM       17.0 ± 4.0            0               4              15
PIC        1.5 ± 1.2            2               5               0
QPS       15.8 ± 3.9           24              48              56
SIS           -                 4               2               4

radiation induced effects are included in the figures above
false dumps – in line with expectations…
safety – therefore in line with expectations… if ratio false dumps to safety is ok.
Study by A. Apollonio
[11]
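A simple way to look at the table is to check which observed counts fall inside the predicted band (prediction ± quoted uncertainty). This is a crude test chosen here for illustration; it is not necessarily how the study judged agreement, and SIS is left out because no prediction is given for it.

```python
# system: (predicted mean, quoted uncertainty) from the 2005 study
predicted = {
    "LBDS": (6.8, 3.6),
    "BIS": (0.5, 0.5),
    "BLM": (17.0, 4.0),
    "PIC": (1.5, 1.2),
    "QPS": (15.8, 3.9),
}

# system: observed false dumps in (2010, 2011, 2012)
observed = {
    "LBDS": (9, 11, 4),
    "BIS": (2, 1, 0),
    "BLM": (0, 4, 15),
    "PIC": (2, 5, 0),
    "QPS": (24, 48, 56),
}
YEARS = (2010, 2011, 2012)


def within_band(system: str, value: float) -> bool:
    """True if value lies inside predicted mean ± uncertainty (illustrative test)."""
    mean, uncertainty = predicted[system]
    return mean - uncertainty <= value <= mean + uncertainty


for system, counts in observed.items():
    flags = ", ".join(
        f"{year}: {'inside' if within_band(system, n) else 'outside'}"
        for year, n in zip(YEARS, counts)
    )
    print(f"{system:5s} {flags}")
```

By this test LBDS, BIS and BLM fall inside their bands in 2012, while QPS lies above its band in all three years.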
Conclusions 1/3
Machine Protection:
Operation:
345 beam aborts
34 days lost physics due to beam aborts
64 days due to equipment in fault
Top 3 systems: Cryogenics, Power Converters, Quench Protection…
Equipment:
operations logbook fault time and number of faults not consistent…
…responsibility for faults needs to be correctly assigned
…dependency between faults needs to be included
(e.g. power piquet stuck at a broken access door)
Increased radiation post LS1 → equipment needs to be more reliable to keep the same availability…
Good points:
We can consolidate this information to a certain extent!
Thanks to Daniel, Andrea, Jan, Alick… and all the equipment gurus
Less good:
many sources, viewpoints, need cross-checking, interpreting, integrating…
The data from equipment groups doesn’t align with the elogbook – consolidation headache…
Consolidation cannot always be done, as the rules are not applied rigorously.
It is not possible to have “correct” data
Conclusions 2/3
The future: four points worth considering…
1. Availability should be objective…
+ We need some metrics and rules…
+ metric for luminosity impact… (A. Apollonio – HL-LHC – IPAC)
+ better metric for complexity… (TE/MPE & AWG – student 2013)
2. Information capture should be easier and rigorous…
e.g. eLogbook: tracking and understanding of faults is inconsistent.
+ is it the central place to store fault information?
3. Dealing with parallel / hidden / dependent faults should be built in…
+ find one fault, fix it, find another, fix it, … etc…
+ Is there a way to better predict this? Big Sister? LASER? DIAMON?
4. Information analysis should be easier…
+ better tools needed
+ Simple, easy to use, make benefits obvious
Conclusions 3/3
With the right analysis: bottlenecks can be identified and corrected proactively…
e.g. diodes in power converters considering change = predicted reliability following observations
If we understand the availability of the machine protection – we understand the safety…
value this information: worth $$$$$ in the future.
In all of these cases:
Availability Working Group AWG – small forum – eager to participate
Maintenance Management Project MMP – plans in this area
Availability Working Group (AWG)
https://espace.cern.ch/LHC-Availability-Working-Group/Meetings/SitePages/Home.aspx
Goal:
Improve LHC integrated luminosity via a transverse, strategic view
B. Todd [TE/EPC] / L. Ponce [BE/OP] co-chairs
A. Apollonio [TE/MPE] Scientific Secretary
• sharing ideas / concepts, information, experience
• discussing common metrics
• information capture equipment
• information capture operation
• analysis techniques and bases
6 meetings to date, lots of ideas and information
Putting together the key players in a nice & open environment
Next Milestones:
• Metrics…
• Measurement of complexity…
• reliability prediction for upgraded systems…
• Tools for operation – and working with Alick on the stats page
Fin!
Thank you
References
[1] – PM database, extracted from 23rd March – 6th December 2010
[2] – PM database, extracted from 17th February – 13th December 2011
• Fills above 450.1 GeV
• Ignore “no input change”
[3] – PM database, extracted from 1st March – 6th December 2012
[4] – Sort by MPS Dump Cause. Discard EOF, MD & MPS TEST
[5] – Calculate Stable Beams per fill from TIMBER, assign lost physics by MPS Dump Cause from [4]
[6] – eLogbook extract from 1st March – 6th December 2012
• duplicate entries suppressed
• MISCELLANEOUS is ignored, except for 4 x BSRT entries
• No correction for faults which “roll over” between shifts
[7] – Data given directly by Power Converter experts – sorted by failure mode
[8] – Data given directly by QPS experts – sorted by failure mode
[9] – Data given directly by Cryogenics experts – up to 15/12/12, sorted by failure mode
[10] – Difficult to consolidate the concept of hardware signal… for example, is a signal inside an FPGA hardware?
[11] – Data studied for 2012 run, baseline from table 1 of:
http://accelconf.web.cern.ch/AccelConf/p05/PAPERS/TPAP011.PDF
Spare Slide : Parallel Faults
≈½ have >1 cause
[eLogbook + TIMBER]