RMT - MKS 2

The European
X-Ray Laser Project
Redundant EPICS IOCs
Matthias Clausen
Gongfa Liu
Bernd Schoeneburg
(DESY)
Matthias Clausen, Gongfa Liu, Bernd Schoeneburg (DESY),
ICALEPCS, 2007
XFEL
X-Ray Free-Electron Laser
Agenda
• High Availability
• Choose the ‘right’ approach
• Design
• Implementation
• Functionality
– Redundancy Monitor Task
– Continuous Control Executive
– SNL Executive
• Redundancy management
– Diagnostic
– SNL Debugger
• Outlook
High Availability Example: File-Server
• Main Service: NFS File Server (1)
– must be cluster service
• Adding Services: Archiver (2)
– as cluster service?
– as ‘managed’ service!
• Adding Services: LDAP-HA
– must be cluster service
– right choice for a file server?
[Figure: file-server cluster with the NFS File Server (1) and the Archiver (2) on both nodes]
Keep your eye on the main service. Do not allow other
services to interfere with the main service.
High Availability: How to implement it?
• Increase Availability by Following Mil Specs?
• Redundant Components!
High Availability: Why?
In our case:
• 24/7 Cryogenic operations for more than one year of
operation without any interruption
Necessary for:
• FLASH Cryogenic Plant
Will be converted from (redundant) commercial to
redundant EPICS next year (1/3 of the system).
• XFEL Cryogenic Plant
Will be converted from (redundant) commercial to
redundant EPICS in 2010 (remaining part).
• XFEL Cryogenic (and possibly Utility) Controls in the
XFEL Tunnel
When to use redundant IOCs?
In applications where high availability is needed and
the failure of an IOC can cause a long plant
breakdown.
If you have to be able to maintain the system during
operation, e.g. exchanging a power supply or loading
new software versions.
If the risk of failure is increased, e.g. in areas where
ionizing radiation can be present.
Design goal:
The redundant IOC pair must be more
reliable than a standalone system!
Project Schedule:
• Design Phase (June 2005)
• Identifying the main components:
– Redundancy Monitor Task
– Continuous Control Executive
– SNL Executive
• Implementation Phase (March-September 2006)
– Redundancy Monitor Task: Industry
– Continuous Control Executive: Bob and his brother
– SNL Executive: DESY with SLAC support
• Testing (2006-2007)
• Porting RMT and CCE to other OS
(cooperation with KEK)
• Testing (2007)
• Porting to uTCA System (planned)
• Production: Middle of 2008
Layout: RMT, CC-Exec, SNL-Exec
The Redundancy Monitor Task (RMT) is an implementation independent from EPICS.
[Figure: block diagram of one IOC on the LAN. The Redundancy Monitor (RMT) performs process control of the Control Exec (CCE) and the SNL Exec. Scan tasks with I/O drivers and SNL programs each feed an update process through a queue.]
Basic Layout
[Figure: RMT controller (IOC) A and RMT controller (IOC) B form a redundant pair. Both are attached to the Ethernet and are linked to each other by a private Ethernet.]
The RMT is a process which handles all
redundancy issues within the IOC.
• Two IOCs form a redundant pair
• One controller is active (Master state)
• The other IOC keeps synchronized with the Master
Ethernet Connectivity
• check global Ethernet
  Global services: Time, Alarming, ...
  CA Clients
• check public Ethernet
• check private Ethernet
• RMT communication
• synchronization of data
• CA broadcast
• Master replies only
[Figure: RMT controller (IOC) A (Master) and RMT controller (IOC) B form a redundant pair, attached to the Ethernet and linked to each other by a private Ethernet.]
RMT: the Redundancy Supervisor
[Figure: block diagram of the Redundancy Monitor Task. Labelled blocks: Monitor XML Task (XML Diagnose/Control IF), Shell Commands (External RMT Functions), Main Thread (Status Machine), Control Table, Config File, Watch Dog, Ethernet Controller, Task A, Task B, TCP/IP Controllers on the Public Ethernet and the Private Ethernet (TCP/IP communication to the partner RMT), and PRR Controllers with a Driver IF to SNL Exec, Control Exec, CA Server and Driver n. Legend: process-internal thread; process realized by T-H; process realized by DESY.]
Shell commands:
rmtConfig
rmtStart
rmtStop
rmtShutdown
rmtReport ...
The Ethernet Controller checks communication over the
public Ethernet by connecting to a time or FTP server.
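The public Ethernet check described above can be illustrated with a minimal Python sketch. It is not the RMT code: the function names, the server list and the rule "the link counts as alive if at least one reference server answers" are assumptions made for illustration. Only the idea of probing a time or FTP server comes from the slide.

```python
import socket


def can_connect(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, name not resolvable, ...
        return False


def public_ethernet_ok(servers, probe=can_connect):
    """Assumed rule: the public link is alive if any reference server answers.

    `servers` is a list of (host, port) pairs; `probe` is injectable so the
    logic can be exercised without a real network.
    """
    return any(probe(host, port) for host, port in servers)


# Hypothetical reference servers: FTP uses TCP port 21, the time protocol port 37.
REFERENCE_SERVERS = [("ftp.example.org", 21), ("time.example.org", 37)]
```

Probing more than one server avoids treating the outage of a single reference server as a failure of the IOC's own network connection.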
How the RMT works
• Control the processes of interest for redundancy
• Processes register themselves to be controlled
• Communicate with the RMT in the other IOC
• Set the IOC in the Master or Slave state
(manage switch-over)
• Monitor network connections (previous slide)
[Figure: RMT State Machine (simplified). Labels as extracted: Init, Master, Slave, CMD to, fail, Master, fail, Slave, test.]
In a redundant IOC, PRR processes register
by calling an RMT function and wait for a
start command.
[Figure: interface between the RMT and a controlled process: register, start, stop, sync, read status.]
Some processes need to be synchronized
with their partner process in the other IOC.
Synchronization over the private Ethernet is
controlled by the RMT.
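The registration and switch-over ideas above can be sketched as a toy model in Python. All names (State, ToyRmt, ToyProcess, switch_over) are hypothetical and do not reflect the real RMT API, which the slides do not show. The sketch also assumes, as a simplification, that registered processes run only on the Master and are stopped on the Slave.

```python
from enum import Enum


class State(Enum):
    INIT = "init"
    MASTER = "master"
    SLAVE = "slave"


class ToyProcess:
    """Stand-in for a controlled process that waits for a start command."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


class ToyRmt:
    """Toy redundancy monitor: processes register, the monitor starts/stops them."""

    def __init__(self):
        self.state = State.INIT
        self.processes = {}  # name -> registered process

    def register(self, name, process):
        # Registering does not start anything; the process waits for start().
        self.processes[name] = process

    def become(self, new_state):
        # Assumption: processes are started on the Master and stopped otherwise.
        for process in self.processes.values():
            if new_state is State.MASTER:
                process.start()
            else:
                process.stop()
        self.state = new_state


def switch_over(master, slave):
    """Demote the old Master first, then promote the old Slave."""
    master.become(State.SLAVE)
    slave.become(State.MASTER)
```

Demoting before promoting means that at no point do both toy IOCs consider themselves Master; whether the real implementation orders the steps this way is not stated in the slides.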
When to fail over?
From the Preamble of the design Document:
“Any redundant implementation must make the system
more reliable than the non redundant one.
Precaution must be taken especially for the
detection of errors which shall initiate the failover.
This operation should only be activated if there is no
doubt that keeping the actual mastership definitely
causes more damage to the controlled system than
an automatic failover.”
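One way to read this principle is as a deliberately conservative decision rule: stay on the current Master whenever the situation is ambiguous. The Python sketch below illustrates that reading only. The Health fields, the function name and the concrete conditions are assumptions and are not taken from the design document.

```python
from dataclasses import dataclass


@dataclass
class Health:
    """Hypothetical health summary of one IOC."""
    public_ok: bool      # public Ethernet reachable
    processes_ok: bool   # all controlled processes report a good status


def should_fail_over(master, slave, partner_reachable):
    """Return True only if a failover is clearly the lesser evil.

    Assumed rule: fail over only when the Master is known to be broken AND
    the Slave is known to be healthy. If the partner cannot be reached over
    the private link, nothing is known for sure, so mastership is kept.
    """
    if not partner_reachable:
        return False
    master_broken = not (master.public_ok and master.processes_ok)
    slave_healthy = slave.public_ok and slave.processes_ok
    return master_broken and slave_healthy
```

For example, a Master that has lost its public Ethernet does not hand over to a Slave that has lost it as well, since the switch would not improve anything.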
EPICS Specific Parts
The RMT is an implementation independent from EPICS.
CCE: The Continuous Control Executive permanently
collects changes on the master to update the client.
SNL Executive:
Permanently collects states and values from SNL
programs on the master and sends changes to the
slave.
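The "collect changes on the master, send them to the partner" pattern can be sketched in a few lines of Python. This is an illustration of the general technique, not the CCE or SNL Executive code: the class and function names and the dictionary-based data model are assumptions.

```python
_MISSING = object()  # sentinel: "no value has been sent for this name yet"


class ChangeCollector:
    """Remembers the last value sent per name and reports only the changes."""

    def __init__(self):
        self._sent = {}

    def collect(self, current):
        """Return the subset of `current` that differs from what was last sent."""
        changes = {
            name: value
            for name, value in current.items()
            if self._sent.get(name, _MISSING) != value
        }
        self._sent.update(changes)
        return changes


def apply_update(mirror, changes):
    """Apply a change set on the receiving side (the partner's copy)."""
    mirror.update(changes)
```

A short usage example with made-up record names:

```python
collector = ChangeCollector()
master = {"TEMP": 4.2, "VALVE": 0}
slave = {}
apply_update(slave, collector.collect(master))  # first pass sends everything
master["VALVE"] = 1
apply_update(slave, collector.collect(master))  # second pass sends only VALVE
```

Sending only the differences keeps the traffic on the private link proportional to the rate of change rather than to the size of the database.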
[Figure: same block diagram as on the slide "Layout: RMT, CC-Exec, SNL-Exec": Redundancy Monitor (RMT) with process control of the Control Exec (CCE) and the SNL Exec, scan tasks with I/O drivers, SNL programs, and update processes fed through queues.]
Remote Diagnostic
XML request files can be
passed through the RMT to
any registered underlying
process.
The final destination of the
message will generate an answer.
SNL Executive:
In this special case the remote
diagnostic protocol is used to
debug the SNL programs actually
running on the remote IOC.
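The request routing described above can be sketched with Python's standard XML library. The XML layout (a request element with a target attribute and a command child), the reply format and the class name are all hypothetical. The slides do not show the actual diagnostic protocol.

```python
import xml.etree.ElementTree as ET


class DiagnosticRouter:
    """Toy router: forwards an XML request to the registered target process."""

    def __init__(self):
        self.handlers = {}  # target name -> callable(command_text) -> answer text

    def register(self, name, handler):
        self.handlers[name] = handler

    def dispatch(self, request_xml):
        """Parse the request, let the final destination answer, return reply XML."""
        request = ET.fromstring(request_xml)
        target = request.get("target", "")
        reply = ET.Element("reply", {"target": target})
        handler = self.handlers.get(target)
        if handler is None:
            reply.set("status", "unknown-target")
        else:
            reply.set("status", "ok")
            # The answer is produced by the destination, not by the router.
            reply.text = handler(request.findtext("command", default=""))
        return ET.tostring(reply, encoding="unicode")
```

In this sketch the router only looks at the target attribute and leaves the interpretation of the command to the destination, which mirrors the statement that the final destination of the message generates the answer.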
[Figure: same block diagram as on the slide "Layout: RMT, CC-Exec, SNL-Exec", with a remote diagnostic client connected to the Redundancy Monitor (RMT).]
Status
• Implementing support for redundant systems is quite a
challenging task.
• The current implementation is improving its maturity by
continuous testing. (Gongfa Liu – (Hefei-China) DESY)
• The RMT and CCE code has been ported to Linux by
Artem Kazakov (KEK) to run redundant (soft) IOCs on
Linux. (see TPPA31)
• The ported RMT has been used to implement redundant
CA-Gateways. (Artem Kazakov)
(An option for load balancing is also available)
Outlook
• RMT can be used independent from EPICS to implement
redundant applications.
• The SNL debugging features will be improved.
(Joint development of SLAC and DESY)
• First production of a redundant IOC is foreseen for
the middle of 2008.
• An implementation based on / running on uTCA is desired.
Test System
The test system consists of two Compact PCI CPUs in
separate crates, each with redundant power supplies.
Thank you!