High-Availability Linux
Reliability, Availability, Serviceability

What are HA clusters?
High-availability (HA) clusters are groups of computers that support server applications that can be reliably utilized with a minimum of downtime.
They use redundant computers in the cluster to keep providing service when any system component fails.
Failover – the process by which an HA cluster detects a hardware or software fault and restarts the application on another system without requiring administrative intervention.
Emphasis on a layered approach to redundancy.

The primary software of Linux-HA is called Heartbeat.
No fixed limit on the number of nodes, allowing use with clusters of any size.
Parallel resource monitoring – resources run as they would on an ordinary system, but the cluster can shift them from one node to another if the initial node fails.
Automatically removes failed nodes from the cluster.
Integrates with many popular software packages, including Apache, DB2, Oracle and PostgreSQL.
Includes a GUI for easier control and monitoring of the cluster and its resources.

Heartbeat was originally capable of handling only two nodes at a time.
It did not include resource monitoring.
It later switched to a layered design implementing n-node clusters.
The project was then split into several separate packages:
Pacemaker – the cluster resource manager component, which handles resource management and node failure.
Heartbeat – now refers only to the messaging layer used for communication between the individual nodes of the cluster.
Resource Agents – a standardized interface for a cluster resource; each agent translates cluster operations into actions on its resource and reports whether each operation succeeded or failed.
Cluster Glue – a set of libraries, tools and utilities for use with Heartbeat and Pacemaker; it includes everything not covered by Heartbeat (messaging), Pacemaker (resource management) or Resource Agents (cluster operations).
Local Resource Manager – similar to Pacemaker, but exists solely for one node in the cluster, and is thus not “aware” of the status of the rest of the cluster.
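Failover depends on noticing that a node has stopped responding. The sketch below is a simplified model of heartbeat-based failure detection, not Heartbeat's actual implementation. It assumes that each node sends periodic heartbeats, and that a node whose last heartbeat is older than a hypothetical `deadtime` threshold is declared failed and dropped from the membership. The class and method names (`Membership`, `heartbeat`, `check`, `members`) are illustrative only, and times are plain numbers in seconds rather than readings from a real clock.

```python
from dataclasses import dataclass, field


@dataclass
class Membership:
    """Hypothetical membership tracker for heartbeat-based failure detection."""

    deadtime: float  # seconds of silence after which a node is declared dead
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        # Record the time of the most recent heartbeat received from `node`.
        self.last_seen[node] = now

    def check(self, now: float) -> list[str]:
        # Nodes silent for longer than `deadtime` are treated as failed and
        # removed from the membership automatically.
        failed = sorted(n for n, t in self.last_seen.items() if now - t > self.deadtime)
        for node in failed:
            del self.last_seen[node]
        return failed

    def members(self) -> list[str]:
        # Nodes currently believed to be alive.
        return sorted(self.last_seen)


m = Membership(deadtime=10.0)
m.heartbeat("node1", 0.0)
m.heartbeat("node2", 0.0)
m.heartbeat("node1", 8.0)   # node2 falls silent after t = 0
print(m.check(now=12.0))    # ['node2']  (silent for 12 s > 10 s)
print(m.members())          # ['node1']
```

Passing timestamps in explicitly, rather than reading a clock, keeps the sketch deterministic. A real messaging layer would also have to cope with lost packets and network partitions, which this model ignores.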
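The Resource Agents idea (one standard interface for every resource, with success or failure judged from the result) can be sketched as an abstract class. The operation names `start`, `stop` and `monitor`, and the exit codes 0 (success), 3 (unimplemented action) and 7 (not running), follow the OCF resource agent convention used by Linux-HA agents, which the slides above do not name. Real agents are usually shell scripts; the Python classes here (`ResourceAgent`, `DummyService`) are hypothetical.

```python
from abc import ABC, abstractmethod

# Exit codes taken from the OCF resource agent convention (subset only).
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


class ResourceAgent(ABC):
    """Hypothetical base class: every resource exposes the same operations,
    and the cluster judges success or failure from the returned code alone."""

    @abstractmethod
    def start(self) -> int: ...

    @abstractmethod
    def stop(self) -> int: ...

    @abstractmethod
    def monitor(self) -> int: ...

    def run(self, operation: str) -> int:
        # Translate a generic cluster operation into the agent-specific method.
        handlers = {"start": self.start, "stop": self.stop, "monitor": self.monitor}
        handler = handlers.get(operation)
        if handler is None:
            return OCF_ERR_UNIMPLEMENTED
        return handler()


class DummyService(ResourceAgent):
    """Hypothetical in-memory resource standing in for, e.g., a web server."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> int:
        self.running = True
        return OCF_SUCCESS

    def stop(self) -> int:
        # Stopping an already-stopped resource still counts as success.
        self.running = False
        return OCF_SUCCESS

    def monitor(self) -> int:
        return OCF_SUCCESS if self.running else OCF_NOT_RUNNING


svc = DummyService()
for op in ("monitor", "start", "monitor", "stop", "reload"):
    print(op, svc.run(op))
```

Because `stop` succeeds even when the resource is already stopped, the cluster can safely repeat it. This mirrors the OCF expectation that stopping an already-stopped resource is not an error.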
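The split between Pacemaker and the Local Resource Manager can be illustrated with a toy model. One cluster-wide manager holds the global view of which resource runs where, while each per-node manager only starts and stops resources on its own node. When a node fails, the cluster manager removes it and restarts its resources on the least-loaded surviving node. The placement rule, the class names (`ClusterResourceManager`, `LocalResourceManager`) and the resource names are assumptions made for the example; Pacemaker's real placement logic (constraints, scores, stickiness) is far richer.

```python
class LocalResourceManager:
    """Hypothetical per-node manager: acts only on its own node and knows
    nothing about the rest of the cluster."""

    def __init__(self, node: str) -> None:
        self.node = node
        self.running: set[str] = set()

    def start(self, resource: str) -> None:
        self.running.add(resource)

    def stop(self, resource: str) -> None:
        self.running.discard(resource)


class ClusterResourceManager:
    """Hypothetical cluster-wide manager playing Pacemaker's role: it holds
    the global view and decides where each resource should run."""

    def __init__(self, nodes: list[str]) -> None:
        self.lrms = {n: LocalResourceManager(n) for n in nodes}
        self.placement: dict[str, str] = {}  # resource -> node

    def _least_loaded(self) -> str:
        # Surviving node with the fewest resources; ties broken by node name.
        # Raises ValueError if no nodes remain.
        return min(self.lrms, key=lambda n: (len(self.lrms[n].running), n))

    def add_resource(self, resource: str) -> str:
        node = self._least_loaded()
        self.lrms[node].start(resource)  # delegate the actual start to the LRM
        self.placement[resource] = node
        return node

    def node_failed(self, node: str) -> dict[str, str]:
        # Drop the failed node, then restart each of its resources elsewhere
        # without administrative intervention (failover).
        lost = self.lrms.pop(node).running
        moved: dict[str, str] = {}
        for resource in sorted(lost):
            target = self._least_loaded()
            self.lrms[target].start(resource)
            self.placement[resource] = target
            moved[resource] = target
        return moved


crm = ClusterResourceManager(["node1", "node2", "node3"])
for r in ("apache", "postgresql", "virtual-ip"):
    crm.add_resource(r)
print(crm.node_failed("node1"))  # {'apache': 'node2'}
```

Keeping the per-node managers ignorant of each other means only the cluster manager needs a consistent global view; each node simply executes the operations it is told to run.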