Fault-Tolerant General Purpose Servers
Express5800/ft Series Servers: Product Information
© NEC Corporation 2013

Express5800/ft Series Servers: High Availability Technologies

Approaches to Reliability and Availability
▐ Select and combine hardware and software technologies for availability
• Single server (typical servers): partially redundant hardware (e.g. HDD, PSU).
• Fault tolerant server: enhances the fault tolerance of the hardware, giving higher availability of a single server. Its redundant hardware (dual modular architecture) provides:
  - Continuous operation despite hardware failures
  - Simplified installation and operation
• Cluster software: enhances the availability of the system through failover across multiple servers, providing:
  - Enhanced resilience to HW/SW failures
  - Suitability for large-scale systems with scalable nodes, etc.
• FT server cluster (FT server + cluster software): combines both approaches for higher availability of the system.
Select the best availability solution according to system requirements.

FT Server and Cluster Solution Comparison
• Aim
  - Fault tolerant server: High availability of a single server
  - Cluster system: Availability, scalability, and load balancing
• Technology
  - Fault tolerant server: Lockstep (CPU and memory) and failover (I/O); the modules are synchronized under normal conditions
  - Cluster system: Failover and load balancing
• Failure process
  - Fault tolerant server: Isolate the faulty component
  - Cluster system: Fail over to other servers
• Service during failure
  - Fault tolerant server: Continuous operation (no interruption)
  - Cluster system: Operation is interrupted for the failover process (from several minutes to 10 minutes)
• Resilience
  - Fault tolerant server: Hardware failures
  - Cluster system: Hardware and software failures
• Performance enhancement
  - Fault tolerant server: Add CPUs
  - Cluster system: Add CPUs or nodes; supports servers with 4 or more sockets
• Supported apps
  - Fault tolerant server: General applications; no modifications needed
  - Cluster system: Failover settings are required for each app (creation of script batch files)
• Summary
  - Fault tolerant server: System configuration requires no app modifications; continuous operation without interruption; ideal for 24/7 systems, email and Web servers
  - Cluster system: Features load balancing as well as availability; resilient to software failures; suitable for large-scale systems (scalable nodes)
ft servers provide hardware availability and can be installed quickly and easily. An ft server + EXPRESSCLUSTER solution takes advantage of both.

Recovery Process from HW Failures
Express5800/ft series server: continuous operation, non-stop service
1. Instantaneous isolation of the faulty module: when one module (#0 or #1) fails, it is isolated immediately and the other module keeps processing, so the service stays up.
2. Resynchronization after replacement: once the faulty module is replaced, the two modules are resynchronized and lockstep processing resumes. Recovery is complete without any service interruption.

Cluster system: the system is down for a few minutes to 10 minutes (service intermittence) while the failover runs:
1. Interruption (a few secs to 1-2 mins)
2. Determine the failover host (a few secs)
3. Take over cluster resources, e.g. network settings and disks (a few secs to 1 min)
4. Restart apps (a few secs to a few mins)
When failover is complete the service restarts, and the failed server is then repaired or replaced.

Express5800/ft Series Servers: Optional Features to Increase Fault Tolerance
Express Report Service Support
• Isolate the failed components to continue operation.
• Monitor hardware status at the service center.
• Support the system proactively to ensure continuous availability.
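Express Report Service forwards hardware alerts to NEC's service center and, per the document, sends out only the alert information (by e-mail via the Internet or over a modem line). The sketch below illustrates that "alert-only" principle by building an e-mail that carries nothing but a few hardware-event fields. It is a minimal sketch under assumptions: `HardwareAlert`, `build_alert_message`, the field names, and the addresses are hypothetical, and this is not the message format or software that Express Report Service actually uses. It only uses Python's standard `email.message.EmailMessage` and does not send anything.

```python
from dataclasses import dataclass
from email.message import EmailMessage


@dataclass(frozen=True)
class HardwareAlert:
    # Hypothetical alert fields; the real Express Report format is not described in this document.
    server_id: str
    module: str      # e.g. "Module #1"
    component: str   # e.g. "Memory"
    event: str       # short description of the detected fault
    isolated: bool   # True if the faulty component was already isolated


def build_alert_message(alert: HardwareAlert, sender: str, recipient: str) -> EmailMessage:
    """Build an e-mail that carries only alert information, no application or customer data."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[HW ALERT] {alert.server_id}: {alert.component} on {alert.module}"
    status = "isolated, operation continuing" if alert.isolated else "not isolated"
    msg.set_content(
        f"server: {alert.server_id}\n"
        f"module: {alert.module}\n"
        f"component: {alert.component}\n"
        f"event: {alert.event}\n"
        f"status: {status}\n"
    )
    return msg


if __name__ == "__main__":
    alert = HardwareAlert("ft-server-01", "Module #1", "Memory", "uncorrectable error", True)
    message = build_alert_message(alert, "ft-server-01@example.com", "service-center@example.com")
    print(message["Subject"])
    print(message.get_content())
```

Handing the message to a mail server (for example with Python's `smtplib`) is deliberately left out; the point is that the payload is limited to a handful of status fields.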
Express Report Service notification flow
• Failure and isolation: the faulty component (CPU, memory, or HDD) in one module is isolated, and the client's system continues operating on the other module.
• Hardware monitoring and detection: dedicated software detects the failure and sends out only the alert information (secure environment).
• Alert notification: the alert reaches the NEC monitoring center / NEC Service Center via the Internet (mail server) or a public line (modem connection).
• Replacement and recovery: the faulty component is replaced and the system recovers.

Support for Redundant Peripheral Devices
▐ Selection of LTO or DAT and support for redundant backup*
◆ A double backup configuration is supported to guard against failures during backup.
◆ LTO or DAT drives are offered.
In the ft series, each module (#1 and #2) has its own SAS controller and backup device. Data is output from each module to achieve backup redundancy, and both backups are created almost simultaneously.
* A standalone backup configuration is also supported.

▐ A two-UPS configuration provides tolerance against UPS failures*
The PSU of each module is connected to its own uninterruptible power supply. Connecting each UPS to a separate power source helps avoid being affected by a failure of one power source.
* A single-UPS configuration is also supported.
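The dual-backup and two-UPS configurations rest on the same idea: the function survives as long as at least one of the redundant units works. Assuming the units fail independently (which is why connecting each UPS to a separate power source matters) and each has availability a, the group is unavailable only when all n units are down at once, so A = 1 - (1 - a)^n. The sketch below computes this; the 0.99 figure in the example is a hypothetical value for illustration, not an NEC specification.

```python
def redundant_availability(unit_availability: float, units: int = 2) -> float:
    """Availability of n redundant units in parallel, assuming independent failures.

    The group is down only if every unit is down at the same time.
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if units < 1:
        raise ValueError("need at least one unit")
    return 1.0 - (1.0 - unit_availability) ** units


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per 365-day year for a given availability."""
    return (1.0 - availability) * 365 * 24 * 60


if __name__ == "__main__":
    single = 0.99  # hypothetical availability of one UPS, for illustration only
    pair = redundant_availability(single, 2)
    print(f"single UPS: {single:.4f}, {downtime_minutes_per_year(single):.0f} min/year down")
    print(f"two UPSs:   {pair:.4f}, {downtime_minutes_per_year(pair):.0f} min/year down")
```

If both UPSs were fed from the same power source, the independence assumption would break down: a single outage would take out both units, and the pair would behave much like one unit.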
The UPS is controlled through the network.

ft series + EXPRESSCLUSTER for Higher Availability (Enhancement SW)
▐ Clusters with ft servers enhance both HW and SW availability
• Hardware failure: handled inside the primary ft server by its redundant modules (#0 and #1).
• Software failure: EXPRESSCLUSTER monitors the software (apps and OS) and fails over to the secondary ft server.
This combination offers the highest level of availability and is suitable for critical systems.

Benefits of ft Series + EXPRESSCLUSTER (Enhancement SW)
▐ Clusters using ft servers deliver the benefits of both solutions
Three configurations are compared: an Express5800/ft server, a cluster system configured with normal servers, and a cluster system configured with ft servers.
• Technology
  - ft server: Lockstep and failover (within a server)
  - Cluster (normal servers): Failover (between multiple servers)
  - Cluster (ft servers): Failover (between multiple servers)
• HW failure tolerance
  - ft server: ★★★ Isolate the faulty module within the server; instantaneous
  - Cluster (normal servers): ★★☆ Failover from the primary server to the secondary server; a few minutes (depends on the time needed to start up apps)
  - Cluster (ft servers): ★★★ Isolate the faulty module within the primary server (no failover between nodes); instantaneous
• SW failure tolerance
  - ft server: Not covered by the ft server alone (app-level failures can be resolved by SingleServerSafe software)
  - Cluster (normal servers): ★★☆ Failover from the primary server to the secondary server; several minutes (depends on the time needed to start up apps)
  - Cluster (ft servers): ★★☆ Failover from the primary server to the secondary server; several minutes (depends on the time needed to start up apps)
• Periodic maintenance (SW update)
  - ft server: ★★☆ Active Upgrade enables OS patches to be applied with only a short interruption
  - Cluster (normal servers): ★★★ Each node can be separated for upgrade
  - Cluster (ft servers): ★★★ Each node can be separated for upgrade
• Performance enhancement
  - ft server: ★★☆ Add CPUs
  - Cluster (normal servers): ★★★ Add CPUs or nodes
  - Cluster (ft servers): ★★☆ Add CPUs
• App settings
  - ft server: ★★★ General apps can be used without special modifications
  - Cluster (normal servers): ★☆☆ A takeover process is required for each app
  - Cluster (ft servers): ★☆☆ A takeover process is required for each app
Legend: ★★★ Excellent, ★★☆ Good, ★☆☆ Fair

ft server + Hyper-V + EXPRESSCLUSTER (Enhancement SW)
▐ Clusters configured on Hyper-V on an ft server
• Hardware failure: handled by the ft server's redundant modules (#0 and #1).
• Software failure: EXPRESSCLUSTER, running in each guest OS on Hyper-V™ 2.0, monitors the software; in the event of a software failure, operation fails over to another guest OS.
This provides high HW and SW availability for virtualized environments.

ExpressCluster X SingleServerSafe (Enhancement SW)
▐ Software is monitored on the ft server and automatically restarted in the event of a failure.
◆ SingleServerSafe (SSS) monitors the server and software status at all times.
◆ In the event of a failure, SSS restarts the service or process, or reboots the OS, to resume operation.
◆ Together, the ft server and SSS can handle both HW and SW failures.
By detecting failures and restarting services or rebooting, SSS helps handle a wide range of failures with a single server. With the optional monitoring functions of EXPRESSCLUSTER, SSS can perform more detailed monitoring, including detecting stalls in databases.
Software availability can be improved even with a single ft server.
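SingleServerSafe is described as detecting failures and then restarting the service or process, or rebooting the OS. One common way to organize such a recovery policy is an escalation ladder: try the least disruptive action first and escalate only when failures repeat. The sketch below is a hypothetical illustration of that idea, not SSS's actual algorithm or configuration; `RecoveryPolicy`, `monitor`, the thresholds, and the action names are all assumptions, and the actions are only recorded, not executed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RecoveryPolicy:
    """Hypothetical escalation ladder: restart the process, then the service, then reboot the OS."""
    max_process_restarts: int = 3
    max_service_restarts: int = 1
    actions_taken: List[str] = field(default_factory=list)
    _process_restarts: int = 0
    _service_restarts: int = 0

    def on_failure(self) -> str:
        # Pick the least disruptive action whose budget is not yet used up.
        if self._process_restarts < self.max_process_restarts:
            self._process_restarts += 1
            action = "restart_process"
        elif self._service_restarts < self.max_service_restarts:
            self._service_restarts += 1
            action = "restart_service"
        else:
            # Budgets exhausted: every further consecutive failure requests a reboot.
            action = "reboot_os"
        self.actions_taken.append(action)
        return action

    def on_healthy(self) -> None:
        # A successful health check resets the escalation counters.
        self._process_restarts = 0
        self._service_restarts = 0


def monitor(checks: List[bool], policy: RecoveryPolicy) -> List[str]:
    """Feed a sequence of health-check results (True = healthy) through the policy."""
    for healthy in checks:
        if healthy:
            policy.on_healthy()
        else:
            policy.on_failure()
    return policy.actions_taken


if __name__ == "__main__":
    # Five consecutive failed checks, then one healthy check, then another failure.
    print(monitor([False] * 5 + [True, False], RecoveryPolicy()))
```

Counting consecutive failures keeps a single transient fault from triggering an OS reboot, while a persistent fault still escalates to the heavier actions.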