Building Reliable Services Using Backdoors

Stephen Smaldone
Department of Computer Science
Rutgers University

Planetary-Scale Services
  [Figure: Service.com sites at 9:00pm EST, 2:00am GMT, and 11:00am JST; labels: Internet, Attacks, Failure, Frustration, Scalability]
  Human operators, phone calls, and emails are hard to scale
  Cost of ownership dramatically exceeds cost of systems

The Dream: A Defensive Architecture
  [Figure: BD-equipped nodes behind gateways, joined by a private network, at sites in 9:00pm EST, 2:00am GMT, and 11:00am JST; labels: Internet, Attacks, Failure]

Possible Healing Actions
  Refresh the state (reboot)
    Destructive and disruptive
  Repair the state (continue)
  Recover the state (transfer)
  How to access the memory of the failed system when the OS is “hung”?

The Motivating Philosophy
  Something is better than nothing
  Faster is better than slower
  Repairing state is faster than repairing software
  It is hard to corrupt or stop an outsider
  Save application state if possible
  Remote healing is better than self-healing
  Attackers and faults are becoming “smarter”
  Try a “holistic” approach if nothing else

The Backdoor (BD)
  Backdoor: a hidden software or hardware mechanism, usually created for testing and troubleshooting
    -- American National Standard for Telecommunications

Backdoor Design Principles
  1. Availability
     BD must be highly available (even when OS is not)
  2. Non-intrusiveness
     BD operations must not involve local OS (zero-overhead monitoring)
  3. Integrity
     OS cannot alter BD execution or modify the result of a BD operation
  4. Responsiveness
     A BD operation cannot be delayed indefinitely

Possible Backdoor Implementations
  A programmable network interface (I-NIC)
    Our current prototype is on Myrinet
  A virtual machine over a VMM
    Work in progress over Xen
  IBM’s Remote Supervisor Adapter?
  HP’s Remote Management Adapter?

Backdoor as Building Block
  Remote Healing Systems
    A computer system monitors/repairs/recovers the state of a remote system through the backdoor
    Backdoor is controlled by the remote OS
  Defensive Architectures
    Backdoors are programmed to execute defensive tasks, stand-alone or cooperatively over a private network
    Standalone backdoor

Outline
  Introduction
  Backdoor Idea
  Remote Healing
  Defensive Architectures
  Conclusions

Remote Healing
  Backdoor prototyped on I-NIC (Myrinet)
  Remote Repair of OS State
  Remote Recovery for Cluster-Based Internet Servers

Backdoor on I-NIC
  [Figure: CPU and memory reached through the NIC (“front door”) and through the I-NIC (backdoor) attached to a private network]
  Backdoor provides an alternative access to system memory without involving local CPU/OS
  Private network over a specialized interconnect, VPN, or even over a phone link!

A Remote Healing Architecture
  [Figure: monitor system and target system, each with CPU, memory, I/O, and BD]

Backdoors Use Remote Memory Communication
  [Figure: target memory and monitor memory linked through the BDs; labels: Monitor (RemoteRead), Recovery/Repair (RemoteRead/Write)]

Remote OS Locking
  Implemented by a BD-OS protocol
  Two functions
    Provides exclusive access to target OS data for state repair
    Enforces fail-stop model in the recovery case to avoid the consequences of false positives in failure detection
  Can be avoided?
    Yes for monitoring

OS Support for Remote Healing
  Monitoring and Failure Detection
    Sensor Box: system health indicators (sensors) provided by the target OS in its local memory
    Sensors: <UniqueID, Type, Threshold, Value>
  Repairing
    Externalized State: OS state data that the BD can read
    Remote Access Hooks: OS control data that the BD can write to perform repairing actions
  Recovery
    Continuation Box: fine-grain OS and application checkpoint state that the BD can transfer between systems to migrate running applications

Sensor Box (SB)
  Collection of health indicators (sensors) in the target OS memory
    <ID, Type, Threshold, Value>

  Sensor Type    Threshold
  Progress       Update deadline
  Level          Max/Min value
  Pressure       Max number of events

Failure Detection Using Sensor Box
  Target OS updates progress sensors in SB continuously
  Monitoring thread reads SB periodically and checks counters
  Failure = counter stalled beyond its deadline
  False positive rate vs. detection latency tradeoff
  [Figure: target OS writes Sensor Box counters (<Timer interrupts>, <Context switches>, <NIC interrupts>, …); monitor reads them through the backdoor]

Monitoring and Detection Using BD
  [Figure: monitor's remote view of the target's Sensor Box through the BDs; label: Detection]

Diagnosis and Repairing
  Diagnosis
    Inspect live OS data structures in target’s memory (through the externalized state)
    Identify damaged OS state (e.g. resource exhaustion due to memory-hogging processes)
  Repairing
    Modify target OS memory (through remote access hooks) to correct damaged state (e.g. remove a memory-hogging process by “injecting” a kill signal in its process control block)

Diagnosis Using BD
  [Figure: fine-grained diagnosis view of the target's externalized state through the BDs]

Repair Using BD
  [Figure: correct state written to the target's repair hook through the BDs]

Case Study: Repairing OS State
  Damaged OS state: resource exhaustion, corrupted data structures, compromised OS, etc.
  Resource exhaustion
    Attack, overload, system misconfiguration, programming error
    Repairing cannot rely on local resources
  Two examples
    Fork bomb
    Memory hog

Case Study: Memory Hog
  Program allocates memory in an infinite loop
  Both memory and swap space are occupied by the memory hog
  System is inaccessible from console or the network
    Cannot spawn new processes
    Cannot handle interrupts
  Local daemons cannot repair system

Remote Repairing in Case of Memory Hogging
  Monitoring
    Pressure sensor signals when severe low memory condition is detected
  Diagnosis
    Target externalizes process table and process memory usage statistics
    Monitoring thread identifies the culprit
  Repairing
    Monitoring thread kills culprit by remotely posting a SIGKILL

Prototype
  BD implemented on Myrinet LanaiX NIC
    Modified firmware and low-level GM library
  Modified FreeBSD 4.8 kernel
  Experimental setup
    Dell PowerEdge 2600 servers with 2.4 GHz dual Intel Xeon, 1GB RAM, 2GB swap, Myrinet LanaiX NIC
    Benchmark: simple counting program with fixed number of iterations

Effectiveness of Remote Repairing
  [Figure: execution time (s), 0 to 20, vs. number of memory hog processes, 0 to 16; series: Impaired system, With remote repair]

Repairing Timeline
  [Figure: timeline, time axis 0 to 3 s; labels: Memory pressure, Detection, Diagnosis & Repair (Remote Repair), Local cleanup of damaged state, End of repair]

Remote Healing
  Backdoor prototype using Myrinet
  Remote Repair of OS State
  Remote Recovery for Cluster-based Internet Servers

Clusters with BD Network
  [Figure: cluster nodes joined by an interconnect; node labels: P, M, I/O, BD, T]

Cluster-based Internet Services with BD Network
  [Figure: clients, servers, and monitors]

Continuation Box (CB)
  Idea
    Define per client-session state (OS and application)
    Transfer client sessions from the failed system to other systems in the cluster running the same server application
  CB encapsulates the state of a client session associated with a server application (possibly multi-process)
    OS state (data in transit through IPC channels)
    Application-specific state (periodically exported/checkpointed by the application)

Continuation Box Extraction
  [Figure: Continuation Box extracted through the BDs from the memory of the victim machine (crashed) into the memory of the recovery machine (healthy) as recovered state]

Client-Session Continuation Box for Multi-Process Servers
  [Figure: Client 1 and Client 2 connected over TCP/IP to Process 1, which is linked to Process 2 by IPC; CB1 and CB2 each hold app. state and comm. state]

Continuation Box API
  create_cb for a client session
    export application state to CB
    associate I/O channel with the CB
  open_cb given an I/O channel
    import application state from CB

Changes to Make Server Recoverable

    while (cid = accept()) {
        cbid = create_cb(cid)                              /* CB for this client session */
        if (import(cbid, &{file_name, offset}) == NULL) {  /* no saved state: start a new session */
            receive(cid, file_name)
            offset = 0
        }
        fd = open(file_name)
        seek(fd, offset)                                   /* resume from the last exported offset */
        while (read(fd, block, size) != EOF) {
            send(cid, block, size)
            offset += size
            export(cbid, {file_name, offset})              /* checkpoint progress into the CB */
        }
    }

State Synchronization Problem
  Application state (SB_APP) updated only upon export
  OS state (SB_IO) updated continuously by the OS kernel
  How to synchronize the two components of the CB?
  [Figure: application export and import steps against the OS; labels: SB_APP, SB_IO, A1, A2]

CB-based Recovery
  Log-based rollback recovery
    OS keeps communication logs (send/receive)
    Restores server state with respect to a client
    0-copy using the communication buffers
  After migration, OS replays send/receive operations from logs
    Transparent to server and client applications

Backdoors Prototype
  Myrinet LanaiX NIC as backdoor
  Modified FreeBSD kernel
    In-kernel remote read/write operations
    Sensor Box, Continuation Box
  Modified server applications
    Apache, Flash, Icecast, JBoss

Case Study: A Multi-tier Auction Service
  Front-End (FE): Apache web server
  Middle Tier (MT): JBoss app. server
  Back-End: MySQL DB server
  Recoverable RUBiS

Experimental Evaluation
  Experimental setup
    Dell PowerEdge 2600 servers, 2.4 GHz dual Intel Xeon, 1GB RAM, 1Gb Ethernet
    Workload modeled after TPC-W
  Fault injection in FE and MT nodes
    Synthetic freeze, emulated freeze by remote OS locking, bugs inserted in network drivers
  Evaluation
    Low overhead under load
    Recovery is fast

Low Overhead under Load
  [Figure: requests/min, 0 to 8,000, vs. number of clients, 20 to 1,100; series: Base, Recoverable FE, Recoverable FE+MT]

Recovery is Fast
  [Figure: recovery timeline, time axis 0 to 30 ms; labels: Failure, Detection latency, Detection, Import CB, Recovery latency, Recovery ends]

Outline
  Introduction
  Backdoor Idea
  Remote Healing Experience
  Defensive Architectures
  Conclusions

Autonomous Backdoor
  BD is programmed to execute defensive tasks, then “sealed”

Defensive Architecture Hierarchy
  Defensive Computer Architecture (DCA)
    Individual computers equipped with BD
    BD performs local defensive tasks (e.g. OS state inspection)
  Defensive Network Architecture (DNA)
    Cluster nodes equipped with BDs connected over high-speed private network
    BDs perform defensive tasks cooperatively (e.g. OS integrity checking, continuous remote logging)
  Defensive Inter-Network Architectures (DINA)
    Loosely coupled DNAs connected over the Internet or other networks
    DNAs cooperate (e.g. early warnings of virus attacks)

Defensive Inter-Network Architecture over PlanetLab (new project)
  [Figure: BD-equipped nodes behind gateways, joined by a private network, at sites in 9:00pm EST, 2:00am GMT, and 11:00am JST; labels: Internet, Attacks, Failure]

Local Memory Inspection (Work in Progress)
  Orion - Holistic Approach to System Failure Prediction
    Identify kernel memory update patterns and correlate them to predict unstable system states

Related Work
  DEC WRL Titan system [’86]
  Recoverable OS subsystems
  Rio reliable file cache [Chen ’96]
  Recovery Box [Baker ’92]
  Defensive Programming [Qie ’03]
  Nooks [Swift ’04]
  Recovery Oriented Computing [Patterson ’02]
  Microreboot [Candea ’04]
  TCP Connection Failover [Snoeren ’01, Sultan ’01, Alvisi ’01, Koch ’03, Mishra ’03, Zagorodnov ’03]
  Automatic repair of data structures [Demsky ’03]
  K42 [Soules ’03]
  Hypervisor-based fault tolerance [Bressoud ’95]

Conclusions
  The Backdoor is a promising building block for remote healing and defensive architectures
  Feasibility studies for Remote Repairing and Remote Recovery using I-NIC-based Backdoor prototype
  Current work includes Defensive Architectures and Orion

People and Money Behind Backdoors
  Liviu Iftode
  Florin Sultan
  Aniruddha Bohra
  Pascal Gallard (INRIA/IRISA, France)
  Iulian Neamtiu (University of Maryland)
  Yufei Pan
  Arati Baliga
  Tzvika Chumash
  NSF CAREER CCR-0133366

Thank You!
  http://discolab.rutgers.edu/bda

Yes, BD Security! (work in progress)
  BD under OS control
    Access to remote memory controlled through memory registration (established at initialization time)
    Voting scheme for remote writes (delayed writes)
    BDs monitor each other and the integrity of their OSes
  Autonomous BD
    OS cannot access BD memory after initialization (possible with PCI Express)

Local Memory Inspection (Work in Progress)
  Kernel Integrity Monitoring & Healing
    Search for kernel rootkits in
      individual kernel functions
      kernel tables, e.g. syscall
      dynamic structures, e.g. the process table, etc.
    Repair the kernel when compromised
      Replace tampered tables with clean versions
      Replace corrupt versions of kernel functions with clean ones
  Holistic Approach to System Failure Prediction
    Identify kernel memory update patterns and correlate them to predict unstable system states
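
Sketch: replacing tampered table entries
  The "replace tampered tables with clean versions" step under Kernel Integrity Monitoring & Healing can be pictured as a compare-and-restore pass over a table. What follows is a minimal simulation in Python, not the prototype's code. The table is an ordinary list of integers standing in for handler addresses, the addresses are made up, and the names find_tampered and repair_table are hypothetical.

```python
from typing import List


def find_tampered(live: List[int], clean: List[int]) -> List[int]:
    """Return the indices where the live table differs from the clean copy."""
    if len(live) != len(clean):
        raise ValueError("live and clean tables must have the same length")
    return [i for i, (cur, ref) in enumerate(zip(live, clean)) if cur != ref]


def repair_table(live: List[int], clean: List[int]) -> List[int]:
    """Overwrite tampered entries in place and return the repaired indices."""
    tampered = find_tampered(live, clean)
    for i in tampered:
        live[i] = clean[i]
    return tampered


# Hypothetical handler addresses; entry 2 has been redirected.
clean_table = [0x1000, 0x1040, 0x1080, 0x10C0]
live_table = [0x1000, 0x1040, 0xDEAD, 0x10C0]

repaired = repair_table(live_table, clean_table)
print("repaired entries:", repaired)
```

  In the setting of the slides, the live table would be read from kernel memory and the clean copy would presumably be kept where the OS cannot modify it. Here both are in-process lists, so the sketch shows only the comparison and restore logic.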