Advanced I/O Techniques for Efficient and Highly Available Process Crash Recovery Protocols Thesis Presentation Jason Cornwell 03/15/2011 Agenda • • • • • • • • Introduction Challenges Pertinent Background Proposed Techniques Implementations Experimental Setup & Results Conclusions Future Work Computing Intensive Applications Network Centric Services Recent Advances Motivation & Goals Demand for more computing power and high-bandwidth network connections Advances in Microprocessors and Networks Parallel Computing Performance and Scalability Reliability and Availability Simplicity and Accessibility Agenda • • • • • • • • Introduction Challenges Pertinent Background Proposed Techniques Implementations Experimental Setup & Results Conclusions Future Work Reliability Problems Large numbers of CPUs, Memory Modules, Hard Disk Drives, Network Interfaces, Network Switches Low Mean-Time-To-Failure (MTTF) and/or High Failure-In-Time (FIT) Classification of Failure • Transient Failure – Power glitch – System patch and reboot – ECC trap • Partial “Permanent” Failure – Disk failure – Partial network failure • Wholesale “Permanent” Failure – Total hardware failure – Natural disaster Availability Problems Large numbers Processes, Threads, Software Barriers, Busy Waiting Temporarily Unresponsive and/or Unavailable Agenda • • • • • • • • Introduction Challenges Pertinent Background Proposed Techniques Implementations Experimental Setup & Results Conclusions Future Work Possible Solutions • Transient Failure – Restart/replay/resume on the same node – Task-migration is possible • Permanent Partial Failure – Rebalance the workload on surviving nodes – Partial task-migration is needed • Permanent Wholesale Failure – Reconfigure the applications and services – Massive task-migration to new platform Checkpointing • Common feature in high-performance computing (HPC) platforms • Saves the execution state • Application or system-level • Mechanism for task migration Application vs System Level • Application-level Recovery Point – Developed application specific – Generally smaller footprint – Data accessiblity restrictions • Kernel-level Recovery Point – Snapshot processes – Full resource restoration – Flexibility due to system level preemption Berkeley Labs Checkpoint/Restart • • • • • • System-level Kernel-module Checkpoint creation implemented Process recovery implemented Linked to BLCR libraries at execution Stores checkpoint data locally (stack, heap, registers, signals, etc.) Agenda • • • • • • • • Introduction Challenges Pertinent Background Proposed Techniques Implementations Experimental Setup & Results Conclusions Future Work Contribution • Enhanced BLCR performance through latency tolerant technique • Increased BLCR availability through novel checkpoint creation technique I/O Optimization • Avoided extreme modification to BLCR • Reduce the disk latency of checkpoint creation • Implemented a caching technique • Improved I/O performance 4-fold or more • System overhead less than 300KB in experimental test results Checkpoint Caching • Buffer used as temporary storage • Storage block flushed in large volume • Trade-off between resource consumption and improved I/O efficiency cr_copy(chkptData, count) if(chkptBuf is NULL) kmalloc size of count for chkptBuf space; copy chkptData into chkptBuf; else kmalloc size of count + chkptBuf size for tempBuf space; copy chkptBuf into tempBuf; krealloc chkptBuf for its expanded size; memmove tempBuf into chkptBuf; kfree memory for tempBuf; end if Optimized Write Operation Remote Checkpoint • BLCR is limited to local disk storage • Remote checkpoint offers off-site storage option • Uses sockets to transmit data • Needs predefined destination • Outperforms BLCR in some experimental tests Remote Checkpoint Server • Single thread daemon • Used GCC compiler • Stores the recovery point external to the client node • Could be ported to Microsoft derivative while(true) create socket; bind to address; listen for incoming connections; wait for client to connect; create file descriptor; while(data buffered received) write checkpoint data; close file descriptor; close socket; Modified Write Operation • TCP packets • MTU must be reached before delivery • Only modification is to the write operation of BLCR if(remote chkpt) if(socket is NULL) create socket; establish connection, if handshake fails break and perform the original_chkpt; end if package checkpoint data; send data message; end if if(original_chkpt) original BLCR write operation; end if Agenda • • • • • • • • Introduction Challenges Pertinent Background Proposed Techniques Implementations Experimental Setup & Results Conclusions Future Work Design I/O Optimization Write write(chkptData, count) if(chkptBuf has space for the incoming chkptData) cr_copy(ckptData, count); else vfs_write(chkptBuf); vfs_write(chkptData); kfree(chkptBuf); end if Remote Checkpoint Write Agenda • • • • • • • • Introduction Challenges Pertinent Background Proposed Techniques Implementations Experimental Setup & Results Conclusions Future Work Experimental Setup I/O Optimization Remote Checkpoint • • • • Dell Workstation, 3.06 GHz Intel Pentium 4, 1 GB Memory, 5,400 RPM Hard Disk, Linux 2.6 BLCR Implementation Optimized BLCR (O-BLCR) Implementation • • • • Dell PowerEdge 700, 2.80 GHz Dual-processor Intel Pentium 4, 3 GB Memory, 5,400 RPM Hard Disk, Linux 2.6 Dell Workstation, 3.06 GHz Intel Pentium 4, 1 GB Memory, 5,400 RPM Hard Disk, Linux 2.6 BLCR Implementation BLCR with NFS (BLCR+NFS) BLCR with our Remote Checkpoint Technique (BLCR+R) Benchmarks Resource Utilization Program • • • • Benchmark CPU Memory I/O TSP High Low Low AES High Low Medium GE Low High High Medium Medium Medium NP-Complete HC Data Encryption Linear Equation Solver File Compression I/O Optimization Results Remote Checkpoint Results Agenda • • • • • • • • Introduction Challenges Pertinent Background Proposed Techniques Implementations Experimental Setup & Results Conclusions Future Work Conclusion • Minimal modification to BLCR • I/O optimization technique reduced the write latency of BLCR • Remote checkpoint increases BLCR availability with new feature • These techniques should be deployed into the foundation of BLCR source code Agenda • • • • • • • • Introduction Challenges Pertinent Background Proposed Techniques Implementations Experimental Setup & Results Conclusions Future Work Future Work • Server authentication protocol • Data packet encryption • Automated process load balancing Questions