Cash: Distributed Cooperative Buffer Caching
Christopher DeCoro, Harper Langston, Jeremy Weinberger

What is Cash? Motivation and Background
• Cash: a distributed cooperative buffer cache system
• Problem: applications need to access large amounts of data
• Diverging trends in the performance of primary and secondary memory
  – RAM
    » Fast (6400 MB/sec)
    » Expensive (~$0.25/MB)
    » Limited (4 GB max on 32-bit machines; 2-3 GB practical max)
  – Magnetic storage
    » Inexpensive (~$0.001/MB)
    » Effectively unlimited (>250 GB/drive)
    » Relatively slow (30-60 MB/sec), high latency (2-6 ms/seek)
• Idea: leverage the strengths of each through networking
  – Fast compared to disks (125 MB/sec for Gigabit Ethernet)
  – Effectively unlimited number of connected hosts

Proposed Solution
• Based on traditional approaches: buffer cache management and disk paging
  – Try to keep frequently used blocks from slow memory in faster memory
  – When memory is full, page out to disk (slow when used frequently)
• Our solution: distributed cooperative buffer caching
  – Rather than access data from disk, we prefer to access it from the network
  – Instead of paging out to disk, page out to the network
  – Extra blocks can be sent to other computers and retrieved when needed
  – Essentially, we add an additional layer to the memory hierarchy:
    CPU → SRAM → DRAM → Network → Disk

Talk Outline
• Introduction
• Application Programming Interface
  – Principles and Goals
  – System Structure
  – Function-Call Interface
• System Implementation
• Experimental Validation
• Conclusion

Application Programming Interface
Principles and Goals:
• Should closely resemble current file-access methods
  – Open / Read / Write / Close
  – Transparent to applications
  – Could potentially be implemented in a standard Unix-like kernel
• Needs to be efficient
  – Cash does not provide additional functionality; its goal is to make a slow operation faster
• Needs to be portable and easily deployed
  – No special kernel support required
  – Little overhead for administrators
Application Programming Interface
System Structure (1):
• Cache servers
  – Provide memory services, allowing other machines to retrieve or store blocks
  – Run a Cache Manager process that maintains the cache on the local machine
• Cache clients
  – Access the memory resources provided by the cache servers
  – Also run a manager process, though their local cache may be very small compared to the cache servers'
  – In this sense, every client machine is also a server

Application Programming Interface
System Structure (2):
• Every instance of Cash has three major components
  – Client program
  – Application interface library
  – Cache Manager

Function-Call Interface
• copen - opens a file, returning a CFILE handle
  – CFILE * copen( char * filename, char * mode )
  – Starts the Cache Manager if it is not already running
  – Establishes communication between the current process and the Cache Manager process (uses Unix domain sockets for IPC)
• cread - reads data from a file
  – int cread ( void * buffer, int size, int n, CFILE * file )
  – Sends a datagram asking the manager for a block of a particular file
  – The manager replies with the offset of the data in a special cache file
  – cread maps that data into process memory using mmap
  – Buffers data to avoid IPC on every read
• cclose - closes the file and reclaims resources
  – void cclose ( CFILE * file )

Talk Outline (end of Chris's section)
• Introduction
• Application Programming Interface
• System Implementation
  – Local Cache Management
  – Global Cache Management
• Experimental Validation
• Conclusion

Local Cache Management
• Breaks disk files into a set of blocks, stored in two queues
  – Divided into master blocks and non-master blocks
    » Master: only copy in the cluster; evicted last
    » Non-master: duplicate copy, used for local access; evicted before masters
  – Once a queue is full, least recently used blocks are removed
  – We assume it is cheaper to evict non-masters, and re-request them over the network if necessary, than to re-read a master from disk
• All
blocks exist in a single system cache file
  – The cache file is mapped into memory
  – We use the GNU memory-mapped malloc library to manage storage within this file
  – This allows zero-copy transfer of data to the client address space
  – Block addresses are mapped to file offsets and sent to the client

Global Cache Management
• Each cache server has its own distinct set of master blocks
  – Communicates with other servers to access their blocks
  – Blocks fetched from other servers become non-master blocks
• Block locations are indicated with "hints"
  – When a block is loaded from disk, a hint message is broadcast indicating that that disk block is now on a specific server
  – When a block is needed, a server consults its hint table to contact the appropriate server and request the block
  – If the response is negative, or times out, the server loads the block from disk
• On eviction, blocks can be sent to other servers
  – Servers are selected at random
  – A server can accept or decline; on acceptance, it becomes the new owner of the block
  – If the old owner later receives a request for the block, it forwards the request to the new owner

Talk Outline
• Introduction
• Application Programming Interface
• System Implementation
• Experimental Validation
  – Application Test-Bench
  – Workload Generation
  – Test Setup
  – Experimental Results
• Conclusion

Application Test-Bench
• Distributed web server
  – With the explosive growth of the Internet, web site sizes are increasing exponentially
  – Need to serve very large amounts of data
  – Need to quickly service very large numbers of clients
• Based on the SPED (single-process event-driven) architecture
  – Uses libasync for asynchronous communication with clients
• Uses the standard Cash system interface
  – Accesses multiple cache servers to distribute the memory load
  – Ideally, we see fewer disk accesses, resulting in faster end-to-end times for clients

Workload Generation
• SURGE: Scalable URL Reference Generator
  – Allows us to create synthetic workloads
  – Shown to
probabilistically model real-life web access patterns
  – Creates a specified number of client processes and threads that run for a set amount of time
• Test sequences
  – Several independent variables affect web server performance:
    » Workload size
    » Number of clients
    » Number of cache servers
    » Total cache size
  – Important metrics of system performance include:
    » Mean page access time
    » Average requests handled per second
    » Average bytes transferred per second

Test Setup
• Server configurations
  – 1-server, 2-server, and 4-server, each machine running both a web server and a cache server
  – A 0-server (non-Cash) configuration is used as a control
  – Two client machines make requests of the servers
• Tests vary workload size, number of clients, and test duration
  – Results compare performance under different conditions
  – We expect roughly linear improvement as the number of Cash servers increases
• A control test compares a 4-server Cash cluster against 4 independent web servers
  – All other variables are kept constant between the two configurations
  – Indicates whether Cash is directly responsible for the performance improvement

Experimental Results
• The results show strong performance, especially for the 4-server configuration
  – As expected, the 4-server configuration does not deliver twice the performance of the 2-server one, due to network overhead
• Other tests confirm that an n-machine Cash cluster is more efficient than n independent machines
  – Indicates that Cash is a practical solution

Conclusion
• Addresses the memory access problem by leveraging the strengths of primary and secondary memory through networking
• Allows large clusters of machines to share memory, thereby reducing total disk access time
• Enlarges the practical working-set size of data-centric applications, and has been experimentally shown to improve performance
• Is easy to implement and deploy
• Has the unintended benefit of providing fault tolerance