Distributed Cooperative Buffering Presentation

Cash: Distributed Cooperative Buffer Caching
Christopher DeCoro
Harper Langston
Jeremy Weinberger
What is Cash?
Motivation and Background
• Cash: A Distributed Cooperative Buffer Cache System
• Problem: Applications need to access large amounts of data
• Different trends in the performance of primary and secondary memory
– Primary Memory (RAM)
» fast (6400 MB/sec)
» expensive (~$0.25 / MB)
» limited (4 GB max on 32-bit machines, 2-3 GB practical max)
– Magnetic Storage
» inexpensive (~$0.001 / MB)
» effectively unlimited (>250 GB/drive)
» relatively slow (30-60 MB/sec), high latency (2-6 ms/seek)
• Idea: Leverage the strengths of each through networking
– Fast compared to Disks (125 MB/sec for Gigabit Ethernet)
– Effectively unlimited number of connected hosts
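The figures above can be turned into a rough per-block cost comparison. The sketch below is illustrative arithmetic only: the midpoint values, the 0.1 ms network latency, and the function names are assumptions, not measurements from Cash.

```c
#include <assert.h>

/* Figures quoted on the slide; midpoints are an assumption where a range is given. */
#define DISK_MB_PER_SEC  45.0   /* midpoint of 30-60 MB/sec */
#define DISK_SEEK_MS      4.0   /* midpoint of 2-6 ms/seek */
#define NET_MB_PER_SEC  125.0   /* Gigabit Ethernet peak rate */
#define NET_LATENCY_MS    0.1   /* ASSUMED LAN latency; not given in the source */

/* Milliseconds to move `kb` kilobytes at `mb_per_sec`, plus a fixed latency. */
double transfer_ms(double kb, double mb_per_sec, double latency_ms)
{
    assert(mb_per_sec > 0.0);
    return latency_ms + (kb / 1024.0) / mb_per_sec * 1000.0;
}

/* Estimated cost of reading one block from the local disk. */
double disk_block_ms(double kb)
{
    return transfer_ms(kb, DISK_MB_PER_SEC, DISK_SEEK_MS);
}

/* Estimated cost of fetching one block from another machine's memory. */
double net_block_ms(double kb)
{
    return transfer_ms(kb, NET_MB_PER_SEC, NET_LATENCY_MS);
}
```

For a small block (say 8 KB, also an assumed size) the disk estimate is dominated by the seek, which is the cost that a fetch from remote memory avoids.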
Proposed Solution
• Based on Traditional Approaches:
Buffer Cache Management and Disk Paging
– Try to store frequently used blocks from slow
memory in faster memory
– When memory full, copy out to disk (slow
performance when used frequently)
• Our Solution:
Distributed Cooperative Buffer Caching
– Rather than access data from disk, we would
prefer to access from network
– Instead of paging out to disk – page to network
– Extra blocks can be sent to other computers;
retrieved when needed
– Essentially, we add an additional layer to the memory hierarchy
Talk outline
• Introduction
• Application Programming Interface
– Principles and Goals
– System Structure
– Function-call Interface
• System Implementation
• Experimental Validation
• Conclusion
Application Programming Interface: Principles and Goals
• Should closely resemble current file-access methods
– Open / Read / Write / Close
– Transparent to applications
– Could potentially be implemented in a standard Unix-like kernel
• Needs to be efficient
– Cash does not provide additional functionality
– Goal is to make a slow operation faster
• Needs to be portable and easily deployed
– No special kernel support required
– Little overhead for administrators
Application Programming Interface: System Structure (1)
• Cache Servers
– Provide memory services, allowing other servers to retrieve or
assign blocks
– Run a Cache Manager process, that maintains the cache on the
local machine
• Cache Clients
– Access the memory resources provided by the cache servers
– Also run a manager process, though the local cache may be
small compared to those on the cache servers
– In this sense, every client machine is also a server.
Application Programming Interface: System Structure (2)
• Every instance of Cash has three major components
– Client Program
– Application Interface Library
– Cache Manager
Function Call Interface
• copen – Opens a file, returning a CFILE handle
– CFILE * copen( char * filename, char * mode )
– Will start Cache Manager, if not currently running
– Establishes communication between current process and Cache
Manager Process (uses Unix Domain Sockets for IPC)
• cread – Reads data from file
– int cread ( void * buffer, int size, int n, CFILE * file )
– Sends datagram to ask manager for a block in a particular file
– Manager replies with the offset of data in a special cache file
– cread maps that data into process memory, using mmap
– Buffers data to prevent need for IPC on every read
• cclose – Closes file, reclaims resources
– void cclose ( CFILE * file )
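The mmap step described for cread can be sketched as follows. This is a minimal illustration, not Cash's actual code: `struct block_view`, `map_block`, and `unmap_block` are hypothetical names, and the sketch assumes the manager has already replied with a byte offset into the cache file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical view of one cached block mapped into the caller's address space. */
struct block_view {
    void  *base;    /* start of the mapping (page-aligned), needed for munmap */
    size_t maplen;  /* length of the mapping */
    char  *data;    /* first byte of the requested block */
};

/* Map `len` bytes at `offset` of the cache file read-only. Returns 0 on success. */
int map_block(const char *cache_path, off_t offset, size_t len, struct block_view *out)
{
    assert(out != NULL && len > 0 && offset >= 0);
    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = offset - (offset % page);   /* mmap offsets must be page-aligned */
    size_t delta = (size_t)(offset - aligned);  /* bytes between page start and block */

    int fd = open(cache_path, O_RDONLY);
    if (fd < 0)
        return -1;
    void *base = mmap(NULL, len + delta, PROT_READ, MAP_SHARED, fd, aligned);
    close(fd);                                  /* the mapping outlives the descriptor */
    if (base == MAP_FAILED)
        return -1;

    out->base = base;
    out->maplen = len + delta;
    out->data = (char *)base + delta;
    return 0;
}

/* Release a mapping obtained from map_block. */
void unmap_block(struct block_view *v)
{
    munmap(v->base, v->maplen);
}
```

Because the mapping is shared with the file, the client reads the manager's copy of the block directly rather than receiving the bytes over the socket.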
Talk outline (End Chris Section)
• Introduction
• Application Programming Interface
• System Implementation
– Local Cache Management
– Global Cache Management
• Experimental Validation
• Conclusion
Local Cache Management
• Breaks disk files into a set of blocks, stored in two queues
– Divided into master blocks & non-master blocks
» Master: only copy in cluster, evicted last
» Non-master: duplicate copy, used for local access, evicted before masters
– Once the queue is filled, least recently used blocks are removed
– We assume it is cheaper to remove non-masters, and request them
over network if necessary, than to re-read master from disk
• All blocks exist in system cache file
– The cache file is mapped into memory
– We use the GNU memory-mapped malloc library to manage storage
within this file.
– This allows for zero-copy transfer of data to client address space
– Block addresses are mapped to file offset, and sent to client
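The eviction order described above (non-masters before masters, least recently used first within each class) can be sketched as below. This is an assumption-laden illustration: the slides describe two queues, whereas this sketch uses a single array with access timestamps to get the same ordering, and all names (`struct slot`, `touch`, `choose_victim`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 4   /* tiny capacity, chosen only for illustration */

struct slot {
    int      in_use;
    int      is_master;  /* 1: only copy in the cluster; 0: duplicate of a remote master */
    long     block_id;
    unsigned last_used;  /* logical clock value of the most recent access */
};

struct cache {
    struct slot slots[CACHE_SLOTS];
    unsigned    clock;
};

/* Record an access so the slot becomes the most recently used. */
void touch(struct cache *c, size_t i)
{
    assert(i < CACHE_SLOTS && c->slots[i].in_use);
    c->slots[i].last_used = ++c->clock;
}

/* Pick a victim: the least recently used non-master if one exists,
 * otherwise the least recently used master. Returns -1 if the cache is empty. */
int choose_victim(const struct cache *c)
{
    int best = -1;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        const struct slot *s = &c->slots[i];
        if (!s->in_use)
            continue;
        if (best < 0) {
            best = i;
            continue;
        }
        const struct slot *b = &c->slots[best];
        if (s->is_master != b->is_master) {
            if (!s->is_master)
                best = i;                 /* non-masters always go first */
        } else if (s->last_used < b->last_used) {
            best = i;                     /* same class: older access loses */
        }
    }
    return best;
}
```

A master is chosen only once no non-master remains, which matches the stated assumption that re-requesting a duplicate over the network is cheaper than re-reading a master from disk.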
Global Cache Management
• Each cache server has its distinct set of master blocks
– Will communicate with other servers to access their blocks
– Blocks accessed from other servers are non-master blocks
• Block locations are indicated with “hints”
– When a block is loaded from disk, a “hint” message is broadcast to
indicate that that particular disk block is now on a specific server
– When a block is needed, servers will use their hint table to contact the
appropriate server and request the block
– If the response is negative, or times out, server will load from disk
• On eviction, blocks can be sent to other servers
– Servers selected at random
– Can choose to accept or decline. On acceptance, becomes the new
owner of that given block
– If old owner receives request, it forwards request to new owner
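A hint table of the kind described above might look like the following. This is a sketch under stated assumptions: the direct-mapped layout, the table size, and the names `hints_record` and `hints_lookup` are illustrative choices, not details taken from Cash.

```c
#include <assert.h>

#define HINT_SLOTS 64     /* table size is an arbitrary choice for this sketch */
#define NO_HINT    (-1)

/* Direct-mapped hint table: block id -> server believed to hold the master copy. */
struct hint_table {
    long block_id[HINT_SLOTS];
    int  server[HINT_SLOTS];
};

/* Mark every slot as empty. */
void hints_init(struct hint_table *t)
{
    for (int i = 0; i < HINT_SLOTS; i++) {
        t->block_id[i] = -1;
        t->server[i] = NO_HINT;
    }
}

/* Apply a broadcast hint: `block` now lives on `server`.
 * A colliding older hint is simply overwritten. */
void hints_record(struct hint_table *t, long block, int server)
{
    assert(block >= 0 && server >= 0);
    int i = (int)(block % HINT_SLOTS);
    t->block_id[i] = block;
    t->server[i] = server;
}

/* Return the server to ask for `block`, or NO_HINT, meaning "load it from disk". */
int hints_lookup(const struct hint_table *t, long block)
{
    assert(block >= 0);
    int i = (int)(block % HINT_SLOTS);
    return t->block_id[i] == block ? t->server[i] : NO_HINT;
}
```

Losing or overwriting a hint is safe in this design because a missing, stale, or unanswered hint only costs a fallback read from disk.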
Talk outline
• Introduction
• Application Programming Interface
• System Implementation
• Experimental Validation
– Application Test-bench
– Workload Generation
– Test Setup
– Experimental Results
• Conclusion
Application Test-Bench
• Distributed Web Server
– With the explosive growth of the Internet, web site sizes are
increasing exponentially
– Need to serve very large amounts of data
– Need to quickly service very large numbers of clients
• Based on the SPED system architecture
– Single-process Event Driven
– Uses libasync for asynchronous communication with clients
• Uses the standard Cash system interface
– Will access multiple Cache Servers to distribute memory load
– Ideally, we would see fewer disk accesses, resulting in faster end-to-end
times for clients
Workload Generation
• SURGE: Scalable URL Reference Generator
– Allows us to create synthetic workloads
– Shown to probabilistically model real-life web access patterns
– Creates a specific number of client processes and threads, to run
for a set amount of time
• Test Sequences
– Several independent variables affect web server performance:
» Workload size
» Number of clients
» Number of cache servers
» Total cache size
– Important metrics of system performance include:
» Mean page access time
» Average requests handled per second
» Average bytes transferred per second
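The three metrics listed above can be computed from a per-request log as sketched below. The record layout and the names `struct request_sample` and `summarize` are hypothetical; the actual SURGE output format is not described in the slides.

```c
#include <assert.h>
#include <stddef.h>

/* One completed request as a load generator might log it (hypothetical record). */
struct request_sample {
    double seconds;  /* time from request to last byte */
    long   bytes;    /* response size */
};

struct run_metrics {
    double mean_access_time;  /* seconds per page */
    double requests_per_sec;
    double bytes_per_sec;
};

/* Summarize `n` samples collected over a run lasting `run_seconds` of wall-clock time. */
struct run_metrics summarize(const struct request_sample *s, size_t n, double run_seconds)
{
    assert(s != NULL && n > 0 && run_seconds > 0.0);
    double total_time = 0.0;
    double total_bytes = 0.0;
    for (size_t i = 0; i < n; i++) {
        total_time += s[i].seconds;
        total_bytes += (double)s[i].bytes;
    }
    struct run_metrics m;
    m.mean_access_time = total_time / (double)n;
    m.requests_per_sec = (double)n / run_seconds;
    m.bytes_per_sec = total_bytes / run_seconds;
    return m;
}
```

Note that the mean access time is averaged per request, while the two throughput figures are divided by the wall-clock length of the run, so concurrent clients raise throughput without necessarily lowering the mean.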
Test Setup
• Server configurations
– 1-server, 2-server, and 4-server; each with Web and Cache server
– Also use 0-server (non-Cash) as a control
– Two client machines are used to make requests of the servers
• Test varies workload size, number of clients and test time
– Results will compare performance under different situations
– We expect that we will see linear improvement as Cash servers are added
• Control test will compare a 4-server Cash cluster vs. 4
independent Web servers
– All other variables kept constant between the two configurations
– Indicates whether Cash is directly responsible for performance
Experimental Results
• The results show strong performance, especially for the 4-server
configuration
– As expected, the 4-server configuration does not have 2x the
performance of the 2-server, due to some network overhead
• Other tests confirm that an n-machine Cash cluster is more efficient
than n independent machines
– Indicates Cash is practical
Conclusion
• Addresses memory access problem by leveraging the strengths
of primary and secondary memory through networking
• Allows large clusters of machines to share memory, thereby
reducing total disk access time
• Enhances the reasonable working set size of data-centric
applications, and has been experimentally shown to improve performance
• Is easy to implement and deploy
• Has unintended benefit of providing fault tolerance