A Flexible and Efficient API for a Customizable Proxy Cache
Vivek S. Pai, Alan L. Cox, Vijay S. Pai, and Willy Zwaenepoel
iMimic Networking, Inc.
http://www.imimic.com

Motivation
More features moving into proxy caches
– The ubiquitous layer 7 device
– Filtering, reporting, CDN support, transformation
– Lots of this being done one-off, ad hoc
– Can’t know everything at deployment
Some approaches for generalization
– ICAP/OPES, proprietary mechanisms
– But design considerations are shifting
Goal: a new approach for modern environments

Contributions
Designed an event-friendly proxy API
Implemented it on the iMimic DataReactor cache
Imposes negligible performance overhead
Demo modules
– High performance
– Low interference

Outline
Background
API Design
API Functions
Implementation and Performance
Conclusions

Proxy Cache Concepts
[Diagram: clients, proxy cache, and origin servers, connected via LAN and WAN]

Why Program a Proxy?
It’s at the right point in the network
– Sees all client-side and server-side HTTP traffic
– Can react to both LAN and WAN conditions
Already examines layer 7
Groundwork in place for value-adds
– Content filtering, access control, etc.

Enabling Technologies
Moore’s Law
– CPU speeds outstripping all other components
– Lots of cycles to burn…
Proxy software
– Increasing efficiency in managing connections, disk storage, etc.
Commodity OS/hardware improvements
– Specialized systems no longer needed to run efficient proxy caches

Commodity System Improvements
1997: Appliances 4x faster than software running on a 2-processor UltraSparc [Source: Danzig, “NetCache Architecture and Deployment”]
1st NLANR cacheoff (April ’99): gap only 2.5x
– 600 req/sec (Peregrine) vs. 1500 (InfoLibria)
2nd cacheoff (Jan ’00): gap only 1.7x
– 1450 req/sec (iMimic) vs. 2400 (Compaq)
3rd cacheoff (Oct ’00): gap only 15%
– 2083 req/sec (Microsoft) vs. 2400 (Compaq)
4th cacheoff (Dec ’01): commodity system best
– Performance record: 2700 req/sec (Cintel/iMimic)

How free is the CPU?
Stratacache Dart-10, with Nokia phone
120 req/sec (7 Mbps) with a 300 MHz CPU
– CPU mostly idle; performance disk-limited

Previous Customization Approaches
Write your own proxy or modify Squid
– Huge code base; changes likely to conflict with updates
ICAP: TCP-based offload
– Proxy redirects requests/responses to a separate server for modification
Filter-style processes
– Plugins where proxy designers anticipated a need (e.g., content filtering)
Kernel modules
– Difficult programming model, but needed for kernel-integrated proxies

Reasons for a New Approach
Scalability needed to more than 10,000 flows
– Filter processes may not scale
Limitations of ICAP-style offloading
– Offloading small requests adds latency
– Needs a separate ICAP server with its own CPU
Programmers want flexibility
– Program in C using the standard OS and libraries
– Avoid problems from later code conflicts

Design of the Proxy API
Event-aware
– Modules notified as requests/responses arrive
– Maps well to the implementation of modern proxies
HTTP-complete
– Captures all key interactions in the HTTP request-response protocol for full flexibility
Supports various programming models
– Events, threads, processes
– Communication via function call or socket

HTTP Data Flows
[Diagram: client requests arrive at the proxy cache; cache misses go to the server, and the new content is stored in the storage system; cache hits are served from cached content; responses return to the client]

HTTP Data Flows and the API
[Diagram: the same flows between client, proxy cache, server, and storage system, with five points where a module can modify the data]

HTTP Request-Response Structure
Header block: a special first line followed by more detail about the request/response
Body data: follows the blank terminating line
Request
– Requested URL
– Request header line 1
– Request header line 2
– ...
– Request header line N
– <blank terminating line>
– Optional request “body”, used in POST requests for forms, etc.
Response
– Response status code
– Response header line 1
– Response header line 2
– ...
– Response header line N
– <blank terminating line>
– Actual response “body”, containing an HTML file, image binary data, etc.

Design of API Notifications
    typedef struct DR_FuncPtrs {
        DR_InitFunc        *dfp_init;        // on module load
        DR_ReconfigureFunc *dfp_reconfig;    // on config change
        DR_FiniFunc        *dfp_fini;        // on module unload

        DR_ReqHeaderFunc   *dfp_reqHeader;   // when req hdr done
        DR_ReqBodyFunc     *dfp_reqBody;     // on each piece of req body
        DR_ReqOutFunc      *dfp_reqOut;      // before req to remote srv

        DR_DNSResolvFunc   *dfp_dnsResolv;   // when DNS resolution needed

        DR_RespHeaderFunc  *dfp_respHeader;  // when resp hdr done
        DR_RespBodyFunc    *dfp_respBody;    // on each piece of resp body
        DR_RespReturnFunc  *dfp_respReturn;  // when resp returned to clt

        DR_TransferLogFunc *dfp_logging;     // log entry after req done
        DR_OpaqueFreeFunc  *dfp_opaqueFree;  // when each resp completes
        DR_TimerFunc       *dfp_timer;       // periodic maintenance
        int                 dfp_timerFreq;   // timer period (sec)
    } DR_FuncPtrs;

API Functions
Content Adaptation
Content Management
Customized Administration
Utility Functions

Content Adaptation
Functions that allow modules to inspect and modify requests and replies passing through the cache
[Diagram: client, proxy cache, server, and storage system, with five points where a module can modify data in transit]

Content Adaptation (cont’d)
Example uses
– Integration into a CDN based on URL rewriting
– Transcoding for mobile devices
Special features of cache integration
– Store modified content
– Return multiple versions using the HTTP Vary header

Content Management
Fine-grained control over cacheability
– Content-freshness modification/eviction
– Content preloading
– Content querying
Example uses
– News CDN needs a new home page on a major event
– Premium services

Customized Administration
Notifications on logging
Example uses
– Aggregation at network operations centers
– Detecting high error rates, which indicate bad links

Utility Functions
Interfaces to the underlying OS
– Event notification: module may register or clear interest in FD events
– API automatically calls back the module
– Independent of the underlying OS mechanism (e.g., poll, select, /dev/poll, kevent)
Configuration option processing

Implementation in DataReactor
Commercial proxy server
– Portable: x86, Alpha, Sparc; FreeBSD, Linux, Solaris
– Fast (exposes overheads)
– Independently measured at Proxy Cache-Offs (alone or via OEMs)
API support requires < 1000 lines of code
Implementation took < 6 person-months

Sample Modules
Ad Remover: matches ad patterns in the hostname and URI
Dynamic Compressor: uses zlib to compress, store, and serve objects
Image Transcoder: color stripping via NetPBM and ijpeg helpers
Text Injector: finds the <head> tag, asks a helper what to insert
Content Manager: accessed via local telnet; queries, fetches, injects, and evicts objects
ICAP Client: implements the ICAP 1.0 draft to use an external server

Web Surfing Now
[Screenshot]

Web Surfing Without Ads
[Screenshot]

Sample Module Implementation
Module                  Total Lines  Code Lines  Semicolons  API Call Sites
Ad Remover              175          115         51          4
Compressor              387          280         126         11
Transcoder + helper     391 + 166    309 + 118   148 + 54    10
Text Injector + helper  473 + 56     367 + 32    170 + 8     12
Manager                 675          556         289         56
ICAP Client             1024         719         321         15

Measurement
Polygraph and PolyMix-3 (Measurement Factory)
– De facto standard for proxy testing
Scales with load
– Number of clients
– Number of servers
– Data set size
– Working set size
Very long test time
– Fill phase (~14 hours)
– Test phase (~10 hours)

PolyGraph Test Phases
[Figure: fill phase, 1st load phase, and 2nd load phase over time (hours, 0 to 30)]

PolyGraph Hit Rates
[Figure: cacheable, offered, and actual hit rates]

Our Test Environment
Proxy: 1.4 GHz Athlon, 2 GB memory, 5 SCSI disks, GigE, FreeBSD
Harness
– 10 Polygraph client/server machines
– Target load: 1450 req/sec
– 16000 simultaneous connections
Pmix-3: modified PolyMix-3
– Single fill phase for all tests
– Load phase time cut in half
– Slight increase in hit rate

API Performance
Configuration       Throughput (req/sec)  Response Time (ms)  Miss Time (ms)  Hit Time (ms)  Hit Ratio (%)
Baseline            1452.87               1248.99             2742.53         19.82          57.81
API Enabled         1452.75               1248.95             2743.18         19.86          57.81
Empty Callback      1452.89               1251.25             2744.33         20.87          57.76
Add Headers         1452.62               1251.98             2745.07         20.85          57.74
Body + Headers      1452.84               1250.14             2746.98         22.10          57.85

Module Performance
Configuration       Throughput (req/sec)  Response Time (ms)  Miss Time (ms)  Hit Time (ms)  Hit Ratio (%)
Baseline            1452.87               1248.99             2742.53         19.82          57.81
Ad Remover          1452.72               1248.87             2743.55         20.42          57.81
Images 25 Trans/s   1452.65               1256.60             2753.47         23.21          57.74
Images Max Trans    1452.73               1277.76             2778.09         43.30          57.80
Max Trans Nice 19   1452.68               1250.69             2744.60         20.15          57.78
Compress 75 obj/s   1452.73               1252.24             2745.63         23.44          57.81
Compress 95 obj/s   1452.88               1258.34             2752.63         28.69          57.78

Summary
CPUs are getting more idle
Commodity OSes are suitable choices
High-concurrency servers are needed
Customizable, efficient, event-friendly API
Implemented with low overhead
Sample results and deployments are promising

Ongoing Work
CoDeeN: a CDN system on PlanetLab
– Uses a customized version of DataReactor
– Being built at Princeton
– Prototype: 1 week reading + 1 week reading
– Currently: ~42 nodes (one per site)
Lessons
– API easy enough for busy grad students
– Logging infrastructure would be nice
– Want to mask non-HTTP failures

Questions?
vivek@imimic.com
iMimic Networking, Inc.
http://www.imimic.com/

[Backup figures: Cacheoff-3 hit times, miss times, improvements, price/performance, and results; Cacheoff-4 hit times, miss times, and results]
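The cacheoff gaps quoted under Commodity System Improvements are ratios of appliance throughput to commodity-system throughput. Worked from the req/sec pairs on that slide:

```latex
\begin{aligned}
\text{1st cacheoff (April '99):}\quad & 1500 / 600 = 2.5 \\
\text{2nd cacheoff (Jan '00):}\quad & 2400 / 1450 \approx 1.66 \approx 1.7 \\
\text{3rd cacheoff (Oct '00):}\quad & 2400 / 2083 \approx 1.15,\ \text{a gap of about } 15\%
\end{aligned}
```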
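The HTTP Request-Response Structure slide shows that every header block ends with a blank terminating line, and the API Performance table includes an “Add Headers” configuration. As a hedged illustration of that kind of edit (not DataReactor’s own call), the hypothetical helper `add_header` below inserts one header just before the blank line, assuming CRLF line endings and a complete header block in one buffer. The `Vary: User-Agent` header used in the test is only an example, echoing the slide’s point about returning multiple versions via Vary.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: insert "name: value\r\n" just before the blank
 * line that ends an HTTP header block. Anything after the blank line
 * (e.g. a body) is preserved. Returns the new length, or -1 if hdrs has
 * no terminating blank line or out is too small. */
int add_header(const char *hdrs, const char *name, const char *value,
               char *out, size_t outsz) {
    assert(hdrs != NULL && name != NULL && value != NULL && out != NULL);
    const char *end = strstr(hdrs, "\r\n\r\n");   /* last header's CRLF + blank line */
    if (end == NULL)
        return -1;
    size_t keep = (size_t)(end - hdrs) + 2;       /* copy through the last header's CRLF */
    int n = snprintf(out, outsz, "%.*s%s: %s\r\n%s",
                     (int)keep, hdrs, name, value, end + 2);
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```

Using `snprintf` keeps the bounds check in one place: a return value at or above `outsz` means the output was truncated.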
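The Design of API Notifications slide lists a table of callbacks, one per event. The sketch below shows how a module might fill in such a table and how a proxy core might invoke only the hooks that are set. The slide names the `DR_*Func` typedefs but not their parameters, so the signatures, the `DR_Request` type, and the `proxy_*` functions here are hypothetical, and only three of the fields are kept.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request context and callback signatures. */
typedef struct DR_Request {
    const char *url;
    int headersAdded;   /* touched by the example hook below */
} DR_Request;

typedef int  DR_InitFunc(void);
typedef void DR_ReqHeaderFunc(DR_Request *req);
typedef void DR_RespHeaderFunc(DR_Request *req);

/* Trimmed-down table with three of the slide's fields. */
typedef struct DR_FuncPtrs {
    DR_InitFunc       *dfp_init;        /* on module load */
    DR_ReqHeaderFunc  *dfp_reqHeader;   /* when req hdr done */
    DR_RespHeaderFunc *dfp_respHeader;  /* when resp hdr done */
} DR_FuncPtrs;

/* ---- Example module: cares about module load and request headers only ---- */
static int moduleLoaded = 0;

static int exampleInit(void) {
    moduleLoaded = 1;
    return 0;                      /* 0 = success in this sketch */
}

static void exampleReqHeader(DR_Request *req) {
    req->headersAdded++;           /* stand-in for inspecting/modifying headers */
}

static const DR_FuncPtrs exampleModule = {
    .dfp_init       = exampleInit,
    .dfp_reqHeader  = exampleReqHeader,
    .dfp_respHeader = NULL,        /* NULL: not interested in this event */
};

/* ---- Toy proxy core: invokes a hook only if the module supplied one ---- */
int proxy_load_module(const DR_FuncPtrs *fp) {
    assert(fp != NULL);
    return fp->dfp_init != NULL ? fp->dfp_init() : 0;
}

void proxy_handle_request(const DR_FuncPtrs *fp, DR_Request *req) {
    assert(fp != NULL && req != NULL);
    if (fp->dfp_reqHeader != NULL)
        fp->dfp_reqHeader(req);    /* request header fully parsed */
    /* ... serve from cache or forward to the origin server ... */
    if (fp->dfp_respHeader != NULL)
        fp->dfp_respHeader(req);   /* response header fully parsed */
}
```

In this sketch, leaving a slot NULL lets the core skip that event entirely, so a module pays only for the events it registers.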
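The Utility Functions slide describes modules registering or clearing interest in FD events and being called back automatically, independent of whether poll, select, /dev/poll, or kevent sits underneath. Below is a minimal sketch of that idea backed by POSIX `poll()`. The names `dr_register_fd`, `dr_clear_fd`, and `dr_dispatch` are hypothetical, not DataReactor’s actual calls, and the fixed-size table is a simplification.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical module-facing interface (names are illustrative only). */
typedef void DR_FdCallback(int fd, short revents, void *arg);

#define DR_MAX_FDS 64

static struct {
    int            fd;
    short          events;   /* POLLIN, POLLOUT, ... */
    DR_FdCallback *cb;
    void          *arg;
} regs[DR_MAX_FDS];
static int nregs = 0;

/* Register or update interest in events on fd. Returns 0, or -1 if full. */
int dr_register_fd(int fd, short events, DR_FdCallback *cb, void *arg) {
    assert(fd >= 0 && cb != NULL);
    int i = 0;
    while (i < nregs && regs[i].fd != fd)
        i++;
    if (i == nregs) {              /* new fd: append a slot */
        if (nregs == DR_MAX_FDS)
            return -1;
        nregs++;
    }
    regs[i].fd = fd;
    regs[i].events = events;
    regs[i].cb = cb;
    regs[i].arg = arg;
    return 0;
}

/* Clear interest in fd (no-op if it was not registered). */
void dr_clear_fd(int fd) {
    for (int i = 0; i < nregs; i++) {
        if (regs[i].fd == fd) {
            regs[i] = regs[--nregs];   /* move last entry into the hole */
            return;
        }
    }
}

/* Wait up to timeout_ms, then call back every module whose fd is ready.
 * Returns the number of callbacks made, 0 on timeout, -1 on poll() error. */
int dr_dispatch(int timeout_ms) {
    struct pollfd  pfds[DR_MAX_FDS];
    DR_FdCallback *cbs[DR_MAX_FDS];
    void          *args[DR_MAX_FDS];
    int n = nregs;

    /* Snapshot the table so callbacks may register/clear fds safely;
     * such changes take effect on the next dispatch. */
    for (int i = 0; i < n; i++) {
        pfds[i].fd = regs[i].fd;
        pfds[i].events = regs[i].events;
        pfds[i].revents = 0;
        cbs[i] = regs[i].cb;
        args[i] = regs[i].arg;
    }

    int ready = poll(pfds, (nfds_t)n, timeout_ms);
    if (ready <= 0)
        return ready;

    int calls = 0;
    for (int i = 0; i < n; i++) {
        if (pfds[i].revents != 0) {
            cbs[i](pfds[i].fd, pfds[i].revents, args[i]);
            calls++;
        }
    }
    return calls;
}

/* Example module callback: counts readiness notifications in *arg. */
static void count_ready(int fd, short revents, void *arg) {
    (void)fd;
    (void)revents;
    ++*(int *)arg;
}
```

Swapping `poll()` for select, /dev/poll, or kevent would change only `dr_dispatch`; the register/clear/callback contract seen by modules stays the same, which is the portability point the slide makes.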
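The Ad Remover is described only as matching ad patterns in the hostname and URI. A minimal sketch of such a matcher follows. The pattern lists, the function name `is_ad_request`, and the matching rules (hostname suffix on a label boundary, URI substring) are assumptions for illustration, and the hostname is assumed to be lowercased already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pattern lists, for illustration only. */
static const char *const ad_host_suffixes[] = { "ads.example.com", "adserver.example.net" };
static const char *const ad_uri_substrings[] = { "/banners/", "/adclick" };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* True if host equals suffix or ends with "." followed by suffix.
 * Assumes host has already been lowercased. */
static bool host_matches(const char *host, const char *suffix) {
    size_t hl = strlen(host), sl = strlen(suffix);
    if (hl < sl || strcmp(host + hl - sl, suffix) != 0)
        return false;
    return hl == sl || host[hl - sl - 1] == '.';
}

/* True if the request looks like an ad, based on its hostname or URI. */
bool is_ad_request(const char *host, const char *uri) {
    assert(host != NULL && uri != NULL);
    for (size_t i = 0; i < COUNT(ad_host_suffixes); i++)
        if (host_matches(host, ad_host_suffixes[i]))
            return true;
    for (size_t i = 0; i < COUNT(ad_uri_substrings); i++)
        if (strstr(uri, ad_uri_substrings[i]) != NULL)
            return true;
    return false;
}
```

In a module, a check like this would presumably run from the request-header callback (`dfp_reqHeader`), where both the hostname and URI are known. The slides do not say how the module then suppresses the request.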
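The Text Injector finds the `<head>` tag and asks a helper what to insert. The sketch below shows only the find-and-splice step. It assumes the whole document is in one buffer and replaces the helper with a fixed snippet string; since the API delivers response bodies piece by piece (`dfp_respBody`), a streaming version would also have to handle a tag split across pieces. The names `find_head_end` and `inject_after_head` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer just past the '>' of the first <head> tag (matched
 * case-insensitively; <header> does not count), or NULL if there is none. */
static const char *find_head_end(const char *html) {
    static const char tag[] = "head";
    for (const char *p = strchr(html, '<'); p != NULL; p = strchr(p + 1, '<')) {
        size_t i = 0;
        while (i < 4 && tolower((unsigned char)p[1 + i]) == tag[i])
            i++;
        if (i < 4)
            continue;                       /* not "<head" */
        char next = p[5];
        if (next != '>' && !isspace((unsigned char)next))
            continue;                       /* e.g. "<header" */
        const char *gt = strchr(p + 5, '>');
        return gt != NULL ? gt + 1 : NULL;
    }
    return NULL;
}

/* Copy html into out with snippet inserted right after the <head> tag.
 * Returns the output length, or -1 if there is no <head> tag or out is
 * too small. */
long inject_after_head(const char *html, const char *snippet, char *out, size_t outsz) {
    assert(html != NULL && snippet != NULL && out != NULL);
    const char *split = find_head_end(html);
    if (split == NULL)
        return -1;
    size_t pre = (size_t)(split - html), sn = strlen(snippet), rest = strlen(split);
    if (pre + sn + rest + 1 > outsz)
        return -1;
    memcpy(out, html, pre);
    memcpy(out + pre, snippet, sn);
    memcpy(out + pre + sn, split, rest + 1);  /* includes terminating NUL */
    return (long)(pre + sn + rest);
}
```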
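Reading the API Performance and Module Performance tables as relative changes against the baseline makes the “negligible overhead” claim concrete. Values are from the tables; percentages are rounded.

```latex
\begin{aligned}
\text{Body + Headers, response time:}\quad & \frac{1250.14 - 1248.99}{1248.99} = \frac{1.15}{1248.99} \approx 0.09\% \\
\text{Body + Headers, hit time:}\quad & \frac{22.10 - 19.82}{19.82} = \frac{2.28}{19.82} \approx 11.5\% \\
\text{Images Max Trans, hit time:}\quad & \frac{43.30 - 19.82}{19.82} = \frac{23.48}{19.82} \approx 118\% \\
\text{Max Trans Nice 19, hit time:}\quad & \frac{20.15 - 19.82}{19.82} = \frac{0.33}{19.82} \approx 1.7\%
\end{aligned}
```

Throughput stays between 1452.62 and 1452.89 req/sec in every configuration, so the measurable costs appear as added latency, mostly on cache hits. The Max Trans Nice 19 row shows that hit-time cost dropping to about 0.33 ms.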