Choosing a Proxy
Don't roll the D20!
Leif Hedstrom, Cisco WebEx

Who am I?
• Unix developer since 1985
  – Yeah, I'm really that old; I learned Unix on BSD 2.9
• Long time SunOS/Solaris/Linux user
• Mozilla committer (but not active now)
• VP of the Apache Traffic Server PMC
• ASF member
• Overall hacker, geek and technology addict
zwoop@apache.org   @zwoop   +lhedstrom

So which proxy cache should you choose?

Plenty of Proxy Servers
[slide of proxy-server logos; PerlBal among them]
And plenty of "reliable" sources…

Answer: the one that solves your problem!
(image: http://mihaelasharkova.files.wordpress.com/2011/05/5steploop2)

But first…
• While you are still awake, and the coffee is fresh: my crash course in HTTP proxying and caching!
[diagrams: Forward Proxy, Reverse Proxy, Intercepting Proxy]

Why Cache is King
• The content served fastest is the data the user already has locally on their computer/browser
  – This is near zero cost and zero latency!
• The speed of light is still a limiting factor
  – Reduce the latency -> faster page loads
• Serving out of cache is computationally cheap
  – At least compared to e.g. PHP or any other higher-level page generation system
  – It's easy to scale caches horizontally

Choosing an intermediary
• SMP scalability and performance
• Ease of use
• Extensible
• HTTP/1.1 features

Plenty of Proxy Servers
Plenty of Free Proxy Servers
Plenty of Free Caching Proxy Servers
[slides narrowing down the field of candidates; PerlBal and other product logos shown]

Choosing an intermediary: SMP scalability and performance

The problem
• You basically cannot buy a computer today with fewer than 2 CPUs or cores
• Things will only get "worse"!
  – Well, really, it's getting better
• Typical server deployments today have at least 8–16 cores
  – How many of those can you actually use?
  – And are you using them efficiently?
• NUMA turns out to be kind of a bitch…

Solution 1: Multi-threading
[diagram: threads 1–3 time-sliced on a single CPU vs. running in parallel on a dual CPU]

Problems with multi-threading
• It's a wee bit difficult to get it right!
(image: http://www.flickr.com/photos/stuartpilbrow/3345896050)
• "When two trains approach each other at a crossing, both shall come to a full stop and neither shall start up again until the other has gone."
  – From Wikipedia: an illogical statute passed by the Kansas legislature

Solution 2: Event Processing
[diagram: scheduled, network and disk I/O events feed a queue; an event loop dispatches them to handlers (accept handler, disk handler, HTTP state machine), which can in turn generate new events — a minimal sketch follows below]

Problems with Event Processing
• It hates blocking APIs and calls!
  – Hating it back doesn't help :/
• Still somewhat complicated
• It doesn't scale on SMP by itself
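To make the event-processing model above concrete, here is a minimal sketch — not code from any of the proxies discussed — of a single-threaded loop that multiplexes many connections over non-blocking sockets, using Python's selectors module purely for illustration. The listen address is a hypothetical example.

import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server_sock):
    conn, _addr = server_sock.accept()          # a network "accept" event
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(4096)                      # non-blocking read
    if data:
        # A real proxy would run its HTTP state machine here; we just answer.
        conn.send(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
    sel.unregister(conn)
    conn.close()

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 8080))                # hypothetical listen address
server.listen(128)
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:                                     # the event loop itself
    for key, _mask in sel.select():             # wait for ready events
        key.data(key.fileobj)                   # dispatch to the registered handler

Even in this toy the key property of the architecture is visible: nothing in the loop may block, which is exactly the "hates blocking APIs and calls" problem called out above.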
Where are we at?

            Processes   Threads          Evented
Apache TS   1           Based on cores   Yes
Nginx       1 - <n>     1                Yes
Squid       1 - <n>     1                Yes *)
Varnish     1           Lots             Yes *)

*) Can use blocking calls, with a (large) thread pool

Proxy Cache test setup
• AWS Large instances, 2 CPUs
• All on an RFC 1918 network ("internal" net)
• 8GB RAM
• Access logging enabled to disk (except on Varnish)
• Software versions
  – Linux v3.2.0
  – Traffic Server v3.3.1
  – Nginx v1.3.9
  – Squid v3.2.5
  – Varnish v3.0.3
• Minimal configuration changes
• Cache a real (Drupal) site

ATS configuration
• etc/trafficserver/remap.config:
    map / http://10.118.154.58
• etc/trafficserver/records.config:
    CONFIG proxy.config.http.server_ports STRING 80

Nginx configuration, try 1: basically defaults (broken, don't use)
worker_processes 2;
access_log logs/access.log main;
proxy_cache_path /mnt/nginx_cache levels=1:2 keys_zone=my-cache:8m \
                 max_size=16384m inactive=600m;
proxy_temp_path /mnt/nginx_temp;
server {
    listen 80;
    location / {
        proxy_pass http://10.83.145.47/;
        proxy_cache my-cache;
    }
}

Nginx configuration, try 2 (works, but really slow — roughly 10x slower)
worker_processes 2;
access_log logs/access.log main;
proxy_cache_path /mnt/nginx_cache levels=1:2 keys_zone=my-cache:8m \
                 max_size=16384m inactive=600m;
proxy_temp_path /mnt/nginx_temp;
gzip on;
server {
    listen 80;
    location / {
        proxy_pass http://10.83.145.47/;
        proxy_cache my-cache;
        proxy_set_header Accept-Encoding "";
    }
}

Nginx configuration, try 3 (works and is reasonably fast, but WTF!)
worker_processes 2;
access_log logs/access.log main;
proxy_cache_path /mnt/nginx_cache levels=1:2 keys_zone=my-cache:8m \
                 max_size=16384m inactive=600m;
proxy_temp_path /mnt/nginx_temp;
server {
    listen 80;
    set $ae "";
    if ($http_accept_encoding ~* gzip) {
        set $ae "gzip";
    }
    location / {
        proxy_pass http://10.83.145.47/;
        proxy_cache my-cache;
        proxy_set_header If-None-Match "";
        proxy_set_header If-Modified-Since "";
        proxy_set_header Accept-Encoding $ae;
        proxy_cache_key $uri$is_args$args$ae;
    }
    location ~ /purge_it(/.*) {
        proxy_cache_purge my-cache $1$is_args$args$ae;
    }
}
Thanks to Chris Ueland at NetDNA for the snippet

Squid configuration
http_port 80 accel
http_access allow all
cache_mem 4096 MB
workers 2
memory_cache_shared on
cache_dir ufs /mnt/squid 100 16 256
cache_peer 10.83.145.47 parent 80 0 no-query originserver

Varnish configuration
backend default {
    .host = "10.83.145.47";
    .port = "80";
}

Performance, AWS, 8KB HTML (gzip)
[chart: QPS (throughput) and time to first response (ms) for ATS 3.3.1, Nginx 1.3.9 "hack", Squid 3.2.5, Varnish 3.0.3, and Varnish 3.0.3 with varnishlog -w]

Performance, AWS, 8KB HTML (gzip)
[chart: QPS (throughput) and CPU usage on the dual-core instance for the same five configurations]

Performance, AWS, 500 bytes JPG
[chart: QPS (throughput) and time to first response (ms) for the same five configurations]

Performance, AWS, 500 bytes JPG
[chart: QPS (throughput) and CPU usage on the dual-core instance for the same five configurations]
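The latency axis in the charts above is time to first response. As a rough sketch of what that number measures — this is not the load tool used for the results above, and the address below is a stand-in for whichever proxy is under test — the core of such a measurement in Python is:

import socket
import time

def time_to_first_byte(host, port=80, path="/"):
    request = (f"GET {path} HTTP/1.1\r\n"
               f"Host: {host}\r\n"
               "Connection: close\r\n\r\n").encode()
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        sock.recv(1)                              # block until the first byte of the reply arrives
    return (time.monotonic() - start) * 1000.0    # milliseconds

# Hypothetical address; point it at the proxy you are testing.
print(f"{time_to_first_byte('127.0.0.1', 80):.2f} ms")

A real benchmark obviously adds concurrency and many samples; this only illustrates what the "time to first response (ms)" axis is reporting.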
Choosing an intermediary: HTTP/1.1 features

RFC 2616 is not optional!
• Neither is the new BIS revision!
• Understanding HTTP and how it relates to proxying and caching is important
  – Or you will get it wrong! I promise.

How things can go wrong: Vary!
$ curl -D - -o /dev/null -s --compress http://10.118.73.168/
HTTP/1.1 200 OK
Server: nginx/1.3.9
Date: Wed, 12 Dec 2012 18:00:48 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 8051
Connection: keep-alive
X-Powered-By: PHP/5.4.9
X-Drupal-Cache: HIT
Etag: "1355334762-0-gzip"
Content-Language: en
X-Generator: Drupal 7 (http://drupal.org)
Cache-Control: public, max-age=900
Last-Modified: Wed, 12 Dec 2012 17:52:42 +0000
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Vary: Cookie,Accept-Encoding
Content-Encoding: gzip

How things can go wrong: Vary!
$ curl -D - -o /dev/null -s http://10.118.73.168/
HTTP/1.1 200 OK
Server: nginx/1.3.9
Date: Wed, 12 Dec 2012 18:00:57 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 8051
Connection: keep-alive
X-Powered-By: PHP/5.4.9
X-Drupal-Cache: HIT
Etag: "1355334762-0-gzip"
Content-Language: en
X-Generator: Drupal 7 (http://drupal.org)
Cache-Control: public, max-age=900
Last-Modified: Wed, 12 Dec 2012 17:52:42 +0000
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Vary: Cookie,Accept-Encoding
Content-Encoding: gzip
EPIC FAIL! Note: this client did not advertise gzip support, yet got a gzip'ed response anyway.

What type of proxy do you need?
• Of our candidates, only two fully support all proxy modes!

CoAdvisor HTTP protocol quality tests for reverse proxies
[stacked-bar chart, failures / violations / successes out of roughly 600 test cases; labelled values: Varnish 3.0.3 49%, Squid 3.2.5 81%, Nginx 1.3.9 51%, ATS 3.1.3 68%]

CoAdvisor HTTP protocol quality tests for reverse proxies
[same chart, second view; labelled values: Varnish 3.0.3 25%, Squid 3.2.5 6%, Nginx 1.3.9 27%, ATS 3.1.3 15%]

Choosing an intermediary: Ease of use, Extensible
My subjective opinions

ATS – The good
• Good HTTP/1.1 support, including SSL
• Tunes itself very well to the system / hardware at hand
• Excellent cache features and performance
  – Raw disk cache is fast and resilient
• Extensible plugin APIs, quite a few plugins
• Used and developed by some of the largest Web companies in the world

ATS – The bad
• Load balancing is incredibly lame
• Seen as difficult to set up (I obviously disagree)
• Developer community is still too small
• Code is complicated
  – By necessity? Maybe …

ATS – The ugly
• Too many configuration files!
• There's still legacy code that has to be replaced or removed
• Not a whole lot of commercial support
  – But there's hope (e.g. OmniTI recently announced packaged support)

Nginx – The good
• Easy to understand the code base and software architecture
  – Lots of plugins available, including SPDY
• Excellent Web and application server
  – E.g. Nginx + fpm (fcgi) + PHP is the awesome, according to a very reputable source
• Commercial support available from the people who wrote it and know it best. Huge!

Nginx – The bad
• Adding extensions implies rebuilding the binary
• By far the most configuration required "out of the box" to do anything remotely useful
• It does not make good attempts to tune itself to the system
• No good support for conditional requests

Nginx – The ugly
• The cache is a joke! Really
• The protocol support as an HTTP proxy is rather poor. It fares the worst in the tests, and can be outright wrong if you are not very careful
• From the docs: "nginx does not handle "Vary" headers when caching." Seriously? (A quick way to check this yourself is sketched below.)
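Before moving on to Squid: the Vary failure shown in the curl transcripts above is easy to reproduce. Here is a small sketch in Python — the IP address is the nginx test instance from the transcripts, so substitute your own — that requests the same URL once advertising gzip and once not, and compares the Content-Encoding that comes back.

import urllib.request

def fetch_encoding(url, accept_encoding=None):
    req = urllib.request.Request(url)
    if accept_encoding:
        req.add_header("Accept-Encoding", accept_encoding)
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Content-Encoding", "identity")

url = "http://10.118.73.168/"          # the nginx test instance from the transcripts above

print("gzip-capable client got:", fetch_encoding(url, "gzip"))
print("plain client got:       ", fetch_encoding(url))

# If the plain client is handed a gzip'ed body, the cache ignored
# "Vary: Accept-Encoding" -- the EPIC FAIL case shown above.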
Squid – The Good
• Has by far the most HTTP features of the bunch. I mean, by far; nothing else comes even close
• It is also the most HTTP-conformant proxy today. It has the best scores in the CoAdvisor tests, by a wide margin
• The features are mature, and used pretty much everywhere
• Works pretty well out of the box

Squid – The Bad
• Old code base
• Cache is not particularly efficient
• Has traditionally been prone to instability
• Complex configuration
  – At least IMO; I hate it

Squid – The Ugly
• SMP is quite an afterthought
  – Duct tape
• Why spend so many years rewriting from v2.x to v3.x without actually addressing some of the real problems? Feels like the boat has been missed…
• Not very extensible
  – Typically you write external "helper" processes, similar to fcgi. This is neither particularly flexible nor powerful (a helper cannot do everything you'd want, so you might have to rewrite the Squid core)

Varnish – The Good
• VCL
• And did I mention VCL? Pure genius!
• Very clever logging mechanism
• ESI is cool, even with its limited subset
  – Not unique to Varnish though
• Support from several good commercial entities

Varnish – The Bad
• Letting the kernel do the hard work might seem like a good idea on paper, but perhaps not so great in the real world. But let's not go into a BSD vs. Linux kernel war …
• Persistent caching seems like an afterthought at best
• No good support for conditional requests (a quick way to check this is sketched at the end of the deck)
• What impact does "real" logging have on performance?

Varnish – The Ugly
• There are a lot of threads in this puppy!
• No SSL. And presumably, there never will be?
  – So what happens with SPDY / HTTP/2?
• Protocol support is weak without a massive amount of VCL
• And, you probably will need a PhD in VCL!
  – There's a lot of VCL hacking to do to get it to behave well

Summary
• Please understand your problem
  – Don't listen to @zwoop on twitter…
• Performance in itself is rarely a key differentiator; latency, features and correctness are
• But most importantly: use a proxy, preferably a good one, if you run a serious web server

Performance, AWS, 8KB HTML (gzip)
[chart repeated: QPS (throughput) and time to first response (ms), this time also including the untweaked Nginx 1.3.9 configuration alongside the "hack" configuration]

If it ain't broken, don't fix it
But by all means, make it less sucky!
However, when all you have is a hammer…
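As a closing aside on the conditional-request gaps noted for both Nginx and Varnish above, here is a small sketch — the URL is a placeholder for whatever intermediary you are testing, and it assumes the first response carries a validator — that fetches a resource once, then replays the request with If-None-Match / If-Modified-Since and checks whether a 304 Not Modified comes back.

import urllib.request
from urllib.error import HTTPError

url = "http://127.0.0.1/"            # placeholder: point this at the proxy under test

with urllib.request.urlopen(url) as first:
    etag = first.headers.get("ETag")
    last_modified = first.headers.get("Last-Modified")

# Replay the request conditionally, using whatever validators the first reply carried.
revalidate = urllib.request.Request(url)
if etag:
    revalidate.add_header("If-None-Match", etag)
if last_modified:
    revalidate.add_header("If-Modified-Since", last_modified)

try:
    with urllib.request.urlopen(revalidate) as second:
        print("got", second.status, "- the intermediary sent the full response again")
except HTTPError as err:
    # urllib raises for 304; getting one here means the conditional request was honoured.
    print("got", err.code)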