Scalable Web Site Antipatterns
Justin Leitgeb
Stack Builders Inc.

Overview
• Based on architectures that have caused significant downtime and pain
• Like the examples in Nygard's book, but with more emphasis on essential rather than accidental properties of the system

Anti-pattern 1: Monotonically increasing data set with rapid growth
• A system that relies on querying all historical data
• Requires joins against mega-tables (hundreds of millions of rows)
• The data is often aggregated automatically

Detection
• Slow query log
• SHOW FULL PROCESSLIST
• SHOW ENGINE INNODB STATUS
• vmstat

Anti-solutions
• Partitioning
• Pre-caching (cron jobs)
• Switching to MyISAM
• NoSQL?

NoSQL
• Out-of-the-box NoSQL solutions (e.g., MongoDB) help with data modeling
• Reason about CAP trade-offs instead of ACID guarantees
• May make it easier to distribute algorithms
• But:
  o The engines have not yet had as much engineering effort as MySQL's InnoDB
  o They often use the same algorithms (e.g., B-tree indexes)
  o They can require more dev time (e.g., Cassandra, which needs a good implementation of distributed algorithms)

Stop the bleeding
• Cut off long queries
• Turn off site sections
• Fail whale (serve an over-capacity error page)

Band-aids
• Obvious: add app servers, memcached, a bigger DB server
• Adding app servers puts more pressure on the DB server
• HTTP caching (Varnish)
• MySQL tuning (look for things like "Using filesort" in EXPLAIN output)
• Read slaves

Solutions
• Hard-limit data volume: look for cases where data loses value over time
  o Add features related to scale
• Distributed algorithms and data stores
• Data warehousing

Anti-pattern 2: Allowing "risky" writes to block HTTP responses
• Symptoms:
  o Slow requests
  o Servers hitting MaxClients and returning 500 errors

Possible causes
• Database-backed analytics tracking
• Session management
• Any SQL DML (e.g., UPDATE, DELETE)

Risk increases with:
• The number of requests invoking the write operation
• Traffic
• Concurrent background operations
• The algorithmic complexity of the write
• Slow I/O on AWS EBS volumes

Solutions
• Asynchronize!
  o Write to a queue
• Write to memcached or another non-ACID store
  o Later bring the data into a data warehouse for advanced analytics

More info
1. Nygard, Michael T. Release It!: Design and Deploy Production-Ready Software. Raleigh, NC: Pragmatic Bookshelf, 2007.
2. Fowler, Martin. Patterns of Enterprise Application Architecture. Boston: Addison-Wesley, 2003.
3. Kimball, Ralph. The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses. John Wiley & Sons, 2010.
4. Schwartz, Baron. High Performance MySQL. O'Reilly, 2008.
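Anti-pattern 1, "Detection" and "Stop the bleeding": SHOW FULL PROCESSLIST returns one row per connection, with columns including Id, Command, Time (seconds in the current state) and Info (the statement text). Cutting off long queries can be sketched as a filter over those rows that emits MySQL `KILL QUERY` statements. This is a minimal sketch. It assumes the rows were already fetched as dicts keyed by column name. The 30-second threshold and the function name `kill_statements` are hypothetical.

```python
from typing import Iterable, List, Mapping

# Hypothetical threshold: statements running longer than this get cut off.
MAX_QUERY_SECONDS = 30


def kill_statements(processlist: Iterable[Mapping],
                    max_seconds: int = MAX_QUERY_SECONDS) -> List[str]:
    """Return KILL QUERY statements for long-running queries.

    `processlist` is assumed to hold rows from SHOW FULL PROCESSLIST
    as dicts keyed by column name (Id, Command, Time, Info, ...).
    """
    statements = []
    for row in processlist:
        # Only running statements; idle connections report Command == "Sleep".
        if row["Command"] != "Query":
            continue
        # Info is NULL (None) when there is no statement text.
        if row["Info"] is None:
            continue
        if row["Time"] > max_seconds:
            # KILL QUERY stops the statement but leaves the connection open.
            statements.append(f"KILL QUERY {int(row['Id'])}")
    return statements
```

Using `KILL QUERY` rather than `KILL` keeps the application's connection pool intact while freeing the database from the runaway statement.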
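Anti-pattern 1, "Stop the bleeding": turning off site sections is easiest when each expensive section sits behind a flag that operators can flip without a deploy. This is a sketch. It assumes flags live in a simple in-process mapping; in practice they would come from a config file or shared store. The section names, the `guarded` helper and the 503 fallback are hypothetical choices.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (HTTP status, body)

# Hypothetical flag store: set a section to False to shed its load.
SECTION_ENABLED: Dict[str, bool] = {"reports": True, "search": True}


def guarded(section: str, handler: Callable[[], Response],
            flags: Dict[str, bool] = SECTION_ENABLED) -> Response:
    """Serve the section only if its flag is on; otherwise return a cheap 503."""
    # Sections missing from the flag store are treated as enabled.
    if not flags.get(section, True):
        # No database work happens on this path, so a disabled section is cheap.
        return (503, f"The {section} section is temporarily unavailable.")
    return handler()
```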
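Anti-pattern 1, "Band-aids": the memcached band-aid usually takes the cache-aside form. Check the cache first, fall back to the database on a miss, and store the result with a TTL. This is a sketch. A plain dict stands in for memcached so it runs without a server. The loader passed in is a hypothetical database fetch, and the 60-second default TTL is an assumption.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Cache-aside lookup; a dict stands in for memcached in this sketch."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader  # hypothetical DB fetch, called only on a miss
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: the database is not touched
        value = self._loader(key)  # miss or expired entry: go to the database
        self._store[key] = (now + self._ttl, value)
        return value
```

This only helps with repeated reads of the same keys. Every miss still runs the expensive query, which is why the deck files caching under band-aids rather than solutions.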
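Anti-pattern 1, "Band-aids": HTTP caching with Varnish only absorbs traffic for responses a shared cache is allowed to store. This sketch chooses the Cache-Control header per response. Anonymous pages get a public lifetime. Anything personalized stays private. The `cache_headers` helper, the logged-in check and the 300-second default are assumptions.

```python
from typing import Dict


def cache_headers(logged_in: bool, max_age: int = 300) -> Dict[str, str]:
    """Response headers that let a shared cache serve anonymous pages."""
    if logged_in:
        # Personalized content must not be stored by shared caches.
        return {"Cache-Control": "private, no-cache"}
    # s-maxage targets shared caches (e.g. Varnish); max-age targets browsers.
    return {"Cache-Control": f"public, max-age={max_age}, s-maxage={max_age}"}
```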
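Anti-pattern 1, "Band-aids": read slaves take SELECT load off the master. This is a minimal router sketch. Writes go to the master, and plain reads rotate across replicas. The connection names are hypothetical strings. Classifying a statement by its leading keyword is a simplification. Because of replication lag, a read sent to a replica right after a write may not see that write.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Pick a database for each statement: master for writes, replicas for reads."""

    def __init__(self, master: str, replicas: List[str]):
        self.master = master
        # Round-robin over replicas; fall back to the master if there are none.
        self._replicas = itertools.cycle(replicas or [master])

    def route(self, sql: str) -> str:
        words = sql.lstrip().split(None, 1)
        verb = words[0].upper() if words else ""
        # SELECT ... FOR UPDATE takes row locks, so it must run on the master.
        if verb == "SELECT" and "FOR UPDATE" not in sql.upper():
            return next(self._replicas)
        return self.master
```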
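Anti-pattern 1, "Solutions": where data loses value over time, a hard limit can be enforced by deleting rows older than a retention window in small batches, so no single DELETE holds locks for long. This sketch uses Python's built-in sqlite3 as a stand-in for MySQL. The `events` table, its `created_at` column (Unix seconds), the 90-day window and the batch size are all hypothetical.

```python
import sqlite3

RETENTION_SECONDS = 90 * 24 * 3600  # hypothetical 90-day retention window
BATCH_SIZE = 1000  # small batches keep each DELETE (and its locks) short


def prune_old_events(conn: sqlite3.Connection, now: float,
                     batch_size: int = BATCH_SIZE) -> int:
    """Delete events older than the retention window; return rows deleted."""
    cutoff = now - RETENTION_SECONDS
    total = 0
    while True:
        # SQLite form. MySQL rejects LIMIT inside an IN subquery; there you
        # would write: DELETE FROM events WHERE created_at < %s LIMIT %s
        cur = conn.execute(
            "DELETE FROM events WHERE id IN "
            "(SELECT id FROM events WHERE created_at < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # one short transaction per batch
        total += cur.rowcount
        if cur.rowcount < batch_size:
            return total
```

A scheduled job would call `prune_old_events(conn, time.time())`. The retention window then becomes a product decision ("add features related to scale") rather than an accident of how much data has piled up.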
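Anti-pattern 2, "Solutions": "Asynchronize!" means the request handler only enqueues the risky write and returns. A background worker applies the write later, so slow DML no longer ties up an HTTP worker. This is a single-process sketch using Python's `queue.Queue` and a thread. In production the queue would more likely be an external broker. The `apply_write` callback is a hypothetical stand-in for the real UPDATE or INSERT.

```python
import queue
import threading
from typing import Callable

_STOP = object()  # sentinel telling the worker to exit


class AsyncWriter:
    """Accept writes instantly; apply them on a background thread."""

    def __init__(self, apply_write: Callable[[dict], None], maxsize: int = 10000):
        self._apply = apply_write  # hypothetical: performs the real DB write
        self._queue: "queue.Queue[object]" = queue.Queue(maxsize=maxsize)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, record: dict) -> bool:
        """Called on the request path; never waits on the database."""
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            # Drop the write rather than stall the HTTP response.
            return False

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is _STOP:
                return
            try:
                self._apply(item)
            except Exception:
                pass  # sketch only: a real worker would log and maybe retry

    def close(self) -> None:
        """Let queued writes drain, then stop the worker."""
        self._queue.put(_STOP)
        self._thread.join()
```

Dropping writes when the queue is full suits data like analytics hits. For writes that must not be lost, `submit` would need a durable queue instead.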
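Anti-pattern 2, "Solutions": for analytics tracking, "write to memcached or another non-ACID store, later bring the data into a data warehouse" can be sketched as in-memory counters that are periodically drained into one batched INSERT. Here a `Counter` stands in for memcached's increment, sqlite3 stands in for the warehouse, and the `page_views` table is hypothetical. Counts not yet flushed are lost if the process dies. That is the price of a non-ACID store.

```python
import sqlite3
import threading
from collections import Counter


class PageViewBuffer:
    """Count page views in memory and flush them to the warehouse in batches."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()
        self._lock = threading.Lock()

    def hit(self, page: str) -> None:
        # In-memory increment on the request path; no SQL runs here.
        with self._lock:
            self._counts[page] += 1

    def flush(self, conn: sqlite3.Connection, day: str) -> int:
        """Move buffered counts into page_views; return the number of rows written."""
        with self._lock:
            pending, self._counts = self._counts, Counter()
        if not pending:
            return 0
        with conn:  # one transaction for the whole batch
            conn.executemany(
                "INSERT INTO page_views (day, page, views) VALUES (?, ?, ?)",
                [(day, page, n) for page, n in sorted(pending.items())],
            )
        return len(pending)
```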