Mark Simms (@mabsimms)
Principal Program Manager
Windows Azure Customer Advisory Team

resilient design and architecture

[Architecture diagram: Load Balancer, Web Servers, App Servers, Distributed Cache, Doc Store, Database, External Services (SendGrid, Twitter, Facebook, etc)]

What are the “9”s

Availability %               Downtime per year   Downtime per month*   Downtime per week
90% ("one nine")             36.5 days           72 hours              16.8 hours
99% ("two nines")            3.65 days           7.20 hours            1.68 hours
99.9% ("three nines")        8.76 hours          43.2 minutes          10.1 minutes
99.99% ("four nines")        52.56 minutes       4.32 minutes          1.01 minutes
99.999% ("five nines")       5.26 minutes        25.9 seconds          6.05 seconds
99.9999% ("six nines")       31.5 seconds        2.59 seconds          0.605 seconds

• Study Windows Azure Platform SLAs:
  • Compute External Connectivity: 99.95% (2 or more instances)
  • Compute Instance Availability: 99.9% (2 or more instances)
  • Storage Availability: 99.9%
  • SQL Azure Availability: 99.9%

[Chart: Web Request Response Latency; x-axis: Seconds; series: Avg Latency]

Response latency

Platform       Context                                     Sample target e2e latency max   “Fast First”   Retry Count   Delay    Backoff
SQL Database   Synchronous (e.g. render web page)          200 ms                          Yes            3             50 ms    Linear
SQL Database   Asynchronous (e.g. process queue item)      60 seconds                      No             4             5 s      Exponential
Azure Cache    Synchronous (e.g. render web page)          100 ms                          Yes            3             10 ms    Linear
Azure Cache    Asynchronous (e.g. process queue item)      500 ms                          Yes            3             100 ms   Exponential

Failure Point
definition: design elements that can cause an outage.
Focus on identifying design elements that are subject to external change. For example:
Categories of common Failure Points:

Failure Mode
definition: a predictable root cause of the outage that occurs at a Failure Point.
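The retry settings in the response latency table above (retry count, delay, backoff, and “Fast First”) can be sketched as a small policy class. This is a minimal illustration under stated assumptions, not the deck's or any Azure library's implementation: the names RetryPolicy, BackoffKind, DelayFor and Execute are hypothetical, “Fast First” is assumed to mean that the first retry happens immediately, linear backoff is assumed to grow as delay × n, and exponential backoff as delay × 2^(n-1).

```csharp
using System;
using System.Threading;

// Hypothetical names throughout; this is not an Azure SDK API.
public enum BackoffKind { Linear, Exponential }

public sealed class RetryPolicy
{
    public int RetryCount { get; }
    public TimeSpan Delay { get; }
    public BackoffKind Backoff { get; }
    public bool FastFirst { get; }

    public RetryPolicy(int retryCount, TimeSpan delay, BackoffKind backoff, bool fastFirst)
    {
        RetryCount = retryCount;
        Delay = delay;
        Backoff = backoff;
        FastFirst = fastFirst;
    }

    // Wait time before retry number `retry` (1-based).
    public TimeSpan DelayFor(int retry)
    {
        if (retry < 1) throw new ArgumentOutOfRangeException(nameof(retry));
        if (FastFirst)
        {
            // Assumption: "Fast First" means the first retry is immediate
            // and the backoff sequence starts with the second retry.
            if (retry == 1) return TimeSpan.Zero;
            retry -= 1;
        }
        // Assumption: linear = delay * n, exponential = delay * 2^(n-1).
        double factor = Backoff == BackoffKind.Linear ? retry : Math.Pow(2, retry - 1);
        return TimeSpan.FromMilliseconds(Delay.TotalMilliseconds * factor);
    }

    // Runs the operation, retrying up to RetryCount times on transient errors.
    // Non-transient errors and the final failure propagate to the caller.
    public T Execute<T>(Func<T> operation, Func<Exception, bool> isTransient)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex) when (attempt < RetryCount && isTransient(ex))
            {
                Thread.Sleep(DelayFor(attempt + 1));
            }
        }
    }
}
```

Under these assumptions, the SQL Database synchronous row (3 retries, 50 ms, linear, “Fast First”) would be `new RetryPolicy(3, TimeSpan.FromMilliseconds(50), BackoffKind.Linear, true)`, which waits 0 ms, 50 ms and 100 ms before its three retries. Which exceptions count as transient is left to the caller because it depends on the platform being called.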
Examples of failure modes:
The following would not be considered a failure mode:

Failure Mode Example

    public int GetBusinessData(string[] parameters)
    {
        try
        {
            // Failure point: configuration file
            var config = Config.Open(_configPath);
            // Failure point: database server
            var conn = ConnectToDB(config.ConnectString);
            // Failure points: database, stored procedure, table
            var data = conn.GetData(_sproc, parameters);
            return data;
        }
        catch (Exception e)
        {
            // Every failure mode is logged as the same event and rethrown
            WriteEventLogEvent(100, E_ExceptionInDal);
            throw;
        }
    }

Potential Failure Points: Database Server Database Table Configuration File

Potential Failure Modes:
• DB Server not responding
• DB offline
• DB access denied
• Sproc execute denied
• DB doesn’t exist
• DB timeout on connect
• Index corrupt
• Database corrupt
• Table doesn’t exist
• Table corrupt
• Config file missing or invalid

[Diagram: development and release lifecycle: Plan, Design, Code, Unit Test, Check In, CI Build, Automated Test, Dev Fabric, Run Dev on Azure, Stage, Deploy, Run QA/Pre-release on Azure, Test, Production Release on Azure, Monitor, Log Defect, Triage (Defect / Feature), Scope, Test Plan, Fixes, Updates]

http://msdn.microsoft.com/en-us/library/jj853352.aspx
http://msdn.microsoft.com/en-us/library/windowsazure/jj717232.aspx
https://www.usenix.org/events/lisa07/tech/full_papers/hamilton/hamilton.pdf

Microsoft Confidential

Push vs. Pull

Load Balanced Push
• Sync and good for sequential processing
• Dependent on downstream services
• Throttling vs. Performance

Managed Pull/Throughput
• Asynchronous and event driven processing
• Easy Parallelisation and Pipelining
• Extending logic is easy

Logic based
• Priority
• Date
• Amount
• Etc.

Time based
• ASAP
• Gradually
• Periodically
• On-Demand

Volume based
• Single
• In Batches

Data on the inside – Data on the outside
http://msdn.microsoft.com/en-us/library/ms954587.aspx

Reference Data
• Immutable (versions)
• Requires open schema for interop

Activity Data
• Low concurrency updates (e.g. shopping basket)

Resource (shared) Data
• Highly concurrent update (e.g. inventory)
• Should live in worker role

“Query Ready” Cache
Query patterns
• Push the data close to where it is queried. Example: BING Maps
• Process, structure, produce, format etc. data and cache “query ready” data
• Light/cheap data production is OK
• Pure and Idempotent operations are usually good candidates
• Duplication is OK
  • Same data in a different format
  • Same data in multiple places
• This requires processing data before it is queried, NOT at the query time
• All data can be cached
• Some data can be cached:
  • Frequently used
  • Process Heavy, Expensive data
• Build as you Go

Distributed Caching
• Simple to administer
  No need to manage and host a distributed cache yourself.
• Integrates easily into existing applications
  ASP.NET session state and output cache providers enable no-code integration.
• Same managed interfaces as Windows Server AppFabric Cache

[Diagram: On-Premises App, AppFabric Cache APIs, Windows Server AppFabric Cache; Windows Azure App, AppFabric Cache APIs, Windows Azure AppFabric Caching; Edge Location]
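The “query ready” cache idea above, where data is processed and formatted before it is queried rather than at query time, can be sketched as follows. This is only an illustration under assumptions: Product, QueryReadyCache, Publish and TryGet are hypothetical names, and an in-process ConcurrentDictionary stands in for a distributed cache; none of this is the AppFabric Cache API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical types for illustration only.
public sealed class Product
{
    public string Category { get; }
    public string Name { get; }
    public decimal Price { get; }

    public Product(string category, string name, decimal price)
    {
        Category = category;
        Name = name;
        Price = price;
    }
}

public sealed class QueryReadyCache
{
    // Stand-in for a distributed cache: category -> pre-formatted view.
    private readonly ConcurrentDictionary<string, string> _viewsByCategory =
        new ConcurrentDictionary<string, string>();

    // Write path: runs when the data changes, not when it is queried.
    // The formatting is pure and idempotent, so publishing the same
    // products again produces exactly the same cached values.
    public void Publish(IEnumerable<Product> products)
    {
        foreach (var group in products.GroupBy(p => p.Category))
        {
            string view = string.Join("; ", group
                .OrderBy(p => p.Name, StringComparer.Ordinal)
                .Select(p => p.Name + "=" + p.Price.ToString(CultureInfo.InvariantCulture)));
            _viewsByCategory[group.Key] = view;
        }
    }

    // Read path: a lookup only, with no processing at query time.
    public bool TryGet(string category, out string view)
    {
        return _viewsByCategory.TryGetValue(category, out view);
    }
}
```

A caller would invoke `Publish` whenever the source data changes and `TryGet` on every query. The trade-off is the one the slide names: the same data is duplicated in a second format so that reads stay cheap.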