Architecting to be Cloud Native
Aligning your application’s architecture with the architecture of the cloud… FTW!
But the cloud is a friendly place for non-native apps too!
Guest lecture at Dino Konstantopoulos’ BU MET CS755 Cloud Computing class
17-April-2014 (7:00 – 9:00 PM EDT)

My name is Bill Wilder
codingoutloud@gmail.com
blog.codingoutloud.com
@codingoutloud
www.devpartners.com
www.cloudarchitecturepatterns.com

Who is Bill Wilder?
www.bostonazure.org
www.devpartners.com

I will ass-u-me…
1. You know what “the cloud” is
2. You have an inkling about the Amazon Web Services and Windows Azure cloud platforms
3. You understand that such cloud platforms include compute services (like hosted virtual machines (VMs), in both IaaS and PaaS modes), SQL and NoSQL database services, file storage services, messaging, DNS, management, etc.
4. You are interested in understanding cloud-native applications and why building one is better than deploying your old-school app to the cloud “as is”

Roadmap for rest of talk…
1. Lightning-fast overview of Windows Azure
2. Cover three specific patterns for building cloud-native applications
3. Mention some other patterns along the way

Questions?
• Q&A during talk is okay (time permitting)
• Q&A at end with any remaining time
• Okay to reach out through email or Twitter

Windows Azure Portal
General information: http://www.windowsazure.com
Management Portal: http://manage.windowsazure.com

“Bring Your Own” ____ as a Service
NIST Terminology (http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf)
• SaaS = Software as a Service (BYO users)
• PaaS = Platform as a Service (BYO apps)
• IaaS = Infrastructure as a Service (BYO VMs)
[Diagram: spectrum from SaaS (simplicity, rigidity) to IaaS (complexity, flexibility), with “Power?” labels]

So Architecting for the (Windows Azure, AWS, GAE, …) Cloud is Different… But Why?
WHY DID THEY (Microsoft, Amazon, Google, …) DO THIS TO US?
Know the rules
“If I had asked people what they wanted, they would have said faster horses.” - Henry Ford

Know the rules
“If I had asked IT departments what they wanted, they would have said IaaS.” - Henry Cloud

Cloud Platform Characteristics
• Scaling (or “resource allocation”)
  – is horizontal
  – and ∞ (“illusion of infinite resources”)
• Resources are easily added or released
  – self-service portal or API; cloud scaling is automatable
• Pay only for currently allocated resources
  – costs are operational, granular, controllable, and transparent
• Optimized for cost-efficiency
  – cloud services are multitenant (MT), hardware is commodity
  – MTTR (mean time to repair) over MTTF (mean time to failure)
• Rich, robust functionality is simply accessible
  – like an iceberg

Cloud-Native Application Characteristics
• Application architecture is aligned with the cloud platform architecture
  – uses the platform in the most natural way
  – lets the platform do the heavy lifting

The definition of “cloud” is nebulous…

What’s different about the cloud?
[Diagram: iceberg, 1/9th above water; labels include “TTM & sleeping well”, SOA, MTBF, MTTR]

Architectural Assumptions
• Failure is routine (so you’d better be good at handling it)
• Commodity hardware + multitenant services = cost-efficient cloud
Loosely Coupled & Eventually Consistent Data & Workflow Architecture

This bar is always open *and* pay-by-the-drink has an API ($)

Resource Allocation
The “illusion of infinite resources”
• Resource allocation (scaling) is:
  – Horizontal
  – Bi-directional
  – Automatable

Integrated Surface Area

www.pageofphotos.com
• Simple idea, simple app
• Two tiers: web tier (one server) + database
• What’s the problem?

But… what’s WRONG with this architecture?
• Different ≠ WRONG. Use the right tool for the job. Some apps are simply not a good fit for the cloud.

www.pageofphotos.com: what can go wrong?
We’ll reexamine:
1. Scaling the web tier
2. Scaling the service tier
3. Scaling the data tier
4. Handling failure
5. Operational efficiency (scale the app, not the team!)

pattern 1 of 3
Horizontal Scaling Compute Pattern

What’s the difference between performance and scale?

Scale Up (and Scale Down??) vs. Horizontal Resourcing
Common terminology:
• Scaling Up/Down = Vertical Scaling
• Scaling Out/In = Horizontal “Scaling”, but really Horizontal Resource Allocation
• Architectural decision
  – Big decision… hard to change

Vertical Scaling (“Scaling Up”)
Resources that can be “scaled up”:
• Memory: speed, amount
• CPU: speed, number of CPUs
• Disk: speed, size, multiple controllers
• Bandwidth: higher-capacity pipe
• … and it sure is EASY.

Downsides of Scaling Up
• Hard upper limit
• HIGH-END HARDWARE = HIGH-END CO$T
• Lower value than “commodity hardware”
• May have no other choice (architectural)

Scaling Horizontally: Adding Boxes
• Autonomous nodes for scalability (stateless web servers, shared-nothing DBs, your custom code in QCW)
• *and* Homogeneous nodes for operational simplicity
• *and* Anonymous nodes: don’t get emotionally involved!
This is how the CLOUD works *and* this is how YOUR CLOUD-NATIVE APP WORKS

Example: Web Tier
[Diagram: www.pageofphotos.com served by a Load Balancer (Cloud Service) in front of Managed VMs (Cloud Service)]

Horizontal Scaling Considerations
1. Auto-scale
  • Bidirectional
2. Nodes can fail
  • Auto-scale is only one cause
  • Handle shutdown signals
  • Stateless (“like a taxi”) vs. sticky sessions
  • Stateless nodes vs. stateless apps
  • N+1 rule vs. occasional downtime (UX)

How many users does your cloud-native application need before it needs to be able to scale horizontally?
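The horizontal scaling considerations above can be illustrated with a minimal Python sketch. It is an in-memory simulation, not Azure's load balancer: `SessionStore`, `WebNode`, `RoundRobinBalancer`, and `nodes_for_n_plus_1` are hypothetical names, and the round-robin policy and request rates are assumptions chosen for illustration. Because session state lives in a shared store rather than on any node, the balancer can send a user to any healthy node, and nodes can be added, removed, or fail without sticky sessions.

```python
import itertools
import math
from dataclasses import dataclass, field


@dataclass
class SessionStore:
    """Hypothetical shared store (think external cache or database).
    Keeping session state here, not on a node, is what makes nodes stateless."""
    counts: dict = field(default_factory=dict)

    def incr(self, session_id: str) -> int:
        self.counts[session_id] = self.counts.get(session_id, 0) + 1
        return self.counts[session_id]


@dataclass
class WebNode:
    """Autonomous, homogeneous, anonymous: every node runs the same code and
    keeps no per-user state, so any node can serve any request."""
    name: str
    store: SessionStore
    healthy: bool = True

    def handle(self, session_id: str) -> str:
        n = self.store.incr(session_id)
        return f"{self.name} served request #{n} for {session_id}"


class RoundRobinBalancer:
    """Spreads requests across whichever nodes are currently healthy, so nodes
    can be added (scale out), released (scale in), or fail."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._ticks = itertools.count()

    def add_node(self, node: WebNode) -> None:
        self.nodes.append(node)

    def remove_node(self, name: str) -> None:
        self.nodes = [n for n in self.nodes if n.name != name]

    def route(self, session_id: str) -> str:
        healthy = [n for n in self.nodes if n.healthy]
        if not healthy:
            raise RuntimeError("no healthy nodes available")
        return healthy[next(self._ticks) % len(healthy)].handle(session_id)


def nodes_for_n_plus_1(peak_rps: float, rps_per_node: float) -> int:
    """N nodes to carry the peak, plus one spare so a single failure or
    OS-patch reboot does not degrade the user experience."""
    return math.ceil(peak_rps / rps_per_node) + 1


if __name__ == "__main__":
    store = SessionStore()
    lb = RoundRobinBalancer([WebNode("vm0", store), WebNode("vm1", store)])
    print(lb.route("alice"))      # vm0 served request #1 for alice
    print(lb.route("alice"))      # vm1 served request #2 for alice
    lb.nodes[0].healthy = False   # vm0 fails (or is being patched)
    print(lb.route("alice"))      # vm1 served request #3 for alice
    print(nodes_for_n_plus_1(450, 100))  # 6
```

The `nodes_for_n_plus_1` helper reads the N+1 rule as simple arithmetic: with an assumed peak of 450 requests/second and 100 requests/second per node, 5 nodes carry the load and a sixth absorbs a single failure or patch reboot.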
pattern 2 of 3
Queue-Centric Workflow Pattern (QCW for short)
Extend the www.pageofphotos.com example into the Service Tier
• QCW enables applications where the UI and back-end services are loosely coupled
• (Compare to CQRS at end if there is interest)

QCW Example: User Uploads Photo
[Diagram: www.pageofphotos.com Web Server, Reliable Queue, Reliable Storage, Compute Service]

QCW
WE NEED:
• Compute (VM) resources to run our code
• Reliable Queue to communicate
• Durable/Persistent Storage
Where does Windows Azure fit?

QCW [on Windows Azure]
WE NEED:
• Compute (VM) resources to run our code
  – Web Roles (IIS) and Worker Roles (w/o IIS)
• Reliable Queue to communicate
  – Azure Storage Queues
• Durable/Persistent Storage
  – Azure Storage Blobs & Tables; WASD

QCW on Azure: User Uploads a Photo
[Diagram: www.pageofphotos.com Web Role (IIS) pushes to an Azure Queue; a Worker Role pulls from it; Azure Blob storage]
UX implications: the user does not wait for the thumbnail (architecture!)

QCW enables Responsive UX
• Response to interactive users is as fast as a work request can be persisted
• Time-consuming work is done asynchronously
• Comparable total resource consumption, arguably better subjective UX
• UX challenge: how to express async to users?
  – Communicate progress
  – Display final results
  – Long polling/WebSockets (e.g., SignalR or Socket.IO)

QCW enables Scalable App
• Decoupled front/back provides insulation
  – Blocking is the bane of scalability
  – Order processing partner doing maintenance
  – Twitter down
  – Email server unreachable
  – Internet connectivity interruption
• Loosely coupled, concern-independent scaling
  – (see next slide)
  – Get Scale Units right
  – Key to optimizing operational CO$T$

General Case: Many Roles, Many Queues
[Diagram: Web Roles (IIS; Admin and Public) sending work to queues of Types 1, 2, and 3, each drained by its own pool of Worker Roles]
• Scaling is best when Investment ∝ Benefit
• Optimize for CO$T EFFICIENCY
• Logical vs. physical architecture depends on current scale

Reliable Queue & 2-step Delete
[Diagram: (IIS) Web Role → Queue → Worker Role]

    // Web Role: enqueue a work request (queue is a CloudQueue; setup not shown)
    var url = "http://pageofphotos.blob.core.windows.net/up/<guid>.png";
    queue.AddMessage(new CloudQueueMessage(url));

    // Worker Role: step 1 hides the message for 10 s; step 2 deletes it after success
    var invisibilityWindow = TimeSpan.FromSeconds(10);
    CloudQueueMessage msg = queue.GetMessage(invisibilityWindow);
    if (msg != null)
    {
        // ... do some processing, then ...
        queue.DeleteMessage(msg); // if the worker dies first, msg reappears after 10 s
    }

QCW requires Idempotent Processing
• Performing an idempotent operation more than once gives the same end result as performing it once
• Example with thumbnailing (easy case)
• App-specific concerns dictate approaches
  – Compensating action, last write wins, etc.
• PARTNERSHIP: division of responsibility between cloud platform & app
  – Far cry from a database transaction

QCW expects Poison Messages
• A poison message cannot be processed
  – Error condition for a non-transient reason
  – Use the dequeue count property
• Be proactive
  – Falling off the queue may kill your system
• Determine a max retry policy per queue
  – Delete, put on a “bad” queue, alert a human, …

QCW requires “Plan for Failure”
• VM restarts will happen
  – Hardware failure, O/S patching, crash (bug)
• Bake handling of restarts into our apps
  – Restarts are routine: the system “just keeps working”
  – Idempotent processing is important
  – Event Sourcing (commonly seen with CQRS) may help
• Not an exception case! Expect it!
• Consider the N+1 rule

What’s Up? Reliability as an EMERGENT PROPERTY
[Table: uptime of a Typical Site, Any 1 Role Instance, and the Overall System across Operating System Upgrade, Application Code Update, Scale Up/Down/In, Hardware Failure, Software Failure (Bug), and Security Patch]

What about the DATA?
• You: Azure Web Roles and Azure Worker Roles
  – Taking user input, dispatching work, doing work
  – Follow a decoupled queue-in-the-middle pattern
  – Stateless compute nodes
• Cloud: the “Hard Part”: persistent, scalable data
  – Azure Queue & Blob Services
  – Three copies of each byte
  – Blobs are geo-replicated
  – Busy Signal Pattern

pattern 3 of 3
Database Sharding Pattern

Most cloud applications don’t care (much) about (very high) scale
But they do care about developer productivity and operational efficiency
[Diagram: hybrid deployment; labels include foo.com as an Azure Web Site running a CMS, bar.com as an Azure Cloud Service, a dedicated MySQL database to run the CMS, Blob Storage, Global CDN, a VNET in the cloud connected to on-prem (Site-to-Site Virtual Network), on-prem database, SOAP / REST / HTTP, TDS (native SQL Server TCP-based wire protocol), Public Internet, Content Editing & Site Admin, API, Off-site/Travel Dev Team (Point-to-Site VPN from laptop to Azure), On-prem Dev Team (Point-to-Site VPN from CoLo router into Azure)]

Azure SQL Database (WASD) is SQL Server, Except…
Common: “Just change the connection string…”
SQL Server specific (for now)
• Full Text Search
• Transparent Data Encryption (TDE)
• Many more…
SQL Server limitations
• You need to run it
• Max VM size
SQL Database limitations
• 500 GB size limit
• Busy Signal Pattern (expect transient throttling)
SQL Database extra capabilities
• Managed service
• Highly available
• Rental model
• Premium (reserved)
Additional information on differences: http://msdn.microsoft.com/en-us/library/ff394115.aspx

My database instance is limited to 500 GB.
Does that mean the cloud doesn’t really offer the illusion of infinite resources?

Old-School (Pre-Cloud) vs. Cloud-Native architectural concerns
• Stable/Static Hardware vs. Dynamic/∞ Resources
• Fixed/CapEx vs. Variable/OpEx
• Vertical Scaling vs. Horizontal Resourcing
• Minimize MTBF vs. Minimize MTTR
• Data Storage = RDBMS vs. Scenario-specific Storage
• Control vs. Efficiency

Lessons: being Cloud-Native
• Auto-Scaling via API → Dynamic/∞ Resources
• Pay-As-You-Go → Variable/OpEx
• Stateless, Autonomous → Horizontal Resourcing
• N+1, Idempotent → Minimize MTTR
• SQL, NoSQL, Blob → Scenario-specific Storage

Pre-Cloud vs. Cloud-Native: 1:15,000 Efficiency

Know the rules
“Know the rules well, so you can break them effectively.” - Dalai Lama XIV

Integrated Surface Area

Cloud Architecture Patterns book: Primer Chapters
1. Scalability
2. Eventual Consistency
3. Multitenancy and Commodity Hardware
4. Network Latency

Cloud Architecture Patterns book: Pattern Chapters
1. Horizontally Scaling Compute Pattern
2. Queue-Centric Workflow Pattern
3. Auto-Scaling Pattern
4. MapReduce Pattern
5. Database Sharding Pattern
6. Busy Signal Pattern
7. Node Failure Pattern
8. Colocate Pattern
9. Valet Key Pattern
10. CDN Pattern
11. Multisite Deployment Pattern

Questions? Comments? More information?
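The Queue-Centric Workflow slides earlier in this deck (reliable queue, 2-step delete, idempotent processing, poison messages, planning for failure) fit together as one worker loop. The Python sketch below simulates that loop under stated assumptions: the queue is an in-memory stand-in rather than the Azure Storage client library, `ReliableQueue`, `process_one`, `FakeClock`, and `MAX_DEQUEUE_COUNT` are hypothetical names, and thumbnailing is reduced to a function that returns a string. The 10-second timeout mirrors the invisibility window in the deck's C# fragment.

```python
import time
from dataclasses import dataclass


@dataclass
class Message:
    id: int
    body: str
    dequeue_count: int = 0
    visible_at: float = 0.0


class ReliableQueue:
    """Hypothetical in-memory stand-in for a cloud queue (not the Azure SDK).
    get_message() hides a message for a visibility timeout instead of removing
    it; only delete_message() removes it (the "2-step delete")."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._messages = {}
        self._next_id = 0

    def add_message(self, body: str) -> None:
        self._messages[self._next_id] = Message(self._next_id, body)
        self._next_id += 1

    def get_message(self, visibility_timeout: float):
        now = self._clock()
        for msg in self._messages.values():
            if msg.visible_at <= now:
                msg.visible_at = now + visibility_timeout  # step 1: hide, don't delete
                msg.dequeue_count += 1
                return msg
        return None

    def delete_message(self, msg: Message) -> None:
        self._messages.pop(msg.id, None)  # step 2: remove for good

    def __len__(self) -> int:
        return len(self._messages)


VISIBILITY_TIMEOUT = 10.0  # seconds, matching the 10-second invisibility window
MAX_DEQUEUE_COUNT = 3      # hypothetical max-retry policy for this queue


def process_one(queue, poison_queue, thumbnails, make_thumbnail) -> bool:
    """Handle at most one message; returns False if nothing was visible."""
    msg = queue.get_message(VISIBILITY_TIMEOUT)
    if msg is None:
        return False
    if msg.dequeue_count > MAX_DEQUEUE_COUNT:
        # Poison message: park it on a "bad" queue for a human, stop retrying.
        poison_queue.add_message(msg.body)
        queue.delete_message(msg)
        return True
    try:
        # Idempotent: reprocessing the same URL overwrites the same key with
        # the same value, so a retry after a crash or restart is harmless.
        thumbnails[msg.body] = make_thumbnail(msg.body)
    except Exception:
        return True  # not deleted, so it reappears after the timeout
    queue.delete_message(msg)
    return True


class FakeClock:
    """Manually advanced clock so the timeout behavior is easy to demonstrate."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


if __name__ == "__main__":
    clock = FakeClock()
    work, bad, thumbs = ReliableQueue(clock), ReliableQueue(clock), {}
    work.add_message("up/1.png")
    work.get_message(VISIBILITY_TIMEOUT)  # a worker dequeues it, then its VM restarts
    clock.now = 11.0                      # the invisibility window expires
    process_one(work, bad, thumbs, lambda url: "thumb-of-" + url)
    print(thumbs, len(work))              # {'up/1.png': 'thumb-of-up/1.png'} 0
```

Deleting only after success is what turns a VM restart into a routine event: unfinished work simply becomes visible again. That in turn is why the processing step must be idempotent, and why a dequeue-count limit is needed to keep a poison message from cycling forever.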
BostonAzure.org
• Boston Azure cloud user group
• Focused on Microsoft’s public cloud platform
• Monthly, 6:00-8:30 PM in the Boston area
  – Food; wifi; free; great topics; growing community
• Follow on Twitter: @bostonazure
• More info or to join our Meetup.com group: http://www.bostonazure.org

Contact Me
Looking for …
• consulting help with the Windows Azure Platform?
• someone to bounce Azure or cloud questions off?
• a speaker for your user group or company technology event?
Just Ask!

Find this slide deck here
Bill Wilder
@codingoutloud
http://blog.codingoutloud.com
community inquiries: codingoutloud@gmail.com
business inquiries: www.devpartners.com
book: www.cloudarchitecturepatterns.com