Table of Contents

Preface

Part I. The Basics

1. Introduction to Scalable Systems
  What Is Scalability?
  Examples of System Scale in the Early 2000s
  How Did We Get Here? A Brief History of System Growth
  Scalability Basic Design Principles
  Scalability and Costs
  Scalability and Architecture Trade-Offs
  Performance
  Availability
  Security
  Manageability
  Summary and Further Reading

2. Distributed Systems Architectures: An Introduction
  Basic System Architecture
  Scale Out
  Scaling the Database with Caching
  Distributing the Database
  Multiple Processing Tiers
  Increasing Responsiveness
  Systems and Hardware Scalability
  Summary and Further Reading

3. Distributed Systems Essentials
  Communications Basics
  Communications Hardware
  Communications Software
  Remote Method Invocation
  Partial Failures
  Consensus in Distributed Systems
  Time in Distributed Systems
  Summary and Further Reading

4. An Overview of Concurrent Systems
  Why Concurrency?
  Threads
  Order of Thread Execution
  Problems with Threads
  Race Conditions
  Deadlocks
  Thread States
  Thread Coordination
  Thread Pools
  Barrier Synchronization
  Thread-Safe Collections
  Summary and Further Reading

Part II. Scalable Systems

5. Application Services
  Service Design
  Application Programming Interface (API)
  Designing Services
  State Management
  Application Servers
  Horizontal Scaling
  Load Balancing
  Load Distribution Policies
  Health Monitoring
  Elasticity
  Session Affinity
  Summary and Further Reading

6. Distributed Caching
  Application Caching
  Web Caching
  Cache-Control
  Expires and Last-Modified
  ETag
  Summary and Further Reading

7. Asynchronous Messaging
  Introduction to Messaging
  Messaging Primitives
  Message Persistence
  Publish–Subscribe
  Message Replication
  Example: RabbitMQ
  Messages, Exchanges, and Queues
  Distribution and Concurrency
  Data Safety and Performance Trade-Offs
  Availability and Performance Trade-Offs
  Messaging Patterns
  Competing Consumers
  Exactly-Once Processing
  Poison Messages
  Summary and Further Reading

8. Serverless Processing Systems
  The Attractions of Serverless
  Google App Engine
  The Basics
  GAE Standard Environment
  Autoscaling
  AWS Lambda
  Lambda Function Life Cycle
  Execution Considerations
  Scalability
  Case Study: Balancing Throughput and Costs
  Choosing Parameter Values
  GAE Autoscaling Parameter Study Design
  Results
  Summary and Further Reading

9. Microservices
  The Movement to Microservices
  Monolithic Applications
  Breaking Up the Monolith
  Deploying Microservices
  Principles of Microservices
  Resilience in Microservices
  Cascading Failures
  Bulkhead Pattern
  Summary and Further Reading

Part III. Scalable Distributed Databases

10. Scalable Database Fundamentals
  Distributed Databases
  Scaling Relational Databases
  Scaling Up
  Scaling Out: Read Replicas
  Scale Out: Partitioning Data
  Example: Oracle RAC
  The Movement to NoSQL
  NoSQL Data Models
  Query Languages
  Data Distribution
  The CAP Theorem
  Summary and Further Reading

11. Eventual Consistency
  What Is Eventual Consistency?
  Inconsistency Window
  Read Your Own Writes
  Tunable Consistency
  Quorum Reads and Writes
  Replica Repair
  Active Repair
  Passive Repair
  Handling Conflicts
  Last Writer Wins
  Version Vectors
  Summary and Further Reading

12. Strong Consistency
  Introduction to Strong Consistency
  Consistency Models
  Distributed Transactions
  Two-Phase Commit
  2PC Failure Modes
  Distributed Consensus Algorithms
  Raft
  Leader Election
  Strong Consistency in Practice
  VoltDB
  Google Cloud Spanner
  Summary and Further Reading

13. Distributed Database Implementations
  Redis
  Data Model and API
  Distribution and Replication
  Strengths and Weaknesses
  MongoDB
  Data Model and API
  Distribution and Replication
  Strengths and Weaknesses
  Amazon DynamoDB
  Data Model and API
  Distribution and Replication
  Strengths and Weaknesses
  Summary and Further Reading

Part IV. Event and Stream Processing

14. Scalable Event-Driven Processing
  Event-Driven Architectures
  Apache Kafka
  Topics
  Producers and Consumers
  Scalability
  Availability
  Summary and Further Reading

15. Stream Processing Systems
  Introduction to Stream Processing
  Stream Processing Platforms
  Case Study: Apache Flink
  DataStream API
  Scalability
  Data Safety
  Conclusions and Further Reading

16. Final Tips for Success
  Automation
  Observability
  Deployment Platforms
  Data Lakes
  Further Reading and Conclusions

Index