
Scale an Application

Scale from zero to millions of users
Single Server Setup
To start with something simple, everything is running on a single server.
1. Users access websites through domain names, such as api.mysite.com. Usually, the Domain Name System (DNS) is a paid service provided by third parties and is not hosted on our servers.
2. An Internet Protocol (IP) address is returned to the browser or mobile app. In this example, the IP address 15.125.23.214 is returned.
3. Once the IP address is obtained, Hypertext Transfer Protocol (HTTP) [1]
requests are sent directly to your web server.
4. The web server returns HTML pages or a JSON response for rendering.
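The four steps above can be sketched in code. This is a toy, in-memory model: the DNS table, the IP address, and the server handler are stand-ins for real infrastructure, not an actual DNS or HTTP implementation.

```python
# Toy sketch of the request flow: DNS lookup, then an HTTP-style request
# to the resolved server. Domain, IP, and handler below are illustrative.

DNS_RECORDS = {"api.mysite.com": "15.125.23.214"}

# A fake web server keyed by IP address; a real server would listen on a socket.
WEB_SERVERS = {
    "15.125.23.214": lambda path: {"status": 200, "body": "<html>home</html>"}
}

def resolve(domain):
    """Steps 1-2: ask DNS for the IP address of a domain."""
    return DNS_RECORDS[domain]

def http_get(domain, path):
    """Steps 3-4: send the request to the resolved IP and return the response."""
    ip = resolve(domain)
    return WEB_SERVERS[ip](path)

response = http_get("api.mysite.com", "/")
print(response["status"])  # 200
```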
Database
With the growth of the user base, one server is not enough. We need multiple
servers:
One for web/mobile traffic.
The other for the database.
Separating web/mobile traffic (web tier) and database (data tier) servers allows them
to be scaled independently.
Which databases to use?
You can choose between a traditional relational database and a non-relational
database.
Relational databases are also called SQL databases. The most popular ones are
MySQL, Oracle database, PostgreSQL, etc. Relational databases represent and
store data in tables and rows. You can perform join operations using SQL across
different database tables.
Non-Relational databases are also called NoSQL databases. Popular ones are
CouchDB, Neo4j, Cassandra, MongoDB, Amazon DynamoDB, etc. These databases
are grouped into four categories: key-value stores, graph stores, column stores,
and document stores. Join operations are generally not supported in non-relational
databases.
Non-relational databases might be the right choice if:
Your application requires super-low latency.
Your data are unstructured, or you do not have any relational data.
You only need to serialize and deserialize data (JSON, XML, YAML, etc.).
You need to store a massive amount of data.
Vertical scaling vs Horizontal scaling
Vertical scaling, referred to as “scale up”, means the process of adding more power
(CPU, RAM, etc.) to your servers.
Horizontal scaling, referred to as “scale-out”, allows you to scale by adding more
servers into your pool of resources.
When traffic is low, vertical scaling is a great option, and the simplicity of vertical
scaling is its main advantage. Unfortunately, it comes with serious limitations:
Vertical scaling has a hard limit. It is impossible to add unlimited CPU and
memory to a single server.
Vertical scaling does not have failover and redundancy. If one server goes down,
the website/app goes down with it completely.
Horizontal scaling is more desirable for large-scale applications due to the limitations of vertical scaling.
Load Balancer
In the previous design, users are connected to the web server directly. Users will be
unable to access the website if the web server is offline.
In another scenario, if many users access the web server simultaneously and it
reaches the web server’s load limit, users generally experience slower response or
fail to connect to the server. A load balancer is the best technique to address these
problems.
💡
A load balancer evenly distributes incoming traffic among web
servers that are defined in a load-balanced set.
With this setup, web servers are no longer reachable directly by clients.
After a load balancer and a second web server are added, we have solved the failover issue and improved the availability of the web tier:
If server 1 goes offline, all the traffic will be routed to server 2. This prevents the
website from going offline. We will also add a new healthy web server to the
server pool to balance the load.
If the website traffic grows rapidly, and two servers are not enough to handle the
traffic, the load balancer can handle this problem gracefully. You only need to
add more servers to the web server pool, and the load balancer automatically
starts to send requests to them.
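The behavior described above can be sketched as a minimal round-robin load balancer. The server names are made up, and real load balancers use active health checks rather than manual offline marking; this is only an illustration of the routing logic.

```python
class LoadBalancer:
    """Minimal round-robin load balancer sketch (server names are illustrative).

    Healthy servers receive traffic in turn; an offline server is skipped,
    and newly added servers start receiving requests automatically.
    """

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(servers)
        self._cursor = 0

    def mark_offline(self, server):
        self.healthy.discard(server)

    def add_server(self, server):
        self.servers.append(server)
        self.healthy.add(server)

    def route(self):
        # Walk the pool round-robin until a healthy server is found.
        for _ in range(len(self.servers)):
            server = self.servers[self._cursor % len(self.servers)]
            self._cursor += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers")

lb = LoadBalancer(["server1", "server2"])
lb.mark_offline("server1")
print(lb.route())  # all traffic is routed to server2
```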
Database Replication
The current design has one database, so it does not support failover and
redundancy. Database replication is a common technique to address those
problems.
💡
Database replication can be used in many database management systems, usually with a master/slave relationship between the original (master) and the copies (slaves).
A master database generally only supports write operations. A slave database gets
copies of the data from the master database and only supports read operations.
All the data-modifying commands like insert, delete, or update must be sent to the
master database. Most applications require a much higher ratio of reads to writes;
thus, the number of slave databases in a system is usually larger than the number of
master databases.
Advantages
Better performance: In the master-slave model, all writes and updates happen in master nodes, whereas read operations are distributed across slave nodes.
This model improves performance because it allows more queries to be
processed in parallel.
Reliability: If one of your database servers is destroyed, data is still preserved.
You do not need to worry about data loss because data is replicated across
multiple locations.
High availability: By replicating data across different locations, your website
remains in operation even if a database is offline as you can access data stored
in another database server.
What if one of the databases goes offline?
If only one slave database is available and it goes offline, read operations will be
directed to the master database temporarily. In case multiple slave databases
are available, read operations are redirected to other healthy slave databases.
If the master database goes offline, a slave database will be promoted to be the
new master.
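The routing and failover rules described above can be sketched as a small router. The server names are placeholders, and real promotion involves replication catch-up and coordination that this sketch ignores; it only shows where reads and writes go.

```python
class ReplicatedDatabase:
    """Master/slave routing with failover (a minimal sketch, names illustrative).

    Writes go to the master; reads are spread across slaves. If no slave
    remains, reads fall back to the master; if the master goes offline,
    a slave is promoted.
    """

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._next_read = 0

    def route_write(self):
        return self.master

    def route_read(self):
        if not self.slaves:               # no healthy slave: read from master
            return self.master
        server = self.slaves[self._next_read % len(self.slaves)]
        self._next_read += 1
        return server

    def master_offline(self):
        # Promote the first remaining slave to be the new master.
        self.master = self.slaves.pop(0)

db = ReplicatedDatabase("master", ["slave1", "slave2"])
print(db.route_write())  # master
print(db.route_read())   # slave1
```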
Cache
It is time to improve the load/response time. This can be done by adding a cache
layer.
💡
A cache is a temporary storage area that stores the result of expensive
responses or frequently accessed data in memory so that subsequent
requests are served more quickly.
The cache tier is a temporary data store layer, much faster than the database. The
benefits of having a separate cache tier include better system performance, reduced database workload, and the ability to scale the cache tier independently.
After receiving a request, a web server first checks if the cache has the available
response. If it has, it sends data back to the client. If not, it queries the database,
stores the response in cache, and sends it back to the client. This caching strategy is
called a read-through cache (other caching strategies are available).
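The check-cache-then-database flow above can be sketched in a few lines. The dicts stand in for a real cache server and database; key names are made up for illustration.

```python
# Sketch of the caching flow described above: check the cache first,
# fall back to the database on a miss, and populate the cache on the
# way back. The "database" here is just a dict standing in for storage.

database = {"user:1": {"name": "Alice"}}
cache = {}

def get(key):
    if key in cache:              # cache hit: serve from memory
        return cache[key]
    value = database.get(key)     # cache miss: query the database
    cache[key] = value            # store the response in the cache
    return value

print(get("user:1"))  # miss: read from the database, then cached
print(get("user:1"))  # hit: served from the cache
```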
When to use a Cache system?
Consider using cache when data is read frequently but modified infrequently.
Since cached data is stored in volatile memory, a cache server is not ideal for
persisting data.
For instance, if a cache server restarts, all the data in memory is lost. Thus,
important data should be saved in persistent data stores.
Expiration Policy in a Cache System
It is a good practice to implement an expiration policy.
Once cached data expires, it is removed from the cache. Without an expiration policy, cached data would be stored in memory permanently.
It is advisable not to make the expiration time too short, as this will cause the system to reload data from the database too frequently. At the same time, it is advisable not to make the expiration time too long, as the data can become stale.
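An expiration policy can be sketched as a cache that stamps each entry with an expiry time and treats expired entries as misses. The injectable clock is a testing convenience, not part of the pattern.

```python
import time

class TTLCache:
    """Cache with a per-entry expiration policy (a minimal sketch).

    Each entry stores its expiry timestamp; expired entries are treated
    as misses and removed, so stale data is not served forever.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock        # injectable clock makes expiry testable
        self.store = {}

    def set(self, key, value):
        self.store[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:   # expired: evict and report a miss
            del self.store[key]
            return None
        return value
```

A shorter TTL trades freshness for more database reloads; a longer one does the opposite, as noted above.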
Consistency in a Cache System
This involves keeping the data store and the cache in sync. Inconsistency can
happen because data-modifying operations on the data store and cache are not in a
single transaction.
Single Point of Failure?
A single cache server represents a potential single point of failure (if it fails, it can stop the entire system from working). As a result, multiple cache servers across different data centers are recommended to avoid a SPOF.
What happens when the Cache System is full?
Once the cache is full, any requests to add items to the cache might cause existing
items to be removed. This is called cache eviction.
Least-recently-used (LRU) is the most popular cache eviction policy. Other eviction
policies, such as the Least Frequently Used (LFU) or First in First Out (FIFO), can be
adopted to satisfy different use cases.
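LRU eviction can be sketched on top of Python's `OrderedDict`, which keeps insertion order and lets us move an entry to the end on every access; this is one common way to implement it, not the only one.

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used eviction sketch built on OrderedDict.

    When the cache is full, adding a new item evicts the item that was
    accessed longest ago.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
```

An LFU variant would track access counts instead of recency; FIFO would simply evict in insertion order.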
CDN - Content Delivery Network
💡
A CDN is a network of geographically dispersed servers used to
deliver static content. CDN servers cache static content like images,
videos, CSS, JavaScript files, etc.
Here is how a CDN works at a high level: when a user visits a website, a CDN server
closest to the user will deliver static content. Intuitively, the further users are from
CDN servers, the slower the website loads. For example, if CDN servers are in San
Francisco, users in Los Angeles will get content faster than users in Europe.
CDN workflow
1. User A tries to get image.png by using an image URL.
2. If the CDN server does not have image.png in the cache, the CDN server
requests the file from the origin, which can be a web server or online storage like
Amazon S3.
3. The origin returns image.png to the CDN server, including an optional HTTP header, Time-to-Live (TTL), which describes how long the image should be cached.
4. The CDN caches the image and returns it to User A. The image remains cached
in the CDN until the TTL expires.
5. User B sends a request to get the same image.
6. The image is returned from the cache as long as the TTL has not expired.
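The six-step workflow above can be sketched as an edge server that serves cached content while the TTL is valid and fetches from the origin otherwise. The origin contents, TTL value, and injectable clock are all illustrative.

```python
# Sketch of the CDN workflow: the edge server serves cached content while
# the TTL is valid and fetches from the origin otherwise.

ORIGIN = {"image.png": b"raw image bytes"}   # web server or S3-style storage

class CdnEdge:
    def __init__(self, clock):
        self.cache = {}        # path -> (content, expires_at)
        self.clock = clock     # injectable clock makes TTL expiry testable
        self.origin_hits = 0

    def get(self, path, ttl=3600):
        entry = self.cache.get(path)
        if entry and self.clock() < entry[1]:
            return entry[0]               # steps 5-6: served from the edge cache
        self.origin_hits += 1             # steps 2-3: fetch from the origin
        content = ORIGIN[path]
        self.cache[path] = (content, self.clock() + ttl)  # step 4: cache with TTL
        return content

now = [0.0]
edge = CdnEdge(clock=lambda: now[0])
edge.get("image.png")    # User A: fetched from the origin
edge.get("image.png")    # User B: served from the cache
print(edge.origin_hits)  # 1
```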
Considerations of using a CDN
Cost: CDNs are run by third-party providers, and you are charged for data
transfers in and out of the CDN.
Setting an appropriate cache expiry: For time-sensitive content, setting a cache
expiry time is important. The cache expiry time should neither be too long nor too
short. If it is too long, the content might no longer be fresh. If it is too short, it can
cause repeated reloading of content from origin servers to the CDN.
CDN fallback: You should consider how your website/application copes with CDN
failure. If there is a temporary CDN outage, clients should be able to detect the
problem and request resources from the origin.
Design with CDN & Cache included
Stateful vs Stateless Systems
It is time to consider scaling the web tier horizontally. For this, we need to move state (for instance, user session data) out of the web tier.
A good practice is to store session data in persistent storage such as a relational database or a NoSQL store. Each web server in the cluster can then access state data from the database. This is called a stateless web tier.
Stateful Architecture
A stateful server remembers client data (state) from one request to the next.
User A’s session data and profile image are stored in Server 1. To authenticate User
A, HTTP requests must be routed to Server 1. If a request is sent to other servers
like Server 2, authentication would fail because Server 2 does not contain User A’s
session data. Similarly, all HTTP requests from User B must be routed to Server 2;
all requests from User C must be sent to Server 3.
The issue is that every request from the same client must be routed to the same
server.
This adds overhead. Adding or removing servers is much more difficult with this approach, and it is also challenging to handle server failures.
Stateless Architecture
A stateless server keeps no state information.
In this stateless architecture, HTTP requests from users can be sent to any web server, which fetches state data from a shared data store. State data is stored in a
shared data store and kept out of web servers. A stateless system is simpler, more
robust, and scalable.
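The contrast with the stateful setup can be sketched as follows: because session data lives in a shared store (a dict here, standing in for the database or key-value store discussed above), any server can authenticate any user. Session IDs and user names are made up.

```python
# Stateless web tier sketch: no server keeps session data locally;
# every server reads it from a shared data store.

SHARED_SESSION_STORE = {"session-abc": {"user": "A", "authenticated": True}}

def handle_request(server_name, session_id):
    """Any server can authenticate any user because state is shared."""
    session = SHARED_SESSION_STORE.get(session_id)
    if session is None or not session.get("authenticated"):
        return f"{server_name}: 401 Unauthorized"
    return f"{server_name}: 200 OK for user {session['user']}"

# The same session works no matter which server the load balancer picks.
print(handle_request("server1", "session-abc"))
print(handle_request("server2", "session-abc"))
```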
Design with the horizontal scaling update (stateless architecture)
After the state data is moved out of the web servers, auto-scaling of the web tier is
easily achieved by adding or removing servers based on traffic load.
Data Centers
To improve availability and provide a better user experience across wider geographical areas, supporting multiple data centers is crucial.
Example with 2 Data Centers
In normal operation, users are geo-routed to the closest data center, with a split
traffic of x% in US-East and (100 – x)% in US-West. In the event of any significant
data center outage, we direct all traffic to a healthy data center.
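The geo-routing-with-failover idea can be sketched with a lookup table. In practice this is done by geoDNS; the locations and data-center names below are invented for illustration.

```python
# Sketch of two-data-center routing: requests are geo-routed to the
# closest healthy data center, and all traffic shifts to the surviving
# one during an outage. Names are illustrative.

HEALTHY = {"us-east", "us-west"}
CLOSEST = {"new-york": "us-east", "san-francisco": "us-west"}

def route(user_location):
    preferred = CLOSEST[user_location]
    if preferred in HEALTHY:
        return preferred
    # Outage: fail over to any remaining healthy data center.
    return next(iter(HEALTHY))

print(route("new-york"))      # us-east
HEALTHY.discard("us-east")    # simulate a significant outage in US-East
print(route("new-york"))      # us-west
```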
Message Queue
To further scale our system, we need to decouple different components of the system
so they can be scaled independently. A message queue is a key strategy employed by many real-world distributed systems to solve this problem.
💡
A message queue is a durable component, stored in memory, that
supports asynchronous communication. It serves as a buffer and
distributes asynchronous requests.
Message queue architecture
Input services, called producers/publishers, create messages, and publish them to a
message queue. Other services or servers, called consumers/subscribers, connect
to the queue, and perform actions defined by the messages.
With the message queue, the producer can post a message to the queue when the
consumer is unavailable to process it. The consumer can read messages from the
queue even when the producer is unavailable. The producer and the consumer
can be scaled independently.
Use case
Your application supports photo customization, including cropping, sharpening,
blurring, etc. Those customization tasks take time to complete.
Web servers publish photo processing jobs to the message queue.
Photo processing workers (consumers) pick up jobs from the message queue
and asynchronously perform photo customization tasks.
When the size of the queue becomes large, more workers are added to reduce
the processing time. However, if the queue is empty most of the time, the
number of workers can be reduced.
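The photo-processing use case can be sketched with Python's standard-library `queue` and `threading` modules standing in for a real message broker and worker fleet. The job names and the "processing" step are placeholders.

```python
import queue
import threading

# Producer/consumer sketch: web servers publish photo jobs to a queue;
# worker threads consume and process them asynchronously.

job_queue = queue.Queue()
results = []

def worker():
    while True:
        job = job_queue.get()
        if job is None:        # sentinel: shut the worker down
            break
        results.append(f"processed {job}")  # crop/sharpen/blur would go here
        job_queue.task_done()

threads = [threading.Thread(target=worker) for _ in range(2)]  # two consumers
for t in threads:
    t.start()

for photo in ["a.png", "b.png", "c.png"]:   # producer: publish the jobs
    job_queue.put(photo)
job_queue.join()                            # wait until every job is done

for _ in threads:                           # stop the workers
    job_queue.put(None)
for t in threads:
    t.join()

print(sorted(results))
```

Scaling workers up or down as the queue grows or drains is just adding or removing consumer threads (or, in production, worker processes or machines).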
Logging, metrics, automation
Logging: Monitoring error logs is important because it helps to identify errors and
problems in the system. You can monitor error logs at the per-server level or use tools to aggregate them into a centralized service for easy search and viewing.
Metrics: Collecting different types of metrics helps us gain business insights and understand the health status of the system. Some useful metrics include:
Host level metrics: CPU, Memory, disk I/O, etc.
Aggregated level metrics: for example, the performance of the entire database
tier, cache tier, etc.
Key business metrics: daily active users, retention, revenue, etc.
Automation: When a system gets big and complex, we need to build or leverage
automation tools to improve productivity. Continuous integration is a good practice, in
which each code check-in is verified through automation, allowing teams to detect
problems early. Besides, automating your build, test, and deploy process could improve developer productivity significantly.
Design with Message Queue, Logging, metrics & automation
Database Scaling
As the data grows every day, your database gets more overloaded. It is time to scale
the data tier.
Vertical vs Horizontal scaling
Vertical scaling, also known as scaling up, means scaling by adding more power
(CPU, RAM, DISK, etc.) to an existing machine. However, vertical scaling comes with
some serious drawbacks:
There are hardware limits. If you have a large user base, a single server is not
enough.
Greater risk of a single point of failure.
The overall cost of vertical scaling is high. Powerful servers are much more
expensive.
Horizontal scaling, also known as sharding, is the practice of adding more servers.
Sharding separates large databases into smaller, more easily managed parts called
shards. Each shard shares the same schema, though the actual data on each shard
is unique to the shard.
Sharding Use Case
User data is allocated to a database server based on user IDs. Anytime you access data, a hash function is used to find the corresponding shard. In our example, user_id % 4 is used as the hash function. If the result equals 0, shard 0 is used to store and fetch data. If the result equals 1, shard 1 is used. The same logic applies to other shards.
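The user_id % 4 sharding scheme can be sketched with dicts standing in for the four database servers; the user records are made up for illustration.

```python
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]  # stand-ins for 4 database servers

def shard_for(user_id):
    """user_id % 4 is the hash function that picks the shard."""
    return shards[user_id % NUM_SHARDS]

def save_user(user_id, data):
    shard_for(user_id)[user_id] = data

def load_user(user_id):
    return shard_for(user_id).get(user_id)

save_user(1, {"name": "Alice"})  # 1 % 4 == 1 -> stored on shard 1
save_user(4, {"name": "Bob"})    # 4 % 4 == 0 -> stored on shard 0
print(load_user(1))
```

Note that changing NUM_SHARDS would reroute most keys, which is exactly the resharding pain described below under shard exhaustion.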
The most important factor to consider when implementing a sharding strategy is the
choice of the sharding key. A sharding key (also known as a partition key) consists of one or more columns that determine how data is distributed. Here, user_id is the sharding key.
A sharding key allows you to retrieve and modify data efficiently by routing database
queries to the correct database.
When choosing a sharding key, one of the most important criteria is to choose a key that can evenly distribute data.
Problems with Sharding
Celebrity problem: Excessive access to a specific shard could cause server
overload.
Certain shards might experience shard exhaustion faster than others due to uneven
data distribution. When shard exhaustion happens, it requires updating the
sharding function and moving data around.
Design with Horizontal Database Scaling (Sharding)