Best Practices for running optimized Kubernetes applications on GKE

The Forrester Wave™: Public Cloud Container Platforms, Q1 2022:
"Google Cloud is the best fit for firms that want extensive cutting-edge cloud-native capabilities for distributed workloads spanning public cloud, private cloud, and multicloud environments." [report and blog post]
01 Understanding GKE options
We support the 4 scalability dimensions

Workload:
● Horizontal Pod Autoscaler (HPA): more Pods
● Vertical Pod Autoscaler (VPA): different Pod size

Infrastructure:
● Cluster Autoscaler (CA): more nodes
● Node Auto-Provisioning (NAP): different node size
In GKE Standard, tuning these dimensions is where you can save money. In GKE Autopilot, the infrastructure dimensions are a Google responsibility.
HPA - Horizontal Pod Autoscaler
● Target Utilization:
  ○ apiVersion: autoscaling/v1: only CPU utilization
  ○ apiVersion: autoscaling/v2beta2 (recommended): multiple metrics, even custom ones (e.g. requests per second)
● Indicated for stateless workloads that need to handle growth or spikes in demand
  ○ E.g. 30% growth in traffic in a couple of minutes
HPA - Buffer size
● Small buffer:
  ○ Prevents early scale-ups
  ○ But can overload your application during spikes
● Big buffer:
  ○ Causes resource waste, increasing the cost of your bill
● The buffer needs to be big enough to handle requests for two or three minutes during a spike.
For more information, see Configuring a Horizontal Pod Autoscaler.
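A minimal sketch of an HPA that follows the guidance above: the 70% CPU target leaves roughly a 30% buffer for spikes. The name and Deployment are illustrative assumptions:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend              # hypothetical Deployment
  minReplicas: 3
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # 70% target keeps ~30% headroom for spikes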
PUBLIC PREVIEW
HPA - Load balancer-based autoscaling
As of GKE 1.22, Gateways are "capacity-aware":
● They monitor the incoming traffic load relative to the Service's capacity and can autoscale Pods or spill traffic over to other clusters when load exceeds capacity.
[Diagram: autoscaler behavior with Pods serving 10 rps each against a 70% utilization target]
Best practice: always configure a secondary target, like CPU.
See Gateway traffic management and Autoscaling based on load balancer traffic.
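A sketch of such a configuration, with CPU as the secondary target. The object metric name below is an assumption to verify against the linked docs, and all resource names are illustrative:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: store-hpa               # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: store                 # hypothetical Deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object                # traffic relative to the Service's declared capacity
    object:
      describedObject:
        apiVersion: v1
        kind: Service
        name: store
      metric:
        name: "autoscaling.googleapis.com|gclb-capacity-utilization"  # assumed metric name; verify in the docs
      target:
        type: AverageValue
        averageValue: "70"      # ~70% of per-Pod capacity; verify units in the docs
  - type: Resource              # secondary target, as recommended above
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60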
VPA - Vertical Pod Autoscaler
● Indicated for stateless workloads not handled by HPA, where traffic can be evenly distributed between the replicas
● Initial/Auto mode is not indicated for StatefulSets, nor if you need to handle sudden growth in traffic
  ○ Use HPA instead.
● Modes:
  ○ Off: recommendation only
  ○ Initial: apply at Pod creation, do not restart
  ○ Auto: apply and restart
● Be careful when enabling Auto mode:
  ○ The application will restart; can it tolerate that?
    ■ Usually stateful applications can not
  ○ Help VPA out by setting
    ■ Pod disruption budgets
    ■ min & max container sizes
● Deeply integrated with Cluster Autoscaler
For more information, see Configuring Vertical Pod Autoscaling.
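A minimal sketch of a VPA in recommendation mode with min and max container sizes set, per the guidance above (names and bounds are illustrative):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: backend-vpa             # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend               # hypothetical Deployment
  updatePolicy:
    updateMode: "Off"           # recommendations only; "Initial" applies at creation, "Auto" restarts Pods
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi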
Mixing HPA and VPA
Never mix HPA and VPA on either CPU or memory.
Two options are safe to implement:
● HPA using custom metrics
● VPA in recommendation mode
In either case, make sure your application is always running over the HPA min-replicas.
CA - Cluster Autoscaler
Scaling up:
● Indicated for: whenever you are using either HPA or VPA
● Optimized for the cost of infrastructure, based on scheduling simulations and declared Pod requests
Scaling down:
● Covered by the autoscaling profiles on the next slides
CA Profiles - Different scheduling
● Balanced profile - TL;DR: prefer less-used nodes, to keep all nodes, well... balanced.
● Optimize-utilization profile - TL;DR: prefer the most-used nodes, to leave nodes with fewer requested resources that can be scaled down quickly.
[Diagram: a pending application Pod being scheduled onto a different node under each profile]
For more information, see Autoscaling profiles.
Disclaimer: Google may fine-tune profiles to improve reliability or the desired profile behavior.
CA Profiles - Different scale downs
● Balanced profile - TL;DR: maximize availability.
  ○ Runs every: 10m
  ○ Scales down: nodes < 50% resource requests
  ○ Each run deletes empty nodes and evicts Pods from nodes flagged for scale down: 20m to delete 2 nodes.
● Optimize-utilization profile - TL;DR: remove more nodes, faster.
  ○ Runs every: 1m
  ○ Scales down: nodes < 85% resource requests
  ○ Each run deletes empty nodes and evicts Pods from nodes flagged for scale down: 3m to delete 3 nodes.
For more information, see Autoscaling profiles.
Disclaimer: Google may fine-tune profiles to improve reliability or the desired profile behavior.
CA Profiles and Location Policy
● Use the optimize-utilization profile for a more aggressive scale down.
● Use location-policy=ANY (1.24+) to reduce preemption and speed up autoscaling in edge cases:
  ○ Default for Spot node pools.
  ○ CA queries the Recommend Locations API for the optimal VM locations.
    ■ It prioritizes the user's reservations and picks the zones with the most unused capacity, reducing preemption considerably.
    ■ Coming soon: an algorithm that uses signals to minimize the risk of preemption.
  ○ Prefer setting --total-max-nodes when using this location policy.
For more information, see Autoscaling a cluster and Learn how to analyse Cluster Autoscaler events in the logs.
Bin Packing
● Make sure your workload fits well inside the machine size (CPU, memory, ports).
● You can create multiple node pools and use a nodeSelector based on t-shirt sizes to select which node your Pod must run on (see the sketch below).
● Another, simpler option is to configure NAP.
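A sketch of the t-shirt-size approach, assuming node pools labeled with a hypothetical pool-size label:

apiVersion: v1
kind: Pod
metadata:
  name: medium-workload         # hypothetical name
spec:
  nodeSelector:
    pool-size: medium           # hypothetical label applied to the "medium" node pool
  containers:
  - name: app
    image: nginx
    resources:
      requests:                 # sized to pack cleanly into the pool's machine type
        cpu: "1"
        memory: 2Gi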
What happens when you scale up (workload only)
●
Download Container Image = Mounting Volume → Fetching Image
●
App Startup = Init Containers Startup → Regular Containers Startup → Health Checks pass
What happens when you scale up (infra and workload)
GKE image streaming
For fast image downloading during autoscaling.
Regular image pulling:
1. Container image is downloaded to the VM
2. Application starts
GKE image streaming:
1. Application starts; files are streamed from a remote filesystem on demand
GKE image streaming runs a mysql container (~140MB image) in 3s (vs 16s with the regular image pulling process).
For more information read GKE image streaming for fast application startup and autoscaling.
HPA - Scheduled & Predictive Autoscaling EAP
GKE uses machine learning to predict which large traffic spikes are likely to occur and preemptively scales your workloads to handle the anticipated traffic.
Ask your account team for more information.
HPA - Scheduled & Predictive Autoscaling EAP
Use when:
● Strong spikes and/or deep valleys
● Predictable traffic ebb and flow:
  ○ Scale down to a minimum during off-peak hours
  ○ Increase capacity ahead of known events
  ○ Warm up staging clusters for load tests
Don't use when:
● Current spikes can be handled by HPA and/or pause Pods without big overprovisioning
● You purchased CUDs for your entire fleet
[Diagram: CPU utilization vs. CUD commitment under each strategy]
● HPA + CA (possibly with pause Pods): keep your cluster warm
● Consider the scheduled autoscaler:
  ○ Combined with pause Pods if unsure about which workload will scale up
  ○ For batch workloads
For more information, see Reducing costs by scaling down GKE clusters during off-peak hours.
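The pause-Pod pattern mentioned above uses low-priority placeholder capacity: when real workloads need room, the pause Pods are preempted and go Pending, which triggers CA to add nodes ahead of demand. A minimal sketch, with illustrative names and sizes:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning        # hypothetical name
value: -10                      # lower than any real workload, so these Pods are evicted first
globalDefault: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pause-pods              # hypothetical name
spec:
  replicas: 2                   # size of the warm buffer
  selector:
    matchLabels:
      app: pause-pods
  template:
    metadata:
      labels:
        app: pause-pods
    spec:
      priorityClassName: overprovisioning
      containers:
      - name: pause
        image: registry.k8s.io/pause
        resources:
          requests:             # reserve roughly one real Pod's worth of capacity each
            cpu: "1"
            memory: 1Gi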
Review your logging and monitoring strategies
● Cloud Logging and Cloud Monitoring are pay-as-you-go systems.
● The more you log and the longer you keep the logs, the more you pay.
  ○ Exclude logs to avoid ingestion costs.
    ■ However, the best place to do this filtering is in the application itself.
    ■ Note: if you create new buckets and routers, exclude the routed logs from _Default, otherwise you will be charged twice.
  ○ Monitor log bytes per application.
  ○ Create an alerting system for logs.
See Cost optimization for Cloud Logging, Cloud Monitoring, and Application Performance Management.
Review inter-region egress traffic in regional and multi-zonal clusters
● Both regional and multi-zonal clusters need to pay for egress traffic between the zones.
● Suggestions:
  ○ Deploy development clusters in a single zone.
  ○ Enable GKE usage metering and its network egress agent (not enabled by default).
  ○ Consider inter-pod affinity/anti-affinity to co-locate dependent Pods from different services on the same nodes or in the same availability zone. Note this can compromise HA.
  ○ In GKE 1.24 you will be able to favor traffic to local zones using TopologyAwareHints (see the sketch after this list). However, you may face some issues if the traffic is all concentrated in one zone.
    ■ For example: HPA may not realize it needs to create new Pods because only the Pods in one zone (33%) are running out of CPU, while all the others (66%) are running idle.
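A sketch of enabling TopologyAwareHints on a Service via the upstream Kubernetes annotation (the Service name and ports are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: backend                 # hypothetical Service
  annotations:
    service.kubernetes.io/topology-aware-hints: "auto"   # prefer same-zone endpoints where safe
spec:
  selector:
    app: backend
  ports:
  - port: 80
    targetPort: 8080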
Batch workloads
● Usually concerned with the work being done eventually, and commonly tolerant to some latency at job startup time.
● Separate batch workloads into different node pools. Reasoning:
  ○ Cluster Autoscaler can delete empty nodes faster when it doesn't need to restart Pods.
  ○ Consider configuring CA's optimize-utilization profile to speed up scale-down activities even more.
  ○ Some Pods cannot be restarted, so they permanently block the scale-down of their nodes.
● Use Node auto-provisioning to automatically create dedicated node pools for jobs (see the sketch below).
● Make sure your long-running jobs save checkpoints, since they can be evicted during CA downscaling.
See Optimizing resource usage in a multi-tenant GKE cluster using node auto-provisioning tutorial.
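A sketch of a Job aimed at a dedicated batch pool: it requests a label and tolerates a taint that no existing pool carries, which lets NAP create a matching, tainted node pool (key names are illustrative):

apiVersion: batch/v1
kind: Job
metadata:
  name: report-job              # hypothetical name
spec:
  template:
    spec:
      nodeSelector:
        dedicated: batch        # hypothetical label; NAP can create a matching node pool
      tolerations:
      - key: dedicated
        operator: Equal
        value: batch
        effect: NoSchedule      # keeps serving workloads off the batch nodes
      containers:
      - name: worker
        image: busybox
        command: ["sh", "-c", "echo processing && sleep 60"]   # placeholder workload
        resources:
          requests:
            cpu: "2"
            memory: 4Gi
      restartPolicy: Never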
Serving workloads
● Must respond as quickly as possible to bursts or spikes.
● Problems in handling such spikes are commonly related to one or more of the following reasons:
  ○ Applications not ready to run on top of Kubernetes. For example, apps with big image sizes, slow startup times, non-optimal Kubernetes configurations, etc.
  ○ Applications depending on infrastructure that takes some time to be provisioned. For example, GPUs.
  ○ Autoscalers and overprovisioning not being appropriately set.
● In an attempt to "fix" the instability caused by these short-lived traffic peaks, companies tend to overprovision their clusters. Although this may work in some cases, it is not a cost-effective solution.
02 Reliable Kubernetes applications
Understanding your application capacity
● Per-Pod capacity:
  ○ Isolate a single application Pod replica with autoscaling off, then execute tests simulating a real usage load. Can your app support this?
● Per-app capacity:
  ○ Configure your Cluster Autoscaler, container replicas, resource requests and limits, and either Horizontal Pod Autoscaler or Vertical Pod Autoscaler (or use VPA in recommendation mode).
  ○ Stress your application harder to simulate sudden bursts or spikes.
● To eliminate latency concerns, these tests must run from the same region/zone the application runs in on Google Cloud.
See Distributed load testing using Google Kubernetes Engine tutorial.
Make sure your application can grow both vertically and horizontally
● Make sure your application can handle spikes by adding more CPU and memory, or by increasing the number of Pod replicas.
● This gives you the flexibility to experiment with what fits your application better.
● Unfortunately, some applications are single-threaded or architecturally limited to a fixed number of workers/sub-processes.
Understanding QoS

Best Effort (no requests or limits) - the Pod can be killed and marked as Failed at any time:

apiVersion: v1
...
spec:
  containers:
  - name: qos-demo-3-ctr
    image: nginx

Burstable (requests < limits) - the Pod can be killed and marked as Failed when the memory used is bigger than requested:

apiVersion: v1
...
spec:
  containers:
  - name: qos-demo-3-ctr
    image: nginx
    resources:
      requests:
        cpu: 500m
        memory: 100Mi
      limits:
        cpu: 1
        memory: 200Mi

Guaranteed (requests == limits) - the Pod is guaranteed to run:

apiVersion: v1
...
spec:
  containers:
  - name: qos-demo-3-ctr
    image: nginx
    resources:
      requests:
        cpu: 3
        memory: 200Mi
      limits:
        cpu: 3
        memory: 200Mi
Set the appropriate requests and limits
● Always set your container resource requests.
  ○ If you only set limits, requests equal to the limits are assumed (Guaranteed QoS).
● For Burstable workloads, set:
  ○ Memory: the same amount for both requests and limits
  ○ CPU: a bigger or unbounded CPU limit
● Use GKE Cost Insights and Recommendations:
  ○ To help you understand the recommended values for your production workloads
  ○ Discussed in the next section
Make your container image as lean as possible
Have the smallest image possible:
● Every time Cluster Autoscaler provisions a new node, the node needs to download the images. The smaller the image, the faster the node can download it.
● Examples:
  ○ Go: you can get images of ~10MB (distroless!)
  ○ Java: consider jib (for lean images); GraalVM (for low-latency native binaries); Quarkus, Micronaut, and Spring Boot (for a Kubernetes-native Java stack)
Start the application as fast as possible:
● Some apps can take minutes to start due to class loading, caching, etc.
● Whenever a Pod requires a long startup, requests may fail while your application is booting up.
For more information about how to build containers, see Best practices for building containers.
Add Pod Disruption Budget (PDB) to your application
● PDBs are essential for autoscaling activities, mainly during the Cluster Autoscaler compacting phase.
For more information, see Specifying a Disruption Budget for your Application.
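A minimal sketch of a PDB that keeps at least two replicas up during voluntary disruptions such as CA node drains (names are illustrative):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb            # hypothetical name
spec:
  minAvailable: 2               # never voluntarily evict below two ready replicas
  selector:
    matchLabels:
      app: frontend             # hypothetical label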
Set meaningful Kubernetes health checks
● Readiness probes: to know whether your Pods must be added to or removed from load balancers
● Liveness probes: to know when your Pods need to be restarted
● Do this to ensure the correct lifecycle of your application:
  ○ Define at least the readiness probe for all your containers.
  ○ If a cache is to be loaded at startup time, the readiness probe must report ready only after the cache is fully loaded.
  ○ If the app can start serving right away, use a simple probe. For example, an HTTP endpoint returning a 200 status code.
  ○ If a more advanced probe is needed (e.g. checking whether the connection pool has available resources), make sure your error rate doesn't increase compared with a simpler implementation.
  ○ Never make any probe logic access other services. It can compromise the lifecycle of your Pod if those services do not respond in a timely manner.
For more information, see Configure Liveness, Readiness and Startup Probes.
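A sketch of a container with both probes, assuming the app exposes hypothetical /ready and /healthz HTTP endpoints:

apiVersion: v1
kind: Pod
metadata:
  name: probed-app              # hypothetical name
spec:
  containers:
  - name: app
    image: my-app:1.0           # hypothetical image
    ports:
    - containerPort: 8080
    readinessProbe:             # gates traffic from Services and load balancers
      httpGet:
        path: /ready            # hypothetical endpoint; report ready only after caches are loaded
        port: 8080
      periodSeconds: 5
    livenessProbe:              # restarts the container when it hangs
      httpGet:
        path: /healthz          # hypothetical endpoint; keep the logic local, never call other services
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10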
Make sure your apps are shutting down correctly
Consider the async nature of how Kubernetes updates endpoints and load balancers:
● Don't stop accepting new requests right after SIGTERM.
● If your application doesn't follow the above practice, use the preStop hook.
● Handle SIGTERM for clean-ups only, e.g. in-memory state which needs to be persisted.
● Configure terminationGracePeriodSeconds to fit your application's needs:
  ○ High values may increase the time for node upgrades, rollouts, etc.
  ○ Low values may not give Kubernetes enough time to finish the Pod termination process.
  ○ Either way, keep your application's termination period smaller than 10 minutes, because Cluster Autoscaler honors it for 10 minutes only.
● If your application uses container-native load balancing, start failing your readiness probe as soon as you receive a SIGTERM.
For more information, see Kubernetes best practices: terminating with grace.
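A sketch that combines a preStop delay (for apps that cannot keep serving after SIGTERM on their own) with a grace period sized to cover it; the values are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: graceful-app            # hypothetical name
spec:
  terminationGracePeriodSeconds: 60   # must cover the preStop sleep plus the app's own shutdown
  containers:
  - name: app
    image: my-app:1.0           # hypothetical image
    lifecycle:
      preStop:
        exec:
          # keep serving while endpoints and load balancers catch up with the Pod deletion
          command: ["sh", "-c", "sleep 15"]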
03 Monitoring and enforcing
https://www.youtube.com/watch?v=sYdCqxM7OFM
Workload rightsizing recommendations
Features:
● VPA recommendations in the GKE UI
● VPA metrics in Cloud Monitoring
Coming soon:
● VPA recommendations in the Recommendation Hub
For more details, see GKE workload rightsizing — from recommendations to action
Workload rightsizing recommendations at scale
VPA recommendation intelligence:
● Allows you to get workload recommendations at the org, division, or team level
● Allows you to figure out:
  a. Whether it is worth investing in workload rightsizing
  b. What the size of the effort is
  c. Where you should focus initially
  d. How many workloads are over-provisioned and how many are under-provisioned
  e. How many workloads are at reliability or performance risk
  f. Whether you are getting better over time at workload rightsizing
For more details, see Right-sizing your GKE workloads at scale using GKE built-in Vertical Pod Autoscaler recommendations
Use Kubernetes Resource Quotas
● Multitenant clusters: different teams are responsible for workloads in different namespaces.
● Kubernetes Resource Quotas:
  ○ Manage the amount of resources used by objects in a namespace
  ○ You can set quotas in terms of:
    ■ compute (CPU and memory) and storage resources
    ■ object counts
  ○ They let you ensure that no tenant uses more than its assigned share of cluster resources
For more information, see Configure Memory and CPU Quotas for a Namespace.
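A minimal sketch of a per-namespace quota covering compute resources and an object count (namespace and values are illustrative):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota            # hypothetical name
  namespace: team-a             # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "20"          # total CPU the namespace may request
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "50"                  # object-count quota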
05 What's next?

Kubernetes: General
General Best Practices
● Write configuration in YAML; CI/CD your YAMLs
● When defining configurations, specify the latest stable API version
● Avoid direct command-line interaction with nodes in the cluster
● Administrator-only SSH access to Kubernetes nodes; devs may only use "kubectl exec"
● In Deployments, use the "record" option for easier rollbacks
Don't use naked Pods
Pods not bound to a ReplicaSet or Deployment will not be rescheduled in the event of a node failure.
ReplicaSets are a simple control loop that ensures N copies of a Pod:
● Grouped by a selector
● Too few? Start some
● Too many? Kill some
Example ReplicaSet:
  Name = "my-rc"
  Selector = {"App": "MyApp"}
  Template = { ... }
  Replicas = 4
[Diagram: the control loop asks the API server "How many?", sees 3 instead of 4, and starts 1 more]
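A sketch of the preferred pattern: wrap the Pod template in a Deployment so failed Pods are rescheduled (names are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                  # hypothetical name
spec:
  replicas: 4                   # the control loop keeps four copies running
  selector:
    matchLabels:
      App: MyApp
  template:
    metadata:
      labels:
        App: MyApp
    spec:
      containers:
      - name: app
        image: my-app:1.0       # hypothetical image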
Use labels
Define and use labels that identify semantic attributes of your application or Deployment, such as:
● App: MyApp
● Phase: prod
You can use these labels to select the appropriate Deployments for other resources such as ReplicaSets and Services.
[Diagram: Pods labeled App: MyApp with Phase: prod or test and Role: FE or BE]
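A sketch of a Service that uses those labels to select only the prod front-end Pods:

apiVersion: v1
kind: Service
metadata:
  name: myapp-fe                # hypothetical name
spec:
  selector:                     # matches the labels from the diagram above
    App: MyApp
    Phase: prod
    Role: FE
  ports:
  - port: 80
    targetPort: 8080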
Kubernetes: Container Images
Base Image
It is highly recommended to use a managed image as a base image. Google provides managed base images, which are base container images automatically patched by Google for security vulnerabilities, using the most recent patches available from the project upstream.
Make the image light
When building containers, it is recommended to install only the required utilities and keep the container size small. Unnecessary utilities make container images bigger and increase the chances of hosting security holes. In addition, big container images take longer in build and deployment cycles. For example, the Node.js 6 image from Google Cloud Container Registry takes around 670MB, while the Ubuntu 16 base image takes only 140MB.
Change the tag on every new container build
Every container image build needs a new tag name. Kubernetes caches container images based on the tag name: if you build a new container with the same image tag, the new image will not be pulled; the cached image will be used instead. So every container build needs a new tag name.
Don't use :latest or no tag
This one is pretty obvious, and most people are doing it already. If you don't add a tag to your container, it will always try to pull the latest one from the repository, and that may or may not contain the changes you think it has.
Kubernetes: Security
Avoid privileged containers
And grant Linux capabilities only if strictly necessary
Avoid running as root
Containers share kernel resources!
Run as non-root in the Dockerfile

Bad:

FROM node:carbon
EXPOSE 8080
WORKDIR /home/appuser
COPY server.js .
CMD node server.js > /home/appuser/log.out

Good:

FROM node:carbon
EXPOSE 8080
RUN groupadd -r -g 2001 appuser && useradd -r -u 1001 -g appuser appuser
RUN mkdir /home/appuser && chown appuser /home/appuser
USER appuser
WORKDIR /home/appuser
COPY --chown=appuser:appuser server.js .
CMD node server.js > /home/appuser/log.out
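The same intent can also be enforced at the Pod level with a securityContext; a sketch:

apiVersion: v1
kind: Pod
metadata:
  name: nonroot-app             # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true          # the kubelet refuses to start containers running as root
    runAsUser: 1001             # matches the appuser UID from the Dockerfile above
  containers:
  - name: app
    image: node:carbon
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]           # grant capabilities back only if strictly necessary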
GKE: General
General Best Practices
● Use kubectl or the Google Cloud Platform console as the primary method of Kubernetes Engine interaction.
● Use Container-Optimized OS for a streamlined platform that minimizes maintenance and administration.
Use Regional Clusters for production
GKE provides two options for master node management:
● Zonal clusters: have one master node in one zone, hence they cannot support zone-level high availability.
● Regional clusters: enable zero-downtime upgrades and 99.95% uptime by deploying multiple masters across a region.
For production, it is recommended to use regional clusters.
[Diagram: a regional control plane with nodes in Zone A, Zone B, and Zone C]
Enable Node Auto-Repair
Keep nodes in your K8s cluster in a healthy, running state.
Nodes that fail health checks are automatically scheduled for repair, including:
● Unreachable nodes
● Nodes reporting a NotReady status for a prolonged period
● Nodes whose boot disks are out of space for a prolonged period
Unhealthy nodes have their Pods drained, then the node is recreated.
GKE
Upgrades
Enable Node Auto Upgrades
Automatically keep nodes in your K8s cluster or node pool up to date with the latest stable version of Kubernetes.
Get the latest Kubernetes features; apply security updates automatically.
Control when upgrades occur with maintenance windows.
Enable Release Channels for Upgrades
Release channels allow you to control your cluster and node pool version based on a version's stability rather than
managing the version directly.
When you enroll a cluster in a release channel, that cluster is upgraded automatically when a new version is available in
that channel.
Node upgrades
● Upgrades the node OS, Kubernetes, and the container runtime
  ○ Nodes are cordoned, drained, and replaced one at a time
  ○ Respects Pod disruption budgets (minAvailable and maxUnavailable) to avoid service downtime
  ○ Supports rollbacks
● Trigger cluster upgrades manually (per node pool), or use the node auto-upgrade functionality
  ○ Use blue-green upgrades if you have highly available production workloads that you need to be able to roll back quickly in case the workload does not tolerate the upgrade, and a temporary cost increase is acceptable
  ○ Enable maintenance windows and maintenance exclusions, which provide control over when cluster maintenance such as auto-upgrades can and cannot occur on your Google Kubernetes Engine (GKE) clusters
GKE: Security
GKE: Minimal OS
Container-Optimized OS (COS), based on Chromium OS and maintained by Google.
● Built from source: since COS is based on Chromium OS, Google maintains all components and is able to rebuild from source if a new vulnerability is discovered and needs to be patched.
● Smaller attack surface: Container-Optimized OS is purpose-built to run containers and has a smaller footprint, reducing your instance's potential attack surface.
● Locked down by default: the firewall restricts all TCP/UDP except SSH on port 22, and kernel modules are prevented. The root filesystem is mounted read-only.
● Automatic updates: COS instances automatically download weekly updates in the background; only a reboot is necessary to use the latest updates. Google provides patches and maintenance.
Use private clusters
The cluster does not have a public IP; it can be accessed from the private network only.
A private cluster can use an HTTP load balancer or a network load balancer to accept incoming traffic, even though the cluster nodes do not have public IP addresses.
A private cluster can also use an internal load balancer to accept traffic from within your VPC network.
[Diagram: private cluster nodes on RFC 1918 addresses only (10.0.0.0, 172.16.0.0, 192.168.0.0), reaching the WWW through Cloud NAT and Google services through Private Google Access; kubectl clients connect from on-premises]
Security on GKE: IAM and RBAC
GKE: Using IAM and RBAC
Use IAM at the project level. Set roles for:
● Cluster Admin: manage clusters
● Container Developer: API access
Use RBAC at the cluster and namespace level. Set permissions on individual clusters and namespaces (see the sketch below).
Use Groups: only ever add groups; groups get one or more roles.
[Diagram: scope hierarchy from GCP Project to Cluster, Node, Namespace, Pod, and Containers]
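A sketch of the RBAC half: a namespace-scoped Role bound to a group, per the "only ever add groups" guidance (names are illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-developer           # hypothetical name
  namespace: app-a              # hypothetical namespace
rules:
- apiGroups: ["", "apps"]       # core API group and apps
  resources: ["pods", "services", "deployments"]
  verbs: ["get", "list", "watch", "create", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-developer-binding
  namespace: app-a
subjects:
- kind: Group
  name: developers@example.com  # hypothetical Google group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: app-developer
  apiGroup: rbac.authorization.k8s.io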
IAM: Example of managing access with roles
Applying the principle of least privilege for developers across environments:
● Project DEV, cluster "Dev Playground" (namespaces dev-alice-app-a, dev-mary-app-a): developers=container.developer
● Project PROD, clusters "Prod A" and "Prod B" (namespaces app-a, app-b): developers=container.viewer
Use Least Privilege Service Accounts
Security on GKE: Encryption, Keys and Secrets
Kubernetes Secrets
[Diagram: options for protecting secrets, from a plain Kubernetes Secret, to a Secret with EncryptionConfig, to a Secret with EncryptionConfig backed by a KMS provider. Cloud KMS can act as the KMS provider managing the KEK, with Cloud IAM controlling access; alternatively, applications can authenticate to Vault directly, with Vault acting as the secrets engine for keys, encryption, and service account management]
Encryption and keys
● All persistent disks are encrypted by default!
● GKE Customer-Managed Encryption Keys (CMEK)
  ○ Encrypt volumes with keys managed by Cloud KMS
● Plugin for Cloud KMS (open-sourced for DIY:ers)
  ○ Also known as "Application-layer Secrets Encryption"
  ○ DEKs on the master(s) encrypt Kubernetes Secrets
  ○ KEK in Cloud KMS, potentially backed by an HSM
[Diagram: master and nodes with persistent disks, keyed via Cloud KMS]
Keeping up with the news
● Google Cloud blog, filtered on container-related news: https://cloud.google.com/blog/products/containers-kubernetes
● Exclusive roadmap for customers and partners: https://www.cloudconnectcommunity.com/
● The "Google Cloud Platform Podcast": https://www.gcppodcast.com/
● The "Kubernetes Podcast from Google": https://kubernetespodcast.com/
Thank you!