Horizontal Pod Autoscaler (HPA) is a powerful feature in Kubernetes that automatically adjusts the number of pod replicas based on resource utilization. In this guide, we’ll walk through setting up and experimenting with HPA in a Kubernetes environment.
Before we begin, make sure you have the following set up on your computer:
- Docker
- Kind (Kubernetes in Docker)
If you haven’t installed these tools yet, don’t worry! Check out the links in the video description for my tutorials on installing them on Windows, Mac, and Ubuntu.
Understanding HPA
HPA is a control loop that periodically (every 15 seconds by default) compares observed pod resource usage against a target and adjusts the number of replicas to keep CPU and memory utilization at the desired level.
For example, if we set a target CPU utilization of 50%, HPA will increase or decrease the number of pods to keep the average CPU usage across all pods close to 50% of their requested CPU.
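Under the hood, the controller derives the desired replica count from the ratio of current to target utilization. A minimal sketch of the core formula (real HPA also applies a tolerance band and stabilization windows, which are omitted here):

```python
import math

def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float) -> int:
    """Simplified HPA scaling rule:
    desired = ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# 3 pods averaging 80% CPU against a 50% target -> scale up to 5
print(desired_replicas(3, 80, 50))
```

Note how the same rule also scales down: 5 pods averaging 25% against a 50% target yields `ceil(2.5) = 3` replicas.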
Hands-on Demo
Let’s go through the process of setting up and testing HPA in a Kubernetes cluster. We’ll use a Makefile to simplify our commands.
1. Create a Kubernetes Cluster
First, let’s create a Kind cluster:
make create-cluster
This command creates a cluster named “hpa” using a predefined configuration.
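The exact configuration the Makefile references isn't shown here, but a typical Kind config for this kind of experiment looks something like the following (the file name and node layout are assumptions):

```yaml
# kind-config.yaml (hypothetical example)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: hpa
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```

With a file like this, the Makefile target boils down to `kind create cluster --config kind-config.yaml`.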
2. Enable Metrics Server
HPA relies on the Metrics Server, which collects CPU and memory usage from each node's kubelet, so let's enable it:
make enable-metrics
Verify that the Metrics Server is running:
make check-metrics
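One Kind-specific gotcha: Kind nodes use self-signed kubelet certificates, so a stock Metrics Server install usually needs the `--kubelet-insecure-tls` flag to scrape them (the Makefile likely handles this already). If metrics don't show up, you can check and patch it by hand; these commands assume a running cluster:

```sh
# Check that the Metrics Server deployment is ready
kubectl -n kube-system get deployment metrics-server

# Allow it to talk to Kind's kubelets despite self-signed certs
kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'

# If this prints numbers instead of an error, metrics are flowing
kubectl top nodes
```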
3. Deploy a Resource-Intensive Application
Now, let’s deploy our application:
make create-deployment
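The application manifest isn't reproduced here, but one detail matters for HPA: percentage-based CPU and memory targets are computed against each container's resource *requests*, so the Deployment must declare them. A minimal sketch, where the names, image, and values are assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo          # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hpa-demo
  template:
    metadata:
      labels:
        app: hpa-demo
    spec:
      containers:
        - name: app
          image: registry.k8s.io/hpa-example   # hypothetical image
          resources:
            requests:
              cpu: 200m      # a 50% HPA target means 100m of actual usage
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
```

Without the `requests` block, a utilization-based HPA has nothing to compute percentages against and will report unknown targets.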
4. Create a Traffic Generator
We’ll need a way to generate load on our application:
make create-traffic
Add the wrk load-testing tool to our traffic generator:
make add-app-traffic
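wrk drives HTTP load with a configurable number of threads and connections. The Makefile wraps the invocation, but run from inside the traffic-generator pod it looks roughly like this (the service URL is an assumption, and the command needs the deployed app to be reachable):

```sh
# 2 threads, 100 open connections, sustained for 60 seconds
wrk -t2 -c100 -d60s http://hpa-demo.default.svc.cluster.local
```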
5. Apply HPA Configuration
Let’s apply our HPA configuration:
make apply-hpa
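The repository's HPA manifest isn't shown here, but an `autoscaling/v2` resource targeting average CPU and memory utilization looks like this (the names and replica bounds are assumptions):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo           # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
```

For a CPU-only target, the imperative equivalent is `kubectl autoscale deployment hpa-demo --cpu-percent=50 --min=1 --max=10`.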
6. Monitor Resources
To see how our cluster is performing, we can monitor node and pod resources:
make show-node-resource
make show-pod-resource
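Behind these targets are plain `kubectl top` calls against the Metrics Server; they require the cluster from the earlier steps to be up:

```sh
kubectl top nodes                 # per-node CPU and memory
kubectl top pods --containers     # per-container breakdown
watch -n 5 kubectl top pods       # refresh every 5 seconds
```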
7. Generate Load
Now, let’s generate some load on our application:
For CPU-intensive load:
make start-app-traffic-cpu
For memory-intensive load:
make start-app-traffic-memory
8. Observe HPA in Action
While the load is being generated, observe how HPA adjusts the number of pods:
make show-hpa
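This target wraps `kubectl get hpa`; adding `--watch` streams updates as the controller reacts to the load (the resource name is an assumption):

```sh
kubectl get hpa hpa-demo --watch
```

The TARGETS column shows current versus target utilization (e.g. `72%/50%`), and REPLICAS shows the controller's latest scaling decision.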
You should see the number of pods increase as the load ramps up and decrease once it subsides. Scale-down is deliberately slower: by default, HPA waits through a five-minute stabilization window before removing pods, which keeps the replica count from flapping.
Cleanup
Once you’re done experimenting, clean up your cluster:
make delete-all
Conclusion
In this hands-on guide, we’ve explored how to set up and use Horizontal Pod Autoscaler in Kubernetes. HPA is a powerful tool for maintaining application performance and efficiency by automatically scaling your workloads based on resource utilization.
However, it’s important to note that HPA is just one of several scaling options available in Kubernetes and cloud environments:
- Vertical Pod Autoscaler (VPA): Unlike HPA, which scales the number of pods, VPA adjusts the CPU and memory resources of existing pods. This can be useful when you want to optimize resource allocation without increasing the pod count.
- Cluster Autoscaler: This solution scales the number of nodes in your cluster. It’s particularly useful in cloud environments where you can dynamically add or remove virtual machines.
- Custom Metrics Autoscaling: Kubernetes allows you to scale based on custom metrics, not just CPU and memory. This can be valuable for applications with specific performance indicators.
- Cloud-specific Solutions: Many cloud providers offer their own autoscaling solutions that integrate well with Kubernetes. For example:
- AWS: Elastic Kubernetes Service (EKS) with Cluster Autoscaler
- Google Cloud: GKE Autopilot
- Azure: Azure Kubernetes Service (AKS) with Virtual Machine Scale Sets
You can find more Kubernetes tutorials and tips on my blog at https://thiagodsantos.com/blog/.