Kubernetes HPA

Understanding Kubernetes HPA and Its Role in Cloud Cost Reduction


Kubernetes autoscaling is a fundamental feature that enables efficient resource utilization and cost reduction. The more precisely you configure the scaling mechanisms – HPA (Horizontal Pod Autoscaler), VPA (Vertical Pod Autoscaler), and Cluster Autoscaler – the less waste and cost your application incurs.

Kubernetes offers three distinct autoscaling mechanisms: HPA, VPA, and Cluster Autoscaler. Each contributes a unique aspect to your overall goal of autoscaling for cloud cost optimization. This article will delve into the concept of horizontal pod autoscaling. By adjusting the HPA settings on your cluster, you can control the number of pods that run based on various metrics, allowing you to scale up or down as per demand.

A Quick Recap of Kubernetes Autoscaling

Before we delve into HPA, let’s briefly discuss Kubernetes autoscaling mechanisms. Kubernetes supports three types of autoscaling: HPA, which adjusts the number of application replicas; VPA, which scales the resource requests and limits of a container; and Cluster Autoscaler, which modifies the number of nodes in a cluster. These autoscalers operate at either the pod or cluster level. While HPA and VPA adjust resources at the pod level, the Cluster Autoscaler scales the number of nodes in a cluster up or down.

At our organization, we’ve ensured our platform integrates seamlessly with HPA and provides a direct alternative to Cluster Autoscaler. For more advanced horizontal scaling and business metrics-centric scaling, consider using KEDA.

What is Kubernetes Horizontal Pod Autoscaler (HPA)?

HPA is a valuable tool for managing fluctuations in application usage. For instance, an e-commerce store might experience more traffic in the evening than at noon. When your application’s demands change, you can use HPA to automatically add or remove pods based on CPU utilization. HPA can also base its scaling decisions on custom or external metrics that you provide.

To begin, you specify the minimum and maximum number of replicas that should run at any given time using the minReplicas and maxReplicas fields. Once set, the HPA controller handles the task of checking metrics and making the necessary adjustments. By default, it checks metrics every 15 seconds.

The HPA controller monitors your deployment’s pods to determine whether the number of pod replicas needs to change. It calculates the mean of a per-pod metric value and determines whether adding or removing replicas would bring that value closer to the target.

For example, if your deployment has a target CPU utilization of 50%, and you currently have five pods running with a mean CPU utilization of 75%, the HPA controller will add three replicas to bring the pod average closer to the target of 50%.
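The arithmetic behind this example follows the replica-count formula from the Kubernetes documentation:

```
desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)
                = ceil(5 * 75 / 50)
                = ceil(7.5)
                = 8    # three more than the current five replicas
```

With eight replicas serving the same total load, per-pod utilization drops toward the 50% target.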


When to Use Kubernetes HPA?

HPA is particularly useful for scaling stateless applications, but it can also support scaling stateful sets. To achieve cost savings for workloads that experience regular changes in demand, use HPA in combination with cluster autoscaling. This combination will help you decrease the number of active nodes when the number of pods decreases.
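Pointing an HPA at a StatefulSet only requires changing the kind in scaleTargetRef. A minimal sketch, with hypothetical workload names and thresholds:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet          # HPA can target StatefulSets as well as Deployments
    name: web                  # hypothetical name
  minReplicas: 2
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```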

Limitations of Horizontal Pod Autoscaler

However, it’s important to note that HPA has some limitations. It may require you to architect your application with scale-out in mind so that workloads can be distributed across multiple servers. HPA might not always keep up with unexpected demand spikes, since new virtual machines can take a few minutes to start. If you fail to set CPU and memory requests and limits on your pods, they may be terminated frequently; if you set them too generously, you waste resources. And if the cluster is out of capacity, HPA cannot scale up until new nodes are added to the cluster. Cluster Autoscaler (CA) can automate this process.

Running Horizontal Pod Autoscaler

HPA is a Kubernetes control loop that monitors metrics such as the CPU usage of pod containers and automatically adjusts the number of replicas to maintain a target level of utilization. To do this, HPA requires a source of metrics. For example, when scaling based on CPU usage, it uses metrics-server. If you want to use custom or external metrics for HPA scaling, you need to deploy a service implementing the custom.metrics.k8s.io or external.metrics.k8s.io API; this provides an interface to a monitoring service or other metrics source.
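Such a service is registered with the API server through an APIService object. As a sketch of what that registration looks like for a custom metrics adapter (the service name and namespace below are assumptions; in practice the adapter’s installation manifests typically create this for you):

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  group: custom.metrics.k8s.io
  version: v1beta1
  service:
    name: prometheus-adapter   # hypothetical adapter service
    namespace: monitoring      # hypothetical namespace
  groupPriorityMinimum: 100
  versionPriority: 100
```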

Custom metrics include network traffic, memory, or any value that relates to the pod’s application. And if your workloads use the standard CPU metric, make sure to configure CPU resource requests for the containers in the pod spec – utilization is calculated as a percentage of the requested CPU.

Expert Tips for Running Kubernetes HPA

Here are some expert tips for running Kubernetes HPA:

  1. Install metrics-server: Kubernetes HPA needs access to per-pod resource metrics to make scaling decisions. These values are retrieved from the metrics.k8s.io API provided by metrics-server:

```shell
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```
  2. Configure resource requests for all pods: Another key source of information for HPA’s scaling decisions is the observed CPU utilization of pods. These values are a percentage of the resource requests of individual pods. If resource request values are missing for some containers, the calculations can become entirely inaccurate, leading to suboptimal operation and poor scaling decisions. That’s why it’s worth configuring resource request values for all containers of every pod that’s part of the Kubernetes controller scaled by the HPA.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
```
  3. Configure custom and external metrics: You can configure HPA to scale based on custom metrics – internal metrics that you collect from your application. HPA supports two types of custom metrics: pod metrics, which are averaged across all the pods in an application and support only the AverageValue target type; and object metrics, which describe any other object in the same namespace as your application and support the Value and AverageValue target types. Remember to use the correct target type when configuring pod and object metrics.
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```
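For comparison, a pod-type custom metric uses the Pods metric source and must target AverageValue. A hedged sketch, where the metric name and threshold are hypothetical and assume an adapter exposing that metric:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache-custom        # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: packets_per_second # hypothetical custom metric
      target:
        type: AverageValue       # Pods metrics support only AverageValue
        averageValue: "1k"
```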
  4. Verify that your HPA and VPA policies don’t clash: Vertical Pod Autoscaler automates the configuration of resource requests and limits, reducing overhead and achieving cost savings. Horizontal Pod Autoscaler, on the other hand, scales out rather than up or down. Double-check that the two policies don’t make conflicting decisions, and that your binning and packing density settings aren’t in conflict with each other when designing clusters for a business- or purpose-class tier of service.
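One common way to keep the two autoscalers from fighting over the same pods is to run VPA in recommendation-only mode while HPA handles the actual scaling. A sketch with hypothetical names:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: php-apache-vpa    # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  updatePolicy:
    updateMode: "Off"     # produce recommendations only; don't evict pods,
                          # so VPA can't fight the HPA over the same workload
```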
  5. Use instance weighting scores: Suppose one of your workloads ends up consuming more than it requested. Is this happening because the resources are truly needed, or did the workload consume them simply because they were available? Use instance weighting when choosing instance sizes and types for autoscaling. Instance weighting is especially useful when you adopt a diversified allocation strategy and use spot instances.

Monitoring and Control in Kubernetes

Increased scalability poses a challenge to cost monitoring and control in Kubernetes because autoscalers constantly adjust capacity. Cloud cost management tools like CAST AI provide cost monitoring you can use to get an hourly, daily, weekly, or monthly overview of your cloud costs. You can find more tools in our comparison of the best cloud cost management platforms.