How Cluster Autoscaler Works

Cluster Autoscaler (CA) watches for Pending pods and simulates whether adding nodes from any of the configured Auto Scaling Groups (ASGs) would allow them to schedule. If so, it raises that ASG's desired capacity accordingly. It also periodically checks for underutilised nodes and removes them when their pods can be safely rescheduled elsewhere.
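
You can watch these decisions directly in the controller's logs. A quick sketch, assuming CA is installed in kube-system as in the Helm setup below (the deployment name depends on your Helm release name, so list deployments first to confirm):

```shell
# Find the CA deployment (name varies with the Helm release name)
kubectl -n kube-system get deploy | grep -i autoscaler

# Follow its scale-up / scale-down reasoning
kubectl -n kube-system logs deploy/cluster-autoscaler-aws-cluster-autoscaler \
  --tail=100 | grep -Ei 'scale.?up|scale.?down'

# Pods CA evaluated but could not help receive a NotTriggerScaleUp event
kubectl get events -A --field-selector reason=NotTriggerScaleUp
```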

CA has been the standard solution since 2016 and is well understood, but it has architectural limitations:

- Scale-up is slow: bumping an ASG's desired count and waiting for the new node to join the cluster typically takes 2–5 minutes.
- Instance types are fixed per node group, so instance flexibility means maintaining many node groups.
- Spot diversification likewise requires a separate node group per instance type, with manual priority weighting.
- Scale-down only removes underutilised nodes; it never actively repacks workloads onto fewer, better-fitting nodes.

Setting Up Cluster Autoscaler on EKS

First, tag your EKS node group ASGs so CA can discover them:

k8s.io/cluster-autoscaler/enabled = "true"
k8s.io/cluster-autoscaler/<cluster-name> = "owned"
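
EKS managed node groups apply these tags to their ASGs automatically; for self-managed groups you can add them with the AWS CLI. A sketch, where the ASG and cluster names are placeholders:

```shell
ASG_NAME=eks-my-nodegroup    # placeholder: find yours with
                             #   aws autoscaling describe-auto-scaling-groups
CLUSTER_NAME=my-cluster

aws autoscaling create-or-update-tags --tags \
  "ResourceId=${ASG_NAME},ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=false" \
  "ResourceId=${ASG_NAME},ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/${CLUSTER_NAME},Value=owned,PropagateAtLaunch=false"
```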

Deploy CA with Helm, passing your cluster name:

helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update

helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=ap-south-1 \
  --set rbac.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::123456789012:role/cluster-autoscaler

The IAM role needs permissions to describe and modify Auto Scaling Groups. The minimal policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeImages",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:GetInstanceTypesFromInstanceRequirements",
        "eks:DescribeNodegroup"
      ],
      "Resource": "*"
    }
  ]
}
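
One way to wire this up, sketched with eksctl and IRSA (the policy and role names here are illustrative; the Helm install above expects the resulting role ARN in its service-account annotation):

```shell
# Save the policy document above as cluster-autoscaler-policy.json, then:
aws iam create-policy \
  --policy-name ClusterAutoscalerPolicy \
  --policy-document file://cluster-autoscaler-policy.json

# Create only the IAM role (the Helm chart creates the service account
# itself and annotates it with this role's ARN)
eksctl create iamserviceaccount \
  --cluster my-cluster \
  --namespace kube-system \
  --name cluster-autoscaler \
  --role-name cluster-autoscaler \
  --attach-policy-arn arn:aws:iam::123456789012:policy/ClusterAutoscalerPolicy \
  --role-only \
  --approve
```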

How Karpenter Works

Karpenter takes a different approach: instead of managing ASGs, it talks directly to the EC2 API. When a pod is unschedulable, Karpenter evaluates its resource requests and scheduling constraints (node selectors, affinities, topology spread), then provisions the cheapest EC2 instance that satisfies those constraints — across any family, size, or generation, including Spot.
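
As a concrete illustration, a pod like the following (names and image are placeholders) stays Pending until a node with 4 spare vCPUs and Spot capacity exists; Karpenter reads the resource requests and the nodeSelector and launches the cheapest matching instance:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker                    # placeholder name
spec:
  nodeSelector:
    karpenter.sh/capacity-type: spot    # constraint Karpenter must satisfy
  containers:
    - name: worker
      image: public.ecr.aws/docker/library/busybox:latest
      command: ["sleep", "3600"]
      resources:
        requests:                       # sizing input for instance selection
          cpu: "4"
          memory: 8Gi
```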

This architectural difference has meaningful practical benefits:

- Scale-up in roughly 60–90 seconds, since there is no ASG in the loop.
- Any instance family, size, or generation matching the pod's constraints, from a single NodePool.
- Native Spot diversification without multiplying node groups.
- Active consolidation: Karpenter replaces underutilised nodes rather than only deleting empty ones.

Setting Up Karpenter on EKS

Karpenter requires an IAM role for the controller and a node IAM role for the instances it provisions. The controller policy referenced below (KarpenterControllerPolicy-<cluster-name>) is typically created from the CloudFormation template in Karpenter's getting-started guide. Using eksctl for the IRSA role:

export CLUSTER_NAME=my-cluster
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=ap-south-1

# Create the Karpenter controller IAM role (uses IRSA)
eksctl create iamserviceaccount \
  --cluster "${CLUSTER_NAME}" \
  --namespace karpenter \
  --name karpenter \
  --role-name "KarpenterControllerRole-${CLUSTER_NAME}" \
  --attach-policy-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}" \
  --approve

Install Karpenter with Helm:

helm registry logout public.ecr.aws || true

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "1.0.0" \
  --namespace karpenter \
  --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --wait
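
A few commands to confirm the install before moving on (the label selector matches the chart's standard labels):

```shell
# Controller pods should be Running
kubectl -n karpenter get pods

# The two CRDs used in the next section should exist
kubectl get crd nodepools.karpenter.sh ec2nodeclasses.karpenter.k8s.aws

# Stream controller logs while you test provisioning
kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter -f
```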

Defining NodePools and EC2NodeClass

Karpenter uses two CRDs to describe what nodes it should provision. The EC2NodeClass defines AWS-specific configuration, and the NodePool defines scheduling constraints and limits:

# ec2nodeclass.yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest          # Amazon Linux 2023, latest EKS-optimised AMI
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  tags:
    Environment: production
    ManagedBy: karpenter

# nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        nodepool: default
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]        # compute, general, memory optimised
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      expireAfter: 720h                   # rotate nodes every 30 days
  limits:
    cpu: "200"
    memory: 400Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
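
With both manifests applied, an easy way to see provisioning end to end is a deliberately oversized test deployment (the pause image is a no-op placeholder workload):

```shell
# Substitute ${CLUSTER_NAME} in the EC2NodeClass, then apply both
export CLUSTER_NAME=my-cluster
envsubst < ec2nodeclass.yaml | kubectl apply -f -
kubectl apply -f nodepool.yaml

# Pending pods with real CPU requests force Karpenter to provision
kubectl create deployment inflate \
  --image=public.ecr.aws/eks-distro/kubernetes/pause:3.7 --replicas=0
kubectl set resources deployment inflate --requests=cpu=1
kubectl scale deployment inflate --replicas=5

# Watch Karpenter create capacity (Ctrl-C when done)
kubectl get nodeclaims -w

# Clean up the test workload afterwards
kubectl delete deployment inflate
```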

With consolidationPolicy: WhenEmptyOrUnderutilized, Karpenter actively replaces underutilised nodes — for example replacing two half-full m5.xlarge nodes with one m5.xlarge. This can meaningfully reduce EC2 costs without any application changes.

Spot Instances: Cost Optimisation at Scale

Spot instances offer up to 90% cost reduction over On-Demand. Both tools support Spot, but Karpenter handles it better. With Cluster Autoscaler, you need separate node groups per instance type and manual priority weighting. With Karpenter, include multiple instance families in your NodePool requirements and set capacity-type preference:

requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]    # Karpenter prefers spot, falls back to on-demand
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["c", "m", "r"]
  - key: karpenter.k8s.aws/instance-size
    operator: NotIn
    values: ["nano", "micro", "small"]   # exclude too-small instances
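
Individual workloads can still opt out of Spot at the pod level; a deployment that must not be interrupted can pin itself to on-demand capacity with a nodeSelector (names and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api        # placeholder: a workload that can't tolerate interruption
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: on-demand   # pin to on-demand nodes
      containers:
        - name: api
          image: public.ecr.aws/docker/library/nginx:latest
```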

Karpenter subscribes to EC2 Spot interruption notices via an SQS queue and gracefully drains affected nodes within the two-minute warning EC2 gives before reclaiming a Spot instance. Configure this with the interruptionQueue setting during the Helm install.
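
The queue and the EventBridge rules feeding it are normally created from the CloudFormation template in Karpenter's getting-started guide. A minimal manual sketch of just the Spot-interruption path (the queue name must match the interruptionQueue Helm value; the queue also needs a policy allowing EventBridge to send to it, omitted here):

```shell
export CLUSTER_NAME=my-cluster
export AWS_REGION=ap-south-1
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

# Queue Karpenter polls for interruption events
aws sqs create-queue --queue-name "${CLUSTER_NAME}" \
  --attributes '{"MessageRetentionPeriod":"300"}'

# Route Spot interruption warnings into it (the full template also adds
# rules for rebalance recommendations and instance state changes)
aws events put-rule --name "${CLUSTER_NAME}-spot-interruption" \
  --event-pattern '{"source":["aws.ec2"],"detail-type":["EC2 Spot Instance Interruption Warning"]}'
aws events put-targets --rule "${CLUSTER_NAME}-spot-interruption" \
  --targets "Id=KarpenterInterruptionQueue,Arn=arn:aws:sqs:${AWS_REGION}:${AWS_ACCOUNT_ID}:${CLUSTER_NAME}"
```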

Cluster Autoscaler vs Karpenter: Side-by-Side

                            | Cluster Autoscaler                 | Karpenter
Provisioning mechanism      | Adjusts ASG desired count          | Calls EC2 RunInstances directly
Scale-up latency            | 2–5 minutes                        | 60–90 seconds
Instance flexibility        | Fixed per node group               | Any family matching constraints
Spot diversification        | Requires many node groups          | Native, single NodePool
Bin packing / consolidation | Scale-down only                    | Active node replacement
Maturity                    | Stable since 2016, broadly adopted | GA since 2023, AWS-native

When to Use Each

Choose Cluster Autoscaler when:

- You need a cloud-agnostic autoscaler or run outside AWS, where Karpenter support is limited.
- Your node groups are intentionally fixed and predictable, and 2–5 minute scale-up is acceptable.
- You value the longest production track record over faster scale-up and consolidation.

Choose Karpenter when:

- You run on EKS and want sub-two-minute scale-up with per-pod instance selection.
- You lean on Spot and want diversification across many instance types from a single NodePool.
- EC2 cost matters and you want active consolidation of underutilised nodes.

You can run Cluster Autoscaler and Karpenter on the same cluster with careful label-based partitioning — CA manages some node groups while Karpenter handles others. Avoid having both watch the same set of nodes.

Essential Configuration: Pod Disruption Budgets

Both tools can evict pods during scale-down or consolidation. Protect stateful workloads and avoid disrupting all replicas simultaneously by defining PodDisruptionBudgets:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1          # always keep at least 1 pod running
  selector:
    matchLabels:
      app: my-app

Without PDBs, a consolidation event could evict all replicas of a deployment simultaneously. Set minAvailable: 1 or maxUnavailable: 1 for any production workload.

Need Kubernetes set up properly on EKS?

We design and implement EKS clusters with Karpenter, HPA, proper monitoring, and cost controls — ready for production workloads from day one.

Talk to Us