Kubernetes Node Autoscaling on EKS: Cluster Autoscaler vs Karpenter
The Horizontal Pod Autoscaler scales your workloads — adding more pods when CPU or memory pressure rises. But pods are scheduled onto nodes, and nodes are EC2 instances. When there's no node with enough capacity to place a pod, it stays Pending indefinitely. Node autoscaling solves this: it provisions new EC2 instances when the cluster needs more capacity, and removes them when workloads shrink. Cluster Autoscaler and Karpenter both do this, but they take fundamentally different approaches.
How Cluster Autoscaler Works
Cluster Autoscaler (CA) watches for Pending pods and checks whether adding a node from any of the configured Auto Scaling Groups (ASGs) would allow the pod to be scheduled. If yes, it increases the ASG's desired count by one. It also periodically checks for underutilised nodes and scales them down if pods can be safely rescheduled elsewhere.
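You can see the pods CA reacts to with a quick field-selector query (a diagnostic check, not part of CA itself; the pod name below is a placeholder):

```shell
# List pods stuck in Pending — these are what Cluster Autoscaler acts on
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Inspect why a specific pod is unschedulable (replace <pod-name>)
kubectl describe pod <pod-name> | grep -A 5 Events
```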
CA has been the standard solution since 2016 and is well-understood, but it has architectural limitations:
- It scales ASGs, not individual EC2 instances — which means the instance type is fixed per ASG.
- Supporting multiple instance families requires multiple node groups and careful configuration.
- Scale-up latency is typically 2–5 minutes: detect pending pod → call ASG API → EC2 boot → kubelet join → pod scheduled.
- Node selection for scale-down requires careful priority configuration to avoid disrupting stateful workloads.
Setting Up Cluster Autoscaler on EKS
First, tag your EKS node group ASGs so CA can discover them:
```
k8s.io/cluster-autoscaler/enabled = "true"
k8s.io/cluster-autoscaler/<cluster-name> = "owned"
```
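Node groups created with eksctl get these tags automatically. For ASGs created another way, you can add them with the AWS CLI (the ASG and cluster names below are placeholders):

```shell
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-nodegroup-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
  "ResourceId=my-nodegroup-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/my-cluster,Value=owned,PropagateAtLaunch=true"
```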
Deploy CA with Helm, passing your cluster name:
```shell
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=ap-south-1 \
  --set rbac.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::123456789012:role/cluster-autoscaler
```
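After install, confirm the controller is running and check its logs for ASG discovery. The label selector assumes the standard Helm convention of labelling resources with the release name, so adjust it if your release is named differently:

```shell
kubectl -n kube-system get deploy -l app.kubernetes.io/instance=cluster-autoscaler
kubectl -n kube-system logs -l app.kubernetes.io/instance=cluster-autoscaler --tail=50
```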
The IAM role needs permissions to describe and modify Auto Scaling Groups. The minimal policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeScalingActivities",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeImages",
        "ec2:DescribeInstanceTypes",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:GetInstanceTypesFromInstanceRequirements",
        "eks:DescribeNodegroup"
      ],
      "Resource": "*"
    }
  ]
}
```
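Assuming the policy JSON above is saved as `cluster-autoscaler-policy.json` (the filename and policy name here are illustrative), create it and note the returned ARN for the role annotation used in the Helm install:

```shell
aws iam create-policy \
  --policy-name ClusterAutoscalerPolicy \
  --policy-document file://cluster-autoscaler-policy.json
```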
How Karpenter Works
Karpenter takes a different approach: instead of managing ASGs, it talks directly to the EC2 API. When a pod is unschedulable, Karpenter evaluates its resource requests and scheduling constraints (node selectors, affinities, topology spread), then provisions the cheapest EC2 instance that satisfies those constraints — across any family, size, or generation, including Spot.
This architectural difference has meaningful practical benefits:
- Faster provisioning — Karpenter calls EC2 directly, skipping ASG indirection. Nodes typically join in 60–90 seconds.
- Right-sized nodes — Karpenter consolidates workloads onto the fewest, most cost-efficient instances rather than adding a fixed instance type.
- Flexible instance selection — a single NodePool can span dozens of instance families. If one Spot type is unavailable, Karpenter falls back to alternatives automatically.
- Bin packing and disruption — Karpenter actively replaces underutilised nodes with smaller, cheaper ones through its disruption controller.
Setting Up Karpenter on EKS
Karpenter requires an IAM role for the controller and a node IAM role for provisioned instances. Using eksctl:
```shell
export CLUSTER_NAME=my-cluster
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export AWS_REGION=ap-south-1

# Create the Karpenter controller IAM role (uses IRSA)
eksctl create iamserviceaccount \
  --cluster "${CLUSTER_NAME}" \
  --namespace karpenter \
  --name karpenter \
  --role-name "KarpenterControllerRole-${CLUSTER_NAME}" \
  --attach-policy-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}" \
  --approve
```
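You can verify that the service account carries the IRSA annotation before installing the controller (the jsonpath escapes the dots in the annotation key):

```shell
# Should print the ARN of KarpenterControllerRole-<cluster-name>
kubectl -n karpenter get serviceaccount karpenter \
  -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
```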
Install Karpenter with Helm:
```shell
helm registry logout public.ecr.aws || true
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "1.0.0" \
  --namespace karpenter \
  --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --wait
```
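Confirm the controller came up cleanly; startup errors about the interruption queue or cluster endpoint show up in the logs straight away (the container name `controller` matches the upstream chart, but check `kubectl describe` if yours differs):

```shell
kubectl -n karpenter get pods
kubectl -n karpenter logs deploy/karpenter -c controller --tail=50
```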
Defining NodePools and EC2NodeClass
Karpenter uses two CRDs to describe what nodes it should provision. The EC2NodeClass defines AWS-specific configuration, and the NodePool defines scheduling constraints and limits:
```yaml
# ec2nodeclass.yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest  # Amazon Linux 2023, latest EKS-optimised AMI
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  tags:
    Environment: production
    ManagedBy: karpenter
```
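Substitute the cluster name before applying (here with `envsubst` from GNU gettext, one option among several); the EC2NodeClass should report Ready once Karpenter resolves the AMI, subnets, and security groups:

```shell
envsubst < ec2nodeclass.yaml | kubectl apply -f -
kubectl get ec2nodeclass default -o wide
```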
```yaml
# nodepool.yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        nodepool: default
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]  # compute, general, memory optimised
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      expireAfter: 720h  # rotate nodes every 30 days
  limits:
    cpu: "200"
    memory: 400Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```
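A simple way to watch Karpenter work, borrowed from its getting-started demo, is to scale a dummy deployment beyond current capacity and observe the NodeClaims it creates (`pause` is a minimal placeholder image; the CPU request forces new nodes):

```shell
kubectl apply -f nodepool.yaml

# Create pods that won't fit on existing nodes
kubectl create deployment inflate \
  --image=public.ecr.aws/eks-distro/kubernetes/pause:3.7 --replicas=0
kubectl set resources deployment inflate --requests=cpu=1
kubectl scale deployment inflate --replicas=10

# Watch Karpenter provision capacity
kubectl get nodeclaims -w
```

Scaling `inflate` back to zero afterwards lets you watch consolidation remove the nodes again.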
With consolidationPolicy: WhenEmptyOrUnderutilized, Karpenter actively replaces underutilised nodes — for example replacing two half-full m5.xlarge nodes with one m5.xlarge. This can meaningfully reduce EC2 costs without any application changes.
Spot Instances: Cost Optimisation at Scale
Spot instances offer up to 90% cost reduction over On-Demand. Both tools support Spot, but Karpenter handles it better. With Cluster Autoscaler, you need separate node groups per instance type and manual priority weighting. With Karpenter, include multiple instance families in your NodePool requirements and set capacity-type preference:
```yaml
requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]  # Karpenter prefers spot, falls back to on-demand
  - key: karpenter.k8s.aws/instance-category
    operator: In
    values: ["c", "m", "r"]
  - key: karpenter.k8s.aws/instance-size
    operator: NotIn
    values: ["nano", "micro", "small"]  # exclude too-small instances
```
Karpenter subscribes to EC2 Spot interruption notices via an SQS queue and gracefully drains nodes before the 2-minute interruption window. Configure this with the interruptionQueue setting during Helm install.
Cluster Autoscaler vs Karpenter: Side-by-Side
| | Cluster Autoscaler | Karpenter |
|---|---|---|
| Provisioning mechanism | Adjusts ASG desired count | Calls EC2 RunInstances directly |
| Scale-up latency | 2–5 minutes | 60–90 seconds |
| Instance flexibility | Fixed per node group | Any family matching constraints |
| Spot diversification | Requires many node groups | Native, single NodePool |
| Bin packing / consolidation | Scale-down only | Active node replacement |
| Maturity | Stable since 2016, broadly adopted | GA since 2021, v1 API since 2024, AWS-native |
When to Use Each
Choose Cluster Autoscaler when:
- You need to run on GKE or AKS (Karpenter is AWS-only today).
- Your team is already deeply familiar with managed node groups and doesn't want to manage a new CRD surface.
- You have strict compliance requirements around specific, pre-approved instance types.
Choose Karpenter when:
- You're running EKS and want the fastest possible scale-up times for bursty workloads.
- You want to maximise Spot instance usage without managing dozens of node groups.
- You have mixed workload types (GPU jobs, memory-intensive batch, latency-sensitive services) that benefit from right-sized nodes.
- You want active cost optimisation through node consolidation — not just scale-down.
You can run Cluster Autoscaler and Karpenter on the same cluster with careful label-based partitioning — CA manages some node groups while Karpenter handles others. Avoid having both watch the same set of nodes.
Essential Configuration: Pod Disruption Budgets
Both tools can evict pods during scale-down or consolidation. Protect stateful workloads and avoid disrupting all replicas simultaneously by defining PodDisruptionBudgets:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1  # always keep at least 1 pod running
  selector:
    matchLabels:
      app: my-app
```
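Apply the budget and confirm it is tracking the deployment's pods; the `ALLOWED DISRUPTIONS` column shows how many pods the autoscaler may evict right now (names follow the example above):

```shell
kubectl apply -f pdb.yaml
kubectl get pdb my-app-pdb
```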
Without PDBs, a consolidation event could evict all replicas of a deployment simultaneously. Set minAvailable: 1 or maxUnavailable: 1 for any production workload.
Need Kubernetes set up properly on EKS?
We design and implement EKS clusters with Karpenter, HPA, proper monitoring, and cost controls — ready for production workloads from day one.
Talk to Us