Kubernetes Autoscaling with Karpenter: Cut Your AWS Bill in Half

April 15, 2026 · 14 min read · Omphora Engineering

Why Cluster Autoscaler isn't enough

Cluster Autoscaler (CAS) was designed for a world where Kubernetes node groups were fixed. It scales predefined node groups up and down, but it is slow (typically 1–2 minutes to provision a new node), and it can only choose among the instance types each node group was configured with, so new nodes are often a poor fit for the pods waiting on them.

Karpenter takes a different approach. Instead of managing node groups, it provisions individual nodes in response to pending pods, often in under 30 seconds, with an instance type chosen to fit the workload.

What Karpenter does differently

  1. Direct EC2 provisioning: Karpenter calls the EC2 API directly instead of resizing Auto Scaling Groups
  2. Just-in-time sizing: it reads the resource requests and scheduling constraints of pending pods and picks an instance type that fits them
  3. Consolidation: it moves pods onto fewer, fuller nodes and terminates nodes that are empty or underutilized
  4. Spot-aware: it reacts to Spot interruption notices by draining the affected node and launching replacement capacity
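
Just-in-time sizing only works as well as the information pods give the scheduler. As a minimal sketch, a workload might declare its requests like this; the name `api`, the image, and the request values are placeholders for illustration, not taken from a real deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                      # hypothetical workload name
spec:
  replicas: 6
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.0.0   # placeholder image
          resources:
            # Requests are what Karpenter sums across pending pods
            # when it decides how large a node to launch.
            requests:
              cpu: "500m"
              memory: 1Gi
```

Pods without resource requests give Karpenter very little to size against, so it is worth setting them before judging how well it packs nodes.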

Installing Karpenter on EKS

# Terraform: Karpenter IAM resources
module "karpenter" {
  source  = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "~> 20.0"

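  cluster_name = module.eks.cluster_name

  # Karpenter v1.x needs the v1 controller IAM permissions,
  # which are opt-in in recent 20.x releases of this module
  enable_v1_permissions = true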

  node_iam_role_additional_policies = {
    AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  }
}

# Helm: install the Karpenter controller
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "1.0.0" \
  --namespace kube-system \
  --set settings.clusterName=my-cluster \
  --set settings.interruptionQueue=my-cluster-karpenter \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi

NodePool configuration

NodePool replaces the old Provisioner CRD:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
    memory: 4000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m

Cost analysis: Cluster Autoscaler vs Karpenter

A real-world migration we performed for a fintech client:

Metric                 | Cluster Autoscaler  | Karpenter
Node provisioning time | 90–120 seconds      | 15–30 seconds
Spot instance usage    | 20% (manual config) | 73% (automatic)
Node utilization (avg) | 42%                 | 71%
Monthly EC2 spend      | $18,400             | $8,200

Monthly EC2 spend fell from $18,400 to $8,200, a reduction of about 55%. The key difference: when a NodePool allows both capacity types, Karpenter prefers Spot and falls back to on-demand only when suitable Spot capacity isn't available. CAS requires manual node group configuration for each instance type.
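
Not every workload should ride on Spot. As a sketch, a workload that cannot tolerate interruption can opt out with a node selector on the `karpenter.sh/capacity-type` label, which the NodePool above already permits for both values. The name `ledger-writer` and the image are hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ledger-writer            # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ledger-writer
  template:
    metadata:
      labels:
        app: ledger-writer
    spec:
      # Restrict these pods to on-demand nodes; Karpenter will
      # provision on-demand capacity for them when needed.
      nodeSelector:
        karpenter.sh/capacity-type: on-demand
      containers:
        - name: ledger-writer
          image: registry.example.com/ledger-writer:1.0.0   # placeholder image
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
```

Everything without this selector remains eligible for Spot, so the cost benefit is preserved for the rest of the cluster.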

Handling Spot interruptions

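Karpenter learns about Spot interruptions from the SQS interruption queue configured at install time (settings.interruptionQueue). AWS sends the notice about 2 minutes before it reclaims the instance, and Karpenter starts draining the node as soon as it reads the notice. Disruption budgets are a separate control: they cap how many nodes Karpenter disrupts voluntarily at once, for example during consolidation:

spec:
  disruption:
    # Budgets limit voluntary disruption such as consolidation;
    # they do not delay a Spot reclaim
    budgets:
      - nodes: "10%"  # Max 10% of nodes disrupted at once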

For stateless workloads, this is transparent. For stateful workloads, combine with Pod Disruption Budgets.
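
A Pod Disruption Budget for a stateful workload could be sketched as follows. The name, label, and `minAvailable` value are illustrative assumptions and should reflect the quorum your application actually needs:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payments-db-pdb          # hypothetical name
spec:
  # Keep at least 2 replicas running while nodes are drained
  minAvailable: 2
  selector:
    matchLabels:
      app: payments-db           # hypothetical label on the stateful pods
```

Karpenter respects this budget when it drains nodes, for example during consolidation. A Spot reclaim still happens on AWS's schedule, so a PDB slows eviction but cannot extend the interruption deadline.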

Key takeaways

  • Karpenter is faster, smarter, and cheaper than Cluster Autoscaler for most teams
  • Expect 30–50% cost reduction from better instance selection and Spot adoption
  • Consolidation actively reduces waste by packing workloads onto fewer nodes
  • Migration from CAS is straightforward: install Karpenter, create NodePools, remove CAS
