Why Cluster Autoscaler isn't enough
Cluster Autoscaler (CAS) was designed around fixed node groups. It scales those groups up and down, but it can only launch the instance types each group already defines, so it often picks a poor fit for the pending workload, and a new node typically takes 1–2 minutes to join the cluster.
Karpenter takes a different approach. Instead of managing node groups, it provisions individual nodes in response to pending pods, often in under 30 seconds, and chooses the instance type from the pods' actual resource requests and scheduling constraints.
What Karpenter does differently
- Direct EC2 provisioning — Karpenter calls the EC2 API directly, bypassing Auto Scaling Groups for new nodes
- Just-in-time sizing — Analyzes pending pod requirements and picks the optimal instance type
- Consolidation — Actively moves pods to fewer, fuller nodes and terminates the empty ones
- Spot-aware — Understands Spot interruptions and handles them gracefully
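To make just-in-time sizing concrete, here is a sketch of the kind of workload Karpenter reacts to. When these pods go Pending, Karpenter reads their resource requests and node selector and launches capacity that satisfies them. The Deployment name, image, and sizes are hypothetical placeholders, not taken from a real cluster.

```yaml
# Hypothetical workload: names, image, and sizes are illustrative only
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api
spec:
  replicas: 6
  selector:
    matchLabels:
      app: checkout-api
  template:
    metadata:
      labels:
        app: checkout-api
    spec:
      # Well-known labels that constrain which nodes Karpenter may launch
      nodeSelector:
        kubernetes.io/arch: arm64
        karpenter.sh/capacity-type: spot
      containers:
        - name: app
          image: registry.example.com/checkout-api:1.0.0  # placeholder image
          resources:
            # Requests, not limits, drive the sizing decision
            requests:
              cpu: "2"
              memory: 4Gi
```

Six replicas at 2 vCPU and 4Gi each need roughly 12 vCPU and 24Gi plus system overhead, so Karpenter can choose between one larger arm64 Spot instance and several smaller ones depending on price and availability.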
Installing Karpenter on EKS
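# Terraform: Karpenter IAM resources
module "karpenter" {
  source  = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "~> 20.0"

  cluster_name = module.eks.cluster_name

  # Grants the controller permissions needed by Karpenter v1.x
  # (available in module releases 20.24 and later)
  enable_v1_permissions = true

  # Extra policies attached to the node IAM role
  node_iam_role_additional_policies = {
    AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  }
}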
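# Install the Karpenter controller. settings.interruptionQueue must match the
# name of the SQS queue created by Terraform (module.karpenter.queue_name).
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "1.0.0" \
  --namespace kube-system \
  --set settings.clusterName=my-cluster \
  --set settings.interruptionQueue=my-cluster-karpenter \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --wait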
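The same settings can also live in a values file, which is easier to review and version than a chain of `--set` flags. A minimal sketch, assuming a file named `karpenter-values.yaml` (a hypothetical name) that mirrors the flags above:

```yaml
# karpenter-values.yaml (hypothetical file name)
settings:
  clusterName: my-cluster
  # Must match the SQS queue that receives interruption events
  interruptionQueue: my-cluster-karpenter
controller:
  resources:
    requests:
      cpu: "1"
      memory: 1Gi
```

Pass it to Helm with `-f karpenter-values.yaml` in place of the individual `--set` flags.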
NodePool configuration
NodePool replaces the old Provisioner CRD:
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        # Allow both Spot and on-demand capacity
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        # Compute-, general-purpose, and memory-optimized families
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        # Workloads need multi-arch images to run on both architectures
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        # Generation 3 and newer
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  # Cap on the total resources this NodePool may provision
  limits:
    cpu: 1000
    memory: 4000Gi
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
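The NodePool's `nodeClassRef` points at an EC2NodeClass named `default`, which has to exist before Karpenter can launch anything. A minimal sketch follows. The role name and the `karpenter.sh/discovery` tag value are assumptions and must match whatever your Terraform actually created and tagged.

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  # Amazon Linux 2023; pin a specific version instead of "latest" in production
  amiSelectorTerms:
    - alias: al2023@latest
  # Assumed name of the node IAM role; replace with your own
  role: KarpenterNodeRole-my-cluster
  # Assumes subnets and security groups carry this discovery tag
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
```

Keeping AWS-specific settings in the EC2NodeClass and scheduling constraints in the NodePool lets several NodePools share one node class.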
Cost analysis: Cluster Autoscaler vs Karpenter
A real-world migration we performed for a fintech client:
| Metric | Cluster Autoscaler | Karpenter |
|---|---|---|
| Node provisioning time | 90–120 seconds | 15–30 seconds |
| Spot instance usage | 20% (manual config) | 73% (automatic) |
| Node utilization (avg) | 42% | 71% |
| Monthly EC2 spend | $18,400 | $8,200 |
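Monthly EC2 spend fell from $18,400 to $8,200, a reduction of about 55%. The key difference: when a NodePool allows both capacity types, Karpenter prefers Spot and falls back to on-demand only when no suitable Spot capacity is available. CAS needs a separately configured node group for each set of instance types it should use.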
Handling Spot interruptions
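Karpenter learns about Spot interruptions from the SQS queue configured in settings.interruptionQueue. AWS issues a two-minute interruption notice; as soon as Karpenter receives it, it cordons and drains the affected node and starts provisioning a replacement. Disruption budgets are a separate control that limits voluntary disruption such as consolidation and drift:
spec:
  disruption:
    budgets:
      # Applies to voluntary disruption (consolidation, drift), not to
      # Spot interruptions, which are handled as soon as the notice arrives
      - nodes: "10%" # Max 10% of nodes disrupted at once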
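For stateless workloads with multiple replicas, this is usually transparent. For stateful workloads, add Pod Disruption Budgets so that drains evict pods gradually. A reclaimed Spot node is still terminated when the two-minute notice expires, so workloads that cannot tolerate that should request on-demand capacity.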
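A Pod Disruption Budget for a stateful workload might look like the following sketch. The name and label selector are hypothetical; match them to your own pods.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb   # hypothetical name
spec:
  # Evict at most one matching pod at a time during a drain
  maxUnavailable: 1
  selector:
    matchLabels:
      app: postgres    # hypothetical label
```

Karpenter respects this budget when it drains nodes for consolidation, so a replica is only evicted once the previous one is running again elsewhere.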
Key takeaways
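- For most teams on EKS, Karpenter provisions nodes faster, picks better-fitting instance types, and costs less than Cluster Autoscaler
- Savings depend on the workload: 30–50% from better instance selection and Spot adoption is a reasonable range to plan for, and the migration above reached about 55%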
- Consolidation actively reduces waste by packing workloads onto fewer nodes
- Migration from CAS follows a short sequence: install Karpenter, create an EC2NodeClass and NodePools, scale CAS down, then drain and retire the old node groups