Karpenter vs Cluster Autoscaler: which one should you use?
If you're running EKS in production, autoscaling is where most of your cost savings live. Not rightsizing individual pods -- that matters too -- but how quickly and intelligently your cluster adds and removes nodes in response to real workload demand. Get this wrong and you're either paying for idle capacity around the clock or watching deployments sit in Pending while the autoscaler slowly catches up.
We've run both Cluster Autoscaler and Karpenter across production clusters of varying sizes -- from small three-node dev environments to clusters handling hundreds of pods with highly variable traffic patterns. This is an honest comparison based on what we've actually seen, not what the docs promise.
Cluster Autoscaler: the incumbent
Cluster Autoscaler (CA) has been the default choice for Kubernetes node autoscaling since long before Karpenter existed. The mental model is straightforward: CA watches for pods that can't be scheduled because there's no node with enough capacity, then scales up an Auto Scaling Group to add nodes. When nodes are underutilized, it drains and terminates them.
The configuration lives primarily in the ASG and the CA deployment itself. You define your node groups with specific instance types, and CA scales those groups up and down within the min/max bounds you've set.
```yaml
# Cluster Autoscaler Helm values (simplified)
autoDiscovery:
  clusterName: my-cluster
  tags:
    - k8s.io/cluster-autoscaler/enabled
    - k8s.io/cluster-autoscaler/my-cluster
extraArgs:
  scale-down-delay-after-add: "10m"
  scale-down-unneeded-time: "10m"
  skip-nodes-with-local-storage: "false"
  balance-similar-node-groups: "true"
  expander: "least-waste"
```
Where CA works well
- Maturity and predictability. CA has been around for years. The behavior is well-understood, well-documented, and there are very few surprises. If you configure it correctly, it does exactly what you expect.
- Simple mental model. You define node groups with specific instance types. You know exactly what kind of node will be provisioned. This is valuable in regulated environments where infrastructure changes need to be fully predictable.
- Broad ecosystem support. Every Kubernetes monitoring tool, every cost analysis platform, every tutorial out there covers CA. If your team is new to Kubernetes, the learning curve is gentler.
Where CA falls short
- Speed. CA's scan loop runs every 10 seconds by default, but the actual time from "pod pending" to "pod running on a new node" is often 3-5 minutes. It has to recognize the scheduling failure, decide which ASG to scale, wait for EC2 to launch the instance, wait for the node to join the cluster, and only then does the pod get scheduled. Under pressure, that lag hurts.
- Node group rigidity. Each node group is tied to one or a small set of instance types. If you want flexibility across instance families -- say, using m6i.large normally but falling back to m5.large or m6a.large when capacity is tight -- you need multiple node groups or creative ASG mixed-instance policies. It gets messy.
- No active consolidation. CA will drain and remove a node once its utilization drops below a threshold and its pods fit elsewhere, but it won't proactively re-bin-pack the cluster or swap nodes for cheaper instance types. If you have three nodes each running at 30% utilization, CA may never consolidate them down to one or two unless its scale-down conditions happen to be met for each node in turn.
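To make the node-group rigidity concrete, here's a sketch of what CA scales: an eksctl-style managed node group pinned to one instance type and fixed min/max bounds. The cluster name, region, and sizes are illustrative, not from the original setup.

```yaml
# Sketch: the node-group model CA operates on. Instance type and
# bounds are fixed up front; CA only moves desired capacity between
# minSize and maxSize. (Names and sizes are hypothetical.)
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster
  region: us-east-1
managedNodeGroups:
  - name: general
    instanceType: m6i.large   # one type per group; variety means more groups
    minSize: 2
    maxSize: 10
```

Wanting m5.large or m6a.large as fallbacks means either another node group like this one or a mixed-instances policy on the underlying ASG.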
Karpenter: the new approach
Karpenter takes a fundamentally different approach. Instead of scaling existing Auto Scaling Groups, it provisions EC2 instances directly through the EC2 Fleet API. There are no node groups. Karpenter looks at the pending pods, evaluates their resource requirements, constraints, and topology preferences, and launches the right instance type on the fly.
The configuration is declarative, through a NodePool (called Provisioner in older versions) and an EC2NodeClass:
```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["arm64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["5"]
      nodeClassRef:
        name: default
  disruption:
    consolidationPolicy: WhenUnderutilized
  template:
    spec:
      expireAfter: 720h
  limits:
    cpu: "100"
    memory: 400Gi
```
That single NodePool tells Karpenter: use ARM64 Graviton instances, prefer spot but fall back to on-demand, choose from m/c/r families generation 6+, consolidate when underutilized, and replace nodes after 30 days. Karpenter figures out the specific instance type at launch time based on what the pending pods actually need.
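The `nodeClassRef` above points at an EC2NodeClass, which holds the AWS-specific plumbing. A minimal sketch, assuming subnets and security groups are tagged with a `karpenter.sh/discovery` tag and that a node IAM role already exists (the cluster name and role name here are placeholders):

```yaml
# Hypothetical EC2NodeClass matching the NodePool's nodeClassRef.
# Swap in your own discovery tag value and node role.
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: KarpenterNodeRole-my-cluster   # IAM role Karpenter nodes assume
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
```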
Where Karpenter shines
- Speed. Karpenter typically provisions a new node and gets a pod scheduled in under 90 seconds. In many cases we see it under 60 seconds. It skips the ASG intermediary entirely and calls the EC2 Fleet API directly, which removes an entire layer of latency.
- Intelligent instance selection. Instead of you pre-defining which instance types to use, Karpenter evaluates the pool of available instance types in real time and picks the cheapest one that fits. If a c7g.large is cheaper than an m7g.large and your pod only needs CPU, Karpenter picks the c7g. This is where the cost savings come from.
- Consolidation. This is the killer feature. Karpenter actively watches for opportunities to replace underutilized nodes with smaller, cheaper ones -- or to pack workloads tighter onto fewer nodes. It respects PodDisruptionBudgets and does rolling replacements, but it's proactively optimizing your cluster bin-packing around the clock.
- Right-sized nodes. CA gives you the same instance type regardless of whether the pending pod needs 100m CPU or 4 CPUs. Karpenter can launch a t3.medium for a tiny pod and an m7g.2xlarge for a memory-heavy one. You stop paying for oversized nodes that are mostly idle.
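The right-sizing works because Karpenter derives node size from pod resource requests. A hypothetical example: this pod asks for almost nothing, so Karpenter can satisfy it with the smallest instance type the NodePool's requirements allow, instead of whatever fixed size a node group would launch.

```yaml
# This (illustrative) pod requests 100m CPU / 128Mi memory. Karpenter
# reads these requests when choosing an instance type, so a tiny pod
# doesn't force a large node.
apiVersion: v1
kind: Pod
metadata:
  name: tiny-worker
spec:
  containers:
    - name: app
      image: public.ecr.aws/docker/library/busybox:stable
      command: ["sleep", "infinity"]
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
```

The flip side: pods without resource requests give Karpenter nothing to reason about, so accurate requests matter even more here than with CA.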
Where Karpenter has rough edges
- Newer project, faster-moving API. Karpenter is much younger than CA, and its API has changed significantly across releases (Provisioner became NodePool, AWSNodeTemplate became EC2NodeClass). If you adopted it early, you've done at least one migration. The pace has stabilized, but it's not as set-and-forget as CA.
- Less ecosystem coverage. Some cost tools and dashboards assume ASG-based scaling. Karpenter's nodes don't belong to an ASG, so tooling that relies on ASG tags or group membership needs adjustment.
- Consolidation can be surprising. If your disruption budgets aren't configured correctly, Karpenter will cheerfully move workloads around to optimize cost. That's great until a stateful service gets disrupted unexpectedly. You need to be deliberate about your disruption settings and PDBs.
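As a sketch of what "deliberate" looks like, a PodDisruptionBudget protects a specific workload while a NodePool disruption budget caps how many nodes Karpenter may disrupt at once. The service name and percentages are illustrative, and disruption budgets require a Karpenter version that supports them:

```yaml
# Guard a hypothetical stateful service against voluntary evictions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: redis-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: redis
---
# NodePool fragment (template omitted for brevity): limit Karpenter to
# disrupting at most 10% of this pool's nodes at a time.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    budgets:
      - nodes: "10%"
```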
Real numbers from production
Across the clusters where we've migrated from Cluster Autoscaler to Karpenter, we've consistently seen 25-40% reduction in compute costs. The savings come from three places:
- Better bin-packing -- fewer nodes running at low utilization because consolidation is always working
- Smarter instance selection -- Karpenter picks cheaper instance types that fit the actual workload, rather than always launching the one type you pre-configured
- Faster scale-down -- nodes are removed more aggressively when load drops, instead of lingering for the CA cooldown period
The biggest wins come on workloads with variable demand -- services that spike during business hours and go quiet at night, or batch processing that runs in bursts. If your workload is flat 24/7, the difference narrows. But even on steady workloads, the bin-packing improvements alone typically save 15-20%.
There's more to the optimization story -- spot instance strategies, consolidation policy tuning, disruption budgets for stateful workloads -- but those details depend heavily on your specific workload profile. The numbers above are the baseline you can expect from a straightforward migration.
When to use which
Stick with Cluster Autoscaler if:
- You need fully predictable instance types for compliance or capacity planning reasons
- Your team is early in their Kubernetes journey and you want the simplest possible setup
- You're running a small, stable cluster with consistent workloads where autoscaling rarely fires
- Your existing tooling and runbooks are built around ASGs and you don't want to retool
Use Karpenter if:
- Cost optimization is a priority and you're willing to invest time in tuning
- Your workloads have variable demand and you need fast scale-up
- You want to use spot instances broadly without managing multiple spot-specific node groups
- You're setting up a new cluster and don't have legacy ASG dependencies
Our default for new clusters is Karpenter. We reach for it first unless there's a specific reason not to. But we don't tell teams running stable Cluster Autoscaler setups to migrate just for the sake of it -- if CA is meeting your needs and your costs are acceptable, the migration effort may not be justified.
The bottom line
Both tools solve the same fundamental problem: matching node capacity to workload demand. Cluster Autoscaler does it conservatively and predictably. Karpenter does it aggressively and intelligently. The right choice depends on what you value more -- simplicity or optimization.
What we've skipped here is the deeper nuance: how to structure NodePools for mixed spot/on-demand fleets, how to set consolidation policies that don't disrupt stateful workloads, how disruption budgets interact with Karpenter's node replacement logic, and how to handle the EC2 capacity constraints that spot instances inevitably run into. Those topics deserve their own posts.
If you're evaluating Karpenter for an existing cluster or designing an autoscaling strategy for a new one, we're happy to talk through it. The migration path matters more than the destination -- getting the rollout wrong can cause more downtime than the savings are worth.
Vishwaraja Pathi
Cloud & DevOps specialist with 13+ years of experience. Founder of Adiyogi Technologies. Previously at Roku, Rocket Lawyer, and BetterPlace.