Cutting Kubernetes Costs by 60% Without Sacrificing Reliability
A real case study in right-sizing nodes, pod-disruption budgets, spot instances, Karpenter-based autoscaling, and the FinOps practices that turned a $48k/month cluster into a $19k one.
Key takeaways
- Right-size requests with real data: measure for two weeks, then trim.
- Karpenter typically beats fixed node groups by 20 to 40 percent on cost alone.
- Spot instances are safe for stateless workloads with disruption budgets and a fallback to on-demand.
- Bin-pack with topology spread constraints, not anti-affinity per pod.
- Tag, attribute and chargeback: costs you cannot attribute, you cannot reduce.
The starting point
The cluster in this case study ran 240 microservices across three environments on Amazon EKS. The monthly compute bill at the start of the engagement was $48,300. The team was paying for 92 m6i.2xlarge nodes on a fixed node group, and the cluster average CPU utilisation was 18 percent. Memory utilisation was higher (51 percent) because every team had inflated memory requests after one OOMKill incident. The numbers in this article are real; the project name has been redacted.
Step 1: measure before you cut
- Install Goldilocks or KRR (Kubernetes Resource Recommender) on a non-disruptive schedule.
- Capture two full weeks of usage including a peak business event if you have one.
- Use p95 of usage as the request target, plus a 20 percent safety margin.
- Use p99 plus a generous margin as the limit — but only on workloads that can tolerate it.
The first pass on this cluster cut requests by 38 percent on average. We did not touch limits in the first pass — that comes later, after we know the workloads survive in production.
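As a worked illustration of the p95-plus-20-percent rule, the fragment below shows how one container's requests might look after the first pass. The container name, image and measured values are hypothetical and are not taken from the cluster in this case study.

```yaml
# Hypothetical container spec fragment (sits under spec.template.spec in a Deployment).
# Assumed two-week measurements: CPU p95 = 400m, memory p95 = 600Mi.
containers:
  - name: checkout-api                                # hypothetical service
    image: registry.example.com/checkout-api:1.0.0    # placeholder image
    resources:
      requests:
        cpu: 480m       # 400m x 1.2 (p95 plus a 20 percent margin)
        memory: 720Mi   # 600Mi x 1.2
      # limits: keep whatever the workload already has; the first pass does not change them
```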
Step 2: replace fixed node groups with Karpenter
Fixed node groups are the largest source of waste in production clusters. They scale slowly, force you into a single instance type, and bin-pack badly. Karpenter (now also available on AKS as Karpenter for AKS) provisions nodes that fit the actual pending pods, picks the cheapest instance that satisfies the constraints, and consolidates aggressively when load drops. On this cluster, switching to Karpenter cut compute spend by an additional 28 percent before any other optimisation.

    apiVersion: karpenter.sh/v1
    kind: NodePool
    metadata: { name: default }
    spec:
      template:
        spec:
          requirements:
            # Allow both amd64 and arm64 nodes
            - key: kubernetes.io/arch
              operator: In
              values: ["amd64", "arm64"]
            # Compute-optimised, general-purpose and memory-optimised families
            - key: karpenter.k8s.aws/instance-category
              operator: In
              values: ["c", "m", "r"]
            # Generation 6 and newer
            - key: karpenter.k8s.aws/instance-generation
              operator: Gt
              values: ["5"]
            # Allow spot as well as on-demand capacity
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["spot", "on-demand"]
          nodeClassRef:
            group: karpenter.k8s.aws
            kind: EC2NodeClass
            name: default
      disruption:
        # Consolidate empty or underutilised nodes 30s after their pods stop changing
        consolidationPolicy: WhenEmptyOrUnderutilized
        consolidateAfter: 30s
      limits:
        # Cap on the total CPU this NodePool may provision
        cpu: "2000"

Step 3: spot instances for stateless workloads
Spot is typically 60 to 80 percent cheaper than on-demand and almost always under-used. The blockers are real but solvable: the 2-minute interruption notice, capacity that varies by region, and the fear of cascading failures. The safety net we ship by default is a topology spread that keeps a minimum number of pods on on-demand, plus a PodDisruptionBudget that caps how many pods can be evicted at once. Karpenter handles the fallback to on-demand automatically when spot capacity is unavailable.
- Mark every Deployment with a topology spread across capacity-type — at least 1 pod always lands on on-demand.
- PDB minAvailable: 50 percent for tier-1 services, 1 for tier-3.
- Deploy a node-termination handler so pods are drained gracefully on the 2-minute notice.
- Use multiple instance families — capacity for 5 instance types is far more reliable than capacity for one.
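A minimal sketch of the first two items, assuming Karpenter's `karpenter.sh/capacity-type` node label is used as the topology key. The workload name, labels, image and replica count are hypothetical, and the spread you actually get depends on replica count and available capacity.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api              # hypothetical tier-1 service
spec:
  replicas: 6
  selector:
    matchLabels: { app: checkout-api }
  template:
    metadata:
      labels: { app: checkout-api }
    spec:
      topologySpreadConstraints:
        # Balance replicas between the "spot" and "on-demand" domains,
        # so a share of them sits on on-demand nodes.
        - maxSkew: 1
          topologyKey: karpenter.sh/capacity-type
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels: { app: checkout-api }
      containers:
        - name: checkout-api
          image: registry.example.com/checkout-api:1.0.0   # placeholder image
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-api
spec:
  minAvailable: "50%"             # tier-1 value from the list above
  selector:
    matchLabels: { app: checkout-api }
```

With `DoNotSchedule` the spread is a hard rule. Switching to `ScheduleAnyway` trades that strictness for the ability to keep scheduling when one capacity type is unavailable, so the choice is worth making per tier.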
Step 4: vertical pod autoscaling — selectively
VPA in 'Recreate' mode is dangerous on stateful workloads but excellent for batch and CI runners. We turn VPA on in 'Off' mode for everything (recommendations only), in 'Initial' mode for scaled-out web services (sets initial requests on pod creation), and in 'Auto' mode only for explicitly approved workloads. The recommendations alone, applied through CI, captured another 9 percent of cost on this cluster.
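For the recommendations-only default, a VPA object along these lines would be enough. This is a sketch that assumes the VPA components and CRDs are already installed; the target Deployment name is hypothetical.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-api              # hypothetical workload
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-api
  updatePolicy:
    # "Off" only publishes recommendations; use "Initial" or "Auto"
    # for workloads that have been explicitly opted in.
    updateMode: "Off"
```

The recommendations then appear under `status.recommendation` on the VPA object, which is what a CI job can read and compare against the declared requests.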
Step 5: bin-packing with topology spread
Default scheduler placement is spread-first, which is good for availability and bad for bin-packing. We use topologySpreadConstraints with maxSkew: 1 across zones (for HA) plus a small soft constraint on the node level (so the scheduler will pack a half-empty node before opening a new one). On this cluster the change moved average node CPU utilisation from 28 percent to 64 percent.
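The two constraints described above might look like the fragment below. The `whenUnsatisfiable` settings, the node-level `maxSkew` value and the label are assumptions made for illustration, not the exact values used on this cluster.

```yaml
# Pod template fragment (sits under spec.template.spec); labels are hypothetical.
topologySpreadConstraints:
  # Zone-level constraint for HA: keep replicas balanced across zones.
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule      # assumed to be a hard rule
    labelSelector:
      matchLabels: { app: checkout-api }
  # Node-level constraint kept soft: it is a preference only, so pods can
  # still land on an existing node instead of waiting for a new one.
  - maxSkew: 2                            # assumed value
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels: { app: checkout-api }
```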
Step 6: tag, attribute, chargeback
FinOps maturity is the difference between a one-time cost cut and an ongoing discipline. We label every namespace with a cost-centre and product code, run Kubecost or OpenCost daily, and produce a weekly report broken down by team. Once teams see their own bill, the optimisation conversations start happening organically. On this engagement, the cluster bill fell by another 8 percent in the three months after we left, with no further engineering work, just visibility.
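A namespace labelled for attribution could look like the sketch below. The label keys and values are hypothetical; what matters is that the same keys are applied consistently so the cost tool can aggregate on them.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments                  # hypothetical namespace
  labels:
    cost-centre: "cc-1234"        # hypothetical cost-centre code
    product: "checkout"           # hypothetical product code
    environment: "production"
```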
Three things we did not do
- We did not switch to a different cluster manager. EKS is fine; the same playbook works on GKE, AKS or self-managed clusters.
- We did not adopt a service mesh. Cost-wise, it tends to add overhead without commensurate savings.
- We did not micro-optimise per-service. The 80/20 of cluster cost is in five or six platform decisions, not in 240 individual services.
Three months later: the numbers
| Metric | Before | After |
|---|---|---|
| Monthly compute spend | $48,300 | $19,400 |
| Average node CPU utilisation | 18% | 64% |
| Average node memory utilisation | 51% | 72% |
| Active nodes (steady state) | 92 | 28-44 (variable) |
| P95 latency on flagship API | 118 ms | 112 ms |
| Tier-1 SLO breaches in period | 0 | 0 |
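
As a quick check on the headline figure, the spend row in the table works out as follows:

$$
\frac{48{,}300 - 19{,}400}{48{,}300} = \frac{28{,}900}{48{,}300} \approx 0.598
$$

That is a reduction of just under 60 percent in monthly compute spend.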
What sustains the savings
We left this cluster with three durable controls: a CI check that fails any deployment whose requests deviate more than 50 percent from VPA recommendations; a weekly automated report mailing top spenders; and a quarterly architecture review that examines node-pool composition and consolidation. Without those, savings drift back inside two quarters.
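One way to read the first control, assuming the deviation is measured relative to the VPA recommendation: with $r$ the declared request and $v$ the VPA target for the same resource, the check fails when

$$
\frac{|r - v|}{v} > 0.5
$$

For a hypothetical container that requests 1000m of CPU against a VPA target of 600m, the deviation is $400 / 600 \approx 0.67$, so the deployment fails the check. At 700m it is $100 / 600 \approx 0.17$ and the deployment passes.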
Frequently asked questions
Direct answers to questions readers and AI assistants commonly ask about this topic.
What is the single biggest source of Kubernetes waste?
Inflated CPU and memory requests. Most clusters operate at 15 to 30 percent average utilisation because every team pads requests as a safety margin. Right-sizing with two weeks of real data is typically the largest one-time saving.
Should I use Karpenter or Cluster Autoscaler?
Karpenter on EKS and AKS. Its faster, instance-aware provisioning typically outperforms Cluster Autoscaler on cost. Cluster Autoscaler is fine on GKE, where the equivalent functionality is built into the managed service.
Are spot instances safe for production?
For stateless workloads with proper PodDisruptionBudgets, multiple instance type fallback, and a topology spread that keeps a fraction on on-demand, yes. Avoid spot for stateful workloads, single-replica leaders, and anything with a long warmup time.
How do I attribute Kubernetes spend across teams?
Use OpenCost or Kubecost with consistent labels (cost-centre, product, environment) on every namespace and workload. Publish a weekly report; the visibility itself is half the battle.
How long does a cost optimisation programme take to pay back?
Most one-time savings (right-sizing, autoscaling) pay back inside the first month. The compounding programme savings from FinOps discipline pay back inside a quarter.
Last updated: April 26, 2026 · Written by Ribbsaeter Systems Engineering · Platform Engineering