# Why Attune
## The $44 Billion Problem You're Contributing To
Every Kubernetes cluster wastes money. Not "maybe" or "probably." Every single one.
The numbers are staggering:
| Stat | Source |
|---|---|
| 8% average CPU utilization across K8s clusters | CAST AI 2026 State of K8s Optimization |
| 99.94% of clusters are overprovisioned | CAST AI 2025 Cost Benchmark |
| 83% of container costs are idle resources | Datadog State of Cloud Costs 2024 |
| $44.5 billion in projected cloud infrastructure waste for 2025 | Harness FinOps in Focus 2025 |
| 70% cite overprovisioning as the #1 cost driver | CNCF FinOps Microsurvey 2023 |
Here's what's happening: your developers set `resources.requests` to "something
that works," add a generous safety margin because they don't want 3 AM pages,
and never touch it again. Multiply that across every container in every
deployment in every namespace, and you're paying for 10x the compute you
actually use.
## Why Nobody Fixes This (Even Though Everyone Knows About It)
The Kubernetes ecosystem has had a tool for this since 2018: the Vertical Pod Autoscaler (VPA). So why does less than 1% of the industry run it in production?
### VPA evicts your pods
VPA's "Auto" mode works by evicting pods and recreating them with new resource values. In theory, this sounds fine. In practice, it means:
- Pod restarts during traffic spikes. VPA sees high usage, recommends more resources, and evicts the pod to apply them. The pod restarts, loses its in-memory state, and starts handling requests from a cold cache, exactly when you need it most.
- JVM cold start penalties. Java applications lose their JIT-compiled code on every restart. A warmed-up JVM might handle 10,000 req/s; a cold one might handle 2,000. VPA evictions can cause a 5x throughput drop that cascades through your service mesh.
- Stateful workload disruption. Databases, message brokers, and ML training jobs lose progress. A 6-hour training run evicted at hour 5 wastes 5 hours of GPU time.
- Cascading failures. When VPA evicts the Prometheus pod that its own recommender reads metrics history from, the recommender loses its data source and starts making blind recommendations. This has happened in production and taken down an entire cluster's observability.
### VPA fights with HPA
If you're running Horizontal Pod Autoscaler (HPA) on the same workloads (and you probably should be), VPA creates a death spiral:
- VPA sees low average CPU usage and lowers resource requests
- HPA sees CPU utilization spike (because the same load on smaller requests = higher percentage) and scales out
- More replicas with lower requests = lower average usage again
- VPA lowers requests further
- Eventually pods are too small to handle even a single request, and everything falls over
This is why the VPA project's own documentation explicitly warns against running VPA and HPA on the same CPU or memory metric.
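The HPA half of this loop uses the documented scaling rule `desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization)`, where CPU utilization is measured against each pod's request. The Python sketch below is a toy model, not VPA or HPA source code: the load, the starting size, and the VPA step (set the request to the observed per-pod usage, with no headroom) are assumptions chosen to make the feedback visible.

```python
LOAD_M = 2800     # total CPU demand across all replicas, in millicores (toy value)
TARGET_PCT = 70   # HPA target CPU utilization, in percent


def utilization_pct(total_load_m: int, replicas: int, request_m: int) -> int:
    """Average CPU utilization as HPA computes it: per-pod usage / per-pod request."""
    return (total_load_m * 100) // (replicas * request_m)


def hpa_desired_replicas(replicas: int, util_pct: int, target_pct: int) -> int:
    """ceil(replicas * current / target); ignores HPA's default 10% tolerance band."""
    return -(-(replicas * util_pct) // target_pct)


replicas, request_m = 4, 1000   # starts at exactly 70% utilization
history = []
for _ in range(3):
    # Toy VPA step: per-pod usage looks low, so shrink the request to match it.
    request_m = LOAD_M // replicas
    # HPA step: the same load on a smaller request reads as a utilization spike.
    util = utilization_pct(LOAD_M, replicas, request_m)
    replicas = hpa_desired_replicas(replicas, util, TARGET_PCT)
    history.append((request_m, util, replicas))

for req, pct, reps in history:
    print(f"request={req}m utilization={pct}% -> HPA scales to {reps} replicas")
```

In this toy run the request shrinks each round (700m, 466m, 311m) while the replica count climbs (6, 9, 13).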
### VPA's recommender is a black box
VPA uses backward-looking, exponentially decaying histograms with a 24-hour half-life. This means:
- It reacts slowly to genuine load increases
- It doesn't understand time-of-day patterns: a short daily peak (say, two busy hours every afternoon) is a small share of the samples and can be averaged away by the quiet hours around it
- It treats all workloads the same, whether it's a CPU-intensive API server or a memory-heavy cache
- The recommendations are not bounded by default, so VPA can recommend resources larger than any node in your cluster
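To see how a decaying, single-histogram view can miss a daily peak, here is a simplified model (an assumption for illustration, not VPA's recommender code): each hourly sample's weight halves every 24 hours of age, and the recommendation is a weighted percentile. With a two-hour daily peak, the peak carries only about 9% of the total weight, so a 90th-percentile target lands on the idle baseline.

```python
def decay_weight(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay: a sample loses half its weight every half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def weighted_percentile(samples: list[float], weights: list[float], q: float) -> float:
    """Smallest sample value whose cumulative weight reaches fraction q of the total."""
    pairs = sorted(zip(samples, weights))
    threshold = q * sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]


# Toy history: 7 days of hourly CPU samples (millicores); "now" is 23:00.
# The workload idles at 200m except for a daily 1200m peak at 14:00 and 15:00.
samples, weights = [], []
for age in range(7 * 24):
    hour_of_day = (23 - age) % 24
    samples.append(1200 if hour_of_day in (14, 15) else 200)
    weights.append(decay_weight(age))

print(weighted_percentile(samples, weights, 0.90))  # peak averaged away
print(weighted_percentile(samples, weights, 0.99))
```

Raising the percentile to 0.99 in the same model does return the 1200m peak, so the outcome depends heavily on which percentile the recommender targets.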
### The result: nobody uses VPA in production
Teams install VPA in Recommend mode, look at the numbers, manually apply some
changes once a quarter, and move on. The promise of automated right-sizing
remains unfulfilled.
### Recommendation-only tools don't solve this
Tools like Goldilocks and Robusta KRR took a different approach: skip VPA's dangerous Auto mode entirely and just show you the recommendations. Goldilocks creates a dashboard. KRR prints a table. Both are useful for a one-time audit.
The problem is what happens next. For a platform running 200 microservices, "useful recommendations" means 200 Deployment YAML edits, 200 pull requests, 200 code reviews, and 200 rollouts. Most teams create a Jira ticket titled "right-size services," and it sits in the backlog for six months. The recommendations go stale. New services deploy with the same inflated defaults. Nothing changes.
Diagnostic tools tell you what to fix. They don't fix it. At scale, the gap between "knowing" and "doing" is where savings go to die.
Attune closes that gap. It computes the recommendation AND applies it to the running pod, with graduated safety controls so you don't have to babysit each change. No YAML edits, no pull requests, no backlog tickets.
## What Changed: In-Place Pod Resize (Kubernetes 1.32+)
In December 2025, Kubernetes v1.35 graduated In-Place Pod Resize to GA (stable), following alpha in v1.27 and beta (enabled by default) in v1.33. This feature, tracked as KEP-1287, was 7 years in the making.
What it does: The kubelet can now adjust a running container's CPU and memory requests and limits by modifying its cgroup configuration directly, without restarting the container or evicting the pod. Unless a container's `resizePolicy` asks for a restart, your application keeps running through the change.
This changes everything. The entire reason VPA was dangerous, the eviction and restart cycle, is no longer necessary. A smart operator can now:
- Read usage metrics from Prometheus
- Calculate optimal resource levels
- Apply them to running pods via the `/resize` subresource
- Monitor for problems and revert if needed
All without a single pod restart.
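For a sense of what the "apply" step sends, the Python sketch below builds the body of a strategic-merge `PATCH` to a pod's `resize` subresource, which is what `kubectl patch pod <name> --subresource resize --patch '...'` submits. Containers are matched by name, so only the fields you list change. The container name, pod name, and values are placeholders, and the helper functions are hypothetical.

```python
import json
from typing import Optional


def resize_path(namespace: str, pod: str) -> str:
    """API path for the resize subresource (send it with PATCH)."""
    return f"/api/v1/namespaces/{namespace}/pods/{pod}/resize"


def resize_patch(container: str, cpu_request: str, memory_request: str,
                 cpu_limit: Optional[str] = None,
                 memory_limit: Optional[str] = None) -> dict:
    """Body for Content-Type: application/strategic-merge-patch+json."""
    resources = {"requests": {"cpu": cpu_request, "memory": memory_request}}
    limits = {name: value
              for name, value in (("cpu", cpu_limit), ("memory", memory_limit))
              if value is not None}
    if limits:
        resources["limits"] = limits
    # "containers" is merged by the "name" key, so other containers are untouched.
    return {"spec": {"containers": [{"name": container, "resources": resources}]}}


body = resize_patch("app", cpu_request="480m", memory_request="1600Mi")
print("PATCH", resize_path("default", "my-app-abc12"))
print(json.dumps(body))
```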
## Enter Attune
Attune is a Kubernetes operator built exclusively for in-place pod right-sizing. It was designed from the ground up for the post-KEP-1287 world, not retrofitted onto a VPA architecture that was never meant for it.
### How it works
```text
You deploy an AttunePolicy CR
           │
           ▼
┌─────────────────────┐     ┌──────────────┐
│  Metrics Collector  │────►│  Prometheus  │
│  (hourly profiles)  │◄────│  (CPU + Mem) │
└──────────┬──────────┘     └──────────────┘
           │
           ▼
┌─────────────────────┐
│ Recommender Engine  │
│ P95/P99 percentile  │
│ + overhead          │
│ + confidence scaling│
│ + bounds clamping   │
│ + change filter     │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐     ┌──────────────┐
│ Resize Engine       │────►│ K8s API      │
│ (/resize sub-       │     │ /resize      │
│  resource)          │     │ subresource  │
└──────────┬──────────┘     └──────────────┘
           │
           ▼
┌─────────────────────┐
│ Safety Monitor      │
│ OOMKill detection   │
│ CPU throttle check  │
│ Restart spike       │
│ Pod NotReady        │
│ Auto-revert         │
└─────────────────────┘
```
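The Recommender Engine stages in the diagram can be read as a small pipeline. The Python sketch below walks through them in order; it is hypothetical, and the function names and default numbers (20% overhead, up to 50% confidence padding, the bounds, and a 10% minimum change) are illustrative assumptions rather than Attune's actual code or defaults.

```python
def percentile(samples: list[float], q: int) -> float:
    """Nearest-rank percentile; q is an integer percent such as 95 or 99."""
    ordered = sorted(samples)
    rank = max(1, -(-q * len(ordered) // 100))   # integer ceiling of q% of n
    return ordered[rank - 1]


def recommend(samples, current, q=95, overhead=0.20, confidence=1.0,
              max_padding=0.5, min_value=10, max_value=4000, min_change=0.10):
    """Return a new request in the samples' units, or None to leave it alone."""
    value = percentile(samples, q)                  # P95/P99 percentile
    value *= 1 + overhead                           # + overhead
    value *= 1 + max_padding * (1 - confidence)     # + confidence scaling
    value = min(max(value, min_value), max_value)   # + bounds clamping
    if abs(value - current) / current < min_change:
        return None                                 # + change filter: too small to apply
    return round(value)


# CPU usage samples in millicores: P95 = 400m, with a couple of outliers above it.
usage = [300] * 90 + [400] * 8 + [900] * 2
print(recommend(usage, current=2000))                  # well-observed workload
print(recommend(usage, current=2000, confidence=0.5))  # sparse data: padded
print(recommend(usage, current=500))                   # within 10%: no change
```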
### Five modes for every comfort level
You don't have to go from zero to fully-automated overnight. Attune provides a graduated path:
| Mode | What happens | Risk level |
|---|---|---|
| Observe | Collects metrics and tracks data-point progress; no recommendations surfaced | Zero |
| Recommend | Collects metrics and writes recommendations to the policy status | Zero |
| OneShot | Resizes one pod per reconciliation cycle, then stops | Minimal |
| Canary | Resizes 10% of pods first, watches them, then auto-promotes to the rest (optional) | Low |
| Auto | Continuously resizes all eligible pods based on observed metrics | Production-ready |
Most teams follow this progression:
- Weeks 1-2: Recommend mode → validate that the recommendations look sane
- Week 3: Canary mode → resize 10% of pods and watch for issues
- Week 4+: Auto mode → let the operator handle it continuously
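The Canary step needs a rule for small Deployments, where 10% of the pods may be less than one pod. A plausible approach, sketched here in Python as an assumption rather than Attune's actual selection logic, rounds up so at least one pod is always in the canary set and ranks pods by a hash of their name so repeated reconciliations pick the same pods.

```python
import hashlib


def canary_pods(pod_names: list[str], percentage: int) -> list[str]:
    """Pick ceil(percentage% of the pods), at least one, in a stable order."""
    if not pod_names:
        return []
    count = max(1, -(-len(pod_names) * percentage // 100))
    # Rank by a hash of the name: stable across reconciliations and independent
    # of list order, without always favoring alphabetically-first pods.
    ranked = sorted(pod_names, key=lambda name: hashlib.sha256(name.encode()).hexdigest())
    return ranked[:count]


pods = [f"my-app-{i}" for i in range(25)]
print(canary_pods(pods, 10))       # 3 of 25 pods (2.5 rounded up)
print(canary_pods(pods[:3], 10))   # 1 of 3 pods (never zero)
```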
### Safety is not an afterthought
Every resize is guarded by a multi-layer safety system:
- OOMKill detection: If a resized container gets OOMKilled, the operator immediately reverts to the original resources.
- CPU throttle monitoring: If CPU throttling exceeds 50% post-resize, the operator reverts.
- Restart spike detection: Two or more restarts after a resize trigger a revert.
- Pod health checks: If the pod loses its `Ready` condition, the operator reverts.
- Exponential backoff: Each consecutive revert doubles the cooldown period (capped at 16x), so the operator doesn't keep hammering a problematic workload.
- Degraded condition: When 3 or more of the last 5 resizes are reverted, the policy is flagged as `Degraded` so you know its parameters need tuning.
- LimitRange/ResourceQuota guard: Resizes that would violate namespace constraints are skipped entirely.
- Node capacity guard: The operator checks that the total resource requests of all pods on the node after the resize won't exceed the node's allocatable capacity.
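The revert triggers, backoff, and Degraded rule above are concrete enough to model. The Python sketch below uses the thresholds from the list (OOMKill, throttling above 50%, two or more restarts, loss of `Ready`, a doubling cooldown capped at 16x, and 3 of the last 5 resizes reverted); the class and field names and the 5-minute base cooldown are hypothetical, not Attune's API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PostResizeSignals:
    oom_killed: bool
    throttle_ratio: float  # fraction of CPU periods throttled since the resize
    restarts: int          # container restarts since the resize
    ready: bool            # pod still has its Ready condition


def should_revert(s: PostResizeSignals) -> bool:
    return s.oom_killed or s.throttle_ratio > 0.50 or s.restarts >= 2 or not s.ready


def cooldown_minutes(consecutive_reverts: int, base: int = 5) -> int:
    """Double the cooldown per consecutive revert, capped at 16x the base."""
    return base * min(2 ** consecutive_reverts, 16)


class RevertHistory:
    """Tracks the last 5 resize outcomes; Degraded when 3 or more were reverted."""

    def __init__(self) -> None:
        self.recent: deque = deque(maxlen=5)

    def record(self, reverted: bool) -> None:
        self.recent.append(reverted)

    @property
    def degraded(self) -> bool:
        return sum(self.recent) >= 3


history = RevertHistory()
for signals in [PostResizeSignals(False, 0.10, 0, True),
                PostResizeSignals(True, 0.00, 1, True),    # OOMKilled
                PostResizeSignals(False, 0.65, 0, True),   # heavy throttling
                PostResizeSignals(False, 0.05, 0, True),
                PostResizeSignals(False, 0.00, 3, False)]: # restarts + NotReady
    history.record(should_revert(signals))

print([cooldown_minutes(n) for n in range(6)])  # stops growing at 16x
print(history.degraded)
```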
### HPA coexistence, for real
Attune adjusts the base resource request, not the replica count. This means HPA continues to scale horizontally based on its configured metrics, while Attune ensures each replica is right-sized. No death spiral. No conflicting signals.
### Time-of-day awareness
Unlike VPA's single aggregated histogram, Attune buckets usage data into 24 hourly profiles and takes the maximum across all hours. This means:
- A workload that peaks at 2 PM gets a recommendation based on that peak
- Quiet overnight hours don't drag down daytime recommendations
- Weekend vs. weekday patterns are captured in the overall percentile
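Here is a simplified Python model of the hourly-profile idea, offered as an illustration rather than Attune's implementation: group samples by hour of day, take a percentile within each hour, and keep the largest. The toy workload idles at 200m with a 1200m peak during one hour a day; a single P95 over the whole week misses that peak because it is only about 4% of the samples, while the hourly maximum catches it.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def percentile(values: list[int], q: int) -> int:
    """Nearest-rank percentile; q is an integer percent."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))
    return ordered[rank - 1]


def hourly_max_percentile(samples: list[tuple[datetime, int]], q: int = 95) -> int:
    """Percentile within each hour of the day, then the maximum across hours."""
    by_hour: dict[int, list[int]] = defaultdict(list)
    for ts, usage in samples:
        by_hour[ts.hour].append(usage)
    return max(percentile(values, q) for values in by_hour.values())


# 7 days at a 5-minute step (2,016 samples): 200m baseline, 1200m during 14:00-14:59.
start = datetime(2025, 1, 6)
samples = []
for i in range(7 * 24 * 12):
    ts = start + timedelta(minutes=5 * i)
    samples.append((ts, 1200 if ts.hour == 14 else 200))

flat = percentile([usage for _, usage in samples], 95)
hourly = hourly_max_percentile(samples, 95)
print(f"single P95 over the week: {flat}m, max of hourly P95s: {hourly}m")
```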
### Confidence-scaled recommendations
When data is sparse (you just deployed the policy, or the workload is new), recommendations are automatically inflated to be conservative. As more data accumulates, confidence increases and recommendations become more precise.
With a full 7-day history window and the default `queryStep: 5m`, confidence reaches
1.0 and the recommendation reflects actual observed behavior with minimal padding.
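The data-point arithmetic behind this: a 7-day window at a 5-minute step holds 7 × 24 × 12 = 2,016 samples. The Python sketch below assumes, hypothetically, that confidence is the fraction of those expected samples collected so far and that padding shrinks linearly from an assumed 50% to zero as confidence reaches 1.0; Attune's actual formula may differ.

```python
from datetime import timedelta

WINDOW = timedelta(days=7)
STEP = timedelta(minutes=5)


def expected_samples(window: timedelta = WINDOW, step: timedelta = STEP) -> int:
    return int(window / step)   # 7 days / 5 minutes = 2016


def confidence(collected: int) -> float:
    """Fraction of the history window covered by data, capped at 1.0."""
    return min(1.0, collected / expected_samples())


def padded(recommendation: float, conf: float, max_padding: float = 0.5) -> float:
    """Inflate sparse-data recommendations; no padding once confidence is 1.0."""
    return recommendation * (1 + max_padding * (1 - conf))


for hours in (6, 24, 72, 168):
    collected = hours * 12   # 12 samples per hour at a 5-minute step
    c = confidence(collected)
    print(f"{hours:>3}h of data: confidence {c:.2f}, 480m -> {padded(480, c):.0f}m")
```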
## Who Is This For?
### Platform engineering teams
You manage dozens or hundreds of namespaces. Developers set resource requests once and never look at them again. You're tired of fielding tickets about cluster capacity while dashboards show 8% utilization.
Attune gives you cluster-wide and namespace-scoped defaults.
Use `AttuneDefaults` for cluster baselines, `AttuneNamespaceDefaults`
for environment- or team-specific overrides, and per-namespace
`AttunePolicy` resources for workload-level customization.
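One way to picture the three layers is as a merge in which the most specific setting wins: `AttunePolicy` over `AttuneNamespaceDefaults` over `AttuneDefaults`. The Python sketch below assumes that precedence and a field-by-field deep merge; the field names (`cpu.percentile`, `cpu.overhead`, `memory.overhead`) are hypothetical and not the real CRD schema.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` win, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def effective_settings(cluster: dict, namespace: dict, policy: dict) -> dict:
    """Apply layers from least to most specific: cluster, namespace, then policy."""
    return deep_merge(deep_merge(cluster, namespace), policy)


# Hypothetical settings for each layer.
cluster_defaults = {"cpu": {"percentile": 95, "overhead": 0.20}, "memory": {"overhead": 0.30}}
prod_namespace = {"cpu": {"percentile": 99}}        # stricter percentile in production
checkout_policy = {"memory": {"overhead": 0.40}}    # one memory-hungry workload

settings = effective_settings(cluster_defaults, prod_namespace, checkout_policy)
print(settings)
```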
### FinOps teams
You know the cluster is overprovisioned but can't quantify it or fix it
without disrupting production. The Grafana dashboard and the `kubectl attune savings`
command give you concrete dollar estimates per workload, and the
graduated rollout modes let you capture savings with minimal risk.
### SREs running latency-sensitive services
You can't afford pod restarts during peak traffic. VPA is off the table. But you also know your services are requesting 4 cores and using 0.5. In-place resize lets you reclaim those 3.5 cores without restarting a single pod.
### Teams running HPA
You've been told "VPA and HPA don't mix." Attune fixes the base request so each HPA-scaled replica is right-sized, while HPA continues to handle horizontal scaling. They complement each other.
### Anyone running Kubernetes 1.32+
If your cluster supports in-place pod resize, you're leaving money on the table by not using it.
## Real-World Scenario: An API Service
Let's walk through a concrete example.
Before Attune:
| Resource | Requested | Actual P95 usage | Utilization |
|---|---|---|---|
| CPU | 2000m (2 cores) | 400m | 20% |
| Memory | 4Gi | 1.2Gi | 30% |
This deployment has 10 replicas. On AWS EKS at on-demand pricing:
- CPU waste: 1.6 cores x 10 replicas x $0.031/core-hr x 730 hr/mo = $362/mo
- Memory waste: 2.8 GiB x 10 replicas x $0.004/GiB-hr x 730 hr/mo = $82/mo
- Total waste: $444/month for one service
After Attune (P95 usage plus 20% overhead for CPU and 30% for memory):
| Resource | Original | Recommended | Savings |
|---|---|---|---|
| CPU | 2000m | 480m (400m + 20%) | 76% |
| Memory | 4Gi | 1.56Gi (1.2Gi + 30%) | 61% |
The operator applies this change in-place. No restarts. No HPA interference. The pods continue serving traffic with the same performance, just using fewer reserved resources.
Now multiply this across 50 services and you're saving $20,000+/month.
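To rerun these numbers with your own prices, here is the same calculation in Python, using the example's rates ($0.031 per core-hour, $0.004 per GiB-hour, 730 hours per month). It also computes the savings against the recommendation, which come out a little lower than the raw waste (about $415 versus $444 per month) because the recommendation keeps its overhead.

```python
HOURS_PER_MONTH = 730
CPU_PRICE = 0.031   # $ per core-hour, as in the example
MEM_PRICE = 0.004   # $ per GiB-hour, as in the example


def monthly_cost(cores: float, gib: float, replicas: int) -> float:
    """Monthly cost of the reserved (requested) resources for a Deployment."""
    return replicas * HOURS_PER_MONTH * (cores * CPU_PRICE + gib * MEM_PRICE)


replicas = 10
requested = monthly_cost(2.0, 4.0, replicas)      # 2000m CPU, 4Gi memory
p95_usage = monthly_cost(0.4, 1.2, replicas)      # 400m CPU, 1.2Gi memory
recommended = monthly_cost(0.48, 1.56, replicas)  # 480m CPU, 1.56Gi memory

print(f"waste vs. P95 usage:  ${requested - p95_usage:,.0f}/month")
print(f"saved by resizing:    ${requested - recommended:,.0f}/month")
print(f"across 50 services:   ${50 * (requested - recommended):,.0f}/month")
```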
## How Attune Compares
The Kubernetes rightsizing ecosystem spans 16+ tools, from open-source recommenders to full-stack commercial platforms. Here is how they compare across the capabilities that matter most.
### Open-source tools
| | VPA | Goldilocks | KRR (Robusta) | Oblik | kube-reqsizer | Attune |
|---|---|---|---|---|---|---|
| Primary function | Recommend + apply | VPA dashboard | CLI recommender | VPA applier | Usage-based controller | Recommend + in-place apply |
| Resize method | Evict/recreate, InPlaceOrRecreate (1.33+) | No resize | No resize | Cron-based rollout | Rolling restart | In-place only |
| HPA compatible | No (conflicts on CPU metric) | N/A | N/A | N/A | No | Yes |
| Safety system | Minimal (PDB only) | N/A | N/A | min-diff thresholds | None | Multi-layer (OOMKill, throttle, revert) |
| Time-of-day aware | No (24h half-life histogram) | No | No | No | No | Yes (hourly profiles) |
| Graduated rollout | No (all-or-nothing) | N/A | N/A | No | No | 5 modes (Observe to Auto) |
| Per-resource config | containerPolicies[] | N/A | CLI flags | Annotations per resource | N/A | Typed CRD (cpu/memory sections) |
| Confidence scaling | Internal, not configurable | N/A | N/A | N/A | N/A | Configurable, visible in status |
| Config model | CRD | VPA + labels | CLI flags | CRD + annotations | Annotations | CRD + defaults hierarchy |
| Cluster-wide defaults | No | No | No | Env vars | No | Yes (AttuneDefaults CRD) |
### Commercial platforms
| | CAST AI | StormForge | ScaleOps | PerfectScale | Datadog | nOps | Spot Ocean | Sedai |
|---|---|---|---|---|---|---|---|---|
| Resize method | In-place + rollout | In-place + rollout | In-place + rollout | In-place + rollout | In-place + rollout | VPA-based | VPA-based | Agent-based |
| Recommender | ML, usage-based | ML (Bayesian) | Real-time + burst | AI + risk scoring | Usage histograms | VPA + policies | VPA | Reinforcement learning |
| HPA coordination | Yes | Yes (adjusts targets) | Yes | Yes | Yes (unified CRD) | Partial | Partial | Yes |
| Per-step change cap | Change sensitivity % | maxPercentIncrease/Decrease | N/A | Policy levels | N/A | N/A | N/A | SLO guardrails |
| Graduated rollout | Immediate/deferred | Incremental % steps | Continuous | Risk-scored | Preview/Apply modes | Scheduled windows | N/A | DataPilot to AutoPilot |
| Config model | Proprietary CRD | Annotations | Proprietary CRD | Hierarchical CRDs | DatadogPodAutoscaler CRD | VPA + annotations | Standard VPA CRD | Platform API |
| Node optimization | Yes (Spot, bin-pack) | No | Yes | No | No | Yes (EKS) | Yes (Spot, headroom) | Yes |
| Self-hosted option | No (SaaS) | Hybrid | Yes | No (SaaS) | Agent + SaaS | No (SaaS) | No (SaaS) | No (SaaS) |
| Open source | No | No | No | No | Agent only | No | No | No |
| Typical cost | % of savings | Per-cluster | Per-cluster | Per-cluster | Included with Datadog | % of savings | Per-node | Per-workload |
### Attune vs. the field
| Capability | Attune | How many of 16 tools have it |
|---|---|---|
| In-place resize (no eviction) | Yes | 9/16 (VPA 1.33+, CAST AI, StormForge, ScaleOps, PerfectScale, Datadog, nOps, Spot Ocean, Kedify) |
| HPA coexistence | Yes | 7/16 |
| Multi-layer safety with auto-revert | Yes | 2/16 (Attune, PerfectScale) |
| Graduated rollout (3+ modes) | Yes | 4/16 (Attune, StormForge, PerfectScale, Sedai) |
| Time-of-day awareness | Yes | 3/16 (Attune, StormForge, ScaleOps) |
| Cluster-wide defaults CRD | Yes | 2/16 (Attune, PerfectScale) |
| Fully open source (Apache 2.0) | Yes | 5/16 (VPA, Goldilocks, KRR, Oblik, kube-reqsizer) |
| No SaaS dependency | Yes | 7/16 (all OSS + ScaleOps) |
| kubectl plugin with savings estimates | Yes | 0/16 |
| Canary rollout for resizes | Yes | 0/16 |
### Where commercial tools win
Commercial platforms like CAST AI, ScaleOps, and StormForge offer capabilities that Attune intentionally does not cover:
- Node optimization: Spot instance management, bin-packing, and cluster autoscaling. Pair Attune with Karpenter for an open-source equivalent.
- ML/predictive recommenders: Bayesian optimization (StormForge), reinforcement learning (Sedai), and risk-scored automation (PerfectScale) can outperform percentile-based recommendations for highly variable workloads.
- Multi-cloud dashboards: Unified cost views across AWS, GCP, and Azure with commitment optimization.
### Where Attune wins
- Full control: The recommendation algorithm is open, auditable, and modifiable. No black-box ML.
- No SaaS dependency: Your metrics stay in your Prometheus. No data leaves the cluster.
- Kubernetes-native: Standard CRDs, conditions, events, and kubectl plugin. Works with existing GitOps workflows.
- Safety-first: The only open-source tool with OOMKill detection, CPU throttle monitoring, restart spike detection, and automatic revert with exponential backoff.
- Cost: Free forever (Apache 2.0). Commercial tools charge $10,000-50,000+/year for large clusters.
## Getting Started
It takes less than 5 minutes to start seeing recommendations.
### 1. Install the operator
```bash
helm install attune oci://ghcr.io/attune-io/charts/attune \
  --namespace attune-system --create-namespace
```
### 2. Create your first policy
```yaml
apiVersion: attune.io/v1alpha1
kind: AttunePolicy
metadata:
  name: my-app
  namespace: default
spec:
  targetRef:
    kind: Deployment
    name: my-app
  metricsSource:
    prometheus:
      address: http://prometheus-server.monitoring:80
  updateStrategy:
    type: Recommend
```
### 3. Wait for data, then review
```bash
# After enough data has been collected for your queryStep
kubectl attune recommendations -n default
kubectl attune savings -n default
```
### 4. Promote to Canary, then Auto
```bash
# Try on 10% of pods first (autoPromote handles the rest)
kubectl patch attunepolicy my-app -n default --type merge \
  -p '{"spec":{"updateStrategy":{"type":"Canary","canary":{"percentage":10,"autoPromote":true},"autoRevert":true}}}'
```
With `autoPromote: true`, the operator automatically promotes to the full
fleet after the observation period passes without safety violations. No
manual mode switch needed.
That's it. No agents to install. No SaaS to configure. No pods to restart.
Next steps:
- Estimate your savings with the interactive calculator
- Quick Start guide for a hands-on walkthrough
- Migrating from VPA if you're replacing an existing VPA setup
- Concepts for a deep dive into how the recommendation engine works