Scaling Guide¶
This guide covers how to size Attune for your cluster, from small dev environments to large production deployments with thousands of workloads.
Architecture Overview¶
Attune runs as a single-leader Deployment. One replica performs all reconciliation work while the standby (in HA mode) waits to take over via leader election. The operator's main scaling dimensions are:
- Prometheus query rate - how fast it can fetch metrics for each policy
- Memory - proportional to the number of watched pods and their metric history
- API server pressure - listing pods, patching resize subresources, updating status
Quick Start¶
The fastest way to configure for your cluster size:
# Option A: one-liner with clusterSize preset
helm install attune ./charts/attune --set clusterSize=large
# Option B: use an example values file for full control
helm install attune ./charts/attune -f charts/attune/examples/values-large.yaml
Cluster Size Presets¶
The clusterSize value sets multiple parameters at once. Any explicitly set
value always overrides the preset.
Preset Reference¶
| Setting | small | medium | large | xlarge |
|---|---|---|---|---|
| Target workloads | up to ~100 | ~100-500 | ~500-5,000 | 5,000+ |
resources.requests.cpu |
100m | 250m | 500m | 1000m |
resources.requests.memory |
128Mi | 256Mi | 512Mi | 1Gi |
resources.limits.cpu |
500m | 1000m | 2000m | 4000m |
resources.limits.memory |
256Mi | 512Mi | 2Gi | 4Gi |
prometheusQPS |
10 | 20 | 40 | 80 |
prometheusBurst |
20 | 40 | 80 | 160 |
maxConcurrentReconciles |
1 | 2 | 4 | 8 |
replicaCount |
1 | 1 | 2 | 2 |
The "workloads" count means the number of Deployments, StatefulSets, and DaemonSets targeted by AttunePolicy resources, not total pods.
Override Behavior¶
Explicit values always win. For example, to use large resources but
a custom QPS:
clusterSize: large
prometheusQPS: 60 # overrides the large preset's 40
When clusterSize is empty (the default), no preset is applied. Resources
default to the small tier to ensure the operator always has resource
requests set.
Recommended CRD Settings by Cluster Size¶
These settings are configured on your AttunePolicy or
AttuneDefaults CRDs, not in the Helm chart. They affect per-policy
reconciliation behavior.
| Setting | small | medium | large | xlarge |
|---|---|---|---|---|
cooldown |
1h | 1h | 2h | 4h |
historyWindow |
168h (7d) | 168h (7d) | 72h (3d) | 48h (2d) |
queryStep |
5m | 5m | 15m | 30m |
maxConcurrentResizes |
1 | 3 | 10 | 20 |
Why reduce historyWindow at scale? Each Prometheus query fetches data
for the entire window. At 5,000+ workloads, a 7-day window produces very
large responses. Reducing to 48-72h keeps query latency manageable while
still capturing weekly patterns (the operator uses hourly P95 aggregation
internally).
Bottleneck Guide¶
What breaks first¶
- Reconcile throughput (most common at scale). By default the controller
processes one AttunePolicy at a time. With hundreds of policies, the
work queue grows and recommendations become stale. Symptom:
workqueue_depthis consistently > 0,workqueue_longest_running_processor_secondsclimbs. Fix: increasemaxConcurrentReconciles(or set aclusterSizepreset). The Prometheus rate limiter is shared across all goroutines, so concurrent reconciles won't overwhelm Prometheus.
Within each policy, workloads are processed in parallel (up to 10 concurrent workers). This means a single policy targeting 200 Deployments via label selector issues Prometheus queries concurrently instead of serially, reducing recommendation latency from minutes to seconds. The worker count is fixed; actual throughput is bounded by the Prometheus QPS rate limiter, not goroutine count.
-
Prometheus query rate. Symptom: reconcile queue grows,
attune_reconcile_duration_secondsP99 increases. Fix: increaseprometheusQPSandprometheusBurst. This works in tandem withmaxConcurrentReconciles: more goroutines can issue queries in parallel, but they share the same QPS budget. -
Operator memory. Symptom: OOMKilled pods. Fix: increase
resources.limits.memory. Memory usage is roughly proportional to the total number of pods across all targeted workloads. -
Prometheus server load. Symptom: slow or timed-out Prometheus queries, high memory on Prometheus itself. Fix: reduce
historyWindowand increasequeryStepon the CRDs. IncreaseprometheusTimeout(default 5m) if queries are slow but completing within the timeout. Consider Prometheus recording rules or Thanos query federation. -
API server pressure. Symptom: throttled API requests, slow pod list responses. Fix: this is rarely the bottleneck since the operator uses informer caches. See API Server Pressure below for details.
-
Informer cache memory. Symptom: operator memory grows linearly with cluster size even for namespaces without policies. Fix: use
watchNamespacesto limit the operator to only the namespaces that have AttunePolicy resources. See Namespace Scoping below.
Diagnosing with metrics¶
The operator exposes metrics on the /metrics endpoint:
| Metric | What it tells you |
|---|---|
attune_reconcile_duration_seconds |
How long each policy reconcile takes. P99 > 30s means queries are slow. |
attune_reconcile_duration_seconds_count |
Total reconciles. Compare with error count. |
attune_reconcile_errors_total |
Errors per policy. Prometheus timeouts show here. |
attune_resize_total |
Actual in-place resizes performed. |
attune_eviction_total |
Eviction fallback attempts when in-place resize is not possible. |
attune_reverts_total |
Reverted in-place resizes (safety mechanism). |
workqueue_depth |
Controller work queue depth. Consistently > 0 means the operator can't keep up. |
workqueue_longest_running_processor_seconds |
Longest in-flight reconcile. |
Prometheus sizing for Attune¶
Each AttunePolicy generates 2-4 Prometheus queries per reconcile cycle (CPU usage, memory usage, OOM events, restart events). At steady state with a 1-hour cooldown:
| Workloads | Queries/hour | Prometheus impact |
|---|---|---|
| 50 | ~200 | Negligible |
| 500 | ~2,000 | Low (< 1% of a typical Prometheus) |
| 5,000 | ~20,000 | Moderate (ensure Prometheus has 4+ cores, 8Gi+ memory) |
| 10,000+ | ~40,000+ | Significant (use recording rules or Thanos) |
API Server Pressure¶
Client-side rate limiting is disabled by default¶
The operator uses controller-runtime v0.24.x, which sets the Kubernetes
client's QPS to -1 (disabled). This means there is no client-side
rate limiting. All API server throttling is handled by Kubernetes
API Priority and Fairness
(APF), which is GA since Kubernetes 1.29.
This is intentional and matches the direction of the ecosystem. Operators like cert-manager explicitly recommend disabling client-side limiting on clusters with APF enabled.
Per-reconcile API call budget¶
Most reads go through the informer cache (zero API calls). Writes happen only during resize phases:
| Phase | API calls | Notes |
|---|---|---|
| Read-only (cached) | 0 | Get/List via informer cache |
| Status update | 1-2 per policy | Direct write to status subresource |
| Per pod resize | ~5-6 calls | UpdateResize + annotation persist + re-fetch |
| Safety observation | 1-2 per tracked pod | Direct get + update |
At steady state (Recommend mode, 1-hour cooldown), a cluster with 500 policies generates roughly 500-1000 API writes per hour, which is negligible for any production API server.
Cloud provider APF limits¶
| Provider | Mutating inflight | Total inflight | Notes |
|---|---|---|---|
| AKS | 200 (standard) / 50 (free) | 600 / 150 | Scales with SKU tier |
| GKE | 200 | 600 | Control plane auto-scales |
| EKS | 200 | 600 | Multiple HA API servers |
The operator lands in the global-default APF priority level unless you
create a custom FlowSchema. For most deployments, the default allocation
is more than sufficient.
When to add a custom FlowSchema¶
If the operator is deployed outside kube-system and you want guaranteed
API server capacity, create a FlowSchema that assigns its service account
to a dedicated priority level:
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
name: attune
spec:
priorityLevelConfiguration:
name: workload-high
matchingPrecedence: 1000
rules:
- subjects:
- kind: ServiceAccount
serviceAccount:
name: attune
namespace: attune-system
resourceRules:
- verbs: ["*"]
apiGroups: ["*"]
resources: ["*"]
Namespace Scoping¶
By default, the operator watches all namespaces for AttunePolicy resources. On large clusters (10,000+ namespaces) where policies exist in only a few namespaces, this wastes informer cache memory watching namespaces that will never have policies.
Set watchNamespaces to limit the operator to specific namespaces:
watchNamespaces:
- production
- staging
- team-alpha
Or via CLI flag:
--watch-namespaces=production,staging,team-alpha
Behavior:
- When empty (default): watches all namespaces (cluster-scoped)
- When set: only watches the listed namespaces for namespace-scoped resources (Pods, Deployments, HPAs, AttunePolicies, etc.)
- Cluster-scoped resources (Nodes, AttuneDefaults) are always watched regardless of this setting
- Requires a restart to change the namespace list
Memory impact: On a 10,000-namespace cluster with policies in 50
namespaces, setting watchNamespaces reduces informer cache memory by
roughly 99% for namespace-scoped resources (Pods, Deployments, HPAs).
HA Deployment¶
For production clusters, run two replicas with leader election:
replicaCount: 2
leaderElection:
enabled: true
priorityClassName: system-cluster-critical
The large and xlarge presets automatically set replicaCount: 2.
Only one replica is active at a time; the standby takes over in ~15 seconds
if the leader fails.
Consider adding a PodDisruptionBudget (the chart creates one when
replicaCount > 1) and topologySpreadConstraints to spread replicas
across nodes:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app.kubernetes.io/name: attune
Monitoring the Operator¶
Enable the ServiceMonitor for Prometheus Operator integration:
metrics:
serviceMonitor:
enabled: true
additionalLabels:
release: prometheus # match your Prometheus Operator selector
Enable the Grafana dashboard for an at-a-glance view:
grafanaDashboard:
enabled: true
The dashboard shows reconcile latency, queue depth, resize activity, and resource usage in a single pane.