Scaling Guide¶

This guide covers how to size Attune for your cluster, from small dev environments to large production deployments with thousands of workloads.

Architecture Overview¶

Attune runs as a single-leader Deployment. One replica performs all reconciliation work while the standby (in HA mode) waits to take over via leader election. The operator's main scaling dimensions are:

Prometheus query rate - how fast it can fetch metrics for each policy
Memory - proportional to the number of watched pods and their metric history
API server pressure - listing pods, patching resize subresources, updating status

Quick Start¶

The fastest way to configure for your cluster size:

# Option A: one-liner with clusterSize preset
helm install attune ./charts/attune --set clusterSize=large

# Option B: use an example values file for full control
helm install attune ./charts/attune -f charts/attune/examples/values-large.yaml

Cluster Size Presets¶

The clusterSize value sets multiple parameters at once. Any explicitly set value always overrides the preset.

Preset Reference¶

Setting	small	medium	large	xlarge
Target workloads	up to ~100	~100-500	~500-5,000	5,000+
`resources.requests.cpu`	100m	250m	500m	1000m
`resources.requests.memory`	128Mi	256Mi	512Mi	1Gi
`resources.limits.cpu`	500m	1000m	2000m	4000m
`resources.limits.memory`	256Mi	512Mi	2Gi	4Gi
`prometheusQPS`	10	20	40	80
`prometheusBurst`	20	40	80	160
`maxConcurrentReconciles`	1	2	4	8
`replicaCount`	1	1	2	2

The "workloads" count means the number of Deployments, StatefulSets, and DaemonSets targeted by AttunePolicy resources, not total pods.

Override Behavior¶

Explicit values always win. For example, to use large resources but a custom QPS:

clusterSize: large
prometheusQPS: 60  # overrides the large preset's 40

When clusterSize is empty (the default), no preset is applied. Resources default to the small tier to ensure the operator always has resource requests set.

Recommended CRD Settings by Cluster Size¶

These settings are configured on your AttunePolicy or AttuneDefaults CRDs, not in the Helm chart. They affect per-policy reconciliation behavior.

Setting	small	medium	large	xlarge
`cooldown`	1h	1h	2h	4h
`historyWindow`	168h (7d)	168h (7d)	72h (3d)	48h (2d)
`queryStep`	5m	5m	15m	30m
`maxConcurrentResizes`	1	3	10	20

Why reduce historyWindow at scale? Each Prometheus query fetches data for the entire window. At 5,000+ workloads, a 7-day window produces very large responses. Reducing to 48-72h keeps query latency manageable while still capturing weekly patterns (the operator uses hourly P95 aggregation internally).

Bottleneck Guide¶

What breaks first¶

Reconcile throughput (most common at scale). By default the controller processes one AttunePolicy at a time. With hundreds of policies, the work queue grows and recommendations become stale. Symptom: workqueue_depth is consistently > 0, workqueue_longest_running_processor_seconds climbs. Fix: increase maxConcurrentReconciles (or set a clusterSize preset). The Prometheus rate limiter is shared across all goroutines, so concurrent reconciles won't overwhelm Prometheus.

Within each policy, workloads are processed in parallel (up to 10 concurrent workers). This means a single policy targeting 200 Deployments via label selector issues Prometheus queries concurrently instead of serially, reducing recommendation latency from minutes to seconds. The worker count is fixed; actual throughput is bounded by the Prometheus QPS rate limiter, not goroutine count.

Prometheus query rate. Symptom: reconcile queue grows, attune_reconcile_duration_seconds P99 increases. Fix: increase prometheusQPS and prometheusBurst. This works in tandem with maxConcurrentReconciles: more goroutines can issue queries in parallel, but they share the same QPS budget.
Operator memory. Symptom: OOMKilled pods. Fix: increase resources.limits.memory. Memory usage is roughly proportional to the total number of pods across all targeted workloads.
Prometheus server load. Symptom: slow or timed-out Prometheus queries, high memory on Prometheus itself. Fix: reduce historyWindow and increase queryStep on the CRDs. Increase prometheusTimeout (default 5m) if queries are slow but completing within the timeout. Consider Prometheus recording rules or Thanos query federation.
API server pressure. Symptom: throttled API requests, slow pod list responses. Fix: this is rarely the bottleneck since the operator uses informer caches. See API Server Pressure below for details.
Informer cache memory. Symptom: operator memory grows linearly with cluster size even for namespaces without policies. Fix: use watchNamespaces to limit the operator to only the namespaces that have AttunePolicy resources. See Namespace Scoping below.

Diagnosing with metrics¶

The operator exposes metrics on the /metrics endpoint:

Metric	What it tells you
`attune_reconcile_duration_seconds`	How long each policy reconcile takes. P99 > 30s means queries are slow.
`attune_reconcile_duration_seconds_count`	Total reconciles. Compare with error count.
`attune_reconcile_errors_total`	Errors per policy. Prometheus timeouts show here.
`attune_resize_total`	Actual in-place resizes performed.
`attune_eviction_total`	Eviction fallback attempts when in-place resize is not possible.
`attune_reverts_total`	Reverted in-place resizes (safety mechanism).
`workqueue_depth`	Controller work queue depth. Consistently > 0 means the operator can't keep up.
`workqueue_longest_running_processor_seconds`	Longest in-flight reconcile.

Prometheus sizing for Attune¶

Each AttunePolicy generates 2-4 Prometheus queries per reconcile cycle (CPU usage, memory usage, OOM events, restart events). At steady state with a 1-hour cooldown:

Workloads	Queries/hour	Prometheus impact
50	~200	Negligible
500	~2,000	Low (< 1% of a typical Prometheus)
5,000	~20,000	Moderate (ensure Prometheus has 4+ cores, 8Gi+ memory)
10,000+	~40,000+	Significant (use recording rules or Thanos)

API Server Pressure¶

Client-side rate limiting is disabled by default¶

The operator uses controller-runtime v0.24.x, which sets the Kubernetes client's QPS to -1 (disabled). This means there is no client-side rate limiting. All API server throttling is handled by Kubernetes API Priority and Fairness (APF), which is GA since Kubernetes 1.29.

This is intentional and matches the direction of the ecosystem. Operators like cert-manager explicitly recommend disabling client-side limiting on clusters with APF enabled.

Per-reconcile API call budget¶

Most reads go through the informer cache (zero API calls). Writes happen only during resize phases:

Phase	API calls	Notes
Read-only (cached)	0	Get/List via informer cache
Status update	1-2 per policy	Direct write to status subresource
Per pod resize	~5-6 calls	UpdateResize + annotation persist + re-fetch
Safety observation	1-2 per tracked pod	Direct get + update

At steady state (Recommend mode, 1-hour cooldown), a cluster with 500 policies generates roughly 500-1000 API writes per hour, which is negligible for any production API server.

Cloud provider APF limits¶

Provider	Mutating inflight	Total inflight	Notes
AKS	200 (standard) / 50 (free)	600 / 150	Scales with SKU tier
GKE	200	600	Control plane auto-scales
EKS	200	600	Multiple HA API servers

The operator lands in the global-default APF priority level unless you create a custom FlowSchema. For most deployments, the default allocation is more than sufficient.

When to add a custom FlowSchema¶

If the operator is deployed outside kube-system and you want guaranteed API server capacity, create a FlowSchema that assigns its service account to a dedicated priority level:

apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: attune
spec:
  priorityLevelConfiguration:
    name: workload-high
  matchingPrecedence: 1000
  rules:
    - subjects:
        - kind: ServiceAccount
          serviceAccount:
            name: attune
            namespace: attune-system
      resourceRules:
        - verbs: ["*"]
          apiGroups: ["*"]
          resources: ["*"]

Namespace Scoping¶

By default, the operator watches all namespaces for AttunePolicy resources. On large clusters (10,000+ namespaces) where policies exist in only a few namespaces, this wastes informer cache memory watching namespaces that will never have policies.

Set watchNamespaces to limit the operator to specific namespaces:

watchNamespaces:
  - production
  - staging
  - team-alpha

Or via CLI flag:

--watch-namespaces=production,staging,team-alpha

Behavior:

When empty (default): watches all namespaces (cluster-scoped)
When set: only watches the listed namespaces for namespace-scoped resources (Pods, Deployments, HPAs, AttunePolicies, etc.)
Cluster-scoped resources (Nodes, AttuneDefaults) are always watched regardless of this setting
Requires a restart to change the namespace list

Memory impact: On a 10,000-namespace cluster with policies in 50 namespaces, setting watchNamespaces reduces informer cache memory by roughly 99% for namespace-scoped resources (Pods, Deployments, HPAs).

HA Deployment¶

For production clusters, run two replicas with leader election:

replicaCount: 2
leaderElection:
  enabled: true
priorityClassName: system-cluster-critical

The large and xlarge presets automatically set replicaCount: 2. Only one replica is active at a time; the standby takes over in ~15 seconds if the leader fails.

Consider adding a PodDisruptionBudget (the chart creates one when replicaCount > 1) and topologySpreadConstraints to spread replicas across nodes:

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app.kubernetes.io/name: attune

Monitoring the Operator¶

Enable the ServiceMonitor for Prometheus Operator integration:

metrics:
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: prometheus   # match your Prometheus Operator selector

Enable the Grafana dashboard for an at-a-glance view:

grafanaDashboard:
  enabled: true

The dashboard shows reconcile latency, queue depth, resize activity, and resource usage in a single pane.