Auto Mode

Auto mode is the production end-state for attune. The operator continuously resizes all eligible pods based on observed metrics. Before enabling Auto mode, validate the recommendations in Recommend mode, Canary mode, or both.

Prerequisites

Before switching to Auto mode:

1. Run in Recommend mode for at least one full history window (default 7 days) to build confidence in the recommendations.
2. Verify that the recommendations are reasonable using the kubectl plugin:
```
kubectl attune recommendations -n <namespace>
```
3. Test with Canary mode (optional but recommended) to validate resizes on a subset of pods before rolling out to the full fleet.
4. Configure appropriate bounds to prevent extreme recommendations:

cpu:
  minAllowed: "50m"    # never go below 50 millicores
  maxAllowed: "4000m"  # never exceed 4 cores
memory:
  minAllowed: "64Mi"   # never go below 64 MiB
  maxAllowed: "8Gi"    # never exceed 8 GiB

Creating an Auto-mode policy

apiVersion: attune.io/v1alpha1
kind: AttunePolicy
metadata:
  name: my-app
  namespace: production
spec:
  targetRef:
    kind: Deployment
    selector:
      matchLabels:
        tier: api
  metricsSource:
    prometheus:
      address: http://prometheus-server.monitoring:80
    historyWindow: 168h  # 7 days of data
  cpu:
    percentile: 95
    overhead: "20"
    minAllowed: "50m"
    maxAllowed: "4000m"
    controlledValues: RequestsAndLimits
  memory:
    percentile: 99
    overhead: "30"
    minAllowed: "64Mi"
    maxAllowed: "8Gi"
    controlledValues: RequestsAndLimits
  updateStrategy:
    type: Auto
    cooldown: 1h
    autoRevert: true

Recommended guardrails

| Setting | Purpose | Suggested value |
| --- | --- | --- |
| `overhead` | Headroom above observed usage | 20% (CPU), 30% (memory) |
| `minAllowed` / `maxAllowed` | Prevent extreme recommendations | Match your resource limits policy |
| `cooldown` | Time between resizes | 1h minimum for production |
| `autoRevert` | Roll back if pods become unhealthy | `true` for production |

The safety monitor watches each resized pod for an observation period before declaring the resize successful. The observation period defaults to 5 minutes; to change it, set `safetyObservationPeriod`:

spec:
  updateStrategy:
    type: Auto
    autoRevert: true
    safetyObservationPeriod: 10m  # safety watch window after each resize

Overhead guidance

- CPU: 20% overhead works well for steady-state services. Use 50% for bursty workloads.
- Memory: 30% overhead is recommended because memory pressure causes OOM kills. Never go below 10% for production.
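
As a rough illustration of how `overhead`, `minAllowed`, and `maxAllowed` might combine, the sketch below adds percentage headroom to an observed usage percentile and then clamps the result to the configured bounds. The formula `usage * (100 + overhead) / 100` and the function name `recommend` are assumptions made for illustration; the operator's exact calculation is not specified here.

```python
def recommend(observed, overhead_pct, min_allowed, max_allowed):
    """Add percentage headroom to an observed usage value, then clamp.

    All quantities share one unit (millicores for CPU, bytes for memory).
    Hypothetical helper: the formula is an assumption, not the operator's code.
    """
    if overhead_pct < 0:
        raise ValueError("overhead_pct must be non-negative")
    if min_allowed > max_allowed:
        raise ValueError("min_allowed must not exceed max_allowed")
    target = observed * (100 + overhead_pct) / 100
    return max(min_allowed, min(target, max_allowed))


# p95 CPU usage of 400m with 20% overhead, bounded to 50m..4000m
print(recommend(400, 20, 50, 4000))   # 480.0

# A nearly idle container is raised to the 50m floor
print(recommend(10, 20, 50, 4000))    # 50
```

Under this assumption, raising CPU overhead from 20% to 50% for a bursty workload moves the same 400m observation from 480m to 600m.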

Monitoring Auto mode

Check policy status

# Overview of all policies
kubectl attune status -A

# Estimated savings
kubectl attune savings -n production

# Detailed per-container recommendations
kubectl attune recommendations -n production

Watch for degradation

The operator sets a `Degraded` condition when 3 or more of the last 5 resizes are reverted. To check for it, list the conditions of every policy:

kubectl get attunepolicy -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}: {range .status.conditions[*]}{.type}={.reason} {end}{"\n"}{end}'
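
The "3 or more of the last 5" rule can be sketched as a sliding-window count. This is a minimal illustration, not the operator's implementation: the function name `is_degraded`, the outcome strings, and the behavior when fewer than 5 resizes have been recorded are all assumptions.

```python
def is_degraded(outcomes, window=5, threshold=3):
    """Return True when at least `threshold` of the last `window` resizes
    were reverted.

    `outcomes` is ordered oldest to newest; each entry is a string such as
    "ok" or "reverted" (hypothetical labels chosen for this sketch).
    Assumption: with fewer than `window` entries, only those entries count.
    """
    recent = list(outcomes)[-window:]
    reverted = sum(1 for outcome in recent if outcome == "reverted")
    return reverted >= threshold


history = ["ok", "reverted", "reverted", "ok", "reverted"]
print(is_degraded(history))  # True: 3 of the last 5 were reverted
```

Because only the most recent five outcomes are counted, older reverts stop contributing once enough successful resizes follow them.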

Prometheus metrics

The operator exports metrics for dashboarding:

- `attune_recommendation_cpu_cores` -- Recommended CPU per workload
- `attune_recommendation_memory_bytes` -- Recommended memory per workload
- `attune_confidence` -- Confidence score (0-1) per workload
- `attune_resize_total` -- Total successful, failed, and reverted in-place resize operations
- `attune_eviction_total` -- Total eviction fallback attempts when `resizeMethod: InPlaceOrRecreate` is set
- `attune_reverts_total` -- Total reverts (broken down by reason)

Alert on high revert rates:

- alert: AttuneHighRevertRate
  expr: rate(attune_reverts_total[1h]) > 0.1
  for: 10m
  annotations:
    summary: "High revert rate for {{ $labels.namespace }}/{{ $labels.workload }}"

Scheduled resizes

By default, resizes can occur at any time. Use the schedule field to restrict resizes to specific time windows and days of the week. Recommendations are always computed; only the actual resize execution is gated.

spec:
  updateStrategy:
    type: Auto
    schedule:
      windows:
        - start: "02:00"
          end: "06:00"
      daysOfWeek: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
      timezone: "America/New_York"

Key behavior:

- If `daysOfWeek` is omitted, all days are allowed.
- If `windows` is omitted, all times are allowed (only day filtering applies).
- Overnight windows work: `start: "22:00"`, `end: "06:00"` wraps past midnight.
- The `ScheduleBlocked` status condition is set when outside the window.
- An invalid `timezone` name fails open (resizes are allowed) to prevent silent lockout from a typo.
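
The behaviors listed above can be sketched as a small schedule check. This is an illustration under stated assumptions rather than the operator's code: the function names are hypothetical, windows are treated as half-open (`start` inclusive, `end` exclusive), and the day filter is applied to the local day at the moment of the check, including for the after-midnight part of an overnight window.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def parse_hhmm(text):
    """Parse an "HH:MM" string into a datetime.time."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def in_window(now, start, end):
    """Half-open window check; start > end means the window wraps midnight."""
    begin, finish = parse_hhmm(start), parse_hhmm(end)
    if begin <= finish:
        return begin <= now < finish
    return now >= begin or now < finish


def resize_allowed(local_now, windows=None, days_of_week=None):
    """Apply the day filter, then the time windows, to a local datetime."""
    if days_of_week and DAY_NAMES[local_now.weekday()] not in days_of_week:
        return False
    if not windows:
        return True  # no windows: only day filtering applies
    return any(in_window(local_now.time(), w["start"], w["end"])
               for w in windows)


def resize_allowed_at(instant, tz_name, windows=None, days_of_week=None):
    """Convert an aware datetime to the schedule's zone; fail open on a bad name."""
    try:
        zone = ZoneInfo(tz_name)
    except (ZoneInfoNotFoundError, ValueError):
        return True  # invalid timezone name: allow resizes
    return resize_allowed(instant.astimezone(zone), windows, days_of_week)


weekdays = DAY_NAMES[:5]
night = [{"start": "02:00", "end": "06:00"}]
print(resize_allowed(datetime(2026, 6, 1, 3, 0), night, weekdays))  # a Monday, 03:00 local
noon_utc = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
print(resize_allowed_at(noon_utc, "Amercia/New_York", night, weekdays))  # misspelled zone
```

The second call is outside the window in any real zone, but the misspelled name cannot be resolved, so the check falls through to "allowed", mirroring the fail-open rule.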

Combine scheduling with budget caps for large fleets:

spec:
  updateStrategy:
    type: Auto
    schedule:
      windows:
        - start: "02:00"
          end: "06:00"
    maxConcurrentResizes: 10
    maxTotalCpuIncrease: "2000m"
    maxTotalMemoryIncrease: "4Gi"
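
To make the interaction of the three caps concrete, here is one way such a budget could be enforced within a single pass. The semantics are assumptions for illustration only: candidates are considered in order, only increases count against `maxTotalCpuIncrease` and `maxTotalMemoryIncrease`, and a candidate that would exceed a cap is skipped rather than ending the pass. The function `select_resizes` and its input shape are hypothetical.

```python
def select_resizes(candidates, max_concurrent, max_cpu_increase_m,
                   max_mem_increase_bytes):
    """Greedily pick resizes that fit within the budget caps.

    Each candidate is a dict with "pod", "cpu_delta_m" (millicores), and
    "mem_delta_bytes"; negative deltas are decreases and consume no budget.
    Hypothetical sketch: the operator's real selection logic may differ.
    """
    selected = []
    cpu_used = 0
    mem_used = 0
    for candidate in candidates:
        if len(selected) >= max_concurrent:
            break
        cpu_delta = max(0, candidate["cpu_delta_m"])
        mem_delta = max(0, candidate["mem_delta_bytes"])
        if cpu_used + cpu_delta > max_cpu_increase_m:
            continue  # would exceed the CPU increase budget
        if mem_used + mem_delta > max_mem_increase_bytes:
            continue  # would exceed the memory increase budget
        selected.append(candidate["pod"])
        cpu_used += cpu_delta
        mem_used += mem_delta
    return selected


GI = 1024 ** 3
pending = [
    {"pod": "api-0", "cpu_delta_m": 1500, "mem_delta_bytes": 1 * GI},
    {"pod": "api-1", "cpu_delta_m": 1000, "mem_delta_bytes": 1 * GI},
    {"pod": "api-2", "cpu_delta_m": -500, "mem_delta_bytes": -GI // 4},
    {"pod": "api-3", "cpu_delta_m": 500, "mem_delta_bytes": 1 * GI},
]
print(select_resizes(pending, 10, 2000, 4 * GI))  # ['api-0', 'api-2', 'api-3']
```

In this example `api-1` is deferred because its 1000m increase would push the total past the 2000m cap, while the downsizing of `api-2` proceeds without consuming budget.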

See examples/12-scheduled-auto-mode.yaml for a complete example. If resizes are blocked unexpectedly, see the troubleshooting guide for schedule-specific diagnostics.

Exporting recommendations to ConfigMaps

The export feature writes recommendation data to ConfigMaps for external consumption (e.g., GitOps workflows with ArgoCD or Flux that apply resource patches from CI/CD rather than letting the operator resize directly).

spec:
  updateStrategy:
    type: Recommend  # or Auto
    export:
      configMap: true

When enabled, the operator creates one ConfigMap per workload, named <policy>-<workload>-recommendations, with an owner reference to the policy for automatic cleanup when the policy itself is deleted.

The ConfigMap contains per-container recommended CPU and memory values plus a last-updated timestamp (RFC3339).

Example ConfigMap content:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-my-deployment-recommendations
  namespace: default
  labels:
    attune.io/policy: my-app
    attune.io/workload: my-deployment
data:
  workload: my-deployment
  kind: Deployment
  main.cpu-request: "250m"
  main.memory-request: "512Mi"
  main.cpu-limit: "500m"
  main.memory-limit: "1Gi"
  main.confidence: "0.92"
  last-updated: "2026-05-29T14:30:00Z"

Inspect exports with the plugin (recommended over raw kubectl get cm):

kubectl attune export list -n <ns>
# or, across all namespaces, with last-updated timestamps and container counts
kubectl attune export -A

Orphan cleanup: When a workload leaves the policy's selector (for example after a selector change or workload deletion while the policy still exists), the corresponding recommendation ConfigMap is automatically deleted on the next reconcile. This prevents stale recommendation data from lingering for GitOps consumers. Only ConfigMaps carrying the attune.io/policy label for that specific policy are considered for cleanup.

Any ConfigMap in the policy's namespace whose `attune.io/policy` label names this AttunePolicy is treated as owned by it for cleanup purposes. This is an intentional part of the feature's trust model (label-based management within the namespace).

This is useful in GitOps workflows where:

1. The operator runs in Recommend mode to compute recommendations.
2. A CI/CD pipeline reads the ConfigMaps and generates resource patches.
3. ArgoCD or Flux applies the patches through the normal GitOps flow.
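
The patch-generation step in this workflow could look roughly like the following. It is a sketch that assumes the `data` keys follow the `<container>.<field>` layout shown in the example ConfigMap; the function name `build_patch` is hypothetical, and the output is shaped as a Kubernetes strategic merge patch for a Deployment, which matches containers by `name`.

```python
import json

# Maps the key suffixes from the example ConfigMap to resource fields.
RESOURCE_KEYS = {
    "cpu-request": ("requests", "cpu"),
    "memory-request": ("requests", "memory"),
    "cpu-limit": ("limits", "cpu"),
    "memory-limit": ("limits", "memory"),
}


def build_patch(data):
    """Turn exported recommendation data into a Deployment patch body.

    Keys without a container prefix (workload, kind, last-updated) and
    non-resource fields such as confidence are ignored.
    """
    containers = {}
    for key, value in data.items():
        name, separator, field = key.rpartition(".")
        if not separator or field not in RESOURCE_KEYS:
            continue
        section, resource = RESOURCE_KEYS[field]
        containers.setdefault(name, {}).setdefault(section, {})[resource] = value
    return {"spec": {"template": {"spec": {"containers": [
        {"name": name, "resources": resources}
        for name, resources in sorted(containers.items())
    ]}}}}


exported = {
    "workload": "my-deployment",
    "kind": "Deployment",
    "main.cpu-request": "250m",
    "main.memory-request": "512Mi",
    "main.cpu-limit": "500m",
    "main.memory-limit": "1Gi",
    "main.confidence": "0.92",
    "last-updated": "2026-05-29T14:30:00Z",
}
print(json.dumps(build_patch(exported), indent=2))
```

A pipeline might commit the resulting document to the GitOps repository as a patch file, optionally gating on the exported `confidence` value before doing so.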

Switching to Auto mode

From Recommend mode

kubectl patch attunepolicy my-app --type merge \
  -p '{"spec":{"updateStrategy":{"type":"Auto","autoRevert":true}}}'

From Canary mode

kubectl patch attunepolicy my-app --type merge \
  -p '{"spec":{"updateStrategy":{"type":"Auto"}}}'

Rollback

If Auto mode causes issues, switch back to Recommend immediately:

kubectl patch attunepolicy my-app --type merge \
  -p '{"spec":{"updateStrategy":{"type":"Recommend"}}}'

This stops all future resizes. Already-resized pods keep their current resources until their next restart.