Auto Mode¶
Auto mode is the production end-state for attune. The operator continuously resizes all eligible pods based on observed metrics. Before enabling Auto mode, you should have validated recommendations through Recommend and/or Canary mode.
Prerequisites¶
Before switching to Auto mode:
- Run in Recommend mode for at least 1 full history window (default 7 days) to build confidence in the recommendations
- Verify recommendations are reasonable using the kubectl plugin:
kubectl attune recommendations -n <namespace>
- Test with Canary mode (optional but recommended) to validate resizes on a subset of pods before rolling out to the full fleet
- Configure appropriate bounds to prevent extreme recommendations:
cpu:
  minAllowed: "50m"     # never go below 50 millicores
  maxAllowed: "4000m"   # never exceed 4 cores
memory:
  minAllowed: "64Mi"    # never go below 64 MiB
  maxAllowed: "8Gi"     # never exceed 8 GiB
Creating an Auto-mode policy¶
apiVersion: attune.io/v1alpha1
kind: AttunePolicy
metadata:
  name: my-app
  namespace: production
spec:
  targetRef:
    kind: Deployment
  selector:
    matchLabels:
      tier: api
  metricsSource:
    prometheus:
      address: http://prometheus-server.monitoring:80
    historyWindow: 168h # 7 days of data
  cpu:
    percentile: 95
    overhead: "20"
    minAllowed: "50m"
    maxAllowed: "4000m"
    controlledValues: RequestsAndLimits
  memory:
    percentile: 99
    overhead: "30"
    minAllowed: "64Mi"
    maxAllowed: "8Gi"
    controlledValues: RequestsAndLimits
  updateStrategy:
    type: Auto
    cooldown: 1h
    autoRevert: true
Recommended guardrails¶
| Setting | Purpose | Suggested value |
|---|---|---|
| overhead | Headroom above observed usage | 20% (CPU), 30% (memory) |
| minAllowed/maxAllowed | Prevent extreme recommendations | Match your resource limits policy |
| cooldown | Time between resizes | 1h minimum for production |
| autoRevert | Roll back if pods become unhealthy | true for production |
The safety monitor watches each resized pod for an observation period before
declaring the resize successful. The default is 5 minutes. To configure it,
set safetyObservationPeriod:
spec:
  updateStrategy:
    type: Auto
    autoRevert: true
    safetyObservationPeriod: 10m # safety watch window after each resize
Overhead guidance¶
- CPU: 20% overhead works well for steady-state services. Use 50% for bursty workloads.
- Memory: 30% overhead is recommended because memory pressure causes OOM kills. Never go below 10% for production.
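How the percentile, overhead, and bounds settings could combine is sketched below in Python. This is an illustration under stated assumptions, not the operator's actual algorithm: the nearest-rank percentile, the function names, and the use of plain integers (millicores) in place of Kubernetes quantity strings are all hypothetical.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of usage samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def recommend(samples, pct, overhead_pct, min_allowed, max_allowed):
    """Percentile of observed usage plus overhead, clamped to the bounds."""
    base = percentile(samples, pct)
    with_headroom = base * (100 + overhead_pct) / 100
    return min(max(with_headroom, min_allowed), max_allowed)

# Hypothetical CPU usage samples in millicores.
cpu_usage = [120, 150, 180, 200, 240, 260, 300, 310, 330, 400]
cpu_request = recommend(cpu_usage, pct=95, overhead_pct=20,
                        min_allowed=50, max_allowed=4000)
print(cpu_request)
```

With these samples the 95th percentile is 400m, so 20% overhead yields 480m, which sits inside the 50m to 4000m bounds and is returned unchanged. A result outside the bounds would be pulled back to minAllowed or maxAllowed.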
Monitoring Auto mode¶
Check policy status¶
# Overview of all policies
kubectl attune status -A
# Estimated savings
kubectl attune savings -n production
# Detailed per-container recommendations
kubectl attune recommendations -n production
Watch for degradation¶
The operator sets a Degraded condition when 3 or more of the last 5 resizes are reverted.
Monitor this with:
kubectl get rsp -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}: {range .status.conditions[*]}{.type}={.reason} {end}{"\n"}{end}'
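The "3 or more of the last 5" rule can be read as a sliding-window check. The Python sketch below illustrates that reading only; the outcome labels and the function name are hypothetical, and the operator's internal bookkeeping may differ.

```python
def is_degraded(outcomes, window=5, threshold=3):
    """True when at least `threshold` of the last `window` resizes were reverted.

    `outcomes` is ordered oldest to newest; each entry is a label such as
    "ok" or "reverted" (hypothetical labels for this sketch).
    """
    recent = list(outcomes)[-window:]
    reverted = sum(1 for outcome in recent if outcome == "reverted")
    return reverted >= threshold

history = ["ok", "reverted", "ok", "reverted", "reverted"]
print(is_degraded(history))
```

Under this reading, older reverts stop counting once five newer resizes have completed, so a policy can recover from Degraded without manual intervention.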
Prometheus metrics¶
The operator exports metrics for dashboarding:
- attune_recommendation_cpu_cores -- Recommended CPU per workload
- attune_recommendation_memory_bytes -- Recommended memory per workload
- attune_confidence -- Confidence score (0-1) per workload
- attune_resize_total -- Total successful, failed, and reverted in-place resize operations
- attune_eviction_total -- Total eviction fallback attempts when resizeMethod: InPlaceOrRecreate
- attune_reverts_total -- Total reverts (broken down by reason)
Alert on high revert rates:
- alert: AttuneHighRevertRate
  expr: rate(attune_reverts_total[1h]) > 0.1
  for: 10m
  annotations:
    summary: "High revert rate for {{ $labels.namespace }}/{{ $labels.workload }}"
Scheduled resizes¶
By default, resizes can occur at any time. Use the schedule field to restrict
resizes to specific time windows and days of the week. Recommendations are always
computed; only the actual resize execution is gated.
spec:
  updateStrategy:
    type: Auto
    schedule:
      windows:
        - start: "02:00"
          end: "06:00"
      daysOfWeek: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
      timezone: "America/New_York"
Key behavior:
- If daysOfWeek is omitted, all days are allowed.
- If windows is omitted, all times are allowed (only day filtering applies).
- Overnight windows work: start: "22:00", end: "06:00" wraps past midnight.
- The ScheduleBlocked status condition is set when outside the window.
- An invalid timezone name fails open (resizes are allowed) to prevent silent lockout from a typo.
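The gating rules above can be sketched in Python as follows. This is an approximation for reasoning about schedules, not the operator's code: the function names, the exclusive end bound, and the choice to test the day of week at the current local time (rather than at the window's start) are assumptions.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def parse_hhmm(value):
    """Parse an "HH:MM" string into a time object."""
    hours, minutes = value.split(":")
    return time(int(hours), int(minutes))

def in_window(now, start, end):
    """True if `now` falls in [start, end); handles wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end

def resize_allowed(now_utc, windows=None, days_of_week=None, tz_name="UTC"):
    """Apply day and time-window gating to a timezone-aware UTC datetime."""
    try:
        tz = ZoneInfo(tz_name)
    except (KeyError, ValueError):
        return True  # invalid timezone name: fail open
    local = now_utc.astimezone(tz)
    if days_of_week and DAY_NAMES[local.weekday()] not in days_of_week:
        return False
    if not windows:
        return True  # no windows: only day filtering applies
    return any(
        in_window(local.time(), parse_hhmm(w["start"]), parse_hhmm(w["end"]))
        for w in windows
    )

now = datetime(2024, 1, 3, 8, 0, tzinfo=timezone.utc)
print(resize_allowed(now, windows=[{"start": "02:00", "end": "06:00"}],
                     tz_name="America/New_York"))
```

Evaluating a planned schedule against a few timestamps this way can help catch windows that never open, for example when the timezone offset shifts the window onto an excluded day.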
Combine scheduling with budget caps for large fleets:
spec:
  updateStrategy:
    type: Auto
    schedule:
      windows:
        - start: "02:00"
          end: "06:00"
    maxConcurrentResizes: 10
    maxTotalCpuIncrease: "2000m"
    maxTotalMemoryIncrease: "4Gi"
See examples/12-scheduled-auto-mode.yaml for a complete example.
If resizes are blocked unexpectedly, see the troubleshooting guide for schedule-specific diagnostics.
Exporting recommendations to ConfigMaps¶
The export feature writes recommendation data to ConfigMaps for external
consumption (e.g., GitOps workflows with ArgoCD or Flux that apply resource
patches from CI/CD rather than letting the operator resize directly).
spec:
updateStrategy:
type: Recommend # or Auto
export:
configMap: true
When enabled, the operator creates one ConfigMap per workload, named
<policy>-<workload>-recommendations, with an owner reference to the policy
for automatic cleanup. The ConfigMap contains per-container recommended CPU
and memory values.
This is useful in GitOps workflows where:
- The operator runs in Recommend mode to compute recommendations.
- A CI/CD pipeline reads the ConfigMaps and generates resource patches.
- ArgoCD or Flux applies the patches through the normal GitOps flow.
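The pipeline step in that flow could look roughly like the Python sketch below. It builds the ConfigMap name from the pattern described above and turns per-container recommendations into a Deployment strategic-merge patch. The shape of the recommendation data (a mapping from container name to cpu and memory strings) is an assumption for illustration; inspect a real exported ConfigMap before relying on it.

```python
import json

def configmap_name(policy, workload):
    """Name of the exported ConfigMap for a policy/workload pair."""
    return f"{policy}-{workload}-recommendations"

def build_patch(recommendations):
    """Build a strategic-merge patch that sets per-container requests.

    `recommendations` maps container name to {"cpu": ..., "memory": ...};
    this layout is hypothetical.
    """
    containers = [
        {
            "name": name,
            "resources": {
                "requests": {"cpu": rec["cpu"], "memory": rec["memory"]}
            },
        }
        for name, rec in sorted(recommendations.items())
    ]
    return {"spec": {"template": {"spec": {"containers": containers}}}}

example = {"api": {"cpu": "250m", "memory": "256Mi"}}
print(configmap_name("my-app", "api-server"))
print(json.dumps(build_patch(example), indent=2))
```

The resulting JSON can be committed as a patch file, since strategic-merge patches match containers by name and leave the other fields of the Deployment untouched.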
Promoting from Recommend or Canary¶
From Recommend mode¶
kubectl patch rsp my-app --type merge \
-p '{"spec":{"updateStrategy":{"type":"Auto","autoRevert":true}}}'
From Canary mode¶
kubectl patch rsp my-app --type merge \
-p '{"spec":{"updateStrategy":{"type":"Auto"}}}'
Rollback¶
If Auto mode causes issues, switch back to Recommend immediately:
kubectl patch rsp my-app --type merge \
-p '{"spec":{"updateStrategy":{"type":"Recommend"}}}'
This stops all future resizes. Already-resized pods keep their current resources until their next restart.