Skip to content

API Reference

AttunePolicy

Group: attune.io
Version: v1alpha1
Scope: Namespaced
Short name: rsp

Defaulting Behavior

Fields are defaulted in three layers. Only weight and maxConcurrentResizes appear in the stored spec when omitted by the user (they are CRD schema or webhook defaults). All other defaultable fields (type, controlledValues, cooldown, historyWindow, minimumDataPoints, queryStep, rateWindow, autoRevert, resizeMethod, cpu.maxChangePercent, memory.maxChangePercent, safetyObservationPeriod) are applied by the controller at reconcile time so that cluster-wide AttuneDefaults and namespace-scoped AttuneNamespaceDefaults can override them. These fields will appear empty in kubectl get rsp -o yaml but still control runtime behavior through the controller's built-in and inherited defaults unless you override them. Use kubectl attune explain -n <namespace> <policy> to see the effective values for the key controller-applied defaults and whether each came from the policy, a namespace default, a cluster default, or the built-in default.

Spec

apiVersion: attune.io/v1alpha1
kind: AttunePolicy
metadata:
  name: example
  namespace: default
spec:
  # Target workload(s) to right-size.
  targetRef:
    kind: Deployment            # Deployment | StatefulSet | DaemonSet | CronJob | Job | ReplicaSet
    name: my-app                # optional: target a specific workload
    selector:                   # optional: target by label selector
      matchLabels:
        tier: api

  # Prometheus metrics configuration.
  metricsSource:
    prometheus:
      address: "http://prometheus:9090"   # Prometheus-compatible URL
      headers:                            # optional: non-secret tenant or routing headers
        X-Scope-OrgID: "my-tenant"        # e.g., Mimir tenant ID
      queryParameters:                    # optional: URL params for Thanos/VictoriaMetrics
        dedup: "true"                     # e.g., Thanos deduplication
        partial_response: "true"          # reserved query keys like query/start/end/step/time/timeout are rejected
      bearerTokenSecret:                  # optional: auth from Secret in the policy namespace
        name: prometheus-token
        key: token
      tls:                                # optional: TLS settings
        insecureSkipVerify: false
    # Alternative: consume VPA recommendations instead of querying Prometheus.
    # At most one of prometheus, datadog, cloudwatch, or vpa may be set.
    # vpa:
    #   name: my-vpa                       # VPA object name
    #   namespace: default                 # defaults to policy namespace
    historyWindow: 168h                    # lookback window (default: 168h)
    minimumDataPoints: 48                  # min samples before recommending (default: 48)
    queryStep: 5m                          # Prometheus range query step interval (default: 5m)
    rateWindow: 5m                         # PromQL rate() window for CPU queries (default: queryStep)

  # CPU recommendation parameters.
  cpu:
    percentile: 95             # target percentile: 50, 90, 95, or 99
    overhead: "20"        # percentage headroom above percentile
    burstSensitivity: "0.1"   # burst boost multiplier (0 = disabled, max 1.0)
    startupBoost:              # optional: temporary CPU boost for cold starts
      multiplier: "3.0"        # scale factor for startup CPU (1.1-10.0)
      duration: 2m             # boost window after pod creation (10s-1h)
    minAllowed: "1m"             # optional: min clamp
    maxAllowed: "4000m"          # optional: max clamp (upper limit: 256 cores)
    controlledValues: RequestsAndLimits  # RequestsOnly | RequestsAndLimits
    maxChangePercent: 50       # max CPU change per cycle (default: 50)
    maxIncreasePercent: 50     # max increase per cycle (default: 50)
    maxDecreasePercent: 30     # max decrease per cycle (default: 30)

  # Memory recommendation parameters.
  memory:
    percentile: 99
    overhead: "30"
    burstSensitivity: "0.1"
    minAllowed: "4Mi"
    maxAllowed: "8Gi"                 # upper limit: 16Ti
    controlledValues: RequestsAndLimits
    allowDecrease: false       # prevent memory decreases (recommended)
    maxChangePercent: 30       # max memory change per cycle (default: 30)
    maxIncreasePercent: 50     # max increase per cycle (default: 50)
    maxDecreasePercent: 30     # max decrease per cycle (default: 30)
    memoryFromCpuRatio: "2.0"  # optional: derive memory from CPU (GiB per core)

  # Pause reconciliation (no metrics, no recommendations, no resizes).
  paused: false                # default: false

  # How and when to apply changes.
  updateStrategy:
    type: Recommend            # Observe | Recommend | OneShot | Canary | Auto
    initialSizing: false       # optional: set pod resources at creation time via webhook
    canary:                    # required when mode is Canary
      percentage: 10           # % of pods to resize first
      observationPeriod: 30m   # watch canary pods before proceeding (minimum: 1m)
      autoPromote: true        # promote to full fleet after safe observation (default: false)
    cooldown: 1h               # min time between resize operations (default: 1h)
    autoRevert: true           # revert on safety violation (default: true)
    safetyObservationPeriod: 5m  # post-resize safety watch period (default: 5m, minimum: 1m)
    resizeMethod: InPlaceOnly  # InPlaceOnly | InPlaceOrRecreate (default: InPlaceOnly)
    maxConcurrentResizes: 1    # parallel pod resizes per cycle (default: 1, max: 50)
    maxTotalCpuIncrease: "2000m"    # max aggregate CPU increase per cycle (default: unlimited)
    maxTotalMemoryIncrease: "4Gi"   # max aggregate memory increase per cycle (default: unlimited)
    export:                         # optional: export recommendations to ConfigMaps
      configMap: true               # creates <policy>-<workload>-recommendations ConfigMap
    sloGuardrails:                  # optional: revert if application SLOs breach after resize
      - name: p99-latency
        query: 'histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{namespace="{{ .Namespace }}"}[5m]))'
        threshold: "0.5"
        comparison: above
    schedule:                       # optional: restrict when resizes can occur
      windows:
        - start: "02:00"           # HH:MM (24-hour)
          end: "06:00"
      daysOfWeek: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
      timezone: "America/New_York" # IANA timezone (default: UTC)

  # Containers to skip (e.g., service mesh sidecars).
  excludedContainers:
    - istio-proxy
    - linkerd-proxy

  # Policy priority (1-1000, higher wins). Default: 100.
  weight: 100

Status

Field Type Description
conditions []Condition Standard Kubernetes conditions (Ready, Resizing, Degraded, ScheduleBlocked)
cooldown.effectiveCooldown Duration Current cooldown including exponential backoff
cooldown.backoffMultiplier int32 Current backoff multiplier (1, 2, 4, 8, or 16)
cooldown.consecutiveReverts int32 Number of consecutive reverts driving the backoff
workloads.discovered int32 Number of workloads matching the target
workloads.withRecommendations int32 Workloads with active recommendations
workloads.resized int32 Workloads that have been resized
workloads.pending int32 Workloads awaiting resize
workloads.dataPointsCollected int32 Max data points collected across all containers
workloads.dataPointsRequired int32 Minimum data points needed before recommendations
recommendations[].workload string Workload name
recommendations[].kind string Workload kind
recommendations[].containers[].name string Container name
recommendations[].containers[].current ResourceValues Current CPU/memory requests and limits
recommendations[].containers[].recommended ResourceValues Proposed CPU/memory requests and limits
recommendations[].containers[].explanation ContainerRecommendationExplanation Persisted reasoning used by kubectl attune explain
recommendations[].containers[].explanation.cpu ResourceRecommendationExplanation CPU estimator-chain details
recommendations[].containers[].explanation.memory ResourceRecommendationExplanation Memory estimator-chain details
recommendations[].containers[].confidence float64 Confidence score (0-1)
recommendations[].containers[].dataPoints int32 Prometheus samples used
recommendations[].containers[].lastUpdated Time Last recommendation timestamp
recommendations[].stale bool true when Prometheus returned no fresh data; resize is blocked until fresh data arrives
recommendations[].lastDataTime Time Timestamp of the most recent Prometheus data point

| savings.cpuRequestReduction | string | Total CPU request reduction (e.g. "1200m") | | savings.cpuRequestTotal | string | Total current CPU requests across all workloads (e.g. "2000m") | | savings.memoryRequestReduction | string | Total memory request reduction (e.g. "2Gi") | | savings.memoryRequestTotal | string | Total current memory requests across all workloads (e.g. "2Gi") | | savings.estimatedMonthlySavings | string | Estimated monthly cost savings | | savings.cpuRequestIncrease | string | Total CPU increase for under-provisioned workloads (e.g. "500m") | | savings.memoryRequestIncrease | string | Total memory increase for under-provisioned workloads (e.g. "512Mi") | | savings.estimatedMonthlyCostIncrease | string | Estimated monthly cost increase for under-provisioned workloads | | resizeHistory[].timestamp | Time | When the resize occurred | | resizeHistory[].workload | string | Resized workload name | | resizeHistory[].container | string | Resized container name | | resizeHistory[].resource | string | cpu, memory, or cpu+memory | | resizeHistory[].from | string | Previous value | | resizeHistory[].to | string | New value | | resizeHistory[].method | string | InPlace or Eviction | | resizeHistory[].result | string | Success, Failed, Reverted, or Evicted | | resizeHistory[].reason | string | Why a resize was reverted or failed (e.g. oomkill, restart, notready). Empty for successful resizes. | | workloadErrors[].workload | string | Workload name that encountered an error during reconciliation | | workloadErrors[].error | string | Human-readable error description | | canary.phase | string | CanaryInProgress or FullRollout | | canary.startTime | Time | When the canary subset was first resized | | canary.observedGeneration | int64 | Policy generation when this canary cycle started | | canary.pods | []string | Names of pods selected for the canary subset |

ResourceRecommendationExplanation contains the intermediate fields emitted by the estimator chain: rawPercentile, overhead, afterOverhead, burstFactor, afterBurst, confidence, confidenceFactor, afterConfidence, bounds, boundsApplied, afterBounds, minChangePercent, maxChangePercent, changeFilterApplied, afterChangeFilter, final, and optional finalAdjustment.

Condition types

Type Reasons Description
Ready Monitoring, InsufficientData, NoWorkloadsFound, PrometheusUnavailable, InvalidConfig, WorkloadDiscoveryFailed, Paused Overall health
Resizing InProgress, Idle, CooldownActive Active resize operation state
Degraded HighRevertRate High revert rate detected (3+ of last 5 reverted)
ScheduleBlocked OutsideWindow, InsideWindow Whether the current time is within the configured resize schedule window
kubectl get rsp
NAME     MODE        WORKLOADS   RECS   RESIZED   READY   AGE

Pass -o wide to include CPU Saved and Mem Saved columns.

Kubernetes Events

The operator emits Kubernetes events on the AttunePolicy object. View them with kubectl describe attunepolicy <name> or kubectl get events --field-selector involvedObject.kind=AttunePolicy.

Event Reason Type Description
RecommendationsReady Normal First recommendations became available (transitions from 0 to >0 workloads with data)
Resized Normal A container was successfully resized in-place
DecreaseSuppressed Normal A CPU or memory decrease was blocked by allowDecrease=false
ScheduleSkipped Normal Resize was skipped because the current time is outside the configured schedule window
ResizeFailed Warning An in-place resize API call failed
BudgetExhausted Warning The per-reconcile resize budget was exhausted before all workloads could be resized
InfeasibleBlocked Warning A resize was blocked because it would exceed node capacity
ResizeSkipped Warning A resize was skipped (e.g. pod in bad state, rolling out)
Reverted Warning A resize was reverted due to safety observation failure (OOMKill, CPU throttle, restarts)
Evicted Warning A pod was evicted as a fallback when in-place resize was not possible
StaleRecommendation Warning Recommendations are stale (no fresh Prometheus data)
CooldownActive Normal Resize deferred because the cooldown period has not elapsed
HPAConflict Warning An HPA targets the same workload and may conflict with resizing
VPAConflict Warning A VPA targets the same workload
ConfigClamped Warning A policy field was clamped to its allowed range at runtime
ExportFailed Warning Failed to export recommendations to ConfigMap
RestartOnResize Normal Container will restart on resize due to RestartContainer resize policy
MemoryLimitClamped Normal Memory limit decrease skipped due to K8s v1.33 restriction
PolicyConflict Warning Multiple policies target the same workload
RolloutInProgress Normal Resize skipped because the workload is mid-rollout
WorkloadOptOut Normal Workload opted out via annotation

Events use 1-hour deduplication to prevent log spam. Identical events are emitted at most once per hour; condition changes produce new events immediately. Specific events can be suppressed per-policy using the attune.io/suppress-warnings annotation (comma-separated list of event reasons).


AttuneDefaults

Scope: Cluster
Short name: rsd

apiVersion: attune.io/v1alpha1
kind: AttuneDefaults
metadata:
  name: default
spec:
  metricsSource:    # same structure as AttunePolicy.spec.metricsSource
    prometheus:
      address: http://prometheus-server.monitoring:80
    historyWindow: 168h
    minimumDataPoints: 48
    queryStep: 5m
    rateWindow: 5m
  cpu:              # same structure as AttunePolicy.spec.cpu
    percentile: 95
    overhead: "20"
    controlledValues: RequestsAndLimits
  memory:           # same structure as AttunePolicy.spec.memory
    percentile: 99
    overhead: "30"
    controlledValues: RequestsAndLimits
    allowDecrease: false
  updateStrategy:   # same structure as AttunePolicy.spec.updateStrategy
    type: Recommend
    cooldown: 1h
    autoRevert: true
  costPricing:      # optional, for EstimatedMonthlySavings computation
    cpuPerCoreHour: "0.031"     # USD per vCPU-hour (default: $0.031)
    memoryPerGiBHour: "0.004"   # USD per GiB-hour (default: $0.004)

AttuneDefaults fields are merged into every AttunePolicy at reconciliation time. Policy-level values always take precedence.

Cost pricing

The costPricing section configures per-unit pricing used to compute status.savings.estimatedMonthlySavings. If omitted, standard on-demand Linux pricing is used.

Field Default Description
cpuPerCoreHour 0.031 Cost per vCPU-hour
memoryPerGiBHour 0.004 Cost per GiB-hour

Formula: (cpuCoresSaved * cpuPrice + memGiBSaved * memPrice) * 730 hours/month

Reference pricing by provider

The defaults use AWS on-demand pricing. Adjust for your environment:

Provider Instance type cpuPerCoreHour memoryPerGiBHour Notes
AWS (default) m6i on-demand "0.031" "0.004" US East, Linux
AWS Savings Plans m6i 1yr "0.020" "0.003" ~35% discount
GCP on-demand e2-standard "0.034" "0.005" US
GCP committed e2-standard 1yr "0.022" "0.003" ~35% discount
Azure PAYG D4s v5 "0.036" "0.005" East US
Azure Reserved D4s v5 1yr "0.022" "0.003" ~38% discount
On-prem bare metal "0.010" "0.001" Amortized hardware

These are approximate. Use your actual billing data for accurate savings estimates. For reserved instances, use the reserved rate so savings reflect true recoverability.

Webhook validation

AttuneDefaults and AttuneNamespaceDefaults both have validating webhooks that reject invalid costPricing, schedule, and Prometheus address values. If cpuPerCoreHour or memoryPerGiBHour is set, the webhook validates that each is a parseable positive float. Invalid values (e.g., "banana", "-0.5"), invalid schedule settings, and blocked Prometheus addresses are rejected at admission time.


AttuneNamespaceDefaults

Scope: Namespaced Short name: rsnd

apiVersion: attune.io/v1alpha1
kind: AttuneNamespaceDefaults
metadata:
  name: production-defaults
  namespace: production
spec:
  # Same fields as AttuneDefaults.spec
  metricsSource:
    prometheus:
      address: http://prometheus-server.monitoring:80
  cpu:
    percentile: 99
    overhead: "30"
  memory:
    percentile: 99
    overhead: "50"
    allowDecrease: false
  updateStrategy:
    type: Canary
    cooldown: 2h
    autoRevert: true

AttuneNamespaceDefaults provides per-namespace defaults that override cluster-scoped AttuneDefaults. This enables different configurations for different environments (e.g., conservative settings for production, aggressive settings for staging).

Resolution order: policy spec first, then one defaults source.

If a namespace has an AttuneNamespaceDefaults, the controller uses it instead of the cluster-scoped AttuneDefaults for all policies in that namespace. Fields not specified in the namespace defaults are not inherited from cluster defaults; they fall back to the operator's built-in defaults.

If multiple defaults objects exist at the same scope, selection is deterministic: the controller uses the object with the lexicographically smallest metadata.name.