Configuration
This page documents every value in the Helm chart's values.yaml.
Operator
| Key |
Type |
Default |
Description |
replicaCount |
int |
1 |
Number of operator replicas. Set to 2 for HA with leader election. |
image.repository |
string |
ghcr.io/attune-io/attune |
Container image repository |
image.pullPolicy |
string |
IfNotPresent |
Image pull policy |
image.tag |
string |
"" |
Image tag. Defaults to the chart's appVersion. |
imagePullSecrets |
list |
[] |
Image pull secrets for private registries |
nameOverride |
string |
"" |
Override the chart name |
fullnameOverride |
string |
"" |
Override the fully qualified app name |
Service Account
| Key |
Type |
Default |
Description |
serviceAccount.create |
bool |
true |
Create a ServiceAccount for the operator |
serviceAccount.annotations |
object |
{} |
Annotations to add to the ServiceAccount |
serviceAccount.name |
string |
"" |
ServiceAccount name. Auto-generated if empty. |
Pod configuration
| Key |
Type |
Default |
Description |
podAnnotations |
object |
{} |
Additional pod annotations |
podSecurityContext.runAsNonRoot |
bool |
true |
Run pod as non-root |
podSecurityContext.seccompProfile.type |
string |
RuntimeDefault |
Seccomp profile |
securityContext.allowPrivilegeEscalation |
bool |
false |
Deny privilege escalation |
securityContext.capabilities.drop |
list |
["ALL"] |
Drop all Linux capabilities |
securityContext.readOnlyRootFilesystem |
bool |
true |
Read-only root filesystem |
securityContext.runAsNonRoot |
bool |
true |
Run container as non-root |
securityContext.runAsUser |
int |
65532 |
UID for the container process |
securityContext.runAsGroup |
int |
65532 |
GID for the container process |
Cluster Size Presets
| Key |
Type |
Default |
Description |
clusterSize |
string |
"" |
Cluster size preset (small, medium, large, xlarge, or empty). Sets resources, rate limits, and replica count in one shot. See the Scaling Guide for details. |
prometheusQPS |
number |
10 |
Prometheus query rate limit (queries per second). Increase for large clusters with many policies. |
prometheusBurst |
int |
20 |
Prometheus query burst allowance. |
Resources
| Key |
Type |
Default |
Description |
resources |
object |
{} |
Operator pod resource requests and limits. When empty, defaults are derived from clusterSize (or the small tier if clusterSize is also empty). |
resources.limits.cpu |
string |
(preset) |
CPU limit for the operator pod |
resources.limits.memory |
string |
(preset) |
Memory limit for the operator pod |
resources.requests.cpu |
string |
(preset) |
CPU request for the operator pod |
resources.requests.memory |
string |
(preset) |
Memory request for the operator pod |
Scheduling
| Key |
Type |
Default |
Description |
nodeSelector |
object |
{} |
Node selector for operator pods |
tolerations |
list |
[] |
Tolerations for operator pods |
affinity |
object |
{} |
Affinity rules for operator pods |
topologySpreadConstraints |
list |
[] |
Topology spread constraints for operator pods |
priorityClassName |
string |
"" |
Priority class name for the operator pod (recommended: system-cluster-critical for production) |
Leader election
| Key |
Type |
Default |
Description |
leaderElection.enabled |
bool |
true |
Enable leader election. Required for replicaCount > 1. |
Operator metrics
| Key |
Type |
Default |
Description |
metrics.enabled |
bool |
true |
Expose operator metrics endpoint |
metrics.port |
int |
8080 |
Metrics endpoint port |
metrics.serviceMonitor.enabled |
bool |
false |
Create a Prometheus Operator ServiceMonitor |
metrics.serviceMonitor.additionalLabels |
object |
{} |
Extra labels for the ServiceMonitor |
metrics.serviceMonitor.interval |
string |
30s |
Scrape interval |
Webhooks
| Key |
Type |
Default |
Description |
webhooks.enabled |
bool |
true |
Enable admission webhooks for defaulting and validation. Requires cert-manager. |
initialSizing.enabled |
bool |
false |
Enable the pod initial sizing mutating webhook. Sets pod resource requests at creation time based on existing AttunePolicy recommendations. Requires namespace label attune.io/initial-sizing=enabled and initialSizing: true on the policy. |
Grafana Dashboard
| Key |
Type |
Default |
Description |
grafanaDashboard.enabled |
bool |
false |
Create a ConfigMap with the Grafana dashboard. Auto-discovered by the Grafana sidecar via the grafana_dashboard: "1" label. |
grafanaDashboard.additionalLabels |
object |
{} |
Extra labels for the dashboard ConfigMap (e.g., folder selection). |
Network Policy
| Key |
Type |
Default |
Description |
networkPolicy.enabled |
bool |
true |
Enable a NetworkPolicy restricting operator pod traffic to DNS, K8s API, Prometheus, and metrics/health/webhook ports. |
networkPolicy.prometheusPort |
int |
9090 |
TCP port allowed for egress to Prometheus backend pods. Must match the Prometheus pod port, not the Service port. |
Collector Cache
| Key |
Type |
Default |
Description |
collectorTTL |
string |
"10m" |
How long unused Prometheus collectors stay cached before eviction. Maps to the --collector-ttl manager flag. Increase if policies frequently rotate Prometheus addresses; decrease in memory-constrained environments. |
Prometheus Query Timeout
| Key |
Type |
Default |
Description |
prometheusTimeout |
string |
"5m" |
Maximum time allowed for workload processing (including Prometheus queries) during a single reconciliation cycle. Maps to the --prometheus-timeout manager flag. If exceeded, the reconciler uses partial results and surfaces the timeout in the policy's status condition. Increase for clusters with slow Prometheus instances or very large numbers of workloads per policy. |
Namespace Scoping
| Key |
Type |
Default |
Description |
watchNamespaces |
list |
[] |
Namespaces to watch for AttunePolicy resources. Empty list means all namespaces (cluster-scoped). Maps to the --watch-namespaces manager flag. Set this on large clusters where policies exist in only a few namespaces to dramatically reduce informer cache memory. Cluster-scoped resources (Nodes, AttuneDefaults) are always watched regardless. Requires a pod restart to change. |
Example:
watchNamespaces:
- production
- staging
- team-alpha
Reconcile Concurrency
| Key |
Type |
Default |
Description |
maxConcurrentReconciles |
int/string |
"" (1) |
Maximum number of AttunePolicy reconciles running in parallel. Maps to the --max-concurrent-reconciles manager flag. The default (1) processes policies sequentially. Increase for clusters with many policies to reduce reconcile queue latency. Auto-set by clusterSize preset (small=1, medium=2, large=4, xlarge=8). The Prometheus rate limiter (prometheusQPS) is shared across all goroutines, so concurrent reconciles won't overwhelm Prometheus. |
Logging
| Key |
Type |
Default |
Description |
logging.level |
string |
info |
Log level: debug, info, warn, error |
logging.format |
string |
json |
Log format: json or text |
CRD Configuration (AttuneDefaults)
These fields are set on the AttuneDefaults cluster-scoped CRD, not in
the Helm values.yaml. They apply to all AttunePolicy resources.
Cost Pricing
| Field |
Type |
Default |
Description |
costPricing.cpuPerCoreHour |
string |
"0.031" |
USD per vCPU-hour for cost estimation |
costPricing.memoryPerGiBHour |
string |
"0.004" |
USD per GiB-hour for cost estimation |
These values are used to compute status.savings.estimatedMonthlySavings
on each AttunePolicy. Adjust for your cloud provider or reserved
instance pricing.
Inheritable UpdateStrategy Fields
All updateStrategy fields in AttuneDefaults are inherited by policies
that do not set them explicitly. Policy-level values always take precedence.
| Field |
Type |
Default |
Description |
type |
string |
Recommend |
Observe, Recommend, OneShot, Canary, Auto |
cooldown |
duration |
1h |
Minimum time between resizes |
autoRevert |
bool |
true |
Revert unsafe resizes automatically |
resizeMethod |
string |
InPlaceOnly |
InPlaceOnly or InPlaceOrRecreate |
maxConcurrentResizes |
int32 |
1 |
Max pods to resize simultaneously |
maxTotalCpuIncrease |
quantity |
(none) |
Max aggregate CPU increase per cycle |
maxTotalMemoryIncrease |
quantity |
(none) |
Max aggregate memory increase per cycle |
schedule |
object |
(none) |
Time windows, days of week, timezone |
export |
object |
(none) |
Metrics export configuration |
Example: set a cluster-wide maintenance window and budget cap via
AttuneDefaults, then individual policies inherit them unless overridden:
apiVersion: attune.io/v1alpha1
kind: AttuneDefaults
metadata:
name: cluster-defaults
spec:
updateStrategy:
type: Auto
cooldown: 30m
maxTotalCpuIncrease: "2000m"
schedule:
windows:
- start: "02:00"
end: "06:00"
daysOfWeek: [Monday, Tuesday, Wednesday, Thursday, Friday]
timezone: UTC
CRD Configuration (AttuneNamespaceDefaults)
AttuneNamespaceDefaults provides namespace-scoped default values that
override cluster-scoped AttuneDefaults. Policies in the same namespace
inherit these values unless they specify their own.
Precedence order: policy spec > namespace defaults > cluster defaults > built-in defaults
The spec is identical to AttuneDefaults (all fields in
AttuneDefaultsSpec are available). When multiple
AttuneNamespaceDefaults objects exist in the same namespace, the
lexicographically smallest metadata.name wins.
Use case
Different environments often need different right-sizing parameters.
Production namespaces may use higher overheads and conservative
modes, while staging namespaces can be more aggressive:
apiVersion: attune.io/v1alpha1
kind: AttuneNamespaceDefaults
metadata:
name: production-defaults
namespace: production
spec:
cpu:
percentile: 99
overhead: "30"
memory:
percentile: 99
overhead: "50"
allowDecrease: false
updateStrategy:
type: Canary
cooldown: 2h
autoRevert: true
---
apiVersion: attune.io/v1alpha1
kind: AttuneNamespaceDefaults
metadata:
name: staging-defaults
namespace: staging
spec:
cpu:
percentile: 95
overhead: "10"
memory:
percentile: 95
overhead: "20"
updateStrategy:
type: Auto
cooldown: 30m
See the full example in
examples/11-namespace-defaults.yaml.
Available Fields
All fields from AttuneDefaults are available in
AttuneNamespaceDefaults:
| Section |
Fields |
metricsSource |
prometheus.address, prometheus.headers, prometheus.queryParameters, prometheus.bearerTokenSecret, prometheus.tls, datadog.site, datadog.apiKeySecretRef, cloudwatch.region, cloudwatch.clusterName, cloudwatch.roleArn, historyWindow, minimumDataPoints, queryStep, rateWindow |
cpu |
percentile, overhead, minAllowed, maxAllowed, controlledValues, burstSensitivity, allowDecrease, startupBoost, maxChangePercent, maxIncreasePercent, maxDecreasePercent, memoryFromCpuRatio |
memory |
Same as cpu |
updateStrategy |
type, cooldown, autoRevert, resizeMethod, initialSizing, maxConcurrentResizes, maxTotalCpuIncrease, maxTotalMemoryIncrease, schedule, export, canary, safetyObservationPeriod, sloGuardrails |
costPricing |
cpuPerCoreHour, memoryPerGiBHour |
Alternative Metrics Sources
By default, Attune queries Prometheus for CPU and memory usage data.
The CRD also supports Datadog and CloudWatch Container Insights as
alternative metrics sources. At most one of prometheus, datadog, or
cloudwatch may be set per policy.
The Datadog collector queries the /api/v1/query endpoint and converts
nanocores to cores automatically. The CloudWatch collector uses the
Container Insights ContainerInsights namespace and supports IRSA/Pod
Identity credentials with optional cross-account role assumption.
Datadog
| Field |
Type |
Default |
Description |
metricsSource.datadog.site |
string |
datadoghq.com |
Datadog site (e.g., datadoghq.eu, us5.datadoghq.com, ddog-gov.com) |
metricsSource.datadog.apiKeySecretRef.name |
string |
(required) |
Name of the Secret containing the Datadog API key |
metricsSource.datadog.apiKeySecretRef.key |
string |
(required) |
Key within the Secret that holds the API key |
CloudWatch Container Insights
| Field |
Type |
Default |
Description |
metricsSource.cloudwatch.region |
string |
(required) |
AWS region (e.g., us-east-1) |
metricsSource.cloudwatch.clusterName |
string |
(required) |
EKS cluster name for Container Insights metric filtering |
metricsSource.cloudwatch.roleArn |
string |
"" |
Optional IAM role ARN for cross-account access (IRSA/Pod Identity used if empty) |
Policy-Level Fields
spec.paused
| Field |
Type |
Default |
Description |
paused |
bool |
false |
Halts all reconciliation for this policy: no metrics collection, no recommendations, no resizes. Existing resizes are not reverted. The operator sets Ready=False with reason=Paused. |
Directional Change Caps
Per-resource fields in cpu and memory that limit how much a recommendation can change per cycle:
| Field |
Type |
Default |
Description |
maxIncreasePercent |
int32 |
50 |
Maximum percentage increase allowed per resize cycle |
maxDecreasePercent |
int32 |
30 |
Maximum percentage decrease allowed per resize cycle |
maxChangePercent |
int32 |
50/30 |
Symmetric change cap (overridden by directional caps if set) |
Memory-from-CPU Derivation
| Field |
Type |
Default |
Description |
memory.memoryFromCpuRatio |
string |
(none) |
Derives memory recommendation from CPU instead of querying Prometheus for memory metrics. The value is a ratio of GiB per core (e.g., "2.0" means 1 core = 2 GiB memory). Useful for JVM and heap-bound workloads where memory is proportional to CPU. |
SLO Guardrails
Application-level PromQL checks evaluated after each resize during the safety observation period.
| Field |
Type |
Default |
Description |
updateStrategy.sloGuardrails[].name |
string |
(required) |
Identifies this guardrail for logging and status |
updateStrategy.sloGuardrails[].query |
string |
(required) |
PromQL query returning a scalar. Supports {{ .Namespace }}, {{ .WorkloadName }}, {{ .PodName }} template variables. |
updateStrategy.sloGuardrails[].threshold |
string |
(required) |
Value that triggers a revert |
updateStrategy.sloGuardrails[].comparison |
string |
above |
above (revert when value > threshold) or below |
updateStrategy.sloGuardrails[].evaluationWindow |
duration |
5m |
How long after resize to check |
Example:
updateStrategy:
sloGuardrails:
- name: p99-latency
query: 'histogram_quantile(0.99, rate(http_request_duration_seconds_bucket{namespace="{{ .Namespace }}"}[5m]))'
threshold: "0.5"
comparison: above
- name: error-rate
query: 'sum(rate(http_requests_total{namespace="{{ .Namespace }}", code=~"5.."}[5m])) / sum(rate(http_requests_total{namespace="{{ .Namespace }}"}[5m]))'
threshold: "0.01"
comparison: above
VPA Recommendation Consumption
| Field |
Type |
Default |
Description |
metricsSource.vpa.name |
string |
(required) |
Name of the VerticalPodAutoscaler object to consume recommendations from |
metricsSource.vpa.namespace |
string |
(policy namespace) |
Namespace of the VPA. Defaults to the policy's namespace. |
At most one of prometheus, datadog, cloudwatch, or vpa may be set per policy.
Initial Sizing
| Field |
Type |
Default |
Description |
updateStrategy.initialSizing |
bool |
false |
When true and the initial sizing webhook is enabled, new pods matching this policy receive recommended resources at creation time via a mutating admission webhook. Requires the namespace label attune.io/initial-sizing=enabled. |
Status Conditions
The controller sets these conditions on each AttunePolicy:
| Condition |
Reasons |
Description |
Ready |
Monitoring, InsufficientData, NoWorkloadsFound, PrometheusUnavailable, InvalidConfig, WorkloadDiscoveryFailed, Paused |
Overall health |
Resizing |
InProgress, Idle, CooldownActive |
Active resize operation state (only in resize modes) |
Degraded |
HighRevertRate |
Set when 3+ of the last 5 resizes were reverted |
Exponential Backoff
When consecutive resizes are reverted, the cooldown doubles per revert
(capped at 16x). A successful resize resets the multiplier.
| Consecutive reverts |
Effective cooldown |
| 0 |
1x base |
| 1 |
2x |
| 2 |
4x |
| 3 |
8x |
| 4+ |
16x (cap) |
Example: HA deployment with ServiceMonitor
replicaCount: 2
leaderElection:
enabled: true
metrics:
serviceMonitor:
enabled: true
additionalLabels:
release: prometheus
resources:
limits:
cpu: 1
memory: 256Mi
requests:
cpu: 200m
memory: 128Mi