Helm Chart Best Practices: Structuring Deployments for Production

February 14, 2026 · 13 min read · Omphora Engineering

The chart that works in dev but breaks in prod

Most Helm charts start simple. A Deployment, a Service, maybe an Ingress. You paste in the image name, run helm install, and it works. Then you need to deploy it to staging with different resource limits. Then to prod with different environment variables and secrets. Then a second service with a similar structure.

By that point, the chart is a mess of conditionals, hard-coded values, and environment-specific overrides scattered across three values files whose purpose nobody remembers.

This guide covers how to structure Helm charts from the start for multi-environment, production use.

Chart structure

A production-ready chart looks like this:

my-service/
  Chart.yaml
  values.yaml           # base defaults — no environment-specific values here
  values-staging.yaml   # staging overrides
  values-prod.yaml      # prod overrides
  templates/
    _helpers.tpl        # named templates and label helpers
    deployment.yaml
    service.yaml
    ingress.yaml
    hpa.yaml
    pdb.yaml
    serviceaccount.yaml
    configmap.yaml
    externalsecret.yaml  # if using External Secrets Operator

The key rule: values.yaml contains safe defaults that work for local development. Environment-specific files only contain overrides — never duplicate the full structure.
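One way to enforce this rule in CI is to check that every key in an override file already exists in values.yaml. This also catches typos, which Helm otherwise ignores without a warning unless the chart ships a values.schema.json. The sketch below assumes the YAML files have already been parsed into dicts (for example with a YAML library). The function name `unknown_overrides` is hypothetical, and empty maps in the base (such as `env: {}`) are treated as free-form.

```python
def unknown_overrides(base, override, prefix=""):
    """Return dotted key paths in `override` that `base` never declares.

    An empty map in the base (such as `env: {}`) is treated as free-form,
    so any keys beneath it are accepted.
    """
    unknown = []
    for key, value in override.items():
        path = f"{prefix}.{key}" if prefix else key
        if key not in base:
            unknown.append(path)
        elif isinstance(value, dict) and isinstance(base[key], dict) and base[key]:
            # Recurse only into maps whose structure the base defines.
            unknown.extend(unknown_overrides(base[key], value, path))
    return unknown


# Parsed contents of values.yaml and values-prod.yaml (trimmed for brevity).
base = {
    "replicaCount": 1,
    "ingress": {"enabled": False, "host": "", "tls": False},
    "env": {},
}
prod = {
    "replicaCount": 3,
    "ingress": {"enabled": True, "hostname": "api.yourdomain.com"},  # typo for "host"
    "env": {"LOG_LEVEL": "info"},
}

print(unknown_overrides(base, prod))  # ['ingress.hostname']
```

Failing the pipeline on a non-empty result keeps override files from drifting away from the keys the templates actually read.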

The _helpers.tpl file

Named templates in _helpers.tpl prevent you from copying the same label selectors and resource names across every template file. Always define these:

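{{/* Chart name, truncated to the 63-character DNS label limit */}}
{{- define "my-service.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/* Release-qualified name used for resource names */}}
{{- define "my-service.fullname" -}}
{{- printf "%s-%s" .Release.Name (include "my-service.name" .) | trunc 63 | trimSuffix "-" }}
{{- end }}

{{- define "my-service.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/* Standard labels for all resources */}}
{{- define "my-service.labels" -}}
helm.sh/chart: {{ include "my-service.chart" . }}
app.kubernetes.io/name: {{ include "my-service.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{/* Selector labels — these NEVER change after initial deploy */}}
{{- define "my-service.selectorLabels" -}}
app.kubernetes.io/name: {{ include "my-service.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}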

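Critical: a Deployment's spec.selector is immutable after creation. If you change the selector labels, Kubernetes rejects the update and the upgrade fails. Use the stable subset (name + instance) for selectors, because labels such as helm.sh/chart and app.kubernetes.io/version change on every release. Put the full set on the pod template and on resource metadata.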
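To see why version labels stay out of the selector, the sketch below models the two helpers as Python dicts. These functions are illustrative, not part of Helm. It also checks `matchLabels` semantics: a selector matches a pod when every selector key/value pair appears in the pod's labels.

```python
def labels(name, chart_version, app_version, release):
    """Full label set, mirroring the my-service.labels helper."""
    return {
        "helm.sh/chart": f"{name}-{chart_version}",
        "app.kubernetes.io/name": name,
        "app.kubernetes.io/instance": release,
        "app.kubernetes.io/version": app_version,
        "app.kubernetes.io/managed-by": "Helm",
    }


def selector_labels(name, release):
    """Stable subset, mirroring the my-service.selectorLabels helper."""
    return {"app.kubernetes.io/name": name, "app.kubernetes.io/instance": release}


def matches(selector, pod_labels):
    """matchLabels semantics: every selector pair must be present on the pod."""
    return all(pod_labels.get(k) == v for k, v in selector.items())


old_pods = labels("my-service", "1.3.0", "2.1.4", "prod")
new_pods = labels("my-service", "1.4.0", "2.2.0", "prod")
stable = selector_labels("my-service", "prod")

print(matches(stable, old_pods), matches(stable, new_pods))  # True True
# Using the full label set as the selector would stop matching the new pods,
# forcing a selector change that Kubernetes rejects.
print(matches(old_pods, new_pods))  # False
```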

Values hierarchy: defaults vs overrides

Your base values.yaml should have every key the chart uses, with sensible defaults:

# values.yaml — complete defaults
replicaCount: 1

image:
  repository: ""      # required — no default
  tag: ""             # required — set at deploy time
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80
  targetPort: 8080

ingress:
  enabled: false
  className: nginx
  host: ""
  tls: false

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi

autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

podDisruptionBudget:
  enabled: false
  minAvailable: 1

env: {}           # key-value environment variables
envFrom: []       # references to ConfigMaps/Secrets

serviceAccount:
  create: true
  annotations: {}  # used for IRSA on EKS

externalSecrets:
  enabled: false   # read by templates/externalsecret.yaml
  path: ""
  keys: []

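Inside the chart, Helm's `required` template function can stop rendering when a value such as image.repository is still empty. As a complementary pre-deploy check, the hypothetical helper below reports required keys that are missing or empty in the merged values. The list of required paths is an assumption you would adapt to your own chart.

```python
REQUIRED_KEYS = ["image.repository", "image.tag"]  # hypothetical list for this chart


def lookup(values, dotted_path):
    """Return the value at a dotted path such as 'image.tag', or None if absent."""
    node = values
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def missing_required(values, required=REQUIRED_KEYS):
    """List required paths whose value is absent or an empty string."""
    return [path for path in required if lookup(values, path) in (None, "")]


values = {"image": {"repository": "", "tag": "2.1.4", "pullPolicy": "IfNotPresent"}}
print(missing_required(values))  # ['image.repository']
```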
Then values-prod.yaml only overrides what differs:

# values-prod.yaml — only overrides
replicaCount: 3

resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 2
    memory: 1Gi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20

podDisruptionBudget:
  enabled: true
  minAvailable: 2

ingress:
  enabled: true
  host: api.yourdomain.com
  tls: true

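Deploy with the command below. Helm always loads the chart's own values.yaml as the base, so only the override file needs -f. Paths passed with -f are resolved relative to your working directory, not the chart. Later -f files override earlier ones, and --set overrides them all:

helm upgrade --install my-service ./my-service \
  -f ./my-service/values-prod.yaml \
  --set image.tag="$IMAGE_TAG"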
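The merge Helm performs across these sources can be modeled roughly as follows. Maps merge key by key, while scalars and lists from a later source replace earlier ones wholesale, and --set wins over every -f file. This is a simplified model, not Helm's implementation. For instance, it ignores Helm's convention of setting a key to null to delete it.

```python
import copy


def merge_values(base, override):
    """Return a copy of `base` with `override` merged on top.

    Nested maps merge key by key; scalars and lists in `override`
    replace the base value outright.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_values(*layers):
    """Apply layers in precedence order: chart defaults, -f files, then --set."""
    result = {}
    for layer in layers:
        result = merge_values(result, layer)
    return result


chart_defaults = {
    "replicaCount": 1,
    "image": {"tag": "", "pullPolicy": "IfNotPresent"},
    "envFrom": [{"configMapRef": {"name": "shared"}}],
}
prod_file = {"replicaCount": 3, "envFrom": []}
cli_set = {"image": {"tag": "abc123"}}  # from --set image.tag=abc123

values = effective_values(chart_defaults, prod_file, cli_set)
print(values["replicaCount"])  # 3
print(values["image"]["tag"], values["image"]["pullPolicy"])  # abc123 IfNotPresent
print(values["envFrom"])  # [] (lists are replaced, not appended)
```

The list behaviour is the one that surprises people most. An override that sets `envFrom` must repeat every entry it still needs.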

Secrets: never put them in values.yaml

This is the most common mistake. Secrets do not belong in Helm values files — not even base64-encoded. If your values-prod.yaml has a database password, it's in Git history forever.

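The right approach is the External Secrets Operator (ESO). Your chart creates an ExternalSecret resource. ESO reads the actual value from AWS Secrets Manager or HashiCorp Vault and syncs it into a regular Kubernetes Secret. The template below assumes a ClusterSecretStore named aws-secrets-manager already exists in the cluster: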

# templates/externalsecret.yaml
{{- if .Values.externalSecrets.enabled }}
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: {{ include "my-service.fullname" . }}
  labels:
    {{- include "my-service.labels" . | nindent 4 }}
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: {{ include "my-service.fullname" . }}
    creationPolicy: Owner
  data:
    {{- range .Values.externalSecrets.keys }}
    - secretKey: {{ .targetKey }}
      remoteRef:
        key: {{ $.Values.externalSecrets.path }}
        property: {{ .remoteKey }}
    {{- end }}
{{- end }}

In values-prod.yaml:

externalSecrets:
  enabled: true
  path: production/my-service
  keys:
    - targetKey: DATABASE_URL
      remoteKey: database_url
    - targetKey: API_KEY
      remoteKey: api_key

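Now secrets live in AWS Secrets Manager and your Git history has no credentials. When a secret is rotated in Secrets Manager, ESO updates the Kubernetes Secret on its next refresh (every refreshInterval, 1h above). Pods that read the secret as environment variables only see the new value after they restart.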
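To make the mapping concrete, this sketch does two things. It reproduces in Python what the template's range loop renders, and it simulates the sync for a secret stored as a JSON document of key/value pairs. That JSON layout is what `property` selects from. The `store` dict and the values in it are dummy stand-ins. In a real cluster, ESO performs this lookup, not your code.

```python
import json


def external_secret_data(values):
    """Build the spec.data entries the template's range loop renders."""
    es = values.get("externalSecrets", {})
    if not es.get("enabled"):
        return []
    return [
        {"secretKey": item["targetKey"],
         "remoteRef": {"key": es["path"], "property": item["remoteKey"]}}
        for item in es.get("keys", [])
    ]


def simulate_sync(data, remote_store):
    """Resolve each entry: parse the JSON stored at `key`, pick `property`."""
    secret = {}
    for entry in data:
        ref = entry["remoteRef"]
        document = json.loads(remote_store[ref["key"]])
        secret[entry["secretKey"]] = document[ref["property"]]
    return secret


values = {
    "externalSecrets": {
        "enabled": True,
        "path": "production/my-service",
        "keys": [
            {"targetKey": "DATABASE_URL", "remoteKey": "database_url"},
            {"targetKey": "API_KEY", "remoteKey": "api_key"},
        ],
    }
}
# Stand-in for Secrets Manager: one JSON secret stored under the path (dummy values).
store = {"production/my-service": json.dumps(
    {"database_url": "postgres://db.internal/app", "api_key": "dummy"})}

print(simulate_sync(external_secret_data(values), store))
# {'DATABASE_URL': 'postgres://db.internal/app', 'API_KEY': 'dummy'}
```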

Resource requests and limits: get these right

Requests set too low cause scheduling and contention problems. Limits set too low cause OOMKills and CPU throttling. Both hurt reliability.

Requests determine where the pod schedules. If you request 100m CPU and 128Mi memory, the scheduler only places the pod on a node whose allocatable capacity, minus the requests of pods already running there, covers that amount. Set requests too low and nodes get overpacked, giving you noisy neighbour problems. Set them too high and pods stay Pending because no node has enough room.

Limits determine the ceiling. CPU limits cause throttling: even if the node has spare CPU, the kernel's CFS quota stops the container from using more than its limit in each scheduling period. For latency-sensitive services, set CPU limits generously or omit them. Memory limits are enforced by OOMKill: a container that exceeds its limit is killed. Always set them, because unbounded memory use can exhaust the node and trigger evictions of other pods.
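The latency cost of throttling is easy to underestimate. With the default 100 ms CFS period, a 500m limit grants 50 ms of CPU time per period. The simplified model below shows how a request needing 80 ms of CPU stretches to 130 ms of wall time. It assumes single-threaded work that starts at a period boundary in an otherwise idle container, and `wall_time_ms` is a hypothetical name.

```python
def wall_time_ms(cpu_work_ms, limit_millicores, period_ms=100):
    """Wall-clock time for single-threaded CPU work under a CFS quota.

    Each period grants `quota` ms of CPU; once it is used up, the thread
    is throttled until the next period begins. A single thread cannot use
    more than one full period per period, so the quota is capped at period_ms.
    """
    quota_ms = min(limit_millicores * period_ms / 1000, period_ms)
    full_periods, remainder = divmod(cpu_work_ms, quota_ms)
    if remainder == 0 and full_periods > 0:
        # Work ends exactly as a quota runs out: no wait after the last slice.
        return (full_periods - 1) * period_ms + quota_ms
    return full_periods * period_ms + remainder


print(wall_time_ms(80, 1000))  # 80.0  (a full core: never throttled)
print(wall_time_ms(80, 500))   # 130.0 (50 ms run, 50 ms throttled, 30 ms run)
print(wall_time_ms(80, 250))   # 305.0
```

Multi-threaded processes hit the quota faster, because all threads draw from the same per-period budget. The penalty is often worse than this single-thread model suggests.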

A reasonable starting point for a typical web service:

resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 1000m   # 4x request headroom for bursts
    memory: 512Mi  # 2x request — tight enough to catch leaks
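If you want CI to hold charts to ratios like these, a small check over the resources block is enough. The parser below handles only the quantity suffixes used in this guide (m, Ki/Mi/Gi, k/M/G), not the full Kubernetes quantity grammar. The function names are hypothetical.

```python
from decimal import Decimal

# Multipliers for the quantity suffixes this sketch supports.
SUFFIXES = {
    "Ki": Decimal(2**10), "Mi": Decimal(2**20), "Gi": Decimal(2**30),
    "m": Decimal("0.001"), "k": Decimal(10**3), "M": Decimal(10**6), "G": Decimal(10**9),
}


def parse_quantity(quantity):
    """Convert '250m', '512Mi', 2, ... into a Decimal in base units."""
    text = str(quantity)
    for suffix, multiplier in SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * multiplier
    return Decimal(text)


def limit_to_request_ratios(resources):
    """Return {resource: limit / request} for resources that set both."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    return {
        name: parse_quantity(limits[name]) / parse_quantity(requests[name])
        for name in requests
        if name in limits
    }


starting_point = {
    "requests": {"cpu": "250m", "memory": "256Mi"},
    "limits": {"cpu": "1000m", "memory": "512Mi"},
}
ratios = limit_to_request_ratios(starting_point)
print({name: float(r) for name, r in ratios.items()})  # {'cpu': 4.0, 'memory': 2.0}
```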

Run the Vertical Pod Autoscaler (VPA) in recommendation-only mode (updateMode: "Off") for a week in production, then use its recommendations to tune these values.

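Pod Disruption Budgets for zero-downtime maintenance

Without a PDB, voluntary disruptions such as node drains, cluster upgrades, and Karpenter consolidation can evict every replica at once. A PDB caps how many pods the eviction API may remove at a time. Rolling updates are a separate matter: the Deployment's maxUnavailable and maxSurge settings govern those, not the PDB.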

# templates/pdb.yaml
{{- if .Values.podDisruptionBudget.enabled }}
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: {{ include "my-service.fullname" . }}
  labels:
    {{- include "my-service.labels" . | nindent 4 }}
spec:
  {{- if .Values.podDisruptionBudget.minAvailable }}
  minAvailable: {{ .Values.podDisruptionBudget.minAvailable }}
  {{- else if .Values.podDisruptionBudget.maxUnavailable }}
  maxUnavailable: {{ .Values.podDisruptionBudget.maxUnavailable }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "my-service.selectorLabels" . | nindent 6 }}
{{- end }}

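Enable PDBs in production. A minAvailable: 1 PDB keeps at least one replica running during node maintenance, so a Karpenter consolidation cannot briefly take your service offline. Keep minAvailable below your minimum replica count. With one replica and minAvailable: 1, no pod can ever be evicted, and node drains stall.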
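The eviction API decides whether a drain may proceed by comparing the pods that are currently healthy with the number the PDB requires. The sketch below models that calculation for integer minAvailable and maxUnavailable values only; percentage forms are ignored. The function name is hypothetical.

```python
def disruptions_allowed(current_healthy, expected_pods, min_available=None, max_unavailable=None):
    """How many more pods the eviction API may remove right now (integers only)."""
    if min_available is not None:
        desired_healthy = min_available
    elif max_unavailable is not None:
        desired_healthy = expected_pods - max_unavailable
    else:
        raise ValueError("set minAvailable or maxUnavailable")
    # Never negative: an already-degraded service blocks further evictions.
    return max(0, current_healthy - desired_healthy)


print(disruptions_allowed(3, 3, min_available=2))    # 1: one pod at a time
print(disruptions_allowed(2, 3, min_available=2))    # 0: wait for the replacement
print(disruptions_allowed(1, 1, min_available=1))    # 0: this drain never proceeds
print(disruptions_allowed(3, 3, max_unavailable=1))  # 1
```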

Chart versioning and App versioning

Keep these separate. Chart.yaml has two version fields:

apiVersion: v2
name: my-service
version: 1.3.0        # chart version — bump on any chart changes
appVersion: "2.1.4"   # application version — your Docker image tag

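Bump the chart version when you change templates or add features to the chart itself. The app version tracks your application's release. helm upgrade has no flag for overriding appVersion, so in CI pass the image tag directly at deploy time (the example uses GitHub Actions expression syntax). If you publish packaged charts, helm package --version and --app-version can stamp both fields at build time.

helm upgrade --install my-service ./my-service --set image.tag=${{ github.sha }}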
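Helm requires the chart version to be a SemVer 2 string. How you map chart changes onto MAJOR.MINOR.PATCH is a team convention. The sketch below assumes one common mapping: breaking values changes bump major, new templates or features bump minor, and fixes bump patch. It ignores pre-release and build suffixes.

```python
def bump_chart_version(version, change):
    """Bump a MAJOR.MINOR.PATCH version string for the given kind of change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":  # e.g. renamed or removed values keys
        return f"{major + 1}.0.0"
    if change == "feature":   # e.g. a new optional template such as hpa.yaml
        return f"{major}.{minor + 1}.0"
    if change == "fix":       # e.g. a corrected template bug
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump_chart_version("1.3.0", "feature"))  # 1.4.0
```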

Lint and test before merging

# Lint the chart with the production overrides
helm lint ./my-service -f ./my-service/values-prod.yaml

# Render templates without installing to catch template errors,
# then pipe the output to a client-side kubectl dry run
helm template my-service ./my-service \
  -f ./my-service/values-prod.yaml \
  --set image.tag=test | kubectl apply --dry-run=client -f -

Add both to your CI pipeline. Helm template errors are fast and cheap to catch in a PR. Finding them in production is not.
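To lint every environment rather than only prod, a CI step can loop over the override files. This sketch assumes the ./my-service layout from this guide and helm on the PATH. It keeps building the argument lists separate from running them, so the lists can be checked without a cluster, and it omits the kubectl dry-run pipe for brevity.

```python
import shlex
import subprocess

CHART = "./my-service"
ENVIRONMENTS = ["staging", "prod"]  # one values-<env>.yaml per entry


def ci_commands(chart=CHART, environments=ENVIRONMENTS, tag="test"):
    """Build one lint and one render command per environment override file."""
    commands = []
    for env in environments:
        values_file = f"{chart}/values-{env}.yaml"
        commands.append(["helm", "lint", chart, "-f", values_file])
        commands.append(["helm", "template", "my-service", chart,
                         "-f", values_file, "--set", f"image.tag={tag}"])
    return commands


def run_all(commands):
    """Run each command, stopping at the first non-zero exit (needs helm)."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        subprocess.run(cmd, check=True)


for cmd in ci_commands():
    print(shlex.join(cmd))
```

In the pipeline itself, `run_all(ci_commands())` runs the commands. `check=True` raises on the first failing command, which fails the CI job.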
