Options available for Health Checks with Helm Charts

Chris Harwell
Jun 5, 2023 · 11 min read

While Helm does not inherently provide health checks, alerting, or automatic pod restarting, it leverages Kubernetes’ features for these purposes, and Helm Charts can be designed to accommodate these features.

Here are ways to incorporate health checks and other related features directly in Helm Charts:

  1. Defining Probes in Helm Chart: As described below, Liveness, Readiness, and Startup probes can be defined in the Helm Chart templates. These probes allow Kubernetes to check the health of the pods and respond accordingly. Helm allows you to template these, and their values can be configured in the values.yaml file (a templated sketch follows this list).
  2. Using Helm hooks: Helm has a feature called hooks that allows you to specify code to run at specific points in the Helm lifecycle. For example, you could create a hook that runs a health check script after all resources in a chart have been loaded.
  3. Kubernetes Jobs and Helm: You can run one-off tasks, like complex health checks, migrations, or other batch jobs using Kubernetes Jobs. You can define these jobs within Helm charts and manage them as part of your application’s deployment lifecycle.
  4. Resource Limits and Requests: In Helm Charts, you can specify resource requests and limits to ensure the pods have enough resources to run. If a pod exceeds its resource limit, it could be restarted.
  5. Pod Disruption Budgets: A Pod Disruption Budget (PDB) is a Kubernetes API object that sets limits on the number of voluntary disruptions for Pods. You can define PDBs within your Helm charts to ensure a minimum number of Pods for a replicated application is always available.
  6. Kubernetes Horizontal Pod Autoscaler (HPA) with Helm: HPAs can be defined within Helm Charts, allowing Kubernetes to automatically scale your application based on CPU or memory usage.
  7. Custom Alerting: While the Helm chart itself doesn’t inherently support alerting, you can deploy resources that do, such as a Prometheus exporter or a custom component that integrates with an external alerting system.
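
To make the first item concrete, here is a minimal sketch of probes driven from values.yaml. The container name, port, and values keys are illustrative, not taken from any particular chart:

# values.yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

# templates/deployment.yaml (container spec excerpt)
containers:
  - name: myapp
    image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
    livenessProbe:
      {{- toYaml .Values.livenessProbe | nindent 6 }}
    readinessProbe:
      {{- toYaml .Values.readinessProbe | nindent 6 }}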

Helm is essentially a package manager for Kubernetes. It helps you manage and deploy applications onto the Kubernetes system. The “health checks” or “alerting” capabilities you might want to add are typically features provided by Kubernetes itself or are added as part of the application/container configuration within the Kubernetes resources that are templated by Helm Charts.

Liveness and Readiness Probes

Liveness and Readiness Probes are crucial parts of the Kubernetes ecosystem that help keep applications healthy and traffic efficiently routed. Let’s go into a bit more detail:

Liveness Probes

A liveness probe in Kubernetes checks whether your container is running the application as expected. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy.

The main purpose of using a liveness probe is to catch and remedy situations where your application might be running but is unable to make progress. Such situations could occur if an application has deadlocked due to multi-threading defects or if it’s stuck in an infinite loop.

Here’s an example of how you might define a liveness probe using an HTTP GET request:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20

In this example, the kubelet will start performing health checks 15 seconds after the container starts. After that, it will perform a check every 20 seconds.

Readiness Probes

Readiness probes determine when a container is ready to start accepting traffic. A pod is considered ready when all of its containers are ready; until then, the pod does not receive traffic from the Services that select it.

One use case for a readiness probe might be to control which pods are used as backends for a service. If a pod is not ready, it is removed from service load balancers.

Here’s an example of a readiness probe:

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

In this example, the kubelet starts the readiness checks 5 seconds after the container starts. After that, it will perform a check every 10 seconds.

Both liveness and readiness probes can be implemented using three mechanisms:

  • httpGet: Performs an HTTP GET request against the specified path and port of the pod’s IP address. The probe is successful if the response status code is greater than or equal to 200 and less than 400.
  • tcpSocket: Performs a TCP check on the specified port of the IP address of the pod. The probe is successful if the port is open.
  • exec: Executes a specified command within the container. The probe is successful if the command exits with a status code of 0.

Both liveness and readiness probes can also be fine-tuned with parameters like initialDelaySeconds, periodSeconds, timeoutSeconds, successThreshold, and failureThreshold. For instance, initialDelaySeconds specifies how long to wait after the container starts before running the first probe, and periodSeconds specifies how often to run the probe.
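
The remaining two mechanisms, together with some of these tuning parameters, look like this. A brief sketch; the command, port, and thresholds are illustrative:

livenessProbe:
  exec:
    command: ["cat", "/tmp/healthy"]  # healthy as long as the file exists
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 1       # seconds to wait before the probe is considered failed
  failureThreshold: 3     # consecutive failures before the container is restarted
readinessProbe:
  tcpSocket:
    port: 5432            # ready once the port accepts TCP connections
  periodSeconds: 10
  successThreshold: 1     # consecutive successes required to be marked ready again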

Helm Hooks

Helm Hooks are a powerful feature that allows you to add lifecycle events to the management of your applications deployed with Helm. These hooks enable you to run jobs at specific points in a release cycle.

You can use hooks to perform various tasks:

  • Prepare the environment before installing a chart (pre-install hook)
  • Clean up after removing a chart (post-delete hook)
  • Run database migrations (pre-upgrade and post-upgrade hooks)
  • Perform health checks or other forms of validation (post-install and post-upgrade hooks)

Here are the different lifecycle events (hooks) Helm provides:

  • `pre-install`: Executes after templates are rendered, but before any resources are created in Kubernetes.
  • `post-install`: Executes after all resources are loaded into Kubernetes.
  • `pre-delete`: Executes before any resources are removed from Kubernetes.
  • `post-delete`: Executes after all of the release’s resources have been deleted.
  • `pre-upgrade`: Executes after templates are rendered, but before any resources are updated.
  • `post-upgrade`: Executes after all resources are updated in Kubernetes.
  • `pre-rollback`: Executes after templates are rendered, but before any resources are rolled back.
  • `post-rollback`: Executes after all resources are rolled back in Kubernetes.
  • `test`: This is a special hook used to define integration tests.

To define a hook in your Helm chart, you simply add the annotation `helm.sh/hook` in the metadata of a Kubernetes resource in your chart. Here’s an example:

apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}"
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      containers:
        - name: post-install-job
          image: my-job-image
          command: ['do', 'something']
      restartPolicy: OnFailure

In this example:

  • `"helm.sh/hook": post-install` tells Helm to run this Job after all resources are loaded into Kubernetes.
  • `"helm.sh/hook-weight": "-5"` tells Helm to run hooks in ascending order of weight, which makes the execution order deterministic.
  • `"helm.sh/hook-delete-policy": hook-succeeded` tells Helm to delete the hook resource after it successfully executes.

It’s important to note that Helm waits for hook resources such as Jobs and Pods to run to completion, and a failed hook causes the release to fail. Hook resources are also not managed as part of the release, so they are not removed by `helm uninstall` unless a hook deletion policy is set. Hooks are therefore best kept to fast, lightweight tasks related to the deployment or management of an application.
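
The `test` hook mentioned above is a natural place for a post-deployment health check: resources annotated with it are created only when you run `helm test <release-name>`. A minimal sketch, assuming the chart also exposes a Service named after the release that serves a /healthz endpoint on port 8080:

apiVersion: v1
kind: Pod
metadata:
  name: "{{ .Release.Name }}-health-test"
  annotations:
    "helm.sh/hook": test
spec:
  containers:
    - name: health-check
      image: curlimages/curl:latest
      # Exit non-zero (and fail the test) if the endpoint does not return 2xx
      command: ['curl', '--fail', 'http://{{ .Release.Name }}:8080/healthz']
  restartPolicy: Never

Running `helm test <release-name>` after an install or upgrade creates this Pod and reports success or failure based on its exit code.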

Kubernetes Jobs and Helm

Kubernetes Jobs are designed to run batch tasks, serially or in parallel, within your Kubernetes cluster. These tasks run until completion, such as performing a calculation or processing data.

Jobs create one or more pods and ensure that a specified number of them successfully terminate. When a specified number of successful completions is reached, the job itself is complete. Deleting a Job will clean up the pods it created.

You can use Jobs in Helm to automate certain tasks that are part of your deployment lifecycle but are not part of the running application itself. For example, you might want to run a script to set up a database or perform a migration. These one-time tasks can be managed within your Helm Chart using Kubernetes Jobs.

Here’s a basic example of how you might define a Job within a Helm Chart:

apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-migration"
spec:
  template:
    spec:
      containers:
        - name: migration
          image: my-migration-image
          command: ['run', 'migration']
      restartPolicy: OnFailure

This Job runs a single pod that executes the command ‘run migration’. If the container exits with a failure, Kubernetes will restart the container according to the ‘OnFailure’ policy.

You can customize the Job further to fit your specific use case. For example, you might want to use the pre-install or post-install hooks to run the Job before or after other resources are installed.

Note that the success of a Job is determined by the success of its pods. Kubernetes will automatically retry the Job until it succeeds, reaches a retry limit, or is manually interrupted.
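
The retry limit and an overall deadline can be set explicitly on the Job spec. A short sketch extending the migration example above; the numbers are illustrative:

apiVersion: batch/v1
kind: Job
metadata:
  name: "{{ .Release.Name }}-migration"
spec:
  backoffLimit: 4               # give up after 4 failed retries
  activeDeadlineSeconds: 600    # fail the Job if it runs longer than 10 minutes
  ttlSecondsAfterFinished: 300  # garbage-collect the finished Job after 5 minutes
  template:
    spec:
      containers:
        - name: migration
          image: my-migration-image
          command: ['run', 'migration']
      restartPolicy: OnFailure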

Note that Jobs are designed for one-time or batch processes, not for long-running services. For long-running services, you would use other types of Kubernetes resources, like Deployments or StatefulSets.

Resource Limits and Requests

In Kubernetes, when you specify a Pod, you can optionally set how much of each resource a container needs. The most common resources to specify are CPU and memory. You can set both requests and limits for each of these resources.

Requests are what the container is guaranteed to get. If a container requests a resource, Kubernetes will only schedule it on a node that can give it that resource.

Limits, on the other hand, make sure a container never goes above a certain value. The container is only allowed to go up to the limit, and then it is restricted.

Here’s how you might define requests and limits in a Helm Chart:

apiVersion: v1
kind: Pod
metadata:
  name: "{{ .Release.Name }}-myapp"
spec:
  containers:
    - name: myapp-container
      image: myapp:latest
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"

In this example:

  • The `myapp-container` container is guaranteed at least 64Mi of memory and 250m CPU (where 1000m = 1 core).
  • Kubernetes will not allow the `myapp-container` container to use more than 128Mi of memory or 500m CPU.

This specification helps Kubernetes make more efficient decisions about which nodes to place pods on. It also helps prevent resource-intensive applications from monopolizing resources, which can lead to system instability.

If a container exceeds its memory limit, it may be terminated (OOM-killed). If it is restartable, Kubernetes will restart it, subject to the Pod’s `restartPolicy`.

Exceeding the CPU limit is handled differently: Kubernetes does not terminate the Pod, but the container’s CPU usage is throttled, so the application simply runs more slowly when it needs more CPU than its limit allows.

These resource requests and limits can be parameterized in a Helm Chart using the values.yaml file, providing more flexibility in resource allocation across different environments like development, testing, or production.
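
For example, the numbers can live in values.yaml and be rendered into the template with toYaml. A minimal sketch with illustrative values keys:

# values.yaml
resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
  limits:
    memory: "128Mi"
    cpu: "500m"

# templates/deployment.yaml (container spec excerpt)
containers:
  - name: myapp-container
    image: myapp:latest
    resources:
      {{- toYaml .Values.resources | nindent 6 }}

Per-environment overrides can then be supplied with an extra values file (`-f values-production.yaml`) or with `--set`, without editing the templates.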

Pod Disruption Budgets

A Pod Disruption Budget (PDB) in Kubernetes allows you to specify how many replicas of an application must remain available (or how many may be unavailable) relative to how many it is intended to have. It provides some level of assurance about the reliability of a replicated application during voluntary disruptions, which are disruptions that can be controlled by Kubernetes, as opposed to involuntary disruptions like hardware failures or system crashes.

Voluntary disruptions may be caused by operations such as:

  • Evicting a pod from a node to make room for another workload
  • Nodes being stopped for maintenance or upgrades
  • Application scaling down due to changing resource requirements

When creating a PDB, you can specify the `minAvailable` field to denote the minimum number of Pods that must remain available. Alternatively, you can specify `maxUnavailable` to denote the maximum number of Pods that can be unavailable during the disruption.

Here’s an example of how you might define a PDB in a Helm Chart:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: "{{ .Release.Name }}-pdb"
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: "{{ .Release.Name }}"

In this example, the PDB ensures that at least two Pods with the label `app: {{ .Release.Name }}` remain available during voluntary disruptions.

Kubernetes respects the PDB when performing voluntary disruptions. For example, a node drain issued through the eviction API (as `kubectl drain` does) will not evict a Pod if doing so would violate the PDB.

It’s important to note that PDBs do not guarantee the specified number or percentage of Pods will always be available. In cases of involuntary disruptions or if there are not enough resources in the cluster to schedule a new Pod after a node failure, the number of available Pods might fall below the specified amount.

Finally, when defining PDBs in your Helm Chart, make sure that the labels in the `selector` field match the labels of the Pods you want to protect.
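
If you would rather express the budget as an amount of tolerable disruption, `maxUnavailable` can be used instead of `minAvailable`; both fields accept either an absolute number or a percentage. A brief sketch:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: "{{ .Release.Name }}-pdb"
spec:
  maxUnavailable: "25%"   # at most a quarter of the matching Pods may be disrupted
  selector:
    matchLabels:
      app: "{{ .Release.Name }}"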

Kubernetes Horizontal Pod Autoscaler (HPA) with Helm

The Horizontal Pod Autoscaler (HPA) is a Kubernetes component that automatically scales the number of pods in a replication controller, deployment, or replica set based on observed CPU or memory utilization.

HPA is implemented as a Kubernetes API resource and a controller. The controller manager queries the resource utilization against the metrics specified in each HorizontalPodAutoscaler definition, and adjusts the number of pods in a replication controller or deployment to match the observed average utilization to the target specified by the user.

In a Helm chart, you can define an HPA for your deployments. Here is an example:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: "{{ .Release.Name }}-hpa"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: "{{ .Release.Name }}"
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60

In this example:

  • The HPA is set to monitor the Deployment named `{{ .Release.Name }}`.
  • The number of Pods can scale between 1 and 10.
  • The `averageUtilization` target is set to 60%. This means if the average CPU usage across all Pods goes above 60%, Kubernetes will start adding more Pods.
  • The target `type: Utilization` means scaling decisions are based on usage as a percentage of the Pods’ requested CPU, rather than on an absolute value.

HPA allows your application to adapt to changes in traffic patterns and load, which can help you optimize resource usage and costs. It’s important to note that for HPA to work, you must have a cluster metrics server deployed. This server collects resource metrics from Kubelets and exposes them via the Kubernetes API server.

In Helm Charts, the values for minReplicas, maxReplicas, and averageUtilization can be parameterized, allowing you to tune autoscaling behavior based on the specific requirements of different environments.
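
A common pattern, sketched here with illustrative values keys, is to gate the whole HPA behind a flag and read its numbers from values.yaml:

# values.yaml
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 60

# templates/hpa.yaml
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: "{{ .Release.Name }}-hpa"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: "{{ .Release.Name }}"
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
{{- end }}

Disabling autoscaling for a given environment is then a matter of setting `autoscaling.enabled: false` in that environment's values file.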

Custom Alerting

While Helm charts and Kubernetes do not inherently support alerting, they can be integrated with monitoring and alerting tools to provide comprehensive observability for your applications. Tools like Prometheus, which is often paired with Alertmanager and Grafana, can be used to scrape metrics, generate alerts based on those metrics, and visualize data.

Prometheus is a powerful open-source monitoring and alerting toolkit that is designed for reliability and to work effectively in a Kubernetes-based environment. It collects metrics from configured targets at given intervals, evaluates rule expressions, and can trigger alerts if certain conditions are observed to be true.

Alertmanager, which is part of the Prometheus toolset, handles alerts sent by Prometheus server and takes care of deduplicating, grouping, and routing them to the correct receiver such as an email recipient. It can also silence and inhibit alerts.

Grafana is an open-source platform for data visualization, monitoring, and analysis. It allows you to query, visualize, alert on, and understand your metrics no matter where they are stored.

You can deploy these tools using their respective Helm charts and configure them to monitor your applications deployed using Helm charts.

For instance, if you have a service running in a Kubernetes cluster and you want to set up alerting for it, you might add a Prometheus exporter to your Helm chart. This exporter exposes application metrics in a format Prometheus can scrape.

Here’s an example of how you might add a Prometheus exporter as a sidecar container in your Helm chart:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: "{{ .Release.Name }}"
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: "{{ .Release.Name }}"
  template:
    metadata:
      labels:
        app: "{{ .Release.Name }}"
    spec:
      containers:
        - name: my-app
          image: my-app:latest
        - name: my-app-exporter
          image: prom/my-app-exporter:latest
          ports:
            - name: exporter
              containerPort: 9110

In this example, `my-app-exporter` is a Prometheus exporter that exposes metrics for `my-app` at port 9110. Prometheus can be configured to scrape this endpoint and collect metrics.

The collected metrics can be used to define alerting rules in Prometheus, which can be routed to Alertmanager for notification via email, on-call notification systems, or other methods. You can also use Grafana to create dashboards for visualizing these metrics.
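
As an illustration, alerting rules can be shipped from the same chart as a PrometheusRule resource. This assumes the Prometheus Operator (for example, installed via the kube-prometheus-stack chart) is running in the cluster; the metric name and threshold below are made up for the example:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: "{{ .Release.Name }}-alerts"
spec:
  groups:
    - name: "{{ .Release.Name }}.rules"
      rules:
        - alert: MyAppHighErrorRate
          # Hypothetical error-counter metric exposed by the exporter above
          expr: rate(my_app_http_errors_total[5m]) > 5
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "{{ .Release.Name }} is serving errors at a high rate"

Depending on how Prometheus is configured, the PrometheusRule may also need labels that match the operator's ruleSelector before it is picked up.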

Note: Custom alerting setup might require changes not just in your Helm chart, but also in your monitoring infrastructure (Prometheus, Alertmanager, Grafana) and possibly in your application code to expose the necessary metrics.
