How to Implement Liveness and Readiness Probes in Go

The container that never wakes up

You deploy a Go service to Kubernetes. The container starts, the binary runs, and the HTTP server binds to port 8080. With no readiness probe configured, Kubernetes treats the pod as ready the moment the container is running and immediately routes live traffic to it. The application is still initializing its database connection pool, warming up caches, and waiting for a downstream API to acknowledge the handshake. Every incoming request fails. The orchestrator has no idea the service is in a temporarily broken state, so it keeps sending more traffic. If the failures pile up until the process crashes, the pod is restarted and the cycle begins again, creating a loop of failed requests and unnecessary churn.

This happens because, by default, the orchestrator knows only one thing: the container's process is running. It does not know whether the application is actually prepared to handle work. Probes bridge that gap. They give the container a voice to tell the orchestrator exactly what state it is in.

What probes actually do

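Liveness and readiness probes are most often HTTP endpoints that return a status code (Kubernetes also supports TCP, gRPC, and exec probes). The orchestrator polls them on a schedule. Any status from 200 through 399 counts as success. Anything else, or a timeout, counts as a failure, and after enough consecutive failures the orchestrator takes an action that depends on the probe type.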
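The success rule for Kubernetes HTTP probes can be written down in a few lines. This is only a sketch of that rule; probeSucceeded is a hypothetical helper name, not part of any Kubernetes or Go API.

```go
package main

import (
	"fmt"
	"net/http"
)

// probeSucceeded mirrors the rule Kubernetes applies to HTTP probes:
// any status code from 200 up to (but not including) 400 is a success.
// The function name is hypothetical and exists only for illustration.
func probeSucceeded(statusCode int) bool {
	return statusCode >= http.StatusOK && statusCode < http.StatusBadRequest
}

func main() {
	codes := []int{http.StatusOK, http.StatusNoContent, http.StatusServiceUnavailable}
	for _, code := range codes {
		fmt.Printf("%d -> success=%t\n", code, probeSucceeded(code))
	}
}
```

In practice this means a handler should return 200 when healthy and a 5xx code such as 503 when it is not.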

Liveness answers the question of whether the process is alive. If the liveness probe fails, the orchestrator kills the container and starts a fresh one. Think of it as a heartbeat monitor. If the heart stops, you restart the patient.

Readiness answers the question of whether the service can accept traffic. If the readiness probe fails, the orchestrator removes the pod from the load balancer. Traffic stops flowing to that instance until it reports ready again. The container keeps running. It just sits idle until it recovers.

A coffee shop makes the distinction clear. Liveness is whether the lights are on and the barista is present. Readiness is whether the espresso machine has finished heating up and can actually pull shots. You can have a barista standing at a cold machine. The shop is alive, but it cannot serve customers yet.

Probes are cheap to implement in Go. They rely on the standard library and require zero external dependencies. The real work is deciding what state to check and how to expose it without blocking the probe handler.

The simplest possible setup

Start with a bare HTTP server that exposes two routes. The liveness endpoint returns success as long as the process is running. The readiness endpoint returns success immediately, which is fine for development but useless in production. You will upgrade it in the next section.

Here is the baseline server with both handlers registered:

package main

import (
	"log"
	"net/http"
)

// main starts the HTTP server and registers probe handlers.
func main() {
	// Liveness probe: always returns 200 if the process is alive
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Readiness probe: placeholder that returns 200 immediately
	http.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Start listening on port 8080
	log.Println("Server starting on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}

The server starts, binds to the port, and waits for requests. When the orchestrator hits /healthz, the handler writes a 200 status and returns. When it hits /readyz, the same thing happens. The orchestrator sees two healthy endpoints and considers the pod fully operational.
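You can see what the orchestrator sees without deploying anything. The sketch below sends in-memory requests to the same two handlers using net/http/httptest from the standard library. The helper names newMux and probe are assumptions made for this example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newMux registers the same always-200 handlers as the baseline server.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	ok := func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}
	mux.HandleFunc("/healthz", ok)
	mux.HandleFunc("/readyz", ok) // placeholder, as in the baseline
	return mux
}

// probe sends one in-memory GET to the handler and returns the status code,
// standing in for a single poll from the orchestrator.
func probe(h http.Handler, path string) int {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	mux := newMux()
	fmt.Println("/healthz:", probe(mux, "/healthz"))
	fmt.Println("/readyz:", probe(mux, "/readyz"))
}
```

Both lines print 200, which is exactly why the placeholder readiness handler tells the orchestrator nothing useful.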

At runtime, http.ListenAndServe blocks the main goroutine. The server starts a new goroutine for each incoming connection and runs the handler there; there is no fixed pool of workers. Each probe request carries its own context, which is canceled when the client's connection closes, as happens when the orchestrator gives up on a timed-out probe. You do not need to manage goroutine lifecycles manually for simple handlers.

The convention in Go is to keep probe handlers fast and synchronous. They should never block on network calls or heavy computation. If a probe takes longer than the orchestrator's timeout window, the orchestrator assumes failure and acts accordingly. Keep the handler under a few milliseconds.

Making readiness actually check something

A readiness probe needs to reflect the actual state of the application. You typically check whether required dependencies are connected and whether background workers have finished their startup routines. The cleanest approach is to track state with a thread-safe flag and expose it through the handler.

Here is a realistic readiness handler that waits for a simulated dependency check before reporting ready:

package main

import (
	"log"
	"net/http"
	"sync/atomic"
	"time"
)

// readyFlag tracks whether the application has finished initialization.
var readyFlag atomic.Bool

// main starts the HTTP server and runs background initialization.
func main() {
	// Readiness probe: checks the atomic flag before responding
	http.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if readyFlag.Load() {
			w.WriteHeader(http.StatusOK)
			return
		}
		// Return 503 so the orchestrator removes this pod from the load balancer
		w.WriteHeader(http.StatusServiceUnavailable)
	})

	// Simulate slow dependency initialization in a separate goroutine
	go func() {
		time.Sleep(3 * time.Second)
		// Mark the service as ready after initialization completes
		readyFlag.Store(true)
		log.Println("Dependencies initialized. Service is ready.")
	}()

	log.Println("Server starting on :8080")
	log.Fatal(http.ListenAndServe(":8080", nil))
}

The readyFlag is an atomic.Bool from sync/atomic (available since Go 1.19), so the handler and the initialization goroutine can share it safely without a mutex. The handler's read is a single atomic load. If initialization is still running, the handler returns 503. The orchestrator sees the failing status and stops routing traffic. Once the background goroutine finishes, it flips the flag to true. The next probe request gets a 200, and the orchestrator adds the pod back to the load balancer.

This pattern scales to real dependencies. Replace the time.Sleep with a database ping, a cache connection test, or a downstream API health check. Run the check in a background goroutine during startup, then flip the flag when everything succeeds. If a dependency fails, keep the flag false and let the orchestrator handle the traffic routing.
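One way to sketch that startup routine is a retry loop that flips the flag only after the check succeeds. The names waitUntilReady, flakyCheck, and runDemo are hypothetical, and flakyCheck is a stand-in for a real database ping or API call.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

// readyFlag tracks whether initialization has finished.
var readyFlag atomic.Bool

// waitUntilReady retries check until it succeeds or ctx is done.
// The flag is set only on success, so a failing dependency keeps it false.
func waitUntilReady(ctx context.Context, check func(context.Context) error, interval time.Duration) error {
	for {
		err := check(ctx)
		if err == nil {
			readyFlag.Store(true)
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("giving up: %w (last error: %v)", ctx.Err(), err)
		case <-time.After(interval):
		}
	}
}

// flakyCheck is a hypothetical dependency that fails a fixed number of
// times before succeeding. Replace it with a real ping.
func flakyCheck(failures int) func(context.Context) error {
	remaining := failures
	return func(ctx context.Context) error {
		if remaining > 0 {
			remaining--
			return errors.New("dependency not reachable yet")
		}
		return nil
	}
}

// runDemo runs the startup routine with a two-second budget.
func runDemo() error {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return waitUntilReady(ctx, flakyCheck(2), 10*time.Millisecond)
}

func main() {
	// In a real service this would run in a goroutine next to ListenAndServe.
	if err := runDemo(); err != nil {
		fmt.Println("initialization failed:", err)
		return
	}
	fmt.Println("ready:", readyFlag.Load())
}
```

The retry interval and the overall budget are arbitrary values chosen for the demo; tune them to how long your dependencies realistically take to come up.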

The Go community convention for functions that perform checks is to accept context.Context as the first parameter, conventionally named ctx. If you move the dependency check into its own function, give it a context so cancellation propagates correctly: the request context when the check runs inside a handler, or a context with its own timeout when it runs in a background goroutine. Functions that take a context should respect deadlines and return early if the context is done.
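A minimal sketch of a check that respects its context might look like the following. checkDependency simulates the work with a timer instead of a real network call, and both function names are assumptions made for this example.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// checkDependency simulates a dependency check that takes `work` to finish.
// It returns early with the context's error if ctx is canceled or its
// deadline passes first.
func checkDependency(ctx context.Context, work time.Duration) error {
	timer := time.NewTimer(work)
	defer timer.Stop()
	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// checkWithTimeout runs the check under a deadline, the way a background
// initializer would when there is no request context to inherit.
func checkWithTimeout(timeout, work time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return checkDependency(ctx, work)
}

func main() {
	fmt.Println("fast check:", checkWithTimeout(200*time.Millisecond, 10*time.Millisecond))
	fmt.Println("slow check:", checkWithTimeout(20*time.Millisecond, 500*time.Millisecond))
}
```

The fast check prints <nil>. The slow check prints context deadline exceeded after roughly 20 milliseconds instead of waiting the full half second.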

When things go wrong

Probes break in predictable ways. The most common mistake is blocking the handler. If you run a database query directly inside the readiness handler, every probe request waits on the database. The orchestrator polls every few seconds. When the database slows down, probe handlers queue up behind it, hold connections that real requests need, and run past the probe timeout, so a brief slowdown turns into pods dropping out of rotation.

The toolchain will catch obvious mistakes. If a goroutine or deferred closure captures a loop variable, go vet reports loop variable i captured by func literal (on Go versions before 1.22, where one variable was shared across all iterations). If you pass a string where an integer is expected, the compiler rejects the program with an error such as cannot use "x" (untyped string constant) as int value in argument to f. These are easy to fix. Runtime bugs are harder.
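A common way around this is to run the expensive check on a timer in the background and let the handler read only the cached result. The sketch below assumes hypothetical names (healthy, pollDependency, readyzStatus, demo) and simulates the dependency with an in-memory flag.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// healthy caches the result of the most recent dependency check.
var healthy atomic.Bool

// pollDependency runs check immediately and then once per interval until
// ctx is canceled, so the goroutine cannot outlive its owner.
func pollDependency(ctx context.Context, check func(context.Context) error, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		healthy.Store(check(ctx) == nil)
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// readyz never touches the dependency; it only reads the cached flag.
func readyz(w http.ResponseWriter, r *http.Request) {
	if healthy.Load() {
		w.WriteHeader(http.StatusOK)
		return
	}
	w.WriteHeader(http.StatusServiceUnavailable)
}

// readyzStatus performs one in-memory probe and returns the status code.
func readyzStatus() int {
	rec := httptest.NewRecorder()
	readyz(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

// demo simulates a dependency that is up and then goes down.
func demo() (whileUp, afterOutage int) {
	var up atomic.Bool
	up.Store(true)
	check := func(context.Context) error {
		if up.Load() {
			return nil
		}
		return errors.New("dependency down")
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // stops the poller when demo returns
	go pollDependency(ctx, check, 5*time.Millisecond)

	time.Sleep(100 * time.Millisecond)
	whileUp = readyzStatus()
	up.Store(false)
	time.Sleep(100 * time.Millisecond)
	afterOutage = readyzStatus()
	return whileUp, afterOutage
}

func main() {
	whileUp, afterOutage := demo()
	fmt.Println("while up:", whileUp, "after outage:", afterOutage)
}
```

The trade-off is staleness: the probe can lag the real state by up to one polling interval, which is usually acceptable next to a probe period measured in seconds.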

A silent goroutine leak happens when you spawn a background task inside a probe handler and forget to cancel it. The handler returns, but the goroutine keeps running. Over hours, thousands of leaked goroutines accumulate until the process runs out of memory. Always tie background work to the request context or a dedicated cancellation channel.

Another trap is ignoring the orchestrator's timeout configuration. If your probe handler takes 5 seconds to respond but Kubernetes expects a response in 2 seconds, the orchestrator marks the probe as failed. You will see healthy logs in your application while the orchestrator repeatedly restarts the pod (liveness) or pulls it out of rotation (readiness). Align your handler logic with the probe's timeoutSeconds field in your deployment manifest, which defaults to 1 second.

Error handling in Go is verbose by design. The community accepts the boilerplate because it makes the unhappy path visible. If your readiness check calls a function that returns an error, handle it immediately. Do not swallow it. Return a 503 status and log the error so operators can diagnose the failure.
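As a sketch, a readiness handler built around an error-returning check could look like this. The check is assumed to be cheap (for example, reading cached state), and readyzHandler, statusOf, and errNotWarmed are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// errNotWarmed is a hypothetical failure used for the demo.
var errNotWarmed = errors.New("cache not warmed")

// readyzHandler wraps a check function. On error it logs the cause and
// returns 503; on success it returns 200. The check is assumed to be fast.
func readyzHandler(check func() error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if err := check(); err != nil {
			log.Printf("readiness check failed: %v", err)
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}

// statusOf performs one in-memory probe and returns the status code.
func statusOf(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	passing := readyzHandler(func() error { return nil })
	failing := readyzHandler(func() error { return errNotWarmed })
	fmt.Println("passing:", statusOf(passing))
	fmt.Println("failing:", statusOf(failing))
}
```

The response body stays generic on purpose. The detailed error goes to the log, where operators can read it, rather than to whoever can reach the probe endpoint.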

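The worst probe bug is the one that never logs. If a handler panics, net/http recovers the panic, writes a stack trace to the server's error log, and closes the connection without sending a response, so the orchestrator sees a failed probe rather than a clean status code. If that error log is not being collected, you lose visibility into why it happened. Wrap probe handlers in a recovery middleware, or use defer with recover, to log the panic through your own logger and return an explicit 500.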

Choosing the right probe strategy

Use a liveness probe when you need automatic recovery from deadlocks or corrupted state. Use a readiness probe when your service depends on external resources that take time to initialize. Use a startup probe when your application has a long initialization phase that would otherwise trip the liveness probe; Kubernetes holds off liveness and readiness checks until the startup probe succeeds. Skip probes entirely when you are running a stateless script that finishes and exits.

The decision matrix is straightforward. Liveness restarts. Readiness routes. Startup delays. Pick the one that matches your failure mode.

Where to go next

Liveness probes tell the system whether your app is still running; if one fails, the system restarts the container. Readiness probes tell the system whether your app is ready to handle user requests; if one fails, the system stops sending traffic to that instance. Think of liveness as a heartbeat check and readiness as a "ready to work" sign.