The silent failure
You deploy a Go service to Kubernetes. The pod starts, takes traffic, and then silently stops responding because its database connection pool is exhausted. Kubernetes does not know the app is broken. It keeps routing requests into a black hole. Users see timeout errors. You need a way for the container to tell the orchestrator whether it is actually alive and whether it is ready to handle work.
Health checks are not optional in production. They are the contract between your application and the platform that runs it. Without them, Kubernetes considers a container healthy for as long as its main process is running, whether or not it can serve a request. With them, you get automatic recovery, safe rolling updates, and predictable traffic routing.
Two signals, two jobs
Kubernetes uses two distinct signals to manage your pod. The liveness probe asks whether the process is stuck. If it fails, Kubernetes kills the container and restarts it. The readiness probe asks whether the app can accept traffic. If it fails, Kubernetes removes the pod from the endpoints of every Service that selects it but leaves the container running.
Think of liveness as a heartbeat monitor in an ICU. It only cares whether the patient is breathing. Think of readiness as an "Open" sign on a shop door. It tells customers whether the staff is ready to serve. Both matter. Both require different logic.
Liveness restarts. Readiness drains. Confusing them burns cluster resources.
The minimal endpoint
Start with a bare HTTP server. The endpoint returns a 200 status code when the process is running. Nothing fancy. The goal is to prove the Go runtime is alive and the HTTP listener is accepting connections.
package main

import (
	"log"
	"net/http"
)

// HealthHandler returns a 200 status to signal the process is alive.
func HealthHandler(w http.ResponseWriter, r *http.Request) {
	// Return 200 immediately. Kubernetes treats any 2xx or 3xx as success.
	w.WriteHeader(http.StatusOK)
}

func main() {
	// Register the health check path before starting the server.
	http.HandleFunc("/healthz", HealthHandler)

	// Listen on port 8080. Kubernetes will probe this port.
	// ListenAndServe only returns on failure, so log the error and exit.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
A 200 status is a promise. Keep it fast and keep it honest.
How the kubelet actually probes
When Kubernetes starts your pod, the kubelet on the node begins probing the container. Every periodSeconds, it sends an HTTP GET to the configured path and port, here /healthz on port 8080. The Go server accepts the TCP connection, serves the request on a new goroutine, and runs HealthHandler. The handler writes the 200 status, and net/http finishes the response when the handler returns. The kubelet reads the response code. A 200 means success. The loop continues.
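If the Go process exits, the container stops and the kubelet restarts it according to the pod's restartPolicy, with no probe involved. The liveness probe covers the harder case: the process is still running but hung, so the request times out or the connection fails. The kubelet marks each such probe as failed. After failureThreshold consecutive failures, the kubelet has the container runtime send SIGTERM to the container. It waits up to terminationGracePeriodSeconds, and then sends SIGKILL. The container restarts inside the same pod. The cycle repeats.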
The kubelet does not care about your application logic. For an HTTP probe it only looks at the status code and whether a response arrived in time. Any status from 200 through 399 means healthy. Any other status, including the 400 and 500 ranges, means unhealthy. A refused connection or a timeout also counts as a failure. Kubernetes translates these signals into pod lifecycle actions.
You must configure the probe timing carefully. initialDelaySeconds gives your app time to start. periodSeconds controls how often the kubelet checks. timeoutSeconds defines how long the kubelet waits for a response. If your health check takes longer than timeoutSeconds, Kubernetes marks it as failed regardless of what your code does.
Checking real dependencies
A real service depends on external systems. You need to check if the database is reachable and if the request queue is draining. The health endpoint should reflect the actual state of the application, not just whether the Go runtime is running.
package main

import (
	"context"
	"log"
	"net/http"
	"time"
)

// CheckDB verifies the database connection is still healthy.
func CheckDB(ctx context.Context) error {
	// Placeholder: ping the database with ctx so the call is bounded.
	// Return an error if the connection is stale or unreachable.
	return nil
}

// CheckQueue verifies the background worker is not falling behind.
func CheckQueue(ctx context.Context) error {
	// Placeholder: compare current timestamp against last processed message.
	// Return an error if the lag exceeds the acceptable threshold.
	return nil
}

// HealthHandler evaluates dependencies and returns 200 or 503.
func HealthHandler(w http.ResponseWriter, r *http.Request) {
	// Derive from the request context so the checks stop if the kubelet
	// disconnects. The 800ms cap is an assumed budget that stays below a
	// one-second probe timeout.
	ctx, cancel := context.WithTimeout(r.Context(), 800*time.Millisecond)
	defer cancel()

	// Check database connectivity.
	if err := CheckDB(ctx); err != nil {
		// Return 503 to signal the app is alive but unhealthy.
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}

	// Check queue processing lag.
	if err := CheckQueue(ctx); err != nil {
		w.WriteHeader(http.StatusServiceUnavailable)
		return
	}

	// All checks passed. Signal readiness.
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/healthz", HealthHandler)

	// Set server timeouts so slow or stalled clients cannot hold
	// connections, and the goroutines serving them, open forever.
	srv := &http.Server{
		Addr:              ":8080",
		ReadHeaderTimeout: 5 * time.Second,
		ReadTimeout:       10 * time.Second,
		WriteTimeout:      10 * time.Second,
	}
	log.Fatal(srv.ListenAndServe())
}
Health checks are mirrors. They reflect your dependencies, not just your process.
Go convention favors explicit error handling over silent failures. The if err != nil { return err } pattern makes the unhappy path visible. Apply the same discipline to health checks. Log the failure reason before returning the status code. Kubernetes events will show that a probe failed and with which HTTP status, but only your application logs will show why. If you implement the check as a handler type, keep the receiver name short, like (h *HealthChecker) ServeHTTP(...), consistent with the standard library.
Common traps and compiler feedback
Developers often conflate liveness and readiness. They put heavy dependency checks in the liveness probe. When the database goes down, Kubernetes restarts the container each time the failure threshold is reached, even though a restart cannot fix the database. The restart storm consumes cluster resources and masks the real problem. Keep liveness lightweight. Limit it to signs that the process itself is wedged, such as runaway memory, goroutine counts, or stuck channels. Put database and cache checks in readiness.
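Another common mistake is ignoring the request context. If your health check takes longer than the probe timeout, the kubelet gives up, closes the connection, and counts the probe as failed. Go's net/http server cancels the request context when the client connection closes, but only code that watches the context stops early, so pass it to every dependency call and give the check its own deadline shorter than timeoutSeconds. A forgotten database driver import is not caught at build time. Without the blank import that registers the driver, sql.Open fails at run time with an error such as sql: unknown driver "postgres" (forgotten import?). The kubelet also does not distinguish 500 from 503. Both fall outside the 200 to 399 success range, so both count as a failed probe, and the consequence depends on which probe received the response: a liveness failure leads to a restart, a readiness failure takes the pod out of traffic. The distinction still helps humans and other clients. Use 503 for temporary dependency failures. Use 500 only when the application itself is broken.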
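Race conditions appear when multiple goroutines touch shared health state. Liveness and readiness probes run on independent schedules, and net/http serves each request on its own goroutine, so a probe handler can read state while a background goroutine updates it. If your health checker reads a map without synchronization while another goroutine writes to it, the runtime can abort the program with fatal error: concurrent map read and map write, which recover cannot catch. Unsynchronized access to a slice or a plain field is also a data race, but it usually fails silently. Protect shared state with sync.RWMutex or use atomic values. The Go race detector catches these issues during testing. Run go run -race main.go or go test -race ./... before deploying.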
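Importing a package and never using it triggers imported and not used from the compiler, and the build stops. Capturing a loop variable in a closure is different. The compiler accepts it, and go vet reports loop variable captured by func literal. Before Go 1.22 every iteration shared one variable, so goroutines started in a loop could all observe the last value. Since Go 1.22, modules that declare that version get a fresh variable per iteration. Fix both early. Do not ignore compiler or vet feedback. The Go toolchain is strict by design. It prevents subtle runtime bugs from reaching production.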
Don't let Kubernetes restart a solvable problem. Diagnose first, restart second.
Choosing the right probe
Use a liveness probe when you need automatic recovery from deadlocks or memory exhaustion. Use a readiness probe when you need to drain traffic during dependency outages or rolling updates. Use a startup probe when your application takes longer to initialize than the liveness probe allows, that is, longer than initialDelaySeconds plus failureThreshold times periodSeconds. Use a simple TCP probe when you only need to verify the port is open and don't want HTTP overhead. Inside the handler, run the checks as plain sequential code unless you need concurrency: the simplest thing that works is usually the right thing.
Match the probe to the symptom. Kubernetes responds to signals, not guesses.