How to Implement Health Check Endpoints in Go

The handshake that keeps your app alive

Your application starts up. The database driver initializes. The connection pool warms. The cache client connects. Meanwhile, the load balancer or orchestrator sees port 8080 open and immediately routes traffic to the new instance. Requests arrive before the database is ready. They fail. The orchestrator sees the errors, marks the pod unhealthy, and kills it. The cycle repeats.

This happens because the infrastructure assumes that an open port means a ready service. It doesn't. A health check endpoint is the handshake where your code tells the world what is actually happening. It answers two distinct questions. Am I alive? Am I ready to accept traffic?

Infrastructure tools like Kubernetes, AWS Elastic Load Balancing, and Docker Swarm poll these endpoints. They use the response to make decisions. If the liveness check fails, the orchestrator restarts the container. If the readiness check fails, the orchestrator removes the instance from the load balancer pool until it recovers.

What a health check actually does

A health check is a lightweight HTTP handler. It returns a status code that signals the state of the service. A 200 OK means everything is fine. A 503 Service Unavailable means the service is down or not ready.

The response body is secondary. Load balancers care about the status code. Humans debugging the system care about the body. A good health check returns a 200 with a JSON payload that lists the status of critical dependencies. This gives operators visibility without slowing down the probe.

There are two standard patterns. Liveness checks verify the process is not stuck. Readiness checks verify dependencies are connected and the app can serve requests. In production, these often live on separate paths: /live confirms the process can still answer HTTP requests, and /ready confirms that dependencies such as the database are reachable.

The minimal handler

Start with the simplest possible implementation. A handler that returns 200 and a short string. This proves the HTTP server is running and the goroutine is alive.

Here's the baseline handler: register a route, write the status, write the body.

package main

import (
	"log"
	"net/http"
)

// healthHandler returns 200 to signal the process is alive.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	// Set status before writing body to ensure the header is sent.
	w.WriteHeader(http.StatusOK)
	// Write a minimal body. Load balancers ignore this, humans read it.
	w.Write([]byte("OK"))
}

func main() {
	// Register the handler on the default mux.
	http.HandleFunc("/health", healthHandler)
	// Start the server. log.Fatal logs the returned error and exits.
	log.Fatal(http.ListenAndServe(":8080", nil))
}

The handler sets http.StatusOK explicitly. If you skip WriteHeader, the first call to Write defaults to 200, but being explicit prevents bugs if you later add error logic. The body is just OK. Keep it small. Health checks run every few seconds for the life of the process, often from several probers at once, so a small response keeps them cheap.

Goroutines are cheap, and net/http serves each connection on its own goroutine. A simple handler like this costs almost nothing.

Checking dependencies

A 200 response is a promise. If you return 200 while the database is down, you are lying to the load balancer. Traffic flows to a broken service. Errors cascade.

Real health checks verify critical dependencies. The database is the most common one. The check should be fast. It should not block the entire service. If the database is hung, the health check must timeout and return 503. This triggers the orchestrator to stop sending traffic, which protects the rest of the system.

Here's a handler that checks a database connection using a context with a timeout.

package main

import (
	"context"
	"database/sql"
	"log"
	"net/http"
	"time"
)

// Service holds dependencies for the application.
type Service struct {
	db *sql.DB
}

// healthHandler checks the database and returns the status.
func (s *Service) healthHandler(w http.ResponseWriter, r *http.Request) {
	// Create a context with a short timeout. Health checks must be fast.
	ctx, cancel := context.WithTimeout(r.Context(), 500*time.Millisecond)
	defer cancel()

	// Ping the database to verify connectivity. PingContext takes the
	// context, so the deadline above bounds how long the call can take.
	err := s.db.PingContext(ctx)
	if err != nil {
		// Return 503 if the dependency is down.
		w.WriteHeader(http.StatusServiceUnavailable)
		w.Write([]byte("DB Unavailable"))
		return
	}

	// Return 200 if everything is healthy.
	w.WriteHeader(http.StatusOK)
	w.Write([]byte("OK"))
}

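func main() {
	// A zero-value sql.DB is not usable; open a real handle with sql.Open.
	// "postgres" and the DSN are placeholders: blank-import a driver such as
	// github.com/lib/pq and substitute your own connection string.
	db, err := sql.Open("postgres", "postgres://localhost:5432/app?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}

	svc := &Service{db: db}
	http.HandleFunc("/health", svc.healthHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}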

The handler uses context.WithTimeout. This is critical. If the database driver hangs, the context cancels after 500 milliseconds and the handler returns 503, so the load balancer stops routing traffic. Keep this timeout shorter than the probe's own timeout, which Kubernetes sets to one second by default. Without it, the handler goroutine blocks until the driver gives up. The prober records a timeout instead of a clean 503, and the response body that would have named the failing dependency never arrives.

The receiver is (s *Service). Go convention uses short receiver names that match the type. s for Service. This keeps the method signature clean.

Context is plumbing. Run it through every long-lived call site. Health checks are no exception.

The registry pattern

Applications grow. You add Redis. You add Kafka. You add an external API. Hardcoding checks in the handler becomes messy. The handler knows too much. It couples the HTTP layer to every dependency.

A registry pattern solves this. Define an interface for a check. Create a registry that holds a list of checks. The handler iterates the registry and runs each check. This keeps the handler simple and allows dependencies to register themselves.

Here's a registry implementation that collects checks and reports their status.

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
	"time"
)

// CheckFunc defines the signature for a health check.
type CheckFunc func(ctx context.Context) error

// Registry holds a collection of health checks.
type Registry struct {
	mu     sync.RWMutex
	checks map[string]CheckFunc
}

// NewRegistry creates a new empty registry.
func NewRegistry() *Registry {
	return &Registry{
		checks: make(map[string]CheckFunc),
	}
}

// Add registers a new check with a name.
func (r *Registry) Add(name string, fn CheckFunc) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.checks[name] = fn
}

// Handler returns an HTTP handler that runs all registered checks.
func (r *Registry) Handler() http.HandlerFunc {
	// The request is named req so it does not shadow the receiver r.
	return func(w http.ResponseWriter, req *http.Request) {
		// Create a context with a timeout for the entire check cycle.
		ctx, cancel := context.WithTimeout(req.Context(), 1*time.Second)
		defer cancel()

		// Copy the checks map to avoid holding the lock during execution.
		r.mu.RLock()
		checksCopy := make(map[string]CheckFunc, len(r.checks))
		for k, v := range r.checks {
			checksCopy[k] = v
		}
		r.mu.RUnlock()

		// Run checks and collect results.
		results := make(map[string]string)
		allHealthy := true
		for name, fn := range checksCopy {
			err := fn(ctx)
			if err != nil {
				results[name] = fmt.Sprintf("error: %v", err)
				allHealthy = false
			} else {
				results[name] = "ok"
			}
		}

		// Headers must be set before WriteHeader; later changes are ignored.
		w.Header().Set("Content-Type", "application/json")

		// Set status code based on overall health.
		if !allHealthy {
			w.WriteHeader(http.StatusServiceUnavailable)
		} else {
			w.WriteHeader(http.StatusOK)
		}

		// Return JSON body with details for debugging.
		json.NewEncoder(w).Encode(results)
	}
}

The Registry uses a sync.RWMutex. Checks can be added while the server runs. The handler takes a read lock, copies the map, and releases the lock before running checks, so a slow check never holds the lock and never blocks a goroutine that is registering a new check.

The handler returns JSON. This is useful for operators. They can see exactly which dependency failed. The status code still drives the infrastructure behavior. 200 for healthy, 503 for unhealthy.

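Keep the API surface small. CheckFunc is a plain function type, so any function or method value with the signature func(context.Context) error can be registered without implementing an interface. Handler returns an http.HandlerFunc, which satisfies http.Handler and plugs straight into the default mux or any router that accepts one.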

Pitfalls and runtime traps

Health checks seem simple, but they introduce subtle bugs.

Blocking is the biggest risk. If a check calls a slow external API without a timeout, the health check hangs. The orchestrator waits for the probe timeout, which might be 30 seconds. During that time, the instance stays in the load balancer pool. Traffic hits a stuck service. Errors spike. Always use context.WithTimeout in every check.

False positives happen when the check passes but the service is broken. Checking if a TCP port is open is not enough. The database might be accepting connections but rejecting queries. Use Ping or a lightweight SELECT 1 query. Verify the dependency can actually do work.

Log spam occurs when health checks run frequently. Kubernetes probes often hit the endpoint every 5 to 10 seconds. If the handler logs every request, the logs fill up. The log volume can trigger alerts or fill disk space. Health check handlers should not log on success. Log only on unexpected errors in the check logic itself.

Compiler errors appear when types mismatch. If you pass a *sql.DB where a CheckFunc is expected, the compiler rejects it with an error like cannot use db (variable of type *sql.DB) as CheckFunc value in argument to registry.Add. If you forget to import encoding/json, you get undefined: json. The compiler is strict. Read the errors. They tell you exactly what is wrong.

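The worst goroutine bug is the one that never logs. A check that panics inside the handler never produces a clean response: net/http recovers the panic, logs a stack trace, and drops the connection, so the prober sees a connection error instead of a 503 with a useful body. A panic in a goroutine the check started itself is worse, because it takes down the whole process. Wrap the check execution in a recover if you are running third-party checks. Return 503 on panic.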

Graceful shutdown integration

When the orchestrator sends a SIGTERM signal, the application should stop accepting new traffic immediately. It should finish processing existing requests. Then it should exit.

The health check endpoint controls this flow. During shutdown, the readiness check must fail. This tells the load balancer to stop sending traffic. The liveness check should stay passing. This prevents the orchestrator from restarting the container while it is draining.

Implement this with a flag. Set the flag to false when the shutdown signal arrives. The readiness handler checks the flag. If false, return 503.

package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// readyFlag tracks if the service is ready to accept traffic.
// atomic.Bool requires Go 1.19 or later.
var readyFlag atomic.Bool

func init() {
	// Start as ready.
	readyFlag.Store(true)
}

// readinessHandler checks the flag and returns status.
func readinessHandler(w http.ResponseWriter, r *http.Request) {
	if readyFlag.Load() {
		w.WriteHeader(http.StatusOK)
		w.Write([]byte("Ready"))
	} else {
		w.WriteHeader(http.StatusServiceUnavailable)
		w.Write([]byte("Shutting down"))
	}
}

func main() {
	http.HandleFunc("/ready", readinessHandler)

	// Use an explicit http.Server so Shutdown can be called on it.
	// A nil Handler means the default mux.
	srv := &http.Server{Addr: ":8080"}

	// Start server in a goroutine.
	go func() {
		// ListenAndServe returns ErrServerClosed after Shutdown; that is expected.
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatal(err)
		}
	}()

	// Wait for interrupt signal.
	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
	<-sigCh

	// Mark as not ready to stop new traffic.
	readyFlag.Store(false)

	// Keep serving briefly so the load balancer can observe the 503.
	// Five seconds is an assumed value; match it to your probe interval.
	time.Sleep(5 * time.Second)

	// Shut down gracefully: stop listening and drain in-flight requests.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("shutdown: %v", err)
	}
}

The atomic.Bool provides thread-safe access to the flag. No mutex needed. The handler loads the flag and returns the appropriate status. When the signal arrives, the flag flips. The load balancer sees 503 and removes the instance. The server finishes draining.

Don't fight the type system. Use atomic operations for simple flags. Use mutexes for complex state.

Decision matrix

Health check complexity should match your deployment environment.

Use a simple 200 handler when you are running a local development server or a script with no external dependencies. The goal is just to verify the process is running.

Use a dependency check when your service relies on a database, cache, or message queue. The check must verify connectivity and return 503 if the dependency is down.

Use a registry pattern when you have multiple dependencies or need to add checks dynamically. The registry keeps the handler clean and allows each dependency to report its own status.

Use separate liveness and readiness endpoints when deploying to Kubernetes or a cloud load balancer. Liveness restarts the container if stuck. Readiness removes the instance from the pool during startup or shutdown.

Use a timeout in every check. Health checks must be fast. Kubernetes probes time out after one second by default, so a check that takes longer counts as a failure even if it would eventually have succeeded.

Where to go next

A health check endpoint is a specific URL your application exposes to tell external systems if it is running correctly. Load balancers and monitoring tools hit this URL regularly; if it returns a success code, they know your app is alive and ready to handle traffic. Think of it as a digital 'thumbs up' signal that your service is operational.