The rolling restart problem
You push a new version of your Go web service. The deployment script kills the running process and starts the new one. For two seconds, the load balancer routes requests to a port where nothing is listening. Users see 502 Bad Gateway errors, and your monitoring dashboard spikes with failed requests. Two things go wrong at once. Requests that were already in flight are cut off mid-response when the old process dies, and new requests arrive before the replacement is ready to accept them. Zero-downtime deploys solve both problems by running the old and new versions side by side, draining active connections from the old one, and only then removing it.
How graceful shutdown actually works
The pattern relies on a coordinated handoff. You start the new binary on a spare port or behind a load balancer. Once the new instance passes a health check, the load balancer starts sending it traffic and you tell the old instance to stop accepting new work. The old server stops accepting connections, finishes every request already in progress, and exits cleanly. The load balancer then removes the old instance from its pool. The entire sequence takes seconds, not minutes, and no in-flight request is dropped.
Go makes this straightforward with net/http.Server.Shutdown. The method does not force-kill anything. It closes the server's listener, so no new connections are accepted, then closes idle keep-alive connections and waits for active ones to finish their current response. If the context you pass expires first, Shutdown stops waiting and returns the context's error; any remaining connections stay open until you call Close or the process exits. Think of it like a restaurant closing for the night. The host stops seating new guests at the door, but the kitchen keeps cooking for the tables that are already eating. The doors lock only after the last plate is cleared.
Graceful shutdown is not a magic switch. It is a contract between your application, the operating system, and your deployment tooling. Every layer must respect the handoff.
The minimal graceful server
Here is the simplest way to wire graceful shutdown into a Go HTTP server. The program listens for termination signals, creates a timeout context, and hands control to Shutdown.
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// run starts the HTTP server, waits for a termination signal, and then
// drains in-flight requests before returning.
func run() error {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Simulate work that takes a few seconds
		time.Sleep(2 * time.Second)
		fmt.Fprint(w, "OK")
	})

	srv := &http.Server{
		Addr:    ":8080",
		Handler: mux,
	}

	// Start listening in a goroutine so we can block on signals.
	// A startup failure (for example, port already in use) arrives on errCh.
	errCh := make(chan error, 1)
	go func() {
		// ListenAndServe blocks until the server stops; once Shutdown is
		// called it returns http.ErrServerClosed immediately.
		if err := srv.ListenAndServe(); !errors.Is(err, http.ErrServerClosed) {
			errCh <- err
		}
	}()

	// Wait for SIGTERM or SIGINT from the OS or container runtime
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)

	select {
	case err := <-errCh:
		return fmt.Errorf("listen: %w", err)
	case <-quit:
	}

	// Give in-flight requests 15 seconds to finish
	ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
	defer cancel()

	// Shutdown closes the listener, closes idle connections, and waits for
	// active ones; it returns ctx.Err() if the deadline passes first.
	if err := srv.Shutdown(ctx); err != nil {
		return fmt.Errorf("shutdown: %w", err)
	}
	return nil
}

func main() {
	// log.Fatal exits with status 1, so a deploy script can detect failure.
	if err := run(); err != nil {
		log.Fatal(err)
	}
}
What happens under the hood
The program starts by registering a handler that deliberately sleeps for two seconds, simulating a database query or an upstream API call. The server listens on port 8080 inside a goroutine. ListenAndServe blocks until the server stops, so calling it directly on the main goroutine would mean the signal-handling code after it never runs.
The main goroutine then waits for a signal on quit. When a container orchestrator or deployment script sends SIGTERM, the wait ends and the program creates a context with a fifteen-second deadline. Shutdown follows the standard Go convention for contexts: context.Context is the first parameter, and it is conventionally named ctx. The timeout acts as a safety net. If a request hangs for longer than fifteen seconds, Shutdown stops waiting and returns an error.
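Shutdown works in three stages. First, it closes the listener, so the kernel refuses new connection attempts on that port. Second, it closes every connection that is currently idle. Third, it repeatedly checks the remaining connections, waiting for each to finish its in-flight response and go idle, and closes them as they do. If the context expires before all connections close, Shutdown returns context.DeadlineExceeded. The program logs the error and exits. Any requests that were still running are abandoned, but the process never hangs indefinitely. Shutdown also does not wait for hijacked connections such as WebSockets; register a callback with RegisterOnShutdown if those handlers need to be told to close.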
Go developers accept the verbose if err != nil { return err } pattern because it forces you to acknowledge every failure path. During a deploy, an ignored error can leave the old instance running alongside the new one without anyone noticing. Explicit error handling keeps the deployment pipeline predictable. Visibility follows a similarly minimal rule: exported names start with a capital letter and unexported names start lowercase. Go has no public or private keywords; the compiler enforces visibility at the package level, which is why run in the example above is lowercase.
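Do not expect Shutdown to propagate the stop signal through your call tree. It does not cancel the contexts of the requests it is waiting on, so slow handlers and background work need their own cancellation path.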
Real-world deploy pattern
Production deployments rarely run a single binary in a terminal. You need a deployment script or a service manager that coordinates the handoff. The sequence follows a strict order to avoid routing traffic to an unready instance.
Here is how a typical deployment script orchestrates the swap. The script starts the new binary, waits for it to respond to health checks, signals the old process, and waits for it to exit.
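#!/usr/bin/env bash
set -euo pipefail

PID_FILE=/var/run/myapp.pid
OLD_PID=$(cat "$PID_FILE" 2>/dev/null || echo "")
NEW_VERSION="v1.2.3"
echo "Deploying $NEW_VERSION"

# Start the new binary in the background (assumes myapp accepts a --port flag)
./myapp --port 8081 &
NEW_PID=$!

# Wait up to 30 seconds for the new instance to pass a health check
tries=0
until curl -sf -o /dev/null http://localhost:8081/health; do
  tries=$((tries + 1))
  if [ "$tries" -ge 60 ]; then
    echo "new instance failed its health check" >&2
    kill -TERM "$NEW_PID"
    exit 1
  fi
  sleep 0.5
done
echo "$NEW_PID" > "$PID_FILE"

# Route traffic to the new port via your load balancer
# (This step depends on your LB configuration)

# Signal the old instance to drain
if [ -n "$OLD_PID" ] && kill -0 "$OLD_PID" 2>/dev/null; then
  kill -TERM "$OLD_PID"
  # The old process is not a child of this shell, so `wait` cannot be used.
  # Poll for up to 30 seconds, then force it.
  for _ in $(seq 1 30); do
    kill -0 "$OLD_PID" 2>/dev/null || break
    sleep 1
  done
  if kill -0 "$OLD_PID" 2>/dev/null; then
    echo "old instance did not exit in time; sending SIGKILL" >&2
    kill -KILL "$OLD_PID"
  fi
fi
The script uses set -euo pipefail to fail fast on errors. It records the old process ID, starts the new binary on a different port, and polls the health endpoint for up to 30 seconds, killing the new process and aborting if it never becomes healthy. Only then does it update the PID file. Once the load balancer has switched traffic, the script sends SIGTERM to the old process, which catches the signal, runs Shutdown, and exits. Because the old process was started by an earlier run of the script, it is not a child of this shell, and the wait builtin cannot wait for it. The script therefore polls with kill -0 and escalates to SIGKILL only if the process is still alive after 30 seconds, comfortably longer than the Go program's 15-second shutdown budget.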
Do not pass a *string for configuration values. A string value is already just a pointer to its bytes plus a length, so passing it by value is cheap, and a *string only adds indirection and a nil case to check. Keep your deployment configuration simple and explicit.
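The deploy script in this section passes --port to the binary, which assumes the program defines such a flag. A minimal sketch with the standard flag package follows. flag.String would hand you a *string, while flag.StringVar writes straight into a plain string field. Config, Addr, and parseConfig are illustrative names.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Config holds plain string values; no *string is needed.
type Config struct {
	Port string // hypothetical --port flag, matching the deploy script
}

// Addr returns the listen address in the form http.Server.Addr expects.
func (c Config) Addr() string { return ":" + c.Port }

// parseConfig reads flags from args (normally os.Args[1:]).
func parseConfig(args []string) (Config, error) {
	var cfg Config
	fs := flag.NewFlagSet("myapp", flag.ContinueOnError)
	fs.StringVar(&cfg.Port, "port", "8080", "TCP port to listen on")
	if err := fs.Parse(args); err != nil {
		return Config{}, err // fs has already printed the usage message
	}
	return cfg, nil
}

func main() {
	cfg, err := parseConfig(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("listening on", cfg.Addr())
}
```

The flag package treats --port and -port the same way, so the script's double-dash form works unchanged.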
Where things go wrong
Graceful shutdown fails when background work outlives the HTTP request. If your handler spawns a goroutine to process a message queue or write to a database, Shutdown does not know about it, because it tracks connections, not goroutines. Once Shutdown returns and main exits, the runtime stops that goroutine wherever it happens to be, so a half-finished write or an unacknowledged queue message is simply lost.
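Give background work its own cancellation path, and wait for it before exiting. Do not hand r.Context() to a goroutine that outlives the handler: net/http cancels the request context as soon as ServeHTTP returns, so the background work would be cancelled almost immediately, and Shutdown does not cancel any context for you either. Instead, derive the goroutine's context from a server-lifetime context that you cancel during shutdown, track the goroutines with a sync.WaitGroup, and wait on it after Shutdown returns. When you write helper types for this, follow the receiver-naming convention: the receiver name is usually one or two letters matching the type, as in (b *Buffer) Write(...), not (this *Buffer) or (self *Buffer). Idiomatic signatures let other developers recognize the pattern immediately.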
Another common mistake is ignoring the error returned by Shutdown. The compiler will not stop you: a bare srv.Shutdown(ctx) statement compiles, and you silently lose visibility into requests that timed out. The compiler only rejects the code, with a "declared and not used" error, if you assign the result to a new variable and never read it. A linter such as errcheck catches the bare call. Always log the error or return it.
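The process manager also has to cooperate. If your orchestrator or service manager sends SIGKILL instead of SIGTERM, the Go process never gets a chance to run Shutdown. SIGKILL cannot be caught or ignored; the kernel tears the process down immediately. Configure your container runtime or service manager to send SIGTERM first, wait for the configured timeout, and only then escalate to SIGKILL. Load balancer health checks cause trouble too if misconfigured: if the balancer keeps routing to an instance after its listener closes, clients see connection errors even though the shutdown itself was graceful.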
Goroutine leaks happen when a goroutine blocks forever on a channel send or receive that no other goroutine will ever complete. Always give it a cancellation path. Formatting, by contrast, needs no discussion: trust gofmt to handle indentation and spacing, and argue logic, not formatting. Most editors run it on save, so the team stays consistent without style debates.
Choosing your deployment strategy
- Use a simple SIGTERM plus Shutdown pattern when you deploy to virtual machines or bare metal and control the deployment script yourself.
- Use a load balancer with connection draining when you run behind AWS ALB, GCP Cloud Load Balancing, or Nginx. The proxy handles the traffic switch while your app drains.
- Use Kubernetes rolling updates when you run containers. The orchestrator sends SIGTERM, waits up to terminationGracePeriodSeconds (30 seconds by default), and only then sends SIGKILL.
- Use a blue-green deployment when you need instant rollback, since you keep the old environment fully running until you verify the new one.
- Use sequential process replacement when you run a single-instance service with no external load balancer, accepting a brief window of unavailability during the swap.
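The familiar Go guideline applies to deployment code as well: accept interfaces, return structs. A function that takes an http.Handler works with any router or test double, while returning a concrete *http.Server lets callers reach every field and method without type assertions. Whatever strategy you choose, log every shutdown error, because the worst goroutine bug is the one that never logs.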