The slow death of a service
You deploy a new feature. Traffic spikes. The server starts responding slowly. You check the metrics, and the goroutine count is climbing. It never comes down. The process eats more RAM until the OOM killer steps in. You didn't write a bug that crashes the app. You wrote a bug that quietly starves it. This is a goroutine leak.
What a leak actually is
A goroutine leak occurs when a goroutine starts but never terminates. It stays alive, holding its stack and the runtime's bookkeeping for it. The goroutine might be blocked on a channel, sleeping, or stuck in a loop. It doesn't matter why it's stuck. What matters is that it never exits.
Over time, leaked goroutines accumulate. Each one keeps its stack (a few kilobytes at minimum) and its runtime control structures. It also keeps alive every variable its closure captured and everything reachable from its stack, so the garbage collector can't reclaim any of it. A single leaked goroutine is harmless. A thousand leaked goroutines per minute will eventually exhaust your memory.
Think of a factory assembly line. Workers pick up parts, assemble them, and move to the next station. If a worker gets stuck at a station and never moves, they block the line. If the manager keeps hiring new workers but the stuck workers never leave, the factory floor fills up with idle people. Eventually, there's no room for new workers, and production stops. The factory isn't broken. It's just full of workers who forgot how to quit.
Goroutines are cheap. Leaks are expensive.
Minimal example
Here's the simplest leak: a goroutine waiting on a channel that never gets sent to.
package main

import (
	"fmt"
	"time"
)

func main() {
	// Unbuffered channel. Sends block until a receiver is ready.
	ch := make(chan int)

	// Spawn a goroutine that waits for data that never arrives.
	go func() {
		// Blocks here indefinitely. The goroutine is stuck.
		// Its stack and everything it references stay in memory.
		<-ch
		fmt.Println("Never reached")
	}()

	// Sleep to let the goroutine run.
	// In a real server, this is the lifetime of the process.
	time.Sleep(1 * time.Second)

	// Main exits. The leaked goroutine is abandoned.
	// In a long-running service, this goroutine would persist until restart.
	fmt.Println("Leaked goroutine is still alive")
}
The compiler accepts this code without warnings. Go doesn't perform static analysis of goroutine lifetimes. The runtime creates the goroutine and schedules it. The goroutine executes the receive operation on the channel. The channel is empty and unbuffered, so the receive blocks. The scheduler marks the goroutine as blocked and moves on to other work. The goroutine sits in memory, waiting for a value that will never arrive.
In a short-lived program, main exits and the process dies, hiding the leak. In a long-running server, this goroutine persists until the process restarts.
Realistic worker pattern
Leaks often hide in loops that process data. A worker function reads from a channel and does work. If the channel never closes and no cancellation signal arrives, the worker loops forever.
Here's a realistic worker pattern that avoids leaks by using context for cancellation.
package main

import (
	"context"
	"fmt"
)

// worker processes jobs until the context is cancelled or jobs is closed.
// It takes ctx as the first parameter, following Go convention.
func worker(ctx context.Context, jobs <-chan int) {
	// Loop runs until one of the exit paths below fires.
	for {
		select {
		// Cancellation is the exit path. select does not prioritize cases:
		// if a job is also ready, either case may be chosen.
		case <-ctx.Done():
			fmt.Println("Worker stopping")
			return
		// Receive a job. Blocks if the channel is empty.
		case job, ok := <-jobs:
			// ok is false if the channel is closed.
			if !ok {
				return
			}
			fmt.Printf("Processing job %d\n", job)
		}
	}
}

func main() {
	// Create a cancellable context.
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	jobs := make(chan int, 10)

	// Start the worker.
	go worker(ctx, jobs)

	// Send a job.
	jobs <- 1

	// Cancel the context to trigger the exit path.
	cancel()

	// In a real app, you'd wait for the goroutine to finish.
	// Here we just show the cancellation mechanism.
}
context.Context always goes as the first parameter, conventionally named ctx. Functions that take a context should respect cancellation and deadlines. If you wrap this logic in a struct, the receiver name should be one or two letters matching the type, like (w *Worker), not (this *Worker). The Go community relies on gofmt to standardize code style. Don't argue about indentation or braces. Let the tool decide. Most editors run gofmt on save.
Context is plumbing. Run it through every long-lived call site.
Common pitfalls and errors
Several patterns lead to leaks, and a few that look dangerous don't. Sending on a closed channel causes a panic, not a leak. The runtime stops the program with panic: send on closed channel. This is a hard error. Receiving from a closed channel never blocks: once any buffered values are drained, it returns the zero value immediately. Leaks happen when receivers wait on open channels that never receive data.
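The closed-channel behavior is easy to confirm with a small program. The helper names recvClosed and sendClosed are made up for this sketch, and the panic is recovered only so that the message can be printed.

```go
package main

import "fmt"

// recvClosed closes a channel and then receives from it.
func recvClosed() (int, bool) {
	ch := make(chan int)
	close(ch)
	v, ok := <-ch // returns immediately: zero value, ok == false
	return v, ok
}

// sendClosed sends on a closed channel and returns the recovered panic message.
func sendClosed() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	ch := make(chan int, 1)
	close(ch)
	ch <- 1 // panics even though the buffer has room
	return "no panic"
}

func main() {
	v, ok := recvClosed()
	fmt.Println(v, ok)        // 0 false
	fmt.Println(sendClosed()) // send on closed channel
}
```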
Buffered channels can also leak. If a buffer fills up, senders block. If no one receives from the buffer, senders stay blocked forever. Timers are a related trap. The time.After function creates a timer that fires once. If you call time.After inside a loop and another select case wins, the timer you just created is abandoned but not released. Before Go 1.23, the garbage collector could not reclaim it until it fired, so a hot loop with a long timeout piled up timers. From Go 1.23 on, unreferenced timers are collectable right away. On older toolchains, use time.NewTimer and call Stop to avoid the buildup.
Defer doesn't rescue leaks. The defer statement schedules a function call to run when the surrounding function returns. If the goroutine blocks forever, the function never returns. The deferred call never runs. Don't rely on defer to clean up resources if the goroutine might leak.
The community accepts the verbose if err != nil boilerplate because it makes the unhappy path visible. Use _ to discard values intentionally. result, _ := ... says you considered the second return value and chose to drop it. Use it sparingly with errors. Don't pass a *string. Strings are cheap to pass by value. Accept interfaces, return structs. This mantra keeps your code flexible and easy to test.
Most goroutine leaks come down to a goroutine blocked on a channel operation that can never complete: a receive that no one will send to or close, or a send that no one will receive. Always have a cancellation path.
Detecting leaks
You can detect leaks by monitoring the goroutine count. The runtime package exposes the current number of goroutines. Call runtime.NumGoroutine() to get the count. If the count keeps growing under load and doesn't fall back once the load subsides, you probably have a leak.
Here's how to check the count.
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Print the current number of goroutines.
	// Useful for monitoring in long-running processes.
	fmt.Println("Goroutines:", runtime.NumGoroutine())
}
You can also use pprof to inspect goroutine stacks. The net/http/pprof package exposes a debug endpoint. Hitting /debug/pprof/goroutine?debug=1 prints the stack traces grouped by identical stack, each with a count. With debug=2 it prints every goroutine individually, along with its state. Look for goroutines stuck in the same place. If you see hundreds of goroutines blocked on the same channel receive, you've found the leak.
The worst goroutine bug is the one that never logs.
Decision matrix
Use context.Context when you need to cancel a goroutine from the outside, like stopping a request handler when the client disconnects. Use a channel close when the sender knows all data is sent and receivers should stop processing. Use a done channel when you have a simple signal to stop a background task and don't need the full context API. Use a time.Ticker with Stop when you need periodic work that must stop, and a time.NewTimer with Stop for timeouts inside loops, so that timers don't pile up. Use a sync.WaitGroup when you need to wait for multiple goroutines to finish, but remember it doesn't stop them; it only waits.
Trust the cancellation path. If a goroutine can't stop, it's a leak waiting to happen.