How to Monitor Goroutine and Memory Usage in Production

Import net/http/pprof to expose live goroutine and memory metrics via a local HTTP endpoint.

The silent leak

A Go service runs smoothly for three weeks. Response times stay under fifty milliseconds. Then the CPU usage climbs to eighty percent. Memory creeps upward by two hundred megabytes every hour. The logs show no panics. The error counters stay flat. You restart the container and everything recovers, only to repeat the cycle a week later. The problem is not a single broken request. The problem is invisible accumulation. Goroutines are piling up on blocked channels. Slices are growing without eviction. The runtime is doing exactly what you told it to do, but you cannot see where the work is actually happening.

Visibility in Go does not require instrumentation libraries or bytecode weaving. The runtime already tracks every goroutine, samples heap allocations, and can record lock contention once you enable it. It just needs a door to show you the ledger.

What the runtime is actually tracking

Go manages concurrency and memory differently from languages that delegate everything to the operating system. The runtime maintains its own scheduler, its own garbage collector, and its own memory allocator. Goroutines are not OS threads. They are lightweight execution contexts that the scheduler multiplexes across a small pool of OS threads. When a goroutine blocks on I/O, the scheduler parks it and runs another one. When it wakes up, it resumes exactly where it left off.

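Memory follows a similar pattern. Small allocations are served from caches attached to each scheduler processor (P), not to each goroutine, so most of them need no lock. Large allocations bypass those caches and go straight to the central heap. The garbage collector runs concurrently with your code, marking live objects reachable from root pointers and sweeping unreachable memory. For profiling, the runtime records a sample of allocations together with the call stack that made each one, so it can report what is still live and where it was allocated. It does not record which goroutine holds a given object.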
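As a rough illustration of that ledger, the runtime's own counters can be read in-process. This is a minimal sketch: the runtimeSnapshot type and the snapshot helper are hypothetical names made up for this example, while runtime.NumGoroutine and runtime.ReadMemStats are standard library calls.

```go
package main

import (
	"fmt"
	"runtime"
)

// runtimeSnapshot is a hypothetical struct used only in this example.
type runtimeSnapshot struct {
	Goroutines  int    // goroutines that currently exist
	HeapAlloc   uint64 // bytes of allocated heap objects
	HeapObjects uint64 // number of allocated heap objects
	NumGC       uint32 // completed garbage collection cycles
}

// snapshot reads a few counters the runtime already maintains.
func snapshot() runtimeSnapshot {
	var m runtime.MemStats
	// ReadMemStats briefly stops the world, so avoid calling it in a tight loop.
	runtime.ReadMemStats(&m)
	return runtimeSnapshot{
		Goroutines:  runtime.NumGoroutine(),
		HeapAlloc:   m.HeapAlloc,
		HeapObjects: m.HeapObjects,
		NumGC:       m.NumGC,
	}
}

func main() {
	s := snapshot()
	fmt.Printf("goroutines=%d heap_alloc=%d heap_objects=%d gc_cycles=%d\n",
		s.Goroutines, s.HeapAlloc, s.HeapObjects, s.NumGC)
}
```

Logging these numbers on a timer is often enough to spot the steady climb described earlier before reaching for a full profile.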

Think of the runtime as a restaurant kitchen. Goroutines are line cooks. Memory is the counter space and the ingredients. The scheduler is the head chef moving people between stations. pprof is the manager's clipboard that shows exactly who is working, what they are holding, and where the bottleneck sits. You do not need to ask each cook to fill out a timesheet. The clipboard already exists. You just need to know how to read it.

The zero-configuration profiler

The standard library ships with a profiling server that requires zero custom code. Importing a single package registers HTTP handlers on the default multiplexer. Those handlers stream runtime data in a format that the go tool pprof command understands.

Here is the minimal setup to expose the profiler on a dedicated port:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
    go func() {
        // a nil handler means http.DefaultServeMux, which now carries the pprof routes
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()

    // application routes get their own mux so :8080 never serves /debug/pprof/
    app := http.NewServeMux()
    log.Fatal(http.ListenAndServe(":8080", app))
}

The blank identifier import is a deliberate Go convention. The underscore tells the compiler you care about the side effects of the package, not its exported names. The init function inside net/http/pprof registers /debug/pprof/ and a few sibling routes on the default mux, which together serve paths like /debug/pprof/goroutine, /debug/pprof/heap, and /debug/pprof/profile. You do not call any functions from that package. The import alone is enough.

Visit http://localhost:6060/debug/pprof/ in a browser to see the list of available endpoints. The goroutine link on that page returns the stack traces of active goroutines as text. The heap link returns a sampled snapshot of live allocations. Requesting the same paths without the debug query parameter returns the binary format that go tool pprof reads. The data is raw but detailed.

Goroutines are cheap. Channels are not magic.

How the data flows

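When you request a goroutine or heap profile, the runtime walks its internal data structures and serializes the results. Any pause this causes is typically measured in milliseconds, not seconds. The goroutine profile is a point-in-time snapshot taken when you ask for it. The heap profile is built from allocation samples the runtime collects continuously (by default roughly one sample per 512 KiB allocated), and it reports them as of the most recently completed garbage collection.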

The goroutine profile lists every active goroutine along with its call stack. If a goroutine is blocked on a channel send, the stack shows the channel operation. If it is waiting on a mutex, the stack shows the lock acquisition. The heap profile shows which functions allocated the most memory, how much of it is still in use (inuse_space), and how much was allocated in total since the process started (alloc_space). To see what changed between two snapshots, pass the older one to go tool pprof with the -base flag.

The CPU profile works differently. It samples the program counter every ten milliseconds and records the stack trace at that moment. Over thirty seconds, a busy program yields thousands of samples. The profiler aggregates them and shows which functions consume the most CPU time. This is why the CPU profile endpoint takes a duration parameter, ?seconds=30, and falls back to thirty seconds when you omit it.

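You can also control what the runtime exposes. Profiler labels, available through runtime/pprof since Go 1.9, let you attach key-value metadata to a goroutine for debugging. In production, those labels can leak internal state. Recent Go releases provide a GODEBUG setting, tracebacklabels, that controls whether labels appear in stack traces. Check the GODEBUG documentation for your Go version to confirm that it is supported and what its default is, then set GODEBUG=tracebacklabels=0 before starting the binary if you need labels stripped. The setting takes effect without code changes.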

The profiler gives you the truth. Your job is to ask the right questions.

Monitoring in a real service

Running the profiler on the default mux works for local development. Production services need separation, authentication, and programmatic access. You typically mount the profiler on a custom router, restrict it to internal networks, and fetch profiles via CLI or CI pipelines.

Here is how a production-ready setup looks:

package main

import (
    "context"
    "log"
    "net/http"
    "net/http/pprof"
    "os"
    "os/signal"
    "time"
)

func main() {
    mux := http.NewServeMux()

    // mount profiler under a dedicated prefix to avoid routing collisions;
    // pprof.Index also serves named profiles such as goroutine and heap
    mux.HandleFunc("/debug/pprof/", pprof.Index)
    mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
    mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
    mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
    mux.HandleFunc("/debug/pprof/trace", pprof.Trace)

    go func() {
        // bind to loopback only so external traffic cannot reach it
        log.Println(http.ListenAndServe("127.0.0.1:6060", mux))
    }()

    // ctx is cancelled when the process receives an interrupt (Ctrl+C)
    ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
    defer stop()

    go runService(ctx)

    // wait for interrupt signal before exiting
    <-ctx.Done()
}

func runService(ctx context.Context) {
    // context always goes first so cancellation propagates cleanly
    ticker := time.NewTicker(5 * time.Second)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            log.Println("heartbeat")
        }
    }
}

The custom mux keeps profiling routes isolated from your application routes. pprof.Index also serves the named profiles, so /debug/pprof/goroutine and /debug/pprof/heap work without their own registrations. Binding to 127.0.0.1 prevents accidental exposure, but it is the only access control in this example. The context.Context parameter follows the standard convention: it appears as the first argument, conventionally named ctx, and functions that accept it should respect cancellation and deadlines. The verbose if err != nil pattern is not shown here, but in real handlers you would check every I/O call and return early. The community accepts the boilerplate because it makes the unhappy path visible.

To fetch a profile programmatically, point go tool pprof directly at the endpoint URL, or download the profile with curl or wget and pass the saved file to go tool pprof. Either way, the command opens an interactive terminal where you can type top to see the hottest functions, list followed by a function name to view annotated source code, or web to render a graph (web needs Graphviz installed). The tool reads the same format that the HTTP endpoints serve.

Trust the numbers. Ignore the noise.

When things go sideways

Profiling reveals problems, but it also introduces new failure modes if you treat it like a firehose. The most common mistake is exposing the profiler to the public internet. Anyone who can reach it can request CPU profiles and execution traces back to back, adding overhead to your service, and download your stack traces and the process command line. Always bind to localhost or restrict access with a reverse proxy and IP allowlists.

Goroutine leaks are the second trap. A goroutine that waits on an unbuffered channel will block forever if the sender never arrives. The goroutine dump will show thousands of identical stacks parked in the [chan receive] state. The fix is not to add more memory. The fix is to ensure every channel has a close path or a context timeout. The worst goroutine bug is the one that never logs.

Memory retention follows a similar pattern. Slices that grow without bounds, maps that never evict keys, or caches that hold onto request bodies will show up in the heap profile as steady growth. The garbage collector cannot free memory that your program still references. You must break the reference chain.
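A small, common instance of this is keeping a short sub-slice of a large buffer, which pins the whole backing array. The sketch below contrasts that with copying the bytes you need; retainPrefix and copyPrefix are hypothetical helpers, and the 1 MiB body and 16 byte prefix are arbitrary sizes.

```go
package main

import "fmt"

// retainPrefix keeps a sub-slice of body. The result shares body's backing
// array, so the entire buffer stays reachable for as long as the result does.
func retainPrefix(body []byte, n int) []byte {
	return body[:n]
}

// copyPrefix copies the first n bytes into a fresh slice, which breaks the
// reference to the large buffer and lets the collector reclaim it.
func copyPrefix(body []byte, n int) []byte {
	return append([]byte(nil), body[:n]...)
}

func main() {
	body := make([]byte, 1<<20) // stand-in for a large request body

	pinned := retainPrefix(body, 16)
	copied := copyPrefix(body, 16)

	fmt.Println(len(pinned), cap(pinned) == cap(body)) // 16 true
	fmt.Println(len(copied), cap(copied) < cap(body))  // 16 true
}
```

In a heap profile, the pinned variant shows up as the full buffer size attributed to wherever the body was allocated, even though only sixteen bytes are in use.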

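Errors will appear if you misuse the profiler. The seconds parameter on the CPU profile endpoint is optional and defaults to thirty seconds, but a duration longer than the server's WriteTimeout is rejected with 400 Bad Request, and a CPU profile requested while another one is already running fails with a 500 error. A long-running handler that uses the request context will see context canceled when the client disconnects. Treat that as a signal to stop work, not as a bug. The compiler rejects an unused named import with imported and not used, but a blank import is exempt, so a leftover _ "net/http/pprof" keeps registering handlers on the default mux without any warning.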

Convention matters here. Receiver names should be one or two letters matching the type: (s *Server) ServeHTTP(...), not (this *Server). Public names start with a capital letter. Private names start lowercase. There are no public or private keywords. The language relies on visibility rules, not decorators. Follow the pattern and your code will read like every other Go codebase.

The profiler shows you what is alive. Your job is to decide what should die.

Picking the right tool

Profiling is powerful, but it is not the only way to observe a Go service. Different tools answer different questions. Choose based on what you need to measure and how often you need to measure it.

Use pprof when you need a deep, point-in-time snapshot of goroutines, heap allocations, or CPU usage. Use Prometheus metrics when you need continuous, aggregated counters and histograms that feed into dashboards and alerting rules. Use OpenTelemetry tracing when you need to follow a single request across multiple services and see latency breakdowns by span. Use structured logging when you need human-readable context for specific errors or business events. Use plain sequential code when you do not need concurrency: the simplest thing that works is usually the right thing.

Each tool has a cost. pprof adds overhead while a profile is being collected and produces large outputs. Metrics cost a little on every update and consume memory in proportion to label cardinality. Tracing adds context propagation and network calls. Logging adds I/O and storage costs. Pick the tool that matches the question. Do not instrument everything.

Observability is a discipline, not a library.

Where to go next