How to Use pprof for Production Profiling in Go

Use runtime/pprof to capture profiles and go tool pprof to analyze performance bottlenecks in Go applications.

When logs stop telling the story

You deploy a new service. It passes all tests. It handles the first thousand requests without breaking a sweat. Then traffic doubles. Latency climbs. CPU usage hits 85 percent. You add more logs, but logs only tell you what happened, not why it took so long. You need to see inside the running process. That is where pprof comes in.

Go ships with profiling built into the standard library. You do not need third-party agents or compiler flags. The runtime/pprof package lets you capture CPU, memory, goroutine, and mutex profiles at runtime. The net/http/pprof package exposes those profiles over HTTP so you can pull them from a live server. The go tool pprof CLI parses the captured data and turns it into an interactive call graph.

Profiling is not magic. It is statistical sampling. The runtime interrupts your program at regular intervals, captures the current stack trace, and records which function was executing. Over time, the samples accumulate into a map of where time and memory actually go.

Sampling is statistical. It misses the exact microsecond, but it catches the pattern.

How sampling actually works

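The CPU profiler samples at 100 Hz by default, which works out to one sample per 10 milliseconds of CPU time. When the timer fires, the operating system delivers a signal (SIGPROF on Unix systems), the runtime walks the call stack of whatever is running on that thread, and writes the frame addresses to a buffer. The overhead is typically small, often quoted at a few percent or less, which is why short captures are generally considered safe in production. Measure it on your own workload before leaving the profiler on continuously. The data is written as a gzip-compressed protobuf. That format stores the stack traces, the sample counts, and metadata like the sampling period and build ID.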

You can change the sampling rate with runtime.SetCPUProfileRate. Because pprof.StartCPUProfile requests 100 Hz itself, a custom rate has to be set before calling it, and the Go documentation recommends that most programs leave the rate alone. A higher rate gives finer resolution but adds more overhead. A lower rate reduces overhead but might miss short-lived hot paths. Most teams stick to the default 100 Hz. The runtime handles the signal routing and stack unwinding automatically. You just point it at a file or an HTTP response writer.

The binary file is just a list of stack traces with sample counts. Treat it like raw sensor data.

Capture a CPU profile in a script

Here is the simplest way to profile a standalone program. You create a file, start the sampler, run your workload, and stop the sampler.

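package main

import (
	"fmt"
	"log"
	"os"
	"runtime/pprof"
	"time"
)

func main() {
	// Open a file to store the binary profile data
	f, err := os.Create("cpu.prof")
	if err != nil {
		log.Fatal(err)
	}
	// Registered first, so it runs last: close the file after the final flush
	defer f.Close()

	// Start sampling at the default 100 Hz and write to the file
	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	// Stop the sampler and flush buffered samples on exit
	defer pprof.StopCPUProfile()

	// Burn CPU for about five seconds; a sleeping program yields no samples
	sum := 0
	for start := time.Now(); time.Since(start) < 5*time.Second; {
		for i := 0; i < 1_000_000; i++ {
			sum += i % 7
		}
	}
	fmt.Println("checksum:", sum)
}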

Run the program with go run main.go. After five seconds, you will find a cpu.prof file in your directory. At 100 Hz, a single busy goroutine yields about 500 stack samples in that time. Samples are recorded only while the CPU is executing code, so time spent sleeping or blocked contributes nothing to the profile.

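The defer pprof.StopCPUProfile() call is mandatory. StopCPUProfile returns only after all buffered samples have been written, so if you skip it the profile ends up truncated or empty. It does not close the file. That is a separate f.Close() call, and it has to run after the profiler stops. Deferred calls are skipped when the program exits through os.Exit or log.Fatal, so stop the profiler explicitly on those paths. Go leaves flushing and closing I/O handles to you rather than to the garbage collector.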

Always stop the profiler before the process exits. Unflushed buffers corrupt the data.

Read the data with the CLI

The go tool pprof command reads the binary file and opens an interactive terminal interface. Run go tool pprof cpu.prof and you will see a prompt. Type top to list the functions consuming the most CPU time. The output has five columns: flat, flat%, sum%, cum, and cum%. Flat is the time spent directly inside that function. Cum (cumulative) includes time spent in functions called by that function. Sum% is the running total of flat% down the list.
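The difference between flat and cumulative is easier to see on a toy data set. The sketch below is not how pprof is implemented; it only applies the two definitions to a handful of made-up stacks, and the sample type and flatAndCum function are hypothetical.

```go
package main

import "fmt"

// sample is one captured stack, ordered from root to leaf.
type sample []string

// flatAndCum counts, per function, the samples where it was the leaf (flat)
// and the samples where it appeared anywhere on the stack (cum).
func flatAndCum(samples []sample) (flat, cum map[string]int) {
	flat = make(map[string]int)
	cum = make(map[string]int)
	for _, s := range samples {
		if len(s) == 0 {
			continue
		}
		flat[s[len(s)-1]]++
		seen := make(map[string]bool) // count recursive frames once per sample
		for _, fn := range s {
			if !seen[fn] {
				seen[fn] = true
				cum[fn]++
			}
		}
	}
	return flat, cum
}

func main() {
	samples := []sample{
		{"main", "handle", "parse"},
		{"main", "handle", "parse"},
		{"main", "handle"},
		{"main", "handle", "encode"},
	}
	flat, cum := flatAndCum(samples)
	for _, fn := range []string{"main", "handle", "parse", "encode"} {
		fmt.Printf("%-7s flat=%d cum=%d\n", fn, flat[fn], cum[fn])
	}
}
```

In this data, handle has a flat count of 1 but a cumulative count of 4: it is rarely the function actually on the CPU, yet every sample passes through it. A large gap between the two usually means the cost lives in a callee.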

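Type list main to see your source code annotated with sample values; the argument is a regular expression matched against function names. Lines with higher numbers were on the CPU more often during the sampling window. Type web to open a call graph in your browser (this needs Graphviz installed), or run go tool pprof -http=:8080 cpu.prof to launch a web UI with flame graphs. In pprof's flame graph view the root sits at the top and callees stack beneath their callers. The wider the block, the more samples it appears in. Hot paths show up as wide columns, and a wide block with little underneath it is where the time is actually spent.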

The CLI does not modify your code. It only reads the snapshot and builds a graph.

Mount pprof on a production server

In production, you rarely run a script that sleeps. You run a long-lived HTTP server. Go provides net/http/pprof, which registers handlers under /debug/pprof/ on http.DefaultServeMux when the package is imported. You serve that mux on a local address and pull profiles on demand.

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof"
)

func main() {
	// The blank import runs the pprof package's init function, which
	// registers /debug/pprof/* on http.DefaultServeMux
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		// Return a simple 200 OK for load balancer checks
		w.WriteHeader(http.StatusOK)
	})

	// Bind to localhost only to prevent external network access
	log.Println("pprof listening on localhost:6060")
	// A nil handler means http.DefaultServeMux, where pprof registered itself
	if err := http.ListenAndServe("localhost:6060", nil); err != nil {
		log.Fatal(err)
	}
}

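The _ "net/http/pprof" import uses the blank identifier to trigger the package's init() function without importing a named symbol. This is a deliberate Go pattern for side-effect imports. The init() function registers its handlers on http.DefaultServeMux, which then serves paths like /debug/pprof/profile, /debug/pprof/heap, and /debug/pprof/goroutine. If you pass your own ServeMux to the server, those routes are not reachable unless you register them on that mux yourself.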

To capture a 30-second CPU profile from the running server, run curl -o live.prof "http://localhost:6060/debug/pprof/profile?seconds=30". Quote the URL so the shell does not try to interpret the question mark. The server holds the request open for 30 seconds while it samples the CPU, then sends the binary profile back, and curl writes it to live.prof. You can then feed it to the CLI: go tool pprof live.prof.

For visual analysis, run go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile. The tool connects to the server, waits for the capture to finish (30 seconds by default), downloads the profile, and opens a browser window with interactive flame graphs. You can click any block to focus on it, switch to the source view to see annotated lines, or filter by package. To compare against an earlier capture, pass it with the -diff_base flag.

Localhost is your firewall. Never bind pprof to 0.0.0.0 or expose it behind a public load balancer.

Memory, mutex, and block profiles

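CPU profiling shows where time goes. Memory profiling shows where allocations live. Run curl -o heap.prof http://localhost:6060/debug/pprof/heap to capture a heap profile. By default pprof displays it as in-use memory, meaning objects that were still live at the last garbage collection. To see everything allocated since the process started, including memory that has already been freed, use /debug/pprof/allocs. It carries the same data but defaults to the allocation view. Both are sampled: by default the runtime records roughly one allocation per 512 KB allocated (runtime.MemProfileRate), so the numbers are estimates rather than a count of every make and new call. The allocation view helps you spot functions that churn memory unnecessarily.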

Before capturing a heap profile, consider calling runtime.GC() to force a garbage collection cycle. The heap profile reports statistics as of the most recently completed collection and leaves out newer allocations, so without a fresh cycle the snapshot can lag behind what the program is doing now. A stale picture can make you optimize the wrong functions.

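Mutex and block profiles track contention. The /debug/pprof/mutex endpoint shows which lock holders kept other goroutines waiting. The /debug/pprof/block endpoint shows where goroutines blocked on synchronization primitives such as channels, select statements, and mutexes. It does not cover time spent waiting on network I/O. Both profiles are off by default and stay empty until you enable them with runtime.SetMutexProfileFraction and runtime.SetBlockProfileRate. These profiles are invaluable when your CPU is idle but latency is high. They reveal synchronization bottlenecks that CPU sampling misses entirely.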

A profile without context is just noise. Run the same workload in production and staging to isolate environment-specific overhead.

Common pitfalls and runtime errors

Profiling is straightforward, but a few patterns trip up new users. The most common mistake is exposing the /debug/pprof/ routes to the internet. Attackers can trigger CPU spikes, force memory allocations, or read process memory layouts. Always bind to localhost or restrict access with a reverse proxy and IP allowlists.

Another trap is mishandling the file you pass to StartCPUProfile. StopCPUProfile does not close it. If you skip defer f.Close() in a process that captures profiles repeatedly, descriptors leak until you hit the file descriptor limit and the process starts failing with too many open files. Order matters as well: the file has to stay open until the profiler has stopped. Deferred calls run last-in, first-out, so register defer f.Close() before defer pprof.StopCPUProfile(). If the file is closed first, the final flush fails with file already closed and the profile ends up truncated or empty.

Memory profiling benefits from explicit GC coordination. The heap profile reflects the state as of the most recently completed collection, so a capture taken during a heavy allocation phase can lag behind what the process is doing. Go's collector also does not compact the heap, so in-use figures will not match resident memory exactly. Call runtime.GC() right before the capture, or add the ?gc=1 query parameter on the heap endpoint to run a collection first.

The compiler will complain with imported and not used if you import net/http/pprof without the blank identifier. Adding _ tells the compiler you intentionally want the side effects. This is standard Go convention for packages that register handlers or drivers during initialization.

Goroutine leaks often show up in /debug/pprof/goroutine. If a goroutine waits on a channel that never closes, it stays in the profile forever. Always design a cancellation path for long-running goroutines, usually through context.Context. Context is plumbing. Run it through every long-lived call site.

When to reach for pprof

Use runtime/pprof.StartCPUProfile when you need a focused, time-bound CPU snapshot for a specific batch job or integration test. Use net/http/pprof when you run a long-lived HTTP service and want on-demand profiling without restarting the process. Use go tool pprof -http when you need a visual flame graph to spot hot paths across multiple packages. Use continuous profiling agents like Parca or Pyroscope when you need historical trend data across a fleet of services. Use plain logging and metrics when you only need to track request counts or latency percentiles, not function-level overhead.

Profiling is a flashlight, not a hammer. Point it at the dark spots, then fix the code.

Where to go next