Profile with pprof

Your Go service handles requests fine during the day. At 2 AM, traffic drops, but the CPU usage stays pinned at 100%. Or maybe memory usage climbs slowly over three days until the OOM killer steps in. You added fmt.Println everywhere, but the logs just show normal flow. Printing logs changes timing and hides the real bottleneck. You need to see what the machine is actually doing, not what your code claims to do.

How profiling works

Profiling measures performance by sampling the program state. The runtime/pprof package provides the tools to capture these samples. It works like a high-speed camera pointed at your execution flow. Instead of instrumenting every function call, which adds massive overhead, the profiler takes snapshots at regular intervals.

For CPU profiling, the runtime sets a timer. By default the timer fires about 100 times per second, so roughly every 10 milliseconds. On each tick the runtime interrupts whatever code is running, records the current stack trace, and adds it to the profile. If a function appears in many of those stack traces, that function is consuming CPU time. How often a function appears is a statistical estimate of its execution cost, and the estimate gets better as the profile collects more samples.

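Memory profiling tracks allocations. It records where objects are created and how much space they occupy. It is sampled too: by default the runtime records roughly one allocation for every 512 KiB allocated, a rate controlled by runtime.MemProfileRate. The data helps find leaks and identifies hot allocation paths. Goroutine profiling dumps the stack traces of every goroutine that currently exists, whether it is running or blocked, which is essential for debugging deadlocks or stuck workers.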

Sampling is the key design choice. Tracing records every event. Sampling records a fraction. Sampling scales to high-throughput systems because the overhead stays small and roughly constant no matter how busy the program is. Tracing every event produces far more data and can slow the program down noticeably. Go's profilers sample for that reason.

Minimal CPU profile

Here's the simplest way to capture a CPU profile: open a file, start sampling, run the workload, stop sampling, and close the file.

package main

import (
	"os"
	"runtime/pprof"
)

func main() {
	// Open file for profile output.
	f, err := os.Create("cpu.prof")
	if err != nil {
		panic(err)
	}
	// Defer close to release the file handle.
	defer f.Close()

	// Start CPU sampling.
	// Runtime records stack traces at fixed intervals.
	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	// Stop sampling after workload.
	defer pprof.StopCPUProfile()

	// Run workload.
	// Profiler captures samples during this loop.
	for i := 0; i < 1e9; i++ {
		_ = i * i
	}
}

Profiling is sampling, not simulation. Trust the samples, but verify with code.

Analyzing the data

The profile file contains raw sample data. You need the pprof tool to make sense of it. The tool ships with the Go distribution. Run go tool pprof cpu.prof to start the interactive shell.

The shell gives you a command prompt. Type top to see the functions consuming the most time. By default the output is sorted by flat time. The flat column shows self time, which is time spent in the function's own code. The cum column shows cumulative time, which also includes time spent in the functions it calls. A function with low flat but high cum spends most of its time in its callees.
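A small workload makes the difference between self time and cumulative time easier to see. In the sketch below, leaf and parent are hypothetical functions invented for the example: leaf does all the arithmetic, while parent only calls leaf. After profiling it, one would expect leaf to carry most of the self time and parent to show a large cumulative time with little of its own. Exact numbers vary by machine, so none are shown.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// leaf does the arithmetic itself, so samples taken here count as
// leaf's own (self) time. noinline keeps it visible as its own frame.
//
//go:noinline
func leaf(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// parent does almost no work of its own. Its cumulative time is
// dominated by the calls to leaf.
//
//go:noinline
func parent(rounds, n int) int {
	total := 0
	for r := 0; r < rounds; r++ {
		total += leaf(n)
	}
	return total
}

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	// Workload sizes are arbitrary; they only need to run long
	// enough for the profiler to collect samples.
	fmt.Println(parent(100, 5_000_000))
}
```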

Type list FunctionName to see the source code annotated with sample counts. The argument is a regular expression, so a partial name works too. The tool highlights the hot lines. This bridges the gap between the abstract profile and the concrete code. You can see exactly which line, such as a loop body or a branch, is burning cycles.

Type web to generate a graph. The tool uses Graphviz to draw the call graph, so Graphviz must be installed. The graph shows functions as nodes and calls as edges. Larger nodes and thicker edges account for a larger share of the samples. This visual view helps spot unexpected call paths or deep recursion.

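You can also run go tool pprof -http=:8080 cpu.prof to launch a web interface. The browser shows flame graphs, top lists, and source code in one place. Flame graphs are particularly useful. Each rectangle represents a function. The width is proportional to the CPU time spent in that function and everything it calls. The vertical position shows the call hierarchy: pprof draws callers above callees, with the root at the top. Look for wide rectangles that have no equally wide rectangle beneath them, because that is where the time is actually spent.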

Profiling data tells you where the time goes. It does not tell you why. Always correlate profile results with code review.

Realistic server profiling

Long-running services rarely write profiles to files from inside their own code. Instead, they expose the built-in HTTP handlers that ship with the net/http/pprof package. This lets you grab profiles on demand without restarting the server.

package main

import (
	"net/http"
	_ "net/http/pprof" // Blank import registers handlers.
)

func main() {
	// The blank import above registered the /debug/pprof/* handlers
	// on http.DefaultServeMux.

	// Start server. Passing nil selects http.DefaultServeMux.
	// Profiles available at localhost:8080/debug/pprof/profile
	// ListenAndServe always returns a non-nil error when it stops.
	if err := http.ListenAndServe(":8080", nil); err != nil {
		panic(err)
	}
}

The blank import _ "net/http/pprof" is a standard convention. The underscore tells the compiler you want the side effects of the import, not the package name. The package registers its HTTP handlers on http.DefaultServeMux in its init function, which is why passing nil to ListenAndServe is enough to serve them. This pattern appears in database drivers and other initialization-heavy packages.

The server now responds to requests on /debug/pprof/. The path /debug/pprof/profile?seconds=30 captures a 30-second CPU profile. The path /debug/pprof/heap captures the current heap state. The path /debug/pprof/goroutine?debug=2 dumps all goroutine stacks.

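You can point the tool directly at the endpoint. Run go tool pprof "http://localhost:8080/debug/pprof/profile?seconds=30" to analyze a live profile. The quotes keep the shell from interpreting the ? character. The tool waits for the 30-second capture to finish, downloads the data, and drops you into the interactive shell.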

Expose pprof behind authentication. Never serve raw profiles to the public internet.

Memory and goroutine profiles

CPU profiling finds hot functions. Memory profiling finds leaks and allocation pressure. The heap profile shows allocations, but garbage collection reclaims memory. The profile reports statistics as of the most recently completed garbage collection, so it can lag behind what the program is doing right now. If you want an up-to-date view of what is live, trigger a GC before writing the profile. With the HTTP handler, /debug/pprof/heap?gc=1 does this for you.

Use pprof.WriteHeapProfile to capture memory data. This function writes the heap profile to an io.Writer. It is often used in tests or custom handlers. The profile distinguishes between inuse_space and alloc_space. Inuse space shows memory currently held by live objects. Alloc space shows the total bytes allocated since the program started, including memory that has already been freed. High alloc space with low inuse space means the program churns through short-lived objects, which keeps the GC busy. Inuse space that keeps growing means objects are staying alive longer than intended.

Goroutine profiling helps debug concurrency issues. A goroutine leak happens when a goroutine blocks forever, for example on a channel that nobody ever sends to or closes. The goroutine stays in memory, along with everything it references. Over time, the leak can exhaust memory or file descriptors.

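Use pprof.Lookup("goroutine").WriteTo to dump goroutine stacks. This captures the state of every goroutine. The second argument selects the format: 1 groups identical stacks with a count, and 2 prints every goroutine separately along with its wait state. Analyze the output to find goroutines stuck in the same function. If hundreds of goroutines are blocked on the same channel operation, you likely have a leak or a deadlock.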

The worst goroutine bug is the one that never logs. Profile goroutines when the process behaves strangely.

Pitfalls and errors

Profiling introduces overhead. The sampling timer interrupts your program. In a latency-sensitive service, profiling can make the program slower, which changes the behavior you are trying to measure. This is the observer effect. Always profile in an environment that matches production load, but be aware the numbers will be slightly worse than reality.

Memory profiling requires care. The heap profile reflects the heap as of the last completed garbage collection, while allocations and frees happen continuously. If you capture a profile during a burst of activity, the data might look noisy or out of date. Trigger a GC before capturing to get a stable view of live objects. Use runtime.GC() to force collection.

Errors in profiling code are usually setup errors. If you call StartCPUProfile twice without stopping, the second call returns an error reporting that CPU profiling is already in use. Check the error return value to avoid silent failures. A nil writer is another common mistake, usually caused by ignoring the error from os.Create. The compiler does not catch a nil value passed where an io.Writer is expected. A nil *os.File makes every write fail, so you end up with no usable profile. A nil interface value panics with a nil pointer dereference when the profile is written.

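If you forget to stop the profile, the sampling continues until the program exits. The profile is only guaranteed to be fully written once StopCPUProfile returns, so the file is likely to be empty or incomplete. Always defer StopCPUProfile immediately after starting. Keep in mind that deferred calls do not run when the program ends through os.Exit or log.Fatal.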

Profiling changes performance. Measure the symptom, not the cure.

When to use what

Use runtime/pprof with a file handle when you are profiling a test case or a command-line tool that runs a finite workload and exits. Use the net/http/pprof package when you are running a web server and need to capture profiles on demand via HTTP requests. Use go tool pprof -http when you have a profile file and want to explore the data interactively in a browser with flame graphs. Use pprof.Lookup("goroutine").WriteTo when you need to dump the current stack traces of all goroutines to debug a deadlock or leak. Use external tracing tools like OpenTelemetry when you need to correlate performance data across multiple services in a distributed system.

Profile early, profile often, but never optimize without data.

Where to go next

Profiling with pprof records how your Go program uses CPU or memory while it runs. It helps you find slow functions or memory leaks so you can make your code faster. Think of it like a fitness tracker for your software that shows exactly where it is working hardest.