How to Profile a Go Program with pprof

The server fans are screaming

You wrote a function. It passes the tests. It handles the edge cases. You deploy it to production and immediately the server CPU hits 100 percent. The fans spin up. The response times climb. You stare at the code and guess which loop is too slow. Guessing is how you waste days. Profiling is how you find the truth in seconds.

Go ships with a profiler built into the standard library. You do not need third-party tools to see where your program spends its time. The runtime/pprof package captures a snapshot of your program's activity, and the go tool pprof command turns that snapshot into a map of the hot spots.

Sampling, not tracing

Profiling in Go relies on sampling. The runtime does not record every single instruction your program executes. That would be tracing, and tracing adds too much overhead. Instead, the profiler interrupts your program at a fixed interval, looks at the current call stack, and records it.

Think of it like a security camera that takes a photo every ten seconds. If you see the same person in the frame every time, they are the one doing the work. If you see a different person every time, the work is spread out. The CPU profiler takes its samples 100 times per second by default, one every 10 milliseconds. The functions that appear most often in the samples are the ones burning your CPU cycles.

This approach is fast enough to run on production systems with minimal impact. It gives you a statistical view of performance. You get enough data to find the bottleneck without slowing the program down to a crawl.
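A toy simulation can make the statistics concrete. This is not how the runtime is implemented. It is a sketch built on made-up assumptions. The hypothetical workload is a timeline with one entry per millisecond, each entry naming the function on the CPU at that moment. The time is split 80/20 between two hypothetical functions, parseJSON and writeLog. Checking the timeline only every 10 milliseconds, which matches the 100 Hz rate, recovers that split without recording every millisecond.

```go
package main

import (
	"fmt"
	"sort"
)

// buildTimeline returns a hypothetical workload: one entry per millisecond,
// naming the function that was on the CPU during that millisecond.
// parseJSON runs 80% of the time and writeLog runs 20% of the time.
func buildTimeline() []string {
	var timeline []string
	for cycle := 0; cycle < 10; cycle++ {
		for ms := 0; ms < 80; ms++ {
			timeline = append(timeline, "parseJSON")
		}
		for ms := 0; ms < 20; ms++ {
			timeline = append(timeline, "writeLog")
		}
	}
	return timeline
}

// sampleTimeline looks at the timeline only once every intervalMS
// milliseconds, the way a sampling profiler checks the running stack
// periodically instead of recording every instruction.
func sampleTimeline(timeline []string, intervalMS int) map[string]int {
	counts := make(map[string]int)
	for t := 0; t < len(timeline); t += intervalMS {
		counts[timeline[t]]++
	}
	return counts
}

func main() {
	// 10 ms between samples matches a 100 Hz sampling rate
	counts := sampleTimeline(buildTimeline(), 10)

	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	// most-sampled function first, like the top of a profile
	sort.Slice(names, func(i, j int) bool { return counts[names[i]] > counts[names[j]] })
	for _, name := range names {
		fmt.Printf("%-10s %d samples\n", name, counts[name])
	}
}
```

The timeline here repeats perfectly, so the counts come out exact. Real samples land on an irregular workload. The proportions you get from them are estimates, and the estimates improve the longer the profile runs.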

The bare minimum

Here's the simplest way to capture a CPU profile. You open a file, tell the runtime to start sampling, run your code, and then stop sampling.

package main

import (
	"os"
	"runtime/pprof"
)

func main() {
	// open file for writing the profile data
	f, err := os.Create("cpu.prof")
	if err != nil {
		panic(err)
	}
	// ensure file closes even if main panics
	defer f.Close()

	// tell the runtime to start sampling the stack
	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	// stop sampling and flush the data to the file
	defer pprof.StopCPUProfile()

	// your code here
}

The os.Create call returns an error. Many quick scripts discard it with an underscore, but checking it is safer. If the file fails to open, the profile is never written and you get no data. The defer statements ensure the profiler stops and the file closes even if your code panics. Deferred calls run in reverse order, so StopCPUProfile runs before f.Close and the profile is fully written before the file is closed. Only one CPU profile can run at a time. Calling StartCPUProfile again before StopCPUProfile returns an error instead of starting a second profile, and in the example above that error reaches panic(err). Always pair the start and stop calls.

Run the program. It produces a file named cpu.prof. This file is binary. You cannot read it with a text editor. You need the go tool pprof command to parse it.

go tool pprof cpu.prof

This command opens an interactive shell. Type top to see the functions that consumed the most CPU. Type web to render the call graph in your browser, which requires Graphviz to be installed. Type quit to exit. The tool reads the binary profile and builds a call graph from the sampled stacks.
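An empty main produces an empty profile. You need code that actually burns CPU before top has anything to show. Here is a minimal sketch that fills the "your code here" slot with a deliberately hot, hypothetical function named mix. The loop count is an arbitrary choice meant to run for roughly a second on a typical machine.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// mix is a hypothetical CPU-heavy function. The exact math does not
// matter; the multiply-add loop just keeps the CPU busy, and the result is
// returned so the compiler cannot skip the work.
func mix(iterations int) uint64 {
	h := uint64(1)
	for i := 0; i < iterations; i++ {
		h = h*6364136223846793005 + uint64(i) // wraps around on overflow
	}
	return h
}

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	// printing the result keeps the call from being optimized away
	fmt.Println(mix(1_000_000_000))
}
```

After you run it, go tool pprof cpu.prof followed by top should list main.mix at or near the top. For a one-off look without the interactive shell, go tool pprof -top cpu.prof prints the same table and exits.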

How the data flows

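The runtime buffers samples while the program runs, and a background goroutine in runtime/pprof collects them. When you call StopCPUProfile, sampling stops and the profile is written out. StopCPUProfile returns only after all the writes for the profile have completed, so the file is not a complete profile until then. StopCPUProfile does not close the file, which is why the example closes it separately.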

The go tool pprof command reads the file and aggregates the samples. It groups samples by function. It calculates two numbers for each function: flat time and cumulative time. Flat time is the time spent inside the function itself. Cumulative time is the time spent inside the function plus all the functions it calls.

If a function has high flat time, it is doing work. If a function has high cumulative time but low flat time, it is calling other functions that do the work. This distinction matters. You might see a function at the top of the list and assume it is slow. It might just be the entry point that calls the slow function. Look at the flat time to find the actual bottleneck.
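The aggregation can be sketched in a few lines, assuming each sample is a call stack listed from root to leaf. The stacks and function names below are hypothetical. Flat counts a sample only for the leaf, which is the function actually running when the sample was taken. Cum counts the sample for every function on the stack, once per sample.

```go
package main

import "fmt"

// flatAndCum aggregates sampled call stacks. Each stack lists function
// names from root (first) to leaf (last). Flat credits a sample only to
// the leaf; cum credits it to every function on the stack, once per sample
// even if the function appears twice (recursion).
func flatAndCum(stacks [][]string) (flat, cum map[string]int) {
	flat = make(map[string]int)
	cum = make(map[string]int)
	for _, stack := range stacks {
		if len(stack) == 0 {
			continue
		}
		flat[stack[len(stack)-1]]++
		seen := make(map[string]bool)
		for _, fn := range stack {
			if !seen[fn] {
				seen[fn] = true
				cum[fn]++
			}
		}
	}
	return flat, cum
}

func main() {
	// hypothetical samples: handle is the entry point, parse does the work
	stacks := [][]string{
		{"main", "handle", "parse"},
		{"main", "handle", "parse"},
		{"main", "handle", "parse"},
		{"main", "handle"},
		{"main", "log"},
	}
	flat, cum := flatAndCum(stacks)
	for _, fn := range []string{"main", "handle", "parse", "log"} {
		fmt.Printf("%-7s flat=%d cum=%d\n", fn, flat[fn], cum[fn])
	}
}
```

Here handle is the kind of entry point the paragraph above warns about. Its cum is high because parse runs beneath it, while parse carries most of the flat time.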

The tool also has a web interface. Run the command with the -http flag to open it in your browser. The flame graph is one of the views in its View menu.

go tool pprof -http=:8080 cpu.prof

The flame graph shows the call stack as horizontal bars. The width of a bar represents the time spent in that function and everything it calls. The top of the graph is the root of the call stack. The bottom shows the leaf functions. The widest bars are the hot spots. Click a bar to zoom in. Click the background to zoom out. The visual map makes it easy to see where the time goes.

Profiling a real handler

Real code usually runs a server or a long job. Here's how you profile an HTTP handler that does heavy computation. You start the profiler inside the handler, so the profile covers the window in which that request runs.

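package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
	"runtime/pprof"
)

func heavyHandler(w http.ResponseWriter, r *http.Request) {
	// profile the window in which this request runs; the CPU profile
	// covers the whole process, not just this goroutine
	f, err := os.Create("handler.prof")
	if err != nil {
		http.Error(w, "profile failed", http.StatusInternalServerError)
		return
	}
	defer f.Close()

	// start sampling; this returns an error if another request is
	// already profiling, because only one CPU profile can run at a time
	if err := pprof.StartCPUProfile(f); err != nil {
		http.Error(w, "profile failed", http.StatusInternalServerError)
		return
	}
	// deferred last, so it runs first: the profile is written before f closes
	defer pprof.StopCPUProfile()

	// calculate primes to simulate CPU-bound work; at 100 samples per second
	// the request must run long enough to collect a useful number of samples
	primes := findPrimes(20_000_000)
	fmt.Fprintf(w, "found %d primes\n", len(primes))
}

func findPrimes(n int) []int {
	// sieve of Eratosthenes: sieve[i] becomes true once i is known to be composite
	sieve := make([]bool, n)
	var primes []int
	for i := 2; i < n; i++ {
		if sieve[i] {
			continue
		}
		primes = append(primes, i)
		// multiples below i*i were already crossed off; this check also
		// keeps i*i from overflowing on platforms where int is 32 bits
		if i > (n-1)/i {
			continue
		}
		for j := i * i; j < n; j += i {
			sieve[j] = true
		}
	}
	return primes
}

func main() {
	http.HandleFunc("/heavy", heavyHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}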

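The handler creates a fresh profile file on every request and overwrites the previous one. A CPU profile covers the whole process, so the file records everything the server did while the request ran, not only this handler. On an otherwise idle server, that is mostly the handler's work. Only one CPU profile can run at a time, so a second request that arrives while another is being profiled gets the error response. The defer statements ensure the profiler stops even if the handler panics. The findPrimes function does the work, so the profile should be dominated by the sieve loop.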

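Run the server. Hit the endpoint with curl, for example curl localhost:8080/heavy. When the handler returns, the server has written handler.prof. Run go tool pprof handler.prof and type top. You should see main.findPrimes at or near the top of the list, with high flat time, because the sieve loop is the bottleneck. From there you can optimize the loop or reduce the input size. If the profile comes back nearly empty, the request probably finished too quickly to be sampled, so raise the input size. If the server is still running on port 8080, give the web UI a different port, such as go tool pprof -http=:8081 handler.prof.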

One style note applies here as well. If you move findPrimes onto a struct type, name the receiver with a short abbreviation of the type name, such as p or s, not this or self. Go code uses short receiver names. The profiler does not care about the name, but the community expects the convention.

Reading the flame graph

The flame graph is often the most useful view because it shows the call stack hierarchy at a glance. The root is at the top. The leaves are at the bottom. The width of each bar is proportional to the time spent in that function and everything it calls.

Look for the widest bars. They belong to the functions with the most cumulative time. Then look at what sits below a wide bar. If the bars underneath are nearly as wide, the function spends most of its time in its callees. If little or nothing sits underneath, the time is the function's own flat time, and that function is doing the work. A narrow bar means the function accounted for few samples.
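To see where a bar's width comes from, here is a sketch that merges hypothetical sampled stacks (root first) into a tree and prints it as a text-mode flame graph. Each node's width is the number of samples that passed through it. Its self value is the part of the width not covered by its children, which corresponds to flat time. This is a simplification for illustration, not pprof's implementation.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// frame is one bar in a text-mode flame graph.
type frame struct {
	name     string
	width    int // samples in this frame or anywhere below it
	children map[string]*frame
}

func newFrame(name string) *frame {
	return &frame{name: name, children: make(map[string]*frame)}
}

// buildFlame merges stacks that share a prefix into one tree, so every
// sample widens each bar along its path from the root to its leaf.
func buildFlame(stacks [][]string) *frame {
	root := newFrame("root")
	for _, stack := range stacks {
		root.width++
		cur := root
		for _, fn := range stack {
			child, ok := cur.children[fn]
			if !ok {
				child = newFrame(fn)
				cur.children[fn] = child
			}
			child.width++
			cur = child
		}
	}
	return root
}

// self is the part of the bar not covered by the bars below it.
func (f *frame) self() int {
	s := f.width
	for _, c := range f.children {
		s -= c.width
	}
	return s
}

// dump prints the tree top-down, indenting callees, children sorted by name.
func (f *frame) dump(depth int) {
	fmt.Printf("%s%s width=%d self=%d\n", strings.Repeat("  ", depth), f.name, f.width, f.self())
	names := make([]string, 0, len(f.children))
	for name := range f.children {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		f.children[name].dump(depth + 1)
	}
}

func main() {
	stacks := [][]string{
		{"main", "handle", "parse"},
		{"main", "handle", "parse"},
		{"main", "handle", "parse"},
		{"main", "handle"},
		{"main", "log"},
	}
	buildFlame(stacks).dump(0)
}
```

In the output, handle is wide and parse is nearly as wide beneath it, so most of handle's time belongs to its callee. Nothing sits below parse, so its whole width is self time.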

Clicking a bar zooms in on that function and its callees. To get down to individual lines, switch to the Source view from the View menu, which annotates each line of the function with the time sampled there. This helps you pinpoint the loop or allocation causing the slowdown.

The top command in the interactive shell shows a table of functions with the columns flat, flat%, sum%, cum, and cum%. The flat and flat% columns are the function's own time and its share of the total. The sum% column is a running total of flat% down the table, so each row shows the share of time covered by that row and every row above it. The cum and cum% columns are the cumulative time and its share of the total. Use flat% to find the hot functions. Use cum to find the entry points.
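The percentage columns are easy to mix up, so here is a sketch that computes them from hypothetical flat and cum sample counts. It works in samples. pprof itself prints durations, with each sample standing for one sampling period, and it may trim small entries. Treat this as an illustration of the arithmetic only.

```go
package main

import (
	"fmt"
	"sort"
)

// fnStats holds hypothetical flat and cum sample counts for one function.
type fnStats struct {
	name      string
	flat, cum int
}

// topRow is one line of a top-style table.
type topRow struct {
	fnStats
	flatPct, sumPct, cumPct float64
}

// topTable sorts by flat and computes the percentage columns. sumPct is a
// running total of flatPct down the table, so it is not the same as cumPct.
func topTable(stats []fnStats, total int) []topRow {
	sorted := append([]fnStats(nil), stats...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].flat > sorted[j].flat })

	rows := make([]topRow, 0, len(sorted))
	running := 0.0
	for _, s := range sorted {
		flatPct := 100 * float64(s.flat) / float64(total)
		running += flatPct
		rows = append(rows, topRow{
			fnStats: s,
			flatPct: flatPct,
			sumPct:  running,
			cumPct:  100 * float64(s.cum) / float64(total),
		})
	}
	return rows
}

func main() {
	stats := []fnStats{
		{"main", 0, 5},
		{"handle", 1, 4},
		{"parse", 3, 3},
		{"log", 1, 1},
	}
	fmt.Printf("%5s %7s %7s %5s %7s\n", "flat", "flat%", "sum%", "cum", "cum%")
	for _, r := range topTable(stats, 5) {
		fmt.Printf("%5d %6.2f%% %6.2f%% %5d %6.2f%% %s\n",
			r.flat, r.flatPct, r.sumPct, r.cum, r.cumPct, r.name)
	}
}
```

With five samples in total, parse covers 60 percent on its own, and sum% reaches 100 by the third row. Meanwhile main, the entry point, has a cum% of 100 and a flat% of 0.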

The tool also supports the list command, which takes a regular expression. Type list findPrimes to see the source of the matching functions, with each line annotated with its flat and cumulative time. This tells you which lines the samples landed on. You do not need to guess. The data shows you.

Pitfalls and errors

Profiling introduces overhead. The runtime interrupts the program to take samples, which adds a little latency. Do not profile in production without care. A staging environment or a low-traffic window is safer. The overhead is small, which is what makes sampling usable in production at all, but it is real.

Forgetting to stop the profiler is a common mistake. If you call StartCPUProfile and never call StopCPUProfile, sampling keeps running, the file handle stays open, and the complete profile is never written. You end up with an empty or unusable file. A second call to StartCPUProfile while the first profile is still running does not start a new profile. It returns an error. Always use defer to stop the profiler.
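You can confirm the double-start behavior directly. This sketch starts a profile that writes to io.Discard, tries to start a second one while the first is still running, and reports both errors. The helper name startTwice is made up for the example.

```go
package main

import (
	"fmt"
	"io"
	"runtime/pprof"
)

// startTwice starts a CPU profile, then tries to start a second one while
// the first is still running. It returns the error from each call.
func startTwice() (first, second error) {
	// io.Discard throws the profile data away; only the errors matter here
	first = pprof.StartCPUProfile(io.Discard)
	if first != nil {
		return first, nil
	}
	// stop the first profile when this function returns
	defer pprof.StopCPUProfile()

	second = pprof.StartCPUProfile(io.Discard)
	return first, second
}

func main() {
	first, second := startTwice()
	fmt.Println("first start error: ", first)
	fmt.Println("second start error:", second)
}
```

The first error should be nil and the second non-nil. The deferred StopCPUProfile ends the first profile before startTwice returns.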

Ignoring errors from os.Create is another mistake. If you discard the error with an underscore and the file fails to open, f is nil and the profile has nowhere to go. You get no profile at all, and no error tells you why. Check the error or the profile will fail silently. Missing imports, by contrast, are caught early: the compiler rejects the program with undefined: pprof if you forget the import. Runtime failures like a bad file path are harder to find.

Sampling noise is a third pitfall. The profiler samples at a fixed frequency. If a function runs for a short time, it might not get sampled. You might miss fast functions. This is a trade-off. Sampling is fast. Tracing is accurate but slow. Accept the noise. The wide bars in the flame graph are real. The narrow bars might be noise. Focus on the wide bars.

The CPU profiler records CPU time, not wall-clock time. A goroutine waiting on a database or a network call is not using the CPU, so the wait barely shows up. The profile contains only the CPU that the I/O code itself burns, such as encoding requests or making system calls. Use runtime/trace to see blocking and system calls. The profiler is for CPU. The tracer is for blocking.

When to use pprof

Use go test -bench when you want a quick number for a single function. Benchmarks give you a baseline. They run the function many times and report the average time. Use benchmarks to measure the impact of a change.
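Benchmarks normally live in a _test.go file as a function such as func BenchmarkSumSquares(b *testing.B), and you run them with go test -bench=. Adding -cpuprofile=cpu.prof to that command also writes a CPU profile of the benchmark. To keep the sketch runnable as a single file, it calls testing.Benchmark from an ordinary program instead. The sumSquares function is a hypothetical function under test.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the compiler from discarding the benchmarked result.
var sink int

// sumSquares is a hypothetical function under test.
func sumSquares(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += i * i
	}
	return total
}

func main() {
	// testing.Init registers the testing flags; it is needed when calling
	// testing.Benchmark from an ordinary program instead of from go test
	testing.Init()

	result := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumSquares(1000)
		}
	})
	// result.N is how many iterations ran; NsPerOp is the average per call
	fmt.Printf("%d iterations, %d ns/op\n", result.N, result.NsPerOp())
}
```

The b.N loop is the part that matters. The testing package picks N and raises it until the benchmark runs long enough (one second by default), then reports the average time per iteration.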

Use runtime/pprof for CPU when you need to find the hot path in a running program. The profiler shows you the call stack. It reveals the functions burning your CPU. Use the profiler when the benchmark does not show the full picture.

Use runtime/pprof for memory when you suspect leaks or excessive allocation. The heap profile shows where memory is allocated. It shows the size and count of allocations. Use the heap profile when the memory usage climbs over time.
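A heap profile is written in one shot rather than started and stopped. Here is a minimal sketch, assuming a hypothetical workload that keeps 64 buffers of 1 MiB alive. The runtime.GC call before writing follows the runtime/pprof documentation's advice for getting up-to-date statistics.

```go
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
)

// keep holds the buffers so they are still in use when the profile is written.
var keep [][]byte

// allocate is a hypothetical workload that makes count allocations of size bytes.
func allocate(count, size int) [][]byte {
	bufs := make([][]byte, 0, count)
	for i := 0; i < count; i++ {
		bufs = append(bufs, make([]byte, size))
	}
	return bufs
}

func main() {
	// 1 MiB buffers are large relative to the default heap sampling
	// rate (runtime.MemProfileRate), so they show up clearly
	keep = allocate(64, 1<<20)

	f, err := os.Create("mem.prof")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// run a GC first so the in-use numbers reflect live memory
	runtime.GC()
	if err := pprof.WriteHeapProfile(f); err != nil {
		panic(err)
	}
}
```

Open the result with go tool pprof mem.prof, the same way as a CPU profile. Heap profiles show in-use memory by default. The -sample_index=alloc_space flag switches to everything allocated since the program started, which is the view for finding excessive allocation.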

Use runtime/trace when you need to see goroutine scheduling, blocking, and system calls. The tracer records events as they happen. It shows when goroutines start, block, and exit. Use the tracer when the program is slow but the CPU profile does not account for the lost time, for example when goroutines spend it waiting on I/O.
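Capturing a trace looks much like capturing a CPU profile: start, run, stop. This sketch traces a hypothetical workload in which eight goroutines block on a sleep, standing in for I/O, and then report back over a channel.

```go
package main

import (
	"os"
	"runtime/trace"
	"sync"
	"time"
)

// runWorkers is a hypothetical workload: n goroutines each block for a
// while (a stand-in for I/O), then send their id on a channel.
// It returns the sum of the ids it received.
func runWorkers(n int) int {
	results := make(chan int)
	var wg sync.WaitGroup
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			time.Sleep(10 * time.Millisecond) // blocked, not using CPU
			results <- id
		}(i)
	}
	// close the channel once every worker has sent its result
	go func() {
		wg.Wait()
		close(results)
	}()

	sum := 0
	for id := range results {
		sum += id
	}
	return sum
}

func main() {
	f, err := os.Create("trace.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		panic(err)
	}
	defer trace.Stop()

	runWorkers(8)
}
```

Open the result with go tool trace trace.out. The viewer shows when each worker goroutine ran and when it sat blocked, which a CPU profile of the same program would barely register.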

Use net/http/pprof when you want to profile a running server without restarting it. Importing the package for its side effects registers handlers under /debug/pprof/ on http.DefaultServeMux. You can fetch a profile from one of those endpoints while the server keeps running. Use this for production debugging.
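The blank import is all it takes to register the handlers. To keep the sketch self-contained and terminating, it serves http.DefaultServeMux from a temporary httptest server and checks that the /debug/pprof/ index responds. The name pprofIndexStatus is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // side effect: registers /debug/pprof/ handlers on http.DefaultServeMux
)

// pprofIndexStatus serves http.DefaultServeMux from a throwaway test server
// and fetches the pprof index page to show the handlers are registered.
func pprofIndexStatus() (int, error) {
	srv := httptest.NewServer(http.DefaultServeMux)
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/debug/pprof/")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	status, err := pprofIndexStatus()
	if err != nil {
		panic(err)
	}
	fmt.Println("GET /debug/pprof/ ->", status)

	// In a real service you would instead keep a listener running, e.g.
	//   log.Fatal(http.ListenAndServe("localhost:6060", nil))
	// and fetch a 30-second CPU profile from another terminal with
	//   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
}
```

In a long-running service you would leave a listener up instead, as the comments at the end of main show. Binding it to localhost, as there, keeps the debug endpoints off external interfaces.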

Profiling is evidence. Don't guess; measure. The widest bar in the flame graph is your enemy. Stop the profiler before you stop the program.

Where to go next

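In short, profiling shows you where your code spends its time. You run the program with the profiler recording, it saves a report file, and you open that report to find the slow functions so you can speed them up. From here, try the other tools covered above: a heap profile when memory climbs, an execution trace when goroutines block, and net/http/pprof on a running server.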