When the CPU spikes and you don't know why
You deployed a service that processes images. Locally, a request takes 50 milliseconds. In production, under load, latency jumps to two seconds and the CPU usage graph looks like a sawtooth wave. You added fmt.Println timestamps, but the logs slow the code down, clutter the output, and still don't tell you which loop is burning cycles. You need a map of where the processor actually spends its time.
Profiling captures that map. It records snapshots of your program's execution so you can see the hotspots without guessing. Go includes a built-in profiler that writes data in the pprof format, which many tools can read. You can analyze the data in a terminal, a web browser, or a visualization tool.
How CPU profiling works
CPU profiling relies on sampling. The Go runtime interrupts your program at regular intervals, records the current call stack, and lets the program continue. By default, the runtime takes a sample every 10 milliseconds, which corresponds to a rate of 100 Hertz. The sampling overhead is usually under 10%, and the result is a statistical view of execution rather than an exact trace.
Think of it like a time-lapse camera in a busy kitchen. The camera takes a photo every few seconds. If the chef appears in 80% of the photos chopping vegetables, you know chopping is the dominant activity. You don't need to watch every second to understand the workload.
The profiler records the stack trace for each sample. If a function appears in many samples, it consumes a large fraction of the CPU time. The data goes into a binary file that tools can parse. The file contains function names, source locations, and sample counts.
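To make the arithmetic concrete, here is a minimal sketch of how sample counts translate into time. It assumes the default 10 millisecond period. The function names and the counts (240 of 300 samples) are made up for illustration and are not part of any Go API.

```go
package main

import (
	"fmt"
	"time"
)

// samplePeriod is the default interval between CPU samples (100 Hertz).
const samplePeriod = 10 * time.Millisecond

// estimateCPUTime converts a sample count into approximate CPU time.
func estimateCPUTime(samples int) time.Duration {
	return time.Duration(samples) * samplePeriod
}

// share returns the percentage of all samples that landed in one function.
func share(samples, total int) float64 {
	if total == 0 {
		return 0
	}
	return 100 * float64(samples) / float64(total)
}

func main() {
	// Hypothetical profile: 300 samples in total, 240 of them in one function.
	fmt.Println(estimateCPUTime(240))       // 2.4s
	fmt.Printf("%.1f%%\n", share(240, 300)) // 80.0%
}
```

With only a handful of samples the estimate is noisy, which is why a longer profile gives a more trustworthy picture.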
Generating a profile from code
Here's a program that wastes CPU cycles on purpose so we can see the profiler in action.
package main

import (
	"os"
	"runtime/pprof"
)

// HeavyLifting simulates CPU-bound work by spinning in a loop.
func HeavyLifting() {
	var sum int64
	// Spin loop to burn cycles; the profiler will catch this function frequently.
	for i := 0; i < 100000000; i++ {
		sum += int64(i) // Convert explicitly: Go does not mix int and int64.
	}
	// Use the result so the loop is not trivially dead code.
	_ = sum
}

func main() {
	// Create a file to store the profile data.
	f, err := os.Create("cpu.prof")
	if err != nil {
		panic(err)
	}
	// StopCPUProfile does not close the file, so close it ourselves.
	// Deferred calls run last-in, first-out, so this runs after the stop.
	defer f.Close()

	// Start recording CPU samples.
	if err := pprof.StartCPUProfile(f); err != nil {
		panic(err)
	}
	defer pprof.StopCPUProfile()

	// Run the work while the profiler is active.
	HeavyLifting()
}
Run the program with go run main.go. It creates a file named cpu.prof. The pprof.StartCPUProfile call tells the runtime to begin sampling. The deferred pprof.StopCPUProfile() stops sampling and finishes writing when main returns, and the deferred f.Close() then closes the file. The convention is to handle the error from StartCPUProfile immediately and defer the stop. Starting the profiler right before the work keeps the profile window tight, and the two deferred calls make sure nothing is left open.
Walking through the runtime behavior
When StartCPUProfile runs, the runtime installs a signal handler or uses another OS-specific mechanism to generate periodic interrupts. On Linux and other Unix-like systems this is a SIGPROF timer that counts CPU time, so a mostly idle program produces few samples. Each time the interrupt fires, the runtime interrupts the running thread, walks the stack, and records the frames. The sample includes the function names and the line numbers.
The samples accumulate in memory and are written to the file as the profile runs. The file uses the pprof format, a gzip-compressed protocol buffer that is compact and supports multiple profile types. The format is not specific to Go. You can open the file with go tool pprof, with the standalone pprof tool, or with other viewers and profiling backends that accept pprof data.
When StopCPUProfile runs, the runtime stops sampling and returns only after the remaining samples have been written. It does not close the file; that is your job. If the program exits without stopping the profiler, the file may be truncated or unreadable. The compiler won't catch a missing StopCPUProfile call. You just get incomplete data. Always defer the stop.
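One way to keep the stop and the file close together is to wrap them in a single function. The sketch below is only an illustration: startCPUProfile and spin are hypothetical names, not part of runtime/pprof, and the file name is arbitrary. The returned function stops the profiler first, so every sample is written before the file is closed.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// startCPUProfile is a hypothetical helper. It begins profiling into the
// file at path and returns a function that stops profiling and then
// closes the file, in that order.
func startCPUProfile(path string) (func() error, error) {
	f, err := os.Create(path)
	if err != nil {
		return nil, err
	}
	if err := pprof.StartCPUProfile(f); err != nil {
		f.Close() // Nothing was written; report the start error instead.
		return nil, err
	}
	return func() error {
		pprof.StopCPUProfile() // Returns once buffered samples are written.
		return f.Close()
	}, nil
}

// spin burns CPU so the profile has something to record.
func spin() int64 {
	var sum int64
	for i := 0; i < 50000000; i++ {
		sum += int64(i)
	}
	return sum
}

func main() {
	stop, err := startCPUProfile("cpu.prof")
	if err != nil {
		fmt.Fprintln(os.Stderr, "profile:", err)
		os.Exit(1)
	}
	fmt.Println(spin())
	if err := stop(); err != nil {
		fmt.Fprintln(os.Stderr, "profile:", err)
		os.Exit(1)
	}
}
```

Calling stop explicitly, rather than deferring it, also lets the caller see an error from closing the file.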
Start the profiler, do the work, stop the profiler. The file holds the map.
Analyzing the profile
Open the profile with the built-in tool:
go tool pprof cpu.prof
This launches an interactive shell. Type top to see the hottest functions. The output lists functions sorted by their flat value, which for a CPU profile is shown as time derived from the sample counts. Each row shows the flat and the cumulative values.
Flat samples are the samples where the function itself was at the top of the stack. Cumulative samples include samples where the function was anywhere in the stack, including calls to other functions. If HeavyLifting has high flat samples, the work is inside that function. If main has high cumulative samples but low flat samples, the work is in the functions main calls.
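The difference between flat and cumulative counts is easy to reproduce by hand. The following sketch is a simplified model rather than pprof's actual implementation: the stack type, the tally function, and the three samples are invented for illustration. Each sample lists function names from the running function down to main.

```go
package main

import "fmt"

// stack lists function names from the leaf (the function that was
// running) to the root.
type stack []string

// tally returns flat and cumulative sample counts per function.
func tally(samples []stack) (flat, cum map[string]int) {
	flat = make(map[string]int)
	cum = make(map[string]int)
	for _, s := range samples {
		if len(s) == 0 {
			continue
		}
		flat[s[0]]++ // Only the leaf gets a flat sample.
		seen := make(map[string]bool)
		for _, fn := range s {
			if !seen[fn] { // Count a recursive function once per sample.
				seen[fn] = true
				cum[fn]++
			}
		}
	}
	return flat, cum
}

func main() {
	samples := []stack{
		{"main.HeavyLifting", "main.main"},
		{"main.HeavyLifting", "main.main"},
		{"main.main"},
	}
	flat, cum := tally(samples)
	fmt.Println(flat["main.HeavyLifting"], cum["main.HeavyLifting"]) // 2 2
	fmt.Println(flat["main.main"], cum["main.main"])                 // 1 3
}
```

Here main.main appears in every sample but is the leaf in only one, so its cumulative count is high while its flat count is low.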
Type web to render the call graph as an SVG and open it in a browser. This command requires Graphviz to be installed. Nodes are functions and edges are calls. Larger nodes have larger flat values, and thicker edges carry more samples along that call path. For clickable drill-down, use the browser UI started with the -http flag instead.
Type list HeavyLifting to see the annotated source code. The tool highlights lines that appear in samples. This helps you pinpoint the exact loop or expression causing the hotspot.
Samples are statistics. Trust the trend, not the exact percentage.
Profiling a realistic HTTP service
Real services run over HTTP and handle concurrent requests. You can expose profiles without stopping the server using the net/http/pprof package.
package main

import (
	"fmt"
	"log"
	"net/http"
	_ "net/http/pprof" // Side-effect import registers /debug/pprof handlers.
)

// SlowHandler simulates a CPU-intensive request.
func SlowHandler(w http.ResponseWriter, r *http.Request) {
	// Burn cycles to simulate processing.
	var sum int64
	for i := 0; i < 50000000; i++ {
		sum += int64(i) // Convert explicitly: Go does not mix int and int64.
	}
	fmt.Fprintf(w, "Done: %d", sum)
}

func main() {
	// Register the slow endpoint on the default mux.
	http.HandleFunc("/work", SlowHandler)
	// The side-effect import already registered the pprof handlers on the
	// default mux: /debug/pprof/profile, /debug/pprof/heap, etc.
	fmt.Println("Server on :8080. Hit /work, then profile.")
	// Passing nil serves the default mux. ListenAndServe only returns on error.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
The blank import _ "net/http/pprof" runs the package's init function, which registers handlers under /debug/pprof/. This is a standard Go convention for side-effect imports. You use the blank identifier to signal that you want the initialization behavior, not the exported names.
Start the server. Hit the /work endpoint in a loop to generate load. Then capture a profile from the command line:
go tool pprof "http://localhost:8080/debug/pprof/profile?seconds=10"
The quotes stop the shell from interpreting the question mark. The tool connects to the server, requests a 10-second CPU profile, and downloads the data. You can analyze it interactively just like a local file. This approach lets you profile production traffic, provided you restrict access to the /debug/pprof endpoints: they expose internals of the process, and each profile request adds load.
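One common way to restrict access is to serve the profiling handlers from a separate listener bound to localhost. The sketch below assumes that approach. The port 6060, the /work handler body, and the names newDebugMux and newDebugServer are illustrative choices; the handler functions themselves (pprof.Index, pprof.Profile, and so on) are exported by net/http/pprof.

```go
package main

import (
	"log"
	"net/http"
	"net/http/pprof"
)

// newDebugMux returns a mux that serves only the pprof handlers.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	// Index also serves the named profiles such as heap and goroutine.
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// newDebugServer binds the profiling endpoints to localhost only.
// The port is an arbitrary choice for this sketch.
func newDebugServer() *http.Server {
	return &http.Server{Addr: "localhost:6060", Handler: newDebugMux()}
}

func main() {
	// The public mux carries application routes and no pprof handlers.
	app := http.NewServeMux()
	app.HandleFunc("/work", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})

	// Serve profiles on a separate, local-only listener.
	go func() {
		log.Println(newDebugServer().ListenAndServe())
	}()

	log.Fatal(http.ListenAndServe(":8080", app))
}
```

Importing net/http/pprof still registers its handlers on the default mux, but the public listener here uses its own mux, so those routes are not reachable on port 8080. From the same machine, point the tool at "http://localhost:6060/debug/pprof/profile?seconds=10" instead.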
Side-effect imports register handlers. The blank identifier is the key.
Pitfalls and gotchas
Inlining changes the call stack. The Go compiler inlines small functions to reduce call overhead. An inlined function has no stack frame of its own, so pprof reconstructs it from debug information and marks it with a note like (inline). The attribution is usually right, but the time can show up under the caller's source lines rather than under a separate Helper entry. Check the source location to understand where the work happens. If you need a function to appear as a real frame in the profile, add the //go:noinline directive above it.
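As a small illustration of the directive, the sketch below keeps a tiny function out of line. The functions square and sumSquares are invented for this example.

```go
package main

import "fmt"

// square is small enough that the compiler would normally inline it.
// The directive below asks the compiler to keep it as a real call.
//
//go:noinline
func square(n int64) int64 {
	return n * n
}

// sumSquares adds the squares of 0 through limit-1.
func sumSquares(limit int64) int64 {
	var sum int64
	for i := int64(0); i < limit; i++ {
		sum += square(i)
	}
	return sum
}

func main() {
	fmt.Println(sumSquares(1000)) // 332833500
}
```

Building with go build -gcflags=-m prints the compiler's inlining decisions, which is a quick way to check whether a function was inlined. Remove the directive once you are done measuring, because forcing a call can itself change the timing.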
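Garbage collection consumes CPU time. The profiler samples the GC workers just like application code. If you see runtime.gcBgMarkWorker, runtime.scanobject, or runtime.mallocgc near the top of the list, memory allocation is driving CPU usage. Switch to a memory profile to investigate allocations. Use go tool pprof -sample_index=alloc_objects heap.prof to see allocation hotspots by object count, or -sample_index=alloc_space to rank them by bytes.
Sampling rate affects resolution. The default 100 Hertz rate is good for most cases. Very short functions might not appear in samples. If you suspect a micro-hotspot, first try running the workload longer so that more samples accumulate. You can also raise the rate by calling runtime.SetCPUProfileRate(1000) before pprof.StartCPUProfile, but treat this as a workaround: StartCPUProfile requests 100 Hertz itself, so the runtime prints a warning that the rate cannot be changed, and the operating system may not deliver interrupts that fast. Higher rates increase overhead and file size. Use this only when you need finer granularity.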
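The heap.prof file in that command has to come from somewhere. A minimal sketch of writing one with pprof.WriteHeapProfile follows. The allocate and writeHeapProfile helpers, the allocation sizes, and the file name are assumptions made for this example.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// allocate builds n small buffers to generate allocation activity.
func allocate(n int) [][]byte {
	out := make([][]byte, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, make([]byte, 1024))
	}
	return out
}

// writeHeapProfile writes the current heap profile to the file at path.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	runtime.GC() // Run a collection so the statistics are up to date.
	if err := pprof.WriteHeapProfile(f); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

func main() {
	data := allocate(100000)
	if err := writeHeapProfile("heap.prof"); err != nil {
		fmt.Fprintln(os.Stderr, "heap profile:", err)
		os.Exit(1)
	}
	fmt.Println(len(data)) // Keep the buffers reachable until here.
}
```

Heap profiles are sampled too, so very small or rare allocations may not show up in the result.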
Error handling matters. StartCPUProfile returns an error if CPU profiling is already active, so a second call while a profile is running fails. If you try to create the file in a read-only directory, os.Create fails with an error such as open cpu.prof: permission denied. Always check both errors. If the program exits abruptly, for example through os.Exit or log.Fatal, deferred calls never run and the file may be incomplete. The compiler won't warn you about missing stops or permission issues. You get runtime errors or silent data loss.
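One way to guard against these failures is to do the profiled work inside a function that returns an error, and to call os.Exit only after that function has returned. The sketch below assumes this structure. The run function, the file names, and the workload are illustrative only.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
)

// run profiles work into the file at path and returns the first error.
// It returns errors instead of exiting, so its deferred calls always run.
func run(path string, work func() error) error {
	f, err := os.Create(path)
	if err != nil {
		return fmt.Errorf("create profile: %w", err)
	}
	defer f.Close()

	// StartCPUProfile fails if a CPU profile is already being collected.
	if err := pprof.StartCPUProfile(f); err != nil {
		return fmt.Errorf("start profile: %w", err)
	}
	defer pprof.StopCPUProfile()

	return work()
}

func main() {
	// Starting a second profile while one is active is reported as an error.
	nested := run("outer.prof", func() error {
		return run("inner.prof", func() error { return nil })
	})
	fmt.Println(nested != nil) // true

	err := run("cpu.prof", func() error {
		var sum int64
		for i := 0; i < 50000000; i++ {
			sum += int64(i)
		}
		fmt.Println(sum)
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1) // Safe here: run has returned, so its defers have run.
	}
}
```

The nested call fails at StartCPUProfile, and the outer call still stops its own profile and closes its file on the way out.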
Inlining hides functions. Look at the source lines.
When to use which tool
Use go tool pprof -top cpu.prof when you need a quick list of the most expensive functions to prioritize optimization.
Use go tool pprof -web cpu.prof when you need to understand the call graph and see which callers contribute to a hotspot. This requires Graphviz.
Use go tool pprof -http=:8081 cpu.prof when you want an interactive browser UI to explore the profile without memorizing commands. Any free port works; 8080 is already taken by the example server above.
Use go tool pprof -base old.prof new.prof when you want to compare two profiles and see the delta after a change.
Use net/http/pprof when you need to profile a running service without restarting it or modifying the binary.
Use runtime.SetCPUProfileRate when you need higher resolution sampling for very short-lived hotspots, accepting higher overhead.
Profile before you optimize. Guessing wastes time.