How to Profile and Optimize HTTP Servers in Go

The server slows down under load

Your API works perfectly on localhost. You deploy to staging, run a load test, and the response time jumps from 50 milliseconds to two seconds. The CPU graph spikes. You stare at the code and see nothing obviously wrong. The bottleneck isn't the algorithm; it's the runtime behavior under pressure.

Profiling turns guessing into measurement. Go includes a built-in profiling system called pprof that captures snapshots of your program's execution. You can see exactly which functions consume CPU cycles, where memory allocations pile up, and which goroutines are blocked waiting for synchronization.

Profiling is evidence, not guessing

Profiling works by sampling. While a CPU profile is running, the runtime interrupts your program about 100 times per second and records the current stack trace. Over time, these samples build a statistical picture of where the program spends its time. This approach keeps overhead low, so you can usually profile a live server without bringing it to its knees.

The net/http/pprof package provides ready-made HTTP handlers that expose these profiles. You register them on a route like /debug/pprof, and the package serves the data. The runtime/pprof package offers lower-level functions for writing profiles to files, which is useful for CLI tools or background workers that don't have an HTTP server.
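For a program without an HTTP server, the runtime/pprof route can be sketched roughly as follows. This is a minimal example, not the only way to do it: busyWork and profileToFile are hypothetical names invented for illustration, and the output file name is an arbitrary choice.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
	"runtime/pprof"
)

// busyWork is a hypothetical stand-in for the work you want to profile.
func busyWork() int {
	sum := 0
	for i := 0; i < 50000000; i++ {
		sum += i % 7
	}
	return sum
}

// profileToFile records a CPU profile of fn into the file at path and
// returns the size of the written profile in bytes.
func profileToFile(path string, fn func()) (int64, error) {
	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		return 0, err
	}
	fn()
	pprof.StopCPUProfile() // stops sampling and finishes writing to f

	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	path := filepath.Join(os.TempDir(), "cpu.pb.gz")
	size, err := profileToFile(path, func() { busyWork() })
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("wrote %d bytes to %s\n", size, path)
}
```

The resulting file can be opened with go tool pprof just like a profile downloaded from the HTTP endpoint.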

Profiling is evidence. Don't guess where the time goes; measure it.

Wire up the endpoints

Here's the boilerplate to enable profiling on an HTTP server. You add this once during development. The handlers serve profiles for CPU, memory, goroutines, and more.

package main

import (
    "log"
    "net/http"
    "net/http/pprof"
)

// main starts the HTTP server with profiling endpoints enabled.
func main() {
    // Use a dedicated mux. Importing net/http/pprof already registers these
    // routes on http.DefaultServeMux, so registering them there again panics.
    mux := http.NewServeMux()

    // Register pprof handlers under /debug/pprof/
    // Index also serves the named profiles such as heap and goroutine.
    mux.HandleFunc("/debug/pprof/", pprof.Index)
    mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
    mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
    mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
    mux.HandleFunc("/debug/pprof/trace", pprof.Trace)

    // Listen on port 8080 and report startup failures such as a busy port
    log.Fatal(http.ListenAndServe(":8080", mux))
}

Exported names start with a capital letter. Unexported names start lowercase. The pprof package exports handlers like Index and Profile because they are meant to be used by your server code. The blank identifier _ discards a value intentionally. If you ever need to ignore a return value from a helper, use _ to signal that you considered the value and chose to drop it.
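The blank identifier also works on imports. Importing net/http/pprof as _ pulls the package in only for its side effect: its init function registers the profiling routes on http.DefaultServeMux. The sketch below assumes you serve the default mux; indexStatus is a hypothetical helper added only to show that the route exists without opening a port.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httptest"
	_ "net/http/pprof" // side effect only: registers /debug/pprof/ on the default mux
)

// indexStatus asks the default mux for the pprof index page in memory
// and returns the HTTP status code it answers with.
func indexStatus() int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/", nil)
	http.DefaultServeMux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	log.Printf("GET /debug/pprof/ -> %d", indexStatus())

	// A nil handler means http.DefaultServeMux, which now includes pprof.
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```

With this form you write no registration code at all, at the cost of the routes living on the shared default mux.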

How sampling works

When you visit /debug/pprof/profile?seconds=30, the handler starts sampling the CPU. At the default rate of 100 Hz, the runtime interrupts running code roughly every 10 milliseconds of CPU time and records the stack of whichever goroutine was executing. After 30 seconds, it stops and streams the data as a gzip-compressed protobuf file.

The handler samples at a fixed rate; there is no query parameter to change how often samples are taken. A modest, fixed rate keeps overhead low. If profiling slowed the program down significantly, the timings would shift, and you'd measure the profiler instead of the application.

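If you omit the seconds parameter, the handler defaults to a 30-second profile. The response only arrives once sampling finishes, so a browser looks like it is hanging: you stare at a loading spinner for half a minute. Pass seconds explicitly so the wait is a choice, and keep it shorter than the server's WriteTimeout if you set one, because the handler rejects durations that exceed it.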
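Fetching a profile from code makes the role of the seconds parameter concrete. The following is a sketch under stated assumptions: it spins up a throwaway test server rather than your real one, and fetchCPUProfile and isGzip are hypothetical helper names.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// fetchCPUProfile serves pprof.Profile from a temporary local server,
// requests a profile of the given length, and returns the raw bytes.
func fetchCPUProfile(seconds int) ([]byte, error) {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	srv := httptest.NewServer(mux)
	defer srv.Close()

	url := fmt.Sprintf("%s/debug/pprof/profile?seconds=%d", srv.URL, seconds)
	resp, err := http.Get(url) // blocks until sampling has finished
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// isGzip reports whether data starts with the gzip magic bytes.
func isGzip(data []byte) bool {
	return len(data) >= 2 && data[0] == 0x1f && data[1] == 0x8b
}

func main() {
	data, err := fetchCPUProfile(1)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("got %d bytes, gzip: %v\n", len(data), isGzip(data))
}
```

The request takes about as long as the duration you ask for, which is why an explicit, short value is friendlier during experiments.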

Sampling misses short spikes. If a function runs for 50 microseconds and then returns, it might never get sampled. Profiling shows where the bulk of time goes. For rare, short events, you need tracing or logging.

A realistic bottleneck

Here's a server with a deliberately inefficient handler. It allocates memory, burns CPU, and waits on I/O. Profiling helps you distinguish between these three types of slowness.

package main

import (
    "encoding/json"
    "log"
    "net/http"
    "net/http/pprof"
    "time"
)

// slowHandler simulates a handler with CPU, memory, and I/O bottlenecks.
func slowHandler(w http.ResponseWriter, r *http.Request) {
    // Allocate 1MB slice to create memory pressure
    // This forces the garbage collector to work harder
    data := make([]byte, 1024*1024)

    // Write to every byte so the allocation is actually used
    // Real code often fills buffers before sending them
    for i := range data {
        data[i] = byte(i % 256)
    }

    // Burn CPU cycles in a tight loop
    // Together with the fill loop above, this is what the CPU profile shows
    result := 0
    for i := 0; i < 10000000; i++ {
        result += i
    }

    // Simulate blocking I/O
    // A sleeping goroutine uses no CPU, so this wait is absent from the
    // CPU profile and visible only in the goroutine profile
    time.Sleep(50 * time.Millisecond)

    if err := json.NewEncoder(w).Encode(map[string]int{"result": result}); err != nil {
        log.Printf("encode response: %v", err)
    }
}

func main() {
    // Use a dedicated mux. Importing net/http/pprof already registers its
    // routes on http.DefaultServeMux, so registering them there again panics.
    mux := http.NewServeMux()

    // Register pprof endpoints for debugging
    mux.HandleFunc("/debug/pprof/", pprof.Index)
    mux.HandleFunc("/debug/pprof/profile", pprof.Profile)

    // Register the slow handler
    mux.HandleFunc("/slow", slowHandler)

    // Start server and report startup failures
    log.Fatal(http.ListenAndServe(":8080", mux))
}

Functions that take a context should respect cancellation and deadlines. context.Context always goes as the first parameter, conventionally named ctx. In a real handler, you'd pass r.Context() to database calls or downstream services. If a request is cancelled, the context signals the goroutine to stop. Context is plumbing. Run it through every long-lived call site.
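Passing the request context down might look like the sketch below. slowQuery is a hypothetical stand-in for a database or downstream call, and the /query route and the 50 millisecond delay are made up for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"net/http"
	"time"
)

// slowQuery pretends to do work that takes d, but gives up as soon as
// ctx is cancelled or its deadline passes.
func slowQuery(ctx context.Context, d time.Duration) (string, error) {
	t := time.NewTimer(d)
	defer t.Stop()
	select {
	case <-t.C:
		return "rows", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// queryHandler hands the request's context to the slow call, so a client
// that disconnects stops the work instead of leaving it running.
func queryHandler(w http.ResponseWriter, r *http.Request) {
	rows, err := slowQuery(r.Context(), 50*time.Millisecond)
	if err != nil {
		http.Error(w, err.Error(), http.StatusServiceUnavailable)
		return
	}
	fmt.Fprintln(w, rows)
}

// cancelledQueryStops checks that an already cancelled context makes
// slowQuery return immediately with context.Canceled.
func cancelledQueryStops() bool {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := slowQuery(ctx, time.Hour)
	return errors.Is(err, context.Canceled)
}

func main() {
	fmt.Println("cancelled query stops early:", cancelledQueryStops())
	http.HandleFunc("/query", queryHandler)
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```

The select on ctx.Done() is the part that matters: without it, the function would ignore cancellation no matter how carefully the context was threaded through.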

Beyond CPU: memory and goroutines

CPU isn't the only resource. Memory allocations trigger garbage collection. If your code allocates too much, the GC runs frequently, and latency spikes. The memory profile tracks allocations.

Visit /debug/pprof/heap to get a memory profile. The data offers several views; two matter most. alloc_space shows the total bytes allocated since the program started, including memory that has since been freed. inuse_space shows the bytes still live at the time of the snapshot. A function might allocate a lot but release it quickly, so it shows up in alloc_space but not inuse_space. High alloc_space means GC pressure. High inuse_space means you're holding onto memory too long.
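The difference between allocated and in-use memory can be seen without the profiler, using process-wide counters from runtime.MemStats. This is only an analogy to the heap profile's views, not the profile itself: TotalAlloc is cumulative like alloc_space, HeapAlloc reflects live memory like inuse_space. The names churn and sink are hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink holds each buffer briefly so the allocation has to happen on the heap.
var sink []byte

// churn allocates n buffers of size bytes and drops them again. It returns
// the bytes allocated during the call and the bytes still live afterwards.
func churn(n, size int) (allocated, live uint64) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	for i := 0; i < n; i++ {
		sink = make([]byte, size)
	}
	sink = nil // drop the last buffer so nothing stays reachable

	runtime.GC()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc, after.HeapAlloc
}

func main() {
	allocated, live := churn(100, 1<<20)
	fmt.Printf("allocated during churn: %d MiB\n", allocated>>20)
	fmt.Printf("live after GC:          %d KiB\n", live>>10)
}
```

A handler that behaves like churn would rank high under alloc_space and barely register under inuse_space. In go tool pprof you can switch between the views with the -sample_index flag.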

Goroutines are cheap, but not free. A leak happens when a goroutine blocks forever. The goroutine profile shows stack traces of all live goroutines. Visit /debug/pprof/goroutine?debug=1 to see the text output. If you see hundreds of goroutines stuck on the same channel receive, you have a leak.

A common cause is a goroutine waiting on a channel that nobody sends to and nobody closes. Always have a cancellation path. If a goroutine blocks on a receive, ensure something can send, the channel can close, or a context can interrupt the wait. The worst goroutine bug is the one that never logs.
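A cancellation path can be sketched like this. The workers wait on a channel that never receives a value, which would be a leak if the context were not there to release them. worker and runWorkers are hypothetical names; the one-second timeout is an arbitrary safety margin.

```go
package main

import (
	"context"
	"fmt"
	"runtime/pprof"
	"sync"
	"time"
)

// worker consumes jobs until the channel is closed or ctx is cancelled.
func worker(ctx context.Context, jobs <-chan int, wg *sync.WaitGroup) {
	defer wg.Done()
	for {
		select {
		case _, ok := <-jobs:
			if !ok {
				return
			}
		case <-ctx.Done():
			return // the cancellation path that prevents the leak
		}
	}
}

// runWorkers starts n workers on a channel nobody sends to, counts live
// goroutines, then cancels and reports whether every worker exited.
func runWorkers(n int) (during int, stopped bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	jobs := make(chan int)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go worker(ctx, jobs, &wg)
	}
	during = pprof.Lookup("goroutine").Count()

	cancel()
	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()
	select {
	case <-done:
		return during, true
	case <-time.After(time.Second):
		return during, false
	}
}

func main() {
	during, stopped := runWorkers(100)
	fmt.Printf("goroutines while blocked: %d, all stopped after cancel: %v\n", during, stopped)
}
```

Remove the ctx.Done() case and the same 100 goroutines would sit in the goroutine profile forever, all parked on the same receive.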

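The block profile shows where goroutines wait on synchronization: channel operations, select, and sync primitives such as Mutex and WaitGroup. Time spent sleeping or waiting on network I/O does not appear in it. Enable it by calling runtime.SetBlockProfileRate(1), which records every blocking event; that adds overhead, so turn it on for an investigation rather than leaving it on. A lot of time in the block profile means contention. You might need to hold locks for less time, split one coarse lock into finer-grained ones, or redesign the data flow.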
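Here is a small sketch of turning the block profile on and producing one blocking event. recordBlocking is a hypothetical helper, and the 20 millisecond delay is arbitrary; a real investigation would enable the rate at startup and read the profile from /debug/pprof/block.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"time"
)

// recordBlocking enables the block profile, makes the calling goroutine
// wait on a channel, and returns the number of stacks the profile holds.
func recordBlocking() int {
	runtime.SetBlockProfileRate(1)       // record every blocking event
	defer runtime.SetBlockProfileRate(0) // switch the profile off again

	ready := make(chan int)
	go func() {
		time.Sleep(20 * time.Millisecond) // sleeping itself is not a blocking event
		ready <- 1
	}()
	<-ready // this receive waits about 20ms and is what gets recorded

	return pprof.Lookup("block").Count()
}

func main() {
	fmt.Println("stacks in block profile:", recordBlocking())
}
```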

Analyzing the data

Download the profile file. Run go tool pprof profile.pb.gz to open the interactive analyzer. Type top to see the hottest functions, ranked by the CPU time spent in each function itself. Type list funcName to see the annotated source code. The two columns next to each line show the time spent on that line itself and the time including the calls it makes.

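Type web to open a call graph in the browser; this command needs Graphviz installed. For a flame graph, start the web UI with go tool pprof -http=:8081 profile.pb.gz (any free port works) and pick Flame Graph from the View menu. The flame graph shows the call stack. Wider bars mean more time spent. In pprof's layout the root of the stack is at the top and callees stack downward. You can click on a bar to zoom in. Flame graphs make it easy to spot the critical path.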

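Profiles written by current Go releases carry their own symbol information, so go tool pprof can usually resolve function names from the profile file alone. If names show up as raw addresses, pass the binary that produced the profile as the first argument, for example go tool pprof ./server profile.pb.gz, where ./server stands for your own binary. The list command additionally needs the source files to be readable on the machine where you run the analysis.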

Trust gofmt. Argue logic, not formatting. Profiling code is no exception; keep it clean. Most editors run gofmt on save. Don't fight the type system. Wrap the value or change the design.

Pitfalls and conventions

Never expose pprof on a public IP. The endpoints reveal internal state, stack traces, and sometimes sensitive data such as command-line arguments. Lock them behind authentication or keep them dev-only. If you deploy to production, use a reverse proxy to restrict access, or remove the handlers entirely.
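One way to keep the endpoints off the public listener is to give them their own mux on a loopback-only address. This is a sketch, not a complete security setup: the port 6060, the /healthz route, and the helper names are assumptions made for the example, and status exists only to check the routing in memory.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newPublicMux returns the mux for application traffic. It has no pprof routes.
func newPublicMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})
	return mux
}

// newDebugMux returns a separate mux that carries only the pprof routes.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// status returns the status code h produces for a GET of path.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	// Profiling is reachable only from the machine itself.
	go func() {
		log.Fatal(http.ListenAndServe("localhost:6060", newDebugMux()))
	}()

	// Application traffic uses a mux that knows nothing about pprof.
	log.Fatal(http.ListenAndServe(":8080", newPublicMux()))
}
```

Passing an explicit mux to the public server matters here: importing net/http/pprof registers the routes on the default mux, so serving with a nil handler would expose them again.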

Forget to use a package and the compiler rejects the build with "net/http/pprof" imported and not used. If you add pprof to imports but remove the handler registration, the build fails. The compiler is strict about unused imports. This rule keeps codebases clean.

Check errors from http.ListenAndServe. The function always returns a non-nil error when it stops. If you ignore it, startup failures such as a port that is already in use pass silently and the program simply exits. The community accepts the boilerplate if err != nil { log.Fatal(err) } because it makes the unhappy path visible. Verbose error handling is a feature, not a bug.
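The failure is easy to reproduce. The sketch below occupies a loopback port and then asks http.ListenAndServe to use the same address; listenOnBusyPort is a hypothetical helper, and a real server would typically pass the error to log.Fatal instead of printing it.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
)

// listenOnBusyPort grabs a free loopback port, keeps it open, and then
// tries to start an HTTP server on the same address. It returns the
// error that ListenAndServe reports.
func listenOnBusyPort() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return fmt.Errorf("setup failed: %w", err)
	}
	defer ln.Close()

	// The port is taken, so this returns right away instead of serving.
	return http.ListenAndServe(ln.Addr().String(), nil)
}

func main() {
	if err := listenOnBusyPort(); err != nil {
		fmt.Println("server did not start:", err)
	}
}
```

Drop the error check and this program would exit without a word, which is exactly the silent failure the check exists to prevent.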

When you define methods on structs, the receiver name is usually one or two letters matching the type: (s *Server) Start(), not (this *Server) Start(). This keeps code concise. Receiver names don't appear in profiles, but the types do: a method shows up in stack traces in the form main.(*Server).Start.

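Don't pass a *string. A string is already a small header (a pointer and a length), so it is cheap to pass by value. A pointer adds an indirection and can push the value onto the heap, which then shows up in the heap profile as extra allocations. Pass the string directly unless you need to distinguish a missing value from an empty one.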

When to use what

Use net/http/pprof when you are building a web server and need quick access to profiles via HTTP endpoints during development.

Use runtime/pprof when you are writing a CLI tool or a background worker that has no HTTP server, and you need to write profiles to a file manually.

Use go tool pprof when you have a profile file and need to analyze the data, visualize the flame graph, or compare two runs.

Use a production APM tool when you need continuous monitoring, distributed tracing across services, and alerting without manual intervention.

Profile early, profile often. Optimization without data is just guessing.

Where to go next

Profiling finds the slow parts of your code so you can make them faster. Reach for it when your server feels sluggish or uses too much CPU: start with a CPU profile of the slow endpoint, then check the heap, goroutine, and block profiles if the CPU profile looks clean.