Benchmark code

The benchmark trap

You suspect your JSON parser is the bottleneck. You add a timer, run the server, and see the numbers. Then you refactor the parser, run it again, and the numbers look better. But was it the refactor, or was the server just under less load the second time? Or did the compiler optimize the first version away because you weren't using the result? Guessing performance is a trap. Go gives you a tool to measure code in isolation, but the tool has rules. If you break the rules, the benchmark lies to you.

Benchmarks in Go are not unit tests. They do not check correctness. They measure throughput and memory allocation under controlled conditions. The testing framework runs your code repeatedly, adjusts the iteration count until the timing stabilizes, and reports nanoseconds per operation. The framework handles the loop count. You handle the code under test.

How the benchmark runner works

Think of b.N as a dial controlled by the testing framework, not by you. When you write a benchmark, you provide a function that accepts *testing.B. The framework calls this function several times, each time with a larger b.N. It starts with a small number, measures the total time, and uses that measurement to pick a larger b.N for the next call if the run was too short to be statistically reliable. It stops once a run lasts at least the target duration, which defaults to one second of measured time.

This approach ensures the result is stable. Measuring a single operation might return zero nanoseconds because the timer resolution is coarser than the operation. Running the operation a billion times gives a precise average. The framework reports ns/op, which is the average nanoseconds per iteration. It also reports B/op if you ask for memory stats, showing bytes allocated per operation.
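
To watch the dial turn, the sketch below records every b.N value the framework passes in. The names observedN, nSink, and BenchmarkObserveN are invented for illustration, and the exact sequence of values is an implementation detail that can differ between Go versions.

```go
package main

import "testing"

// observedN records each b.N the framework passes in (illustrative name).
var observedN []int

// nSink receives the loop's result so the work is not optimized away.
var nSink int

// BenchmarkObserveN does trivial work and notes the b.N it was given.
// The framework calls it repeatedly with growing b.N values.
func BenchmarkObserveN(b *testing.B) {
	observedN = append(observedN, b.N)
	for i := 0; i < b.N; i++ {
		nSink += i
	}
}
```

Inspecting observedN after a run (or calling b.Logf with b.N inside the function) should show a short, increasing sequence that ends at the iteration count printed in the result line.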

The function signature is strict. Name the function BenchmarkXxx, where Xxx does not start with a lowercase letter, and accept a single b *testing.B parameter with no return value. A BenchmarkXxx function that takes *testing.T, or any other parameter list, is still valid Go, but go test refuses to build the test binary and reports a wrong-signature error. If you forget the Benchmark prefix, the test runner ignores the function entirely.

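Benchmarks live in files ending with _test.go, just like tests. Run them with the command go test -bench=. (the dot is part of the command). The -bench flag takes a regular expression, and a dot matches every benchmark name. go test still runs the package's unit tests first; pass -run='^$' to skip them. Add -benchmem to include memory allocation stats in the output.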

Minimal example

Here is a benchmark that measures integer addition. The code is trivial, but the structure applies to any workload.

package main

import "testing"

// BenchmarkAdd measures the cost of adding two integers.
// The framework calls this function with varying b.N values.
func BenchmarkAdd(b *testing.B) {
    // One-time setup goes before the loop. The timer is already running
    // at this point, so call b.ResetTimer() after any expensive setup.
    // b.N is set by the framework to ensure stable timing.
    // Loop exactly b.N times. Do not hardcode the count.
    for i := 0; i < b.N; i++ {
        // The code under test goes here.
        // The framework divides total time by b.N to get ns/op.
        // Note: 1 + 2 is a constant expression, folded at compile time.
        _ = 1 + 2
    }
}

Run this with go test -bench=. and the output looks something like:

BenchmarkAdd-8    1000000000    0.345 ns/op

The -8 indicates GOMAXPROCS is 8. The 1000000000 is the final b.N value the framework chose. The 0.345 ns/op is the average time per iteration. Do not read that as the cost of an addition: 1 + 2 is a constant expression, so the compiler folds it at compile time and the number reflects little more than loop overhead. This is normal. Benchmarks measure the code you wrote, not the code you imagined.
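
A variant that gives the compiler less to fold might look like the following sketch. The names x, y, addSink, and BenchmarkAddVars are illustrative and not part of the example above. Even here, an operation this small is mostly loop and store overhead, so treat the result as an upper bound rather than the cost of one add instruction.

```go
package main

import "testing"

// x and y are package-level variables, so x + y is not a compile-time constant.
var x, y = 1, 2

// addSink receives the result so the addition cannot simply be discarded.
var addSink int

// BenchmarkAddVars adds two variables and stores the sum on every iteration.
func BenchmarkAddVars(b *testing.B) {
	for i := 0; i < b.N; i++ {
		addSink = x + y
	}
}
```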

Realistic example: string concatenation

String concatenation in a loop is a classic performance pitfall. Each += operation creates a new string, copies the old content, and appends the new content, so the total amount of copying grows quadratically with the number of parts. strings.Builder avoids this by appending to a single growing byte buffer, and a call to Grow can reserve the full capacity up front.

package main

import (
    "strings"
    "testing"
)

// BenchmarkConcatLoop measures naive string concatenation.
// This allocates a new string on every iteration of the inner loop.
func BenchmarkConcatLoop(b *testing.B) {
    // b.ResetTimer() is not needed here because setup is cheap.
    // Use b.ResetTimer() if you have expensive setup outside the loop.
    for i := 0; i < b.N; i++ {
        var result string
        // Inner loop simulates building a string from parts.
        // Expect roughly one allocation per +=, so about 100 per b.N iteration.
        for j := 0; j < 100; j++ {
            result += "a"
        }
        // Assign to a package-level variable to prevent the compiler
        // from optimizing away the loop if the result is unused.
        // See the pitfalls section for details.
        globalResult = result
    }
}

// BenchmarkBuilder measures strings.Builder for the same task.
// Builder reuses a single buffer and avoids intermediate allocations.
func BenchmarkBuilder(b *testing.B) {
    for i := 0; i < b.N; i++ {
        var sb strings.Builder
        // Grow pre-allocates the buffer to the expected size.
        // This prevents the Builder from resizing during writes.
        sb.Grow(100)
        for j := 0; j < 100; j++ {
            sb.WriteByte('a')
        }
        // String() returns the buffer as a string without copying it.
        // The single allocation is the buffer reserved by Grow.
        globalResult = sb.String()
    }
}

// globalResult prevents the compiler from dead-code eliminating the benchmarks.
var globalResult string

Run with go test -bench=. -benchmem to see the difference. The figures below are illustrative; yours will depend on hardware and Go version:

BenchmarkConcatLoop-8    1000000    1234 ns/op    800 B/op    100 allocs/op
BenchmarkBuilder-8       5000000    234 ns/op     100 B/op     1 allocs/op

In this run BenchmarkBuilder is roughly five times faster and allocates far less memory. The allocs/op column shows the number of heap allocations per operation. BenchmarkConcatLoop allocates about once per +=, because each concatenation builds a new string. BenchmarkBuilder allocates once, for the buffer that Grow reserves; String() returns that buffer as a string without copying it.

Trust the numbers, but verify the setup. If the benchmark result is assigned only to a local variable that is never read, the compiler may prove the work is dead and delete it. The benchmark then reports a fraction of a nanosecond per operation, which is the cost of an empty loop, not of your code. Assign the result to a package-level variable to keep the work alive. Separately, if you need to time only part of each iteration, bracket the untimed part with b.StopTimer() and b.StartTimer().

Pitfalls and compiler errors

Benchmarks are sensitive to subtle mistakes. The compiler and runtime can invalidate your results without warning.

Compiler optimization. The Go compiler eliminates dead code and can inline and constant-fold the function under test. If your benchmark computes a value but never uses it, the compiler may remove the computation, and the benchmark reports a near-zero time. This is not an error. It is a correct measurement of code that does nothing. Fix this by assigning the result to a package-level variable: a store to a package-level variable is an observable side effect, so in practice the compiler keeps the computation that produces the stored value. b.ReportAllocs() is not a substitute. It only turns on allocation reporting for that benchmark and does nothing to prevent optimization.

Wrong signature. The signature is checked when go test builds the test binary, not by the language itself. If you write func BenchmarkXxx() {}, or give the function a return value, go test refuses to run and reports an error along the lines of wrong signature for BenchmarkXxx, must be: func BenchmarkXxx(b *testing.B).

Measuring setup. The timer is running from the moment the benchmark function is called, so any setup you do is measured unless you exclude it. If you create a large data structure inside the benchmark loop, you measure the cost of creation plus the cost of the operation. Do one-time setup before the loop and call b.ResetTimer() just before the loop starts to discard the time and allocations recorded so far. If each iteration needs its own setup, call b.StopTimer() before it and b.StartTimer() after it. The framework never pauses the clock for you.

System noise. Benchmarks are affected by CPU frequency scaling, background processes, and garbage collection. Run each benchmark several times, for example with -count=10, and look at the spread rather than a single number. Use go test -benchtime=5s to run for a longer duration and get a more stable average. The default is 1 second. Longer runs reduce noise but take more time.

Memory stats. Without -benchmem, or a b.ReportAllocs() call inside the benchmark, the output omits allocation stats. You might miss a regression that increases memory usage while keeping latency stable. Always include -benchmem when comparing implementations. The B/op column shows bytes allocated per operation. The allocs/op column shows the number of allocations. High allocation counts often indicate pressure on the garbage collector.

Goroutine leaks. If your benchmark spawns goroutines that never exit, the benchmark hangs, leaks memory, or leaves stray goroutines competing for CPU with the benchmarks that run after it. Ensure every goroutine has a termination path. Use a context.Context with cancellation or a deadline, or a dedicated done channel, and wait for the goroutines to exit before the benchmark function returns. The worst goroutine bug is the one that never logs.
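
One way to give a helper goroutine a termination path is sketched below, assuming a single worker that doubles the values it receives. The worker, the channel names, lastDoubled, and BenchmarkWorkerRoundTrip are all hypothetical. The key points are the ctx.Done() case in the worker and the wg.Wait() before the benchmark returns.

```go
package main

import (
	"context"
	"sync"
	"testing"
)

// lastDoubled holds the most recent reply so the round trip is not dead code.
var lastDoubled int

// BenchmarkWorkerRoundTrip measures a send/receive round trip to a worker
// goroutine and shuts the worker down before returning.
func BenchmarkWorkerRoundTrip(b *testing.B) {
	ctx, cancel := context.WithCancel(context.Background())
	in := make(chan int)
	out := make(chan int)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-ctx.Done():
				return // termination path: exit when the benchmark cancels
			case v := <-in:
				out <- v * 2
			}
		}
	}()

	b.ResetTimer() // exclude goroutine startup from the measurement
	for i := 0; i < b.N; i++ {
		in <- i
		lastDoubled = <-out
	}
	b.StopTimer() // exclude shutdown from the measurement

	cancel()  // tell the worker to stop
	wg.Wait() // and wait until it actually has
}
```

Because the framework calls the function several times, each call starts and stops its own worker, and nothing is left running when the next benchmark begins.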

Decision matrix

Use go test -bench=. when you need to measure throughput and latency of a specific function in isolation.

Use b.StopTimer() and b.StartTimer() when setup is expensive and you only want to measure the core logic.

Use b.ReportAllocs() when one particular benchmark should always report allocations, even if nobody passes -benchmem.

Use go test -benchmem when you need to see bytes allocated per operation and allocation counts.

Use a package-level variable to store results when the compiler optimizes your benchmark code away.

Use go test -benchtime=5s when you need higher precision for very fast operations or when results are noisy.

Use profiling tools like pprof when benchmarks show a regression but don't reveal the root cause.

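Use benchstat (golang.org/x/perf/cmd/benchstat) when you have two versions of a benchmark and want to see the relative difference in a single output. Save the output of each version to a file, ideally with -count=10 or more, and pass both files to benchstat. go test itself has no -benchcmp flag.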

Benchmarks are tools, not oracles. They measure what you tell them to measure. If the setup is wrong, the numbers are wrong. Write benchmarks that reflect production workloads. Benchmark realistic data sizes. Benchmark realistic concurrency levels. And always check that the compiler didn't optimize your measurement away.

Where to go next

Benchmarks measure how fast your code runs; tests only check that it works. A benchmark is a function that repeats a task b.N times so the framework can report an average cost per operation. It is the difference between timing a mile and merely checking that you can finish the race. From here, the natural next step is profiling with pprof, for the cases where a benchmark shows a regression but not its cause.