How to Write Benchmarks in Go with testing.B

The problem with guessing performance

You just finished a function that parses a configuration file. It handles the happy path, it catches malformed rows, and the unit tests pass. Then you run it against a production export with two million lines. The process hangs. You did not break the logic, but you broke the timeline. Guessing performance is a trap. You need a controlled environment that measures time per operation and memory allocation without the noise of a live server. Go gives you a built-in tool for exactly this.

How the testing framework measures time

The testing package ships with a testing.B type, and every benchmark function receives a *testing.B pointer. It looks like a regular test runner, but it behaves like a measuring instrument. Instead of asking you to pick a fixed number of iterations, it hands you a counter named b.N. You wrap your target code in a loop that runs b.N times. The testing framework starts with a small b.N, measures how long the loop takes, and scales b.N upward until a run lasts for the target duration, which is one second by default and adjustable with the -benchtime flag. It handles the math so you do not have to write your own stopwatch logic.

Think of it like a camera adjusting its exposure. The first shot is too dark, so the camera widens the aperture and takes another. It keeps adjusting until the image is clear. The benchmark runner does the same with iterations. It keeps increasing the count until the run is long enough that timer overhead and one-off hiccups no longer dominate the average time per operation.

The minimal benchmark

Here is the simplest valid benchmark. It measures how long it takes to split a string.

package main

import (
	"strings"
	"testing"
)

// BenchmarkSplit measures the cost of splitting a fixed string.
func BenchmarkSplit(b *testing.B) {
	// b.N is controlled by the testing framework.
	// It starts small and grows until timing stabilizes.
	for range b.N {
		// The target operation goes here.
		// Discarding the result is fine for this call; the pitfalls section
		// covers cases where the compiler can optimize unused work away.
		_ = strings.Split("hello-world", "-")
	}
}

Save the code in a file whose name ends in _test.go, then run it with go test -bench=BenchmarkSplit. The framework prints the number of iterations it ran and the average nanoseconds per operation. The -bench flag takes a regular expression, so go test -bench=. executes every benchmark in the package. The naming convention is strict: the name must start with Benchmark, the next character must not be a lowercase letter, and the function must take a single *testing.B parameter. Functions that do not follow this pattern are not run as benchmarks.

What happens under the hood

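When you invoke the command, go test compiles the package together with its _test.go files and generates a small main program that lists every benchmark function it found. To the compiler, a benchmark is an ordinary function. The runner first calls it with b.N set to 1 and records the elapsed time. If the run was shorter than the target duration, it uses the measured rate to predict a larger b.N and calls the function again, which means any code before your loop runs once per call. This repeats until a run lasts at least the -benchtime value, one second by default. The reported figure is the average over that final run, so a single slow or lucky iteration has little influence. To see run-to-run variance, repeat the whole benchmark with -count and compare the results.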

The framework also tracks memory allocations automatically. Add -benchmem to your command line and the output will show bytes allocated per operation and the number of allocations. This catches hidden heap pressure before it hits production. The convention here is straightforward: run benchmarks with -benchmem by default. Allocation patterns often matter more than raw CPU time in real workloads.

Benchmarks are cheap to run. Let the framework find the stable number.

Realistic benchmark with setup and teardown

Real code rarely measures a single library call. You usually need to set up data, run the operation, and clean up. If you include setup inside the b.N loop, you measure the setup cost instead of the target function. The *testing.B type provides timer controls to isolate the measurement window.

package main

import (
	"encoding/json"
	"testing"
)

// sensor is the decode target; naming the type avoids repeating the struct literal.
type sensor struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// BenchmarkParseJSON measures json.Unmarshal without counting setup time.
func BenchmarkParseJSON(b *testing.B) {
	// Stop the timer before setup so setup work isn't counted.
	b.StopTimer()
	payload := []byte(`{"id": 1, "name": "sensor"}`)
	var target sensor
	// Restart the timer right before the measured work.
	b.StartTimer()

	for range b.N {
		// Reset the struct to simulate fresh state each iteration.
		target = sensor{}
		if err := json.Unmarshal(payload, &target); err != nil {
			// A failed parse makes the timing meaningless, so stop here.
			b.Fatal(err)
		}
	}
}

The convention here is strict: setup belongs before the loop, either between StopTimer and StartTimer or followed by a single b.ResetTimer call, which zeroes the elapsed time and allocation counters. Teardown follows the same rule. If you need to reset state between iterations, put the reset inside the loop and pause the timer around it so that only the critical section is measured. Pausing is not free, so a benchmark that stops and starts the timer on every iteration can take much longer to finish than its reported time suggests. Error handling in benchmarks follows the same pattern as tests. Call b.Fatal to stop immediately and mark the benchmark as failed. Call b.Error to log the issue and continue running. The community accepts the if err != nil { b.Fatal(err) } boilerplate because it makes failure paths visible and stops wasted CPU cycles.

Pitfalls and compiler traps

The biggest trap in benchmarking is dead code elimination. The Go compiler is aggressive. If your loop calls a small function that the compiler can inline and prove free of side effects, and nothing uses the result, the compiler may remove the work entirely. Your benchmark then measures a nearly empty loop and reports an implausibly small time, often a fraction of a nanosecond per operation. The compiler will not warn you about this. It assumes you know what you are doing. To prevent it, assign the result to a package-level variable or pass it to runtime.KeepAlive.

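Another common mistake is measuring the wrong thing. If the compiler inlines the function under test into the benchmark loop, it can optimize the call together with the surrounding code, for example by folding constant arguments, so the result may not reflect the cost of a real call. To check, add -gcflags=-l to the go test command, which disables inlining for that build and forces the compiler to emit a separate function call. Treat the result as a diagnostic rather than the true cost: production builds do inline, so the number with inlining disabled is a point of comparison, not what your users will see.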

Signature and naming mistakes are caught early. If a function is named like a benchmark but takes something other than *testing.B, go test refuses to build it and reports a wrong-signature error. If you name the function TestBenchmark instead of BenchmarkTest, go test treats it as a unit test: with a *testing.B parameter it fails the same signature check, and with *testing.T it runs as an ordinary test that -bench never selects. Closures that capture a loop variable were a classic source of bugs before Go 1.22, and go vet flags them in older code with loop variable i captured by func literal. Since Go 1.22 each iteration gets its own copy of the variable, so the capture is safe, and the for range b.N form avoids declaring a counter at all.

The worst benchmark is the one that measures nothing. Guard against compiler optimizations.

When to benchmark versus when to profile

Use testing.B when you need to measure the raw speed of a specific function or algorithm in isolation. Use go test -benchmem when you suspect hidden allocations are slowing down your code. Use pprof with -cpuprofile or -memprofile when the benchmark shows slowness but you cannot pinpoint which line or call chain is responsible. Use plain unit tests when you only care about correctness and do not need performance guarantees. Use load testing tools like k6 or hey when you need to measure end-to-end latency under network conditions and concurrent user traffic.

Where to go next

A benchmark is a special test that runs your code b.N times, with b.N chosen by the framework, and reports the average time per operation. That is enough to spot slow code before your users do. From here, make -benchmem a habit, guard results against dead code elimination, and reach for pprof when a benchmark tells you something is slow but not why.