How to Use go tool trace for Goroutine and Latency Analysis

The invisible traffic jam

Your HTTP server handles two hundred requests per second without breaking a sweat. Push it to five hundred and response times double. The CPU sits at twenty percent. Memory usage is flat. The code logic is straightforward. You stare at the logs and see nothing wrong. The bottleneck is not in your business logic. It is in the scheduler. Goroutines are spawning, waiting on channels, or blocked inside a syscall, and they are piling up. You cannot see this with a simple print statement. You need a timeline of every runtime event.

Go ships with a built-in tracer that records exactly what the scheduler, garbage collector, and network stack are doing. It turns invisible concurrency into a visual timeline. You generate a trace file, feed it to the CLI tool, and get a browser interface that maps out every goroutine's life cycle.

What tracing actually captures

Profiling and tracing solve different problems. A CPU profile takes periodic snapshots of what code is executing. It tells you which functions burn cycles. A trace records a chronological log of runtime events. It tells you why goroutines are waiting.

Think of profiling as taking a photograph of a busy intersection every second. You can count how many cars are in each lane. You cannot see why they stopped. Tracing is a security camera that records every car entering, stopping at a light, merging, and leaving. You see the handshakes between threads. You see the exact moment a goroutine moves from runnable to waiting.

The Go runtime hooks into its own scheduler to emit these events. When a goroutine starts, blocks on a channel, yields to the garbage collector, or makes a network call, the runtime writes an event to an internal ring buffer. The tracer does not pause your program to take measurements. It streams events asynchronously. When you stop the trace, the buffer flushes to disk as a compact binary file.

Tracing is not a replacement for profiling. It is a complement. You use profiles to find hot functions. You use traces to find scheduling bottlenecks, goroutine leaks, and blocking I/O patterns.

Your first trace file

Generating a trace requires the runtime/trace package. You wrap the code you want to observe with trace.Start and trace.Stop. The tracer writes to an io.Writer, usually a file.

Here is the simplest program that spawns a few goroutines, blocks on a channel, and records the activity.

package main

import (
	"os"
	"runtime/trace"
	"time"
)

func main() {
	// Open file for trace output. OS handles cleanup on exit.
	f, err := os.Create("trace.out")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Start tracing. The runtime begins buffering scheduler events.
	trace.Start(f)
	defer trace.Stop()

	// Spawn workers that simulate I/O and channel waits.
	done := make(chan struct{})
	for i := 0; i < 4; i++ {
		go func(id int) {
			// Simulate network latency or disk read.
			time.Sleep(50 * time.Millisecond)
			// Block until main signals completion.
			<-done
		}(i)
	}

	// Let workers run briefly, then unblock them.
	time.Sleep(100 * time.Millisecond)
	close(done)
	time.Sleep(100 * time.Millisecond)
}

Run the program to generate the binary trace file. Then hand it to the tool.

go run main.go
go tool trace trace.out

The command starts a local HTTP server and opens your browser automatically. The interface loads a timeline view of the entire execution window. You can zoom, pan, and filter by goroutine ID or function name.

Tracing captures the runtime's perspective, not just your code. Trust the timeline. It shows what the scheduler actually did, not what you assumed it would do.

Reading the goroutine timeline

The default view shows a horizontal timeline with vertical bands representing goroutines. Each band contains colored blocks that map to runtime states. Green means running. Gray means waiting. Red means blocked in a syscall. Yellow means suspended for garbage collection.

Click the Goroutines tab to see a list of every goroutine that existed during the trace. The list shows the stack trace at the moment the trace started, plus the total time spent in each state. Sort by "Wait Time" to find the goroutines that spent the most time idle. Sort by "Run Time" to find the ones burning CPU.

The Latency tab aggregates blocking operations. It groups syscalls, channel operations, and mutex locks by duration. This tab answers the question of which specific operation is holding up the pipeline. If you see a cluster of chan send or chan recv blocks, your concurrency model is likely mismatched. Producers are outpacing consumers, or a single goroutine is serializing work that should run in parallel.

The Scheduler tab shows how the runtime assigns goroutines to OS threads. You will see runnable goroutines waiting for an available P (processor). If the runnable queue stays high while CPU usage stays low, your program is starved of threads or blocked on synchronous I/O that does not yield properly.

Click any block in the timeline to expand its stack trace. The trace shows the exact line where the goroutine entered the blocking state. You can filter the view to a specific time range by dragging the selection bar at the top. This isolates spikes and removes noise from idle periods.

Goroutines are cheap. Channels are not magic. The timeline shows exactly where the flow stops.

When traces lie to you

Tracing adds overhead. The runtime must serialize events into the ring buffer, acquire locks occasionally, and flush to disk. In tight loops, tracing can slow execution by ten to twenty percent. This is the observer effect. Timing measurements inside a trace will not match production benchmarks. Use traces to find structural bottlenecks, not to measure exact nanosecond latency.

The ring buffer has a fixed size. If your program emits events faster than the buffer can drain, the tracer drops older events. The runtime prints a warning to stderr: runtime/trace: buffer overflow, some events dropped. When this happens, the timeline will have gaps. Goroutine states will jump without showing the intermediate waits. Increase the buffer size by calling trace.Start(f, trace.WithBufferSize(1<<30)) to allocate one gigabyte, or reduce the trace duration.

Traces also do not show user-space CPU burns well. A goroutine that spins in a tight for loop will appear as a solid green block. The trace records scheduler transitions, not instruction counts. If a function is CPU-bound, the trace will tell you it ran, but it will not tell you why it took so long. Pair traces with CPU profiles to separate scheduling delays from computational cost.

Another common trap is forgetting to respect context cancellation in long-running goroutines. If a goroutine blocks on a channel that never closes, it stays in the trace as a waiting band until the program exits. The trace will show the leak clearly, but the program will not recover. Always wire context.Context as the first parameter to functions that might block, and check ctx.Done() before entering long waits.

The worst goroutine bug is the one that never logs. The trace will catch it anyway.

Pick the right profiling tool

Go provides several observability tools. They overlap but serve different diagnostic goals. Choose based on the symptom you are investigating.

Use go tool trace when you need to see goroutine scheduling, channel blocking, and garbage collection pauses. Use CPU profiling with runtime/pprof.WriteCPUProfile when you need to find which functions consume the most processor time. Use memory profiling with runtime/pprof.WriteHeapProfile when you need to track allocation rates and find retained objects. Use the pprof web UI when you want to interactively explore call graphs and filter by cumulative or flat time. Use plain sequential code when you do not need concurrency: the simplest thing that works is usually the right thing.

Where to go next

go tool trace records a timeline of your Go program's execution to help you find performance bottlenecks. It shows exactly when goroutines start, stop, and wait, acting like a video replay of your code's activity. You use it when your application feels slow or unresponsive to pinpoint the specific function causing the delay.