How to Implement Distributed Tracing in Go with OpenTelemetry

When a single log line stops making sense

You are debugging a payment service. The request hits your API, calls a database, talks to a third-party fraud checker, and then returns a 500 error. The logs show three separate lines from three different services. None of them share a common identifier. You spend forty minutes correlating timestamps manually before realizing the fraud checker timed out. Distributed tracing solves this by stitching those isolated logs into a single timeline. You stop guessing which log belongs to which user. You start seeing the exact path a request took through your entire stack.

What distributed tracing actually does

Tracing tracks a request as it moves through your system. You attach a unique identifier to the request at the entry point, and every downstream call carries that identifier forward in its own context. OpenTelemetry is the vendor-neutral standard for doing this, and its Go packages give you a tracer provider, a tracer, and spans. A span represents a single unit of work. Spans nest inside each other to form a tree. As spans finish, the SDK exports them to a backend that reassembles the tree and draws the timeline.
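
Between processes, the identifier usually travels in a request header. Here is a minimal sketch of that idea, assuming the W3C Trace Context traceparent layout. The helper names formatTraceparent and parseTraceparent are hypothetical. In OpenTelemetry Go this job belongs to a propagator such as propagation.TraceContext, so treat the code as an illustration of the header's shape and not as something you would hand-roll in production.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatTraceparent renders IDs in the traceparent layout: version "00",
// 32 hex chars of trace ID, 16 hex chars of parent span ID, 2 hex chars of flags.
func formatTraceparent(traceID, spanID string, sampled bool) string {
	flags := "00"
	if sampled {
		flags = "01" // bit 0 of the flags byte marks the trace as sampled
	}
	return fmt.Sprintf("00-%s-%s-%s", traceID, spanID, flags)
}

// parseTraceparent splits the header back into its parts. It checks only the
// field count, the field lengths, and that the flags are valid hex.
func parseTraceparent(h string) (string, string, bool, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || len(parts[0]) != 2 || len(parts[1]) != 32 ||
		len(parts[2]) != 16 || len(parts[3]) != 2 {
		return "", "", false, fmt.Errorf("malformed traceparent %q", h)
	}
	flags, err := strconv.ParseUint(parts[3], 16, 8)
	if err != nil {
		return "", "", false, fmt.Errorf("bad flags in %q: %w", h, err)
	}
	return parts[1], parts[2], flags&1 == 1, nil
}

func main() {
	// The caller writes the header; the callee reads it and continues the trace.
	h := formatTraceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true)
	fmt.Println(h)

	traceID, parentID, sampled, err := parseTraceparent(h)
	fmt.Println(traceID, parentID, sampled, err)
}
```

The callee starts its own span with the received trace ID and uses the received span ID as the parent, which is how two services end up in one timeline.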

Think of a span like a stopwatch for a specific task. You start the stopwatch when the task begins, you note down relevant details while it runs, and you stop it when the task finishes. If that task calls another function, you start a second stopwatch. The second stopwatch runs inside the first one. When both stop, you have a parent-child relationship that maps exactly to your code execution.

OpenTelemetry separates the API from the SDK. The API defines the methods you call in your application code. The SDK provides the actual implementation that records timestamps, buffers data, and ships it to a collector. This split keeps your application code stable even when the underlying tracing implementation changes. The API package is lightweight. The SDK package contains the heavy lifting. You import both, but you only interact with the API in your business logic.

Traces use two identifiers. The trace ID is a 128-bit number that stays constant for the entire request lifecycle. The span ID is a 64-bit number that identifies a single operation. Every span in a trace shares the same trace ID. Child spans reference their parent span ID. This structure lets backends reconstruct the execution tree without storing massive nested objects.
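
The ID scheme can be sketched with the standard library alone. The spanRecord type and its helpers below are hypothetical and are not OpenTelemetry types. They only show that a flat list of records is enough to rebuild the tree, assuming random IDs of the sizes described above.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// spanRecord is a hypothetical flat record, not an OpenTelemetry type.
type spanRecord struct {
	TraceID  string // 128 bits, shared by every span in the trace
	SpanID   string // 64 bits, unique per operation
	ParentID string // empty for the root span
	Name     string
}

// randomHex returns n random bytes encoded as 2n hex characters.
func randomHex(n int) string {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err) // no sensible recovery if the system RNG fails
	}
	return hex.EncodeToString(b)
}

// newRoot starts a new trace: fresh trace ID, fresh span ID, no parent.
func newRoot(name string) spanRecord {
	return spanRecord{TraceID: randomHex(16), SpanID: randomHex(8), Name: name}
}

// newChild reuses the parent's trace ID and points back at the parent's span ID.
func newChild(parent spanRecord, name string) spanRecord {
	return spanRecord{
		TraceID:  parent.TraceID,
		SpanID:   randomHex(8),
		ParentID: parent.SpanID,
		Name:     name,
	}
}

func main() {
	root := newRoot("process-payment")
	child := newChild(root, "fraud-check")
	for _, s := range []spanRecord{root, child} {
		fmt.Printf("%-16s trace=%s span=%s parent=%q\n", s.Name, s.TraceID, s.SpanID, s.ParentID)
	}
}
```

A backend that receives these two records in any order can group them by trace ID and attach the child to its parent by span ID.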

Sampling controls how many traces you actually record. Recording every single request in a high-traffic system can generate terabytes of data. OpenTelemetry lets you configure a sampler that the SDK consults each time a span starts. The built-in samplers cover always-on, always-off, and a fixed ratio keyed on the trace ID. Sampling by route or by other custom rules means writing your own sampler. With the SDK's ParentBased sampler, the decision is made once at the root span and every child follows it, so if the root span gets dropped, all child spans get dropped with it.
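
Here is a rough sketch of how a ratio decision and a parent-following rule fit together. The functions sampleRoot and shouldSample are hypothetical. The arithmetic is written in the spirit of trace-ID ratio sampling and should not be read as the SDK's exact algorithm.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sampleRoot maps the low 8 bytes of the trace ID onto [0, 2^63) and keeps the
// trace when that value falls inside the ratio's share of the range.
// The same trace ID always yields the same answer.
func sampleRoot(traceID [16]byte, ratio float64) bool {
	switch {
	case ratio >= 1:
		return true
	case ratio <= 0:
		return false
	}
	x := binary.BigEndian.Uint64(traceID[8:]) >> 1
	return x < uint64(ratio*(1<<63))
}

// shouldSample applies the parent-following rule: a child copies its parent's
// decision, and only a root span consults the ratio.
func shouldSample(hasParent, parentSampled bool, traceID [16]byte, ratio float64) bool {
	if hasParent {
		return parentSampled
	}
	return sampleRoot(traceID, ratio)
}

func main() {
	low := [16]byte{}          // low half is all zeros, so it maps to 0
	high := [16]byte{8: 0xff}  // low half starts with 0xff, near the top of the range

	fmt.Println("low ID at 10%: ", sampleRoot(low, 0.1))
	fmt.Println("high ID at 10%:", sampleRoot(high, 0.1))
	fmt.Println("child of a dropped parent:", shouldSample(true, false, low, 1.0))
}
```

Because the decision depends only on the trace ID, every service that applies the same ratio agrees on which traces to keep.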

Tracing is invisible until something breaks. Instrument early, sample wisely, and export reliably.

The minimal setup

Here is the simplest working configuration. You create a tracer provider, attach a standard output exporter, register it globally, and start a single span.

package main

import (
	"context"
	"log"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	"go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	// Background context carries no request-specific data
	ctx := context.Background()

	// Exporter writes trace data to stdout in JSON format
	exporter, err := stdouttrace.New(stdouttrace.WithPrettyPrint())
	if err != nil {
		log.Fatalf("creating stdout exporter: %v", err)
	}

	// Provider manages the lifecycle of spans and exporters
	provider := trace.NewTracerProvider(trace.WithSyncer(exporter))

	// Global registration makes the provider available to otel.Tracer
	otel.SetTracerProvider(provider)

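	// Flush pending spans and close the exporter before the program exits
	defer func() {
		if err := provider.Shutdown(ctx); err != nil {
			log.Printf("shutting down tracer provider: %v", err)
		}
	}()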

	// Tracer is a lightweight factory for creating spans
	tracer := otel.Tracer("payment-service")

	// Start creates a span and injects it into a new context
	ctx, span := tracer.Start(ctx, "process-payment")
	defer span.End()

	log.Println("Processing payment with active trace")
}

How the runtime stitches it together

The program runs through a specific sequence at startup. trace.NewTracerProvider allocates the SDK and configures it to use the standard output exporter. The WithSyncer option tells the provider to flush spans synchronously. This is fine for development but blocks the calling goroutine until the exporter finishes writing. Production systems usually switch to WithBatcher to group spans and reduce I/O overhead.

otel.SetTracerProvider swaps the global singleton. OpenTelemetry uses a package-level variable to store the active provider. Every call to otel.Tracer reads from that variable. You only need to call this once per process. The global registration pattern is standard in Go telemetry libraries. It avoids passing the provider through every function signature.

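tracer.Start does three things. It assigns IDs: a fresh span ID every time, and a fresh trace ID only when the incoming context carries no parent span. Otherwise the new span reuses the parent's trace ID. It records the current timestamp as the start time. It returns the span together with a new context that carries it. Inside a process, propagation is just a matter of passing that context along. Any function that receives it can access the span or create child spans without extra boilerplate. Crossing a process boundary additionally requires a propagator that writes the IDs into request headers.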

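defer span.End() guarantees the span closes even if the function returns early or panics. End records the end time and hands the span to the span processor. The status stays Unset unless you set it. The SDK never marks a span OK on its own. With WithSyncer the span is exported immediately. With WithBatcher it waits in a buffer. Deferred calls run in reverse order, so span.End() runs before provider.Shutdown, which flushes anything still buffered and closes the exporter. If you skip the shutdown call under a batcher, the program can exit before the buffer is exported. You lose the trace data.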
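
To see why the shutdown call matters once spans are buffered, here is a toy model of a batching processor built from a channel and a goroutine. The batcher type, its methods, and runDemo are hypothetical and far simpler than the SDK's batch span processor. The sketch only shows the flush-on-size, flush-on-timer, and flush-on-shutdown behavior.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// batcher is a toy stand-in for a batching span processor, not the SDK's type.
type batcher struct {
	queue chan string   // names of ended spans waiting for export
	done  chan struct{} // closed when the worker goroutine has exited

	mu      sync.Mutex
	batches [][]string // each inner slice models one export call
}

func newBatcher(size int, interval time.Duration) *batcher {
	b := &batcher{queue: make(chan string, 64), done: make(chan struct{})}
	go b.run(size, interval)
	return b
}

// run collects ended spans and exports them when the batch is full,
// when the interval elapses, or when Shutdown closes the queue.
func (b *batcher) run(size int, interval time.Duration) {
	defer close(b.done)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var pending []string
	flush := func() {
		if len(pending) == 0 {
			return
		}
		b.mu.Lock()
		b.batches = append(b.batches, pending)
		b.mu.Unlock()
		pending = nil
	}
	for {
		select {
		case name, ok := <-b.queue:
			if !ok {
				flush() // final flush: this is what Shutdown waits for
				return
			}
			pending = append(pending, name)
			if len(pending) >= size {
				flush()
			}
		case <-ticker.C:
			flush()
		}
	}
}

// OnEnd queues a finished span. It does not export anything itself.
func (b *batcher) OnEnd(name string) { b.queue <- name }

// Shutdown stops intake and blocks until everything queued is exported.
func (b *batcher) Shutdown() {
	close(b.queue)
	<-b.done
}

// runDemo ends n spans, shuts down, and reports what was exported.
func runDemo(n, size int) (batches, spans int) {
	b := newBatcher(size, time.Hour) // long interval: only size and Shutdown trigger exports
	for i := 0; i < n; i++ {
		b.OnEnd(fmt.Sprintf("span-%d", i))
	}
	b.Shutdown()

	b.mu.Lock()
	defer b.mu.Unlock()
	for _, batch := range b.batches {
		spans += len(batch)
	}
	return len(b.batches), spans
}

func main() {
	batches, spans := runDemo(7, 3)
	fmt.Printf("exported %d spans in %d batches\n", spans, batches)
}
```

With seven spans and a batch size of three, the last span sits in the buffer until Shutdown forces the final flush. Without that call, a program that exits right away would never export it.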

Convention aside: context.Context always goes as the first parameter in Go functions. OpenTelemetry follows this rule strictly. tracer.Start returns both a derived context and the span. You keep the span locally to set attributes and call End, and you hand the context to callees, because the context is what carries the tracing state across function boundaries. You rarely pass the span object directly. You pass the context, and a callee that needs the span can recover it with trace.SpanFromContext(ctx).
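
The mechanism is easy to model with context.WithValue. The span type, spanKey, and start function below are hypothetical stand-ins that mirror the shape of tracer.Start. They are a sketch of the idea and not the SDK's implementation.

```go
package main

import (
	"context"
	"fmt"
)

// span is a hypothetical stand-in for a real span.
type span struct {
	name   string
	parent *span
}

// spanKey is an unexported type so no other package can collide with the key.
type spanKey struct{}

// start looks up the active span in ctx, makes the new span its child, and
// returns a derived context that carries the new span.
func start(ctx context.Context, name string) (context.Context, *span) {
	parent, _ := ctx.Value(spanKey{}).(*span) // nil when ctx carries no span
	s := &span{name: name, parent: parent}
	return context.WithValue(ctx, spanKey{}, s), s
}

// path renders the chain from the root down to s.
func path(s *span) string {
	if s.parent == nil {
		return s.name
	}
	return path(s.parent) + " > " + s.name
}

// checkStock is a callee: it only sees the context it was given.
func checkStock(ctx context.Context) *span {
	_, s := start(ctx, "check-stock")
	return s
}

// linked passes the derived context down, so the child finds its parent.
func linked() string {
	ctx, _ := start(context.Background(), "handle-order")
	return path(checkStock(ctx))
}

// orphaned passes a fresh background context, so the link is lost.
func orphaned() string {
	_, _ = start(context.Background(), "handle-order")
	return path(checkStock(context.Background()))
}

func main() {
	fmt.Println(linked())   // handle-order > check-stock
	fmt.Println(orphaned()) // check-stock
}
```

The second call shows what happens when the derived context is dropped: the callee still produces a span, but nothing connects it to the handler.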

The compiler rejects a missing import with undefined: otel or a similar error. The API package go.opentelemetry.io/otel/trace and the SDK package go.opentelemetry.io/otel/sdk/trace are both named trace, so importing the wrong one fails with undefined: trace.NewTracerProvider. Many codebases alias the SDK import as sdktrace to keep the two apart. Read the error carefully. The SDK and API live in separate packages for a reason.

Context is plumbing. Run it through every long-lived call site.

Wiring it into real code

Real applications need to trace multiple operations and pass the context through layers. Here is a simplified HTTP handler that traces the incoming request and a downstream database call.

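// Imports for both functions in this section
import (
	"context"
	"net/http"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
)

func handleOrder(w http.ResponseWriter, r *http.Request) {
	// The request context carries a parent span only if upstream middleware put one there
	ctx := r.Context()

	// Start a span for the handler; it is the root when ctx has no parent span
	ctx, span := otel.Tracer("order-service").Start(ctx, "handle-order")
	defer span.End()

	// Record a custom attribute for later filtering
	span.SetAttributes(attribute.String("order.id", "12345"))

	// Pass the enriched context to the downstream function
	if err := processInventory(ctx, "widget"); err != nil {
		// RecordError only adds an event; the status must be set explicitly
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
		http.Error(w, "inventory failed", http.StatusInternalServerError)
		return
	}

	w.WriteHeader(http.StatusOK)
}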

The downstream function receives the same context. It creates its own span, which automatically becomes a child of the HTTP handler span.

func processInventory(ctx context.Context, item string) error {
	// Child span inherits the trace ID from the parent context
	ctx, span := otel.Tracer("inventory-service").Start(ctx, "check-stock")
	defer span.End()

	// Simulate database query timing
	time.Sleep(50 * time.Millisecond)

	// Attach domain-specific metadata to the span
	span.SetAttributes(attribute.String("item.name", item))
	return nil
}

Notice how the context flows downward. The HTTP handler starts the first span for this request, which is the root unless upstream middleware already placed a parent span in the request context. The inventory function creates a child span. Both spans share the same trace ID, and the child records the handler's span ID as its parent. The exporter receives two spans that the backend links together.

Attributes are key-value pairs that travel with the span. Use them for metadata you want to filter on, like database table names, user IDs, or order numbers. Events are timestamped markers inside a span. Use them for discrete moments like retry attempts or state transitions. Errors get recorded with span.RecordError, which attaches the error to the span as an exception event. It does not change the span status. To mark the span as failed, also call span.SetStatus(codes.Error, err.Error()), with codes imported from go.opentelemetry.io/otel/codes.

Convention aside: Go functions that accept a context should respect cancellation and deadlines. A span does not watch the context for you. If the parent context is cancelled, the span stays open until your code calls span.End(), and no cancelled status is recorded automatically. Check ctx.Done() or ctx.Err() in your own code, return early, and let the deferred span.End() close the span. If you want the cancellation visible in the trace, record the error and set the status yourself.
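
A span only closes when End is called, so cancellation has to be handled in your own code. Here is a minimal stdlib-only sketch of that pattern. The functions queryStock and tracedQuery are hypothetical, and the places where the OpenTelemetry calls would sit are marked in comments instead of being executed.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// queryStock is a hypothetical slow call that gives up as soon as ctx is done.
func queryStock(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case <-timer.C:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// tracedQuery returns the status a span would end with. The status string
// stands in for the span so the sketch runs without the SDK.
func tracedQuery(ctx context.Context, d time.Duration) string {
	status := "unset"
	// With OpenTelemetry: ctx, span := tracer.Start(ctx, "check-stock")
	//                     defer span.End()
	if err := queryStock(ctx, d); err != nil {
		// With OpenTelemetry: span.RecordError(err)
		//                     span.SetStatus(codes.Error, err.Error())
		status = "error: " + err.Error()
	}
	return status
}

// completedStatus runs the query to completion on a live context.
func completedStatus() string {
	return tracedQuery(context.Background(), time.Millisecond)
}

// cancelledStatus runs the query on a context that is already cancelled.
func cancelledStatus() string {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return tracedQuery(ctx, time.Hour)
}

func main() {
	fmt.Println(completedStatus()) // unset
	fmt.Println(cancelledStatus()) // error: context canceled
}
```

The cancelled call returns at once instead of waiting an hour, and the error status appears only because the code sets it.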

Don't fight the context. Let it carry the trace.

Where things break

Tracing introduces subtle failure modes. The most common mistake is dropping the context. If you create a span but pass context.Background() to a downstream call, the child span loses its parent. The trace tree breaks. You get two unrelated traces instead of one connected timeline. Always pass the context returned by tracer.Start.

Another trap is forgetting to end spans. If a function returns early without calling span.End(), the span stays open in memory. The exporter never receives it. You get incomplete traces and slowly growing memory usage. Always pair tracer.Start with defer span.End(). The defer statement guarantees execution even on panic or early return.

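A misconfigured exporter rarely causes a panic, which is exactly what makes it dangerous. If you point an OTLP exporter at a collector that is down, each export attempt waits until its timeout expires. With WithSyncer that wait blocks the goroutine that ended the span. With WithBatcher it happens on a background goroutine. When the export finally fails, the SDK reports the error to its global error handler, drops the spans, and continues. You lose visibility without realizing it. Configure timeouts and retry behavior in the exporter options, install your own handler with otel.SetErrorHandler so failures are not overlooked, and monitor your exporter and collector to catch silent drops.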

Background goroutines need care. A worker spawned from a handler that keeps using the request context will see that context cancelled as soon as the handler returns, so its work is cut short. A span started in a goroutine that never finishes is never ended and never exported. Detach long-running background tasks from the request context. Create a new background context for the worker, copy over only the span context or trace ID you need for correlation, and give the worker its own timeout. The worst goroutine bug is the one that never logs.
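
Detaching can be sketched with plain context values. The traceIDKey type and the detach helper are hypothetical. With OpenTelemetry, the closest equivalents are reading the span context with trace.SpanContextFromContext and attaching it to a fresh context with trace.ContextWithSpanContext, and Go 1.21 added context.WithoutCancel for the general case.

```go
package main

import (
	"context"
	"fmt"
)

// traceIDKey is an unexported key type for the hypothetical trace ID value.
type traceIDKey struct{}

func withTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceIDKey{}, id)
}

func traceIDFrom(ctx context.Context) string {
	id, _ := ctx.Value(traceIDKey{}).(string) // empty when no ID is present
	return id
}

// detach returns a context that keeps the trace ID for log correlation but
// drops the request's cancellation and deadline.
func detach(ctx context.Context) context.Context {
	return withTraceID(context.Background(), traceIDFrom(ctx))
}

// demo simulates a request that finishes while background work still has to run.
func demo() (traceID string, requestCancelled, workerCancelled bool) {
	base := withTraceID(context.Background(), "4bf92f3577b34da6a3ce929d0e0e4736")
	reqCtx, cancel := context.WithCancel(base)

	workerCtx := detach(reqCtx)
	cancel() // the handler has returned

	return traceIDFrom(workerCtx), reqCtx.Err() != nil, workerCtx.Err() != nil
}

func main() {
	id, reqDone, workerDone := demo()
	fmt.Printf("trace_id=%s request_cancelled=%t worker_cancelled=%t\n", id, reqDone, workerDone)
}
```

The worker keeps the trace ID for its log lines and survives the end of the request. In real code it should also get its own timeout so it cannot run forever.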

Sampling can hide bugs. If you sample at 10 percent, roughly nine out of ten failing requests never reach your tracing backend. Combine tracing with structured logging. Log every error. Trace a subset. Use the trace ID in your log lines so you can jump from a log entry to the full timeline whenever that trace was sampled.

The compiler rejects unused imports with an error like "go.opentelemetry.io/otel" imported and not used. Mixing up the API and SDK packages does not produce a missing-method error on span.End, because End is part of the API's trace.Span interface. The usual symptom is an undefined constructor such as trace.NewTracerProvider. Keep your imports clean. Trust gofmt. Argue logic, not formatting.

Picking your tracing strategy

Use the standard output exporter when you are prototyping or running in a local development environment. Use an OTLP exporter when you need to ship traces to a production backend like Jaeger or Grafana Tempo, either directly or through an OpenTelemetry Collector. Zipkin is usually reached through the Collector or a dedicated Zipkin exporter.

Use manual context propagation when you are writing custom middleware or wrapping third-party libraries that do not support OpenTelemetry natively. Use instrumentation libraries such as otelhttp when you want spans for standard library calls like net/http without touching your business logic. In Go this still means wrapping your handlers and clients once at setup time.

Use span attributes when you need to filter or group traces by business logic like user IDs or order numbers. Use span events when you need to record discrete moments inside a long-running operation like retry attempts or state transitions.

Use parent-child span nesting when a single request triggers multiple independent downstream calls. Use flat span structures when you are tracing a single synchronous function call.

Where to go next

Distributed tracing tracks requests as they move through different parts of your application, like following a package through a delivery network. It helps you see exactly where delays or errors happen in complex systems. You use it to debug performance issues and understand how your services interact.