The invisible thread through your service
A user clicks a button on your frontend. The request hits your Go service, which queries a PostgreSQL database, calls a payment API, and returns a JSON response. The response takes two seconds. The user complains. You check the logs and see timestamps, but nothing connects the database query to the payment call. You have no idea where the time went.
Telemetry closes that gap. It stitches together every hop a request makes, attaches timing data to each step, and aggregates the results into numbers you can graph. OpenTelemetry gives you a vendor-neutral way to collect both traces and metrics without rewriting your instrumentation when you switch observability backends.
What telemetry actually does
OpenTelemetry defines several signals, and this article covers two of them. Traces follow a single request through your system. Each step in the request is a span. Spans link together into a trace, which gives you a timeline of exactly where time was spent. Metrics are numeric measurements aggregated over time: counters, gauges, and histograms. They answer questions like how many requests arrived per second, where the p99 latency sits, or how many database connections are open.
Think of traces as a flight itinerary. You see the departure airport, the arrival airport, the layover duration, and the total travel time. Think of metrics as the cockpit dashboard. You see fuel burn rate, engine temperature, and altitude. You need both to understand what is happening.
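To make the itinerary picture concrete, here is a toy model of a trace that uses only the standard library. The Span struct, its fields, and the timings are hypothetical names and numbers chosen for illustration. They are not the OpenTelemetry SDK types, and real spans also carry IDs, attributes, and status.

```go
package main

import (
	"fmt"
	"time"
)

// Span is a hypothetical, simplified stand-in for a trace span.
type Span struct {
	Name     string
	Duration time.Duration
	Children []*Span
}

// SelfTime returns the time spent in the span itself, excluding its children.
// It assumes children run one after another, which real traces do not guarantee.
func (s *Span) SelfTime() time.Duration {
	self := s.Duration
	for _, c := range s.Children {
		self -= c.Duration
	}
	return self
}

// Slowest walks the tree and returns the span with the largest self time.
func Slowest(root *Span) *Span {
	best := root
	for _, c := range root.Children {
		if cand := Slowest(c); cand.SelfTime() > best.SelfTime() {
			best = cand
		}
	}
	return best
}

// exampleTrace rebuilds the two-second request from the opening scenario
// with made-up timings.
func exampleTrace() *Span {
	return &Span{
		Name:     "POST /checkout",
		Duration: 2000 * time.Millisecond,
		Children: []*Span{
			{Name: "db.query", Duration: 150 * time.Millisecond},
			{Name: "payment.api", Duration: 1700 * time.Millisecond},
		},
	}
}

func main() {
	s := Slowest(exampleTrace())
	fmt.Printf("%s took %v\n", s.Name, s.SelfTime())
}
```

With these made-up numbers the payment call owns most of the two seconds. That is the question logs with bare timestamps could not answer.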
OpenTelemetry defines the data model, the API, and the SDK. The SDK runs in your process, collects spans and metric points, and hands them to an exporter. The exporter speaks a protocol like OTLP over HTTP or gRPC. Your backend receives the data and stores it. The separation means you can swap backends by changing exporter configuration instead of touching your instrumentation code.
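The separation is ordinary interface-based design. The sketch below shows the idea with a hypothetical Exporter interface and two made-up backends. It is not the SDK's real exporter contract, which deals in typed span data and contexts.

```go
package main

import (
	"fmt"
	"strings"
)

// Exporter is a hypothetical interface standing in for the SDK's exporter contract.
type Exporter interface {
	Export(spans []string) error
}

// stdoutExporter writes spans to the terminal, like a debugging backend.
type stdoutExporter struct{}

func (stdoutExporter) Export(spans []string) error {
	fmt.Println("stdout backend:", strings.Join(spans, ", "))
	return nil
}

// memoryExporter keeps spans in a slice, like a backend used in tests.
type memoryExporter struct{ spans []string }

func (m *memoryExporter) Export(spans []string) error {
	m.spans = append(m.spans, spans...)
	return nil
}

// handleRequest is the application code. It depends only on the interface,
// so swapping the backend does not change it.
func handleRequest(e Exporter) error {
	return e.Export([]string{"GET /orders", "db.query"})
}

func main() {
	mem := &memoryExporter{}
	for _, e := range []Exporter{stdoutExporter{}, mem} {
		if err := handleRequest(e); err != nil {
			fmt.Println("export failed:", err)
		}
	}
	fmt.Println("memory backend holds", len(mem.spans), "spans")
}
```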
Traces are cheap to start but expensive to store at scale. Metrics are cheap to aggregate but lose individual request context. Use traces to debug latency. Use metrics to monitor health.
The minimal setup
Here is the smallest program that initializes both a tracer and a meter, then wraps an HTTP handler so every request automatically generates telemetry data.
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os/signal"
	"syscall"
	"time"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/resource"
	"go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
)

func main() {
	// Cancel ctx on SIGINT or SIGTERM so the server can drain and flush
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	// Attach process metadata so every span and metric carries service identity
	res := resource.NewWithAttributes(
		semconv.SchemaURL,
		semconv.ServiceName("demo-service"),
	)

	// Create an HTTP exporter that ships trace data to localhost:4318.
	// WithInsecure selects plain HTTP, which a local collector without TLS expects.
	traceExporter, err := otlptracehttp.New(ctx, otlptracehttp.WithInsecure())
	if err != nil {
		log.Fatalf("failed to create trace exporter: %v", err)
	}

	// Batch spans to reduce network overhead and avoid per-span HTTP calls
	tracerProvider := trace.NewTracerProvider(
		trace.WithBatcher(traceExporter),
		trace.WithResource(res),
	)
	// Register the provider globally so otel.Tracer() finds it automatically
	otel.SetTracerProvider(tracerProvider)
	// Register a propagator so otelhttp can read incoming traceparent headers
	otel.SetTextMapPropagator(propagation.TraceContext{})

	// Create an HTTP exporter for metric data points
	metricExporter, err := otlpmetrichttp.New(ctx, otlpmetrichttp.WithInsecure())
	if err != nil {
		log.Fatalf("failed to create metric exporter: %v", err)
	}

	// Push metrics every 10 seconds instead of the 60-second default
	meterProvider := metric.NewMeterProvider(
		metric.WithReader(metric.NewPeriodicReader(metricExporter,
			metric.WithInterval(10*time.Second))),
		metric.WithResource(res),
	)
	// Register the meter provider globally for otel.Meter() calls
	otel.SetMeterProvider(meterProvider)

	// Wrap the handler so OpenTelemetry injects context and creates spans
	handler := otelhttp.NewHandler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	}), "demo.handler")

	srv := &http.Server{Addr: ":8080", Handler: handler}
	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatalf("server failed: %v", err)
		}
	}()

	// Block until a termination signal arrives
	<-ctx.Done()

	// Use a fresh context for shutdown because ctx is already cancelled
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Stop accepting requests first so in-flight spans can end
	if err := srv.Shutdown(shutdownCtx); err != nil {
		log.Printf("server shutdown: %v", err)
	}
	// Flush both providers' buffers before the process exits
	if err := tracerProvider.Shutdown(shutdownCtx); err != nil {
		log.Printf("tracer provider shutdown: %v", err)
	}
	if err := meterProvider.Shutdown(shutdownCtx); err != nil {
		log.Printf("meter provider shutdown: %v", err)
	}
}
Run this with an OTLP collector listening on port 4318. Every request to localhost:8080 produces a trace and emits HTTP server metrics. The code follows the Go convention of checking errors immediately with if err != nil { log.Fatalf(...) }. The community accepts the boilerplate because it makes failure paths impossible to miss.
Providers must shut down explicitly. If you skip the shutdown call, buffered spans and metric points drop on the floor. Trust the shutdown sequence. Flush before exit.
How the pieces connect at runtime
When the program starts, the tracer provider and meter provider allocate internal buffers and background goroutines. The batcher for traces collects spans and ships them in chunks. The periodic reader for metrics aggregates data points and pushes them on a fixed interval. Both exporters send their payloads to the collector over HTTP.
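The batching behavior can be sketched without the SDK. The Batcher type below is a hypothetical, simplified model: it exports when the buffer reaches a size limit, when a timer fires, and one last time when its context is cancelled. The real batch span processor has more options and a bounded queue, so treat this as an illustration of the idea only.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Batcher is a hypothetical model of a batching span processor.
type Batcher struct {
	mu      sync.Mutex
	buf     []string
	maxSize int
	export  func(batch []string)
}

// Add queues a span and exports the buffer once it reaches maxSize.
func (b *Batcher) Add(span string) {
	b.mu.Lock()
	b.buf = append(b.buf, span)
	var batch []string
	if len(b.buf) >= b.maxSize {
		batch, b.buf = b.buf, nil
	}
	b.mu.Unlock()
	if batch != nil {
		b.export(batch)
	}
}

// Flush exports whatever is buffered, even if the batch is not full.
func (b *Batcher) Flush() {
	b.mu.Lock()
	batch := b.buf
	b.buf = nil
	b.mu.Unlock()
	if len(batch) > 0 {
		b.export(batch)
	}
}

// Run flushes on every tick and once more when ctx is cancelled.
// The final flush is the step a skipped shutdown would lose.
func (b *Batcher) Run(ctx context.Context, interval time.Duration) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			b.Flush()
		case <-ctx.Done():
			b.Flush()
			return
		}
	}
}

func main() {
	b := &Batcher{maxSize: 2, export: func(batch []string) {
		fmt.Println("exported", len(batch), "spans")
	}}
	ctx, cancel := context.WithCancel(context.Background())
	done := make(chan struct{})
	go func() { b.Run(ctx, time.Hour); close(done) }()

	b.Add("a")
	b.Add("b") // the buffer is full, so this triggers an export
	b.Add("c") // stays buffered until the final flush
	cancel()
	<-done
}
```

Remove the flush in the ctx.Done branch and span "c" never leaves the process. That is the data loss a missing Shutdown call causes.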
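When a request arrives, the handler returned by otelhttp.NewHandler intercepts it. It asks the globally registered propagator to extract an incoming trace context from the request headers. The default global propagator is a no-op, so register one, for example with otel.SetTextMapPropagator(propagation.TraceContext{}), if you want upstream traces to continue through your service. If no trace context is found, it starts a new trace. It creates a server span named after the operation string you passed, attaches it to the request context, and passes the enriched context to your handler function.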
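The header in question is usually the W3C traceparent header. The sketch below parses version 00 of that format by hand to show what the extraction step looks for. ParseTraceParent and the TraceParent struct are hypothetical helpers written for this article. In real code the propagator does this for you.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

var errInvalidTraceParent = errors.New("invalid traceparent")

// TraceParent holds the fields of a version 00 traceparent header.
type TraceParent struct {
	TraceID string
	SpanID  string
	Sampled bool
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// ParseTraceParent parses "00-<trace id>-<parent span id>-<flags>".
// It handles version 00 only and rejects everything else.
func ParseTraceParent(v string) (TraceParent, error) {
	parts := strings.Split(v, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceParent{}, errInvalidTraceParent
	}
	traceID, spanID, flags := parts[1], parts[2], parts[3]
	if !isLowerHex(traceID, 32) || !isLowerHex(spanID, 16) || !isLowerHex(flags, 2) {
		return TraceParent{}, errInvalidTraceParent
	}
	// All-zero IDs are defined as invalid.
	if traceID == strings.Repeat("0", 32) || spanID == strings.Repeat("0", 16) {
		return TraceParent{}, errInvalidTraceParent
	}
	// The lowest bit of the flags byte is the sampled flag,
	// so the last hex digit is odd when the trace is sampled.
	sampled := strings.ContainsRune("13579bdf", rune(flags[1]))
	return TraceParent{TraceID: traceID, SpanID: spanID, Sampled: sampled}, nil
}

func main() {
	h := http.Header{}
	h.Set("traceparent", "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")

	tp, err := ParseTraceParent(h.Get("traceparent"))
	if err != nil {
		fmt.Println("no usable parent, starting a new trace:", err)
		return
	}
	fmt.Println("continuing trace", tp.TraceID, "under span", tp.SpanID, "sampled:", tp.Sampled)
}
```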
Inside your handler, you call otel.Tracer("mycomponent").Start(ctx, "operation"). The SDK creates a child span linked to the parent. When the span ends, it records elapsed time, status, and any attributes you attached. The batcher queues the span. When the queue fills or the flush interval fires, the exporter serializes the spans to protobuf and sends them over HTTP.
Metrics work differently. You create a counter once with meter.Int64Counter("requests_total"), check the error it returns, and then call counter.Add(ctx, 1) on each request. The SDK records the increment in an in-memory aggregator. The periodic reader snapshots the aggregator, converts the data to OTLP metric format, and pushes it. There is no per-call network overhead.
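Here is a rough model of that in-memory aggregation, using only the standard library. CounterAggregator is a hypothetical type. The SDK's aggregators are more elaborate and keyed by full attribute sets, but the shape is the same: cheap local updates and an occasional snapshot.

```go
package main

import (
	"fmt"
	"sync"
)

// CounterAggregator is a hypothetical in-memory aggregator keyed by one attribute value.
type CounterAggregator struct {
	mu     sync.Mutex
	totals map[string]int64
}

func NewCounterAggregator() *CounterAggregator {
	return &CounterAggregator{totals: make(map[string]int64)}
}

// Add records an increment locally. No network call happens here.
func (a *CounterAggregator) Add(key string, n int64) {
	a.mu.Lock()
	a.totals[key] += n
	a.mu.Unlock()
}

// Snapshot returns a copy of the cumulative totals, which is what a
// periodic reader would hand to the exporter on each interval.
func (a *CounterAggregator) Snapshot() map[string]int64 {
	a.mu.Lock()
	defer a.mu.Unlock()
	out := make(map[string]int64, len(a.totals))
	for k, v := range a.totals {
		out[k] = v
	}
	return out
}

func main() {
	agg := NewCounterAggregator()

	// Simulate 100 concurrent requests, each incrementing the counter once.
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			agg.Add("status=ok", 1)
		}()
	}
	wg.Wait()

	fmt.Println("requests_total:", agg.Snapshot()["status=ok"])
}
```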
Context is plumbing. Run it through every long-lived call site. If a function does not accept context.Context as its first parameter, it has no way to receive the parent span, so its work cannot be linked into the trace. Name the parameter ctx by convention. Keep the signature consistent.
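A small standard-library sketch shows why the parameter matters. The traceKey type and the helper functions are hypothetical. The OpenTelemetry API stores the active span in the context in a similar spirit, but through its own functions.

```go
package main

import (
	"context"
	"fmt"
)

// traceKey is a private key type, so no other package can collide with it.
type traceKey struct{}

// withTraceID returns a context that carries a trace ID.
func withTraceID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, traceKey{}, id)
}

// traceIDFrom reads the trace ID, or returns "" when none is present.
func traceIDFrom(ctx context.Context) string {
	id, _ := ctx.Value(traceKey{}).(string)
	return id
}

// loadOrder accepts ctx first, so it can see the caller's trace.
func loadOrder(ctx context.Context, orderID int) string {
	return fmt.Sprintf("trace=%s order=%d", traceIDFrom(ctx), orderID)
}

// loadOrderNoContext has nowhere to receive the trace, so it starts blind.
func loadOrderNoContext(orderID int) string {
	return fmt.Sprintf("trace=%s order=%d", traceIDFrom(context.Background()), orderID)
}

// demo runs both variants under the same request context.
func demo() (withCtx, withoutCtx string) {
	ctx := withTraceID(context.Background(), "abc123")
	return loadOrder(ctx, 42), loadOrderNoContext(42)
}

func main() {
	withCtx, withoutCtx := demo()
	fmt.Println(withCtx)
	fmt.Println(withoutCtx)
}
```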
Adding real workloads
Real services call databases, message queues, and external APIs. Each call needs a span. Each span needs the parent context. Here is how you propagate context to a database query and record a custom metric.
package main

import (
	"database/sql"
	"log"
	"net/http"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
	"go.opentelemetry.io/otel/metric"
)

// HandleRequest demonstrates context propagation and custom metric recording.
func HandleRequest(db *sql.DB) http.HandlerFunc {
	meter := otel.Meter("demo.handler")

	// Instrument constructors return an error, so create them once and check it
	counter, err := meter.Int64Counter("db.query.count")
	if err != nil {
		log.Fatalf("failed to create counter: %v", err)
	}
	histogram, err := meter.Float64Histogram("db.query.duration", metric.WithUnit("s"))
	if err != nil {
		log.Fatalf("failed to create histogram: %v", err)
	}

	return func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()

		// Create a child span for the database operation
		ctx, span := otel.Tracer("demo.db").Start(ctx, "db.query")
		defer span.End()

		// Record the query start time for duration calculation
		start := time.Now()

		// Execute the query with the enriched context
		row := db.QueryRowContext(ctx, "SELECT status FROM orders WHERE id = $1", 42)
		var status string
		if err := row.Scan(&status); err != nil {
			span.RecordError(err)
			span.SetStatus(codes.Error, "query failed")
			http.Error(w, "internal error", http.StatusInternalServerError)
			return
		}

		// Calculate elapsed time and attach it as a span attribute
		elapsed := time.Since(start).Seconds()
		span.SetAttributes(attribute.Float64("db.duration", elapsed))

		// Attributes on measurements are passed as an option, not as bare key-values
		attrs := metric.WithAttributes(attribute.String("status", status))

		// Increment the counter and record the histogram point
		counter.Add(ctx, 1, attrs)
		histogram.Record(ctx, elapsed, attrs)

		w.Write([]byte("order: " + status))
	}
}
The handler extracts the request context, starts a child span, and passes that context to QueryRowContext. The database driver sees the context and can attach trace headers if it supports OpenTelemetry natively. If it does not, the span still captures timing and errors. The metric calls record data without blocking the request. The histogram aggregates latency distributions so you can see p50, p90, and p99 values later.
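Percentiles come out of a histogram because it stores bucket counts, not individual values. The sketch below models that with hypothetical bucket bounds and made-up latencies. It reports the upper bound of the bucket that contains the requested percentile, which is one simple estimate. Backends typically interpolate within the bucket instead.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Histogram counts observations into buckets with fixed upper bounds.
type Histogram struct {
	bounds []float64 // ascending upper bounds, in seconds
	counts []uint64  // one count per bound, plus a final overflow bucket
	total  uint64
}

func NewHistogram(bounds ...float64) *Histogram {
	return &Histogram{bounds: bounds, counts: make([]uint64, len(bounds)+1)}
}

// Record places v in the first bucket whose upper bound is >= v.
func (h *Histogram) Record(v float64) {
	i := sort.SearchFloat64s(h.bounds, v)
	h.counts[i]++
	h.total++
}

// Percentile returns the upper bound of the bucket that holds the p-th
// percentile (p between 1 and 100). It returns +Inf for the overflow bucket.
func (h *Histogram) Percentile(p uint64) float64 {
	if h.total == 0 {
		return 0
	}
	// Rank of the target observation, rounded up, using integer math.
	rank := (p*h.total + 99) / 100
	if rank == 0 {
		rank = 1
	}
	var cumulative uint64
	for i, c := range h.counts {
		cumulative += c
		if cumulative >= rank {
			if i == len(h.bounds) {
				return math.Inf(1)
			}
			return h.bounds[i]
		}
	}
	return math.Inf(1)
}

// exampleHistogram records 100 made-up query latencies.
func exampleHistogram() *Histogram {
	h := NewHistogram(0.005, 0.01, 0.05, 0.1, 0.5, 1)
	for i := 0; i < 90; i++ {
		h.Record(0.004) // fast queries
	}
	for i := 0; i < 9; i++ {
		h.Record(0.04) // slower queries
	}
	h.Record(0.8) // one outlier
	return h
}

func main() {
	h := exampleHistogram()
	fmt.Println("p50 <=", h.Percentile(50))
	fmt.Println("p90 <=", h.Percentile(90))
	fmt.Println("p99 <=", h.Percentile(99))
}
```

The single 0.8 second outlier does not move p99 here, but it does decide p100. The estimate is only as fine as the bucket bounds you choose.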
Exported names start with a capital letter. Unexported names start lowercase. HandleRequest is capitalized so that, once it lives in a library package, other packages can wrap it. The internal helper variables are unexported. Follow the capitalization rule and the compiler enforces visibility for you.
Where things break
Telemetry adds background goroutines and network calls. Misconfiguration turns invisible overhead into visible latency. The most common failure is forgetting to shut down providers. The compiler will not catch it. The runtime will silently drop buffered data when the process exits. Always call Shutdown(ctx) on both providers during graceful termination.
Exporters can fail. Network partitions happen. If you ignore the error from otlptracehttp.New(ctx), the program compiles but ships no data. The compiler rejects unused imports with imported and not used, but it will not warn you about a failed exporter constructor. Check the error. Fail fast.
Context cancellation can leak spans. If a request times out and the context cancels, goroutines that watch that context return early. If one of them returns without calling End on a span it started, the span never ends. The SDK only hands ended spans to the batcher, so an unfinished span never reaches your backend and the trace shows a hole where the slow operation should be. Always attach a timeout to long-running operations and end spans in defer blocks.
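The pattern can be sketched with a stand-in span type. The span struct and the functions below are hypothetical and use only the standard library. The point is the ordering: defer End first, then set the timeout, then do the work.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// span is a hypothetical stand-in that only remembers whether End was called.
type span struct {
	name  string
	ended bool
}

func (s *span) End() { s.ended = true }

// slowCall simulates an operation that takes d but gives up when ctx is cancelled.
func slowCall(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// traced runs slowCall under a timeout. The deferred End runs on success,
// on timeout, and on panic, so the span cannot be left open.
func traced(parent context.Context, s *span, work, timeout time.Duration) error {
	defer s.End()
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()
	return slowCall(ctx, work)
}

// runTraced is a convenience wrapper that takes milliseconds.
func runTraced(workMs, timeoutMs int) (ended bool, err error) {
	s := &span{name: "db.query"}
	err = traced(context.Background(), s,
		time.Duration(workMs)*time.Millisecond,
		time.Duration(timeoutMs)*time.Millisecond)
	return s.ended, err
}

func main() {
	ended, err := runTraced(200, 10)
	fmt.Println("timed out:", errors.Is(err, context.DeadlineExceeded), "span ended:", ended)
}
```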
Unbuffered channels in custom exporters block the calling goroutine. The SDK expects exporters to return quickly. If your exporter writes to a blocking channel, request handlers stall. Buffer the channel or use a worker pool to drain it. The worst goroutine bug is the one that never logs. Add a logger to your exporter fallback path.
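One way to keep a custom exporter from stalling handlers is a buffered queue with a non-blocking send. The AsyncExporter type below is a hypothetical sketch, not an SDK interface. It drops spans when the queue is full and counts the drops so the fallback path stays visible.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// AsyncExporter is a hypothetical exporter front end with a bounded queue.
type AsyncExporter struct {
	queue   chan string
	dropped atomic.Int64
	wg      sync.WaitGroup
}

// NewAsyncExporter starts workers that drain the queue and call send.
func NewAsyncExporter(size, workers int, send func(span string)) *AsyncExporter {
	e := &AsyncExporter{queue: make(chan string, size)}
	for i := 0; i < workers; i++ {
		e.wg.Add(1)
		go func() {
			defer e.wg.Done()
			for s := range e.queue {
				send(s)
			}
		}()
	}
	return e
}

// Export never blocks. When the buffer is full it drops the span,
// counts the drop, and reports false to the caller.
func (e *AsyncExporter) Export(span string) bool {
	select {
	case e.queue <- span:
		return true
	default:
		e.dropped.Add(1)
		return false
	}
}

// Dropped reports how many spans were discarded so far.
func (e *AsyncExporter) Dropped() int64 { return e.dropped.Load() }

// Close stops accepting spans and waits for the workers to drain the queue.
// Callers must not call Export after Close.
func (e *AsyncExporter) Close() {
	close(e.queue)
	e.wg.Wait()
}

func main() {
	sent := 0
	e := NewAsyncExporter(8, 1, func(string) { sent++ })
	for i := 0; i < 5; i++ {
		if !e.Export(fmt.Sprintf("span-%d", i)) {
			fmt.Println("queue full, span dropped")
		}
	}
	e.Close()
	fmt.Println("sent:", sent, "dropped:", e.Dropped())
}
```

Dropping is a design choice. It trades completeness for request latency, which is usually the right trade for telemetry, as long as the drop counter is logged or exported somewhere you will see it.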
Go conventions keep telemetry code readable. gofmt decides indentation and spacing. Run it on save. Receiver names are one or two letters matching the type. Use (h *Handler) ServeHTTP(...) instead of (this *Handler). The underscore discards values intentionally. Use result, _ := ... only when you have verified the second return value is safe to drop. Never drop errors silently.
When to reach for OpenTelemetry
Use OpenTelemetry when your service spans multiple processes and you need to follow a single request across boundaries. Use structured logging with slog when you only need key-value records and do not care about request correlation. Use a simple counter or gauge library when you only track aggregate throughput and latency without distributed context. Use a full commercial APM agent when you want zero configuration and are willing to accept vendor lock-in. Use plain sequential code when you do not need observability: the simplest thing that works is usually the right thing.