The black box problem
You deploy a service. It handles requests. It returns responses. Then traffic spikes. The service slows down. You check the logs. The logs show errors, but they do not show the trend. You need to know how many requests arrive, how long they take, and how many fail. Logs tell you what happened. Metrics tell you how the system behaves over time.
Prometheus is the standard for metrics in the Go ecosystem. It uses a pull model. Prometheus scrapes your application. Your application exposes an HTTP endpoint. The endpoint returns a text format of all current metrics. The Go client library handles the heavy lifting. You define metrics. You register them. You expose them.
Metrics as numbers with dimensions
A metric is a number. That number changes over time. Prometheus stores these numbers in a time series database. Your job is to produce the numbers. The client library provides four types of metrics. Each type serves a specific purpose.
A counter is a cumulative value. It only goes up. You use counters for events that happen. Total requests. Total errors. Total bytes sent. If the process restarts, the counter resets to zero. Query functions such as rate and increase detect the drop and treat it as a reset, so results stay correct across restarts.
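The reset handling can be sketched in plain Go. This illustrates the idea only. It is not Prometheus's implementation, and the function name totalIncrease is hypothetical.

```go
package main

import "fmt"

// totalIncrease sums the growth of a counter across consecutive samples.
// When a sample is lower than the previous one, the counter is assumed to
// have restarted from zero, so the whole new value counts as growth.
func totalIncrease(samples []float64) float64 {
	var total float64
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1], samples[i]
		if cur < prev {
			// Counter reset: the process restarted between scrapes.
			total += cur
			continue
		}
		total += cur - prev
	}
	return total
}

func main() {
	// The drop from 130 to 5 marks a process restart.
	samples := []float64{100, 120, 130, 5, 25}
	fmt.Println(totalIncrease(samples)) // 20 + 10 + 5 + 20 = 55
}
```

A gauge gets no such treatment. A drop in a gauge is a real drop.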
A gauge is a value that goes up and down. You use gauges for current state. Memory usage. Queue length. Temperature. Active connections. Gauges represent a snapshot of the system at the moment of the scrape.
A histogram measures distributions. You use histograms for latency and request size. A histogram buckets values into ranges. You get the count of values in each bucket. You also get the sum of all values. This lets you calculate averages and estimate percentiles on the server side.
A summary is similar to a histogram. It calculates quantiles on the client side. Summaries are more expensive to maintain. They use more memory. Their quantiles cannot be aggregated across instances. Histograms are usually the better choice. Prometheus can estimate percentiles from histogram buckets using the histogram_quantile function.
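The bucket arithmetic can be sketched in plain Go. This is a simplified model, not the client library or the histogram_quantile implementation. The histogram type and its methods are hypothetical, and the sketch assumes at least one bucket and omits the +Inf bucket that a real histogram carries.

```go
package main

import (
	"fmt"
	"sort"
)

// histogram holds cumulative bucket counts plus a running sum and count.
type histogram struct {
	bounds []float64 // bucket upper bounds, sorted ascending
	counts []uint64  // counts[i] = number of observations <= bounds[i]
	sum    float64
	count  uint64
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

// observe adds v to every bucket whose upper bound is >= v.
func (h *histogram) observe(v float64) {
	for i := sort.SearchFloat64s(h.bounds, v); i < len(h.counts); i++ {
		h.counts[i]++
	}
	h.sum += v
	h.count++
}

// quantile estimates the q-quantile by linear interpolation inside the
// first bucket whose cumulative count reaches the target rank.
// Values beyond the last bound are reported as the last bound.
func (h *histogram) quantile(q float64) float64 {
	rank := q * float64(h.count)
	for i, c := range h.counts {
		if c == 0 || float64(c) < rank {
			continue
		}
		lower, below := 0.0, uint64(0)
		if i > 0 {
			lower, below = h.bounds[i-1], h.counts[i-1]
		}
		return lower + (h.bounds[i]-lower)*(rank-float64(below))/float64(c-below)
	}
	return h.bounds[len(h.bounds)-1]
}

func main() {
	h := newHistogram([]float64{1, 2, 4})
	for _, v := range []float64{0.5, 1.5, 1.75, 3} {
		h.observe(v)
	}
	fmt.Println(h.counts)                 // [1 3 4]
	fmt.Println(h.sum / float64(h.count)) // 1.6875
	fmt.Println(h.quantile(0.5))          // 1.5
}
```

The estimate is only as fine as the buckets. The true median of the four values is 1.625. The sketch reports 1.5 because it assumes values are spread evenly inside the bucket.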
Labels add dimensions to metrics. A counter without labels is a single number. A counter with labels is a map. Labels let you filter and aggregate. You can group by HTTP method. You can group by status code. You can group by endpoint. Labels are powerful. They are also expensive. Every unique combination of labels creates a new time series. High cardinality breaks Prometheus. Keep labels low cardinality. Use status codes, not user emails. Use route patterns such as /users/{id}, not raw paths or query parameters.
Metrics are cheap. Labels are expensive.
Minimal example
Here is the simplest way to expose a metric. You define a counter. You register it. You serve the /metrics endpoint.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// requestsTotal counts the number of HTTP requests received.
// Counters only increase. They reset to zero on process restart.
var requestsTotal = prometheus.NewCounter(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total number of HTTP requests received by the server.",
	},
)

func init() {
	// MustRegister adds the metric to the global default registry.
	// It panics if the metric is already registered.
	// This is safe for startup code where registration failure is fatal.
	prometheus.MustRegister(requestsTotal)
}

func main() {
	// promhttp.Handler() collects all metrics from the default registry.
	// It formats them as text for the Prometheus scraper.
	http.Handle("/metrics", promhttp.Handler())

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Inc increments the counter by one.
		// The client library is safe for concurrent use.
		// Multiple goroutines can call Inc without mutexes.
		requestsTotal.Inc()
		w.Write([]byte("Hello World"))
	})

	// ListenAndServe starts the HTTP server on port 8080.
	// The /metrics endpoint is now available for scraping.
	// It blocks until the server fails, so log the error it returns.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
The code defines a package-level variable for the counter. Go convention favors package-level variables for metrics. This makes them easy to access from any function in the package. The init function registers the metric. The main function sets up the HTTP handlers. The promhttp.Handler does the work of collecting and formatting.
Register once. Scrape often.
How registration works
The client library uses a registry pattern. The registry holds references to all metrics. When Prometheus scrapes the endpoint, the handler asks the registry for all metrics. The registry iterates over the metrics. Each metric returns its current value and metadata.
The library provides a global default registry. prometheus.MustRegister adds a metric to this global registry. Most applications use the global registry. It is simple and works well.
You can create custom registries using prometheus.NewRegistry. Custom registries are useful when you need to isolate metrics. For example, you might have a plugin system. Each plugin registers its own metrics. You use a custom registry per plugin. You then register the plugin's registry with the global registry. This prevents name collisions.
The library rejects duplicate registrations at run time. The compiler cannot see them. If you call MustRegister twice with a collector for the same metric, the program panics with duplicate metrics collector registration attempted. When registration happens in init, that is a crash at startup. This is a feature. It catches configuration errors early. You do not want silent metric drops in production.
If you prefer to handle errors gracefully, use prometheus.Register instead. It returns an error. You can log the error and decide how to proceed.
// Register returns an error if the metric is already registered.
// Use this when registration might fail and you want to handle it.
err := prometheus.Register(requestsTotal)
if err != nil {
	// Handle the error. Log it. Return it.
	// Do not panic here unless you want to.
}
Convention aside: The receiver name in Go methods is usually one or two letters. The Prometheus client follows this. You will see (c *Counter) in method signatures, not (this *Counter). Stick to the convention. It makes the code readable to other Go developers.
Realistic example: HTTP middleware
Counters are useful. Histograms are more powerful. You want to measure request duration. You want to break it down by method and status code. Middleware is the right place to do this. Middleware wraps your handlers. It runs before and after the handler. It measures time. It records metrics.
Here is a middleware implementation. It uses a histogram. It uses labels.
package main

import (
	"net/http"
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// requestDuration measures the duration of HTTP requests.
// Buckets define the ranges for the histogram.
// DefBuckets range from 5 ms to 10 s, which covers typical web request latencies.
var requestDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds",
		Help:    "Duration of HTTP requests in seconds.",
		Buckets: prometheus.DefBuckets,
	},
	[]string{"method", "status"},
)

// Middleware wraps an HTTP handler to record metrics.
// It measures duration and records it in the histogram.
func Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()

		// Wrap the ResponseWriter to capture the status code.
		// The default ResponseWriter does not expose the status code.
		// You need a wrapper to intercept WriteHeader.
		wrapper := &statusRecorder{ResponseWriter: w, status: http.StatusOK}

		// Call the next handler in the chain.
		next.ServeHTTP(wrapper, r)

		// Calculate duration.
		duration := time.Since(start).Seconds()

		// Record the duration with labels.
		// WithLabelValues returns the child histogram for this label combination.
		// Observe records the value in the histogram.
		requestDuration.WithLabelValues(r.Method, statusLabel(wrapper.status)).Observe(duration)
	})
}

// statusRecorder wraps http.ResponseWriter to capture the status code.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

// WriteHeader captures the status code before writing it.
func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// statusLabel converts an HTTP status code to its numeric string, such as "404".
// Prometheus labels must be strings.
func statusLabel(code int) string {
	return strconv.Itoa(code)
}
The middleware creates a statusRecorder. This wrapper captures the status code. The default http.ResponseWriter does not expose the status code after writing. You need the wrapper to get the code for the label.
The middleware calculates duration. It calls WithLabelValues to set the labels. It calls Observe to record the value. The histogram buckets the value. Prometheus aggregates the buckets over time.
You need to register the histogram. Add this to your init function.
func init() {
	prometheus.MustRegister(requestDuration)
}
Then wrap your router with the middleware.
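func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("Hello World"))
	})

	// Register /metrics on the same mux that the server uses.
	// http.Handle would add it to the default mux, which this server never serves.
	// promhttp comes from github.com/prometheus/client_golang/prometheus/promhttp.
	mux.Handle("/metrics", promhttp.Handler())

	// Wrap the mux with the metrics middleware.
	// All requests handled by the mux will be recorded.
	handler := Middleware(mux)
	http.ListenAndServe(":8080", handler)
}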
Context is plumbing. Run it through every long-lived call site. The middleware example does not use context, but a real service should. If the request is cancelled, you should stop processing. The http.Handler interface does not enforce context usage, but Go convention dictates that request-scoped work respects r.Context().
Pitfalls and errors
Metrics are simple. The traps are subtle.
High cardinality is the biggest trap. Labels create time series. Every unique label combination is a new series. If you use a user ID as a label, you create a series for every user. If you use an email address, you create a series for every email. The number of series grows linearly with users. Prometheus memory usage grows. Scrape time grows. The database grows. Prometheus is not a database for high-cardinality data. It is a time series database for aggregates. Keep labels low cardinality. Use status codes. Use route patterns, not raw paths. Use region codes. Do not use user identifiers.
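The growth is multiplicative across labels. A back-of-the-envelope sketch, with made-up label counts and a hypothetical seriesCount helper:

```go
package main

import "fmt"

// seriesCount returns the worst-case number of time series for one metric:
// the product of the number of distinct values of each label.
func seriesCount(valuesPerLabel map[string]int) int {
	n := 1
	for _, v := range valuesPerLabel {
		n *= v
	}
	return n
}

func main() {
	// Illustrative numbers, not measurements.
	bounded := map[string]int{"method": 5, "status": 10, "route": 20}
	fmt.Println(seriesCount(bounded)) // 1000

	// Adding one unbounded label multiplies everything.
	withUsers := map[string]int{"method": 5, "status": 10, "route": 20, "user_id": 100000}
	fmt.Println(seriesCount(withUsers)) // 100000000
}
```

For a histogram, multiply again: every label combination exposes one series per bucket, plus a sum and a count.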
Unregistered metrics are silent failures. If you create a metric but forget to register it, Prometheus does not see it. The metric exists in memory. It updates. But the scraper never collects it. You think you are monitoring something. You are not. The compiler does not catch this. Neither does MustRegister, which only helps if you remember to call it. Register each metric next to its definition, or use the promauto package, which creates and registers a metric in one call. Then check the /metrics output for the new name.
Duplicate registrations cause panics. If you register the same metric twice, MustRegister panics. This happens when you import a package that registers metrics, and you also register them manually. Or when a constructor that registers metrics runs more than once, which is common in tests. Check your imports. Check your init functions.
The compiler complains with undefined: prometheus if you forget to import the package. The compiler complains with imported and not used if you import it but do not use it. Go is strict about imports. Keep them clean.
Runtime panics are rare but possible. duplicate metrics collector registration attempted is the most common. It happens at startup. It stops the process. This is good. It prevents silent metric loss.
Goroutine leaks are not a risk with the client library. The library does not spawn goroutines for metrics. It uses atomic operations. You do not need to worry about cleanup. When the process exits, the metrics are gone.
Trust the library. Do not add mutexes. Do not add channels. The client is thread-safe.
Naming conventions
Prometheus has naming conventions. Follow them. It makes your metrics consistent with other Go services.
Use snake_case for metric names. http_requests_total, not httpRequestsTotal.
Use suffixes for counters. _total is the standard. http_requests_total. The suffix tells readers it is a counter, so they know to wrap it in rate. Prometheus does not infer the type from the name.
Include units in the name, and prefer base units. _seconds, _bytes. http_request_duration_seconds. This tells you what the number means.
Help text should be clear. It appears in the Prometheus UI. It helps operators understand the metric. "Total number of HTTP requests" is good. "Count" is bad.
Labels should be lowercase. Use underscores for multi-word labels. method, status_code, endpoint.
Naming conventions are not enforced by the compiler. They are enforced by the community. Follow them. It makes your life easier.
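You can still check the conventions yourself. The sketch below is a hypothetical checker, not part of the client library. It covers only snake_case and the _total suffix.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// snakeCase matches lowercase words separated by single underscores.
var snakeCase = regexp.MustCompile(`^[a-z][a-z0-9]*(_[a-z0-9]+)*$`)

// checkName returns the convention violations found in a metric name.
// isCounter says whether the metric is a counter.
func checkName(name string, isCounter bool) []string {
	var problems []string
	if !snakeCase.MatchString(name) {
		problems = append(problems, "name is not snake_case")
	}
	if isCounter && !strings.HasSuffix(name, "_total") {
		problems = append(problems, "counter name should end in _total")
	}
	if !isCounter && strings.HasSuffix(name, "_total") {
		problems = append(problems, "only counters should end in _total")
	}
	return problems
}

func main() {
	fmt.Println(len(checkName("http_requests_total", true))) // 0
	fmt.Println(len(checkName("httpRequestsTotal", true)))   // 2
}
```

A check like this fits in a unit test that runs over every metric name your service defines.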
Decision matrix
You have choices. Pick the right tool for the job.
Use a Counter when you need to track cumulative events. Use a Counter for total requests, total errors, total bytes sent. Counters only go up. They are cheap. They are reliable.
Use a Gauge when you need to track current state. Use a Gauge for memory usage, queue length, active connections. Gauges go up and down. They represent a snapshot.
Use a Histogram when you need to measure distributions. Use a Histogram for request duration, request size, response size. Histograms bucket values. They let you calculate percentiles on the server.
Use a Summary when you need client-side percentiles. Use a Summary only when you cannot use a Histogram. Summaries are expensive. They use more memory. Histograms are usually better.
Use Prometheus when you need pull-based metrics. Use Prometheus when you want to aggregate metrics across many services. Use Prometheus when you want a standard tool with a large ecosystem.
Use OpenTelemetry when you need distributed tracing and metrics in one system. Use OpenTelemetry when you want vendor neutrality. Use OpenTelemetry when you need to export to multiple backends.
Use the expvar package when you need simple built-in metrics. Use expvar for quick debugging. Use expvar for small services. It is part of the standard library. It requires no dependencies. It is not as powerful as Prometheus.
Use plain logging when you need to debug specific events. Use logging for errors. Use logging for unusual events. Do not use logging for metrics. Logs are expensive. Metrics are cheap.
Pick the right metric type. Name it clearly. Register it once. Expose it safely.