The blind spot
You deploy a Go service. It runs for a week. Then the pager fires. CPU is at 100%. You SSH in, run top, and see the process eating everything. You check the logs. Nothing unusual. You restart. It gets better for an hour, then the CPU pegs again. You have no idea what changed. The code is doing something, but you are flying blind.
Metrics fix this. They turn "something is wrong" into "the database latency spiked at 3 AM when the batch job ran." Prometheus is the standard tool for collecting metrics in Go. It gives you numbers you can graph, alert on, and use to understand your system.
The meter reader model
Your app does not push data to Prometheus. It does not open a connection to Prometheus at all. Instead, Prometheus comes to your app and asks for a snapshot.
Think of it like a meter reader. You do not mail your electricity usage to the utility company every second. You install a smart meter. The utility company sends a van to your house once every fifteen minutes. The van reads the meter and drives away.
Your Go app is the house. The metrics are the meter. Prometheus is the van. You expose an HTTP endpoint. Prometheus hits that endpoint. Your app returns a plain-text response with numbers. Prometheus stores the numbers.
This pull model keeps your app simple. You do not need to manage connections to a central server. You do not need to handle retries if the metrics server is down. You just expose the endpoint. If Prometheus cannot reach you, it marks the target as down. That is also useful information.
Minimal counter
Here is the skeleton: define a counter, register it, mount the handler, increment on requests.
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// requestCount tracks total HTTP requests.
var requestCount = prometheus.NewCounter(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total number of HTTP requests received.",
	},
)

func init() {
	// Register the metric so it shows up in the /metrics output.
	prometheus.MustRegister(requestCount)
}

func main() {
	// Mount the metrics handler at the standard path.
	http.Handle("/metrics", promhttp.Handler())
	// Handle the root path and increment the counter.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		requestCount.Inc()
		fmt.Fprint(w, "Hello")
	})
	// Start the server and exit loudly if it cannot listen.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
The counter is a global variable. This is standard practice for metrics. The metric lives for the lifetime of the process. The init() function registers the metric in the global registry. This ensures the metric is ready before main() runs.
The promhttp.Handler() returns an http.Handler that serves the metrics. You mount it at /metrics. This is the convention. Prometheus expects the endpoint at /metrics by default.
The counter increments in the handler. The counter is thread-safe. Multiple goroutines can call Inc() without a mutex. The client library uses atomic operations under the hood.
Register once. Increment everywhere. The registry is global.
What happens at runtime
When the binary starts, init() runs before main(). The metric gets registered in the global registry. If you try to register the same name twice, MustRegister panics. The panic message is duplicate metrics collector registration attempted. This protects you from accidental double-counting.
The server listens. A client hits /. The handler increments the counter. The counter value increases by one. A Prometheus server hits /metrics. The promhttp handler walks the registry, formats the data as text, and returns it.
With the default registry, the response also carries Go runtime and process metrics. The part for this counter looks like this:
# HELP http_requests_total Total number of HTTP requests received.
# TYPE http_requests_total counter
http_requests_total 42
The # HELP line describes the metric. The # TYPE line tells Prometheus the metric kind. The last line is the value. Prometheus parses this text, stores the value with a timestamp, and repeats the scrape at its configured interval, commonly every fifteen seconds.
Dimensions and latency
Real services need dimensions. You want to know how many requests hit /api/users versus /api/orders. You want to know how long requests take. Counters only go up. They cannot track latency. Histograms track distributions.
Here is how to define a histogram with labels.
// requestDuration tracks latency by method and status.
var requestDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds",
		Help:    "Histogram of HTTP request latencies.",
		Buckets: prometheus.DefBuckets,
	},
	[]string{"method", "status"},
)

func init() {
	// Register once at startup.
	prometheus.MustRegister(requestDuration)
}
The HistogramVec creates a histogram with labels. Labels are key-value pairs that add dimensions. method and status are the labels. DefBuckets provides default bucket boundaries for latency in seconds. The boundaries are upper bounds such as 0.005, 0.01, 0.025, 0.05, 0.1, and so on up to 10. Each bucket counts the observations less than or equal to its bound, so the exposed counts are cumulative. This is memory efficient. You do not store every value. You store counts per bucket.
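To see what bucket counting means in practice, here is a standard-library sketch. It does not use the client library and is not how the library stores data internally; it only mirrors the cumulative counts you would see in a scrape. bucketCounts and the sample latencies are made up for illustration.

```go
package main

import "fmt"

// bucketCounts returns, for each upper bound, how many observations are
// less than or equal to it. The counts are cumulative, which mirrors how
// histogram buckets appear in a scrape.
func bucketCounts(upperBounds, observations []float64) []uint64 {
	counts := make([]uint64, len(upperBounds))
	for _, v := range observations {
		for i, le := range upperBounds {
			if v <= le {
				counts[i]++
			}
		}
	}
	return counts
}

func main() {
	// The first five boundaries of the default latency buckets, in seconds.
	bounds := []float64{0.005, 0.01, 0.025, 0.05, 0.1}
	// Hypothetical request latencies in seconds.
	latencies := []float64{0.003, 0.02, 0.02, 0.07, 0.4}

	counts := bucketCounts(bounds, latencies)
	for i, le := range bounds {
		fmt.Printf("le=%v count=%d\n", le, counts[i])
	}
	// Every observation also lands in the implicit +Inf bucket.
	fmt.Printf("le=+Inf count=%d\n", len(latencies))
}
```

Five observations collapse into a handful of integers. The 0.4 second request exceeds every listed bound, so it only shows up in +Inf. That is the trade: you lose exact values and keep a fixed memory footprint.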
Here is how to use the histogram in a handler.
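// handleRequest needs the "time" import in addition to the ones above.
func handleRequest(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	// The deferred call records the duration on every exit path,
	// including early returns and panics.
	defer func() {
		// The status is hard-coded to "200" to keep the example short.
		// A real handler should record the status it actually sent.
		requestDuration.WithLabelValues(r.Method, "200").Observe(time.Since(start).Seconds())
	}()
	// Simulate work.
	time.Sleep(50 * time.Millisecond)
	w.Write([]byte("OK"))
}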
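The WithLabelValues call selects the specific time series for the given labels. Observe records the value. The defer ensures the metric is recorded even if the handler returns early or panics. This is important. You want to capture the latency of failed requests too. Note that this example hard-codes the status label to "200", so a failed request would be filed under the wrong status. A real handler records the status it actually wrote.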
Labels add dimensions. High cardinality kills performance. Keep labels bounded.
Naming and conventions
Metric names follow strict conventions. Names use snake_case. Counters end with _total. Durations end with _seconds. Sizes end with _bytes. The Help text should be clear and concise. It appears in the Prometheus UI and Grafana dashboards.
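The client library enforces some rules. The check happens at runtime, not at compile time. If you use an invalid name, registration fails, and MustRegister panics with an error saying the name is not a valid metric name. Under the classic naming rules, names must match [a-zA-Z_:][a-zA-Z0-9_:]*. Colons are reserved for recording rules. Do not use colons in your metric names.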
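If you want to catch a bad name in a unit test rather than at startup, the pattern above is easy to check yourself. This sketch uses only the standard library; validMetricName is a hypothetical helper, not a function from the client library.

```go
package main

import (
	"fmt"
	"regexp"
)

// metricNameRE is the pattern quoted above, anchored to the whole string.
var metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)

// validMetricName is a hypothetical helper, not part of the client library.
func validMetricName(name string) bool {
	return metricNameRE.MatchString(name)
}

func main() {
	for _, name := range []string{
		"http_requests_total",
		"http-requests-total", // hyphens are not allowed
		"2xx_responses_total", // must not start with a digit
	} {
		fmt.Printf("%-22s valid=%t\n", name, validMetricName(name))
	}
}
```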
Labels also follow conventions. Label names use snake_case. Label values should be lowercase. Use method for HTTP methods. Use status for status codes. Do not use user_id as a label. If you have a million users, you create a million time series. Memory explodes. The scrape response gets huge. Scrape time increases. The scrape might time out.
Keep labels low cardinality. Status codes, methods, endpoints are fine. User IDs, request IDs, timestamps are not.
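Raw request paths are a common way to break this rule by accident, because /api/users/42 and /api/users/43 become separate series. One possible guard, sketched with the standard library, maps every path onto a fixed set of values. knownRoutes and routeLabel are hypothetical names, and a real router may already give you the matched route pattern.

```go
package main

import (
	"fmt"
	"strings"
)

// knownRoutes is a hypothetical, fixed set of route prefixes. The label
// can only ever take one of these values or "other".
var knownRoutes = []string{"/api/users", "/api/orders"}

// routeLabel collapses a raw request path into a bounded label value.
func routeLabel(path string) string {
	for _, route := range knownRoutes {
		if path == route || strings.HasPrefix(path, route+"/") {
			return route
		}
	}
	return "other"
}

func main() {
	for _, p := range []string{"/api/users/42", "/api/orders", "/wp-login.php"} {
		fmt.Printf("%s -> %s\n", p, routeLabel(p))
	}
}
```

No matter what clients send, this label has at most three values.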
Pitfalls and errors
Duplicate registrations crash the app. The panic is duplicate metrics collector registration attempted. This happens if you import a package that registers metrics and also register them manually. Use prometheus.NewRegistry to create a separate registry if you need isolation, or prometheus.WrapRegistererWith to attach distinguishing labels to everything a component registers. Or use Register instead of MustRegister and check the error. MustRegister is standard for app-level code because a duplicate registration is a bug that should fail fast.
Label cardinality is the silent killer. If you add a label for a query parameter, and users send unique values, your memory usage grows with every new value. Prometheus stores one time series per unique label combination. A histogram with 10 buckets and 1000 label combinations creates at least 10,000 bucket series, and about 13,000 once you count the +Inf bucket, _sum, and _count for each combination. Each time series consumes memory. The scrape response grows. The Prometheus server slows down.
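The arithmetic is worth doing before you add a label. This is a rough estimate, assuming each label combination exposes one series per configured bucket, one for the implicit +Inf bucket, and one each for _sum and _count. histogramSeries is a hypothetical helper.

```go
package main

import "fmt"

// histogramSeries estimates how many time series one histogram exposes:
// one per configured bucket, one for the implicit +Inf bucket, plus the
// _sum and _count series, all multiplied by the label combinations.
func histogramSeries(buckets, labelCombinations int) int {
	return (buckets + 1 + 2) * labelCombinations
}

func main() {
	// A bounded label set: methods times status codes.
	fmt.Println(histogramSeries(10, 1000))
	// The same histogram with a per-user label and a million users.
	fmt.Println(histogramSeries(10, 1000000))
}
```

The second number is thirteen million series from a single metric. That is the label explosion this section warns about.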
Guard against label explosions. Validate label values. Use a fixed set of labels. If you need high-cardinality data, use logs or traces. Metrics are for aggregates.
The worst metric is the one that crashes your server. Guard against duplicate registrations and label explosions.
Choosing the right metric
Use a Counter when you track a monotonically increasing value like total requests or bytes sent.
Use a Gauge when you track a value that goes up and down like current queue size or active connections.
Use a Histogram when you measure distributions of values like request latency or response size.
Use a Summary when you need precise quantiles calculated on the client side, though Histograms are usually preferred for aggregation.
Use promauto helpers when you want to define and register a metric in a single call to reduce boilerplate.
Use promhttp.InstrumentHandlerDuration when you want automatic instrumentation for an entire HTTP handler without writing manual metric code.
Counters count. Gauges gauge. Histograms histogram. Pick the shape that matches the data.