How to Monitor System Resources from Go

The server is hot, but why?

Your production server is running hot. The load average is climbing, and the alerting system pings you about memory usage. You SSH in and run top, which tells you the process is consuming resources, but it doesn't tell you why. Is the garbage collector thrashing? Did a goroutine leak spiral out of control? Or is the heap just growing because of a legitimate cache? You need to look inside the Go runtime to see what's actually happening.

Go's runtime is a sophisticated engine. It manages memory, schedules goroutines, and handles garbage collection. For years, developers used runtime.ReadMemStats or runtime.NumGoroutine() to peek under the hood. Those functions still work, but they are scattered and sometimes expensive. The runtime/metrics package is the modern dashboard. It exposes a unified set of metrics directly from the runtime internals. You ask for specific samples, and the runtime hands them back. It's designed to be low-overhead and consistent. You can track CPU time, memory classes, goroutine counts, and GC cycles in one place.

Think of runtime/metrics like the diagnostic port on a car. You can plug in and ask for engine RPM, fuel pressure, or tire temperature. The car doesn't rebuild the engine to answer; it reads sensors and returns the data. Similarly, the Go runtime maintains internal counters and structures. metrics.Read queries those structures and returns the current state without stopping the world.

Minimal example

Here's the simplest way to grab runtime metrics. You build a slice of metrics.Sample values naming the metrics you want, call metrics.Read, and read the results back out of the same slice.

package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	// Define the metrics we want to sample.
	// Names follow the /path/to/metric:unit convention used by the runtime.
	samples := []metrics.Sample{
		{Name: "/sched/goroutines:goroutines"},
		{Name: "/memory/classes/heap/objects:bytes"},
	}

	// Read fills in the Value field of each sample in place.
	// It does not return anything.
	metrics.Read(samples)

	// Value types vary, so check the kind before extracting data.
	// Uint64 panics if the value holds any other kind.
	for _, s := range samples {
		if s.Value.Kind() != metrics.KindUint64 {
			fmt.Printf("%s: unsupported or unexpected kind\n", s.Name)
			continue
		}
		fmt.Printf("%s: %d\n", s.Name, s.Value.Uint64())
	}
}

How the runtime responds

When you call metrics.Read, the runtime fills in the Value field of every sample in the slice you passed. It does not return a new slice. The cost depends on the metric. Some metrics are cheap counters stored in atomic variables. Others have to be assembled from statistics kept in several places inside the runtime, which costs more.

The metrics.Sample struct tells the runtime exactly what you want. The Name field is a string like /sched/goroutines:goroutines. The part before the colon is the metric path. The part after is the unit. The runtime matches these names against its internal registry. If a name is not supported, the runtime sets that sample's Value to kind metrics.KindBad. You should always check s.Value.Kind() if you're parsing dynamically, though in static code you usually know what you asked for.
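As a rough sketch of that kind check, the helper below reads one metric by name and formats it according to its kind. The function name render and the deliberately invalid metric name are made up for illustration; the Kind constants and accessor methods are the ones runtime/metrics provides.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// render reads a single metric and formats it based on its kind.
// The boolean is false when the value could not be interpreted.
func render(name string) (string, bool) {
	samples := []metrics.Sample{{Name: name}}
	metrics.Read(samples)

	v := samples[0].Value
	switch v.Kind() {
	case metrics.KindUint64:
		return fmt.Sprintf("%s = %d", name, v.Uint64()), true
	case metrics.KindFloat64:
		return fmt.Sprintf("%s = %g", name, v.Float64()), true
	case metrics.KindFloat64Histogram:
		h := v.Float64Histogram()
		return fmt.Sprintf("%s = histogram with %d buckets", name, len(h.Counts)), true
	case metrics.KindBad:
		return fmt.Sprintf("%s is not supported by this runtime", name), false
	default:
		// A kind introduced after this code was written.
		return fmt.Sprintf("%s has a kind this code does not handle", name), false
	}
}

func main() {
	// The second name is hypothetical and intentionally invalid.
	for _, name := range []string{"/sched/goroutines:goroutines", "/not/a/real-metric:bytes"} {
		text, _ := render(name)
		fmt.Println(text)
	}
}
```

Switching on the kind keeps the accessor calls safe, because each accessor panics when the value holds a different kind.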

Convention aside: gofmt is mandatory. Don't argue about indentation; let the tool decide. Most editors run it on save. The code above follows standard formatting. If you paste this into an editor, gofmt will align the struct fields and imports automatically.

Understanding metric categories

The runtime groups metrics by category. Knowing the categories helps you find what you need without guessing.

The /sched category covers scheduling. Metrics here tell you about goroutine counts and scheduler latency. /sched/goroutines:goroutines gives the current number of live goroutines. /sched/latencies:seconds is a histogram of how long goroutines sat runnable before they actually ran.

The /memory category covers heap and stack usage. /memory/classes/heap/objects:bytes tracks memory occupied by heap objects, including dead ones the garbage collector has not yet freed. /memory/classes/heap/stacks:bytes shows heap memory reserved for goroutine stacks. Memory metrics are granular: you can see heap objects, free heap memory (/memory/classes/heap/free:bytes), stacks, and more. This granularity helps you distinguish between a leak in heap allocations and stack growth from deep recursion.

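The /gc category covers garbage collection. /gc/cycles/automatic:gc-cycles counts the GC cycles the runtime started on its own. /gc/pauses:seconds is a histogram of individual stop-the-world pause durations; recent Go releases mark it deprecated in favor of the identical /sched/pauses/total/gc:seconds. GC metrics are crucial for latency-sensitive applications. If your p99 latency is spiking, check GC pauses.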
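Pause metrics come back as histograms rather than single numbers, so they need a little more handling. The sketch below summarizes a histogram by counting observations and finding the upper bound of the highest non-empty bucket. The helper name pauseSummary is an assumption for illustration, and the code checks the kind first because not every runtime reports /gc/pauses:seconds.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// pauseSummary returns the total number of observations and the upper
// bound of the highest non-empty bucket. Bucket i covers the range
// [buckets[i], buckets[i+1]), so buckets has one more entry than counts.
func pauseSummary(counts []uint64, buckets []float64) (total uint64, upper float64) {
	for i, c := range counts {
		if c == 0 {
			continue
		}
		total += c
		upper = buckets[i+1]
	}
	return total, upper
}

func main() {
	runtime.GC() // force one cycle so the histogram is unlikely to be empty

	samples := []metrics.Sample{{Name: "/gc/pauses:seconds"}}
	metrics.Read(samples)
	if samples[0].Value.Kind() != metrics.KindFloat64Histogram {
		fmt.Println("pause histogram is not available in this runtime")
		return
	}

	h := samples[0].Value.Float64Histogram()
	total, upper := pauseSummary(h.Counts, h.Buckets)
	fmt.Printf("%d pauses recorded, longest under %g seconds\n", total, upper)
}
```

The bucket boundaries can include infinities, so treat the upper bound as a coarse indicator rather than an exact maximum.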

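The /cpu category covers CPU time. /cpu/classes/total:cpu-seconds estimates the total CPU time available to the program, and /cpu/classes/user:cpu-seconds estimates the time spent running your Go code. Both are estimates, meant to be compared with each other rather than with OS-level measurements. CPU metrics help you understand if your application is CPU-bound.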
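Because the CPU class metrics are only meaningful relative to one another, a ratio is usually the useful number. Here is a minimal sketch that estimates the share of available CPU time spent on garbage collection. The helper names readFloats and fraction are hypothetical, and the code assumes a Go version that reports the /cpu/classes/... metrics, falling back gracefully when it does not.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// fraction returns part/total, or 0 when total is not positive.
func fraction(part, total float64) float64 {
	if total <= 0 {
		return 0
	}
	return part / total
}

// readFloats reads the named metrics in one call and returns their
// values. It reports false if any of them is not a float64.
func readFloats(names ...string) ([]float64, bool) {
	samples := make([]metrics.Sample, len(names))
	for i, n := range names {
		samples[i].Name = n
	}
	metrics.Read(samples)

	out := make([]float64, len(samples))
	for i, s := range samples {
		if s.Value.Kind() != metrics.KindFloat64 {
			return nil, false
		}
		out[i] = s.Value.Float64()
	}
	return out, true
}

func main() {
	vals, ok := readFloats(
		"/cpu/classes/gc/total:cpu-seconds",
		"/cpu/classes/total:cpu-seconds",
	)
	if !ok {
		fmt.Println("CPU class metrics are not available in this runtime")
		return
	}
	fmt.Printf("GC share of available CPU time: %.2f%%\n", 100*fraction(vals[0], vals[1]))
}
```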

The set of metrics can change between Go versions as the runtime evolves, so don't assume a name exists everywhere. The documentation for runtime/metrics lists the names each release supports. You can also call metrics.All() to get a metrics.Description for every metric the running runtime supports. This is useful for building dynamic dashboards that adapt to the Go version.
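One way to build such a version-agnostic reader is to turn the output of metrics.All() into the sample slice itself. This is a sketch along the lines of the package's own documentation example; the function name readAll is made up here.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readAll samples every metric the running runtime supports.
func readAll() []metrics.Sample {
	descs := metrics.All()
	samples := make([]metrics.Sample, len(descs))
	for i, d := range descs {
		samples[i].Name = d.Name
	}
	metrics.Read(samples)
	return samples
}

func main() {
	for _, s := range readAll() {
		switch s.Value.Kind() {
		case metrics.KindUint64:
			fmt.Printf("%s: %d\n", s.Name, s.Value.Uint64())
		case metrics.KindFloat64:
			fmt.Printf("%s: %g\n", s.Name, s.Value.Float64())
		case metrics.KindFloat64Histogram:
			// Histograms are skipped to keep the output short.
			fmt.Printf("%s: (histogram)\n", s.Name)
		default:
			fmt.Printf("%s: (unhandled kind)\n", s.Name)
		}
	}
}
```

Reading everything is convenient for exploration, but a production collector would normally pick a short list.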

Reading descriptions

You can query metadata about a metric through the metrics.Description struct. metrics.All() returns one per supported metric, carrying the name, a description, the kind, and whether the value is cumulative. The description is a human-readable string explaining what the metric measures. The kind tells you the data type, like KindUint64 or KindFloat64.

This is helpful when you're building a monitoring tool that needs to label metrics correctly. You can fetch the description once and cache it. The description doesn't change at runtime.
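A cache like that can be as small as a map built once on first use. The sketch below assumes a package-level cache guarded by sync.Once; the names lookup, descOnce, and descByName are illustrative, not part of any library.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"sync"
)

var (
	descOnce   sync.Once
	descByName map[string]metrics.Description
)

// lookup returns the cached description for name, building the cache
// from metrics.All on first use.
func lookup(name string) (metrics.Description, bool) {
	descOnce.Do(func() {
		all := metrics.All()
		descByName = make(map[string]metrics.Description, len(all))
		for _, d := range all {
			descByName[d.Name] = d
		}
	})
	d, ok := descByName[name]
	return d, ok
}

func main() {
	if d, ok := lookup("/sched/goroutines:goroutines"); ok {
		fmt.Printf("%s\n  %s\n  cumulative: %t\n", d.Name, d.Description, d.Cumulative)
	}
}
```

The Cumulative field is worth exporting alongside the help text, since it tells a dashboard whether to plot the raw value or its rate of change.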

Convention aside: functions that perform I/O or blocking work should accept a context.Context as the first parameter. If your metrics collection involves network calls or long computations, pass ctx through. For metrics.Read, the call is usually fast enough that context isn't strictly necessary, but the habit pays off when you expand the handler.

Realistic example

In a real application, you rarely print metrics to stdout. You expose them to a monitoring system or log them periodically. Here's a handler that returns JSON metrics. This pattern is common for internal health endpoints.

package main

import (
	"encoding/json"
	"net/http"
	"runtime/metrics"
)

// MetricsHandler returns runtime metrics as JSON.
// Sampling occurs on every request; keep the sample list short.
func MetricsHandler(w http.ResponseWriter, r *http.Request) {
	// Request specific metrics.
	// Names follow the /path/to/metric:unit convention.
	samples := []metrics.Sample{
		{Name: "/sched/goroutines:goroutines"},
		{Name: "/memory/classes/heap/objects:bytes"},
	}

	// Read fills in each sample's Value in place.
	metrics.Read(samples)

	// Map names to values for JSON output, skipping any sample
	// that is not a uint64 (for example, an unsupported name).
	result := make(map[string]uint64, len(samples))
	for _, s := range samples {
		if s.Value.Kind() == metrics.KindUint64 {
			result[s.Name] = s.Value.Uint64()
		}
	}

	// Write JSON response.
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(result)
}

The handler requests a small batch of metrics. Reading multiple metrics in one call is more efficient than calling Read multiple times. The runtime optimizes batch reads. It computes all requested values in a single pass where possible.

Convention aside: if err != nil { return err } is verbose by design. The community accepts the boilerplate because it makes the unhappy path visible. In this handler, json.NewEncoder(w).Encode(result) returns an error. In production code, you should check that error. If encoding fails, the response might be partial. Handling the error ensures the client gets a valid response or a clear failure.

Pitfalls and silent failures

The metric names are strings, which means typos are easy. If you mistype a name, metrics.Read itself won't panic. It fills in that sample with Value.Kind() equal to metrics.KindBad. The trouble starts on the next line: calling Uint64() on a value of any other kind panics, and code that skips bad values without logging them hides the typo entirely. Always validate the kind if you're reading dynamic names.

The compiler won't catch an unknown name either. It rejects code with undefined: pkg if you forget an import, or cannot use x as string if types mismatch, but metric names are just strings. A misspelled name only shows up when the program runs and the sample comes back as KindBad.
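Since the compiler can't help, one option is to validate the names once at startup against what the runtime reports. This is a sketch of that idea; unknownNames is a hypothetical helper, and the second name in main contains a deliberate typo.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// unknownNames returns the entries of names that the running runtime
// does not list in metrics.All.
func unknownNames(names []string) []string {
	known := make(map[string]bool)
	for _, d := range metrics.All() {
		known[d.Name] = true
	}

	var missing []string
	for _, n := range names {
		if !known[n] {
			missing = append(missing, n)
		}
	}
	return missing
}

func main() {
	wanted := []string{
		"/sched/goroutines:goroutines",
		"/sched/gorutines:goroutines", // deliberate typo for illustration
	}
	if missing := unknownNames(wanted); len(missing) > 0 {
		fmt.Println("unsupported metric names:", missing)
	}
}
```

Failing fast or logging loudly at startup turns a silent gap in the dashboard into an obvious configuration error.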

Reading metrics is cheap, not free. Every call makes the runtime gather statistics, and the cost grows with the number and type of samples you request. Don't read metrics in a hot path. The measurement shouldn't slow down the work. If you're sampling metrics every millisecond in a tight loop, you're adding overhead. Sample at a reasonable interval, like once per second, or only when a health check triggers.

Goroutine leaks happen when the goroutine waits on a channel that never gets closed. Always have a cancellation path. If you're running a background goroutine to collect metrics, make sure it can stop. Use a context or a done channel. The worst goroutine bug is the one that never logs.
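The two previous points, sampling on an interval and always having a cancellation path, fit together in one small collector. The sketch below is an assumption about how you might structure it: collect and runCollector are hypothetical names, and the intervals are shortened so the demo finishes quickly.

```go
package main

import (
	"context"
	"fmt"
	"runtime/metrics"
	"time"
)

// collect samples the goroutine count once per interval and passes it
// to report. It returns as soon as ctx is cancelled.
func collect(ctx context.Context, interval time.Duration, report func(uint64)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	// The slice is reused across reads to avoid reallocating it.
	samples := []metrics.Sample{{Name: "/sched/goroutines:goroutines"}}
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			metrics.Read(samples)
			if samples[0].Value.Kind() == metrics.KindUint64 {
				report(samples[0].Value.Uint64())
			}
		}
	}
}

// runCollector runs collect in the background for about totalMillis
// and returns how many samples were reported before it stopped.
func runCollector(totalMillis, intervalMillis int) int {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(totalMillis)*time.Millisecond)
	defer cancel()

	n := 0
	done := make(chan struct{})
	go func() {
		defer close(done)
		collect(ctx, time.Duration(intervalMillis)*time.Millisecond, func(uint64) { n++ })
	}()
	<-done // wait for the collector to observe cancellation and exit
	return n
}

func main() {
	fmt.Println("samples collected:", runCollector(350, 100))
}
```

The done channel is what lets the caller confirm the goroutine really exited instead of assuming it did.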

Metrics are samples, not guarantees. Treat them as signals, not absolute truths. A spike in goroutines might be a leak, or it might be a burst of traffic. Correlate metrics with logs and traces to understand the full picture.

When to use what

Use runtime/metrics when you need a unified, low-overhead way to sample runtime internals like goroutine counts, memory classes, and GC cycles.

Use runtime.ReadMemStats when you need detailed memory statistics that runtime/metrics doesn't expose yet, or when maintaining legacy codebases.

Use runtime.NumGoroutine() when you need a quick goroutine count in a simple script and don't want the boilerplate of metrics.Read.

Use external profiling tools like pprof when you need to debug performance bottlenecks, find leaks, or analyze call stacks rather than just reading scalar values.

Use OS-level tools like top or htop when you need to monitor the entire system, including other processes and hardware counters outside Go's control.
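For a sense of how the first three options in the list above compare in code, here is a short sketch that reads the goroutine count both ways and pulls one field from runtime.MemStats. The function name goroutineCounts is made up for this example.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// goroutineCounts returns the goroutine count from runtime.NumGoroutine
// and from runtime/metrics. ok is false if the metric is unavailable.
func goroutineCounts() (legacy int, sampled uint64, ok bool) {
	legacy = runtime.NumGoroutine()

	s := []metrics.Sample{{Name: "/sched/goroutines:goroutines"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindUint64 {
		return legacy, 0, false
	}
	return legacy, s[0].Value.Uint64(), true
}

func main() {
	legacy, sampled, ok := goroutineCounts()
	fmt.Println("runtime.NumGoroutine:", legacy)
	if ok {
		fmt.Println("runtime/metrics:", sampled)
	}

	// ReadMemStats fills a large struct in one call.
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Println("MemStats.HeapAlloc:", m.HeapAlloc)
}
```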

Where to go next

Monitoring system resources from Go lets your program check its own health by reading internal counters for memory, CPU, and active tasks. It works like a car's dashboard, showing you fuel and speed without needing to stop the engine or look under the hood.