How the Go Runtime Works: An Overview

The invisible engine behind your code

You write a web server. You add a go keyword before a function call to handle a request. You expect the operating system to create a new thread for that request. You expect the server to run out of memory at 10,000 concurrent connections. You expect the CPU to spend half its time context-switching.

None of that happens. The server handles 100,000 connections. Memory grows by a few kilobytes per connection instead of by a full thread stack. The CPU stays busy doing work instead of juggling threads.

What happened? The Go runtime stepped in.

The runtime is a library linked into every Go binary. It runs on top of the operating system but manages your program's resources. It schedules goroutines onto a small pool of OS threads. It allocates memory from a private heap. It collects garbage without stopping your code for long. It handles system calls so a blocking operation doesn't freeze the whole program.

You rarely call the runtime directly. You use the go statement, make, channels, and the sync package. The runtime translates those high-level operations into efficient system behavior. Understanding the runtime helps you write better code, debug puzzling performance issues, and use tools like GODEBUG to inspect what's happening under the hood.

The scheduler: G, M, and P

The heart of the runtime is the scheduler. It decides which goroutine runs on which thread and when. The scheduler uses three structures: G, M, and P.

A G represents a goroutine. It holds the goroutine's stack, the program counter, and the state. When you write go func(), the runtime creates a G. Gs are cheap. They start with a small stack, usually 2KB, and grow only if needed.
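A minimal sketch of how cheap a parked goroutine is: it starts a batch of goroutines that all block on one channel, counts them with runtime.NumGoroutine, then releases them. The helper name spawnParked and the count of 10,000 are arbitrary choices for illustration; the exact memory cost per goroutine depends on the Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spawnParked starts n goroutines that all block on one channel, records
// how many goroutines exist while they are parked, then releases them.
func spawnParked(n int) int {
	release := make(chan struct{})
	var started, done sync.WaitGroup
	started.Add(n)
	done.Add(n)

	for i := 0; i < n; i++ {
		go func() {
			defer done.Done()
			started.Done()
			<-release // park here; a parked goroutine costs little more than its stack
		}()
	}

	started.Wait()
	live := runtime.NumGoroutine() // n parked goroutines plus the caller

	close(release)
	done.Wait()
	return live
}

func main() {
	const n = 10_000
	live := spawnParked(n)
	fmt.Println("goroutines alive at peak:", live)
	fmt.Println("at least n were alive at once:", live >= n)
}
```

Creating the same number of OS threads would reserve a separate fixed-size stack for each one.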

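An M represents a machine, which is an OS thread. Ms do the actual work: they execute instructions and make system calls. Only an M that holds a P can run Go code, so the number of Ms running Go code at any moment is capped by the number of Ps. The runtime creates extra Ms when goroutines block in system calls, so the total thread count can exceed the number of cores.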

A P represents a processor, a logical CPU context. It holds a local run queue of goroutines, a per-P cache for the memory allocator, and the P's timers. The number of Ps is set by GOMAXPROCS, which defaults to the number of logical CPUs. Ps are the key to performance: because each P keeps this state locally, threads running on different Ps rarely contend for shared locks.
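You can inspect the number of Ps from inside a program. This sketch uses runtime.GOMAXPROCS, where passing 0 queries the value without changing it, and runtime.NumCPU for comparison. The helper schedulerShape is a hypothetical name. The two values usually match unless GOMAXPROCS is set in the environment; recent releases may also take container CPU limits into account.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerShape reports the number of Ps (GOMAXPROCS) and logical CPUs.
// Passing 0 to runtime.GOMAXPROCS queries the value without changing it.
func schedulerShape() (ps, cpus int) {
	return runtime.GOMAXPROCS(0), runtime.NumCPU()
}

func main() {
	ps, cpus := schedulerShape()
	fmt.Printf("Ps (GOMAXPROCS): %d, logical CPUs: %d\n", ps, cpus)

	// Any other argument sets the number of Ps and returns the previous value.
	prev := runtime.GOMAXPROCS(1)
	fmt.Println("Ps after change:", runtime.GOMAXPROCS(0))
	runtime.GOMAXPROCS(prev) // restore the original setting
}
```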

The scheduler binds Ps to Ms. An M executes goroutines from its P's run queue. If the queue is empty, the M looks for work elsewhere: it checks the global run queue, polls the network poller for goroutines whose connections are ready, and tries to steal half of another P's run queue. This work stealing keeps CPUs busy even when goroutines are unevenly distributed.

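When a goroutine enters a blocking system call, its M blocks with it. If the call does not return quickly, the runtime's background monitor thread (sysmon) hands the P to another M, which keeps running other goroutines. When the system call finishes, the original M tries to get a P back; if none is free, the goroutine goes onto the global run queue to wait. The program never stalls because one goroutine is waiting for disk I/O. Network I/O takes a different path: sockets are non-blocking, and goroutines waiting on them park until the network poller reports them ready.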

Goroutines are cheap. OS threads are expensive. The scheduler hides the cost.

Memory management and escape analysis

The runtime manages memory through a private allocator. You don't call malloc or free. You use make and new, or you let the compiler allocate for you. The runtime divides the heap into spans, each holding objects of a single size class. Allocation is fast because the common case takes no lock: each P has its own cache of spans (the mcache), so a goroutine running on that P can allocate small objects without locking.

The compiler decides whether a variable lives on the stack or the heap. This is escape analysis. If the compiler can prove a variable is only used within a function and doesn't outlive the function call, it stays on the stack. Stack allocation is fast. The memory is reclaimed when the function returns.

If a variable escapes, the compiler moves it to the heap. Variables escape when you return a pointer to a local variable, store a pointer in a global, or pass a pointer to a goroutine. Heap allocation is slightly slower and requires garbage collection.

package main

import "fmt"

// Build with `go build -gcflags=-m` to print the compiler's escape
// analysis decisions. The //go:noinline directives keep each function
// a separate call so its decision shows up on its own.

// CreateLocal returns a string by value.
// The string header is copied to the caller; the bytes of "hello"
// live in read-only data, so nothing here needs the heap.
//
//go:noinline
func CreateLocal() string {
    // s does not escape: only a copy of its value leaves the function.
    s := "hello"
    return s
}

// CreatePointer returns a pointer to a local variable.
// The variable must outlive the call, so the compiler reports
// "moved to heap: s" and allocates s on the heap.
//
//go:noinline
func CreatePointer() *string {
    // s escapes because a pointer to it is returned.
    s := "hello"
    return &s
}

func main() {
    // val holds a copy of the string header returned by CreateLocal.
    val := CreateLocal()
    fmt.Println(val)

    // ptr points to the heap-allocated variable created in CreatePointer.
    ptr := CreatePointer()
    fmt.Println(*ptr)
}

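Stacks grow and shrink automatically. If a goroutine needs more stack space, the runtime allocates a stack twice the size and copies the old stack into it; the garbage collector can later shrink a stack that is mostly unused. This happens transparently, so deep recursion that would overflow a fixed-size thread stack usually just works. The limit is 1 GB on 64-bit systems (250 MB on 32-bit), far larger than the fixed stacks OS threads typically get. Runaway recursion that hits it still crashes the program with a fatal stack overflow error.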
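A small sketch of stack growth in action: a goroutine recurses 100,000 levels deep, which needs megabytes of stack even though the goroutine started with only a few kilobytes. The function name depth and the recursion count are illustrative choices, not anything the runtime defines.

```go
package main

import "fmt"

// depth recurses n levels deep and returns n. Each level adds a stack
// frame, so a large n needs far more stack than a goroutine starts with.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return depth(n-1) + 1
}

func main() {
	result := make(chan int)
	go func() {
		// 100,000 frames need megabytes of stack, which the runtime
		// provides by repeatedly growing and copying the stack.
		result <- depth(100_000)
	}()
	fmt.Println("recursion depth reached:", <-result)
}
```

If you want a lower ceiling, for example to catch runaway recursion earlier in tests, runtime/debug provides SetMaxStack.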

Trust the escape analysis and profile before optimizing memory. The compiler makes good decisions, and contorting code to avoid heap allocations rarely pays off unless a profile shows allocation is the bottleneck.
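One low-effort way to check an allocation claim before changing code is testing.AllocsPerRun, which reports the average number of heap allocations per call of a function. This sketch measures lowercase copies of the two functions from the escape analysis example; the sink variables are hypothetical and exist only to keep results alive so the calls are not optimized away. The expectation is zero allocations for the by-value version and one for the pointer version, though exact counts can shift under tools such as the race detector.

```go
package main

import (
	"fmt"
	"testing"
)

// Package-level sinks keep results alive so the calls are not optimized away.
var (
	sinkStr string
	sinkPtr *string
)

//go:noinline
func createLocal() string {
	s := "hello"
	return s // the string header is copied to the caller; s stays on the stack
}

//go:noinline
func createPointer() *string {
	s := "hello"
	return &s // s outlives the call, so the compiler moves it to the heap
}

func main() {
	// testing.AllocsPerRun reports the average number of heap allocations per call.
	local := testing.AllocsPerRun(100, func() { sinkStr = createLocal() })
	ptr := testing.AllocsPerRun(100, func() { sinkPtr = createPointer() })
	fmt.Println("createLocal allocations per call:", local)
	fmt.Println("createPointer allocations per call:", ptr)
}
```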

Garbage collection

The runtime collects garbage using a concurrent tricolor mark-and-sweep algorithm. The GC runs while your program is running. It stops the world only for very short pauses, usually under 100 microseconds.

The algorithm treats objects as white, gray, or black. White objects are unvisited and potentially garbage. Gray objects are visited but their children are not yet scanned. Black objects are visited and all their children are scanned.

The GC starts by marking the root set: global variables, stack variables, and registers. These roots turn gray. The GC scans gray objects, turning them black and marking their children gray. This continues until no gray objects remain. All white objects are unreachable and get swept.
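The marking loop described above can be sketched on a toy object graph. This is a single-threaded illustration, not the runtime's implementation: objects are named by strings, edges[x] lists what object x points to, and there is no concurrency and no write barrier. The names mark, edges, and the graph in main are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

type color int

const (
	white color = iota // not yet reached; garbage if still white at the end
	gray               // reached, but its pointers are not yet scanned
	black              // reached and fully scanned
)

// mark runs a toy tricolor mark over a graph where edges[x] lists the
// objects that object x points to. It returns the objects left white.
func mark(edges map[string][]string, roots []string) []string {
	colors := make(map[string]color, len(edges))
	for obj := range edges {
		colors[obj] = white
	}

	// Shade the roots gray; the gray set is the collector's work list.
	var grayList []string
	for _, r := range roots {
		colors[r] = gray
		grayList = append(grayList, r)
	}

	// Scan gray objects until none remain.
	for len(grayList) > 0 {
		obj := grayList[len(grayList)-1]
		grayList = grayList[:len(grayList)-1]
		for _, child := range edges[obj] {
			if colors[child] == white {
				colors[child] = gray
				grayList = append(grayList, child)
			}
		}
		colors[obj] = black
	}

	// Everything still white is unreachable and would be swept.
	var garbage []string
	for obj, c := range colors {
		if c == white {
			garbage = append(garbage, obj)
		}
	}
	sort.Strings(garbage)
	return garbage
}

func main() {
	heap := map[string][]string{
		"A": {"B"},
		"B": {"C"},
		"C": nil,
		"D": {"E"}, // D and E point at each other, but no root reaches them
		"E": {"D"},
	}
	fmt.Println("unreachable:", mark(heap, []string{"A"}))
}
```

The D and E cycle shows why tracing collectors handle reference cycles naturally: reachability from the roots matters, not whether something still points at an object.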

The runtime uses write barriers to keep the mark phase correct while the program mutates pointers. When a goroutine writes a pointer, the write barrier ensures the GC sees the change. This allows marking to happen concurrently with mutator threads.

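You can observe GC behavior with GODEBUG. Setting GODEBUG=gctrace=1 prints a line of statistics to stderr after each collection, showing heap sizes, CPU time, and the wall-clock time of each phase. The trigger threshold is controlled separately, by the GOGC environment variable (or debug.SetGCPercent at run time), not by a GODEBUG key. The default, GOGC=100, starts a new cycle when the heap has grown by 100% of the live data left after the previous cycle, roughly when it doubles. Raising GOGC, for example to 200, makes collections less frequent and spends less CPU on GC, at the cost of a larger peak heap.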

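package main

import (
    "fmt"
    "runtime"
)

// sink is a package-level variable so the compiler cannot
// optimize the allocations away.
var sink []byte

// Build the program, then run it with GC tracing enabled:
//
//     go build -o gcdemo && GODEBUG=gctrace=1 ./gcdemo
//
// The runtime reads gctrace from GODEBUG when the program starts,
// so calling os.Setenv inside main is too late to enable it.
func main() {
    // Allocate and discard 10 MiB buffers repeatedly. Each new buffer
    // turns the previous one into garbage, which triggers collections;
    // the runtime prints one trace line per cycle to stderr.
    for i := 0; i < 20; i++ {
        sink = make([]byte, 10<<20)
    }

    // Force a final collection so at least one trace line appears.
    runtime.GC()
    fmt.Println("Allocated", len(sink), "bytes in the last buffer")
}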

GODEBUG is a scalpel, not a hammer. Use it to diagnose issues, not to tune production performance blindly.

Debugging with GODEBUG and directives

The GODEBUG environment variable lets you override runtime and standard library behavior. It accepts a comma-separated list of key=value pairs, and each key controls a specific setting. You can disable HTTP/2, disable asynchronous preemption, trace garbage collection, or print periodic scheduler summaries with schedtrace.

For example, GODEBUG=http2client=0,http2server=0 disables HTTP/2 in the standard library. This helps debug compatibility issues with servers that break HTTP/2. GODEBUG=asyncpreemptoff=1 disables async preemption, forcing the scheduler to rely on cooperative preemption. This can help find bugs where goroutines don't yield.
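The GODEBUG format itself is simple enough to sketch. The parser below is an illustration of the key=value list format only, not the runtime's own parser, and the name parseGODEBUG is hypothetical. Like the runtime, it lets a later duplicate key win over an earlier one.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseGODEBUG splits a GODEBUG-style string into key=value pairs.
// It illustrates the format only; it is not the runtime's parser.
func parseGODEBUG(s string) map[string]string {
	settings := make(map[string]string)
	for _, field := range strings.Split(s, ",") {
		key, value, ok := strings.Cut(field, "=")
		if !ok || key == "" {
			continue // skip empty or malformed entries
		}
		settings[key] = value // a later duplicate overwrites an earlier one
	}
	return settings
}

func main() {
	example := parseGODEBUG("http2client=0,http2server=0,asyncpreemptoff=1")
	fmt.Println("http2client:", example["http2client"])
	fmt.Println("asyncpreemptoff:", example["asyncpreemptoff"])

	// Whatever this process was started with, if anything.
	fmt.Println("current GODEBUG:", parseGODEBUG(os.Getenv("GODEBUG")))
}
```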

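You can also use //go:debug directives in your source code. These directives record GODEBUG defaults in the binary at build time, which is useful for locking in a specific compatibility behavior for one program. They must appear before the package clause in a file of package main, and the go command accepts only keys from its table of known compatibility settings, such as http2client or panicnil. Diagnostic keys like gctrace still have to come from the environment.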

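//go:debug http2client=0
//go:debug http2server=0

// The directives above must come before the package clause. They make
// the binary default to GODEBUG=http2client=0,http2server=0, as if those
// settings were in the environment. A GODEBUG value set in the
// environment at run time still takes precedence.
package main

import "fmt"

func main() {
    fmt.Println("HTTP/2 disabled by default via //go:debug")
}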

Convention aside: GODEBUG keys are not permanent. Compatibility keys are kept for several releases and then removed, and diagnostic keys can change between versions, so check the release notes and the runtime documentation before relying on a specific key. Use diagnostic keys such as gctrace and schedtrace for debugging, not as production tuning knobs. Compatibility keys exist to keep old behavior while you upgrade, but treat them as temporary. The runtime is optimized for general workloads, and tweaking knobs can break assumptions and cause subtle bugs.

Pitfalls and runtime errors

The runtime protects you from many errors, but it can't fix everything. Goroutine leaks are the most common issue. A goroutine leaks when it never exits even though nothing needs it anymore. This typically happens when a goroutine waits on a channel that no one will ever send to or close. The goroutine stays in memory, along with its stack and everything it references. Over time, the program consumes more memory and slows down.
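A minimal sketch of a leak and one common fix, assuming cancellation through context.Context is acceptable for the workers in question. Both worker kinds wait on a channel that nobody sends to; only the ones that also watch ctx.Done() can return. The function names are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"sync"
)

// leakyWorker waits for a value that never arrives, so it blocks forever.
func leakyWorker(ch <-chan int) {
	<-ch
}

// stoppableWorker waits for a value or for cancellation, whichever comes first.
func stoppableWorker(ctx context.Context, ch <-chan int) {
	select {
	case <-ch:
	case <-ctx.Done():
	}
}

// countLeaked starts n leaky and n stoppable workers on a channel that
// nobody sends to, cancels the stoppable ones, and reports how many
// extra goroutines are still alive afterwards.
func countLeaked(n int) int {
	before := runtime.NumGoroutine()
	ch := make(chan int) // never sent to and never closed

	for i := 0; i < n; i++ {
		go leakyWorker(ch)
	}

	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			stoppableWorker(ctx, ch)
		}()
	}
	cancel() // every stoppable worker can now return
	wg.Wait()

	// The leaky workers are still parked on ch. A few stoppable workers
	// may not have fully exited yet, so the result can be slightly above n.
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("goroutines still alive after cancel:", countLeaked(100))
}
```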

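The runtime detects one kind of deadlock. If every goroutine is blocked and none can make progress, the program stops with fatal error: all goroutines are asleep - deadlock!. This is a fatal error, not a recoverable panic, and it saves you from a silent hang: you get a stack trace showing where each goroutine is stuck. The check fires only when all goroutines are blocked, though. If any goroutine can still make progress, a partial deadlock or leak goes unreported.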

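Blocking the scheduler is another pitfall. Before Go 1.14, a goroutine running a tight loop with no function calls could occupy a P indefinitely and starve other goroutines on that core, because the scheduler could preempt only at function calls. Since Go 1.14, the scheduler uses asynchronous preemption on most platforms, sending a signal to interrupt a goroutine that has run for too long (about 10 ms). Some code still can't be preempted at arbitrary points, for example parts of the runtime itself or any program running with GODEBUG=asyncpreemptoff=1, so long CPU-bound loops are safest when they reach natural yield points such as function calls, channel operations, or I/O.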

Static checks catch some of these mistakes before the program runs. Before Go 1.22, a for loop reused a single variable across iterations, so goroutines that closed over it often all saw its final value; go vet reports this pattern with loop variable i captured by func literal. Go 1.22 gives each iteration its own variable in modules that declare go 1.22 or later, which removes that bug. If you pass the wrong type to a function, the compiler rejects the program with an error such as cannot use x (variable of type int) as string value in argument to f. These checks catch mistakes before the runtime ever sees them.
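A sketch of loop-variable capture that behaves the same under both the pre-1.22 and the 1.22+ semantics: passing i as an argument gives each goroutine its own copy. On Go 1.22+ modules the plain closure is also correct, but the explicit argument keeps the code safe if it is built with an older language version. The function name squares is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// squares computes i*i for i in [0, n) in separate goroutines.
// Passing i as an argument gives each goroutine its own copy, which is
// correct under both the pre-1.22 and the 1.22+ loop semantics.
func squares(n int) []int {
	results := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = i * i // each goroutine writes a distinct element, so no lock is needed
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(squares(5))
}
```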

The worst goroutine bug is the one that never logs. Leaks and deadlocks can be silent until the system collapses. Use tools like pprof to inspect goroutine stacks and find leaks.

When to use runtime features

Use a goroutine when you have independent tasks that can run concurrently. Use a channel when you need to synchronize data flow between goroutines. Use sync.Mutex when multiple goroutines need to access shared state safely. Use context.Context when you need to cancel a long-running operation or pass request-scoped values. Use GODEBUG when you need to inspect runtime internals during development or debugging. Use //go:debug when you want a program's binary to default to a specific GODEBUG compatibility setting. Use plain sequential code when concurrency adds complexity without performance benefit.

Convention aside: context.Context always goes as the first parameter, conventionally named ctx. Functions that take a context should respect cancellation and deadlines. This convention ensures context flows through your call tree consistently. Also, receiver names are usually one or two letters matching the type, like (b *Buffer) Write(...), not (this *Buffer). Keep names short and consistent.

Where to go next

The Go runtime is the engine that runs your program, scheduling goroutines and reclaiming memory. GODEBUG lets you adjust how that engine behaves, much like changing settings on a car's dashboard. You reach for these settings when a Go upgrade changes behavior your old code depends on, or when you need to debug specific internal behaviors.