How the Go Garbage Collector Works Internally
You are building a high-throughput image processor. Every request allocates a few megabytes for temporary buffers, decodes pixels, and returns a response. In C, you would spend half your time writing free() calls and debugging use-after-free bugs. In Go, you allocate memory, use it, and let it go. The memory is reclaimed some time after the last reference to the buffer is gone. It feels like magic. It is not magic. It is a carefully tuned machine running in the background, balancing latency and throughput so your program does not grind to a halt.
The Go garbage collector is a concurrent, non-generational, non-moving, tri-color mark-and-sweep collector. It does most of its work while your program keeps running, and it reclaims unused memory with only brief stop-the-world pauses. Understanding how it works helps you write faster code, avoid leaks, and tune performance when the default settings are not enough.
The tri-color strategy
The collector uses a strategy called tri-color mark-and-sweep. Imagine you are painting the walls of a large house. You have three colors of paint: white, gray, and black. White means "not visited yet." Gray means "I am working on this room, but I have not checked all the doors yet." Black means "this room is fully inspected and safe."
The goal is to paint every reachable room black. The collector starts by painting the root objects gray. Roots are global variables, stack frames, and registers. Then it processes the gray set. For every gray object, it scans its pointers. If it finds a pointer to a white object, it paints that object gray and adds it to the work queue. This continues until the gray set is empty. At that point, every live object is black. Any object that remains white is unreachable and can be swept away.
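The marking loop described above can be sketched as a toy model. This is not the runtime's implementation: the object type, the color field, and the mark and sweep helpers are hypothetical names invented for illustration, and the real collector keeps mark bits in per-span metadata rather than inside each object.

```go
package main

import "fmt"

// color is the marking state of a toy object.
type color int

const (
	white color = iota // not visited yet
	gray               // reached, but its pointers are not scanned yet
	black              // reached and fully scanned
)

// object is a hypothetical heap object with outgoing pointers.
type object struct {
	name string
	refs []*object
	c    color
}

// mark runs a tri-color traversal starting from the given roots.
func mark(roots []*object) {
	var work []*object
	for _, r := range roots {
		if r.c == white {
			r.c = gray
			work = append(work, r)
		}
	}
	for len(work) > 0 {
		// Take one gray object off the work list.
		obj := work[len(work)-1]
		work = work[:len(work)-1]
		// Shade every white object it points to.
		for _, ref := range obj.refs {
			if ref.c == white {
				ref.c = gray
				work = append(work, ref)
			}
		}
		// All of its pointers have been scanned.
		obj.c = black
	}
}

// sweep returns the names of objects that are still white, i.e. unreachable.
func sweep(heap []*object) []string {
	var freed []string
	for _, o := range heap {
		if o.c == white {
			freed = append(freed, o.name)
		}
	}
	return freed
}

func main() {
	a := &object{name: "a"}
	b := &object{name: "b"}
	c := &object{name: "c"}
	d := &object{name: "d"} // nothing points to d
	a.refs = []*object{b}
	b.refs = []*object{c, a} // the cycle back to a is handled by the color check
	heap := []*object{a, b, c, d}

	mark([]*object{a})
	fmt.Println(sweep(heap)) // prints [d]
}
```

The color check before enqueueing is what makes the traversal terminate on cyclic graphs: an object is queued at most once.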
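The write barrier keeps concurrent marking safe. While the mark phase runs, your program keeps changing pointers. If it stores a pointer to a white object into an object that is already black, and then drops the only other path to that white object, the collector would never visit it and would free a live object. The write barrier is a small piece of code the compiler emits around pointer writes. While marking is active, it paints the objects involved in the write gray so that they get scanned. Objects allocated during the mark phase are created black, so they always survive the current cycle. Together these rules let the collector mark the heap without stopping your goroutines for the duration.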
The write barrier is the safety net. Without it, concurrency breaks the mark.
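To see why the barrier matters, here is a small simulation of the lost-object problem. It is a sketch under simplifying assumptions: the collector type and its shade, step, and writePointer methods are hypothetical, and it models only an insertion-style barrier that shades the newly written pointer, whereas the real runtime uses a more elaborate hybrid barrier.

```go
package main

import "fmt"

type color int

const (
	white color = iota
	gray
	black
)

func (c color) String() string {
	return [...]string{"white", "gray", "black"}[c]
}

// object has a single outgoing pointer to keep the model small.
type object struct {
	name string
	ref  *object
	c    color
}

// collector is a toy marker with its gray work list.
type collector struct {
	work    []*object
	barrier bool // whether pointer writes shade their target
}

// shade turns a white object gray and queues it for scanning.
func (gc *collector) shade(o *object) {
	if o != nil && o.c == white {
		o.c = gray
		gc.work = append(gc.work, o)
	}
}

// step scans one gray object and blackens it. It reports false when no work is left.
func (gc *collector) step() bool {
	if len(gc.work) == 0 {
		return false
	}
	o := gc.work[len(gc.work)-1]
	gc.work = gc.work[:len(gc.work)-1]
	gc.shade(o.ref)
	o.c = black
	return true
}

// writePointer models the program storing target into holder.ref during marking.
func (gc *collector) writePointer(holder, target *object) {
	if gc.barrier {
		gc.shade(target) // insertion-style barrier: shade what is being stored
	}
	holder.ref = target
}

// run interleaves marking with two pointer writes and returns the final color of c.
func run(barrier bool) color {
	a := &object{name: "a"}
	b := &object{name: "b"}
	c := &object{name: "c"}
	b.ref = c // initially only b points to c

	gc := &collector{barrier: barrier}
	gc.shade(b)
	gc.shade(a) // a is on top of the work list, so it is scanned first
	gc.step()   // a becomes black; b is still gray and c is still white

	// The program moves the only pointer to c from b (gray) into a (black).
	gc.writePointer(a, c)
	gc.writePointer(b, nil)

	for gc.step() {
	}
	return c.c
}

func main() {
	fmt.Println("without barrier:", run(false)) // prints "without barrier: white"
	fmt.Println("with barrier:", run(true))     // prints "with barrier: black"
}
```

In the first run, c is still reachable through a but ends up white, so a sweep would free a live object. In the second run the barrier shades c at the moment of the write, and the marker scans it normally.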
Minimal example
Here is the simplest way to observe the collector: allocate memory, force a cycle, and check the stats.
package main

import (
	"fmt"
	"runtime"
)

// allocateBigBuffer creates a large slice on the heap.
func allocateBigBuffer() []byte {
	// Allocate 1 MiB. The slice is returned, so it escapes to the heap.
	// A []byte contains no pointers, so the GC tracks it but does not scan its contents.
	buf := make([]byte, 1024*1024)
	// Touch the buffer so the allocation is actually used.
	buf[0] = 42
	return buf
}

func main() {
	buf := allocateBigBuffer()

	// Force a collection, then read the stats while buf is still live.
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("HeapAlloc: %d bytes\n", m.HeapAlloc)

	// Keep buf reachable up to this point so the forced cycle cannot free it.
	runtime.KeepAlive(buf)
}
Walkthrough: Mark, sweep, and barriers
When your program runs, the allocator hands out memory from the heap. The heap is divided into spans, which are contiguous blocks of memory. Small objects are packed into spans that hold objects of one size class. Large objects get their own span. When the heap grows past a threshold, the collector wakes up. It stops your program only briefly, at the start and at the end of marking. The bulk of the work runs concurrently with your goroutines on multiple processors.
The mark phase begins. The collector scans the root set. It paints root objects gray and adds them to a work queue. Worker goroutines pull objects from the queue. For each gray object, they scan its fields. If a field is a pointer to a white object, they paint it gray and enqueue it. This propagates through the object graph.
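While the mark phase is active, the write barrier runs on pointer writes to the heap. Outside the mark phase it is switched off and costs almost nothing. If you store a pointer to an object x into a field, the barrier checks whether x has been marked. If x is white, the barrier paints x gray. Go's barrier also shades the object that the overwritten pointer used to refer to, which is why it is called a hybrid barrier. This ensures that any object reachable from the roots is eventually marked, even if the program rearranges references during the mark phase.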
Once the gray set is empty, the mark phase ends. All live objects are black. The sweep phase follows. It walks the heap span by span and frees any white objects. The memory goes back to the allocator for reuse. Sweeping is also concurrent and largely lazy: spans are swept in the background and on demand when the allocator needs one, so an allocation occasionally pays for a little sweeping work.
The GC runs while you run. Its stop-the-world pauses are typically well under a millisecond.
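You can watch the collector wake up on its own as the heap grows. The following sketch allocates a stream of short-lived buffers without ever calling runtime.GC() and then reads how many cycles completed. The helper names churn and gcCycles and the package-level sink variable are made up for this example, and the exact count depends on your Go version and GOGC setting.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the most recent buffer reachable so the allocation cannot be optimized away.
var sink []byte

// churn allocates n buffers of size bytes each, keeping only the last one alive.
func churn(n, size int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, size)
	}
}

// gcCycles returns the number of GC cycles completed so far.
func gcCycles() uint32 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.NumGC
}

func main() {
	before := gcCycles()
	churn(1000, 1<<20) // about 1 GiB allocated in total, roughly 1 MiB live at a time
	after := gcCycles()
	fmt.Printf("GC cycles triggered by allocation: %d\n", after-before)
}
```

With default settings this reports many cycles, because each discarded buffer becomes garbage and the heap keeps crossing the trigger threshold.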
Escape analysis: Stack versus heap
The compiler tries to keep objects on the stack. Stack memory is freed instantly when the function returns. There is no GC cost. If an object escapes to the heap, the GC must track it. Escape happens when you return a pointer, store a pointer in a global variable, or pass a pointer to a function that might store it.
You can check escape analysis with go build -gcflags="-m". The output shows which variables escape to the heap. If you see "escapes to heap" or "moved to heap" next to a variable, that object will be tracked by the GC. Keeping objects on the stack reduces GC pressure and improves performance.
Escape analysis is your first optimization. Keep objects small and local.
Realistic tuning
Here is how you tune the collector for a memory-constrained job: lower the threshold to keep the heap small, accepting more CPU cost for the extra collections.
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// processBatch allocates memory in a loop to simulate workload.
func processBatch() {
	for i := 0; i < 100; i++ {
		// Allocate 1 MiB per iteration.
		buf := make([]byte, 1024*1024)
		buf[0] = byte(i)
		// buf becomes garbage at the end of each iteration.
	}
}

func main() {
	// Lower GOGC to 50 so a cycle starts after less heap growth.
	// This keeps the heap smaller but spends more CPU on collection.
	debug.SetGCPercent(50)

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("Before: HeapAlloc %d\n", m.HeapAlloc)

	processBatch()

	runtime.ReadMemStats(&m)
	fmt.Printf("After: HeapAlloc %d\n", m.HeapAlloc)
}
The GOGC environment variable controls the trigger threshold. The default is 100, meaning the collector aims to finish the next cycle by the time the heap has grown 100% over the live heap left by the previous collection, which is roughly a doubling. You can change this at run time with debug.SetGCPercent. Lower values trigger more frequent collections, keeping the heap smaller but increasing CPU usage. Higher values reduce CPU usage but increase the memory footprint. Stop-the-world pause lengths are mostly unaffected either way, because they do not scale with heap size. What changes is how often the collector competes with your code for CPU.
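The arithmetic behind the threshold is simple enough to write down. The heapGoal function below is a hypothetical helper that applies the basic rule, goal = live heap × (1 + GOGC/100). It is a simplification: the runtime also accounts for GC roots such as stacks and globals and enforces a minimum heap size.

```go
package main

import "fmt"

// heapGoal approximates the heap size the collector targets for the next cycle,
// given the live heap after the previous cycle and a GOGC percentage.
// Simplified: ignores GC roots and the runtime's minimum heap size.
func heapGoal(liveHeap uint64, gogc int) uint64 {
	return liveHeap + liveHeap*uint64(gogc)/100
}

func main() {
	const live = 100 << 20 // 100 MiB live after the previous cycle
	for _, gogc := range []int{50, 100, 200} {
		fmt.Printf("GOGC=%d -> heap goal %d MiB\n", gogc, heapGoal(live, gogc)>>20)
	}
	// prints:
	// GOGC=50 -> heap goal 150 MiB
	// GOGC=100 -> heap goal 200 MiB
	// GOGC=200 -> heap goal 300 MiB
}
```

Halving GOGC from 100 to 50 cuts the allowed headroom above the live heap in half, so for the same allocation rate the collector runs about twice as often.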
You can also observe the collector with GODEBUG=gctrace=1. This prints GC statistics to stderr. The output shows pause times, heap sizes, and processor counts. Use this to diagnose latency issues.
Tune based on metrics, not guesses. The default works for most code.
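If you would rather collect those metrics inside the program than parse trace output, the runtime/debug package exposes the recent pause history. This is a minimal sketch: recentPauses is a hypothetical helper, and it forces collections with runtime.GC() only so that there is something to report in a short-lived example.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// recentPauses forces n collections and returns the recorded pause history,
// most recent first.
func recentPauses(n int) []time.Duration {
	for i := 0; i < n; i++ {
		runtime.GC()
	}
	var stats debug.GCStats
	debug.ReadGCStats(&stats)
	return stats.Pause
}

func main() {
	pauses := recentPauses(3)
	for i, p := range pauses {
		if i == 3 {
			break // only show the three most recent cycles
		}
		fmt.Printf("pause %d: %v\n", i, p)
	}
}
```

In a long-running service you would skip the forced collections and read the stats periodically, then feed them into whatever metrics system you already use.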
Pitfalls and leaks
The collector handles most memory management for you, but you can still cause problems. The worst bug is a goroutine leak. If a goroutine holds a reference to a large buffer and blocks on a channel that nobody ever sends on or closes, the buffer stays alive forever. The GC cannot reclaim it because the blocked goroutine's stack is still a root, and the buffer is reachable from it. You will see your memory usage climb until the process is killed. There is no compiler error for this. The compiler checks types, not lifetimes. If you forget to close a channel or cancel a worker, you get a runtime leak, not a compile error.
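Another pitfall is allocating huge objects. In the current runtime, objects larger than 32 KiB skip the size-class spans used for small objects and get a dedicated span straight from the heap, which is a slower path than a small allocation. Whether an object lives on the stack or the heap is a separate question decided by escape analysis: small objects stay on the stack if the compiler can prove they do not escape, while very large values are placed on the heap regardless. Big allocations also push the heap toward the next GC trigger faster, so a loop that allocates megabytes per iteration makes the collector run far more often.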
Goroutine leaks happen when a goroutine waits on a channel that never delivers. Always have a cancellation path.
When to tune
Use the default GOGC setting when you are building a standard service: its balance of memory and CPU works for most workloads.
Use debug.SetGCPercent to lower the threshold when memory is the constraint and you can afford higher CPU usage, such as in a service running under a tight container memory limit.
Use debug.SetGCPercent to raise the threshold when CPU is the bottleneck and you have memory to spare, such as in a batch data processor.
Use runtime.GC() to force a collection when you are writing benchmarks or tests and need to stabilize memory metrics before measurement.
Use object pooling with sync.Pool when you are allocating and discarding the same type of object in a tight loop: the pool reuses objects between collections, so you allocate less and the collector runs less often.
Use stack allocation when possible by keeping objects small and local: the compiler will keep them off the heap if they do not escape.
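As an illustration of the pooling advice, here is a minimal sync.Pool sketch that reuses bytes.Buffer values. The bufPool variable and the render function are hypothetical. Note that a pool is a cache, not a guarantee: the runtime may drop pooled objects during a collection, so New must always be able to produce a fresh one.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers and creates a new one when it is empty.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render builds a greeting using a pooled buffer instead of allocating a new one.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a pooled buffer may still hold data from its previous user
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies the bytes, so returning the buffer to the pool is safe
}

func main() {
	fmt.Println(render("gopher")) // prints "hello, gopher"
}
```

Resetting on Get rather than on Put is a design choice: it keeps the function correct even if some other caller returns a buffer without clearing it.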
The worst GC bug is the leak you can't see.