How the Go Scheduler Works: G, M, P Model

You write a web server in Go. You spawn a goroutine for every incoming request. You hit 10,000 concurrent connections. The server doesn't crash. It doesn't even sweat. In a thread-per-connection design, 10,000 OS threads would each reserve a stack of a megabyte or more, which adds up to gigabytes, and the kernel would burn CPU time on context switches. Go handles this because the runtime has a scheduler that multiplexes your goroutines onto a small pool of OS threads. You don't control the threads. The runtime does. Understanding the GMP model explains why your code scales and where the hidden bottlenecks live.

The GMP Triangle

The scheduler revolves around three letters: G, M, and P.

G stands for goroutine. This is your unit of work. Every go statement, such as go func() { ... }(), creates a G. A G is lightweight: it starts with a stack of only a few kilobytes, which the runtime grows and shrinks as needed. You can create millions of them, memory permitting.
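To make the "lightweight" claim concrete, here is a minimal sketch that starts 10,000 goroutines, keeps them all alive at once, and asks the runtime how many exist. The helper name spawnParked and the count of 10,000 are illustrative choices, not anything the runtime prescribes.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spawnParked starts n goroutines that block on a channel and returns
// the goroutine count observed while all of them are still alive.
func spawnParked(n int) int {
	var started, finished sync.WaitGroup
	release := make(chan struct{})

	for i := 0; i < n; i++ {
		started.Add(1)
		finished.Add(1)
		go func() {
			defer finished.Done()
			started.Done() // this G now exists and has run at least once
			<-release      // park until the caller closes the channel
		}()
	}

	started.Wait()
	alive := runtime.NumGoroutine() // includes the calling goroutine
	close(release)
	finished.Wait()
	return alive
}

func main() {
	fmt.Println("goroutines alive:", spawnParked(10000))
}
```

The reported number is at least n+1, because the goroutine that calls spawnParked is counted too.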

M stands for machine. This is an OS thread, and the kernel schedules it. Creating an M is expensive: it requires kernel resources and a full thread stack. The runtime keeps the number of Ms that run Go code at or below the number of Ps, though extra Ms appear when threads block in system calls or cgo.

P stands for processor. This is a logical processor. A P holds a local queue of runnable Gs and binds to an M. The number of Ps is controlled by GOMAXPROCS. By default, GOMAXPROCS matches the number of CPU cores.
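You can check both numbers from inside a program. This small sketch relies on the documented behavior that runtime.GOMAXPROCS(0) returns the current setting without changing it. The helper name schedulerWidth is made up for the example.

```go
package main

import (
	"fmt"
	"runtime"
)

// schedulerWidth reports the logical CPU count and the current number of Ps.
// Passing 0 to runtime.GOMAXPROCS queries the value without changing it.
func schedulerWidth() (cpus, procs int) {
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	cpus, procs := schedulerWidth()
	fmt.Printf("logical CPUs: %d, Ps (GOMAXPROCS): %d\n", cpus, procs)
}
```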

Think of a busy restaurant. Gs are order tickets. Ms are chefs. Ps are cooking stations. A chef (M) stands at a station (P) and cooks orders from the stack on that station. If the station runs out of tickets, the chef grabs tickets from a neighbor's station. If a chef needs to wait for the oven (a blocking syscall), they step away, leave the station for another chef, and come back later. The kitchen manager (the scheduler) ensures chefs are always cooking and stations are never idle.

Goroutines are tasks. Threads are workers. Processors are stations.

A Minimal Scheduler Demo

Here's a program that spawns many goroutines. The scheduler decides how to map them to threads. The code prints scheduler statistics to show how Gs and Ps interact.

package main

import (
	"fmt"
	"runtime"
	"sync"
)

// main sets up a workload and prints scheduler stats.
func main() {
	// GOMAXPROCS controls the number of Ps. Default is CPU count.
	// Setting it to 2 limits parallel execution to two cores.
	runtime.GOMAXPROCS(2)
	var wg sync.WaitGroup

	// Spawn 1000 goroutines. The scheduler queues them on Ps.
	// Each goroutine does a small amount of work.
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// Simulate computation. The loop contains no function calls,
			// so the runtime relies on asynchronous preemption
			// (Go 1.14 and later) to switch Gs if it runs too long.
			for j := 0; j < 100000; j++ {
				_ = j * j
			}
			fmt.Println(id)
		}(i)
	}

	wg.Wait()

	// Print runtime stats to see scheduler behavior.
	// NumGoroutine counts live goroutines (typically just main by now).
	// GOMAXPROCS(0) reports the number of Ps without changing it.
	fmt.Printf("Goroutines: %d, Ps: %d\n", runtime.NumGoroutine(), runtime.GOMAXPROCS(0))
}

Inside the Runtime Loop

When the program starts, the runtime creates Ps based on GOMAXPROCS. Each P gets a local run queue. When you launch a goroutine, it goes into the local queue of the P that is running the goroutine that launched it. Keeping new work on the same P keeps it close to the CPU cache, which improves performance.

An M picks up a P and starts executing Gs from that queue. The M runs a G until it yields, blocks, finishes, or is preempted for running too long (on the order of 10 ms). When a G stops running, the M checks the local queue for the next G. If the local queue is empty, the M looks for work elsewhere.

This is where work stealing happens. The M checks the global run queue and the network poller. If neither has runnable Gs, the M steals from another P, taking about half of that P's local queue in one batch. Taking a batch rather than a single G means the thief does not have to come straight back for more. The scheduler balances load automatically. You don't tune this. The runtime handles it.

Work stealing keeps the load balanced without central coordination.

Blocking and the Network Poller

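Real programs block. They wait for I/O, mutexes, or channels. What happens next depends on where the G blocks. When a G blocks on a channel, mutex, or timer, the runtime parks the G and the M immediately picks another G from the same P. No thread sits idle. When a G makes a blocking system call, the M itself is stuck in the kernel. In that case the runtime detaches the P from the blocked M and hands it to another M, waking an idle one or creating a new one. Either way the CPU stays busy.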

Go has a special optimization for network I/O called the network poller. Instead of letting the M block on a syscall, the runtime uses epoll on Linux, kqueue on macOS, or IOCP on Windows. The poller monitors file descriptors. When a goroutine blocks on a network operation, the runtime parks the G and registers the file descriptor with the poller. The M continues running other Gs. When the network event fires, the poller wakes up the G and puts it back on a P's queue.

This means network I/O rarely blocks an M. The scheduler can handle massive concurrency with few threads. The M count stays low, usually matching the P count, unless syscalls force the creation of more Ms.

The network poller turns blocking I/O into non-blocking events.
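One way to see this for yourself is to block many goroutines in Read and count threads. The sketch below assumes loopback TCP is available and uses the threadcreate profile from runtime/pprof as a rough thread counter. The helper name parkOnReads, the figure of 200 connections, and the 50 ms settling pause are arbitrary choices for the example.

```go
package main

import (
	"fmt"
	"net"
	"runtime/pprof"
	"sync"
	"time"
)

// parkOnReads opens n loopback TCP connections, starts one goroutine per
// connection that blocks in Read, and returns the number of OS threads the
// runtime has created by then.
func parkOnReads(n int) (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	var conns []net.Conn
	var readers sync.WaitGroup
	// Runs before ln.Close: closing the connections unblocks every Read.
	defer func() {
		for _, c := range conns {
			c.Close()
		}
		readers.Wait()
	}()

	for i := 0; i < n; i++ {
		client, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, err
		}
		conns = append(conns, client)

		server, err := ln.Accept()
		if err != nil {
			return 0, err
		}
		conns = append(conns, server)

		readers.Add(1)
		go func(c net.Conn) {
			defer readers.Done()
			buf := make([]byte, 1)
			c.Read(buf) // no data ever arrives, so the G parks in the poller
		}(server)
	}

	time.Sleep(50 * time.Millisecond) // give the readers time to reach Read
	return pprof.Lookup("threadcreate").Count(), nil
}

func main() {
	threads, err := parkOnReads(200)
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Printf("200 blocked readers, %d OS threads created\n", threads)
}
```

The exact thread count varies by machine and Go version, but it should sit far below the number of blocked readers.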

Realistic Concurrency

Real code often involves HTTP handlers, database calls, or channel communication. The scheduler reacts to these patterns. Here's a server that handles requests with blocking sleeps. The scheduler manages the goroutines efficiently.

package main

import (
	"fmt"
	"net/http"
	"time"
)

// handler simulates a request that blocks on I/O.
func handler(w http.ResponseWriter, r *http.Request) {
	// Simulate a database call or network request.
	// The goroutine blocks here. The runtime parks the G on a timer.
	// The M keeps its P and runs other Gs in the meantime.
	time.Sleep(100 * time.Millisecond)

	// When the sleep ends, the G becomes runnable again.
	// The scheduler places it back on a P's queue.
	fmt.Fprintln(w, "Done")
}

// main starts a server and prints a message.
func main() {
	http.HandleFunc("/", handler)
	fmt.Println("Server starting...")
	// http.ListenAndServe blocks the main goroutine and only returns on error.
	// The scheduler manages all incoming request goroutines.
	if err := http.ListenAndServe(":8080", nil); err != nil {
		fmt.Println("server error:", err)
	}
}

When time.Sleep runs, the goroutine blocks, but the thread does not. The runtime parks the G on a timer, and the M stays attached to its P and moves on to the next runnable G. When the timer fires, the G goes back on a run queue. This is why Go can handle massive concurrency with few threads: thousands of handlers can be asleep at once while the M count stays low.

Blocking is a feature. The scheduler parks the goroutine and keeps the CPU busy.
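A quick experiment makes the point without a server. This sketch puts 10,000 goroutines to sleep at once and then reads the threadcreate profile from runtime/pprof as an approximate count of OS threads. The names sleepers and napTime and the 50 ms duration are illustrative assumptions.

```go
package main

import (
	"fmt"
	"runtime/pprof"
	"sync"
	"time"
)

// napTime is how long each goroutine sleeps in this demo.
const napTime = 50 * time.Millisecond

// sleepers starts n goroutines that each sleep for napTime, waits for all of
// them, and returns the number of OS threads the runtime has created so far.
func sleepers(n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(napTime) // parks the G on a timer; no thread waits
		}()
	}
	wg.Wait()
	return pprof.Lookup("threadcreate").Count()
}

func main() {
	fmt.Println("threads created:", sleepers(10000))
}
```

The number printed depends on the machine, but it should be a small fraction of the goroutine count.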

Pitfalls and Runtime Errors

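The scheduler is robust, but you can still shoot yourself in the foot. The most common failure is fatal error: all goroutines are asleep - deadlock! This is a fatal runtime error rather than a panic, so recover cannot catch it. It happens when every goroutine is waiting on a channel or mutex and none can proceed. The runtime detects that no progress is possible and stops the program. The check only fires when all goroutines are blocked, so a partial deadlock in a program that still has other goroutines running or waiting on the network goes unreported.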

Goroutine leaks are another danger. If a goroutine waits on a channel that is never written to or closed, it stays parked forever. The garbage collector cannot reclaim its stack or anything it references, so memory grows with every leaked G. Always provide a cancellation path. Use context.Context to signal shutdown. The convention is to pass ctx as the first argument to functions that might block.

Context is plumbing. Run it through every long-lived call site.
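A minimal sketch of such a cancellation path follows. The worker blocks on a channel that nothing ever writes to, which would be a leak without the ctx.Done case. The names worker, stopsOnCancel, and the one-second timeout are made up for the example.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// worker consumes jobs until the channel closes or ctx is cancelled.
// It closes done on exit so the caller can confirm it did not leak.
func worker(ctx context.Context, jobs <-chan int, done chan<- struct{}) {
	defer close(done)
	for {
		select {
		case <-ctx.Done():
			return // cancellation path: stop waiting and exit
		case j, ok := <-jobs:
			if !ok {
				return
			}
			_ = j // placeholder for real work
		}
	}
}

// stopsOnCancel reports whether a worker blocked on an idle channel
// exits within one second of its context being cancelled.
func stopsOnCancel() bool {
	ctx, cancel := context.WithCancel(context.Background())
	jobs := make(chan int) // never written to and never closed
	done := make(chan struct{})

	go worker(ctx, jobs, done)
	cancel()

	select {
	case <-done:
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("worker exited:", stopsOnCancel()) // prints: worker exited: true
}
```

Selecting on ctx.Done alongside the real channel is the usual shape: the goroutine always has one way out that the caller controls.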

Changing GOMAXPROCS manually is usually a mistake. The default matches the CPU count. Lowering it limits parallelism. Raising it creates more Ps than cores, which increases context switching overhead. Only change GOMAXPROCS if you have a specific reason, like running a benchmark or running in a container whose cgroup CPU limit is lower than the host's CPU count, which older Go releases (before 1.25) ignore when choosing the default.
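If you do need a temporary override, for a benchmark say, it helps to restore the old value afterwards. The sketch below leans on the fact that runtime.GOMAXPROCS returns the previous setting. The helpers withProcs and currentProcs are hypothetical names for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentProcs returns the number of Ps without changing it.
func currentProcs() int {
	return runtime.GOMAXPROCS(0)
}

// withProcs runs fn with GOMAXPROCS set to n, then restores the old value.
// runtime.GOMAXPROCS returns the previous setting, which makes restoring easy.
func withProcs(n int, fn func()) {
	prev := runtime.GOMAXPROCS(n)
	defer runtime.GOMAXPROCS(prev)
	fn()
}

func main() {
	before := currentProcs()
	withProcs(1, func() {
		fmt.Println("inside:", currentProcs()) // prints: inside: 1
	})
	fmt.Println("restored:", currentProcs() == before) // prints: restored: true
}
```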

Using runtime.LockOSThread wires the calling goroutine to its current OS thread. No other goroutine runs on that thread until the binding is released. If the locked goroutine blocks, the thread sits idle with it, and the scheduler cannot reuse that M. Use LockOSThread only when code depends on thread identity, such as C code called via cgo that relies on thread-local state. Call runtime.UnlockOSThread as soon as the critical section ends, typically with defer.
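One way to keep the locked region small is to confine it to a short-lived goroutine. In this sketch the helper onLockedThread is a made-up name, and the function passed to it stands in for a cgo call that depends on thread-local state.

```go
package main

import (
	"fmt"
	"runtime"
)

// onLockedThread runs fn on a goroutine that is wired to a single OS thread
// for the duration of the call, then releases the thread.
func onLockedThread(fn func() int) int {
	result := make(chan int)
	go func() {
		runtime.LockOSThread()
		defer runtime.UnlockOSThread() // release the binding on every path
		result <- fn()
	}()
	return <-result
}

func main() {
	// The closure is a placeholder for thread-sensitive work.
	fmt.Println(onLockedThread(func() int { return 42 })) // prints: 42
}
```

Pairing the lock with a deferred unlock keeps the binding from outliving the critical section, even if fn panics.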

The worst goroutine bug is the one that never logs.

When to Touch the Scheduler

Most of the time, you don't need to touch the scheduler. The runtime handles G, M, and P management automatically. You write goroutines and channels. The scheduler does the rest. There are a few cases where you need to interact with the scheduler directly.

Use GOMAXPROCS only when you need to restrict CPU usage for a specific process, or when running inside a container whose cgroup CPU limit the runtime does not pick up on its own.

Use runtime.LockOSThread when you need to bind a goroutine to a specific OS thread, such as when calling C code via cgo that requires thread-local state.

Use runtime.UnlockOSThread immediately after the critical section to release the binding and let the scheduler resume normal operation.

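Use runtime.Gosched when you have a long-running computation without blocking calls and want to yield the processor explicitly so other goroutines can run. Since Go 1.14 the runtime preempts long-running goroutines on its own, so this is rarely needed.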

Use channels or mutexes for synchronization instead of busy-waiting, which wastes CPU cycles and prevents the scheduler from parking goroutines efficiently.
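For the runtime.Gosched case, an explicit yield point in a tight loop might look like the sketch below. The function sumWithYield and the interval of 100 iterations are invented for illustration. On current Go releases the loop would be preempted without the call, so treat this as a demonstration of the API rather than a recommendation.

```go
package main

import (
	"fmt"
	"runtime"
)

// sumWithYield adds the integers 0..n-1 and yields the processor every
// `every` iterations so other runnable goroutines get a turn.
func sumWithYield(n, every int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += i
		if every > 0 && i%every == 0 {
			runtime.Gosched() // voluntary yield; the loop resumes later
		}
	}
	return total
}

func main() {
	fmt.Println(sumWithYield(1000, 100)) // prints: 499500
}
```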

Trust the scheduler. Don't fight it with manual thread management.

Recap

The Go scheduler is like a restaurant kitchen where orders are goroutines, chefs are OS threads, and cooking stations are logical processors. Each station holds a stack of orders for its chef to cook, and if a chef runs out of work, they grab orders from a neighbor's stack to keep everyone busy. This system lets your program use all available CPU power without you managing threads manually.