How the Go Scheduler Works (GMP Model)

The Go scheduler uses the GMP model to efficiently run millions of goroutines on limited OS threads by dynamically balancing work and handling blocking operations.

The goroutine explosion that doesn't crash

You write a simple HTTP handler. A request arrives, and the server runs it in a goroutine. Another request comes in, another goroutine. You run a load test with 50,000 concurrent connections. The server doesn't crash. Memory grows by a few kilobytes per goroutine rather than megabytes per thread, and the work spreads across all available cores.

In runtimes that give every unit of concurrency its own OS thread, this scenario forces you to manage thread pools manually. Create too many threads, and the process exhausts memory for thread stacks or context-switching overhead kills performance. Go sidesteps most of this. The runtime includes a scheduler that multiplexes many goroutines, potentially millions, across a small set of OS threads. You write concurrent code as if resources were nearly unlimited; the scheduler makes it work on finite hardware.

The GMP model

The scheduler revolves around three entities: G, M, and P.

G is a goroutine. It represents a unit of work. Each goroutine has its own stack, which starts small (about 2 KB in current releases) and grows or shrinks dynamically as needed.

M is a Machine. It maps directly to an OS thread. The Machine does the actual execution on the CPU.

P is a Processor. It is a logical resource that holds the state a thread needs in order to run Go code. A Processor maintains a local queue of goroutines waiting to run, along with other bookkeeping such as a memory allocation cache and timers. A Machine must hold a Processor to execute Go code.

Think of a busy restaurant kitchen. The dishes are the goroutines. The chefs are the OS threads. The cooking stations are the processors. A station has a tray of dishes waiting to be cooked. A chef stands at a station and cooks the dishes. If a dish has to wait for the oven, the chef sets it aside and starts the next dish on the tray. If the chef is called away on a long errand, another chef takes over the station. The station doesn't disappear; it just gets a different chef.

The number of Processors determines how many goroutines can run in parallel. By default, Go sets this to the number of logical CPUs available to the process. This means that with 8 logical CPUs you have 8 Processors, and at most 8 goroutines execute Go code simultaneously. The rest wait in queues.
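
To check these numbers on a given machine, a small sketch like the following can help. The helper name parallelism is an assumption made for illustration; runtime.NumCPU and runtime.GOMAXPROCS are the standard library calls, and passing 0 to GOMAXPROCS queries the current value without changing it.

```go
package main

import (
	"fmt"
	"runtime"
)

// parallelism reports the logical CPU count and the current number of Ps.
// Passing 0 to runtime.GOMAXPROCS reads the setting without modifying it.
func parallelism() (cpus, procs int) {
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	cpus, procs := parallelism()
	fmt.Println("logical CPUs:", cpus)
	fmt.Println("GOMAXPROCS:  ", procs)
}
```

The two values usually match unless GOMAXPROCS was set through the environment variable or an earlier call.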

Minimal example

Here's a minimal example showing the scheduler swapping execution between two goroutines on a single processor.

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Limit to one P so only one goroutine can run at a time
	runtime.GOMAXPROCS(1)

	done := make(chan bool)

	// Spawn a goroutine that yields control periodically
	go func() {
		for i := 0; i < 5; i++ {
			fmt.Println("Worker:", i)
			// Yield to let the main goroutine run on the single P
			runtime.Gosched()
		}
		done <- true
	}()

	// Main goroutine runs concurrently with the worker
	for i := 0; i < 5; i++ {
		fmt.Println("Main:", i)
		// Yield to let the worker goroutine run
		runtime.Gosched()
	}

	// Wait for worker to finish
	<-done
}

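The output typically alternates between Main and Worker, although the exact interleaving is not guaranteed. Without the runtime.Gosched calls, the main goroutine would most likely print all five of its lines before the worker ran at all, because nothing in such a short loop gives the scheduler a reason to switch until main blocks on the channel receive. With GOMAXPROCS(1), the scheduler can only run one goroutine at a time, so explicit yields make the handoff visible.

Convention note: runtime.GOMAXPROCS defaults to the number of logical CPUs. You rarely need to change it. If you do, set it at the start of main before spawning any goroutines. Changing it later works, but the call stops the world while the runtime resizes its set of Processors, so keep it off hot paths.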

Inside the scheduler loop

When you use the go statement, the runtime allocates a small stack for the new goroutine and places it in the local run queue of the current Processor. The scheduler is not a separate background thread: each Machine runs the scheduling code itself whenever its current goroutine blocks, yields, or finishes.

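The scheduling loop follows a predictable pattern. A Machine holding a Processor takes a goroutine from that Processor's local queue and executes it. If the goroutine finishes, the Machine picks the next one from the queue. If the goroutine blocks on a channel, a mutex, a timer, or a network socket, the runtime parks it: the goroutine leaves the run queue, and the same Machine, still holding its Processor, moves on to the next runnable goroutine. Network I/O goes through the runtime's network poller, which makes parked goroutines runnable again once their sockets are ready. Blocking system calls, such as file I/O, are different. The thread itself is stuck in the kernel, so the Machine detaches from the Processor, and the Processor is handed to another Machine so other goroutines keep running. When the call returns, the original Machine tries to reacquire a Processor; if none is free, the goroutine goes back onto a run queue and the Machine goes idle.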
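
A goroutine blocked on a channel is parked by the runtime and does not tie up an OS thread. One rough way to observe this is to block many goroutines on a channel and ask the runtime how many threads it has created. The sketch below assumes the "threadcreate" profile from runtime/pprof is a good enough proxy for the thread count; parkAndCountThreads is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime/pprof"
	"sync"
)

// parkAndCountThreads starts n goroutines that all block on a channel
// receive, then reports how many OS threads the runtime has created so far.
func parkAndCountThreads(n int) int {
	gate := make(chan struct{})
	var started, finished sync.WaitGroup
	started.Add(n)
	finished.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer finished.Done()
			started.Done()
			<-gate // the goroutine parks here; its thread moves on to other work
		}()
	}
	started.Wait() // every goroutine has started and is at or near its receive
	threads := pprof.Lookup("threadcreate").Count()
	close(gate) // wake all parked goroutines
	finished.Wait()
	return threads
}

func main() {
	const n = 10000
	threads := parkAndCountThreads(n)
	fmt.Printf("%d goroutines blocked on a channel, about %d OS threads created\n", n, threads)
}
```

On a typical machine the thread count stays close to the number of logical CPUs, far below the number of goroutines.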

Work stealing keeps the load balanced. If a Processor's local queue is empty and the global queue has nothing for it, it picks another Processor at random and steals half of that Processor's queued goroutines. This prevents some CPUs from sitting idle while others are overloaded. The local queue is a fixed-size ring buffer (256 slots in current releases), not a linked list. Both the owner and any thief access it with atomic operations rather than locks, so it scales well even under heavy contention.
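
The "steal half" step can be sketched with a toy model. This is not the runtime's implementation: the queue type and stealHalf function are hypothetical, a plain slice stands in for the lock-free local queue, and integers stand in for goroutines. The rounding (half, rounded up) is chosen here so that a victim with a single task can still be stolen from.

```go
package main

import "fmt"

// queue is a toy stand-in for a P's local run queue.
// The real runtime uses atomics; a slice keeps the sketch short.
type queue struct {
	tasks []int // task IDs standing in for goroutines
}

// stealHalf moves half of the victim's tasks (rounded up) to the thief
// and returns how many tasks were moved.
func stealHalf(thief, victim *queue) int {
	n := len(victim.tasks)
	take := n - n/2
	thief.tasks = append(thief.tasks, victim.tasks[:take]...)
	victim.tasks = victim.tasks[take:]
	return take
}

func main() {
	busy := &queue{tasks: []int{1, 2, 3, 4, 5, 6}}
	idle := &queue{}

	moved := stealHalf(idle, busy)
	fmt.Println("moved:", moved)     // moved: 3
	fmt.Println("idle:", idle.tasks) // idle: [1 2 3]
	fmt.Println("busy:", busy.tasks) // busy: [4 5 6]
}
```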

If a Processor's local queue fills up, half of its goroutines, together with the new one, spill over to a global run queue. The global queue is protected by a lock, so the runtime tries to touch it rarely. Processors still check it periodically (every 61st scheduling round in current releases) so that goroutines waiting there are not starved. This two-level queue system minimizes lock contention while ensuring fairness.
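
The spill-over rule can also be modeled in a few lines. Again this is a toy under stated assumptions: the sched type, its enqueue method, and the capacity of 4 are invented for illustration, and moving half of the local queue plus the incoming task is one plausible reading of the overflow behavior rather than a copy of the runtime's code.

```go
package main

import "fmt"

const localCap = 4 // hypothetical capacity, kept tiny so the overflow is easy to see

// sched is a toy model of one P's local queue plus the shared global queue.
type sched struct {
	local  []int // local run queue (lock-free in the real runtime)
	global []int // global run queue (lock-protected in the real runtime)
}

// enqueue adds a task to the local queue. When the local queue is full, it
// first moves half of the local tasks, plus the new task, to the global queue.
func (s *sched) enqueue(task int) {
	if len(s.local) < localCap {
		s.local = append(s.local, task)
		return
	}
	half := len(s.local) / 2
	s.global = append(s.global, s.local[:half]...)
	s.global = append(s.global, task)
	s.local = append([]int(nil), s.local[half:]...)
}

func main() {
	s := &sched{}
	for task := 1; task <= 6; task++ {
		s.enqueue(task)
	}
	fmt.Println("local: ", s.local)  // local:  [3 4 6]
	fmt.Println("global:", s.global) // global: [1 2 5]
}
```

Spilling a batch rather than a single task means the lock on the global queue is taken once per overflow, not once per goroutine.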

Goroutines are cheap. The scheduler handles the complexity.

Realistic scenario

Here's how a standard HTTP server leverages the scheduler to handle thousands of concurrent requests without creating thousands of threads.

package main

import (
	"fmt"
	"net/http"
	"time"
)

// handleRequest simulates a slow database query
func handleRequest(w http.ResponseWriter, r *http.Request) {
	// Simulate waiting on a slow dependency. While this goroutine sleeps it is
	// parked, and its thread is free to run other goroutines.
	time.Sleep(100 * time.Millisecond)
	fmt.Fprintln(w, "Done")
}

func main() {
	http.HandleFunc("/", handleRequest)
	// The server runs each accepted connection in its own goroutine
	if err := http.ListenAndServe(":8080", nil); err != nil {
		panic(err)
	}
}

When a connection arrives, http.Server starts a new goroutine to serve it, and the handler runs on that goroutine. The handler calls time.Sleep, which parks the goroutine on a runtime timer. The Machine keeps its Processor and immediately picks up another runnable goroutine, so other requests are processed without waiting. When the timer fires, the goroutine becomes runnable again, a Machine picks it up, and the response is sent.

The server handles concurrency by creating goroutines. The scheduler handles efficiency by multiplexing them onto OS threads. You don't manage threads. You manage goroutines.
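
The effect is easy to measure outside of an HTTP server. In the sketch below, sleepAll is a hypothetical helper that starts many goroutines, each sleeping for the same duration, and times the whole batch. Because time.Sleep parks the goroutine instead of holding a thread, the batch should take roughly one sleep interval, even with a single Processor.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// sleepAll runs n goroutines that each sleep for d and returns the total
// wall-clock time until all of them have finished.
func sleepAll(n int, d time.Duration) time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			time.Sleep(d) // parks this goroutine; the P runs others meanwhile
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	runtime.GOMAXPROCS(1) // a single P makes the point starker
	elapsed := sleepAll(1000, 100*time.Millisecond)
	fmt.Println("1000 sleepers finished in", elapsed.Round(10*time.Millisecond))
}
```

If each sleep occupied a thread for its full duration on one Processor, the batch would take on the order of 100 seconds rather than a fraction of a second.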

Convention note: Handlers have the signature func(http.ResponseWriter, *http.Request). ResponseWriter is an interface, which lets middleware and tests substitute their own implementations. ListenAndServe always returns a non-nil error, so in production code, always check it. The community accepts the boilerplate because it makes failure paths visible.

Pitfalls and runtime errors

The scheduler is robust, but you can trip it.

Blocking a whole thread with a non-Go call is expensive. A cgo call or blocking system call pins its Machine for as long as the call runs. The core does not stall: a background monitor thread in the runtime notices that the Machine has been blocked for too long, takes its Processor away, and hands it to another Machine, starting a new OS thread if none is idle. The cost shows up as thread growth. Thousands of goroutines stuck in long C calls mean thousands of OS threads, and performance degrades long before the runtime's thread limit (10,000 by default) aborts the program.
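
One common way to keep blocking calls from multiplying OS threads is to cap how many run at once. The following is a minimal sketch using a buffered channel as a semaphore. The names blockingCall and runBounded are hypothetical, and blockingCall only sleeps here as a placeholder for a real cgo or system call.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// blockingCall is a hypothetical stand-in for a cgo function or blocking
// system call. It only sleeps, so it does not actually pin a thread.
func blockingCall() { time.Sleep(10 * time.Millisecond) }

// runBounded runs call n times but lets at most limit calls run at once.
// It returns the highest concurrency it observed.
func runBounded(n, limit int, call func()) int32 {
	sem := make(chan struct{}, limit)
	var inFlight, peak int32
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot; blocks while limit calls are running
			defer func() { <-sem }() // release the slot

			cur := atomic.AddInt32(&inFlight, 1)
			for { // record a new peak if this is the highest count seen
				old := atomic.LoadInt32(&peak)
				if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
					break
				}
			}
			call()
			atomic.AddInt32(&inFlight, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	peak := runBounded(50, 4, blockingCall)
	fmt.Println("peak concurrent calls:", peak)
}
```

Goroutines waiting for a slot are parked on the channel, which costs no thread, so only the bounded number of in-flight calls can each occupy one.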

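Tight loops can hog a Processor. Before Go 1.14, a goroutine could only be preempted at function calls, so a loop with no calls could starve every other goroutine on its Processor indefinitely. Since Go 1.14, the runtime preempts asynchronously on most platforms by signaling the thread, so a goroutine that has run for roughly 10 ms can be interrupted even in a tight loop. Preemption keeps the program fair, but it does not stop work that nobody wants anymore. Write long-running loops that check for cancellation or use blocking operations.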
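
A cancellation check inside a CPU-bound loop might look like the sketch below. The function names spin and spinFor and the batch size are assumptions chosen for illustration; the pattern is a non-blocking select on ctx.Done() between batches of work.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// spin does CPU-bound work in batches and checks ctx between batches.
// It returns the number of batches completed before cancellation.
func spin(ctx context.Context) int {
	batches := 0
	for {
		select {
		case <-ctx.Done():
			return batches
		default: // not cancelled yet; fall through to the next batch
		}
		sum := 0
		for i := 0; i < 1_000_000; i++ {
			sum += i
		}
		_ = sum
		batches++
	}
}

// spinFor runs spin with a deadline d from now.
func spinFor(d time.Duration) int {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	return spin(ctx)
}

func main() {
	fmt.Println("batches before cancellation:", spinFor(50*time.Millisecond))
}
```

Checking once per batch rather than once per iteration keeps the overhead of the select negligible.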

Goroutine leaks happen when a goroutine blocks forever, for example on a channel that nobody will ever send to or close. A blocked goroutine is not in any run queue, so it costs no CPU time, but it stays in memory along with its stack and everything it references. Use context.Context to signal cancellation. Pass the context as the first parameter, conventionally named ctx. Functions that take a context should respect cancellation and deadlines.
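
A minimal sketch of the context pattern for a producer goroutine follows. The names produce and firstN are hypothetical. The producer selects on both the send and ctx.Done(), so it can exit once the consumer loses interest; the extra exited channel exists only so the example can confirm that the goroutine is gone.

```go
package main

import (
	"context"
	"fmt"
)

// produce sends increasing integers on the returned channel until ctx is
// cancelled. The second channel is closed once the goroutine has exited.
func produce(ctx context.Context) (<-chan int, <-chan struct{}) {
	out := make(chan int)
	exited := make(chan struct{})
	go func() {
		defer close(exited)
		for i := 0; ; i++ {
			select {
			case out <- i:
			case <-ctx.Done():
				return // without this case the send would block forever
			}
		}
	}()
	return out, exited
}

// firstN reads n values, cancels the producer, and waits for it to exit.
func firstN(n int) []int {
	ctx, cancel := context.WithCancel(context.Background())
	values, exited := produce(ctx)
	got := make([]int, 0, n)
	for len(got) < n {
		got = append(got, <-values)
	}
	cancel()
	<-exited // the producer goroutine is gone, not leaked
	return got
}

func main() {
	fmt.Println(firstN(3)) // [0 1 2]
}
```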

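The runtime detects some deadlocks. If every goroutine is blocked and nothing can wake any of them, the runtime aborts the program with fatal error: all goroutines are asleep - deadlock!. This is a fatal error rather than a panic, so recover cannot catch it. The check only fires when all goroutines are stuck; a partial deadlock, where some goroutines are stuck but others can still run, goes unreported. When the error does appear, it saves you from a silent hang. Fix the logic so at least one goroutine can make progress.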

Blocking C calls cost threads. Bound how many run at once.

When to touch the scheduler

Use the default scheduler configuration when building standard Go applications; it auto-tunes to your CPU count and handles concurrency efficiently.

Use runtime.GOMAXPROCS when you need to limit parallelism for testing, or to match a container's CPU quota on Go versions that do not detect it automatically; set it early in main before spawning goroutines.

Use runtime.LockOSThread when interfacing with a C library that requires thread-local state; unlock the thread immediately after the call to restore scheduler flexibility.

Use runtime.Gosched when writing tight loops that must yield control to other goroutines on the same processor; avoid it in production code where blocking or channel operations handle scheduling naturally.
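
For the runtime.LockOSThread case, a small wrapper can keep the lock scoped to the work that needs it. This is a sketch under assumptions: onLockedThread is a hypothetical helper, and the function passed to it stands in for calls into a thread-affine C library.

```go
package main

import (
	"fmt"
	"runtime"
)

// onLockedThread runs fn on a goroutine that is wired to a single OS thread
// for the duration of the call, then releases the thread.
func onLockedThread(fn func()) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		fn() // hypothetical thread-affine work, e.g. calls into a C library
	}()
	<-done
}

func main() {
	onLockedThread(func() {
		fmt.Println("running on a locked OS thread")
	})
}
```

Running the work on a dedicated goroutine means the caller's goroutine is never pinned, and the deferred unlock restores scheduler flexibility as soon as fn returns.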

Trust the scheduler. Profile the code.

Where to go next

In short: goroutines (G) are the tasks, OS threads (M) are the workers, and Processors (P) are the stations, with their work queues, that a worker must hold in order to run Go code. When a goroutine blocks, its worker simply picks up the next task. When a worker itself gets stuck in a system call, its Processor is handed to another worker so nothing stalls. This keeps a program fast and responsive even with very large numbers of goroutines.