How Many Goroutines Can You Run in Go

The sudden spike

You deploy a Go service to process incoming webhooks. At first, ten requests a second is fine. You wrap the handler in go handleRequest(req) and everything feels fast. Then a partner sends a burst of fifty thousand messages in a minute. Your CPU spikes, memory climbs, and the process crashes. You assume Go has a hard limit on concurrency. It does not. The limit is your machine's RAM, and understanding why changes how you write concurrent code.

Goroutines are not OS threads. They are managed by the Go runtime, which multiplexes them onto a small pool of actual threads. You can run hundreds of thousands of goroutines on a single laptop. The scheduler decides which one runs next. Memory decides when you stop.

How the scheduler actually works

Operating system threads are heavy. Each one reserves megabytes of stack space upfront, and switching between them goes through the kernel, which costs CPU time. Goroutines are different. They start with a tiny stack, usually two kilobytes, and grow only when they need to. The Go runtime manages them using a work-stealing scheduler.

Think of a busy kitchen. OS threads are the actual chefs. You can only hire so many before the kitchen gets crowded and they bump into each other. Goroutines are the tickets on the rail. You can print thousands of tickets. The head chef (the scheduler) hands out tickets to the available chefs. When a chef finishes a ticket, they grab another. If a chef is waiting for the oven (I/O), the head chef takes their remaining tickets and gives them to a free chef. The system stays fluid. The only thing that stops you from printing more tickets is the size of the ticket rail, which maps to available memory.

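The scheduler uses an M:N model: many goroutines are multiplexed onto a much smaller set of OS threads. GOMAXPROCS sets how many of those threads may execute Go code at the same time. It defaults to the number of logical CPU cores your machine reports. Threads blocked in system calls do not count against that number, so the total thread count can be higher. The scheduler maintains a global run queue and a local run queue for each of the GOMAXPROCS logical processors. When a local queue empties, its processor steals half the tasks from another processor's queue. This keeps all cores busy without constant locking.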
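The gap between goroutine count and thread count can be observed directly. The sketch below is illustrative only: countThreads is a hypothetical helper, and it assumes the "threadcreate" profile from runtime/pprof is an acceptable approximation of how many OS threads the runtime has created. Exact numbers vary by machine and Go version.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
)

// countThreads (hypothetical helper) parks n goroutines on a channel and
// reports how many goroutines are alive and how many OS threads the runtime
// has created so far.
func countThreads(n int) (goroutines, threads int) {
	release := make(chan struct{})
	var started, finished sync.WaitGroup
	started.Add(n)
	finished.Add(n)

	for i := 0; i < n; i++ {
		go func() {
			defer finished.Done()
			started.Done()
			<-release // stay parked until the measurement is taken
		}()
	}

	// Wait until every goroutine has started before measuring.
	started.Wait()
	goroutines = runtime.NumGoroutine()
	// The "threadcreate" profile counts the OS threads created by the runtime.
	threads = pprof.Lookup("threadcreate").Count()

	close(release)
	finished.Wait()
	return goroutines, threads
}

func main() {
	g, th := countThreads(10000)
	fmt.Printf("GOMAXPROCS=%d goroutines=%d threads=%d\n",
		runtime.GOMAXPROCS(0), g, th)
}
```

On a typical machine the thread count should stay close to GOMAXPROCS while the goroutine count sits above ten thousand.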

Goroutines are cheap. The scheduler is not magic.

Minimal example

You can spawn thousands of goroutines in a few lines. The code below shows the baseline pattern.

package main

import (
	"fmt"
	"sync"
)

// SpawnMany demonstrates creating a large number of lightweight goroutines.
func SpawnMany(count int) {
	var wg sync.WaitGroup
	// Add the count to the WaitGroup before launching goroutines.
	wg.Add(count)

	for i := 0; i < count; i++ {
		// Launch a new goroutine for each iteration.
		go func(id int) {
			// Defer ensures cleanup runs when the goroutine finishes.
			defer wg.Done()
			fmt.Printf("Goroutine %d running\n", id)
		}(i)
	}

	// Block until all goroutines call Done().
	wg.Wait()
	fmt.Println("All finished")
}

func main() {
	// Start with ten thousand concurrent tasks.
	SpawnMany(10000)
}

Run this and watch the output. The goroutines do not execute in order. They do not even start in order. The scheduler distributes them across the available OS threads. Each goroutine allocates a small stack. If a goroutine calls a function that needs more stack space, the runtime quietly expands it. When the goroutine exits, its stack is released for reuse, and the garbage collector can shrink a stack that has become mostly unused. This dynamic sizing is why you can run hundreds of thousands of goroutines on a laptop with sixteen gigabytes of RAM. The memory footprint stays low because an idle goroutine costs little more than its small stack.

The runtime tracks concurrency and parallelism separately. Concurrency is how many tasks exist. Parallelism is how many run at the exact same nanosecond. The GOMAXPROCS setting controls parallelism. You can check it at runtime:

package main

import (
	"fmt"
	"runtime"
)

// CheckParallelism prints the current parallelism limit.
func CheckParallelism() {
	// Passing 0 returns the current value without changing it.
	cores := runtime.GOMAXPROCS(0)
	fmt.Printf("Running on %d logical cores\n", cores)
}

func main() {
	CheckParallelism()
}

Changing GOMAXPROCS does not create more goroutines. It only tells the scheduler how many OS threads may run Go code simultaneously. If you set it to one, your program loses parallelism but stays concurrent. The scheduler still switches between thousands of goroutines, so their execution interleaves, but only one runs on a CPU at any moment.
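A small experiment makes the distinction concrete. This is a sketch, not a benchmark: runOnOneProcessor is a hypothetical helper that temporarily sets GOMAXPROCS to one, launches n goroutines that each yield once, and counts how many complete.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runOnOneProcessor (hypothetical helper) limits the runtime to a single
// processor, runs n goroutines, and returns how many of them completed.
func runOnOneProcessor(n int) int64 {
	// GOMAXPROCS returns the previous setting, so it can be restored later.
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var completed int64
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			// Yield so that other goroutines get a turn on the one processor.
			runtime.Gosched()
			atomic.AddInt64(&completed, 1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&completed)
}

func main() {
	fmt.Printf("completed %d goroutines on one processor\n", runOnOneProcessor(1000))
}
```

Every goroutine finishes even though none of them ever ran in parallel with another.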

Trust the default. Only change GOMAXPROCS when you have a measured reason.

What happens under the hood

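When you use the go statement, the runtime allocates a g struct or reuses a free one. It holds the goroutine's scheduling state, including the saved program counter and stack pointer, along with the bounds of its stack. The initial stack is about two kilobytes. If the goroutine calls a function that needs more space, the runtime allocates a new stack twice the size and copies the old one into it, repeating as needed. The default hard limit is one gigabyte per goroutine on 64-bit systems (250 megabytes on 32-bit), and exceeding it aborts the program with a stack overflow error. With many goroutines alive, you will run out of system memory long before any of them hits that limit.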
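You can estimate the per-goroutine stack cost on your own machine. The sketch below assumes that the growth in runtime.MemStats.StackInuse while goroutines are parked is a fair proxy for their stack memory. stackBytesPerGoroutine is a hypothetical helper, and the result is approximate because stacks are allocated in larger spans.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// stackBytesPerGoroutine (hypothetical helper) parks n goroutines and returns
// the approximate growth in stack memory per goroutine, measured through
// runtime.MemStats.StackInuse.
func stackBytesPerGoroutine(n int) uint64 {
	release := make(chan struct{})
	var started, finished sync.WaitGroup
	started.Add(n)
	finished.Add(n)

	var before, after runtime.MemStats
	runtime.GC() // settle memory before taking the baseline
	runtime.ReadMemStats(&before)

	for i := 0; i < n; i++ {
		go func() {
			defer finished.Done()
			started.Done()
			<-release // stay parked while memory is measured
		}()
	}
	started.Wait()
	runtime.ReadMemStats(&after)

	close(release)
	finished.Wait()

	// Guard against unsigned underflow if the counter did not grow.
	if after.StackInuse <= before.StackInuse {
		return 0
	}
	return (after.StackInuse - before.StackInuse) / uint64(n)
}

func main() {
	per := stackBytesPerGoroutine(100000)
	fmt.Printf("approx. %d bytes of stack per parked goroutine\n", per)
}
```

Multiply the reported figure by your expected peak goroutine count to get a rough lower bound on memory, before counting anything the goroutines allocate on the heap.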

The runtime also runs a network poller. When a goroutine calls net.Dial or http.Get and the socket is not ready, the runtime parks the goroutine and registers the file descriptor with the operating system's event notification facility, such as epoll on Linux or kqueue on macOS and the BSDs. The OS notifies the poller when the socket becomes ready. The scheduler unparks the goroutine and puts it back on a run queue. No OS thread is blocked waiting for the network. This is why Go handles massive I/O concurrency so well.

The community follows a few quiet conventions that keep this machinery healthy. gofmt is mandatory. Let the tool decide indentation and spacing. if err != nil { return err } is verbose by design. The community accepts the boilerplate because it makes the unhappy path visible. Receiver names are usually one or two letters matching the type, like (b *Buffer) Write(...), not (this *Buffer). These small habits reduce cognitive load when you are debugging scheduler behavior.

Stack growth is automatic. Memory is the only ceiling.

Realistic monitoring and cancellation

Real systems need to monitor goroutine counts to avoid memory exhaustion. A long-running service might spawn a goroutine per connection, per database query, or per background job. Without bounds, a traffic spike or a slow downstream service will cause goroutines to pile up. Each one holds a stack, captures variables, and waits on a channel or network call. Memory usage climbs until the operating system kills the process.

You can inspect the live count using runtime.NumGoroutine(). It returns a snapshot, not a guarantee, because goroutines are constantly starting and dying. Still, it gives you a baseline for alerts.

package main

import (
	"fmt"
	"runtime"
	"time"
)

// MonitorGoroutines prints the active goroutine count periodically.
func MonitorGoroutines(interval time.Duration) {
	ticker := time.NewTicker(interval)
	// Always stop the ticker when the function returns to prevent leaks.
	defer ticker.Stop()

	for range ticker.C {
		// NumGoroutine returns a point-in-time count.
		count := runtime.NumGoroutine()
		fmt.Printf("Active goroutines: %d\n", count)
	}
}

func main() {
	// Start a background monitor that runs forever.
	go MonitorGoroutines(5 * time.Second)
	// Keep the main goroutine alive for demonstration.
	time.Sleep(30 * time.Second)
}

Pair this with a cancellation mechanism. The Go community standard is context.Context. Pass it as the first parameter to any function that might block. When the context cancels, the goroutine stops waiting and exits. This prevents the pile-up.

package main

import (
	"context"
	"fmt"
	"time"
)

// FetchWithTimeout simulates a network call that respects cancellation.
func FetchWithTimeout(ctx context.Context, url string) error {
	// Create a channel to receive the simulated result.
	done := make(chan error, 1)

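	go func() {
		// Simulate a slow downstream service, but stop early if the caller
		// cancels so this goroutine does not outlive the request.
		select {
		case <-time.After(2 * time.Second):
			done <- nil
		case <-ctx.Done():
		}
	}()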

	select {
	case <-ctx.Done():
		// The caller cancelled. Return immediately.
		return ctx.Err()
	case err := <-done:
		// The work finished. Return the result.
		return err
	}
}

func main() {
	// Set a strict deadline for the operation.
	ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
	defer cancel()

	err := FetchWithTimeout(ctx, "https://example.com/data")
	fmt.Printf("Result: %v\n", err)
}

Context is plumbing. Run it through every long-lived call site.

Pitfalls and runtime panics

The runtime will not stop you from spawning too many goroutines. You find out only when memory runs out. If an allocation fails, Go aborts with fatal error: runtime: out of memory. On Linux, the kernel's OOM killer may instead terminate the process with SIGKILL, in which case Go prints nothing at all. You might also hit a deadlock if every goroutine is waiting on a channel that nothing will ever send to. The runtime detects this and aborts with fatal error: all goroutines are asleep - deadlock!. This is a fatal error, not a panic, so recover cannot catch it. Neither error is a scheduler limit. Both are design limits.

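Another common trap is assuming that blocking calls are free. If a goroutine calls a non-Go function that blocks an OS thread, like a C library call or a blocking syscall, that thread is stuck until the call returns. The runtime notices and hands the thread's logical processor to another thread, starting a new one if necessary, so other goroutines keep running. The cost shows up as threads. Thousands of goroutines blocked in C code or file I/O mean thousands of OS threads, each with a full-size stack. By default the runtime aborts the program once it exceeds 10,000 threads, a limit you can adjust with debug.SetMaxThreads. The compiler cannot catch this. You have to know which packages use cgo or blocking syscalls.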

Goroutine leaks are the quiet killer. A goroutine waits on a channel. The sender returns early or forgets to close it. The receiver sits forever. The stack stays allocated. The count climbs. The service degrades over hours. You can spot leaks using pprof. If your service imports net/http/pprof and serves HTTP on port 6060, run go tool pprof http://localhost:6060/debug/pprof/goroutine to see the call stacks of every live goroutine. If you see thousands of identical stacks waiting on the same channel, you have a leak. Close the channel when the sender is done. Add a context timeout when the receiver might wait too long.

The worst goroutine bug is the one that never logs.

When to scale up and when to step back

You need to match your concurrency strategy to the workload.

Use a raw goroutine per task when the work is short-lived and the total count stays predictable. Use a worker pool with a buffered channel when you need to cap concurrency to protect a downstream database or API. Use runtime.GOMAXPROCS(1) when you are running a single-threaded benchmark or want to rule out parallelism while reproducing a bug. It will not find data races for you, so use the race detector (go run -race) for that. Use sequential code when the tasks depend on each other or the overhead of scheduling outweighs the benefit. Use context.Context with a timeout when a goroutine might block indefinitely on I/O. Use sync.WaitGroup when you need to wait for a dynamic set of goroutines to finish. Use runtime.NumGoroutine() when you need to alert on memory pressure before the OOM killer strikes.
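The worker pool option is the one that would have absorbed the webhook burst from the opening scenario, so it is worth sketching. The code below is one possible shape under stated assumptions: processAll is a hypothetical function, jobs are plain integers, and the in-flight counters exist only to demonstrate that the cap holds.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// processAll (hypothetical helper) runs handle on every job using at most
// `workers` goroutines. It returns the number of jobs processed and the
// highest number of jobs that were in flight at the same time.
func processAll(jobs []int, workers int, handle func(int)) (processed, peak int64) {
	// A small buffer lets the producer stay slightly ahead of the workers
	// while still blocking when they fall behind.
	queue := make(chan int, workers)
	var inFlight int64
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range queue {
				cur := atomic.AddInt64(&inFlight, 1)
				// Record the high-water mark of concurrent jobs.
				for {
					old := atomic.LoadInt64(&peak)
					if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
						break
					}
				}
				handle(job)
				atomic.AddInt64(&inFlight, -1)
				atomic.AddInt64(&processed, 1)
			}
		}()
	}

	// Feed the queue, then close it so the workers' range loops end.
	for _, j := range jobs {
		queue <- j
	}
	close(queue)
	wg.Wait()
	return atomic.LoadInt64(&processed), atomic.LoadInt64(&peak)
}

func main() {
	jobs := make([]int, 50000)
	done, peak := processAll(jobs, 8, func(int) {})
	fmt.Printf("processed=%d peak concurrency=%d\n", done, peak)
}
```

The goroutine count is fixed by the workers argument no matter how large the burst is. The trade-off is that the producer blocks when the workers fall behind, which is usually the backpressure you want.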

Concurrency is a tool, not a default. Pick the shape that matches the data.

Where to go next

Go lets you run thousands or even millions of tasks concurrently without needing a separate operating system thread for each one. Think of it like a handful of chefs working through a long rail of tickets, rather than hiring a separate chef for every single order. The number of CPU cores sets how many goroutines run in parallel, but memory sets how many can exist. Go handles the scheduling automatically. Bounding the work, passing a context through blocking calls, and watching the goroutine count are still your job.