How to Implement Bounded Parallelism in Go

You have a list of 10,000 URLs to fetch. You write a loop, spawn a goroutine for each URL, and hit run. Your CPU usage spikes to 100 percent. Memory consumption climbs until the garbage collector screams. The target server detects the flood and bans your IP address. You created a thundering herd.

The fix isn't to stop using goroutines. The fix is to bound the parallelism. You need a mechanism that says "run as many tasks as possible, but never more than 50 at once." In Go, the idiomatic solution is a buffered channel acting as a semaphore.

The semaphore pattern

A semaphore is a concurrency primitive that tracks available resources. It allows a limited number of operations to proceed while blocking the rest. Go doesn't have a built-in semaphore type in the standard library, but a buffered channel does the job perfectly.

Think of the channel buffer as a parking lot with a fixed number of spots. The buffer capacity is the limit. Sending a value into the channel is like driving into a spot. If the lot is full, the car waits at the gate. Receiving from the channel is like a car leaving. The spot opens up, and the next car in line can enter.

The value you send doesn't matter. The channel only cares about the count. The convention is to use struct{}{}, which is a zero-size struct. It consumes no memory for the value itself, so the only overhead is the channel bookkeeping. This is a community standard: semaphores use struct{} tokens.
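A quick way to check the zero-size claim is to ask the compiler. The sketch below is only for inspection; tokenSize is a made-up helper name, and unsafe.Sizeof is used purely to read the size, not to do anything unsafe.

```go
package main

import (
	"fmt"
	"unsafe"
)

// tokenSize reports the size in bytes of the empty-struct token.
// (Hypothetical helper, used only to illustrate the point above.)
func tokenSize() uintptr {
	return unsafe.Sizeof(struct{}{})
}

func main() {
	sem := make(chan struct{}, 3)
	sem <- struct{}{} // occupy one slot

	fmt.Println("token size:", tokenSize()) // 0 bytes per token
	fmt.Println("slots in use:", len(sem))  // tokens currently in the buffer
	fmt.Println("total slots:", cap(sem))   // the concurrency limit
}
```

len(sem) and cap(sem) are handy for logging how busy the semaphore is, though the value of len can be stale by the time you read it.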

Minimal example

Here's the core pattern: a buffered channel limits concurrency, and a sync.WaitGroup tracks completion.

package main

import (
	"fmt"
	"sync"
)

func main() {
	// Buffer size 3 caps concurrency at 3 goroutines
	sem := make(chan struct{}, 3)
	var wg sync.WaitGroup

	for i := 1; i <= 10; i++ {
		wg.Add(1)
		// Pass loop variable to avoid capture bugs
		go func(id int) {
			defer wg.Done()
			// Block until a slot is available in the buffer
			sem <- struct{}{}
			// Release the slot when the goroutine exits
			defer func() { <-sem }()

			fmt.Printf("Processing %d\n", id)
		}(i)
	}

	// Wait for all goroutines to finish
	wg.Wait()
}

A buffered channel is a semaphore. Treat it like one.

How it runs

The channel sem starts empty with capacity 3. The loop spawns goroutines rapidly. The first three goroutines execute sem <- struct{}{}. The buffer has space, so the sends succeed immediately. Those three goroutines proceed to the work.

The fourth goroutine reaches the send statement. The buffer is full. The goroutine blocks. It suspends execution and waits for a receive operation on sem. It consumes no CPU while waiting. The scheduler moves on to other tasks.

When one of the first three goroutines finishes, the deferred receive <-sem runs. It pulls a value out of the buffer. The buffer now has one free slot. The runtime wakes one of the blocked goroutines. That goroutine completes its send and proceeds.

This cycle repeats until all work is done. The WaitGroup ensures main waits for every goroutine to finish. Without it, main would exit, killing the program while goroutines are still running. The WaitGroup is separate from the semaphore. The semaphore controls flow; the WaitGroup controls lifecycle.
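If you want to see the cap hold, you can count how many goroutines are inside the guarded section at once. The following is a sketch, not part of the original example: peakConcurrency is a hypothetical helper, and the 5ms sleep stands in for real work.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// peakConcurrency runs `tasks` goroutines behind a semaphore of size `limit`
// and returns the highest number that were inside the guarded section at once.
func peakConcurrency(tasks, limit int) int64 {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var active, peak int64

	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it on exit

			n := atomic.AddInt64(&active, 1)
			// Record a new high-water mark if this is one.
			for {
				old := atomic.LoadInt64(&peak)
				if n <= old || atomic.CompareAndSwapInt64(&peak, old, n) {
					break
				}
			}
			time.Sleep(5 * time.Millisecond) // stand-in for real work
			atomic.AddInt64(&active, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	// The reported peak should never exceed the limit of 3.
	fmt.Println("peak:", peakConcurrency(20, 3))
}
```

All 20 goroutines exist at the same time. Only the section between acquire and release is bounded, which is exactly what the semaphore promises.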

Tuning the limit

Choosing the buffer size depends on the workload. For CPU-bound tasks, the limit should match the number of cores. Use runtime.NumCPU() to get the count, or runtime.GOMAXPROCS(0) to get the number of threads the runtime will actually use to run Go code at once. Running more goroutines than cores for CPU work adds scheduling overhead without speeding things up. The scheduler switches between goroutines, burning cycles on management instead of computation.

For I/O-bound tasks, the limit can be much higher. Goroutines block while waiting for network or disk. The Go runtime parks blocked goroutines cheaply, so thousands of them are fine. The limit protects the downstream service from overload and keeps memory usage predictable. Start with a small number like 10 or 50 and measure latency and throughput. Increase the limit until latency degrades or memory usage becomes unacceptable.
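A rough way to start measuring is to time the same batch under several limits. This is a sketch under stated assumptions: runWithLimit and fakeIO are hypothetical names, and a 10ms sleep stands in for a network call. Real measurements should use the real workload.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// fakeIO simulates an I/O-bound task by waiting 10ms.
func fakeIO() { time.Sleep(10 * time.Millisecond) }

// runWithLimit runs `tasks` copies of work with at most `limit` in flight
// and returns the total wall-clock time.
func runWithLimit(tasks, limit int, work func()) time.Duration {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()
			work()
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	fmt.Println("cores available:", runtime.NumCPU())
	// Try a few limits and compare; timings vary from machine to machine.
	for _, limit := range []int{1, 5, 20} {
		fmt.Printf("limit=%d took %v\n", limit, runWithLimit(40, limit, fakeIO))
	}
}
```

With simulated I/O the time drops as the limit rises. With CPU-bound work you would expect the gains to flatten out around the core count.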

Measure before you guess. The right limit depends on the workload, not the language.

Realistic usage

Real code involves I/O, errors, and context. Here's how the pattern looks in an HTTP fetching scenario.

package main

import (
	"context"
	"fmt"
	"net/http"
	"sync"
	"time"
)

// FetchURL retrieves content with a timeout
func FetchURL(ctx context.Context, url string) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	// Context carries the timeout to the HTTP client
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
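	// Close the body so the connection's resources are released
	defer resp.Body.Close()
	if resp.StatusCode >= http.StatusBadRequest {
		return fmt.Errorf("fetch %s: unexpected status %s", url, resp.Status)
	}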
	return nil
}

func main() {
	// Context enforces a global deadline for all work
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	urls := []string{"https://example.com/1", "https://example.com/2"}
	// Semaphore limits concurrent connections to 5
	sem := make(chan struct{}, 5)
	var mu sync.Mutex
	var errs []error

	var wg sync.WaitGroup
	for _, url := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			// Acquire slot; blocks if 5 goroutines are active
			sem <- struct{}{}
			// Release slot on exit
			defer func() { <-sem }()

			// Only take the lock when there is an error to record
			if err := FetchURL(ctx, u); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(url)
	}

	wg.Wait()
	if len(errs) > 0 {
		fmt.Println("Errors:", errs)
	}
}

Context flows down. Errors flow up. Slots flow through.

Context and cancellation

Context is plumbing. Run it through every long-lived call site. When the context cancels, goroutines should stop their work. The deferred receive on the semaphore ensures the slot releases even if the goroutine exits early. This prevents the semaphore from leaking slots when cancellation happens.

Functions that accept context should check ctx.Err() periodically if the work involves loops or long computations. If the error is context.Canceled or context.DeadlineExceeded, return immediately. The convention is to pass context.Context as the first parameter, named ctx. This makes the dependency visible and allows callers to control cancellation.
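The periodic check can be as small as one if statement at the top of the loop. The sketch below uses hypothetical names (sumUntilCancelled, cancelledSum) and a trivial loop body to show the shape.

```go
package main

import (
	"context"
	"fmt"
)

// sumUntilCancelled adds up the numbers but checks the context between items.
// It returns the partial sum and ctx.Err() if the context ended first.
func sumUntilCancelled(ctx context.Context, nums []int) (int, error) {
	total := 0
	for _, n := range nums {
		if err := ctx.Err(); err != nil {
			return total, err // context.Canceled or context.DeadlineExceeded
		}
		total += n
	}
	return total, nil
}

// cancelledSum cancels the context up front to show the early return.
func cancelledSum() (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return sumUntilCancelled(ctx, []int{1, 2, 3})
}

func main() {
	total, err := cancelledSum()
	fmt.Println(total, err) // prints "0 context canceled"
}
```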

If a goroutine holds a slot and the context cancels, the goroutine must exit. The deferred receive runs, the slot frees up, and other goroutines can continue. Without this discipline, cancellation stalls the pipeline.
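The same discipline applies to goroutines that are still waiting for a slot. A plain send on the semaphore ignores the context, so queued goroutines keep waiting after cancellation. One way to let them give up is to acquire inside a select. This is a sketch; acquire, release, and acquireTwice are hypothetical helpers, not part of the examples above.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// acquire takes a slot from sem, or gives up if ctx ends first.
func acquire(ctx context.Context, sem chan struct{}) error {
	select {
	case sem <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// release returns a slot to sem.
func release(sem chan struct{}) { <-sem }

// acquireTwice tries to take two slots from a one-slot semaphore.
// The second attempt can only end when the deadline passes.
func acquireTwice() (first, second error) {
	sem := make(chan struct{}, 1)
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()

	first = acquire(ctx, sem)  // slot is free, succeeds
	second = acquire(ctx, sem) // slot is held, waits for the deadline
	release(sem)               // give back the slot taken by the first call
	return first, second
}

func main() {
	first, second := acquireTwice()
	fmt.Println("first:", first)
	fmt.Println("second:", second)
}
```

Call release only after acquire returned nil. A goroutine that gave up never held a slot, so it has nothing to give back.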

Mutex vs semaphore

A mutex is a binary semaphore. It allows exactly one goroutine to proceed. A buffered channel with capacity 1 behaves like a mutex. A buffered channel with capacity N allows N goroutines. The semaphore pattern generalizes the mutex.

Use a mutex when you need mutual exclusion for a shared variable. Use a semaphore when you need to limit concurrency for independent tasks. The mutex protects data integrity. The semaphore protects resource capacity. Don't confuse the two. A mutex serializes access. A semaphore bounds parallelism.
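To make the capacity-1 case concrete, here is a sketch that uses a one-slot channel as a lock around a shared counter. countWithChannelLock is a hypothetical name. In real code sync.Mutex is the clearer choice for this job; the example only shows that the two behave alike.

```go
package main

import (
	"fmt"
	"sync"
)

// countWithChannelLock increments a shared counter from n goroutines,
// using a capacity-1 channel as the lock.
func countWithChannelLock(n int) int {
	lock := make(chan struct{}, 1)
	var wg sync.WaitGroup
	counter := 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			lock <- struct{}{} // lock: only one token fits
			counter++
			<-lock // unlock
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(countWithChannelLock(1000)) // prints 1000
}
```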

A mutex protects data. A semaphore protects resources.

Pitfalls and errors

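Forgetting to release the slot is the most common bug. If you omit the deferred receive, the goroutine finishes but its token stays in the buffer. Eventually every slot is taken and new goroutines block forever. The program hangs. If every goroutine ends up blocked, the runtime aborts with fatal error: all goroutines are asleep - deadlock!. That is a fatal error, not a panic, so recover cannot catch it. If any other goroutine is still running, such as a server loop, the runtime doesn't detect the problem and the stuck goroutines wait forever.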

Always pair the send with a deferred receive. The defer guarantees the slot releases even if the goroutine panics. This is a safety net. If the work panics, the slot still frees up.
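The safety net can be checked directly. In this sketch, runGuarded and slotsHeldAfterPanic are hypothetical helpers. The recover is there only so the demo keeps running; without it the deferred release would still run, but the unrecovered panic would then crash the program.

```go
package main

import "fmt"

// runGuarded acquires a slot, runs work, and releases the slot even if
// work panics. It reports whether work panicked.
func runGuarded(sem chan struct{}, work func()) (panicked bool) {
	sem <- struct{}{}
	defer func() { <-sem }() // runs during panic unwinding too
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	work()
	return false
}

// slotsHeldAfterPanic returns how many slots are still taken after a
// guarded task panics.
func slotsHeldAfterPanic() int {
	sem := make(chan struct{}, 2)
	runGuarded(sem, func() { panic("boom") })
	return len(sem)
}

func main() {
	fmt.Println("slots held after panic:", slotsHeldAfterPanic()) // prints 0
}
```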

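Loop variable capture is another trap. If you write go func() { ... }() and use i inside the closure without passing it as an argument, Go versions before 1.22 compile the code, but all iterations share one variable, so goroutines often see a later value of i, frequently the final one. go vet flags this with loop variable i captured by func literal. From Go 1.22 on, modules that declare go 1.22 or later in go.mod get a fresh variable per iteration, and the capture is safe. Passing the variable explicitly works on every version: go func(item Item) { ... }(item).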
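Passing the loop variable as an argument gives each goroutine its own copy regardless of the Go version. A minimal sketch, with collectIDs as a hypothetical helper:

```go
package main

import (
	"fmt"
	"sync"
)

// collectIDs starts n goroutines and has each one record the id it was given.
func collectIDs(n int) []int {
	ids := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		// id is a parameter, so each goroutine gets its own copy of i.
		go func(id int) {
			defer wg.Done()
			ids[id] = id // each goroutine writes only its own element
		}(i)
	}
	wg.Wait()
	return ids
}

func main() {
	fmt.Println(collectIDs(5)) // prints [0 1 2 3 4]
}
```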

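Using an unbuffered channel breaks the pattern. make(chan struct{}) has capacity 0, so a send blocks until another goroutine is ready to receive. In this pattern the only receives are the deferred releases, and those run only after a send has succeeded. No goroutine ever acquires a slot, and the program deadlocks. That isn't mutual exclusion; capacity 1 gives you that. Set the buffer size to the concurrency limit.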
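To see what capacity 0 does to the acquire step without hanging the program, you can try a non-blocking send. This is a sketch; tryAcquire is a hypothetical helper built on select with a default case.

```go
package main

import "fmt"

// tryAcquire attempts to take a slot without blocking.
// It returns false if no slot is available right now.
func tryAcquire(sem chan struct{}) bool {
	select {
	case sem <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan struct{})
	// No receiver is waiting, so the send cannot proceed.
	fmt.Println("capacity 0:", tryAcquire(unbuffered)) // false

	buffered := make(chan struct{}, 2)
	fmt.Println("capacity 2, first:", tryAcquire(buffered))  // true
	fmt.Println("capacity 2, second:", tryAcquire(buffered)) // true
	fmt.Println("capacity 2, third:", tryAcquire(buffered))  // false
}
```

The same helper is useful on its own when a task should be skipped rather than queued if the system is already at its limit.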

The compiler rejects unused imports with "sync" imported and not used. If you import sync but never reference anything from it, for example after deleting the WaitGroup, the build fails. Remove unused imports. Trust gofmt to format the code consistently. Don't argue about indentation; let the tool decide.

A goroutine that forgets to release its slot is a silent deadlock waiting to happen.

Decision matrix

Use a buffered channel semaphore when you have a burst of independent tasks and need to cap concurrency to protect resources.

Use a worker pool with a job channel when tasks arrive continuously over time and you want a fixed set of long-lived goroutines.

Use sync.WaitGroup alone when you need to wait for completion but don't need to limit how many run at once.

Use errgroup.Group from golang.org/x/sync, with its SetLimit method, when you want bounded parallelism plus automatic error propagation and context cancellation.

Use sequential code when the overhead of concurrency outweighs the benefit or the tasks are CPU-bound on a single core.
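For comparison with the semaphore, here is what the worker pool option might look like. It is a sketch, not a prescribed design: squareAll is a hypothetical function, and squaring a number stands in for the real job.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll fans jobs out to a fixed number of workers and collects results.
func squareAll(nums []int, workers int) []int {
	type result struct{ idx, val int }
	jobs := make(chan int) // carries indexes into nums
	results := make(chan result)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker lives until the jobs channel is closed.
			for i := range jobs {
				results <- result{i, nums[i] * nums[i]}
			}
		}()
	}

	// Feed jobs, then close so the workers' range loops end.
	go func() {
		for i := range nums {
			jobs <- i
		}
		close(jobs)
	}()

	// Close results once every worker has returned.
	go func() {
		wg.Wait()
		close(results)
	}()

	out := make([]int, len(nums))
	for r := range results {
		out[r.idx] = r.val
	}
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4}, 2)) // prints [1 4 9 16]
}
```

The concurrency limit here is the number of workers. Unlike the semaphore version, only that many goroutines do the work, no matter how many jobs arrive.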

Concurrency is a tool, not a goal. Bound it or break it.

Where to go next

Bounded parallelism in Go acts like a ticket system for your code. You create a pool of tickets (the buffer size), and a worker must grab a ticket before starting work. When it finishes, it returns the ticket. This ensures that no more than a fixed number of tasks run at once, which keeps your system and the services it calls from being overwhelmed.