How to limit concurrent goroutines

You have a list of 10,000 URLs to scrape. You write a loop, spawn a goroutine for each URL, and run the program. The CPU spikes to 100%. Memory usage climbs as 10,000 HTTP connections open simultaneously. The target server detects the flood and bans your IP. Your program crashes with an out-of-memory error before finishing 1%.

Concurrency without limits is a resource leak waiting to happen. You need a governor. You need to cap the number of active goroutines so the system stays stable, the downstream service stays happy, and your memory usage remains predictable.

Go provides a clean idiom for this: a buffered channel acting as a semaphore. The channel capacity represents the maximum number of concurrent workers. Sending to the channel acquires a slot. Receiving from the channel releases a slot. When the buffer is full, sends block until a worker finishes and frees a slot.

The semaphore pattern

A semaphore is a synchronization primitive that controls access to a shared resource by maintaining a count of available permits. In Go, you don't need a special library. A buffered channel with a fixed capacity is a semaphore.

The capacity is the limit. The values inside the channel are tokens. You send a token to enter the critical section. You receive a token to leave. The channel ensures you never have more tokens in circulation than the capacity allows.

Here's the minimal implementation. The code spawns goroutines to process tasks, but never runs more than three at once.

package main

import (
	"fmt"
	"sync"
)

func main() {
	// buffered channel acts as a semaphore; capacity limits concurrency
	sem := make(chan struct{}, 3)
	var wg sync.WaitGroup

	tasks := []string{"task-1", "task-2", "task-3", "task-4", "task-5"}

	for _, task := range tasks {
		// block until a slot opens; consumes one buffer slot
		sem <- struct{}{}
		wg.Add(1)

		go func(t string) {
			// release the slot when done; the receive takes one token out of the buffer
			defer func() { <-sem }()
			defer wg.Done()

			fmt.Printf("Processing %s\n", t)
		}(task)
	}

	// wait for all goroutines to finish before exiting
	wg.Wait()
}

The channel sem has capacity 3. The loop iterates over five tasks. The first three iterations send to sem immediately because the buffer has space. Three goroutines start. On the fourth iteration, sem <- struct{}{} blocks. The main goroutine waits. No new goroutine starts until one of the first three finishes and executes <-sem in the defer.

This pattern guarantees that at most three goroutines run fmt.Printf concurrently. The main goroutine queues tasks as fast as slots become available. It never spawns more goroutines than the limit allows.
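One way to check the bound rather than take it on faith is to count active goroutines with atomic counters. The sketch below is an illustration, not part of the original program: runLimited, the task count, and the 5 ms sleep standing in for real work are all assumed names and values.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited (hypothetical helper) starts one goroutine per task, lets at most
// limit of them run at once, and returns the highest number it saw running
// at the same time.
func runLimited(tasks, limit int) int64 {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var active, peak int64

	for i := 0; i < tasks; i++ {
		sem <- struct{}{} // acquire a slot; blocks while limit goroutines hold one
		wg.Add(1)

		go func() {
			defer func() { <-sem }() // release the slot
			defer wg.Done()

			n := atomic.AddInt64(&active, 1)
			// record a new peak if this goroutine pushed the count higher
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			time.Sleep(5 * time.Millisecond) // simulated work
			atomic.AddInt64(&active, -1)
		}()
	}

	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println("peak concurrency:", runLimited(20, 3))
}
```

The reported peak never exceeds the channel capacity, because a goroutine increments the counter only after main has acquired a slot for it and decrements it before the slot is released.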

What happens under the hood

Understanding the mechanics helps you debug issues and choose the right capacity.

When you call make(chan struct{}, 3), the runtime allocates a ring buffer with space for three items. The buffer starts empty.

A send operation checks the buffer. If space is available, the value is copied into the buffer and the send returns immediately. If the buffer is full, the sending goroutine parks. It yields to the scheduler and waits for a receiver to free space.

A receive operation checks the buffer. If an item is available, it copies the value out and returns. If the buffer is empty, the receiving goroutine parks and waits for a sender.

In the semaphore pattern, the "workers" are the receivers. They receive from the channel in their defer. When a worker finishes, it unparks a blocked sender. The sender then proceeds to spawn a new worker.
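The send and receive behavior described above can be observed without blocking by using select with a default case. This is a small sketch for inspection only; tryAcquire is a hypothetical helper, and the semaphore loop itself uses a plain blocking send.

```go
package main

import "fmt"

// tryAcquire (hypothetical helper) attempts a non-blocking send and reports
// whether it obtained a slot.
func tryAcquire(sem chan struct{}) bool {
	select {
	case sem <- struct{}{}:
		return true
	default:
		return false // buffer full: a plain send would park here
	}
}

func main() {
	sem := make(chan struct{}, 3)

	// the first three sends fit in the buffer; the fourth finds it full
	for i := 1; i <= 4; i++ {
		ok := tryAcquire(sem)
		fmt.Printf("send %d: acquired=%v len=%d cap=%d\n", i, ok, len(sem), cap(sem))
	}

	<-sem // a receive frees one slot
	fmt.Println("after receive: acquired =", tryAcquire(sem))
}
```

The built-in len and cap functions report how many tokens are currently in the buffer and how many it can hold, which is handy when debugging a stuck program.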

The type struct{} is the empty struct, and struct{}{} is its only value. It has zero size, so the tokens take up no space in the channel's buffer. This is the Go convention for signaling channels: when you only care about synchronization and not the value, use chan struct{}. A bool or int token would occupy bytes in the buffer and suggest that the value carries meaning when it does not.
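The size difference is easy to confirm with unsafe.Sizeof. The helper names below are made up for the illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// tokenSize reports the size in bytes of an empty-struct token.
func tokenSize() uintptr {
	return unsafe.Sizeof(struct{}{})
}

// boolSize reports the size in bytes of a bool token, for comparison.
func boolSize() uintptr {
	var b bool
	return unsafe.Sizeof(b)
}

func main() {
	fmt.Println("struct{} token bytes:", tokenSize())
	fmt.Println("bool token bytes:", boolSize())
}
```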

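The defer statement is critical. It ensures the token is released on every exit path: a normal return, an early return on error, or a panic. If you release the token manually after the work function, an early return or a panic skips the release. An unrecovered panic crashes the whole program, but if the panic is recovered further up the goroutine's stack, the program keeps running and the token is lost. The effective capacity shrinks by one. Eventually, all tokens are lost, and the system deadlocks because no new goroutines can acquire a slot. Always defer the release.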
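The difference between a manual and a deferred release shows up as soon as a panic is recovered. The sketch below runs the work synchronously to keep the result deterministic; runManual, runDeferred, and the recover wrapper are assumptions standing in for whatever recovery your framework or worker code performs.

```go
package main

import "fmt"

// runManual releases the slot by hand after the work.
// A panic inside work jumps past the release.
func runManual(sem chan struct{}, work func()) {
	defer func() { _ = recover() }() // hypothetical recovery layer
	sem <- struct{}{}
	work()
	<-sem // never reached if work panics
}

// runDeferred releases the slot in a defer, which also runs during a panic.
func runDeferred(sem chan struct{}, work func()) {
	defer func() { _ = recover() }() // hypothetical recovery layer
	sem <- struct{}{}
	defer func() { <-sem }()
	work()
}

func main() {
	boom := func() { panic("boom") }

	manual := make(chan struct{}, 3)
	runManual(manual, boom)
	fmt.Println("manual release, tokens still held:", len(manual)) // 1

	deferred := make(chan struct{}, 3)
	runDeferred(deferred, boom)
	fmt.Println("deferred release, tokens still held:", len(deferred)) // 0
}
```

After three such panics the manual version would have no free slots left, and the next acquire would block forever.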

Real-world usage

Real code involves I/O, errors, and context. A scraper or API fan-out needs to handle timeouts, propagate errors, and respect cancellation.

Here's a realistic example. The code fetches data from multiple endpoints with a concurrency limit. It uses context.Context for cancellation and sync.WaitGroup for completion tracking.

package main

import (
	"context"
	"fmt"
	"net/http"
	"sync"
)

// Fetcher holds configuration for a single request
type Fetcher struct {
	// URL is the endpoint to fetch
	URL string
}

// Fetch performs the HTTP request; returns error on failure
func (f *Fetcher) Fetch(ctx context.Context) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, f.URL, nil)
	if err != nil {
		return fmt.Errorf("create request: %w", err)
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return fmt.Errorf("do request: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %d", resp.StatusCode)
	}

	return nil
}

func main() {
	// root context, never cancelled; wrap it with context.WithTimeout or WithCancel to bound the run
	ctx := context.Background()

	tasks := []Fetcher{
		{URL: "https://api.example.com/users"},
		{URL: "https://api.example.com/posts"},
		{URL: "https://api.example.com/comments"},
		{URL: "https://api.example.com/tags"},
		{URL: "https://api.example.com/stats"},
	}

	// limit concurrency to 3 requests at once
	sem := make(chan struct{}, 3)
	var wg sync.WaitGroup

	for _, task := range tasks {
		// acquire semaphore; blocks if limit reached
		sem <- struct{}{}
		wg.Add(1)

		go func(t Fetcher) {
			// release semaphore when goroutine exits
			defer func() { <-sem }()
			defer wg.Done()

			if err := t.Fetch(ctx); err != nil {
				fmt.Printf("Error fetching %s: %v\n", t.URL, err)
				return
			}
			fmt.Printf("Fetched %s\n", t.URL)
		}(task)
	}

	// wait for all workers to complete
	wg.Wait()
}

The Fetcher struct uses a receiver named f. Go convention prefers one or two letter receiver names that match the type. f for Fetcher is standard. Avoid this or self.

The Fetch method takes ctx as the first parameter. Context always goes first. This convention allows middleware and wrappers to pass context through call stacks without changing function signatures. The method uses http.NewRequestWithContext to bind the request to the context. If the context is cancelled, the request aborts.

The main loop acquires the semaphore before spawning the goroutine. This ensures the goroutine count stays bounded. The wg.Add(1) call happens before go func. This ordering prevents a race where the goroutine calls wg.Done() before the counter is incremented. The WaitGroup would see a negative count and panic.

Error handling follows the if err != nil pattern. The code logs the error and returns. It doesn't panic. In a real service, you might collect errors in a slice or use errgroup to fail fast on the first error.
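Collecting errors in a slice needs a mutex, because several workers may append at once. Here is a minimal sketch using only the standard library; runAll and errBad are hypothetical names, and errors.Join requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// runAll (hypothetical helper) runs work for every item with at most limit
// goroutines at once and returns all failures joined into one error.
// It returns nil if every item succeeded.
func runAll(items []string, limit int, work func(string) error) error {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var mu sync.Mutex // guards errs
	var errs []error

	for _, item := range items {
		sem <- struct{}{}
		wg.Add(1)

		go func(it string) {
			defer func() { <-sem }()
			defer wg.Done()

			if err := work(it); err != nil {
				mu.Lock()
				errs = append(errs, fmt.Errorf("%s: %w", it, err))
				mu.Unlock()
			}
		}(item)
	}

	wg.Wait()
	return errors.Join(errs...) // nil when errs is empty
}

// errBad is a placeholder failure for the demo.
var errBad = errors.New("bad item")

func main() {
	err := runAll([]string{"a", "bad", "c"}, 2, func(s string) error {
		if s == "bad" {
			return errBad
		}
		return nil
	})
	fmt.Println(err)
}
```

This version keeps going after a failure and reports everything at the end. If you would rather stop at the first error, errgroup with a derived context is the better fit.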

Pitfalls and errors

The semaphore pattern is simple, but subtle mistakes cause deadlocks or leaks.

Token leaks. If a goroutine exits without releasing the token, the capacity shrinks. This happens when you forget the defer or use a manual receive that gets skipped. The system eventually deadlocks because all tokens are consumed and no new goroutines can start. If every goroutine in the program ends up blocked, the runtime aborts with fatal error: all goroutines are asleep - deadlock! If any other goroutine is still able to run, such as a background server loop, the runtime detects nothing and the program simply hangs.

Loop variable capture. In Go versions before 1.22, loop variables were reused across iterations. A goroutine that captured the loop variable could see a later iteration's value, often the final one. Go 1.22 changed the semantics so each iteration gets its own variable; the new behavior applies to modules whose go.mod declares go 1.22 or later. The compiler never rejected the old pattern. It was go vet that flagged it, with the warning loop variable i captured by func literal. Passing the variable explicitly to the closure remains the clearest pattern and ensures compatibility with older toolchains.
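The explicit-argument form behaves the same on every Go version. A short sketch, with collect as a hypothetical helper:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collect (hypothetical helper) starts one goroutine per index and returns
// the values the goroutines saw, sorted.
func collect(n int) []int {
	var wg sync.WaitGroup
	var mu sync.Mutex // guards seen
	var seen []int

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v int) { // v is this iteration's own copy of i
			defer wg.Done()
			mu.Lock()
			seen = append(seen, v)
			mu.Unlock()
		}(i)
	}

	wg.Wait()
	sort.Ints(seen)
	return seen
}

func main() {
	fmt.Println(collect(5)) // [0 1 2 3 4]
}
```

Because the value is copied when the goroutine is started, each goroutine sees a distinct index regardless of how the loop variable is scoped.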

Blocking the main goroutine. The main loop blocks on sem <- struct{}{} when the limit is reached. This is intentional. It prevents spawning excess goroutines. However, if the main goroutine is the only one that can unblock the semaphore (for example, if workers wait on the main goroutine), you get a deadlock. Ensure workers are independent. They should release the token without waiting for the main goroutine.

Ignoring context cancellation. If you limit concurrency but don't pass context to workers, you can't cancel in-flight work. When main returns, the process exits and any remaining workers are killed mid-task. In a long-running service, a function that returns without stopping its workers leaves them running in the background. That is a goroutine leak. Always pass context and respect cancellation. Workers should check ctx.Done() or use context-aware operations.
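Cancellation should also cover the acquire step, since a plain send can block for as long as the slowest worker runs. One possible shape is a select over the send and ctx.Done(). The names runWithContext and demo, the task counts, and the durations below are illustrative assumptions.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// runWithContext (hypothetical helper) starts up to limit concurrent workers
// and stops launching new ones once ctx is cancelled.
// It returns how many tasks were started.
func runWithContext(ctx context.Context, tasks, limit int, work func(context.Context)) int {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	started := 0

launch:
	for i := 0; i < tasks; i++ {
		if ctx.Err() != nil {
			break // already cancelled: do not try to acquire
		}
		select {
		case sem <- struct{}{}: // got a slot
		case <-ctx.Done(): // cancelled while waiting for a slot
			break launch
		}
		wg.Add(1)
		started++

		go func() {
			defer func() { <-sem }()
			defer wg.Done()
			work(ctx)
		}()
	}

	wg.Wait() // started workers still get to finish or notice cancellation
	return started
}

// demo runs six short tasks with a limit of two under the given timeout.
func demo(timeout time.Duration) int {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	return runWithContext(ctx, 6, 2, func(ctx context.Context) {
		select {
		case <-time.After(time.Millisecond): // simulated work
		case <-ctx.Done(): // stop early on cancellation
		}
	})
}

func main() {
	fmt.Println("generous timeout, started:", demo(time.Second))
	fmt.Println("expired timeout, started:", demo(0))
}
```

The explicit ctx.Err() check matters because select picks randomly when both cases are ready, so without it a cancelled run could still start a few extra tasks.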

Capacity tuning. The capacity is a trade-off. A low capacity reduces resource usage but increases latency because tasks wait longer. A high capacity improves throughput but risks overwhelming downstream services. Test with realistic loads. Monitor error rates and latency. Adjust the capacity until you hit the sweet spot.
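A back-of-the-envelope estimate helps pick a starting point before load testing. The sketch below assumes every task takes the same time and that nothing else is a bottleneck, which real workloads rarely satisfy; estimate and the numbers in main are illustrative only.

```go
package main

import (
	"fmt"
	"time"
)

// estimate (hypothetical helper) returns a rough wall-clock time for n
// equal-length tasks under a concurrency limit: ceil(n/limit) rounds of
// perTask each.
func estimate(n, limit int, perTask time.Duration) time.Duration {
	rounds := (n + limit - 1) / limit // integer ceiling of n/limit
	return time.Duration(rounds) * perTask
}

func main() {
	// 100 tasks of 200ms each under different limits
	for _, limit := range []int{1, 5, 20, 100} {
		fmt.Printf("limit=%d estimate=%v\n", limit, estimate(100, limit, 200*time.Millisecond))
	}
}
```

Under these assumptions, raising the limit from 5 to 20 cuts the estimate from 4s to 1s. Measure before trusting the number, since the downstream service may slow down or start rejecting requests well before that.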

When to use this pattern

Concurrency tools solve specific problems. Pick the right one for your scenario.

Use a buffered channel semaphore when you need simple throttling for a burst of independent tasks. Use a worker pool with a dedicated channel when tasks arrive continuously and you want a fixed set of long-lived workers. Use golang.org/x/sync/errgroup when you need concurrency limits plus error propagation and context cancellation; its SetLimit method caps the number of active goroutines. Use sequential code when the overhead of concurrency outweighs the benefit: a simple loop is faster when each task is tiny, and CPU-bound work gains nothing from running more goroutines than you have cores.
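For comparison, here is a sketch of the worker-pool alternative: a fixed number of long-lived goroutines pull jobs from a channel instead of one goroutine being spawned per task. The squaring work and the squareAll name are placeholders.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll (hypothetical helper) squares every input using a fixed pool of
// workers and returns the sum of the results.
func squareAll(inputs []int, workers int) int {
	jobs := make(chan int)
	results := make(chan int)
	var wg sync.WaitGroup

	// start the long-lived workers
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs { // pull jobs until the channel is closed
				results <- n * n
			}
		}()
	}

	// feed the jobs, then signal that no more are coming
	go func() {
		for _, n := range inputs {
			jobs <- n
		}
		close(jobs)
	}()

	// close results once every worker has exited
	go func() {
		wg.Wait()
		close(results)
	}()

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3, 4}, 2)) // 30
}
```

The concurrency limit here is the number of workers rather than a channel capacity, and the goroutine count stays constant no matter how many jobs flow through.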

The semaphore pattern is the workhorse for fan-out operations. It keeps resource usage bounded while maximizing parallelism. It's easy to read, easy to debug, and relies on core language primitives.

Goroutines are cheap. Channels are not magic. Limit the blast radius.

Where to go next

To limit concurrent goroutines, use a buffered channel with a fixed capacity as a ticket dispenser. Each task takes a ticket before it starts and returns it when it finishes. When all tickets are out, new tasks wait until one comes back, so only a set number run at once.