How to Use errgroup for Structured Concurrency

The race to cancel

You are building a dashboard that pulls user data from three separate services. One call hangs. The other two finish fine. Your program waits for the slow one, times out, and finally returns a partial response. Meanwhile, the slow call is still holding a connection open and burning resources for no reason. You want a system where the first failure signals every other task to stop immediately. That is exactly what errgroup handles.

What structured concurrency actually means

Structured concurrency is a programming discipline. It treats concurrent tasks as a single logical unit with a defined lifetime. When the unit finishes, every task inside it finishes. When one task fails, the unit fails, and every other task gets a signal to stop. There are no orphaned goroutines drifting in the background.

Think of a film crew shooting a scene. The director calls action. The camera operator, the sound technician, and the lighting crew all start working. If the camera breaks, the director calls cut. Everyone stops immediately. They do not keep rolling while the equipment smokes. errgroup is the director. It tracks every worker, listens for the first failure, and pulls the plug on the entire scene.

The package lives in golang.org/x/sync/errgroup. It is not part of the standard library, but it is maintained by the Go team and considered idiomatic for this exact pattern. It combines a sync.WaitGroup, a context cancel function, and a slot for the first error. There is no error channel under the hood: the first non-nil error is stored in a field guarded by a sync.Once, and that same Once call triggers the cancel. Later errors are discarded, so the stored error has exactly one writer and no data race.
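To make the sync.Once mechanism concrete, here is a stripped-down sketch of a group assembled from standard-library parts. The miniGroup type and withContext function are hypothetical names invented for illustration; this is not errgroup's actual source, and it leaves out features the real package has.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// miniGroup is a hypothetical, simplified stand-in for errgroup.Group.
type miniGroup struct {
	wg      sync.WaitGroup
	errOnce sync.Once // guards err: only the first failure is recorded
	err     error
	cancel  context.CancelFunc
}

// withContext mimics the shape of errgroup.WithContext.
func withContext(parent context.Context) (*miniGroup, context.Context) {
	ctx, cancel := context.WithCancel(parent)
	return &miniGroup{cancel: cancel}, ctx
}

// Go runs f in its own goroutine. The first non-nil error is stored and
// cancels the shared context; later errors are silently dropped.
func (g *miniGroup) Go(f func() error) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		if err := f(); err != nil {
			g.errOnce.Do(func() {
				g.err = err
				g.cancel()
			})
		}
	}()
}

// Wait blocks until every goroutine has returned, releases the context,
// and reports the first error (nil if all succeeded).
func (g *miniGroup) Wait() error {
	g.wg.Wait()
	g.cancel()
	return g.err
}

// demo fails one task and lets the other stop via the canceled context.
func demo() error {
	g, ctx := withContext(context.Background())
	g.Go(func() error { return errors.New("boom") })
	g.Go(func() error {
		<-ctx.Done() // unblocks only because the first task's error canceled ctx
		return ctx.Err()
	})
	return g.Wait()
}

func main() {
	fmt.Println(demo()) // boom
}
```

The second task's context.Canceled error reaches errOnce.Do after the first error is already recorded, so it is dropped, which is the "first error wins" behavior described above.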

Structured concurrency is about scope, not speed.

The simplest errgroup

Here is the minimal pattern. You create a group, launch three independent tasks, and wait for the result.

package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/errgroup"
)

func main() {
	// Start with a base context. errgroup will derive a cancellable one from it.
	ctx := context.Background()
	g, ctx := errgroup.WithContext(ctx)

	// Launch three independent tasks. Each runs in its own goroutine.
	for i := 0; i < 3; i++ {
		i := i // Needed only before Go 1.22, where all iterations shared one i
		g.Go(func() error {
			time.Sleep(time.Second) // Ignores ctx, so tasks 0 and 2 still finish after task 1 fails
			if i == 1 {
				// Return an error to trigger group-wide cancellation
				return fmt.Errorf("task %d failed", i)
			}
			fmt.Printf("task %d done\n", i)
			return nil
		})
	}

	// Block until every task has returned, then report the first non-nil error
	if err := g.Wait(); err != nil {
		fmt.Println("Group failed:", err)
	}
}

The group waits. The context cancels. The error wins.

How the cancellation actually propagates

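errgroup.WithContext does two things at once. It returns a new *errgroup.Group and a context.Context derived from the one you pass in. That derived context carries a Done channel. When a goroutine launched via g.Go returns the group's first non-nil error, the group cancels the derived context right away, from inside that goroutine, without waiting for g.Wait. g.Wait itself blocks until every goroutine has returned, and then cancels the derived context as well, so treat that context as dead once Wait returns, even when every task succeeded.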

The cancellation is cooperative. Go does not forcefully kill goroutines. The context simply signals that the work is no longer needed. Every goroutine that respects the context notices the signal and exits early. A goroutine that ignores the context keeps running, and because g.Wait waits for every goroutine, Wait cannot return until it finishes. If it never finishes, that is a goroutine leak and a hung caller.

You can see the mechanism in action by replacing time.Sleep with a worker that selects on ctx.Done(). When task 1 fails, the context cancels. Tasks 0 and 2 see the cancellation and stop waiting. The program exits as soon as the failure lands instead of waiting for the full sleep duration.

func worker(ctx context.Context, id int) error {
	// Simulate work that can be interrupted
	select {
	case <-time.After(2 * time.Second):
		// The work finished normally.
		return nil
	case <-ctx.Done():
		// The group already failed. Exit immediately to free resources.
		return ctx.Err()
	}
}

Always wire the context. Always check the error.

Realistic pattern: fetching multiple APIs

In production code, you rarely sleep. You make network calls. HTTP requests in Go carry a context, which makes them a natural fit for errgroup. Here is how you fetch three endpoints concurrently and fail fast if any one of them drops.

import (
	"context"
	"fmt"
	"net/http"

	"golang.org/x/sync/errgroup"
)

func fetchBatch(ctx context.Context, urls []string) error {
	// Derive the group context. It inherits any parent deadlines.
	g, ctx := errgroup.WithContext(ctx)

	for _, url := range urls {
		url := url // Capture loop variable for the closure (needed only before Go 1.22)
		g.Go(func() error {
			// Pass the group context to the HTTP client
			req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
			if err != nil {
				return fmt.Errorf("creating request for %s: %w", url, err)
			}

			resp, err := http.DefaultClient.Do(req)
			if err != nil {
				return fmt.Errorf("fetching %s: %w", url, err)
			}
			defer resp.Body.Close()

			if resp.StatusCode != http.StatusOK {
				return fmt.Errorf("%s returned status %d", url, resp.StatusCode)
			}

			return nil
		})
	}

	// Return the first error that bubbled up, or nil if all succeeded
	return g.Wait()
}

The http.NewRequestWithContext call binds the request to the group's context. If another goroutine fails first, the group cancels that context immediately. Any request still in flight is aborted by the transport, its connection is closed, and Do returns an error reporting the cancellation. g.Wait still returns the original failure, not the cancellation errors that followed it.

The caller usually creates the parent context with context.WithTimeout or context.WithCancel, and must pair it with defer cancel(). That is standard context hygiene. errgroup cancels only the derived context; the parent's resources still need releasing whether the group fails or succeeds. Functions that accept a context should take it as the first parameter, conventionally named ctx. This keeps the signature predictable and makes cancellation wiring obvious to anyone reading the code.

Why not just use channels?

You could build the same pattern with raw channels and sync.WaitGroup. You would launch goroutines, send errors to a channel, and pick out the first one. It works, but it requires more boilerplate. You need to size the channel buffer, handle the case where a sender blocks because the receiver already moved on, and trigger cancellation by hand. errgroup abstracts that coordination away: g.Go launches, g.Wait waits, and the group records the first error and cancels the rest without any channel bookkeeping on your side.

Channels shine when you need to stream results or build pipelines. errgroup shines when you need to run independent tasks and care about the first failure. They solve different problems. Mixing them without a clear boundary creates tangled control flow.

Where things go wrong

The most common mistake is treating errgroup like a magic thread killer. It is not. It is a coordination primitive. If your goroutine does not check ctx.Done() or does not pass the context to blocking calls, it will ignore the cancellation signal. The goroutine leaks. The program hangs. The worst goroutine bug is the one that never logs.

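Another trap is loop variable capture on older Go versions. Before Go 1.22, a for loop declared a single variable shared by every iteration, so writing g.Go(func() error { ... use i ... }) without i := i made every goroutine close over that same i, and many of them would see its final value. Go 1.22 changed the semantics so each iteration gets a fresh variable; in modules that declare go 1.22 or later, the i := i line is harmless but no longer needed. On older versions the compiler does not reject the bug; go vet's loopclosure check can report it as loop variable i captured by func literal. Fix it by shadowing the variable or by passing it as an argument to a function that builds the closure.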

You also need to match the function signature. g.Go expects a function that takes no arguments and returns exactly one error. If you pass a function that returns (string, error), the compiler rejects it with a type error along the lines of cannot use ... as func() error value in argument to g.Go. Wrap the call in an anonymous function that stores the extra value somewhere the caller can read it, such as a pre-sized slice, and returns only the error.

Error wrapping is another detail that pays off. Use %w in fmt.Errorf so that errors.Is and errors.As work correctly downstream. The error g.Wait returns is only the first non-nil error; errgroup does not aggregate the others. If you need to collect every failure, errgroup is the wrong tool.

A goroutine that ignores cancellation is a memory leak waiting to happen.

When to reach for errgroup

Concurrency tools overlap. Picking the right one depends on your failure strategy and your data flow.

Use errgroup when you need to run independent tasks and cancel everything on the first failure. Use sync.WaitGroup when you just need to wait for multiple goroutines to finish and don't care about early cancellation or error propagation. Use a plain channel or select when you need to collect results from multiple workers and merge them into a single stream. Use sequential code when the tasks depend on each other or the overhead of concurrency outweighs the speed gain.

Pick the tool that matches your failure strategy.

Where to go next

errgroup turns a batch of goroutines into one unit: g.Go starts each task, the first error cancels the shared context, and g.Wait returns only after every task has stopped. Reach for it when several independent jobs should run in parallel and the whole batch should stop the moment one of them goes wrong, as long as each job actually listens to its context.