The problem with waiting on goroutines
You spin up three goroutines to fetch data from different services. One finishes in twenty milliseconds. Another takes two seconds. The third fails with a network timeout. You need to know when all of them are done, but you also want to fail fast as soon as any of them hits a wall. A sync.WaitGroup tells you when everything has finished, but it carries no errors and cannot stop the slow calls. Tracking the first error across concurrent paths requires shared state, a mutex or channel, and careful cancellation plumbing. The standard library does not bundle this up for you; the language prefers explicit coordination over hidden magic. The golang.org/x/sync module, maintained by the Go team outside the main tree, fills the gap with a small package that handles the heavy lifting without hiding the mechanics.
What errgroup actually does
errgroup lives in the golang.org/x/sync repository. It combines a wait group, a cancellable context, and first-error bookkeeping into one small abstraction. Think of it like a construction foreman. The foreman hands out work orders to three crews. Each crew works independently. If one crew hits a structural problem, the foreman blows the whistle so the other crews know to stop, waits until every crew has put down its tools, and hands back a single report describing the first thing that went wrong. You do not need to track each crew's progress manually. The foreman handles the coordination, the cancellation, and the error collection.
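The package gives you two ways to start. The zero value of errgroup.Group is ready to use: declare var g errgroup.Group and it waits for its goroutines and returns the first error, but it has no context and cancels nothing. errgroup.WithContext(ctx) returns a *Group together with a context derived from ctx that is canceled on the first error. The context version is the usual choice whenever the goroutines do I/O, because it gives every goroutine a shared signal to stop working when something goes wrong.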
The minimal pattern
Here is the simplest way to use it. You create a group and its context with errgroup.WithContext, launch goroutines via g.Go, and call g.Wait to block until everything finishes or fails.
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/errgroup"
)

func main() {
	ctx := context.Background()
	// WithContext returns the group and a derived context that is canceled
	// as soon as any goroutine in the group returns a non-nil error.
	g, ctx := errgroup.WithContext(ctx)
	for i := 0; i < 3; i++ {
		taskID := i // per-iteration copy; required before Go 1.22
		g.Go(func() error {
			// Simulate work of different lengths, but stop early if the group is canceled.
			select {
			case <-time.After(time.Duration(taskID+1) * 100 * time.Millisecond):
			case <-ctx.Done():
				return ctx.Err()
			}
			if taskID == 1 {
				// Returning an error triggers group-wide cancellation.
				return fmt.Errorf("task %d failed", taskID)
			}
			fmt.Printf("task %d done\n", taskID)
			return nil
		})
	}
	// Wait blocks until every goroutine has returned, then reports the first error.
	if err := g.Wait(); err != nil {
		fmt.Println("group failed:", err)
	}
}
The program starts by calling errgroup.WithContext, which returns two values: the group itself and a new context derived from the parent. That context is the cancellation signal. When any goroutine in the group returns a non-nil error, errgroup cancels the context right away, so every other goroutine should either watch ctx.Done() or pass ctx into its I/O calls.
The g.Go method accepts a function that returns an error and runs it in a new goroutine. Internally the group holds a sync.WaitGroup: g.Go calls Add(1) before starting the goroutine, and the goroutine calls Done when the function returns. The first non-nil error is recorded (later ones are ignored) and triggers the context cancellation. g.Wait blocks until every goroutine has returned, then returns that first error, or nil if everything succeeded. Wait also cancels the derived context, so do not reuse it for follow-up work once Wait returns.
The loop variable capture is a classic Go gotcha. Before Go 1.22, a for loop reused a single variable i across iterations, so a closure that referenced i directly could observe whatever value i held when the goroutine actually ran, often the final value. Go 1.22 gives each iteration its own variable (for modules that declare go 1.22 or later in go.mod), but the explicit copy is harmless and keeps the code correct on older toolchains.
Goroutines are cheap. Channels are not magic.
How cancellation propagates under the hood
Real code rarely sleeps. It makes network calls, reads files, or queries databases, and those operations accept a context.Context. When errgroup cancels the context, they can abort promptly instead of running until their own timeouts expire. Internally, WithContext derives a cancellable context from the parent and keeps its cancel function; the first error triggers that function. Cancellation flows down the context tree, so every context derived from the group's context is canceled as well.
Here is a realistic HTTP fan-out. It fetches three independent endpoints; if one fails, the requests still in flight are canceled. The context does the heavy lifting.
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"

	"golang.org/x/sync/errgroup"
)

// fetchURL makes an HTTP GET request and returns the status code.
// Responses with status 400 or above are reported as errors.
func fetchURL(ctx context.Context, url string) (int, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		// Wrap the error to preserve the original cause.
		return 0, fmt.Errorf("building request: %w", err)
	}
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		// Group cancellation surfaces here as an error wrapping context.Canceled.
		return 0, fmt.Errorf("request failed: %w", err)
	}
	defer resp.Body.Close()
	// A non-2xx status does not make client.Do return an error,
	// so a 500 has to be treated as a failure explicitly.
	if resp.StatusCode >= 400 {
		return resp.StatusCode, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return resp.StatusCode, nil
}

func main() {
	ctx := context.Background()
	g, ctx := errgroup.WithContext(ctx)
	urls := []string{
		"https://httpbin.org/delay/1",
		"https://httpbin.org/status/500",
		"https://httpbin.org/delay/2",
	}
	for _, u := range urls {
		url := u // capture loop variable (needed before Go 1.22)
		g.Go(func() error {
			// Pass the group context so cancellation propagates.
			code, err := fetchURL(ctx, url)
			if err != nil {
				return fmt.Errorf("fetching %s: %w", url, err)
			}
			fmt.Printf("%s returned %d\n", url, code)
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		fmt.Println("fan-out failed:", err)
	}
}
Notice that the context is the first parameter of fetchURL. That is a firm Go convention: the context package documentation asks that a Context be passed explicitly as the first parameter, typically named ctx, rather than stored in a struct. http.NewRequestWithContext attaches the group's context to the request, so when errgroup cancels it, client.Do returns an error instead of waiting for the server, and a body read already in progress is interrupted as well. You avoid wasting resources on requests that no longer matter.
Error wrapping with %w is standard practice here. It preserves the original error chain while adding context. The errgroup package itself does not wrap errors. It just returns the first one it sees. You wrap them inside your goroutines so the final error message tells you exactly which task failed. The if err != nil { return err } pattern looks verbose, but it makes the unhappy path visible. The community accepts the boilerplate because it prevents silent failures.
Context is plumbing. Run it through every long-lived call site.
Common traps and compiler complaints
The most common mistake is ignoring the context returned by WithContext. If you pass the original parent context to your goroutines instead of the group's context, cancellation never reaches them. Because g.Wait always waits for every goroutine to return, the call then blocks until the slowest task finishes on its own, even though the result is already doomed, and those tasks keep burning CPU, connections, and quota in the meantime. Always use the context that errgroup hands back.
Another trap is expecting more than one error back. errgroup keeps only the first non-nil error returned by a function passed to g.Go. If two goroutines fail at nearly the same time, one error wins and the other is silently dropped. This is by design: the package assumes you want to fail fast. If you need every single error, use a different pattern, such as a buffered channel whose results are collected into a slice. errgroup is not an error aggregator. It is a fail-fast coordinator.
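The compiler catches signature mismatches quickly. If you pass g.Go a function that returns two values, compilation fails with an error along the lines of cannot use ... (value of type func() (int, error)) as func() error value in argument to g.Go. The signature is strict: it must be func() error, so any results have to leave the goroutine some other way, such as through a captured variable. Loop variable capture is a different story, because the compiler has never rejected it. go vet's loopclosure check can report loop variable i captured by func literal when the closure is passed to go, defer, or errgroup's Go as the last statement of the loop body, but it does not catch every case. On toolchains older than Go 1.22, assign the loop variable to a new variable inside the loop.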
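Goroutine leaks happen when a goroutine blocks forever, for example on a channel send that nobody will ever receive. Inside an errgroup, a stuck goroutine also means g.Wait never returns. The context errgroup gives you is the escape hatch, but only if your code respects it: select on ctx.Done() alongside channel operations, check ctx.Err() between steps of long operations, and return early once the context is done. The worst goroutine bug is the one that never logs.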
When to reach for errgroup
Use errgroup when you need to run independent tasks concurrently and fail fast on the first error. Use a sync.WaitGroup when you only need to wait for completion and do not care about error propagation. Use a buffered channel plus a single collector goroutine when you need to aggregate results or collect multiple errors. Use plain sequential code when the tasks depend on each other's results or are so small that concurrency adds more overhead than it saves.
errgroup is a coordination primitive, not a concurrency framework. Keep your goroutines focused. Let the group handle the waiting and the cancellation.