The problem with concurrent errors
You spawn three goroutines to fetch data from different services. One hangs. The other two finish and return. Your program sits there waiting for the hanging one, or worse, it carries on with bad data because you never looked at the error from the second goroutine. Concurrency in Go is fast, but error handling across goroutines is notoriously tricky. You cannot just return an error from a goroutine, because the go statement discards return values. You need a way to collect results, stop work when something breaks, and surface the first failure cleanly.
What errgroup actually does
errgroup solves this by combining three things: a sync.WaitGroup, a cancellable context, and a slot that keeps only the first error (guarded by a sync.Once). Think of it as a project manager for goroutines. It hands out a shared context to every worker. If any worker reports a failure, the manager cancels the context. Workers that watch the context see the cancellation and can stop early. Once every worker has returned, the manager hands back the first error that was recorded. You get coordinated cancellation, a single error to handle, and optional bounded concurrency through SetLimit.
The package lives in golang.org/x/sync/errgroup. It is not in the standard library, but it is maintained by the Go team and widely used in production. The core API is tiny. You create a group, spawn tasks with Go, and block on Wait. The heavy lifting happens behind the scenes.
Goroutines are cheap. Error coordination is not.
The minimal setup
Here is the simplest way to wire it up. You pass a parent context, spawn a few tasks, and check the result.
package main

import (
	"context"
	"fmt"

	"golang.org/x/sync/errgroup"
)

func main() {
	// Start with a background context. errgroup will derive a cancellable one.
	ctx := context.Background()

	// WithContext returns a group and a new context tied to that group.
	g, ctx := errgroup.WithContext(ctx)

	for i := 0; i < 3; i++ {
		i := i // Capture loop variable. Redundant since Go 1.22, where each iteration gets its own i.

		// Go spawns a goroutine and tracks it internally.
		g.Go(func() error {
			// Check cancellation before doing work.
			select {
			case <-ctx.Done():
				return ctx.Err()
			default:
				if i == 1 {
					return fmt.Errorf("task %d failed", i)
				}
				return nil
			}
		})
	}

	// Wait blocks until every goroutine has returned, then reports the first non-nil error.
	if err := g.Wait(); err != nil {
		fmt.Println("Error:", err)
	}
}
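If you forget the i := i line on Go versions before 1.22, the program still compiles, but every closure shares the same i, and go vet warns with loop variable i captured by func literal. Go 1.22 changed loop variable semantics so that each iteration gets a fresh variable in modules that declare go 1.22 or later, which makes the copy redundant there. Keeping it is harmless and still protects code that has to build with older toolchains.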
How the cancellation flows
When you call errgroup.WithContext(ctx), the package creates a derived, cancellable context. The group also holds a sync.WaitGroup and a field for the first error. Every call to g.Go increments the wait group counter and launches a goroutine. The function you pass takes no arguments, so it reaches the derived context by closing over ctx, and any helper it calls should accept that context explicitly. Go convention dictates that context.Context always goes first in function signatures, conventionally named ctx. This makes it obvious to callers that the function supports cancellation.
When g.Wait() is called, it blocks until the wait group counter reaches zero. The error handling happens inside each spawned goroutine: if the function returns a non-nil error, errgroup records it and cancels the derived context, and a sync.Once ensures that only the first error is kept. The cancellation signal flows through the context tree. Every other goroutine that checks <-ctx.Done() sees the signal and can return ctx.Err(). Once the last goroutine returns, Wait unblocks and returns the first error that was recorded.
This pattern helps prevent goroutine leaks. A goroutine leak happens when a goroutine waits on a channel that never gets closed, or blocks on I/O that never completes. By tying all workers to a single cancellable context, you give every worker a signal to stop when one fails, provided each worker passes the context to its blocking calls. You do not need to manually close channels or track individual goroutine lifecycles.
Context is plumbing. Run it through every long-lived call site.
Real-world pattern: fetching multiple endpoints
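In production, you rarely just want to know if something failed. You usually want the successful results, plus the first error. Here is how you combine errgroup with a slice of results. Keep in mind that once one task fails, the shared context is cancelled, so requests still in flight are aborted and only the tasks that finished earlier leave a result behind.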
package main

import (
	"context"
	"fmt"
	"net/http"

	"golang.org/x/sync/errgroup"
)

// FetchResult holds the response status for a single URL.
type FetchResult struct {
	URL    string
	Status int
}

func fetch(ctx context.Context, url string) (FetchResult, error) {
	// Respect context cancellation in the HTTP client.
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return FetchResult{}, err
	}
	// Do reports transport-level failures only. A 500 response comes back with a nil error.
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return FetchResult{}, err
	}
	// Close the body when fetch returns so the connection can be released.
	defer resp.Body.Close()
	return FetchResult{URL: url, Status: resp.StatusCode}, nil
}

func main() {
	ctx := context.Background()
	g, ctx := errgroup.WithContext(ctx)

	urls := []string{"https://example.com", "https://httpbin.org/status/500", "https://golang.org"}
	results := make([]FetchResult, len(urls))

	for i, url := range urls {
		// Per-iteration copies. Redundant since Go 1.22, needed on older toolchains.
		i := i
		url := url

		// Spawn worker. Each goroutine writes only to its own index, so no lock is needed.
		g.Go(func() error {
			res, err := fetch(ctx, url)
			if err != nil {
				return err
			}
			results[i] = res
			return nil
		})
	}

	// Wait returns the first error once every goroutine has finished.
	// Results stored before that point are safe to read from here on.
	if err := g.Wait(); err != nil {
		fmt.Println("First failure:", err)
	}

	// A zero Status marks a slot whose fetch failed or was cancelled.
	for _, r := range results {
		if r.Status != 0 {
			fmt.Printf("Got %s with status %d\n", r.URL, r.Status)
		}
	}
}
The if err != nil { return err } pattern looks verbose. The Go community accepts the boilerplate because it makes the unhappy path visible. You do not hide errors behind panics or silent drops. You surface them immediately. The errgroup package respects this philosophy by returning the first error cleanly, letting you decide whether to log it, wrap it, or fail the request.
Where things go wrong
The most common mistake is running long tasks that ignore the context. If a goroutine blocks on a database query or a network call without passing the context, it will not stop when errgroup cancels. The wait group counter cannot reach zero until that call returns, so Wait hangs until the underlying operation times out or the process is killed. Always pass the derived context to every blocking call, and check ctx.Done() between steps of long loops.
Another trap is expecting errgroup to collect all errors. Wait returns only the first one and discards the rest. If you need to aggregate multiple failures, you have to track them yourself, for example by collecting them and combining them with errors.Join, which was added in Go 1.20. errgroup is designed for fast-fail scenarios. You want to stop wasting resources the moment something breaks.
Forgetting to import the package triggers undefined: errgroup from the compiler. Importing it and not using it triggers imported and not used. Go enforces clean imports strictly. Trust the toolchain. It saves you from dead code.
The worst goroutine bug is the one that never logs.
When to reach for errgroup
Use errgroup when you need fast-fail cancellation across independent tasks. Use a plain sync.WaitGroup when you want all tasks to finish regardless of errors and you will handle failures manually. Use a channel plus a single goroutine when you need to stream results or process errors one by one in a pipeline. Use sequential code when the tasks are fast and you don't need concurrency: the simplest thing that works is usually the right thing.