Errors in goroutines

The silent failure

You write a function that fetches configuration from three different microservices. You wrap each call in a goroutine to run them concurrently and shave off two seconds of latency. The program starts, the goroutines fire, and one service returns a 500 error. Your main function prints a success message and exits. The error vanishes into the void. Worse, if that goroutine panics instead of returning an error, your entire process crashes with a stack trace that points to a line you never called directly. Handling errors in goroutines requires a different mental model than sequential code because the call stack splits. You have to build a bridge back to the caller.

Goroutines are cheap. Channels are not magic.

How errors travel in concurrent code

In sequential Go code, errors travel up the call stack. A function returns an error, the caller checks it, and the pattern repeats until something handles it. Goroutines break that chain. Think of a goroutine like a separate workbench in a workshop. The main function is the manager. If the manager hands a task to a worker and walks away, the worker has no way to shout back unless you install a communication line. In Go, that line is a channel. You create a channel, pass it to the goroutine, and the goroutine sends the result or error down the pipe. The main function reads from the channel to see what happened.

The Go community accepts a specific convention here: the sender closes the channel, never the receiver. If you close a channel from the reading side, you risk closing it while another goroutine is still trying to write, which triggers a panic. Keep the closing logic on the side that knows when the work is finished.
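The sender-closes convention can be sketched as follows. This is a minimal illustration, not a prescribed implementation; the names produce and collect are hypothetical and exist only for this example.

```go
package main

import "fmt"

// produce sends each value on out and then closes the channel.
// Closing lives here because only the sender knows when it is done.
func produce(values []int, out chan<- int) {
    defer close(out)
    for _, v := range values {
        out <- v
    }
}

// collect starts a producer and drains the channel until it is closed.
func collect(values []int) []int {
    ch := make(chan int)
    go produce(values, ch)

    var got []int
    for v := range ch { // the loop ends once produce closes ch
        got = append(got, v)
    }
    return got
}

func main() {
    fmt.Println(collect([]int{1, 2, 3})) // prints [1 2 3]
}
```

The receiver never calls close. It only observes the closure, which is what lets the range loop terminate without a panic on the sending side.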

Errors in goroutines do not bubble up. You must explicitly route them.

A minimal pattern

Start with the simplest case: one goroutine, one result, one error. You need a channel to carry the error back to the main execution flow.

package main

import "fmt"

// FetchItem simulates a concurrent task that returns a value or an error.
func FetchItem(id int) (string, error) {
    // Fail intentionally to demonstrate error propagation.
    if id == 2 {
        return "", fmt.Errorf("timeout fetching id %d", id)
    }
    return fmt.Sprintf("item-%d", id), nil
}

func main() {
    // Buffered channel prevents the goroutine from blocking if main isn't ready yet.
    errCh := make(chan error, 1)

    // Run the task concurrently so it doesn't block the main goroutine.
    go func() {
        // Capture the error and ship it back to the caller.
        _, err := FetchItem(2)
        errCh <- err
    }()

    // Read from the channel to retrieve the result from the background worker.
    err := <-errCh
    if err != nil {
        fmt.Println("Caught error:", err)
    }
}

The channel acts as a handoff point. The goroutine does its work, encounters the error, and pushes it into errCh. The main function blocks on <-errCh until that value arrives. Once it does, you handle it exactly like any other error. The if err != nil check is verbose by design. The community keeps it visible because hiding the unhappy path makes failures easy to miss.
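The example above throws the value away and ships only the error. When the caller needs both, one common approach is to send a small struct over the channel. The sketch below assumes a hypothetical result type and a hypothetical fetchAsync helper; fetchItem is a lowercase copy of FetchItem so the block compiles on its own.

```go
package main

import "fmt"

// result pairs a value with the error from the same call (hypothetical type).
type result struct {
    value string
    err   error
}

// fetchItem mirrors FetchItem from the example above: id 2 always fails.
func fetchItem(id int) (string, error) {
    if id == 2 {
        return "", fmt.Errorf("timeout fetching id %d", id)
    }
    return fmt.Sprintf("item-%d", id), nil
}

// fetchAsync runs fetchItem in a goroutine and returns a channel that
// receives exactly one result.
func fetchAsync(id int) <-chan result {
    ch := make(chan result, 1) // capacity 1, so the send never blocks
    go func() {
        v, err := fetchItem(id)
        ch <- result{value: v, err: err}
    }()
    return ch
}

func main() {
    for _, id := range []int{1, 2} {
        r := <-fetchAsync(id)
        if r.err != nil {
            fmt.Println("error:", r.err)
            continue
        }
        fmt.Println("value:", r.value)
    }
}
```

Returning a receive-only channel keeps the send side private to the helper, so callers cannot close or write to it by accident.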

Never swallow a channel read. Always assign and check.

What happens under the hood

When the program runs, the Go runtime schedules the goroutine onto one of the OS threads it manages. The main function continues executing until it hits the channel receive operation. If no value is waiting in the channel yet, the runtime parks the main goroutine until the background worker sends one. The worker runs, hits the conditional error, creates an error value, and writes it to the channel. The runtime wakes the main goroutine, delivers the value, and execution resumes.

If you remove the channel and just call go FetchItem(2), the goroutine still runs, provided main does not return first. It still returns an error. But nobody captures it. The return values of a function started with go are discarded, the goroutine exits, and the main function moves on. The runtime does not warn you about unhandled errors in goroutines. It assumes you know what you are doing.

If you forget to synchronize access to shared variables across goroutines, the compiler will not stop you. Go does not enforce thread safety at compile time. Run the program with go run -race main.go to activate the race detector. The detector instruments memory accesses and prints a WARNING: DATA RACE report when two goroutines touch the same memory without synchronization and at least one of them writes. It is a development tool, not a guarantee: it only catches races on code paths that actually execute during the run. You still need mutexes or channels to fix the underlying logic.
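As a small illustration of the kind of fix the detector pushes you toward, the sketch below guards a shared counter with a mutex. The function name countWithMutex is hypothetical. Removing the Lock and Unlock calls should produce a data race report under -race.

```go
package main

import (
    "fmt"
    "sync"
)

// countWithMutex increments a shared counter from n goroutines.
func countWithMutex(n int) int {
    var (
        wg      sync.WaitGroup
        mu      sync.Mutex
        counter int
    )
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            mu.Lock()
            counter++ // unsynchronized, this increment is a data race
            mu.Unlock()
        }()
    }
    wg.Wait()
    return counter
}

func main() {
    fmt.Println(countWithMutex(1000)) // prints 1000
}
```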

Trust the race detector during development. Fix the data race before shipping.

Real-world aggregation

Production code rarely launches a single goroutine. You usually fan out multiple tasks, wait for them to finish, and collect whatever errors occurred. This requires a wait group to track completion and a mutex to protect shared state.

// FetchAll runs multiple concurrent tasks and collects any failures.
// It needs the "fmt" and "sync" imports and reuses FetchItem from the earlier example.
func FetchAll(ids []int) ([]string, error) {
    // WaitGroup tracks how many goroutines are still running.
    var wg sync.WaitGroup
    // Mutex protects the shared slices from concurrent writes.
    var mu sync.Mutex
    var results []string
    var errs []error

    for _, id := range ids {
        // Increment counter before launching the goroutine.
        wg.Add(1)
        go func(id int) {
            // Decrement counter when this specific goroutine finishes.
            defer wg.Done()

            res, err := FetchItem(id)
            mu.Lock()
            // Capture the result or error while holding the lock.
            if err != nil {
                errs = append(errs, err)
            } else {
                results = append(results, res)
            }
            mu.Unlock()
        }(id)
    }

    // Wait for all launched goroutines to finish their work.
    wg.Wait()

    if len(errs) > 0 {
        return results, fmt.Errorf("partial failure: %v", errs)
    }
    return results, nil
}

The sync.WaitGroup ensures FetchAll does not return before every worker finishes. The sync.Mutex prevents two goroutines from appending to the same slice at once, which is a data race that can drop entries or corrupt the slice. You hold the lock only long enough to record the outcome, then release it so other workers can proceed. Results arrive in completion order, not input order. This pattern scales cleanly to dozens or hundreds of concurrent tasks.
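Formatting the error slice with %v flattens everything into a string, so callers can no longer test for a specific failure. If you are on Go 1.20 or later, errors.Join is one alternative worth considering. The sketch below is an assumption about how you might wire it in; errTimeout, sampleFailures, joinFailures, and isTimeout are hypothetical names.

```go
package main

import (
    "errors"
    "fmt"
)

// errTimeout is a hypothetical sentinel error for this example.
var errTimeout = errors.New("timeout")

// sampleFailures stands in for the errs slice collected by the workers.
func sampleFailures() []error {
    return []error{
        fmt.Errorf("fetching id 2: %w", errTimeout),
        errors.New("fetching id 5: connection refused"),
    }
}

// joinFailures combines the collected errors into one error value.
// errors.Join returns nil when there is nothing to join.
func joinFailures(errs []error) error {
    return errors.Join(errs...)
}

// isTimeout reports whether any joined error wraps errTimeout.
func isTimeout(err error) bool {
    return errors.Is(err, errTimeout)
}

func main() {
    err := joinFailures(sampleFailures())
    fmt.Println(err)
    fmt.Println("timeout among failures:", isTimeout(err))
}
```

Because the joined error keeps each original error reachable, errors.Is and errors.As still work on the aggregate.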

If a goroutine panics inside this loop, the panic escapes the goroutine and crashes the entire process. Panics are for programming errors, not expected failures. Convert recoverable conditions to errors before they reach the goroutine boundary.
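If you cannot rule out a panic inside a worker, for example because it calls code you do not control, one defensive option is to recover at the goroutine boundary and turn the panic into an error. The helper below, safeGo, is a hypothetical name and a sketch of that idea rather than a recommendation to recover everywhere.

```go
package main

import "fmt"

// safeGo runs fn in a goroutine and reports either its returned error
// or a recovered panic on the returned channel (hypothetical helper).
func safeGo(fn func() error) <-chan error {
    errCh := make(chan error, 1)
    go func() {
        defer func() {
            // recover only has an effect inside a deferred function
            // running in the goroutine that panicked.
            if r := recover(); r != nil {
                errCh <- fmt.Errorf("recovered panic: %v", r)
            }
        }()
        errCh <- fn()
    }()
    return errCh
}

func main() {
    err := <-safeGo(func() error {
        var m map[string]int
        m["x"] = 1 // writing to a nil map panics
        return nil
    })
    fmt.Println("worker failed:", err)
}
```

The recover call must live in the same goroutine as the panic. A deferred recover in the function that launched the goroutine does not catch it.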

Panics crash the process. Errors flow through channels.

Common traps and runtime signals

Goroutine error handling introduces three classic failure modes. The first is the deadlock. This happens when every goroutine is waiting for a channel operation that will never complete. The runtime detects this and halts execution with the message fatal error: all goroutines are asleep - deadlock! The check only fires when every goroutine is blocked; if even one is still runnable, the stuck goroutines just sit there. Deadlocks usually stem from unbuffered channels where the sender and receiver are out of sync, or from forgetting to close a channel that a range loop depends on.

The second trap is the goroutine leak. A leak occurs when a goroutine blocks forever on a channel that nobody will ever write to or read from, or when a background worker ignores cancellation signals. In a long-running service the leaked goroutines pile up, memory usage creeps up, and shutdown hangs if it waits for workers that will never finish. Always provide a cancellation path. Pass a context.Context as the first parameter to any long-running function, and check ctx.Err() periodically. The convention is strict: context always goes first, named ctx, and functions must respect cancellation and deadlines.
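A cancellation path can be sketched like this. The worker selects on ctx.Done() alongside its normal work, so it stops as soon as the deadline passes. The names worker and runWithTimeout are hypothetical, and the ticker stands in for whatever the real worker would do.

```go
package main

import (
    "context"
    "fmt"
    "time"
)

// worker does one unit of work per tick until ctx is cancelled,
// then returns the reason it stopped.
func worker(ctx context.Context, interval time.Duration) error {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-ticker.C:
            // One unit of work would go here.
        }
    }
}

// runWithTimeout starts worker in a goroutine and waits for it to stop.
func runWithTimeout(timeout time.Duration) error {
    ctx, cancel := context.WithTimeout(context.Background(), timeout)
    defer cancel()

    done := make(chan error, 1)
    go func() { done <- worker(ctx, time.Millisecond) }()
    return <-done
}

func main() {
    fmt.Println(runWithTimeout(20 * time.Millisecond)) // prints context deadline exceeded
}
```

Without the ctx.Done() case, the worker would loop forever and the receive in runWithTimeout would never return.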

The third trap is silent error swallowing. Developers sometimes write go doWork() and assume the error will surface somewhere. It will not. If doWork returns an error, that value is discarded the moment the goroutine exits. The compiler will not warn you. The runtime will not panic. You simply lose the failure.

The worst goroutine bug is the one that never logs.

Choosing the right pattern

Concurrency adds complexity. You should only introduce goroutines when the benefit outweighs the coordination overhead. Pick the structure that matches your failure tolerance and data flow.

Use a single error channel when one goroutine feeds one result back to the caller and you want to block until it finishes. Use a wait group with a mutex-protected error slice when you need to fan out multiple independent tasks and collect all failures before proceeding. Use context cancellation when you want to abort all background workers immediately if the first one fails or the client disconnects. Use sequential code when the tasks are fast and do not involve I/O: the overhead of goroutines and channels outweighs the speed gain.
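The third option, aborting every worker when the first one fails, can be sketched with the standard library alone. The names task and runAll are hypothetical; task 2 fails immediately, while the others wait until they are cancelled or their simulated work completes.

```go
package main

import (
    "context"
    "fmt"
    "sync"
    "time"
)

// task simulates work for one id: id 2 fails fast, the others run
// until they are cancelled or two seconds pass.
func task(ctx context.Context, id int) error {
    if id == 2 {
        return fmt.Errorf("task %d failed", id)
    }
    select {
    case <-ctx.Done():
        return ctx.Err() // aborted because a sibling failed
    case <-time.After(2 * time.Second):
        return nil
    }
}

// runAll starts one goroutine per id and cancels the rest on the first failure.
func runAll(ids []int) error {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    var (
        wg       sync.WaitGroup
        once     sync.Once
        firstErr error
    )
    for _, id := range ids {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            if err := task(ctx, id); err != nil {
                // Only the first failure is recorded; it also cancels the others.
                once.Do(func() {
                    firstErr = err
                    cancel()
                })
            }
        }(id)
    }
    wg.Wait()
    return firstErr
}

func main() {
    fmt.Println(runAll([]int{1, 2, 3})) // prints task 2 failed
}
```

The sync.Once guarantees that the root cause is reported rather than the context.Canceled errors the other workers return afterward. The golang.org/x/sync/errgroup package bundles a similar wait, first-error, and cancel combination if you prefer not to hand-roll it.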

Goroutines are cheap. Channels are not magic.

Where to go next

Goroutines are lightweight threads managed by the Go runtime. Most of the bugs covered here come from two sources: goroutines touching the same data at the same time without protection, and goroutines stuck waiting for something that never happens. Think of it like two people trying to write in the same notebook at once, or waiting for a phone call that never comes. An error returned inside a goroutine reaches the caller only if you route it there.