How to wait for goroutines with WaitGroup

Use `sync.WaitGroup` to track a set of goroutines by incrementing the counter before launching each one and decrementing it when finished, then call `Wait()` to block until the counter reaches zero.

The race to zero

Your main function finishes its last line of code and returns. The program exits. Three background goroutines that were still fetching data, writing to a file, or processing a queue get silently killed mid-execution. You stare at the terminal, wondering why your output is missing or why your database writes are incomplete.

This is one of the most common concurrency mistakes in Go. The language does not keep a process alive for background tasks. When main returns, the entire process terminates, regardless of how many goroutines are still running. You need a synchronization primitive to tell the main goroutine to pause until the background work finishes.

What a WaitGroup actually does

sync.WaitGroup is a counter designed specifically for this coordination problem. It tracks how many tasks are still running and blocks execution until that count drops to zero. Think of it like a bouncer at a venue with a mechanical clicker. Every person who walks through the door clicks the counter up. Every person who leaves clicks it down. The bouncer stays at the exit and refuses to lock the doors until the clicker reads zero.

The type lives in the standard library's sync package, short for synchronization. It does not use channels or a mutex. In the current implementation it relies on an atomic counter and a runtime semaphore under the hood. You increment it before spawning work, decrement it when work finishes, and call a blocking method to wait for the counter to reach zero.

WaitGroups are lightweight. They allocate a small amount of memory and avoid the overhead of creating channels for every single task. They are the standard tool for fan-out concurrency where you launch multiple independent workers and need to know when they are all done.

The minimal pattern

Here is the simplest way to coordinate three independent workers. The pattern follows a strict order: increment, spawn, wait.

package main

import (
	"fmt"
	"sync"
	"time"
)

// worker simulates a background task and signals completion.
func worker(id int, wg *sync.WaitGroup) {
	// Ensure the counter decrements even if the function panics.
	defer wg.Done()
	fmt.Printf("Worker %d starting\n", id)
	// Simulate network or disk I/O latency.
	time.Sleep(1 * time.Second)
	fmt.Printf("Worker %d done\n", id)
}

func main() {
	var wg sync.WaitGroup

	// Launch three independent tasks.
	for i := 1; i <= 3; i++ {
		// Increment the counter before the goroutine can possibly run.
		wg.Add(1)
		go worker(i, &wg)
	}

	// Block the main goroutine until the counter reaches zero.
	wg.Wait()
	fmt.Println("All workers finished")
}

The flow is deliberate. wg.Add(1) runs in the main goroutine before go worker(i, &wg). This guarantees the counter is already tracking the new task before the scheduler might switch to it. Inside worker, defer wg.Done() attaches the decrement to the function's return path. When worker finishes, the counter drops. wg.Wait() sits in the main goroutine and yields the CPU until the counter hits zero. Once it does, execution resumes and prints the final message.

Never call Add after go. The scheduler can run the new goroutine immediately. If it finishes and calls Done before the main goroutine calls Add, the counter drops below zero and the program panics. Always increment first.

How the runtime tracks the counter

The sync.WaitGroup struct is small. In recent Go releases it packs two 32-bit values into one 64-bit atomic word: the number of outstanding tasks and the number of goroutines currently blocked in Wait(). A separate semaphore parks and wakes the waiters. The exact layout is an implementation detail that has changed between releases, so do not depend on it. Add and Done update the word with atomic operations rather than a mutex. Add with a positive number raises the task count, and Done is simply Add(-1). When the task count reaches zero and at least one goroutine is waiting, all blocked goroutines are woken.

This design means WaitGroup is not a general-purpose counter. It is optimized for a specific lifecycle: start at zero, add tasks, wait for completion, return to zero. A WaitGroup must not be copied after first use. Copying does not panic, though. The copy is an independent counter, so a Done on the copy never reaches the original, and Wait on the original blocks forever (or the runtime reports a deadlock once every goroutine is asleep). The compiler cannot catch this because the struct looks like a normal value type. The copylocks check in go vet is the tool that flags it.

You will see the panic sync: negative WaitGroup counter if Done is called more times than Add. You may see sync: WaitGroup is reused before previous Wait has returned if Add for a new batch races with a Wait from the previous batch that has not returned yet. The sync package enforces these rules strictly because a corrupted counter breaks all downstream synchronization.
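The first of these errors is easy to reproduce. The sketch below calls Done once more than Add and recovers the panic only so that the message can be printed. The helper name extraDone is made up for this illustration, and recovering from this panic is not something production code should do.

```go
package main

import (
	"fmt"
	"sync"
)

// extraDone calls Done once more than Add and returns the recovered
// panic value as text. The recover is for demonstration only.
func extraDone() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()

	var wg sync.WaitGroup
	wg.Add(1)
	wg.Done() // counter: 1 -> 0
	wg.Done() // counter would become -1, so the sync package panics
	return "no panic"
}

func main() {
	fmt.Println(extraDone()) // prints: sync: negative WaitGroup counter
}
```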

Convention dictates naming the variable wg. It is short, universally recognized in Go codebases, and keeps the signature clean. Functions that accept a WaitGroup should take a pointer: func doWork(wg *sync.WaitGroup). Passing by value would copy the struct, and the copy is a separate counter: Done calls inside the function would never reach the caller's WaitGroup, so the caller's Wait would block forever. The pointer ensures all goroutines and the main function operate on the same counter.

Real-world: fetching URLs concurrently

Background tasks rarely sleep for a fixed duration. They usually make network calls, read files, or query databases. Here is how WaitGroup fits into a realistic HTTP client scenario.

package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
)

// fetch downloads a URL, discards the body, and prints whether it succeeded.
func fetch(url string, wg *sync.WaitGroup) {
	// Decrement the counter when this goroutine exits.
	defer wg.Done()
	resp, err := http.Get(url)
	if err != nil {
		fmt.Printf("failed to fetch %s: %v\n", url, err)
		return
	}
	// Close the body to release the underlying TCP connection.
	defer resp.Body.Close()
	_, err = io.ReadAll(resp.Body)
	if err != nil {
		fmt.Printf("failed to read %s: %v\n", url, err)
		return
	}
	fmt.Printf("successfully fetched %s\n", url)
}

func main() {
	var wg sync.WaitGroup
	targets := []string{
		"https://example.com",
		"https://golang.org",
		"https://httpbin.org/status/200",
	}

	// Launch a goroutine for each target URL.
	for _, u := range targets {
		wg.Add(1)
		go fetch(u, &wg)
	}

	// Wait for all HTTP requests to complete.
	wg.Wait()
	fmt.Println("all downloads finished")
}

Each goroutine handles its own error path. If http.Get fails, the function prints the error and returns. The defer wg.Done() ensures the counter still decrements, so the main goroutine does not hang forever. The same applies to io.ReadAll. Error handling in Go is explicit by design. You do not wrap every call in a recovery block. You check the error, log it or return it, and let the function exit normally. The defer handles the synchronization cleanup.

Notice the parameter naming. fetch takes a pointer to sync.WaitGroup named wg. It does not take a context here because the example is short-lived, but in production code you would pass context.Context as the first parameter. The convention is strict: context always goes first, conventionally named ctx. Functions that accept a context should respect cancellation and deadlines. If the parent context cancels, the goroutine should stop its work and return, still calling wg.Done() via defer.
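Here is a minimal sketch of that shape, using a ticker loop in place of real HTTP work so it runs without a network. The names poll and runUntil, and the tick interval, are assumptions made for this illustration.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// poll is a hypothetical worker: it does one unit of work per tick
// until its context is cancelled, then returns. The deferred Done
// runs on that return path.
func poll(ctx context.Context, wg *sync.WaitGroup, cancelled *int32) {
	defer wg.Done()
	ticker := time.NewTicker(5 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			atomic.AddInt32(cancelled, 1)
			return
		case <-ticker.C:
			// One unit of work would go here.
		}
	}
}

// runUntil starts n workers, lets the context expire after d, and
// returns how many workers observed the cancellation.
func runUntil(n int, d time.Duration) int {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()

	var wg sync.WaitGroup
	var cancelled int32
	for i := 0; i < n; i++ {
		wg.Add(1)
		go poll(ctx, &wg, &cancelled)
	}
	wg.Wait() // returns only after every worker has returned
	return int(atomic.LoadInt32(&cancelled))
}

func main() {
	fmt.Println("workers stopped by context:", runUntil(3, 30*time.Millisecond))
}
```

The WaitGroup and the context do different jobs here: the context tells the workers to stop, and the WaitGroup tells the caller when they actually have.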

Pitfalls and runtime traps

WaitGroups are simple, but they have sharp edges. The most common mistake is calling Add inside the goroutine instead of before it.

// BAD: race condition between Add and Wait
go func() {
	wg.Add(1)
	defer wg.Done()
	// do work
}()

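The new goroutine may not have run by the time the main goroutine reaches Wait. The counter is still zero, so Wait returns immediately and main moves on, or exits, while the work is still in flight. The sync documentation requires that an Add which starts from a zero counter happens before Wait, and the race detector can report code that violates this. Always call Add in the spawning goroutine, right before the go statement.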
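For contrast, a corrected version of the closure pattern might look like the sketch below. The function name runTasks and the atomic counter standing in for real work are assumptions for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runTasks launches n closures and returns how many had completed by
// the time Wait returned.
func runTasks(n int) int {
	var wg sync.WaitGroup
	var completed int32

	for i := 0; i < n; i++ {
		wg.Add(1) // GOOD: the spawning goroutine increments first.
		go func() {
			defer wg.Done()
			atomic.AddInt32(&completed, 1) // stand-in for real work
		}()
	}

	wg.Wait()
	return int(atomic.LoadInt32(&completed))
}

func main() {
	fmt.Println(runTasks(5)) // prints: 5
}
```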

Another trap is reusing a WaitGroup before the previous batch has fully drained. If Add for the next batch runs while a Wait from the previous batch has not yet returned, the sync package can panic with sync: WaitGroup is reused before previous Wait has returned. The fix is straightforward: declare a new sync.WaitGroup for each batch of work, or ensure Wait completes before you start the next batch.

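Copying a WaitGroup is a silent bug. The compiler sees a struct and allows the assignment, and nothing panics at the point of the copy. The copy simply counts on its own, so the original never reaches zero and Wait hangs. Run go vet, whose copylocks check reports such copies. Never pass a WaitGroup by value. Never store it by value in a struct that gets copied. Share it through a pointer.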
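The independence of a copy can be observed without deadlocking the program. The sketch below is a deliberately contrived probe: after a helper calls Done, the caller calls Done a second time, and that second call panics with a negative counter only if the helper's Done reached the original. All function names are made up for this illustration, and the by-value helper is exactly what go vet would flag.

```go
package main

import (
	"fmt"
	"sync"
)

// doneByValue receives a copy, so Done decrements only the copy.
// go vet's copylocks check reports this signature and its callers.
func doneByValue(wg sync.WaitGroup) { wg.Done() }

// doneByPointer decrements the caller's counter.
func doneByPointer(wg *sync.WaitGroup) { wg.Done() }

// originalDecremented reports whether the helper's Done reached the
// caller's WaitGroup. The probe is a second Done, which panics only
// when the counter is already zero.
func originalDecremented(byPointer bool) (reached bool) {
	defer func() {
		if recover() != nil {
			reached = true
		}
	}()

	var wg sync.WaitGroup
	wg.Add(1)
	if byPointer {
		doneByPointer(&wg)
	} else {
		doneByValue(wg)
	}
	wg.Done() // probe
	return false
}

func main() {
	fmt.Printf("by value: %v\n", originalDecremented(false))  // prints: by value: false
	fmt.Printf("by pointer: %v\n", originalDecremented(true)) // prints: by pointer: true
}
```

In a real program the by-value case would not show up as a tidy boolean. The original counter would stay at one and Wait would never return.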

Goroutine leaks happen when a goroutine blocks on a channel that never closes, or waits for a mutex that never releases. A WaitGroup does not prevent leaks. It only tracks completion. If a goroutine hangs, wg.Wait() hangs forever. Always design a cancellation path. Use context deadlines, select statements with timeouts, or ensure channels are closed by their producers. The worst goroutine bug is the one that never logs and never returns.
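One way to keep a hung worker from hanging the caller as well is to bound the wait. The sketch below is one possible shape, not a standard library API: waitTimeout and stuckThenReleased are hypothetical names, and the durations are arbitrary. Note the limitation stated in the comment: a timeout bounds how long the caller waits but does nothing to stop the stuck goroutine.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// waitTimeout reports whether wg.Wait returned within d. On timeout
// the helper goroutine stays blocked in Wait until the counter reaches
// zero, so this does not stop or clean up the stuck workers.
func waitTimeout(wg *sync.WaitGroup, d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(d):
		return false
	}
}

// stuckThenReleased waits on a worker that is blocked on a channel,
// then unblocks it and waits again.
func stuckThenReleased() (first, second bool) {
	var wg sync.WaitGroup
	release := make(chan struct{})

	wg.Add(1)
	go func() {
		defer wg.Done()
		<-release // simulates a worker stuck on a channel
	}()

	first = waitTimeout(&wg, 50*time.Millisecond) // worker still stuck
	close(release)                                // give it a way out
	second = waitTimeout(&wg, time.Second)
	return first, second
}

func main() {
	first, second := stuckThenReleased()
	fmt.Println(first, second) // prints: false true
}
```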

When to reach for WaitGroup

Use a sync.WaitGroup when you need to block until a fixed set of independent goroutines finishes. Use a channel when you need to collect results, stream data between stages, or signal completion with a payload. Use a context.Context with cancellation when you need to abort long-running tasks early. Use a mutex when multiple goroutines must safely read or write the same shared variable. Use plain sequential code when you do not need concurrency: the simplest thing that works is usually the right thing.

WaitGroups do not return values. They only coordinate timing. If you need to gather results, pair the WaitGroup with a channel or a slice protected by a mutex. The pattern is common: launch workers with a WaitGroup, send results to a buffered channel, close the channel when the WaitGroup hits zero, then range over the channel in the main goroutine.
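That result-gathering pattern can be sketched as follows. The function name sumSquares and the squaring workload are placeholders chosen for the example. The channel is buffered to the number of inputs so no sender blocks, and a separate goroutine closes it once the WaitGroup reaches zero.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares fans out one goroutine per input, collects the results
// from a channel, and returns their sum.
func sumSquares(nums []int) int {
	var wg sync.WaitGroup
	results := make(chan int, len(nums)) // buffered so senders never block

	for _, n := range nums {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			results <- n * n
		}(n)
	}

	// Close the channel after every sender has finished so the range
	// loop below terminates.
	go func() {
		wg.Wait()
		close(results)
	}()

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // prints: 30
}
```

Closing the channel from a dedicated goroutine, rather than from a worker, keeps the rule that only one party closes the channel and only after all sends are done.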

Recap

A WaitGroup acts like a counter that tracks how many background tasks are still running. You tell it to expect a task before you start it, and the task tells it when it's done. The main program pauses at the Wait() call until that counter hits zero, ensuring no tasks are left hanging.