How to Use sync.Cond in Go

The barrier problem

You are building a simulation where five agents must start their tasks at exactly the same moment. One agent prepares the shared environment, and the other four wait. If the waiters start early, they read uninitialized data and the simulation corrupts. If they wait forever, the program hangs. You need a way to pause goroutines until a specific state change happens, then wake them up instantly.

A channel is not the natural fit here. Channels pass values, and closing one can announce a one-off event, but this simulation coordinates on a shared boolean flag that lives next to the data it guards, under the same mutex. Polling with time.Sleep wastes CPU and adds latency. You need a primitive that puts goroutines to sleep until a condition becomes true, then wakes them without any polling.

That is what sync.Cond provides. It is a coordination tool for goroutines that need to wait on a shared state variable.

What sync.Cond actually does

sync.Cond pairs a lock with a wait list. It holds a sync.Locker (usually a *sync.Mutex, stored in the field L) and manages a queue of goroutines that are blocked waiting for a signal. The lock protects the shared state. The condition variable manages the waiters.

Think of it like a bouncer at a club. The bouncer holds the door open or closed using a key (the mutex). People line up outside (the wait list). When the bouncer decides the club is ready, they signal the line to move. The key difference from a channel is that sync.Cond allows multiple waiters to wake up based on a shared state variable, rather than passing a message through a pipe.

The pattern always follows the same rhythm: lock the mutex, check the condition, wait if the condition is false, unlock when done. The Wait() method is the heart of the primitive. It atomically unlocks the mutex and suspends the goroutine. This atomic step is critical. If the unlock and the sleep were separate steps, a signal could arrive in between, and the waiter would miss it forever.

Goroutines are cheap. Channels are not magic.

Minimal example

Here is the simplest pattern: lock, check in a loop, wait, unlock.

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.Mutex
	// NewCond requires a sync.Locker; the mutex implements the interface
	cond := sync.NewCond(&mu)
	ready := false

	// Waiter blocks until ready is true
	go func() {
		mu.Lock()
		// Loop is mandatory: cond.Wait() can return even if the condition is false
		for !ready {
			// Atomically unlocks mu and suspends the goroutine
			cond.Wait()
		}
		// Condition is true, proceed with critical section
		fmt.Println("Waiter woke up")
		mu.Unlock()
	}()

	// Signaler sets the condition and wakes one waiter
	time.Sleep(100 * time.Millisecond)
	mu.Lock()
	ready = true
	// Wakes one goroutine from the wait list
	cond.Signal()
	mu.Unlock()

	time.Sleep(100 * time.Millisecond)
}

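The for loop around cond.Wait() is not a suggestion. It is a requirement. Unlike condition variables in some other systems, such as POSIX threads, Go's Wait does not wake spuriously: it returns only after a Signal or Broadcast. The loop is still necessary because the mutex is not held while the goroutine sleeps. Between the wakeup and the moment Wait re-acquires the lock, another goroutine can change the state again, and a Broadcast can wake waiters whose condition is not satisfied. The loop re-checks the condition after every wakeup. If the condition is still false, the goroutine goes back to sleep.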

The loop is not optional. It is the only defense against acting on stale state.
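
The loop also guards against an everyday hazard: another goroutine consuming the state between the wakeup and the lock re-acquisition. The following is a minimal sketch of that situation, not a production queue. The names queue, put, take, and consumeTwo are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// queue is a hypothetical shared buffer guarded by mu.
type queue struct {
	mu    sync.Mutex
	cond  *sync.Cond
	items []int
}

func newQueue() *queue {
	q := &queue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// put appends one item under the lock, then wakes every waiter,
// even though only one of them can take the item.
func (q *queue) put(v int) {
	q.mu.Lock()
	q.items = append(q.items, v)
	q.mu.Unlock()
	q.cond.Broadcast()
}

// take blocks until an item is available. The for loop re-checks the
// predicate because another consumer may have emptied the queue before
// this goroutine re-acquired the lock.
func (q *queue) take() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.items) == 0 {
		q.cond.Wait()
	}
	v := q.items[0]
	q.items = q.items[1:]
	return v
}

// consumeTwo starts two consumers, feeds them two items, and returns the sum.
func consumeTwo() int {
	q := newQueue()
	results := make(chan int, 2)
	for i := 0; i < 2; i++ {
		go func() { results <- q.take() }()
	}
	q.put(1)
	q.put(2)
	a, b := <-results, <-results
	return a + b
}

func main() {
	fmt.Println(consumeTwo()) // prints 3
}
```

If take used an if instead of a for, a consumer woken by the first Broadcast could find the queue already emptied by the other consumer and index an empty slice.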

The atomic unlock

When a goroutine calls cond.Wait(), two things happen atomically. The mutex unlocks, and the goroutine moves to the wait list and suspends. This atomic step prevents race conditions where a signal could arrive between the check and the sleep.

When cond.Signal() is called, one goroutine is removed from the wait list. That goroutine wakes up, re-acquires the mutex, and returns from Wait(). The mutex is locked again when Wait() returns. This ensures the waiter can safely inspect the shared state immediately after waking.

The re-locking is subtle but vital. If Wait() returned with the mutex unlocked, the waiter would have a window where the signaler could change the state again before the waiter reads it. By re-locking before returning, sync.Cond guarantees that the waiter holds the lock when it resumes execution.

cond.Broadcast() works the same way but wakes all goroutines on the wait list. Use Broadcast when multiple waiters need to react to the state change. Use Signal when only one waiter needs to proceed. Broadcasting when only one waiter is needed causes a thundering herd problem. All waiters wake up, compete for the lock, and block again. This wastes CPU cycles.
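
The barrier from the opening section can be sketched with Broadcast. This is an illustration under assumptions: runGate, env, and started are hypothetical names, and a string stands in for the shared environment.

```go
package main

import (
	"fmt"
	"sync"
)

// runGate starts n waiting agents, opens the gate once, and returns how
// many agents observed the prepared environment.
func runGate(n int) int {
	var (
		mu      sync.Mutex
		ready   bool
		env     string // stands in for the shared environment
		started int
		wg      sync.WaitGroup
	)
	cond := sync.NewCond(&mu)

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			for !ready {
				cond.Wait()
			}
			// The lock is held here, so reading env is safe.
			if env == "initialized" {
				started++
			}
			mu.Unlock()
		}()
	}

	// The preparing agent publishes the state under the lock,
	// then wakes every waiter at once.
	mu.Lock()
	env = "initialized"
	ready = true
	cond.Broadcast()
	mu.Unlock()

	wg.Wait()
	return started
}

func main() {
	fmt.Println(runGate(4)) // prints 4
}
```

Replacing Broadcast with Signal in this sketch would release only a waiter that is already on the wait list, and any other agent already waiting would sleep forever.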

Holding the lock while signaling is optional. Changing the state under the lock is not.

Realistic example: Batch processor

Here is a realistic use case: a batch processor that collects items and signals consumers when a batch is full.

// BatchProcessor groups items until a threshold is reached
type BatchProcessor struct {
	mu    sync.Mutex
	cond  *sync.Cond
	items []string
	size  int
}

// NewBatchProcessor initializes the struct and binds the condition to the mutex
func NewBatchProcessor(size int) *BatchProcessor {
	bp := &BatchProcessor{size: size}
	// sync.NewCond requires a sync.Locker; the mutex provides the lock
	bp.cond = sync.NewCond(&bp.mu)
	return bp
}

// Add appends an item and broadcasts if the batch is ready
func (bp *BatchProcessor) Add(item string) {
	bp.mu.Lock()
	defer bp.mu.Unlock()
	bp.items = append(bp.items, item)
	// Broadcast wakes all waiters; use Signal if only one consumer exists
	if len(bp.items) >= bp.size {
		bp.cond.Broadcast()
	}
}

The receiver name bp follows the Go convention of one or two letters matching the type. gofmt formats the struct fields consistently. The Add method holds the lock while modifying the slice and signaling. This ensures the waiter sees the updated slice length.

// WaitBatch blocks until enough items arrive, then returns the batch
func (bp *BatchProcessor) WaitBatch() []string {
	bp.mu.Lock()
	defer bp.mu.Unlock()
	// Loop is required: Broadcast wakes all waiters, and another consumer may drain the batch first
	for len(bp.items) < bp.size {
		bp.cond.Wait()
	}
	// Copy items to avoid sharing the internal slice
	batch := make([]string, len(bp.items))
	copy(batch, bp.items)
	bp.items = bp.items[:0]
	return batch
}

The WaitBatch method copies the items before returning. This prevents the caller from holding a reference to the internal slice, which would cause data races when the next batch arrives. The slice reset bp.items = bp.items[:0] reuses the underlying array to avoid allocations.

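Modern Go code often pairs sync.Cond with context.Context to support cancellation. sync.Cond itself does not support deadlines or cancellation: a goroutine in Wait returns only after a Signal or Broadcast. To cancel a wait, arrange for a Broadcast when the context is done (context.AfterFunc, added in Go 1.21, is one way) and have the wait loop check ctx.Err() alongside the condition. Running the Wait loop in a helper goroutine and selecting on ctx.Done() also works, but the helper stays blocked until someone signals it, so it can leak.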
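
One way to make a wait cancellable is sketched below. It assumes Go 1.21 or later for context.AfterFunc and follows the shape of the example in the context package documentation. The names waitReady and timedOut are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// waitReady blocks until *ready is true or ctx is done.
// The caller must hold cond.L, just as for a plain Wait loop.
func waitReady(ctx context.Context, cond *sync.Cond, ready *bool) error {
	// When ctx is done, wake all waiters so they can notice.
	// Taking the lock means the Broadcast cannot slip in between
	// a waiter's ctx check and its call to Wait.
	stop := context.AfterFunc(ctx, func() {
		cond.L.Lock()
		defer cond.L.Unlock()
		cond.Broadcast()
	})
	defer stop()

	for !*ready {
		if err := ctx.Err(); err != nil {
			return err
		}
		cond.Wait()
	}
	return nil
}

// timedOut waits on a condition that never becomes true and reports
// whether the wait ended because the deadline passed.
func timedOut() bool {
	var mu sync.Mutex
	cond := sync.NewCond(&mu)
	ready := false // never set in this sketch, so the wait must time out

	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	mu.Lock()
	defer mu.Unlock()
	err := waitReady(ctx, cond, &ready)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(timedOut()) // prints true
}
```

The cancellation Broadcast wakes every waiter on the condition, not just the cancelled one, so the other waiters rely on their own loops to go back to sleep.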

Context is plumbing. Run it through every long-lived call site.

Pitfalls and panics

Calling cond.Wait() without holding the lock crashes the program. The compiler cannot catch this. Wait begins by unlocking cond.L, and unlocking a sync.Mutex that is not locked aborts with fatal error: sync: unlock of unlocked mutex, which recover cannot intercept. Always hold the lock before calling Wait. The lock must be the same one passed to sync.NewCond. Holding a different mutex does not help, because Wait unlocks cond.L, not the mutex you happen to hold.

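Calling Signal() or Broadcast() without the lock is allowed. The sync documentation says the caller may, but need not, hold cond.L during the call. What is not optional is changing the shared state under the lock. If the signaler flips the flag without locking, the change and the signal can land between a waiter's check and its call to Wait, and the wakeup is lost. Holding the lock while both modifying the condition and signaling is a common convention because it keeps the two steps together and is easy to review.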

The worst goroutine bug is the one that never logs.

Spurious wakeups are not the reason for the loop. In Go, Wait returns only after a Signal or Broadcast. The reason is that a wakeup is not a guarantee: the condition can be false again by the time the woken goroutine holds the lock, because another goroutine got there first or because a Broadcast woke more waiters than the state can satisfy. This is why the condition check must always be in a loop. A single if statement is a bug waiting to happen.

Lost wakeups happen when a signal arrives before the waiter calls Wait(). Signal and Broadcast are not remembered: with nobody on the wait list they do nothing, and a waiter that then calls Wait unconditionally blocks forever. The loop pattern prevents this as long as the state is changed under the lock and checked under the lock before waiting. The condition must reflect the true state of the shared data. If the condition is already true when the waiter starts, the loop exits immediately without waiting.
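
A small sketch of the signal-before-wait case follows. The function name lateWaiter and the waits counter are hypothetical and exist only to make the behavior observable.

```go
package main

import (
	"fmt"
	"sync"
)

// lateWaiter signals before any goroutine is waiting, then starts a waiter.
// It returns how many times the waiter had to call Wait.
func lateWaiter() int {
	var mu sync.Mutex
	cond := sync.NewCond(&mu)
	ready := false
	waits := 0

	// The signal fires first. Nobody is waiting, so Signal does nothing.
	// What persists is the state change made under the lock.
	mu.Lock()
	ready = true
	cond.Signal()
	mu.Unlock()

	done := make(chan struct{})
	go func() {
		mu.Lock()
		for !ready { // already true, so Wait is never called
			waits++
			cond.Wait()
		}
		mu.Unlock()
		close(done)
	}()

	<-done
	return waits
}

func main() {
	fmt.Println(lateWaiter()) // prints 0
}
```

Had the waiter called cond.Wait() without first checking ready, it would block forever, because the only Signal has already come and gone.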

A signal is only a nudge. The shared state is the source of truth.

When to use sync.Cond

Use a channel when you need to pass a value between goroutines or coordinate a one-off event. Use sync.Cond when multiple goroutines need to wait on a shared state variable that changes over time. Use a sync.WaitGroup when you need to wait for a fixed number of goroutines to finish, not for a state change. Use a polling loop with time.Sleep when the condition is rare and latency doesn't matter, though this wastes CPU. Use context.Context with a channel when you need to support cancellation or deadlines for the wait operation.
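
For the one-off event case, closing a channel is a common alternative to a condition variable. The sketch below is illustrative only. The names runChannelGate, start, and env are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// runChannelGate releases n waiting goroutines by closing a channel and
// returns how many of them saw the prepared value.
func runChannelGate(n int) int {
	start := make(chan struct{})
	env := "" // written once before close(start), read only after it
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		seen int
	)

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start // blocks until the channel is closed
			if env == "initialized" {
				mu.Lock()
				seen++
				mu.Unlock()
			}
		}()
	}

	env = "initialized"
	close(start) // every receiver, present and future, observes the close
	wg.Wait()
	return seen
}

func main() {
	fmt.Println(runChannelGate(4)) // prints 4
}
```

A closed channel cannot be reopened, so this works only for an event that happens once. State that flips back and forth over time is where sync.Cond fits better.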

sync.Cond is a low-level primitive. When it appears, it is usually hidden inside a higher-level type rather than exposed in an API. It is not what sync.RWMutex or channels are built on; those rely on the runtime's own semaphores and wait queues. You rarely use it directly unless you are building a custom synchronization primitive or a barrier.

Recap

sync.Cond lets goroutines pause until shared state changes. One goroutine changes the state under the lock and calls Signal or Broadcast. The others check the state in a loop and call Wait for as long as it is not what they need. Reach for it when several goroutines must sleep efficiently until another part of the program finishes a task or changes a state.