Common Concurrency Bugs in Go and How to Find Them

The counter that lies to you

You write a web server that counts active users. You spin up a goroutine per request. The count looks right when you test with one browser tab. You open ten tabs, hit refresh, and the counter jumps to random numbers. Sometimes it is too low. Sometimes it is too high. The code runs without panicking, but the data is garbage. This is a data race. The program is lying to you, and the compiler will not catch it: a race is only visible at run time.

Concurrency bugs are insidious because they are non-deterministic. The same code might work perfectly for weeks and then fail on a Tuesday morning when the load spikes. The failure mode is rarely a crash. It is usually silent corruption: a balance that drifts, a cache that returns stale data, or a counter that loses increments. You cannot reason about the output by reading the code alone. You need tools and discipline to prove the code is safe.

What a data race actually is

A data race happens when two goroutines access the same memory location concurrently, at least one of the accesses is a write, and nothing orders the two accesses. That ordering comes from synchronization: a mutex, a channel, or an atomic operation that coordinates access.

Imagine two people editing the same line of a shared document without talking. One types "Hello", the other types "World". The result might be "HWeolrldo" or just "World" if one overwrites the other. The computer does not care about your intent. It executes instructions as fast as it can. If the instructions overlap on the same byte, the outcome is undefined.

In Go, a data race usually shows up as a corrupted value. The CPU and compiler are free to reorder instructions, cache values in registers, or split operations. A line like count++ looks like a single step, but it is a read-modify-write: load the value, add one, store it back. Another goroutine can run between any two of those steps. If Goroutine A reads 5, then Goroutine B reads 5, adds one, and writes 6, and then Goroutine A adds one to its stale 5 and also writes 6, you have lost an increment. Two increments ran, but the count only went up by one.
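
The lost update can be replayed by hand. The sketch below is deliberately single-threaded: it spells out the load, add, and store steps of the two goroutines in the order described above, so the result is deterministic. The function name lostUpdate is made up for this illustration; real goroutines interleave unpredictably.

```go
package main

import "fmt"

// lostUpdate replays, in a single goroutine, the interleaving in which
// two increments of the same variable collapse into one.
// The locals a and b stand in for the registers of goroutines A and B.
func lostUpdate(start int) int {
	count := start

	a := count // A loads the value
	b := count // B loads the same value

	b = b + 1 // B adds one
	count = b // B stores its result

	a = a + 1 // A adds one to its stale copy
	count = a // A stores, overwriting B's update

	return count
}

func main() {
	fmt.Println(lostUpdate(5)) // prints 6, although two increments ran
}
```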

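Deadlocks are a different class of bug. A deadlock occurs when two or more goroutines wait for each other forever. It is like two people in a narrow hallway both refusing to step back. The program stops making progress. The runtime only notices when every goroutine is blocked; it then aborts with fatal error: all goroutines are asleep - deadlock!, which is not a panic and cannot be recovered. If even one other goroutine is still runnable, such as an HTTP server loop, the deadlocked goroutines simply hang and nothing is reported.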

Minimal example: The broken counter

This example shows a data race in action. The code compiles and runs, but the result is incorrect.

package main

import (
	"fmt"
	"sync"
)

// Counter increments a shared value without protection.
// This function demonstrates a data race.
func Counter() {
	var count int
	var wg sync.WaitGroup

	// Launch two goroutines that modify the same variable.
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				// The read-modify-write cycle is not atomic.
				// Another goroutine can interrupt between read and write.
				count++
			}
		}()
	}

	// Wait for all goroutines to finish.
	wg.Wait()
	fmt.Println(count) // Expected 2000, actual result varies.
}

func main() {
	Counter()
}

The sync.WaitGroup makes Counter wait for the workers. Without it, Counter would print count, and the program could exit, before the workers finish. The bug is inside the loop. count++ is not atomic. Running this program multiple times can produce different results. You might see 1542, 1890, or 2000. With only 1000 iterations per goroutine, 2000 may even appear often, because the first goroutine can finish before the second one starts. A correct-looking result proves nothing: it appears only when the scheduler happens to run the goroutines one after the other.

How the race detector catches bugs

Go includes a built-in race detector. When you build with -race, the compiler instruments memory accesses, and a runtime library records them and checks for conflicts while the program runs. If two goroutines touch the same memory without synchronization during that run, and at least one access is a write, the detector prints a detailed report.

Run your tests with the -race flag:

go test -race ./...

The race detector is expensive. The Go documentation puts the typical cost at 2 to 20 times the execution time and 5 to 10 times the memory. That is usually too much for production. Use it in your CI pipeline and during development. If a race occurs while the tests run, the test fails and the report shows where the conflicting reads and writes happened. The output starts with WARNING: DATA RACE and lists the goroutines involved and their call stacks.

If you are running a binary, use the flag with go run or go build:

go run -race main.go

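The race detector is your safety net. It catches bugs that ordinary test runs miss, because it does not need the unlucky interleaving to produce a wrong result: it flags any pair of conflicting, unsynchronized accesses it observes. It can only observe code that actually runs, though. A race on a path your tests never exercise stays invisible, so run the detector against tests and workloads that use the concurrent code paths.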

Fixing races with mutexes

The standard fix for shared mutable state is a mutex. sync.Mutex provides mutual exclusion. Only one goroutine can hold the lock at a time. Other goroutines block until the lock is released.

Wrap the critical section with Lock and Unlock. The critical section is the code that touches shared state. Keep it small. Holding a lock while performing I/O blocks other goroutines and kills performance.

package main

import (
	"fmt"
	"sync"
)

// count is shared by all goroutines; mu guards it.
var (
	count int
	mu    sync.Mutex
)

// SafeCounter increments a shared value with a mutex.
// It ensures only one goroutine modifies the count at a time.
func SafeCounter() {
	var wg sync.WaitGroup

	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				// Acquire the lock before modifying shared state.
				mu.Lock()
				count++
				// Release the lock immediately after the update.
				mu.Unlock()
			}
		}()
	}

	wg.Wait()
	fmt.Println(count) // Always prints 2000.
}

func main() {
	SafeCounter()
}

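Use defer mu.Unlock() in larger functions to ensure the lock is released even if a panic occurs. Do not put it inside a loop body like the one above: deferred calls run when the surrounding function returns, not at the end of each iteration, so the second iteration would block on a lock the goroutine already holds. Either move the critical section into a small function and defer there, or call Unlock explicitly. Explicit Unlock is slightly faster because it avoids the defer overhead, but the difference is usually negligible. Correctness matters more than micro-optimizations.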
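
For a single counter, the atomic operations mentioned earlier are an alternative to a mutex. The following is a minimal sketch, assuming Go 1.19 or newer for the atomic.Int64 type; the function name AtomicCounter and its parameters are made up for this example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// AtomicCounter starts `workers` goroutines that each add one to a
// shared counter `perWorker` times, and returns the final value.
func AtomicCounter(workers, perWorker int) int64 {
	var count atomic.Int64 // assumption: Go 1.19 or newer
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				// Add performs the whole read-modify-write as one
				// indivisible operation, so no increment is lost.
				count.Add(1)
			}
		}()
	}

	wg.Wait()
	return count.Load()
}

func main() {
	fmt.Println(AtomicCounter(2, 1000)) // prints 2000
}
```

Atomics fit a single independent value. As soon as two values must change together, a mutex is the simpler tool.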

Realistic example: A thread-safe cache

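Real applications often need a cache. Maps are not safe for concurrent access. Unsynchronized writes from multiple goroutines are a data race, and when the runtime notices them it aborts the program with fatal error: concurrent map writes. This is a fatal error, not a panic, so recover cannot catch it. You must protect the map with a mutex.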

If the cache is read-heavy, use sync.RWMutex. It allows multiple readers to proceed concurrently while writers get exclusive access. This improves throughput for workloads where reads far outnumber writes.

package main

import (
	"sync"
)

// Cache provides thread-safe storage for key-value pairs.
// It uses a read-write mutex to allow concurrent reads.
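type Cache struct {
	mu    sync.RWMutex
	items map[string]string
}

// NewCache returns an empty cache.
// The map must be allocated before use: Set on a nil map panics.
func NewCache() *Cache {
	return &Cache{items: make(map[string]string)}
}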

// Get returns the value for key.
// It acquires a read lock to allow concurrent access.
func (c *Cache) Get(key string) string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.items[key]
}

// Set updates the value for key.
// It acquires a write lock to ensure exclusive access.
func (c *Cache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = value
}

The receiver name is usually one or two letters matching the type: (c *Cache), not (this *Cache) or (self *Cache). This is a community convention that keeps code concise.
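
One way to exercise a cache like this is to hit it from many goroutines at once and run the program with -race. The sketch below repeats the type so that it runs on its own. The helper fill is made up for this example, and NewCache stands in for whatever constructor allocates the map.

```go
package main

import (
	"fmt"
	"sync"
)

type Cache struct {
	mu    sync.RWMutex
	items map[string]string
}

// NewCache allocates the map so that Set never writes to a nil map.
func NewCache() *Cache {
	return &Cache{items: make(map[string]string)}
}

func (c *Cache) Get(key string) string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.items[key]
}

func (c *Cache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = value
}

// fill (hypothetical helper) stores n keys from n goroutines while
// n other goroutines read the same keys concurrently.
func fill(c *Cache, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		key := fmt.Sprintf("k%d", i)
		wg.Add(2)
		go func() {
			defer wg.Done()
			c.Set(key, "v")
		}()
		go func() {
			defer wg.Done()
			_ = c.Get(key) // may run before or after the matching Set
		}()
	}
	wg.Wait()
}

func main() {
	c := NewCache()
	fill(c, 100)
	fmt.Println(c.Get("k0"), c.Get("k99")) // prints "v v"
}
```

Removing the locks from Get and Set should make go run -race report the conflict, which is a quick check that the test workload really reaches the concurrent paths.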

Deadlocks and silent leaks

Deadlocks happen when goroutines wait for resources that never become available. A common pattern is holding a lock while waiting on a channel. If the channel sender also needs the lock, both goroutines block forever.

Another cause is inconsistent lock ordering. If Goroutine A holds Lock 1 and wants Lock 2, and Goroutine B holds Lock 2 and wants Lock 1, they deadlock. Always acquire locks in a consistent order across your codebase.
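
A consistent order can be enforced in code rather than by convention. The sketch below is one possible approach, not the only one: it assumes a hypothetical Account type with a unique id and always locks the account with the smaller id first. The names Account, Transfer, and run are all made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Account is a hypothetical type used only for this illustration.
type Account struct {
	id      int
	mu      sync.Mutex
	balance int
}

// Transfer moves amount between two distinct accounts. It always locks
// the account with the smaller id first, so two opposite transfers
// cannot each hold one lock while waiting for the other.
func Transfer(from, to *Account, amount int) {
	first, second := from, to
	if to.id < from.id {
		first, second = to, from
	}
	first.mu.Lock()
	defer first.mu.Unlock()
	second.mu.Lock()
	defer second.mu.Unlock()

	from.balance -= amount
	to.balance += amount
}

// run performs n transfers in each direction concurrently and returns
// the final balances.
func run(n int) (int, int) {
	a := &Account{id: 1, balance: 1000}
	b := &Account{id: 2, balance: 1000}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			Transfer(a, b, 1)
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < n; i++ {
			Transfer(b, a, 1)
		}
	}()
	wg.Wait()
	return a.balance, b.balance
}

func main() {
	x, y := run(1000)
	fmt.Println(x, y) // prints 1000 1000
}
```

If Transfer locked from before to instead, the two goroutines in run could each grab one lock and wait forever for the other.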

Goroutine leaks are silent killers. A goroutine starts but never finishes. It holds memory and resources. The application slows down over time until it runs out of memory. A common cause is a goroutine blocked on a channel that never receives a value and is never closed. Always have a cancellation path. Use context.Context to signal shutdown. Functions that take a context should respect cancellation and deadlines. By convention, context.Context is the first parameter and is named ctx.
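
A cancellation path usually means adding a ctx.Done() case to every blocking select. The sketch below shows one way to do it. The names worker and stopWorker are made up for this example, and the jobs channel is intentionally never written to or closed, which is exactly the situation that would otherwise leak the goroutine.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// worker consumes jobs until the channel is closed or ctx is cancelled.
// Without the ctx.Done() case it would block forever if nobody ever
// sends on or closes jobs.
func worker(ctx context.Context, jobs <-chan int, done chan<- string) {
	for {
		select {
		case _, ok := <-jobs:
			if !ok {
				done <- "jobs closed"
				return
			}
			// A real worker would process the job here.
		case <-ctx.Done():
			done <- "cancelled"
			return
		}
	}
}

// stopWorker starts a worker on a channel that never delivers anything,
// cancels the context, and reports why the worker exited.
func stopWorker() string {
	ctx, cancel := context.WithCancel(context.Background())
	jobs := make(chan int)       // never written to, never closed
	done := make(chan string, 1) // buffered so the worker never blocks on it

	go worker(ctx, jobs, done)
	cancel()

	select {
	case reason := <-done:
		return reason
	case <-time.After(time.Second):
		return "leaked"
	}
}

func main() {
	fmt.Println(stopWorker()) // prints "cancelled"
}
```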

Tools for finding bugs

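When the race detector passes but the application still misbehaves, use pprof to inspect goroutine stacks. In a program that already serves HTTP on the default mux, importing net/http/pprof registers the profiling endpoints. Fetch /debug/pprof/goroutine?debug=2 for a full text dump of every goroutine's stack, or point go tool pprof at /debug/pprof/goroutine to see the stacks aggregated. Look for goroutines blocked in states such as chan receive, or waiting to acquire a mutex.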

If you see many goroutines stuck on the same line, you have a bottleneck or a deadlock. If you see goroutines growing without bound, you have a leak.
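
The same dump is available without an HTTP endpoint through the runtime/pprof package. The following is a minimal sketch under that assumption; the helper names dumpGoroutines and hasBlockedReceiver are made up for this example. It parks one goroutine on a channel receive and then looks for that state in the dump.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
	"strings"
	"time"
)

// dumpGoroutines returns the stacks of all goroutines as text.
// Debug level 2 uses the same format as an unrecovered panic.
func dumpGoroutines() string {
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return ""
	}
	return buf.String()
}

// hasBlockedReceiver starts a goroutine that blocks on a channel
// receive, then polls the dump until that state shows up or one
// second has passed.
func hasBlockedReceiver() bool {
	ch := make(chan struct{})
	go func() { <-ch }()
	defer close(ch) // let the parked goroutine exit afterwards

	deadline := time.Now().Add(time.Second)
	for time.Now().Before(deadline) {
		if strings.Contains(dumpGoroutines(), "chan receive") {
			return true
		}
		time.Sleep(10 * time.Millisecond)
	}
	return false
}

func main() {
	fmt.Println("blocked receiver found:", hasBlockedReceiver())
	// A count that keeps growing between samples suggests a leak.
	fmt.Println("goroutines now:", runtime.NumGoroutine())
}
```

Logging runtime.NumGoroutine() periodically is a cheap first check for a leak before reaching for a full dump.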

The race detector is your first line of defense. Run it on every test suite. If it reports a race, fix the root cause. Do not suppress the warning. A race that never produces a wrong result in tests can still corrupt data in production under different timing conditions.

Decision matrix

Use the race detector (-race) when you suspect data corruption or non-deterministic test failures.
Use a mutex when multiple goroutines need to read and write the same variable.
Use a read-write mutex (RWMutex) when you have many readers and few writers.
Use a channel when one goroutine produces data and another consumes it.
Use a buffered channel when you want to decouple the producer from the consumer temporarily.
Use sync.WaitGroup when you need to wait for a fixed number of goroutines to finish.
Use context.Context when you need to cancel a long-running operation or pass deadlines.
Use pprof when your application hangs and you need to see where goroutines are stuck.
Use plain sequential code when you don't need concurrency: the simplest thing that works is usually the right thing.
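
The channel entries above can be sketched as a producer and a consumer. This is an illustration under stated assumptions, not a prescribed pattern: the function names produce and sum and the buffer parameter are made up for this example.

```go
package main

import "fmt"

// produce sends the numbers 1..n on a channel and closes it when done.
// Closing tells the consumer that no more values will arrive.
// A buffer of 0 makes every send wait for the consumer; a larger
// buffer lets the producer run ahead by that many values.
func produce(n, buffer int) <-chan int {
	out := make(chan int, buffer)
	go func() {
		defer close(out)
		for i := 1; i <= n; i++ {
			out <- i
		}
	}()
	return out
}

// sum consumes values until the channel is closed.
func sum(in <-chan int) int {
	total := 0
	for v := range in {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sum(produce(100, 0)))  // unbuffered: prints 5050
	fmt.Println(sum(produce(100, 16))) // buffered: prints 5050
}
```

Only the producer goroutine writes each value and only the consumer reads it, so no mutex is needed: the channel is the synchronization.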

Goroutines are cheap. Channels are not magic. Trust the race detector, and fix what it reports.

Where to go next

Start with the race detector. Add -race to the test command of a project you already have, then exercise the code paths that use goroutines, because the detector only sees accesses that actually run. It reports unsynchronized concurrent accesses to the same memory where at least one access is a write, the kind of bug that otherwise shows up as corrupted data or odd failures under heavy load.