The counter that breaks under pressure
You are building a metrics collector. Every incoming HTTP request increments a global counter. In development, you wrap the counter in a sync.Mutex. It works perfectly. You deploy to production. Traffic spikes. The profiler shows the mutex spinning. Goroutines park and unpark while waiting for the lock. Latency jumps. The counter is just a number. You do not need a heavy lock that blocks entire goroutines. You need a fast, hardware-level update.
That is what sync/atomic provides. It gives you lock-free operations on single values. The CPU guarantees the update happens as one indivisible step. No mutex overhead. No context switches. Just a direct instruction to the processor.
What atomic actually means
An atomic operation completes without interruption. If two goroutines try to update the same variable at the exact same moment, the CPU serializes the operations. One finishes, then the other starts. You never see a torn read or a lost update.
Think of a shared variable as a slot in a parking meter. You drop a coin, the display updates. You cannot drop half a coin. The update is atomic. A mutex is like a bouncer who lets only one person approach the meter at a time. The bouncer adds safety but also delay. Atomic operations remove the bouncer and rely on the meter's internal mechanism to handle collisions instantly.
In Go, sync/atomic exposes these hardware primitives. You can load, store, add, swap, and compare-and-swap values. The package functions work on fixed-size types: int32, int64, uint32, uint64, uintptr, and pointers. They do not work on slices, maps, or arbitrary structs. Those are complex structures that span multiple memory words. Atomic operations protect a single word.
The simplest atomic operation
Here is the most basic atomic workflow: add to a counter and read it back safely.
package main

import (
	"fmt"
	"sync/atomic"
)

// counter is a global variable. Go initializes it to zero.
// We use int64 because atomic operations require a fixed size.
var counter int64

func main() {
	// Add 1 atomically. The CPU guarantees this read-modify-write
	// happens as a single step, even if other goroutines run.
	atomic.AddInt64(&counter, 1)

	// Load the value safely. Without atomic load, the compiler
	// might cache the value in a register and miss updates.
	val := atomic.LoadInt64(&counter)
	fmt.Println(val)
}
Atomic functions always take pointers. You pass &counter, not counter. The function needs the memory address to modify the value in place. The compiler enforces this. If you pass a value, the program will not compile.
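A single-goroutine example cannot show the guarantee at work. The sketch below runs the same AddInt64 call from many goroutines at once. The helper name countConcurrently and the worker counts are made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countConcurrently is a hypothetical helper for this sketch. It starts
// `workers` goroutines, each of which atomically increments a shared
// counter `perWorker` times, and returns the final value.
func countConcurrently(workers, perWorker int) int64 {
	var counter int64
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				// Each increment is one indivisible read-modify-write.
				atomic.AddInt64(&counter, 1)
			}
		}()
	}

	// Wait for every worker before reading the result.
	wg.Wait()
	return atomic.LoadInt64(&counter)
}

func main() {
	// 100 workers x 1000 increments each.
	fmt.Println(countConcurrently(100, 1000)) // prints 100000
}
```

If you replace the atomic add with a plain counter++, the program has a data race and the total will usually come out lower. Running it with go run -race reports the problem.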
Why direct reads and writes fail
Go compilers optimize aggressively. If you read a variable in a loop, the compiler might hoist the read outside the loop. It assumes the value does not change. This is a valid optimization for local variables. It breaks shared state.
atomic.LoadInt64 acts as a memory barrier. A barrier tells the compiler and CPU not to reorder memory operations across it, and it forces the value to be read from memory instead of a cached register. Without an atomic load, the read is a data race. A goroutine might see a stale value indefinitely. The program hangs or makes wrong decisions.
The same rule applies to writes. atomic.StoreInt64 publishes the value so that a later atomic load in another goroutine observes it. Direct assignment guarantees neither that visibility nor that ordering.
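Here is a minimal sketch of the load and store pairing, using a stop flag that one goroutine sets and another polls. The helper name workerStops and its millisecond parameters are invented for this example, and runtime.Gosched stands in for real work.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// workerStops is a hypothetical helper for this sketch. It starts a worker
// that polls a stop flag, sets the flag after runMs milliseconds, and
// reports whether the worker exited within timeoutMs milliseconds.
func workerStops(runMs, timeoutMs int) bool {
	var stop int32 // 0 = keep running, 1 = stop
	done := make(chan struct{})

	go func() {
		defer close(done)
		// The atomic load is re-evaluated on every iteration, so the
		// worker observes the store made by the other goroutine.
		for atomic.LoadInt32(&stop) == 0 {
			runtime.Gosched() // stand-in for real work
		}
	}()

	time.Sleep(time.Duration(runMs) * time.Millisecond)
	atomic.StoreInt32(&stop, 1) // publish the stop request

	select {
	case <-done:
		return true
	case <-time.After(time.Duration(timeoutMs) * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println(workerStops(10, 1000))
}
```

Polling a flag like this is only reasonable when the worker has other work to do between checks. If the goroutine would just wait, a channel is the better signal.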
Convention aside: Always use int64 or uint64 for atomic counters. Never use int. The size of int depends on the architecture: on a 32-bit system, int is 32 bits. atomic.AddInt64 takes a *int64, so the compiler rejects a *int with a type error. Still, make it a design habit to pick the fixed-width type from the start. Using int64 keeps your code portable across platforms.
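Since Go 1.19 the package also ships typed wrappers such as atomic.Int64 and atomic.Bool. They fix the width in the type itself and only allow atomic access through their methods. The sketch below is one way to use them. The Stats struct and the record helper are hypothetical names.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Stats is a hypothetical struct for this sketch. The typed fields can
// only be read or written through their atomic methods.
type Stats struct {
	healthy  atomic.Bool
	requests atomic.Int64
}

// record adds n requests one at a time and returns the running total.
func record(s *Stats, n int) int64 {
	for i := 0; i < n; i++ {
		s.requests.Add(1)
	}
	return s.requests.Load()
}

func main() {
	var s Stats // the zero value is ready to use: false and 0
	s.healthy.Store(true)
	fmt.Println(record(&s, 3), s.healthy.Load()) // prints: 3 true
}
```

The typed form makes it impossible to mix an atomic write with an accidental plain read of the same field, because the field has no plain form. Pass such structs by pointer. Copying one copies the counter.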
Atomic operations are fast. They are not a replacement for thinking about data flow.
Coordinating with Compare-And-Swap
Adding a number is straightforward. Real coordination often requires checking a value before updating it. You want to increment a counter only if it is below a limit. You want to swap a pointer only if it points to a specific object.
CompareAndSwapInt64 does this. It takes three arguments: the address, the expected old value, and the new value. The function checks the current value. If it matches the expected old value, it writes the new value and returns true. If the value changed, it does nothing and returns false.
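A short single-goroutine sketch makes the two outcomes concrete. The helper name casDemo and the values 5, 6, and 7 are arbitrary choices for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// casDemo is a hypothetical helper for this sketch. It performs one
// swap that matches the expected value and one that does not.
func casDemo() (first, second bool, final int64) {
	var v int64 = 5

	// v is 5 and we expect 5, so the swap happens: v becomes 6.
	first = atomic.CompareAndSwapInt64(&v, 5, 6)

	// v is now 6 but we still expect 5, so nothing is written.
	second = atomic.CompareAndSwapInt64(&v, 5, 7)

	return first, second, atomic.LoadInt64(&v)
}

func main() {
	fmt.Println(casDemo()) // prints: true false 6
}
```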
This is the building block of lock-free algorithms. You load a value, compute the update, and try to swap. If the swap fails, another goroutine changed the value. You retry with the new value.
Here is a rate limiter that uses Compare-And-Swap to protect the limit without a mutex.
package main

import (
	"sync/atomic"
)

// RateLimiter tracks requests using a single atomic integer.
// It stops admitting requests once count reaches limit.
type RateLimiter struct {
	count int64
	limit int64
}

// Allow checks if the request is under the limit.
// It returns true if the request is allowed, false otherwise.
func (rl *RateLimiter) Allow() bool {
	for {
		// Load the current count to check against the limit.
		// We must load inside the loop to get the latest value.
		current := atomic.LoadInt64(&rl.count)
		if current >= rl.limit {
			return false
		}
		// Attempt to swap. If another goroutine changed the value
		// between the load and this swap, this returns false and
		// the loop retries with the new value.
		if atomic.CompareAndSwapInt64(&rl.count, current, current+1) {
			return true
		}
	}
}
The loop is essential. CompareAndSwap fails whenever another goroutine updated the value between your load and your swap. Go's CompareAndSwap does not fail spuriously, so a false result always means the value really changed. The loop reloads and retries until the swap succeeds or the limit is reached. This pattern is called a retry loop. It is the standard way to use CAS.
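To see the limiter hold its limit under contention, you can fire more concurrent requests than it allows and count the admissions. The sketch below repeats the RateLimiter type so it runs on its own. The countAllowed helper and the numbers are made up for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// RateLimiter is repeated from the example above so this block is
// self-contained.
type RateLimiter struct {
	count int64
	limit int64
}

// Allow admits a request only while count is below limit.
func (rl *RateLimiter) Allow() bool {
	for {
		current := atomic.LoadInt64(&rl.count)
		if current >= rl.limit {
			return false
		}
		if atomic.CompareAndSwapInt64(&rl.count, current, current+1) {
			return true
		}
	}
}

// countAllowed is a hypothetical helper. It issues `requests` concurrent
// calls to Allow and returns how many were admitted.
func countAllowed(limit int64, requests int) int64 {
	rl := &RateLimiter{limit: limit}
	var allowed int64
	var wg sync.WaitGroup

	for i := 0; i < requests; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if rl.Allow() {
				atomic.AddInt64(&allowed, 1)
			}
		}()
	}

	wg.Wait()
	return atomic.LoadInt64(&allowed)
}

func main() {
	fmt.Println(countAllowed(10, 100)) // prints 10
}
```

No matter how the goroutines interleave, exactly 10 calls succeed, because each successful swap moves count forward by one and the check against limit happens on the value that was swapped.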
Modern Go provides atomic.Pointer[T] for type-safe atomic pointers. Since Go 1.19, you can use atomic.Pointer instead of unsafe.Pointer. It wraps a pointer in a generic type and provides Load, Store, CompareAndSwap, and Swap methods. You get atomic safety without bypassing the type system. Use atomic.Pointer for atomic references to structs or interfaces.
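One common use is swapping an immutable configuration snapshot. The sketch below assumes a hypothetical Config struct and ConfigStore wrapper. Only atomic.Pointer and its Load, Store, and CompareAndSwap methods come from the standard library.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Config is a hypothetical settings struct. Treat each value as
// immutable once it has been published.
type Config struct {
	Timeout int
	Region  string
}

// ConfigStore is a hypothetical wrapper around an atomic pointer.
type ConfigStore struct {
	p atomic.Pointer[Config]
}

// NewConfigStore publishes a copy of the initial configuration.
func NewConfigStore(initial Config) *ConfigStore {
	s := &ConfigStore{}
	s.p.Store(&initial)
	return s
}

// Get returns the current snapshot. Callers must not modify it.
func (s *ConfigStore) Get() *Config {
	return s.p.Load()
}

// SetTimeout copies the current snapshot, changes one field, and
// publishes the copy with a CAS retry loop.
func (s *ConfigStore) SetTimeout(timeout int) {
	for {
		old := s.p.Load()
		next := *old // copy, so readers of old are unaffected
		next.Timeout = timeout
		if s.p.CompareAndSwap(old, &next) {
			return
		}
	}
}

func main() {
	store := NewConfigStore(Config{Timeout: 30, Region: "eu"})
	snapshot := store.Get()
	store.SetTimeout(60)
	fmt.Println(snapshot.Timeout, store.Get().Timeout) // prints: 30 60
}
```

Readers that loaded the old pointer keep a consistent view. The struct behind the pointer is never modified in place, which is what makes the single-word swap sufficient.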
CAS is powerful. It requires a loop. If you forget the loop and ignore the return value, a failed swap silently drops your update.
The hidden cost of atomic operations
Atomic operations are cheap, but they are not free. Every atomic read or write touches the CPU cache line that holds the variable. When multiple cores modify the same cache line, the hardware constantly invalidates and refreshes that line across cores. This is called cache line bouncing.
If you pack several atomic counters into the same struct, they will likely share a cache line. Updating one counter invalidates the cache line for the other cores, forcing them to fetch it again even though they touch a different counter. This is called false sharing, and throughput can drop sharply. The fix is padding. Place each atomic variable in its own cache line by adding unused bytes around it, or keep them in separate structs.
Modern CPUs handle this well for moderate contention. You only notice the penalty when thousands of goroutines hammer the exact same variable. Even then, atomic operations usually beat mutexes. The overhead comes from cache coherency, not from the atomic instruction itself.
Where atomic operations go wrong
Atomic operations are low-level. They expose hardware behavior. Misuse leads to subtle bugs.
If you pass a plain int to atomic.AddInt64, the compiler rejects it with cannot use counter (variable of type int) as *int64 value in argument. Atomic operations require a fixed-size type and a pointer. The compiler catches type mismatches. It does not catch logic errors.
A common mistake is using atomic operations for complex types. You cannot atomically append to a slice. You cannot atomically update a map. Slices and maps are headers that point to underlying data. Updating the header atomically does not protect the data. Two goroutines can read the same slice header, append independently, and overwrite each other's changes. For slices, maps, or structs, use a sync.Mutex.
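For comparison, here is a minimal sketch of the mutex approach for a shared slice. The EventLog type and the appendConcurrently helper are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// EventLog is a hypothetical type that guards a slice with a mutex.
type EventLog struct {
	mu     sync.Mutex
	events []string
}

// Append adds one event. The lock covers both the read of the slice
// header and the write of the new header returned by append.
func (l *EventLog) Append(event string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.events = append(l.events, event)
}

// Len returns the number of stored events.
func (l *EventLog) Len() int {
	l.mu.Lock()
	defer l.mu.Unlock()
	return len(l.events)
}

// appendConcurrently is a hypothetical helper. It appends n events from
// n goroutines and returns the final length.
func appendConcurrently(n int) int {
	log := &EventLog{}
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			log.Append(fmt.Sprintf("event-%d", id))
		}(i)
	}

	wg.Wait()
	return log.Len()
}

func main() {
	fmt.Println(appendConcurrently(500)) // prints 500
}
```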
Another pitfall is forgetting to load atomically. Direct reads are not safe just because the writes are atomic. The compiler may cache the value. The CPU may reorder loads. You must use atomic.Load for every read of a shared atomic variable.
Goroutine leaks are not a direct risk with sync/atomic. Atomics do not block. However, retry loops can spin. If contention is extremely high, a goroutine in a CAS loop can keep losing the race and burn CPU cycles without making progress itself. Every failed CAS means some other goroutine succeeded, so the system as a whole still advances, but an individual goroutine can starve. In practice, CAS loops are efficient. The CPU handles contention well. If you see high CPU usage, consider reducing contention or switching to a mutex.
Atomic protects a word. It does not protect a structure. Use a mutex for anything larger than a single value.
Choosing the right synchronization primitive
Use sync/atomic when you need to update a single counter, flag, or pointer with minimal overhead, even under high contention. Use sync.Mutex when you need to protect a group of variables, a complex data structure, or a critical section that spans multiple operations. Use a channel when you need to coordinate data flow between goroutines or signal events, rather than just sharing a value. Use sync.RWMutex when you have many readers and few writers, and the read operation is expensive enough to justify the lock overhead. Use plain sequential code when you don't need concurrency. The simplest thing that works is usually the right thing.