The 16-Byte Mystery
You define a struct with a single boolean field. You expect it to take one byte. You check the memory usage and it's 16 bytes. You think Go is wasteful. You're not wrong about the number, but you're wrong about the reason. Go isn't padding your struct for alignment. The allocator is packing your boolean into a shared 16-byte slot alongside other tiny objects. This trade-off sacrifices a few bytes of precision to gain massive speed and reduce fragmentation. The memory manager has a strategy, and it routes every allocation through one of three paths based on size.
How the Allocator Routes Your Data
Go splits memory requests into three buckets: tiny, small, and large. The cutoff points are fixed in the runtime. Tiny objects are smaller than 16 bytes and contain no pointers. Small objects are everything else up to 32KB. Large objects exceed 32KB. Each path uses a different mechanism to balance allocation speed, memory waste, and garbage collection overhead.
Think of a warehouse. Tiny items go into a shared bin where the worker packs multiple items into a single box. Small items go into pre-sized envelopes. The worker grabs the nearest envelope size, drops the item in, and seals it. Large items skip the envelopes entirely. The worker pulls a custom-sized crate straight from the main stockroom and calls the supplier's truck only when the stockroom runs empty.
The tiny path packs objects to minimize waste. The small path uses size classes to enable instant reuse. The large path bypasses the size classes and takes whole pages straight from the heap, which asks the operating system for more memory only when it has to grow. This design keeps the common case fast and prevents the heap from fragmenting into unusable gaps.
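The routing rule can be sketched as a small function. This is an illustrative model, not the runtime's code: allocPath, maxTinySize, and maxSmallSize are hypothetical names chosen for this example, and the thresholds assume current 64-bit Go releases, where the tiny path is only taken by objects that contain no pointers. Edge cases such as zero-size allocations are ignored.

```go
package main

import "fmt"

// Hypothetical constants for this sketch, mirroring the cutoffs in the text.
const (
	maxTinySize  = 16       // tiny path: strictly smaller than this, and pointer-free
	maxSmallSize = 32 << 10 // small path: up to and including 32KB
)

// allocPath reports which path a heap allocation of the given size would
// take in this simplified model. hasPointers says whether the type
// contains pointers.
func allocPath(size uintptr, hasPointers bool) string {
	switch {
	case size < maxTinySize && !hasPointers:
		return "tiny"
	case size <= maxSmallSize:
		return "small"
	default:
		return "large"
	}
}

func main() {
	fmt.Println(allocPath(8, false))     // tiny
	fmt.Println(allocPath(8, true))      // small
	fmt.Println(allocPath(100, false))   // small
	fmt.Println(allocPath(65536, false)) // large
}
```

Note that an 8-byte object holding a pointer goes down the small path: the tiny allocator only packs pointer-free data.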
Tiny Allocations: Packing for Speed
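Tiny allocations are carved out of 16-byte blocks taken from the 16-byte size class. In current releases a span for that class is 8KB and holds 512 such blocks. Each logical processor (P) caches its current block together with an offset to the free space inside it. When you allocate a tiny object, the runtime aligns the offset, places the object there, and bumps the offset. If the object doesn't fit in what is left, the runtime grabs a fresh 16-byte block. No locking is needed for the common case. The pointer arithmetic is trivial. The allocation completes in a few CPU cycles.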
package main

import (
	"fmt"
	"unsafe"
)

// TinyFlag is 8 bytes and holds no pointers.
// On the heap, the tiny allocator can pack it into a shared 16-byte block.
type TinyFlag struct {
	Active bool
	Count  int32
}

// SmallData is a slice with a 100-byte backing array.
// The backing array uses a size-class bin, not the tiny allocator.
var SmallData = make([]byte, 100)

// LargeBuffer is a slice with a 64KB backing array.
// The backing array exceeds 32KB, so it takes the large path.
var LargeBuffer = make([]byte, 65536)

func main() {
	// unsafe.Sizeof reports the type size, not the heap allocation size.
	// TinyFlag is 8 bytes: 1 for the bool, 3 of padding, 4 for the int32.
	fmt.Println(unsafe.Sizeof(TinyFlag{})) // 8

	// A slice header is 24 bytes on 64-bit systems.
	// The backing array is allocated separately based on length.
	fmt.Println(unsafe.Sizeof(SmallData)) // 24

	// LargeBuffer is also a slice, so Sizeof again reports the header.
	// Use len for the size of the backing array.
	fmt.Println(unsafe.Sizeof(LargeBuffer)) // 24
	fmt.Println(len(LargeBuffer))           // 65536
}
The tiny allocator recycles memory automatically, but only a whole block at a time. The garbage collector can free a 16-byte block once every object packed into it is unreachable, so a single live tiny object keeps its neighbors' bytes alive too. In exchange, packing reduces the total number of heap objects. Fewer objects mean less per-object bookkeeping for the allocator and the garbage collector. The throughput improves.
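The packing step can be modeled in a few lines. The sketch below is loosely based on the bump-pointer logic in the runtime's tiny path, simplified for illustration: tinyBlock, alignFor, and alloc are hypothetical names, and the model tracks only offsets inside one 16-byte block rather than real memory.

```go
package main

import "fmt"

const blockSize = 16 // size of one tiny block

// tinyBlock models the current 16-byte block by the offset of its free tail.
type tinyBlock struct {
	off uintptr
}

// alignFor picks an alignment from the object size alone:
// multiples of 8 get 8-byte alignment, multiples of 4 get 4, and so on.
func alignFor(size uintptr) uintptr {
	switch {
	case size&7 == 0:
		return 8
	case size&3 == 0:
		return 4
	case size&1 == 0:
		return 2
	}
	return 1
}

// alloc tries to place size bytes in the block. It returns the offset and
// true on success, or false when a fresh block would be needed.
func (b *tinyBlock) alloc(size uintptr) (uintptr, bool) {
	a := alignFor(size)
	off := (b.off + a - 1) &^ (a - 1) // round the offset up to the alignment
	if off+size > blockSize {
		return 0, false
	}
	b.off = off + size
	return off, true
}

func main() {
	var b tinyBlock
	for _, size := range []uintptr{1, 4, 8, 2} {
		off, ok := b.alloc(size)
		fmt.Println(size, off, ok)
	}
}
```

In this run the 1-byte, 4-byte, and 8-byte objects land at offsets 0, 4, and 8 of the same block. The final 2-byte request no longer fits, which is the point where a real allocator would start a new block.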
Convention aside: the receiver name for methods is usually one or two letters matching the type. Write (t *TinyFlag) Reset() instead of (this *TinyFlag) Reset(). The community follows this pattern everywhere. It keeps code concise and readable.
Tiny allocations pack tight. Don't fight the 16-byte slot.
Small Allocations: Size Classes and Bins
Small objects use size classes. The runtime defines a set of standard sizes, starting at 8 bytes and growing by steps that decrease in percentage as the size increases. The classes cover every size up to 32KB. When you request 100 bytes, the allocator rounds up to the nearest size class, which is 112 bytes in current Go releases. It takes a free slot from the free list for that class. The allocation costs a handful of instructions.
The free list is the key. When the garbage collector finds a small object unreachable, its slot goes back to the free list for its size class. The next allocation of the same size class reuses that slot. No system calls are needed. No complex data structures are traversed. The allocator keeps a cache of free slots per logical processor (P), not per goroutine, so the common path takes no lock. This keeps contention low even under heavy load.
Size classes also reduce fragmentation. If every allocation used its exact size, the heap would fill with gaps of varying widths. Reusing a 112-byte block for a 100-byte request ensures that freed blocks can always be reused by other 100-byte requests. The memory layout stays predictable.
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the buffer reachable so the allocation must happen on the heap.
var sink []byte

// allocateSmall creates a slice of 100 bytes.
// The allocator rounds the request up to the nearest size class.
func allocateSmall() []byte {
	return make([]byte, 100)
}

func main() {
	for i := 0; i < 1000; i++ {
		sink = allocateSmall()
	}

	// MemStats has no separate counter for the small path.
	// Mallocs and Frees count all heap objects; BySize breaks the
	// counts down per size class.
	var stats runtime.MemStats
	runtime.ReadMemStats(&stats)
	fmt.Printf("Heap objects allocated: %d\n", stats.Mallocs)
	fmt.Printf("Heap objects freed: %d\n", stats.Frees)

	// Print the first size class large enough for a 100-byte request.
	for _, c := range stats.BySize {
		if c.Size >= 100 {
			fmt.Printf("Class %d: %d mallocs, %d frees\n", c.Size, c.Mallocs, c.Frees)
			break
		}
	}
}
The size class system works best when allocations are uniform. If your program allocates many objects of similar size, freed slots are reused quickly and reuse stays high. If you allocate many different sizes, the allocator keeps partially filled spans for many classes at once and has to request new memory more often. Designing data structures with consistent sizes helps the allocator.
Size classes keep allocation fast. Trust the bins.
Large Allocations: Going Direct to the OS
Large objects bypass the size classes and the per-P caches. When you request more than 32KB, the runtime rounds the size up to whole pages and allocates a dedicated span from the heap. If the heap has no free run of pages that is long enough, it grows by asking the operating system for more address space, which on Linux and macOS means mmap. The runtime tracks the block but does not put it in a size-class bin.
Large allocations are slower than small ones. They take the heap lock, they have a lot of memory to zero, and they sometimes require a system call. If the OS refuses to supply more memory, the program dies with a fatal out-of-memory error rather than returning an error you can handle. Repeatedly allocating and freeing large blocks of varying sizes can also fragment the heap's address space, which makes it harder to find a contiguous run of pages for a future large allocation.
The garbage collector handles large objects differently. It scans them, but it doesn't pack them. A large object occupies its own span. When it's freed, the pages go back to the heap, where they can serve future allocations, and a background scavenger returns pages that stay unused to the OS. The runtime tries to reuse large blocks, but the reuse rate is lower than for small objects.
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the buffer reachable so it is not placed on the stack.
var sink []byte

// processLargeData allocates and fills a 64KB buffer.
// Because the buffer escapes and exceeds 32KB, it takes the large path.
func processLargeData() []byte {
	// make with a large size bypasses the size classes.
	// The runtime allocates whole pages from the heap.
	buf := make([]byte, 65536)

	// Use the buffer.
	// Large allocations should be short-lived or reused.
	for i := range buf {
		buf[i] = byte(i % 256)
	}
	return buf
}

func main() {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)

	// Trigger the large allocation.
	sink = processLargeData()

	// MemStats has no counter dedicated to the large path.
	// TotalAlloc is cumulative, so its growth covers the new buffer.
	runtime.ReadMemStats(&after)
	fmt.Printf("Allocated: %d bytes\n", after.TotalAlloc-before.TotalAlloc)

	// Drop the reference and force a GC so the block can be reclaimed.
	sink = nil
	runtime.GC()
}
Large allocations are necessary for big buffers, image processing, or network payloads. Just be aware of the cost. If you allocate large objects in a hot loop, you'll see allocation latency and GC pressure. Reuse the buffer or pool it. Don't let large objects accumulate.
Large allocations bypass the bins. Drop the reference when done.
Pitfalls and Compiler Errors
The most common confusion is between type size and allocation size. unsafe.Sizeof reports the size of the type in memory, not the size of the heap allocation. A bool is one byte. A pointer to a bool is eight bytes on 64-bit systems. A bool on the heap lives in a 16-byte block that the tiny allocator shares among several tiny objects, and larger objects are rounded up to their size class. Mixing these up leads to wrong assumptions about memory usage.
If you forget to import unsafe, the compiler rejects the program with undefined: unsafe. If you import it but don't use it, you get imported and not used. Go enforces clean imports. Remove unused imports to fix the error.
Another pitfall is assuming that small allocations are free. They are cheap, but every allocation pushes the heap toward the next garbage collection cycle. If you allocate millions of small objects, the GC has to mark and sweep them. The cost adds up. Use sync.Pool to reuse objects in tight loops. Pooling skips the allocation whenever the pool has an object ready, although the runtime may drop idle pooled objects during a collection.
Large allocations can fragment the heap. If your program allocates and frees large blocks of varying sizes, the free page runs may not match the sizes of future requests, and the process ends up holding more memory than is live. This is rare in Go programs, but it can happen in long-running services with dynamic workloads. Compare HeapInuse, HeapIdle, and HeapReleased in runtime.MemStats to spot the pattern.
Convention aside: gofmt is mandatory. Don't argue about indentation or brace placement. Let the tool decide. Most editors run gofmt on save. It keeps the codebase consistent and saves time.
The worst memory bug is the one that never logs. Track allocations with pprof.
Decision: When to Use What
Use a tiny struct when your data fits under 16 bytes, holds no pointers, and you want the allocator to pack multiple values into a single memory block. Use a small slice or array when your data ranges from 16 bytes to 32KB and you benefit from the allocator's size-class bins for fast reuse. Use a large allocation when you need more than 32KB and accept that the memory comes from the heap in whole pages without bin optimization. Use sync.Pool when you allocate and discard the same size object in a tight loop to avoid GC pressure. Use a local variable when the compiler can prove the value stays on the stack, eliminating heap allocation entirely.