When the benchmark doesn't match your intuition
You write a function that processes a slice of integers. You expect it to run in microseconds. The benchmark says it takes milliseconds. You suspect the compiler is hiding something. Maybe it is allocating memory on the heap when it should stay on the stack. Maybe it is refusing to inline a helper function. Go does not keep its optimization decisions secret. The compiler prints them out if you ask.
How the compiler decides where data lives
Two static analyses in Go's compiler have an outsized effect on performance: escape analysis and inlining. Both run early, on the compiler's typed syntax-tree representation, before the code is lowered to SSA, which stands for static single assignment. SSA rewrites your code so every variable is assigned exactly once. This makes data flow explicit and gives the later optimization passes a clean graph to work with.
Escape analysis traces how every pointer and reference flows through a function and decides where each variable lives. If a reference to a variable can outlive the function that created it, the compiler allocates that variable on the heap. Heap allocations add work for the garbage collector. Stack allocations vanish when the function returns. The analysis is conservative: if the compiler cannot prove a value dies before the function returns, it puts the value on the heap.
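As a rough illustration of that rule, the sketch below contrasts a value that never leaves its function with one whose address is returned. The point type and both function names are made up for this example, and the comments describe what -m typically reports; exact wording can differ between Go releases.

```go
package main

import "fmt"

// point is a hypothetical type used only for this illustration.
type point struct{ x, y int }

// sumOnStack uses p only inside the function, so the compiler can
// keep it in the stack frame.
func sumOnStack(x, y int) int {
	p := point{x, y}
	return p.x + p.y
}

// newOnHeap returns the address of p. The value must outlive the call,
// so -m is expected to report something like "moved to heap: p".
// The directive keeps the function from being inlined, which would
// otherwise let the compiler re-evaluate the allocation in the caller.
//
//go:noinline
func newOnHeap(x, y int) *point {
	p := point{x, y}
	return &p
}

func main() {
	fmt.Println(sumOnStack(1, 2))
	fmt.Println(newOnHeap(3, 4).x)
}
```

Both functions build the same struct. Only the one that hands out a pointer forces a heap allocation.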
Inlining copies a function's body directly into the caller. It removes call overhead and exposes the copied code to the caller's other optimizations, including escape analysis. The compiler estimates a cost for each function body and inlines the functions that fit a fixed budget. Small functions get inlined automatically. Large or complex functions stay as separate calls. Call frequency only enters the decision when you build with profile-guided optimization. The //go:noinline directive prevents inlining of a function; there is no directive that forces it.
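A minimal sketch of the two outcomes, assuming nothing beyond the standard toolchain: double is the kind of small helper the compiler usually inlines, and doubleNoInline is the same body with inlining switched off. The function names are invented for this example.

```go
package main

import "fmt"

// double is a small leaf function, the kind of helper -m usually
// reports as "can inline double".
func double(v int) int {
	return v * 2
}

// doubleNoInline has the same body but opts out of inlining, so every
// call to it remains a real call.
//
//go:noinline
func doubleNoInline(v int) int {
	return v * 2
}

// sum calls both helpers in the same loop so the two call sites can be
// compared in the compiler output.
func sum(data []int) (inlined, called int) {
	for _, v := range data {
		inlined += double(v)
		called += doubleNoInline(v)
	}
	return inlined, called
}

func main() {
	fmt.Println(sum([]int{1, 2, 3}))
}
```

Both helpers return the same values. The difference only shows up in the compiler diagnostics and in the generated code.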
The -m flag prints escape analysis and inlining decisions. The -S flag prints the resulting assembly. You use -m to understand where values are allocated and which calls disappear. You use -S to check details that -m does not report, such as the instructions inside a hot loop or how registers are used.
Read the output before you rewrite the code. The compiler knows more about the target machine than you do.
A minimal example with escape analysis
Start with a single file. Keep it isolated so the compiler output stays readable.
package main

// Process takes a slice and returns a new slice with doubled values.
func Process(data []int) []int {
	// Allocate a new slice with matching length
	result := make([]int, len(data))
	// Iterate over input and store doubled values
	for i, v := range data {
		result[i] = v * 2
	}
	// Return the newly allocated slice to the caller
	return result
}

func main() {
	// Initialize a small slice on the stack
	input := []int{1, 2, 3}
	// Call the function and discard the result
	_ = Process(input)
}
Run the compiler directly on the file. A plain go build prints none of these diagnostics, because it does not pass -m to the compiler unless you ask for it with go build -gcflags=-m.
go tool compile -m optimizations.go
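The output lists the compiler's decisions, each prefixed with the file, line, and column it refers to. Look for escapes to heap, moved to heap, or does not escape. Look for can inline and inlining call to. go tool compile prints these messages to standard output, so you can pipe them to a file or grep for specific functions. If you go through go build -gcflags=-m instead, the messages arrive on standard error, so redirect with 2>&1 before piping.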
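Escape analysis output maps directly to your source lines. The slice created by make escapes because it is returned to the caller, so the compiler reports that make([]int, len(data)) escapes to heap. The data parameter does not escape because Process only reads from it, and the input literal in main does not escape for the same reason. The loop variables i and v are plain integer copies whose addresses are never taken, so they do not show up in the output at all.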
Trust the escape analysis. It is deterministic and repeatable.
Reading the output line by line
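The -m flag tells the compiler to print its optimization decisions. Compilation otherwise proceeds as usual, so go tool compile still generates code and writes an object file. Pass -m=2, or repeat the flag as -m -m, for more detail, including the reason a function was not inlined. Each line in the output starts with the file, line, and column it refers to, so you can map every decision back to a variable or function call in your source.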
Inlining runs first. The compiler estimates the cost of each function body and inlines calls to functions that fit its budget. Larger functions are simply left as calls, and the //go:noinline directive keeps even a small function out of line. The output shows can inline next to an eligible function and inlining call to at each call site where the function body gets copied. With -m=2 it shows cannot inline, followed by the reason, when the compiler leaves the call boundary intact.
Escape analysis runs next, on the code as it looks after inlining. The compiler traces every pointer and reference. If a value might outlive the current stack frame, it marks it as escaping. Heap allocation happens at runtime, but the decision happens at compile time. You will see escapes to heap or moved to heap next to values that create garbage collection work. You will see does not escape next to values that live and die on the stack.
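Go convention favors explicit over implicit. The compiler makes conservative choices by default. The directive you are most likely to need is //go:noinline, which keeps a function out of line, typically so a benchmark measures a real call. //go:noescape is much narrower: it is only valid on a function declared without a body, such as one implemented in assembly, and it promises the compiler that the function's pointer arguments do not escape. Both are rare in production code. Most performance tuning happens by changing the data structures, not by fighting the optimizer.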
Receiver naming follows a simple rule. Use one or two letters matching the type. Write (c ConsoleLogger) instead of (this ConsoleLogger) or (self ConsoleLogger). The compiler does not care about the name, but the community does. Consistent naming keeps the codebase readable when you scan escape analysis output.
Let the compiler decide. Change the data layout if the allocation pattern hurts you.
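One common way to change the allocation pattern without any directive is to let the caller own the buffer. The sketch below is a hypothetical variant of Process, named ProcessInto here, that writes into a caller-supplied slice and only allocates when that slice is too small. It illustrates the idea and is not a drop-in replacement.

```go
package main

import "fmt"

// ProcessInto is a hypothetical variant of Process that writes into a
// caller-supplied buffer instead of allocating a new slice on every call.
func ProcessInto(dst, src []int) []int {
	// Allocate only when the caller's buffer cannot hold the result.
	if cap(dst) < len(src) {
		dst = make([]int, len(src))
	}
	dst = dst[:len(src)]
	for i, v := range src {
		dst[i] = v * 2
	}
	return dst
}

func main() {
	input := []int{1, 2, 3}
	// The buffer is created once and can be reused across many calls.
	buf := make([]int, 0, 8)
	buf = ProcessInto(buf, input)
	fmt.Println(buf)
}
```

The trade-off is a slightly wider API: the caller now has to manage the buffer, which is usually only worth it on a path that profiling has already flagged.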
Interfaces, closures, and heap pressure
Real code rarely lives in a single function. Interfaces and pointers complicate escape analysis. Consider a logging wrapper that accepts an interface.
package main

import "fmt"

// Logger defines a minimal logging interface
type Logger interface {
	// Log prints a formatted message
	Log(msg string)
}

// ConsoleLogger implements Logger
type ConsoleLogger struct{}

// Log prints to standard output
func (c ConsoleLogger) Log(msg string) {
	// Print the message with a prefix
	fmt.Println("[LOG]", msg)
}

// ProcessWithLogging runs a task and logs completion
func ProcessWithLogging(l Logger) {
	// Allocate a buffer for temporary work
	buf := make([]byte, 1024)
	// Use the buffer for a short computation
	_ = len(buf)
	// Log that the task finished
	l.Log("task complete")
}

func main() {
	// Create a logger instance
	var l Logger = ConsoleLogger{}
	// Run the processing function
	ProcessWithLogging(l)
}
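Compile this with -m. Because the file imports fmt, go build -gcflags=-m is the more reliable route here: recent Go releases no longer ship precompiled standard library archives, so a bare go tool compile may fail to resolve the import. The output reveals how interfaces affect allocation. An interface value holds two words: a pointer to the type information and a pointer to the data. Passing that value to a function does not force an allocation by itself. The cost comes from the dynamic call. The compiler usually cannot see which method l.Log resolves to, so it assumes that anything reachable from the call's arguments may escape. The same applies to fmt.Println, whose parameters are interface values, which is why the strings passed to it are reported as escaping. The buf slice inside ProcessWithLogging does not escape. It has a constant size, never leaves the function, and vanishes when the function returns. The compiler proves this by tracing every use of buf.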
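To see how a dynamic call changes the verdict for a buffer like buf, the sketch below passes the same slice to a method once through an interface and once through a concrete type. Sizer and lenSizer are invented for this example, and the comments describe what -m typically reports for each function on its own; after inlining into a caller that knows the concrete type, the compiler may do better.

```go
package main

import "fmt"

// Sizer is a hypothetical interface used only for this sketch.
type Sizer interface {
	Size(b []byte) int
}

// lenSizer is a trivial implementation of Sizer.
type lenSizer struct{}

// Size only reads the slice header, so b does not escape from it.
func (lenSizer) Size(b []byte) int { return len(b) }

// viaInterface hands buf to a method chosen at run time. The compiler
// cannot see the callee, so it typically reports that the make call
// escapes to heap.
func viaInterface(s Sizer) int {
	buf := make([]byte, 1024)
	return s.Size(buf)
}

// viaConcrete calls the method on a known type. The compiler can see
// that Size does not retain b, so buf can stay on the stack.
func viaConcrete(s lenSizer) int {
	buf := make([]byte, 1024)
	return s.Size(buf)
}

func main() {
	fmt.Println(viaInterface(lenSizer{}))
	fmt.Println(viaConcrete(lenSizer{}))
}
```

The two functions compute the same result. Whether the difference in allocation matters is a question for a profile, not for the diagnostics alone.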
Closures add another layer. A closure captures variables from its surrounding scope. If the closure outlives the function, the closure itself escapes, and the variables it captures by reference escape with it. The compiler prints moved to heap next to each such variable and func literal escapes to heap next to the closure. You can avoid the escape by passing the value as an argument instead of capturing it.
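The difference can be sketched with a counter, using function names made up for this example. The first version returns a closure that keeps modifying a captured variable; the second passes the value in and out explicitly. The messages quoted in the comments are what -m typically prints and may vary by release.

```go
package main

import "fmt"

// counterCapture returns a closure that captures n by reference.
// The closure outlives the call, so -m typically reports
// "moved to heap: n" and "func literal escapes to heap".
func counterCapture() func() int {
	n := 0
	return func() int {
		n++
		return n
	}
}

// increment takes the value as an argument and returns the new value.
// Nothing outlives the call, so nothing has to move to the heap.
func increment(n int) int {
	return n + 1
}

func main() {
	next := counterCapture()
	next()
	fmt.Println(next())

	n := 0
	n = increment(n)
	n = increment(n)
	fmt.Println(n)
}
```

Both versions count to the same number. The argument-passing version trades the convenience of hidden state for an allocation-free call.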
Go convention says accept interfaces, return structs. Functions should take the most general type they need and return the most specific type they can. This keeps your API flexible and keeps escape analysis predictable.
Switch to -S to see the assembly. The assembly output is verbose. It shows register moves, memory loads, and branch instructions. On amd64, look for MOVQ instructions that move values between registers and memory. Look for CALL instructions that represent function boundaries. The assembly confirms what -m predicted. If -m says a call was inlined, the assembly will not contain a CALL to that function at that call site. It will contain the raw instructions embedded in the caller.
Go does not require you to read assembly to write fast code. The compiler handles register allocation and instruction scheduling. You read assembly only when -m leaves you with questions. You use it to see what a hot loop actually compiles to or to check if a critical path actually got inlined.
Read the assembly only when the benchmark proves it matters.
Pitfalls and compiler feedback
go tool compile works on .go files, not on package directories. Pass it the files you want to inspect, and if the package spans several files, list them all. To inspect a whole package or module, go through the build system instead with go build -gcflags=-m ./... and filter the output for the file you care about.
Escape analysis output can be misleading if you misread the context. A variable marked as escaping does not always mean poor performance. An allocation that happens once at startup costs almost nothing, while the same allocation inside a hot loop can dominate a profile. Values that are too large for the stack go to the heap regardless. The compiler moves things to the heap when it cannot prove they will die before the function returns. Trust the analysis. Do not rewrite code just to force stack allocation unless profiling proves it matters.
Inlining decisions depend on an estimated cost, not on line count. The compiler assigns each function body a cost and inlines it only when the cost fits a fixed budget. If a function exceeds the budget, -m=2 prints cannot inline and explains the reason. No directive raises the budget; //go:noinline only works in the other direction, to prevent inlining. To get a hot path inlined, split the function into smaller helpers. Recursive functions are not inlined by default, and a few other constructs also block inlining. The exact list changes between Go releases, so check the -m=2 output rather than relying on memory.
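Splitting a function usually means keeping the common case small and moving the rare case into its own helper. The sketch below applies that pattern to a hypothetical byte buffer; the type and method names are invented, and whether appendByte actually fits the budget is something to confirm with -m on your own toolchain.

```go
package main

import "fmt"

// byteBuf is a hypothetical growable buffer used only for this sketch.
type byteBuf struct {
	data []byte
}

// appendByte is the hot path. It stays small by delegating the rare
// "out of space" case to grow, which gives it a better chance of
// fitting the inlining budget.
func (b *byteBuf) appendByte(c byte) {
	if len(b.data) == cap(b.data) {
		b.grow()
	}
	b.data = append(b.data, c)
}

// grow is the cold path. It reallocates the backing array with room
// to spare so that appendByte rarely has to call it.
func (b *byteBuf) grow() {
	next := make([]byte, len(b.data), 2*cap(b.data)+16)
	copy(next, b.data)
	b.data = next
}

func main() {
	var b byteBuf
	for _, c := range []byte("inline me") {
		b.appendByte(c)
	}
	fmt.Println(string(b.data))
}
```

The behavior is the same as a single larger method. The split only changes how much code sits on the path that runs on every call.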
The -S flag prints raw assembly to standard output. The output scrolls quickly. Pipe it to a file or use less to navigate it. The listing uses Go's own assembler syntax, which derives from the Plan 9 assemblers, on every architecture. It is neither Intel nor AT&T syntax: on x86_64, operands read source first and destination last, and register names carry no prefix. On ARM64, the mnemonics change but the conventions stay the same. You only need to recognize CALL, RET, and memory operands to verify inlining.
Go convention keeps compiler directives out of application code. Directives belong in benchmarks or low-level libraries. Application code should rely on the compiler's defaults. When you do use one, write it exactly as //go:noinline, with no space after the slashes, on its own line directly above the function declaration. gofmt leaves that form alone. With a space, // go:noinline is an ordinary comment and the compiler ignores it.
Error handling in Go follows a simple pattern. Check if err != nil immediately. Return the error or wrap it. The verbose style makes the unhappy path visible. The compiler will reject your program with undefined: err if you reference an error variable that was never declared. It will reject your program with imported and not used if you import a package and never use it. These errors stop you from shipping broken code.
Fix the error. Do not suppress it. The compiler is your first reviewer.
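The check-and-wrap pattern described above can be sketched as follows. parsePort is a hypothetical helper written for this example; only strconv.Atoi and fmt.Errorf with the %w verb come from the standard library.

```go
package main

import (
	"fmt"
	"strconv"
)

// parsePort is a hypothetical helper that converts a string to a TCP
// port number and wraps any failure with context.
func parsePort(s string) (int, error) {
	n, err := strconv.Atoi(s)
	// Check the error immediately and wrap it so the caller sees
	// both the context and the underlying cause.
	if err != nil {
		return 0, fmt.Errorf("parse port %q: %w", s, err)
	}
	if n < 1 || n > 65535 {
		return 0, fmt.Errorf("port %d out of range", n)
	}
	return n, nil
}

func main() {
	// The unhappy path is visible at the call site.
	if _, err := parsePort("http"); err != nil {
		fmt.Println("error:", err)
	}

	port, err := parsePort("8080")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(port)
}
```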
When to use each tool
Use go tool compile -m when you need to verify escape analysis and inlining decisions for a specific function. Use go tool compile -S when you need to confirm that the generated assembly matches your performance expectations. Use go test -bench with pprof when you need to measure real-world allocation rates and CPU time across a full package. Use go tool trace when you need to visualize goroutine scheduling and system call latency. Use plain benchmarks when you only care about end-to-end throughput and latency. Trust the compiler's defaults unless profiling proves a specific bottleneck.
The compiler optimizes what you write. Profile what you ship.
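A minimal way to put a number next to the compiler's verdict is a benchmark that reports allocations. The sketch below repeats the Process function from the first example and drives it with testing.Benchmark so it can run as an ordinary program; in a real package the benchmark would normally live in a _test.go file and run under go test -bench. The sink variable and the input size are choices made for this illustration.

```go
package main

import (
	"fmt"
	"testing"
)

// Process mirrors the function from the first example.
func Process(data []int) []int {
	result := make([]int, len(data))
	for i, v := range data {
		result[i] = v * 2
	}
	return result
}

// sink keeps the result reachable so the call cannot be optimized away.
var sink []int

// BenchmarkProcess measures time and allocations per call to Process.
func BenchmarkProcess(b *testing.B) {
	input := make([]int, 1024)
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = Process(input)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside "go test".
	r := testing.Benchmark(BenchmarkProcess)
	fmt.Println(r.String(), r.MemString())
}
```

If the reported allocations per operation match what -m predicted, the diagnostics and the measurement agree, and the remaining question is whether that cost matters for your workload.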