How to Use go tool objdump and go tool compile -S
You wrote a function that processes a slice of integers. The profiler says it's taking 40% of the CPU time. You stare at the Go code. It's just a simple loop adding numbers. There's no allocation, no map lookup, no network call. Yet the machine is sweating. You need to look past the Go syntax and see what the CPU is actually executing. That's where go tool compile -S and go tool objdump come in. They peel back the abstraction layer and show you the raw assembly instructions the compiler generated.
Assembly reveals the truth. Go code is just a suggestion.
Concept in plain words
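Go code is high-level. It describes what you want to happen. The compiler translates that into assembly, which describes how the CPU should move data between registers and memory. go tool compile -S compiles a source file into an object file, without ever linking, and prints the assembly it generated along the way. It's the compiler talking to you directly. go tool objdump takes a finished binary and disassembles it, showing you the machine instructions inside the executable next to the source lines they came from. One looks at the source-to-assembly step; the other inspects the final artifact. Running go tool compile directly works best on a self-contained file with no imports. For real packages, go build -gcflags=-S prints the same kind of listing and lets the go command resolve the imports for you.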
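Go uses its own assembly syntax, inherited from the Plan 9 assemblers, which differs from both AT&T and Intel syntax. Like AT&T syntax, it puts the destination operand last: MOVQ src, dst moves data from src to dst. Intel syntax, which most CPU manuals use, puts the destination first, and that reversal catches many developers off guard. Go assembly also uses pseudo-registers: SB (static base) anchors global symbols such as functions and package variables, and FP is a virtual frame pointer used to name arguments, not the hardware BP register. Register names drop the size prefix, so the 64-bit registers are written AX, BX, CX, DX rather than RAX, RBX, RCX, RDX, and the instruction suffix carries the width instead (MOVQ moves 64 bits, MOVL moves 32). Understanding the syntax is the first step to reading the output.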
Minimal example
Here's the simplest function to inspect. Run go tool compile -S main.go to see the assembly output.
// main.go
package main
// Add returns the sum of two integers.
// It is small enough that its whole assembly listing fits on one screen.
func Add(a, b int) int {
	return a + b
}
func main() {}
# output (roughly, on Go 1.17+ for amd64, trimmed to the directive and instruction columns; the package prefix on Add, the flags, and the metadata lines vary between Go versions):
TEXT main.Add(SB), NOSPLIT|ABIInternal, $0-16
FUNCDATA $0, gclocals·…(SB)
FUNCDATA $1, gclocals·…(SB)
ADDQ BX, AX // Add b (in BX) to a (in AX); the sum stays in AX.
RET // AX is also the result register, so there is nothing left to do.
Registers hold the state. The stack holds the history.
Walk through what happens
Since Go 1.17 on amd64 (and in later releases on arm64 and other architectures), the compiler passes arguments and results in registers instead of on the stack. Here a arrives in AX and b in BX. ADDQ BX, AX adds the two 64-bit values and leaves the sum in AX, which is also where the first result is returned, so the whole function is two instructions. ADDQ means "add quadword": the Q suffix marks a 64-bit operation, the same way MOVQ copies 64 bits. The CPU does the math in registers, and this function never touches memory at all. The TEXT line names the symbol, lists its flags (NOSPLIT means the function needs no stack-growth check, ABIInternal marks the register-based calling convention), and ends with $0-16: the local frame size followed by the size of the argument area. On Go 1.16 and earlier, which passed arguments on the stack, the same function compiled to MOVQ instructions that loaded a and b from stack slots, added them in a register, and stored the result back into a stack slot.
The output from go tool compile -S contains more than CPU instructions. Lines such as FUNCDATA $0, gclocals·… and PCDATA are directives that build tables for the runtime rather than code. FUNCDATA attaches per-function data, such as the bitmaps that tell the garbage collector which argument and local slots hold pointers. PCDATA records values that change with the program counter, such as which of those bitmaps applies at each instruction. The runtime uses these tables to scan stacks during garbage collection, to copy stacks when a goroutine's stack grows, and to produce tracebacks. You can skip them when you are reading for logic, but they explain why the output is so verbose.
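Those metadata tables stay in the finished binary, and they are what let tools map machine addresses back to source lines. Here is a minimal sketch of that lookup from inside a program, assuming Go 1.18+ (for any). describe is a helper name invented for this example; runtime.FuncForPC and (*runtime.Func).FileLine query the runtime's own copy of the line table. go tool objdump reads the same kind of table from the binary file with its own code, so treat this as an illustration of the idea rather than a description of how objdump is implemented.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
)

// Add is the function from main.go. //go:noinline keeps it as a real,
// separate function so it has its own entry in the line table.
//
//go:noinline
func Add(a, b int) int {
	return a + b
}

// describe finds fn's entry PC and asks the runtime's line table which
// function, file, and line that PC belongs to. fn must be a function value.
func describe(fn any) (name, file string, line int) {
	pc := reflect.ValueOf(fn).Pointer()
	f := runtime.FuncForPC(pc)
	if f == nil {
		return "", "", 0
	}
	file, line = f.FileLine(f.Entry())
	return f.Name(), file, line
}

func main() {
	name, file, line := describe(Add)
	fmt.Printf("%s starts at %s:%d\n", name, file, line)
}
```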
Realistic example
Here's a function with a loop. Compile it to a binary and inspect it with objdump.
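// sum.go
package main
// Sum calculates the total of a slice.
// The loop reads data[i], which would normally need a bounds check.
//
//go:noinline
func Sum(data []int) int {
	total := 0
	for i := 0; i < len(data); i++ {
		total += data[i]
	}
	return total
}
// main must call Sum: the linker discards functions nothing references, and
// //go:noinline keeps Sum as its own symbol instead of folding it into main.
func main() {
	println(Sum([]int{1, 2, 3}))
}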
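Run go build -o sum sum.go && go tool objdump -s 'main\.Sum$' ./sum to see the assembly with source line correlation. The -s flag takes a regular expression, so anchoring it keeps other symbols whose names happen to contain Sum out of the listing.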
# output (roughly, on Go 1.17+ for amd64, trimmed and simplified: objdump also prints sum.go:line, the address, and the encoded bytes for every instruction, jump targets are shown here as labels instead of addresses, and exact registers and instruction order can differ between Go versions):
TEXT main.Sum(SB) /path/to/sum.go
XORL CX, CX // i := 0 (CX held cap(data) on entry; Sum never reads it).
XORL DX, DX // total := 0
JMP check // Jump straight to the loop condition.
loop:
MOVQ (AX)(CX*8), SI // SI = data[i]: base AX, index CX, scale 8.
INCQ CX // i++
ADDQ SI, DX // total += data[i]
check:
CMPQ BX, CX // Compare len(data) with i.
JGT loop // Loop again while len(data) > i.
MOVQ DX, AX // Move total into the result register.
RET // Return.
Bounds checks are real, but the compiler removes the ones it can prove are safe.
The listing shows how Go handles slices under the register-based calling convention. The slice header arrives in three registers: the data pointer in AX, the length in BX, and the capacity in CX. Sum never reads the capacity, so the compiler reuses CX as the loop index. The compiler has also rotated the loop: it jumps to the condition first, and the condition sits at the bottom, so each iteration ends in a single compare and branch. Go's CMP operands read left to right, so CMPQ BX, CX followed by JGT branches when BX > CX. That compare is the loop condition i < len(data), not a bounds check. Because the loop condition already guarantees 0 <= i < len(data), the compiler's prove pass removes the check that data[i] would otherwise need. When a check survives, you see an extra compare followed by a conditional jump to a CALL runtime.panicIndex, usually placed at the end of the function. The MOVQ (AX)(CX*8), SI instruction loads the element. It uses the base pointer AX, the index CX, and a scale factor of 8, which matches the size of an int on 64-bit platforms, so this single instruction computes data + i*8 and loads the value.
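To see a bounds check that survives, and one way to hoist it, compare the two functions below. At and SumFirst4 are hypothetical names for this sketch. At indexes with a value the compiler knows nothing about, so its listing should contain a compare and a branch to runtime.panicIndex. SumFirst4 uses the _ = data[3] hint, the same pattern encoding/binary uses in its byte-order helpers: one check up front lets the compiler drop the four that follow. Building with go build -gcflags=-d=ssa/check_bce/debug=1 prints the checks that remain; that is an internal debugging flag, so its output format may change between releases.

```go
package main

import "fmt"

// At returns data[i]. Nothing tells the compiler that i is in range, so the
// generated code keeps a compare and a branch to runtime.panicIndex.
func At(data []int, i int) int {
	return data[i]
}

// SumFirst4 adds the first four elements. The _ = data[3] line performs one
// bounds check up front; after it, the compiler knows len(data) > 3 and can
// drop the separate checks on data[0] through data[3].
func SumFirst4(data []int) int {
	_ = data[3] // Bounds-check hint: panics here if len(data) < 4.
	return data[0] + data[1] + data[2] + data[3]
}

func main() {
	data := []int{1, 2, 3, 4, 5}
	fmt.Println(At(data, 2))     // prints 3
	fmt.Println(SumFirst4(data)) // prints 10
}
```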
Assembly is not portable. The instructions depend on the CPU architecture. go tool compile -S outputs assembly for the current machine by default. If you are on an M1 Mac, you see ARM64 instructions. If you are on a Linux server, you likely see AMD64. You can force a specific architecture by setting the GOARCH environment variable. Run GOARCH=amd64 go tool compile -S main.go to see AMD64 assembly regardless of your machine. This helps when you need to understand code running on a different platform.
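The default target and the index scale are both easy to check from Go. This small sketch uses target and intSize, names made up for the example. It prints the OS/architecture pair the program was built for, which matches the default target of go tool compile -S when you have not set GOOS or GOARCH, and the size of int, which is the scale factor in indexed loads such as (AX)(CX*8) when the elements are ints.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// target reports the OS/architecture pair this program was compiled for.
func target() string {
	return runtime.GOOS + "/" + runtime.GOARCH
}

// intSize returns the size of int in bytes: 8 on 64-bit targets such as
// amd64 and arm64, 4 on 32-bit targets such as 386 and arm.
func intSize() uintptr {
	return unsafe.Sizeof(int(0))
}

func main() {
	fmt.Println("target:", target())
	fmt.Println("int size:", intSize(), "bytes")
}
```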
Pitfalls and compiler errors
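If the file isn't valid Go, go tool compile -S stops with a syntax error before it produces any assembly, because it runs the full compiler front end first. An empty file, for example, is rejected because it has no package clause (the go command phrases this as expected 'package', found 'EOF'). On the binary side, go tool objdump reads the ELF, Mach-O, PE, and Plan 9 formats that Go produces, and it can disassemble binaries built for other supported architectures, so a binary built with GOARCH=arm64 can be inspected on an amd64 machine. What it cannot read is a file that isn't an object file at all, such as a shell script; in that case it reports that it does not recognize the file.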
By default, the compiler optimizes. It inlines small functions into their callers, removes code whose results are never used, and keeps values in registers instead of stack slots; the linker then drops functions that nothing references. If you are trying to understand how a specific Go construct translates to assembly, those optimizations get in the way. Pass -N -l to see the raw translation: go tool compile -S -N -l main.go, or go build -gcflags='-N -l' through the go command. -N turns off optimizations. -l turns off inlining. The listing will be longer and the code slower, but it will map much more directly to your source code. This is useful when you are learning, or when you want to check whether an optimization is behind some surprising behavior.
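When you only need one function to stay visible, turning off every optimization is a blunt tool. A narrower option, sketched here with the hypothetical names square and sink, is the //go:noinline directive, which the standard library also uses: it keeps a standalone copy of the function under its own TEXT symbol while the rest of the code stays optimized. Storing the result in a package-level variable is a common benchmark idiom that keeps the result observably used.

```go
package main

import "fmt"

// sink is a package-level variable. Writing the result here keeps it
// observably used; without it, an inlined call whose result is never read
// could be removed entirely.
var sink int

// square is small enough that the compiler would normally inline it into
// its caller. The directive below keeps a standalone copy, so it appears
// under its own symbol in go tool compile -S and go tool objdump output.
//
//go:noinline
func square(x int) int {
	return x * x
}

func main() {
	sink = square(7)
	fmt.Println(sink) // prints 49
}
```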
Optimizers lie to the unprepared. Disable them to see the raw translation.
A common mistake is assuming the assembly matches the source line for line. The compiler inlines functions, reorders basic blocks (a loop's condition often moves to the bottom), and eliminates dead code, so optimized assembly can look nothing like your Go code. When reading assembly, look for patterns instead. Heap allocations show up as calls into the runtime allocator, such as runtime.newobject, runtime.makeslice, or runtime.growslice, which all end up in runtime.mallocgc. If you see those calls in a hot loop, your code is allocating memory. Interface method calls load a function pointer from the interface's method table (the itab) and make an indirect CALL through a register; the compiler usually cannot inline them, so they cost more than direct calls. Pointer dereferences show up as loads from memory addresses, and long chains of dependent loads mean you are chasing pointers. Learning these patterns helps you spot inefficiencies without understanding every instruction.
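The allocation pattern is easy to reproduce. In this sketch, point, newPoint, makePoint, and the sink variables are hypothetical names. newPoint returns a pointer, so escape analysis moves the point to the heap and its -S listing should contain a call such as CALL runtime.newobject; makePoint returns the struct by value and should not. go build -gcflags=-m reports the same escape decisions in words, and testing.AllocsPerRun confirms them by counting heap allocations per call.

```go
package main

import (
	"fmt"
	"testing"
)

// point is a small struct used only for this illustration.
type point struct{ x, y int }

// Package-level sinks keep the results observably used.
var (
	sinkPtr *point
	sinkVal point
)

// newPoint returns a pointer, so the point escapes to the heap and the
// listing contains a call into the runtime allocator.
//
//go:noinline
func newPoint() *point {
	return &point{1, 2}
}

// makePoint returns the struct by value, so no heap allocation is needed.
//
//go:noinline
func makePoint() point {
	return point{1, 2}
}

// allocCounts reports the average number of heap allocations per call.
func allocCounts() (ptr, val float64) {
	ptr = testing.AllocsPerRun(100, func() { sinkPtr = newPoint() })
	val = testing.AllocsPerRun(100, func() { sinkVal = makePoint() })
	return ptr, val
}

func main() {
	ptr, val := allocCounts()
	fmt.Println("newPoint allocs/op:", ptr)  // prints 1
	fmt.Println("makePoint allocs/op:", val) // prints 0
}
```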
The compiler does what it wants. Verify with the flags, not assumptions.
Decision: when to use this vs alternatives
Use go tool compile -S when you want to see the assembly for a specific source file before linking. Use go tool objdump when you need to inspect a compiled binary and correlate assembly back to source lines. Use go tool compile -S -N -l when you want to see unoptimized assembly to understand the direct translation of Go constructs. Use go tool objdump -s FunctionName when you only care about one function and want to skip the noise of the entire binary. Use CPU profiling with pprof when you need to find performance hotspots without reading assembly. Use go tool trace when you want to see goroutine scheduling and runtime behavior. Use plain benchmarks with go test -bench when you need quantitative performance data.
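For the quantitative side, a benchmark tells you whether a change you spotted in the assembly actually matters. In a real project you would put a BenchmarkSum function in a _test.go file and run go test -bench=Sum -benchmem. The sketch below, using the hypothetical helper benchSum, gets a quick number from a plain program instead: testing.Benchmark runs a benchmark function outside go test, after testing.Init registers the testing flags it relies on.

```go
package main

import (
	"fmt"
	"testing"
)

// Sum is the loop from sum.go.
func Sum(data []int) int {
	total := 0
	for i := 0; i < len(data); i++ {
		total += data[i]
	}
	return total
}

// sink keeps the benchmarked result observably used.
var sink int

// benchSum times Sum over a 1,024-element slice with testing.Benchmark.
func benchSum() testing.BenchmarkResult {
	// testing.Init registers the testing flags; it is needed before
	// testing.Benchmark when the program is not run by go test, and it has
	// no effect if it has already been called.
	testing.Init()
	data := make([]int, 1024)
	for i := range data {
		data[i] = i
	}
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = Sum(data)
		}
	})
}

func main() {
	r := benchSum()
	// String reports iterations and ns/op; MemString reports B/op and allocs/op.
	fmt.Println(r.String(), r.MemString())
}
```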
Pick the tool that matches your question. Source questions get source tools. Binary questions get binary tools.