Introduction to Go Assembly (Plan 9 Assembly)

The escape hatch

You hit a performance wall. Your Go code processes millions of records per second, but a single tight loop is eating forty percent of your CPU time. The compiler has done its job, but it cannot vectorize the loop the way you need. You look at the generated machine code, see a few extra memory loads, and realize the only way to squeeze out the last few percent is to write the loop yourself in assembly.

Go does not hide this escape hatch. It ships a built-in assembler whose input language is derived from the assemblers of the Plan 9 operating system. You do not need to master x86 AT&T syntax or Intel syntax. You learn the Go dialect, and the go command handles assembling and linking.

Assembly in Go is not a replacement for idiomatic Go code. It is a precision instrument. You reach for it when the compiler cannot see the optimization you know exists, or when you need to emit a specific CPU instruction that the standard library does not expose. The learning curve is steep, but the payoff is direct control over the machine.

Go assembly is a tool, not a crutch. Master the compiler first.

How the dialect works

Go's assembly dialect sits close to the machine, but it is not a one-to-one transcript of it. Most lines become a single CPU instruction, and you control registers and instruction order explicitly. Alongside the real registers, though, the assembler provides pseudo-registers: FP for arguments, SP for the local frame, SB for global symbols, and PC. The toolchain can also insert code of its own, such as the stack-overflow check at a function's entry. The syntax looks familiar if you have seen assembly before, but it follows strict rules that match Go's calling convention. Arguments live in specific locations. Return values go into specific locations. The stack frame follows a predictable layout.

The assembler reads .s files and outputs .o object files. Those object files link seamlessly with your Go code. You can call assembly functions from Go, and you can call Go functions from assembly. The boundary is clean, but crossing it requires discipline.

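Convention aside: Go assembly files use the .s extension and live in the same package directory as the Go files they support, usually with an architecture suffix such as _amd64.s so they build only for that architecture. You do not declare imports in assembly, and the linker resolves symbols automatically; the one line nearly every file needs is #include "textflag.h", which defines flags such as NOSPLIT. Each assembly function also needs a declaration without a body in a Go file of the same package, which gives the compiler its signature. In a name like ·Add(SB), the leading middle dot (·) tells the linker to prefix the current package's path, and (SB) marks the name as a global symbol addressed from the static-base pseudo-register. These tokens are not optional. They tell the linker which function you are defining.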

The dialect maps directly to Go's runtime expectations. Trust the syntax. Follow the ABI.

A minimal function

Here is the simplest possible assembly function. It takes two 64-bit integers, adds them, and returns the result.

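// add_amd64.s
#include "textflag.h"

// Add returns the sum of two 64-bit integers.
// Go declaration: func Add(a, b int64) int64
TEXT ·Add(SB), NOSPLIT|NOFRAME, $0-24
    MOVQ a+0(FP), AX
    MOVQ b+8(FP), BX
    ADDQ BX, AX
    MOVQ AX, ret+16(FP)
    RET

The assembler translates each line into machine instructions. The #include line pulls in textflag.h, which defines the NOSPLIT and NOFRAME flags. TEXT declares a function. The NOSPLIT flag tells the toolchain not to insert the stack-overflow check it normally places at a function's entry point. NOFRAME means no stack frame is set up for the function. $0-24 means zero bytes of local stack space and 24 bytes of arguments plus results: two 8-byte arguments and one 8-byte return value.

Assembly functions use Go's stack-based ABI0 calling convention, even though compiled Go code on amd64 has passed arguments in registers since Go 1.17; the toolchain generates wrappers that bridge the two. Arguments therefore live on the stack at fixed offsets from the FP pseudo-register. a+0(FP) reads the first argument. b+8(FP) reads the second. The names must match the Go declaration, and go vet checks that they do. The MOVQ instructions load the 64-bit values into registers AX and BX. ADDQ adds BX into AX. The result goes back to the stack at ret+16(FP), the slot right after the arguments. RET pops the return address and hands control back to Go.

You do not normally run the assembler by hand. Place add_amd64.s next to a Go file in the same package that declares func Add(a, b int64) int64 without a body, and run go build. The go command runs go tool asm with the right include paths and links the resulting object file into the package along with your compiled Go code. The _amd64 suffix builds the file only on amd64; on other architectures you supply a pure-Go version of the function behind a build constraint. You call Add from Go exactly like a normal function.

Convention aside: Short leaf functions like Add usually carry the NOSPLIT flag on the TEXT line, which removes the stack check from the entry point. A //go:nosplit comment has no effect in a .s file; that directive only applies to Go source. Reserve NOSPLIT for small functions with tiny frames, and omit it when the function needs real stack space and should let the runtime grow the stack.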
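To make the frame-size arithmetic concrete, here is a Go sketch that models the ABI0 argument frame of func Add(a, b int64) int64 as a struct and prints the offsets the assembly uses. The addFrame type and addFrameLayout function are hypothetical helpers for illustration. The struct model only matches ABI0 for signatures like this one, where every argument and result is 8 bytes wide, because ABI0 starts the results at the next pointer-aligned offset after the arguments rather than packing them tightly.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addFrame is a hypothetical mirror of the ABI0 argument frame for
// func Add(a, b int64) int64: two arguments followed by the result.
type addFrame struct {
	a   int64 // a+0(FP)
	b   int64 // b+8(FP)
	ret int64 // ret+16(FP)
}

// addFrameLayout reports each field's offset and the total frame size,
// which is the number that follows the dash in $0-24.
func addFrameLayout() (a, b, ret, size uintptr) {
	var f addFrame
	return unsafe.Offsetof(f.a), unsafe.Offsetof(f.b), unsafe.Offsetof(f.ret), unsafe.Sizeof(f)
}

func main() {
	a, b, ret, size := addFrameLayout()
	fmt.Printf("a+%d(FP) b+%d(FP) ret+%d(FP) frame=$0-%d\n", a, b, ret, size)
	// prints: a+0(FP) b+8(FP) ret+16(FP) frame=$0-24
}
```

If the number after the dash disagrees with the Go declaration, go vet flags the mismatch, so this arithmetic is worth getting right before the code ever runs.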

Write the function. Compile it. Call it from Go. Verify the result.
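For the verification step, a pure-Go reference implementation gives you something to compare against. The sketch below assumes you pass in the assembly-backed Add from your own package; addRef and checkAdd are hypothetical names. The edge cases matter: ADDQ wraps around on overflow, and so does Go's int64 addition, so the two implementations should agree even at math.MaxInt64.

```go
package main

import (
	"fmt"
	"math"
)

// addRef is a pure-Go reference for the assembly Add. Go's int64 addition
// wraps on overflow, just as ADDQ wraps the 64-bit register.
func addRef(a, b int64) int64 { return a + b }

// checkAdd compares a candidate implementation against addRef on a few edge
// cases and returns the first mismatch it finds, or nil.
func checkAdd(add func(a, b int64) int64) error {
	cases := [][2]int64{
		{0, 0},
		{1, 2},
		{-5, 3},
		{math.MaxInt64, 1},  // wraps to math.MinInt64
		{math.MinInt64, -1}, // wraps to math.MaxInt64
	}
	for _, c := range cases {
		got, want := add(c[0], c[1]), addRef(c[0], c[1])
		if got != want {
			return fmt.Errorf("Add(%d, %d) = %d, want %d", c[0], c[1], got, want)
		}
	}
	return nil
}

func main() {
	// In your package, pass the assembly-backed Add instead of addRef.
	fmt.Println(checkAdd(addRef))                           // prints <nil>
	fmt.Println(addRef(math.MaxInt64, 1) == math.MinInt64) // prints true
}
```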

Real-world patterns

Real assembly in Go usually handles data structures or tight loops. Here is a function that processes a memory region and zeroes it out. It demonstrates how you iterate manually and manage pointers without Go's bounds checking.

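// zero_amd64.s
#include "textflag.h"

// Zero sets n bytes starting at ptr to zero.
// Go declaration: func Zero(ptr *byte, n int)
TEXT ·Zero(SB), NOSPLIT|NOFRAME, $0-16
    MOVQ ptr+0(FP), AX
    MOVQ n+8(FP), CX
    TESTQ CX, CX
    JLE done
loop:
    MOVB $0, (AX)
    INCQ AX
    DECQ CX
    JNE loop
done:
    RET

The function receives a pointer and a byte count: 16 bytes of arguments and no results. MOVQ ptr+0(FP), AX loads the base address. MOVQ n+8(FP), CX loads the count. TESTQ sets the flags from CX, and JLE jumps straight to done when the count is zero or negative, skipping the loop entirely.

Inside the loop, MOVB $0, (AX) writes a single zero byte to the current address. INCQ AX advances the pointer by one byte. DECQ CX decrements the remaining count and sets the zero flag when it reaches zero. JNE loop continues while bytes remain; once the count hits zero, execution falls through to done and returns. Clearing one byte per iteration is slow but exact. A version that stores 4 or 8 bytes per iteration must handle the leftover tail bytes separately, or its final store runs past the end of the region whenever the length is not a multiple of the stride.

You call this from Go by passing a pointer to the first element of a buffer and its length, for example Zero(&buf[0], len(buf)) after checking that buf is not empty. The assembly function does not know about Go's slice header. It only knows raw memory. That is the trade-off. You gain speed and control. You lose automatic bounds checking, and stores made from assembly bypass the garbage collector's write barriers, so only pass memory that holds no Go pointers, such as the backing array of a []byte.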
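To see exactly what the assembly loop does, here is a pure-Go mirror of it built on unsafe.Add, together with a wrapper that turns a slice into the pointer-and-length pair the assembly expects. Both zeroLoop and zeroBytes are hypothetical names; in a real package, zeroBytes would call the assembly Zero instead of zeroLoop. Like the assembly, zeroLoop trusts its caller completely.

```go
package main

import (
	"fmt"
	"unsafe"
)

// zeroLoop mirrors the assembly Zero loop: starting at ptr, it stores a
// zero byte, advances one byte, and repeats n times. Like the assembly, it
// performs no bounds checks, so [ptr, ptr+n) must be valid memory.
func zeroLoop(ptr *byte, n int) {
	p := unsafe.Pointer(ptr)
	for i := 0; i < n; i++ {
		*(*byte)(unsafe.Add(p, i)) = 0
	}
}

// zeroBytes unpacks the slice header the assembly never sees: it passes the
// address of the first element and the length. Empty slices return early
// because &buf[0] would panic on them.
func zeroBytes(buf []byte) {
	if len(buf) == 0 {
		return
	}
	zeroLoop(&buf[0], len(buf)) // a real package would call the assembly Zero here
}

func main() {
	buf := []byte{1, 2, 3, 4, 5}
	zeroBytes(buf[1:4]) // clear only the middle three bytes
	fmt.Println(buf)    // prints [1 0 0 0 5]
}
```

The wrapper is where the safety lives: it is ordinary Go, so the slice expression buf[1:4] is still bounds-checked before any raw pointer reaches the loop.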

Convention aside: gofmt does not touch .s files, and the Go toolchain ships no formatter for assembly. Most of the standard library's .s files indent instructions with a single tab, and many separate the mnemonic from its operands with another tab; third-party tools such as asmfmt can enforce a consistent layout. Pick a style and stick to it across the codebase.

Keep loops tight. Preserve registers. Return cleanly.

Where things break

Assembly bypasses Go's safety nets. The compiler will not catch out-of-bounds memory access. Nothing checks that you left the stack pointer SP and the frame pointer BP the way you found them. If you corrupt either, or write past the memory you were given, your program may crash with a segmentation fault, fail in the middle of a stack traceback, or silently corrupt heap data.

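The assembler is strict about syntax, but some mistakes slip past it. If you forget the middle dot on a function name, the symbol no longer belongs to your package, the body-less Go declaration has nothing to bind to, and the build fails, typically at link time. If the frame size on the TEXT line or an argument offset disagrees with the Go declaration, go vet reports the mismatch before anything runs, so run it on every package that contains assembly. Mistakes vet cannot see, such as writing past your frame, tend to surface far from their cause, often as a runtime crash during garbage collection or a stack traceback. Always verify your assembly against the code the compiler generates. Run go tool compile -S on a reference Go function that implements the same logic, and compare register usage and stack layout.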
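A reference function is also a reality check on whether you need assembly at all. The sketch below shows two pure-Go ways to clear a byte slice; the function names are illustrative, and clear requires Go 1.21 or later. When you inspect clearLoop with go build -gcflags=-S, you should typically find that the compiler has replaced the loop with a call to the runtime's memory-clearing routine (runtime.memclrNoHeapPointers for pointer-free slices in recent Go versions) rather than emitting a byte-by-byte loop.

```go
package main

import "fmt"

// clearLoop is the kind of reference function worth inspecting with
// `go build -gcflags=-S` before writing Zero in assembly. The compiler
// recognizes this range-and-assign-zero pattern and calls the runtime's
// optimized memory-clearing routine instead of looping.
func clearLoop(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

// clearBuiltin does the same job with the clear builtin (Go 1.21+).
func clearBuiltin(b []byte) {
	clear(b)
}

func main() {
	a := []byte("hello")
	b := []byte("world")
	clearLoop(a)
	clearBuiltin(b)
	fmt.Println(a, b) // prints [0 0 0 0 0] [0 0 0 0 0]
}
```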

Convention aside: Unlike the System V C calling convention, Go's amd64 calling convention has no callee-saved general-purpose registers, so an assembly function may overwrite scratch registers such as AX, BX, CX, and R12 without restoring them. What you must preserve is the stack pointer SP and, if your code touches it, the frame pointer BP. The compiler will not remind you. The crash will.

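Debug assembly in a debugger rather than with print statements, which are awkward to emit from assembly: step through the function instruction by instruction in Delve and inspect registers with its regs command, or use gdb's info registers. Use go tool objdump to inspect the final binary; it annotates each instruction with the source file and line it came from, so you can match addresses back to your .s file.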

The worst assembly bug is the one that silently corrupts memory. Test aggressively.
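One way to test for silent corruption is to surround the target region with guard bytes and check that they survive. The sketch below uses hypothetical names throughout. zeroStride4 is a deliberately buggy Go imitation of a loop that stores 4 bytes per iteration while the remaining count stays positive, and the harness catches its overrun whenever the length is not a multiple of 4. In your own tests you would pass the assembly function instead.

```go
package main

import (
	"fmt"
	"unsafe"
)

// zeroBytewise clears n bytes starting at ptr, one byte at a time.
func zeroBytewise(ptr *byte, n int) {
	p := unsafe.Pointer(ptr)
	for i := 0; i < n; i++ {
		*(*byte)(unsafe.Add(p, i)) = 0
	}
}

// zeroStride4 is deliberately buggy: it clears 4 bytes per iteration and
// keeps going while the remaining count is positive, so when n is not a
// multiple of 4 its final iteration writes past the end of the region.
func zeroStride4(ptr *byte, n int) {
	p := unsafe.Pointer(ptr)
	for remaining := n; remaining > 0; remaining -= 4 {
		for j := 0; j < 4; j++ { // stands in for one 4-byte store
			*(*byte)(unsafe.Add(p, j)) = 0
		}
		p = unsafe.Add(p, 4)
	}
}

// checkNoOverrun surrounds an n-byte region with guard bytes, runs zero on
// it, and reports the first byte left uncleared or guard byte modified.
func checkNoOverrun(zero func(ptr *byte, n int), n int) error {
	const guard = 8 // larger than any overrun a 4-byte stride can cause
	buf := make([]byte, guard+n+guard)
	for i := range buf {
		buf[i] = 0xAA
	}
	zero(&buf[guard], n)
	for i, v := range buf {
		inside := i >= guard && i < guard+n
		if inside && v != 0 {
			return fmt.Errorf("byte %d of the region was not cleared", i-guard)
		}
		if !inside && v != 0xAA {
			return fmt.Errorf("guard byte at index %d was overwritten", i)
		}
	}
	return nil
}

func main() {
	for n := 0; n <= 8; n++ {
		fmt.Printf("n=%d bytewise: %v, stride4: %v\n",
			n, checkNoOverrun(zeroBytewise, n), checkNoOverrun(zeroStride4, n))
	}
	// zeroStride4 fails for every n that is not a multiple of 4.
}
```

Looping over many lengths, including the awkward ones just below and above a multiple of the stride, is what exposes tail-handling bugs that a single happy-path test would miss.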

When to reach for assembly

Use Go assembly when you need to emit specific CPU instructions that the compiler cannot generate, such as SIMD instructions (SSE and AVX on amd64, NEON on arm64); the Go compiler does not auto-vectorize loops for you. For atomic operations, reach for sync/atomic instead. Use Go assembly when profiling shows a tight loop consuming disproportionate CPU time and compiler optimizations fall short. Use standard Go code when readability and maintainability matter more than marginal performance gains. Use cgo when you need to call existing C libraries rather than write raw machine code. Use the unsafe package when you need pointer arithmetic or type punning without dropping to assembly.
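For the last case, here is a small sketch of what the unsafe package handles without any assembly: reinterpreting a float64's bits as a uint64, and stepping a pointer forward with unsafe.Add. Both functions are hypothetical examples. For this particular conversion the standard library already provides math.Float64bits, which the sketch uses as a cross-check.

```go
package main

import (
	"fmt"
	"math"
	"unsafe"
)

// float64BitsUnsafe reinterprets the eight bytes of f as a uint64: type
// punning through unsafe.Pointer, with no assembly involved.
func float64BitsUnsafe(f float64) uint64 {
	return *(*uint64)(unsafe.Pointer(&f))
}

// thirdElement does pointer arithmetic with unsafe.Add, stepping two int32
// elements past the start of the array. It is contrived (arr[2] does the
// same job safely) but shows the pattern.
func thirdElement(arr *[4]int32) int32 {
	p := unsafe.Add(unsafe.Pointer(&arr[0]), 2*unsafe.Sizeof(arr[0]))
	return *(*int32)(p)
}

func main() {
	fmt.Printf("%#x\n", float64BitsUnsafe(1.0))                   // prints 0x3ff0000000000000
	fmt.Println(float64BitsUnsafe(1.0) == math.Float64bits(1.0)) // prints true
	fmt.Println(thirdElement(&[4]int32{10, 20, 30, 40}))          // prints 30
}
```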

Assembly is a last resort for optimization. Keep your codebase readable.

Where to go next

Go assembly lets you write instructions for the processor directly, bypassing the compiler's code generation when you need every last bit of performance or a hardware feature that Go does not expose. Think of it as speaking to the machine in its native language instead of going through a translator. The natural next step is the official guide, "A Quick Guide to Go's Assembler" on go.dev, followed by the assembly that ships with Go itself: files such as arith_amd64.s in math/big show how production code handles frames, loops, and registers.