When Writing Assembly in Go Actually Makes Sense

You spend three days optimizing a cryptographic hash function. The profiler shows the hot loop burning cycles on bounds checks and indirect calls. You restructure the loop so the compiler can eliminate the checks. You use unsafe.Pointer to bypass the ones that remain. The speed improves, but the compiler still won't emit the exact instruction sequence you need. You need to tell the CPU exactly which registers to use and which SIMD instruction to fire. The Go compiler optimizes scalar code well, but the standard toolchain does not auto-vectorize loops. Sometimes that leaves performance on the table. That's when you reach for assembly.

Assembly in Go isn't about rewriting your application in low-level code. It's about writing a tiny function that the compiler cannot express. You write a .s file with Go's pseudo-assembly syntax, and the toolchain links it into your package. The result is a function that looks like Go from the outside but executes raw machine instructions on the inside.

The raw wire beneath the abstraction

Go handles memory allocation, garbage collection, and stack growth for you. The compiler inserts checks, manages frames, and ensures safety. Assembly assumes you know exactly where every byte lives and how the CPU executes instructions. Writing assembly in Go means stepping outside the managed world and taking responsibility for the hardware interface.

Go does not use raw Intel or AT&T syntax. It has its own assembler with a pseudo-assembly syntax derived from the Plan 9 assemblers. You write instructions like MOVQ and ADDL, and the assembler translates them to the correct opcodes for the target architecture. Operands read source first, destination last. The notation is shared across architectures where possible, but the instructions and registers you use are still those of the target CPU. The assembler understands Go's calling conventions, stack layout, and symbol naming. You don't fight the toolchain; you use its assembly dialect to fill gaps the compiler cannot bridge.

Assembly is a scalpel, not a hammer. Use it to remove a bottleneck, not to build the house.

A minimal bridge

Here's the bridge between Go and assembly: a Go declaration with no body, and an assembly file that implements it. The Go file declares the function signature. The assembly file provides the implementation. The linker matches them by symbol name.

// main.go
package main

import "fmt"

//go:nosplit
func add(a, b int) int

func main() {
    fmt.Println(add(1, 2))
}

// asm.s
#include "textflag.h"

// add takes two ints and returns their sum using assembly.
TEXT ·add(SB), NOSPLIT, $0-24
// NOSPLIT (defined in textflag.h) omits the stack-growth check.
// $0-24 means 0 bytes of local stack, 24 bytes of arguments and results.
// On 64-bit, each int is 8 bytes. 3 slots = 24 bytes.
MOVQ a+0(FP), AX
// Load the first argument from its slot at 0(FP) into AX.
MOVQ b+8(FP), BX
// Load the second argument into BX.
ADDQ AX, BX
// Add AX to BX. The result stays in BX (the destination is the last operand).
MOVQ BX, ret+16(FP)
// Store the result in the return value slot.
RET
// Return to caller.

The TEXT directive defines the function. ·add is the symbol name. The leading · is the Unicode middle dot (U+00B7), not an ASCII period, and it binds the symbol to the current package. SB stands for static base, a pseudo-register that represents the origin of memory; every global symbol, functions included, is written as an offset from it. NOSPLIT is a flag, defined in textflag.h, that tells the assembler to omit the stack-growth check at function entry. $0-24 specifies the stack frame: zero bytes for local variables, twenty-four bytes for arguments and return values.

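Arguments and return values live on the stack at fixed offsets from FP, a pseudo-register that points at the start of the argument area. Hand-written assembly uses this stack-based convention even though compiled Go code on AMD64 passes arguments in registers since Go 1.17; the toolchain inserts a wrapper between the two. On AMD64, the first argument of add is at 0(FP), the second at 8(FP), and the return value at 16(FP). You write these offsets yourself. The assembler does not derive them from the Go signature, but go vet checks the names and offsets against the Go declaration. You load values from the stack into registers, perform operations, and store the result back.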

The Go declaration above carries //go:nosplit, but that comment is optional here. The directive governs functions the compiler generates code for, and a body-less declaration has no compiled prologue. What decides whether the function checks for stack growth is the NOSPLIT flag on the TEXT directive, because the assembler is the tool that would insert the check.

The dot binds the symbol to the package. Miss it and the linker gives up.

How the toolchain handles assembly

When you run go build, the go command scans the package directory for .s files. It passes them to the assembler, which produces object files. The linker merges the object files with the compiled Go code. You rarely run go tool asm manually. The build system orchestrates the process. If you add a .s file to a package, go build picks it up automatically.

The assembler validates syntax and generates symbols. If a symbol is missing or mismatched, the linker reports an error. The Go compiler does not generate assembly for .s files; it trusts the assembler to produce correct object code. This separation means assembly functions can call Go functions, and Go functions can call assembly functions, as long as the symbols match and the calling conventions align.

Go assembly supports build constraints. You can restrict an assembly file to a specific architecture or OS with a //go:build line at the top of the file, or with a file name suffix such as _amd64.s. This is essential for portability. Assembly is inherently tied to the hardware. A function written for AMD64 will not run on ARM. Build constraints ensure the assembler only processes files that match the target.

//go:build amd64
// This file is only assembled for AMD64 targets.

Build tags prevent the assembler from trying to compile incompatible code. They also allow you to provide multiple implementations for different architectures. The build system selects the right file based on GOOS and GOARCH.
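To see how a constraint line is evaluated, the standard go/build/constraint package can parse one and test it against a set of tags. This is a minimal sketch; the matches helper and the tag lists are made up for illustration. It also shows that a comment written as // go:build, with a space after the slashes, is not recognized as a constraint at all.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// matches reports whether line is a well-formed //go:build constraint
// that is satisfied when exactly the given tags are set.
func matches(line string, tags ...string) bool {
	if !constraint.IsGoBuild(line) {
		return false // not a //go:build line
	}
	expr, err := constraint.Parse(line)
	if err != nil {
		return false
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] })
}

func main() {
	fmt.Println(matches("//go:build amd64", "linux", "amd64"))  // true
	fmt.Println(matches("//go:build amd64", "linux", "arm64"))  // false
	fmt.Println(matches("//go:build !amd64", "linux", "arm64")) // true
	fmt.Println(matches("// go:build amd64", "linux", "amd64")) // false: space after //
}
```

A common layout pairs an AMD64-only .s file with a pure-Go fallback guarded by //go:build !amd64, so the package still builds everywhere.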

Assembly files follow the same naming conventions as Go files. Public symbols start with a capital letter in Go, but in assembly, the dot · handles package scope. The symbol name after the dot must match the Go function name exactly. Case matters. ·Add and ·add are different symbols.

The compiler doesn't generate assembly for your .s files. It passes them to a separate assembler. The Go compiler and the Go assembler are different tools that speak the same symbol language.

Real-world pattern: syscall entry points

Here's a simplified syscall wrapper. Go's syscall package uses assembly to set up registers exactly how the kernel expects them. The kernel interface requires specific registers for the trap number and arguments. Go's high-level syscall package abstracts this, but the underlying implementation relies on assembly to bridge the gap.

// syscall_asm.go
package main

import "syscall"

//go:nosplit
func rawSyscall(trap, a1, a2, a3 uintptr) (r1, r2 uintptr, err syscall.Errno)

// syscall_asm.s
#include "textflag.h"

// rawSyscall executes a Linux system call with up to three arguments.
TEXT ·rawSyscall(SB), NOSPLIT, $0-56
// NOSPLIT omits the stack-growth check; this leaf function has no locals.
// $0-56 means 0 local bytes, 56 bytes for args and returns:
// trap, a1, a2, a3 (32 bytes) plus r1, r2, err (24 bytes).
MOVQ trap+0(FP), AX
// Load the syscall number into AX.
MOVQ a1+8(FP), DI
// Load first argument into DI register.
MOVQ a2+16(FP), SI
// Load second argument into SI register.
MOVQ a3+24(FP), DX
// Load third argument into DX register.
SYSCALL
// Execute the system call. The instruction clobbers CX and R11.
MOVQ AX, r1+32(FP)
// Store first return value.
MOVQ DX, r2+40(FP)
// Store second return value.
TESTQ AX, AX
// Check if return value is negative.
JNS noError
// If positive or zero, no error.
CMPQ AX, $-4095
// Errors are in range [-4095, -1].
JGE errorPath
// If AX >= -4095, it's an error.
noError:
// Clear the error field. Errno is uintptr-sized, so write all 8 bytes.
MOVQ $0, err+48(FP)
JMP done
errorPath:
// Negate AX to get positive errno.
NEGQ AX
MOVQ AX, err+48(FP)
done:
RET

On Linux, the AMD64 syscall convention requires the syscall number in AX and the first three arguments in DI, SI, and DX. The assembly loads values from the stack into these registers. The SYSCALL instruction traps into the kernel, which returns its result in AX; the wrapper also saves DX as a second result value. The assembly then checks for errors. Linux signals failure by returning a value in the range -4095 to -1. The code checks if the return value is negative and within that range. If so, it negates the value to produce a positive error number.
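The error check is easier to reason about in Go than in flag-and-jump form. The following sketch restates the same range test; errno, maxErrno, and decode are hypothetical names used for illustration and are not part of the syscall package.

```go
package main

import "fmt"

// errno is a hypothetical stand-in for syscall.Errno.
type errno uintptr

// maxErrno is the largest error number Linux encodes in a return value.
const maxErrno = 4095

// decode splits a raw syscall return value into a result and an error
// number. Values in [-4095, -1], read as signed, are errors; anything
// else, including large unsigned values such as addresses, is a result.
func decode(raw uintptr) (result uintptr, err errno) {
	v := int(raw) // reinterpret the register contents as a signed integer
	if v < 0 && v >= -maxErrno {
		return raw, errno(-v)
	}
	return raw, 0
}

func main() {
	for _, n := range []int{3, -2, -4095, -4096} {
		_, err := decode(uintptr(n))
		fmt.Printf("raw=%d errno=%d\n", n, err)
	}
}
```

A return value of -2 decodes to error number 2, while -4096 falls outside the error range and is treated as an ordinary result.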

This pattern appears in the Go runtime and syscall package. Assembly provides precise control over register allocation and instruction selection. The compiler cannot guarantee this level of control. The kernel interface is rigid. Assembly ensures the interface is satisfied.

The kernel doesn't care about Go. It cares about registers. Get the registers right or the call fails.

Pitfalls and compiler traps

Assembly in Go introduces risks that Go code avoids. Stack management is manual. If you misalign the stack or forget to preserve registers, the program crashes. The compiler cannot protect you from these mistakes. You must follow the rules.

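If you forget to implement the assembly function, the build fails. With no .s file in the package at all, the compiler rejects the body-less declaration with a "missing function body" error. With a .s file present but the symbol missing or misspelled, the failure moves to the linker, which reports that the symbol is not defined. Every function declared without a body needs an implementation under exactly that name.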

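NOSPLIT carries its own risk. A NOSPLIT function skips the stack-growth check, so it must fit in the small reserve the runtime guarantees below the stack limit. Give it a large frame, or chain several NOSPLIT calls, and the linker rejects the build because the nosplit stack limit is exceeded. The flag that matters is the one on the TEXT directive. A //go:nosplit comment on the body-less Go declaration does not have to match it, because that directive only affects functions the compiler generates code for.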

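Stack alignment is another trap. On AMD64, Go's own calling convention only keeps the stack pointer-aligned, to 8 bytes. Code that assumes 16-byte alignment, such as aligned SIMD loads and stores (MOVAPS, MOVDQA) on stack memory or a call into C code that follows the System V ABI, can fault unless you align the address yourself. The assembler does not enforce alignment. The frame size in the TEXT directive reserves space for locals, but keeping scratch areas aligned within that space is your job.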
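The arithmetic behind alignment is a round-up to a power of two, and it is worth checking in Go before encoding it in a frame size or an AND mask. A minimal sketch, with alignUp and isAligned as illustrative helper names:

```go
package main

import "fmt"

// alignUp rounds n up to the next multiple of align.
// align must be a power of two.
func alignUp(n, align uintptr) uintptr {
	return (n + align - 1) &^ (align - 1)
}

// isAligned reports whether n is a multiple of align.
// align must be a power of two.
func isAligned(n, align uintptr) bool {
	return n&(align-1) == 0
}

func main() {
	for _, n := range []uintptr{0, 8, 16, 24, 40} {
		fmt.Printf("size %2d -> %2d (aligned before: %v)\n", n, alignUp(n, 16), isAligned(n, 16))
	}
}
```

A 24-byte scratch area, for example, rounds up to 32 bytes when it must hold 16-byte aligned data.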

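Long-running assembly can stall the scheduler. An assembly function runs on the caller's goroutine, and it cannot wait on a channel or check a context. The runtime's asynchronous preemption also treats assembly code as unsafe to interrupt, so a loop that runs for a long time inside assembly delays garbage collection and other goroutines until it returns. Keep assembly functions short, and when the input is large, process it in chunks from a Go loop so the runtime gets control between calls.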

Go's naming conventions stop at the symbol. Assembly uses the middle dot · (U+00B7, not an ASCII period) for package scope, and the name after it must match the Go function name exactly. Case sensitivity matters. ·Add and ·add are distinct symbols.

Stack alignment is invisible until it crashes. Trust the alignment rules.

When to reach for assembly

Use assembly when you need to emit specific instructions the compiler cannot generate, such as SIMD or dedicated cryptographic instructions. Use assembly when you must control register allocation for syscalls or hardware interfaces that require precise setup. Use assembly when implementing runtime primitives that require stack manipulation or NOSPLIT guarantees. Use the standard library when it already maps to the instruction you want: sync/atomic covers atomic primitives, and math/bits covers bit manipulation and carry arithmetic. Use Go with unsafe when the remaining cost is bounds checks or conversions and the logic itself is expressible in Go. Use cgo when you need to call existing C libraries without writing assembly wrappers. Use plain Go when the performance difference is negligible; assembly is hard to maintain and breaks portability.

Write Go first. Profile. Reach for assembly only when the profiler points to a bottleneck that Go cannot solve.

Where to go next

Writing assembly in Go is like manually tuning a car engine instead of letting the factory build it; you only do this when you need maximum speed or specific control that the standard tools can't provide. It matters for the core parts of the language itself, like memory management or system calls, where every tiny bit of performance counts. You would use it when the Go compiler simply cannot generate the exact machine code your hardware or operating system requires.