How Link-Time Optimization Works in Go

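Go does not support traditional Link-Time Optimization. It compiles packages separately, inlines small functions across package boundaries, and lets the linker drop unreachable code. Use -gcflags to inspect inlining and -ldflags to shrink binaries.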

The missing flag

You compile a Go program and compare the binary size to a C++ build. The C++ binary is half the size. You check the C++ Makefile and see -flto passed to the linker. You try go build -lto and the go command rejects it as an unknown flag. Go does not have Link-Time Optimization. That does not mean your code is unoptimized. It means Go takes a different path to performance.

The absence of LTO is a deliberate design choice. Go prioritizes build speed and caching over the marginal gains of whole-program analysis. The compiler optimizes aggressively within packages and performs specific cross-package optimizations like inlining. You get much of the speed benefit without the build-time tax.

Packages, not programs

Traditional LTO works by delaying optimization until the link stage. Instead of final machine code, the compiler emits an intermediate representation for every source file. The linker collects these, sees the entire program, and runs optimization passes across boundaries. It can inline a function from a library into main, remove dead code across translation units, and reorder code and data for better cache locality.

Go compiles packages individually. A package is a collection of .go files in a directory that share an import path. The compiler sees all files in a package at once. It can inline functions across files within the package, eliminate dead code inside the package, and optimize data flow freely. The output is an archive file (.a) containing the compiled code plus export data: the type information, and the bodies of inlinable functions, that importing packages need in order to compile against it.

The linker combines these archives into one executable. It resolves symbols, applies relocations, and discards code that nothing reachable refers to, but it does not re-run the compiler's optimization passes. This separation is what makes the build cache work. When you change one function, Go recompiles that package and any packages affected by the change, and reuses cached archives for everything else. LTO would break this model. If the linker optimized across package boundaries, a change in any dependency could force re-optimization of the whole program on every build. Go avoids that cost.

The trade-off is clear. You lose the ability to optimize across package boundaries in the general case. You gain build times that stay fast as the project grows. The Go team chose developer velocity. Most Go programs are fast enough without LTO, and the compiler's per-package optimization captures the low-hanging fruit.


Cross-package inlining

Go does perform one major cross-package optimization: inlining. The compiler can inline small functions from other packages into the caller. This removes function call overhead and enables further optimizations like constant folding.

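The compiler assigns a "cost" to each function based on its size and complexity, roughly the number of nodes in its intermediate representation. If the cost fits within a budget (80 in current gc releases), the function is inlinable. A call to a function that cannot itself be inlined is charged heavily, so a single such call uses up most of the budget. The budget balances speed against binary size: inlining copies the body into every call site, and too much of it can hurt instruction cache performance.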
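Because a call to a function that cannot be inlined consumes most of the budget, a common way to work with the cost model is to keep the frequently executed path small and move the rare path into a separate function. The standard library's sync.Mutex.Lock follows this pattern, with a short fast path and a separate lockSlow. The sketch below applies the same split to a hypothetical cache; label, labelSlow, and cache are illustrative names, and whether label actually fits the budget depends on the compiler release, so confirm it with -gcflags="-m" rather than assuming.

```go
package main

import (
	"fmt"
	"strconv"
)

// cache is a hypothetical lookup table used only to illustrate the pattern.
var cache = map[int]string{}

// label is the hot path: one map lookup and a branch. It is kept this
// small so that it can stay under the inlining budget. Check with
// -gcflags=-m rather than assuming it is inlined.
func label(n int) string {
	if s, ok := cache[n]; ok {
		return s
	}
	return labelSlow(n)
}

// labelSlow holds the rare, more expensive path. //go:noinline keeps it an
// ordinary call, so label is charged a fixed call cost for it instead of
// the cost of its whole body.
//
//go:noinline
func labelSlow(n int) string {
	s := "item-" + strconv.Itoa(n)
	cache[n] = s
	return s
}

func main() {
	fmt.Println(label(7)) // first call takes the slow path and fills the cache
	fmt.Println(label(7)) // second call is served by the fast path
}
```

Both calls print item-7; only the first one reaches labelSlow.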

Cross-package inlining works because the compiler writes the bodies of inlinable functions into each package's export data. When compiling main, the compiler reads the export data of its dependencies. If main calls a small function from a dependency, the compiler copies that body into main directly. The dependency does not need to be recompiled. The definition is already available in its export data.

This mechanism handles the most common performance win. Small helper functions, getters, and simple calculations get inlined automatically. You do not need to mark them. The compiler decides based on the cost model.

package main

import (
    "fmt"
    "myapp/lib" // hypothetical: assumes a module named myapp with a lib package
)

func main() {
    // lib.Add is a small function in another package.
    // The compiler reads its body from lib's export data,
    // computes its cost, and decides to inline it here.
    // If it does, main's machine code contains no call to lib.Add.
    sum := lib.Add(10, 20)
    fmt.Println(sum)
}

The lib.Add function lives in a separate package. The compiler still inlines it because it is small. The optimization happens while compiling main, not at link time. By the time the linker runs, the body of lib.Add is already part of main's compiled code.

Seeing the optimizer work

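You can inspect the compiler's decisions with -gcflags="-m". At this level the compiler reports which functions can be inlined and which calls it inlined, along with escape analysis results. Raising the level with -gcflags="-m=2" adds each function's cost and the reason any function was rejected.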

go build -gcflags="-m" -o myapp main.go

Each message is prefixed with a file and line. At -m you will see lines such as can inline SmallFunc and inlining call to SmallFunc. At -m=2, a rejected function gets a line such as cannot inline LargeFunc: function too complex: cost N exceeds budget 80. The compiler explains its reasoning, so you can see what the optimizer is doing without guessing.

package main

import "fmt"

// SmallHelper returns the absolute value of an integer.
// This function is tiny. The compiler will inline it.
func SmallHelper(x int) int {
    if x < 0 {
        return -x
    }
    return x
}

// ComplexCalc alternates adding and subtracting loop indices and logs
// its progress. Since Go 1.16 a plain for loop no longer prevents
// inlining, but each fmt.Println call costs most of the inlining budget,
// so two of them push this function over the limit.
func ComplexCalc(n int) int {
    fmt.Println("ComplexCalc: start, n =", n)
    sum := 0
    for i := 0; i < n; i++ {
        if i%2 == 0 {
            sum += i
        } else {
            sum -= i
        }
    }
    fmt.Println("ComplexCalc: done, sum =", sum)
    return sum
}

func main() {
    // SmallHelper gets inlined. The check and negation
    // appear directly in main's machine code.
    a := SmallHelper(-5)

    // ComplexCalc stays as a function call.
    // The compiler preserves the call boundary.
    b := ComplexCalc(100)

    fmt.Println(a, b)
}

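Run the build with -gcflags="-m". The output includes can inline SmallHelper and inlining call to SmallHelper, and no such lines for ComplexCalc. Build again with -gcflags="-m=2" to see each function's cost and the reason ComplexCalc was rejected. By default -gcflags applies only to the packages named on the command line; use -gcflags="all=-m" to include dependencies. You can also disable inlining entirely with -gcflags="-l". This is useful for debugging or for measuring the impact of inlining on binary size.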

Inlining is automatic. Trust the cost model.

Shrinking the binary

LTO often reduces binary size by removing dead code and merging identical functions across translation units. Go reaches part of that result differently. The linker already drops functions that nothing reachable refers to, but it keeps everything reachable from main and from package initialization, and every binary includes the Go runtime. A sizable part of what remains is metadata: the symbol table and DWARF debug information.

To reduce size, use -ldflags="-s -w". The -s flag omits the symbol table. The -w flag omits DWARF debug information. Both are metadata for debuggers and symbol-inspection tools. The machine code itself is unchanged, and the binary usually gets noticeably smaller.

go build -ldflags="-s -w" -o myapp main.go

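This is a common way to shrink Go binaries, especially for Docker images and embedded deployments. The trade-off is narrower than it sounds. Panic stack traces and profiles collected with runtime/pprof keep their function names and line numbers, because the runtime carries its own line tables, which these flags do not remove. What you lose is debugger support, which needs DWARF, and tools that read the symbol table, such as go tool nm or external profilers like Linux perf. Strip symbols for release builds and keep an unstripped build around for debugging.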

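Dead code elimination does happen in Go. The linker walks the program from its entry points, main.main and the package initializers, and keeps only the functions and data it can reach. If a package exports a function but no reachable code references it, the linker drops it, across package boundaries. The analysis is done by the linker itself and is conservative where reachability is hard to prove: methods that could satisfy an interface may be kept, and if the program can call reflect.Value.Method or MethodByName, the linker keeps the exported methods of every reachable type. You get meaningful dead code removal without full LTO.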

Convention aside: Go developers rarely tune binary size by hand. Many teams ship binaries with debug info for easier troubleshooting and strip symbols only when size is a hard constraint, such as for IoT devices or minimal containers.

Strip symbols for size. Keep them for debugging.

When optimization bites back

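Inlining and optimization can still get in the way of debugging, though less than you might expect. The runtime keeps tables that map inlined code back to the function it came from, so a panic inside an inlined function still names that function in the stack trace. Suppose a small helper, Lookup (a hypothetical name), indexes a slice and has been inlined into main:

panic: runtime error: index out of range [5] with length 3

goroutine 1 [running]:
main.Lookup(...)
        /path/to/main.go:8
main.main()
        /path/to/main.go:15 +0x45

The Lookup frame is there, with the file and line that panicked. The (...) in place of argument values and the missing +0x offset mark it as an inlined frame: there was no separate call, so the runtime has no arguments or return address to report for it. Debuggers are affected more. An inlined function has no frame of its own to step into, and optimized code can hide variable values, which is why Delve is normally used on builds made with -gcflags="all=-N -l" (dlv debug does this for you).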

You can force the compiler to keep a call boundary with //go:noinline. This directive prevents inlining. Use it when you want a function to keep its own frame, with its arguments shown in tracebacks and debuggers, or when inlining it at many call sites causes code bloat.

package main

import "fmt"

// DebugHelper prints diagnostic info. The noinline directive keeps it a
// real call, so it always has its own frame with its arguments listed.
//
//go:noinline
func DebugHelper(msg string) {
    fmt.Println(msg)
}

func main() { DebugHelper("checkpoint reached") }

The compiler respects //go:noinline: the function always stays a call. There is no matching directive to force inlining. The gc compiler does not recognize //go:inline, so a comment spelled that way does nothing, and functions that exceed the budget or call themselves recursively are simply compiled as calls. To get more inlining where it matters, make the hot function smaller, or give the compiler a CPU profile through profile-guided optimization (go build -pgo, generally available since Go 1.21), which raises the inlining budget for hot call sites.

Inlining can also increase binary size. If a small function is inlined many times, the code repeats. This can hurt instruction cache performance. The compiler's cost model tries to prevent this, but edge cases exist. Profile your code before adding directives. The compiler usually makes the right choice.

Profile first. Directives are for edge cases.

Decision matrix

Use automatic inlining when you want the compiler to balance speed and size. The cost model handles most cases correctly.

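Use //go:noinline when a function must keep its own frame, with its arguments visible in tracebacks and debuggers, when inlining it at many call sites causes excessive code bloat in a hot loop, or when a benchmark must measure the call itself.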

Use profile-guided optimization (go build -pgo), or restructure the function so its hot path fits the budget, when profiling proves a specific call's overhead is a bottleneck and the compiler refused to inline it. Verify the gain with benchmarks.

Use -ldflags="-s -w" when binary size matters, such as for embedded devices or Docker images, and you can give up debugger support and symbol-table tools.

Use -gcflags="-m" when you want to inspect the compiler's optimization decisions, and -gcflags="-m=2" when you also want to know why a function was not inlined.

Accept package-level optimization when build speed is the priority. The build cache lets you iterate fast. Full LTO would slow down every build.

The build cache is the real optimization. Protect it.

Where to go next