The promise of static binaries
You flash a microcontroller or spin up a single-board computer, and the first thing you want is a program that just runs. No package managers to install, no runtime dependencies to chase, no segfaults at 3 AM. You write a sensor reader, compile it, and drop the binary onto the device. It works. That is the promise of Go in embedded and IoT environments. The reality is slightly more nuanced. Go brings static binaries and a complete standard library to constrained hardware, but it also ships with a garbage collector and a scheduler that expect at least a few megabytes of RAM. Understanding where Go shines and where it fights you is the difference between a reliable field device and a memory leak that bricks your prototype.
Think of Go like a fully equipped expedition vehicle. It comes with its own water filter, spare tires, and navigation system. You do not need to pack those separately. In software terms, that means your binary includes the runtime, the scheduler, and the garbage collector. On a desktop or server, that overhead is invisible. On a device with 512 kilobytes of RAM, that vehicle is too heavy. You either need to strip the vehicle down to its frame, or you need to pick a lighter tool for the job.
How the runtime actually behaves on constrained hardware
The standard Go compiler produces robust, cross-platform binaries that run on Linux-based IoT gateways and single-board computers. Below that level, the ecosystem splits into two paths. Standard Go handles hardware access on systems with a full OS, through kernel interfaces such as sysfs or, when a C driver is involved, through CGO. TinyGo, an alternative LLVM-based compiler that supports a large subset of Go, targets bare-metal ARM and RISC-V microcontrollers. It provides hardware interrupt support and replaces the standard runtime with a much smaller one, including a simpler garbage collector that can be disabled entirely.
Here is the simplest way to read a hardware sensor on a Linux-based gateway. The program opens a virtual file exposed by the kernel, reads the raw bytes, and prints them.
package main

import (
	"fmt"
	"log"
	"os"
)

// main initializes a basic sensor loop for an ARM-based gateway.
func main() {
	// Read a mock sensor value from a file or hardware interface.
	data, err := os.ReadFile("/sys/class/thermal/thermal_zone0/temp")
	if err != nil {
		log.Fatal(err)
	}
	// Print the raw temperature reading.
	fmt.Printf("Temperature: %s", data)
}
Compile it for a Raspberry Pi by running GOOS=linux GOARCH=arm64 go build -o sensor-gateway . from the package directory (the trailing dot is the package path). The resulting binary runs on the Pi without any Go installation. Because this program is pure Go and cross-compilation disables CGO by default, the toolchain statically links everything your program needs. You copy the file over via SSH or USB, make it executable, and run it. No shared libraries to hunt down. No version mismatches.
Cross-compilation works because Go treats the target OS and architecture as explicit build parameters. For pure-Go programs you do not need to install a cross-compiler toolchain or configure sysroots. The toolchain ships the standard library as source and compiles it for the target platform as part of the build. If you target a 32-bit ARM device, you set GOARCH=arm. If you target a 64-bit RISC-V gateway, you set GOARCH=riscv64. The compiler generates the right instruction set. You only need to ensure your code does not rely on platform-specific assumptions, like 64-bit pointer sizes or little-endian byte order. Code that uses the unsafe package or pointer arithmetic is where those assumptions usually hide, and it can break when the architecture changes. Stick to the standard library and the compiler will do the heavy lifting.
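One way to sidestep the byte-order assumption mentioned above is to decode wire data with encoding/binary instead of reinterpreting memory. The following is a minimal sketch, not a real protocol: the 4-byte frame layout and the name decodeReading are hypothetical, chosen only for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"runtime"
)

// decodeReading parses a hypothetical 4-byte sensor frame:
// a big-endian uint16 sensor ID followed by a big-endian int16 value.
// The byte order is stated explicitly, so the result is the same on
// every GOARCH regardless of the host's native endianness.
func decodeReading(frame []byte) (id uint16, value int16, err error) {
	if len(frame) < 4 {
		return 0, 0, errors.New("frame too short")
	}
	id = binary.BigEndian.Uint16(frame[0:2])
	value = int16(binary.BigEndian.Uint16(frame[2:4]))
	return id, value, nil
}

func main() {
	id, value, err := decodeReading([]byte{0x00, 0x07, 0xFF, 0xF6})
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("built for %s/%s: sensor %d reads %d\n",
		runtime.GOOS, runtime.GOARCH, id, value)
}
```

The same source should decode the frame identically whether it is built with GOARCH=arm, arm64, or riscv64, because nothing in it depends on pointer size or native byte order.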
Cross-compile once. Deploy everywhere.
What happens under the hood
When the compiler runs, it resolves your imports, type-checks the code, and generates machine code for the target architecture. The -ldflags="-s -w" option omits the symbol table and DWARF debug information, which typically trims a noticeable fraction off the final binary. At runtime, the Go scheduler starts a small pool of OS threads and multiplexes goroutines onto them, so a goroutine is not tied to one thread for its lifetime. The garbage collector runs mostly concurrently with your code, with brief stop-the-world pauses that are usually well under a millisecond. On a device with 1 gigabyte of RAM, those pauses are unnoticeable. On a device with 64 megabytes, the GC can dominate CPU time if you allocate aggressively. Before your main function executes, the runtime also initializes the heap and scheduler, installs signal handlers, and runs package initializers.
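To see how much the collector is actually doing on a given device, the runtime exposes its counters through runtime.ReadMemStats. The sketch below is one possible way to sample them; the gcSnapshot type and the readGC, churn, and measure helpers are hypothetical names, and the forced runtime.GC() call is there only to make the demonstration deterministic.

```go
package main

import (
	"fmt"
	"runtime"
)

// gcSnapshot is a hypothetical helper type holding a few GC counters.
type gcSnapshot struct {
	HeapAllocKB  uint64 // bytes of allocated heap objects, in KiB
	NumGC        uint32 // completed GC cycles
	PauseTotalUs uint64 // cumulative stop-the-world pause time, in microseconds
}

// readGC samples the runtime's memory statistics.
func readGC() gcSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return gcSnapshot{
		HeapAllocKB:  m.HeapAlloc / 1024,
		NumGC:        m.NumGC,
		PauseTotalUs: m.PauseTotalNs / 1000,
	}
}

// sink keeps each buffer reachable long enough to force a heap allocation.
var sink []byte

// churn allocates n short-lived 1 KiB buffers to create garbage.
func churn(n int) {
	for i := 0; i < n; i++ {
		sink = make([]byte, 1024)
	}
}

// measure runs work, forces one collection so the counters are current,
// and returns the snapshots taken before and after.
func measure(work func()) (before, after gcSnapshot) {
	before = readGC()
	work()
	runtime.GC() // forced only to make this demo deterministic
	after = readGC()
	return before, after
}

func main() {
	before, after := measure(func() { churn(10000) })
	fmt.Printf("GC cycles: %d -> %d\n", before.NumGC, after.NumGC)
	fmt.Printf("total pause: %d us, live heap: %d KiB\n",
		after.PauseTotalUs, after.HeapAllocKB)
}
```

On a real gateway you would likely log a snapshot periodically rather than force a collection, since ReadMemStats itself briefly stops the world.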
Go conventions apply even in embedded code. The if err != nil { return err } pattern is verbose by design. The community accepts the boilerplate because it makes the unhappy path visible. In an IoT context, swallowing errors means a silent sensor failure that goes unnoticed for weeks. Write the check. Return the error. Let the caller decide whether to retry or log. Trust gofmt. Argue logic, not formatting. Most editors run it on save, and the output is consistent across every developer's machine.
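As an illustration of that pattern, here is a small sketch that reads the thermal zone file used earlier and returns wrapped errors instead of swallowing them. The function names are hypothetical. It relies on the Linux convention that this file reports millidegrees Celsius.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMilliCelsius converts the kernel's millidegree string
// (for example "42000\n") to degrees Celsius.
func parseMilliCelsius(raw string) (float64, error) {
	milli, err := strconv.Atoi(strings.TrimSpace(raw))
	if err != nil {
		return 0, fmt.Errorf("parse temperature %q: %w", raw, err)
	}
	return float64(milli) / 1000, nil
}

// readTemperature reads and parses one sample from path.
// Every failure is returned to the caller with context attached.
func readTemperature(path string) (float64, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, fmt.Errorf("read sensor: %w", err)
	}
	return parseMilliCelsius(string(data))
}

func main() {
	temp, err := readTemperature("/sys/class/thermal/thermal_zone0/temp")
	if errors.Is(err, os.ErrNotExist) {
		// The caller decides what a missing sensor means.
		fmt.Println("no thermal zone on this machine:", err)
		return
	}
	if err != nil {
		fmt.Println("sensor error:", err)
		return
	}
	fmt.Printf("%.1f C\n", temp)
}
```

Wrapping with %w keeps the original error inspectable, so the caller can tell a missing sensor apart from a malformed reading and choose between retrying and logging.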
The runtime does the heavy lifting. You write the business logic.
Tuning memory and the garbage collector
Field devices often need to report data over HTTP while running on constrained hardware. You want to keep memory usage predictable and avoid GC thrashing. You can tune the garbage collector and structure your code to reuse buffers.
Here is how you structure a long-running sensor server that reuses memory and tunes the garbage collector. The handler grabs a pre-allocated slice, reads directly into it, and returns the slice to a pool when finished.
package main

import (
	"net/http"
	"os"
	"sync"
)

// bufferPool reuses memory across requests to avoid GC pressure.
var bufferPool = sync.Pool{
	// New allocates a 1KB slice when the pool is empty.
	New: func() any {
		b := make([]byte, 1024)
		return &b
	},
}

// sensorHandler reads a sensor file and writes it to the response.
func sensorHandler(w http.ResponseWriter, r *http.Request) {
	// Grab a pooled buffer and return it when done.
	buf := bufferPool.Get().(*[]byte)
	defer bufferPool.Put(buf)

	// Open the sensor file. os.ReadFile would allocate a fresh slice on
	// every call, which defeats the pool.
	f, err := os.Open("/sys/class/thermal/thermal_zone0/temp")
	if err != nil {
		http.Error(w, "sensor unavailable", http.StatusServiceUnavailable)
		return
	}
	defer f.Close()

	// Read directly into the pre-allocated slice.
	n, err := f.Read(*buf)
	if err != nil {
		http.Error(w, "sensor unavailable", http.StatusServiceUnavailable)
		return
	}

	// Send only the bytes that were actually read.
	w.Header().Set("Content-Type", "text/plain")
	w.Write((*buf)[:n])
}
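The sync.Pool reuses byte slices across requests. Every allocation avoided is work the garbage collector never has to do. The pool is a cache, not a guarantee: the runtime may drop idle buffers during a collection, and New refills it on demand. The runtime's default GC target is 100, meaning a collection starts once the heap has grown by 100 percent over the live data left by the previous cycle. Setting debug.SetGCPercent(50) tells the runtime to trigger garbage collection sooner, keeping the heap smaller at the cost of more frequent cycles. This trade-off matters when your device has a tight memory budget and cannot afford large allocation spikes.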
// A second file in the same package as sensorHandler.
package main

import (
	"log"
	"net/http"
	"runtime/debug"
)

func main() {
	// Trigger GC earlier to keep the heap small on constrained devices.
	debug.SetGCPercent(50)
	// Bind the handler to port 8080 and block until shutdown.
	log.Println("Starting sensor server on :8080")
	if err := http.ListenAndServe(":8080", http.HandlerFunc(sensorHandler)); err != nil {
		log.Fatal(err)
	}
}
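A GC percentage only controls how fast the heap may grow relative to live data. On Go 1.19 and later, the runtime also accepts a soft memory limit through debug.SetMemoryLimit, which may fit a fixed RAM budget more directly. The sketch below combines the two; the tuneGC and currentLimit helpers and the 48 MiB budget are hypothetical values for illustration, not a recommendation.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// tuneGC applies a GC percentage and a soft memory limit, and returns
// the previous values so a caller can restore them later.
// Requires Go 1.19 or newer for debug.SetMemoryLimit.
func tuneGC(percent int, limitBytes int64) (prevPercent int, prevLimit int64) {
	prevPercent = debug.SetGCPercent(percent)
	prevLimit = debug.SetMemoryLimit(limitBytes)
	return prevPercent, prevLimit
}

// currentLimit reports the active soft limit. A negative argument
// leaves the limit unchanged and returns the current value.
func currentLimit() int64 {
	return debug.SetMemoryLimit(-1)
}

func main() {
	// Hypothetical budget: leave headroom on a device with 64 MB of RAM.
	const budget = 48 << 20
	prevPercent, _ := tuneGC(50, budget)
	fmt.Printf("GC percent %d -> 50, soft limit now %d MiB\n",
		prevPercent, currentLimit()>>20)
}
```

The same settings can be supplied without a rebuild through the GOGC and GOMEMLIMIT environment variables. The limit is soft: the runtime collects more aggressively as it approaches the budget, but it does not refuse allocations.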
Pool your buffers. Tune your GC. Keep the heap predictable.
When standard Go fights you
Embedded Go trips developers on three predictable fronts. The first is unbounded goroutine creation. Spawning a new goroutine per sensor reading or per HTTP request without any limit will eventually exhaust your memory. The runtime aborts with fatal error: runtime: out of memory, or the kernel's OOM killer terminates the process first. The second is CGO dependency. If you import a C library for hardware access, the build links against the C standard library and, by default, does so dynamically. The binary stops being fully static. When you deploy to a minimal root filesystem, the binary fails to start with a misleading "No such file or directory" or "not found" error because the dynamic loader is missing, or with an error while loading shared libraries. The third is ignoring what the runtime needs from the platform. The Go runtime expects an operating system underneath it to provide threads, virtual memory, and signals. On a bare-metal microcontroller without an OS, standard Go cannot run. You need TinyGo, which compiles to bare-metal ARM or RISC-V and provides hardware interrupt support. If you try to build standard Go for an ESP32, you never reach the linker: the toolchain has no GOOS/GOARCH pair for that chip and rejects the build as unsupported.
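For the first pitfall, a fixed number of workers pulling from a channel keeps the goroutine count constant no matter how many readings arrive. This is a minimal sketch of that idea; readAll and its parameters are hypothetical names, and the read callback stands in for whatever sensor access your device uses.

```go
package main

import (
	"fmt"
	"sync"
)

// readAll runs read once per sensor ID using at most `workers` goroutines
// and returns the readings keyed by ID.
func readAll(ids []int, workers int, read func(id int) int) map[int]int {
	if workers < 1 {
		workers = 1 // guard against a pool with no consumers
	}
	jobs := make(chan int)
	results := make(map[int]int, len(ids))
	var mu sync.Mutex
	var wg sync.WaitGroup

	// Start a fixed pool. The goroutine count never exceeds `workers`.
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				v := read(id)
				mu.Lock()
				results[id] = v
				mu.Unlock()
			}
		}()
	}

	// Feed the jobs, then close the channel so the workers exit.
	for _, id := range ids {
		jobs <- id
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	ids := []int{1, 2, 3, 4, 5, 6, 7, 8}
	readings := readAll(ids, 3, func(id int) int { return id * 10 })
	fmt.Println(len(readings), "readings, sensor 4 =", readings[4])
}
```

Because the jobs channel is unbuffered, the producer blocks when all workers are busy, which gives you backpressure instead of a growing pile of goroutine stacks.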
Memory leaks in embedded Go usually come from channels that never close or goroutines waiting on a blocking read. The compiler will not catch a goroutine leak. It catches unused imports and unused variables, not a goroutine that can never make progress. If you forget to close a channel, the goroutine ranging over it stays alive forever. The runtime reports all goroutines are asleep - deadlock! only when every goroutine, including main, is blocked. A long-running server never reaches that state, so the leaked goroutines just consume RAM until the device reboots. The worst goroutine bug is the one that never logs.
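The channel case can be sketched in a few lines. In the example below, which uses hypothetical helper names, the consumer goroutine can only finish because the producer closes the channel when it is done. Remove the deferred close and the consumer blocks forever.

```go
package main

import "fmt"

// produce sends the samples and then closes the channel.
// The sender owns the close; that is what lets the consumer exit.
func produce(samples []int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for _, s := range samples {
			ch <- s
		}
	}()
	return ch
}

// startConsumer sums readings until the channel is closed, then delivers
// the total. Without a close on the producer side, the range loop would
// never end and this goroutine would leak.
func startConsumer(readings <-chan int) <-chan int {
	total := make(chan int, 1)
	go func() {
		sum := 0
		for v := range readings {
			sum += v
		}
		total <- sum // reached only after readings is closed
	}()
	return total
}

func main() {
	total := startConsumer(produce([]int{21, 22, 23}))
	fmt.Println("sum of readings:", <-total)
}
```

A cheap field check for leaks is to log runtime.NumGoroutine() at intervals: a count that only ever climbs suggests goroutines that are stuck waiting.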
Context cancellation is your escape hatch for long-running loops. Pass a context.Context as the first parameter to every function that might block. Name it ctx by convention. Check ctx.Done() before starting expensive operations. If the device receives a shutdown signal, the context cancels, the goroutine exits cleanly, and the runtime reclaims the stack. Context is plumbing. Run it through every long-lived call site.
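A polling loop built this way might look like the sketch below. The poll and pollFor names, the 10 ms interval, and the timeout standing in for a shutdown signal are all assumptions made for the demonstration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// poll calls read every interval until ctx is cancelled.
// It returns the number of completed reads and the context's error.
func poll(ctx context.Context, interval time.Duration, read func()) (int, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	count := 0
	for {
		select {
		case <-ctx.Done():
			// Shutdown requested: leave the loop so the goroutine can exit.
			return count, ctx.Err()
		case <-ticker.C:
			read()
			count++
		}
	}
}

// pollFor is a demo helper: poll every 10 ms and cancel after ms
// milliseconds. The timeout stands in for a real shutdown signal.
func pollFor(ms int) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(ms)*time.Millisecond)
	defer cancel()
	return poll(ctx, 10*time.Millisecond, func() {})
}

func main() {
	n, err := pollFor(100)
	fmt.Printf("stopped after %d reads: %v\n", n, err)
}
```

On a real device, signal.NotifyContext from os/signal can produce a context that is cancelled on SIGTERM or SIGINT, so the same loop stops cleanly when the service manager shuts the process down.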
Decision matrix
Picking the right tool depends on your hardware and your constraints.
Use standard Go when you are targeting Linux-based gateways, Raspberry Pi devices, or any system with at least 256 megabytes of RAM and a full POSIX environment. Use TinyGo when you need bare-metal microcontroller support, a runtime small enough that the garbage collector can be minimal or disabled, or sub-megabyte binary sizes. Use CGO only when you have an existing C driver that cannot be rewritten and your target system includes the C standard library. Use plain sequential code with explicit buffer pooling when memory is tight and allocation patterns are predictable. Use a worker pool with bounded concurrency when you need to handle multiple sensors or network connections without spawning unbounded goroutines. Use context.Context cancellation when you need to gracefully shut down background tasks on device shutdown or reboot. Use static linking with stripped debug symbols when storage space on the target filesystem is limited.
Match the tool to the silicon. Do not force a runtime where it does not belong.