How to Use TensorFlow or ONNX Runtime from Go

TensorFlow's own Go bindings sit outside its API stability guarantees and are awkward to build and maintain, so the more dependable route is to export the model to ONNX and execute it with ONNX Runtime through a community-maintained Go wrapper.

The TensorFlow dead end in Go

You spent weeks training a model in Python. The accuracy is solid. The product team wants it deployed in the Go backend to handle requests alongside the rest of the microservices. You open your editor, type import "tensorflow", and the build fails because no such package exists. TensorFlow does ship Go bindings in its source tree, but they are not covered by the project's API stability guarantees, they depend on the TensorFlow C library, and the core team focuses on Python and C++. Third-party bindings exist, but they are often stale, unmaintained, or incompatible with modern Go versions. Running a TensorFlow SavedModel directly from Go is possible in principle, but it is rarely a dependable production path.

The path forward requires a translation layer. You must convert the model to a universal format and use a runtime that supports Go. That format is ONNX. That runtime is ONNX Runtime. This approach lets you keep your Python training pipeline while leveraging Go's concurrency and performance for inference.

ONNX as the universal translator

ONNX stands for Open Neural Network Exchange. It is an open format designed to represent machine learning models. Think of it like PDF for documents. You write a document in Word, you export it to PDF, and anyone can read it regardless of the software they use. You train a model in TensorFlow, you export it to ONNX, and ONNX Runtime executes it.

ONNX Runtime is a high-performance inference engine with a C++ core and a C API. It supports multiple hardware accelerators. There are no official Go bindings. The examples here use github.com/yalue/onnxruntime_go, a community-maintained wrapper that calls the C API through CGO. This means your Go code calls into a native shared library. You get near-native performance without the overhead of a Python process. The trade-off is that you must manage the native side of the bridge carefully.

Converting the model

The conversion happens in Python. You need the tf2onnx library to translate TensorFlow's graph representation into ONNX. This step is mandatory because ONNX Runtime does not read TensorFlow's .pb or SavedModel formats.

The conversion command requires the source model, the output path, and an opset version. The opset defines which operators are available. ONNX operators evolve over time. A higher opset version includes newer operators but requires a newer runtime. If the model's opset is newer than the runtime supports, the session fails to load the model.

# Install the converter library
pip install tf2onnx

# Convert the SavedModel to ONNX format
# Opset 17 is an example; choose one your ONNX Runtime version supports
python -m tf2onnx.convert \
  --saved-model ./my_model \
  --output my_model.onnx \
  --opset 17

The output is a single .onnx file. This file contains the model graph, weights, and metadata. You can move this file to your Go project. The Go code only needs this file and the ONNX Runtime library.

Loading and running in Go

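The Go wrapper needs the ONNX Runtime shared library (libonnxruntime.so on Linux) on the machine that runs the program. Prebuilt archives are published with each ONNX Runtime release, and some package managers such as brew and vcpkg carry it. The yalue/onnxruntime_go wrapper loads the library while the program is running rather than at link time, so the build needs CGO and a C compiler but not the library itself. If the library cannot be found, initializing the environment returns an error.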

Here is the minimal code to load a model and run inference.

package main

import (
	"fmt"
	"log"

	ort "github.com/yalue/onnxruntime_go"
)

func main() {
	// Point the wrapper at the shared library; adjust the path for your system
	ort.SetSharedLibraryPath("/usr/local/lib/libonnxruntime.so")

	// The environment holds global ONNX Runtime state
	if err := ort.InitializeEnvironment(); err != nil {
		log.Fatal(err)
	}
	defer ort.DestroyEnvironment() // Destroy cleans up global resources

	// Input data must match the model's expected shape
	inputData := []float32{1.0, 2.0, 3.0, 4.0}
	inputTensor, err := ort.NewTensor(ort.NewShape(1, 4), inputData)
	if err != nil {
		log.Fatal(err)
	}
	defer inputTensor.Destroy() // Destroy frees the native tensor handle

	// The output tensor is created up front. This example assumes the
	// model produces a 1x2 float32 output; use your model's real shape.
	outputTensor, err := ort.NewEmptyTensor[float32](ort.NewShape(1, 2))
	if err != nil {
		log.Fatal(err)
	}
	defer outputTensor.Destroy()

	// The session loads the ONNX graph and binds it to the tensors.
	// The input and output names must match the names in your model.
	// Older releases of the wrapper call ort.Value ort.ArbitraryTensor.
	session, err := ort.NewAdvancedSession("my_model.onnx",
		[]string{"input_tensor"}, []string{"output_tensor"},
		[]ort.Value{inputTensor}, []ort.Value{outputTensor}, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer session.Destroy() // Destroy frees the session memory

	// Run executes the graph and writes the result into outputTensor
	if err := session.Run(); err != nil {
		log.Fatal(err)
	}

	// GetData returns a slice view of the output values
	fmt.Printf("Inference result: %v\n", outputTensor.GetData())
}

The code follows a strict pattern. You initialize the environment, create the tensors and the session, run the session, and read the results. Every resource you create has a matching cleanup call. The defer statements ensure cleanup happens when the function returns. This is critical because the garbage collector does not manage native memory. If you skip the cleanup, you leak memory.

How CGO and tensors work

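The Go wrapper is a thin layer over ONNX Runtime's C API. Creating a tensor produces a native handle that records the shape, the element type, and the location of the data. The yalue/onnxruntime_go wrapper documents that a tensor built from a Go slice uses that slice as its backing memory instead of copying it, so changes to the slice are visible to the model and the slice must stay alive as long as the tensor does. The cleanup call frees the native handle, not the Go slice.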

Tensor shapes are lists of integers. A shape of 1 by 4 means a batch size of 1 and a feature size of 4. The product of the dimensions must match the number of values in the Go slice. The compiler cannot check this. It only catches type mismatches, such as passing a []float64 where a []float32 is expected, and reports them with cannot use ... as .... A shape that disagrees with the slice length surfaces at runtime, as an error returned when you create the tensor or run the session.
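The element-count rule is cheap to check before any tensor is created. The following is a minimal sketch using only the standard library. The helper names elementCount and checkShape are made up for this example and are not part of any ONNX Runtime wrapper.

```go
package main

import "fmt"

// elementCount returns the number of values a tensor of the given shape
// holds, or an error if the shape is empty or has a non-positive dimension.
func elementCount(shape []int64) (int64, error) {
	if len(shape) == 0 {
		return 0, fmt.Errorf("shape is empty")
	}
	count := int64(1)
	for i, dim := range shape {
		if dim <= 0 {
			return 0, fmt.Errorf("dimension %d is %d, want a positive size", i, dim)
		}
		count *= dim
	}
	return count, nil
}

// checkShape reports an error unless data holds exactly as many values
// as the shape implies.
func checkShape(shape []int64, data []float32) error {
	want, err := elementCount(shape)
	if err != nil {
		return err
	}
	if int64(len(data)) != want {
		return fmt.Errorf("shape %v needs %d values, got %d", shape, want, len(data))
	}
	return nil
}

func main() {
	// Four values fit a 1x4 shape; three do not.
	fmt.Println(checkShape([]int64{1, 4}, []float32{1, 2, 3, 4}))
	fmt.Println(checkShape([]int64{1, 4}, []float32{1, 2, 3}))
}
```

Calling a check like this at the edge of your service turns a vague runtime failure into an error message that names the expected and actual counts.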

Go slices are one-dimensional. Tensors are N-dimensional. You must flatten multi-dimensional data into a single slice in row-major order before creating the tensor. For a 224x224 image with 3 channels in NCHW layout (batch, channels, height, width), the shape is 1, 3, 224, 224 and the slice length must be 1 * 3 * 224 * 224 = 150528. The order of elements matters. ONNX itself does not fix a layout. Models exported from PyTorch usually expect NCHW, while TensorFlow models usually expect NHWC (batch, height, width, channels). tf2onnx keeps the TensorFlow input layout unless you pass --inputs-as-nchw during conversion, so verify the converted model's input shape before you feed it data.
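To make the layout difference concrete, here is a small sketch that reorders one flattened image from NHWC to NCHW. The function name nhwcToNCHW is an illustration, not a library call, and the code assumes a batch size of 1.

```go
package main

import "fmt"

// nhwcToNCHW reorders a flattened single-image tensor from
// height-width-channel order to channel-height-width order.
func nhwcToNCHW(src []float32, height, width, channels int) ([]float32, error) {
	if len(src) != height*width*channels {
		return nil, fmt.Errorf("got %d values, want %d", len(src), height*width*channels)
	}
	dst := make([]float32, len(src))
	for h := 0; h < height; h++ {
		for w := 0; w < width; w++ {
			for c := 0; c < channels; c++ {
				srcIdx := (h*width+w)*channels + c // channel varies fastest
				dstIdx := (c*height+h)*width + w   // width varies fastest
				dst[dstIdx] = src[srcIdx]
			}
		}
	}
	return dst, nil
}

func main() {
	// A 1x2 image with 3 channels: pixel 0 is (1,2,3), pixel 1 is (4,5,6).
	nhwc := []float32{1, 2, 3, 4, 5, 6}
	nchw, err := nhwcToNCHW(nhwc, 1, 2, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(nchw) // prints [1 4 2 5 3 6]
}
```

Both slices have the same length, which is why a layout mistake never trips a shape check. Only the index arithmetic differs.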

A realistic inference pipeline

Real applications require preprocessing and postprocessing. You cannot pass raw bytes to the model. You must normalize values, reshape data, and interpret outputs. Here is a function that classifies an image.

package classifier

import (
	"context"
	"fmt"

	ort "github.com/yalue/onnxruntime_go"
)

// ClassifyImage runs inference on normalized pixel data.
// The session comes from ort.NewDynamicAdvancedSession, which takes the
// model's input and output names (for example "input.1" and "output.1")
// and accepts fresh tensors on every call to Run.
func ClassifyImage(ctx context.Context, session *ort.DynamicAdvancedSession, pixels []float32) (string, error) {
	// Skip the work if the caller has already cancelled or timed out
	if err := ctx.Err(); err != nil {
		return "", err
	}

	// Create the input tensor from the flattened pixel slice.
	// Assumes the model expects NCHW: batch=1, channels=3, height=224, width=224
	inputTensor, err := ort.NewTensor(ort.NewShape(1, 3, 224, 224), pixels)
	if err != nil {
		return "", fmt.Errorf("creating tensor: %w", err)
	}
	defer inputTensor.Destroy()

	// A nil output entry asks the runtime to allocate the output tensor
	outputs := []ort.Value{nil}
	if err := session.Run([]ort.Value{inputTensor}, outputs); err != nil {
		return "", fmt.Errorf("running inference: %w", err)
	}
	defer outputs[0].Destroy()

	// The model is assumed to emit one float32 probability per class
	scores, ok := outputs[0].(*ort.Tensor[float32])
	if !ok {
		return "", fmt.Errorf("unexpected output type %T", outputs[0])
	}
	probabilities := scores.GetData()
	if len(probabilities) == 0 {
		return "", fmt.Errorf("model returned no scores")
	}

	// Find the class with the highest probability
	maxIdx := 0
	for i, p := range probabilities {
		if p > probabilities[maxIdx] {
			maxIdx = i
		}
	}

	// Map index to label (simplified for example)
	return fmt.Sprintf("class-%d", maxIdx), nil
}

The function takes a context, a session, and pixel data. It checks the context first, so a request that was already cancelled never reaches the model. The check does not interrupt an inference call that is already running. The function then creates the tensor, runs the session, and extracts the result. Wrapping errors with %w preserves the error chain, so callers can still inspect the cause with errors.Is and errors.As. The explicit if err != nil checks are standard Go, and the boilerplate keeps the error path visible.
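The steps on either side of the model call can be sketched with the standard library alone. The example below normalizes 8-bit RGB samples and turns raw scores into a labeled prediction. The mean and standard deviation values, the label list, and the helper names normalize, softmax, and topLabel are all assumptions made for illustration. Use the statistics and labels from your own training setup, and skip softmax if your model already outputs probabilities.

```go
package main

import (
	"fmt"
	"math"
)

// normalize scales interleaved 8-bit RGB samples to [0, 1], then
// standardizes each channel with the given mean and standard deviation.
func normalize(rgb []uint8, mean, std [3]float32) ([]float32, error) {
	if len(rgb)%3 != 0 {
		return nil, fmt.Errorf("got %d samples, want a multiple of 3", len(rgb))
	}
	out := make([]float32, len(rgb))
	for i, v := range rgb {
		c := i % 3 // channel index: 0=R, 1=G, 2=B
		out[i] = (float32(v)/255 - mean[c]) / std[c]
	}
	return out, nil
}

// softmax converts raw scores into probabilities that sum to 1.
// Subtracting the largest score first keeps math.Exp from overflowing.
func softmax(logits []float32) []float32 {
	if len(logits) == 0 {
		return nil
	}
	largest := logits[0]
	for _, v := range logits[1:] {
		if v > largest {
			largest = v
		}
	}
	exps := make([]float64, len(logits))
	var sum float64
	for i, v := range logits {
		exps[i] = math.Exp(float64(v - largest))
		sum += exps[i]
	}
	probs := make([]float32, len(logits))
	for i, e := range exps {
		probs[i] = float32(e / sum)
	}
	return probs
}

// topLabel returns the label and probability of the most likely class.
func topLabel(probs []float32, labels []string) (string, float32, error) {
	if len(probs) == 0 || len(probs) != len(labels) {
		return "", 0, fmt.Errorf("got %d probabilities for %d labels", len(probs), len(labels))
	}
	best := 0
	for i, p := range probs {
		if p > probs[best] {
			best = i
		}
	}
	return labels[best], probs[best], nil
}

func main() {
	// Assumed statistics and labels, chosen only for this example.
	mean := [3]float32{0.5, 0.5, 0.5}
	std := [3]float32{0.5, 0.5, 0.5}
	pixels, err := normalize([]uint8{0, 255, 128}, mean, std)
	if err != nil {
		panic(err)
	}
	fmt.Println("normalized:", pixels)

	labels := []string{"cat", "dog", "bird"}
	probs := softmax([]float32{1.0, 3.0, 0.5}) // stand-in for model output
	label, p, err := topLabel(probs, labels)
	if err != nil {
		panic(err)
	}
	fmt.Printf("prediction: %s (%.2f)\n", label, p)
}
```

Keeping these helpers free of any runtime dependency makes them easy to unit test against values computed by the Python training code.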

Pitfalls and runtime errors

CGO introduces complexity. The most common issue is a missing shared library. If libonnxruntime.so is not where the program looks for it, startup fails. A binary linked against the library dies with error while loading shared libraries, and a wrapper that loads the library dynamically returns an error when it initializes the environment. You must ensure the library is installed on the deployment target. Docker images need the library copied in.

Opset mismatches are another trap. If you convert with opset 17 but the runtime only supports opset 13, the session creation fails. The error message usually mentions unsupported operator or opset version. Check the runtime version and the converter opset.

Memory leaks happen when a cleanup call is skipped. The Go garbage collector will not free native memory. A leak might not crash the program immediately, but it will exhaust memory over time. Always pair allocation with cleanup. Use defer for simple cases. In loops or pipelines, remember that defer runs only when the enclosing function returns, so release each resource explicitly or move the loop body into its own function.
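One way to keep defer usable inside a loop is to give each iteration its own function scope. The sketch below uses a placeholder resource type with a Release method to stand in for any handle backed by native memory. It is not a type from an ONNX Runtime wrapper, and the counter exists only so the example can show when cleanup runs.

```go
package main

import "fmt"

// resource stands in for any handle backed by native memory.
// It is a placeholder type, not part of any ONNX Runtime wrapper.
type resource struct {
	id       int
	released *int // counts cleanups so the example can show they ran
}

// Release records that the resource was cleaned up.
func (r *resource) Release() { *r.released++ }

// processBatch handles one item in its own function scope, so the deferred
// Release runs when this call returns rather than when the caller's loop ends.
func processBatch(id int, released *int) error {
	r := &resource{id: id, released: released}
	defer r.Release()
	// ... run inference with r here ...
	return nil
}

func main() {
	released := 0
	for i := 0; i < 3; i++ {
		if err := processBatch(i, &released); err != nil {
			fmt.Println("batch failed:", err)
		}
		fmt.Printf("after batch %d: %d released\n", i, released)
	}
}
```

Had the defer been written directly in the loop body of main, all three cleanups would have waited until main returned, and a long-running loop would hold every resource it ever created.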

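Tensor layout errors are silent killers. A wrong element count produces an error, but a tensor with the right number of values in the wrong order (NHWC data fed to an NCHW model, for example) runs without complaint and returns garbage. Validate the shape and layout against the model's input specification. Print the shape during development.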

Decision matrix

Use ONNX Runtime when you need to run complex models locally within a Go service and require high performance. Use TensorFlow Lite when you are deploying to mobile devices or microcontrollers with strict size constraints. Use a Python subprocess when the model relies on TensorFlow-specific operations that ONNX cannot convert and you can tolerate the overhead. Use a dedicated inference server when you need horizontal scaling and GPU management across a cluster.

ONNX is the bridge. Build it once, run it anywhere. CGO bridges Go and C++. You manage the C++ memory. The garbage collector does not. Opset versions are contracts. Check them before you deploy.

Where to go next