How to Work with Avro in Go

The data pipeline problem

You are building a service that produces events for a data lake. JSON is too verbose; the field names repeat in every record, bloating storage and bandwidth. Protocol Buffers require a code generation step that slows down iteration and adds build complexity. You need a format that is compact, schema-driven, and flexible enough to evolve over time without breaking old readers.

Avro fills that gap. It keeps the schema separate from the data, so the binary payload contains only values, not keys. This typically makes Avro payloads smaller and faster to process than the equivalent JSON. The schema can evolve independently, allowing writers and readers to operate on different versions. In Go, you work with Avro through a library that parses the schema and handles the binary conversion. You define Go structs with tags, encode them to bytes, and decode bytes back to structs. The library enforces the contract between your code and the data format.

How Avro works

Avro is a serialization format designed for data systems. Unlike JSON, it does not embed field names in the output. The schema defines the structure, types, and order of fields. The encoder writes values sequentially based on the schema. The decoder reads values and maps them back using the schema. This separation reduces payload size and speeds up processing.
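To make the size difference concrete, here is a minimal sketch that hand-encodes one record the way the Avro specification describes: a string is a length followed by UTF-8 bytes, and int and long values are zigzag varints. It uses only the Go standard library (binary.PutVarint applies the same zigzag scheme). The helper names appendLong, appendString, and encodeUser are made up for this illustration and are not part of any Avro library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendLong appends v as a zigzag varint, the encoding Avro uses for int and long.
func appendLong(dst []byte, v int64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutVarint(tmp[:], v)
	return append(dst, tmp[:n]...)
}

// appendString appends an Avro string: the byte length as a long, then the UTF-8 bytes.
func appendString(dst []byte, s string) []byte {
	dst = appendLong(dst, int64(len(s)))
	return append(dst, s...)
}

// encodeUser writes the values in schema order (name, then age) with no field names.
func encodeUser(name string, age int32) []byte {
	out := appendString(nil, name)
	return appendLong(out, int64(age))
}

func main() {
	avroBytes := encodeUser("Alice", 30)
	jsonBytes := []byte(`{"name":"Alice","age":30}`)
	fmt.Printf("avro: % x (%d bytes)\n", avroBytes, len(avroBytes))
	fmt.Printf("json: %d bytes\n", len(jsonBytes))
}
```

For this record the hand-encoded payload is 7 bytes against 25 bytes of JSON, because the field names live in the schema rather than in every record.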

The schema is written in JSON. It describes records, primitives, arrays, maps, and unions. A record is a collection of named fields. Each field has a name and a type. Primitive types are null, boolean, int, long, float, double, bytes, and string. Complex types include record, enum, array, map, union, and fixed.
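As an illustration, the sketch below defines a hypothetical Order schema that uses a primitive, an array, a map, and a union with a default. Because a schema is plain JSON, the standard library can read it; fieldNames is a helper invented for this example that lists the record's fields in declaration order, which is also the order in which values appear in the binary output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// orderSchema is a hypothetical record used only for illustration.
const orderSchema = `{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "id",     "type": "long"},
    {"name": "items",  "type": {"type": "array", "items": "string"}},
    {"name": "labels", "type": {"type": "map", "values": "string"}},
    {"name": "note",   "type": ["null", "string"], "default": null}
  ]
}`

// fieldNames returns the field names of a record schema in declaration order.
func fieldNames(schema string) ([]string, error) {
	var rec struct {
		Fields []struct {
			Name string `json:"name"`
		} `json:"fields"`
	}
	if err := json.Unmarshal([]byte(schema), &rec); err != nil {
		return nil, fmt.Errorf("parse schema JSON: %w", err)
	}
	names := make([]string, 0, len(rec.Fields))
	for _, f := range rec.Fields {
		names = append(names, f.Name)
	}
	return names, nil
}

func main() {
	names, err := fieldNames(orderSchema)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

This only inspects the JSON. It does not validate the schema against Avro's rules; that is the job of the Avro library's parser.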

Avro supports schema evolution by distinguishing the writer schema from the reader schema. Data is always encoded with the writer schema. To decode it, the reader needs the writer schema as well as its own, and the library resolves the differences between the two. If the writer's schema contains a field the reader does not know about, the reader skips it. If the reader expects a field that the writer did not write, the reader fills in the default value defined in the reader schema. This allows you to update your data format without breaking existing consumers. For this to work, any field that one side may lack must have a default value in the reader schema.
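The resolution rules can be sketched without any Avro library. The code below is a simplified model, not how a real decoder is implemented: it assumes the record has already been decoded into a map keyed by the writer's field names, and readerField and resolve are names invented for this example.

```go
package main

import "fmt"

// readerField is a simplified stand-in for one field of the reader schema.
type readerField struct {
	Name       string
	Default    any
	HasDefault bool
}

// resolve projects a record decoded with the writer schema onto the reader's fields.
// Fields unknown to the reader are dropped; fields the writer did not send fall back
// to the reader's default, and it is an error if no default exists.
func resolve(written map[string]any, reader []readerField) (map[string]any, error) {
	out := make(map[string]any, len(reader))
	for _, f := range reader {
		if v, ok := written[f.Name]; ok {
			out[f.Name] = v
			continue
		}
		if !f.HasDefault {
			return nil, fmt.Errorf("field %q missing from writer data and has no default", f.Name)
		}
		out[f.Name] = f.Default
	}
	return out, nil
}

func main() {
	// The writer knows "nickname"; the reader does not, but expects "email".
	written := map[string]any{"name": "Alice", "age": 30, "nickname": "Al"}
	reader := []readerField{
		{Name: "name"},
		{Name: "age"},
		{Name: "email", Default: "", HasDefault: true},
	}
	rec, err := resolve(written, reader)
	fmt.Println(rec, err)
}
```

A real implementation also handles type promotion, aliases, and nested records, which this sketch leaves out.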

In Go, the hamba/avro library is a widely used choice. It parses the schema string into a reusable schema object. You create encoders and decoders from that object. The library maps Go types to Avro types. Struct tags tell the library which struct field corresponds to which schema field.

Avro separates schema from data. Keep the schema versioned and accessible.

Minimal example

Start with the schema and the struct. Avro relies on a JSON schema definition and struct tags to map fields. The tags must match the field names in the schema exactly.

package main

import (
	"bytes"
	"fmt"

	"github.com/hamba/avro/v2"
)

// User maps to the Avro record. Tags tell the library which field matches which schema key.
type User struct {
	Name string `avro:"name"`
	Age  int    `avro:"age"`
}

func setupSchema() avro.Schema {
	// Parse converts the JSON schema string into an optimized internal representation.
	schema, err := avro.Parse(`{"type":"record","name":"User","fields":[{"name":"name","type":"string"},{"name":"age","type":"int"}]}`)
	if err != nil {
		panic(err)
	}
	return schema
}

Encoding and decoding use the compiled schema and an io.Writer or io.Reader. The library handles the binary conversion. You pass a pointer to the struct for decoding so the library can modify it.

func main() {
	schema := setupSchema()
	buf := new(bytes.Buffer)

	// Encoder writes binary data to the buffer using the schema rules.
	// NewEncoderForSchema takes a parsed schema; NewEncoder takes a schema string.
	enc := avro.NewEncoderForSchema(schema, buf)
	// Decoder reads binary data from the buffer and reconstructs the struct.
	dec := avro.NewDecoderForSchema(schema, buf)

	user := User{Name: "Alice", Age: 30}
	// Encode serializes the struct. Always check the error.
	if err := enc.Encode(user); err != nil {
		panic(err)
	}

	var result User
	// Decode populates the struct. Pass a pointer so the library can modify it.
	if err := dec.Decode(&result); err != nil {
		panic(err)
	}
	fmt.Println(result)
}

Encode writes bytes. Decode reads bytes. The schema is the contract.

Walkthrough

The avro.Parse call takes the JSON schema string and builds a schema object. This object knows the types, field names, and order. Parsing is relatively expensive because it validates the schema and builds internal structures. You should parse the schema once and reuse the object.

The encoder constructor ties an encoder to that schema. In hamba/avro v2, avro.NewEncoderForSchema takes an already parsed schema, while avro.NewEncoder takes the schema as a string and also returns an error. The encoder holds a reference to the schema and the writer. When you call Encode, the encoder walks the Go struct. It looks up the avro tags to find the corresponding schema fields. It converts Go values to their Avro binary representation and writes them to the buffer. The binary format stores int and long values as variable-length zigzag integers, so small numbers take a single byte.

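The decoder constructor works the same way: avro.NewDecoderForSchema takes a parsed schema, and avro.NewDecoder takes a schema string. When you call Decode, the decoder reads bytes from the buffer, interprets them according to the schema, and writes the values into the struct fields. The binary data carries no type markers, so the decoder relies entirely on the schema. If the bytes run out early or a value cannot be stored in the target field, the decoder returns an error.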

The buffer acts as the transport. In a real application, this would be a file, a network socket, or a Kafka message. The encoder and decoder work with any io.Writer and io.Reader, so you can stream data directly without buffering everything in memory.

Realistic example

Production code wraps encoding in functions. You parse the schema once at startup and reuse it, because parsing on every request wastes CPU on work that never changes. The schema object is safe for concurrent reads, so you can share it across goroutines. Encoders and decoders are not safe for concurrent use. You must create a new encoder for each goroutine or protect it with a mutex.

Go embraces explicit error checking. The if err != nil pattern makes failure paths visible. In Avro code, schema mismatches are a common source of runtime errors. Checking the error from Encode and Decode catches structural drift early. Wrap errors to provide context for the caller.

// schemaCache holds the parsed schema to avoid repeated parsing.
var schemaCache avro.Schema

func init() {
	// Parse the schema once when the package loads.
	s, err := avro.Parse(`{"type":"record","name":"User","fields":[{"name":"name","type":"string"},{"name":"age","type":"int"}]}`)
	if err != nil {
		panic("invalid schema: " + err.Error())
	}
	schemaCache = s
}

// SerializeUser converts a user to binary Avro bytes.
func SerializeUser(user User) ([]byte, error) {
	buf := new(bytes.Buffer)
	// Reuse the cached schema for encoding.
	enc := avro.NewEncoderForSchema(schemaCache, buf)

	if err := enc.Encode(user); err != nil {
		// Wrap errors to provide context for the caller.
		return nil, fmt.Errorf("serialize user: %w", err)
	}
	return buf.Bytes(), nil
}

Streaming multiple records is common in data pipelines. You can write a loop that encodes each record sequentially. The encoder handles the stream efficiently. If you are writing to a network connection, pass the connection directly to the encoder.

// StreamUsers writes multiple users to an io.Writer.
func StreamUsers(schema avro.Schema, w io.Writer, users []User) error {
	enc := avro.NewEncoderForSchema(schema, w)
	for _, u := range users {
		// Encode each record sequentially.
		if err := enc.Encode(u); err != nil {
			return fmt.Errorf("stream user: %w", err)
		}
	}
	return nil
}

Functions that perform I/O should accept a context.Context as the first argument. Even if the Avro library does not use it, the underlying writer might support cancellation. Pass ctx through to respect deadlines.
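One way to apply this is to check the context between records. The sketch below assumes a hypothetical encode hook standing in for the library encoder's Encode method, so that it runs with the standard library alone; StreamUsersCtx and runDemo are names invented for this example.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
)

// User mirrors the record used throughout this article.
type User struct {
	Name string
	Age  int32
}

// encodeFunc is a hypothetical hook that writes one record to w.
// In real code it would wrap the Avro encoder's Encode call.
type encodeFunc func(w io.Writer, u User) error

// StreamUsersCtx writes users one by one and stops as soon as ctx is done.
// It returns the number of records written before stopping.
func StreamUsersCtx(ctx context.Context, w io.Writer, users []User, encode encodeFunc) (int, error) {
	for i, u := range users {
		if err := ctx.Err(); err != nil {
			return i, fmt.Errorf("stream canceled after %d records: %w", i, err)
		}
		if err := encode(w, u); err != nil {
			return i, fmt.Errorf("stream record %d: %w", i, err)
		}
	}
	return len(users), nil
}

// runDemo cancels the context while the second record is being written.
func runDemo() (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var buf bytes.Buffer
	calls := 0
	encode := func(w io.Writer, u User) error {
		calls++
		if calls == 2 {
			cancel() // simulate the caller giving up mid-stream
		}
		_, err := io.WriteString(w, u.Name) // placeholder, not real Avro encoding
		return err
	}
	users := []User{{Name: "Alice", Age: 30}, {Name: "Bob", Age: 41}, {Name: "Carol", Age: 27}}
	return StreamUsersCtx(ctx, &buf, users, encode)
}

func main() {
	n, err := runDemo()
	fmt.Println(n, err)
}
```

Checking between records keeps the cancellation point on a record boundary, so a canceled stream never ends with a half-written record from this loop.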

Cache the schema. Create encoders per request. Check every error.

Pitfalls and errors

Avro is strict about types. If the schema says int and you pass a float64, the library rejects it. Go's type system helps, but Avro types do not map 1:1 to Go types in every case. Avro int is a 32-bit signed integer, while Go int is 32 or 64 bits wide depending on the platform. Use int32 in your struct to match Avro int exactly. The library may accept a plain int for an Avro int field, but explicit widths prevent subtle bugs when a value exceeds 32 bits or when code moves between 32-bit and 64-bit systems.
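If values arrive as a wider integer, a range check before narrowing makes the overflow case explicit. This is a minimal sketch using only the standard library; toAvroInt is a helper name invented for this example.

```go
package main

import (
	"fmt"
	"math"
)

// toAvroInt narrows v to int32 and rejects values outside the 32-bit Avro int range.
func toAvroInt(v int64) (int32, error) {
	if v < math.MinInt32 || v > math.MaxInt32 {
		return 0, fmt.Errorf("value %d does not fit in a 32-bit Avro int", v)
	}
	return int32(v), nil
}

func main() {
	fmt.Println(toAvroInt(30))
	fmt.Println(toAvroInt(1 << 40))
}
```

The alternative is to declare the schema field as long and use int64 in the struct, which avoids the check at the cost of a wider type.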

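Struct tags must match the schema. A mismatch between a struct tag and a schema field name is not caught at compile time. It surfaces at runtime, either as an encode or decode error or as a field left at its zero value, depending on the library and its version. Decoding into a struct with the wrong field type also fails at runtime, for example when the schema says int and the struct field is a string. A forgotten tag has the same effect as a wrong one: the library cannot map the field by the name you intended. The exact error text varies between versions, so test for the failure itself rather than matching on message strings.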
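Because these mismatches only show up at runtime, a startup check can catch them before any data flows. The sketch below compares the field names in the schema JSON with the avro struct tags using reflection. It uses only the standard library, handles only flat records, and checkTags and the mistagged type are invented for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

const userSchema = `{"type":"record","name":"User","fields":[{"name":"name","type":"string"},{"name":"age","type":"int"}]}`

type User struct {
	Name string `avro:"name"`
	Age  int32  `avro:"age"`
}

// mistagged has a tag that does not match the schema field "name".
type mistagged struct {
	Name string `avro:"fullname"`
	Age  int32  `avro:"age"`
}

// checkTags returns an error if any schema field lacks a struct field with a matching avro tag.
func checkTags(schemaJSON string, v any) error {
	var rec struct {
		Fields []struct {
			Name string `json:"name"`
		} `json:"fields"`
	}
	if err := json.Unmarshal([]byte(schemaJSON), &rec); err != nil {
		return fmt.Errorf("parse schema JSON: %w", err)
	}
	t := reflect.TypeOf(v)
	if t == nil || t.Kind() != reflect.Struct {
		return fmt.Errorf("checkTags expects a struct value, got %v", t)
	}
	tags := make(map[string]bool, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		if tag := t.Field(i).Tag.Get("avro"); tag != "" {
			tags[tag] = true
		}
	}
	for _, f := range rec.Fields {
		if !tags[f.Name] {
			return fmt.Errorf("schema field %q has no matching avro tag on %s", f.Name, t)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkTags(userSchema, User{}))
	fmt.Println(checkTags(userSchema, mistagged{}))
}
```

Calling a check like this from init or from a unit test turns a production decode failure into a failed build or a failed startup.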

The avro.Encoder is not safe for concurrent use. If two goroutines call Encode on the same encoder instance, you get data races and corrupted output. Create a new encoder for each goroutine, or use a pool. The schema object itself is safe to read concurrently, so you can share the schema but not the encoder.
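The per-goroutine pattern can be sketched with a toy encoder. toyEncoder below is a stand-in invented for this example, not the library's type: like a real encoder it holds mutable state, which is why each goroutine builds its own instance and writes only to its own slot in the result slice.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sync"
)

// toyEncoder stands in for a library encoder. Its buffer is mutable state,
// so a single instance must not be shared between goroutines.
type toyEncoder struct {
	buf bytes.Buffer
}

// Encode appends v to the buffer as a zigzag varint.
func (e *toyEncoder) Encode(v int64) {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutVarint(tmp[:], v)
	e.buf.Write(tmp[:n])
}

// encodeAll encodes each value in its own goroutine with its own encoder.
func encodeAll(values []int64) [][]byte {
	out := make([][]byte, len(values))
	var wg sync.WaitGroup
	for i, v := range values {
		wg.Add(1)
		go func(i int, v int64) {
			defer wg.Done()
			var enc toyEncoder // fresh encoder per goroutine
			enc.Encode(v)
			out[i] = enc.buf.Bytes() // each goroutine touches only its own slot
		}(i, v)
	}
	wg.Wait()
	return out
}

func main() {
	for _, b := range encodeAll([]int64{30, 41, 27}) {
		fmt.Printf("% x\n", b)
	}
}
```

Running a program like this under go run -race is a quick way to confirm that no encoder state is shared by accident.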

Run gofmt on your code. The community expects standard formatting. It removes noise and lets you focus on logic. Most editors run it on save. Do not argue about indentation; let the tool decide.

Mismatched types crash at runtime. Validate schemas early.

Decision matrix

Use Avro when you are building data pipelines where the schema evolves over time and you want compact binary storage without code generation.

Use JSON when you need human-readable payloads for APIs or debugging, and the extra size does not hurt performance.

Use Protocol Buffers when you want compile-time checks, strict schema enforcement, and generated Go code for maximum speed.

Use the standard library encoding/gob when you are serializing Go types for internal caching and do not care about cross-language interoperability.

Where to go next

Avro is a way to save data in a compact format that is always paired with a definition of what the data looks like. The payload itself carries only values; the schema travels alongside it, in a file header or through a shared registry. You use it when you need to send data between different systems or store it efficiently without losing track of field types. Think of it like shipping a box with a label that tells the receiver exactly how to unpack it.