How to Use the Outbox Pattern in Go for Reliable Messaging

Web
Use the Outbox Pattern to guarantee message delivery by storing messages in a database table within the same transaction as your business logic and processing them asynchronously.

The phantom sale problem

You build an order service. A user clicks "Buy". Your code saves the order to PostgreSQL, then fires a message to Kafka so the inventory service can update stock. The database commit succeeds. The Kafka broker is down for maintenance. The message vanishes. Your inventory stays at 100, but the order says "paid". You just created a phantom sale.

The opposite failure is equally messy. You send the message to Kafka first, then crash before the database transaction commits. The inventory service sees a deduction for an order that never existed. You now owe a customer a refund and a confused apology.

Both failures happen because you are trying to coordinate two independent systems inside a single request. The database guarantees atomicity for its own tables. The message broker guarantees delivery for its own queues. Neither guarantees that the other will play along.

How the outbox pattern works

The outbox pattern solves this by treating the message like another row in your database. Instead of talking to a message broker directly inside your business transaction, you write the message to a dedicated outbox table. The database guarantees that either both the order and the outbox row save, or neither does. A background process then reads from that table and forwards the messages to the broker. The broker becomes a secondary destination, not a point of failure.

Think of it like a mailroom. When you finish a letter, you don't run to the post office yourself. You drop it in the outgoing tray. The tray is inside your building, so you know it's safe. A dedicated mail clerk picks up the tray, walks to the post office, and handles the delivery. If the clerk trips, the letter stays in the tray. Nothing is lost.

The transaction boundary

Here is the simplest transaction that saves business data and an outbox record together.

// CreateOrder persists an order and queues a notification in the same transaction.
func (s *Service) CreateOrder(ctx context.Context, order Order) error {
    // Start a transaction bound to the incoming context.
    tx, err := s.db.BeginTx(ctx, nil)
    if err != nil {
        return err
    }
    defer tx.Rollback() // Rolls back automatically if commit fails or panics.

    // Save the core business record.
    if err := s.saveOrder(tx, order); err != nil {
        return err
    }

    // Queue the event for later dispatch.
    if err := s.saveOutboxMessage(tx, order.ID, "order.created", order); err != nil {
        return err
    }

    // Commit both rows atomically.
    return tx.Commit()
}

The database handles the heavy lifting. When tx.Commit() returns without error, PostgreSQL has written both the order and the outbox row to disk. If the network drops after the commit, the data is still there. If the broker is offline, the outbox row waits. The transaction boundary is your safety net.

Go conventions matter here. The ctx parameter comes first, which is the standard for any function that might block or make network calls. The receiver name s is short and matches the type Service. The if err != nil checks are verbose by design. They force you to acknowledge every failure point instead of hiding it behind a silent logger.

The compiler will reject the program with cannot use tx (type *sql.Tx) as type *sql.DB in argument if you accidentally pass the transaction to a method that expects a plain database handle. Always type your repository methods to accept a driver.Conn or a custom interface that wraps ExecContext and QueryContext. This keeps your code testable and prevents accidental cross-transaction queries.

The background dispatcher

Writing to the outbox is only half the pattern. You need a process that reads those rows and pushes them to the broker. A polling worker is the most straightforward approach.

// StartOutboxProcessor runs a background loop that dispatches pending messages.
func (s *Service) StartOutboxProcessor(ctx context.Context) {
    // Check for pending messages every two seconds.
    ticker := time.NewTicker(2 * time.Second)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            return // Graceful shutdown when the parent context cancels.
        case <-ticker.C:
            s.dispatchPending(ctx)
        }
    }
}

The dispatcher fetches rows, publishes them, and marks them as sent. It needs to handle failures without losing data.

// dispatchPending reads outbox rows and publishes them to the message broker.
func (s *Service) dispatchPending(ctx context.Context) {
    // Fetch a small batch to avoid holding locks too long.
    rows, err := s.db.QueryContext(ctx,
        "SELECT id, payload, destination FROM outbox WHERE status = 'pending' LIMIT 50 FOR UPDATE SKIP LOCKED")
    if err != nil {
        return // Log the error in production, but don't crash the loop.
    }
    defer rows.Close()

    for rows.Next() {
        var id int64
        var payload json.RawMessage
        var dest string
        if err := rows.Scan(&id, &payload, &dest); err != nil {
            continue
        }

        // Publish to the external broker.
        if err := s.broker.Publish(ctx, dest, payload); err != nil {
            continue // Leave status as pending so the next tick retries.
        }

        // Mark as sent only after successful delivery.
        s.db.ExecContext(ctx, "UPDATE outbox SET status = 'sent' WHERE id = $1", id)
    }
}

The FOR UPDATE SKIP LOCKED clause is the quiet hero here. It tells PostgreSQL to lock the rows you are reading, but skip any rows already locked by another dispatcher instance. This lets you run multiple workers without them stepping on each other. The SKIP LOCKED hint prevents worker starvation when one instance is processing a heavy batch.

The polling interval is a tradeoff. Two seconds is fast enough for most business flows without hammering the database. If you need sub-second delivery, you will need to switch to Change Data Capture, which reads the database transaction log directly. Polling is simpler and easier to debug.

Where things go wrong

The outbox pattern moves the failure boundary from the request path to the background path. That shift introduces new failure modes.

Duplicate processing is the most common runtime issue. If the broker acknowledges a message but your code crashes before updating the status column, the next poll will send the same message again. Your downstream service must be idempotent. It should check whether it has already processed a given event ID before applying state changes. If you ignore idempotency, a single network blip will double-charge a customer or double-ship an order.

Context leaks happen when the polling loop ignores cancellation. If you spawn a goroutine for each message and forget to pass a derived context, those goroutines will outlive the parent process. The runtime will complain with all goroutines are asleep - deadlock! during shutdown if you block on a channel that never closes. Always derive a short-lived context for each dispatch attempt.

Transaction isolation can bite you if you read the outbox table outside a transaction. Without FOR UPDATE, two workers might read the same pending row, both publish it, and both mark it as sent. You will waste broker bandwidth and risk downstream duplicates. Wrap your read in a transaction or use the skip-locked hint consistently.

The compiler will reject your code with undefined: ctx if you forget to pass the context to a database call. Modern Go tooling catches this quickly, but it is a reminder that context propagation is plumbing. Run it through every long-lived call site.

Choosing your dispatch strategy

Use a polling outbox worker when you need simple, predictable delivery and can tolerate a few seconds of latency. Use a Change Data Capture pipeline when you require sub-second propagation and want to avoid polling overhead. Use a synchronous broker call with a local retry queue when the message is critical and you cannot afford any background delay. Use plain sequential code when you don't need external messaging: the simplest thing that works is usually the right thing.

The outbox pattern is not a silver bullet. It adds a table, a background loop, and idempotency requirements to your architecture. It is worth the complexity when data consistency matters more than raw speed. When the broker goes down, your database keeps the promise. When your app restarts, the outbox picks up where it left off.

Trust the transaction. Let the background worker handle the noise.

Where to go next