Read replicas and write splitting in Go
Your service hits a wall. The database CPU spikes to 100 percent. New users can't sign up because the analytics dashboard is hogging every available connection. You provision a read replica to offload queries. Now you have two database endpoints. Go doesn't know about replicas. The standard library doesn't route queries automatically. You have to build the routing logic yourself.
Write splitting means sending mutations to the primary database and queries to a replica. The primary handles inserts, updates, and deletes. The replica receives a copy of the data, usually via streaming replication. Your Go application manages two connection pools and decides which one to hit based on the operation. This gives you full control over routing, failover, and consistency guarantees.
Two pools, one app
Go's database/sql package provides a connection pool wrapper around a driver. It does not implement query routing. You create separate *sql.DB instances for each endpoint. One instance connects to the primary. The other connects to the replica. Your code picks the right instance for each query.
This design is explicit. You see exactly where data flows. You can tune the pools independently. The write pool might need fewer connections with longer timeouts. The read pool might need more connections to handle high traffic. You can add fallback logic, lag checks, or session affinity without fighting a framework.
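As a rough sketch of independent tuning, the snippet below applies different limits to each pool. The numbers are illustrative assumptions, not recommendations. The names tunePools, newStubPool, and stubConnector are hypothetical. The stub exists only so the sketch compiles and runs without a database or driver.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"time"
)

// stubConnector is a hypothetical stand-in for a real driver connector.
// It never opens a connection; it only lets the sketch run without a database.
type stubConnector struct{}

func (stubConnector) Connect(context.Context) (driver.Conn, error) {
	return nil, errors.New("stub connector: no database available")
}

func (stubConnector) Driver() driver.Driver { return nil }

// newStubPool returns a *sql.DB backed by the stub. In a real service this
// would be sql.Open("postgres", dsn) with a driver imported.
func newStubPool() *sql.DB { return sql.OpenDB(stubConnector{}) }

// tunePools applies separate limits to each pool. Every number here is an
// assumption for illustration; size the pools from your own load tests.
func tunePools(write, read *sql.DB) {
	// Primary: fewer connections, since writes are the scarcer resource.
	write.SetMaxOpenConns(10)
	write.SetMaxIdleConns(5)
	write.SetConnMaxLifetime(30 * time.Minute)

	// Replica: more connections to absorb read traffic.
	read.SetMaxOpenConns(50)
	read.SetMaxIdleConns(25)
	read.SetConnMaxLifetime(30 * time.Minute)
	read.SetConnMaxIdleTime(5 * time.Minute)
}
```

Calling db.Stats() on either pool afterwards reports the configured MaxOpenConnections, which is a cheap way to confirm the settings took effect. Note that database/sql has no per-pool query timeout; timeouts come from the context you pass to each call or from driver settings in the DSN.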
The convention in Go is to accept interfaces and return structs. Your routing layer returns a struct that wraps the pools. Functions that perform database work accept a context.Context as the first parameter, conventionally named ctx. This lets callers cancel long-running queries or enforce deadlines. The if err != nil pattern is verbose by design. It keeps the error path visible and forces you to handle failures explicitly.
Minimal routing setup
Here's the simplest structure: two global pools, one for writes and one for reads.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // assumed driver; it registers the "postgres" name used below
)

// writeDB handles all mutations on the primary.
var writeDB *sql.DB

// readDB serves queries from the replica.
var readDB *sql.DB

func init() {
	// Open primary connection pool.
	// sql.Open may only validate its arguments; it does not dial the server.
	w, err := sql.Open("postgres", "postgres://primary:5432/app")
	if err != nil {
		log.Fatal(err)
	}
	writeDB = w

	// Open replica connection pool.
	r, err := sql.Open("postgres", "postgres://replica:5432/app")
	if err != nil {
		log.Fatal(err)
	}
	readDB = r
}

func main() {
	// Verify connectivity to both pools.
	if err := writeDB.Ping(); err != nil {
		log.Fatalf("write pool failed: %v", err)
	}
	if err := readDB.Ping(); err != nil {
		log.Fatalf("read pool failed: %v", err)
	}

	// Write to primary. Use a placeholder instead of inlining the value.
	if _, err := writeDB.Exec("INSERT INTO users (name) VALUES ($1)", "Alice"); err != nil {
		log.Fatalf("insert failed: %v", err)
	}

	// Read from replica. Replication lag can make this miss the row just written.
	var name string
	if err := readDB.QueryRow("SELECT name FROM users").Scan(&name); err != nil {
		log.Fatalf("select failed: %v", err)
	}
	fmt.Println(name)
}
sql.Open returns a handle that manages a pool of connections. It does not dial the server, so call Ping to force the first connection and verify that the endpoint is reachable. The two pools are independent, and each *sql.DB is safe for concurrent use, so you can call methods on writeDB and readDB from multiple goroutines without extra locking.
Two pools, one app. Route by intent.
Walking through the runtime
When you call writeDB.Exec, the pool hands out a connection from the primary pool. The query runs on the primary. The connection returns to the pool. When you call readDB.QueryRow, the pool hands out a connection from the replica pool. The query runs on the replica. The connection returns to the pool.
The pools do not share state. A transaction started on writeDB cannot query readDB, because a *sql.Tx is bound to a single connection from the pool that created it. The type system reinforces this: *sql.Tx and *sql.DB are distinct types, so passing a transaction to a function that expects a pool fails to compile with an error along the lines of cannot use tx (variable of type *sql.Tx) as *sql.DB value in argument. You must route at the pool level, not the transaction level.
Context propagation works the same way on both pools. If you pass a context with a deadline to ExecContext or QueryContext, the call returns an error once the deadline passes. If the context is canceled mid-query, a context-aware driver aborts the query and the connection is released. This keeps goroutines from blocking indefinitely on slow queries. Always pass a context to database calls. Context is plumbing. Run it through every long-lived call site.
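To make the deadline behavior concrete, here is a minimal sketch that layers a per-call timeout on top of the caller's context. The helper names userName, isDeadline, and demoExpiredDeadline are hypothetical, and stubConnector is a stand-in so the sketch runs without a database. With a real driver the same error surfaces when a query outlives its deadline.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"time"
)

// stubConnector is a hypothetical stand-in for a real driver connector.
type stubConnector struct{}

func (stubConnector) Connect(context.Context) (driver.Conn, error) {
	return nil, errors.New("stub connector: no database available")
}

func (stubConnector) Driver() driver.Driver { return nil }

// userName looks up a name on the given pool. The timeout is applied on top of
// whatever deadline the caller's context already carries.
func userName(ctx context.Context, db *sql.DB, id int, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel() // always release the timer

	var name string
	err := db.QueryRowContext(ctx, "SELECT name FROM users WHERE id = $1", id).Scan(&name)
	if err != nil {
		return "", err
	}
	return name, nil
}

// isDeadline reports whether err was caused by an expired context deadline.
func isDeadline(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

// demoExpiredDeadline calls userName with a deadline that has already passed.
// database/sql checks the context before it asks the pool for a connection,
// so the call fails without ever reaching the (stub) driver.
func demoExpiredDeadline() error {
	db := sql.OpenDB(stubConnector{})
	defer db.Close()

	_, err := userName(context.Background(), db, 1, -time.Second)
	return err
}
```

In an HTTP handler you would pass r.Context() instead of context.Background(), so a client disconnect cancels the query as well. Checking isDeadline on the returned error lets the handler distinguish a timeout from other failures.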
Realistic service layer
In production, you wrap the pools in a struct. You expose methods that hide the routing logic. This keeps your handlers clean and makes testing easier.
package main

import (
	"context"
	"database/sql"

	_ "github.com/lib/pq" // assumed driver; it registers the "postgres" name used below
)

// DB wraps two connection pools for routing.
type DB struct {
	write *sql.DB
	read  *sql.DB
}

// NewDB initializes both pools.
func NewDB(writeDSN, readDSN string) (*DB, error) {
	w, err := sql.Open("postgres", writeDSN)
	if err != nil {
		return nil, err
	}
	r, err := sql.Open("postgres", readDSN)
	if err != nil {
		w.Close() // release the primary pool that was already opened
		return nil, err
	}
	return &DB{write: w, read: r}, nil
}
The struct holds unexported fields, so callers cannot bypass the routing methods. The methods that follow use db as the receiver name, a short abbreviation of the type. This follows Go naming conventions. One or two letters for receivers is standard. Avoid this or self.
// CreateUser inserts into the primary.
func (db *DB) CreateUser(ctx context.Context, name string) error {
	// ExecContext routes to the write pool.
	// The context propagates cancellation to the primary.
	_, err := db.write.ExecContext(ctx, "INSERT INTO users (name) VALUES ($1)", name)
	return err
}

// GetUser fetches data from the replica.
func (db *DB) GetUser(ctx context.Context, id int) (string, error) {
	var name string
	// QueryRowContext routes to the read pool.
	// The context propagates cancellation to the replica.
	err := db.read.QueryRowContext(ctx, "SELECT name FROM users WHERE id = $1", id).Scan(&name)
	if err != nil {
		return "", err
	}
	return name, nil
}
Methods take ctx as the first parameter. This is a hard convention in Go. Functions that accept a context always put it first. Callers can pass context.Background() for top-level calls or a request-scoped context in HTTP handlers. The if err != nil block returns the error immediately. This makes the unhappy path visible. The community accepts the boilerplate because it prevents silent failures.
Wrap the pools. Expose methods. Hide the routing.
Pitfalls and runtime behavior
Replica lag is the biggest challenge. Most replicas apply changes asynchronously, so there is a delay between a write committing on the primary and the change appearing on the replica. If you write a user and immediately read it back from the replica, you might get stale data or no row at all. The user signed up, but the dashboard shows nothing. This breaks user experience.
Handle lag with read-your-writes consistency. After a write, route subsequent reads for that session to the primary for a short window. You can track this with a session cookie or a timestamp. Alternatively, use a library that supports session affinity. The compiler won't help you here. Lag is a runtime phenomenon. You must design for eventual consistency or implement fallback logic.
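One way to sketch the "primary for a short window" idea is a small tracker keyed by session ID. Everything here is an assumption for illustration: the StickyReads type, its method names, and the choice of an in-memory map are hypothetical, and a service running on several instances would need the timestamp in a cookie or shared store instead.

```go
package main

import (
	"sync"
	"time"
)

// StickyReads remembers when each session last wrote, so reads that follow a
// write can be sent to the primary. It is a hypothetical helper, not part of
// database/sql.
type StickyReads struct {
	mu        sync.Mutex
	window    time.Duration
	lastWrite map[string]time.Time
}

// NewStickyReads returns a tracker that pins reads to the primary for the
// given window after each write.
func NewStickyReads(window time.Duration) *StickyReads {
	return &StickyReads{window: window, lastWrite: make(map[string]time.Time)}
}

// MarkWrite records that the session just wrote to the primary.
func (s *StickyReads) MarkWrite(session string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.lastWrite[session] = time.Now()
}

// UsePrimary reports whether a read for this session should go to the primary.
func (s *StickyReads) UsePrimary(session string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()

	t, ok := s.lastWrite[session]
	if !ok {
		return false // no recent write: the replica is fine
	}
	if time.Since(t) < s.window {
		return true // still inside the window: read your own write
	}
	delete(s.lastWrite, session) // window elapsed; forget the session
	return false
}
```

A read method would then pick db.write when UsePrimary returns true and db.read otherwise, and every write method would call MarkWrite after a successful commit. The window should be longer than the replica lag you normally observe. Entries for sessions that never read again stay in the map, so a long-running service would also want periodic cleanup.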
Replica failure is another risk. If the replica goes down, reads fail. You might want to fall back to the primary. But the primary is busy handling writes. Falling back can overload the primary. Implement a circuit breaker. If the replica fails, route reads to the primary but limit the rate. Or return a cached response. Monitor replica health and alert on lag.
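The circuit breaker mentioned above can be very small. The sketch below is one possible shape, assuming a consecutive-failure threshold and a fixed cooldown; the Breaker type and its methods are hypothetical names, and it does not rate-limit the fallback traffic, which you would still need to add.

```go
package main

import (
	"sync"
	"time"
)

// Breaker is a deliberately small circuit breaker for the replica pool.
// After threshold consecutive failures it stays open for cooldown, during
// which reads should skip the replica.
type Breaker struct {
	mu        sync.Mutex
	threshold int
	cooldown  time.Duration
	failures  int
	openUntil time.Time
}

// NewBreaker returns a closed breaker with the given settings.
func NewBreaker(threshold int, cooldown time.Duration) *Breaker {
	return &Breaker{threshold: threshold, cooldown: cooldown}
}

// ReplicaAllowed reports whether reads may currently be sent to the replica.
func (b *Breaker) ReplicaAllowed() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return !time.Now().Before(b.openUntil)
}

// RecordSuccess resets the consecutive-failure count.
func (b *Breaker) RecordSuccess() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.failures = 0
}

// RecordFailure counts a failed replica query and opens the breaker once the
// threshold is reached.
func (b *Breaker) RecordFailure() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.failures++
	if b.failures >= b.threshold {
		b.openUntil = time.Now().Add(b.cooldown)
		b.failures = 0
	}
}
```

A read path would check ReplicaAllowed before querying the replica, call RecordSuccess or RecordFailure with the outcome, and fall back to the primary or a cache while the breaker is open. Count only infrastructure errors as failures. A result such as sql.ErrNoRows means the replica answered and should be recorded as a success.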
Cross-pool transactions are impossible. You cannot start a transaction on writeDB and query readDB in the same transaction. A *sql.Tx is pinned to one connection on one server, so the replica never takes part in it. If you try to pass a transaction object to a function expecting a pool, the compiler complains with an error along the lines of cannot use tx (variable of type *sql.Tx) as *sql.DB value in argument. Design each operation to fit within a single pool. If you need atomicity across reads and writes, use the primary for both.
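For the "use the primary for both" case, a minimal sketch looks like the following. The function renameUser, the demo helper, and stubConnector are hypothetical. The SQL assumes PostgreSQL and the users table from the earlier examples. The stub only lets the sketch run without a database, where it fails at BeginTx as expected.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
)

// stubConnector is a hypothetical stand-in for a real driver connector.
type stubConnector struct{}

func (stubConnector) Connect(context.Context) (driver.Conn, error) {
	return nil, errors.New("stub connector: no database available")
}

func (stubConnector) Driver() driver.Driver { return nil }

// renameUser reads the current name and updates it atomically. Both
// statements run on the same *sql.Tx, which is pinned to one connection
// from the primary pool, so the replica is never involved.
func renameUser(ctx context.Context, primary *sql.DB, id int, newName string) (string, error) {
	tx, err := primary.BeginTx(ctx, nil)
	if err != nil {
		return "", err
	}
	// After a successful Commit, Rollback returns sql.ErrTxDone, which is ignored.
	defer tx.Rollback()

	var old string
	err = tx.QueryRowContext(ctx,
		"SELECT name FROM users WHERE id = $1 FOR UPDATE", id).Scan(&old)
	if err != nil {
		return "", err
	}
	if _, err := tx.ExecContext(ctx,
		"UPDATE users SET name = $1 WHERE id = $2", newName, id); err != nil {
		return "", err
	}
	return old, tx.Commit()
}

// demoWithoutDatabase runs renameUser against the stub pool. It returns the
// connection error, showing that failures surface at BeginTx.
func demoWithoutDatabase() error {
	db := sql.OpenDB(stubConnector{})
	defer db.Close()

	_, err := renameUser(context.Background(), db, 1, "Bob")
	return err
}
```

On the DB wrapper from the service layer, this would be a method that passes db.write as the pool. The read inside the transaction costs the primary a little extra work, which is the price of atomicity.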
Trust the type system. Route by pool, not by query string.
When to use write splitting
Use separate pools when you have distinct primary and replica endpoints and need to offload reads. Use a single pool when your database handles routing internally via connection parameters or when the complexity of routing outweighs the benefit. Use a proxy such as Pgpool-II when you want to centralize read/write routing outside the application; PgBouncer pools connections but does not split reads from writes on its own. Use read-from-primary fallback when replica lag causes user-visible inconsistencies and you can tolerate higher primary load. Use a single pool for development to avoid infrastructure complexity and simplify debugging.
Route reads to replicas. Route writes to primary. Keep it simple.