The N+1 trap
You build a GraphQL endpoint to list users and their recent posts. The query looks simple. You write a resolver that fetches a user, then loops through their posts to fetch details. It works in development with three users. You deploy to staging with fifty users and the database connection pool exhausts itself in seconds. The latency spikes from 20 milliseconds to four seconds. You just hit the N+1 query problem.
The N+1 problem happens when you execute one query to get a list of items, then N additional queries to fetch related data for each item. If you fetch ten users and then query the database for each user's profile image, you run eleven queries total. GraphQL resolvers execute independently, which makes this pattern easy to trigger by accident. Each resolver asks for what it needs without knowing what its neighbors are doing.
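The arithmetic can be checked with a small sketch that counts queries instead of running them. Everything here is hypothetical: fakeDB is an in-memory stand-in for a real database, and its three methods each count as one round trip.

```go
package main

// fakeDB is a hypothetical in-memory stand-in for a database.
// It counts every simulated query it receives.
type fakeDB struct {
	queries int
	images  map[string]string // user ID -> profile image
}

// listUserIDs simulates the one query that fetches the list.
func (db *fakeDB) listUserIDs() []string {
	db.queries++
	ids := make([]string, 0, len(db.images))
	for id := range db.images {
		ids = append(ids, id)
	}
	return ids
}

// imageFor simulates one query for a single user.
func (db *fakeDB) imageFor(id string) string {
	db.queries++
	return db.images[id]
}

// imagesFor simulates one batched query covering many users.
func (db *fakeDB) imagesFor(ids []string) map[string]string {
	db.queries++
	out := make(map[string]string, len(ids))
	for _, id := range ids {
		out[id] = db.images[id]
	}
	return out
}

// naive runs the list query plus one query per user and
// returns the total number of queries issued.
func naive(db *fakeDB) int {
	for _, id := range db.listUserIDs() {
		_ = db.imageFor(id)
	}
	return db.queries
}

// batched runs the list query plus one batched query and
// returns the total number of queries issued.
func batched(db *fakeDB) int {
	_ = db.imagesFor(db.listUserIDs())
	return db.queries
}
```

With ten users, naive issues eleven queries and batched issues two, and the gap widens by one query for every additional user.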
The solution is batching. A data loader collects all the IDs requested during a single request cycle, groups them, and executes one aggregated query. It also caches results so the same ID isn't fetched twice. Batching turns N+1 round trips into two: one for the list and one for all the related rows.
Batching and caching
Think of a data loader like a grocery delivery service. If ten neighbors each call the store to order milk, the store makes ten deliveries. If the neighbors use a delivery service, the service collects all the milk requests, makes one trip to the store, and distributes the milk. The store handles one transaction instead of ten. The neighbors get their milk faster, and the store stays efficient.
In Go, a data loader lives in the request scope. It attaches to the context.Context so every resolver can access it. When a resolver needs data, it asks the loader. If the loader has already fetched that key during this request, it returns the cached result immediately. Otherwise it queues the key. Once enough keys are queued, or a short wait elapses, the loader triggers a batch function that fetches all the data in one shot.
Caching and batching work together. Batching reduces the number of queries. Caching prevents redundant work when multiple fields reference the same entity. A data loader implements both. The cache lives for the request, not the process. When the request finishes, the loader and its cache are discarded.
Minimal loader
Here's the skeleton of a data loader: it checks a cache, then delegates to a batch function that accepts multiple keys.
// DataLoader batches ID requests into a single database call.
// This minimal version is not safe for concurrent use.
type DataLoader struct {
	// batchFunc executes a query for a slice of keys.
	batchFunc func(ctx context.Context, keys []string) (map[string]any, error)
	cache     map[string]any
}

// Load retrieves a value by key, triggering a batch if necessary.
func (dl *DataLoader) Load(ctx context.Context, key string) (any, error) {
	// Return cached result to avoid redundant work.
	if val, ok := dl.cache[key]; ok {
		return val, nil
	}
	// In a real implementation, this queues the key for batch processing.
	// This simplified version calls the batch function immediately.
	results, err := dl.batchFunc(ctx, []string{key})
	if err != nil {
		return nil, err
	}
	// Merge into the cache instead of replacing it, so earlier entries survive.
	if dl.cache == nil {
		dl.cache = make(map[string]any)
	}
	for k, v := range results {
		dl.cache[k] = v
	}
	// A key missing from results yields nil here.
	return results[key], nil
}
The Load method is the entry point. It checks the cache first. If the key exists, it returns the value. If the key is missing, it calls the batch function. In this minimal example, the batch function runs immediately with a single key. A production loader queues keys and waits for a threshold or a timer before calling the batch function. This allows multiple resolvers to fire concurrently and get grouped into one batch.
How the batch coordinates
Resolvers run concurrently, so several of them can call the loader at the same moment. The loader guards its pending keys with a mutex or funnels them through a channel, then waits for a timer (a time.Timer or time.Ticker) or a count threshold before dispatching. This ensures that even if resolvers fire at slightly different times, their keys end up in the same batch.
The batch function receives a slice of keys. It builds a single query using an IN clause. The query fetches all rows in one round trip. The results are mapped back to the keys. The loader populates the cache and returns the values to the waiting resolvers.
Here's a batch function that fetches users by ID:
// UserBatchFunc fetches users by IDs in a single query.
// It assumes db is a package-level *sql.DB and a driver that uses ? placeholders.
func UserBatchFunc(ctx context.Context, ids []string) (map[string]any, error) {
	// An empty IN () list is invalid SQL, so return early.
	if len(ids) == 0 {
		return map[string]any{}, nil
	}
	// Build placeholders for the IN clause.
	placeholders := make([]string, len(ids))
	args := make([]any, len(ids))
	for i, id := range ids {
		placeholders[i] = "?"
		args[i] = id
	}
	// Query: SELECT id, name FROM users WHERE id IN (?, ?, ?)
	query := fmt.Sprintf("SELECT id, name FROM users WHERE id IN (%s)", strings.Join(placeholders, ","))
	rows, err := db.QueryContext(ctx, query, args...)
	if err != nil {
		return nil, fmt.Errorf("batch query failed: %w", err)
	}
	defer rows.Close()
	// Map results by ID. Values are *User, stored as any to match DataLoader.batchFunc.
	results := make(map[string]any)
	for rows.Next() {
		var u User
		if err := rows.Scan(&u.ID, &u.Name); err != nil {
			return nil, fmt.Errorf("scan user: %w", err)
		}
		results[u.ID] = &u
	}
	return results, rows.Err()
}
The batch function uses db.QueryContext to respect cancellation. If the request times out, the context is cancelled and the call returns with an error; whether the query is also aborted on the server depends on the driver. Always pass ctx to database calls. db.Query runs with a background context that is never cancelled, so the query might keep running after the response is sent, wasting resources. The function returns a map keyed by ID. This allows the loader to match results back to the original requests.
Realistic resolver
In a GraphQL server, resolvers access the loader from the context. The loader is attached to the request context at the start of the request. This ensures isolation between requests.
Here's how a resolver uses the loader:
// Author resolves the author field for a post.
func (r *PostResolver) Author(ctx context.Context, obj *Post) (*User, error) {
	loader, ok := ctx.Value("userLoader").(*DataLoader)
	if !ok {
		return nil, errors.New("user loader missing from context")
	}
	// Load returns any, so assert the concrete type.
	val, err := loader.Load(ctx, obj.AuthorID)
	if err != nil {
		return nil, err
	}
	user, _ := val.(*User)
	return user, nil
}
The resolver calls Load with the author ID. The loader handles caching and batching. If another post in the same request references the same author, the loader returns the cached user without hitting the database. If the author hasn't been fetched yet, the loader queues the ID. When the batch triggers, all pending author IDs are fetched in one query.
Convention aside: context.Context always goes as the first parameter, conventionally named ctx. Functions that take a context should respect cancellation and deadlines. The receiver name is usually one or two letters matching the type: (r *PostResolver), not (this *PostResolver). Exported names start with a capital letter. Unexported names start lowercase.
Pitfalls and errors
Sharing a data loader across requests leaks memory and returns stale data. The cache must be scoped to the request lifecycle. Create a new loader for each request. Attach it to the context. Discard it when the request ends.
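If you pass context.Background() instead of the request context, the batch query ignores cancellation and can outlive the request. Dropping the ctx parameter while the body still uses it fails to compile with undefined: ctx. A panic in the batch function fails the request at best; if it happens in a goroutine the loader started and nothing recovers it, the whole process exits. Return errors instead of panicking, and wrap them. fmt.Errorf("batch load failed: %w", err) is the standard pattern. The %w verb wraps the error so callers can inspect it with errors.Is or errors.As.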
Partial failures require care. If the batch query fails for one ID, the batch function should return a map of errors or a map of results with nils. The loader needs to handle this. If the whole batch fails, all resolvers get an error. If one ID is missing, the loader returns nil for that key. Don't return a partial map without indicating which keys failed.
Goroutine leaks happen when the goroutine waits on a channel that never gets closed. Always have a cancellation path. If the loader uses a ticker or a channel, stop it when the request ends. The worst goroutine bug is the one that never logs.
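If you mix up the argument order, the compiler reports something like cannot use loader (variable of type *DataLoader) as context.Context value in argument, because *DataLoader does not implement context.Context. Double-check that ctx is first. Capturing a loop variable in a closure is a different kind of problem: the compiler accepts it, and go vet is the tool that reports loop variable i captured by func literal. Before Go 1.22 every iteration shared one variable, so goroutines started in a loop could all observe the last value; the fix was to copy the variable (i := i) or pass it as an argument. Since Go 1.22 each iteration gets its own variable, so the capture is safe in modules that declare go 1.22 or later.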
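Passing the loop variable as an argument sidesteps the capture question on any Go version. The helper below, collect, is a hypothetical example of that pattern.

```go
package main

import "sync"

// collect starts one goroutine per key and passes the key as an argument,
// so each goroutine works on its own copy regardless of Go version.
func collect(keys []string) map[string]bool {
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		seen = make(map[string]bool)
	)
	for _, k := range keys {
		wg.Add(1)
		go func(key string) {
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock()
			seen[key] = true
		}(k)
	}
	wg.Wait()
	return seen
}
```

Every key ends up in the map exactly once, because no goroutine reads a variable that the loop goes on to overwrite.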
Decision matrix
Use a data loader when your GraphQL schema has nested relationships that trigger multiple database lookups per request. Use a single eager join when the data fits in one query and the result set is small. Use a cache layer like Redis when you need to share data across requests and the data changes infrequently. Use separate resolvers without batching only for prototyping or when the cost of a single query is negligible.
Batching turns N+1 round trips into two. The cache lives for the request, not the process. Context flows down, errors flow up.