gRPC Load Balancing in Go

When one backend isn't enough

You deploy a Go service that calls a gRPC microservice for order processing. It works perfectly in development against a single instance. In staging, you scale the order service to three containers behind a DNS name. Suddenly, half your requests hit a container that is restarting. The other half queue up on the remaining two. Latency spikes. Timeouts fire. You need a way to spread the traffic evenly and keep your application responsive when one node stumbles.

How client-side load balancing works

Load balancing distributes incoming requests across multiple servers. Server-side load balancers like nginx or Envoy sit in front of your backends and route traffic before it reaches your application. Client-side load balancing lives inside your code. The client decides which server to talk to before sending the request.

The Go standard library knows nothing about gRPC, and the official google.golang.org/grpc package does not spread traffic by default. Its default policy, pick_first, connects to the first address that works and sends every RPC there. The package also ships a round_robin policy, but you have to opt in through the service config and give the client a resolver that returns more than one address. If you want distribution without that machinery, you either place a proxy in front of your services or write the selection logic yourself. Writing it yourself gives you full visibility into failure modes and connection lifecycle.

Client-side balancing typically uses one of three strategies. Round-robin cycles through a list of addresses in order. Least-connections sends requests to the server with the fewest active streams. Random picks an address without tracking state. Round-robin is the simplest to implement and works well when all backends have similar capacity and latency.
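The three strategies can be sketched as plain selection functions over a slice of backends. This is a minimal illustration, not part of any gRPC API: the backend type, its active counter, and the pick function names are all assumptions made up for the example.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
)

// backend is a hypothetical record for one server.
// active counts the requests currently in flight to it.
type backend struct {
	addr   string
	active int64
}

// next is the shared round-robin cursor.
var next uint64

// pickRoundRobin returns backends in order, wrapping at the end of the slice.
func pickRoundRobin(bs []*backend) *backend {
	n := atomic.AddUint64(&next, 1) - 1
	return bs[n%uint64(len(bs))]
}

// pickLeastConn returns the backend with the fewest in-flight requests.
func pickLeastConn(bs []*backend) *backend {
	best := bs[0]
	for _, b := range bs[1:] {
		if atomic.LoadInt64(&b.active) < atomic.LoadInt64(&best.active) {
			best = b
		}
	}
	return best
}

// pickRandom returns a uniformly random backend and keeps no state.
func pickRandom(bs []*backend) *backend {
	return bs[rand.Intn(len(bs))]
}

func main() {
	bs := []*backend{
		{addr: "addr1:50051", active: 4},
		{addr: "addr2:50051", active: 1},
		{addr: "addr3:50051", active: 7},
	}
	fmt.Println(pickRoundRobin(bs).addr, pickRoundRobin(bs).addr)
	fmt.Println(pickLeastConn(bs).addr)
	fmt.Println(pickRandom(bs).addr)
}
```

Least-connections is the only one of the three that needs the caller to update state (the active counter) when a request starts and finishes, which is part of why round-robin is usually the first choice.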

The round-robin dialer

The official gRPC package for Go exposes a dialer hook, grpc.WithContextDialer, that lets you intercept the TCP connection step. You can use an atomic counter to cycle through a slice of addresses. The counter increments safely across goroutines, and the modulo operator wraps the index back to zero when it reaches the end of the list.

package main

import (
	"context"
	"net"
	"sync/atomic"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// serverIndex counts dial attempts; modulo len(servers) it selects the next backend.
var serverIndex uint64

// servers holds the list of gRPC backend addresses.
var servers = []string{"addr1:50051", "addr2:50051", "addr3:50051"}

// DialBackend selects the next server in the list and opens a TCP connection.
func DialBackend(ctx context.Context, addr string) (net.Conn, error) {
	// Bump the counter atomically so concurrent dials never share a counter value.
	// Subtracting one makes the first dial start at servers[0].
	idx := (atomic.AddUint64(&serverIndex, 1) - 1) % uint64(len(servers))

	// Cap each dial at five seconds and honor cancellation of ctx.
	d := net.Dialer{Timeout: 5 * time.Second}
	return d.DialContext(ctx, "tcp", servers[idx])
}

func main() {
	// Pass the custom dialer to the gRPC client configuration.
	conn, err := grpc.DialContext(
		context.Background(),
		"order-backends", // Placeholder target; the dialer picks the real address.
		grpc.WithContextDialer(DialBackend),
		grpc.WithTransportCredentials(insecure.NewCredentials()), // Skip TLS for local testing.
	)
	if err != nil {
		// Handle connection failure immediately.
		panic(err)
	}
	defer conn.Close()
	// Use conn to create a gRPC client stub.
}

The function passed to grpc.WithContextDialer must have the signature func(context.Context, string) (net.Conn, error). The addr parameter is ignored here because the dialer chooses its own target. The atomic increment guarantees that two goroutines calling DialBackend at the same instant receive different counter values, so back-to-back dials land on different backends whenever the list holds more than one address.
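The guarantee that the atomic counter provides can be checked without any network at all. The sketch below is an illustration only: countPicks is a hypothetical helper that runs many selections concurrently and tallies how often each index is chosen.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countPicks runs `calls` concurrent selections over n backends and
// returns how many times each index was chosen.
func countPicks(n, calls int) []int64 {
	var cursor uint64
	hits := make([]int64, n)
	var wg sync.WaitGroup
	for i := 0; i < calls; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Every goroutine receives a unique counter value.
			idx := (atomic.AddUint64(&cursor, 1) - 1) % uint64(n)
			atomic.AddInt64(&hits[idx], 1)
		}()
	}
	wg.Wait()
	return hits
}

func main() {
	fmt.Println(countPicks(3, 300)) // prints [100 100 100]
}
```

Because each counter value from 0 to 299 is handed out exactly once, the tally is even no matter how the goroutines interleave. A plain serverIndex++ without the atomic package would lose updates under the same load and would be flagged by the race detector.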

Walk through what happens

When grpc.DialContext runs without grpc.WithBlock, it returns before any network connection exists. It sets up the channel and connects in the background. The first connection attempt calls the dialer: the atomic counter increments, the modulo operation picks an index, and the dialer opens a TCP socket to that address. HTTP/2 multiplexing takes over from there. Every RPC on this ClientConn then shares that one connection instead of dialing again, so the dialer spreads connections across backends, not individual requests. It runs again only when the connection breaks and gRPC reconnects.

The counter advances only when a dial happens, regardless of backend health. If backend two crashes, RPCs in flight on a connection to it fail with an error, typically a codes.Unavailable status, and your code decides whether to retry or return the failure. On reconnect the dialer moves on to the next address, but the rotation comes back around to the dead backend later. The dialer itself does not track health. It only distributes. This separation keeps the routing logic fast and predictable. You add health checking and retry logic at a higher level.

Realistic example: adding fallback and context

Production code needs context propagation and explicit error handling. gRPC passes a context.Context to the dialer so that it can cancel a hanging connection attempt, and the dialer should hand that context to whatever does the actual dialing. Error handling follows the standard Go pattern: check immediately, wrap if needed, return early.

// DialWithFallback wraps the round-robin dialer with a single fallback attempt.
func DialWithFallback(ctx context.Context, addr string) (net.Conn, error) {
	// Attempt the primary round-robin dial.
	conn, err := DialBackend(ctx, addr)
	if err == nil {
		return conn, nil
	}

	// Stop here if the caller has already given up.
	if ctx.Err() != nil {
		return nil, ctx.Err()
	}

	// Fall back to the first server if the selected backend is unreachable.
	// This keeps a single dead node from blocking the entire request flow,
	// unless the dead node is servers[0] itself.
	d := net.Dialer{Timeout: 3 * time.Second}
	return d.DialContext(ctx, "tcp", servers[0])
}

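The fallback pattern is simple, and it covers a single unreachable backend without requiring a full service mesh. It does not help when servers[0] is the node that is down. You still need to respect context cancellation. net.DialTimeout takes no context and cannot be cancelled early, whereas a net.Dialer with DialContext stops as soon as the context is done. If the context expires while such a dial is waiting, the dial returns a timeout error. gRPC records the failed connection attempt and surfaces it to your handler, usually as an Unavailable status. You log it, update your metrics, and move on.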

Convention aside: context.Context always goes as the first parameter, conventionally named ctx. Functions that accept a context should respect cancellation and deadlines. The compiler will not enforce this, but the community treats it as a baseline expectation. Pass it through every long-lived call site.

Pitfalls and compiler traps

Manual load balancing introduces specific failure modes. The most common is the connection storm. If you call grpc.DialContext on every request instead of reusing one client connection, you exhaust file descriptors and overwhelm the backends. The call returns a *grpc.ClientConn that manages its own HTTP/2 connections and is safe for concurrent use. Create it once and reuse it across your service lifecycle.

Stale backends are another trap. Round-robin does not know if a server is down. With three backends, roughly one in three dial attempts will target the dead node until you remove it from the servers slice. The compiler will not warn you about runtime availability. You need a health check loop or an external service discovery system to keep the list accurate.

Signature mismatches trigger immediate compile failures. If you forget the context parameter or swap the return types, the compiler rejects the program with an error along the lines of cannot use DialBackend (value of type func(addr string) (net.Conn, error)) as func(context.Context, string) (net.Conn, error) value in argument to grpc.WithContextDialer. The exact wording varies between Go versions. Fix the signature to match the function type the option expects. The error message is verbose but precise. Read it once, adjust the function header, and move forward.

Goroutine leaks happen when the dialer blocks forever on a network partition. Always set a dial timeout or pass a context with a deadline. The worst goroutine bug is the one that never logs. Add a short timeout and a fallback path, and trust the timeout.

Decision: when to use this vs alternatives

Use a custom dialer when you need simple round-robin distribution of connections without external dependencies and want full control over the connection lifecycle. Use the built-in round_robin policy when you want the official gRPC library to keep a connection to every resolved address, spread individual RPCs across them, and react to address updates from the resolver. Health checking is available there too, but it has to be enabled separately. Use a sidecar proxy like Envoy when you need advanced routing, retries, circuit breaking, or TLS termination outside your application code. Use DNS-based load balancing when your infrastructure already rotates addresses and you do not mind the DNS cache delay.
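Opting in to the built-in round_robin policy comes down to a short service config string. The sketch below only builds and sanity-checks that string with the standard library, so it runs without the gRPC module. policyNames is a hypothetical helper, and orders.internal in the comment is a placeholder DNS name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundRobinConfig is a gRPC service config that selects the round_robin policy.
const roundRobinConfig = `{"loadBalancingConfig": [{"round_robin": {}}]}`

// policyNames lists the load balancing policies named in a service config,
// which is a cheap way to catch typos before the string reaches gRPC.
func policyNames(cfg string) ([]string, error) {
	var parsed struct {
		LoadBalancingConfig []map[string]json.RawMessage `json:"loadBalancingConfig"`
	}
	if err := json.Unmarshal([]byte(cfg), &parsed); err != nil {
		return nil, err
	}
	var names []string
	for _, entry := range parsed.LoadBalancingConfig {
		for name := range entry {
			names = append(names, name)
		}
	}
	return names, nil
}

func main() {
	names, err := policyNames(roundRobinConfig)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // prints [round_robin]

	// With google.golang.org/grpc imported, the config would be applied
	// roughly like this:
	//
	//   conn, err := grpc.DialContext(
	//       context.Background(),
	//       "dns:///orders.internal:50051",
	//       grpc.WithDefaultServiceConfig(roundRobinConfig),
	//       grpc.WithTransportCredentials(insecure.NewCredentials()),
	//   )
}
```

The dns:/// scheme matters here. round_robin can only rotate across the addresses the resolver returns, so a target that resolves to a single address leaves it with nothing to balance.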

Where to go next

Out of the box, neither Go's standard library nor the default gRPC client configuration splits traffic across multiple gRPC servers. The custom dialer is a small piece of code that picks a server address from a list every time a new connection is made, similar to a traffic cop directing cars to different lanes to prevent congestion.