How to Implement a Leaky Bucket Rate Limiter in Go

The database is screaming

You built an API endpoint. It fetches data, transforms it, and returns JSON. It works perfectly in development. You deploy it. Then a user writes a script that hits your endpoint 500 times a second. Your database connection pool exhausts itself. Latency spikes. Other users get timeouts. The system collapses under its own success.

You need a brake. You need a way to say, "I can handle 10 requests per second. Anything faster waits or gets rejected." This is rate limiting. It protects your backend from traffic spikes, whether those spikes come from a viral post, a misconfigured client, or a malicious bot.

The bucket metaphor

Rate limiting algorithms often use water metaphors, and the best known is the "leaky bucket." Imagine a bucket with a hole in the bottom. Water flows into the bucket at a variable rate; this represents incoming requests. Water leaks out of the hole at a constant rate; this represents your system's processing capacity.

If the water level rises above the rim, the bucket overflows. The excess water spills out. In code, the spilled water is a rejected request. The bucket size determines how much burst traffic you can absorb before you start dropping requests. The leak rate determines your steady-state throughput.

The golang.org/x/time/rate package implements this concept. Its import path starts with golang.org/x/ because it is maintained by the Go project but lives outside the standard library. Packages under x/ are developed under looser compatibility requirements than the standard library, so their APIs can evolve more freely than core packages. In practice, most Go projects treat x/time/rate as the standard choice.

Tokens versus leaks

Go's rate package uses a "token bucket" algorithm instead of a literal leaky bucket. The math is equivalent for rate limiting, but the mental model flips.

In a token bucket, tokens drop into the bucket at a fixed rate. The bucket has a maximum capacity. When a request arrives, it consumes a token. If no token is available, the request waits or gets rejected.

The leak rate in the bucket metaphor matches the token refill rate. The bucket capacity matches the burst size. Go's library tracks tokens, not water levels. You ask for a token. The library checks if one exists. If yes, it hands it over. If no, it calculates how long until the next token arrives.

Minimal example

The library exposes a Limiter struct. You create one with rate.NewLimiter, passing a refill rate (tokens per second) and a burst capacity. The Wait method blocks the current goroutine until a token is available or the context is done.

package main

import (
	"context"
	"fmt"

	"golang.org/x/time/rate"
)

func main() {
	// Refill rate is 2 tokens per second. Burst lets up to 4 tokens accumulate.
	// A new limiter starts with a full bucket, so the first 4 calls do not wait.
	limiter := rate.NewLimiter(rate.Limit(2), 4)

	// Wait blocks until a token is available or the context is done.
	if err := limiter.Wait(context.Background()); err != nil {
		fmt.Println("wait failed:", err)
		return
	}
	fmt.Println("First request processed")

	// This call also returns immediately because the burst still holds tokens.
	// Once the bucket is empty, each call waits about 500ms for a refill.
	if err := limiter.Wait(context.Background()); err != nil {
		fmt.Println("wait failed:", err)
		return
	}
	fmt.Println("Second request processed")
}

The rate.Limit type is a float64. It represents tokens per second. A value of 2 means two tokens arrive every second. A value of 0.5 means one token arrives every two seconds. The burst size is an integer. It caps the maximum tokens the bucket can hold.

How the limiter tracks time

The limiter does not run a background ticker or refill goroutine. It calculates availability on demand. When you call Wait, the library compares the current time with the last time it updated its state and computes how many tokens have accumulated since then, capped at the burst size. If at least one whole token is available, it grants the token and updates the state. If not, it reserves the next token, computes how long until that token arrives, and blocks for that duration.

This lazy evaluation means the limiter has no overhead when idle. No goroutine spins and no CPU is consumed between calls. When Wait does block, it parks the goroutine on a timer (or until the context is done), so a waiting goroutine costs a little memory but no CPU.

The burst size controls the grace period. A burst of 1 means the bucket holds only one token. After that token is used, each request must wait for the next refill. A burst of 100 allows a sudden spike of 100 requests to pass instantly, provided traffic has been quiet long enough for the bucket to fill. After such a spike, the bucket needs burst / rate seconds to refill completely: 100 tokens at 10 per second take 10 seconds.

Burst size is your shock absorber. Set it too low and legitimate traffic gets throttled. Set it too high and the limiter fails to protect your backend.

Realistic example: HTTP middleware

In a web server, you rarely want to block the request goroutine. Blocking holds the HTTP connection open. It consumes memory and file descriptors. If many requests wait, the server runs out of resources.

The better pattern for HTTP is to check availability immediately. If a token is available, proceed. If not, reject the request with a 429 Too Many Requests status. This fails fast: the client gets an immediate response and can retry later, and the handler returns at once instead of holding its goroutine and connection.

package main

import (
	"log"
	"net/http"

	"golang.org/x/time/rate"
)

// RateLimitMiddleware wraps an HTTP handler with rate limiting.
// It rejects requests with 429 when the limit is exceeded.
func RateLimitMiddleware(limiter *rate.Limiter, next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Allow checks whether a token is available right now and consumes it if so.
		// It never blocks; it returns false when the bucket is empty.
		if !limiter.Allow() {
			http.Error(w, "Too many requests", http.StatusTooManyRequests)
			return
		}
		// Token consumed. Pass the request to the real handler.
		next(w, r)
	}
}

func main() {
	// 5 requests per second, burst of 10. This single limiter is shared by
	// every client, so one noisy client can use up the budget for everyone.
	limiter := rate.NewLimiter(rate.Limit(5), 10)

	http.HandleFunc("/api/data", RateLimitMiddleware(limiter, func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("OK"))
	}))

	// ListenAndServe blocks and returns only on error, which log.Fatal reports.
	log.Fatal(http.ListenAndServe(":8080", nil))
}

The Allow method is safe for concurrent use, so a single Limiter can be shared across all HTTP handlers without a mutex. The limiter already guards its state with an internal mutex. Wrapping it in a second mutex out of habit only adds contention on the hot path. Trust the library.

The Reserve pattern

Sometimes you need more control than Allow or Wait. You might want to know exactly how long to wait before rejecting. Or you might want to reserve a token for a high-priority request while letting low-priority requests fail.

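The Reserve method returns a *Reservation. The reservation records when its token becomes available, so you can inspect the delay without blocking. Check its OK method first: it reports false when the limiter can never grant the request, for example when the burst is 0.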

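package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// 1 token per second, burst of 1.
	limiter := rate.NewLimiter(rate.Limit(1), 1)
	limiter.Allow() // Use up the only token so the next reservation must wait.

	// Reserve claims a token now and reports when it may be used.
	res := limiter.Reserve()
	if !res.OK() {
		// The limiter can never grant this request (for example, burst is 0).
		fmt.Println("Request cannot be satisfied; rejecting")
		return
	}

	// Delay reports how long to wait before acting on the reservation.
	// If the token is available now, Delay is zero.
	delay := res.Delay()

	const maxWait = 2 * time.Second
	if delay > maxWait {
		// Not proceeding: Cancel gives the reserved token back to the
		// limiter, as far as later reservations allow.
		res.Cancel()
		fmt.Printf("Would wait %v; rejecting\n", delay)
		return
	}

	fmt.Printf("Waiting %v for token\n", delay)
	time.Sleep(delay)
	fmt.Println("Request processed")
	// Do not call Cancel here: the token has been used.
}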

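The Reserve method is useful for implementing custom backoff strategies. You check the delay. If it exceeds a threshold, you call Cancel to return the token and reject the request. If it is small, you wait and then proceed. Skipping Cancel on rejection leaves the token consumed, so rejected requests would still eat into the budget. This gives you a sliding scale between "fail fast" and "wait forever".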

The DelayFrom method lets you check the delay relative to a specific time. This is helpful when you want to batch requests. You can reserve multiple tokens and calculate the latest delay. Then you sleep once for the whole batch.

Pitfalls and compiler errors

Rate limiters are simple, but they have traps. The first trap is blocking in a web handler. Using Wait in an HTTP handler ties up the goroutine. Under load, this causes the server to stall. Use Allow or Reserve for HTTP. Use Wait for background workers or CLI tools where blocking is acceptable.

The second trap is burst size confusion. A burst of 0 does not panic. NewLimiter accepts it and returns a limiter that rejects every event unless the limit is rate.Inf: Allow always returns false, and Wait returns an error immediately. The compiler cannot catch this, and nothing fails loudly at runtime either, so you must validate the configuration yourself.

// This compiles and does not panic, but the limiter
// rejects every request because burst is zero.
limiter := rate.NewLimiter(rate.Limit(1), 0)
fmt.Println(limiter.Allow()) // false

The third trap is context cancellation. Wait respects the context. If the context is cancelled, or if its deadline would pass before a token becomes available, Wait returns an error. You must handle this error. If you ignore it, your code proceeds without a token, defeating the limiter. Neither the compiler nor go vet flags an ignored error return by default; a third-party linter such as errcheck can. Always check the error.

The fourth trap is dynamic limits. The Limiter allows you to change the rate and burst at runtime using SetLimit and SetBurst. This is safe for concurrent use. However, changing the limit does not affect pending reservations. If a goroutine is already waiting, it continues to wait based on the old rate. This is usually fine, but it can surprise you during load testing.

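The compiler complains with undefined: rate if you forget to import the package, and with imported and not used if you import it but do not use it. If the module is missing from go.mod, the build fails with a "no required module provides package" error; run go get golang.org/x/time/rate to add it. Go enforces clean imports, which keeps your codebase tidy.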

Decision matrix

Rate limiting has several patterns. Pick the one that matches your constraints.

Use Wait when you have a background task that can block. Use Wait when the caller is a CLI tool or a worker goroutine that does not hold an HTTP connection. Use Wait when you want the simplest code and can tolerate blocking.

Use Allow when you need to fail fast. Use Allow in HTTP handlers to reject excess traffic immediately. Use Allow when you want to preserve server resources by not blocking goroutines. Use Allow when the request is stateless and can be retried later by the client.

Use Reserve when you need to inspect the delay before committing. Use Reserve when you want to implement custom backoff or batching logic. Use Reserve when you need to cancel a reservation and return the token to the pool.

Use a channel as a limiter when you want a pure standard library solution. A buffered channel with a ticker goroutine can act as a token bucket. Use a channel when you cannot add external dependencies. Use a channel when you need to integrate the limiter with other channel-based pipelines.

Where to go next

Rate limiting is one piece of the concurrency puzzle. The rate package handles the math. You handle the integration.

Rate limiting is a shield, not a sword. Protect your system, but do not block your users unnecessarily.
