How to Implement Rate Limiting in a Go HTTP Server

The gatekeeper pattern

Your API is live. Traffic is steady. Then a script starts hammering your login endpoint. The database groans. Response times spike. Legitimate users get timeouts. You need a gatekeeper that says "slow down" without crashing the whole server. Rate limiting is that gatekeeper. It controls the flow of requests to protect backend resources and ensure fair usage.

Go does not include rate limiting in the standard library. The community standard is golang.org/x/time/rate. This package implements the token bucket algorithm. It is lightweight, thread-safe, and easy to compose with net/http.

Token buckets and burst capacity

The token bucket algorithm balances steady limits with short bursts. Imagine a bucket that fills with tokens at a constant rate until it reaches capacity. Each request must take one token to pass. If the bucket is empty, the request is rejected or made to wait. Because unused tokens accumulate up to the capacity, a sudden spike of requests can pass when the bucket is full, while the refill rate enforces the long-term average.
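To make the algorithm concrete, here is a minimal sketch using only the standard library. The tokenBucket type and its methods are hypothetical names for illustration; this is not how golang.org/x/time/rate is implemented, and it is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"time"
)

// tokenBucket is a hypothetical, single-goroutine illustration of the
// algorithm. It is not the implementation used by golang.org/x/time/rate.
type tokenBucket struct {
	tokens   float64   // tokens currently available
	capacity float64   // burst size: the most the bucket can hold
	perSec   float64   // refill rate in tokens per second
	last     time.Time // when tokens was last brought up to date
}

// newTokenBucket returns a full bucket, so an initial burst can pass at once.
func newTokenBucket(perSec, capacity float64, now time.Time) *tokenBucket {
	return &tokenBucket{tokens: capacity, capacity: capacity, perSec: perSec, last: now}
}

// allow refills the bucket for the time elapsed since the last call,
// then spends one token if one is available.
func (b *tokenBucket) allow(now time.Time) bool {
	elapsed := now.Sub(b.last).Seconds()
	b.last = now
	b.tokens += elapsed * b.perSec
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// simulate fires five back-to-back requests at a bucket with a rate of one
// token per second and a capacity of three, then one more two seconds later.
func simulate() []bool {
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	b := newTokenBucket(1, 3, start)
	var results []bool
	for i := 0; i < 5; i++ {
		results = append(results, b.allow(start))
	}
	results = append(results, b.allow(start.Add(2*time.Second)))
	return results
}

func main() {
	fmt.Println(simulate()) // [true true true false false true]
}
```

The first three requests spend the initial burst, the next two find the bucket empty, and the last one succeeds because two seconds of refill have accumulated.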

The rate package exposes two key concepts. rate.Limit is a type representing the refill speed. It is a float64 value meaning tokens per second. rate.Every is a helper that converts the interval between tokens into a limit. rate.Every(time.Second) returns a limit of 1.0. rate.Every(200 * time.Millisecond) returns 5.0. The second parameter to NewLimiter is the burst size. This is the maximum number of tokens the bucket can hold. A burst of 10 means the first ten requests pass instantly, because the bucket starts full. The eleventh needs a refill: Allow() rejects it, while Wait() blocks until a token arrives.

Token buckets smooth bursts. Don't treat them as hard walls.

Minimal global limiter

Here's the simplest setup: create a limiter, check it in the handler, reject if empty.

package main

import (
	"log"
	"net/http"
	"time"

	"golang.org/x/time/rate"
)

// main starts the server with a global rate limiter.
func main() {
	// refill rate: one token per second.
	// burst size: ten tokens max.
	limiter := rate.NewLimiter(rate.Every(time.Second), 10)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// check if a token is available without blocking.
		if !limiter.Allow() {
			http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
			return
		}
		w.Write([]byte("OK"))
	})

	// ListenAndServe only returns on failure, so surface the error.
	log.Fatal(http.ListenAndServe(":8080", nil))
}

rate.NewLimiter creates the bucket. The first argument sets the refill rate. The second sets the burst capacity. limiter.Allow() checks the bucket. It returns true and consumes a token if one exists. It returns false if the bucket is empty. This check is fast and thread-safe. Multiple goroutines can call Allow() concurrently without a mutex. The limiter handles internal synchronization.

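NewLimiter does not validate its arguments, and it neither panics nor returns an error. Bad values fail silently instead. A limit of zero never refills the bucket, so every request is rejected once the initial burst is spent. A burst of zero rejects every request unless the limit is rate.Inf. rate.Every returns rate.Inf for a zero or negative duration, which disables limiting entirely. The compiler does not catch any of this. Validate configuration before creating the limiter.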
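One way to validate configuration before constructing a limiter is sketched below. validateLimit is a hypothetical helper, not part of x/time/rate, and the rules it enforces (a finite positive rate, a burst of at least one) are assumptions about what a sensible configuration looks like.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// validateLimit is a hypothetical helper, not part of x/time/rate.
// It rejects settings that cannot describe a working limiter.
func validateLimit(rps float64, burst int) error {
	if math.IsNaN(rps) || math.IsInf(rps, 0) {
		return errors.New("rps must be a finite number")
	}
	if rps <= 0 {
		return fmt.Errorf("rps must be positive, got %v", rps)
	}
	if burst < 1 {
		return fmt.Errorf("burst must be at least 1, got %d", burst)
	}
	return nil
}

func main() {
	fmt.Println(validateLimit(5, 10)) // <nil>
	fmt.Println(validateLimit(0, 10)) // rps must be positive, got 0
	fmt.Println(validateLimit(5, 0))  // burst must be at least 1, got 0
}
```

Calling a check like this at startup turns a silent misconfiguration into an explicit error before the server accepts traffic.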

Per-client limiting with sync.Map

A global limiter protects the server but starves all users when one client misbehaves. Real applications need per-client limits. You can isolate abusive users from good ones by tracking limiters per IP address or user ID.

A map of limiters requires concurrent access. sync.Map is optimized for workloads where keys are written once and read many times. It avoids the contention of a manual mutex.

Define a store to hold the limiters.

// LimiterStore holds per-client rate limiters.
// sync.Map handles concurrent access without a manual mutex.
type LimiterStore struct {
	limiters sync.Map
	rps      float64
	burst    int
}

// NewLimiterStore creates a store with the given rate and burst settings.
func NewLimiterStore(rps float64, burst int) *LimiterStore {
	return &LimiterStore{
		rps:   rps,
		burst: burst,
	}
}

Implement the middleware that wraps handlers.

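// Middleware returns an http.Handler that enforces rate limits per client IP.
// It needs the "net" package in addition to the imports shown earlier.
func (s *LimiterStore) Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// RemoteAddr is "host:port". Key on the host so that every
		// connection from one client shares a limiter.
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr
		}

		// Load first so the common path does not allocate a limiter.
		// LoadOrStore then gets the existing limiter or stores the new one.
		val, ok := s.limiters.Load(ip)
		if !ok {
			val, _ = s.limiters.LoadOrStore(ip, rate.NewLimiter(rate.Limit(s.rps), s.burst))
		}
		limiter := val.(*rate.Limiter)

		if !limiter.Allow() {
			http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
			return
		}

		next.ServeHTTP(w, r)
	})
}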

LoadOrStore atomically returns the existing value for the key or stores the one you pass in. It returns the value and a boolean that is true when an existing value was loaded. Go evaluates arguments before the call, so the limiter passed to LoadOrStore is allocated even when the key already exists and the new value is discarded. The return type is any, so you must type assert to *rate.Limiter. The assertion is safe here because this code only ever stores *rate.Limiter values in the map. If a different type were stored, the single-value assertion would panic with an interface conversion error.

Middleware takes an http.Handler and returns an http.Handler. This func(http.Handler) http.Handler shape is the conventional middleware signature in Go. Returning the interface rather than a concrete type is deliberate: it lets middlewares wrap one another in any order.
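The composition this signature allows can be sketched as follows. The middleware type, the chain helper, and the tag middleware are hypothetical names introduced for illustration; only net/http and net/http/httptest are real APIs here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// middleware is the conventional shape: take a handler, return a handler.
type middleware func(http.Handler) http.Handler

// chain applies middlewares so that the first one listed runs outermost.
func chain(h http.Handler, mws ...middleware) http.Handler {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// tag is a stand-in middleware that records its name in a response header.
func tag(name string) middleware {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.Header().Add("X-Trace", name)
			next.ServeHTTP(w, r)
		})
	}
}

// trace runs one request through the chain and returns the order in which
// the middlewares executed.
func trace() []string {
	final := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("OK"))
	})
	h := chain(final, tag("limit"), tag("log"))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return rec.Header().Values("X-Trace")
}

func main() {
	fmt.Println(trace()) // [limit log]
}
```

A method value such as store.Middleware has the same shape, so it could be passed to a helper like chain directly. Placing the rate limiter outermost means rejected requests never reach the more expensive layers.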

Unbounded maps leak memory. Evict or bound the size.

Blocking versus rejecting

Allow() rejects requests immediately when the bucket is empty. Some use cases prefer to delay requests instead. Wait() blocks until a token is available. It takes a context.Context to handle cancellation.

Use Wait() when you want to queue requests rather than drop them. This is useful for batch jobs or when the client expects a response eventually.

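// WaitHandler blocks until a token is available or the context is cancelled.
// It needs the "net" package in addition to the imports shown earlier.
func (s *LimiterStore) WaitHandler(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Key on the host part of "host:port" so that all of a
		// client's connections share one limiter.
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr
		}
		val, _ := s.limiters.LoadOrStore(ip, rate.NewLimiter(rate.Limit(s.rps), s.burst))
		limiter := val.(*rate.Limiter)

		// Wait blocks until a token is available.
		// It uses the request context to handle client disconnects.
		if err := limiter.Wait(r.Context()); err != nil {
			// The client gave up, or the wait would outlast the context deadline.
			// Reply explicitly: returning without writing sends an empty 200 OK.
			http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
			return
		}

		next.ServeHTTP(w, r)
	})
}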

Wait() returns an error if the context is done before the token arrives. This happens if the client closes the connection while waiting. The error check follows the standard if err != nil pattern. The boilerplate makes the unhappy path visible.

By convention, context.Context is the first parameter of any function that supports cancellation. Wait() respects the context deadline. If the expected wait would run past the deadline, Wait() returns an error immediately instead of blocking until the deadline expires.
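The interaction between waiting and the context can be illustrated with a standard-library sketch. waitFor is a hypothetical stand-in for the waiting step of a limiter, with the required delay passed in directly; it mirrors the behavior described above but is not the rate package's code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// waitFor is a hypothetical stand-in for the waiting step of a limiter.
// delay is how long the caller must wait for its token.
func waitFor(ctx context.Context, delay time.Duration) error {
	// Fail fast: if the token cannot arrive before the deadline, do not wait.
	if deadline, ok := ctx.Deadline(); ok && time.Now().Add(delay).After(deadline) {
		return errors.New("wait would exceed context deadline")
	}
	timer := time.NewTimer(delay)
	defer timer.Stop()
	select {
	case <-timer.C:
		return nil // the token is available
	case <-ctx.Done():
		return ctx.Err() // the caller cancelled or timed out while waiting
	}
}

// demo reports whether a short wait succeeded and whether a wait longer
// than the deadline was refused.
func demo() (shortOK, longRefused bool) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	shortOK = waitFor(ctx, 10*time.Millisecond) == nil
	longRefused = waitFor(ctx, time.Hour) != nil
	return shortOK, longRefused
}

func main() {
	fmt.Println(demo()) // true true
}
```

The one-hour wait is refused at once rather than after the one-second deadline, which is why a request timeout on the context bounds how long a queued request can occupy a goroutine.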

Context is plumbing. Run it through every long-lived call site.

Pitfalls and compiler errors

Per-client limiters grow the map over time. If every IP gets a limiter, memory fills up. You need an eviction strategy. The standard library does not provide one. Add a background goroutine to sweep stale entries, or use a cache library with TTL support.
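A background sweep might look like the sketch below. It swaps sync.Map for a mutex-guarded map so that each entry can carry a last-seen timestamp. ttlStore, entry, get, and sweep are hypothetical names, the value type is generic so the example compiles with the standard library alone, and Go 1.18 or later is assumed.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry pairs a stored value with the last time its key was used.
// V would be *rate.Limiter in a real store; it is generic here so the
// sketch needs only the standard library.
type entry[V any] struct {
	value    V
	lastSeen time.Time
}

// ttlStore is a hypothetical mutex-guarded map with idle-based eviction.
type ttlStore[V any] struct {
	mu      sync.Mutex
	entries map[string]*entry[V]
	ttl     time.Duration
}

func newTTLStore[V any](ttl time.Duration) *ttlStore[V] {
	return &ttlStore[V]{entries: make(map[string]*entry[V]), ttl: ttl}
}

// get returns the value for key, creating it with mk on first use,
// and refreshes the key's last-seen time.
func (s *ttlStore[V]) get(key string, now time.Time, mk func() V) V {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.entries[key]
	if !ok {
		e = &entry[V]{value: mk()}
		s.entries[key] = e
	}
	e.lastSeen = now
	return e.value
}

// sweep deletes entries idle for longer than ttl and returns how many remain.
func (s *ttlStore[V]) sweep(now time.Time) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	for k, e := range s.entries {
		if now.Sub(e.lastSeen) > s.ttl {
			delete(s.entries, k)
		}
	}
	return len(s.entries)
}

// demo touches two clients a minute apart, then sweeps with a 90-second TTL.
func demo() int {
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	s := newTTLStore[int](90 * time.Second)
	s.get("10.0.0.1", start, func() int { return 1 })
	s.get("10.0.0.2", start.Add(time.Minute), func() int { return 2 })
	// At the two-minute mark the first client has been idle for 120s
	// and the second for 60s.
	return s.sweep(start.Add(2 * time.Minute))
}

func main() {
	fmt.Println(demo()) // 1
}
```

In a server, sweep would typically run from a goroutine driven by a time.Ticker. An evicted client simply gets a fresh, full bucket on its next request, so the TTL should be long enough that this does not hand out extra burst capacity.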

Single-value type assertions panic at runtime when the dynamic type does not match. LoadOrStore returns any. If you store the wrong type, the assertion fails. The compiler cannot check dynamic types. Trust the invariant or verify the type with a type switch.

val, _ := s.limiters.LoadOrStore(ip, rate.NewLimiter(rate.Limit(s.rps), s.burst))
switch v := val.(type) {
case *rate.Limiter:
	limiter = v
default:
	// handle unexpected type.
}

Forgetting to import golang.org/x/time/rate causes a compile error. The compiler rejects the program with undefined: rate. Run go get golang.org/x/time/rate to fetch the package. The x/ packages are supplementary. They are stable but not part of the standard library. They may change between minor versions, though x/time has been stable for years.

Type assertions panic at runtime. Trust the invariant or check the type.

When to use rate limiting

Use golang.org/x/time/rate when you need a standard token bucket with low overhead. Use a global limiter when you want to protect the entire server from a single flood, regardless of source. Use per-client limiters when you need to isolate abusive users from good ones. Use Allow() when you want to reject excess requests immediately. Use Wait() when you want to delay requests until capacity is available. Use a sliding window log when you need exact counts over a time window rather than a smoothed average. Use a third-party library like github.com/didip/tollbooth when you want configuration via headers or middleware composition out of the box. Use no rate limiting when the cost of a request is negligible and the risk of abuse is low.
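For comparison with the token bucket, a sliding window log can be sketched in a few lines. windowLog is a hypothetical type for illustration, it is not safe for concurrent use, and it stores one timestamp per accepted request, which is the memory cost of exact counting.

```go
package main

import (
	"fmt"
	"time"
)

// windowLog is a hypothetical single-goroutine sliding window log.
type windowLog struct {
	window time.Duration
	limit  int
	stamps []time.Time // timestamps of accepted requests, oldest first
}

// allow drops timestamps that have left the window, then accepts the
// request only if fewer than limit remain.
func (l *windowLog) allow(now time.Time) bool {
	cutoff := now.Add(-l.window)
	i := 0
	for i < len(l.stamps) && !l.stamps[i].After(cutoff) {
		i++
	}
	l.stamps = l.stamps[i:]
	if len(l.stamps) >= l.limit {
		return false
	}
	l.stamps = append(l.stamps, now)
	return true
}

// demo allows two requests per ten seconds and probes at 0s, 1s, 2s, and 11s.
func demo() []bool {
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	l := &windowLog{window: 10 * time.Second, limit: 2}
	var out []bool
	for _, sec := range []int{0, 1, 2, 11} {
		out = append(out, l.allow(start.Add(time.Duration(sec)*time.Second)))
	}
	return out
}

func main() {
	fmt.Println(demo()) // [true true false true]
}
```

Unlike a token bucket, this never admits more than limit requests in any window of the given length, but it pays for that precision with per-request state.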

Rate limiting is a trade-off. Protect the system, not just the code.

Recap

Rate limiting acts like a bouncer at a club, allowing only a specific number of people (requests) inside per minute. It protects your server from crashing when too many users try to access it at once. You use it to ensure fair usage and keep your application stable during traffic spikes.