How to Build a Rate Limiter Service in Go

The problem of too many visitors

A small API lives quietly on a single machine. It handles a few dozen requests per minute. Then a blog post goes viral, or a scraper starts hammering the endpoint. The database connection pool exhausts itself. Response times climb from milliseconds to seconds. The service crashes under its own weight.

You do not need a distributed load balancer to solve this. You need a gatekeeper. A rate limiter sits between the network and your business logic. It counts how many requests arrive and rejects the excess before they touch your database or your CPU. Go makes this straightforward because the extended golang.org/x/time/rate module (not part of the standard library itself, but maintained alongside it) provides a battle-tested implementation that handles the concurrency math for you.

How a token bucket actually works

Rate limiting algorithms vary, but the token bucket is one of the most practical for HTTP services. Imagine a bucket that holds tokens. Tokens drip in at a steady pace, and once the bucket is full, any extra spills over and is lost. Each request takes one token out. While tokens remain, requests flow freely. When the bucket runs dry, the next request waits for a new token or gets turned away.

The algorithm tracks two numbers. The refill rate determines how many tokens appear each second. The burst size determines the maximum capacity of the bucket. A high burst size with a low refill rate allows short spikes of traffic while enforcing a long-term average. A low burst size with a high refill rate enforces a strict, steady rhythm.

The golang.org/x/time/rate package implements this exact model. It is maintained by the Go team, handles concurrent access safely, and exposes a clean API. You do not need to write your own mutex logic or timer loops. The package computes refills from the elapsed time whenever you call it, so no background goroutine or ticker is involved.

Token buckets are predictable. They do not penalize steady traffic. They only catch bursts.

The simplest rate limiter

Here is the baseline setup. We create a limiter that refills two tokens per second and allows a burst of one. The HTTP handler checks the bucket before processing. If the bucket is empty, the handler returns a 429 Too Many Requests status.

package main

import (
	"log"
	"net/http"

	"golang.org/x/time/rate"
)

// NewLimiter creates a token bucket with a steady refill rate and a maximum burst capacity.
func NewLimiter(r rate.Limit, b int) *rate.Limiter {
	return rate.NewLimiter(r, b)
}

func main() {
	// Refill 2 tokens per second. A burst of 1 means at most one token is stored,
	// so admitted requests are spaced roughly 500ms apart.
	limiter := NewLimiter(rate.Limit(2), 1)

	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Allow returns true only if a token is available right now.
		if !limiter.Allow() {
			http.Error(w, "Too Many Requests", http.StatusTooManyRequests)
			return
		}
		w.Write([]byte("Hello, client!"))
	})

	// ListenAndServe blocks until the server fails; log.Fatal reports why
	// (for example, port 8080 already in use) instead of exiting silently.
	log.Fatal(http.ListenAndServe(":8080", mux))
}

Internally, rate.Limiter guards its state with a mutex. Every call to Allow() locks that mutex, updates the token count, and unlocks. This means you can safely share a single limiter across all HTTP handler goroutines without data races. net/http runs each request's handler in its own goroutine, but the limiter serializes the token check.
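One way to see why the mutex matters is to hammer a single shared bucket from many goroutines. The sketch below uses sharedBucket, a hypothetical stand-in for rate.Limiter that keeps only the locked check-and-decrement and omits refilling, so the count does not depend on timing. However the goroutines interleave, the number of grants never exceeds the tokens in the bucket, and running it with go run -race should report no race.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// sharedBucket is a hypothetical stand-in for rate.Limiter. Only the mutex
// around check-and-decrement is modeled; refill is left out on purpose.
type sharedBucket struct {
	mu     sync.Mutex
	tokens int
}

// Allow spends one token if any remain. The lock makes the check and the
// decrement one atomic step, so two goroutines cannot take the same token.
func (b *sharedBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.tokens > 0 {
		b.tokens--
		return true
	}
	return false
}

// hammer fires n concurrent Allow calls at one bucket and counts the grants.
func hammer(b *sharedBucket, n int) int64 {
	var granted atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if b.Allow() {
				granted.Add(1)
			}
		}()
	}
	wg.Wait()
	return granted.Load()
}

func main() {
	fmt.Println(hammer(&sharedBucket{tokens: 5}, 100)) // 5
}
```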

When a request arrives, Allow() calculates how many tokens have accumulated since the last update, capped at the burst size. Tokens accrue fractionally, so 300 milliseconds at 2 tokens per second adds 0.6 of a token. If at least one whole token is available, Allow() subtracts one and returns true. Otherwise it returns false immediately. No blocking occurs, and the handler answers with 429 right away.

This approach works for simple endpoints. It drops excess traffic instantly. Dropped requests do not queue. They fail fast.

Fast failures protect your backend. Dropped requests are cheaper than queued ones.

Adding precision and state

Instant rejection works for public endpoints. Internal services often need more control. You might want to pause a request until a token becomes available, or you might want to enforce a deadline so the request does not hang forever. The Wait() method handles both cases.

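Wait() blocks the calling goroutine until a token is available or the context is done. It returns an error if the context is canceled or its deadline passes, and it fails fast, without sleeping at all, when it can already tell that the next token will arrive after the deadline. It also returns an error immediately if the burst size is zero (unless the rate is rate.Inf), because a bucket that holds no tokens can never satisfy a request. This turns the rate limiter into a flow control valve rather than a hard gate.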

package main

import (
	"context"
	"log"
	"net/http"
	"time"

	"golang.org/x/time/rate"
)

// HandleWithWait processes a request after acquiring a token or timing out.
func HandleWithWait(limiter *rate.Limiter, w http.ResponseWriter, r *http.Request) {
	// Context carries the deadline and cancellation signal to the limiter.
	ctx, cancel := context.WithTimeout(r.Context(), 500*time.Millisecond)
	defer cancel()

	// Wait blocks until a token is available. It returns an error if the context
	// ends first or if the token cannot arrive before the deadline.
	if err := limiter.Wait(ctx); err != nil {
		http.Error(w, "Request timed out", http.StatusGatewayTimeout)
		return
	}
	w.Write([]byte("Processed after waiting"))
}

func main() {
	// One token per second, burst of 1. The burst must be at least 1:
	// with a burst of 0, Wait returns an error on every call.
	limiter := rate.NewLimiter(rate.Limit(1), 1)
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		HandleWithWait(limiter, w, r)
	})
	log.Fatal(http.ListenAndServe(":8080", mux))
}

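Go convention puts the context first in any function that accepts one and names it ctx. This handler does not take a context parameter; it derives one from r.Context(), which net/http cancels when the client disconnects, so an abandoned request also stops waiting. The WithTimeout call adds a 500 millisecond deadline on top. The defer cancel() releases the context's timer and its link to the parent as soon as the function returns, even on an early return, rather than holding them until the timeout fires. go vet flags a cancel function that is never called.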

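When Wait() returns an error, the handler checks it. The error is not always context.DeadlineExceeded or context.Canceled: if the limiter can tell that the next token will not arrive before the deadline, Wait() returns its own error immediately (its message says the wait "would exceed context deadline"), and a burst of zero produces yet another. You do not need to inspect the exact error value for basic routing. Returning a 504 Gateway Timeout tells the client that the server is busy, not that the client made a mistake; 503 Service Unavailable is a common alternative for the same signal.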

The if err != nil { return err } pattern is verbose by design. The Go community accepts the boilerplate because it makes the unhappy path visible. You cannot accidentally swallow a timeout error when the check sits on its own line.

Context is plumbing. Run it through every long-lived call site.

Where things go wrong

Wait() introduces blocking behavior, and blocked goroutines pile up. If you call Wait() with a context that has no deadline, such as context.Background(), each new request reserves a token further in the future, so under sustained overload the queue of suspended goroutines keeps growing. The HTTP server will eventually run out of file descriptors or memory. Always attach a timeout or a cancellation signal through the context.

The compiler will catch type mismatches before runtime. If you pass a string where a rate.Limit is expected, the compiler rejects the program with cannot use "two" (untyped string constant) as rate.Limit value in argument. If you forget to import the rate package, you get undefined: rate. These errors are straightforward. Fix the type or add the import.

A common runtime panic is a nil pointer dereference. If you initialize the limiter in a separate goroutine and the HTTP server starts before the limiter is ready, limiter.Allow() panics with runtime error: invalid memory address or nil pointer dereference, and the unsynchronized assignment is a data race besides. Initialize shared state before calling ListenAndServe. The main goroutine should set up all dependencies, then hand control to the server.

Another common mistake is sharing a single limiter across unrelated endpoints. A heavy file upload endpoint and a lightweight health check endpoint will compete for the same tokens. The health check will starve. Scope limiters to the route or the service tier they protect.

A goroutine leaks when it blocks on something that never completes, such as a receive from a channel that nobody sends on or closes. Always have a cancellation path.

Picking the right approach

Rate limiting is not one tool. It is a family of patterns. Choose the one that matches your traffic shape and your failure tolerance.

Use Allow() when you want instant rejection and zero blocking overhead. Use Wait() with a context deadline when you prefer to pause requests rather than drop them. Use a per-client limiter map when different users or API keys need independent quotas. Use a distributed store like Redis when your service runs behind a load balancer and needs a shared counter across multiple instances. Use plain sequential code when you do not need concurrency: the simplest thing that works is usually the right thing.
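A per-client limiter map can be sketched as a mutex-guarded map from key to limiter, created on first use. keyedLimiters and fixedQuota are hypothetical names. The factory returns an Allow-style func so the demo stays deterministic; with golang.org/x/time/rate the factory could be func() func() bool { return rate.NewLimiter(2, 5).Allow }.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedLimiters is a hypothetical per-client registry: one limiter per key
// (for example an API key), created lazily by newLimiter.
type keyedLimiters struct {
	mu         sync.Mutex
	byKey      map[string]func() bool
	newLimiter func() func() bool
}

func newKeyedLimiters(newLimiter func() func() bool) *keyedLimiters {
	return &keyedLimiters{byKey: make(map[string]func() bool), newLimiter: newLimiter}
}

// Allow finds or creates the limiter for key, then consults it after
// releasing the map lock, so the map lock covers only the lookup.
func (k *keyedLimiters) Allow(key string) bool {
	k.mu.Lock()
	allow, ok := k.byKey[key]
	if !ok {
		allow = k.newLimiter()
		k.byKey[key] = allow
	}
	k.mu.Unlock()
	return allow()
}

// fixedQuota is a deterministic stand-in factory: each limiter grants n
// requests and never refills.
func fixedQuota(n int) func() func() bool {
	return func() func() bool {
		var mu sync.Mutex
		left := n
		return func() bool {
			mu.Lock()
			defer mu.Unlock()
			if left == 0 {
				return false
			}
			left--
			return true
		}
	}
}

func main() {
	limits := newKeyedLimiters(fixedQuota(2))
	fmt.Println(limits.Allow("alice"), limits.Allow("alice"), limits.Allow("alice")) // true true false
	fmt.Println(limits.Allow("bob"))                                                 // true: separate quota
}
```

As written, the map gains one entry per distinct key and never shrinks; a long-running service would typically also track when each key was last seen and evict idle entries.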

The golang.org/x/time/rate package handles single-process limiting efficiently. It does not coordinate across network boundaries. If you need cluster-wide limits, you will need to implement a sliding window or a token bucket in Redis, or use a message broker to fan out quota checks.

Do not overcomplicate the first version. Start with a single limiter. Measure the drop rate. Add per-client tracking only when the metrics show unfair starvation.

Where to go next

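A rate limiter acts like a turnstile for your server: it admits requests at a steady rate, lets a short burst through when tokens have built up, and rejects or delays everything beyond that. Tokens refill continuously rather than all at once when a limit resets, so capacity comes back a little at a time. From here, the natural next steps are the ones covered above: a per-client limiter map when users need independent quotas, and a shared store such as Redis when the service runs on more than one instance.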