How to Handle Rate Limiting in HTTP Clients in Go

The scraper that got banned

You write a bot to fetch user profiles from an API. You spin up a hundred goroutines to maximize throughput. The first batch of requests flies out instantly. The server responds with 429 Too Many Requests. Your IP gets flagged. The bot stops working.

You realize you need to throttle the requests. Sprinkling time.Sleep in a loop works for a single goroutine, but it falls apart with concurrency. You need a mechanism that controls the rate across all goroutines, handles bursts intelligently, and respects request cancellation.

Rate limiting is about controlling flow. The token bucket algorithm is the standard solution. Imagine a bucket that fills with tokens at a steady rate. Each request consumes one token. If the bucket is empty, the request waits until a token arrives. The bucket has a maximum capacity. If it is full, new tokens are discarded. This design allows bursts: if the bucket is full, you can grab multiple tokens instantly. Over time, the average rate matches the fill rate, but short spikes are permitted.
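To make the bucket concrete, here is a minimal hand-rolled sketch using only the standard library. The names bucket, newBucket, allow, and demo are made up for this illustration, and the sketch refuses a request when the bucket is empty instead of making it wait. It is not safe for concurrent use and is not how any particular library implements the algorithm.

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a toy token bucket for illustration only.
// It is not safe for concurrent use.
type bucket struct {
	capacity float64   // maximum number of tokens the bucket can hold
	tokens   float64   // tokens currently available
	rate     float64   // tokens added per second
	last     time.Time // when tokens was last updated
}

// newBucket returns a full bucket, so an initial burst is allowed.
func newBucket(rate, capacity float64, now time.Time) *bucket {
	return &bucket{capacity: capacity, tokens: capacity, rate: rate, last: now}
}

// allow refills the bucket for the time elapsed since the last call,
// discards any overflow, and takes one token if one is available.
func (b *bucket) allow(now time.Time) bool {
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// demo sends four requests at the same instant and a fifth one
// half a second later, against a bucket of 2 tokens/second, capacity 3.
func demo() []bool {
	start := time.Now()
	b := newBucket(2, 3, start)
	var results []bool
	for i := 0; i < 4; i++ {
		results = append(results, b.allow(start))
	}
	return append(results, b.allow(start.Add(500*time.Millisecond)))
}

func main() {
	fmt.Println(demo()) // [true true true false true]
}
```

The first three requests drain the full bucket, the fourth is refused, and half a second of refill at 2 tokens per second buys exactly one more request.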

Go's standard library does not include a rate limiter. The community standard is golang.org/x/time/rate. This package is maintained by the Go team in a separate repository and implements the token bucket algorithm efficiently. It requires a go get to install, but it is the de facto tool for this job.

The basics of rate.Limiter

The rate.Limiter struct holds the state for the token bucket. You create it with a limit and a burst size. The limit defines how many tokens are added per second. The burst defines the maximum capacity of the bucket.

package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	// Limit is 2 tokens per second. Burst is 1, so only 1 token exists initially.
	limiter := rate.NewLimiter(rate.Limit(2), 1)

	for i := 0; i < 5; i++ {
		// Wait blocks until a token is available or the context is cancelled.
		if err := limiter.Wait(context.Background()); err != nil {
			fmt.Println(err)
			return
		}
		fmt.Printf("Request %d at %s\n", i+1, time.Now().Format(time.TimeOnly))
	}
}

# output (your timestamps will differ; requests are spaced 500ms apart):
Request 1 at 10:00:00
Request 2 at 10:00:00
Request 3 at 10:00:01
Request 4 at 10:00:01
Request 5 at 10:00:02

The Wait method calculates how long to sleep based on the current token count. If a token is available, it returns immediately. If not, it sleeps until the next token arrives. The limiter runs no background goroutine or ticker. It recomputes the token count from the time elapsed since its last update whenever it is called, so the average rate holds even when a goroutine is preempted or wakes up late.

The burst parameter controls latency and smoothness. A burst of 1 forces strict pacing: once the initial token is spent, each request waits for the full interval between tokens. A burst of 10 lets ten requests through instantly, after which requests fall back to the steady rate. Choose the burst based on the downstream service's tolerance. If the server bans you for spikes, keep the burst low. If the server can handle bursts and you want to fill the pipe quickly, increase the burst.

Burst is your safety valve. Tune it to the server's patience, not your impatience.

Wrapping the HTTP client

In a real application, you rarely call Wait manually. You want the HTTP client to enforce the limit transparently. The http.Client uses a Transport to send requests. You can wrap the transport to intercept every request and wait for a token before forwarding it.

The wrapper must implement http.RoundTripper. The RoundTrip method receives the request, waits for a token, and delegates to the underlying transport.

package main

import (
	"net/http"

	"golang.org/x/time/rate"
)

// rateLimitTransport implements http.RoundTripper.
// It waits for a token before forwarding the request.
type rateLimitTransport struct {
	limiter *rate.Limiter
	base    http.RoundTripper
}

// RoundTrip waits for a token, then delegates to the base transport.
func (t *rateLimitTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// Wait respects request cancellation. If the caller cancels, we stop waiting.
	if err := t.limiter.Wait(req.Context()); err != nil {
		// A RoundTripper must close the request body even when it returns an error.
		if req.Body != nil {
			req.Body.Close()
		}
		return nil, err
	}
	return t.base.RoundTrip(req)
}

func NewRateLimitedClient(rps float64, burst int) *http.Client {
	limiter := rate.NewLimiter(rate.Limit(rps), burst)
	return &http.Client{
		Transport: &rateLimitTransport{
			limiter: limiter,
			base:    http.DefaultTransport,
		},
	}
}

The receiver name t follows Go convention: a short, one- or two-letter abbreviation of the type name. The RoundTrip method signature must match the interface exactly. If a parameter or return type differs, the compiler rejects the struct literal with an error along the lines of cannot use ... as http.RoundTripper value in struct literal: *rateLimitTransport does not implement http.RoundTripper (wrong type for method RoundTrip).

The critical detail is passing req.Context() to Wait. This ties the rate limiter to the request lifecycle. If the caller cancels the request or the client times out, the context is cancelled. Wait detects the cancellation and returns promptly with an error. It also returns an error up front, without sleeping, when the context's deadline would pass before a token becomes available. Either way, the request never reaches the network. This prevents goroutine leaks where a goroutine hangs in the limiter waiting for a token while the caller has already given up.

A rate limiter that ignores context is a leak waiting to happen.

Scheduling with Reserve

Sometimes you do not want to block the current goroutine. You might have a queue of requests and want to schedule them for the future. The Reserve method returns a Reservation without blocking. The reservation tells you when the token will be available.

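package main

import (
	"fmt"
	"sync"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	limiter := rate.NewLimiter(rate.Limit(1), 1)
	var wg sync.WaitGroup

	for i := 1; i <= 3; i++ {
		id := i // copy for the closure (needed before Go 1.22)

		// Reserve claims a token without blocking.
		// The Reservation tracks when the caller may proceed.
		res := limiter.Reserve()
		if !res.OK() {
			// The limiter can never grant this request, e.g. burst is 0.
			fmt.Println("Cannot reserve a token")
			return
		}

		// Delay returns how long to wait. Zero means go now.
		delay := res.Delay()
		fmt.Printf("Request %d: wait %v before sending\n", id, delay.Round(time.Millisecond))

		// Schedule the request on a timer instead of blocking.
		wg.Add(1)
		time.AfterFunc(delay, func() {
			defer wg.Done()
			fmt.Printf("Sending request %d now\n", id)
		})
	}

	// Keep main alive until every timer has fired.
	wg.Wait()
}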

Use Reserve when you need to decouple the decision to make a request from the execution. You can reserve a slot, calculate the delay, and schedule the work on a timer. This keeps your goroutines free while waiting. It is useful for batch processors or when you want to report the expected wait time to a user.

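The Reservation also has a Cancel method. If you reserve a token but decide not to send the request, call Cancel before the delay elapses. The limiter then returns the token to the bucket as far as it can, given any reservations made in the meantime. This prevents wasting capacity.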

Reserve gives you control. Wait gives you simplicity. Pick the one that matches your flow.

Pitfalls and runtime behavior

Rate limiters are stateful. The rate.Limiter maintains internal counters and timestamps. Concurrent access to Wait is safe. Multiple goroutines can call Wait simultaneously, and the limiter serializes them correctly.

SetLimit and SetBurst are also safe to call concurrently. They take the same internal lock as Wait, so you can adjust the rate while other goroutines are waiting. The caveat is timing: a goroutine that is already blocked in Wait computed its delay under the old settings and keeps it, so a new limit only affects calls made after the change. There is no need to swap limiter instances with sync/atomic.

The limiter does not track errors. If a request fails with a 500 status, the token is still consumed, and the server has most likely counted the request too. The limiter assumes every attempt costs a token. Retry logic therefore belongs outside the limiter, and each retry should take its own token. Reservation.Cancel does not help here: it is meant for reservations you have not acted on yet, and in the current implementation it does nothing once the reserved time has passed.
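One way to keep retry logic outside the limiter is a small loop that takes a token before every attempt. The sketch below is a conservative variant under a few assumptions: tokenWaiter is a hypothetical interface (a *rate.Limiter satisfies it through its Wait method), sendWithRetry and countingWaiter are made-up names, and countingWaiter is a stand-in that never blocks so the example runs instantly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// tokenWaiter is a hypothetical interface for this sketch.
// *rate.Limiter satisfies it through its Wait method.
type tokenWaiter interface {
	Wait(ctx context.Context) error
}

// sendWithRetry calls send up to maxAttempts times and takes a token before
// every attempt, so retries are paced the same way as first tries.
func sendWithRetry(ctx context.Context, lim tokenWaiter, maxAttempts int, send func() error) error {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err := lim.Wait(ctx); err != nil {
			return err // cancelled or timed out while waiting for a token
		}
		if lastErr = send(); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, lastErr)
}

// countingWaiter stands in for a real limiter so the sketch runs without
// delays; it only counts how many tokens were requested.
type countingWaiter struct{ calls int }

func (c *countingWaiter) Wait(ctx context.Context) error {
	c.calls++
	return ctx.Err()
}

// demo simulates a request that fails twice and then succeeds.
// It returns the number of tokens taken and the final error.
func demo() (int, error) {
	lim := &countingWaiter{}
	failuresLeft := 2
	err := sendWithRetry(context.Background(), lim, 5, func() error {
		if failuresLeft > 0 {
			failuresLeft--
			return errors.New("server returned 500")
		}
		return nil
	})
	return lim.calls, err
}

func main() {
	tokens, err := demo()
	fmt.Println("tokens used:", tokens, "err:", err) // tokens used: 3 err: <nil>
}
```

Charging every attempt keeps the client's view of the rate aligned with the server's, since the server counts failed requests as well.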

HTTP/2 multiplexing can complicate things. A single HTTP/2 connection can carry many streams. The rate limiter controls the number of requests your client initiates. It does not control the number of streams on the wire. If the server enforces rate limits per connection, your client-side limiter might not align with the server's view. Usually, client-side limiting is sufficient, but be aware that the server might see a different pattern if you reuse connections aggressively.

If you forget to import the package, the compiler complains with undefined: rate. If you try to use a limiter variable that is nil, you get a nil pointer dereference panic at runtime. Always initialize the limiter before passing it to the transport.

Rate limiters are plumbing. They work silently until they fail. Test them under load to verify the burst behavior and cancellation paths.

When to use what

Use golang.org/x/time/rate when you need a token bucket algorithm with burst support and blocking waits.

Use a buffered channel as a semaphore when you want to limit concurrency rather than rate over time.

Use time.Sleep when you have a single sequential loop and do not need burst handling or concurrency control.

Use a custom middleware when you need per-endpoint or per-user rate limiting in a server application.

Use the standard http.Client without wrapping when the downstream service has no rate limit or your volume is negligible.
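The buffered-channel option in the list above limits how many requests are in flight, not how many start per second. A minimal sketch, assuming a 10ms sleep as a stand-in for the real request and a made-up helper name runLimited:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runLimited runs jobs tasks with at most maxInFlight running at once and
// returns the highest concurrency it observed.
func runLimited(jobs, maxInFlight int) int {
	sem := make(chan struct{}, maxInFlight) // buffered channel used as a semaphore
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		inFlight int
		peak     int
	)
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot; blocks when all slots are taken
			defer func() { <-sem }() // release the slot

			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()

			time.Sleep(10 * time.Millisecond) // stand-in for the real request

			mu.Lock()
			inFlight--
			mu.Unlock()
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak concurrency:", runLimited(20, 3))
}
```

The peak never exceeds the channel's capacity, but nothing here spaces requests out over time. If the server counts requests per second, combine this with a rate limiter or use the limiter alone.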

Recap

Rate limiting prevents your Go program from sending too many requests to a server at once, which could get your IP blocked. You achieve this by adding a "traffic cop" (a limiter) to your HTTP client that pauses requests if you've sent too many recently. This ensures your app behaves politely and stays within the server's allowed usage limits.