How to Retry Failed HTTP Requests in Go

Retry failed HTTP requests in Go by wrapping the client call in a loop with exponential backoff for transient errors.

When the network blips

You send a request to a payment gateway. The response never comes back. The timeout hits, your code returns an error, and the user sees a red banner. They click "Pay" again. If the first request actually went through and only the response was lost, you have just charged them twice.

Transient failures happen. Networks drop packets. Servers restart. Load balancers flip. Your code needs to handle the blip without blaming the user. Retrying failed requests is the standard way to recover from temporary glitches. The trick is retrying smartly. You need to wait long enough for the server to recover, but not so long that the user gives up. You need to avoid retrying errors that won't fix themselves. You need to stop immediately if the caller cancels the operation.

The backoff pattern

Retrying is like calling a friend who doesn't answer. You don't call again immediately. That just rings their phone once more and annoys them. You wait a bit. If they still don't answer, you wait longer. If you call too fast, you might block their line or get flagged as spam.

Exponential backoff is the pattern where you double the wait time after each attempt. One second, then two, then four. It gives the server time to recover and prevents a stampede of retries from crashing a recovering service. Adding a small random delay, called jitter, spreads the retries out. If ten thousand clients all retry at exactly the same time, they might crash the server right as it's recovering. Jitter breaks the synchronization.
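
The doubling schedule and the jitter can be sketched as two small functions. This is a minimal illustration, not a definitive implementation: the names expDelay and withJitter, the ten-second cap, and the one-second jitter spread are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// expDelay returns base doubled attempt times, capped at maxDelay.
// attempt is zero-based; base and maxDelay must be positive.
// The cap is an assumption: it keeps late attempts from waiting forever.
func expDelay(attempt int, base, maxDelay time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// withJitter adds a random extra wait in [0, spread) so that clients
// which failed at the same moment do not retry at the same moment.
// spread must be positive.
func withJitter(d, spread time.Duration) time.Duration {
	return d + time.Duration(rand.Int63n(int64(spread)))
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		d := expDelay(attempt, time.Second, 10*time.Second)
		fmt.Printf("attempt %d: base %v, with jitter %v\n", attempt, d, withJitter(d, time.Second))
	}
}
```

The base delays come out as one, two, four, and eight seconds, then stay at the ten-second cap. The jittered values differ on every run.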

Minimal retry loop

Here is the skeleton of a retry loop. It attempts the request, checks for success, sleeps if it fails, and repeats.

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

func RetryRequest(client *http.Client, req *http.Request, maxRetries int) (*http.Response, error) {
	// Remember the most recent failure so the final error can report it.
	lastErr := errors.New("no attempts made")

	// Loop through attempts. The counter starts at zero.
	for i := 0; i < maxRetries; i++ {
		// Execute the HTTP request. Capture both response and error.
		resp, err := client.Do(req)

		// Success means no error and a status code below 500.
		// 4xx errors usually mean the client sent bad data. Retrying won't help.
		if err == nil && resp.StatusCode < 500 {
			return resp, nil
		}

		if err != nil {
			lastErr = err
		} else {
			// A 5xx response still holds a connection. Close the body before retrying.
			resp.Body.Close()
			lastErr = fmt.Errorf("server returned %s", resp.Status)
		}

		// If this is the last attempt, skip the sleep and fall through to return the error.
		if i < maxRetries-1 {
			// Calculate backoff: 2^i seconds.
			// Shift left by i to get powers of two: 1, 2, 4, 8...
			backoff := time.Duration(1<<uint(i)) * time.Second
			time.Sleep(backoff)
		}
	}

	// All attempts are exhausted. Wrap the last failure so the caller can inspect it.
	return nil, fmt.Errorf("request failed after %d attempts: %w", maxRetries, lastErr)
}

How the loop works

The loop runs at most maxRetries times, so maxRetries counts total attempts, including the first. On the first pass, i is zero. The request fires. If the server returns a 200 OK, the function returns immediately. If the server returns a 503 Service Unavailable, the condition resp.StatusCode < 500 fails. The code checks if there are retries left. Since i is less than maxRetries - 1, it calculates the sleep duration. 1 << 0 is one. The code sleeps for one second. The loop increments i to one. The request fires again. If it fails, 1 << 1 is two. The code sleeps for two seconds. This pattern continues until the request succeeds or the loop ends.

The bitwise shift operator << is a fast way to compute powers of two. It shifts the binary representation of one to the left by i positions. The compiler optimizes this to a single instruction. Using math.Pow would work, but it returns a float and requires conversion. The shift operator stays in the integer domain and reads cleanly for powers of two.

The condition resp.StatusCode < 500 filters out client errors. A 400 Bad Request or 404 Not Found means the client sent invalid data or asked for a missing resource. Retrying won't fix a typo in the JSON payload. The code only retries 5xx server errors and network failures. This distinction matters. Retrying a 400 error just wastes time and annoys the server. Note that a 4xx response comes back with a nil error, so the caller still has to check resp.StatusCode.
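
The retry decision can be pulled into a helper so the rule lives in one place. The sketch below is one possible shape, with hypothetical names retryableStatus and shouldRetry. It also treats 429 Too Many Requests as retryable, which is an assumption that goes beyond the 5xx rule above. Whether that is right depends on the API you are calling.

```go
package main

import (
	"fmt"
	"net/http"
)

// retryableStatus reports whether a status code suggests a temporary condition.
// 5xx follows the rule in the text. Including 429 is an assumption for this sketch.
func retryableStatus(code int) bool {
	return code >= 500 || code == http.StatusTooManyRequests
}

// shouldRetry combines the transport error with the status check.
// Callers that use a context should check ctx.Err() before calling this,
// because a cancelled request also shows up here as a non-nil err.
func shouldRetry(resp *http.Response, err error) bool {
	if err != nil {
		return true // no response arrived: treat it as a network failure
	}
	return retryableStatus(resp.StatusCode)
}

func main() {
	for _, code := range []int{200, 400, 404, 429, 500, 503} {
		fmt.Printf("%d retry=%v\n", code, retryableStatus(code))
	}
}
```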

Realistic implementation

The minimal loop misses a few details that break in production. The first issue is the request body. An http.Request body is an io.ReadCloser. Once you read it, the stream is exhausted. If you pass the same request object to client.Do a second time, there is nothing left to read. The attempt either fails inside the client or reaches the server with no payload. You must reset the body before each retry.

The second issue is context. Long-running operations need a way to cancel. If the user closes their browser or the upstream caller times out, your retry loop should stop sleeping and return immediately. context.Context carries the cancellation signal. Functions that take a context should respect cancellation and deadlines. The convention is to pass context.Context as the first parameter, conventionally named ctx.

The third issue is the thundering herd. If many clients retry at the same time, they might crash the server. Adding jitter spreads the retries out.

Here is a more robust implementation that handles body reset, context cancellation, and jitter.

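import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"io"
	"math/rand"
	"net/http"
	"time"
)

// RetryWithJitter retries an HTTP request with exponential backoff and jitter.
// It resets the request body from the provided bytes on each attempt.
func RetryWithJitter(ctx context.Context, client *http.Client, req *http.Request, body []byte, maxAttempts int) (*http.Response, error) {
	// Bind ctx to the request so client.Do aborts an in-flight call on cancellation.
	req = req.WithContext(ctx)

	// Declared outside the loop so the final return can report the last failure.
	lastErr := errors.New("no attempts made")
	for i := 0; i < maxAttempts; i++ {
		// Restore body for this attempt.
		// bytes.NewReader starts at offset zero. io.NopCloser adds the Close method req.Body requires.
		if body != nil {
			req.Body = io.NopCloser(bytes.NewReader(body))
		}

		// client.Do respects req.Context(). If ctx cancels, Do returns an error promptly.
		resp, err := client.Do(req)
		if err == nil && resp.StatusCode < 500 {
			return resp, nil
		}
		if err != nil {
			lastErr = err
		} else {
			// Close the 5xx response body so the connection can be reused.
			resp.Body.Close()
			lastErr = fmt.Errorf("server returned %s", resp.Status)
		}

		if i == maxAttempts-1 {
			break
		}

		// Exponential backoff with jitter.
		// Base delay is 2^i seconds. Jitter adds up to 1 second of randomness.
		delay := time.Duration(1<<uint(i))*time.Second + time.Duration(rand.Intn(1000))*time.Millisecond
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(delay):
			// Sleep finished. Loop continues.
		}
	}
	return nil, fmt.Errorf("failed after %d attempts: %w", maxAttempts, lastErr)
}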

The body reset uses io.NopCloser and bytes.NewReader. bytes.NewReader creates a fresh reader positioned at the start of the byte slice, so every attempt sends the full payload. req.Body must be an io.ReadCloser, and *bytes.Reader has no Close method, so io.NopCloser wraps it with a Close that does nothing. The client closes the body after each call, but the underlying byte slice is never modified and stays available for the next attempt.
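
If you would rather not pass the payload as a separate argument, the standard library offers another route. http.NewRequest fills in req.GetBody when the body is a *bytes.Reader, *bytes.Buffer, or *strings.Reader, and calling it returns a fresh copy of the body. The sketch below shows that rewind. The helper names rewindBody and readTwice and the example.invalid URL are hypothetical, and no network call is made.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// rewindBody replaces req.Body with a fresh copy obtained from req.GetBody.
// It fails if the request was built in a way that left GetBody unset.
func rewindBody(req *http.Request) error {
	if req.Body == nil || req.Body == http.NoBody {
		return nil // nothing to rewind
	}
	if req.GetBody == nil {
		return errors.New("request body cannot be replayed: GetBody is nil")
	}
	body, err := req.GetBody()
	if err != nil {
		return err
	}
	req.Body = body
	return nil
}

// readTwice drains the body, rewinds it, and drains it again.
// It stands in for two attempts of a retry loop.
func readTwice(payload string) (first, second string, err error) {
	req, err := http.NewRequest(http.MethodPost, "http://example.invalid/orders", bytes.NewReader([]byte(payload)))
	if err != nil {
		return "", "", err
	}
	b1, err := io.ReadAll(req.Body)
	if err != nil {
		return "", "", err
	}
	if err := rewindBody(req); err != nil {
		return "", "", err
	}
	b2, err := io.ReadAll(req.Body)
	return string(b1), string(b2), err
}

func main() {
	first, second, err := readTwice(`{"amount":42}`)
	fmt.Println(first, second, err)
}
```

The trade-off is that GetBody only exists for bodies the library knows how to copy. A body backed by a file or a pipe still needs the caller to supply a way to reopen it.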

The sleep uses a select statement. The select races the sleep timer against the context channel. If the context cancels, the select returns immediately with ctx.Err(). The loop stops sleeping and returns. This prevents the code from hanging for seconds after the caller gave up. Context is plumbing. Run it through every long-lived call site.
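
The same race can be factored into a small helper. The sketch below uses time.NewTimer instead of time.After so the timer can be stopped as soon as the context wins. The names sleepCtx and sleepWithTimeout are hypothetical, and the second function exists only to demonstrate the first.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// sleepCtx waits for d or until ctx is done, whichever comes first.
// It returns nil after a full sleep and ctx.Err() if the context ended first.
func sleepCtx(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop() // release the timer if ctx wins the race
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-timer.C:
		return nil
	}
}

// sleepWithTimeout is a demo wrapper: it gives sleepCtx a context
// that expires after timeout.
func sleepWithTimeout(timeout, d time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return sleepCtx(ctx, d)
}

func main() {
	start := time.Now()
	err := sleepWithTimeout(50*time.Millisecond, 10*time.Second)
	fmt.Println(err, time.Since(start) < time.Second)
}
```

The ten-second sleep ends after roughly fifty milliseconds with a deadline error, which is the behavior a retry loop wants when the caller has given up.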

Pitfalls and errors

Retrying blindly causes problems. The first pitfall is retrying non-idempotent operations. If you retry a payment request, you might charge the user twice. Only retry requests where repeating the action is safe. GET requests are safe. POST requests that create a resource might not be. Check the API documentation for idempotency keys. An idempotency key is a unique identifier that tells the server to ignore duplicate requests. The server stores the result of the first request and returns the same result for subsequent requests with the same key.
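
On the client side, the important detail is that every retry of the same logical operation carries the same key. The sketch below generates a key once and leaves it alone on later calls. The header name Idempotency-Key is a common convention but still an assumption here, as are the helper names and the example.invalid URL. Use whatever header and key format your API documents.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
)

// idempotencyHeader is an assumed header name. Check the API documentation.
const idempotencyHeader = "Idempotency-Key"

// ensureIdempotencyKey sets a random key on req if none is present and
// returns the key in use. Calling it again on the same request keeps the key.
func ensureIdempotencyKey(req *http.Request) (string, error) {
	if key := req.Header.Get(idempotencyHeader); key != "" {
		return key, nil
	}
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	key := hex.EncodeToString(buf)
	req.Header.Set(idempotencyHeader, key)
	return key, nil
}

// keysForTwoAttempts simulates two attempts with the same request
// and returns the key each attempt would send.
func keysForTwoAttempts() (first, second string, err error) {
	req, err := http.NewRequest(http.MethodPost, "http://example.invalid/charges", nil)
	if err != nil {
		return "", "", err
	}
	if first, err = ensureIdempotencyKey(req); err != nil {
		return "", "", err
	}
	second, err = ensureIdempotencyKey(req)
	return first, second, err
}

func main() {
	first, second, err := keysForTwoAttempts()
	fmt.Println(first == second, err)
}
```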

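The second pitfall is loop variable capture. If you start a goroutine inside the loop and its closure reads i, the Go version matters. Before Go 1.22, every iteration shared a single i, so the goroutines could all see its final value. The compiler accepted that code; go vet flagged it with loop variable i captured by func literal. The fix was to pass i as an argument to the goroutine or assign it to a new variable inside the loop. Since Go 1.22, each iteration gets its own copy of i in modules that declare go 1.22 or later in go.mod, so the bug no longer occurs there. The explicit argument still works on every version.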
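
Passing the loop variable as an argument looks like this. The sketch uses a hypothetical collectIndexes helper that starts one goroutine per iteration and records the value each goroutine saw. It behaves the same on any Go version because each goroutine receives its own copy.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// collectIndexes starts n goroutines and returns the loop index each one saw,
// sorted so the result does not depend on scheduling order.
func collectIndexes(n int) []int {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out []int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		// attempt is a copy of i made when the goroutine is started.
		go func(attempt int) {
			defer wg.Done()
			mu.Lock()
			out = append(out, attempt)
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	sort.Ints(out)
	return out
}

func main() {
	fmt.Println(collectIndexes(3)) // prints [0 1 2]
}
```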

Another error is forgetting to reset the body. If you skip the body reset, the second attempt has nothing left to read. Depending on how the request was built, the client either rejects the call because the body length no longer matches the declared content length, or sends an empty body and the server answers with a 400. The compiler won't catch either case. When the empty body reaches the server, the only clue is a response such as missing required field. The Go code runs fine. The bug hides in the network layer.

The compiler complains with cannot use 404 (untyped int constant) as string value in argument to errors.New if you pass an integer where a string is expected, for example errors.New(404). This error is common when mixing integers and strings in error formatting. Format the value with fmt.Errorf and a verb such as %d instead.

if err != nil { return err } is verbose by design. The community accepts the boilerplate because it makes the unhappy path visible. Don't try to hide errors in a retry loop. Return them clearly. Error wrapping with %w preserves the error chain. The caller can use errors.Is or errors.As to inspect the root cause.
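
Here is a small sketch of what wrapping buys the caller. The helper names wrapAttemptErr and exampleWrapped are hypothetical, and context.DeadlineExceeded stands in for whatever error the last attempt produced.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// wrapAttemptErr adds retry context to cause while keeping it in the chain.
func wrapAttemptErr(attempts int, cause error) error {
	return fmt.Errorf("request failed after %d attempts: %w", attempts, cause)
}

// exampleWrapped wraps a deadline error and reports the message plus
// whether errors.Is can still find the original cause.
func exampleWrapped() (string, bool) {
	err := wrapAttemptErr(3, context.DeadlineExceeded)
	return err.Error(), errors.Is(err, context.DeadlineExceeded)
}

func main() {
	msg, timedOut := exampleWrapped()
	fmt.Println(msg)      // prints: request failed after 3 attempts: context deadline exceeded
	fmt.Println(timedOut) // prints: true
}
```

With %v instead of %w the message would look the same, but errors.Is would return false because the cause would be flattened into text.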

Trust gofmt. Argue logic, not formatting. Most editors run gofmt on save. The tool decides indentation and spacing. Your code will look like everyone else's code. This consistency reduces cognitive load when reading unfamiliar code.

When to retry

Use a simple retry loop when you need to handle transient network blips for idempotent operations. Use a retry loop with jitter when multiple clients might retry simultaneously to avoid thundering herds. Use a retry loop with context when the caller might cancel the operation and you need to stop sleeping immediately. Use a circuit breaker when the downstream service is down and retries would only waste resources. Use a request queue when you need guaranteed delivery and can tolerate higher latency.

Retries are cheap. Idempotency is not. Don't retry a 400. Fix the payload.

Where to go next