When external calls start failing
You are building a Go service that calls an external payment API. The payment provider experiences an outage. Your service keeps sending requests. Each request waits for a thirty second timeout. Goroutines pile up. Memory consumption climbs. Your service crashes under the weight of waiting. The circuit breaker pattern stops this cascade before it starts.
How the pattern actually works
The pattern borrows its name from electrical engineering. A physical circuit breaker monitors current flow. When the load exceeds a safe threshold, the breaker trips and cuts power. You do not keep plugging appliances in and hoping the wires will cool down. You wait for a cooldown period, reset the breaker, and test the circuit again.
Software systems follow the same rhythm. Your current is outgoing network requests or database queries. The threshold is a count of consecutive failures or a failure rate percentage. When the threshold is crossed, the breaker trips to an open state. New requests fail immediately without touching the external service. After a configured timeout, the breaker moves to a half open state. It allows a single probe request through. If the probe succeeds, the circuit closes and normal traffic resumes. If it fails, the circuit opens again and the timer resets.
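The transitions described above fit in one small function. This is a minimal sketch of the state machine only, with no timers, counters, or locking. The type and event names are illustrative assumptions, not part of any library.

```go
package main

import "fmt"

// State and Event are hypothetical names used only for this sketch.
type State string
type Event string

const (
	Closed   State = "closed"
	Open     State = "open"
	HalfOpen State = "half open"

	ThresholdCrossed Event = "threshold crossed"
	CooldownElapsed  Event = "cooldown elapsed"
	ProbeSucceeded   Event = "probe succeeded"
	ProbeFailed      Event = "probe failed"
)

// next returns the state that follows s when event e occurs.
// Any pair that is not listed leaves the state unchanged.
func next(s State, e Event) State {
	switch {
	case s == Closed && e == ThresholdCrossed:
		return Open
	case s == Open && e == CooldownElapsed:
		return HalfOpen
	case s == HalfOpen && e == ProbeSucceeded:
		return Closed
	case s == HalfOpen && e == ProbeFailed:
		return Open
	}
	return s
}

func main() {
	s := Closed
	events := []Event{ThresholdCrossed, CooldownElapsed, ProbeFailed, CooldownElapsed, ProbeSucceeded}
	for _, e := range events {
		s = next(s, e)
		fmt.Println(e, "->", s)
	}
}
```

Everything else in a real breaker exists to decide when these events fire.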
Go does not include a circuit breaker in the standard library. You can write one in a few dozen lines, or you can use a battle tested package. Both approaches teach the same mechanics.
A circuit breaker protects your service first. Relief for the downstream dependency is a welcome side effect.
Building a minimal breaker from scratch
Here is a thread safe circuit breaker built from scratch. It tracks consecutive failures, enforces a reset timeout, and manages the three standard states.
package main

import (
	"fmt"
	"sync"
	"time"
)

// State tracks the lifecycle phase of the breaker.
type State int

const (
	Closed   State = iota // Normal flow, all requests pass through
	Open                  // Fail fast, requests rejected immediately
	HalfOpen              // Recovery test, allows a single probe request
)

// Breaker stores configuration and runtime metrics.
type Breaker struct {
	mu           sync.Mutex    // Guards concurrent reads and writes to state
	state        State         // Current operational phase
	failures     int           // Consecutive error count since last success
	maxFailures  int           // Failure threshold that triggers an open state
	resetTimeout time.Duration // Cooldown window before allowing a probe
	lastFailTime time.Time     // Anchor point for the reset timer
}

// NewBreaker returns a ready to use breaker instance.
func NewBreaker(maxFailures int, resetTimeout time.Duration) *Breaker {
	return &Breaker{
		state:        Closed,
		maxFailures:  maxFailures,
		resetTimeout: resetTimeout,
	}
}
The struct holds the state machine and a mutex. The standard library has no ready made concurrent state machine, so a sync.Mutex keeps concurrent callers, such as HTTP handlers, from corrupting the state and the failure counter. The method that follows uses the receiver name b, following the Go convention of a one or two letter receiver derived from the type name.
// ErrOpenState signals that the circuit is currently tripped.
var ErrOpenState = fmt.Errorf("circuit breaker is open")

// Execute wraps a function call with breaker state checks.
func (b *Breaker) Execute(fn func() error) error {
	b.mu.Lock()
	switch b.state {
	case Open:
		// Transition to half open only after the cooldown expires
		if time.Since(b.lastFailTime) <= b.resetTimeout {
			b.mu.Unlock()
			return ErrOpenState // Reject immediately to save resources
		}
		b.state = HalfOpen // This caller becomes the single probe
	case HalfOpen:
		b.mu.Unlock()
		return ErrOpenState // A probe is already in flight
	}
	b.mu.Unlock()

	// Record the outcome even if fn panics; a panic counts as a failure
	ok := false
	defer func() {
		b.mu.Lock()
		defer b.mu.Unlock()
		if ok {
			b.failures = 0 // Reset counter on success
			b.state = Closed
			return
		}
		b.failures++
		b.lastFailTime = time.Now()
		// A failed probe reopens the circuit regardless of the count
		if b.state == HalfOpen || b.failures >= b.maxFailures {
			b.state = Open // Trip the breaker on threshold breach
		}
	}()

	err := fn() // Delegate to the actual external call, lock not held
	ok = err == nil
	return err
}
The Execute method locks before reading state, unlocks before calling the external function, and locks again to record the result. This prevents holding the mutex during a potentially slow network call. The failure counter resets on any success, which matches the standard circuit breaker behavior of tracking consecutive errors rather than a sliding window.
State machines are simple until concurrency touches them. Lock around state changes, never around I/O.
Walking through the lifecycle
When the breaker starts, it sits in the closed state. Every successful call leaves it closed. Every failure increments the counter. Once the counter hits maxFailures, the state flips to open. The lastFailTime captures the exact moment the trip happened.
While open, Execute returns ErrOpenState without calling the wrapped function. This is the fail fast behavior that saves your goroutines. No background timer runs. Each call compares time.Since(b.lastFailTime) against resetTimeout, and the first call that arrives after the cooldown expires triggers the transition to half open.
Half open is a testing phase. The breaker allows one request through. If that request succeeds, the counter resets and the circuit closes. If it fails, the circuit opens again and the timer restarts. This prevents a flood of traffic from hitting a recovering service.
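The lifecycle above can be traced step by step. The sketch below is a deliberately stripped down, single goroutine variant written for this walk through: it has no mutex, and it reads time through an injected now function (an assumption added here) so the trace needs no real sleeping.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type state int

const (
	closed state = iota
	open
	halfOpen
)

func (s state) String() string { return [...]string{"closed", "open", "half open"}[s] }

var errOpen = errors.New("circuit breaker is open")

// breaker is a single goroutine sketch with an injectable clock.
type breaker struct {
	state        state
	failures     int
	maxFailures  int
	resetTimeout time.Duration
	lastFailTime time.Time
	now          func() time.Time // hypothetical hook, replaces time.Now
}

func (b *breaker) execute(fn func() error) error {
	if b.state == open {
		if b.now().Sub(b.lastFailTime) <= b.resetTimeout {
			return errOpen // still cooling down
		}
		b.state = halfOpen // this call is the probe
	}
	if err := fn(); err != nil {
		b.failures++
		b.lastFailTime = b.now()
		if b.state == halfOpen || b.failures >= b.maxFailures {
			b.state = open
		}
		return err
	}
	b.failures, b.state = 0, closed
	return nil
}

// walk drives one full cycle and records the state after each call.
func walk() []string {
	clock := time.Unix(0, 0)
	b := &breaker{maxFailures: 2, resetTimeout: 10 * time.Second,
		now: func() time.Time { return clock }}
	fail := func() error { return errors.New("upstream down") }
	ok := func() error { return nil }

	var trace []string
	step := func(fn func() error) {
		err := b.execute(fn)
		trace = append(trace, fmt.Sprintf("%v (err=%v)", b.state, err))
	}
	step(fail)                          // first failure, still closed
	step(fail)                          // threshold reached, trips open
	step(ok)                            // rejected without calling ok
	clock = clock.Add(11 * time.Second) // cooldown passes
	step(fail)                          // probe fails, reopens
	clock = clock.Add(11 * time.Second) // cooldown passes again
	step(ok)                            // probe succeeds, closes
	return trace
}

func main() {
	for i, line := range walk() {
		fmt.Println(i+1, line)
	}
}
```

An injected clock is a common way to make time dependent code testable. A concurrent breaker needs the locking shown earlier on top of this.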
You can extend this skeleton with metrics, sliding window failure rates, or concurrent half open probes. The core loop remains the same.
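As one possible extension, a failure rate over the most recent calls can replace the consecutive failure counter. The sketch below is an assumption about how that might look, not a prescribed design: a count based ring buffer (not a time based window) with hypothetical names. It is not safe for concurrent use on its own, so a caller would update it while holding the breaker's mutex.

```go
package main

import "fmt"

// window is a hypothetical fixed size ring buffer of recent call outcomes.
type window struct {
	outcomes []bool // true marks a failure
	next     int    // slot the next outcome overwrites
	filled   int    // slots written so far, capped at len(outcomes)
	failures int    // failures currently inside the window
}

func newWindow(size int) *window {
	return &window{outcomes: make([]bool, size)}
}

// record stores one outcome, evicting the oldest once the buffer is full.
func (w *window) record(failed bool) {
	if w.filled == len(w.outcomes) {
		if w.outcomes[w.next] {
			w.failures-- // the evicted outcome was a failure
		}
	} else {
		w.filled++
	}
	w.outcomes[w.next] = failed
	if failed {
		w.failures++
	}
	w.next = (w.next + 1) % len(w.outcomes)
}

// failureRate reports the share of failures among recorded outcomes.
func (w *window) failureRate() float64 {
	if w.filled == 0 {
		return 0
	}
	return float64(w.failures) / float64(w.filled)
}

// shouldTrip waits for a full window before it trusts the rate.
func (w *window) shouldTrip(threshold float64) bool {
	return w.filled == len(w.outcomes) && w.failureRate() >= threshold
}

func main() {
	w := newWindow(4)
	for _, failed := range []bool{true, true, false, false} {
		w.record(failed)
	}
	fmt.Println(w.failureRate(), w.shouldTrip(0.5))
}
```

Requiring a full window before tripping keeps a handful of early failures from opening the circuit right after startup.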
A half open state is a controlled experiment, not a free pass.
Using a production ready library
Writing a breaker from scratch teaches the mechanics. Production systems benefit from battle tested packages that handle edge cases, metrics, and configuration. The sony/gobreaker library is one of the most widely used implementations in the Go ecosystem.
Here is how you wrap an HTTP client call with it. In the v1 API at this import path, Execute takes a func() (interface{}, error) and returns the same pair, so a call that only produces an error returns nil for the value and discards it at the call site.
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"

	"github.com/sony/gobreaker"
)

// CallExternalAPI demonstrates wrapping an HTTP request.
func CallExternalAPI(ctx context.Context, cb *gobreaker.CircuitBreaker) error {
	// Context always travels first to respect deadlines and cancellation
	_, err := cb.Execute(func() (interface{}, error) {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://api.example.com/data", nil)
		if err != nil {
			return nil, err
		}
		// Client handles the actual network I/O
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return nil, err
		}
		defer resp.Body.Close()
		if resp.StatusCode >= 500 {
			return nil, fmt.Errorf("server error: %d", resp.StatusCode)
		}
		return nil, nil
	})
	// Handle breaker specific errors separately from network errors.
	// ErrTooManyRequests is returned when the half open probe limit is reached.
	if errors.Is(err, gobreaker.ErrOpenState) || errors.Is(err, gobreaker.ErrTooManyRequests) {
		return fmt.Errorf("circuit not accepting requests, backing off: %w", err)
	}
	return err
}
The library manages the state machine internally. You configure it through a gobreaker.Settings value passed to gobreaker.NewCircuitBreaker. The ReadyToTrip function in those settings lets you define custom thresholds, such as tripping only after at least five requests of which more than half failed. The minimum request count prevents a single flaky request from opening the circuit during low traffic.
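A ReadyToTrip style predicate might look like the sketch below. To keep it runnable without the external module, it uses a local Counts type that mirrors two fields of gobreaker.Counts. The limits (five requests, sixty percent failures) are illustrative assumptions, not recommendations.

```go
package main

import "fmt"

// Counts is a local stand-in that mirrors the Requests and TotalFailures
// fields of gobreaker.Counts so this sketch compiles on its own.
type Counts struct {
	Requests      uint32
	TotalFailures uint32
}

// readyToTrip trips only once there is enough traffic to trust the ratio.
func readyToTrip(c Counts) bool {
	if c.Requests < 5 {
		return false // too few samples to judge
	}
	ratio := float64(c.TotalFailures) / float64(c.Requests)
	return ratio >= 0.6
}

func main() {
	fmt.Println(readyToTrip(Counts{Requests: 2, TotalFailures: 2}))
	fmt.Println(readyToTrip(Counts{Requests: 10, TotalFailures: 7}))
}
```

With the real library, a function of this shape that takes a gobreaker.Counts parameter would be assigned to the ReadyToTrip field of gobreaker.Settings before the breaker is created.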
Context flows through every long lived call site. Pass it first, check it early, respect cancellation.
Common pitfalls and compiler traps
Custom breakers often fail on concurrency bugs. If you drop the mutex and track failures in a plain map, the runtime can abort the whole process with fatal error: concurrent map writes, which recover cannot catch. An unsynchronized slice or integer counter is worse because nothing crashes: updates are silently lost or corrupted. Go's race detector (go test -race) catches both during testing, but it is better to design around it. Stick to sync.Mutex or sync/atomic for state fields.
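For a single counter, sync/atomic is enough. The sketch below is a minimal illustration with a hypothetical helper name. It assumes Go 1.19 or later for the atomic.Int64 type.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countFailures records n failures from n goroutines and returns the total.
func countFailures(n int) int64 {
	var failures atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			failures.Add(1) // safe without a mutex
		}()
	}
	wg.Wait()
	return failures.Load()
}

func main() {
	fmt.Println(countFailures(1000))
}
```

Atomics fit one independent value. When the state and the counter must change together, as in the breaker above, a mutex around both is simpler to reason about.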
Another frequent mistake is forgetting to handle ErrOpenState. If you treat it like a network timeout, your retry logic will hammer the breaker instead of backing off. Check for the breaker error explicitly before applying exponential backoff.
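One way to keep retries from hammering an open breaker is sketched below. The sentinel error and helper are hypothetical stand-ins so the example runs with the standard library alone. Returning immediately on the open error is one possible policy, not the only one.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errOpenState stands in for the breaker's sentinel error.
var errOpenState = errors.New("circuit breaker is open")

// errTransient stands in for a retryable upstream failure.
var errTransient = errors.New("temporary upstream failure")

// callWithRetry retries failures with exponential backoff but stops at once
// when the breaker reports that it is open. It returns the attempts used.
func callWithRetry(call func() error, attempts int, base time.Duration) (int, error) {
	var err error
	for i := 0; i < attempts; i++ {
		err = call()
		if err == nil {
			return i + 1, nil
		}
		if errors.Is(err, errOpenState) {
			return i + 1, err // retrying now would only hit the open breaker
		}
		if i < attempts-1 {
			time.Sleep(base << i) // base, 2*base, 4*base, ...
		}
	}
	return attempts, err
}

func main() {
	calls := 0
	flaky := func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	n, err := callWithRetry(flaky, 5, time.Millisecond)
	fmt.Println(n, err)

	n, err = callWithRetry(func() error { return errOpenState }, 5, time.Millisecond)
	fmt.Println(n, err)
}
```

The caller that receives the open error can then wait for roughly the breaker's reset timeout before trying again, or serve a fallback.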
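Configuration mistakes are harder to spot. Setting the reset timeout (resetTimeout in the custom breaker, Timeout in gobreaker.Settings) too low causes rapid cycling between open and half open states. This wastes probe requests on a service that is still down and generates noisy logs. Setting the failure threshold (maxFailures, or the limit inside ReadyToTrip) too high defeats the purpose of the pattern. Start with three failures and a ten second timeout, then adjust based on your downstream service's recovery time.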
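If you forget to import the breaker package, the compiler rejects the program with undefined: gobreaker. If you pass a function with the wrong signature to Execute, the build fails with a message along the lines of cannot use func() (string, error) as func() error value in argument. The custom Breaker above expects func() error, while gobreaker's Execute expects func() (interface{}, error). Go's type system catches signature mismatches early. Trust it.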
The worst goroutine bug is the one that never logs.
When to reach for a circuit breaker
Use a custom circuit breaker when you need zero external dependencies and want full control over state transitions. Use sony/gobreaker when you need production grade reliability, metrics hooks, and configurable trip functions. Use a retry library with exponential backoff when failures are transient and you can afford to wait. Use plain request timeouts when the downstream service is highly available and you only need to prevent single request hangs. Use a bulkhead pattern when you want to isolate resource pools per service instead of failing fast globally.
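For comparison, the bulkhead mentioned above can be sketched with a buffered channel used as a semaphore. The names are hypothetical. Rejecting immediately when no slot is free is one design choice; blocking with a timeout is another.

```go
package main

import (
	"errors"
	"fmt"
)

var errBulkheadFull = errors.New("bulkhead full")

// bulkhead caps the number of in flight calls to one dependency.
type bulkhead struct {
	slots chan struct{}
}

func newBulkhead(limit int) *bulkhead {
	return &bulkhead{slots: make(chan struct{}, limit)}
}

// do runs fn if a slot is free and rejects immediately otherwise.
func (b *bulkhead) do(fn func() error) error {
	select {
	case b.slots <- struct{}{}:
		defer func() { <-b.slots }() // release the slot when fn returns
		return fn()
	default:
		return errBulkheadFull
	}
}

func main() {
	payments := newBulkhead(1)
	err := payments.do(func() error {
		// A second call while the first holds the only slot is rejected.
		return payments.do(func() error { return nil })
	})
	fmt.Println(err)
}
```

Unlike a breaker, a bulkhead keeps no failure history. It only bounds how much of your service one dependency can occupy at a time.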
Pick the tool that matches your failure profile. Do not add complexity to solve a problem you do not have.