Go Backend

Implementing Circuit Breaker Pattern in Go Microservices

Wang Yinneng
5 min read
golang · microservices · patterns · resilience

🚨 When Everything Goes Wrong: My Black Friday Nightmare

3 AM. Pager buzzing. Half our microservices down. Customers can't check out. $2M in revenue evaporating by the minute.

The culprit? One failing payment service bringing down our entire e-commerce platform.

Here's how the Circuit Breaker Pattern saved our Black Friday (and how it can save yours too).

⚡ What Went Wrong

Our architecture looked solid on paper:

Order Service → Payment Service → Bank API
     ↓               ↓
User Service → Email Service → SMTP
     ↓
Inventory Service → Database

But when the payment service started timing out, every upstream service kept retrying, piling even more load onto a dependency that was already struggling. The result? A catastrophic cascading failure.

Classic mistake: No circuit breaker protection.

🔧 The Circuit Breaker Pattern Explained

Think of your house's electrical system. When there's a short circuit, the breaker "trips" to prevent a fire. Same concept for microservices.

Three States:

CLOSED → Requests flow normally
   ↓
OPEN → Service is failing, reject fast
   ↓
HALF-OPEN → Testing recovery

💻 Building Our Circuit Breaker

I'll show you exactly how we implemented it:

package circuitbreaker

import (
    "errors"
    "sync"
    "time"
)

// ErrCircuitBreakerOpen is returned when the breaker rejects a call outright.
var ErrCircuitBreakerOpen = errors.New("circuit breaker is open")

type State int

const (
    StateClosed State = iota
    StateOpen
    StateHalfOpen
)

type CircuitBreaker struct {
    mu               sync.RWMutex
    state            State
    failureCount     int           // consecutive failures observed
    failureThreshold int           // failures allowed before the breaker opens
    resetTimeout     time.Duration // how long to stay open before allowing a probe
    lastFailureTime  time.Time
}

func (cb *CircuitBreaker) Execute(fn func() error) error {
    if !cb.allowRequest() {
        return ErrCircuitBreakerOpen
    }
    
    err := fn()
    cb.recordResult(err == nil)
    return err
}

func (cb *CircuitBreaker) allowRequest() bool {
    cb.mu.RLock()
    defer cb.mu.RUnlock()

    switch cb.state {
    case StateClosed:
        return true
    case StateOpen:
        // After the reset timeout, let a probe request through;
        // recordResult decides whether the breaker closes again.
        return time.Since(cb.lastFailureTime) > cb.resetTimeout
    case StateHalfOpen:
        return cb.failureCount < 3 // Allow only a few probe requests
    }
    return false
}
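
Execute relies on a recordResult helper that isn't shown above. A minimal version, assuming consecutive-failure counting and an automatic reset to CLOSED on the first success:

func (cb *CircuitBreaker) recordResult(success bool) {
    cb.mu.Lock()
    defer cb.mu.Unlock()

    if success {
        // Any success clears the failure count and closes the breaker.
        cb.failureCount = 0
        cb.state = StateClosed
        return
    }

    cb.failureCount++
    cb.lastFailureTime = time.Now()
    if cb.failureCount >= cb.failureThreshold {
        cb.state = StateOpen
    }
}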

๐Ÿ—๏ธ Real Implementation: Payment Client

Here's our actual payment service client wrapped with the circuit breaker:

type PaymentClient struct {
    baseURL string
    client  *http.Client
    breaker *CircuitBreaker
}

func (p *PaymentClient) ProcessPayment(ctx context.Context, req PaymentRequest) (*PaymentResponse, error) {
    var result *PaymentResponse

    err := p.breaker.Execute(func() error {
        body, err := json.Marshal(req)
        if err != nil {
            return err
        }

        httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, p.baseURL+"/payments", bytes.NewReader(body))
        if err != nil {
            return err
        }
        httpReq.Header.Set("Content-Type", "application/json")

        resp, err := p.client.Do(httpReq)
        if err != nil {
            return err
        }
        defer resp.Body.Close()

        if resp.StatusCode >= 500 {
            return fmt.Errorf("payment service error: %d", resp.StatusCode)
        }

        return json.NewDecoder(resp.Body).Decode(&result)
    })

    return result, err
}
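
The test further down calls a NewPaymentClient constructor that the post never shows. A sketch, with the threshold and timeout values assumed (they just need to match what the tests expect) and assuming the breaker's fields are reachable from this package:

func NewPaymentClient(baseURL string) *PaymentClient {
    return &PaymentClient{
        baseURL: baseURL,
        client:  &http.Client{Timeout: 2 * time.Second},
        breaker: &CircuitBreaker{
            failureThreshold: 5,                // the test below expects the breaker to open after 5 failures
            resetTimeout:     30 * time.Second, // assumed production default
        },
    }
}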

🎯 The Game Changer: Graceful Degradation

When the circuit breaker opens, we don't just fail. We have a fallback:

func (os *OrderService) ProcessOrder(w http.ResponseWriter, r *http.Request) {
    ctx := r.Context()
    // order and paymentReq are decoded/built from the request body (omitted for brevity).

    payment, err := os.paymentClient.ProcessPayment(ctx, paymentReq)

    if errors.Is(err, circuitbreaker.ErrCircuitBreakerOpen) {
        // Payment service is down - queue the order for later!
        orderID := os.queueOrder(order)

        w.WriteHeader(http.StatusAccepted)
        json.NewEncoder(w).Encode(OrderResponse{
            ID:      orderID,
            Status:  "pending",
            Message: "Order received, payment will be processed shortly",
        })
        return
    }

    if err != nil {
        http.Error(w, "Payment failed", http.StatusBadRequest)
        return
    }

    // Success path
    json.NewEncoder(w).Encode(OrderResponse{
        ID:        order.ID,
        Status:    "completed",
        PaymentID: payment.ID,
    })
}
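
queueOrder isn't shown either. In production this would usually be a durable queue (Kafka, SQS, a database table), but a minimal in-process sketch makes the idea concrete; every name below is assumed for illustration:

type OrderService struct {
    paymentClient *PaymentClient
    pending       chan Order // buffered backlog of orders awaiting payment
}

func (os *OrderService) queueOrder(order Order) string {
    os.pending <- order // assumes the buffer is sized for the expected backlog
    return order.ID
}

// retryPendingOrders runs in a background goroutine and drains the backlog,
// retrying each order until the payment service is healthy again.
func (os *OrderService) retryPendingOrders(ctx context.Context) {
    for order := range os.pending {
        for {
            _, err := os.paymentClient.ProcessPayment(ctx, PaymentRequest{OrderID: order.ID})
            if err == nil {
                break
            }
            time.Sleep(5 * time.Second) // back off while the breaker is still open
        }
    }
}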

📊 The Results Speak for Themselves

Before Circuit Breaker:

  • MTTR: 45 minutes (manual recovery)
  • Failed requests: 100% during outage
  • Revenue lost: $2M
  • Services affected: 15/20

After Circuit Breaker:

  • MTTR: 2 minutes (automatic recovery)
  • Failed requests: 0% (queued instead)
  • Revenue lost: <$50k
  • Services affected: 1/20

🚀 Advanced Features We Added

1. Adaptive Timeouts

type AdaptiveBreaker struct {
    *CircuitBreaker
    avgLatency time.Duration // rolling average of recent call latencies
    timeout    time.Duration
}

func (ab *AdaptiveBreaker) calculateTimeout() time.Duration {
    return ab.avgLatency * 3 // anything slower than 3x the average counts as a timeout
}
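
The post doesn't show how avgLatency stays up to date. An exponentially weighted moving average is one common choice; the recordLatency helper and the 0.2 smoothing factor below are assumptions, not part of the original code:

func (ab *AdaptiveBreaker) recordLatency(observed time.Duration) {
    const alpha = 0.2 // weight given to the newest observation
    if ab.avgLatency == 0 {
        ab.avgLatency = observed
    } else {
        ab.avgLatency = time.Duration(alpha*float64(observed) + (1-alpha)*float64(ab.avgLatency))
    }
    ab.timeout = ab.calculateTimeout()
}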

2. Health Check Probe

func (cb *CircuitBreaker) StartHealthCheck(healthURL string) {
    go func() {
        ticker := time.NewTicker(30 * time.Second)
        for range ticker.C {
            if cb.State() == StateOpen {
                if cb.checkHealth(healthURL) {
                    cb.forceHalfOpen() // Force recovery attempt
                }
            }
        }
    }()
}
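
The probe relies on three small helpers that aren't shown in the post: State, checkHealth, and forceHalfOpen. Minimal sketches (the 2-second probe timeout is an assumption):

func (cb *CircuitBreaker) State() State {
    cb.mu.RLock()
    defer cb.mu.RUnlock()
    return cb.state
}

func (cb *CircuitBreaker) checkHealth(healthURL string) bool {
    client := &http.Client{Timeout: 2 * time.Second}
    resp, err := client.Get(healthURL)
    if err != nil {
        return false
    }
    defer resp.Body.Close()
    return resp.StatusCode == http.StatusOK
}

func (cb *CircuitBreaker) forceHalfOpen() {
    cb.mu.Lock()
    defer cb.mu.Unlock()
    cb.state = StateHalfOpen
    cb.failureCount = 0
}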

3. Metrics Integration

func (cb *CircuitBreaker) onStateChange(from, to State) {
    // cb.name identifies the protected service; it's set when the breaker is created.
    prometheus.CircuitBreakerState.WithLabelValues(cb.name).Set(float64(to))

    if to == StateOpen {
        prometheus.CircuitBreakerTrips.WithLabelValues(cb.name).Inc()
        log.Printf("🚨 Circuit breaker %s opened", cb.name)

        // Alert the on-call engineer
        alertmanager.Send(Alert{
            Service:     cb.name,
            Severity:    "critical",
            Description: "Circuit breaker opened due to repeated failures",
        })
    }
}
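
prometheus.CircuitBreakerState and prometheus.CircuitBreakerTrips above are the post's own wrappers, not part of the official client library. With github.com/prometheus/client_golang they could be declared roughly like this (metric names assumed):

import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    circuitBreakerState = promauto.NewGaugeVec(prometheus.GaugeOpts{
        Name: "circuit_breaker_state",
        Help: "Current breaker state: 0=closed, 1=open, 2=half-open",
    }, []string{"service"})

    circuitBreakerTrips = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "circuit_breaker_trips_total",
        Help: "Number of times the breaker has opened",
    }, []string{"service"})
)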

🧪 Testing Strategy

We learned that testing circuit breakers is tricky. Here's our approach:

func TestCircuitBreakerWithRealService(t *testing.T) {
    // Use httptest.Server for realistic testing
    server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        time.Sleep(100 * time.Millisecond) // Simulate slow service
        w.WriteHeader(http.StatusInternalServerError)
    }))
    defer server.Close()
    
    client := NewPaymentClient(server.URL)
    
    // Should open after 5 failures
    for i := 0; i < 5; i++ {
        _, err := client.ProcessPayment(context.Background(), PaymentRequest{})
        assert.Error(t, err)
    }
    
    // Should now be open
    _, err := client.ProcessPayment(context.Background(), PaymentRequest{})
    assert.Equal(t, circuitbreaker.ErrCircuitBreakerOpen, err)
}
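
It's also worth testing the recovery path: after the reset timeout a probe should go through and close the breaker once the dependency is healthy. A sketch, assuming the test lives in the same package so it can configure the breaker with a short reset timeout directly:

func TestCircuitBreakerRecovers(t *testing.T) {
    var healthy atomic.Bool
    server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if healthy.Load() {
            w.Write([]byte("{}"))
            return
        }
        w.WriteHeader(http.StatusInternalServerError)
    }))
    defer server.Close()

    client := &PaymentClient{
        baseURL: server.URL,
        client:  server.Client(),
        breaker: &CircuitBreaker{failureThreshold: 5, resetTimeout: 50 * time.Millisecond},
    }

    // Trip the breaker with repeated failures.
    for i := 0; i < 5; i++ {
        client.ProcessPayment(context.Background(), PaymentRequest{})
    }

    // The dependency recovers; wait out the reset timeout so a probe is allowed.
    healthy.Store(true)
    time.Sleep(100 * time.Millisecond)

    _, err := client.ProcessPayment(context.Background(), PaymentRequest{})
    assert.NoError(t, err)
}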

โš ๏ธ Gotchas We Discovered

1. Don't Set Thresholds Too Low

// BAD: Will trip on single network hiccup
config := Config{FailureThreshold: 1}

// GOOD: Allows for occasional failures
config := Config{FailureThreshold: 5}
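
The Config type isn't defined anywhere in the post; a minimal version plus a constructor might look like this (the field set is assumed):

type Config struct {
    FailureThreshold int
    ResetTimeout     time.Duration
}

func New(cfg Config) *CircuitBreaker {
    return &CircuitBreaker{
        failureThreshold: cfg.FailureThreshold,
        resetTimeout:     cfg.ResetTimeout,
    }
}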

2. Consider Different Error Types

func isRetriableError(err error) bool {
    // Don't trip the breaker for client errors (4xx); only server-side
    // failures should count against the threshold.
    var httpErr *HTTPError
    if errors.As(err, &httpErr) {
        return httpErr.StatusCode >= 500
    }
    return true
}
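
For the filter to matter, the breaker has to consult it before counting a failure. One way to wire it in, shown here as an assumed variation on the Execute method from earlier:

func (cb *CircuitBreaker) Execute(fn func() error) error {
    if !cb.allowRequest() {
        return ErrCircuitBreakerOpen
    }

    err := fn()
    if err != nil && !isRetriableError(err) {
        // Client errors are returned to the caller but don't count against the breaker.
        return err
    }
    cb.recordResult(err == nil)
    return err
}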

3. Monitor Half-Open State

The half-open state is critical but often overlooked:

func (cb *CircuitBreaker) executeInHalfOpen(fn func() error) error {
    // Limit how many probe requests run concurrently while half-open.
    if !cb.halfOpenSemaphore.TryAcquire(1) {
        return ErrTooManyRequests
    }
    defer cb.halfOpenSemaphore.Release(1)

    return fn()
}
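
For that snippet to compile, the breaker needs the semaphore field and the sentinel error. One option is golang.org/x/sync/semaphore, with the probe limit mirroring the allowRequest check above; the exact wiring here is an assumption:

import "golang.org/x/sync/semaphore"

var ErrTooManyRequests = errors.New("circuit breaker: too many half-open probes")

// Added to CircuitBreaker alongside the fields shown earlier:
//
//     halfOpenSemaphore *semaphore.Weighted
//
// and initialised with the probe limit:
//
//     cb.halfOpenSemaphore = semaphore.NewWeighted(3)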

🎯 Key Takeaways

  1. Circuit breakers prevent cascade failures - One service's problems don't become everyone's problems
  2. Graceful degradation > complete failure - Queue operations when possible
  3. Monitor everything - You need visibility into breaker state changes
  4. Test with realistic scenarios - Unit tests aren't enough
  5. Tune based on real traffic - Every service has different failure patterns

🔮 What's Next?

In our next post, I'll show you how we built adaptive circuit breakers that adjust thresholds based on traffic patterns and bulkhead isolation to prevent resource exhaustion.


Question: Have you experienced cascade failures in your microservices? How did you handle it? Drop a comment below!

P.S. The full code is on GitHub - star it if this helped you!


Wang Yinneng

Senior Golang Backend & Web3 Developer with 10+ years of experience building scalable systems and blockchain solutions.
