Implementing Circuit Breaker Pattern in Go Microservices
The Problem: When Dependencies Fail
Picture this: It's 3 AM. Your pager goes off. Half your microservices are down, and your customers can't complete purchases. The culprit? A single payment service that's overwhelmed and causing a cascading failure across your entire system.
Sound familiar?
This is exactly what happened to us last Black Friday. Our payment service started throwing 500 errors, but our other services kept hammering it with requests, making the situation worse. The result? A 45-minute outage and $2M in lost revenue.
The root cause: No circuit breaker pattern.
The Solution: Circuit Breaker Pattern
The circuit breaker pattern is like an electrical circuit breaker in your house. When it detects a problem (too many failures), it "trips" and stops sending requests to the failing service, giving it time to recover.
Circuit Breaker States
[CLOSED] ---- failures exceed threshold ----> [OPEN]
   ^                                             |
   |  (a success resets                   timeout expires
   |   the failure counter)                      |
   |                                             v
   +----------- request succeeds ---------- [HALF-OPEN]
- CLOSED: Normal operation, requests flow through
- OPEN: Service is failing, requests are blocked
- HALF-OPEN: A limited number of trial requests test whether the service has recovered; a failed trial reopens the circuit
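The three states and their transitions can be sketched as a small pure function before any locking or counting is involved. The names `Event`, `ThresholdExceeded`, `TimeoutExpired`, `TrialSucceeded`, `TrialFailed`, and `next` are illustrative assumptions for this sketch and do not appear in the implementation that follows.

```go
package main

import "fmt"

// State and Event are illustrative names for this sketch only.
type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

type Event int

const (
	ThresholdExceeded Event = iota // too many failures while CLOSED
	TimeoutExpired                 // reset timeout elapsed while OPEN
	TrialSucceeded                 // trial request succeeded while HALF-OPEN
	TrialFailed                    // trial request failed while HALF-OPEN
)

// next returns the state that follows s when event e occurs.
// Any pair not listed leaves the state unchanged.
func next(s State, e Event) State {
	switch {
	case s == Closed && e == ThresholdExceeded:
		return Open
	case s == Open && e == TimeoutExpired:
		return HalfOpen
	case s == HalfOpen && e == TrialSucceeded:
		return Closed
	case s == HalfOpen && e == TrialFailed:
		return Open
	}
	return s
}

func main() {
	// Walk one full cycle: trip, wait out the timeout, recover.
	s := Closed
	for _, e := range []Event{ThresholdExceeded, TimeoutExpired, TrialSucceeded} {
		s = next(s, e)
		fmt.Println(s)
	}
}
```

Keeping the transition rules in one place like this makes them easy to check against the diagram; the real breaker adds counters, a timer, and a mutex around the same rules.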
Implementation in Go
Let's build a production-ready circuit breaker from scratch:
Basic Circuit Breaker Structure
package circuitbreaker
import (
"context"
"errors"
"sync"
"time"
)
// State represents the circuit breaker state
type State int
const (
StateClosed State = iota
StateOpen
StateHalfOpen
)
func (s State) String() string {
switch s {
case StateClosed:
return "CLOSED"
case StateOpen:
return "OPEN"
case StateHalfOpen:
return "HALF-OPEN"
default:
return "UNKNOWN"
}
}
// CircuitBreaker represents a circuit breaker
type CircuitBreaker struct {
mu sync.RWMutex
state State
failureCount int
successCount int
lastFailureTime time.Time
// Configuration
failureThreshold int
resetTimeout time.Duration
halfOpenMaxCalls int
// Callbacks
onStateChange func(from, to State)
}
// Config holds circuit breaker configuration
type Config struct {
FailureThreshold int // Number of failures before opening
ResetTimeout time.Duration // Time to wait before trying again
HalfOpenMaxCalls int // Max calls allowed in half-open state
OnStateChange func(from, to State)
}
// New creates a new circuit breaker
func New(config Config) *CircuitBreaker {
if config.FailureThreshold <= 0 {
config.FailureThreshold = 5
}
if config.ResetTimeout <= 0 {
config.ResetTimeout = 60 * time.Second
}
if config.HalfOpenMaxCalls <= 0 {
config.HalfOpenMaxCalls = 1
}
return &CircuitBreaker{
state: StateClosed,
failureThreshold: config.FailureThreshold,
resetTimeout: config.ResetTimeout,
halfOpenMaxCalls: config.HalfOpenMaxCalls,
onStateChange: config.OnStateChange,
}
}
// Common errors
var (
ErrCircuitBreakerOpen = errors.New("circuit breaker is open")
ErrTooManyRequests = errors.New("too many requests in half-open state")
)
// Execute runs the given function if the circuit breaker allows it
func (cb *CircuitBreaker) Execute(fn func() error) error {
if !cb.allowRequest() {
return cb.currentError()
}
err := fn()
cb.recordResult(err == nil)
return err
}
// ExecuteWithContext runs the given function with context
func (cb *CircuitBreaker) ExecuteWithContext(ctx context.Context, fn func(ctx context.Context) error) error {
if !cb.allowRequest() {
return cb.currentError()
}
err := fn(ctx)
cb.recordResult(err == nil)
return err
}
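// allowRequest reports whether a request may proceed. It takes the write
// lock because an expired reset timeout moves the breaker from OPEN to
// HALF-OPEN here, before the trial request runs. Without that transition
// a successful trial would be recorded while still OPEN and the breaker
// would never close again.
func (cb *CircuitBreaker) allowRequest() bool {
	cb.mu.Lock()
	defer cb.mu.Unlock()
	switch cb.state {
	case StateClosed:
		return true
	case StateOpen:
		if !cb.shouldAttemptReset() {
			return false
		}
		cb.setState(StateHalfOpen)
		cb.successCount = 0
		return true
	case StateHalfOpen:
		// Bounded by recorded successes, not by in-flight calls
		return cb.successCount < cb.halfOpenMaxCalls
	default:
		return false
	}
}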
// shouldAttemptReset checks if we should transition from OPEN to HALF-OPEN
func (cb *CircuitBreaker) shouldAttemptReset() bool {
return time.Since(cb.lastFailureTime) > cb.resetTimeout
}
// recordResult records the result of a request and updates state
func (cb *CircuitBreaker) recordResult(success bool) {
cb.mu.Lock()
defer cb.mu.Unlock()
if success {
cb.handleSuccess()
} else {
cb.handleFailure()
}
}
// handleSuccess processes a successful request
func (cb *CircuitBreaker) handleSuccess() {
switch cb.state {
case StateClosed:
// Reset failure count on success
cb.failureCount = 0
case StateHalfOpen:
cb.successCount++
if cb.successCount >= cb.halfOpenMaxCalls {
cb.setState(StateClosed)
cb.reset()
}
}
}
// handleFailure processes a failed request
func (cb *CircuitBreaker) handleFailure() {
cb.lastFailureTime = time.Now()
switch cb.state {
case StateClosed:
cb.failureCount++
if cb.failureCount >= cb.failureThreshold {
cb.setState(StateOpen)
}
case StateHalfOpen:
cb.setState(StateOpen)
}
}
// setState changes the circuit breaker state and calls the callback
func (cb *CircuitBreaker) setState(newState State) {
if cb.state == newState {
return
}
oldState := cb.state
cb.state = newState
if cb.onStateChange != nil {
go cb.onStateChange(oldState, newState)
}
}
// reset clears the failure and success counters (it does not change state)
func (cb *CircuitBreaker) reset() {
cb.failureCount = 0
cb.successCount = 0
}
// currentError returns the error that explains why a request was rejected.
// It only reads state and never changes it. A rejected request must never
// receive a nil error, so any state other than HALF-OPEN reports
// ErrCircuitBreakerOpen.
func (cb *CircuitBreaker) currentError() error {
	cb.mu.RLock()
	defer cb.mu.RUnlock()
	switch cb.state {
	case StateHalfOpen:
		return ErrTooManyRequests
	default:
		return ErrCircuitBreakerOpen
	}
}
// State returns the current state of the circuit breaker
func (cb *CircuitBreaker) State() State {
cb.mu.RLock()
defer cb.mu.RUnlock()
return cb.state
}
// Stats returns current statistics
func (cb *CircuitBreaker) Stats() Stats {
cb.mu.RLock()
defer cb.mu.RUnlock()
return Stats{
State: cb.state,
FailureCount: cb.failureCount,
SuccessCount: cb.successCount,
LastFailureTime: cb.lastFailureTime,
}
}
// Stats holds circuit breaker statistics
type Stats struct {
State State
FailureCount int
SuccessCount int
LastFailureTime time.Time
}
HTTP Client with Circuit Breaker
Now let's integrate our circuit breaker with an HTTP client:
package main
import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"
	"your-project/circuitbreaker" // replace with your module path
)
// HTTPClient wraps http.Client with circuit breaker
type HTTPClient struct {
client *http.Client
breaker *circuitbreaker.CircuitBreaker
baseURL string
}
// NewHTTPClient creates a new HTTP client with circuit breaker
func NewHTTPClient(baseURL string, config circuitbreaker.Config) *HTTPClient {
return &HTTPClient{
client: &http.Client{
Timeout: 10 * time.Second,
},
breaker: circuitbreaker.New(config),
baseURL: baseURL,
}
}
// Get performs a GET request with circuit breaker protection
func (hc *HTTPClient) Get(ctx context.Context, path string) (*http.Response, error) {
var resp *http.Response
err := hc.breaker.ExecuteWithContext(ctx, func(ctx context.Context) error {
req, err := http.NewRequestWithContext(ctx, "GET", hc.baseURL+path, nil)
if err != nil {
return err
}
resp, err = hc.client.Do(req)
if err != nil {
return err
}
// Consider 5xx errors as failures
if resp.StatusCode >= 500 {
resp.Body.Close()
return fmt.Errorf("server error: %d", resp.StatusCode)
}
return nil
})
return resp, err
}
// Post performs a POST request with circuit breaker protection
func (hc *HTTPClient) Post(ctx context.Context, path string, body io.Reader) (*http.Response, error) {
var resp *http.Response
err := hc.breaker.ExecuteWithContext(ctx, func(ctx context.Context) error {
req, err := http.NewRequestWithContext(ctx, "POST", hc.baseURL+path, body)
if err != nil {
return err
}
req.Header.Set("Content-Type", "application/json")
resp, err = hc.client.Do(req)
if err != nil {
return err
}
if resp.StatusCode >= 500 {
resp.Body.Close()
return fmt.Errorf("server error: %d", resp.StatusCode)
}
return nil
})
return resp, err
}
// Stats returns circuit breaker statistics
func (hc *HTTPClient) Stats() circuitbreaker.Stats {
return hc.breaker.Stats()
}
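The client above counts transport errors and 5xx responses as failures, while 4xx responses pass through as successes. One case it leaves open is the caller cancelling its own request, which says nothing about the health of the dependency. The helper below is a minimal sketch of making that decision explicit; `countsAsFailure`, `errRefused`, and `errCallerGone` are hypothetical names, not part of the client above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// Sample errors used only to exercise the sketch.
var (
	errRefused    = errors.New("connection refused")
	errCallerGone = fmt.Errorf("request aborted: %w", context.Canceled)
)

// countsAsFailure is a hypothetical helper. It reports whether an outcome
// should count against the breaker's failure threshold.
func countsAsFailure(statusCode int, err error) bool {
	if err != nil {
		// The caller giving up says nothing about the dependency's health.
		if errors.Is(err, context.Canceled) {
			return false
		}
		// Timeouts, refused connections, DNS errors and similar.
		return true
	}
	// 4xx means the service answered; only 5xx signals trouble on its side.
	return statusCode >= http.StatusInternalServerError
}

func main() {
	fmt.Println(countsAsFailure(http.StatusBadGateway, nil))
	fmt.Println(countsAsFailure(http.StatusNotFound, nil))
	fmt.Println(countsAsFailure(0, errRefused))
	fmt.Println(countsAsFailure(0, errCallerGone))
}
```

Whether deadline timeouts should count is a judgment call; this sketch counts them, on the assumption that a slow dependency is an unhealthy one.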
Real-World Example: Payment Service Client
Let's see how this works in practice with a payment service client:
package main
import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log"
	"time"
	"your-project/circuitbreaker" // replace with your module path
)
// PaymentRequest represents a payment request
type PaymentRequest struct {
Amount int64 `json:"amount"`
Currency string `json:"currency"`
UserID string `json:"user_id"`
}
// PaymentResponse represents a payment response
type PaymentResponse struct {
ID string `json:"id"`
Status string `json:"status"`
}
// PaymentClient handles payment service communication
type PaymentClient struct {
httpClient *HTTPClient
}
// NewPaymentClient creates a new payment client
func NewPaymentClient(baseURL string) *PaymentClient {
config := circuitbreaker.Config{
FailureThreshold: 5,
ResetTimeout: 30 * time.Second,
HalfOpenMaxCalls: 3,
OnStateChange: func(from, to circuitbreaker.State) {
log.Printf("Payment service circuit breaker: %s -> %s", from, to)
},
}
return &PaymentClient{
httpClient: NewHTTPClient(baseURL, config),
}
}
// ProcessPayment processes a payment request
func (pc *PaymentClient) ProcessPayment(ctx context.Context, req PaymentRequest) (*PaymentResponse, error) {
data, err := json.Marshal(req)
if err != nil {
return nil, fmt.Errorf("failed to marshal request: %w", err)
}
resp, err := pc.httpClient.Post(ctx, "/payments", bytes.NewReader(data))
if err != nil {
// Circuit breaker is handling the error
return nil, fmt.Errorf("payment service unavailable: %w", err)
}
defer resp.Body.Close()
var paymentResp PaymentResponse
if err := json.NewDecoder(resp.Body).Decode(&paymentResp); err != nil {
return nil, fmt.Errorf("failed to decode response: %w", err)
}
return &paymentResp, nil
}
// GetStats returns circuit breaker statistics
func (pc *PaymentClient) GetStats() circuitbreaker.Stats {
return pc.httpClient.Stats()
}
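// HealthCheck probes the payment service's /health endpoint for monitoring
func (pc *PaymentClient) HealthCheck(ctx context.Context) error {
	resp, err := pc.httpClient.Get(ctx, "/health")
	if err != nil {
		return err
	}
	// Close the body so the underlying connection can be reused
	return resp.Body.Close()
}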
Integration with Your Service
Here's how to integrate the payment client into your order service:
package main
import (
	"context"
	"encoding/json"
	"errors"
	"log"
	"net/http"
	"time"
	"your-project/circuitbreaker" // replace with your module path
)
// OrderService handles order processing
type OrderService struct {
	paymentClient *PaymentClient
}
// NewOrderService creates a new order service
func NewOrderService() *OrderService {
	return &OrderService{
		paymentClient: NewPaymentClient("https://payment-service.internal"),
	}
}
// ProcessOrder handles order processing with circuit breaker
func (os *OrderService) ProcessOrder(w http.ResponseWriter, r *http.Request) {
	var order struct {
		UserID   string `json:"user_id"`
		Amount   int64  `json:"amount"`
		Currency string `json:"currency"`
	}
	if err := json.NewDecoder(r.Body).Decode(&order); err != nil {
		http.Error(w, "Invalid request", http.StatusBadRequest)
		return
	}
	ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
	defer cancel()
	// Try to process payment
	payment, err := os.paymentClient.ProcessPayment(ctx, PaymentRequest{
		Amount:   order.Amount,
		Currency: order.Currency,
		UserID:   order.UserID,
	})
	if err != nil {
		// ProcessPayment wraps the breaker error with %w, so a plain ==
		// comparison would never match; errors.Is unwraps it.
		if errors.Is(err, circuitbreaker.ErrCircuitBreakerOpen) {
// Payment service is down, handle gracefully
log.Printf("Payment service circuit breaker open, queuing order for later processing")
// Option 1: Queue for later processing
if err := os.queueOrderForLaterProcessing(order); err != nil {
http.Error(w, "Service temporarily unavailable", http.StatusServiceUnavailable)
return
}
// Return success with pending status
json.NewEncoder(w).Encode(map[string]interface{}{
"status": "pending",
"message": "Order queued for processing",
})
return
}
// Other errors
log.Printf("Payment processing failed: %v", err)
http.Error(w, "Payment processing failed", http.StatusInternalServerError)
return
}
// Payment successful
json.NewEncoder(w).Encode(map[string]interface{}{
"status": "completed",
"payment_id": payment.ID,
})
}
// queueOrderForLaterProcessing queues order for background processing
func (os *OrderService) queueOrderForLaterProcessing(order interface{}) error {
// Implementation depends on your queue system (Redis, RabbitMQ, etc.)
log.Printf("Queuing order for later processing: %+v", order)
return nil
}
// HealthCheck returns service health including circuit breaker status
func (os *OrderService) HealthCheck(w http.ResponseWriter, r *http.Request) {
stats := os.paymentClient.GetStats()
health := map[string]interface{}{
"status": "healthy",
"payment_service": map[string]interface{}{
"circuit_breaker_state": stats.State.String(),
"failure_count": stats.FailureCount,
"last_failure": stats.LastFailureTime,
},
}
// If circuit breaker is open, mark as degraded
if stats.State == circuitbreaker.StateOpen {
health["status"] = "degraded"
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(health)
}
func main() {
orderService := NewOrderService()
http.HandleFunc("/orders", orderService.ProcessOrder)
http.HandleFunc("/health", orderService.HealthCheck)
log.Println("Order service starting on :8080")
log.Fatal(http.ListenAndServe(":8080", nil))
}
Monitoring and Observability
Metrics to Track
package metrics
import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"your-project/circuitbreaker" // replace with your module path
)
var (
circuitBreakerStateGauge = promauto.NewGaugeVec(
prometheus.GaugeOpts{
Name: "circuit_breaker_state",
Help: "Current state of circuit breaker (0=closed, 1=open, 2=half-open)",
},
[]string{"service"},
)
circuitBreakerTripsTotal = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "circuit_breaker_trips_total",
Help: "Total number of circuit breaker trips",
},
[]string{"service"},
)
circuitBreakerFailuresTotal = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "circuit_breaker_failures_total",
Help: "Total number of failures",
},
[]string{"service"},
)
)
// RecordCircuitBreakerState records the current state
func RecordCircuitBreakerState(service string, state circuitbreaker.State) {
circuitBreakerStateGauge.WithLabelValues(service).Set(float64(state))
}
// RecordCircuitBreakerTrip records a circuit breaker trip
func RecordCircuitBreakerTrip(service string) {
circuitBreakerTripsTotal.WithLabelValues(service).Inc()
}
// RecordCircuitBreakerFailure records a failure
func RecordCircuitBreakerFailure(service string) {
circuitBreakerFailuresTotal.WithLabelValues(service).Inc()
}
Enhanced Circuit Breaker with Metrics
// Create circuit breaker with metrics
config := circuitbreaker.Config{
FailureThreshold: 5,
ResetTimeout: 30 * time.Second,
HalfOpenMaxCalls: 3,
OnStateChange: func(from, to circuitbreaker.State) {
log.Printf("Circuit breaker state change: %s -> %s", from, to)
metrics.RecordCircuitBreakerState("payment-service", to)
if to == circuitbreaker.StateOpen {
metrics.RecordCircuitBreakerTrip("payment-service")
}
},
}
Testing Your Circuit Breaker
package circuitbreaker_test
import (
"errors"
"testing"
"time"
"your-project/circuitbreaker"
)
func TestCircuitBreakerBasicFlow(t *testing.T) {
cb := circuitbreaker.New(circuitbreaker.Config{
FailureThreshold: 2,
ResetTimeout: 100 * time.Millisecond,
HalfOpenMaxCalls: 1,
})
// Initially closed
if cb.State() != circuitbreaker.StateClosed {
t.Errorf("Expected CLOSED, got %v", cb.State())
}
// Simulate failures
for i := 0; i < 2; i++ {
err := cb.Execute(func() error {
return errors.New("service error")
})
if err == nil {
t.Error("Expected error, got nil")
}
}
// Should be open now
if cb.State() != circuitbreaker.StateOpen {
t.Errorf("Expected OPEN, got %v", cb.State())
}
// Should reject requests
err := cb.Execute(func() error {
return nil
})
if err != circuitbreaker.ErrCircuitBreakerOpen {
t.Errorf("Expected ErrCircuitBreakerOpen, got %v", err)
}
// Wait for reset timeout
time.Sleep(150 * time.Millisecond)
// Should allow request and transition to half-open
err = cb.Execute(func() error {
return nil
})
if err != nil {
t.Errorf("Expected no error, got %v", err)
}
// Should be closed again
if cb.State() != circuitbreaker.StateClosed {
t.Errorf("Expected CLOSED, got %v", cb.State())
}
}
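The test above waits out the reset timeout with time.Sleep, which slows the suite down and can be flaky on a loaded CI machine. One alternative is to let the timeout check read time through an injectable function. The breaker in this article calls time.Since directly, so the `timer` type and its `now` field below are assumptions; the sketch shows only the timeout check in isolation.

```go
package main

import (
	"fmt"
	"time"
)

// timer is a hypothetical, stripped-down stand-in for the breaker's timeout
// logic. The now field replaces direct calls to time.Now and time.Since.
type timer struct {
	now          func() time.Time
	lastFailure  time.Time
	resetTimeout time.Duration
}

// shouldAttemptReset mirrors the breaker's check, using the injected clock.
func (t *timer) shouldAttemptReset() bool {
	return t.now().Sub(t.lastFailure) > t.resetTimeout
}

// resetDueBeforeAndAfter evaluates the check right after a failure and again
// after moving a fake clock past the timeout, without sleeping.
func resetDueBeforeAndAfter() (before, after bool) {
	current := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	t := &timer{
		now:          func() time.Time { return current }, // reads the fake clock
		lastFailure:  current,
		resetTimeout: 100 * time.Millisecond,
	}
	before = t.shouldAttemptReset()
	current = current.Add(150 * time.Millisecond) // "wait" instantly
	after = t.shouldAttemptReset()
	return before, after
}

func main() {
	before, after := resetDueBeforeAndAfter()
	fmt.Println(before, after)
}
```

In production code the same field would default to time.Now, and only tests would replace it.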
Results: Before vs After
Before Circuit Breaker
- MTTR: 45 minutes (manual intervention required)
- Cascade failures: 15 downstream services affected
- Lost revenue: $2M during Black Friday
- Customer impact: Complete checkout failure
After Circuit Breaker
- MTTR: 2 minutes (automatic recovery)
- Cascade failures: 0 (isolated failure)
- Lost revenue: <$50k (graceful degradation)
- Customer impact: Orders queued, processed later
Key Takeaways
- Fail fast: Don't waste resources on doomed requests
- Graceful degradation: Queue orders when payment service is down
- Automatic recovery: Test service health automatically
- Monitor everything: Track circuit breaker states and failures
- Test thoroughly: Circuit breakers can hide bugs if not tested properly
Common Pitfalls
- Timeout too short: Don't trip on temporary network hiccups
- Threshold too low: Avoid false positives from occasional errors
- No fallback: Always have a plan B when the circuit is open
- Ignoring half-open: Test with limited traffic before fully closing the circuit
- No monitoring: You need visibility into circuit breaker behavior
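For the "no fallback" pitfall, a small wrapper can keep the plan B next to the call it protects. The sketch below is one possible shape, not part of the breaker package: `withFallback`, `rejected`, `declined`, and `queued` are hypothetical names, and `errOpen` stands in for circuitbreaker.ErrCircuitBreakerOpen so that the example compiles on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// errOpen stands in for circuitbreaker.ErrCircuitBreakerOpen in this sketch.
var errOpen = errors.New("circuit breaker is open")

// withFallback is a hypothetical helper. It runs primary and, only if the
// breaker rejected the call, runs fallback instead. Other errors pass through.
func withFallback(primary, fallback func() (string, error)) (string, error) {
	result, err := primary()
	// errors.Is also matches when the sentinel has been wrapped with %w.
	if errors.Is(err, errOpen) {
		return fallback()
	}
	return result, err
}

// rejected simulates a call refused by an open breaker, with the error
// wrapped the way a client layer typically wraps it.
func rejected() (string, error) {
	return "", fmt.Errorf("payment service unavailable: %w", errOpen)
}

// declined simulates a genuine business error that must not trigger plan B.
func declined() (string, error) {
	return "", errors.New("card declined")
}

// queued simulates the plan B: accept the order and process it later.
func queued() (string, error) {
	return "pending", nil
}

func main() {
	status, err := withFallback(rejected, queued)
	fmt.Println(status, err)

	status, err = withFallback(declined, queued)
	fmt.Println(status, err)
}
```

Restricting the fallback to breaker rejections matters: queuing an order whose card was declined would hide a real error from the customer.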
What's Next?
- Implement retry patterns for transient failures
- Add bulkhead isolation to prevent resource exhaustion
- Build adaptive timeouts based on service performance
- Create chaos engineering tests to validate resilience
The Bottom Line: Circuit breakers saved us millions in revenue and countless hours of downtime. They're not just a nice-to-have; they're essential for any production microservices architecture.
Have you implemented circuit breakers in your services? Share your experience in the comments below!
Cap
Senior Golang Backend & Web3 Developer with 10+ years of experience building scalable systems and blockchain solutions.