Microservices Communication Patterns: From Chaos to Symphony
Microservices Communication Patterns: From Chaos to Symphony
How we evolved from a tangled web of service calls to a well-orchestrated architecture
π¬ The Story: From Monolith to Microservices Hell
January 2023 - The Breaking Point
Our e-commerce platform had grown into a 500,000-line monolith. Every deployment was a risk, every feature took months, and our team of 30 developers was stepping on each other's toes.
The Decision: Break it into microservices.
March 2023 - The Chaos
βββββββββββββββ βββββββββββββββ βββββββββββββββ
β User βββββΆβ Order βββββΆβ Inventory β
β Service β β Service β β Service β
βββββββββββββββ βββββββββββββββ βββββββββββββββ
β β β
β βΌ βΌ
β βββββββββββββββ βββββββββββββββ
β β Payment β β Shipping β
β β Service β β Service β
β βββββββββββββββ βββββββββββββββ
β β β
βΌ βΌ βΌ
βββββββββββββββ βββββββββββββββ βββββββββββββββ
β Notificationβ β Audit β β Analytics β
β Service β β Service β β Service β
βββββββββββββββ βββββββββββββββ βββββββββββββββ
The Problems We Hit:
- π₯ Cascade failures bringing down the entire system
- π Network latency adding 500ms+ to every request
- π Inconsistent data states across services
- π Circular dependencies creating deployment nightmares
- π No visibility into cross-service transactions
October 2023 - The Transformation
Through trial, error, and many sleepless nights, we discovered the communication patterns that actually work. Here's what we learned.
π Pattern 1: Synchronous Communication
When to Use: Real-time data requirements, strong consistency needs
REST APIs with Circuit Breakers
// pkg/httpclient/client.go
package httpclient
import (
"context"
"encoding/json"
"fmt"
"net/http"
"time"
"github.com/sony/gobreaker"
)
type Client struct {
client *http.Client
breaker *gobreaker.CircuitBreaker
baseURL string
timeout time.Duration
}
func NewClient(baseURL string, timeout time.Duration) *Client {
// Circuit breaker configuration
settings := gobreaker.Settings{
Name: "http-client",
MaxRequests: 3,
Interval: time.Minute,
Timeout: 10 * time.Second,
ReadyToTrip: func(counts gobreaker.Counts) bool {
failureRatio := float64(counts.TotalFailures) / float64(counts.Requests)
return counts.Requests >= 3 && failureRatio >= 0.6
},
OnStateChange: func(name string, from gobreaker.State, to gobreaker.State) {
log.Printf("Circuit breaker '%s' changed from '%s' to '%s'", name, from, to)
},
}
return &Client{
client: &http.Client{
Timeout: timeout,
Transport: &http.Transport{
MaxIdleConns: 100,
MaxIdleConnsPerHost: 10,
IdleConnTimeout: 90 * time.Second,
},
},
breaker: gobreaker.NewCircuitBreaker(settings),
baseURL: baseURL,
timeout: timeout,
}
}
// GetInventory calls inventory service with circuit breaker
func (c *Client) GetInventory(ctx context.Context, productID string) (*InventoryResponse, error) {
result, err := c.breaker.Execute(func() (interface{}, error) {
ctx, cancel := context.WithTimeout(ctx, c.timeout)
defer cancel()
url := fmt.Sprintf("%s/inventory/%s", c.baseURL, productID)
req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
if err != nil {
return nil, err
}
// Add correlation ID for tracing
if correlationID := ctx.Value("correlation_id"); correlationID != nil {
req.Header.Set("X-Correlation-ID", correlationID.(string))
}
resp, err := c.client.Do(req)
if err != nil {
return nil, err
}
defer resp.Body.Close()
if resp.StatusCode >= 500 {
return nil, fmt.Errorf("server error: %d", resp.StatusCode)
}
var inventory InventoryResponse
if err := json.NewDecoder(resp.Body).Decode(&inventory); err != nil {
return nil, err
}
return &inventory, nil
})
if err != nil {
return nil, err
}
return result.(*InventoryResponse), nil
}
// Order service using the client
func (s *OrderService) CreateOrder(ctx context.Context, req CreateOrderRequest) (*Order, error) {
// Check inventory with circuit breaker protection
inventory, err := s.inventoryClient.GetInventory(ctx, req.ProductID)
if err != nil {
// Handle gracefully - maybe use cached data or return error
return nil, fmt.Errorf("inventory check failed: %w", err)
}
if inventory.Available < req.Quantity {
return nil, ErrInsufficientInventory
}
// Create order
order := &Order{
ID: generateOrderID(),
ProductID: req.ProductID,
Quantity: req.Quantity,
Status: "pending",
CreatedAt: time.Now(),
}
// Save to database
if err := s.repo.CreateOrder(ctx, order); err != nil {
return nil, err
}
return order, nil
}
gRPC with Load Balancing
// grpc/client.go
package grpc
import (
"context"
"time"
"google.golang.org/grpc"
"google.golang.org/grpc/balancer/roundrobin"
"google.golang.org/grpc/keepalive"
)
func NewInventoryClient(addresses []string) (InventoryServiceClient, error) {
// Create connection with load balancing
conn, err := grpc.Dial(
fmt.Sprintf("dns:///%s", strings.Join(addresses, ",")),
grpc.WithInsecure(),
grpc.WithDefaultServiceConfig(`{
"loadBalancingPolicy":"round_robin",
"healthCheckConfig": {
"serviceName": "inventory"
}
}`),
grpc.WithKeepaliveParams(keepalive.ClientParameters{
Time: 10 * time.Second,
Timeout: time.Second,
PermitWithoutStream: true,
}),
)
if err != nil {
return nil, err
}
return NewInventoryServiceClient(conn), nil
}
// Usage in order service
func (s *OrderService) ReserveInventory(ctx context.Context, productID string, quantity int) error {
ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
req := &ReserveInventoryRequest{
ProductId: productID,
Quantity: int32(quantity),
}
resp, err := s.inventoryClient.ReserveInventory(ctx, req)
if err != nil {
return fmt.Errorf("failed to reserve inventory: %w", err)
}
if !resp.Success {
return fmt.Errorf("inventory reservation failed: %s", resp.ErrorMessage)
}
return nil
}
Results: 95% reduction in timeout errors, 200ms average response time
π¨ Pattern 2: Asynchronous Messaging
When to Use: Event-driven updates, eventual consistency, high throughput
Event-Driven Architecture with NATS
// pkg/events/publisher.go
package events
import (
"context"
"encoding/json"
"time"
"github.com/nats-io/nats.go"
)
type EventPublisher struct {
conn *nats.Conn
}
func NewEventPublisher(natsURL string) (*EventPublisher, error) {
conn, err := nats.Connect(natsURL,
nats.ReconnectWait(time.Second),
nats.MaxReconnects(-1),
)
if err != nil {
return nil, err
}
return &EventPublisher{conn: conn}, nil
}
// Event types
type OrderCreatedEvent struct {
OrderID string `json:"order_id"`
UserID string `json:"user_id"`
ProductID string `json:"product_id"`
Quantity int `json:"quantity"`
Amount float64 `json:"amount"`
Timestamp time.Time `json:"timestamp"`
}
type InventoryReservedEvent struct {
OrderID string `json:"order_id"`
ProductID string `json:"product_id"`
Quantity int `json:"quantity"`
ReservedAt time.Time `json:"reserved_at"`
}
func (p *EventPublisher) PublishOrderCreated(ctx context.Context, event OrderCreatedEvent) error {
data, err := json.Marshal(event)
if err != nil {
return err
}
return p.conn.Publish("order.created", data)
}
// Order service publishing events
func (s *OrderService) CreateOrder(ctx context.Context, req CreateOrderRequest) (*Order, error) {
// Create order in database first
order := &Order{
ID: generateOrderID(),
UserID: req.UserID,
ProductID: req.ProductID,
Quantity: req.Quantity,
Amount: req.Amount,
Status: "pending",
CreatedAt: time.Now(),
}
if err := s.repo.CreateOrder(ctx, order); err != nil {
return nil, err
}
// Publish event for other services
event := OrderCreatedEvent{
OrderID: order.ID,
UserID: order.UserID,
ProductID: order.ProductID,
Quantity: order.Quantity,
Amount: order.Amount,
Timestamp: order.CreatedAt,
}
if err := s.eventPublisher.PublishOrderCreated(ctx, event); err != nil {
// Log error but don't fail the operation
log.Printf("Failed to publish order created event: %v", err)
}
return order, nil
}
Event Subscriber with Retry Logic
// pkg/events/subscriber.go
package events
import (
"context"
"encoding/json"
"time"
"github.com/nats-io/nats.go"
)
type EventSubscriber struct {
conn *nats.Conn
}
func NewEventSubscriber(natsURL string) (*EventSubscriber, error) {
conn, err := nats.Connect(natsURL)
if err != nil {
return nil, err
}
return &EventSubscriber{conn: conn}, nil
}
// Inventory service subscribing to order events
func (s *InventoryService) SubscribeToOrderEvents() error {
_, err := s.eventSubscriber.conn.Subscribe("order.created", func(msg *nats.Msg) {
var event OrderCreatedEvent
if err := json.Unmarshal(msg.Data, &event); err != nil {
log.Printf("Failed to unmarshal order created event: %v", err)
return
}
// Process event with retry logic
if err := s.processOrderCreated(context.Background(), event); err != nil {
log.Printf("Failed to process order created event: %v", err)
// Implement retry with exponential backoff
go s.retryOrderCreated(event, 1)
}
})
return err
}
func (s *InventoryService) processOrderCreated(ctx context.Context, event OrderCreatedEvent) error {
// Reserve inventory
if err := s.reserveInventory(ctx, event.ProductID, event.Quantity); err != nil {
return err
}
// Publish inventory reserved event
reservedEvent := InventoryReservedEvent{
OrderID: event.OrderID,
ProductID: event.ProductID,
Quantity: event.Quantity,
ReservedAt: time.Now(),
}
return s.eventPublisher.PublishInventoryReserved(ctx, reservedEvent)
}
func (s *InventoryService) retryOrderCreated(event OrderCreatedEvent, attempt int) {
maxAttempts := 5
if attempt > maxAttempts {
log.Printf("Max retry attempts reached for order %s", event.OrderID)
// Send to dead letter queue or alert
return
}
// Exponential backoff
delay := time.Duration(attempt*attempt) * time.Second
time.Sleep(delay)
if err := s.processOrderCreated(context.Background(), event); err != nil {
log.Printf("Retry %d failed for order %s: %v", attempt, event.OrderID, err)
go s.retryOrderCreated(event, attempt+1)
}
}
Results: 300% increase in throughput, 99.9% event delivery success
π Pattern 3: Saga Pattern for Distributed Transactions
When to Use: Multi-service transactions, complex business workflows
Choreography Saga
// pkg/saga/choreography.go
package saga
import (
"context"
"encoding/json"
"time"
)
// Saga events
type OrderSagaStarted struct {
SagaID string `json:"saga_id"`
OrderID string `json:"order_id"`
UserID string `json:"user_id"`
ProductID string `json:"product_id"`
Quantity int `json:"quantity"`
Amount float64 `json:"amount"`
StartedAt time.Time `json:"started_at"`
}
type InventoryReserved struct {
SagaID string `json:"saga_id"`
OrderID string `json:"order_id"`
ProductID string `json:"product_id"`
Quantity int `json:"quantity"`
ReservedAt time.Time `json:"reserved_at"`
}
type PaymentProcessed struct {
SagaID string `json:"saga_id"`
OrderID string `json:"order_id"`
Amount float64 `json:"amount"`
ProcessedAt time.Time `json:"processed_at"`
}
type ShippingScheduled struct {
SagaID string `json:"saga_id"`
OrderID string `json:"order_id"`
ScheduledAt time.Time `json:"scheduled_at"`
}
// Order service initiates saga
func (s *OrderService) CreateOrderWithSaga(ctx context.Context, req CreateOrderRequest) (*Order, error) {
sagaID := generateSagaID()
// Create order in pending state
order := &Order{
ID: generateOrderID(),
SagaID: sagaID,
UserID: req.UserID,
ProductID: req.ProductID,
Quantity: req.Quantity,
Amount: req.Amount,
Status: "saga_started",
CreatedAt: time.Now(),
}
if err := s.repo.CreateOrder(ctx, order); err != nil {
return nil, err
}
// Start saga
event := OrderSagaStarted{
SagaID: sagaID,
OrderID: order.ID,
UserID: order.UserID,
ProductID: order.ProductID,
Quantity: order.Quantity,
Amount: order.Amount,
StartedAt: order.CreatedAt,
}
if err := s.eventPublisher.PublishOrderSagaStarted(ctx, event); err != nil {
// Rollback order creation
s.repo.DeleteOrder(ctx, order.ID)
return nil, err
}
return order, nil
}
// Inventory service participates in saga
func (s *InventoryService) handleOrderSagaStarted(ctx context.Context, event OrderSagaStarted) error {
// Try to reserve inventory
if err := s.reserveInventory(ctx, event.ProductID, event.Quantity); err != nil {
// Publish failure event
failureEvent := InventoryReservationFailed{
SagaID: event.SagaID,
OrderID: event.OrderID,
ProductID: event.ProductID,
Reason: err.Error(),
FailedAt: time.Now(),
}
return s.eventPublisher.PublishInventoryReservationFailed(ctx, failureEvent)
}
// Publish success event
successEvent := InventoryReserved{
SagaID: event.SagaID,
OrderID: event.OrderID,
ProductID: event.ProductID,
Quantity: event.Quantity,
ReservedAt: time.Now(),
}
return s.eventPublisher.PublishInventoryReserved(ctx, successEvent)
}
// Payment service continues saga
func (s *PaymentService) handleInventoryReserved(ctx context.Context, event InventoryReserved) error {
// Process payment
if err := s.processPayment(ctx, event.OrderID, event.Amount); err != nil {
// Publish failure - this will trigger compensation
failureEvent := PaymentFailed{
SagaID: event.SagaID,
OrderID: event.OrderID,
Amount: event.Amount,
Reason: err.Error(),
FailedAt: time.Now(),
}
return s.eventPublisher.PublishPaymentFailed(ctx, failureEvent)
}
// Publish success
successEvent := PaymentProcessed{
SagaID: event.SagaID,
OrderID: event.OrderID,
Amount: event.Amount,
ProcessedAt: time.Now(),
}
return s.eventPublisher.PublishPaymentProcessed(ctx, successEvent)
}
// Compensation handlers
func (s *InventoryService) handlePaymentFailed(ctx context.Context, event PaymentFailed) error {
// Compensate by releasing reserved inventory
if err := s.releaseReservation(ctx, event.OrderID); err != nil {
log.Printf("Failed to compensate inventory for saga %s: %v", event.SagaID, err)
// This might need manual intervention
}
return nil
}
func (s *OrderService) handleSagaCompleted(ctx context.Context, event SagaCompleted) error {
// Update order status to confirmed
return s.repo.UpdateOrderStatus(ctx, event.OrderID, "confirmed")
}
func (s *OrderService) handleSagaFailed(ctx context.Context, event SagaFailed) error {
// Update order status to failed
return s.repo.UpdateOrderStatus(ctx, event.OrderID, "failed")
}
Results: 99.5% transaction consistency, 15% reduction in failed orders
π Pattern 4: API Gateway for Service Orchestration
When to Use: Client-facing APIs, cross-cutting concerns, request routing
Kong Gateway Configuration
# kong-gateway.yaml
_format_version: "2.1"
services:
- name: order-service
url: http://order-service:8080
routes:
- name: orders
paths:
- /api/orders
methods:
- GET
- POST
plugins:
- name: rate-limiting
config:
minute: 100
hour: 1000
- name: prometheus
- name: correlation-id
- name: inventory-service
url: http://inventory-service:8080
routes:
- name: inventory
paths:
- /api/inventory
methods:
- GET
plugins:
- name: rate-limiting
config:
minute: 200
- name: response-transformer
config:
add:
headers:
- "X-Service:inventory"
- name: user-service
url: http://user-service:8080
routes:
- name: users
paths:
- /api/users
methods:
- GET
- POST
- PUT
plugins:
- name: jwt
- name: acl
config:
whitelist:
- user-group
- admin-group
plugins:
- name: cors
config:
origins:
- http://localhost:3000
- https://app.example.com
methods:
- GET
- POST
- PUT
- DELETE
headers:
- Accept
- Content-Type
- Authorization
exposed_headers:
- X-Request-ID
credentials: true
max_age: 3600
Custom API Gateway in Go
// pkg/gateway/gateway.go
package gateway
import (
"context"
"encoding/json"
"fmt"
"net/http"
"net/http/httputil"
"net/url"
"time"
"github.com/gorilla/mux"
"github.com/prometheus/client_golang/prometheus"
)
type Gateway struct {
services map[string]*Service
router *mux.Router
metrics *Metrics
}
type Service struct {
Name string
URL *url.URL
Proxy *httputil.ReverseProxy
Circuit *gobreaker.CircuitBreaker
}
type Metrics struct {
requestsTotal prometheus.CounterVec
requestDuration prometheus.HistogramVec
circuitBreakerState prometheus.GaugeVec
}
func NewGateway() *Gateway {
metrics := &Metrics{
requestsTotal: prometheus.NewCounterVec(
prometheus.CounterOpts{
Name: "gateway_requests_total",
Help: "Total number of requests to services",
},
[]string{"service", "method", "status"},
),
requestDuration: prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Name: "gateway_request_duration_seconds",
Help: "Request duration in seconds",
},
[]string{"service", "method"},
),
}
// Register metrics
prometheus.MustRegister(metrics.requestsTotal)
prometheus.MustRegister(metrics.requestDuration)
gateway := &Gateway{
services: make(map[string]*Service),
router: mux.NewRouter(),
metrics: metrics,
}
// Setup routes
gateway.setupRoutes()
return gateway
}
func (g *Gateway) AddService(name, urlStr string) error {
serviceURL, err := url.Parse(urlStr)
if err != nil {
return err
}
proxy := httputil.NewSingleHostReverseProxy(serviceURL)
// Customize proxy behavior
proxy.ModifyResponse = func(resp *http.Response) error {
// Add service identifier
resp.Header.Set("X-Service", name)
return nil
}
proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, err error) {
log.Printf("Service %s error: %v", name, err)
http.Error(w, "Service Unavailable", http.StatusServiceUnavailable)
}
// Circuit breaker for service
circuit := gobreaker.NewCircuitBreaker(gobreaker.Settings{
Name: name,
MaxRequests: 3,
Timeout: 10 * time.Second,
ReadyToTrip: func(counts gobreaker.Counts) bool {
return counts.ConsecutiveFailures > 2
},
})
g.services[name] = &Service{
Name: name,
URL: serviceURL,
Proxy: proxy,
Circuit: circuit,
}
return nil
}
func (g *Gateway) setupRoutes() {
// Health check
g.router.HandleFunc("/health", g.healthHandler).Methods("GET")
// Service routes
g.router.PathPrefix("/api/orders").Handler(g.serviceHandler("order-service")).Methods("GET", "POST", "PUT")
g.router.PathPrefix("/api/inventory").Handler(g.serviceHandler("inventory-service")).Methods("GET")
g.router.PathPrefix("/api/users").Handler(g.serviceHandler("user-service")).Methods("GET", "POST", "PUT")
// Apply middleware
g.router.Use(g.loggingMiddleware)
g.router.Use(g.corsMiddleware)
g.router.Use(g.rateLimitMiddleware)
}
func (g *Gateway) serviceHandler(serviceName string) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
start := time.Now()
service, exists := g.services[serviceName]
if !exists {
http.Error(w, "Service not found", http.StatusNotFound)
return
}
// Add correlation ID
if r.Header.Get("X-Correlation-ID") == "" {
r.Header.Set("X-Correlation-ID", generateCorrelationID())
}
// Circuit breaker protection
_, err := service.Circuit.Execute(func() (interface{}, error) {
service.Proxy.ServeHTTP(w, r)
return nil, nil
})
if err != nil {
http.Error(w, "Service circuit breaker open", http.StatusServiceUnavailable)
g.metrics.requestsTotal.WithLabelValues(serviceName, r.Method, "503").Inc()
} else {
g.metrics.requestsTotal.WithLabelValues(serviceName, r.Method, "200").Inc()
}
duration := time.Since(start).Seconds()
g.metrics.requestDuration.WithLabelValues(serviceName, r.Method).Observe(duration)
})
}
func (g *Gateway) rateLimitMiddleware(next http.Handler) http.Handler {
limiter := rate.NewLimiter(100, 10) // 100 requests per second, burst of 10
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if !limiter.Allow() {
http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
return
}
next.ServeHTTP(w, r)
})
}
Results: 40% reduction in client complexity, centralized monitoring
π Communication Pattern Decision Matrix
Pattern | Consistency | Performance | Complexity | Fault Tolerance |
---|---|---|---|---|
Sync REST | Strong | Medium | Low | Medium |
Sync gRPC | Strong | High | Medium | Medium |
Async Events | Eventual | High | Medium | High |
Saga | Eventual | Medium | High | High |
API Gateway | Varies | Medium | Low | Medium |
π― Our Current Architecture
βββββββββββββββ
β API Gateway β
β (Kong) β
βββββββ¬ββββββββ
β
βββββββββββββββββΌββββββββββββββββ
β β β
βββββββΌβββββ βββββββΌβββββ βββββββΌβββββ
β User β β Order β βInventory β
β Service β β Service β β Service β
βββββββ¬βββββ βββββββ¬βββββ βββββββ¬βββββ
β β β
βββββββββββββββββΌββββββββββββββββ
β
βββββββΌβββββ
β NATS β
β (Events) β
βββββββ¬βββββ
β
βββββββββββββββββΌββββββββββββββββ
β β β
βββββββΌβββββ βββββββΌβββββ βββββββΌβββββ
β Payment β β Shipping β βAnalytics β
β Service β β Service β β Service β
ββββββββββββ ββββββββββββ ββββββββββββ
Communication Flow
- Client β API Gateway: All external requests
- Gateway β Services: Load-balanced HTTP/gRPC calls
- Service β Service: Direct calls for immediate needs
- Service β Events: Asynchronous updates via NATS
- Saga Coordination: Event-driven for complex transactions
π Metrics That Matter
Before Optimization
- Average Response Time: 850ms
- 99th Percentile: 3.2s
- Service Availability: 95.5%
- Failed Transactions: 8.2%
- Development Velocity: 2 deployments/week
After Implementation
- Average Response Time: 180ms (-78%)
- 99th Percentile: 450ms (-86%)
- Service Availability: 99.8% (+4.3%)
- Failed Transactions: 0.8% (-90%)
- Development Velocity: 15 deployments/week (+650%)
π Lessons Learned
β What Worked
- Start Simple: Begin with sync calls, add complexity as needed
- Circuit Breakers: Essential for preventing cascade failures
- Event-Driven: Perfect for notifications and analytics
- Saga Pattern: Reliable for complex transactions
- API Gateway: Simplifies client integration
β What Didn't Work
- Too Many Events: Event soup is worse than spaghetti code
- No Timeouts: Services hung forever waiting for responses
- Shared Databases: Killed service independence
- No Correlation IDs: Debugging was impossible
- Ignoring Network Partitions: Reality always finds a way
π― Our Rules Now
- Sync for queries, async for commands
- Always use timeouts and circuit breakers
- One database per service (mostly)
- Events carry data, not just notifications
- Design for failure from day one
The Bottom Line: Microservices communication isn't about choosing one patternβit's about choosing the right pattern for each interaction. Start simple, add complexity as your system grows, and always design for the network failures that will inevitably happen.
What communication challenges are you facing with your microservices? Share your experiences in the comments!
Cap
Senior Golang Backend & Web3 Developer with 10+ years of experience building scalable systems and blockchain solutions.
View Full Profile β