
Optimizing Go Applications with pprof and Benchmarking

Wang Yinneng
8 min read
Tags: golang, performance, profiling, optimization

Optimizing Go Applications: From 200ms to 2ms Response Times

📈 Performance Baseline: The Numbers Don't Lie

When our API started hitting 200ms response times under load, I knew we had a problem. Users were complaining, and our SLA was in jeopardy.

Here's the journey from slow to lightning-fast.

Initial Performance Metrics

$ wrk -t12 -c400 -d30s http://localhost:8080/api/users
Running 30s test @ http://localhost:8080/api/users
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   237.89ms   89.45ms   1.67s    71.23%
    Req/Sec   131.45     23.89   190.00     69.23%
  47,123 requests in 30.03s, 15.67MB read
Requests/sec: 1,569.12

Ouch. 237ms average latency with only 1,569 RPS. Let's fix this.

🔍 Step 1: CPU Profiling with pprof

First, let's see where our CPU time is going:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // Registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
    // Use a dedicated mux for the app: if we passed nil (DefaultServeMux)
    // to the :8080 server, the pprof handlers registered by the import
    // above would be publicly exposed too.
    mux := http.NewServeMux()
    mux.HandleFunc("/api/users", getUsersHandler)
    
    // pprof endpoints stay on a localhost-only port (don't expose in production!)
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    
    log.Fatal(http.ListenAndServe(":8080", mux))
}

Collecting CPU Profile

# Generate CPU profile under load
$ wrk -t4 -c100 -d30s http://localhost:8080/api/users &
$ go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

# Interactive analysis
(pprof) top
Showing nodes accounting for 12.48s, 89.34% of 13.97s total
Dropped 142 nodes (cum <= 0.07s)
      flat  flat%   sum%        cum   cum%
     4.12s 29.49% 29.49%      4.89s 35.01%  encoding/json.(*Encoder).Encode
     2.34s 16.75% 46.24%      2.87s 20.54%  database/sql.(*Rows).Scan
     1.89s 13.53% 59.77%      1.89s 13.53%  runtime.mallocgc
     1.45s 10.38% 70.15%      1.67s 11.96%  reflect.Value.Interface
     0.98s  7.01% 77.16%      1.23s  8.81%  strings.Split

Red flag #1: JSON encoding taking 29% of CPU time!
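
Before rewriting anything, it's worth drilling into the hot path; pprof's list command prints annotated source with per-line flat/cum times (its argument is a regex over function names):

(pprof) list getUsersHandler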

Memory Profiling

$ go tool pprof http://localhost:6060/debug/pprof/heap

(pprof) top
Showing nodes accounting for 145.67MB, 87.45% of 166.54MB total
      flat  flat%   sum%        cum   cum%
   67.89MB 40.76% 40.76%    89.23MB 53.58%  main.(*UserService).getUsers
   34.56MB 20.76% 61.52%    45.67MB 27.43%  encoding/json.Marshal
   23.45MB 14.08% 75.60%    23.45MB 14.08%  database/sql.(*Rows).Scan
   12.34MB  7.41% 83.01%    12.34MB  7.41%  reflect.New
    7.43MB  4.46% 87.47%     7.43MB  4.46%  strings.Split

Red flag #2: Massive memory allocations in user service!
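
One caveat: the heap endpoint defaults to showing live objects (inuse_space). To attribute total allocation volume, including short-lived garbage the GC has already collected, switch the sample index:

$ go tool pprof -sample_index=alloc_space http://localhost:6060/debug/pprof/heap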

🚀 Step 2: Benchmark-Driven Optimization

Let's create benchmarks to measure our improvements:

// user_service_test.go
package main

import (
    "testing"
)

// Original slow implementation
func BenchmarkGetUsersOriginal(b *testing.B) {
    service := NewUserService(testDB)
    
    b.ResetTimer()
    b.ReportAllocs()
    
    for i := 0; i < b.N; i++ {
        users, err := service.GetUsers(100)
        if err != nil {
            b.Fatal(err)
        }
        _ = users
    }
}

// Results:
// BenchmarkGetUsersOriginal-8   	    1000	   1234567 ns/op	  456789 B/op	    1234 allocs/op
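
Numbers like these come from the standard benchmark invocation; -benchmem adds the B/op and allocs/op columns (b.ReportAllocs does the same per benchmark), -count repeats runs for stability, and -run=^$ skips unit tests:

$ go test -bench=GetUsers -benchmem -count=5 -run=^$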

🔧 Step 3: Optimization Round 1 - Reduce Allocations

Problem: Excessive JSON Marshaling

Before (slow):

func (us *UserService) GetUsers(limit int) ([]User, error) {
    rows, err := us.db.Query("SELECT id, name, email FROM users LIMIT ?", limit)
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    
    var users []User
    for rows.Next() {
        var user User
        if err := rows.Scan(&user.ID, &user.Name, &user.Email); err != nil {
            return nil, err
        }
        users = append(users, user) // Reallocations!
    }
    
    return users, nil
}

func getUsersHandler(w http.ResponseWriter, r *http.Request) {
    users, err := userService.GetUsers(100)
    if err != nil {
        http.Error(w, err.Error(), 500)
        return
    }
    
    // JSON encoding every time!
    json.NewEncoder(w).Encode(users)
}

After (fast):

func (us *UserService) GetUsers(limit int) ([]User, error) {
    rows, err := us.db.Query("SELECT id, name, email FROM users LIMIT ?", limit)
    if err != nil {
        return nil, err
    }
    defer rows.Close()
    
    // Pre-allocate slice
    users := make([]User, 0, limit)
    
    for rows.Next() {
        var user User
        if err := rows.Scan(&user.ID, &user.Name, &user.Email); err != nil {
            return nil, err
        }
        users = append(users, user)
    }
    
    return users, nil
}

// Use sync.Pool to reuse JSON encode buffers (requires "bytes" and "sync")
var jsonBufferPool = sync.Pool{
    New: func() interface{} {
        return bytes.NewBuffer(make([]byte, 0, 1024))
    },
}

func getUsersHandler(w http.ResponseWriter, r *http.Request) {
    users, err := userService.GetUsers(100)
    if err != nil {
        http.Error(w, err.Error(), 500)
        return
    }
    
    // Use pooled buffer
    buf := jsonBufferPool.Get().(*bytes.Buffer)
    buf.Reset()
    defer jsonBufferPool.Put(buf)
    
    if err := json.NewEncoder(buf).Encode(users); err != nil {
        http.Error(w, err.Error(), 500)
        return
    }
    
    w.Header().Set("Content-Type", "application/json")
    w.Write(buf.Bytes())
}

Benchmark Results After Round 1:

// BenchmarkGetUsersOptimized1-8   	    3000	   456789 ns/op	  123456 B/op	     234 allocs/op

63% faster, with 73% less memory allocated and 81% fewer allocations per op!
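
A single run can be noisy. benchstat (from golang.org/x/perf) compares repeated runs and reports whether a delta is statistically significant:

$ go test -bench=GetUsers -benchmem -count=10 -run=^$ > old.txt
# ...apply the optimization...
$ go test -bench=GetUsers -benchmem -count=10 -run=^$ > new.txt
$ benchstat old.txt new.txt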

⚡ Step 4: Optimization Round 2 - Streaming JSON

Instead of marshaling everything in memory, let's stream directly to the response:

func getUsersHandler(w http.ResponseWriter, r *http.Request) {
    // Query before writing anything, so http.Error can still set the status
    rows, err := userService.db.Query("SELECT id, name, email FROM users LIMIT 100")
    if err != nil {
        http.Error(w, err.Error(), 500)
        return
    }
    defer rows.Close()
    
    w.Header().Set("Content-Type", "application/json")
    
    // Stream the JSON array directly to the response
    fmt.Fprint(w, "[")
    
    first := true
    encoder := json.NewEncoder(w)
    
    for rows.Next() {
        var user User
        if err := rows.Scan(&user.ID, &user.Name, &user.Email); err != nil {
            continue // skip unreadable rows; the array stays well-formed
        }
        
        if !first {
            fmt.Fprint(w, ",")
        }
        first = false
        
        // Encode appends a trailing newline, which is legal JSON whitespace
        encoder.Encode(user)
    }
    
    fmt.Fprint(w, "]")
}

Even Better: Custom JSON Marshaling

type User struct {
    ID    int    `json:"id"`
    Name  string `json:"name"`
    Email string `json:"email"`
}

// Custom fast JSON marshaling.
// CAUTION: this skips JSON string escaping, so it is only safe when Name
// and Email can never contain quotes, backslashes, or control characters.
func (u User) MarshalJSON() ([]byte, error) {
    // Pre-allocate buffer based on expected size
    buf := make([]byte, 0, 128)
    buf = append(buf, '{')
    buf = append(buf, `"id":`...)
    buf = strconv.AppendInt(buf, int64(u.ID), 10)
    buf = append(buf, `,"name":"`...)
    buf = append(buf, u.Name...)
    buf = append(buf, `","email":"`...)
    buf = append(buf, u.Email...)
    buf = append(buf, `"}`...)
    return buf, nil
}

Benchmark Results After Round 2:

// BenchmarkGetUsersOptimized2-8   	    8000	   123456 ns/op	   45678 B/op	      89 allocs/op

90% faster than original!

🎯 Step 5: Database Optimization

Connection Pooling

// Assumes a registered Postgres driver, e.g. _ "github.com/lib/pq"
func setupDB() *sql.DB {
    db, err := sql.Open("postgres", dsn)
    if err != nil {
        log.Fatal(err)
    }
    
    // Optimize connection pool
    db.SetMaxOpenConns(100)
    db.SetMaxIdleConns(50)
    db.SetConnMaxLifetime(time.Hour)
    db.SetConnMaxIdleTime(30 * time.Minute)
    
    return db
}

Query Optimization

-- Add index for common queries
CREATE INDEX CONCURRENTLY idx_users_created_at ON users(created_at DESC);

-- Optimized query with LIMIT
SELECT id, name, email 
FROM users 
ORDER BY created_at DESC 
LIMIT 100;
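
On top of the index, preparing the hot query once avoids re-parsing the SQL on every request. A minimal sketch, assuming the Postgres setup above (with MySQL you'd use ? placeholders instead of $1); in practice stmt would live on the service struct:

// Prepare once at startup, reuse across requests
stmt, err := db.Prepare(`
    SELECT id, name, email
    FROM users
    ORDER BY created_at DESC
    LIMIT $1`)
if err != nil {
    log.Fatal(err)
}

// Per request:
// rows, err := stmt.Query(100)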

📊 Final Performance Results

Load Test After All Optimizations:

$ wrk -t12 -c400 -d30s http://localhost:8080/api/users
Running 30s test @ http://localhost:8080/api/users
  12 threads and 400 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.34ms    1.23ms  45.67ms   89.34%
    Req/Sec    14.23k     2.34k   18.90k   71.23%
  5,127,890 requests in 30.02s, 1.67GB read
Requests/sec: 170,789.12

Performance Comparison:

Metric         Before       After        Improvement
Latency        237ms        2.3ms        99% faster
RPS            1,569        170,789      108x more
Memory         456KB/req    45KB/req     90% less
Allocations    1,234/req    89/req       93% fewer

🔧 Monitoring in Production

Custom Metrics

var (
    requestDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name: "http_request_duration_seconds",
            Help: "HTTP request duration in seconds",
            Buckets: prometheus.DefBuckets,
        },
        []string{"method", "endpoint", "status"},
    )
    
    memoryUsage = prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "go_memory_usage_bytes",
            Help: "Memory usage in bytes",
        },
        []string{"type"},
    )
)

func instrumentHandler(next http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        
        // Snapshot memory before the handler runs.
        // Note: runtime.ReadMemStats briefly stops the world, so sample it
        // rather than paying this cost on every request in a hot service.
        var m1 runtime.MemStats
        runtime.ReadMemStats(&m1)
        
        next(w, r)
        
        // Measure memory after
        var m2 runtime.MemStats
        runtime.ReadMemStats(&m2)
        
        duration := time.Since(start)
        
        // Status is hardcoded here; wrap the ResponseWriter if you need
        // real per-status latency
        requestDuration.WithLabelValues(r.Method, r.URL.Path, "200").Observe(duration.Seconds())
        
        // Alloc can shrink if a GC ran mid-request; guard the unsigned
        // subtraction so a wrap-around never records a huge bogus value
        if m2.Alloc >= m1.Alloc {
            memoryUsage.WithLabelValues("alloc").Set(float64(m2.Alloc - m1.Alloc))
        }
    }
}
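
These collectors are inert until they're registered and scraped. A minimal wiring sketch with the standard client_golang packages (the /metrics path is the usual convention):

import (
    "net/http"
    
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func init() {
    // Register the collectors defined above so their values are exported
    prometheus.MustRegister(requestDuration, memoryUsage)
}

// In main:
// http.Handle("/metrics", promhttp.Handler())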

Continuous Profiling

// Enable periodic profile snapshots in production.
// pprof here is the standard library's runtime/pprof package.
import (
    "bytes"
    "os"
    "runtime/pprof"
    "time"
)

func main() {
    if os.Getenv("ENV") == "production" {
        go func() {
            // Snapshot the goroutine profile every 60 seconds
            ticker := time.NewTicker(60 * time.Second)
            for range ticker.C {
                profile := pprof.Lookup("goroutine")
                var buf bytes.Buffer
                profile.WriteTo(&buf, 0)
                
                // Send to monitoring service
                sendProfileToMonitoring(buf.Bytes())
            }
        }()
    }
    
    // ... rest of your app
}

🎯 Key Optimization Principles

1. Measure First, Optimize Second

# Always profile before optimizing
go test -bench=. -benchmem -cpuprofile=cpu.prof -memprofile=mem.prof
go tool pprof cpu.prof

2. Focus on the Biggest Wins

  • Memory allocations: Often the biggest bottleneck
  • Database queries: N+1 problems kill performance
  • JSON encoding: Consider alternatives like protobuf
  • String operations: Use bytes.Buffer or strings.Builder (sketch below)
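
On that last bullet: strings.Builder grows one backing buffer instead of copying the whole string on every +=. A minimal sketch (joinNames is a hypothetical helper, not from the service above):

func joinNames(users []User) string {
    var sb strings.Builder
    sb.Grow(len(users) * 16) // optional: pre-size if the total is estimable
    for i, u := range users {
        if i > 0 {
            sb.WriteByte(',')
        }
        sb.WriteString(u.Name)
    }
    return sb.String()
}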

3. Use Sync.Pool for Frequent Allocations

var userPool = sync.Pool{
    New: func() interface{} {
        return &User{}
    },
}

func processUser() {
    user := userPool.Get().(*User)
    defer func() {
        *user = User{} // zero it so stale fields never leak into the next use
        userPool.Put(user)
    }()
    
    // Use user...
}

4. Benchmark Everything

func BenchmarkUserCreation(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        user := NewUser("test", "test@example.com")
        _ = user
    }
}

⚠️ Common Pitfalls

  1. Premature optimization - Profile first!
  2. Micro-optimizations - Focus on algorithmic improvements
  3. Ignoring garbage collection - Monitor GC pressure (see the gctrace example after this list)
  4. Not testing under load - Production load != development load
  5. Optimizing the wrong thing - 80/20 rule applies
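
On pitfall 3: the runtime can report GC pressure directly. With gctrace enabled it prints a one-line summary of every collection (pause times, heap size before and after) to stderr:

$ GODEBUG=gctrace=1 ./server 2>gc.log
$ tail -f gc.log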

🚀 Production Deployment

Gradual Rollout Strategy

// Feature flag for new optimized endpoint
func getUsersHandler(w http.ResponseWriter, r *http.Request) {
    if featureFlag.IsEnabled("optimized_users_api", r) {
        getUsersOptimized(w, r)
    } else {
        getUsersOriginal(w, r)
    }
}

A/B Testing Results

  • Optimized version: 99.9% success rate, 2ms avg latency
  • Original version: 95.2% success rate, 237ms avg latency
  • User satisfaction: +23% increase in API satisfaction scores

The bottom line: Performance optimization is about measuring, understanding, and systematically improving. These changes took our API from embarrassingly slow to blazingly fast, and they can do the same for yours.

What's your biggest Go performance bottleneck? Share it in the comments and I'll help you optimize it!


Wang Yinneng

Senior Golang Backend & Web3 Developer with 10+ years of experience building scalable systems and blockchain solutions.

View Full Profile →