Better Error Handling Patterns for Go SDKs

When building SDKs in Go, one common challenge is balancing transparency and debuggability with clean error handling. This becomes particularly evident when implementing retry mechanisms, where we need to track multiple attempts, delays, and error states.

The Common Approach: Logging

Many SDKs resort to internal logging for debugging:

func Retry(scope string, f func() error, s Strategy) error {
    correlationID := crypto.Id()
    log.TraceLn(correlationID, "Starting retry loop")
    var err error
    // ... retry logic with logging
    return err
}

While this provides visibility, it has several drawbacks:

Forces a logging implementation on SDK users
Mixes concerns between error handling and logging
Makes testing more complicated
Can pollute user logs with SDK internals

Better Patterns for SDK Error Handling

1. Rich Error Types

Instead of logging, return structured error information:

type RetryError struct {
    OriginalError error
    Attempts      int
    TotalTime     time.Duration
    Scope         string
    CorrelationID string
}

func (e *RetryError) Error() string {
    return fmt.Sprintf(
        "failed after %d attempts over %v in scope '%s' (correlation_id: %s): %v",
        e.Attempts,
        e.TotalTime,
        e.Scope,
        e.CorrelationID,
        e.OriginalError,
    )
}

Benefits:

Users can extract detailed information when needed
Follows Go’s convention of returning errors as values
Makes testing straightforward
Keeps separation of concerns

2. Event Callbacks

Allow users to plug in their own handling:

type RetryCallback func(attempt int, delay time.Duration, err error)

func Retry(f func() error, s Strategy, onRetry RetryCallback) error {
    var err error
    // ... retry loop: after each failed attempt, call
    // onRetry(attempt, delay, err) if onRetry is non-nil
    return err
}

This approach:

Gives users complete control over instrumentation
Enables custom metrics collection
Supports different logging implementations
Keeps the SDK itself free of logging and metrics dependencies

3. Optional Debug Mode

Configure debug behavior through options:

type Config struct {
    Logger Logger
    Debug  bool
}

func Retry(f func() error, s Strategy, cfg *Config) error {
    if cfg != nil && cfg.Debug && cfg.Logger != nil {
        // Log debug information
    }
    // ... retry logic
    return nil
}

This pattern:

Makes debugging optional
Allows users to choose their logging implementation
Keeps debug information out of production logs unless it is explicitly enabled

Best Practices

Return, Don’t Log: SDKs should return rich error information rather than logging internally.
Structured Errors: Use custom error types with fields that capture important debugging details.
Error Wrapping: Preserve error context through the chain by wrapping with %w, which keeps the original error reachable through errors.Is and errors.As:
```
fmt.Errorf("retry failed: %w", originalError)
```
Flexible Instrumentation: Provide hooks for users to implement their own logging/metrics.
Clear Documentation: Document error types and their fields thoroughly.

Implementation Example

Here’s a fuller implementation sketch combining these patterns, with the backoff logic between attempts elided:

type RetryInfo struct {
    CorrelationID string
    Events        []RetryEvent
    StartTime     time.Time
}

type RetryEvent struct {
    Attempt   int
    Delay     time.Duration
    Error     error
    TimeStamp time.Time
}

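func Retry(f func() error, s Strategy, onRetry RetryCallback) error {
    info := &RetryInfo{
        CorrelationID: crypto.Id(),
        StartTime:     time.Now(),
    }

    var lastErr error
    for i := 0; i <= int(s.MaxRetries); i++ {
        err := f()
        if err == nil {
            return nil
        }
        lastErr = err

        // Delay stays at its zero value here; set it once the wait
        // before the next attempt has been computed from the Strategy.
        event := RetryEvent{
            Attempt:   i + 1,
            TimeStamp: time.Now(),
            Error:     err,
        }
        info.Events = append(info.Events, event)

        if onRetry != nil {
            onRetry(event.Attempt, event.Delay, err)
        }
        // ... retry logic
    }

    // Fill the RetryError fields defined earlier from the collected info.
    return &RetryError{
        OriginalError: lastErr,
        Attempts:      len(info.Events),
        TotalTime:     time.Since(info.StartTime),
        CorrelationID: info.CorrelationID,
    }
}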

Conclusion

By following these patterns, we can build SDKs that:

Provide rich debugging information
Respect separation of concerns
Give users control over logging and instrumentation
Maintain clean, idiomatic Go code

Remember: good SDK design is about enabling users while staying out of their way. Return errors, don’t log them, and give users the tools they need to handle errors appropriately in their context.