Error Handling Resilience
Building resilient Elixir systems? This guide teaches error handling patterns and resilience strategies for production systems, covering when to use try/catch/rescue, error tuples, with pipelines, circuit breakers, and retry patterns with exponential backoff.
Why Error Handling Matters
Production systems face inevitable failures:
- Network failures - External API timeouts, connection drops, DNS failures
- Resource exhaustion - Database connection limits, memory pressure, disk full
- Invalid input - Malformed data, constraint violations, business rule failures
- Third-party errors - Payment gateway failures, service degradation, rate limits
- Transient failures - Temporary network glitches, brief service unavailability
Elixir’s approach: Design for failure. Use supervisors for process crashes, error tuples for expected failures, and resilience patterns for external dependencies.
Financial Domain Examples
Examples use Shariah-compliant financial operations:
- Payment processing - Handling transaction failures with retries and idempotency
- External API integration - Circuit breakers for third-party services
- Audit logging - Ensuring error transparency for compliance
- Donation validation - Error pipelines for input validation
These domains demonstrate production error handling with real business requirements.
Error Tuple Conventions
Pattern 1: Tagged Tuples
Elixir uses {:ok, value} and {:error, reason} for expected failures.
When to use: Expected failures that are part of normal flow (validation, business rules, not found).
# Payment validation using error tuples
defmodule Finance.PaymentValidator do
# => Validates payment amount and type
def validate_payment(%{amount: amount, type: type} = payment) do
# => payment: Map with amount and type
# => Returns: {:ok, payment} or {:error, reason}
with :ok <- validate_amount(amount), # => Check amount validity
# => :ok means valid
:ok <- validate_type(type) do # => Check type validity
# => :ok means valid
{:ok, payment} # => All validations passed
# => Returns: {:ok, original payment}
else
{:error, reason} -> {:error, reason} # => Validation failed
# => Propagates error reason
end
end
defp validate_amount(amount) when amount > 0 and amount < 1_000_000 do
:ok # => Amount valid
# => Range: 0-1M
end
defp validate_amount(_amount) do
{:error, :invalid_amount} # => Amount outside valid range
# => Returns: Error tuple
end
defp validate_type(type) when type in [:donation, :zakat, :investment] do
:ok # => Type valid
# => Allowed: donation, zakat, investment
end
defp validate_type(_type) do
{:error, :invalid_payment_type} # => Unknown payment type
# => Returns: Error tuple
end
end
Usage:
payment = %{amount: 1000, type: :donation} # => Valid payment
Finance.PaymentValidator.validate_payment(payment)
# => Returns: {:ok, %{amount: 1000, type: :donation}}
invalid = %{amount: -50, type: :donation} # => Invalid amount
Finance.PaymentValidator.validate_payment(invalid)
# => Returns: {:error, :invalid_amount}
Best practice: Use error tuples for domain errors that callers should handle explicitly.
Pattern 2: Multiple Error Cases
Return different error reasons for specific failure modes.
# Bank account validation with specific errors
defmodule Finance.BankAccount do
# => Validates bank account for payment processing
def validate_account(account_number) when byte_size(account_number) == 10 do
case check_account_status(account_number) do
# => Check if account active
{:ok, :active} ->
{:ok, account_number} # => Account valid and active
{:ok, :frozen} ->
{:error, :account_frozen} # => Account exists but frozen
# => Caller should handle differently
{:ok, :closed} ->
{:error, :account_closed} # => Account permanently closed
{:error, :not_found} ->
{:error, :account_not_found} # => Account doesn't exist
end
end
def validate_account(_account_number) do
{:error, :invalid_format} # => Wrong length or not a string
# => Must be a 10-byte string (digits are not checked here)
end
defp check_account_status(account_number) do
# => Simulated database lookup
case account_number do
"1234567890" -> {:ok, :active} # => Active account
"0987654321" -> {:ok, :frozen} # => Frozen account
"1111111111" -> {:ok, :closed} # => Closed account
_ -> {:error, :not_found} # => Not in database
end
end
end
Best practice: Provide specific error reasons so callers can handle each case appropriately.
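On the caller side, specific reasons pay off because each one can be matched in its own function clause. The following is a minimal sketch: the module name AccountErrorExample, the describe/1 function, and the message strings are hypothetical and only illustrate one way to consume the reasons returned above.

```elixir
defmodule AccountErrorExample do
  # Hypothetical helper: maps results shaped like those of
  # Finance.BankAccount.validate_account/1 to caller-side decisions.
  def describe({:ok, account_number}), do: {:proceed, account_number}
  def describe({:error, :account_frozen}), do: {:reject, "Account is frozen; contact support."}
  def describe({:error, :account_closed}), do: {:reject, "Account is closed."}
  def describe({:error, :account_not_found}), do: {:reject, "Account not found."}
  def describe({:error, :invalid_format}), do: {:reject, "Account number must be 10 characters."}

  # Catch-all so a newly added reason does not crash the caller.
  def describe({:error, other}), do: {:reject, "Unexpected error: #{inspect(other)}"}
end
```

Because every reason has its own clause, adding a new failure mode later only requires one more clause, and unknown reasons still produce a usable result through the catch-all.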
with for Error Pipelines
Pattern 3: Chaining Error-Tuple Operations
with chains operations that return {:ok, value} or {:error, reason}.
When to use: Multiple validation steps where early failure should short-circuit.
# Payment processing with validation pipeline
defmodule Finance.PaymentProcessor do
alias Finance.{PaymentValidator, BankAccount, FraudDetector}
def process_payment(payment_data) do
# => payment_data: Map with all payment info
with {:ok, payment} <- PaymentValidator.validate_payment(payment_data),
# => Step 1: Validate payment structure
# => If {:error, _}, skip to else
{:ok, account} <- BankAccount.validate_account(payment.account_number),
# => Step 2: Validate bank account
# => Uses result from step 1
{:ok, _check} <- FraudDetector.check_transaction(payment),
# => Step 3: Fraud detection
# => All checks passed
{:ok, receipt} <- charge_account(account, payment.amount) do
# => Step 4: Execute charge
# => Returns: Receipt on success
audit_success(payment, receipt) # => Log successful transaction
{:ok, receipt} # => Return receipt to caller
else
{:error, :invalid_amount} = error ->
audit_failure(payment_data, error) # => Log validation failure
{:error, :payment_validation_failed} # => Return generic error
{:error, :account_frozen} = error ->
audit_failure(payment_data, error) # => Log frozen account
notify_customer(:account_frozen) # => Send customer notification
{:error, :account_unavailable} # => Return customer-facing error
{:error, :fraud_detected} = error ->
audit_failure(payment_data, error) # => Log fraud attempt
notify_admin(:fraud_detected, payment_data)
# => Alert admin immediately
{:error, :transaction_blocked} # => Block transaction
{:error, reason} = error ->
audit_failure(payment_data, error) # => Log unknown error
{:error, reason} # => Propagate original error
end
end
defp charge_account(account, amount) do
# => Simulated payment charge
if :rand.uniform() > 0.1 do # => 90% success rate
{:ok, %{transaction_id: generate_id(), account: account, amount: amount}}
# => Returns: Receipt
else
{:error, :insufficient_funds} # => 10% failure rate
end
end
defp audit_success(payment, receipt) do
# => Log successful transaction for compliance
IO.puts("SUCCESS: Payment processed - #{receipt.transaction_id}")
end
defp audit_failure(payment_data, error) do
# => Log failed transaction for compliance
IO.puts("FAILURE: Payment failed - #{inspect(error)}")
end
defp notify_customer(reason) do
# => Send customer notification (simulated)
IO.puts("Customer notified: #{reason}")
end
defp notify_admin(reason, payment_data) do
# => Alert admin of critical issue
IO.puts("Admin alert: #{reason} - #{inspect(payment_data)}")
end
defp generate_id, do: :crypto.strong_rand_bytes(16) |> Base.encode64()
end
Best practice: Use with for validation pipelines. Handle each error case explicitly in else clause for proper logging and user feedback.
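When two steps of a with pipeline can return the same error reason, the else clause cannot tell which step failed. One common workaround is to wrap each step in a tagged tuple. The sketch below assumes hypothetical validators (check_amount/1, check_currency/1) and a hypothetical module name; it is not part of the payment modules above.

```elixir
defmodule TaggedWithExample do
  # Hypothetical validators, defined here only to keep the sketch self-contained.
  # Both deliberately return the same generic reason.
  defp check_amount(amount) when is_integer(amount) and amount > 0, do: :ok
  defp check_amount(_amount), do: {:error, :invalid}

  defp check_currency(currency) when currency in ["USD", "IDR"], do: :ok
  defp check_currency(_currency), do: {:error, :invalid}

  def validate(%{amount: amount, currency: currency} = input) do
    # Wrapping each step in a {tag, result} tuple lets the else clause
    # distinguish steps that share an error reason.
    with {:amount, :ok} <- {:amount, check_amount(amount)},
         {:currency, :ok} <- {:currency, check_currency(currency)} do
      {:ok, input}
    else
      {:amount, {:error, reason}} -> {:error, {:amount, reason}}
      {:currency, {:error, reason}} -> {:error, {:currency, reason}}
    end
  end
end
```

The trade-off is extra noise in the with head, so this is usually worth it only when the failing step cannot be inferred from the reason alone.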
try/catch/rescue Patterns
Pattern 4: When to Use try/catch/rescue
Appropriate use cases (use sparingly):
- Interfacing with third-party libraries that raise exceptions
- Protecting against truly unexpected failures
- Converting exceptions to error tuples at boundaries
Inappropriate use cases (avoid):
- Control flow for expected errors (use error tuples)
- Wrapping all code “just in case” (anti-pattern)
- Catching and ignoring errors (hides problems)
# Converting external library exceptions to error tuples
defmodule Finance.ExternalAPI do
# => Wrapper for third-party payment gateway SDK
def charge_card(card_token, amount) do
# => card_token: Tokenized card
# => amount: Charge amount
try do
# => External library that raises on error
result = PaymentGatewaySDK.charge(card_token, amount)
# => May raise TimeoutError
# => May raise InvalidCardError
# => May raise NetworkError
{:ok, result} # => Success: Return result
rescue
PaymentGatewaySDK.TimeoutError ->
{:error, :gateway_timeout} # => Network timeout
# => Retry eligible
PaymentGatewaySDK.InvalidCardError ->
{:error, :invalid_card} # => Invalid card details
# => NOT retry eligible
PaymentGatewaySDK.NetworkError ->
{:error, :network_error} # => Network issue
# => Retry eligible
error ->
# => Unexpected error - log it and return a generic error tuple
require Logger
Logger.error("Unexpected payment gateway error: #{inspect(error)}")
{:error, :gateway_error} # => Generic error
end
end
end
Best practice: Use try/rescue at system boundaries to convert exceptions to error tuples. Never use for control flow within your domain logic.
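The same boundary conversion can be tried without a payment SDK. In this minimal sketch, String.to_integer/1 (which raises ArgumentError on malformed input) stands in for an exception-raising third-party call; the module and function names are hypothetical.

```elixir
defmodule SafeParseExample do
  # A function body can carry its own rescue clause, so no explicit
  # try block is needed when the whole body is the protected call.
  def parse_amount(text) when is_binary(text) do
    {:ok, String.to_integer(text)}
  rescue
    # Only the expected exception is converted; anything else still raises.
    ArgumentError -> {:error, :invalid_amount}
  end
end
```

Rescuing only ArgumentError keeps genuinely unexpected failures visible instead of folding them into a domain error.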
Pattern 5: Catch for Non-Error Throws
catch handles values raised with throw and, in its kind-qualified form (for example :exit, reason), process exits. Both are rare in idiomatic Elixir.
# Handling early termination in external library
defmodule Finance.ReportGenerator do
# => Generates financial reports using external library
def generate_report(data) do
# => data: Report parameters
try do
# => External library uses throw for early exit
report = LegacyReportLib.generate(data) # => May throw {:early_return, partial_report}
# => May raise on error
{:ok, report} # => Full report generated
rescue
# => Handle raised exceptions
# => rescue is listed before catch; the compiler warns about the reverse order
error ->
{:error, {:report_generation_failed, error}}
catch
# => Handle throw (non-error early exit)
{:early_return, partial} ->
{:ok, {:partial, partial}} # => Partial report available
# => Caller decides if acceptable
:timeout ->
{:error, :report_timeout} # => Generation took too long
end
end
end
Best practice: Only use catch when interfacing with libraries that use throw for control flow. Modern Elixir code should use error tuples instead.
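To see throw and catch in isolation, the sketch below uses a hypothetical scan/1 function as a stand-in for a library that signals early exit with throw/1. The module name and the negative-row rule are assumptions made for illustration.

```elixir
defmodule ThrowCatchExample do
  # Hypothetical stand-in for a library that aborts early with throw/1.
  defp scan(rows) do
    Enum.each(rows, fn row ->
      if row < 0, do: throw({:early_return, row})
    end)

    :complete
  end

  def run(rows) do
    try do
      {:ok, scan(rows)}
    catch
      # A bare pattern in catch matches thrown values only.
      {:early_return, row} -> {:ok, {:partial, row}}
      # The kind-qualified form is needed to intercept exits.
      :exit, reason -> {:error, {:exit, reason}}
    end
  end
end
```

Calling ThrowCatchExample.run([1, -5, 3]) stops at the first negative row and returns it as a partial result, while a list without negative rows runs to completion.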
Circuit Breaker Patterns
Pattern 6: Protecting External Dependencies
Circuit breakers prevent cascading failures when external services fail.
States:
- Closed - Normal operation, requests pass through
- Open - Service failing, fast-fail without calling service
- Half-open - Testing recovery, limited requests allowed
# Circuit breaker for external payment gateway
defmodule Finance.PaymentGatewayCircuitBreaker do
use GenServer
# => Implements circuit breaker pattern
@failure_threshold 5 # => Open after 5 failures
@recovery_timeout 60_000 # => Try recovery after 60s
@half_open_requests 3 # => Test with 3 requests
# Client API
def start_link(opts) do
GenServer.start_link(__MODULE__, opts, name: __MODULE__)
# => Start GenServer
# => Registered name: module name
end
def call(func) do
# => func: Function to call gateway
GenServer.call(__MODULE__, {:call, func}) # => Request through circuit breaker
# => Returns: {:ok, result} or {:error, reason}
end
def get_state do
GenServer.call(__MODULE__, :get_state) # => Get current circuit state
# => Returns: :closed | :open | :half_open
end
# Server Implementation
def init(_opts) do
{:ok, %{
state: :closed, # => Initial state: closed
failure_count: 0, # => No failures yet
last_failure_time: nil, # => No failures
half_open_success: 0 # => Half-open success counter
}}
end
def handle_call({:call, func}, _from, state) do
case state.state do
:closed ->
# => Circuit closed: normal operation
execute_with_error_tracking(func, state)
:open ->
# => Circuit open: check if recovery time elapsed
if ready_for_half_open?(state) do
new_state = %{state | state: :half_open, half_open_success: 0}
execute_with_recovery_tracking(func, new_state)
# => First trial request counts toward half-open recovery
else
{:reply, {:error, :circuit_open}, state}
# => Fast fail: don't call service
end
:half_open ->
# => Circuit half-open: testing recovery
execute_with_recovery_tracking(func, state)
end
end
def handle_call(:get_state, _from, state) do
{:reply, state.state, state} # => Return current state
end
# Private Functions
defp execute_with_error_tracking(func, state) do
# => Execute and track failures
case func.() do # => Call external service
{:ok, result} ->
# => Success: reset failure counter
new_state = %{state | failure_count: 0}
{:reply, {:ok, result}, new_state}
{:error, _reason} = error ->
# => Failure: increment counter
new_failure_count = state.failure_count + 1
if new_failure_count >= @failure_threshold do
# => Threshold reached: open circuit
new_state = %{
state |
state: :open,
failure_count: new_failure_count,
last_failure_time: System.monotonic_time(:millisecond)
}
{:reply, error, new_state}
else
# => Below threshold: stay closed
new_state = %{state | failure_count: new_failure_count}
{:reply, error, new_state}
end
end
end
defp execute_with_recovery_tracking(func, state) do
# => Execute and track recovery
case func.() do # => Call external service
{:ok, result} ->
# => Success in half-open state
new_success_count = state.half_open_success + 1
if new_success_count >= @half_open_requests do
# => Enough successes: close circuit
new_state = %{
state |
state: :closed,
failure_count: 0,
half_open_success: 0,
last_failure_time: nil
}
{:reply, {:ok, result}, new_state}
else
# => Continue testing
new_state = %{state | half_open_success: new_success_count}
{:reply, {:ok, result}, new_state}
end
{:error, _reason} = error ->
# => Failure in half-open: reopen circuit
new_state = %{
state |
state: :open,
half_open_success: 0,
last_failure_time: System.monotonic_time(:millisecond)
}
{:reply, error, new_state}
end
end
defp ready_for_half_open?(state) do
# => Check if recovery timeout elapsed
if state.last_failure_time do
elapsed = System.monotonic_time(:millisecond) - state.last_failure_time
elapsed >= @recovery_timeout # => True if 60s passed
else
false # => No failure time: not ready
end
end
end
Usage with payment gateway:
defmodule Finance.PaymentService do
alias Finance.{ExternalAPI, PaymentGatewayCircuitBreaker}
def charge_card_with_circuit_breaker(card_token, amount) do
# => Charge with protection
PaymentGatewayCircuitBreaker.call(fn ->
ExternalAPI.charge_card(card_token, amount)
# => Call protected by circuit breaker
end) # => Returns: {:ok, result} or {:error, reason}
end
end
# Start circuit breaker in application supervision tree
defmodule Finance.Application do
use Application
def start(_type, _args) do
children = [
Finance.PaymentGatewayCircuitBreaker # => Circuit breaker GenServer
]
Supervisor.start_link(children, strategy: :one_for_one)
end
end
Best practice: Use circuit breakers for all external dependencies. Monitor circuit state transitions to detect service degradation early.
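The closed, open, and half-open transitions can also be written as pure functions, which makes them straightforward to unit test without starting a GenServer. The sketch below is a simplified, hypothetical variant: the module name, the threshold values, and the explicit now argument are assumptions, and it closes the circuit after a single half-open success rather than the three successes used above.

```elixir
defmodule CircuitTransitionExample do
  # Illustrative values; `now` is passed in so no clock is needed in tests.
  @failure_threshold 3
  @recovery_timeout 1_000

  def new, do: %{state: :closed, failures: 0, opened_at: nil}

  # Decide whether an incoming request may pass.
  def allow?(%{state: :open, opened_at: t} = c, now) when now - t >= @recovery_timeout,
    do: {true, %{c | state: :half_open}}

  def allow?(%{state: :open} = c, _now), do: {false, c}
  def allow?(c, _now), do: {true, c}

  # Record the outcome of a request that was allowed through.
  def record(c, :ok, _now), do: %{c | state: :closed, failures: 0, opened_at: nil}

  # Any failure while half-open reopens the circuit immediately.
  def record(%{state: :half_open} = c, :error, now), do: %{c | state: :open, opened_at: now}

  def record(%{failures: f} = c, :error, now) when f + 1 >= @failure_threshold,
    do: %{c | state: :open, failures: f + 1, opened_at: now}

  def record(%{failures: f} = c, :error, _now), do: %{c | failures: f + 1}
end
```

A GenServer can then hold this map as its state and delegate to allow?/2 and record/3, keeping the transition rules separate from process concerns.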
Retry Strategies with Exponential Backoff
Pattern 7: Retry with Exponential Backoff
Transient failures often resolve with retries. Exponential backoff prevents overwhelming failing services.
# Retry with exponential backoff
defmodule Finance.RetryStrategy do
# => Implements retry with exponential backoff
@max_retries 5 # => Maximum retry attempts
@initial_delay 100 # => Initial delay: 100ms
@max_delay 30_000 # => Maximum delay: 30s
@jitter_factor 0.1 # => Add 10% random jitter
def retry(func, opts \\ []) do
# => func: Function to retry
# => opts: Configuration options
max_retries = Keyword.get(opts, :max_retries, @max_retries)
initial_delay = Keyword.get(opts, :initial_delay, @initial_delay)
do_retry(func, 0, max_retries, initial_delay)
end
defp do_retry(func, attempt, max_retries, delay) when attempt <= max_retries do
# => attempt: Current attempt number
# => max_retries: Maximum attempts
# => delay: Current backoff delay
case func.() do # => Execute function
{:ok, result} ->
{:ok, result} # => Success: return result
{:error, reason} = error ->
cond do
not retryable?(reason) ->
# => Non-retryable: return the original error unchanged
error
attempt < max_retries ->
# => Transient error: retry after delay
actual_delay = calculate_backoff(attempt, delay)
Process.sleep(actual_delay) # => Wait before retry
# => Exponential backoff + jitter
do_retry(func, attempt + 1, max_retries, delay)
true ->
# => Retries exhausted: wrap the last reason
{:error, {:max_retries_exceeded, reason}}
end
end
end
defp retryable?(reason) do
# => Determine if error is retryable
reason in [
:timeout, # => Network timeout
:gateway_timeout, # => Gateway timeout
:network_error, # => Network issue
:service_unavailable, # => Temporary unavailability
:rate_limit # => Rate limit (wait and retry)
]
end
defp calculate_backoff(attempt, initial_delay) do
# => Calculate exponential delay
exponential = initial_delay * :math.pow(2, attempt)
# => Doubles each attempt
# => Attempt 0: 100ms
# => Attempt 1: 200ms
# => Attempt 2: 400ms
capped = min(exponential, @max_delay) # => Cap at 30s
jitter = capped * @jitter_factor * :rand.uniform()
# => Add random jitter (0-10%)
# => Prevents thundering herd
round(capped + jitter) # => Final delay with jitter
end
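end
Usage with payment processing:
# Add this function to the Finance.PaymentService module shown earlier;
# defining the module a second time would replace the first definition.
defmodule Finance.PaymentService do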
alias Finance.{ExternalAPI, RetryStrategy}
def charge_card_with_retry(card_token, amount) do
# => Charge with automatic retries
RetryStrategy.retry(fn ->
ExternalAPI.charge_card(card_token, amount)
end, max_retries: 3, initial_delay: 200) # => 3 retries, 200ms initial delay
# => Delays: 200ms, 400ms, 800ms
end
end
Best practice: Use exponential backoff with jitter for all retries. Define clear retryable vs non-retryable errors.
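The delay arithmetic is easier to check when it is separated from sleeping. The sketch below computes the base delay schedule (before jitter) and applies jitter as a separate step; the module and function names are hypothetical, and the numbers mirror the 200ms initial delay and 30s cap used above.

```elixir
defmodule BackoffScheduleExample do
  # Base delays (before jitter) for the first `retries` retries:
  # the delay doubles each time and is capped at max_delay.
  def base_delays(initial_delay, max_delay, retries) do
    initial_delay
    |> Stream.iterate(&(&1 * 2))
    |> Stream.map(&min(&1, max_delay))
    |> Enum.take(retries)
  end

  # Adds up to jitter_factor (0.1 means 10%) of random extra delay.
  def with_jitter(delay, jitter_factor) do
    round(delay + delay * jitter_factor * :rand.uniform())
  end
end
```

For example, BackoffScheduleExample.base_delays(200, 30_000, 3) returns [200, 400, 800], and a longer schedule flattens out at the cap once doubling exceeds it.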
Pattern 8: Combining Circuit Breaker and Retry
Circuit breaker protects system, retry handles transient failures.
defmodule Finance.ResilientPaymentService do
alias Finance.{ExternalAPI, PaymentGatewayCircuitBreaker, RetryStrategy}
def charge_card(card_token, amount) do
# => Maximum resilience strategy
# => Layer 1: Retry for transient failures
RetryStrategy.retry(fn ->
# => Layer 2: Circuit breaker for cascading failure prevention
PaymentGatewayCircuitBreaker.call(fn ->
# => Layer 3: External API with exception handling
ExternalAPI.charge_card(card_token, amount)
end)
end, max_retries: 3, initial_delay: 200)
# => Returns: {:ok, receipt} or {:error, reason}
end
end
Failure handling:
case Finance.ResilientPaymentService.charge_card(token, 1000) do
{:ok, receipt} ->
# => Success: process receipt
IO.puts("Payment successful: #{receipt.transaction_id}")
{:error, :circuit_open} ->
# => Circuit open: service degraded
# => Don't retry, notify user to try later
{:error, :service_temporarily_unavailable}
{:error, {:max_retries_exceeded, :gateway_timeout}} ->
# => All retries exhausted: timeout
# => Log for investigation, notify user
{:error, :payment_timeout}
{:error, :invalid_card} ->
# => Non-retryable: invalid input
# => Don't retry, notify user immediately
{:error, :invalid_card_details}
end
Best practice: Combine circuit breaker (prevents cascading failures) with retry (handles transient issues). Log all failure modes for monitoring.
Idempotency for Retry Safety
Pattern 9: Idempotent Operations
Retries must be safe to execute multiple times without repeating side effects such as charging a customer twice.
# Idempotent payment processing
defmodule Finance.IdempotentPaymentProcessor do
# => Ensures payment processed exactly once even with retries
def process_payment(idempotency_key, payment_data) do
# => idempotency_key: Unique request identifier
# => payment_data: Payment details
# => Check if already processed
case get_previous_result(idempotency_key) do
{:ok, previous_result} ->
# => Already processed: return cached result
{:ok, previous_result} # => Safe retry: no double charge
{:error, :not_found} ->
# => First attempt: process payment
with {:ok, receipt} <- charge_payment(payment_data),
:ok <- store_result(idempotency_key, receipt) do
# => Store result for future retries
{:ok, receipt}
else
error -> error # => Propagate error
end
end
end
defp get_previous_result(idempotency_key) do
# => Check cache/database for previous result
# => Simulated with process dictionary
case Process.get({:payment_result, idempotency_key}) do
nil -> {:error, :not_found} # => First request
result -> {:ok, result} # => Duplicate request
end
end
defp store_result(idempotency_key, receipt) do
# => Store result in cache/database
# => Simulated with process dictionary
Process.put({:payment_result, idempotency_key}, receipt)
:ok
end
defp charge_payment(payment_data) do
# => Actual payment charge (simulated)
if :rand.uniform() > 0.3 do # => 70% success rate
{:ok, %{transaction_id: generate_id(), amount: payment_data.amount}}
else
{:error, :gateway_timeout} # => 30% transient failure
end
end
defp generate_id, do: :crypto.strong_rand_bytes(16) |> Base.encode64()
end
Usage with retry:
# Client generates idempotency key once
idempotency_key = "payment-#{user_id}-#{:os.system_time(:millisecond)}"
# => Unique per payment request
# => Same key used for all retries
Finance.RetryStrategy.retry(fn ->
Finance.IdempotentPaymentProcessor.process_payment(
idempotency_key, # => Same key for retries
payment_data
)
end)
Best practice: All retryable operations must be idempotent. Use client-generated idempotency keys, not server-generated request IDs.
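The process dictionary used above is visible only to a single process, so concurrent retries from different processes would not see each other's results. A minimal sketch of a shared store follows, assuming an in-memory ETS table; the module name, table name, and the :claimed / :in_progress / {:done, result} protocol are hypothetical, and a production system would more likely use a database table with a unique constraint on the key.

```elixir
defmodule IdempotencyStoreExample do
  @table :idempotency_example

  # Creates a public named table so any process can read and write it.
  def init do
    :ets.new(@table, [:named_table, :public, :set])
    :ok
  end

  # :ets.insert_new/2 is atomic, so only the first caller for a key
  # receives :claimed; later callers see the current status instead.
  def claim(key) do
    if :ets.insert_new(@table, {key, :in_progress}) do
      :claimed
    else
      lookup(key)
    end
  end

  # Stores the final result so later retries can return it unchanged.
  def complete(key, result) do
    :ets.insert(@table, {key, {:done, result}})
    :ok
  end

  defp lookup(key) do
    case :ets.lookup(@table, key) do
      [{^key, {:done, result}}] -> {:done, result}
      [{^key, :in_progress}] -> :in_progress
    end
  end
end
```

A caller would charge the payment only after receiving :claimed, then call complete/2 with the receipt; a retry that sees {:done, receipt} returns it without charging again.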
Real-World Integration Example
Complete Resilient Payment System
defmodule Finance.ProductionPaymentSystem do
@moduledoc """
Production-grade payment system combining:
- Error tuple conventions for domain errors
- with pipelines for validation
- try/rescue for external library exceptions
- Circuit breaker for cascading failure prevention
- Retry with exponential backoff for transient failures
- Idempotency for retry safety
"""
alias Finance.{
PaymentValidator,
BankAccount,
FraudDetector,
IdempotentPaymentProcessor,
PaymentGatewayCircuitBreaker,
RetryStrategy
}
def process_payment(payment_request) do
# => payment_request: Full payment details
with {:ok, validated} <- validate_request(payment_request),
{:ok, receipt} <- execute_payment(validated) do
audit_success(validated, receipt)
notify_customer(:success, receipt)
{:ok, receipt}
else
{:error, :circuit_open} = error ->
audit_failure(payment_request, error)
notify_customer(:service_unavailable, nil)
error
{:error, {:max_retries_exceeded, _reason}} = error ->
audit_failure(payment_request, error)
notify_customer(:payment_timeout, nil)
{:error, :payment_failed}
{:error, _reason} = error ->
audit_failure(payment_request, error)
notify_customer(:payment_failed, nil)
error
end
end
defp validate_request(payment_request) do
# => Validation pipeline
with {:ok, payment} <- PaymentValidator.validate_payment(payment_request),
{:ok, account} <- BankAccount.validate_account(payment.account_number),
{:ok, _check} <- FraudDetector.check_transaction(payment) do
{:ok, Map.put(payment, :validated_account, account)}
end
end
defp execute_payment(validated_payment) do
# => Execute with full resilience
idempotency_key = validated_payment.idempotency_key
RetryStrategy.retry(fn ->
PaymentGatewayCircuitBreaker.call(fn ->
IdempotentPaymentProcessor.process_payment(
idempotency_key,
validated_payment
)
end)
end, max_retries: 3, initial_delay: 200)
end
defp audit_success(payment, receipt) do
# => Compliance logging
require Logger
Logger.info("Payment success",
transaction_id: receipt.transaction_id,
amount: payment.amount,
account: payment.account_number
)
end
defp audit_failure(payment, error) do
# => Compliance logging
require Logger
Logger.error("Payment failure",
error: inspect(error),
amount: payment[:amount], # => Request may be invalid, so avoid a KeyError here
account: payment[:account_number]
)
end
defp notify_customer(status, receipt) do
# => Customer notification (email/SMS)
IO.puts("Customer notification: #{status}")
end
end
Error Handling Checklist
Before deploying error handling code:
- Expected failures use error tuples {:ok, value} or {:error, reason}
- Specific error reasons for different failure modes
- with pipelines for validation with proper else clauses
- try/rescue only at system boundaries to convert exceptions
- Circuit breakers for all external dependencies
- Retry with exponential backoff and jitter
- Retryable errors clearly defined and distinguished
- Idempotent operations for all retriable functions
- Comprehensive audit logging for compliance
- Customer notifications for all error paths
- Admin alerts for critical failures (fraud, circuit open)
- Monitoring and metrics for error rates and circuit states
Summary
Elixir error handling combines multiple patterns:
- Error tuples - Expected failures in domain logic
- with pipelines - Validation chains with explicit error handling
- try/rescue - Converting external exceptions to error tuples (use sparingly)
- Circuit breakers - Preventing cascading failures from external dependencies
- Retry with backoff - Handling transient failures automatically
- Idempotency - Making retries safe through deduplication
Key principle: Design for failure. External dependencies will fail, network requests will timeout, and services will degrade. Build resilience patterns from the start, not after production incidents.
Next Steps
- Testing Strategies - Test error handling paths
- Supervisor Trees - Process-level fault tolerance
- Phoenix Framework - HTTP error handling in Phoenix
- Best Practices - Production error handling patterns