REL 2: How do you build resiliency into your Serverless application?

Evaluate scaling mechanisms for Serverless and non-Serverless resources to meet customer demand, and build resiliency to withstand partial and intermittent failures across dependencies.


The Amazon Builder's Library
Optimizing AWS SDK for AWS Lambda
AWS Lambda error and retry behavior
Serverless Hero: Production tips for working with Amazon Kinesis Data Streams

Best Practices:

Improvement Plan

Manage transaction, partial, and intermittent failures

  • Use exponential backoff with jitter.
  • Use a dead-letter queue mechanism to retain, investigate, and retry failed transactions.
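
The two practices above can be sketched together as follows. This is a minimal illustration, not a definitive implementation: `TransientError`, `backoff_delay`, `call_with_retries`, and `process` are hypothetical names, the base delay and cap are arbitrary defaults, and a plain Python list stands in for a real dead-letter queue such as an Amazon SQS queue. The delay uses the "full jitter" variant, where each wait is drawn uniformly between zero and the capped exponential value.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def backoff_delay(attempt, base=0.1, cap=5.0, rng=random):
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return rng.uniform(0, min(cap, base * (2 ** attempt)))


def call_with_retries(operation, max_attempts=5, sleep=time.sleep):
    """Call operation(), retrying TransientError with jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(backoff_delay(attempt))


def process(event, handler, dead_letters, **retry_kwargs):
    """Run handler(event) with retries; park the event if it keeps failing.

    dead_letters is an in-memory list standing in for a real queue.
    """
    try:
        return call_with_retries(lambda: handler(event), **retry_kwargs)
    except TransientError as exc:
        dead_letters.append({"event": event, "error": repr(exc)})
        return None
```

Randomizing the delay spreads retries from many concurrent clients over time, so they are less likely to hit a recovering dependency all at once. Events that land in the dead-letter store keep their payload and the last error, which is what makes later investigation and replay possible.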

Manage duplicate and unwanted events

  • Generate unique attributes needed to manage duplicate events at the beginning of the transaction.
  • Use an external system, such as a database, to store unique attributes of a transaction that can be verified for duplicates.
  • Validate events against a predefined, agreed-upon schema.
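
A minimal sketch of these three practices, under stated assumptions: the event shape (`order_id`, `amount`), the `InMemoryIdempotencyStore` class, and the `handle` function are all hypothetical, and the in-memory set stands in for an external database. Here the unique attribute is derived by hashing the event content; a producer-generated transaction ID would serve the same purpose.

```python
import hashlib
import json


class InMemoryIdempotencyStore:
    """Stand-in for an external store such as a database table."""

    def __init__(self):
        self._seen = set()

    def claim(self, key):
        """Return True the first time a key is seen, False afterwards."""
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


# Hypothetical schema agreed between producer and consumer.
REQUIRED_FIELDS = {"order_id": str, "amount": int}


def validate(event):
    """Reject events that do not match the agreed schema."""
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(event.get(field), expected_type):
            raise ValueError(f"invalid or missing field: {field}")


def idempotency_key(event):
    """Derive a stable key from the event content."""
    canonical = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def handle(event, store, process):
    """Validate, de-duplicate, then process an event exactly once per key."""
    validate(event)
    if not store.claim(idempotency_key(event)):
        return "duplicate"
    process(event)
    return "processed"
```

With a real database, the check and the insert inside `claim` would need to be a single atomic operation (for example, a conditional write), otherwise two concurrent deliveries of the same event could both pass the check.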

Orchestrate long-running transactions

  • Use a state machine to provide a visual representation of distributed transactions, and to separate business logic from orchestration logic.
  • Use dead-letter queues in response to failed state machine executions.
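
As an illustration, the sketch below builds a small state machine definition in Amazon States Language as a Python dictionary. The state names, the `build_definition` helper, and the ARN and queue URL passed to it are hypothetical placeholders. The task retries on failure, and any error that survives the retries is caught and routed to a state that sends the payload to a queue acting as a dead-letter queue.

```python
import json


def build_definition(task_arn, dlq_url):
    """Return an Amazon States Language definition as a Python dict.

    task_arn and dlq_url are caller-supplied placeholders.
    """
    return {
        "Comment": "Sketch: one task with retries and a dead-letter fallback",
        "StartAt": "ProcessOrder",
        "States": {
            "ProcessOrder": {
                "Type": "Task",
                "Resource": task_arn,
                # Orchestration concerns (retry policy) live here,
                # not in the business logic behind task_arn.
                "Retry": [{
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }],
                "Catch": [{
                    "ErrorEquals": ["States.ALL"],
                    "Next": "SendToDeadLetterQueue",
                }],
                "End": True,
            },
            "SendToDeadLetterQueue": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sqs:sendMessage",
                "Parameters": {
                    "QueueUrl": dlq_url,
                    "MessageBody.$": "$",
                },
                "Next": "MarkFailed",
            },
            "MarkFailed": {"Type": "Fail", "Error": "ProcessOrderFailed"},
        },
    }


def to_json(definition):
    """Serialize the definition for deployment tooling."""
    return json.dumps(definition, indent=2)
```

Keeping retry and error-routing rules in the definition means the function behind `ProcessOrder` only has to implement the business step, while the workflow decides what happens when that step fails.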

Consider scaling patterns at burst rates

  • Perform load tests using a burst strategy with random intervals of idleness.
  • Review service account limits with combined utilization across resources.
  • Evaluate key metrics to understand how your workload recovers from bursts.
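
One way to plan such a test is to generate the burst schedule up front. The sketch below is a hypothetical helper, not part of any load-testing tool: `burst_schedule` and its parameters are assumptions, and it only produces the timing plan, leaving the actual request driver to the tool of your choice.

```python
import random


def burst_schedule(bursts, burst_size, max_idle_seconds, seed=None):
    """Return (start_offset_seconds, request_count) pairs for a burst test.

    Each burst is followed by a random idle period between 0 and
    max_idle_seconds. Passing a seed makes the plan reproducible.
    """
    rng = random.Random(seed)
    schedule = []
    offset = 0.0
    for _ in range(bursts):
        schedule.append((round(offset, 3), burst_size))
        offset += rng.uniform(0, max_idle_seconds)
    return schedule
```

A driver would wait until each offset, fire `request_count` requests concurrently, and record metrics such as throttles, errors, and latency during and after each burst, so that recovery can be compared against the idle time that preceded it.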