PERF 1: How do you optimize your Serverless application’s performance?
Evaluating and optimizing your Serverless application’s performance based on access patterns, scaling mechanisms, and native integrations allows you to continuously gain more value per transaction.
Resources
Serverless Hero: Reusing Database Connections in AWS Lambda
re:Invent 2019 - Best practices for AWS Lambda and Java
re:Invent 2019 - I didn’t know Amazon API Gateway did that
re:Invent 2019 - Serverless at scale: Design patterns and optimizations
Best Practices:
- Measure, evaluate, and select optimum capacity units: Capacity units can be function memory size, stream shards, database read/write request units, API endpoints, etc.
- Integrate with managed services directly over functions when possible: Consider using native integration between managed services as opposed to functions when no custom logic or data transformation is required.
- Measure and optimize function startup time: Evaluate function startup time for both performance and cost.
- Take advantage of concurrency via async and stream-based function invocations: Functions can be executed synchronously or asynchronously. A function's concurrency model is better used via asynchronous and stream-based invocations: queues, streams, or events can be aggregated, resulting in more efficient processing time per invocation than an invocation-per-request-response approach.
- Optimize access patterns and apply caching where applicable: Consider caching data that may not need to be frequently up-to-date, and optimize access patterns to only fetch data that is necessary to end users.
Improvement Plan
Measure, evaluate, and select optimum capacity units
- For AWS Lambda functions, the Lambda Power Tuning application can help you systematically test different memory size configurations and, depending on your performance strategy (cost, performance, balanced), identify the optimal memory size to use.
- For Amazon DynamoDB, on-demand capacity mode can support up to 40,000 read/write request units per second. It is recommended for unpredictable application traffic and new tables with unknown workloads. For higher and predictable throughput, provisioned capacity with DynamoDB automatic scaling is recommended over on-demand mode.
- For high-throughput Amazon Kinesis Data Streams with multiple consumers, consider using Enhanced Fan-Out for dedicated 2 MB/s of read throughput per consumer. When possible, use the Kinesis Producer Library (KPL) and Kinesis Client Library (KCL) for efficient record aggregation and de-aggregation.
- For Amazon API Gateway, you can use Edge-optimized endpoints for geographically distributed clients, Regional endpoints when clients are in the same Region, and Private endpoints when API consumers should access your API from within your Amazon Virtual Private Cloud (VPC).
- Performance load testing is recommended at both sustained and burst rates.
- Use Amazon CloudWatch Service Dashboards to analyze key performance metrics, including load testing results, to evaluate the effect of tuning capacity units.
Understanding when to use Amazon DynamoDB on-demand and provisioned capacity
Amazon Kinesis Data Streams Enhanced Fan-Out
Choose an Amazon API Gateway Endpoint type
AWS X-Ray
Analyzing Log Data with Amazon CloudWatch Logs Insights
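As an illustration of the cost/performance tradeoff that Lambda Power Tuning explores, the sketch below compares per-invocation compute cost at two memory sizes. The price constant and the durations are illustrative assumptions, not current AWS pricing or measured values:

```python
# Sketch of the memory-size/cost tradeoff explored by Lambda Power Tuning.
# The price per GB-second below is an illustrative assumption, not current
# AWS pricing; check the AWS Lambda pricing page for real numbers.
PRICE_PER_GB_SECOND = 0.0000166667  # assumed price, USD

def cost_per_invocation(memory_mb: float, duration_ms: float) -> float:
    """Compute the compute cost of a single invocation."""
    gb = memory_mb / 1024
    seconds = duration_ms / 1000
    return gb * seconds * PRICE_PER_GB_SECOND

# More memory often means more CPU: if doubling memory more than halves
# the duration, the invocation gets faster AND cheaper.
slow = cost_per_invocation(memory_mb=128, duration_ms=1000)  # 128 MB, 1 s
fast = cost_per_invocation(memory_mb=256, duration_ms=400)   # 256 MB, 0.4 s
assert fast < slow
```

The same arithmetic generalizes to any pair of configurations: compare GB-seconds per invocation rather than memory size alone.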
Integrate with managed services directly over functions when possible
- When using Amazon API Gateway APIs, you can use the AWS integration type to natively connect with other AWS services. In this integration type, API Gateway uses Apache Velocity Templates (VTL) and HTTPS to integrate directly with other AWS services; timeouts and errors are expected to be managed by the API consumer.
- When using AWS AppSync, you can use Apache Velocity Templates (VTL) and direct integrations with Amazon Aurora, OpenSearch, and any publicly available HTTP endpoint. AWS AppSync can also use multiple integrations and maximize throughput at the data-field level.
- Examples
  - Full-text searches on the field orderDescription are executed against Amazon OpenSearch, while the remaining data is fetched from Amazon DynamoDB.
- For State Machines managed by AWS Step Functions, you can use Service Integrations feature to fetch and put data into Amazon DynamoDB, run an AWS Batch job, publish messages to Amazon SNS topics, send messages to Amazon SQS queues, etc.
- For event-driven use cases, EventBridge can connect to various AWS services natively, and act as an event bus across multiple AWS accounts to ease integration.
Amazon API Gateway Apache Velocity Template Reference
Integrating with multiple data sources with AWS AppSync
Integrating with AWS Services via Step Functions Service Integrations
EventBridge and supported targets
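To make the Step Functions service-integration pattern concrete, the sketch below builds a minimal Amazon States Language (ASL) definition in which a state writes to DynamoDB directly, with no Lambda function in between. The table and field names are hypothetical:

```python
import json

# Minimal ASL sketch of a Step Functions service integration: the Task
# state calls DynamoDB PutItem directly. "Orders", "orderId", and
# "status" are hypothetical names for illustration.
state_machine = {
    "StartAt": "RecordOrder",
    "States": {
        "RecordOrder": {
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:putItem",
            "Parameters": {
                "TableName": "Orders",  # hypothetical table
                "Item": {
                    "orderId": {"S.$": "$.orderId"},  # taken from state input
                    "status": {"S": "RECEIVED"},
                },
            },
            "End": True,
        }
    },
}

definition = json.dumps(state_machine)  # pass to CreateStateMachine
```

Because the integration is declarative, there is no function code to cold-start, patch, or scale for this step.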
Measure and optimize function startup time
- Use AWS Lambda function code initialization time reported in Amazon CloudWatch Logs (Init duration) or AWS X-Ray to measure startup time that can be improved.
- Examples
  - For a Python function, use the PYTHONPROFILEIMPORTTIME=1 environment variable to profile and understand which packages impact startup time.
- Prefer simpler frameworks that load quickly on execution context startup.
Serverless Hero: Lambda API framework
MiddyJS framework
Python Chalice framework
Performance improvement configuration - AWS Java SDK
- Examples
  - Prefer simpler Java dependency injection frameworks like Dagger or Guice over more complex frameworks like Spring.
  - Favor lightweight web frameworks optimized for AWS Lambda, such as MiddyJS, Lambda API JS, or Python Chalice, over Node.js Express, Python Django, or Flask.
- Initialize SDK clients and database connections outside of the function handler, and cache static assets locally in the /tmp directory. Subsequent invocations processed by the same instance of your AWS Lambda function can reuse these resources.
Understanding AWS Lambda Execution Context
Serverless Hero: Enable HTTP Keep Alive - AWS Node.js SDK
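The execution-context reuse pattern above can be sketched without any AWS dependencies. Here `create_client` is a stand-in for an expensive initialization such as `boto3.client("dynamodb")` or opening a database connection; the point is that it runs at module load, not per invocation:

```python
import time

# Sketch of reusing resources across warm invocations. The expensive
# initialization runs once at module load, outside the handler;
# create_client is a stand-in for e.g. boto3.client("dynamodb").
INIT_CALLS = 0

def create_client():
    global INIT_CALLS
    INIT_CALLS += 1
    time.sleep(0.01)  # simulate slow connection setup
    return {"connected": True}

client = create_client()  # runs once per execution environment

def handler(event, context):
    # Warm invocations reuse `client` instead of reconnecting.
    return {"ok": client["connected"], "inits": INIT_CALLS}

# Two invocations in the same execution environment share one init.
assert handler({}, None)["inits"] == 1
assert handler({}, None)["inits"] == 1
```

A fresh execution environment (cold start) pays the initialization once; every warm invocation after that skips it entirely.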
- Only bring dependencies that are necessary to your application, and when available, use code bundling to reduce the impact of file system lookup calls and your deployment package size.
Serverless Hero: Optimizing AWS Node.js SDK imports
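In the spirit of PYTHONPROFILEIMPORTTIME mentioned above, a quick way to see what a dependency costs at cold start is to time its import directly. The module name here is a stand-in; swap in a heavyweight package from your own deployment bundle:

```python
import importlib
import time

# Measure how long importing a module takes, to estimate its cold-start
# cost. "json" is a placeholder; substitute a package from your bundle.
def import_time_ms(module_name: str) -> float:
    start = time.perf_counter()
    importlib.import_module(module_name)
    return (time.perf_counter() - start) * 1000

elapsed = import_time_ms("json")
assert elapsed >= 0
```

Note that Python caches imports in `sys.modules`, so measure in a fresh interpreter for accurate numbers.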
Take advantage of concurrency via async and stream-based function invocations
- Asynchronous AWS Lambda function invocations are sent to a queue, and an external process separate from the function manages polling and retries including exponential backoff out of the box.
- Asynchronous invocations support dead-letter queues that can be configured per function. A dead-letter queue may be an Amazon SQS queue or an Amazon SNS topic.
- The AWS Lambda service sends the async event to a dead-letter queue if it's unable to receive a successful response from Lambda in up to three attempts. For invocations that may not succeed due to throttling (HTTP 429) or system errors (HTTP 500-series), the Lambda service retries invoking the function for up to 6 hours.
Understanding asynchronous invocation model for AWS Lambda
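The retry behavior described above follows an exponential backoff with a cap. The exact intervals the Lambda service uses are internal; the sketch below is an illustrative schedule under assumed base and cap values, bounded by the 6-hour retry window for throttles and system errors:

```python
# Illustrative exponential backoff with a cap, of the kind the Lambda
# service applies to retried async invocations. The base and cap values
# are assumptions for illustration, not Lambda's actual internals.
def backoff_schedule(base_s: float = 1.0, cap_s: float = 300.0,
                     max_total_s: float = 6 * 3600) -> list[float]:
    """Return retry delays that double until capped, stopping once the
    cumulative wait would exceed max_total_s (6 hours by default)."""
    delays, total, delay = [], 0.0, base_s
    while total + delay <= max_total_s:
        delays.append(delay)
        total += delay
        delay = min(delay * 2, cap_s)
    return delays

schedule = backoff_schedule()
assert schedule[:4] == [1.0, 2.0, 4.0, 8.0]  # doubling until the cap
```

The key property is that the total wait never exceeds the retry window, after which the event lands in the dead-letter queue if one is configured.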
- You can configure a batch window to buffer streaming records for up to 5 minutes, or set a limit on how many records Lambda processes at once by setting a batch size. Your Lambda function is invoked when whichever limit is reached first.
- For high-volume throughput, you can increase Amazon Kinesis Data Streams shards, which increases concurrency at the expense of ordering (per shard). Additionally, Kinesis Enhanced Fan-Out can maximize throughput by dedicating 2 MB/s per consumer per shard instead of sharing 2 MB/s per shard across all consumers.
- For high volume and a single consumer, you can use Amazon SQS as an event source for your Lambda function and process up to 1,000 batches of records per second.
- When possible, producers can compress records at the expense of additional CPU cycles for decompressing in your Lambda function code.
Using Amazon SQS queues and AWS Lambda for high throughput
Understanding stream-based invocations with Amazon Kinesis and AWS Lambda
Increasing stream processing performance with Enhanced Fan-Out and AWS Lambda
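The "batch size or batch window, whichever comes first" behavior above can be sketched with a simple buffering simulation. This models the event source mapping's flush logic conceptually, not the actual service implementation:

```python
# Conceptual sketch of event source mapping buffering: records are
# flushed to the function when the batch is full OR the window elapses,
# whichever comes first. Not the actual AWS implementation.
def flush_batches(records, batch_size, window_s):
    """records: list of (arrival_time_s, payload). Returns list of batches."""
    batches, current, window_start = [], [], None
    for arrival, payload in records:
        if current and arrival - window_start >= window_s:
            batches.append(current)  # window elapsed: flush what we have
            current, window_start = [], None
        if window_start is None:
            window_start = arrival
        current.append(payload)
        if len(current) >= batch_size:
            batches.append(current)  # batch full: flush immediately
            current, window_start = [], None
    if current:
        batches.append(current)
    return batches

recs = [(0.0, "a"), (0.1, "b"), (0.2, "c"), (6.0, "d")]
# batch_size=3 flushes ["a","b","c"] as soon as it fills; "d" waits alone.
assert flush_batches(recs, batch_size=3, window_s=5.0) == [["a", "b", "c"], ["d"]]
```

Larger batches amortize per-invocation overhead; shorter windows reduce end-to-end latency. Tuning both against your traffic shape is the point of this best practice.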
Optimize access patterns and apply caching where applicable
- For REST APIs, you can use Amazon API Gateway Caching to reduce the number of calls made to your endpoint and also improve the latency of requests to your API.
- For geographically distributed clients, Amazon CloudFront or your third-party trusted CDN can cache results at the edge to further reduce network round-trip latency.
- For Amazon DynamoDB, you can enable caching with DynamoDB Accelerator (DAX) for use cases that may not require strongly consistent reads and are read-intensive.
- For GraphQL APIs, you can use AWS AppSync Server-side Caching at the API level. For queries with common arguments or a restricted set of arguments, you can also enable caching at the resolver level to improve overall responsiveness.
- For general caching purposes, Amazon ElastiCache supports a variety of caching patterns through in-memory key-value stores like Redis and Memcached engines.
- Define what is safe to cache, the TTL, and an eviction policy that fits your baseline performance and access patterns, to ensure that you do not serve stale records or cache data that requires strongly consistent reads.
Enabling Amazon API Gateway Caching
Use cases for Amazon DynamoDB Accelerator
Amazon ElastiCache caching and time-to-live strategies
Serverless Hero: Caching Serverless Applications
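The TTL guidance above boils down to a simple pattern: serve from cache while an entry is fresh, and reload once it expires. A minimal sketch, independent of any particular cache backend:

```python
import time

# Minimal TTL cache sketch: entries expire after ttl_s seconds, forcing
# a fresh read for data that must not be stale. In production this role
# is typically played by ElastiCache, DAX, or API Gateway caching.
class TTLCache:
    def __init__(self, ttl_s: float):
        self.ttl_s = ttl_s
        self._store = {}  # key -> (expires_at, value)

    def get(self, key, loader):
        """Return the cached value, or call loader() and cache the result."""
        now = time.monotonic()
        entry = self._store.get(key)
        if entry and entry[0] > now:
            return entry[1]  # fresh hit: skip the backend entirely
        value = loader()
        self._store[key] = (now + self.ttl_s, value)
        return value

cache = TTLCache(ttl_s=60.0)
calls = []
first = cache.get("user:42", lambda: calls.append(1) or "profile")
second = cache.get("user:42", lambda: calls.append(1) or "profile")
assert first == second == "profile" and len(calls) == 1  # one backend call
```

Choosing `ttl_s` is the policy decision: long enough to absorb repeated reads, short enough that staleness stays within what the use case tolerates.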
- Utilize Amazon DynamoDB queries over scans, and use both Global Secondary Indexes (GSIs) and composite sort keys to help you query hierarchical relationships in your data.
- Consider AWS AppSync and GraphQL for interactive web applications, mobile, real-time, or other use cases where data drives the user interface. It provides data-fetching flexibility, where your client can query only for the data it needs, in the format it needs it in. However, be mindful of deeply nested queries, where a response may take several seconds and possibly result in timeouts. Additionally, GraphQL helps you adapt access patterns as your workload evolves, making it more flexible to use purpose-built databases at any point in time.
- If your content supports deflate, gzip or identity content encoding, enable payload compression in Amazon API Gateway.
- Amazon Kinesis Data Firehose supports compressing streaming data using gzip, snappy, or zip. Amazon Kinesis Data Firehose also supports converting your streaming data from JSON to Apache Parquet or ORC. This can help improve performance and reduce data storage costs.
Best Practices when using Amazon Athena with AWS Glue
Enabling payload compression in Amazon API Gateway
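To get a feel for the payload-size reduction that gzip content encoding provides, the sketch below compresses a repetitive JSON response of the kind an API commonly returns. The exact ratio depends on the payload; repetitive structures compress especially well:

```python
import gzip
import json

# Illustrate the size reduction gzip content encoding gives a typical
# repetitive JSON API response. Field values are made-up sample data.
payload = json.dumps(
    [{"orderId": i, "status": "SHIPPED"} for i in range(200)]
).encode("utf-8")

compressed = gzip.compress(payload)
assert len(compressed) < len(payload)  # gzip shrinks repetitive JSON
```

On the wire this trades a few CPU cycles on each side for fewer bytes transferred, which usually wins for clients on slow or distant networks.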