REL 1: How do you regulate inbound request rates?

Defining, analyzing, and enforcing inbound request rates helps achieve better throughput. Regulation helps you adapt different scaling mechanisms based on customer demand.

Resources

Account Level Throttling
Amazon API Gateway Limits

Best Practices:

Use throttling to control inbound request rates: Use throttling limits to control inbound requests by setting steady-state and burst rate limits.
Use mechanisms to protect non-scalable resources: Functions can scale faster than traditional resources, such as relational databases and cache systems. Protect non-scalable resources by adapting fast-scaling components to the throughput of downstream systems.
Use, analyze, and enforce API quotas: API quotas limit the maximum number of requests that can be submitted within a specified time interval with a given API key.

Improvement Plan

Use throttling to control inbound request rates

Identify the steady-state and burst request rates that your workload can sustain at any point in time before performance degrades.

Perform load testing for a sustained period of time, gradually increasing traffic to determine your steady-state rate of requests.
Use a burst strategy with no ramp-up to determine the burst rates that your workload can serve without errors or performance degradation.
AWS Marketplace: Gatling FrontLine Load Testing
Amazon Partner: BlazeMeter Load Testing
Amazon Partner: Apica Load Testing
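
One way to read the results of a gradual ramp-up test is sketched below. It assumes a hypothetical record layout (`LoadStep`) and hypothetical thresholds (1% errors, 500 ms p99 latency); none of these come from a specific load-testing tool, so substitute the metrics and limits that matter for your workload.

```python
from typing import List, NamedTuple, Optional


class LoadStep(NamedTuple):
    """One step of a ramp-up load test (hypothetical record layout)."""
    rps: int                # offered requests per second during the step
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float   # 99th percentile latency observed in the step


def sustainable_rate(steps: List[LoadStep],
                     max_error_rate: float = 0.01,
                     max_p99_ms: float = 500.0) -> Optional[int]:
    """Return the highest rate reached before the first step that breaches a threshold."""
    best = None
    for step in sorted(steps, key=lambda s: s.rps):
        if step.error_rate > max_error_rate or step.p99_latency_ms > max_p99_ms:
            break  # performance degraded; keep the previous step's rate
        best = step.rps
    return best


# Illustrative numbers only, not measurements.
load_test = [
    LoadStep(100, 0.000, 120.0),
    LoadStep(200, 0.001, 150.0),
    LoadStep(400, 0.004, 310.0),
    LoadStep(800, 0.060, 1900.0),  # errors and latency exceed the thresholds
]
steady_state = sustainable_rate(load_test)
print(steady_state)  # 400
```

The same function can be applied to the results of a no-ramp-up test to estimate a burst rate, provided each step records the peak rate that was offered.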

Throttle inbound requests using the steady-state and burst rates that you identified.

Enable throttling for individual Amazon API Gateway APIs, API stages, or per method to improve overall performance across all APIs in your account. This restricts the overall request submissions so that they don't go past the account-level throttling limits.
You can also throttle requests by introducing Amazon Kinesis Data Streams or Amazon SQS as a buffering layer.
Amazon Kinesis can limit the number of requests at the shard level, while Amazon SQS can limit them at the consumer level.
For AWS Lambda functions, Amazon Kinesis can effectively control concurrency at the shard level, meaning that by default a single shard has a single concurrent invocation at any time.
Throttle API requests for better throughput
Amazon Kinesis Data Stream as an event source for AWS Lambda
Amazon SQS as an event source for AWS Lambda
Serverless Hero: Distributing and Throttling events with queues and message filtering
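
The relationship between a steady-state rate and a burst limit can be illustrated with a token bucket, which is the model the API Gateway documentation uses to describe its throttling. The sketch below is a simplified local model, not API Gateway's implementation; the class name, the explicit `now` parameter, and the example values (2 requests per second, burst of 5) are assumptions chosen for illustration.

```python
class TokenBucket:
    """Token bucket: `rate` tokens are added per second, up to `burst` tokens."""

    def __init__(self, rate: float, burst: int, now: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)   # start full so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) is accepted."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False   # a real API would answer with HTTP 429 Too Many Requests


bucket = TokenBucket(rate=2.0, burst=5)
# Ten requests arrive at the same instant: only the burst capacity is accepted.
accepted_burst = sum(bucket.allow(now=0.0) for _ in range(10))
# One second later, the steady-state rate has refilled two tokens.
accepted_later = sum(bucket.allow(now=1.0) for _ in range(10))
print(accepted_burst, accepted_later)  # 5 2
```

Time is passed in explicitly so the behavior is deterministic; a production limiter would read a monotonic clock instead.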

Use mechanisms to protect non-scalable resources

Limit a component's throughput by enforcing how many transactions it can accept, either directly or through buffer mechanisms such as queues and streams.

For relational databases such as Amazon RDS, you can limit the number of connections per user in addition to the global maximum number of connections.
Cache results and only connect and fetch data from databases when needed.
Adjust the maximum number of connections for caching systems, and include a cache expiration mechanism to prevent serving stale records.
Amazon Kinesis Data Streams controls concurrency at the shard level, meaning that by default a single shard has a single concurrent invocation, which reduces downstream calls to non-scalable resources such as a traditional database.
Amazon Kinesis Data Streams also supports batch windows of up to 5 minutes and configurable batch sizes; whichever limit is reached first determines how frequently invocations occur.
Use the AWS Lambda reserved concurrency feature on your function to both reserve and limit the maximum concurrency it can achieve, if necessary.
Caching implementation patterns and considerations
Serverless Hero: Managing database connections with AWS Lambda
Reserving AWS Lambda function concurrency
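
The effect of capping concurrency in front of a non-scalable resource can be sketched with a simple in-flight counter. This is a single-threaded illustration of the idea behind a concurrency limit, not how AWS Lambda implements reserved concurrency; `ConcurrencyGate` and the limit of 3 are hypothetical.

```python
class ConcurrencyGate:
    """Rejects work once `limit` executions are in flight (non-blocking, not thread-safe)."""

    def __init__(self, limit: int) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.in_flight = 0

    def try_acquire(self) -> bool:
        """Claim a slot, or return False when the downstream resource is at capacity."""
        if self.in_flight >= self.limit:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        """Free a slot when an execution finishes."""
        if self.in_flight == 0:
            raise RuntimeError("release() without a matching try_acquire()")
        self.in_flight -= 1


# Hypothetical figure: the database tolerates at most 3 connections from this function.
gate = ConcurrencyGate(limit=3)
outcomes = [gate.try_acquire() for _ in range(5)]
gate.release()                     # one execution finishes
after_release = gate.try_acquire()
print(outcomes, after_release)     # [True, True, True, False, False] True
```

Rejected work is not lost if a queue or stream sits in front of the function, since the buffered items remain available for a later attempt.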

Use, analyze, and enforce API quotas

Define whether your API consumers are end users or machines.

Segregate your API consumers' steady-state request rates and quotas into multiple buckets or tiers.

Amazon API Gateway Usage Plans allow your API consumer to access selected APIs at agreed-upon request rates and quotas that meet their business requirements and budget constraints.
Create and attach API keys to usage plans to control access to certain API stages.
Extract utilization data from usage plans to analyze API usage on a per-API-key basis, generate billing documents, and determine whether your customers need higher or lower limits.
Have a mechanism to allow customers to pre-emptively request higher limits, so they can be proactive when they anticipate increased use of your APIs.
Amazon API Gateway Lambda Authorizers can dynamically associate API keys with a given request. This is ideal for scenarios where you don’t control API consumers or want to associate API keys based on your own criteria.
Create and use Usage Plans with API keys
Usage Plan API key output from Lambda Authorizers
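
The tiering and quota analysis described above can be modeled locally as follows. This is a minimal sketch under stated assumptions: the `Tier` and `QuotaTracker` types, the tier names, and all limits are hypothetical and only mirror the kind of settings a usage plan holds (rate, burst, quota); it does not call API Gateway.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Tier:
    """Hypothetical usage tier: throttle settings plus a request quota per period."""
    rate_limit: float    # steady-state requests per second
    burst_limit: int     # maximum burst size
    quota: int           # maximum requests per quota period


@dataclass
class QuotaTracker:
    tiers: Dict[str, Tier]
    key_tier: Dict[str, str]                        # API key -> tier name
    used: Dict[str, int] = field(default_factory=dict)

    def record(self, api_key: str) -> bool:
        """Count one request; return False once the key's quota is exhausted."""
        tier = self.tiers[self.key_tier[api_key]]
        count = self.used.get(api_key, 0)
        if count >= tier.quota:
            return False
        self.used[api_key] = count + 1
        return True

    def utilization(self, api_key: str) -> float:
        """Fraction of the quota consumed; high values suggest the key needs a higher tier."""
        tier = self.tiers[self.key_tier[api_key]]
        return self.used.get(api_key, 0) / tier.quota


tracker = QuotaTracker(
    tiers={"basic": Tier(5.0, 10, quota=3), "premium": Tier(50.0, 100, quota=1000)},
    key_tier={"key-a": "basic", "key-b": "premium"},
)
accepted = [tracker.record("key-a") for _ in range(4)]
tracker.record("key-b")
print(accepted, tracker.utilization("key-a"), tracker.utilization("key-b"))
# [True, True, True, False] 1.0 0.001
```

A key that repeatedly reaches a utilization of 1.0 is a candidate for a higher tier, while consistently low utilization may indicate that a lower tier fits the consumer's budget.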