REL 7: How do you design your workload to adapt to changes in demand?
A scalable workload provides elasticity to add or remove resources automatically so that they closely match the current demand at any given point in time.
Resources
AWS Auto Scaling: How Scaling Plans Work
What Is Amazon EC2 Auto Scaling?
Managing Throughput Capacity Automatically with DynamoDB Auto Scaling
What is Amazon CloudFront?
Distributed Load Testing on AWS: simulate thousands of connected users
Telling Stories About Little's Law
AWS Marketplace: products that can be used with auto scaling
APN Partner: partners that can help you create automated compute solutions
Best Practices:
- Use automation when obtaining or scaling resources: When replacing impaired resources or scaling your workload, automate the process by using managed AWS services, such as Amazon S3 and AWS Auto Scaling. You can also use third-party tools and AWS SDKs to automate scaling.
- Obtain resources upon detection of impairment to a workload: Scale resources reactively when necessary if availability is impacted, to restore workload availability.
- Obtain resources upon detection that more resources are needed for a workload: Scale resources proactively to meet demand and avoid availability impact.
- Load test your workload: Adopt a load testing methodology to measure if scaling activity meets workload requirements.
Improvement Plan
Use automation when obtaining or scaling resources
What is AWS Auto Scaling?
- Configure Auto Scaling on your Amazon EC2 instances and Spot Fleets, Amazon ECS tasks, Amazon DynamoDB tables and indexes, Amazon Aurora Replicas, and AWS Marketplace appliances as applicable.
Managing Throughput Capacity Automatically with DynamoDB Auto Scaling
- Use service API operations to specify the alarms, scaling policies, warm-up times, and cool-down times.
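For example, the following Python (boto3) sketch registers a DynamoDB table's read capacity as a scalable target and attaches a target tracking policy. The table name, capacity limits, and cooldown values are illustrative assumptions, not recommendations.

    import boto3

    # Application Auto Scaling manages DynamoDB provisioned throughput.
    aas = boto3.client("application-autoscaling")

    # Register the table's read capacity as a scalable target (table name "Orders" is hypothetical).
    aas.register_scalable_target(
        ServiceNamespace="dynamodb",
        ResourceId="table/Orders",
        ScalableDimension="dynamodb:table:ReadCapacityUnits",
        MinCapacity=5,
        MaxCapacity=500,
    )

    # Target tracking keeps consumed read capacity near 70% of provisioned capacity;
    # the policy creates the required CloudWatch alarms on your behalf.
    aas.put_scaling_policy(
        PolicyName="orders-read-target-tracking",
        ServiceNamespace="dynamodb",
        ResourceId="table/Orders",
        ScalableDimension="dynamodb:table:ReadCapacityUnits",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 70.0,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
            },
            "ScaleInCooldown": 60,   # seconds to wait after scaling in
            "ScaleOutCooldown": 60,  # seconds to wait after scaling out
        },
    )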
What is Elastic Load Balancing?
- Application Load Balancers can distribute load by path.
What is an Application Load Balancer?
- Configure an Application Load Balancer to distribute traffic to different workloads based on the path under the domain name.
- Application Load Balancers can be used to distribute loads in a manner that integrates with AWS Auto Scaling to manage demand; a minimal sketch follows below.
Using a load balancer with an Auto Scaling group
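As one illustration of both points, the following boto3 sketch adds a path-based routing rule to an existing Application Load Balancer listener and attaches an Auto Scaling group to the target group it forwards to. The ARNs and the Auto Scaling group name are placeholders.

    import boto3

    elbv2 = boto3.client("elbv2")
    autoscaling = boto3.client("autoscaling")

    # Placeholder ARNs for an existing listener and target group.
    listener_arn = "arn:aws:elasticloadbalancing:...:listener/app/example/..."
    api_target_group_arn = "arn:aws:elasticloadbalancing:...:targetgroup/api/..."

    # Forward requests under /api/* to the API target group.
    elbv2.create_rule(
        ListenerArn=listener_arn,
        Priority=10,
        Conditions=[{"Field": "path-pattern", "Values": ["/api/*"]}],
        Actions=[{"Type": "forward", "TargetGroupArn": api_target_group_arn}],
    )

    # Register an Auto Scaling group with the target group so that instances the
    # group launches or terminates are added to or removed from the load balancer.
    autoscaling.attach_load_balancer_target_groups(
        AutoScalingGroupName="api-asg",  # hypothetical group name
        TargetGroupARNs=[api_target_group_arn],
    )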
- Network Load Balancers can distribute load by connection.
What is a Network Load Balancer?
- Configure a Network Load Balancer to distribute traffic to different workloads using TCP, or to have a constant set of IP addresses for your workload.
- Network Load Balancers can be used to distribute loads in a manner that integrates with AWS Auto Scaling to manage demand.
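The following boto3 sketch creates a Network Load Balancer with a TCP listener in front of a TCP target group. The subnet, VPC, and name values are placeholders.

    import boto3

    elbv2 = boto3.client("elbv2")

    # Create the Network Load Balancer (one static IP address per subnet/Availability Zone).
    nlb = elbv2.create_load_balancer(
        Name="example-nlb",
        Type="network",
        Scheme="internet-facing",
        Subnets=["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],  # placeholders
    )
    nlb_arn = nlb["LoadBalancers"][0]["LoadBalancerArn"]

    # TCP target group for the workload's instances.
    tg = elbv2.create_target_group(
        Name="example-tcp-targets",
        Protocol="TCP",
        Port=443,
        VpcId="vpc-0123456789abcdef0",  # placeholder
        TargetType="instance",
    )
    tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]

    # TCP listener that forwards connections to the target group.
    elbv2.create_listener(
        LoadBalancerArn=nlb_arn,
        Protocol="TCP",
        Port=443,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
    )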
- Use Amazon Route 53 or a trusted DNS provider.
What is Amazon Route 53?
- Use Route 53 to manage your CloudFront distributions and load balancers.
- Determine the domains and subdomains you are going to manage.
- Create appropriate record sets using ALIAS or CNAME records.
Working with records
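For instance, the following boto3 sketch upserts an ALIAS A record that points a domain at an Application Load Balancer. The hosted zone IDs, domain, and load balancer DNS name are placeholders you would look up for your own resources.

    import boto3

    route53 = boto3.client("route53")

    route53.change_resource_record_sets(
        HostedZoneId="Z0EXAMPLE1234",  # your public hosted zone ID (placeholder)
        ChangeBatch={
            "Comment": "Point the apex domain at the Application Load Balancer",
            "Changes": [
                {
                    "Action": "UPSERT",
                    "ResourceRecordSet": {
                        "Name": "example.com.",
                        "Type": "A",
                        "AliasTarget": {
                            # The load balancer's canonical hosted zone ID and DNS name,
                            # available from the Elastic Load Balancing API or console.
                            "HostedZoneId": "ZLBEXAMPLE5678",
                            "DNSName": "example-alb-1234567890.us-east-1.elb.amazonaws.com.",
                            "EvaluateTargetHealth": True,
                        },
                    },
                }
            ],
        },
    )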
- AWS Global Accelerator is a service that improves the availability and performance of your applications with local or global users. It provides static IP addresses that act as a fixed entry point to your application endpoints in a single or multiple AWS Regions, such as your Application Load Balancers, Network Load Balancers, or Amazon EC2 instances.
What Is AWS Global Accelerator?
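A minimal boto3 sketch for fronting an existing Application Load Balancer with AWS Global Accelerator might look like the following. The accelerator name, Region, and endpoint ARN are assumptions, and the Global Accelerator control-plane API is called through the us-west-2 Region.

    import boto3

    # The Global Accelerator control-plane API is served from us-west-2.
    ga = boto3.client("globalaccelerator", region_name="us-west-2")

    accelerator = ga.create_accelerator(Name="example-accelerator", Enabled=True)
    accelerator_arn = accelerator["Accelerator"]["AcceleratorArn"]

    # Listen for TCP traffic on port 443 at the accelerator's static IP addresses.
    listener = ga.create_listener(
        AcceleratorArn=accelerator_arn,
        Protocol="TCP",
        PortRanges=[{"FromPort": 443, "ToPort": 443}],
    )

    # Route traffic to an Application Load Balancer endpoint in one Region.
    ga.create_endpoint_group(
        ListenerArn=listener["Listener"]["ListenerArn"],
        EndpointGroupRegion="us-east-1",
        EndpointConfigurations=[
            {
                "EndpointId": "arn:aws:elasticloadbalancing:...:loadbalancer/app/example/...",
                "Weight": 128,
            }
        ],
    )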
What is Amazon CloudFront?
- Configure Amazon CloudFront distributions for your workloads, or use a third-party CDN.
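As a rough illustration, the following boto3 sketch creates a distribution with a single custom origin. The origin domain name is a placeholder, and the cache policy ID is assumed to be the managed CachingOptimized policy; verify the current ID in your account before use.

    import boto3

    cloudfront = boto3.client("cloudfront")

    cloudfront.create_distribution(
        DistributionConfig={
            "CallerReference": "workload-cdn-001",  # any unique string per creation request
            "Comment": "CDN in front of the workload's load balancer",
            "Enabled": True,
            "Origins": {
                "Quantity": 1,
                "Items": [
                    {
                        "Id": "alb-origin",
                        "DomainName": "example-alb-1234567890.us-east-1.elb.amazonaws.com",
                        "CustomOriginConfig": {
                            "HTTPPort": 80,
                            "HTTPSPort": 443,
                            "OriginProtocolPolicy": "https-only",
                        },
                    }
                ],
            },
            "DefaultCacheBehavior": {
                "TargetOriginId": "alb-origin",
                "ViewerProtocolPolicy": "redirect-to-https",
                # Assumed ID of the managed CachingOptimized cache policy; confirm before use.
                "CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6",
            },
        }
    )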
Obtain resources upon detection of impairment to a workload
- Use scaling plans, which are the core component of AWS Auto Scaling. A scaling plan is where you configure a set of instructions for scaling your resources. If you work with AWS CloudFormation or add tags to AWS resources, you can set up scaling plans for different sets of resources, per application. AWS Auto Scaling provides recommendations for scaling strategies customized to each resource. After you create your scaling plan, AWS Auto Scaling combines dynamic scaling and predictive scaling methods together to support your scaling strategy.
AWS Auto Scaling: How Scaling Plans Work
- Amazon EC2 Auto Scaling helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You create collections of EC2 instances, called Auto Scaling groups. You can specify the minimum number of instances in each Auto Scaling group, and Amazon EC2 Auto Scaling ensures that your group never goes below this size. You can specify the maximum number of instances in each Auto Scaling group, and Amazon EC2 Auto Scaling ensures that your group never goes above this size. A minimal sketch follows at the end of this list.
What Is Amazon EC2 Auto Scaling?
- Amazon DynamoDB auto scaling uses the AWS Application Auto Scaling service to dynamically adjust provisioned throughput capacity on your behalf, in response to actual traffic patterns. This enables a table or a global secondary index to increase its provisioned read and write capacity to handle sudden increases in traffic, without throttling.
Managing Throughput Capacity Automatically with DynamoDB Auto Scaling
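The boto3 sketch below creates an Auto Scaling group with explicit minimum and maximum sizes and attaches a target tracking policy that scales on average CPU utilization. The launch template, subnets, and thresholds are illustrative assumptions.

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Auto Scaling group with hard lower and upper bounds on instance count.
    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="web-asg",  # hypothetical group name
        LaunchTemplate={"LaunchTemplateName": "web-template", "Version": "$Latest"},
        MinSize=2,
        MaxSize=10,
        DesiredCapacity=2,
        VPCZoneIdentifier="subnet-0123456789abcdef0,subnet-0fedcba9876543210",
    )

    # Target tracking policy: add or remove instances to keep average CPU near 50%.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="web-asg",
        PolicyName="keep-average-cpu-at-50",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 50.0,
        },
    )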
Obtain resources upon detection that more resources are needed for a workload
- Calculate how many compute resources you will need (compute concurrency) to handle a given request rate; a worked sketch follows at the end of this list.
Telling Stories About Little's Law
- When you have a historical pattern for usage, set up scheduled scaling for Amazon EC2 Auto Scaling.
Scheduled Scaling for Amazon EC2 Auto Scaling
- Use AWS predictive scaling.
Predictive Scaling for EC2, Powered by Machine Learning
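As a worked sketch of the first two points, the following Python estimates required concurrency from Little's Law (concurrency = arrival rate x average time in system) and then schedules a recurring scale-out for a known weekday peak. The request rate, latency, per-instance capacity, group name, and schedule are illustrative assumptions.

    import math

    import boto3

    # Little's Law: concurrency = arrival rate * average time in system.
    request_rate = 1000.0   # requests per second at the expected peak (assumed)
    average_latency = 0.2   # seconds per request (assumed)
    concurrency = request_rate * average_latency  # 200 requests in flight

    per_instance_concurrency = 25  # assumed concurrent requests one instance can serve
    instances_needed = math.ceil(concurrency / per_instance_concurrency)  # 8 instances

    # Schedule capacity ahead of the recurring weekday-morning peak.
    autoscaling = boto3.client("autoscaling")
    autoscaling.put_scheduled_update_group_action(
        AutoScalingGroupName="web-asg",  # hypothetical group name
        ScheduledActionName="weekday-morning-peak",
        Recurrence="0 8 * * 1-5",        # 08:00 UTC, Monday through Friday
        MinSize=instances_needed,
        MaxSize=instances_needed * 2,
        DesiredCapacity=instances_needed,
    )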
Load test your workload
Distributed Load Testing on AWS: simulate thousands of connected users
- Identify the mix of requests: You may have varied mixes of requests, so you should look at various time frames when identifying the mix of traffic.
- Implement a load driver: You can use custom code, open source, or commercial software to implement a load driver; a minimal sketch follows this list.
- Load test initially using small capacity: You see some immediate effects by driving load onto a smaller capacity, possibly as small as one instance or container.
- Load test against larger capacity: The effects will be different on a distributed load, so you must test against an environment that is as close to your production environment as possible.
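A minimal custom load driver along the lines of the sketch below can be enough for an initial small-capacity test. The target URL and concurrency values are assumptions; larger distributed tests are better run with a dedicated tool or the Distributed Load Testing on AWS solution.

    import concurrent.futures
    import time
    import urllib.request

    TARGET_URL = "https://test.example.com/api/health"  # hypothetical test endpoint
    CONCURRENCY = 20            # simulated concurrent users
    REQUESTS_PER_WORKER = 50    # requests each simulated user sends

    def worker():
        """Send requests sequentially and record each request's latency in seconds."""
        latencies = []
        for _ in range(REQUESTS_PER_WORKER):
            start = time.perf_counter()
            with urllib.request.urlopen(TARGET_URL, timeout=10) as response:
                response.read()
            latencies.append(time.perf_counter() - start)
        return latencies

    def main():
        started = time.perf_counter()
        with concurrent.futures.ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
            futures = [pool.submit(worker) for _ in range(CONCURRENCY)]
            latencies = [latency for future in futures for latency in future.result()]
        elapsed = time.perf_counter() - started

        latencies.sort()
        total = len(latencies)
        print(f"requests: {total}, throughput: {total / elapsed:.1f} req/s")
        print(f"p50: {latencies[total // 2] * 1000:.0f} ms, "
              f"p99: {latencies[int(total * 0.99)] * 1000:.0f} ms")

    if __name__ == "__main__":
        main()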