REL 1: How do you limit an individual tenant’s ability to impose load that may impact availability for other tenants of your system?

Identify tenants that are consuming resources at a rate that could undermine the overall stability and availability of your system. Use this data and/or scaling policies to limit the load these tenants can place on the system to prevent a large-scale outage, which could cascade across all tenants of your system.

Resources

AWS re:Invent 2017: GPS: SaaS Monitoring - Creating a Unified View of Multi-tenant Health featuring New Relic (GPSTEC309)
Monolith to serverless SaaS: Migrating to multi-tenant architecture
Partitioning Pooled Multi-Tenant SaaS Data with Amazon DynamoDB
AWS re:Invent 2019: [REPEAT] Scaling up to your first 10 million users (ARC211-R)
Architecting Successful SaaS: Interacting with Your SaaS Customer’s Cloud Accounts
AWS Auto Scaling https://aws.amazon.com/autoscaling/

Best Practices:

Use throttling policies to limit the effect that noisy tenants have on the system: Define strategies for capturing and identifying tenants that are imposing excess load on the system that might not be supported at scale. Apply throttling policies to help ensure that these noisy neighbor tenants do not impact the availability of the other tenants of your system.
Partition tenant load to limit the area of effect: Identify partitioning strategies that can effectively distribute or isolate tenant loads, so that the underlying resources (compute, storage, etc.) can limit access, scale, and absorb spiky tenant loads.
Define SLAs for each tenant tier: Limit the area of effect of lower tier tenants by introducing SLAs that are configured for each tenant tier supported by your system. Use SLAs as part of a throttling strategy to tightly control the level of activity and load a tenant can place on the system.

Improvement Plan

Use throttling policies to limit the effect that noisy tenants have on the system

Use scaling policies to add capacity in anticipation of tenant spikes

For compute-intensive resources, add cluster capacity to prevent highly active or bursty tenants from impacting availability by saturating the available resources.
For pooled compute resources running on Amazon EC2, define automatic scaling policies that prevent resources from being exhausted by a spike that may be introduced by any one tenant.
For container resources, ensure that tenant clusters include additional capacity to support scenarios where a tenant may place excessive load on the system.
For AWS Lambda-based SaaS environments, consider concurrency provisioning strategies (for example, reserved concurrency) to help ensure that resources remain available for tenants in a noisy neighbor condition.
Using Amazon SQS in a Multi-Tenant SaaS Solution
Creating and using usage plans with API keys
AWS re:Invent 2019: Building serverless SaaS on AWS (ARC410-R)
AWS Auto Scaling
Managing Concurrency for a Lambda Function

Detect and throttle any tenant that is generating load that might impact availability of your system

For systems using Amazon API Gateway, introduce a usage plan that evaluates tenant consumption and identifies any tenant that may be generating a request load that could impact the availability of your system. Use the throttling capabilities of API Gateway to limit the number of requests this tenant can make on the system.
Implement application-enforced throttling for the services of your application, using the request processing mechanisms of your stack or the throttling capabilities of third-party tools.
Amazon API Gateway: Throttle API requests for better throughput
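Application-enforced throttling, as described above, is often built on a per-tenant token bucket. The following is a minimal sketch under stated assumptions, not a definitive implementation: the class name `TenantThrottle`, its parameters, and the single shared rate for all tenants are hypothetical, and the state is held in process memory, so a multi-instance service would need a shared store instead.

```python
import time


class TenantThrottle:
    """Per-tenant token bucket (hypothetical helper, in-memory only).

    Each tenant gets its own bucket that refills at `rate` tokens per
    second up to a maximum of `burst` tokens. One request costs one token.
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock          # injectable so the logic can be tested
        self._buckets = {}          # tenant_id -> (tokens, last_refill_time)

    def allow(self, tenant_id):
        """Return True if the tenant may proceed, False if it is throttled."""
        now = self.clock()
        # A tenant seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(tenant_id, (self.burst, now))
        # Refill for the elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[tenant_id] = (tokens, now)
        return allowed


if __name__ == "__main__":
    throttle = TenantThrottle(rate=5.0, burst=10.0)
    # A request handler would typically reject throttled calls (HTTP 429).
    for i in range(12):
        print(i, throttle.allow("tenant-a"))
```

Because each tenant has its own bucket, a noisy tenant exhausts only its own tokens and other tenants continue to be served.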

Partition tenant load to limit the area of effect

Design your multi-tenant microservices to prevent potential tenant bottlenecks that could impact availability

The design and granularity of your microservices are directly influenced by the multi-tenant availability and scaling footprint you’re targeting. Create more granular services that provide a broader range of tenant partitioning strategies to more effectively target reliability.
Use a mix of silo and pooled partitioning models based on the availability and scaling profile of your application.
Consider supporting a model where higher end tiers might have siloed deployments for key services to maximize availability for higher value tenants.
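One way to picture a mixed silo and pool model is a small routing function that decides, per service, whether a tenant is served by a dedicated deployment or by the shared pool. This is only a sketch: the tier name `premium`, the deployment naming scheme, and the function `resolve_deployment` are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical: tiers whose tenants get siloed deployments for key services.
SILOED_TIERS = {"premium"}


@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    tier: str


def resolve_deployment(tenant, service, siloed_services):
    """Return the name of the deployment that should serve this tenant.

    Only services listed in `siloed_services` are siloed, and only for
    tenants in a siloed tier. Everything else goes to the shared pool.
    """
    if tenant.tier in SILOED_TIERS and service in siloed_services:
        return f"{service}-silo-{tenant.tenant_id}"
    return f"{service}-pool"


if __name__ == "__main__":
    key_services = {"orders"}
    print(resolve_deployment(Tenant("t1", "premium"), "orders", key_services))
    print(resolve_deployment(Tenant("t2", "basic"), "orders", key_services))
    print(resolve_deployment(Tenant("t1", "premium"), "catalog", key_services))
```

Keeping the decision in one place makes it possible to silo only the services where isolation matters most, while less critical services stay pooled for every tier.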

Optimize for the most active tenant or higher end tiers

Enhance key workflows in the system through the introduction of scaling and performance optimizations that can reduce the load placed on the system. One method, for example, is to offer data caching to highly active tenants to reduce their burden on the system.
Designing for failure: Architecting for resilient systems on AWS
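The caching idea mentioned above could look roughly like the following: a time-limited cache that is switched on only for tenants flagged as highly active. The class `TenantCache`, the `enabled_tenants` set, and the TTL policy are all hypothetical, and a production system would more likely use a shared cache service than process memory.

```python
import time


class TenantCache:
    """TTL cache that is active only for selected tenants (illustrative)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock              # injectable for testing
        self.enabled_tenants = set()    # tenants that get caching
        self._entries = {}              # (tenant_id, key) -> (expires_at, value)

    def get_or_load(self, tenant_id, key, loader):
        """Return a cached value, or call `loader()` and cache its result."""
        if tenant_id not in self.enabled_tenants:
            # Caching is not offered to this tenant: always hit the backend.
            return loader()
        now = self.clock()
        entry = self._entries.get((tenant_id, key))
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()
        # Keys include the tenant ID so tenants never see each other's data.
        self._entries[(tenant_id, key)] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    cache = TenantCache(ttl_seconds=30)
    cache.enabled_tenants.add("busy-tenant")
    print(cache.get_or_load("busy-tenant", "report", lambda: "expensive result"))
```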

Define SLAs for each tenant tier

Define the tenant tiers and their mapping to application SLAs

Define explicit SLA policies for each of the tenant tiers supported by your system, and define how these SLAs can vary across the range of experiences and workflows that are part of your system.
For systems that include an API, outline the number of requests this API can support for each tier of your application.
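Per-tier request limits can be captured as a small table that the rest of the system reads from. The sketch below assumes three hypothetical tiers with made-up numbers, and shapes the output like the `throttle` and `quota` settings of an API Gateway usage plan in case that is the enforcement mechanism; the names `TierLimits`, `TIER_LIMITS`, and `usage_plan_request` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierLimits:
    rate_limit: float    # steady-state requests per second
    burst_limit: int     # short burst allowance
    monthly_quota: int   # total requests allowed per month


# Hypothetical tiers and numbers, for illustration only.
TIER_LIMITS = {
    "basic": TierLimits(rate_limit=10.0, burst_limit=20, monthly_quota=100_000),
    "standard": TierLimits(rate_limit=50.0, burst_limit=100, monthly_quota=1_000_000),
    "premium": TierLimits(rate_limit=200.0, burst_limit=400, monthly_quota=10_000_000),
}


def usage_plan_request(tier):
    """Build usage-plan settings for one tier as a plain dictionary."""
    limits = TIER_LIMITS[tier]
    return {
        "name": f"{tier}-tier-plan",
        "throttle": {
            "rateLimit": limits.rate_limit,
            "burstLimit": limits.burst_limit,
        },
        "quota": {"limit": limits.monthly_quota, "period": "MONTH"},
    }


if __name__ == "__main__":
    for tier in TIER_LIMITS:
        print(usage_plan_request(tier))
```

The resulting dictionary could be passed as keyword arguments when creating a usage plan through an AWS SDK, with the API stages and each tenant's API key attached separately.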

Introduce a mechanism to manage and enforce these SLAs

If your system uses API Gateway, define separate usage plans for each of the tenant tiers that your system supports.
When SLAs are not met by your system, publish events to enable the proactive discovery of these potential tenant load issues.
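Publishing an event when an SLA is missed could be sketched as follows. The example assumes the SLA is expressed as a p99 latency target per tier, which the source does not specify; the targets in `SLA_P99_MS`, the event fields, and the function names are hypothetical, and the returned dictionary stands in for whatever event bus or queue the system actually uses.

```python
import math

# Hypothetical p99 latency targets per tier, in milliseconds.
SLA_P99_MS = {"basic": 1000.0, "premium": 250.0}


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer 1-100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def sla_breach_event(tenant_id, tier, latencies_ms):
    """Return an event dict if the tenant's p99 latency misses its tier SLA.

    Returns None when there are no samples or the SLA is met.
    """
    if not latencies_ms:
        return None
    observed = percentile(latencies_ms, 99)
    target = SLA_P99_MS[tier]
    if observed <= target:
        return None
    return {
        "type": "SlaBreach",
        "tenantId": tenant_id,
        "tier": tier,
        "metric": "p99_latency_ms",
        "observed": observed,
        "target": target,
    }


if __name__ == "__main__":
    samples = [100.0] * 90 + [400.0] * 10
    print(sla_breach_event("t1", "premium", samples))
    print(sla_breach_event("t2", "basic", samples))
```

Carrying the tenant ID and tier in the event lets operators see whether the breach is isolated to one tenant or spread across a tier.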

Surface potential SLA issues as part of the operational experience

Use AWS operational tools or APN Partner solutions, such as Datadog, New Relic, and AppDynamics, to provide SLA insights.
Display SLA and load challenges on internally built operational dashboards.

Use separate AWS Lambda reserved concurrency configurations for tenant tiers to help ensure that tenant consumption does not exceed the target consumption profile for a given tier.
AWS re:Invent 2019: Serverless SaaS deep dive: Building serverless SaaS on AWS (ARC410-R)
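As a rough illustration of per-tier reserved concurrency, the sketch below splits an account's concurrency among tiers by weight and produces one request per tier-specific function. Several things here are assumptions: that each tier is served by its own Lambda function, the weights and function names, and the `unreserved_floor` default that keeps part of the account limit unreserved. The output dictionaries mirror the `FunctionName` and `ReservedConcurrentExecutions` parameters of Lambda's PutFunctionConcurrency operation, but no AWS call is made.

```python
def reserved_concurrency_by_tier(account_limit, tier_weights, unreserved_floor=100):
    """Split reservable concurrency among tiers in proportion to their weights.

    `unreserved_floor` is the amount of account concurrency left unreserved
    for all other functions (the default here is an assumption).
    """
    reservable = account_limit - unreserved_floor
    if reservable <= 0:
        raise ValueError("account limit leaves nothing to reserve")
    total_weight = sum(tier_weights.values())
    # Integer division rounds down, so the total never exceeds `reservable`.
    return {
        tier: reservable * weight // total_weight
        for tier, weight in tier_weights.items()
    }


def concurrency_requests(function_names, plan):
    """Build one request dict per tier function from the computed plan."""
    return [
        {"FunctionName": function_names[tier], "ReservedConcurrentExecutions": count}
        for tier, count in plan.items()
    ]


if __name__ == "__main__":
    plan = reserved_concurrency_by_tier(1000, {"basic": 1, "premium": 2})
    names = {"basic": "orders-basic", "premium": "orders-premium"}  # hypothetical
    for request in concurrency_requests(names, plan):
        print(request)
```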