PERF 1: How do you prevent one tenant from adversely impacting the experience of another tenant?

Introduce scaling and throttling constructs to help ensure that no single tenant can put load on the system in a way that impacts the performance and experience of other tenants.

Resources

Optimizing SaaS Tenant Workflows and Costs Blog
SaaS Storage Strategies: Building a Multi-tenant Storage Model on AWS
Whitepaper: SaaS Solutions on AWS: Tenant Isolation Architectures
AWS re:Invent 2018: SaaS Reference: Review of Real-World Patterns & Strategies (GPSTEC302)
Microservices decomposition for SaaS environments (ARC210)

Best Practices:

Silo high demand tenant resources: Identify the potential bottlenecks in your system that might create noisy neighbor conditions and distribute these into separate siloes. The separation can happen across layers of your architecture, including compute, storage, messaging, and so on. Siloes should only be introduced at the layer that represents the bottleneck of the experience.
Decompose and deploy services in a pattern that aligns tenant loads with performance expectations: The design of your SaaS system has accounted for common usage scenarios, examined potential bottlenecks, and partitioned resources to ensure that the load is effectively distributed. You have aligned scaling behavior with the consumption profiles of tenants.
Combine tenant-aware policies with added capacity to address tenant spikes: Tenant-aware policies are used to identify potential spikes in tenant activity that could adversely impact performance. These policies are combined with a capacity strategy that adds scaling “cushion” to help ensure that scaling delays don’t impact tenant performance.
Use throttling policies to prevent individual tenants from placing excess load on the system: Introduce throttling policies that evaluate the activity trends of individual tenants and use their SLAs, tier, and plan to prevent saturation of one or more experiences in the system. Prevent lower tier tenants from impacting the performance of higher tier tenants.

Improvement Plan

Silo high demand tenant resources

Identify the workflows where the load of individual tenants could consume excess resources and impact the experience of another tenant. For each of these areas, create separate siloes where each tenant is given dedicated resources.

For a microservice-based architecture, services that represent potential performance bottlenecks can be deployed in a silo model, offering a dedicated experience for each tenant.
In scenarios where storage represents the primary bottleneck, create separate storage constructs for each tenant where the compute and data are not commingled.

Messaging constructs, such as Amazon SNS, Amazon SQS, and Amazon EventBridge, can be impacted by the load of individual tenants. In these cases, you might offer some or all tenants a dedicated message processing resource.

For Amazon EC2-based solutions, you might offer some or all tenants dedicated compute resources.
Microservices decomposition for SaaS environments (ARC210)
See section “Targeted Silo Isolation” in SaaS Tenant Isolation Strategies: Isolating Resources in a Multi-Tenant Environment
SaaS tenant isolation patterns (ARC372)

In cases where bottlenecks might span the entire tenant experience (compute, storage, messaging, etc.), consider supporting siloes that can span the entire stack for specific tenants.

For serverless SaaS environments, use reserved concurrency to silo the experience of specific tenant tiers, offering a larger pool of concurrent functions to premium tiers.
Serverless SaaS deep dive: Building serverless SaaS on AWS (ARC410)
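As a minimal sketch of the reserved concurrency approach, the snippet below assigns a concurrency budget to one AWS Lambda function per tenant tier. It assumes a deployment where each tier has its own copy of a function named "<service>-<tier>"; that naming scheme, the tier names, and the numeric budgets are illustrative assumptions, not values from this guidance. The client is passed in so the logic can be exercised without AWS credentials; in practice it would be `boto3.client("lambda")`, whose `put_function_concurrency` call takes the two keyword arguments used here.

```python
from typing import Dict, Protocol


class LambdaClient(Protocol):
    """The subset of the boto3 Lambda client that this sketch relies on."""

    def put_function_concurrency(
        self, *, FunctionName: str, ReservedConcurrentExecutions: int
    ) -> dict: ...


# Hypothetical per-tier budgets; size these against your account concurrency limit.
TIER_CONCURRENCY: Dict[str, int] = {"basic": 20, "standard": 50, "premium": 200}


def apply_tier_concurrency(
    client: LambdaClient,
    service: str,
    tiers: Dict[str, int] = TIER_CONCURRENCY,
) -> Dict[str, int]:
    """Reserve concurrency for each per-tier deployment of `service`.

    Assumes (hypothetically) one function per tier named "<service>-<tier>".
    Returns a mapping of function name to the concurrency that was reserved.
    """
    applied: Dict[str, int] = {}
    for tier, limit in tiers.items():
        function_name = f"{service}-{tier}"
        client.put_function_concurrency(
            FunctionName=function_name,
            ReservedConcurrentExecutions=limit,
        )
        applied[function_name] = limit
    return applied
```

Because reserved concurrency both guarantees and caps a function's concurrent executions, a burst from basic tier tenants in this layout cannot consume the pool set aside for premium tenants.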

Decompose and deploy services in a pattern that aligns tenant loads with performance expectations

Design your multi-tenant microservices to distribute and optimize your scaling profile.

The design and granularity of your microservices is directly influenced by the multi-tenant scaling footprint you’re targeting. Create more granular services that provide a broader range of tenant partitioning strategies that can more effectively target scale and limit noisy neighbor conditions.
Use a mix of silo and pooled partitioning models to limit a tenant’s ability to create bottlenecks on key services or resources.
Support a model where higher end tiers might have siloed deployments for key services to minimize noisy neighbor impacts.
Microservices decomposition for SaaS environments (ARC210)
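One way to picture a mixed silo and pool model is a small routing function that sends tenants in siloed tiers to a dedicated deployment of a service and everyone else to a shared one. This is only a sketch under stated assumptions: the `Tenant` type, the set of siloed tiers, and the endpoint naming under `internal.example.com` are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical: tiers whose tenants get a dedicated (siloed) deployment.
SILOED_TIERS = frozenset({"premium"})


@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    tier: str


def resolve_endpoint(tenant: Tenant, service: str) -> str:
    """Return the endpoint that should serve `tenant` for `service`.

    Siloed tiers resolve to a per-tenant deployment; all other tiers
    share a pooled deployment. Host names are placeholders.
    """
    if tenant.tier in SILOED_TIERS:
        return f"https://{service}-{tenant.tenant_id}.internal.example.com"
    return f"https://{service}-pool.internal.example.com"
```

Keeping the decision per service, rather than per tenant overall, lets you silo only the services that are actual bottlenecks while the rest of the stack stays pooled.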

Optimize for the most active tenant or higher end tiers.

Enhance key workflows in the system through the introduction of scaling and performance optimizations to reduce the load placed on the system. For example, you can offer data caching to highly active tenants to reduce their burden on the system.
AWS re:Invent 2016: Optimizing SaaS Solutions for AWS (ARC408)

Combine tenant-aware policies with added capacity to address tenant spikes

Use tenant consumption insights to define policies that react and respond to scenarios where individual tenants are putting heavier loads on the system.

Instrument your application with metrics that track tenant activity and correlate this activity with load to identify noisy neighbor conditions.
Define scaling policies that are triggered based on tenant consumption profiles, ensuring that scaling can effectively address spikes that are driven by individual tenants.
Optimizing SaaS Tenant Workflows and Costs
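The instrumentation described above could feed a simple check like the following, which computes each tenant's share of total load from tagged events and flags tenants above a threshold. It is a sketch, not a prescribed method: the event shape (tenant ID plus a numeric cost such as request count or CPU milliseconds) and the 50% default threshold are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tenant_load_share(events: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Return each tenant's fraction of the total cost across all events.

    Each event is a (tenant_id, cost) pair; `cost` is whatever load unit
    your metrics record (a hypothetical choice here).
    """
    totals: Dict[str, float] = defaultdict(float)
    for tenant_id, cost in events:
        totals[tenant_id] += cost
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {tenant: cost / grand_total for tenant, cost in totals.items()}


def noisy_tenants(
    events: Iterable[Tuple[str, float]], max_share: float = 0.5
) -> List[str]:
    """Return tenant IDs, sorted, whose share of load exceeds `max_share`."""
    shares = tenant_load_share(events)
    return sorted(tenant for tenant, share in shares.items() if share > max_share)
```

A result from `noisy_tenants` is a signal rather than a verdict; it becomes useful when correlated with latency or error metrics from the same time window.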

Add capacity to your baseline environment, adding a scaling buffer that can be used to offset bursts of activity from individual tenants.
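A scaling buffer can be expressed as simple arithmetic: size the fleet for baseline load plus a fractional cushion, then round up to whole instances. The function below is an illustrative sketch; the parameter names and the 25% default buffer are assumptions, and the right buffer depends on how quickly your environment scales.

```python
import math


def desired_capacity(
    baseline_rps: float, per_instance_rps: float, buffer_ratio: float = 0.25
) -> int:
    """Return the instance count needed for baseline load plus a buffer.

    `buffer_ratio` is the extra headroom as a fraction of baseline load
    (0.25 means 25% more capacity than the baseline strictly needs).
    """
    if per_instance_rps <= 0:
        raise ValueError("per_instance_rps must be positive")
    if buffer_ratio < 0:
        raise ValueError("buffer_ratio must not be negative")
    return math.ceil(baseline_rps * (1 + buffer_ratio) / per_instance_rps)
```

For example, a baseline of 1,000 requests per second on instances that each handle 100 needs 10 instances with no buffer and 13 with a 25% buffer.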

Use throttling policies to prevent individual tenants from placing excess load on the system

Detect and throttle any tenant that is generating excess load on your system.

For systems using Amazon API Gateway, introduce a usage plan to evaluate tenant consumption and identify any tenant generating a request load that might impact availability of your system. Use the throttling capabilities of API Gateway to limit the number of requests that this tenant can place on the system.
Amazon API Gateway: Throttle API requests for better throughput
Creating and using usage plans with API keys
Implement application-enforced throttling for the services of your application, using the request processing mechanisms of your stack or the throttling capabilities of third-party tools.
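For application-enforced throttling, one common technique is a token bucket kept per tenant, with the refill rate and burst size chosen by tier. The sketch below assumes an in-process, single-threaded service; the tier names and limits in `TIER_LIMITS` are hypothetical, and a multi-instance deployment would need shared state (for example, a cache or database) instead of a local dictionary.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical tier limits: (refill rate in requests per second, burst capacity).
TIER_LIMITS: Dict[str, Tuple[float, float]] = {
    "basic": (5.0, 10.0),
    "premium": (50.0, 100.0),
}


@dataclass
class TokenBucket:
    rate: float       # tokens added per second
    capacity: float   # maximum tokens the bucket can hold
    tokens: float     # tokens currently available
    updated: float    # clock reading at the last refill

    def allow(self, now: float) -> bool:
        """Refill for elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class TenantThrottle:
    """Keeps one token bucket per tenant. Not thread-safe as written."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str, tier: str) -> bool:
        """Return True if this tenant's request may proceed."""
        now = self._clock()
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            rate, capacity = TIER_LIMITS[tier]
            # New tenants start with a full bucket so an initial burst is allowed.
            bucket = TokenBucket(rate, capacity, capacity, now)
            self._buckets[tenant_id] = bucket
        return bucket.allow(now)
```

A request handler would call `throttle.allow(tenant_id, tier)` before doing work and return an HTTP 429 response when it yields False. Because each tenant has its own bucket, a basic tier tenant that exhausts its tokens has no effect on the tokens available to a premium tenant.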