PERF 2: How do you select your compute solution?

The optimal compute solution for a workload varies based on application design, usage patterns, and configuration settings. Architectures can use different compute solutions for various components and enable different features to improve performance. Selecting the wrong compute solution for an architecture can lead to lower performance efficiency.

Resources

Amazon EC2 foundations (CMP211-R2)
Powering next-gen Amazon EC2: Deep dive into the Nitro system
Deliver high performance ML inference with AWS Inferentia (CMP324-R1)
Optimize performance and cost for your AWS compute (CMP323-R1)
Better, faster, cheaper compute: Cost-optimizing Amazon EC2 (CMP202-R1)
Cloud Compute with AWS
EC2 Instance Types
Processor State Control for Your EC2 Instance
EKS Containers: EKS Worker Nodes
ECS Containers: Amazon ECS Container Instances
Functions: Lambda Function Configuration

Best Practices:

Improvement Plan

Evaluate the available compute options

  • Consider compute options: Decide which compute option matches your requirements. From a performance perspective, instances suit long-running applications, especially stateful ones with long-running computation cycles; functions suit event-initiated, stateless applications that need quick response times; and containers enable you to use the resources of an instance efficiently.
    Cloud Compute with AWS
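The mapping above can be sketched as a small decision helper. This is an illustrative sketch only: the trait names and the decision order are assumptions for demonstration, not AWS guidance, and real selection should weigh many more factors.

```python
# Toy decision helper mapping the workload traits described above to a
# compute option. Trait names and ordering are illustrative assumptions.

def choose_compute_option(stateful: bool, event_initiated: bool,
                          long_running: bool) -> str:
    """Return a candidate compute option for the given workload traits."""
    if event_initiated and not stateful:
        return "functions"    # quick-response, stateless, event-driven
    if long_running or stateful:
        return "instances"    # long-running or stateful computation
    return "containers"       # efficient packing of instance resources
```

For example, an event-initiated, stateless webhook handler maps to "functions", while a stateful batch job maps to "instances".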
  • Define compute performance requirements: Identify the important compute performance metrics for your workload, and implement your requirements using a data-driven approach involving benchmarking or load testing. Use this data to identify where your compute solution is constrained, and examine configuration options to improve the solution.
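A data-driven approach starts with repeatable measurements. The sketch below shows one minimal way to benchmark a code path and extract latency percentiles; the function and percentile choices are illustrative assumptions, and production load testing would use a dedicated tool.

```python
import time
import statistics

def benchmark(fn, iterations=100):
    """Time repeated calls to fn and report simple latency statistics (ms).

    A minimal sketch: percentiles are taken from sorted samples rather
    than interpolated, which is adequate for rough comparisons.
    """
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
        "max_ms": samples[-1],
    }
```

Comparing these numbers before and after a configuration change shows whether the compute solution is actually constrained where you think it is.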
Understand the available compute configuration options

  • Learn about available configuration options: For your selected compute option, use the available configuration options to optimize for your performance requirements. Use the AWS Nitro System to enable full consumption of the compute and memory resources of the host hardware. Dedicated Nitro Cards enable high-speed networking, high-speed EBS, and I/O acceleration.
    AWS Nitro System
Collect compute-related metrics

  • Collect compute-related metrics: Amazon CloudWatch can collect metrics across the compute resources in your environment. Use a combination of CloudWatch and other metrics-recording tools to track the system-level metrics within your workload. Record data such as CPU usage levels, memory, disk I/O, and network to gain insight into utilization levels or bottlenecks. This data is crucial to understand how the workload is performing and how effectively it is using resources. Use these metrics as part of a data-driven approach to actively tune and optimize your workload's resources.
    Amazon CloudWatch
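Once metrics are collected, they need to be summarized before they can drive decisions. The sketch below condenses a series of CPU-utilization datapoints into average, p95, and peak values; the datapoint shape is an assumption for illustration (real CloudWatch results arrive via its query APIs in a different structure).

```python
def summarize_cpu(datapoints):
    """Summarize CPU-utilization datapoints (percent) into key statistics.

    Sketch only: assumes each datapoint is a dict with a "value" key,
    which is an illustrative shape, not the CloudWatch response format.
    """
    values = sorted(dp["value"] for dp in datapoints)
    if not values:
        raise ValueError("no datapoints")
    p95_index = int(0.95 * (len(values) - 1))
    return {
        "average": sum(values) / len(values),
        "p95": values[p95_index],
        "peak": values[-1],
    }
```

Looking at p95 and peak alongside the average helps distinguish a steadily busy workload from one with short bursts, which matters when tuning or right-sizing.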
Determine the required configuration by right-sizing

  • Modify your workload configuration by right-sizing: To optimize both performance and overall efficiency, determine which resources your workload needs. Choose memory-optimized instances for systems that require more memory than CPU, or compute-optimized instances for components that do data processing that is not memory-intensive. Right-sizing enables your workload to perform as well as possible while using only the required resources.
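The memory-versus-CPU trade-off above can be expressed as a simple classification over observed utilization. The thresholds and family names below are illustrative assumptions, not AWS recommendations; real right-sizing should also consider network, storage, and burst behavior.

```python
def suggest_instance_family(avg_cpu_pct: float, avg_mem_pct: float) -> str:
    """Suggest an EC2 instance family from observed utilization.

    Sketch only: the 70%/40% thresholds are assumed for illustration.
    """
    if avg_mem_pct > 70 and avg_cpu_pct < 40:
        return "memory-optimized"    # e.g. systems needing more memory than CPU
    if avg_cpu_pct > 70 and avg_mem_pct < 40:
        return "compute-optimized"   # e.g. CPU-heavy, memory-light processing
    return "general-purpose"         # balanced or inconclusive utilization
```

A workload averaging 20% CPU but 85% memory utilization would be flagged as a candidate for a memory-optimized family.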
Use the available elasticity of resources

  • Take advantage of elasticity: Elasticity matches the supply of resources you have against the demand for those resources. Instances, containers, and functions provide mechanisms for elasticity, either in combination with automatic scaling or as a feature of the service. Use elasticity in your architecture to ensure that you have sufficient capacity to meet performance requirements at all scales of use. Validate the metrics used to scale elastic resources up or down against the type of workload being deployed. If you are deploying a video transcoding application, 100% CPU utilization is expected and should not be your primary metric; instead, you can scale on the queue depth of transcoding jobs waiting to be processed. Ensure that workload deployments can handle both scale-up and scale-down events; scaling down workload components safely is as critical as scaling up resources when demand dictates. Create test scenarios for scale-down events to verify that the workload behaves as expected.
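The queue-depth scaling metric from the transcoding example can be sketched as a target-capacity calculation. Parameter names, bounds, and the jobs-per-worker ratio are illustrative assumptions; a real deployment would wire an equivalent metric into an Auto Scaling policy rather than compute fleet size by hand.

```python
import math

def desired_worker_count(queue_depth: int, jobs_per_worker: int,
                         min_workers: int = 1, max_workers: int = 20) -> int:
    """Derive a target fleet size from transcoding queue depth.

    Sketch of queue-depth-based scaling: scale to drain the backlog,
    clamped between assumed minimum and maximum fleet sizes so both
    scale-up and scale-down stay within safe bounds.
    """
    if queue_depth <= 0:
        return min_workers
    target = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, target))
```

Note that with an empty queue the fleet scales down to the minimum rather than zero, mirroring the guidance that scale-down behavior deserves the same care as scale-up.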
Re-evaluate compute needs based on metrics

  • Use a data-driven approach to optimize resources: To achieve maximum performance and efficiency, use the data gathered over time from your workload to tune and optimize your resources. Look at the trends in your workload's usage of current resources and determine where you can make changes to better match your workload's needs. When resources are over-committed, system performance degrades, whereas underutilization results in a less efficient use of resources and higher cost.
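The over-committed/underutilized distinction above can be made concrete with a simple classification over utilization samples. The 20%/80% thresholds are illustrative assumptions chosen for this sketch, not prescriptive values.

```python
def classify_utilization(samples, low=20.0, high=80.0):
    """Classify average utilization (percent) for a resource review.

    Sketch only: thresholds are assumed for illustration. Resources
    classified "underutilized" are candidates for downsizing; resources
    classified "over-committed" risk degraded system performance.
    """
    avg = sum(samples) / len(samples)
    if avg < low:
        return "underutilized"
    if avg > high:
        return "over-committed"
    return "well-utilized"
```

Running this over trend data for each resource gives a starting list of where configuration changes would better match the workload's needs.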