PERF 4: How do you select your database solution?

The optimal database solution for a system varies based on requirements for availability, consistency, partition tolerance, latency, durability, scalability, and query capability. Many systems use different database solutions for various subsystems and enable different features to improve performance. Selecting the wrong database solution and features for a system can lead to lower performance efficiency.

Resources

AWS purpose-built databases (DAT209-L)
Amazon Aurora storage demystified: How it all works (DAT309-R)
Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1)
Cloud Databases with AWS
AWS Database Caching
Amazon DynamoDB Accelerator
Amazon Aurora best practices
Amazon Redshift performance
Amazon Athena top 10 performance tips
Amazon Redshift Spectrum best practices
Amazon DynamoDB best practices

Improvement Plan

Understand data characteristics

  • Research and document data characteristics: Before choosing a database solution, understand the functional requirements of your workload and how it interacts with data. When evaluating a database solution, determine whether it meets your requirements (for example, transactions or high availability) so that you can select the best combination of databases for your workload. Evaluate alternative databases that could better meet your workload requirements. For example, if you are building an IoT application, a time series database such as Amazon Timestream may be a better fit, letting you store and analyze trillions of events per day at 1/10th the cost of relational databases.
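One way to make the documented characteristics concrete is to capture them in a structured record that can be reviewed alongside candidate engines. The sketch below is a hypothetical template: the `DataProfile` fields and the `candidate_database_types` rules are illustrative assumptions drawn only from the examples in this section (time series for IoT-style appends, relational for ad hoc queries, key-value for high-growth key lookups). They are not AWS guidance or a selection algorithm, just a way to produce a shortlist to benchmark.

```python
from dataclasses import dataclass, field


@dataclass
class DataProfile:
    """Hypothetical template for documenting one dataset's characteristics."""
    name: str
    access_pattern: str  # assumed values: "time_series_append", "ad_hoc_query", "key_lookup"
    needs_transactions: bool  # recorded for evaluation; several engine types support transactions
    peak_writes_per_sec: int
    retention_days: int
    notes: list[str] = field(default_factory=list)


def candidate_database_types(profile: DataProfile) -> list[str]:
    """Illustrative rule of thumb only: returns a shortlist to benchmark, not a decision."""
    rules = {
        "time_series_append": "time series (for example, Amazon Timestream)",
        "ad_hoc_query": "relational (for example, Amazon RDS)",
        "key_lookup": "key-value (for example, Amazon DynamoDB)",
    }
    candidate = rules.get(profile.access_pattern)
    return [candidate] if candidate else ["needs further evaluation"]


if __name__ == "__main__":
    # Hypothetical IoT dataset used only to show the template filled in.
    sensors = DataProfile(
        name="sensor-readings",
        access_pattern="time_series_append",
        needs_transactions=False,
        peak_writes_per_sec=50_000,
        retention_days=365,
        notes=["queried mostly by time window"],
    )
    print(sensors.name, "->", candidate_database_types(sensors))
```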
Evaluate the available options

  • Select the appropriate database type for your workload: AWS allows you to choose from multiple purpose-built database engines, including relational, key-value, document, in-memory, graph, time series, and ledger databases. The AWS portfolio of purpose-built databases supports diverse data models and allows you to build use-case-driven, highly scalable, distributed applications. By picking the best database to solve a specific problem or a group of problems, you can break away from restrictive one-size-fits-all monolithic databases and focus on building applications to meet the needs of your business.
  • Define database performance requirements: Identify the database performance metrics that matter for your workload, such as latency and throughput, and validate those requirements with a data-driven approach such as benchmarking or load testing. Use this data to identify where your database solution is constrained, and examine configuration options to address those constraints.
  • Enable database caching options: Evaluate database caching options, such as Amazon ElastiCache for Redis for caching relational database queries, or Amazon DynamoDB Accelerator (DAX), a fully managed, highly available, in-memory cache for DynamoDB. In some cases, these options can reduce response times from milliseconds to microseconds, even at millions of requests per second.
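The following sketch combines the two practices above: it measures read latency percentiles for a simulated database with and without a cache-aside layer in front of it. `SlowDatabase` is a hypothetical stand-in that sleeps to simulate a network round trip, and the in-process dictionary cache only illustrates the cache-aside pattern; it is not ElastiCache or DAX, and any numbers it prints come from the simulation, not from an AWS service.

```python
import random
import time


class SlowDatabase:
    """Hypothetical stand-in for a database: every read sleeps to simulate latency."""

    def __init__(self, latency_s: float = 0.001):
        self.latency_s = latency_s
        self.reads = 0

    def get(self, key: str) -> str:
        self.reads += 1
        time.sleep(self.latency_s)
        return f"value-{key}"


class CacheAside:
    """Cache-aside: check the cache first; on a miss, read the database and populate the cache."""

    def __init__(self, db: SlowDatabase, ttl_s: float = 60.0):
        self.db = db
        self.ttl_s = ttl_s
        self._entries: dict[str, tuple[str, float]] = {}  # key -> (value, expiry time)

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit
        value = self.db.get(key)  # cache miss or expired entry
        self._entries[key] = (value, now + self.ttl_s)
        return value


def percentile(samples: list[float], p: float) -> float:
    """Simple rank-based percentile without interpolation."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1)))
    return ordered[index]


def benchmark(reader, keys: list[str]) -> dict[str, float]:
    """Time each read and report p50/p99 latency in milliseconds."""
    latencies = []
    for key in keys:
        start = time.perf_counter()
        reader.get(key)
        latencies.append(time.perf_counter() - start)
    return {
        "p50_ms": percentile(latencies, 50) * 1000,
        "p99_ms": percentile(latencies, 99) * 1000,
    }


if __name__ == "__main__":
    rng = random.Random(7)
    # Repetitive workload: 300 reads spread over 20 hot keys.
    keys = [f"user-{rng.randint(1, 20)}" for _ in range(300)]
    direct_db = SlowDatabase()
    cached_db = SlowDatabase()
    print("direct:", benchmark(direct_db, keys), "db reads:", direct_db.reads)
    print("cached:", benchmark(CacheAside(cached_db), keys), "db reads:", cached_db.reads)
```

The TTL bounds how stale a cached value can become, so choosing it is a trade-off between freshness and hit rate; a real load test would also use production-like key distributions rather than this uniform sample.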
Collect and record database performance metrics

  • Collect database-related metrics: Design your workload to record metrics related to database activity. This data is crucial for understanding how your database systems affect the overall performance of your workload and where you can make changes to improve performance and efficiency. For example, tracking data points such as query times, the number of transactions, disk usage, index usage, and slow queries enables you to optimize your database systems.
  • Monitor metrics: Amazon CloudWatch can collect metrics across the resources in your architecture. You can also collect and publish custom metrics to surface business or derived metrics. Use CloudWatch or third-party solutions to set alarms that indicate when thresholds are breached.
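As a sketch of what recording database metrics can look like inside application code, the example below times each query, keeps a list of slow queries, and evaluates a simple threshold alarm over consecutive periods. `QueryMetrics`, `ThresholdAlarm`, and the thresholds are hypothetical. The alarm only loosely mirrors the idea of evaluation periods in CloudWatch alarms, and in a real workload you would publish these measurements to CloudWatch or a third-party tool rather than keep them in memory.

```python
import sqlite3
import time
from collections import deque
from contextlib import contextmanager


class QueryMetrics:
    """Hypothetical in-process recorder for query durations and slow queries."""

    def __init__(self, slow_threshold_ms: float = 100.0):
        self.slow_threshold_ms = slow_threshold_ms
        self.durations_ms: list[float] = []
        self.slow_queries: list[tuple[str, float]] = []

    @contextmanager
    def track(self, sql: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the query raised an exception.
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.durations_ms.append(elapsed_ms)
            if elapsed_ms >= self.slow_threshold_ms:
                self.slow_queries.append((sql, elapsed_ms))


class ThresholdAlarm:
    """Reports an alarm when a metric exceeds the threshold for N consecutive periods."""

    def __init__(self, threshold: float, periods: int):
        self.threshold = threshold
        self.recent = deque(maxlen=periods)

    def observe(self, value: float) -> bool:
        self.recent.append(value > self.threshold)
        return len(self.recent) == self.recent.maxlen and all(self.recent)


if __name__ == "__main__":
    metrics = QueryMetrics(slow_threshold_ms=50.0)
    conn = sqlite3.connect(":memory:")
    sql = "SELECT COUNT(*) FROM sqlite_master"
    with metrics.track(sql):
        conn.execute(sql).fetchone()
    print(f"recorded {len(metrics.durations_ms)} queries, {len(metrics.slow_queries)} slow")

    # Alarm if the per-period average query time stays above 100 ms for 3 periods.
    alarm = ThresholdAlarm(threshold=100.0, periods=3)
    for period_avg_ms in [120, 90, 130, 140, 150]:
        print(period_avg_ms, "ALARM" if alarm.observe(period_avg_ms) else "OK")
```

Requiring several consecutive breaching periods keeps a single slow query from paging anyone, at the cost of detecting sustained problems a little later.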
Choose data storage based on access patterns

  • Use access patterns to determine data storage: Evaluate your workload’s access patterns to find an appropriate data storage pattern. For example, if your workload requires ad hoc query access, you may select a relational database such as Amazon RDS. If your workload is driven by a high growth rate or high-traffic events, you should select a key-value database, such as Amazon DynamoDB.
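The difference between a known access pattern and an ad hoc query can be illustrated with a toy model of a key-value table. The sketch below is not the DynamoDB API: `KeyValueTable`, its key format, and the `items_read` counter are hypothetical. It groups items by partition key and keeps them sorted by sort key, so a query that matches the key design reads only the items it returns, while a question the keys were not designed for falls back to scanning every item.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class KeyValueTable:
    """Toy key-value table: items grouped by partition key, sorted by sort key."""

    def __init__(self):
        # partition key -> (sorted list of sort keys, parallel list of items)
        self._partitions = defaultdict(lambda: ([], []))
        self.items_read = 0

    def put(self, pk: str, sk: str, item: dict) -> None:
        keys, items = self._partitions[pk]
        i = bisect_left(keys, sk)
        if i < len(keys) and keys[i] == sk:
            items[i] = item  # overwrite an existing item
        else:
            keys.insert(i, sk)
            items.insert(i, item)

    def query(self, pk: str, sk_from: str, sk_to: str) -> list[dict]:
        """Known access pattern: one partition, a contiguous sort-key range."""
        keys, items = self._partitions.get(pk, ([], []))
        lo, hi = bisect_left(keys, sk_from), bisect_right(keys, sk_to)
        self.items_read += hi - lo
        return items[lo:hi]

    def scan(self, predicate) -> list[dict]:
        """Ad hoc question: every item must be read and filtered."""
        matches = []
        for _, items in self._partitions.values():
            for item in items:
                self.items_read += 1
                if predicate(item):
                    matches.append(item)
        return matches


if __name__ == "__main__":
    table = KeyValueTable()
    for customer in range(3):
        for day in range(1, 11):
            table.put(f"CUSTOMER#{customer}", f"ORDER#2024-01-{day:02d}",
                      {"customer": customer, "day": day, "total": day * 10})

    orders = table.query("CUSTOMER#1", "ORDER#2024-01-03", "ORDER#2024-01-05")
    print(len(orders), "orders, items read:", table.items_read)  # 3 orders, 3 items read

    table.items_read = 0
    large = table.scan(lambda item: item["total"] > 90)
    print(len(large), "large orders, items read:", table.items_read)  # 3 orders, 30 items read
```

This is why workloads that need flexible, ad hoc queries tend to suit a relational database, while key-value stores reward designing keys around access patterns known in advance.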
Optimize data storage based on access patterns and metrics

  • Optimize data storage based on metrics and patterns: Use reported metrics to identify any underperforming areas in your workload and optimize your database components. Each database system has different performance-related characteristics to evaluate, such as how data is indexed, cached, or distributed among multiple systems. Measure the impact of your optimizations.
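To illustrate measuring the impact of an optimization, the sketch below uses SQLite from the Python standard library as a stand-in for your database engine. It records the query plan, average latency, and result for the same query before and after adding an index. The `orders` table, the index name, and the row counts are hypothetical, and the plan text returned by `EXPLAIN QUERY PLAN` varies between SQLite versions; other engines expose similar information through their own `EXPLAIN` commands.

```python
import sqlite3
import time

QUERY = "SELECT COUNT(*), SUM(total) FROM orders WHERE customer_id = ?"


def build_db(rows: int = 50_000) -> sqlite3.Connection:
    """Create an in-memory table with hypothetical order data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    conn.executemany(
        "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
        ((i % 1000, float(i)) for i in range(rows)),
    )
    conn.commit()
    return conn


def query_plan(conn: sqlite3.Connection) -> list[str]:
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42,))]


def average_latency_ms(conn: sqlite3.Connection, runs: int = 100) -> float:
    start = time.perf_counter()
    for _ in range(runs):
        conn.execute(QUERY, (42,)).fetchone()
    return (time.perf_counter() - start) / runs * 1000


def measure(conn: sqlite3.Connection) -> dict:
    return {
        "plan": query_plan(conn),
        "avg_ms": average_latency_ms(conn),
        "result": conn.execute(QUERY, (42,)).fetchone(),
    }


if __name__ == "__main__":
    conn = build_db()
    before = measure(conn)
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = measure(conn)
    for label, m in (("before", before), ("after", after)):
        print(f"{label}: {m['avg_ms']:.3f} ms  plan={m['plan']}  result={m['result']}")
```

Comparing the result alongside the latency confirms that the optimization changed how the data is read, not what the query returns. Indexes also add write and storage overhead, so it is worth measuring write-heavy paths as well.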