PERF 4: How do you select your database solution?
The optimal database solution for a system varies based on requirements for
availability, consistency, partition tolerance, latency, durability, scalability, and query capability. Many systems use different database solutions for various subsystems and enable
different features to improve performance. Selecting the wrong database solution and features for a system can lead to lower performance efficiency.
Understand data characteristics: Understand the different characteristics of data in your workload. Determine if the workload requires transactions, how it interacts with data, and what its performance demands are. Use this data to select the best performing database approach for your
workload (for example, relational databases, NoSQL key-value, document, wide column, graph, time series, or in-memory storage).
Evaluate the available options: Evaluate the services and storage options that are available as part
of the selection process for your workload's storage mechanisms. Understand how, and when, to use a given service or system
for data storage. Learn about available configuration options that can optimize database
performance or efficiency, such as provisioned IOPS, memory and compute resources, and caching.
Collect and record database performance metrics: Use tools, libraries, and systems that record performance measurements related to database performance. For example, measure transactions per second, slow queries, or system latency introduced when accessing the database. Use this data to understand the performance of your database systems.
Choose data storage based on access patterns: Use the access patterns of the workload to decide which services and technologies to use. For example, use a relational database for workloads that require transactions, or, where eventual consistency is acceptable, a key-value store that provides higher throughput.
Optimize data storage based on access patterns and metrics: Use performance characteristics and access patterns to optimize how data is stored or queried and
achieve the best possible performance. Measure how optimizations such as indexing, key distribution, data warehouse design, or caching strategies impact system performance or overall efficiency.
Understand data characteristics
Research and document data characteristics: Before choosing a database solution, understand the functional
requirements of your workload and how it interacts with data. When evaluating a database solution, determine if
it is best suited to meet your requirements (for example, transactions or high availability) so that you can select the best combination of databases to use for your workload. Evaluate alternative databases that could better meet your workload requirements. For example, if you are building an IoT application, it may be better
to select a time series database, such as Amazon Timestream, to easily store and analyze
trillions of events per day at 1/10th the cost of relational databases.
Evaluate the available options
Select the appropriate database type for your workload: AWS allows you to choose from multiple purpose-built database
engines including relational, key-value, document, in-memory, graph, time series, and ledger databases. The AWS portfolio of purpose-built databases
supports diverse data models and allows you to build use case driven, highly scalable,
distributed applications. By picking the best database to solve a specific problem or a group of problems, you can break away from restrictive one-size-fits-all monolithic databases and focus
on building applications to meet the needs of your business.
Define database performance requirements: Identify the database performance metrics that matter for your workload, and implement the requirements as part of a data-driven approach, using benchmarking
or load testing. Use this data to identify where your database solution is constrained,
and examine configuration options to solve the issue.
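The benchmarking step above can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes an in-memory SQLite table as a stand-in for the real database, and the names `benchmark_query`, `meets_requirement`, and the 50 ms budget are hypothetical values chosen for the example.

```python
import sqlite3
import statistics
import time


def benchmark_query(conn, sql, params=(), runs=200):
    """Run one query repeatedly and return latency percentiles in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; index 49 is p50 and index 94 is p95.
    cuts = statistics.quantiles(samples, n=100)
    return {"p50_ms": cuts[49], "p95_ms": cuts[94], "max_ms": max(samples)}


def meets_requirement(result, p95_budget_ms):
    """Compare a measured p95 latency against the budget defined for the workload."""
    return result["p95_ms"] <= p95_budget_ms


# Stand-in data set (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 500, float(i)) for i in range(20_000)],
)

result = benchmark_query(conn, "SELECT SUM(total) FROM orders WHERE customer_id = ?", (42,))
print(result, "within budget:", meets_requirement(result, p95_budget_ms=50.0))
```

Recording percentiles rather than only an average makes tail latency visible, which is often where a database constraint first shows up.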
Enable database caching options: Evaluate database caching options, such as Amazon ElastiCache for Redis for caching relational database query results, or Amazon DynamoDB Accelerator (DAX) for a fully managed, highly available, in-memory cache for DynamoDB. These options can deliver improved performance, in some cases reducing response times from milliseconds to microseconds, even at millions of requests per second.
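One common way to place a cache in front of a database is the cache-aside (lazy loading) pattern. The sketch below is an assumption-laden illustration: a Python dictionary stands in for a cache service, and `CacheAside` and `slow_lookup` are hypothetical names, not part of any AWS SDK.

```python
import time


class CacheAside:
    """Cache-aside helper: check the cache first, fall back to the database."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db   # hypothetical callable that queries the database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}          # key -> (expires_at, value); stands in for a cache service
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the database and repopulate the cache.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call after a write so later reads do not serve stale data."""
        self._entries.pop(key, None)


db_calls = []


def slow_lookup(product_id):
    db_calls.append(product_id)     # record each trip to the "database"
    return {"id": product_id, "name": f"product-{product_id}"}


cache = CacheAside(slow_lookup, ttl_seconds=30.0)
cache.get(7)
cache.get(7)
print(f"hits={cache.hits} misses={cache.misses} db_calls={len(db_calls)}")
```

The hit and miss counters matter as much as the cache itself: a low hit ratio suggests the TTL or the choice of cached keys does not match the workload's access pattern.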
Collect and record database performance metrics
Collect database-related metrics: Design your workload to record metrics related to database activity. This data is crucial for understanding
how your database systems are impacting the overall performance of your workload and where you can make changes to improve performance and efficiency. For example, tracking data points such as query times, the number
of transactions, disk usage, index usage, or slow queries enables you to optimize
your database systems.
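As a rough sketch of recording query times and slow queries inside the application, the example below wraps database calls in a timing context manager. `QueryMetrics` and the 100 ms slow-query threshold are hypothetical choices for illustration, and SQLite stands in for the real database.

```python
import sqlite3
import time
from collections import defaultdict
from contextlib import contextmanager


class QueryMetrics:
    """Collects per-query durations and keeps a list of slow queries."""

    def __init__(self, slow_threshold_ms=100.0):
        self.slow_threshold_ms = slow_threshold_ms
        self.durations_ms = defaultdict(list)
        self.slow_queries = []

    @contextmanager
    def timed(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the query raised an exception.
            self.record(name, (time.perf_counter() - start) * 1000.0)

    def record(self, name, elapsed_ms):
        self.durations_ms[name].append(elapsed_ms)
        if elapsed_ms >= self.slow_threshold_ms:
            self.slow_queries.append((name, elapsed_ms))

    def summary(self):
        return {
            name: {"count": len(v), "avg_ms": sum(v) / len(v), "max_ms": max(v)}
            for name, v in self.durations_ms.items()
        }


metrics = QueryMetrics(slow_threshold_ms=100.0)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

with metrics.timed("insert_event"):
    conn.execute("INSERT INTO events (payload) VALUES (?)", ("hello",))
with metrics.timed("count_events"):
    conn.execute("SELECT COUNT(*) FROM events").fetchone()

print(metrics.summary())
```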
Monitor metrics: Amazon CloudWatch can collect metrics across the resources in your architecture. You can also collect and publish custom metrics to surface business or derived metrics.
Use CloudWatch or third-party solutions to set alarms that indicate when thresholds are breached.
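To illustrate publishing a custom metric, the sketch below builds a datum in the shape accepted by the `MetricData` parameter of the boto3 CloudWatch `put_metric_data` call, and shows a simplified local threshold check. The metric name, namespace, and dimension are hypothetical, the publishing call is left as a comment because it needs boto3 and AWS credentials, and `breaches_threshold` is only a local illustration of the idea, not how CloudWatch evaluates alarms.

```python
def build_metric_datum(metric_name, value, unit="Milliseconds", dimensions=None):
    """Build one entry for the MetricData list of a put_metric_data request."""
    return {
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": name, "Value": val}
            for name, val in sorted((dimensions or {}).items())
        ],
        "Value": float(value),
        "Unit": unit,
    }


def breaches_threshold(datapoints, threshold, consecutive_periods=3):
    """Return True if the most recent datapoints all exceed the threshold."""
    if len(datapoints) < consecutive_periods:
        return False
    return all(point > threshold for point in datapoints[-consecutive_periods:])


datum = build_metric_datum("QueryLatency", 182.4, dimensions={"Database": "orders-db"})

# With boto3 installed and credentials configured, the datum could be published with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="MyApp/Database",  # hypothetical namespace
#       MetricData=[datum],
#   )
print(datum)
print(breaches_threshold([120, 210, 230, 250], threshold=200))
```

Requiring several consecutive breaches before alerting is a common way to avoid alarms on a single transient spike.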
Choose data storage based on access patterns
Use access patterns to determine data storage: Evaluate your workload’s access patterns to find an appropriate data storage pattern. For example, if your
workload requires ad hoc query access, you may select a relational database such as Amazon RDS. If your workload is driven by a high growth rate or high-traffic events, you should select a key-value database, such as Amazon DynamoDB.
Optimize data storage based on access patterns and metrics
Optimize data storage based on metrics and patterns: Use reported metrics to identify any underperforming areas in
your workload and optimize your database components. Each database system has different performance-related characteristics to evaluate, such as how data is indexed, cached, or distributed
among multiple systems. Measure the impact of your optimizations.
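Measuring the impact of one optimization, here an index, might look like the following sketch. It assumes SQLite as a stand-in database; the table, the index name `idx_orders_customer`, and the helper functions are hypothetical, and absolute timings will differ on a real engine.

```python
import sqlite3
import time


def query_plan(conn, sql, params=()):
    """Return the plan detail text reported by SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # the fourth column holds the detail text


def time_query(conn, sql, params=(), runs=50):
    """Return the mean execution time of a query in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        conn.execute(sql, params).fetchall()
    return (time.perf_counter() - start) * 1000.0 / runs


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(50_000)],
)

sql = "SELECT id, total FROM orders WHERE customer_id = ?"

# Measure before the optimization.
plan_before = query_plan(conn, sql, (42,))
ms_before = time_query(conn, sql, (42,))
rows_before = conn.execute(sql, (42,)).fetchall()

# Apply the optimization: index the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Measure again with the same query and parameters.
plan_after = query_plan(conn, sql, (42,))
ms_after = time_query(conn, sql, (42,))
rows_after = conn.execute(sql, (42,)).fetchall()

print(f"before: {ms_before:.3f} ms  plan: {plan_before}")
print(f"after:  {ms_after:.3f} ms  plan: {plan_after}")
```

Comparing the query plan alongside the timing confirms that the improvement comes from the index being used rather than from caching or measurement noise.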