OPS 1: How do you evaluate your Serverless application’s health?

Evaluating your metrics, distributed tracing, and logging gives you insight into business and operational events, and helps you understand which services should be optimized to improve your customers' experience.

Resources

Amazon CloudWatch Metrics and Dimensions
AWS Personal Health Dashboard
Amazon CloudWatch Automated Dashboard
AWS Serverless Monitoring Partners
re:Invent 2019 - Production-grade full-stack apps with AWS Amplify

Best Practices:

Improvement Plan

Understand, analyze, and alert on metrics provided out of the box

  • Understand which metrics and dimensions each managed service you use provides out of the box.
  • Configure alerts on relevant metrics to engage you when components are unhealthy.
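
As an illustration of alerting on an out-of-the-box metric, the sketch below builds the parameters for a CloudWatch alarm on the `Errors` metric that AWS Lambda publishes in the `AWS/Lambda` namespace. The function name, SNS topic ARN, and threshold values are hypothetical placeholders, and `create_alarm` assumes that `boto3` is installed and AWS credentials are configured.

```python
def build_error_alarm(function_name, sns_topic_arn):
    """Return keyword arguments for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",            # metrics Lambda emits by default
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,                         # seconds per datapoint
        "EvaluationPeriods": 5,               # inspect the last 5 datapoints
        "DatapointsToAlarm": 3,               # alarm when 3 of them breach
        "Threshold": 1,                       # illustrative value, tune per workload
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",   # no invocations means no errors
        "AlarmActions": [sns_topic_arn],      # topic that notifies the on-call team
    }


def create_alarm(params):
    """Create or update the alarm (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without AWS access

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Calling `create_alarm(build_error_alarm("checkout-handler", "arn:aws:sns:us-east-1:123456789012:ops-alerts"))` would register the alarm. Keeping the parameters in a plain dictionary makes them easy to review and unit test before anything is sent to AWS.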

Use application, business, and operations metrics

  • Identify user journeys and metrics that can be derived from each customer transaction.
  • Create custom metrics asynchronously as opposed to synchronously for improved performance, cost, and reliability outcomes.
    Creating Custom Metrics Asynchronously with Amazon CloudWatch
  • Emit business metrics from within your workload to measure application performance against business goals.
  • Create and analyze component metrics to measure interactions with upstream and downstream components.
  • Create and analyze operational metrics to assess the health of your continuous delivery pipeline and operational processes.
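
One way to create custom metrics asynchronously is to write them to the function's log output in CloudWatch Embedded Metric Format, so that CloudWatch extracts the metric from the log line instead of the code making a synchronous API call. The sketch below is a minimal example of that approach; the namespace, dimension, metric name, and `order_id` field are hypothetical.

```python
import json
import time


def emf_record(namespace, dimensions, name, value, unit="Count", **context):
    """Build one log line in CloudWatch Embedded Metric Format."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions)],   # one set of dimension keys
                "Metrics": [{"Name": name, "Unit": unit}],
            }],
        },
        name: value,                                # the metric value itself
    }
    record.update(dimensions)  # dimension values live at the top level
    record.update(context)     # extra searchable fields, not metric dimensions
    return json.dumps(record)


def record_order_placed(order_id):
    """Emit a business metric by printing it; no CloudWatch API call is made."""
    print(emf_record("ExampleShop", {"Service": "checkout"},
                     "OrdersPlaced", 1, order_id=order_id))
```

In a Lambda function, the printed line goes to CloudWatch Logs, so the request path pays only the cost of a log write. High-cardinality values such as `order_id` are kept out of the dimensions so they stay searchable in the logs without creating a new metric per order.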

Use distributed tracing and instrument code with additional context

  • Identify business context and system data that are commonly present across multiple transactions.
  • Instrument SDKs and requests to upstream/downstream services to understand the flow of a transaction across the system.
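
The two ideas above (attaching business context to a trace and propagating the trace to downstream calls) can be sketched without any library. This is a simplified stand-in for what a tracing SDK such as AWS X-Ray does, not its API; the `span` helper, the header names, and the in-memory `FINISHED_SPANS` list are all hypothetical.

```python
import time
import uuid
from contextlib import contextmanager

FINISHED_SPANS = []  # stand-in for a tracing backend


@contextmanager
def span(name, trace_id=None, parent_id=None, **annotations):
    """Time one unit of work and attach business context to it."""
    record = {
        "trace_id": trace_id or uuid.uuid4().hex,  # new trace if none is given
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent_id,
        "name": name,
        "annotations": dict(annotations),          # e.g. order ID, customer tier
    }
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        record["error"] = repr(exc)                # keep the failure on the span
        raise
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        FINISHED_SPANS.append(record)


def downstream_headers(current):
    """Headers that let the next service continue the same trace."""
    return {"X-Trace-Id": current["trace_id"],
            "X-Parent-Span-Id": current["span_id"]}


def handle_order(order_id, customer_tier):
    with span("handle_order", order_id=order_id,
              customer_tier=customer_tier) as root:
        headers = downstream_headers(root)
        # The child span models the downstream service receiving `headers`.
        with span("charge_payment", trace_id=headers["X-Trace-Id"],
                  parent_id=headers["X-Parent-Span-Id"], order_id=order_id):
            pass  # an HTTP request carrying `headers` would go here
    return root["trace_id"]
```

Because every span carries the same trace ID and a pointer to its parent, the path of a single transaction can be reassembled afterwards, and the annotations make it possible to filter traces by business attributes.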

Use structured and centralized logging

  • Log request identifiers from downstream services, component name, component runtime information, unique correlation identifiers, and information that helps identify a business transaction.
  • Use JSON as your logging output. Prefer logging entire objects/dictionaries rather than many one-line messages. Mask or remove sensitive data when logging.
  • Minimize debug logging, as it both incurs cost and lowers the signal-to-noise ratio.
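
The logging guidance above could look like the following sketch, which uses only Python's standard `logging` module. The set of sensitive keys, the `context` attribute name, and the field names in the output are assumptions made for illustration.

```python
import json
import logging
import sys

SENSITIVE_KEYS = {"password", "card_number", "email"}  # hypothetical list


def mask(value):
    """Recursively replace sensitive fields before they reach the log."""
    if isinstance(value, dict):
        return {k: "***" if k.lower() in SENSITIVE_KEYS else mask(v)
                for k, v in value.items()}
    if isinstance(value, list):
        return [mask(v) for v in value]
    return value


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object."""

    def format(self, record):
        entry = {
            "level": record.levelname,
            "message": record.getMessage(),
            "component": record.name,
        }
        # Whole dictionaries passed via `extra={"context": {...}}` are merged in.
        entry.update(mask(getattr(record, "context", {})))
        return json.dumps(entry, default=str)


def get_logger(component, level=logging.INFO):
    """Return a JSON logger; INFO by default keeps debug noise out."""
    logger = logging.getLogger(component)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines from the root logger
    return logger
```

A call such as `get_logger("checkout").info("order received", extra={"context": {"correlation_id": "c-1", "order": {"id": "o-123", "card_number": "4111"}}})` writes one JSON line with the card number masked. Passing the whole object in one call, rather than several one-line messages, keeps everything about the transaction in a single queryable entry.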