© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Operational Excellence
The ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.

Prepare
Effective preparation is required to drive operational excellence. Business success is enabled by shared goals and understanding across the business, development, and operations. Common standards simplify workload design and management, enabling operational success. Design workloads with mechanisms to monitor and gain insight into application, platform, and infrastructure components, as well as customer experience and behavior. …

OPS 1 Operations priorities: How do you determine what your priorities are?
OPS 2 Design for workload insights: How do you design your workload so that you can understand its state?
OPS 3 Development and Integration: How do you reduce defects, ease remediation, and improve flow into production?
OPS 4 Mitigation of deployment risks: How do you mitigate deployment risks?
OPS 5 Operational readiness: How do you know that you are ready to support a workload?

Operate
Successful operation of a workload is measured by the achievement of business and customer outcomes. Define expected outcomes, determine how success will be measured, and identify the workload and operations metrics that will be used in those calculations to determine if operations are successful. Consider that operational health includes both the health of the workload and the health and success of the operations acting upon the workload (for example, deployment and incident response). Establish baselines from which improvement or degradation of operations will be identified, collect and analyze your metrics, and then validate your understanding of operations success and how it changes over time. Use collected metrics to determine if you are satisfying customer and business needs, and identify areas for improvement. …

OPS 6 Workload health: How do you understand the health of your workload?
OPS 7 Operations health: How do you understand the health of your operations?
OPS 8 Event response: How do you manage workload and operations events?

Evolve
Evolution of operations is required to sustain operational excellence. Dedicate work cycles to making continuous incremental improvements. Regularly evaluate and prioritize opportunities for improvement (for example, feature requests, issue remediation, and compliance requirements), including both the workload and operations procedures. Include feedback loops within your procedures to rapidly identify areas for improvement and capture learnings from the execution of operations. …

OPS 9 Operations evolution: How do you evolve operations?

Security
The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.

Identity & Access Management
Identity and access management are key parts of an information security program. They ensure that only authorized and authenticated users are able to access your resources, and only in a manner that you intend. For example, you should define principals (that is, users, groups, services, and roles that take action in your account), build out policies aligned with these principals, and implement strong credential management. These privilege-management elements form the core of authentication and authorization. …

SEC 1 Credential management: How do you manage credentials and authentication?
SEC 2 Human access: How do you control human access?
SEC 3 Programmatic access: How do you control programmatic access?

Detective Controls
You can use detective controls to identify a potential security threat or incident. They are an essential part of governance frameworks. They can be used to support a quality process, a legal or compliance obligation, and threat identification and response efforts. There are different types of detective controls. For example, conducting an inventory of assets and their detailed attributes promotes more effective decision making and lifecycle controls to help establish operational baselines. You can also use internal auditing, an examination of controls related to information systems, to ensure two things: that practices meet policies and requirements, and that you have set the correct automated alerting notifications based on defined conditions. These controls are important reactive factors that can help your organization identify and understand the scope of anomalous activity. …

SEC 4 Security events: How do you detect and investigate security events?
SEC 5 Security awareness: How do you defend against emerging security threats?

Infrastructure Protection
Infrastructure protection encompasses control methodologies, such as defense in depth, that are necessary to meet best practices and organizational or regulatory obligations. Use of these methodologies is critical for successful, ongoing operations in either the cloud or on-premises. …

SEC 6 Network protection: How do you protect your networks?
SEC 7 Compute protection: How do you protect your compute resources?

Data Protection
Before architecting any system, foundational practices that influence security should be in place. For example, data classification provides a way to categorize organizational data based on levels of sensitivity. Encryption protects data by rendering it unintelligible to unauthorized access. These tools and techniques are important because they support objectives such as preventing financial loss or complying with regulatory obligations. …

SEC 8 Data classification: How do you classify your data?
SEC 9 Data protection at rest: How do you protect your data at rest?
SEC 10 Data protection in transit: How do you protect your data in transit?

Incident Response
Even with extremely mature preventive and detective controls, your organization should still put processes in place to respond to and mitigate the potential impact of security incidents. The architecture of your workload strongly affects how effectively your teams can operate during an incident, isolate or contain systems, and restore operations to a known good state. Put the tools and access in place ahead of a security incident, and then routinely practice incident response through game days. Doing so will help you ensure that your architecture can accommodate timely investigation and recovery. …

SEC 11 Incident response: How do you respond to an incident?

Reliability
The ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.

Foundations
Before architecting any system, foundational requirements that influence reliability should be in place. For example, you must have sufficient network bandwidth to your data center. These requirements are sometimes neglected because they are beyond a single project's scope. This neglect can have a significant impact on the ability to deliver a reliable system. In an on-premises environment, these requirements can cause long lead times due to dependencies and therefore must be incorporated during initial planning. …

REL 1 Service limits: How do you manage service limits?
REL 2 Network topology: How do you manage your network topology?

Change Management
Being aware of how change affects a system allows you to plan proactively. Monitoring allows you to quickly identify trends that could lead to capacity issues or SLA breaches. In traditional environments, change-control processes are often manual. They must be carefully coordinated with auditing to effectively control who makes changes and when they are made. …

REL 3 Demand handling: How does your system adapt to changes in demand?
REL 4 Resource monitoring: How do you monitor your resources?
REL 5 Change management: How do you implement change?

Failure Management
In any system of reasonable complexity, it is expected that failures will occur. It is generally of interest to know how to become aware of these failures, respond to them, and prevent them from happening again. …

REL 6 Data backup: How do you back up data?
REL 7 Resiliency implementation: How does your system withstand component failures?
REL 8 Resiliency testing: How do you test resilience?
REL 9 Disaster recovery: How do you plan for disaster recovery?

Performance Efficiency
The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.

Selection
The optimal solution for a particular system will vary based on the kind of workload you have, often with multiple approaches combined. Well-architected systems use multiple solutions and enable different features to improve performance. …

PERF 1 Architecture selection: How do you select the best-performing architecture?
PERF 2 Compute selection: How do you select your compute solution?
PERF 3 Storage selection: How do you select your storage solution?
PERF 4 Database selection: How do you select your database solution?
PERF 5 Networking selection: How do you configure your networking solution?

Review
When architecting solutions, there is a finite set of options that you can choose from. However, over time, new technologies and approaches become available that could improve the performance of your architecture. …

PERF 6 Evolving architecture: How do you evolve your workload to take advantage of new releases?

Monitoring
After you have implemented your architecture, you will need to monitor its performance so that you can remediate any issues before your customers are aware of them. Monitoring metrics should be used to raise alarms when thresholds are breached. The alarm can trigger automated action to work around any badly performing components. …

PERF 7 Monitor performance: How do you monitor your resources to ensure they are performing as expected?

Tradeoffs
When you architect solutions, think about tradeoffs so you can select an optimal approach. Depending on your situation, you could trade consistency, durability, and space versus time or latency to deliver higher performance. …

PERF 8 Performance tradeoffs: How do you use tradeoffs to improve performance?

Cost Optimization
The ability to run systems to deliver business value at the lowest price point.

Expenditure Awareness
The increased flexibility and agility that the cloud enables encourages innovation and fast-paced development and deployment. It eliminates the manual processes and time associated with provisioning on-premises infrastructure. Those steps include identifying hardware specifications, negotiating price quotations, managing purchase orders, scheduling shipments, and then deploying the resources. However, the ease of use and virtually unlimited on-demand capacity require a new way of thinking about expenditures. …

COST 1 Usage governance: How do you govern usage?
COST 2 Usage and cost monitoring: How do you monitor usage and cost?
COST 3 Resource decommissioning: How do you decommission resources?

Cost-Effective Resources
Using the appropriate instances and resources for your workload is key to cost savings. For example, a reporting process might take five hours to run on a smaller server but one hour to run on a larger server that is twice as expensive. Both servers give you the same outcome, but the smaller server incurs more cost over time. Five hours at the smaller server's hourly rate costs more than one hour at twice that rate. …

COST 4 Service selection: How do you evaluate cost when you select services?
COST 5 Resource type and size selection: How do you meet cost targets when you select resource type and size?
COST 6 Pricing model selection: How do you use pricing models to reduce cost?
COST 7 Data transfer planning: How do you plan for data transfer charges?

Matching supply & demand
Optimally matching supply to demand delivers the lowest cost for a workload. However, there also needs to be sufficient extra supply to allow for provisioning time and individual resource failures. Demand can be fixed or variable, requiring metrics and automation to ensure that management does not become a significant cost. …

COST 8 Matching supply with demand: How do you match supply of resources with demand?

Optimizing Over Time
As AWS releases new services and features, it is a best practice to review your existing architectural decisions to ensure they continue to be the most cost effective. As your requirements change, be aggressive in decommissioning resources, entire services, and systems that you no longer require. …

COST 9 New service evaluation: How do you evaluate new services?
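The Operate area of Operational Excellence recommends establishing baselines from which improvement or degradation of operations will be identified. Below is a minimal Python sketch of that comparison. It is not an AWS tool or API. It assumes a metric where lower is better (deployment lead time in minutes) and summarises the baseline by its mean and sample standard deviation. Anything more than a hypothetical k = 2 standard deviations away is treated as a real change. The helper name `classify_against_baseline` and the sample numbers are invented for illustration.

```python
from statistics import mean, stdev
from typing import Sequence


def classify_against_baseline(baseline: Sequence[float], observed: float,
                              k: float = 2.0, lower_is_better: bool = True) -> str:
    """Hypothetical helper: compare one observation with a baseline window.

    The baseline is summarised by its mean and sample standard deviation;
    values more than k standard deviations from the mean count as a real
    change rather than noise.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    band = k * stdev(baseline)
    if abs(observed - mu) <= band:
        return "within baseline"
    # Outside the band: which direction is "better" depends on the metric.
    better = observed < mu if lower_is_better else observed > mu
    return "improvement" if better else "degradation"


# Illustrative data: deployment lead times (minutes) from previous weeks.
baseline = [42, 40, 45, 41, 43, 44]
print(classify_against_baseline(baseline, 43))  # within baseline
print(classify_against_baseline(baseline, 60))  # degradation
print(classify_against_baseline(baseline, 30))  # improvement
```

A fixed k-sigma band is only one possible choice. Percentile-based or seasonal baselines may suit skewed or cyclical metrics better.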
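The Monitoring area of Performance Efficiency says metrics should raise alarms when thresholds are breached, and that an alarm can trigger automated action. Here is a minimal sketch built on a hypothetical `ThresholdAlarm` class. It fires only after several consecutive breaching datapoints, so one noisy sample does not trigger remediation. It then calls a caller-supplied `action` once per alarm episode. The threshold, period count, and latency values are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ThresholdAlarm:
    """Hypothetical alarm: enters ALARM when a metric stays above `threshold`
    for `periods` consecutive datapoints, then runs `action` once."""
    threshold: float
    periods: int
    action: Callable[[float], None]
    state: str = "OK"
    _breaches: int = 0

    def observe(self, value: float) -> str:
        if value > self.threshold:
            self._breaches += 1
        else:
            # A healthy datapoint resets the streak and clears the alarm.
            self._breaches = 0
            self.state = "OK"
        if self._breaches >= self.periods and self.state != "ALARM":
            self.state = "ALARM"
            self.action(value)  # automated remediation hook, fired once per episode
        return self.state


triggered: List[float] = []
alarm = ThresholdAlarm(threshold=500.0, periods=3, action=triggered.append)
for latency_ms in [320, 510, 620, 480, 550, 700, 810, 900]:
    alarm.observe(latency_ms)
print(alarm.state, triggered)  # ALARM [810]
```

In practice the `action` callback might replace a badly performing component. Here it only records the triggering value so the behaviour is easy to check.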
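The Tradeoffs area mentions trading space versus time or latency. Caching is one common instance of that tradeoff. The sketch below assumes a hypothetical slow `fetch_from_backend` lookup. It uses Python's standard `functools.lru_cache` to spend memory on stored results so that repeated requests skip the slow path.

```python
from functools import lru_cache

backend_calls = 0  # counts how often the slow path actually runs


def fetch_from_backend(sku: str) -> float:
    """Hypothetical slow lookup standing in for, e.g., a remote query."""
    global backend_calls
    backend_calls += 1
    return len(sku) * 1.25  # placeholder result


# Spend memory (up to 1024 stored results) to avoid repeating slow lookups.
@lru_cache(maxsize=1024)
def price(sku: str) -> float:
    return fetch_from_backend(sku)


for sku in ["a-1", "b-22", "a-1", "a-1", "b-22"]:
    price(sku)
print(backend_calls)            # 2
print(price.cache_info().hits)  # 3
```

The cache also gives up some consistency. A cached price stays stale until it is evicted or `price.cache_clear()` is called, which is the kind of tradeoff the text asks you to make deliberately.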
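The Cost-Effective Resources example (five hours on a smaller server versus one hour on a server that is twice as expensive) reduces to simple arithmetic. Cost per run is hours times hourly rate. If r is the smaller server's hourly rate, the smaller server costs 5r per run and the larger one 2r. The sketch below plugs in a hypothetical rate of $0.10 per hour and 30 runs per month. Those numbers are assumptions, not AWS prices, and billing is assumed to be proportional to runtime.

```python
def run_cost(hours: float, hourly_rate: float) -> float:
    """Cost of one run, assuming billing is proportional to runtime."""
    return hours * hourly_rate


SMALL_RATE = 0.10            # hypothetical $/hour for the smaller server
LARGE_RATE = 2 * SMALL_RATE  # "twice as expensive"
RUNS_PER_MONTH = 30          # hypothetical: one report per day

small = run_cost(5, SMALL_RATE)  # five hours on the smaller server
large = run_cost(1, LARGE_RATE)  # one hour on the larger server

print(f"per run:   small ${small:.2f}, large ${large:.2f}")  # small $0.50, large $0.20
print(f"per month: small ${small * RUNS_PER_MONTH:.2f}, "
      f"large ${large * RUNS_PER_MONTH:.2f}")                 # small $15.00, large $6.00
```

More generally, under that billing assumption the larger server is cheaper whenever its speedup (5x here) exceeds its price multiple (2x here).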
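The Matching supply & demand area says supply should track demand closely but with enough extra to cover provisioning time and individual resource failures. One way to express that as a sizing rule is sketched below. The function `instances_needed`, the linear demand-growth assumption, and the N+1 spare are hypothetical choices for illustration, not an AWS formula.

```python
import math


def instances_needed(demand_rps: float, per_instance_rps: float,
                     growth_rps_per_min: float, provisioning_min: float,
                     spare_instances: int = 1) -> int:
    """Hypothetical sizing rule for one scaling decision.

    - Demand may keep rising while new capacity launches, so size for the
      demand expected once provisioning finishes (linear growth assumed).
    - Keep `spare_instances` extra so a single resource failure does not drop
      capacity below demand (N+1 when spare_instances=1).
    """
    expected = demand_rps + growth_rps_per_min * provisioning_min
    base = math.ceil(expected / per_instance_rps)
    return base + spare_instances


# 900 req/s now, rising ~20 req/s per minute, 5 minutes to launch capacity,
# each instance handles 250 req/s: ceil(1000 / 250) + 1 spare.
print(instances_needed(900, 250, 20, 5))  # 5
```

Each input would normally come from collected metrics (current demand, observed growth rate, measured launch time). That is why the text ties variable demand to metrics and automation.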