AWS Reliability Pillar
Updated: 20-Dec-2020
In category: AWS
Design Principles
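- Automatically recover from failure – monitor KPIs and trigger automated recovery when a threshold is breached
- Test recovery procedures – simulate failures to validate that recovery actually works
- Scale horizontally to increase aggregate workload availability – replace one large resource with several smaller ones (for example, with auto scaling) so a single failure has limited impact
- Stop guessing capacity – monitor demand and utilization, and scale in or out automatically
- Manage change in automation – make all infrastructure changes through automation so they are repeatable and recovery is reliable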
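The "stop guessing capacity" principle can be illustrated with a small proportional scaling rule. This is a minimal sketch, not how any AWS service is implemented: the function name `desired_capacity`, its parameters, and the proportional formula are assumptions chosen for illustration.

```python
import math


def desired_capacity(current: int, utilization: float, target: float,
                     minimum: int, maximum: int) -> int:
    """Return an instance count that should bring utilization near target.

    Hypothetical helper: assumes load is spread evenly, so total load is
    roughly current * utilization.
    """
    if current <= 0 or target <= 0:
        raise ValueError("current and target must be positive")
    # Instances needed to carry the same total load at the target utilization.
    needed = math.ceil(current * utilization / target)
    # Clamp to the configured bounds so scaling never runs away.
    return max(minimum, min(maximum, needed))


# Scale out: 4 instances at 90% CPU with a 60% target.
print(desired_capacity(4, 90.0, 60.0, minimum=2, maximum=10))
# Scale in: 4 instances at 20% CPU, limited by the minimum of 2.
print(desired_capacity(4, 20.0, 60.0, minimum=2, maximum=10))
```

The minimum and maximum bounds keep the rule from removing all capacity during a quiet period or growing without limit during a spike.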
Best Practices
- Foundations – manage service quotas and plan network capacity
- Workload architecture – design the workload to prevent failures and to mitigate those that still occur
- Change management – monitor the workload and respond automatically to KPI changes so it adapts to changes in demand and capacity
- Failure management – failure detection and automatic repair, backup and recovery, disaster recovery (DR) planning and testing
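Failure detection and automatic repair can be sketched as a monitor that probes each node and replaces it after several consecutive failed checks. This is an illustrative sketch only: `HealthMonitor`, `probe`, `replace`, and `threshold` are hypothetical names, and a real workload would use managed health checks rather than this loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HealthMonitor:
    probe: Callable[[str], bool]    # assumed: returns True when the node is healthy
    replace: Callable[[str], None]  # assumed: repair action, e.g. relaunch the node
    threshold: int = 3              # consecutive failures required before repair
    failures: Dict[str, int] = field(default_factory=dict)

    def check(self, nodes: List[str]) -> List[str]:
        """Probe every node once; repair and return the nodes over the threshold."""
        repaired = []
        for node in nodes:
            if self.probe(node):
                self.failures[node] = 0  # a healthy probe resets the counter
                continue
            self.failures[node] = self.failures.get(node, 0) + 1
            if self.failures[node] >= self.threshold:
                self.replace(node)
                self.failures[node] = 0  # the replacement starts with a clean record
                repaired.append(node)
        return repaired
```

Requiring consecutive failures avoids replacing a node because of one transient error, at the cost of slower detection.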
Services
- AWS Auto Scaling
- AWS Backup
- Amazon CloudWatch
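As one way these services fit together, a CloudWatch alarm on average CPU can trigger a scaling action for an Auto Scaling group. The sketch below only builds the alarm parameters; the helper name `cpu_alarm_params`, the alarm naming scheme, and the 5-minute period with 2 evaluation periods are assumptions, not recommended values.

```python
from typing import Any, Dict, List


def cpu_alarm_params(group_name: str, threshold: float,
                     action_arns: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch's put_metric_alarm (hypothetical helper)."""
    return {
        "AlarmName": f"{group_name}-cpu-high",  # assumed naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
        "Statistic": "Average",
        "Period": 300,            # seconds per datapoint (assumed)
        "EvaluationPeriods": 2,   # breaching datapoints before alarming (assumed)
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": action_arns,  # e.g. the ARN of a scaling policy
    }


params = cpu_alarm_params("web-asg", 70.0, ["<scaling-policy-arn>"])
print(params["AlarmName"])
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`; the group name and ARN shown here are placeholders.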