Overview
These notes outline an observability strategy for AWS. They argue for moving from traditional monitoring to observability because of the complexity of distributed systems, the proliferation of devices (mobile, IoT), constant change, and the large volumes of data these systems generate.
Why Observability?
- Evolution to Observability: Recognising the need to move from reactive monitoring to proactive observability.
- Constant Change: Acknowledging the challenges posed by distributed systems and the dynamic nature of modern applications.
- Data Volume: Highlighting the importance of handling and deriving insights from large amounts of data.
What to Observe
- Customer Needs: Understanding customer preferences including location, choice, price, security, page speed, and search.
- Internal Customer: Focusing on the needs of internal stakeholders.
- Visualisations / Dashboards: Creating dashboards aligned with business goals, KPIs, and objectives, with a focus on site availability and performance.
- Sessions and Canary Testing: Monitoring sessions and conducting synthetic (canary) testing against applications.
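As an illustration of the synthetic (canary) testing mentioned above, here is a minimal sketch of a single check that records status and latency. It is not tied to any AWS service: the names `run_canary`, `CanaryResult`, and the 2000 ms latency budget are assumptions made for this example, and the `fetch` callable stands in for whatever request the real canary would make.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CanaryResult:
    ok: bool           # True when the check passed
    status: int        # HTTP status code, or 0 if the request itself failed
    latency_ms: float  # wall-clock duration of the request
    reason: str        # short explanation of the outcome


def run_canary(
    fetch: Callable[[], int],
    max_latency_ms: float = 2000.0,  # hypothetical latency budget
    clock: Callable[[], float] = time.monotonic,
) -> CanaryResult:
    """Run one synthetic check.

    `fetch` performs the request and returns an HTTP status code.
    `clock` is injectable so the check can be tested without waiting.
    """
    start = clock()
    try:
        status = fetch()
    except Exception as exc:  # network errors, timeouts, and so on
        latency = (clock() - start) * 1000.0
        return CanaryResult(False, 0, latency, f"request failed: {exc}")
    latency = (clock() - start) * 1000.0
    if status >= 400:
        return CanaryResult(False, status, latency, "unexpected status")
    if latency > max_latency_ms:
        return CanaryResult(False, status, latency, "too slow")
    return CanaryResult(True, status, latency, "ok")


if __name__ == "__main__":
    # Stand-in for a real request; always reports HTTP 200.
    result = run_canary(lambda: 200)
    print(result.ok, result.reason)
```

In practice the check would run on a schedule, and each result would be published as availability and latency metrics that feed the dashboards described above.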
How to Implement
- Alerting Strategy: Defining criteria for warnings, alerts, and alarms, ensuring actionable alerts and avoiding alert fatigue.
- Automation: Implementing automation for alert actions where possible.
- Thresholds and Runbooks: Continuously reviewing and updating alert thresholds and runbooks after incidents.
- Dashboards: Creating dashboards tailored to various personas (business, cost, capacity planning, security) and using them for predictive analysis.
- Tool Selection: Selecting the right tools based on specific needs, features, and business value, while avoiding redundancy.
- Documentation: Creating runbooks, playbooks, and access/education docs, and integrating observability into internal processes.
- Continuous Improvement: Implementing iterative processes for baseline establishment, review cycles, and ongoing improvement.
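The alerting and baseline points above can be sketched in a few lines. This is one possible approach, not a prescribed method: thresholds are derived from a historical baseline (mean plus a multiple of the standard deviation), and an alarm fires only after several consecutive breaches, which is one common way to reduce alert fatigue. The function names, the sigma multipliers, and the two-level warning/alarm split are all assumptions for illustration; an intermediate "alert" tier could be added in the same way.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def baseline_thresholds(
    samples: Sequence[float],
    warn_sigma: float = 2.0,   # hypothetical multiplier for warnings
    alarm_sigma: float = 3.0,  # hypothetical multiplier for alarms
) -> Dict[str, float]:
    """Derive warning and alarm thresholds from historical samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    mu = mean(samples)
    sigma = stdev(samples)  # sample standard deviation
    return {
        "baseline": mu,
        "warning": mu + warn_sigma * sigma,
        "alarm": mu + alarm_sigma * sigma,
    }


def classify(value: float, thresholds: Dict[str, float]) -> str:
    """Map one observation to 'ok', 'warning', or 'alarm'."""
    if value >= thresholds["alarm"]:
        return "alarm"
    if value >= thresholds["warning"]:
        return "warning"
    return "ok"


def should_fire(
    recent: Sequence[float],
    thresholds: Dict[str, float],
    consecutive: int = 3,  # hypothetical number of breaches required
) -> bool:
    """Fire only if the last `consecutive` observations are all alarms.

    A single spike therefore does not page anyone.
    """
    if consecutive < 1 or len(recent) < consecutive:
        return False
    tail = recent[-consecutive:]
    return all(classify(v, thresholds) == "alarm" for v in tail)


if __name__ == "__main__":
    # Example latency history in milliseconds (made-up numbers).
    history = [100.0, 110.0, 90.0, 105.0, 95.0]
    limits = baseline_thresholds(history)
    print(classify(118.0, limits))
    print(should_fire([130.0, 131.0, 129.0], limits))
```

Recomputing the thresholds from fresh samples after each review cycle or incident is one way to keep them aligned with how the system actually behaves.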