Learnings-From-Work
Exceptions/Errors to Alert On
SSL
Kafka
SQL
NoSQL
Auth
Logging
Caching
Performance Metrics to Monitor
CPU Utilization
Memory Utilization
Latency
Error count
Errors per second
Disk Utilization
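As a rough illustration of how the error count and error rate metrics above relate, here is a minimal sketch of a sliding-window counter. The class name `ErrorRateWindow`, its methods, and the 60-second default window are all assumptions made for this example; they do not come from any monitoring library, and a real setup would normally lean on the metrics system's own rate functions.

```python
from collections import deque


class ErrorRateWindow:
    """Hypothetical helper: tracks a total error count and a windowed error rate."""

    def __init__(self, window_seconds: float = 60.0):
        # Assumed default: a 60-second window; tune to the alerting needs.
        self.window_seconds = window_seconds
        self._timestamps = deque()  # error timestamps still inside the window
        self.total_errors = 0       # lifetime error count

    def record(self, timestamp: float) -> None:
        """Register one error that occurred at `timestamp` (in seconds)."""
        self._timestamps.append(timestamp)
        self.total_errors += 1

    def errors_per_second(self, now: float) -> float:
        """Average errors per second over the trailing window ending at `now`."""
        cutoff = now - self.window_seconds
        # Drop errors that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) / self.window_seconds
```

Keeping both numbers is useful because the total count shows overall volume while the windowed rate shows whether errors are happening right now.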
Status Codes to Alert On
502
504
503
401
403
404
429
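One way to act on the list above is a small lookup that flags alertable responses. This is only a sketch: the names `ALERT_STATUS_CODES`, `should_alert`, and `count_alertable` are made up for illustration, and the reason phrases are the standard HTTP ones rather than anything stated in these notes.

```python
from collections import Counter

# The status codes listed above, with their standard HTTP reason phrases.
ALERT_STATUS_CODES = {
    502: "Bad Gateway",
    504: "Gateway Timeout",
    503: "Service Unavailable",
    401: "Unauthorized",
    403: "Forbidden",
    404: "Not Found",
    429: "Too Many Requests",
}


def should_alert(status_code: int) -> bool:
    """Return True if the status code is on the alert list."""
    return status_code in ALERT_STATUS_CODES


def count_alertable(status_codes) -> dict:
    """Count occurrences of each alertable status code in an iterable."""
    return dict(Counter(code for code in status_codes if should_alert(code)))
```

For example, `count_alertable([200, 502, 502, 404])` reports two 502s and one 404 while ignoring the 200. In practice an alert would usually fire on a threshold or rate per code rather than on a single occurrence.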
Costs to Consider
Software Licenses
Query execution costs
Cost of missed targets
Recovery Point Objective (RPO)
Recovery Time Objective (RTO)
Service Level Agreements
Service Level Objectives
Cost of Human Errors
Due to fat fingering
Due to fatigue
Due to lack of knowledge
Due to lack of training
Cross-region traffic
Ingress
Egress
Compliance violations
Last updated