Learnings-From-Work

Exceptions/Errors to Alert On

  • SSL

  • Kafka

  • SQL

  • NoSQL

  • Auth

  • Logging

  • Caching

Performance Metrics to Monitor

  • CPU Utilization

  • Memory Utilization

  • Latency

  • Error count

  • Errors per second

  • Disk Utilization
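
As a rough illustration of the "Error count" and "Errors per second" metrics above, here is a minimal sketch of a sliding-window error tracker. The class name `ErrorRateTracker`, the 60-second default window, and the plain float timestamps are assumptions made for this example; a real setup would more likely rely on a metrics library or the monitoring platform's own rate functions.

```python
from collections import deque


class ErrorRateTracker:
    """Hypothetical helper: counts errors inside a sliding time window."""

    def __init__(self, window_seconds: float = 60.0):
        # Window length is an assumption; tune it to the alerting policy.
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record_error(self, timestamp: float) -> None:
        # Timestamps (in seconds) are assumed to arrive in non-decreasing order.
        self._timestamps.append(timestamp)

    def _evict_old(self, now: float) -> None:
        # Drop errors that are at least one full window older than `now`.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()

    def error_count(self, now: float) -> int:
        # Number of errors still inside the window.
        self._evict_old(now)
        return len(self._timestamps)

    def errors_per_second(self, now: float) -> float:
        # Average error rate over the window.
        return self.error_count(now) / self.window_seconds
```

Calling `record_error(time.time())` from an exception handler and polling `errors_per_second(time.time())` from a periodic check would be one way to wire this in.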

Status Codes to Alert On

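  • 502 (Bad Gateway)

  • 504 (Gateway Timeout)

  • 503 (Service Unavailable)

  • 401 (Unauthorized)

  • 403 (Forbidden)

  • 404 (Not Found)

  • 429 (Too Many Requests)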
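
One way to act on the list above is to count how often each of these codes appears in a batch of responses and flag the ones that cross a threshold. The sketch below assumes the status codes have already been extracted from access logs as integers; the names `ALERT_STATUS_CODES`, `count_alertable`, and `codes_over_threshold`, and the default threshold of 1, are hypothetical choices for illustration only.

```python
from collections import Counter

# The status codes listed above, as a set for fast membership checks.
ALERT_STATUS_CODES = {401, 403, 404, 429, 502, 503, 504}


def count_alertable(status_codes):
    """Return {code: occurrences} for codes that are on the alert list."""
    counts = Counter(code for code in status_codes if code in ALERT_STATUS_CODES)
    return dict(counts)


def codes_over_threshold(status_codes, threshold=1):
    """Return the sorted alert-list codes seen at least `threshold` times."""
    counts = count_alertable(status_codes)
    return sorted(code for code, n in counts.items() if n >= threshold)
```

For example, `codes_over_threshold([200, 502, 502, 404], threshold=2)` flags only 502, since 404 appears once and 200 is not on the alert list. In practice a separate threshold per code may make more sense, because a handful of 404s is usually less urgent than a handful of 502s.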

Costs to Consider

  • Software Licenses

  • Query execution costs

  • Cost to miss

    • Recovery Point Objective (RPO)

    • Recovery Time Objective (RTO)

    • Service Level Agreements

    • Service Level Objectives

  • Cost of Human Errors

    • Due to fat fingering

    • Due to fatigue

    • Due to lack of knowledge

    • Due to lack of training

  • Cross-region traffic

  • Ingress

  • Egress

  • Compliance violations
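
The point and time objectives listed under "Cost to miss" can be checked with simple time arithmetic: the point objective bounds how much data may be lost (time between the last good backup and the failure), and the time objective bounds how long recovery may take (time between the failure and restored service). The sketch below is a minimal illustration; the function names and the way timestamps are supplied are assumptions for this example.

```python
from datetime import datetime, timedelta


def rpo_breached(last_backup: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """True if the data-loss window (failure minus last backup) exceeds the objective."""
    return failure_time - last_backup > rpo


def rto_breached(failure_time: datetime, restored_time: datetime, rto: timedelta) -> bool:
    """True if the downtime (restore minus failure) exceeds the objective."""
    return restored_time - failure_time > rto
```

With an objective of one hour for each, a failure 90 minutes after the last backup would breach the point objective, while a restore completed 30 minutes after the failure would stay within the time objective.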
