01
The company had zero monitoring beyond basic health checks. Production issues were discovered by customers, MTTR averaged 4 hours, and the on-call team was drowning in noise from hundreds of misconfigured CloudWatch alarms. There was no distributed tracing across their 60+ microservices.