case study · Logistics

Observability Stack for European Logistics Company

Built a complete observability stack from scratch with SLO-driven alerting, reducing MTTR by 90% and eliminating alert fatigue.

the challenge

What stood in the way

The company had no monitoring beyond basic health checks. Production issues were discovered by customers, MTTR averaged 4 hours, and the on-call team was drowning in noise from hundreds of misconfigured CloudWatch alarms. There was no distributed tracing across the company's 60+ microservices.

our solution

How we solved it

We deployed Prometheus for metrics, Grafana for visualization, Loki for log aggregation, and Tempo for distributed tracing. We then defined 15 SLOs with error budgets tied to business KPIs, replaced all legacy alarms with SLO-driven alerting, and built a runbook for every alert.
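To make the idea of SLO-driven alerting concrete, here is a minimal sketch of how an error budget and its burn rate can decide whether to page someone. It is an illustration, not the project's actual rules: the SLO name, the 99.9% target, the 30-day window, and the 14.4x paging threshold (a commonly cited value equal to spending 2% of a 30-day budget in one hour) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """A hypothetical availability SLO; names and values are illustrative."""
    name: str
    target: float          # e.g. 0.999 means 99.9% of requests must succeed
    window_days: int = 30  # rolling compliance window (assumed)

    @property
    def error_budget(self) -> float:
        """Fraction of requests allowed to fail over the window."""
        return 1.0 - self.target


def burn_rate(slo: SLO, failed: int, total: int) -> float:
    """Observed error rate divided by the budget; 1.0 means exactly on budget."""
    if total == 0:
        return 0.0  # no traffic, so nothing is being spent
    return (failed / total) / slo.error_budget


def should_page(
    slo: SLO,
    long_window: tuple[int, int],   # (failed, total) over e.g. the last hour
    short_window: tuple[int, int],  # (failed, total) over e.g. the last 5 minutes
    threshold: float = 14.4,        # assumed paging threshold
) -> bool:
    """Page only when both windows burn faster than the threshold.

    The long window shows the problem is significant; the short window
    shows it is still happening, so a recovered incident stops paging.
    """
    return (
        burn_rate(slo, *long_window) >= threshold
        and burn_rate(slo, *short_window) >= threshold
    )


if __name__ == "__main__":
    slo = SLO(name="shipment-tracking-api", target=0.999)
    print(should_page(slo, long_window=(2000, 100_000), short_window=(20, 1000)))
```

In this example a 2% error rate against a 99.9% target burns the budget roughly 20 times faster than allowed, so the check pages. The same long-window error rate with a clean short window would not, which is how this style of rule avoids paging on incidents that have already recovered.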

the outcome

Measurable results

R / 01

90%

MTTR reduced from 4 hours to 25 minutes

R / 02

340+

dashboards created

R / 03

15 SLOs

defined and tracked

R / 04

70%

Reduction in on-call pages

tech stack

What powered it

Prometheus

Grafana

SRE
