Selected Work

Practical IT systems, tools, and operations work.

A concise view of projects and utilities focused on reliability, automation, monitoring, and searchable knowledge.

In progress Sep 2025

Monitoring Stack (Prometheus / Grafana / Loki)

SLIs/SLOs, alert thresholds, and runbook links; reduced alert noise by ~40% while shortening MTTR.

In progress Sep 2025

Incident Readiness & Post-mortems

On-call guides, communication templates, and tracked action items that improved overall reliability.
