Blog

Architecture decisions, engineering deep-dives, and lessons from building a managed monitoring platform.

Metrics Tell You Something Broke. Tracing Tells You What, Where, and Why.

June 3, 2026

Metrics tell you something broke. Distributed tracing tells you what, where, and why. How adding OpenTelemetry and Grafana Tempo to an existing Prometheus stack changed the diagnostic workflow from “search for clues” to “read the receipt” — including what I got wrong along the way.

Why Shifting from Node Exporter to Alloy Is the Better Option

May 27, 2026

Grafana Alloy consolidates metrics, logs, and traces into a single programmable pipeline at the edge, replacing the node_exporter pull model with outbound-only push over authenticated endpoints. Here is what that architectural shift means for fleet management, security posture, and the operational overhead of running a managed observability platform.

When you bring your data home, who is going to keep an eye on it?

May 26, 2026

Cloudian’s 2026 research found 75% of senior IT teams had moved workloads back on-premises in the prior two years. The monitoring tools that cloud providers bundled in for free don’t move with the data. Here is what teams lose when they repatriate, and why observability is the gap that gets addressed last.

Adding an LLM narration layer to a self-hosted observability stack

May 11, 2026

I almost made the classic AI architecture mistake — dumping raw Prometheus metrics and Loki logs into an LLM and asking it to find anomalies. Why I preprocess metrics into structured summaries before the model ever sees them, why inference runs locally on my LAN, and why the LLM ends up as the narrator, not the analyst.

I built a managed observability SaaS from a homelab. I still couldn't explain what was happening.

May 7, 2026

I had dashboards. I had logs. I still didn’t know why things were breaking. The path from a chaotic homelab dashboard to a managed observability SaaS — the original idea I had to throw out, the mTLS rabbit hole I had to climb back out of, and the mistakes that were more useful than the wins.

How three-layer tenant isolation works without dedicated infrastructure per client

April 28, 2026

The most common follow-up question after the first post was some version of “but is that actually isolated?” Here’s how three independent layers — Prometheus labels, Grafana organizations, and Cloudflare Access tokens — keep tenant data separate without dedicated infrastructure per client, and why the independence between layers matters more than any single layer on its own.

How I built multi-tenant observability on a two-server homelab (and what I'd do differently)

April 23, 2026

About six months ago I was running infrastructure across a handful of different environments with no clean way to give each one isolated visibility — not filtered by a dashboard variable, but structurally isolated. What started as a homelab experiment with Prometheus, Grafana, and Alloy turned into Irin Observability. This is the architecture behind it, the tradeoff that defined everything, and three things I got wrong.
