Linux Server Monitoring Blog | Irin Observability

Migrating from Promtail to Alloy for Log Collection

July 21, 2026

Promtail reached end of life in March 2026, making Alloy the supported replacement for shipping logs to Loki. How the loki.source.*, loki.process, and loki.write components replace Promtail’s config, and the config trap that silently drops logs if you guess a host’s logging method by distro instead of checking it.

Monitoring Docker Containers with Grafana Alloy and cAdvisor

July 20, 2026

Host metrics tell you a server is healthy, not which container is eating all the memory. How cAdvisor and Alloy push container metrics upstream without opening a port, the cardinality trap that floods dashboards if you skip a label matcher, and the three alerts worth writing first.

Why LLM Decisions Should Be Deterministic

July 13, 2026

A deterministic layer under an LLM isn’t just about consistent output. It’s about auditability: whether a decision made months ago can still be reproduced from the implementation, rather than explained after the fact by asking the model what it thinks it did.

Prometheus Agent Mode vs Grafana Alloy: Choosing the Right Push Agent in 2026

July 13, 2026

Prometheus Agent mode and Grafana Alloy both push metrics via remote_write, so the choice isn’t obvious. Here is how to decide between a lightweight metrics-only forwarder and a unified telemetry pipeline, from someone running Alloy across a multi-tenant fleet.

Migrating from node_exporter to Grafana Alloy, One Server at a Time

July 2, 2026

A step-by-step migration from node_exporter to Grafana Alloy for Linux fleets: run both agents side by side, verify the new telemetry path, then retire the old one. No metrics gap, no flag day.

The LLM narrates. The code decides.

June 25, 2026

Most “AI for observability” tools hand the model the judgment call. Here is why I did the opposite: Python classifies alerts into a fixed eight-value enum, the LLM narrates the verdict in one sentence, and a fail-closed design keeps the model off the critical path entirely.

Uptime Kuma Tells You That a Service Broke, Not Why

June 22, 2026

Uptime Kuma is excellent at telling you when a service becomes unreachable, but it cannot explain why a service is slow or unhealthy while still responding. That requires internal metrics. Here is where the line is.

Monitoring 20 Linux VMs with Prometheus (No Kubernetes Required)

June 19, 2026

Modern monitoring docs assume Kubernetes. If you are managing a fleet of Linux VMs, node_exporter plus Prometheus gives you everything you need for infrastructure monitoring with a single lightweight agent. No cluster required.

The Only 5 PromQL Queries You Really Need to Monitor a Linux Server

June 11, 2026

PromQL has a reputation for being intimidating, and the reputation is half-earned. But monitoring a single Linux server well doesn’t require a comprehensive grasp of the language. Five queries covering CPU, memory, disk space, disk IO, and network will catch the large majority of what goes wrong, and each one is explained here from the inside out.

Datadog vs Open Source Monitoring: What Small Teams Should Know

June 10, 2026

If you are evaluating Datadog for a small team, the decision usually comes down to whether your time is more expensive than the monitoring bill. Here is the actual math on per-host costs, the metered surprises nobody warns you about, and an honest look at all three real alternatives for teams running 3–50 Linux servers.

Metrics Tell You Something Broke. Tracing Tells You What, Where, and Why.

June 3, 2026

Metrics tell you something broke. Distributed tracing tells you what, where, and why. How adding OpenTelemetry and Grafana Tempo to an existing Prometheus stack changed the diagnostic workflow from “search for clues” to “read the receipt”, including what I got wrong along the way.

Why Shifting from Node Exporter to Alloy is the Better Option

May 27, 2026

Grafana Alloy consolidates metrics, logs, and traces into a single programmable pipeline at the edge, replacing the node_exporter pull model with outbound-only push over authenticated endpoints. Here is what that architectural shift means for fleet management, security posture, and the operational overhead of running a managed observability platform.

When you bring your data home, who is going to keep an eye on it?

May 26, 2026

Cloudian’s 2026 research found 75% of senior IT teams had moved workloads back on-premises in the prior two years. The monitoring tools that cloud providers bundled in for free don’t move with the data. Here is what teams lose when they repatriate, and why observability is the gap that gets addressed last.

Adding an LLM narration layer to a self-hosted observability stack

May 11, 2026

I almost made the classic AI architecture mistake: dumping raw Prometheus metrics and Loki logs into an LLM and asking it to find anomalies. Why I preprocess metrics into structured summaries before the model ever sees them, why inference runs locally on my LAN, and why the LLM ends up as the narrator, not the analyst.

I built a managed observability SaaS from a homelab. I still couldn't explain what was happening.

May 7, 2026

I had dashboards. I had logs. I still didn’t know why things were breaking. The path from a chaotic homelab dashboard to a managed observability SaaS, the original idea I had to throw out, the mTLS rabbit hole I had to climb back out of, and the mistakes that were more useful than the wins.

How three-layer tenant isolation works without dedicated infrastructure per client

April 28, 2026

The most common follow-up question after the first post was some version of “but is that actually isolated?” Here’s how three independent layers (Prometheus labels, Grafana organizations, and Cloudflare Access tokens) keep tenant data separate without dedicated infrastructure per client, and why the independence between layers matters more than any single layer on its own.

How I built multi-tenant observability on a two-server homelab (and what I'd do differently)

April 23, 2026

About six months ago I was running infrastructure across a handful of different environments with no clean way to give each one isolated visibility (not filtered by a dashboard variable, but structurally isolated). What started as a homelab experiment with Prometheus, Grafana, and Alloy turned into Irin Observability. This is the architecture behind it, the tradeoff that defined everything, and three things I got wrong.
