irin observability

Infrastructure Topology

Outbound-push architecture. Zero inbound ports on customer hosts. Two independent backends (primary + replica), each running its own Prometheus, Grafana, and Loki.
Dev-01 mirrors the production stack on a single bare-metal box for staging and pre-prod validation. It has no replica and takes a daily snapshot to Proxmox Backup Server (PBS).
Topology diagram · v2026.05 · high-level with dev server · TLS · zero-trust · streaming replication
Legend: Irin service · Cloudflare edge · infra / external · TLS push (outbound) · Postgres streaming replication · backup / failover path

01 · Client hosts
Client host and test client host, each: Grafana Alloy (metrics, logs), Node Exporter, cAdvisor, per-tenant Cloudflare token. Client hosts reach each edge over TLS.
External services (consumed by irin-api and web): Resend, Stripe (payments), Tailscale (admin), Cloudflare DNS, GitHub (SCM), PBS mirror.

02 · Cloudflare edge
Three edges (Dev, Primary, Replica), each: Cloudflare Tunnel (zero-trust ingress), Cloudflare Access (service-token auth), outbound-push only.

03 · Bare-metal Irin backends
Dev-01 · development (staging, pre-prod validation): Nginx (reverse proxy), CF Tunnel daemon, Prometheus (metrics TSDB), Alertmanager (alert routing), Loki (log aggregation), MinIO (log chunks), Grafana (dashboards), Renderer (report images), Irin-API (portal backend), Postgres (portal db, primary), Config-Svr (bootstrap · alloy), Blackbox (endpoint probes).
Server-01 · primary (active, serves all tenants): same components as Dev-01.
Server-02 · replica (hot standby, metrics and logs live): Blackbox (endpoint probes), CF Tunnel daemon, Prometheus, Alertmanager, Loki, MinIO, Grafana, Renderer, Irin-API, Postgres (portal db, replica, fed by streaming replication). Failover: CF Tunnel redirect + Postgres promotion, no replay needed.

04 · Web · backup
Web-server VM: marketing site (irinobservability.com), Portal UI (customer dashboard), Nginx edge (TLS, routing).
Proxmox Backup Server: pbs-disk-01 (mirror), pbs-disk-02 (mirror), pbs-disk-03 (offsite). Postgres, Grafana state, KB, and the stack are duplicated across three separate HDDs.

Why outbound-push

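Customer hosts open zero inbound ports. Grafana Alloy on each host initiates an outbound TLS connection to the primary or replica Cloudflare edge and authenticates to Cloudflare Access with a per-tenant token; the edge then forwards the traffic through the Cloudflare Tunnel that each backend's tunnel daemon keeps open. Because Irin never opens a connection toward a customer host, a compromised Irin backend has no path to pivot back into a customer network.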
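A minimal sketch of a single outbound push from the customer side, assuming the per-tenant token is a Cloudflare Access service token presented in the `CF-Access-Client-Id` / `CF-Access-Client-Secret` headers. The edge hostnames, the Loki-style ingest path, the credentials, and the `build_push_request` helper are all hypothetical; in practice Alloy does this from its own configuration, not from Python. The point it illustrates is that the customer host only ever dials out.

```python
import json
import urllib.request

# Hypothetical edge hostnames; the real ingest hostnames are not documented here.
EDGES = {
    "primary": "https://ingest-primary.example.com",
    "replica": "https://ingest-replica.example.com",
}


def build_push_request(edge: str, path: str, client_id: str,
                       client_secret: str, payload: dict) -> urllib.request.Request:
    """Build one outbound HTTPS POST from a customer host to a Cloudflare edge.

    Assumes the per-tenant token is a Cloudflare Access service token, which
    Access reads from the CF-Access-Client-Id / CF-Access-Client-Secret headers.
    """
    if edge not in EDGES:
        raise ValueError(f"unknown edge: {edge}")
    return urllib.request.Request(
        EDGES[edge] + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "CF-Access-Client-Id": client_id,
            "CF-Access-Client-Secret": client_secret,
        },
    )


if __name__ == "__main__":
    # Hypothetical tenant credentials and an empty Loki-style push body.
    req = build_push_request("primary", "/loki/api/v1/push",
                             "tenant-a.access", "tenant-a-secret",
                             {"streams": []})
    print(req.get_method(), req.full_url)
    # Sending is a single outbound call: urllib.request.urlopen(req, timeout=10)
```

Switching to the replica edge only changes the `edge` argument; the host still dials out, so no firewall rule on the customer side has to change.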

Two independent backends

Primary and Replica each run a full Prometheus / Grafana / Loki / MinIO stack, and the replica ingests metrics and logs live as a hot standby. Portal state in Postgres reaches the replica through streaming replication. Failover is therefore a CF Tunnel redirect plus a Postgres promotion, with no replay or backfill.
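The failover described above can be sketched as ordered steps. This assumes promotion uses PostgreSQL's built-in `pg_promote()` (PostgreSQL 12+) after confirming the node is still a standby with `pg_is_in_recovery()`. The document does not say how the CF Tunnel redirect is performed, so it is modeled as a hypothetical `redirect_tunnel` callable, and `execute_sql` is likewise a hypothetical wrapper around a database driver. Promoting before redirecting is a design assumption: it keeps Irin-API on the replica from taking traffic while its database is still read-only.

```python
from typing import Any, Callable, List

# Hypothetical injected dependencies: run one SQL statement on the replica and
# return its single scalar result; repoint the Cloudflare Tunnel at a target.
ExecuteSQL = Callable[[str], Any]
RedirectTunnel = Callable[[str], None]


def fail_over(execute_sql: ExecuteSQL, redirect_tunnel: RedirectTunnel,
              target: str = "server-02") -> List[str]:
    """Promote the replica's Postgres, then send tunnel traffic to it.

    Metrics and logs need no replay: the replica already ingests them live.
    Returns a log of the steps taken.
    """
    steps: List[str] = []

    # 1. Only promote a node that is actually a standby.
    if not execute_sql("SELECT pg_is_in_recovery()"):
        raise RuntimeError(f"{target} is not in recovery; refusing to promote")
    steps.append("checked standby")

    # 2. Promote; pg_promote() waits (60 s by default) and returns true on success.
    if not execute_sql("SELECT pg_promote()"):
        raise RuntimeError("promotion did not complete")
    steps.append("promoted postgres")

    # 3. Only now move ingress, so writes never land on a read-only database.
    redirect_tunnel(target)
    steps.append(f"tunnel -> {target}")
    return steps
```

With a real database, `execute_sql` would wrap a driver cursor; fakes are enough to check that the tunnel never moves before promotion succeeds.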

Backup posture

Proxmox Backup Server keeps three rotating copies of Postgres, Grafana state, the knowledge base, and the monitoring stack itself, one per HDD: two mirrors (pbs-disk-01, pbs-disk-02) and one offsite disk (pbs-disk-03). The Web VM and Dev-01 back up to PBS on the same schedule.
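One way to audit this posture is to confirm that every protected item has a recent snapshot on all three disks. The sketch below assumes an inventory mapping each datastore (named after the diagram's pbs-disk-01/02/03) to the last snapshot time of each item; how that inventory is pulled from PBS is left out. The item names are hypothetical labels, and the 24-hour threshold is an assumption derived from Dev-01's daily snapshot running on the same schedule.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

DATASTORES = ("pbs-disk-01", "pbs-disk-02", "pbs-disk-03")  # two mirrors + offsite
# Hypothetical labels for the protected data listed above.
ITEMS = ("postgres", "grafana-state", "kb", "stack")

Inventory = Dict[str, Dict[str, datetime]]  # datastore -> item -> last snapshot


def backup_gaps(inventory: Inventory, now: datetime,
                max_age: timedelta = timedelta(hours=24)) -> List[Tuple[str, str, str]]:
    """Return (datastore, item, problem) for every missing or stale copy."""
    gaps: List[Tuple[str, str, str]] = []
    for store in DATASTORES:
        snapshots = inventory.get(store, {})
        for item in ITEMS:
            taken = snapshots.get(item)
            if taken is None:
                gaps.append((store, item, "missing"))
            elif now - taken > max_age:
                gaps.append((store, item, "stale"))
    return gaps
```

An empty result means every item has a fresh copy on both mirrors and the offsite disk; any entry points at the specific disk that needs attention.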