Migrating from node_exporter to Grafana Alloy, One Server at a Time

July 2, 2026

If you have been monitoring Linux servers for any length of time, there is a good chance node_exporter was the first thing you installed. It is lightweight, reliable, and exposes a large number of machine metrics for Prometheus to scrape. For years, it has been the default answer.

As your infrastructure grows, though, your monitoring stack usually grows with it. First comes log collection. Then traces. Before long you are running node_exporter, a log shipper, and maybe another telemetry agent. Each component has its own configuration, service unit, upgrade cycle, and failure modes. Grafana Alloy changes that by consolidating those responsibilities into a single telemetry agent.

This post walks through migrating from node_exporter to Alloy on a real fleet, one server at a time, while maintaining continuous visibility throughout the process. These are the exact steps that survived contact with production on the Irin monitoring stack, not the idealized version that looks clean in a diagram.

// TL;DR

If you are already running node_exporter, do not replace it overnight. Install Grafana Alloy alongside it, configure Alloy’s built-in prometheus.exporter.unix component, verify that metrics are reaching your remote Prometheus instance, and only then retire node_exporter. Migrating one server at a time minimizes risk, preserves visibility, and positions your infrastructure for logs, traces, and future telemetry without deploying additional agents.

// THE REAL DIFFERENCE IS THE DIRECTION OF TRAVEL

Before getting started, it is worth understanding what actually changes. This is not simply replacing one monitoring agent with another.

node_exporter is a server. It listens on a port, typically 9100, and waits for Prometheus to connect and scrape metrics. That means every monitored machine needs an open endpoint, network connectivity from Prometheus, firewall rules, and scrape configurations.

Alloy flips that model around. Instead of waiting for Prometheus to connect, Alloy collects metrics locally and pushes them to a remote endpoint using Prometheus Remote Write. On my stack, that outbound traffic travels through a Cloudflare Tunnel. Nothing reaches into the monitored servers. There are no metrics ports exposed to the LAN, no inbound firewall rules to maintain, and no scrape network that has to remain routable. The metrics are pushed out through the secure tunnel, and the monitoring stack has no path back into the server.

That shift is the real migration. You are not just replacing a binary; you are changing the direction your telemetry flows. Once you frame it that way, the rest of the migration makes much more sense.

// THE COMPONENT THAT REPLACES NODE_EXPORTER

Alloy is configured using its own configuration syntax, known as River in the Grafana Agent Flow days, which is less like a traditional configuration file and more like a telemetry pipeline. Each component performs one task before handing data to the next component. Once you start thinking of it as a pipeline, the configuration becomes surprisingly readable.

The component that replaces node_exporter is prometheus.exporter.unix. Under the hood it uses the same collector code as node_exporter, so the metrics themselves remain familiar.

A minimal configuration looks like this:

// Collect host metrics.
prometheus.exporter.unix "host" {
}

// Scrape those locally collected metrics.
prometheus.scrape "host" {
  targets    = prometheus.exporter.unix.host.targets
  forward_to = [prometheus.remote_write.default.receiver]
}

// Push metrics to Prometheus Remote Write.
prometheus.remote_write "default" {
  endpoint {
    url = "https://metrics.example.internal/api/v1/write"

    // Production deployments typically authenticate here using
    // basic auth, bearer tokens, or mTLS.
  }
}

Read from top to bottom, it tells a story. The exporter gathers metrics. The scraper collects those metrics internally. The remote write component sends them to Prometheus.

The part that is genuinely new if you are coming from node_exporter is how the components find each other. They are wired together by name, not by network address. In the scraper, targets = prometheus.exporter.unix.host.targets is how the scraper names the exporter it pulls from, and forward_to = [prometheus.remote_write.default.receiver] is how the scraper names the writer it hands off to. There is no port number and no scrape target list, because the scraper is reaching into another component inside the same process rather than across the network. That reference-by-name wiring is the whole mental model. Once it clicks, every other Alloy pipeline you build, for logs, traces, or profiling, is the same shape: a source, a processor, and a destination, each naming the next.
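As an illustration of the authentication comment in the configuration above, here is a minimal sketch of the same remote write component with basic auth filled in. The username and the password file path are hypothetical placeholders, not values from the Irin stack, and bearer tokens or mTLS would use different blocks.

```alloy
// Same component as above, with basic auth added.
// "alloy-writer" and the password file path are assumed placeholders.
prometheus.remote_write "default" {
  endpoint {
    url = "https://metrics.example.internal/api/v1/write"

    basic_auth {
      username      = "alloy-writer"
      password_file = "/etc/alloy/remote_write.password"
    }
  }
}
```

Reading the password from a file keeps the secret out of config.alloy itself, which helps if the configuration lives in version control. This block replaces the earlier prometheus.remote_write "default" definition rather than sitting beside it, since a configuration cannot define the same component label twice.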

One thing to remember during migration: the metrics themselves stay the same, but some labels can change depending on how Alloy identifies the host. In particular, prometheus.exporter.unix sets the instance label to the hostname of the machine running Alloy, which may differ from the instance value your old node_exporter scrape job produced. That is expected, and it matters when you start verifying your migration.
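If you would rather keep the old label values than update every dashboard, one option is to rewrite the labels on the targets before they are scraped. The sketch below assumes the old scrape job was called node_exporter and the old instance value was web01.example.internal:9100. Both are hypothetical, so substitute whatever your Prometheus actually recorded.

```alloy
// Rewrite target labels so the pushed series match the old scrape job.
// The job and instance values below are assumed examples.
discovery.relabel "host" {
  targets = prometheus.exporter.unix.host.targets

  rule {
    target_label = "job"
    replacement  = "node_exporter"
  }

  rule {
    target_label = "instance"
    replacement  = "web01.example.internal:9100"
  }
}

// Scrape the relabeled targets instead of the exporter's raw targets.
prometheus.scrape "host" {
  targets    = discovery.relabel.host.output
  forward_to = [prometheus.remote_write.default.receiver]
}
```

Because this gives the Alloy series the same labels as the scraped ones, it is safer to apply only after the old node_exporter scrape target has been removed. While both paths are live, identical label sets would mean two writers feeding the same series.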

// WHY MIGRATE ONE SERVER AT A TIME?

The temptation is to deploy Alloy everywhere and immediately disable node_exporter. The safer pattern is a canary: pick a non-critical server, install Alloy beside node_exporter, and let both run simultaneously while you verify that the new telemetry path is working correctly. The resource overhead is negligible, but the confidence you gain is significant.

Running both agents briefly means you always have a known-good monitoring path while validating the new one. The two do not interfere. Alloy collects and pushes on its own outbound path while Prometheus keeps scraping node_exporter on the old path. Only after you have confirmed that Alloy is producing fresh, accurate metrics should you retire node_exporter. Once you have done that successfully a few times, batching servers becomes much less stressful.

// THE MIGRATION

1. Pick a canary host and install Alloy

Choose a stable server where a few minutes of metric oddities would not be catastrophic, but would still be noticeable. Install Grafana Alloy using your distribution’s package manager. Most Linux distributions install Alloy as a systemd service, with the primary configuration file located at /etc/alloy/config.alloy.

Leave node_exporter exactly as it is. Do not enable or start the Alloy service yet. You want the configuration in place before the service comes up, so the first time Alloy starts, it starts doing the right thing rather than running with an empty default config.

2. Write the Alloy configuration, then start the service

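Create your Alloy configuration at /etc/alloy/config.alloy and point the remote_write endpoint at your Prometheus receiver. If that receiver is a Prometheus server itself, it only accepts pushed samples when started with the --web.enable-remote-write-receiver flag. With the config in place, enable and start the service: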

sudo systemctl enable alloy
sudo systemctl start alloy

Then confirm the service came up cleanly:

sudo systemctl status alloy --no-pager

The --no-pager option prints the status and returns you to the shell instead of opening the interactive pager, which makes it far friendlier for scripts and automation. At this point nothing has changed on the existing scrape path: Prometheus is still scraping node_exporter exactly as before, and Alloy’s series arrive alongside it. You are simply running two collectors concurrently.

Alloy also exposes a built-in UI on port 12345, bound to localhost by default. Open it and you get a live view of every component in your telemetry pipeline and its health. If prometheus.exporter.unix and prometheus.remote_write both report healthy, data is flowing. If remote_write is unhealthy, the problem is usually one of three things, in roughly this order of likelihood:

An incorrect endpoint URL
Network connectivity to the endpoint
Authentication

3. Verify the new telemetry

Now both node_exporter and Alloy are reporting metrics, and this is the step people are tempted to skip. It is the one that matters most, because it is what lets you trust the new path before you remove the old one.

Start at the Alloy UI on port 12345. Open the component detail page for prometheus.exporter.unix and confirm it is healthy and exporting targets, then check that prometheus.remote_write is healthy and not reporting send failures. A quick note on a common wrong turn: the /metrics endpoint on port 12345 exposes Alloy’s own internal telemetry, its component health and controller state, not the host metrics that prometheus.exporter.unix collects. So do not expect to find node_memory_MemAvailable_bytes by curling that endpoint. To see the actual host metrics flowing through a component in real time, Alloy has a Live Debugging view on the component detail page, though it is disabled by default and has to be turned on explicitly with a livedebugging block. It defaults to off for a good reason: the live stream can surface sensitive telemetry, so you enable it deliberately rather than leaving it on.
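For reference, turning Live Debugging on is a small top-level block in config.alloy. This is a sketch assuming a reasonably recent Alloy release. Older versions treated the feature as experimental and may also need a stability flag on the command line, and not every component supports the live view.

```alloy
// Enable the Live Debugging view in the Alloy UI.
// Off by default; remove this block (or set enabled = false) when done.
livedebugging {
  enabled = true
}
```

Alloy needs to reload its configuration before the view becomes available, and it is worth reverting the change once you have finished inspecting the pipeline.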

The real proof, though, is downstream. Open Prometheus or Grafana and confirm the new series are arriving. A simple query to confirm the host is alive through the new path:

up

Then compare a real value such as available memory across both paths:

node_memory_MemAvailable_bytes

Do not expect the labels to match perfectly. As noted earlier, the Alloy series may carry a different instance or job label than node_exporter produced. What you are checking is that the values agree and that the new series keeps updating. If your dashboards, recording rules, or alerting rules explicitly reference labels such as job="node_exporter", now is the time to find them, before you remove the old exporter and those references start matching nothing.

4. Retire node_exporter

Once you are confident Alloy is working correctly, stop the old exporter:

sudo systemctl stop node_exporter
sudo systemctl disable node_exporter

stop ends the running process now. disable prevents it from quietly returning after the next reboot. You want both, or a kernel update that reboots the box in three weeks will bring back an exporter you thought was gone.

Then remove the server’s scrape job from your Prometheus configuration and reload:

curl -X POST http://localhost:9090/-/reload

Reloading rather than restarting avoids interrupting metric collection for every other host while you update the configuration. The /-/reload endpoint requires that Prometheus was started with the --web.enable-lifecycle flag. If it was not, sending the process a SIGHUP achieves the same in-place reload without a full restart.

5. Watch the canary

Give the server several scrape intervals under the new path alone, ideally longer than the longest "for" duration in your alert rules, so that any alert that is going to misfire has time to do so. Verify that metrics remain fresh, alerts continue behaving normally, dashboards still populate, and no recording rules broke because of label changes. If any rule referenced the old job label, this is where you will find out. Fix it, confirm the canary is quiet, and it is done.

Then repeat the process on the next server. After a handful of successful migrations, you will have enough confidence to migrate small batches instead of individual hosts. Resist batching until the single-host process is boring.

// COMMON MIGRATION MISTAKES

A few issues show up repeatedly during migrations, and every one of them is easier to catch on a single canary than after twenty servers have moved:

Removing node_exporter before verifying Alloy.
Forgetting to reload Prometheus after removing scrape targets.
Alert rules or dashboards still referencing the old job label.
Firewall rules blocking outbound Remote Write traffic.
Assuming label changes will not affect existing dashboards.

// WHAT YOU GAIN

The obvious benefit is consolidation. Instead of deploying separate agents for metrics, logs, and traces, Alloy gives you a single telemetry pipeline that grows with your infrastructure. If you need logs, you add a Loki component. For OTLP traces, you add another component. The overall architecture does not change, because every pipeline is the same source, processor, destination shape you already built for metrics.
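As a rough sketch of what adding logs looks like, the following pipeline reads the systemd journal and pushes entries to Loki. The component labels and the Loki URL are hypothetical, and a real deployment would likely add authentication and a processing stage in between.

```alloy
// Read log entries from the local systemd journal.
loki.source.journal "system" {
  forward_to = [loki.write.default.receiver]
}

// Push those entries outward to Loki.
// The URL is an assumed placeholder.
loki.write "default" {
  endpoint {
    url = "https://logs.example.internal/loki/api/v1/push"
  }
}
```

The wiring is the same forward_to reference used in the metrics pipeline, and the traffic is still outbound only.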

The less obvious benefit is security. Once every monitored machine pushes telemetry outward, you can close the node_exporter port on every server. There is no longer a listening metrics endpoint on each host, no scrape network to keep routable, and no inbound firewall rule whose only purpose is to let monitoring in. Alloy still serves its own admin UI on port 12345, but that is bound to localhost by default and is an operator interface, not a metrics surface exposed to the LAN. For anyone managing customer infrastructure, or simply working to reduce attack surface, that smaller footprint is arguably the biggest improvement Alloy brings.

// FINAL THOUGHTS

Five years ago, exposing port 9100 across an internal network was unremarkable. Today the direction of travel is clear: zero-trust networking, outbound-only connectivity, and centralized telemetry pipelines. Grafana Alloy is not compelling because it replaces node_exporter. It is compelling because it aligns your monitoring architecture with where modern infrastructure is already heading, and strengthens your security posture by removing exposed endpoints.

Take the migration slowly. Run both agents for a while. Prove the new telemetry path. Then remove the old one. Done this way, there is never a moment when a server is not being watched.

Related reading:

Alloy vs node_exporter: why you should switch. The decision behind this migration.
Monitor 20 Linux VMs without Kubernetes. The fleet architecture this migration fits into.
The only 5 PromQL queries you really need. Verify your metrics after the cutover.