Server monitoring: Grafana, prometheus and Node exporter¶
Server monitoring is an important part of a sysadmin's job.
Setting up monitoring for a single server is quite easy, especially with tools like Monit, but a multi-server setup quickly becomes confusing, especially given the myriad of existing solutions.
This guide walks through setting up a solid and extensible monitoring solution using Grafana as the dashboard (for good looks), Prometheus as the metrics collector and aggregator, and node_exporter as the per-node exporter.
This document is split into three parts:
- Collector and dashboard server
- Monitored server endpoint
- Grafana query language basics and examples
For shell examples, every shell block will start with a # <server> comment line.
<server> is one of the following:
- The central collector (and display) server will be Central
- Each monitored node will be Node
Note that, if I moved into a folder I created during the guide, it'll be noted by suffixing CA/VPN/CLIENT with "in ./
For this doc, we'll have two servers acting as monitored nodes, and one server acting as the collector and display server. Note that our collector/display server will be one of the two nodes, as we want our server to monitor itself.
Basically:
- 10.0.10.1 = CENTRAL/NODE = grafana, prometheus, node_exporter
- 10.0.10.2 = NODE = node_exporter
A diagram is generally better at explaining how things work together.
Note that you'll probably have a node_exporter instance on your prometheus/grafana server, so it can monitor itself!
Collector and dashboard server¶
We'll start by setting up the collector/dashboard server, so we have a nice dashboard GUI ready for us.
The first step is to install Grafana.
In this guide, the server and every monitored node are Debian-based.
Grafana's own installation guide should get you started.
Then, we'll install the prometheus data collector daemon. It's tasked with retrieving data from every configured endpoint (which will be explained later on).
Thankfully, we have everything in the base repositories.
    # Central
    $ apt install prometheus
We'll take care of the data extractor in the next part, so we'll skip it for now. First, we'll go through our prometheus configuration and our grafana dashboard, then link the two together.
Let's first take a look at Prometheus' configuration.
The file is, by default,
Its default content is the following (it may vary between versions).
    # Sample config for Prometheus.

    global:
      scrape_interval:     15s # By default, scrape targets every 15 seconds.
      evaluation_interval: 15s # By default, scrape targets every 15 seconds.
      # scrape_timeout is set to the global default (10s).

      # Attach these labels to any time series or alerts when communicating with
      # external systems (federation, remote storage, Alertmanager).
      external_labels:
          monitor: 'example'

    # Load and evaluate rules in this file every 'evaluation_interval' seconds.
    rule_files:
      # - "first.rules"
      # - "second.rules"

    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
      - job_name: 'prometheus'

        # Override the global default and scrape targets from this job every 5 seconds.
        scrape_interval: 5s
        scrape_timeout: 5s

        # metrics_path defaults to '/metrics'
        # scheme defaults to 'http'.

        static_configs:
          - targets: ['localhost:9090']

      - job_name: node
        # If prometheus-node-exporter is installed, grab stats about the local
        # machine by default.
        static_configs:
          - targets: ['localhost:9100']
I assume you have at least basic knowledge of the YAML format. Otherwise, you should probably read up on it first.
We don't care about the global block, so let's skip it.
What is interesting for us in a basic setup is the scrape_configs block.
It's structured as a list of objects, whose usual keys are:
- job_name (how you want to label this particular stats data set you're getting)
- static_configs (the configuration for this job), which contains:
  - targets, an array of host/port combinations. This is every monitored node's endpoint.
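To make that structure concrete, here is a minimal shell sketch that prints one job entry in the shape described above. The scrape_job function and its two arguments are hypothetical helpers written for illustration, not something Prometheus provides, and the address is a placeholder.

```shell
# Print one scrape_configs entry.
# scrape_job is a hypothetical helper, not a Prometheus tool.
#   $1: job name
#   $2: target, as host:port
scrape_job() {
    printf "  - job_name: %s\n" "$1"
    printf "    static_configs:\n"
    printf "      - targets: ['%s']\n" "$2"
}

# Example: an entry for one node_exporter endpoint (placeholder address).
scrape_job example 192.0.2.10:9100
```

The output is indented so it can be pasted directly under the scrape_configs key.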
Following the IPs we gave in the foreword, let's change our configuration a bit.
In scrape_configs, we'll add two jobs (and kick out the existing node one).
The first job is for our monitoring central, and so will be named central.
      - job_name: central
        static_configs:
          - targets: ['10.0.10.1:9100']
Note that, here, localhost:9100 (as predefined in the node job) is perfectly fine, but for our example's sake, we'll use its local IP address.
Our second job will be for our monitored server; we'll name it monitored.
      - job_name: monitored
        static_configs:
          - targets: ['10.0.10.2:9100']
Now that we've written those blocks, our configuration file (stripped of its comments, for simplicity's sake) looks like this.
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      external_labels:
          monitor: 'example'

    scrape_configs:
      - job_name: 'prometheus'
        scrape_interval: 5s
        scrape_timeout: 5s
        static_configs:
          - targets: ['localhost:9090']

      - job_name: central
        static_configs:
          - targets: ['10.0.10.1:9100']

      - job_name: monitored
        static_configs:
          - targets: ['10.0.10.2:9100']
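Since a YAML indentation mistake is easy to make here, it can be worth validating the saved file before touching the service. The sketch below assumes the promtool utility that ships alongside Prometheus is on your PATH. The check_prometheus_config wrapper is a hypothetical helper, and the path in the usage comment is an assumption to adapt to wherever your configuration file lives.

```shell
# Validate a Prometheus configuration file.
# check_prometheus_config is a hypothetical wrapper around promtool.
#   $1: path to the configuration file
check_prometheus_config() {
    config_file="$1"
    # Assumption: promtool is installed next to Prometheus.
    if ! command -v promtool >/dev/null 2>&1; then
        echo "promtool not found, skipping validation" >&2
        return 2
    fi
    promtool check config "$config_file"
}

# Usage (the path is an assumption, adjust it to your setup):
#   check_prometheus_config /etc/prometheus/prometheus.yml
```

A non-zero exit status means the file should be fixed before the service is restarted.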
We can save this file, and restart the prometheus service.
    # Central
    $ systemctl restart prometheus
Now, let's start grafana, if not already done.
    # Central
    $ systemctl start grafana-server
It's a web server that will, by default, listen to the port
Open your grafana instance by entering your server's IP, suffixed by the port.
Default credentials are
On the left navigation bar, hover over "Configuration" and select "Data sources".
On the list of available data source types, select Prometheus, and enter the
default provided as placeholder (
Name it as you wish, as it really doesn't matter.
Once it's done, we'll have the ability to create graphs from our extracted data.
Thing is, we still haven't configured any extractor. Let's do that right now.
Monitored server endpoint¶
Repeat this part once for each server you wish to monitor.
Now, we'll set up node_exporter to export data from our servers.
Its installation is pretty straightforward, but its configuration can be painful.
Let's first install the Debian package, prometheus-node-exporter.
For the Central server, when you installed prometheus, it came as a dependency, so you won't have to install it.
    # Node
    $ apt install prometheus-node-exporter
This daemon doesn't have any configuration file, but it accepts command-line arguments at startup.
Technically, this means passing arguments to the executable, but since it's managed by systemd, we need a way to do so without editing the systemd unit.
Luckily, the file located at /etc/default/prometheus-node-exporter defines an environment variable which contains parameters (which will be given to the executable by systemd).
We only want to change one thing: the listening interface of the node-exporter HTTP server. By default, it opens port 9100 on every network interface; we'll restrict it to our local network address.
Edit /etc/default/prometheus-node-exporter, and append the following string to the ARGS environment variable.
Make sure to set the ip to your real server IP on which you want to open this server.
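As an illustration, the sketch below prints what the resulting line could look like. It assumes your node_exporter version accepts the --web.listen-address flag (check the exporter's --help output, as flag syntax has changed between releases). The node_exporter_args function is a hypothetical helper, and 10.0.10.2 is simply the monitored node from this guide's example.

```shell
# Build an ARGS line for /etc/default/prometheus-node-exporter.
# node_exporter_args is a hypothetical helper for illustration.
# Assumption: the exporter accepts --web.listen-address=<ip>:<port>.
#   $1: IP address to listen on
#   $2: port (defaults to 9100)
node_exporter_args() {
    printf 'ARGS="--web.listen-address=%s:%s"\n' "$1" "${2:-9100}"
}

# Example with the monitored node used in this guide.
node_exporter_args 10.0.10.2
```

If ARGS already contains other options in your file, add the flag inside the existing quotes rather than replacing the whole line.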
That's all! If prometheus has access to this node, it will automatically start collecting data once we restart node-exporter, which we do by running the following.
    # Node
    $ systemctl restart prometheus-node-exporter
    $ systemctl enable prometheus-node-exporter
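To confirm the exporter is answering, you can fetch its metrics page and pick out a single value. This is a minimal sketch: metric_value is a hypothetical helper that only handles metrics without labels, node_load1 is one of the load-average metrics node_exporter exposes, and the address in the usage comment is the monitored node from this guide's example.

```shell
# Print the value of an unlabelled metric from Prometheus text-format input.
# metric_value is a hypothetical helper; it reads the metrics from stdin.
#   $1: exact metric name
metric_value() {
    # "# HELP" and "# TYPE" comment lines have "#" as their first field,
    # so only the sample line itself matches.
    awk -v name="$1" '$1 == name { print $2 }'
}

# Usage against a live exporter (address from this guide's example):
#   curl -s http://10.0.10.2:9100/metrics | metric_value node_load1
```

An empty result means either the exporter is unreachable from where you ran the command, or the metric name doesn't exist on that node.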
Grafana query language basics and examples¶