Server monitoring: Grafana, Prometheus and node_exporter

Server monitoring is an important part of a sysadmin's job.

Thing is, setting up monitoring for a single server is quite easy, especially with tools like Monit, but a multi-server monitoring setup quickly becomes confusing, especially once you face the myriad of existing solutions.

This guide will walk you through setting up a solid and extensible monitoring solution, using Grafana as the dashboard (for good looks), Prometheus as the central metrics collector/aggregator, and node_exporter as the per-node metrics endpoint.

Table of contents

  • Foreword
  • Architecture concepts
  • Collector and dashboard server
  • Monitored server endpoint
  • Grafana query language basics and examples

Foreword

This document is split in three parts:

  • Collector and dashboard server
  • Monitored server endpoint
  • Grafana query language basics and examples

For shell examples, every shell block will start with # <server>, where <server> is one of these two choices:

  • The central collector (and display) server will be CENTRAL
  • Each monitored node will be NODE

Note that, if I move into a folder created during the guide, it will be noted by suffixing CENTRAL/NODE with "in ./".

For this doc, we'll have two servers acting as monitored nodes, and one server acting as the collector and display server. Note that our collector/display server will be one of the two nodes, as we want our server to monitor itself.

Basically:

  • 10.0.10.1 = CENTRAL/NODE = grafana, prometheus, node_exporter
  • 10.0.10.2 = NODE = node_exporter

Architecture concepts

A diagram is generally better at explaining how things work together.

(Diagram: Grafana sits in front of a Prometheus server, which scrapes several node_exporter instances.)

Note that you'll probably have a node_exporter server on your prometheus/grafana server, so it can monitor itself!

Collector and dashboard server

We'll start by setting up the collector/dashboard server, so we have a nice dashboard GUI ready for us.

Installation

The first step is to install Grafana.

In this guide, the central server and every monitored node are Debian-based.

Grafana's own installation guide should get you started.
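
If you want a quick reference, here is a rough sketch of a Debian install from Grafana's apt repository; the repository URL and key handling change over time, so double-check against the official docs before copy-pasting:

# CENTRAL
$ apt install -y apt-transport-https wget gnupg
$ wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor > /usr/share/keyrings/grafana.gpg
$ echo "deb [signed-by=/usr/share/keyrings/grafana.gpg] https://apt.grafana.com stable main" > /etc/apt/sources.list.d/grafana.list
$ apt update && apt install grafana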

Then, we'll install the prometheus data collector daemon. It's tasked with retrieving data from every configured endpoint (which will be explained later on).

Thankfully, we have everything in the base repositories.

# CENTRAL
$ apt install prometheus
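
To quickly check that the daemon came up (optional), ask systemd and hit Prometheus' built-in health endpoint; the endpoint below assumes a reasonably recent Prometheus:

# CENTRAL
$ systemctl status prometheus
# Should answer with a short "healthy" message:
$ curl -s http://localhost:9090/-/healthy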

We'll take care of the data extractor later on (in the next part), so let's skip it for now and instead go over our Prometheus configuration and Grafana dashboard, then connect the two together.

Configuration

Let's first take a look at Prometheus' configuration.

The file is, by default, /etc/prometheus/prometheus.yml.

Its content is the following (it may vary between package versions; this is what it looks like at the time of writing).

# Sample config for Prometheus.

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # By default, evaluate rules every 15 seconds.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
      monitor: 'example'

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    scrape_timeout: 5s

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  - job_name: node
    # If prometheus-node-exporter is installed, grab stats about the local
    # machine by default.
    static_configs:
      - targets: ['localhost:9100']

I assume you have at least basic knowledge of the YAML format. Otherwise, you should probably read up on it first.
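
If you only need the bare minimum to read this file, the two constructs that matter here are mappings (key: value pairs) and lists (lines starting with -), which can be nested; a tiny example mirroring the shape of the file above:

# A mapping (key/value pairs)
global:
  scrape_interval: 15s

# A list of mappings; each "-" starts a new item
scrape_configs:
  - job_name: 'first'
  - job_name: 'second'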

We don't care about the global block, so let's skip it. For a basic setup, the interesting part is the scrape_configs block.

It's structured as a list of objects; the usual keys are:

  • job_name (how you want to label this particular stats data set you're getting)
  • static_configs (the configuration for this job)
    • targets, an array of host:port combinations, one entry per monitored node's endpoint

Following the IPs we gave in the foreword, let's change our configuration a bit.

For scrape_configs, we'll have two jobs (we'll kick out the existing node job).

The first job is for our monitoring central, and so will be named central.

- job_name: central
  static_configs:
    - targets: ['10.0.10.1:9100']

Note that, here, localhost:9100 (as predefined in the node job) would work perfectly fine, but for our example's sake, we'll use the server's local IP address.

Our second job will be for our monitored server; we'll name it monitored.

- job_name: monitored
  static_configs:
    - targets: ['10.0.10.2:9100']

Now that we've written those blocks, our configuration file (stripped of its comments, for simplicity's sake) looks like this.

global:
  scrape_interval:     15s
  evaluation_interval: 15s
  external_labels:
      monitor: 'example'

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    scrape_timeout: 5s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: central
    static_configs:
      - targets: ['10.0.10.1:9100']

  - job_name: monitored
    static_configs:
      - targets: ['10.0.10.2:9100']

We can save this file, and restart the prometheus service.

# CENTRAL
$ systemctl restart prometheus
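
Optionally, you can validate the file before (or after) restarting with promtool, which should ship alongside the prometheus package (older versions spell the subcommand check-config instead of check config):

# CENTRAL
$ promtool check config /etc/prometheus/prometheus.yml

You can also open http://10.0.10.1:9090/targets in a browser to see the state and last scrape of every configured target.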

Now, let's start Grafana, if it isn't running already.

# CENTRAL
$ systemctl start grafana-server

It's a web server that, by default, listens on port 3000.

Open your Grafana instance by pointing your browser at your server's IP suffixed by the port (here, http://10.0.10.1:3000). Default credentials are admin / admin.

On the left navigation bar, hover "Configuration" and select Data sources.

In the list of available data source types, select Prometheus, and use the default URL suggested by the placeholder (http://localhost:9090).

Name it as you wish, as it really doesn't matter.

Once it's done, we'll have the ability to create graphs from our extracted data.

Thing is, we still haven't configured any extractor. Let's do that right now.

Monitored server endpoint

This can be followed as many times as you want, once per server you wish to monitor.

Now, we'll set up node_exporter to export data from our servers.

Its installation is pretty straightforward, but its configuration can be painful.

Let's first install the Debian package prometheus-node-exporter. On the CENTRAL server, it already came in as a dependency when you installed prometheus, so you won't have to install it there.

# NODE
$ apt install prometheus-node-exporter

This daemon doesn't have a configuration file, but it accepts command-line arguments that configure it at startup.

This is technically done by passing arguments to the node_exporter executable, but since it's managed by systemd, we need a way to do so without editing the packaged unit files.

Luckily, the file located at /etc/default/prometheus-node-exporter defines an environment variable containing those parameters, which systemd hands to the executable.
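
For reference, the packaged unit roughly works as follows; this is a simplified sketch, not the literal file your distribution ships:

# Approximate shape of the prometheus-node-exporter systemd unit
[Service]
EnvironmentFile=/etc/default/prometheus-node-exporter
ExecStart=/usr/bin/prometheus-node-exporter $ARGS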

We only want to change one thing: the listening address of the node_exporter HTTP server (by default, it opens port 9100 on every network interface; we'll restrict it to the node's address on our private 10.0.10.0 network).

Open /etc/default/prometheus-node-exporter, and append to the ARGS environment variable the following string.

-web.listen-address=10.0.10.1:9100

Make sure to set the IP to the actual address of the server you're configuring (10.0.10.2 for our second node). Also note that recent node_exporter versions expect a double-dash flag (--web.listen-address=...); check node_exporter --help on your version.
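
For example, on the 10.0.10.2 node, the relevant line of /etc/default/prometheus-node-exporter could end up looking like this (the rest of the file's contents depend on your package version):

# NODE, in /etc/default/prometheus-node-exporter
ARGS="-web.listen-address=10.0.10.2:9100"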

That's all! If Prometheus can reach this node, it will automatically start collecting data once we restart node_exporter, which we'll do by running the following.

# NODE
$ systemctl restart prometheus-node-exporter
$ systemctl enable prometheus-node-exporter
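
To verify the exporter answers on the expected address, you can hit its /metrics endpoint from the node itself (or from the CENTRAL server), then check Prometheus' targets page:

# NODE
$ curl -s http://10.0.10.2:9100/metrics | head

On the CENTRAL side, http://10.0.10.1:9090/targets should now list both jobs as UP after the next scrape.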

Grafana query language basics and examples

TODO