How to monitor your Kubernetes cluster with Prometheus and Grafana (The whole, long, story)

Get complete node monitoring on your Kubernetes cluster in 5 minutes.


Monitoring a cluster is absolutely vital.

Prometheus and Grafana make it extremely easy to monitor just about any metric in your Kubernetes cluster.

In this article I will show how to add monitoring for all the nodes in your cluster.


TL;DR Here is the short version

A web ui showing graphs of memory, cpu and network usage of a collection of Kubernetes nodes
By the end of this tutorial, you'll have a dashboard that looks like this.

You need a few things.

  1. An existing Kubernetes Cluster.
  2. kubectl binary locally installed

Getting a Kubernetes Cluster

There are a multitude of ways to get a Kubernetes cluster set up, but I find the easiest is to use a DigitalOcean managed cluster. They already have all the networking and storage configured, and all you have to do is create the cluster and download your kubeconfig.

You can sign up for Kubernetes using this link.
The above is a referral link with $50 of free usage :)

You can also spin up clusters using tools like minikube, microk8s, or even using kubeadm to create your own cluster.

For this tutorial you might need slightly beefier nodes, so select 2 of the $40, 8GB, 4vCPU machines. You'll only be running these for a little while, so don't worry too much about cost; you'll end up using less than $2 of your free $50.

Installing kubectl

Checkout the up-to-date Kubernetes docs for installing kubectl

Test your Kubernetes cluster

Your Kubernetes cluster needs to be running. We won't do a deep test here; we will just confirm it can run pods.

$ kubectl get pods -n kube-system

You should see something along the lines of:

NAME                                   READY   STATUS    RESTARTS   AGE
cilium-operator-884664456-ss4rz        1/1     Running   78         30h
cilium-vnm5l                           1/1     Running   0          30h
cilium-zgsfw                           1/1     Running   0          30h
coredns-5d668bd598-7qhjm               1/1     Running   0          30h
coredns-5d668bd598-fzk8f               1/1     Running   0          30h
csi-do-node-jvs67                      2/2     Running   0          30h
csi-do-node-zss6b                      2/2     Running   0          30h
kube-proxy-compassionate-easley-ucnd   1/1     Running   0          30h
kube-proxy-compassionate-easley-ucnv   1/1     Running   0          30h
tiller-deploy-dbb85cb99-st8lt          1/1     Running   0          29h

Install Helm

The package manager for Kubernetes

Helm makes it extremely easy to use up-to-date versions of Prometheus and Grafana, and it makes deploying and deleting them a lot easier too.

If you don't entirely trust it (I didn't at first), you can use it to generate the yaml configs, inspect them, and apply them yourself.
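For example, here's a sketch of that inspect-first workflow using Helm's template command (assuming Helm 2 and the stable/prometheus chart we install later in this article):

```shell
# Download the chart locally so we can read it, then render it to plain
# Kubernetes yaml without going through Tiller or installing anything.
helm fetch stable/prometheus --untar
helm template ./prometheus \
    --namespace monitoring \
    --name prometheus > prometheus-rendered.yaml

# Inspect prometheus-rendered.yaml, then apply it yourself if you prefer.
kubectl apply -n monitoring -f prometheus-rendered.yaml
```

The downside is that you give up Helm's release tracking, so upgrades and deletes become manual.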

Install the Helm client

Installing the Helm client is fairly simple. I will once again link to the docs so the instructions always stay up to date:

Install Tiller (Helm server)

Installing Tiller is a bit more in-depth as you need to secure it in production clusters. For the purposes of keeping it simple and playing around, we will install it with normal cluster-admin roles.

If you need to secure it for a production cluster:

Create the Tiller Service Account

We will use yaml files so we can keep all of our setup in version control to be re-used later.

Create a folder called helm. Here we will create all Kubernetes resources for tiller. Create a file called helm/service-account.yml and add the following content:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system

Then apply and test that the service account exists in the cluster.

$ kubectl apply -f helm/service-account.yml
$ kubectl get serviceaccounts -n kube-system
NAME                   SECRETS   AGE
tiller                 1         30h

Create the service account role binding

For demo purposes we will create a role binding to cluster-admin.


See here for more information:

Create a file called helm/role-binding.yml in the helm folder with the following content:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system

Apply and test that the role binding exists on the cluster

$ kubectl apply -f helm/role-binding.yml
$ kubectl get clusterrolebindings
NAME                                                   AGE
tiller                                                 30h

Finally! We can deploy Tiller!

$ helm init --service-account tiller --wait

The --wait flag makes sure that tiller is finished before we apply the next few commands to start deploying Prometheus and Grafana.

Test that Tiller is deployed and running:

$ kubectl get pods -n kube-system
NAME                                   READY   STATUS   AGE
tiller-deploy-dbb85cb99-st8lt          1/1     Running  30h

Done! Tiller is deployed and now the real fun starts!
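If you want an extra sanity check that the Helm client can actually reach Tiller, helm version reports both sides:

```shell
# Prints the client version and, if Tiller is reachable, the server version.
# An error or timeout here means Tiller isn't running or isn't authorised.
helm version
```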


Time to monitor

The first thing we want to monitor is how the nodes under our cluster are performing.

Helm makes this extremely easy by shipping the node-exporter together with the Prometheus chart.

Create a monitoring directory.

We will separate our monitoring resources into a separate namespace to keep them together. This also helps if we want to set resource quotas to all monitoring services later.

First, create a folder for our monitoring solution.

$ mkdir monitoring

We need a namespace to keep all of our resources in.

We also want to keep this in version control in case we need to recreate our resources easily.

Create a file called namespace.yml in monitoring and add the following contents.

kind: Namespace
apiVersion: v1
metadata:
  name: monitoring

This will create a namespace in the cluster once applied.

Apply and test that the namespace exists.

$ kubectl apply -f monitoring/namespace.yml
$ kubectl get namespaces
NAME          STATUS   AGE
default       Active   30h
kube-public   Active   30h
kube-system   Active   30h
monitoring    Active   105m

Deploy Prometheus

Here is where the power of Helm steps in and makes life much easier.

First we need to update our local helm chart repo.

$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈

Next, deploy Prometheus

$ helm install stable/prometheus \
    --namespace monitoring \
    --name prometheus

This will deploy Prometheus into your cluster in the monitoring namespace and mark the release with the name prometheus.

If you want to delete Prometheus from your cluster later,
you can run: helm del --purge prometheus

Prometheus is now scraping the cluster together with the node-exporter and collecting metrics from the nodes, and even more information from Kubernetes. We will see these soon in Grafana.
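If you'd like to peek at Prometheus before Grafana is up, you can port-forward its service (a sketch; the service name matches the URL we point Grafana at below, and the stable/prometheus chart exposes the server service on port 80):

```shell
# Forward local port 9090 to the prometheus-server service in the cluster,
# then open http://localhost:9090/targets to see what is being scraped.
kubectl port-forward --namespace monitoring svc/prometheus-server 9090:80
```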

Deploy Grafana

When deploying grafana, we need to configure it to read metrics from the right data sources.

There are two ways of achieving this.

  1. Deploy Grafana & add the data source afterwards through the UI.
  2. Add the data source as yaml configs & deploy Grafana. Grafana will use these to automatically configure the data sources when it is provisioned. We are going to take this path, as we want everything to be replicable without too much manual intervention.

Defining the Grafana data sources

Grafana takes data sources through yaml configs when it starts up. For more information see here:

Kubernetes has nothing to do with importing the data. Kubernetes merely orchestrates the injection of these yaml files.

We will create these files before we deploy Grafana to ensure they are automatically added.

When the Grafana Helm chart gets deployed, it will search for any config maps that contain a grafana_datasource label. So we will add one in our config.

Create a Prometheus data source config map to inject into Grafana

In the monitoring folder, create a sub-folder called grafana.
Here is where we will store our configs for the Grafana deployment.

We need to create a config map for the prometheus data source.

Create a file called config.yml in monitoring/grafana/ and add the contents:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-grafana-datasource
  namespace: monitoring
  labels:
    grafana_datasource: '1'
data:
  datasource.yaml: |-
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      orgId: 1
      url: http://prometheus-server.monitoring.svc.cluster.local

Let's go over that briefly.

We are creating a config map that holds a yaml file, which we will inject into Grafana in a moment.

Here is where we add the grafana_datasource label, which tells the Grafana provisioner that this is a data source it should inject.

    grafana_datasource: '1'

Here is where we add the attribute called datasource.yaml. This is the name of the file the Grafana provisioner will inject. The |- tells YAML to treat the following indented lines as a single multi-line string.

  datasource.yaml: |-

The following yaml definition is the actual definition grafana will read.

    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      access: proxy
      orgId: 1
      url: http://prometheus-server.monitoring.svc.cluster.local

name: Prometheus is the name of the data source as it will be saved in Grafana and used later when defining dashboards.
type: prometheus tells Grafana to interpret the data in the Prometheus format. There are a few formats which Grafana can read.
access: proxy tells Grafana to route queries through the Grafana backend instead of having the browser query Prometheus directly.
orgId: 1 is the Grafana organisation the data source belongs to.
url: … is the url at which Grafana can reach the Prometheus server.

prometheus-server.monitoring.svc.cluster.local is a host which points at a Kubernetes service. Kubernetes DNS will ensure you always reach the Prometheus service even if the pod moves and changes IP.
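You can check that this name resolves from inside the cluster with a throwaway pod (a sketch using the public busybox image; the pod is removed when the command exits):

```shell
# Spin up a temporary busybox pod and resolve the Prometheus service
# through the cluster's DNS. --rm deletes the pod afterwards.
kubectl run dns-test --rm -it --restart=Never --image=busybox -- \
    nslookup prometheus-server.monitoring.svc.cluster.local
```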

Apply & test the config

$ kubectl apply -f monitoring/grafana/config.yml
$ kubectl get configmaps -n monitoring
NAME                            DATA   AGE
grafana                         1      131m
prometheus-alertmanager         1      131m
prometheus-grafana-datasource   1      138m
prometheus-server               3      131m
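Since the Grafana sidecar discovers data sources by label, it's also worth checking that a label query actually matches our config map:

```shell
# Only config maps carrying the grafana_datasource label should be listed.
kubectl get configmaps --namespace monitoring -l grafana_datasource=1
```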

Override Grafana value

When Grafana gets deployed with the chart's default values, the data source sidecar is disabled. We need to enable it so that it searches for our config maps.

When using a predefined Helm chart, we can override its startup values by specifying the overrides in a values.yml file and passing it in when we run helm install. This overrides only the values we have specified and leaves the rest of the defaults intact. To see all the possible values for the Grafana chart, check out:
Also read through the chart's values.yml file, as it explains in very clear detail when and how to use the values.

We need to create our own values.yml file to override the data source search value, so that when Grafana is deployed it will find our datasource.yaml definition and inject it.

Create a file called values.yml in the monitoring/grafana/ folder with the following contents:

sidecar:
  image: xuxinkun/k8s-sidecar:0.0.7
  imagePullPolicy: IfNotPresent
  datasources:
    enabled: true
    label: grafana_datasource

This will inject a sidecar which will load all the data sources into Grafana when it gets provisioned.

Now we can deploy Grafana with the overridden values.yml file and our datasource will be imported.

$ helm install stable/grafana \
    -f monitoring/grafana/values.yml \
    --namespace monitoring \
    --name grafana

Check that it is running:

$ kubectl get pods -n monitoring
NAME                                             READY   STATUS
grafana-5f4d8bcb94-ppsjq                         1/1     Running
nginx-deployment-854c944978-hgfgr                1/1     Running
prometheus-alertmanager-5c5958dcb7-bq2fw         2/2     Running
prometheus-kube-state-metrics-76d649cdf9-v5qg5   1/1     Running
prometheus-node-exporter-j74zq                   1/1     Running
prometheus-node-exporter-x5xnq                   1/1     Running
prometheus-pushgateway-6744d69d4-27dxb           1/1     Running
prometheus-server-669b987bcd-swcxh               2/2     Running

Get the Grafana Password

Grafana is deployed with a password. This is good news. But what's the password?

$ kubectl get secret \
    --namespace monitoring \
    grafana \
    -o jsonpath="{.data.admin-password}" \
    | base64 --decode ; echo

The output (e.g. djO3qRObroIY6JYyzvTJrHUeTuy4z26D2tbwTizP) is the password to your Grafana dashboard.
The username is admin
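The reason for the base64 --decode in the command above is that Kubernetes stores secret values base64-encoded (not encrypted). A quick local illustration of the round trip, using a made-up password:

```shell
# Encode a made-up password the way Kubernetes stores it in a secret...
encoded=$(printf 'hunter2' | base64)
echo "$encoded"    # aHVudGVyMg==

# ...and decode it back, as our kubectl command does.
printf '%s' "$encoded" | base64 --decode ; echo
```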

Port-forward the Grafana dashboard to see what's happening:

$ export POD_NAME=$(kubectl get pods --namespace monitoring -l "app=grafana,release=grafana" -o jsonpath="{.items[0].metadata.name}")
$ kubectl --namespace monitoring port-forward $POD_NAME 3000

Go to http://localhost:3000 in your browser. You should see the Grafana login screen:

Smashing login screen

Login with the username and password you have from the previous command.

Grafana dashboard after login. No metrics yet?

Add a dashboard

Grafana has a long list of prebuilt dashboards here:

Here you will find many, many dashboards to use. We will use this one, as it is quite comprehensive in everything it tracks.

In the left hand menu, choose Dashboards > Manage > + Import

In the dashboard input, add the ID of the dashboard we want to use, 1860, and click Load.

On the next screen, give your dashboard a name, select Prometheus as its datasource, and click Import.

You've got metrics!

The list of metrics is extensive. Go over them and see what is useful, copy their structures and panels, and create your own dashboards for the big screens in the office!

To make the process even easier, I have saved all of this in a GitHub repo for everyone to use!

Enjoy !