Monitoring Docker

Collecting metrics with cAdvisor

Creating alerts with Prometheus

Creating dashboards in Grafana

Why Monitoring and Metrics?

    1. Analyze long-term trends

    2. Comparing over time or across experiment groups

    3. Alerting

    4. Building dashboards

    5. Conducting ad hoc retrospective analysis (e.g., debugging)

Google SRE book

https://landing.google.com/sre/book/chapters/monitoring-distributed-systems.html

Why Metrics?

    1. Alert on unexpected behaviour

    2. A way to get information about hardware and software

    3. Predict system resource utilisation (e.g., a hard disk filling up)

    4. Check service utilisation (memory, CPU, etc.)

    5. Latency, traffic and errors (e.g., too many 500 or 404 responses)

IF YOU CAN MEASURE IT, YOU CAN IMPROVE IT

cAdvisor

cAdvisor provides information about the resource usage and performance characteristics of running containers.

It is a running daemon that collects, aggregates, processes and exports information about running containers.

Specifically, for each container it keeps resource isolation parameters, historical resource usage, histograms of complete historical resource usage and network statistics.

The data is exported per container and machine-wide.

The information collected by cAdvisor is very valuable, but its UI is not very helpful. When viewing the cAdvisor UI you cannot tell which machine's containers are being shown, and refreshing the page can show a different set of containers from another machine (in Swarm, the published port is load-balanced across the cAdvisor instances on all nodes). Because of this, we use cAdvisor only to collect metrics and rely on other tools for visualisation.

cAdvisor exposes Prometheus metrics (at http://<ip>:<port>/metrics), which Prometheus scrapes and we then visualise in Grafana.

Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud, and it has a large user community.

Features:

    • A multi-dimensional data model

    • A flexible query language to leverage this dimensionality

    • No reliance on distributed storage; single server nodes are autonomous

    • Time series collection happens by means of a pull model over HTTP

    • Pushing time series is supported by means of an intermediary gateway

    • Targets are discovered by means of service discovery or static configuration

    • Multiple modes of graphing and dashboard support
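To illustrate the pull model and static target configuration, a minimal prometheus.yml might look like the sketch below. The job names, the 15-second intervals and the hostname `cadvisor-host` are assumptions for illustration (8080 is cAdvisor's default port, 9090 is Prometheus's). Prometheus pulls each target's /metrics endpoint over HTTP on every scrape interval.

```yaml
# prometheus.yml - minimal sketch; hostnames and intervals are assumptions
global:
  scrape_interval: 15s      # how often every target is pulled
  evaluation_interval: 15s  # how often rules are evaluated

scrape_configs:
  # Prometheus scraping its own /metrics endpoint
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # A single cAdvisor instance; "cadvisor-host" is a hypothetical hostname
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor-host:8080']
```

Static targets are easy to read but must be edited by hand whenever a host changes; service discovery avoids that.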

Node Exporter

Node Exporter is an official exporter written by the Prometheus team; it collects hardware and OS metrics and exposes them to Prometheus.

https://github.com/prometheus/node_exporter

The issue with running Node Exporter in a container is that it does not report the host's hostname; it reports the container's instead, which makes it hard to match metrics to the right host. So we use

https://github.com/bvis/docker-node-exporter, a simple node-exporter that obtains the hostname of the host and exposes it as a value in the container.

This host value is then exported together with the host metrics.

There are a number of third-party exporters for Prometheus; you can find the list at https://prometheus.io/docs/instrumenting/exporters/

DEPLOYMENT

Start the exporter stack, which brings up cAdvisor and Node Exporter. These are deployed globally, i.e. one instance on every node of the swarm.

docker stack deploy -c exporters-stack.yml exporter
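The contents of exporters-stack.yml are not shown in these notes. A sketch of what such a file could contain is below, assuming compose file format 3.3, the public `google/cadvisor` and `prom/node-exporter` images (the setup above uses bvis's host-aware node-exporter instead, whose image name is not given here), and an overlay network named `monitoring` created beforehand with `docker network create --driver overlay --attachable monitoring`. The key line is `deploy: mode: global`, which makes Swarm run exactly one task of the service on every node.

```yaml
# exporters-stack.yml - sketch; images, service names and network are assumptions
version: "3.3"

services:
  cadvisor:
    image: google/cadvisor
    # host paths cAdvisor reads container statistics from (read-only)
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    networks:
      - monitoring
    deploy:
      mode: global          # one cAdvisor task on every swarm node

  node-exporter:
    # official image; swap in the host-aware bvis variant if you use it.
    # Host /proc and /sys mounts (and the matching exporter flags) are omitted here.
    image: prom/node-exporter
    networks:
      - monitoring
    deploy:
      mode: global          # one node-exporter task on every swarm node

networks:
  monitoring:
    external: true          # assumed to exist already as an attachable overlay network
```

Using an external network lets the Prometheus service in the separate monitoring stack join the same network and reach the exporters without publishing their ports.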

Add these lines to the prometheus service in the monitoring-stack.yml file:

command:
  - '-config.file=/etc/prometheus/prometheus.yml'
  - '-storage.local.path=/prometheus'
  - '-web.console.libraries=/usr/share/prometheus/console_libraries'
  - '-web.console.templates=/usr/share/prometheus/consoles'
ports:
  - 9090:9090

Change the ownership of the mounted data directory:

chown -R 1000:1000 /data

Deploy the monitoring stack:

docker stack deploy -c monitoring-stack.yml monitoring
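Because the exporters run as global services, their task IPs change as nodes join and leave, so hard-coded targets do not scale well. One option is Swarm's built-in `tasks.<service>` DNS name combined with Prometheus's DNS service discovery. The sketch below assumes Prometheus is attached to the same overlay network as the exporters and that the services are named `exporter_cadvisor` and `exporter_node-exporter` (stack name `exporter` from the deploy command above plus hypothetical service names), listening on the default ports 8080 and 9100.

```yaml
# prometheus.yml scrape section - sketch; service names and ports are assumptions
scrape_configs:
  # tasks.<service> returns one A record per running task, so every
  # global exporter instance becomes its own scrape target
  - job_name: 'cadvisor'
    dns_sd_configs:
      - names: ['tasks.exporter_cadvisor']
        type: 'A'
        port: 8080

  - job_name: 'node-exporter'
    dns_sd_configs:
      - names: ['tasks.exporter_node-exporter']
        type: 'A'
        port: 9100
```

Prometheus re-resolves these names periodically, so new nodes are picked up without editing the file.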

Prometheus Query

Prometheus provides a functional expression language (PromQL) that lets the user select and aggregate time series data in real time.

The result of an expression can be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems via the HTTP API.

https://prometheus.io/docs/prometheus/latest/querying/basics/

Example - to get container memory usage:

container_memory_usage_bytes{container_label_com_docker_swarm_service_name="export_cadvisor"}
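The query above selects raw series; aggregation operators such as `sum by (...)` can then combine them. The sketch below stores the per-service memory total as a recording rule. It assumes the Prometheus 2.x YAML rule-file format (Prometheus 1.x, which the single-dash flags in this setup correspond to, used a different rule syntax), and the group and rule names are hypothetical.

```yaml
# rules.yml - recording rule sketch (Prometheus 2.x format); names are hypothetical
groups:
  - name: container_memory
    rules:
      # Total memory usage of all containers, summed per swarm service
      - record: service:container_memory_usage_bytes:sum
        expr: sum by (container_label_com_docker_swarm_service_name) (container_memory_usage_bytes)
```

The same `sum by (...)` expression can also be typed directly into the expression browser to see the aggregated result before saving it as a rule.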

Prometheus Alertmanager

Alerting with Prometheus is separated into two parts:

    1. Alerting rules in Prometheus servers send alerts to an Alertmanager.

    2. The Alertmanager then manages these alerts, including silencing, inhibition, aggregation and sending out notifications via methods such as email, PagerDuty, Slack and HipChat. https://prometheus.io/docs/alerting/overview/

Alerting with Slack

Slack is a platform that connects teams with the apps, services and resources they need to get work done. https://slack.com/
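The first part, an alerting rule evaluated by Prometheus, could look like the sketch below. It assumes the Prometheus 2.x YAML rule format; the alert name, the 1e9-byte threshold, the 5-minute `for` duration and the `severity` label are illustrative choices, not values from this setup.

```yaml
# alert.rules.yml - sketch in Prometheus 2.x rule format; threshold is an assumption
groups:
  - name: containers
    rules:
      - alert: ContainerHighMemory
        # fire when any swarm container uses more than 1e9 bytes (~1 GB) of memory
        expr: container_memory_usage_bytes{container_label_com_docker_swarm_service_name!=""} > 1e9
        for: 5m               # condition must hold for 5 minutes before firing
        labels:
          severity: warning
        annotations:
          summary: "High memory usage in {{ $labels.container_label_com_docker_swarm_service_name }}"
```

For the rule to be evaluated, prometheus.yml must list the file under `rule_files`, and its `alerting.alertmanagers` section must point at the Alertmanager (default port 9093). The Alertmanager then handles the second part: grouping, silencing and routing the firing alert to a receiver.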

GRAFANA

    • Grafana is an open-source metric analytics and visualisation suite

    • It is most commonly used for visualising time series data for infrastructure and application analytics, but many use it in other domains, including industrial sensors, home automation, weather and process control

Docker Dashboard - https://grafana.com/dashboards/609

Prometheus Stats - https://grafana.com/dashboards/358

Node Exporter - https://grafana.com/dashboards/405

Node Exporter server - https://grafana.com/dashboards/704


SLACK


- First, create a channel.

- From Manage Apps > Custom Integrations, search for "webhook" (Incoming WebHooks), select the channel for the webhook and copy the webhook URL.
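The copied webhook URL goes into the Alertmanager configuration. A minimal sketch follows; the receiver name, the `#alerts` channel and the grouping and timing values are assumptions, and the `api_url` shown is a placeholder to be replaced with the real webhook URL.

```yaml
# alertmanager.yml - sketch; webhook URL and channel are placeholders
route:
  receiver: 'slack-notifications'   # default receiver for every alert
  group_by: ['alertname']           # bundle alerts that share the same name
  group_wait: 30s                   # wait before sending the first notification of a group
  repeat_interval: 3h               # re-send while the alert keeps firing

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'  # paste the copied webhook URL
        channel: '#alerts'          # hypothetical channel name
        send_resolved: true         # also notify when the alert clears
```

Keeping a single default receiver is the simplest routing tree; sub-routes matching on labels such as `severity` can be added later to send different alerts to different channels.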

---------------------

LinuxKit Project

    • A toolkit for building secure, portable and lean operating systems for containers

    • Everything is replaceable and customisable

    • Immutable infrastructure applied to building Linux distributions

    • Easy tooling, with easy iteration

    • Built with containers, for running containers

    • Designed for building and running clustered applications, including but not limited to container orchestration such as Docker or Kubernetes

Requirements

- Golang - https://golang.org/dl/

- Docker

Moby, to build images:

- go get -u github.com/moby/tool/cmd/moby

LinuxKit, a tool for pushing and running VM images:

- go get -u github.com/linuxkit/linuxkit/src/cmd/linuxkit

How to describe our OS image

    • Kernel specifies a kernel Docker image, containing a kernel and a filesystem tarball, e.g. containing modules

      • The example kernels are built from kernel/

    • Init is the base init process Docker image, which is unpacked as the base system, containing init, containerd, runc and a few tools

      • Built from pkg/init/

    • Onboot are the system containers, executed sequentially in order; they should terminate quickly when done

    • Services are the system services, which normally run for the whole time the system is up

    • Files are additional files to add to the image

    • Trust specifies which build components are to be cryptographically verified with Docker Content Trust prior to pulling
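Put together, an OS image description is a YAML file with one section per item above. The sketch below is illustrative only: every `<tag>` is a placeholder for a real image tag (current ones can be taken from the examples/ directory of the linuxkit repository), the dhcpcd and getty containers and the motd file are assumed choices, and the `trust` section follows the layout of early LinuxKit examples.

```yaml
# linuxkit.yml - sketch; every <tag> is a placeholder for a real image tag
kernel:
  image: linuxkit/kernel:<tag>
  cmdline: "console=tty0 console=ttyS0"
init:
  - linuxkit/init:<tag>
  - linuxkit/runc:<tag>
  - linuxkit/containerd:<tag>
onboot:
  # run sequentially at boot and expected to exit quickly
  - name: dhcpcd
    image: linuxkit/dhcpcd:<tag>
    command: ["/sbin/dhcpcd", "--nobackground", "-f", "/dhcpcd.conf", "-1"]
services:
  # long-running system service, up for as long as the system is
  - name: getty
    image: linuxkit/getty:<tag>
files:
  # extra file baked into the image
  - path: etc/motd
    contents: "Built with LinuxKit\n"
trust:
  # verify images from this organisation with Docker Content Trust
  org:
    - linuxkit
```

With the tools installed above, such a file was built with `moby build linuxkit.yml` and the result started with `linuxkit run`; later LinuxKit releases fold the build step into `linuxkit build`.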