Monitoring Docker
cAdvisor - collect metrics
Creating alerts with Prometheus
Create dashboards in Grafana
Why Monitoring and Metrics?
Analyze long-term trends
Comparing over time or between experiment groups
Alerting
Building Dashboards
Conducting ad hoc retrospective analysis (e.g. debugging)
Google SRE book
https://landing.google.com/sre/book/chapters/monitoring-distributed-systems.html
Why Metrics?
Alert when you find a different behaviour
A way to get information about hardware and software
Predict system resource utilisation (e.g. a disk filling up)
Check service resource utilisation (memory, CPU, etc.)
Track latency, traffic and errors (e.g. too many 500 or 404 responses)
IF YOU CAN MEASURE YOU CAN IMPROVE
cAdvisor
Provides information about the resource usage and performance characteristics of running containers.
It is a running daemon that collects, aggregates, processes, and exports information about running containers
Specifically, for each container it keeps resource isolation parameters, historical resource usage, histograms of complete historical resource usage and network statistics.
This data is exported by container and machine-wide
The information cAdvisor collects is very valuable, but its UI is not very helpful. When viewing the cAdvisor UI, you cannot tell which machine's containers are being shown, and refreshing the page can show a different set of containers from another machine. So we use cAdvisor only for collecting metrics and rely on other tools for visualisation.
cAdvisor exposes Prometheus metrics at http://<ip>:<port>/metrics, and we use these metrics in Grafana.
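To make the /metrics endpoint less abstract, here is a minimal sketch of reading the Prometheus text exposition format that such an endpoint returns. The sample text, its values and the helper name parse_metrics are assumptions made up for illustration, not real cAdvisor output, and the parser covers only simple "name{labels} value" lines.

```python
import re

# Made-up sample in the Prometheus text exposition format.
# Real cAdvisor output has many more metrics and labels.
SAMPLE = '''\
# HELP container_memory_usage_bytes Current memory usage in bytes.
# TYPE container_memory_usage_bytes gauge
container_memory_usage_bytes{name="web",image="nginx"} 1.5e+07
container_memory_usage_bytes{name="db",image="postgres"} 4.2e+07
container_cpu_usage_seconds_total{name="web"} 12.5
'''

# metric name, optional {label set}, then the sample value
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# one label pair: key="value" (value may contain escaped characters)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith('#'):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ''))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == '__main__':
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

In practice Prometheus does this scraping and parsing itself; the sketch only shows that each line is a metric name, a set of labels and a numeric value.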
Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud, and it has a large user community.
Features:
A multi-dimensional data model
A flexible query language to leverage this dimensionality
No reliance on distributed storage; single server nodes are autonomous
Time series collection happens by means of a pull model over HTTP
Pushing time series is supported by means of an intermediary gateway
Targets are discovered by means of service discovery or static configuration
Multiple modes of graphing and dashboard support
Node Exporter
Node Exporter is an official exporter written by the Prometheus team. It collects hardware and OS metrics and exposes them to Prometheus
https://github.com/prometheus/node_exporter
The issue with Node Exporter is that it does not report the hostname; it reports the container name instead, which does not help to find the right metrics for a host. So we use
https://github.com/bvis/docker-node-exporter, which is a simple node-exporter that obtains the hostname of the host and exposes it as a value in the container.
The hostname value is then exported alongside the host machine's metrics
There are a number of third-party exporters for Prometheus; the list is at https://prometheus.io/docs/instrumenting/exporters/
DEPLOYMENT
Start the exporter stack, which brings up cAdvisor and Node Exporter. These are deployed globally on all the nodes of the swarm.
docker stack deploy -c exporters-stack.yml exporter
Before deploying, add these lines to the prometheus service in the monitoring-stack.yml file
command:
- '-config.file=/etc/prometheus/prometheus.yml'
- '-storage.local.path=/prometheus'
- '-web.console.libraries=/usr/share/prometheus/console_libraries'
- '-web.console.templates=/usr/share/prometheus/consoles'
ports:
- 9090:9090
Change ownership of the mounted data directory
chown -R 1000:1000 /data
Deploy the monitoring stack
docker stack deploy -c monitoring-stack.yml monitoring
Prometheus Query
Prometheus provides a functional expression language that lets the user select and aggregate time series data in real time
The result of an expression can either be shown as a graph, viewed as tabular data in Prometheus's expression browser, or consumed by external systems by means of the HTTP API
https://prometheus.io/docs/prometheus/latest/querying/basics/
Example - to get container memory usage
container_memory_usage_bytes{container_label_com_docker_swarm_service_name="export_cadvisor"}
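As a rough sketch of how an external system could consume such a query through the HTTP API, the snippet below builds the URL for an instant query against the /api/v1/query endpoint. The helper names build_selector and instant_query_url and the host localhost are assumptions for illustration; port 9090 is the one published in the stack above. Label values are not escaped here, so this only suits simple values.

```python
from urllib.parse import urlencode


def build_selector(metric, labels):
    """Build a selector such as metric{key="value"} from a dict of labels."""
    parts = ','.join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f'{metric}{{{parts}}}' if parts else metric


def instant_query_url(base_url, expr):
    """Return the URL of an instant query for the given expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


if __name__ == '__main__':
    expr = build_selector(
        'container_memory_usage_bytes',
        {'container_label_com_docker_swarm_service_name': 'export_cadvisor'},
    )
    # localhost is an assumed host; replace it with a swarm node address.
    print(instant_query_url('http://localhost:9090', expr))
```

Fetching the printed URL (for example with urllib.request.urlopen) should return a JSON document with the matching series, provided the Prometheus server is reachable at that address.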
Prometheus Alert Manager:
Alerting with Prometheus is separated into two parts
Alerting rules in Prometheus servers send alerts to an Alertmanager
The Alertmanager then manages these alerts, including silencing, inhibition, aggregation and sending out notifications by means of methods such as email, PagerDuty, Slack and HipChat. https://prometheus.io/docs/alerting/overview/
Alerting with Slack
Slack is a platform that connects teams with the apps, services and resources they need to get work done. https://slack.com/
GRAFANA
Grafana is an open source metric analytics and visualisation suite
It is most commonly used for visualising time series data for infrastructure and application analytics but many use it in other domains including industrial sensors, home automation, weather and process control
Docker Dashboard - https://grafana.com/dashboards/609
Prometheus Stats - https://grafana.com/dashboards/358
Node Exporter - https://grafana.com/dashboards/405
Node Exporter server - https://grafana.com/dashboards/704
SLACK
- First, create a channel
- From Manage Apps > Custom Integrations, search for "webhook", select the channel for the webhook and copy the webhook URL
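Before wiring the webhook URL into the alerting setup, it can help to check that it works. The sketch below prepares a JSON POST request with a "text" field, which is the basic payload an incoming webhook accepts. The function name build_slack_request and the placeholder URL are assumptions for illustration; use the URL copied in the step above.

```python
import json
from urllib.request import Request


def build_slack_request(webhook_url, text):
    """Prepare (but do not send) a POST request carrying a plain text message."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == '__main__':
    # Placeholder URL, not a real webhook.
    request = build_slack_request(
        "https://hooks.slack.com/services/REPLACE/WITH/YOURS",
        "Test alert from the monitoring stack",
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the prepared request to urllib.request.urlopen would actually post the message to the channel; it is left out here so the sketch runs without network access.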
---------------------
LinuxKit Project
A toolkit for building secure, portable and lean operating systems for containers
Everything is replaceable and customisable
Immutable infrastructure applied to building Linux distributions
Easy tooling, with easy iteration
Built with containers, for running containers
Designed for building and running clustered applications, including but not limited to container orchestration such as Docker or Kubernetes
Requirements
- Golang - https://golang.org/dl/
- Docker
Moby - a tool to build images
- go get -u github.com/moby/tool/cmd/moby
LinuxKit - a tool for pushing and running VM images
- go get -u github.com/linuxkit/linuxkit/src/cmd/linuxkit
How to Describe our OS image
Kernel specifies a kernel Docker image, containing a kernel and a filesystem tarball, e.g. containing modules
The example kernels are built from kernel/
Init is the base init process Docker image, which is unpacked as the base system, containing init, containerd, runc and a few tools
Built from pkg/init/
Onboot are the system containers, executed sequentially in order; they should terminate quickly when done
Services are the system services, which normally run for the whole time the system is up
Files are additional files to add to the image
Trust specifies which build components are to be cryptographically verified with Docker content trust prior to pulling