essh@kubernetes-master:~$ docker run -d --net=host --name prometheus prom/prometheus
09416fc74bf8b54a35609a1954236e686f8f6dfc598f7e05fa12234f287070ab
essh@kubernetes-master:~$ docker ps -f name=prometheus
CONTAINER ID IMAGE NAMES
09416fc74bf8 prom/prometheus prometheus
UI with graphs for displaying metrics:
essh@kubernetes-master:~$ firefox localhost:9090
Let's take the go_gc_duration_seconds{quantile="0"} metric from the list:
essh@kubernetes-master:~$ curl localhost:9090/metrics 2>/dev/null | head -n 4
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.0097e-05
go_gc_duration_seconds{quantile="0.25"} 1.7841e-05
Go to the UI at localhost:9090 and select Graph in the menu. Let's add a chart to the dashboard: pick the metric from the "insert metric at cursor" list. Here we see the same metrics as in the localhost:9090/metrics list, but aggregated by parameters, for example just go_gc_duration_seconds. We select the go_gc_duration_seconds metric and display it with the Execute button. In the Console tab of the dashboard we see the metrics:
go_gc_duration_seconds{instance="localhost:9090", job="prometheus", quantile="0"} 0.000009186
go_gc_duration_seconds{instance="localhost:9090", job="prometheus", quantile="0.25"} 0.000012056
go_gc_duration_seconds{instance="localhost:9090", job="prometheus", quantile="0.5"} 0.000023256
go_gc_duration_seconds{instance="localhost:9090", job="prometheus", quantile="0.75"} 0.000068848
go_gc_duration_seconds{instance="localhost:9090", job="prometheus", quantile="1"} 0.00021869
and, by going to the Graph tab, their graphical representation.
Now Prometheus collects metrics from the current node: go_*, net_*, process_*, prometheus_*, promhttp_*, scrape_* and up. To collect metrics from Docker as well, we tell the Docker daemon to expose its metrics for Prometheus on port 9323:
essh@kubernetes-master:~$ curl http://localhost:9323/metrics 2>/dev/null | head -n 20
# HELP builder_builds_failed_total Number of failed image builds
# TYPE builder_builds_failed_total counter
builder_builds_failed_total{reason="build_canceled"} 0
builder_builds_failed_total{reason="build_target_not_reachable_error"} 0
builder_builds_failed_total{reason="command_not_supported_error"} 0
builder_builds_failed_total{reason="dockerfile_empty_error"} 0
builder_builds_failed_total{reason="dockerfile_syntax_error"} 0
builder_builds_failed_total{reason="error_processing_commands_error"} 0
builder_builds_failed_total{reason="missing_onbuild_arguments_error"} 0
builder_builds_failed_total{reason="unknown_instruction_error"} 0
# HELP builder_builds_triggered_total Number of triggered image builds
# TYPE builder_builds_triggered_total counter
builder_builds_triggered_total 0
# HELP engine_daemon_container_actions_seconds The number of seconds it takes to process each container action
# TYPE engine_daemon_container_actions_seconds histogram
engine_daemon_container_actions_seconds_bucket{action="changes", le="0.005"} 1
engine_daemon_container_actions_seconds_bucket{action="changes", le="0.01"} 1
engine_daemon_container_actions_seconds_bucket{action="changes", le="0.025"} 1
engine_daemon_container_actions_seconds_bucket{action="changes", le="0.05"} 1
engine_daemon_container_actions_seconds_bucket{action="changes", le="0.1"} 1
For the Docker daemon to apply these parameters, it must be restarted, which will stop all containers; when the daemon starts again, the containers will be brought back up according to their restart policy:
essh@kubernetes-master:~$ sudo chmod a+w /etc/docker/daemon.json
essh@kubernetes-master:~$ echo '{"metrics-addr": "127.0.0.1:9323", "experimental": true}' | jq -M -f /dev/null > /etc/docker/daemon.json
essh@kubernetes-master:~$ cat /etc/docker/daemon.json
{
"metrics-addr": "127.0.0.1:9323",
"experimental": true
}
essh@kubernetes-master:~$ systemctl restart docker
So far Prometheus only collects metrics from the local server, albeit from different sources. In order to collect metrics from different nodes and see an aggregated result, we need to put an agent collecting metrics on each node:
essh@kubernetes-master:~$ docker run -d \
-v "/proc:/host/proc" \
-v "/sys:/host/sys" \
-v "/:/rootfs" \
--net="host" \
--name=explorer \
quay.io/prometheus/node-exporter:v0.13.0 \
-collector.procfs /host/proc \
-collector.sysfs /host/sys \
-collector.filesystem.ignored-mount-points "^/(sys|proc|dev|host|etc)($|/)"
1faf800c878447e6110f26aa3c61718f5e7276f93023ab4ed5bc1e782bf39d56
and register it to be scraped at the node's address; for now everything is local, localhost:9100. Now let's tell Prometheus to scrape the agent and Docker:
essh@kubernetes-master:~$ mkdir prometheus && cd $_
essh@kubernetes-master:~/prometheus$ cat << EOF > ./prometheus.yml
global:
  scrape_interval: 1s
  evaluation_interval: 1s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['127.0.0.1:9090', '127.0.0.1:9100', '127.0.0.1:9323']
        labels:
          group: 'prometheus'
EOF
essh@kubernetes-master:~/prometheus$ docker rm -f prometheus
prometheus
essh@kubernetes-master:~/prometheus$ docker run \
-d \
--net=host \
--restart always \
--name prometheus \
-v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus
7dd991397d43597ded6be388f73583386dab3d527f5278b7e16403e7ea633eef
essh@kubernetes-master:~/prometheus$ docker ps \
-f name=prometheus
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7dd991397d43 prom/prometheus "/bin/prometheus --c…" 53 seconds ago Up 53 seconds prometheus
1702 host metrics are now available:
essh@kubernetes-master:~/prometheus$ curl http://localhost:9100/metrics | grep -v '#' | wc -l
1702
Out of all this variety, it is difficult to find the ones needed for everyday tasks, for example the amount of memory in use, node_memory_Active. There are metric aggregators for this:
http://localhost:9090/consoles/node.html
http://localhost:9090/consoles/node-cpu.html
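For example, a couple of PromQL queries that could be typed on the Graph tab (a hedged sketch: the exact metric names depend on the node-exporter version; node_memory_Active and node_cpu are the names exposed by the v0.13 exporter used here):
node_memory_Active – bytes of memory in active use;
100 - avg by (instance) (irate(node_cpu{mode="idle"}[5m])) * 100 – approximate CPU utilization per instance, in percent.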
But it is better to use Grafana. Let's install it as well and look at an example:
essh@kubernetes-master:~/prometheus$ docker run \
-d \
--name=grafana \
--net=host \
grafana/grafana
Unable to find image 'grafana/grafana:latest' locally
latest: Pulling from grafana/grafana
9d48c3bd43c5: Already exists
df58635243b1: Pull complete
09b2e1de003c: Pull complete
f21b6d64aaf0: Pull complete
719d3f6b4656: Pull complete
d18fca935678: Pull complete
7c7f1ccbce63: Pull complete
Digest: sha256: a10521576058f40427306fcb5be48138c77ea7c55ede24327381211e653f478a
Status: Downloaded newer image for grafana/grafana:latest
6f9ca05c7efb2f5cd8437ddcb4c708515707dbed12eaa417c2dca111d7cb17dc
essh@kubernetes-master:~/prometheus$ firefox localhost:3000
We log in with the login admin and the password admin, after which we are prompted to change the password, and then perform the initial configuration. First, we are prompted to add a data source: select Prometheus, enter localhost:9090, set the access mode to Browser rather than Server (that is, requests go over the network from the browser), and select basic authentication – that's all – click Save & Test and Prometheus is connected.
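The same data source can also be provisioned from a file instead of through the UI – a minimal sketch, assuming Grafana's standard provisioning directory (the file name prometheus.yaml here is just an example):
# /etc/grafana/provisioning/datasources/prometheus.yaml (illustrative path)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: direct   # the "Browser" access mode selected in the UI above
    url: http://localhost:9090
    isDefault: true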
Obviously, the admin login and password should not be handed out to everyone. To avoid this, you will need to create users or integrate Grafana with an external user database such as Microsoft Active Directory.
On the Dashboards tab I select and activate all three preconfigured dashboards. From the New Dashboard list in the top menu, I select the Prometheus 2.0 Stats dashboard. But there is no data yet:
I click on the "+" menu item and select "Dashboard"; I am offered to create a dashboard. A dashboard can contain several widgets, for example charts, which can be positioned and customized, so I click the add-chart button and select its type. On the chart itself I choose a size and click Edit; the most important thing here is choosing the metric to display, with Prometheus selected as the data source.
A complete assembly is also available as a ready-made Docker Compose file:
essh@kubernetes-master:~/prometheus$ wget \
https://raw.githubusercontent.com/grafana/grafana/master/devenv/docker/ha_test/docker-compose.yaml
--2019-10-30 07:29:52-- https://raw.githubusercontent.com/grafana/grafana/master/devenv/docker/ha_test/docker-compose.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com) … 151.101.112.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.112.133|:443 … connected.
HTTP request sent, awaiting response … 200 OK
Length: 2996 (2.9K) [text/plain]
Saving to: 'docker-compose.yaml'
docker-compose.yaml 100%[=========>] 2.93K --.-KB/s in 0s
2019-10-30 07:29:52 (23.4 MB/s) – 'docker-compose.yaml' saved [2996/2996]
Obtaining application metrics
Up to this point, we have looked at the case where Prometheus polls a standard metrics exporter and gets standard metrics. Now let's try to create an application and expose our own metrics. First, let's take a NodeJS server and write an application for it. To do this, let's create a NodeJS project:
vagrant@ubuntu:~$ mkdir nodejs && cd $_
vagrant@ubuntu:~/nodejs$ npm init
This utility will walk you through creating a package.json file.
It only covers the most common items, and tries to guess sensible defaults.
See `npm help json` for definitive documentation on these fields
and exactly what they do.
Use `npm install <pkg> --save` afterwards to install a package and
save it as a dependency in the package.json file.
name: (nodejs)
version: (1.0.0)
description:
entry point: (index.js)
test command:
git repository:
keywords:
author: ESSch
license: (ISC)
About to write to /home/vagrant/nodejs/package.json:
{
"name": "nodejs",
"version": "1.0.0",
"description": "",
"main": "index.js",
"scripts": {
"test": "echo \" Error: no test specified \ "&& exit 1"
},
"author": "ESSch",
"license": "ISC"
}
Is this ok? (yes) yes
First, let's create a web server. I'll use the Express library to create it:
vagrant@ubuntu:~/nodejs$ npm install Express --save
npm WARN deprecated Express@3.0.1: Package unsupported. Please use the express package (all lowercase) instead.
nodejs@1.0.0 /home/vagrant/nodejs
└── Express@3.0.1
npm WARN nodejs@1.0.0 No description
npm WARN nodejs@1.0.0 No repository field.
vagrant@ubuntu:~/nodejs$ cat << EOF > index.js
const express = require('express');
const app = express();
app.get('/healt', function (req, res) {
    res.send({status: "Healt"});
});
app.listen(9999, () => {
    console.log({status: "start"});
});
EOF
vagrant@ubuntu:~/nodejs$ node index.js &
[1] 18963
vagrant@ubuntu:~/nodejs$ {status: 'start'}
vagrant@ubuntu:~/nodejs$ curl localhost:9999/healt
{"status": "Healt"}
Our server is ready to work with Prometheus. We need to configure Prometheus for it.
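As a hedged sketch of the next step, the application could expose its own metrics with the prom-client library (not installed above; it would need npm install prom-client --save). The /metrics endpoint and the healt_requests_total counter below are illustrative names, not part of the original example:
const express = require('express');
const client = require('prom-client');        // assumed: npm install prom-client --save
const app = express();
const register = client.register;
client.collectDefaultMetrics();                // standard Node.js process metrics (memory, GC, event loop)
const healtCounter = new client.Counter({      // a custom counter, name is illustrative
    name: 'healt_requests_total',
    help: 'Total number of requests to /healt',
});
app.get('/healt', function (req, res) {
    healtCounter.inc();                        // count every request to /healt
    res.send({status: "Healt"});
});
app.get('/metrics', async (req, res) => {      // the endpoint Prometheus would scrape
    res.set('Content-Type', register.contentType);
    res.end(await register.metrics());
});
app.listen(9999, () => {
    console.log({status: "start"});
});
After that, 127.0.0.1:9999 would be added to the targets in prometheus.yml so that Prometheus scrapes the application alongside the node-exporter and Docker.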
The Prometheus scaling problem arises when the data does not fit on one server – more precisely, when one server cannot keep up with writing the data, or when processing the data on a single server no longer gives acceptable performance. Thanos solves this problem without requiring federation setup by providing the user with an interface and API, which it relays to the Prometheus instances. The user gets a web interface similar to Prometheus's. Thanos itself interacts with agents that are installed on the instances as sidecars, much as Istio does. Both Thanos and the agents are available as containers and as a Helm chart. For example, a sidecar agent can be brought up as a container pointed at Prometheus, and Prometheus is given its config and then restarted.
docker run --rm quay.io/thanos/thanos:v0.7.0 --help
docker run -d --net=host --rm \
-v $(pwd)/prometheus0_eu1.yml:/etc/prometheus/prometheus.yml \
--name prometheus-0-sidecar-eu1 \
-u root \
quay.io/thanos/thanos:v0.7.0 \
sidecar \
--http-address 0.0.0.0:19090 \
--grpc-address 0.0.0.0:19190 \
--reloader.config-file /etc/prometheus/prometheus.yml \
--prometheus.url http://127.0.0.1:9090
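To get the single web interface described above, a Thanos Query component could then be started and pointed at the sidecar's gRPC address – a hedged sketch, with port 19192 chosen arbitrarily and the store address matching the sidecar flags above:
docker run -d --net=host --rm \
--name thanos-query \
quay.io/thanos/thanos:v0.7.0 \
query \
--http-address 0.0.0.0:19192 \
--store 127.0.0.1:19190
The Thanos UI would then be available on localhost:19192, and more --store flags can be added as more sidecars appear.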
Notifications are an important part of monitoring. A notification consists of a firing trigger and a provider. A trigger is written in PromQL, as a rule with a condition, in Prometheus. When a trigger fires (the metric condition is met), Prometheus signals the provider to send a notification. The standard provider is Alertmanager, which can send messages to various receivers such as email and Slack.
For example, the up metric, which takes the values 0 or 1, can be used to send a message if a server has been down for more than 1 minute. For this, a rule is written:
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
When the metric equals 0 for more than 1 minute, this trigger fires and Prometheus sends a request to Alertmanager. Alertmanager determines what to do with the event. We can specify that when the InstanceDown event is received, a message should be sent by email. To do this, configure Alertmanager accordingly:
global:
  smtp_smarthost: 'localhost:25'
  smtp_from: 'youraddress@example.org'
route:
  receiver: example-email
receivers:
  - name: example-email
    email_configs:
      - to: 'youraddress@example.org'
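For Prometheus to actually load the rule and know where Alertmanager listens, the rule file and the Alertmanager address have to be referenced from prometheus.yml – a hedged sketch, assuming the rule above was saved as alert.rules.yml and Alertmanager runs on its default port 9093:
rule_files:
  - 'alert.rules.yml'
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['127.0.0.1:9093']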
Alertmanager itself will use the mail transfer agent installed on this machine, so one needs to be installed. Let's take the Simple Mail Transfer Protocol (SMTP), for example. To test it, let's install a console mail server alongside Alertmanager – sendmail.
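A hedged sketch of bringing this up, assuming the configuration above is saved as alertmanager.yml in the current directory (the package name applies to Ubuntu/Debian, and the config path is passed explicitly):
sudo apt-get install -y sendmail
docker run -d --net=host --name alertmanager \
-v $(pwd)/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
prom/alertmanager \
--config.file=/etc/alertmanager/alertmanager.yml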
Fast and clear analysis of system logs
The open-source full-text search engine Lucene is used for quick searching in logs. Two lower-level products are built on top of it: Solr and Elasticsearch, which are quite similar in capabilities but differ in usability and license. Many popular assemblies are built on them, for example simply a delivery set with Elasticsearch: ELK (Elasticsearch (Apache Lucene), Logstash, Kibana), EFK (Elasticsearch, Fluentd, Kibana), and products such as GrayLog2. Both GrayLog2 and the ELK/EFK stacks are actively used because they require little configuration on test benches; for example, EFK can be deployed in a Kubernetes cluster with practically one command:
helm install efk-stack stable/elastic-stack --set logstash.enabled=false --set fluentd.enabled=true --set fluentd-elastics
An alternative that has not yet received as much attention are systems built on the previously considered Prometheus, for example PLG (Promtail (agent) – Loki (a Prometheus-like log store) – Grafana).
Comparison of Elasticsearch and Solr (the systems are comparable):
Elastic:
** Commercial with open source and the ability to commit (via approval);
** Supports more complex queries, more analytics, out-of-the-box support for distributed queries, a more complete RESTful JSON-based API, chaining, machine learning, SQL (paid);
*** Full-text search;
*** Real-time index;
*** Monitoring (paid);
*** Monitoring via Elastic FQ;
*** Machine learning (paid);
*** Simple indexing;
*** More data types and structures;
** Lucene engine;
** Parent-child (JOIN);
** Natively scalable;
** Documentation from 2010;
Solr:
** OpenSource;
** High speed with JOIN;
*** Full-text search;
*** Real-time index;
*** Monitoring in the admin panel;
*** Machine learning through modules;
*** Input data: Word, PDF and others;
*** Requires a schema for indexing;
*** Data: nested objects;
** Lucene engine;
** JSON join;
** Scalable: SolrCloud (configuration) && ZooKeeper (configuration);
** Documentation since 2004.
At present, microservice architecture is used more and more; thanks to the loose coupling between its components and their simplicity, it simplifies their development, testing, and debugging. But as a whole the system becomes harder to analyze because of its distribution. To analyze the state of the system as a whole, logs are used, collected in a centralized place and converted into an understandable form. There is also a need to analyze other data, for example NGINX's access_log to collect metrics about traffic, or the mail server's mail log to detect attempts to guess passwords, and so on. Let's take ELK as an example of such a solution. ELK means