cpu_usage: 2
cpu_usage {app: myapp}: 2
Prometheus is a mature product: development began in 2012, and in 2016 it joined the CNCF (Cloud Native Computing Foundation). Prometheus consists of:
* TSDB (time series database), which works more like a storage queue for metrics with a set retention period (a week, for example) and can ingest hundreds of thousands of metric samples per second. The database is local to each Prometheus instance and does not scale horizontally by itself; with Prometheus, scaling is achieved by running several instances and sharding the targets between them. Prometheus also supports data aggregation, which helps reduce the amount of stored data, and persists the database from memory to disk.
* Service discovery with out-of-the-box Kubernetes support: Prometheus queries the Kubernetes API and polls the pods, filtered according to the configuration, on TCP port 9121.
* Grafana (a separate product, usually installed alongside Prometheus) – a universal UI with dashboards and charts that queries Prometheus via PromQL.
To expose metrics, you can use ready-made solutions or develop your own. For the vast majority of system metrics an exporter already exists, while for application metrics you often have to expose your own. Exporters can be general-purpose or specialized. For example, node_exporter provides most node metrics, including some for processes, but these are few, and there are more specialized exporters for that. If you run Prometheus without any exporters, it will still report almost a thousand metrics, but these are metrics of Prometheus itself, and none of them has the node_* prefix. For those metrics to appear, you need to run node_exporter and add its URL to the Prometheus configuration so that the metrics it provides are collected. For node_exporter, the address can be localhost or the node address, and the port is 9100 by default. Usually, exporters specialize in the metrics of a particular product, for example:
** node_exporter – node metrics (CPU, memory, network);
** snmp_exporter – SNMP protocol metrics;
** mysqld_exporter – MySQL database metrics;
** consul_exporter – Consul metrics;
** graphite_exporter – Graphite metrics;
** memcached_exporter – Memcached metrics;
** haproxy_exporter – HAProxy load balancer metrics;
** cAdvisor – container metrics (monitoring of containers run by the Docker daemon);
** process-exporter – detailed process metrics;
** metrics-server – CPU, memory, file descriptors, disks;
** kube-state-metrics – deployments, pods, nodes.
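As a rough illustration of what developing your own exporter involves, the sketch below renders samples in the Prometheus text exposition format, the plain-text format served at /metrics. The function name render_metric and its parameters are hypothetical, chosen for this example only; serving the text over HTTP is left out, and a real exporter would normally rely on an official client library rather than hand-written formatting.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline inside a label value."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def render_metric(name, value, labels=None, help_text=None, metric_type="gauge"):
    """Render one sample as text in the Prometheus exposition format.

    Hypothetical helper for illustration; not part of any library.
    """
    lines = []
    if help_text:
        lines.append(f"# HELP {name} {help_text}")
    lines.append(f"# TYPE {name} {metric_type}")
    if labels:
        # Labels are sorted only to make the output deterministic.
        pairs = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{pairs}}} {value}")
    else:
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metric("cpu_usage", 2, {"app": "myapp"}), end="")
```

Prometheus would then scrape this text from the exporter's URL, in the same way it scrapes node_exporter.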
Prometheus supports remote write (https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write), for example to Cortex from Weaveworks, a distributed TSDB storage for Prometheus. It is enabled by a setting in the configuration and makes it possible to analyze data from several Prometheus instances:
remote_write:
  - url: "http://localhost:9000/receive"
Let's look at how it works on a ready-made instance. I'll take www.katacoda.com/courses/istio/deploy-istio-on-kubernetes for this and go through it. Our Prometheus listens on its standard port, 9090:
controlplane $ kubectl -n istio-system get svc prometheus
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus ClusterIP 10.99.70.170
To open its UI, I'll go to the web tab and change the port in the address from 80 to 9090: https://2886795314-9090-ollie08.environments.katacoda.com/graph. In the input field, enter the desired metric in PromQL (Prometheus Query Language); InfluxDB similarly has InfluxQL, and TimescaleDB has SQL. For example, I will type "cpu", and the UI will show a list of metrics containing it. There are two tabs under the input field: one with a graph and one with a table. I will be looking at the table view. I selected machine_cpu_cores and clicked Execute. Common metrics usually have similar names, for example machine_cpu_cores and node_cpu_cores. A metric consists of a name, labels in curly braces, and a value; metrics are queried in this form and displayed in the table in the same form.
machine_cpu_cores{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux"}
machine_cpu_cores{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", instance="node01", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="node01", kubernetes_io_os="linux"}
If you search for "memory", you can select machine_memory_bytes – the amount of RAM on the machine (physical server or virtual machine):
machine_memory_bytes{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux"}
machine_memory_bytes{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", instance="node01", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="node01", kubernetes_io_os="linux"}
A value in bytes is hard to read, so we will use PromQL to convert it to gigabytes: machine_memory_bytes / 1000 / 1000 / 1000
{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux"}
{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", instance="node01", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="node01", kubernetes_io_os="linux"}
Searching for "memory_bytes" also finds container_memory_usage_bytes, the memory in use. The list contains every container with its current memory consumption; I will show only three:
container_memory_usage_bytes{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="POD", container_name="POD", id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod0e619e5dc53ed9efcef63f5fe1d7ee71.slice/docker-b6549e892baa8687e4e98a106024b5c31a4af077d7c5544af03a3c72ec8997e0.scope", image="k8s.gcr.io/pause:3.1", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux", name="k8s_POD_etcd-controlplane_kube-system_0e619e5dc53ed9efcef63f5fe1d7ee71_0", namespace="kube-system", pod="etcd-controlplane", pod_name="etcd-controlplane"} 45056
container_memory_usage_bytes{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="POD", container_name="POD", id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod5a815a40_f2de_11ea_88d2_0242ac110032.slice/docker-76711789af076c8f2331d8212dad4c044d263c5cc3fa333347921bd6de7950a4.scope", image="k8s.gcr.io/pause:3.1", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux", name="k8s_POD_kube-proxy-nhzhn_kube-system_5a815a40-f2de-11ea-88d2-0242ac110032_0", namespace="kube-system", pod="kube-proxy-nhzhn", pod_name="kube-proxy-nhzhn"}
container_memory_usage_bytes{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="POD", container_name="POD", id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod6473aeea_f2de_11ea_88d2_0242ac110032.slice/docker-24ef0e898e1bb7dec9854b67291171aa9c5715d7683f53bdfc2cef49a19744fe.scope", image="k8s.gcr.io/pause:3.1", instance="node01", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="node01", kubernetes_io_os="linux", name="k8s_POD_kube-proxy-6v49x_kube-system_6473aeea-f2de-11ea-88d2-0242ac110032_0", namespace="kube-system", pod="kube-proxy-6v49x", pod_name="kube-proxy-6v49x"}
Let's filter by one of the labels the metric carries to pick out a single container: container_memory_usage_bytes{container_name="prometheus"}
container_memory_usage_bytes{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="prometheus", container_name="prometheus", id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podeaf4e833_f2de_11ea_88d2_0242ac110032.slice/docker-b314fb5c4ce8894f872f05bdd524b4b7d6ce5415aeb3fb91d6048441c47584a6.scope", image="sha256:b82ef1f3aa072922c657dd2b2c6b59ec0ac88e69c447998291066e1f67e741d8", instance="node01", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="node01", kubernetes_io_os="linux", name="k8s_prometheus_prometheus-5b77b7d695-knf44_istio-system_eaf4e833-f2de-11ea-88d2-0242ac110032_0", namespace="istio-system", pod="prometheus-5b77b7d695-knf44", pod_name="prometheus-5b77b7d695-knf44"}
283443200
Let's convert it to megabytes: container_memory_usage_bytes{container_name="prometheus"} / 1000 / 1000
{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="prometheus", container_name="prometheus", id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podeaf4e833_f2de_11ea_88d2_0242ac110032.slice/docker-b314fb5c4ce8894f872f05bdd524b4b7d6ce5415aeb3fb91d6048441c47584a6.scope", image="sha256:b82ef1f3aa072922c657dd2b2c6b59ec0ac88e69c447998291066e1f67e741d8", instance="node01", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="node01", kubernetes_io_os="linux", name="k8s_prometheus_prometheus-5b77b7d695-knf44_istio-system_eaf4e833-f2de-11ea-88d2-0242ac110032_0", namespace="istio-system", pod="prometheus-5b77b7d695-knf44", pod_name="prometheus-5b77b7d695-knf44"}
286.18752
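The same conversion can be checked outside PromQL. The sketch below divides a byte count by 1000 twice, as in the query above, which yields decimal megabytes (MB); dividing by 1024 instead yields mebibytes (MiB), which some tools report. The function names are hypothetical, and the input is the 283443200 bytes shown earlier for the prometheus container.

```python
def bytes_to_mb(n_bytes):
    """Decimal megabytes: the same arithmetic as `/ 1000 / 1000` in PromQL."""
    return n_bytes / 1000 / 1000


def bytes_to_mib(n_bytes):
    """Binary mebibytes, for comparison with tools that use powers of two."""
    return n_bytes / 1024 / 1024


if __name__ == "__main__":
    usage = 283443200  # bytes, taken from the table output above
    print(round(bytes_to_mb(usage), 4), "MB")
    print(bytes_to_mib(usage), "MiB")
```

The two results differ by about 5%, so it is worth stating which unit a dashboard uses.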
Now filter by node as well: container_memory_usage_bytes{container_name="prometheus", instance="node01"}
{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="prometheus", container_name="prometheus", id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podeaf4e833_f2de_11ea_88d2_0242ac110032.slice/docker-b314fb5c4ce8894f872f05bdd524b4b7d6ce5415aeb3fb91d6048441c47584a6.scope", image="sha256:b82ef1f3aa072922c657dd2b2c6b59ec0ac88e69c447998291066e1f67e741d8", instance="node01", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="node01", kubernetes_io_os="linux", name="k8s_prometheus_prometheus-5b77b7d695-knf44_istio-system_eaf4e833-f2de-11ea-88d2-0242ac110032_0", namespace="istio-system", pod="prometheus-5b77b7d695-knf44", pod_name="prometheus-5b77b7d695-knf44"}
289.890304
On the second node there is no such container: container_memory_usage_bytes{container_name="prometheus", instance="node02"}
no data
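To make the filtering behaviour concrete, here is a minimal sketch of label matching over an in-memory list of samples. It is not how Prometheus is implemented; the select function, its parameters and the shortened SAMPLES data are assumptions made for the example. It mirrors two PromQL matchers, = and !=, and treats a missing label as an empty string, as PromQL does.

```python
# Each sample is a (labels, value) pair; the label sets are shortened
# versions of the ones shown in the tables above.
SAMPLES = [
    ({"container_name": "prometheus", "instance": "node01"}, 283443200),
    ({"container_name": "POD", "instance": "controlplane"}, 45056),
]


def select(samples, equal=None, not_equal=None):
    """Return the samples whose labels satisfy every matcher.

    equal     -- label -> value pairs that must match (PromQL `=`)
    not_equal -- label -> value pairs that must not match (PromQL `!=`)
    """
    equal = equal or {}
    not_equal = not_equal or {}
    result = []
    for labels, value in samples:
        if any(labels.get(name, "") != want for name, want in equal.items()):
            continue
        if any(labels.get(name, "") == bad for name, bad in not_equal.items()):
            continue
        result.append((labels, value))
    return result


if __name__ == "__main__":
    print(select(SAMPLES, equal={"container_name": "prometheus"}))
    # An empty list corresponds to "no data" in the UI.
    print(select(SAMPLES, equal={"container_name": "prometheus",
                                 "instance": "node02"}))
```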
There are also aggregation functions: sum(container_memory_usage_bytes) / 1000 / 1000 / 1000
{} 22.812798976
max(container_memory_usage_bytes) / 1000 / 1000 / 1000
{} 3.6422983679999996
min(container_memory_usage_bytes) / 1000 / 1000 / 1000
{} 0
{} 0
You can also group by a label, for example instance: max(container_memory_usage_bytes) by (instance) / 1000 / 1000 / 1000
{instance="controlplane"} 1.641836544
{instance="node01"} 3.6622745599999997
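The idea behind aggregation with and without grouping can be sketched in a few lines: samples are put into buckets keyed by the values of the grouping labels, and the aggregation function is applied to each bucket. The aggregate function and the sample values below are hypothetical and only illustrate the principle; with no grouping labels, everything lands in a single bucket, which corresponds to the empty {} label set in the output above.

```python
from collections import defaultdict


def aggregate(samples, func, by=()):
    """Group (labels, value) samples by the `by` labels and apply func.

    Returns a dict that maps a tuple of (label, value) pairs to the result.
    """
    groups = defaultdict(list)
    for labels, value in samples:
        key = tuple((name, labels.get(name, "")) for name in by)
        groups[key].append(value)
    return {key: func(values) for key, values in groups.items()}


if __name__ == "__main__":
    # Made-up values, not taken from the cluster above.
    samples = [
        ({"instance": "controlplane", "pod": "a"}, 10),
        ({"instance": "controlplane", "pod": "b"}, 30),
        ({"instance": "node01", "pod": "c"}, 20),
    ]
    print(aggregate(samples, sum))                     # one group: {}
    print(aggregate(samples, max, by=("instance",)))   # one group per instance
```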
You can do arithmetic on metrics that have matching label sets and filter the result: container_memory_mapped_file / container_memory_usage_bytes * 100 > 80
{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="POD", container_name="POD", id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode45f10af1ae684722cbd74cb11807900.slice/docker-5cb2f2083fbc467b8b394b27b69686d309f951450bcb910d509572aea9922806.scope", image="k8s.gcr.io/pause:3.1", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux", name="k8s_POD_kube-controller-manager-controlplane_kube-system_e45f10af1ae684722cbd74cb11807900_0", namespace="kube-system", pod="kube-controller-manager-controlplane", pod_name="kube-controller-manager-controlplane"}
80.52631578947368
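What happens in such an expression can be sketched as follows: each sample on the left is paired with the sample on the right that has exactly the same labels, the ratio is computed, and only results above the threshold are kept. The ratio_above function is a hypothetical illustration with made-up numbers. It is also a simplification: it skips zero denominators, whereas PromQL would return +Inf or NaN for them.

```python
def ratio_above(numerator, denominator, threshold):
    """Percentage numerator/denominator for samples with identical labels,
    keeping only results strictly greater than threshold."""
    by_labels = {frozenset(labels.items()): value
                 for labels, value in denominator}
    result = []
    for labels, value in numerator:
        other = by_labels.get(frozenset(labels.items()))
        if other is None or other == 0:
            # No partner with the same labels, or division by zero:
            # skipped here for simplicity.
            continue
        percent = value / other * 100
        if percent > threshold:
            result.append((labels, percent))
    return result


if __name__ == "__main__":
    mapped = [({"pod": "a"}, 7), ({"pod": "b"}, 1), ({"pod": "x"}, 5)]
    usage = [({"pod": "a"}, 8), ({"pod": "b"}, 4)]
    print(ratio_above(mapped, usage, 80))
```

Only pod "a" passes the filter in this run; pod "x" is dropped because it has no matching sample on the right-hand side.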
You can look at file system metrics with container_fs_limit_bytes, which returns a long list; I will show a few entries:
container_fs_limit_bytes{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="POD", container_name="POD", device="/dev/vda1", id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod0e619e5dc53ed9efcef63f5fe1d7ee71.slice/docker-b6549e892baa8687e4e98a106024b5c31a4af077d7c5544af03a3c72ec8997e0.scope", image="k8s.gcr.io/pause:3.1", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux", name="k8s_POD_etcd-controlplane_kube-system_0e619e5dc53ed9efcef63f5fe1d7ee71_0", namespace="kube-system", pod="etcd-controlplane", pod_name="etcd-controlplane"}
253741748224
container_fs_limit_bytes{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", container="POD", container_name="POD", device="/dev/vda1", id="/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod5a815a40_f2de_11ea_88d2_0242ac110032.slice/docker-76711789af076c8f2331d8212dad4c044d263c5cc3fa333347921bd6de7950a4.scope", image="k8s.gcr.io/pause:3.1", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux", name="k8s_POD_kube-proxy-nhzhn_kube-system_5a815a40-f2de-11ea-88d2-0242ac110032_0", namespace="kube-system", pod="kube-proxy-nhzhn", pod_name="kube-proxy-nhzhn"}
253741748224
The list also includes RAM-backed file systems, identified by their device: container_fs_limit_bytes{device="tmpfs"} / 1000 / 1000 / 1000
{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", device="tmpfs", id="/", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_os="linux"} 0.209702912
{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", device="tmpfs", id="/", instance="node01", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="node01", kubernetes_io_os="linux"} 0.409296896
If we want the size of the smallest disk, we need to exclude the RAM device from the list: min(container_fs_limit_bytes{device!="tmpfs"} / 1000 / 1000 / 1000)
{} 253.74174822400002
In addition to metrics whose value is the current reading itself (gauges), there are counters. Their names usually end in "_total". If we plot one, we will see a steadily rising line. To get a useful value, we need the per-second growth over a period of time, which the rate function computes; the period is given in square brackets inside the parentheses, something like rate(name_metric_total[time]). The time is usually given in seconds or minutes. The suffix "s" denotes seconds, for example 40s or 60s, and "m" denotes minutes, for example 2m or 5m. It is important to note that the period cannot be shorter than the exporter polling interval: rate needs at least two samples inside the window, otherwise the metric will not be displayed.
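The reasoning behind rate can be illustrated with a simplified calculation over the raw counter samples that fall inside the window. This is a sketch under stated assumptions, not the Prometheus algorithm: the real function also extrapolates to the window boundaries, which is omitted here. The simple_rate name and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Average per-second increase of a counter.

    samples -- list of (timestamp_seconds, counter_value), ordered by time.
    Returns None when fewer than two samples are available, which is why
    a window shorter than the polling interval produces no result.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter was reset (for example, the process restarted),
            # so the growth since the reset is the current value.
            increase += current
    return increase / elapsed


if __name__ == "__main__":
    # Three scrapes 15 seconds apart; the counter grows by 30 each time.
    print(simple_rate([(0, 100), (15, 130), (30, 160)]))
    # A single scrape inside the window gives no result.
    print(simple_rate([(0, 100)]))
```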
You can see the names of the available metrics at the /metrics path:
controlplane $ curl https://2886795314-9090-ollie08.environments.katacoda.com/metrics 2>/dev/null | head
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.536e-05
go_gc_duration_seconds{quantile="0.25"} 7.5348e-05
go_gc_duration_seconds{quantile="0.5"} 0.000163193
go_gc_duration_seconds{quantile="0.75"} 0.001391603
go_gc_duration_seconds{quantile="1"} 0.246707852
go_gc_duration_seconds_sum 0.388611299
go_gc_duration_seconds_count 74
# HELP go_goroutines Number of goroutines that currently exist.
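The output above is plain text, so it is easy to process. As an illustration, the sketch below parses such lines into (name, labels, value) tuples. It is deliberately simplified: the parse_metrics function is hypothetical, comment lines (# HELP, # TYPE) are skipped, optional timestamps are ignored, and escape sequences inside label values are not decoded.

```python
import re

# name, optional {labels}, then the value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# one label="value" pair; the value may contain escaped characters
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse text exposition output into a list of (name, labels, value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    text = (
        "# TYPE go_gc_duration_seconds summary\n"
        'go_gc_duration_seconds{quantile="0.5"} 0.000163193\n'
        "go_gc_duration_seconds_count 74\n"
    )
    for sample in parse_metrics(text):
        print(sample)
```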
Setting up the Prometheus and Grafana stack
We have examined the metrics in an already configured Prometheus; now we will bring up Prometheus and configure it ourselves: