Metrics & Monitoring

Prometheus is an open-source time-series monitoring and alerting system for cloud-native environments, including Kubernetes, hosted by the Cloud Native Computing Foundation (CNCF) on GitHub.

The Prometheus Node Exporter exposes node-level metrics that Prometheus can collect and store as time-series data, recording each value with a timestamp.

What is the Prometheus Node Exporter?

The Prometheus Node Exporter is an open-source exporter, maintained under the CNCF-hosted Prometheus project on GitHub, that exposes node-level (hardware and operating-system) metrics for Prometheus to scrape. Prometheus collects and stores these metrics as time-series data, recording each value with a timestamp. It can also record labels, which are optional key-value pairs attached to each series.

The statistics detailed in the table below are used to monitor system performance, avoid slowdowns and outages, and troubleshoot node-level issues.
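
To make this concrete, the short Python sketch below (an illustration only, assuming a Node Exporter is already running on its default port, 9100, on the local machine) fetches the exposed node-level metrics and prints a few sample time series, each consisting of a metric name, optional labels, and a value:

    # Fetch the metrics exposed by a locally running Node Exporter
    # (default port 9100) and print a few sample series.
    from urllib.request import urlopen

    with urlopen("http://localhost:9100/metrics") as response:
        exposition = response.read().decode("utf-8")

    # Each non-comment line is one sample: a metric name, optional
    # {label="value"} pairs, and the current value.
    samples = [line for line in exposition.splitlines() if not line.startswith("#")]
    for line in samples[:10]:
        print(line)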

Components of Prometheus

  • Nodes: Servers or devices in a larger system that can send, receive, or forward information to other members of the system.
  • Collectors: Various types of collectors can be enabled to gather the statistics from nodes, containers, and pods.
  • Prometheus Server: The server is configured with a scrape configuration file that defines which targets (nodes, pods, and other endpoints) to collect metrics from, via which collectors, and how often. It stores the scraped data in its integrated local on-disk time-series database, aggregates results in real time using its own query language (PromQL), and triggers alerts based on defined conditions.
  • Node Exporter: The exporter exposes node-level (hardware and operating-system) metrics over an HTTP endpoint in the Prometheus text exposition format so that the server can scrape them; a minimal exposition sketch follows this list.
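
To make the pull model concrete, the sketch below uses the official Python client library (prometheus_client) to expose a single metric over HTTP; the port (8000) and metric name are illustrative, not prescribed. A Prometheus server configured to scrape this endpoint would then pull the current value on each scrape interval:

    # Minimal instrumentation sketch using the official Python client
    # (prometheus_client); the port and metric name are illustrative.
    import random
    import time

    from prometheus_client import Gauge, start_http_server

    # Gauge: a value that can go up or down, e.g. current queue depth.
    QUEUE_DEPTH = Gauge("app_queue_depth", "Number of items waiting in the queue")

    if __name__ == "__main__":
        # Expose /metrics on port 8000; the Prometheus server pulls from it
        # on the interval defined in its scrape configuration.
        start_http_server(8000)
        while True:
            QUEUE_DEPTH.set(random.randint(0, 50))  # simulate a changing value
            time.sleep(5)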

There are several third-party integrations that allow database, hardware, storage, web, messaging, network, monitoring, logging, and CI/CD metrics to be exported as Prometheus metrics. Examples of these integrations are listed below.

  • Hardware: Netgear, Windows, IBM Z, etc. 
  • Database: MySQL, MS SQL, CouchDB, MongoDB, Oracle, etc. 
  • Messaging: MQ, Kafka, MQTT, etc. 
  • Storage: NetApp, Hadoop, Pure Storage, Tivoli, etc. 
  • HTTP: Apache, Nginx, Squid, etc. 
  • APIs: Docker, Azure, AWS, GitHub, Google, etc. 
  • Issue trackers and Continuous Integration: Bitbucket, Confluence, Jenkins, JIRA, etc. 

Key Benefits of using Prometheus

  • Time Series Database (TSDB): Prometheus stores metrics in a time-series database that can track, monitor, and aggregate values over time. Once collected, these repeated measurements can be visualized to reveal patterns and anomalies.
  • PULL Retrieval: Prometheus actively scrapes targets to retrieve system and application metrics from endpoints via HTTP calls. 
  • Centralized Control: Prometheus is configured on the Prometheus system to determine which metrics to pull from which endpoints and how often to pull them. 
  • System Discovery: Prometheus can discover new endpoints dynamically and automatically begin collecting metrics. 
  • Alerting Ecosystem: Prometheus Alert Manager can push alerts designated by custom rules defined in configuration files to specified endpoints. 
  • Scalable: Each Prometheus server is autonomous, so data can be sent to various aggregation points as needed; metrics can be clustered or routed to individual servers based on the configuration of each endpoint.
  • Data Visualization: Prometheus allows data to be easily filtered and graphed using the four core metric types, illustrated in the sketch after this list:
      • Counter: A cumulative metric whose value can only increase or be reset to zero on restart. It can measure values such as the number of requests served, errors, or tasks completed.
      • Gauge: A metric that represents a single numerical value which can arbitrarily increase or decrease. It can measure values such as current memory usage, temperatures, the number of currently running processes, or the number of concurrent requests.
      • Histogram: A metric that samples observations and counts them in configurable buckets, for such things as request durations and response sizes. A histogram with a base metric name of <basename> exposes multiple time series during a scrape:
          • cumulative counters for the observation buckets, exposed as <basename>_bucket{le="<upper inclusive bound>"}
          • the total sum of all observed values, exposed as <basename>_sum
          • the count of events that have been observed, exposed as <basename>_count (identical to <basename>_bucket{le="+Inf"} above)
      • Summary: A metric similar to a histogram that also calculates configurable quantiles over a sliding time window. A summary with a base metric name of <basename> exposes multiple time series during a scrape:
          • streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events, exposed as <basename>{quantile="<φ>"}
          • the total sum of all observed values, exposed as <basename>_sum
          • the count of events that have been observed, exposed as <basename>_count
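
As a rough illustration of the four metric types, the sketch below declares one of each with the official Python client (prometheus_client); the metric names and bucket boundaries are made up for the example. Note that this particular client's Summary implementation exposes only the _sum and _count series, not quantiles:

    # Declaring the four core metric types with the Python client
    # (prometheus_client); names and buckets are illustrative.
    from prometheus_client import Counter, Gauge, Histogram, Summary

    # Counter: only ever increases (or resets to zero on process restart).
    REQUESTS = Counter("http_requests_total", "Total HTTP requests served")

    # Gauge: a value that can rise and fall arbitrarily.
    IN_PROGRESS = Gauge("http_requests_in_progress", "Requests currently being handled")

    # Histogram: observations counted into configurable buckets; a scrape
    # exposes ..._bucket{le="..."}, ..._sum, and ..._count series.
    LATENCY = Histogram(
        "http_request_duration_seconds",
        "Request latency in seconds",
        buckets=(0.05, 0.1, 0.25, 0.5, 1.0, 2.5),
    )

    # Summary: tracks a running sum and count of observations (this client
    # does not calculate quantiles, so only ..._sum and ..._count are exposed).
    PAYLOAD = Summary("http_request_size_bytes", "Size of request payloads in bytes")

    def handle_request(duration_seconds: float, size_bytes: int) -> None:
        """Record one request against each metric type."""
        IN_PROGRESS.inc()
        try:
            REQUESTS.inc()                     # counter += 1
            LATENCY.observe(duration_seconds)  # lands in the matching le bucket
            PAYLOAD.observe(size_bytes)        # adds to _sum and _count
        finally:
            IN_PROGRESS.dec()

    handle_request(0.3, 2048)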

Find out more in the official Prometheus documentation.