Welcome to Prometheus! Prometheus is a monitoring platform that collects metrics from monitored targets by scraping metrics HTTP endpoints on these targets. This guide will show you to how to install, configure and monitor our first resource with Prometheus. You'll download, install and run Prometheus. You'll also download and install an exporter, tools that expose time series data on hosts and services. Our first exporter will be the Node Exporter, which exposes host-level metrics like CPU, memory, and disk.
Download the latest release of Prometheus for your platform, then extract it:
tar xvfz prometheus-*.tar.gz cd prometheus-*
The Prometheus server is a single binary called
prometheus.exe on Microsoft Windows). We can run the binary and see help on its options by passing the
./prometheus --help usage: prometheus [<flags>] The Prometheus monitoring server . . .
Before starting Prometheus, let's configure it.
Prometheus configuration is YAML. The Prometheus download comes with a sample configuration in a file called
prometheus.yml that is a good place to get started.
We've stripped out most of the comments in the example file to make it more succinct (comments are the lines prefixed with a
global: scrape_interval: 15s evaluation_interval: 15s rule_files: # - "first.rules" # - "second.rules" scrape_configs: - job_name: 'prometheus' static_configs: - targets: ['localhost:9090']
There are three blocks of configuration in the example configuration file:
global block controls the Prometheus server's global configuration. We have three options present. The first,
scrape_interval, controls how often Prometheus will scrape targets. You can override this for individual targets. In this case the global setting is to scrape every 15 seconds. The
evaluation_interval option controls how often Prometheus will evaluate rules. Prometheus uses rules to create new time series and to generate alerts.
rule_files block specifies the location of any rules we want the Prometheus server to load. For now we've got no rules.
The last block,
scrape_configs, controls what resources Prometheus monitors. Since Prometheus also exposes data about itself as an HTTP endpoint it can scrape and monitor its own health. In the default configuration there is a single job, called
prometheus, which scrapes the time series data exposed by the Prometheus server. The job contains a single, statically configured, target, the
localhost on port
9090. Prometheus expects metrics to be available on targets on a path of
/metrics. So this default job is scraping via the URL: http://localhost:9090/metrics.
The time series data returned will detail the state and performance of the Prometheus server.
For a complete specification of configuration options, see the configuration documentation.
To start Prometheus with our newly created configuration file, change to the directory containing the Prometheus binary and run:
Prometheus should start up. You should also be able to browse to a status page about itself at http://localhost:9090. Give it about 30 seconds to collect data about itself from its own HTTP metrics endpoint.
You can also verify that Prometheus is serving metrics about itself by navigating to its own metrics endpoint: http://localhost:9090/metrics.
Let us try looking at some data that Prometheus has collected about itself. To use Prometheus's built-in expression browser, navigate to http://localhost:9090/graph and choose the "Console" view within the "Graph" tab.
As you can gather from http://localhost:9090/metrics, one metric that
Prometheus exports about itself is called
http_requests_total (the total number of HTTP requests the Prometheus server has made). Go ahead and enter this into the expression console:
This should return a number of different time series (along with the latest value recorded for each), all with the metric name
http_requests_total, but with different labels. These labels designate different types of requests.
If we were only interested in requests that resulted in HTTP code
200, we could use this query to retrieve that information:
To count the number of returned time series, you could write:
For more about the expression language, see the expression language documentation.
To graph expressions, navigate to http://localhost:9090/graph and use the "Graph" tab.
For example, enter the following expression to graph the per-second HTTP request rate happening in the self-scraped Prometheus:
You can experiment with the graph range parameters and other settings.
Collecting metrics from Prometheus alone is not a good representation of Prometheus' capabilities. So let's use the Node Exporter to monitor our first resource. We're going to monitor the local Linux host that the Prometheus server is running on but you could monitor any Linux or OS X host. There's also a WMI exporter for Microsoft Windows hosts too.
Download the latest release of the Node Exporter of Prometheus for your platform, then extract it:
tar xvfz node_exporter-*.tar.gz cd node_exporter-*
The Node Exporter is a single binary,
node_exporter, and has a configurable set of collectors for gathering various types of host-based metrics. By default, collectors gather CPU, memory, disk, and other metrics and expose them for scraping.
Let's start the Node Exporter now on our Linux host.
The Node Exporter's metrics are available on port
9100 on the host at the
/metrics path. In our case this is: http://localhost:9100/metrics.
You can browse to this URL to see the metrics being exposed.
We now need to tell Prometheus about our new exporter.
We will configure Prometheus to scrape this new target. To achieve this, add a new job definition to the
scrape_configs section in our
- job_name: node static_configs: - targets: ['localhost:9100']
Our new job is called
node. It scrapes a static target,
localhost on port
9100. You would replace this name with the name or IP address of the host you're monitoring.
Now we restart our Prometheus server to activate our new job.
Go to the expression browser and verify that Prometheus now has information
about the time series that this endpoint exposes. Navigate to
http://localhost:9090/graph and use the dropdown next to the "Execute" button to see a list of metrics this server is collecting. In the list you'll see a number of metrics prefixed with
node_, that have been collected by the Node Exporter by our
node job. For example, you can see the node's CPU usage via the
One useful metric to look for is the
up metric. The
up metric can be used to track the status of the target. If the metric has a value of
1 then the scrape of the target was successful, if
0 it failed. This can help give you an indication of the status of the target. You'll see two
up metrics, one for each target we're scraping: the Prometheus server and the Node Exporter.
Now you've been introduced to Prometheus, installed it, and configured it to monitor your first resources. We've also installed our first exporter and seen the basics of how to work with time series data scraped using the expression browser. You can find more documentation and guides to help you continue to learn more about Prometheus.