Comparison to alternatives

Prometheus vs. Graphite


Graphite focuses on being a passive time series database with a query language and graphing features. Any other concerns are addressed by external components.

Prometheus is a full monitoring and trending system that includes built-in and active scraping, storing, querying, graphing, and alerting based on time series data. It has knowledge about what the world should look like (which endpoints should exist, what time series patterns mean trouble, etc.), and actively tries to find faults.
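The idea of knowing which endpoints should exist and actively checking them can be sketched roughly as follows. This is only an illustration: the target addresses, `scrape_all`, and `fake_fetch` are hypothetical names, and the network request is replaced by a stand-in function.

```python
def scrape_all(expected_targets, fetch):
    """Try every expected target and record 1 (reachable) or 0 (failed)."""
    up = {}
    for target in expected_targets:
        try:
            fetch(target)
            up[target] = 1
        except OSError:
            up[target] = 0
    return up


def fake_fetch(target):
    # Stand-in for an HTTP request; one hypothetical target is unreachable.
    if target == "api-2:8080":
        raise ConnectionError("connection refused")
    return "metrics payload"


up = scrape_all(["api-1:8080", "api-2:8080"], fake_fetch)
# Targets that should exist but could not be reached are candidates for alerts.
missing = [target for target, value in up.items() if value == 0]
```

Because the list of expected targets is known in advance, a target that stops responding shows up as an explicit failure rather than as silently missing data.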

Data model

Graphite stores numeric samples for named time series, much like Prometheus does. However, Prometheus's metadata model is richer: while Graphite metric names consist of dot-separated components which implicitly encode dimensions, Prometheus encodes dimensions explicitly as key-value pairs (labels) attached to a metric name. This allows easy filtering, grouping, and matching by these labels via the query language.

Further, especially when Graphite is used in combination with StatsD, it is common to store only aggregated data over all monitored instances, rather than preserving the instance as a dimension and being able to drill down into individual problematic ones.

As an example, the number of HTTP requests to API servers with the response code 500 and the method POST to the /tracks endpoint would commonly be stored in Graphite/StatsD under one dot-separated metric name with a single aggregated value: -> 93

In Prometheus the same data could be encoded like this (assuming three api-server instances):

api_server_http_requests_total{method="POST",handler="/tracks",status="500",instance="<sample1>"} -> 34
api_server_http_requests_total{method="POST",handler="/tracks",status="500",instance="<sample2>"} -> 28
api_server_http_requests_total{method="POST",handler="/tracks",status="500",instance="<sample3>"} -> 31
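The practical difference between the two encodings can be sketched in a few lines of Python. This is a toy model, not how either system is implemented: the dotted name, the instance values `a`, `b`, and `c`, and the helper `sum_by` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical Graphite/StatsD-style series: dimensions are implied by position
# in the dotted name, and the value is already aggregated over all instances.
dotted = {"stats.api-server.tracks.post.500": 93}

# Prometheus-style series: explicit labels, one series per instance.
# The instance values are placeholders.
labeled = [
    ({"method": "POST", "handler": "/tracks", "status": "500", "instance": "a"}, 34),
    ({"method": "POST", "handler": "/tracks", "status": "500", "instance": "b"}, 28),
    ({"method": "POST", "handler": "/tracks", "status": "500", "instance": "c"}, 31),
]


def sum_by(series, *keys):
    """Group samples by the given label keys and sum their values."""
    totals = defaultdict(int)
    for labels, value in series:
        totals[tuple(labels[k] for k in keys)] += value
    return dict(totals)


# Aggregating away the instance label reproduces the single pre-aggregated value.
per_handler = sum_by(labeled, "handler", "status")
# Keeping the instance label allows drilling down to the worst instance.
per_instance = sum_by(labeled, "instance")
worst_instance = max(per_instance, key=per_instance.get)
```

With explicit labels, the aggregate can always be derived from the per-instance series, whereas the per-instance breakdown cannot be recovered from a value that was aggregated before storage.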


Graphite stores time series data on local disk in the Whisper format, an RRD-style database that expects samples to arrive at regular intervals. Every time series is stored in a separate file, and new samples overwrite old ones after a certain amount of time.

Prometheus also creates one local file per time series, but allows storing samples at arbitrary intervals as scrapes or rule evaluations occur. Since new samples are simply appended, old data may be kept arbitrarily long. Prometheus also works well for many short-lived, frequently changing sets of time series.
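The contrast between fixed-interval slots that get overwritten and an append-only series can be illustrated with a toy model. This sketch does not reproduce the actual Whisper or Prometheus on-disk formats; the class names, the step of 10, and the slot count of 3 are assumptions made for the example.

```python
class FixedSlotSeries:
    """Toy RRD-style series: one slot per interval, old slots are overwritten."""

    def __init__(self, step, slots):
        self.step = step
        self.slots = [None] * slots

    def write(self, timestamp, value):
        # The timestamp is rounded down to its interval; the slot index wraps around.
        index = (timestamp // self.step) % len(self.slots)
        self.slots[index] = (timestamp - timestamp % self.step, value)


class AppendOnlySeries:
    """Toy append-only series: every sample keeps its own timestamp."""

    def __init__(self):
        self.samples = []

    def write(self, timestamp, value):
        self.samples.append((timestamp, value))


fixed = FixedSlotSeries(step=10, slots=3)
appended = AppendOnlySeries()
for ts, value in [(0, 1.0), (10, 2.0), (17, 2.5), (20, 3.0), (30, 4.0)]:
    fixed.write(ts, value)
    appended.write(ts, value)
```

In the fixed-slot model, the irregular sample at timestamp 17 replaces the one stored for the interval starting at 10, and the sample at 30 overwrites the oldest slot. The append-only model keeps all five samples with their original timestamps.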


Prometheus offers a richer data model and query language, in addition to being easier to run and integrate into your environment. If you want a clustered solution that can hold historical data long term, Graphite may be a better choice.

Prometheus vs. InfluxDB

InfluxDB is an open-source time series database, with a commercial option for scaling and clustering. The InfluxDB project was released almost a year after Prometheus development began, so we were unable to consider it as an alternative at the time. Still, there are significant differences between Prometheus and InfluxDB, and both systems are geared towards slightly different use cases.


For a fair comparison, we must also consider Kapacitor together with InfluxDB, as in combination they address the same problem space as Prometheus and the Alertmanager.

The same scope differences as in the case of Graphite apply here for InfluxDB itself. In addition, InfluxDB offers continuous queries, which are equivalent to Prometheus recording rules.

Kapacitor’s scope is a combination of Prometheus recording rules, alerting rules, and the Alertmanager's notification functionality. Prometheus offers a more powerful query language for graphing and alerting. The Prometheus Alertmanager additionally offers grouping, deduplication and silencing functionality.
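What grouping, deduplication, and silencing mean for a stream of alerts can be sketched as below. This is a simplified illustration and not the Alertmanager's implementation: the function `process_alerts`, the alert names, and the exact-match silences are assumptions, and real silences would typically also involve time ranges and richer matchers.

```python
from collections import defaultdict


def process_alerts(alerts, group_by, silences):
    """Deduplicate, silence, and group alerts (each alert is a dict of labels)."""
    seen = set()
    groups = defaultdict(list)
    for labels in alerts:
        identity = frozenset(labels.items())
        if identity in seen:  # deduplication: identical label sets count once
            continue
        seen.add(identity)
        # Silencing: drop alerts whose labels match every pair of some silence.
        if any(all(labels.get(k) == v for k, v in s.items()) for s in silences):
            continue
        # Grouping: alerts sharing the group_by label values are bundled together.
        groups[tuple(labels.get(k) for k in group_by)].append(labels)
    return dict(groups)


alerts = [
    {"alertname": "HighErrorRate", "service": "api", "instance": "a"},
    {"alertname": "HighErrorRate", "service": "api", "instance": "a"},  # duplicate
    {"alertname": "HighErrorRate", "service": "api", "instance": "b"},
    {"alertname": "DiskFull", "service": "db", "instance": "c"},
]
silences = [{"service": "db"}]
groups = process_alerts(alerts, group_by=("alertname", "service"), silences=silences)
```

Here four incoming alerts collapse into a single group containing two distinct alerts, which would result in one notification instead of four.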

Data model / storage

Like Prometheus, the InfluxDB data model has key-value pairs as labels, which are called tags. In addition, InfluxDB has a second level of labels called fields, which are more limited in use. InfluxDB supports timestamps with up to nanosecond resolution, and float64, int64, bool, and string data types. Prometheus, by contrast, supports the float64 data type with limited support for strings, and millisecond resolution timestamps.

InfluxDB uses a variant of a log-structured merge tree for storage with a write ahead log, sharded by time. This is much more suitable to event logging than Prometheus's append-only file per time series approach.

The article "Logs and Metrics and Graphs, Oh My!" describes the difference between event logging and metrics recording.


Prometheus servers run independently of each other and only rely on their local storage for their core functionality: scraping, rule processing, and alerting. The open source version of InfluxDB is similar.

The commercial InfluxDB offering is by design a distributed storage cluster with storage and queries being handled by many nodes at once.

This means that the commercial InfluxDB will be easier to scale horizontally, but it also means that you have to manage the complexity of a distributed storage system from the beginning. Prometheus will be simpler to run, but at some point you will need to shard servers explicitly along scalability boundaries like products, services, datacenters, or similar aspects. Independent servers (which can be run redundantly in parallel) may also give you better reliability and failure isolation.
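Explicit sharding along a boundary such as the datacenter amounts to deciding, per target, which independent server is responsible for it. The sketch below assumes hypothetical target addresses and a hypothetical `shard_by` helper; it only shows the partitioning step, not any actual Prometheus configuration.

```python
from collections import defaultdict

# Hypothetical scrape targets, each tagged with the datacenter it runs in.
targets = [
    {"address": "api-1:9100", "datacenter": "eu"},
    {"address": "api-2:9100", "datacenter": "eu"},
    {"address": "api-3:9100", "datacenter": "us"},
]


def shard_by(targets, boundary):
    """Assign each target to the shard named after its boundary value."""
    shards = defaultdict(list)
    for target in targets:
        shards[target[boundary]].append(target["address"])
    return dict(shards)


# One independent server per datacenter would scrape only its own list.
shards = shard_by(targets, "datacenter")
```

Each resulting shard can be handled by its own server, so a failure in one datacenter's monitoring does not affect the others.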

Kapacitor currently has no built-in distributed/redundant options for rules, alerting or notifications. Prometheus and the Alertmanager by contrast offer a redundant option via running redundant replicas of Prometheus and using the Alertmanager's High Availability mode. In addition, Kapacitor can be scaled via manual sharding by the user, similar to Prometheus itself.


There are many similarities between the systems. Both have labels (called tags in InfluxDB) to efficiently support multi-dimensional metrics. Both use basically the same data compression algorithms. Both have extensive integrations, including with each other. Both have hooks allowing you to extend them further, such as analysing data in statistical tools or performing automated actions.

Where InfluxDB is better:

  • If you're doing event logging.
  • Commercial option offers clustering for InfluxDB, which is also better for long term data storage.
  • Eventually consistent view of data between replicas.

Where Prometheus is better:

  • If you're primarily doing metrics.
  • More powerful query language, alerting, and notification functionality.
  • Higher availability and uptime for graphing and alerting.

InfluxDB is maintained by a single commercial company following the open-core model, offering premium features like closed-source clustering, hosting and support. Prometheus is a fully open source and independent project, maintained by a number of companies and individuals, some of whom also offer commercial services and support.

Prometheus vs. OpenTSDB

OpenTSDB is a distributed time series database based on Hadoop and HBase.


The same scope differences as in the case of Graphite apply here.

Data model

OpenTSDB's data model is almost identical to Prometheus's: time series are identified by a set of arbitrary key-value pairs (OpenTSDB "tags" are Prometheus "labels"). All data for a metric is stored together, limiting the cardinality of metrics. There are minor differences though: for example, Prometheus allows arbitrary characters in label values, while OpenTSDB is more restrictive. OpenTSDB also lacks a full query language, only allowing simple aggregation and math via its API.


OpenTSDB's storage is implemented on top of Hadoop and HBase. This means that it is easy to scale OpenTSDB horizontally, but you have to accept the overall complexity of running a Hadoop/HBase cluster from the beginning.

Prometheus will be simpler to run initially, but will require explicit sharding once the capacity of a single node is exceeded.


Prometheus offers a much richer query language, can handle higher cardinality metrics and forms part of a complete monitoring system. If you're already running Hadoop and value long term storage over these benefits, OpenTSDB is a good choice.

Prometheus vs. Nagios

Nagios is a monitoring system that originated in the 90s as NetSaint.


Nagios is primarily about alerting based on the exit codes of scripts. These are called “checks”. Individual alerts can be silenced, but there is no grouping, routing, or deduplication.

There are a variety of plugins. For example, the few kilobytes of perfData that plugins are allowed to return can be piped to a time series database such as Graphite, and NRPE can be used to run checks on remote machines.
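A check in this style is essentially a script whose exit code carries the result, with optional perfData appended to its output after a pipe character. The sketch below follows the commonly used plugin convention of 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. The check itself, its thresholds, and the name `check_disk_usage` are hypothetical.

```python
# Exit codes conventionally used by Nagios-style plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk_usage(used_percent, warn=80.0, crit=90.0):
    """Return (exit_code, output_line) for a hypothetical disk usage check."""
    if used_percent >= crit:
        code, state = CRITICAL, "CRITICAL"
    elif used_percent >= warn:
        code, state = WARNING, "WARNING"
    else:
        code, state = OK, "OK"
    # The text after the pipe is perfData in label=value form.
    output = f"DISK {state} - {used_percent:.1f}% used | used_percent={used_percent:.1f}"
    return code, output


# A real plugin would print the output line and exit with the code,
# for example via print(output) followed by sys.exit(code).
code, output = check_disk_usage(93.5)
```

The monitoring server only sees the exit code and the single output line, which is why anything beyond the current state has to be handled by additional plugins.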

Data model

Nagios is host-based. Each host can have one or more services, each of which has one check.

There is no notion of labels or a query language.


Nagios has no storage per se, beyond the current check state. There are plugins which can store data, for example for visualisation.


Nagios servers are standalone. All configuration of checks is done via files.


Nagios is suitable for basic monitoring of small and/or static systems where blackbox probing is sufficient.

If you want to do whitebox monitoring, or have a dynamic or cloud based environment, then Prometheus is a good choice.

Prometheus vs. Sensu

Sensu is broadly speaking a more modern Nagios.


The same general scope differences as in the case of Nagios apply here.

The primary difference is that Sensu clients register themselves, and can determine the checks to run either from central or local configuration. Sensu does not have a limit on the amount of perfData.

There is also a client socket permitting arbitrary check results to be pushed into Sensu.
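Pushing a result through such a client socket could look roughly like the sketch below. Everything specific here is an assumption rather than something stated above: the JSON field names `name`, `output`, and `status`, the local address `127.0.0.1`, and the port `3030` should be checked against the Sensu documentation for the version in use. The check name is hypothetical, and the send function is defined but not called.

```python
import json
import socket


def build_check_result(name, output, status):
    """Serialize a check result as JSON (field names are assumptions)."""
    return json.dumps({"name": name, "output": output, "status": status}).encode("utf-8")


def push_check_result(payload, host="127.0.0.1", port=3030):
    """Send the payload to a local client socket (address and port are assumptions)."""
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(payload)


# Build a result for a hypothetical job; status 2 follows the exit code
# convention in which 2 means critical.
payload = build_check_result("nightly_backup", "backup failed", 2)
# push_check_result(payload)  # requires a running client listening locally
```

This lets applications report their own state directly, instead of waiting for a scheduled check script to discover it.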

Data model

Sensu has the same rough data model as Nagios.


Sensu has storage in Redis called stashes. These are used primarily for storing silences. It also stores all the clients that have registered with it.


Sensu has a number of components. It uses RabbitMQ as a transport, Redis for current state, and a separate Server for processing.

Both RabbitMQ and Redis can be clustered. Multiple copies of the server can be run for scaling and redundancy.


If you have an existing Nagios setup that you wish to scale as-is, or want to take advantage of the registration feature of Sensu, then Sensu is a good choice.

If you want to do whitebox monitoring, or have a very dynamic or cloud based environment, then Prometheus is a good choice.