Modernizing Prometheus: Native Storage for Composite Types

February 14, 2026 by Bartłomiej Płotka (@bwplotka)

Over the last year, the Prometheus community has been working hard on several interesting and ambitious changes that previously would have been seen as controversial or not feasible. While these changes might have had little visibility from the outside (e.g., it's not an OpenClaw Prometheus plugin, sorry 🙃), Prometheus developers are, organically, steering Prometheus toward a coherent future. Piece by piece, we are unexpectedly getting closer to goals we never dreamed we would achieve as an open-source project!

This post (hopefully!) starts a series of blog posts sharing a few ambitious shifts that might be exciting to new and existing Prometheus users and developers. In this post, I'd love to focus on the idea of native storage for composite types, which tidies up a lot of challenges that have piled up over time. Make sure to check the inlined links to learn how you can adopt some of those changes early or contribute!

CAUTION: Disclaimer: This post is intended as a fun overview, from my own personal point of view as a Prometheus maintainer. Some of the mentioned changes haven't (yet) been officially approved by the Prometheus Team; some have not been proven in production.
NOTE: This post was written by humans; AI was used only for cosmetic and grammar fixes.

Classic Representation: Primitive Samples

As you might know, the Prometheus data model (so server, PromQL, protocols) supports gauges, counters, histograms and summaries. OpenMetrics 1.0 extended this with gaugehistogram, info and stateset types.

Impressively, for a long time Prometheus' TSDB storage implementation had a remarkably clean and simple data model. The TSDB allowed the storage and retrieval of string-labelled primitive samples containing only float64 values and int64 timestamps. It was completely metric-type-agnostic.

Metric types were merely implied on top of the TSDB, interpreted by humans and by best-effort tooling like PromQL. For simplicity, let's call this way of storing types the classic model or representation. In this model:

We have primitive types:

  • gauge is a "default" type with no special rules, just a float sample with labels.

  • counter that should have a _total suffix in the name for humans to understand its semantics.

    foo_total 17.0
    
  • info that needs an _info suffix in the metric name and always has a value of 1.

We have composite types. This is where the fun begins. In the classic representation, composite metrics are represented as a set of primitive float samples:

  • histogram is a group of counters with certain mandatory suffixes and le labels:

    foo_bucket{le="0.0"} 0
    foo_bucket{le="1e-05"} 0
    foo_bucket{le="0.0001"} 5
    foo_bucket{le="0.1"} 8
    foo_bucket{le="1.0"} 10
    foo_bucket{le="10.0"} 11
    foo_bucket{le="100000.0"} 11
    foo_bucket{le="1e+06"} 15
    foo_bucket{le="1e+23"} 16
    foo_bucket{le="1.1e+23"} 17
    foo_bucket{le="+Inf"} 17
    foo_count 17
    foo_sum 324789.3
    
  • gaugehistogram, summary, and stateset types follow the same logic – a group of special gauges or counters that compose a single metric.

The classic model served the Prometheus project well. It significantly simplified the storage implementation, enabling Prometheus to be one of the most optimized open-source time-series databases, with distributed versions based on the same data model available in projects like Cortex, Thanos, and Mimir.

Unfortunately, there are always tradeoffs. This classic model has a few limitations:

  • Efficiency: It tends to yield overhead for composite types because every new piece of data (e.g., new bucket) takes precious index space (it's a new unique series), whereas samples are significantly more compressible (rarely change, time-oriented).
  • Functionality: It poses limitations to the shape and flexibility of the data you store (unless we'd go into some JSON-encoded labels, which have massive downsides).
  • Transactionality: Primitive pieces of composite types (separate counters) are processed independently. While we did a lot of work to ensure write isolation and transactionality for scrapes, transactionality completely breaks apart when data is received or sent via the remote write or OTLP protocols. For example, a classic histogram foo might be partially sent: its foo_bucket{le="1.1e+23"} 17 counter series could be delayed or accidentally dropped, which risks triggering false-positive alerts.
  • Reliability: Consumers of the TSDB data have to essentially guess the type semantics. There's nothing stopping users from writing a foo_bucket gauge or foo_total histogram.

A Glimpse of Native Storage for Composite Types

The classic model was challenged by the introduction of native histograms. The TSDB was extended to store composite histogram samples in addition to plain floats. We tend to call these native histograms, because the TSDB can now "natively" store a full histogram (with sparse, exponential buckets) as an atomic, composite sample.

At that point, the common wisdom was to stop there. The special advanced histogram that's generally meant to replace the "classic" histograms uses a composite sample, while the rest of the metrics use the classic model. Making other composite types consistent with the new native model felt extremely disruptive to users, with too much work and risk. A common counter-argument was that users would eventually migrate their classic histograms naturally, and that summaries are less useful anyway, given the more powerful bucketing and lower cost of native histograms.

Unfortunately, the migration to native histograms was known to take time, given the slight PromQL change required to use them, and the new bucketing and client changes needed (applications have to define new metrics or convert existing ones to native histograms). There will also be old software in use for a long time that never migrates. This leaves Prometheus with little chance of ever deprecating classic histograms, forcing all the software solutions to support the classic model, likely for decades.

However, native histograms did push TSDB and the ecosystem into that new composite sample pattern. Some of those changes could be easily adapted to all composite types. Native histograms also gave us a glimpse of the many benefits of that native support. It was tempting to ask ourselves: would it be possible to add native counterparts of the existing composite metrics to replace them, ideally transparently?

Organically, in 2024, for transactionality and efficiency, we introduced the native histograms with custom buckets (NHCB) concept, which essentially allows storing classic histograms with explicit buckets natively, reusing the native histogram composite sample data structures.

NHCB has proven to be at least 30% more efficient than the classic representation, while offering functional parity with classic histograms. However, two practical challenges emerged that slowed down adoption:

  1. Expanding, that is, converting from NHCB to a classic histogram, is relatively trivial, but combining, which is turning a classic histogram into an NHCB, is often not feasible. Since we don't want to wait for client ecosystem adoption, and being mindful of legacy, hard-to-change software, we envisioned NHCB being combined on scrape from the classic representation. That has proven to be somewhat expensive on scrape. Additionally, combination logic is practically impossible when receiving "pushes" (e.g., remote write with classic histograms), because different parts of the same histogram sample (e.g., buckets and count) may arrive via different remote write shards or sequential messages. This combination challenge is also why OpenTelemetry Collector users see extra overhead in prometheusreceiver, as the OpenTelemetry model strictly follows the composite sample model.

  2. Consumption is slightly different, especially in the PromQL query syntax. Our initial decision was to surface NHCB histograms using a native-histogram-like PromQL syntax. For example, consider the following classic histogram:

    foo_bucket{le="0.0"} 0
    # ...
    foo_bucket{le="1.1e+23"} 17
    foo_bucket{le="+Inf"} 17
    foo_count 17
    foo_sum 324789.3
    

    When we convert this to NHCB, you can no longer use foo_bucket as your metric name selector. Since the NHCB is now stored as a foo metric, you need to use:

    histogram_quantile(0.9, sum(foo{job="a"}))
    
    # Old syntax: histogram_quantile(0.9, sum(foo_bucket{job="a"}) by (le))
    

    This also has another effect: it violates our "what you see is what you query" rule for the text formats, at least until OpenMetrics 2.

    On top of that, similar problems occur on other Prometheus outputs (federation, remote read, and remote write).

NOTE: Fun fact: the Prometheus client data model (SDKs) and the PrometheusProto scrape protocol already use the composite sample model!

Transparent Native Representation

Let's get straight to the point. Organically, the Prometheus community seems to align with the following two ideas:

  • We want to eventually move to a fully composite sample model on the storage layer, given all the benefits.
  • Users need to be able to switch (e.g., on scrape) from the classic to the native form in storage without breaking the consumption layer. Essentially, to ease non-trivial migration pains (finding who uses what, double-writing, synchronizing), to avoid tricky dual-mode protocol changes, and to deprecate the classic model ASAP for the sustainability of the Prometheus codebase, we need to enable eventual consumption migration (e.g., PromQL queries) independently of the storage layer.

Let's go through evidence of this direction, which also represents efforts you can contribute to or adopt early!

  1. We are discussing the "native" summary and stateset to fully eliminate the classic model for all composite types. Feel free to join and help with that work!

  2. We are working on OpenMetrics 2.0 to consolidate and improve the pull protocol scene and apply the new learnings. One of the core changes will be the move to composite values in text, which makes the text format trivial to parse for storages that support composite types natively. This solves the combining challenge. Note that, by default, for now, all composite types will still be "expanded" to the classic format on scrape, so there's no breaking change for users. Feel free to join our WG to help or give feedback.

  3. The Prometheus receive and export protocols have been updated. Remote Write 2.0 allows transporting histograms in the "native" form instead of the classic representation (the classic one is still supported). In future versions (e.g., 2.1), we could easily follow a similar pattern and add native summaries and statesets. Contributions are welcome to make Remote Write 2.0 stable!

  4. We are experimenting with consumption compatibility modes that translate composite types stored as composite samples back to the classic representation. This is not trivial, and there are edge cases, but it might be more feasible (and needed!) than we initially anticipated. See:

    In PromQL it might work as follows, for an NHCB that used to be a classic histogram:

    # New syntax gives our "foo" NHCB:    
    histogram_quantile(0.9, sum(foo{job="a"}))
    # Old syntax still works, expanding "foo" NHCB to classic representation:
    histogram_quantile(0.9, sum(foo_bucket{job="a"}) by (le))
    

    Alternatives, like a special label or annotations, are also discussed.

When implemented, it should be possible to fully switch different parts of your metric collection pipeline to native form transparently.

Summary

Moving Prometheus to a native composite type world is not easy and will take time, especially around coding, testing, and optimizing. Notably, it shifts the performance characteristics of the metric load from uniform, predictable sample sizes to sample sizes that depend on the type. Another challenge is code architecture: maintaining different sample types has already proven to be very verbose (we need unions, Go!).

However, recent work revealed a clean and feasible path that yields clear benefits around functionality, transactionality, reliability, and efficiency in the relatively near future, which is pretty exciting!

If you have any questions around these changes, feel free to:

  • DM me on Slack.
  • Visit the #prometheus-dev Slack channel and share your questions.
  • Comment on related issues, create PRs, and review PRs (the most impactful work!).

The Prometheus community is also at KubeCon EU 2026 in Amsterdam!

I'm hoping we can share stories of other important, orthogonal shifts we see in the community in future posts. No promises (and help welcome!), but there's a lot to cover, such as (random order, not a full list):

  1. Our native start timestamp feature journey that cleanly unblocks native delta temporality without "hacks" like reusing gauges, a separate layer of metric types, or label annotations (e.g., __temporality__).
  2. Optional schematization of Prometheus metrics that attempts to solve a ton of stability problems with metric naming and shape, building on top of OpenTelemetry semconv.
  3. Our metadata storage journey  that attempts to improve the OpenTelemetry Entities and resource attributes storage and consumption experience.
  4. Our journey to organize and extend Prometheus scrape pull protocols with the recent ownership move of OpenMetrics.
  5. An incredible TSDB Parquet  effort, coming from the three LTS project groups (Cortex, Thanos, Mimir) working together, attempting to improve high-cardinality cases.
  6. Fun experiments with PromQL extensions, like PromQL with pipes and variables  and some new SQL transpilation ideas.
  7. Governance changes.

See you in open-source!