Architecture

Quick Start

Data Model

Every time series is uniquely identified by its metric name and optional key-value pairs called labels.

  • Metric Name
  • Metric Label
    • Changing any label value, or adding or removing a label, creates a new time series.
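
The identity rule above can be illustrated with a minimal Python sketch. It is not how Prometheus stores series internally; `series_key` is a hypothetical helper that only models "metric name plus label set" as a hashable key.

```python
def series_key(name, labels=None):
    """Return a hashable identity for a series: metric name plus its label set."""
    return (name, frozenset((labels or {}).items()))


a = series_key("http_requests_total", {"method": "GET", "status": "200"})
b = series_key("http_requests_total", {"status": "200", "method": "GET"})  # same labels, other order
c = series_key("http_requests_total", {"method": "GET", "status": "500"})  # one label value changed
d = series_key("http_requests_total", {"method": "GET"})                   # one label removed

print(a == b)              # label order does not change the identity
print(len({a, b, c, d}))   # number of distinct series among the four keys
```

Under this model `a` and `b` are the same series, while `c` and `d` are each a new one.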

Metric Type

Prometheus supports four metric types: Counter, Gauge, Histogram, and Summary.

  • Counter: a cumulative metric whose value can only increase, or be reset to zero on restart
    # use a counter to represent the number of requests served, tasks completed, or errors
    - http_requests_total{handler='/get_user_id', method='GET', status='200'}
    - errors_total{type='runtime', severity='critical'}
    
  • Gauge: a number which can either go up or down
    # used for measured values like temperatures or current memory usage
    - memory_usage_bytes{process_name='web_server', instance='10.0.0.1:8080'}
    - queue_size{queue_name='low_priority', worker_type='background'}
    
  • Histogram: samples observations and counts them in configurable buckets,
    • the count in each bucket is the ordinate value (y coordinate of a standard two-dimensional graph) when the histogram is plotted
    • the buckets are cumulative counters, exposed as _bucket{le="<upper inclusive bound>"}
    # usually things like request durations or response sizes
    # le="0.3" means less than or equal to 0.3
    http_latency_second_sum 134420.14452212452
    http_latency_second_bucket{le="0.05"} 11326.0
    http_latency_second_bucket{le="0.1"} 2.284831e+06
    http_latency_second_bucket{le="0.15"} 2.285367e+06
    http_latency_second_bucket{le="0.25"} 2.285592e+06
    http_latency_second_bucket{le="1.0"} 2.285613e+06
    http_latency_second_bucket{le="+Inf"} 2.285619e+06
    http_latency_second_count 2.285619e+06
    
    # Cumulative means that the count for the le="0.5" bucket also includes the count for the le="0.25" bucket.
    # Consider the following hypothetical distribution of 200 observations.
    ┌─────────────┬──────────────────────┬──────────────────┐
    │ Bucket (le) │ Cumulative Frequency │ Upper Bound      │
    │             │ Count                │ Percentile       │
    ├─────────────┼──────────────────────┼──────────────────┤
    │ 50ms        │                   20 │ p10              │
    │ 100ms       │                   70 │ p35              │
    │ 250ms       │                  120 │ p60              │
    │ 500ms       │                  150 │ p75              │
    │ 1000ms      │                  200 │ p100             │
    │ INF         │                  200 │ p100             │
    └─────────────┴──────────────────────┴──────────────────┘
    
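  • Summary: also samples observations and is an alternative to a histogram. Summaries are cheaper to query but lose more data, because their quantiles are precomputed on the client and cannot be aggregated across instances (using histograms over summaries is highly recommended whenever possible).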
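
The cumulative bucket counts in the histogram table above can be reproduced with a short sketch. The observation values below are made up so that they fall into the table's buckets, and `cumulative_counts` is a hypothetical helper, not part of any Prometheus client library.

```python
from bisect import bisect_left

# Upper inclusive bounds (le) in seconds, matching the table: 50ms ... 1000ms, +Inf
BOUNDS = [0.05, 0.1, 0.25, 0.5, 1.0, float("inf")]


def cumulative_counts(observations, bounds=BOUNDS):
    """Count observations per bucket, then accumulate so each bucket includes all smaller ones."""
    per_bucket = [0] * len(bounds)
    for value in observations:
        # bisect_left returns the first bound that is >= value, so le is inclusive
        per_bucket[bisect_left(bounds, value)] += 1
    cumulative, running = [], 0
    for n in per_bucket:
        running += n
        cumulative.append(running)
    return cumulative


# 200 hypothetical observations: 20 fast, 50 + 50 + 30 in the middle, 50 slow
observations = [0.03] * 20 + [0.08] * 50 + [0.2] * 50 + [0.4] * 30 + [0.9] * 50
counts = cumulative_counts(observations)
percentiles = [100 * c / counts[-1] for c in counts]
print(counts)       # cumulative frequency per bucket
print(percentiles)  # share of all observations at or below each upper bound
```

Dividing each cumulative count by the total gives the "upper bound percentile" column: 20 of 200 observations at or below 50ms means the 50ms bound sits at p10.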

Storage

workflow

./data
├── 01BKGV7JBM69T2G1BGBGM6KB12
│   └── meta.json
├── 01BKGTZQ1SYQJTR4PB43C8PD98 
│   ├── chunks                 
│   │   └── 000001
│   ├── tombstones
│   ├── index                  
│   └── meta.json
├── 01BKGTZQ1HHWHV8FBJXW1Y3W0K
│   └── meta.json
├── 01BKGV7JC0RY8A6MACW02A2PJD
│   ├── chunks
│   │   └── 000001
│   ├── tombstones
│   ├── index
│   └── meta.json
├── chunks_head
│   └── 000001
└── wal
    ├── 000000002
    └── checkpoint.00000001
        └── 00000000

See details; simply put:

  • blocks: ingested samples are grouped into blocks of 2 hours, e.g. 01BKGV7JBM69T2G1BGBGM6KB12 is a block
  • chunks:
    • it’s a directory that contains the time series data for that window of time (up to 2 hours)
    • The samples in the chunks directory are grouped together into one or more segment files of up to 512MB each by default
  • tombstones: deletion markers; deletions are recorded here instead of removing the data immediately from the chunk segments
  • index: inverted index which indexes metric names and labels to time series in the chunks directory
  • meta.json: block info
  • wal(write-ahead log):
    • The current block for incoming samples is kept in memory and is not fully persisted. It is secured against crashes by a write-ahead log (WAL) that can be replayed when the Prometheus server restarts.
    • WAL files are stored in the wal directory in 128MB segments. They contain raw data that has not yet been compacted, so they are significantly larger than regular block files.
    • Prometheus retains a minimum of 3 write-ahead log files. High-traffic servers may retain more than 3 WAL files in order to keep at least 2 hours of raw data.
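
The write-ahead idea can be sketched in a few lines of Python. This is a toy model only: the real Prometheus WAL is a segmented binary format, and `TinyHead`, its JSON-lines log file, and the sample values are all assumptions made for illustration.

```python
import json
import os
import tempfile


class TinyHead:
    """Toy in-memory head block protected by an append-only log file."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.samples = []
        self._replay()

    def _replay(self):
        # On startup, rebuild the in-memory state from whatever the log contains.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path) as f:
            for line in f:
                self.samples.append(tuple(json.loads(line)))

    def append(self, series, timestamp, value):
        record = [series, timestamp, value]
        # Write to the log and force it to disk first, then update memory.
        with open(self.wal_path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.samples.append(tuple(record))


wal_file = os.path.join(tempfile.mkdtemp(), "wal.log")
head = TinyHead(wal_file)
head.append("queue_size", 1000, 3.0)
head.append("queue_size", 2000, 5.0)
del head  # simulate a crash: the in-memory samples are gone

recovered = TinyHead(wal_file)  # a restart replays the log
print(recovered.samples)
```

Because every sample reaches the log before it is visible in memory, a restart can replay the log and recover everything that was acknowledged.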

PromQL

Time series Selectors

Instant Vector

Instant vector selectors allow the selection of a set of time series and a single sample value for each at a given timestamp (instant)

# only metric name
http_requests_total

# with labels
http_requests_total{job="prometheus",group="canary"}

# with regex
http_requests_total{environment=~"staging|testing|development",method!="GET"}
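
As a rough model of how these selectors pick series, the sketch below applies the four matcher operators (=, !=, =~, !~) to label sets in Python. The `matches` helper and the sample series are hypothetical; Prometheus uses RE2 regular expressions, so Python's `re` is only an approximation, although regex matchers are fully anchored in both.

```python
import re


def matches(labels, matchers):
    """Return True if a label set satisfies every (name, op, pattern) matcher."""
    for name, op, pattern in matchers:
        value = labels.get(name, "")  # a missing label behaves like an empty value
        if op == "=":
            ok = value == pattern
        elif op == "!=":
            ok = value != pattern
        elif op == "=~":
            ok = re.fullmatch(pattern, value) is not None  # anchored match
        elif op == "!~":
            ok = re.fullmatch(pattern, value) is None
        else:
            raise ValueError(f"unknown matcher operator: {op}")
        if not ok:
            return False
    return True


# The metric name is itself stored as the label __name__.
series = [
    {"__name__": "http_requests_total", "environment": "staging", "method": "POST"},
    {"__name__": "http_requests_total", "environment": "staging", "method": "GET"},
    {"__name__": "http_requests_total", "environment": "production", "method": "POST"},
    {"__name__": "http_requests_total", "environment": "staging-eu", "method": "PUT"},
]
matchers = [
    ("__name__", "=", "http_requests_total"),
    ("environment", "=~", "staging|testing|development"),
    ("method", "!=", "GET"),
]
selected = [s for s in series if matches(s, matchers)]
print(selected)
```

Only the first series is selected: the second fails method!="GET", and "production" and "staging-eu" do not fully match the regex.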

Range Vector Selectors

Range vector literals work like instant vector literals, except that they select a range of samples back from the current instant

http_requests_total{job="prometheus"}[5m]

Offset Modifier

The offset modifier allows changing the time offset for individual instant and range vectors in a query.

# returns the 5-minute rate that http_requests_total had a week ago
rate(http_requests_total[5m] offset 1w)

Functions

A few PromQL functions appear in most common queries.

1. rate() and irate()

see details
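
As a simplified illustration of the difference: rate() averages the per-second increase over the whole range, while irate() uses only the last two samples in it. The Python sketch below models that, including counter resets. It deliberately ignores the extrapolation to the window boundaries that the real rate() performs, and `simple_rate`, `simple_irate`, and the samples are hypothetical.

```python
def increase_with_resets(samples):
    """Total increase over (timestamp, value) samples, treating any drop as a counter reset."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the new value is the increase.
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples):
    """Average per-second increase across the whole range (no extrapolation)."""
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    return increase_with_resets(samples) / elapsed


def simple_irate(samples):
    """Per-second increase based only on the last two samples in the range."""
    return simple_rate(samples[-2:])


# (timestamp in seconds, counter value); the counter resets between t=30 and t=45
samples = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 70)]
print(simple_rate(samples))   # smoothed over the full 60 seconds
print(simple_irate(samples))  # reacts only to the most recent 15 seconds
```

The averaged value suits alerting and slow-moving counters, while the last-two-samples value follows short spikes more closely.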

2. histogram_quantile()

How is P99 calculated?
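
A minimal Python sketch of the idea follows: find the bucket that contains the target rank, then interpolate linearly inside it. It follows the general approach of histogram_quantile() but is not the Prometheus implementation. The function below is a hypothetical stand-in, and the buckets reuse the hypothetical 200-observation distribution from the Metric Type section.

```python
def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    Assumes buckets are sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if upper == float("inf"):
        # The quantile lies beyond the last finite bound; report that bound.
        return buckets[-2][0]
    lower, below = (0.0, 0.0) if i == 0 else buckets[i - 1]
    if count == below:
        return upper
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


buckets = [(0.05, 20), (0.1, 70), (0.25, 120), (0.5, 150), (1.0, 200), (float("inf"), 200)]
p99 = histogram_quantile(0.99, buckets)
p50 = histogram_quantile(0.50, buckets)
print(p99, p50)
```

For P99 the rank is 0.99 × 200 = 198, which lands in the (0.5, 1.0] bucket holding observations 151 to 200. Interpolating gives 0.5 + 0.5 × (198 − 150) / 50 = 0.98 seconds. The result is an estimate whose accuracy depends on how finely the buckets are chosen.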
