Updated: 2021-10-24 01:52:17 +0800 +08

Prometheus Storage Optimization

Storage Size Evaluation

needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample (average 1-2 bytes per sample)

Data Structure

The current block for incoming samples is kept in memory and is not fully persisted.

  • ./data
    • 01BKGTZQ1SYQJTR4PB43C8PD98 (Ingested samples are grouped into blocks of two hours)
      • chunks (containing all the time series samples for that window of time)
        • 000001 (segment files of up to 512MB)
      • tombstones (When series are deleted via the API, deletion records are stored here)
      • index (indexes metric names and labels to time series in the chunks directory)
      • meta.json
    • chunks_head
      • 000001
    • wal (keep at least two hours of raw data; replayed when the Prometheus server restarts; has not yet been compacted)
      • 000000002 (stored in the wal directory in 128MB segments;retain a minimum of three write-ahead log files)
      • checkpoint.00000001
        • 00000000

take up to two hours to remove expired blocks

Find largest volume metrics

Delete metrics

https://prometheus.io/docs/prometheus/latest/querying/api/#delete-series

$ curl -X POST \
  -g 'http://your.prometheus.host:9090/api/v1/admin/tsdb/delete_series? \
  match[]=up& \
  match[]=http_request{status="200"}& \
  start=1595521900& \
  end=1595521990'

Use relabel to reduce metrics

1 target could have N metrics

  • reduce target by relabel_configs
  • reduce metrics from target by source_labels in metric_relabel_configs
  • reduct label from metrics by directly use metric_relabel_configs
scrape_configs:
  - job_name: 'node'
    ec2_sd_configs:
      - region: us-west-2
        port: 8080
        # only discover tag Name is myapp
        filters:
          - name: "tag:name"
            values:
              - "myapp"
    relabel_configs:
      # Drop all metrics from windows server
      - source_labels: [__meta_ec2_platform]
        regex: windows
        action: drop
    metric_relabel_configs:
    # keep only http_request with label status 20X metric
    - source_labels: [__name__, status]
      regex: 'http_request;20.'
      action: keep
    # drop all source ip label
    - regex: 'source_ip'
      action: labeldrop

Care must be taken with labeldrop and labelkeep to ensure that metrics are still uniquely labeled once the labels are removed.

Reference