Taming High Cardinality Metrics
The High Cardinality Problem
In my years managing cloud infrastructure at scale, I’ve encountered the high cardinality metrics problem more times than I’d like to admit. It starts innocently—developers add a user ID label to a metric, or someone decides to track every API endpoint variation. Suddenly, your monitoring system is drowning.
High cardinality happens when metrics have labels with many unique values. A metric tracking HTTP requests with labels like user_id, endpoint, and status_code can explode into millions of unique time series. I’ve seen production Prometheus instances consume 500GB of memory trying to handle cardinality that grew unchecked.
The real challenge isn’t just storage—it’s query performance. When you have millions of time series, even simple queries can time out. Your alerts slow down, dashboards fail to load, and suddenly your monitoring system needs monitoring.
Diagnose High Cardinality in Production
Let me share a real example from a microservices deployment I worked on. We had a service mesh tracking requests between 50 services. Each metric included labels for:
# High cardinality metric labels
- source_service: 50 values
- destination_service: 50 values
- http_method: 7 values
- status_code: 50+ values
- endpoint: 500+ unique paths
The math is brutal: 50 × 50 × 7 × 50 × 500 = 437.5 million potential time series. Even with sparse data, we were generating tens of millions of active series.
The symptoms appeared gradually:
- Prometheus scrape intervals started timing out
- Query response times jumped from milliseconds to seconds
- Memory usage climbed relentlessly
- Eventually, Prometheus crashed during startup trying to load the write-ahead log
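Before changing anything, measure. Here is a minimal sketch of the check I run first, assuming a Prometheus instance reachable on localhost:9090 (the exact response fields vary slightly between versions); it pulls the TSDB status endpoint and prints the worst offenders:
import json
import urllib.request

# Field names below match recent Prometheus releases; older versions may differ.
TSDB_STATUS_URL = "http://localhost:9090/api/v1/status/tsdb"

with urllib.request.urlopen(TSDB_STATUS_URL) as resp:
    status = json.load(resp)["data"]

print("Active head series:", status["headStats"]["numSeries"])
print("Top metrics by series count:")
for entry in status["seriesCountByMetricName"]:
    print(f"  {entry['name']}: {entry['value']}")
The same numbers are visible in the web UI under Status > TSDB Status; having them in a script just makes it easier to track trends over time.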
Optimize Prometheus for High Cardinality
Prometheus wasn’t designed for high cardinality. Its in-memory storage model assumes thousands to hundreds of thousands of active series—not millions. When you exceed that range, performance degrades rapidly.
I’ve learned several patterns to keep Prometheus healthy:
Pattern 1: Aggressive Label Reduction
The first step is ruthless label pruning. In that service mesh example, I eliminated the endpoint label entirely and replaced it with a parameterized version:
# Before: High cardinality
http_requests_total{endpoint="/api/users/12345"}
http_requests_total{endpoint="/api/users/67890"}
# After: Low cardinality
http_requests_total{endpoint="/api/users/:id"}
This single change reduced cardinality by 10x. We implemented it in our application instrumentation:
from prometheus_client import Counter
import re

request_counter = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint_pattern', 'status']
)

def normalize_endpoint(path):
    """Normalize endpoint paths to reduce cardinality"""
    # Replace UUIDs
    path = re.sub(r'/[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}', '/:uuid', path)
    # Replace numeric IDs
    path = re.sub(r'/\d+', '/:id', path)
    return path

def track_request(method, path, status):
    normalized = normalize_endpoint(path)
    request_counter.labels(method=method, endpoint_pattern=normalized, status=status).inc()
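With the normalization in place, requests for different users collapse into a single series. For example:
# Both calls increment the same series:
# http_requests_total{method="GET", endpoint_pattern="/api/users/:id", status="200"}
track_request("GET", "/api/users/12345", "200")
track_request("GET", "/api/users/67890", "200")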
Pattern 2: Recording Rules for Aggregation
Prometheus recording rules let you pre-aggregate high cardinality metrics into lower cardinality ones. I use this extensively:
groups:
  - name: cardinality_reduction
    interval: 30s
    rules:
      # Aggregate per-user metrics to per-service
      - record: service:http_requests:rate5m
        expr: |
          sum by (service, status) (
            rate(http_requests_total[5m])
          )
      # Keep detailed metrics for errors only
      - record: service:http_errors:rate5m
        expr: |
          sum by (service, endpoint, status) (
            rate(http_requests_total{status=~"5.."}[5m])
          )
This approach gives you aggregated metrics for dashboards while preserving detailed labels only for error cases where you need them for debugging.
Pattern 3: Relabeling at Scrape Time
Prometheus metric relabeling is a powerful way to control cardinality after scraping, before samples hit storage. Rules that act on metric names or metric labels belong under metric_relabel_configs (plain relabel_configs only sees target labels, before the scrape happens):
scrape_configs:
  - job_name: 'api-servers'
    metric_relabel_configs:
      # Drop high cardinality metrics entirely
      - source_labels: [__name__]
        regex: 'high_cardinality_metric_.*'
        action: drop
      # Truncate endpoint label values to 50 characters
      - source_labels: [endpoint]
        regex: '(.{50}).*'
        target_label: endpoint
        replacement: '${1}'
      # Drop the user_id label entirely
      - regex: 'user_id'
        action: labeldrop
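To sanity-check what the truncation rule does before shipping it, I sometimes replay the regex locally. A quick sketch with a made-up endpoint path:
import re

# Prometheus anchors relabel regexes to the full label value; fullmatch mimics that.
truncate = re.compile(r'(.{50}).*')
endpoint = "/api/v1/reports/2026/01/15/detailed-breakdown-by-region-and-team"
match = truncate.fullmatch(endpoint)
# Values shorter than 50 characters don't match, so they are left untouched.
print(match.group(1) if match else endpoint)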
Scale High Cardinality with ClickHouse
When Prometheus patterns aren’t enough, I turn to ClickHouse. Unlike Prometheus, ClickHouse is a columnar database built to handle billions of rows efficiently. It excels at high cardinality scenarios.
I recently migrated a logging pipeline from Elasticsearch to ClickHouse. The difference was staggering—query performance improved 50x and storage costs dropped 70%.
ClickHouse Schema Design
The key to ClickHouse performance is proper schema design. Here’s a metrics table I use:
CREATE TABLE metrics_distributed
(
    timestamp DateTime,
    metric_name LowCardinality(String),
    value Float64,
    labels Map(String, String),
    -- Materialized columns for common labels
    service LowCardinality(String) MATERIALIZED labels['service'],
    environment LowCardinality(String) MATERIALIZED labels['environment']
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
-- Ordering key tuned for time-series queries
ORDER BY (metric_name, service, timestamp)
SETTINGS index_granularity = 8192;
The LowCardinality type is crucial—it provides dictionary encoding that dramatically reduces storage for repeated values. The Map type handles arbitrary labels without schema changes.
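Writing into this table from Python takes a few lines. Here is a minimal sketch, assuming the clickhouse-connect client and a local server (any ClickHouse client follows the same shape):
import datetime as dt

import clickhouse_connect  # assumption: pip install clickhouse-connect

client = clickhouse_connect.get_client(host="localhost")

# Insert only the base columns; the MATERIALIZED ones are derived server-side.
client.insert(
    "metrics_distributed",
    [[dt.datetime.now(dt.timezone.utc), "http_requests_total", 1.0,
      {"service": "checkout", "environment": "prod"}]],
    column_names=["timestamp", "metric_name", "value", "labels"],
)
The materialized service and environment columns are populated automatically from the labels map on insert.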
Efficient Querying
ClickHouse’s columnar storage makes aggregations incredibly fast:
-- 95th percentile latency by service over 24 hours
SELECT
    service,
    quantile(0.95)(value) AS p95_latency
FROM metrics_distributed
WHERE
    metric_name = 'http_request_duration_seconds'
    AND timestamp >= now() - INTERVAL 24 HOUR
GROUP BY service
ORDER BY p95_latency DESC;
This query processes millions of rows in under a second. The secret is that ClickHouse only reads the columns needed and uses the sorting key to skip irrelevant data blocks.
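The same query is just as easy to run programmatically; here is a sketch reusing the clickhouse-connect client from the schema section above:
# Reuses the clickhouse-connect client from the insert sketch above.
result = client.query(
    """
    SELECT service, quantile(0.95)(value) AS p95_latency
    FROM metrics_distributed
    WHERE metric_name = 'http_request_duration_seconds'
      AND timestamp >= now() - INTERVAL 24 HOUR
    GROUP BY service
    ORDER BY p95_latency DESC
    """
)
for service, p95 in result.result_rows:
    print(f"{service}: {p95:.3f}s")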
Retention Policies
ClickHouse’s partitioning makes retention policies trivial:
-- Manually drop a single day's partition
ALTER TABLE metrics_distributed
    DROP PARTITION '20260101';

-- Automatic TTL-based deletion after 90 days
ALTER TABLE metrics_distributed
    MODIFY TTL timestamp + INTERVAL 90 DAY;
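Before dropping partitions by hand, I like to see what each one actually holds. A quick sketch against the system.parts table, again with the client from earlier:
# List active partitions for the metrics table, oldest first.
rows = client.query(
    """
    SELECT partition,
           sum(rows) AS row_count,
           formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
    FROM system.parts
    WHERE table = 'metrics_distributed' AND active
    GROUP BY partition
    ORDER BY partition
    """
).result_rows
for partition, row_count, size_on_disk in rows:
    print(f"{partition}: {row_count} rows, {size_on_disk}")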
I configure different retention periods per metric type:
CREATE TABLE metrics_with_ttl
(
    timestamp DateTime,
    metric_name LowCardinality(String),
    value Float64,
    labels Map(String, String),
    service LowCardinality(String) MATERIALIZED labels['service']
)
ENGINE = MergeTree()
PARTITION BY toYYYYMMDD(timestamp)
ORDER BY (metric_name, service, timestamp)
TTL timestamp + INTERVAL 7 DAY DELETE WHERE metric_name LIKE 'debug_%',
    timestamp + INTERVAL 90 DAY DELETE;
Deploy Hybrid Metrics Architecture
In production, I often run both Prometheus and ClickHouse. Prometheus handles real-time alerting with low-cardinality metrics, while ClickHouse stores detailed historical data.
The architecture looks like this:
┌──────────────┐
│ Applications │
└──────┬───────┘
       │ Expose metrics
       ▼
┌──────────────┐      ┌─────────────────┐
│  Prometheus  │─────▶│   ClickHouse    │
│   (Scrape)   │      │   (Long-term)   │
└──────┬───────┘      └─────────────────┘
       │                       ▲
       │ Alerts                │ Queries
       ▼                       │
┌──────────────┐      ┌────────┴────────┐
│ Alertmanager │      │     Grafana     │
└──────────────┘      └─────────────────┘
I use Prometheus remote write to forward metrics to ClickHouse:
# prometheus.yml
remote_write:
  - url: http://clickhouse-writer:9090/write
    queue_config:
      capacity: 100000
      max_samples_per_send: 10000
      batch_send_deadline: 10s
    write_relabel_configs:
      # Only send high-value metrics to ClickHouse
      - source_labels: [__name__]
        regex: '(important_metric|critical_gauge).*'
        action: keep
The remote write endpoint is a custom service that batches metrics and inserts them into ClickHouse efficiently:
package main

import (
    "context"
    "time"

    "github.com/ClickHouse/clickhouse-go/v2/lib/driver"
    "github.com/prometheus/prometheus/prompb"
)

type ClickHouseWriter struct {
    conn driver.Conn
}

func (w *ClickHouseWriter) Write(req *prompb.WriteRequest) error {
    // PrepareBatch requires a context in clickhouse-go v2.
    batch, err := w.conn.PrepareBatch(context.Background(), "INSERT INTO metrics_distributed")
    if err != nil {
        return err
    }
    for _, ts := range req.Timeseries {
        labels := make(map[string]string)
        metricName := ""
        for _, label := range ts.Labels {
            if label.Name == "__name__" {
                metricName = label.Value
            } else {
                labels[label.Name] = label.Value
            }
        }
        for _, sample := range ts.Samples {
            err = batch.Append(
                time.UnixMilli(sample.Timestamp), // Remote write timestamps are in milliseconds
                metricName,
                sample.Value,
                labels,
            )
            if err != nil {
                return err
            }
        }
    }
    return batch.Send()
}
Practical Lessons Learned
After managing high cardinality metrics across dozens of production environments, here are my key takeaways:
1. Cardinality is a Product Decision
Every label you add has a cost. I now involve product teams in discussions about metric instrumentation. If they want per-user tracking, we talk about the operational costs and explore alternatives like sampling.
2. Monitor Your Monitoring
I treat Prometheus itself as critical infrastructure. We alert on:
- Time series cardinality trends
- Scrape duration
- Memory usage growth rate
- Query latency percentiles
This catches cardinality explosions before they become outages.
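In practice these checks live in alerting rules, but the core of the cardinality one is a single query against Prometheus's own prometheus_tsdb_head_series metric. A minimal sketch, with a made-up series budget:
import json
import urllib.parse
import urllib.request

# Hypothetical series budget; tune it to what your Prometheus can handle.
SERIES_BUDGET = 2_000_000
params = urllib.parse.urlencode({"query": "prometheus_tsdb_head_series"})

with urllib.request.urlopen(f"http://localhost:9090/api/v1/query?{params}") as resp:
    result = json.load(resp)["data"]["result"]

# Takes the first result; with several Prometheus targets you would sum them.
head_series = float(result[0]["value"][1]) if result else 0.0
if head_series > SERIES_BUDGET:
    print(f"WARNING: {head_series:,.0f} active series exceeds the budget of {SERIES_BUDGET:,}")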
3. Sample Strategically
For truly high cardinality scenarios, sampling is your friend. I use exemplars in Prometheus to sample detailed traces while keeping metric cardinality low:
import time

from opentelemetry import trace
from prometheus_client import Histogram

latency_histogram = Histogram(
    'http_request_duration_seconds',
    'HTTP request latency',
    ['method', 'endpoint_pattern']
)

tracer = trace.get_tracer(__name__)

def handle_request(method, path):
    with tracer.start_as_current_span("http_request") as span:
        start = time.time()
        # ... handle request ...
        duration = time.time() - start
        # Attach the trace ID as an exemplar when recording the observation.
        # Exemplars are only exposed via the OpenMetrics exposition format.
        trace_id = format(span.get_span_context().trace_id, "032x")
        latency_histogram.labels(
            method=method,
            endpoint_pattern=normalize_endpoint(path)
        ).observe(duration, exemplar={'trace_id': trace_id})
4. Design for Scale from Day One
It’s much harder to fix cardinality problems after they’re in production. I now enforce these rules in code review:
- Maximum 8 labels per metric
- No unbounded label values (IDs, UUIDs, emails)
- Required cardinality estimates in PRs that add metrics (a rough calculation like the sketch below is enough)
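The estimate doesn't need to be fancy. Multiplying the expected number of values per label gives a worst-case bound, as in this sketch for a hypothetical metric:
from math import prod

# Hypothetical metric proposed in a PR: expected distinct values per label.
proposed_labels = {
    "method": 7,
    "endpoint_pattern": 40,
    "status": 12,
}
print("Worst-case series for this metric:", prod(proposed_labels.values()))  # 3360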
5. Use the Right Tool
Prometheus is excellent for operational metrics with alerting. ClickHouse shines for analytics and historical queries. Don’t force one tool to do everything.
Conclusion
High cardinality metrics don’t have to be a nightmare. With proper label design, aggressive aggregation, and the right storage backend, you can maintain observability at any scale.
The key is understanding the tradeoffs. Prometheus gives you real-time alerting with low latency, but requires discipline around cardinality. ClickHouse handles billions of rows effortlessly, but has higher query latency.
In my infrastructure, I use both: Prometheus for what’s happening right now, ClickHouse for understanding what happened over time. This hybrid approach has served me well across multiple companies and billions of metrics per day.
Start by auditing your current cardinality. Run curl localhost:9090/api/v1/status/tsdb against your Prometheus instance and look at the series count. If you’re north of a million series, it’s time to take action. Your monitoring system will thank you.