DevOpsDays Taipei 2026

Taboola ingests 500M+ unique metrics per minute across seven data centers, with a hybrid fleet of physical servers and Kubernetes.

This session explains how we keep ingestion stable, query latency predictable, and costs under control.

We’ll walk through the real architecture decisions and tradeoffs:

Physical layer (Puppet-managed Prometheus): per-DC scraping and long retention.
Thanos integration: sidecars for per-DC exposure, and rulers for local and cross-DC rules.
Kubernetes layer (Helm): a per-DC query tier, plus cross-DC querying in IL.
High-cardinality strategy: label hygiene, shard boundaries, short-term vs. long-term query paths, and when to offload to the compactor/store gateway.
Operational lessons: where the architecture breaks first, what we changed, and how we keep it predictable at scale.
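To make the label hygiene and shard boundary ideas above concrete, here is a minimal Python sketch. It is not Taboola's implementation: the function names, the sample label sets, and the cardinality limit are all hypothetical, and the shard function is only loosely modeled on hash-modulus sharding of scrape targets.

```python
import hashlib
from collections import defaultdict


def high_cardinality_labels(series, limit):
    """Return {label_name: distinct_value_count} for labels above `limit`.

    `series` is an iterable of label dicts, one per time series.
    Both the input shape and `limit` are assumptions for illustration.
    """
    values = defaultdict(set)
    for labels in series:
        for name, value in labels.items():
            values[name].add(value)
    return {name: len(seen) for name, seen in values.items() if len(seen) > limit}


def shard_for(target, num_shards):
    """Deterministically map a scrape target to a shard index.

    Hashes the target string with MD5 and reduces the last 8 bytes
    modulo the shard count, so the same target always lands on the
    same shard as long as `num_shards` does not change.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.md5(target.encode("utf-8")).digest()
    return int.from_bytes(digest[8:], "big") % num_shards


# Hypothetical sample data: `request_id` is unbounded, `dc` is not.
sample = [
    {"dc": "dc1", "request_id": "a1"},
    {"dc": "dc1", "request_id": "b2"},
    {"dc": "dc2", "request_id": "c3"},
]
offenders = high_cardinality_labels(sample, limit=2)
shard = shard_for("host-01.example:9100", num_shards=4)
```

A check like `high_cardinality_labels` can flag labels worth dropping or moving to a separate path before they reach a shard, while a stable hash keeps each target on one shard so its series are not duplicated across Prometheus instances.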

The goal is to share concrete practices and a reusable mental model for anyone trying to scale Prometheus + Thanos across multiple DCs.
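One way to picture the short-term vs. long-term query paths is as a routing decision based on the query's time range. The sketch below is an assumption for illustration only: the backend names `sidecar` and `store-gateway`, the `local_retention` parameter, and the cutoff rule are hypothetical and do not describe the actual deployment.

```python
from datetime import datetime, timedelta, timezone


def query_paths(start, end, now, local_retention):
    """Return the hypothetical backends a range query [start, end] touches.

    Data newer than `now - local_retention` is assumed to be served by
    Prometheus through a sidecar; older data is assumed to come from
    object storage through a store gateway.
    """
    if start > end:
        raise ValueError("start must not be after end")
    cutoff = now - local_retention
    paths = []
    if end >= cutoff:
        paths.append("sidecar")
    if start < cutoff:
        paths.append("store-gateway")
    return paths


now = datetime(2026, 1, 10, tzinfo=timezone.utc)
retention = timedelta(days=2)

recent = query_paths(now - timedelta(hours=6), now, now, retention)
historical = query_paths(now - timedelta(days=9), now - timedelta(days=5), now, retention)
spanning = query_paths(now - timedelta(days=3), now - timedelta(days=1), now, retention)
```

Under these assumptions, dashboards that look at recent hours stay on the short-term path, and only queries reaching past the cutoff pay the cost of the long-term path.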

Audience takeaways:

How to split high-cardinality metrics from “regular” metrics without losing visibility
How to combine per-DC isolation with cross-DC global views using Thanos
Practical sharding and retention strategies for large-scale Prometheus deployments
A reference architecture for hybrid (physical + K8s) observability stacks

Speaker

Advanced

Chinese

DevOps Veteran; IT / DEV; IT / I have to do everything