Speaker Information

熊崇緯 (Chungwei)

Taboola
R&D Team Lead, Infrastructure

An engineering leader who has delivered multiple complex SaaS platforms while managing more than 30 software engineers. He brings Product Owners, Engineers, QA, DS, and DevOps together into effective teams through Agile methodology. He has also worked on APIs processing more than 100 million requests per month and, for a legacy system, reduced outages from 30+ per month to zero.

Session Schedule

2026-06-26 | 15:30 - 15:55 | DE Meeting Room

An Observability Service at Over 500 Million Metrics per Minute

Taboola ingests 500M+ unique metrics per minute across seven data centers, with a hybrid fleet of physical servers and Kubernetes.

This session explains how we keep ingestion stable, query latency predictable, and costs under control.

We’ll walk through the real architecture decisions and tradeoffs:

Physical layer (Puppet-managed Prometheus): per-DC scraping and long retention.
Thanos integration: sidecars for per-DC exposure, and rulers for local and cross-DC rules.
Kubernetes layer (Helm): per-DC query tier, cross-DC query in IL.
High-cardinality strategy: label hygiene, shard boundaries, short-term vs. long-term query paths, and when to offload to the compactor/store gateway.
Operational lessons: where the architecture breaks first, what we changed, and how we keep it predictable at scale.
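
As an illustration of the "shard boundaries" item, here is a minimal sketch of stable hash-based target sharding, in the spirit of the hashmod relabel action in Prometheus. It is not the configuration presented in the talk. The function names, target addresses, and shard count are hypothetical, and the hash is not claimed to be bit-identical to what Prometheus computes.

```python
import hashlib


def shard_for(target: str, num_shards: int) -> int:
    """Map a scrape target to a shard index using a stable hash.

    Hypothetical helper: MD5 is used only as a stable, well-distributed
    hash, not for security.
    """
    digest = hashlib.md5(target.encode("utf-8")).digest()
    # Interpret the last 8 bytes as an unsigned big-endian integer.
    return int.from_bytes(digest[8:], "big") % num_shards


def assign(targets, num_shards):
    """Group targets by shard index; every target lands in exactly one shard."""
    shards = {i: [] for i in range(num_shards)}
    for target in targets:
        shards[shard_for(target, num_shards)].append(target)
    return shards


# Hypothetical target addresses, for illustration only.
example_targets = [f"host-{n}.dc1.example:9100" for n in range(20)]
example_shards = assign(example_targets, 4)
```

Because the shard index depends only on the target string and the shard count, every Prometheus instance can compute its own share independently. The tradeoff is that changing the shard count reassigns most targets.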

The goal is to share concrete practices and a reusable mental model for anyone trying to scale Prometheus + Thanos across multiple DCs.

Audience takeaways:

How to split high-cardinality metrics from “regular” metrics without losing visibility
How to combine per-DC isolation with cross-DC global views using Thanos
Practical sharding and retention strategies for large-scale Prometheus deployments
A reference architecture for hybrid (physical + K8s) observability stacks
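
One rough way to decide which metrics to treat separately from "regular" ones is to count distinct series per metric name and compare against a threshold. The sketch below assumes series are available as plain label dictionaries. The function names, sample labels, and threshold are hypothetical and are not taken from the talk.

```python
from collections import defaultdict


def series_per_metric(series):
    """Count distinct label sets (series) for each metric name."""
    seen = defaultdict(set)
    for labels in series:
        # A series is identified by its full label set, including __name__.
        seen[labels["__name__"]].add(frozenset(labels.items()))
    return {name: len(label_sets) for name, label_sets in seen.items()}


def split_by_cardinality(series, threshold):
    """Return (high, regular) metric names, split by series count."""
    counts = series_per_metric(series)
    high = sorted(name for name, count in counts.items() if count > threshold)
    regular = sorted(name for name, count in counts.items() if count <= threshold)
    return high, regular


# Hypothetical sample: a per-user label inflates the series count.
sample = [
    {"__name__": "http_requests_total", "dc": "dc1", "user_id": "1"},
    {"__name__": "http_requests_total", "dc": "dc1", "user_id": "2"},
    {"__name__": "http_requests_total", "dc": "dc1", "user_id": "3"},
    {"__name__": "up", "dc": "dc1"},
    {"__name__": "up", "dc": "dc2"},
]
high, regular = split_by_cardinality(sample, threshold=2)
```

In this sample, the per-user label puts http_requests_total above the threshold, while up stays on the regular side.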

Details