How Cloudflare Serves 20% of Internet Traffic with Only 15 Postgres Clusters

Cloudflare serves 20% of internet traffic while relying on just 15 Postgres clusters for its service metadata and OLTP workloads. Here’s how they achieve this:

  • Metadata Storage & OLTP Workloads: They store service metadata and handle OLTP workloads in Postgres.
  • Connection Pooling: PgBouncer is used for connection pooling to prevent resource starvation and the thundering herd problem.
  • Bare Metal Servers: Postgres runs on bare metal servers for optimal performance.
  • Load Balancing: HAProxy is used to load balance traffic across databases.
  • Congestion Avoidance: A congestion avoidance algorithm throttles tenants to maintain system stability.
  • Priority Queues: Queries are ordered based on resource usage using priority queues.
  • High Availability: Stolon cluster manager ensures data replication for high availability.
  • Chaos Testing: Regular chaos testing is conducted to ensure system resilience.
  • Failover Synchronization: The `pg_rewind` tool resynchronizes a former primary with the new primary after failover, so it can rejoin the cluster as a replica.
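
The congestion avoidance and priority queue items above can be illustrated with a small Python sketch. This is not Cloudflare's implementation: the additive-increase/multiplicative-decrease (AIMD) rule, the class names `TenantLimit` and `QueryQueue`, the 50 ms latency target, and the tenant names are all assumptions chosen for illustration. It only shows the general idea of shrinking a tenant's concurrency limit under congestion and serving queries from lighter tenants first.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class TenantLimit:
    """AIMD concurrency limit for one tenant (all parameters are hypothetical)."""
    limit: float = 4.0
    min_limit: float = 1.0
    max_limit: float = 32.0

    def on_sample(self, latency_ms: float, target_ms: float = 50.0) -> None:
        if latency_ms > target_ms:
            # Multiplicative decrease when the database looks congested.
            self.limit = max(self.min_limit, self.limit / 2)
        else:
            # Additive increase while latency stays healthy.
            self.limit = min(self.max_limit, self.limit + 1)


class QueryQueue:
    """Serve queries from the tenant with the lowest recent usage first."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = count()  # tie-breaker: FIFO order for equal usage

    def push(self, tenant: str, usage: float, query: str) -> None:
        heapq.heappush(self._heap, (usage, next(self._seq), tenant, query))

    def pop(self):
        _usage, _seq, tenant, query = heapq.heappop(self._heap)
        return tenant, query


limit = TenantLimit()
limit.on_sample(latency_ms=20.0)   # healthy sample: limit grows by 1
limit.on_sample(latency_ms=120.0)  # congested sample: limit is halved

queue = QueryQueue()
queue.push("heavy_tenant", usage=900.0, query="SELECT ...")
queue.push("light_tenant", usage=10.0, query="SELECT 1")
first = queue.pop()  # the lighter tenant is served first
print(limit.limit, first)
```

In a real pooler the usage figure would come from measured resource consumption per tenant, and the limit would gate how many server connections that tenant may hold at once.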

You Should Know:

Here are some practical commands and tools related to the article:

1. PgBouncer Configuration:


# Install PgBouncer
sudo apt-get install pgbouncer

# Basic PgBouncer configuration
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
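
With `auth_type = md5`, each line of the `auth_file` pairs a username with a password hash. As a minimal sketch, the Python helper below builds such a line in the format PgBouncer accepts: the literal prefix `md5` followed by the MD5 hex digest of the password concatenated with the username. The function name `userlist_entry`, the user `app_user`, and the password are hypothetical.

```python
import hashlib


def userlist_entry(username: str, password: str) -> str:
    """Return one userlist.txt line: "user" "md5<md5(password + username)>"."""
    digest = hashlib.md5((password + username).encode("utf-8")).hexdigest()
    return f'"{username}" "md5{digest}"'


# Hypothetical credentials, for illustration only.
print(userlist_entry("app_user", "s3cret"))
```

The printed line can be appended to `/etc/pgbouncer/userlist.txt`; the same user and password must also exist in Postgres itself.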

2. HAProxy Setup:


# Install HAProxy
sudo apt-get install haproxy

# Sample HAProxy configuration for Postgres
# (mode tcp is required: Postgres does not speak HTTP)
# Note: roundrobin suits read-only replicas; writes must reach the primary
frontend pg_frontend
    mode tcp
    bind *:5000
    default_backend pg_backend

backend pg_backend
    mode tcp
    balance roundrobin
    server pg1 192.168.1.101:5432 check
    server pg2 192.168.1.102:5432 check
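
The `check` keyword on each `server` line enables health checks; with no further options this is a plain TCP connection attempt. The Python sketch below mimics that behavior to make clear what the check does and does not prove. The function names `tcp_check` and `healthy_backends` are hypothetical, and this is an illustration of the idea rather than HAProxy's own code.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthy_backends(backends):
    """Keep only the (host, port) pairs that accept a TCP connection."""
    return [(host, port) for host, port in backends if tcp_check(host, port)]
```

Calling `healthy_backends([("192.168.1.101", 5432), ("192.168.1.102", 5432)])` would return the servers that accept connections. A successful TCP connect only shows that the port is open; it does not confirm that Postgres can run queries or whether the node is a primary or a replica.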

3. Postgres Failover with pg_rewind:


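# Resynchronize the old primary with the new primary after failover
# (run against the stopped old primary; requires wal_log_hints=on or data checksums)
pg_rewind --target-pgdata=/var/lib/postgresql/12/main --source-server="host=192.168.1.100 port=5432 user=postgres dbname=mydb"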

4. Chaos Testing with Chaos Mesh:


# Install Chaos Mesh in Kubernetes (--create-namespace creates the namespace if it is missing)
helm repo add chaos-mesh https://charts.chaos-mesh.org
helm install chaos-mesh chaos-mesh/chaos-mesh --namespace=chaos-testing --create-namespace

5. Stolon Cluster Management:


# Initialize Stolon cluster
stolonctl init --cluster-name=mycluster --store-backend=etcd

What Undercode Say:

Cloudflare’s approach to scaling Postgres shows how far a small number of clusters can go when the surrounding system is well designed. PgBouncer, HAProxy, and Stolon provide connection pooling, load balancing, and high availability, while chaos testing and `pg_rewind`-based resynchronization after failover harden the setup against failures. For those looking to implement similar systems, the commands and configurations above can serve as a starting point.

References:

Reported By: Nk Systemdesign – Hackers Feeds