Tuning Valkey on Graviton: Achieving 1.1M+ RPS

Khawaja Shams, Co-Founder & CEO at Momento, shared insights on tuning Valkey on a c8g.2xlarge instance, highlighting how well Graviton chips perform. The setup achieved over 1.1 million requests per second (RPS) on a single 8 vCPU box with consistent tail latencies, all without pipelining.

Key Observations:

  1. IRQ Processing Efficiency – Only 2 Graviton cores handle packet processing to support 1M+ RPS, visible as “red” in monitoring tools.
  2. Main Valkey Thread – Core 6, running the main Valkey thread, was fully saturated, so the main thread rather than packet processing was the limiting factor.
  3. Thermal & Performance Stability – Unlike x86, Graviton maintains consistent tail latencies even at ~100% CPU utilization.
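To confirm which core a Valkey thread is saturating (as with core 6 above), a small shell helper can read each thread's CPU number straight from procfs. This is a sketch assuming Linux procfs. `thread_cpus` is a hypothetical helper name. The value it prints is the CPU the thread last ran on, so sample it a few times rather than trusting a single reading.

```shell
# Sketch: print each thread of a process with the CPU it last ran on.
# Reads the kernel's "processor" field (field 39) from /proc/<pid>/task/<tid>/stat.
thread_cpus() {
  local t cpu
  for t in /proc/"$1"/task/*; do
    [ -r "$t/stat" ] || continue
    # comm can contain spaces, so strip "pid (comm) " up to the last ") " first;
    # after that, awk field 37 is the kernel's field 39.
    cpu=$(sed 's/.*) //' "$t/stat" | awk '{ print $37 }')
    printf '%s %s cpu=%s\n' "${t##*/}" "$(cat "$t/comm")" "$cpu"
  done
}

# Example: thread_cpus "$(pgrep -o valkey-server)"
```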

Graviton’s architecture delivers high throughput while keeping latency predictable, which makes it a good fit for high-performance databases like Valkey.

You Should Know:

Performance Tuning Commands & Tools

To replicate such performance, use these Linux commands and tools:

1. Monitor CPU & IRQ Activity

mpstat -P ALL 1  # Per-core CPU utilization, refreshed every second
top -H -p $(pgrep -d, valkey-server)  # Thread-level CPU usage (comma-separated PIDs for top)
cat /proc/interrupts  # Check IRQ distribution
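Raw /proc/interrupts output gets wide on multi-core machines. A sketch that totals each CPU column makes it easier to see whether network interrupts are concentrated on the two IRQ cores. It assumes the standard Linux layout: a header row of CPU names, then one row per IRQ. `irq_totals_per_cpu` is a hypothetical helper. Counts are cumulative since boot, so compare two runs a few seconds apart to see the current rate.

```shell
# Sketch: sum the interrupt counts in each CPU column of /proc/interrupts
# (or of a saved copy passed as $1). Counts are cumulative since boot.
irq_totals_per_cpu() {
  awk '
    NR == 1 { ncpu = NF; next }        # header row: CPU0 CPU1 ...
    {
      # Field 1 is the IRQ label ("25:", "LOC:"); per-CPU counts follow it.
      for (i = 2; i <= ncpu + 1 && i <= NF; i++) {
        if ($i !~ /^[0-9]+$/) break    # reached the controller/device name
        sum[i - 2] += $i
      }
    }
    END { for (c = 0; c < ncpu; c++) printf "CPU%d %.0f\n", c, sum[c] }
  ' "${1:-/proc/interrupts}"
}

# Example: irq_totals_per_cpu; sleep 5; irq_totals_per_cpu
```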

2. Isolate CPU Cores for IRQ Handling

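sudo systemctl set-property --runtime -- user.slice AllowedCPUs=2-7  # Keep user processes off cores 0-1
sudo systemctl set-property --runtime -- system.slice AllowedCPUs=2-7  # Keep system services off cores 0-1
echo 0 > /proc/irq//smp_affinity_list  # As root: bind the IRQ to core 0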
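Binding IRQs one at a time is tedious when the NIC has one interrupt per queue. The sketch below assumes the NIC's queue interrupts carry a recognizable label in /proc/interrupts. The `ens5` in the example is a hypothetical interface name, so check your own /proc/interrupts for the real label. It looks up the matching IRQ numbers and writes the same CPU list to each one. `nic_irqs` and `pin_irqs` are hypothetical helpers.

```shell
# Sketch: list numbered IRQs whose /proc/interrupts row mentions a NIC label.
# The label is an assumption; check your own /proc/interrupts for the real name.
nic_irqs() {
  awk -v pat="$1" '$1 ~ /^[0-9]+:$/ && index($0, pat) { sub(/:$/, "", $1); print $1 }' \
    "${2:-/proc/interrupts}"
}

# Pin each given IRQ to a CPU list such as "0-1". Needs root, and irqbalance
# may overwrite the setting unless it is stopped (systemctl stop irqbalance).
pin_irqs() {
  local cpus irq
  cpus=$1; shift
  for irq in "$@"; do
    echo "$cpus" > "/proc/irq/$irq/smp_affinity_list"
  done
}

# Example (as root): pin_irqs 0-1 $(nic_irqs ens5)
```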

3. Optimize Valkey (Redis-compatible) Configuration

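# In valkey.conf (comments must sit on their own lines)
# Example value: keep maxmemory well below the 16 GiB of RAM on a c8g.2xlarge
maxmemory 12gb
# io-threads counts the main thread; 4 stays under the 8 vCPUs, leaving cores for IRQs
io-threads 4
# Ask the kernel to disable Transparent HugePages for the Valkey process
disable-thp yes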
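As I understand it, disable-thp only asks the kernel to skip THP for the Valkey process itself. The system-wide mode is set separately. The sketch below reads that system-wide mode and asks a running server which values it actually loaded. `thp_mode` and `check_valkey_config` are hypothetical helpers. The default port 6379 and having valkey-cli on PATH are assumptions.

```shell
# Sketch: print the system-wide THP mode, i.e. the bracketed word in
# /sys/kernel/mm/transparent_hugepage/enabled (e.g. "always madvise [never]").
thp_mode() {
  sed -n 's/.*\[\([a-z]*\)].*/\1/p' "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}

# Ask a running server what it loaded (assumes valkey-cli on PATH).
check_valkey_config() {
  local key
  for key in io-threads maxmemory disable-thp; do
    valkey-cli -p "${1:-6379}" CONFIG GET "$key"
  done
}
```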

4. Network Tuning for High RPS

sudo sysctl -w net.core.somaxconn=65535 
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=8192 
sudo ethtool -C eth0 rx-usecs 10  # Lower NIC interrupt coalescing delay; the interface may be named differently (e.g. ens5 on EC2)
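Changes made with sysctl -w are lost on reboot. To keep them, put the same key = value lines in a file under /etc/sysctl.d/. To confirm the running values, a sketch can read the matching /proc/sys files and compare them to the targets above. `check_sysctl_min` is a hypothetical helper that only handles single-integer keys. The optional third argument for an alternate root is an assumption added to make testing easier.

```shell
# Sketch: report whether an integer sysctl is at least a target value by
# reading its /proc/sys file (net.core.somaxconn -> net/core/somaxconn).
# $3 lets you point at another root, which is handy for testing.
check_sysctl_min() {
  local key want path have
  key=$1; want=$2
  path="${3:-/proc/sys}/$(printf '%s' "$key" | tr . /)"
  have=$(cat "$path") || return 2
  if [ "$have" -ge "$want" ]; then
    echo "ok   $key=$have"
  else
    echo "low  $key=$have (want >= $want)"
    return 1
  fi
}

# Example:
#   check_sysctl_min net.core.somaxconn 65535
#   check_sysctl_min net.ipv4.tcp_max_syn_backlog 8192
```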

5. Benchmark with `redis-benchmark`

redis-benchmark -h 127.0.0.1 -p 6379 -t set,get -n 1000000 -c 32  # No -P flag: the result above used no pipelining
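Parsing the benchmark's --csv output is easier than reading numbers off the terminal. A sketch, still without pipelining: `run_bench` and `rps_from_csv` are hypothetical helpers. The default of 4 client threads is an arbitrary assumption, not a value from the original setup. --threads needs a 6.0+ benchmark tool. Pass the server's address as the first argument when running from a separate client machine, since a client on the same box competes for its cores.

```shell
# Sketch: run the GET/SET benchmark in CSV mode and print "TEST RPS" pairs.
# Host and thread count are placeholders; no -P flag, so no pipelining.
run_bench() {
  redis-benchmark -h "${1:-127.0.0.1}" -p 6379 -t set,get -n 1000000 -c 32 \
    --threads "${2:-4}" --csv | rps_from_csv
}

# CSV rows look like "SET","1100000.00",...; drop the quotes and skip a header row.
rps_from_csv() {
  awk -F',' '{ gsub(/"/, ""); if ($1 != "test") print $1, $2 }'
}

# Example: run_bench 10.0.0.12 4
```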

What Undercode Says

In this test, Graviton’s ARM-based architecture kept tail latency steadier than x86 under sustained high load, which makes it a strong fit for real-time databases. Key takeaways:
– IRQ Optimization is critical: dedicate cores to interrupt handling to avoid contention.
– Thread Pinning makes performance more predictable.
– Network Stack Tuning removes bottlenecks at high RPS.
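A simple form of thread pinning gives Valkey every core that is not reserved for IRQs. The sketch below assumes the IRQ cores are passed as a comma-separated list; ranges like 0-1 are not parsed. `remaining_cpus` and `pin_valkey` are hypothetical helpers built on the standard nproc, pgrep, and taskset tools. Note that this confines all Valkey threads to the same set of cores rather than giving each thread its own core.

```shell
# Sketch: list the CPUs left after reserving some for IRQs, as a
# comma-separated list usable by taskset -c or systemd's AllowedCPUs=.
remaining_cpus() {
  local total reserved out c
  total=$1; reserved=",$2,"; out=""; c=0
  while [ "$c" -lt "$total" ]; do
    case "$reserved" in
      *",$c,"*) ;;                     # reserved for IRQ handling, skip it
      *) out="${out:+$out,}$c" ;;
    esac
    c=$((c + 1))
  done
  printf '%s\n' "$out"
}

# Restrict every thread of each running valkey-server to the non-IRQ cores.
pin_valkey() {
  local cpus pid
  cpus=$(remaining_cpus "$(nproc)" "${1:-0,1}")
  for pid in $(pgrep valkey-server); do
    taskset -a -cp "$cpus" "$pid"      # -a applies the list to all threads
  done
}

# Example (as root or the process owner): pin_valkey 0,1
```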

Prediction

As more AWS workloads move to Graviton3 and Graviton4 instances, expect roughly 30% better price-performance for in-memory databases. Hybrid x86/ARM clusters may emerge for cost-sensitive workloads.

Expected Output:

A high-performance Valkey setup on Graviton with:

✔ 1M+ RPS at sub-millisecond latency

✔ Dedicated IRQ cores for stability

✔ Optimized network & thread scheduling

Reported By: Kshams Valkey – Hackers Feeds