Your database might be fast… and still feel slow
You check your metrics and everything looks fine. Average latency is low. Queries are “fast.”
And yet — users complain.
Requests randomly take seconds. APIs time out. Dashboards hang.
This is the uncomfortable truth: Systems don’t fail because of average latency — they fail because of tail latency.
What latency actually means
Latency is simply the time your system takes to respond to a request.
You don’t measure it once — you measure it across thousands or millions of requests.
That distribution is what actually matters.
Why averages are misleading
Averages hide the exact thing that hurts your system.
Let’s say your database handles 100 requests:
- 95 requests → 10 ms
- 5 requests → 2 seconds
Your average latency? ~110 ms. Looks acceptable.
But those 5 slow requests?
- They cause user-visible delays
- They trigger retries
- They can cascade into failures
The average tells you everything is fine. Your users tell you it’s not.
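The arithmetic above is easy to check directly. A minimal Python sketch using the hypothetical 100-request distribution from the example:

```python
# 95 fast requests at 10 ms, 5 slow requests at 2,000 ms (2 s)
latencies_ms = [10] * 95 + [2000] * 5

average = sum(latencies_ms) / len(latencies_ms)
print(f"average latency: {average} ms")  # 109.5 ms, which looks acceptable

# ...but 5% of users waited two full seconds
slow = [l for l in latencies_ms if l >= 1000]
print(f"slow requests: {len(slow)} of {len(latencies_ms)}")
```

The average lands near the fast cluster because the fast requests dominate the count, not because the system is uniformly fast.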
What p95 and p99 latency actually mean
This is where percentiles come in.
- p95 latency → 95% of requests are faster than this
- p99 latency → 99% of requests are faster than this
So if:
- p95 = 50 ms
- p99 = 800 ms
It means:
- Most requests are fast
- But a small percentage are very slow
That small percentage is your tail latency.
And that’s what users remember.
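Percentiles are straightforward to compute from raw samples. A minimal nearest-rank sketch (reusing the same hypothetical 100-request distribution from earlier; production systems typically use histogram-based estimates instead of sorting raw samples):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the value below which pct% of samples fall."""
    ordered = sorted(samples)
    k = math.ceil(pct / 100 * len(ordered)) - 1
    return ordered[max(k, 0)]

latencies_ms = [10] * 95 + [2000] * 5
print(percentile(latencies_ms, 95))  # 10 ms: most requests are fast
print(percentile(latencies_ms, 99))  # 2000 ms: the tail is very slow
```

Note how p95 and p99 tell completely different stories about the same dataset, while the average blurs them into one number.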
Why p99 matters in real systems
Users don’t experience averages. They experience the worst-case moments.
One slow request can:
- Block a page load
- Delay a payment
- Break an API chain
- Trigger retries and duplicate work
In distributed systems, this gets worse.
A single slow database query can:
- Hold a thread
- Delay downstream services
- Cause queue buildup
- Lead to timeouts and cascading failures
One bad request is rarely isolated.
Where tail latency comes from
Tail latency isn’t random. It has very real causes:
- Resource contention: CPU spikes, thread pool exhaustion, lock contention
- Disk I/O delays: cache misses, slow reads, compaction
- Network variability: cross-region calls, packet delays
- Garbage collection / pauses: JVM GC, memory pressure
- Distributed coordination: consensus protocols, replication delays
In modern systems, these are not edge cases — they’re normal conditions.
Impact at scale (this is where things break)
As traffic increases, something subtle happens:
The probability of slow requests increases.
Even if only 1% of requests are slow:
- At 100 RPS → 1 slow request/sec
- At 10,000 RPS → 100 slow requests/sec
Now your system is constantly dealing with slow paths.
And worse:
- Retries amplify load
- Queues start forming
- Latency compounds
Scaling doesn’t just increase throughput — it amplifies tail latency.
This is why systems that look fine in staging collapse in production.
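The retry amplification above can be sketched with simple arithmetic. Assuming each attempt is retried with a fixed probability (a simplification; real retry policies are bounded and jittered), the expected number of attempts per request is a geometric series:

```python
def effective_rps(offered_rps, retry_prob):
    """Expected attempt rate when each attempt is retried with
    probability retry_prob: the series 1 + p + p^2 + ... sums to 1/(1-p)."""
    return offered_rps / (1 - retry_prob)

print(effective_rps(10_000, 0.01))  # ~10,101: barely noticeable
print(effective_rps(10_000, 0.50))  # 20,000: under overload, retries double the load
```

This is why retry storms are dangerous: as the system slows down, the retry probability rises, which raises the load, which slows the system down further.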
The distributed systems effect
Modern applications don’t make one request.
They fan out.
A single user request might involve:
- API gateway
- 3 backend services
- 2 database calls
- 1 cache lookup
Total latency becomes:
The sum of multiple components — each with its own tail
If each component has a small chance of being slow, the combined probability grows fast.
This is why:
- Microservices architectures are sensitive to p99
- Fan-out queries amplify tail latency
- One slow dependency slows everything
In practice:
Your system is only as fast as its slowest dependency.
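This compounding is easy to quantify. If each of n components independently lands in its own worst 1% with probability 0.01 (an independence assumption; real failures are often correlated, which can be worse), the chance that a user request hits at least one slow component is:

```python
def p_any_slow(n_components, p_slow=0.01):
    """Probability that at least one of n independent components is slow."""
    return 1 - (1 - p_slow) ** n_components

# The fan-out from the example: gateway + 3 services + 2 DB calls + 1 cache = 7
print(f"{p_any_slow(7):.1%}")  # ~6.8% of requests hit some component's worst 1%
```

So a per-component p99 problem becomes roughly a p93 problem for the end user: each component's tail is rare, but the fan-out makes hitting *some* tail common.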
Common mistakes engineers make
Most systems don’t fail because of lack of optimization. They fail because of wrong metrics.
Common mistakes:
- Optimizing for average latency only
- Ignoring p95 / p99 metrics
- Not testing under realistic load
- Assuming “fast enough” without measuring distribution
- Treating latency as a single number instead of a curve
This leads to systems that look good on dashboards — but break under pressure.
How to think about performance
If you’re trying to design a reliable system, shift your mindset:
1. Track the right metrics
- Always monitor p95 and p99
- Treat averages as secondary
2. Design for predictability
- Consistency > peak speed
- Stable latency beats occasional spikes
3. Reduce variance, not just latency
- Eliminate long tails
- Smooth out spikes
4. Understand your workload
Different systems have different tolerance:
- Real-time systems → extremely sensitive to p99
- Batch systems → more tolerant
- User-facing APIs → dominated by tail latency
This is why database selection is not just about speed — it’s about latency behavior under load.
Practical takeaway
- Average latency is not enough
- Tail latency defines user experience
- Systems fail at the edges, not the center
If you remember one thing:
A system that is fast most of the time can still feel slow — because of the worst 1%.
A better way to evaluate databases
Choosing the best database for your application isn’t just about throughput or features.
It’s about:
- How it behaves under load
- How it handles contention
- How predictable its latency is
If you want a structured way to think through this, evaluate candidates against real workload characteristics — including latency behavior under load, not just averages.