Your database might be fast… and still feel slow
You check your metrics and everything looks fine. Average latency is low. Queries are “fast.”
And yet — users complain.
Requests randomly take seconds. APIs time out. Dashboards hang.
This is the uncomfortable truth: Systems don’t fail because of average latency — they fail because of tail latency.
What latency actually means
Latency is simply the time your system takes to respond to a request.
You don’t measure it once — you measure it across thousands or millions of requests.
That distribution is what actually matters.
Why averages are misleading
Averages hide the exact thing that hurts your system.
Let’s say your database handles 100 requests:
- 95 requests → 10 ms
- 5 requests → 2 seconds
Your average latency? ~110 ms. Looks acceptable.
But those 5 slow requests?
- They cause user-visible delays
- They trigger retries
- They can cascade into failures
The average tells you everything is fine. Your users tell you it’s not.
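The arithmetic above is easy to check directly. A minimal Python sketch using the hypothetical 100-request distribution from the example:

```python
# 95 fast requests at 10 ms, 5 slow requests at 2,000 ms (2 s)
latencies_ms = [10] * 95 + [2000] * 5

average = sum(latencies_ms) / len(latencies_ms)
print(f"average latency: {average} ms")  # 109.5 ms, which looks acceptable

# ...but 5% of users waited two full seconds
slow = [l for l in latencies_ms if l >= 1000]
print(f"slow requests: {len(slow)} of {len(latencies_ms)}")
```

The average lands near the fast cluster because the fast requests dominate the count, not because the system is uniformly fast.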
What p95 and p99 latency actually mean
This is where percentiles come in.
- p95 latency → 95% of requests are faster than this
- p99 latency → 99% of requests are faster than this
So if:
- p95 = 50 ms
- p99 = 800 ms
It means:
- Most requests are fast
- But a small percentage are very slow
That small percentage is your tail latency.
And that’s what users remember.
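Percentiles are straightforward to compute from raw samples. A minimal nearest-rank sketch (reusing the same hypothetical 100-request distribution from earlier; production systems typically use histogram-based estimates instead of sorting raw samples):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the value below which pct% of samples fall."""
    ordered = sorted(samples)
    k = math.ceil(pct / 100 * len(ordered)) - 1
    return ordered[max(k, 0)]

latencies_ms = [10] * 95 + [2000] * 5
print(percentile(latencies_ms, 95))  # 10 ms: most requests are fast
print(percentile(latencies_ms, 99))  # 2000 ms: the tail is very slow
```

Note how p95 and p99 tell completely different stories about the same dataset, while the average blurs them into one number.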
Why p99 matters in real systems
Users don’t experience averages. They experience the worst-case moments.
One slow request can:
- Block a page load
- Delay a payment
- Break an API chain
- Trigger retries and duplicate work
In distributed systems, this gets worse.
A single slow database query can:
- Hold a thread
- Delay downstream services
- Cause queue buildup
- Lead to timeouts and cascading failures
One bad request is rarely isolated.
Where tail latency comes from
Tail latency isn’t random. It has very real causes:
- Resource contention: CPU spikes, thread pool exhaustion, lock contention
- Disk I/O delays: cache misses, slow reads, compaction
- Network variability: cross-region calls, packet delays
- Garbage collection / pauses: JVM GC, memory pressure
- Distributed coordination: consensus protocols, replication delays
In modern systems, these are not edge cases — they’re normal conditions.
Impact at scale (this is where things break)
As traffic increases, something subtle happens:
The probability of slow requests increases.
Even if only 1% of requests are slow:
- At 100 RPS → 1 slow request/sec
- At 10,000 RPS → 100 slow requests/sec
Now your system is constantly dealing with slow paths.
And worse:
- Retries amplify load
- Queues start forming
- Latency compounds
Scaling doesn’t just increase throughput — it amplifies tail latency.
This is why systems that look fine in staging collapse in production.
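The retry amplification above can be sketched with simple arithmetic. Assuming each attempt is retried with a fixed probability (a simplification; real retry policies are bounded and jittered), the expected number of attempts per request is a geometric series:

```python
def effective_rps(offered_rps, retry_prob):
    """Expected attempt rate when each attempt is retried with
    probability retry_prob: the series 1 + p + p^2 + ... sums to 1/(1-p)."""
    return offered_rps / (1 - retry_prob)

print(effective_rps(10_000, 0.01))  # ~10,101: barely noticeable
print(effective_rps(10_000, 0.50))  # 20,000: under overload, retries double the load
```

This is why retry storms are dangerous: as the system slows down, the retry probability rises, which raises the load, which slows the system down further.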
The distributed systems effect
Modern applications don’t make one request.
They fan out.
A single user request might involve:
- API gateway
- 3 backend services
- 2 database calls
- 1 cache lookup
Total latency becomes:
The sum of multiple components — each with its own tail
If each component has a small chance of being slow, the combined probability grows fast.
This is why:
- Microservices architectures are sensitive to p99
- Fan-out queries amplify tail latency
- One slow dependency slows everything
In practice:
Your system is only as fast as its slowest dependency.
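This compounding is easy to quantify. If each of n components independently lands in its own worst 1% with probability 0.01 (an independence assumption; real failures are often correlated, which can be worse), the chance that a user request hits at least one slow component is:

```python
def p_any_slow(n_components, p_slow=0.01):
    """Probability that at least one of n independent components is slow."""
    return 1 - (1 - p_slow) ** n_components

# The fan-out from the example: gateway + 3 services + 2 DB calls + 1 cache = 7
print(f"{p_any_slow(7):.1%}")  # ~6.8% of requests hit some component's worst 1%
```

So a per-component p99 problem becomes roughly a p93 problem for the end user: each component's tail is rare, but the fan-out makes hitting *some* tail common.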
Common mistakes engineers make
Most systems don’t fail because of lack of optimization. They fail because of wrong metrics.
Common mistakes:
- Optimizing for average latency only
- Ignoring p95 / p99 metrics
- Not testing under realistic load
- Assuming “fast enough” without measuring distribution
- Treating latency as a single number instead of a curve
This leads to systems that look good on dashboards — but break under pressure.
How to think about performance
If you’re trying to design a reliable system, shift your mindset:
1. Track the right metrics
- Always monitor p95 and p99
- Treat averages as secondary
2. Design for predictability
- Consistency > peak speed
- Stable latency beats occasional spikes
3. Reduce variance, not just latency
- Eliminate long tails
- Smooth out spikes
4. Understand your workload
Different systems have different tolerance:
- Real-time systems → extremely sensitive to p99
- Batch systems → more tolerant
- User-facing APIs → dominated by tail latency
This is why database selection is not just about speed — it’s about latency behavior under load.
Practical takeaway
- Average latency is not enough
- Tail latency defines user experience
- Systems fail at the edges, not the center
If you remember one thing:
A system that is fast most of the time can still feel slow — because of the worst 1%.
A better way to evaluate databases
Choosing the best database for your application isn’t just about throughput or features.
It’s about:
- How it behaves under load
- How it handles contention
- How predictable its latency is
If you want a structured way to think through this, evaluate candidates against real workload characteristics — including latency behavior under load, not just averages.