WhatDbShouldIUse
Akshith Varma Chittiveli
5 min read

Your database might be fast… and still feel slow

You check your metrics and everything looks fine. Average latency is low. Queries are “fast.”

And yet — users complain.

Requests randomly take seconds. APIs time out. Dashboards hang.

This is the uncomfortable truth: Systems don’t fail because of average latency — they fail because of tail latency.


What latency actually means

Latency is simply the time your system takes to respond to a request.

You don’t measure it once — you measure it across thousands or millions of requests.

That distribution is what actually matters.
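
In code, this just means timing every request and keeping every sample. A minimal sketch in Python (handle_request is a hypothetical stand-in for your real query or call path; the sleep times are made up):

```python
import random
import time

def handle_request():
    # Hypothetical stand-in for a real database query or API call:
    # usually 5-15 ms, with an occasional 500 ms outlier.
    time.sleep(random.uniform(0.005, 0.015) if random.random() > 0.01 else 0.5)

samples_ms = []
for _ in range(200):
    start = time.perf_counter()
    handle_request()
    samples_ms.append((time.perf_counter() - start) * 1000)

# samples_ms *is* the latency distribution -- averages, p95, p99
# are all just summaries of this list.
print(f"collected {len(samples_ms)} samples")
```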


Why averages are misleading

Averages hide the exact thing that hurts your system.

Let’s say your database handles 100 requests:

  • 95 requests → 10 ms
  • 5 requests → 2 seconds

Your average latency? ~110 ms. Looks acceptable.

But those 5 slow requests?

  • They cause user-visible delays
  • They trigger retries
  • They can cascade into failures

The average tells you everything is fine. Your users tell you it’s not.
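
To make the arithmetic concrete, here is the same 100-request example in a few lines of Python (the numbers are exactly the ones in the list above):

```python
import statistics

# 95 requests at 10 ms, 5 requests at 2 seconds (2,000 ms)
latencies_ms = [10] * 95 + [2000] * 5

mean = statistics.mean(latencies_ms)     # 109.5 ms -- "looks acceptable"
slowest = sorted(latencies_ms)[-5:]      # the requests your users actually notice

print(f"average: {mean} ms")             # average: 109.5 ms
print(f"slowest five: {slowest}")        # slowest five: [2000, 2000, 2000, 2000, 2000]
```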


What p95 and p99 latency actually mean

This is where percentiles come in.

  • p95 latency → 95% of requests are faster than this
  • p99 latency → 99% of requests are faster than this

So if:

  • p95 = 50 ms
  • p99 = 800 ms

It means:

  • Most requests are fast
  • But a small percentage are very slow

That small percentage is your tail latency.

And that’s what users remember.
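
You can compute these numbers with nothing but Python's standard library. A sketch, using a made-up latency distribution with a deliberate long tail:

```python
import random
import statistics

# 990 fast requests plus 10 very slow ones (simulated data)
latencies_ms = [random.uniform(5, 50) for _ in range(990)] + \
               [random.uniform(500, 1200) for _ in range(10)]

cuts = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
p95, p99 = cuts[94], cuts[98]

print(f"p50: {statistics.median(latencies_ms):.0f} ms")
print(f"p95: {p95:.0f} ms")   # 95% of requests were faster than this
print(f"p99: {p99:.0f} ms")   # 99% of requests were faster than this
```

The median and p95 look healthy; p99 is where the tail shows up.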


Why p99 matters in real systems

Users don’t experience averages. They experience the worst-case moments.

One slow request can:

  • Block a page load
  • Delay a payment
  • Break an API chain
  • Trigger retries and duplicate work

In distributed systems, this gets worse.

A single slow database query can:

  • Hold a thread
  • Delay downstream services
  • Cause queue buildup
  • Lead to timeouts and cascading failures

One bad request is rarely isolated.
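
The retry point is worth putting numbers on. A toy simulation (not any particular client library; the 1% tail and the 100 ms timeout are assumptions):

```python
import random

TIMEOUT_MS = 100
MAX_ATTEMPTS = 3

def backend_latency_ms():
    # Assumed distribution: 1% of calls land in the tail
    return random.uniform(500, 2000) if random.random() < 0.01 else random.uniform(5, 30)

backend_calls = 0
for _ in range(10_000):                       # 10,000 user requests
    for attempt in range(MAX_ATTEMPTS):
        backend_calls += 1
        if backend_latency_ms() <= TIMEOUT_MS:
            break                             # fast enough, stop retrying

print(f"user requests: 10000, backend calls: {backend_calls}")
# The backend sees more than 10,000 calls, and the extra load arrives
# exactly when it is already struggling to respond.
```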


Where tail latency comes from

Tail latency isn’t random. It has very real causes:

  • Resource contention: CPU spikes, thread pool exhaustion, lock contention
  • Disk I/O delays: cache misses, slow reads, compaction
  • Network variability: cross-region calls, packet delays
  • Garbage collection / pauses: JVM GC, memory pressure
  • Distributed coordination: consensus protocols, replication delays

In modern systems, these are not edge cases — they’re normal conditions.


Impact at scale (this is where things break)

As traffic increases, something subtle happens:

The number of slow requests you see per second grows, even if the slow percentage stays the same.

Even if only 1% of requests are slow:

  • At 100 RPS → 1 slow request/sec
  • At 10,000 RPS → 100 slow requests/sec

Now your system is constantly dealing with slow paths.
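
Spelled out as a tiny sketch (same assumed 1% tail):

```python
slow_fraction = 0.01   # assumed: 1% of requests land in the tail

for rps in (100, 1_000, 10_000):
    print(f"{rps:>6} RPS -> {rps * slow_fraction:.0f} slow requests every second")
# At 10,000 RPS the "rare" slow path is hit 100 times per second.
```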

And worse:

  • Retries amplify load
  • Queues start forming
  • Latency compounds

Scaling doesn’t just increase throughput — it amplifies tail latency.

This is why systems that look fine in staging collapse in production.


The distributed systems effect

Modern applications don’t make one request.

They fan out.

A single user request might involve:

  • API gateway
  • 3 backend services
  • 2 database calls
  • 1 cache lookup

Total latency becomes:

The sum of multiple components — each with its own tail

If each component has a small chance of being slow, the combined probability grows fast.
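
You can put numbers on "grows fast" with one formula: if each component is slow with probability p, the chance that at least one of n components is slow is 1 - (1 - p)^n. A quick sketch (the 1% per-component figure is an assumption):

```python
p_slow = 0.01   # assumed chance that any single component is slow

for n_components in (1, 3, 5, 7):
    p_any_slow = 1 - (1 - p_slow) ** n_components
    print(f"{n_components} components -> {p_any_slow:.1%} of requests hit at least one slow hop")

# 1 component  -> 1.0%
# 3 components -> 3.0%
# 5 components -> 4.9%
# 7 components -> 6.8%
```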

This is why:

  • Microservices architectures are sensitive to p99
  • Fan-out queries amplify tail latency
  • One slow dependency slows everything

In practice:

Your system is only as fast as its slowest dependency.


Common mistakes engineers make

Most systems don’t fail because of a lack of optimization. They fail because of the wrong metrics.

Common mistakes:

  • Optimizing for average latency only
  • Ignoring p95 / p99 metrics
  • Not testing under realistic load
  • Assuming “fast enough” without measuring distribution
  • Treating latency as a single number instead of a curve

This leads to systems that look good on dashboards — but break under pressure.


How to think about performance

If you’re trying to design a reliable system, shift your mindset:

1. Track the right metrics

  • Always monitor p95 and p99
  • Treat averages as secondary

2. Design for predictability

  • Consistency > peak speed
  • Stable latency beats occasional spikes

3. Reduce variance, not just latency

  • Eliminate long tails
  • Smooth out spikes

4. Understand your workload

Different systems have different tolerance:

  • Real-time systems → extremely sensitive to p99
  • Batch systems → more tolerant
  • User-facing APIs → dominated by tail latency

This is why database selection is not just about speed: it’s about latency behavior under load.
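
On the "always monitor p95 and p99" point above: if you want those numbers in application code and not only on a dashboard, a minimal sliding-window tracker is enough. A sketch using only the standard library (the window size and the sample data are arbitrary):

```python
import random
import statistics
from collections import deque

class LatencyWindow:
    """Keeps the most recent N latency samples and reports percentiles."""

    def __init__(self, size: int = 10_000):
        self.samples = deque(maxlen=size)

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def percentile(self, p: int) -> float:
        cuts = statistics.quantiles(self.samples, n=100)  # 99 cut points
        return cuts[p - 1]                                # cuts[94] = p95, cuts[98] = p99

# Usage: record every request's latency, then alert on the tail, not the mean
window = LatencyWindow()
for _ in range(1000):
    window.record(random.uniform(5, 50) if random.random() > 0.01
                  else random.uniform(500, 1500))
print(f"p95: {window.percentile(95):.0f} ms  p99: {window.percentile(99):.0f} ms")
```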


Practical takeaway

  • Average latency is not enough
  • Tail latency defines user experience
  • Systems fail at the edges, not the center

If you remember one thing:

A system that is fast most of the time can still feel slow — because of the worst 1%.


A better way to evaluate databases

Choosing the best database for your application isn’t just about throughput or features.

It’s about:

  • How it behaves under load
  • How it handles contention
  • How predictable its latency is

If you want a structured way to think through this:

https://whatdbshouldiuse.com

It helps you evaluate databases based on real workload characteristics — including latency behavior, not just averages.