Ask most developers what their API latency is and they will give you an average. That number is almost always misleading. Latency percentiles tell a richer, more honest story about how your API actually performs for real users — including the ones who are suffering the most.
Why Averages Mislead
Averages collapse a distribution into a single number. A handful of very slow requests can be completely hidden. If 95% of requests complete in 80ms but 5% take 3,000ms, the average works out to roughly 226ms — technically accurate, but deeply misleading. Those slow requests represent real users having a terrible experience that your average will never reveal.
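A quick sketch makes this concrete. Using the illustrative distribution above (95 requests at 80ms, 5 at 3,000ms), the mean and the median tell two very different stories:

```python
import statistics

# Illustrative sample: 95 fast requests at 80 ms, 5 slow ones at 3,000 ms.
latencies_ms = [80] * 95 + [3000] * 5

mean = statistics.mean(latencies_ms)
median = statistics.median(latencies_ms)

print(f"mean:   {mean:.0f} ms")    # 226 ms -- looks acceptable
print(f"median: {median:.0f} ms")  # 80 ms  -- what most users actually see
```

Neither number alone reveals the 5% of users waiting three seconds — which is exactly why the tail percentiles below matter.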
What P50 Means
P50 is the median: half of all requests are faster than this value and half are slower. It is a better measure of 'typical' performance than the mean because it is not distorted by extreme outliers. If your P50 is 90ms, most of your users are experiencing sub-100ms responses — which is a useful signal.
P95 and P99: The Slow Tail
P95 means 95% of requests are faster than this value. Only the slowest 5% exceed it. P99 is even more conservative: 99% of requests fall below it. Engineers call this 'tail latency' — it is where performance bugs hide and where your most frustrated users live. In user-facing systems, the P99 experience is the experience that generates support tickets and churn.
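As an illustration, percentiles can be computed from raw samples with the nearest-rank method. This is a minimal sketch, not how production monitoring systems do it (they typically use streaming approximations):

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample value such that
    at least p% of all samples are at or below it."""
    ordered = sorted(samples_ms)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# Same illustrative distribution: 95 fast requests, 5 slow ones.
latencies_ms = [80] * 95 + [3000] * 5
print(percentile(latencies_ms, 50))  # 80   -- the median
print(percentile(latencies_ms, 95))  # 80   -- still within the fast group
print(percentile(latencies_ms, 99))  # 3000 -- the slow tail shows up
```

Note how P95 still reads 80ms here while P99 exposes the three-second tail — the higher the percentile, the deeper into the tail you are looking.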
Good Latency Targets to Aim For
- P50 under 100ms for internal or backend APIs is a solid baseline
- P95 under 500ms for user-facing endpoints — above this, users start perceiving sluggishness
- P99 under 1,000ms — above this threshold, users actively notice the delay
- P99 above 2,000ms is a serious reliability problem that demands immediate attention
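The targets above can be encoded as a simple check against measured percentiles. A minimal sketch (the threshold table mirrors the list; the helper name is illustrative, not a real API):

```python
# Targets from the list above, in milliseconds.
TARGETS_MS = {"p50": 100, "p95": 500, "p99": 1000}

def failing_targets(measured_ms):
    """Return only the percentiles that exceed their target."""
    return {name: value for name, value in measured_ms.items()
            if value > TARGETS_MS.get(name, float("inf"))}

print(failing_targets({"p50": 90, "p95": 450, "p99": 1200}))
# {'p99': 1200} -- only the tail misses its target
```

A check like this makes a useful alerting rule: the median can look healthy while the tail quietly degrades.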
Tracking Percentiles in Practice
Most monitoring tools report averages by default because they are cheap to compute. True percentile tracking requires retaining individual timing samples — or an approximating structure such as a histogram sketch — and computing distributions across them. For production use, you want per-route percentiles: a single aggregate across all endpoints tells you almost nothing useful, since one slow endpoint can hide behind dozens of fast ones.
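To make the per-route idea concrete, here is a minimal sketch of a tracker that keeps timing samples keyed by route and computes nearest-rank percentiles on demand. Names are illustrative, and raw samples are kept only for clarity — a production system would bound memory with reservoir sampling or a sketch structure such as t-digest:

```python
import math
from collections import defaultdict

class LatencyTracker:
    """Illustrative per-route latency tracker (not a real library)."""

    def __init__(self):
        self._samples = defaultdict(list)  # route -> list of timings in ms

    def record(self, route, elapsed_ms):
        self._samples[route].append(elapsed_ms)

    def percentile(self, route, p):
        ordered = sorted(self._samples[route])
        if not ordered:
            return None
        rank = math.ceil(p / 100 * len(ordered))
        return ordered[rank - 1]

tracker = LatencyTracker()
for ms in [40] * 95 + [900] * 5:   # one route: mostly fast, with a slow tail
    tracker.record("/users", ms)

print(tracker.percentile("/users", 50))  # 40  -- healthy median
print(tracker.percentile("/users", 99))  # 900 -- the tail this route hides
```

Because samples are keyed by route, a slow `/users` endpoint shows up directly in its own P99 instead of being averaged away across every endpoint in the service.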
Statvisor tracks P50, P95, and P99 latency per route, updated in real time. You can immediately identify which specific endpoint is responsible for slow tail latency without digging through raw logs.
Ready to monitor your API in production?
Statvisor gives you latency percentiles, error rates, and request volume for every route — in minutes, not days.
Get started free →