When Your Trace Is Lying to You: A Performance Case Study
We ran into a performance issue recently. Requests were taking 2–4 seconds, multiple attempts had already been made to improve them, and nothing really moved.
Notably, we couldn’t explain why the requests were slow.
Carlos Bueno’s The Mature Optimization Handbook has been sitting in the back of my head through all of this. His core warning is simple: optimization is hard to do well, and it is easy to turn it into ritualized guesswork. I have done that more times than I would like to admit.
The Actual Problem
Most performance work starts this way: You read code, form a mental model, optimize what seems expensive, and repeat until something appears to move. The assumption underneath that loop is that your mental model is a reasonable approximation of how the system behaves under real load, with real data, and real users. It rarely is. Your intuition is useful for generating hypotheses, but it is a terrible way to decide where the time is actually going.
Phase 1 — The Misleading Trace
We had tracing, and at first glance it looked reasonable. It showed database calls, request flow, and auth. Nothing obviously broken jumped out, which is exactly what made it dangerous. Here is what the trace amounted to:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
GET /users/me — 4,722ms total
WHAT WE CAN SEE:
DB query 79ms
DB query 12ms
DB query 6ms
DB query 34ms
DB query 25ms
DB query 3ms
DB query 2ms
UPDATE 2ms
... (~15 more small spans)
Visible subtotal: ~180–320ms
WHAT WE CAN'T SEE:
??? the rest
TOTAL: 4,722ms
In other words, the trace is lying to you by omission.
That is strictly worse than having no trace at all. With no data, you know you are guessing. With incomplete data, you think you are reasoning, which is how teams end up spending days on entirely plausible explanations:
- “DB queries look chatty.”
- “Auth is a bit slow.”
- “Maybe N+1.”
All of those are reasonable hypotheses. None of them mattered here. If I ask you to optimize this the right answer is: I can’t.
What’s Missing
Before you can optimize anything, you need a problem you can prove or disprove. Bueno’s formulation is: if the problem statement is not specific enough to suggest a fix and precise enough to tell you whether the fix worked, then you do not yet have a real optimization problem.
Until you have that, you are not solving a performance problem. You are exploring a codebase in the general direction of slowness and calling it progress.
Phase 2 — A Richer Trace
After instrumenting the missing spans, the picture changed immediately. The system itself had not changed, but our ability to see the request had. This is a representative slow request on the same endpoint after the missing spans were added.
1
2
3
4
5
6
7
8
9
10
GET /users/me — ~4.7s total
>>> NEWLY VISIBLE SPANS:
auth.verify_aad_token ~187ms (JWKS fetch)
auth.get_or_create_user ~29ms
auth.resolve_roles ~13ms
middleware.call_next ~4,300ms ← dominant
DB queries (combined) ~150–250ms
The dominant cost is now obvious, which means the optimization problem has become concrete.
Nothing new was added to the request path. The system did not get faster because we instrumented it. We just stopped being blind to where the time was going.
Phase 3 — Fix One Thing
Once the missing time was visible, the optimization problem got much smaller. The issue was not the database and it was not the route handler. It was middleware: work being done on every request, token verification bypassing caching, and stacked layers compounding latency, so the fix was correspondingly narrow:
- refactor or remove the offending middleware
- respect caching
The result was immediate and boring in the best possible way. 2–4s became 100–300ms, and took hours rather than dragging on for days of speculative performance work.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
AFTER FIX: representative /users/me trace
>>> END-TO-END TRACE: ~237ms
GET /users/me app span ~166ms
>>> VISIBLE COSTS:
auth.verify_aad_token ~0.6ms
auth.get_or_create_user ~22.3ms
connect ~5.4ms
SELECT vectordb ~4.0ms
SELECT vectordb ~2.2ms
SELECT vectordb ~2.1ms
auth.resolve_roles ~13.2ms
connect ~2.0ms
SELECT vectordb ~3.8ms
SELECT vectordb ~2.1ms
users_me.ensure_user ~12.4ms
users_me.update_last_login ~6.0ms
users_me.build_response ~7.2ms
WHAT DISAPPEARED:
no multi-second hidden wall
no single mystery span dominating the request
A Quick Sanity Check
When an improvement is too good to be true, it problably is. There are many ways to produce a fake performance win. Some of the common ones are familiar:
- work failing earlier, because crashing is cheap
- less data flowing, because network egress per request quietly dropped
- errors no longer being counted, because they were silenced or miscategorized
- traffic composition shifting, because health checks or other cheap requests started dominating the sample
- infrastructure changing underneath you, because faster nodes replaced slower ones or capacity moved around
In this case, we checked these things: Response size was unchanged, request volume was stable, error rates were unchanged, and the improvement showed up in the distribution rather than only in a flattering average.
What Usually Happens Instead
In a scenario like this, teams often spend their time on plausible local optimizations. They tune queries, refactor handlers, or adjust infrastructure because those are the things the visible data seems to implicate. I have seen this enough times that I no longer think of it as an individual mistake. It is usually a skills-and-context problem.
As Rob Staton pointed out when we discussed this, many engineers simply lack the foundational concepts and inclinations around data-driven decisions. Part of this is structural: for the last decade, web applications have been largely “solved.” There are fewer opportunities for the kind of pain that forces you to learn these lessons the hard way. Without those battle scars, engineers who have not internalized an instrument-first habit will do the obvious thing: read code, guess at hotspots, and ship performance PRs that feel intelligent—and accomplish very little.
There is also a sunk cost fallacy at play. You see a similar pattern when engineers try to wrangle an AI agent or LLM conversation that started out poorly, instead of just discarding the context and asking a better question. In performance work, once you have spent hours guessing at a fix based on partial traces, it is hard to throw that work away and just go instrument the missing spans.
Leaders can misread this too. In standups and status updates, speculation sounds a lot like progress: people have “leads,” they are “working through fixes,” and there are several plausible explanations at different layers of the stack. That sounds reassuring when the team is still operating from partial visibility.
This is eerily like an LLM hallucinating from partial context. The reasoning is fluent, the confidence is real, and the grounding is missing. If you see this happening, the answer is to make the hidden time visible, walk people through the before and after traces, and help them feel why the first round of optimizations was doomed from the start.
The Mental Model Shift
Optimization is is about identifying and removing the dominant constraint which is not the same as eliminating possible performance problems. That requires measurement that makes the hidden time visible enough for the real bottleneck to reveal itself.
This is the same loop behind SLO-driven systems and the broader operational analytics posture. You define what “good” looks like, measure it, act when it deviates, and then verify that the intervention changed the thing you actually care about.
1
Measure → Identify → Change → Verify
Anything else is guesswork dressed up as optimization.
A Subtle Failure Mode
Before proper instrumentation, the system looks like a pile of small things that might be slow. After proper instrumentation, it often looks like one thing that is obviously slow. If your system still looks like the first case, you are not done measuring. You are still looking at a partially hidden system and mistaking partial visibility for understanding.
Closing
Most performance problems are not hard in the way people usually mean hard. They stay hard only while the important part of the system is still hidden, because invisible constraints invite superstition and visible ones can be addressed with straightforward engineering work. Systems do not get faster because we try harder. They get faster because we can finally see what is happening, and once we can see it, the fix gets a lot smaller.
Appendix A — Exercise: Your Hunch Does Not Matter Yet
This is the exercise I would run with a team. The goal is not to reward the smartest guess. The goal is to make people feel how little their first guess matters when most of the request is still invisible.
Step 1 — Show only the misleading trace
Give people only the Phase 1 trace and say: Go optimize that. Ask what they think is slow or what they would change first. Write the answers down. Most of them will evaporate as soon as the richer trace shows up.
Step 2 — Name the lesson
Before you show the richer trace, say it directly: your hunch does not matter yet. If the trace accounts for only a small fraction of the request, you do not know enough to optimize anything. You are projecting a story onto missing data.
Step 3 — Show the richer trace
Now show Phase 2 and reflect on the original answers:
- Which of your earlier ideas still matter?
- What became obviously dominant?
- What work would have been wasted if you had shipped the first fix?
The first hunch was not morally wrong. It was irrelevant. Once the hidden time becomes visible, the optimization problem changes shape.
Step 4 — Show the after state
Now show the Phase 3 chart and ask:
- What disappeared?
- What stayed roughly the same?
- What does that tell you about where the time was really going?
Discussion Prompt
If most of the request is still invisible, what is the right next step?
1
2
Not "optimize."
Instrument until the dominant cost is visible.
Appendix B — Guardrails
These are the rules I would want around any serious performance work:
- No performance work without a trace.
- No optimization without a baseline.
- No fix without before/after measurement.
- If the bottleneck is not obvious, instrument more.
Appendix C — What to Measure (Minimum)
Each item below rules out a different kind of self-deception. The goal is not a bigger dashboard. The goal is to make fake wins harder to mistake for real ones.
- latency distribution (p50, p95, p99) — tells you whether the pain is universal or hiding in the tail; averages will lie to you: Averages Mask Tails
- error rates — because a fast failure is still a failure
- request volume — because traffic mix shifts can make a system look faster than it is
- DB vs. application time — because “the app is slow” and “the database is slow” are different problems
- external calls (auth, APIs) — because many “application” problems are really dependency problems
- network bytes in and out per request — because lower latency with half the payload is not the same result
- cache hit/miss rates — because caching is often assumed, praised, and discussed long before it is actually working
- server utilization (CPU, memory, I/O) — because you need to know whether you are compute-bound, memory-bound, or mostly waiting somewhere else
If something is not measured, it is not being optimized. More dangerously, it is place for a false explanation to hide, because the unmeasured part of the system is where people project whatever story feels plausible.
Appendix D — What Actually Fixed It
What helped here, once the trace exposed the bottlenecks, was much less glamorous than we were about to do.
The biggest gain came from the middleware path the new spans exposed. Reworking that layer so the request stopped going through the problematic base middleware machinery took out the largest chunk of hidden time. On ordinary requests, that was roughly a second by itself, and it was worse in some of the uglier cases.
Second, the Azure AD path turned out to be paying an avoidable tax on every request. The module had built-in caching, but we were defeating it by creating a fresh client each time. Reusing a module-level instance brought that cache back into play and gave back roughly another 300ms per request.
Other cleanup may still be worth doing for robustness or higher-throughput scenarios, but those were not the changes that took this endpoint from multi-second latency down into the low hundreds of milliseconds.
Further Reading
- Carlos Bueno, The Mature Optimization Handbook — still the best short book I know on why measurement has to come before optimization and why most of what people call optimization is usually ritualized guessing
Related
- Why Most Operational Analytics Fall Short And What You Can Do About It — the same diagnostic posture applied to business operations rather than request latency
- Implementing SLOs for Data Quality — how the Measure → Identify → Change → Verify loop becomes a reliability practice once you formalize what “good” means
Thanks to Rob Staton for the discussion and review that led to my writing this article, as well as for his review of earlier versions of the text. His insights on the changing nature of software engineering and the sunk cost fallacy provided crucial framing for this piece.
Comments powered by Disqus.