Playing to Win through Managed Disequilibrium
tl;dr — Most orgs slide into a don’t‑lose default: busy, safe work that doesn’t change user behavior. Leadership shifts the default by making the Safety net real (clarity + guardrails) and running nimble tests with short checkpoints. Protect the downside so people can play to win, and judge work by results, not looks.
Introduction
If your week feels busy but thin on outcomes, you’re not alone. Treat this as a field guide: we’ll name the rut, show a close‑up, and add a few small moves that make results the default.
Who this is for
- Leaders who want outcomes without heroics.
- Teams tired of KPI theater who want a safer way to try better work.
What you’ll get
- A simple language for the rut and why it persists.
- Four small techniques (safety net + nimble tests) you can install.
- Four real scenes with copy‑and‑run fixes.
Left alone, companies settle into a false equilibrium—the don’t‑lose rut. It looks safe but traps energy; people do work that looks good instead of work that helps users. Leadership isn’t heroics; it’s changing the setup so real results win by default.
Before we jump into definitions and fixes, here’s a quick snapshot from the field to make the problem concrete.
Recently, someone I coach described their week: dashboards up, Slack hot, pilots spinning, and review decks growing—yet the outcome that mattered stayed flat. It felt like momentum because it was busy and safe; it wasn’t progress because no mechanism was written down. We agreed to name what was actually happening, not what we wished was true.
A quick close‑up
I put on an anthropologist hat and wrote down what I was seeing: cultural habits, symptoms, and the loops that reinforced them. Faux alignment got rewarded. Thrash hid in “pilot” land. Silence felt safer than surfacing a bad assumption. I couldn’t circulate it widely; folks read it as “rocking the boat.” That was the sign: naming the mechanism felt unsafe. That’s what a play-not-to-lose culture does to smart people.
Why this happens
I don’t blame anyone — this is a predictable systems outcome. Bad is stronger than good, and when incentives reward A while hoping for B, teams rationally optimize for safety. Expect to see this unless leadership steps in. You get “phoning it in”: work that is obvious and safe, but not impactful.
The plan
The goal: shift the default from don’t lose to play to win, safely. The pattern is a behavioral barbell:
- a safety net that makes good work survivable (clear Aim, guardrails, shared defaults, weekly review), and
- nimble tests that make learning inevitable (bounded experiments that test a real mechanism).
I’ll show the signs, the loops, and the fixes. Then a handful of techniques I’ve used with teams that stuck. All small. All compounding. And all cross-linked to the rest of the field guide: Alignment is the Constraint, Open Horizons, Operational Data, Intent Engineering.
If your org feels busy but underachieving, the problem isn’t the people. It’s the system. Let’s redesign it so “playing to win” is the safe move.
How to use this piece
- Problem — What the phoning-it-in equilibrium looks like → The equilibrium.
- Managed disequilibrium (system fix) — The behavioral barbell: Safety net + Nimble tests → The behavioral barbell.
- Leadership as equilibrium work — Shift the default with barbell processes → Leadership as equilibrium work.
- Mental models — Definitions with sources → Mental models.
- Action guide — Strategy • Ritual • Techniques → Action guide.
- Examples — Real scenes + barbell rewrites → Scenes & Fixes.
- FAQ + Worksheets — Objections + templates → FAQ • Appendices.
Leadership as equilibrium work
Your job isn’t to hero the plan. It’s to reshape the basin the plan rolls into.
Default: Left alone, orgs slide into a don’t‑lose rut—busy, looks good, feels safe.
Leadership move:
- Sense the shock (opportunity or threat) and set a clear Aim (which user behavior should change?).
- Name it in one paragraph everyone sees (what’s changing, for whom, why now).
- Inject managed disequilibrium with Nimble‑test processes (Novelty Quota + nimble tests + short checkpoints).
- Re‑stabilize by updating Safety‑net processes (defaults, guardrails, Definition of Done).
- Make it stick with a reward swap (activity → evidence) so the new basin holds.
Management ≠ Leadership
One of the easiest traps is collapsing leadership and management into the same thing. They overlap, but they’re not synonyms:
- Management → Service: Management is about helping people do the job well: clearing distractions, building structures, enabling mastery, and connecting them to the resources they need. Done right, it’s a service posture: “How do I help you succeed and make sure the org sees it?”
- Leadership → Disequilibrium: Leadership is about awareness and direction. It’s sensing shocks, naming the Aim, and creating intentional disequilibrium: just enough variance to break inertia, paired with guardrails to make learning survivable. Leadership channels energy that would otherwise scatter — setting tone, example, and Aim.
The behavioral barbell
The system fix: make the safety net real and the nimble tests ambitious.
If the system defaults to don’t lose, leaders must make it safe to play to win. The tool is a barbell:
- Safety net (don’t lose): psychological safety, clear aims, guardrails, standard defaults.
- Nimble tests (play to win): bold but bounded experiments that test a mechanism against real signals.
Design both, explicitly. The barbell works because the disequilibrium is balanced: the safety net makes variance survivable; the nimble test creates the variance. That balance converts fear into learning and effort into outcomes.
Nimble tests every week; big moves when the evidence is strong.
See mental model: Psychological safety.
Before/After: default vs. barbell
Habitual default (play not to lose) | Barbell stance (play to win, safely) |
---|---|
Announce a goal; assume teams will figure it out. | Co-author Aim (user behavior) and Mechanism in writing; publish the one-pager. |
Ship big changes; review at the end. | Run nimble tests; review weekly with a short checkpoint. |
Reward activity (tickets, dials, emails). | Reward evidence the mechanism moved the Aim. |
Add approvals late; blame deviations. | Guardrails up front (SLIs/SLOs, Definition of Done); deviations within quota. |
Treat metrics as targets. | Pair proxies with outcomes + a written because statement; kill if proxies don’t predict outcomes. |
Hide mistakes to avoid heat. | “Safe-to-fail” scope; decision log makes kills/pivots normal. |
See mental model: Threat‑rigidity.
The phoning-it-in equilibrium
What it feels like when the system rewards looking busy. Signs and loops to name it.
Most teams aren’t lazy. They’re rational. When the system rewards looking busy and punishes risk, smart people optimize how things look and avoid risk. That’s the rut you feel: busy, good‑looking‑on‑paper work that doesn’t move the Aim.
Why good people drift there
- Reward mismatch — We pay for A while hoping for B (activity, artifacts, “green” dashboards instead of behavior change).
- Goodhart in the wild — When a metric becomes a target, people learn to move the number even if the mechanism is bogus.
- Bad > Good — Negativity bias + threat-rigidity: one harsh review outweighs five wins, so the safe move is “don’t rock the boat.”
See mental model: Rewarding A while hoping for B.
See mental model: Goodhart’s Law.
See mental model: Bad is stronger than good.
Everyday signs
- Mandate without method — Exec goal drops (“Win SMB by Q2”), but no design constraints, no problem statement, no target user story. Teams translate it to slide polish and ticket volume.
- Pilot purgatory — “We’ll pilot it” becomes forever pilot. No owner, no success threshold, no checkpoint to graduate or kill — a stagnant equilibrium with no managed disturbance.
- KPI theater — Numerators inflate (emails sent, sessions), denominators shrink (narrowed cohorts), lagging outcomes untouched (activation, retention).
- Compliance after the fact — Reviews occur at the end, so “alignment” is rewritten history rather than a mechanism you committed to up front.
- Discovery-free motions — Recruiting and sales hit activity targets with templated outreach and demo-first calls; qualified hires and opportunities don’t budge.
- No definition of done — Teams “finish” without a user behavior to check against, so everything can be called done and nothing is actually resolved.
The four loops that hold it in place
- Appearance Loop — Cause: Big goals, no guidance → Effect: Faux alignment (polished narratives) → Reinforcement: Leaders reward polish, issue more mandates without “how,” repeat.
- Silo Loop — Cause: Context vacuum + siloed incentives → Effect: Fragmented execution → Reinforcement: Local GM metrics turn green, cross-team work stays under-resourced.
- Thrash Loop — Cause: No operational guardrails (checkpoints, sanity checks) → Effect: Meandering backlog → Reinforcement: Stalled pilots go unremarked, nobody adds clarity, thrash compounds — a false equilibrium: motion without learning.
- Silence Loop — Cause: Conflict aversion → Effect: No experimentation mindset (hypotheses unspoken, failures hidden) → Reinforcement: Surfacing mechanisms feels unsafe, so issues stay subterranean.
Pilot purgatory: five smells
- Owner unclear.
- No graduation/kill criterion.
- Demo replaces user behavior.
- “Temporary” state persists past one cycle.
- No ops path if it worked.
Spot-check your org (five fast questions)
- Name a user behavior we changed last quarter. If you can’t, you’re in KPI theater.
- What’s one mechanism you wrote down before building? If none, you’re narrating, not designing.
- Which pilot did we graduate or kill at a checkpoint? If none, you’re drifting.
- Which proxy moved before the outcome? If only proxies moved, you’re optimizing how things look.
- Who can say “stop” when the guardrail trips? If nobody, your floor isn’t real.
Left alone, these loops reinforce each other. Breaking the equilibrium requires installing counter-loops that make it cheaper to surface mechanisms than to hide them, and safer to seek outcomes than to game how things look. That’s where the behavioral barbell comes in.
See mental model: Ruts: looks vs. results.
Not lazy, logical. When the system rewards looking busy, smart people optimize how things look. The fix is system design, not pep talks, not “holding people accountable”.
Translation: raise the bar on looks‑good metrics and deepen the habits that produce real results.
Action guide
Three parts you can run together this week: a tiny strategy, a weekly ritual, and two techniques that make trying better work safe. Each cycle, add one small, bounded dose of managed disequilibrium (e.g., a Novelty Quota deviation or an Anthropologist Hat case) inside guardrails.
Strategy, made small
Three sentences that replace a strategy deck.
Borrow the smallest workable definition from my playbook:
- Aim — the outcome you want to create (a change in user behavior).
- Mechanism — the causal lever you believe will move it (because statement).
- Feedback — the few signals that will validate or disprove the mechanism soon.
Mechanism ≠ task. A mechanism is a causal bet about why a change will move the Aim (“because…”). “Refactor X,” “send more emails,” or “run a campaign” are tasks, not mechanisms. Write the cause, not just the motion. See also: Mechanism ≠ task.
Quality bar: examples (weak → strong)
Element | Weak (sounds busy) | Strong (names cause & check) |
---|---|---|
Aim | Improve engagement | Increase activation from 27% → 35% for Segment S in 14 days. |
Mechanism | Send more lifecycle emails | Make Action X unmissable in-product and nudge only stalled users because we remove confusion at Step N and reduce time-to-value. |
Feedback | Track sessions and opens | Proxy: % doing Action X in 24h; Outcome: 14-day activation; Leading: median time-to-first-action. |
Pair this with a short checkpoint: what we expect to learn, and the decision rule to continue, change, or stop. Once that loop runs, pour on speed.
See mental model: Checkpoint ≠ deadline.
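If it helps to see the Aim/Mechanism/Feedback trio as an artifact you could keep in a repo, here is a minimal sketch in Python. The class and field names are illustrative (they come from no real tool), and the example values echo the Segment S activation scenario used throughout this guide.

```python
from dataclasses import dataclass

@dataclass
class OnePager:
    """Aim / Mechanism / Feedback on one artifact, plus the guardrail and checkpoint."""
    aim: str            # the user behavior change we want
    mechanism: str      # the causal "because" statement (not a task)
    proxy: str          # signal that should move first
    outcome: str        # signal that must move
    guardrail: str      # one SLI/SLO that protects users while we test
    checkpoint_days: int = 14   # when we decide continue / change / stop

# Illustrative values only, borrowed from the Segment S activation example.
one_pager = OnePager(
    aim="Increase activation 27% -> 35% for Segment S in 14 days",
    mechanism="Make Action X unmissable in-product because it removes confusion at Step N",
    proxy="% of new users completing Action X within 24h",
    outcome="14-day activation rate",
    guardrail="Error rate <= 0.1% on the touched flow",
)
print(one_pager.aim)
```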
Proxy → Outcome pairs by domain
Domain | Proxy (moves first) | Outcome (must move) |
---|---|---|
Product | % completing first-run Action X | Activation / 14-day retained users |
Recruiting | % qualified screens from outreach | Accepted offers for target role |
Sales | % calls with discovery checklist completed | Qualified opp rate (ICP segment) |
Platform/SRE | p95 latency on hot path | Task success rate / support tickets down |
Support/Success | % tickets with root cause tagged | Time-to-resolution / CSAT |
Numbers move. Users don’t. If a KPI is moving while user behavior isn’t, you’re watching a proxy without a mechanism. Re-attach the number to a because statement and a user outcome, or kill it.
Example: More lifecycle emails → higher open rates/sessions, but activation is flat. That’s proxy-chasing.
Sources:
- Eric Ries, “Vanity Metrics vs. Actionable Metrics” (2009) — avoid intermediate metrics like click-through rate in favor of customer behaviors that drive value (link).
- Mixpanel, “What are vanity metrics?” — vanity metrics aren’t tied to user value (link).
- “Open rate” — tracking is unreliable and often misleading (link).
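To make the “numbers move, users don’t” test concrete, here is a rough sketch that compares one cycle of proxy movement against the paired outcome. The thresholds and classification strings are placeholders, not recommendations; the example inputs mirror the lifecycle‑email case above.

```python
def proxy_outcome_check(proxy_delta: float, outcome_delta: float,
                        proxy_threshold: float = 0.10,
                        outcome_threshold: float = 0.03) -> str:
    """Classify one cycle of signals; deltas are fractional changes versus baseline."""
    proxy_moved = proxy_delta >= proxy_threshold
    outcome_moved = outcome_delta >= outcome_threshold
    if proxy_moved and outcome_moved:
        return "mechanism plausible: continue"
    if proxy_moved and not outcome_moved:
        return "proxy-chasing: re-attach a 'because' statement or kill the metric"
    if outcome_moved:
        return "outcome moved without the proxy: the proxy is probably the wrong leading signal"
    return "nothing moved: change the mechanism or the test"

# Lifecycle-email example from the callout: opens/sessions up 42%, activation flat.
print(proxy_outcome_check(proxy_delta=0.42, outcome_delta=0.0))
```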
Leader’s playbook
A weekly ritual you can run tomorrow.
This is the Change loop: sense → name → test (small change) → stabilize (Safety‑net update) → institutionalize (reward swap).
Run this loop weekly to keep momentum and prevent thrash.
Here are tools for your toolbelt:
- Co-author the Aim. Prompt: “What user behavior change would prove we created value?” Artifact: one-sentence Aim + definition of done.
- Name the Mechanism. Template: “We believe X will move Y because Z.” Artifact: the mechanism sentence right under the Aim.
- Pick two Feedback signals + a Checkpoint. One proxy (should move first) and one outcome metric. Add a short decision rule.
- Install Guardrails (the safety net). One SLI/SLO to protect users; a simple Definition of Done checklist (a small readiness-check sketch follows this list):
  - User behavior to verify is stated.
  - Guardrail SLI/SLO attached.
  - Rollback/kill path written.
  - Owner + date for 15-minute review set.
  - Decision log entry stubbed (Aim, Mechanism, Signals, Checkpoint).
- Run a Nimble Test. The smallest change likely to move the proxy. Ship quickly.
- Run a 15-minute Review, every 1-2 weeks. Ask: What changed? What didn’t? Why? Decide: continue, change, or stop. Capture a 1-3-line decision log.
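Here is the readiness check promised above: a minimal sketch, assuming the one‑pager is kept as a simple dict. The field names are hypothetical; swap in whatever your own artifact records.

```python
DEFINITION_OF_DONE = [
    "user_behavior_to_verify",
    "guardrail_slo",
    "rollback_or_kill_path",
    "owner_and_review_date",
    "decision_log_stub",
]

def ready_to_ship(one_pager: dict) -> list:
    """Return the Definition of Done items still missing before a nimble test ships."""
    return [item for item in DEFINITION_OF_DONE if not one_pager.get(item)]

# Illustrative entry: everything is filled in except the rollback/kill path.
missing = ready_to_ship({
    "user_behavior_to_verify": "Segment S completes Action X within 24h",
    "guardrail_slo": "error rate <= 0.1% on the touched flow",
    "rollback_or_kill_path": "",
    "owner_and_review_date": "Product lead, Friday 15-minute review",
    "decision_log_stub": "Aim / Mechanism / Signals / Checkpoint",
})
print(missing)  # -> ['rollback_or_kill_path']
```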
Techniques (practices)
Two techniques you’ll use in the fixes below. Each has a job; together they make trying better work safe.
Novelty Quota
- What it is — A simple policy that commits each team to a few explicit deviations from the standard way of working every cycle. It’s a lightweight exploration budget that creates safe, bounded disequilibrium so the system learns without chaos.
- Why it matters — Left alone, orgs settle into false equilibria that feel busy but don’t learn. A Novelty Quota guarantees small, testable variance; paired with guardrails and checkpoints, it surfaces signals and prevents drift.
- How to use it —
  - Pick the reflex to deviate from (e.g., “send more email,” “demo‑first,” “rewrite”).
  - Define one deviation that tests a mechanism this cycle (what changes, for whom, why now).
  - Add guardrails and a short checkpoint to the one‑pager (SLI/SLO, rollout %, rollback; continue/change/stop rule).
  - Name the owner and put a 15‑minute review on the calendar.
  - Label the deviation in a “deviation log”; fold winners into defaults next cycle; retire the rest.
- Tip — Keep the deviation small, reversible, and easy to review; one per team per cycle is the point.
- Worksheet — See Appendix B (Novelty Quota Worksheet).
Anthropologist Hat
- What it is — A structured reflection tool. You step back like an anthropologist observing a culture, writing down the traits, symptoms, and loops that shape how work actually happens. It surfaces hidden mechanisms—reward mismatches, silos, drift, silence—that keep teams stuck in ruts.
- Why it matters — Teams normalize dysfunction because it feels “just the way things are.” Naming those loops makes them visible, so you can design counter‑loops (guardrails, checkpoints, small prototypes, retrospectives) that break the cycle. The goal isn’t critique; it’s turning tacit patterns into explicit mechanisms everyone can work on.
- How to use it — Fill in one real case (a slice, not the whole org):
  - Context — Where you’re seeing friction (team/vertical/domain).
  - Cultural traits — e.g., Big goals, no guidance • Context vacuum • Siloed incentives • Conflict aversion.
  - Symptoms — Faux alignment • Fragmented execution • Operational drift • No experimentation mindset.
  - Reinforcing (vicious) loops — Appearance • Silo • Drift • Silence.
  - Balancing loops — Context & “How?” Checkpoint • Cross‑Team 1:1s • Mini Prototype • Publish v0.0.1 + Retros.
- Tip — Keep loops to a few lines; clarity and brevity help others see the loop, nod, and act.
- Template — See Appendix C (Anthropologist Hat — Case Template).
Why this works: Safety → more honest signals. Mechanisms → less gaming. Short cycles → less thrash, more evidence.
Thrash alarms
When these show up, you’re slipping into a play-not-to-win equilibrium. Add one small, bounded dose of managed disequilibrium to fight back.
- Two consecutive cycles with zero deviations or no kills.
- Proxies moving without a written “because” or an outcome shift.
- Weekly review turns into status only (no continue/change/stop decisions).
- Guardrails exist on paper but no one can say “stop”.
Next: see the Action guide applied in four real examples.
Scenes & Fixes
Situational examples of dysfunction and how we addressed them.
Real snapshots of the equilibrium, paired with barbell rewrites you can copy. Each uses Aim • Mechanism • Feedback • Checkpoint • Guardrail.
QBR: KPI theater → activation
If this sounds like you: your deck glows green but you can’t name one behavior that changed.
Context
- End of Q2. Pricing lifted signups, but activation in Segment S is stuck at 27% and 14‑day retention is flat.
- The brief for the review: show how we’ll move activation next quarter and what we’ll watch.
What happened
- We ran a QBR with fourteen people on Zoom. The deck looked great: vision, pillars, and green KPI arrows (sessions +18%, emails +42%). Activation and retention were missing.
- A peer PM asked, “Which user behavior will change next quarter, and why?” The answer was “engagement is up” and a finger on a dashboard.
- We left with “send two more lifecycle emails” and “ship the dashboard.”
- Six weeks later the KPIs kept climbing—because we sent more emails.
- Activation and retention didn’t move. Numbers moved; users didn’t.
Fix — techniques and steps
New signups stall before first value — a stagnant equilibrium that feels busy but doesn’t learn. We’ll re‑introduce managed disequilibrium by making the first win unmissable in‑product and sending one targeted nudge only to users who stall. We’ll judge success by an early proxy and activation, inside guardrails, then decide in a 15‑minute review to continue, change, or stop. We apply four techniques, each with a clear job and concrete steps.
Aim — Increase activation 27% → 35% (Segment S, 14d).
- Techniques (why + specifics):
  - Anthropologist Hat — Why: name the KPI‑theater mechanism and loop; write the causal “because” before tactics.
    - Mechanism — Make Action X unmissable in‑product + one targeted nudge to stalled users (removes confusion at Step N; reduces time‑to‑value).
    - Decision log — Aim, Mechanism, Signals/Checkpoint captured before work starts.
  - Novelty Quota — Why: force one safe deviation from “send more email” to a real, testable change.
    - Nimble test we run — Add an in‑product checklist prompt; ship to Segment S only; send one triggered nudge after a >24h stall (a small selection sketch follows this scene).
    - Artifact — “Start with Action X to get [first value] in under 2 minutes.”
  - Short checkpoint — Why: turn signals into a decision and prevent thrash.
    - Signals — Proxy: % completing Action X in 24h; Leading: time‑to‑first‑action; Outcome: 14‑day activation.
    - Checkpoint rule — Continue if proxy +10% AND activation +3 pts; else change/stop (15‑minute review).
  - Guardrails — Why: protect users and timebox risk.
    - SLI — Error rate ≤0.1%.
    - Outreach cap — ≤1 email per user/7 days.
    - Rollback — Documented plan before launch.
- Owner and timing — Product lead • Prep 20m • Ship 2h • Review 15m.
Takeaway — Activation moves in Segment S this fortnight; celebrate the mechanism, not the email count.
System shifts: more weight on results (activation evidence); higher bar on looks‑good metrics (pair proxies with outcomes); faster nimble tests (weekly).
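The selection sketch promised in the nimble test above: a minimal version of “nudge only stalled users, at most one email per user per 7 days.” The user-record fields are hypothetical; real cohorts would come from your own event data.

```python
from datetime import datetime, timedelta

STALL_WINDOW = timedelta(hours=24)   # nudge only users stalled for more than 24h
OUTREACH_CAP = timedelta(days=7)     # guardrail: at most 1 email per user per 7 days

def users_to_nudge(users, now=None):
    """Pick Segment S signups who haven't done Action X in 24h and weren't emailed this week.

    Each user is a dict like:
      {"id": "u1", "segment": "S", "signed_up": datetime, "did_action_x": False, "last_nudge": datetime or None}
    """
    now = now or datetime.utcnow()
    picked = []
    for u in users:
        if u["segment"] != "S" or u["did_action_x"]:
            continue
        stalled = now - u["signed_up"] > STALL_WINDOW
        capped = u["last_nudge"] is not None and now - u["last_nudge"] < OUTREACH_CAP
        if stalled and not capped:
            picked.append(u["id"])
    return picked

# Usage sketch: one signup, 30 hours in, Action X not done, never nudged.
signups = [{"id": "u1", "segment": "S",
            "signed_up": datetime.utcnow() - timedelta(hours=30),
            "did_action_x": False, "last_nudge": None}]
print(users_to_nudge(signups))  # -> ['u1']
```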
Recruiting: spray‑and‑pray → high‑intent personalization
If this sounds like you: reply rates are flat despite heavy outreach.
Context
- We need three Staff Data Engineers in 30 days to unblock a streaming pipeline.
- Baseline last cycle: 6% qualified screens per 100 messages.
- Outreach is generic; the CRM has no field for “why you/why now.”
What happened
- A great candidate was already in the ATS. Their portfolio showed the exact pipeline we’re building.
- They got the standard opener (“Hi {FirstName}…”), no mention of their repo or the role’s fit. They ghosted.
- The recruiter still hit “outreach sent” and “time‑to‑first‑touch.” We lost the candidate we needed.
Fix — techniques and steps
Outreach is heavy and replies are flat — a stagnant equilibrium. We’ll add managed disequilibrium with reason‑first personalization that references the candidate’s work and maps it to the role’s outcome. We’ll judge by personalization rate and replies, inside guardrails, then decide in a 15‑minute review to continue, change, or stop. We apply four techniques with clear jobs and steps.
Aim — Raise qualified screen rate 6% → 12% (Staff Data Engineer, 4 weeks).
- Techniques (why + specifics):
  - Anthropologist Hat — Why: name the spray‑and‑pray reflex; write the “why‑you/why‑now” before tactics.
    - Mechanism — High‑intent personalization (reference repo/talk + why this role now) with a 4‑touch sequence.
    - Decision log — Aim, Mechanism, Signals/Checkpoint captured before work starts.
  - Novelty Quota — Why: force one safe deviation from template‑first to reason‑first.
    - Nimble test we run — Add a reason‑first paragraph (one line that proves we read their work and why now); 4‑stage sequence; require repo/talk link in CRM to send (a pre‑send check sketch follows this scene).
    - Artifact — “FirstName, your [repo/talk] on [topic] maps to our [outcome]… Up for a 12‑min screen?”
  - Short checkpoint — Why: turn signals into a decision and prevent thrash.
    - Signals — Personalized outreach rate; reply rate to stage‑1 and cumulative across 4 touches; Outcome: qualified screens per 100.
    - Checkpoint rule — Continue if personalization ≥80% AND reply rate +25%; else change/stop (weekly review).
  - Guardrails — Why: respect candidates and timebox risk.
    - Do‑not‑contact honored; max 4 touches/14 days.
    - Keep outreach under 150 words; no bait‑and‑switch on scope/level.
- Owner and timing — Recruiter • Prep 20m • Ship 2h • Review 15m.
Recruiting outreach: what works (evidence)
- Highly personalized messages show 23–47% higher response rates than non‑personalized outreach.
- What “personalized” means here: a short reason‑first sentence that references the candidate’s repo/talk and “why now.” (Gem calls this the “reason” field.)
- Greenhouse reports teams expect 30–50% response on well‑targeted initial sourcing emails, and emphasize personalization to earn replies.
- Multi‑touch sequences matter: Lever’s analyses show most responses come after the first outreach, with meaningful gains by the 3rd touch.
Takeaway — Qualified screens per 100 doubled via reason‑first opener; share examples org‑wide.
System shifts: faster nimble tests (reason‑first deviation); better visibility (CRM fields enforce “because”); more weight on results (screens per 100).
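The pre‑send check referenced in the nimble test above, as a rough sketch. The CRM field names are hypothetical stand‑ins for however your ATS/CRM records the reason‑first data; the limits come straight from this scene’s guardrails.

```python
MAX_WORDS = 150     # guardrail: keep outreach under 150 words
MAX_TOUCHES = 4     # guardrail: max 4 touches per 14 days

def outreach_blockers(message: str, crm_record: dict) -> list:
    """Return reasons this outreach can't go out yet; an empty list means send."""
    blockers = []
    if crm_record.get("do_not_contact"):
        blockers.append("candidate is on the do-not-contact list")
    if not crm_record.get("repo_or_talk_link"):
        blockers.append("no repo/talk link in the CRM (reason-first field missing)")
    if not crm_record.get("why_you_why_now"):
        blockers.append("no 'why you / why now' sentence")
    if crm_record.get("touches_last_14d", 0) >= MAX_TOUCHES:
        blockers.append("touch cap reached (4 per 14 days)")
    if len(message.split()) > MAX_WORDS:
        blockers.append("message is over 150 words")
    return blockers

# Usage sketch: a templated opener with no reason-first fields gets blocked.
print(outreach_blockers("Hi {FirstName}, are you open to new roles?", {"touches_last_14d": 1}))
```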
Sales: demo-first → discovery-first
If this sounds like you: demos are high and qualified opps are stuck at ~9%.
Context
- Outbound to ICP Segment S with an activity‑heavy motion.
- Targets: 80 dials/day, 12 demos/week.
- Baseline: 9% conversion from first call to qualified opportunity.
- Discovery notes rarely capture a job‑to‑be‑done.
What happened
- Minute one of the call: “Do you have 30 minutes?” Minute two: a screenshare.
- No problem statement, no “why now.” The team hit its activity targets; conversion stayed at 9%.
- I asked for one example where the rep wrote the customer’s job to be done in the notes. We had none.
- We were pitching a product, not solving a problem.
Fix — techniques and steps
Demo‑first activity is high and qualified opps are stuck — a stagnant equilibrium. We’ll re‑introduce managed disequilibrium by requiring 8 minutes of structured discovery before any demo and adopting a “no‑fit, no‑demo” rule. We’ll judge by discovery completion and conversion, inside guardrails, then decide in a 15‑minute review to continue, change, or stop. We apply four techniques with clear jobs and steps.
Aim — Raise qualified opportunity rate 9% → 15% (ICP Segment S, 4 weeks).
- Techniques (why + specifics):
  - Anthropologist Hat — Why: name the appearance loop; write the problem before the pitch.
    - Mechanism — 8 minutes of structured discovery (JTBD, impact, urgency) before any demo; “no‑fit, no‑demo” rule.
    - Decision log — Aim, Mechanism, Signals/Checkpoint captured before work starts.
  - Novelty Quota — Why: one safe deviation from demo‑first to discovery‑first.
    - Nimble test we run — Add a discovery script to the dialer; template CRM notes (JTBD, Problem‑Fit Y/N, Next Step); auto‑flag missing checklists (a flagging sketch follows this scene); pilot on Segment S only.
    - Artifact — Discovery questions: 1) What job are you trying to get done? 2) What’s the impact if it works? 3) Why now?
  - Short checkpoint — Why: turn signals into a decision and prevent thrash.
    - Signals — % of calls with discovery checklist completed; % of demos ≤10 minutes when problem‑fit is confirmed; Outcome: opportunity conversion rate.
    - Checkpoint rule — Continue if discovery ≥80% AND conversion +3 pts; else change/stop (15‑minute review).
  - Guardrails — Why: protect trust and compliance.
    - Do‑not‑pitch if no fit; ≤2 follow‑ups without a written problem statement.
    - Adhere to do‑not‑contact and compliance rules.
- Owner and timing — Sales manager • Prep 20m • Ship 2h • Review 15m.
Takeaway — Discovery completion ≥80% and opportunity rate +3 pts; coach with call snippets.
System shifts: higher bar on looks‑good metrics (no demo without problem); faster nimble tests (discovery‑first); more weight on results (opportunity conversion).
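The flagging sketch referenced in the nimble test above: compute the discovery‑completion proxy from templated CRM notes and flag calls missing any field. The note field names (jtbd, problem_fit, next_step) are hypothetical stand‑ins for the template in this scene.

```python
DISCOVERY_FIELDS = ("jtbd", "problem_fit", "next_step")   # templated CRM note fields

def discovery_completion(call_notes):
    """Return (completion rate, call ids missing any discovery field)."""
    flagged = [note["call_id"] for note in call_notes
               if not all(note.get(field) for field in DISCOVERY_FIELDS)]
    rate = 1 - len(flagged) / len(call_notes) if call_notes else 0.0
    return rate, flagged

# Usage sketch with two calls: one complete, one demo-first call with empty notes.
rate, flagged = discovery_completion([
    {"call_id": "c1", "jtbd": "automate invoice matching", "problem_fit": "Y", "next_step": "pilot scope call"},
    {"call_id": "c2", "jtbd": "", "problem_fit": "", "next_step": ""},
])
print(f"{rate:.0%} complete; flag for coaching: {flagged}")  # 50% complete; flag for coaching: ['c2']
```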
Platform/SRE: rewrite impulse → test‑and‑measure
If this sounds like you: rewrite ideas show up before a single measurement.
Context
- Checkout complaints spike at peak; support tickets are up +22%.
- p95 on /price/quote is 450ms with spikes above 700ms.
- The error budget is burning; SLAs are at risk ahead of seasonal traffic.
What happened
- In triage, the first idea was a full rewrite.
- That would push the fix out by months while customers keep timing out on checkout.
Fix — techniques and steps
Customers are timing out; the first impulse is a rewrite — a false equilibrium of motion without learning. Instead, we’ll apply managed disequilibrium by precomputing hot inputs and adding a read‑through cache to relieve p95 quickly. We’ll judge by p95 and timeouts on a flagged cohort, inside guardrails, then decide in a 15‑minute review to continue, change, or stop. We apply four techniques with clear jobs and steps.
Aim — Reduce p95 on /price/quote 450ms → 250ms and timeouts 1.8% → 0.6% (Segment S, 4 weeks).
- Techniques (why + specifics):
  - Anthropologist Hat — Why: name the thrash loop (rewrite impulse) and write the causal change.
    - Mechanism — Precompute hot inputs + add a read‑through cache (fetch from source on miss) to cut tail latency.
    - Decision log — Aim, Mechanism, Signals/Checkpoint captured before work starts.
  - Novelty Quota — Why: one safe deviation from rewrite to a bounded, testable change.
    - Nimble test we run — Add a read‑through cache (TTL 5m; a minimal cache sketch follows this scene); precompute inputs nightly; instrument a per‑segment dashboard (latency, hit rate, queue depth, timeouts); release to 10% of Segment S behind a flag.
    - Artifact — Dashboard focus: p95 latency, cache hit rate, timeout%.
  - Short checkpoint — Why: turn signals into a decision and prevent thrash.
    - Signals — p95/p99 on /price/quote (Segment S only); queue depth; cache hit rate; Outcome: task success rate + support tickets per 1k sessions.
    - Checkpoint rule — Continue if p95 −25% AND timeouts −50%; else change/stop (15‑minute review).
  - Guardrails — Why: protect users and error budget.
    - Error rate ≤0.1%; do not exceed error budget.
    - Rollout ≤10% behind a feature flag; documented rollback plan.
- Owner and timing — SRE lead • Prep 20m • Ship 2h • Review 15m.
Takeaway — p95 −25% in the pilot cohort; publish before/after in the weekly rollup.
System shifts: stronger safety net (error‑budget guardrails); faster nimble tests (cache change); better visibility (per‑segment dashboard).
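The cache sketch referenced in the nimble test above: a minimal in‑process read‑through cache with a 5‑minute TTL, assuming a single process and an injected fetch function. A production version would sit behind the feature flag and feed hit/miss counts into the per‑segment dashboard; the pricing lookup here is a hypothetical stand‑in.

```python
import time

class ReadThroughCache:
    """Serve from memory when fresh; fetch from the source of truth on a miss."""

    def __init__(self, fetch_fn, ttl_seconds=300):   # TTL 5m, matching the scene
        self.fetch_fn = fetch_fn
        self.ttl = ttl_seconds
        self._store = {}        # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0         # feed these counters into the cache-hit-rate panel

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            self.hits += 1
            return entry[0]
        self.misses += 1
        value = self.fetch_fn(key)                               # read-through on miss
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value

# Usage sketch with a hypothetical quote lookup; the second call is served from cache.
cache = ReadThroughCache(fetch_fn=lambda sku: {"sku": sku, "price": 42.0})
cache.get("sku-123"); cache.get("sku-123")
print(cache.hits, cache.misses)  # 1 1
```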
FAQ (skeptic’s corner)
Isn’t this slower? A little, at first. That’s the point. A 15-minute weekly review and a 2-week stage gate replace months of drift. The loop trades calendar speed for causal speed. The fastest path is the one with fewer wrong turns.
Isn’t this micro-management? No. It’s mechanism management. We’re aligning on Aim, Mechanism, Feedback, Gate, Guardrail—then giving teams autonomy to design and ship mechanism slices. Control the interface, not the internals.
What if proxies lie? Pair every proxy with an outcome and a written because statement. If the proxy moves and the outcome doesn’t, the mechanism is wrong. Change it or kill it.
Proxy sniff test: ✔ Can it move within a week? ✔ Is it upstream of the outcome (not just correlated)? ✔ Could a bad actor juice it without moving user behavior? If a bad actor could, add a second signal or tighten the definition.
How do I pick good guardrails? Choose one user-protecting signal (e.g., error rate, latency on hot path, time-to-resolution) and one blast-radius rule (rollback path, feature flag, rollout limit).
Guardrail picker (3 rules): 1) Small, observable, and attributable to the slice. 2) Trips before harm compounds. 3) Has a named owner empowered to say stop.
We’re compliance-heavy. Does this still work? Yes. Put non-negotiables in the Floor (security, privacy, regulatory steps). Run novelty and mechanism slices inside those constraints (small cohorts, offline pilots, synthetic data). Stage gates and decision logs make audits easier, not harder.
What if my execs only want big bets? Offer a barbell one-pager: Aim → Mechanism → two signals → 2-week gate → guardrail → cost of the slice. Ask for permission to run three slices before the next big review. Big bets are fine—after we’ve found a working mechanism.
I don’t control comp/bonuses. How do I fix “rewarding A while hoping for B”? You still control recognition. Praise and promotions are strong signals. Publicly celebrate evidence of mechanisms working (even small wins), and kills that saved cycles. Propose one compensation tweak per quarter: swap one activity metric for an evidence metric.
Data is messy; we can’t measure cleanly. Now what? Start with directionally reliable signals you can observe this week. Use segments and event counts you trust, even if imperfect. Log assumptions in the decision log. Improve instrumentation as a follow-on slice. Imperfect + fast beats perfect + late.
Cross-team dependencies make weekly slices impossible. Shrink the slice to what your team can ship without external help (content, copy, default toggles, UI affordance, docs, sampling, cohort selection). Use the Anthropologist Hat to map the dependency loop and add a balancing loop (e.g., a cross-team 30-minute weekly unblocker with a rotating chair and a 3-line decision log).
Won’t this cause chaos? No. Managed disequilibrium ≠ chaos. With a real Floor (clear defaults, guardrails) and a quota limited to one explicit deviation per cycle, you get a safe shake‑up that surfaces learning without collateral damage. Label the deviation, review it, and fold winners into defaults. That’s exploration without entropy.
How do we scale this beyond one team? Standardize the artifacts (one-pager, decision log, Definition of Done). Keep the ritual tiny (15 minutes). Publish a weekly roll-up: which mechanisms moved outcomes, which were killed, what defaults changed. That compounding visibility drives adoption.
What if a mechanism causes collateral damage? That’s what guardrails + flags are for. Roll back, log the lesson, and update the Floor so the next slice can’t trip the same wire.
When do we stop using stage gates? When the mechanism is proven and stable for the segment in question. Promote it to the Default. Keep gates for new segments or when context shifts.
Where do I start tomorrow? Take one scene you recognize (QBR, Sales, Recruiting). Write the one-pager (Aim, Mechanism, Signals, Gate, Guardrail). Commit to one mechanism slice this week and a 15-minute review next week. That’s it.
Leadership is disequilibrium made safe: just enough disruption to move forward, bounded enough to stick.
Appendices (worksheets)
Appendix A: Mental models & definitions
Mental models to keep in mind when you’re creating intentional disequilibria.
Goodhart’s Law — When a measure becomes a target, it stops being a good measure. Why it matters: easy‑to‑measure numbers get optimized even if the underlying behavior doesn’t change. Fix: pair proxies with outcomes; set the rules before you start and write definitions. Source: Goodhart (1975); Strathern’s phrasing (1997); Campbell’s Law (1979).
Rewarding A while hoping for B — Incentives pay for activity/looks while leaders want outcomes/learning. People optimize what’s paid. Fix: swap one activity metric per quarter for an evidence metric. Source: Kerr (1995).
Bad is stronger than good — Negatives weigh more than positives; under threat, variance shrinks. Fix: safe‑to‑fail scope, visible wins, praise evidence, not just postmortems. Source: Baumeister et al. (2001).
Threat‑rigidity — Under pressure, groups narrow attention, over‑rely on habit, avoid variance. Fix: smaller bets, fast feedback, explicit permission to update. Source: Staw, Sandelands & Dutton (1981).
Psychological safety — Shared belief that it’s safe to take interpersonal risks; predicts learning/performance in complex work. Fix: protect the safety net, normalize the 15‑minute review. Source: Edmondson (1999).
Disequilibrium (managed) — The purposeful disturbance leaders introduce to shake teams out of a false equilibrium (the “don’t‑lose” rut). It widens variance just enough to surface learning signals, but is bounded by a safety net (guardrails, clarity, short checkpoints) so the system doesn’t collapse. Fix: pair every disruption (Novelty Quota, nimble test) with explicit safety nets; the goal isn’t chaos — it’s a new, healthier equilibrium that produces outcomes. Source: Heifetz & Linsky (2002), Adaptive Leadership; applied here as the behavioral barbell.
Ruts: looks vs. results — Left alone, orgs settle into a looks‑first rut (busy, safe, easy to show). Barbell‑encouraging processes (Safety‑net + Nimble‑test rituals) make the results‑first rut deeper. Fix: small weekly changes with a strong safety net.
Checkpoint ≠ deadline — Deadlines measure when; checkpoints measure whether the mechanism worked. Fix: define expected signal change and continue/change/stop rules up front.
Mechanism ≠ task — Tasks are motions; mechanisms are because statements. Fix: write the cause, then choose the smallest test.
Proxy vs. outcome (“Numbers move. Users don’t.”) — If proxies move without user behavior, you’re chasing looks. Fix: re‑attach a mechanism or kill. Example & sources: see callout in Strategy, made small.
‘Our Iceberg Is Melting’ (intentional change management) — A structured 8‑step process for leading change. See Appendix F for the steps.
Appendix B: Novelty Quota Worksheet
Purpose — Keep the floor safe while forcing exploration.
Defaults (Floor): List the shared stack/process you start from.
Quota (Ceiling): Each cycle, each team logs ≥1 deviation.
Deviation Log
Default Area | Deviation | Hypothesis (because…) | Expected Benefit/Risk | Outcome | Promote/Retire |
---|---|---|---|---|---|
e.g., API framework | Try X on service Y | Because Z should cut p95 by 20% | Better latency; small migration cost | p95 −18% | Promote next cycle |
Guardrails: No deviations in security, compliance, or user‑data handling—unless pre‑approved.
Review: At cycle end, promote winners to defaults; retire the rest; capture one lesson.
Appendix C: Anthropologist Hat — Case Template
- Context — Where you’re seeing friction (team/vertical/domain).
- Cultural Traits — e.g., Big goals, no guidance • Context vacuum • Siloed incentives • Conflict aversion.
- Symptoms — Faux alignment • Fragmented execution • Operational drift • No experimentation mindset.
- Reinforcing (Vicious) Loops — Appearance • Silo • Drift • Silence.
- Balancing Loops — Context & “How?” Checkpoint • Cross‑Team 1:1s • Mini Prototype • Publish v0.0.1 + Retros.
Appendix D: One‑Page Alignment Sheet
Field | Description |
---|---|
Aim | A clear statement of the outcome (user behavior change). |
Mechanism | “We believe __ will move __ because ___.” |
Feedback | Two to three signals: one proxy, one outcome, one leading indicator. |
Stage Gate (2 wks) | “We will continue if ___.” |
Guardrail | One SLI/SLO to protect users while testing. |
Owner & Cadence | Person • Weekly 15‑minute review on ___. |
Appendix E: Barbell Plan (Floor + Ceiling)
Purpose — Make the Floor explicit (defaults, guardrails) and the Ceiling deliberate (three mechanism slices with gates).
Floor (don’t lose)
- Defaults: frameworks/process/infra we start from.
- Guardrails: 1 SLI/SLO + rollback path + flag/rollout limits.
- Non‑negotiables: security, privacy, compliance tasks baked in.
Ceiling (play to win)
- Slices: three bounded mechanism tests with signals and 2‑week gates.
- Novelty Quota: ≥1 deviation per cycle.
- Visibility: decision log + weekly 15‑minute review.
Appendix F: ‘Our Iceberg Is Melting’ — 8‑step change
- Create urgency.
- Form a guiding team.
- Build vision and strategy.
- Communicate for buy‑in.
- Empower action (remove barriers).
- Produce short‑term wins.
- Don’t let up; make it stick.
- Institutionalize: update defaults/rewards so the new equilibrium holds.
(Source: John Kotter & Holger Rathgeber, 2006).