6. Troubleshoot with exd eval
A user emails support: "I'm in San Francisco and I got the boring welcome banner. My coworker right next to me got a different one. Is this broken?"
You have lint, schema validation, fixtures, and a testing-gated rollout — all of which prove the manifest can behave correctly. None of them tell you what variant one specific user actually got last night. For that you reach for exd eval.
This chapter walks through the troubleshooting loop on the welcome-banner flag. The same pattern applies to any flag.
Reconstruct the user's evaluation
Pull the user's user.id and user.country from your logs (the application log line should carry them — or, if you've wired the evaluation-record sink, the record carries the bucketing context attributes directly). Reconstruct the evaluation locally:
$ exd eval welcome-banner --env production --manifest marketing \
--ctx user.id=u-37281 --ctx user.country=US
control
One line. The bare output is the variant key — safe to interpolate into shell pipelines (the user got control, as they reported).
That's the answer. The why takes one more flag.
Walk the resolution path with --trace
$ exd eval welcome-banner --env production --manifest marketing \
--ctx user.id=u-37281 --ctx user.country=US --trace
[flag.environments.production].rules
rule[0] segment `welcome-banner-bucket-control` MATCHES → control
rule[1] segment `welcome-banner-bucket-treat-a` (not reached)
rule[2] segment `welcome-banner-bucket-treat-b` (not reached)
[flag.environments.production].variant (not reached)
[flag.environments._].rules SKIPPED — env declared its own rules
[flag.environments._].variant (not reached)
outcome: control
via: rule[0] in env `production` matched segment `welcome-banner-bucket-control`
ctx: user.id="u-37281", user.country="US"
--trace adds two things to the bare output: the resolution walk (which rules the engine considered, in order, and which one matched) and the outcome block (the variant, the rule index, and the context the user was evaluated against).
The user got control because rule[0] — the bucket-control segment — matched. Their user.id = "u-37281" hashed into the [0, 3299] range with the shared welcome-banner-2026 salt. Their coworker, with a different user.id, would hash into a different range and get treat_a or treat_b. Working as designed.
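The hash exd actually uses isn't specified here, but the mechanics are easy to model. The snippet below is an illustrative sketch, not the exd implementation: it salts the entity id, hashes it, and maps the digest onto 10,000 buckets, so a rule that owns [0, 3299] captures a deterministic 33% of users.

```python
import hashlib

def bucket(entity_id: str, salt: str, buckets: int = 10_000) -> int:
    """Deterministically map an entity id into [0, buckets) with a shared salt."""
    digest = hashlib.sha256(f"{salt}:{entity_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets

salt = "welcome-banner-2026"
b1 = bucket("u-37281", salt)
b2 = bucket("u-94120", salt)

# Re-hashing the same id with the same salt always yields the same bucket,
# and a rule like "control owns [0, 3299]" reduces to a range check.
assert bucket("u-37281", salt) == b1
assert 0 <= b1 < 10_000 and 0 <= b2 < 10_000
```

The salt is what keeps this experiment's assignment independent of every other flag's: change the salt and every user reshuffles.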
You can confirm with their coworker's id:
$ exd eval welcome-banner --env production --manifest marketing \
--ctx user.id=u-94120 --ctx user.country=US --trace
[flag.environments.production].rules
rule[0] segment `welcome-banner-bucket-control` (no match — bucket=4521)
rule[1] segment `welcome-banner-bucket-treat-a` MATCHES → treat_a
rule[2] segment `welcome-banner-bucket-treat-b` (not reached)
...
Different user.id, different bucket hash, different rule fires. Report back to support with confidence: "Both users are in the US, both got the variant their user.id deterministically assigned them. The A/B/C split is working."
Counterfactual with exd explain --ctx
exd eval --trace shows the path this evaluation took. exd explain --ctx ... shows the same path plus the full static description of the flag — variants, segment tree, required context, pitfalls. Reach for explain when you want context-bound reasoning and the big picture in one call:
$ exd explain welcome-banner --env production --manifest marketing \
--ctx user.id=u-37281 --ctx user.country=US
[...all seven static sections — variants, walk, rules, segment tree, required context, pitfalls, notes...]
=== Counterfactual outcome
outcome: control
via: rule[0] in env `production` matched segment `welcome-banner-bucket-control`
ctx: user.id="u-37281", user.country="US"
In support workflows this is often more useful than exd eval --trace alone — the user might be asking about something the flag doesn't do, and the full static description is what tells you why. See the exd eval vs. exd explain callout for the choice in one sentence: eval for what one ctx gets, explain for what the flag does.
Common diagnostic patterns
A few situations come up over and over. Each one has a recognizable shape in the --trace output.
"Missing user.id" — falls through to the catch-all
If the application forgot to pass user.id on the evaluation context, every bucket rule misses (bucket assignment can't resolve without an entity id), and the engine falls through:
$ exd eval welcome-banner --env production --manifest marketing --ctx user.country=US --trace
[flag.environments.production].rules
rule[0] segment `welcome-banner-bucket-control` (no match — bucketing attribute user.id not set)
rule[1] segment `welcome-banner-bucket-treat-a` (no match — bucketing attribute user.id not set)
rule[2] segment `welcome-banner-bucket-treat-b` (no match — bucketing attribute user.id not set)
[flag.environments.production].variant = control MATCHES → control
outcome: control
via: env `production` variant (all rules missed)
Every rule reports "bucketing attribute not set." The user gets control by fallthrough. Fix is in the application code, not the manifest — add the user.id line back to the context builder. The validate_context test from Chapter 3 would have caught this in CI; if it didn't, it means the application code is calling the SDK directly without going through the schema-aware path.
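The fallthrough order the trace walks can be modeled in a few lines. This is a hypothetical sketch of the resolution logic, not exd's code: an env that declared its own block shadows the `_` block entirely; within a block, rules fire in order and the block's own variant catches everything that missed.

```python
def resolve(env_blocks: dict, env: str, ctx: dict) -> str:
    """Sketch: a declared env block shadows `_`; an undeclared env falls to `_`."""
    block = env_blocks.get(env, env_blocks["_"])
    for rule in block["rules"]:
        variant = rule(ctx)        # a rule yields a variant key, or None on a miss
        if variant is not None:
            return variant
    return block["variant"]        # fallthrough: the block's own variant

# A bucket rule can't resolve without user.id, so it misses (returns None):
blocks = {
    "production": {
        "rules": [lambda ctx: "treat_a" if "user.id" in ctx else None],
        "variant": "control",
    },
    "_": {"rules": [], "variant": "control"},
}
print(resolve(blocks, "production", {"user.country": "US"}))  # control, by fallthrough
```

The same model reproduces the "wrong env" shape below: an env name with no declared block falls straight through to `_`.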
"Wrong env" — typo in the env name
Untyped-env mode (no namespace.toml) accepts any env string. A typo silently becomes a new env that has no env block, so resolution always falls through to _:
$ exd eval welcome-banner --env prodution --manifest marketing \
--ctx user.id=u-1 --ctx user.country=US --trace
[flag.environments.prodution].rules SKIPPED — env did not declare its own
[flag.environments.prodution].variant SKIPPED — env did not declare its own
[flag.environments._].rules
rule[0] segment `welcome-banner-bucket-control` MATCHES → control
...
outcome: control
via: rule[0] in env `prodution` matched segment `welcome-banner-bucket-control`
prodution is not production. In untyped mode no diagnostic fires; the typo just means the production-specific env block is ignored. The fix is to declare a [namespace.environments] list in namespace.toml — that flips the namespace into typed-env mode, and lint code E010 then rejects unknown env names before any evaluation can use them.
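The fix might look like the fragment below. The exact schema of namespace.toml isn't shown in this chapter, so the key names here are assumptions; only the effect (typed-env mode plus E010 on unknown names) comes from the text above.

```toml
# namespace.toml -- hedged sketch; key names are assumptions, not the real schema
[namespace]
slug = "marketing"
# Declaring the environment list flips the namespace into typed-env mode:
# lint E010 then rejects any env name not in it, e.g. the typo "prodution".
environments = ["production", "staging", "development"]
```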
"Predicate operand type mismatch" — fails the rule silently
A rule that compares a string-typed attribute against an integer literal (or vice versa) won't match anything. The --trace output flags it:
$ exd eval some-other-flag --env production --manifest marketing \
--ctx user.tier=pro --trace
[flag.environments.production].rules
rule[0] inline predicate user.tier == 1 (no match — type mismatch: user.tier is string, expected integer)
...
The root cause is in the manifest, not the call. Lint code E020 (predicate operand type mismatch) catches most of these statically; the ones that slip through are usually attribute names that look numeric but carry strings ("v2", "tier-1"). Fix in TOML; re-run lint to confirm.
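The miss-not-coerce behavior is easy to show in miniature. A hypothetical strict-typed comparator, not exd's evaluator, behaves like this:

```python
def predicate_matches(ctx_value, operand) -> bool:
    """Strict-typed equality: a cross-type comparison is a miss, never a coercion."""
    if type(ctx_value) is not type(operand):
        return False                # user.tier == 1 against ctx user.tier = "pro"
    return ctx_value == operand

assert predicate_matches("pro", 1) is False     # type mismatch: string vs integer
assert predicate_matches("1", 1) is False       # looks numeric, still a string
assert predicate_matches("pro", "pro") is True  # same type, same value: matches
```

Note the second case: "1" and 1 never match, which is exactly the "looks numeric but carries strings" trap.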
"Tests pass, prod misses" — the manifest the SDK loaded isn't the manifest you're inspecting
If exd eval --manifest marketing/ produces variant X but the running service produces variant Y for the same context, the running service is loading a different manifest. Common causes:
- Stale cache. The SDK loaded a manifest from a tar.gz URL; the URL changed but the SDK didn't refresh. Bounce the service, or trigger a refresh via SSE.
- Wrong directory. The SDK was pointed at a different directory than you think — a stale build path, a checked-in fixture directory under tests/, a different namespace slug.
- Server-mediated evaluation that hasn't pulled the new push. exd manifest push succeeded but the namespace cache on the eval path hasn't invalidated yet; usually clears in seconds, but worth eliminating.
The discriminator: ask the SDK for the manifest fingerprint (Namespace.manifest_fingerprint() in Rust; WasmNamespace.manifestFingerprint() in TS) and compare it to the fingerprint exd schema --format json | jq .result.manifest_fingerprint reports for the manifest you're inspecting. Match → same manifest, look elsewhere; mismatch → the running service is on stale data.
JSON output for programmatic post-mortems
Every exd eval and exd explain call supports --format json. The result is the same data the human renderer shows, in a stable structured envelope, ready for jq and tooling:
$ exd eval welcome-banner --env production --manifest marketing \
--ctx user.id=u-37281 --ctx user.country=US --format json \
| jq '.result.outcome'
{
"variant": "control",
"via": { "kind": "rule", "rule_index": 0, "env": "production", "segment": "welcome-banner-bucket-control" }
}
Wrap this in your incident-response runbook: "given a user.id and a flag, here's the one-liner that prints the variant and the rule that produced it, in JSON." The provenance envelope at the top of the response carries exd_version, the manifest fingerprint, and the inputs you passed — so the post-mortem record is complete on its own.
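When jq isn't on the box, the same extraction is a few lines of scripting. The payload below mirrors the outcome shape shown above; the envelope fields come from the text, and the version value is a placeholder:

```python
import json

# Sample response in the shape shown above (version value is a placeholder).
raw = """
{
  "exd_version": "0.0.0",
  "result": {
    "outcome": {
      "variant": "control",
      "via": {"kind": "rule", "rule_index": 0, "env": "production",
              "segment": "welcome-banner-bucket-control"}
    }
  }
}
"""

outcome = json.loads(raw)["result"]["outcome"]
summary = f'{outcome["variant"]} via rule[{outcome["via"]["rule_index"]}] in env {outcome["via"]["env"]}'
print(summary)  # control via rule[0] in env production
```

One line per incident, straight into the post-mortem doc.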
You're done
Six chapters, six tools, one flag:
| Chapter | Tool | What it defends against |
|---|---|---|
| 1 | exd explain | Authoring the flag wrong — typoed variant, surprising fallthrough, hidden pitfall |
| 2 | exd lint (pre-commit + PR) | Typos, dangling references, shape errors making it past code review |
| 3 | exd schema + validate_context | Application code and manifest drifting apart |
| 4 | exd fixtures (+ SDK test) | A rule rewrite silently changing a variant assignment |
| 5 | testing = true (+ include_testing SDK opt-in) | Botched bucket math reaching real users on day 1 |
| 6 | exd eval / exd explain --ctx | Real-time user-specific debugging when something does still go sideways |
The flag is shipped, the rollout is safe, the post-mortem playbook is in place. From here:
- Centrally hosted flag namespace. Move the marketing/ directory out of the app repo into a dedicated feature-flags repo. See Centrally hosted flag namespace.
- Telemetry. Pipe evaluation records to a sink, then run exd telemetry srm | rules | dead-flags to catch rollout pathologies before users do. See Telemetry reference.
- Reference. Linter rules · exd CLI · Rust SDK · TypeScript SDK.