6. Troubleshoot with exd eval
A user emails support: "I'm in San Francisco and I got the boring welcome banner. My coworker right next to me got a different one. Is this broken?"
You have lint, schema validation, fixtures, and a testing-gated rollout — all of which prove the manifest can behave correctly. None of them tell you what variant one specific user actually got last night. For that you reach for exd eval.
This chapter walks through the troubleshooting loop on the welcome-banner flag. The same pattern applies to any flag.
Reconstruct the user's evaluation
Pull the user's user.id and user.country from your logs (the application log line should carry them — or, if you've wired the evaluation-record sink, the record carries the bucketing context attributes directly). Reconstruct the evaluation locally:
$ exd eval welcome-banner --env production --manifest marketing \
--ctx user.id=u-37281 --ctx user.country=US
control
One line. The bare output is the variant key — safe to interpolate into shell pipelines (the user got control, as they reported).
That's the answer. The why takes one more flag.
Walk the resolution path with --trace
$ exd eval welcome-banner --env production --manifest marketing \
--ctx user.id=u-37281 --ctx user.country=US --trace
[flag.environments.production].rules
rule[0] segment `welcome-banner-bucket-control` MATCHES → control
rule[1] segment `welcome-banner-bucket-treat-a` (not reached)
rule[2] segment `welcome-banner-bucket-treat-b` (not reached)
[flag.environments.production].variant (not reached)
[flag.environments._].rules SKIPPED — env declared its own rules
[flag.environments._].variant (not reached)
outcome: control
via: rule[0] in env `production` matched segment `welcome-banner-bucket-control`
ctx: user.id="u-37281", user.country="US"
--trace adds two things to the bare output: the resolution walk (which rules the engine considered, in order, and which one matched) and the outcome block (the variant, the rule index, and the context the user was evaluated against).
The user got control because rule[0] — the bucket-control segment — matched. Their user.id = "u-37281" hashed into the [0, 3299] range with the shared welcome-banner-2026 salt. Their coworker, with a different user.id, would hash into a different range and get treat_a or treat_b. Working as designed.
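The hash exd actually uses isn't specified here, but the mechanics are easy to model. The snippet below is an illustrative sketch, not the exd implementation: it salts the entity id, hashes it, and maps the digest onto 10,000 buckets, so a rule that owns [0, 3299] captures a deterministic 33% of users.

```python
import hashlib

def bucket(entity_id: str, salt: str, buckets: int = 10_000) -> int:
    """Deterministically map an entity id into [0, buckets) with a shared salt."""
    digest = hashlib.sha256(f"{salt}:{entity_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets

salt = "welcome-banner-2026"
b1 = bucket("u-37281", salt)
b2 = bucket("u-94120", salt)

# Re-hashing the same id with the same salt always yields the same bucket,
# and a rule like "control owns [0, 3299]" reduces to a range check.
assert bucket("u-37281", salt) == b1
assert 0 <= b1 < 10_000 and 0 <= b2 < 10_000
```

The salt is what keeps this experiment's assignment independent of every other flag's: change the salt and every user reshuffles.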
You can confirm with their coworker's id:
$ exd eval welcome-banner --env production --manifest marketing \
--ctx user.id=u-94120 --ctx user.country=US --trace
[flag.environments.production].rules
rule[0] segment `welcome-banner-bucket-control` (no match — bucket=4521)
rule[1] segment `welcome-banner-bucket-treat-a` MATCHES → treat_a
rule[2] segment `welcome-banner-bucket-treat-b` (not reached)
...
Different user.id, different bucket hash, different rule fires. Report back to support with confidence: "Both users are in the US, both got the variant their user.id deterministically assigned them. The A/B/C split is working."
Counterfactual with exd explain --ctx
exd eval --trace shows the path this evaluation took. exd explain --ctx ... shows the same path plus the full static description of the flag — variants, segment tree, required context, pitfalls. Reach for explain when you want context-bound reasoning and the big picture in one call:
$ exd explain welcome-banner --env production --manifest marketing \
--ctx user.id=u-37281 --ctx user.country=US
[...all seven static sections — variants, walk, rules, segment tree, required context, pitfalls, notes...]
=== Counterfactual outcome
outcome: control
via: rule[0] in env `production` matched segment `welcome-banner-bucket-control`
ctx: user.id="u-37281", user.country="US"
In support workflows this is often more useful than exd eval --trace alone — the user might be asking about something the flag doesn't do, and the full static description is what tells you why. See the exd eval vs. exd explain callout for the choice in one sentence: eval for what one ctx gets, explain for what the flag does.
Common diagnostic patterns
A few situations come up over and over. Each one has a recognizable shape in the --trace output.
"Missing user.id" — falls through to the catch-all
If the application forgot to pass user.id on the evaluation context, every bucket rule misses (bucket assignment can't resolve without an entity id), and the engine falls through:
$ exd eval welcome-banner --env production --manifest marketing --ctx user.country=US --trace
[flag.environments.production].rules
rule[0] segment `welcome-banner-bucket-control` (no match — bucketing attribute user.id not set)
rule[1] segment `welcome-banner-bucket-treat-a` (no match — bucketing attribute user.id not set)
rule[2] segment `welcome-banner-bucket-treat-b` (no match — bucketing attribute user.id not set)
[flag.environments.production].variant = control MATCHES → control
outcome: control
via: env `production` variant (all rules missed)
Every rule reports "bucketing attribute not set." The user gets control by fallthrough. Fix is in the application code, not the manifest — add the user.id line back to the context builder. The validate_context test from Chapter 3 would have caught this in CI; if it didn't, it means the application code is calling the SDK directly without going through the schema-aware path.
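The fallthrough order the trace walks can be modeled in a few lines. This is a hypothetical sketch of the resolution logic, not exd's code: an env that declared its own block shadows the `_` block entirely; within a block, rules fire in order and the block's own variant catches everything that missed.

```python
def resolve(env_blocks: dict, env: str, ctx: dict) -> str:
    """Sketch: a declared env block shadows `_`; an undeclared env falls to `_`."""
    block = env_blocks.get(env, env_blocks["_"])
    for rule in block["rules"]:
        variant = rule(ctx)        # a rule yields a variant key, or None on a miss
        if variant is not None:
            return variant
    return block["variant"]        # fallthrough: the block's own variant

# A bucket rule can't resolve without user.id, so it misses (returns None):
blocks = {
    "production": {
        "rules": [lambda ctx: "treat_a" if "user.id" in ctx else None],
        "variant": "control",
    },
    "_": {"rules": [], "variant": "control"},
}
print(resolve(blocks, "production", {"user.country": "US"}))  # control, by fallthrough
```

The same model reproduces the "wrong env" shape below: an env name with no declared block falls straight through to `_`.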
"Wrong env" — typo in the env name
Untyped-env mode (no namespace.toml) accepts any env string. A typo silently becomes a new env that has no env block, so resolution always falls through to _:
$ exd eval welcome-banner --env prodution --manifest marketing \
--ctx user.id=u-1 --ctx user.country=US --trace
[flag.environments.prodution].rules SKIPPED — env did not declare its own
[flag.environments.prodution].variant SKIPPED — env did not declare its own
[flag.environments._].rules
rule[0] segment `welcome-banner-bucket-control` MATCHES → control
...
outcome: control
via: rule[0] in env `prodution` matched segment `welcome-banner-bucket-control`
prodution is not production. In untyped mode no diagnostic fires; the typo just means the production-specific env block is ignored. The fix is to declare a [namespace.environments] list in namespace.toml — that flips the namespace into typed-env mode, and lint code E010 then rejects unknown env names before any evaluation can use them.
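The fix might look like the fragment below. The exact schema of namespace.toml isn't shown in this chapter, so the key names here are assumptions; only the effect (typed-env mode plus E010 on unknown names) comes from the text above.

```toml
# namespace.toml -- hedged sketch; key names are assumptions, not the real schema
[namespace]
slug = "marketing"
# Declaring the environment list flips the namespace into typed-env mode:
# lint E010 then rejects any env name not in it, e.g. the typo "prodution".
environments = ["production", "staging", "development"]
```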
"Predicate operand type mismatch" — fails the rule silently
A rule that compares a string-typed attribute against an integer literal (or vice versa) won't match anything. The --trace output flags it:
$ exd eval some-other-flag --env production --manifest marketing \
--ctx user.tier=pro --trace
[flag.environments.production].rules
rule[0] inline predicate user.tier == 1 (no match — type mismatch: user.tier is string, expected integer)
...
The root cause is in the manifest, not the call. Lint code E020 (predicate operand type mismatch) catches most of these statically; the ones that slip through are usually attribute names that look numeric but carry strings ("v2", "tier-1"). Fix in TOML; re-run lint to confirm.
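The miss-not-coerce behavior is easy to show in miniature. A hypothetical strict-typed comparator, not exd's evaluator, behaves like this:

```python
def predicate_matches(ctx_value, operand) -> bool:
    """Strict-typed equality: a cross-type comparison is a miss, never a coercion."""
    if type(ctx_value) is not type(operand):
        return False                # user.tier == 1 against ctx user.tier = "pro"
    return ctx_value == operand

assert predicate_matches("pro", 1) is False     # type mismatch: string vs integer
assert predicate_matches("1", 1) is False       # looks numeric, still a string
assert predicate_matches("pro", "pro") is True  # same type, same value: matches
```

Note the second case: "1" and 1 never match, which is exactly the "looks numeric but carries strings" trap.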
"Tests pass, prod misses" — the manifest the SDK loaded isn't the manifest you're inspecting
If exd eval --manifest marketing/ produces variant X but the running service produces variant Y for the same context, the running service is loading a different manifest. Common causes:
- Stale cache. The SDK loaded a manifest from a tar.gz URL; the URL changed but the SDK didn't refresh. Bounce the service, or trigger a refresh via SSE.
- Wrong directory. The SDK was pointed at a different directory than you think — a stale build path, a checked-in fixture directory under tests/, a different namespace slug.
- Server-mediated evaluation that hasn't pulled the new push. exd manifest push succeeded but the namespace cache on the eval path hasn't invalidated yet; usually clears in seconds, but worth eliminating.
The discriminator: ask the SDK for the manifest fingerprint (Namespace.manifest_fingerprint() in Rust; WasmNamespace.manifestFingerprint() in TS) and compare it to the fingerprint exd schema --format json | jq .result.manifest_fingerprint reports for the manifest you're inspecting. Match → same manifest, look elsewhere; mismatch → the running service is on stale data.
JSON output for programmatic post-mortems
Every exd eval and exd explain call supports --format json. The result is the same data the human renderer shows, in a stable structured envelope, ready for jq and tooling:
$ exd eval welcome-banner --env production --manifest marketing \
--ctx user.id=u-37281 --ctx user.country=US --format json \
| jq '.result.outcome'
{
"variant": "control",
"via": { "kind": "rule", "rule_index": 0, "env": "production", "segment": "welcome-banner-bucket-control" }
}
Wrap this in your incident-response runbook: "given a user.id and a flag, here's the one-liner that prints the variant and the rule that produced it, in JSON." The provenance envelope at the top of the response carries exd_version, the manifest fingerprint, and the inputs you passed — so the post-mortem record is complete on its own.
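When jq isn't on the box, the same extraction is a few lines of scripting. The payload below mirrors the outcome shape shown above; the envelope fields come from the text, and the version value is a placeholder:

```python
import json

# Sample response in the shape shown above (version value is a placeholder).
raw = """
{
  "exd_version": "0.0.0",
  "result": {
    "outcome": {
      "variant": "control",
      "via": {"kind": "rule", "rule_index": 0, "env": "production",
              "segment": "welcome-banner-bucket-control"}
    }
  }
}
"""

outcome = json.loads(raw)["result"]["outcome"]
summary = f'{outcome["variant"]} via rule[{outcome["via"]["rule_index"]}] in env {outcome["via"]["env"]}'
print(summary)  # control via rule[0] in env production
```

One line per incident, straight into the post-mortem doc.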
You're done
Six chapters, six tools, one flag:
| Chapter | Tool | What it defends against |
|---|---|---|
| 1 | exd explain | Authoring the flag wrong — typoed variant, surprising fallthrough, hidden pitfall |
| 2 | exd lint (pre-commit + PR) | Typos, dangling references, shape errors making it past code review |
| 3 | exd schema + validate_context | Application code and manifest drifting apart |
| 4 | exd fixtures (+ SDK test) | A rule rewrite silently changing a variant assignment |
| 5 | testing = true (+ include_testing SDK opt-in) | Botched bucket math reaching real users on day 1 |
| 6 | exd eval / exd explain --ctx | Real-time user-specific debugging when something does still go sideways |
The flag is shipped, the rollout is safe, the post-mortem playbook is in place. From here:
- Centrally hosted flag namespace. Move the marketing/ directory out of the app repo into a dedicated feature-flags repo. See Centrally hosted flag namespace.
- Telemetry. Pipe evaluation records to a sink, then run exd telemetry srm | rules | dead-flags to catch rollout pathologies before users do. See Telemetry reference.
- Reference. Linter rules · exd CLI · Rust SDK · TypeScript SDK.