Skip to main content

Telemetry diagnostics

The exd telemetry analysis surface emits structured diagnostics — T-codes — when the data it analyzes meets a documented condition. Each T-code has stable semantics, a fixed severity, and a documented detection criterion.

This page is the authoritative vocabulary. New conditions get new codes; once allocated, a code's semantics are immutable. The structure mirrors manifest diagnostics so readers familiar with that page can navigate this one without retraining.

The CLI surface that emits these diagnostics is documented in reference/cli/exd/telemetry/ and lists, per command, the codes it can raise.


Severity levels

SeverityMeaning
infoAn observation. No action required.
warningA likely problem warranting investigation.
errorA high-confidence problem requiring action.

Severity is fixed per code. Thresholds determine when a code fires; severity does not change based on observed data.

A diagnostic at any severity is a normal output of the analysis surface. Diagnostics do not, by themselves, cause exd telemetry commands to fail. The --fail-on-error flag and agent policies opt callers into action on diagnostic output.


Conformance levels

Every rule across the telemetry surface applies at one of three levels:

LevelEnforced byEffect of violation
SchemaRecord producers and consumers; SDK buildA malformed record is rejected by the receiving sink; an SDK that produces malformed records is non-conformant.
Telemetry diagnosticexd telemetry analysis (T0xx codes documented below)An observation about the data, surfaced to the user. Does NOT fail any operation by itself; agent policies MAY gate actions on diagnostic output.
ConventionReviewers, dashboards, documentationNo automated enforcement. Affects readability and team workflow.

T-codes are reserved as the entire T001T099 range. The single E040 code (Phase 2 agent-policy gating) lives in the manifest diagnostic vocabulary and follows the manifest's stability invariants; see manifest § E040. The Q-code range (Q001Q099) is reserved for a deferred-indefinitely user-defined query catalog.


Diagnostic structure

Each diagnostic emitted by a query has the following fields:

FieldTypeDescription
codestringThe stable code, e.g. "T001".
severityenuminfo, warning, or error.
messagestringHuman-readable description. Not stable across versions; consumers SHOULD match on code only.
dataobjectStructured payload specific to the code. Documented per-code below.

JSON output renders severity as a lowercase string and data as an object (never a primitive).


Numeric index

CodeSeveritySummary
T001warningSample-ratio mismatch (observed split differs significantly from expected)
T002infoInsufficient sample size for statistical inference
T003infoDead flag (only the catch-all variant returned)
T004warningManifest-version skew across SDK clients
T005infoRule declared but never matched in the window
T006infoA predicate-referenced attribute is never set
T007errorEvaluation rate dropped vs. compare-to window
T008infoFlag declared in manifest, zero evaluations in window
T009infoDeclared variant never produced
T010warningSame unit_id_hash got different variants at same manifest_version
T011warningA private_attributes key appeared in context_attributes
T012infoRecord carries an unknown evaluation_reason

Diagnostics

T001 — sample-ratio mismatch

Severity: warning

Triggers when:

  • A query analyzing variant distribution observes proportions deviating from the expected split with a chi-square p-value below the configured srm.significance_level threshold.
  • The total sample size meets srm.min_sample_size.

Data fields:

{
"expected": { "control": 0.5, "variant_b": 0.5 },
"observed": { "control": 0.487, "variant_b": 0.513 },
"n": 12480,
"chi_square": 8.31,
"p_value": 0.004,
"threshold": 0.001
}

Detected by: telemetry.srm, telemetry.summary (when expected is supplied).

Remedy. Investigate causes — caching layer that bypasses some users, mid-flight rule changes, SDKs on stale manifest versions (correlate with T004), or a faulty bucketing implementation.

T002 — insufficient sample size

Severity: info

Triggers when:

  • A query that performs statistical inference operates on data where the per-group sample size is below the configured srm.min_sample_size (for SRM) or lift.min_per_variant_n (for lift) threshold.

Data fields:

{
"n": 23,
"min_required": 100,
"test": "srm"
}

Detected by: any inferential query.

Remedy. Wait for more data, or invoke the query with a wider --since window. Does not represent a problem with the data itself.

T003 — dead flag

Severity: info

Triggers when:

  • Every evaluation of the flag in the window had evaluation_reason ∈ {fallthrough, off} AND the variant returned was the per-environment default value.
  • The window contained at least dead_flag.min_sample_size evaluations.

Data fields:

{
"flag_key": "old-experiment",
"evaluation_count": 4203,
"default_variant": "control",
"first_seen": "2025-12-01T...",
"last_seen": "2026-05-08T..."
}

Detected by: telemetry.dead-flags.

Remedy. Candidate for retirement. Set flag.lifecycle = "retired" and remove rule definitions in a follow-up cleanup PR.

T004 — manifest-version skew

Severity: warning

Triggers when:

  • More than one distinct manifest_version was observed in the window.
  • The oldest version observed is manifest_skew.max_versions_behind or more behind the newest.

Data fields:

{
"newest_version": 47,
"oldest_version": 42,
"version_distribution": {
"47": 0.62,
"46": 0.31,
"42": 0.07
},
"threshold": 2
}

Detected by: telemetry.version-skew, telemetry.summary.

Remedy. Identify which SDK clients have not refreshed. May indicate stuck CDN caching, paused SDK pollers, or clients with a hard-pinned manifest version.

T005 — rule never fires

Severity: info

Triggers when:

  • A rule_id that exists in the flag's manifest at the latest observed manifest_version produced zero matches across the window.
  • The window contained at least rule.min_evaluation_count evaluations of the flag.

Data fields:

{
"flag_key": "checkout-redesign",
"rule_id": "rule-3",
"rule_segment": "internal-employees",
"flag_evaluation_count": 12480
}

Detected by: telemetry.rules.

Remedy. Investigate whether the segment definition matches the production population. Common causes: typo in segment attribute name, expired condition (country_code = "GB" after a market exit), shadowing by an earlier rule.

T006 — context attribute always missing

Severity: info

Triggers when:

  • An attribute referenced by any rule's predicate or any segment's predicate is absent from context_attributes in 100% of evaluations in the window.
  • The window contained at least context.min_evaluation_count evaluations.

Data fields:

{
"flag_key": "checkout-redesign",
"missing_attribute": "device_family",
"evaluation_count": 12480
}

Detected by: telemetry.contexts.

Remedy. Either the SDK call sites are not populating the attribute (instrumentation bug) or the rule references a stale attribute name.

T007 — evaluation rate anomaly

Severity: error

Triggers when:

  • A query is invoked with --compare-to (a prior window of equal duration) AND the current-window evaluation count differs from the prior window by a factor exceeding evaluation_rate_anomaly.drop_fraction.

Data fields:

{
"flag_key": "checkout-redesign",
"current_count": 1240,
"prior_count": 12480,
"delta_fraction": -0.901,
"threshold": 0.5,
"current_window": { "from": "...", "to": "..." },
"prior_window": { "from": "...", "to": "..." }
}

Detected by: telemetry.summary.

Remedy. Investigate upstream traffic (load balancer, CDN, deployment), SDK initialization failures, or a manifest change that disabled the flag (correlate with T004).

T008 — cold flag

Severity: info

Triggers when:

  • A flag declared in the latest observed manifest had zero evaluations in the window.

Data fields:

{
"flag_key": "untested-feature",
"namespace": "checkout",
"manifest_version": 47
}

Detected by: telemetry.coverage.

Remedy. Confirm whether the flag is intentionally not in use. Candidates for retirement or for explicit rollout.

T009 — variant never returned

Severity: info

Triggers when:

  • A variant declared in [flag.variants] was returned in zero evaluations across the window.
  • The window contained at least rule.min_evaluation_count evaluations of the flag.

Data fields:

{
"flag_key": "checkout-redesign",
"unused_variant": "variant-c",
"evaluation_count": 12480
}

Detected by: telemetry.summary.

Remedy. Investigate why the variant is unreachable — no rule references it, all referencing rules are unreachable, or its bucket allocation is zero.

T010 — bucketing inconsistency

Severity: warning

Triggers when:

  • The same unit_id_hash value received different variant_key values for the same flag within a single manifest_version.

Data fields:

{
"flag_key": "checkout-redesign",
"unit_id_hash": "9f86d081...",
"manifest_version": 47,
"variants_observed": ["variant-a", "variant-b"],
"occurrence_count": 4
}

Detected by: telemetry.summary, telemetry.user.

Remedy. Indicates a non-deterministic context (the bucketing attribute changes between calls), an SDK bug in bucket computation, or an upstream identity-stitching issue. Compare against T011.

T011 — private attribute leakage

Severity: warning

Triggers when:

  • A record contains a context attribute whose key is listed in namespace.private_attributes (or, for records produced for a specific flag, in that flag's flag.private_attributes). The effective set is the union of both lists.

Data fields:

{
"namespace": "checkout",
"attribute_key": "email",
"occurrence_count": 17,
"first_seen_sdk_version": "0.4.1"
}

Detected by: any query.

Remedy. A non-conformant SDK is emitting unfiltered records. Identify by sdk_name + sdk_version and upgrade. The record itself MUST NOT be persisted in long-term storage; the analysis surface is permitted to read it for the purpose of emitting this diagnostic, but downstream archival systems SHOULD redact the offending attribute.

T012 — unknown evaluation reason

Severity: info

Triggers when:

  • A record carries an evaluation_reason value the analysis surface does not recognize.

Data fields:

{
"unknown_reason": "future_reason",
"first_seen_sdk_version": "0.6.0",
"occurrence_count": 2104
}

Detected by: any query.

Remedy. A newer SDK is producing records this exd install does not yet understand. Upgrade the analysis tooling.


Reserved ranges

RangeReserved for
T013T099Future schema-version-1 diagnostics.
T100T199Schema-version-2 diagnostics.

A future revision MAY add codes within the schema-version-1 range without bumping the schema version, provided the new code does not change the meaning of any existing data.


See also