Provenance

Every JSON output produced by exd telemetry MUST include a structured provenance block that allows the output to be cited reproducibly. This is a core invariant of the telemetry surface.

Provenance enables three properties the system depends on:

  1. Reproducibility. Any human or agent reading a finding can re-run the exact query against the exact data and reproduce the exact numbers.
  2. Citation chains. Agent-produced artifacts (commits, PR descriptions, alert messages) cite the provenance block of the query that produced their findings. A reviewer can trace any agent claim back to the data that supports it.
  3. Drift detection. Two findings produced by the same invocation against the same input data differ in their provenance only by executed_at. Findings whose provenance diverges in any other field are not comparable.
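
The drift-detection rule can be checked mechanically. The following is a minimal Python sketch, not part of exd: the function names provenance_drift and comparable are hypothetical, and it assumes both provenance blocks have already been parsed into plain dictionaries.

```python
from typing import Any

# Sentinel so that a missing field is distinguishable from an explicit null.
_MISSING = object()


def provenance_drift(a: dict[str, Any], b: dict[str, Any]) -> list[str]:
    """List the provenance fields, other than executed_at, on which a and b differ."""
    fields = (set(a) | set(b)) - {"executed_at"}
    return sorted(f for f in fields if a.get(f, _MISSING) != b.get(f, _MISSING))


def comparable(a: dict[str, Any], b: dict[str, Any]) -> bool:
    """Two findings are comparable only if nothing but executed_at differs."""
    return not provenance_drift(a, b)


first = {
    "source": ["s3://acme-data/exd/checkout/2026-05-08/"],
    "manifest_versions_observed": [42, 43],
    "executed_at": "2026-05-08T14:22:13.421Z",
    "record_count": 12480,
}
rerun = dict(first, executed_at="2026-05-08T15:00:00.000Z")
drifted = dict(rerun, record_count=12479)

print(comparable(first, rerun))          # only executed_at differs
print(provenance_drift(first, drifted))  # names the diverging field
```

Returning the list of diverging fields, rather than a bare boolean, lets a reviewer see why two findings are not comparable.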

JSON envelope

The full envelope returned by every exd telemetry --format json invocation:

{
  "query": "telemetry.srm",
  "query_version": 1,
  "schema_version": 1,
  "exd_version": "0.5.0",
  "inputs": {
    "flag": "checkout-redesign",
    "since": "7d",
    "expected": { "control": 0.5, "variant_b": 0.5 }
  },
  "result": { ... },
  "diagnostics": [ ... ],
  "provenance": {
    "source": ["s3://acme-data/exd/checkout/2026-05-08/"],
    "time_range": {
      "from": "2026-05-01T14:22:13Z",
      "to": "2026-05-08T14:22:13Z"
    },
    "manifest_versions_observed": [42, 43],
    "executed_at": "2026-05-08T14:22:13.421Z",
    "engine": "duckdb",
    "engine_version": "1.0.0",
    "record_count": 12480,
    "thresholds_source": "queries/thresholds.toml@a1b2c3d",
    "tool_invocation": "exd telemetry srm --flag checkout-redesign --since 7d"
  }
}

The same envelope is used by exd eval, exd explain, exd fixtures, and exd schema. For those commands, provenance.engine is one of static, static+ctx, or static+synth, and record_count is 0.


Top-level envelope fields

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| query | yes | string | The query name. Built-in queries use the canonical telemetry.* form. User-defined queries use the name field from the query file. |
| query_version | yes | integer | The query's version field. For built-in queries, the spec version of the implementation. |
| schema_version | yes | integer | The EvaluationRecord schema version the install reads. Differs from query_version; the same query at version 1 may consume records of any compatible schema version. |
| exd_version | yes | string | The exd-client version. |
| inputs | yes | object | The fully-resolved parameter map passed to the query. Defaults applied. References (${name}) substituted. |
| result | yes | object | Query-specific. Schema available via the capabilities manifest. |
| diagnostics | yes | array | Zero or more diagnostic entries (see diagnostics § Diagnostic structure). |
| provenance | yes | object | The provenance block (below). |

A query that produced no result for structural reasons (e.g., insufficient data) still emits a complete envelope; result is the documented "empty" shape for that query.
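
A consumer can check the required top-level fields before trusting an envelope. The sketch below is illustrative only: envelope_problems and REQUIRED_ENVELOPE_FIELDS are hypothetical names, the type mapping is read off the table above, and the empty result object stands in for whatever "empty" shape a given query documents.

```python
# Required top-level fields and the Python type each JSON type parses to.
REQUIRED_ENVELOPE_FIELDS = {
    "query": str,
    "query_version": int,
    "schema_version": int,
    "exd_version": str,
    "inputs": dict,
    "result": dict,
    "diagnostics": list,
    "provenance": dict,
}


def envelope_problems(envelope: dict) -> list[str]:
    """Return one message per missing or mistyped top-level field."""
    problems = []
    for name, expected in REQUIRED_ENVELOPE_FIELDS.items():
        if name not in envelope:
            problems.append(f"missing field: {name}")
            continue
        value = envelope[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    return problems


# An envelope with no findings is still complete: every field is present.
empty_but_complete = {
    "query": "telemetry.srm",
    "query_version": 1,
    "schema_version": 1,
    "exd_version": "0.5.0",
    "inputs": {"flag": "checkout-redesign", "since": "7d"},
    "result": {},  # placeholder for the query's documented empty shape
    "diagnostics": [],
    "provenance": {},  # contents are validated separately
}

print(envelope_problems(empty_but_complete))
```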


Provenance block fields

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| source | yes | array of string | Input source URIs and/or paths, in canonical form. Globs are expanded; relative paths are made absolute. Entries are sorted lexicographically for stability. |
| time_range.from | yes | RFC 3339 | Inclusive lower bound applied to records. |
| time_range.to | yes | RFC 3339 | Exclusive upper bound applied to records. |
| manifest_versions_observed | yes | array of integer | Sorted, deduplicated set of manifest_version values present in the analyzed records. Empty array if no records matched. |
| executed_at | yes | RFC 3339 | Wall-clock time the command produced its result. UTC, millisecond precision. |
| engine | yes | string | Query engine. One of duckdb, snowflake, bigquery, databricks, redshift, inmemory, static, static+ctx, static+synth. |
| engine_version | yes | string | Version string of the engine. |
| record_count | yes | integer | Number of records considered after window and other filters. 0 for static read-side commands. |
| thresholds_source | yes | string | One of: built-in-defaults, or <path> when an explicit --thresholds <path> was passed on the CLI (with @<git-sha> appended if the install can identify the file's git commit). The queries/thresholds.toml@<git-sha> form is deferred indefinitely; the manifest-tree threshold file has not landed in Phase 1. |
| tool_invocation | yes | string | The command line as parsed (canonicalized, quoted where needed). Used for replay. |
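
Several of these fields have precise semantics that are easy to get subtly wrong: the half-open time window, the sorted and deduplicated version set, and the millisecond-precision UTC timestamp. The following Python sketch illustrates those three rules under stated assumptions. The helper names are hypothetical, and the record field name timestamp is an assumption; only manifest_version is named in this document.

```python
from datetime import datetime, timezone


def in_window(ts: datetime, start: datetime, end: datetime) -> bool:
    """time_range.from is inclusive; time_range.to is exclusive."""
    return start <= ts < end


def observed_manifest_versions(records: list[dict]) -> list[int]:
    """Sorted, deduplicated manifest_version values; [] when nothing matched."""
    return sorted({r["manifest_version"] for r in records})


def format_executed_at(moment: datetime) -> str:
    """Render an aware datetime as UTC RFC 3339 with millisecond precision."""
    if moment.tzinfo is None:
        raise ValueError("executed_at requires a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


start = datetime(2026, 5, 1, 14, 22, 13, tzinfo=timezone.utc)
end = datetime(2026, 5, 8, 14, 22, 13, tzinfo=timezone.utc)
records = [
    {"timestamp": start, "manifest_version": 43},  # on the lower bound: kept
    {"timestamp": datetime(2026, 5, 3, 9, 0, tzinfo=timezone.utc), "manifest_version": 42},
    {"timestamp": datetime(2026, 5, 4, 9, 0, tzinfo=timezone.utc), "manifest_version": 43},
    {"timestamp": end, "manifest_version": 44},  # on the upper bound: dropped
]

matched = [r for r in records if in_window(r["timestamp"], start, end)]
print(len(matched))
print(observed_manifest_versions(matched))
print(format_executed_at(datetime(2026, 5, 8, 14, 22, 13, 421000, tzinfo=timezone.utc)))
```

Because the upper bound is exclusive, adjacent windows can be analyzed without counting a boundary record twice.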

Citation format

Agents and tools that cite a finding from exd telemetry MUST produce a citation that allows a human or another agent to re-run the same analysis. The recommended citation format:

[T001 SRM] checkout-redesign control/variant_b 48.7%/51.3% n=12480 p=0.004
source: s3://acme-data/exd/checkout/2026-05-08/
window: 2026-05-01T14:22:13Z .. 2026-05-08T14:22:13Z
manifest_versions: [42, 43]
reproduce: exd telemetry srm --flag checkout-redesign --since 7d \
    --thresholds queries/thresholds.toml@a1b2c3d

Agent-generated commit messages (see agent-policies) embed a structured trailer carrying the same fields.
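
As an illustration of how a tool might assemble such a citation from a provenance block, here is a minimal Python sketch. The function name format_citation is hypothetical. The headline line is query-specific, so it is passed in as a ready-made string. Joining multiple sources with a comma is an assumption, and the reproduce line simply echoes tool_invocation without appending any thresholds flag.

```python
def format_citation(headline: str, provenance: dict) -> str:
    """Render the recommended citation lines from a parsed provenance block."""
    window = provenance["time_range"]
    lines = [
        headline,
        "source: " + ", ".join(provenance["source"]),
        f"window: {window['from']} .. {window['to']}",
        f"manifest_versions: {provenance['manifest_versions_observed']}",
        f"reproduce: {provenance['tool_invocation']}",
    ]
    return "\n".join(lines)


provenance = {
    "source": ["s3://acme-data/exd/checkout/2026-05-08/"],
    "time_range": {"from": "2026-05-01T14:22:13Z", "to": "2026-05-08T14:22:13Z"},
    "manifest_versions_observed": [42, 43],
    "tool_invocation": "exd telemetry srm --flag checkout-redesign --since 7d",
}
citation = format_citation(
    "[T001 SRM] checkout-redesign control/variant_b 48.7%/51.3% n=12480 p=0.004",
    provenance,
)
print(citation)
```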


Verification

A reviewer presented with an agent finding can verify it by:

  1. Checking out the manifest repo at the cited commit (which determines the query version and the thresholds version).
  2. Running the cited tool_invocation against the cited source URI and time_range.
  3. Comparing record_count and manifest_versions_observed. Both MUST match.
  4. Comparing the result and diagnostics. Both MUST match modulo numerical precision.

A finding that does not reproduce indicates either data mutation (records were rewritten or deleted) or exd version drift. Both are auditable.
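
Steps 3 and 4 of the verification procedure can be sketched in Python as follows. This is an illustration rather than exd's own verifier: reproduction_mismatches and approx_equal are hypothetical names, the relative tolerance of 1e-9 is an arbitrary choice for "modulo numerical precision", and the p_value key in the example result is made up.

```python
import math


def approx_equal(a, b, rel_tol: float = 1e-9) -> bool:
    """Deep equality in which numbers are compared with a relative tolerance."""
    if isinstance(a, bool) or isinstance(b, bool):
        return a == b  # never treat booleans as numbers
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isclose(a, b, rel_tol=rel_tol)
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(approx_equal(a[k], b[k], rel_tol) for k in a)
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(approx_equal(x, y, rel_tol) for x, y in zip(a, b))
    return a == b


def reproduction_mismatches(cited: dict, rerun: dict, rel_tol: float = 1e-9) -> list[str]:
    """Name every part of the envelope that fails to reproduce."""
    mismatches = []
    # record_count and manifest_versions_observed MUST match exactly.
    for field in ("record_count", "manifest_versions_observed"):
        if cited["provenance"][field] != rerun["provenance"][field]:
            mismatches.append(f"provenance.{field}")
    # result and diagnostics MUST match modulo numerical precision.
    for field in ("result", "diagnostics"):
        if not approx_equal(cited[field], rerun[field], rel_tol):
            mismatches.append(field)
    return mismatches


cited = {
    "result": {"p_value": 0.004},
    "diagnostics": [],
    "provenance": {"record_count": 12480, "manifest_versions_observed": [42, 43]},
}
faithful = {
    "result": {"p_value": 0.004 + 1e-15},  # floating-point noise only
    "diagnostics": [],
    "provenance": {"record_count": 12480, "manifest_versions_observed": [42, 43]},
}
mutated = {
    "result": {"p_value": 0.01},
    "diagnostics": [],
    "provenance": {"record_count": 12479, "manifest_versions_observed": [42, 43]},
}

print(reproduction_mismatches(cited, faithful))
print(reproduction_mismatches(cited, mutated))
```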


Implementation note

The reference implementation populates the provenance block from a single ProvenanceContext struct that flows through every query execution. No query may produce JSON output without going through this context, so the invariant cannot be violated by accident.
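
The pattern can be illustrated with a short Python sketch. This is not the reference implementation, whose ProvenanceContext layout is not shown here. The field list below mirrors the provenance block fields, and render_envelope is a hypothetical function that refuses to serialize an envelope unless it is handed a context.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceContext:
    """Illustrative stand-in; fields mirror the provenance block."""
    source: list
    time_range: dict
    manifest_versions_observed: list
    executed_at: str
    engine: str
    engine_version: str
    record_count: int
    thresholds_source: str
    tool_invocation: str


def render_envelope(ctx, *, query, query_version, schema_version,
                    exd_version, inputs, result, diagnostics) -> str:
    """The only path to JSON output; it cannot be called without a context."""
    if not isinstance(ctx, ProvenanceContext):
        raise TypeError("JSON output requires a ProvenanceContext")
    return json.dumps({
        "query": query,
        "query_version": query_version,
        "schema_version": schema_version,
        "exd_version": exd_version,
        "inputs": inputs,
        "result": result,
        "diagnostics": diagnostics,
        "provenance": asdict(ctx),
    })


ctx = ProvenanceContext(
    source=["s3://acme-data/exd/checkout/2026-05-08/"],
    time_range={"from": "2026-05-01T14:22:13Z", "to": "2026-05-08T14:22:13Z"},
    manifest_versions_observed=[42, 43],
    executed_at="2026-05-08T14:22:13.421Z",
    engine="duckdb",
    engine_version="1.0.0",
    record_count=12480,
    thresholds_source="built-in-defaults",
    tool_invocation="exd telemetry srm --flag checkout-redesign --since 7d",
)
output = render_envelope(
    ctx, query="telemetry.srm", query_version=1, schema_version=1,
    exd_version="0.5.0", inputs={"flag": "checkout-redesign", "since": "7d"},
    result={}, diagnostics=[],
)
print(json.loads(output)["provenance"]["engine"])
```

Routing all serialization through one function that demands the context turns the invariant into a structural property instead of a convention each query has to remember.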


See also