Provenance
Every JSON output produced by exd telemetry MUST include a structured provenance block that allows the output to be cited reproducibly. This is a core invariant of the telemetry surface.
Provenance enables three properties the system depends on:
- Reproducibility. Any human or agent reading a finding can re-run the exact query against the exact data and reproduce the exact numbers.
- Citation chains. Agent-produced artifacts (commits, PR descriptions, alert messages) cite the provenance block of the query that produced their findings. A reviewer can trace any agent claim back to the data that supports it.
- Drift detection. Two findings produced from the same input data differ in their provenance only by executed_at. Findings whose provenance diverges in any other field are not comparable.
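The drift-detection rule can be sketched as a comparison of two provenance blocks that ignores executed_at. This is a minimal illustration, not part of exd: the function names diverging_fields and comparable are hypothetical, and the provenance blocks are assumed to be already parsed into dictionaries.

```python
from typing import Any, Mapping

# The only field allowed to differ between two comparable findings.
VOLATILE_FIELDS = {"executed_at"}

# Sentinel so that a missing field is never treated as equal to an explicit null.
_MISSING = object()


def diverging_fields(a: Mapping[str, Any], b: Mapping[str, Any]) -> list[str]:
    """Return the provenance fields, other than executed_at, on which a and b differ."""
    keys = (set(a) | set(b)) - VOLATILE_FIELDS
    return sorted(k for k in keys if a.get(k, _MISSING) != b.get(k, _MISSING))


def comparable(a: Mapping[str, Any], b: Mapping[str, Any]) -> bool:
    """Two findings are comparable when nothing but executed_at differs."""
    return not diverging_fields(a, b)
```

Reporting the diverging field names, rather than only a boolean, lets a reviewer see at a glance whether the mismatch comes from the data (source, record_count) or from the tooling (engine_version, thresholds_source).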
JSON envelope
The full envelope returned by every exd telemetry --format json invocation:
{
  "query": "telemetry.srm",
  "query_version": 1,
  "schema_version": 1,
  "exd_version": "0.5.0",
  "inputs": {
    "flag": "checkout-redesign",
    "since": "7d",
    "expected": { "control": 0.5, "variant_b": 0.5 }
  },
  "result": { ... },
  "diagnostics": [ ... ],
  "provenance": {
    "source": ["s3://acme-data/exd/checkout/2026-05-08/"],
    "time_range": {
      "from": "2026-05-01T14:22:13Z",
      "to": "2026-05-08T14:22:13Z"
    },
    "manifest_versions_observed": [42, 43],
    "executed_at": "2026-05-08T14:22:13.421Z",
    "engine": "duckdb",
    "engine_version": "1.0.0",
    "record_count": 12480,
    "thresholds_source": "queries/thresholds.toml@a1b2c3d",
    "tool_invocation": "exd telemetry srm --flag checkout-redesign --since 7d"
  }
}
The same envelope is used by exd eval, exd explain, exd fixtures, and exd schema. For those commands, provenance.engine is one of static, static+ctx, or static+synth, and record_count is 0.
Top-level envelope fields
| Field | Required | Type | Description |
|---|---|---|---|
| query | yes | string | The query name. Built-in queries use the canonical telemetry.* form. User-defined queries use the name field from the query file. |
| query_version | yes | integer | The query's version field. For built-in queries, the spec version of the implementation. |
| schema_version | yes | integer | The EvaluationRecord schema version the install reads. Differs from query_version; the same query at version 1 may consume records of any compatible schema version. |
| exd_version | yes | string | The exd-client version. |
| inputs | yes | object | The fully-resolved parameter map passed to the query. Defaults applied. References (${name}) substituted. |
| result | yes | object | Query-specific. Schema available via the capabilities manifest. |
| diagnostics | yes | array | Zero or more diagnostic entries (see diagnostics § Diagnostic structure). |
| provenance | yes | object | The provenance block (below). |
A query that produced no result for structural reasons (e.g., insufficient data) still emits a complete envelope; result is the documented "empty" shape for that query.
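A consumer can check the top-level contract from the table above before reading any findings. The sketch below is one way to do that in Python; envelope_problems is a hypothetical helper name, and the mapping from the table's types to Python types (object to dict, array to list) is an assumption about how the JSON was parsed.

```python
from typing import Any, Mapping

# Required top-level fields and the Python type each JSON type parses to.
REQUIRED_ENVELOPE_FIELDS: dict[str, type] = {
    "query": str,
    "query_version": int,
    "schema_version": int,
    "exd_version": str,
    "inputs": dict,
    "result": dict,
    "diagnostics": list,
    "provenance": dict,
}


def envelope_problems(envelope: Mapping[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the envelope is well formed."""
    problems = []
    for name, expected in REQUIRED_ENVELOPE_FIELDS.items():
        if name not in envelope:
            problems.append(f"missing field: {name}")
            continue
        value = envelope[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

Because an empty result still arrives in a complete envelope, this check is expected to pass even when the query found insufficient data.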
Provenance block fields
| Field | Required | Type | Description |
|---|---|---|---|
| source | yes | array of string | Input source URIs and/or paths, in canonical form. Globs are expanded; relative paths are made absolute. Entries are sorted lexicographically for stability. |
| time_range.from | yes | RFC 3339 | Inclusive lower bound applied to records. |
| time_range.to | yes | RFC 3339 | Exclusive upper bound applied to records. |
| manifest_versions_observed | yes | array of integer | Sorted, deduplicated set of manifest_version values present in the analyzed records. Empty array if no records matched. |
| executed_at | yes | RFC 3339 | Wall-clock time the command produced its result. UTC, millisecond precision. |
| engine | yes | string | Query engine. One of duckdb, snowflake, bigquery, databricks, redshift, inmemory, static, static+ctx, static+synth. |
| engine_version | yes | string | Version string of the engine. |
| record_count | yes | integer | Number of records considered after window and other filters. 0 for static read-side commands. |
| thresholds_source | yes | string | Either built-in-defaults, or <path> when an explicit --thresholds <path> was passed on the CLI (with @<git-sha> appended if the install can identify the file's git commit). The queries/thresholds.toml@<git-sha> form is deferred indefinitely: the manifest-tree threshold file has not landed in Phase 1. |
| tool_invocation | yes | string | The command line as parsed (canonicalized, quoted where needed). Used for replay. |
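Several of the constraints in this table are mechanical and can be checked by a consumer. The following is a hedged sketch rather than the reference validator: provenance_problems and parse_utc are hypothetical names, and the block is assumed to contain every required field already.

```python
from datetime import datetime, timedelta
from typing import Any, Mapping

# Engine names listed in the provenance field table.
ENGINES = {
    "duckdb", "snowflake", "bigquery", "databricks", "redshift",
    "inmemory", "static", "static+ctx", "static+synth",
}


def parse_utc(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp such as 2026-05-08T14:22:13.421Z."""
    # Older Python versions reject a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def provenance_problems(p: Mapping[str, Any]) -> list[str]:
    """Return violations of the field constraints; an empty list means none were found."""
    problems = []
    if p["engine"] not in ENGINES:
        problems.append(f"unknown engine: {p['engine']}")
    if list(p["source"]) != sorted(p["source"]):
        problems.append("source is not sorted lexicographically")
    versions = list(p["manifest_versions_observed"])
    if versions != sorted(set(versions)):
        problems.append("manifest_versions_observed is not sorted and deduplicated")
    # A timestamp without an offset yields None here and is also reported.
    if parse_utc(p["executed_at"]).utcoffset() != timedelta(0):
        problems.append("executed_at is not in UTC")
    if p["engine"].startswith("static") and p["record_count"] != 0:
        problems.append("record_count must be 0 for static engines")
    return problems
```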
Citation format
Agents and tools that cite a finding from exd telemetry MUST produce a citation that allows a human or another agent to re-run the same analysis. The recommended citation format:
[T001 SRM] checkout-redesign control/variant_b 48.7%/51.3% n=12480 p=0.004
source: s3://acme-data/exd/checkout/2026-05-08/
window: 2026-05-01T14:22:13Z .. 2026-05-08T14:22:13Z
manifest_versions: [42, 43]
reproduce: exd telemetry srm --flag checkout-redesign --since 7d \
--thresholds queries/thresholds.toml@a1b2c3d
Agent-generated commit messages (see agent-policies) embed a structured trailer carrying the same fields.
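The recommended citation format can be rendered directly from a provenance block. The sketch below assumes the headline line (diagnostic code, flag, and summary statistics) is supplied by the caller, since it depends on the query-specific result; format_citation is a hypothetical helper, and joining multiple sources with a comma is an assumption. It reproduces tool_invocation verbatim and does not append a --thresholds argument.

```python
from typing import Any, Mapping


def format_citation(headline: str, provenance: Mapping[str, Any]) -> str:
    """Render a citation: the caller's headline followed by the provenance lines."""
    window = provenance["time_range"]
    lines = [
        headline,
        "source: " + ", ".join(provenance["source"]),
        f"window: {window['from']} .. {window['to']}",
        # A list of integers renders as [42, 43], matching the example format.
        f"manifest_versions: {list(provenance['manifest_versions_observed'])}",
        f"reproduce: {provenance['tool_invocation']}",
    ]
    return "\n".join(lines)
```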
Verification
A reviewer presented with an agent finding can verify it by:
- Checking out the manifest repo at the cited commit (which determines the query version and the thresholds version).
- Running the cited tool_invocation against the cited source URI and time_range.
- Comparing record_count and manifest_versions_observed. Both MUST match.
- Comparing the result and diagnostics. Both MUST match modulo numerical precision.
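The comparison steps above can be sketched as follows, given the cited envelope and the envelope from the re-run as parsed dictionaries. The names approx_equal and reproduces are hypothetical, and the relative tolerance of 1e-9 is an assumed stand-in for "numerical precision", which this page does not quantify.

```python
import math
from typing import Any, Mapping


def approx_equal(a: Any, b: Any, rel_tol: float = 1e-9) -> bool:
    """Deep equality in which numbers may differ within a relative tolerance."""
    if isinstance(a, bool) or isinstance(b, bool):
        # Booleans must match exactly and must not be confused with 0 or 1.
        return type(a) is type(b) and a == b
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isclose(a, b, rel_tol=rel_tol)
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(
            approx_equal(a[k], b[k], rel_tol) for k in a
        )
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(
            approx_equal(x, y, rel_tol) for x, y in zip(a, b)
        )
    return a == b


def reproduces(cited: Mapping[str, Any], rerun: Mapping[str, Any],
               rel_tol: float = 1e-9) -> bool:
    """True when the re-run matches the cited finding under the verification rules."""
    cp, rp = cited["provenance"], rerun["provenance"]
    # record_count and manifest_versions_observed must match exactly.
    exact = all(
        cp[k] == rp[k] for k in ("record_count", "manifest_versions_observed")
    )
    return (
        exact
        and approx_equal(cited["result"], rerun["result"], rel_tol)
        and approx_equal(cited["diagnostics"], rerun["diagnostics"], rel_tol)
    )
```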
A finding that does not reproduce indicates either a data mutation (records were rewritten or deleted) or exd version drift. Both are auditable.
Implementation note
The reference implementation populates the provenance block from a single ProvenanceContext struct that flows through every query execution. No query may produce a JSON output without going through this context, ensuring the invariant cannot be violated by accident.
See also
- capabilities: query_version and schema_version correspond to entries in the capabilities manifest.
- diagnostics: every diagnostics[] entry in the envelope is one of these T-codes.
- agent-policies: agent-produced commits cite this envelope.
- reference/cli/conventions § JSON output: the CLI-facing summary of the same envelope.