Thresholds

Thresholds parameterize the conditions under which T-codes fire. They are policy applied to data; both the verdict and the underlying numbers always appear in query output. An agent or human reviewing a finding sees which threshold tripped and what value tripped it, and can apply a different policy by pointing --thresholds <path> at a different file.

Implementation status — partially deferred. Threshold values (the category schema, the built-in defaults, the per-T-code semantics) are part of Phase 1. Storing the threshold file at queries/thresholds.toml in the manifest tree is deferred indefinitely. In Phase 1 the only override path is the CLI flag --thresholds <path>. The Q050–Q054 lint codes that would validate a manifest-tree thresholds file are deferred with it; the range stays reserved.

File location and discovery

The reference Rust implementation searches in this order:

1. The path passed via the CLI --thresholds <path> flag.
2. (Deferred indefinitely.) queries/thresholds.toml in the current manifest repo. The manifest-tree discovery rule is not in Phase 1; an operator who wants to version-control their thresholds may still do so at any path of their choosing and pass it via the CLI flag.
3. Built-in defaults compiled into exd-client.

The resolved source is recorded in the provenance block's thresholds_source field for every output.
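The Phase 1 search order above can be sketched as follows. This is an illustration only, not the reference Rust implementation: the function name `resolve_thresholds_source`, the `"builtin"` label, and the choice to raise `FileNotFoundError` for a missing path are all assumptions made for the example.

```python
from pathlib import Path
from typing import Optional

# Hypothetical label for the defaults compiled into the client.
BUILTIN = "builtin"


def resolve_thresholds_source(cli_path: Optional[str]) -> str:
    """Return the thresholds source under the Phase 1 discovery order."""
    # Step 1: an explicit --thresholds <path> always wins.
    if cli_path is not None:
        path = Path(cli_path)
        if not path.is_file():
            # Assumption: a missing file is an error rather than a
            # silent fall-through to the built-in defaults.
            raise FileNotFoundError(cli_path)
        return str(path)
    # Step 2 (queries/thresholds.toml in the manifest repo) is deferred,
    # so it is skipped entirely here.
    # Step 3: fall back to the built-in defaults.
    return BUILTIN
```

Whatever this function returns is the kind of value that would be recorded in `thresholds_source`; the exact format of that field is not specified here.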

File schema

# thresholds.toml

[srm]
significance_level = 0.001
min_sample_size = 100

[lift]
significance_level = 0.05
min_per_variant_n = 1000

[evaluation_rate_anomaly]
drop_fraction = 0.5
window = "1h"

[manifest_skew]
max_versions_behind = 2

[dead_flag]
min_sample_size = 100

[rule]
min_evaluation_count = 100

[context]
min_evaluation_count = 100

Top-level tables correspond to threshold categories. Each category controls one or more diagnostics.

Category reference

`[srm]`

| Field | Type | Default | Controls |
| --- | --- | --- | --- |
| `significance_level` | float `(0, 1)` | `0.001` | Maximum chi-square p-value for `T001` to fire. |
| `min_sample_size` | integer ≥ 1 | `100` | Total sample size below which `T002` fires (preempting `T001`). |

The default significance_level = 0.001 is intentionally stricter than the conventional 0.05 to limit false positives — SRM tests at scale evaluate many flags continuously, and a lax threshold floods agents with low-signal warnings.
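To make the interaction between the two `[srm]` fields concrete, here is a sketch of a two-variant SRM check. It is not the reference implementation: the function name, the two-variant restriction, the expected 50/50 split, and the strict `<` comparison against `significance_level` are assumptions. The p-value uses the closed form for a chi-square statistic with one degree of freedom, `erfc(sqrt(x / 2))`.

```python
import math


def srm_verdict(observed_a, observed_b, expected_ratio_a=0.5,
                significance_level=0.001, min_sample_size=100):
    """Return (t_code_or_None, p_value_or_None) for a two-variant split."""
    total = observed_a + observed_b
    # T002 preempts T001 when the total sample is too small to test.
    if total < min_sample_size:
        return "T002", None
    expected_a = total * expected_ratio_a
    expected_b = total - expected_a
    chi2 = ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Assumption: T001 fires when the p-value is strictly below the threshold.
    return ("T001" if p_value < significance_level else None), p_value
```

With the defaults, a 5300/4700 split gives a chi-square statistic of 36 and fires `T001`, while a 5050/4950 split gives a statistic of 1 (p-value about 0.32) and does not.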

`[lift]`

| Field | Type | Default | Controls |
| --- | --- | --- | --- |
| `significance_level` | float `(0, 1)` | `0.05` | Reserved for future lift-related diagnostics. |
| `min_per_variant_n` | integer ≥ 1 | `1000` | Minimum per-variant sample size for any lift estimation. Below this, `T002` fires. |

`[evaluation_rate_anomaly]`

| Field | Type | Default | Controls |
| --- | --- | --- | --- |
| `drop_fraction` | float `(0, 1)` | `0.5` | Magnitude of relative change between current and prior window for `T007` to fire. |
| `window` | duration | `1h` | Default comparison window when `--compare-to` is given without an explicit value. |
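One possible reading of `drop_fraction` is sketched below: compute the relative change in evaluation count between the prior and current windows and compare its magnitude to the threshold. The function name, the `>=` comparison, and the handling of an empty prior window are assumptions, not behavior confirmed by this section.

```python
def rate_anomaly(current_count, prior_count, drop_fraction=0.5):
    """Return (fires, relative_change) for a T007-style comparison."""
    if prior_count == 0:
        # Assumption: relative change is undefined with an empty prior
        # window, so the diagnostic does not fire.
        return False, None
    relative_change = (current_count - prior_count) / prior_count
    # "Magnitude" is read here as the absolute value of the change.
    return abs(relative_change) >= drop_fraction, relative_change
```

Under this reading, a fall from 1000 to 400 evaluations is a relative change of -0.6 and fires at the default threshold, while a fall from 1000 to 900 does not.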

`[manifest_skew]`

| Field | Type | Default | Controls |
| --- | --- | --- | --- |
| `max_versions_behind` | integer ≥ 1 | `2` | Number of versions behind newest before `T004` fires. |

`[dead_flag]`

| Field | Type | Default | Controls |
| --- | --- | --- | --- |
| `min_sample_size` | integer ≥ 1 | `100` | Minimum evaluation count required for `T003` to fire. |

`[rule]`

| Field | Type | Default | Controls |
| --- | --- | --- | --- |
| `min_evaluation_count` | integer ≥ 1 | `100` | Minimum flag evaluation count required for `T005` and `T009` to fire. |

`[context]`

| Field | Type | Default | Controls |
| --- | --- | --- | --- |
| `min_evaluation_count` | integer ≥ 1 | `100` | Minimum evaluation count for `T006` to fire. |

Lint rules for `thresholds.toml`

Deferred indefinitely with the manifest-tree thresholds file. In Phase 1 a malformed --thresholds <path> file produces a CLI error (exit code 2), not a lint diagnostic; the codes below are reserved for the day the manifest-tree path lands.

The thresholds file would be subject to the same lint pipeline as other manifest files. Diagnostics:

| Code | Severity | Triggers when |
| --- | --- | --- |
| `Q050` | error | `thresholds.toml` contains an unknown top-level table. |
| `Q051` | error | A field has the wrong type for its category (e.g., `significance_level = "tight"`). |
| `Q052` | error | A `significance_level` is not in `(0, 1)`. |
| `Q053` | error | An integer threshold is `< 1`. |
| `Q054` | warning | A `significance_level` is `≥ 0.1`, which is unusually permissive and likely to produce false positives. |

These diagnostics sit in the Q- namespace because they apply to a TOML manifest-tree file lint-validated at upload time, not to an analysis result.
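Because these codes are deferred, no implementation exists to point at; the following is only a sketch of how the five rules could be applied to an already-parsed thresholds file. `SCHEMA`, `lint_thresholds`, and the `(code, severity, message)` tuple shape are hypothetical, and treating a non-table top-level value as `Q051` is an assumption. Fields not named in the category reference are ignored because none of the listed codes covers them.

```python
# Expected Python type per field, following the category reference.
SCHEMA = {
    "srm": {"significance_level": float, "min_sample_size": int},
    "lift": {"significance_level": float, "min_per_variant_n": int},
    "evaluation_rate_anomaly": {"drop_fraction": float, "window": str},
    "manifest_skew": {"max_versions_behind": int},
    "dead_flag": {"min_sample_size": int},
    "rule": {"min_evaluation_count": int},
    "context": {"min_evaluation_count": int},
}


def lint_thresholds(data: dict) -> list:
    """Return (code, severity, message) tuples for a parsed thresholds file."""
    findings = []
    for table, fields in data.items():
        if table not in SCHEMA:
            findings.append(("Q050", "error", f"unknown table [{table}]"))
            continue
        if not isinstance(fields, dict):
            # Assumption: a scalar where a table is expected is a type error.
            findings.append(("Q051", "error", f"{table} is not a table"))
            continue
        for name, value in fields.items():
            expected = SCHEMA[table].get(name)
            if expected is None:
                continue  # unknown fields are not covered by the listed codes
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                findings.append(("Q051", "error", f"{table}.{name} has wrong type"))
            elif name == "significance_level" and not 0 < value < 1:
                findings.append(("Q052", "error", f"{table}.{name} not in (0, 1)"))
            elif name == "significance_level" and value >= 0.1:
                findings.append(("Q054", "warning", f"{table}.{name} is permissive"))
            elif expected is int and value < 1:
                findings.append(("Q053", "error", f"{table}.{name} is < 1"))
    return findings
```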

Threshold versioning

Today the thresholds file is a stand-alone file passed via --thresholds <path>; under the deferred manifest-tree path it would be part of the manifest. In both cases it is versioned by git history, not by an internal version field. Findings cite the thresholds file's git commit (recorded in provenance.thresholds_source) so that results stay reproducible even when thresholds change between analyses.

A push that modifies thresholds.toml MAY surface different diagnostics on subsequent runs against the same data. This is intentional: thresholds are policy, and changing policy is meant to surface or suppress observations.

Recommendations for tuning

Informative, not normative.

Start with defaults. The defaults are deliberately conservative.
Tighten srm.significance_level further (0.0001 or below) for very high-traffic experiments where the default still produces too many warnings.
Raise manifest_skew.max_versions_behind to 5 or higher in environments with deliberately staggered SDK rollouts (e.g., mobile apps with slow update adoption).
Lower dead_flag.min_sample_size for low-volume flag namespaces where, at the default value, T003 would never fire.
Avoid setting any min_* threshold to 1 — single-observation diagnostics are almost always noise.
