Thresholds
Thresholds parameterize the conditions under which T-codes fire. They are policy applied to data; both the verdict and the underlying numbers always appear in query output. An agent or human reviewing a finding sees which threshold tripped and what value tripped it, and can apply a different policy by pointing --thresholds <path> at a different file.
Implementation status: partially deferred. Threshold values (the category schema, the built-in defaults, the per-T-code semantics) are part of Phase 1. Storing the threshold file at queries/thresholds.toml in the manifest tree is deferred indefinitely. In Phase 1 the only override path is the CLI flag --thresholds <path>. The Q050–Q054 lint codes that would validate a manifest-tree thresholds file are deferred with it; the range stays reserved.
File location and discovery
The reference Rust implementation searches in this order:
- The path passed via the CLI --thresholds <path> flag.
- (Deferred indefinitely.) queries/thresholds.toml in the current manifest repo. The manifest-tree discovery rule is not in Phase 1; an operator who wants to version-control their thresholds may still do so at any path of their choosing and pass it via the CLI flag.
- Built-in defaults compiled into exd-client.
The resolved source is recorded in the provenance block's thresholds_source field for every output.
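The Phase 1 search order can be sketched as follows. This is a Python illustration, not the Rust reference implementation: the function name, the "builtin" label, and the use of the file path as the recorded source are all assumptions, since the exact format of thresholds_source is not given here.

```python
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical label for the compiled-in defaults; the real value
# written to provenance.thresholds_source is not specified in this section.
BUILTIN_SOURCE = "builtin"


def resolve_thresholds_source(cli_path: Optional[str]) -> Tuple[Optional[Path], str]:
    """Return (file to load, or None for defaults) and a thresholds_source string."""
    # 1. An explicit --thresholds <path> always wins.
    if cli_path is not None:
        path = Path(cli_path)
        if not path.is_file():
            # Assumption: an unusable --thresholds file is a CLI error, not a silent fallback.
            raise FileNotFoundError(f"--thresholds file not found: {path}")
        return path, str(path)
    # 2. queries/thresholds.toml in the manifest repo is deferred, so it is not consulted.
    # 3. Otherwise fall back to the built-in defaults.
    return None, BUILTIN_SOURCE
```

Raising on a missing file instead of falling through to the defaults keeps the recorded source honest: the output never claims a file that was not actually applied.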
File schema
# thresholds.toml
[srm]
significance_level = 0.001
min_sample_size = 100
[lift]
significance_level = 0.05
min_per_variant_n = 1000
[evaluation_rate_anomaly]
drop_fraction = 0.5
window = "1h"
[manifest_skew]
max_versions_behind = 2
[dead_flag]
min_sample_size = 100
[rule]
min_evaluation_count = 100
[context]
min_evaluation_count = 100
Top-level tables correspond to threshold categories. Each category controls one or more diagnostics.
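As an illustration of how a thresholds file might combine with the built-in defaults, the sketch below overlays a parsed file on the default values listed above. The field-by-field fallback is an assumption (this section does not say whether a partial file inherits the remaining defaults), and the names DEFAULTS and effective_thresholds are hypothetical. The overrides argument is a plain dict such as the one Python's tomllib.load returns.

```python
import copy

# Built-in defaults, copied from the file schema above.
DEFAULTS = {
    "srm": {"significance_level": 0.001, "min_sample_size": 100},
    "lift": {"significance_level": 0.05, "min_per_variant_n": 1000},
    "evaluation_rate_anomaly": {"drop_fraction": 0.5, "window": "1h"},
    "manifest_skew": {"max_versions_behind": 2},
    "dead_flag": {"min_sample_size": 100},
    "rule": {"min_evaluation_count": 100},
    "context": {"min_evaluation_count": 100},
}


def effective_thresholds(overrides: dict) -> dict:
    """Overlay a parsed thresholds file on the defaults (assumed behaviour)."""
    merged = copy.deepcopy(DEFAULTS)  # never mutate the shared defaults
    for category, fields in overrides.items():
        if category not in merged:
            raise ValueError(f"unknown threshold category: {category}")
        merged[category].update(fields)
    return merged
```

For example, a file containing only an [srm] table with significance_level = 0.0001 would, under this assumption, keep min_sample_size = 100 and leave every other category at its default.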
Category reference
[srm]
| Field | Type | Default | Controls |
|---|---|---|---|
| significance_level | float (0, 1) | 0.001 | Maximum chi-square p-value for T001 to fire. |
| min_sample_size | integer ≥ 1 | 100 | Total sample size below which T002 fires (preempting T001). |
The default significance_level = 0.001 is intentionally stricter than the conventional 0.05 to limit false positives — SRM tests at scale evaluate many flags continuously, and a lax threshold floods agents with low-signal warnings.
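A minimal sketch of how the two [srm] fields could interact, assuming a two-variant experiment with a known expected split. The function name, the parameters, and the inclusive comparison at the boundary are assumptions; only the T-codes and the defaults come from the table above.

```python
import math
from typing import Optional


def srm_verdict(
    control_n: int,
    treatment_n: int,
    expected_control_share: float = 0.5,  # assumed to lie strictly between 0 and 1
    significance_level: float = 0.001,
    min_sample_size: int = 100,
) -> Optional[str]:
    """Return "T002", "T001", or None for a two-variant sample ratio check."""
    total = control_n + treatment_n
    # T002 preempts T001: too little data to test the ratio at all.
    if total < min_sample_size:
        return "T002"
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (control_n - expected_control) ** 2 / expected_control
        + (treatment_n - expected_treatment) ** 2 / expected_treatment
    )
    # Two variants give one degree of freedom, where the chi-square
    # survival function reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return "T001" if p_value <= significance_level else None
```

With the defaults, a 5000 / 5100 split stays quiet while a 5000 / 5600 split fires T001. Experiments with more than two variants need a general chi-square survival function, which this sketch does not provide.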
[lift]
| Field | Type | Default | Controls |
|---|---|---|---|
| significance_level | float (0, 1) | 0.05 | Reserved for future lift-related diagnostics. |
| min_per_variant_n | integer ≥ 1 | 1000 | Minimum per-variant sample size for any lift estimation. Below this, T002 fires. |
[evaluation_rate_anomaly]
| Field | Type | Default | Controls |
|---|---|---|---|
| drop_fraction | float (0, 1) | 0.5 | Magnitude of relative change between current and prior window for T007 to fire. |
| window | duration | 1h | Default comparison window when --compare-to is given without an explicit value. |
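The drop_fraction comparison can be sketched as below. The function name, the count-based inputs, the inclusive boundary, and the handling of an empty prior window are assumptions. Because the table describes a magnitude, the sketch treats a rise of the same size as a drop, despite the field's name.

```python
from typing import Optional


def evaluation_rate_verdict(
    current_count: int,
    prior_count: int,
    drop_fraction: float = 0.5,
) -> Optional[str]:
    """Return "T007" when the relative change between windows reaches drop_fraction."""
    if prior_count == 0:
        # No baseline to compare against; reported as no verdict (an assumption).
        return None
    relative_change = (current_count - prior_count) / prior_count
    return "T007" if abs(relative_change) >= drop_fraction else None
```

With the default of 0.5, a fall from 1000 to 400 evaluations fires T007, while a fall from 1000 to 900 does not.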
[manifest_skew]
| Field | Type | Default | Controls |
|---|---|---|---|
| max_versions_behind | integer ≥ 1 | 2 | Number of versions behind newest before T004 fires. |
[dead_flag]
| Field | Type | Default | Controls |
|---|---|---|---|
| min_sample_size | integer ≥ 1 | 100 | Minimum evaluation count required for T003 to fire. |
[rule]
| Field | Type | Default | Controls |
|---|---|---|---|
| min_evaluation_count | integer ≥ 1 | 100 | Minimum flag evaluation count required for T005 and T009 to fire. |
[context]
| Field | Type | Default | Controls |
|---|---|---|---|
| min_evaluation_count | integer ≥ 1 | 100 | Minimum evaluation count for T006 to fire. |
Lint rules for thresholds.toml
Deferred indefinitely with the manifest-tree thresholds file. In Phase 1 a malformed --thresholds <path> file produces a CLI error (exit code 2), not a lint diagnostic; the codes below are reserved for the day the manifest-tree path lands.
The thresholds file would be subject to the same lint pipeline as other manifest files. Diagnostics:
| Code | Severity | Triggers when |
|---|---|---|
| Q050 | error | thresholds.toml contains an unknown top-level table. |
| Q051 | error | A field has the wrong type for its category (e.g., significance_level = "tight"). |
| Q052 | error | A significance_level is not in (0, 1). |
| Q053 | error | An integer threshold is < 1. |
| Q054 | warning | A significance_level is ≥ 0.1, which is unusually permissive and likely to produce false positives. |
These diagnostics sit in the Q- namespace because they apply to a TOML manifest-tree file lint-validated at upload time, not to an analysis result.
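Since these codes are reserved rather than implemented, the following is only a sketch of how the table above might translate into checks over an already-parsed file. The SCHEMA mapping, the lint_thresholds name, and the (code, severity, location) tuple shape are hypothetical. Treating an integer literal in a float field as Q051 is also an assumption.

```python
# Expected Python type per field, derived from the category reference.
SCHEMA = {
    "srm": {"significance_level": float, "min_sample_size": int},
    "lift": {"significance_level": float, "min_per_variant_n": int},
    "evaluation_rate_anomaly": {"drop_fraction": float, "window": str},
    "manifest_skew": {"max_versions_behind": int},
    "dead_flag": {"min_sample_size": int},
    "rule": {"min_evaluation_count": int},
    "context": {"min_evaluation_count": int},
}


def lint_thresholds(doc: dict) -> list:
    """Return (code, severity, location) tuples for a parsed thresholds file."""
    findings = []
    for category, fields in doc.items():
        schema = SCHEMA.get(category)
        if schema is None:
            findings.append(("Q050", "error", category))
            continue
        if not isinstance(fields, dict):
            findings.append(("Q051", "error", category))
            continue
        for name, value in fields.items():
            expected = schema.get(name)
            if expected is None:
                continue  # the table reserves no code for unknown fields
            location = f"{category}.{name}"
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, expected):
                findings.append(("Q051", "error", location))
            elif name == "significance_level":
                if not 0 < value < 1:
                    findings.append(("Q052", "error", location))
                elif value >= 0.1:
                    findings.append(("Q054", "warning", location))
            elif expected is int and value < 1:
                findings.append(("Q053", "error", location))
    return findings
```

The table assigns no code to an out-of-range drop_fraction or to an unknown field inside a known table, so the sketch leaves both unflagged.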
Threshold versioning
The thresholds file is part of the manifest (under the deferred manifest-tree path) or a stand-alone file (today). It is versioned by git history, not by an internal version field. Findings cite the thresholds file's git commit (recorded in provenance.thresholds_source) so that reproducibility holds even when thresholds change between analyses.
A push that modifies thresholds.toml MAY surface different diagnostics on subsequent runs against the same data. This is intentional: thresholds are policy, and changing policy is meant to surface or suppress observations.
Recommendations for tuning
Informative, not normative.
- Start with defaults. The defaults are deliberately conservative.
- Tighten srm.significance_level further (0.0001 or below) for very high-traffic experiments where the default still produces too many warnings.
- Raise manifest_skew.max_versions_behind to 5 or higher in environments with deliberately staggered SDK rollouts (e.g., mobile apps with slow update adoption).
- Lower dead_flag.min_sample_size for low-volume flag namespaces where the default would never fire.
- Avoid setting any min_* threshold to 1; single-observation diagnostics are almost always noise.
See also
- diagnostics: every T-code whose firing condition references a threshold.
- provenance: provenance.thresholds_source records which file was applied.
- reference/cli/exd/telemetry/: --thresholds is a shared flag across every telemetry command.