
Thresholds

Thresholds parameterize the conditions under which T-codes fire. They are policy applied to data; both the verdict and the underlying numbers always appear in query output. An agent or human reviewing a finding sees which threshold tripped and what value tripped it, and can apply a different policy by pointing --thresholds <path> at a different file.

Implementation status: partially deferred. Threshold values (the category schema, the built-in defaults, the per-T-code semantics) are part of Phase 1. Storing the threshold file at queries/thresholds.toml in the manifest tree is deferred indefinitely. In Phase 1 the only override path is the CLI flag --thresholds <path>. The Q050–Q054 lint codes that would validate a manifest-tree thresholds file are deferred with it; the range stays reserved.


File location and discovery

The reference Rust implementation searches in this order:

  1. The path passed via the CLI --thresholds <path> flag.
  2. (Deferred indefinitely.) queries/thresholds.toml in the current manifest repo. The manifest-tree discovery rule is not in Phase 1; an operator who wants to version-control their thresholds may still do so at any path of their choosing and pass it via the CLI flag.
  3. Built-in defaults compiled into exd-client.

The resolved source is recorded in the provenance block's thresholds_source field for every output.
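
Informative sketch, not normative: the Phase 1 search order reduces to a two-way choice, because step 2 is never consulted. The function name and the "cli" / "builtin" labels below are hypothetical; the actual format of thresholds_source is not specified here.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_thresholds_source(cli_path: Optional[str]) -> Tuple[str, Optional[Path]]:
    """Return (source_kind, path) following the Phase 1 search order."""
    # 1. An explicit --thresholds <path> always wins.
    if cli_path is not None:
        return ("cli", Path(cli_path))
    # 2. queries/thresholds.toml in the manifest repo is deferred,
    #    so it is deliberately not checked here.
    # 3. Otherwise fall back to the defaults compiled into the client.
    return ("builtin", None)
```

Whichever branch is taken is what the provenance block would report, so a reviewer can tell default policy from operator-supplied policy.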


File schema

# thresholds.toml

[srm]
significance_level = 0.001
min_sample_size = 100

[lift]
significance_level = 0.05
min_per_variant_n = 1000

[evaluation_rate_anomaly]
drop_fraction = 0.5
window = "1h"

[manifest_skew]
max_versions_behind = 2

[dead_flag]
min_sample_size = 100

[rule]
min_evaluation_count = 100

[context]
min_evaluation_count = 100

Top-level tables correspond to threshold categories. Each category controls one or more diagnostics.


Category reference

[srm]

| Field | Type | Default | Controls |
|---|---|---|---|
| significance_level | float (0, 1) | 0.001 | Maximum chi-square p-value for T001 to fire. |
| min_sample_size | integer ≥ 1 | 100 | Total sample size below which T002 fires (preempting T001). |

The default significance_level = 0.001 is intentionally stricter than the conventional 0.05 to limit false positives — SRM tests at scale evaluate many flags continuously, and a lax threshold floods agents with low-signal warnings.
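
Informative sketch, not normative: the two [srm] fields interact, because the sample-size check runs first and preempts the significance check. The function name srm_code is hypothetical, and treating the p-value comparison as inclusive is an assumption drawn from the word "maximum" in the field description.

```python
from typing import Optional


def srm_code(p_value: float, total_n: int,
             significance_level: float = 0.001,
             min_sample_size: int = 100) -> Optional[str]:
    """Return the T-code that fires for an SRM check, or None."""
    # Too little data: T002 fires and T001 is not evaluated at all.
    if total_n < min_sample_size:
        return "T002"
    # ASSUMPTION: the comparison is inclusive ("maximum p-value").
    if p_value <= significance_level:
        return "T001"
    return None
```

With the defaults, a p-value of 0.0001 on 50 samples yields T002 rather than T001; the same p-value on 5000 samples yields T001.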

[lift]

| Field | Type | Default | Controls |
|---|---|---|---|
| significance_level | float (0, 1) | 0.05 | Reserved for future lift-related diagnostics. |
| min_per_variant_n | integer ≥ 1 | 1000 | Minimum per-variant sample size for any lift estimation. Below this, T002 fires. |

[evaluation_rate_anomaly]

| Field | Type | Default | Controls |
|---|---|---|---|
| drop_fraction | float (0, 1) | 0.5 | Magnitude of relative change between current and prior window for T007 to fire. |
| window | duration | 1h | Default comparison window when --compare-to is given without an explicit value. |
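
Informative sketch, not normative: drop_fraction is compared against the magnitude of the relative change between the two windows. The sketch assumes the change is measured relative to the prior window, that T007 fires at or above the threshold, and that an empty prior window is treated as undefined; none of these details is specified above. The function names are hypothetical.

```python
def relative_change(current: float, prior: float) -> float:
    """Magnitude of the relative change from the prior window to the current one."""
    if prior == 0:
        # ASSUMPTION: with no prior evaluations the ratio is undefined.
        raise ValueError("prior window has no evaluations")
    return abs(current - prior) / prior


def t007_fires(current: float, prior: float, drop_fraction: float = 0.5) -> bool:
    # ASSUMPTION: the comparison is inclusive (fires at or above the threshold).
    return relative_change(current, prior) >= drop_fraction
```

For example, 400 evaluations in the current window against 1000 in the prior window is a relative change of 0.6, which exceeds the default 0.5; 800 against 1000 is 0.2 and does not.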

[manifest_skew]

| Field | Type | Default | Controls |
|---|---|---|---|
| max_versions_behind | integer ≥ 1 | 2 | Number of versions behind newest before T004 fires. |

[dead_flag]

| Field | Type | Default | Controls |
|---|---|---|---|
| min_sample_size | integer ≥ 1 | 100 | Minimum evaluation count required for T003 to fire. |

[rule]

| Field | Type | Default | Controls |
|---|---|---|---|
| min_evaluation_count | integer ≥ 1 | 100 | Minimum flag evaluation count required for T005 and T009 to fire. |

[context]

| Field | Type | Default | Controls |
|---|---|---|---|
| min_evaluation_count | integer ≥ 1 | 100 | Minimum evaluation count for T006 to fire. |

Lint rules for thresholds.toml

Deferred indefinitely with the manifest-tree thresholds file. In Phase 1 a malformed --thresholds <path> file produces a CLI error (exit code 2), not a lint diagnostic; the codes below are reserved for the day the manifest-tree path lands.

The thresholds file would be subject to the same lint pipeline as other manifest files. Diagnostics:

| Code | Severity | Triggers when |
|---|---|---|
| Q050 | error | thresholds.toml contains an unknown top-level table. |
| Q051 | error | A field has the wrong type for its category (e.g., significance_level = "tight"). |
| Q052 | error | A significance_level is not in (0, 1). |
| Q053 | error | An integer threshold is < 1. |
| Q054 | warning | A significance_level is ≥ 0.1, which is unusually permissive and likely to produce false positives. |

These diagnostics sit in the Q- namespace because they apply to a TOML manifest-tree file lint-validated at upload time, not to an analysis result.
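
Informative sketch, not normative: the reserved codes could be checked against an already-parsed file (a dict of tables) roughly as follows. SCHEMA and lint_thresholds are hypothetical names. The sketch assumes a TOML integer supplied where a float is expected counts as a wrong type, and it skips unknown fields inside a known table because no code is listed for them.

```python
# Expected kind of each field, taken from the category reference.
SCHEMA = {
    "srm": {"significance_level": "float", "min_sample_size": "int"},
    "lift": {"significance_level": "float", "min_per_variant_n": "int"},
    "evaluation_rate_anomaly": {"drop_fraction": "float", "window": "duration"},
    "manifest_skew": {"max_versions_behind": "int"},
    "dead_flag": {"min_sample_size": "int"},
    "rule": {"min_evaluation_count": "int"},
    "context": {"min_evaluation_count": "int"},
}
PY_TYPES = {"float": float, "int": int, "duration": str}


def lint_thresholds(doc: dict) -> list[tuple[str, str, str]]:
    """Return (code, severity, location) tuples for a parsed thresholds file."""
    findings = []
    for category, table in doc.items():
        if category not in SCHEMA:
            findings.append(("Q050", "error", category))
            continue
        for field, value in table.items():
            kind = SCHEMA[category].get(field)
            if kind is None:
                continue  # unknown field: no code is listed, so it is skipped
            where = f"{category}.{field}"
            # Exact type match, so a bool or an int never passes as a float.
            if type(value) is not PY_TYPES[kind]:
                findings.append(("Q051", "error", where))
            elif field == "significance_level":
                if not 0 < value < 1:
                    findings.append(("Q052", "error", where))
                elif value >= 0.1:
                    findings.append(("Q054", "warning", where))
            elif kind == "int" and value < 1:
                findings.append(("Q053", "error", where))
    return findings
```

In this sketch an out-of-range significance_level reports Q052 only; Q054 is reserved for values that are valid but permissive.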


Threshold versioning

The thresholds file is part of the manifest (under the deferred manifest-tree path) or a stand-alone file (today). It is versioned by git history, not by an internal version field. Findings cite the thresholds file's git commit (recorded in provenance.thresholds_source) so that reproducibility holds even when thresholds change between analyses.

A push that modifies thresholds.toml MAY surface different diagnostics on subsequent runs against the same data. This is intentional: thresholds are policy, and changing policy is meant to surface or suppress observations.


Recommendations for tuning

Informative, not normative.

  • Start with defaults. The defaults are deliberately conservative.
  • Tighten srm.significance_level further (0.0001 or below) for very high-traffic experiments where the default still produces too many warnings.
  • Raise manifest_skew.max_versions_behind to 5 or higher in environments with deliberately staggered SDK rollouts (e.g., mobile apps with slow update adoption).
  • Lower dead_flag.min_sample_size for low-volume flag namespaces where the default would never fire.
  • Avoid setting any min_* threshold to 1 — single-observation diagnostics are almost always noise.

See also

  • diagnostics: every T-code whose firing condition references a threshold.
  • provenance: provenance.thresholds_source records which file was applied.
  • reference/cli/exd/telemetry/: --thresholds is a shared flag across every telemetry command.