Test Matrix

This matrix is a normative proof map, not a product-behavior source of truth. The Requirement column summarizes canonical contracts from spec.md and companion docs so each gate has a testable obligation; if a row conflicts with a canonical contract, the canonical contract wins and this matrix must be patched in the same PR.

Required phase column: v0.1 CI means the gate is enforced from the first v0.1 commit. Release (written "Release gate" on its own, or "release" in combined values such as "v0.1 CI and release") means the gate is enforced at release tagging. Same PR means the proof artifact MUST land in the same PR as the change that triggered the requirement; deferral to a follow-up PR is not permitted. vN.M CI means the gate is enforced starting with the named version's CI; PRs targeting that version must include the proof.

Requirement	Proof artifact	Required phase
Tier A has exactly 9 meta-tools	`TestTierAMetaToolCount`	v0.1 CI
Tier A has at most 18 convenience tools	`TestTierAConvenienceToolCount`	v0.1 CI
Tier A cold-start schema budget <= 8,000 `cl100k_base` tokens	`TestTierATokenBudget`	v0.1 CI and release
Tier A output schemas preserve `$defs`	`TestOutputSchemaDefs`	v0.1 CI
Tier A representative `structuredContent` validates against `ToonResult`, `SingleObjectResult`, `RawJsonResult`, `GainResult`, and `CacheStatsResult` schemas; diff-only 304 responses (`{"unchanged": true, "etag": "..."}`) are exempt and the test MUST include at least one 304 fixture and verify validation is skipped (not failed) for that shape	`TestTierAResponseShapeConformance` / `TestGainOutputSchema` / `TestCacheStatsOutputSchema`	v0.1 CI
All Tier A tool registrations include `outputSchema`	`TestTierARegistrationScan`	v0.1 CI
No production code path imports any package under a `testdata/` path segment; test-only registration helpers in `testdata/` cannot contribute to the Tier A budget	`TestTestdataNoProductionImport`	v0.1 CI
Tool annotations serialize according to go-sdk v1.6.0 wire semantics: `readOnlyHint:true` present on read-only tools, `readOnlyHint:false` may be omitted on non-read-only tools, `destructiveHint` present with explicit true/false on every Tier A tool, and optional `idempotentHint` / `openWorldHint` never serialize as null	`TestToolAnnotations` / `TestToolAnnotationsWireForm`	v0.1 CI
Confirmation-required paths are bound to exact pending requests: `gum.destructive`, `gum.write` variants with `confirmation_policy="high_stakes_write"`, and `gum.code` emit short-lived confirmation tokens, reject missing/expired/replayed/mismatched tokens with `CONFIRMATION_TOKEN_INVALID`, and make no upstream call on rejection; `gum.code` tokens bind `language`, `source_sha256`, `allow_write`, `allow_destructive`, `destructive_budget`, and canonical `destructive_scope`; fixture set MUST include (a) replay attempt within TTL, (b) replay attempt after TTL, (c) cross-process replay (token minted by one `gum` process and presented to another), (d) simulated cross-profile replay (token minted with profile A's per-process HMAC key presented to a handler initialized with profile B's distinct per-process HMAC key), (e) confirmed `gum.code` re-invocation that increases `destructive_budget` or broadens `destructive_scope` after approval, and (f) a `gum.write` high-stakes variant whose token cannot be reused for another write or destructive variant — all reject with `CONFIRMATION_TOKEN_INVALID`, with the cross-process and cross-profile cases carrying `reason: "mismatch"` per §6.1.2; the cross-profile fixture documents the per-process profile-binding invariant (§6.1.2 Profile binding) even though it cannot occur in a correctly deployed v0.1.0 system	`TestConfirmationTokenBinding` / `TestHighStakesWriteConfirmation`	v0.1 CI
`gum.code` confirmation token re-hashes submitted `source` at re-invocation: presenting a valid token with a different `source` body (same `source_sha256` in token, different actual `source`) is rejected with `CONFIRMATION_TOKEN_INVALID` (reason: `mismatch`); no script execution occurs	`TestConfirmationTokenSourceRehash`	v0.1 CI
Tier A convenience-tool ABI is generated from the normative §4.1 table: each row binds to the listed backing `op_id`, fixed/default variant rule, required/optional args, output profile/formats, and confirmation-passthrough policy; rows that wrap `confirmation_policy="high_stakes_write"` variants without `confirmation_passthrough=yes` fail catalog generation; every convenience tool registers an outputSchema covering its confirmation branch when applicable	`TestTierAConvenienceABI` / `TestConvenienceHighStakesConfirmation`	v0.1 CI
MCP completions are available for op IDs, variant IDs, resource-template params, plugin names, help topics, operation handles, and the closed enums `gum.code.language` (`risor` only in v0.1.0) and `gum.*.format` (`toon`, `csv`, `json`, `markdown`) without upstream calls; CLI invoke completions cover op IDs, `--variant-id`, plugin names, help topics, `--profile`, and `gum call` boolean output flags (`--json`, `--toon`, `--csv`, `--markdown`), while `gum call --format` and `gum code --lang` are absent in v0.1; reporting subcommands may expose their own flags, and `gum gain --format` completes `text`, `json`, and `csv`	`TestMCPCompletions` / `TestCLICompletions`	v0.1 CI
Tier A registrations complete before MCP transport accepts connections; no `Server.AddTool` calls occur after `Server.Run`; no spurious `tools/list_changed` notifications fire during handshake	`TestMCPStartupOrdering`	v0.1 CI
Loading active plugins before `Server.Run` does not grow `tools/list`; v0.1.0 roster matches `docs/tier-a-roster.v1.json` exactly (9 meta-tools + 18 convenience tools), while plugin variants are reachable only through catalog/resources/search after restart or through a later standalone CLI process startup	`TestTierARosterManifest` / `TestTierAToolCountWithPlugins`	v0.1 CI
`logging` server capability is absent from the v0.1.0 `initialize` response; the §13.2 capability matrix declares exactly which server-owned capabilities are advertised (tools, resources, prompts, completions — `notifications/cancelled` and `notifications/progress` honoured; roots consumed only when client-owned capability is declared; sampling/elicitation/logging/list_changed absent)	`TestMCPInitializeCapabilities`	v0.1 CI
MCP 2025-11-25 task capability is absent from the v0.1.0 `initialize` response; `tasks/get`, `tasks/result`, `tasks/list`, `tasks/cancel`, `notifications/tasks/status`, and `notifications/elicitation/complete` are not advertised or emitted in v0.1.0; emitted `tools/list`, `resources/list`, `resources/templates/list`, and `prompts/list` payloads contain no `icons` fields	`TestMCPInitializeCapabilities` / `TestNoTaskCapabilityV01` / `TestNoIconMetadataV01`	v0.1 CI
`gum.poll` sends progress only when `_meta.progressToken` is present, uses MCP progress wire shape, and preserves the token's JSON type (string or integer) end-to-end without stringifying integers or coercing to float	`TestPollProgressTokenContract`	v0.1 CI
`gum.poll` timeout and context cancellation do not leak goroutines	`TestPollTimeoutAndCancellation`	v0.1 CI
MCP roots drive project-local expression-profile lookup; fixtures cover single `file://` root, multiple roots with `_meta.gumRoot`, multiple roots without `_meta.gumRoot` failing with `PROJECT_ROOT_REQUIRED`, non-file roots rejected for project-local lookup, roots-unavailable MCP sessions disabling project-local lookup by default, and `--allow-implicit-project-root` as the only path that may use `GUM_PROJECT_ROOT` / `$PWD` with `_profile_resolution_warning: "implicit_project_root"`	`TestProfileResolutionFromMCPRoots`	v0.1 CI
Prompts are registered and retrievable through `prompts/list` and `prompts/get`; v0.1 prompts are exactly the zero-argument static prompts in §13, `prompts/list` reports empty `arguments`, `prompts/get` rejects supplied arguments with `INVALID_ARGS`, embedded templates stay under 6 KiB, and returned payloads contain one user-message text block	`TestPromptRegistration` / `TestPromptZeroArgumentContract`	v0.1 CI
No unsolicited server notifications are emitted before `notifications/initialized`	`TestMCPInitializedWaitRule`	v0.1 CI
MCP resource templates appear in `resources/templates/list` (not `resources/list`)	`TestResourceTemplateRegistration`	v0.1 CI
`resources/list` and `resources/templates/list` accept only MCP cursor pagination: requests use optional `cursor`, responses use `nextCursor`, the server page cap is 100 entries, and no client page-size parameter is accepted or required	`TestMCPResourceCursorPagination`	v0.1 CI
`gum://schema/{ref}` returns JSON Schema documents for schema refs exposed by active `gum://op/{id}` / `gum://variant/{id}` resources using exactly one MCP text resource content item with requested `uri`, `mimeType="application/schema+json"`, and JCS-canonical JSON payload, enabling exact TOON reconstruction; `gum://op/{id}` and `gum://variant/{id}` success reads use the same one-item JSON shape with `mimeType="application/json"`; refs must satisfy the safe served-ref grammar; unknown refs, invalid ref grammar, inactive-plugin refs (`installed_pending_restart` or `needs_configuration`), and quarantined-plugin refs return canonical `RESOURCE_NOT_FOUND` as a JSON-RPC resource error; divergent full-profile-inventory ref collisions fail build/install with `SCHEMA_REF_COLLISION` while identical-body reuse is allowed	`TestSchemaResourceLookup` / `TestOpVariantResourceWireShape` / `TestSchemaRefGrammar` / `TestSchemaRefCollision`	v0.1 CI
Static resources (`gum://catalog`, `gum://status/canaries`, `gum://plugins`, `gum://help/topics`) appear in `resources/list`	`TestStaticResourceRegistration`	v0.1 CI
Recovery links for `recovery="resource_link"` appear as both `_expression.full_result_resource` and exactly one MCP `resource_link` content block with the same `gum://results/<hash>` URI; `local_artifact` never emits a resource-link block	`TestRecoveryResourceLinkContentBlock`	v0.1 CI
`tee.secret` lifecycle and principal scoping: (a) absent secret is generated lazily on first lossy write, exactly 32 bytes, mode 600; (b) corrupt or malformed secret (not 64 lowercase hex chars) causes `TEE_SECRET_CORRUPT` error, no silent regeneration; (c) `gum://results/{hash}` resolution uses directory scan, success reads return exactly one decompressed `application/json` resource item, and absent hashes return `RESULT_ARTIFACT_EXPIRED`; (d) same op/variant/args under two credential subjects in one profile yields different recovery URIs, tee paths, HTTP cache keys, semantic cache keys, and gain-ledger subject fingerprints	`TestTeeSecretStability` / `TestResultResourceReadWireShape` / `TestPrincipalScopedRecoveryAndCache`	v0.1 CI
`installed_pending_restart` and `needs_configuration` plugins are inventory-only: not searchable, operation-completable, describable as active, invokable via tools, or reachable from `gum.code`; `gum://plugin/{name}` validates against the fixed per-status resource shape, including safe credential descriptors for `needs_configuration` and no raw env var names; `gum://op/{id}` and `gum://variant/{id}` may consult inventory only for status-only inactive-plugin responses with `execution_support: "schema_only"` and `status`, never full invocable schemas; quarantined `gum://op/{id}` and `gum://variant/{id}` return `VARIANT_QUARANTINED` JSON-RPC resource errors instead of status-only responses; invoking inactive plugins via risk invoke tools returns `UNSUPPORTED_CAPABILITY` with the same `status`; inactive-plugin `UNSUPPORTED_CAPABILITY` envelopes MUST contain `status` and MUST NOT contain `unsupported_capabilities` or `loader_kind` (mutual-exclusion per §5.8); the capability-class path MUST NOT contain `status`; fixtures prove that live inventory registry and active session catalog snapshot can diverge without leaking inactive variants into search/describe/completions/invoke, and that `quarantined` wins when a plugin also has pending-restart or needs-configuration state	`TestPluginInactiveInventoryOnly` / `TestPluginResourceShape` / `TestPendingRestartExcludedFromCompletions` / `TestPendingRestartVariantResource` / `TestPluginNeedsConfiguration` / `TestPluginStatusPrecedence`	v0.1 CI
JSON-valued resource reads (`gum://plugin/{name}` and deprecated `gum://help/{topic}` redirects) return exactly one MCP text resource content item with requested `uri`, `mimeType="application/json"`, and parseable JCS-canonical JSON payload; markdown help topics return exactly one `text/markdown` item	`TestJSONResourceReadWireShape`	v0.1 CI
Sampling remains optional	capability-negotiation test once sampling lands	v0.3 CI
Plugin `schema_ref` bundle resolves and materializes served request/response refs for bundled and third-party install paths; third-party fixtures cover safe ref grammar rejection, traversal/separator/percent-encoded separator rejection, copied `plugin-schemas/<schema_ref>.request.<sha256>.json` and `plugin-schemas/<schema_ref>.response.<sha256>.json`, full-profile-inventory collision rejection with `SCHEMA_REF_COLLISION` across active, pending-restart, needs-configuration, and quarantined plugins, and identical-body ref reuse	`TestPluginSchemaRefBundled` / `TestPluginSchemaRefThirdPartyInstall` / `TestPluginSchemaRefCollision`	v0.1 CI
Unknown capabilities fail closed	generator + plugin install tests for `UNKNOWN_CAPABILITY`	v0.1 CI
Unknown `backend_kind` values fail closed at build/install with `UNKNOWN_BACKEND_KIND`; stale-binary runtime loading of a newer catalog fails before upstream dispatch with `UNSUPPORTED_CAPABILITY` carrying `loader_kind="backend_kind"`, `unknown_value`, and `catalog_abi_version`	`TestUnknownBackendKind` / `TestRuntimeUnknownBackendKindUnsupportedCapability`	v0.1 CI
Unknown `interface_kind` values fail closed at build/install with `UNKNOWN_INTERFACE_KIND`; stale-binary runtime loading of a newer catalog fails before upstream dispatch with `UNSUPPORTED_CAPABILITY` carrying `loader_kind="interface_kind"`, `unknown_value`, and `catalog_abi_version`	`TestUnknownInterfaceKind` / `TestRuntimeUnknownInterfaceKindUnsupportedCapability`	v0.1 CI
Variant lifecycle does not silently fall back	tests for deprecated, superseded, removed, and quarantined variants	v0.1 CI
Default variant never selects removed/quarantined/deprecated variants when an active executable alternative exists	`TestDefaultVariantLifecycleSelection`	v0.1 CI
Plugin registry writes are atomic across the canonical registry ABI (`plugin-catalog.json`, `plugins.lock`, and `plugin-state.json` schemas in `spec.md` §8.7 / `docs/catalog-abi.md`): install/remove/startup-activation stages temp files with one `install_generation`/`install_txid`, publishes all three under `plugins.install.lock`, keeps the previous complete generation authoritative on failure, and startup recovers from mixed generations by selecting the last complete shared generation and quarantining orphan staged artifacts	`TestPluginRegistryABISchemas` / `TestPluginInstallTransactionAtomicity` / `TestPluginInstallCrashRecovery`	v0.1 CI
`gum://plugins` inventory is deterministic and includes a `status` field per plugin; fixture set MUST include at least one plugin in each status value (`active`, `installed_pending_restart`, `needs_configuration`, `quarantined`)	resource read test sorted by plugin name	v0.1 CI
Gain ledger begins with `{"record_type":"header","schema_version":1,"tokenizer":"cl100k_base"}` and every dispatch row has `record_type:"entry"` plus the required v0.1 fields (`session`, `op_id`, `variant_id`, `output_profile`, `args_hash`, `auth_subject_fingerprint`, token counts, cache/field-mask status, `served_from_cache`, `is_retry`, `op_family`, `baseline_method`); `gum_parallel` outer rows use the exact sentinel values from §12.3 (`output_profile=null`, `auth_subject_fingerprint="batch"`, `raw_tokens=0`, `cache_status/field_mask_status="not_applicable"`, `op_family="gum_parallel"`); cancelled parallel rows include `cancelled: true`	`TestGainLedgerHeader` / `TestGainLedgerEntrySchema` / `TestGainParallelOuterEntrySchema`	v0.1 CI
`gum.gain` and `gum gain` both read the selected profile's server-local `gain-ledger.jsonl`; `usage.jsonl` is not read in v0.1.0; remote/containerized MCP behavior depends on server-side ledger availability, not client filesystem access; disabled or unavailable ledger returns stable `isError=true` tool errors with `GAIN_DISABLED` or `GAIN_LEDGER_UNAVAILABLE` respectively and no `GainResult` structuredContent	`TestGainLedgerSourceOfTruth` / `TestGainRemoteServerSideLedger` / `TestGainErrorBranches`	v0.1 CI
Release-gated `>=80%` savings uses end-to-end ledger totals including `gum_parallel` outer entries; the test MUST verify `batch_id` linkage (outer entry `batch_id` equals all inner entries' `batch_id`), `element_count` equals the number of inner entries, `end_to_end_savings` includes outer envelope overhead, `batch_envelope_overhead` matches outer-entry token contribution, `per_op_shaping_savings` is diagnostic only, `GainResult` validates its `mode` discriminator and exactly-one-array `summary`/`session`/`history` schema branches, and outer entry `variant_id` is `null` (per §12.3 intentional design — a parallel dispatch has no single variant)	`TestGainEndToEndSavingsIncludesBatchEnvelope` / `TestGainJSONOutputSchema`	Release gate
`gum gain --session <ID>` filters by the local ledger `session` field and emits `GainResult.operations[]`; `gum gain --since <RFC3339>` applies a UTC lower-bound filter; `gum gain --history` groups chronological JSON and text output by `session` and `op_family` through `GainResult.history[]`; and `gum gain --exclude-retries` excludes only entries with `is_retry=true` from displayed aggregates while leaving raw ledger entries unchanged	`TestGainSessionAndRetryFilters` / `TestGainSinceAndHistoryFilters`	v0.1 CI
Diff-only mode 304 response short-circuits the expression pipeline; ledger records `served_from_cache: "etag_304"` with `response_tokens: 0`; `_expression` is absent in the 304 response; a second call with the same `op_id` and `args_canonical` but a different resolved `variant_id` or different `auth_subject_fingerprint` MUST NOT produce a 304 (different cache key)	`TestDiffOnlyModeEtagReplay`	v0.1 CI
`gum.code` enforces cumulative output byte budget across prints and final return	`TestCodeOutputBudget`	v0.1 CI
`gum.code` destructive fan-out is bounded across MCP and CLI: confirmed destructive scripts require `destructive_budget` / `--destructive-budget` in `1..20`, consume one unit before each destructive call, reject over-budget calls with `DESTRUCTIVE_BUDGET_EXCEEDED`, reject calls outside `destructive_scope` / `--destructive-scope` with `DESTRUCTIVE_SCOPE_MISMATCH`, consume each `gum_confirm_destructive(op_id, resource_key?)` on the immediately following destructive call only, reject v0.1 script-header pragmas as unsupported rather than parsing them silently, and make no upstream request on any rejection path	`TestCodeDestructiveBudgetAndScope` / `TestCodeCLINoScriptHeadersV01`	v0.1 CI
Expression profiles validate; `recovery="resource_link"` with `tee_mode!="always"` fails with `PROFILE_TEE_MODE_CONFLICT` before dispatch	`gum profile validate`; JSON Schema fixture tests; `TestProfileTeeModeConflict`	v0.1 CI
Expression-profile fixtures meet token budgets	`gum profile test` with `cl100k_base`	Release gate
Catalog ABI versions reject unsupported future artifacts	loader tests for `CATALOG_SCHEMA_UNSUPPORTED`, `PLUGIN_MANIFEST_SCHEMA_UNSUPPORTED`, `PLUGIN_CATALOG_SCHEMA_UNSUPPORTED`, `PLUGIN_LOCK_SCHEMA_UNSUPPORTED`, and `PLUGIN_STATE_SCHEMA_UNSUPPORTED`	v0.1 CI
Third-party plugin manifests use top-level `manifest_schema_version` only: missing version and `[plugin].manifest_schema_version` both fail install with `PLUGIN_MANIFEST_SCHEMA_UNSUPPORTED`, while bundled development manifests may use the documented v0.1.0 compatibility default	`TestPluginManifestSchemaVersionPlacement`	v0.1 CI
`service_root_template` is rejected before v0.4.0 with `SERVICE_ROOT_TEMPLATE_DEFERRED`; v0.1-v0.3 dispatch uses discovery-derived root metadata only and never advertises sovereign/government/private-service-connect endpoint variants as executable	`TestServiceRootTemplateDeferred`	v0.1 CI
Embedded `catalog.json` integrity is verified against a committed SHA256 digest in `catalog.json.sha256`; `cmd/gen-catalog` writes both files in lockstep so silent disk corruption, accidental edits, or unauthorized modification fail the build	`TestCatalogIntegrity` / `TestCatalogChecksumFileFormat`	v0.1 CI
JCS canonicalization is stable	RFC 8785 test vectors in `internal/cache/canonical_test.go`	v0.1 CI
Audit log rotation and recovery are deterministic	rotation, ENOENT retry, `audit.broken` sentinel tests	v0.1 CI
CI runs on the Go floor and current stable Go toolchains	CI matrix: `go1.25.x` and current stable	v0.1 CI
Easy auth contract: `gum init` runs auth readiness and launches/prints the default `gum auth login` path when no profile credential exists; GUM-managed browser OAuth uses built-in client ID + PKCE + loopback + CSRF state, never commits OAuth client secrets to source, and treats release-injected Desktop client material as public client material rather than a confidential secret; refresh tokens and plugin secrets are stored only in OS keychain; profile config stores only non-secret metadata; interactive desktop profiles do not silently consume ambient `GOOGLE_APPLICATION_CREDENTIALS` unless `gum auth use-adc` was run; `AUTH_REQUIRED` and `SCOPE_MISSING` user messages point `gum_oauth` users to `gum auth login ...`, but non-`gum_oauth` strategies point to `gum auth setup <op_id>`; CLI/MCP/code mode reuse the same selected-profile credential on the GUM host	`TestAuthHappyPathNoUserClientSecret` / `TestAuthKeychainStorageOnly` / `TestAuthNoAmbientADCWithoutOptIn` / `TestAuthErrorNextAction` / `TestAuthLoopbackStateRequired`	v0.1 CI
Bundled OAuth scope allowlist is enforced for variants that use `auth_strategy="gum_oauth"`: `apps/gum/internal/embedded/data/auth-managed-scopes.v1.json` is the only source of eligible scopes; only `status="active"` + `verification_state="verified"` + `project_evidence_state="ready"` + `live_canary_state="passing"` scopes count; planned scopes are not requested; generator rejects a `gum_oauth` variant requiring any out-of-manifest or unverified scope with `GUM_OAUTH_SCOPE_NOT_MANAGED`; release gate verifies every active restricted/sensitive scope has an evidence pointer and every active scope has project, token-exchange, and refresh-canary evidence, otherwise `GUM_OAUTH_MANAGED_CLIENT_NOT_READY`	`TestManagedOAuthScopeManifest` / `TestGumOAuthScopeNotManaged` / `TestManagedScopeVerificationEvidence` / `TestManagedOAuthProjectReadiness` / `TestManagedOAuthLiveCanaryRequired`	v0.1 CI and release
Testing-window opt-in gate (§7 transitional): a scope is eligible for `gum_oauth` before full promotion only when `managed_project.publishing_status == "testing"` and the scope sets `testing_allowed: true`; the opt-in is per-scope (flipping the project to testing never auto-exposes non-opted-in scopes) and inert under any other publishing status (production stays strict by default); `embedded_client_secret=true` still fails `GUM_OAUTH_MANIFEST_INVALID` at the gate even with eligible scopes	`TestCanStartGumOAuthTestingModeAllowsOptedInScope` / `TestCanStartGumOAuthTestingModeRequiresOptIn` / `TestCanStartGumOAuthTestingFlagInertOutsideTestingStatus` / `TestCanStartGumOAuthEmbeddedSecretRejected`	v0.1 CI
Managed OAuth scope expansion is full re-consent, not implicit incremental append: requesting additional scopes builds the complete desired managed scope set, verifies granted scopes and subject fingerprint before replacing the keychain credential, rejects subject mismatch without storing tokens, and returns BYO/compound setup when any requested scope is not managed-ready	`TestAuthScopeUpgradeFullReconsent` / `TestAuthScopeUpgradeSubjectMismatch` / `TestAuthScopeUpgradeUnmanagedScopeRoutesToSetup`	v0.1 CI
Auth requirement taxonomy is enforced: every executable variant/plugin declares one `auth_strategy`; unknown non-`x-*` auth components fail build/install with `AUTH_COMPONENT_UNKNOWN`; non-`gum_oauth` variants emit errors with `auth_strategy`, `missing_components`, and `setup_command`; fixture includes a Google Ads Keyword Planner-like compound operation requiring developer token, OAuth client/client secret or refresh token, customer ID, optional login customer ID, billing/account prerequisites, account-permission checks, and approved access/permissible-use/allowlist hints, and proves plain `gum auth login` is not suggested as sufficient	`TestAuthStrategyRequired` / `TestAuthComponentUnknown` / `TestCompoundAuthErrorEnvelope` / `TestGoogleAdsKeywordPlannerAuthFixture`	v0.1 CI
BYO OAuth grant storage (spec §7 "BYO grant storage"): refresh tokens are keyed per `sha256(client_id)` with the granted-scope set stored alongside; a stored grant is reused when it is a superset of the requested scopes (broad `gum login` satisfies narrow per-op resolves), an uncovered scope routes to `NO_REFRESH_TOKEN` carrying the op's full scopes, separate per-op authorizations union into one grant (no clobber), and distinct `client_id`s stay isolated	`TestByoOAuthBroadGrantSatisfiesNarrowResolve` / `TestByoOAuthMissingScopeForcesReauth` / `TestByoOAuthGrantUnionAccumulates` / `TestByoOAuthPerClientIsolation`	v0.1 CI
v0.1.0 login surface + just-in-time auth (spec §7 "v0.1.0 login surface", "Just-in-time authorization"): `gum auth login` / `gum login` run the BYO loopback flow against the registered client, pre-authorizing the full catalog scope set when no `--scope` is given; the first `gum call` on an unauthorized `byo_oauth` variant prompts a TTY operator `Authorize <scope>? [Y/n]` then retries once on assent, while agents/pipes fall through to structured `AUTH_REQUIRED` (no `gcloud` dependency)	`TestRunLoginWithConfiguredClientRunsFlow` / `TestResolveLoginScopesEmptyDerivesFromCatalog` / `TestTopLevelLoginAliasRegistered` / `TestMaybeJITLoginAccepted` / `TestMaybeJITLoginNotInteractive` / `TestCallRetriesAfterJITLogin`	v0.1 CI
BYO-only public auth posture: `gum login` and the JIT `byo_oauth` resolver require an operator-registered OAuth client; injected bundled-client values do not satisfy `byo_oauth`, and the public auth CLI does not register `managed-status`	`TestResolveAuthIgnoresInjectedManagedClient` / `TestResolveAuthRequiresBYOForAllScopes` / `TestResolveAuthNoManagedClientStillNotConfigured` / `TestRunLoginIgnoresInjectedManagedClient` / `TestAuthManagedStatusNotRegisteredForV1` / `TestManagedSupportedScopesIncludesSearchConsoleReadonly`	v1 CI
Workspace and account-policy failures are actionable: Google `admin_policy_enforced`-style failures map to `missing_components:["workspace_admin_trust"]` or `["org_policy_exception"]`, never to a retry-login loop; active credential alias and `auth_subject_fingerprint` prevent wrong-account dispatch after browser account switching	`TestWorkspaceAdminPolicyAuthEnvelope` / `TestAuthActiveCredentialAliasRequired` / `TestAuthSubjectFingerprintMismatchBlocksDispatch`	v0.1 CI
Plugin credential setup is centralized: manifests with credential descriptors and auth components can be configured through `gum plugin setup <name>`; setup stores secret components in the OS keychain, exposes only descriptor aliases/display names in resources and errors, displays external prerequisites as checklist items, runs the live canary after credentials are supplied, and clears `needs_configuration` only on canary success; raw env var names and secret values are not emitted in MCP resources	`TestPluginSetupCredentialFlow` / `TestPluginCredentialNoRawEnvLeak` / `TestPluginExternalPrerequisiteChecklist`	v0.1 CI
CLI `gum call` argument grammar parses typed JSON, repeated arrays, `@file`, stdin, and dotted-key escaping deterministically; every call requires `--risk=read|write|destructive`; host-control flags (`--fields`, `--page-size`, `--page-token`, boolean output flags) are not aliases for positional operation args and cannot be overridden by them; v0.1 rejects `gum call --format` while allowing reporting subcommands such as `gum gain --format`; duplicate output-format flags fail with `CLI_ARG_DUPLICATE`; `--variant-id` selects a non-default active variant and unknown/removed/quarantined/`installed_pending_restart`/`needs_configuration` variants fail before upstream dispatch; mismatched resolved variant risk returns `RISK_TOOL_MISMATCH`; destructive plus `confirmation_policy="high_stakes_write"` calls require `--yes` or an interactive TTY confirmation before dispatch	`TestCLIArgGrammar` / `TestCLIRiskGate` / `TestCLIVariantSelection`	v0.1 CI
CLI `gum code` confirmation grammar is exact: `--allow-write` or `--allow-destructive` requires interactive `y` or non-interactive `--yes`; read-only code requires neither; repeated `--destructive-scope op_id[:resource_key]` is the only v0.1 scope grammar; `--no-confirm`, script-header pragmas, and `--lang` fail parsing or validation rather than being silently accepted	`TestCLICodeConfirmationAndScopeGrammar`	v0.1 CI
Automation-safe read-only/reporting CLI commands support stable `--format=json` roots from §12 (`search`, `describe`, `plugin list`, `plugin info`, `catalog list-overrides`, `cache stats`, `profile validate`, `profile test`, `gain`) and golden tests reject drift in JSON field names	`TestCLIJSONOutputContracts`	v0.1 CI
`strip_nulls=true` is rejected unless `null_elision_safe_fields` covers elided fields	`TestProfileStripNullsSafety`	v0.1 CI
`field_mask_mode="dual_fetch"` is rejected for every write/destructive variant and every non-idempotent read variant; only read + idempotent variants may issue the second unmasked recovery fetch	`TestDualFetchReadOnlyIdempotentGate`	v0.1 CI
TOON output includes resolved `variant` header and exact variant schema lookup works	`TestToonVariantHeader`	v0.1 CI
TOON in-tree parser round-trips representative fixtures (list result with nulls, quoted CSV fields, zero `omitted_count`, empty body) losslessly	`TestToonRoundTrip`	v0.1 CI
`RESULT_ARTIFACT_EXPIRED` envelope returned on `resources/read` of a deleted `gum://results/<hash>` artifact as a JSON-RPC application error with `error.code=-32010` and exact §7 `error.data` fields: `error_code`, `hash`, `uri`, `expires_at`, `user_message`, and `suggestion`	`TestResultArtifactExpiredError`	v0.1 CI
Generated REST dispatch stubs propagate context (`.Context(ctx)` before `.Do()`); cancelling context aborts in-flight HTTP within 100ms	`TestExecutorContextPropagation`	v0.1 CI
Long-tail raw REST unknown-argument handling is fail-closed by default for every risk class; read-only allowlist pass-through applies only to explicitly configured `discovery-rest`/`raw-http` variants, emits `_validation_warnings`, and is ignored for write/destructive, typed SDK, gRPC, and plugin backends	`TestLongTailUnknownArgHandling`	v0.1 CI
Any PR adding a new `backend_kind` value includes a fixture-backed executor contract test	`TestBackendKind<Name>`	Same PR
`google-ads-sdk` executes Google Ads Keyword Planner POST custom-methods (`customers/{id}:generate*`), injecting the secret `developer-token` header server-side (never an invocation arg) alongside the byo_oauth Bearer and optional `login-customer-id`	`TestBackendKindGoogleAdsSDK`	v0.1 CI
Any PR adding a new `interface_kind` value includes a fixture-backed interface contract test	`TestInterfaceKind<Name>`	Same PR
Any PR adding a new `grpc-sdk` or `sdk-native` `adapter_key` includes a binding-schema fixture and adapter-registry contract test proving the binding resolves without ad hoc code in the catalog loader	`TestBackendBinding<Name>`	Same PR
Plugin variants materialize explicit backend binding objects: `mcp-plugin` requires `tool_name`, and bundled `grpc-plugin` ABI fixtures require `rpc_service` plus `rpc_method`; missing or malformed selector fields fail with `PLUGIN_BINDING_INVALID` before subprocess start unless the third-party Shape 2 install gate applies first	`TestPluginBindingSchema`	v0.1 CI
Third-party Shape 2 manifests are rejected before v0.4.0: `[plugin].shape="grpc-subprocess"` or any `backend_kind="grpc-plugin"` in a third-party install fails with `PLUGIN_SHAPE_UNSUPPORTED` before binding selector validation, schema copy, executable staging, canary, or registry writes; malformed third-party Shape 2 selectors still return `PLUGIN_SHAPE_UNSUPPORTED`, while bundled ABI fixtures continue to use `PLUGIN_BINDING_INVALID` for selector failures	`TestThirdPartyShape2InstallRejected`	v0.1 CI
Third-party plugin namespace ownership is stable and profile-scoped: manifests require `namespace_owner`, install records prefix ownership in the selected profile's `plugins.lock`, matching owners may upgrade/reinstall within that profile, mismatched owners fail with `PLUGIN_NAMESPACE_CONFLICT`, cross-profile locks never merge, and `--dev-allow-namespace-conflict` is rejected outside dev profiles	`TestPluginNamespaceOwnership`	v0.1 CI
Any PR promoting a capability class from `schema_only` to executable (or adding a new executable atom) includes `TestCapabilityClass<Name>` and updates this matrix	`TestCapabilityClass<Name>`	Same PR
Any PR adding or removing a `language` closed-enum value updates §4.3, §6.2, §12 CLI help, §13 completions, dependency floors as needed, and the `TestMCPCompletions` fixture in the same PR; any PR adding or removing a `format` value also updates `docs/expression-profile-dsl.md`, `docs/expression-profile-dsl.json`, and format fixtures	`TestMCPCompletions` (extended)	Same PR
Elicitation-based managed-scope re-consent, when enabled in v0.2.0+, uses a structured approval object bound to `op_id`, exact required scopes, profile, expected subject hint, and request hash; successful login verifies granted scopes and resulting `auth_subject_fingerprint`, stores nothing on mismatch/decline/cancel, emits an audit event, returns `SCOPE_GRANTED`, and does not auto-retry the original operation	`TestElicitationScopeUpgradeBinding`	v0.2 CI
`[override_bindings]` in project-local and user-global profile files attaches a profile to listed `op_id` or `variant_id` keys; override-bindings-only files are valid when referenced profiles resolve elsewhere; `variant_id` wins over `op_id` for the same resolved variant; rejects undefined-profile, unknown-op_id, unknown-variant_id, and structural errors with `OVERRIDE_BINDING_INVALID`; project-local wins over user-global	`TestOverrideBindings`	v0.1 CI
`gum.describe_op` registers `#/$defs/DescribeOpResult` as its outputSchema, validating responses including the deterministic variants[] truncation form controlled by `meta_tools.describe_op.max_variants` (default 5; ops with 6+ variants truncate to 5 by default, with `variants_total` and `variants_omitted_count`), an override-positive fixture that includes `risk_override=true` and `risk_override_reason`, and explicit exclusion of inactive-plugin-only `status` / `reason` fields, which appear only on `gum://op/{id}` and `gum://variant/{id}` resource responses	`TestDescribeOpOutputSchema`	v0.1 CI
`gum://help/{topic}` returns the canonical §7 `RESOURCE_NOT_FOUND` envelope as a JSON-RPC application error with `error.code=-32004` for topics absent from `gum://help/topics`; active topics return the §7 `text/markdown` MCP resource-content shape; deprecated topics return the §7 `application/json` resource-content shape containing only `{"status":"deprecated","redirect":"<new-topic>"}`	`TestHelpResourceNotFound`	v0.1 CI
Plugin-local failure codes map deterministically to stable GUM error codes (`RATE_LIMIT`→`RATE_LIMITED`, `AUTH_EXPIRED`→`AUTH_REQUIRED`, `PARSE_FAILURE`→`SERVICE_DOWN`, `SERVICE_DOWN`→`SERVICE_DOWN`, `INVALID_INPUT`→`INVALID_ARGS`) while preserving retry fields and sanitized source metadata as specified	`TestPluginErrorCodeMapping`	v0.1 CI
Plugin schema bundles declare one manifest `schema_ref` whose JSON Schema document contains `$defs.request` and `$defs.response`; build/install materializes `request_ref=<schema_ref>.request` and `response_ref=<schema_ref>.response`, copies those served schemas by hash, rejects missing defs with `PLUGIN_SCHEMA_REF_INVALID`, and rejects divergent full-inventory collisions with `SCHEMA_REF_COLLISION`	`TestPluginSchemaBundleMaterialization` / `TestPluginSchemaRefCollision`	v0.1 CI
Plugin manifests listing `GUM_`-prefixed env vars, exact denylist entries (`GOOGLE_APPLICATION_CREDENTIALS`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`), or `_GUM*` variables in `needs_user_creds` fail install with `PLUGIN_ENV_PROHIBITED`; a single curated in-binary denylist source of truth is shared by catalog build, plugin install, and runtime env scrubbing; dispatch scrubs prohibited vars from the subprocess environment regardless of manifest declarations	`TestPluginEnvProhibited` / `TestPluginEnvExactDenylist`	v0.1 CI
Plugin manifests with non-empty `needs_user_creds` must declare one safe credential descriptor per env var; missing, duplicate, or extra descriptors fail with `PLUGIN_CREDENTIAL_DESCRIPTOR_INVALID`; install with missing required credentials records `needs_configuration`, skips live canary without quarantine, exposes only descriptor aliases in resources/errors, and a later successful credentialed `gum canary --live` is required before activation	`TestPluginCredentialDescriptors` / `TestPluginNeedsConfigurationInstall`	v0.1 CI
Plugin executable binding: non-dev plugin sources launch only an absolute executable inside the host-managed verified install root; the selected profile's `plugins.lock` records executable path, executable SHA-256, normalized argv, and install root; runtime spawn re-hashes the executable and quarantines/refuses execution on mismatch; PATH-only, shell-wrapper, and runtime `uvx` resolution paths fail with `PLUGIN_EXECUTABLE_UNTRUSTED`; fixtures assert normalized argv for PyPI, GitHub release, Git, and dev-only local sources	`TestPluginExecutableBinding` / `TestPluginCommandNormalization`	v0.1 CI
`gum_parallel` cancellation: cancelling the outer context propagates to in-flight inner calls within 200ms; after scheduling starts, result envelopes contain completed elements plus unfinished elements with `error.error_code="CANCELLED"` and `error.cancelled=true`; cancelled elements carry no success payload fields; the ledger records per-element `cancelled: true` and marks the outer entry `cancelled: true`; no goroutines leak	`TestGumParallelCancellation`	v0.1 CI
`gum_parallel` 429 per-service-family isolation: a 429 from a `workspace` worker pauses only workers sharing `service_family = "workspace"`; workers with a different `service_family` continue uninterrupted; the paused family resumes after `retry_after_ms` (or 60s fallback), staggered 50ms × worker_index; the gain ledger records the outer entry with the correct total `element_count`	`TestGumParallel429ServiceFamilyIsolation`	v0.1 CI
`grpc-sdk` `routing_headers` invariant: catalog build rejects unknown field paths (`GRPC_ROUTING_HEADER_NOT_FOUND`), duplicates (`GRPC_ROUTING_HEADER_DUPLICATE`), and empty-array forms (`GRPC_ROUTING_HEADER_NOT_REQUIRED`); fixture covers both present and omitted forms	`TestGrpcRoutingHeaderInvariant`	v0.1 CI
`gum://help/topics` is generated from `docs/help-topics.v1.json`; every listed active or deprecated topic resolves; deprecated rows return only a redirect object; no topic handler exists unless it is listed by the manifest; v0.1.0 manifest contains the canonical eight active topics and no deprecated rows	`TestHelpTopicsSeedSet` / `TestHelpTopicsManifest`	v0.1 CI
`gum_parallel` result envelope compression (§9.0.1): when ≥2 inner results share identical `ExpressionMeta` field values, those fields are hoisted into the outer envelope's `shared_expression_fields` and omitted from per-result `_expression` delta objects (`#/$defs/ExpressionMetaDelta`); the receiver reconstructs effective per-result `ExpressionMeta` as `shared_expression_fields ∪ per_result._expression` with per-result values winning on conflict; each reconstructed effective object MUST validate against full `ExpressionMeta`, while each emitted delta validates against `ExpressionMetaDelta`; round-trip MUST yield byte-identical unhoisted forms; identity is determined under canonical-JSON byte-equality so `null` and absent are NOT identical (§9.0.1 rule 1); fixture set MUST include (a) at least one heterogeneous batch (N≥2, no field identical across all results) verifying `shared_expression_fields` is absent and behavior matches the un-hoisted form, (b) one all-null fixture verifying explicit `null` hoists with value `null`, (c) one mixed null/absent fixture verifying no hoisting occurs, (d) one fixture asserting the outer `_expression.variant_id` is `null` per §12.3, and (e) one fixture where every inner result carries both `intentional_zero_max_items: true` and a non-null `on_empty_message` whose string value is byte-identical across all N results (required for §9.0.1 rule 1 hoisting), verifying that both fields hoist into `shared_expression_fields` and that the `{intentional_zero_max_items: true, on_empty_message != null}` invariant (§13 ExpressionMeta) holds on the effective ExpressionMeta of every result after reconstruction	`TestGumParallelResultEnvelopeCompression`	v0.1 CI
Per-tool inputSchema budget verification (§4.1): each Tier A risk tool's registered inputSchema stays within its declared per-tool budget after parameter additions; the budget is refreshed in every parameter-adding PR; the test also verifies that the `gum.write` tool description string contains the normative irreversibility sentence (§13 `gum.write` irreversibility notice)	`TestTierAPerToolInputSchemaBudget`	v0.1 CI
Per-tool token delta gate (spec §2 line 129, bead gum-coo): per-tool cl100k_base description token counts are stored in `testdata/tier-a-token-baseline.json`; any tool whose measured count exceeds its stored baseline fails the test with `TOKEN_DELTA_REGRESSION`; a tool absent from the baseline fails with `MISSING_BASELINE`; an orphan baseline entry (in the file but not in `tools/list`) fails with `ORPHAN_BASELINE`. Increases require the `token-budget-increase` PR label and a paired baseline bump in the same change; decreases pass and emit a `RATCHET_OPPORTUNITY` Logf hint.	`TestTierAPerToolTokenDelta` / `TestTierABaselineJSONLoads`	v0.1 CI
Tier A output schemas cover branch-specific successful results: `gum.write` high-stakes confirmation-required, `gum.destructive` confirmation-required, `gum.code` confirmation-required, normal `gum.code`, `gum.code` returning `gum_parallel`, and code output-limit structured results all validate against the registered branch schemas; schemas using `oneOf` still have root `type: object` so go-sdk registration accepts them	`TestTierABranchOutputSchemas`	v0.1 CI
JSON-RPC batch handling is legacy-compatible only: GUM does not advertise batching and does not rely on it for client-facing behavior; if the pinned transport accepts a legacy batch frame, caps reject oversized input before dispatch (more than 32 requests, more than 1 MiB decoded batch body, or more than 256 KiB decoded params per item) and run no tool handler	`TestMCPBatchCaps`	v0.1 CI
Audit log hard ceiling cannot be exceeded under cross-process lock contention: when `audit.jsonl` is at or above 10 GB and `audit.unbounded=false`, append blocks until emergency rotation succeeds or returns an audit append failure rather than writing past the cap	`TestAuditHardCeilingContention`	v0.1 CI
Structured logging (§14.1): build-time AST lint rejects `log.Printf`, `fmt.Fprintln(os.Stderr, ...)`, and third-party logger imports in `internal/dispatch`, `internal/adapters/*`, `internal/mcp`, `internal/cli`, `internal/cache`, `internal/auth`, `internal/profiles`, `internal/sandbox`, `internal/ratelimit`, `internal/tee`, and `internal/output`. The scan set MUST include every package in the §14 constructor-convention table (each of which mandates `WithLogger` injection) plus stateless `internal/output` (which must not log at all).	`TestStdLogProhibition`	v0.1 CI
Packages listed in §14 expose `WithLogger(*slog.Logger)`, default to `slog.Default()`, and emit nothing when passed `slog.New(slog.DiscardHandler)`	`TestLoggerInjectionContract`	v0.1 CI
v0.1.0 release binaries import neither `go.opentelemetry.io/` nor `net/http/pprof` (§14.1.5–6). `TestNoOTelImportV01` verifies via the import graph (`go list -deps ./cmd/gum/...` MUST produce no `go.opentelemetry.io/` matches); the module-graph scan (`go list -m all`) is run as an advisory warning, not a CI failure, so transitive SDK module presence does not produce a false positive. `TestNoPprofImportV01` uses the same import-graph methodology for `net/http/pprof`.	`TestNoOTelImportV01`, `TestNoPprofImportV01`	Release gate
Audit-log entries always emit schema version field `v` as the first key; absent `v` is parsed as `v:0`	`TestAuditLogSchemaVersion`	v0.1 CI
`internal/output` statelessness (§14): AST scan confirms no exported constructors (no `func New` / `func (T) New`), no package-level mutable `var` declarations, and no exported types with unexported fields that could hold mutable state; a future PR adding a stateful encoder cache to `internal/output` fails CI before merge	`TestOutputStatelessness`	v0.1 CI
`intentional_zero_max_items` invariant (§13 ExpressionMeta, §9.1): when the runtime emits `_expression.intentional_zero_max_items: true`, it MUST also emit a non-null `_expression.on_empty_message`; the combination `{intentional_zero_max_items: true, on_empty_message: null}` is never emitted by a v0.1.0 dispatch path; the test runs the expression pipeline on a fixture profile with `collapse_arrays.max_items=0` + `on_empty="..."` and asserts the emitted envelope satisfies the invariant. The invariant MUST also be asserted on the effective per-result ExpressionMeta after `gum_parallel` envelope compression and reconstruction (covered jointly by `TestGumParallelResultEnvelopeCompression` fixture (e)) so that hoisting cannot smuggle in a violating combination via `shared_expression_fields ∪ per_result._expression`	`TestIntentionalZeroMaxItemsInvariant`	v0.1 CI
`gum://catalog` resources/list entry carries `size` annotation equal to `len(catalogBin)` (non-zero, runtime-computed from the embedded artifact) AND `"x-gum-do-not-auto-inject": true` annotation (§13). `TestStaticResourceRegistration` is extended with both assertions	`TestStaticResourceRegistration` (extended)	v0.1 CI
`gum://status/canaries` returns `status="stale"` for every installed plugin canary on server startup before any passive cron run completes; the resource covers plugin canaries only and never first-party Google API health (§13)	`TestCanaryStaleOnStartup`	v0.1 CI
`gum://plugins` MCP view omits rows whose status is `installed_pending_restart`; the same plugin appears in `gum plugin list --format=json` CLI output for the same profile (§13 R28B-M-2 filter)	`TestPluginsResourceFiltersPendingRestart`	v0.1 CI
`gum://status/health` returns rows for the closed v0.1 subsystem enum `[audit_log, cache_sqlite, tee_filesystem, keychain, gain_ledger, canary_runner]`; values `status∈{healthy, degraded, unavailable}`; health probes are local-only and emit no network calls; per-subsystem sample TTL bounds resource-read cost (§13)	`TestStatusHealthSubsystemEnum` / `TestStatusHealthNoNetwork`	v0.1 CI
Stdio transport uses newline-delimited JSON-RPC framing only; the silent-stdout startup invariant holds: between process start and `notifications/initialized`, no non-JSON bytes appear on stdout (§13.1)	`TestStdioFramingClean`	v0.1 CI
Unsolicited `logging/setLevel` (sent by a permissive client despite `logging` being unadvertised) returns a successful empty-result response and does not log to stdout; the requested level is recorded in an in-memory per-session field for forward compatibility (§13.1)	`TestLoggingSetLevelTolerant`	v0.1 CI
MCP `completion/complete` returns within 100 ms (P95) and 250 ms (P99) for a worst-case Tier A argument (variant_id completion for a 50-variant op) and for every completable resource-template argument with a 100-plugin inventory; failing the P95 budget on linux/amd64 CI hardware is a release-blocker (§13.1)	`TestMCPCompletionLatency`	Release gate
BM25 index (`gen/index/bm25.bin`), dense embedding index (`gen/index/embeddings.bin`), and the active session catalog snapshot are built from the same `catalog.json` in a single `go generate` run; every snapshot op_id appears in `bm25.bin`; every snapshot op_id with embedding-enabled variants appears in `embeddings.bin`; every embedding entry corresponds to a BM25 entry; no completion-eligible op_id is absent from any required index (§5.3 single-source invariant)	`TestCatalogIndexSnapshotInvariant`	v0.1 CI
`internal/dispatch` is the leaf of the internal import graph: no package in `internal/dispatch`'s transitive import set re-imports `internal/dispatch`; `internal/usage`, `internal/catalog`, `internal/profiles`, `internal/auth`, `internal/cache`, `internal/ratelimit`, `internal/retry`, `internal/sanitize`, `internal/output`, `internal/tee`, and `internal/pluginenv` do not import `internal/dispatch`; `cmd/gen-catalog` imports no internal package other than `internal/catalog` schema types (§14 import-graph contract)	`TestNoCyclicImports`	v0.1 CI
Shipped `linux/amd64` binary (including embedded `gen/catalog.bin`, `gen/index/embeddings.bin`, `gen/index/bm25.bin`) does not exceed 120 MB at release tagging (§15 binary-size cap)	`TestBinarySize` (in `internal/securityscan`, cap = `MaxBinarySizeBytes`) + `binary-size` job in `.github/workflows/build-matrix.yml`	Release gate (every PR)
`cmd/gum` and its transitive dependency closure contain no CGo imports (§15 CGO_ENABLED=0 scoping); native keychain code lives behind the `internal/auth` keychain-backend abstraction	`TestReleaseBinaryNoCGo` (in `internal/securityscan`) + CI `build-matrix` workflow with `CGO_ENABLED=0` across linux/amd64, linux/arm64, darwin/amd64, darwin/arm64	Release gate + PR gate
Fixtures under `testdata/` contain no real Google API keys (`AIza...`), bearer tokens, OAuth refresh tokens (`1//0...`), AWS access keys (`AKIA...`), PEM private keys, or non-example.com email addresses; allowlists cover RFC 2606 reserved domains (`.example.{com,org,net}`, `.test.local`, `localhost`) and synthetic Google Calendar iCalUIDs (`evNNN@google.com`)	`TestFixturesNoSecrets` (in `internal/securityscan`)	v0.1 CI
Release binary contains no shared-library linkage (linux `ldd`: "not a dynamic executable" or "statically linked"; darwin `otool -L`: only `/usr/lib/` and `/System/` entries); built with `CGO_ENABLED=0 -trimpath -ldflags='-s -w'`	`TestNoSharedLibDependencies` (in `internal/securityscan`)	Release gate
Build-time `-X main.version=<tag>` propagates to `gum --version` and `gum version` output; release tags drive the stamped value via goreleaser ldflags	`TestVersionStamp` (in `internal/securityscan`)	Release gate
Opt-in update notifier (gum-afcv.5): `notify.enabled=true` per-profile gates an async GitHub releases-API check on `gum version`; check honours a 2s timeout; results cache 24h at `<XDG_CACHE_HOME>/gum/<profile>/notify.json`; the check NEVER blocks version output and warnings emit to stderr only (not stdout — pipelines stay clean); `dev` builds and unparseable semvers are no-ops; fetch errors are silently swallowed	`internal/notify` package tests + `TestVersionNotifierDisabledByDefault` / `TestVersionNotifierOptInPrintsToStderrNotStdout` in `cmd/gum`	v0.1 CI
Homebrew tap install path: `ehmo/homebrew-tap` carries `Formula/gum.rb`; users install with `brew tap ehmo/tap https://github.com/ehmo/homebrew-tap` then `brew install ehmo/tap/gum`. The formula name must stay qualified because Homebrew core already has Charmbracelet's unrelated `gum`. Binary casks stay disabled until macOS signing/notarization is active; unsigned cask binaries can be quarantined or blocked on first launch.	`ruby -c Formula/gum.rb` in the tap + remote Homebrew install smoke where host policy and disk allow it	Release gate
Release workflow rejects tags that do not match `^v[0-9]+\.[0-9]+\.[0-9]+$` (no pre-release suffix on the stable channel) via the `validate-tag` job before any goreleaser step runs	`validate-tag` job in `.github/workflows/release.yml`	Release gate
Two sequential builds of `./cmd/gum` from the same source tree with `CGO_ENABLED=0 -trimpath -ldflags='-s -w -X main.version=<stamp>'` produce byte-identical SHA-256 outputs	`TestReproducibleBuild` (in `internal/securityscan`) + `reproducible` job in `.github/workflows/build-matrix.yml`	Release gate
Release pipeline runs `go test -race ./...` on Go 1.25.x and 1.26.x as a release gate; runs the three fuzz targets `internal/output.FuzzToonParser`, `internal/cache.FuzzJCSCanonical`, and `internal/plugins.FuzzPluginManifest` with `-fuzztime=60s` per CI run; `govulncheck ./...` runs advisory in v0.1.0 and blocking in v0.2.0 (§15)	`TestRaceModeReleaseGate` / `FuzzToonParser` / `FuzzJCSCanonical` / `FuzzPluginManifest` / `TestGovulncheckPipeline`	Release gate (race + fuzz from v0.1; govulncheck-blocking from v0.2)
SLSA Level 1 provenance attestations are generated by `slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v2.0.0` against the SHA-256 hashes of all goreleaser-produced `*.tar.gz` artifacts; the `.intoto.jsonl` is uploaded to the GitHub release; a `verify-provenance` job downloads the just-published artifacts + provenance and runs `slsa-verifier verify-artifact --source-tag <tag>` against every artifact before the workflow exits (§15 release pipeline)	`provenance` + `verify-provenance` jobs in `.github/workflows/release.yml`; `slsa-framework/slsa-verifier/actions/installer@v2.6.0`	Release gate
Release fixture set composition gate: `internal/bench/fixtures/release/` contains at least 200 fixture calls with composition (50% Workspace read, 20% `gum_parallel` 2-4 batches, 15% non-Workspace read, 15% write/destructive across ≥3 services), validated at release within ±5% tolerance per category (§12.3)	`TestGainFixtureCompositionGate`	Release gate
`gum gain --fixture-replay` runs the release fixture set against the TOON-default catalog profiles, computes end-to-end savings against the fixture-backed baseline (`baseline_method = "fixture_replay"`), and emits the same `GainResult` schema as live ledger reads; failure modes (missing fixture set, schema drift, baseline mismatch) return stable error envelopes (§12.3 R28B-M-3)	`TestGainFixtureReplay`	Release gate
`gum gain --fixture-replay` second pass with `output.default_format=json` set on the profile config replays the same release fixture set, computes JSON-default end-to-end savings against the fixture-backed baseline, and is the release-gate proof artifact required by §2.1 "Savings claim measurement". The two passes share fixture data; only the default-format setting differs. Failure (savings drift beyond the documented JSON-default band, fixture-replay error envelope) blocks the release tag.	`TestGainFixtureReplayJSONDefault`	Release gate
Managed-scope manifest (`apps/gum/internal/embedded/data/auth-managed-scopes.v1.json`) validates against `apps/gum/internal/embedded/data/auth-managed-scopes.v1.schema.json` (JSON Schema 2020-12); manifest invalidity fails the build before `cmd/gen-catalog` evaluates `gum_oauth` variants; for every scope with `status="active"`, the schema enforces `verification_state="verified"`, `project_evidence_state="ready"`, `live_canary_state="passing"`, and non-empty `evidence` (§7)	`TestManagedScopeManifestSchema`	v0.1 CI
`cmd/gen-catalog/overrides.toml` validates against `cmd/gen-catalog/overrides.schema.json` (JSON Schema 2020-12 applied to TOML decoded as JSON) before each `cmd/gen-catalog` run; validation failure is `OVERRIDES_SCHEMA_INVALID` (§5.2)	`TestOverridesManifestSchema`	v0.1 CI
Cobra-forbidden patterns: AST lint rejects `cobra.OnInitialize` usage anywhere in `internal/cli/`; required initialization order is constructor-driven not init-hook-driven (§12.2)	`TestForbiddenPatterns`	v0.1 CI
Atomic settings.json patch: `gum init` acquires `plugins.install.lock` before writing settings.json; the lock is the canonical mutex; concurrent `gum init` runs serialize correctly; lock-held state surfaces a structured error rather than racing the patch (§12.2 R28C-M-12)	`TestSettingsAtomicPatch`	v0.1 CI
Profile config schema versioning: profile config files declare `config_schema_version = 1`; unsupported future versions fail with `CONFIG_SCHEMA_UNSUPPORTED`; unknown keys in the current version fail with `UNKNOWN_CONFIG_KEY` (§12.2)	`TestProfileConfigSchemaVersion`	v0.1 CI
Gain-ledger retention mirrors §11 audit retention: per-profile `gain-ledger.jsonl` rotates on the same configurable schedule; rotation does not lose entries; rotated segments are read by `gum gain` and `gum.gain` (§12.3 R28C-OQ-3)	`TestGainLedgerRetention`	v0.1 CI
`interface_kind` closed-enum membership is enforced at catalog build: unknown non-`x-*` values fail with `UNKNOWN_INTERFACE_KIND`; promotion from `x-<name>` to `<name>` requires the multi-step procedure in `docs/catalog-abi.md` (§Interface Kind extension procedure); `TestInterfaceKindClosedEnum` verifies both the closed-enum rejection and a representative experimental-to-stable promotion fixture	`TestInterfaceKindClosedEnum`	v0.1 CI
`binding_schema_version` is an integer; non-integer values (decimal point, string suffix, semver-triple) fail with `BINDING_SCHEMA_UNSUPPORTED` at load (`docs/catalog-abi.md` binding-version patch prohibition)	`TestBindingSchemaVersionInteger`	v0.1 CI
`prompts/get` argument-rejection transport returns JSON-RPC `error.code = -32602` (`InvalidParams`) with `error.data.error_code = "INVALID_ARGS"` for argument maps supplied to zero-argument v0.1 prompts (§7)	`TestPromptsGetInvalidArgs`	v0.1 CI
`gum.code` with a reserved `language` value (`starlark`, `yaegi`, `js`, `python`) is rejected at the JSON Schema validation layer with JSON-RPC `error.code = -32602`; no GUM stable error code wraps the rejection; no script execution occurs (§4.3)	`TestCodeReservedLanguageRejection`	v0.1 CI
MCP roots/list change handling: on receipt of `notifications/roots/list_changed`, GUM invalidates the session-cached `roots/list` and re-calls `roots/list` before the next request requiring project-local profile resolution; in-flight requests retain the pre-invalidation root set (§13.2 R28G-MF-3)	`TestRootsListChangedHandling`	v0.1 CI
`gum://results/{hash}` lifecycle polling pattern: clients copy artifacts before `artifact_expires_at`; GUM emits no `notifications/resources/updated` for results in v0.1.0; reads after expiry return the stable `RESULT_ARTIFACT_EXPIRED` envelope (§13 R28B-S1-7 reframe)	`TestResultsArtifactExpiryPolling`	v0.1 CI
`ParallelResultItem` schema (§13 `$defs/ParallelResultItem`) is the validating schema for `ParallelResults.results[]` items; items are NOT validated against `ToonResult`; each item requires `_idx` and `_expression` (an `ExpressionMetaDelta`); optional sibling `_code_output_truncated: boolean` lives on the item, not inside `_expression`; outer `ParallelResults._code_output_truncated` is set when any element carries the field set to true (R28B-min-5)	`TestParallelResultItemSchema`	v0.1 CI
Goroutine-leak verification: `TestPollTimeoutAndCancellation` and `TestGumParallelResultEnvelopeCompression` register `defer goleak.VerifyNone(t)` so the leak check runs as each test exits; the pinned `uber-go/goleak` floor is v1.3.0+ (Appendix A)	(extended assertions on existing tests)	v0.1 CI
`testscript`-based CLI contract tests for `gum call`, `gum code`, and `gum auth` flows: `testdata/script/*.txtar` files drive deterministic stdin/stdout/stderr assertions; the pinned `rogpeppe/go-internal` floor is v0.13.0+ (Appendix A)	`TestCLIContractTestscript`	v0.1 CI
Filesystem fsync fallback: on filesystems without `fsync` semantics, `internal/tee` falls back to best-effort durability and emits a structured warning `{"level":"warn","event":"fsync_not_supported", ...}`; no silent loss of artifact data (§8.7 R28A-Minor-5)	`TestTeeFsyncFallback`	v0.1 CI
TOON forward-compatibility: plugins emitting TOON with an unsupported version stamp fail with `TOON_VERSION_UNSUPPORTED` structured error rather than silent best-effort decoding (§9.0)	`TestToonVersionUnsupported`	v0.1 CI
`tee.secret` lifecycle embedding-model-independence: rotating the embedding model identity does not invalidate or rewrite `tee.secret`; the secret is keyed only to profile identity (§9.0 R28C-OQ-6)	`TestTeeSecretEmbeddingIndependence`	v0.1 CI
Aggregate batch output ceiling for `gum_parallel` from inside `gum.code`: element-level UTF-8-boundary truncation sets `_code_output_truncated: true` on the affected element AND on the outer `ParallelResults` envelope; the cumulative pre-dispatch ceiling at 32,768 bytes (§9.0.1) is enforced; truncation never corrupts UTF-8 multi-byte sequences	`TestGumParallelCodeOutputCeiling`	v0.1 CI
Meta-tool admin tuning: `meta_tools.search_apis.truncate_strings.default_chars` and `meta_tools.search_apis.collapse_arrays.max_items` are configurable from the admin tuning surface (§9.4); defaults and overrides are validated at config load	`TestMetaToolAdminTuning`	v0.1 CI
Audit log graceful shutdown: SIGTERM triggers a 2-second drain (configurable via `audit.drain_timeout_seconds`, 0-30, default 2) of pending audit entries; `audit.broken` is NOT set on clean exit; structured `audit_drain_complete` event is emitted (§11)	`TestAuditGracefulShutdown`	v0.1 CI
Keychain unavailable policy: when neither native keychain nor encrypted-file vault is available (e.g., read-only filesystem), credential reads fail with `AUTH_KEYCHAIN_UNAVAILABLE` and `user_message` suggests `gum auth use-adc`; the error never silently falls back to an in-memory cache (§7)	`TestAuthKeychainUnavailable`	v0.1 CI
HELP_TOPIC_TOO_LARGE: any `internal/help/<topic>.md` whose rendered body exceeds 8 KiB fails the build with `HELP_TOPIC_TOO_LARGE` (§7 error code table; §13 `gum://help/{topic}` cap)	`TestHelpTopicSizeCap`	v0.1 CI
`gum.poll.operation_name` is the canonical name for the LRO handle argument; its description is reachable through `gum.describe_op("gum.poll")` and matches the v0.1 contract	(extended assertion on `TestTierAPerToolInputSchemaBudget`)	v0.1 CI
Compound auth token-forwarding allowlist: when an op's resolved variant declares `auth_strategy="compound"`, the dispatch layer MAY allowlist `google_access_token` in the plugin subprocess environment (§7 R28A-M-5); other strategies never forward access tokens to plugin subprocesses; the env denylist (§8.1) is the single source of truth	`TestCompoundAuthTokenForwarding`	v0.1 CI
LRO_UNSUPPORTED_IN_CODE: ops whose default variant is classified `lro=true` raise `{"error_code": "LRO_UNSUPPORTED_IN_CODE", ...}` before dispatch when called from inside `gum.code` (§6.1)	`TestLROUnsupportedInCode`	v0.1 CI
`gum_print` UTF-8 truncation: when the per-call byte limit is reached, the output is truncated at the last valid UTF-8 boundary before the limit; partial multi-byte sequences are excluded; the retained output never ends in a half-encoded character (§6.1)	`TestGumPrintUtf8Boundary`	v0.1 CI
Registry remains structurally mutable post-`Server.Run` (§4.2 forward-compat invariant): `Server.AddTool` is callable at the object level on the v0.1.0 `internal/mcp` server after `Run` returns. The v0.1.0 behavioral policy that forbids mid-session registration is enforced by a session-scoped gate in `internal/dispatch`, NOT by structural prohibition; this enables the v0.2.0 `tools/list_changed` enablement to land without re-architecting `internal/mcp`	`TestMCPRegistryStructurallyMutable`	v0.1 CI
`confirmation_purpose="high_stakes_write"` reserved-value rejection (§6.1.2): a `confirmation_token` whose embedded purpose is `"high_stakes_write"` (the v0.3.0 reserved value, NOT emitted by a v0.1.0 dispatcher) presented to a v0.1.0 dispatcher returns `CONFIRMATION_TOKEN_INVALID` with `data.reason="unknown_purpose"`; the closed-enum check happens before HMAC verification so that a forged token does NOT reveal whether the HMAC would have matched	`TestConfirmationTokenUnknownPurpose`	v0.1 CI
`tee_mode="failures"` step-6 vs step-7 boundary (§9.1, §expression-profile-dsl.md `tee_mode`): a `RATE_LIMITED` error raised by the in-process token bucket at step 6 of the dispatch lifecycle (§3.1) does NOT write a tee artifact (failure occurred before the executor); a `RATE_LIMITED` error raised by an upstream HTTP 429 at step 7 DOES write a tee artifact (failure occurred during the executor). The test fixture pair distinguishes the two paths by stubbing the token bucket vs. the HTTP transport	`TestTeeFailuresStep6VsStep7`	v0.1 CI
`cmd/gum` release-binary CGo-freeness across the full release matrix (§15 release build flags): the package set reported by `go list -deps ./cmd/gum/...` has an empty intersection with the set of CGo-flagged packages for each `GOOS`/`GOARCH` pair in (`linux/amd64`, `linux/arm64`, `darwin/amd64`, `darwin/arm64`); the assertion is broader than `TestNoCGoInDispatch` (which only covers `internal/dispatch`) and catches keychain-backend regressions before tagging	`TestReleaseBinaryNoCGo`	Release gate
`internal/pluginenv/denylist.txt` single-source invariant (§14 pluginenv row): the SHA-256 of the `go:embed`-ed denylist material in `internal/pluginenv` and the SHA-256 of the `go:embed`-ed material in `cmd/gen-catalog` (which embeds the same file via a relative path or build-time copy) MUST be identical; the test fails if either embedding site drifts	`TestPluginEnvDenylistSingleSource`	v0.1 CI
`auth_strategy` enum extension completeness (§7 extension procedure): the gate enforces the (a)/(b)/(c) check against the closed enum in `internal/auth/strategy.go` minus the v0.1.0 baseline set (`gum_oauth`, `byo_oauth`, `adc`, `service_account`, `api_key`, `compound`, `plugin_managed`, `none`) which is already covered by pre-existing tests (`TestAuthStrategyRequired`, `TestCompoundAuthErrorEnvelope`, etc.). For every residual value: (a) it appears in `docs/catalog-abi.md`'s `auth_strategy` cross-reference, (b) it has an `internal/auth/strategy_<name>.go` file, and (c) this matrix contains a row whose test name is exactly `TestAuthStrategy<Name>` (CamelCase). The baseline set is enumerated in the test source so removing a value from it re-arms the residual check	`TestAuthStrategyEnumExtensionComplete`	v0.1 CI
Per-package line-coverage retention gate over the full tracked surface — `./cmd/gum/...` and `./internal/...` (`internal/coverage.GatedPackages`), excluding build-time tooling (`cmd/gen-*`, `cmd/measure-tier-a`, `cmd/test-matrix`, `cmd/coverage-floor`) and the generated, gitignored `gen/dispatch` tree (beads gum-b22o.5, gum-5wkg, gum-8ilq, gum-ql6c). `internal/coverage.FloorPercent` (85%) is the absolute minimum for any un-listed package; `internal/coverage.Ratchets` pins every tracked package at a retention baseline of `floor(current − ~1% jitter headroom)` (a baseline `Min` may sit above or below 85%, recording the level actually held). `make coverage-floor` (CI job `coverage-floor`, single Go toolchain) measures per-package coverage via `go test -coverprofile`, parses the raw profile, and fails when any package falls below its effective threshold (ratchet `Min`, else 85%); it also prints a non-failing `RATCHET_OPPORTUNITY` hint when a package exceeds its baseline by `RatchetOpportunityMargin` (2%). Lowering a ratchet `Min` requires a matching test-matrix note plus an owning `Bead`; gum-ql6c is the explicit v0.1 release-candidate recalibration after the gum.code/trust-boundary hardening and catalog-depth work shifted package denominators. Undocumented lowering or removing a `Bead` reference is itself a regression and breaks `TestRatchetEntriesHaveBeadReferences`	`TestFloorIs85` / `TestRatchetEntriesHaveBeadReferences` / `TestRatchetEntriesAreUnique` / `TestGatedPackages` / `TestCheckRespectsRatchet` / `TestOpportunitiesFlagsImprovedPackages` / `TestParseProfileAggregatesPerPackage`	v0.1 CI
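
The effective-threshold rule in the coverage row above (ratchet `Min` if the package is listed, else the 85% floor, plus a non-failing hint at a 2% margin) can be sketched as follows. This is a minimal illustration, not the `internal/coverage` implementation: the `Ratchet` and `Verdict` shapes, the `effectiveThreshold` and `check` helpers, and the package and bead names are hypothetical, and the sketch assumes the opportunity hint applies only to packages that have a ratchet entry and fires when coverage is at least `Min` plus the margin.

```go
package main

import "fmt"

// Values taken from the matrix row; the identifiers mirror the names it cites.
const (
	FloorPercent             = 85.0 // default minimum for un-listed packages
	RatchetOpportunityMargin = 2.0  // headroom that triggers the hint
)

// Ratchet is a hypothetical shape for one retention baseline entry.
type Ratchet struct {
	Min  float64 // retention baseline, may sit above or below the floor
	Bead string  // owning bead reference (required by the matrix row)
}

// Verdict is a hypothetical result type for one package.
type Verdict struct {
	Threshold   float64
	Failed      bool
	Opportunity bool
}

// effectiveThreshold returns the ratchet Min when the package is listed,
// otherwise the absolute floor.
func effectiveThreshold(pkg string, ratchets map[string]Ratchet) float64 {
	if r, ok := ratchets[pkg]; ok {
		return r.Min
	}
	return FloorPercent
}

// check compares measured coverage against the effective threshold.
// Assumption: the opportunity hint only applies to ratcheted packages.
func check(pkg string, covered float64, ratchets map[string]Ratchet) Verdict {
	v := Verdict{Threshold: effectiveThreshold(pkg, ratchets)}
	v.Failed = covered < v.Threshold
	if _, listed := ratchets[pkg]; listed {
		v.Opportunity = covered-v.Threshold >= RatchetOpportunityMargin
	}
	return v
}

func main() {
	// Hypothetical entries, for illustration only.
	ratchets := map[string]Ratchet{
		"example/pkg": {Min: 91.0, Bead: "example-bead"},
	}
	for _, c := range []struct {
		pkg     string
		covered float64
	}{
		{"example/pkg", 90.0},
		{"example/pkg", 93.5},
		{"example/unlisted", 85.0},
	} {
		fmt.Printf("%s %.1f%% -> %+v\n", c.pkg, c.covered, check(c.pkg, c.covered, ratchets))
	}
}
```

Keeping the threshold lookup separate from the pass/fail decision means a failing package and a ratchet opportunity are reported from the same measured value, which matches the row's description of a single profile parse feeding both outcomes.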

Release claims such as ">=80% savings" may cite only fixture-backed local gain-ledger entries. Privacy-minimized telemetry is product telemetry, not release-gating evidence.
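
As an illustration of the `gum_print` UTF-8 truncation row in the matrix above, the boundary rule can be sketched with the standard `unicode/utf8` package. This is a sketch under stated assumptions rather than the `gum_print` implementation: `truncateUTF8` is a hypothetical helper, the input is assumed to be valid UTF-8, and a rune boundary that falls exactly on the limit is assumed to be an acceptable cut point.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// truncateUTF8 (hypothetical helper) returns the longest prefix of s that is
// at most limit bytes long and does not end inside a multi-byte sequence.
// Assumption: s is valid UTF-8.
func truncateUTF8(s string, limit int) string {
	if limit <= 0 {
		return ""
	}
	if len(s) <= limit {
		return s // nothing to truncate
	}
	cut := limit
	// If s[cut] is a continuation byte, cutting here would split a rune,
	// so back up until the byte at the cut point starts a rune.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	for _, limit := range []int{1, 2, 3, 6} {
		out := truncateUTF8("héllo", limit)
		fmt.Printf("limit=%d out=%q valid=%v\n", limit, out, utf8.ValidString(out))
	}
}
```

A test in the spirit of `TestGumPrintUtf8Boundary` could place the limit at every byte offset inside a multi-byte character and assert that `utf8.ValidString` holds for each result.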