Test Matrix
This matrix is a normative proof map, not a product-behavior source of truth. The Requirement column summarizes canonical contracts from spec.md and companion docs so each gate has a testable obligation; if a row conflicts with a canonical contract, the canonical contract wins and this matrix must be patched in the same PR.
Required phase column. v0.1 CI means the gate is enforced from the first v0.1 commit. Release (written as "Release gate" in some rows) means the gate is enforced at release tagging. Same PR means the proof artifact MUST land in the same PR as the change that triggered the requirement; deferral to a follow-up PR is not permitted. vN.M CI means the gate is enforced starting with the named version's CI; PRs targeting that version must include the proof.
| Requirement | Proof artifact | Required phase |
|---|---|---|
| Tier A has exactly 9 meta-tools | TestTierAMetaToolCount | v0.1 CI |
| Tier A has at most 18 convenience tools | TestTierAConvenienceToolCount | v0.1 CI |
| Tier A cold-start schema budget <= 8,000 cl100k_base tokens | TestTierATokenBudget | v0.1 CI and release |
| Tier A output schemas preserve $defs | TestOutputSchemaDefs | v0.1 CI |
| Tier A representative structuredContent validates against ToonResult, SingleObjectResult, RawJsonResult, GainResult, and CacheStatsResult schemas; diff-only 304 responses ({"unchanged": true, "etag": "..."}) are exempt and the test MUST include at least one 304 fixture and verify validation is skipped (not failed) for that shape | TestTierAResponseShapeConformance / TestGainOutputSchema / TestCacheStatsOutputSchema | v0.1 CI |
| All Tier A tool registrations include outputSchema | TestTierARegistrationScan | v0.1 CI |
| No production code path imports any package under a testdata/ path segment; test-only registration helpers in testdata/ cannot contribute to the Tier A budget | TestTestdataNoProductionImport | v0.1 CI |
| Tool annotations serialize according to go-sdk v1.6.0 wire semantics: readOnlyHint:true present on read-only tools, readOnlyHint:false may be omitted on non-read-only tools, destructiveHint present with explicit true/false on every Tier A tool, and optional idempotentHint / openWorldHint never serialize as null | TestToolAnnotations / TestToolAnnotationsWireForm | v0.1 CI |
| Confirmation-required paths are bound to exact pending requests: gum.destructive, gum.write variants with confirmation_policy="high_stakes_write", and gum.code emit short-lived confirmation tokens, reject missing/expired/replayed/mismatched tokens with CONFIRMATION_TOKEN_INVALID, and make no upstream call on rejection; gum.code tokens bind language, source_sha256, allow_write, allow_destructive, destructive_budget, and canonical destructive_scope; fixture set MUST include (a) replay attempt within TTL, (b) replay attempt after TTL, (c) cross-process replay (token minted by one gum process and presented to another), (d) simulated cross-profile replay (token minted with profile A's per-process HMAC key presented to a handler initialized with profile B's distinct per-process HMAC key), (e) confirmed gum.code re-invocation that increases destructive_budget or broadens destructive_scope after approval, and (f) a gum.write high-stakes variant whose token cannot be reused for another write or destructive variant; all reject with CONFIRMATION_TOKEN_INVALID, with the cross-process and cross-profile cases carrying reason: "mismatch" per §6.1.2; the cross-profile fixture documents the per-process profile-binding invariant (§6.1.2 Profile binding) even though it cannot occur in a correctly deployed v0.1.0 system | TestConfirmationTokenBinding / TestHighStakesWriteConfirmation | v0.1 CI |
| gum.code confirmation token re-hashes submitted source at re-invocation: presenting a valid token with a different source body (same source_sha256 in token, different actual source) is rejected with CONFIRMATION_TOKEN_INVALID (reason: mismatch); no script execution occurs | TestConfirmationTokenSourceRehash | v0.1 CI |
| Tier A convenience-tool ABI is generated from the normative §4.1 table: each row binds to the listed backing op_id, fixed/default variant rule, required/optional args, output profile/formats, and confirmation-passthrough policy; rows that wrap confirmation_policy="high_stakes_write" variants without confirmation_passthrough=yes fail catalog generation; every convenience tool registers an outputSchema covering its confirmation branch when applicable | TestTierAConvenienceABI / TestConvenienceHighStakesConfirmation | v0.1 CI |
| MCP completions are available for op IDs, variant IDs, resource-template params, plugin names, help topics, operation handles, and the closed enums gum.code.language (risor only in v0.1.0) and gum.*.format (toon, csv, json, markdown) without upstream calls; CLI invoke completions cover op IDs, --variant-id, plugin names, help topics, --profile, and gum call boolean output flags (--json, --toon, --csv, --markdown), while gum call --format and gum code --lang are absent in v0.1; reporting subcommands may expose their own flags, and gum gain --format completes text, json, and csv | TestMCPCompletions / TestCLICompletions | v0.1 CI |
| Tier A registrations complete before MCP transport accepts connections; no Server.AddTool calls occur after Server.Run; no spurious tools/list_changed notifications fire during handshake | TestMCPStartupOrdering | v0.1 CI |
| Loading active plugins before Server.Run does not grow tools/list; v0.1.0 roster matches docs/tier-a-roster.v1.json exactly (9 meta-tools + 18 convenience tools), while plugin variants are reachable only through catalog/resources/search after restart or through a later standalone CLI process startup | TestTierARosterManifest / TestTierAToolCountWithPlugins | v0.1 CI |
| logging server capability is absent from the v0.1.0 initialize response; the §13.2 capability matrix declares exactly which server-owned capabilities are advertised (tools, resources, prompts, completions; notifications/cancelled and notifications/progress honoured; roots consumed only when client-owned capability is declared; sampling/elicitation/logging/list_changed absent) | TestMCPInitializeCapabilities | v0.1 CI |
| MCP 2025-11-25 task capability is absent from the v0.1.0 initialize response; tasks/get, tasks/result, tasks/list, tasks/cancel, notifications/tasks/status, and notifications/elicitation/complete are not advertised or emitted in v0.1.0; emitted tools/list, resources/list, resources/templates/list, and prompts/list payloads contain no icons fields | TestMCPInitializeCapabilities / TestNoTaskCapabilityV01 / TestNoIconMetadataV01 | v0.1 CI |
| gum.poll sends progress only when _meta.progressToken is present, uses MCP progress wire shape, and preserves the token's JSON type (string or integer) end-to-end without stringifying integers or coercing to float | TestPollProgressTokenContract | v0.1 CI |
| gum.poll timeout and context cancellation do not leak goroutines | TestPollTimeoutAndCancellation | v0.1 CI |
| MCP roots drive project-local expression-profile lookup; fixtures cover single file:// root, multiple roots with _meta.gumRoot, multiple roots without _meta.gumRoot failing with PROJECT_ROOT_REQUIRED, non-file roots rejected for project-local lookup, roots-unavailable MCP sessions disabling project-local lookup by default, and --allow-implicit-project-root as the only path that may use GUM_PROJECT_ROOT / $PWD with _profile_resolution_warning: "implicit_project_root" | TestProfileResolutionFromMCPRoots | v0.1 CI |
| Prompts are registered and retrievable through prompts/list and prompts/get; v0.1 prompts are exactly the zero-argument static prompts in §13, prompts/list reports empty arguments, prompts/get rejects supplied arguments with INVALID_ARGS, embedded templates stay under 6 KiB, and returned payloads contain one user-message text block | TestPromptRegistration / TestPromptZeroArgumentContract | v0.1 CI |
| No unsolicited server notifications are emitted before notifications/initialized | TestMCPInitializedWaitRule | v0.1 CI |
| MCP resource templates appear in resources/templates/list (not resources/list) | TestResourceTemplateRegistration | v0.1 CI |
| resources/list and resources/templates/list accept only MCP cursor pagination: requests use optional cursor, responses use nextCursor, the server page cap is 100 entries, and no client page-size parameter is accepted or required | TestMCPResourceCursorPagination | v0.1 CI |
| gum://schema/{ref} returns JSON Schema documents for schema refs exposed by active gum://op/{id} / gum://variant/{id} resources using exactly one MCP text resource content item with requested uri, mimeType="application/schema+json", and JCS-canonical JSON payload, enabling exact TOON reconstruction; gum://op/{id} and gum://variant/{id} success reads use the same one-item JSON shape with mimeType="application/json"; refs must satisfy the safe served-ref grammar; unknown refs, invalid ref grammar, inactive-plugin refs (installed_pending_restart or needs_configuration), and quarantined-plugin refs return canonical RESOURCE_NOT_FOUND as a JSON-RPC resource error; divergent full-profile-inventory ref collisions fail build/install with SCHEMA_REF_COLLISION while identical-body reuse is allowed | TestSchemaResourceLookup / TestOpVariantResourceWireShape / TestSchemaRefGrammar / TestSchemaRefCollision | v0.1 CI |
| Static resources (gum://catalog, gum://status/canaries, gum://plugins, gum://help/topics) appear in resources/list | TestStaticResourceRegistration | v0.1 CI |
| Recovery links for recovery="resource_link" appear as both _expression.full_result_resource and exactly one MCP resource_link content block with the same gum://results/<hash> URI; local_artifact never emits a resource-link block | TestRecoveryResourceLinkContentBlock | v0.1 CI |
| tee.secret lifecycle and principal scoping: (a) absent secret is generated lazily on first lossy write, exactly 32 bytes, mode 600; (b) corrupt or malformed secret (not 64 lowercase hex chars) causes TEE_SECRET_CORRUPT error, no silent regeneration; (c) gum://results/{hash} resolution uses directory scan, success reads return exactly one decompressed application/json resource item, and absent hashes return RESULT_ARTIFACT_EXPIRED; (d) same op/variant/args under two credential subjects in one profile yields different recovery URIs, tee paths, HTTP cache keys, semantic cache keys, and gain-ledger subject fingerprints | TestTeeSecretStability / TestResultResourceReadWireShape / TestPrincipalScopedRecoveryAndCache | v0.1 CI |
| installed_pending_restart and needs_configuration plugins are inventory-only: not searchable, operation-completable, describable as active, invokable via tools, or reachable from gum.code; gum://plugin/{name} validates against the fixed per-status resource shape, including safe credential descriptors for needs_configuration and no raw env var names; gum://op/{id} and gum://variant/{id} may consult inventory only for status-only inactive-plugin responses with execution_support: "schema_only" and status, never full invocable schemas; quarantined gum://op/{id} and gum://variant/{id} return VARIANT_QUARANTINED JSON-RPC resource errors instead of status-only responses; invoking inactive plugins via risk invoke tools returns UNSUPPORTED_CAPABILITY with the same status; inactive-plugin UNSUPPORTED_CAPABILITY envelopes MUST contain status and MUST NOT contain unsupported_capabilities or loader_kind (mutual-exclusion per §5.8); the capability-class path MUST NOT contain status; fixtures prove that live inventory registry and active session catalog snapshot can diverge without leaking inactive variants into search/describe/completions/invoke, and that quarantined wins when a plugin also has pending-restart or needs-configuration state | TestPluginInactiveInventoryOnly / TestPluginResourceShape / TestPendingRestartExcludedFromCompletions / TestPendingRestartVariantResource / TestPluginNeedsConfiguration / TestPluginStatusPrecedence | v0.1 CI |
| JSON-valued resource reads (gum://plugin/{name} and deprecated gum://help/{topic} redirects) return exactly one MCP text resource content item with requested uri, mimeType="application/json", and parseable JCS-canonical JSON payload; markdown help topics return exactly one text/markdown item | TestJSONResourceReadWireShape | v0.1 CI |
| Sampling remains optional | capability-negotiation test once sampling lands | v0.3 CI |
| Plugin schema_ref bundle resolves and materializes served request/response refs for bundled and third-party install paths; third-party fixtures cover safe ref grammar rejection, traversal/separator/percent-encoded separator rejection, copied plugin-schemas/<schema_ref>.request.<sha256>.json and plugin-schemas/<schema_ref>.response.<sha256>.json, full-profile-inventory collision rejection with SCHEMA_REF_COLLISION across active, pending-restart, needs-configuration, and quarantined plugins, and identical-body ref reuse | TestPluginSchemaRefBundled / TestPluginSchemaRefThirdPartyInstall / TestPluginSchemaRefCollision | v0.1 CI |
| Unknown capabilities fail closed | generator + plugin install tests for UNKNOWN_CAPABILITY | v0.1 CI |
| Unknown backend_kind values fail closed at build/install with UNKNOWN_BACKEND_KIND; stale-binary runtime loading of a newer catalog fails before upstream dispatch with UNSUPPORTED_CAPABILITY carrying loader_kind="backend_kind", unknown_value, and catalog_abi_version | TestUnknownBackendKind / TestRuntimeUnknownBackendKindUnsupportedCapability | v0.1 CI |
| Unknown interface_kind values fail closed at build/install with UNKNOWN_INTERFACE_KIND; stale-binary runtime loading of a newer catalog fails before upstream dispatch with UNSUPPORTED_CAPABILITY carrying loader_kind="interface_kind", unknown_value, and catalog_abi_version | TestUnknownInterfaceKind / TestRuntimeUnknownInterfaceKindUnsupportedCapability | v0.1 CI |
| Variant lifecycle does not silently fall back | tests for deprecated, superseded, removed, and quarantined variants | v0.1 CI |
| Default variant never selects removed/quarantined/deprecated variants when an active executable alternative exists | TestDefaultVariantLifecycleSelection | v0.1 CI |
| Plugin registry writes are atomic across the canonical registry ABI (plugin-catalog.json, plugins.lock, and plugin-state.json schemas in spec.md §8.7 / docs/catalog-abi.md): install/remove/startup-activation stages temp files with one install_generation/install_txid, publishes all three under plugins.install.lock, keeps the previous complete generation authoritative on failure, and startup recovers from mixed generations by selecting the last complete shared generation and quarantining orphan staged artifacts | TestPluginRegistryABISchemas / TestPluginInstallTransactionAtomicity / TestPluginInstallCrashRecovery | v0.1 CI |
| gum://plugins inventory is deterministic and includes a status field per plugin; fixture set MUST include at least one plugin in each status value (active, installed_pending_restart, needs_configuration, quarantined) | resource read test sorted by plugin name | v0.1 CI |
| Gain ledger begins with {"record_type":"header","schema_version":1,"tokenizer":"cl100k_base"} and every dispatch row has record_type:"entry" plus the required v0.1 fields (session, op_id, variant_id, output_profile, args_hash, auth_subject_fingerprint, token counts, cache/field-mask status, served_from_cache, is_retry, op_family, baseline_method); gum_parallel outer rows use the exact sentinel values from §12.3 (output_profile=null, auth_subject_fingerprint="batch", raw_tokens=0, cache_status/field_mask_status="not_applicable", op_family="gum_parallel"); cancelled parallel rows include cancelled: true | TestGainLedgerHeader / TestGainLedgerEntrySchema / TestGainParallelOuterEntrySchema | v0.1 CI |
| gum.gain and gum gain both read the selected profile's server-local gain-ledger.jsonl; usage.jsonl is not read in v0.1.0; remote/containerized MCP behavior depends on server-side ledger availability, not client filesystem access; disabled or unavailable ledger returns stable isError=true tool errors with GAIN_DISABLED or GAIN_LEDGER_UNAVAILABLE respectively and no GainResult structuredContent | TestGainLedgerSourceOfTruth / TestGainRemoteServerSideLedger / TestGainErrorBranches | v0.1 CI |
| Release-gated >=80% savings uses end-to-end ledger totals including gum_parallel outer entries; the test MUST verify batch_id linkage (outer entry batch_id equals all inner entries' batch_id), element_count equals the number of inner entries, end_to_end_savings includes outer envelope overhead, batch_envelope_overhead matches outer-entry token contribution, per_op_shaping_savings is diagnostic only, GainResult validates its mode discriminator and exactly-one-array summary/session/history schema branches, and outer entry variant_id is null (per §12.3 intentional design: a parallel dispatch has no single variant) | TestGainEndToEndSavingsIncludesBatchEnvelope / TestGainJSONOutputSchema | Release gate |
| gum gain --session <ID> filters by the local ledger session field and emits GainResult.operations[]; gum gain --since <RFC3339> applies a UTC lower-bound filter; gum gain --history groups chronological JSON and text output by session and op_family through GainResult.history[]; and gum gain --exclude-retries excludes only entries with is_retry=true from displayed aggregates while leaving raw ledger entries unchanged | TestGainSessionAndRetryFilters / TestGainSinceAndHistoryFilters | v0.1 CI |
| Diff-only mode 304 response short-circuits the expression pipeline; ledger records served_from_cache: "etag_304" with response_tokens: 0; _expression is absent in the 304 response; a second call with the same op_id and args_canonical but a different resolved variant_id or different auth_subject_fingerprint MUST NOT produce a 304 (different cache key) | TestDiffOnlyModeEtagReplay | v0.1 CI |
| gum.code enforces cumulative output byte budget across prints and final return | TestCodeOutputBudget | v0.1 CI |
| gum.code destructive fan-out is bounded across MCP and CLI: confirmed destructive scripts require destructive_budget / --destructive-budget in 1..20, consume one unit before each destructive call, reject over-budget calls with DESTRUCTIVE_BUDGET_EXCEEDED, reject calls outside destructive_scope / --destructive-scope with DESTRUCTIVE_SCOPE_MISMATCH, consume each gum_confirm_destructive(op_id, resource_key?) on the immediately following destructive call only, reject v0.1 script-header pragmas as unsupported rather than parsing them silently, and make no upstream request on any rejection path | TestCodeDestructiveBudgetAndScope / TestCodeCLINoScriptHeadersV01 | v0.1 CI |
| Expression profiles validate; recovery="resource_link" with tee_mode!="always" fails with PROFILE_TEE_MODE_CONFLICT before dispatch | gum profile validate; JSON Schema fixture tests; TestProfileTeeModeConflict | v0.1 CI |
| Expression-profile fixtures meet token budgets | gum profile test with cl100k_base | Release gate |
| Catalog ABI versions reject unsupported future artifacts | loader tests for CATALOG_SCHEMA_UNSUPPORTED, PLUGIN_MANIFEST_SCHEMA_UNSUPPORTED, PLUGIN_CATALOG_SCHEMA_UNSUPPORTED, PLUGIN_LOCK_SCHEMA_UNSUPPORTED, and PLUGIN_STATE_SCHEMA_UNSUPPORTED | v0.1 CI |
| Third-party plugin manifests use top-level manifest_schema_version only: missing version and [plugin].manifest_schema_version both fail install with PLUGIN_MANIFEST_SCHEMA_UNSUPPORTED, while bundled development manifests may use the documented v0.1.0 compatibility default | TestPluginManifestSchemaVersionPlacement | v0.1 CI |
| service_root_template is rejected before v0.4.0 with SERVICE_ROOT_TEMPLATE_DEFERRED; v0.1-v0.3 dispatch uses discovery-derived root metadata only and never advertises sovereign/government/private-service-connect endpoint variants as executable | TestServiceRootTemplateDeferred | v0.1 CI |
| Embedded catalog.json integrity is verified against a committed SHA256 digest in catalog.json.sha256; cmd/gen-catalog writes both files in lockstep so silent disk corruption, accidental edits, or unauthorized modification fail the build | TestCatalogIntegrity / TestCatalogChecksumFileFormat | v0.1 CI |
| JCS canonicalization is stable | RFC 8785 test vectors in internal/cache/canonical_test.go | v0.1 CI |
| Audit log rotation and recovery are deterministic | rotation, ENOENT retry, audit.broken sentinel tests | v0.1 CI |
| CI runs on the Go floor and current stable Go toolchains | CI matrix: go1.25.x and current stable | v0.1 CI |
| Easy auth contract: gum init runs auth readiness and launches/prints the default gum auth login path when no profile credential exists; GUM-managed browser OAuth uses built-in client ID + PKCE + loopback + CSRF state, never commits OAuth client secrets to source, and treats release-injected Desktop client material as public client material rather than a confidential secret; refresh tokens and plugin secrets are stored only in OS keychain; profile config stores only non-secret metadata; interactive desktop profiles do not silently consume ambient GOOGLE_APPLICATION_CREDENTIALS unless gum auth use-adc was run; AUTH_REQUIRED and SCOPE_MISSING user messages point gum_oauth users to gum auth login ..., but non-gum_oauth strategies point to gum auth setup <op_id>; CLI/MCP/code mode reuse the same selected-profile credential on the GUM host | TestAuthHappyPathNoUserClientSecret / TestAuthKeychainStorageOnly / TestAuthNoAmbientADCWithoutOptIn / TestAuthErrorNextAction / TestAuthLoopbackStateRequired | v0.1 CI |
| Bundled OAuth scope allowlist is enforced for variants that use auth_strategy="gum_oauth": apps/gum/internal/embedded/data/auth-managed-scopes.v1.json is the only source of eligible scopes; only status="active" + verification_state="verified" + project_evidence_state="ready" + live_canary_state="passing" scopes count; planned scopes are not requested; generator rejects a gum_oauth variant requiring any out-of-manifest or unverified scope with GUM_OAUTH_SCOPE_NOT_MANAGED; release gate verifies every active restricted/sensitive scope has an evidence pointer and every active scope has project, token-exchange, and refresh-canary evidence, otherwise GUM_OAUTH_MANAGED_CLIENT_NOT_READY | TestManagedOAuthScopeManifest / TestGumOAuthScopeNotManaged / TestManagedScopeVerificationEvidence / TestManagedOAuthProjectReadiness / TestManagedOAuthLiveCanaryRequired | v0.1 CI and release |
| Testing-window opt-in gate (§7 transitional): a scope is eligible for gum_oauth before full promotion only when managed_project.publishing_status == "testing" and the scope sets testing_allowed: true; the opt-in is per-scope (flipping the project to testing never auto-exposes non-opted-in scopes) and inert under any other publishing status (production stays strict by default); embedded_client_secret=true still fails GUM_OAUTH_MANIFEST_INVALID at the gate even with eligible scopes | TestCanStartGumOAuthTestingModeAllowsOptedInScope / TestCanStartGumOAuthTestingModeRequiresOptIn / TestCanStartGumOAuthTestingFlagInertOutsideTestingStatus / TestCanStartGumOAuthEmbeddedSecretRejected | v0.1 CI |
| Managed OAuth scope expansion is full re-consent, not implicit incremental append: requesting additional scopes builds the complete desired managed scope set, verifies granted scopes and subject fingerprint before replacing the keychain credential, rejects subject mismatch without storing tokens, and returns BYO/compound setup when any requested scope is not managed-ready | TestAuthScopeUpgradeFullReconsent / TestAuthScopeUpgradeSubjectMismatch / TestAuthScopeUpgradeUnmanagedScopeRoutesToSetup | v0.1 CI |
| Auth requirement taxonomy is enforced: every executable variant/plugin declares one auth_strategy; unknown non-x-* auth components fail build/install with AUTH_COMPONENT_UNKNOWN; non-gum_oauth variants emit errors with auth_strategy, missing_components, and setup_command; fixture includes a Google Ads Keyword Planner-like compound operation requiring developer token, OAuth client/client secret or refresh token, customer ID, optional login customer ID, billing/account prerequisites, account-permission checks, and approved access/permissible-use/allowlist hints, and proves plain gum auth login is not suggested as sufficient | TestAuthStrategyRequired / TestAuthComponentUnknown / TestCompoundAuthErrorEnvelope / TestGoogleAdsKeywordPlannerAuthFixture | v0.1 CI |
| BYO OAuth grant storage (spec §7 "BYO grant storage"): refresh tokens are keyed per sha256(client_id) with the granted-scope set stored alongside; a stored grant is reused when it is a superset of the requested scopes (broad gum login satisfies narrow per-op resolves), an uncovered scope routes to NO_REFRESH_TOKEN carrying the op's full scopes, separate per-op authorizations union into one grant (no clobber), and distinct client_ids stay isolated | TestByoOAuthBroadGrantSatisfiesNarrowResolve / TestByoOAuthMissingScopeForcesReauth / TestByoOAuthGrantUnionAccumulates / TestByoOAuthPerClientIsolation | v0.1 CI |
| v0.1.0 login surface + just-in-time auth (spec §7 "v0.1.0 login surface", "Just-in-time authorization"): gum auth login / gum login run the BYO loopback flow against the registered client, pre-authorizing the full catalog scope set when no --scope is given; the first gum call on an unauthorized byo_oauth variant prompts a TTY operator Authorize <scope>? [Y/n] then retries once on assent, while agents/pipes fall through to structured AUTH_REQUIRED (no gcloud dependency) | TestRunLoginWithConfiguredClientRunsFlow / TestResolveLoginScopesEmptyDerivesFromCatalog / TestTopLevelLoginAliasRegistered / TestMaybeJITLoginAccepted / TestMaybeJITLoginNotInteractive / TestCallRetriesAfterJITLogin | v0.1 CI |
| BYO-only public auth posture: gum login and the JIT byo_oauth resolver require an operator-registered OAuth client. Injected bundled-client values do not satisfy byo_oauth, and the public auth CLI does not register managed-status. | TestResolveAuthIgnoresInjectedManagedClient / TestResolveAuthRequiresBYOForAllScopes / TestResolveAuthNoManagedClientStillNotConfigured / TestRunLoginIgnoresInjectedManagedClient / TestAuthManagedStatusNotRegisteredForV1 / TestManagedSupportedScopesIncludesSearchConsoleReadonly | v1 CI |
| Workspace and account-policy failures are actionable: Google admin_policy_enforced-style failures map to missing_components:["workspace_admin_trust"] or ["org_policy_exception"], never to a retry-login loop; active credential alias and auth_subject_fingerprint prevent wrong-account dispatch after browser account switching | TestWorkspaceAdminPolicyAuthEnvelope / TestAuthActiveCredentialAliasRequired / TestAuthSubjectFingerprintMismatchBlocksDispatch | v0.1 CI |
| Plugin credential setup is centralized: manifests with credential descriptors and auth components can be configured through gum plugin setup <name>; setup stores secret components in the OS keychain, exposes only descriptor aliases/display names in resources and errors, displays external prerequisites as checklist items, runs the live canary after credentials are supplied, and clears needs_configuration only on canary success; raw env var names and secret values are not emitted in MCP resources | TestPluginSetupCredentialFlow / TestPluginCredentialNoRawEnvLeak / TestPluginExternalPrerequisiteChecklist | v0.1 CI |
| CLI gum call argument grammar parses typed JSON, repeated arrays, @file, stdin, and dotted-key escaping deterministically; every call requires --risk=read\|write\|destructive; host-control flags (--fields, --page-size, --page-token, boolean output flags) are not aliases for positional operation args and cannot be overridden by them; v0.1 rejects gum call --format while allowing reporting subcommands such as gum gain --format; duplicate output-format flags fail with CLI_ARG_DUPLICATE; --variant-id selects a non-default active variant and unknown/removed/quarantined/installed_pending_restart/needs_configuration variants fail before upstream dispatch; mismatched resolved variant risk returns RISK_TOOL_MISMATCH; destructive plus confirmation_policy="high_stakes_write" calls require --yes or an interactive TTY confirmation before dispatch | TestCLIArgGrammar / TestCLIRiskGate / TestCLIVariantSelection | v0.1 CI |
| CLI gum code confirmation grammar is exact: --allow-write or --allow-destructive requires interactive y or non-interactive --yes; read-only code requires neither; repeated --destructive-scope op_id[:resource_key] is the only v0.1 scope grammar; --no-confirm, script-header pragmas, and --lang fail parsing or validation rather than being silently accepted | TestCLICodeConfirmationAndScopeGrammar | v0.1 CI |
| Automation-safe read-only/reporting CLI commands support stable --format=json roots from §12 (search, describe, plugin list, plugin info, catalog list-overrides, cache stats, profile validate, profile test, gain) and golden tests reject drift in JSON field names | TestCLIJSONOutputContracts | v0.1 CI |
| strip_nulls=true is rejected unless null_elision_safe_fields covers elided fields | TestProfileStripNullsSafety | v0.1 CI |
| field_mask_mode="dual_fetch" is rejected for every write/destructive variant and every non-idempotent read variant; only read + idempotent variants may issue the second unmasked recovery fetch | TestDualFetchReadOnlyIdempotentGate | v0.1 CI |
| TOON output includes resolved variant header and exact variant schema lookup works | TestToonVariantHeader | v0.1 CI |
| TOON in-tree parser round-trips representative fixtures (list result with nulls, quoted CSV fields, zero omitted_count, empty body) losslessly | TestToonRoundTrip | v0.1 CI |
| RESULT_ARTIFACT_EXPIRED envelope returned on resources/read of a deleted gum://results/<hash> artifact as a JSON-RPC application error with error.code=-32010 and exact §7 error.data fields: error_code, hash, uri, expires_at, user_message, and suggestion | TestResultArtifactExpiredError | v0.1 CI |
| Generated REST dispatch stubs propagate context (.Context(ctx) before .Do()); cancelling context aborts in-flight HTTP within 100ms | TestExecutorContextPropagation | v0.1 CI |
| Long-tail raw REST unknown-argument handling is fail-closed by default for every risk class; read-only allowlist pass-through applies only to explicitly configured discovery-rest/raw-http variants, emits _validation_warnings, and is ignored for write/destructive, typed SDK, gRPC, and plugin backends | TestLongTailUnknownArgHandling | v0.1 CI |
| Any PR adding a new backend_kind value includes a fixture-backed executor contract test | TestBackendKind<Name> | Same PR |
| google-ads-sdk executes Google Ads Keyword Planner POST custom-methods (customers/{id}:generate*), injecting the secret developer-token header server-side (never an invocation arg) alongside the byo_oauth Bearer and optional login-customer-id | TestBackendKindGoogleAdsSDK | v0.1 CI |
| Any PR adding a new interface_kind value includes a fixture-backed interface contract test | TestInterfaceKind<Name> | Same PR |
| Any PR adding a new grpc-sdk or sdk-native adapter_key includes a binding-schema fixture and adapter-registry contract test proving the binding resolves without ad hoc code in the catalog loader | TestBackendBinding<Name> | Same PR |
| Plugin variants materialize explicit backend binding objects: mcp-plugin requires tool_name, and bundled grpc-plugin ABI fixtures require rpc_service plus rpc_method; missing or malformed selector fields fail with PLUGIN_BINDING_INVALID before subprocess start unless the third-party Shape 2 install gate applies first | TestPluginBindingSchema | v0.1 CI |
| Third-party Shape 2 manifests are rejected before v0.4.0: [plugin].shape="grpc-subprocess" or any backend_kind="grpc-plugin" in a third-party install fails with PLUGIN_SHAPE_UNSUPPORTED before binding selector validation, schema copy, executable staging, canary, or registry writes; malformed third-party Shape 2 selectors still return PLUGIN_SHAPE_UNSUPPORTED, while bundled ABI fixtures continue to use PLUGIN_BINDING_INVALID for selector failures | TestThirdPartyShape2InstallRejected | v0.1 CI |
Third-party plugin namespace ownership is stable and profile-scoped: manifests require namespace_owner, install records prefix ownership in the selected profile's plugins.lock, matching owners may upgrade/reinstall within that profile, mismatched owners fail with PLUGIN_NAMESPACE_CONFLICT, cross-profile locks never merge, and --dev-allow-namespace-conflict is rejected outside dev profiles |
TestPluginNamespaceOwnership |
v0.1 CI |
Any PR promoting a capability class from schema_only to executable (or adding a new executable atom) includes TestCapabilityClass<Name> and updates this matrix |
TestCapabilityClass<Name> |
Same PR |
Any PR adding or removing a language closed-enum value updates §4.3, §6.2, §12 CLI help, §13 completions, dependency floors as needed, and the TestMCPCompletions fixture in the same PR; any PR adding or removing a format value also updates docs/expression-profile-dsl.md, docs/expression-profile-dsl.json, and format fixtures |
TestMCPCompletions (extended) |
Same PR |
Elicitation-based managed-scope re-consent, when enabled in v0.2.0+, uses a structured approval object bound to op_id, exact required scopes, profile, expected subject hint, and request hash; successful login verifies granted scopes and resulting auth_subject_fingerprint, stores nothing on mismatch/decline/cancel, emits an audit event, returns SCOPE_GRANTED, and does not auto-retry the original operation |
TestElicitationScopeUpgradeBinding |
v0.2 CI |
[override_bindings] in project-local and user-global profile files attaches a profile to listed op_id or variant_id keys; override-bindings-only files are valid when referenced profiles resolve elsewhere; variant_id wins over op_id for the same resolved variant; rejects undefined-profile, unknown-op_id, unknown-variant_id, and structural errors with OVERRIDE_BINDING_INVALID; project-local wins over user-global |
TestOverrideBindings |
v0.1 CI |
gum.describe_op registers #/$defs/DescribeOpResult as its outputSchema, validating responses including the deterministic variants[] truncation form controlled by meta_tools.describe_op.max_variants (default 5; ops with 6+ variants truncate to 5 by default, with variants_total and variants_omitted_count), an override-positive fixture that includes risk_override=true and risk_override_reason, and explicit exclusion of inactive-plugin-only status / reason fields, which appear only on gum://op/{id} and gum://variant/{id} resource responses |
TestDescribeOpOutputSchema |
v0.1 CI |
gum://help/{topic} returns the canonical §7 RESOURCE_NOT_FOUND envelope as a JSON-RPC application error with error.code=-32004 for topics absent from gum://help/topics; active topics return the §7 text/markdown MCP resource-content shape; deprecated topics return the §7 application/json resource-content shape containing only {"status":"deprecated","redirect":"<new-topic>"} |
TestHelpResourceNotFound |
v0.1 CI |
Plugin-local failure codes map deterministically to stable GUM error codes (RATE_LIMIT→RATE_LIMITED, AUTH_EXPIRED→AUTH_REQUIRED, PARSE_FAILURE→SERVICE_DOWN, SERVICE_DOWN→SERVICE_DOWN, INVALID_INPUT→INVALID_ARGS) while preserving retry fields and sanitized source metadata as specified |
TestPluginErrorCodeMapping |
v0.1 CI |
Plugin schema bundles declare one manifest schema_ref whose JSON Schema document contains $defs.request and $defs.response; build/install materializes request_ref=<schema_ref>.request and response_ref=<schema_ref>.response, copies those served schemas by hash, rejects missing defs with PLUGIN_SCHEMA_REF_INVALID, and rejects divergent full-inventory collisions with SCHEMA_REF_COLLISION |
TestPluginSchemaBundleMaterialization / TestPluginSchemaRefCollision |
v0.1 CI |
Plugin manifests listing GUM_-prefixed env vars, exact denylist entries (GOOGLE_APPLICATION_CREDENTIALS, OPENAI_API_KEY, ANTHROPIC_API_KEY), or _GUM* variables in needs_user_creds fail install with PLUGIN_ENV_PROHIBITED; a single curated in-binary denylist source of truth is shared by catalog build, plugin install, and runtime env scrubbing; dispatch scrubs prohibited vars from the subprocess environment regardless of manifest declarations |
TestPluginEnvProhibited / TestPluginEnvExactDenylist |
v0.1 CI |
Plugin manifests with non-empty needs_user_creds must declare one safe credential descriptor per env var; missing, duplicate, or extra descriptors fail with PLUGIN_CREDENTIAL_DESCRIPTOR_INVALID; install with missing required credentials records needs_configuration, skips live canary without quarantine, exposes only descriptor aliases in resources/errors, and a later successful credentialed gum canary --live is required before activation |
TestPluginCredentialDescriptors / TestPluginNeedsConfigurationInstall |
v0.1 CI |
Plugin executable binding: non-dev plugin sources launch only an absolute executable inside the host-managed verified install root; the selected profile's plugins.lock records executable path, executable SHA-256, normalized argv, and install root; runtime spawn re-hashes the executable and quarantines/refuses execution on mismatch; PATH-only, shell-wrapper, and runtime uvx resolution paths fail with PLUGIN_EXECUTABLE_UNTRUSTED; fixtures assert normalized argv for PyPI, GitHub release, Git, and dev-only local sources |
TestPluginExecutableBinding / TestPluginCommandNormalization |
v0.1 CI |
gum_parallel cancellation: cancelling the outer context propagates to in-flight inner calls within 200ms; after scheduling starts, result envelopes contain completed elements plus unfinished elements with error.error_code="CANCELLED" and error.cancelled=true, no success payload fields on cancelled elements, ledger records per-element cancelled: true and marks the outer entry cancelled: true; no goroutine leak |
TestGumParallelCancellation |
v0.1 CI |
gum_parallel 429 per-service-family isolation: a 429 from a workspace worker pauses only workers sharing service_family = "workspace"; workers with a different service_family continue uninterrupted; the pausing family resumes after retry_after_ms (or 60s fallback), staggered 50ms × worker_index; the gain ledger records the outer entry with the correct total element_count |
TestGumParallel429ServiceFamilyIsolation |
v0.1 CI |
grpc-sdk routing_headers invariant: catalog build rejects unknown field paths (GRPC_ROUTING_HEADER_NOT_FOUND), duplicates (GRPC_ROUTING_HEADER_DUPLICATE), and empty-array forms (GRPC_ROUTING_HEADER_NOT_REQUIRED); fixture covers both present and omitted forms |
TestGrpcRoutingHeaderInvariant |
v0.1 CI |
gum://help/topics is generated from docs/help-topics.v1.json; every listed active or deprecated topic resolves; deprecated rows return only a redirect object; no topic handler exists unless it is listed by the manifest; v0.1.0 manifest contains the canonical eight active topics and no deprecated rows |
TestHelpTopicsSeedSet / TestHelpTopicsManifest |
v0.1 CI |
gum_parallel result envelope compression (§9.0.1): when ≥2 inner results share identical ExpressionMeta field values, those fields are hoisted into the outer envelope's shared_expression_fields and omitted from per-result _expression delta objects (#/$defs/ExpressionMetaDelta); the receiver reconstructs effective per-result ExpressionMeta as shared_expression_fields ∪ per_result._expression with per-result values winning on conflict; each reconstructed effective object MUST validate against full ExpressionMeta, while each emitted delta validates against ExpressionMetaDelta; round-trip MUST yield byte-identical unhoisted forms; identity is determined under canonical-JSON byte-equality so null and absent are NOT identical (§9.0.1 rule 1); fixture set MUST include (a) at least one heterogeneous batch (N≥2, no field identical across all results) verifying shared_expression_fields is absent and behavior matches the un-hoisted form, (b) one all-null fixture verifying explicit null hoists with value null, (c) one mixed null/absent fixture verifying no hoisting occurs, (d) one fixture asserting the outer _expression.variant_id is null per §12.3, and (e) one fixture where every inner result carries both intentional_zero_max_items: true and a non-null on_empty_message whose string value is byte-identical across all N results (required for §9.0.1 rule 1 hoisting), verifying that both fields hoist into shared_expression_fields and that the {intentional_zero_max_items: true, on_empty_message != null} invariant (§13 ExpressionMeta) holds on the effective ExpressionMeta of every result after reconstruction |
TestGumParallelResultEnvelopeCompression |
v0.1 CI |
Per-tool inputSchema budget verification (§4.1): each Tier A risk tool's registered inputSchema stays within its declared per-tool budget after parameter additions; refreshed every parameter-adding PR; also verifies that the gum.write tool description string contains the normative irreversibility sentence (§13 gum.write irreversibility notice) |
TestTierAPerToolInputSchemaBudget |
v0.1 CI |
Per-tool token delta gate (spec §2 line 129, bead gum-coo): per-tool cl100k_base description token counts are stored in testdata/tier-a-token-baseline.json; any tool whose measured count exceeds its stored baseline fails the test with TOKEN_DELTA_REGRESSION; a tool absent from the baseline fails with MISSING_BASELINE; an orphan baseline entry (in the file but not in tools/list) fails with ORPHAN_BASELINE. Increases require the token-budget-increase PR label and a paired baseline bump in the same change; decreases pass and emit a RATCHET_OPPORTUNITY Logf hint. |
TestTierAPerToolTokenDelta / TestTierABaselineJSONLoads |
v0.1 CI |
Tier A output schemas cover branch-specific successful results: gum.write high-stakes confirmation-required, gum.destructive confirmation-required, gum.code confirmation-required, normal gum.code, gum.code returning gum_parallel, and code output-limit structured results all validate against the registered branch schemas; schemas using oneOf still have root type: object so go-sdk registration accepts them |
TestTierABranchOutputSchemas |
v0.1 CI |
JSON-RPC batch handling is legacy-compatible only: GUM does not advertise batching and does not rely on it for client-facing behavior; if the pinned transport accepts a legacy batch frame, caps reject oversized input before dispatch (more than 32 requests, more than 1 MiB decoded batch body, or more than 256 KiB decoded params per item) and run no tool handler |
TestMCPBatchCaps |
v0.1 CI |
Audit log hard ceiling cannot be exceeded under cross-process lock contention: when audit.jsonl is at or above 10 GB and audit.unbounded=false, append blocks until emergency rotation succeeds or returns an audit append failure rather than writing past the cap |
TestAuditHardCeilingContention |
v0.1 CI |
Structured logging (§14.1): build-time AST lint rejects log.Printf, fmt.Fprintln(os.Stderr, ...), and third-party logger imports in internal/dispatch, internal/adapters/*, internal/mcp, internal/cli, internal/cache, internal/auth, internal/profiles, internal/sandbox, internal/ratelimit, internal/tee, and internal/output. The scan set MUST include every package in the §14 constructor-convention table (each of which mandates WithLogger injection) plus stateless internal/output (which must not log at all). |
TestStdLogProhibition |
v0.1 CI |
Packages listed in §14 expose WithLogger(*slog.Logger), default to slog.Default(), and emit nothing when passed slog.New(slog.DiscardHandler) |
TestLoggerInjectionContract |
v0.1 CI |
v0.1.0 release binaries import neither go.opentelemetry.io/* nor net/http/pprof (§14.1.5–6). TestNoOTelImportV01 verifies via the import graph (go list -deps ./cmd/gum/... MUST produce no go.opentelemetry.io/* matches); the module-graph scan (go list -m all) is run as an advisory warning, not a CI failure, so transitive SDK module presence does not produce a false positive. TestNoPprofImportV01 uses the same import-graph methodology for net/http/pprof. |
TestNoOTelImportV01, TestNoPprofImportV01 |
Release gate |
Audit-log entries always emit schema version field v as the first key; absent v is parsed as v:0 |
TestAuditLogSchemaVersion |
v0.1 CI |
internal/output statelessness (§14): AST scan confirms no exported constructors (no func New* / func (T) New*), no package-level mutable var declarations, and no exported types with unexported fields that could hold mutable state; a future PR adding a stateful encoder cache to internal/output fails CI before merge |
TestOutputStatelessness |
v0.1 CI |
intentional_zero_max_items invariant (§13 ExpressionMeta, §9.1): when the runtime emits _expression.intentional_zero_max_items: true, it MUST also emit a non-null _expression.on_empty_message; the combination {intentional_zero_max_items: true, on_empty_message: null} is never emitted by a v0.1.0 dispatch path; the test runs the expression pipeline on a fixture profile with collapse_arrays.max_items=0 + on_empty="..." and asserts the emitted envelope satisfies the invariant. The invariant MUST also be asserted on the effective per-result ExpressionMeta after gum_parallel envelope compression and reconstruction (covered jointly by TestGumParallelResultEnvelopeCompression fixture (e)) so that hoisting cannot smuggle in a violating combination via shared_expression_fields ∪ per_result._expression |
TestIntentionalZeroMaxItemsInvariant |
v0.1 CI |
gum://catalog resources/list entry carries size annotation equal to len(catalogBin) (non-zero, runtime-computed from the embedded artifact) AND "x-gum-do-not-auto-inject": true annotation (§13). TestStaticResourceRegistration is extended with both assertions |
TestStaticResourceRegistration (extended) |
v0.1 CI |
gum://status/canaries returns status="stale" for every installed plugin canary on server startup before any passive cron run completes; the resource covers plugin canaries only and never first-party Google API health (§13) |
TestCanaryStaleOnStartup |
v0.1 CI |
gum://plugins MCP view omits rows whose status is installed_pending_restart; the same plugin appears in gum plugin list --format=json CLI output for the same profile (§13 R28B-M-2 filter) |
TestPluginsResourceFiltersPendingRestart |
v0.1 CI |
gum://status/health returns rows for the closed v0.1 subsystem enum [audit_log, cache_sqlite, tee_filesystem, keychain, gain_ledger, canary_runner]; values status∈{healthy, degraded, unavailable}; health probes are local-only and emit no network calls; per-subsystem sample TTL bounds resource-read cost (§13) |
TestStatusHealthSubsystemEnum / TestStatusHealthNoNetwork |
v0.1 CI |
Stdio transport uses newline-delimited JSON-RPC framing only; the silent-stdout startup invariant holds: between process start and notifications/initialized, no non-JSON bytes appear on stdout (§13.1) |
TestStdioFramingClean |
v0.1 CI |
Unsolicited logging/setLevel (sent by a permissive client despite logging being unadvertised) returns a successful empty-result response and does not log to stdout; the requested level is recorded in an in-memory per-session field for forward compatibility (§13.1) |
TestLoggingSetLevelTolerant |
v0.1 CI |
MCP completion/complete returns within 100 ms (P95) and 250 ms (P99) for a worst-case Tier A argument (variant_id completion for a 50-variant op) and for every completable resource-template argument with a 100-plugin inventory; failing the P95 budget on linux/amd64 CI hardware is a release-blocker (§13.1) |
TestMCPCompletionLatency |
Release gate |
BM25 index (gen/index/bm25.bin), dense embedding index (gen/index/embeddings.bin), and the active session catalog snapshot are built from the same catalog.json in a single go generate run; every snapshot op_id appears in bm25.bin; every snapshot op_id with embedding-enabled variants appears in embeddings.bin; every embedding entry corresponds to a BM25 entry; no completion-eligible op_id is absent from any required index (§5.3 single-source invariant) |
TestCatalogIndexSnapshotInvariant |
v0.1 CI |
internal/dispatch is the leaf of the internal import graph: no package in internal/dispatch's transitive import set re-imports internal/dispatch; internal/usage, internal/catalog, internal/profiles, internal/auth, internal/cache, internal/ratelimit, internal/retry, internal/sanitize, internal/output, internal/tee, and internal/pluginenv do not import internal/dispatch; cmd/gen-catalog imports no internal package other than internal/catalog schema types (§14 import-graph contract) |
TestNoCyclicImports |
v0.1 CI |
Shipped linux/amd64 binary (including embedded gen/catalog.bin, gen/index/embeddings.bin, gen/index/bm25.bin) does not exceed 120 MB at release tagging (§15 binary-size cap) |
TestBinarySize (in internal/securityscan, cap = MaxBinarySizeBytes) + binary-size job in .github/workflows/build-matrix.yml |
Release gate (every PR) |
cmd/gum and its transitive dependency closure contain no CGo imports (§15 CGO_ENABLED=0 scoping); native keychain code lives behind the internal/auth keychain-backend abstraction |
TestReleaseBinaryNoCGo (in internal/securityscan) + CI build-matrix workflow with CGO_ENABLED=0 across linux/amd64, linux/arm64, darwin/amd64, darwin/arm64 |
Release gate + PR gate |
Fixtures under testdata/ contain no real Google API keys (AIza...), bearer tokens, OAuth refresh tokens (1//0...), AWS access keys (AKIA...), PEM private keys, or non-example.com email addresses; allowlists cover RFC 2606 reserved domains (*.example.{com,org,net}, *.test.local, localhost) and synthetic Google Calendar iCalUIDs (evNNN@google.com) |
TestFixturesNoSecrets (in internal/securityscan) |
v0.1 CI |
Release binary contains no shared-library linkage (linux ldd: "not a dynamic executable" or "statically linked"; darwin otool -L: only /usr/lib/* and /System/* entries); built with CGO_ENABLED=0 -trimpath -ldflags='-s -w' |
TestNoSharedLibDependencies (in internal/securityscan) |
Release gate |
Build-time -X main.version=<tag> propagates to gum --version and gum version output; release tags drive the stamped value via goreleaser ldflags |
TestVersionStamp (in internal/securityscan) |
Release gate |
Opt-in update notifier (gum-afcv.5): notify.enabled=true per-profile gates an async GitHub releases-API check on gum version; check honours a 2s timeout; results cache 24h at <XDG_CACHE_HOME>/gum/<profile>/notify.json; the check NEVER blocks version output and warnings emit to stderr only (not stdout — pipelines stay clean); dev builds and unparseable semvers are no-ops; fetch errors are silently swallowed |
internal/notify package tests + TestVersionNotifierDisabledByDefault / TestVersionNotifierOptInPrintsToStderrNotStdout in cmd/gum |
v0.1 CI |
Homebrew tap install path: ehmo/homebrew-tap carries Formula/gum.rb; users install with brew tap ehmo/tap https://github.com/ehmo/homebrew-tap then brew install ehmo/tap/gum. The formula name must stay qualified because Homebrew core already has Charmbracelet's unrelated gum. Binary casks stay disabled until macOS signing/notarization is active; unsigned cask binaries can be quarantined or blocked on first launch. |
ruby -c Formula/gum.rb in the tap + remote Homebrew install smoke where host policy and disk allow it |
Release gate |
Release workflow rejects tags that do not match ^v[0-9]+\.[0-9]+\.[0-9]+$ (no pre-release suffix on the stable channel) via the validate-tag job before any goreleaser step runs |
validate-tag job in .github/workflows/release.yml |
Release gate |
Two sequential builds of ./cmd/gum from the same source tree with CGO_ENABLED=0 -trimpath -ldflags='-s -w -X main.version=<stamp>' produce byte-identical SHA-256 outputs |
TestReproducibleBuild (in internal/securityscan) + reproducible job in .github/workflows/build-matrix.yml |
Release gate |
Release pipeline runs go test -race ./... on Go 1.25.x and 1.26.x as a release gate; runs the three fuzz targets internal/output.FuzzToonParser, internal/cache.FuzzJCSCanonical, and internal/plugins.FuzzPluginManifest with -fuzztime=60s per CI run; govulncheck ./... runs advisory in v0.1.0 and blocking in v0.2.0 (§15) |
TestRaceModeReleaseGate / FuzzToonParser / FuzzJCSCanonical / FuzzPluginManifest / TestGovulncheckPipeline |
Release gate (race + fuzz from v0.1; govulncheck-blocking from v0.2) |
SLSA Level 1 provenance attestations are generated by slsa-framework/slsa-github-generator/.github/workflows/generator_generic_slsa3.yml@v2.0.0 against the SHA-256 hashes of all goreleaser-produced *.tar.gz artifacts; the .intoto.jsonl is uploaded to the GitHub release; a verify-provenance job downloads the just-published artifacts + provenance and runs slsa-verifier verify-artifact --source-tag <tag> against every artifact before the workflow exits (§15 release pipeline) |
provenance + verify-provenance jobs in .github/workflows/release.yml; slsa-framework/slsa-verifier/actions/installer@v2.6.0 |
Release gate |
Release fixture set composition gate: internal/bench/fixtures/release/ contains at least 200 fixture calls with composition (50% Workspace read, 20% gum_parallel 2-4 batches, 15% non-Workspace read, 15% write/destructive across ≥3 services), validated at release within ±5% tolerance per category (§12.3) |
TestGainFixtureCompositionGate |
Release gate |
gum gain --fixture-replay runs the release fixture set against the TOON-default catalog profiles, computes end-to-end savings against the fixture-backed baseline (baseline_method = "fixture_replay"), and emits the same GainResult schema as live ledger reads; failure modes (missing fixture set, schema drift, baseline mismatch) return stable error envelopes (§12.3 R28B-M-3) |
TestGainFixtureReplay |
Release gate |
gum gain --fixture-replay second pass with output.default_format=json set on the profile config replays the same release fixture set, computes JSON-default end-to-end savings against the fixture-backed baseline, and is the release-gate proof artifact required by §2.1 "Savings claim measurement". The two passes share fixture data; only the default-format setting differs. Failure (savings drift beyond the documented JSON-default band, fixture-replay error envelope) blocks the release tag. |
TestGainFixtureReplayJSONDefault |
Release gate |
Managed-scope manifest (apps/gum/internal/embedded/data/auth-managed-scopes.v1.json) validates against apps/gum/internal/embedded/data/auth-managed-scopes.v1.schema.json (JSON Schema 2020-12); manifest invalidity fails the build before cmd/gen-catalog evaluates gum_oauth variants; for every scope with status="active", the schema enforces verification_state="verified", project_evidence_state="ready", live_canary_state="passing", and non-empty evidence (§7) |
TestManagedScopeManifestSchema |
v0.1 CI |
cmd/gen-catalog/overrides.toml validates against cmd/gen-catalog/overrides.schema.json (JSON Schema 2020-12 applied to TOML decoded as JSON) before each cmd/gen-catalog run; validation failure is OVERRIDES_SCHEMA_INVALID (§5.2) |
TestOverridesManifestSchema |
v0.1 CI |
Cobra-forbidden patterns: AST lint rejects cobra.OnInitialize usage anywhere in internal/cli/; required initialization order is constructor-driven not init-hook-driven (§12.2) |
TestForbiddenPatterns |
v0.1 CI |
Atomic settings.json patch: gum init acquires plugins.install.lock before writing settings.json; the lock is the canonical mutex; concurrent gum init runs serialize correctly; lock-held state surfaces a structured error rather than racing the patch (§12.2 R28C-M-12) |
TestSettingsAtomicPatch |
v0.1 CI |
Profile config schema versioning: profile config files declare config_schema_version = 1; unsupported future versions fail with CONFIG_SCHEMA_UNSUPPORTED; unknown keys in the current version fail with UNKNOWN_CONFIG_KEY (§12.2) |
TestProfileConfigSchemaVersion |
v0.1 CI |
Gain-ledger retention mirrors §11 audit retention: per-profile gain-ledger.jsonl rotates on the same configurable schedule; rotation does not lose entries; rotated segments are read by gum gain and gum.gain (§12.3 R28C-OQ-3) |
TestGainLedgerRetention |
v0.1 CI |
interface_kind closed-enum membership is enforced at catalog build: unknown non-x-* values fail with UNKNOWN_INTERFACE_KIND; promotion from x-<name> to <name> requires the multi-step procedure in docs/catalog-abi.md (§Interface Kind extension procedure); TestInterfaceKindClosedEnum verifies both closed-enum rejection of unknown values and a representative experimental-to-stable promotion fixture |
TestInterfaceKindClosedEnum |
v0.1 CI |
binding_schema_version is an integer; non-integer values (decimal point, string suffix, semver-triple) fail with BINDING_SCHEMA_UNSUPPORTED at load (docs/catalog-abi.md binding-version patch prohibition) |
TestBindingSchemaVersionInteger |
v0.1 CI |
prompts/get argument-rejection transport returns JSON-RPC error.code = -32602 (InvalidParams) with error.data.error_code = "INVALID_ARGS" for argument maps supplied to zero-argument v0.1 prompts (§7) |
TestPromptsGetInvalidArgs |
v0.1 CI |
gum.code with a reserved language value (starlark, yaegi, js, python) is rejected at the JSON Schema validation layer with JSON-RPC error.code = -32602; no GUM stable error code wraps the rejection; no script execution occurs (§4.3) |
TestCodeReservedLanguageRejection |
v0.1 CI |
MCP roots/list change handling: on receipt of notifications/roots/list_changed, GUM invalidates the session-cached roots/list and re-calls roots/list before the next request requiring project-local profile resolution; in-flight requests retain the pre-invalidation root set (§13.2 R28G-MF-3) |
TestRootsListChangedHandling |
v0.1 CI |
gum://results/{hash} lifecycle polling pattern: clients copy artifacts before artifact_expires_at; GUM emits no notifications/resources/updated for results in v0.1.0; reads after expiry return the stable RESULT_ARTIFACT_EXPIRED envelope (§13 R28B-S1-7 reframe) |
TestResultsArtifactExpiryPolling |
v0.1 CI |
ParallelResultItem schema (§13 $defs/ParallelResultItem) is the validating schema for ParallelResults.results[] items; items are NOT validated against ToonResult; each item requires _idx and _expression (an ExpressionMetaDelta); optional sibling _code_output_truncated: boolean lives on the item, not inside _expression; outer ParallelResults._code_output_truncated is set when any element carries the field set to true (R28B-min-5) |
TestParallelResultItemSchema |
v0.1 CI |
Goroutine-leak verification: TestPollTimeoutAndCancellation and TestGumParallelResultEnvelopeCompression end with defer goleak.VerifyNone(t); the pinned uber-go/goleak floor is v1.3.0+ (Appendix A) |
(extended assertions on existing tests) |
v0.1 CI |
testscript-based CLI contract tests for gum call, gum code, and gum auth flows: testdata/script/*.txtar files drive deterministic stdin/stdout/stderr assertions; the pinned rogpeppe/go-internal floor is v0.13.0+ (Appendix A) |
TestCLIContractTestscript |
v0.1 CI |
Filesystem fsync fallback: on filesystems without fsync semantics, internal/tee falls back to best-effort durability and emits a structured warning {"level":"warn","event":"fsync_not_supported", ...}; no silent loss of artifact data (§8.7 R28A-Minor-5) |
TestTeeFsyncFallback |
v0.1 CI |
TOON forward-compatibility: plugins emitting TOON with an unsupported version stamp fail with TOON_VERSION_UNSUPPORTED structured error rather than silent best-effort decoding (§9.0) |
TestToonVersionUnsupported |
v0.1 CI |
tee.secret lifecycle embedding-model-independence: rotating the embedding model identity does not invalidate or rewrite tee.secret; the secret is keyed only to profile identity (§9.0 R28C-OQ-6) |
TestTeeSecretEmbeddingIndependence |
v0.1 CI |
Aggregate batch output ceiling for gum_parallel from inside gum.code: element-level UTF-8-boundary truncation sets _code_output_truncated: true on the affected element AND on the outer ParallelResults envelope; the cumulative pre-dispatch ceiling at 32,768 bytes (§9.0.1) is enforced; truncation never corrupts UTF-8 multi-byte sequences |
TestGumParallelCodeOutputCeiling |
v0.1 CI |
Meta-tool admin tuning: meta_tools.search_apis.truncate_strings.default_chars and meta_tools.search_apis.collapse_arrays.max_items are configurable from the admin tuning surface (§9.4); defaults and overrides are validated at config load |
TestMetaToolAdminTuning |
v0.1 CI |
Audit log graceful shutdown: SIGTERM triggers a 2-second drain (configurable via audit.drain_timeout_seconds, 0-30, default 2) of pending audit entries; audit.broken is NOT set on clean exit; structured audit_drain_complete event is emitted (§11) |
TestAuditGracefulShutdown |
v0.1 CI |
Keychain unavailable policy: when neither native keychain nor encrypted-file vault is available (e.g., read-only filesystem), credential reads fail with AUTH_KEYCHAIN_UNAVAILABLE and user_message suggests gum auth use-adc; the error never silently falls back to an in-memory cache (§7) |
TestAuthKeychainUnavailable |
v0.1 CI |
HELP_TOPIC_TOO_LARGE: any internal/help/<topic>.md whose rendered body exceeds 8 KiB fails the build with HELP_TOPIC_TOO_LARGE (§7 error code table; §13 gum://help/{topic} cap) |
TestHelpTopicSizeCap |
v0.1 CI |
operation_name is the canonical name for the gum.poll LRO handle argument; its description is reachable through gum.describe_op("gum.poll") and matches the v0.1 contract |
(extended assertion on TestTierAPerToolInputSchemaBudget) |
v0.1 CI |
Compound auth token-forwarding allowlist: when an op's resolved variant declares auth_strategy="compound", the dispatch layer MAY allowlist google_access_token in the plugin subprocess environment (§7 R28A-M-5); other strategies never forward access tokens to plugin subprocesses; the env denylist (§8.1) is the single source of truth |
TestCompoundAuthTokenForwarding |
v0.1 CI |
LRO_UNSUPPORTED_IN_CODE: ops whose default variant is classified lro=true raise {"error_code": "LRO_UNSUPPORTED_IN_CODE", ...} before dispatch when called from inside gum.code (§6.1) |
TestLROUnsupportedInCode |
v0.1 CI |
gum_print UTF-8 truncation: when the per-call byte limit is reached, the output is truncated at the last valid UTF-8 boundary before the limit; partial multi-byte sequences are excluded; the truncated suffix never contains a half-encoded character (§6.1) |
TestGumPrintUtf8Boundary |
v0.1 CI |
Registry remains structurally mutable post-Server.Run (§4.2 forward-compat invariant): Server.AddTool is callable at the object level on the v0.1.0 internal/mcp server after Run returns. The v0.1.0 behavioral policy that forbids mid-session registration is enforced by a session-scoped gate in internal/dispatch, NOT by structural prohibition; this enables the v0.2.0 tools/list_changed enablement to land without re-architecting internal/mcp |
TestMCPRegistryStructurallyMutable |
v0.1 CI |
confirmation_purpose="high_stakes_write" reserved-value rejection (§6.1.2): a confirmation_token whose embedded purpose is "high_stakes_write" (the v0.3.0 reserved value, NOT emitted by a v0.1.0 dispatcher) presented to a v0.1.0 dispatcher returns CONFIRMATION_TOKEN_INVALID with data.reason="unknown_purpose"; the closed-enum check happens before HMAC verification so that a forged token does NOT reveal whether the HMAC would have matched |
TestConfirmationTokenUnknownPurpose |
v0.1 CI |
tee_mode="failures" step-6 vs step-7 boundary (§9.1, docs/expression-profile-dsl.md tee_mode): a RATE_LIMITED error raised by the in-process token bucket at step 6 of the dispatch lifecycle (§3.1) does NOT write a tee artifact (failure occurred before the executor); a RATE_LIMITED error raised by an upstream HTTP 429 at step 7 DOES write a tee artifact (failure occurred during the executor). The test fixture pair distinguishes the two paths by stubbing the token bucket vs. the HTTP transport |
TestTeeFailuresStep6VsStep7 |
v0.1 CI |
| cmd/gum release-binary CGo-freeness across the full release matrix (§15 release build flags): go list -deps ./cmd/gum/... returns an empty intersection with the set of CGo-flagged packages for each GOOS/GOARCH pair in (linux/amd64, linux/arm64, darwin/amd64, darwin/arm64); the assertion is broader than TestNoCGoInDispatch (which only covers internal/dispatch) and catches keychain-backend regressions before tagging | TestReleaseBinaryNoCGo | Release gate |
| internal/pluginenv/denylist.txt single-source invariant (§14 pluginenv row): the SHA-256 of the go:embed-ed denylist material in internal/pluginenv and the SHA-256 of the go:embed-ed material in cmd/gen-catalog (which embeds the same file via a relative path or build-time copy) MUST be identical; the test fails if either embedding site drifts | TestPluginEnvDenylistSingleSource | v0.1 CI |
| auth_strategy enum extension completeness (§7 extension procedure): the gate enforces the (a)/(b)/(c) check against the closed enum in internal/auth/strategy.go minus the v0.1.0 baseline set (gum_oauth, byo_oauth, adc, service_account, api_key, compound, plugin_managed, none) which is already covered by pre-existing tests (TestAuthStrategyRequired, TestCompoundAuthErrorEnvelope, etc.). For every residual value: (a) it appears in docs/catalog-abi.md's auth_strategy cross-reference, (b) it has an internal/auth/strategy_<name>.go file, and (c) this matrix contains a row whose test name is exactly TestAuthStrategy<Name> (CamelCase). The baseline set is enumerated in the test source so removing a value from it re-arms the residual check | TestAuthStrategyEnumExtensionComplete | v0.1 CI |
| Per-package line-coverage retention gate over the full tracked surface: ./cmd/gum/... and ./internal/... (internal/coverage.GatedPackages), excluding build-time tooling (cmd/gen-*, cmd/measure-tier-a, cmd/test-matrix, cmd/coverage-floor) and the generated, gitignored gen/dispatch tree (beads gum-b22o.5, gum-5wkg, gum-8ilq, gum-ql6c). internal/coverage.FloorPercent (85%) is the absolute minimum for any un-listed package; internal/coverage.Ratchets pins every tracked package at a retention baseline of floor(current − ~1% jitter headroom) (a baseline Min may sit above or below 85%, recording the level actually held). make coverage-floor (CI job coverage-floor, single Go toolchain) measures per-package coverage via go test -coverprofile, parses the raw profile, and fails when any package falls below its effective threshold (ratchet Min, else 85%); it also prints a non-failing RATCHET_OPPORTUNITY hint when a package exceeds its baseline by RatchetOpportunityMargin (2%). Lowering a ratchet Min requires a matching test-matrix note plus an owning Bead; gum-ql6c is the explicit v0.1 release-candidate recalibration after the gum.code/trust-boundary hardening and catalog-depth work shifted package denominators. Undocumented lowering or removing a Bead reference is itself a regression and breaks TestRatchetEntriesHaveBeadReferences | TestFloorIs85 / TestRatchetEntriesHaveBeadReferences / TestRatchetEntriesAreUnique / TestGatedPackages / TestCheckRespectsRatchet / TestOpportunitiesFlagsImprovedPackages / TestParseProfileAggregatesPerPackage | v0.1 CI |
Release claims such as ">=80% savings" may cite only fixture-backed local gain-ledger entries. Privacy-minimized telemetry is product telemetry, not release-gating evidence.
