PostgreSQL Claim Authority Initiative

Goal

Build a PostgreSQL-backed claim authority for DevNexus so multiple coordinator hosts can safely race for eligible work and still produce exactly one verified claim winner per work item.

This initiative keeps work trackers as the human-visible work record. PostgreSQL owns claim authority, fencing, heartbeat, expiry, release, and audit facts. GitHub, GitLab, Jira, or local trackers continue to hold titles, descriptions, labels, comments, and workflow status.

Done means:

  • Coordinators can call the existing claim-next surfaces and receive a claim backed by PostgreSQL when configured.
  • Claim acquisition has one-winner semantics inside a configured project, component, tracker, or repository scope.
  • Each claim receives a lease token and monotonic fencing token.
  • Workers can verify, heartbeat, release, and reclaim claims through the same provider-neutral API.
  • Provider-visible status updates remain best-effort mirrors and cannot create duplicate workers when PostgreSQL authority is enabled.
  • Core tests prove the contract without requiring a live database.
  • Optional live PostgreSQL smoke tests are gated behind explicit runner policy.

Integration Surface

  • Component: dev-nexus.
  • Initiative branch: codex/dev-nexus/postgres-claim-authority.
  • Delivery topology: initiative integration branch. Slice branches should target this branch; final publication to main happens only after coherent review and verification.
  • Tracker anchor: existing RFC issue Evref-BL/DevNexus#187 until a dedicated implementation issue or epic is approved.

Current Architecture

Current claim acquisition lives in src/work-items/nexusWorkItemClaim.ts. It:

  1. Lists eligible work through configured work-tracker providers.
  2. Re-reads the candidate from the tracker.
  3. Writes an HTML claim block into the work-item description.
  4. Sets the work-item status to in_progress.
  5. Re-reads the tracker item and verifies the lease token.
  6. Adds a provider comment when supported.

That path is useful for visibility, but it relies on provider updates that are not compare-and-swap, so a concurrent coordinator can overwrite a claim block after another coordinator has already verified its own. The PostgreSQL path should add a claim-authority boundary below claimNexusEligibleWorkItem, not replace work tracking.

Relevant files:

  • src/work-items/nexusWorkItemClaim.ts
  • test/work-items/nexusWorkItemClaim.test.ts
  • src/automation/nexusAutomationAgentLaunch.ts
  • src/automation/nexusAutomationCoordinatorLoop.ts
  • src/mcp/nexusMcpServer.ts
  • src/cli.ts
  • src/automation/nexusAutomationConfig.ts
  • src/project/nexusProjectConfig.ts
  • src/work-items/workTrackingTypes.ts

Target Design

Introduce a provider-neutral claim authority interface separate from WorkTrackerProvider.

Initial operations:

  • claimNext: choose and claim one eligible candidate for an owner.
  • verifyClaim: confirm a lease token and fencing token are still current.
  • heartbeatClaim: extend or refresh a claim owned by the same token.
  • releaseClaim: voluntarily release a claim.
  • reclaimExpired: acquire a claim whose lease expired, according to policy.
  • inspectClaims: list active and stale claims for reporting.

Backends:

  • optimistic_tracker: existing tracker-description claim behavior.
  • postgres: strong multi-host claim authority.
  • memory: test-only implementation used by contract tests.
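As a rough illustration of how the six operations and the memory backend could fit together, here is a minimal sketch. Only the operation names come from this document; the Owner and Claim shapes, the MemoryClaimAuthority class name, and the synchronous signatures are assumptions made for brevity (a database-backed implementation would presumably return promises).

```typescript
// Illustrative sketch: shapes and names beyond the six operation names are assumptions.
type Owner = { hostId: string; agentId?: string };

interface Claim {
  workItemKey: string;
  leaseToken: string;
  fencingToken: number;
  owner: Owner;
  state: "active" | "released";
  expiresAt: number; // epoch milliseconds
}

interface ClaimAuthority {
  claimNext(candidates: string[], owner: Owner, now: number): Claim | undefined;
  verifyClaim(key: string, leaseToken: string, fencingToken: number, now: number): boolean;
  heartbeatClaim(key: string, leaseToken: string, now: number): boolean;
  releaseClaim(key: string, leaseToken: string): boolean;
  reclaimExpired(key: string, owner: Owner, now: number): Claim | undefined;
  inspectClaims(now: number): { active: Claim[]; stale: Claim[] };
}

class MemoryClaimAuthority implements ClaimAuthority {
  private claims = new Map<string, Claim>();
  private fencingCounter = 0;
  private leaseDurationMs: number;

  constructor(leaseDurationMs: number) {
    this.leaseDurationMs = leaseDurationMs;
  }

  // Every grant draws a fresh lease token and the next fencing token.
  private grant(key: string, owner: Owner, now: number): Claim {
    const fencingToken = ++this.fencingCounter;
    const claim: Claim = {
      workItemKey: key,
      leaseToken: `lease-${fencingToken}`, // a real backend would use a random token
      fencingToken,
      owner,
      state: "active",
      expiresAt: now + this.leaseDurationMs,
    };
    this.claims.set(key, claim);
    return claim;
  }

  claimNext(candidates: string[], owner: Owner, now: number): Claim | undefined {
    // The first candidate, in the given order, with no unreleased claim wins.
    for (const key of candidates) {
      const existing = this.claims.get(key);
      if (!existing || existing.state === "released") return this.grant(key, owner, now);
    }
    return undefined;
  }

  verifyClaim(key: string, leaseToken: string, fencingToken: number, now: number): boolean {
    const c = this.claims.get(key);
    return c !== undefined && c.state === "active" && c.leaseToken === leaseToken
      && c.fencingToken === fencingToken && c.expiresAt > now;
  }

  heartbeatClaim(key: string, leaseToken: string, now: number): boolean {
    const c = this.claims.get(key);
    if (!c || !this.verifyClaim(key, leaseToken, c.fencingToken, now)) return false;
    c.expiresAt = now + this.leaseDurationMs; // extend the lease from "now"
    return true;
  }

  releaseClaim(key: string, leaseToken: string): boolean {
    const c = this.claims.get(key);
    if (!c || c.state !== "active" || c.leaseToken !== leaseToken) return false;
    c.state = "released";
    return true;
  }

  reclaimExpired(key: string, owner: Owner, now: number): Claim | undefined {
    const c = this.claims.get(key);
    if (!c || c.state !== "active" || c.expiresAt > now) return undefined;
    return this.grant(key, owner, now); // new owner, higher fencing token
  }

  inspectClaims(now: number): { active: Claim[]; stale: Claim[] } {
    const held = Array.from(this.claims.values()).filter((c) => c.state === "active");
    return {
      active: held.filter((c) => c.expiresAt > now),
      stale: held.filter((c) => c.expiresAt <= now),
    };
  }
}
```

In this sketch an expired claim is never taken implicitly by claimNext; it changes hands only through reclaimExpired, which keeps the reclaim policy in one place.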

PostgreSQL should use a short transaction for the picker critical section:

  1. Acquire a scoped picker mutex, probably a PostgreSQL transaction-scoped advisory lock keyed by project/component/tracker scope.
  2. Evaluate candidates in deterministic order.
  3. Insert or update one claim row only when there is no active claim, or when reclaim policy allows an expired claim.
  4. Assign a monotonic fencing token from the database.
  5. Commit quickly.
  6. Mirror status/comment data to the work tracker after the authoritative claim exists.
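The critical section above could look roughly like the following sketch, written against an injected client whose query method mirrors the node-postgres `query(text, values)` shape. The table name work_item_claims, the sequence work_item_claim_fencing_seq, the column names, the FNV-1a lock-key hash, and the reclaim condition in the WHERE clause are all assumptions for illustration, not the DevNexus schema. Unqualified object names assume a suitable search_path.

```typescript
// Sketch only: table, sequence, and column names are hypothetical.
interface SqlClient {
  query(text: string, params?: unknown[]): Promise<{ rows: Record<string, unknown>[] }>;
}

// 32-bit FNV-1a hash so a scope string maps to a stable advisory-lock key.
function scopeLockKey(scope: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < scope.length; i++) {
    hash ^= scope.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Upsert that only takes the row when it is not actively held (assumed reclaim policy).
const CLAIM_SQL = `
  INSERT INTO work_item_claims
    (claim_scope, work_item_key, lease_token, fencing_token, owner, state, claimed_at, expires_at)
  VALUES
    ($1, $2, $3, nextval('work_item_claim_fencing_seq'), $4::jsonb, 'active', now(),
     now() + $5::double precision * interval '1 millisecond')
  ON CONFLICT (claim_scope, work_item_key) DO UPDATE
    SET lease_token = EXCLUDED.lease_token,
        fencing_token = EXCLUDED.fencing_token,
        owner = EXCLUDED.owner,
        state = 'active',
        claimed_at = EXCLUDED.claimed_at,
        expires_at = EXCLUDED.expires_at
    WHERE work_item_claims.state <> 'active' OR work_item_claims.expires_at <= now()
  RETURNING fencing_token`;

interface ClaimRequest {
  scope: string;
  candidates: string[];
  owner: { hostId: string };
  leaseToken: string;
  leaseDurationMs: number;
}

async function claimNextInTransaction(
  client: SqlClient,
  request: ClaimRequest,
): Promise<{ workItemKey: string; fencingToken: string } | undefined> {
  await client.query("BEGIN");
  try {
    // Step 1: transaction-scoped lock, released automatically at COMMIT or ROLLBACK.
    await client.query("SELECT pg_advisory_xact_lock($1::bigint)", [scopeLockKey(request.scope)]);
    // Step 2: deterministic order (lexicographic here, an arbitrary choice).
    for (const workItemKey of request.candidates.slice().sort()) {
      // Steps 3 and 4: conditional upsert; the sequence supplies the fencing token.
      const result = await client.query(CLAIM_SQL, [
        request.scope,
        workItemKey,
        request.leaseToken,
        JSON.stringify(request.owner),
        request.leaseDurationMs,
      ]);
      const row = result.rows[0];
      if (row !== undefined) {
        await client.query("COMMIT"); // Step 5
        // Kept as a string because node-postgres returns bigint columns as strings by default.
        return { workItemKey, fencingToken: String(row["fencing_token"]) };
      }
    }
    await client.query("COMMIT");
    return undefined; // every candidate is actively claimed
  } catch (error) {
    await client.query("ROLLBACK");
    throw error;
  }
}
```

Step 6, the tracker mirror, is deliberately left outside this function so that slow provider calls never hold the advisory lock. A sequence yields increasing but not gapless values, which is enough for fencing comparisons.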

Provider mirror failure must not erase the database claim. It should return a claimed result with a provider-sync warning, or release the claim only when the configured policy explicitly says failed provider mirroring makes the claim unusable.
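One way to express that rule is a small pure function that decides the outcome once the mirror attempt has finished. The policy names, the MirrorAttempt shape, and the function name below are hypothetical; the default branch reflects the claimed-with-warning behavior described above, which is still listed as an open decision.

```typescript
// Sketch only: policy and type names are hypothetical.
type MirrorFailurePolicy = "claimed_with_warning" | "release_claim";

interface MirrorAttempt {
  ok: boolean;
  detail?: string; // provider error text when ok is false
}

type ClaimOutcome =
  | { status: "claimed"; fencingToken: number; warnings: string[] }
  | { status: "released"; reason: string };

function resolveMirrorOutcome(
  fencingToken: number,
  mirror: MirrorAttempt,
  policy: MirrorFailurePolicy,
  release: () => void,
): ClaimOutcome {
  if (mirror.ok) return { status: "claimed", fencingToken, warnings: [] };
  const detail = mirror.detail ?? "unknown provider error";
  if (policy === "release_claim") {
    // Only an explicit policy may give the database claim back.
    release();
    return { status: "released", reason: `provider mirror failed: ${detail}` };
  }
  // Otherwise the authoritative claim stands and the failure is surfaced as a warning.
  return { status: "claimed", fencingToken, warnings: [`provider-sync: ${detail}`] };
}
```

Passing the release action in as a callback keeps the decision testable without a database or a tracker.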

Data Model

Minimum logical schema:

  • claim_scope: project id, component id, tracker id, provider, repository or equivalent provider scope.
  • work_item_key: provider item id plus canonical external reference fields.
  • lease_token: random unique token generated by DevNexus.
  • fencing_token: monotonic database-issued token.
  • owner: host id, optional agent id, optional owner id, optional execution id.
  • state: active, released, expired, reclaimed, or abandoned.
  • claimed_at, expires_at, last_heartbeat_at, released_at.
  • provider_mirror: last status/comment/update attempt and warning details.
  • audit: created and updated timestamps, previous fencing token on reclaim.

Schema creation should be explicit and policy-gated. Do not silently create or migrate a team database from routine claim-next calls.
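To make the logical schema more concrete, the sketch below builds DDL text without executing it, which fits the rule that schema mutation stays explicit. The object names, column types, and the choice to flatten claim_scope and work_item_key into text keys are assumptions, not the DDL that DevNexus exports.

```typescript
// Sketch only: object names and column types are assumptions.
function buildClaimSchemaSql(schema: string): string {
  // Accept only plain lowercase identifiers so the name can be embedded safely.
  if (!/^[a-z_][a-z0-9_]*$/.test(schema)) {
    throw new Error(`invalid schema name: ${schema}`);
  }
  return `
CREATE SCHEMA IF NOT EXISTS ${schema};
CREATE SEQUENCE IF NOT EXISTS ${schema}.work_item_claim_fencing_seq;
CREATE TABLE IF NOT EXISTS ${schema}.work_item_claims (
  claim_scope            text        NOT NULL, -- canonical project/component/tracker/repository key
  work_item_key          text        NOT NULL, -- provider item id plus external reference
  lease_token            text        NOT NULL UNIQUE,
  fencing_token          bigint      NOT NULL,
  owner                  jsonb       NOT NULL, -- host id and optional agent/owner/execution ids
  state                  text        NOT NULL
    CHECK (state IN ('active', 'released', 'expired', 'reclaimed', 'abandoned')),
  claimed_at             timestamptz NOT NULL,
  expires_at             timestamptz NOT NULL,
  last_heartbeat_at      timestamptz,
  released_at            timestamptz,
  provider_mirror        jsonb,                -- last mirror attempt and warning details
  previous_fencing_token bigint,               -- set on reclaim for audit
  created_at             timestamptz NOT NULL DEFAULT now(),
  updated_at             timestamptz NOT NULL DEFAULT now(),
  PRIMARY KEY (claim_scope, work_item_key)
);
CREATE INDEX IF NOT EXISTS work_item_claims_expiry_idx
  ON ${schema}.work_item_claims (state, expires_at);`.trim();
}
```

An operator would review the returned text and apply it through an approved migration step; nothing in a routine claim-next call needs to invoke it.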

Slices

Slice 1: Claim Authority Interface

Scope:

  • Add claim-authority types and an in-memory implementation.
  • Move current optimistic tracker behavior behind the new interface without changing CLI/MCP behavior.
  • Add backend contract tests for claim success, lost race, active claim, stale claim, reclaim, release, heartbeat, and verify.

Acceptance:

  • Existing work_item_claim_next CLI/MCP tests still pass.
  • claimNexusEligibleWorkItem delegates authority decisions through the new interface.
  • No PostgreSQL dependency is introduced yet.

Verification:

  • npm test -- test/work-items/nexusWorkItemClaim.test.ts
  • npm test -- test/automation/nexusAutomationAgentLaunch.test.ts test/automation/nexusAutomationCoordinatorLoop.test.ts
  • npm run build

Progress:

  • 2026-05-22: Slice 1A added NexusWorkItemClaimAuthority and delegated ready-candidate acquisition through the default optimistic tracker authority. No database dependency was introduced.
  • 2026-05-22: Slice 1B added the in-memory authority contract for active claims, duplicate rejection, verify, heartbeat, release, stale inspection, reclaim, and fencing-token increments.
  • 2026-05-22: Slice 1C routed stale reclaim in claimNexusEligibleWorkItem through the authority backend while preserving the optimistic tracker default behavior.
  • Slice 1 is ready to feed the PostgreSQL backend slice.

Slice 2: PostgreSQL Backend Contract

Scope:

  • Add PostgreSQL claim backend using an injected SQL client boundary.
  • Add schema DDL text or migration helper, but keep live schema mutation explicit.
  • Add tests against a fake SQL client or transaction harness; avoid live database requirements in the normal suite.
  • Decide whether the runtime dependency is pg or a small user-supplied client adapter before adding package dependencies.

Acceptance:

  • Contract tests prove one-winner selection, fencing increments, heartbeat, release, expired reclaim, and provider mirror warnings.
  • The backend can be configured but is not the default.

Verification:

  • npm test -- test/work-items/nexusPostgresWorkItemClaimAuthority.test.ts
  • npm run build

Progress:

  • 2026-05-22: Slice 2A added a PostgreSQL authority backend behind an injected transaction/query client, exported explicit schema DDL, and covered one-winner claim, fencing, heartbeat, release, expired reclaim, inspection, and provider-mirror warning behavior with a fake SQL harness. No pg dependency or live database requirement was introduced.

Slice 3: Configuration And Surfaces

Scope:

  • Add project or automation configuration for claim authority backend selection.
  • Add CLI/MCP options or status output showing configured backend, scope, readiness, and blockers.
  • Keep credentials and connection strings host-local; portable project config should reference a profile or environment binding, not raw secrets.

Acceptance:

  • project_status, automation_status, and claim-next output show the active claim authority backend.
  • Misconfigured PostgreSQL reports a blocker without falling back silently to optimistic tracker claims unless fallback is configured.

Current config shape:

  • automation.workItemClaims.authority.backend: optimistic_tracker or postgres.

  • automation.workItemClaims.leaseDurationMs: lease TTL for one authority claim. Defaults to 60 minutes.

  • automation.workItemClaims.heartbeatIntervalMs: expected renewal cadence for active long-running owners. Defaults to 20 minutes and must be no more than half of leaseDurationMs.

  • automation.workItemClaims.authority.postgres.connectionProfileId: host-local credential/profile binding reference. Portable project config must not store the raw connection string.

  • Host-local dev-nexus.home.json may define claimAuthorityProfiles:

    {
      "id": "shared-claims",
      "backend": "postgres",
      "driver": "node_postgres",
      "connectionStringEnv": "DEV_NEXUS_CLAIMS_DATABASE_URL",
      "schema": "dev_nexus"
    }

    The environment variable name is stored in home config. The connection string value remains outside DevNexus config.
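A readiness check over this config shape might look like the following sketch. The profile fields match the example above, while the function name, the Readiness result type, and the blocker wording are assumptions. The environment is passed in as a plain object so the check can run without touching the real process environment or opening a connection.

```typescript
// Sketch only: function, result type, and blocker messages are hypothetical.
interface ClaimAuthorityProfile {
  id: string;
  backend: "postgres";
  driver: string;
  connectionStringEnv: string;
  schema: string;
}

interface AuthorityConfig {
  backend: "optimistic_tracker" | "postgres";
  connectionProfileId?: string;
}

type Readiness =
  | { ready: true; backend: string; connectionString?: string }
  | { ready: false; backend: string; blockers: string[] };

function resolveClaimAuthority(
  config: AuthorityConfig,
  profiles: ClaimAuthorityProfile[],
  env: Record<string, string | undefined>,
): Readiness {
  if (config.backend === "optimistic_tracker") return { ready: true, backend: config.backend };
  const blocked = (message: string): Readiness =>
    ({ ready: false, backend: config.backend, blockers: [message] });

  if (!config.connectionProfileId) return blocked("no connectionProfileId configured");
  const profile = profiles.find((p) => p.id === config.connectionProfileId);
  if (!profile) return blocked(`profile not found: ${config.connectionProfileId}`);

  // Only the variable name lives in config; the value is read from the host environment.
  const connectionString = env[profile.connectionStringEnv];
  if (!connectionString) return blocked(`environment variable not set: ${profile.connectionStringEnv}`);
  return { ready: true, backend: config.backend, connectionString };
}
```

Note that a blocked PostgreSQL configuration returns blockers rather than quietly resolving to optimistic_tracker, matching the acceptance criterion above.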

Verification:

  • Focused config, CLI, and MCP tests.
  • npm run build

Progress:

  • 2026-05-22: Slice 3A added backend selection to automation work-item claim config and exposed claim-authority readiness through project and automation status plus CLI/MCP JSON/text output. It blocks PostgreSQL authority when no connection profile is configured, and it prevents direct claim-next calls from silently falling back to optimistic tracker claims when PostgreSQL is selected but no runtime adapter is injected. No live database or dependency wiring was introduced.
  • 2026-05-22: Slice 3B kept PostgreSQL support in core as an opt-in backend while leaving the driver dependency optional. DevNexus home config now accepts host-local claim authority profiles, rejects stored database connection strings, and automation status reports missing profile, missing environment binding, and missing optional node_postgres adapter blockers without opening a database connection.
  • 2026-05-22: Slice 3C added the optional pg peer dependency and dynamic node_postgres runtime adapter. Claim-next now resolves host-local PostgreSQL profiles through CLI, MCP, and automation launch paths, while the normal suite still uses injected/fake SQL clients and does not require a live database.

Slice 4: Coordinator Enforcement

Scope:

  • Wire PostgreSQL-backed claims into automation launch and coordinator loop.
  • Pass fencing facts to worker context.
  • Require claim verification before worktree preparation, publication, status done, or release where those paths are under DevNexus control.

Acceptance:

  • Coordinator launch refuses to start a worker when PostgreSQL authority reports no verified claim.
  • Stale workers can detect their claim is no longer current.
  • Existing optimistic behavior remains available for projects without a strong backend.

Verification:

  • npm test -- test/automation/nexusAutomationAgentLaunch.test.ts test/automation/nexusAutomationCoordinatorLoop.test.ts
  • npm run build

Progress:

  • 2026-05-23: Slice 4A added post-claim authority verification before coordinator launch and carries verified authority/fencing facts into the agent context and launch environment. Projects without authority-backed claims keep the existing optimistic tracker behavior.
  • 2026-05-23: Slice 4B added an agent-launch claim guard for DevNexus-controlled worktree preparation. CLI and MCP worktree preparation now verify the current authority-backed claim from the launch context before creating a Git worktree, while optimistic tracker claims remain unchanged.
  • 2026-05-23: Slice 4C extended the same launch-context claim guard to current-agent completion recording. Stale workers can still report blocked or failed outcomes, but cannot record successful completion under an expired, released, or mismatched authority claim.
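The guard behavior described in Slices 4B and 4C can be sketched as two small checks over a launch context. The LaunchClaimContext shape, the function names, and the injected verify callback are hypothetical; the real guard presumably calls the configured authority's verifyClaim.

```typescript
// Sketch only: context shape and function names are hypothetical.
type Outcome = "completed" | "blocked" | "failed";

interface LaunchClaimContext {
  authorityBacked: boolean; // false for optimistic tracker claims
  workItemKey: string;
  leaseToken: string;
  fencingToken: number;
}

type VerifyClaim = (ctx: LaunchClaimContext) => boolean;

// Refuse to create a worktree under a claim that is no longer current.
function assertClaimBeforeWorktree(ctx: LaunchClaimContext, verify: VerifyClaim): void {
  if (ctx.authorityBacked && !verify(ctx)) {
    throw new Error(`claim for ${ctx.workItemKey} is no longer current; refusing to prepare a worktree`);
  }
}

// Stale workers may still report blocked or failed, but not successful completion.
function canRecordOutcome(outcome: Outcome, ctx: LaunchClaimContext, verify: VerifyClaim): boolean {
  if (!ctx.authorityBacked) return true; // optimistic behavior is unchanged
  if (outcome !== "completed") return true;
  return verify(ctx);
}
```

Letting non-success outcomes through means a worker that lost its lease can still leave a useful record of why it stopped.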

Slice 5: Documentation And Optional Live Smoke

Scope:

  • Document PostgreSQL claim authority setup, schema initialization, host-local connection binding, fallback policy, and operational recovery.
  • Add an optional live smoke script or test profile that runs only with an explicit PostgreSQL connection and approved runner policy.

Acceptance:

  • Users can configure a private PostgreSQL authority without storing secrets in portable project config.
  • Normal npm run check does not require a live database.

Verification:

  • npm run check
  • Optional PostgreSQL smoke under explicit approval.

Progress:

  • 2026-05-23: Slice 5A added user-facing PostgreSQL claim authority setup documentation, linked it from the docs index, and documented the claim-next --home option in CLI usage. Live PostgreSQL smoke remains gated.
  • 2026-05-23: Slice 5B added a gated Vitest live smoke for PostgreSQL claim authority. It is skipped unless DEV_NEXUS_POSTGRES_CLAIM_AUTHORITY_SMOKE=1 is set, creates/uses the configured schema, applies the exported schema SQL, verifies one-winner claiming and fencing-token verification, then releases the synthetic claim.
  • 2026-05-23: Slice 5C added npm run smoke:postgres-containers, a gated Docker canary that starts PostgreSQL plus two isolated DevNexus runner containers. The runners race for one synthetic work item through the real PostgreSQL backend, and the script asserts one winner, one loser that observes the winner's fencing token, heartbeat, release, and a released database row.

Slice 6: Current-Agent Coordination Parity

Scope:

  • Route current-agent adoption through the same work-item claim authority as spawned coordinator launches.
  • Include claim ownership and authority fencing facts in current-agent context and environment.
  • Preserve run-id reuse without taking a second claim.

Acceptance:

  • Current-agent adoption does not proceed without a successful claim when work-item claims are enabled.
  • Authority-backed claim verification failures skip adoption before the current coordinator starts mutable work.
  • Existing current-agent result recording and target-cycle behavior remain compatible.

Verification:

  • npm test -- test/automation/nexusAutomationCurrentAgentAdoption.test.ts
  • npm test -- test/automation/nexusAutomationAgentLaunch.test.ts test/work-items/nexusWorkItemClaim.test.ts
  • npm run build

Progress:

  • 2026-05-23: Slice 6A added current-agent adoption claim acquisition, authority claim context/environment projection, verification-lost-race skip behavior, and reuse handling that keeps an existing adoption context instead of taking a duplicate claim for the same run id.
  • 2026-05-23: Slice 6B added post-launch completion verification for spawned coordinator launches. Authority-backed completed results are converted to failed run records when the claim is no longer verified at result-recording time, preventing stale workers from recording successful completion after lease loss or reclaim.
  • 2026-05-23: Slice 6C added explicit current-agent claim heartbeat through the core guard helper, CLI automation current-agent heartbeat, and MCP current_agent_heartbeat. Long-running authority-backed current agents can now renew their lease without recording a terminal result.
  • 2026-05-23: Slice 6D made the lease policy explicit: 60 minute default lease, 20 minute default heartbeat interval, config validation that rejects heartbeat intervals over half the lease, environment projection of both values, and a preflight guard that blocks configured synchronous agent commands whose timeout is not lower than the claim lease.
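The lease policy from Slice 6D reduces to two numeric checks, sketched below. The defaults and both rules come from this document; the constant names, function names, and message text are assumptions.

```typescript
// Sketch only: identifiers and messages are hypothetical; the numbers come from the lease policy.
const DEFAULT_LEASE_DURATION_MS = 60 * 60 * 1000;     // 60 minutes
const DEFAULT_HEARTBEAT_INTERVAL_MS = 20 * 60 * 1000; // 20 minutes

interface LeasePolicy {
  leaseDurationMs: number;
  heartbeatIntervalMs: number;
}

function resolveLeasePolicy(config: Partial<LeasePolicy>): LeasePolicy {
  const leaseDurationMs = config.leaseDurationMs ?? DEFAULT_LEASE_DURATION_MS;
  const heartbeatIntervalMs = config.heartbeatIntervalMs ?? DEFAULT_HEARTBEAT_INTERVAL_MS;
  // A heartbeat interval above half the lease leaves no room for one missed renewal.
  if (heartbeatIntervalMs > leaseDurationMs / 2) {
    throw new Error("heartbeatIntervalMs must be no more than half of leaseDurationMs");
  }
  return { leaseDurationMs, heartbeatIntervalMs };
}

// Returns a blocker message when a synchronous command could outlive its claim lease.
function checkSyncCommandTimeout(timeoutMs: number, policy: LeasePolicy): string | undefined {
  if (timeoutMs < policy.leaseDurationMs) return undefined;
  return `command timeout ${timeoutMs}ms is not lower than claim lease ${policy.leaseDurationMs}ms`;
}
```

For example, with the defaults a 30-minute synchronous command passes the preflight check, while a 60-minute one is blocked because the runner cannot heartbeat while it waits on the child process. In this sketch, shortening only the lease to 30 minutes would also be rejected unless the heartbeat interval is lowered to 15 minutes or less.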

Human Gates

  • Provider write gate: create or update GitHub implementation issues/comments.
  • Dependency gate: add pg or any runtime database client dependency.
  • Database gate: run live PostgreSQL tests, create schemas, or apply migrations.
  • Publication gate: push branch, open PRs, or merge into main.

Open Decisions

  • Store claim history forever, retain by age/count, or let operators prune.
  • Treat provider mirror failure as claimed-with-warning or claim-release by default.
  • Scope picker mutex by project, component, tracker, repository, or custom concurrency key.
  • Whether the first PostgreSQL implementation should include active-worker and runtime semaphores or only the picker mutex and claim lease.
  • Whether completed, blocked, and failed worker records should synchronously release database claims, rely on lease expiry, or use an explicit separate release command.
  • Whether async/app-server command workers should run an automatic heartbeat sidecar. The synchronous command launcher is guarded by timeout < lease because the runner blocks while the child command executes.