Skip to content

feat: /titan-run orchestrator with diff review, semantic assertions, arch snapshots#557

Open
carlos-alm wants to merge 29 commits intomainfrom
feat/release-skill-auto-semver
Open

feat: /titan-run orchestrator with diff review, semantic assertions, arch snapshots#557
carlos-alm wants to merge 29 commits intomainfrom
feat/release-skill-auto-semver

Conversation

@carlos-alm
Copy link
Contributor

Summary

  • Add /titan-run orchestrator skill that dispatches the full Titan pipeline (recon → gauntlet → sync → forge) to sub-agents with fresh context windows, enabling hands-free end-to-end execution
  • Add Diff Review Agent to /titan-forge (Step 9) — verifies each diff matches the gauntlet recommendation and sync plan intent before gate runs (scope check, intent match, commit message accuracy, deletion audit, leftover check)
  • Add Semantic Assertions to /titan-gate (Step 5) — export signature stability, import resolution integrity, dependency direction assertions, re-export chain validation
  • Add Architectural Snapshot Comparator to /titan-gate (Step 5.5) — captures pre-forge architectural baseline, compares community stability, cross-domain dependency direction, cohesion delta, and drift after each commit
  • Add hardening to /titan-run: Pre-Agent Gate (G1-G4), post-phase validation (V1-V15), stall detection, state file backup/recovery, NDJSON integrity checks, mandatory human checkpoint before forge

Test plan

  • Run /titan-run on a test codebase in a worktree — verify full pipeline completes
  • Verify gauntlet loop resumes correctly across sub-agent boundaries
  • Verify diff review catches a scope violation (touch file outside sync plan)
  • Verify semantic assertions catch a removed export with active callers
  • Verify architectural snapshot comparison catches a new upward dependency
  • Verify stall detection stops the loop after 3 consecutive no-progress iterations
  • Verify state file backup/recovery works when titan-state.json is corrupted
  • Verify --start-from forge skips analysis phases but validates their artifacts exist

…in backlog

These two items deliver the highest immediate impact on agent experience
and graph accuracy without requiring Rust porting or TypeScript migration.
They should be implemented before any Phase 4+ roadmap work.

- #83: hook-optimized `codegraph brief` enriches passively-injected context
- #71: basic type inference closes the biggest resolution gap for TS/Java
Impact: 14 functions changed, 0 affected
Add new Phase 4 covering the port of JS-only build phases to Rust:
- 4.1-4.3: AST nodes, CFG, dataflow visitor ports (~587ms savings)
- 4.4: Batch SQLite inserts (~143ms)
- 4.5: Role classification & structure (~42ms)
- 4.6: Complete complexity pre-computation
- 4.7: Fix incremental rebuild data loss on native engine
- 4.8: Incremental rebuild performance (target sub-100ms)

Bump old Phases 4-10 to 5-11 with all cross-references updated.
Benchmark evidence shows ~50% of native build time is spent in
JS visitors that run identically on both engines.
Take main's corrected #57 section anchors; keep HEAD's v2.7.0 version reference.

Impact: 10 functions changed, 11 affected
…ative-acceleration

Impact: 25 functions changed, 46 affected
- Add COMMITS=0 guard in publish.yml to return clean version when HEAD
  is exactly at a tag (mirrors bench-version.js early return)
- Change bench-version.js to use PATCH+1-dev.COMMITS format instead of
  PATCH+COMMITS-dev.SHA (mirrors publish.yml's new scheme)
- Fix fallback in bench-version.js to use dev.1 matching publish.yml's
  no-tags COMMITS=1 default

Impact: 1 functions changed, 0 affected
The release skill now scans commit history using conventional commit
rules to determine major/minor/patch automatically. Explicit version
argument still works as before.
…ns, and architectural snapshot

Add /titan-run skill that dispatches the full Titan pipeline (recon → gauntlet →
sync → forge) to sub-agents with fresh context windows, enabling end-to-end
autonomous execution.

Hardening layers added across the pipeline:
- Pre-Agent Gate (G1-G4): git health, worktree validity, state integrity, backups
- Post-phase validation (V1-V15): artifact structure, coverage, consistency checks
- Stall detection with per-phase thresholds and no-progress abort
- Mandatory human checkpoint before forge (unless --yes)

New validation tools integrated into forge and gate:
- Diff Review Agent (forge Step 9): verifies each diff matches the gauntlet
  recommendation and sync plan intent before gate runs
- Semantic Assertions (gate Step 5): export signature stability, import resolution
  integrity, dependency direction, re-export chain validation
- Architectural Snapshot Comparator (gate Step 5.5): community stability, cross-domain
  dependency direction, cohesion delta, drift detection vs pre-forge baseline
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR adds /titan-run, a hands-free end-to-end orchestrator for the full Titan pipeline (recon → gauntlet → sync → forge), along with a Diff Review Agent in /titan-forge (Step 9), Semantic Assertions in /titan-gate (Step 5), and an Architectural Snapshot Comparator in /titan-gate (Step 5.5). It also hardens the orchestrator with Pre-Agent Gate checks (G1-G4), Step 0.5 artifact pre-validation, stall detection, state backup/recovery, and a mandatory human checkpoint before forge. A large number of issues from the previous review cycle were addressed thoroughly.

Two issues remain:

  • titan-gate Step 4 still hardcodes npm test and npm run build — the test-runner detection fix (detect from package.json scripts) was applied to titan-run V13 and titan-forge Step 10 in the previous round, but titan-gate Step 4 was missed. On non-npm projects this causes false FAIL verdicts and incorrect auto-rollbacks. Affects both .claude/skills/titan-gate/SKILL.md:107 and docs/examples/claude-code-skills/titan-gate/SKILL.md:107.

  • TITAN_TMP_ID shell variable won't persist between separate Bash tool invocations in Step 5.5TITAN_TMP_ID is set in the "Capture current state" block and referenced in both the comparison prose and the cleanup block. Because Claude Code runs each bash code block in a fresh shell, TITAN_TMP_ID is empty when the cleanup block runs, causing rm -f to silently match nothing — leaving temp files at /tmp/titan-arch-<timestamp>-<pid>-current-*.json behind after every gate invocation. This re-introduces the stale-file problem that the TITAN_TMP_ID fix was meant to solve. Affects both .claude/skills/titan-gate/SKILL.md:182-227 and docs/examples/claude-code-skills/titan-gate/SKILL.md:182-227.

Confidence Score: 3/5

  • Safe to merge with minor fixes — two gaps in titan-gate that could cause false FAIL verdicts and temp file accumulation.
  • The orchestrator, diff review, semantic assertions, and architectural snapshot logic are all solid and the previous round of review issues were addressed thoroughly. Two issues in titan-gate prevent a higher score: (1) the hardcoded npm test that causes false failures on non-npm projects with real auto-rollback consequences, and (2) the TITAN_TMP_ID shell variable scope bug that silently leaves temp files after every gate invocation, which re-introduces the cross-worktree stale-file problem the fix was supposed to close. Neither is a data-corruption or security risk, but both cause observable incorrect behavior in production pipeline runs.
  • .claude/skills/titan-gate/SKILL.md and its mirror docs/examples/claude-code-skills/titan-gate/SKILL.md — specifically Step 4 (hardcoded npm test) and Step 5.5 (TITAN_TMP_ID variable scope).

Important Files Changed

Filename Overview
.claude/skills/titan-run/SKILL.md New orchestrator skill (599 lines) — adds Pre-Agent Gate (G1-G4), Step 0.5 artifact pre-validation, stall detection, state backup/recovery, NDJSON integrity checks, and architectural snapshot capture (Step 3.5a). Incorporates all previously reviewed fixes. Logic is sound; no new issues found here.
.claude/skills/titan-gate/SKILL.md Adds semantic assertions (Step 5) and architectural snapshot comparison (Step 5.5). Two issues: Step 4 still hardcodes npm test/npm run build (fix was applied to forge/run but missed here), and TITAN_TMP_ID shell variable won't persist between separate bash invocations causing temp file cleanup to silently fail.
.claude/skills/titan-forge/SKILL.md Adds diff review (Step 9) with D1-D5 checks, reorders stage → diff-review → test → gate → commit flow, adds git reset HEAD before git checkout in rollback (Step 13), and removes --yes flag. Clean changes, no new issues found.
docs/examples/claude-code-skills/titan-gate/SKILL.md Mirror of .claude/skills/titan-gate/SKILL.md — carries the same two issues: hardcoded npm test at line 107 and TITAN_TMP_ID scope problem at lines 182-227.
docs/examples/claude-code-skills/README.md Updates pipeline diagram, skills table, usage examples, artifact table, and command table to reflect the new /titan-run orchestrator and updated command usages. Accurate and consistent.

Sequence Diagram

sequenceDiagram
    participant User
    participant TitanRun as /titan-run (Orchestrator)
    participant Recon as /titan-recon Sub-Agent
    participant Gauntlet as /titan-gauntlet Sub-Agent
    participant Sync as /titan-sync Sub-Agent
    participant Forge as /titan-forge Sub-Agent
    participant Gate as /titan-gate Sub-Agent

    User->>TitanRun: /titan-run [args]
    TitanRun->>TitanRun: Step 0 — Pre-flight (worktree, args, git sync)
    TitanRun->>TitanRun: Step 0.5 — Artifact pre-validation (if --start-from)
    TitanRun->>TitanRun: G1-G4 Pre-Agent Gate
    TitanRun->>Recon: Dispatch sub-agent
    Recon-->>TitanRun: titan-state.json, GLOBAL_ARCH.md
    TitanRun->>TitanRun: V1-V4 Post-recon validation

    loop Until gauntlet-summary.json complete=true (max 50 iters)
        TitanRun->>TitanRun: G1-G4 Pre-Agent Gate
        TitanRun->>Gauntlet: Dispatch sub-agent (batch-size N)
        Gauntlet-->>TitanRun: gauntlet.ndjson (incremental)
        TitanRun->>TitanRun: Stall check + progress tracking
    end
    TitanRun->>TitanRun: V5-V7 + NDJSON integrity

    TitanRun->>TitanRun: G1-G4 Pre-Agent Gate
    TitanRun->>Sync: Dispatch sub-agent
    Sync-->>TitanRun: sync.json
    TitanRun->>TitanRun: V8-V10 Post-sync validation

    TitanRun->>TitanRun: Step 3.5a — Capture arch-snapshot.json
    TitanRun->>User: FORGE CHECKPOINT — show plan, await confirmation

    loop Until all sync phases complete (max 20 iters)
        TitanRun->>TitanRun: G1-G4 Pre-Agent Gate
        TitanRun->>Forge: Dispatch sub-agent (--phase N [--yes])
        loop Per target in phase
            Forge->>Forge: Step 8 — git add staged files
            Forge->>Forge: Step 9 — Diff Review (D1-D5)
            Forge->>Forge: Step 10 — Run tests
            Forge->>Gate: Invoke /titan-gate (Step 11)
            Note over Gate: Step 5 Semantic assertions<br/>Step 5.5 Arch snapshot compare
            Gate-->>Forge: PASS / WARN / FAIL
            Forge->>Forge: Step 12 — git commit (on PASS)
            Forge->>Forge: Step 13 — rollback (on FAIL)
        end
        Forge-->>TitanRun: titan-state.json updated
        TitanRun->>TitanRun: V11-V13 Post-phase validation
    end
    TitanRun->>TitanRun: V14-V15 Final state consistency
    TitanRun->>User: TITAN PIPELINE COMPLETE report
Loading

Comments Outside Diff (1)

  1. .claude/skills/titan-gate/SKILL.md, line 107 (link)

    P1 titan-gate Step 4 still hardcodes npm test — inconsistent with the fix applied elsewhere

    A previous review cycle applied test-runner detection to titan-run V13 and titan-forge Step 10, but titan-gate Step 4 (the lint/build/test step) was missed. On non-npm projects (pnpm, yarn, bun) or projects with no test script, npm test will fail with npm error Missing script: "test", which produces a false FAIL verdict — triggering an incorrect auto-rollback on every gate run.

    The same line exists at docs/examples/claude-code-skills/titan-gate/SKILL.md:107 and needs the same fix.

Last reviewed commit: "fix: address titan-r..."

Comment on lines +200 to +205

if currentAuditedCount == previousAuditedCount:
stallCount += 1
Print: "WARNING: Gauntlet iteration <iteration> made no progress (stall <stallCount>/<maxStalls>)"
if stallCount >= maxStalls:
Stop: "Gauntlet stalled for <maxStalls> consecutive iterations at <currentAuditedCount>/<expectedTargetCount> targets. Likely stuck on a problematic target. Check gauntlet.ndjson for the last successful entry and investigate the next target in the batch."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Undefined variable previousAuditedCountBeforeAgent in gauntlet efficiency check

The efficiency check references previousAuditedCountBeforeAgent, which is never defined in the pseudocode. By the time this line is reached, previousAuditedCount has already been updated to currentAuditedCount on the line just above, so using it there would always yield 0.

To correctly compute how many targets this iteration processed, you need to capture the pre-agent count before the update. For example:

    previousAuditedCount = currentAuditedCount   # update for next iteration's stall check

    # Efficiency check: if progress is very slow (< 2 targets per iteration), warn
    targetsThisIteration = currentAuditedCount - previousAuditedCountBeforeAgent  # ← undefined

Should be:

    # Save count before update for efficiency check
    countBeforeUpdate = previousAuditedCount
    previousAuditedCount = currentAuditedCount   # update for next iteration's stall check

    # Efficiency check
    targetsThisIteration = currentAuditedCount - countBeforeUpdate
    if targetsThisIteration == 1 and iteration > 3:
        Print: "WARNING: Only 1 target per iteration..."

This same issue exists in the identical copy at docs/examples/claude-code-skills/titan-run/SKILL.md at the same line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — saved pre-update count to countBeforeUpdate before reassignment. Applied to both .claude/skills/ and docs/examples/ copies.

@carlos-alm
Copy link
Contributor Author

Resolved merge conflicts with main and fixed the undefined previousAuditedCountBeforeAgent variable in the gauntlet loop efficiency check.

@carlos-alm
Copy link
Contributor Author

@greptileai

If any SHA doesn't exist → **WARN**: "Commit <sha> recorded in state but not found in git history. State may be out of sync."

**V15. Gate log consistency:**
If `.codegraph/titan/gate-log.ndjson` exists:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Incomplete merge conflict detection — misses AU, UA, DU, UD markers

The check only lists UU, AA, DD as conflict indicators in git status --porcelain output. Git also produces AU (added by us, modified by them), UA (modified by us, added by them), DU (deleted by us, modified by them), and UD (modified by us, deleted by them) in the porcelain format for merge conflicts. Missing these cases could allow the pipeline to proceed with unresolved conflicts into a forge sub-agent that commits.

Suggested change
If `.codegraph/titan/gate-log.ndjson` exists:
- **Merge conflicts** (lines starting with `UU`, `AA`, `DD`, `AU`, `UA`, `DU`, `UD`): Stop immediately: "Unresolved merge conflict detected. Resolve before continuing."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added AU, UA, DU, UD to the merge conflict detection markers in both .claude/skills/ and docs/examples/ copies.

Gate now includes semantic assertions (Step 5) and architectural snapshot comparison (Step 5.5).
The arch snapshot is at .codegraph/titan/arch-snapshot.json.
6. Commit on success, rollback on failure

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 --start-from forge silently disables architectural comparison without warning

When using --start-from forge, Step 3.5a (architectural snapshot capture) is skipped entirely since it lives inside Step 3.5, which comes after sync. The gate's Step 5.5 gracefully handles a missing arch-snapshot.json by skipping the comparison, but the user receives no warning that the architectural drift check will be inactive for the entire forge run.

Consider adding an explicit notice at the start of the forge loop when arch-snapshot.json is absent and the start phase was forge:

if startPhase == "forge" and not exists(".codegraph/titan/arch-snapshot.json"):
    Print: "NOTE: No arch-snapshot.json found. Architectural comparison in /titan-gate (Step 5.5) will be skipped for this run.
           To enable it, run '/titan-run --start-from sync' to re-capture the pre-forge snapshot."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added a pre-loop check in Step 4a that warns when arch-snapshot.json is missing, advising the user to run --start-from sync to re-capture it.

- Fix undefined previousAuditedCountBeforeAgent variable in gauntlet
  efficiency check (save pre-update count before reassignment)
- Add AU, UA, DU, UD to merge conflict detection markers
- Add warning when --start-from forge runs without arch-snapshot.json
- Fix undefined previousAuditedCountBeforeAgent variable in gauntlet
  efficiency check (save pre-update count before reassignment)
- Add AU, UA, DU, UD to merge conflict detection markers
- Add warning when --start-from forge runs without arch-snapshot.json
@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +206 to 208
13. **On failure (test, gate, or diff-review):**
```bash
git checkout -- <changed files>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Incomplete rollback — staged files not unstaged in Step 13

The new flow stages files in Step 8 (git add) before running tests (Step 10) and gate (Step 11). When a test or gate failure routes to Step 13, only the working-tree restoration is performed:

git checkout -- <changed files>

This restores the file contents on disk but does not remove the files from the staging area. The changes remain indexed in git. If the orchestrator loops back and the next target's git add picks up different files, the leftover staged changes from the failed target will be silently included in the next commit.

The rollback should unstage first, then restore the working tree:

Suggested change
13. **On failure (test, gate, or diff-review):**
```bash
git checkout -- <changed files>
13. **On failure (test, gate, or diff-review):**
```bash
git reset HEAD <changed files>
git checkout -- <changed files>
```

Note: the DIFF FAIL path in Step 9 already says "Unstage and revert changes" correctly — this gap only affects Step 13. The same fix is needed in the docs mirror at docs/examples/claude-code-skills/titan-forge/SKILL.md.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added unstaging step before working tree restore in Step 13 rollback, so staged files are properly cleared before checkout. Applied to both .claude/skills/ and docs/examples/ copies.

Both commands are called in titan-forge's new diff review step (Step 9):
fn-impact for deletion audit (D4) and exports for re-export chain checks.
@carlos-alm
Copy link
Contributor Author

Addressed the remaining open item from the summary:

README command table (P2): Restored FORGE to both codegraph fn-impact and codegraph exports in the tool reference table. Both commands are called in titan-forge's new diff review Step 9 — fn-impact for the deletion audit (D4) and exports for re-export chain checks (D3). The two P1 issues (shell substitution bug in arch-snapshot.json and incomplete git rollback in Step 13) were addressed in an earlier commit.

The --yes passthrough was also checked — titan-run passes --yes to forge and forge still handles it at Step 8 ($ARGUMENTS contains --yes), so that is working as intended.

@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines 197 to 199
| `codegraph exports` | GAUNTLET | Per-symbol export consumers |
| `codegraph fn-impact` | GAUNTLET, SYNC | Blast radius |
| `codegraph search` | GAUNTLET | Duplicate code detection (needs embeddings) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Command table stale after new diff-review and semantic-assertion additions

Three commands were removed from FORGE/GATE in the table, but the new steps introduced in this PR use them:

  1. codegraph fn-impact — listed as "GAUNTLET, SYNC" only, but is still called in titan-forge Step 4 (pre-touch understanding) and in the new Step 9 D4 (deletion audit). It's also now used in titan-gate Step 5a. All three users (FORGE, GATE) are missing.

  2. codegraph context — listed as "SYNC" only, but titan-forge Step 4 still calls codegraph context <target> -T --json (this code is unchanged and confirmed in the current file). FORGE should be restored.

  3. codegraph exports — listed as "GAUNTLET" only, but the new titan-gate Step 5a explicitly runs codegraph exports <changed-file> -T --json. GATE should be added.

The same table appears in .claude/skills/ (mirrored) so the same correction applies there.

Suggested correction:

Suggested change
| `codegraph exports` | GAUNTLET | Per-symbol export consumers |
| `codegraph fn-impact` | GAUNTLET, SYNC | Blast radius |
| `codegraph search` | GAUNTLET | Duplicate code detection (needs embeddings) |
| `codegraph exports` | GAUNTLET, GATE | Per-symbol export consumers |
| `codegraph fn-impact` | GAUNTLET, SYNC, FORGE, GATE | Blast radius |

And separately restore FORGE to the codegraph context row:

| `codegraph context` | SYNC, FORGE | Full function context |

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added GATE to exports and fn-impact rows, added FORGE to context row in the command table. Verified against actual usage in titan-gate Step 5a and titan-forge Steps 4 and 9.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added GATE to codegraph communities, codegraph structure, and codegraph deps in the command table. These are all used by GATE Step 5.5 (architectural snapshot comparison) but were missing from the "Used by" column.

5. Run /titan-gate — read .claude/skills/titan-gate/SKILL.md and follow it exactly.
Gate now includes semantic assertions (Step 5) and architectural snapshot comparison (Step 5.5).
The arch snapshot is at .codegraph/titan/arch-snapshot.json.
6. Commit on success, rollback on failure
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 --yes description inaccurate — forge no longer accepts this flag

The argument hint says --yes is "passed through to forge", but this PR removes --yes from titan-forge's supported arguments list (the --yes → skip confirmation prompt line was deleted from titan-forge). The flag is now passed to forge's dispatch invocation (line ~835 of this file: yesFlag = "--yes" if autoConfirm else ""), but forge silently ignores it since it no longer handles that argument.

--yes still has meaningful effects within the orchestrator itself (auto-resume on existing state, skip the pre-pipeline and forge-checkpoint confirmations), so it should stay — but the description should be updated to reflect its actual scope.

Suggested change
6. Commit on success, rollback on failure
- `--yes` → skip all confirmation prompts in the orchestrator (pre-pipeline, forge checkpoint, and resume prompts)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — updated the --yes description to reflect its actual scope: orchestrator-level prompts only (pre-pipeline, forge checkpoint, and resume). Applied to both .claude/skills/ and docs/examples/ copies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — updated the Rules section description of --yes to clarify both purposes: (1) passed through to forge so it skips its per-phase confirmation prompt, and (2) within the orchestrator itself, skips pre-pipeline, forge checkpoint, and resume prompts. Updated in both .claude/skills/titan-run/SKILL.md and the example copy.

@carlos-alm
Copy link
Contributor Author

Addressed round 2 Greptile feedback:

  1. Incomplete rollback — added unstaging step before working tree restore in Step 13
  2. Command table — added GATE to exports/fn-impact, added FORGE to context row
  3. --yes description — updated to reflect actual orchestrator scope

All applied to both .claude/skills/ and docs/examples/ copies.

@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +189 to +193
**A1. Community stability:**
Compare community assignments between snapshot and current. For each symbol that **moved** to a different community:
- If the symbol was the target of this forge phase → **OK** (expected)
- If the symbol was NOT touched in the diff → **WARN**: "Symbol `<name>` shifted from community <X> to <Y> as a side effect"
- If > 5 untouched symbols shifted communities → **FAIL**: "Significant community restructuring detected — <N> symbols shifted communities. This change may have unintended architectural impact."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Louvain community IDs are non-deterministic — A1 comparison will produce false positives

The Louvain algorithm (used by codegraph communities) assigns community IDs non-deterministically. Two runs on the exact same codebase can produce different ID assignments for identical clusters (e.g., "community 1" pre-forge might be "community 3" post-forge even if nothing changed). Comparing community assignments by ID between the pre-forge snapshot and the current state will therefore flag nearly every symbol as having "moved," triggering spurious WARN and FAIL verdicts on completely valid commits.

The same issue exists at docs/examples/claude-code-skills/titan-gate/SKILL.md lines 189–193.

To make this comparison reliable, community membership should be compared by content (e.g., match communities across snapshots by finding the best overlap in member sets — Hungarian algorithm or a simple "largest intersection" heuristic), not by raw ID. Alternatively, codegraph communities --drift is already designed to track community evolution across time and should be used for this comparison instead of directly diffing the raw communities output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — A1 comparison now uses codegraph communities --drift instead of raw community IDs. This uses content-based matching (largest-intersection heuristic) rather than fragile IDs that change across runs. Applied to both .claude/skills/ and docs/examples/ copies.

- `--skip-gauntlet` → skip gauntlet (assumes artifacts exist)
- `--start-from <phase>` → jump to phase: `recon`, `gauntlet`, `sync`, `forge`
- `--gauntlet-batch-size <N>` → batch size for gauntlet (default: 5)
- `--yes` → skip all confirmation prompts in the orchestrator (pre-pipeline, forge checkpoint, and resume prompts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 --yes description contradicts the Rules section and the dispatch code

Line 21 says --yes applies to "orchestrator-level prompts only (pre-pipeline, forge checkpoint, and resume prompts)", but:

  1. Line 574 (Rules section) explicitly says "Pass --yes through to forge if the user provided it, so forge doesn't prompt for confirmation on each phase."
  2. The dispatch code on line 421 confirms this: yesFlag = "--yes" if autoConfirm else "", which is then passed to forge on line 422.
  3. titan-forge Step 0.8 still checks for --yes to skip per-phase confirmation.

The description was updated as part of a previous fix but the Rules section (line 574) was not reconciled with it — and the Rules section is actually the more accurate description of the real behavior.

The same inconsistency exists in docs/examples/claude-code-skills/titan-run/SKILL.md at the corresponding lines.

Suggested change
- `--yes` → skip all confirmation prompts in the orchestrator (pre-pipeline, forge checkpoint, and resume prompts)
- `--yes` → skip all confirmation prompts in the orchestrator (pre-pipeline, forge checkpoint, and resume prompts) and in forge (per-phase confirmation)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — updated --yes description to explicitly include forge scope: 'skip all confirmation prompts in the orchestrator (pre-pipeline, forge checkpoint, and resume prompts) and in forge (per-phase confirmation)'. Consistent with the Rules section and dispatch code.

Comment on lines +182 to +184
codegraph communities -T --json > /tmp/titan-arch-current-communities.json
codegraph structure --depth 2 --json > /tmp/titan-arch-current-structure.json
codegraph communities --drift -T --json > /tmp/titan-arch-current-drift.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Hardcoded /tmp paths not cleaned up — potential stale data across gate invocations

The three temp files are written to hardcoded /tmp/titan-arch-current-*.json paths and never deleted within the gate skill. Because /tmp is shared across worktrees, two concurrent invocations of /titan-gate (e.g., from parallel forge sessions in different worktrees) would overwrite each other's files mid-run, producing subtly wrong comparisons without any error signal.

Even within a single session, stale files from a prior gate run would sit in /tmp until the next run overwrites them — harmless in the common case but fragile.

Consider either:

  1. Using a unique path per invocation: /tmp/titan-arch-<timestamp>-current-communities.json
  2. Adding a cleanup step at the end of Step 5.5 (or in the gate's exit path) to rm -f /tmp/titan-arch-current-*.json

The same applies to docs/examples/claude-code-skills/titan-gate/SKILL.md at the corresponding lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — temp files now use unique paths with epoch+PID suffix (/tmp/titan-arch-${TITAN_TMP_ID}-current-*.json) and are cleaned up at the end of Step 5.5. No more cross-worktree collisions.

@carlos-alm
Copy link
Contributor Author

@greptileai

> "GATE FAIL: [reason]. Graph restored, changes unstaged but preserved. Fix and re-stage."

For structural-only failures (Steps 2-4, 6-8), do NOT auto-rollback — report and let user decide.
For structural-only failures (Steps 1-3, 5-7), do NOT auto-rollback — report and let user decide.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Stale step range excludes new semantic assertions from auto-rollback

The auto-rollback exception "Steps 1-3, 5-7" was written when Step 5 was "Branch structural diff." This PR inserts the new Step 5 (Semantic assertions) and Step 5.5 (Architectural snapshot) before the old Step 5, which is now Step 6. The step numbers in the exception were never updated.

As a result, the new Step 5 (semantic assertions) and Step 5.5 (architectural snapshot) are now implicitly classified as "structural-only" failures that skip auto-rollback. But a semantic FAIL — e.g., "Removed export parseConfig still imported by 3 files" — is a real code-correctness breakage, not a structural advisory. Leaving staged changes in place after such a FAIL means the next operation could commit broken code.

The exception should be updated to refer to the old structural/advisory checks by their new numbers:

Suggested change
For structural-only failures (Steps 1-3, 5-7), do NOT auto-rollback — report and let user decide.
For structural-only failures (Steps 1-3, 6-8), do NOT auto-rollback — report and let user decide.

The same line exists in docs/examples/claude-code-skills/titan-gate/SKILL.md at the corresponding position and needs the identical fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — updated auto-rollback exception from 'Steps 1-3, 5-7' to 'Steps 1-3, 6-8' to reflect the step renumbering after new Step 5 (semantic assertions) and Step 5.5 (architectural snapshot) were inserted.

- **Back up state before every sub-agent.** The `.bak` file is your safety net against mid-write crashes.
- **Mandatory pause before forge** unless `--yes` is set. Analysis is safe; code changes deserve human review.
- **Stall detection is strict for forge** (2 retries) and looser for gauntlet (3 retries) since gauntlet is more likely to hit context limits legitimately.
- **Respect --start-from.** Skip phases before the specified starting point, but verify their artifacts exist AND pass validation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 --start-from artifact validation promised in Rules but absent from the procedure

The Rules section states:

Respect --start-from. Skip phases before the specified starting point, but verify their artifacts exist AND pass validation.

However, none of Steps 1–3 (or any preamble step) contains the corresponding "if phase is being skipped, run its V-checks" logic. When the orchestrator runs --start-from forge, it jumps straight to Step 3.5/4 and reads sync.json directly. A missing or partially-corrupt gauntlet.ndjson (e.g., 35 % corrupt lines from a previous crash) would silently pass through — the V5/V6 integrity checks only run when Step 2 is NOT skipped.

The practical failure mode: an agent running --start-from forge on a corrupted or incomplete gauntlet file will proceed to forge and make real commits based on bad recommendations, with no upfront warning.

Consider adding an explicit pre-loop artifact validation block at the top of each skipped phase, or consolidating it into a "Step 0.5 — Artifact pre-validation" that runs for all phases preceding startPhase. At minimum, the V-checks for the prior phases should be extracted and re-run even when the phase itself is skipped.

The same gap exists in docs/examples/claude-code-skills/titan-run/SKILL.md.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added Step 0.5 (Artifact pre-validation) that runs V-checks for all phases before the start phase. If recon is skipped, runs V1/V2/V4; if gauntlet is skipped, runs V5/V6/V7 + NDJSON integrity; if sync is skipped, runs V8/V9/V10. Missing or invalid artifacts stop the pipeline before any sub-agent is spawned.

…rollback range, FAIL template

- Use codegraph communities --drift for A1 comparison instead of raw IDs
- Add unique /tmp paths per invocation (epoch+PID) with cleanup
- Update auto-rollback exception from Steps 1-3,5-7 to Steps 1-3,6-8
- Split FAIL template: test/lint (rollback) vs structural/semantic (preserved)
… validation

- Update --yes description to include forge per-phase confirmation
- Add Step 0.5 artifact pre-validation for --start-from skipped phases
@carlos-alm
Copy link
Contributor Author

Addressed all 6 outstanding Greptile items from Round 3:

  1. Louvain non-determinism — A1 now uses codegraph communities --drift for content-based matching instead of raw community IDs
  2. /tmp path collisions — temp files use unique epoch+PID suffix, with cleanup at end of Step 5.5
  3. --yes description — updated to include forge per-phase confirmation scope
  4. Stale rollback range — updated from "Steps 1-3, 5-7" to "Steps 1-3, 6-8"
  5. --start-from artifact validation — added Step 0.5 with V-checks for all skipped phases
  6. FAIL template rollback — split into test/lint (rollback) vs structural/semantic (preserved) paths

All changes applied to both .claude/skills/ and docs/examples/claude-code-skills/ copies.

@carlos-alm
Copy link
Contributor Author

@greptileai


## Rules

- **You are the orchestrator, not the executor.** Never run codegraph commands, edit source files, or make commits yourself. Only spawn sub-agents and read state files. The ONE exception: the post-forge test run (V13) and NDJSON integrity checks are run directly since they're pure validation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Arch snapshot commands missing from Rules exception list

The Rules section declares exactly two exceptions to "never run codegraph commands directly": the post-forge test run (V13) and NDJSON integrity checks. However, Step 3.5a runs three codegraph commands directly from the orchestrator:

codegraph communities -T --json
codegraph structure --depth 2 --json
codegraph communities --drift -T --json

An agent following the Rules strictly would interpret Step 3.5a as a violation and skip it, resulting in no arch-snapshot.json being written. Since gate's Step 5.5 silently skips when arch-snapshot.json is absent, the entire architectural comparison layer would be silently disabled for every forge run — with no indication to the user.

Step 3.5a should be explicitly added to the Rules exception list, e.g.:

Suggested change
- **You are the orchestrator, not the executor.** Never run codegraph commands, edit source files, or make commits yourself. Only spawn sub-agents and read state files. The ONE exception: the post-forge test run (V13) and NDJSON integrity checks are run directly since they're pure validation.
- **You are the orchestrator, not the executor.** Never run codegraph commands, edit source files, or make commits yourself. Only spawn sub-agents and read state files. The ONE exception: the post-forge test run (V13), NDJSON integrity checks, and the pre-forge architectural snapshot capture (Step 3.5a) are run directly since they're pure validation or one-time data collection.

The same line exists in docs/examples/claude-code-skills/titan-run/SKILL.md and needs the same fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — added Step 3.5a (pre-forge architectural snapshot capture) to the Rules exception list. The wording now reads "Exceptions (pure validation/snapshot, no code changes): the post-forge test run (V13), NDJSON integrity checks, and the pre-forge architectural snapshot capture (Step 3.5a) are run directly by the orchestrator." Applied to both .claude/skills/ and docs/examples/ copies.

Comment on lines +97 to +101

## Step 0.5 — Artifact pre-validation (--start-from only)

**Only run this step if `--start-from` was specified.** Phases before the start point are being skipped — their artifacts must exist and be valid before proceeding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 --skip-recon/--skip-gauntlet bypass Step 0.5 artifact pre-validation

Step 0.5 says "Only run this step if --start-from was specified." But --skip-recon and --skip-gauntlet also skip phases without triggering Step 0.5's validation. The table in Step 0.5 covers exactly what's needed when these phases are skipped:

  • --skip-recon → should run V1/V2/V4 (but doesn't)
  • --skip-gauntlet → should run V5/V6/V7 + NDJSON integrity check (but doesn't)

The failure mode is concrete: a user running /titan-run --skip-gauntlet with a partially-corrupt gauntlet.ndjson (e.g., from a prior crash) will skip directly to sync, which produces a sync.json based on bad data, and then forge makes real commits against wrong recommendations — all with no upfront warning. --start-from gauntlet and --skip-recon are semantically equivalent, but only one triggers the guard.

The condition in Step 0.5 should also include the skip flags:

Suggested change
## Step 0.5 — Artifact pre-validation (--start-from only)
**Only run this step if `--start-from` was specified.** Phases before the start point are being skipped — their artifacts must exist and be valid before proceeding.
**Only run this step if `--start-from` was specified OR if `--skip-recon` or `--skip-gauntlet` is set.** Phases before the start point are being skipped — their artifacts must exist and be valid before proceeding.

The same issue exists in docs/examples/claude-code-skills/titan-run/SKILL.md at the corresponding location.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — Step 0.5 now triggers when any phase-skipping flag is set: --start-from, --skip-recon, or --skip-gauntlet. The heading was updated to "Artifact pre-validation (phase skip)" and the condition text specifies which artifacts to validate for each flag. Applied to both copies.

Comment on lines +485 to +492
# V13. Test suite still green after forge commits
# Quick sanity — run tests to make sure the cumulative commits haven't broken anything
npm test 2>&1
if tests fail:
Print: "CRITICAL: Test suite fails after forge phase <nextPhase>. Stopping pipeline."
Print: "Commits from this phase: git log --oneline <headBefore>..<headAfter>"
Print: "Consider reverting: git revert <headBefore>..<headAfter>"
Stop.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 V13 hardcodes npm test — inconsistent with gate's tool detection

The gate's Step 4 detects the available test runner by checking package.json scripts before running npm test. V13 here skips that check and unconditionally runs npm test. On a project using pnpm, yarn, bun, or a project with no test script defined, this produces a misleading failure (e.g., npm error Missing script: "test") rather than a clean "no test runner found" message, and the pipeline would stop on a false positive.

The same hardcoded npm test appears in titan-forge Step 10 (line 190 of .claude/skills/titan-forge/SKILL.md). Consider either detecting the runner first (matching gate's approach), or at minimum documenting the assumption that only npm projects are supported.

The same issue exists in docs/examples/claude-code-skills/titan-run/SKILL.md and docs/examples/claude-code-skills/titan-forge/SKILL.md.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed — V13 in titan-run and Step 10 in titan-forge now say "detect the project's test command from package.json scripts — npm test, yarn test, pnpm test, etc." with \ as the placeholder instead of hardcoded \Unknown command: test"

@carlos-alm
Copy link
Contributor Author

@greptileai

Comment on lines +182 to +227
TITAN_TMP_ID="$(date +%s)-$$"
codegraph communities -T --json > /tmp/titan-arch-${TITAN_TMP_ID}-current-communities.json
codegraph structure --depth 2 --json > /tmp/titan-arch-${TITAN_TMP_ID}-current-structure.json
codegraph communities --drift -T --json > /tmp/titan-arch-${TITAN_TMP_ID}-current-drift.json
```

### Compare

**A1. Community stability:**
Use the drift output (which uses content-based matching, not raw IDs, to track community movements across runs):

```bash
# Compare current drift report against snapshot drift baseline
# New drift warnings not present in arch-snapshot.json → side-effect restructuring
```

Read `.codegraph/titan/arch-snapshot.json → drift` (the pre-forge drift baseline) and compare against `/tmp/titan-arch-${TITAN_TMP_ID}-current-drift.json`:
- For each **new** drift warning in current that was NOT present in the snapshot: if the drifted symbol was NOT touched in the diff → **WARN**: "Symbol `<name>` drifted community as a side effect"
- If > 5 untouched symbols appear in new drift warnings → **FAIL**: "Significant community restructuring detected — <N> symbols drifted communities. This change may have unintended architectural impact."

**A2. Dependency direction between domains:**
From `GLOBAL_ARCH.md`, extract the expected dependency direction between domains (e.g., "presentation depends on features, not the reverse").

Check if any new cross-domain dependency violates the expected direction:
```bash
codegraph deps <changed-file> --json
```
- New upward dependency (lower layer importing higher layer) not present in snapshot → **FAIL**
- New lateral dependency within the same layer → **OK**

**A3. Cohesion delta:**
Compare directory cohesion scores from `structure`:
- If any directory's cohesion dropped by > 0.2 → **WARN**: "Directory `<dir>` cohesion dropped from <X> to <Y>"
- If a directory went from above 0.5 to below 0.3 → **FAIL**: "Directory `<dir>` became tangled (cohesion <X> → <Y>)"

**A4. New drift warnings:**
Compare drift warnings between snapshot and current:
- New drift warning not in snapshot → **WARN** with details
- Drift warning resolved → note as positive

### Cleanup

```bash
rm -f /tmp/titan-arch-${TITAN_TMP_ID}-current-communities.json \
/tmp/titan-arch-${TITAN_TMP_ID}-current-structure.json \
/tmp/titan-arch-${TITAN_TMP_ID}-current-drift.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 TITAN_TMP_ID shell variable won't survive between separate Bash tool invocations

TITAN_TMP_ID is set in the "Capture current state" block (line 182), but the "Cleanup" block (lines 225-227) and the inline A1 comparison prose (line 198) reference ${TITAN_TMP_ID} as if it's still in scope. In Claude Code, each bash code block is run as a separate shell invocation — shell variables do not persist between them.

When an AI agent executes the cleanup block independently, TITAN_TMP_ID is unset, so the rm -f becomes rm -f /tmp/titan-arch--current-*.json, which silently matches nothing. The actual temp files at /tmp/titan-arch-<timestamp>-<pid>-current-*.json are left behind — re-introducing the stale-file problem that the TITAN_TMP_ID fix was meant to solve.

The same issue exists at docs/examples/claude-code-skills/titan-gate/SKILL.md:182-227.

Options to fix:

  1. Combine the capture, compare, and cleanup into a single bash block so the variable stays in scope:
TITAN_TMP_ID="$(date +%s)-$$"
codegraph communities -T --json > /tmp/titan-arch-${TITAN_TMP_ID}-current-communities.json
codegraph structure --depth 2 --json > /tmp/titan-arch-${TITAN_TMP_ID}-current-structure.json
codegraph communities --drift -T --json > /tmp/titan-arch-${TITAN_TMP_ID}-current-drift.json
# ... comparison logic inline ...
rm -f /tmp/titan-arch-${TITAN_TMP_ID}-current-communities.json \
      /tmp/titan-arch-${TITAN_TMP_ID}-current-structure.json \
      /tmp/titan-arch-${TITAN_TMP_ID}-current-drift.json
  1. Or use a fixed but gated per-invocation directory (mkdir + trap):
TITAN_TMP_DIR="$(mktemp -d /tmp/titan-arch-XXXXXX)"
trap 'rm -rf "$TITAN_TMP_DIR"' EXIT
codegraph communities -T --json > "$TITAN_TMP_DIR/communities.json"
codegraph structure --depth 2 --json > "$TITAN_TMP_DIR/structure.json"
codegraph communities --drift -T --json > "$TITAN_TMP_DIR/drift.json"

@carlos-alm carlos-alm force-pushed the feat/release-skill-auto-semver branch from 4e93913 to 9274f3f Compare March 22, 2026 06:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant