feat: add maintenance skills — deps-audit, bench-check, test-health, housekeep#565
feat: add maintenance skills — deps-audit, bench-check, test-health, housekeep#565carlos-alm wants to merge 19 commits intomainfrom
Conversation
…housekeep Four recurring maintenance routines as Claude Code skills: - /deps-audit: vulnerability scanning, staleness, unused deps, license checks - /bench-check: benchmark regression detection against saved baselines - /test-health: flaky test detection, dead tests, coverage gap analysis - /housekeep: clean worktrees, dirt files, sync main, prune branches
|
Claude finished @carlos-alm's task —— View job PR Review: Maintenance SkillsReviewing this PR that adds 4 new maintenance skills for Claude Code. Todo List:
Root Cause AnalysisProblem Being Solved: This PR addresses maintenance debt and technical hygiene gaps that accumulate in active development repositories. The skills automate routine maintenance tasks that are typically done manually or inconsistently. Why This Approach: These are documentation-only additions that define Claude Code skills (not executable code), following the established Risk Assessment: Low risk - these are pure documentation files that define maintenance workflows. No code changes, no dependencies added. Backlog Compliance
Critical ConcernsNone identified. The skills are:
Quality AssessmentExcellent Design Patterns:
Minor Observations:
Technical Validation:
Final Recommendation
|
Greptile SummaryThis PR introduces four new maintenance skills ( Key changes:
Confidence Score: 3/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[User invokes skill] --> B{Which skill?}
B --> BC[/bench-check/]
B --> DA[/deps-audit/]
B --> TH[/test-health/]
B --> HK[/housekeep/]
BC --> BC0[Phase 0: Pre-flight\nparse --save-baseline/--compare-only/--threshold]
BC0 --> BC1[Phase 1: Run 4 benchmark scripts\ncapture output+exit_code per script]
BC1 --> BC2[Phase 2: Normalize to flat metrics map]
BC2 --> BC3{SAVE_ONLY\nor no baseline?}
BC3 -- No --> BC3b[Phase 3: Compare vs baseline\ndelta_pct, guard zero-baseline]
BC3 -- Yes --> BC4[Phase 4: Verdict\nSave-baseline / First-run path]
BC3b --> BC4b[Phase 4: Verdict\nPASSED / FAILED / COMPARE_ONLY]
BC4 --> BC5[Phase 5: Write baseline.json + history.ndjson\ngit add → diff --cached -- files → commit]
BC4b --> BC5
BC5 --> BC6[Phase 6: Write BENCH_REPORT_date.md]
DA --> DA0[Phase 0: Pre-flight\nif --fix: git stash push → STASH_CREATED]
DA0 --> DA1[Phase 1-5: Audit vulns/outdated/unused/licenses/dupes]
DA1 --> DA6[Phase 6: Write DEPS_AUDIT_date.md]
DA6 --> DA7{AUTO_FIX?}
DA7 -- Yes --> DA7a[npm test → branch on pass/fail × STASH_CREATED\ndrop or pop stash / git checkout]
DA7 -- No --> DA_done[Done]
DA7a --> DA_done
TH --> TH0[Phase 0: Pre-flight\nparse --flaky-runs/--quick/--coverage]
TH0 --> TH1{COVERAGE_ONLY\nor QUICK?}
TH1 -- No --> TH1a[Phase 1: Flaky detection\nmktemp RUN_DIR → N vitest runs → per-run JSON\nanalyze → rm -rf RUN_DIR]
TH1 -- Yes --> TH2
TH1a --> TH2[Phase 2: Dead/trivial test scan]
TH2 --> TH3[Phase 3: Coverage gap analysis\nvitest --coverage → parse coverage-summary.json]
TH3 --> TH4[Phase 4: Structural analysis]
TH4 --> TH5[Phase 5: Write TEST_HEALTH_date.md]
HK --> HK0[Phase 0: Pre-flight\nparse --dry-run/--skip-update]
HK0 --> HK1[Phase 1: Clean stale worktrees\ngit worktree prune + confirm removal]
HK1 --> HK2[Phase 2: Delete dirt files\ntmp/bak/log/coverage/DS_Store]
HK2 --> HK3[Phase 3: Sync with main\ngit fetch + pull/inform]
HK3 --> HK4[Phase 4: Prune merged branches\nconfirm each git branch -d]
HK4 --> HK5[Phase 5: Update codegraph\nSKIPPED — source repo guard always fires]
HK5 --> HK6[Phase 6: Verify health\ncodegraph stats + npm ls + git fsck]
HK6 --> HK7[Phase 7: Print console report]
Reviews (6): Last reviewed commit: "fix: housekeep require confirmation befo..." | Re-trigger Greptile |
| - `nodeCount`, `edgeCount` — graph size | ||
|
|
||
| ### 1b. Incremental Benchmark | ||
|
|
||
| ```bash | ||
| node scripts/incremental-benchmark.js 2>/dev/null | ||
| ``` | ||
|
|
||
| Extract: | ||
| - `noOpRebuild` (ms) — time for no-change rebuild | ||
| - `singleFileRebuild` (ms) — time after one file change | ||
| - `importResolution` (ms) — resolution throughput | ||
|
|
||
| ### 1c. Query Depth Benchmark | ||
|
|
||
| ```bash | ||
| node scripts/query-benchmark.js 2>/dev/null | ||
| ``` | ||
|
|
||
| Extract: | ||
| - `fnDeps` scaling by depth | ||
| - `fnImpact` scaling by depth | ||
| - `diffImpact` latency | ||
|
|
||
| ### 1d. Embedding Benchmark (optional) | ||
|
|
||
| ```bash | ||
| node scripts/embedding-benchmark.js 2>/dev/null | ||
| ``` | ||
|
|
||
| Extract: | ||
| - `embeddingTime` (ms) | ||
| - `recall` at Hit@1, Hit@3, Hit@5, Hit@10 | ||
|
|
||
| > **Timeout:** Each benchmark gets 5 minutes max. If it times out, record `"timeout"` for that suite and continue. |
There was a problem hiding this comment.
2>/dev/null discards error messages that should be recorded
All four benchmark invocations pipe stderr to /dev/null, yet the Rules section (line 222) and the error-handling note on line 90 both say to record "error: <message>" when a script exits non-zero. With stderr silenced there is no message to capture — the skill can only detect that a failure occurred, not why.
Replace the silent discard with a captured-stderr pattern:
# Instead of:
node scripts/benchmark.js 2>/dev/null
# Use:
output=$(node scripts/benchmark.js 2>&1)
exit_code=$?
if [ $exit_code -ne 0 ]; then
# record "error: $output" and continue
fiThis applies to all four benchmark invocations (lines 48, 59, 70, 81).
There was a problem hiding this comment.
Fixed in commit a562b52. All four benchmark invocations now use output=$(... 2>&1) plus an exit_code check, so stderr is captured and recorded as "error: $output" rather than silently discarded.
There was a problem hiding this comment.
Fixed in ce5d811 — added git diff --cached --quiet || guard before git commit in Phase 5, so the commit is skipped when the baseline values are identical to the previously committed version.
| - **Improvement**: metric decreased by more than `THRESHOLD`% (time) or increased (quality) | ||
| - **Stable**: within threshold | ||
|
|
||
| > **Direction awareness:** For latency metrics (ms), higher = worse. For recall/quality metrics, higher = better. For count metrics (nodes, edges), changes are informational only — not regressions. |
There was a problem hiding this comment.
Division-by-zero when baseline metric is
0
delta_pct = ((current - baseline) / baseline) * 100
If baseline is 0 (e.g. graph.edges on the very first build, or embedding.recall.hit1 on a cold model), this formula produces NaN or Infinity. The skill should guard against a zero baseline before computing the percentage:
if baseline == 0:
mark as "N/A — baseline was zero"
else:
delta_pct = ((current - baseline) / baseline) * 100
There was a problem hiding this comment.
Fixed in commit a562b52. Phase 3 now guards before dividing: when baseline == 0, the delta is marked "N/A — baseline was zero" and the metric is treated as informational only (no regression/improvement classification).
There was a problem hiding this comment.
The division-by-zero guard was already added in a562b52 (Round 2). The current Phase 3 text at step 2 reads: "Guard against division-by-zero: if baseline == 0, mark the delta as N/A and treat the metric as informational only (not a regression or improvement)." No further change needed.
…line - Replace 2>/dev/null with output=$(... 2>&1) + exit_code check on all four benchmark invocations so error messages are captured and recorded - Add division-by-zero guard in Phase 3: when baseline == 0, mark delta as "N/A — baseline was zero" (informational only, not a regression) - Add git add + git commit step in Phase 5 so the baseline file is actually committed after each save, matching the documented rule
- After reverting package.json + package-lock.json on --fix test failure, also run `npm ci` to resync node_modules/ with the restored lock file; without this the manifest is reverted but installed packages are not - Add explanatory comment on @anthropic-ai/tokenizer skip-list entry clarifying it is a peer dependency of @anthropic-ai/sdk and may be required at runtime without an explicit import in our code
…erion - Phase 5 (Update Codegraph): add source-repo guard that skips the self-update logic when running inside the codegraph source repo; comparing the dev version to the published release and running npm install is a no-op since codegraph is not one of its own deps - Phase 1b stale-worktree criterion: replace "created more than 7 days ago" (not determinable via git worktree list) with "last commit on the branch is more than 7 days old AND branch has no commits ahead of origin/main", using `git log -1 --format=%ci <branch>`
…at/maintenance-skills
|
Addressing the 5 items from the "Comments Outside Diff" section of the Greptile review: 1. deps-audit — incomplete revert (line 388–392): Fixed in commit 4fc994d. The Phase 7 revert path now runs 2. housekeep — Phase 5 self-update incoherent in source repo (line 572–607): Fixed in commit 89aef6b. Added a source-repo guard at the top of Phase 5: when running inside the codegraph source repo (confirmed by pre-flight), the phase is skipped entirely and prints 3. housekeep — stale-worktree creation timestamp not available (line 445–451): Fixed in commit 89aef6b. Replaced "created more than 7 days ago with no commits since" with a criterion that is actually determinable: the branch's last commit is more than 7 days old (via 4. bench-check — baseline declared committed but no git add/commit (line 226–227): Fixed in commit a562b52. Phase 5 now includes explicit 5. deps-audit — |
| ## Phase 6 — Report | ||
|
|
||
| Write a human-readable report to `generated/bench-check/BENCH_REPORT_<date>.md`: | ||
|
|
||
| ```markdown |
There was a problem hiding this comment.
Auto-commit fails silently when baseline values are unchanged
If the benchmark results are numerically identical to the stored baseline (e.g. a second back-to-back run on the same commit), generated/bench-check/baseline.json and history.ndjson will still be rewritten with a new timestamp. However, git commit will exit non-zero with "nothing to commit" if git detects the content hash is the same — causing the Phase 5 step to produce a spurious error.
Guard with --allow-empty (not ideal) or, better, only run the commit when the files actually changed:
git add generated/bench-check/baseline.json generated/bench-check/history.ndjson
if ! git diff --cached --quiet; then
git commit -m "chore: update bench-check baseline (<gitRef>)"
fiThis keeps Phase 5 idempotent and avoids a noisy error on successive clean runs.
There was a problem hiding this comment.
Fixed in ce5d811 — added git diff --cached --quiet || guard before git commit in Phase 5. When the baseline values are identical to the previously committed version, git add stages nothing and the commit is now skipped instead of erroring out.
|
Addressing Round 3 outside-diff feedback (ce5d811): test-health/SKILL.md � Changed both the flaky-detection loop and the coverage run from housekeep/SKILL.md � Changed Phase 4a from deps-audit/SKILL.md � Replaced the |
.claude/skills/deps-audit/SKILL.md
Outdated
| **Before running any auto-fix** (in Phase 1/2), save the original manifests so pre-existing unstaged changes are preserved: | ||
| ```bash | ||
| git stash push -m "deps-audit-backup" -- package.json package-lock.json | ||
| ``` | ||
|
|
||
| Summarize all changes made: | ||
| 1. List each package updated/fixed | ||
| 2. Run `npm test` to verify nothing broke | ||
| 3. If tests pass: drop the saved state (`git stash drop`) | ||
| 4. If tests fail: | ||
| - Restore the saved manifests: `git stash pop` | ||
| - Restore `node_modules/` to match the reverted lock file: `npm ci` | ||
| - Report what failed |
There was a problem hiding this comment.
Stash runs too late — backup is taken after files are already modified
The git stash push is described inside Phase 7, which executes after Phase 1 (npm audit fix) and Phase 2 (npm update) have already modified package.json and package-lock.json. An agent following the phases sequentially will reach Phase 7 with the manifests already changed, making the stash a backup of the modified state rather than the original state. If tests then fail and git stash pop is invoked, it restores the already-modified files — the recovery is a no-op.
The stash command must run inside Phase 0 (immediately after AUTO_FIX=true is detected), before any npm commands touch the manifests. For example, add a step to Phase 0:
4. Parse `$ARGUMENTS` — set `AUTO_FIX=true` if `--fix` is present
5. **If `AUTO_FIX` is set:** stash the original manifests now, before any modifications:
```bash
git stash push -m "deps-audit-backup" -- package.json package-lock.json
Then remove the stash instruction from Phase 7 (keeping only the summary/rollback logic there).
There was a problem hiding this comment.
Fixed — moved the stash command to Phase 0, step 5, so manifests are backed up before any npm commands run. Removed the stash from Phase 7.
| For stale worktrees with merged branches: | ||
| - List them and ask the user for confirmation before removing | ||
| - If confirmed (or `--full` without `--dry-run`): | ||
| ```bash | ||
| git worktree remove <path> | ||
| git branch -d <branch> # only if fully merged | ||
| ``` | ||
|
|
||
| **If `DRY_RUN`:** Just list what would be removed, don't do it. | ||
|
|
||
| > **Never force-remove** a worktree with uncommitted changes. List it as "has uncommitted work" and skip. |
There was a problem hiding this comment.
Default
--full silently removes stale worktrees without user confirmation
The cleanup condition at line 57 reads: "If confirmed (or --full without --dry-run)". Because --full is the default mode (no arguments = full cleanup), every default run of /housekeep will remove stale worktrees without asking. The project requires every Claude Code session to start with /worktree (CLAUDE.md: "Every session must start with /worktree to get an isolated copy of the repo"), meaning there may be active worktrees belonging to concurrent sessions that the "7-day old last commit" heuristic could incorrectly classify as stale — especially when a session hasn't committed anything yet but the branch tip is old.
The "never force-remove a worktree with uncommitted changes" guard (line 65) only catches worktrees with locally-modified files; it does not protect sessions that branched off main, haven't committed anything yet, but have no staged or unstaged changes either.
The confirmation prompt should always be required for worktree removal (regardless of --full), since this is a potentially destructive, irreversible action. Reserve auto-removal only for explicitly prunable worktrees (missing directory — git worktree prune), which are already guaranteed safe:
For stale worktrees with merged branches:
- **Always** list them and ask the user for confirmation before removing
(even with `--full` — worktree deletion is irreversible)
- If confirmed (and not `--dry-run`): ...There was a problem hiding this comment.
Fixed — worktree removal now always requires user confirmation, regardless of --full mode. Only git worktree prune (missing-directory cleanup) remains automatic.
| { | ||
| "savedAt": "<ISO 8601>", | ||
| "version": "<package version>", | ||
| "gitRef": "<HEAD short SHA>", |
There was a problem hiding this comment.
Hardcoded threshold in saved
baseline.json
Line 180 writes "threshold": 15 literally into baseline.json, regardless of the --threshold N argument parsed in Phase 0. If the user runs /bench-check --threshold 20, the stored baseline will record 15, making the artifact misleading when inspected later. While the skill always re-parses the threshold from $ARGUMENTS rather than reading it from the file, an operator debugging a regression from the JSON file will see the wrong value.
Replace the literal with the parsed variable:
| "gitRef": "<HEAD short SHA>", | |
| "threshold": <THRESHOLD>, |
There was a problem hiding this comment.
Fixed — baseline.json now writes "threshold": $THRESHOLD (the parsed value) instead of the hardcoded 15.
| ### First run (no baseline) | ||
| - Print: `BENCH-CHECK — initial baseline saved` | ||
| - Save current results as baseline | ||
|
|
There was a problem hiding this comment.
--compare-only doesn't guard the "First run" baseline save
Phase 4 has two result paths:
- "No regressions found" correctly guards with "If not
COMPARE_ONLY: update baseline." - "First run (no baseline)" has no such guard — it unconditionally saves a baseline.
An agent running /bench-check --compare-only against a repo with no prior baseline will fall through to the "First run" path and save a baseline, contradicting the --compare-only semantics ("compare against baseline without updating it").
Add the same guard to the first-run path:
### First run (no baseline)
- If `COMPARE_ONLY`: print a warning that no baseline exists and exit
- Otherwise: print `BENCH-CHECK — initial baseline saved` and save current results as baselineThere was a problem hiding this comment.
Fixed — the first-run path now checks COMPARE_ONLY: if set, prints a warning that no baseline exists and exits without saving. Otherwise proceeds to save the initial baseline as before.
.claude/skills/test-health/SKILL.md
Outdated
| ```bash | ||
| for i in $(seq 1 $FLAKY_RUNS); do | ||
| npx vitest run --reporter=json 2>&1 | ||
| done | ||
| ``` | ||
|
|
||
| For each run, parse the JSON reporter output to get per-test results. |
There was a problem hiding this comment.
Flaky detection loop discards output — nothing to parse
The Phase 1 loop runs:
for i in $(seq 1 $FLAKY_RUNS); do
npx vitest run --reporter=json 2>&1
doneThe JSON reporter output streams to stdout and is not captured in any variable, so there is no in-memory data for the "parse the JSON reporter output to get per-test results" step immediately below. Each iteration's output will be mixed into the session's terminal output and lost before comparison is possible.
Each run's output must be captured and stored for cross-run comparison. For example:
mkdir -p /tmp/test-health-runs
for i in $(seq 1 $FLAKY_RUNS); do
npx vitest run --reporter=json > /tmp/test-health-runs/run-$i.json 2>/tmp/test-health-runs/run-$i.err
doneThen read /tmp/test-health-runs/run-*.json to build the per-test pass/fail matrix.
There was a problem hiding this comment.
Fixed — the flaky-detection loop now redirects each run's output to /tmp/test-health-runs/run-.json (stdout) and /tmp/test-health-runs/run-.err (stderr). The parsing step reads from these per-run files for cross-run comparison.
| If `AUTO_FIX` was set: | ||
|
|
||
| Summarize all changes made: | ||
| 1. List each package updated/fixed | ||
| 2. Run `npm test` to verify nothing broke | ||
| 3. If tests pass: drop the saved state (`git stash drop`) | ||
| 4. If tests fail: | ||
| - Restore the saved manifests: `git stash pop` | ||
| - Restore `node_modules/` to match the reverted lock file: `npm ci` | ||
| - Report what failed |
There was a problem hiding this comment.
Stash pop/drop operates on wrong entry when Phase 0 stash was a no-op
git stash push -- package.json package-lock.json is a no-op when neither file has any local modifications (the common case — a fresh branch before running the skill). Git outputs "No local changes to save" and exits non-zero without creating a stash entry.
Phase 7 then unconditionally calls either git stash drop (success path) or git stash pop (failure path). With no deps-audit-backup entry on the stack:
- On success:
git stash dropsilently pops a pre-existing, unrelated stash entry (data loss). - On failure:
git stash poprestores the wrong stash, while the brokenpackage.json/package-lock.jsonfrom the failednpm audit fix/npm updaterun remain on disk.
The stash guard in Phase 0 should track whether the stash was actually created:
5. **If `AUTO_FIX` is set:**
```bash
git stash push -m "deps-audit-backup" -- package.json package-lock.json
STASH_CREATED=$?
# STASH_CREATED=0 → stash entry exists; STASH_CREATED=1 → nothing was stashed
Then in Phase 7, only call `git stash drop`/`git stash pop` when `STASH_CREATED` was `0`. When `STASH_CREATED` was `1`, the manifests were already clean and can simply be restored with `git checkout -- package.json package-lock.json` after a test failure.
There was a problem hiding this comment.
Fixed — Phase 0 now captures STASH_CREATED=$? after the stash push. Phase 7 branches on this: when STASH_CREATED=0 (stash was created), uses stash drop/pop as before; when STASH_CREATED=1 (no-op), uses git checkout -- package.json package-lock.json instead. The Rules section was updated to match.
.claude/skills/test-health/SKILL.md
Outdated
| mkdir -p /tmp/test-health-runs | ||
| for i in $(seq 1 $FLAKY_RUNS); do | ||
| npx vitest run --reporter=json > /tmp/test-health-runs/run-$i.json 2>/tmp/test-health-runs/run-$i.err | ||
| done |
There was a problem hiding this comment.
Hardcoded
/tmp/test-health-runs/ path corrupts results under concurrent sessions
CLAUDE.md explicitly states that "Multiple Claude Code instances run concurrently in this repo." If two sessions invoke /test-health simultaneously, both write to the same /tmp/test-health-runs/run-$i.json files. Run files from session A get overwritten by session B mid-loop, and the cross-run comparison in Phase 1 silently mixes results from two independent executions — producing false flaky-test results.
Use an isolated, unique working directory per invocation:
RUN_DIR=$(mktemp -d /tmp/test-health-XXXXXX)
for i in $(seq 1 $FLAKY_RUNS); do
npx vitest run --reporter=json > "$RUN_DIR/run-$i.json" 2>"$RUN_DIR/run-$i.err"
doneUsing mktemp -d guarantees each invocation gets its own directory, preventing cross-session pollution. Clean up with rm -rf "$RUN_DIR" at the end of Phase 1 (or Phase 5 after the report is written).
There was a problem hiding this comment.
Fixed — Replaced hardcoded /tmp/test-health-runs/ with RUN_DIR=$(mktemp -d /tmp/test-health-XXXXXX). All references now use $RUN_DIR. Added rm -rf "$RUN_DIR" cleanup after analysis is complete.
| ## Phase 4 — Verdict | ||
|
|
||
| Based on comparison results: | ||
|
|
||
| ### No regressions found | ||
| - Print: `BENCH-CHECK PASSED — no regressions beyond {THRESHOLD}% threshold` | ||
| - If not `COMPARE_ONLY`: update baseline with current results | ||
|
|
||
| ### Regressions found | ||
| - Print: `BENCH-CHECK FAILED — {N} regressions detected` | ||
| - List each regression with metric name, baseline value, current value, delta % | ||
| - Do NOT update the baseline | ||
| - Suggest investigation: | ||
| - `git log --oneline <baseline-ref>..HEAD` to find what changed | ||
| - `codegraph diff-impact <baseline-ref> -T` to find structural changes | ||
| - Re-run individual benchmarks to confirm (not flaky) | ||
|
|
||
| ### First run (no baseline) | ||
| - If `COMPARE_ONLY` is set: print a warning that no baseline exists and exit without saving | ||
| - Otherwise: print `BENCH-CHECK — initial baseline saved` and save current results as baseline |
There was a problem hiding this comment.
Missing verdict path when
--save-baseline is passed and a baseline already exists
Phase 3 is correctly skipped when SAVE_ONLY=true, but Phase 4 only defines three verdict paths:
- "No regressions found" — only reachable after a comparison run
- "Regressions found" — only reachable after a comparison run
- "First run (no baseline)" — only applies when no baseline file was found
When --save-baseline is passed against a repo that already has baseline.json, none of these paths apply. The agent has no clear verdict to follow and may fall through to an undefined state (e.g. silently skipping Phase 5 and never saving the baseline, or incorrectly entering the "First run" path even though a baseline was present).
Add a fourth explicit path:
### Save-baseline run (`--save-baseline`)
- Print: `BENCH-CHECK — baseline updated (previous baseline overwritten)`
- Proceed directly to Phase 5 to save the current results as the new baselineThis ensures the skill has a deterministic branch for every combination of flags × baseline-file-present.
There was a problem hiding this comment.
Fixed — Added a 4th explicit verdict path: "Save-baseline with existing baseline (--save-baseline)" which prints BENCH-CHECK — baseline overwritten (previous: <old gitRef>, new: <new gitRef>) and saves current results as the new baseline.
.claude/skills/bench-check/SKILL.md
Outdated
| For each metric in the current run: | ||
|
|
||
| 1. Look up the same metric in the baseline | ||
| 2. Guard against division-by-zero: if `baseline == 0`, mark the delta as `"N/A � baseline was zero"` and treat the metric as **informational only** (not a regression or improvement) |
There was a problem hiding this comment.
Corrupted em-dash character in "N/A" string
The string "N/A � baseline was zero" contains a corrupted em-dash (the UTF-8 replacement character U+FFFD). This garbled text will appear verbatim in the regression table written to BENCH_REPORT_*.md and in the baseline.json marker, making the report hard to read.
| 2. Guard against division-by-zero: if `baseline == 0`, mark the delta as `"N/A � baseline was zero"` and treat the metric as **informational only** (not a regression or improvement) | |
| 2. Guard against division-by-zero: if `baseline == 0`, mark the delta as `"N/A — baseline was zero"` and treat the metric as **informational only** (not a regression or improvement) |
There was a problem hiding this comment.
Fixed — Replaced the corrupted character with a proper em-dash. The string now reads "N/A — baseline was zero".
| ```bash | ||
| git add generated/bench-check/baseline.json generated/bench-check/history.ndjson | ||
| git diff --cached --quiet || git commit -m "chore: update bench-check baseline (<gitRef>)" | ||
| ``` |
There was a problem hiding this comment.
Commit command deviates from project's "specific file paths" convention
CLAUDE.md requires: "Commit with specific file paths: git commit <files> -m "msg"". The current Phase 5 does git add <files> followed by a bare git commit -m "...", which stages then commits everything in the index — potentially picking up unrelated staged changes from an ongoing session.
| ```bash | |
| git add generated/bench-check/baseline.json generated/bench-check/history.ndjson | |
| git diff --cached --quiet || git commit -m "chore: update bench-check baseline (<gitRef>)" | |
| ``` | |
| After writing both files, commit the baseline so it is a shared reference point: | |
| ```bash | |
| git add generated/bench-check/baseline.json generated/bench-check/history.ndjson | |
| git diff --cached --quiet || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)" |
There was a problem hiding this comment.
Fixed — Replaced git add + git commit -m with explicit file paths per project convention: git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "...".
When Phase 0 stash push is a no-op (manifests unchanged), Phase 7 was calling stash drop/pop on the wrong entry. Track STASH_CREATED exit code and branch on it: use git checkout when no stash exists.
…ent corruption Replace hardcoded /tmp/test-health-runs/ with mktemp -d so parallel sessions get isolated directories. Add cleanup at end of analysis.
…plicit commit paths Add 4th verdict path for --save-baseline when baseline already exists. Replace corrupted em-dash character in N/A string. Change commit command to use explicit file paths per project convention.
…ress Phase 5 was listed as "2 of 7 complete" with outdated pre-Phase 3 file paths. Updated to reflect actual state: 32 of 269 source modules migrated (~12%). Steps 5.3-5.5 now list exact migrated/remaining files with verified counts (5.3=8, 5.4=54, 5.5=175, total=237 JS-only files). Added note about 14 stale .js counterparts of already-migrated .ts files needing deletion.
.claude/skills/deps-audit/SKILL.md
Outdated
| 3. If tests pass and `STASH_CREATED` is `0`: drop the saved state (`git stash drop`) | ||
| If tests pass and `STASH_CREATED` is `1`: discard manifest changes with `git checkout -- package.json package-lock.json` (no stash entry exists) |
There was a problem hiding this comment.
Success +
STASH_CREATED=1 incorrectly reverts npm changes
When STASH_CREATED=1, it means the files had no pre-existing modifications before the skill ran — the stash push was a no-op. The npm commands then modified package.json/package-lock.json, and tests passed. In this success case the npm changes should be kept, but step 3 runs git checkout -- package.json package-lock.json, which reverts the files to HEAD — silently undoing every fix made by npm audit fix and npm update. This makes --fix a complete no-op on a clean branch.
The git checkout here was correct for the failure path (step 5), not the success path. The success path for STASH_CREATED=1 should be a no-op:
| 3. If tests pass and `STASH_CREATED` is `0`: drop the saved state (`git stash drop`) | |
| If tests pass and `STASH_CREATED` is `1`: discard manifest changes with `git checkout -- package.json package-lock.json` (no stash entry exists) | |
| 3. If tests pass and `STASH_CREATED` is `0`: drop the saved state (`git stash drop`) | |
| If tests pass and `STASH_CREATED` is `1`: nothing to do — npm changes are already on disk and there is no stash entry to clean up |
There was a problem hiding this comment.
Fixed — the success path (STASH_CREATED=1, tests pass) no longer runs git checkout to discard manifest changes. The npm audit fix changes are good when tests pass, so no action is needed. Only the failure path reverts.
.claude/skills/bench-check/SKILL.md
Outdated
|
|
||
| After writing both files, commit the baseline so it is a shared reference point: | ||
| ```bash | ||
| git diff --quiet generated/bench-check/baseline.json generated/bench-check/history.ndjson || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)" |
There was a problem hiding this comment.
git diff --quiet misses new files — baseline never committed on first run
git diff [<paths>] (without --cached or a tree-ish) compares the working tree to the index. On the very first run, baseline.json and history.ndjson are freshly created and are untracked — they have no index entry at all. For untracked files, git diff produces no output and exits 0, so the || short-circuits and git commit is never reached. The baseline is written to disk but silently never committed to git, contradicting the rule "The baseline file is committed to git."
Use git diff HEAD to compare the working tree against the last commit, which correctly shows new/untracked files as additions:
| git diff --quiet generated/bench-check/baseline.json generated/bench-check/history.ndjson || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)" | |
| git diff HEAD --quiet -- generated/bench-check/baseline.json generated/bench-check/history.ndjson || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)" |
This exits non-zero for both new files (first run) and modified files (subsequent runs), so the commit fires whenever there is genuinely new content.
There was a problem hiding this comment.
Fixed — replaced git diff --quiet with git add followed by git diff --cached --quiet. This ensures newly created files (first run) are staged and detected, so the baseline commit is not silently skipped.
| **If `DRY_RUN`:** List branches that would be deleted. | ||
|
|
||
| **Otherwise:** Delete merged branches: | ||
| ```bash | ||
| git branch -d <branch> # safe delete, only if fully merged | ||
| ``` | ||
|
|
||
| > **Never use `git branch -D`** (force delete). If `-d` fails, the branch has unmerged work — skip it. |
There was a problem hiding this comment.
Branch deletion silently skips user confirmation unlike worktree removal
Phase 1c (worktree removal) always requires user confirmation, even for clearly stale worktrees. Phase 4d ("Otherwise: Delete merged branches") runs git branch -d immediately with no equivalent confirmation step. While git branch -d is safe (refuses to delete unmerged work), users may still want to review the list of branches about to be pruned — especially in a repo where multiple concurrent sessions may have feature branches.
Consider adding a brief "List branches to be deleted and ask for confirmation" step before the git branch -d loop, consistent with Phase 1c's pattern.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Fixed — branch deletion now requires user confirmation before each delete, consistent with worktree removal in Phase 1c.
When STASH_CREATED=1 and tests pass, the npm audit fix changes are good — no action needed. Previously it ran git checkout to discard them, which undid the successful fix.
git diff --quiet ignores untracked files, so on the first run when baseline.json and history.ndjson are newly created, the commit was skipped. Stage first with git add, then check with --cached.
Branch deletion now asks for user confirmation before each delete, consistent with worktree removal in Phase 1c.
| git add generated/bench-check/baseline.json generated/bench-check/history.ndjson | ||
| git diff --cached --quiet || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)" |
There was a problem hiding this comment.
git diff --cached --quiet not scoped to bench-check files
git diff --cached --quiet (without a path argument) returns non-zero whenever ANY file is staged in the repo — including files staged by concurrent sessions. This repo explicitly runs multiple Claude Code instances at the same time, each in its own worktree but sharing the same index state.
The problematic sequence:
- Another session stages unrelated files.
git add generated/bench-check/baseline.json generated/bench-check/history.ndjsonruns — but if the bench-check file content is identical to the last commit,git adddoesn't change the index entry for those files.git diff --cached --quietreturns non-zero because the other session's staged files are detected.git commit generated/bench-check/baseline.json generated/bench-check/history.ndjsonruns — but since the bench-check files haven't changed, git reports "nothing to commit" and exits non-zero, producing a spurious failure.
Scope the diff check to only the files being committed:
| git add generated/bench-check/baseline.json generated/bench-check/history.ndjson | |
| git diff --cached --quiet || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)" | |
| git add generated/bench-check/baseline.json generated/bench-check/history.ndjson | |
| git diff --cached --quiet -- generated/bench-check/baseline.json generated/bench-check/history.ndjson || git commit generated/bench-check/baseline.json generated/bench-check/history.ndjson -m "chore: update bench-check baseline (<gitRef>)" |
| npx vitest run --coverage --coverage.reporter=json 2>&1 | ||
| ``` | ||
|
|
||
| ### 3a. Overall coverage | ||
|
|
||
| Parse `coverage/coverage-summary.json` and extract: |
There was a problem hiding this comment.
Coverage reporter mismatch — wrong output file
--coverage.reporter=json generates coverage-final.json, not coverage-summary.json. The json-summary reporter is what generates coverage/coverage-summary.json, which is the file parsed in step 3a (line 115).
Because the generated file has the wrong name, the parse step silently fails (the skill's note says to skip phase on failure), meaning coverage analysis will never produce useful output without this fix.
| npx vitest run --coverage --coverage.reporter=json 2>&1 | |
| ``` | |
| ### 3a. Overall coverage | |
| Parse `coverage/coverage-summary.json` and extract: | |
| npx vitest run --coverage --coverage.reporter=json-summary 2>&1 |
Alternatively, keep --coverage.reporter=json but update the parse target to coverage/coverage-final.json. The json-summary format is more concise and maps more directly to the per-metric percentages extracted in step 3a (lines, branches, functions, statements).
| 3. If tests pass and `STASH_CREATED` is `0`: drop the saved state (`git stash drop`) — the npm changes are good, no rollback needed | ||
| If tests pass and `STASH_CREATED` is `1`: no action needed — the npm changes are good and no stash entry exists to clean up | ||
| 4. If tests fail and `STASH_CREATED` is `0`: | ||
| - Restore the saved manifests: `git stash pop` | ||
| - Restore `node_modules/` to match the reverted lock file: `npm ci` | ||
| - Report what failed | ||
| 5. If tests fail and `STASH_CREATED` is `1`: | ||
| - Discard manifest changes: `git checkout -- package.json package-lock.json` | ||
| - Restore `node_modules/` to match the reverted lock file: `npm ci` | ||
| - Report what failed |
There was a problem hiding this comment.
git stash drop/pop targets stash by position, not by name
git stash drop and git stash pop (without an argument) always target stash@{0} — the most recently pushed stash. Stashes are shared across all worktrees in this repo. Since CLAUDE.md states multiple Claude Code instances run concurrently, another session could push a stash between Phase 0 (stash creation) and Phase 7 (stash consumption), making the deps-audit-backup entry no longer stash@{0}.
Consequences:
- Success +
STASH_CREATED=0:git stash dropdrops another session's stash — silent data loss. - Failure +
STASH_CREATED=0:git stash popapplies another session's stash to this worktree, while the brokenpackage.json/package-lock.jsonchanges remain on disk unrestored.
Resolve by looking up the stash entry by its named message before dropping or popping it:
# In Phase 0, after STASH_CREATED=$?
STASH_REF=$(git stash list | grep "deps-audit-backup" | head -1 | cut -d: -f1)
# In Phase 7, replace git stash drop / git stash pop with:
# success path (STASH_CREATED=0): git stash drop "$STASH_REF"
# failure path (STASH_CREATED=0): git stash pop "$STASH_REF"
git stash push -m "deps-audit-backup" guarantees a unique searchable label, so this lookup is reliable.
| ## Phase 5 — Update Codegraph | ||
|
|
||
| **Skip if `SKIP_UPDATE` is set.** | ||
|
|
||
| > **Source-repo guard:** This phase is only meaningful when codegraph is installed as a *dependency* of a consumer project. Because the pre-flight confirms we are inside the codegraph *source* repo (`"name": "@optave/codegraph"`), comparing the dev version to the published release and running `npm install` would be a no-op — codegraph is not one of its own dependencies. **Skip this entire phase** when running inside the source repo and print: | ||
| > `Codegraph: skipped (running inside source repo — update via git pull / branch sync instead)` | ||
|
|
||
| ### 5a. Check current version | ||
|
|
||
| ```bash | ||
| node -e "console.log(require('./package.json').version)" | ||
| ``` | ||
|
|
||
| ### 5b. Check latest published version | ||
|
|
||
| ```bash | ||
| npm view @optave/codegraph version | ||
| ``` | ||
|
|
||
| ### 5c. Update if needed | ||
|
|
||
| If a newer version is available: | ||
| - Show the version diff (current → latest) | ||
| - Check the CHANGELOG for what changed | ||
| - If it's a patch/minor: update automatically | ||
| ```bash | ||
| npm install | ||
| ``` | ||
| - If it's a major: warn the user and ask for confirmation | ||
|
|
||
| ### 5d. Rebuild | ||
|
|
||
| After any update: | ||
| ```bash | ||
| npm install | ||
| ``` | ||
|
|
||
| Verify the build works: | ||
| ```bash | ||
| npx codegraph stats 2>/dev/null && echo "OK" || echo "FAILED" |
There was a problem hiding this comment.
Phase 5 subphases (5a–5d) are unreachable dead documentation
Phase 0 pre-flight confirms the skill runs inside the @optave/codegraph source repo (line 22: checks "name": "@optave/codegraph"). Phase 5's source-repo guard then unconditionally skips the entire phase with a print statement:
"Because the pre-flight confirms we are inside the codegraph source repo ... Skip this entire phase when running inside the source repo."
Since the Phase 0 check is required for the skill to proceed at all, Phase 5 sub-steps 5a (check current version), 5b (check latest published version), 5c (update if needed), and 5d (rebuild) are unreachable documentation that will never execute. An operator reading the skill would reasonably believe these steps run.
Consider either:
- Moving the source-repo guard to the very top of Phase 5 (before 5a) so that the guard and the dead code are visually co-located, or
- Removing 5a–5d entirely and keeping only the guard + skip message, since the pre-flight already guarantees we are always in the source repo.
Summary
/deps-audit— Audit dependencies for vulnerabilities (npm audit), staleness (npm outdated), unused packages, and license compliance. Produces a scored health report. Optional--fixfor safe auto-updates./bench-check— Run benchmark suite against a saved baseline, detect regressions beyond a configurable threshold, maintain a history log for trend tracking. Guards against silent performance degradation./test-health— Detect flaky tests (multi-run), dead/trivial tests (no assertions), coverage gaps on core modules, and structural issues (oversized files, missing cleanup). Read-only audit with prioritized fix suggestions./housekeep— Local repo spring cleaning: prune stale worktrees, delete temp/dirt files, sync with main, update codegraph, prune merged branches, verify graph and node_modules integrity. Supports--dry-run.Test plan
/deps-auditand verify report is generated ingenerated/deps-audit//deps-audit --fixand verify safe updates are applied, tests pass/bench-check --save-baselineto create initial baseline/bench-checkagain to verify comparison works/test-health --quickand verify report ingenerated/test-health//test-health(full) and verify flaky detection runs N times/housekeep --dry-runand verify no modifications are made/housekeepand verify cleanup actions