docs: require deeper PR review evidence

This commit is contained in:
Peter Steinberger
2026-05-31 18:42:32 +01:00
parent 201bf125af
commit 05b3f1c29d
2 changed files with 5 additions and 0 deletions

View File

@@ -187,11 +187,13 @@ gh pr view <number> --json additions,deletions,changedFiles \
## Read beyond the diff
- Review the surrounding code path, not just changed lines. Open the caller, callee, data contracts, adjacent tests, and owner module.
- Before any verdict, read enough code to fill this map: changed surface, runtime entry point, owner boundary, one caller, one callee, sibling implementations sharing the invariant, adjacent tests, current `main` behavior, and shipped/dependency/Codex contracts when relevant.
- For large-codebase PRs, sample enough related files to understand the runtime boundary before deciding. Default to more code reading when the change touches agents, gateway, plugins, auth, sessions, process, config, or provider/runtime seams.
- Compare the PR against current `origin/main` behavior. Check whether recent main already changed the same surface.
- Dependency-backed behavior: MUST read upstream docs/source/types before judging API use, defaults, output shapes, errors, timeouts, memory behavior, or compatibility. Do not assume dependency contracts from memory or PR text.
- Judge solution quality, not only correctness. Ask whether the PR is the clean owner-boundary fix or a wart/workaround that should be replaced by a small refactor, moved seam, contract change, or deletion of duplicate logic.
- Mention the main files read when the verdict depends on code-path evidence.
- If the user challenges the verdict or asks whether the idea is really good, resume code reading first. Do not defend, soften, or reverse the verdict until the missing caller/callee/sibling/dependency path is checked.
## Best-fix review loop
@@ -206,6 +208,7 @@ Before verdict:
5. Compare against current `origin/main` and shipped behavior when regression/compat matters.
6. Inspect upstream dependency/Codex source or docs for dependency-backed behavior.
7. Identify at least one alternative fix location or shape, then reject it with evidence.
8. If any required path above is uninspected, keep reading or mark `Remaining uncertainty`; do not call the PR best, blocked, proof-sufficient, or merge-ready.
Review output must include:

View File

@@ -10,6 +10,7 @@ Skills own workflows; root owns hard policy and routing.
- Docs/user-visible work: `pnpm docs:list`, then read relevant docs only.
- Fix/triage answers need source, tests, current/shipped behavior, and dependency contract proof.
- Reviews/answers: high confidence required. Default to exhaustive relevant codebase search/read, including owners, callers, siblings, tests, docs, and upstream/dependency contracts before verdict. Diff-only review is insufficient.
- Review default: read the whole changed function/module plus callers, callees, sibling implementations, adjacent tests, scoped docs, and dependency/Codex contracts before saying `good`, `bad`, `best fix`, `proof sufficient`, or posting a comment. If challenged, keep reading first; do not defend the earlier verdict until the missing path is checked.
- Dependency-touching work: direct dependency inspection is mandatory when feasible; do not rely on assumptions, wrappers, or memory. Most dependencies are OSS, so read their source/docs/types. Codex-related work: before any verdict, comment, approval, merge recommendation, or `proof sufficient` claim, inspect sibling `../codex` source for the exact protocol/runtime behavior involved; if missing, clone `https://github.com/openai/codex.git` there first. Do not rely on PR text, OpenClaw wrappers, generated schemas, memory, or prior bot reviews as a substitute. Cite Codex files/lines checked in final/review/comment.
- Dependency-backed behavior: read upstream docs/source/types first. No API/default/error/timing guesses.
- External API work: Google/search for additional proof. Prefer official docs/source/types; cite current proof. No memory-only API claims.
@@ -30,6 +31,7 @@ Skills own workflows; root owns hard policy and routing.
- For PRs that add, remove, or change config/default surfaces with possible compatibility, upgrade, provider/plugin, operator, setup, startup, or fallback impact, ClawSweeper review should emit a `reviewMetrics` entry when practical. The metric should name the count and direction of the changes, such as added, changed, or removed config/default surfaces, and explain why the metric matters before merge. When the metric indicates concrete merge risk, also surface the concern in `risks`, use `mergeRiskLabels` when the risk matches the label rubric, make `bestSolution` name the desired pre-merge state, and ensure `labelJustifications` explain the specific reason rather than restating the label.
- Review whole decision surfaces, not only the touched runtime, provider, channel, harness, plugin seam, or context path. Check sibling Codex/Pi-style runtimes, provider/model routing, channel delivery, gateway/protocol, plugin SDK, and context-management paths when relevant.
- Every PR review must explicitly ask whether the PR is the best fix, not merely a plausible fix. Verdicts need a best-fix judgment backed by enough code reading to compare owner boundaries, callers, siblings, tests, docs, current `main`, shipped behavior when relevant, and dependency/Codex contracts when involved.
- Before a PR verdict, build a small evidence map: changed surface, entry point, owner boundary, at least one caller and callee, sibling surfaces that share the invariant, existing tests, and current `main` behavior. If any cell is missing, say the gap instead of concluding.
- One-sided fixes need sibling-surface proof, an explanation for why siblings are unaffected, or explicit follow-up work.
- Changelog findings: see Docs / Changelog.
- Public ClawSweeper comments prefer `https://docs.openclaw.ai/...` when a public docs page exists; structured evidence still cites repo files, lines, SHAs.