test(qa-lab): trace scenario issue evidence

2026-06-06 05:51:15 +08:00 · 2026-05-22 00:51:32 +08:00
parent b33deb4159
commit efb7e4742f
9 changed files with 87 additions and 2 deletions
--- a/qa/scenarios/index.md
+++ b/qa/scenarios/index.md
@@ -5,8 +5,8 @@ Single source of truth for repo-backed QA suite bootstrap data.

 - `index.md` defines pack-level bootstrap data
 - each nested `*.md` scenario defines one runnable test via `qa-scenario` + `qa-flow`
-- scenario markdown may also define coverage IDs, category metadata, required plugins,
-  lane filters, runtime parity tiers, and gateway config patching
+- scenario markdown may also define coverage IDs, evidence links, category metadata,
+  required plugins, lane filters, runtime parity tiers, and gateway config patching

 - kickoff mission
 - QA operator identity
@@ -20,6 +20,9 @@ Coverage tracking:
 - prefer reusing an existing feature ID over minting a scenario-shaped ID
 - avoid copying the scenario title into coverage IDs
 - use `pnpm openclaw qa coverage` to render the current inventory
+- use `evidence.github` for full `https://github.com/openclaw/openclaw/issues/<n>` or
+  `https://github.com/openclaw/openclaw/pull/<n>` links when a scenario directly protects
+  a reported regression, RFC, or accepted PR behavior
 - use `runtimeParityTier` for runtime-pair gate membership: `standard`,
  `optional`, `live-only`, or `soak`
 - treat the old `coverage: ["id"]` / `coverage: - id` list shape as invalid
--- a/qa/scenarios/runtime/codex-pi-shaped-read-vocabulary.md
+++ b/qa/scenarios/runtime/codex-pi-shaped-read-vocabulary.md
@@ -11,6 +11,10 @@ coverage:
  secondary:
    - runtime.prompt-compatibility
    - tools.fs.read
+evidence:
+  github:
+    - https://github.com/openclaw/openclaw/pull/80323
+    - https://github.com/openclaw/openclaw/issues/81734
 objective: Verify Codex-mode agents can satisfy legacy Pi-shaped "Read tool" wording through the native Codex workspace-read capability instead of stopping because duplicate OpenClaw dynamic read is intentionally filtered.
 successCriteria:
  - Agent reads the seeded workspace file and replies with the exact marker line.
--- a/qa/scenarios/runtime/first-hour-20-turn.md
+++ b/qa/scenarios/runtime/first-hour-20-turn.md
@@ -10,6 +10,11 @@ coverage:
    - runtime.first-hour-20
  secondary:
    - runtime.long-context
+evidence:
+  github:
+    - https://github.com/openclaw/openclaw/issues/80171
+    - https://github.com/openclaw/openclaw/issues/80337
+    - https://github.com/openclaw/openclaw/issues/80364
 objective: Verify both runtimes preserve a same-session conversation across the required 20-turn maintainer gate.
 successCriteria:
  - The same QA session accepts 20 sequential user turns.
--- a/qa/scenarios/runtime/soak-100-turn.md
+++ b/qa/scenarios/runtime/soak-100-turn.md
@@ -10,6 +10,11 @@ coverage:
    - runtime.soak-100
  secondary:
    - runtime.long-context
+evidence:
+  github:
+    - https://github.com/openclaw/openclaw/issues/80171
+    - https://github.com/openclaw/openclaw/issues/80338
+    - https://github.com/openclaw/openclaw/issues/80395
 objective: Provide an optional long-run soak that can be scheduled or run in Testbox without entering the maintainer default gate.
 successCriteria:
  - The same QA session accepts 100 sequential user turns.
--- a/qa/scenarios/runtime/tools/apply-patch.md
+++ b/qa/scenarios/runtime/tools/apply-patch.md
@@ -8,6 +8,9 @@ runtimeParityTier: standard
 coverage:
  primary:
    - tools.apply-patch
+evidence:
+  github:
+    - https://github.com/openclaw/openclaw/issues/80320
 objective: Verify apply_patch behavior is tracked across Pi and Codex while Codex owns patching natively.
 successCriteria:
  - Pi may expose OpenClaw apply_patch while Codex app-server mode may omit duplicate OpenClaw dynamic apply_patch.
--- a/qa/scenarios/runtime/tools/fs-read.md
+++ b/qa/scenarios/runtime/tools/fs-read.md
@@ -8,6 +8,9 @@ runtimeParityTier: standard
 coverage:
  primary:
    - tools.fs.read
+evidence:
+  github:
+    - https://github.com/openclaw/openclaw/issues/80312
 objective: Verify file read behavior is tracked across Pi and Codex while Codex owns read natively.
 successCriteria:
  - Pi may expose OpenClaw read while Codex app-server mode may omit duplicate OpenClaw dynamic read.
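
Note on the `evidence.github` convention added in `qa/scenarios/index.md`: entries are expected to be full `https://github.com/openclaw/openclaw/issues/<n>` or `https://github.com/openclaw/openclaw/pull/<n>` links. A minimal sketch of how such entries could be checked is shown here. The names `parseEvidenceLink`, `invalidEvidenceLinks`, and `EvidenceRef` are hypothetical and do not come from this commit; the actual QA loader may validate differently or not at all.

```typescript
// Hypothetical helper, not part of this commit: checks that an
// `evidence.github` entry is a full issue or PR URL for openclaw/openclaw.
const EVIDENCE_URL =
  /^https:\/\/github\.com\/openclaw\/openclaw\/(issues|pull)\/([1-9]\d*)$/;

interface EvidenceRef {
  kind: "issue" | "pull";
  number: number;
}

// Parse one entry; returns null when it does not match the documented shape.
function parseEvidenceLink(link: string): EvidenceRef | null {
  const match = EVIDENCE_URL.exec(link);
  if (match === null) return null;
  return {
    kind: match[1] === "issues" ? "issue" : "pull",
    number: Number(match[2]),
  };
}

// Collect the entries that break the convention so a caller can report them.
function invalidEvidenceLinks(links: readonly string[]): string[] {
  return links.filter((link) => parseEvidenceLink(link) === null);
}

// Two links taken from the diff above, plus an assumed shorthand entry
// that the convention would reject because it is not a full URL.
const sample = [
  "https://github.com/openclaw/openclaw/pull/80323",
  "https://github.com/openclaw/openclaw/issues/81734",
  "#80312",
];
console.log(invalidEvidenceLinks(sample)); // logs only the shorthand entry
```

The pattern is anchored at both ends on purpose, so trailing fragments or query strings are rejected along with shorthand references; whether the real tooling should be that strict is an open assumption.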