Docs: keep CI backlog out of policy doc

Docs: list remaining CI coverage gaps
Docs: drop unrelated instruction edits
2026-06-26 17:31:31 +08:00 · 2026-04-16 18:27:08 +02:00 · 2026-04-16 18:18:32 +02:00 · 2026-04-16 18:03:39 +02:00 · 2026-04-16 17:59:27 +02:00 · 2026-04-16 17:46:25 +02:00
7 changed files with 114 additions and 5 deletions
--- a/docs/.i18n/glossary.zh-CN.json
+++ b/docs/.i18n/glossary.zh-CN.json
@@ -207,6 +207,10 @@
    "source": "Release Policy",
    "target": "发布策略"
  },
+  {
+    "source": "Testing CI Policy",
+    "target": "测试 CI 策略"
+  },
  {
    "source": "Release policy",
    "target": "发布策略"
--- a/docs/.i18n/zh-Hans-navigation.json
+++ b/docs/.i18n/zh-Hans-navigation.json
@@ -496,7 +496,11 @@
        },
        {
          "group": "发布策略",
-          "pages": ["zh-CN/reference/RELEASING", "zh-CN/reference/test"]
+          "pages": [
+            "zh-CN/reference/RELEASING",
+            "zh-CN/reference/testing-ci-policy",
+            "zh-CN/reference/test"
+          ]
        }
      ]
    },
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -1573,7 +1573,7 @@
              },
              {
                "group": "Release policy",
-                "pages": ["reference/RELEASING", "reference/test"]
+                "pages": ["reference/RELEASING", "reference/testing-ci-policy", "reference/test"]
              }
            ]
          },
--- a/docs/help/testing.md
+++ b/docs/help/testing.md
@@ -348,6 +348,8 @@ Think of the suites as “increasing realism” (and increasing flakiness/cost):
  - Catch provider format changes, tool-calling quirks, auth issues, and rate limit behavior
 - Expectations:
  - Not CI-stable by design (real networks, real provider policies, quotas, outages)
+  - That usually means "run this in release or scheduled CI instead of PR CI"
+  - Do not read this as "only run it by hand"
  - Costs money / uses rate limits
  - Prefer running narrowed subsets instead of “everything”
 - Live runs source `~/.profile` to pick up missing API keys.
@@ -846,6 +848,10 @@ These Docker runners split into two buckets:
  explicitly want the larger exhaustive scan.
 - `test:docker:all` builds the live Docker image once via `test:docker:live-build`, then reuses it for the two live Docker lanes.
 - Container smoke runners: `test:docker:openwebui`, `test:docker:onboard`, `test:docker:gateway-network`, `test:docker:mcp-channels`, and `test:docker:plugins` boot one or more real containers and verify higher-level integration paths.
+- Use [Testing CI Policy](/reference/testing-ci-policy) to decide whether a
+  runner belongs in `PR CI`, `release CI`, `scheduled CI`, or `manual only`.
+  The important part is this: a suite can be required in CI even when it does
+  not block every PR or live in the publish workflow.

 The live-model Docker runners also bind-mount only the needed CLI auth homes (or all supported ones when the run is not narrowed), then copy them into the container home before the run so external-CLI OAuth can refresh tokens without mutating the host auth store:

@@ -883,6 +889,9 @@ This lane expects a usable live model key, and `OPENCLAW_PROFILE_FILE`
 (`~/.profile` by default) is the primary way to provide it in Dockerized runs.
 Successful runs print a small JSON payload like `{ "ok": true, "model":
 "openclaw/default", ... }`.
+Keep this in release or scheduled CI if it matters for the product surface you
+are changing. Being slower than a normal PR lane is not a reason to quietly
+drop it to manual-only.
 `test:docker:mcp-channels` is intentionally deterministic and does not need a
 real Telegram, Discord, or iMessage account. It boots a seeded Gateway
 container, starts a second container that spawns `openclaw mcp serve`, then
--- a/docs/reference/RELEASING.md
+++ b/docs/reference/RELEASING.md
@@ -51,6 +51,15 @@ OpenClaw has three public release lanes:
 - This split is intentional: keep the real npm release path short,
  deterministic, and artifact-focused, while slower live checks stay in their
  own lane so they do not stall or block publish
+- Use [Testing CI Policy](/reference/testing-ci-policy) as the source of truth
+  for which end-to-end and live suites belong in `PR CI`, `release CI`,
+  `scheduled CI`, or `manual only`.
+- Read that split literally:
+  - the publish workflow is the short path that prepares and promotes artifacts
+  - other important end-to-end checks can still be required CI in release or
+    scheduled workflows
+- In other words, "not in the publish workflow" does not mean "manual only."
+  It often means "run it in a different CI lane."
 - Release checks must be dispatched from the `main` workflow ref so the
  workflow logic and secrets stay canonical
 - That workflow accepts either an existing release tag or the current full
--- a/docs/reference/test.md
+++ b/docs/reference/test.md
@@ -8,6 +8,7 @@ title: "Tests"
 # Tests

 - Full testing kit (suites, live, Docker): [Testing](/help/testing)
+- CI placement for end-to-end and live suites: [Testing CI Policy](/reference/testing-ci-policy)

 - `pnpm test:force`: Kills any lingering gateway process holding the default control port, then runs the full Vitest suite with an isolated gateway port so server tests don’t collide with a running instance. Use this when a prior gateway run left port 18789 occupied.
 - `pnpm test:coverage`: Runs the unit suite with V8 coverage (via `vitest.unit.config.ts`). Global thresholds are 70% lines/branches/functions/statements. Coverage excludes integration-heavy entrypoints (CLI wiring, gateway/telegram bridges, webchat static server) to keep the target focused on unit-testable logic.
@@ -28,9 +29,9 @@ title: "Tests"
 - `pnpm test:perf:profile:main`: writes a CPU profile for the Vitest main thread (`.artifacts/vitest-main-profile`).
 - `pnpm test:perf:profile:runner`: writes CPU + heap profiles for the unit runner (`.artifacts/vitest-runner-profile`).
 - Gateway integration: opt-in via `OPENCLAW_TEST_INCLUDE_GATEWAY=1 pnpm test` or `pnpm test:gateway`.
-- `pnpm test:e2e`: Runs gateway end-to-end smoke tests (multi-instance WS/HTTP/node pairing). Defaults to `threads` + `isolate: false` with adaptive workers in `vitest.e2e.config.ts`; tune with `OPENCLAW_E2E_WORKERS=<n>` and set `OPENCLAW_E2E_VERBOSE=1` for verbose logs.
-- `pnpm test:live`: Runs provider live tests (minimax/zai). Requires API keys and `LIVE=1` (or provider-specific `*_LIVE_TEST=1`) to unskip.
-- `pnpm test:docker:openwebui`: Starts Dockerized OpenClaw + Open WebUI, signs in through Open WebUI, checks `/api/models`, then runs a real proxied chat through `/api/chat/completions`. Requires a usable live model key (for example OpenAI in `~/.profile`), pulls an external Open WebUI image, and is not expected to be CI-stable like the normal unit/e2e suites.
+- `pnpm test:e2e`: Runs gateway end-to-end smoke tests (multi-instance WS/HTTP/node pairing). Defaults to `threads` + `isolate: false` with adaptive workers in `vitest.e2e.config.ts`; tune with `OPENCLAW_E2E_WORKERS=<n>` and set `OPENCLAW_E2E_VERBOSE=1` for verbose logs. This is normal CI coverage, not just a local debugging command.
+- `pnpm test:live`: Runs provider live tests (minimax/zai). Requires API keys and `LIVE=1` (or provider-specific `*_LIVE_TEST=1`) to unskip. These tests are often too noisy or expensive for PR CI, but that usually means "release or scheduled CI," not "skip CI."
+- `pnpm test:docker:openwebui`: Starts Dockerized OpenClaw + Open WebUI, signs in through Open WebUI, checks `/api/models`, then runs a real proxied chat through `/api/chat/completions`. Requires a usable live model key (for example OpenAI in `~/.profile`) and pulls an external Open WebUI image. Treat it as a release or scheduled CI compatibility check rather than as a blocking PR lane.
 - `pnpm test:docker:mcp-channels`: Starts a seeded Gateway container and a second client container that spawns `openclaw mcp serve`, then verifies routed conversation discovery, transcript reads, attachment metadata, live event queue behavior, outbound send routing, and Claude-style channel + permission notifications over the real stdio bridge. The Claude notification assertion reads the raw stdio MCP frames directly so the smoke reflects what the bridge actually emits.

 ## Local PR gate
--- a/docs/reference/testing-ci-policy.md
+++ b/docs/reference/testing-ci-policy.md
@@ -0,0 +1,82 @@
+---
+summary: "Source of truth for where end-to-end and live tests belong in CI"
+read_when:
+  - Deciding whether an end-to-end or live suite belongs in CI
+  - Adding or moving Docker, release, or live-provider coverage
+title: "Testing CI Policy"
+---
+
+# Testing CI Policy
+
+This page is the source of truth for where OpenClaw end-to-end and live suites
+belong.
+
+Use this page to answer one practical question: when we have a real-world test,
+where should it run?
+
+Work through the questions in this order:
+
+1. Do we need this test to protect users from a real regression?
+2. If yes, where should it run: on PRs, before releases, on a schedule, or
+   only by hand?
+3. If it runs in CI, should it fail the lane or just report problems?
+
+The mistake to avoid is treating CI as all-or-nothing: a test can be important
+enough to run in CI without being important enough to block every PR or to sit
+inside the publish workflow.
+
+Example:
+
+- A live provider test may be too slow, flaky, or expensive for normal PR CI.
+- That does not make it a manual-only test.
+- It usually means the test belongs in release CI or scheduled CI instead.
+
+## CI lanes
+
+- `PR CI`: runs on pull requests or push validation when the touched surface
+  needs it. Use this for fast, high-signal checks that should catch regressions
+  before merge.
+- `Release CI`: runs before a release in a dedicated workflow lane. It may be
+  blocking or non-blocking, but it is still required CI. Use this for important
+  install, upgrade, compatibility, and provider checks that are too heavy for
+  normal PR workflows.
+- `Scheduled CI`: runs on a timer or on demand to catch drift in providers,
+  third-party integrations, or long-running compatibility paths. Use this when
+  you want ongoing coverage but do not want every PR or release to wait on it.
+- `Manual only`: keep for debug, hardware-specific, or operator-driven VM work.
+  Do not put a suite here just because it is slower than a unit test.
+
+## End-to-end and live matrix
+
+| Suite                                                              | What it proves                                                                            | Expected CI lane                                                               | Blocking guidance                                                  |
+| ------------------------------------------------------------------ | ----------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ | ------------------------------------------------------------------ |
+| `pnpm test`                                                        | Core unit, integration, and routed repo test coverage                                     | `PR CI`                                                                        | Blocking                                                           |
+| `pnpm test:install:smoke`                                          | Install script smoke plus packed tarball size checks                                      | `PR CI` when relevant; `release CI` before tags                                | Blocking                                                           |
+| `pnpm test:e2e`                                                    | Real gateway WS/HTTP/node pairing behavior                                                | `PR CI` when gateway or pairing changes                                        | Blocking when relevant                                             |
+| `pnpm test:docker:onboard`                                         | Interactive onboarding wizard, config creation, gateway startup, health                   | `PR CI` when onboarding/setup changes; otherwise `release CI`                  | Blocking when relevant                                             |
+| `pnpm test:docker:gateway-network`                                 | Two-container gateway auth and health path                                                | `PR CI` when gateway/network transport changes; otherwise `release CI`         | Blocking when relevant                                             |
+| `pnpm test:docker:mcp-channels`                                    | Real `openclaw mcp serve` bridge, routing, transcripts, notifications                     | `PR CI` when MCP/channel bridge surfaces change; otherwise `release CI`        | Blocking when relevant                                             |
+| `pnpm test:docker:plugins`                                         | Plugin install, `/plugin` alias behavior, restart semantics                               | `PR CI` when plugin runtime or install surfaces change; otherwise `release CI` | Blocking when relevant                                             |
+| `pnpm test:docker:doctor-switch`                                   | Repair and daemon switching between git and npm installs                                  | `release CI`                                                                   | Blocking for release work that touches install or doctor flows     |
+| `pnpm test:docker:qr`                                              | QR runtime compatibility under supported Docker Node versions                             | `release CI`                                                                   | Usually non-blocking, but still required CI                        |
+| `pnpm test:install:e2e`                                            | Full installer path with real onboarding-style flow in Docker                             | `release CI`                                                                   | Required CI; may live outside the publish workflow                 |
+| `OpenClaw Cross-OS Release Checks` workflow                        | Fresh install, packaged upgrade, installer fresh, dev update across macOS, Windows, Linux | `release CI`                                                                   | Required CI; keep separate from the publish workflow               |
+| Native Discord roundtrip in cross-OS release checks                | Real Discord send/readback after install or update                                        | `release CI`                                                                   | Usually non-blocking, but still required CI when enabled           |
+| `pnpm test:docker:openwebui`                                       | OpenClaw behind Open WebUI with a real proxied chat                                       | `release CI` and `scheduled CI`                                                | Non-blocking is fine; do not drop it from CI                       |
+| `pnpm test:live`                                                   | Real provider/model behavior with live credentials                                        | `scheduled CI` and `release CI` when provider risk matters                     | Non-blocking is fine; do not make "not CI-stable" mean manual-only |
+| `pnpm test:docker:live-models` and `pnpm test:docker:live-gateway` | Live provider coverage inside repo Docker images                                          | `scheduled CI` and `release CI` when provider/gateway risk matters             | Non-blocking is fine                                               |
+| `pnpm test:docker:live-cli-backend`                                | Real CLI backend compatibility inside Docker                                              | `scheduled CI`                                                                 | Non-blocking is fine                                               |
+| `pnpm test:docker:live-acp-bind`                                   | ACP bind compatibility against real agent backends                                        | `scheduled CI`                                                                 | Non-blocking is fine                                               |
+| `pnpm test:docker:live-codex-harness`                              | Codex app-server harness compatibility                                                    | `scheduled CI`                                                                 | Non-blocking is fine                                               |
+| `test:parallels:*`                                                 | VM-specific host/guest install and upgrade smoke                                          | `manual only` unless a dedicated VM CI lane exists                             | Manual/operator lane                                               |
+
+## Change policy
+
+When you add or move an end-to-end or live suite:
+
+1. Update this matrix in the same PR.
+2. Update the owning workflow or add the missing lane.
+3. Update any release or maintainer docs that point to the suite.
+
+If current workflows lag behind this matrix, treat that as follow-up work to
+close rather than as permission to quietly downgrade the suite to manual-only.
Author	SHA1	Message	Date
Onur Solmaz	1eb2fbb618	Docs: keep CI backlog out of policy doc	2026-04-16 18:27:08 +02:00
Onur Solmaz	0f9058b1b7	Docs: list remaining CI coverage gaps	2026-04-16 18:18:32 +02:00
Onur Solmaz	42c822286b	Docs: drop unrelated instruction edits	2026-04-16 18:03:39 +02:00
Onur Solmaz	d0472a8da1	Docs: add context to testing CI policy	2026-04-16 17:59:27 +02:00
Onur Solmaz	a84b0515cd	Docs: clarify testing CI policy	2026-04-16 17:46:25 +02:00
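
Reviewer sketch (not part of the change): the three ordering questions in the new `testing-ci-policy.md` could be read as a small decision function. The version below is a minimal illustration under assumptions. The `SuiteTraits` fields, the `place_suite` name, and the blocking defaults are hypothetical and do not come from the repo; the policy page itself only states the questions and the lane definitions. Sending a suite that fails question 1 to `manual only` is also an assumption, since the page does not say where such a suite goes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Lane(Enum):
    PR = "PR CI"
    RELEASE = "release CI"
    SCHEDULED = "scheduled CI"
    MANUAL = "manual only"


@dataclass(frozen=True)
class SuiteTraits:
    """Hypothetical inputs; the policy page does not define these fields."""

    protects_users: bool          # question 1: guards against a real regression
    fast_and_deterministic: bool  # cheap and stable enough for pull requests
    tracks_external_drift: bool   # depends on live providers or third parties
    needs_operator_or_vm: bool    # debug, hardware-specific, or operator-driven


def place_suite(traits: SuiteTraits) -> Tuple[Lane, bool]:
    """Return (lane, blocking) by walking the three questions in order."""
    # Question 1 (assumption): a suite that protects no user-facing behavior,
    # or that needs an operator or a VM, stays out of automated lanes.
    if not traits.protects_users or traits.needs_operator_or_vm:
        return Lane.MANUAL, False

    # Question 2: pick the lane. Being slow or flaky never reaches MANUAL here;
    # it only moves the suite out of PR CI.
    if traits.fast_and_deterministic:
        return Lane.PR, True
    if traits.tracks_external_drift:
        # Question 3 (simplification): live/drift suites report, not block.
        return Lane.SCHEDULED, False

    # Heavy but deterministic suites run before releases. The page allows
    # blocking or non-blocking here; blocking is this sketch's default.
    return Lane.RELEASE, True
```

The point the sketch tries to capture is that slowness or flakiness alone never returns `Lane.MANUAL`; only the operator/VM case or a failed first question does. The real matrix is more nuanced (for example, several suites are listed for both `release CI` and `scheduled CI`), so treat this as a reading aid rather than a replacement for the table.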