Compare commits

5 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Onur Solmaz | 1eb2fbb618 | Docs: keep CI backlog out of policy doc | 2026-04-16 18:27:08 +02:00 |
| Onur Solmaz | 0f9058b1b7 | Docs: list remaining CI coverage gaps | 2026-04-16 18:18:32 +02:00 |
| Onur Solmaz | 42c822286b | Docs: drop unrelated instruction edits | 2026-04-16 18:03:39 +02:00 |
| Onur Solmaz | d0472a8da1 | Docs: add context to testing CI policy | 2026-04-16 17:59:27 +02:00 |
| Onur Solmaz | a84b0515cd | Docs: clarify testing CI policy | 2026-04-16 17:46:25 +02:00 |

7 changed files with 114 additions and 5 deletions

View File

@@ -207,6 +207,10 @@
"source": "Release Policy",
"target": "发布策略"
},
+{
+"source": "Testing CI Policy",
+"target": "测试 CI 策略"
+},
{
"source": "Release policy",
"target": "发布策略"

View File

@@ -496,7 +496,11 @@
},
{
"group": "发布策略",
-"pages": ["zh-CN/reference/RELEASING", "zh-CN/reference/test"]
+"pages": [
+"zh-CN/reference/RELEASING",
+"zh-CN/reference/testing-ci-policy",
+"zh-CN/reference/test"
+]
}
]
},

View File

@@ -1573,7 +1573,7 @@
},
{
"group": "Release policy",
-"pages": ["reference/RELEASING", "reference/test"]
+"pages": ["reference/RELEASING", "reference/testing-ci-policy", "reference/test"]
}
]
},
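
The navigation hunks above register the new page in both the English "Release policy" group and its zh-CN counterpart, and the glossary hunk supplies the title translations. Keeping the two locales in step by hand is easy to get wrong, so here is a minimal TypeScript sketch of a parity check. It assumes a simplified `NavGroup` shape holding only `group` and `pages` (the real docs config may carry more fields), and `checkNavParity` is a hypothetical helper, not something OpenClaw ships:

```typescript
// Hypothetical, simplified shapes; the real docs config and glossary may hold more fields.
interface NavGroup {
  group: string;
  pages: string[];
}

interface GlossaryEntry {
  source: string;
  target: string;
}

// Returns a list of problems; an empty list means the localized group mirrors the English one.
function checkNavParity(
  en: NavGroup,
  localized: NavGroup,
  locale: string,
  glossary: GlossaryEntry[],
): string[] {
  const problems: string[] = [];

  // The localized group title should be the glossary translation of the English title.
  const entry = glossary.filter((e) => e.source === en.group)[0];
  if (!entry) {
    problems.push(`no glossary entry for "${en.group}"`);
  } else if (entry.target !== localized.group) {
    problems.push(`group "${localized.group}" != glossary target "${entry.target}"`);
  }

  // Every English page should appear under the locale prefix, in the same order.
  const expected = en.pages.map((page) => `${locale}/${page}`);
  if (expected.join("|") !== localized.pages.join("|")) {
    problems.push(`pages differ: expected ${expected.join(", ")}; got ${localized.pages.join(", ")}`);
  }
  return problems;
}

// Values copied from the hunks above.
const glossary: GlossaryEntry[] = [
  { source: "Release policy", target: "发布策略" },
  { source: "Testing CI Policy", target: "测试 CI 策略" },
];
const en: NavGroup = {
  group: "Release policy",
  pages: ["reference/RELEASING", "reference/testing-ci-policy", "reference/test"],
};
const zh: NavGroup = {
  group: "发布策略",
  pages: ["zh-CN/reference/RELEASING", "zh-CN/reference/testing-ci-policy", "zh-CN/reference/test"],
};

console.log(checkNavParity(en, zh, "zh-CN", glossary).length); // 0
```

Dropping `zh-CN/reference/testing-ci-policy` from `zh.pages` would yield one "pages differ" problem. The sketch compares order on purpose so both sidebars read the same.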

View File

@@ -348,6 +348,8 @@ Think of the suites as “increasing realism” (and increasing flakiness/cost):
- Catch provider format changes, tool-calling quirks, auth issues, and rate limit behavior
- Expectations:
- Not CI-stable by design (real networks, real provider policies, quotas, outages)
+- That usually means "run this in release or scheduled CI instead of PR CI"
+- Do not read this as "only run it by hand"
- Costs money / uses rate limits
- Prefer running narrowed subsets instead of “everything”
- Live runs source `~/.profile` to pick up missing API keys.
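
Because live suites are not CI-stable by design, they stay opt-in, and the CI lane decides whether they run. The tests reference in this diff says `pnpm test:live` only unskips with `LIVE=1` or a provider-specific `*_LIVE_TEST=1`. A minimal TypeScript sketch of that gate follows; `shouldRunLive` is hypothetical, and deriving the provider variable from the upper-cased provider name (for example `MINIMAX_LIVE_TEST`) is an assumption, not a documented name:

```typescript
// Minimal env shape so the sketch does not depend on Node type definitions.
type Env = { [name: string]: string | undefined };

// Hypothetical gate mirroring the documented rule: LIVE=1 unskips every live suite,
// while a provider-specific *_LIVE_TEST=1 unskips just that provider.
// Assumption: the provider flag is the upper-cased provider name plus "_LIVE_TEST".
function shouldRunLive(env: Env, provider: string): boolean {
  if (env["LIVE"] === "1") return true;
  return env[`${provider.toUpperCase()}_LIVE_TEST`] === "1";
}

// PR CI: nothing set, so live suites stay skipped.
console.log(shouldRunLive({}, "minimax")); // false
// A scheduled lane that only exercises one provider.
console.log(shouldRunLive({ MINIMAX_LIVE_TEST: "1" }, "minimax")); // true
console.log(shouldRunLive({ MINIMAX_LIVE_TEST: "1" }, "zai")); // false
// A release lane that opts in to everything.
console.log(shouldRunLive({ LIVE: "1" }, "zai")); // true
```

In a Vitest file, a boolean like this can drive `describe.skipIf(...)`, so the same suite stays skipped on PRs and runs in whichever release or scheduled lane sets the flag.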
@@ -846,6 +848,10 @@ These Docker runners split into two buckets:
explicitly want the larger exhaustive scan.
- `test:docker:all` builds the live Docker image once via `test:docker:live-build`, then reuses it for the two live Docker lanes.
- Container smoke runners: `test:docker:openwebui`, `test:docker:onboard`, `test:docker:gateway-network`, `test:docker:mcp-channels`, and `test:docker:plugins` boot one or more real containers and verify higher-level integration paths.
+- Use [Testing CI Policy](/reference/testing-ci-policy) to decide whether a
+runner belongs in `PR CI`, `release CI`, `scheduled CI`, or `manual only`.
+The important part is this: a suite can be required in CI even when it does
+not block every PR or live in the publish workflow.
The live-model Docker runners also bind-mount only the needed CLI auth homes (or all supported ones when the run is not narrowed), then copy them into the container home before the run so external-CLI OAuth can refresh tokens without mutating the host auth store:
@@ -883,6 +889,9 @@ This lane expects a usable live model key, and `OPENCLAW_PROFILE_FILE`
(`~/.profile` by default) is the primary way to provide it in Dockerized runs.
Successful runs print a small JSON payload like `{ "ok": true, "model":
"openclaw/default", ... }`.
+Keep this in release or scheduled CI if it matters for the product surface you
+are changing. Being slower than a normal PR lane is not a reason to quietly
+drop it to manual-only.
`test:docker:mcp-channels` is intentionally deterministic and does not need a
real Telegram, Discord, or iMessage account. It boots a seeded Gateway
container, starts a second container that spawns `openclaw mcp serve`, then

View File

@@ -51,6 +51,15 @@ OpenClaw has three public release lanes:
- This split is intentional: keep the real npm release path short,
deterministic, and artifact-focused, while slower live checks stay in their
own lane so they do not stall or block publish
+- Use [Testing CI Policy](/reference/testing-ci-policy) as the source of truth
+for which end-to-end and live suites belong in `PR CI`, `release CI`,
+`scheduled CI`, or `manual only`.
+- Read that split literally:
+- the publish workflow is the short path that prepares and promotes artifacts
+- other important end-to-end checks can still be required CI in release or
+scheduled workflows
+- In other words, "not in the publish workflow" does not mean "manual only."
+It often means "run it in a different CI lane."
- Release checks must be dispatched from the `main` workflow ref so the
workflow logic and secrets stay canonical
- That workflow accepts either an existing release tag or the current full

View File

@@ -8,6 +8,7 @@ title: "Tests"
# Tests
- Full testing kit (suites, live, Docker): [Testing](/help/testing)
+- CI placement for end-to-end and live suites: [Testing CI Policy](/reference/testing-ci-policy)
- `pnpm test:force`: Kills any lingering gateway process holding the default control port, then runs the full Vitest suite with an isolated gateway port so server tests don't collide with a running instance. Use this when a prior gateway run left port 18789 occupied.
- `pnpm test:coverage`: Runs the unit suite with V8 coverage (via `vitest.unit.config.ts`). Global thresholds are 70% lines/branches/functions/statements. Coverage excludes integration-heavy entrypoints (CLI wiring, gateway/telegram bridges, webchat static server) to keep the target focused on unit-testable logic.
@@ -28,9 +29,9 @@ title: "Tests"
- `pnpm test:perf:profile:main`: writes a CPU profile for the Vitest main thread (`.artifacts/vitest-main-profile`).
- `pnpm test:perf:profile:runner`: writes CPU + heap profiles for the unit runner (`.artifacts/vitest-runner-profile`).
- Gateway integration: opt-in via `OPENCLAW_TEST_INCLUDE_GATEWAY=1 pnpm test` or `pnpm test:gateway`.
-- `pnpm test:e2e`: Runs gateway end-to-end smoke tests (multi-instance WS/HTTP/node pairing). Defaults to `threads` + `isolate: false` with adaptive workers in `vitest.e2e.config.ts`; tune with `OPENCLAW_E2E_WORKERS=<n>` and set `OPENCLAW_E2E_VERBOSE=1` for verbose logs.
-- `pnpm test:live`: Runs provider live tests (minimax/zai). Requires API keys and `LIVE=1` (or provider-specific `*_LIVE_TEST=1`) to unskip.
-- `pnpm test:docker:openwebui`: Starts Dockerized OpenClaw + Open WebUI, signs in through Open WebUI, checks `/api/models`, then runs a real proxied chat through `/api/chat/completions`. Requires a usable live model key (for example OpenAI in `~/.profile`), pulls an external Open WebUI image, and is not expected to be CI-stable like the normal unit/e2e suites.
+- `pnpm test:e2e`: Runs gateway end-to-end smoke tests (multi-instance WS/HTTP/node pairing). Defaults to `threads` + `isolate: false` with adaptive workers in `vitest.e2e.config.ts`; tune with `OPENCLAW_E2E_WORKERS=<n>` and set `OPENCLAW_E2E_VERBOSE=1` for verbose logs. This is normal CI coverage, not just a local debugging command.
+- `pnpm test:live`: Runs provider live tests (minimax/zai). Requires API keys and `LIVE=1` (or provider-specific `*_LIVE_TEST=1`) to unskip. These tests are often too noisy or expensive for PR CI, but that usually means "release or scheduled CI," not "skip CI."
+- `pnpm test:docker:openwebui`: Starts Dockerized OpenClaw + Open WebUI, signs in through Open WebUI, checks `/api/models`, then runs a real proxied chat through `/api/chat/completions`. Requires a usable live model key (for example OpenAI in `~/.profile`) and pulls an external Open WebUI image. Treat it as a release or scheduled CI compatibility check rather than as a blocking PR lane.
- `pnpm test:docker:mcp-channels`: Starts a seeded Gateway container and a second client container that spawns `openclaw mcp serve`, then verifies routed conversation discovery, transcript reads, attachment metadata, live event queue behavior, outbound send routing, and Claude-style channel + permission notifications over the real stdio bridge. The Claude notification assertion reads the raw stdio MCP frames directly so the smoke reflects what the bridge actually emits.
## Local PR gate

View File

@@ -0,0 +1,82 @@
---
summary: "Source of truth for where end-to-end and live tests belong in CI"
read_when:
- Deciding whether an end-to-end or live suite belongs in CI
- Adding or moving Docker, release, or live-provider coverage
title: "Testing CI Policy"
---

# Testing CI Policy

This page is the source of truth for where OpenClaw end-to-end and live suites
belong.

Use this page to answer one practical question: when we have a real-world test,
where should it run?

Work through the questions in this order:

1. Do we need this test to protect users from a real regression?
2. If yes, where should it run: on PRs, before releases, on a schedule, or
only by hand?
3. If it runs in CI, should it fail the lane or just report problems?

The mistake to avoid is reading "too heavy for PR CI" as "manual only". A test
can be important enough to run in CI without being important enough to block
every PR or to sit inside the publish workflow.

Example:

- A live provider test may be too slow, flaky, or expensive for normal PR CI.
- That does not make it a manual-only test.
- It usually means the test belongs in release CI or scheduled CI instead.

## CI lanes

- `PR CI`: runs on pull requests or push validation when the touched surface
needs it. Use this for fast, high-signal checks that should catch regressions
before merge.
- `Release CI`: runs before a release in a dedicated workflow lane. It may be
blocking or non-blocking, but it is still required CI. Use this for important
install, upgrade, compatibility, and provider checks that are too heavy for
normal PR workflows.
- `Scheduled CI`: runs on a timer or on demand to catch drift in providers,
third-party integrations, or long-running compatibility paths. Use this when
you want ongoing coverage but do not want every PR or release to wait on it.
- `Manual only`: reserve for debugging, hardware-specific work, or
operator-driven VM work. Do not put a suite here just because it is slower
than a unit test.
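
The three questions and the lane definitions above can be read as a small decision procedure. The TypeScript sketch below is one rough encoding, assuming question 1 has already been answered yes. The `SuiteTraits` fields and the `chooseLanes` helper are hypothetical names, not part of OpenClaw, and the matrix that follows, not this function, stays the source of truth. It bakes in the page's two firm rules: slowness or flakiness moves a suite to release or scheduled CI rather than to manual-only, and only operator-driven or hardware-specific work lands in `manual only`.

```typescript
// The four lanes this page defines.
type Lane = "PR CI" | "release CI" | "scheduled CI" | "manual only";

// Hypothetical suite traits; the page itself does not define these fields.
interface SuiteTraits {
  fastAndStable: boolean; // cheap and deterministic enough for every relevant PR
  guardsReleasePath: boolean; // install, upgrade, or packaging behavior
  dependsOnLiveServices: boolean; // real providers, credentials, or third-party services
  needsOperator: boolean; // VM, hardware-specific, or operator-driven work
}

interface Placement {
  lanes: Lane[];
  blocking: boolean; // question 3: fail the lane, or only report?
}

// Questions 2 and 3 for a suite that already passed question 1 (it protects users).
function chooseLanes(t: SuiteTraits): Placement {
  // The only case this page accepts for manual-only.
  if (t.needsOperator) return { lanes: ["manual only"], blocking: false };

  const lanes: Lane[] = [];
  if (t.fastAndStable && !t.dependsOnLiveServices) lanes.push("PR CI");
  if (t.guardsReleasePath) lanes.push("release CI");
  if (t.dependsOnLiveServices) lanes.push("scheduled CI");
  // Slow but deterministic suites still get a CI lane rather than manual-only.
  if (lanes.length === 0) lanes.push("release CI");

  // Live-service checks may report without failing the lane, but they still run in CI.
  return { lanes, blocking: !t.dependsOnLiveServices };
}

const base: SuiteTraits = {
  fastAndStable: false,
  guardsReleasePath: false,
  dependsOnLiveServices: false,
  needsOperator: false,
};

const unitSuite = chooseLanes({ ...base, fastAndStable: true });
const liveSuite = chooseLanes({ ...base, dependsOnLiveServices: true });
const slowSuite = chooseLanes(base);

console.log(unitSuite.lanes.join(", "), unitSuite.blocking); // PR CI true
console.log(liveSuite.lanes.join(", "), liveSuite.blocking); // scheduled CI false
console.log(slowSuite.lanes.join(", "), slowSuite.blocking); // release CI true
```

Falling back to `release CI` for slow, deterministic suites is a judgment call in this sketch; `scheduled CI` would satisfy the policy just as well.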

## End-to-end and live matrix

| Suite | What it proves | Expected CI lane | Blocking guidance |
| ------------------------------------------------------------------ | ----------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ | ------------------------------------------------------------------ |
| `pnpm test` | Core unit, integration, and routed repo test coverage | `PR CI` | Blocking |
| `pnpm test:install:smoke` | Install script smoke plus packed tarball size checks | `PR CI` when relevant; `release CI` before tags | Blocking |
| `pnpm test:e2e` | Real gateway WS/HTTP/node pairing behavior | `PR CI` when gateway or pairing changes | Blocking when relevant |
| `pnpm test:docker:onboard` | Interactive onboarding wizard, config creation, gateway startup, health | `PR CI` when onboarding/setup changes; otherwise `release CI` | Blocking when relevant |
| `pnpm test:docker:gateway-network` | Two-container gateway auth and health path | `PR CI` when gateway/network transport changes; otherwise `release CI` | Blocking when relevant |
| `pnpm test:docker:mcp-channels` | Real `openclaw mcp serve` bridge, routing, transcripts, notifications | `PR CI` when MCP/channel bridge surfaces change; otherwise `release CI` | Blocking when relevant |
| `pnpm test:docker:plugins` | Plugin install, `/plugin` alias behavior, restart semantics | `PR CI` when plugin runtime or install surfaces change; otherwise `release CI` | Blocking when relevant |
| `pnpm test:docker:doctor-switch` | Repair and daemon switching between git and npm installs | `release CI` | Blocking for release work that touches install or doctor flows |
| `pnpm test:docker:qr` | QR runtime compatibility under supported Docker Node versions | `release CI` | Usually non-blocking, but still required CI |
| `pnpm test:install:e2e` | Full installer path with real onboarding-style flow in Docker | `release CI` | Required CI; may live outside the publish workflow |
| `OpenClaw Cross-OS Release Checks` workflow | Fresh install, packaged upgrade, installer fresh, dev update across macOS, Windows, Linux | `release CI` | Required CI; keep separate from the publish workflow |
| Native Discord roundtrip in cross-OS release checks | Real Discord send/readback after install or update | `release CI` | Usually non-blocking, but still required CI when enabled |
| `pnpm test:docker:openwebui` | OpenClaw behind Open WebUI with a real proxied chat | `release CI` and `scheduled CI` | Non-blocking is fine; do not drop it from CI |
| `pnpm test:live` | Real provider/model behavior with live credentials | `scheduled CI` and `release CI` when provider risk matters | Non-blocking is fine; do not make "not CI-stable" mean manual-only |
| `pnpm test:docker:live-models` and `pnpm test:docker:live-gateway` | Live provider coverage inside repo Docker images | `scheduled CI` and `release CI` when provider/gateway risk matters | Non-blocking is fine |
| `pnpm test:docker:live-cli-backend` | Real CLI backend compatibility inside Docker | `scheduled CI` | Non-blocking is fine |
| `pnpm test:docker:live-acp-bind` | ACP bind compatibility against real agent backends | `scheduled CI` | Non-blocking is fine |
| `pnpm test:docker:live-codex-harness` | Codex app-server harness compatibility | `scheduled CI` | Non-blocking is fine |
| `test:parallels:*` | VM-specific host/guest install and upgrade smoke | `manual only` unless a dedicated VM CI lane exists | Manual/operator lane |
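
A quick way to keep the matrix honest is to hold a few rows as typed data and check the rule this page keeps repeating: a suite may be non-blocking, but it may not drop out of CI. A TypeScript sketch over a hand-copied subset of the rows above; `MatrixRow`, its `operatorLane` flag, and `findPolicyViolations` are hypothetical names used only for illustration:

```typescript
type Lane = "PR CI" | "release CI" | "scheduled CI" | "manual only";

// Hypothetical typed form of one matrix row.
interface MatrixRow {
  suite: string;
  lanes: Lane[];
  blocking: boolean;
  operatorLane?: boolean; // true only for VM, hardware-specific, or operator-driven suites
}

// Flags rows that break the page's rules. Blocking status is deliberately not checked:
// a non-blocking suite is fine as long as it still runs in some CI lane.
function findPolicyViolations(rows: MatrixRow[]): string[] {
  const problems: string[] = [];
  for (const row of rows) {
    if (row.lanes.length === 0) {
      problems.push(`${row.suite}: no lane assigned`);
    } else if (row.lanes.indexOf("manual only") !== -1 && !row.operatorLane) {
      problems.push(`${row.suite}: manual only without a VM or operator reason`);
    }
  }
  return problems;
}

// A hand-copied subset of the matrix above.
const rows: MatrixRow[] = [
  { suite: "pnpm test", lanes: ["PR CI"], blocking: true },
  { suite: "pnpm test:docker:openwebui", lanes: ["release CI", "scheduled CI"], blocking: false },
  { suite: "pnpm test:live", lanes: ["scheduled CI", "release CI"], blocking: false },
  { suite: "test:parallels:*", lanes: ["manual only"], blocking: false, operatorLane: true },
];

console.log(findPolicyViolations(rows).length); // 0

// Quietly downgrading a live suite is exactly what the check should catch.
const downgraded: MatrixRow[] = [
  { suite: "pnpm test:live", lanes: ["manual only"], blocking: false },
];
console.log(findPolicyViolations(downgraded).join("\n"));
// pnpm test:live: manual only without a VM or operator reason
```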

## Change policy

When you add or move an end-to-end or live suite:

1. Update this matrix in the same PR.
2. Update the owning workflow or add the missing lane.
3. Update any release or maintainer docs that point to the suite.

If current workflows lag behind this matrix, treat that as follow-up work to
close rather than as permission to quietly downgrade the suite to manual-only.
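
The change policy treats a workflow that lags behind the matrix as follow-up work. A TypeScript sketch of that comparison: given the lanes the matrix assigns and the lanes where workflows actually run each suite today, it lists the missing coverage instead of proposing a downgrade. `findCoverageGaps` is hypothetical, collecting the observed lanes from real workflow files is left out, and the `observed` data below is invented for illustration, not a description of OpenClaw's current workflows:

```typescript
type Lane = "PR CI" | "release CI" | "scheduled CI" | "manual only";

// Suite name -> lanes. Used both for the matrix and for what workflows run today.
type LaneMap = { [suite: string]: Lane[] };

interface CoverageGap {
  suite: string;
  missingLane: Lane;
}

// Lists every lane the matrix requires that no workflow provides yet.
// Each gap is follow-up work; nothing here suggests moving a suite to manual-only.
function findCoverageGaps(policy: LaneMap, observed: LaneMap): CoverageGap[] {
  const gaps: CoverageGap[] = [];
  for (const suite of Object.keys(policy)) {
    const actual: Lane[] = observed[suite] || [];
    for (const lane of policy[suite]) {
      if (lane === "manual only") continue; // no workflow to look for
      if (actual.indexOf(lane) === -1) gaps.push({ suite, missingLane: lane });
    }
  }
  return gaps;
}

// Lanes taken from the matrix above.
const policy: LaneMap = {
  "pnpm test:docker:openwebui": ["release CI", "scheduled CI"],
  "pnpm test:live": ["scheduled CI", "release CI"],
  "test:parallels:*": ["manual only"],
};

// Invented example of what workflows might cover today.
const observed: LaneMap = {
  "pnpm test:docker:openwebui": ["release CI"],
};

for (const gap of findCoverageGaps(policy, observed)) {
  console.log(`follow-up: run ${gap.suite} in ${gap.missingLane}`);
}
// follow-up: run pnpm test:docker:openwebui in scheduled CI
// follow-up: run pnpm test:live in scheduled CI
// follow-up: run pnpm test:live in release CI
```

Each reported gap maps onto step 2 of the change policy: update the owning workflow or add the missing lane.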