Compare commits

123 Commits

Author SHA1 Message Date
pash
b23ff97ddc Merge remote-tracking branch 'origin/main' into codex/session-transcript-oom-guard
# Conflicts:
#	src/docker-build-cache.test.ts
#	src/scripts/test-projects.test.ts
2026-04-26 17:02:51 -07:00
pash
e705246619 Fix core support boundary test expectations 2026-04-26 17:00:59 -07:00
pash
f936f16cc5 Merge remote-tracking branch 'origin/main' into codex/session-transcript-oom-guard 2026-04-26 16:59:22 -07:00
Peter Steinberger
d2786fb969 test(docker): run observability harness with global tsx 2026-04-27 00:57:55 +01:00
Peter Steinberger
fa0729e145 test: auto-discover vitest suites 2026-04-27 00:55:06 +01:00
pash
fd48faa4ed Fix qa-lab merge CI and compaction review notes 2026-04-26 16:53:15 -07:00
Peter Steinberger
21c51bc140 test(docker): resolve otel decoder from plugin runtime 2026-04-27 00:51:47 +01:00
Vincent Koc
265bc6b6ea test(plugins): guard command cold registry paths
Add command-level sentinel coverage proving channel setup metadata, onboarding auth choices, and models-list provider ownership stay on manifest/registry paths without importing plugin runtime.

Local verification:
- pnpm exec oxfmt --check --threads=1 src/commands/plugin-control-plane-cold-imports.test.ts
- OPENCLAW_LOCAL_CHECK_MODE=throttled pnpm test:serial src/commands/plugin-control-plane-cold-imports.test.ts
- OPENCLAW_LOCAL_CHECK_MODE=throttled pnpm check:changed
- clean rebase sanity: git diff --check origin/main...HEAD

PR CI had known unrelated main-red failures matching latest main run 24970053892; the new sentinel test passed in CI.
2026-04-26 16:51:36 -07:00
Peter Steinberger
42db865673 test(docker): run observability on shared image 2026-04-27 00:49:36 +01:00
Vincent Koc
5d7c6e6bda test(docker): add observability smoke
Add Docker aggregate observability coverage for QA-lab OTEL and Prometheus diagnostics.
2026-04-26 16:43:56 -07:00
pash
29f1cae867 Fix qa-lab private qa-channel import 2026-04-26 16:43:55 -07:00
Tak Hoffman
560ddd2f9b Fail package update on unhealthy restart (#72422) 2026-04-26 18:38:23 -05:00
pash
f58dd36a1d Harden session truncation concurrency guards 2026-04-26 16:34:05 -07:00
Peter Steinberger
998e37fcb3 ci: allow installer smoke baseline override 2026-04-27 00:31:30 +01:00
pash
33e3dccbea Fix update CLI service state test mocks 2026-04-26 16:29:55 -07:00
Vincent Koc
3cc52d9050 docs(changelog): note codex usage accounting fix 2026-04-26 16:27:23 -07:00
Vincent Koc
7902c769da fix(codex): normalize cached harness input tokens 2026-04-26 16:27:23 -07:00
Peter Steinberger
9be8d43c31 docs: document installer recovery cleanup 2026-04-27 00:26:02 +01:00
Peter Steinberger
eccb79db99 build: remove private QA package compat shims 2026-04-27 00:26:02 +01:00
pash
6fc954539f Harden session truncation rewrite 2026-04-26 16:20:29 -07:00
pash
fc13a0135e Fix stale e2e Docker cache test 2026-04-26 16:14:58 -07:00
pash
0ced62f512 Fix transcript truncation OOM guard 2026-04-26 16:10:10 -07:00
Peter Steinberger
09a635a28b test: fix main release validation forward-port 2026-04-27 00:07:31 +01:00
Peter Steinberger
5b257cb352 test(qa): drop brittle telegram workflow assertions
(cherry picked from commit b02fdb8264)
2026-04-27 00:07:31 +01:00
Peter Steinberger
efe940e9cb ci(qa): remove telegram beta approval gate
(cherry picked from commit 5e04b0f97a)
2026-04-27 00:07:31 +01:00
Peter Steinberger
8d909ed0da ci(docker): pass beta env to installer e2e
(cherry picked from commit 7677b4ca24)
2026-04-27 00:07:31 +01:00
Peter Steinberger
1bb46ce68a ci(docker): test release installer against beta
(cherry picked from commit d8c4dcb6a4)
2026-04-27 00:07:31 +01:00
Peter Steinberger
54e77a9ec4 ci(docker): use resolved pnpm for scheduled lanes
(cherry picked from commit 61a539a1b7)
2026-04-27 00:07:31 +01:00
Peter Steinberger
43e651db9a ci(docker): preserve pnpm path in scheduler lanes
(cherry picked from commit 2e8a089836)
2026-04-27 00:07:31 +01:00
Peter Steinberger
e7d069edcf test(qa): relax telegram mention reply assertion
(cherry picked from commit 7109251318)
2026-04-27 00:07:31 +01:00
Peter Steinberger
17094640f8 ci(release): trust release branch docker checks
(cherry picked from commit abf0ef9cd3)
2026-04-27 00:07:31 +01:00
Peter Steinberger
16c6a92c53 ci(release): allow npm telegram e2e from release branch
(cherry picked from commit 53f8e9de13)
2026-04-27 00:07:31 +01:00
Peter Steinberger
ef3309a986 fix(release): harden beta validation lanes
(cherry picked from commit 218bceaa14)
2026-04-27 00:07:31 +01:00
Peter Steinberger
95ae3c00bd docs: explain test routing model 2026-04-27 00:05:27 +01:00
Vincent Koc
97e64196a0 fix(hooks): use local timezone for session-memory filenames (#72408) 2026-04-26 16:04:10 -07:00
Peter Steinberger
41ad03dda4 fix(test): allow legacy qa inventory entry 2026-04-27 00:02:33 +01:00
Peter Steinberger
4a578740a2 refactor: deduplicate changed lane detection 2026-04-27 00:02:00 +01:00
Peter Steinberger
20d6daaeaa docs: document automatic bonjour container policy 2026-04-27 00:00:22 +01:00
Peter Steinberger
6018f29dbf ci: keep docker bonjour setting automatic 2026-04-27 00:00:22 +01:00
Peter Steinberger
989cfd1e33 fix(bonjour): auto-disable advertising in containers 2026-04-27 00:00:22 +01:00
Peter Steinberger
89ab39ca64 test: simplify changed test routing 2026-04-26 23:58:13 +01:00
Peter Steinberger
199d5f765f docs(test): explain cheap docker reruns 2026-04-26 23:56:14 +01:00
Peter Steinberger
2fe11020d2 refactor(test): split bundled channel docker scenarios 2026-04-26 23:56:14 +01:00
Peter Steinberger
1ddf6b4e39 ci: skip existing docker e2e images 2026-04-26 23:56:14 +01:00
Peter Steinberger
1a02d00eb4 test: add docker e2e rerun helpers 2026-04-26 23:56:14 +01:00
Peter Steinberger
cfe58387a7 docs: update changelog attribution guidance 2026-04-26 23:51:51 +01:00
Peter Steinberger
6077941d0b fix: restart package updates through updated install 2026-04-26 23:51:51 +01:00
Peter Steinberger
b5714b90ed refactor(test): share docker e2e shell helpers 2026-04-26 23:48:32 +01:00
Peter Steinberger
7a86448a6e ci: reuse docker e2e plan action 2026-04-26 23:48:32 +01:00
Peter Steinberger
6cba12caae test: add docker e2e planner guards 2026-04-26 23:48:32 +01:00
Rubén Cuevas
a08b65a90a fix(telegram): send fresh finals for stale previews (#72038)
* fix(telegram): send fresh finals for stale previews

* test(telegram): cover stale preview send fallback

* fix(telegram): keep stale archived preview fallback

* fix(telegram): clear stale active previews

* fix(telegram): reset preview state after fresh finals
2026-04-26 15:44:30 -07:00
Peter Steinberger
084dde89fd docs: clarify extension ownership boundaries 2026-04-26 23:39:18 +01:00
Peter Steinberger
2efc4a8233 docs(test): document docker e2e layout 2026-04-26 23:36:31 +01:00
Peter Steinberger
cd417f3b68 ci: derive docker e2e artifacts from plan 2026-04-26 23:36:31 +01:00
Peter Steinberger
a2adb05f74 refactor(test): split docker e2e planner 2026-04-26 23:36:31 +01:00
Peter Steinberger
c9c0ab3a44 fix(bonjour): keep ciao failure handling extension-owned 2026-04-26 23:29:40 +01:00
Peter Steinberger
0472b6197a chore: clarify bonjour fatal guard naming 2026-04-26 23:27:35 +01:00
Peter Steinberger
8a60e57846 fix: keep bonjour failures non-fatal 2026-04-26 23:27:08 +01:00
Vincent Koc
c6cf37068c fix(feishu): repair interactive card content extraction (#72397) 2026-04-26 15:26:53 -07:00
Peter Steinberger
ff6044f441 docs(changelog): note Ollama thinking validation fix 2026-04-26 23:25:05 +01:00
Peter Steinberger
5aa3779d8c ci: disable bonjour in install e2e docker 2026-04-26 23:20:08 +01:00
Peter Steinberger
ff9fefb79b fix(agents): validate thinking with model catalog 2026-04-26 23:16:05 +01:00
Peter Steinberger
3746e5b969 ci: cap Telegram E2E build cache 2026-04-26 23:11:21 +01:00
Peter Steinberger
9f5bc5465c style: format codex and loader tests 2026-04-26 23:10:33 +01:00
Peter Steinberger
d108110a89 ci: use packaged tarball for docker e2e 2026-04-26 23:10:33 +01:00
Peter Steinberger
1b1eea238c ci: preserve docker test runner path 2026-04-26 23:04:21 +01:00
Vincent Koc
d9e9e61e77 fix(logging): skip unserializable file log message parts 2026-04-26 15:01:19 -07:00
Vincent Koc
fc0e6e4650 docs(logging): document structured file fields 2026-04-26 15:01:19 -07:00
Vincent Koc
e8df081a1f feat(logging): add file log correlation fields 2026-04-26 15:01:19 -07:00
github-actions[bot]
5c4c33c7de chore(ui): refresh th control ui locale 2026-04-26 22:01:03 +00:00
Vincent Koc
070b55f336 UI: localize command palette labels (#72378) 2026-04-26 14:58:16 -07:00
Vincent Koc
364d49889e fix: allow trusted exec approvals home symlinks (#72377) 2026-04-26 14:57:01 -07:00
Peter Steinberger
baaad52389 ci: split docker e2e images 2026-04-26 22:55:00 +01:00
Peter Steinberger
3a8961af0f test: copy docker build helper in setup e2e 2026-04-26 22:54:27 +01:00
Peter Steinberger
ff570f3a61 fix(ollama): expose native thinking efforts 2026-04-26 22:49:13 +01:00
Peter Steinberger
2cd23957c0 build: use slim docker runtime 2026-04-26 22:47:48 +01:00
Vincent Koc
43a003b8a0 fix: short-circuit live model switch fallback redirects (#72375) 2026-04-26 14:45:02 -07:00
Vincent Koc
fa85e6c26e docs(changelog): note acp stdout fix 2026-04-26 14:42:37 -07:00
Vincent Koc
d46de6cff7 fix(acp): keep server logs off stdout 2026-04-26 14:42:22 -07:00
Peter Steinberger
018f2e78ba build: skip docker apt upgrades 2026-04-26 22:40:44 +01:00
Peter Steinberger
b61954919c ci: verify docker release attestations 2026-04-26 22:40:44 +01:00
Peter Steinberger
5abb717112 docs: add OpenClaw testing skill 2026-04-26 22:40:32 +01:00
Vincent Koc
8226238765 refactor(plugins): share lookup cache eviction 2026-04-26 14:28:15 -07:00
Peter Steinberger
b68b4b9151 ci: add targeted docker lane reruns 2026-04-26 22:27:45 +01:00
Josh Lehman
a3c51f91c5 fix: isolate cron context-engine session keys (#72292) 2026-04-26 14:21:01 -07:00
Vincent Koc
2edbdc42ae refactor(plugins): isolate loader cache state 2026-04-26 14:16:35 -07:00
Peter Steinberger
b28de9a7d9 ci: centralize docker build wrapper 2026-04-26 22:14:36 +01:00
Peter Steinberger
824c3e2b71 ci: enable docker image attestations 2026-04-26 22:14:36 +01:00
Vincent Koc
2194a8c64c docs(logging): document request trace scopes 2026-04-26 14:13:15 -07:00
Vincent Koc
410783c126 fix(diagnostics): chain run traces to request scope 2026-04-26 14:13:15 -07:00
Vincent Koc
3ae6f01d61 feat(logging): propagate request trace scopes 2026-04-26 14:13:14 -07:00
Peter Steinberger
e3cbad4fb6 ci: fix ACPX Docker update repair target 2026-04-26 22:13:00 +01:00
Peter Steinberger
c082cf892a docs: codify formatter tooling 2026-04-26 22:02:31 +01:00
Peter Steinberger
b4a9ac3516 ci: run release Docker chunks through scheduler 2026-04-26 22:02:31 +01:00
Vincent Koc
f0566e410a docs(diagnostics): document model call size timing 2026-04-26 13:43:22 -07:00
Vincent Koc
c6e9849351 feat(diagnostics): capture model call size timing 2026-04-26 13:43:22 -07:00
Vincent Koc
8e1755928c refactor(plugins): split plugin registry facade 2026-04-26 13:43:22 -07:00
Vincent Koc
9eb071c3f1 perf(plugins): reuse persisted registry fallback read 2026-04-26 13:43:22 -07:00
Vincent Koc
522eedc754 refactor(plugins): make provider discovery runtime explicit 2026-04-26 13:43:21 -07:00
Vincent Koc
71e361af8a refactor(plugins): split installed plugin index modules 2026-04-26 13:43:21 -07:00
Peter Steinberger
487f8c5d3a test(gateway): skip codex acp bind when auth is unavailable 2026-04-26 21:42:49 +01:00
Peter Steinberger
7a4574376a fix(ollama): honor native model capabilities 2026-04-26 21:40:22 +01:00
Josh Lehman
8ba82534e6 fix: preserve cron telegram topic delivery after timeout (#72317) 2026-04-26 13:30:54 -07:00
Peter Steinberger
ffa84cdc02 ci: chunk release Docker e2e jobs 2026-04-26 21:23:08 +01:00
pash-openai
67ffa3df8b Add Codex Computer Use setup for Codex mode (#71842)
* Add Codex Computer Use setup

* Tighten Codex Computer Use setup checks

* Handle fresh Codex Computer Use marketplace setup

* Fix channel setup manifest fixture

* Match Codex Computer Use marketplace loading

* Harden plugin manifest test fixtures

* Isolate auth choice legacy manifest test

* Update aggregate shard test expectation

* Improve Codex Computer Use first-run setup

* Harden Codex Computer Use auto-install

* Fix plugin auto-enable test fixture roots
2026-04-26 13:21:56 -07:00
Vincent Koc
df542f75a9 fix(logging): expose trace fields in file logs 2026-04-26 12:52:04 -07:00
Peter Steinberger
edf40ab6c9 test(gateway): retry gemini acp startup warmup timeout 2026-04-26 20:50:06 +01:00
Vincent Koc
406ae72fd2 fix(logging): redact persisted transcript text 2026-04-26 12:12:44 -07:00
Peter Steinberger
f99fb2af86 test(gateway): wait longer for codex harness subagent start 2026-04-26 20:11:16 +01:00
Peter Steinberger
244628f467 docs: clarify PR triage comments 2026-04-26 19:48:22 +01:00
Sally O'Malley
637bd33e69 fix(diagnostics): defer OTEL run span finalization (#72260) 2026-04-26 11:29:05 -07:00
Vincent Koc
e53c068d78 fix: repair skills and memory watcher refresh paths 2026-04-26 11:21:21 -07:00
Peter Steinberger
4e181d30fa test(gateway): classify stream fallback as empty live response 2026-04-26 19:15:00 +01:00
Peter Steinberger
e60cc50dff test(gateway): harden acp bind docker smoke 2026-04-26 19:14:58 +01:00
Peter Steinberger
f2dab9b334 fix(agents): keep responses web search reasoning compatible 2026-04-26 19:14:55 +01:00
Peter Steinberger
fc6cfbd418 fix(agents): honor bundle mcp tool allowlist 2026-04-26 19:14:51 +01:00
Vincent Koc
480a3f66c9 fix: shortcut live session model redirects during fallback 2026-04-26 11:14:05 -07:00
Vincent Koc
19e41a1e69 docs(logging): clarify redaction surfaces 2026-04-26 11:09:56 -07:00
Vincent Koc
b4cdd55f62 fix(discord): escalate repeated health-monitor restarts 2026-04-26 11:09:03 -07:00
Vincent Koc
6b6dcafcee fix(webchat): support non-image file attachments 2026-04-26 10:58:24 -07:00
Vincent Koc
303cde8f60 fix(auto-reply): poison inbound dedupe after partial turn failure
* fix(auto-reply): poison inbound dedupe after replay-unsafe failures

* fix(clownfish): address review for ghcrawl-165980-agentic-merge (1)
2026-04-26 10:58:19 -07:00
Vincent Koc
e672b61417 fix(whatsapp): stop reconnecting quiet sockets
Fixes #70678.

Keeps quiet but healthy WhatsApp linked-device sessions connected by tracking WhatsApp Web transport activity, while retaining a longer app-silence cap so frame activity cannot mask a stuck session forever. Also cleans up transport activity listeners on failed connection-open paths.

Carries forward the focused #71466 approach and keeps #63939 as related configurable-timeout follow-up. Thanks @vincentkoc and @oromeis.

Validation:
- pnpm test:serial extensions/whatsapp/src/auto-reply.web-auto-reply.connection-and-logging.e2e.test.ts extensions/whatsapp/src/connection-controller.test.ts
- pnpm check:changed
- codex review --base origin/main
2026-04-26 09:51:41 -07:00
Peter Steinberger
4a3030df9e fix: avoid PowerShell error variable collision 2026-04-26 16:26:31 +01:00
356 changed files with 15840 additions and 6053 deletions

@@ -325,9 +325,11 @@ node --import tsx scripts/openclaw-npm-postpublish-verify.ts <published-version>
- Docker install/update coverage that exercises the published beta package
- published npm Telegram proof: dispatch Actions > `NPM Telegram Beta E2E`
from `main` with `package_spec=openclaw@<beta-version>` and
`provider_mode=mock-openai`, approve `npm-release`, and require success.
This is the default button path for installed-package onboarding,
Telegram setup, and real Telegram E2E against the published npm package.
`provider_mode=mock-openai`, and require success. This workflow is
maintainer-dispatched and intentionally has no `npm-release` approval gate;
`qa-live-shared` only supplies the shared QA secrets. This is the default
button path for installed-package onboarding, Telegram setup, and real
Telegram E2E against the published npm package.
Use the local `pnpm test:docker:npm-telegram-live` lane with the matching
`OPENCLAW_NPM_TELEGRAM_PACKAGE_SPEC` and Convex CI env only as a fallback
or debugging path.

@@ -0,0 +1,244 @@
---
name: openclaw-testing
description: Choose, run, rerun, or debug OpenClaw tests, CI checks, Docker E2E lanes, release validation, and the cheapest safe verification path.
---
# OpenClaw Testing
Use this skill when deciding what to test, debugging failures, rerunning CI,
or validating a change without wasting hours.
## Read First
- `docs/reference/test.md` for local test commands.
- `docs/ci.md` for CI scope, release checks, Docker chunks, and runner behavior.
- Scoped `AGENTS.md` files before editing code under a subtree.
## Default Rule
Prove the touched surface first. Do not reflexively run the whole suite.
1. Inspect the diff and classify the touched surface:
- source: `pnpm changed:lanes --json`, then `pnpm check:changed`
- tests only: `pnpm test:changed`
- one failing file: `pnpm test <path-or-filter> -- --reporter=verbose`
- workflow-only: `git diff --check`, workflow syntax/lint (`actionlint` when available)
- docs-only: `pnpm docs:list`, docs formatter/lint only if docs tooling changed or requested
2. Reproduce narrowly before fixing.
3. Fix root cause.
4. Rerun the same narrow proof.
5. Broaden only when the touched contract demands it.
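The classification in step 1 can be sketched as a small shell helper. This is an illustration only: `suggest_first_proof` is a hypothetical name, and the glob patterns are assumptions, not the repo's real routing rules (those live behind `pnpm changed:lanes --json`).

```bash
# Hypothetical helper: read changed paths on stdin and print the narrowest
# first proof from the list above. Patterns are illustrative assumptions.
suggest_first_proof() {
  local path has_source=0 has_tests=0 has_workflow=0
  while IFS= read -r path; do
    case "$path" in
      "") ;;                                  # ignore blank lines
      *.test.ts) has_tests=1 ;;               # test-only edit
      .github/workflows/*) has_workflow=1 ;;  # workflow-only edit
      docs/*|*.md) ;;                         # docs: handled by the fallback
      *) has_source=1 ;;                      # anything else counts as source
    esac
  done
  if (( has_source )); then
    echo "pnpm changed:lanes --json && pnpm check:changed"
  elif (( has_tests )); then
    echo "pnpm test:changed"
  elif (( has_workflow )); then
    echo "git diff --check"
  else
    echo "pnpm docs:list"
  fi
}
```

One possible use is `git diff --name-only origin/main...HEAD | suggest_first_proof`. The most expensive surface wins, so a diff that mixes source and tests still starts with the changed-lanes check.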
## Guardrails
- Do not kill unrelated processes or tests. If something is running elsewhere, treat it as owned by the user or another agent.
- Do not run expensive local Docker, full release checks, full `pnpm test`, or full `pnpm check` unless the user asks or the change genuinely requires it.
- Prefer GitHub Actions for release/Docker proof when the workflow already has the prepared image and secrets.
- Use `scripts/committer "<msg>" <paths...>` when committing; stage only your files.
- If deps are missing, run `pnpm install`, retry once, then report the first actionable error.
## Local Test Shortcuts
```bash
pnpm changed:lanes --json
pnpm check:changed # changed typecheck/lint/guards; no Vitest
pnpm test:changed # cheap smart changed Vitest targets
OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed
pnpm test <path-or-filter> -- --reporter=verbose
OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test <path-or-filter>
```
Use targeted file paths whenever possible. Avoid raw `vitest`; use the repo
`pnpm test` wrapper so project routing, workers, and setup stay correct.
## Command Semantics
- `pnpm check` and `pnpm check:changed` do not run Vitest tests. They are for
typecheck, lint, and guard proof.
- `pnpm test` and `pnpm test:changed` run Vitest tests.
- `pnpm test:changed` is intentionally cheap by default: direct test edits,
sibling tests, explicit source mappings, and import-graph dependents.
- `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed` is the explicit broad
fallback for harness/config/package edits that genuinely need it.
- Do not run extension sweeps just because core changed. If a core edit is for a
specific plugin bug, run that plugin's tests explicitly. If a public SDK or
contract change needs consumer proof, choose the smallest representative
plugin/contract tests first, then broaden only when the risk justifies it.
- The test wrapper prints a short `[test] passed|failed|skipped ... in ...`
line. Vitest's own duration is still the per-shard detail.
## Routing Model
- `pnpm changed:lanes --json` answers "which check lanes does this diff touch?"
It is used by `pnpm check:changed` for typecheck/lint/guard selection.
- `pnpm test:changed` answers "which Vitest targets are worth running now?" It
uses the same changed path list, but applies a cheaper test-target resolver.
- Direct test edits run themselves. Source edits prefer explicit mappings,
sibling `*.test.ts`, then import-graph dependents. Shared harness/config/root
edits are skipped by default unless they have precise mapped tests.
- Public SDK or contract edits do not automatically run every plugin test.
`check:changed` proves extension type contracts; the agent chooses the
smallest plugin/contract Vitest proof that matches the actual risk.
- Use `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed` only when a harness,
config, package, or unknown-root edit really needs the broad Vitest fallback.
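The precedence described above (direct test edit, explicit mapping, sibling `*.test.ts`, then import-graph dependents) might look roughly like the sketch below. It is not the repo's resolver: `explicit_test_for`, `resolve_test_target`, and the mapped paths are hypothetical, and the import-graph step is reduced to a placeholder string.

```bash
# Hypothetical explicit mapping table; the paths are made up for illustration.
explicit_test_for() {
  case "$1" in
    src/example/source.ts) echo "test/example/source-contract.test.ts" ;;
    *) return 1 ;;
  esac
}

# Print the cheapest test target for one changed path.
resolve_test_target() {
  local src="$1" sibling
  if [[ "$src" == *.test.ts ]]; then
    echo "$src"                      # direct test edits run themselves
  elif explicit_test_for "$src"; then
    :                                # explicit mapping already printed the target
  else
    sibling="${src%.ts}.test.ts"
    if [[ -f "$sibling" ]]; then
      echo "$sibling"                # sibling test next to the source file
    else
      echo "import-graph:$src"       # placeholder: dependents lookup not modeled
    fi
  fi
}
```

The sketch leaves out the rule that shared harness, config, and root edits are skipped by default; a real resolver would check for those before falling through to the import graph.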
## CI Debugging
Start with current run state, not logs for everything:
```bash
gh run list --branch main --limit 10
gh run view <run-id> --json status,conclusion,headSha,url,jobs
gh run view <run-id> --job <job-id> --log
```
- Check exact SHA. Ignore newer unrelated `main` unless asked.
- For cancelled same-branch runs, confirm whether a newer run superseded it.
- Fetch full logs only for failed or relevant jobs.
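To fetch logs only for failed jobs, the job list can be filtered first. A minimal sketch, assuming the GitHub CLI's `--jq` flag and the `conclusion`, `databaseId`, and `name` fields of the `jobs` JSON; `failed_jobs` is a hypothetical helper name, not a repo script.

```bash
# Hypothetical helper: print "<job-id><TAB><job-name>" for each failed job in a run.
failed_jobs() {
  local run_id="$1"
  gh run view "$run_id" --json jobs \
    --jq '.jobs[] | select(.conclusion == "failure") | "\(.databaseId)\t\(.name)"'
}
```

Each printed job id can then be passed to `gh run view <run-id> --job <job-id> --log`, which keeps log downloads limited to the jobs that failed.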
## Docker
Docker is expensive. First inspect the scheduler without running Docker:
```bash
OPENCLAW_DOCKER_ALL_DRY_RUN=1 pnpm test:docker:all
OPENCLAW_DOCKER_ALL_DRY_RUN=1 OPENCLAW_DOCKER_ALL_LANES=install-e2e pnpm test:docker:all
OPENCLAW_DOCKER_ALL_LANES=install-e2e node scripts/test-docker-all.mjs --plan-json
```
Run one failed lane locally only when explicitly asked or when GitHub is not
usable:
```bash
OPENCLAW_DOCKER_ALL_LANES=<lane> \
OPENCLAW_DOCKER_ALL_BUILD=0 \
OPENCLAW_DOCKER_ALL_PREFLIGHT=0 \
OPENCLAW_SKIP_DOCKER_BUILD=1 \
OPENCLAW_DOCKER_E2E_BARE_IMAGE='<prepared-bare-image>' \
OPENCLAW_DOCKER_E2E_FUNCTIONAL_IMAGE='<prepared-functional-image>' \
pnpm test:docker:all
```
For release validation, prefer the reusable GitHub workflow input:
```yaml
docker_lanes: install-e2e
```
Multiple lanes are allowed:
```yaml
docker_lanes: install-e2e bundled-channel-update-acpx
```
That skips the three-chunk matrix and runs one targeted Docker job against the
prepared GHCR images and a fresh OpenClaw npm tarball for the selected ref.
Reruns usually need that new tarball because the fix being tested changed the
package contents even if the SHA-tagged GHCR Docker image can be reused.
Live-only targeted reruns skip the E2E images and build only the live-test
image. Release-path normal mode remains max three Docker chunk jobs:
- `core`
- `package-update`
- `plugins-integrations`
Docker E2E images never copy repo sources as the app under test: the bare image
is a Node/Git runner, and the functional image installs the same prebuilt npm
tarball that bare lanes mount. `scripts/package-openclaw-for-docker.mjs` is the
single packer for local scripts and CI and validates the tarball inventory
before Docker consumes it. `scripts/test-docker-all.mjs --plan-json` is the
scheduler-owned CI plan for image kind, package, live image, lane, and
credential needs. Docker lane definitions live in the single scenario catalog
`scripts/lib/docker-e2e-scenarios.mjs`; planner logic lives in
`scripts/lib/docker-e2e-plan.mjs`. `scripts/docker-e2e.mjs` converts plan and
summary JSON into GitHub outputs and step summaries. Every scheduler run writes
`.artifacts/docker-tests/**/summary.json` plus `failures.json`. Read those
before rerunning. Lane entries include `command`, `rerunCommand`, status,
timing, timeout state, image kind, and log file path. The summary also includes
top-level phase timings for preflight, image build, package prep, lane pools,
and cleanup. Use `pnpm test:docker:timings <summary.json>` to rank slow lanes
and phases before deciding whether a broader rerun is justified.
## Cheap Docker Reruns
First derive the smallest rerun command from artifacts:
```bash
pnpm test:docker:rerun <github-run-id>
pnpm test:docker:rerun .artifacts/docker-tests/<run>/failures.json
```
The script downloads Docker E2E artifacts for a GitHub run, reads
`summary.json`/`failures.json`, and prints a combined targeted workflow command
plus per-lane commands. Prefer the combined targeted command when several lanes
failed for the same patch:
```bash
gh workflow run openclaw-live-and-e2e-checks-reusable.yml \
-f ref=<sha> \
-f include_repo_e2e=false \
-f include_release_path_suites=false \
-f include_openwebui=false \
-f docker_lanes='install-e2e bundled-channel-update-acpx' \
-f include_live_suites=false \
-f live_models_only=false
```
That path still runs the prepare job, so it creates a new tarball for `<sha>`.
If the SHA-tagged GHCR bare/functional image already exists, CI skips rebuilding
that image and only uploads the fresh package artifact before the targeted lane
job. Do not rerun the full three-chunk release path unless the failed lane list
or touched surface really requires it.
## Docker Expected Timings
Treat these as ballpark. Blacksmith queue time, GHCR pull speed, provider
latency, npm cache state, and Docker daemon health can dominate.
Current local timing artifact (`.artifacts/docker-tests/lane-timings.json`) has
these rough bands:
- Tiny lanes, seconds to under 1 minute:
`agents-delete-shared-workspace` ~3s, `plugin-update` ~7s,
`config-reload` ~14s, `pi-bundle-mcp-tools` ~15s, `onboard` ~18s,
`session-runtime-context` ~20s, `gateway-network` ~34s, `qr` ~44s.
- Medium deterministic lanes, ~1-5 minutes:
`npm-onboard-channel-agent` ~96s, `openai-image-auth` ~99s,
bundled channel/update lanes usually ~90-300s, `openwebui` ~225s,
`mcp-channels` ~274s.
- Heavy deterministic lanes, ~6-10 minutes:
`bundled-channel-root-owned` ~429s,
`bundled-channel-setup-entry` ~420s,
`bundled-channel-load-failure` ~383s,
`cron-mcp-cleanup` ~567s.
- Live provider lanes, often ~15-20 minutes:
`live-gateway` ~958s, `live-models` ~1054s.
- Installer/release lanes:
`install-e2e` and package-update paths can vary widely with npm, provider,
and package registry behavior. Budget tens of minutes; prefer GitHub targeted
reruns over local repeats.
Default fallback lane timeout is 120 minutes. A timeout usually means debug the
lane log/artifacts first, not “run the whole thing again.”
## Failure Workflow
1. Identify exact failing job, SHA, lane, and artifact path.
2. Read `failures.json`, `summary.json`, and the failed lane log tail.
3. Use `pnpm test:docker:rerun <run-id|failures.json>` to generate targeted
GitHub rerun commands.
4. If the lane has `rerunCommand`, use that only as a local starting point.
5. For Docker release failures, dispatch targeted `docker_lanes=<failed-lane>`
on GitHub before considering local Docker.
6. Patch narrowly, then rerun the failed file/lane only.
7. Broaden to `pnpm check:changed` or CI only after the isolated proof passes.
## When To Escalate
- Public SDK/plugin contract changes: run changed gate plus relevant extension
validation.
- Build output, lazy imports, package boundaries, or published surfaces:
include `pnpm build`.
- Workflow edits: run `pnpm check:workflows`.
- Release branch or tag validation: use release docs and GitHub workflows; avoid
local Docker unless Peter explicitly asks.

@@ -0,0 +1,4 @@
interface:
display_name: "OpenClaw Testing"
short_description: "Choose cheap, targeted OpenClaw validation"
default_prompt: "Use $openclaw-testing to choose the cheapest safe test or CI verification path, inspect failures, and rerun only the relevant OpenClaw lane."

@@ -0,0 +1,145 @@
name: Docker E2E plan and hydrate
description: >
Create a Docker E2E lane plan, expose GitHub outputs, and optionally hydrate
the prebuilt package artifact plus shared Docker images needed by the plan.
inputs:
mode:
description: prepare, chunk, or targeted.
required: true
chunk:
description: Release-path chunk for mode=chunk.
required: false
default: ""
lanes:
description: Comma/space separated lane names for targeted or prepare mode.
required: false
default: ""
include-openwebui:
description: Whether Open WebUI is included when planning release/prepare coverage.
required: false
default: "true"
include-release-path-suites:
description: Whether prepare mode should plan all release-path suites.
required: false
default: "false"
hydrate-artifacts:
description: Whether to download/pull artifacts required by the plan.
required: false
default: "true"
outputs:
credentials:
description: Comma-separated credential groups required by selected lanes.
value: ${{ steps.plan.outputs.credentials }}
needs_bare_image:
description: "1 when selected lanes require the bare Docker E2E image."
value: ${{ steps.plan.outputs.needs_bare_image }}
needs_e2e_image:
description: "1 when selected lanes require any Docker E2E image."
value: ${{ steps.plan.outputs.needs_e2e_image }}
needs_functional_image:
description: "1 when selected lanes require the functional Docker E2E image."
value: ${{ steps.plan.outputs.needs_functional_image }}
needs_live_image:
description: "1 when selected lanes require building the live Docker image."
value: ${{ steps.plan.outputs.needs_live_image }}
needs_package:
description: "1 when selected lanes require the OpenClaw package tarball."
value: ${{ steps.plan.outputs.needs_package }}
plan_json:
description: Path to the generated plan JSON.
value: ${{ steps.plan.outputs.plan_json }}
runs:
using: composite
steps:
- name: Plan Docker E2E lanes
id: plan
shell: bash
env:
MODE: ${{ inputs.mode }}
CHUNK: ${{ inputs.chunk }}
LANES: ${{ inputs.lanes }}
INCLUDE_OPENWEBUI: ${{ inputs.include-openwebui }}
INCLUDE_RELEASE_PATH_SUITES: ${{ inputs.include-release-path-suites }}
run: |
set -euo pipefail
mkdir -p .artifacts/docker-tests
case "$MODE" in
prepare)
plan_path=".artifacts/docker-tests/plan.json"
if [[ "$INCLUDE_RELEASE_PATH_SUITES" == "true" ]]; then
export OPENCLAW_DOCKER_ALL_PROFILE=release-path
export OPENCLAW_DOCKER_ALL_PLAN_RELEASE_ALL=1
elif [[ -n "$LANES" ]]; then
export OPENCLAW_DOCKER_ALL_LANES="$LANES"
elif [[ "$INCLUDE_OPENWEBUI" == "true" ]]; then
export OPENCLAW_DOCKER_ALL_LANES=openwebui
fi
;;
chunk)
if [[ -z "$CHUNK" ]]; then
echo "chunk input is required for Docker E2E chunk planning." >&2
exit 1
fi
export OPENCLAW_DOCKER_ALL_PROFILE=release-path
export OPENCLAW_DOCKER_ALL_CHUNK="$CHUNK"
plan_path=".artifacts/docker-tests/release-${CHUNK}-plan.json"
;;
targeted)
if [[ -z "$LANES" ]]; then
echo "lanes input is required for Docker E2E targeted planning." >&2
exit 1
fi
export OPENCLAW_DOCKER_ALL_LANES="$LANES"
plan_path=".artifacts/docker-tests/targeted-plan.json"
;;
*)
echo "mode must be prepare, chunk, or targeted. Got: $MODE" >&2
exit 1
;;
esac
export OPENCLAW_DOCKER_ALL_INCLUDE_OPENWEBUI="$INCLUDE_OPENWEBUI"
node scripts/test-docker-all.mjs --plan-json > "$plan_path"
node scripts/docker-e2e.mjs github-outputs "$plan_path" >> "$GITHUB_OUTPUT"
echo "plan_json=$plan_path" >> "$GITHUB_OUTPUT"
- name: Download OpenClaw Docker E2E package
if: inputs.hydrate-artifacts == 'true' && steps.plan.outputs.needs_package == '1'
uses: actions/download-artifact@v8
with:
name: docker-e2e-package
path: .artifacts/docker-e2e-package
- name: Pull shared bare Docker E2E image
if: inputs.hydrate-artifacts == 'true' && steps.plan.outputs.needs_bare_image == '1'
shell: bash
run: |
set -euo pipefail
docker pull "${OPENCLAW_DOCKER_E2E_BARE_IMAGE}"
- name: Pull shared functional Docker E2E image
if: inputs.hydrate-artifacts == 'true' && steps.plan.outputs.needs_functional_image == '1'
shell: bash
run: |
set -euo pipefail
docker pull "${OPENCLAW_DOCKER_E2E_FUNCTIONAL_IMAGE}"
- name: Validate Docker E2E credentials
if: inputs.hydrate-artifacts == 'true'
shell: bash
env:
CREDENTIALS: ${{ steps.plan.outputs.credentials }}
run: |
set -euo pipefail
credentials=",$CREDENTIALS,"
if [[ "$credentials" == *",openai,"* ]]; then
[[ -n "${OPENAI_API_KEY:-}" ]] || {
echo "OPENAI_API_KEY is required for selected Docker E2E lanes." >&2
exit 1
}
fi
if [[ "$credentials" == *",anthropic,"* && -z "${ANTHROPIC_API_TOKEN:-}" && -z "${ANTHROPIC_API_KEY:-}" ]]; then
echo "ANTHROPIC_API_TOKEN or ANTHROPIC_API_KEY is required for selected Docker E2E lanes." >&2
exit 1
fi
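The credential validation step wraps the comma-separated list in commas before matching, so a group name matches only as a whole entry. A minimal sketch of that idiom in isolation; `has_credential_group` is a hypothetical name used here for illustration and does not appear in the action.

```bash
# Return 0 when the group appears as a whole entry in a comma-separated list.
has_credential_group() {
  local list=",$1," group="$2"
  # Surrounding commas prevent partial matches such as "openai" in "openai-codex".
  [[ "$list" == *",$group,"* ]]
}
```

For example, `has_credential_group "openai,anthropic" openai` succeeds, while `has_credential_group "openai-codex" openai` fails because the entry is not an exact match.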

@@ -1,6 +1,7 @@
name: CI
on:
workflow_dispatch:
push:
branches: [main]
paths-ignore:
@@ -13,8 +14,8 @@ permissions:
contents: read
concurrency:
group: ${{ github.event_name == 'pull_request' && format('{0}-v7-{1}', github.workflow, github.event.pull_request.number) || (github.repository == 'openclaw/openclaw' && format('{0}-v7-{1}', github.workflow, github.ref) || format('{0}-v7-{1}-{2}', github.workflow, github.ref, github.sha)) }}
cancel-in-progress: true
group: ${{ github.event_name == 'workflow_dispatch' && format('{0}-manual-v1-{1}', github.workflow, github.run_id) || (github.event_name == 'pull_request' && format('{0}-v7-{1}', github.workflow, github.event.pull_request.number) || (github.repository == 'openclaw/openclaw' && format('{0}-v7-{1}', github.workflow, github.ref) || format('{0}-v7-{1}-{2}', github.workflow, github.ref, github.sha))) }}
cancel-in-progress: ${{ github.event_name != 'workflow_dispatch' }}
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"
@@ -75,6 +76,7 @@ jobs:
submodules: false
- name: Ensure preflight base commit
if: github.event_name != 'workflow_dispatch'
uses: ./.github/actions/ensure-base-commit
with:
base-sha: ${{ github.event_name == 'push' && github.event.before || github.event.pull_request.base.sha }}
@@ -82,11 +84,12 @@ jobs:
- name: Detect docs-only changes
id: docs_scope
if: github.event_name != 'workflow_dispatch'
uses: ./.github/actions/detect-docs-changes
- name: Detect changed scopes
id: changed_scope
if: steps.docs_scope.outputs.docs_only != 'true'
if: github.event_name != 'workflow_dispatch' && steps.docs_scope.outputs.docs_only != 'true'
shell: bash
run: |
set -euo pipefail
@@ -101,7 +104,7 @@ jobs:
- name: Detect changed extensions
id: changed_extensions
if: steps.docs_scope.outputs.docs_only != 'true' && steps.changed_scope.outputs.run_node == 'true'
if: github.event_name != 'workflow_dispatch' && steps.docs_scope.outputs.docs_only != 'true' && steps.changed_scope.outputs.run_node == 'true'
env:
BASE_SHA: ${{ github.event_name == 'push' && github.event.before || github.event.pull_request.base.sha }}
BASE_REF: ${{ github.event_name == 'push' && github.ref_name || github.event.pull_request.base.ref }}
@@ -125,19 +128,19 @@ jobs:
- name: Build CI manifest
id: manifest
env:
OPENCLAW_CI_DOCS_ONLY: ${{ steps.docs_scope.outputs.docs_only }}
OPENCLAW_CI_DOCS_CHANGED: ${{ steps.docs_scope.outputs.docs_changed }}
OPENCLAW_CI_RUN_NODE: ${{ steps.changed_scope.outputs.run_node || 'false' }}
OPENCLAW_CI_RUN_MACOS: ${{ steps.changed_scope.outputs.run_macos || 'false' }}
OPENCLAW_CI_RUN_ANDROID: ${{ steps.changed_scope.outputs.run_android || 'false' }}
OPENCLAW_CI_RUN_WINDOWS: ${{ steps.changed_scope.outputs.run_windows || 'false' }}
OPENCLAW_CI_RUN_NODE_FAST_ONLY: ${{ steps.changed_scope.outputs.run_node_fast_only || 'false' }}
OPENCLAW_CI_RUN_NODE_FAST_PLUGIN_CONTRACTS: ${{ steps.changed_scope.outputs.run_node_fast_plugin_contracts || 'false' }}
OPENCLAW_CI_RUN_NODE_FAST_CI_ROUTING: ${{ steps.changed_scope.outputs.run_node_fast_ci_routing || 'false' }}
OPENCLAW_CI_RUN_SKILLS_PYTHON: ${{ steps.changed_scope.outputs.run_skills_python || 'false' }}
OPENCLAW_CI_RUN_CONTROL_UI_I18N: ${{ steps.changed_scope.outputs.run_control_ui_i18n || 'false' }}
OPENCLAW_CI_HAS_CHANGED_EXTENSIONS: ${{ steps.changed_extensions.outputs.has_changed_extensions || 'false' }}
OPENCLAW_CI_CHANGED_EXTENSIONS_MATRIX: ${{ steps.changed_extensions.outputs.changed_extensions_matrix || '{"include":[]}' }}
OPENCLAW_CI_DOCS_ONLY: ${{ github.event_name == 'workflow_dispatch' && 'false' || steps.docs_scope.outputs.docs_only }}
OPENCLAW_CI_DOCS_CHANGED: ${{ github.event_name == 'workflow_dispatch' && 'true' || steps.docs_scope.outputs.docs_changed }}
OPENCLAW_CI_RUN_NODE: ${{ github.event_name == 'workflow_dispatch' && 'true' || steps.changed_scope.outputs.run_node || 'false' }}
OPENCLAW_CI_RUN_MACOS: ${{ github.event_name == 'workflow_dispatch' && 'true' || steps.changed_scope.outputs.run_macos || 'false' }}
OPENCLAW_CI_RUN_ANDROID: ${{ github.event_name == 'workflow_dispatch' && 'true' || steps.changed_scope.outputs.run_android || 'false' }}
OPENCLAW_CI_RUN_WINDOWS: ${{ github.event_name == 'workflow_dispatch' && 'true' || steps.changed_scope.outputs.run_windows || 'false' }}
OPENCLAW_CI_RUN_NODE_FAST_ONLY: ${{ github.event_name == 'workflow_dispatch' && 'false' || steps.changed_scope.outputs.run_node_fast_only || 'false' }}
OPENCLAW_CI_RUN_NODE_FAST_PLUGIN_CONTRACTS: ${{ github.event_name == 'workflow_dispatch' && 'false' || steps.changed_scope.outputs.run_node_fast_plugin_contracts || 'false' }}
OPENCLAW_CI_RUN_NODE_FAST_CI_ROUTING: ${{ github.event_name == 'workflow_dispatch' && 'false' || steps.changed_scope.outputs.run_node_fast_ci_routing || 'false' }}
OPENCLAW_CI_RUN_SKILLS_PYTHON: ${{ github.event_name == 'workflow_dispatch' && 'true' || steps.changed_scope.outputs.run_skills_python || 'false' }}
OPENCLAW_CI_RUN_CONTROL_UI_I18N: ${{ github.event_name == 'workflow_dispatch' && 'true' || steps.changed_scope.outputs.run_control_ui_i18n || 'false' }}
OPENCLAW_CI_HAS_CHANGED_EXTENSIONS: ${{ github.event_name == 'workflow_dispatch' && 'false' || steps.changed_extensions.outputs.has_changed_extensions || 'false' }}
OPENCLAW_CI_CHANGED_EXTENSIONS_MATRIX: ${{ github.event_name == 'workflow_dispatch' && '{"include":[]}' || steps.changed_extensions.outputs.changed_extensions_matrix || '{"include":[]}' }}
OPENCLAW_CI_REPOSITORY: ${{ github.repository }}
run: |
node --input-type=module <<'EOF'

@@ -63,7 +63,7 @@ jobs:
# KEEP THIS WORKFLOW ON GITHUB-HOSTED RUNNERS.
# DO NOT MOVE IT BACK TO BLACKSMITH WITHOUT RE-VALIDATING TAG BUILDS AND BACKFILLS.
# Build amd64 images (default + slim share the build stage cache)
# Build amd64 image. Default and slim tags point to the same slim runtime.
build-amd64:
needs: [approve_manual_backfill]
if: ${{ always() && (github.event_name != 'workflow_dispatch' || needs.approve_manual_backfill.result == 'success') }}
@@ -74,7 +74,6 @@ jobs:
contents: read
outputs:
digest: ${{ steps.build.outputs.digest }}
slim-digest: ${{ steps.build-slim.outputs.digest }}
steps:
- name: Checkout
uses: actions/checkout@v6
@@ -117,12 +116,7 @@ jobs:
fi
{
echo "value<<EOF"
printf "%s\n" "${tags[@]}"
echo "EOF"
} >> "$GITHUB_OUTPUT"
{
echo "slim<<EOF"
printf "%s\n" "${slim_tags[@]}"
printf "%s\n" "${tags[@]}" "${slim_tags[@]}"
echo "EOF"
} >> "$GITHUB_OUTPUT"
@@ -163,27 +157,11 @@ jobs:
OPENCLAW_EXTENSIONS=diagnostics-otel
tags: ${{ steps.tags.outputs.value }}
labels: ${{ steps.labels.outputs.value }}
provenance: false
sbom: true
provenance: mode=max
push: true
- name: Build and push amd64 slim image
id: build-slim
# WARNING: KEEP THE OFFICIAL DOCKER ACTION HERE; DO NOT SWITCH THIS BACK TO BLACKSMITH BLINDLY.
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
platforms: linux/amd64
cache-from: type=gha,scope=docker-release-amd64
cache-to: type=gha,mode=max,scope=docker-release-amd64
build-args: |
OPENCLAW_EXTENSIONS=diagnostics-otel
OPENCLAW_VARIANT=slim
tags: ${{ steps.tags.outputs.slim }}
labels: ${{ steps.labels.outputs.value }}
provenance: false
push: true
# Build arm64 images (default + slim share the build stage cache)
# Build arm64 image. Default and slim tags point to the same slim runtime.
build-arm64:
needs: [approve_manual_backfill]
if: ${{ always() && (github.event_name != 'workflow_dispatch' || needs.approve_manual_backfill.result == 'success') }}
@@ -194,7 +172,6 @@ jobs:
contents: read
outputs:
digest: ${{ steps.build.outputs.digest }}
slim-digest: ${{ steps.build-slim.outputs.digest }}
steps:
- name: Checkout
uses: actions/checkout@v6
@@ -237,12 +214,7 @@ jobs:
fi
{
echo "value<<EOF"
printf "%s\n" "${tags[@]}"
echo "EOF"
} >> "$GITHUB_OUTPUT"
{
echo "slim<<EOF"
printf "%s\n" "${slim_tags[@]}"
printf "%s\n" "${tags[@]}" "${slim_tags[@]}"
echo "EOF"
} >> "$GITHUB_OUTPUT"
@@ -283,24 +255,8 @@ jobs:
OPENCLAW_EXTENSIONS=diagnostics-otel
tags: ${{ steps.tags.outputs.value }}
labels: ${{ steps.labels.outputs.value }}
provenance: false
push: true
- name: Build and push arm64 slim image
id: build-slim
# WARNING: KEEP THE OFFICIAL DOCKER ACTION HERE; DO NOT SWITCH THIS BACK TO BLACKSMITH BLINDLY.
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
platforms: linux/arm64
cache-from: type=gha,scope=docker-release-arm64
cache-to: type=gha,mode=max,scope=docker-release-arm64
build-args: |
OPENCLAW_EXTENSIONS=diagnostics-otel
OPENCLAW_VARIANT=slim
tags: ${{ steps.tags.outputs.slim }}
labels: ${{ steps.labels.outputs.value }}
provenance: false
sbom: true
provenance: mode=max
push: true
# Create multi-platform manifests
@@ -357,16 +313,11 @@ jobs:
fi
{
echo "value<<EOF"
printf "%s\n" "${tags[@]}"
echo "EOF"
} >> "$GITHUB_OUTPUT"
{
echo "slim<<EOF"
printf "%s\n" "${slim_tags[@]}"
printf "%s\n" "${tags[@]}" "${slim_tags[@]}"
echo "EOF"
} >> "$GITHUB_OUTPUT"
- name: Create and push default manifest
- name: Create and push manifest
shell: bash
env:
TAGS: ${{ steps.tags.outputs.value }}
@@ -384,20 +335,94 @@ jobs:
"${AMD64_DIGEST}" \
"${ARM64_DIGEST}"
- name: Create and push slim manifest
verify-attestations:
needs: [create-manifest]
if: ${{ always() && needs.create-manifest.result == 'success' }}
runs-on: ubuntu-24.04
permissions:
contents: read
packages: read
steps:
- name: Checkout
uses: actions/checkout@v6
with:
fetch-depth: 1
- name: Set up Docker Builder
uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4
- name: Login to GitHub Container Registry
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.repository_owner }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Resolve image refs
id: refs
shell: bash
env:
SLIM_TAGS: ${{ steps.tags.outputs.slim }}
AMD64_SLIM_DIGEST: ${{ needs.build-amd64.outputs.slim-digest }}
ARM64_SLIM_DIGEST: ${{ needs.build-arm64.outputs.slim-digest }}
IMAGE: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
SOURCE_REF: ${{ github.event_name == 'workflow_dispatch' && format('refs/tags/{0}', inputs.tag) || github.ref }}
IS_MANUAL_BACKFILL: ${{ github.event_name == 'workflow_dispatch' && '1' || '0' }}
run: |
set -euo pipefail
mapfile -t tags <<< "${SLIM_TAGS}"
args=()
for tag in "${tags[@]}"; do
[ -z "$tag" ] && continue
args+=("-t" "$tag")
done
docker buildx imagetools create "${args[@]}" \
"${AMD64_SLIM_DIGEST}" \
"${ARM64_SLIM_DIGEST}"
multi_refs=()
slim_multi_refs=()
amd64_refs=()
arm64_refs=()
if [[ "${SOURCE_REF}" == "refs/heads/main" ]]; then
multi_refs+=("${IMAGE}:main")
slim_multi_refs+=("${IMAGE}:main-slim")
amd64_refs+=("${IMAGE}:main-amd64" "${IMAGE}:main-slim-amd64")
arm64_refs+=("${IMAGE}:main-arm64" "${IMAGE}:main-slim-arm64")
fi
if [[ "${SOURCE_REF}" == refs/tags/v* ]]; then
version="${SOURCE_REF#refs/tags/v}"
multi_refs+=("${IMAGE}:${version}")
slim_multi_refs+=("${IMAGE}:${version}-slim")
amd64_refs+=("${IMAGE}:${version}-amd64" "${IMAGE}:${version}-slim-amd64")
arm64_refs+=("${IMAGE}:${version}-arm64" "${IMAGE}:${version}-slim-arm64")
if [[ "${IS_MANUAL_BACKFILL}" != "1" && "$version" =~ ^[0-9]+\.[0-9]+\.[0-9]+(-[0-9]+)?$ ]]; then
multi_refs+=("${IMAGE}:latest")
slim_multi_refs+=("${IMAGE}:slim")
fi
fi
if [[ ${#multi_refs[@]} -eq 0 || ${#amd64_refs[@]} -eq 0 || ${#arm64_refs[@]} -eq 0 ]]; then
echo "::error::No Docker image refs resolved for ref ${SOURCE_REF}"
exit 1
fi
{
echo "multi<<EOF"
printf "%s\n" "${multi_refs[@]}" "${slim_multi_refs[@]}"
echo "EOF"
echo "amd64<<EOF"
printf "%s\n" "${amd64_refs[@]}"
echo "EOF"
echo "arm64<<EOF"
printf "%s\n" "${arm64_refs[@]}"
echo "EOF"
} >> "$GITHUB_OUTPUT"
- name: Verify Docker attestations
shell: bash
env:
MULTI_REFS: ${{ steps.refs.outputs.multi }}
AMD64_REFS: ${{ steps.refs.outputs.amd64 }}
ARM64_REFS: ${{ steps.refs.outputs.arm64 }}
run: |
set -euo pipefail
mapfile -t multi_refs <<< "${MULTI_REFS}"
mapfile -t amd64_refs <<< "${AMD64_REFS}"
mapfile -t arm64_refs <<< "${ARM64_REFS}"
node scripts/verify-docker-attestations.mjs \
--platform linux/amd64 \
--platform linux/arm64 \
"${multi_refs[@]}"
node scripts/verify-docker-attestations.mjs \
--platform linux/amd64 \
"${amd64_refs[@]}"
node scripts/verify-docker-attestations.mjs \
--platform linux/arm64 \
"${arm64_refs[@]}"
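
Review note: the "Resolve image refs" step derives the expected tags from the source ref before attestations are verified. The sketch below restates only the multi-platform part of that logic as a standalone function so it can be exercised locally. The function name `resolve_multi_refs` and the image name `ghcr.io/example/app` are placeholders, not names from the workflow.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mirroring the multi-platform branch of the
# "Resolve image refs" step. Arguments: image, source ref, and an optional
# manual-backfill flag ("1" disables the floating tags).
resolve_multi_refs() {
  local image="$1" source_ref="$2" is_manual_backfill="${3:-0}"
  local multi=() slim=()

  if [[ "$source_ref" == "refs/heads/main" ]]; then
    multi+=("${image}:main")
    slim+=("${image}:main-slim")
  fi

  if [[ "$source_ref" == refs/tags/v* ]]; then
    local version="${source_ref#refs/tags/v}"
    multi+=("${image}:${version}")
    slim+=("${image}:${version}-slim")
    # Only plain X.Y.Z versions (optionally with a numeric -N suffix) move
    # the floating tags, and never during a manual backfill.
    if [[ "$is_manual_backfill" != "1" && "$version" =~ ^[0-9]+\.[0-9]+\.[0-9]+(-[0-9]+)?$ ]]; then
      multi+=("${image}:latest")
      slim+=("${image}:slim")
    fi
  fi

  if [[ ${#multi[@]} -eq 0 ]]; then
    echo "No Docker image refs resolved for ref ${source_ref}" >&2
    return 1
  fi
  printf '%s\n' "${multi[@]}" "${slim[@]}"
}

resolve_multi_refs "ghcr.io/example/app" "refs/tags/v2026.4.26"
```

Prerelease-style versions such as `1.2.3-beta.1` fail the regular expression, so they get versioned tags but leave `latest` and `slim` untouched.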

@@ -10,6 +10,11 @@ on:
required: false
default: false
type: boolean
update_baseline_version:
description: Baseline openclaw version or dist-tag for installer update smoke
required: false
default: latest
type: string
workflow_call:
inputs:
ref:
@@ -21,6 +26,11 @@ on:
required: false
default: true
type: boolean
update_baseline_version:
description: Baseline openclaw version or dist-tag for installer update smoke
required: false
default: latest
type: string
permissions:
contents: read
@@ -103,7 +113,6 @@ jobs:
context: .
file: ./Dockerfile
build-args: |
OPENCLAW_DOCKER_APT_UPGRADE=0
OPENCLAW_EXTENSIONS=matrix
tags: |
openclaw-dockerfile-smoke:local
@@ -218,7 +227,6 @@ jobs:
context: .
file: ./Dockerfile
build-args: |
OPENCLAW_DOCKER_APT_UPGRADE=0
OPENCLAW_EXTENSIONS=matrix
tags: |
openclaw-dockerfile-smoke:local
@@ -332,7 +340,7 @@ jobs:
OPENCLAW_INSTALL_SMOKE_SKIP_NONROOT: "0"
OPENCLAW_INSTALL_SMOKE_SKIP_NPM_GLOBAL: "1"
OPENCLAW_INSTALL_SMOKE_SKIP_PREVIOUS: "1"
OPENCLAW_INSTALL_SMOKE_UPDATE_BASELINE: latest
OPENCLAW_INSTALL_SMOKE_UPDATE_BASELINE: ${{ inputs.update_baseline_version || 'latest' }}
OPENCLAW_INSTALL_SMOKE_UPDATE_DIST_IMAGE: openclaw-dockerfile-smoke:local
OPENCLAW_INSTALL_SMOKE_UPDATE_SKIP_LOCAL_BUILD: "1"
run: bash scripts/test-install-sh-docker.sh

@@ -34,34 +34,8 @@ env:
PNPM_VERSION: "10.33.0"
jobs:
validate_dispatch_ref:
name: Validate dispatch ref
runs-on: blacksmith-8vcpu-ubuntu-2404
steps:
- name: Require main workflow ref
env:
WORKFLOW_REF: ${{ github.ref }}
run: |
set -euo pipefail
if [[ "${WORKFLOW_REF}" != "refs/heads/main" ]]; then
echo "NPM Telegram beta E2E must be dispatched from main so workflow logic stays controlled." >&2
exit 1
fi
approve_release_manager:
name: Approve npm Telegram beta E2E
needs: validate_dispatch_ref
runs-on: ubuntu-latest
environment: npm-release
steps:
- name: Record approval
env:
PACKAGE_SPEC: ${{ inputs.package_spec }}
run: echo "Approved npm Telegram beta E2E for ${PACKAGE_SPEC}"
run_npm_telegram_beta_e2e:
name: Run published npm Telegram E2E
needs: approve_release_manager
runs-on: blacksmith-32vcpu-ubuntu-2404
timeout-minutes: 60
environment: qa-live-shared
@@ -71,7 +45,7 @@ jobs:
DOCKER_BUILD_SUMMARY: "false"
DOCKER_BUILD_RECORD_UPLOAD: "false"
steps:
- name: Checkout main
- name: Checkout dispatch ref
uses: actions/checkout@v6
with:
ref: ${{ github.sha }}
@@ -79,6 +53,8 @@ jobs:
- name: Set up Blacksmith Docker Builder
uses: useblacksmith/setup-docker-builder@ac083cc84672d01c60d5e8561d0a939b697de542 # v1
with:
max-cache-size-mb: 800000
- name: Build Docker E2E image
uses: useblacksmith/build-push-action@cbd1f60d194a98cb3be5523b15134501eaf0fbf3 # v2
@@ -143,6 +119,7 @@ jobs:
OPENCLAW_QA_CONVEX_SITE_URL: ${{ secrets.OPENCLAW_QA_CONVEX_SITE_URL }}
OPENCLAW_QA_CONVEX_SECRET_CI: ${{ secrets.OPENCLAW_QA_CONVEX_SECRET_CI }}
OPENCLAW_QA_REDACT_PUBLIC_METADATA: "1"
OPENCLAW_QA_TELEGRAM_CAPTURE_CONTENT: "1"
INPUT_SCENARIO: ${{ inputs.scenario }}
run: |
set -euo pipefail

@@ -23,6 +23,11 @@ on:
required: false
default: true
type: boolean
docker_lanes:
description: Comma/space separated Docker scheduler lane names to run against the prepared image
required: false
default: ""
type: string
include_live_suites:
description: Whether to run live-provider coverage
required: false
@@ -54,6 +59,11 @@ on:
required: false
default: true
type: boolean
docker_lanes:
description: Comma/space separated Docker scheduler lane names to run against the prepared image
required: false
default: ""
type: string
include_live_suites:
description: Whether to run live-provider coverage
required: false
@@ -182,6 +192,7 @@ jobs:
env:
GH_TOKEN: ${{ github.token }}
INPUT_REF: ${{ inputs.ref }}
WORKFLOW_REF_NAME: ${{ github.ref_name }}
shell: bash
run: |
set -euo pipefail
@@ -189,9 +200,15 @@ jobs:
trusted_reason=""
git fetch --no-tags origin +refs/heads/main:refs/remotes/origin/main
if [[ "${WORKFLOW_REF_NAME}" =~ ^release/[0-9]{4}\.[1-9][0-9]*\.[1-9][0-9]*$ ]]; then
git fetch --no-tags origin "+refs/heads/${WORKFLOW_REF_NAME}:refs/remotes/origin/${WORKFLOW_REF_NAME}"
fi
if git merge-base --is-ancestor "$selected_sha" refs/remotes/origin/main; then
trusted_reason="main-ancestor"
elif [[ "${WORKFLOW_REF_NAME}" =~ ^release/[0-9]{4}\.[1-9][0-9]*\.[1-9][0-9]*$ ]] &&
[[ "$selected_sha" == "$(git rev-parse "refs/remotes/origin/${WORKFLOW_REF_NAME}")" ]]; then
trusted_reason="release-branch-head"
elif git tag --points-at "$selected_sha" | grep -Eq '^v'; then
trusted_reason="release-tag"
else
@@ -208,7 +225,7 @@ jobs:
if [[ -z "$trusted_reason" ]]; then
echo "Ref '${INPUT_REF}' resolved to $selected_sha, which is not trusted for secret-bearing live/E2E checks." >&2
echo "Allowed refs must be on main, point to a release tag, or match an open PR head in ${GITHUB_REPOSITORY}." >&2
echo "Allowed refs must be on main, match the current release branch head, point to a release tag, or match an open PR head in ${GITHUB_REPOSITORY}." >&2
exit 1
fi
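
Review note: the new `release-branch-head` trust path depends on the workflow ref name matching `^release/[0-9]{4}\.[1-9][0-9]*\.[1-9][0-9]*$`, that is, a four-digit first component followed by two numeric components with no leading zeros. The sketch below checks a few sample names against the same expression. The helper name `is_release_branch` and the sample ref names are illustrative assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds when the ref name has the release branch
# shape accepted by the trust check in the workflow.
is_release_branch() {
  [[ "$1" =~ ^release/[0-9]{4}\.[1-9][0-9]*\.[1-9][0-9]*$ ]]
}

for ref in release/2026.4.26 release/2026.04.26 release/2026.4 main; do
  if is_release_branch "$ref"; then
    echo "$ref: release branch"
  else
    echo "$ref: not a release branch"
  fi
done
```

In the workflow, matching the pattern is not sufficient on its own: the selected SHA must also equal the fetched head of that branch, so an older commit on a release branch is not trusted through this path.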
@@ -303,7 +320,7 @@ jobs:
requires_live_suites: false
- suite_id: openai-ws-stream-live-e2e
label: OpenAI WebSocket live E2E
command: pnpm test:e2e -- src/agents/openai-ws-stream.e2e.test.ts
command: pnpm test:e2e src/agents/openai-ws-stream.e2e.test.ts
timeout_minutes: 90
requires_repo_e2e: false
requires_live_suites: true
@@ -363,93 +380,23 @@ jobs:
validate_docker_e2e:
needs: [validate_selected_ref, prepare_docker_e2e_image]
if: inputs.include_release_path_suites
if: inputs.include_release_path_suites && inputs.docker_lanes == ''
name: Docker E2E (${{ matrix.label }})
runs-on: blacksmith-32vcpu-ubuntu-2404
timeout-minutes: ${{ matrix.timeout_minutes }}
strategy:
fail-fast: false
matrix:
include:
- suite_id: docker-onboard
label: Onboarding Docker E2E
command: pnpm test:docker:onboard
timeout_minutes: 60
release_path: true
- suite_id: docker-npm-onboard-channel-agent
label: Npm Onboard Channel Agent Docker E2E
command: pnpm test:docker:npm-onboard-channel-agent
timeout_minutes: 90
release_path: true
- suite_id: docker-gateway-network
label: Gateway Network Docker E2E
command: pnpm test:docker:gateway-network
timeout_minutes: 60
release_path: true
- suite_id: docker-openai-web-search-minimal
label: OpenAI Web Search Minimal Docker E2E
command: pnpm test:docker:openai-web-search-minimal
timeout_minutes: 60
release_path: true
- suite_id: docker-mcp-channels
label: MCP Channels Docker E2E
command: pnpm test:docker:mcp-channels
timeout_minutes: 60
release_path: true
- suite_id: docker-pi-bundle-mcp-tools
label: Pi Bundle MCP Tools Docker E2E
command: pnpm test:docker:pi-bundle-mcp-tools
timeout_minutes: 60
release_path: true
- suite_id: docker-cron-mcp-cleanup
label: Cron MCP Cleanup Docker E2E
command: pnpm test:docker:cron-mcp-cleanup
timeout_minutes: 60
release_path: true
- suite_id: docker-plugins
label: Plugins Docker E2E
command: pnpm test:docker:plugins
timeout_minutes: 75
release_path: true
- suite_id: docker-plugin-update
label: Plugin Update Docker E2E
command: pnpm test:docker:plugin-update
timeout_minutes: 60
release_path: true
- suite_id: docker-config-reload
label: Config Reload Docker E2E
command: pnpm test:docker:config-reload
timeout_minutes: 60
release_path: true
- suite_id: docker-bundled-channel-deps
label: Bundled Channel Runtime Deps Docker E2E
command: pnpm test:docker:bundled-channel-deps
timeout_minutes: 75
release_path: true
- suite_id: docker-doctor-switch
label: Doctor Install Switch Docker E2E
command: pnpm test:docker:doctor-switch
timeout_minutes: 60
release_path: true
- suite_id: docker-update-channel-switch
label: Update Channel Switch Docker E2E
command: pnpm test:docker:update-channel-switch
timeout_minutes: 60
release_path: true
- suite_id: docker-session-runtime-context
label: Session Runtime Context Docker E2E
command: pnpm test:docker:session-runtime-context
timeout_minutes: 60
release_path: true
- suite_id: docker-qr
label: QR Import Docker E2E
command: pnpm test:docker:qr
timeout_minutes: 60
release_path: true
- suite_id: docker-install-e2e
label: Installer Docker E2E
command: pnpm test:install:e2e
- chunk_id: core
label: core
timeout_minutes: 120
release_path: true
- chunk_id: package-update
label: package/update
timeout_minutes: 180
- chunk_id: plugins-integrations
label: plugins/integrations
timeout_minutes: 180
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
OPENAI_BASE_URL: ${{ secrets.OPENAI_BASE_URL }}
@@ -496,7 +443,12 @@ jobs:
OPENCLAW_GEMINI_SETTINGS_JSON: ${{ secrets.OPENCLAW_GEMINI_SETTINGS_JSON }}
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
OPENCLAW_DOCKER_E2E_IMAGE: ${{ needs.prepare_docker_e2e_image.outputs.image }}
OPENCLAW_DOCKER_E2E_BARE_IMAGE: ${{ needs.prepare_docker_e2e_image.outputs.bare_image }}
OPENCLAW_DOCKER_E2E_FUNCTIONAL_IMAGE: ${{ needs.prepare_docker_e2e_image.outputs.functional_image }}
OPENCLAW_CURRENT_PACKAGE_TGZ: .artifacts/docker-e2e-package/openclaw-current.tgz
OPENCLAW_SKIP_DOCKER_BUILD: "1"
INCLUDE_OPENWEBUI: ${{ inputs.include_openwebui }}
DOCKER_E2E_CHUNK: ${{ matrix.chunk_id }}
steps:
- name: Checkout selected ref
uses: actions/checkout@v6
@@ -521,45 +473,188 @@ jobs:
- name: Hydrate live auth/profile inputs
run: bash scripts/ci-hydrate-live-auth.sh
- name: Configure suite-specific env
- name: Plan and hydrate Docker E2E chunk
id: plan
uses: ./.github/actions/docker-e2e-plan
with:
mode: chunk
chunk: ${{ matrix.chunk_id }}
include-openwebui: ${{ inputs.include_openwebui }}
- name: Run Docker E2E chunk
shell: bash
run: |
set -euo pipefail
case "${{ matrix.suite_id }}" in
docker-install-e2e)
echo "OPENCLAW_E2E_MODELS=both" >> "$GITHUB_ENV"
;;
esac
export OPENCLAW_DOCKER_ALL_PROFILE=release-path
export OPENCLAW_DOCKER_ALL_CHUNK="${DOCKER_E2E_CHUNK}"
export OPENCLAW_DOCKER_ALL_BUILD=0
export OPENCLAW_DOCKER_ALL_PREFLIGHT=0
export OPENCLAW_DOCKER_ALL_FAIL_FAST=0
export OPENCLAW_DOCKER_ALL_INCLUDE_OPENWEBUI="${INCLUDE_OPENWEBUI}"
export OPENCLAW_DOCKER_ALL_LOG_DIR=".artifacts/docker-tests/release-${DOCKER_E2E_CHUNK}"
export OPENCLAW_DOCKER_ALL_TIMINGS_FILE=".artifacts/docker-tests/release-${DOCKER_E2E_CHUNK}-timings.json"
export OPENCLAW_DOCKER_ALL_PNPM_COMMAND="$(command -v pnpm)"
- name: Validate suite credentials
pnpm test:docker:all
- name: Summarize Docker E2E chunk
if: always()
shell: bash
run: |
set -euo pipefail
case "${{ matrix.suite_id }}" in
docker-install-e2e)
[[ -n "${OPENAI_API_KEY:-}" ]] || {
echo "OPENAI_API_KEY is required for installer Docker E2E." >&2
exit 1
}
if [[ -z "${ANTHROPIC_API_TOKEN:-}" && -z "${ANTHROPIC_API_KEY:-}" ]]; then
echo "ANTHROPIC_API_TOKEN or ANTHROPIC_API_KEY is required for installer Docker E2E." >&2
exit 1
fi
;;
esac
summary=".artifacts/docker-tests/release-${DOCKER_E2E_CHUNK}/summary.json"
if [[ ! -f "$summary" ]]; then
echo "Docker chunk summary missing: \`$summary\`" >> "$GITHUB_STEP_SUMMARY"
exit 0
fi
node scripts/docker-e2e.mjs summary "$summary" "Docker E2E chunk: ${DOCKER_E2E_CHUNK:-unknown}" >> "$GITHUB_STEP_SUMMARY"
- name: Run ${{ matrix.label }}
run: ${{ matrix.command }}
- name: Upload Docker E2E chunk artifacts
if: always()
uses: actions/upload-artifact@v7
with:
name: docker-e2e-${{ matrix.chunk_id }}
path: .artifacts/docker-tests/
if-no-files-found: ignore
validate_docker_lanes:
needs: [validate_selected_ref, prepare_docker_e2e_image]
if: inputs.docker_lanes != ''
name: Docker E2E targeted lanes
runs-on: blacksmith-32vcpu-ubuntu-2404
timeout-minutes: 180
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
OPENAI_BASE_URL: ${{ secrets.OPENAI_BASE_URL }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
ANTHROPIC_API_TOKEN: ${{ secrets.ANTHROPIC_API_TOKEN }}
ANTHROPIC_API_KEY_OLD: ${{ secrets.ANTHROPIC_API_KEY_OLD }}
BYTEPLUS_API_KEY: ${{ secrets.BYTEPLUS_API_KEY }}
CEREBRAS_API_KEY: ${{ secrets.CEREBRAS_API_KEY }}
DASHSCOPE_API_KEY: ${{ secrets.DASHSCOPE_API_KEY }}
GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
KIMI_API_KEY: ${{ secrets.KIMI_API_KEY }}
MODELSTUDIO_API_KEY: ${{ secrets.MODELSTUDIO_API_KEY }}
MOONSHOT_API_KEY: ${{ secrets.MOONSHOT_API_KEY }}
MISTRAL_API_KEY: ${{ secrets.MISTRAL_API_KEY }}
MINIMAX_API_KEY: ${{ secrets.MINIMAX_API_KEY }}
OPENCODE_API_KEY: ${{ secrets.OPENCODE_API_KEY }}
OPENCODE_ZEN_API_KEY: ${{ secrets.OPENCODE_ZEN_API_KEY }}
OPENCLAW_LIVE_BROWSER_CDP_URL: ${{ secrets.OPENCLAW_LIVE_BROWSER_CDP_URL }}
OPENCLAW_LIVE_SETUP_TOKEN: ${{ secrets.OPENCLAW_LIVE_SETUP_TOKEN }}
OPENCLAW_LIVE_SETUP_TOKEN_MODEL: ${{ secrets.OPENCLAW_LIVE_SETUP_TOKEN_MODEL }}
OPENCLAW_LIVE_SETUP_TOKEN_PROFILE: ${{ secrets.OPENCLAW_LIVE_SETUP_TOKEN_PROFILE }}
OPENCLAW_LIVE_SETUP_TOKEN_VALUE: ${{ secrets.OPENCLAW_LIVE_SETUP_TOKEN_VALUE }}
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
OPENROUTER_API_KEY: ${{ secrets.OPENROUTER_API_KEY }}
QWEN_API_KEY: ${{ secrets.QWEN_API_KEY }}
FAL_KEY: ${{ secrets.FAL_KEY }}
RUNWAY_API_KEY: ${{ secrets.RUNWAY_API_KEY }}
DEEPGRAM_API_KEY: ${{ secrets.DEEPGRAM_API_KEY }}
TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
VYDRA_API_KEY: ${{ secrets.VYDRA_API_KEY }}
XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
ZAI_API_KEY: ${{ secrets.ZAI_API_KEY }}
Z_AI_API_KEY: ${{ secrets.Z_AI_API_KEY }}
BYTEPLUS_ACCESS_KEY_ID: ${{ secrets.BYTEPLUS_ACCESS_KEY_ID }}
BYTEPLUS_SECRET_ACCESS_KEY: ${{ secrets.BYTEPLUS_SECRET_ACCESS_KEY }}
CLAUDE_CODE_OAUTH_TOKEN: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}
OPENCLAW_CODEX_AUTH_JSON: ${{ secrets.OPENCLAW_CODEX_AUTH_JSON }}
OPENCLAW_CODEX_CONFIG_TOML: ${{ secrets.OPENCLAW_CODEX_CONFIG_TOML }}
OPENCLAW_CLAUDE_JSON: ${{ secrets.OPENCLAW_CLAUDE_JSON }}
OPENCLAW_CLAUDE_CREDENTIALS_JSON: ${{ secrets.OPENCLAW_CLAUDE_CREDENTIALS_JSON }}
OPENCLAW_CLAUDE_SETTINGS_JSON: ${{ secrets.OPENCLAW_CLAUDE_SETTINGS_JSON }}
OPENCLAW_CLAUDE_SETTINGS_LOCAL_JSON: ${{ secrets.OPENCLAW_CLAUDE_SETTINGS_LOCAL_JSON }}
OPENCLAW_GEMINI_SETTINGS_JSON: ${{ secrets.OPENCLAW_GEMINI_SETTINGS_JSON }}
FIREWORKS_API_KEY: ${{ secrets.FIREWORKS_API_KEY }}
OPENCLAW_DOCKER_E2E_IMAGE: ${{ needs.prepare_docker_e2e_image.outputs.image }}
OPENCLAW_DOCKER_E2E_BARE_IMAGE: ${{ needs.prepare_docker_e2e_image.outputs.bare_image }}
OPENCLAW_DOCKER_E2E_FUNCTIONAL_IMAGE: ${{ needs.prepare_docker_e2e_image.outputs.functional_image }}
OPENCLAW_CURRENT_PACKAGE_TGZ: .artifacts/docker-e2e-package/openclaw-current.tgz
OPENCLAW_SKIP_DOCKER_BUILD: "1"
INCLUDE_OPENWEBUI: ${{ inputs.include_openwebui }}
DOCKER_E2E_LANES: ${{ inputs.docker_lanes }}
steps:
- name: Checkout selected ref
uses: actions/checkout@v6
with:
ref: ${{ needs.validate_selected_ref.outputs.selected_sha }}
fetch-depth: 1
- name: Log in to GHCR for shared Docker E2E image
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ github.token }}
- name: Setup Node environment
uses: ./.github/actions/setup-node-env
with:
node-version: ${{ env.NODE_VERSION }}
pnpm-version: ${{ env.PNPM_VERSION }}
install-bun: "true"
- name: Hydrate live auth/profile inputs
run: bash scripts/ci-hydrate-live-auth.sh
- name: Plan and hydrate targeted Docker E2E lanes
id: plan
uses: ./.github/actions/docker-e2e-plan
with:
mode: targeted
lanes: ${{ inputs.docker_lanes }}
include-openwebui: ${{ inputs.include_openwebui }}
- name: Run targeted Docker E2E lanes
shell: bash
run: |
set -euo pipefail
export OPENCLAW_DOCKER_ALL_LANES="${DOCKER_E2E_LANES}"
export OPENCLAW_DOCKER_ALL_PREFLIGHT=0
export OPENCLAW_DOCKER_ALL_FAIL_FAST=0
export OPENCLAW_DOCKER_ALL_INCLUDE_OPENWEBUI="${INCLUDE_OPENWEBUI}"
export OPENCLAW_DOCKER_ALL_LOG_DIR=".artifacts/docker-tests/targeted"
export OPENCLAW_DOCKER_ALL_TIMINGS_FILE=".artifacts/docker-tests/targeted-timings.json"
export OPENCLAW_DOCKER_ALL_PNPM_COMMAND="$(command -v pnpm)"
if [[ "${{ steps.plan.outputs.needs_live_image }}" == "1" ]]; then
pnpm test:docker:live-build
fi
export OPENCLAW_DOCKER_ALL_BUILD=0
pnpm test:docker:all
- name: Summarize targeted Docker E2E lanes
if: always()
shell: bash
run: |
set -euo pipefail
summary=".artifacts/docker-tests/targeted/summary.json"
if [[ ! -f "$summary" ]]; then
echo "Docker targeted summary missing: \`$summary\`" >> "$GITHUB_STEP_SUMMARY"
exit 0
fi
node scripts/docker-e2e.mjs summary "$summary" "Docker E2E targeted lanes" >> "$GITHUB_STEP_SUMMARY"
- name: Upload targeted Docker E2E artifacts
if: always()
uses: actions/upload-artifact@v7
with:
name: docker-e2e-targeted
path: .artifacts/docker-tests/
if-no-files-found: ignore
validate_docker_openwebui:
needs: [validate_selected_ref, prepare_docker_e2e_image]
if: inputs.include_openwebui
if: inputs.include_openwebui && !inputs.include_release_path_suites && inputs.docker_lanes == ''
runs-on: blacksmith-32vcpu-ubuntu-2404
timeout-minutes: 75
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
OPENAI_BASE_URL: ${{ secrets.OPENAI_BASE_URL }}
OPENCLAW_DOCKER_E2E_IMAGE: ${{ needs.prepare_docker_e2e_image.outputs.image }}
OPENCLAW_DOCKER_E2E_FUNCTIONAL_IMAGE: ${{ needs.prepare_docker_e2e_image.outputs.functional_image }}
OPENCLAW_SKIP_DOCKER_BUILD: "1"
steps:
- name: Checkout selected ref
@@ -596,7 +691,7 @@ jobs:
prepare_docker_e2e_image:
needs: validate_selected_ref
if: inputs.include_release_path_suites || inputs.include_openwebui
if: inputs.include_release_path_suites || inputs.include_openwebui || inputs.docker_lanes != ''
runs-on: blacksmith-32vcpu-ubuntu-2404
timeout-minutes: 90
permissions:
@@ -604,6 +699,13 @@ jobs:
packages: write
outputs:
image: ${{ steps.image.outputs.image }}
bare_image: ${{ steps.image.outputs.bare_image }}
functional_image: ${{ steps.image.outputs.functional_image }}
needs_bare_image: ${{ steps.plan.outputs.needs_bare_image }}
needs_e2e_image: ${{ steps.plan.outputs.needs_e2e_image }}
needs_functional_image: ${{ steps.plan.outputs.needs_functional_image }}
needs_live_image: ${{ steps.plan.outputs.needs_live_image }}
needs_package: ${{ steps.plan.outputs.needs_package }}
env:
DOCKER_BUILD_SUMMARY: "false"
DOCKER_BUILD_RECORD_UPLOAD: "false"
@@ -614,7 +716,7 @@ jobs:
ref: ${{ needs.validate_selected_ref.outputs.selected_sha }}
fetch-depth: 1
- name: Resolve shared Docker E2E image tag
- name: Resolve shared Docker E2E image tags
id: image
shell: bash
env:
@@ -622,31 +724,127 @@ jobs:
run: |
set -euo pipefail
repository="${GITHUB_REPOSITORY,,}"
image="ghcr.io/${repository}-docker-e2e:${SELECTED_SHA}"
bare_image="ghcr.io/${repository}-docker-e2e-bare:${SELECTED_SHA}"
functional_image="ghcr.io/${repository}-docker-e2e-functional:${SELECTED_SHA}"
image="$functional_image"
echo "image=$image" >> "$GITHUB_OUTPUT"
echo "Shared Docker E2E image: \`$image\`" >> "$GITHUB_STEP_SUMMARY"
echo "bare_image=$bare_image" >> "$GITHUB_OUTPUT"
echo "functional_image=$functional_image" >> "$GITHUB_OUTPUT"
echo "Shared Docker E2E bare image: \`$bare_image\`" >> "$GITHUB_STEP_SUMMARY"
echo "Shared Docker E2E functional image: \`$functional_image\`" >> "$GITHUB_STEP_SUMMARY"
- name: Plan Docker E2E images
id: plan
uses: ./.github/actions/docker-e2e-plan
with:
mode: prepare
lanes: ${{ inputs.docker_lanes }}
include-release-path-suites: ${{ inputs.include_release_path_suites }}
include-openwebui: ${{ inputs.include_openwebui }}
hydrate-artifacts: "false"
- name: Setup Node environment
if: steps.plan.outputs.needs_package == '1'
uses: ./.github/actions/setup-node-env
with:
node-version: ${{ env.NODE_VERSION }}
pnpm-version: ${{ env.PNPM_VERSION }}
install-bun: "true"
- name: Pack OpenClaw package for Docker E2E
if: steps.plan.outputs.needs_package == '1'
shell: bash
run: |
set -euo pipefail
mkdir -p .artifacts/docker-e2e-package
node scripts/package-openclaw-for-docker.mjs \
--output-dir .artifacts/docker-e2e-package \
--output-name openclaw-current.tgz
- name: Upload OpenClaw Docker E2E package
if: steps.plan.outputs.needs_package == '1'
uses: actions/upload-artifact@v7
with:
name: docker-e2e-package
path: .artifacts/docker-e2e-package/openclaw-current.tgz
if-no-files-found: error
- name: Log in to GHCR
if: steps.plan.outputs.needs_e2e_image == '1'
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ github.token }}
- name: Check existing shared Docker E2E images
id: image_exists
if: steps.plan.outputs.needs_e2e_image == '1'
shell: bash
run: |
set -euo pipefail
bare_exists=0
functional_exists=0
needs_build=0
if [[ "${{ steps.plan.outputs.needs_bare_image }}" == "1" ]]; then
if docker manifest inspect "${{ steps.image.outputs.bare_image }}" >/dev/null 2>&1; then
bare_exists=1
echo "Shared Docker E2E bare image already exists: ${{ steps.image.outputs.bare_image }}"
else
needs_build=1
fi
fi
if [[ "${{ steps.plan.outputs.needs_functional_image }}" == "1" ]]; then
if docker manifest inspect "${{ steps.image.outputs.functional_image }}" >/dev/null 2>&1; then
functional_exists=1
echo "Shared Docker E2E functional image already exists: ${{ steps.image.outputs.functional_image }}"
else
needs_build=1
fi
fi
echo "bare_exists=$bare_exists" >> "$GITHUB_OUTPUT"
echo "functional_exists=$functional_exists" >> "$GITHUB_OUTPUT"
echo "needs_build=$needs_build" >> "$GITHUB_OUTPUT"
- name: Setup Docker builder
if: steps.image_exists.outputs.needs_build == '1'
uses: useblacksmith/setup-docker-builder@ac083cc84672d01c60d5e8561d0a939b697de542 # v1
- name: Build and push shared Docker E2E image
- name: Build and push bare Docker E2E image
if: steps.plan.outputs.needs_bare_image == '1' && steps.image_exists.outputs.bare_exists != '1'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
file: ./scripts/e2e/Dockerfile
target: build
target: bare
platforms: linux/amd64
cache-from: type=gha,scope=docker-e2e
cache-to: type=gha,mode=max,scope=docker-e2e
tags: ${{ steps.image.outputs.image }}
provenance: false
cache-from: type=gha,scope=docker-e2e-bare
cache-to: type=gha,mode=max,scope=docker-e2e-bare
tags: ${{ steps.image.outputs.bare_image }}
sbom: true
provenance: mode=max
push: true
- name: Build and push functional Docker E2E image
if: steps.plan.outputs.needs_functional_image == '1' && steps.image_exists.outputs.functional_exists != '1'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
file: ./scripts/e2e/Dockerfile
target: functional
build-contexts: |
openclaw_package=.artifacts/docker-e2e-package
platforms: linux/amd64
cache-from: |
type=gha,scope=docker-e2e-bare
type=gha,scope=docker-e2e-functional
cache-to: type=gha,mode=max,scope=docker-e2e-functional
tags: ${{ steps.image.outputs.functional_image }}
sbom: true
provenance: mode=max
push: true
validate_live_models_docker:

.gitignore

@@ -118,6 +118,8 @@ USER.md
!.agents/skills/openclaw-test-heap-leaks/**
!.agents/skills/openclaw-test-performance/
!.agents/skills/openclaw-test-performance/**
!.agents/skills/openclaw-testing/
!.agents/skills/openclaw-testing/**
!.agents/skills/optimizetests/
!.agents/skills/optimizetests/**
!.agents/skills/parallels-discord-roundtrip/


@@ -29,6 +29,7 @@ Telegraph style. Root rules only. Read scoped `AGENTS.md` before subtree work.
- Extension prod code: no core `src/**`, `src/plugin-sdk-internal/**`, other extension `src/**`, or relative outside package.
- Core/tests: no deep plugin internals (`extensions/*/src/**`, `onboard.js`). Use `api.ts`, SDK facade, generic contracts.
- Extension-owned behavior stays extension-owned: repair, detection, onboarding, auth/provider defaults, provider tools/settings.
- Owner boundary: fix owner-specific behavior in the owner module. Shared/core gets generic seams only; no owner ids, dependency strings, defaults, migrations, or recovery policy. If a bug names an extension or its dependency, start in that extension and add a generic core seam only when multiple owners need it.
- Legacy config repair: doctor/fix paths, not startup/load-time core migrations.
- Core test asserting extension-specific behavior: move to owner extension or generic contract test.
- New seams: backwards-compatible, documented, versioned. Third-party plugins exist.
@@ -50,7 +51,8 @@ Telegraph style. Root rules only. Read scoped `AGENTS.md` before subtree work.
- Extension tests: `pnpm test:extensions`, `pnpm test extensions`, `pnpm test extensions/<id>`.
- Targeted tests: `pnpm test <path-or-filter> [vitest args...]`; never raw `vitest`.
- Typecheck: `tsgo` lanes only (`pnpm tsgo*`, `pnpm check:test-types`); do not add `tsc --noEmit`, `typecheck`, `check:types`.
- Format/lint: `pnpm format:check`/`pnpm format`; `pnpm lint*` lanes.
- Formatting: use `oxfmt`, not Prettier. Prefer `pnpm format:check` / `pnpm format`; for targeted files use `pnpm exec oxfmt --check --threads=1 <files...>` or `pnpm exec oxfmt --write --threads=1 <files...>`.
- Linting: use repo wrappers (`pnpm lint:*`, `scripts/run-oxlint.mjs`); do not invoke generic JS formatters/lints unless a repo script uses them.
- Heavy checks: `OPENCLAW_LOCAL_CHECK=1`, mode `OPENCLAW_LOCAL_CHECK_MODE=throttled|full`; CI/shared use `OPENCLAW_LOCAL_CHECK=0`.
- Local first. Use repo `pnpm` lanes before Blacksmith/Testbox. Remote only for parity-only failures, secrets/services, or explicit ask.
@@ -58,6 +60,7 @@ Telegraph style. Root rules only. Read scoped `AGENTS.md` before subtree work.
- Triage: list first, hydrate few. Use bounded `gh --json --jq`; avoid repeated full comment scans.
- Automatic PR/issue discovery: skip maintainer-owned items unless directly relevant. Do not comment, close, label, retitle, rebase, fix up, or land them without Peter asking.
- PR scan/triage: no unsolicited PR comments/reviews. Report in chat only unless explicitly asked, or a close/duplicate action needs a reason comment.
- Search/dedupe: prefer `gh search issues 'repo:openclaw/openclaw is:open <terms>' --json number,title,state,updatedAt --limit 20`.
- GitHub search boolean text is fussy. If `OR` queries return empty, split exact terms and search title/body/comments separately before concluding no hits.
- PR shortlist: `gh pr list ...`; then `gh pr view <n> --json number,title,body,closingIssuesReferences,files,statusCheckRollup,reviewDecision`.
@@ -117,6 +120,7 @@ Telegraph style. Root rules only. Read scoped `AGENTS.md` before subtree work.
## Tests
- Vitest. Colocated `*.test.ts`; e2e `*.e2e.test.ts`; example models `sonnet-4.6`, `gpt-5.4`.
- Avoid brittle tests that grep workflow/docs strings for operator policy. Prefer executable behavior, parsed config/schema checks, or live run proof; put release/CI policy reminders in AGENTS/docs instead.
- Clean timers/env/globals/mocks/sockets/temp dirs/module state; `--isolate=false` safe.
- Hot tests: avoid per-test `vi.resetModules()` + heavy imports. Measure with `pnpm test:perf:imports <file>` / `pnpm test:perf:hotspots --limit N`.
- Seam depth: pure helper/contract unit tests; one integration smoke per boundary.
@@ -132,7 +136,7 @@ Telegraph style. Root rules only. Read scoped `AGENTS.md` before subtree work.
- Docs change with behavior/API. Use docs list/read_when hints; docs links per `docs/AGENTS.md`.
- Changelog user-facing only; pure test/internal usually no entry.
- Changelog placement: active version `### Changes`/`### Fixes`; every added entry must include at least one `Thanks @author` attribution, using credited GitHub username(s). Never add `Thanks @steipete`.
- Changelog placement: active version `### Changes`/`### Fixes`; every added entry must include at least one `Thanks @author` attribution, using credited GitHub username(s). Never add `Thanks @steipete` or `Thanks @codex`.
- Changelog bullets are always single-line. No wrapping/continuation across multiple lines. Long entries stay on one long line so dedupe, PR-ref, and credit-audit tooling work and so the visual style stays uniform.
## Git


@@ -6,6 +6,22 @@ Docs: https://docs.openclaw.ai
### Fixes
- Codex harness: normalize cached input tokens before session/context accounting so prompt cache reads are not double-counted in `/status`, `session_status`, or persisted `sessionEntry.totalTokens`. Fixes #69298. Thanks @richardmqq.
- Hooks/session-memory: use the host local timezone for memory filenames, fallback timestamp slugs, and markdown headers instead of UTC dates. Fixes #46703. (#46721) Thanks @Astro-Han.
- Feishu: extract quoted/replied interactive-card text across schema 1.0, schema 2.0, i18n, template-variable, and post-format fallback shapes without carrying broad generated/config churn from related parser experiments. (#38776, #60383, #42218, #45936) Thanks @lishuaigit, @lskun, @just2gooo, and @Br1an67.
- Exec approvals: accept a symlinked `OPENCLAW_HOME` as the trusted approvals root while still rejecting symlinked `.openclaw` path components below it. (#64663) Thanks @FunJim.
- Logging: add top-level `hostname`, flattened `message`, and available `agent_id`, `session_id`, and `channel` fields to file-log JSONL records for multi-agent filtering without removing existing structured log arguments. Fixes #51075. Thanks @stevengonsalvez.
- ACP: route server logs to stderr before Gateway config/bootstrap work so ACP stdout remains JSON-RPC only for IDE integrations. Fixes #49060. Thanks @Hollychou924.
- Logging: propagate internal request trace scopes through Gateway HTTP requests and WebSocket frames so file logs, diagnostic events, agent run traces, model-call traces, OTEL spans, and trusted provider `traceparent` headers share a correlatable `traceId` without logging raw request or model content. Fixes #40353. Thanks @liangruochong44-ui.
- Diagnostics/OTEL: capture privacy-safe model-call request payload bytes, streamed response bytes, first-response latency, and total duration in diagnostic events, plugin hooks, stability snapshots, and OTEL model-call spans/metrics without logging raw model content. Fixes #33832. Thanks @wwh830.
- Logging: write validated diagnostic trace context as top-level `traceId`, `spanId`, `parentSpanId`, and `traceFlags` fields in file-log JSONL records so traced requests and model calls are easier to correlate in log processors. Refs #40353. Thanks @liangruochong44-ui.
- Logging/sessions: apply configured redaction patterns to persisted session transcript text and accept escaped character classes in safe custom redaction regexes, so transcript JSONL no longer keeps matching sensitive text in the clear. Fixes #42982. Thanks @panpan0000.
- Providers/Ollama: honor `/api/show` capabilities when registering local models so non-tool Ollama models no longer receive the agent tool surface, and keep native Ollama thinking opt-in instead of enabling it by default. Fixes #64710 and duplicate #65343. Thanks @yuan-b, @netherby, @xilopaint, and @Diyforfun2026.
- Providers/Ollama: expose native Ollama thinking effort levels so `/think max` is accepted for reasoning-capable Ollama models and maps to Ollama's highest supported `think` effort. Fixes #71584. Thanks @g0st1n.
- Agents/Ollama: validate explicit `--thinking max` against catalog-discovered Ollama reasoning metadata so local agent runs accept the same native thinking levels shown in the model catalog. Fixes #71584. Thanks @g0st1n.
- Docker/QA: add observability coverage to the normal Docker aggregate so QA-lab OTEL and Prometheus diagnostics run inside Docker. Thanks @vincentkoc.
- Auto-reply: poison inbound message dedupe after replay-unsafe provider/runtime failures so retries stay safe before visible progress but cannot duplicate messages after block output, tool side effects, or session progress. Fixes #69303; keeps #58549 and #64606 as duplicate validation. Thanks @martingarramon, @NikolaFC, and @zeroth-blip.
- Agents/model fallback: jump directly to a known later live-session model redirect instead of walking unrelated fallback candidates, while preserving the already-landed live-session/fallback loop guard. Fixes #57471; related loop family already closed via #58496. Thanks @yuxiaoyang2007-prog.
- Gateway/Bonjour: keep @homebridge/ciao cancellation handlers registered across advertiser restarts so late probing cancellations cannot crash Linux and other mDNS-churned gateways. Thanks @codex.
- Plugins/startup: load the default `memory-core` slot during Gateway startup when permitted so active-memory recall can call `memory_search` and `memory_get` without requiring an explicit `plugins.slots.memory` entry, while preserving `plugins.slots.memory: "none"`. Thanks @codex.
- Plugins/CLI: prefer native require for compiled bundled plugin JavaScript before jiti so read-only config, status, device, and node commands avoid unnecessary transform overhead on slow hosts. Fixes #62842. Thanks @Effet.
@@ -14,8 +30,14 @@ Docs: https://docs.openclaw.ai
- Plugins/CLI: refresh the persisted registry after managed plugin files are removed so ClawHub uninstall cannot leave stale `plugins list` entries. Thanks @codex.
- Plugins/CLI: make plugin install and uninstall config writes conflict-aware, clear stale denylist entries on explicit reinstall/removal, and delete managed plugin files only after config/index commit succeeds. Thanks @codex.
- Plugins: fail `plugins update` when tracked plugin or hook updates error, keep bundled runtime-dependency repair behind restrictive allowlists, and reject package installs with unloadable extension entries. Thanks @codex.
- WebChat/Control UI: support non-video file attachments in chat uploads while preserving the existing image attachment path and MIME-sniff fallback for generic image uploads. (#70947) Thanks @IAMSamuelRodda.
- Skills/memory: restore Chokidar v5 hot reloads by watching concrete skill and memory roots with filters, including SKILL.md removals and deleted skill folders without broad workspace recursion. Fixes #27404, #33585, and #41606. Thanks @shelvenzhou, @08820048, and @rocke2020.
- Gateway/chat: keep duplicate attachment-backed `chat.send` retries with the same idempotency key on the documented in-flight path so aborts still target the real active run. Fixes #70139. Thanks @Feelw00.
- Plugins: share package entrypoint resolution between install and discovery, reject mismatched `runtimeExtensions`, and cache bundled runtime-dependency manifest reads during scans. Thanks @codex.
- WhatsApp/Web: keep quiet but healthy linked-device sessions connected by basing the watchdog on WhatsApp Web transport activity, while retaining a longer app-silence cap so frame activity cannot mask a stuck session forever. Fixes #70678; carries forward the focused #71466 approach and keeps #63939 as related configurable-timeout follow-up. Thanks @vincentkoc and @oromeis.
- Discord/gateway: count failed health-monitor restart attempts toward cooldown and hourly caps, and evict stale account lifecycle state during channel reloads so repeated Discord gateway recovery cannot loop on old status. Fixes #38596. (#40413) Thanks @jellyAI-dev and @vashquez.
- Cron/context engine: run isolated cron jobs under run-scoped context-engine session keys so prior runs of the same job are not inherited unless the job is explicitly session-bound. (#72292) Thanks @jalehman.
- Control UI: localize command palette labels, categories, skill shortcuts, footer hints, and connect-command copy labels while preserving localized command palette search matching. (#61130, #61119) Thanks @rubensfox20.
## 2026.4.26
@@ -28,6 +50,7 @@ Docs: https://docs.openclaw.ai
- Onboarding/models: keep skip-auth and provider-scoped model picker prompts off the full global model catalog path, and cache provider catalog hook resolution so setup no longer stalls after auth on large plugin registries. Thanks @shakkernerd.
- Gateway/Bonjour: suppress known @homebridge/ciao cancellation and network assertion failures through scoped process handlers so malformed mDNS packets or restricted VPS networking disable/restart Bonjour instead of crashing the gateway. Fixes #67578. Thanks @zenassist26-create.
- Discord: keep late clicks on already-resolved exec approval buttons quiet when elevated mode auto-resolved the request, while still surfacing real approval submission failures. Fixes #66906. Thanks @rlerikse.
- Telegram: send a fresh final message for long-lived preview-streamed replies so the visible Telegram timestamp reflects completion time instead of the preview creation time. Thanks @rubencu.
## 2026.4.25


@@ -9,22 +9,19 @@
# bundled plugin workspace tree, so the main build layer is not invalidated by
# unrelated plugin source changes.
#
# Two runtime variants:
# Default (bookworm): docker build .
# Slim (bookworm-slim): docker build --build-arg OPENCLAW_VARIANT=slim .
# Build stages use full bookworm; the runtime image is always bookworm-slim.
ARG OPENCLAW_EXTENSIONS=""
ARG OPENCLAW_VARIANT=default
ARG OPENCLAW_BUNDLED_PLUGIN_DIR=extensions
ARG OPENCLAW_DOCKER_APT_UPGRADE=1
ARG OPENCLAW_NODE_BOOKWORM_IMAGE="node:24-bookworm@sha256:3a09aa6354567619221ef6c45a5051b671f953f0a1924d1f819ffb236e520e6b"
ARG OPENCLAW_NODE_BOOKWORM_DIGEST="sha256:3a09aa6354567619221ef6c45a5051b671f953f0a1924d1f819ffb236e520e6b"
ARG OPENCLAW_NODE_BOOKWORM_SLIM_IMAGE="node:24-bookworm-slim@sha256:e8e2e91b1378f83c5b2dd15f0247f34110e2fe895f6ca7719dbb780f929368eb"
ARG OPENCLAW_NODE_BOOKWORM_SLIM_DIGEST="sha256:e8e2e91b1378f83c5b2dd15f0247f34110e2fe895f6ca7719dbb780f929368eb"
# Base images are pinned to SHA256 digests for reproducible builds.
# Trade-off: digests must be updated manually when upstream tags move.
# To update, run: docker buildx imagetools inspect node:24-bookworm (or podman)
# and replace the digest below with the current multi-arch manifest list entry.
# Dependabot refreshes these blessed digests; release builds consume the
# reviewed base snapshot instead of mutating distro state on every build.
# To update, run: docker buildx imagetools inspect node:24-bookworm and
# node:24-bookworm-slim (or podman) and replace the digests below with the
# current multi-arch manifest list entries.
FROM ${OPENCLAW_NODE_BOOKWORM_IMAGE} AS ext-deps
ARG OPENCLAW_EXTENSIONS
@@ -125,22 +122,15 @@ RUN printf 'packages:\n - .\n - ui\n' > /tmp/pnpm-workspace.runtime.yaml && \
node scripts/postinstall-bundled-plugins.mjs && \
find dist -type f \( -name '*.d.ts' -o -name '*.d.mts' -o -name '*.d.cts' -o -name '*.map' \) -delete
# ── Runtime base images ─────────────────────────────────────────
FROM ${OPENCLAW_NODE_BOOKWORM_IMAGE} AS base-default
ARG OPENCLAW_NODE_BOOKWORM_DIGEST
LABEL org.opencontainers.image.base.name="docker.io/library/node:24-bookworm" \
org.opencontainers.image.base.digest="${OPENCLAW_NODE_BOOKWORM_DIGEST}"
FROM ${OPENCLAW_NODE_BOOKWORM_SLIM_IMAGE} AS base-slim
# ── Runtime base image ─────────────────────────────────────────
FROM ${OPENCLAW_NODE_BOOKWORM_SLIM_IMAGE} AS base-runtime
ARG OPENCLAW_NODE_BOOKWORM_SLIM_DIGEST
LABEL org.opencontainers.image.base.name="docker.io/library/node:24-bookworm-slim" \
org.opencontainers.image.base.digest="${OPENCLAW_NODE_BOOKWORM_SLIM_DIGEST}"
# ── Stage 3: Runtime ────────────────────────────────────────────
FROM base-${OPENCLAW_VARIANT}
ARG OPENCLAW_VARIANT
FROM base-runtime
ARG OPENCLAW_BUNDLED_PLUGIN_DIR
ARG OPENCLAW_DOCKER_APT_UPGRADE
# OCI base-image metadata for downstream image consumers.
# If you change these annotations, also update:
@@ -155,16 +145,10 @@ LABEL org.opencontainers.image.source="https://github.com/openclaw/openclaw" \
WORKDIR /app
# Install system utilities present in bookworm but missing in bookworm-slim.
# On the full bookworm image these are already installed (apt-get is a no-op).
# Smoke workflows can opt out of distro upgrades to cut repeated CI time while
# keeping the default runtime image behavior unchanged.
# Install runtime system utilities missing from bookworm-slim.
RUN --mount=type=cache,id=openclaw-bookworm-apt-cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,id=openclaw-bookworm-apt-lists,target=/var/lib/apt,sharing=locked \
apt-get update && \
if [ "${OPENCLAW_DOCKER_APT_UPGRADE}" != "0" ]; then \
DEBIAN_FRONTEND=noninteractive apt-get upgrade -y --no-install-recommends; \
fi && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
procps hostname curl git lsof openssl


@@ -7,7 +7,6 @@ ENV DEBIAN_FRONTEND=noninteractive
RUN --mount=type=cache,id=openclaw-sandbox-bookworm-apt-cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,id=openclaw-sandbox-bookworm-apt-lists,target=/var/lib/apt,sharing=locked \
apt-get update \
&& apt-get upgrade -y --no-install-recommends \
&& apt-get install -y --no-install-recommends \
bash \
ca-certificates \


@@ -7,7 +7,6 @@ ENV DEBIAN_FRONTEND=noninteractive
RUN --mount=type=cache,id=openclaw-sandbox-bookworm-apt-cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,id=openclaw-sandbox-bookworm-apt-lists,target=/var/lib/apt,sharing=locked \
apt-get update \
&& apt-get upgrade -y --no-install-recommends \
&& apt-get install -y --no-install-recommends \
bash \
ca-certificates \


@@ -24,7 +24,6 @@ ENV PATH=${BUN_INSTALL_DIR}/bin:${BREW_INSTALL_DIR}/bin:${BREW_INSTALL_DIR}/sbin
RUN --mount=type=cache,id=openclaw-sandbox-common-apt-cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,id=openclaw-sandbox-common-apt-lists,target=/var/lib/apt,sharing=locked \
apt-get update \
&& apt-get upgrade -y --no-install-recommends \
&& apt-get install -y --no-install-recommends ${PACKAGES}
RUN if [ "${INSTALL_PNPM}" = "1" ]; then npm install -g pnpm; fi


@@ -6,9 +6,9 @@ services:
TERM: xterm-256color
OPENCLAW_GATEWAY_TOKEN: ${OPENCLAW_GATEWAY_TOKEN:-}
OPENCLAW_ALLOW_INSECURE_PRIVATE_WS: ${OPENCLAW_ALLOW_INSECURE_PRIVATE_WS:-}
# Docker bridge networks usually do not carry mDNS multicast reliably.
# Set OPENCLAW_DISABLE_BONJOUR=0 only on host/macvlan/mDNS-capable networks.
OPENCLAW_DISABLE_BONJOUR: ${OPENCLAW_DISABLE_BONJOUR:-1}
# Empty means auto: Bonjour disables itself in detected containers.
# Set 0 only on host/macvlan/mDNS-capable networks; set 1 to force off.
OPENCLAW_DISABLE_BONJOUR: ${OPENCLAW_DISABLE_BONJOUR:-}
# OpenTelemetry export is outbound OTLP/HTTP from the Gateway. Prometheus
# uses the existing authenticated Gateway route; it does not need a port.
OTEL_EXPORTER_OTLP_ENDPOINT: ${OTEL_EXPORTER_OTLP_ENDPOINT:-}


@@ -1,4 +1,4 @@
7fa6e35bb9f9d3096d6281f141488be0dcfe15de40dc4f5c0305eb1ff2bc60b6 config-baseline.json
5f5fb87fd46f9cbb84d8af17e00ae3c4b74062e8ad517bc2260ba83da2e9014f config-baseline.core.json
3e6dd8292d9350b0ccc243f81f7b6e95494fc769c01c084d8d6d6e9e1f668a14 config-baseline.json
e040e5818afe66d71fc8a7ae1653f1e8c252cc5b51480ef3b4ae1269682b9ade config-baseline.core.json
7cd9c908f066c143eab2a201efbc9640f483ab28bba92ddeca1d18cc2b528bc3 config-baseline.channel.json
f9e0174988718959fe1923a54496ec5b9262721fe1e7306f32ccb1316d9d9c3f config-baseline.plugin.json
74b74cb18ac37c0acaa765f398f1f9edbcee4c43567f02d45c89598a1e13afb4 config-baseline.plugin.json


@@ -1,2 +1,2 @@
fd941e0485a92ebb8256cf2256330b58c2d5bd94189f4a05d7394353ef7bed88 plugin-sdk-api-baseline.json
11ef8362518a0d9f221dc1958b25db46956d1916f278b53e52199bf6c2cbc65b plugin-sdk-api-baseline.jsonl
21914ef8c5840e0defc36d571834dc28a92d6d5ca2d42a088c33b4de681e836a plugin-sdk-api-baseline.json
3f22e6af0dad3433d25d996802d7436a3cc0e68bc86ecaf813a22e2b4e5333eb plugin-sdk-api-baseline.jsonl


@@ -173,7 +173,7 @@ openclaw hooks enable <hook-name>
### session-memory details
Extracts the last 15 user/assistant messages, generates a descriptive filename slug via LLM, and saves to `<workspace>/memory/YYYY-MM-DD-slug.md`. Requires `workspace.dir` to be configured.
Extracts the last 15 user/assistant messages, generates a descriptive filename slug via LLM, and saves to `<workspace>/memory/YYYY-MM-DD-slug.md` using the host local date. Requires `workspace.dir` to be configured.
<a id="bootstrap-extra-files"></a>


@@ -298,8 +298,8 @@ curl "https://api.telegram.org/bot<bot_token>/getUpdates"
For text-only replies:
- DM: OpenClaw keeps the same preview message and performs a final edit in place (no second message)
- group/topic: OpenClaw keeps the same preview message and performs a final edit in place (no second message)
- short DM/group/topic previews: OpenClaw keeps the same preview message and performs a final edit in place
- previews older than about one minute: OpenClaw sends the completed reply as a fresh final message and then cleans up the preview, so Telegram's visible timestamp reflects completion time instead of the preview creation time
For complex replies (for example media payloads), OpenClaw falls back to normal final delivery and then cleans up the preview message.
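
The text-only rule above can be sketched as a small decision function. This is an illustration only: the names `FinalDelivery`, `chooseFinalDelivery`, and `PREVIEW_FRESH_FINAL_AFTER_MS` are hypothetical, and the 60-second threshold is an assumption standing in for "about one minute".

```typescript
// Hypothetical sketch; not the actual OpenClaw implementation.
type FinalDelivery = "edit-in-place" | "fresh-message";

// Assumed value: the docs only say "about one minute".
const PREVIEW_FRESH_FINAL_AFTER_MS = 60_000;

function chooseFinalDelivery(
  previewCreatedAtMs: number,
  nowMs: number,
  thresholdMs: number = PREVIEW_FRESH_FINAL_AFTER_MS,
): FinalDelivery {
  // Age of the preview message at the moment the reply completes.
  const ageMs = nowMs - previewCreatedAtMs;
  // Old previews get a fresh final message so the visible timestamp
  // reflects completion time; young previews are edited in place.
  return ageMs >= thresholdMs ? "fresh-message" : "edit-in-place";
}
```

In this sketch, the caller would still be responsible for cleaning up the preview message after sending a fresh final message.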


@@ -146,6 +146,7 @@ OpenClaw recommends running WhatsApp on a separate number when possible. (The ch
## Runtime model
- Gateway owns the WhatsApp socket and reconnect loop.
- The reconnect watchdog uses WhatsApp Web transport activity, not only inbound app-message volume, so a quiet linked-device session is not restarted solely because nobody has sent a message recently. A longer application-silence cap still forces a reconnect if transport frames keep arriving but no application messages are handled for the watchdog window.
- Outbound sends require an active WhatsApp listener for the target account.
- Status and broadcast chats are ignored (`@status`, `@broadcast`).
- Direct chats use DM session rules (`session.dmScope`; default `main` collapses DMs to the agent main session).
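
The watchdog rule in the list above could look roughly like the following. This is a minimal sketch under stated assumptions: `WatchdogState`, `WatchdogLimits`, and `shouldReconnect` are hypothetical names, and no real timeout values are implied.

```typescript
// Hypothetical sketch; field names and limits are assumptions.
interface WatchdogState {
  socketOpen: boolean;
  lastTransportActivityMs: number; // last WhatsApp Web transport frame seen
  lastAppActivityMs: number; // last application-level message handled
}

interface WatchdogLimits {
  transportTimeoutMs: number; // normal timeout, based on transport activity
  appSilenceCapMs: number; // longer safety cap on application silence
}

function shouldReconnect(
  state: WatchdogState,
  limits: WatchdogLimits,
  nowMs: number,
): boolean {
  // A closed socket always needs a reconnect.
  if (!state.socketOpen) return true;
  // Transport went quiet: the session is likely stuck.
  if (nowMs - state.lastTransportActivityMs > limits.transportTimeoutMs) return true;
  // Frames keep arriving but nothing is handled: enforce the longer cap.
  return nowMs - state.lastAppActivityMs > limits.appSilenceCapMs;
}
```

The two separate limits are what let a quiet but healthy session stay connected while still bounding how long frame activity can mask a stuck session.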
@@ -510,6 +511,10 @@ Behavior notes:
<Accordion title="Linked but disconnected / reconnect loop">
Symptom: linked account with repeated disconnects or reconnect attempts.
Quiet accounts can stay connected past the normal message timeout; the watchdog
restarts when WhatsApp Web transport activity stops, the socket closes, or
application-level activity stays silent beyond the longer safety window.
Fix:
```bash

File diff suppressed because one or more lines are too long


@@ -21,8 +21,12 @@ calls paired with their matching `toolResult` entries. If a split point lands
inside a tool block, OpenClaw moves the boundary so the pair stays together and
the current unsummarized tail is preserved.
The full conversation history stays on disk. Compaction only changes what the
model sees on the next turn.
By default, OpenClaw also rewrites the session transcript after compaction and
removes the message entries that were summarized. The persisted summary and
recent unsummarized tail remain on disk. Set
`agents.defaults.compaction.truncateAfterCompaction` to `false` if you need the
older behavior where compaction only changed what the model saw on the next
turn and left the full transcript intact.
## Auto-compaction


@@ -265,6 +265,7 @@ That means fallback retries have to coordinate with live model switching:
- System-driven model changes such as fallback rotation, heartbeat overrides, or compaction never mark a pending live switch on their own.
- Before a fallback retry starts, the reply runner persists the selected fallback override fields to the session entry.
- Live-session reconciliation prefers persisted session overrides over stale runtime model fields.
- If a live-switch error points at a later candidate in the active fallback chain, OpenClaw jumps directly to that selected model instead of walking unrelated candidates first.
- If the fallback attempt fails, the runner rolls back only the override fields it wrote, and only if they still match that failed candidate.
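
The direct-jump rule in the list above can be illustrated with a small index calculation. This is a sketch, not the runner's actual code: `nextFallbackIndex` and its parameters are hypothetical, and the fallback chain is modeled as a plain array of model ids.

```typescript
// Hypothetical sketch; the real runner's data structures are not shown here.
function nextFallbackIndex(
  chain: readonly string[],
  currentIndex: number,
  redirectTarget?: string,
): number {
  if (redirectTarget !== undefined) {
    // Only consider candidates later in the chain than the current one.
    const jumpIndex = chain.indexOf(redirectTarget, currentIndex + 1);
    if (jumpIndex !== -1) return jumpIndex; // jump directly, skipping unrelated candidates
  }
  // No usable redirect: walk to the next candidate as usual.
  // A result equal to chain.length means the chain is exhausted.
  return currentIndex + 1;
}
```

Restricting the search to later candidates is an assumption made here so that a redirect can never move the retry backwards in the chain.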
This prevents the classic race:


@@ -65,6 +65,15 @@ model calls must not export `StreamAbandoned` on successful turns; raw diagnosti
`openclaw.content.*` attributes must stay out of the trace. It writes
`otel-smoke-summary.json` next to the QA suite artifacts.
The normal Docker aggregate and release-path core chunk also run an
observability lane. It reuses the shared package-installed functional Docker
image, mounts the QA harness files read-only, runs the OTEL trace smoke inside
the container, then runs the `docker-prometheus-smoke` QA scenario with the
`diagnostics-prometheus` plugin enabled. Set
`OPENCLAW_DOCKER_OBSERVABILITY_LOOPS=<count>` to repeat both checks inside one
Docker run while preserving per-loop artifacts under
`.artifacts/docker-observability/...`.
For a transport-real Matrix smoke lane, run:
```bash


@@ -152,6 +152,7 @@ Legacy key migration:
Telegram:
- Uses `sendMessage` + `editMessageText` preview updates across DMs and group/topics.
- Sends a fresh final message instead of editing in place when a preview has been visible for about one minute, then cleans up the preview so Telegram's timestamp reflects reply completion.
- Preview streaming is skipped when Telegram block streaming is explicitly enabled (to avoid double-streaming).
- `/reasoning stream` can write reasoning to preview.


@@ -179,11 +179,10 @@ openclaw plugins disable bonjour
## Docker gotchas
Bundled Docker Compose sets `OPENCLAW_DISABLE_BONJOUR=1` for the Gateway service
by default. Docker bridge networks usually do not forward mDNS multicast
(`224.0.0.251:5353`) between the container and the LAN, so leaving Bonjour on can
produce repeated ciao `probing` or `announcing` failures without making discovery
work.
The bundled Bonjour plugin auto-disables LAN multicast advertising in detected
containers when `OPENCLAW_DISABLE_BONJOUR` is unset. Docker bridge networks
usually do not forward mDNS multicast (`224.0.0.251:5353`) between the container
and the LAN, so advertising from the container rarely makes discovery work.
Important gotchas:
@@ -193,16 +192,16 @@ Important gotchas:
`OPENCLAW_GATEWAY_BIND=lan` so the published host port can work.
- Disabling Bonjour does not disable wide-area DNS-SD. Use wide-area discovery
or Tailnet when the Gateway and node are not on the same LAN.
- Reusing the same `OPENCLAW_CONFIG_DIR` outside Docker does not inherit the
Compose default unless the environment still sets `OPENCLAW_DISABLE_BONJOUR`.
- Reusing the same `OPENCLAW_CONFIG_DIR` outside Docker does not persist the
container auto-disable policy.
- Set `OPENCLAW_DISABLE_BONJOUR=0` only for host networking, macvlan, or another
network where mDNS multicast is known to pass.
network where mDNS multicast is known to pass; set it to `1` to force-disable.
## Troubleshooting disabled Bonjour
If a node no longer auto-discovers the Gateway after Docker setup:
1. Confirm whether the Gateway is intentionally suppressing LAN advertising:
1. Confirm whether the Gateway is running in auto, forced-on, or forced-off mode:
```bash
docker compose config | grep OPENCLAW_DISABLE_BONJOUR
@@ -239,9 +238,9 @@ If a node no longer auto-discovers the Gateway after Docker setup:
container bridges, WSL, or interface churn can leave the ciao advertiser in a
non-announced state. OpenClaw retries a few times and then disables Bonjour
for the current Gateway process instead of restarting the advertiser forever.
- **Docker bridge networking**: bundled Docker Compose disables Bonjour by
default with `OPENCLAW_DISABLE_BONJOUR=1`. Set it to `0` only for host,
macvlan, or another mDNS-capable network.
- **Docker bridge networking**: Bonjour auto-disables in detected containers.
Set `OPENCLAW_DISABLE_BONJOUR=0` only for host, macvlan, or another
mDNS-capable network.
- **Sleep / interface churn**: macOS may temporarily drop mDNS results; retry.
- **Browse works but resolve fails**: keep machine names simple (avoid emojis or
punctuation), then restart the Gateway. The service instance name derives from
@@ -260,7 +259,8 @@ sequences (e.g. spaces become `\032`).
- `openclaw plugins disable bonjour` disables LAN multicast advertising by disabling the bundled plugin.
- `openclaw plugins enable bonjour` restores the default LAN discovery plugin.
- `OPENCLAW_DISABLE_BONJOUR=1` disables LAN multicast advertising without changing plugin config; accepted truthy values are `1`, `true`, `yes`, and `on` (legacy: `OPENCLAW_DISABLE_BONJOUR`).
- Docker Compose sets `OPENCLAW_DISABLE_BONJOUR=1` by default for bridge networking; override with `OPENCLAW_DISABLE_BONJOUR=0` only when mDNS multicast is available.
- `OPENCLAW_DISABLE_BONJOUR=0` forces LAN multicast advertising on, including inside detected containers; accepted falsy values are `0`, `false`, `no`, and `off`.
- When `OPENCLAW_DISABLE_BONJOUR` is unset, Bonjour advertises on normal hosts and auto-disables inside detected containers.
- `gateway.bind` in `~/.openclaw/openclaw.json` controls the Gateway bind mode.
- `OPENCLAW_SSH_PORT` overrides the SSH port when `sshPort` is advertised (legacy: `OPENCLAW_SSH_PORT`).
- `OPENCLAW_TAILNET_DNS` publishes a MagicDNS hint in TXT when mDNS full mode is enabled (legacy: `OPENCLAW_TAILNET_DNS`).
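
The three `OPENCLAW_DISABLE_BONJOUR` states described above (forced off, forced on, auto) can be sketched as a small resolver. This is an illustration only, not OpenClaw's implementation: `resolveBonjourMode` and `shouldAdvertise` are hypothetical names, and the case-insensitive matching and the fallback for unrecognized values are assumptions.

```typescript
// Hypothetical helpers; only the variable name and accepted values come from the docs.
type BonjourMode = "forced-off" | "forced-on" | "auto";

const TRUTHY = new Set(["1", "true", "yes", "on"]);
const FALSY = new Set(["0", "false", "no", "off"]);

function resolveBonjourMode(raw: string | undefined): BonjourMode {
  // Assumption: values are trimmed and compared case-insensitively.
  const value = raw?.trim().toLowerCase();
  if (value !== undefined && TRUTHY.has(value)) return "forced-off";
  if (value !== undefined && FALSY.has(value)) return "forced-on";
  // Assumption: unrecognized values behave like an unset variable.
  return "auto";
}

function shouldAdvertise(raw: string | undefined, inContainer: boolean): boolean {
  const mode = resolveBonjourMode(raw);
  if (mode === "forced-off") return false;
  if (mode === "forced-on") return true;
  // Auto mode: advertise on normal hosts, stay quiet in detected containers.
  return !inContainer;
}
```

Under these assumptions, `shouldAdvertise(undefined, true)` is `false`, while `shouldAdvertise("0", true)` is `true` because an explicit falsy value overrides container detection.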


@@ -859,6 +859,7 @@ Notes:
- Set `logging.file` for a stable path.
- `consoleLevel` bumps to `debug` when `--verbose`.
- `maxFileBytes`: maximum active log file size in bytes before rotation (positive integer; default: `104857600` = 100 MB). OpenClaw keeps up to five numbered archives beside the active file.
- `redactSensitive` / `redactPatterns`: best-effort masking for console output, file logs, OTLP log records, and persisted session transcript text.
---


@@ -86,9 +86,9 @@ Security notes:
Disable/override:
- `OPENCLAW_DISABLE_BONJOUR=1` disables advertising.
- Docker Compose defaults `OPENCLAW_DISABLE_BONJOUR=1` because bridge networks
usually do not carry mDNS multicast reliably; use `0` only on host, macvlan,
or another mDNS-capable network.
- When `OPENCLAW_DISABLE_BONJOUR` is unset, Bonjour advertises on normal hosts
and auto-disables inside detected containers. Use `0` only on host, macvlan,
or another mDNS-capable network; use `1` to force-disable.
- `gateway.bind` in `~/.openclaw/openclaw.json` controls the Gateway bind mode.
- `OPENCLAW_SSH_PORT` overrides the SSH port advertised when `sshPort` is emitted.
- `OPENCLAW_TAILNET_DNS` publishes a `tailnetDns` hint (MagicDNS).


@@ -52,10 +52,12 @@ You can tune console verbosity independently via:
- `logging.consoleLevel` (default `info`)
- `logging.consoleStyle` (`pretty` | `compact` | `json`)
## Tool summary redaction
## Redaction
Verbose tool summaries (e.g. `🛠️ Exec: ...`) can mask sensitive tokens before they hit the
console stream. This is **tools-only** and does not alter file logs.
OpenClaw can mask sensitive tokens before log or transcript output leaves the
process. The same redaction policy is applied at console, file-log, OTLP
log-record, and session transcript text sinks, so matching secret values are
masked before JSONL lines or messages are written to disk.
- `logging.redactSensitive`: `off` | `tools` (default: `tools`)
- `logging.redactPatterns`: array of regex strings (overrides defaults)
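
To illustrate how an array of regex strings such as `logging.redactPatterns` could be applied before text reaches a sink, here is a minimal sketch. The function names, the `***` mask, and the sample patterns are hypothetical; OpenClaw's actual mask format and default pattern set are not shown in this document.

```typescript
// Hypothetical sketch: function names and the mask text are not OpenClaw APIs.
function compilePatterns(sources: string[]): RegExp[] {
  // The "g" flag makes replace() mask every match, not only the first one.
  return sources.map((source) => new RegExp(source, "g"));
}

function redactText(text: string, patterns: RegExp[], mask = "***"): string {
  // Apply each pattern in turn to the already-masked output.
  return patterns.reduce((out, re) => out.replace(re, mask), text);
}

// Example patterns for an API-key-like token and an internal hostname.
const samplePatterns = compilePatterns(["sk-[A-Za-z0-9]{8,}", "internal\\.example\\.com"]);
const redactedLine = redactText("token=sk-abcdef123456 host=internal.example.com", samplePatterns);
// redactedLine is "token=*** host=***"
```

Because the docs describe redaction as best-effort, a sketch like this only covers text that the patterns actually match; identifiers and binary payload fields would pass through untouched.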


@@ -147,9 +147,17 @@ When any subkey is enabled, model and tool spans get bounded, redacted
- **Traces:** `diagnostics.otel.sampleRate` (root-span only, `0.0` drops all,
`1.0` keeps all).
- **Metrics:** `diagnostics.otel.flushIntervalMs` (minimum `1000`).
- **Logs:** OTLP logs respect `logging.level` (file log level). Console
redaction does **not** apply to OTLP logs. High-volume installs should
prefer OTLP collector sampling/filtering over local sampling.
- **Logs:** OTLP logs respect `logging.level` (file log level). They use the
diagnostic log-record redaction path, not console formatting. High-volume
installs should prefer OTLP collector sampling/filtering over local sampling.
- **File-log correlation:** JSONL file logs include top-level `traceId`,
`spanId`, `parentSpanId`, and `traceFlags` when the log call carries a valid
diagnostic trace context, which lets log processors join local log lines with
exported spans.
- **Request correlation:** Gateway HTTP requests and WebSocket frames create an
internal request trace scope. Logs and diagnostic events inside that scope
inherit the request trace by default, while agent run and model-call spans are
created as children so provider `traceparent` headers stay on the same trace.
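
Provider `traceparent` headers follow the W3C Trace Context layout `version-traceid-parentid-flags`. The sketch below splits such a header so its ids can be compared with the `traceId` and `spanId` fields in file logs. `parseTraceparent` is a hypothetical helper, and it assumes the log fields hold the same lowercase hex strings as the header.

```typescript
// Hypothetical helper; handles the four-field shape used by version 00 headers.
interface TraceParent {
  version: string;
  traceId: string;
  spanId: string;
  traceFlags: string;
}

function parseTraceparent(header: string): TraceParent | undefined {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header.trim());
  if (!match) return undefined;
  const [, version, traceId, spanId, traceFlags] = match;
  // All-zero trace ids and span ids are invalid in W3C Trace Context.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return undefined;
  return { version, traceId, spanId, traceFlags };
}
```

A log processor could then look up every JSONL line whose `traceId` equals the parsed value to join local logs with the exported spans.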
## Exported metrics
@@ -161,6 +169,10 @@ When any subkey is enabled, model and tool spans get bounded, redacted
- `openclaw.context.tokens` (histogram, attrs: `openclaw.context`, `openclaw.channel`, `openclaw.provider`, `openclaw.model`)
- `gen_ai.client.token.usage` (histogram, GenAI semantic-conventions metric, attrs: `gen_ai.token.type` = `input`/`output`, `gen_ai.provider.name`, `gen_ai.operation.name`, `gen_ai.request.model`)
- `gen_ai.client.operation.duration` (histogram, seconds, GenAI semantic-conventions metric, attrs: `gen_ai.provider.name`, `gen_ai.operation.name`, `gen_ai.request.model`, optional `error.type`)
- `openclaw.model_call.duration_ms` (histogram, attrs: `openclaw.provider`, `openclaw.model`, `openclaw.api`, `openclaw.transport`)
- `openclaw.model_call.request_bytes` (histogram, UTF-8 byte size of the final model request payload; no raw payload content)
- `openclaw.model_call.response_bytes` (histogram, UTF-8 byte size of streamed model response events; no raw response content)
- `openclaw.model_call.time_to_first_byte_ms` (histogram, elapsed time before the first streamed response event)
### Message flow
@@ -212,6 +224,7 @@ When any subkey is enabled, model and tool spans get bounded, redacted
- `openclaw.model.call`
- `gen_ai.system` by default, or `gen_ai.provider.name` when the latest GenAI semantic conventions are opted in
- `gen_ai.request.model`, `gen_ai.operation.name`, `openclaw.provider`, `openclaw.model`, `openclaw.api`, `openclaw.transport`
- `openclaw.model_call.request_bytes`, `openclaw.model_call.response_bytes`, `openclaw.model_call.time_to_first_byte_ms`
- `openclaw.provider.request_id_hash` (bounded SHA-based hash of the upstream provider request id; raw ids are not exported)
- `openclaw.harness.run`
- `openclaw.harness.id`, `openclaw.harness.plugin`, `openclaw.outcome`, `openclaw.provider`, `openclaw.model`, `openclaw.channel`


@@ -999,7 +999,7 @@ Logs and transcripts can leak sensitive info even when access controls are corre
Recommendations:
- Keep tool summary redaction on (`logging.redactSensitive: "tools"`; default).
- Keep log and transcript redaction on (`logging.redactSensitive: "tools"`; default).
- Add custom patterns for your environment via `logging.redactPatterns` (tokens, hostnames, internal URLs).
- When sharing diagnostics, prefer `openclaw status --all` (pasteable, secrets redacted) over raw logs.
- Prune old session transcripts and log files if you don't need long retention.


@@ -227,10 +227,12 @@ Notes:
- `OPENCLAW_LIVE_ACP_BIND_CODEX_MODEL=gpt-5.2`
- `OPENCLAW_LIVE_ACP_BIND_OPENCODE_MODEL=opencode/kimi-k2.6`
- `OPENCLAW_LIVE_ACP_BIND_REQUIRE_TRANSCRIPT=1`
- `OPENCLAW_LIVE_ACP_BIND_REQUIRE_CRON=1`
- `OPENCLAW_LIVE_ACP_BIND_PARENT_MODEL=openai/gpt-5.2`
- Notes:
- This lane uses the gateway `chat.send` surface with admin-only synthetic originating-route fields so tests can attach message-channel context without pretending to deliver externally.
- When `OPENCLAW_LIVE_ACP_BIND_AGENT_COMMAND` is unset, the test uses the embedded `acpx` plugin's built-in agent registry for the selected ACP harness agent.
- Bound-session cron MCP creation is best-effort by default because external ACP harnesses can cancel MCP calls after the bind/image proof has passed; set `OPENCLAW_LIVE_ACP_BIND_REQUIRE_CRON=1` to make that post-bind cron probe strict.
Example:


@@ -411,9 +411,9 @@ Think of the suites as “increasing realism” (and increasing flakiness/cost):
- Untargeted `pnpm test` runs twelve smaller shard configs (`core-unit-fast`, `core-unit-src`, `core-unit-security`, `core-unit-ui`, `core-unit-support`, `core-support-boundary`, `core-contracts`, `core-bundled`, `core-runtime`, `agentic`, `auto-reply`, `extensions`) instead of one giant native root-project process. This cuts peak RSS on loaded machines and avoids auto-reply/extension work starving unrelated suites.
- `pnpm test --watch` still uses the native root `vitest.config.ts` project graph, because a multi-shard watch loop is not practical.
- `pnpm test`, `pnpm test:watch`, and `pnpm test:perf:imports` route explicit file/directory targets through scoped lanes first, so `pnpm test extensions/discord/src/monitor/message-handler.preflight.test.ts` avoids paying the full root project startup tax.
- `pnpm test:changed` expands changed git paths into the same scoped lanes when the diff only touches routable source/test files; config/setup edits still fall back to the broad root-project rerun.
- `pnpm check:changed` is the normal smart local gate for narrow work. It classifies the diff into core, core tests, extensions, extension tests, apps, docs, release metadata, live Docker tooling, and tooling, then runs the matching typecheck/lint/test lanes. Public Plugin SDK and plugin-contract changes include one extension validation pass because extensions depend on those core contracts. Release metadata-only version bumps run targeted version/config/root-dependency checks instead of the full suite, with a guard that rejects package changes outside the top-level version field.
- Live Docker ACP harness edits run a focused local gate: shell syntax for the live Docker auth scripts, live Docker scheduler dry-run, ACP bind unit tests, and the ACPX extension tests. `package.json` changes are included only when the diff is limited to `scripts["test:docker:live-*"]`; dependency, export, version, and other package-surface edits still use the broader guards.
- `pnpm test:changed` expands changed git paths into cheap scoped lanes by default: direct test edits, sibling `*.test.ts` files, explicit source mappings, and local import-graph dependents. Config/setup/package edits do not broad-run tests unless you explicitly use `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed`.
- `pnpm check:changed` is the normal smart local check gate for narrow work. It classifies the diff into core, core tests, extensions, extension tests, apps, docs, release metadata, live Docker tooling, and tooling, then runs the matching typecheck, lint, and guard commands. It does not run Vitest tests; call `pnpm test:changed` or explicit `pnpm test <target>` for test proof. Release metadata-only version bumps run targeted version/config/root-dependency checks, with a guard that rejects package changes outside the top-level version field.
- Live Docker ACP harness edits run focused checks: shell syntax for the live Docker auth scripts and a live Docker scheduler dry-run. `package.json` changes are included only when the diff is limited to `scripts["test:docker:live-*"]`; dependency, export, version, and other package-surface edits still use the broader guards.
- Import-light unit tests from agents, commands, plugins, auto-reply helpers, `plugin-sdk`, and similar pure utility areas route through the `unit-fast` lane, which skips `test/setup-openclaw-runtime.ts`; stateful/runtime-heavy files stay on the existing lanes.
- Selected `plugin-sdk` and `commands` helper source files also map changed-mode runs to explicit sibling tests in those light lanes, so helper edits avoid rerunning the full heavy suite for that directory.
- `auto-reply` has dedicated buckets for top-level core helpers, top-level `reply.*` integration tests, and the `src/auto-reply/reply/**` subtree. CI further splits the reply subtree into agent-runner, dispatch, and commands/state-routing shards so one import-heavy bucket does not own the full Node tail.
@@ -458,10 +458,11 @@ Think of the suites as “increasing realism” (and increasing flakiness/cost):
- The pre-commit hook is formatting-only. It restages formatted files and
does not run lint, typecheck, or tests.
- Run `pnpm check:changed` explicitly before handoff or push when you
need the smart local gate. Public Plugin SDK and plugin-contract
changes include one extension validation pass.
- `pnpm test:changed` routes through scoped lanes when the changed paths
map cleanly to a smaller suite.
need the smart local check gate.
- `pnpm test:changed` routes through cheap scoped lanes by default. Use
`OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed` only when the agent
decides a harness, config, package, or contract edit really needs broader
Vitest coverage.
- `pnpm test:max` and `pnpm test:changed:max` keep the same routing
behavior, just with a higher worker cap.
- Local worker auto-scaling is intentionally conservative and backs off
@@ -606,7 +607,7 @@ These Docker runners split into two buckets:
`OPENCLAW_LIVE_GATEWAY_STEP_TIMEOUT_MS=45000`, and
`OPENCLAW_LIVE_GATEWAY_MODEL_TIMEOUT_MS=90000`. Override those env vars when you
explicitly want the larger exhaustive scan.
- `test:docker:all` builds the live Docker image once via `test:docker:live-build`, then reuses it for the live Docker lanes. It also builds one shared `scripts/e2e/Dockerfile` image via `test:docker:e2e-build` and reuses it for the E2E container smoke runners that exercise the built app. The aggregate uses a weighted local scheduler: `OPENCLAW_DOCKER_ALL_PARALLELISM` controls process slots, while resource caps keep heavy live, npm-install, and multi-service lanes from all starting at once. Defaults are 10 slots, `OPENCLAW_DOCKER_ALL_LIVE_LIMIT=6`, `OPENCLAW_DOCKER_ALL_NPM_LIMIT=8`, and `OPENCLAW_DOCKER_ALL_SERVICE_LIMIT=7`; tune `OPENCLAW_DOCKER_ALL_WEIGHT_LIMIT` or `OPENCLAW_DOCKER_ALL_DOCKER_LIMIT` only when the Docker host has more headroom. The runner performs a Docker preflight by default, removes stale OpenClaw E2E containers, prints status every 30 seconds, stores successful lane timings in `.artifacts/docker-tests/lane-timings.json`, and uses those timings to start longer lanes first on later runs. Use `OPENCLAW_DOCKER_ALL_DRY_RUN=1` to print the weighted lane manifest without building or running Docker.
- `test:docker:all` builds the live Docker image once via `test:docker:live-build`, packs OpenClaw once as an npm tarball through `scripts/package-openclaw-for-docker.mjs`, then builds/reuses two `scripts/e2e/Dockerfile` images. The bare image is only the Node/Git runner for install/update/plugin-dependency lanes; those lanes mount the prebuilt tarball. The functional image installs the same tarball into `/app` for built-app functionality lanes. Docker lane definitions live in `scripts/lib/docker-e2e-scenarios.mjs`; planner logic lives in `scripts/lib/docker-e2e-plan.mjs`; `scripts/test-docker-all.mjs` executes the selected plan. The aggregate uses a weighted local scheduler: `OPENCLAW_DOCKER_ALL_PARALLELISM` controls process slots, while resource caps keep heavy live, npm-install, and multi-service lanes from all starting at once. Defaults are 10 slots, `OPENCLAW_DOCKER_ALL_LIVE_LIMIT=9`, `OPENCLAW_DOCKER_ALL_NPM_LIMIT=10`, and `OPENCLAW_DOCKER_ALL_SERVICE_LIMIT=7`; tune `OPENCLAW_DOCKER_ALL_WEIGHT_LIMIT` or `OPENCLAW_DOCKER_ALL_DOCKER_LIMIT` only when the Docker host has more headroom. The runner performs a Docker preflight by default, removes stale OpenClaw E2E containers, prints status every 30 seconds, stores successful lane timings in `.artifacts/docker-tests/lane-timings.json`, and uses those timings to start longer lanes first on later runs. Use `OPENCLAW_DOCKER_ALL_DRY_RUN=1` to print the weighted lane manifest without building or running Docker, or `node scripts/test-docker-all.mjs --plan-json` to print the CI plan for selected lanes, package/image needs, and credentials.
- Container smoke runners: `test:docker:openwebui`, `test:docker:onboard`, `test:docker:npm-onboard-channel-agent`, `test:docker:update-channel-switch`, `test:docker:session-runtime-context`, `test:docker:agents-delete-shared-workspace`, `test:docker:gateway-network`, `test:docker:browser-cdp-snapshot`, `test:docker:mcp-channels`, `test:docker:pi-bundle-mcp-tools`, `test:docker:cron-mcp-cleanup`, `test:docker:plugins`, `test:docker:plugin-update`, and `test:docker:config-reload` boot one or more real containers and verify higher-level integration paths.
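
The weighted scheduling idea behind `test:docker:all` (process slots, resource caps, and longest lanes first) can be sketched as follows. This is not the code in `scripts/test-docker-all.mjs`: the `Lane` and `Limits` shapes and `pickStartable` are hypothetical, and treating each resource cap as a count of running lanes is an assumption.

```typescript
// Hypothetical lane and limit shapes, for illustration only.
type Resource = "live" | "npm" | "service";

interface Lane {
  id: string;
  weight: number;
  resources: Resource[];
  lastDurationMs: number; // timing from an earlier successful run, 0 when unknown
}

interface Limits {
  slots: number;
  weight: number;
  perResource: Record<Resource, number>;
}

function pickStartable(pending: Lane[], running: Lane[], limits: Limits): Lane[] {
  const active = [...running];
  const started: Lane[] = [];
  // Longest lanes first, so they do not become the tail of the whole run.
  const queue = [...pending].sort((a, b) => b.lastDurationMs - a.lastDurationMs);
  for (const lane of queue) {
    const usedWeight = active.reduce((sum, l) => sum + l.weight, 0);
    const fitsSlots = active.length < limits.slots;
    const fitsWeight = usedWeight + lane.weight <= limits.weight;
    // Assumption: each resource cap limits how many running lanes use that resource.
    const fitsResources = lane.resources.every(
      (r) => active.filter((l) => l.resources.includes(r)).length < limits.perResource[r],
    );
    if (fitsSlots && fitsWeight && fitsResources) {
      active.push(lane);
      started.push(lane);
    }
  }
  return started;
}
```

A real runner would call something like this each time a lane finishes, which is how caps keep heavy live, npm-install, and multi-service lanes from all starting at once.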
The live-model Docker runners also bind-mount only the needed CLI auth homes (or all supported ones when the run is not narrowed), then copy them into the container home before the run so external-CLI OAuth can refresh tokens without mutating the host auth store:
@@ -616,13 +617,14 @@ The live-model Docker runners also bind-mount only the needed CLI auth homes (or
- CLI backend smoke: `pnpm test:docker:live-cli-backend` (script: `scripts/test-live-cli-backend-docker.sh`)
- Codex app-server harness smoke: `pnpm test:docker:live-codex-harness` (script: `scripts/test-live-codex-harness-docker.sh`)
- Gateway + dev agent: `pnpm test:docker:live-gateway` (script: `scripts/test-live-gateway-models-docker.sh`)
- Docker observability smoke: included in `pnpm test:docker:all`, `pnpm test:docker:local:all`, and the release-path `core` chunk (script: `scripts/e2e/docker-observability-smoke.sh`). It runs QA-lab OTEL and Prometheus diagnostics checks inside the shared package-installed functional Docker image, with only QA harness files mounted read-only. Set `OPENCLAW_DOCKER_OBSERVABILITY_LOOPS=<count>` to repeat both checks in one container run.
- Open WebUI live smoke: `pnpm test:docker:openwebui` (script: `scripts/e2e/openwebui-docker.sh`)
- Onboarding wizard (TTY, full scaffolding): `pnpm test:docker:onboard` (script: `scripts/e2e/onboard-docker.sh`)
- Npm tarball onboarding/channel/agent smoke: `pnpm test:docker:npm-onboard-channel-agent` installs the packed OpenClaw tarball globally in Docker, configures OpenAI via env-ref onboarding plus Telegram by default, verifies doctor repairs activated plugin runtime deps, and runs one mocked OpenAI agent turn. Reuse a prebuilt tarball with `OPENCLAW_NPM_ONBOARD_PACKAGE_TGZ=/path/to/openclaw-*.tgz`, skip the host rebuild with `OPENCLAW_NPM_ONBOARD_HOST_BUILD=0`, or switch channel with `OPENCLAW_NPM_ONBOARD_CHANNEL=discord`.
- Npm tarball onboarding/channel/agent smoke: `pnpm test:docker:npm-onboard-channel-agent` installs the packed OpenClaw tarball globally in Docker, configures OpenAI via env-ref onboarding plus Telegram by default, verifies doctor repairs activated plugin runtime deps, and runs one mocked OpenAI agent turn. Reuse a prebuilt tarball with `OPENCLAW_CURRENT_PACKAGE_TGZ=/path/to/openclaw-*.tgz`, skip the host rebuild with `OPENCLAW_NPM_ONBOARD_HOST_BUILD=0`, or switch channel with `OPENCLAW_NPM_ONBOARD_CHANNEL=discord`.
- Update channel switch smoke: `pnpm test:docker:update-channel-switch` installs the packed OpenClaw tarball globally in Docker, switches from package `stable` to git `dev`, verifies the persisted channel and plugin post-update work, then switches back to package `stable` and checks update status.
- Session runtime context smoke: `pnpm test:docker:session-runtime-context` verifies hidden runtime context transcript persistence plus doctor repair of affected duplicated prompt-rewrite branches.
- Bun global install smoke: `bash scripts/e2e/bun-global-install-smoke.sh` packs the current tree, installs it with `bun install -g` in an isolated home, and verifies `openclaw infer image providers --json` returns bundled image providers instead of hanging. Reuse a prebuilt tarball with `OPENCLAW_BUN_GLOBAL_SMOKE_PACKAGE_TGZ=/path/to/openclaw-*.tgz`, skip the host build with `OPENCLAW_BUN_GLOBAL_SMOKE_HOST_BUILD=0`, or copy `dist/` from a built Docker image with `OPENCLAW_BUN_GLOBAL_SMOKE_DIST_IMAGE=openclaw-dockerfile-smoke:local`.
- Installer Docker smoke: `bash scripts/test-install-sh-docker.sh` shares one npm cache across its root, update, and direct-npm containers. Update smoke defaults to npm `latest` as the stable baseline before upgrading to the candidate tarball. Non-root installer checks keep an isolated npm cache so root-owned cache entries do not mask user-local install behavior. Set `OPENCLAW_INSTALL_SMOKE_NPM_CACHE_DIR=/path/to/cache` to reuse the root/update/direct-npm cache across local reruns.
- Installer Docker smoke: `bash scripts/test-install-sh-docker.sh` shares one npm cache across its root, update, and direct-npm containers. Update smoke defaults to npm `latest` as the stable baseline before upgrading to the candidate tarball. Override with `OPENCLAW_INSTALL_SMOKE_UPDATE_BASELINE=2026.4.22` locally, or with the Install Smoke workflow's `update_baseline_version` input on GitHub. Non-root installer checks keep an isolated npm cache so root-owned cache entries do not mask user-local install behavior. Set `OPENCLAW_INSTALL_SMOKE_NPM_CACHE_DIR=/path/to/cache` to reuse the root/update/direct-npm cache across local reruns.
- Install Smoke CI skips the duplicate direct-npm global update with `OPENCLAW_INSTALL_SMOKE_SKIP_NPM_GLOBAL=1`; run the script locally without that env when direct `npm install -g` coverage is needed.
- Agents delete shared workspace CLI smoke: `pnpm test:docker:agents-delete-shared-workspace` (script: `scripts/e2e/agents-delete-shared-workspace-docker.sh`) builds the root Dockerfile image by default, seeds two agents with one workspace in an isolated container home, runs `agents delete --json`, and verifies valid JSON plus retained workspace behavior. Reuse the install-smoke image with `OPENCLAW_AGENTS_DELETE_SHARED_WORKSPACE_E2E_IMAGE=openclaw-dockerfile-smoke:local OPENCLAW_AGENTS_DELETE_SHARED_WORKSPACE_E2E_SKIP_BUILD=1`.
- Gateway networking (two containers, WS auth + health): `pnpm test:docker:gateway-network` (script: `scripts/e2e/gateway-network-docker.sh`)
@@ -635,15 +637,15 @@ The live-model Docker runners also bind-mount only the needed CLI auth homes (or
Set `OPENCLAW_PLUGINS_E2E_CLAWHUB=0` to skip the live ClawHub block, or override the default package with `OPENCLAW_PLUGINS_E2E_CLAWHUB_SPEC` and `OPENCLAW_PLUGINS_E2E_CLAWHUB_ID`.
- Plugin update unchanged smoke: `pnpm test:docker:plugin-update` (script: `scripts/e2e/plugin-update-unchanged-docker.sh`)
- Config reload metadata smoke: `pnpm test:docker:config-reload` (script: `scripts/e2e/config-reload-source-docker.sh`)
- Bundled plugin runtime deps: `pnpm test:docker:bundled-channel-deps` builds a small Docker runner image by default, builds and packs OpenClaw once on the host, then mounts that tarball into each Linux install scenario. Reuse the image with `OPENCLAW_SKIP_DOCKER_BUILD=1`, skip the host rebuild after a fresh local build with `OPENCLAW_BUNDLED_CHANNEL_HOST_BUILD=0`, or point at an existing tarball with `OPENCLAW_BUNDLED_CHANNEL_PACKAGE_TGZ=/path/to/openclaw-*.tgz`. The full Docker aggregate pre-packs this tarball once, then shards bundled channel checks into independent lanes, including separate update lanes for Telegram, Discord, Slack, Feishu, memory-lancedb, and ACPX. Use `OPENCLAW_BUNDLED_CHANNELS=telegram,slack` to narrow the channel matrix when running the bundled lane directly, or `OPENCLAW_BUNDLED_CHANNEL_UPDATE_TARGETS=telegram,acpx` to narrow the update scenario. The lane also verifies that `channels.<id>.enabled=false` and `plugins.entries.<id>.enabled=false` suppress doctor/runtime-dependency repair.
- Bundled plugin runtime deps: `pnpm test:docker:bundled-channel-deps` builds a small Docker runner image by default, builds and packs OpenClaw once on the host, then mounts that tarball into each Linux install scenario. Reuse the image with `OPENCLAW_SKIP_DOCKER_BUILD=1`, skip the host rebuild after a fresh local build with `OPENCLAW_BUNDLED_CHANNEL_HOST_BUILD=0`, or point at an existing tarball with `OPENCLAW_CURRENT_PACKAGE_TGZ=/path/to/openclaw-*.tgz`. The full Docker aggregate pre-packs this tarball once, then shards bundled channel checks into independent lanes, including separate update lanes for Telegram, Discord, Slack, Feishu, memory-lancedb, and ACPX. Use `OPENCLAW_BUNDLED_CHANNELS=telegram,slack` to narrow the channel matrix when running the bundled lane directly, or `OPENCLAW_BUNDLED_CHANNEL_UPDATE_TARGETS=telegram,acpx` to narrow the update scenario. The lane also verifies that `channels.<id>.enabled=false` and `plugins.entries.<id>.enabled=false` suppress doctor/runtime-dependency repair.
- Narrow bundled plugin runtime deps while iterating by disabling unrelated scenarios, for example:
`OPENCLAW_BUNDLED_CHANNEL_SCENARIOS=0 OPENCLAW_BUNDLED_CHANNEL_UPDATE_SCENARIO=0 OPENCLAW_BUNDLED_CHANNEL_ROOT_OWNED_SCENARIO=0 OPENCLAW_BUNDLED_CHANNEL_SETUP_ENTRY_SCENARIO=0 pnpm test:docker:bundled-channel-deps`.
To prebuild and reuse the shared built-app image manually:
To prebuild and reuse the shared functional image manually:
```bash
OPENCLAW_DOCKER_E2E_IMAGE=openclaw-docker-e2e:local pnpm test:docker:e2e-build
OPENCLAW_DOCKER_E2E_IMAGE=openclaw-docker-e2e:local OPENCLAW_SKIP_DOCKER_BUILD=1 pnpm test:docker:mcp-channels
OPENCLAW_DOCKER_E2E_IMAGE=openclaw-docker-e2e-functional:local pnpm test:docker:e2e-build
OPENCLAW_DOCKER_E2E_IMAGE=openclaw-docker-e2e-functional:local OPENCLAW_SKIP_DOCKER_BUILD=1 pnpm test:docker:mcp-channels
```
Suite-specific image overrides such as `OPENCLAW_GATEWAY_NETWORK_E2E_IMAGE` still win when set. When `OPENCLAW_SKIP_DOCKER_BUILD=1` points at a remote shared image, the scripts pull it if it is not already local. The QR and installer Docker tests keep their own Dockerfiles because they validate package/install behavior rather than the shared built-app runtime.


@@ -357,9 +357,11 @@ See [ClawDock](/install/clawdock) for the full helper guide.
</Accordion>
<Accordion title="Base image metadata">
The main Docker image uses `node:24-bookworm` and publishes OCI base-image
annotations including `org.opencontainers.image.base.name`,
`org.opencontainers.image.source`, and others. See
The main Docker runtime image uses `node:24-bookworm-slim` and publishes OCI
base-image annotations including `org.opencontainers.image.base.name`,
`org.opencontainers.image.source`, and others. The Node base digest is
refreshed through Dependabot Docker base-image PRs; release builds do not run
a distro upgrade layer. See
[OCI image annotations](https://github.com/opencontainers/image-spec/blob/main/annotations.md).
</Accordion>
</AccordionGroup>


@@ -67,6 +67,20 @@ Add `--no-onboard` to skip onboarding. To force a specific install type through
the installer, pass `--install-method git --no-onboard` or
`--install-method npm --no-onboard`.
If `openclaw update` fails after the npm package install phase, re-run the
installer. The installer does not call the old updater; it runs the global
package install directly and can recover a partially updated npm install.
```bash
curl -fsSL https://openclaw.ai/install.sh | bash -s -- --install-method npm
```
To pin the recovery to a specific version or dist-tag, add `--version`:
```bash
curl -fsSL https://openclaw.ai/install.sh | bash -s -- --install-method npm --version <version-or-dist-tag>
```
## Alternative: manual npm, pnpm, or bun
```bash


@@ -103,6 +103,18 @@ openclaw channels logs --channel whatsapp
Each line in the log file is a JSON object. The CLI and Control UI parse these
entries to render structured output (time, level, subsystem, message).
File-log JSONL records also include machine-filterable top-level fields when
available:
- `hostname`: gateway host name.
- `message`: flattened log message text for full-text search.
- `agent_id`: active agent id when the log call carries agent context.
- `session_id`: active session id/key when the log call carries session context.
- `channel`: active channel when the log call carries channel context.
OpenClaw preserves the original structured log arguments alongside these fields
so existing parsers that read numbered tslog argument keys keep working.
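
As a rough illustration of how the top-level fields make file logs machine-filterable, the sketch below selects JSONL records by `agent_id`, `session_id`, or `channel`. The field names come from the list above; `filterLogLines` and the sample data are hypothetical.

```typescript
// Hypothetical helper; only the field names come from the docs.
interface LogFilter {
  agent_id?: string;
  session_id?: string;
  channel?: string;
}

function filterLogLines(jsonl: string, want: LogFilter): Record<string, unknown>[] {
  const keys: (keyof LogFilter)[] = ["agent_id", "session_id", "channel"];
  const records: Record<string, unknown>[] = [];
  for (const raw of jsonl.split("\n")) {
    if (raw.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      continue; // skip partially written or malformed lines
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const record = parsed as Record<string, unknown>;
    // A record matches when every requested field is equal; unset filters are ignored.
    if (keys.every((k) => want[k] === undefined || record[k] === want[k])) {
      records.push(record);
    }
  }
  return records;
}

const sampleLog = [
  '{"message":"inbound","channel":"whatsapp","agent_id":"main"}',
  '{"message":"tick","agent_id":"main"}',
].join("\n");
const whatsappOnly = filterLogLines(sampleLog, { channel: "whatsapp" });
// whatsappOnly contains only the first record
```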
### Console output
Console logs are **TTY-aware** and formatted for readability:
@@ -157,6 +169,33 @@ You can override both via the **`OPENCLAW_LOG_LEVEL`** environment variable (e.g
`--verbose` only affects console output and WS log verbosity; it does not change
file log levels.
### Trace correlation
File logs are JSONL. When a log call carries a valid diagnostic trace context,
OpenClaw writes the trace fields as top-level JSON keys (`traceId`, `spanId`,
`parentSpanId`, `traceFlags`) so external log processors can correlate the line
with OTEL spans and provider `traceparent` propagation.
Gateway HTTP requests and Gateway WebSocket frames establish an internal request
trace scope. Logs and diagnostic events emitted inside that async scope inherit
the request trace when they do not pass an explicit trace context. Agent run and
model-call traces become children of the active request trace, so local logs,
diagnostic snapshots, OTEL spans, and trusted provider `traceparent` headers can
be joined by `traceId` without logging raw request or model content.
### Model call size and timing
Model-call diagnostics record bounded request/response measurements without
capturing raw prompt or response content:
- `requestPayloadBytes`: UTF-8 byte size of the final model request payload
- `responseStreamBytes`: UTF-8 byte size of streamed model response events
- `timeToFirstByteMs`: elapsed time before the first streamed response event
- `durationMs`: total model-call duration
These fields are available to diagnostic snapshots, model-call plugin hooks, and
OTEL model-call spans/metrics when diagnostics export is enabled.
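
The byte-size fields count UTF-8 bytes, which differs from JavaScript string length for non-ASCII text. A minimal sketch of that measurement, assuming the payload has already been serialized to a string; `utf8ByteLength` is a hypothetical helper, and OpenClaw may well use a built-in such as `Buffer.byteLength` instead.

```typescript
// Counts UTF-8 bytes without allocating an encoded buffer.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const cp = text.codePointAt(i) ?? 0;
    if (cp < 0x80) bytes += 1;
    else if (cp < 0x800) bytes += 2;
    else if (cp < 0x10000) bytes += 3;
    else {
      bytes += 4;
      i++; // skip the low surrogate of a code point outside the BMP
    }
  }
  return bytes;
}
```

Measuring only the size keeps prompt and response content out of diagnostics, which matches the "no raw payload content" note on these metrics.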
### Console styles
`logging.consoleStyle`:
@@ -167,14 +206,16 @@ file log levels.
### Redaction
Tool summaries can redact sensitive tokens before they hit the console:
OpenClaw can redact sensitive tokens before they hit console output, file logs,
OTLP log records, or persisted session transcript text:
- `logging.redactSensitive`: `off` | `tools` (default: `tools`)
- `logging.redactPatterns`: list of regex strings to override the default set
Redaction applies at the logging sinks for **console output**, **stderr-routed
console diagnostics**, and **file logs**. File logs stay JSONL, but matching
secret values are masked before the line is written to disk.
File logs and session transcripts stay JSONL, but matching secret values are
masked before the line or message is written to disk. Redaction is best-effort:
it applies to text-bearing message content and log strings, not every
identifier or binary payload field.
## Diagnostics and OpenTelemetry


@@ -542,6 +542,72 @@ Environment overrides remain available for local testing:
preferred for repeatable deployments because it keeps the plugin behavior in the
same reviewed file as the rest of the Codex harness setup.
## Computer Use
Computer Use is a Codex-native MCP plugin. OpenClaw does not vendor the desktop
control app or execute desktop actions itself; it enables Codex app-server
plugins, installs the configured Codex marketplace plugin when requested, checks
that the `computer-use` MCP server is available, and then lets Codex handle the
native MCP tool calls during Codex-mode turns.
Set `plugins.entries.codex.config.computerUse` when you want Codex-mode turns to
require Computer Use:
```json5
{
plugins: {
entries: {
codex: {
enabled: true,
config: {
computerUse: {
autoInstall: true,
},
},
},
},
},
agents: {
defaults: {
model: "openai/gpt-5.5",
embeddedHarness: {
runtime: "codex",
},
},
},
}
```
With no marketplace fields, OpenClaw asks Codex app-server to use its discovered
marketplaces. On a fresh Codex home, app-server seeds the official curated
marketplace and OpenClaw follows the same loading shape as Codex: it polls
`plugin/list` during install before treating Computer Use as unavailable. The
default discovery wait is 60 seconds and can be tuned with
`marketplaceDiscoveryTimeoutMs`. If multiple known Codex marketplaces contain
Computer Use, OpenClaw uses the Codex marketplace preference order before
failing closed for unknown ambiguous matches.
Use `marketplaceSource` for a non-default Codex marketplace source that
app-server can add, or `marketplacePath` for a local marketplace file that
already exists on the machine. If the marketplace is already registered with
Codex app-server, use `marketplaceName` instead. The defaults are
`pluginName: "computer-use"` and `mcpServerName: "computer-use"`.
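
The documented defaults can be summarized in a small sketch. The `ComputerUseConfig` type and `withComputerUseDefaults` are hypothetical and exist only to show which fields fall back to which values; the field names and defaults are taken from the text above.

```typescript
// Hypothetical type; field names and defaults come from the surrounding docs.
interface ComputerUseConfig {
  autoInstall?: boolean;
  marketplaceSource?: string;
  marketplacePath?: string;
  marketplaceName?: string;
  pluginName?: string;
  mcpServerName?: string;
  marketplaceDiscoveryTimeoutMs?: number;
}

function withComputerUseDefaults(config: ComputerUseConfig) {
  return {
    ...config,
    pluginName: config.pluginName ?? "computer-use",
    mcpServerName: config.mcpServerName ?? "computer-use",
    // The default discovery wait is 60 seconds.
    marketplaceDiscoveryTimeoutMs: config.marketplaceDiscoveryTimeoutMs ?? 60_000,
  };
}
```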
For safety, turn-start auto-install only uses marketplaces app-server has
already discovered. Use `/codex computer-use install` for explicit installs from
a configured `marketplaceSource` or `marketplacePath`.
The same setup can be checked or installed from the command surface:
- `/codex computer-use status`
- `/codex computer-use install`
- `/codex computer-use install --source <marketplace-source>`
- `/codex computer-use install --marketplace-path <path>`
Computer Use is macOS-specific and may require local OS permissions before the
Codex MCP server can control apps. If `computerUse.enabled` is true and the MCP
server is unavailable, Codex-mode turns fail before the thread starts instead of
silently running without the native Computer Use tools.
## Common recipes
Local Codex with default stdio transport:
@@ -644,6 +710,8 @@ Common forms:
- `/codex resume <thread-id>` attaches the current OpenClaw session to an existing Codex thread.
- `/codex compact` asks Codex app-server to compact the attached thread.
- `/codex review` starts Codex native review for the attached thread.
- `/codex computer-use status` checks the configured Computer Use plugin and MCP server.
- `/codex computer-use install` installs the configured Computer Use plugin and reloads MCP servers.
- `/codex account` shows account and rate-limit status.
- `/codex mcp` lists Codex app-server MCP server status.
- `/codex skills` lists Codex app-server skills.


@@ -461,7 +461,7 @@ For the full setup and behavior details, see [Ollama Web Search](/tools/ollama-s
<Accordion title="Streaming configuration">
OpenClaw's Ollama integration uses the **native Ollama API** (`/api/chat`) by default, which fully supports streaming and tool calling simultaneously. No special configuration is needed.
For native `/api/chat` requests, OpenClaw also forwards thinking control directly to Ollama: `/think off` and `openclaw agent --thinking off` send top-level `think: false`, while non-`off` thinking levels send `think: true`.
For native `/api/chat` requests, OpenClaw also forwards thinking control directly to Ollama: `/think off` and `openclaw agent --thinking off` send top-level `think: false`, while `/think low|medium|high` send the matching top-level `think` effort string. `/think max` maps to Ollama's highest native effort, `think: "high"`.
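The level mapping in the updated sentence can be sketched as a small pure function. This is an illustration under stated assumptions: the level names and resulting `think` values come from the text above, while the type and function names are hypothetical and not taken from the OpenClaw source.

```typescript
// Hypothetical helper: map an OpenClaw thinking level to the top-level
// `think` value sent on native Ollama `/api/chat` requests.
type ThinkLevel = "off" | "low" | "medium" | "high" | "max";
type OllamaThink = false | "low" | "medium" | "high";

function toOllamaThink(level: ThinkLevel): OllamaThink {
  switch (level) {
    case "off":
      return false; // `/think off` disables thinking
    case "max":
      return "high"; // Ollama's highest native effort
    default:
      return level; // low | medium | high pass through unchanged
  }
}
```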
<Tip>
If you need to use the OpenAI-compatible endpoint, see the "Legacy OpenAI-compatible mode" section above. Streaming and tool calling may not work simultaneously in that mode.


@@ -1,133 +0,0 @@
---
summary: "Investigation notes for duplicate async exec completion injection"
read_when:
- Debugging repeated node exec completion events
- Working on heartbeat/system-event dedupe
title: "Async exec duplicate completion investigation"
---
## Scope
- Session: `agent:main:telegram:group:-1003774691294:topic:1`
- Symptom: the same async exec completion for session/run `keen-nexus` was recorded twice in LCM as user turns.
- Goal: identify whether this is most likely duplicate session injection or plain outbound delivery retry.
## Conclusion
Most likely this is **duplicate session injection**, not a pure outbound delivery retry.
The strongest gateway-side gap is in the **node exec completion path**:
1. A node-side exec finish emits `exec.finished` with the full `runId`.
2. Gateway `server-node-events` converts that into a system event and requests a heartbeat.
3. The heartbeat run injects the drained system event block into the agent prompt.
4. The embedded runner persists that prompt as a new user turn in the session transcript.
If the same `exec.finished` reaches the gateway twice for the same `runId` for any reason (replay, reconnect duplicate, upstream resend, duplicated producer), OpenClaw currently has **no idempotency check keyed by `runId`/`contextKey`** on this path. The second copy will become a second user message with the same content.
## Exact Code Path
### 1. Producer: node exec completion event
- `src/node-host/invoke.ts:340-360`
  - `sendExecFinishedEvent(...)` emits `node.event` with event `exec.finished`.
  - Payload includes `sessionKey` and full `runId`.
### 2. Gateway event ingestion
- `src/gateway/server-node-events.ts:574-640`
  - Handles `exec.finished`.
  - Builds text:
    - `Exec finished (node=..., id=<runId>, code ...)`
  - Enqueues it via:
    - `enqueueSystemEvent(text, { sessionKey, contextKey: runId ? \`exec:${runId}\` : "exec", trusted: false })`
  - Immediately requests a wake:
    - `requestHeartbeatNow(scopedHeartbeatWakeOptions(sessionKey, { reason: "exec-event" }))`
### 3. System event dedupe weakness
- `src/infra/system-events.ts:90-115`
  - `enqueueSystemEvent(...)` only suppresses **consecutive duplicate text**:
    - `if (entry.lastText === cleaned) return false`
  - It stores `contextKey`, but does **not** use `contextKey` for idempotency.
  - After drain, duplicate suppression resets.
This means a replayed `exec.finished` with the same `runId` can be accepted again later, even though the code already had a stable idempotency candidate (`exec:<runId>`).
### 4. Wake handling is not the primary duplicator
- `src/infra/heartbeat-wake.ts:79-117`
  - Wakes are coalesced by `(agentId, sessionKey)`.
  - Duplicate wake requests for the same target collapse to one pending wake entry.
This makes **duplicate wake handling alone** a weaker explanation than duplicate event ingestion.
### 5. Heartbeat consumes the event and turns it into prompt input
- `src/infra/heartbeat-runner.ts:535-574`
  - Preflight peeks pending system events and classifies exec-event runs.
- `src/auto-reply/reply/session-system-events.ts:86-90`
  - `drainFormattedSystemEvents(...)` drains the queue for the session.
- `src/auto-reply/reply/get-reply-run.ts:400-427`
  - The drained system event block is prepended into the agent prompt body.
### 6. Transcript injection point
- `src/agents/pi-embedded-runner/run/attempt.ts:2000-2017`
  - `activeSession.prompt(effectivePrompt)` submits the full prompt to the embedded PI session.
  - That is the point where the completion-derived prompt becomes a persisted user turn.
So once the same system event is rebuilt into the prompt twice, duplicate LCM user messages are expected.
## Why plain outbound delivery retry is less likely
There is a real outbound failure path in the heartbeat runner:
- `src/infra/heartbeat-runner.ts:1194-1242`
  - The reply is generated first.
  - Outbound delivery happens later via `deliverOutboundPayloads(...)`.
  - Failure there returns `{ status: "failed" }`.
However, for the same system event queue entry, this alone is **not sufficient** to explain the duplicate user turns:
- `src/auto-reply/reply/session-system-events.ts:86-90`
  - The system event queue is already drained before outbound delivery.
So a channel send retry by itself would not recreate the exact same queued event. It could explain missing or failed external delivery, but not, on its own, a second identical user message in the session.
## Secondary, lower-confidence possibility
There is a full-run retry loop in the agent runner:
- `src/auto-reply/reply/agent-runner-execution.ts:741-1473`
  - Certain transient failures can retry the whole run and resubmit the same `commandBody`.
That can duplicate a persisted user prompt **within the same reply execution** if the prompt was already appended before the retry condition triggered.
I rank this lower than duplicate `exec.finished` ingestion because:
- the observed gap was around 51 seconds, which looks more like a second wake/turn than an in-process retry;
- the report already mentions repeated message send failures, which points more toward a separate later turn than an immediate model/runtime retry.
## Root Cause Hypothesis
Highest-confidence hypothesis:
- The `keen-nexus` completion came through the **node exec event path**.
- The same `exec.finished` was delivered to `server-node-events` twice.
- Gateway accepted both because `enqueueSystemEvent(...)` does not dedupe by `contextKey` / `runId`.
- Each accepted event triggered a heartbeat and was injected as a user turn into the PI transcript.
## Proposed Tiny Surgical Fix
If a fix is wanted, the smallest high-value change is:
- make exec/system-event idempotency honor `contextKey` for a short horizon, at least for exact `(sessionKey, contextKey, text)` repeats;
- or add a dedicated dedupe in `server-node-events` for `exec.finished` keyed by `(sessionKey, runId, event kind)`.
That would directly block replayed `exec.finished` duplicates before they become session turns.
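The first option can be sketched as a short-horizon guard keyed by `(sessionKey, contextKey, text)`. This is a minimal illustration, not the existing `enqueueSystemEvent(...)` code: the class name, the `accept` method, and the 5-minute default horizon are all assumptions.

```typescript
// Hypothetical short-horizon idempotency guard for system events.
class RecentEventGuard {
  private readonly seen = new Map<string, number>();
  private readonly horizonMs: number;

  constructor(horizonMs: number = 5 * 60_000) {
    this.horizonMs = horizonMs; // assumed default, not taken from the source
  }

  /** Returns true when the event is new and should be enqueued. */
  accept(sessionKey: string, contextKey: string, text: string, now: number = Date.now()): boolean {
    // Evict entries older than the horizon so the map stays bounded.
    this.seen.forEach((acceptedAt, key) => {
      if (now - acceptedAt > this.horizonMs) this.seen.delete(key);
    });
    const key = JSON.stringify([sessionKey, contextKey, text]);
    if (this.seen.has(key)) return false; // exact repeat within the horizon
    this.seen.set(key, now);
    return true;
  }
}
```

Unlike the consecutive-text check, this guard survives a queue drain, so a replayed `exec.finished` carrying the same `exec:<runId>` context key would be rejected until the horizon expires.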
## Related
- [Exec tool](/tools/exec)
- [Session management](/concepts/session)


@@ -1,540 +0,0 @@
---
summary: "QA refactor plan for scenario catalog and harness consolidation"
read_when:
- Refactoring QA scenario definitions or qa-lab harness code
- Moving QA behavior between markdown scenarios and TypeScript harness logic
title: "QA refactor"
---
Status: foundational migration landed.
## Goal
Move OpenClaw QA from a split-definition model to a single source of truth:
- scenario metadata
- prompts sent to the model
- setup and teardown
- harness logic
- assertions and success criteria
- artifacts and report hints
The desired end state is a generic QA harness that loads powerful scenario definition files instead of hardcoding most behavior in TypeScript.
## Current State
Primary source of truth now lives in `qa/scenarios/index.md` plus one file per
scenario under `qa/scenarios/<theme>/*.md`.
Implemented:
- `qa/scenarios/index.md`
  - canonical QA pack metadata
  - operator identity
  - kickoff mission
- `qa/scenarios/<theme>/*.md`
  - one markdown file per scenario
  - scenario metadata
  - handler bindings
  - scenario-specific execution config
- `extensions/qa-lab/src/scenario-catalog.ts`
  - markdown pack parser + zod validation
- `extensions/qa-lab/src/qa-agent-bootstrap.ts`
  - plan rendering from the markdown pack
- `extensions/qa-lab/src/qa-agent-workspace.ts`
  - seeds generated compatibility files plus `QA_SCENARIOS.md`
- `extensions/qa-lab/src/suite.ts`
  - selects executable scenarios through markdown-defined handler bindings
- QA bus protocol + UI
  - generic inline attachments for image/video/audio/file rendering
Remaining split surfaces:
- `extensions/qa-lab/src/suite.ts`
  - still owns most executable custom handler logic
- `extensions/qa-lab/src/report.ts`
  - still derives report structure from runtime outputs
So the source-of-truth split is fixed, but execution is still mostly handler-backed rather than fully declarative.
## What The Real Scenario Surface Looks Like
Reading the current suite shows a few distinct scenario classes.
### Simple interaction
- channel baseline
- DM baseline
- threaded follow-up
- model switch
- approval followthrough
- reaction/edit/delete
### Config and runtime mutation
- config patch skill disable
- config apply restart wake-up
- config restart capability flip
- runtime inventory drift check
### Filesystem and repo assertions
- source/docs discovery report
- build Lobster Invaders
- generated image artifact lookup
### Memory orchestration
- memory recall
- memory tools in channel context
- memory failure fallback
- session memory ranking
- thread memory isolation
- memory dreaming sweep
### Tool and plugin integration
- MCP plugin-tools call
- skill visibility
- skill hot install
- native image generation
- image roundtrip
- image understanding from attachment
### Multi-turn and multi-actor
- subagent handoff
- subagent fanout synthesis
- restart recovery style flows
These categories matter because they drive DSL requirements. A flat list of prompt + expected text is not enough.
## Direction
### Single source of truth
Use `qa/scenarios/index.md` plus `qa/scenarios/<theme>/*.md` as the authored
source of truth.
The pack should stay:
- human-readable in review
- machine-parseable
- rich enough to drive:
  - suite execution
  - QA workspace bootstrap
  - QA Lab UI metadata
  - docs/discovery prompts
  - report generation
### Preferred authoring format
Use markdown as the top-level format, with structured YAML inside it.
Recommended shape:
- YAML frontmatter
  - id
  - title
  - surface
  - tags
  - docs refs
  - code refs
  - model/provider overrides
  - prerequisites
- prose sections
  - objective
  - notes
  - debugging hints
- fenced YAML blocks
  - setup
  - steps
  - assertions
  - cleanup
This gives:
- better PR readability than giant JSON
- richer context than pure YAML
- strict parsing and zod validation
Raw JSON is acceptable only as an intermediate generated form.
## Proposed Scenario File Shape
Example:
````md
---
id: image-generation-roundtrip
title: Image generation roundtrip
surface: image
tags: [media, image, roundtrip]
models:
  primary: openai/gpt-5.4
requires:
  tools: [image_generate]
  plugins: [openai, qa-channel]
docsRefs:
  - docs/help/testing.md
  - docs/concepts/model-providers.md
codeRefs:
  - extensions/qa-lab/src/suite.ts
  - src/gateway/chat-attachments.ts
---
# Objective
Verify generated media is reattached on the follow-up turn.
# Setup
```yaml scenario.setup
- action: config.patch
  patch:
    agents:
      defaults:
        imageGenerationModel:
          primary: openai/gpt-image-1
- action: session.create
  key: agent:qa:image-roundtrip
```
# Steps
```yaml scenario.steps
- action: agent.send
  session: agent:qa:image-roundtrip
  message: |
    Image generation check: generate a QA lighthouse image and summarize it in one short sentence.
- action: artifact.capture
  kind: generated-image
  promptSnippet: Image generation check
  saveAs: lighthouseImage
- action: agent.send
  session: agent:qa:image-roundtrip
  message: |
    Roundtrip image inspection check: describe the generated lighthouse attachment in one short sentence.
  attachments:
    - fromArtifact: lighthouseImage
```
# Expect
```yaml scenario.expect
- assert: outbound.textIncludes
  value: lighthouse
- assert: requestLog.matches
  where:
    promptIncludes: Roundtrip image inspection check
    imageInputCountGte: 1
- assert: artifact.exists
  ref: lighthouseImage
```
````
## Runner Capabilities The DSL Must Cover
Based on the current suite, the generic runner needs more than prompt execution.
### Environment and setup actions
- `bus.reset`
- `gateway.waitHealthy`
- `channel.waitReady`
- `session.create`
- `thread.create`
- `workspace.writeSkill`
### Agent turn actions
- `agent.send`
- `agent.wait`
- `bus.injectInbound`
- `bus.injectOutbound`
### Config and runtime actions
- `config.get`
- `config.patch`
- `config.apply`
- `gateway.restart`
- `tools.effective`
- `skills.status`
### File and artifact actions
- `file.write`
- `file.read`
- `file.delete`
- `file.touchTime`
- `artifact.captureGeneratedImage`
- `artifact.capturePath`
### Memory and cron actions
- `memory.indexForce`
- `memory.searchCli`
- `doctor.memory.status`
- `cron.list`
- `cron.run`
- `cron.waitCompletion`
- `sessionTranscript.write`
### MCP actions
- `mcp.callTool`
### Assertions
- `outbound.textIncludes`
- `outbound.inThread`
- `outbound.notInRoot`
- `tool.called`
- `tool.notPresent`
- `skill.visible`
- `skill.disabled`
- `file.contains`
- `memory.contains`
- `requestLog.matches`
- `sessionStore.matches`
- `cron.managedPresent`
- `artifact.exists`
## Variables and Artifact References
The DSL must support saved outputs and later references.
Examples from the current suite:
- create a thread, then reuse `threadId`
- create a session, then reuse `sessionKey`
- generate an image, then attach the file on the next turn
- generate a wake marker string, then assert that it appears later
Needed capabilities:
- `saveAs`
- `${vars.name}`
- `${artifacts.name}`
- typed references for paths, session keys, thread ids, markers, tool outputs
Without variable support, the harness will keep leaking scenario logic back into TypeScript.
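To make the reference syntax concrete, here is a minimal sketch of how `${vars.name}` and `${artifacts.name}` placeholders could be resolved before a step runs. The `ScenarioContext` shape and the `interpolate` function are hypothetical; the plan above does not specify an implementation, and typed references would need more than string substitution.

```typescript
// Hypothetical resolver for `${vars.name}` and `${artifacts.name}` references.
type ScenarioContext = {
  vars: Record<string, string>;
  artifacts: Record<string, string>;
};

function interpolate(template: string, ctx: ScenarioContext): string {
  const pattern = /\$\{(vars|artifacts)\.([A-Za-z_][\w-]*)\}/g;
  return template.replace(pattern, (_match: string, scope: "vars" | "artifacts", name: string) => {
    const bucket = ctx[scope];
    // Fail loudly on unknown names instead of sending a literal placeholder.
    if (!Object.prototype.hasOwnProperty.call(bucket, name)) {
      throw new Error(`Unresolved reference: ${scope}.${name}`);
    }
    return bucket[name];
  });
}
```

Throwing on an unresolved name is a deliberate choice in this sketch: a step that silently sends a literal placeholder to the model would be harder to debug than a failed scenario.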
## What Should Stay As Escape Hatches
A purely declarative runner is not realistic in phase 1.
Some scenarios are inherently orchestration-heavy:
- memory dreaming sweep
- config apply restart wake-up
- config restart capability flip
- generated image artifact resolution by timestamp/path
- discovery-report evaluation
These should use explicit custom handlers for now.
Recommended rule:
- 85-90% declarative
- explicit `customHandler` steps for the hard remainder
- named and documented custom handlers only
- no anonymous inline code in the scenario file
That keeps the generic engine clean while still allowing progress.
## Architecture Change
### Current
Scenario markdown is already the source of truth for:
- suite execution
- workspace bootstrap files
- QA Lab UI scenario catalog
- report metadata
- discovery prompts
Generated compatibility:
- seeded workspace still includes `QA_KICKOFF_TASK.md`
- seeded workspace still includes `QA_SCENARIO_PLAN.md`
- seeded workspace now also includes `QA_SCENARIOS.md`
## Refactor Plan
### Phase 1: loader and schema
Done.
- added `qa/scenarios/index.md`
- split scenarios into `qa/scenarios/<theme>/*.md`
- added parser for named markdown YAML pack content
- validated with zod
- switched consumers to the parsed pack
- removed repo-level `qa/seed-scenarios.json` and `qa/QA_KICKOFF_TASK.md`
### Phase 2: generic engine
- split `extensions/qa-lab/src/suite.ts` into:
  - loader
  - engine
  - action registry
  - assertion registry
  - custom handlers
- keep existing helper functions as engine operations
Deliverable:
- engine executes simple declarative scenarios
### Phase 3: migrate simple scenarios
Start with scenarios that are mostly prompt + wait + assert:
- threaded follow-up
- image understanding from attachment
- skill visibility and invocation
- channel baseline
Deliverable:
- first real markdown-defined scenarios shipping through the generic engine
### Phase 4: migrate medium scenarios
- image generation roundtrip
- memory tools in channel context
- session memory ranking
- subagent handoff
- subagent fanout synthesis
Deliverable:
- variables, artifacts, tool assertions, request-log assertions proven out
### Phase 5: keep hard scenarios on custom handlers
- memory dreaming sweep
- config apply restart wake-up
- config restart capability flip
- runtime inventory drift
Deliverable:
- same authoring format, but with explicit custom-step blocks where needed
### Phase 6: delete hardcoded scenario map
Once the pack coverage is good enough:
- remove most scenario-specific TypeScript branching from `extensions/qa-lab/src/suite.ts`
## Fake Slack / Rich Media Support
The current QA bus is text-first.
Relevant files:
- `extensions/qa-channel/src/protocol.ts`
- `extensions/qa-lab/src/bus-state.ts`
- `extensions/qa-lab/src/bus-queries.ts`
- `extensions/qa-lab/src/bus-server.ts`
- `extensions/qa-lab/web/src/ui-render.ts`
Today the QA bus supports:
- text
- reactions
- threads
It does not yet model inline media attachments.
### Needed transport contract
Add a generic QA bus attachment model:
```ts
type QaBusAttachment = {
  id: string;
  kind: "image" | "video" | "audio" | "file";
  mimeType: string;
  fileName?: string;
  inline?: boolean;
  url?: string;
  contentBase64?: string;
  width?: number;
  height?: number;
  durationMs?: number;
  altText?: string;
  transcript?: string;
};
```
Then add `attachments?: QaBusAttachment[]` to:
- `QaBusMessage`
- `QaBusInboundMessageInput`
- `QaBusOutboundMessageInput`
### Why generic first
Do not build a Slack-only media model.
Instead:
- one generic QA transport model
- multiple renderers on top of it
  - current QA Lab chat
  - future fake Slack web
  - any other fake transport views
This prevents duplicate logic and lets media scenarios stay transport-agnostic.
### UI work needed
Update the QA UI to render:
- inline image preview
- inline audio player
- inline video player
- file attachment chip
The current UI can already render threads and reactions, so attachment rendering should layer onto the same message card model.
### Scenario work enabled by media transport
Once attachments flow through QA bus, we can add richer fake-chat scenarios:
- inline image reply in fake Slack
- audio attachment understanding
- video attachment understanding
- mixed attachment ordering
- thread reply with media retained
## Recommendation
The next implementation chunk should be:
1. add markdown scenario loader + zod schema
2. generate the current catalog from markdown
3. migrate a few simple scenarios first
4. add generic QA bus attachment support
5. render inline image in the QA UI
6. then expand to audio and video
This is the smallest path that proves both goals:
- generic markdown-defined QA
- richer fake messaging surfaces
## Open Questions
- whether scenario files should allow embedded markdown prompt templates with variable interpolation
- whether setup/cleanup should be named sections or just ordered action lists
- whether artifact references should be strongly typed in schema or string-based
- whether custom handlers should live in one registry or per-surface registries
- whether the generated JSON compatibility file should remain checked in during migration
## Related
- [QA E2E automation](/concepts/qa-e2e-automation)


@@ -49,6 +49,12 @@ OpenClaw has three public release lanes:
- Run `pnpm build && pnpm ui:build` before `pnpm release:check` so the expected
`dist/*` release artifacts and Control UI bundle exist for the pack
validation step
- Run the manual `CI` workflow before release approval when you need full normal
CI coverage for the release candidate. Manual CI dispatches bypass changed
scoping and force the Linux Node shards, bundled-plugin shards, channel
contracts, `check`, `check-additional`, build smoke, docs checks, Python
skills, Windows, macOS, Android, and Control UI i18n lanes.
Example: `gh workflow run ci.yml --ref release/YYYY.M.D`
- Run `pnpm qa:otel:smoke` when validating release telemetry. It exercises
QA-lab through a local OTLP/HTTP receiver and verifies the exported trace
span names, bounded attributes, and content/identifier redaction without
@@ -182,18 +188,20 @@ When cutting a stable npm release:
SHA for a validation-only dry run of the preflight workflow
2. Choose `npm_dist_tag=beta` for the normal beta-first flow, or `latest` only
when you intentionally want a direct stable publish
3. Run `OpenClaw Release Checks` separately with the same tag or the
3. Run the manual `CI` workflow on the release ref when you want full normal CI
coverage instead of smart-scoped merge coverage
4. Run `OpenClaw Release Checks` separately with the same tag or the
full current workflow-branch commit SHA when you want live prompt cache,
QA Lab parity, Matrix, and Telegram coverage
- This is separate on purpose so live coverage stays available without
recoupling long-running or flaky checks to the publish workflow
4. Save the successful `preflight_run_id`
5. Run `OpenClaw NPM Release` again with `preflight_only=false`, the same
5. Save the successful `preflight_run_id`
6. Run `OpenClaw NPM Release` again with `preflight_only=false`, the same
`tag`, the same `npm_dist_tag`, and the saved `preflight_run_id`
6. If the release landed on `beta`, use the private
7. If the release landed on `beta`, use the private
`openclaw/releases-private/.github/workflows/openclaw-npm-dist-tags.yml`
workflow to promote that stable version from `beta` to `latest`
7. If the release intentionally published directly to `latest` and `beta`
8. If the release intentionally published directly to `latest` and `beta`
should follow the same stable build immediately, use that same private
workflow to point both dist-tags at the stable version, or let its scheduled
self-healing sync move `beta` later


@@ -193,7 +193,12 @@ Notable entry types:
- `compaction`: persisted compaction summary with `firstKeptEntryId` and `tokensBefore`
- `branch_summary`: persisted summary when navigating a tree branch
OpenClaw intentionally does **not** “fix up” transcripts; the Gateway uses `SessionManager` to read/write them.
OpenClaw uses `SessionManager` for normal transcript reads/writes. After
compaction, the Gateway now defaults to a bounded transcript rewrite that drops
message entries already covered by the persisted compaction summary while
keeping non-message session state and the recent unsummarized tail. Set
`agents.defaults.compaction.truncateAfterCompaction` to `false` to preserve the
legacy append-only behavior.
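The bounded rewrite can be pictured as a filter over transcript entries. The sketch below is an assumption-heavy illustration: the `TranscriptEntry` shape and the `truncateAfterCompaction` function are hypothetical and only mirror the `compaction` entry's `firstKeptEntryId` field named above; the actual Gateway logic may differ.

```typescript
// Hypothetical entry shape; only `type` and `firstKeptEntryId` come from the docs.
type TranscriptEntry = {
  id: string;
  type: string; // e.g. "message", "compaction", or other session state
  firstKeptEntryId?: string; // present on "compaction" entries
};

function truncateAfterCompaction(entries: TranscriptEntry[]): TranscriptEntry[] {
  // Use the most recent compaction entry as the boundary.
  let firstKeptId: string | undefined;
  for (const entry of entries) {
    if (entry.type === "compaction" && entry.firstKeptEntryId !== undefined) {
      firstKeptId = entry.firstKeptEntryId;
    }
  }
  if (firstKeptId === undefined) return entries;
  const firstKeptIndex = entries.findIndex((entry) => entry.id === firstKeptId);
  if (firstKeptIndex < 0) return entries; // unknown boundary: leave the transcript untouched
  // Drop summarized messages; keep non-message state and the unsummarized tail.
  return entries.filter((entry, index) => entry.type !== "message" || index >= firstKeptIndex);
}
```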
---


@@ -10,11 +10,12 @@ title: "Tests"
- `pnpm test:force`: Kills any lingering gateway process holding the default control port, then runs the full Vitest suite with an isolated gateway port so server tests don't collide with a running instance. Use this when a prior gateway run left port 18789 occupied.
- `pnpm test:coverage`: Runs the unit suite with V8 coverage (via `vitest.unit.config.ts`). This is a loaded-file unit coverage gate, not whole-repo all-file coverage. Thresholds are 70% lines/functions/statements and 55% branches. Because `coverage.all` is false, the gate measures files loaded by the unit coverage suite instead of treating every split-lane source file as uncovered.
- `pnpm test:coverage:changed`: Runs unit coverage only for files changed since `origin/main`.
- `pnpm test:changed`: expands changed git paths into scoped Vitest lanes when the diff only touches routable source/test files. Config/setup changes still fall back to the native root projects run so wiring edits rerun broadly when needed.
- `pnpm test:changed:focused`: inner-loop changed test run. It only runs precise targets from direct test edits, sibling `*.test.ts` files, explicit source mappings, and the local import graph. Broad/config/package changes are skipped instead of expanding to the full changed-test fallback.
- `pnpm test:changed`: cheap smart changed test run. It runs precise targets from direct test edits, sibling `*.test.ts` files, explicit source mappings, and the local import graph. Broad/config/package changes are skipped unless they map to precise tests.
- `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed`: explicit broad changed test run. Use it when a test harness/config/package edit should fall back to Vitest's broader changed-test behavior.
- `pnpm changed:lanes`: shows the architectural lanes triggered by the diff against `origin/main`.
- `pnpm check:changed`: runs the smart changed gate for the diff against `origin/main`. It runs core work with core test lanes, extension work with extension test lanes, test-only work with test typecheck/tests only, expands public Plugin SDK or plugin-contract changes to one extension validation pass, and keeps release metadata-only version bumps on targeted version/config/root-dependency checks.
- `pnpm check:changed`: runs the smart changed check gate for the diff against `origin/main`. It runs typecheck, lint, and guard commands for the affected architectural lanes, but does not run Vitest tests. Use `pnpm test:changed` or explicit `pnpm test <target>` for test proof.
- `pnpm test`: routes explicit file/directory targets through scoped Vitest lanes. Untargeted runs use fixed shard groups and expand to leaf configs for local parallel execution; the extension group always expands to the per-extension shard configs instead of one giant root-project process.
- Test wrapper runs end with a short `[test] passed|failed|skipped ... in ...` summary. Vitest's own duration line stays the per-shard detail.
- Full, extension, and include-pattern shard runs update local timing data in `.artifacts/vitest-shard-timings.json`; later whole-config runs use those timings to balance slow and fast shards. Include-pattern CI shards append the shard name to the timing key, which keeps filtered shard timings visible without replacing whole-config timing data. Set `OPENCLAW_TEST_PROJECTS_TIMINGS=0` to ignore the local timing artifact.
- Selected `plugin-sdk` and `commands` test files now route through dedicated light lanes that keep only `test/setup.ts`, leaving runtime-heavy cases on their existing lanes.
- Source files with sibling tests map to that sibling before falling back to wider directory globs. Helper edits under `test/helpers/channels` and `test/helpers/plugins` use a local import graph to run importing tests instead of broad-running every shard when the dependency path is precise.
@@ -33,7 +34,7 @@ title: "Tests"
- Gateway integration: opt-in via `OPENCLAW_TEST_INCLUDE_GATEWAY=1 pnpm test` or `pnpm test:gateway`.
- `pnpm test:e2e`: Runs gateway end-to-end smoke tests (multi-instance WS/HTTP/node pairing). Defaults to `threads` + `isolate: false` with adaptive workers in `vitest.e2e.config.ts`; tune with `OPENCLAW_E2E_WORKERS=<n>` and set `OPENCLAW_E2E_VERBOSE=1` for verbose logs.
- `pnpm test:live`: Runs provider live tests (minimax/zai). Requires API keys and `LIVE=1` (or provider-specific `*_LIVE_TEST=1`) to unskip.
- `pnpm test:docker:all`: Builds the shared live-test image and Docker E2E image once, then runs the Docker smoke lanes with `OPENCLAW_SKIP_DOCKER_BUILD=1` through a weighted scheduler. `OPENCLAW_DOCKER_ALL_PARALLELISM=<n>` controls process slots and defaults to 10; `OPENCLAW_DOCKER_ALL_TAIL_PARALLELISM=<n>` controls the provider-sensitive tail pool and defaults to 10. Heavy lane caps default to `OPENCLAW_DOCKER_ALL_LIVE_LIMIT=9`, `OPENCLAW_DOCKER_ALL_NPM_LIMIT=10`, and `OPENCLAW_DOCKER_ALL_SERVICE_LIMIT=7`; provider caps default to one heavy lane per provider via `OPENCLAW_DOCKER_ALL_LIVE_CLAUDE_LIMIT=4`, `OPENCLAW_DOCKER_ALL_LIVE_CODEX_LIMIT=4`, and `OPENCLAW_DOCKER_ALL_LIVE_GEMINI_LIMIT=4`. Use `OPENCLAW_DOCKER_ALL_WEIGHT_LIMIT` or `OPENCLAW_DOCKER_ALL_DOCKER_LIMIT` for larger hosts. Lane starts are staggered by 2 seconds by default to avoid local Docker daemon create storms; override with `OPENCLAW_DOCKER_ALL_START_STAGGER_MS=<ms>`. The runner preflights Docker by default, cleans stale OpenClaw E2E containers, emits active-lane status every 30 seconds, shares provider CLI tool caches between compatible lanes, retries transient live-provider failures once by default (`OPENCLAW_DOCKER_ALL_LIVE_RETRIES=<n>`), and stores lane timings in `.artifacts/docker-tests/lane-timings.json` for longest-first ordering on later runs. Use `OPENCLAW_DOCKER_ALL_DRY_RUN=1` to print the lane manifest without running Docker, `OPENCLAW_DOCKER_ALL_STATUS_INTERVAL_MS=<ms>` to tune status output, or `OPENCLAW_DOCKER_ALL_TIMINGS=0` to disable timing reuse. Use `OPENCLAW_DOCKER_ALL_LIVE_MODE=skip` for deterministic/local lanes only or `OPENCLAW_DOCKER_ALL_LIVE_MODE=only` for live-provider lanes only; package aliases are `pnpm test:docker:local:all` and `pnpm test:docker:live:all`. Live-only mode merges main and tail live lanes into one longest-first pool so provider buckets can pack Claude, Codex, and Gemini work together. 
The runner stops scheduling new pooled lanes after the first failure unless `OPENCLAW_DOCKER_ALL_FAIL_FAST=0` is set, and each lane has a 120-minute fallback timeout overrideable with `OPENCLAW_DOCKER_ALL_LANE_TIMEOUT_MS`; selected live/tail lanes use tighter per-lane caps. CLI backend Docker setup commands have their own timeout via `OPENCLAW_LIVE_CLI_BACKEND_SETUP_TIMEOUT_SECONDS` (default 180). Per-lane logs are written under `.artifacts/docker-tests/<run-id>/`.
- `pnpm test:docker:all`: Builds the shared live-test image, packs OpenClaw once as an npm tarball, builds/reuses a bare Node/Git runner image plus a functional image that installs that tarball into `/app`, then runs Docker smoke lanes with `OPENCLAW_SKIP_DOCKER_BUILD=1` through a weighted scheduler. The bare image (`OPENCLAW_DOCKER_E2E_BARE_IMAGE`) is used for installer/update/plugin-dependency lanes; those lanes mount the prebuilt tarball instead of using copied repo sources. The functional image (`OPENCLAW_DOCKER_E2E_FUNCTIONAL_IMAGE`) is used for normal built-app functionality lanes. `scripts/package-openclaw-for-docker.mjs` is the single local/CI package packer and validates the tarball plus `dist/postinstall-inventory.json` before Docker consumes it. Docker lane definitions live in `scripts/lib/docker-e2e-scenarios.mjs`; planner logic lives in `scripts/lib/docker-e2e-plan.mjs`; `scripts/test-docker-all.mjs` executes the selected plan. `node scripts/test-docker-all.mjs --plan-json` emits the scheduler-owned CI plan for selected lanes, image kinds, package/live-image needs, and credential checks without building or running Docker. `OPENCLAW_DOCKER_ALL_PARALLELISM=<n>` controls process slots and defaults to 10; `OPENCLAW_DOCKER_ALL_TAIL_PARALLELISM=<n>` controls the provider-sensitive tail pool and defaults to 10. Heavy lane caps default to `OPENCLAW_DOCKER_ALL_LIVE_LIMIT=9`, `OPENCLAW_DOCKER_ALL_NPM_LIMIT=10`, and `OPENCLAW_DOCKER_ALL_SERVICE_LIMIT=7`; provider caps default to one heavy lane per provider via `OPENCLAW_DOCKER_ALL_LIVE_CLAUDE_LIMIT=4`, `OPENCLAW_DOCKER_ALL_LIVE_CODEX_LIMIT=4`, and `OPENCLAW_DOCKER_ALL_LIVE_GEMINI_LIMIT=4`. Use `OPENCLAW_DOCKER_ALL_WEIGHT_LIMIT` or `OPENCLAW_DOCKER_ALL_DOCKER_LIMIT` for larger hosts. Lane starts are staggered by 2 seconds by default to avoid local Docker daemon create storms; override with `OPENCLAW_DOCKER_ALL_START_STAGGER_MS=<ms>`. 
The runner preflights Docker by default, cleans stale OpenClaw E2E containers, emits active-lane status every 30 seconds, shares provider CLI tool caches between compatible lanes, retries transient live-provider failures once by default (`OPENCLAW_DOCKER_ALL_LIVE_RETRIES=<n>`), and stores lane timings in `.artifacts/docker-tests/lane-timings.json` for longest-first ordering on later runs. Use `OPENCLAW_DOCKER_ALL_DRY_RUN=1` to print the lane manifest without running Docker, `OPENCLAW_DOCKER_ALL_STATUS_INTERVAL_MS=<ms>` to tune status output, or `OPENCLAW_DOCKER_ALL_TIMINGS=0` to disable timing reuse. Use `OPENCLAW_DOCKER_ALL_LIVE_MODE=skip` for deterministic/local lanes only or `OPENCLAW_DOCKER_ALL_LIVE_MODE=only` for live-provider lanes only; package aliases are `pnpm test:docker:local:all` and `pnpm test:docker:live:all`. Live-only mode merges main and tail live lanes into one longest-first pool so provider buckets can pack Claude, Codex, and Gemini work together. The runner stops scheduling new pooled lanes after the first failure unless `OPENCLAW_DOCKER_ALL_FAIL_FAST=0` is set, and each lane has a 120-minute fallback timeout overrideable with `OPENCLAW_DOCKER_ALL_LANE_TIMEOUT_MS`; selected live/tail lanes use tighter per-lane caps. CLI backend Docker setup commands have their own timeout via `OPENCLAW_LIVE_CLI_BACKEND_SETUP_TIMEOUT_SECONDS` (default 180). Per-lane logs, `summary.json`, `failures.json`, and phase timings are written under `.artifacts/docker-tests/<run-id>/`; use `pnpm test:docker:timings <summary.json>` to inspect slow lanes and `pnpm test:docker:rerun <run-id|summary.json|failures.json>` to print cheap targeted rerun commands.
- `pnpm test:docker:browser-cdp-snapshot`: Builds a Chromium-backed source E2E container, starts raw CDP plus an isolated Gateway, runs `browser doctor --deep`, and verifies CDP role snapshots include link URLs, cursor-promoted clickables, iframe refs, and frame metadata.
- CLI backend live Docker probes can be run as focused lanes, for example `pnpm test:docker:live-cli-backend:codex`, `pnpm test:docker:live-cli-backend:codex:resume`, or `pnpm test:docker:live-cli-backend:codex:mcp`. Claude and Gemini have matching `:resume` and `:mcp` aliases.
- `pnpm test:docker:openwebui`: Starts Dockerized OpenClaw + Open WebUI, signs in through Open WebUI, checks `/api/models`, then runs a real proxied chat through `/api/chat/completions`. Requires a usable live model key (for example OpenAI in `~/.profile`), pulls an external Open WebUI image, and is not expected to be CI-stable like the normal unit/e2e suites.
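
The environment knobs described above can be read with a small resolver. The following is a minimal sketch, not the logic in `scripts/test-docker-all.mjs`: the helper names `readInt` and `resolveDockerAllSettings` are hypothetical, and the choice to fall back to the default on an invalid value (rather than fail) is an assumption. Only the variable names and default values come from the text above.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper: parse an integer env value, falling back to the
// documented default when the value is missing, non-integer, or below `min`.
function readInt(env: Env, name: string, fallback: number, min: number): number {
  const raw = env[name]?.trim();
  if (!raw) {
    return fallback;
  }
  const parsed = Number(raw);
  return Number.isInteger(parsed) && parsed >= min ? parsed : fallback;
}

// Hypothetical resolver that collects the documented defaults in one place.
function resolveDockerAllSettings(env: Env) {
  return {
    parallelism: readInt(env, "OPENCLAW_DOCKER_ALL_PARALLELISM", 10, 1),
    tailParallelism: readInt(env, "OPENCLAW_DOCKER_ALL_TAIL_PARALLELISM", 10, 1),
    startStaggerMs: readInt(env, "OPENCLAW_DOCKER_ALL_START_STAGGER_MS", 2_000, 0),
    statusIntervalMs: readInt(env, "OPENCLAW_DOCKER_ALL_STATUS_INTERVAL_MS", 30_000, 1),
    laneTimeoutMs: readInt(env, "OPENCLAW_DOCKER_ALL_LANE_TIMEOUT_MS", 120 * 60 * 1_000, 1),
    // Fail-fast stays on unless the variable is exactly "0".
    failFast: env["OPENCLAW_DOCKER_ALL_FAIL_FAST"] !== "0",
  };
}

const settings = resolveDockerAllSettings({
  OPENCLAW_DOCKER_ALL_PARALLELISM: "4",
  OPENCLAW_DOCKER_ALL_FAIL_FAST: "0",
});
console.log(settings.parallelism, settings.failFast);
```

Passing a plain object instead of `process.env` keeps the sketch testable without touching the real environment.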

@@ -15,7 +15,7 @@ title: "Thinking levels"
- high → “ultrathink” (max budget)
- xhigh → “ultrathink+” (GPT-5.2+ and Codex models, plus Anthropic Claude Opus 4.7 effort)
- adaptive → provider-managed adaptive thinking (supported for Claude 4.6 on Anthropic/Bedrock, Anthropic Claude Opus 4.7, and Google Gemini dynamic thinking)
- max → provider max reasoning (currently Anthropic Claude Opus 4.7)
- max → provider max reasoning (Anthropic Claude Opus 4.7; Ollama maps this to its highest native `think` effort)
- `x-high`, `x_high`, `extra-high`, `extra high`, and `extra_high` map to `xhigh`.
- `highest` maps to `high`.
- Provider notes:
@@ -26,6 +26,7 @@ title: "Thinking levels"
- Anthropic Claude Opus 4.7 does not default to adaptive thinking. Its API effort default remains provider-owned unless you explicitly set a thinking level.
- Anthropic Claude Opus 4.7 maps `/think xhigh` to adaptive thinking plus `output_config.effort: "xhigh"`, because `/think` is a thinking directive and `xhigh` is the Opus 4.7 effort setting.
- Anthropic Claude Opus 4.7 also exposes `/think max`; it maps to the same provider-owned max effort path.
- Ollama thinking-capable models expose `/think low|medium|high|max`; `max` maps to native `think: "high"` because Ollama's native API accepts `low`, `medium`, and `high` effort strings.
- OpenAI GPT models map `/think` through model-specific Responses API effort support. `/think off` sends `reasoning.effort: "none"` only when the target model supports it; otherwise OpenClaw omits the disabled reasoning payload instead of sending an unsupported value.
- Google Gemini maps `/think adaptive` to Gemini's provider-owned dynamic thinking. Gemini 3 requests omit a fixed `thinkingLevel`, while Gemini 2.5 requests send `thinkingBudget: -1`; fixed levels still map to the closest Gemini `thinkingLevel` or budget for that model family.
- MiniMax (`minimax/*`) on the Anthropic-compatible streaming path defaults to `thinking: { type: "disabled" }` unless you explicitly set thinking in model params or request params. This avoids leaked `reasoning_content` deltas from MiniMax's non-native Anthropic stream format.
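
The alias rules and the Ollama `max` mapping above can be illustrated with a short sketch. The function names `normalizeThinkAlias` and `toOllamaThink` are hypothetical, and trimming plus case-insensitive matching is an assumption; the alias spellings and the `max` to `high` mapping come from the text above.

```typescript
// Spellings that the text above says map to `xhigh`.
const XHIGH_ALIASES = new Set(["x-high", "x_high", "extra-high", "extra high", "extra_high"]);

// Hypothetical normalizer: resolves documented aliases, passes other levels through.
function normalizeThinkAlias(input: string): string {
  const key = input.trim().toLowerCase(); // assumption: matching ignores case and padding
  if (XHIGH_ALIASES.has(key)) {
    return "xhigh";
  }
  if (key === "highest") {
    return "high";
  }
  return key;
}

type OllamaThinkLevel = "low" | "medium" | "high" | "max";

// Ollama's native API accepts only low/medium/high, so `max` collapses to "high".
function toOllamaThink(level: OllamaThinkLevel): "low" | "medium" | "high" {
  return level === "max" ? "high" : level;
}

console.log(normalizeThinkAlias("extra high")); // xhigh
console.log(toOllamaThink("max")); // high
```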

@@ -134,6 +134,7 @@ The Control UI can localize itself on first load based on your browser locale. T
<AccordionGroup>
<Accordion title="Send and history semantics">
- `chat.send` is **non-blocking**: it acks immediately with `{ runId, status: "started" }` and the response streams via `chat` events.
- Chat uploads accept images plus non-video files. Images keep the native image path; other files are stored as managed media and shown in history as attachment links.
- Re-sending with the same `idempotencyKey` returns `{ status: "in_flight" }` while running, and `{ status: "ok" }` after completion.
- `chat.history` responses are size-bounded for UI safety. When transcript entries are too large, Gateway may truncate long text fields, omit heavy metadata blocks, and replace oversized messages with a placeholder (`[chat.history omitted: message too large]`).
- Assistant/generated images are persisted as managed media references and served back through authenticated Gateway media URLs, so reloads do not depend on raw base64 image payloads staying in the chat history response.
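
The `chat.send` acknowledgement and `idempotencyKey` behavior described above can be modeled with a toy in-memory tracker. This is a sketch of the documented semantics only, not Gateway code: the class `IdempotentSendModel`, its methods, and the generated run ids are hypothetical.

```typescript
type SendAck =
  | { runId: string; status: "started" }
  | { status: "in_flight" }
  | { status: "ok" };

// Hypothetical model: tracks one run per idempotency key.
class IdempotentSendModel {
  private runs = new Map<string, { runId: string; done: boolean }>();
  private nextId = 1;

  // The first send acks immediately; repeats report the state of the existing run.
  send(idempotencyKey: string): SendAck {
    const existing = this.runs.get(idempotencyKey);
    if (!existing) {
      const runId = `run-${this.nextId++}`;
      this.runs.set(idempotencyKey, { runId, done: false });
      return { runId, status: "started" };
    }
    return { status: existing.done ? "ok" : "in_flight" };
  }

  // Marks the run as finished, as if the streamed response had completed.
  complete(idempotencyKey: string): void {
    const existing = this.runs.get(idempotencyKey);
    if (existing) {
      existing.done = true;
    }
  }
}

const model = new IdempotentSendModel();
console.log(model.send("key-1").status); // started
console.log(model.send("key-1").status); // in_flight
model.complete("key-1");
console.log(model.send("key-1").status); // ok
```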

@@ -1,3 +1,4 @@
import fs from "node:fs";
import os from "node:os";
import { afterEach, describe, expect, it, vi } from "vitest";
@@ -207,6 +208,38 @@ describe("gateway bonjour advertiser", () => {
await expect(started.stop()).resolves.toBeUndefined();
});
it("auto-disables Bonjour in detected containers", async () => {
enableAdvertiserUnitMode();
vi.spyOn(fs, "existsSync").mockImplementation((filePath) => String(filePath) === "/.dockerenv");
const started = await startAdvertiser({
gatewayPort: 18789,
sshPort: 2222,
});
expect(createService).not.toHaveBeenCalled();
await expect(started.stop()).resolves.toBeUndefined();
});
it("honors explicit Bonjour opt-in inside detected containers", async () => {
enableAdvertiserUnitMode();
process.env.OPENCLAW_DISABLE_BONJOUR = "0";
vi.spyOn(fs, "existsSync").mockImplementation((filePath) => String(filePath) === "/.dockerenv");
const destroy = vi.fn().mockResolvedValue(undefined);
const advertise = vi.fn().mockResolvedValue(undefined);
mockCiaoService({ advertise, destroy });
const started = await startAdvertiser({
gatewayPort: 18789,
sshPort: 2222,
});
expect(createService).toHaveBeenCalledTimes(1);
await started.stop();
});
it("attaches conflict listeners for services", async () => {
enableAdvertiserUnitMode();

@@ -1,3 +1,4 @@
import fs from "node:fs";
import type { PluginLogger } from "openclaw/plugin-sdk/plugin-entry";
import { isTruthyEnvValue } from "openclaw/plugin-sdk/runtime-env";
import { classifyCiaoProcessError, type CiaoProcessErrorClassification } from "./ciao.js";
@@ -89,16 +90,61 @@ async function loadCiaoModule(): Promise<CiaoModule> {
return ciaoModulePromise;
}
function isDisabledByEnv() {
if (isTruthyEnvValue(process.env.OPENCLAW_DISABLE_BONJOUR)) {
function readBonjourDisableOverride(): boolean | null {
const raw = process.env.OPENCLAW_DISABLE_BONJOUR;
const normalized = raw?.trim().toLowerCase();
if (!normalized) {
return null;
}
if (isTruthyEnvValue(raw)) {
return true;
}
switch (normalized) {
case "0":
case "false":
case "no":
case "off":
return false;
default:
return null;
}
}
function isContainerEnvironment() {
for (const sentinelPath of ["/.dockerenv", "/run/.containerenv", "/var/run/.containerenv"]) {
try {
if (fs.existsSync(sentinelPath)) {
return true;
}
} catch {
// ignore
}
}
try {
const cgroup = fs.readFileSync("/proc/1/cgroup", "utf8");
return /\/docker\/|cri-containerd-[0-9a-f]|containerd\/[0-9a-f]{64}|\/kubepods[/.]|\blxc\b/u.test(
cgroup,
);
} catch {
return false;
}
}
function isDisabledByEnv() {
if (process.env.NODE_ENV === "test") {
return true;
}
if (process.env.VITEST) {
return true;
}
const envOverride = readBonjourDisableOverride();
if (envOverride !== null) {
return envOverride;
}
if (isContainerEnvironment()) {
return true;
}
return false;
}
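
The precedence in the new `isDisabledByEnv` (test mode first, then an explicit override, then container detection) can be exercised in isolation. The sketch below is a simplified standalone version under stated assumptions: `TRUTHY` stands in for `isTruthyEnvValue`, whose accepted values are not shown in this diff, and container detection is passed in as a boolean instead of probing the filesystem.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: stand-in for isTruthyEnvValue; the real accepted set may differ.
const TRUTHY = new Set(["1", "true", "yes", "on"]);
const FALSY = new Set(["0", "false", "no", "off"]);

// Tri-state override: true = force off, false = force on, null = no opinion.
function readDisableOverride(raw: string | undefined): boolean | null {
  const normalized = raw?.trim().toLowerCase();
  if (!normalized) {
    return null;
  }
  if (TRUTHY.has(normalized)) {
    return true;
  }
  if (FALSY.has(normalized)) {
    return false;
  }
  return null; // unrecognized values fall through to auto-detection
}

function isBonjourDisabled(env: Env, inContainer: boolean): boolean {
  if (env["NODE_ENV"] === "test" || env["VITEST"]) {
    return true;
  }
  return readDisableOverride(env["OPENCLAW_DISABLE_BONJOUR"]) ?? inContainer;
}

console.log(isBonjourDisabled({}, true)); // true: auto-disabled in a container
console.log(isBonjourDisabled({ OPENCLAW_DISABLE_BONJOUR: "0" }, true)); // false: explicit opt-in
```

Returning `null` for "no opinion" is what lets an explicit `0` re-enable Bonjour inside a container, which a plain boolean flag could not express.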

@@ -48,6 +48,34 @@ describe("bonjour-ciao", () => {
expect(ignoreCiaoUnhandledRejection(new Error("CIAO PROBING CANCELLED"))).toBe(true);
});
it("suppresses wrapped ciao cancellation rejections", () => {
expect(
classifyCiaoUnhandledRejection({
reason: new Error("CIAO ANNOUNCEMENT CANCELLED"),
}),
).toEqual({
kind: "cancellation",
formatted: "CIAO ANNOUNCEMENT CANCELLED",
});
});
it("suppresses aggregate ciao assertion rejections", () => {
expect(
classifyCiaoUnhandledRejection(
new AggregateError([
Object.assign(
new Error("Reached illegal state! IPV4 address change from defined to undefined!"),
{ name: "AssertionError" },
),
]),
),
).toEqual({
kind: "interface-assertion",
formatted:
"AssertionError: Reached illegal state! IPV4 address change from defined to undefined!",
});
});
it("suppresses lower-case string cancellation reasons too", () => {
expect(ignoreCiaoUnhandledRejection("ciao announcement cancelled during cleanup")).toBe(true);
});

@@ -11,17 +11,59 @@ export type CiaoProcessErrorClassification =
| { kind: "interface-assertion"; formatted: string }
| { kind: "netmask-assertion"; formatted: string };
function collectCiaoProcessErrorCandidates(reason: unknown): unknown[] {
const queue: unknown[] = [reason];
const seen = new Set<unknown>();
const candidates: unknown[] = [];
while (queue.length > 0) {
const current = queue.shift();
if (current == null || seen.has(current)) {
continue;
}
seen.add(current);
candidates.push(current);
if (!current || typeof current !== "object") {
continue;
}
const record = current as Record<string, unknown>;
for (const nested of [
record.cause,
record.reason,
record.original,
record.error,
record.data,
]) {
if (nested != null && !seen.has(nested)) {
queue.push(nested);
}
}
if (Array.isArray(record.errors)) {
for (const nested of record.errors) {
if (nested != null && !seen.has(nested)) {
queue.push(nested);
}
}
}
}
return candidates;
}
export function classifyCiaoProcessError(reason: unknown): CiaoProcessErrorClassification | null {
const formatted = formatBonjourError(reason);
const message = formatted.toUpperCase();
if (CIAO_CANCELLATION_MESSAGE_RE.test(message)) {
return { kind: "cancellation", formatted };
}
if (CIAO_INTERFACE_ASSERTION_MESSAGE_RE.test(message)) {
return { kind: "interface-assertion", formatted };
}
if (CIAO_NETMASK_ASSERTION_MESSAGE_RE.test(message)) {
return { kind: "netmask-assertion", formatted };
for (const candidate of collectCiaoProcessErrorCandidates(reason)) {
const formatted = formatBonjourError(candidate);
const message = formatted.toUpperCase();
if (CIAO_CANCELLATION_MESSAGE_RE.test(message)) {
return { kind: "cancellation", formatted };
}
if (CIAO_INTERFACE_ASSERTION_MESSAGE_RE.test(message)) {
return { kind: "interface-assertion", formatted };
}
if (CIAO_NETMASK_ASSERTION_MESSAGE_RE.test(message)) {
return { kind: "netmask-assertion", formatted };
}
}
return null;
}
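
The breadth-first walk over wrapped rejections can be tried on its own. The following sketch is a reduced version of the traversal above: it follows only `cause`, `reason`, and `errors`, uses plain objects in place of `AggregateError`, and the helper names `collectNested` and `findMessage` are hypothetical.

```typescript
// Breadth-first collection of a rejection and everything nested inside it.
function collectNested(reason: unknown): unknown[] {
  const queue: unknown[] = [reason];
  const seen = new Set<unknown>();
  const found: unknown[] = [];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current == null || seen.has(current)) {
      continue; // skip empty slots and anything already visited (guards against cycles)
    }
    seen.add(current);
    found.push(current);
    if (typeof current !== "object") {
      continue;
    }
    const record = current as Record<string, unknown>;
    queue.push(record["cause"], record["reason"]);
    const errors = record["errors"];
    if (Array.isArray(errors)) {
      queue.push(...errors);
    }
  }
  return found;
}

// Returns the first nested message matching the pattern, or null.
function findMessage(reason: unknown, pattern: RegExp): string | null {
  for (const candidate of collectNested(reason)) {
    const text = candidate instanceof Error ? candidate.message : String(candidate);
    if (pattern.test(text)) {
      return text;
    }
  }
  return null;
}

const wrapped = { reason: new Error("CIAO ANNOUNCEMENT CANCELLED") };
console.log(findMessage(wrapped, /CANCELLED/)); // CIAO ANNOUNCEMENT CANCELLED
```

The `seen` set is what keeps a self-referencing `cause` chain from looping forever.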

@@ -43,6 +43,42 @@
}
}
},
"computerUse": {
"type": "object",
"additionalProperties": false,
"properties": {
"enabled": {
"type": "boolean",
"default": false
},
"autoInstall": {
"type": "boolean",
"default": false
},
"marketplaceDiscoveryTimeoutMs": {
"type": "number",
"minimum": 1,
"default": 60000
},
"marketplaceSource": {
"type": "string"
},
"marketplacePath": {
"type": "string"
},
"marketplaceName": {
"type": "string"
},
"pluginName": {
"type": "string",
"default": "computer-use"
},
"mcpServerName": {
"type": "string",
"default": "computer-use"
}
}
},
"appServer": {
"type": "object",
"additionalProperties": false,
@@ -112,6 +148,51 @@
"help": "Maximum time to wait for Codex app-server model discovery before falling back to the bundled model list.",
"advanced": true
},
"computerUse": {
"label": "Computer Use",
"help": "Controls Codex app-server setup for the Computer Use plugin.",
"advanced": true
},
"computerUse.enabled": {
"label": "Enable Computer Use",
"help": "When true, Codex-mode turns require the configured Computer Use MCP server to be available.",
"advanced": true
},
"computerUse.autoInstall": {
"label": "Auto Install",
"help": "Install the configured Computer Use plugin when Codex-mode turns start.",
"advanced": true
},
"computerUse.marketplaceDiscoveryTimeoutMs": {
"label": "Marketplace Discovery Timeout",
"help": "Maximum time to wait for Codex app-server to finish loading marketplaces during Computer Use install.",
"advanced": true
},
"computerUse.marketplaceSource": {
"label": "Marketplace Source",
"help": "Optional Codex marketplace source to add before installing Computer Use.",
"advanced": true
},
"computerUse.marketplacePath": {
"label": "Marketplace Path",
"help": "Optional local Codex marketplace file path containing the Computer Use plugin.",
"advanced": true
},
"computerUse.marketplaceName": {
"label": "Marketplace Name",
"help": "Optional registered Codex marketplace name containing the Computer Use plugin.",
"advanced": true
},
"computerUse.pluginName": {
"label": "Plugin Name",
"help": "Codex marketplace plugin name for Computer Use.",
"advanced": true
},
"computerUse.mcpServerName": {
"label": "MCP Server Name",
"help": "MCP server name exposed by the Computer Use plugin.",
"advanced": true
},
"appServer": {
"label": "App Server",
"help": "Runtime controls for connecting to Codex app-server.",
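
The `computerUse` schema defaults above can be expressed as a small resolver. This is a sketch assuming the defaults are applied field by field; it is not the project's `resolveCodexComputerUseConfig`, and the names `ComputerUseInput` and `applyComputerUseDefaults` are hypothetical.

```typescript
// Mirrors the schema properties shown above; every field is optional on input.
type ComputerUseInput = {
  enabled?: boolean;
  autoInstall?: boolean;
  marketplaceDiscoveryTimeoutMs?: number;
  marketplaceSource?: string;
  marketplacePath?: string;
  marketplaceName?: string;
  pluginName?: string;
  mcpServerName?: string;
};

// Hypothetical resolver: fills in the defaults declared in the JSON schema.
function applyComputerUseDefaults(input: ComputerUseInput = {}) {
  return {
    ...input,
    enabled: input.enabled ?? false,
    autoInstall: input.autoInstall ?? false,
    marketplaceDiscoveryTimeoutMs: input.marketplaceDiscoveryTimeoutMs ?? 60_000,
    pluginName: input.pluginName ?? "computer-use",
    mcpServerName: input.mcpServerName ?? "computer-use",
  };
}

const resolved = applyComputerUseDefaults({ enabled: true, marketplaceName: "desktop-tools" });
console.log(resolved.pluginName, resolved.marketplaceDiscoveryTimeoutMs);
```

The three marketplace fields have no schema default, so the sketch leaves them undefined unless the caller sets them.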

@@ -0,0 +1,502 @@
import { afterEach, describe, expect, it, vi } from "vitest";
import {
CodexComputerUseSetupError,
ensureCodexComputerUse,
installCodexComputerUse,
readCodexComputerUseStatus,
type CodexComputerUseRequest,
} from "./computer-use.js";
describe("Codex Computer Use setup", () => {
afterEach(() => {
vi.useRealTimers();
});
it("stays disabled until configured", async () => {
await expect(
readCodexComputerUseStatus({ pluginConfig: {}, request: vi.fn() }),
).resolves.toEqual(
expect.objectContaining({
enabled: false,
ready: false,
message: "Computer Use is disabled.",
}),
);
});
it("reports an installed Computer Use MCP server from a registered marketplace", async () => {
const request = createComputerUseRequest({ installed: true });
await expect(
readCodexComputerUseStatus({
pluginConfig: { computerUse: { enabled: true, marketplaceName: "desktop-tools" } },
request,
}),
).resolves.toEqual(
expect.objectContaining({
enabled: true,
ready: true,
installed: true,
pluginEnabled: true,
mcpServerAvailable: true,
marketplaceName: "desktop-tools",
tools: ["list_apps"],
message: "Computer Use is ready.",
}),
);
expect(request).not.toHaveBeenCalledWith("marketplace/add", expect.anything());
expect(request).not.toHaveBeenCalledWith(
"experimentalFeature/enablement/set",
expect.anything(),
);
expect(request).not.toHaveBeenCalledWith("plugin/install", expect.anything());
});
it("does not register marketplace sources during status checks", async () => {
const request = createComputerUseRequest({ installed: true });
await expect(
readCodexComputerUseStatus({
pluginConfig: {
computerUse: {
enabled: true,
marketplaceSource: "github:example/desktop-tools",
},
},
request,
}),
).resolves.toEqual(
expect.objectContaining({
ready: true,
message: "Computer Use is ready.",
}),
);
expect(request).not.toHaveBeenCalledWith("marketplace/add", expect.anything());
expect(request).not.toHaveBeenCalledWith(
"experimentalFeature/enablement/set",
expect.anything(),
);
});
it("fails closed when multiple marketplaces contain Computer Use", async () => {
const request = createAmbiguousComputerUseRequest();
await expect(
readCodexComputerUseStatus({
pluginConfig: { computerUse: { enabled: true } },
request,
}),
).resolves.toEqual(
expect.objectContaining({
ready: false,
message:
"Multiple Codex marketplaces contain computer-use. Configure computerUse.marketplaceName or computerUse.marketplacePath to choose one.",
}),
);
expect(request).not.toHaveBeenCalledWith("plugin/read", expect.anything());
});
it("installs Computer Use from a configured marketplace source", async () => {
const request = createComputerUseRequest({ installed: false });
await expect(
installCodexComputerUse({
pluginConfig: {
computerUse: {
marketplaceSource: "github:example/desktop-tools",
},
},
request,
}),
).resolves.toEqual(
expect.objectContaining({
ready: true,
installed: true,
pluginEnabled: true,
tools: ["list_apps"],
}),
);
expect(request).toHaveBeenCalledWith("experimentalFeature/enablement/set", {
enablement: { plugins: true },
});
expect(request).toHaveBeenCalledWith("marketplace/add", {
source: "github:example/desktop-tools",
});
expect(request).toHaveBeenCalledWith("plugin/install", {
marketplacePath: "/marketplaces/desktop-tools/.agents/plugins/marketplace.json",
pluginName: "computer-use",
});
expect(request).toHaveBeenCalledWith("config/mcpServer/reload", undefined);
});
it("fails closed when Computer Use is required but not installed", async () => {
const request = createComputerUseRequest({ installed: false });
await expect(
ensureCodexComputerUse({
pluginConfig: { computerUse: { enabled: true, marketplaceName: "desktop-tools" } },
request,
}),
).rejects.toThrow(CodexComputerUseSetupError);
expect(request).not.toHaveBeenCalledWith("plugin/install", expect.anything());
});
it("skips setup writes when auto-install is already ready", async () => {
const request = createComputerUseRequest({ installed: true });
await expect(
ensureCodexComputerUse({
pluginConfig: {
computerUse: {
enabled: true,
autoInstall: true,
marketplaceName: "desktop-tools",
},
},
request,
}),
).resolves.toEqual(
expect.objectContaining({
ready: true,
message: "Computer Use is ready.",
}),
);
expect(request).not.toHaveBeenCalledWith("marketplace/add", expect.anything());
expect(request).not.toHaveBeenCalledWith(
"experimentalFeature/enablement/set",
expect.anything(),
);
expect(request).not.toHaveBeenCalledWith("plugin/install", expect.anything());
});
it("uses setup writes when auto-install needs to install", async () => {
const request = createComputerUseRequest({ installed: false });
await expect(
ensureCodexComputerUse({
pluginConfig: {
computerUse: {
enabled: true,
autoInstall: true,
},
},
request,
}),
).resolves.toEqual(
expect.objectContaining({
ready: true,
message: "Computer Use is ready.",
}),
);
expect(request).toHaveBeenCalledWith("experimentalFeature/enablement/set", {
enablement: { plugins: true },
});
expect(request).not.toHaveBeenCalledWith("marketplace/add", expect.anything());
expect(request).toHaveBeenCalledWith("plugin/install", {
marketplacePath: "/marketplaces/desktop-tools/.agents/plugins/marketplace.json",
pluginName: "computer-use",
});
});
it("requires an explicit install command for configured marketplace sources", async () => {
const request = createComputerUseRequest({ installed: false });
await expect(
ensureCodexComputerUse({
pluginConfig: {
computerUse: {
enabled: true,
autoInstall: true,
marketplaceSource: "github:example/desktop-tools",
},
},
request,
}),
).rejects.toThrow(CodexComputerUseSetupError);
expect(request).not.toHaveBeenCalledWith("marketplace/add", expect.anything());
expect(request).not.toHaveBeenCalledWith("plugin/install", expect.anything());
});
it("fails closed when a configured marketplace name is not discovered", async () => {
const request = createEmptyMarketplaceComputerUseRequest();
await expect(
readCodexComputerUseStatus({
pluginConfig: {
computerUse: {
enabled: true,
marketplaceName: "missing-marketplace",
},
},
request,
}),
).resolves.toEqual(
expect.objectContaining({
ready: false,
message:
"Configured Codex marketplace missing-marketplace was not found or does not contain computer-use. Run /codex computer-use install with a source or path to install from a new marketplace.",
}),
);
expect(request).not.toHaveBeenCalledWith("plugin/read", expect.anything());
});
it("waits for the default Codex marketplace during install", async () => {
vi.useFakeTimers();
const request = createComputerUseRequest({
installed: false,
marketplaceAvailableAfterListCalls: 3,
});
const installed = installCodexComputerUse({
pluginConfig: { computerUse: {} },
request,
});
await vi.advanceTimersByTimeAsync(4_000);
await expect(installed).resolves.toEqual(
expect.objectContaining({
ready: true,
message: "Computer Use is ready.",
}),
);
expect(request).toHaveBeenCalledWith("plugin/install", {
marketplacePath: "/marketplaces/desktop-tools/.agents/plugins/marketplace.json",
pluginName: "computer-use",
});
expect(
vi.mocked(request).mock.calls.filter(([method]) => method === "plugin/list"),
).toHaveLength(3);
});
it("prefers the official Computer Use marketplace when multiple matches are present", async () => {
const request = createMultiMarketplaceComputerUseRequest();
await expect(
installCodexComputerUse({
pluginConfig: { computerUse: {} },
request,
}),
).resolves.toEqual(
expect.objectContaining({
ready: true,
marketplaceName: "openai-curated",
}),
);
expect(request).toHaveBeenCalledWith("plugin/install", {
marketplacePath: "/marketplaces/openai-curated/.agents/plugins/marketplace.json",
pluginName: "computer-use",
});
});
});
function createComputerUseRequest(params: {
installed: boolean;
marketplaceAvailableAfterListCalls?: number;
}): CodexComputerUseRequest {
let installed = params.installed;
let pluginListCalls = 0;
return vi.fn(async (method: string, requestParams?: unknown) => {
if (method === "experimentalFeature/enablement/set") {
return { enablement: { plugins: true } };
}
if (method === "marketplace/add") {
return {
marketplaceName: "desktop-tools",
installedRoot: "/marketplaces/desktop-tools",
alreadyAdded: false,
};
}
if (method === "plugin/list") {
pluginListCalls += 1;
const marketplaceAvailable =
pluginListCalls >= (params.marketplaceAvailableAfterListCalls ?? 1);
return {
marketplaces: marketplaceAvailable
? [
{
name: "desktop-tools",
path: "/marketplaces/desktop-tools/.agents/plugins/marketplace.json",
interface: null,
plugins: [pluginSummary(installed)],
},
]
: [],
marketplaceLoadErrors: [],
featuredPluginIds: [],
};
}
if (method === "plugin/read") {
expect(requestParams).toEqual(
expect.objectContaining({
pluginName: "computer-use",
}),
);
return {
plugin: {
marketplaceName: "desktop-tools",
marketplacePath: "/marketplaces/desktop-tools/.agents/plugins/marketplace.json",
summary: pluginSummary(installed),
description: "Control desktop apps.",
skills: [],
apps: [],
mcpServers: ["computer-use"],
},
};
}
if (method === "plugin/install") {
installed = true;
return { authPolicy: "ON_INSTALL", appsNeedingAuth: [] };
}
if (method === "config/mcpServer/reload") {
return undefined;
}
if (method === "mcpServerStatus/list") {
return {
data: installed
? [
{
name: "computer-use",
tools: {
list_apps: {
name: "list_apps",
inputSchema: { type: "object" },
},
},
resources: [],
resourceTemplates: [],
authStatus: "unsupported",
},
]
: [],
nextCursor: null,
};
}
throw new Error(`unexpected request ${method}`);
}) as CodexComputerUseRequest;
}
function createAmbiguousComputerUseRequest(): CodexComputerUseRequest {
return vi.fn(async (method: string) => {
if (method === "plugin/list") {
return {
marketplaces: [
{
name: "desktop-tools",
path: "/marketplaces/desktop-tools/.agents/plugins/marketplace.json",
interface: null,
plugins: [pluginSummary(true, "desktop-tools")],
},
{
name: "other-tools",
path: "/marketplaces/other-tools/.agents/plugins/marketplace.json",
interface: null,
plugins: [pluginSummary(true, "other-tools")],
},
],
marketplaceLoadErrors: [],
featuredPluginIds: [],
};
}
throw new Error(`unexpected request ${method}`);
}) as CodexComputerUseRequest;
}
function createEmptyMarketplaceComputerUseRequest(): CodexComputerUseRequest {
return vi.fn(async (method: string) => {
if (method === "plugin/list") {
return {
marketplaces: [],
marketplaceLoadErrors: [],
featuredPluginIds: [],
};
}
throw new Error(`unexpected request ${method}`);
}) as CodexComputerUseRequest;
}
function createMultiMarketplaceComputerUseRequest(): CodexComputerUseRequest {
let installed = false;
return vi.fn(async (method: string, requestParams?: unknown) => {
if (method === "experimentalFeature/enablement/set") {
return { enablement: { plugins: true } };
}
if (method === "plugin/list") {
return {
marketplaces: [
marketplaceEntry("workspace-tools", false),
marketplaceEntry("openai-curated", installed),
],
marketplaceLoadErrors: [],
featuredPluginIds: [],
};
}
if (method === "plugin/read") {
return {
plugin: {
marketplaceName: "openai-curated",
marketplacePath: "/marketplaces/openai-curated/.agents/plugins/marketplace.json",
summary: pluginSummary(installed, "openai-curated"),
description: "Control desktop apps.",
skills: [],
apps: [],
mcpServers: ["computer-use"],
},
};
}
if (method === "plugin/install") {
expect(requestParams).toEqual({
marketplacePath: "/marketplaces/openai-curated/.agents/plugins/marketplace.json",
pluginName: "computer-use",
});
installed = true;
return { authPolicy: "ON_INSTALL", appsNeedingAuth: [] };
}
if (method === "config/mcpServer/reload") {
return undefined;
}
if (method === "mcpServerStatus/list") {
return {
data: installed
? [
{
name: "computer-use",
tools: {
list_apps: {
name: "list_apps",
inputSchema: { type: "object" },
},
},
resources: [],
resourceTemplates: [],
authStatus: "unsupported",
},
]
: [],
nextCursor: null,
};
}
throw new Error(`unexpected request ${method}`);
}) as CodexComputerUseRequest;
}
function marketplaceEntry(marketplaceName: string, installed: boolean) {
return {
name: marketplaceName,
path: `/marketplaces/${marketplaceName}/.agents/plugins/marketplace.json`,
interface: null,
plugins: [pluginSummary(installed, marketplaceName)],
};
}
function pluginSummary(installed: boolean, marketplaceName = "desktop-tools") {
return {
id: `computer-use@${marketplaceName}`,
name: "computer-use",
source: { type: "local", path: `/marketplaces/${marketplaceName}/plugins/computer-use` },
installed,
enabled: installed,
installPolicy: "AVAILABLE",
authPolicy: "ON_INSTALL",
interface: null,
};
}
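
The `ensureCodexComputerUse` cases covered by the tests above reduce to a small decision table. The sketch below is a pure-function summary under stated assumptions: `planEnsure` and its input shape are hypothetical, and treating a configured `marketplaceSource` as the condition that blocks auto-install is inferred from the "requires an explicit install command" test, since that check's implementation is not visible in the tests above.

```typescript
type EnsureInput = {
  enabled: boolean;
  autoInstall: boolean;
  ready: boolean; // result of the read-only status inspection
  hasMarketplaceSource: boolean;
};

type EnsureAction = "skip-disabled" | "use-existing" | "install" | "fail";

// Hypothetical summary of the ensure flow exercised by the tests above.
function planEnsure(input: EnsureInput): EnsureAction {
  if (!input.enabled) {
    return "skip-disabled";
  }
  if (input.ready) {
    return "use-existing"; // no setup writes when everything is already in place
  }
  if (!input.autoInstall) {
    return "fail"; // fail closed: required but not installed
  }
  if (input.hasMarketplaceSource) {
    return "fail"; // assumption: new marketplace sources need an explicit install command
  }
  return "install";
}

console.log(
  planEnsure({ enabled: true, autoInstall: true, ready: false, hasMarketplaceSource: false }),
); // install
```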

@@ -0,0 +1,511 @@
import { describeControlFailure } from "./capabilities.js";
import type { CodexAppServerClient } from "./client.js";
import {
resolveCodexAppServerRuntimeOptions,
resolveCodexComputerUseConfig,
type CodexComputerUseConfig,
type ResolvedCodexComputerUseConfig,
} from "./config.js";
import type { v2 } from "./protocol-generated/typescript/index.js";
import type { JsonValue } from "./protocol.js";
import { requestCodexAppServerJson } from "./request.js";
export type CodexComputerUseRequest = <T = JsonValue | undefined>(
method: string,
params?: unknown,
) => Promise<T>;
export type CodexComputerUseStatus = {
enabled: boolean;
ready: boolean;
installed: boolean;
pluginEnabled: boolean;
mcpServerAvailable: boolean;
pluginName: string;
mcpServerName: string;
marketplaceName?: string;
marketplacePath?: string;
tools: string[];
message: string;
};
export class CodexComputerUseSetupError extends Error {
readonly status: CodexComputerUseStatus;
constructor(status: CodexComputerUseStatus) {
super(status.message);
this.name = "CodexComputerUseSetupError";
this.status = status;
}
}
export type CodexComputerUseSetupParams = {
pluginConfig?: unknown;
overrides?: Partial<CodexComputerUseConfig>;
request?: CodexComputerUseRequest;
client?: CodexAppServerClient;
timeoutMs?: number;
signal?: AbortSignal;
forceEnable?: boolean;
};
type MarketplaceRef = {
name?: string;
path?: string;
remoteMarketplaceName?: string;
};
type MarketplaceResolution = {
marketplace?: MarketplaceRef;
message?: string;
};
const CURATED_MARKETPLACE_POLL_INTERVAL_MS = 2_000;
const COMPUTER_USE_MARKETPLACE_NAME_PRIORITY = ["openai-bundled", "openai-curated", "local"];
export async function readCodexComputerUseStatus(
params: CodexComputerUseSetupParams = {},
): Promise<CodexComputerUseStatus> {
const config = resolveComputerUseConfig(params);
if (!config.enabled) {
return disabledStatus(config);
}
try {
return await inspectCodexComputerUse({
...params,
config,
installPlugin: false,
});
} catch (error) {
return unavailableStatus(config, `Computer Use check failed: ${describeControlFailure(error)}`);
}
}
export async function ensureCodexComputerUse(
params: CodexComputerUseSetupParams = {},
): Promise<CodexComputerUseStatus> {
const config = resolveComputerUseConfig(params);
if (!config.enabled) {
return disabledStatus(config);
}
const status = await inspectCodexComputerUse({
...params,
config,
installPlugin: false,
});
if (status.ready) {
return status;
}
if (config.autoInstall) {
const blockedAutoInstallStatus = blockUnsafeAutoInstallStatus(config);
if (blockedAutoInstallStatus) {
throw new CodexComputerUseSetupError(blockedAutoInstallStatus);
}
const installedStatus = await inspectCodexComputerUse({
...params,
config,
installPlugin: true,
});
if (!installedStatus.ready) {
throw new CodexComputerUseSetupError(installedStatus);
}
return installedStatus;
}
if (!status.ready) {
throw new CodexComputerUseSetupError(status);
}
return status;
}
export async function installCodexComputerUse(
params: CodexComputerUseSetupParams = {},
): Promise<CodexComputerUseStatus> {
const config = resolveComputerUseConfig({
...params,
forceEnable: true,
overrides: { ...params.overrides, enabled: true, autoInstall: true },
});
const status = await inspectCodexComputerUse({
...params,
config,
installPlugin: true,
});
if (!status.ready) {
throw new CodexComputerUseSetupError(status);
}
return status;
}
async function inspectCodexComputerUse(params: {
pluginConfig?: unknown;
request?: CodexComputerUseRequest;
client?: CodexAppServerClient;
timeoutMs?: number;
signal?: AbortSignal;
config: ResolvedCodexComputerUseConfig;
installPlugin: boolean;
}): Promise<CodexComputerUseStatus> {
const request = createComputerUseRequest(params);
if (params.installPlugin) {
await request<v2.ExperimentalFeatureEnablementSetResponse>(
"experimentalFeature/enablement/set",
{
enablement: { plugins: true },
} satisfies v2.ExperimentalFeatureEnablementSetParams,
);
}
const marketplace = await resolveMarketplaceRef({
request,
config: params.config,
allowAdd: params.installPlugin,
signal: params.signal,
});
if (!marketplace.marketplace) {
return unavailableStatus(
params.config,
marketplace.message ??
`No Codex marketplace containing ${params.config.pluginName} is registered. Configure computerUse.marketplaceSource or computerUse.marketplacePath, then run /codex computer-use install.`,
);
}
let plugin = await readComputerUsePlugin(
request,
marketplace.marketplace,
params.config.pluginName,
);
if (!plugin.summary.installed || !plugin.summary.enabled) {
if (!params.installPlugin) {
return statusFromPlugin({
config: params.config,
plugin,
tools: [],
message: `Computer Use is available but not installed. Run /codex computer-use install or enable computerUse.autoInstall.`,
});
}
await request<v2.PluginInstallResponse>(
"plugin/install",
pluginRequestParams(
marketplace.marketplace,
params.config.pluginName,
) satisfies v2.PluginInstallParams,
);
await reloadMcpServers(request);
plugin = await readComputerUsePlugin(
request,
marketplace.marketplace,
params.config.pluginName,
);
}
let server = await readMcpServerStatus(request, params.config.mcpServerName);
if (!server && params.installPlugin) {
await reloadMcpServers(request);
server = await readMcpServerStatus(request, params.config.mcpServerName);
}
if (!server) {
return statusFromPlugin({
config: params.config,
plugin,
tools: [],
message: `Computer Use is installed, but the ${params.config.mcpServerName} MCP server is not available.`,
});
}
return statusFromPlugin({
config: params.config,
plugin,
tools: Object.keys(server.tools).toSorted(),
message: "Computer Use is ready.",
});
}
async function resolveMarketplaceRef(params: {
request: CodexComputerUseRequest;
config: ResolvedCodexComputerUseConfig;
allowAdd: boolean;
signal?: AbortSignal;
}): Promise<MarketplaceResolution> {
let preferredMarketplaceName = params.config.marketplaceName;
if (params.config.marketplaceSource && params.allowAdd) {
const added = await params.request<v2.MarketplaceAddResponse>("marketplace/add", {
source: params.config.marketplaceSource,
} satisfies v2.MarketplaceAddParams);
preferredMarketplaceName ??= added.marketplaceName;
}
if (params.config.marketplacePath) {
const marketplace: MarketplaceRef = preferredMarketplaceName
? { name: preferredMarketplaceName, path: params.config.marketplacePath }
: { path: params.config.marketplacePath };
return { marketplace };
}
let candidates: MarketplaceRef[] = [];
const waitUntil = marketplaceDiscoveryWaitUntil(params);
while (candidates.length === 0) {
const listed = await params.request<v2.PluginListResponse>("plugin/list", {
cwds: [],
} satisfies v2.PluginListParams);
candidates = findComputerUseMarketplaces(listed, params.config.pluginName);
if (candidates.length > 0) {
break;
}
if (Date.now() >= waitUntil) {
break;
}
await delay(
Math.min(CURATED_MARKETPLACE_POLL_INTERVAL_MS, waitUntil - Date.now()),
params.signal,
);
}
if (preferredMarketplaceName) {
const preferred = candidates.find((candidate) => candidate.name === preferredMarketplaceName);
if (preferred) {
return { marketplace: preferred };
}
return {
message: `Configured Codex marketplace ${preferredMarketplaceName} was not found or does not contain ${params.config.pluginName}. Run /codex computer-use install with a source or path to install from a new marketplace.`,
};
}
if (candidates.length > 1) {
const preferred = chooseKnownComputerUseMarketplace(candidates);
if (preferred) {
return { marketplace: preferred };
}
return {
message: `Multiple Codex marketplaces contain ${params.config.pluginName}. Configure computerUse.marketplaceName or computerUse.marketplacePath to choose one.`,
};
}
if (params.config.marketplaceSource && !params.allowAdd && candidates.length === 0) {
return {
message:
"Computer Use marketplace source is configured but has not been registered. Run /codex computer-use install to register it.",
};
}
const marketplace = candidates[0];
return marketplace ? { marketplace } : {};
}
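The order in which `resolveMarketplaceRef` chooses among listed candidates can be condensed into a pure function. This is a minimal sketch rather than the PR's code: `pickMarketplace`, `MarketplaceCandidate`, and `MarketplacePick` are hypothetical names introduced for illustration, the messages are shortened, and the `marketplace/add`, `marketplacePath`, and polling steps are left out.

```typescript
// Hypothetical, simplified stand-ins for MarketplaceRef and MarketplaceResolution.
type MarketplaceCandidate = { name: string; path?: string };
type MarketplacePick = { marketplace?: MarketplaceCandidate; message?: string };

function pickMarketplace(
  candidates: MarketplaceCandidate[],
  preferredName: string | undefined,
  priority: readonly string[],
): MarketplacePick {
  // 1. An explicitly configured name must match a candidate, or resolution fails.
  if (preferredName) {
    const preferred = candidates.find((candidate) => candidate.name === preferredName);
    return preferred
      ? { marketplace: preferred }
      : { message: `Marketplace ${preferredName} was not found.` };
  }
  // 2. With several candidates, fall back to a list of known marketplace names.
  if (candidates.length > 1) {
    for (const name of priority) {
      const known = candidates.find((candidate) => candidate.name === name);
      if (known) {
        return { marketplace: known };
      }
    }
    return { message: "Multiple marketplaces match; configure one explicitly." };
  }
  // 3. Zero or one candidate: return it if present, otherwise an empty result.
  const only = candidates[0];
  return only ? { marketplace: only } : {};
}
```

An empty result (no marketplace, no message) is what lets the caller substitute its own "no marketplace registered" text.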
function blockUnsafeAutoInstallStatus(
config: ResolvedCodexComputerUseConfig,
): CodexComputerUseStatus | undefined {
if (!config.marketplaceSource && !config.marketplacePath) {
return undefined;
}
return unavailableStatus(
config,
"Computer Use auto-install only uses marketplaces Codex app-server has already discovered. Run /codex computer-use install to install from a configured marketplace source or path.",
);
}
function findComputerUseMarketplaces(
listed: v2.PluginListResponse,
pluginName: string,
): MarketplaceRef[] {
return listed.marketplaces
.filter((marketplace) =>
marketplace.plugins.some(
(plugin) =>
plugin.name === pluginName ||
plugin.id === pluginName ||
plugin.id === `${pluginName}@${marketplace.name}`,
),
)
.map((marketplace) => {
if (marketplace.path) {
return { name: marketplace.name, path: marketplace.path };
}
return { name: marketplace.name, remoteMarketplaceName: marketplace.name };
});
}
function chooseKnownComputerUseMarketplace(
candidates: MarketplaceRef[],
): MarketplaceRef | undefined {
for (const marketplaceName of COMPUTER_USE_MARKETPLACE_NAME_PRIORITY) {
const candidate = candidates.find((marketplace) => marketplace.name === marketplaceName);
if (candidate) {
return candidate;
}
}
return undefined;
}
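// Returns a polling deadline only when installation is allowed (allowAdd) and no
// marketplace source, path, or name is configured; otherwise returns 0 so that
// resolveMarketplaceRef lists plugins once and does not wait.
function marketplaceDiscoveryWaitUntil(params: {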
config: ResolvedCodexComputerUseConfig;
allowAdd: boolean;
}): number {
if (
params.allowAdd &&
!params.config.marketplaceSource &&
!params.config.marketplacePath &&
!params.config.marketplaceName
) {
return Date.now() + params.config.marketplaceDiscoveryTimeoutMs;
}
return 0;
}
async function delay(ms: number, signal?: AbortSignal): Promise<void> {
if (signal?.aborted) {
throw abortError(signal);
}
await new Promise<void>((resolve, reject) => {
let timer: ReturnType<typeof setTimeout>;
const onAbort = () => {
clearTimeout(timer);
signal?.removeEventListener("abort", onAbort);
reject(abortError(signal));
};
timer = setTimeout(() => {
signal?.removeEventListener("abort", onAbort);
resolve();
}, ms);
signal?.addEventListener("abort", onAbort, { once: true });
});
}
function abortError(signal?: AbortSignal): Error {
const reason = signal?.reason;
return reason instanceof Error ? reason : new Error("Computer Use setup was aborted.");
}
async function readComputerUsePlugin(
request: CodexComputerUseRequest,
marketplace: MarketplaceRef,
pluginName: string,
): Promise<v2.PluginDetail> {
const response = await request<v2.PluginReadResponse>(
"plugin/read",
pluginRequestParams(marketplace, pluginName) satisfies v2.PluginReadParams,
);
return response.plugin;
}
async function readMcpServerStatus(
request: CodexComputerUseRequest,
serverName: string,
): Promise<v2.McpServerStatus | undefined> {
let cursor: string | null | undefined;
do {
const response = await request<v2.ListMcpServerStatusResponse>("mcpServerStatus/list", {
cursor,
limit: 100,
detail: "toolsAndAuthOnly",
} satisfies v2.ListMcpServerStatusParams);
const found = response.data.find((server) => server.name === serverName);
if (found) {
return found;
}
cursor = response.nextCursor;
} while (cursor);
return undefined;
}
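The cursor loop in `readMcpServerStatus` follows a common pagination pattern: fetch a page, scan it, and continue only while the response carries a cursor. A self-contained sketch of that pattern, assuming an in-memory page source; `findInPages`, `Page`, and `fakePagedSource` are hypothetical names, not part of the app-server protocol, and the sketch is synchronous for brevity where the PR's version awaits each request.

```typescript
type Page<T> = { data: T[]; nextCursor?: string | null };

function findInPages<T>(
  fetchPage: (cursor: string | undefined) => Page<T>,
  matches: (item: T) => boolean,
): T | undefined {
  let cursor: string | undefined;
  do {
    const page = fetchPage(cursor);
    const found = page.data.find(matches);
    if (found !== undefined) {
      return found; // stop early; later pages are never requested
    }
    cursor = page.nextCursor ?? undefined; // null or a missing cursor ends the loop
  } while (cursor);
  return undefined;
}

// Hypothetical two-page source; records which cursors were requested.
const requestedCursors: string[] = [];
function fakePagedSource(cursor: string | undefined): Page<string> {
  requestedCursors.push(cursor ?? "start");
  return cursor === "p2"
    ? { data: ["computer-use"], nextCursor: null }
    : { data: ["alpha", "beta"], nextCursor: "p2" };
}
```

Because the loop is `do ... while`, the first page is always requested, even though the cursor starts out undefined.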
async function reloadMcpServers(request: CodexComputerUseRequest): Promise<void> {
await request("config/mcpServer/reload", undefined);
}
function pluginRequestParams(marketplace: MarketplaceRef, pluginName: string) {
return {
...(marketplace.path ? { marketplacePath: marketplace.path } : {}),
...(!marketplace.path && marketplace.remoteMarketplaceName
? { remoteMarketplaceName: marketplace.remoteMarketplaceName }
: {}),
pluginName,
};
}
function statusFromPlugin(params: {
config: ResolvedCodexComputerUseConfig;
plugin: v2.PluginDetail;
tools: string[];
message: string;
}): CodexComputerUseStatus {
return {
enabled: true,
ready:
params.plugin.summary.installed && params.plugin.summary.enabled && params.tools.length > 0,
installed: params.plugin.summary.installed,
pluginEnabled: params.plugin.summary.enabled,
mcpServerAvailable: params.tools.length > 0,
pluginName: params.config.pluginName,
mcpServerName: params.config.mcpServerName,
marketplaceName: params.plugin.marketplaceName,
...(params.plugin.marketplacePath ? { marketplacePath: params.plugin.marketplacePath } : {}),
tools: params.tools,
message: params.message,
};
}
function disabledStatus(config: ResolvedCodexComputerUseConfig): CodexComputerUseStatus {
return {
enabled: false,
ready: false,
installed: false,
pluginEnabled: false,
mcpServerAvailable: false,
pluginName: config.pluginName,
mcpServerName: config.mcpServerName,
tools: [],
message: "Computer Use is disabled.",
};
}
function unavailableStatus(
config: ResolvedCodexComputerUseConfig,
message: string,
): CodexComputerUseStatus {
return {
enabled: true,
ready: false,
installed: false,
pluginEnabled: false,
mcpServerAvailable: false,
pluginName: config.pluginName,
mcpServerName: config.mcpServerName,
...(config.marketplaceName ? { marketplaceName: config.marketplaceName } : {}),
...(config.marketplacePath ? { marketplacePath: config.marketplacePath } : {}),
tools: [],
message,
};
}
function createComputerUseRequest(params: {
pluginConfig?: unknown;
request?: CodexComputerUseRequest;
client?: CodexAppServerClient;
timeoutMs?: number;
signal?: AbortSignal;
}): CodexComputerUseRequest {
if (params.request) {
return params.request;
}
if (params.client) {
return async <T = JsonValue | undefined>(method: string, requestParams?: unknown) =>
await params.client!.request<T>(method, requestParams, {
timeoutMs: params.timeoutMs,
signal: params.signal,
});
}
const runtime = resolveCodexAppServerRuntimeOptions({ pluginConfig: params.pluginConfig });
return async <T = JsonValue | undefined>(method: string, requestParams?: unknown) =>
await requestCodexAppServerJson<T>({
method,
requestParams,
timeoutMs: params.timeoutMs ?? runtime.requestTimeoutMs,
startOptions: runtime.start,
});
}
function resolveComputerUseConfig(
params: Pick<CodexComputerUseSetupParams, "pluginConfig" | "overrides" | "forceEnable">,
): ResolvedCodexComputerUseConfig {
const overrides = params.forceEnable ? { ...params.overrides, enabled: true } : params.overrides;
return resolveCodexComputerUseConfig({
pluginConfig: params.pluginConfig,
overrides,
});
}

@@ -2,9 +2,11 @@ import fs from "node:fs/promises";
import { describe, expect, it } from "vitest";
import {
CODEX_APP_SERVER_CONFIG_KEYS,
CODEX_COMPUTER_USE_CONFIG_KEYS,
codexAppServerStartOptionsKey,
readCodexPluginConfig,
resolveCodexAppServerRuntimeOptions,
resolveCodexComputerUseConfig,
} from "./config.js";
describe("Codex app-server config", () => {
@@ -130,6 +132,48 @@ describe("Codex app-server config", () => {
);
});
it("resolves Computer Use setup from plugin config and environment fallbacks", () => {
expect(
resolveCodexComputerUseConfig({
pluginConfig: {
computerUse: {
autoInstall: true,
marketplaceName: "desktop-tools",
},
},
env: {
OPENCLAW_CODEX_COMPUTER_USE_PLUGIN_NAME: "env-fallback-plugin",
},
}),
).toEqual({
enabled: true,
autoInstall: true,
marketplaceDiscoveryTimeoutMs: 60_000,
pluginName: "env-fallback-plugin",
mcpServerName: "computer-use",
marketplaceName: "desktop-tools",
});
expect(
resolveCodexComputerUseConfig({
pluginConfig: {},
env: {
OPENCLAW_CODEX_COMPUTER_USE: "1",
OPENCLAW_CODEX_COMPUTER_USE_MARKETPLACE_SOURCE: "github:example/plugins",
OPENCLAW_CODEX_COMPUTER_USE_AUTO_INSTALL: "true",
OPENCLAW_CODEX_COMPUTER_USE_MARKETPLACE_DISCOVERY_TIMEOUT_MS: "30000",
},
}),
).toEqual(
expect.objectContaining({
enabled: true,
autoInstall: true,
marketplaceDiscoveryTimeoutMs: 30_000,
marketplaceSource: "github:example/plugins",
}),
);
});
it("allows plugin config to opt in to guardian-reviewed local execution", () => {
const runtime = resolveCodexAppServerRuntimeOptions({
pluginConfig: {
@@ -246,6 +290,7 @@ describe("Codex app-server config", () => {
configSchema: {
properties: {
appServer: { properties: Record<string, unknown> };
computerUse: { properties: Record<string, unknown> };
};
};
uiHints: Record<string, unknown>;
@@ -258,6 +303,13 @@ describe("Codex app-server config", () => {
for (const key of CODEX_APP_SERVER_CONFIG_KEYS) {
expect(manifest.uiHints[`appServer.${key}`]).toBeTruthy();
}
const computerUseManifestKeys = Object.keys(
manifest.configSchema.properties.computerUse.properties,
).toSorted();
expect(computerUseManifestKeys).toEqual([...CODEX_COMPUTER_USE_CONFIG_KEYS].toSorted());
for (const key of CODEX_COMPUTER_USE_CONFIG_KEYS) {
expect(manifest.uiHints[`computerUse.${key}`]).toBeTruthy();
}
});
it("does not schema-default mode-derived policy fields", async () => {

@@ -9,6 +9,28 @@ export type CodexAppServerSandboxMode = "read-only" | "workspace-write" | "dange
export type CodexAppServerApprovalsReviewer = "user" | "auto_review" | "guardian_subagent";
export type CodexAppServerCommandSource = "managed" | "resolved-managed" | "config" | "env";
export type CodexComputerUseConfig = {
enabled?: boolean;
autoInstall?: boolean;
marketplaceDiscoveryTimeoutMs?: number;
marketplaceSource?: string;
marketplacePath?: string;
marketplaceName?: string;
pluginName?: string;
mcpServerName?: string;
};
export type ResolvedCodexComputerUseConfig = {
enabled: boolean;
autoInstall: boolean;
marketplaceDiscoveryTimeoutMs: number;
pluginName: string;
mcpServerName: string;
marketplaceSource?: string;
marketplacePath?: string;
marketplaceName?: string;
};
export type CodexAppServerStartOptions = {
transport: CodexAppServerTransportMode;
command: string;
@@ -35,6 +57,7 @@ export type CodexPluginConfig = {
enabled?: boolean;
timeoutMs?: number;
};
computerUse?: CodexComputerUseConfig;
appServer?: {
mode?: CodexAppServerPolicyMode;
transport?: CodexAppServerTransportMode;
@@ -68,6 +91,21 @@ export const CODEX_APP_SERVER_CONFIG_KEYS = [
"defaultWorkspaceDir",
] as const;
export const CODEX_COMPUTER_USE_CONFIG_KEYS = [
"enabled",
"autoInstall",
"marketplaceDiscoveryTimeoutMs",
"marketplaceSource",
"marketplacePath",
"marketplaceName",
"pluginName",
"mcpServerName",
] as const;
export const DEFAULT_CODEX_COMPUTER_USE_PLUGIN_NAME = "computer-use";
export const DEFAULT_CODEX_COMPUTER_USE_MCP_SERVER_NAME = "computer-use";
export const DEFAULT_CODEX_COMPUTER_USE_MARKETPLACE_DISCOVERY_TIMEOUT_MS = 60_000;
const codexAppServerTransportSchema = z.enum(["stdio", "websocket"]);
const codexAppServerPolicyModeSchema = z.enum(["yolo", "guardian"]);
const codexAppServerApprovalPolicySchema = z.enum([
@@ -92,6 +130,19 @@ const codexPluginConfigSchema = z
})
.strict()
.optional(),
computerUse: z
.object({
enabled: z.boolean().optional(),
autoInstall: z.boolean().optional(),
marketplaceDiscoveryTimeoutMs: z.number().positive().optional(),
marketplaceSource: z.string().optional(),
marketplacePath: z.string().optional(),
marketplaceName: z.string().optional(),
pluginName: z.string().optional(),
mcpServerName: z.string().optional(),
})
.strict()
.optional(),
appServer: z
.object({
mode: codexAppServerPolicyModeSchema.optional(),
@@ -176,6 +227,64 @@ export function resolveCodexAppServerRuntimeOptions(
};
}
export function resolveCodexComputerUseConfig(
params: {
pluginConfig?: unknown;
env?: NodeJS.ProcessEnv;
overrides?: Partial<CodexComputerUseConfig>;
} = {},
): ResolvedCodexComputerUseConfig {
const env = params.env ?? process.env;
const config = readCodexPluginConfig(params.pluginConfig).computerUse ?? {};
const marketplaceSource =
readNonEmptyString(params.overrides?.marketplaceSource) ??
readNonEmptyString(config.marketplaceSource) ??
readNonEmptyString(env.OPENCLAW_CODEX_COMPUTER_USE_MARKETPLACE_SOURCE);
const marketplacePath =
readNonEmptyString(params.overrides?.marketplacePath) ??
readNonEmptyString(config.marketplacePath) ??
readNonEmptyString(env.OPENCLAW_CODEX_COMPUTER_USE_MARKETPLACE_PATH);
const marketplaceName =
readNonEmptyString(params.overrides?.marketplaceName) ??
readNonEmptyString(config.marketplaceName) ??
readNonEmptyString(env.OPENCLAW_CODEX_COMPUTER_USE_MARKETPLACE_NAME);
const autoInstall =
params.overrides?.autoInstall ??
config.autoInstall ??
readBooleanEnv(env.OPENCLAW_CODEX_COMPUTER_USE_AUTO_INSTALL) ??
false;
const marketplaceDiscoveryTimeoutMs = normalizePositiveNumber(
params.overrides?.marketplaceDiscoveryTimeoutMs ??
config.marketplaceDiscoveryTimeoutMs ??
readNumberEnv(env.OPENCLAW_CODEX_COMPUTER_USE_MARKETPLACE_DISCOVERY_TIMEOUT_MS),
DEFAULT_CODEX_COMPUTER_USE_MARKETPLACE_DISCOVERY_TIMEOUT_MS,
);
const enabled =
params.overrides?.enabled ??
config.enabled ??
readBooleanEnv(env.OPENCLAW_CODEX_COMPUTER_USE) ??
Boolean(autoInstall || marketplaceSource || marketplacePath || marketplaceName);
return {
enabled,
autoInstall,
marketplaceDiscoveryTimeoutMs,
pluginName:
readNonEmptyString(params.overrides?.pluginName) ??
readNonEmptyString(config.pluginName) ??
readNonEmptyString(env.OPENCLAW_CODEX_COMPUTER_USE_PLUGIN_NAME) ??
DEFAULT_CODEX_COMPUTER_USE_PLUGIN_NAME,
mcpServerName:
readNonEmptyString(params.overrides?.mcpServerName) ??
readNonEmptyString(config.mcpServerName) ??
readNonEmptyString(env.OPENCLAW_CODEX_COMPUTER_USE_MCP_SERVER_NAME) ??
DEFAULT_CODEX_COMPUTER_USE_MCP_SERVER_NAME,
...(marketplaceSource ? { marketplaceSource } : {}),
...(marketplacePath ? { marketplacePath } : {}),
...(marketplaceName ? { marketplaceName } : {}),
};
}
export function codexAppServerStartOptionsKey(
options: CodexAppServerStartOptions,
params: { authProfileId?: string } = {},
@@ -264,6 +373,28 @@ function normalizeHeaders(value: unknown): Record<string, string> {
);
}
function readBooleanEnv(value: string | undefined): boolean | undefined {
if (value === undefined) {
return undefined;
}
const normalized = value.trim().toLowerCase();
if (["1", "true", "yes", "on"].includes(normalized)) {
return true;
}
if (["0", "false", "no", "off"].includes(normalized)) {
return false;
}
return undefined;
}
function readNumberEnv(value: string | undefined): number | undefined {
if (value === undefined) {
return undefined;
}
const parsed = Number(value);
return Number.isFinite(parsed) ? parsed : undefined;
}
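`resolveCodexComputerUseConfig` resolves each field through the same chain: command overrides first, then plugin config, then an environment variable, then a default. A minimal sketch of that chain for the `autoInstall` flag; `firstDefined` and `resolveAutoInstall` are hypothetical helpers, and `parseBooleanFlag` is a copy of `readBooleanEnv` under a different name so the block stands alone.

```typescript
// Same logic as readBooleanEnv: recognized spellings map to a boolean,
// anything else counts as "unset" so the next source in the chain is used.
function parseBooleanFlag(value: string | undefined): boolean | undefined {
  if (value === undefined) {
    return undefined;
  }
  const normalized = value.trim().toLowerCase();
  if (["1", "true", "yes", "on"].includes(normalized)) {
    return true;
  }
  if (["0", "false", "no", "off"].includes(normalized)) {
    return false;
  }
  return undefined;
}

// Hypothetical helper: returns the first source that is not undefined.
function firstDefined<T>(...sources: Array<T | undefined>): T | undefined {
  return sources.find((source) => source !== undefined);
}

function resolveAutoInstall(
  override: boolean | undefined,
  configured: boolean | undefined,
  envValue: string | undefined,
): boolean {
  return firstDefined(override, configured, parseBooleanFlag(envValue)) ?? false;
}
```

Only `undefined` is skipped, so an explicit `false` in plugin config still wins over `"true"` in the environment.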
function resolveArgs(configArgs: unknown, envArgs: string | undefined): string[] {
if (Array.isArray(configArgs)) {
return configArgs

@@ -167,7 +167,7 @@ describe("CodexAppServerEventProjector", () => {
outputTokens: 100_000,
},
last: {
-totalTokens: 14,
+totalTokens: 12,
inputTokens: 5,
cachedInputTokens: 2,
outputTokens: 7,
@@ -186,12 +186,12 @@ describe("CodexAppServerEventProjector", () => {
expect(result.assistantTexts).toEqual(["hello"]);
expect(result.messagesSnapshot.map((message) => message.role)).toEqual(["user", "assistant"]);
expect(result.lastAssistant?.content).toEqual([{ type: "text", text: "hello" }]);
-expect(result.attemptUsage).toMatchObject({ input: 5, output: 7, cacheRead: 2, total: 14 });
+expect(result.attemptUsage).toMatchObject({ input: 3, output: 7, cacheRead: 2, total: 12 });
expect(result.lastAssistant?.usage).toMatchObject({
-input: 5,
+input: 3,
output: 7,
cacheRead: 2,
-totalTokens: 14,
+totalTokens: 12,
});
expect(result.replayMetadata.replaySafe).toBe(true);
});
@@ -289,7 +289,7 @@ describe("CodexAppServerEventProjector", () => {
tokenUsage: {
total: { total_tokens: 1_000_000 },
last_token_usage: {
-total_tokens: 20,
+total_tokens: 17,
input_tokens: 8,
cached_input_tokens: 3,
output_tokens: 9,
@@ -300,12 +300,12 @@ describe("CodexAppServerEventProjector", () => {
const result = projector.buildResult(buildEmptyToolTelemetry());
-expect(result.attemptUsage).toMatchObject({ input: 8, output: 9, cacheRead: 3, total: 20 });
+expect(result.attemptUsage).toMatchObject({ input: 5, output: 9, cacheRead: 3, total: 17 });
expect(result.lastAssistant?.usage).toMatchObject({
-input: 8,
+input: 5,
output: 9,
cacheRead: 3,
-totalTokens: 20,
+totalTokens: 17,
});
});
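The updated expectations in these tests follow from treating the reported input count as a prompt total that already includes cached tokens. A worked sketch of that arithmetic, using the numbers from the two fixtures; `splitPromptTokens` and `UsageSplit` are hypothetical names, and computing the total as uncached input plus cached input plus output is an assumption inferred from the fixture totals (12 and 17), not taken from `normalizeUsage`.

```typescript
type UsageSplit = { input: number; cacheRead: number; output: number; total: number };

function splitPromptTokens(promptTotal: number, cached: number, output: number): UsageSplit {
  // Uncached input is the prompt total minus the cached share, clamped at zero.
  const input = Math.max(0, promptTotal - cached);
  // Assumption: the total counts each token once (uncached + cached + output).
  return { input, cacheRead: cached, output, total: input + cached + output };
}

// First fixture: 5 prompt tokens, 2 of them cached, 7 output tokens.
const firstFixtureUsage = splitPromptTokens(5, 2, 7);
// Second fixture: 8 prompt tokens, 3 of them cached, 9 output tokens.
const secondFixtureUsage = splitPromptTokens(8, 3, 9);
```

Under the previous expectations the cached tokens were counted twice, once inside `input` and once as `cacheRead`, which is where the old totals of 14 and 20 came from.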

@@ -61,6 +61,13 @@ const CURRENT_TOKEN_USAGE_KEYS = [
"last_token_usage",
] as const;
const CODEX_PROMPT_TOTAL_INPUT_KEYS = [
"inputTokens",
"input_tokens",
"promptTokens",
"prompt_tokens",
] as const;
const MAX_TOOL_OUTPUT_DELTA_MESSAGES_PER_ITEM = 20;
export class CodexAppServerEventProjector {
@@ -910,17 +917,24 @@ function readNumberAlias(record: JsonObject, keys: readonly string[]): number |
}
function normalizeCodexTokenUsage(record: JsonObject): ReturnType<typeof normalizeUsage> {
const promptTotalInput = readNumberAlias(record, CODEX_PROMPT_TOTAL_INPUT_KEYS);
const cacheRead = readNumberAlias(record, [
"cachedInputTokens",
"cached_input_tokens",
"cacheRead",
"cache_read",
"cache_read_input_tokens",
"cached_tokens",
]);
const input =
promptTotalInput !== undefined && cacheRead !== undefined
? Math.max(0, promptTotalInput - cacheRead)
: (promptTotalInput ?? readNumber(record, "input"));
return normalizeUsage({
-input: readNumberAlias(record, ["inputTokens", "input_tokens", "input", "promptTokens"]),
+input,
output: readNumberAlias(record, ["outputTokens", "output_tokens", "output"]),
-cacheRead: readNumberAlias(record, [
-"cachedInputTokens",
-"cached_input_tokens",
-"cacheRead",
-"cache_read",
-"cache_read_input_tokens",
-"cached_tokens",
-]),
+cacheRead,
cacheWrite: readNumberAlias(record, [
"cacheWrite",
"cache_write",

@@ -41,6 +41,7 @@ import {
defaultCodexAppServerClientFactory,
} from "./client-factory.js";
import { isCodexAppServerApprovalRequest, type CodexAppServerClient } from "./client.js";
import { ensureCodexComputerUse } from "./computer-use.js";
import { resolveCodexAppServerRuntimeOptions } from "./config.js";
import { projectContextEngineAssemblyForCodex } from "./context-engine-projection.js";
import { createCodexDynamicToolBridge } from "./dynamic-tools.js";
@@ -311,6 +312,12 @@ export async function runCodexAppServerAttempt(
signal: runAbortController.signal,
operation: async () => {
const startupClient = await clientFactory(appServer.start, startupAuthProfileId);
await ensureCodexComputerUse({
client: startupClient,
pluginConfig: options.pluginConfig,
timeoutMs: appServer.requestTimeoutMs,
signal: runAbortController.signal,
});
const startupThread = await startOrResumeThread({
client: startupClient,
params,

@@ -1,3 +1,4 @@
import type { CodexComputerUseStatus } from "./app-server/computer-use.js";
import type { CodexAppServerModelListResult } from "./app-server/models.js";
import { isJsonObject, type JsonObject, type JsonValue } from "./app-server/protocol.js";
import type { SafeValue } from "./command-rpc.js";
@@ -89,6 +90,28 @@ export function formatAccount(
].join("\n");
}
export function formatComputerUseStatus(status: CodexComputerUseStatus): string {
const lines = [
`Computer Use: ${status.ready ? "ready" : status.enabled ? "not ready" : "disabled"}`,
];
lines.push(
`Plugin: ${status.pluginName}${status.installed ? " (installed)" : " (not installed)"}`,
);
lines.push(
`MCP server: ${status.mcpServerName}${
status.mcpServerAvailable ? ` (${status.tools.length} tools)` : " (unavailable)"
}`,
);
if (status.marketplaceName) {
lines.push(`Marketplace: ${status.marketplaceName}`);
}
if (status.tools.length > 0) {
lines.push(`Tools: ${status.tools.slice(0, 8).join(", ")}`);
}
lines.push(status.message);
return lines.join("\n");
}
export function formatList(response: JsonValue | undefined, label: string): string {
const entries = extractArray(response);
if (entries.length === 0) {
@@ -120,6 +143,7 @@ export function buildHelp(): string {
"- /codex detach",
"- /codex compact",
"- /codex review",
"- /codex computer-use [status|install]",
"- /codex account",
"- /codex mcp",
"- /codex skills",

@@ -1,5 +1,11 @@
import type { PluginCommandContext, PluginCommandResult } from "openclaw/plugin-sdk/plugin-entry";
import { CODEX_CONTROL_METHODS, type CodexControlMethod } from "./app-server/capabilities.js";
import {
installCodexComputerUse,
readCodexComputerUseStatus,
type CodexComputerUseSetupParams,
} from "./app-server/computer-use.js";
import type { CodexComputerUseConfig } from "./app-server/config.js";
import { listAllCodexAppServerModels } from "./app-server/models.js";
import { isJsonObject, type JsonValue } from "./app-server/protocol.js";
import {
@@ -10,6 +16,7 @@ import {
import {
buildHelp,
formatAccount,
formatComputerUseStatus,
formatCodexStatus,
formatList,
formatModels,
@@ -49,6 +56,8 @@ export type CodexCommandDeps = {
safeCodexControlRequest: SafeCodexControlRequestFn;
writeCodexAppServerBinding: typeof writeCodexAppServerBinding;
clearCodexAppServerBinding: typeof clearCodexAppServerBinding;
readCodexComputerUseStatus: typeof readCodexComputerUseStatus;
installCodexComputerUse: typeof installCodexComputerUse;
resolveCodexDefaultWorkspaceDir: typeof resolveCodexDefaultWorkspaceDir;
startCodexConversationThread: typeof startCodexConversationThread;
readCodexConversationActiveTurn: typeof readCodexConversationActiveTurn;
@@ -80,6 +89,8 @@ const defaultCodexCommandDeps: CodexCommandDeps = {
safeCodexControlRequest,
writeCodexAppServerBinding,
clearCodexAppServerBinding,
readCodexComputerUseStatus,
installCodexComputerUse,
resolveCodexDefaultWorkspaceDir,
startCodexConversationThread,
readCodexConversationActiveTurn,
@@ -98,6 +109,13 @@ type ParsedBindArgs = {
help?: boolean;
};
type ParsedComputerUseArgs = {
action: "status" | "install";
overrides: Partial<CodexComputerUseConfig>;
hasOverrides: boolean;
help?: boolean;
};
export async function handleCodexSubcommand(
ctx: PluginCommandContext,
options: { pluginConfig?: unknown; deps?: Partial<CodexCommandDeps> },
@@ -170,6 +188,11 @@ export async function handleCodexSubcommand(
),
};
}
if (normalized === "computer-use" || normalized === "computeruse") {
return {
text: await handleComputerUseCommand(deps, options.pluginConfig, rest),
};
}
if (normalized === "mcp") {
return {
text: formatList(
@@ -204,6 +227,29 @@ export async function handleCodexSubcommand(
return { text: `Unknown Codex command: ${subcommand}\n\n${buildHelp()}` };
}
async function handleComputerUseCommand(
deps: CodexCommandDeps,
pluginConfig: unknown,
args: string[],
): Promise<string> {
const parsed = parseComputerUseArgs(args);
if (parsed.help) {
return [
"Usage: /codex computer-use [status|install] [--source <marketplace-source>] [--marketplace-path <path>] [--marketplace <name>] [--plugin <name>] [--mcp-server <name>]",
"Checks or installs the configured Codex Computer Use plugin through app-server.",
].join("\n");
}
const params: CodexComputerUseSetupParams = {
pluginConfig,
forceEnable: parsed.action === "install" || parsed.hasOverrides,
...(Object.keys(parsed.overrides).length > 0 ? { overrides: parsed.overrides } : {}),
};
if (parsed.action === "install") {
return formatComputerUseStatus(await deps.installCodexComputerUse(params));
}
return formatComputerUseStatus(await deps.readCodexComputerUseStatus(params));
}
async function bindConversation(
deps: CodexCommandDeps,
ctx: PluginCommandContext,
@@ -504,6 +550,114 @@ function parseBindArgs(args: string[]): ParsedBindArgs {
return parsed;
}
function parseComputerUseArgs(args: string[]): ParsedComputerUseArgs {
const parsed: ParsedComputerUseArgs = {
action: "status",
overrides: {},
hasOverrides: false,
};
for (let index = 0; index < args.length; index += 1) {
const arg = args[index];
if (arg === "--help" || arg === "-h") {
parsed.help = true;
continue;
}
if (arg === "status" || arg === "install") {
parsed.action = arg;
continue;
}
if (arg === "--source" || arg === "--marketplace-source") {
const value = readRequiredOptionValue(args, index);
if (!value) {
parsed.help = true;
continue;
}
parsed.overrides.marketplaceSource = value;
index += 1;
continue;
}
if (arg === "--marketplace-path" || arg === "--path") {
const value = readRequiredOptionValue(args, index);
if (!value) {
parsed.help = true;
continue;
}
parsed.overrides.marketplacePath = value;
index += 1;
continue;
}
if (arg === "--marketplace") {
const value = readRequiredOptionValue(args, index);
if (!value) {
parsed.help = true;
continue;
}
parsed.overrides.marketplaceName = value;
index += 1;
continue;
}
if (arg === "--plugin") {
const value = readRequiredOptionValue(args, index);
if (!value) {
parsed.help = true;
continue;
}
parsed.overrides.pluginName = value;
index += 1;
continue;
}
if (arg === "--server" || arg === "--mcp-server") {
const value = readRequiredOptionValue(args, index);
if (!value) {
parsed.help = true;
continue;
}
parsed.overrides.mcpServerName = value;
index += 1;
continue;
}
parsed.help = true;
}
parsed.overrides = normalizeComputerUseStringOverrides(parsed.overrides);
parsed.hasOverrides = Object.values(parsed.overrides).some(Boolean);
return parsed;
}
function readRequiredOptionValue(args: string[], index: number): string | undefined {
const value = args[index + 1];
if (!value || value.startsWith("-")) {
return undefined;
}
return value;
}
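`parseComputerUseArgs` handles every value-taking flag the same way: look at the next token, reject it if it is missing or looks like another flag, and otherwise consume it. A condensed sketch of that rule driven by a lookup table; `FLAG_TO_FIELD`, `parseFlags`, `FlagOverrides`, and `ParsedFlags` are hypothetical names, and only two of the override fields are covered.

```typescript
type FlagOverrides = { marketplaceSource?: string; marketplaceName?: string };
type ParsedFlags = { action: "status" | "install"; overrides: FlagOverrides; help: boolean };

// Hypothetical table mapping flag spellings to override fields.
const FLAG_TO_FIELD = new Map<string, keyof FlagOverrides>([
  ["--source", "marketplaceSource"],
  ["--marketplace-source", "marketplaceSource"],
  ["--marketplace", "marketplaceName"],
]);

function parseFlags(args: string[]): ParsedFlags {
  const parsed: ParsedFlags = { action: "status", overrides: {}, help: false };
  for (let index = 0; index < args.length; index += 1) {
    const arg = args[index] ?? "";
    if (arg === "status" || arg === "install") {
      parsed.action = arg;
      continue;
    }
    const field = FLAG_TO_FIELD.get(arg);
    const value = args[index + 1];
    if (!field || !value || value.startsWith("-")) {
      parsed.help = true; // unknown flag or missing value: fall back to usage text
      continue;
    }
    parsed.overrides[field] = value;
    index += 1; // skip the value that was just consumed
  }
  return parsed;
}
```

Setting `help` instead of throwing keeps the command handler simple: it checks one flag and prints usage before any install request is made.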
function normalizeComputerUseStringOverrides(
overrides: Partial<CodexComputerUseConfig>,
): Partial<CodexComputerUseConfig> {
const normalized: Partial<CodexComputerUseConfig> = {};
const marketplaceSource = normalizeOptionalString(overrides.marketplaceSource);
if (marketplaceSource) {
normalized.marketplaceSource = marketplaceSource;
}
const marketplacePath = normalizeOptionalString(overrides.marketplacePath);
if (marketplacePath) {
normalized.marketplacePath = marketplacePath;
}
const marketplaceName = normalizeOptionalString(overrides.marketplaceName);
if (marketplaceName) {
normalized.marketplaceName = marketplaceName;
}
const pluginName = normalizeOptionalString(overrides.pluginName);
if (pluginName) {
normalized.pluginName = pluginName;
}
const mcpServerName = normalizeOptionalString(overrides.mcpServerName);
if (mcpServerName) {
normalized.mcpServerName = mcpServerName;
}
return normalized;
}
function normalizeOptionalString(value: string | undefined): string | undefined {
const trimmed = value?.trim();
return trimmed || undefined;

@@ -4,6 +4,7 @@ import path from "node:path";
import type { PluginCommandContext } from "openclaw/plugin-sdk/plugin-entry";
import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";
import { CODEX_CONTROL_METHODS } from "./app-server/capabilities.js";
import type { CodexComputerUseStatus } from "./app-server/computer-use.js";
import type { CodexAppServerStartOptions } from "./app-server/config.js";
import { resetSharedCodexAppServerClientForTests } from "./app-server/shared-client.js";
import type { CodexCommandDeps } from "./command-handlers.js";
@@ -241,6 +242,67 @@ describe("codex command", () => {
});
});
it("checks Codex Computer Use setup", async () => {
const readCodexComputerUseStatus = vi.fn(async () => computerUseReadyStatus());
await expect(
handleCodexCommand(createContext("computer-use status"), {
deps: createDeps({ readCodexComputerUseStatus }),
}),
).resolves.toEqual({
text: [
"Computer Use: ready",
"Plugin: computer-use (installed)",
"MCP server: computer-use (1 tools)",
"Marketplace: desktop-tools",
"Tools: list_apps",
"Computer Use is ready.",
].join("\n"),
});
expect(readCodexComputerUseStatus).toHaveBeenCalledWith({
pluginConfig: undefined,
forceEnable: false,
});
});
it("installs Codex Computer Use from command overrides", async () => {
const installCodexComputerUse = vi.fn(async () => computerUseReadyStatus());
await expect(
handleCodexCommand(
createContext(
"computer-use install --source github:example/desktop-tools --marketplace desktop-tools",
),
{
deps: createDeps({ installCodexComputerUse }),
},
),
).resolves.toEqual({
text: expect.stringContaining("Computer Use: ready"),
});
expect(installCodexComputerUse).toHaveBeenCalledWith({
pluginConfig: undefined,
forceEnable: true,
overrides: {
marketplaceSource: "github:example/desktop-tools",
marketplaceName: "desktop-tools",
},
});
});
it("shows help when Computer Use option values are missing", async () => {
const installCodexComputerUse = vi.fn(async () => computerUseReadyStatus());
await expect(
handleCodexCommand(createContext("computer-use install --source"), {
deps: createDeps({ installCodexComputerUse }),
}),
).resolves.toEqual({
text: expect.stringContaining("Usage: /codex computer-use"),
});
expect(installCodexComputerUse).not.toHaveBeenCalled();
});
it("explains compaction when no Codex thread is attached", async () => {
const sessionFile = path.join(tempDir, "session.jsonl");
@@ -600,3 +662,18 @@ describe("codex command", () => {
});
});
});
function computerUseReadyStatus(): CodexComputerUseStatus {
return {
enabled: true,
ready: true,
installed: true,
pluginEnabled: true,
mcpServerAvailable: true,
pluginName: "computer-use",
mcpServerName: "computer-use",
marketplaceName: "desktop-tools",
tools: ["list_apps"],
message: "Computer Use is ready.",
};
}

@@ -7,14 +7,24 @@ const telemetryState = vi.hoisted(() => {
name: string;
addEvent: ReturnType<typeof vi.fn>;
end: ReturnType<typeof vi.fn>;
setAttributes: ReturnType<typeof vi.fn>;
setStatus: ReturnType<typeof vi.fn>;
spanContext: ReturnType<typeof vi.fn>;
}> = [];
const tracer = {
startSpan: vi.fn((name: string, _opts?: unknown, _ctx?: unknown) => {
const spanNumber = spans.length + 1;
const spanId = spanNumber.toString(16).padStart(16, "0");
const span = {
addEvent: vi.fn(),
end: vi.fn(),
setAttributes: vi.fn(),
setStatus: vi.fn(),
spanContext: vi.fn(() => ({
traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
spanId,
traceFlags: 1,
})),
};
spans.push({ name, ...span });
return span;
@@ -122,6 +132,7 @@ vi.mock("@opentelemetry/semantic-conventions", () => ({
import {
emitTrustedDiagnosticEvent,
onInternalDiagnosticEvent,
resetDiagnosticEventsForTest,
} from "../../../src/infra/diagnostic-events.js";
import type { OpenClawPluginServiceContext } from "../api.js";
import { emitDiagnosticEvent } from "../api.js";
@@ -219,6 +230,7 @@ function flushDiagnosticEvents() {
describe("diagnostics-otel service", () => {
beforeEach(() => {
resetDiagnosticEventsForTest();
delete process.env.OPENCLAW_OTEL_PRELOADED;
delete process.env.OTEL_SEMCONV_STABILITY_OPT_IN;
telemetryState.counters.clear();
@@ -241,6 +253,7 @@ describe("diagnostics-otel service", () => {
});
afterEach(() => {
resetDiagnosticEventsForTest();
if (ORIGINAL_OPENCLAW_OTEL_PRELOADED === undefined) {
delete process.env.OPENCLAW_OTEL_PRELOADED;
} else {
@@ -561,6 +574,7 @@ describe("diagnostics-otel service", () => {
outcome: "completed",
durationMs: 100,
});
await flushDiagnosticEvents();
expect(sdkStart).not.toHaveBeenCalled();
expect(telemetryState.histograms.get("openclaw.run.duration_ms")?.record).toHaveBeenCalledWith(
@@ -1133,6 +1147,9 @@ describe("diagnostics-otel service", () => {
api: "completions",
transport: "http",
durationMs: 80,
requestPayloadBytes: 1234,
responseStreamBytes: 567,
timeToFirstByteMs: 45,
trace: {
traceId: TRACE_ID,
spanId: CHILD_SPAN_ID,
@@ -1295,6 +1312,41 @@ describe("diagnostics-otel service", () => {
"openclaw.model": "gpt-5.4",
}),
);
expect(
telemetryState.histograms.get("openclaw.model_call.request_bytes")?.record,
).toHaveBeenCalledWith(
1234,
expect.objectContaining({
"openclaw.provider": "openai",
"openclaw.model": "gpt-5.4",
}),
);
expect(
telemetryState.histograms.get("openclaw.model_call.response_bytes")?.record,
).toHaveBeenCalledWith(
567,
expect.objectContaining({
"openclaw.provider": "openai",
"openclaw.model": "gpt-5.4",
}),
);
expect(
telemetryState.histograms.get("openclaw.model_call.time_to_first_byte_ms")?.record,
).toHaveBeenCalledWith(
45,
expect.objectContaining({
"openclaw.provider": "openai",
"openclaw.model": "gpt-5.4",
}),
);
const modelCallSpan = telemetryState.spans.find((span) => span.name === "openclaw.model.call");
expect(modelCallSpan?.setAttributes).toHaveBeenCalledWith(
expect.objectContaining({
"openclaw.model_call.request_bytes": 1234,
"openclaw.model_call.response_bytes": 567,
"openclaw.model_call.time_to_first_byte_ms": 45,
}),
);
expect(telemetryState.histograms.get("openclaw.run.duration_ms")?.record).toHaveBeenCalledWith(
100,
expect.not.objectContaining({
@@ -1506,6 +1558,17 @@ describe("diagnostics-otel service", () => {
const ctx = createOtelContext(OTEL_TEST_ENDPOINT, { traces: true, metrics: true });
await service.start(ctx);
emitTrustedDiagnosticEvent({
type: "run.started",
runId: "run-1",
provider: "openai",
model: "gpt-5.4",
trace: {
traceId: TRACE_ID,
spanId: SPAN_ID,
traceFlags: "01",
},
});
emitTrustedDiagnosticEvent({
type: "context.assembled",
runId: "run-1",
@@ -1536,6 +1599,8 @@ describe("diagnostics-otel service", () => {
const contextCall = telemetryState.tracer.startSpan.mock.calls.find(
(call) => call[0] === "openclaw.context.assembled",
);
const runSpan = telemetryState.spans.find((span) => span.name === "openclaw.run");
const runSpanId = runSpan?.spanContext.mock.results[0]?.value?.spanId;
expect(contextCall?.[1]).toMatchObject({
attributes: {
"openclaw.provider": "openai",
@@ -1553,12 +1618,19 @@ describe("diagnostics-otel service", () => {
"openclaw.context.reserve_tokens": 4096,
},
});
expect(contextCall?.[1]).toEqual({
attributes: expect.any(Object),
startTime: expect.any(Number),
});
expect(JSON.stringify(contextCall)).not.toContain("session-key");
expect(JSON.stringify(contextCall)).not.toContain("prompt text");
expect(telemetryState.tracer.setSpanContext).toHaveBeenCalledWith(
expect.anything(),
expect.objectContaining({ traceId: TRACE_ID, spanId: SPAN_ID }),
expect.objectContaining({ traceId: TRACE_ID, spanId: runSpanId }),
);
expect(
(contextCall?.[2] as { spanContext?: { spanId?: string } } | undefined)?.spanContext?.spanId,
).toBe(runSpanId);
await service.stop?.(ctx);
});
@@ -1688,7 +1760,185 @@ describe("diagnostics-otel service", () => {
await service.stop?.(ctx);
});
test("parents trusted diagnostic lifecycle spans from explicit parent ids", async () => {
test("parents trusted diagnostic lifecycle spans from active started spans", async () => {
const service = createDiagnosticsOtelService();
const ctx = createOtelContext(OTEL_TEST_ENDPOINT, { traces: true, metrics: true });
await service.start(ctx);
emitTrustedDiagnosticEvent({
type: "run.started",
runId: "run-1",
provider: "openai",
model: "gpt-5.4",
trace: {
traceId: TRACE_ID,
spanId: CHILD_SPAN_ID,
parentSpanId: SPAN_ID,
traceFlags: "01",
},
});
emitTrustedDiagnosticEvent({
type: "model.call.started",
runId: "run-1",
callId: "call-1",
provider: "openai",
model: "gpt-5.4",
trace: {
traceId: TRACE_ID,
spanId: GRANDCHILD_SPAN_ID,
parentSpanId: CHILD_SPAN_ID,
traceFlags: "01",
},
});
emitTrustedDiagnosticEvent({
type: "tool.execution.started",
runId: "run-1",
toolName: "read",
trace: {
traceId: TRACE_ID,
spanId: TOOL_SPAN_ID,
parentSpanId: GRANDCHILD_SPAN_ID,
traceFlags: "01",
},
});
emitTrustedDiagnosticEvent({
type: "tool.execution.error",
runId: "run-1",
toolName: "read",
durationMs: 20,
errorCategory: "TypeError",
trace: {
traceId: TRACE_ID,
spanId: TOOL_SPAN_ID,
parentSpanId: GRANDCHILD_SPAN_ID,
traceFlags: "01",
},
});
emitTrustedDiagnosticEvent({
type: "model.call.completed",
runId: "run-1",
callId: "call-1",
provider: "openai",
model: "gpt-5.4",
durationMs: 80,
trace: {
traceId: TRACE_ID,
spanId: GRANDCHILD_SPAN_ID,
parentSpanId: CHILD_SPAN_ID,
traceFlags: "01",
},
});
emitTrustedDiagnosticEvent({
type: "run.completed",
runId: "run-1",
provider: "openai",
model: "gpt-5.4",
outcome: "completed",
durationMs: 100,
trace: {
traceId: TRACE_ID,
spanId: CHILD_SPAN_ID,
parentSpanId: SPAN_ID,
traceFlags: "01",
},
});
await flushDiagnosticEvents();
const runSpan = telemetryState.spans.find((span) => span.name === "openclaw.run");
const modelSpan = telemetryState.spans.find((span) => span.name === "openclaw.model.call");
const toolSpan = telemetryState.spans.find((span) => span.name === "openclaw.tool.execution");
const runSpanId = runSpan?.spanContext.mock.results[0]?.value?.spanId;
const modelSpanId = modelSpan?.spanContext.mock.results[0]?.value?.spanId;
expect(telemetryState.tracer.setSpanContext).toHaveBeenCalledTimes(2);
expect(telemetryState.tracer.setSpanContext.mock.calls.map((call) => call[1])).toEqual([
expect.objectContaining({ traceId: TRACE_ID, spanId: runSpanId }),
expect.objectContaining({ traceId: TRACE_ID, spanId: modelSpanId }),
]);
const parentBySpanName = Object.fromEntries(
telemetryState.tracer.startSpan.mock.calls.map((call) => [
call[0],
(call[2] as { spanContext?: { spanId?: string } } | undefined)?.spanContext?.spanId,
]),
);
expect(parentBySpanName).toMatchObject({
"openclaw.run": undefined,
"openclaw.model.call": runSpanId,
"openclaw.tool.execution": modelSpanId,
});
expect(toolSpan?.setStatus).toHaveBeenCalledWith({
code: 2,
message: "TypeError",
});
await service.stop?.(ctx);
});
test("keeps trusted run spans alive long enough for post-completion usage parenting", async () => {
const service = createDiagnosticsOtelService();
const ctx = createOtelContext(OTEL_TEST_ENDPOINT, { traces: true, metrics: true });
await service.start(ctx);
emitTrustedDiagnosticEvent({
type: "run.started",
runId: "run-1",
provider: "openai",
model: "gpt-5.4",
trace: {
traceId: TRACE_ID,
spanId: CHILD_SPAN_ID,
parentSpanId: SPAN_ID,
traceFlags: "01",
},
});
emitTrustedDiagnosticEvent({
type: "run.completed",
runId: "run-1",
provider: "openai",
model: "gpt-5.4",
outcome: "completed",
durationMs: 100,
trace: {
traceId: TRACE_ID,
spanId: CHILD_SPAN_ID,
parentSpanId: SPAN_ID,
traceFlags: "01",
},
});
emitTrustedDiagnosticEvent({
type: "model.usage",
provider: "openai",
model: "gpt-5.4",
usage: { input: 3, output: 2, total: 5 },
durationMs: 10,
trace: {
traceId: TRACE_ID,
spanId: GRANDCHILD_SPAN_ID,
parentSpanId: SPAN_ID,
traceFlags: "01",
},
});
await flushDiagnosticEvents();
const runSpan = telemetryState.spans.find((span) => span.name === "openclaw.run");
const runSpanId = runSpan?.spanContext.mock.results[0]?.value?.spanId;
const modelUsageCall = telemetryState.tracer.startSpan.mock.calls.find(
(call) => call[0] === "openclaw.model.usage",
);
expect(telemetryState.tracer.setSpanContext).toHaveBeenCalledWith(
expect.anything(),
expect.objectContaining({ traceId: TRACE_ID, spanId: runSpanId }),
);
expect(
(modelUsageCall?.[2] as { spanContext?: { spanId?: string } } | undefined)?.spanContext
?.spanId,
).toBe(runSpanId);
expect(runSpan?.end).toHaveBeenCalledWith(expect.any(Number));
await service.stop?.(ctx);
});
test("does not force remote parents for completed-only trusted lifecycle spans", async () => {
const service = createDiagnosticsOtelService();
const ctx = createOtelContext(OTEL_TEST_ENDPOINT, { traces: true, metrics: true });
await service.start(ctx);
@@ -1721,38 +1971,15 @@ describe("diagnostics-otel service", () => {
traceFlags: "01",
},
});
emitTrustedDiagnosticEvent({
type: "tool.execution.error",
runId: "run-1",
toolName: "read",
durationMs: 20,
errorCategory: "TypeError",
trace: {
traceId: TRACE_ID,
spanId: TOOL_SPAN_ID,
parentSpanId: GRANDCHILD_SPAN_ID,
traceFlags: "01",
},
});
await flushDiagnosticEvents();
expect(telemetryState.tracer.setSpanContext).toHaveBeenCalledTimes(3);
expect(telemetryState.tracer.setSpanContext.mock.calls.map((call) => call[1])).toEqual([
expect.objectContaining({ traceId: TRACE_ID, spanId: SPAN_ID }),
expect.objectContaining({ traceId: TRACE_ID, spanId: CHILD_SPAN_ID }),
expect.objectContaining({ traceId: TRACE_ID, spanId: GRANDCHILD_SPAN_ID }),
]);
expect(telemetryState.tracer.setSpanContext).not.toHaveBeenCalled();
const parentBySpanName = Object.fromEntries(
telemetryState.tracer.startSpan.mock.calls.map((call) => [
call[0],
(call[2] as { spanContext?: { spanId?: string } } | undefined)?.spanContext?.spanId,
]),
telemetryState.tracer.startSpan.mock.calls.map((call) => [call[0], call[2]]),
);
expect(parentBySpanName).toMatchObject({
"openclaw.run": SPAN_ID,
"openclaw.model.call": CHILD_SPAN_ID,
"openclaw.tool.execution": GRANDCHILD_SPAN_ID,
"openclaw.run": undefined,
"openclaw.model.call": undefined,
});
await service.stop?.(ctx);
});
@@ -1860,6 +2087,93 @@ describe("diagnostics-otel service", () => {
await service.stop?.(ctx);
});
test("does not create live started spans for untrusted lifecycle diagnostics", async () => {
const service = createDiagnosticsOtelService();
const ctx = createOtelContext(OTEL_TEST_ENDPOINT, { traces: true, metrics: true });
await service.start(ctx);
emitDiagnosticEvent({
type: "run.started",
runId: "run-1",
provider: "openai",
model: "gpt-5.4",
});
emitDiagnosticEvent({
type: "run.completed",
runId: "run-1",
provider: "openai",
model: "gpt-5.4",
outcome: "completed",
durationMs: 100,
});
emitDiagnosticEvent({
type: "model.call.started",
runId: "run-1",
callId: "call-1",
provider: "openai",
model: "gpt-5.4",
});
emitDiagnosticEvent({
type: "model.call.completed",
runId: "run-1",
callId: "call-1",
provider: "openai",
model: "gpt-5.4",
durationMs: 80,
});
emitDiagnosticEvent({
type: "tool.execution.started",
runId: "run-1",
toolName: "read",
});
emitDiagnosticEvent({
type: "tool.execution.error",
runId: "run-1",
toolName: "read",
durationMs: 20,
errorCategory: "TypeError",
});
emitDiagnosticEvent({
type: "harness.run.started",
runId: "run-1",
provider: "codex",
model: "gpt-5.4",
harnessId: "codex",
pluginId: "codex-plugin",
});
emitDiagnosticEvent({
type: "harness.run.completed",
runId: "run-1",
provider: "codex",
model: "gpt-5.4",
harnessId: "codex",
pluginId: "codex-plugin",
outcome: "completed",
durationMs: 90,
});
await flushDiagnosticEvents();
expect(
telemetryState.tracer.startSpan.mock.calls.filter((call) => call[0] === "openclaw.run"),
).toHaveLength(1);
expect(
telemetryState.tracer.startSpan.mock.calls.filter(
(call) => call[0] === "openclaw.model.call",
),
).toHaveLength(1);
expect(
telemetryState.tracer.startSpan.mock.calls.filter(
(call) => call[0] === "openclaw.tool.execution",
),
).toHaveLength(1);
expect(
telemetryState.tracer.startSpan.mock.calls.filter(
(call) => call[0] === "openclaw.harness.run",
),
).toHaveLength(1);
await service.stop?.(ctx);
});
test("exports exec process spans without command text", async () => {
const service = createDiagnosticsOtelService();
const ctx = createOtelContext(OTEL_TEST_ENDPOINT, { traces: true, metrics: true });

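The parenting behavior that the tests above exercise (a child span is parented from a span that was actually started and is still active, not from the raw `parentSpanId` carried by the event) can be sketched in isolation. This is a minimal illustration only, not the plugin's implementation: `SketchSpan`, `ActiveSpanRegistry`, and the string ids below are hypothetical names introduced for the example, and the OpenTelemetry tracer is replaced by plain objects.

```typescript
// Hypothetical span shape for illustration; the real service uses OpenTelemetry spans.
interface SketchSpan {
  name: string;
  spanId: string;
  parentSpanId: string | undefined;
  ended: boolean;
}

// Hypothetical registry: maps the span id carried by a diagnostic event to the
// live span that was started for that event.
class ActiveSpanRegistry {
  private readonly active = new Map<string, SketchSpan>();
  private created = 0;

  // Start a span for a "started" event. A parent is attached only when the
  // event's parent id refers to a span that is still tracked as active.
  start(name: string, eventSpanId: string, eventParentSpanId?: string): SketchSpan {
    const parent = eventParentSpanId ? this.active.get(eventParentSpanId) : undefined;
    this.created += 1;
    const span: SketchSpan = {
      name,
      // Locally generated id, mirroring how the mocked tracer numbers its spans.
      spanId: this.created.toString(16).padStart(16, "0"),
      parentSpanId: parent?.spanId,
      ended: false,
    };
    this.active.set(eventSpanId, span);
    return span;
  }

  // Finish the span for a "completed" or "error" event and stop tracking it.
  finish(eventSpanId: string): SketchSpan | undefined {
    const span = this.active.get(eventSpanId);
    if (span) {
      this.active.delete(eventSpanId);
      span.ended = true;
    }
    return span;
  }
}

const registry = new ActiveSpanRegistry();
// The run's declared parent was never started locally, so no parent is forced.
const run = registry.start("run", "child-id", "remote-root-id");
const modelCall = registry.start("model.call", "grandchild-id", "child-id");
const tool = registry.start("tool.execution", "tool-id", "grandchild-id");
registry.finish("tool-id");
registry.finish("grandchild-id");
registry.finish("child-id");
// Once the run is finished, a later event naming it as parent finds nothing.
const orphan = registry.start("model.call", "late-id", "child-id");
```

The sketch deliberately omits two things the diff appears to add on top of this idea: an alias map so events can also resolve a run through its declared parent id, and a deferred finalizer that keeps a completed run span resolvable briefly for post-completion usage events.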

@@ -81,9 +81,9 @@ type ModelCallLifecycleDiagnosticEvent = Extract<
DiagnosticEventPayload,
{ type: "model.call.completed" | "model.call.error" }
>;
type HarnessRunLifecycleDiagnosticEvent = Extract<
type HarnessRunDiagnosticEvent = Extract<
DiagnosticEventPayload,
{ type: "harness.run.completed" | "harness.run.error" }
{ type: "harness.run.started" | "harness.run.completed" | "harness.run.error" }
>;
type TelemetryExporterDiagnosticEvent = Extract<
DiagnosticEventPayload,
@@ -217,7 +217,7 @@ function positiveFiniteNumber(value: number | undefined): number | undefined {
}
function assignPositiveNumberAttr(
attrs: Record<string, string | number>,
attrs: Record<string, string | number | boolean>,
key: string,
value: number | undefined,
): void {
@@ -227,6 +227,23 @@ function assignPositiveNumberAttr(
}
}
function assignModelCallSizeTimingAttrs(
attrs: Record<string, string | number | boolean>,
evt: {
requestPayloadBytes?: number;
responseStreamBytes?: number;
timeToFirstByteMs?: number;
},
): void {
assignPositiveNumberAttr(attrs, "openclaw.model_call.request_bytes", evt.requestPayloadBytes);
assignPositiveNumberAttr(attrs, "openclaw.model_call.response_bytes", evt.responseStreamBytes);
assignPositiveNumberAttr(
attrs,
"openclaw.model_call.time_to_first_byte_ms",
evt.timeToFirstByteMs,
);
}
function assignGenAiSpanIdentityAttrs(
attrs: Record<string, string | number | boolean>,
input: { api?: string; model?: string; provider?: string },
@@ -244,7 +261,7 @@ function assignGenAiSpanIdentityAttrs(
function assignGenAiModelCallAttrs(
attrs: Record<string, string | number | boolean>,
evt: ModelCallLifecycleDiagnosticEvent,
evt: { api?: string; model?: string; provider?: string },
): void {
assignGenAiSpanIdentityAttrs(attrs, evt);
}
@@ -467,19 +484,6 @@ function contextForTraceContext(traceContext: DiagnosticTraceContext | undefined
});
}
function contextForDiagnosticSpanParent(traceContext: DiagnosticTraceContext | undefined) {
const normalized = normalizeTraceContext(traceContext);
if (!normalized?.parentSpanId) {
return undefined;
}
return trace.setSpanContext(otelContextApi.active(), {
traceId: normalized.traceId,
spanId: normalized.parentSpanId,
traceFlags: traceFlagsToOtel(normalized.traceFlags),
isRemote: true,
});
}
function contextForTrustedTraceContext(
evt: DiagnosticEventPayload,
metadata: DiagnosticEventMetadata,
@@ -487,13 +491,6 @@ function contextForTrustedTraceContext(
return metadata.trusted ? contextForTraceContext(evt.trace) : undefined;
}
function contextForTrustedDiagnosticSpanParent(
evt: DiagnosticEventPayload,
metadata: DiagnosticEventMetadata,
) {
return metadata.trusted ? contextForDiagnosticSpanParent(evt.trace) : undefined;
}
function addTraceAttributes(
attributes: Record<string, string | number | boolean>,
traceContext: DiagnosticTraceContext | undefined,
@@ -518,17 +515,21 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
let sdk: NodeSDK | null = null;
let logProvider: LoggerProvider | null = null;
let unsubscribe: (() => void) | null = null;
let stopActiveTrustedSpans: (() => void) | null = null;
const stopStarted = async () => {
const currentUnsubscribe = unsubscribe;
const currentLogProvider = logProvider;
const currentSdk = sdk;
const currentStopActiveTrustedSpans = stopActiveTrustedSpans;
unsubscribe = null;
logProvider = null;
sdk = null;
stopActiveTrustedSpans = null;
currentUnsubscribe?.();
currentStopActiveTrustedSpans?.();
if (currentLogProvider) {
await currentLogProvider.shutdown().catch(() => undefined);
}
@@ -694,6 +695,24 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
const meter = metrics.getMeter("openclaw");
const tracer = trace.getTracer("openclaw");
const activeTrustedSpans = new Map<string, ReturnType<typeof tracer.startSpan>>();
const activeTrustedSpanAliases = new Map<string, ReturnType<typeof tracer.startSpan>>();
const pendingTrustedRunFinalizers = new Map<string, ReturnType<typeof setImmediate>>();
stopActiveTrustedSpans = () => {
const stopAt = Date.now();
for (const handle of pendingTrustedRunFinalizers.values()) {
clearImmediate(handle);
}
pendingTrustedRunFinalizers.clear();
for (const span of new Set([
...activeTrustedSpans.values(),
...activeTrustedSpanAliases.values(),
])) {
span.end(stopAt);
}
activeTrustedSpans.clear();
activeTrustedSpanAliases.clear();
};
const tokensCounter = meter.createCounter("openclaw.tokens", {
unit: "1",
@@ -810,6 +829,27 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
unit: "ms",
description: "Model call duration",
});
const modelCallRequestBytesHistogram = meter.createHistogram(
"openclaw.model_call.request_bytes",
{
unit: "By",
description: "UTF-8 byte size of sanitized model request payloads",
},
);
const modelCallResponseBytesHistogram = meter.createHistogram(
"openclaw.model_call.response_bytes",
{
unit: "By",
description: "UTF-8 byte size of streamed model response events",
},
);
const modelCallTimeToFirstByteHistogram = meter.createHistogram(
"openclaw.model_call.time_to_first_byte_ms",
{
unit: "ms",
description: "Elapsed time before the first streamed model response event",
},
);
const toolExecutionDurationHistogram = meter.createHistogram(
"openclaw.tool.execution.duration_ms",
{
@@ -942,11 +982,16 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
options: {
parentContext?: ReturnType<typeof contextForTraceContext> | null;
endTimeMs?: number;
startTimeMs?: number;
} = {},
) => {
const endTimeMs = options.endTimeMs ?? Date.now();
const startTime =
typeof durationMs === "number" ? endTimeMs - Math.max(0, durationMs) : undefined;
typeof options.startTimeMs === "number"
? options.startTimeMs
: typeof durationMs === "number" && durationMs >= 0
? endTimeMs - durationMs
: undefined;
const parentContext =
"parentContext" in options ? (options.parentContext ?? undefined) : undefined;
const span = tracer.startSpan(
@@ -959,6 +1004,78 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
);
return span;
};
const trustedTraceContext = (
evt: DiagnosticEventPayload,
metadata: DiagnosticEventMetadata,
) => (metadata.trusted ? normalizeTraceContext(evt.trace) : undefined);
const activeTrustedParentContext = (
evt: DiagnosticEventPayload,
metadata: DiagnosticEventMetadata,
) => {
const parentSpanId = trustedTraceContext(evt, metadata)?.parentSpanId;
if (!parentSpanId) {
return undefined;
}
const activeParentSpan =
activeTrustedSpans.get(parentSpanId) ?? activeTrustedSpanAliases.get(parentSpanId);
if (!activeParentSpan) {
return undefined;
}
return trace.setSpanContext(otelContextApi.active(), activeParentSpan.spanContext());
};
const trackTrustedSpan = (
evt: DiagnosticEventPayload,
metadata: DiagnosticEventMetadata,
span: ReturnType<typeof tracer.startSpan>,
) => {
const spanId = trustedTraceContext(evt, metadata)?.spanId;
if (spanId) {
activeTrustedSpans.set(spanId, span);
}
return span;
};
const takeTrackedTrustedSpan = (
evt: DiagnosticEventPayload,
metadata: DiagnosticEventMetadata,
) => {
const spanId = trustedTraceContext(evt, metadata)?.spanId;
if (!spanId) {
return undefined;
}
const span = activeTrustedSpans.get(spanId);
if (span) {
activeTrustedSpans.delete(spanId);
}
return span;
};
const setSpanAttrs = (
span: ReturnType<typeof tracer.startSpan>,
attributes: Record<string, string | number | boolean>,
) => {
span.setAttributes?.(redactOtelAttributes(attributes));
};
const scheduleTrackedRunSpanFinalize = (
spanId: string,
parentSpanId: string | undefined,
span: ReturnType<typeof tracer.startSpan>,
endTimeMs: number,
) => {
const existingHandle = pendingTrustedRunFinalizers.get(spanId);
if (existingHandle) {
clearImmediate(existingHandle);
}
const handle = setImmediate(() => {
pendingTrustedRunFinalizers.delete(spanId);
if (activeTrustedSpans.get(spanId) === span) {
activeTrustedSpans.delete(spanId);
}
if (parentSpanId && activeTrustedSpanAliases.get(parentSpanId) === span) {
activeTrustedSpanAliases.delete(parentSpanId);
}
span.end(endTimeMs);
});
pendingTrustedRunFinalizers.set(spanId, handle);
};
const addRunAttrs = (
spanAttrs: Record<string, string | number | boolean>,
@@ -1093,7 +1210,7 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
);
const span = spanWithDuration("openclaw.model.usage", spanAttrs, evt.durationMs, {
parentContext: contextForTrustedDiagnosticSpanParent(evt, metadata),
parentContext: activeTrustedParentContext(evt, metadata),
endTimeMs: evt.ts,
});
span.end(evt.ts);
@@ -1258,6 +1375,29 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
span.end(evt.ts);
};
const recordRunStarted = (
evt: Extract<DiagnosticEventPayload, { type: "run.started" }>,
metadata: DiagnosticEventMetadata,
) => {
if (!tracesEnabled || !metadata.trusted) {
return;
}
const spanAttrs: Record<string, string | number | boolean> = {};
addRunAttrs(spanAttrs, evt);
const span = trackTrustedSpan(
evt,
metadata,
spanWithDuration("openclaw.run", spanAttrs, undefined, {
parentContext: activeTrustedParentContext(evt, metadata),
startTimeMs: evt.ts,
}),
);
const parentSpanId = trustedTraceContext(evt, metadata)?.parentSpanId;
if (parentSpanId && !activeTrustedSpans.has(parentSpanId)) {
activeTrustedSpanAliases.set(parentSpanId, span);
}
};
const recordLaneEnqueue = (
evt: Extract<DiagnosticEventPayload, { type: "queue.lane.enqueue" }>,
) => {
@@ -1421,28 +1561,65 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
if (evt.errorCategory) {
spanAttrs["openclaw.errorCategory"] = lowCardinalityAttr(evt.errorCategory, "other");
}
const span = spanWithDuration("openclaw.run", spanAttrs, evt.durationMs, {
parentContext: contextForTrustedDiagnosticSpanParent(evt, metadata),
endTimeMs: evt.ts,
});
const trustedTrace = trustedTraceContext(evt, metadata);
const trackedSpan = trustedTrace?.spanId
? activeTrustedSpans.get(trustedTrace.spanId)
: undefined;
const span =
trackedSpan ??
spanWithDuration("openclaw.run", spanAttrs, evt.durationMs, {
parentContext: activeTrustedParentContext(evt, metadata),
endTimeMs: evt.ts,
});
setSpanAttrs(span, spanAttrs);
if (evt.outcome === "error") {
span.setStatus({
code: SpanStatusCode.ERROR,
...(evt.errorCategory ? { message: redactSensitiveText(evt.errorCategory) } : {}),
});
}
if (trackedSpan && trustedTrace?.spanId) {
scheduleTrackedRunSpanFinalize(
trustedTrace.spanId,
trustedTrace.parentSpanId,
trackedSpan,
evt.ts,
);
return;
}
span.end(evt.ts);
};
const harnessRunMetricAttrs = (evt: HarnessRunLifecycleDiagnosticEvent) => ({
const harnessRunMetricAttrs = (evt: HarnessRunDiagnosticEvent) => ({
"openclaw.harness.id": lowCardinalityAttr(evt.harnessId, "unknown"),
"openclaw.harness.plugin": lowCardinalityAttr(evt.pluginId),
"openclaw.outcome": evt.type === "harness.run.error" ? "error" : evt.outcome,
...(evt.type === "harness.run.started"
? {}
: {
"openclaw.outcome": evt.type === "harness.run.error" ? "error" : evt.outcome,
}),
"openclaw.provider": lowCardinalityAttr(evt.provider, "unknown"),
"openclaw.model": lowCardinalityAttr(evt.model, "unknown"),
...(evt.channel ? { "openclaw.channel": lowCardinalityAttr(evt.channel) } : {}),
});
const recordHarnessRunStarted = (
evt: Extract<DiagnosticEventPayload, { type: "harness.run.started" }>,
metadata: DiagnosticEventMetadata,
) => {
if (!tracesEnabled || !metadata.trusted) {
return;
}
trackTrustedSpan(
evt,
metadata,
spanWithDuration("openclaw.harness.run", harnessRunMetricAttrs(evt), undefined, {
parentContext: activeTrustedParentContext(evt, metadata),
startTimeMs: evt.ts,
}),
);
};
const recordHarnessRunCompleted = (
evt: Extract<DiagnosticEventPayload, { type: "harness.run.completed" }>,
metadata: DiagnosticEventMetadata,
@@ -1467,10 +1644,13 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
spanAttrs["openclaw.harness.items.completed"] = evt.itemLifecycle.completedCount;
spanAttrs["openclaw.harness.items.active"] = evt.itemLifecycle.activeCount;
}
const span = spanWithDuration("openclaw.harness.run", spanAttrs, evt.durationMs, {
parentContext: contextForTrustedDiagnosticSpanParent(evt, metadata),
endTimeMs: evt.ts,
});
const span =
takeTrackedTrustedSpan(evt, metadata) ??
spanWithDuration("openclaw.harness.run", spanAttrs, evt.durationMs, {
parentContext: activeTrustedParentContext(evt, metadata),
endTimeMs: evt.ts,
});
setSpanAttrs(span, spanAttrs);
if (evt.outcome === "error") {
span.setStatus({
code: SpanStatusCode.ERROR,
@@ -1499,10 +1679,13 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
"error.type": errorType,
...(evt.cleanupFailed ? { "openclaw.harness.cleanup_failed": true } : {}),
};
const span = spanWithDuration("openclaw.harness.run", spanAttrs, evt.durationMs, {
parentContext: contextForTrustedDiagnosticSpanParent(evt, metadata),
endTimeMs: evt.ts,
});
const span =
takeTrackedTrustedSpan(evt, metadata) ??
spanWithDuration("openclaw.harness.run", spanAttrs, evt.durationMs, {
parentContext: activeTrustedParentContext(evt, metadata),
endTimeMs: evt.ts,
});
setSpanAttrs(span, spanAttrs);
span.setStatus({
code: SpanStatusCode.ERROR,
message: errorType,
@@ -1534,7 +1717,7 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
spanAttrs["openclaw.context.reserve_tokens"] = evt.reserveTokens;
}
const span = spanWithDuration("openclaw.context.assembled", spanAttrs, 0, {
parentContext: contextForTrustedDiagnosticSpanParent(evt, metadata),
parentContext: activeTrustedParentContext(evt, metadata),
endTimeMs: evt.ts,
});
span.end(evt.ts);
@@ -1555,12 +1738,59 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
"gen_ai.request.model": lowCardinalityAttr(evt.model),
...(errorType ? { "error.type": errorType } : {}),
});
const recordModelCallSizeTimingMetrics = (
evt: Extract<DiagnosticEventPayload, { type: "model.call.completed" | "model.call.error" }>,
attrs: ReturnType<typeof modelCallMetricAttrs>,
) => {
const requestPayloadBytes = positiveFiniteNumber(evt.requestPayloadBytes);
if (requestPayloadBytes !== undefined) {
modelCallRequestBytesHistogram.record(requestPayloadBytes, attrs);
}
const responseStreamBytes = positiveFiniteNumber(evt.responseStreamBytes);
if (responseStreamBytes !== undefined) {
modelCallResponseBytesHistogram.record(responseStreamBytes, attrs);
}
const timeToFirstByteMs = positiveFiniteNumber(evt.timeToFirstByteMs);
if (timeToFirstByteMs !== undefined) {
modelCallTimeToFirstByteHistogram.record(timeToFirstByteMs, attrs);
}
};
const recordModelCallStarted = (
evt: Extract<DiagnosticEventPayload, { type: "model.call.started" }>,
metadata: DiagnosticEventMetadata,
) => {
if (!tracesEnabled || !metadata.trusted) {
return;
}
const spanAttrs: Record<string, string | number | boolean> = {
"openclaw.provider": evt.provider,
"openclaw.model": evt.model,
};
assignGenAiModelCallAttrs(spanAttrs, evt);
if (evt.api) {
spanAttrs["openclaw.api"] = evt.api;
}
if (evt.transport) {
spanAttrs["openclaw.transport"] = evt.transport;
}
trackTrustedSpan(
evt,
metadata,
spanWithDuration("openclaw.model.call", spanAttrs, undefined, {
parentContext: activeTrustedParentContext(evt, metadata),
startTimeMs: evt.ts,
}),
);
};
const recordModelCallCompleted = (
evt: Extract<DiagnosticEventPayload, { type: "model.call.completed" }>,
metadata: DiagnosticEventMetadata,
) => {
modelCallDurationHistogram.record(evt.durationMs, modelCallMetricAttrs(evt));
const metricAttrs = modelCallMetricAttrs(evt);
modelCallDurationHistogram.record(evt.durationMs, metricAttrs);
recordModelCallSizeTimingMetrics(evt, metricAttrs);
genAiOperationDurationHistogram.record(
evt.durationMs / 1000,
genAiModelCallMetricAttrs(evt),
@@ -1579,15 +1809,19 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
if (evt.transport) {
spanAttrs["openclaw.transport"] = evt.transport;
}
assignModelCallSizeTimingAttrs(spanAttrs, evt);
assignOtelModelContentAttributes(
spanAttrs,
evt as unknown as Record<string, unknown>,
contentCapturePolicy,
);
const span = spanWithDuration("openclaw.model.call", spanAttrs, evt.durationMs, {
parentContext: contextForTrustedDiagnosticSpanParent(evt, metadata),
endTimeMs: evt.ts,
});
const span =
takeTrackedTrustedSpan(evt, metadata) ??
spanWithDuration("openclaw.model.call", spanAttrs, evt.durationMs, {
parentContext: activeTrustedParentContext(evt, metadata),
endTimeMs: evt.ts,
});
setSpanAttrs(span, spanAttrs);
addUpstreamRequestIdSpanEvent(span, evt.upstreamRequestIdHash);
span.end(evt.ts);
};
@@ -1597,10 +1831,12 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
metadata: DiagnosticEventMetadata,
) => {
const errorType = lowCardinalityAttr(evt.errorCategory, "other");
modelCallDurationHistogram.record(evt.durationMs, {
const metricAttrs = {
...modelCallMetricAttrs(evt),
"openclaw.errorCategory": errorType,
});
};
modelCallDurationHistogram.record(evt.durationMs, metricAttrs);
recordModelCallSizeTimingMetrics(evt, metricAttrs);
genAiOperationDurationHistogram.record(
evt.durationMs / 1000,
genAiModelCallMetricAttrs(evt, errorType),
@@ -1621,15 +1857,19 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
if (evt.transport) {
spanAttrs["openclaw.transport"] = evt.transport;
}
assignModelCallSizeTimingAttrs(spanAttrs, evt);
assignOtelModelContentAttributes(
spanAttrs,
evt as unknown as Record<string, unknown>,
contentCapturePolicy,
);
const span = spanWithDuration("openclaw.model.call", spanAttrs, evt.durationMs, {
parentContext: contextForTrustedDiagnosticSpanParent(evt, metadata),
endTimeMs: evt.ts,
});
const span =
takeTrackedTrustedSpan(evt, metadata) ??
spanWithDuration("openclaw.model.call", spanAttrs, evt.durationMs, {
parentContext: activeTrustedParentContext(evt, metadata),
endTimeMs: evt.ts,
});
setSpanAttrs(span, spanAttrs);
addUpstreamRequestIdSpanEvent(span, evt.upstreamRequestIdHash);
span.setStatus({
code: SpanStatusCode.ERROR,
@@ -1638,6 +1878,36 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
span.end(evt.ts);
};
const toolExecutionBaseAttrs = (
evt: Extract<
DiagnosticEventPayload,
{
type: "tool.execution.started" | "tool.execution.completed" | "tool.execution.error";
}
>,
): Record<string, string | number | boolean> => ({
"openclaw.toolName": evt.toolName,
"gen_ai.tool.name": evt.toolName,
...paramsSummaryAttrs(evt.paramsSummary),
});
const recordToolExecutionStarted = (
evt: Extract<DiagnosticEventPayload, { type: "tool.execution.started" }>,
metadata: DiagnosticEventMetadata,
) => {
if (!tracesEnabled || !metadata.trusted) {
return;
}
trackTrustedSpan(
evt,
metadata,
spanWithDuration("openclaw.tool.execution", toolExecutionBaseAttrs(evt), undefined, {
parentContext: activeTrustedParentContext(evt, metadata),
startTimeMs: evt.ts,
}),
);
};
const recordToolExecutionCompleted = (
evt: Extract<DiagnosticEventPayload, { type: "tool.execution.completed" }>,
metadata: DiagnosticEventMetadata,
@@ -1651,9 +1921,7 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
return;
}
const spanAttrs: Record<string, string | number | boolean> = {
"openclaw.toolName": evt.toolName,
"gen_ai.tool.name": evt.toolName,
...paramsSummaryAttrs(evt.paramsSummary),
...toolExecutionBaseAttrs(evt),
};
addRunAttrs(spanAttrs, evt);
assignOtelToolContentAttributes(
@@ -1661,10 +1929,13 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
evt as unknown as Record<string, unknown>,
contentCapturePolicy,
);
const span = spanWithDuration("openclaw.tool.execution", spanAttrs, evt.durationMs, {
parentContext: contextForTrustedDiagnosticSpanParent(evt, metadata),
endTimeMs: evt.ts,
});
const span =
takeTrackedTrustedSpan(evt, metadata) ??
spanWithDuration("openclaw.tool.execution", spanAttrs, evt.durationMs, {
parentContext: activeTrustedParentContext(evt, metadata),
endTimeMs: evt.ts,
});
setSpanAttrs(span, spanAttrs);
span.end(evt.ts);
};
@@ -1682,10 +1953,8 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
return;
}
const spanAttrs: Record<string, string | number | boolean> = {
"openclaw.toolName": evt.toolName,
...toolExecutionBaseAttrs(evt),
"openclaw.errorCategory": lowCardinalityAttr(evt.errorCategory, "other"),
"gen_ai.tool.name": evt.toolName,
...paramsSummaryAttrs(evt.paramsSummary),
};
addRunAttrs(spanAttrs, evt);
if (evt.errorCode) {
@@ -1696,10 +1965,13 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
evt as unknown as Record<string, unknown>,
contentCapturePolicy,
);
const span = spanWithDuration("openclaw.tool.execution", spanAttrs, evt.durationMs, {
parentContext: contextForTrustedDiagnosticSpanParent(evt, metadata),
endTimeMs: evt.ts,
});
const span =
takeTrackedTrustedSpan(evt, metadata) ??
spanWithDuration("openclaw.tool.execution", spanAttrs, evt.durationMs, {
parentContext: activeTrustedParentContext(evt, metadata),
endTimeMs: evt.ts,
});
setSpanAttrs(span, spanAttrs);
span.setStatus({
code: SpanStatusCode.ERROR,
message: redactSensitiveText(evt.errorCategory),
@@ -1827,9 +2099,15 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
case "diagnostic.heartbeat":
recordHeartbeat(evt);
return;
case "run.started":
recordRunStarted(evt, metadata);
return;
case "run.completed":
recordRunCompleted(evt, metadata);
return;
case "harness.run.started":
recordHarnessRunStarted(evt, metadata);
return;
case "harness.run.completed":
recordHarnessRunCompleted(evt, metadata);
return;
@@ -1839,12 +2117,18 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
case "context.assembled":
recordContextAssembled(evt, metadata);
return;
case "model.call.started":
recordModelCallStarted(evt, metadata);
return;
case "model.call.completed":
recordModelCallCompleted(evt, metadata);
return;
case "model.call.error":
recordModelCallError(evt, metadata);
return;
case "tool.execution.started":
recordToolExecutionStarted(evt, metadata);
return;
case "tool.execution.completed":
recordToolExecutionCompleted(evt, metadata);
return;
@@ -1869,10 +2153,6 @@ export function createDiagnosticsOtelService(): OpenClawPluginService {
case "telemetry.exporter":
recordTelemetryExporter(evt, metadata);
return;
case "tool.execution.started":
case "run.started":
case "harness.run.started":
case "model.call.started":
case "payload.large":
return;
}
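
Note on the span lifecycle in this file: the `*.started` events now open a span, and the matching `*.completed` / `*.error` events either reuse it or fall back to a span synthesized from `durationMs`. The sketch below illustrates that take-or-create shape only. `FakeSpan`, `trackSpan`, `takeTrackedSpan`, and `finishSpan` are hypothetical stand-ins; the real helpers (`trackTrustedSpan`, `takeTrackedTrustedSpan`, `spanWithDuration`) operate on OpenTelemetry spans and their bodies are not shown in this diff.

```typescript
// Hypothetical stand-in for an OpenTelemetry span (assumption, not the real type).
type FakeSpan = { name: string; startMs: number; endMs?: number };

// Open spans, keyed by a correlation id chosen by the caller.
const tracked = new Map<string, FakeSpan>();

// On a "started" event: open a span and remember it.
function trackSpan(key: string, name: string, startMs: number): void {
  tracked.set(key, { name, startMs });
}

// On a "completed" or "error" event: remove and return the tracked span, if any.
function takeTrackedSpan(key: string): FakeSpan | undefined {
  const span = tracked.get(key);
  tracked.delete(key);
  return span;
}

// Reuse the tracked span, or synthesize one from the reported duration when no
// start event was seen (the `take(...) ?? spanWithDuration(...)` shape above).
function finishSpan(key: string, name: string, endMs: number, durationMs: number): FakeSpan {
  const span: FakeSpan = takeTrackedSpan(key) ?? { name, startMs: endMs - durationMs };
  span.endMs = endMs;
  return span;
}
```

Taking the span out of the map when finishing it keeps a late duplicate completion event from ending the same span twice.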


@@ -166,6 +166,9 @@ function renderElement(
}
case "emotion":
return renderEmotionElement(element);
case "md":
case "lark_md":
return toStringOrEmpty(element.text) || toStringOrEmpty(element.content);
case "br":
return "\n";
case "hr":


@@ -168,6 +168,95 @@ describe("getMessageFeishu", () => {
);
});
it("falls through empty interactive card element arrays and locale variants", async () => {
mockClientGet.mockResolvedValueOnce({
code: 0,
data: {
items: [
{
message_id: "om_i18n_card",
chat_id: "oc_i18n_card",
msg_type: "interactive",
body: {
content: JSON.stringify({
elements: [],
body: { elements: [] },
i18n_elements: {
zh_cn: [],
en_us: [
{
tag: "markdown",
content: "hello ${count} {{label}} {{metadata}}",
},
],
},
template_variable: {
count: 2,
label: "tasks",
metadata: { ignored: true },
},
}),
},
},
],
},
});
const result = await getMessageFeishu({
cfg: {} as ClawdbotConfig,
messageId: "om_i18n_card",
});
expect(result).toEqual(
expect.objectContaining({
messageId: "om_i18n_card",
chatId: "oc_i18n_card",
contentType: "interactive",
content: "hello 2 tasks {{metadata}}",
}),
);
});
it("falls back to post-format content when interactive card elements are empty", async () => {
mockClientGet.mockResolvedValueOnce({
code: 0,
data: {
items: [
{
message_id: "om_post_card",
chat_id: "oc_post_card",
msg_type: "interactive",
body: {
content: JSON.stringify({
elements: [],
post: {
zh_cn: {
title: "Card summary",
content: [[{ tag: "md", text: "**fallback** body" }]],
},
},
}),
},
},
],
},
});
const result = await getMessageFeishu({
cfg: {} as ClawdbotConfig,
messageId: "om_post_card",
});
expect(result).toEqual(
expect.objectContaining({
messageId: "om_post_card",
chatId: "oc_post_card",
contentType: "interactive",
content: "Card summary\n\n**fallback** body",
}),
);
});
it("extracts text content from post messages", async () => {
mockClientGet.mockResolvedValueOnce({
code: 0,


@@ -15,6 +15,8 @@ import { resolveFeishuSendTarget } from "./send-target.js";
import type { FeishuChatType, FeishuMessageInfo, FeishuSendResult } from "./types.js";
const WITHDRAWN_REPLY_ERROR_CODES = new Set([230011, 231003]);
const INTERACTIVE_CARD_FALLBACK_TEXT = "[Interactive Card]";
const POST_FALLBACK_TEXT = "[Rich text message]";
const FEISHU_CARD_TEMPLATES = new Set([
"blue",
"green",
@@ -60,6 +62,10 @@ function isWithdrawnReplyError(err: unknown): boolean {
return false;
}
function isRecord(value: unknown): value is Record<string, unknown> {
return Boolean(value && typeof value === "object" && !Array.isArray(value));
}
type FeishuCreateMessageClient = {
im: {
message: {
@@ -179,41 +185,121 @@ async function sendReplyOrFallbackDirect(
return toFeishuSendResult(response, params.directParams.receiveId);
}
function parseInteractiveCardContent(parsed: unknown): string {
if (!parsed || typeof parsed !== "object") {
return "[Interactive Card]";
function normalizeCardTemplateVariable(value: unknown): string | undefined {
if (typeof value === "string") {
return value;
}
// Support both schema 1.0 (top-level `elements`) and 2.0 (`body.elements`).
const candidate = parsed as { elements?: unknown; body?: { elements?: unknown } };
const elements = Array.isArray(candidate.elements)
? candidate.elements
: Array.isArray(candidate.body?.elements)
? candidate.body.elements
: null;
if (!elements) {
return "[Interactive Card]";
if (typeof value === "number" || typeof value === "boolean" || typeof value === "bigint") {
return String(value);
}
return undefined;
}
function readCardTemplateVariables(parsed: Record<string, unknown>): Map<string, string> {
const variables = new Map<string, string>();
for (const source of [parsed.template_variable, parsed.template_variables]) {
if (!isRecord(source)) {
continue;
}
for (const [key, value] of Object.entries(source)) {
const normalized = normalizeCardTemplateVariable(value);
if (normalized !== undefined) {
variables.set(key, normalized);
}
}
}
return variables;
}
function applyCardTemplateVariables(text: string, variables: Map<string, string>): string {
if (variables.size === 0) {
return text;
}
return text.replace(/\$\{([A-Za-z0-9_.-]+)\}|\{\{\s*([A-Za-z0-9_.-]+)\s*\}\}/g, (match, a, b) => {
const variableName = typeof a === "string" ? a : b;
return variables.get(variableName) ?? match;
});
}
function extractInteractiveElementText(
element: unknown,
variables: Map<string, string>,
): string | undefined {
if (!isRecord(element)) {
return undefined;
}
const tag = typeof element.tag === "string" ? element.tag : "";
const text = isRecord(element.text) ? element.text : undefined;
if (tag === "div" && typeof text?.content === "string") {
return applyCardTemplateVariables(text.content, variables);
}
if ((tag === "markdown" || tag === "lark_md") && typeof element.content === "string") {
return applyCardTemplateVariables(element.content, variables);
}
if (tag === "plain_text" && typeof element.content === "string") {
return applyCardTemplateVariables(element.content, variables);
}
return undefined;
}
function extractInteractiveElementsText(
elements: unknown[],
variables: Map<string, string>,
): string {
const texts: string[] = [];
for (const element of elements) {
if (!element || typeof element !== "object") {
continue;
}
const item = element as {
tag?: string;
content?: string;
text?: { content?: string };
};
if (item.tag === "div" && typeof item.text?.content === "string") {
texts.push(item.text.content);
continue;
}
if (item.tag === "markdown" && typeof item.content === "string") {
texts.push(item.content);
const text = extractInteractiveElementText(element, variables);
if (text !== undefined) {
texts.push(text);
}
}
return texts.join("\n").trim() || "[Interactive Card]";
return texts.join("\n").trim();
}
function readInteractiveElementArrays(parsed: Record<string, unknown>): unknown[][] {
const body = isRecord(parsed.body) ? parsed.body : undefined;
const elementArrays: unknown[][] = [];
for (const candidate of [parsed.elements, body?.elements]) {
if (Array.isArray(candidate)) {
elementArrays.push(candidate);
}
}
for (const candidate of [parsed.i18n_elements, body?.i18n_elements]) {
if (!isRecord(candidate)) {
continue;
}
for (const localeElements of Object.values(candidate)) {
if (Array.isArray(localeElements)) {
elementArrays.push(localeElements);
}
}
}
return elementArrays;
}
function parseInteractivePostFallback(parsed: unknown): string | undefined {
const textContent = parsePostContent(JSON.stringify(parsed)).textContent.trim();
return textContent && textContent !== POST_FALLBACK_TEXT ? textContent : undefined;
}
function parseInteractiveCardContent(parsed: unknown): string {
if (!isRecord(parsed)) {
return INTERACTIVE_CARD_FALLBACK_TEXT;
}
const variables = readCardTemplateVariables(parsed);
for (const elements of readInteractiveElementArrays(parsed)) {
const text = extractInteractiveElementsText(elements, variables);
if (text) {
return text;
}
}
return parseInteractivePostFallback(parsed) ?? INTERACTIVE_CARD_FALLBACK_TEXT;
}
function parseFeishuMessageContent(rawContent: string, msgType: string): string {
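
Note on the template-variable substitution added above: it accepts both `${name}` and `{{ name }}` placeholders and leaves a placeholder untouched when no scalar value is available. A minimal standalone sketch of that behavior follows. `applyTemplateVariables` is an illustrative name; the regular expression is copied from the hunk, and the sample values mirror the test case in this PR.

```typescript
// Replace ${name} and {{ name }} placeholders with values from `variables`.
// Unknown names are left as-is so the reader still sees the raw placeholder.
function applyTemplateVariables(text: string, variables: Map<string, string>): string {
  if (variables.size === 0) {
    return text;
  }
  const pattern = /\$\{([A-Za-z0-9_.-]+)\}|\{\{\s*([A-Za-z0-9_.-]+)\s*\}\}/g;
  return text.replace(
    pattern,
    (match: string, dollarName: string | undefined, braceName: string | undefined) => {
      // Exactly one of the two capture groups participates in a given match.
      const name = dollarName ?? braceName;
      return (name !== undefined ? variables.get(name) : undefined) ?? match;
    },
  );
}

// Values as they would look after normalization: scalars become strings,
// while object values (such as `metadata` in the test) are never stored.
const sampleVariables = new Map<string, string>([
  ["count", "2"],
  ["label", "tasks"],
]);

const rendered = applyTemplateVariables("hello ${count} {{label}} {{metadata}}", sampleVariables);
```

With these inputs `rendered` is `"hello 2 tasks {{metadata}}"`, which is the string the new `getMessageFeishu` test asserts.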


@@ -5,11 +5,7 @@ import path from "node:path";
import type { DatabaseSync } from "node:sqlite";
import chokidar, { FSWatcher } from "chokidar";
import { formatErrorMessage } from "openclaw/plugin-sdk/error-runtime";
import {
buildCaseInsensitiveExtensionGlob,
classifyMemoryMultimodalPath,
getMemoryMultimodalExtensions,
} from "openclaw/plugin-sdk/memory-core-host-engine-embeddings";
import { classifyMemoryMultimodalPath } from "openclaw/plugin-sdk/memory-core-host-engine-embeddings";
import {
createSubsystemLogger,
onSessionTranscriptUpdate,
@@ -105,6 +101,9 @@ function shouldIgnoreMemoryWatchPath(
if (stats?.isDirectory?.()) {
return false;
}
if (!stats) {
return false;
}
const extension = normalizeLowercaseStringOrEmpty(path.extname(normalized));
if (extension.length === 0 || extension === ".md") {
return false;
@@ -383,16 +382,7 @@ export abstract class MemoryManagerSyncOps {
continue;
}
if (stat.isDirectory()) {
watchPaths.add(path.join(entry, "**", "*.md"));
if (this.settings.multimodal.enabled) {
for (const modality of this.settings.multimodal.modalities) {
for (const extension of getMemoryMultimodalExtensions(modality)) {
watchPaths.add(
path.join(entry, "**", buildCaseInsensitiveExtensionGlob(extension)),
);
}
}
}
watchPaths.add(entry);
continue;
}
if (
@@ -422,6 +412,7 @@ export abstract class MemoryManagerSyncOps {
this.watcher.on("add", markDirty);
this.watcher.on("change", markDirty);
this.watcher.on("unlink", markDirty);
this.watcher.on("unlinkDir", markDirty);
}
protected ensureSessionListener() {
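
Note on the watcher change above: glob patterns are replaced by watching the directory itself and filtering in the `ignored` predicate, which may be called both with and without a stats object. The sketch below shows only the extension-filtering part under that assumption. `makeIgnored` and `allowedExtensions` are hypothetical names, and the real `shouldIgnoreMemoryWatchPath` also ignores directories such as `.venv`, which is omitted here.

```typescript
// Minimal shape of the optional stats argument used by the predicate.
type WatchStats = { isDirectory?: () => boolean };

// Build an ignore predicate. Markdown and extensionless files are always kept;
// other files are kept only when their extension is explicitly allowed.
function makeIgnored(allowedExtensions: Set<string>) {
  return (watchPath: string, stats?: WatchStats): boolean => {
    if (stats?.isDirectory?.()) {
      return false; // never prune directories by extension
    }
    if (!stats) {
      return false; // path type unknown yet: defer the decision
    }
    const base = watchPath.split(/[\\/]/).pop() ?? "";
    const dot = base.lastIndexOf(".");
    const extension = dot > 0 ? base.slice(dot).toLowerCase() : "";
    if (extension === "" || extension === ".md") {
      return false;
    }
    return !allowedExtensions.has(extension);
  };
}

// Example: a multimodal configuration that accepts PNG and WAV files.
const ignored = makeIgnored(new Set([".png", ".wav"]));
```

Returning `false` when `stats` is missing matters for directories with dots in their names: without stats the predicate cannot tell `notes.v2/` from a file with an unwanted extension, so it defers until stats are available.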


@@ -11,12 +11,35 @@ import { registerBuiltInMemoryEmbeddingProviders } from "./provider-adapters.js"
type WatchIgnoredFn = (watchPath: string, stats?: { isDirectory?: () => boolean }) => boolean;
const { watchMock } = vi.hoisted(() => ({
watchMock: vi.fn(() => ({
on: vi.fn(),
close: vi.fn(async () => undefined),
})),
}));
const { createdWatchers, watchMock } = vi.hoisted(() => {
type WatchEvent = "add" | "change" | "unlink" | "unlinkDir";
type WatchCallback = () => void;
function createMockWatcher() {
const handlers = new Map<WatchEvent, WatchCallback[]>();
const watcher = {
on: vi.fn((event: WatchEvent, callback: WatchCallback) => {
handlers.set(event, [...(handlers.get(event) ?? []), callback]);
return watcher;
}),
close: vi.fn(async () => undefined),
emit: (event: WatchEvent) => {
for (const callback of handlers.get(event) ?? []) {
callback();
}
},
};
return watcher;
}
const watchers: Array<ReturnType<typeof createMockWatcher>> = [];
return {
createdWatchers: watchers,
watchMock: vi.fn(() => {
const watcher = createMockWatcher();
watchers.push(watcher);
return watcher;
}),
};
});
vi.mock("chokidar", () => ({
default: { watch: watchMock },
@@ -69,7 +92,9 @@ describe("memory watcher config", () => {
});
afterEach(async () => {
vi.useRealTimers();
watchMock.mockClear();
createdWatchers.length = 0;
if (manager) {
await manager.close();
manager = null;
@@ -140,9 +165,10 @@ describe("memory watcher config", () => {
expect.arrayContaining([
path.join(workspaceDir, "MEMORY.md"),
path.join(workspaceDir, "memory"),
path.join(extraDir, "**", "*.md"),
extraDir,
]),
);
expect(watchedPaths.every((watchPath) => !watchPath.includes("*"))).toBe(true);
expect(options.ignoreInitial).toBe(true);
expect(options.awaitWriteFinish).toEqual({ stabilityThreshold: 25, pollInterval: 100 });
@@ -152,15 +178,19 @@ describe("memory watcher config", () => {
true,
);
expect(ignored?.(path.join(workspaceDir, "memory", ".venv", "lib", "python.md"))).toBe(true);
expect(ignored?.(path.join(workspaceDir, "memory", "project", "notes.tmp"))).toBe(true);
expect(ignored?.(path.join(workspaceDir, "memory", "project", "notes.json"))).toBe(true);
expect(ignored?.(path.join(workspaceDir, "memory", "project", "notes.tmp"), {})).toBe(true);
expect(ignored?.(path.join(workspaceDir, "memory", "project", "notes.json"), {})).toBe(true);
expect(ignored?.(path.join(workspaceDir, "memory", "project", "notes.json"), undefined)).toBe(
false,
);
expect(ignored?.(path.join(workspaceDir, "memory", "project", "notes.md"))).toBe(false);
expect(ignored?.(path.join(workspaceDir, "memory", "project", "notes.md"), {})).toBe(false);
expect(
ignored?.(path.join(workspaceDir, "memory", "project"), { isDirectory: () => true }),
).toBe(false);
});
it("watches multimodal extensions with case-insensitive globs", async () => {
it("watches multimodal extra directories with filtered extensions", async () => {
await setupWatcherWorkspace({ name: "PHOTO.PNG", contents: "png" });
const cfg = createWatcherConfig({
provider: "gemini",
@@ -177,16 +207,40 @@ describe("memory watcher config", () => {
Record<string, unknown>,
];
expect(watchedPaths).toEqual(
expect.arrayContaining([
path.join(extraDir, "**", "*.[pP][nN][gG]"),
path.join(extraDir, "**", "*.[wW][aA][vV]"),
]),
expect.arrayContaining([path.join(workspaceDir, "MEMORY.md"), path.join(extraDir)]),
);
expect(watchedPaths.every((watchPath) => !watchPath.includes("*"))).toBe(true);
const ignored = options.ignored as WatchIgnoredFn | undefined;
expect(ignored).toBeTypeOf("function");
expect(ignored?.(path.join(extraDir, "nested", "PHOTO.PNG"))).toBe(false);
expect(ignored?.(path.join(extraDir, "nested", "PHOTO.PNG"), {})).toBe(false);
expect(ignored?.(path.join(extraDir, "nested", "voice.WAV"))).toBe(false);
expect(ignored?.(path.join(extraDir, "nested", "metadata.json"))).toBe(true);
expect(ignored?.(path.join(extraDir, "nested", "voice.WAV"), {})).toBe(false);
expect(ignored?.(path.join(extraDir, "nested", "metadata.json"), {})).toBe(true);
});
it.each(["add", "change", "unlink", "unlinkDir"] as const)(
"schedules watch sync on %s",
async (event) => {
await setupWatcherWorkspace({ name: "notes.md", contents: "hello" });
const cfg = createWatcherConfig();
await expectWatcherManager(cfg);
vi.useFakeTimers();
const syncSpy = vi
.spyOn(
manager as unknown as {
sync: (params?: { reason?: string }) => Promise<void>;
},
"sync",
)
.mockResolvedValue(undefined);
createdWatchers[0]?.emit(event);
await vi.advanceTimersByTimeAsync(25);
expect(syncSpy).toHaveBeenCalledWith({ reason: "watch" });
},
);
});


@@ -69,7 +69,9 @@ function registerProviderWithPluginConfig(pluginConfig: Record<string, unknown>)
return registerProviderMock.mock.calls[0]?.[0];
}
function captureWrappedOllamaPayload(thinkingLevel: "off" | "low" | undefined) {
function captureWrappedOllamaPayload(
thinkingLevel: "off" | "minimal" | "low" | "medium" | "high" | "max" | undefined,
) {
const provider = registerProvider();
let payloadSeen: Record<string, unknown> | undefined;
const baseStreamFn = vi.fn((_model, _context, options) => {
@@ -528,10 +530,43 @@ describe("ollama plugin", () => {
expect((payloadSeen?.options as Record<string, unknown> | undefined)?.think).toBeUndefined();
});
it("wraps native Ollama payloads with top-level think=true when thinking is enabled", () => {
it("keeps native Ollama thinking off by default while exposing opt-in effort levels", () => {
const provider = registerProvider();
expect(
provider.resolveThinkingProfile?.({
provider: "ollama",
modelId: "llama3.2:latest",
reasoning: false,
}),
).toEqual({
levels: [{ id: "off" }],
defaultLevel: "off",
});
expect(
provider.resolveThinkingProfile?.({
provider: "ollama",
modelId: "gemma4:31b",
reasoning: true,
}),
).toEqual({
levels: [{ id: "off" }, { id: "low" }, { id: "medium" }, { id: "high" }, { id: "max" }],
defaultLevel: "off",
});
});
it("wraps native Ollama payloads with top-level think effort when thinking is enabled", () => {
const { baseStreamFn, payloadSeen } = captureWrappedOllamaPayload("low");
expect(baseStreamFn).toHaveBeenCalledTimes(1);
expect(payloadSeen?.think).toBe(true);
expect(payloadSeen?.think).toBe("low");
expect((payloadSeen?.options as Record<string, unknown> | undefined)?.think).toBeUndefined();
});
it("maps native Ollama max thinking to the highest supported wire effort", () => {
const { baseStreamFn, payloadSeen } = captureWrappedOllamaPayload("max");
expect(baseStreamFn).toHaveBeenCalledTimes(1);
expect(payloadSeen?.think).toBe("high");
expect((payloadSeen?.options as Record<string, unknown> | undefined)?.think).toBeUndefined();
});


@@ -166,6 +166,13 @@ export default definePluginEntry({
contributeResolvedModelCompat: ({ model }) =>
usesOllamaOpenAICompatTransport(model) ? { supportsUsageInStreaming: true } : undefined,
resolveReasoningOutputMode: () => "native",
resolveThinkingProfile: ({ reasoning }) => ({
levels:
reasoning === true
? [{ id: "off" }, { id: "low" }, { id: "medium" }, { id: "high" }, { id: "max" }]
: [{ id: "off" }],
defaultLevel: "off",
}),
wrapStreamFn: createConfiguredOllamaCompatStreamWrapper,
createEmbeddingProvider: async ({ config, model, remote }) => {
const { provider, client } = await createOllamaEmbeddingProvider({


@@ -203,13 +203,26 @@ describe("ollama provider models", () => {
"vision",
"completion",
"tools",
"thinking",
]);
expect(visionModel.input).toEqual(["text", "image"]);
expect(visionModel.reasoning).toBe(true);
expect(visionModel.compat?.supportsTools).toBe(true);
const textModel = buildOllamaModelDefinition("glm-5.1:cloud", 202752, ["completion", "tools"]);
expect(textModel.input).toEqual(["text"]);
expect(textModel.reasoning).toBe(false);
expect(textModel.compat?.supportsTools).toBe(true);
const noCapabilities = buildOllamaModelDefinition("unknown-model", 65536);
expect(noCapabilities.input).toEqual(["text"]);
expect(noCapabilities.compat).toBeUndefined();
});
it("disables tool support when Ollama capabilities omit tools", () => {
const model = buildOllamaModelDefinition("embeddinggemma:latest", 2048, ["embedding"]);
expect(model.reasoning).toBe(false);
expect(model.compat?.supportsTools).toBe(false);
});
});


@@ -218,14 +218,25 @@ export function buildOllamaModelDefinition(
): ModelDefinitionConfig {
const hasVision = capabilities?.includes("vision") ?? false;
const input: ("text" | "image")[] = hasVision ? ["text", "image"] : ["text"];
const reasoning =
capabilities === undefined
? isReasoningModelHeuristic(modelId)
: capabilities.includes("thinking");
const compat =
capabilities === undefined
? undefined
: {
supportsTools: capabilities.includes("tools"),
};
return {
id: modelId,
name: modelId,
reasoning: isReasoningModelHeuristic(modelId),
reasoning,
input,
cost: OLLAMA_DEFAULT_COST,
contextWindow: contextWindow ?? OLLAMA_DEFAULT_CONTEXT_WINDOW,
maxTokens: OLLAMA_DEFAULT_MAX_TOKENS,
...(compat ? { compat } : {}),
};
}
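
Note on the model definition change above: when Ollama reports capabilities, they now decide `reasoning` and `compat.supportsTools`; the name-based heuristic is used only when no capability list is available. A reduced sketch of that precedence follows. `ModelSketch`, `describeModel`, and especially `guessReasoningFromName` are hypothetical; the real `isReasoningModelHeuristic` is not shown in this diff, and the regex here is a placeholder, not its actual rule.

```typescript
type ModelSketch = {
  id: string;
  reasoning: boolean;
  input: ("text" | "image")[];
  compat?: { supportsTools: boolean };
};

// Placeholder heuristic (assumption): stands in for isReasoningModelHeuristic.
const guessReasoningFromName = (modelId: string): boolean => /r1|reason|think/i.test(modelId);

function describeModel(modelId: string, capabilities?: string[]): ModelSketch {
  const input: ("text" | "image")[] = capabilities?.includes("vision")
    ? ["text", "image"]
    : ["text"];
  // Reported capabilities win; fall back to the name only when none are known.
  const reasoning =
    capabilities === undefined
      ? guessReasoningFromName(modelId)
      : capabilities.includes("thinking");
  return {
    id: modelId,
    reasoning,
    input,
    // `compat` is omitted entirely when capabilities are unknown.
    ...(capabilities === undefined
      ? {}
      : { compat: { supportsTools: capabilities.includes("tools") } }),
  };
}
```

Leaving `compat` undefined for unknown capabilities, rather than setting `supportsTools: false`, avoids disabling tools for models whose capability list simply could not be fetched.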


@@ -150,7 +150,7 @@ describe("createConfiguredOllamaCompatStreamWrapper", () => {
);
});
it("forwards think=true on native Ollama chat requests when thinking is enabled", async () => {
it("forwards the native think effort on native Ollama chat requests when thinking is enabled", async () => {
await withMockNdjsonFetch(
[
'{"model":"m","created_at":"t","message":{"role":"assistant","content":"ok"},"done":false}',
@@ -193,10 +193,63 @@ describe("createConfiguredOllamaCompatStreamWrapper", () => {
throw new Error("Expected string request body");
}
const requestBody = JSON.parse(requestInit.body) as {
think?: boolean;
options?: { think?: boolean; num_ctx?: number };
think?: boolean | string;
options?: { think?: boolean | string; num_ctx?: number };
};
expect(requestBody.think).toBe(true);
expect(requestBody.think).toBe("low");
expect(requestBody.options?.think).toBeUndefined();
expect(requestBody.options?.num_ctx).toBe(131072);
},
);
});
it("maps native Ollama max thinking to think=high on the wire", async () => {
await withMockNdjsonFetch(
[
'{"model":"m","created_at":"t","message":{"role":"assistant","content":"ok"},"done":false}',
'{"model":"m","created_at":"t","message":{"role":"assistant","content":""},"done":true,"prompt_eval_count":1,"eval_count":1}',
],
async (fetchMock) => {
const baseStreamFn = createOllamaStreamFn("http://ollama-host:11434");
const model = {
api: "ollama",
provider: "ollama",
id: "gpt-oss:20b",
contextWindow: 131072,
};
const wrapped = createConfiguredOllamaCompatStreamWrapper({
provider: "ollama",
modelId: "gpt-oss:20b",
model,
streamFn: baseStreamFn,
thinkingLevel: "max",
} as never);
if (!wrapped) {
throw new Error("Expected wrapped Ollama stream function");
}
const stream = await Promise.resolve(
wrapped(
model as never,
{
messages: [{ role: "user", content: "hello" }],
} as never,
{} as never,
),
);
await collectStreamEvents(stream);
const requestInit = getGuardedFetchCall(fetchMock).init ?? {};
if (typeof requestInit.body !== "string") {
throw new Error("Expected string request body");
}
const requestBody = JSON.parse(requestInit.body) as {
think?: boolean | string;
options?: { think?: boolean | string; num_ctx?: number };
};
expect(requestBody.think).toBe("high");
expect(requestBody.options?.think).toBeUndefined();
expect(requestBody.options?.num_ctx).toBe(131072);
},


@@ -151,7 +151,12 @@ export function wrapOllamaCompatNumCtx(baseFn: StreamFn | undefined, numCtx: num
});
}
function createOllamaThinkingWrapper(baseFn: StreamFn | undefined, think: boolean): StreamFn {
type OllamaThinkValue = boolean | "low" | "medium" | "high";
function createOllamaThinkingWrapper(
baseFn: StreamFn | undefined,
think: OllamaThinkValue,
): StreamFn {
const streamFn = baseFn ?? streamSimple;
return (model, context, options) =>
streamWithPayloadPatch(streamFn, model, context, options, (payloadRecord) => {
@@ -159,6 +164,22 @@ function createOllamaThinkingWrapper(baseFn: StreamFn | undefined, think: boolea
});
}
function resolveOllamaThinkValue(thinkingLevel: unknown): OllamaThinkValue | undefined {
if (thinkingLevel === "off") {
return false;
}
if (thinkingLevel === "low" || thinkingLevel === "medium" || thinkingLevel === "high") {
return thinkingLevel;
}
if (thinkingLevel === "minimal") {
return "low";
}
if (thinkingLevel === "xhigh" || thinkingLevel === "adaptive" || thinkingLevel === "max") {
return "high";
}
return undefined;
}
function resolveOllamaCompatNumCtx(model: ProviderRuntimeModel): number {
return Math.max(1, Math.floor(model.contextWindow ?? model.maxTokens ?? DEFAULT_CONTEXT_TOKENS));
}
@@ -196,12 +217,11 @@ export function createConfiguredOllamaCompatStreamWrapper(
streamFn = wrapOllamaCompatNumCtx(streamFn, resolveOllamaCompatNumCtx(model));
}
if (isNativeOllamaTransport && ctx.thinkingLevel === "off") {
streamFn = createOllamaThinkingWrapper(streamFn, false);
} else if (isNativeOllamaTransport && ctx.thinkingLevel) {
// Any non-off ThinkLevel (minimal, low, medium, high, xhigh, adaptive, max)
// should enable Ollama's native thinking mode.
streamFn = createOllamaThinkingWrapper(streamFn, true);
const ollamaThinkValue = isNativeOllamaTransport
? resolveOllamaThinkValue(ctx.thinkingLevel)
: undefined;
if (ollamaThinkValue !== undefined) {
streamFn = createOllamaThinkingWrapper(streamFn, ollamaThinkValue);
}
if (normalizeProviderId(ctx.provider) === "ollama" && isOllamaCloudKimiModelRef(ctx.modelId)) {
@@ -310,7 +330,7 @@ interface OllamaChatRequest {
stream: boolean;
tools?: OllamaTool[];
options?: Record<string, unknown>;
think?: boolean;
think?: OllamaThinkValue;
}
interface OllamaChatMessage {
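
Note on the thinking-level mapping above: `resolveOllamaThinkValue` collapses the broader set of levels onto the values sent as `think` (`false`, `"low"`, `"medium"`, `"high"`) and returns `undefined` for anything else so the payload is left alone. The same mapping can be sketched as a lookup table. `THINK_BY_LEVEL`, `resolveThinkValue`, and `patchPayload` are illustrative names, not part of the plugin.

```typescript
type ThinkValue = boolean | "low" | "medium" | "high";

// Level-to-wire mapping as described in the hunk above.
const THINK_BY_LEVEL = new Map<string, ThinkValue>([
  ["off", false],
  ["minimal", "low"],
  ["low", "low"],
  ["medium", "medium"],
  ["high", "high"],
  ["xhigh", "high"],
  ["adaptive", "high"],
  ["max", "high"],
]);

// Unknown or missing levels resolve to undefined, meaning "do not touch the payload".
function resolveThinkValue(level: unknown): ThinkValue | undefined {
  return typeof level === "string" ? THINK_BY_LEVEL.get(level) : undefined;
}

// Set `think` at the top level of the request payload, never inside `options`.
function patchPayload(payload: Record<string, unknown>, level: unknown): Record<string, unknown> {
  const think = resolveThinkValue(level);
  return think === undefined ? payload : { ...payload, think };
}
```

Note that `"off"` maps to `false`, which is different from `undefined`: the former sends `think: false` explicitly, while the latter omits the field.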


@@ -162,6 +162,7 @@ describe("telegram live qa runtime", () => {
sutAccountId: "sut",
});
expect(next.agents?.defaults?.skipBootstrap).toBe(true);
expect(next.plugins?.allow).toContain("telegram");
expect(next.plugins?.entries?.telegram).toEqual({ enabled: true });
expect(next.channels?.telegram).toEqual({
@@ -375,6 +376,27 @@ describe("telegram live qa runtime", () => {
matchText: "TELEGRAM_QA_NOMENTION_TOKEN",
}),
).toBe(false);
expect(
__testing.matchesTelegramScenarioReply({
allowAnySutReply: true,
groupId: "-100123",
sentMessageId: 55,
sutBotId: 88,
message: {
updateId: 3,
messageId: 12,
chatId: -100123,
senderId: 88,
senderIsBot: true,
senderUsername: "sut_bot",
text: "Protocol note: acknowledged.",
replyToMessageId: undefined,
timestamp: 1_700_000_003_000,
inlineButtons: [],
mediaKinds: [],
},
}),
).toBe(true);
});
it("validates expected Telegram reply markers", () => {


@@ -51,6 +51,7 @@ type TelegramQaScenarioId =
| "telegram-mention-gating";
type TelegramQaScenarioRun = {
allowAnySutReply?: boolean;
expectReply: boolean;
input: string;
expectedTextIncludes?: string[];
@@ -268,15 +269,11 @@ const TELEGRAM_QA_SCENARIOS: TelegramQaScenarioDefinition[] = [
id: "telegram-mentioned-message-reply",
title: "Telegram mentioned message gets a reply",
timeoutMs: 45_000,
buildRun: (sutUsername) => {
const token = `TELEGRAM_QA_REPLY_${randomUUID().slice(0, 8).toUpperCase()}`;
return {
expectReply: true,
input: `@${sutUsername} reply with only this exact marker: ${token}`,
expectedTextIncludes: [token],
matchText: token,
};
},
buildRun: (sutUsername) => ({
allowAnySutReply: true,
expectReply: true,
input: `@${sutUsername} Telegram QA mention routing check. Reply with a short acknowledgement.`,
}),
},
{
id: "telegram-mention-gating",
@@ -476,6 +473,13 @@ function buildTelegramQaConfig(
};
return {
...baseCfg,
agents: {
...baseCfg.agents,
defaults: {
...baseCfg.agents?.defaults,
skipBootstrap: true,
},
},
plugins: {
...baseCfg.plugins,
allow: pluginAllow,
@@ -751,6 +755,7 @@ function findScenario(ids?: string[]) {
function matchesTelegramScenarioReply(params: {
groupId: string;
allowAnySutReply?: boolean;
matchText?: string;
message: TelegramObservedMessage;
sentMessageId: number;
@@ -765,6 +770,9 @@ function matchesTelegramScenarioReply(params: {
if (params.message.replyToMessageId === params.sentMessageId) {
return true;
}
if (params.allowAnySutReply === true) {
return true;
}
return Boolean(params.matchText && params.message.text.includes(params.matchText));
}
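
Note on `allowAnySutReply`: the matcher now accepts any message from the bot under test in the target group, so the mention scenario no longer depends on the model echoing an exact token. The sketch below reproduces the visible decision order. The first guard (chat and sender check) is an assumption, since those lines fall outside the hunk, and `ObservedMessage` and `matchesReply` are simplified, hypothetical names.

```typescript
type ObservedMessage = {
  chatId: number;
  senderId: number;
  text: string;
  replyToMessageId?: number;
};

function matchesReply(params: {
  groupId: string;
  sutBotId: number;
  sentMessageId: number;
  allowAnySutReply?: boolean;
  matchText?: string;
  message: ObservedMessage;
}): boolean {
  const { message } = params;
  // Assumed precondition (not visible in the hunk): right chat, right sender.
  if (String(message.chatId) !== params.groupId || message.senderId !== params.sutBotId) {
    return false;
  }
  // 1. A direct reply to the driver message always matches.
  if (message.replyToMessageId === params.sentMessageId) {
    return true;
  }
  // 2. Scenarios can opt in to accepting any reply from the bot under test.
  if (params.allowAnySutReply === true) {
    return true;
  }
  // 3. Otherwise fall back to marker-text matching.
  return Boolean(params.matchText && message.text.includes(params.matchText));
}
```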
@@ -1216,6 +1224,7 @@ export async function runTelegramQaLive(params: {
observationScenarioTitle: scenario.title,
predicate: (message) =>
matchesTelegramScenarioReply({
allowAnySutReply: scenarioRun.allowAnySutReply,
groupId: runtimeEnv.groupId,
matchText: scenarioRun.matchText,
message,


@@ -433,6 +433,7 @@ export const dispatchTelegramMessage = async ({
archivedAnswerPreviews.push({
messageId: preview.messageId,
textSnapshot: preview.textSnapshot,
visibleSinceMs: preview.visibleSinceMs,
deleteIfUnused: true,
});
}
@@ -539,6 +540,7 @@ export const dispatchTelegramMessage = async ({
archivedAnswerPreviews.push({
messageId: previewMessageId,
textSnapshot: answerLane.lastPartialText,
visibleSinceMs: answerLane.stream?.visibleSinceMs?.(),
deleteIfUnused: false,
});
}


@@ -6,6 +6,7 @@ export type TestDraftStream = {
update: ReturnType<typeof vi.fn<(text: string) => void>>;
flush: ReturnType<typeof vi.fn<() => Promise<void>>>;
messageId: ReturnType<typeof vi.fn<() => number | undefined>>;
visibleSinceMs: ReturnType<typeof vi.fn<() => number | undefined>>;
previewMode: ReturnType<typeof vi.fn<() => DraftPreviewMode>>;
previewRevision: ReturnType<typeof vi.fn<() => number>>;
lastDeliveredText: ReturnType<typeof vi.fn<() => string>>;
@@ -25,8 +26,10 @@ export function createTestDraftStream(params?: {
onStop?: () => void | Promise<void>;
onDiscard?: () => void | Promise<void>;
clearMessageIdOnForceNew?: boolean;
visibleSinceMs?: number;
}): TestDraftStream {
let messageId = params?.messageId;
let visibleSinceMs = params?.visibleSinceMs;
let previewRevision = 0;
let lastDeliveredText = "";
return {
@@ -37,6 +40,7 @@ export function createTestDraftStream(params?: {
}),
flush: vi.fn().mockResolvedValue(undefined),
messageId: vi.fn().mockImplementation(() => messageId),
visibleSinceMs: vi.fn().mockImplementation(() => visibleSinceMs),
previewMode: vi.fn().mockReturnValue(params?.previewMode ?? "message"),
previewRevision: vi.fn().mockImplementation(() => previewRevision),
lastDeliveredText: vi.fn().mockImplementation(() => lastDeliveredText),
@@ -52,16 +56,19 @@ export function createTestDraftStream(params?: {
if (params?.clearMessageIdOnForceNew) {
messageId = undefined;
}
visibleSinceMs = undefined;
}),
sendMayHaveLanded: vi.fn().mockReturnValue(false),
setMessageId: (value: number | undefined) => {
messageId = value;
visibleSinceMs = value == null ? undefined : Date.now();
},
};
}
export function createSequencedTestDraftStream(startMessageId = 1001): TestDraftStream {
let activeMessageId: number | undefined;
let visibleSinceMs: number | undefined;
let nextMessageId = startMessageId;
let previewRevision = 0;
let lastDeliveredText = "";
@@ -69,12 +76,14 @@ export function createSequencedTestDraftStream(startMessageId = 1001): TestDraft
update: vi.fn().mockImplementation((text: string) => {
if (activeMessageId == null) {
activeMessageId = nextMessageId++;
visibleSinceMs = Date.now();
}
previewRevision += 1;
lastDeliveredText = text.trimEnd();
}),
flush: vi.fn().mockResolvedValue(undefined),
messageId: vi.fn().mockImplementation(() => activeMessageId),
visibleSinceMs: vi.fn().mockImplementation(() => visibleSinceMs),
previewMode: vi.fn().mockReturnValue("message"),
previewRevision: vi.fn().mockImplementation(() => previewRevision),
lastDeliveredText: vi.fn().mockImplementation(() => lastDeliveredText),
@@ -84,10 +93,12 @@ export function createSequencedTestDraftStream(startMessageId = 1001): TestDraft
materialize: vi.fn().mockImplementation(async () => activeMessageId),
forceNewMessage: vi.fn().mockImplementation(() => {
activeMessageId = undefined;
visibleSinceMs = undefined;
}),
sendMayHaveLanded: vi.fn().mockReturnValue(false),
setMessageId: (value: number | undefined) => {
activeMessageId = value;
visibleSinceMs = value == null ? undefined : Date.now();
},
};
}


@@ -161,6 +161,28 @@ describe("createTelegramDraftStream", () => {
expect(api.sendMessageDraft).not.toHaveBeenCalled();
});
it("tracks when a message preview first became visible", async () => {
vi.useFakeTimers();
try {
vi.setSystemTime(new Date("2026-04-26T01:00:00.000Z"));
const api = createMockDraftApi();
const stream = createDraftStream(api, { previewTransport: "message" });
stream.update("Hello");
await stream.flush();
expect(stream.visibleSinceMs?.()).toBe(Date.parse("2026-04-26T01:00:00.000Z"));
vi.setSystemTime(new Date("2026-04-26T01:01:00.000Z"));
stream.update("Hello again");
await stream.flush();
expect(stream.visibleSinceMs?.()).toBe(Date.parse("2026-04-26T01:00:00.000Z"));
} finally {
vi.useRealTimers();
}
});
it("falls back to message transport when sendMessageDraft is unavailable", async () => {
const api = createMockDraftApi();
delete (api as { sendMessageDraft?: unknown }).sendMessageDraft;
@@ -436,6 +458,23 @@ describe("createTelegramDraftStream", () => {
expect(api.sendMessage).toHaveBeenLastCalledWith(123, "After thinking", undefined);
});
it("creates new message after cleanup and forceNewMessage", async () => {
const { api, stream } = createForceNewMessageHarness();
stream.update("Stale preview");
await stream.flush();
await stream.clear();
expect(api.deleteMessage).toHaveBeenCalledWith(123, 17);
stream.forceNewMessage();
stream.update("Next preview");
await stream.flush();
expect(api.sendMessage).toHaveBeenCalledTimes(2);
expect(api.sendMessage).toHaveBeenLastCalledWith(123, "Next preview", undefined);
});
it("sends first update immediately after forceNewMessage within throttle window", async () => {
vi.useFakeTimers();
try {
@@ -487,6 +526,7 @@ describe("createTelegramDraftStream", () => {
messageId: 17,
textSnapshot: "Message A partial",
parseMode: undefined,
visibleSinceMs: expect.any(Number),
});
expect(api.sendMessage).toHaveBeenCalledTimes(2);
expect(api.sendMessage).toHaveBeenNthCalledWith(2, 123, "Message B partial", undefined);

@@ -94,6 +94,7 @@ export type TelegramDraftStream = {
update: (text: string) => void;
flush: () => Promise<void>;
messageId: () => number | undefined;
visibleSinceMs?: () => number | undefined;
previewMode?: () => "message" | "draft";
previewRevision?: () => number;
lastDeliveredText?: () => string;
@@ -118,6 +119,7 @@ type SupersededTelegramPreview = {
messageId: number;
textSnapshot: string;
parseMode?: "HTML";
visibleSinceMs?: number;
};
export function createTelegramDraftStream(params: {
@@ -174,6 +176,7 @@ export function createTelegramDraftStream(params: {
const streamState = { stopped: false, final: false };
let messageSendAttempted = false;
let streamMessageId: number | undefined;
let streamVisibleSinceMs: number | undefined;
let streamDraftId = usesDraftTransport ? allocateTelegramDraftId() : undefined;
let previewTransport: "message" | "draft" = usesDraftTransport ? "draft" : "message";
let lastSentText = "";
@@ -226,6 +229,7 @@ export function createTelegramDraftStream(params: {
sendGeneration,
}: PreviewSendParams): Promise<boolean> => {
if (typeof streamMessageId === "number") {
streamVisibleSinceMs ??= Date.now();
if (renderedParseMode) {
await params.api.editMessageText(chatId, streamMessageId, renderedText, {
parse_mode: renderedParseMode,
@@ -257,15 +261,18 @@ export function createTelegramDraftStream(params: {
return false;
}
const normalizedMessageId = Math.trunc(sentMessageId);
const visibleSinceMs = Date.now();
if (sendGeneration !== generation) {
params.onSupersededPreview?.({
messageId: normalizedMessageId,
textSnapshot: renderedText,
parseMode: renderedParseMode,
visibleSinceMs,
});
return true;
}
streamMessageId = normalizedMessageId;
streamVisibleSinceMs = visibleSinceMs;
return true;
};
const sendDraftTransportPreview = async ({
@@ -397,10 +404,12 @@ export function createTelegramDraftStream(params: {
};
const forceNewMessage = () => {
streamState.stopped = false;
streamState.final = false;
generation += 1;
messageSendAttempted = false;
streamMessageId = undefined;
streamVisibleSinceMs = undefined;
if (previewTransport === "draft") {
streamDraftId = allocateTelegramDraftId();
}
@@ -430,6 +439,7 @@ export function createTelegramDraftStream(params: {
const sentId = sent?.message_id;
if (typeof sentId === "number" && Number.isFinite(sentId)) {
streamMessageId = Math.trunc(sentId);
streamVisibleSinceMs = Date.now();
if (resolvedDraftApi != null && streamDraftId != null) {
const clearDraftId = streamDraftId;
const clearThreadParams =
@@ -454,6 +464,7 @@ export function createTelegramDraftStream(params: {
update,
flush: loop.flush,
messageId: () => streamMessageId,
visibleSinceMs: () => streamVisibleSinceMs,
previewMode: () => previewTransport,
previewRevision: () => previewRevision,
lastDeliveredText: () => lastDeliveredText,
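
Review note: the hunks above record when a message preview first became visible, keep that timestamp across later edits, and clear it on `forceNewMessage`. The following is a minimal sketch of that bookkeeping in isolation. `createPreviewTracker`, `noteSent`, and `noteEdited` are hypothetical names used only for illustration; they are not part of `createTelegramDraftStream`.

```typescript
// Hypothetical, simplified tracker; it only mirrors the timestamp rules in the diff.
type PreviewTracker = {
  noteSent: (messageId: number, nowMs: number) => void;
  noteEdited: (nowMs: number) => void;
  forceNewMessage: () => void;
  messageId: () => number | undefined;
  visibleSinceMs: () => number | undefined;
};

function createPreviewTracker(): PreviewTracker {
  let messageId: number | undefined;
  let visibleSinceMs: number | undefined;
  return {
    // A freshly sent preview message starts a new visibility window.
    noteSent: (id, nowMs) => {
      messageId = Math.trunc(id);
      visibleSinceMs = nowMs;
    },
    // An edit never moves the timestamp; it only backfills a missing one.
    noteEdited: (nowMs) => {
      visibleSinceMs ??= nowMs;
    },
    // Starting a new message forgets both the id and the timestamp.
    forceNewMessage: () => {
      messageId = undefined;
      visibleSinceMs = undefined;
    },
    messageId: () => messageId,
    visibleSinceMs: () => visibleSinceMs,
  };
}
```

Passing the clock value in as an argument, rather than calling `Date.now()` inside, is a choice made for this sketch so it can be exercised without fake timers.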

@@ -12,6 +12,7 @@ const MESSAGE_NOT_MODIFIED_RE =
/400:\s*Bad Request:\s*message is not modified|MESSAGE_NOT_MODIFIED/i;
const MESSAGE_NOT_FOUND_RE =
/400:\s*Bad Request:\s*message to edit not found|MESSAGE_ID_INVALID|message can't be edited/i;
const LONG_LIVED_PREVIEW_FRESH_FINAL_AFTER_MS = 60_000;
function extractErrorText(err: unknown): string {
return typeof err === "string"
@@ -55,6 +56,7 @@ export type DraftLaneState = {
export type ArchivedPreview = {
messageId: number;
textSnapshot: string;
visibleSinceMs?: number;
// Boundary-finalized previews should remain visible even if no matching
// final edit arrives; superseded previews can be safely deleted.
deleteIfUnused?: boolean;
@@ -92,6 +94,7 @@ type CreateLaneTextDelivererParams = {
deletePreviewMessage: (messageId: number) => Promise<void>;
log: (message: string) => void;
markDelivered: () => void;
now?: () => number;
};
type DeliverLaneTextParams = {
@@ -169,6 +172,14 @@ function shouldSkipRegressivePreviewUpdate(args: {
);
}
function isLongLivedPreview(visibleSinceMs: number | undefined, nowMs: number): boolean {
return (
typeof visibleSinceMs === "number" &&
Number.isFinite(visibleSinceMs) &&
nowMs - visibleSinceMs >= LONG_LIVED_PREVIEW_FRESH_FINAL_AFTER_MS
);
}
function resolvePreviewTarget(params: ResolvePreviewTargetParams): PreviewTargetResolution {
const lanePreviewMessageId = params.lane.stream?.messageId();
const previewMessageId =
@@ -187,11 +198,27 @@ function resolvePreviewTarget(params: ResolvePreviewTargetParams): PreviewTarget
export function createLaneTextDeliverer(params: CreateLaneTextDelivererParams) {
const getLanePreviewText = (lane: DraftLaneState) => lane.lastPartialText;
const readNow = () => params.now?.() ?? Date.now();
const markActivePreviewComplete = (laneName: LaneName) => {
params.activePreviewLifecycleByLane[laneName] = "complete";
params.retainPreviewOnCleanupByLane[laneName] = true;
};
const isDraftPreviewLane = (lane: DraftLaneState) => lane.stream?.previewMode?.() === "draft";
const isMessagePreviewLane = (lane: DraftLaneState) => !isDraftPreviewLane(lane);
const shouldUseFreshFinalForLane = (lane: DraftLaneState) =>
isMessagePreviewLane(lane) && isLongLivedPreview(lane.stream?.visibleSinceMs?.(), readNow());
const shouldUseFreshFinalForPreview = (lane: DraftLaneState, visibleSinceMs?: number) =>
isMessagePreviewLane(lane) && isLongLivedPreview(visibleSinceMs, readNow());
const clearActivePreviewAfterFreshFinal = async (lane: DraftLaneState, laneName: LaneName) => {
try {
await lane.stream?.clear();
} catch (err) {
params.log(`telegram: ${laneName} fresh final preview cleanup failed: ${String(err)}`);
}
lane.lastPartialText = "";
lane.hasStreamedMessage = false;
lane.stream?.forceNewMessage();
};
const canMaterializeDraftFinal = (
lane: DraftLaneState,
previewButtons?: TelegramInlineButtons,
@@ -444,6 +471,19 @@ export function createLaneTextDeliverer(params: CreateLaneTextDelivererParams) {
if (!archivedPreview) {
return undefined;
}
if (canEditViaPreview && shouldUseFreshFinalForPreview(lane, archivedPreview.visibleSinceMs)) {
const delivered = await params.sendPayload(params.applyTextToPayload(payload, text));
if (delivered) {
try {
await params.deletePreviewMessage(archivedPreview.messageId);
} catch (err) {
params.log(
`telegram: archived answer preview cleanup failed (${archivedPreview.messageId}): ${String(err)}`,
);
}
return result("sent");
}
}
if (canEditViaPreview) {
const finalized = await tryUpdatePreviewForLane({
lane,
@@ -551,6 +591,14 @@ export function createLaneTextDeliverer(params: CreateLaneTextDelivererParams) {
});
}
}
if (shouldUseFreshFinalForLane(lane)) {
await params.stopDraftLane(lane);
const delivered = await params.sendPayload(params.applyTextToPayload(payload, text));
if (delivered) {
await clearActivePreviewAfterFreshFinal(lane, laneName);
return result("sent");
}
}
const previewMessageId = lane.stream?.messageId();
const finalized = await tryUpdatePreviewForLane({
lane,
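
Review note: the change above sends a fresh final message instead of editing the preview once a message-transport preview has been visible for at least 60 seconds, and falls through to the edit path when the fresh send does not deliver. The decision itself can be sketched as a pure function. `chooseFinalStrategy` and the `FinalStrategy` labels are hypothetical names for illustration; only `isLongLivedPreview` and the threshold constant are copied from the diff.

```typescript
const LONG_LIVED_PREVIEW_FRESH_FINAL_AFTER_MS = 60_000;

// Same predicate as in the diff: a preview is long lived once it has been
// visible for at least the threshold; a missing or non-finite timestamp never qualifies.
function isLongLivedPreview(visibleSinceMs: number | undefined, nowMs: number): boolean {
  return (
    typeof visibleSinceMs === "number" &&
    Number.isFinite(visibleSinceMs) &&
    nowMs - visibleSinceMs >= LONG_LIVED_PREVIEW_FRESH_FINAL_AFTER_MS
  );
}

// Hypothetical labels for the two delivery paths.
type FinalStrategy = "fresh-send-then-cleanup" | "edit-preview";

function chooseFinalStrategy(args: {
  previewMode?: "message" | "draft";
  visibleSinceMs?: number;
  nowMs: number;
}): FinalStrategy {
  // As in the diff, anything that is not explicitly a draft lane counts as a message lane.
  const isMessagePreview = args.previewMode !== "draft";
  return isMessagePreview && isLongLivedPreview(args.visibleSinceMs, args.nowMs)
    ? "fresh-send-then-cleanup"
    : "edit-preview";
}
```

With the values used in the tests in this compare (`visibleSinceMs = 10_000`, `nowMs = 70_000`) the sketch picks the fresh-send path, since the comparison is inclusive at exactly 60 seconds.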

@@ -2,6 +2,7 @@ import type { ReplyPayload } from "openclaw/plugin-sdk/reply-runtime";
import { describe, expect, it, vi } from "vitest";
import { createTestDraftStream } from "./draft-stream.test-helpers.js";
import {
type ArchivedPreview,
createLaneTextDeliverer,
type DraftLaneState,
type LaneDeliveryResult,
@@ -17,9 +18,15 @@ function createHarness(params?: {
answerStream?: DraftLaneState["stream"];
answerHasStreamedMessage?: boolean;
answerLastPartialText?: string;
answerPreviewVisibleSinceMs?: number;
nowMs?: number;
}) {
const answer =
params?.answerStream ?? createTestDraftStream({ messageId: params?.answerMessageId });
params?.answerStream ??
createTestDraftStream({
messageId: params?.answerMessageId,
visibleSinceMs: params?.answerPreviewVisibleSinceMs,
});
const reasoning = createTestDraftStream();
const lanes: Record<LaneName, DraftLaneState> = {
answer: {
@@ -51,11 +58,7 @@ function createHarness(params?: {
const markDelivered = vi.fn();
const activePreviewLifecycleByLane = { answer: "transient", reasoning: "transient" } as const;
const retainPreviewOnCleanupByLane = { answer: false, reasoning: false } as const;
const archivedAnswerPreviews: Array<{
messageId: number;
textSnapshot: string;
deleteIfUnused?: boolean;
}> = [];
const archivedAnswerPreviews: ArchivedPreview[] = [];
const deliverLaneText = createLaneTextDeliverer({
lanes,
@@ -71,6 +74,7 @@ function createHarness(params?: {
deletePreviewMessage,
log,
markDelivered,
now: params?.nowMs != null ? () => params.nowMs! : undefined,
});
return {
@@ -347,6 +351,116 @@ describe("createLaneTextDeliverer", () => {
expect(harness.log).toHaveBeenCalledWith(expect.stringContaining("preview final too long"));
});
it("sends a fresh final when a message preview is long lived", async () => {
const visibleSinceMs = 10_000;
const harness = createHarness({
answerMessageId: 999,
answerHasStreamedMessage: true,
answerLastPartialText: "Working...",
answerPreviewVisibleSinceMs: visibleSinceMs,
nowMs: visibleSinceMs + 60_000,
});
const result = await deliverFinalAnswer(harness, HELLO_FINAL);
expect(result.kind).toBe("sent");
expect(harness.stopDraftLane).toHaveBeenCalledTimes(1);
expect(harness.sendPayload).toHaveBeenCalledWith(
expect.objectContaining({ text: HELLO_FINAL }),
);
expect(harness.editPreview).not.toHaveBeenCalled();
expect(harness.answer.stream?.clear).toHaveBeenCalledTimes(1);
expect(harness.answer.stream?.forceNewMessage).toHaveBeenCalledTimes(1);
expect(harness.lanes.answer.hasStreamedMessage).toBe(false);
expect(harness.lanes.answer.lastPartialText).toBe("");
expect(harness.markDelivered).not.toHaveBeenCalled();
});
it("falls back to editing a long-lived preview when fresh final send returns false", async () => {
const visibleSinceMs = 10_000;
const harness = createHarness({
answerMessageId: 999,
answerHasStreamedMessage: true,
answerLastPartialText: "Working...",
answerPreviewVisibleSinceMs: visibleSinceMs,
nowMs: visibleSinceMs + 60_000,
});
harness.sendPayload.mockResolvedValueOnce(false);
const result = await deliverFinalAnswer(harness, HELLO_FINAL);
expect(expectPreviewFinalized(result)).toEqual({
content: HELLO_FINAL,
messageId: 999,
});
expect(harness.stopDraftLane).toHaveBeenCalledTimes(2);
expect(harness.sendPayload).toHaveBeenCalledTimes(1);
expect(harness.editPreview).toHaveBeenCalledWith(
expect.objectContaining({
messageId: 999,
text: HELLO_FINAL,
}),
);
expect(harness.answer.stream?.clear).not.toHaveBeenCalled();
expect(harness.markDelivered).toHaveBeenCalledTimes(1);
});
it("sends a fresh final for stale archived previews", async () => {
const visibleSinceMs = 10_000;
const harness = createHarness({
answerMessageId: 1001,
answerPreviewVisibleSinceMs: visibleSinceMs,
nowMs: visibleSinceMs + 60_000,
});
harness.archivedAnswerPreviews.push({
messageId: 222,
textSnapshot: "Working...",
visibleSinceMs,
deleteIfUnused: true,
});
const result = await deliverFinalAnswer(harness, HELLO_FINAL);
expect(result.kind).toBe("sent");
expect(harness.sendPayload).toHaveBeenCalledWith(
expect.objectContaining({ text: HELLO_FINAL }),
);
expect(harness.editPreview).not.toHaveBeenCalled();
expect(harness.deletePreviewMessage).toHaveBeenCalledWith(222);
});
it("falls back to editing a stale archived preview when fresh final send returns false", async () => {
const visibleSinceMs = 10_000;
const harness = createHarness({
answerMessageId: 1001,
answerPreviewVisibleSinceMs: visibleSinceMs,
nowMs: visibleSinceMs + 60_000,
});
harness.archivedAnswerPreviews.push({
messageId: 222,
textSnapshot: "Working...",
visibleSinceMs,
deleteIfUnused: true,
});
harness.sendPayload.mockResolvedValueOnce(false);
const result = await deliverFinalAnswer(harness, HELLO_FINAL);
expect(expectPreviewFinalized(result)).toEqual({
content: HELLO_FINAL,
messageId: 222,
});
expect(harness.sendPayload).toHaveBeenCalledTimes(1);
expect(harness.editPreview).toHaveBeenCalledWith(
expect.objectContaining({
messageId: 222,
text: HELLO_FINAL,
}),
);
expect(harness.deletePreviewMessage).not.toHaveBeenCalled();
expect(harness.markDelivered).toHaveBeenCalledTimes(1);
});
it("materializes DM draft streaming final even when text is unchanged", async () => {
const answerStream = createTestDraftStream({ previewMode: "draft", messageId: 321 });
answerStream.materialize.mockResolvedValue(321);

@@ -1,4 +1,5 @@
import "./test-helpers.js";
import { EventEmitter } from "node:events";
import fs from "node:fs/promises";
import os from "node:os";
import path from "node:path";
@@ -42,25 +43,57 @@ type WebAutoReplyMonitorHarness = {
controller: AbortController;
run: Promise<unknown>;
};
type MockSessionSocket = {
ev: { on: ReturnType<typeof vi.fn>; off: ReturnType<typeof vi.fn> };
ws: EventEmitter & { close: ReturnType<typeof vi.fn> };
user: { id: string };
};
export const TEST_NET_IP = "93.184.216.34";
const WEB_AUTO_REPLY_SOCKETS_KEY = Symbol.for("openclaw:webAutoReplySessionSockets");
function getSessionSockets(): MockSessionSocket[] {
const store = globalThis as Record<PropertyKey, unknown>;
if (!Array.isArray(store[WEB_AUTO_REPLY_SOCKETS_KEY])) {
store[WEB_AUTO_REPLY_SOCKETS_KEY] = [];
}
return store[WEB_AUTO_REPLY_SOCKETS_KEY] as MockSessionSocket[];
}
vi.mock("./session.js", async () => {
const actual = await vi.importActual<typeof import("./session.js")>("./session.js");
return {
...actual,
createWaSocket: vi.fn(async () => ({
ev: {
on: vi.fn(),
off: vi.fn(),
},
ws: { close: vi.fn() },
user: { id: "123@s.whatsapp.net" },
})),
createWaSocket: vi.fn(async () => {
const ws = new EventEmitter() as MockSessionSocket["ws"];
ws.close = vi.fn();
const sock: MockSessionSocket = {
ev: {
on: vi.fn(),
off: vi.fn(),
},
ws,
user: { id: "123@s.whatsapp.net" },
};
getSessionSockets().push(sock);
return sock;
}),
waitForWaConnection: vi.fn().mockResolvedValue(undefined),
};
});
export function getLastWebAutoReplySessionSocket(): MockSessionSocket {
const last = getSessionSockets().at(-1);
if (!last) {
throw new Error("No WhatsApp Web auto-reply test socket created");
}
return last;
}
export function resetWebAutoReplySessionSockets() {
getSessionSockets().length = 0;
}
vi.mock("openclaw/plugin-sdk/agent-runtime", () => ({
abortEmbeddedPiRun: vi.fn().mockReturnValue(false),
appendCronStyleCurrentTimeLine: (text: string) => text,
@@ -166,6 +199,7 @@ export function installWebAutoReplyUnitTestHooks(opts?: { pinDns?: boolean }) {
beforeEach(async () => {
vi.clearAllMocks();
resetWebAutoReplySessionSockets();
_resetBaileysMocks();
_resetLoadConfigMock();
if (opts?.pinDns) {
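
Review note: the test helper above keeps created mock sockets in an array stored on `globalThis` under a `Symbol.for` key, presumably so that the mock factory and the test file see the same list even if the helper module is evaluated more than once. The following is a minimal sketch of that registry pattern on its own. The key string, `FakeSocket`, and the function names are hypothetical and chosen only for this example.

```typescript
// Hypothetical registry key; Symbol.for returns the same symbol for the same string
// everywhere in the process, so separate module instances share one slot.
const REGISTRY_KEY = Symbol.for("example:testSocketRegistry");

type FakeSocket = { id: string };

function getRegistry(): FakeSocket[] {
  const store = globalThis as Record<PropertyKey, unknown>;
  // Lazily create the array the first time anything asks for it.
  if (!Array.isArray(store[REGISTRY_KEY])) {
    store[REGISTRY_KEY] = [];
  }
  return store[REGISTRY_KEY] as FakeSocket[];
}

function registerSocket(sock: FakeSocket): FakeSocket {
  getRegistry().push(sock);
  return sock;
}

function getLastSocket(): FakeSocket {
  const registry = getRegistry();
  const last = registry[registry.length - 1];
  if (!last) {
    throw new Error("No test socket created");
  }
  return last;
}

function resetRegistry(): void {
  // Truncate in place so existing references to the array stay valid.
  getRegistry().length = 0;
}
```

Resetting by setting `length = 0`, as the diff does, empties the shared array without replacing it, which keeps any reference captured earlier pointing at the same list.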

@@ -12,6 +12,7 @@ import {
createMockWebListener,
createScriptedWebListenerFactory,
createWebListenerFactoryCapture,
getLastWebAutoReplySessionSocket,
installWebAutoReplyTestHomeHooks,
installWebAutoReplyUnitTestHooks,
makeSessionStore,
@@ -255,6 +256,92 @@ describe("web auto-reply connection", () => {
}
});
it("keeps quiet linked-device sessions open when transport frames keep arriving", async () => {
vi.useFakeTimers();
try {
const sleep = vi.fn(async () => {});
const scripted = createScriptedWebListenerFactory();
const { controller, run } = startWebAutoReplyMonitor({
monitorWebChannelFn: monitorWebChannel as never,
listenerFactory: scripted.listenerFactory,
sleep,
heartbeatSeconds: 60,
messageTimeoutMs: 30,
watchdogCheckMs: 5,
});
await vi.waitFor(
() => {
expect(scripted.getListenerCount()).toBe(1);
},
{ timeout: 250, interval: 2 },
);
const socket = getLastWebAutoReplySessionSocket();
await vi.advanceTimersByTimeAsync(20);
socket.ws.emit("frame");
await vi.advanceTimersByTimeAsync(20);
socket.ws.emit("frame");
await vi.advanceTimersByTimeAsync(20);
expect(scripted.getListenerCount()).toBe(1);
controller.abort();
scripted.resolveClose(0, { status: 499, isLoggedOut: false });
await Promise.resolve();
await run;
} finally {
vi.useRealTimers();
}
});
it("does not let transport frames mask application silence forever", async () => {
vi.useFakeTimers();
try {
const sleep = vi.fn(async () => {});
const scripted = createScriptedWebListenerFactory();
const { controller, run } = startWebAutoReplyMonitor({
monitorWebChannelFn: monitorWebChannel as never,
listenerFactory: scripted.listenerFactory,
sleep,
heartbeatSeconds: 60,
messageTimeoutMs: 30,
watchdogCheckMs: 5,
});
await vi.waitFor(
() => {
expect(scripted.getListenerCount()).toBe(1);
},
{ timeout: 250, interval: 2 },
);
const socket = getLastWebAutoReplySessionSocket();
for (let elapsedMs = 0; elapsedMs < 140; elapsedMs += 20) {
socket.ws.emit("frame");
await vi.advanceTimersByTimeAsync(20);
}
await vi.waitFor(
() => {
expect(scripted.getListenerCount()).toBeGreaterThanOrEqual(2);
},
{ timeout: 250, interval: 2 },
);
controller.abort();
scripted.resolveClose(scripted.getListenerCount() - 1, {
status: 499,
isLoggedOut: false,
error: "aborted",
});
await Promise.resolve();
await run;
} finally {
vi.useRealTimers();
}
});
it("gives a reconnected listener a fresh watchdog window", async () => {
vi.useFakeTimers();
try {

@@ -280,6 +280,7 @@ export async function monitorWebChannel(
reconnectAttempts: snapshot.reconnectAttempts,
messagesHandled: snapshot.handledMessages,
lastInboundAt: snapshot.lastInboundAt,
lastTransportActivityAt: snapshot.lastTransportActivityAt,
authAgeMs,
uptimeMs: snapshot.uptimeMs,
...(minutesSinceLastMessage !== null && minutesSinceLastMessage > 30
@@ -297,20 +298,28 @@ export async function monitorWebChannel(
}
},
onWatchdogTimeout: (snapshot) => {
const watchdogBaselineAt = snapshot.lastInboundAt ?? snapshot.startedAt;
const minutesSinceLastMessage = Math.floor((Date.now() - watchdogBaselineAt) / 60000);
const now = Date.now();
const transportSilentMs = now - snapshot.lastTransportActivityAt;
const appBaselineAt = snapshot.lastInboundAt ?? snapshot.startedAt;
const minutesSinceTransportActivity = Math.floor(transportSilentMs / 60000);
const minutesSinceAppActivity = Math.floor((now - appBaselineAt) / 60000);
const watchdogReason =
transportSilentMs > messageTimeoutMs ? "transport-inactive" : "app-silent";
statusController.noteWatchdogStale();
heartbeatLogger.warn(
{
connectionId: snapshot.connectionId,
minutesSinceLastMessage,
watchdogReason,
minutesSinceTransportActivity,
minutesSinceAppActivity,
lastInboundAt: snapshot.lastInboundAt ? new Date(snapshot.lastInboundAt) : null,
lastTransportActivityAt: new Date(snapshot.lastTransportActivityAt),
messagesHandled: snapshot.handledMessages,
},
"Message timeout detected - forcing reconnect",
"WhatsApp watchdog timeout detected - forcing reconnect",
);
whatsappHeartbeatLog.warn(
`No messages received in ${minutesSinceLastMessage}m - restarting connection`,
`WhatsApp watchdog timeout (${watchdogReason}) - restarting connection`,
);
},
});
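
Review note: the `onWatchdogTimeout` handler above now reports why the watchdog fired, distinguishing transport inactivity from application-level silence. The classification can be sketched as a pure function over the snapshot fields. `describeWatchdogTimeout` and the `WatchdogSnapshotLike` type are hypothetical names for illustration; the field names and arithmetic follow the diff.

```typescript
type WatchdogReason = "transport-inactive" | "app-silent";

// Hypothetical subset of the connection snapshot used by the handler.
type WatchdogSnapshotLike = {
  startedAt: number;
  lastInboundAt: number | null;
  lastTransportActivityAt: number;
};

function describeWatchdogTimeout(
  snapshot: WatchdogSnapshotLike,
  nowMs: number,
  messageTimeoutMs: number,
): {
  watchdogReason: WatchdogReason;
  minutesSinceTransportActivity: number;
  minutesSinceAppActivity: number;
} {
  const transportSilentMs = nowMs - snapshot.lastTransportActivityAt;
  // With no inbound message yet, application silence is measured from connection start.
  const appBaselineAt = snapshot.lastInboundAt ?? snapshot.startedAt;
  return {
    // Transport silence takes precedence; otherwise the timeout is attributed to app silence.
    watchdogReason: transportSilentMs > messageTimeoutMs ? "transport-inactive" : "app-silent",
    minutesSinceTransportActivity: Math.floor(transportSilentMs / 60000),
    minutesSinceAppActivity: Math.floor((nowMs - appBaselineAt) / 60000),
  };
}
```

For example, a connection started at `0` with no inbound messages, last transport activity at `100_000`, checked at `120_000` with a 30-second timeout, would be classified as `app-silent` with two minutes of application silence.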

@@ -40,8 +40,10 @@ export type WhatsAppLiveConnection = {
heartbeat: TimerHandle | null;
watchdogTimer: TimerHandle | null;
lastInboundAt: number | null;
lastTransportActivityAt: number;
handledMessages: number;
unregisterUnhandled: (() => void) | null;
unregisterTransportActivity: (() => void) | null;
backgroundTasks: Set<Promise<unknown>>;
closePromise: Promise<WebListenerCloseReason>;
resolveClose: (reason: WebListenerCloseReason) => void;
@@ -51,6 +53,7 @@ export type WhatsAppConnectionSnapshot = {
connectionId: string;
startedAt: number;
lastInboundAt: number | null;
lastTransportActivityAt: number;
handledMessages: number;
reconnectAttempts: number;
uptimeMs: number;
@@ -83,6 +86,12 @@ function createNeverResolvePromise<T>(): Promise<T> {
return new Promise<T>(() => {});
}
type SocketActivityEmitter = {
on?: (event: string, listener: (...args: unknown[]) => void) => void;
off?: (event: string, listener: (...args: unknown[]) => void) => void;
removeListener?: (event: string, listener: (...args: unknown[]) => void) => void;
};
function createLiveConnection(params: {
connectionId: string;
sock: WASocket;
@@ -108,8 +117,10 @@ function createLiveConnection(params: {
heartbeat: null,
watchdogTimer: null,
lastInboundAt: null,
lastTransportActivityAt: Date.now(),
handledMessages: 0,
unregisterUnhandled: null,
unregisterTransportActivity: null,
backgroundTasks: new Set<Promise<unknown>>(),
closePromise,
resolveClose: resolveClosePromise,
@@ -232,6 +243,7 @@ export class WhatsAppConnectionController {
private readonly heartbeatSeconds: number;
private readonly keepAlive: boolean;
private readonly messageTimeoutMs: number;
private readonly appSilenceTimeoutMs: number;
private readonly watchdogCheckMs: number;
private readonly verbose: boolean;
private readonly abortSignal?: AbortSignal;
@@ -262,6 +274,7 @@ export class WhatsAppConnectionController {
this.keepAlive = params.keepAlive;
this.heartbeatSeconds = params.heartbeatSeconds;
this.messageTimeoutMs = params.messageTimeoutMs;
this.appSilenceTimeoutMs = Math.max(params.messageTimeoutMs, params.messageTimeoutMs * 4);
this.watchdogCheckMs = params.watchdogCheckMs;
this.reconnectPolicy = params.reconnectPolicy;
this.abortSignal = params.abortSignal;
@@ -311,6 +324,14 @@ export class WhatsAppConnectionController {
}
this.current.handledMessages += 1;
this.current.lastInboundAt = timestamp;
this.current.lastTransportActivityAt = timestamp;
}
noteTransportActivity(timestamp = Date.now()): void {
if (!this.current) {
return;
}
this.current.lastTransportActivityAt = timestamp;
}
getCurrentSnapshot(
@@ -323,6 +344,7 @@ export class WhatsAppConnectionController {
connectionId: connection.connectionId,
startedAt: connection.startedAt,
lastInboundAt: connection.lastInboundAt,
lastTransportActivityAt: connection.lastTransportActivityAt,
handledMessages: connection.handledMessages,
reconnectAttempts: this.reconnectAttempts,
uptimeMs: Date.now() - connection.startedAt,
@@ -369,6 +391,7 @@ export class WhatsAppConnectionController {
const listener = await params.createListener({ sock, connection });
connection.listener = listener;
this.current = connection;
connection.unregisterTransportActivity = this.attachTransportActivityListener(sock);
registerWhatsAppConnectionController(this.accountId, this);
this.startTimers(connection, {
onHeartbeat: params.onHeartbeat,
@@ -383,6 +406,7 @@ export class WhatsAppConnectionController {
if (connection?.unregisterUnhandled) {
connection.unregisterUnhandled();
}
connection?.unregisterTransportActivity?.();
throw err;
}
}
@@ -515,6 +539,7 @@ export class WhatsAppConnectionController {
this.socketRef.current = null;
}
connection.unregisterUnhandled?.();
connection.unregisterTransportActivity?.();
if (connection.heartbeat) {
clearInterval(connection.heartbeat);
}
@@ -563,9 +588,14 @@ export class WhatsAppConnectionController {
}, this.heartbeatSeconds * 1000);
connection.watchdogTimer = setInterval(() => {
const baselineAt = connection.lastInboundAt ?? connection.startedAt;
const staleForMs = Date.now() - baselineAt;
if (staleForMs <= this.messageTimeoutMs) {
const now = Date.now();
const transportStaleForMs = now - connection.lastTransportActivityAt;
const appBaselineAt = connection.lastInboundAt ?? connection.startedAt;
const appSilentForMs = now - appBaselineAt;
if (
transportStaleForMs <= this.messageTimeoutMs &&
appSilentForMs <= this.appSilenceTimeoutMs
) {
return;
}
const snapshot = this.getCurrentSnapshot(connection);
@@ -581,6 +611,24 @@ export class WhatsAppConnectionController {
}, this.watchdogCheckMs);
}
private attachTransportActivityListener(sock: WASocket): (() => void) | null {
const ws = sock.ws as SocketActivityEmitter | undefined;
if (!ws || typeof ws.on !== "function") {
return null;
}
const noteActivity = () => this.noteTransportActivity();
ws.on("frame", noteActivity);
return () => {
if (typeof ws.off === "function") {
ws.off("frame", noteActivity);
return;
}
ws.removeListener?.("frame", noteActivity);
};
}
private stopDisconnectRetries(): void {
if (!this.disconnectRetryController.signal.aborted) {
this.disconnectRetryController.abort();
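
Review note: the watchdog interval above now keeps a connection open only while both conditions hold: transport frames arrived within `messageTimeoutMs`, and application traffic arrived within `appSilenceTimeoutMs`, which the constructor sets to four times `messageTimeoutMs`. A minimal sketch of that check as a pure function follows. `shouldForceReconnect` and `ConnectionTimes` are hypothetical names; the thresholds and comparisons follow the diff.

```typescript
// Hypothetical subset of the live connection state read by the watchdog.
type ConnectionTimes = {
  startedAt: number;
  lastInboundAt: number | null;
  lastTransportActivityAt: number;
};

function shouldForceReconnect(
  connection: ConnectionTimes,
  nowMs: number,
  messageTimeoutMs: number,
): boolean {
  // Same derivation as the constructor in the diff.
  const appSilenceTimeoutMs = Math.max(messageTimeoutMs, messageTimeoutMs * 4);
  const transportStaleForMs = nowMs - connection.lastTransportActivityAt;
  const appBaselineAt = connection.lastInboundAt ?? connection.startedAt;
  const appSilentForMs = nowMs - appBaselineAt;
  // Healthy only when both windows are satisfied; either one expiring forces a reconnect.
  const healthy =
    transportStaleForMs <= messageTimeoutMs && appSilentForMs <= appSilenceTimeoutMs;
  return !healthy;
}
```

This matches the two tests added in this compare: with `messageTimeoutMs: 30`, frames every 20 ms keep the connection open at first, but once application silence passes 120 ms the watchdog fires even though frames keep arriving.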

@@ -37,14 +37,20 @@
"!dist/extensions/qa-channel/**",
"!dist/extensions/qa-lab/**",
"!dist/extensions/qa-matrix/**",
"!dist/plugin-sdk/extensions/qa-channel/**",
"!dist/plugin-sdk/extensions/qa-lab/**",
"!dist/plugin-sdk/qa-channel.*",
"!dist/plugin-sdk/qa-channel-protocol.*",
"!dist/plugin-sdk/qa-lab.*",
"!dist/plugin-sdk/qa-runtime.*",
"!dist/plugin-sdk/src/plugin-sdk/qa-channel.d.ts",
"!dist/plugin-sdk/src/plugin-sdk/qa-channel-protocol.d.ts",
"!dist/plugin-sdk/src/plugin-sdk/qa-lab.d.ts",
"!dist/plugin-sdk/src/plugin-sdk/qa-runtime.d.ts",
"!dist/qa-runtime-*.js",
"docs/",
"!docs/.generated/**",
"!docs/channels/qa-channel.md",
"patches/",
"skills/",
"scripts/npm-runner.mjs",
@@ -1044,14 +1050,6 @@
"types": "./dist/plugin-sdk/nostr.d.ts",
"default": "./dist/plugin-sdk/nostr.js"
},
"./plugin-sdk/qa-channel": {
"types": "./dist/plugin-sdk/qa-channel.d.ts",
"default": "./dist/plugin-sdk/qa-channel.js"
},
"./plugin-sdk/qa-channel-protocol": {
"types": "./dist/plugin-sdk/qa-channel-protocol.d.ts",
"default": "./dist/plugin-sdk/qa-channel-protocol.js"
},
"./plugin-sdk/provider-auth": {
"types": "./dist/plugin-sdk/provider-auth.d.ts",
"default": "./dist/plugin-sdk/provider-auth.js"
@@ -1335,6 +1333,7 @@
"check:timed": "node scripts/check-timed.mjs",
"check:timed:all-types": "node scripts/check-timed.mjs --include-test-types",
"check:timed:architecture": "node scripts/check-timed.mjs --include-architecture",
"check:workflows": "node scripts/check-workflows.mjs",
"ci:timings": "node scripts/ci-run-timings.mjs --latest-main",
"ci:timings:recent": "node scripts/ci-run-timings.mjs --recent 10",
"codex-app-server:protocol:check": "node --import tsx scripts/check-codex-app-server-protocol.ts",
@@ -1400,6 +1399,7 @@
"lint:auth:no-pairing-store-group": "node scripts/check-no-pairing-store-group-auth.mjs",
"lint:auth:pairing-account-scope": "node scripts/check-pairing-account-scope.mjs",
"lint:core": "node scripts/run-oxlint.mjs --tsconfig tsconfig.oxlint.core.json src ui packages",
"lint:docker-e2e": "node scripts/check-docker-e2e-boundaries.mjs",
"lint:docs": "pnpm dlx markdownlint-cli2",
"lint:docs:fix": "pnpm dlx markdownlint-cli2 --fix",
"lint:extensions": "node scripts/run-oxlint.mjs --tsconfig tsconfig.oxlint.extensions.json extensions",
@@ -1415,7 +1415,7 @@
"lint:plugins:no-monolithic-plugin-sdk-entry-imports": "node --import tsx scripts/check-no-monolithic-plugin-sdk-entry-imports.ts",
"lint:plugins:no-register-http-handler": "node scripts/check-no-register-http-handler.mjs",
"lint:plugins:plugin-sdk-subpaths-exported": "node scripts/check-plugin-sdk-subpath-exports.mjs",
"lint:scripts": "node scripts/run-oxlint.mjs --tsconfig tsconfig.oxlint.scripts.json scripts",
"lint:scripts": "pnpm lint:docker-e2e && node scripts/run-oxlint.mjs --tsconfig tsconfig.oxlint.scripts.json scripts",
"lint:swift": "swiftlint lint --config .swiftlint.yml && (cd apps/ios && swiftlint lint --config .swiftlint.yml)",
"lint:tmp:channel-agnostic-boundaries": "node scripts/check-channel-agnostic-boundaries.mjs",
"lint:tmp:dynamic-import-warts": "node scripts/check-dynamic-import-warts.mjs",
@@ -1478,7 +1478,6 @@
"test:build:singleton": "node scripts/test-built-plugin-singleton.mjs",
"test:bundled": "node scripts/run-vitest.mjs run --config test/vitest/vitest.bundled.config.ts",
"test:changed": "node scripts/test-projects.mjs --changed origin/main",
"test:changed:focused": "OPENCLAW_TEST_CHANGED_FOCUSED=1 node scripts/test-projects.mjs --changed origin/main",
"test:changed:max": "OPENCLAW_VITEST_MAX_WORKERS=8 node scripts/test-projects.mjs --changed origin/main",
"test:channels": "node scripts/run-vitest.mjs run --config test/vitest/vitest.channels.config.ts",
"test:contracts": "pnpm test:contracts:channels && pnpm test:contracts:plugins",
@@ -1541,7 +1540,9 @@
"test:docker:plugin-update": "bash scripts/e2e/plugin-update-unchanged-docker.sh",
"test:docker:plugins": "bash scripts/e2e/plugins-docker.sh",
"test:docker:qr": "bash scripts/e2e/qr-import-docker.sh",
"test:docker:rerun": "node scripts/docker-e2e-rerun.mjs",
"test:docker:session-runtime-context": "bash scripts/e2e/session-runtime-context-docker.sh",
"test:docker:timings": "node scripts/docker-e2e-timings.mjs",
"test:docker:update-channel-switch": "bash scripts/e2e/update-channel-switch-docker.sh",
"test:e2e": "node scripts/run-vitest.mjs run --config test/vitest/vitest.e2e.config.ts",
"test:e2e:openshell": "OPENCLAW_E2E_OPENSHELL=1 node scripts/run-vitest.mjs run --config test/vitest/vitest.e2e.config.ts extensions/openshell/src/backend.e2e.test.ts",

@@ -0,0 +1,156 @@
# Docker Prometheus smoke
```yaml qa-scenario
id: docker-prometheus-smoke
title: Docker Prometheus smoke
surface: telemetry
coverage:
primary:
- telemetry.prometheus
secondary:
- harness.qa-lab
- docker.e2e
objective: Verify a QA-lab gateway run emits protected, bounded Prometheus diagnostics metrics through the diagnostics-prometheus plugin.
successCriteria:
- The diagnostics-prometheus plugin exposes the protected scrape route.
- An unauthenticated scrape is rejected.
- A minimal QA-channel agent turn completes.
- The authenticated scrape includes release-critical diagnostics metric families.
- Prometheus output omits prompt content, session keys, auth tokens, raw ids, and file paths.
plugins:
- diagnostics-prometheus
gatewayConfigPatch:
diagnostics:
enabled: true
docsRefs:
- docs/gateway/prometheus.md
- docs/concepts/qa-e2e-automation.md
codeRefs:
- extensions/diagnostics-prometheus/src/service.ts
- src/diagnostics/internal-diagnostics.ts
- extensions/qa-lab/src/suite.ts
execution:
kind: flow
summary: Complete a minimal QA-lab turn and scrape the protected Prometheus route.
config:
prompt: Reply exactly DOCKER-PROMETHEUS-OK. Do not repeat DOCKER-PROMETHEUS-SECRET.
secretNeedle: DOCKER-PROMETHEUS-SECRET
```
```yaml qa-flow
steps:
  - name: emits protected low-cardinality prometheus metrics
    actions:
      - call: waitForGatewayHealthy
        args:
          - ref: env
          - 60000
      - call: waitForQaChannelReady
        args:
          - ref: env
          - 60000
      - call: reset
      - set: startCursor
        value:
          expr: state.getSnapshot().messages.length
      - call: runAgentPrompt
        args:
          - ref: env
          - sessionKey: agent:qa:docker-prometheus-smoke
            message:
              expr: config.prompt
            timeoutMs:
              expr: liveTurnTimeoutMs(env, 30000)
      - call: waitForCondition
        saveAs: outbound
        args:
          - lambda:
              expr: "state.getSnapshot().messages.slice(startCursor).filter((candidate) => candidate.direction === 'outbound' && candidate.conversation.id === 'qa-operator' && String(candidate.text ?? '').trim().length > 0).at(-1)"
          - expr: liveTurnTimeoutMs(env, 30000)
          - expr: "env.providerMode === 'mock-openai' ? 100 : 250"
      - assert:
          expr: "String(outbound.text ?? '').trim().length > 0"
          message: "expected non-empty qa output before scraping metrics"
      - set: prometheusUrl
        value:
          expr: "`${env.gateway.baseUrl}/api/diagnostics/prometheus`"
      - set: gatewayToken
        value:
          expr: "String(env.gateway.token ?? env.gateway.runtimeEnv.OPENCLAW_GATEWAY_TOKEN ?? '')"
      - assert:
          expr: "gatewayToken.length > 0"
          message: "expected QA gateway token to be available for protected scrape"
      - set: unauthenticatedScrape
        value:
          expr: |-
            (async () => {
              const response = await fetch(prometheusUrl);
              await response.text().catch(() => "");
              return { status: response.status };
            })()
      - assert:
          expr: "unauthenticatedScrape.status === 401 || unauthenticatedScrape.status === 403"
          message:
            expr: "`expected unauthenticated prometheus scrape to be rejected, got ${unauthenticatedScrape.status}`"
      - set: authenticatedScrape
        value:
          expr: |-
            (async () => {
              const response = await fetch(prometheusUrl, {
                headers: { authorization: `Bearer ${gatewayToken}` },
              });
              const text = await response.text();
              return {
                status: response.status,
                contentType: response.headers.get("content-type") ?? "",
                text,
              };
            })()
      - assert:
          expr: "authenticatedScrape.status === 200"
          message:
            expr: "`expected authenticated prometheus scrape to return 200, got ${authenticatedScrape.status}`"
      - assert:
          expr: "authenticatedScrape.contentType.includes('text/plain')"
          message:
            expr: "`expected prometheus text content type, got ${authenticatedScrape.contentType}`"
      - set: prometheusText
        value:
          expr: "String(authenticatedScrape.text ?? '')"
      - assert:
          expr: "prometheusText.includes('# TYPE openclaw_run_completed_total counter')"
          message: "missing run completion counter"
      - assert:
          expr: "prometheusText.includes('# TYPE openclaw_run_duration_seconds histogram')"
          message: "missing run duration histogram"
      - assert:
          expr: "prometheusText.includes('# TYPE openclaw_model_call_total counter')"
          message: "missing model call counter"
      - assert:
          expr: "prometheusText.includes('# TYPE openclaw_harness_run_total counter')"
          message: "missing harness run counter"
      - assert:
          expr: "!prometheusText.includes(config.secretNeedle)"
          message: "prometheus output leaked prompt sentinel"
      - assert:
          expr: "!prometheusText.includes('DOCKER-PROMETHEUS-OK')"
          message: "prometheus output leaked response content"
      - assert:
          expr: "!prometheusText.includes('agent:qa:docker-prometheus-smoke')"
          message: "prometheus output leaked the session key"
      - assert:
          expr: "!prometheusText.includes(gatewayToken)"
          message: "prometheus output leaked the gateway token"
      - assert:
          expr: "!/runId|sessionId|sessionKey|callId|toolCallId|messageId|providerRequestId/.test(prometheusText)"
          message: "prometheus output leaked raw diagnostic identifiers"
      - assert:
          expr: "!/\\/tmp\\/|\\/private\\/tmp\\/|\\/app\\//.test(prometheusText)"
          message: "prometheus output leaked a local file path"
      - assert:
          expr: "!prometheusText.includes('openclaw.content.')"
          message: "prometheus output leaked content attributes"
      - assert:
          expr: "!/openclaw_prometheus_series_dropped_total(?:\\{[^}]*\\})?\\s+(?!0(?:\\.0+)?(?:\\s|$))/.test(prometheusText)"
          message: "prometheus dropped series during the smoke"
```
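
The final assertion in the flow uses a negative lookahead so that only zero-valued samples of `openclaw_prometheus_series_dropped_total` pass. The sketch below shows how that pattern behaves on a few made-up exposition lines. The helper names and the sample lines are hypothetical and not part of the change. One caveat, assuming the exporter also emits a `# TYPE` comment for this metric (the diff does not say whether it does): the unanchored pattern would match that comment line as well, so a line-anchored variant is shown next to it for comparison.

```javascript
// Pattern copied from the flow's final assertion (YAML escaping removed).
const DROPPED_NONZERO =
  /openclaw_prometheus_series_dropped_total(?:\{[^}]*\})?\s+(?!0(?:\.0+)?(?:\s|$))/;

// Hypothetical variant: anchored to the start of a line so that
// "# TYPE ..." and "# HELP ..." comment lines cannot match.
const DROPPED_NONZERO_ANCHORED =
  /^openclaw_prometheus_series_dropped_total(?:\{[^}]*\})?\s+(?!0(?:\.0+)?(?:\s|$))/m;

function hasDroppedSeries(text) {
  return DROPPED_NONZERO.test(text);
}

function hasDroppedSeriesAnchored(text) {
  return DROPPED_NONZERO_ANCHORED.test(text);
}

// Made-up exposition samples, for illustration only.
const zeroSample = 'openclaw_prometheus_series_dropped_total{reason="limit"} 0\n';
const droppedSample = "openclaw_prometheus_series_dropped_total 3\n";
const typeLine = "# TYPE openclaw_prometheus_series_dropped_total counter\n";
```

If the exporter only emits this metric as plain samples, both patterns agree; the anchored form simply does not depend on that.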


@@ -67,7 +67,7 @@ export function createEmptyChangedLanes() {
/**
* @param {string[]} changedPaths
- * @param {{ packageJsonChangeKind?: "liveDockerTooling" | null }} [options]
+ * @param {{ packageJsonChangeKind?: "liveDockerTooling" | "tooling" | null }} [options]
* @returns {ChangedLaneResult}
*/
export function detectChangedLanes(changedPaths, options = {}) {
@@ -80,6 +80,8 @@ export function detectChangedLanes(changedPaths, options = {}) {
let hasNonDocs = false;
const packageJsonIsLiveDockerTooling =
paths.includes("package.json") && options.packageJsonChangeKind === "liveDockerTooling";
+  const packageJsonIsTooling =
+    paths.includes("package.json") && options.packageJsonChangeKind === "tooling";
if (paths.length === 0) {
reasons.push("no changed paths");
@@ -88,6 +90,7 @@ export function detectChangedLanes(changedPaths, options = {}) {
if (
!packageJsonIsLiveDockerTooling &&
+    !packageJsonIsTooling &&
paths.some((changedPath) => RELEASE_METADATA_PATHS.has(changedPath)) &&
paths.every(
(changedPath) => RELEASE_METADATA_PATHS.has(changedPath) || DOCS_PATH_RE.test(changedPath),
@@ -115,6 +118,12 @@ export function detectChangedLanes(changedPaths, options = {}) {
continue;
}
+    if (changedPath === "package.json" && packageJsonIsTooling) {
+      lanes.tooling = true;
+      reasons.push(`${changedPath}: package scripts`);
+      continue;
+    }
if (LIVE_DOCKER_TOOLING_PATH_RE.test(changedPath)) {
lanes.liveDockerTooling = true;
reasons.push(`${changedPath}: live Docker tooling surface`);
@@ -195,39 +204,57 @@ export function detectChangedLanes(changedPaths, options = {}) {
}
/**
- * @param {{ base: string; head?: string; includeWorktree?: boolean }} params
+ * @param {{ paths: string[]; base: string; head?: string; staged?: boolean }} params
+ * @returns {ChangedLaneResult}
+ */
+export function detectChangedLanesForPaths(params) {
+  const packageJsonChangeKind = params.paths.includes("package.json")
+    ? classifyPackageJsonChangeFromGit({
+        base: params.base,
+        head: params.head,
+        staged: params.staged,
+      })
+    : null;
+  return detectChangedLanes(params.paths, { packageJsonChangeKind });
+}
+/**
+ * @param {{ base: string; head?: string; includeWorktree?: boolean; cwd?: string }} params
* @returns {string[]}
*/
export function listChangedPathsFromGit(params) {
const base = params.base;
const head = params.head ?? "HEAD";
+  const cwd = params.cwd ?? process.cwd();
if (!base) {
return [];
}
-  const rangePaths = runGitNameOnlyDiff([`${base}...${head}`]);
+  const rangePaths = runGitNameOnlyDiff([`${base}...${head}`], cwd);
if (params.includeWorktree === false) {
return rangePaths;
}
return [
...new Set([
...rangePaths,
-      ...runGitNameOnlyDiff(["--cached", "--diff-filter=ACMR"]),
-      ...runGitNameOnlyDiff(["--diff-filter=ACMR"]),
-      ...runGitLsFiles(["--others", "--exclude-standard"]),
+      ...runGitNameOnlyDiff(["--cached", "--diff-filter=ACMR"], cwd),
+      ...runGitNameOnlyDiff(["--diff-filter=ACMR"], cwd),
+      ...runGitLsFiles(["--others", "--exclude-standard"], cwd),
]),
].toSorted((left, right) => left.localeCompare(right));
}
-function runGitNameOnlyDiff(extraArgs) {
+function runGitNameOnlyDiff(extraArgs, cwd = process.cwd()) {
   const output = execFileSync("git", ["diff", "--name-only", ...extraArgs], {
+    cwd,
stdio: ["ignore", "pipe", "pipe"],
encoding: "utf8",
});
return output.split("\n").map(normalizeChangedPath).filter(Boolean);
}
-function runGitLsFiles(extraArgs) {
+function runGitLsFiles(extraArgs, cwd = process.cwd()) {
   const output = execFileSync("git", ["ls-files", ...extraArgs], {
+    cwd,
stdio: ["ignore", "pipe", "pipe"],
encoding: "utf8",
});
@@ -245,7 +272,10 @@ export function listStagedChangedPaths() {
export function classifyPackageJsonChangeFromGit(params) {
try {
const { before, after } = readPackageJsonBeforeAfter(params);
-    return isLiveDockerPackageScriptOnlyChange(before, after) ? "liveDockerTooling" : null;
+    if (isLiveDockerPackageScriptOnlyChange(before, after)) {
+      return "liveDockerTooling";
+    }
+    return isPackageScriptOnlyChange(before, after) ? "tooling" : null;
} catch {
return null;
}
@@ -265,6 +295,20 @@ export function isLiveDockerPackageScriptOnlyChange(before, after) {
);
}
+export function isPackageScriptOnlyChange(before, after) {
+  const beforePackage = JSON.parse(before);
+  const afterPackage = JSON.parse(after);
+  const beforeScripts = extractPackageScripts(beforePackage);
+  const afterScripts = extractPackageScripts(afterPackage);
+  const beforeStripped = stripPackageScripts(beforePackage);
+  const afterStripped = stripPackageScripts(afterPackage);
+  return (
+    stableJson(beforeStripped) === stableJson(afterStripped) &&
+    stableJson(beforeScripts) !== stableJson(afterScripts)
+  );
+}
function readPackageJsonBeforeAfter(params) {
const before = readGitText(params.staged ? "HEAD" : params.base, "package.json");
if (params.staged) {
@@ -317,6 +361,17 @@ function stripLiveDockerPackageScripts(packageJson) {
return clone;
}
+function extractPackageScripts(packageJson) {
+  const scripts = packageJson?.scripts;
+  return scripts && typeof scripts === "object" && !Array.isArray(scripts) ? scripts : {};
+}
+function stripPackageScripts(packageJson) {
+  const clone = JSON.parse(JSON.stringify(packageJson));
+  delete clone.scripts;
+  return clone;
+}
function stableJson(value) {
if (Array.isArray(value)) {
return `[${value.map(stableJson).join(",")}]`;
@@ -418,14 +473,12 @@ if (isDirectRun()) {
: args.staged
? listStagedChangedPaths()
: listChangedPathsFromGit({ base: args.base, head: args.head });
-  const packageJsonChangeKind = paths.includes("package.json")
-    ? classifyPackageJsonChangeFromGit({
-        base: args.base,
-        head: args.head,
-        staged: args.staged,
-      })
-    : null;
-  const result = detectChangedLanes(paths, { packageJsonChangeKind });
+  const result = detectChangedLanesForPaths({
+    paths,
+    base: args.base,
+    head: args.head,
+    staged: args.staged,
+  });
if (args.githubOutput) {
writeChangedLaneGitHubOutput(result);
}
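
The new `"tooling"` classification depends on comparing `package.json` before and after with the `scripts` field split out. A minimal, self-contained sketch of that comparison follows. The `stableJson` body here is an assumption (only its array branch is visible in the diff); it is written as a key-sorted serializer so that key reordering alone never counts as a change. The sample package objects are made up.

```javascript
// Assumed stand-in for the diff's stableJson(): serialize with sorted object
// keys. Only the array branch of the real function is visible in the hunk.
function stableJson(value) {
  if (Array.isArray(value)) {
    return `[${value.map(stableJson).join(",")}]`;
  }
  if (value && typeof value === "object") {
    const body = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableJson(value[key])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

function extractPackageScripts(packageJson) {
  const scripts = packageJson?.scripts;
  return scripts && typeof scripts === "object" && !Array.isArray(scripts) ? scripts : {};
}

function stripPackageScripts(packageJson) {
  const clone = JSON.parse(JSON.stringify(packageJson));
  delete clone.scripts;
  return clone;
}

// True only when `scripts` changed and everything else is identical.
function isPackageScriptOnlyChange(before, after) {
  const beforePackage = JSON.parse(before);
  const afterPackage = JSON.parse(after);
  return (
    stableJson(stripPackageScripts(beforePackage)) ===
      stableJson(stripPackageScripts(afterPackage)) &&
    stableJson(extractPackageScripts(beforePackage)) !==
      stableJson(extractPackageScripts(afterPackage))
  );
}

// Made-up inputs: same package, scripts edited, keys reordered.
const basePackage = JSON.stringify({ name: "demo", version: "1.0.0", scripts: { test: "vitest" } });
const scriptsChanged = JSON.stringify({ version: "1.0.0", name: "demo", scripts: { test: "vitest run" } });
// Made-up inputs: scripts edited and version bumped.
const versionChanged = JSON.stringify({ name: "demo", version: "1.0.1", scripts: { test: "vitest run" } });
```

Under these assumptions `scriptsChanged` would be classified as script-only, while `versionChanged` would not, so a version bump still falls through to the broader lanes.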


@@ -1,7 +1,6 @@
import { performance } from "node:perf_hooks";
import {
-  classifyPackageJsonChangeFromGit,
-  detectChangedLanes,
+  detectChangedLanesForPaths,
listChangedPathsFromGit,
listStagedChangedPaths,
normalizeChangedPath,
@@ -14,12 +13,7 @@ import {
} from "./lib/local-heavy-check-runtime.mjs";
import { runManagedCommand } from "./lib/managed-child-process.mjs";
import { createSparseTsgoSkipEnv } from "./lib/tsgo-sparse-guard.mjs";
-import { isCiLikeEnv } from "./lib/vitest-local-scheduling.mjs";
-import { resolveChangedTestTargetPlan } from "./test-projects.test-support.mjs";
-export const CHANGED_CHECK_VITEST_NO_OUTPUT_TIMEOUT_MS = "600000";
-const VITEST_NO_OUTPUT_TIMEOUT_ENV_KEY = "OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS";
-const VITEST_NO_OUTPUT_RETRY_ENV_KEY = "OPENCLAW_VITEST_NO_OUTPUT_RETRY";
const LIVE_DOCKER_AUTH_SHELL_TARGETS = [
"scripts/lib/live-docker-auth.sh",
"scripts/test-live-acp-bind-docker.sh",
@@ -39,35 +33,6 @@ export function createChangedCheckChildEnv(baseEnv = process.env) {
};
}
-export function createChangedCheckVitestEnv(baseEnv = process.env) {
-  const resolvedBaseEnv = createChangedCheckChildEnv(baseEnv);
-  const env = {
-    ...resolvedBaseEnv,
-    [VITEST_NO_OUTPUT_TIMEOUT_ENV_KEY]:
-      resolvedBaseEnv[VITEST_NO_OUTPUT_TIMEOUT_ENV_KEY]?.trim() ||
-      CHANGED_CHECK_VITEST_NO_OUTPUT_TIMEOUT_MS,
-    [VITEST_NO_OUTPUT_RETRY_ENV_KEY]:
-      resolvedBaseEnv[VITEST_NO_OUTPUT_RETRY_ENV_KEY]?.trim() || "0",
-  };
-  const hasWorkerOverride = Boolean(
-    (resolvedBaseEnv.OPENCLAW_VITEST_MAX_WORKERS ?? resolvedBaseEnv.OPENCLAW_TEST_WORKERS)?.trim(),
-  );
-  const hasParallelOverride = Boolean(resolvedBaseEnv.OPENCLAW_TEST_PROJECTS_PARALLEL?.trim());
-  const serialOverride = resolvedBaseEnv.OPENCLAW_TEST_PROJECTS_SERIAL?.trim();
-  if (
-    !isCiLikeEnv(resolvedBaseEnv) &&
-    !hasWorkerOverride &&
-    !hasParallelOverride &&
-    serialOverride !== "0"
-  ) {
-    env.OPENCLAW_TEST_PROJECTS_SERIAL = serialOverride || "1";
-    env.OPENCLAW_VITEST_MAX_WORKERS = "1";
-  }
-  return env;
-}
export function createChangedCheckPlan(result, options = {}) {
const commands = [];
const baseEnv = createChangedCheckChildEnv(options.env ?? process.env);
@@ -93,10 +58,6 @@ export function createChangedCheckPlan(result, options = {}) {
if (result.docsOnly) {
return {
commands,
-      testTargets: [],
-      runChangedTestsBroad: false,
-      runFullTests: false,
-      runExtensionTests: false,
summary: "docs-only",
};
}
@@ -118,10 +79,6 @@ export function createChangedCheckPlan(result, options = {}) {
add("root dependency ownership", ["deps:root-ownership:check"]);
return {
commands,
-      testTargets: [],
-      runChangedTestsBroad: false,
-      runFullTests: false,
-      runExtensionTests: false,
summary: "release metadata",
};
}
@@ -132,10 +89,6 @@ export function createChangedCheckPlan(result, options = {}) {
add("runtime import cycles", ["check:import-cycles"]);
return {
commands,
-      testTargets: [],
-      runChangedTestsBroad: false,
-      runFullTests: true,
-      runExtensionTests: false,
summary: "all",
};
}
@@ -189,26 +142,10 @@ export function createChangedCheckPlan(result, options = {}) {
OPENCLAW_DOCKER_ALL_DRY_RUN: "1",
OPENCLAW_DOCKER_ALL_LIVE_MODE: "only",
});
-    add(
-      "ACP bind unit tests",
-      ["test", "src/gateway/live-agent-probes.test.ts", "src/agents/acp-spawn.test.ts"],
-      createChangedCheckVitestEnv(baseEnv),
-    );
-    add("ACPX extension tests", ["test:extension", "acpx"], createChangedCheckVitestEnv(baseEnv));
   }
-  const testPlan = resolveChangedTestTargetPlan(result.paths);
-  const runExtensionTests = result.extensionImpactFromCore;
-  const testTargets = runExtensionTests
-    ? testPlan.targets.filter((target) => target !== "extensions")
-    : testPlan.targets;
-  const runChangedTestsBroad = testPlan.mode === "broad";
   return {
     commands,
-    testTargets,
-    runChangedTestsBroad,
-    runFullTests: false,
-    runExtensionTests,
summary: Object.entries(lanes)
.filter(([, enabled]) => enabled)
.map(([lane]) => lane)
@@ -244,61 +181,6 @@ export async function runChangedCheck(result, options = {}) {
}
}
-    if (plan.runFullTests) {
-      const status = await runPnpm(
-        { name: "tests all", args: ["test"], env: createChangedCheckVitestEnv(childEnv) },
-        timings,
-      );
-      if (status !== 0) {
-        printSummary(timings, options);
-        return status;
-      }
-    } else if (plan.runChangedTestsBroad) {
-      const testArgs = options.explicitPaths
-        ? ["test"]
-        : ["test", "--changed", options.base ?? "origin/main"];
-      const status = await runPnpm(
-        {
-          name: options.explicitPaths ? "tests all" : "tests changed broad",
-          args: testArgs,
-          env: createChangedCheckVitestEnv(childEnv),
-        },
-        timings,
-      );
-      if (status !== 0) {
-        printSummary(timings, options);
-        return status;
-      }
-    } else if (plan.testTargets.length > 0) {
-      const status = await runPnpm(
-        {
-          name: "tests changed",
-          args: ["test", ...plan.testTargets],
-          env: createChangedCheckVitestEnv(childEnv),
-        },
-        timings,
-      );
-      if (status !== 0) {
-        printSummary(timings, options);
-        return status;
-      }
-    }
-    if (plan.runExtensionTests) {
-      const status = await runPnpm(
-        {
-          name: "tests extensions",
-          args: ["test:extensions"],
-          env: createChangedCheckVitestEnv(childEnv),
-        },
-        timings,
-      );
-      if (status !== 0) {
-        printSummary(timings, options);
-        return status;
-      }
-    }
printSummary(timings, options);
return 0;
} finally {
@@ -314,17 +196,11 @@ function printPlan(result, plan, options) {
const prefix = options.dryRun ? "[check:changed:dry-run]" : "[check:changed]";
console.error(`${prefix} lanes=${plan.summary || "none"}`);
if (result.extensionImpactFromCore) {
-    console.error(`${prefix} core contract changed; extension tests included`);
-  }
-  if (plan.runChangedTestsBroad) {
-    console.error(`${prefix} broad changed tests included`);
+    console.error(`${prefix} extension-impacting surface; extension typecheck included`);
}
for (const reason of result.reasons) {
console.error(`${prefix} ${reason}`);
}
-  if (plan.testTargets.length > 0) {
-    console.error(`${prefix} test targets=${plan.testTargets.length}`);
-  }
}
async function runPnpm(command, timings) {
@@ -408,14 +284,12 @@ if (isDirectRun()) {
: args.staged
? listStagedChangedPaths()
: listChangedPathsFromGit({ base: args.base, head: args.head });
-  const packageJsonChangeKind = paths.includes("package.json")
-    ? classifyPackageJsonChangeFromGit({
-        base: args.base,
-        head: args.head,
-        staged: args.staged,
-      })
-    : null;
-  const result = detectChangedLanes(paths, { packageJsonChangeKind });
+  const result = detectChangedLanesForPaths({
+    paths,
+    base: args.base,
+    head: args.head,
+    staged: args.staged,
+  });
process.exitCode = await runChangedCheck(result, {
...args,
explicitPaths: args.paths.length > 0,
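
Both CLI entry points now delegate to `detectChangedLanesForPaths` instead of repeating the `package.json` classification inline. The sketch below illustrates that shape without requiring a git checkout. It is a hypothetical dependency-injected variant: `detectChangedLanesForPathsWith`, `deps.classify`, and `deps.detect` are names invented for this example and stand in for the real helper, `classifyPackageJsonChangeFromGit`, and `detectChangedLanes`.

```javascript
// Hypothetical variant of detectChangedLanesForPaths that takes its
// collaborators as arguments so the example runs without git.
function detectChangedLanesForPathsWith(deps, params) {
  const packageJsonChangeKind = params.paths.includes("package.json")
    ? deps.classify({ base: params.base, head: params.head, staged: params.staged })
    : null;
  return deps.detect(params.paths, { packageJsonChangeKind });
}

// Stubs that record how they were called.
const classifyCalls = [];
const deps = {
  classify(gitParams) {
    classifyCalls.push(gitParams);
    return "tooling";
  },
  detect(paths, options) {
    return { paths, packageJsonChangeKind: options.packageJsonChangeKind };
  },
};

// package.json is untouched, so the classifier is never consulted.
const docsOnly = detectChangedLanesForPathsWith(deps, {
  paths: ["docs/a.md"],
  base: "origin/main",
});

// package.json changed, so the classifier runs once with the git params.
const withPackageJson = detectChangedLanesForPathsWith(deps, {
  paths: ["package.json", "scripts/x.mjs"],
  base: "origin/main",
  staged: false,
});
```

The point of the design is that the git read happens only when `package.json` is among the changed paths, and both callers get identical behavior from one place.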


@@ -0,0 +1,113 @@
#!/usr/bin/env node
// Cheap guard for Docker E2E test boundaries.
// Docker E2E must test packaged npm tarballs and package-installed images, not
// the source checkout copied or mounted as the app under test.
import fs from "node:fs";
import path from "node:path";
import { fileURLToPath } from "node:url";
import { laneResources, laneWeight } from "./lib/docker-e2e-plan.mjs";
import { allReleasePathLanes, mainLanes, tailLanes } from "./lib/docker-e2e-scenarios.mjs";
const ROOT_DIR = path.resolve(path.dirname(fileURLToPath(import.meta.url)), "..");
const errors = [];
const packageJson = JSON.parse(readText("package.json"));
const packageScripts = new Set(Object.keys(packageJson.scripts ?? {}));
function readText(relativePath) {
return fs.readFileSync(path.join(ROOT_DIR, relativePath), "utf8");
}
function walk(dir, out = []) {
for (const entry of fs.readdirSync(path.join(ROOT_DIR, dir), { withFileTypes: true })) {
const relativePath = path.join(dir, entry.name);
if (entry.isDirectory()) {
walk(relativePath, out);
} else {
out.push(relativePath);
}
}
return out;
}
for (const relativePath of walk("scripts/e2e")) {
if (!/\.(?:sh|ts|mjs|js)$/u.test(relativePath)) {
continue;
}
const text = readText(relativePath);
if (/from\s+["']\.\.\/\.\.\/src\//u.test(text) || /import\(["']\.\.\/\.\.\/src\//u.test(text)) {
errors.push(`${relativePath}: Docker E2E harness must import built dist, not ../../src`);
}
if (/-v\s+["']?\$ROOT_DIR:\/app(?::|["'\s]|$)/u.test(text)) {
errors.push(`${relativePath}: do not mount the repo root as /app in Docker E2E`);
}
}
const dockerfile = readText("scripts/e2e/Dockerfile");
if (/^\s*(?:COPY|ADD)\s+\.\s+\/app(?:\s|$)/imu.test(dockerfile)) {
errors.push("scripts/e2e/Dockerfile: do not copy the source checkout into /app");
}
function validateUniqueLanes(label, lanes) {
const seen = new Set();
for (const lane of lanes) {
if (seen.has(lane.name)) {
errors.push(`${label}: duplicate Docker E2E lane '${lane.name}'`);
}
seen.add(lane.name);
}
}
function validateLane(label, lane) {
if (!lane.name || typeof lane.name !== "string") {
errors.push(`${label}: Docker E2E lane is missing a string name`);
}
if (!lane.command || typeof lane.command !== "string") {
errors.push(`${label}: Docker E2E lane '${lane.name}' is missing a string command`);
return;
}
if (lane.e2eImageKind && lane.e2eImageKind !== "bare" && lane.e2eImageKind !== "functional") {
errors.push(
`${label}: Docker E2E lane '${lane.name}' has invalid image kind '${lane.e2eImageKind}'`,
);
}
if (lane.live && lane.e2eImageKind) {
errors.push(`${label}: live Docker E2E lane '${lane.name}' must not require a package image`);
}
if (!lane.live && !lane.e2eImageKind) {
errors.push(`${label}: package Docker E2E lane '${lane.name}' must declare an e2e image kind`);
}
if (laneWeight(lane) < 1) {
errors.push(`${label}: Docker E2E lane '${lane.name}' must have positive weight`);
}
if (!laneResources(lane).includes("docker")) {
errors.push(`${label}: Docker E2E lane '${lane.name}' must include the docker resource`);
}
for (const match of lane.command.matchAll(/\bpnpm\s+([^\s]+)/gu)) {
const script = match[1];
if (!packageScripts.has(script)) {
errors.push(
`${label}: Docker E2E lane '${lane.name}' references missing package script '${script}'`,
);
}
}
}
const releasePathLanes = allReleasePathLanes({ includeOpenWebUI: true });
for (const [label, lanes] of [
["release-path", releasePathLanes],
["main", mainLanes],
["tail", tailLanes],
]) {
validateUniqueLanes(label, lanes);
for (const lane of lanes) {
validateLane(label, lane);
}
}
if (errors.length > 0) {
console.error(errors.join("\n"));
process.exit(1);
}
console.log("Docker E2E package boundary/catalog guard passed.");
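
The text checks in the guard above come down to three regular expressions. A small sketch of how they classify harness snippets follows; the patterns are copied from the script, while the helper `findBoundaryViolations` and the sample strings are hypothetical and exist only to exercise them.

```javascript
// Patterns copied from the guard script.
const SRC_STATIC_IMPORT = /from\s+["']\.\.\/\.\.\/src\//u;
const SRC_DYNAMIC_IMPORT = /import\(["']\.\.\/\.\.\/src\//u;
const REPO_ROOT_MOUNT = /-v\s+["']?\$ROOT_DIR:\/app(?::|["'\s]|$)/u;

// Hypothetical helper: returns the violation messages for one file's text.
function findBoundaryViolations(relativePath, text) {
  const errors = [];
  if (SRC_STATIC_IMPORT.test(text) || SRC_DYNAMIC_IMPORT.test(text)) {
    errors.push(`${relativePath}: Docker E2E harness must import built dist, not ../../src`);
  }
  if (REPO_ROOT_MOUNT.test(text)) {
    errors.push(`${relativePath}: do not mount the repo root as /app in Docker E2E`);
  }
  return errors;
}

// Made-up harness snippets.
const importsSource = 'import { run } from "../../src/cli.js";';
const mountsRepoRoot = 'docker run -v "$ROOT_DIR:/app" image';
const mountsReadOnly = "docker run -v $ROOT_DIR:/app:ro image";
const mountsSubdir = 'docker run -v "$ROOT_DIR/dist:/app/dist" image';
```

Mounting a subdirectory such as `$ROOT_DIR/dist` is not flagged by this pattern, because the pattern requires `$ROOT_DIR` to be followed directly by `:/app`.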


@@ -0,0 +1,96 @@
#!/usr/bin/env node
// Validates the npm tarball Docker E2E lanes install.
// This is intentionally tarball-only: the check proves Docker lanes consume the
// prebuilt package artifact with dist inventory, not a source checkout.
import { spawnSync } from "node:child_process";
import fs from "node:fs";
function usage() {
return "Usage: node scripts/check-openclaw-package-tarball.mjs <openclaw.tgz>";
}
function fail(message) {
console.error(message);
process.exit(1);
}
const tarball = process.argv[2];
if (!tarball || process.argv.length > 3) {
fail(usage());
}
if (!fs.existsSync(tarball)) {
fail(`OpenClaw package tarball does not exist: ${tarball}`);
}
const list = spawnSync("tar", ["-tf", tarball], {
encoding: "utf8",
stdio: ["ignore", "pipe", "pipe"],
});
if (list.status !== 0) {
fail(`tar -tf failed for ${tarball}: ${list.stderr || list.status}`);
}
const entries = list.stdout
.split(/\r?\n/u)
.map((entry) => entry.trim())
.filter(Boolean);
const normalized = entries.map((entry) => entry.replace(/^package\//u, ""));
const entrySet = new Set(normalized);
const errors = [];
function readTarEntry(entryPath) {
const candidates = [entryPath, `package/${entryPath}`];
for (const candidate of candidates) {
const result = spawnSync("tar", ["-xOf", tarball, candidate], {
encoding: "utf8",
stdio: ["ignore", "pipe", "pipe"],
});
if (result.status === 0) {
return result.stdout;
}
}
return "";
}
for (const entry of normalized) {
if (entry.startsWith("/") || entry.split("/").includes("..")) {
errors.push(`unsafe tar entry: ${entry}`);
}
}
if (!entrySet.has("package.json")) {
errors.push("missing package.json");
}
if (!normalized.some((entry) => entry.startsWith("dist/"))) {
errors.push("missing dist/ entries");
}
if (!entrySet.has("dist/postinstall-inventory.json")) {
errors.push("missing dist/postinstall-inventory.json");
}
if (entrySet.has("dist/postinstall-inventory.json")) {
try {
const inventory = JSON.parse(readTarEntry("dist/postinstall-inventory.json"));
if (!Array.isArray(inventory) || inventory.some((entry) => typeof entry !== "string")) {
errors.push("invalid dist/postinstall-inventory.json");
} else {
for (const inventoryEntry of inventory) {
const normalizedEntry = inventoryEntry.replace(/\\/gu, "/");
if (!entrySet.has(normalizedEntry)) {
errors.push(`inventory references missing tar entry ${normalizedEntry}`);
}
}
}
} catch (error) {
errors.push(
`unreadable dist/postinstall-inventory.json: ${
error instanceof Error ? error.message : String(error)
}`,
);
}
}
if (errors.length > 0) {
fail(`OpenClaw package tarball integrity failed:\n${errors.join("\n")}`);
}
console.log("OpenClaw package tarball integrity passed.");
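
The entry checks in the tarball validator can be read as a pure function over the `tar -tf` listing and the parsed inventory. The sketch below restates them that way so the behavior can be tried without a real archive. `validateTarEntries` is a hypothetical helper, not part of the script, and the entry names are made up.

```javascript
// Hypothetical pure restatement of the validator's checks. The real script
// gets `entries` from `tar -tf` and `inventory` from the archived JSON file.
function validateTarEntries(entries, inventory) {
  const normalized = entries.map((entry) => entry.replace(/^package\//u, ""));
  const entrySet = new Set(normalized);
  const errors = [];
  for (const entry of normalized) {
    // Reject absolute paths and any ".." path segment.
    if (entry.startsWith("/") || entry.split("/").includes("..")) {
      errors.push(`unsafe tar entry: ${entry}`);
    }
  }
  if (!entrySet.has("package.json")) {
    errors.push("missing package.json");
  }
  if (!normalized.some((entry) => entry.startsWith("dist/"))) {
    errors.push("missing dist/ entries");
  }
  for (const inventoryEntry of inventory) {
    // Inventory paths may use backslashes; compare with forward slashes.
    const normalizedEntry = inventoryEntry.replace(/\\/gu, "/");
    if (!entrySet.has(normalizedEntry)) {
      errors.push(`inventory references missing tar entry ${normalizedEntry}`);
    }
  }
  return errors;
}

// Made-up listings.
const goodErrors = validateTarEntries(
  ["package/package.json", "package/dist/cli.js", "package/dist/postinstall-inventory.json"],
  ["dist\\cli.js"],
);
const badErrors = validateTarEntries(
  ["package/package.json", "package/dist/a.js", "package/../evil"],
  ["dist/missing.js"],
);
```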


@@ -30,6 +30,16 @@ function readEntrypoints() {
return new Set(entrypoints.filter((entry) => entry !== "index"));
}
+function readPrivateLocalOnlySubpaths() {
+  const subpaths = JSON.parse(
+    readFileSync(
+      path.join(repoRoot, "scripts/lib/plugin-sdk-private-local-only-subpaths.json"),
+      "utf8",
+    ),
+  );
+  return new Set(subpaths.filter((entry) => typeof entry === "string" && !entry.includes("/")));
+}
function parsePluginSdkSubpath(specifier) {
if (!specifier.startsWith("openclaw/plugin-sdk/")) {
return null;
@@ -51,6 +61,7 @@ function compareEntries(left, right) {
async function collectViolations() {
const entrypoints = readEntrypoints();
const exports = readPackageExports();
+  const privateLocalOnlySubpaths = readPrivateLocalOnlySubpaths();
const files = (await collectTypeScriptFilesFromRoots(scanRoots, { includeTests: true })).toSorted(
(left, right) =>
normalizeRepoPath(repoRoot, left).localeCompare(normalizeRepoPath(repoRoot, right)),
@@ -72,6 +83,9 @@ async function collectViolations() {
if (!subpath) {
return;
}
+    if (privateLocalOnlySubpaths.has(subpath)) {
+      return;
+    }
const missingFrom = [];
if (!entrypoints.has(subpath)) {
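
The added early return means imports of private, local-only SDK subpaths are no longer reported as missing from the public entrypoints. A rough sketch of that decision follows. Several parts are assumptions: the body of `parsePluginSdkSubpath` after the prefix check is not visible in the diff, so taking everything after the prefix is a guess; the real check also consults package exports, which is omitted here; and `classifySpecifier` plus the subpath names (`core`, `qa-runtime`, `unknown-thing`) are invented for illustration.

```javascript
const PLUGIN_SDK_PREFIX = "openclaw/plugin-sdk/";

// Assumption: the subpath is whatever follows the prefix. Only the prefix
// check of the real parsePluginSdkSubpath is visible in the diff.
function parsePluginSdkSubpath(specifier) {
  if (!specifier.startsWith(PLUGIN_SDK_PREFIX)) {
    return null;
  }
  return specifier.slice(PLUGIN_SDK_PREFIX.length) || null;
}

// Mirrors the filter in readPrivateLocalOnlySubpaths: keep single-segment strings.
function toPrivateLocalOnlySet(rawList) {
  return new Set(rawList.filter((entry) => typeof entry === "string" && !entry.includes("/")));
}

// Hypothetical classifier returning "skip", "ok", or "missing".
function classifySpecifier(specifier, entrypoints, privateLocalOnly) {
  const subpath = parsePluginSdkSubpath(specifier);
  if (!subpath || privateLocalOnly.has(subpath)) {
    return "skip";
  }
  return entrypoints.has(subpath) ? "ok" : "missing";
}

// Made-up configuration.
const entrypoints = new Set(["core"]);
const privateLocalOnly = toPrivateLocalOnlySet(["qa-runtime", "nested/path", 42]);
```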


@@ -0,0 +1,27 @@
#!/usr/bin/env node
// Runs local workflow sanity checks.
// Uses an installed actionlint when present, otherwise falls back to `go run`
// for the pinned version used by CI, then runs repo-specific composite guards.
import { spawnSync } from "node:child_process";

const ACTIONLINT_VERSION = "1.7.11";

function commandExists(command) {
  return spawnSync("bash", ["-lc", `command -v ${command}`], { stdio: "ignore" }).status === 0;
}

function run(command, args) {
  const result = spawnSync(command, args, { stdio: "inherit" });
  if (result.status !== 0) {
    process.exit(result.status ?? 1);
  }
}

if (commandExists("actionlint")) {
  run("actionlint", []);
} else {
  run("go", ["run", `github.com/rhysd/actionlint/cmd/actionlint@v${ACTIONLINT_VERSION}`]);
}
run("python3", ["scripts/check-composite-action-input-interpolation.py"]);
run("node", ["scripts/check-no-conflict-markers.mjs"]);
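
The actionlint fallback can be isolated as a small decision function, which makes it easy to see what gets spawned in each case. This is a sketch only: `resolveActionlintInvocation` is a hypothetical helper that does not exist in the script, and it returns the command rather than running it.

```javascript
const ACTIONLINT_VERSION = "1.7.11";

// Hypothetical pure helper: decides which command the script would spawn.
function resolveActionlintInvocation(hasInstalledActionlint, version = ACTIONLINT_VERSION) {
  if (hasInstalledActionlint) {
    return { command: "actionlint", args: [] };
  }
  // Fall back to running the pinned release through the Go toolchain.
  return {
    command: "go",
    args: ["run", `github.com/rhysd/actionlint/cmd/actionlint@v${version}`],
  };
}

const installed = resolveActionlintInvocation(true);
const fallback = resolveActionlintInvocation(false);
```

Note that an installed `actionlint` is used at whatever version it happens to be, while the fallback is pinned, so local and CI results may differ when the installed binary is older or newer.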


@@ -0,0 +1,259 @@
#!/usr/bin/env node
// Builds cheap rerun commands from a Docker E2E GitHub run or local summary.
// For GitHub runs, the script downloads Docker E2E artifacts, reads
// summary/failures JSON, and prints targeted workflow commands that prepare a
// fresh OpenClaw tarball for the same ref before running only failed lanes.
import { spawnSync } from "node:child_process";
import fs from "node:fs";
import os from "node:os";
import path from "node:path";
const DEFAULT_WORKFLOW = "openclaw-live-and-e2e-checks-reusable.yml";
function usage() {
return [
"Usage:",
" node scripts/docker-e2e-rerun.mjs <run-id|summary.json|failures.json> [--repo owner/repo] [--dir output-dir] [--workflow workflow.yml] [--ref ref]",
].join("\n");
}
function parseArgs(argv) {
const options = {
dir: "",
input: "",
ref: "",
repo: "",
workflow: DEFAULT_WORKFLOW,
};
for (let index = 0; index < argv.length; index += 1) {
const arg = argv[index];
if (arg === "--repo") {
options.repo = argv[(index += 1)] ?? "";
} else if (arg?.startsWith("--repo=")) {
options.repo = arg.slice("--repo=".length);
} else if (arg === "--dir") {
options.dir = argv[(index += 1)] ?? "";
} else if (arg?.startsWith("--dir=")) {
options.dir = arg.slice("--dir=".length);
} else if (arg === "--workflow") {
options.workflow = argv[(index += 1)] ?? "";
} else if (arg?.startsWith("--workflow=")) {
options.workflow = arg.slice("--workflow=".length);
} else if (arg === "--ref") {
options.ref = argv[(index += 1)] ?? "";
} else if (arg?.startsWith("--ref=")) {
options.ref = arg.slice("--ref=".length);
} else if (!options.input) {
options.input = arg;
} else {
throw new Error(`unknown argument: ${arg}\n${usage()}`);
}
}
if (!options.input || !options.workflow) {
throw new Error(usage());
}
return options;
}
function run(command, args, options = {}) {
const result = spawnSync(command, args, {
encoding: "utf8",
stdio: options.stdio ?? ["ignore", "pipe", "pipe"],
});
if (result.status !== 0) {
throw new Error(
`${command} ${args.join(" ")} failed with ${result.status ?? result.signal}\n${result.stderr}`,
);
}
return result.stdout;
}
function readJson(file) {
return JSON.parse(fs.readFileSync(file, "utf8"));
}
function shellQuote(value) {
return `'${String(value).replaceAll("'", "'\\''")}'`;
}
function ghWorkflowCommand(lanes, ref, workflow) {
return [
"gh workflow run",
shellQuote(workflow),
"-f",
`ref=${shellQuote(ref)}`,
"-f",
"include_repo_e2e=false",
"-f",
"include_release_path_suites=false",
"-f",
"include_openwebui=false",
"-f",
`docker_lanes=${shellQuote(lanes.join(" "))}`,
"-f",
"include_live_suites=false",
"-f",
"live_models_only=false",
].join(" ");
}
function detectRepo() {
return JSON.parse(run("gh", ["repo", "view", "--json", "nameWithOwner"])).nameWithOwner;
}
function findFiles(rootDir, basenames, out = []) {
for (const entry of fs.readdirSync(rootDir, { withFileTypes: true })) {
const file = path.join(rootDir, entry.name);
if (entry.isDirectory()) {
findFiles(file, basenames, out);
} else if (basenames.has(entry.name)) {
out.push(file);
}
}
return out;
}
function failedLaneEntriesFromJson(file, ref, workflow) {
const parsed = readJson(file);
const source = path.basename(file);
if (source === "failures.json" && Array.isArray(parsed.lanes)) {
return parsed.lanes
.filter((lane) => lane.name)
.map((lane) => ({
ghWorkflowCommand: lane.ghWorkflowCommand,
lane: lane.name,
localRerunCommand: lane.rerunCommand,
logFile: lane.logFile,
source: file,
status: lane.status,
}));
}
const lanes = Array.isArray(parsed.lanes) ? parsed.lanes : [];
return lanes
.filter((lane) => lane.status !== 0 && lane.name)
.map((lane) => ({
ghWorkflowCommand: ghWorkflowCommand([lane.name], ref, workflow),
lane: lane.name,
localRerunCommand: lane.rerunCommand,
logFile: lane.logFile,
source: file,
status: lane.status,
}));
}
function mergeByLane(entries) {
const byLane = new Map();
for (const entry of entries) {
if (!byLane.has(entry.lane)) {
byLane.set(entry.lane, entry);
}
}
return [...byLane.values()].toSorted((left, right) => left.lane.localeCompare(right.lane));
}
function downloadDockerArtifacts(runId, repo, outputDir) {
fs.mkdirSync(outputDir, { recursive: true });
const artifacts = JSON.parse(
run("gh", [
"api",
`repos/${repo}/actions/runs/${runId}/artifacts?per_page=100`,
"--jq",
".artifacts",
]),
);
const names = artifacts
.filter((artifact) => !artifact.expired && artifact.name.startsWith("docker-e2e-"))
.map((artifact) => artifact.name);
if (names.length === 0) {
throw new Error(`No docker-e2e-* artifacts found for run ${runId}`);
}
for (const name of names) {
run(
"gh",
["run", "download", String(runId), "--repo", repo, "--name", name, "--dir", outputDir],
{
stdio: "inherit",
},
);
}
return names;
}
function runInfo(runId, repo) {
return JSON.parse(
run("gh", [
"run",
"view",
String(runId),
"--repo",
repo,
"--json",
"databaseId,headSha,headBranch,status,conclusion,url,workflowName",
]),
);
}
function printEntries(entries, ref, workflow, run) {
if (run) {
console.log(`Run: ${run.url}`);
console.log(`Workflow: ${run.workflowName}`);
}
console.log(`Ref: ${ref}`);
console.log(
"Targeted GitHub reruns prepare a fresh OpenClaw npm tarball for that ref before lane execution.",
);
if (entries.length === 0) {
console.log("No failed Docker E2E lanes found.");
return;
}
console.log(`Failed lanes: ${entries.map((entry) => entry.lane).join(", ")}`);
console.log("");
console.log("Combined GitHub rerun:");
console.log(
ghWorkflowCommand(
entries.map((entry) => entry.lane),
ref,
workflow,
),
);
console.log("");
console.log("Per-lane GitHub reruns:");
for (const entry of entries) {
console.log(
`- ${entry.lane}: ${entry.ghWorkflowCommand || ghWorkflowCommand([entry.lane], ref, workflow)}`,
);
}
console.log("");
console.log("Local rerun starting points:");
for (const entry of entries) {
if (entry.localRerunCommand) {
console.log(`- ${entry.lane}: ${entry.localRerunCommand}`);
}
}
}
const options = parseArgs(process.argv.slice(2));
const isLocalJson = fs.existsSync(options.input) && fs.statSync(options.input).isFile();
if (isLocalJson) {
const ref = options.ref || process.env.GITHUB_SHA || "HEAD";
printEntries(
mergeByLane(failedLaneEntriesFromJson(options.input, ref, options.workflow)),
ref,
options.workflow,
);
} else {
const repo = options.repo || detectRepo();
const run = runInfo(options.input, repo);
const ref = options.ref || run.headSha || run.headBranch;
const outputDir =
options.dir || path.join(os.tmpdir(), `openclaw-docker-e2e-rerun-${options.input}`);
const artifactNames = downloadDockerArtifacts(options.input, repo, outputDir);
const files = findFiles(outputDir, new Set(["failures.json", "summary.json"]));
const entries = mergeByLane(
files.flatMap((file) => failedLaneEntriesFromJson(file, ref, options.workflow)),
);
console.log(`Artifacts: ${artifactNames.join(", ")}`);
console.log(`Downloaded: ${outputDir}`);
printEntries(entries, ref, options.workflow, run);
}
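
The core of the rerun helper is turning a summary's failed lanes into one shell-safe `gh workflow run` command. A reduced sketch of that path follows. `shellQuote` is copied from the script; `rerunCommand` is a hypothetical cut-down form of `ghWorkflowCommand` that keeps only the `ref` and `docker_lanes` inputs, and `failedLaneNames`, the workflow file name, and the lane names are made up for the example.

```javascript
// Copied from the script: wrap in single quotes and escape embedded ones.
function shellQuote(value) {
  return `'${String(value).replaceAll("'", "'\\''")}'`;
}

// Hypothetical reduced form of ghWorkflowCommand (ref and docker_lanes only).
function rerunCommand(workflow, ref, lanes) {
  return [
    "gh workflow run",
    shellQuote(workflow),
    "-f",
    `ref=${shellQuote(ref)}`,
    "-f",
    `docker_lanes=${shellQuote(lanes.join(" "))}`,
  ].join(" ");
}

// Hypothetical helper: named lanes with a non-zero status, deduplicated and sorted.
function failedLaneNames(summary) {
  const lanes = Array.isArray(summary.lanes) ? summary.lanes : [];
  const names = new Set(
    lanes.filter((lane) => lane.status !== 0 && lane.name).map((lane) => lane.name),
  );
  return [...names].sort((left, right) => left.localeCompare(right));
}

// Made-up summary payload.
const summary = {
  lanes: [
    { name: "onboard", status: 1 },
    { name: "gateway", status: 0 },
    { name: "install", status: 124 },
    { name: "onboard", status: 1 },
  ],
};
const failed = failedLaneNames(summary);
const command = rerunCommand("wf.yml", "abc123", failed);
```

Quoting every interpolated value matters here because refs and lane lists come from artifacts and CLI arguments, and the printed command is meant to be pasted into a shell.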


@@ -0,0 +1,130 @@
#!/usr/bin/env node
// Summarizes Docker E2E timing artifacts.
// Accepts scheduler summary.json or lane-timings.json so agents can see the
// slowest lanes and phase critical path before deciding what to rerun.
import fs from "node:fs";
function usage() {
return "Usage: node scripts/docker-e2e-timings.mjs <summary.json|lane-timings.json> [--limit N]";
}
function parseArgs(argv) {
const options = { file: "", limit: 12 };
for (let index = 0; index < argv.length; index += 1) {
const arg = argv[index];
if (arg === "--limit") {
options.limit = Number(argv[(index += 1)] ?? "");
} else if (arg?.startsWith("--limit=")) {
options.limit = Number(arg.slice("--limit=".length));
} else if (!options.file) {
options.file = arg;
} else {
throw new Error(`unknown argument: ${arg}\n${usage()}`);
}
}
if (!options.file || !Number.isInteger(options.limit) || options.limit < 1) {
throw new Error(usage());
}
return options;
}
function readJson(file) {
return JSON.parse(fs.readFileSync(file, "utf8"));
}
function seconds(value) {
return typeof value === "number" && Number.isFinite(value) ? value : 0;
}
function durationBetween(startedAt, finishedAt) {
if (!startedAt || !finishedAt) {
return 0;
}
const started = Date.parse(startedAt);
const finished = Date.parse(finishedAt);
if (!Number.isFinite(started) || !Number.isFinite(finished) || finished < started) {
return 0;
}
return Math.round((finished - started) / 1000);
}
function summarizeSummary(summary, limit) {
  const lanes = (Array.isArray(summary.lanes) ? summary.lanes : [])
    .map((lane) => ({
      imageKind: lane.imageKind ?? "",
      name: lane.name,
      seconds: seconds(lane.elapsedSeconds),
      status: lane.status === 0 ? "pass" : `fail ${lane.status}`,
      timedOut: lane.timedOut === true,
    }))
    .filter((lane) => lane.name)
    .toSorted((left, right) => right.seconds - left.seconds || left.name.localeCompare(right.name));
  const phases = (Array.isArray(summary.phases) ? summary.phases : [])
    .map((phase) => ({
      name: phase.name,
      seconds: seconds(phase.elapsedSeconds),
      status: phase.status ?? "",
    }))
    .filter((phase) => phase.name);
  const wallSeconds = durationBetween(summary.startedAt, summary.finishedAt);
  const totalLaneSeconds = lanes.reduce((total, lane) => total + lane.seconds, 0);
  const criticalPathSeconds =
    phases.reduce((total, phase) => total + phase.seconds, 0) ||
    wallSeconds ||
    lanes[0]?.seconds ||
    0;
  console.log(`Status: ${summary.status ?? "unknown"}`);
  if (wallSeconds > 0) {
    console.log(`Wall seconds: ${wallSeconds}`);
  }
  console.log(`Lane seconds total: ${totalLaneSeconds}`);
  console.log(`Approx critical path seconds: ${criticalPathSeconds}`);
  if (wallSeconds > 0 && totalLaneSeconds > 0) {
    console.log(`Approx parallelism: ${(totalLaneSeconds / wallSeconds).toFixed(1)}x`);
  }
  if (phases.length > 0) {
    console.log("");
    console.log("Phases:");
    for (const phase of phases.toSorted((left, right) => right.seconds - left.seconds)) {
      console.log(`- ${phase.name}: ${phase.seconds}s ${phase.status}`);
    }
  }
  console.log("");
  console.log(`Slowest lanes (top ${Math.min(limit, lanes.length)}):`);
  for (const lane of lanes.slice(0, limit)) {
    console.log(
      `- ${lane.name}: ${lane.seconds}s ${lane.status}${lane.timedOut ? " timeout" : ""}${
        lane.imageKind ? ` image=${lane.imageKind}` : ""
      }`,
    );
  }
}
function summarizeTimingStore(store, limit) {
  const lanes = Object.entries(store.lanes ?? {})
    .map(([name, lane]) => ({
      name,
      seconds: seconds(lane.durationSeconds),
      status: lane.status === 0 ? "pass" : `fail ${lane.status}`,
      updatedAt: lane.updatedAt ?? "",
    }))
    .toSorted((left, right) => right.seconds - left.seconds || left.name.localeCompare(right.name));
  console.log(`Updated: ${store.updatedAt ?? "unknown"}`);
  console.log(`Known lanes: ${lanes.length}`);
  console.log("");
  console.log(`Slowest lanes (top ${Math.min(limit, lanes.length)}):`);
  for (const lane of lanes.slice(0, limit)) {
    console.log(`- ${lane.name}: ${lane.seconds}s ${lane.status} ${lane.updatedAt}`.trim());
  }
}
const options = parseArgs(process.argv.slice(2));
const payload = readJson(options.file);
if (Array.isArray(payload.lanes)) {
  summarizeSummary(payload, options.limit);
} else if (payload.lanes && typeof payload.lanes === "object") {
  summarizeTimingStore(payload, options.limit);
} else {
  throw new Error(`Unsupported Docker E2E timing artifact: ${options.file}`);
}
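
To make the two accepted artifact shapes and the headline numbers concrete, here is a minimal sketch. It is not part of the diff. The sample lane and phase names and every number are hypothetical; only the field names (`lanes`, `phases`, `elapsedSeconds`, `durationSeconds`, `startedAt`, `finishedAt`) are taken from the script above. The helpers `artifactKind` and `headlineNumbers` are illustrative names, and the sketch leaves out the script's input validation.

```javascript
// Hypothetical sample data: names and numbers are invented for illustration.
const sampleSummary = {
  status: "failed",
  startedAt: "2026-04-26T10:00:00Z",
  finishedAt: "2026-04-26T10:05:00Z",
  lanes: [
    { name: "lane-a", elapsedSeconds: 240, status: 0, imageKind: "bare" },
    { name: "lane-b", elapsedSeconds: 180, status: 1, timedOut: true },
    { name: "lane-c", elapsedSeconds: 90, status: 0 },
  ],
  phases: [
    { name: "phase-1", elapsedSeconds: 60, status: "passed" },
    { name: "phase-2", elapsedSeconds: 240, status: "failed" },
  ],
};

// Same shape check as the script's entry point: an array of lanes means a
// scheduler summary, a plain object keyed by lane name means a timing store.
function artifactKind(payload) {
  if (Array.isArray(payload.lanes)) return "summary";
  if (payload.lanes && typeof payload.lanes === "object") return "timing-store";
  return "unsupported";
}

// Simplified restatement of the headline numbers. It skips the script's
// guards for non-numeric seconds and missing or reversed timestamps.
function headlineNumbers(summary) {
  const wallSeconds = Math.round(
    (Date.parse(summary.finishedAt) - Date.parse(summary.startedAt)) / 1000,
  );
  const totalLaneSeconds = summary.lanes.reduce((total, lane) => total + lane.elapsedSeconds, 0);
  const phaseSeconds = summary.phases.reduce((total, phase) => total + phase.elapsedSeconds, 0);
  const slowestLaneSeconds = Math.max(0, ...summary.lanes.map((lane) => lane.elapsedSeconds));
  return {
    wallSeconds,
    totalLaneSeconds,
    // The phase total wins; wall time and the slowest lane are fallbacks.
    criticalPathSeconds: phaseSeconds || wallSeconds || slowestLaneSeconds,
    parallelism: (totalLaneSeconds / wallSeconds).toFixed(1),
  };
}

const numbers = headlineNumbers(sampleSummary);
console.log(artifactKind(sampleSummary)); // summary
console.log(artifactKind({ lanes: { "lane-a": { durationSeconds: 240, status: 0 } } })); // timing-store
console.log(
  `wall=${numbers.wallSeconds} lanes=${numbers.totalLaneSeconds} ` +
    `critical=${numbers.criticalPathSeconds} parallelism=${numbers.parallelism}x`,
); // wall=300 lanes=510 critical=300 parallelism=1.7x
```

The "critical path" here is only an approximation: it is the sum of phase durations when phases are present, not a dependency-aware longest path.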

scripts/docker-e2e.mjs (normal file, 103 lines changed)

@@ -0,0 +1,103 @@
// Docker E2E CI helper.
// Converts scheduler JSON into GitHub Actions outputs and compact markdown
// summaries so the workflow does not duplicate Docker E2E planning logic.
import fs from "node:fs";
function usage() {
  return [
    "Usage:",
    " node scripts/docker-e2e.mjs github-outputs <plan.json>",
    " node scripts/docker-e2e.mjs summary <summary.json> <title>",
    " node scripts/docker-e2e.mjs failed-reruns <summary.json>",
  ].join("\n");
}
function readJson(file) {
  return JSON.parse(fs.readFileSync(file, "utf8"));
}
function boolOutput(value) {
  return value ? "1" : "0";
}
function githubOutputs(plan) {
  const needs = plan.needs ?? {};
  return [
    `credentials=${(plan.credentials ?? []).join(",")}`,
    `needs_bare_image=${boolOutput(needs.bareImage)}`,
    `needs_e2e_image=${boolOutput(needs.e2eImage)}`,
    `needs_functional_image=${boolOutput(needs.functionalImage)}`,
    `needs_live_image=${boolOutput(needs.liveImage)}`,
    `needs_package=${boolOutput(needs.package)}`,
  ];
}
function markdownCell(value) {
  return String(value ?? "").replaceAll("|", "\\|");
}
function inlineCode(value) {
  return `\`${String(value ?? "").replaceAll("`", "\\`")}\``;
}
function summaryMarkdown(summary, title) {
  const lanes = Array.isArray(summary.lanes) ? summary.lanes : [];
  const lines = [
    `### ${title}`,
    "",
    `Status: ${inlineCode(summary.status)}`,
    "",
    "| Lane | Status | Seconds | Timed out | Rerun |",
    "| --- | ---: | ---: | --- | --- |",
  ];
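  for (const lane of lanes) {
    const status = lane.status === 0 ? "pass" : `fail ${lane.status}`;
    // Escape pipes in the code-span cells too: GFM splits table cells on an
    // unescaped "|" even inside backticks, so a piped rerun command would break the row.
    lines.push(
      `| ${markdownCell(inlineCode(lane.name))} | ${markdownCell(status)} | ${markdownCell(lane.elapsedSeconds)} | ${lane.timedOut ? "yes" : "no"} | ${markdownCell(inlineCode(lane.rerunCommand))} |`,
    );
  }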
  const phases = Array.isArray(summary.phases) ? summary.phases : [];
  if (phases.length > 0) {
    lines.push("", "| Phase | Seconds | Status | Image kind |", "| --- | ---: | --- | --- |");
    for (const phase of phases) {
      lines.push(
        `| ${inlineCode(phase.name)} | ${markdownCell(phase.elapsedSeconds)} | ${markdownCell(phase.status)} | ${markdownCell(phase.imageKind)} |`,
      );
    }
  }
  const failedReruns = failedRerunCommands(summary);
  if (failedReruns.length > 0) {
    lines.push("", "Failed lane reruns:", "");
    for (const command of failedReruns) {
      lines.push(`- ${inlineCode(command)}`);
    }
  }
  return lines.join("\n");
}
function failedRerunCommands(summary) {
  const lanes = Array.isArray(summary.lanes) ? summary.lanes : [];
  return lanes
    .filter((lane) => lane.status !== 0 && lane.rerunCommand)
    .map((lane) => lane.rerunCommand);
}
const [command, file, ...args] = process.argv.slice(2);
if (!command || !file) {
  throw new Error(usage());
}
if (command === "github-outputs") {
  process.stdout.write(`${githubOutputs(readJson(file)).join("\n")}\n`);
} else if (command === "summary") {
  const title = args.join(" ").trim();
  if (!title) {
    throw new Error(usage());
  }
  process.stdout.write(`${summaryMarkdown(readJson(file), title)}\n`);
} else if (command === "failed-reruns") {
  process.stdout.write(`${failedRerunCommands(readJson(file)).join("\n")}\n`);
} else {
  throw new Error(`unknown command: ${command}\n${usage()}`);
}
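
As a rough illustration of what the `github-outputs` and `failed-reruns` subcommands emit, the sketch below re-declares the two pure functions from the helper above and runs them on invented inputs. It is not part of the diff. The credential names, lane names, and rerun command strings are hypothetical placeholders; only the field names (`credentials`, `needs`, `lanes`, `status`, `rerunCommand`) come from the script.

```javascript
function boolOutput(value) {
  return value ? "1" : "0";
}

// Copied from the helper above: one `name=value` line per workflow output.
function githubOutputs(plan) {
  const needs = plan.needs ?? {};
  return [
    `credentials=${(plan.credentials ?? []).join(",")}`,
    `needs_bare_image=${boolOutput(needs.bareImage)}`,
    `needs_e2e_image=${boolOutput(needs.e2eImage)}`,
    `needs_functional_image=${boolOutput(needs.functionalImage)}`,
    `needs_live_image=${boolOutput(needs.liveImage)}`,
    `needs_package=${boolOutput(needs.package)}`,
  ];
}

// Copied from the helper above: keep only failed lanes that carry a rerun command.
function failedRerunCommands(summary) {
  const lanes = Array.isArray(summary.lanes) ? summary.lanes : [];
  return lanes
    .filter((lane) => lane.status !== 0 && lane.rerunCommand)
    .map((lane) => lane.rerunCommand);
}

// Hypothetical plan: the credential names are placeholders.
const samplePlan = {
  credentials: ["cred-a", "cred-b"],
  needs: { e2eImage: true, package: true },
};
console.log(githubOutputs(samplePlan).join("\n"));
// credentials=cred-a,cred-b
// needs_bare_image=0
// needs_e2e_image=1
// needs_functional_image=0
// needs_live_image=0
// needs_package=1

// Hypothetical run: lane-c failed but has no rerun command, so it is skipped.
const sampleRun = {
  lanes: [
    { name: "lane-a", status: 0, rerunCommand: "rerun lane-a" },
    { name: "lane-b", status: 1, rerunCommand: "rerun lane-b" },
    { name: "lane-c", status: 124 },
  ],
};
console.log(failedRerunCommands(sampleRun).join("\n")); // rerun lane-b
```

In a GitHub Actions step, lines in this `name=value` form are normally appended to the file named by `$GITHUB_OUTPUT`. How this repository's workflow wires the helper in is not shown in this diff.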

@@ -1,13 +1,12 @@
# syntax=docker/dockerfile:1.7
-FROM node:24-bookworm-slim@sha256:b4687aef2571c632a1953695ce4d61d6462a7eda471fe6e272eebf0418f276ba
+FROM node:24-bookworm-slim@sha256:e8e2e91b1378f83c5b2dd15f0247f34110e2fe895f6ca7719dbb780f929368eb
ENV COREPACK_ENABLE_DOWNLOAD_PROMPT=0
RUN --mount=type=cache,id=openclaw-cleanup-smoke-apt-cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,id=openclaw-cleanup-smoke-apt-lists,target=/var/lib/apt,sharing=locked \
apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get upgrade -y --no-install-recommends \
&& apt-get install -y --no-install-recommends \
bash \
ca-certificates \

@@ -1,11 +1,10 @@
# syntax=docker/dockerfile:1.7
-FROM node:24-bookworm-slim@sha256:b4687aef2571c632a1953695ce4d61d6462a7eda471fe6e272eebf0418f276ba
+FROM node:24-bookworm-slim@sha256:e8e2e91b1378f83c5b2dd15f0247f34110e2fe895f6ca7719dbb780f929368eb
RUN --mount=type=cache,id=openclaw-install-sh-e2e-apt-cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,id=openclaw-install-sh-e2e-apt-lists,target=/var/lib/apt,sharing=locked \
apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get upgrade -y --no-install-recommends \
&& apt-get install -y --no-install-recommends \
bash \
ca-certificates \

Some files were not shown because too many files have changed in this diff.