Compare commits

25 Commits

Author SHA1 Message Date
Patrick Erichsen
c4c364cd27 fix: mark approval gateway calls as runtime clients 2026-05-17 21:24:42 -07:00
Peter Steinberger
5980c0d807 fix: wrap Mac menu gateway errors 2026-05-18 05:21:19 +01:00
Ayaan Zaidi
1c778f7afb fix(telegram): repair desktop proof login 2026-05-18 09:49:21 +05:30
Peter Steinberger
84b34519a8 fix: preflight remote skill bin probes 2026-05-18 05:19:02 +01:00
Peter Steinberger
71ed6526b1 ci: reduce aggregate runner jobs 2026-05-18 04:53:40 +01:00
Peter Steinberger
8483d03375 fix(gateway): preserve spawned sessions in configured lists 2026-05-18 04:38:14 +01:00
Peter Steinberger
696b4863c3 chore: quiet autoreview default fallback 2026-05-18 04:37:19 +01:00
Vincent Koc
a642ca9a89 ci(qa-lab): schedule live token efficiency artifacts 2026-05-18 11:33:13 +08:00
Vincent Koc
1300b22630 fix(qa-lab): classify runtime token efficiency 2026-05-18 11:09:08 +08:00
Peter Steinberger
29653e4106 fix: harden Mac gateway transport selection 2026-05-18 04:06:17 +01:00
Peter Steinberger
1ba3368fa6 fix: clean up Mac settings sidebar controls 2026-05-18 04:06:17 +01:00
Vincent Koc
4dec9679e6 fix(qa-lab): gate missing runtime tool coverage 2026-05-18 11:00:20 +08:00
Ayaan Zaidi
1ab84b4327 docs(changelog): note telegram 421 retry (#48908) (thanks @MarsDoge) 2026-05-18 08:28:27 +05:30
Dongyan Qian
63b728de43 fix(telegram): retry 421 misdirected request responses
Treat Telegram HTTP 421 / Misdirected Request responses as retryable transport failures in both the default channel API retry policy and the strict outbound send retry path.

Wire the 421 handling into isSafeToRetrySendError so non-idempotent Telegram send operations can retry this edge-node rejection without enabling broad ambiguous network retries, and add regression coverage for the default retry path plus strict send predicate handling.
2026-05-18 08:28:27 +05:30
Vincent Koc
73ca3cf3c3 test: tolerate optional ACP cron live timeout 2026-05-18 10:55:13 +08:00
Peter Steinberger
11d7499db1 feat: extend autoreview fallback reviewers 2026-05-18 03:49:23 +01:00
Galin Iliev
ad55d486ce fix(github-copilot): sanitize unsafe reasoning replay ids (#83221)
Fixes #83220.
2026-05-17 19:48:27 -07:00
Gio Della-Libera
1b5bc33161 fix(doctor): archive legacy clawd browser profile residue (#83230)
* fix(doctor): archive legacy clawd browser profile residue

* Avoid browser cleanup load without residue

Doctor --fix now skips loading the browser doctor facade unless the legacy browser/clawd profile path exists, preventing broad config repair tests from paying the plugin load cost when there is nothing to archive.

* Use structured health check for browser residue

Register the legacy clawd browser profile residue cleanup through the modern doctor health-check contract so doctor --lint can report it and doctor --fix repairs it through structured effects.
2026-05-17 19:45:03 -07:00
Gio Della-Libera
bcbe8b6299 fix(codex): surface declined native tool replies (#83108) 2026-05-17 19:43:19 -07:00
Galin Iliev
bc4f27c89a ci: skip changelog-only workflow runs (#83215)
Summary
Problem: root CHANGELOG.md updates currently cause broad pull request and push workflow activity, including CI and workflow sanity fanout, even though changelog-only edits do not touch product, runtime, docs site, or workflow logic.
Why it matters: the PR workflow (review, prepare, and land) can add or adjust CHANGELOG.md entries while processing otherwise-ready PRs. Those changelog-only updates retrigger gates, delay landing, and create avoidable contention when several PRs are being landed close together.
What changed: CI now ignores pull requests whose only changed path is CHANGELOG.md; Workflow Sanity ignores changelog-only pull requests and main-branch pushes; Docs keeps its markdown/docs trigger but excludes root CHANGELOG.md from the push path set.
What did NOT change (scope boundary): metadata-only automation such as labelers, auto-response, real behavior proof, or external GitHub apps can still run on PR events because those workflows are event-driven rather than file-scope CI. Other markdown files, docs files, and workflow files still trigger their existing checks.
2026-05-17 19:29:45 -07:00
Ayaan Zaidi
6baa2b38b2 ci(mantis): make telegram proof skips public-safe 2026-05-18 07:54:11 +05:30
Peter Steinberger
48f7db23f0 fix: harden clawpatch-reported edge cases 2026-05-18 03:18:55 +01:00
Tak Hoffman
816fbe0cf0 chore(labels): cool label palette (#83374)
* chore(labels): cool label palette

* chore(labels): soften taxonomy colors

* chore(labels): finalize label palette

* chore(labels): harden final palette
2026-05-17 21:12:10 -05:00
Peter Steinberger
69cea57f69 fix(telegram): fail closed on missing topic threads (#83381)
* fix(telegram): fail closed on missing topic threads

* docs(changelog): reference telegram topic cleanup
2026-05-18 03:07:12 +01:00
Vincent Koc
58e1351863 fix(qa-lab): hard gate runtime tool coverage 2026-05-18 10:05:04 +08:00
163 changed files with 4624 additions and 1293 deletions
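
The Telegram 421 retry commit above (63b728de43) describes treating HTTP 421 as a retryable transport failure even for non-idempotent sends. A minimal sketch of that idea in shell follows. The function names, the status set, and the convention that the send command prints a status code are assumptions for illustration; none of this is the project's actual `isSafeToRetrySendError` implementation.

```shell
#!/usr/bin/env bash
# Sketch only: names and the retryable set are assumptions for illustration.

# Succeeds (exit 0) when a failed send with this HTTP status may be retried.
is_retryable_send_status() {
  case "$1" in
    421) return 0 ;;  # Misdirected Request: named as retryable in the commit message
    *) return 1 ;;    # everything else stays non-retryable in this sketch
  esac
}

# Runs "$@" (a command that prints an HTTP status) up to max_attempts times.
send_with_retry() {
  local max_attempts=$1
  shift
  local attempt status
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    status=$("$@")
    [[ "$status" == 200 ]] && return 0
    is_retryable_send_status "$status" || return 1
  done
  return 1
}
```

Keeping the predicate narrow (a single status here) mirrors the commit's stated goal of retrying this one rejection without enabling broad ambiguous network retries.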

@@ -7,6 +7,8 @@ description: "Autoreview closeout: local dirty changes, PR branch vs main, paral
Run Codex's built-in code review as a closeout check. This is code review (`codex review`), not Guardian `auto_review` approval routing.
Codex native review mode performs best and is recommended. Non-Codex reviewers are fallback/second-opinion paths that receive a generated diff prompt, not the full Codex review-mode runtime.
Use when:
- user asks for Codex review / autoreview / second-model review
- after non-trivial code edits, before final/commit/ship
@@ -21,7 +23,7 @@ Use when:
- Prefer small fixes at the right ownership boundary; no refactor unless it clearly improves the bug class.
- Keep going until the selected review path returns no accepted/actionable findings.
- If a review-triggered fix changes code, rerun focused tests and rerun the review helper.
- - Default to Codex review. If Codex is unavailable or exits with an error, the helper may fall back to `claude -p`; `pi -p` and `opencode run` are explicit reviewer/fallback options. The helper runs nested Codex review in yolo/full-access mode by default; use `--no-yolo` only when intentionally testing sandbox behavior.
+ - Default to Codex review. If Codex is unavailable or exits with an error, the helper falls back to the first configured CLI from `claude -p`, `pi -p`, `opencode run`, `droid exec`, or `copilot`. Prefer Codex for final closeout because it uses native review mode; non-Codex reviewers use a Codex-inspired generated diff prompt. The helper runs nested Codex review in yolo/full-access mode by default; use `--no-yolo` only when intentionally testing sandbox behavior.
- Stop as soon as the review command/helper exits 0 with no accepted/actionable findings. Do not run an extra direct `codex review` just to get a nicer "clean" line, a second opinion, or clearer closeout wording.
- Treat the helper's successful exit plus absence of actionable findings as the clean review result, even if the underlying Codex CLI output is terse.
- If rejecting a finding as intentional/not worth fixing, add a brief inline code comment only when it explains a real invariant or ownership decision that future reviewers should know.
@@ -107,8 +109,8 @@ The helper:
- otherwise uses `origin/main` for non-main branches
- use `--mode commit --commit <ref>` for already-committed work, especially clean `main` after landing
- should be left in `--mode auto` or forced to `--mode branch` for PR/branch work; do not force `--mode local` after committing
- - supports `--reviewer codex|claude|pi|opencode|auto`; `auto` runs Codex first
- - supports `--fallback-reviewer claude|pi|opencode|none`; default is `claude`
+ - supports `--reviewer codex|claude|pi|opencode|droid|copilot|auto`; `auto` means Codex first
+ - supports `--fallback-reviewer auto|claude|pi|opencode|droid|copilot|none`; default is configured CLI fallback
- falls back only when Codex is unavailable or exits nonzero, not when Codex reports findings
- writes only to stdout unless `--output` or `AUTOREVIEW_OUTPUT` is set
- supports `--dry-run`, `--parallel-tests`, and commit refs
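
The fallback rule described above (try Codex first, then the first configured CLI in a fixed order) can be sketched as a small standalone function. This is an illustrative sketch, not the helper itself: `first_available_cli` is a hypothetical name, and "configured" is assumed here to mean "found on `PATH` via `command -v`".

```shell
#!/usr/bin/env bash
# Sketch only: first_available_cli is a hypothetical helper name.

# Prints the first candidate binary found on PATH; returns 127 when none exists.
first_available_cli() {
  local candidate
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 127
}

# Example: probe the fallback order listed in the skill text.
# first_available_cli claude pi opencode droid copilot
```

Returning 127 when nothing is found matches the conventional shell "command not found" status, so a caller can tell "no fallback installed" apart from "fallback ran and reported findings".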

@@ -10,14 +10,16 @@ Options:
Target selection. Default: auto.
--base REF Base ref for branch review. Default: PR base or origin/main.
--commit REF Commit ref for commit review. Default: HEAD.
- --reviewer codex|claude|pi|opencode|auto
- Review engine. Default: auto (Codex, fallback reviewer on error).
- --fallback-reviewer claude|pi|opencode|none
- Fallback when Codex is unavailable or exits nonzero. Default: claude.
+ --reviewer codex|claude|pi|opencode|droid|copilot|auto
+ Review engine. Default: Codex with configured fallback on error.
+ --fallback-reviewer auto|claude|pi|opencode|droid|copilot|none
+ Fallback when Codex is unavailable or exits nonzero. Default: auto.
--codex-bin PATH Codex binary. Default: codex.
--claude-bin PATH Claude binary. Default: claude.
--pi-bin PATH Pi binary. Default: pi.
--opencode-bin PATH OpenCode binary. Default: opencode.
+ --droid-bin PATH Droid binary. Default: droid.
+ --copilot-bin PATH GitHub Copilot binary. Default: copilot.
--full-access Keep yolo/full-access mode enabled. Default.
--no-yolo Run nested Codex review with normal sandbox/approval prompts.
--output FILE Also save output to file.
@@ -37,11 +39,13 @@ mode=auto
base_ref=
commit_ref=HEAD
reviewer=${AUTOREVIEW_REVIEWER:-${CODEX_REVIEW_REVIEWER:-auto}}
- fallback_reviewer=${AUTOREVIEW_FALLBACK_REVIEWER:-${CODEX_REVIEW_FALLBACK_REVIEWER:-claude}}
+ fallback_reviewer=${AUTOREVIEW_FALLBACK_REVIEWER:-${CODEX_REVIEW_FALLBACK_REVIEWER:-auto}}
codex_bin=${CODEX_BIN:-codex}
claude_bin=${CLAUDE_BIN:-claude}
pi_bin=${PI_BIN:-pi}
opencode_bin=${OPENCODE_BIN:-opencode}
+ droid_bin=${DROID_BIN:-droid}
+ copilot_bin=${COPILOT_BIN:-copilot}
codex_args=()
yolo=${AUTOREVIEW_YOLO:-${CODEX_REVIEW_YOLO:-1}}
output=${AUTOREVIEW_OUTPUT:-${CODEX_REVIEW_OUTPUT:-}}
@@ -86,6 +90,14 @@ while [[ $# -gt 0 ]]; do
opencode_bin=${2:-}
shift 2
;;
--droid-bin)
droid_bin=${2:-}
shift 2
;;
--copilot-bin)
copilot_bin=${2:-}
shift 2
;;
--full-access)
yolo=1
shift
@@ -131,7 +143,7 @@ case "$mode" in
esac
case "$reviewer" in
- auto|codex|claude|pi|opencode) ;;
+ auto|codex|claude|pi|opencode|droid|copilot) ;;
*)
echo "invalid --reviewer: $reviewer" >&2
exit 2
@@ -139,7 +151,7 @@ case "$reviewer" in
esac
case "$fallback_reviewer" in
- claude|pi|opencode|none) ;;
+ auto|claude|pi|opencode|droid|copilot|none) ;;
*)
echo "invalid --fallback-reviewer: $fallback_reviewer" >&2
exit 2
@@ -194,10 +206,17 @@ printf 'branch: %s\n' "${current_branch:-detached}"
if [[ -n "$pr_url" ]]; then
printf 'pr: %s\n' "$pr_url"
fi
printf 'reviewer: %s\n' "$reviewer"
if [[ "$reviewer" == auto ]]; then
printf 'fallback-reviewer: %s\n' "$fallback_reviewer"
printf 'reviewer: codex\n'
else
printf 'reviewer: %s\n' "$reviewer"
fi
case "$reviewer" in
codex|auto) ;;
*)
printf 'note: Codex native review mode is the recommended and best-supported review path; %s uses a generated diff prompt.\n' "$reviewer"
;;
esac
if [[ "$reviewer" == auto || "$reviewer" == codex ]]; then
printf 'review:'
printf ' %q' "${review_cmd[@]}"
@@ -284,10 +303,14 @@ Base: ${base_ref:-}
Commit: ${commit_ref:-}
Rules:
- - Review only the diff below.
+ - Review the proposed code change as a closeout reviewer.
+ - Focus on the diff below. If your CLI exposes read-only repository tools, inspect surrounding code and tests to verify findings; never modify files.
- Do not modify files.
- - Prioritize correctness bugs, regressions, security issues, and missing tests.
- - Ignore speculative edge cases and broad rewrites.
+ - Report only discrete, actionable issues introduced by this change.
+ - Prioritize correctness, regressions, security, data loss, performance cliffs, and missing tests that would catch a real bug.
+ - Do not report pre-existing issues, speculative risks, broad rewrites, style nits, changelog gaps, or findings that depend on unstated assumptions.
+ - Identify the concrete scenario where the issue appears, and keep the line reference as small as possible.
+ - A finding should overlap changed code or clearly cite changed code as the cause.
- For each accepted/actionable finding, use exactly this format:
[P<0-3>] Short title
File: path:line
@@ -302,8 +325,15 @@ EOF
} > "$prompt_file" || return
}
reviewer_output_has_clean_marker() {
local path=$1
grep -Eq '^[^[:alnum:]]*autoreview clean: no accepted/actionable findings reported[[:space:]]*$' "$path"
}
run_prompt_reviewer() {
local selected=$1
local copilot_prompt=
local prompt_bytes=0
local reviewer_output
local status=0
@@ -343,13 +373,46 @@ run_prompt_reviewer() {
echo "fallback reviewer unavailable: $opencode_bin" >&2
status=127
elif printf 'fallback: opencode run\n' | tee -a "$review_output"; then
- "$opencode_bin" run --pure --dir "$(dirname "$prompt_file")" --file "$prompt_file" \
- "Review the attached prompt file. Do not modify files." 2>&1 | tee -a "$review_output" "$reviewer_output"
+ "$opencode_bin" run --pure --dir "$repo_root" \
+ "Review the attached prompt file. Do not modify files." \
+ --file "$prompt_file" 2>&1 | tee -a "$review_output" "$reviewer_output"
status=$?
else
status=$?
fi
;;
+ droid)
+ if ! command -v "$droid_bin" >/dev/null 2>&1; then
+ echo "fallback reviewer unavailable: $droid_bin" >&2
+ status=127
+ elif printf 'fallback: droid exec\n' | tee -a "$review_output"; then
+ "$droid_bin" exec --cwd "$repo_root" -f "$prompt_file" 2>&1 | tee -a "$review_output" "$reviewer_output"
+ status=$?
+ else
+ status=$?
+ fi
+ ;;
+ copilot)
+ if ! command -v "$copilot_bin" >/dev/null 2>&1; then
+ echo "fallback reviewer unavailable: $copilot_bin" >&2
+ status=127
+ elif printf 'fallback: copilot\n' | tee -a "$review_output"; then
+ prompt_bytes=$(wc -c < "$prompt_file" | tr -d '[:space:]')
+ if (( prompt_bytes > 120000 )); then
+ echo "copilot reviewer unavailable: generated prompt is too large for copilot -p; use codex, droid, or another file/stdin-capable reviewer" \
+ 2>&1 | tee -a "$review_output" "$reviewer_output"
+ status=1
+ else
+ copilot_prompt=$(< "$prompt_file")
+ "$copilot_bin" -C "$repo_root" --available-tools=none --stream off --output-format text --silent \
+ -p "$copilot_prompt" \
+ 2>&1 | tee -a "$review_output" "$reviewer_output"
+ status=$?
+ fi
+ else
+ status=$?
+ fi
+ ;;
*)
echo "unsupported prompt reviewer: $selected" >&2
status=2
@@ -360,7 +423,7 @@ run_prompt_reviewer() {
status=1
elif ! grep -q '[^[:space:]]' "$reviewer_output"; then
status=1
- elif ! grep -Fxq 'autoreview clean: no accepted/actionable findings reported' "$reviewer_output"; then
+ elif ! reviewer_output_has_clean_marker "$reviewer_output"; then
status=1
fi
fi
@@ -380,7 +443,7 @@ run_selected_review() {
fi
run_review
;;
- claude|pi|opencode)
+ claude|pi|opencode|droid|copilot)
run_prompt_reviewer "$selected"
;;
*)
@@ -390,6 +453,36 @@ run_selected_review() {
esac
}
fallback_reviewer_is_available() {
local selected=$1
case "$selected" in
claude) command -v "$claude_bin" >/dev/null 2>&1 ;;
pi) command -v "$pi_bin" >/dev/null 2>&1 ;;
opencode) command -v "$opencode_bin" >/dev/null 2>&1 ;;
droid) command -v "$droid_bin" >/dev/null 2>&1 ;;
copilot) command -v "$copilot_bin" >/dev/null 2>&1 ;;
*) return 1 ;;
esac
}
run_auto_fallback_review() {
local selected
if [[ "$fallback_reviewer" != auto ]]; then
run_selected_review "$fallback_reviewer"
return $?
fi
for selected in claude pi opencode droid copilot; do
if fallback_reviewer_is_available "$selected"; then
run_selected_review "$selected"
return $?
fi
done
echo "fallback reviewer unavailable: no configured fallback CLI found" >&2
return 127
}
run_auto_review() {
run_selected_review codex
local status=$?
@@ -405,8 +498,12 @@ run_auto_review() {
if [[ "$fallback_reviewer" == none ]]; then
return "$status"
fi
- printf 'autoreview warning: codex exited %s; falling back to %s\n' "$status" "$fallback_reviewer" >&2
- run_selected_review "$fallback_reviewer"
+ if [[ "$fallback_reviewer" == auto ]]; then
+ printf 'autoreview warning: codex exited %s; trying configured fallback reviewers\n' "$status" >&2
+ else
+ printf 'autoreview warning: codex exited %s; falling back to %s\n' "$status" "$fallback_reviewer" >&2
+ fi
+ run_auto_fallback_review
}
elapsed_since() {

@@ -16,8 +16,11 @@ Hard limits:
- Do not finish with tiny, cropped-wrong, off-bottom, or sidebar-heavy GIFs.
- Do not invent a generic proof. The proof must match the PR behavior.
- Do not force GIFs for internal-only, workflow-only, test-only, docs-only, or
- otherwise non-visual PRs. A no-visual-proof manifest is a successful outcome
- when GIFs would be misleading.
+ otherwise non-visual PRs. A no-visual-proof manifest is a successful workflow
+ outcome when GIFs would be misleading, but it is not proof that the PR passed.
+ - Keep public-facing manifest summaries short and user-domain. Do not mention
+ harness internals, mock-provider limits, secret/trust boundaries, local paths,
+ transcript seeding, or workflow implementation details in the summary.
Inputs are provided as environment variables:
@@ -42,9 +45,10 @@ Required workflow:
before/after. If it does not, write
`${MANTIS_OUTPUT_DIR}/mantis-evidence.json` with `comparison.pass: true`, no
artifacts, and a summary that starts with
- `Mantis did not generate before/after GIFs because`. Include the concrete
- reason in the summary. Use this manifest shape and do not create worktrees
- or start Crabbox for this case:
+ `Mantis did not generate before/after GIFs because`. Include a short
+ public reason, such as `the PR changes internal session bookkeeping rather
+ than Telegram-visible behavior`. Use this manifest shape and do not create
+ worktrees or start Crabbox for this case:
```json
{
@@ -73,6 +77,14 @@ Required workflow:
}
```
If the PR appears visual but proof is blocked by Telegram Desktop session
state, authorization, credentials, Crabbox, or another capture-infrastructure
issue, do not describe it as a no-visual PR. Write a manifest with
`comparison.pass: false`, skipped lanes, no artifacts, and a summary that
starts with `Mantis could not capture Telegram Desktop proof because`. The
publisher will keep that out of PR comments so the failure stays in the
workflow logs and artifacts.
4. Decide what Telegram message, mock model response, command, callback, button,
media, or sequence best proves the PR. Use `MANTIS_INSTRUCTIONS` as extra
maintainer guidance, not as a replacement for reading the PR.
@@ -134,4 +146,6 @@ Expected final state:
`Main` and `This PR`.
- No-visual-proof manifests contain no artifacts and have `comparison.pass:
true`.
- Capture-infrastructure failure manifests contain no artifacts and have
`comparison.pass: false`.
- The worktree can be dirty only under `.artifacts/`.
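
The two artifact-free manifest outcomes described in this file differ only in `comparison.pass` and the summary prefix. A minimal sketch of how a consumer might tell them apart is shown below. The function name and the three labels it prints are hypothetical, and the sketch takes the two values as plain arguments instead of parsing `mantis-evidence.json`.

```shell
#!/usr/bin/env bash
# Sketch only: classify_mantis_manifest and its output labels are hypothetical.

# Args: <comparison.pass value: true|false> <summary text>
classify_mantis_manifest() {
  local pass=$1 summary=$2
  local no_visual='Mantis did not generate before/after GIFs because'
  local blocked='Mantis could not capture Telegram Desktop proof because'
  if [[ "$pass" == true && "$summary" == "$no_visual"* ]]; then
    echo no-visual-proof      # non-visual PR: successful workflow outcome
  elif [[ "$pass" == false && "$summary" == "$blocked"* ]]; then
    echo capture-failure      # capture infrastructure blocked the proof
  else
    echo unrecognized
  fi
}
```

Requiring both the flag and the prefix to agree keeps a capture failure from being mistaken for a no-visual PR, which is the distinction the instructions above insist on.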

@@ -20,6 +20,8 @@ on:
- "docs/**"
pull_request:
types: [opened, reopened, synchronize, ready_for_review, converted_to_draft]
paths-ignore:
- "CHANGELOG.md"
permissions:
contents: read
@@ -641,6 +643,15 @@ jobs:
echo "${name}-result=${results[$name]}" >> "$GITHUB_OUTPUT"
done
failures=0
for name in channels core-support-boundary gateway-watch; do
if [ "${results[$name]}" = "failure" ]; then
echo "::error title=${name} failed::${name} failed"
failures=1
fi
done
exit "$failures"
- name: Upload gateway watch regression artifacts
if: always() && needs.preflight.outputs.run_check_additional == 'true'
uses: actions/upload-artifact@v7
@@ -828,28 +839,6 @@ jobs:
EOF
OPENCLAW_VITEST_INCLUDE_FILE="$include_file" pnpm test:contracts:plugins
checks-fast-plugin-contracts:
permissions:
contents: read
name: checks-fast-contracts-plugins
needs: [preflight, checks-fast-plugin-contracts-shard]
if: ${{ !cancelled() && always() && needs.preflight.outputs.run_plugin_contracts_shards == 'true' }}
runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
timeout-minutes: 5
steps:
- name: Verify plugin contract shards
env:
SHARD_RESULT: ${{ needs.checks-fast-plugin-contracts-shard.result }}
run: |
if [ "$SHARD_RESULT" = "cancelled" ]; then
echo "Plugin contract shards were cancelled, usually because a newer commit superseded this run." >&2
exit 1
fi
if [ "$SHARD_RESULT" != "success" ]; then
echo "Plugin contract shards failed: $SHARD_RESULT" >&2
exit 1
fi
checks-fast-channel-contracts-shard:
permissions:
contents: read
@@ -934,28 +923,6 @@ jobs:
EOF
OPENCLAW_VITEST_INCLUDE_FILE="$include_file" pnpm test:contracts:channels
checks-fast-channel-contracts:
permissions:
contents: read
name: checks-fast-contracts-channels
needs: [preflight, checks-fast-channel-contracts-shard]
if: ${{ !cancelled() && always() && needs.preflight.outputs.run_checks_fast == 'true' }}
runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
timeout-minutes: 5
steps:
- name: Verify channel contract shards
env:
SHARD_RESULT: ${{ needs.checks-fast-channel-contracts-shard.result }}
run: |
if [ "$SHARD_RESULT" = "cancelled" ]; then
echo "Channel contract shards were cancelled, usually because a newer commit superseded this run." >&2
exit 1
fi
if [ "$SHARD_RESULT" != "success" ]; then
echo "Channel contract shards failed: $SHARD_RESULT" >&2
exit 1
fi
checks-fast-protocol:
permissions:
contents: read
@@ -1021,38 +988,6 @@ jobs:
- name: Run protocol check
run: pnpm protocol:check
checks:
permissions:
contents: read
name: ${{ matrix.check_name }}
needs: [preflight, build-artifacts]
if: ${{ !cancelled() && always() && needs.preflight.outputs.run_checks == 'true' && needs.build-artifacts.result == 'success' }}
runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
timeout-minutes: 5
strategy:
fail-fast: false
matrix: ${{ fromJson(needs.preflight.outputs.checks_matrix) }}
steps:
- name: Verify ${{ matrix.task }} (${{ matrix.runtime }})
env:
TASK: ${{ matrix.task }}
CHANNELS_RESULT: ${{ needs.build-artifacts.outputs['channels-result'] }}
shell: bash
run: |
set -euo pipefail
case "$TASK" in
channels)
if [ "$CHANNELS_RESULT" != "success" ]; then
echo "Channel tests failed in build-artifacts: $CHANNELS_RESULT" >&2
exit 1
fi
;;
*)
echo "Unsupported checks task: $TASK" >&2
exit 1
;;
esac
checks-node-compat:
permissions:
contents: read
@@ -1240,63 +1175,6 @@ jobs:
}
EOF
checks-node-core-test-dist-shard:
permissions:
contents: read
name: ${{ matrix.check_name }}
needs: [preflight, build-artifacts]
if: ${{ !cancelled() && always() && needs.preflight.outputs.run_checks_node_core_dist == 'true' && needs.build-artifacts.result == 'success' }}
runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
timeout-minutes: 5
strategy:
fail-fast: false
matrix: ${{ fromJson(needs.preflight.outputs.checks_node_core_dist_matrix) }}
steps:
- name: Verify Node test shard
env:
CORE_SUPPORT_BOUNDARY_RESULT: ${{ needs.build-artifacts.outputs['core-support-boundary-result'] }}
SHARD_NAME: ${{ matrix.shard_name }}
shell: bash
run: |
set -euo pipefail
case "$SHARD_NAME" in
core-support-boundary)
if [ "$CORE_SUPPORT_BOUNDARY_RESULT" != "success" ]; then
echo "Core support boundary shard failed in build-artifacts: $CORE_SUPPORT_BOUNDARY_RESULT" >&2
exit 1
fi
;;
*)
echo "Unsupported built-artifact shard: $SHARD_NAME" >&2
exit 1
;;
esac
checks-node-core-test:
permissions:
contents: read
name: checks-node-core
needs: [preflight, checks-node-core-test-nondist-shard, checks-node-core-test-dist-shard]
if: ${{ !cancelled() && always() && needs.preflight.outputs.run_checks == 'true' }}
runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
timeout-minutes: 5
steps:
- name: Verify node test shards
env:
DIST_SHARD_RESULT: ${{ needs.checks-node-core-test-dist-shard.result }}
NONDIST_SHARD_RESULT: ${{ needs.checks-node-core-test-nondist-shard.result }}
RUN_DIST_SHARDS: ${{ needs.preflight.outputs.run_checks_node_core_dist }}
RUN_NONDIST_SHARDS: ${{ needs.preflight.outputs.run_checks_node_core_nondist }}
run: |
if [ "$RUN_NONDIST_SHARDS" = "true" ] && [ "$NONDIST_SHARD_RESULT" != "success" ]; then
echo "Node non-dist test shards failed: $NONDIST_SHARD_RESULT" >&2
exit 1
fi
if [ "$RUN_DIST_SHARDS" = "true" ] && [ "$DIST_SHARD_RESULT" != "success" ]; then
echo "Node dist test shards failed: $DIST_SHARD_RESULT" >&2
exit 1
fi
# Types, lint, and format check shards.
check-shard:
permissions:
@@ -1442,24 +1320,6 @@ jobs:
path: .artifacts/deadcode
if-no-files-found: ignore
check:
permissions:
contents: read
name: "check"
needs: [preflight, check-shard]
if: ${{ !cancelled() && always() && needs.preflight.outputs.run_check == 'true' }}
runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
timeout-minutes: 5
steps:
- name: Verify check shards
env:
SHARD_RESULT: ${{ needs.check-shard.result }}
run: |
if [ "$SHARD_RESULT" != "success" ]; then
echo "Check shards failed: $SHARD_RESULT" >&2
exit 1
fi
check-additional-shard:
permissions:
contents: read
@@ -1637,52 +1497,6 @@ jobs:
exit "$failures"
check-additional:
permissions:
contents: read
name: "check-additional"
needs: [preflight, check-additional-shard, build-artifacts]
if: ${{ !cancelled() && always() && needs.preflight.outputs.run_check_additional == 'true' }}
runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
timeout-minutes: 5
steps:
- name: Verify additional check shards
env:
SHARD_RESULT: ${{ needs.check-additional-shard.result }}
BUILD_ARTIFACTS_RESULT: ${{ needs.build-artifacts.result }}
GATEWAY_RESULT: ${{ needs.build-artifacts.outputs.gateway-watch-result }}
run: |
if [ "$SHARD_RESULT" != "success" ]; then
echo "Additional check shards failed: $SHARD_RESULT" >&2
exit 1
fi
if [ "$BUILD_ARTIFACTS_RESULT" != "success" ]; then
echo "Build artifact job failed: $BUILD_ARTIFACTS_RESULT" >&2
exit 1
fi
if [ "$GATEWAY_RESULT" != "success" ]; then
echo "Gateway topology check failed: $GATEWAY_RESULT" >&2
exit 1
fi
build-smoke:
permissions:
contents: read
name: "build-smoke"
needs: [preflight, build-artifacts]
if: ${{ !cancelled() && always() && needs.preflight.outputs.run_build_smoke == 'true' && (github.event_name != 'push' || needs.build-artifacts.result == 'success') }}
runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
timeout-minutes: 5
steps:
- name: Verify build smoke
env:
BUILD_ARTIFACTS_RESULT: ${{ needs.build-artifacts.result }}
run: |
if [ "$BUILD_ARTIFACTS_RESULT" != "success" ]; then
echo "Build smoke checks failed in build-artifacts: $BUILD_ARTIFACTS_RESULT" >&2
exit 1
fi
# Validate docs (format, lint, broken links) only when docs files changed.
check-docs:
permissions:

@@ -6,6 +6,7 @@ on:
paths:
- "**/*.md"
- "docs/**"
- "!CHANGELOG.md"
permissions:
contents: read

@@ -955,6 +955,57 @@ jobs:
retention-days: 14
if-no-files-found: warn
runtime_tool_coverage_release_checks:
name: Enforce QA Lab runtime tool coverage
needs: [resolve_target, qa_lab_runtime_parity_release_checks]
if: always() && contains(fromJSON('["all","qa","qa-parity"]'), needs.resolve_target.outputs.rerun_group)
runs-on: ubuntu-24.04
timeout-minutes: 15
permissions:
contents: read
actions: read
env:
OPENCLAW_BUILD_PRIVATE_QA: "1"
OPENCLAW_ENABLE_PRIVATE_QA_CLI: "1"
steps:
- name: Checkout selected ref
uses: actions/checkout@v6
with:
persist-credentials: false
ref: ${{ needs.resolve_target.outputs.revision }}
fetch-depth: 1
- name: Setup Node environment
uses: ./.github/actions/setup-node-env
with:
node-version: ${{ env.NODE_VERSION }}
pnpm-version: ${{ env.PNPM_VERSION }}
install-bun: "true"
- name: Download runtime parity artifacts
uses: actions/download-artifact@v4
with:
name: release-qa-runtime-parity-${{ needs.resolve_target.outputs.revision }}
path: .artifacts/qa-e2e/
- name: Enforce standard runtime tool coverage
run: |
set -euo pipefail
pnpm openclaw qa coverage \
--repo-root . \
--tools \
--summary .artifacts/qa-e2e/runtime-parity-standard/qa-suite-summary.json \
--output .artifacts/qa-e2e/runtime-parity-standard-report/qa-runtime-tool-coverage-report.md
- name: Upload runtime tool coverage artifacts
if: always()
uses: actions/upload-artifact@v4
with:
name: release-qa-runtime-tool-coverage-${{ needs.resolve_target.outputs.revision }}
path: .artifacts/qa-e2e/runtime-parity-standard-report/
retention-days: 14
if-no-files-found: warn
qa_live_matrix_release_checks:
name: Run QA Lab live Matrix lane
needs: [resolve_target]
@@ -1434,6 +1485,7 @@ jobs:
- qa_lab_parity_lane_release_checks
- qa_lab_parity_report_release_checks
- qa_lab_runtime_parity_release_checks
- runtime_tool_coverage_release_checks
- qa_live_matrix_release_checks
- qa_live_telegram_release_checks
- qa_live_discord_release_checks
@@ -1465,6 +1517,7 @@ jobs:
"qa_lab_parity_lane_release_checks=${{ needs.qa_lab_parity_lane_release_checks.result }}" \
"qa_lab_parity_report_release_checks=${{ needs.qa_lab_parity_report_release_checks.result }}" \
"qa_lab_runtime_parity_release_checks=${{ needs.qa_lab_runtime_parity_release_checks.result }}" \
"runtime_tool_coverage_release_checks=${{ needs.runtime_tool_coverage_release_checks.result }}" \
"qa_live_matrix_release_checks=${{ needs.qa_live_matrix_release_checks.result }}" \
"qa_live_telegram_release_checks=${{ needs.qa_live_telegram_release_checks.result }}" \
"qa_live_discord_release_checks=${{ needs.qa_live_discord_release_checks.result }}" \

@@ -229,6 +229,96 @@ jobs:
retention-days: 14
if-no-files-found: warn
run_live_runtime_token_efficiency:
name: Run live runtime token-efficiency lane
needs: [authorize_actor, validate_selected_ref]
if: github.event_name == 'schedule'
runs-on: blacksmith-8vcpu-ubuntu-2404
timeout-minutes: 45
environment: qa-live-shared
env:
QA_PARITY_CONCURRENCY: "1"
OPENCLAW_QA_TRANSPORT_READY_TIMEOUT_MS: "180000"
OPENCLAW_QA_REDACT_PUBLIC_METADATA: "1"
steps:
- name: Checkout selected ref
uses: actions/checkout@v6
with:
persist-credentials: false
ref: ${{ needs.validate_selected_ref.outputs.selected_revision }}
fetch-depth: 1
- name: Setup Node environment
uses: ./.github/actions/setup-node-env
with:
node-version: ${{ env.NODE_VERSION }}
pnpm-version: ${{ env.PNPM_VERSION }}
install-bun: "true"
- name: Validate required QA credential env
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
shell: bash
run: |
set -euo pipefail
if [[ -z "${OPENAI_API_KEY:-}" ]]; then
echo "Missing required OPENAI_API_KEY." >&2
exit 1
fi
- name: Build private QA runtime
env:
NODE_OPTIONS: --max-old-space-size=8192
run: pnpm build
- name: Run live runtime parity lane
id: run_lane
shell: bash
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
OPENCLAW_LIVE_OPENAI_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
set -euo pipefail
output_dir=".artifacts/qa-e2e/runtime-token-efficiency-live-${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}"
echo "output_dir=${output_dir}" >> "$GITHUB_OUTPUT"
pnpm openclaw qa suite \
--repo-root . \
--provider-mode live-frontier \
--runtime-parity-tier standard \
--runtime-parity-tier live-only \
--concurrency "${QA_PARITY_CONCURRENCY}" \
--model "${OPENCLAW_CI_OPENAI_MODEL}" \
--alt-model "${OPENCLAW_CI_OPENAI_MODEL}" \
--runtime-pair pi,codex \
--fast \
--allow-failures \
--output-dir "${output_dir}/runtime-suite"
- name: Generate live runtime token-efficiency report
if: always() && steps.run_lane.outcome != 'skipped' && steps.run_lane.outcome != 'cancelled'
shell: bash
run: |
set -euo pipefail
pnpm openclaw qa parity-report \
--repo-root . \
--runtime-axis \
--token-efficiency \
--summary "${{ steps.run_lane.outputs.output_dir }}/runtime-suite/qa-suite-summary.json" \
--output-dir "${{ steps.run_lane.outputs.output_dir }}/runtime-report"
- name: Upload live runtime token-efficiency artifacts
if: always()
uses: actions/upload-artifact@v4
with:
name: qa-live-runtime-token-efficiency-${{ github.run_id }}-${{ github.run_attempt }}
path: ${{ steps.run_lane.outputs.output_dir }}
retention-days: 14
if-no-files-found: warn
run_live_matrix:
name: Run Matrix live QA lane
needs: [authorize_actor, validate_selected_ref]

@@ -2,8 +2,12 @@ name: Workflow Sanity
on:
pull_request:
paths-ignore:
- "CHANGELOG.md"
push:
branches: [main]
paths-ignore:
- "CHANGELOG.md"
workflow_dispatch:
permissions:

@@ -24,15 +24,26 @@ Docs: https://docs.openclaw.ai
- QA-Lab: add live-only harness self-health scenarios for plugin hook crashes, manifest contract errors, and WebChat direct-reply self-message routing. (#80323) Thanks @100yenadmin.
- QA-Lab: add runtime tool fixture scenarios and coverage reporting for Codex-native workspace tools, OpenClaw dynamic tools, and optional plugin-backed tools. Fixes #80173. Thanks @100yenadmin.
- QA-Lab: expose runtime tool fixture coverage through `openclaw qa coverage --tools`, with optional suite-summary evaluation for parity gate artifacts. Thanks @100yenadmin.
- QA-Lab: schedule a live-frontier Codex-vs-Pi runtime token-efficiency artifact lane in the all-lanes QA workflow. Fixes #80175. Thanks @100yenadmin.
- QA-Lab: hard-gate required OpenClaw dynamic runtime-tool drift in the standard Codex-vs-Pi tier with a blocking release-check verifier and publish the tool coverage report artifact. Fixes #80339; refs #80319. Thanks @100yenadmin.
- QA-Lab: add the personal-agent approval-denial scenario so the benchmark pack verifies denied local reads stop cleanly without tool progress or fixture leaks. (#83150) Thanks @iFiras-Max1.
### Fixes
- Gateway/skills: preflight remote macOS skill-bin refreshes with a WebSocket connectivity check so stale node sessions skip quickly instead of logging slow `system.which` timeout warnings.
- GitHub Copilot: drop unsafe native Responses reasoning replay items with non-replayable IDs before dispatch, preventing affected Copilot sessions from failing with `invalid_request_body`. Fixes #83220. Thanks @galiniliev.
- QA-Lab: make runtime tool coverage fail on missing required tool exercise instead of treating pass/pass parity envelope drift as missing coverage.
- Core/plugins: harden clawpatch-reported edge cases across gateway auth cleanup, Claude session id paths, plugin activation policy, apply-patch hunk handling, diagnostic redaction, and plugin metadata validation.
- Mac app: prefer explicit private/Tailscale/LAN Gateway endpoints over SSH tunnels, preserve legacy loopback tunnel configs, persist transport choices, and show captured SSH stderr when tunneling really fails.
- Gateway/sessions: keep ACP/acpx and runtime child sessions visible in configured-only session lists when their owner or parent session belongs to a configured agent.
- Mac app: keep app-level menu commands and Dashboard failure states reachable when the remote Gateway is disconnected, and keep the Settings sidebar toggle in the leading titlebar area.
- Mac app: allow longer Gateway and Context errors to wrap in the menu instead of truncating the useful failure detail.
- Gateway/webchat: hide internal runtime-context and other `display: false` transcript messages from Chat history and live message events. Fixes #83216. Thanks @EmpireCreator.
- CLI/help: keep `gateway`, `doctor`, `status`, and `health` help registration out of action/runtime imports so subcommand `--help` stays lightweight in constrained terminals. Fixes #83228. Thanks @dfguerrerom.
- Cron/Discord: keep explicit announce runs in message-tool-only source-reply mode so scheduled agent turns post once instead of also echoing through automatic visible replies. Fixes #83261. Thanks @Theralley.
- Telegram: preserve forum-topic origin targets in inbound, audio-preflight, and skipped-message hook contexts so follow-up delivery stays bound to the originating topic. Fixes #83302. Thanks @M00zyx.
- Telegram: retry HTTP 421 Misdirected Request send failures on a fresh fallback transport so transient edge-node routing errors no longer drop outbound replies. Fixes #48892. (#48908) Thanks @MarsDoge.
- Telegram: fail topic sends closed when Telegram reports `message thread not found` instead of retrying without `message_thread_id` into the base chat. Refs #83302.
- Mac app: align the Sessions settings pane with the standard Settings page gutter and row spacing.
- OpenAI/Codex: stop rejecting available `openai-codex` GPT-5.1, GPT-5.2, and GPT-5.3 model refs during config validation, while keeping removed Spark aliases suppressed. Fixes #83303.
- Plugins/xAI: complete OAuth-backed xAI login and sidecar auth fixes, including guarded loopback callback CORS handling, video generation polling/defaults, and native-host User-Agent attribution. (#83322) Thanks @Jaaneek.
@@ -46,6 +57,7 @@ Docs: https://docs.openclaw.ai
- Agents/subagents: require the initial subagent registry save before reporting spawn accepted, returning a spawn error instead of losing an untracked run when the registry write fails. (#83146) Thanks @yetval.
- QA-Lab/qa-channel: attach redacted agent tool-start traces to outbound `QaBusMessage` records so scenarios can assert actual tool use instead of relying only on reply text. Fixes #67637. Thanks @100yenadmin.
- QA-Lab: fail live runtime parity reports when assistant-message usage is missing, preventing `0 vs 0` live token rows from being reported as passing proof. Fixes #80411. Thanks @100yenadmin.
- QA-Lab: add a runtime token-efficiency sidecar report that classifies Codex savings separately from regressions and fails only positive Codex-over-Pi live token deltas above threshold. Fixes #81093. Thanks @100yenadmin.
- QA-Lab: fail Codex-backed OpenAI live runtime-pair runs before launching isolated workers when no portable Codex auth is available, while staging API-key fallbacks and configured Codex keys for isolated QA agents. Fixes #80412. Thanks @100yenadmin.
- QA-Lab: refresh parity gates, mock frontier fixtures, model scenarios, and workflow artifact lanes to compare GPT-5.5 against Claude Opus 4.7. Fixes #74262. Thanks @100yenadmin.
- QA-Lab: make mock parity dispatch provider-aware for source discovery and subagent scenarios so OpenAI and Anthropic lanes no longer share identical canned plans. Fixes #64879. Thanks @100yenadmin.
@@ -81,12 +93,6 @@ Docs: https://docs.openclaw.ai
- Agents/OpenAI: stop post-processing GPT-5 final replies with hardcoded brevity caps, preserving full channel responses instead of appending synthetic ellipses, and log when strict-agentic GPT-5 execution activates. Fixes #82910.
- Mac app: refine the Settings General and Connection panes with cleaner status panels, card rows, and a single native titlebar sidebar toggle.
- Agents/media: deliver failed async image, music, and video generation completions directly when requester-session completion handoff fails, so channel users see provider errors instead of silent fallback stalls.
- CLI/setup: reject invalid `openclaw configure --section` values before opening the full wizard and show config issue details when non-interactive setup is blocked by invalid config.
- CLI/channels: reject unknown `openclaw channels logs --channel` values and invalid `--lines` values instead of silently showing all/default logs.
- CLI/agent: reject `--timeout` values with junk suffixes or fractions instead of partially parsing them.
- CLI/sessions: reject `--active` values with junk suffixes instead of partially parsing them.
- CLI/models: reject fractional `models scan --max-candidates` and `--concurrency` values before starting a scan.
- Config: label root-level `${VAR}` substitution failures as `<root>` instead of printing a blank config path.
- Agents/music: steer song, jingle, beat, anthem, and instrumental requests toward `music_generate` audio creation instead of lyric-only replies, and reserve `lyrics` for exact sung words.
- Codex app-server: record native Codex tool calls and results into trajectory artifacts so debug/trajectory exports capture the full Codex-native tool history, not just OpenClaw-bridged turns. Thanks @vyctorbrzezowski.
- Codex/app-server: keep bound conversation sessions on the owning agent runtime so native Codex control and follow-up turns do not fall back to the default agent client. Fixes #82954. (#82993)

@@ -363,9 +363,11 @@ final class AppState {
}
let configRoot = OpenClawConfigFile.loadDict()
let configRemoteUrl = GatewayRemoteConfig.resolveUrlString(root: configRoot)
let configRemoteToken = GatewayRemoteConfig.resolveTokenValue(root: configRoot)
let configRemoteTransport = GatewayRemoteConfig.resolveTransport(root: configRoot)
let configRemoteResolution = GatewayRemoteConfig.resolveTransportResolution(root: configRoot)
let configRemoteTransport = configRemoteResolution.transport
let configRemoteUrl = configRemoteResolution.directURL?.absoluteString
?? GatewayRemoteConfig.resolveUrlString(root: configRoot)
let resolvedConnectionMode = ConnectionModeResolver.resolve(root: configRoot).mode
self.remoteTransport = configRemoteTransport
self.connectionMode = resolvedConnectionMode
@@ -532,7 +534,10 @@ final class AppState {
}
case .ssh:
changed = Self.updateGatewayString(&remote, key: "transport", value: nil) || changed
changed = Self.updateGatewayString(
&remote,
key: "transport",
value: RemoteTransport.ssh.rawValue) || changed
let sanitizedTarget = Self.sanitizeSSHTarget(draft.remoteTarget)
let expectedRemoteHost = CommandResolver.parseSSHTarget(sanitizedTarget)?.host ?? draft.remoteHost
@@ -576,7 +581,8 @@ final class AppState {
let hasRemoteUrl = !(remoteUrl?
.trimmingCharacters(in: .whitespacesAndNewlines)
.isEmpty ?? true)
let remoteTransport = GatewayRemoteConfig.resolveTransport(root: root)
let remoteResolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
let remoteTransport = remoteResolution.transport
let desiredMode: ConnectionMode? = switch modeRaw {
case "local":
@@ -600,7 +606,7 @@ final class AppState {
if remoteTransport != self.remoteTransport {
self.remoteTransport = remoteTransport
}
let remoteUrlText = remoteUrl ?? ""
let remoteUrlText = remoteResolution.directURL?.absoluteString ?? remoteUrl ?? ""
if remoteUrlText != self.remoteUrl {
self.remoteUrl = remoteUrlText
}

@@ -23,7 +23,7 @@ struct ContextRootMenuLabelView: View {
if self.usesStackedLayout {
self.subtitleText
.lineLimit(3)
.lineLimit(5)
.fixedSize(horizontal: false, vertical: true)
}
}

@@ -265,9 +265,10 @@ final class ControlChannel {
private static func isLikelyLocalNetworkPermissionBlock() -> Bool {
let root = OpenClawConfigFile.loadDict()
let resolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
guard ConnectionModeResolver.resolve(root: root).mode == .remote,
GatewayRemoteConfig.resolveTransport(root: root) == .direct,
let url = GatewayRemoteConfig.resolveGatewayUrl(root: root),
resolution.transport == .direct,
let url = resolution.directURL,
url.scheme?.lowercased() == "ws",
let host = url.host,
GatewayRemoteConfig.isTrustedPlaintextRemoteHost(host),

@@ -115,9 +115,10 @@ final class DashboardManager {
private func immediateDashboardConfig(mode: AppState.ConnectionMode) -> GatewayConnection.Config? {
let root = OpenClawConfigFile.loadDict()
let resolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
if mode == .remote,
GatewayRemoteConfig.resolveTransport(root: root) == .direct,
let url = GatewayRemoteConfig.resolveGatewayUrl(root: root)
resolution.transport == .direct,
let url = resolution.directURL
{
return (
url,

@@ -41,21 +41,31 @@ enum GatewayDiscoveryHelpers {
static func directUrl(for gateway: GatewayDiscoveryModel.DiscoveredGateway) -> String? {
self.directGatewayUrl(
serviceHost: gateway.serviceHost,
servicePort: gateway.servicePort)
servicePort: gateway.servicePort,
gatewayTls: gateway.gatewayTls)
}
static func directGatewayUrl(
serviceHost: String?,
servicePort: Int?) -> String?
servicePort: Int?,
gatewayTls: Bool = false) -> String?
{
// Security: do not route using unauthenticated TXT hints (tailnetDns/lanHost/gatewayPort).
// Prefer the resolved service endpoint (SRV + A/AAAA).
guard let endpoint = self.serviceEndpoint(serviceHost: serviceHost, servicePort: servicePort) else {
return nil
}
// Security: for non-loopback hosts, force TLS to avoid plaintext credential/session leakage.
let scheme = self.isLoopbackHost(endpoint.host) ? "ws" : "wss"
let portSuffix = endpoint.port == 443 ? "" : ":\(endpoint.port)"
let scheme: String
if gatewayTls {
scheme = "wss"
} else if self.isLoopbackHost(endpoint.host)
|| GatewayRemoteConfig.isTrustedPlaintextRemoteHost(endpoint.host)
{
scheme = "ws"
} else {
return nil
}
let portSuffix = scheme == "wss" && endpoint.port == 443 ? "" : ":\(endpoint.port)"
return "\(scheme)://\(endpoint.host)\(portSuffix)"
}
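The scheme selection in the hunk above can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not the app's implementation: `sketchIsLoopback` and `sketchIsTrustedPlaintext` are hypothetical stand-ins for `isLoopbackHost` and `GatewayRemoteConfig.isTrustedPlaintextRemoteHost`, whose real rules are not shown in this diff, and the host suffixes they accept are chosen only for illustration.

```swift
import Foundation

// Hypothetical stand-in for the app's loopback check (assumption).
func sketchIsLoopback(_ host: String) -> Bool {
    let h = host.lowercased()
    return h == "localhost" || h == "127.0.0.1" || h == "::1"
}

// Hypothetical stand-in for the trusted-plaintext check (assumption):
// treats tailnet (.ts.net) and mDNS (.local) names as private hosts.
func sketchIsTrustedPlaintext(_ host: String) -> Bool {
    let h = host.lowercased()
    return h.hasSuffix(".ts.net") || h.hasSuffix(".local")
}

// Mirrors the decision order in the hunk: advertised TLS wins, plaintext is
// allowed only for loopback or trusted private hosts, anything else is refused.
func sketchDirectURL(host: String, port: Int, tls: Bool) -> String? {
    let scheme: String
    if tls {
        scheme = "wss"
    } else if sketchIsLoopback(host) || sketchIsTrustedPlaintext(host) {
        scheme = "ws"
    } else {
        return nil
    }
    // The port is omitted only for wss on 443; plaintext always keeps it.
    let portSuffix = scheme == "wss" && port == 443 ? "" : ":\(port)"
    return "\(scheme)://\(host)\(portSuffix)"
}
```

Returning `nil` for a public plaintext host, rather than silently upgrading to `wss`, leaves the fallback decision to the caller.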

@@ -25,14 +25,14 @@ enum GatewayDiscoverySelectionSupport {
state.remoteTarget = GatewayDiscoveryHelpers.sshTarget(for: gateway) ?? ""
if preferredTransport == .direct {
if let endpoint = GatewayDiscoveryHelpers.serviceEndpoint(for: gateway) {
OpenClawConfigFile.setRemoteGatewayUrl(
host: endpoint.host,
port: endpoint.port)
OpenClawConfigFile.setRemoteGatewayTransport(AppState.RemoteTransport.direct.rawValue)
if !state.remoteUrl.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty {
OpenClawConfigFile.setRemoteGatewayUrlString(state.remoteUrl)
} else {
OpenClawConfigFile.clearRemoteGatewayUrl()
}
} else {
OpenClawConfigFile.setRemoteGatewayTransport(AppState.RemoteTransport.ssh.rawValue)
OpenClawConfigFile.setRemoteGatewayUrlString(state.remoteUrl)
}
}
@@ -65,9 +65,10 @@ enum GatewayDiscoverySelectionSupport {
for gateway: GatewayDiscoveryModel.DiscoveredGateway) -> Bool
{
guard GatewayDiscoveryHelpers.directUrl(for: gateway) != nil else { return false }
if gateway.stableID.hasPrefix("tailscale-serve|") {
if gateway.gatewayTls || gateway.gatewayDirectReachable {
return true
}
guard let host = GatewayDiscoveryHelpers.resolvedServiceHost(for: gateway)?
.trimmingCharacters(in: .whitespacesAndNewlines)
.lowercased()

@@ -306,8 +306,9 @@ actor GatewayEndpointStore {
password: password))
case .remote:
let root = OpenClawConfigFile.loadDict()
if GatewayRemoteConfig.resolveTransport(root: root) == .direct {
guard let url = GatewayRemoteConfig.resolveGatewayUrl(root: root) else {
let resolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
if resolution.transport == .direct {
guard let url = resolution.directURL else {
self.cancelRemoteEnsure()
self.setState(.unavailable(
mode: .remote,
@@ -470,8 +471,9 @@ actor GatewayEndpointStore {
private func resolveDirectRemoteURL() throws -> URL? {
let root = OpenClawConfigFile.loadDict()
guard GatewayRemoteConfig.resolveTransport(root: root) == .direct else { return nil }
guard let url = GatewayRemoteConfig.resolveGatewayUrl(root: root) else {
let resolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
guard resolution.transport == .direct else { return nil }
guard let url = resolution.directURL else {
throw NSError(
domain: "GatewayEndpoint",
code: 1,

@@ -5,6 +5,18 @@ import Darwin
#endif
enum GatewayRemoteConfig {
enum TransportSource: Equatable {
case explicit
case inferredRemoteURL
case legacySSH
}
struct TransportResolution: Equatable {
let transport: AppState.RemoteTransport
let source: TransportSource
let directURL: URL?
}
enum TokenValue: Equatable {
case missing
case plaintext(String)
@@ -28,14 +40,49 @@ enum GatewayRemoteConfig {
}
static func resolveTransport(root: [String: Any]) -> AppState.RemoteTransport {
self.resolveTransportResolution(root: root).transport
}
static func resolveTransportResolution(root: [String: Any]) -> TransportResolution {
let explicit = self.resolveExplicitTransport(root: root)
switch explicit {
case .direct:
return TransportResolution(
transport: .direct,
source: .explicit,
directURL: self.resolveGatewayUrl(root: root))
case .ssh:
return TransportResolution(transport: .ssh, source: .explicit, directURL: nil)
case nil:
break
}
if let url = self.resolveGatewayUrl(root: root),
let host = url.host,
!LoopbackHost.isLoopbackHost(host)
{
return TransportResolution(transport: .direct, source: .inferredRemoteURL, directURL: url)
}
return TransportResolution(transport: .ssh, source: .legacySSH, directURL: nil)
}
private static func resolveExplicitTransport(root: [String: Any]) -> AppState.RemoteTransport? {
guard let gateway = root["gateway"] as? [String: Any],
let remote = gateway["remote"] as? [String: Any],
let raw = remote["transport"] as? String
else {
return .ssh
return nil
}
let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
return trimmed == AppState.RemoteTransport.direct.rawValue ? .direct : .ssh
switch trimmed {
case AppState.RemoteTransport.direct.rawValue:
return .direct
case AppState.RemoteTransport.ssh.rawValue:
return .ssh
default:
return .ssh
}
}
static func resolveUrlString(root: [String: Any]) -> String? {
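The resolution order introduced in this file (explicit transport first, then a non-loopback remote URL, then the legacy SSH default) can be illustrated with a self-contained sketch. The type and function names below are hypothetical, the raw strings `"direct"` and `"ssh"` are assumed values for `AppState.RemoteTransport`, and the loopback check is a simplified stand-in for `LoopbackHost.isLoopbackHost`.

```swift
import Foundation

enum SketchTransport: Equatable { case direct, ssh }
enum SketchSource: Equatable { case explicit, inferredRemoteURL, legacySSH }

struct SketchResolution: Equatable {
    let transport: SketchTransport
    let source: SketchSource
    let directURL: URL?
}

// Simplified loopback check (assumption; the real rules are not in this diff).
func sketchLoopbackHost(_ host: String) -> Bool {
    let h = host.lowercased()
    return h == "localhost" || h == "127.0.0.1" || h == "::1"
}

func sketchResolve(explicit: String?, urlString: String?) -> SketchResolution {
    let url = urlString.flatMap { URL(string: $0) }
    // 1. An explicit transport always wins; unknown values fall back to ssh.
    if let raw = explicit?.trimmingCharacters(in: .whitespacesAndNewlines).lowercased() {
        if raw == "direct" {
            return SketchResolution(transport: .direct, source: .explicit, directURL: url)
        }
        return SketchResolution(transport: .ssh, source: .explicit, directURL: nil)
    }
    // 2. No explicit value: a non-loopback URL implies a direct connection.
    if let url = url, let host = url.host, !sketchLoopbackHost(host) {
        return SketchResolution(transport: .direct, source: .inferredRemoteURL, directURL: url)
    }
    // 3. Otherwise keep the legacy behavior: a loopback URL means an SSH tunnel.
    return SketchResolution(transport: .ssh, source: .legacySSH, directURL: nil)
}
```

Recording the source alongside the transport lets callers distinguish a user's stored choice from an inference, which is what allows a legacy `ws://127.0.0.1:18789` config to keep tunneling until a transport is written explicitly.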

@@ -38,7 +38,7 @@ struct MenuHeaderCard<Content: View>: View {
.font(.caption)
.foregroundStyle(.secondary)
.multilineTextAlignment(.leading)
.lineLimit(3)
.lineLimit(5)
.truncationMode(.tail)
.fixedSize(horizontal: false, vertical: true)
}

@@ -301,6 +301,16 @@ enum OpenClawConfigFile {
}
}
static func setRemoteGatewayTransport(_ value: String) {
let trimmed = value.trimmingCharacters(in: .whitespacesAndNewlines)
guard !trimmed.isEmpty else { return }
self.updateGatewayDict { gateway in
var remote = gateway["remote"] as? [String: Any] ?? [:]
remote["transport"] = trimmed
gateway["remote"] = remote
}
}
static func clearRemoteGatewayUrl() {
self.updateGatewayDict { gateway in
guard var remote = gateway["remote"] as? [String: Any] else { return }

@@ -16,6 +16,32 @@ final class RemotePortTunnel: @unchecked Sendable {
let localPort: UInt16?
private let stderrHandle: FileHandle?
private final class StderrCapture: @unchecked Sendable {
private let lock = NSLock()
private var text = ""
private let limit = 4096
func append(_ chunk: String) {
let trimmed = chunk.trimmingCharacters(in: .whitespacesAndNewlines)
guard !trimmed.isEmpty else { return }
self.lock.lock()
defer { self.lock.unlock() }
if !self.text.isEmpty {
self.text += "\n"
}
self.text += trimmed
if self.text.count > self.limit {
self.text = String(self.text.suffix(self.limit))
}
}
func snapshot() -> String {
self.lock.lock()
defer { self.lock.unlock() }
return self.text.trimmingCharacters(in: .whitespacesAndNewlines)
}
}
private init(process: Process, localPort: UInt16?, stderrHandle: FileHandle?) {
self.process = process
self.localPort = localPort
@@ -93,6 +119,7 @@ final class RemotePortTunnel: @unchecked Sendable {
let pipe = Pipe()
process.standardError = pipe
let stderrHandle = pipe.fileHandleForReading
let stderrCapture = StderrCapture()
// Consume stderr so ssh cannot block if it logs.
stderrHandle.readabilityHandler = { handle in
@@ -106,6 +133,7 @@ final class RemotePortTunnel: @unchecked Sendable {
.trimmingCharacters(in: .whitespacesAndNewlines),
!line.isEmpty
else { return }
stderrCapture.append(line)
Self.logger.error("ssh tunnel stderr: \(line, privacy: .public)")
}
process.terminationHandler = { _ in
@@ -114,7 +142,11 @@ final class RemotePortTunnel: @unchecked Sendable {
try process.run()
try await Self.waitForListener(process: process, localPort: localPort, stderrHandle: stderrHandle)
try await Self.waitForListener(
process: process,
localPort: localPort,
stderrHandle: stderrHandle,
stderrCapture: stderrCapture)
// Track tunnel so we can clean up stale listeners on restart.
Task {
@@ -131,12 +163,13 @@ final class RemotePortTunnel: @unchecked Sendable {
private static func waitForListener(
process: Process,
localPort: UInt16,
stderrHandle: FileHandle) async throws
stderrHandle: FileHandle,
stderrCapture: StderrCapture) async throws
{
let deadline = Date().addingTimeInterval(6)
repeat {
if !process.isRunning {
let stderr = Self.drainStderr(stderrHandle)
let stderr = Self.drainStderr(stderrHandle, captured: stderrCapture.snapshot())
let msg = stderr.isEmpty ? "ssh tunnel exited before listening" : "ssh tunnel failed: \(stderr)"
throw NSError(domain: "RemotePortTunnel", code: 4, userInfo: [NSLocalizedDescriptionKey: msg])
}
@@ -152,7 +185,7 @@ final class RemotePortTunnel: @unchecked Sendable {
} while Date() < deadline
process.terminate()
let stderr = Self.drainStderr(stderrHandle)
let stderr = Self.drainStderr(stderrHandle, captured: stderrCapture.snapshot())
let msg = stderr.isEmpty ? "ssh tunnel did not open local port \(localPort)" : "ssh tunnel failed: \(stderr)"
throw NSError(domain: "RemotePortTunnel", code: 4, userInfo: [NSLocalizedDescriptionKey: msg])
}
@@ -311,16 +344,27 @@ final class RemotePortTunnel: @unchecked Sendable {
}
private static func drainStderr(_ handle: FileHandle) -> String {
self.drainStderr(handle, captured: "")
}
private static func drainStderr(_ handle: FileHandle, captured: String) -> String {
handle.readabilityHandler = nil
defer { try? handle.close() }
do {
let data = try handle.readToEnd() ?? Data()
return String(data: data, encoding: .utf8)?
let remaining = String(data: data, encoding: .utf8)?
.trimmingCharacters(in: .whitespacesAndNewlines) ?? ""
if captured.isEmpty {
return remaining
}
if remaining.isEmpty {
return captured
}
return captured + "\n" + remaining
} catch {
self.logger.debug("Failed to drain ssh stderr: \(error, privacy: .public)")
return ""
return captured
}
}
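The capture-and-merge behavior added in this file can be exercised in isolation. The sketch below is an illustration only: `SketchBoundedCapture` and `sketchMergeStderr` are hypothetical names, and the limit is a constructor parameter here so the truncation is easy to observe, whereas the diff fixes it at 4096 characters.

```swift
import Foundation

// Keeps only the most recent `limit` characters of appended text.
final class SketchBoundedCapture {
    private let lock = NSLock()
    private var text = ""
    private let limit: Int

    init(limit: Int) { self.limit = limit }

    func append(_ chunk: String) {
        let trimmed = chunk.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !trimmed.isEmpty else { return }
        lock.lock()
        defer { lock.unlock() }
        if !text.isEmpty { text += "\n" }
        text += trimmed
        // Drop the oldest characters once the buffer exceeds the limit.
        if text.count > limit {
            text = String(text.suffix(limit))
        }
    }

    func snapshot() -> String {
        lock.lock()
        defer { lock.unlock() }
        return text.trimmingCharacters(in: .whitespacesAndNewlines)
    }
}

// Joins text already consumed by the readability handler with whatever is
// still unread on the pipe, skipping whichever side is empty.
func sketchMergeStderr(captured: String, remaining: String) -> String {
    if captured.isEmpty { return remaining }
    if remaining.isEmpty { return captured }
    return captured + "\n" + remaining
}
```

Keeping the tail rather than the head of the buffer is a design choice that favors the last lines ssh printed before exiting, which are usually the ones that explain the failure.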

@@ -8,6 +8,7 @@ struct SettingsRootView: View {
@State private var monitoringPermissions = false
@State private var selectedTab: SettingsTab = .general
@State private var cachedTabs: Set<SettingsTab>
@State private var columnVisibility: NavigationSplitViewVisibility = .all
@State private var snapshotPaths: (configPath: String?, stateDir: String?) = (nil, nil)
let updater: UpdaterProviding?
private let isPreview = ProcessInfo.processInfo.isPreview
@@ -22,7 +23,7 @@ struct SettingsRootView: View {
}
var body: some View {
NavigationSplitView {
NavigationSplitView(columnVisibility: self.$columnVisibility) {
List(selection: self.$selectedTab) {
ForEach(self.visibleGroups) { group in
Section(group.title) {
@@ -46,19 +47,9 @@ struct SettingsRootView: View {
.padding(.horizontal, 22)
.padding(.vertical, 18)
}
.navigationSplitViewStyle(.balanced)
.frame(width: SettingsTab.windowWidth, height: SettingsTab.windowHeight, alignment: .topLeading)
.frame(maxWidth: .infinity, maxHeight: .infinity, alignment: .topLeading)
.toolbar(removing: .sidebarToggle)
.toolbar {
ToolbarItem(placement: .navigation) {
Button {
NSApp.sendAction(#selector(NSSplitViewController.toggleSidebar(_:)), to: nil, from: nil)
} label: {
Image(systemName: "sidebar.left")
}
.help("Show or hide sidebar")
}
}
.background(SettingsWindowChromeConfigurator())
.onReceive(NotificationCenter.default.publisher(for: .openclawSelectSettingsTab)) { note in
if let tab = note.object as? SettingsTab {

@@ -10,5 +10,6 @@ struct SettingsSidebarScroll<Content: View>: View {
.padding(.horizontal, 10)
}
.settingsSidebarCardLayout()
.padding(.leading, 16)
}
}

@@ -30,6 +30,8 @@ public final class GatewayDiscoveryModel {
public var tailnetDns: String?
public var sshPort: Int
public var gatewayPort: Int?
public var gatewayTls: Bool
public var gatewayDirectReachable: Bool
public var cliPath: String?
public var stableID: String
public var debugID: String
@@ -43,6 +45,8 @@ public final class GatewayDiscoveryModel {
tailnetDns: String? = nil,
sshPort: Int,
gatewayPort: Int? = nil,
gatewayTls: Bool = false,
gatewayDirectReachable: Bool = false,
cliPath: String? = nil,
stableID: String,
debugID: String,
@@ -55,6 +59,8 @@ public final class GatewayDiscoveryModel {
self.tailnetDns = tailnetDns
self.sshPort = sshPort
self.gatewayPort = gatewayPort
self.gatewayTls = gatewayTls
self.gatewayDirectReachable = gatewayDirectReachable
self.cliPath = cliPath
self.stableID = stableID
self.debugID = debugID
@@ -184,6 +190,8 @@ public final class GatewayDiscoveryModel {
tailnetDns: beacon.tailnetDns,
sshPort: beacon.sshPort ?? 22,
gatewayPort: beacon.gatewayPort,
gatewayTls: beacon.gatewayTls,
gatewayDirectReachable: beacon.gatewayDirectReachable,
cliPath: beacon.cliPath,
stableID: stableID,
debugID: "\(beacon.instanceName)@\(beacon.host):\(beacon.port)",
@@ -210,6 +218,8 @@ public final class GatewayDiscoveryModel {
tailnetDns: beacon.tailnetDns,
sshPort: 22,
gatewayPort: beacon.port,
gatewayTls: true,
gatewayDirectReachable: true,
cliPath: nil,
stableID: stableID,
debugID: "\(beacon.host):\(beacon.port)",
@@ -282,6 +292,8 @@ public final class GatewayDiscoveryModel {
tailnetDns: parsedTXT.tailnetDns,
sshPort: parsedTXT.sshPort,
gatewayPort: parsedTXT.gatewayPort,
gatewayTls: parsedTXT.gatewayTls,
gatewayDirectReachable: parsedTXT.gatewayDirectReachable,
cliPath: parsedTXT.cliPath,
stableID: stableID,
debugID: GatewayEndpointID.prettyDescription(result.endpoint),
@@ -445,6 +457,8 @@ public final class GatewayDiscoveryModel {
public var tailnetDns: String?
public var sshPort: Int
public var gatewayPort: Int?
public var gatewayTls: Bool
public var gatewayDirectReachable: Bool
public var cliPath: String?
}
@@ -453,6 +467,8 @@ public final class GatewayDiscoveryModel {
var tailnetDns: String?
var sshPort = 22
var gatewayPort: Int?
var gatewayTls = false
var gatewayDirectReachable = false
var cliPath: String?
if let value = txt["lanHost"] {
@@ -475,6 +491,14 @@ public final class GatewayDiscoveryModel {
{
gatewayPort = parsed
}
if let value = txt["gatewayTls"] {
let normalized = value.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
gatewayTls = normalized == "1" || normalized == "true" || normalized == "yes"
}
if let value = txt["gatewayDirectReachable"] {
let normalized = value.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
gatewayDirectReachable = normalized == "1" || normalized == "true" || normalized == "yes"
}
if let value = txt["cliPath"] {
let trimmed = value.trimmingCharacters(in: .whitespacesAndNewlines)
cliPath = trimmed.isEmpty ? nil : trimmed
@@ -485,6 +509,8 @@ public final class GatewayDiscoveryModel {
tailnetDns: tailnetDns,
sshPort: sshPort,
gatewayPort: gatewayPort,
gatewayTls: gatewayTls,
gatewayDirectReachable: gatewayDirectReachable,
cliPath: cliPath)
}

@@ -9,6 +9,8 @@ struct WideAreaGatewayBeacon: Equatable {
var lanHost: String?
var tailnetDns: String?
var gatewayPort: Int?
var gatewayTls: Bool
var gatewayDirectReachable: Bool
var sshPort: Int?
var cliPath: String?
}
@@ -83,6 +85,8 @@ enum WideAreaGatewayDiscovery {
lanHost: txt["lanHost"],
tailnetDns: txt["tailnetDns"],
gatewayPort: parseInt(txt["gatewayPort"]),
gatewayTls: parseBool(txt["gatewayTls"]),
gatewayDirectReachable: parseBool(txt["gatewayDirectReachable"]),
sshPort: parseInt(txt["sshPort"]),
cliPath: txt["cliPath"])
beacons.append(beacon)
@@ -246,6 +250,12 @@ enum WideAreaGatewayDiscovery {
return Int(trimmed)
}
private static func parseBool(_ value: String?) -> Bool {
guard let value else { return false }
let normalized = value.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
return normalized == "1" || normalized == "true" || normalized == "yes"
}
private static func isTailnetIPv4(_ value: String) -> Bool {
let parts = value.split(separator: ".")
if parts.count != 4 { return false }
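The TXT boolean parsing used for `gatewayTls` and `gatewayDirectReachable` accepts a small set of truthy spellings and treats everything else, including a missing key, as false. A minimal standalone sketch of that rule (the function name is hypothetical):

```swift
import Foundation

// Accepts "1", "true", or "yes" after trimming and lowercasing; any other
// value, or a missing value, is false.
func sketchParseTXTBool(_ value: String?) -> Bool {
    guard let value = value else { return false }
    let normalized = value.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
    return normalized == "1" || normalized == "true" || normalized == "yes"
}
```

Defaulting to false means a gateway that does not advertise these keys is never assumed to offer TLS or direct reachability.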

@@ -41,6 +41,8 @@ struct DiscoveryOutput: Encodable {
var tailnetDns: String?
var sshPort: Int
var gatewayPort: Int?
var gatewayTls: Bool
var gatewayDirectReachable: Bool
var cliPath: String?
var stableID: String
var debugID: String
@@ -106,6 +108,8 @@ func runDiscover(_ args: [String]) async {
tailnetDns: $0.tailnetDns,
sshPort: $0.sshPort,
gatewayPort: $0.gatewayPort,
gatewayTls: $0.gatewayTls,
gatewayDirectReachable: $0.gatewayDirectReachable,
cliPath: $0.cliPath,
stableID: $0.stableID,
debugID: $0.debugID,
@@ -139,6 +143,8 @@ func runDiscover(_ args: [String]) async {
if let port = gateway.gatewayPort {
print(" gatewayPort: \(port)")
}
print(" gatewayTls: \(gateway.gatewayTls)")
print(" gatewayDirectReachable: \(gateway.gatewayDirectReachable)")
if let cliPath = gateway.cliPath {
print(" cliPath: \(cliPath)")
}

@@ -51,7 +51,7 @@ struct AppStateRemoteConfigTests {
remoteTokenDirty: false))
#expect(remote["url"] as? String == "ws://127.0.0.1:18789")
#expect((remote["transport"] as? String) == nil)
#expect(remote["transport"] as? String == "ssh")
#expect(remote["sshTarget"] as? String == "alice@gateway.example")
}
@@ -161,6 +161,29 @@ struct AppStateRemoteConfigTests {
}
}
@Test
func `app state init preserves legacy SSH tunnel config until transport is explicit`() async {
let configPath = TestIsolation.tempConfigPath()
await TestIsolation.withIsolatedState(
env: ["OPENCLAW_CONFIG_PATH": configPath],
defaults: [remoteTargetKey: nil])
{
OpenClawConfigFile.saveDict([
"gateway": [
"mode": "remote",
"remote": [
"url": "ws://127.0.0.1:18789",
"sshTarget": "steipete@192.168.0.202",
],
],
])
let state = AppState(preview: true)
#expect(state.remoteTransport == .ssh)
#expect(state.remoteUrl == "ws://127.0.0.1:18789")
}
}
@Test
func `synced gateway root preserves object token across mode and transport changes when untouched`() {
let initialRoot: [String: Any] = [

@@ -10,7 +10,8 @@ struct GatewayDiscoveryHelpersTests {
lanHost: String? = "txt-host.local",
tailnetDns: String? = "txt-host.ts.net",
sshPort: Int = 22,
gatewayPort: Int? = 18789) -> GatewayDiscoveryModel.DiscoveredGateway
gatewayPort: Int? = 18789,
gatewayTls: Bool = false) -> GatewayDiscoveryModel.DiscoveredGateway
{
GatewayDiscoveryModel.DiscoveredGateway(
displayName: "Gateway",
@@ -20,6 +21,7 @@ struct GatewayDiscoveryHelpersTests {
tailnetDns: tailnetDns,
sshPort: sshPort,
gatewayPort: gatewayPort,
gatewayTls: gatewayTls,
cliPath: "/tmp/openclaw",
stableID: UUID().uuidString,
debugID: UUID().uuidString,
@@ -70,13 +72,14 @@ struct GatewayDiscoveryHelpersTests {
@Test func `direct url uses resolved service endpoint only`() {
let tlsGateway = self.makeGateway(
serviceHost: "resolved.example.ts.net",
servicePort: 443)
servicePort: 443,
gatewayTls: true)
#expect(GatewayDiscoveryHelpers.directUrl(for: tlsGateway) == "wss://resolved.example.ts.net")
let wsGateway = self.makeGateway(
serviceHost: "resolved.example.ts.net",
servicePort: 18789)
#expect(GatewayDiscoveryHelpers.directUrl(for: wsGateway) == "wss://resolved.example.ts.net:18789")
#expect(GatewayDiscoveryHelpers.directUrl(for: wsGateway) == "ws://resolved.example.ts.net:18789")
let localGateway = self.makeGateway(
serviceHost: "127.0.0.1",
@@ -84,6 +87,15 @@ struct GatewayDiscoveryHelpersTests {
#expect(GatewayDiscoveryHelpers.directUrl(for: localGateway) == "ws://127.0.0.1:18789")
}
@Test func `direct url rejects public plaintext service endpoint`() {
let gateway = self.makeGateway(
serviceHost: "gateway.example",
servicePort: 18789,
gatewayTls: false)
#expect(GatewayDiscoveryHelpers.directUrl(for: gateway) == nil)
}
@Test func `direct url rejects txt only fallback`() {
let gateway = self.makeGateway(
serviceHost: nil,

@@ -87,12 +87,16 @@ struct GatewayDiscoveryModelTests {
"tailnetDns": " peters-mac-studio-1.ts.net ",
"sshPort": " 2222 ",
"gatewayPort": " 18799 ",
"gatewayTls": " yes ",
"gatewayDirectReachable": " true ",
"cliPath": " /opt/openclaw ",
])
#expect(parsed.lanHost == "studio.local")
#expect(parsed.tailnetDns == "peters-mac-studio-1.ts.net")
#expect(parsed.sshPort == 2222)
#expect(parsed.gatewayPort == 18799)
#expect(parsed.gatewayTls)
#expect(parsed.gatewayDirectReachable)
#expect(parsed.cliPath == "/opt/openclaw")
}
@@ -107,6 +111,8 @@ struct GatewayDiscoveryModelTests {
#expect(parsed.tailnetDns == nil)
#expect(parsed.sshPort == 22)
#expect(parsed.gatewayPort == nil)
#expect(!parsed.gatewayTls)
#expect(!parsed.gatewayDirectReachable)
#expect(parsed.cliPath == nil)
}
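
The trimming and default behaviour these assertions pin down can be sketched roughly as follows. This is a TypeScript illustration, not the Swift parser: `parseGatewayTxtSketch` and its helpers are hypothetical names, the port range check is an assumption, and the accepted truthy tokens are limited to the two values the test shows (`yes`, `true`); the real parser may accept more.

```typescript
// Hypothetical sketch of TXT-record parsing; names and rules are assumptions
// inferred from the test expectations, not the Swift implementation.

interface ParsedTxt {
  tailnetDns: string | null;
  sshPort: number;
  gatewayPort: number | null;
  gatewayTls: boolean;
  gatewayDirectReachable: boolean;
  cliPath: string | null;
}

// Trim whitespace; treat missing or empty values as absent.
function trimmed(value: string | undefined): string | null {
  const t = (value ?? "").trim();
  return t.length > 0 ? t : null;
}

// Parse a decimal port; the 1..65535 range check is an assumption.
function parsePort(value: string | undefined): number | null {
  const t = trimmed(value);
  if (t === null || !/^\d+$/.test(t)) return null;
  const n = Number(t);
  return n >= 1 && n <= 65535 ? n : null;
}

// Only the tokens seen in the test ("yes", "true") count as true here.
function parseFlag(value: string | undefined): boolean {
  const t = trimmed(value)?.toLowerCase();
  return t === "true" || t === "yes";
}

function parseGatewayTxtSketch(txt: Record<string, string>): ParsedTxt {
  return {
    tailnetDns: trimmed(txt["tailnetDns"]),
    sshPort: parsePort(txt["sshPort"]) ?? 22, // tests expect 22 when unset
    gatewayPort: parsePort(txt["gatewayPort"]),
    gatewayTls: parseFlag(txt["gatewayTls"]),
    gatewayDirectReachable: parseFlag(txt["gatewayDirectReachable"]),
    cliPath: trimmed(txt["cliPath"]),
  };
}
```

Defaulting both flags to `false` when the key is absent matches the second test, where a record without `gatewayTls` or `gatewayDirectReachable` parses as not TLS and not directly reachable.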


@@ -11,6 +11,8 @@ struct GatewayDiscoverySelectionSupportTests {
servicePort: Int?,
tailnetDns: String? = nil,
sshPort: Int = 22,
gatewayTls: Bool = false,
gatewayDirectReachable: Bool = false,
stableID: String) -> GatewayDiscoveryModel.DiscoveredGateway
{
GatewayDiscoveryModel.DiscoveredGateway(
@@ -21,6 +23,8 @@ struct GatewayDiscoverySelectionSupportTests {
tailnetDns: tailnetDns,
sshPort: sshPort,
gatewayPort: servicePort,
gatewayTls: gatewayTls,
gatewayDirectReachable: gatewayDirectReachable,
cliPath: nil,
stableID: stableID,
debugID: UUID().uuidString,
@@ -40,6 +44,7 @@ struct GatewayDiscoverySelectionSupportTests {
serviceHost: tailnetHost,
servicePort: 443,
tailnetDns: tailnetHost,
gatewayTls: true,
stableID: "tailscale-serve|\(tailnetHost)"),
state: state)
@@ -61,6 +66,7 @@ struct GatewayDiscoverySelectionSupportTests {
serviceHost: tailnetHost,
servicePort: 443,
tailnetDns: tailnetHost,
gatewayTls: true,
stableID: "wide-area|openclaw.internal.|gateway-host"),
state: state)
@@ -69,12 +75,33 @@ struct GatewayDiscoverySelectionSupportTests {
}
}
@Test func `selecting nearby lan gateway keeps ssh transport`() async {
@Test func `legacy tailnet discovery without reachability flags still switches to direct transport`() async {
let tailnetHost = "gateway-host.tailnet-example.ts.net"
let configPath = TestIsolation.tempConfigPath()
await TestIsolation.withEnvValues(["OPENCLAW_CONFIG_PATH": configPath]) {
let state = AppState(preview: true)
state.remoteTransport = .ssh
GatewayDiscoverySelectionSupport.applyRemoteSelection(
gateway: self.makeGateway(
serviceHost: tailnetHost,
servicePort: 18789,
tailnetDns: tailnetHost,
stableID: "wide-area|openclaw.internal.|gateway-host"),
state: state)
#expect(state.remoteTransport == .direct)
#expect(state.remoteUrl == "ws://\(tailnetHost):18789")
}
}
@Test func `selecting nearby lan gateway keeps ssh without direct reachability signal`() async {
let configPath = TestIsolation.tempConfigPath()
await TestIsolation.withEnvValues(["OPENCLAW_CONFIG_PATH": configPath]) {
let state = AppState(preview: true)
state.remoteTransport = .ssh
state.remoteTarget = "user@old-host"
state.remoteUrl = "ws://localhost:29876"
GatewayDiscoverySelectionSupport.applyRemoteSelection(
gateway: self.makeGateway(
@@ -84,16 +111,17 @@ struct GatewayDiscoverySelectionSupportTests {
state: state)
#expect(state.remoteTransport == .ssh)
#expect(state.remoteUrl == "ws://127.0.0.1:18789")
#expect(state.remoteUrl == "ws://127.0.0.1:29876")
#expect(CommandResolver.parseSSHTarget(state.remoteTarget)?.host == "nearby-gateway.local")
let configRoot = OpenClawConfigFile.loadDict()
let remote = ((configRoot["gateway"] as? [String: Any])?["remote"] as? [String: Any]) ?? [:]
#expect(remote["url"] as? String == "ws://127.0.0.1:18789")
#expect(remote["transport"] as? String == "ssh")
#expect(remote["url"] as? String == "ws://127.0.0.1:29876")
}
}
@Test func `selecting nearby lan gateway preserves existing ssh tunnel port`() async {
@Test func `selecting direct reachable lan gateway ignores stale local tunnel port`() async {
let configPath = TestIsolation.tempConfigPath()
await TestIsolation.withEnvValues(["OPENCLAW_CONFIG_PATH": configPath]) {
let state = AppState(preview: true)
@@ -104,15 +132,17 @@ struct GatewayDiscoverySelectionSupportTests {
gateway: self.makeGateway(
serviceHost: "nearby-gateway.local",
servicePort: 19999,
gatewayDirectReachable: true,
stableID: "bonjour|nearby-gateway-custom"),
state: state)
#expect(state.remoteTransport == .ssh)
#expect(state.remoteUrl == "ws://127.0.0.1:29876")
#expect(state.remoteTransport == .direct)
#expect(state.remoteUrl == "ws://nearby-gateway.local:19999")
let configRoot = OpenClawConfigFile.loadDict()
let remote = ((configRoot["gateway"] as? [String: Any])?["remote"] as? [String: Any]) ?? [:]
#expect(remote["url"] as? String == "ws://127.0.0.1:29876")
#expect(remote["transport"] as? String == "direct")
#expect(remote["url"] as? String == "ws://nearby-gateway.local:19999")
}
}
}


@@ -315,6 +315,54 @@ struct GatewayEndpointStoreTests {
#expect(url?.absoluteString == "ws://100.123.224.76:18789")
}
@Test func `missing transport infers direct from private remote URL`() {
let root: [String: Any] = [
"gateway": [
"remote": [
"url": "ws://192.168.0.202:18789",
],
],
]
let resolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
#expect(resolution.transport == .direct)
#expect(resolution.source == .inferredRemoteURL)
#expect(resolution.directURL?.absoluteString == "ws://192.168.0.202:18789")
}
@Test func `legacy loopback URL keeps SSH even with trusted SSH target`() {
let root: [String: Any] = [
"gateway": [
"remote": [
"url": "ws://127.0.0.1:18789",
"sshTarget": "steipete@192.168.0.202",
],
],
]
let resolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
#expect(resolution.transport == .ssh)
#expect(resolution.source == .legacySSH)
#expect(resolution.directURL == nil)
}
@Test func `explicit ssh keeps legacy tunnel even when target is direct capable`() {
let root: [String: Any] = [
"gateway": [
"remote": [
"transport": "ssh",
"url": "ws://127.0.0.1:18789",
"sshTarget": "steipete@192.168.0.202",
],
],
]
let resolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
#expect(resolution.transport == .ssh)
#expect(resolution.source == .explicit)
#expect(resolution.directURL == nil)
}
@Test func `normalize gateway url rejects public host ws`() {
let url = GatewayRemoteConfig.normalizeGatewayUrl("ws://gateway.example:18789")
#expect(url == nil)
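
Taken together, the transport-resolution and URL-normalization tests in this hunk suggest a decision order that can be sketched as below. This is a TypeScript illustration under stated assumptions, not the Swift code in `GatewayRemoteConfig`: every function name is hypothetical, the trusted-host rules (loopback, private IPv4, the 100.64.0.0/10 range, `.local`, `.ts.net`) are inferred from hosts that appear in the tests, and the explicit `direct` branch is not exercised by the tests shown here.

```typescript
// Hypothetical sketch; names and host rules are assumptions inferred from the
// test expectations, not the Swift implementation.

type Transport = "ssh" | "direct";
type Source = "explicit" | "inferredRemoteURL" | "legacySSH";

interface RemoteConfig {
  transport?: string;
  url?: string;
  sshTarget?: string;
}

interface Resolution {
  transport: Transport;
  source: Source;
  directURL: string | null;
}

function isLoopback(host: string): boolean {
  return host === "localhost" || /^127\.\d+\.\d+\.\d+$/.test(host);
}

// Private IPv4 ranges plus 100.64.0.0/10 (assumed because a 100.x address is accepted).
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 10 ||
    (a === 192 && b === 168) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 100 && b >= 64 && b <= 127)
  );
}

function isTrustedPlaintextHost(host: string): boolean {
  return (
    isLoopback(host) ||
    isPrivateIPv4(host) ||
    host.endsWith(".local") ||
    host.endsWith(".ts.net")
  );
}

// Returns the normalized URL, or null when plaintext ws:// targets a public host.
function normalizeGatewayUrlSketch(raw: string): string | null {
  const m = /^(wss?):\/\/([^/:\s]+)(?::(\d+))?\/?$/i.exec(raw.trim());
  if (!m) return null;
  const scheme = (m[1] ?? "").toLowerCase();
  const host = (m[2] ?? "").toLowerCase();
  if (scheme === "ws" && !isTrustedPlaintextHost(host)) return null;
  return `${scheme}://${host}${m[3] ? `:${m[3]}` : ""}`;
}

function hostOf(url: string): string {
  return /^wss?:\/\/([^/:]+)/.exec(url)?.[1] ?? "";
}

function resolveTransportSketch(remote: RemoteConfig): Resolution {
  const url = remote.url ? normalizeGatewayUrlSketch(remote.url) : null;
  // An explicit transport always wins over inference.
  if (remote.transport === "ssh") {
    return { transport: "ssh", source: "explicit", directURL: null };
  }
  if (remote.transport === "direct") {
    return { transport: "direct", source: "explicit", directURL: url };
  }
  // No explicit transport: a valid non-loopback URL implies a direct connection.
  if (url !== null && !isLoopback(hostOf(url))) {
    return { transport: "direct", source: "inferredRemoteURL", directURL: url };
  }
  // A loopback URL looks like the local end of an SSH tunnel, so keep SSH.
  return { transport: "ssh", source: "legacySSH", directURL: null };
}
```

In this reading, the presence of `sshTarget` does not influence the outcome, which is consistent with the test that keeps SSH for a loopback URL even when the SSH target is a private address.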


@@ -12,39 +12,39 @@ OpenClaw CI runs on every push to `main` and every pull request. The `preflight`
## Pipeline overview
| Job | Purpose | When it runs |
| -------------------------------- | --------------------------------------------------------------------------------------------------------- | ---------------------------------- |
| `preflight` | Detect docs-only changes, changed scopes, changed extensions, and build the CI manifest | Always on non-draft pushes and PRs |
| `security-scm-fast` | Private key detection and workflow audit via `zizmor` | Always on non-draft pushes and PRs |
| `security-dependency-audit` | Dependency-free production lockfile audit against npm advisories | Always on non-draft pushes and PRs |
| `security-fast` | Required aggregate for the fast security jobs | Always on non-draft pushes and PRs |
| `check-dependencies` | Production Knip dependency-only pass plus the unused-file allowlist guard | Node-relevant changes |
| `build-artifacts` | Build `dist/`, Control UI, built-artifact checks, and reusable downstream artifacts | Node-relevant changes |
| `checks-fast-core` | Fast Linux correctness lanes such as bundled/plugin-contract/protocol checks | Node-relevant changes |
| `checks-fast-contracts-channels` | Sharded channel contract checks with a stable aggregate check result | Node-relevant changes |
| `checks-node-core-test` | Core Node test shards, excluding channel, bundled, contract, and extension lanes | Node-relevant changes |
| `check` | Sharded main local gate equivalent: prod types, lint, guards, test types, and strict smoke | Node-relevant changes |
| `check-additional` | Architecture, sharded boundary/prompt drift, extension guards, package boundary, and gateway watch | Node-relevant changes |
| `build-smoke` | Built-CLI smoke tests and startup-memory smoke | Node-relevant changes |
| `checks` | Verifier for built-artifact channel tests | Node-relevant changes |
| `checks-node-compat-node22` | Node 22 compatibility build and smoke lane | Manual CI dispatch for releases |
| `check-docs` | Docs formatting, lint, and broken-link checks | Docs changed |
| `skills-python` | Ruff + pytest for Python-backed skills | Python-skill-relevant changes |
| `checks-windows` | Windows-specific process/path tests plus shared runtime import specifier regressions | Windows-relevant changes |
| `macos-node` | macOS TypeScript test lane using the shared built artifacts | macOS-relevant changes |
| `macos-swift` | Swift lint, build, and tests for the macOS app | macOS-relevant changes |
| `android` | Android unit tests for both flavors plus one debug APK build | Android-relevant changes |
| `test-performance-agent` | Daily Codex slow-test optimization after trusted activity | Main CI success or manual dispatch |
| `openclaw-performance` | Daily/on-demand Kova runtime performance reports with mock-provider, deep-profile, and GPT 5.5 live lanes | Scheduled and manual dispatch |
| Job | Purpose | When it runs |
| ---------------------------------- | --------------------------------------------------------------------------------------------------------- | ---------------------------------- |
| `preflight` | Detect docs-only changes, changed scopes, changed extensions, and build the CI manifest | Always on non-draft pushes and PRs |
| `security-scm-fast` | Private key detection and workflow audit via `zizmor` | Always on non-draft pushes and PRs |
| `security-dependency-audit` | Dependency-free production lockfile audit against npm advisories | Always on non-draft pushes and PRs |
| `security-fast` | Required aggregate for the fast security jobs | Always on non-draft pushes and PRs |
| `check-dependencies` | Production Knip dependency-only pass plus the unused-file allowlist guard | Node-relevant changes |
| `build-artifacts` | Build `dist/`, Control UI, built-CLI smoke checks, embedded built-artifact checks, and reusable artifacts | Node-relevant changes |
| `checks-fast-core` | Fast Linux correctness lanes such as bundled and CI-routing checks | Node-relevant changes |
| `checks-fast-protocol` | Gateway protocol compatibility check | Node-relevant changes |
| `checks-fast-contracts-plugins-*` | Two sharded plugin contract checks | Node-relevant changes |
| `checks-fast-contracts-channels-*` | Two sharded channel contract checks | Node-relevant changes |
| `checks-node-core-*` | Core Node test shards, excluding channel, bundled, contract, and extension lanes | Node-relevant changes |
| `check-*` | Sharded main local gate equivalent: prod types, lint, guards, test types, and strict smoke | Node-relevant changes |
| `check-additional-*` | Architecture, sharded boundary/prompt drift, extension guards, package boundary, and runtime topology | Node-relevant changes |
| `checks-node-compat-node22` | Node 22 compatibility build and smoke lane | Manual CI dispatch for releases |
| `check-docs` | Docs formatting, lint, and broken-link checks | Docs changed |
| `skills-python` | Ruff + pytest for Python-backed skills | Python-skill-relevant changes |
| `checks-windows` | Windows-specific process/path tests plus shared runtime import specifier regressions | Windows-relevant changes |
| `macos-node` | macOS TypeScript test lane using the shared built artifacts | macOS-relevant changes |
| `macos-swift` | Swift lint, build, and tests for the macOS app | macOS-relevant changes |
| `android` | Android unit tests for both flavors plus one debug APK build | Android-relevant changes |
| `test-performance-agent` | Daily Codex slow-test optimization after trusted activity | Main CI success or manual dispatch |
| `openclaw-performance` | Daily/on-demand Kova runtime performance reports with mock-provider, deep-profile, and GPT 5.5 live lanes | Scheduled and manual dispatch |
## Fail-fast order
1. `preflight` decides which lanes exist at all. The `docs-scope` and `changed-scope` logic are steps inside this job, not standalone jobs.
2. `security-scm-fast`, `security-dependency-audit`, `security-fast`, `check`, `check-additional`, `check-docs`, and `skills-python` fail quickly without waiting on the heavier artifact and platform matrix jobs.
2. `security-scm-fast`, `security-dependency-audit`, `security-fast`, `check-*`, `check-additional-*`, `check-docs`, and `skills-python` fail quickly without waiting on the heavier artifact and platform matrix jobs.
3. `build-artifacts` overlaps with the fast Linux lanes so downstream consumers can start as soon as the shared build is ready.
4. Heavier platform and runtime lanes fan out after that: `checks-fast-core`, `checks-fast-contracts-channels`, `checks-node-core-test`, `checks`, `checks-windows`, `macos-node`, `macos-swift`, and `android`.
4. Heavier platform and runtime lanes fan out after that: `checks-fast-core`, `checks-fast-contracts-plugins-*`, `checks-fast-contracts-channels-*`, `checks-node-core-*`, `checks-windows`, `macos-node`, `macos-swift`, and `android`.
GitHub may mark superseded jobs as `cancelled` when a newer push lands on the same PR or `main` ref. Treat that as CI noise unless the newest run for the same ref is also failing. Aggregate shard checks use `!cancelled() && always()` so they still report normal shard failures but do not queue after the whole workflow has already been superseded. The automatic CI concurrency key is versioned (`CI-v7-*`) so a GitHub-side zombie in an old queue group cannot indefinitely block newer main runs. Manual full-suite runs use `CI-manual-v1-*` and do not cancel in-progress runs.
GitHub may mark superseded jobs as `cancelled` when a newer push lands on the same PR or `main` ref. Treat that as CI noise unless the newest run for the same ref is also failing. Matrix jobs use `fail-fast: false`, and `build-artifacts` reports embedded channel, core-support-boundary, and gateway-watch failures directly instead of queuing tiny verifier jobs. The automatic CI concurrency key is versioned (`CI-v7-*`) so a GitHub-side zombie in an old queue group cannot indefinitely block newer main runs. Manual full-suite runs use `CI-manual-v1-*` and do not cancel in-progress runs.
The `ci-timings-summary` job uploads a compact `ci-timings-summary` artifact for each non-draft CI run. It records wall time, queue time, slowest jobs, and failed jobs for the current run, so CI health checks do not need to scrape the full Actions payload repeatedly.
@@ -56,7 +56,7 @@ Scope logic lives in `scripts/ci-changed-scope.mjs` and is covered by unit tests
- **CI routing-only edits, selected cheap core-test fixture edits, and narrow plugin contract helper/test-routing edits** use a fast Node-only manifest path: `preflight`, security, and a single `checks-fast-core` task. That path skips build artifacts, Node 22 compatibility, channel contracts, full core shards, bundled-plugin shards, and additional guard matrices when the change is limited to the routing or helper surfaces the fast task exercises directly.
- **Windows Node checks** are scoped to Windows-specific process/path wrappers, npm/pnpm/UI runner helpers, package manager config, and the CI workflow surfaces that execute that lane; unrelated source, plugin, install-smoke, and test-only changes stay on the Linux Node lanes.
The slowest Node test families are split or balanced so each job stays small without over-reserving runners: channel contracts run as three weighted Blacksmith-backed shards with the standard GitHub runner fallback, core unit fast/support lanes run separately, core runtime infra is split between state, process/config, cron, and shared shards, auto-reply runs as balanced workers (with the reply subtree split into agent-runner, dispatch, and commands/state-routing shards), and agentic gateway/server configs are split across chat/auth/model/http-plugin/runtime/startup lanes instead of waiting on built artifacts. Broad browser, QA, media, and miscellaneous plugin tests use their dedicated Vitest configs instead of the shared plugin catch-all. Include-pattern shards record timing entries using the CI shard name, so `.artifacts/vitest-shard-timings.json` can distinguish a whole config from a filtered shard. `check-additional` keeps package-boundary compile/canary work together and separates runtime topology architecture from gateway watch coverage; the boundary guard list is striped across four matrix shards, each running selected independent guards concurrently and printing per-check timings. The expensive Codex happy-path prompt snapshot drift check runs as its own additional job for manual CI and for prompt-affecting changes only, so normal unrelated Node changes do not wait behind cold prompt snapshot generation and the boundary shards stay balanced while prompt drift is still pinned to the PR that caused it; the same flag skips prompt snapshot Vitest generation inside the built-artifact core support-boundary shard. Gateway watch, channel tests, and the core support-boundary shard run concurrently inside `build-artifacts` after `dist/` and `dist-runtime/` are already built.
The slowest Node test families are split or balanced so each job stays small without over-reserving runners: plugin contracts and channel contracts each run as two weighted Blacksmith-backed shards with the standard GitHub runner fallback, core unit fast/support lanes run separately, core runtime infra is split between state, process/config, cron, and shared shards, auto-reply runs as balanced workers (with the reply subtree split into agent-runner, dispatch, and commands/state-routing shards), and agentic gateway/server configs are split across chat/auth/model/http-plugin/runtime/startup lanes instead of waiting on built artifacts. Broad browser, QA, media, and miscellaneous plugin tests use their dedicated Vitest configs instead of the shared plugin catch-all. Include-pattern shards record timing entries using the CI shard name, so `.artifacts/vitest-shard-timings.json` can distinguish a whole config from a filtered shard. `check-additional-*` keeps package-boundary compile/canary work together and separates runtime topology architecture from gateway watch coverage; the boundary guard list is striped across four matrix shards, each running selected independent guards concurrently and printing per-check timings. The expensive Codex happy-path prompt snapshot drift check runs as its own additional job for manual CI and for prompt-affecting changes only, so normal unrelated Node changes do not wait behind cold prompt snapshot generation and the boundary shards stay balanced while prompt drift is still pinned to the PR that caused it; the same flag skips prompt snapshot Vitest generation inside the built-artifact core support-boundary shard. Gateway watch, channel tests, and the core support-boundary shard run concurrently inside `build-artifacts` after `dist/` and `dist-runtime/` are already built.
Android CI runs both `testPlayDebugUnitTest` and `testThirdPartyDebugUnitTest` and then builds the Play debug APK. The third-party flavor has no separate source set or manifest; its unit-test lane still compiles the flavor with the SMS/call-log BuildConfig flags, while avoiding a duplicate debug APK packaging job on every Android-relevant push.
@@ -81,7 +81,7 @@ Treat GitHub titles, comments, bodies, review text, branch names, and commit mes
## Manual dispatches
Manual CI dispatches run the same job graph as normal CI but force every non-Android scoped lane on: Linux Node shards, bundled-plugin shards, channel contracts, Node 22 compatibility, `check`, `check-additional`, build smoke, docs checks, Python skills, Windows, macOS, and Control UI i18n. Standalone manual CI dispatches run Android only with `include_android=true`; the full release umbrella enables Android by passing `include_android=true`. Plugin prerelease static checks, the release-only `agentic-plugins` shard, the full extension batch sweep, and plugin prerelease Docker lanes are excluded from CI. The Docker prerelease suite runs only when `Full Release Validation` dispatches the separate `Plugin Prerelease` workflow with the release-validation gate enabled.
Manual CI dispatches run the same job graph as normal CI but force every non-Android scoped lane on: Linux Node shards, bundled-plugin shards, plugin and channel contract shards, Node 22 compatibility, `check-*`, `check-additional-*`, built-artifact smoke checks, docs checks, Python skills, Windows, macOS, and Control UI i18n. Standalone manual CI dispatches run Android only with `include_android=true`; the full release umbrella enables Android by passing `include_android=true`. Plugin prerelease static checks, the release-only `agentic-plugins` shard, the full extension batch sweep, and plugin prerelease Docker lanes are excluded from CI. The Docker prerelease suite runs only when `Full Release Validation` dispatches the separate `Plugin Prerelease` workflow with the release-validation gate enabled.
Manual runs use a unique concurrency group so a release-candidate full suite is not cancelled by another push or PR run on the same ref. The optional `target_ref` input lets a trusted caller run that graph against a branch, tag, or full commit SHA while using the workflow file from the selected dispatch ref.
@@ -93,15 +93,15 @@ gh workflow run full-release-validation.yml --ref main -f ref=<branch-or-sha>
## Runners
| Runner | Jobs |
| -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `ubuntu-24.04` | `preflight`, fast security jobs and aggregates (`security-scm-fast`, `security-dependency-audit`, `security-fast`), fast protocol/contract/bundled checks, sharded channel contract checks, `check` shards except lint, `check-additional` aggregates, Node test aggregate verifiers, docs checks, Python skills, workflow-sanity, labeler, auto-response; install-smoke preflight also uses GitHub-hosted Ubuntu so the Blacksmith matrix can queue earlier |
| `blacksmith-4vcpu-ubuntu-2404` | `CodeQL Critical Quality`, lower-weight extension shards, `checks-fast-core`, `checks-node-compat-node22`, `check-prod-types`, and `check-test-types` |
| `blacksmith-8vcpu-ubuntu-2404` | build-smoke, Linux Node test shards, bundled plugin test shards, `check-additional` shards, `android` |
| `blacksmith-16vcpu-ubuntu-2404` | `build-artifacts`, `check-lint` (CPU-sensitive enough that 8 vCPU cost more than they saved); install-smoke Docker builds (32-vCPU queue time cost more than it saved) |
| `blacksmith-16vcpu-windows-2025` | `checks-windows` |
| `blacksmith-6vcpu-macos-latest` | `macos-node` on `openclaw/openclaw`; forks fall back to `macos-latest` |
| `blacksmith-12vcpu-macos-latest` | `macos-swift` on `openclaw/openclaw`; forks fall back to `macos-latest` |
| Runner | Jobs |
| -------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ubuntu-24.04` | `preflight`, fast security jobs and aggregates (`security-scm-fast`, `security-dependency-audit`, `security-fast`), fast protocol/contract/bundled checks, docs checks, Python skills, workflow-sanity, labeler, auto-response; install-smoke preflight also uses GitHub-hosted Ubuntu so the Blacksmith matrix can queue earlier |
| `blacksmith-4vcpu-ubuntu-2404` | `CodeQL Critical Quality`, lower-weight extension shards, `checks-fast-core`, `checks-fast-protocol`, plugin/channel contract shards, `checks-node-compat-node22`, `check-prod-types`, and `check-test-types` |
| `blacksmith-8vcpu-ubuntu-2404` | Linux Node test shards, bundled plugin test shards, `check-additional-*` shards, `android` |
| `blacksmith-16vcpu-ubuntu-2404` | `build-artifacts`, `check-lint` (CPU-sensitive enough that 8 vCPU cost more than they saved); install-smoke Docker builds (32-vCPU queue time cost more than it saved) |
| `blacksmith-16vcpu-windows-2025` | `checks-windows` |
| `blacksmith-6vcpu-macos-latest` | `macos-node` on `openclaw/openclaw`; forks fall back to `macos-latest` |
| `blacksmith-12vcpu-macos-latest` | `macos-swift` on `openclaw/openclaw`; forks fall back to `macos-latest` |
Canonical-repo CI keeps Blacksmith as the default runner path. During `preflight`, `scripts/ci-runner-labels.mjs` checks recent queued and in-progress Actions runs for queued Blacksmith jobs. If a specific Blacksmith label already has queued jobs, downstream jobs that would use that exact label fall back to the matching GitHub-hosted runner (`ubuntu-24.04`, `windows-2025`, or `macos-latest`) for that run only. Other Blacksmith sizes in the same OS family stay on their primary labels. If the API probe fails, no fallback is applied.
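
The per-label fallback rule in the paragraph above can be sketched as a small pure function. This is an illustration only: `resolveRunner`, `QueueProbe`, and the data shapes are hypothetical and do not reflect the actual contents of `scripts/ci-runner-labels.mjs`; only the fallback labels and the three rules (exact-label match, same-run scope, no fallback on probe failure) come from the text.

```typescript
// Hypothetical sketch of the fallback rule; names and shapes are assumptions.

// Result of probing recent Actions runs for queued jobs, keyed by runner label.
type QueueProbe =
  | { ok: true; queuedByLabel: Record<string, number> }
  | { ok: false };

// GitHub-hosted fallback per OS family, as listed in the paragraph above.
const FALLBACK_BY_OS: Record<string, string> = {
  ubuntu: "ubuntu-24.04",
  windows: "windows-2025",
  macos: "macos-latest",
};

function osFamily(label: string): string | undefined {
  return Object.keys(FALLBACK_BY_OS).find((os) => label.includes(os));
}

function resolveRunner(primaryLabel: string, probe: QueueProbe): string {
  // If the API probe fails, no fallback is applied.
  if (!probe.ok) return primaryLabel;
  // Only Blacksmith labels are candidates for fallback.
  if (!primaryLabel.startsWith("blacksmith-")) return primaryLabel;
  // Fallback is per exact label: other sizes in the same OS family are unaffected.
  const queued = probe.queuedByLabel[primaryLabel] ?? 0;
  if (queued === 0) return primaryLabel;
  const os = osFamily(primaryLabel);
  return (os && FALLBACK_BY_OS[os]) || primaryLabel;
}
```

Keying the probe result by exact label is what lets a congested `blacksmith-8vcpu-ubuntu-2404` queue fall back while `blacksmith-4vcpu-ubuntu-2404` jobs in the same run stay on Blacksmith.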
@@ -121,7 +121,7 @@ pnpm test:changed # cheap smart changed Vitest targe
pnpm test:channels
pnpm test:contracts:channels
pnpm check:docs # docs format + lint + broken links
pnpm build # build dist when CI artifact/build-smoke lanes matter
pnpm build # build dist when CI artifact/smoke checks matter
pnpm ci:timings # summarize the latest origin/main push CI run
pnpm ci:timings:recent # compare recent successful main CI runs
node scripts/ci-run-timings.mjs <run-id> # summarize wall time, queue time, and slowest jobs
@@ -203,7 +203,7 @@ Docker release-path soak; `full` forces soak on.
The umbrella records the dispatched child run ids, and the final `Verify full validation` job re-checks current child run conclusions and appends slowest-job tables for each child run. If a child workflow is rerun and turns green, rerun only the parent verifier job to refresh the umbrella result and timing summary.
For recovery, both `Full Release Validation` and `OpenClaw Release Checks` accept `rerun_group`. Use `all` for a release candidate, `ci` for only the normal full CI child, `plugin-prerelease` for only the plugin prerelease child, `release-checks` for every release child, or a narrower group: `install-smoke`, `cross-os`, `live-e2e`, `package`, `qa`, `qa-parity`, `qa-live`, or `npm-telegram` on the umbrella. This keeps a failed release box rerun bounded after a focused fix. For one failed cross-OS lane, combine `rerun_group=cross-os` with `cross_os_suite_filter`, for example `windows/packaged-upgrade`; long cross-OS commands emit heartbeat lines and packaged-upgrade summaries include per-phase timings. QA release-check lanes are advisory, so QA-only failures warn but do not block the release-check verifier.
For recovery, both `Full Release Validation` and `OpenClaw Release Checks` accept `rerun_group`. Use `all` for a release candidate, `ci` for only the normal full CI child, `plugin-prerelease` for only the plugin prerelease child, `release-checks` for every release child, or a narrower group: `install-smoke`, `cross-os`, `live-e2e`, `package`, `qa`, `qa-parity`, `qa-live`, or `npm-telegram` on the umbrella. This keeps a failed release box rerun bounded after a focused fix. For one failed cross-OS lane, combine `rerun_group=cross-os` with `cross_os_suite_filter`, for example `windows/packaged-upgrade`; long cross-OS commands emit heartbeat lines and packaged-upgrade summaries include per-phase timings. QA release-check lanes are advisory except the standard runtime tool coverage gate, which blocks when required OpenClaw dynamic tools drift or disappear from the standard tier summary.
`OpenClaw Release Checks` uses the trusted workflow ref to resolve the selected ref once into a `release-package-under-test` tarball, then passes that artifact to cross-OS checks and Package Acceptance, plus the live/E2E release-path Docker workflow when soak coverage runs. That keeps the package bytes consistent across release boxes and avoids repacking the same candidate in multiple child jobs.


@@ -90,7 +90,7 @@ openclaw doctor --lint --only core/doctor/gateway-config --json
Human output is compact:
```text
doctor --lint: ran 5 check(s), 1 finding(s)
doctor --lint: ran 6 check(s), 1 finding(s)
[warning] core/doctor/gateway-config gateway.mode - gateway.mode is unset; gateway start will be blocked.
fix: Run `openclaw configure` and set Gateway mode (local/remote), or `openclaw config set gateway.mode local`.
```


@@ -34,7 +34,7 @@ script aliases; both forms are supported.
| `qa run` | Bundled QA self-check; writes a Markdown report. |
| `qa suite` | Run repo-backed scenarios against the QA gateway lane. Aliases: `pnpm openclaw qa suite --runner multipass` for a disposable Linux VM. |
| `qa coverage` | Print the markdown scenario-coverage inventory (`--json` for machine output). |
| `qa parity-report` | Compare two `qa-suite-summary.json` files and write the agentic parity report. |
| `qa parity-report` | Compare two `qa-suite-summary.json` files and write the agentic parity report, or use `--runtime-axis --token-efficiency` to write Codex-vs-Pi runtime parity and token-efficiency reports from one runtime-pair summary. |
| `qa character-eval` | Run the character QA scenario across multiple live models with a judged report. See [Reporting](#reporting). |
| `qa manual` | Run a one-off prompt against the selected provider/model lane. |
| `qa ui` | Start the QA debugger UI and local QA bus (alias: `pnpm qa:lab:ui`). |


@@ -185,10 +185,10 @@ vYYYY.M.D-beta.N` from the matching `release/YYYY.M.D` branch. The helper runs
- `custom`: exact `docker_lanes` selection for a focused rerun
- Run the manual `CI` workflow directly when you only need full normal CI
coverage for the release candidate. Manual CI dispatches bypass changed
scoping and force the Linux Node shards, bundled-plugin shards, channel
contracts, Node 22 compatibility, `check`, `check-additional`, build smoke,
docs checks, Python skills, Windows, macOS, Android, and Control UI i18n
lanes.
scoping and force the Linux Node shards, bundled-plugin shards, plugin and
channel contract shards, Node 22 compatibility, `check-*`, `check-additional-*`,
built-artifact smoke checks, docs checks, Python skills, Windows, macOS,
Android, and Control UI i18n lanes.
Example: `gh workflow run ci.yml --ref release/YYYY.M.D`
- Run `pnpm qa:otel:smoke` when validating release telemetry. It exercises
QA-lab through a local OTLP/HTTP receiver and verifies the exported trace
@@ -442,16 +442,19 @@ Focused `npm-telegram` reruns require `release_package_spec` or
`npm_telegram_package_spec`; full/all runs with `release_profile=full` use the
release-checks package artifact. Focused
cross-OS reruns can add `cross_os_suite_filter=windows/packaged-upgrade` or
another OS/suite filter. QA release-check failures are advisory; a QA-only
failure does not block release validation.
another OS/suite filter. QA release-check failures are advisory except the
standard runtime tool coverage gate, which blocks release validation when
required OpenClaw dynamic tools drift or disappear from the standard tier
summary.
### Vitest
The Vitest box is the manual `CI` child workflow. Manual CI intentionally
bypasses changed scoping and forces the normal test graph for the release
candidate: Linux Node shards, bundled-plugin shards, channel contracts, Node 22
compatibility, `check`, `check-additional`, build smoke, docs checks, Python
skills, Windows, macOS, Android, and Control UI i18n.
candidate: Linux Node shards, bundled-plugin shards, plugin and channel contract
shards, Node 22 compatibility, `check-*`, `check-additional-*`,
built-artifact smoke checks, docs checks, Python skills, Windows, macOS,
Android, and Control UI i18n.
Use this box to answer "did the source tree pass the full normal test suite?"
It is not the same as release-path product validation. Evidence to keep:


@@ -44,7 +44,7 @@ only when Package Acceptance should intentionally prove a different package.
| Stage | Details |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Target resolution | **Job:** `Resolve target ref`<br />**Child workflow:** none<br />**Proves:** resolves the release branch, tag, or full commit SHA and records selected inputs.<br />**Rerun:** rerun the umbrella if this fails. |
| Vitest and normal CI | **Job:** `Run normal full CI`<br />**Child workflow:** `CI`<br />**Proves:** manual full CI graph against the target ref, including Linux Node lanes, bundled plugin shards, channel contracts, Node 22 compatibility, `check`, `check-additional`, build smoke, docs checks, Python skills, Windows, macOS, Control UI i18n, and Android via the umbrella.<br />**Rerun:** `rerun_group=ci`. |
| Vitest and normal CI | **Job:** `Run normal full CI`<br />**Child workflow:** `CI`<br />**Proves:** manual full CI graph against the target ref, including Linux Node lanes, bundled plugin shards, plugin and channel contract shards, Node 22 compatibility, `check-*`, `check-additional-*`, built-artifact smoke checks, docs checks, Python skills, Windows, macOS, Control UI i18n, and Android via the umbrella.<br />**Rerun:** `rerun_group=ci`. |
| Plugin prerelease | **Job:** `Run plugin prerelease validation`<br />**Child workflow:** `Plugin Prerelease`<br />**Proves:** release-only plugin static checks, agentic plugin coverage, full extension batch shards, plugin prerelease Docker lanes, and a non-blocking `plugin-inspector-advisory` artifact for compatibility triage.<br />**Rerun:** `rerun_group=plugin-prerelease`. |
| Release checks | **Job:** `Run release/live/Docker/QA validation`<br />**Child workflow:** `OpenClaw Release Checks`<br />**Proves:** install smoke, cross-OS package checks, Package Acceptance, QA Lab parity, live Matrix, and live Telegram. With `run_release_soak=true` or `release_profile=full`, also runs exhaustive live/E2E suites and Docker release-path chunks.<br />**Rerun:** `rerun_group=release-checks` or a narrower release-checks handle. |
| Package artifact | **Job:** `Prepare release package artifact`<br />**Child workflow:** none<br />**Proves:** creates the parent `release-package-under-test` tarball early enough for package-facing checks that do not need to wait for `OpenClaw Release Checks`.<br />**Rerun:** rerun the umbrella or provide `release_package_spec` for published-package reruns. |
@@ -166,9 +166,10 @@ summaries include per-phase timings for packaged upgrade lanes, and long-running
commands print heartbeat lines so a stuck Windows update is visible before the
job timeout.
QA release-check lanes are advisory. A QA-only failure is reported as a warning
and does not block the release-check verifier; rerun `rerun_group=qa`,
`qa-parity`, or `qa-live` when you need fresh QA evidence.
QA release-check lanes are advisory except the standard runtime tool coverage
gate. Required OpenClaw dynamic tool drift in the standard tier blocks the
release-check verifier; other QA-only failures are reported as warnings. Rerun
`rerun_group=qa`, `qa-parity`, or `qa-live` when you need fresh QA evidence.
## Evidence to keep


@@ -72,6 +72,7 @@ describe("bonjour plugin entry", () => {
gatewayPort: 3210,
gatewayTlsEnabled: true,
gatewayTlsFingerprintSha256: "abc123",
gatewayDirectReachable: true,
canvasPort: 9876,
sshPort: 22,
tailnetDns: "dev.tailnet.ts.net",
@@ -88,6 +89,7 @@ describe("bonjour plugin entry", () => {
gatewayPort: 3210,
gatewayTlsEnabled: true,
gatewayTlsFingerprintSha256: "abc123",
gatewayDirectReachable: true,
canvasPort: 9876,
sshPort: 22,
tailnetDns: "dev.tailnet.ts.net",


@@ -32,6 +32,7 @@ export default definePluginEntry({
gatewayPort: ctx.gatewayPort,
gatewayTlsEnabled: ctx.gatewayTlsEnabled,
gatewayTlsFingerprintSha256: ctx.gatewayTlsFingerprintSha256,
gatewayDirectReachable: ctx.gatewayDirectReachable,
canvasPort: ctx.canvasPort,
sshPort: ctx.sshPort,
tailnetDns: ctx.tailnetDns,


@@ -180,6 +180,7 @@ describe("gateway bonjour advertiser", () => {
const started = await startAdvertiser({
gatewayPort: 18789,
sshPort: 2222,
gatewayDirectReachable: true,
tailnetDns: "host.tailnet.ts.net",
cliPath: "/opt/homebrew/bin/openclaw",
minimal: false,
@@ -195,6 +196,7 @@ describe("gateway bonjour advertiser", () => {
expect(gatewayCall?.[0]?.hostname).toBe("test-host");
expect((gatewayCall?.[0]?.txt as Record<string, string>)?.lanHost).toBe("test-host.local");
expect((gatewayCall?.[0]?.txt as Record<string, string>)?.gatewayPort).toBe("18789");
expect((gatewayCall?.[0]?.txt as Record<string, string>)?.gatewayDirectReachable).toBe("1");
expect((gatewayCall?.[0]?.txt as Record<string, string>)?.sshPort).toBe("2222");
expect((gatewayCall?.[0]?.txt as Record<string, string>)?.tailnetDns).toBe(
"host.tailnet.ts.net",


@@ -22,6 +22,7 @@ export type GatewayBonjourAdvertiseOpts = {
sshPort?: number;
gatewayTlsEnabled?: boolean;
gatewayTlsFingerprintSha256?: string;
gatewayDirectReachable?: boolean;
canvasPort?: number;
tailnetDns?: string;
cliPath?: string;
@@ -451,6 +452,9 @@ export async function startGatewayBonjourAdvertiser(
txtBase.gatewayTlsSha256 = opts.gatewayTlsFingerprintSha256;
}
}
if (opts.gatewayDirectReachable) {
txtBase.gatewayDirectReachable = "1";
}
if (typeof opts.canvasPort === "number" && opts.canvasPort > 0) {
txtBase.canvasPort = String(opts.canvasPort);
}
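
The hunk above encodes the new boolean option as a TXT entry that is present only when the flag is set. A minimal sketch of that pattern follows, assuming a trimmed-down options type; `AdvertiseOpts`, `buildTxtRecord`, and `isDirectReachable` are hypothetical names for illustration and are not part of the diff.

```typescript
// Hypothetical, reduced option type; the real GatewayBonjourAdvertiseOpts has more fields.
type AdvertiseOpts = {
  gatewayPort?: number;
  gatewayDirectReachable?: boolean;
  canvasPort?: number;
};

// Build a TXT key/value map. TXT values are strings, so the boolean is
// written as "1" when true and the key is omitted entirely otherwise.
function buildTxtRecord(opts: AdvertiseOpts): Record<string, string> {
  const txt: Record<string, string> = {};
  if (typeof opts.gatewayPort === "number" && opts.gatewayPort > 0) {
    txt.gatewayPort = String(opts.gatewayPort);
  }
  if (opts.gatewayDirectReachable) {
    txt.gatewayDirectReachable = "1";
  }
  if (typeof opts.canvasPort === "number" && opts.canvasPort > 0) {
    txt.canvasPort = String(opts.canvasPort);
  }
  return txt;
}

// Assumed reader-side convention: only the literal "1" counts as reachable,
// so a missing key and an unexpected value both read as false.
function isDirectReachable(txt: Record<string, string>): boolean {
  return txt.gatewayDirectReachable === "1";
}
```

Omitting the key for the false case keeps older records and newer records without the flag indistinguishable, which is probably why the advertiser test asserts on the `"1"` string rather than on a boolean.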


@@ -1 +1,6 @@
export { noteChromeMcpBrowserReadiness } from "./src/doctor-browser.js";
export {
detectLegacyClawdBrowserProfileResidue,
maybeArchiveLegacyClawdBrowserProfileResidue,
noteChromeMcpBrowserReadiness,
} from "./src/doctor-browser.js";
export type { LegacyClawdBrowserProfileResidue } from "./src/doctor-browser.js";


@@ -1,5 +1,8 @@
import { describe, expect, it, vi } from "vitest";
import { noteChromeMcpBrowserReadiness } from "./doctor-browser.js";
import {
maybeArchiveLegacyClawdBrowserProfileResidue,
noteChromeMcpBrowserReadiness,
} from "./doctor-browser.js";
function requireFirstNoteText(noteFn: ReturnType<typeof vi.fn>): string {
const [call] = noteFn.mock.calls;
@@ -92,6 +95,63 @@ describe("browser doctor readiness", () => {
);
});
it("warns about legacy clawd managed browser profile residue", async () => {
const noteFn = vi.fn();
const configDir = "/tmp/openclaw-home";
await noteChromeMcpBrowserReadiness(
{
browser: {
profiles: {
openclaw: { color: "#FF4500" },
},
},
},
{
noteFn,
platform: "linux",
env: { DISPLAY: ":99" },
getUid: () => 1000,
configDir,
pathExists: (targetPath) => targetPath.endsWith("/browser/clawd/user-data"),
resolveManagedExecutable: () => ({ kind: "chrome", path: "/usr/bin/google-chrome" }),
},
);
expect(noteFn).toHaveBeenCalledTimes(1);
const note = requireFirstNoteText(noteFn);
expect(note).toContain("Legacy managed browser profile residue");
expect(note).toContain("/tmp/openclaw-home/browser/clawd");
expect(note).toContain("/tmp/openclaw-home/browser/openclaw/user-data");
expect(note).toContain("openclaw doctor --fix");
});
it("does not warn when clawd is still configured as a browser profile", async () => {
const noteFn = vi.fn();
await noteChromeMcpBrowserReadiness(
{
browser: {
profiles: {
clawd: { color: "#FF4500" },
openclaw: { color: "#00AA00" },
},
},
},
{
noteFn,
platform: "linux",
env: { DISPLAY: ":99" },
getUid: () => 1000,
configDir: "/tmp/openclaw-home",
pathExists: () => true,
resolveManagedExecutable: () => ({ kind: "chrome", path: "/usr/bin/google-chrome" }),
},
);
expect(noteFn).not.toHaveBeenCalled();
});
it("warns when Chrome MCP is configured but Chrome is missing", async () => {
const noteFn = vi.fn();
await noteChromeMcpBrowserReadiness(
@@ -195,3 +255,54 @@ describe("browser doctor readiness", () => {
expect(note).toContain("brave://inspect/#remote-debugging");
});
});
describe("legacy clawd browser profile cleanup", () => {
it("archives stale clawd residue with the safe trash mover", async () => {
const movePathToTrash = vi.fn(async () => "/tmp/openclaw-home/browser/.trash/clawd");
const result = await maybeArchiveLegacyClawdBrowserProfileResidue(
{
browser: {
profiles: {
openclaw: { color: "#FF4500" },
},
},
},
{
configDir: "/tmp/openclaw-home",
pathExists: (targetPath) => targetPath.endsWith("/browser/clawd/user-data"),
movePathToTrash,
},
);
expect(movePathToTrash).toHaveBeenCalledWith("/tmp/openclaw-home/browser/clawd");
expect(result.warnings).toStrictEqual([]);
expect(result.changes.join("\n")).toContain(
"Archived legacy clawd managed browser profile residue.",
);
expect(result.changes.join("\n")).toContain("/tmp/openclaw-home/browser/openclaw/user-data");
});
it("does not archive a configured clawd browser profile", async () => {
const movePathToTrash = vi.fn(async () => "/tmp/unused");
const result = await maybeArchiveLegacyClawdBrowserProfileResidue(
{
browser: {
defaultProfile: "clawd",
profiles: {
clawd: { color: "#FF4500" },
},
},
},
{
configDir: "/tmp/openclaw-home",
pathExists: () => true,
movePathToTrash,
},
);
expect(movePathToTrash).not.toHaveBeenCalled();
expect(result).toStrictEqual({ changes: [], warnings: [] });
});
});


@@ -1,3 +1,5 @@
import fs from "node:fs";
import path from "node:path";
import { normalizeOptionalString } from "openclaw/plugin-sdk/string-coerce-runtime";
import {
parseBrowserMajorVersion,
@@ -5,12 +7,15 @@ import {
resolveBrowserExecutableForPlatform,
resolveGoogleChromeExecutableForPlatform,
} from "./browser/chrome.executables.js";
import { resolveBrowserConfig } from "./browser/config.js";
import { DEFAULT_OPENCLAW_BROWSER_PROFILE_NAME, resolveBrowserConfig } from "./browser/config.js";
import { movePathToTrash } from "./browser/trash.js";
import type { OpenClawConfig } from "./config/config.js";
import { asRecord } from "./record-shared.js";
import { note } from "./sdk-setup-tools.js";
import { formatCliCommand, note } from "./sdk-setup-tools.js";
import { CONFIG_DIR, resolveUserPath } from "./utils.js";
const CHROME_MCP_MIN_MAJOR = 144;
const LEGACY_CLAWD_BROWSER_PROFILE_NAME = "clawd";
const REMOTE_DEBUGGING_PAGES = [
"chrome://inspect/#remote-debugging",
"brave://inspect/#remote-debugging",
@@ -26,6 +31,18 @@ type ManagedProfile = {
name: string;
};
export type LegacyClawdBrowserProfileResidue = {
legacyProfileDir: string;
legacyUserDataDir: string;
canonicalUserDataDir: string;
};
type BrowserDoctorFilesystemDeps = {
configDir?: string;
pathExists?: (targetPath: string) => boolean;
movePathToTrash?: (targetPath: string) => Promise<string>;
};
function collectChromeMcpProfiles(cfg: OpenClawConfig): ExistingSessionProfile[] {
const browser = asRecord(cfg.browser);
if (!browser) {
@@ -85,6 +102,102 @@ function collectManagedProfiles(cfg: OpenClawConfig): ManagedProfile[] {
return [...profiles.values()].toSorted((a, b) => a.name.localeCompare(b.name));
}
function resolveManagedBrowserProfileDir(configDir: string, profileName: string): string {
return path.join(configDir, "browser", profileName);
}
function resolveManagedBrowserUserDataDir(configDir: string, profileName: string): string {
return path.join(resolveManagedBrowserProfileDir(configDir, profileName), "user-data");
}
function normalizeComparablePath(targetPath: string): string {
return path.resolve(targetPath);
}
function isSameOrChildPath(candidatePath: string, parentPath: string): boolean {
const candidate = normalizeComparablePath(candidatePath);
const parent = normalizeComparablePath(parentPath);
return candidate === parent || candidate.startsWith(`${parent}${path.sep}`);
}
function isLegacyClawdProfileConfigured(cfg: OpenClawConfig, legacyProfileDir: string): boolean {
const browser = asRecord(cfg.browser);
if (!browser) {
return false;
}
if (normalizeOptionalString(browser.defaultProfile) === LEGACY_CLAWD_BROWSER_PROFILE_NAME) {
return true;
}
const configuredProfiles = asRecord(browser.profiles);
if (!configuredProfiles) {
return false;
}
if (Object.prototype.hasOwnProperty.call(configuredProfiles, LEGACY_CLAWD_BROWSER_PROFILE_NAME)) {
return true;
}
for (const rawProfile of Object.values(configuredProfiles)) {
const profile = asRecord(rawProfile);
const userDataDir = normalizeOptionalString(profile?.userDataDir);
if (userDataDir && isSameOrChildPath(resolveUserPath(userDataDir), legacyProfileDir)) {
return true;
}
}
return false;
}
export function detectLegacyClawdBrowserProfileResidue(
cfg: OpenClawConfig,
deps?: BrowserDoctorFilesystemDeps,
): LegacyClawdBrowserProfileResidue | null {
const configDir = deps?.configDir ?? CONFIG_DIR;
const legacyProfileDir = resolveManagedBrowserProfileDir(
configDir,
LEGACY_CLAWD_BROWSER_PROFILE_NAME,
);
const legacyUserDataDir = resolveManagedBrowserUserDataDir(
configDir,
LEGACY_CLAWD_BROWSER_PROFILE_NAME,
);
const pathExists = deps?.pathExists ?? fs.existsSync;
if (!pathExists(legacyProfileDir) && !pathExists(legacyUserDataDir)) {
return null;
}
if (isLegacyClawdProfileConfigured(cfg, legacyProfileDir)) {
return null;
}
const resolved = resolveBrowserConfig(cfg.browser, cfg);
const defaultProfile = resolved.profiles[resolved.defaultProfile];
if (
resolved.defaultProfile !== DEFAULT_OPENCLAW_BROWSER_PROFILE_NAME ||
defaultProfile?.driver === "existing-session"
) {
return null;
}
return {
legacyProfileDir,
legacyUserDataDir,
canonicalUserDataDir: resolveManagedBrowserUserDataDir(
configDir,
DEFAULT_OPENCLAW_BROWSER_PROFILE_NAME,
),
};
}
function formatLegacyClawdBrowserProfileResidueNote(
residue: LegacyClawdBrowserProfileResidue,
): string {
return [
`- Legacy managed browser profile residue was found at ${residue.legacyProfileDir}.`,
`- The canonical OpenClaw-managed browser profile is ${residue.canonicalUserDataDir}.`,
`- If no browser is using the legacy profile, run ${formatCliCommand("openclaw doctor --fix")} to archive it safely instead of deleting it in place.`,
].join("\n");
}
export async function noteChromeMcpBrowserReadiness(
cfg: OpenClawConfig,
deps?: {
@@ -95,6 +208,8 @@ export async function noteChromeMcpBrowserReadiness(
resolveManagedExecutable?: typeof resolveBrowserExecutableForPlatform;
resolveChromeExecutable?: (platform: NodeJS.Platform) => { path: string } | null;
readVersion?: (executablePath: string) => string | null;
configDir?: string;
pathExists?: (targetPath: string) => boolean;
},
) {
const noteFn = deps?.noteFn ?? note;
@@ -109,6 +224,13 @@ export async function noteChromeMcpBrowserReadiness(
const managedProfiles = collectManagedProfiles(cfg);
const managedProfileLabel = managedProfiles.map((profile) => profile.name).join(", ");
const resolved = resolveBrowserConfig(cfg.browser, cfg);
const legacyClawdResidue = detectLegacyClawdBrowserProfileResidue(cfg, {
configDir: deps?.configDir,
pathExists: deps?.pathExists,
});
if (legacyClawdResidue) {
noteFn(formatLegacyClawdBrowserProfileResidueNote(legacyClawdResidue), "Browser");
}
const browserExecutable =
managedProfiles.length > 0 ? resolveManagedExecutable(resolved, platform) : null;
const missingDisplay =
@@ -225,3 +347,35 @@ export async function noteChromeMcpBrowserReadiness(
noteFn(lines.join("\n"), "Browser");
}
export async function maybeArchiveLegacyClawdBrowserProfileResidue(
cfg: OpenClawConfig,
deps?: BrowserDoctorFilesystemDeps,
): Promise<{ changes: string[]; warnings: string[] }> {
const residue = detectLegacyClawdBrowserProfileResidue(cfg, deps);
if (!residue) {
return { changes: [], warnings: [] };
}
const move = deps?.movePathToTrash ?? movePathToTrash;
try {
const archivedPath = await move(residue.legacyProfileDir);
return {
changes: [
[
"Archived legacy clawd managed browser profile residue.",
`- legacy profile: ${residue.legacyProfileDir}`,
`- canonical profile: ${residue.canonicalUserDataDir}`,
`- archived at: ${archivedPath}`,
].join("\n"),
],
warnings: [],
};
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
return {
changes: [],
warnings: [`Legacy clawd browser profile residue could not be archived: ${message}`],
};
}
}
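
Two ideas in this file are easy to exercise in isolation: the separator-aware path containment check, and filesystem access injected through a `pathExists` callback so tests never touch the disk. The sketch below is a simplified illustration under stated assumptions. `isSameOrChildPath` mirrors the helper in the diff, while `findLegacyResidue` is a hypothetical reduction that takes a plain list of configured profile names instead of the real `OpenClawConfig` and skips the default-profile and `userDataDir` checks.

```typescript
import * as path from "node:path";

// Compare resolved paths and require a separator after the parent, so that
// ".../clawd-old" is not mistaken for a child of ".../clawd".
function isSameOrChildPath(candidatePath: string, parentPath: string): boolean {
  const candidate = path.resolve(candidatePath);
  const parent = path.resolve(parentPath);
  return candidate === parent || candidate.startsWith(`${parent}${path.sep}`);
}

type PathExists = (targetPath: string) => boolean;

// Hypothetical simplified detector: report the legacy profile directory only
// when it exists on disk and no configured profile is still named "clawd".
function findLegacyResidue(
  configDir: string,
  configuredProfileNames: string[],
  pathExists: PathExists,
): string | null {
  const legacyDir = path.join(configDir, "browser", "clawd");
  const legacyUserData = path.join(legacyDir, "user-data");
  if (!pathExists(legacyDir) && !pathExists(legacyUserData)) {
    return null;
  }
  if (configuredProfileNames.includes("clawd")) {
    return null;
  }
  return legacyDir;
}
```

Passing `pathExists` as a parameter is what lets the tests in this diff use predicates such as `targetPath.endsWith("/browser/clawd/user-data")` in place of real directories.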


@@ -1406,6 +1406,46 @@ describe("CodexAppServerEventProjector", () => {
expect(toolResult.result).toEqual({ status: "completed", exitCode: 0, durationMs: 42 });
});
it("uses streamed command output for failed native tool errors", async () => {
const projector = await createProjector();
await projector.handleNotification(
forCurrentTurn("item/commandExecution/outputDelta", {
itemId: "cmd-streamed-failure",
delta: "fatal: missing fixture\n",
}),
);
await projector.handleNotification(
turnCompleted([
{
type: "commandExecution",
id: "cmd-streamed-failure",
command: "pnpm test extensions/codex",
cwd: "/workspace",
processId: null,
source: "agent",
status: "failed",
commandActions: [],
aggregatedOutput: null,
exitCode: 1,
durationMs: 42,
},
]),
);
expect(projector.buildResult(buildEmptyToolTelemetry()).lastToolError).toEqual({
toolName: "bash",
meta: "run tests (workspace)",
error: "fatal: missing fixture",
mutatingAction: true,
actionFingerprint: JSON.stringify({
type: "commandExecution",
command: "pnpm test extensions/codex",
cwd: "/workspace",
}),
});
});
it("does not duplicate native tool starts when the snapshot completes a started item", async () => {
const onAgentEvent = vi.fn();
const trajectoryRecorder = {
@@ -1609,6 +1649,121 @@ describe("CodexAppServerEventProjector", () => {
toolCallId: "cmd-declined",
},
]);
expect(projector.buildResult(buildEmptyToolTelemetry()).lastToolError).toEqual({
toolName: "bash",
meta: "run tests (workspace)",
error: "codex native tool blocked",
mutatingAction: true,
actionFingerprint: JSON.stringify({
type: "commandExecution",
command: "pnpm test extensions/codex",
cwd: "/workspace",
}),
});
});
it("clears a recovered declined native tool error", async () => {
const projector = await createProjector();
await projector.handleNotification(
forCurrentTurn("item/completed", {
item: {
type: "commandExecution",
id: "cmd-declined",
command: "pnpm test extensions/codex",
cwd: "/workspace",
processId: null,
source: "agent",
status: "declined",
commandActions: [],
aggregatedOutput: null,
exitCode: null,
durationMs: 1,
},
}),
);
expect(projector.buildResult(buildEmptyToolTelemetry()).lastToolError).toEqual({
toolName: "bash",
meta: "run tests (workspace)",
error: "codex native tool blocked",
mutatingAction: true,
actionFingerprint: JSON.stringify({
type: "commandExecution",
command: "pnpm test extensions/codex",
cwd: "/workspace",
}),
});
await projector.handleNotification(
forCurrentTurn("item/completed", {
item: {
type: "commandExecution",
id: "cmd-recovered",
command: "pnpm test extensions/codex",
cwd: "/workspace",
processId: null,
source: "agent",
status: "completed",
commandActions: [],
aggregatedOutput: "ok",
exitCode: 0,
durationMs: 42,
},
}),
);
expect(projector.buildResult(buildEmptyToolTelemetry()).lastToolError).toBeUndefined();
});
it("does not clear a declined native tool error with a different action", async () => {
const projector = await createProjector();
await projector.handleNotification(
forCurrentTurn("item/completed", {
item: {
type: "commandExecution",
id: "cmd-declined",
command: "pnpm test extensions/codex",
cwd: "/workspace",
processId: null,
source: "agent",
status: "declined",
commandActions: [],
aggregatedOutput: null,
exitCode: null,
durationMs: 1,
},
}),
);
await projector.handleNotification(
forCurrentTurn("item/completed", {
item: {
type: "commandExecution",
id: "cmd-unrelated-success",
command: "pnpm test src/foo.test.ts",
cwd: "/workspace",
processId: null,
source: "agent",
status: "completed",
commandActions: [],
aggregatedOutput: "ok",
exitCode: 0,
durationMs: 42,
},
}),
);
expect(projector.buildResult(buildEmptyToolTelemetry()).lastToolError).toEqual({
toolName: "bash",
meta: "run tests (workspace)",
error: "codex native tool blocked",
mutatingAction: true,
actionFingerprint: JSON.stringify({
type: "commandExecution",
command: "pnpm test extensions/codex",
cwd: "/workspace",
}),
});
});
it("emits after_tool_call observations for Codex-native tool item completions", async () => {


@@ -152,6 +152,7 @@ export class CodexAppServerEventProjector {
private readonly toolTranscriptCallIds = new Set<string>();
private readonly toolTranscriptResultIds = new Set<string>();
private readonly transcriptToolProgressCallIds = new Set<string>();
private lastNativeToolError: EmbeddedRunAttemptResult["lastToolError"];
private readonly nativeGeneratedMediaUrls = new Set<string>();
private readonly diagnosticToolStartedAtByItem = new Map<string, number>();
private readonly afterToolCallObservedItemIds = new Set<string>();
@@ -338,6 +339,7 @@ export class CodexAppServerEventProjector {
assistantTexts,
toolMetas: [...this.toolMetas.values()],
lastAssistant,
...(this.lastNativeToolError ? { lastToolError: this.lastNativeToolError } : {}),
didSendViaMessagingTool: toolTelemetry.didSendViaMessagingTool,
messagingToolSentTexts: toolTelemetry.messagingToolSentTexts,
messagingToolSentMediaUrls: toolTelemetry.messagingToolSentMediaUrls,
@@ -907,15 +909,18 @@ export class CodexAppServerEventProjector {
}
const status = params.phase === "result" ? itemStatus(item) : "running";
const args = itemToolArgs(item);
const meta = itemMeta(item, this.toolProgressDetailMode());
this.recordToolTrajectoryEvent({ phase: params.phase, item, name, args, status });
this.emitDiagnosticToolExecutionEvent({ phase: params.phase, item, name, status });
if (params.phase === "result") {
this.recordNativeToolError({ item, name, meta, status });
}
if (!shouldEmitTranscriptToolProgress(name, args)) {
if (params.phase === "result") {
this.emitAfterToolCallObservation(item);
}
return;
}
const meta = itemMeta(item, this.toolProgressDetailMode());
this.emitAgentEvent({
stream: "tool",
data: {
@@ -939,6 +944,41 @@ export class CodexAppServerEventProjector {
}
}
private recordNativeToolError(params: {
item: CodexThreadItem;
name: string;
meta?: string;
status: ReturnType<typeof itemStatus>;
}): void {
if (!isNonSuccessItemStatus(params.status)) {
if (!this.lastNativeToolError) {
return;
}
if (!this.lastNativeToolError.mutatingAction) {
this.lastNativeToolError = undefined;
return;
}
const actionFingerprint = nativeToolActionFingerprint(params.item);
if (
this.lastNativeToolError.actionFingerprint &&
actionFingerprint &&
this.lastNativeToolError.actionFingerprint === actionFingerprint
) {
this.lastNativeToolError = undefined;
}
return;
}
const error = itemToolError(params.item, params.status, this.toolResultOutputTextByItem);
const actionFingerprint = nativeToolActionFingerprint(params.item);
this.lastNativeToolError = {
toolName: params.name,
...(params.meta ? { meta: params.meta } : {}),
...(error ? { error } : {}),
...(isMutatingNativeToolItem(params.item) ? { mutatingAction: true } : {}),
...(actionFingerprint ? { actionFingerprint } : {}),
};
}
private recordToolTrajectoryEvent(params: {
phase: "start" | "result";
item: CodexThreadItem;
@@ -1709,6 +1749,27 @@ function shouldSynthesizeToolProgressForItem(item: CodexThreadItem): boolean {
}
}
function isMutatingNativeToolItem(item: CodexThreadItem): boolean {
return item.type === "commandExecution" || item.type === "fileChange";
}
function nativeToolActionFingerprint(item: CodexThreadItem): string | undefined {
if (item.type === "commandExecution" && typeof item.command === "string") {
return JSON.stringify({
type: item.type,
command: item.command,
cwd: typeof item.cwd === "string" ? item.cwd : "",
});
}
if (item.type === "fileChange") {
return JSON.stringify({
type: item.type,
changes: itemFileChanges(item),
});
}
return undefined;
}
function isNativePostToolUseRelayItem(item: CodexThreadItem): boolean {
switch (item.type) {
case "commandExecution":
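
The clearing rule in `recordNativeToolError` can be hard to follow inside the projector, so here is a standalone sketch of the same decision logic. It is an illustration under assumptions, not the projector itself: `LastToolErrorTracker`, `ToolOutcome`, and `commandFingerprint` are hypothetical names, and the outcome shape is reduced to the fields the rule needs.

```typescript
type ToolError = {
  toolName: string;
  error?: string;
  mutatingAction?: boolean;
  actionFingerprint?: string;
};

// Hypothetical, reduced view of a finished tool item.
type ToolOutcome = {
  toolName: string;
  ok: boolean;
  mutating: boolean;
  fingerprint?: string;
  error?: string;
};

class LastToolErrorTracker {
  private last: ToolError | undefined;

  record(outcome: ToolOutcome): void {
    if (!outcome.ok) {
      // A failure always replaces whatever error was remembered before.
      this.last = {
        toolName: outcome.toolName,
        ...(outcome.error ? { error: outcome.error } : {}),
        ...(outcome.mutating ? { mutatingAction: true } : {}),
        ...(outcome.fingerprint ? { actionFingerprint: outcome.fingerprint } : {}),
      };
      return;
    }
    if (!this.last) {
      return;
    }
    if (!this.last.mutatingAction) {
      // Any later success clears an error from a non-mutating action.
      this.last = undefined;
      return;
    }
    // A mutating failure is cleared only by a success of the same action.
    if (
      this.last.actionFingerprint &&
      outcome.fingerprint &&
      this.last.actionFingerprint === outcome.fingerprint
    ) {
      this.last = undefined;
    }
  }

  get current(): ToolError | undefined {
    return this.last;
  }
}

// Fingerprint for a command, following the shape used in the diff's tests.
function commandFingerprint(command: string, cwd: string): string {
  return JSON.stringify({ type: "commandExecution", command, cwd });
}
```

The asymmetry is deliberate in the diff's tests: an unrelated successful command leaves a declined mutating command reported as the last error, while a successful rerun of the same command and working directory clears it.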


@@ -2,6 +2,7 @@ import { describe, expect, it } from "vitest";
import {
rewriteCopilotConnectionBoundResponseIds,
rewriteCopilotResponsePayloadConnectionBoundIds,
sanitizeCopilotReplayResponseIds,
} from "./connection-bound-ids.js";
describe("github-copilot connection-bound response IDs", () => {
@@ -35,7 +36,7 @@ describe("github-copilot connection-bound response IDs", () => {
expect(input[4]?.id).toMatch(/^msg_[a-f0-9]{16}$/);
});
it("preserves reasoning IDs regardless of encrypted_content", () => {
it("preserves valid reasoning IDs regardless of encrypted_content", () => {
const withEncrypted = Buffer.from(`reasoning-${"e".repeat(24)}`).toString("base64");
const withNull = Buffer.from(`reasoning-${"n".repeat(24)}`).toString("base64");
const withoutField = Buffer.from(`reasoning-${"a".repeat(24)}`).toString("base64");
@@ -51,6 +52,38 @@ describe("github-copilot connection-bound response IDs", () => {
expect(input[2]?.id).toBe(withoutField);
});
it("preserves valid base64-ish reasoning IDs with and without encrypted content", () => {
const withEncrypted = "abcDEF0123+/=";
const withoutEncrypted = "reasoning/abc+123=";
const input = [
{ id: withEncrypted, type: "reasoning", encrypted_content: "opaque-encrypted-payload" },
{ id: withoutEncrypted, type: "reasoning" },
];
expect(sanitizeCopilotReplayResponseIds(input)).toBe(false);
expect(input.map((item) => item.id)).toEqual([withEncrypted, withoutEncrypted]);
});
it("drops unsafe reasoning replay items instead of stripping their IDs", () => {
const overlongId = `5PX6gLHXT5wE+Y2tPmUV4gn+${"B".repeat(384)}`;
const input = [
{
id: overlongId,
type: "reasoning",
encrypted_content: "encrypted-replay-payload",
summary: [],
},
{ type: "reasoning", encrypted_content: "missing-id", summary: [] },
{ id: 123, type: "reasoning", encrypted_content: "non-string-id", summary: [] },
{ id: "rs_valid", type: "reasoning", encrypted_content: "valid", summary: [] },
];
expect(sanitizeCopilotReplayResponseIds(input)).toBe(true);
expect(input).toEqual([
{ id: "rs_valid", type: "reasoning", encrypted_content: "valid", summary: [] },
]);
});
it("patches response payload input arrays only", () => {
const messageId = Buffer.from(`message-${"m".repeat(24)}`).toString("base64");
const payload = { input: [{ id: messageId, type: "message" }] };


@@ -2,7 +2,7 @@ import { createHash } from "node:crypto";
// Copilot's OpenAI-compatible `/responses` endpoint can emit replay item IDs
// that encode upstream connection state. Those IDs are rejected after the
// connection changes, so normalize them at the provider boundary before send.
// connection changes, so sanitize them at the provider boundary before send.
function looksLikeConnectionBoundId(id: string): boolean {
if (id.length < 24) {
@@ -25,21 +25,36 @@ function deriveReplacementId(type: string | undefined, originalId: string): stri
type InputItem = Record<string, unknown> & { id?: unknown; type?: unknown };
export function rewriteCopilotConnectionBoundResponseIds(input: unknown): boolean {
function isInputItem(value: unknown): value is InputItem {
return !!value && typeof value === "object";
}
function isValidReasoningReplayId(id: unknown): id is string {
return typeof id === "string" && id.length > 0 && id.length <= 64;
}
export function sanitizeCopilotReplayResponseIds(input: unknown): boolean {
if (!Array.isArray(input)) {
return false;
}
let rewrote = false;
for (const item of input as InputItem[]) {
const id = item.id;
if (typeof id !== "string" || id.length === 0) {
for (let index = input.length - 1; index >= 0; index -= 1) {
const item = input[index];
if (!isInputItem(item)) {
continue;
}
const id = item.id;
// Reasoning items always reference server-side encrypted state bound to the
// original item ID. Rewriting the ID — even when encrypted_content is absent
// or null — breaks Copilot's server-side lookup and causes a 400 validation
// failure regardless of whether the client included encrypted_content.
// original item ID. Rewriting or stripping that ID can turn replay into an
// invalid or ambiguous server-state lookup, so drop unsafe reasoning items.
if (item.type === "reasoning") {
if (!isValidReasoningReplayId(id)) {
input.splice(index, 1);
rewrote = true;
}
continue;
}
if (typeof id !== "string" || id.length === 0) {
continue;
}
if (looksLikeConnectionBoundId(id)) {
@@ -50,9 +65,17 @@ export function rewriteCopilotConnectionBoundResponseIds(input: unknown): boolea
return rewrote;
}
export function rewriteCopilotResponsePayloadConnectionBoundIds(payload: unknown): boolean {
export function rewriteCopilotConnectionBoundResponseIds(input: unknown): boolean {
return sanitizeCopilotReplayResponseIds(input);
}
export function sanitizeCopilotReplayResponsePayloadIds(payload: unknown): boolean {
if (!payload || typeof payload !== "object") {
return false;
}
return rewriteCopilotConnectionBoundResponseIds((payload as { input?: unknown }).input);
return sanitizeCopilotReplayResponseIds((payload as { input?: unknown }).input);
}
export function rewriteCopilotResponsePayloadConnectionBoundIds(payload: unknown): boolean {
return sanitizeCopilotReplayResponsePayloadIds(payload);
}
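
The reasoning-item handling above drops unsafe items in place instead of rewriting their IDs. A minimal sketch of just that step follows, assuming the 64-character bound shown in the diff; `dropUnsafeReasoningItems` and `MAX_REASONING_ID_LENGTH` are hypothetical names, and the connection-bound ID rewriting for non-reasoning items is left out.

```typescript
type ReplayItem = Record<string, unknown> & { id?: unknown; type?: unknown };

// Upper bound taken from isValidReasoningReplayId in the diff.
const MAX_REASONING_ID_LENGTH = 64;

function isValidReasoningReplayId(id: unknown): id is string {
  return typeof id === "string" && id.length > 0 && id.length <= MAX_REASONING_ID_LENGTH;
}

// Remove reasoning items whose IDs are missing, non-string, empty, or too long.
// Returns true when the array was mutated. Iterating from the end keeps the
// remaining indices valid while splicing.
function dropUnsafeReasoningItems(input: unknown): boolean {
  if (!Array.isArray(input)) {
    return false;
  }
  let changed = false;
  for (let index = input.length - 1; index >= 0; index -= 1) {
    const item = input[index] as ReplayItem | null;
    if (!item || typeof item !== "object") {
      continue;
    }
    if (item.type === "reasoning" && !isValidReasoningReplayId(item.id)) {
      input.splice(index, 1);
      changed = true;
    }
  }
  return changed;
}
```

Walking the array backwards is the usual way to delete during iteration: a forward loop would skip the element that slides into the removed slot.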


@@ -118,14 +118,21 @@ describe("wrapCopilotAnthropicStream", () => {
expect(baseStreamFn.mock.calls).toEqual([[model, context, options]]);
});
it("adds Copilot headers, preserves reasoning IDs, and rewrites message IDs before payload send", () => {
it("adds Copilot headers, sanitizes reasoning replay, and rewrites message IDs before payload send", () => {
const reasoningId = Buffer.from(`reasoning-${"x".repeat(24)}`).toString("base64");
const overlongReasoningId = `5PX6gLHXT5wE+Y2tPmUV4gn+${"B".repeat(384)}`;
const messageId = Buffer.from(`message-${"y".repeat(24)}`).toString("base64");
const payloads: Array<{ input: Array<Record<string, unknown>> }> = [];
const baseStreamFn = vi.fn((_model, _context, options) => {
const payload = {
input: [
{ id: reasoningId, type: "reasoning" },
{ id: reasoningId, type: "reasoning", encrypted_content: "valid-encrypted-payload" },
{
id: overlongReasoningId,
type: "reasoning",
encrypted_content: "invalid-encrypted-payload",
summary: [],
},
{ id: messageId, type: "message" },
],
};
@@ -174,6 +181,7 @@ describe("wrapCopilotAnthropicStream", () => {
onPayload: options.onPayload,
});
expect(payloads[0]?.input[0]?.id).toBe(reasoningId);
expect(payloads[0]?.input.map((item) => item.type)).toEqual(["reasoning", "message"]);
expect(payloads[0]?.input[1]?.id).toMatch(/^msg_[a-f0-9]{16}$/);
});


@@ -965,6 +965,108 @@ describe("qa cli runtime", () => {
}
});
it("writes a runtime-axis token-efficiency report when requested", async () => {
const repoRoot = await fs.mkdtemp(path.join(os.tmpdir(), "qa-runtime-token-efficiency-"));
const priorExitCode = process.exitCode;
process.exitCode = undefined;
try {
await fs.writeFile(
path.join(repoRoot, "runtime-summary.json"),
JSON.stringify({
scenarios: [
{
name: "runtime-tool-fs-read",
status: "pass",
steps: [],
runtimeParity: {
scenarioId: "runtime-tool-fs-read",
drift: "none",
cells: {
pi: {
runtime: "pi",
transcriptBytes: '{"role":"assistant"}\n',
toolCalls: [{ tool: "fs.read", argsHash: "a", resultHash: "r" }],
finalText: "done",
usage: { inputTokens: 72_000, outputTokens: 381, totalTokens: 72_381 },
wallClockMs: 10,
bootStateLines: [],
},
codex: {
runtime: "codex",
transcriptBytes: '{"role":"assistant"}\n',
toolCalls: Array.from({ length: 40 }, (_, index) => ({
tool: "fs.read",
argsHash: `a-${index}`,
resultHash: `r-${index}`,
})),
finalText: "done",
usage: { inputTokens: 118_000, outputTokens: 1_489, totalTokens: 119_489 },
wallClockMs: 10,
bootStateLines: [],
},
},
},
},
],
counts: { total: 1, passed: 1, failed: 0 },
run: {
providerMode: "live-frontier",
primaryModel: "openai/gpt-5.5",
runtimePair: ["pi", "codex"],
},
}),
"utf8",
);
await runQaParityReportCommand({
repoRoot,
runtimeAxis: true,
summary: "runtime-summary.json",
tokenEfficiency: true,
});
expect(process.exitCode).toBe(1);
expect(stdoutWrite).toHaveBeenCalledWith(
expect.stringContaining("QA runtime parity verdict: pass"),
);
expect(stdoutWrite).toHaveBeenCalledWith(
expect.stringContaining("QA runtime token efficiency report:"),
);
expect(stdoutWrite).toHaveBeenCalledWith(
expect.stringContaining("QA runtime token efficiency verdict: fail"),
);
const [artifactDir] = await fs.readdir(path.join(repoRoot, ".artifacts", "qa-e2e"));
const tokenSummary = JSON.parse(
await fs.readFile(
path.join(
repoRoot,
".artifacts",
"qa-e2e",
artifactDir ?? "",
"qa-runtime-token-efficiency-summary.json",
),
"utf8",
),
) as { aggregate?: { flaggedScenarios?: string[] } };
expect(tokenSummary.aggregate?.flaggedScenarios).toEqual(["runtime-tool-fs-read"]);
} finally {
process.exitCode = priorExitCode;
await fs.rm(repoRoot, { recursive: true, force: true });
}
});
it("rejects token-efficiency without runtime-axis mode", async () => {
await expect(
runQaParityReportCommand({
repoRoot: process.cwd(),
candidateSummary: "candidate.json",
baselineSummary: "baseline.json",
tokenEfficiency: true,
}),
).rejects.toThrow("--token-efficiency requires --runtime-axis.");
});
it("prints a markdown coverage report from scenario metadata", async () => {
await runQaCoverageReportCommand({ repoRoot: process.cwd() });
@@ -979,6 +1081,64 @@ describe("qa cli runtime", () => {
expectWriteContains(stdoutWrite, "codex-native-workspace");
});
it("exits nonzero when tool coverage summary is missing a required runtime tool call", async () => {
const priorExitCode = process.exitCode;
const repoRoot = await fs.mkdtemp(path.join(os.tmpdir(), "qa-tool-coverage-"));
try {
await fs.writeFile(
path.join(repoRoot, "runtime-summary.json"),
JSON.stringify({
scenarios: [
{
name: "runtime-tool-web-search",
status: "fail",
runtimeParity: {
scenarioId: "runtime-tool-web-search",
drift: "tool-call-shape",
driftDetails: "Codex emitted no web_search call",
cells: {
pi: {
runtime: "pi",
transcriptBytes: "",
toolCalls: [{ tool: "web_search", argsHash: "a", resultHash: "r" }],
finalText: "",
usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
wallClockMs: 1,
bootStateLines: [],
},
codex: {
runtime: "codex",
transcriptBytes: "",
toolCalls: [],
finalText: "",
usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
wallClockMs: 1,
bootStateLines: [],
},
},
},
},
],
run: { runtimePair: ["pi", "codex"] },
}),
"utf8",
);
await runQaCoverageReportCommand({
repoRoot,
tools: true,
summary: "runtime-summary.json",
});
expect(process.exitCode).toBe(1);
expectWriteContains(stdoutWrite, "- Verdict: fail");
expectWriteContains(stdoutWrite, "web-search missing codex tool call web_search");
} finally {
process.exitCode = priorExitCode;
await fs.rm(repoRoot, { recursive: true, force: true });
}
});
it("resolves character eval paths and passes model refs through", async () => {
await runQaCharacterEvalCommand({
repoRoot: "/tmp/openclaw-repo",

@@ -50,6 +50,11 @@ import {
import { resolveQaScenarioPackScenarioIds } from "./scenario-packs.js";
import { runQaSuiteFromRuntime } from "./suite-launch.runtime.js";
import { readQaSuiteFailedScenarioCountFromSummary } from "./suite-summary.js";
import {
buildTokenEfficiencyReport,
renderTokenEfficiencyMarkdownReport,
type TokenEfficiencySuiteSummary,
} from "./token-efficiency-report.js";
import {
buildQaToolCoverageReport,
renderQaToolCoverageMarkdownReport,
@@ -681,8 +686,12 @@ export async function runQaParityReportCommand(opts: {
outputDir?: string;
runtimeAxis?: boolean;
summary?: string;
tokenEfficiency?: boolean;
}) {
const repoRoot = path.resolve(opts.repoRoot ?? process.cwd());
if (opts.tokenEfficiency === true && opts.runtimeAxis !== true) {
throw new Error("--token-efficiency requires --runtime-axis.");
}
const outputDir =
resolveRepoRelativeOutputDir(repoRoot, opts.outputDir) ??
path.join(repoRoot, ".artifacts", "qa-e2e", `parity-${Date.now().toString(36)}`);
@@ -706,7 +715,26 @@ export async function runQaParityReportCommand(opts: {
process.stdout.write(`QA runtime parity report: ${reportPath}\n`);
process.stdout.write(`QA runtime parity summary: ${runtimeSummaryPath}\n`);
process.stdout.write(`QA runtime parity verdict: ${reportPayload.pass ? "pass" : "fail"}\n`);
-if (!reportPayload.pass) {
+let tokenEfficiencyPass = true;
+if (opts.tokenEfficiency === true) {
+const tokenPayload = buildTokenEfficiencyReport({
+summary: summary as TokenEfficiencySuiteSummary,
+});
+tokenEfficiencyPass = tokenPayload.pass;
+const tokenReport = renderTokenEfficiencyMarkdownReport(tokenPayload);
+const tokenReportPath = path.join(outputDir, "qa-runtime-token-efficiency-report.md");
+const tokenSummaryPath = path.join(outputDir, "qa-runtime-token-efficiency-summary.json");
+await fs.writeFile(tokenReportPath, tokenReport, "utf8");
+await fs.writeFile(tokenSummaryPath, `${JSON.stringify(tokenPayload, null, 2)}\n`, "utf8");
+process.stdout.write(`QA runtime token efficiency report: ${tokenReportPath}\n`);
+process.stdout.write(`QA runtime token efficiency summary: ${tokenSummaryPath}\n`);
+process.stdout.write(
+`QA runtime token efficiency verdict: ${tokenPayload.status === "skipped" ? "skipped" : tokenPayload.pass ? "pass" : "fail"}\n`,
+);
+}
+if (!reportPayload.pass || !tokenEfficiencyPass) {
process.exitCode = 1;
}
return;
@@ -769,6 +797,9 @@ export async function runQaCoverageReportCommand(opts: {
? `${JSON.stringify(report, null, 2)}\n`
: renderQaToolCoverageMarkdownReport(report);
outputLabel = "QA tool coverage report";
if (summary && !report.pass) {
process.exitCode = 1;
}
} else {
if (opts.summary?.trim()) {
throw new Error("--summary requires --tools.");
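The exit-code gating added to `runQaParityReportCommand` in the hunks above can be condensed into a small standalone sketch. `resolveParityExitCode` and the `Verdict` type are hypothetical names introduced only for illustration. The flag check, its error message, and the rule that either failing verdict yields exit code 1 are taken from the diff. Returning 0 on success is a simplification, since the real command only assigns `process.exitCode = 1` on failure.

```typescript
// Hypothetical helper, not part of the commit: condenses the gating shown above.
type Verdict = { pass: boolean };

function resolveParityExitCode(
  opts: { runtimeAxis?: boolean; tokenEfficiency?: boolean },
  parity: Verdict,
  tokenEfficiency?: Verdict,
): 0 | 1 {
  // Same precondition and message as the diff: the token report needs runtime-axis mode.
  if (opts.tokenEfficiency === true && opts.runtimeAxis !== true) {
    throw new Error("--token-efficiency requires --runtime-axis.");
  }
  // Without --token-efficiency the token verdict never influences the exit code.
  const tokenEfficiencyPass =
    opts.tokenEfficiency === true && tokenEfficiency ? tokenEfficiency.pass : true;
  // Simplification: the real command leaves process.exitCode untouched on success.
  return parity.pass && tokenEfficiencyPass ? 0 : 1;
}

// Parity passes but the token-efficiency gate fails, as in the CLI test above.
console.log(
  resolveParityExitCode(
    { runtimeAxis: true, tokenEfficiency: true },
    { pass: true },
    { pass: false },
  ),
); // prints 1
```

This matches the CLI test earlier in the diff, where the parity verdict is `pass`, the token-efficiency verdict is `fail`, and `process.exitCode` is expected to be 1.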

@@ -66,6 +66,7 @@ async function runQaParityReport(opts: {
outputDir?: string;
runtimeAxis?: boolean;
summary?: string;
tokenEfficiency?: boolean;
}) {
const runtime = await loadQaLabCliRuntime();
await runtime.runQaParityReportCommand(opts);
@@ -353,6 +354,11 @@ export function registerQaLabCli(program: Command) {
.option("--baseline-summary <path>", "Baseline qa-suite-summary.json path")
.option("--runtime-axis", "Interpret --summary as a runtime-pair qa-suite-summary.json", false)
.option("--summary <path>", "Runtime-axis qa-suite-summary.json path")
.option(
"--token-efficiency",
"Also write the runtime token-efficiency report for --runtime-axis summaries",
false,
)
.option("--repo-root <path>", "Repository root to target when running from a neutral cwd")
.option(
"--candidate-label <label>",
@@ -371,6 +377,7 @@ export function registerQaLabCli(program: Command) {
outputDir?: string;
runtimeAxis?: boolean;
summary?: string;
tokenEfficiency?: boolean;
}) => {
await runQaParityReport(opts);
},

@@ -120,6 +120,7 @@ describe("qa scenario catalog", () => {
const applyPatch = readQaScenarioById("runtime-tool-apply-patch");
const messageTool = readQaScenarioById("runtime-tool-message-tool");
const tavilySearch = readQaScenarioById("runtime-tool-tavily-search");
const webSearch = readQaScenarioById("runtime-tool-web-search");
expect(applyPatch.runtimeParityTier).toBe("standard");
expect(messageTool.runtimeParityTier).toBe("optional");
@@ -140,6 +141,16 @@ describe("qa scenario catalog", () => {
required: false,
},
});
expect(readQaScenarioExecutionConfig(webSearch.id)).toMatchObject({
toolName: "web_search",
toolCoverage: {
bucket: "openclaw-dynamic-integration",
expectedLayer: "openclaw-dynamic",
capabilityLayer: "openclaw-dynamic-direct",
required: true,
},
});
expect(readQaScenarioExecutionConfig(webSearch.id)).not.toHaveProperty("knownHarnessGap");
});
it("loads the Codex Pi-shaped Read vocabulary live parity canary", () => {

@@ -0,0 +1,191 @@
import { describe, expect, it } from "vitest";
import type {
RuntimeId,
RuntimeParityCell,
RuntimeParityResult,
RuntimeParityToolCall,
} from "./runtime-parity.js";
import {
buildTokenEfficiencyReport,
renderTokenEfficiencyMarkdownReport,
type TokenEfficiencySuiteSummary,
} from "./token-efficiency-report.js";
function makeToolCall(tool: string): RuntimeParityToolCall {
return {
tool,
argsHash: `${tool}-args`,
resultHash: `${tool}-result`,
};
}
function makeCell(
runtime: RuntimeId,
usage: RuntimeParityCell["usage"],
toolCalls: RuntimeParityToolCall[] = [],
): RuntimeParityCell {
return {
runtime,
transcriptBytes: '{"role":"assistant"}\n',
toolCalls,
finalText: "done",
usage,
wallClockMs: 10,
bootStateLines: [],
};
}
function makeRuntimeParity(
scenarioId: string,
pi: RuntimeParityCell,
codex: RuntimeParityCell,
): RuntimeParityResult {
return {
scenarioId,
drift: "none",
cells: { pi, codex },
};
}
function makeLiveSummary(runtimeParity: RuntimeParityResult[]): TokenEfficiencySuiteSummary {
return {
scenarios: runtimeParity.map((result) => ({
name: result.scenarioId,
status: "pass" as const,
runtimeParity: result,
})),
run: {
providerMode: "live-frontier",
runtimePair: ["pi", "codex"],
},
};
}
describe("token efficiency report", () => {
it("does not fail live reports solely because Codex uses fewer tokens", () => {
const report = buildTokenEfficiencyReport({
generatedAt: "2026-05-10T00:00:00.000Z",
summary: makeLiveSummary([
makeRuntimeParity(
"codex-savings",
makeCell("pi", { inputTokens: 120, outputTokens: 80, totalTokens: 200 }),
makeCell("codex", { inputTokens: 60, outputTokens: 40, totalTokens: 100 }),
),
]),
});
expect(report.pass).toBe(true);
expect(report.aggregate.flaggedScenarios).toEqual([]);
expect(report.aggregate.savingsScenarios).toEqual(["codex-savings"]);
expect(report.rows[0]).toMatchObject({
deltaPercent: -50,
classification: "savings",
flagged: false,
});
});
it("fails live reports on positive Codex token increases over the threshold", () => {
const report = buildTokenEfficiencyReport({
generatedAt: "2026-05-10T00:00:00.000Z",
summary: makeLiveSummary([
makeRuntimeParity(
"runtime-tool-fs-read",
makeCell("pi", { inputTokens: 72_000, outputTokens: 381, totalTokens: 72_381 }, [
makeToolCall("fs.read"),
makeToolCall("fs.read"),
]),
makeCell(
"codex",
{ inputTokens: 118_000, outputTokens: 1_489, totalTokens: 119_489 },
Array.from({ length: 40 }, () => makeToolCall("fs.read")),
),
),
]),
});
expect(report.pass).toBe(false);
expect(report.aggregate.flaggedScenarios).toEqual(["runtime-tool-fs-read"]);
expect(report.rows[0]).toMatchObject({
classification: "regression",
flagged: true,
toolsUsed: ["fs.read"],
});
expect(report.failures).toEqual([
"runtime-tool-fs-read token delta=+65.1% exceeds 15.0% Codex increase threshold",
]);
});
it("keeps live zero-usage rows failing instead of passing as neutral", () => {
const report = buildTokenEfficiencyReport({
summary: makeLiveSummary([
makeRuntimeParity(
"missing-live-usage",
makeCell("pi", { inputTokens: 0, outputTokens: 0, totalTokens: 0 }),
makeCell("codex", { inputTokens: 0, outputTokens: 0, totalTokens: 0 }),
),
]),
});
expect(report.pass).toBe(false);
expect(report.failures).toEqual([
"missing-live-usage pi live usage totalTokens=0",
"missing-live-usage codex live usage totalTokens=0",
]);
});
it("labels mock-estimated Codex increases as regressions without failing the live gate", () => {
const report = buildTokenEfficiencyReport({
summary: {
scenarios: [
{
name: "mock-regression",
status: "pass",
runtimeParity: makeRuntimeParity(
"mock-regression",
makeCell("pi", { inputTokens: 100, outputTokens: 0, totalTokens: 100 }),
makeCell("codex", { inputTokens: 130, outputTokens: 0, totalTokens: 130 }),
),
},
],
run: {
providerMode: "mock-openai",
runtimePair: ["pi", "codex"],
},
},
});
expect(report.status).toBe("estimated");
expect(report.pass).toBe(true);
expect(report.aggregate.flaggedScenarios).toEqual([]);
expect(report.rows[0]).toMatchObject({
usageSource: "mock-estimate",
classification: "regression",
flagged: false,
});
});
it("renders savings and regression classifications in the markdown report", () => {
const report = buildTokenEfficiencyReport({
generatedAt: "2026-05-10T00:00:00.000Z",
summary: makeLiveSummary([
makeRuntimeParity(
"codex-savings",
makeCell("pi", { inputTokens: 100, outputTokens: 100, totalTokens: 200 }),
makeCell("codex", { inputTokens: 50, outputTokens: 50, totalTokens: 100 }),
),
makeRuntimeParity(
"codex-regression",
makeCell("pi", { inputTokens: 100, outputTokens: 0, totalTokens: 100 }),
makeCell("codex", { inputTokens: 130, outputTokens: 0, totalTokens: 130 }),
),
]),
});
const markdown = renderTokenEfficiencyMarkdownReport(report);
expect(markdown).toContain("p50 per scenario");
expect(markdown).toContain("| codex-savings | live-usage |");
expect(markdown).toContain("| -50.0% | savings | no |");
expect(markdown).toContain("| codex-regression | live-usage |");
expect(markdown).toContain("| +30.0% | regression | yes |");
});
});
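The "p50 per scenario" column asserted in the markdown test above comes from a nearest-rank percentile over per-scenario totals. The sketch below restates that calculation on the totals used in that test. It is an illustration, not the module's code: it sorts a copy with `slice().sort()` rather than `toSorted`, an assumption made only so the snippet runs on older targets.

```typescript
// Nearest-rank percentile, restated for illustration from the report module in this diff.
function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) {
    return 0;
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = values.slice().sort((left, right) => left - right);
  // Rank ceil(p/100 * n), converted to a zero-based index and clamped to the array bounds.
  const index = Math.min(sorted.length - 1, Math.max(0, Math.ceil((p / 100) * sorted.length) - 1));
  return sorted[index] ?? 0;
}

// Per-scenario totals from the markdown rendering test: pi = 200 and 100, codex = 100 and 130.
const piTotals = [200, 100];
const codexTotals = [100, 130];
console.log(percentile(piTotals, 50), percentile(piTotals, 90)); // prints 100 200
console.log(percentile(codexTotals, 50), percentile(codexTotals, 90)); // prints 100 130
```

With only two scenarios, p50 resolves to the smaller total and p90 to the larger one, so small suites should read these columns as coarse indicators.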

@@ -0,0 +1,306 @@
import type { RuntimeId, RuntimeParityCell, RuntimeParityResult } from "./runtime-parity.js";
export type TokenEfficiencyRuntimeUsage = {
inputTokens: number;
outputTokens: number;
totalTokens: number;
toolCallCount: number;
};
export type TokenEfficiencyRow = {
scenarioId: string;
usageSource: "live-usage" | "mock-estimate";
pi: TokenEfficiencyRuntimeUsage;
codex: TokenEfficiencyRuntimeUsage;
deltaPercent: number;
classification: "regression" | "savings" | "neutral";
flagged: boolean;
toolsUsed: string[];
};
export type TokenEfficiencyReport = {
status: "evaluated" | "estimated" | "skipped";
runtimePair: [RuntimeId, RuntimeId];
generatedAt: string;
providerMode?: string;
thresholdPercent: number;
rows: TokenEfficiencyRow[];
aggregate: {
pi: { totalTokens: number; p50PerScenario: number; p90PerScenario: number };
codex: { totalTokens: number; p50PerScenario: number; p90PerScenario: number };
deltaPercent: number;
flaggedScenarios: string[];
savingsScenarios: string[];
};
pass: boolean;
failures: string[];
skipReason?: string;
notes: string[];
};
export type TokenEfficiencySuiteSummary = {
scenarios: Array<{
name: string;
status: "pass" | "fail" | "skip";
runtimeParity?: RuntimeParityResult;
}>;
run?: {
providerMode?: string;
runtimePair?: [RuntimeId, RuntimeId] | null;
};
};
export type BuildTokenEfficiencyReportParams = {
summary: TokenEfficiencySuiteSummary;
generatedAt?: string;
thresholdPercent?: number;
};
const DEFAULT_THRESHOLD_PERCENT = 15;
const ZERO_AGGREGATE: TokenEfficiencyReport["aggregate"] = {
pi: { totalTokens: 0, p50PerScenario: 0, p90PerScenario: 0 },
codex: { totalTokens: 0, p50PerScenario: 0, p90PerScenario: 0 },
deltaPercent: 0,
flaggedScenarios: [],
savingsScenarios: [],
};
function normalizeRuntimePair(
pair: [RuntimeId, RuntimeId] | null | undefined,
): [RuntimeId, RuntimeId] {
if (pair?.[0] && pair?.[1]) {
return pair;
}
return ["pi", "codex"];
}
function normalizeTokenCount(value: number): number {
return Number.isFinite(value) ? Math.max(0, value) : 0;
}
function deltaPercent(piTotalTokens: number, codexTotalTokens: number): number {
if (piTotalTokens === 0) {
return codexTotalTokens === 0 ? 0 : 100;
}
return ((codexTotalTokens - piTotalTokens) / piTotalTokens) * 100;
}
function percentile(values: readonly number[], p: number): number {
if (values.length === 0) {
return 0;
}
const sorted = [...values].toSorted((left, right) => left - right);
const index = Math.min(sorted.length - 1, Math.max(0, Math.ceil((p / 100) * sorted.length) - 1));
return sorted[index] ?? 0;
}
function isLiveProviderMode(providerMode: string | undefined) {
return providerMode?.startsWith("live-") === true;
}
function formatPercent(value: number) {
const sign = value > 0 ? "+" : "";
return `${sign}${value.toFixed(1)}%`;
}
function runtimeUsage(cell: RuntimeParityCell): TokenEfficiencyRuntimeUsage {
return {
inputTokens: normalizeTokenCount(cell.usage.inputTokens),
outputTokens: normalizeTokenCount(cell.usage.outputTokens),
totalTokens: normalizeTokenCount(cell.usage.totalTokens),
toolCallCount: cell.toolCalls.length,
};
}
function toolNamesForCells(pi: RuntimeParityCell, codex: RuntimeParityCell): string[] {
return [...new Set([...pi.toolCalls, ...codex.toolCalls].map((call) => call.tool))].toSorted(
(left, right) => left.localeCompare(right),
);
}
function buildRow(params: {
result: RuntimeParityResult;
thresholdPercent: number;
usageSource: TokenEfficiencyRow["usageSource"];
}): TokenEfficiencyRow {
const pi = runtimeUsage(params.result.cells.pi);
const codex = runtimeUsage(params.result.cells.codex);
const delta = deltaPercent(pi.totalTokens, codex.totalTokens);
const flagged = params.usageSource === "live-usage" && delta > params.thresholdPercent;
const classification =
delta > params.thresholdPercent
? "regression"
: delta < -params.thresholdPercent
? "savings"
: "neutral";
return {
scenarioId: params.result.scenarioId,
usageSource: params.usageSource,
pi,
codex,
deltaPercent: delta,
classification,
flagged,
toolsUsed: toolNamesForCells(params.result.cells.pi, params.result.cells.codex),
};
}
function buildAggregate(rows: readonly TokenEfficiencyRow[]): TokenEfficiencyReport["aggregate"] {
const piTotals = rows.map((row) => row.pi.totalTokens);
const codexTotals = rows.map((row) => row.codex.totalTokens);
const piTotalTokens = piTotals.reduce((sum, value) => sum + value, 0);
const codexTotalTokens = codexTotals.reduce((sum, value) => sum + value, 0);
return {
pi: {
totalTokens: piTotalTokens,
p50PerScenario: percentile(piTotals, 50),
p90PerScenario: percentile(piTotals, 90),
},
codex: {
totalTokens: codexTotalTokens,
p50PerScenario: percentile(codexTotals, 50),
p90PerScenario: percentile(codexTotals, 90),
},
deltaPercent: deltaPercent(piTotalTokens, codexTotalTokens),
flaggedScenarios: rows.filter((row) => row.flagged).map((row) => row.scenarioId),
savingsScenarios: rows
.filter((row) => row.classification === "savings")
.map((row) => row.scenarioId),
};
}
function liveEvidenceFailures(row: TokenEfficiencyRow): string[] {
const failures: string[] = [];
if (row.pi.totalTokens <= 0) {
failures.push(`${row.scenarioId} pi live usage totalTokens=${row.pi.totalTokens}`);
}
if (row.codex.totalTokens <= 0) {
failures.push(`${row.scenarioId} codex live usage totalTokens=${row.codex.totalTokens}`);
}
return failures;
}
export function buildTokenEfficiencyReport(
params: BuildTokenEfficiencyReportParams,
): TokenEfficiencyReport {
const providerMode = params.summary.run?.providerMode;
const runtimePair = normalizeRuntimePair(params.summary.run?.runtimePair);
const thresholdPercent = params.thresholdPercent ?? DEFAULT_THRESHOLD_PERCENT;
const liveUsage = isLiveProviderMode(providerMode);
const usageSource: TokenEfficiencyRow["usageSource"] = liveUsage ? "live-usage" : "mock-estimate";
const parityResults = params.summary.scenarios
.map((scenario) => scenario.runtimeParity)
.filter((result): result is RuntimeParityResult => !!result);
if (parityResults.length === 0) {
return {
status: "skipped",
runtimePair,
generatedAt: params.generatedAt ?? new Date().toISOString(),
...(providerMode ? { providerMode } : {}),
thresholdPercent,
rows: [],
aggregate: ZERO_AGGREGATE,
pass: true,
failures: [],
skipReason: "No runtime parity captures were present in the suite summary.",
notes: ["Token efficiency requires runtime-pair summaries with RuntimeParityResult cells."],
};
}
const rows = parityResults.map((result) =>
buildRow({
result,
thresholdPercent,
usageSource,
}),
);
const aggregate = buildAggregate(rows);
const failures = rows.flatMap((row) => {
const rowFailures = liveUsage ? liveEvidenceFailures(row) : [];
if (row.flagged) {
rowFailures.push(
`${row.scenarioId} token delta=${formatPercent(row.deltaPercent)} exceeds ${thresholdPercent.toFixed(1)}% Codex increase threshold`,
);
}
return rowFailures;
});
return {
status: liveUsage ? "evaluated" : "estimated",
runtimePair,
generatedAt: params.generatedAt ?? new Date().toISOString(),
...(providerMode ? { providerMode } : {}),
thresholdPercent,
rows,
aggregate,
pass: failures.length === 0,
failures,
notes: [
"Token totals are read from RuntimeParityCell.usage, which is captured from normalized AssistantMessage.usage.",
"Codex savings are reported as savings and do not fail the gate; only positive Codex-over-Pi live deltas that exceed the threshold fail it.",
usageSource === "mock-estimate"
? "Mock-provider token totals are labeled as estimates and do not block the token-efficiency gate."
: "The report does not inspect provider transport payload token counters.",
],
};
}
export function renderTokenEfficiencyMarkdownReport(report: TokenEfficiencyReport): string {
const lines = [
`# OpenClaw Runtime Token Efficiency - ${report.runtimePair[0]} vs ${report.runtimePair[1]}`,
"",
`- Generated at: ${report.generatedAt}`,
...(report.providerMode ? [`- Provider mode: ${report.providerMode}`] : []),
`- Verdict: ${report.status === "skipped" ? "skipped" : report.pass ? "pass" : "fail"}`,
`- Usage source: ${report.rows[0]?.usageSource ?? "none"}`,
`- Threshold: Codex token increase > ${report.thresholdPercent.toFixed(1)}%`,
"",
];
if (report.skipReason) {
lines.push(`- Skip reason: ${report.skipReason}`, "");
}
lines.push(
"## Aggregate Metrics",
"",
"| Runtime | Total tokens | p50 per scenario | p90 per scenario |",
"| --- | ---: | ---: | ---: |",
`| pi | ${report.aggregate.pi.totalTokens} | ${report.aggregate.pi.p50PerScenario} | ${report.aggregate.pi.p90PerScenario} |`,
`| codex | ${report.aggregate.codex.totalTokens} | ${report.aggregate.codex.p50PerScenario} | ${report.aggregate.codex.p90PerScenario} |`,
`| delta | ${formatPercent(report.aggregate.deltaPercent)} | | |`,
"",
);
if (report.rows.length > 0) {
lines.push(
"## Scenario Efficiency",
"",
"| Scenario | Source | Pi in/out/total/tools | Codex in/out/total/tools | Token delta | Classification | Flagged | Tools used |",
"| --- | --- | ---: | ---: | ---: | --- | --- | --- |",
);
for (const row of report.rows) {
lines.push(
`| ${row.scenarioId} | ${row.usageSource} | ${row.pi.inputTokens}/${row.pi.outputTokens}/${row.pi.totalTokens}/${row.pi.toolCallCount} | ${row.codex.inputTokens}/${row.codex.outputTokens}/${row.codex.totalTokens}/${row.codex.toolCallCount} | ${formatPercent(row.deltaPercent)} | ${row.classification} | ${row.flagged ? "yes" : "no"} | ${row.toolsUsed.join(", ")} |`,
);
}
lines.push("");
}
if (report.failures.length > 0) {
lines.push("## Gate Failures", "");
for (const failure of report.failures) {
lines.push(`- ${failure}`);
}
lines.push("");
}
lines.push("## Notes", "");
for (const note of report.notes) {
lines.push(`- ${note}`);
}
lines.push("");
return lines.join("\n");
}
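The per-row arithmetic in `buildRow` above can be checked by hand with the figures from the tests. The sketch below is a reduced restatement under stated assumptions: `classifyRow` is a hypothetical name, and it takes raw totals instead of `RuntimeParityCell` values. The formula, the 15% default threshold, and the rule that only live usage can be flagged are taken from the diff.

```typescript
type UsageSource = "live-usage" | "mock-estimate";
type Classification = "regression" | "savings" | "neutral";

// Relative change of the Codex total against the Pi baseline, in percent.
function deltaPercent(piTotalTokens: number, codexTotalTokens: number): number {
  if (piTotalTokens === 0) {
    return codexTotalTokens === 0 ? 0 : 100;
  }
  return ((codexTotalTokens - piTotalTokens) / piTotalTokens) * 100;
}

// Hypothetical reduced form of buildRow: classification is symmetric around the
// threshold, while the flag that fails the gate applies to live usage only.
function classifyRow(
  piTotalTokens: number,
  codexTotalTokens: number,
  usageSource: UsageSource,
  thresholdPercent = 15,
): { delta: number; classification: Classification; flagged: boolean } {
  const delta = deltaPercent(piTotalTokens, codexTotalTokens);
  const classification: Classification =
    delta > thresholdPercent ? "regression" : delta < -thresholdPercent ? "savings" : "neutral";
  const flagged = usageSource === "live-usage" && delta > thresholdPercent;
  return { delta, classification, flagged };
}

// Live fs.read scenario from the tests: (119489 - 72381) / 72381 is about 0.651.
const live = classifyRow(72381, 119489, "live-usage");
console.log(live.delta.toFixed(1), live.classification, live.flagged); // prints 65.1 regression true

// The same kind of increase under a mock provider is labeled but never flagged.
const mock = classifyRow(100, 130, "mock-estimate");
console.log(mock.classification, mock.flagged); // prints regression false
```

Keeping `classification` and `flagged` separate is what lets mock runs surface a regression label in the report without blocking the gate.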

@@ -223,6 +223,192 @@ describe("qa tool coverage report", () => {
);
});
it("passes required OpenClaw dynamic tool coverage when both runtimes exercise the tool", () => {
const report = buildQaToolCoverageReport({
scenarios: [
makeScenario("tool-web-search", "web-search", {
toolName: "web_search",
toolCoverage: {
bucket: "openclaw-dynamic-integration",
expectedLayer: "openclaw-dynamic",
capabilityLayer: "openclaw-dynamic-direct",
required: true,
},
}),
],
summary: {
scenarios: [
{
name: "tool web_search",
status: "pass",
runtimeParity: {
scenarioId: "tool-web-search",
drift: "tool-result-shape",
driftDetails: "runtime envelopes differ",
cells: {
pi: {
runtime: "pi",
transcriptBytes: "",
toolCalls: [{ tool: "web_search", argsHash: "a", resultHash: "r1" }],
finalText: "",
usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
wallClockMs: 1,
bootStateLines: [],
},
codex: {
runtime: "codex",
transcriptBytes: "",
toolCalls: [{ tool: "web_search", argsHash: "a", resultHash: "r2" }],
finalText: "",
usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
wallClockMs: 1,
bootStateLines: [],
},
},
},
},
],
},
generatedAt: "2026-05-10T00:00:00.000Z",
});
expect(report.pass).toBe(true);
expect(report.failures).toEqual([]);
expect(report.passingTools).toBe(1);
});
it("fails required OpenClaw dynamic tool coverage when a runtime skips the tool", () => {
const report = buildQaToolCoverageReport({
scenarios: [
makeScenario("tool-web-search", "web-search", {
toolName: "web_search",
toolCoverage: {
bucket: "openclaw-dynamic-integration",
expectedLayer: "openclaw-dynamic",
capabilityLayer: "openclaw-dynamic-direct",
required: true,
},
}),
],
summary: {
scenarios: [
{
name: "tool web_search",
status: "fail",
runtimeParity: {
scenarioId: "tool-web-search",
drift: "tool-call-shape",
driftDetails: "Codex emitted no web_search call",
cells: {
pi: {
runtime: "pi",
transcriptBytes: "",
toolCalls: [{ tool: "web_search", argsHash: "a", resultHash: "r" }],
finalText: "",
usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
wallClockMs: 1,
bootStateLines: [],
},
codex: {
runtime: "codex",
transcriptBytes: "",
toolCalls: [],
finalText: "",
usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
wallClockMs: 1,
bootStateLines: [],
},
},
},
},
],
},
generatedAt: "2026-05-10T00:00:00.000Z",
});
expect(report.pass).toBe(false);
expect(report.failures).toEqual([
"web-search missing codex tool call web_search",
]);
});
it("fails required OpenClaw dynamic tool coverage when the fixture failure mode is preserved", () => {
const report = buildQaToolCoverageReport({
scenarios: [
makeScenario("tool-web-search", "web-search", {
toolName: "web_search",
toolCoverage: {
bucket: "openclaw-dynamic-integration",
expectedLayer: "openclaw-dynamic",
capabilityLayer: "openclaw-dynamic-direct",
required: true,
},
}),
],
summary: {
scenarios: [
{
name: "tool web_search",
status: "fail",
runtimeParity: {
scenarioId: "tool-web-search",
drift: "failure-mode",
driftDetails: "at least one runtime failed",
cells: {
pi: {
runtime: "pi",
transcriptBytes: "",
toolCalls: [{ tool: "web_search", argsHash: "a", resultHash: "r" }],
finalText: "",
usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
wallClockMs: 1,
bootStateLines: [],
},
codex: {
runtime: "codex",
transcriptBytes: "",
toolCalls: [{ tool: "web_search", argsHash: "a", resultHash: "r" }],
finalText: "",
usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
wallClockMs: 1,
bootStateLines: [],
},
},
},
},
],
},
generatedAt: "2026-05-10T00:00:00.000Z",
});
expect(report.pass).toBe(false);
expect(report.failures).toEqual([
"web-search drift=failure-mode (at least one runtime failed)",
]);
});
it("fails untracked required tools missing from an evaluated summary", () => {
const report = buildQaToolCoverageReport({
scenarios: [
makeScenario("tool-web-search", "web-search", {
toolCoverage: {
bucket: "openclaw-dynamic-integration",
expectedLayer: "openclaw-dynamic",
capabilityLayer: "openclaw-dynamic-direct",
required: true,
},
}),
],
summary: {
scenarios: [],
},
generatedAt: "2026-05-10T00:00:00.000Z",
});
expect(report.pass).toBe(false);
expect(report.failures).toEqual(["web-search drift=not-run"]);
});
it("rejects unknown runtime tool coverage buckets", () => {
expect(() =>
buildQaToolCoverageReport({
@@ -301,5 +487,13 @@ describe("qa tool coverage report", () => {
"#80173 Tavily tools are listed in the phase matrix but are not exposed by the current default tool surface.",
}),
);
expect(report.rows.find((row) => row.tool === "web-search")).toEqual(
expect.objectContaining({
bucket: "openclaw-dynamic-integration",
capabilityLayer: "openclaw-dynamic-direct",
required: true,
}),
);
expect(report.rows.find((row) => row.tool === "web-search")?.tracking).toBeUndefined();
});
});

@@ -31,6 +31,7 @@ export type QaToolCoverageBucket = QaRuntimeToolBucket;
export type QaToolCoverageRow = {
tool: string;
runtimeToolName?: string;
bucket: QaToolCoverageBucket;
expectedLayer: QaRuntimeToolExpectedLayer;
capabilityLayer: QaRuntimeCapabilityLayer;
@@ -41,6 +42,8 @@ export type QaToolCoverageRow = {
pi: QaToolCoverageStatus;
codex: QaToolCoverageStatus;
drift: QaToolCoverageDrift;
piToolCalls: number;
codexToolCalls: number;
tracking?: string;
codexDefaultImpact?: string;
qaImpact?: string;
@@ -71,7 +74,7 @@ type ToolFixtureGroup = {
scenarios: QaSeedScenarioWithSource[];
};
-const PASSING_DRIFTS: ReadonlySet<QaToolCoverageDrift> = new Set(["none", "text-only", "not-run"]);
+const PASSING_DRIFTS: ReadonlySet<QaToolCoverageDrift> = new Set(["none", "text-only"]);
function isRecord(value: unknown): value is Record<string, unknown> {
return Boolean(value) && typeof value === "object" && !Array.isArray(value);
@@ -146,6 +149,12 @@ function readScenarioTracking(scenario: QaSeedScenarioWithSource): string | unde
return issue;
}
function readScenarioRuntimeToolName(scenario: QaSeedScenarioWithSource): string | undefined {
const config = scenario.execution.config;
const toolCoverage = isRecord(config?.toolCoverage) ? config.toolCoverage : undefined;
return readString(toolCoverage?.actualTool) ?? readString(config?.toolName);
}
function summaryByScenarioId(
summary: QaToolCoverageSuiteSummary | undefined,
): Map<string, RuntimeParityResult> {
@@ -173,6 +182,21 @@ function mergeScenarioResults(
return failingResult;
}
function isPassingToolCoverageDrift(drift: QaToolCoverageDrift, evaluated: boolean) {
return PASSING_DRIFTS.has(drift) || (!evaluated && drift === "not-run");
}
function countRuntimeToolCalls(
result: RuntimeParityResult | undefined,
runtime: RuntimeId,
toolName: string | undefined,
) {
if (!result || !toolName) {
return 0;
}
return result.cells[runtime].toolCalls.filter((call) => call.tool === toolName).length;
}
function buildRow(params: {
group: ToolFixtureGroup;
results: ReadonlyMap<string, RuntimeParityResult>;
@@ -184,8 +208,12 @@ function buildRow(params: {
.find((entry) => entry.required);
const fallbackMetadata = readScenarioRuntimeToolCoverageMetadata(params.group.scenarios[0]);
const rowMetadata = metadata ?? fallbackMetadata;
const runtimeToolName = params.group.scenarios
.map(readScenarioRuntimeToolName)
.find(Boolean);
return {
tool: params.group.tool,
...(runtimeToolName ? { runtimeToolName } : {}),
bucket: rowMetadata.bucket,
expectedLayer: rowMetadata.expectedLayer,
capabilityLayer: rowMetadata.capabilityLayer,
@@ -196,6 +224,8 @@ function buildRow(params: {
pi: result ? cellStatus(result.cells.pi) : "not-run",
codex: result ? cellStatus(result.cells.codex) : "not-run",
drift: result?.drift ?? "not-run",
piToolCalls: countRuntimeToolCalls(result, "pi", runtimeToolName),
codexToolCalls: countRuntimeToolCalls(result, "codex", runtimeToolName),
...(tracking ? { tracking } : {}),
...(rowMetadata.codexDefaultImpact
? { codexDefaultImpact: rowMetadata.codexDefaultImpact }
@@ -206,6 +236,28 @@ function buildRow(params: {
};
}
function coverageFailureForRow(row: QaToolCoverageRow): string | undefined {
if (!row.required || row.tracking) {
return undefined;
}
if (row.drift === "not-run") {
return `${row.tool} drift=not-run`;
}
if (row.pi !== "pass" || row.codex !== "pass") {
return `${row.tool} status pi=${row.pi} codex=${row.codex}`;
}
if (row.drift === "failure-mode") {
return `${row.tool} drift=failure-mode${row.details ? ` (${row.details})` : ""}`;
}
if (row.runtimeToolName && row.piToolCalls === 0) {
return `${row.tool} missing pi tool call ${row.runtimeToolName}`;
}
if (row.runtimeToolName && row.codexToolCalls === 0) {
return `${row.tool} missing codex tool call ${row.runtimeToolName}`;
}
return undefined;
}
export function buildQaToolCoverageReport(params: {
scenarios: readonly QaSeedScenarioWithSource[];
summary?: QaToolCoverageSuiteSummary;
@@ -221,9 +273,7 @@ export function buildQaToolCoverageReport(params: {
);
const evaluated = Boolean(params.summary);
const failures = evaluated
-? rows
-.filter((row) => row.required && !row.tracking && !PASSING_DRIFTS.has(row.drift))
-.map((row) => `${row.tool} drift=${row.drift}${row.details ? ` (${row.details})` : ""}`)
+? rows.map(coverageFailureForRow).filter((failure): failure is string => Boolean(failure))
: [];
return {
runtimePair: normalizeRuntimePair(params.runtimePair ?? params.summary?.run?.runtimePair),
@@ -237,7 +287,15 @@ export function buildQaToolCoverageReport(params: {
dynamicIntegrationTools: rows.filter((row) => row.bucket === "openclaw-dynamic-integration")
.length,
optionalTools: rows.filter((row) => row.bucket === "optional-profile-or-plugin").length,
-passingTools: evaluated ? rows.filter((row) => PASSING_DRIFTS.has(row.drift)).length : 0,
+passingTools: evaluated
+? rows.filter(
+(row) =>
+!row.tracking &&
+row.pi === "pass" &&
+row.codex === "pass" &&
+(isPassingToolCoverageDrift(row.drift, true) || !coverageFailureForRow(row)),
+).length
+: 0,
failingTools: failures.length,
rows,
pass: failures.length === 0,
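The order of checks in `coverageFailureForRow` determines which message a failing row reports, so it helps to run it against a few rows. The sketch below copies the function body from the diff onto a simplified `Row` type. That type and the `Status` union are assumptions for illustration: the real `QaToolCoverageRow` has more fields, and only the `"pass"` and `"not-run"` status values are visible in this diff.

```typescript
// Assumed, simplified shapes; the real QaToolCoverageRow carries more fields.
type Status = "pass" | "fail" | "not-run";
type Row = {
  tool: string;
  runtimeToolName?: string;
  required: boolean;
  tracking?: string;
  pi: Status;
  codex: Status;
  drift: string;
  details?: string;
  piToolCalls: number;
  codexToolCalls: number;
};

// Body as in the diff: optional or tracked rows never fail; otherwise the first matching check wins.
function coverageFailureForRow(row: Row): string | undefined {
  if (!row.required || row.tracking) {
    return undefined;
  }
  if (row.drift === "not-run") {
    return `${row.tool} drift=not-run`;
  }
  if (row.pi !== "pass" || row.codex !== "pass") {
    return `${row.tool} status pi=${row.pi} codex=${row.codex}`;
  }
  if (row.drift === "failure-mode") {
    return `${row.tool} drift=failure-mode${row.details ? ` (${row.details})` : ""}`;
  }
  if (row.runtimeToolName && row.piToolCalls === 0) {
    return `${row.tool} missing pi tool call ${row.runtimeToolName}`;
  }
  if (row.runtimeToolName && row.codexToolCalls === 0) {
    return `${row.tool} missing codex tool call ${row.runtimeToolName}`;
  }
  return undefined;
}

const base: Row = {
  tool: "web-search",
  runtimeToolName: "web_search",
  required: true,
  pi: "pass",
  codex: "pass",
  drift: "none",
  piToolCalls: 1,
  codexToolCalls: 1,
};

// Codex never called the tool: reported as a missing call rather than by drift name.
console.log(coverageFailureForRow({ ...base, drift: "tool-call-shape", codexToolCalls: 0 }));
// prints web-search missing codex tool call web_search

// A required, untracked tool absent from an evaluated summary now fails.
console.log(coverageFailureForRow({ ...base, drift: "not-run", pi: "not-run", codex: "not-run" }));
// prints web-search drift=not-run

// A tracking reference (value here is illustrative) exempts the row from the gate.
console.log(coverageFailureForRow({ ...base, tracking: "#80173", codexToolCalls: 0 }));
// prints undefined
```

Note that a drift such as `tool-result-shape` does not fail a required row as long as both runtimes pass and both emit the tool call, which is the case covered by the first new test in the coverage test file.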

@@ -246,6 +246,44 @@ describe("createTelegramBot fetch abort", () => {
vi.useRealTimers();
});
it("retries Telegram 421 responses after forcing transport fallback", async () => {
const forceFallback = vi.fn(() => true);
const fetchSpy = vi
.fn()
.mockResolvedValueOnce(new Response("Misdirected Request", { status: 421 }))
.mockResolvedValueOnce(new Response("{}", { status: 200 }));
const { clientFetch } = createWrappedTelegramClientFetchWithTransport({
fetch: fetchSpy as typeof fetch,
forceFallback,
});
const result = await clientFetch("https://api.telegram.org/bot123456:ABC/sendMessage");
expect(result).toBeInstanceOf(Response);
expect((result as Response).status).toBe(200);
expect(forceFallback).toHaveBeenCalledWith("misdirected-request");
expect(fetchSpy).toHaveBeenCalledTimes(2);
});
it("retries Telegram 421 fetch errors after forcing transport fallback", async () => {
const forceFallback = vi.fn(() => true);
const fetchSpy = vi
.fn()
.mockRejectedValueOnce(Object.assign(new Error("421 Misdirected Request"), { status: 421 }))
.mockResolvedValueOnce(new Response("{}", { status: 200 }));
const { clientFetch } = createWrappedTelegramClientFetchWithTransport({
fetch: fetchSpy as typeof fetch,
forceFallback,
});
const result = await clientFetch("https://api.telegram.org/bot123456:ABC/sendMessage");
expect(result).toBeInstanceOf(Response);
expect((result as Response).status).toBe(200);
expect(forceFallback).toHaveBeenCalledWith("misdirected-request");
expect(fetchSpy).toHaveBeenCalledTimes(2);
});
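
The two tests above exercise one retry after a 421, whether it arrives as a response or as a thrown error. A minimal sketch of that control flow follows, assuming a simplified response shape. `fetchWithMisdirectedRetry`, `StatusResponse`, and `hasStatus421` are hypothetical names for illustration and are not part of the repository. Unlike the wrapped client in this diff, the sketch resolves the first attempt before deciding, so it never makes more than two calls.

```typescript
// Minimal response shape so the sketch does not depend on Fetch API types.
type StatusResponse = { status: number };

// Hypothetical check: only looks at a numeric `status` on the thrown value.
function hasStatus421(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 421;
}

// Hypothetical helper: run once, and retry a single time when the first
// attempt was a 421 and the transport agrees to switch to its fallback path.
async function fetchWithMisdirectedRetry<T extends StatusResponse>(
  runFetch: () => Promise<T>,
  forceFallback: (reason: string) => boolean,
): Promise<T> {
  let outcome: { ok: true; response: T } | { ok: false; error: unknown };
  try {
    outcome = { ok: true, response: await runFetch() };
  } catch (error) {
    outcome = { ok: false, error };
  }
  const misdirected = outcome.ok ? outcome.response.status === 421 : hasStatus421(outcome.error);
  if (misdirected && forceFallback("misdirected-request")) {
    // Single retry; whatever it returns or throws is final.
    return await runFetch();
  }
  if (outcome.ok) {
    return outcome.response;
  }
  throw outcome.error;
}
```

If `forceFallback` returns `false`, the first outcome is passed through unchanged, so a caller still sees the original 421 response or error.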
it("preserves the original fetch error when tagging cannot attach metadata", async () => {
const frozenError = Object.freeze(
Object.assign(new TypeError("fetch failed"), {

@@ -1,7 +1,7 @@
import type { ApiClientOptions } from "grammy";
import { normalizeOptionalLowercaseString } from "openclaw/plugin-sdk/string-coerce-runtime";
import type { TelegramTransport } from "./fetch.js";
import { tagTelegramNetworkError } from "./network-errors.js";
import { isTelegramMisdirectedRequestError, tagTelegramNetworkError } from "./network-errors.js";
import { resolveTelegramRequestTimeoutMs } from "./request-timeouts.js";
type TelegramFetchInput = Parameters<NonNullable<ApiClientOptions["fetch"]>>[0];
@@ -135,6 +135,11 @@ export function createTelegramClientFetch(params: {
: undefined;
const requestSignal = isTelegramAbortSignalLike(init?.signal) ? init.signal : undefined;
const canForceTransportFallback = (reason: string) =>
!shutdownSignal?.aborted &&
!requestSignal?.aborted &&
params.transport?.forceFallback?.(reason) === true;
const runFetch = async () => {
const controller = new AbortController();
const abortWith = (signal: Pick<TelegramAbortSignalLike, "reason">) =>
@@ -195,14 +200,22 @@ export function createTelegramClientFetch(params: {
};
try {
return await runFetch();
const response = await runFetch();
if (response.status === 421 && canForceTransportFallback("misdirected-request")) {
return await runFetch();
}
return response;
} catch (err) {
if (
requestTimeoutMs &&
shouldRetryTimedOutTelegramControlRequest(method) &&
!shutdownSignal?.aborted &&
!requestSignal?.aborted &&
params.transport?.forceFallback?.("request-timeout")
canForceTransportFallback("request-timeout")
) {
return await runFetch();
}
if (
isTelegramMisdirectedRequestError(err) &&
canForceTransportFallback("misdirected-request")
) {
return await runFetch();
}

@@ -145,27 +145,32 @@ describe("createTelegramDraftStream", () => {
}
});
it("does not retry DM message preview sends without the topic id", async () => {
const api = createMockDraftApi();
api.sendMessage.mockRejectedValueOnce(new Error("400: Bad Request: message thread not found"));
const warn = vi.fn();
const stream = createDraftStream(api, {
thread: { id: 42, scope: "dm" },
warn,
});
it.each(["forum", "dm"] as const)(
"does not retry %s message preview sends without the topic id",
async (scope) => {
const api = createMockDraftApi();
api.sendMessage.mockRejectedValueOnce(
new Error("400: Bad Request: message thread not found"),
);
const warn = vi.fn();
const stream = createDraftStream(api, {
thread: { id: 42, scope },
warn,
});
stream.update("Hello");
await stream.flush();
stream.update("Hello");
await stream.flush();
expect(api.sendMessage).toHaveBeenCalledTimes(1);
expect(api.sendMessage).toHaveBeenCalledWith(123, "Hello", { message_thread_id: 42 });
expect(warn).toHaveBeenCalledWith(
"telegram stream preview failed: 400: Bad Request: message thread not found",
);
expect(
warn.mock.calls.some(([message]) => String(message).includes("retrying without thread")),
).toBe(false);
});
expect(api.sendMessage).toHaveBeenCalledTimes(1);
expect(api.sendMessage).toHaveBeenCalledWith(123, "Hello", { message_thread_id: 42 });
expect(warn).toHaveBeenCalledWith(
"telegram stream preview failed: 400: Bad Request: message thread not found",
);
expect(
warn.mock.calls.some(([message]) => String(message).includes("retrying without thread")),
).toBe(false);
},
);
it("keeps allow_sending_without_reply on message previews that target a reply", async () => {
const api = createMockDraftApi();

@@ -10,19 +10,6 @@ import { normalizeTelegramReplyToMessageId } from "./outbound-params.js";
const TELEGRAM_STREAM_MAX_CHARS = 4096;
const DEFAULT_THROTTLE_MS = 1000;
const THREAD_NOT_FOUND_RE = /400:\s*Bad Request:\s*message thread not found/i;
type TelegramSendMessageParams = Parameters<Bot["api"]["sendMessage"]>[2];
function hasNumericMessageThreadId(
params: TelegramSendMessageParams | undefined,
): params is TelegramSendMessageParams & { message_thread_id: number } {
return (
typeof params === "object" &&
params !== null &&
typeof (params as { message_thread_id?: unknown }).message_thread_id === "number"
);
}
export type TelegramDraftStream = {
update: (text: string) => void;
@@ -109,7 +96,6 @@ export function createTelegramDraftStream(params: {
const minInitialChars = params.minInitialChars;
const chatId = params.chatId;
const threadParams = buildTelegramThreadParams(params.thread);
const allowThreadlessRetry = params.thread?.scope !== "dm";
const replyToMessageId = normalizeTelegramReplyToMessageId(params.replyToMessageId);
const replyParams =
replyToMessageId != null
@@ -136,10 +122,9 @@ export function createTelegramDraftStream(params: {
renderedParseMode: "HTML" | undefined;
sendGeneration: number;
};
const sendRenderedMessageWithThreadFallback = async (sendArgs: {
const sendRenderedMessage = async (sendArgs: {
renderedText: string;
renderedParseMode: "HTML" | undefined;
fallbackWarnMessage: string;
}) => {
const sendParams = sendArgs.renderedParseMode
? {
@@ -147,28 +132,7 @@ export function createTelegramDraftStream(params: {
parse_mode: sendArgs.renderedParseMode,
}
: replyParams;
const usedThreadParams = hasNumericMessageThreadId(sendParams);
try {
return {
sent: await params.api.sendMessage(chatId, sendArgs.renderedText, sendParams),
usedThreadParams,
};
} catch (err) {
if (!allowThreadlessRetry || !usedThreadParams || !THREAD_NOT_FOUND_RE.test(String(err))) {
throw err;
}
const threadlessParams: TelegramSendMessageParams = { ...sendParams };
delete threadlessParams.message_thread_id;
params.warn?.(sendArgs.fallbackWarnMessage);
return {
sent: await params.api.sendMessage(
chatId,
sendArgs.renderedText,
Object.keys(threadlessParams).length > 0 ? threadlessParams : undefined,
),
usedThreadParams: false,
};
}
return await params.api.sendMessage(chatId, sendArgs.renderedText, sendParams);
};
const sendMessageTransportPreview = async ({
renderedText,
@@ -187,14 +151,12 @@ export function createTelegramDraftStream(params: {
return true;
}
messageSendAttempted = true;
let sent: Awaited<ReturnType<typeof sendRenderedMessageWithThreadFallback>>["sent"];
let sent: Awaited<ReturnType<typeof sendRenderedMessage>>;
try {
({ sent } = await sendRenderedMessageWithThreadFallback({
sent = await sendRenderedMessage({
renderedText,
renderedParseMode,
fallbackWarnMessage:
"telegram stream preview send failed with message_thread_id, retrying without thread",
}));
});
} catch (err) {
if (isSafeToRetrySendError(err) || isTelegramClientRejection(err)) {
messageSendAttempted = false;

@@ -253,6 +253,36 @@ describe("isSafeToRetrySendError", () => {
);
expect(isSafeToRetrySendError(wrapped)).toBe(false);
});
it.each([
["status", Object.assign(new Error("Misdirected Request"), { status: 421 })],
["statusCode", Object.assign(new Error("Misdirected Request"), { statusCode: "421" })],
["error_code", errorWithTelegramCode("Misdirected Request", 421)],
["message", new Error("421 Misdirected Request")],
[
"nested cause",
Object.assign(new Error("Network request for 'sendMessage' failed!"), {
cause: Object.assign(new Error("Misdirected Request"), { status: 421 }),
}),
],
[
"grammY HttpError",
new MockHttpError(
"Network request for 'sendMessage' failed!",
Object.assign(new Error("Misdirected Request"), { status: 421 }),
),
],
])("treats Telegram 421 Misdirected Request as safe to retry via %s", (_name, err) => {
expect(isSafeToRetrySendError(err)).toBe(true);
});
it("does not parse malformed status strings as Telegram 421", () => {
expect(
isSafeToRetrySendError(
Object.assign(new Error("Misdirected Request"), { statusCode: "421abc" }),
),
).toBe(false);
});
});
describe("isTelegramServerError", () => {

@@ -103,6 +103,40 @@ function getErrorCode(err: unknown): string | undefined {
return undefined;
}
function getNumericHttpStatus(err: unknown): number | undefined {
if (!err || typeof err !== "object") {
return undefined;
}
const candidate = err as { error_code?: unknown; status?: unknown; statusCode?: unknown };
for (const value of [candidate.error_code, candidate.status, candidate.statusCode]) {
if (typeof value === "number" && Number.isFinite(value)) {
return value;
}
if (typeof value === "string") {
const trimmed = value.trim();
if (/^\d+$/.test(trimmed)) {
return Number.parseInt(trimmed, 10);
}
}
}
return undefined;
}
export function isTelegramMisdirectedRequestError(err: unknown): boolean {
for (const candidate of collectTelegramErrorCandidates(err)) {
const code = normalizeCode(getErrorCode(candidate));
if (code === "421" || getNumericHttpStatus(candidate) === 421) {
return true;
}
const message = normalizeLowercaseStringOrEmpty(formatErrorMessage(candidate));
if (/\b421\b/.test(message) && message.includes("misdirected request")) {
return true;
}
}
return false;
}
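
The strict status parsing above can be tried in isolation. The following is a stand-alone sketch, not the repository's helper. `parseStrictStatus` and `looksLikeMisdirectedRequest` are hypothetical names, and walking the `cause` chain is an assumed stand-in for `collectTelegramErrorCandidates`, whose implementation is not shown in this diff. It also checks each status field for 421 rather than returning the first parseable one, which is a simplification.

```typescript
// Hypothetical stand-alone version of the strict status parser.
function parseStrictStatus(value: unknown): number | undefined {
  if (typeof value === "number" && Number.isFinite(value)) {
    return value;
  }
  if (typeof value === "string") {
    const trimmed = value.trim();
    // Only all-digit strings count; Number.parseInt alone would accept "421abc".
    if (/^\d+$/.test(trimmed)) {
      return Number.parseInt(trimmed, 10);
    }
  }
  return undefined;
}

type ErrorLike = {
  error_code?: unknown;
  status?: unknown;
  statusCode?: unknown;
  message?: unknown;
  cause?: unknown;
};

// Hypothetical detector: walks the error and its `cause` chain (assumption),
// bounded by a depth limit so a cyclic chain cannot loop forever.
function looksLikeMisdirectedRequest(err: unknown): boolean {
  let current: unknown = err;
  for (let depth = 0; depth < 8 && current && typeof current === "object"; depth += 1) {
    const candidate = current as ErrorLike;
    for (const value of [candidate.error_code, candidate.status, candidate.statusCode]) {
      if (parseStrictStatus(value) === 421) {
        return true;
      }
    }
    const message = typeof candidate.message === "string" ? candidate.message.toLowerCase() : "";
    // The message path needs both the bare number and the reason phrase.
    if (/\b421\b/.test(message) && message.includes("misdirected request")) {
      return true;
    }
    current = candidate.cause;
  }
  return false;
}
```

Requiring an all-digit string is what keeps a malformed value such as `"421abc"` from being read as 421.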
export type TelegramNetworkErrorContext = "polling" | "send" | "webhook" | "unknown";
export type TelegramNetworkErrorOrigin = {
method?: string | null;
@@ -162,6 +196,9 @@ export function isSafeToRetrySendError(err: unknown): boolean {
if (!err) {
return false;
}
if (isTelegramMisdirectedRequestError(err)) {
return true;
}
for (const candidate of collectTelegramErrorCandidates(err)) {
const code = normalizeCode(getErrorCode(candidate));
if (code && PRE_CONNECT_ERROR_CODES.has(code)) {

@@ -27,8 +27,8 @@ export function resolveTelegramSendThreadSpec(params: {
if (messageThreadId == null) {
return undefined;
}
// Telegram supports DM topics; keep direct chat thread IDs and rely on
// thread-not-found retry fallback when a plain DM rejects them.
// Telegram supports DM topics; keep direct chat thread IDs and let invalid
// topics fail closed instead of sending to the base chat.
return {
id: messageThreadId,
scope: params.chatType === "direct" ? "dm" : "forum",

@@ -1828,49 +1828,30 @@ describe("sendMessageTelegram", () => {
}
});
it("retries sends without message_thread_id on thread-not-found", async () => {
const cases = [
{ name: "forum", chatId: "-100123", text: "hello forum", messageId: 58 },
] as const;
it("fails topic sends instead of retrying without message_thread_id", async () => {
const cases = [{ name: "forum", chatId: "-100123", text: "hello forum" }] as const;
const threadErr = new Error("400: Bad Request: message thread not found");
for (const testCase of cases) {
const sendMessage = vi
.fn()
.mockRejectedValueOnce(threadErr)
.mockResolvedValueOnce({
message_id: testCase.messageId,
chat: { id: testCase.chatId },
});
const sendMessage = vi.fn().mockRejectedValueOnce(threadErr);
const api = { sendMessage } as unknown as {
sendMessage: typeof sendMessage;
};
const res = await sendMessageTelegram(testCase.chatId, testCase.text, {
cfg: TELEGRAM_TEST_CFG,
token: "tok",
api,
messageThreadId: 271,
});
await expect(
sendMessageTelegram(testCase.chatId, testCase.text, {
cfg: TELEGRAM_TEST_CFG,
token: "tok",
api,
messageThreadId: 271,
}),
).rejects.toThrow("message thread not found");
expect(sendMessage, testCase.name).toHaveBeenNthCalledWith(
1,
testCase.chatId,
testCase.text,
{
parse_mode: "HTML",
message_thread_id: 271,
},
);
expect(sendMessage, testCase.name).toHaveBeenNthCalledWith(
2,
testCase.chatId,
testCase.text,
{
parse_mode: "HTML",
},
);
expect(res.messageId, testCase.name).toBe(String(testCase.messageId));
expect(sendMessage, testCase.name).toHaveBeenCalledTimes(1);
expect(sendMessage, testCase.name).toHaveBeenCalledWith(testCase.chatId, testCase.text, {
parse_mode: "HTML",
message_thread_id: 271,
});
}
});
@@ -2052,40 +2033,32 @@ describe("sendMessageTelegram", () => {
expect(logs).not.toContain(body);
});
it("logs threadless outbound text delivery after missing-thread fallback", async () => {
it("does not log outbound success when topic text send fails thread lookup", async () => {
const logFile = captureInfoLogs();
const chatId = "-1001234567890";
const body = "fallback reply body should stay private";
const body = "topic reply body should stay private";
const threadErr = new Error("400: Bad Request: message thread not found");
const sendMessage = vi
.fn()
.mockRejectedValueOnce(threadErr)
.mockResolvedValueOnce({
message_id: 322,
chat: { id: chatId },
});
const sendMessage = vi.fn().mockRejectedValueOnce(threadErr);
const api = { sendMessage } as unknown as {
sendMessage: typeof sendMessage;
};
await sendMessageTelegram(`telegram:group:${chatId}:topic:271`, body, {
cfg: TELEGRAM_TEST_CFG,
token: "tok",
accountId: "ops",
api,
});
await expect(
sendMessageTelegram(`telegram:group:${chatId}:topic:271`, body, {
cfg: TELEGRAM_TEST_CFG,
token: "tok",
accountId: "ops",
api,
}),
).rejects.toThrow("message thread not found");
expect(sendMessage).toHaveBeenNthCalledWith(1, chatId, body, {
expect(sendMessage).toHaveBeenCalledTimes(1);
expect(sendMessage).toHaveBeenCalledWith(chatId, body, {
parse_mode: "HTML",
message_thread_id: 271,
});
expect(sendMessage).toHaveBeenNthCalledWith(2, chatId, body, {
parse_mode: "HTML",
});
const logs = capturedLogText(logFile);
expect(logs).toContain("outbound send ok");
expect(logs).toContain("messageId=322");
expect(logs).not.toContain("threadId=271");
expect(logs).not.toContain("outbound send ok");
expect(logs).not.toContain(body);
});
@@ -2161,17 +2134,11 @@ describe("sendMessageTelegram", () => {
expect(logs).not.toContain(body);
});
it("retries media sends without message_thread_id when thread is missing", async () => {
it("fails media sends instead of retrying without message_thread_id", async () => {
const logFile = captureInfoLogs();
const chatId = "-100123";
const threadErr = new Error("400: Bad Request: message thread not found");
const sendPhoto = vi
.fn()
.mockRejectedValueOnce(threadErr)
.mockResolvedValueOnce({
message_id: 59,
chat: { id: chatId },
});
const sendPhoto = vi.fn().mockRejectedValueOnce(threadErr);
const api = { sendPhoto } as unknown as {
sendPhoto: typeof sendPhoto;
};
@@ -2182,14 +2149,17 @@ describe("sendMessageTelegram", () => {
fileName: "photo.jpg",
});
const res = await sendMessageTelegram(chatId, "photo", {
cfg: TELEGRAM_TEST_CFG,
token: "tok",
api,
mediaUrl: "https://example.com/photo.jpg",
messageThreadId: 271,
});
await expect(
sendMessageTelegram(chatId, "photo", {
cfg: TELEGRAM_TEST_CFG,
token: "tok",
api,
mediaUrl: "https://example.com/photo.jpg",
messageThreadId: 271,
}),
).rejects.toThrow("message thread not found");
expect(sendPhoto).toHaveBeenCalledTimes(1);
expectMediaSendCall(
firstMockCall(sendPhoto, "first send photo call"),
"first send photo call",
@@ -2200,20 +2170,8 @@ describe("sendMessageTelegram", () => {
message_thread_id: 271,
},
);
expectMediaSendCall(
mockCall(sendPhoto, 1, "second send photo call"),
"second send photo call",
chatId,
{
caption: "photo",
parse_mode: "HTML",
},
);
expect(res.messageId).toBe("59");
const logs = capturedLogText(logFile);
expect(logs).toContain("outbound send ok");
expect(logs).toContain("messageId=59");
expect(logs).not.toContain("threadId=271");
expect(logs).not.toContain("outbound send ok");
});
it("defaults outbound media uploads to 100MB", async () => {
@@ -2612,32 +2570,27 @@ describe("sendStickerTelegram", () => {
}
});
it("retries sticker sends without message_thread_id when thread is missing", async () => {
it("fails sticker sends instead of retrying without message_thread_id", async () => {
const chatId = "-100123";
const threadErr = new Error("400: Bad Request: message thread not found");
const sendSticker = vi
.fn()
.mockRejectedValueOnce(threadErr)
.mockResolvedValueOnce({
message_id: 109,
chat: { id: chatId },
});
const sendSticker = vi.fn().mockRejectedValueOnce(threadErr);
const api = { sendSticker } as unknown as {
sendSticker: typeof sendSticker;
};
const res = await sendStickerTelegram(chatId, "fileId123", {
cfg: TELEGRAM_TEST_CFG,
token: "tok",
api,
messageThreadId: 271,
});
await expect(
sendStickerTelegram(chatId, "fileId123", {
cfg: TELEGRAM_TEST_CFG,
token: "tok",
api,
messageThreadId: 271,
}),
).rejects.toThrow("message thread not found");
expect(sendSticker).toHaveBeenNthCalledWith(1, chatId, "fileId123", {
expect(sendSticker).toHaveBeenCalledTimes(1);
expect(sendSticker).toHaveBeenCalledWith(chatId, "fileId123", {
message_thread_id: 271,
});
expect(sendSticker).toHaveBeenNthCalledWith(2, chatId, "fileId123", undefined);
expect(res.messageId).toBe("109");
});
it("fails when sticker send returns no message_id", async () => {
@@ -3110,40 +3063,31 @@ describe("sendPollTelegram", () => {
expect(requireRecord(sendPollCall[3], "send poll params").open_period).toBe(60);
});
it("retries without message_thread_id on thread-not-found", async () => {
it("fails poll sends instead of retrying without message_thread_id", async () => {
const api = {
sendPoll: vi.fn(
async (_chatId: string, _question: string, _options: string[], params: unknown) => {
const p = params as { message_thread_id?: unknown } | undefined;
if (p?.message_thread_id) {
throw new Error("400: Bad Request: message thread not found");
}
return { message_id: 1, chat: { id: 2 }, poll: { id: "p2" } };
},
),
sendPoll: vi
.fn()
.mockRejectedValueOnce(new Error("400: Bad Request: message thread not found")),
};
const res = await sendPollTelegram(
"-100123",
{ question: "Q", options: ["A", "B"] },
{
cfg: TELEGRAM_TEST_CFG,
token: "t",
api: api as unknown as Bot["api"],
messageThreadId: 99,
},
);
await expect(
sendPollTelegram(
"-100123",
{ question: "Q", options: ["A", "B"] },
{
cfg: TELEGRAM_TEST_CFG,
token: "t",
api: api as unknown as Bot["api"],
messageThreadId: 99,
},
),
).rejects.toThrow("message thread not found");
expect(res).toEqual({ messageId: "1", chatId: "2", pollId: "p2" });
expect(api.sendPoll).toHaveBeenCalledTimes(2);
expect(api.sendPoll).toHaveBeenCalledTimes(1);
expect(
requireRecord(firstMockCall(api.sendPoll, "send poll call")[3], "send poll params")
.message_thread_id,
).toBe(99);
expect(
(mockCall(api.sendPoll, 1, "second send poll call")[3] as { message_thread_id?: unknown })
.message_thread_id,
).toBeUndefined();
});
it("rejects durationHours for Telegram polls", async () => {

@@ -221,7 +221,6 @@ function logTelegramOutboundSendOk(params: TelegramOutboundSuccessLogParams): vo
}
const PARSE_ERR_RE = /can't parse entities|parse entities|find end of the entity/i;
const THREAD_NOT_FOUND_RE = /400:\s*Bad Request:\s*message thread not found/i;
const MESSAGE_NOT_MODIFIED_RE =
/400:\s*Bad Request:\s*message is not modified|MESSAGE_NOT_MODIFIED/i;
const MESSAGE_DELETE_NOOP_RE =
@@ -412,10 +411,6 @@ function normalizeMessageId(raw: string | number): number {
throw new Error("Message id is required for Telegram actions");
}
function isTelegramThreadNotFoundError(err: unknown): boolean {
return THREAD_NOT_FOUND_RE.test(formatErrorMessage(err));
}
function isTelegramMessageNotModifiedError(err: unknown): boolean {
return MESSAGE_NOT_MODIFIED_RE.test(formatErrorMessage(err));
}
@@ -424,28 +419,6 @@ function isTelegramMessageDeleteNoopError(err: unknown): boolean {
return MESSAGE_DELETE_NOOP_RE.test(formatErrorMessage(err));
}
function hasMessageThreadIdParam(params?: TelegramThreadScopedParams): boolean {
if (!params) {
return false;
}
const value = params.message_thread_id;
if (typeof value === "number") {
return Number.isFinite(value);
}
return false;
}
function removeMessageThreadIdParam<TParams extends TelegramThreadScopedParams | undefined>(
params: TParams,
): TParams {
if (!params || !hasMessageThreadIdParam(params)) {
return params;
}
const next = { ...params };
delete next.message_thread_id;
return (Object.keys(next).length > 0 ? next : undefined) as TParams;
}
function isTelegramHtmlParseError(err: unknown): boolean {
return PARSE_ERR_RE.test(formatErrorMessage(err));
}
@@ -575,41 +548,6 @@ function wrapTelegramChatNotFoundError(err: unknown, params: { chatId: string; i
);
}
async function withTelegramThreadFallback<
T,
TParams extends TelegramThreadScopedParams | undefined,
>(
params: TParams,
label: string,
verbose: boolean | undefined,
allowThreadlessRetry: boolean,
attempt: (effectiveParams: TParams, effectiveLabel: string) => Promise<T>,
): Promise<{ result: T; acceptedParams: TParams }> {
try {
return { result: await attempt(params, label), acceptedParams: params };
} catch (err) {
// Do not widen this fallback to cover "chat not found".
// chat-not-found is routing/auth/membership/token; stripping thread IDs hides root cause.
if (
!allowThreadlessRetry ||
!hasMessageThreadIdParam(params) ||
!isTelegramThreadNotFoundError(err)
) {
throw err;
}
if (verbose) {
sendLogger.warn(
`telegram ${label} failed with message_thread_id, retrying without thread: ${formatErrorMessage(err)}`,
);
}
const retriedParams = removeMessageThreadIdParam(params);
return {
result: await attempt(retriedParams, `${label}-threadless`),
acceptedParams: retriedParams,
};
}
}
function createRequestWithChatNotFound(params: {
requestWithDiag: TelegramRequestWithDiag;
chatId: string;
@@ -707,49 +645,40 @@ export async function sendMessageTelegram(
chunk: TelegramTextChunk,
params?: TelegramSendMessageParams,
) => {
return await withTelegramThreadFallback(
params,
"message",
opts.verbose,
target.chatType !== "direct",
async (effectiveParams, label) => {
const baseParams = effectiveParams ? { ...effectiveParams } : {};
if (linkPreviewOptions) {
baseParams.link_preview_options = linkPreviewOptions;
}
const plainParams: TelegramSendMessageParams = {
...baseParams,
...(opts.silent === true ? { disable_notification: true } : {}),
};
const hasPlainParams = Object.keys(plainParams).length > 0;
const requestPlain = (retryLabel: string) =>
requestWithChatNotFound(
() =>
hasPlainParams
? api.sendMessage(chatId, chunk.plainText, plainParams)
: api.sendMessage(chatId, chunk.plainText),
retryLabel,
);
if (!chunk.htmlText) {
return await requestPlain(label);
}
const htmlText = chunk.htmlText;
const htmlParams: TelegramSendMessageParams = {
parse_mode: "HTML" as const,
...plainParams,
};
return await withTelegramHtmlParseFallback({
label,
const baseParams = params ? { ...params } : {};
if (linkPreviewOptions) {
baseParams.link_preview_options = linkPreviewOptions;
}
const plainParams: TelegramSendMessageParams = {
...baseParams,
...(opts.silent === true ? { disable_notification: true } : {}),
};
const hasPlainParams = Object.keys(plainParams).length > 0;
const requestPlain = (label: string) =>
requestWithChatNotFound(
() =>
hasPlainParams
? api.sendMessage(chatId, chunk.plainText, plainParams)
: api.sendMessage(chatId, chunk.plainText),
label,
);
const result = !chunk.htmlText
? await requestPlain("message")
: await withTelegramHtmlParseFallback({
label: "message",
verbose: opts.verbose,
requestHtml: (retryLabel) =>
requestHtml: (label) =>
requestWithChatNotFound(
() => api.sendMessage(chatId, htmlText, htmlParams),
retryLabel,
() =>
api.sendMessage(chatId, chunk.htmlText ?? chunk.plainText, {
parse_mode: "HTML" as const,
...plainParams,
}),
label,
),
requestPlain,
});
},
);
return { result, acceptedParams: params };
};
const buildTextParams = (isLastChunk: boolean) =>
@@ -927,15 +856,7 @@ export async function sendMessageTelegram(
sender: (
effectiveParams: TelegramThreadScopedParams | undefined,
) => Promise<TelegramMessageLike>,
) =>
await withTelegramThreadFallback(
mediaParams,
label,
opts.verbose,
target.chatType !== "direct",
async (effectiveParams, retryLabel) =>
requestWithChatNotFound(() => sender(effectiveParams), retryLabel),
);
) => await requestWithChatNotFound(() => sender(mediaParams), label);
const mediaSender = (() => {
if (isGif && deliveryKind !== "document") {
@@ -1023,7 +944,7 @@ export async function sendMessageTelegram(
};
})();
const { result, acceptedParams } = await sendMedia(mediaSender.label, mediaSender.sender);
const result = await sendMedia(mediaSender.label, mediaSender.sender);
const mediaMessageId = resolveTelegramMessageIdOrThrow(result, "media send");
const resolvedChatId = String(result?.chat?.id ?? chatId);
recordSentMessage(chatId, mediaMessageId, cfg);
@@ -1036,7 +957,7 @@ export async function sendMessageTelegram(
.map((part) => part.charAt(0).toUpperCase() + part.slice(1))
.join("")}`,
deliveryKind: mediaSender.label,
messageThreadId: acceptedParams?.message_thread_id,
messageThreadId: mediaParams.message_thread_id,
replyToMessageId: opts.replyToMessageId,
silent: opts.silent,
});
@@ -1606,13 +1527,9 @@ export async function sendStickerTelegram(
const stickerParams = hasThreadParams ? threadParams : undefined;
const { result } = await withTelegramThreadFallback(
stickerParams,
const result = await requestWithChatNotFound(
() => api.sendSticker(chatId, fileId.trim(), stickerParams),
"sticker",
opts.verbose,
target.chatType !== "direct",
async (effectiveParams, label) =>
requestWithChatNotFound(() => api.sendSticker(chatId, fileId.trim(), effectiveParams), label),
);
const messageId = resolveTelegramMessageIdOrThrow(result, "sticker send");
@@ -1714,16 +1631,9 @@ export async function sendPollTelegram(
...(opts.silent === true ? { disable_notification: true } : {}),
};
const { result } = await withTelegramThreadFallback(
pollParams,
const result = await requestWithChatNotFound(
() => api.sendPoll(chatId, normalizedPoll.question, pollOptions, pollParams),
"poll",
opts.verbose,
target.chatType !== "direct",
async (effectiveParams, label) =>
requestWithChatNotFound(
() => api.sendPoll(chatId, normalizedPoll.question, pollOptions, effectiveParams),
label,
),
);
const messageId = resolveTelegramMessageIdOrThrow(result, "poll send");

@@ -28,7 +28,10 @@ Coverage tracking:
Runtime parity tiers:
- `standard`: required Codex-vs-Pi mock gate coverage for first-hour depth and
default runtime-tool fixtures; selected with
default runtime-tool fixtures. OpenClaw dynamic integration tools in this
tier are hard-gated by `openclaw qa coverage --tools --summary`; Codex-native
workspace rows remain separately tracked until native/live behavior is the
asserted surface. Selected with
`openclaw qa suite --runtime-pair pi,codex --runtime-parity-tier standard`
- `optional`: profile-, plugin-, or external-service-dependent runtime-tool
fixtures that stay out of the default release gate

@@ -13,6 +13,7 @@ successCriteria:
- Effective tools expose image_generate after QA image-generation config is applied.
- The mock provider plans exactly one happy-path image_generate call.
- The mock provider plans one denied-input failure-path image_generate call.
- Runtime parity coverage hard-fails call/result drift in the standard direct-loading gate.
docsRefs:
- docs/tools/image-generation.md
codeRefs:
@@ -29,15 +30,12 @@ execution:
actualTool: image_generate
bucket: openclaw-dynamic-integration
expectedLayer: openclaw-dynamic
capabilityLayer: openclaw-dynamic-direct
required: true
tracking: "#80319"
codexDefaultImpact: P4
qaImpact: P1
action: teach fixture/mock planner Codex searchable OpenClaw dynamic tool behavior
reason: image_generate is an OpenClaw integration tool; QA mock provider does not yet model Codex searchable/deferred dynamic tool declarations for this fixture.
knownHarnessGap:
issue: "#80319"
reason: QA mock provider does not yet model Codex searchable/deferred OpenClaw dynamic tool declarations for this fixture.
action: hard gate in the standard direct-loading tier
reason: image_generate is an OpenClaw integration tool and must stay visible and callable under Pi and Codex direct runtime parity.
promptSnippet: "target=image_generate"
failurePromptSnippet: "failure target=image_generate"
```

@@ -13,6 +13,7 @@ successCriteria:
- Effective tools expose session_status.
- The mock provider plans exactly one happy-path session_status call.
- The mock provider plans one denied-input failure-path session_status call.
- Runtime parity coverage hard-fails call/result drift in the standard direct-loading gate.
docsRefs:
- qa/scenarios/index.md
codeRefs:
@@ -28,15 +29,12 @@ execution:
actualTool: session_status
bucket: openclaw-dynamic-integration
expectedLayer: openclaw-dynamic
capabilityLayer: openclaw-dynamic-direct
required: true
tracking: "#80319"
codexDefaultImpact: P4
qaImpact: P1
action: teach fixture/mock planner Codex searchable OpenClaw dynamic tool behavior
reason: session_status is an OpenClaw integration tool; QA mock provider does not yet model Codex searchable/deferred dynamic tool declarations for this fixture.
knownHarnessGap:
issue: "#80319"
reason: QA mock provider does not yet model Codex searchable/deferred OpenClaw dynamic tool declarations for this fixture.
action: hard gate in the standard direct-loading tier
reason: session_status is an OpenClaw integration tool and must stay visible and callable under Pi and Codex direct runtime parity.
promptSnippet: "target=session_status"
failurePromptSnippet: "failure target=session_status"
```

@@ -13,6 +13,7 @@ successCriteria:
- Effective tools expose sessions_spawn.
- The mock provider plans exactly one happy-path sessions_spawn call.
- The mock provider plans one denied-input failure-path sessions_spawn call.
- Runtime parity coverage hard-fails call/result drift in the standard direct-loading gate.
docsRefs:
- qa/scenarios/index.md
codeRefs:
@@ -28,15 +29,12 @@ execution:
actualTool: sessions_spawn
bucket: openclaw-dynamic-integration
expectedLayer: openclaw-dynamic
capabilityLayer: openclaw-dynamic-direct
required: true
tracking: "#80319"
codexDefaultImpact: P4
qaImpact: P1
action: teach fixture/mock planner Codex searchable OpenClaw dynamic tool behavior
reason: sessions_spawn is an OpenClaw integration tool; QA mock provider does not yet model Codex searchable/deferred dynamic tool declarations for this fixture.
knownHarnessGap:
issue: "#80319"
reason: QA mock provider does not yet model Codex searchable/deferred OpenClaw dynamic tool declarations for this fixture.
action: hard gate in the standard direct-loading tier
reason: sessions_spawn is an OpenClaw integration tool and must stay visible and callable under Pi and Codex direct runtime parity.
promptSnippet: "target=sessions_spawn"
failurePromptSnippet: "failure target=sessions_spawn"
```

@@ -13,6 +13,7 @@ successCriteria:
- Effective tools expose web_fetch.
- The mock provider plans exactly one happy-path web_fetch call.
- The mock provider plans one denied-input failure-path web_fetch call.
- Runtime parity coverage hard-fails call/result drift in the standard direct-loading gate.
docsRefs:
- qa/scenarios/index.md
codeRefs:
@@ -28,15 +29,12 @@ execution:
actualTool: web_fetch
bucket: openclaw-dynamic-integration
expectedLayer: openclaw-dynamic
capabilityLayer: openclaw-dynamic-direct
required: true
tracking: "#80319"
codexDefaultImpact: P4
qaImpact: P1
action: teach fixture/mock planner Codex searchable OpenClaw dynamic tool behavior
reason: web_fetch is an OpenClaw integration tool; QA mock provider does not yet model Codex searchable/deferred dynamic tool declarations for this fixture.
knownHarnessGap:
issue: "#80319"
reason: QA mock provider does not yet model Codex searchable/deferred OpenClaw dynamic tool declarations for this fixture.
action: hard gate in the standard direct-loading tier
reason: web_fetch is an OpenClaw integration tool and must stay visible and callable under Pi and Codex direct runtime parity.
promptSnippet: "target=web_fetch"
failurePromptSnippet: "failure target=web_fetch"
```


@@ -13,6 +13,7 @@ successCriteria:
- Effective tools expose web_search.
- The mock provider plans exactly one happy-path web_search call.
- The mock provider plans one denied-input failure-path web_search call.
- Runtime parity coverage hard-fails call/result drift in the standard direct-loading gate.
docsRefs:
- qa/scenarios/index.md
codeRefs:
@@ -28,15 +29,12 @@ execution:
actualTool: web_search
bucket: openclaw-dynamic-integration
expectedLayer: openclaw-dynamic
capabilityLayer: openclaw-dynamic-direct
required: true
tracking: "#80319"
codexDefaultImpact: P4
qaImpact: P1
action: teach fixture/mock planner Codex searchable OpenClaw dynamic tool behavior
reason: web_search is an OpenClaw integration tool; QA mock provider does not yet model Codex searchable/deferred dynamic tool declarations for this fixture.
knownHarnessGap:
issue: "#80319"
reason: QA mock provider does not yet model Codex searchable/deferred OpenClaw dynamic tool declarations for this fixture.
action: hard gate in the standard direct-loading tier
reason: web_search is an OpenClaw integration tool and must stay visible and callable under Pi and Codex direct runtime parity.
promptSnippet: "target=web_search"
failurePromptSnippet: "failure target=web_search"
```


@@ -1049,9 +1049,35 @@ function sshArgs(inspect: CrabboxInspect) {
};
}
function isTransientSshFailure(error: unknown) {
const message = error instanceof Error ? error.message : String(error);
return /Connection (?:closed|reset)|Operation timed out|Connection timed out/u.test(message);
}
async function runRemoteCommand(params: {
args: string[];
command: string;
cwd: string;
stdio?: "inherit" | "pipe";
}) {
let lastError: unknown;
for (let attempt = 1; attempt <= 4; attempt += 1) {
try {
return await runCommand(params);
} catch (error) {
lastError = error;
if (attempt === 4 || !isTransientSshFailure(error)) {
throw error;
}
await new Promise((resolve) => setTimeout(resolve, attempt * 3000));
}
}
throw lastError;
}
async function scpToRemote(root: string, inspect: CrabboxInspect, local: string, remote: string) {
const ssh = sshArgs(inspect);
-await runCommand({
+await runRemoteCommand({
command: "scp",
args: [...ssh.scpBase, local, `${ssh.target}:${remote}`],
cwd: root,
@@ -1061,7 +1087,7 @@ async function scpToRemote(root: string, inspect: CrabboxInspect, local: string,
async function scpFromRemote(root: string, inspect: CrabboxInspect, remote: string, local: string) {
const ssh = sshArgs(inspect);
-await runCommand({
+await runRemoteCommand({
command: "scp",
args: [...ssh.scpBase, `${ssh.target}:${remote}`, local],
cwd: root,
@@ -1071,7 +1097,7 @@ async function scpFromRemote(root: string, inspect: CrabboxInspect, remote: stri
async function sshRun(root: string, inspect: CrabboxInspect, remoteCommand: string) {
const ssh = sshArgs(inspect);
-return await runCommand({
+return await runRemoteCommand({
command: "ssh",
args: [...ssh.base, ssh.target, remoteCommand],
cwd: root,
@@ -1090,7 +1116,7 @@ tdlib_url=${tdlibUrl}
mkdir -p "$root"
tar -xzf "$root/state.tgz" -C "$root"
sudo apt-get update -y
-sudo DEBIAN_FRONTEND=noninteractive apt-get install -y curl git cmake g++ make zlib1g-dev libssl-dev python3 ffmpeg scrot xz-utils tar wmctrl xdotool x11-utils libopengl0 libxcb-cursor0 libxcb-icccm4 libxcb-image0 libxcb-keysyms1 libxcb-randr0 libxcb-render-util0 libxcb-shape0 libxcb-xfixes0 libxcb-xinerama0 libxkbcommon-x11-0 >/tmp/openclaw-telegram-apt.log
+sudo DEBIAN_FRONTEND=noninteractive apt-get install -y curl git cmake g++ make zlib1g-dev libssl-dev python3 ffmpeg scrot xz-utils tar wmctrl xdotool x11-utils zbar-tools libopengl0 libxcb-cursor0 libxcb-icccm4 libxcb-image0 libxcb-keysyms1 libxcb-randr0 libxcb-render-util0 libxcb-shape0 libxcb-xfixes0 libxcb-xinerama0 libxkbcommon-x11-0 >/tmp/openclaw-telegram-apt.log
if ! command -v python3 >/dev/null 2>&1; then
echo "python3 is required" >&2
exit 127
@@ -1122,6 +1148,7 @@ if ! ldconfig -p | grep -q libtdjson.so; then
sudo ldconfig
fi
TELEGRAM_USER_DRIVER_STATE_DIR="$root/user-driver" python3 "$root/user-driver.py" status --json --timeout-ms 60000 >"$root/status.json"
TELEGRAM_USER_DRIVER_STATE_DIR="$root/user-driver" python3 "$root/user-driver.py" terminate-desktop-sessions --json --timeout-ms 60000 --output "$root/desktop-sessions-cleanup.json"
`;
}
@@ -1131,6 +1158,7 @@ set -euo pipefail
root=${REMOTE_ROOT}
export DISPLAY="\${DISPLAY:-:99}"
pkill -f "$root/Telegram/Telegram" >/dev/null 2>&1 || true
rm -rf "$root/desktop/tdata"
nohup "$root/Telegram/Telegram" -workdir "$root/desktop" >"$root/telegram-desktop.log" 2>&1 &
pid=$!
sleep 8
@@ -1145,6 +1173,60 @@ fi
`;
}
function renderAuthorizeDesktop() {
return `#!/usr/bin/env bash
set -euo pipefail
root=${REMOTE_ROOT}
export DISPLAY="\${DISPLAY:-:99}"
win="$(wmctrl -l | awk 'tolower($0) ~ /telegram/ {print $1; exit}')"
test -n "$win"
xdotool windowactivate "$win"
sleep 5
click_window_ratio() {
eval "$(xdotool getwindowgeometry --shell "$win")"
xdotool windowactivate "$win"
sleep 0.2
xdotool mousemove "$((X + WIDTH / 2))" "$((Y + HEIGHT * $1 / 100))"
sleep 0.2
xdotool click 1
sleep 1
}
read_qr_link() {
scrot "$root/telegram-login-qr.png"
{ zbarimg --raw "$root/telegram-login-qr.png" 2>/dev/null || true; } | awk 'index($0, "tg://login?token=") == 1 {print; exit}'
}
wait_for_qr_link() {
for _ in $(seq 1 25); do
link="$(read_qr_link)"
if [ -n "$link" ]; then
printf '%s\\n' "$link"
return 0
fi
sleep 1
done
return 1
}
click_window_ratio 69
sleep 3
click_window_ratio 80
link="$(wait_for_qr_link)" || {
echo "Telegram Desktop QR login code was not found." >&2
exit 1
}
export TELEGRAM_USER_DRIVER_STATE_DIR="$root/user-driver"
python3 "$root/user-driver.py" confirm-qr --link "$link" --json --output "$root/desktop-session.json"
python3 - "$root/desktop-session.json" <<'PY'
import json
import sys
payload = json.loads(open(sys.argv[1]).read())
session = payload.get("session") or {}
if session.get("isPasswordPending"):
raise SystemExit("Telegram Desktop QR login requires a 2FA password.")
PY
sleep 6
`;
}
function renderSelectDesktopChat(params: { chatTitle: string }) {
return `#!/usr/bin/env bash
set -euo pipefail
@@ -1414,12 +1496,14 @@ async function writeRemoteSessionScripts(params: {
}) {
const setupScript = path.join(params.localRoot, "remote-setup.sh");
const launchScript = path.join(params.localRoot, "launch-desktop.sh");
const authorizeScript = path.join(params.localRoot, "authorize-desktop.sh");
const selectChatScript = path.join(params.localRoot, "select-desktop-chat.sh");
await writeExecutable(
setupScript,
renderRemoteSetup({ tdlibSha256: params.opts.tdlibSha256, tdlibUrl: params.opts.tdlibUrl }),
);
await writeExecutable(launchScript, renderLaunchDesktop());
await writeExecutable(authorizeScript, renderAuthorizeDesktop());
await writeExecutable(
selectChatScript,
renderSelectDesktopChat({ chatTitle: params.opts.desktopChatTitle }),
@@ -1429,6 +1513,12 @@ async function writeRemoteSessionScripts(params: {
await scpToRemote(params.root, params.inspect, params.stateArchive, `${REMOTE_ROOT}/state.tgz`);
await scpToRemote(params.root, params.inspect, setupScript, `${REMOTE_ROOT}/remote-setup.sh`);
await scpToRemote(params.root, params.inspect, launchScript, `${REMOTE_ROOT}/launch-desktop.sh`);
await scpToRemote(
params.root,
params.inspect,
authorizeScript,
`${REMOTE_ROOT}/authorize-desktop.sh`,
);
await scpToRemote(
params.root,
params.inspect,
@@ -1437,6 +1527,7 @@ async function writeRemoteSessionScripts(params: {
);
await sshRun(params.root, params.inspect, `bash ${REMOTE_ROOT}/remote-setup.sh`);
await sshRun(params.root, params.inspect, `bash ${REMOTE_ROOT}/launch-desktop.sh`);
await sshRun(params.root, params.inspect, `bash ${REMOTE_ROOT}/authorize-desktop.sh`);
await sshRun(params.root, params.inspect, `bash ${REMOTE_ROOT}/select-desktop-chat.sh`);
await sshRun(
params.root,
@@ -1486,6 +1577,30 @@ fi`,
);
}
async function terminateRemoteDesktopSession(root: string, inspect: CrabboxInspect) {
await sshRun(
root,
inspect,
`set -euo pipefail
root=${REMOTE_ROOT}
if [ ! -s "$root/desktop-session.json" ]; then
exit 0
fi
session_id="$(python3 - "$root/desktop-session.json" <<'PY'
import json
import sys
payload = json.loads(open(sys.argv[1]).read())
print((payload.get("session") or {}).get("id") or "")
PY
)"
if [ -z "$session_id" ]; then
exit 0
fi
export TELEGRAM_USER_DRIVER_STATE_DIR="$root/user-driver"
python3 "$root/user-driver.py" terminate-session --session-id "$session_id" --json --output "$root/desktop-session-terminated.json"`,
);
}
async function startSession(root: string, opts: Options, outputDir: string) {
const localRoot = path.join(outputDir, ".session");
fs.rmSync(localRoot, { force: true, recursive: true });
@@ -1756,6 +1871,16 @@ async function finishSession(root: string, opts: Options, outputDir: string) {
const statusPath = path.join(session.outputDir, "status.json");
const ffmpegLogPath = path.join(session.outputDir, "ffmpeg.log");
const crop = previewCrop(opts);
let desktopSessionTerminationAttempted = false;
const terminateDesktopSession = async () => {
if (opts.keepBox || desktopSessionTerminationAttempted) {
return;
}
desktopSessionTerminationAttempted = true;
await terminateRemoteDesktopSession(root, session.crabbox.inspect).catch((error: unknown) => {
summary.desktopSessionTerminateError = error instanceof Error ? error.message : String(error);
});
};
try {
await stopRemoteRecording(root, session.crabbox.inspect, session);
await scpFromRemote(root, session.crabbox.inspect, session.recorder.remoteVideo, videoPath);
@@ -1774,6 +1899,23 @@ async function finishSession(root: string, opts: Options, outputDir: string) {
await scpFromRemote(root, session.crabbox.inspect, session.recorder.log, ffmpegLogPath).catch(
() => {},
);
await runCommand({
command: opts.crabboxBin,
args: [
"screenshot",
"--provider",
session.crabbox.provider,
"--target",
session.crabbox.target,
"--id",
session.crabbox.id,
"--output",
screenshotPath,
],
cwd: root,
stdio: "inherit",
});
await terminateDesktopSession();
summary.mediaPreview = await createMotionPreview({
motionGifPath,
motionVideoPath,
@@ -1791,22 +1933,6 @@ async function finishSession(root: string, opts: Options, outputDir: string) {
videoPath: motionVideoPath,
});
}
-await runCommand({
-command: opts.crabboxBin,
-args: [
-"screenshot",
-"--provider",
-session.crabbox.provider,
-"--target",
-session.crabbox.target,
-"--id",
-session.crabbox.id,
-"--output",
-screenshotPath,
-],
-cwd: root,
-stdio: "inherit",
-});
summary.artifacts = {
desktopLog: path.relative(root, desktopLogPath),
ffmpegLog: path.relative(root, ffmpegLogPath),
@@ -1826,6 +1952,7 @@ async function finishSession(root: string, opts: Options, outputDir: string) {
} finally {
killPidTree(session.localSut.gatewayPid);
killPidTree(session.localSut.mockPid);
await terminateDesktopSession();
await releaseCredential(root, opts, session.credential.leaseFile).catch((error: unknown) => {
summary.credentialReleaseError = error instanceof Error ? error.message : String(error);
});
@@ -2038,6 +2165,7 @@ async function main() {
const setupScript = path.join(localRoot, "remote-setup.sh");
const launchScript = path.join(localRoot, "launch-desktop.sh");
const authorizeScript = path.join(localRoot, "authorize-desktop.sh");
const selectChatScript = path.join(localRoot, "select-desktop-chat.sh");
const probeScript = path.join(localRoot, "remote-probe.sh");
await writeExecutable(
@@ -2045,6 +2173,7 @@ async function main() {
renderRemoteSetup({ tdlibSha256: opts.tdlibSha256, tdlibUrl: opts.tdlibUrl }),
);
await writeExecutable(launchScript, renderLaunchDesktop());
await writeExecutable(authorizeScript, renderAuthorizeDesktop());
await writeExecutable(
selectChatScript,
renderSelectDesktopChat({ chatTitle: opts.desktopChatTitle }),
@@ -2063,6 +2192,7 @@ async function main() {
await scpToRemote(root, inspect, stateArchive, `${REMOTE_ROOT}/state.tgz`);
await scpToRemote(root, inspect, setupScript, `${REMOTE_ROOT}/remote-setup.sh`);
await scpToRemote(root, inspect, launchScript, `${REMOTE_ROOT}/launch-desktop.sh`);
await scpToRemote(root, inspect, authorizeScript, `${REMOTE_ROOT}/authorize-desktop.sh`);
await scpToRemote(root, inspect, selectChatScript, `${REMOTE_ROOT}/select-desktop-chat.sh`);
await scpToRemote(root, inspect, probeScript, `${REMOTE_ROOT}/remote-probe.sh`);
await sshRun(root, inspect, `bash ${REMOTE_ROOT}/remote-setup.sh`);
@@ -2086,6 +2216,7 @@ async function main() {
};
await sshRun(root, inspect, `bash ${REMOTE_ROOT}/launch-desktop.sh`);
await sshRun(root, inspect, `bash ${REMOTE_ROOT}/authorize-desktop.sh`);
await sshRun(root, inspect, `bash ${REMOTE_ROOT}/select-desktop-chat.sh`);
const videoPath = path.join(outputDir, "telegram-user-crabbox-proof.mp4");
const recording = spawn(
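
The retry wrapper added in this file can be read as a generic "retry only transient failures, with linear backoff" helper. The following is a minimal sketch under that reading, not the script's actual code: `retryTransient`, its `options` bag, and the injectable `sleep` are hypothetical names introduced here for illustration. Only the error regex, the 4-attempt limit, and the `attempt * 3000` ms delay are taken from the hunk above.

```typescript
// Hypothetical, simplified restatement of the retry logic in the hunk above.
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Same classification as the diff: match on the error message text only.
function isTransientSshFailure(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return /Connection (?:closed|reset)|Operation timed out|Connection timed out/u.test(message);
}

async function retryTransient<T>(
  run: () => Promise<T>,
  options: { maxAttempts?: number; baseDelayMs?: number; sleep?: Sleep } = {},
): Promise<T> {
  const maxAttempts = options.maxAttempts ?? 4;
  const baseDelayMs = options.baseDelayMs ?? 3000;
  const sleep = options.sleep ?? defaultSleep;
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await run();
    } catch (error) {
      // Give up on the final attempt or on any non-transient failure.
      if (attempt >= maxAttempts || !isTransientSshFailure(error)) {
        throw error;
      }
      // Linear backoff: 3 s, 6 s, 9 s with the defaults.
      await sleep(attempt * baseDelayMs);
    }
  }
}
```

Making `sleep` injectable is a choice of this sketch so that the backoff schedule can be checked without real waiting; the script itself calls `setTimeout` directly.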


@@ -611,6 +611,33 @@ def command_confirm_qr(args):
)
def command_terminate_session(args):
    config, bot_config = load_config()
    driver = UserDriver(config, bot_config)
    driver.authorize(argparse.Namespace(timeout_ms=args.timeout_ms))
    driver.client.request({"@type": "terminateSession", "session_id": int(args.session_id)}, timeout=30)
    print_result({"ok": True, "sessionId": args.session_id}, args.json, getattr(args, "output", ""))
def command_terminate_desktop_sessions(args):
    config, bot_config = load_config()
    driver = UserDriver(config, bot_config)
    driver.authorize(argparse.Namespace(timeout_ms=args.timeout_ms))
    result = driver.client.request({"@type": "getActiveSessions"}, timeout=30)
    terminated = []
    for session in result.get("sessions", []):
        if session.get("is_current"):
            continue
        if session.get("application_name") != "Telegram Desktop":
            continue
        session_id = session.get("id")
        if session_id is None:
            continue
        driver.client.request({"@type": "terminateSession", "session_id": int(session_id)}, timeout=30)
        terminated.append({"id": session_id, "applicationName": session.get("application_name")})
    print_result({"ok": True, "terminated": terminated}, args.json, getattr(args, "output", ""))
def public_user(user):
return {
"id": user.get("id"),
@@ -784,6 +811,15 @@ def main():
confirm_qr.add_argument("--link", required=True)
confirm_qr.set_defaults(func=command_confirm_qr)
terminate_session = sub.add_parser("terminate-session")
add_common(terminate_session)
terminate_session.add_argument("--session-id", required=True)
terminate_session.set_defaults(func=command_terminate_session)
terminate_desktop_sessions = sub.add_parser("terminate-desktop-sessions")
add_common(terminate_desktop_sessions)
terminate_desktop_sessions.set_defaults(func=command_terminate_desktop_sessions)
send = sub.add_parser("send")
add_common(send)
send.add_argument("--chat", default="")


@@ -52,7 +52,7 @@ function resolveContractFileWeight(file) {
export function createChannelContractTestShards() {
const rootDir = "src/channels/plugins/contracts";
-const suffixes = ["a", "b", "c"];
+const suffixes = ["a", "b"];
const groups = Object.fromEntries(
suffixes.map((suffix) => [`checks-fast-contracts-channels-${suffix}`, []]),
);


@@ -66,7 +66,7 @@ function resolveContractFileWeight(file) {
}
export function createPluginContractTestShards() {
-const suffixes = ["a", "b", "c", "d"];
+const suffixes = ["a", "b"];
const groups = Object.fromEntries(
suffixes.map((suffix) => [`checks-fast-contracts-plugins-${suffix}`, []]),
);


@@ -206,9 +206,15 @@ function collectReferenceEvents(
if (!clause?.namedBindings) {
continue;
}
if (clause.isTypeOnly) {
continue;
}
if (ts.isNamedImports(clause.namedBindings)) {
for (const element of clause.namedBindings.elements) {
if (element.isTypeOnly) {
continue;
}
const importedName = element.propertyName?.text ?? element.name.text;
const record = recordMap.get(importedName);
if (!record) {


@@ -110,5 +110,9 @@ const reportModules: Record<ReportModule["name"], ReportModule> = {
};
export function renderTextReport(envelope: TopologyEnvelope, limit: number): string {
-return reportModules[envelope.report].describe(envelope, limit);
+const reportModule = reportModules[envelope.report];
+if (!reportModule) {
+throw new Error(`Unsupported topology report: ${envelope.report}`);
+}
+return reportModule.describe(envelope, limit);
}


@@ -308,6 +308,47 @@ function laneLine(label, lane) {
return pieces.join("");
}
function hasVisibleProofArtifacts(manifest) {
return manifest.artifacts.some((artifact) =>
["desktopScreenshot", "fullVideo", "motionClip", "motionPreview", "timeline"].includes(
artifact.kind,
),
);
}
function isSkippedNoVisualProof(manifest) {
const comparison = manifest.comparison ?? {};
return (
!hasVisibleProofArtifacts(manifest) &&
comparison.baseline?.status === "skipped" &&
comparison.candidate?.status === "skipped"
);
}
function publicSummary(manifest) {
if (isSkippedNoVisualProof(manifest)) {
return "Mantis did not generate before/after GIFs because this PR does not have a clean Telegram-visible before/after proof in the standard Mantis run.";
}
return manifest.summary ?? "Mantis captured QA evidence for this scenario.";
}
function overallStatus(manifest) {
if (isSkippedNoVisualProof(manifest)) {
return "skipped";
}
const pass = manifest.comparison?.pass;
return typeof pass === "boolean" ? String(pass) : "";
}
export function shouldPublishPrComment(manifest) {
if (!isSkippedNoVisualProof(manifest)) {
return true;
}
return !/(authorization[- ]?error|credential infrastructure|logged[- ]out|login screen|welcome screen|bad telegram session)/iu.test(
manifest.summary ?? "",
);
}
export function renderEvidenceComment({
artifactUrl: actionsArtifactUrl,
manifest,
@@ -333,7 +374,7 @@ export function renderEvidenceComment({
marker,
`## ${manifest.title}`,
"",
-`Summary: ${manifest.summary ?? "Mantis captured QA evidence for this scenario."}`,
+`Summary: ${publicSummary(manifest)}`,
"",
`- Scenario: \`${manifest.scenario}\``,
];
@@ -354,8 +395,9 @@ export function renderEvidenceComment({
if (candidateLine) {
lines.push(candidateLine);
}
-if (typeof comparison.pass === "boolean") {
-lines.push(`- Overall: \`${comparison.pass}\``);
+const overall = overallStatus(manifest);
+if (overall) {
+lines.push(`- Overall: \`${overall}\``);
}
lines.push("");
@@ -551,6 +593,10 @@ export async function publishEvidence(rawArgs = process.argv.slice(2)) {
runUrl: args.run_url,
treeUrl: published.treeUrl,
});
if (!shouldPublishPrComment(manifest)) {
console.log("Skipped Mantis QA evidence PR comment because the run did not capture proof.");
return;
}
upsertPrComment({
body,
marker: args.marker,
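
To see how the new predicates in this file combine, the sketch below restates them with explicit types and runs them against sample manifests. The `Manifest` type and the sample values are assumptions made for illustration; the function bodies mirror the hunk above, and the real manifest shape may carry more fields.

```typescript
// Assumed minimal manifest shape; the real type is not shown in the diff.
type LaneStatus = { status?: string };
type Manifest = {
  artifacts: { kind: string }[];
  comparison?: { baseline?: LaneStatus; candidate?: LaneStatus; pass?: boolean };
  summary?: string;
};

const VISIBLE_KINDS = ["desktopScreenshot", "fullVideo", "motionClip", "motionPreview", "timeline"];

function hasVisibleProofArtifacts(manifest: Manifest): boolean {
  return manifest.artifacts.some((artifact) => VISIBLE_KINDS.includes(artifact.kind));
}

// "Skipped with no visual proof": no visible artifacts and both lanes skipped.
function isSkippedNoVisualProof(manifest: Manifest): boolean {
  const comparison = manifest.comparison ?? {};
  return (
    !hasVisibleProofArtifacts(manifest) &&
    comparison.baseline?.status === "skipped" &&
    comparison.candidate?.status === "skipped"
  );
}

function overallStatus(manifest: Manifest): string {
  if (isSkippedNoVisualProof(manifest)) {
    return "skipped";
  }
  const pass = manifest.comparison?.pass;
  return typeof pass === "boolean" ? String(pass) : "";
}

// Suppress the PR comment only when a skipped run was caused by login/credential trouble.
function shouldPublishPrComment(manifest: Manifest): boolean {
  if (!isSkippedNoVisualProof(manifest)) {
    return true;
  }
  return !/(authorization[- ]?error|credential infrastructure|logged[- ]out|login screen|welcome screen|bad telegram session)/iu.test(
    manifest.summary ?? "",
  );
}

// Hypothetical sample: both lanes skipped because Telegram Desktop never logged in.
const loginFailure: Manifest = {
  artifacts: [],
  comparison: { baseline: { status: "skipped" }, candidate: { status: "skipped" } },
  summary: "Telegram Desktop stayed on the login screen.",
};
console.log(overallStatus(loginFailure), shouldPublishPrComment(loginFailure));
```

Under these assumptions, a skipped run whose summary mentions a login or credential problem is reported as `skipped` and produces no PR comment, while a skipped run with any other summary still publishes one.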


@@ -4,6 +4,8 @@ type Options = {
json?: boolean;
output?: string;
repoRoot?: string;
summary?: string;
tools?: boolean;
};
function takeValue(args: string[], index: number, flag: string): string {
@@ -27,6 +29,8 @@ Options:
--json Print machine-readable JSON
--output <path> Write the report to a file
--repo-root <path> Repository root to target
--summary <path> Runtime qa-suite-summary.json to overlay on --tools coverage
--tools Print runtime tool fixture coverage instead of scenario coverage
-h, --help Display help
`);
process.exit(0);
@@ -41,6 +45,13 @@ Options:
opts.repoRoot = takeValue(args, index, arg);
index += 1;
break;
case "--summary":
opts.summary = takeValue(args, index, arg);
index += 1;
break;
case "--tools":
opts.tools = true;
break;
default:
throw new Error(`Unknown qa coverage option: ${arg}`);
}
@@ -53,4 +64,6 @@ await runQaCoverageReportCommand({
...(opts.json ? { json: true } : {}),
...(opts.output ? { output: opts.output } : {}),
...(opts.repoRoot ? { repoRoot: opts.repoRoot } : {}),
...(opts.summary ? { summary: opts.summary } : {}),
...(opts.tools ? { tools: true } : {}),
});


@@ -7,6 +7,9 @@ type Options = {
candidateSummary?: string;
outputDir?: string;
repoRoot?: string;
runtimeAxis?: boolean;
summary?: string;
tokenEfficiency?: boolean;
};
function takeValue(args: string[], index: number, flag: string): string {
@@ -31,6 +34,9 @@ Options:
--baseline-summary <path> Baseline qa-suite-summary.json path
--candidate-label <label> Candidate display label
--baseline-label <label> Baseline display label
--runtime-axis Interpret --summary as a runtime-pair summary
--summary <path> Runtime-axis qa-suite-summary.json path
--token-efficiency Also write the runtime token-efficiency report
--repo-root <path> Repository root to target
--output-dir <path> Artifact directory for the parity report
-h, --help Display help
@@ -60,6 +66,16 @@ Options:
opts.repoRoot = takeValue(args, index, arg);
index += 1;
break;
case "--runtime-axis":
opts.runtimeAxis = true;
break;
case "--summary":
opts.summary = takeValue(args, index, arg);
index += 1;
break;
case "--token-efficiency":
opts.tokenEfficiency = true;
break;
default:
throw new Error(`Unknown qa parity-report option: ${arg}`);
}
@@ -68,18 +84,27 @@ Options:
}
const opts = parseArgs(process.argv.slice(2));
-if (!opts.candidateSummary) {
-throw new Error("--candidate-summary is required.");
-}
-if (!opts.baselineSummary) {
-throw new Error("--baseline-summary is required.");
+if (opts.runtimeAxis) {
+if (!opts.summary) {
+throw new Error("--summary is required when --runtime-axis is set.");
+}
+} else {
+if (!opts.candidateSummary) {
+throw new Error("--candidate-summary is required.");
+}
+if (!opts.baselineSummary) {
+throw new Error("--baseline-summary is required.");
+}
}
await runQaParityReportCommand({
-baselineSummary: opts.baselineSummary,
-candidateSummary: opts.candidateSummary,
+...(opts.baselineSummary ? { baselineSummary: opts.baselineSummary } : {}),
+...(opts.candidateSummary ? { candidateSummary: opts.candidateSummary } : {}),
...(opts.baselineLabel ? { baselineLabel: opts.baselineLabel } : {}),
...(opts.candidateLabel ? { candidateLabel: opts.candidateLabel } : {}),
...(opts.outputDir ? { outputDir: opts.outputDir } : {}),
...(opts.repoRoot ? { repoRoot: opts.repoRoot } : {}),
...(opts.runtimeAxis ? { runtimeAxis: opts.runtimeAxis } : {}),
...(opts.summary ? { summary: opts.summary } : {}),
...(opts.tokenEfficiency ? { tokenEfficiency: opts.tokenEfficiency } : {}),
});


@@ -9,17 +9,17 @@ type RepoLabel = {
};
const COLOR_BY_PREFIX = new Map<string, string>([
-["channel", "DDEBFA"],
-["app", "EADFF8"],
-["extensions", "EDEDED"],
-["plugin", "EDEDED"],
-["docs", "CFE3F8"],
-["cli", "CFE3F8"],
-["gateway", "D9CCF5"],
-["commands", "CFE3F8"],
-["scripts", "D9CCF5"],
-["docker", "DDF4E4"],
-["size", "E8C4CB"],
+["channel", "0969DA"],
+["app", "6E7781"],
+["extensions", "6E7781"],
+["plugin", "6E7781"],
+["docs", "0A3069"],
+["cli", "0A3069"],
+["gateway", "57606A"],
+["commands", "0A3069"],
+["scripts", "57606A"],
+["docker", "D6E3DA"],
+["size", "8C959F"],
]);
const EXTRA_LABEL_METADATA = new Map<


@@ -12,23 +12,25 @@ const COLORS = {
softerAmber: "F9D65C",
paleYellow: "F7E7A1",
saturatedGreen: "0E8A16",
-mutedGreen: "B8E0B0",
-paleGreen: "DDF4E4",
-proofGreen: "C2E0C6",
-mutedProofGreen: "9BD3A0",
-overrideGreen: "DDECCF",
+mutedGreen: "8C959F",
+paleGreen: "D6E3DA",
+proofGreen: "2DA44E",
+mutedProofGreen: "1A7F37",
+overrideGreen: "2DA44E",
saturatedBlue: "0F2CCE",
-paleBlue: "CFE3F8",
-channelBlue: "DDEBFA",
-dedupeBlue: "BFD4F2",
-triageBlue: "D8E8F8",
+paleBlue: "0A3069",
+channelBlue: "0969DA",
+dedupeBlue: "57606A",
+triageBlue: "0969DA",
saturatedPurple: "7057FF",
-mutedPurple: "D9CCF5",
-appPurple: "EADFF8",
-neutralGray: "EDEDED",
-duplicateGray: "CFD3D7",
+mutedPurple: "57606A",
+taxonomyGray: "6E7781",
+taxonomySteel: "57606A",
+appPurple: "6E7781",
+neutralGray: "E5E7EB",
+duplicateGray: "D1D5DB",
darkGray: "8C8C8C",
-mutedRose: "E8C4CB",
+mutedRose: "8C959F",
mutedRed: "E99695",
black: "000000",
white: "FFFFFF",
@@ -43,6 +45,13 @@ const EXACT_COLORS = new Map(
P1: COLORS.saturatedOrangeRed,
P2: COLORS.saturatedAmber,
P3: COLORS.mutedGreen,
"rating: 🦀 challenger crab": "1F883D",
"rating: 🦞 diamond lobster": "0969DA",
"rating: 🐚 platinum hermit": "0F766E",
"rating: 🦐 gold shrimp": "B7791F",
"rating: 🦪 silver shellfish": "7A828E",
"rating: 🧂 unranked krab": "8C2F39",
"rating: 🌊 off-meta tidepool": "6E7781",
"impact:data-loss": COLORS.saturatedRed,
"impact:security": COLORS.saturatedRed,
"impact:crash-loop": COLORS.saturatedOrangeRed,
@@ -63,13 +72,13 @@ const EXACT_COLORS = new Map(
"triage:done": COLORS.mutedGreen,
"triage:needs-review": COLORS.paleBlue,
"triage:started": COLORS.mutedPurple,
-agents: COLORS.mutedPurple,
+agents: COLORS.taxonomySteel,
docs: COLORS.paleBlue,
cli: COLORS.paleBlue,
commands: COLORS.paleBlue,
scripts: COLORS.mutedPurple,
gateway: COLORS.mutedPurple,
-codex: COLORS.neutralGray,
+codex: COLORS.taxonomySteel,
docker: COLORS.paleGreen,
tui: COLORS.paleGreen,
"extensions: NEW": COLORS.channelBlue,
@@ -158,13 +167,13 @@ const FAMILY_RULES = [
{
family: "extension",
match: (name) => name.startsWith("extensions: "),
-color: COLORS.neutralGray,
+color: COLORS.taxonomyGray,
reason: "plugin implementation taxonomy should not compete with priority",
},
{
family: "plugin",
match: (name) => name.startsWith("plugin: "),
-color: COLORS.neutralGray,
+color: COLORS.taxonomyGray,
reason: "plugin taxonomy stays neutral unless it becomes an action gate",
},
{
@@ -196,6 +205,9 @@ function exactFamily(name) {
if (/^P[0-3]$/.test(name)) {
return "priority";
}
if (name.startsWith("rating:")) {
return "rating";
}
if (name.startsWith("impact:")) {
return "impact";
}


@@ -64,4 +64,34 @@ describe("createAnthropicPayloadLogger", () => {
expect(source.sha256).toBe(crypto.createHash("sha256").update("QUJDRA==").digest("hex"));
expect(event.payloadDigest).toMatch(/^[a-f0-9]{64}$/u);
});
it("sanitizes usage and error fields before writing logs", () => {
const lines: string[] = [];
const logger = createAnthropicPayloadLogger({
env: { OPENCLAW_ANTHROPIC_PAYLOAD_LOG: "1" },
writer: {
filePath: "memory",
write: (line) => lines.push(line),
flush: async () => undefined,
},
});
logger?.recordUsage(
[
{
role: "assistant",
content: "",
usage: {
input: 1,
authorization: "Bearer sk-secret", // pragma: allowlist secret
},
} as never,
],
new Error("failed with Bearer sk-secret"), // pragma: allowlist secret
);
const event = JSON.parse(lines[0]?.trim() ?? "{}") as Record<string, unknown>;
expect(event.error).toBe("failed with Bearer <redacted>");
expect(event.usage).toEqual({ input: 1 });
});
});
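
The new test pins two behaviours: bearer tokens inside strings are replaced with `<redacted>`, and credential-like keys such as `authorization` are dropped from objects. The implementation of `sanitizeDiagnosticPayload` is not part of this diff, so the following is only a hypothetical sketch of a sanitizer with those two properties. `redactDiagnostic`, `SENSITIVE_KEY`, and `BEARER_TOKEN` are names invented here, and the real function may use different rules.

```typescript
// Hypothetical sanitizer; NOT the project's sanitizeDiagnosticPayload.
// Keys treated as credentials (assumed list, for illustration only).
const SENSITIVE_KEY = /^(authorization|cookie|api[-_]?key|token|password|secret)$/iu;
// A "Bearer <token>" run inside free text.
const BEARER_TOKEN = /\bBearer\s+[A-Za-z0-9._~+\/=-]+/gu;

function redactDiagnostic(value: unknown): unknown {
  if (typeof value === "string") {
    return value.replace(BEARER_TOKEN, "Bearer <redacted>");
  }
  if (Array.isArray(value)) {
    return value.map((item) => redactDiagnostic(item));
  }
  if (value && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, entry] of Object.entries(value as Record<string, unknown>)) {
      // Drop credential-like keys entirely instead of masking their values.
      if (SENSITIVE_KEY.test(key)) {
        continue;
      }
      out[key] = redactDiagnostic(entry);
    }
    return out;
  }
  // Numbers, booleans, null, and undefined pass through unchanged.
  return value;
}

console.log(redactDiagnostic("failed with Bearer sk-secret"));
console.log(JSON.stringify(redactDiagnostic({ input: 1, authorization: "Bearer sk-secret" })));
```

Dropping the key rather than masking its value matches the test's expectation that `usage` becomes `{ input: 1 }`.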


@@ -53,16 +53,18 @@ function getWriter(filePath: string): PayloadLogWriter {
function formatError(error: unknown): string | undefined {
if (error instanceof Error) {
-return error.message;
+const redacted = sanitizeDiagnosticPayload(error.message);
+return typeof redacted === "string" ? redacted : error.message;
}
if (typeof error === "string") {
-return error;
+const redacted = sanitizeDiagnosticPayload(error);
+return typeof redacted === "string" ? redacted : error;
}
if (typeof error === "number" || typeof error === "boolean" || typeof error === "bigint") {
return String(error);
}
if (error && typeof error === "object") {
-return safeJsonStringify(error) ?? "unknown error";
+return safeJsonStringify(sanitizeDiagnosticPayload(error)) ?? "unknown error";
}
return undefined;
}
@@ -173,7 +175,7 @@ export function createAnthropicPayloadLogger(params: {
...base,
ts: new Date().toISOString(),
stage: "usage",
-usage,
+usage: sanitizeDiagnosticPayload(usage) as Record<string, unknown>,
error: errorMessage,
});
log.info("anthropic usage", {


@@ -53,10 +53,13 @@ function computeReplacements(
if (chunk.oldLines.length === 0) {
const insertionIndex =
-originalLines.length > 0 && originalLines[originalLines.length - 1] === ""
-? originalLines.length - 1
-: originalLines.length;
+chunk.changeContext && !chunk.isEndOfFile
+? lineIndex
+: originalLines.length > 0 && originalLines[originalLines.length - 1] === ""
+? originalLines.length - 1
+: originalLines.length;
replacements.push([insertionIndex, 0, chunk.newLines]);
+lineIndex = insertionIndex;
continue;
}
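
The change above makes a pure-insertion chunk land right after its `@@` context line instead of always at the end of the file, while keeping every index in original-file coordinates. The sketch below illustrates that idea in isolation. It is a simplified assumption-based model, not the project's `computeReplacements`: `InsertChunk`, `computeInsertions`, and `applyReplacements` are hypothetical names, and the context lookup is a plain exact-match `indexOf` rather than the real matching logic.

```typescript
// Hypothetical, simplified model of context-anchored insertions.
type InsertChunk = { changeContext?: string; isEndOfFile?: boolean; newLines: string[] };
type Replacement = [number, number, string[]]; // [start, deleteCount, newLines]

function computeInsertions(originalLines: string[], chunks: InsertChunk[]): Replacement[] {
  const replacements: Replacement[] = [];
  let lineIndex = 0; // search cursor, always in original-file coordinates
  for (const chunk of chunks) {
    let insertionIndex: number;
    if (chunk.changeContext && !chunk.isEndOfFile) {
      const found = originalLines.indexOf(chunk.changeContext, lineIndex);
      if (found === -1) {
        throw new Error(`Context not found: ${chunk.changeContext}`);
      }
      insertionIndex = found + 1; // insert directly after the context line
    } else {
      // No context: append, but stay before the trailing "" left by a final newline.
      const last = originalLines.length - 1;
      insertionIndex = last >= 0 && originalLines[last] === "" ? last : originalLines.length;
    }
    replacements.push([insertionIndex, 0, chunk.newLines]);
    lineIndex = insertionIndex;
  }
  return replacements;
}

function applyReplacements(originalLines: string[], replacements: Replacement[]): string[] {
  const result = [...originalLines];
  // Apply bottom-up so earlier original-coordinate indices remain valid.
  for (const [start, deleteCount, lines] of [...replacements].sort((a, b) => b[0] - a[0])) {
    result.splice(start, deleteCount, ...lines);
  }
  return result;
}

const original = "a\nb\nc\n".split("\n");
const patched = applyReplacements(
  original,
  computeInsertions(original, [
    { changeContext: "a", newLines: ["after-a"] },
    { changeContext: "b", newLines: ["after-b"] },
  ]),
);
console.log(JSON.stringify(patched.join("\n")));
```

Because the second chunk's index is computed against the untouched original lines, the first insertion cannot shift it. This is the property that the "keeps later insertion contexts in original file coordinates" test in the next file exercises.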


@@ -131,6 +131,57 @@ describe("applyPatch", () => {
expect(result.summary.modified).toEqual(["dest.txt"]);
});
it("updates in place when move target resolves to the source file", async () => {
const memory = createMemoryPatchSandbox({
"source.txt": "foo\nbar\n",
});
const patch = `*** Begin Patch
*** Update File: source.txt
*** Move to: ./source.txt
@@
foo
-bar
+baz
*** End Patch`;
const result = await applyPatch(patch, memory.options);
expect(memory.files.get("/sandbox/source.txt")).toBe("foo\nbaz\n");
expect(result.summary.modified).toEqual(["source.txt"]);
});
it("applies context-only insertions at the requested context", async () => {
const memory = createMemoryPatchSandbox({
"source.txt": "alpha\nanchor\nomega\n",
});
const patch = `*** Begin Patch
*** Update File: source.txt
@@ anchor
+inserted
*** End Patch`;
await applyPatch(patch, memory.options);
expect(memory.files.get("/sandbox/source.txt")).toBe("alpha\nanchor\ninserted\nomega\n");
});
it("keeps later insertion contexts in original file coordinates", async () => {
const memory = createMemoryPatchSandbox({
"source.txt": "a\nb\nc\n",
});
const patch = `*** Begin Patch
*** Update File: source.txt
@@ a
+after-a
@@ b
+after-b
*** End Patch`;
await applyPatch(patch, memory.options);
expect(memory.files.get("/sandbox/source.txt")).toBe("a\nafter-a\nb\nafter-b\nc\n");
});
it("supports end-of-file inserts", async () => {
const memory = createMemoryPatchSandbox({
"end.txt": "line1\n",


@@ -175,9 +175,21 @@ export async function applyPatch(
const moveTarget = await resolvePatchPath(hunk.movePath, options);
await assertPatchParentPath(hunk.movePath, options);
await ensureDir(moveTarget.resolved, fileOps);
-await fileOps.writeFile(moveTarget.resolved, applied);
-await fileOps.remove(target.resolved);
-recordSummary(summary, seen, "modified", moveTarget.display);
+const moveResolvesToSource =
+path.resolve(moveTarget.resolved) === path.resolve(target.resolved);
+await fileOps.writeFile(
+moveResolvesToSource ? target.resolved : moveTarget.resolved,
+applied,
+);
+if (!moveResolvesToSource) {
+await fileOps.remove(target.resolved);
+}
+recordSummary(
+summary,
+seen,
+"modified",
+moveResolvesToSource ? target.display : moveTarget.display,
+);
} else {
await fileOps.writeFile(target.resolved, applied);
recordSummary(summary, seen, "modified", target.display);


@@ -172,12 +172,17 @@ const transport = new StdioClientTransport({
});
const client = new Client({ name: "fake-claude", version: "1.0.0" });
await client.connect(transport);
-const tools = await client.listTools();
-if (!tools.tools.some((tool) => tool.name === "bundle_probe")) {
-throw new Error("bundle_probe tool not exposed");
-}
-const result = await client.callTool({ name: "bundle_probe", arguments: {} });
-await transport.close();
+const result = await (async () => {
+try {
+const tools = await client.listTools();
+if (!tools.tools.some((tool) => tool.name === "bundle_probe")) {
+throw new Error("bundle_probe tool not exposed");
+}
+return await client.callTool({ name: "bundle_probe", arguments: {} });
+} finally {
+await transport.close();
+}
+})();
const text = Array.isArray(result.content)
? result.content
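
The rewritten probe wraps its calls in `try`/`finally` so the transport is closed even when `listTools` or the `bundle_probe` check throws. A minimal generic sketch of that pattern follows; `Closable` and `withClosable` are hypothetical names introduced for illustration and are not part of the MCP SDK or of this repository.

```typescript
// Hypothetical helper illustrating the try/finally cleanup used in the hunk above.
type Closable = { close(): Promise<void> };

async function withClosable<R extends Closable, T>(
  resource: R,
  use: (resource: R) => Promise<T>,
): Promise<T> {
  try {
    // `return await` keeps the resource open until the work has settled.
    return await use(resource);
  } finally {
    // Runs on success and on failure, so the resource is never leaked.
    await resource.close();
  }
}
```

The `await` inside the `try` matters: returning the promise without awaiting it would let `finally` close the resource before the work has finished.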


@@ -363,6 +363,23 @@ describe("buildEmbeddedRunPayloads tool-error warnings", () => {
});
});
it("surfaces declined Codex native command errors for aborted empty turns", () => {
const payloads = buildPayloads({
assistantTexts: [],
lastToolError: {
toolName: "bash",
error: "codex native tool blocked",
mutatingAction: true,
},
runAborted: true,
});
expectSingleToolErrorPayload(payloads, {
title: "Bash",
absentDetail: "codex native tool blocked",
});
});
it("surfaces exec tool errors for cron sessions even when verbose mode is off", () => {
const payloads = buildPayloads({
lastToolError: {


@@ -171,45 +171,6 @@ describe("agentCliCommand", () => {
});
});
-it("rejects timeout values with junk suffixes", async () => {
-await withTempStore(async () => {
-await expect(
-agentCliCommand({ message: "hi", to: "+1555", timeout: "10wat" }, runtime),
-).rejects.toThrow(
-"Invalid --timeout. Use seconds as a non-negative integer, for example --timeout 600. Use --timeout 0 to disable the timeout.",
-);
-expect(callGateway).not.toHaveBeenCalled();
-expect(agentCommand).not.toHaveBeenCalled();
-});
-});
-it("rejects fractional timeout values", async () => {
-await withTempStore(async () => {
-await expect(
-agentCliCommand({ message: "hi", to: "+1555", timeout: "1.5" }, runtime),
-).rejects.toThrow(
-"Invalid --timeout. Use seconds as a non-negative integer, for example --timeout 600. Use --timeout 0 to disable the timeout.",
-);
-expect(callGateway).not.toHaveBeenCalled();
-expect(agentCommand).not.toHaveBeenCalled();
-});
-});
-it("rejects blank timeout values instead of disabling the timeout", async () => {
-await withTempStore(async () => {
-await expect(
-agentCliCommand({ message: "hi", to: "+1555", timeout: " " }, runtime),
-).rejects.toThrow(
-"Invalid --timeout. Use seconds as a non-negative integer, for example --timeout 600. Use --timeout 0 to disable the timeout.",
-);
-expect(callGateway).not.toHaveBeenCalled();
-expect(agentCommand).not.toHaveBeenCalled();
-});
-});
it("uses gateway by default", async () => {
await withTempStore(async () => {
mockGatewaySuccessReply();

@@ -71,20 +71,16 @@ function protectJsonStdout(opts: Pick<AgentCliOpts, "json">): void {
}
function parseTimeoutSeconds(opts: { cfg: OpenClawConfig; timeout?: string }) {
-  const raw = opts.timeout !== undefined ? opts.timeout.trim() : undefined;
-  if (raw !== undefined && !/^\d+$/.test(raw)) {
+  const raw =
+    opts.timeout !== undefined
+      ? Number.parseInt(opts.timeout, 10)
+      : (opts.cfg.agents?.defaults?.timeoutSeconds ?? 600);
+  if (Number.isNaN(raw) || raw < 0) {
     throw new Error(
       `Invalid --timeout. Use seconds as a non-negative integer, for example --timeout 600. Use --timeout 0 to disable the timeout.`,
     );
   }
-  const parsed =
-    raw !== undefined ? Number(raw) : (opts.cfg.agents?.defaults?.timeoutSeconds ?? 600);
-  if (!Number.isInteger(parsed) || parsed < 0) {
-    throw new Error(
-      `Invalid --timeout. Use seconds as a non-negative integer, for example --timeout 600. Use --timeout 0 to disable the timeout.`,
-    );
-  }
-  return parsed;
+  return raw;
}
function formatPayloadForLog(payload: {
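
The `parseTimeoutSeconds` hunk above swaps between a digits-only check (`/^\d+$/` on the trimmed string) and `Number.parseInt`. The sketch below contrasts the two on the inputs used by the `--timeout` tests in this diff. The helper names `parseTimeoutLenient`, `parseTimeoutStrict`, and `accepts` are hypothetical, and the error text is shortened.

```typescript
// Hypothetical helpers for illustration; none of these names appear in the diff.
const TIMEOUT_ERROR = "Invalid --timeout. Use seconds as a non-negative integer.";

// Lenient: Number.parseInt stops at the first non-digit, so "10wat" and "1.5" still parse.
function parseTimeoutLenient(input: string): number {
  const value = Number.parseInt(input, 10);
  if (Number.isNaN(value) || value < 0) {
    throw new Error(TIMEOUT_ERROR);
  }
  return value;
}

// Strict: the whole trimmed string must consist of decimal digits.
function parseTimeoutStrict(input: string): number {
  const trimmed = input.trim();
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(TIMEOUT_ERROR);
  }
  return Number(trimmed);
}

// Reports whether a parser accepts the input without throwing.
function accepts(parse: (input: string) => number, input: string): boolean {
  try {
    parse(input);
    return true;
  } catch {
    return false;
  }
}

for (const input of ["600", "0", "10wat", "1.5", " ", "-5"]) {
  console.log(
    JSON.stringify(input),
    "lenient:",
    accepts(parseTimeoutLenient, input),
    "strict:",
    accepts(parseTimeoutStrict, input),
  );
}
```

Under these assumptions the two parsers disagree only on `"10wat"` and `"1.5"`, which the lenient version reads as 10 and 1. Both reject the blank string and negative values.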

@@ -87,54 +87,6 @@ describe("channelsLogsCommand", () => {
expect(payload.lines.map((line) => line.message)).toEqual(["external sent"]);
});
it("rejects unknown channel filters instead of falling back to all logs", async () => {
await fs.writeFile(
logPath,
[
logLine({ module: "gateway/channels/external-chat/send", message: "external sent" }),
logLine({ module: "gateway/channels/slack/send", message: "slack sent" }),
].join("\n"),
);
await channelsLogsCommand({ channel: "typo", json: true }, runtime);
expect(runtime.error).toHaveBeenCalledWith(
'Unknown channel "typo" for logs. Run `openclaw channels list --all` to see configured and installable channels.',
);
expect(runtime.exit).toHaveBeenCalledWith(1);
expect(runtime.log).not.toHaveBeenCalled();
});
it("rejects invalid line limits instead of silently using the default", async () => {
await fs.writeFile(
logPath,
logLine({ module: "gateway/channels/slack/send", message: "slack sent" }),
);
await channelsLogsCommand({ channel: "slack", lines: "wat", json: true }, runtime);
expect(runtime.error).toHaveBeenCalledWith(
"Invalid --lines. Use a positive integer, for example --lines 200.",
);
expect(runtime.exit).toHaveBeenCalledWith(1);
expect(runtime.log).not.toHaveBeenCalled();
});
it("rejects fractional line limits instead of truncating", async () => {
await fs.writeFile(
logPath,
logLine({ module: "gateway/channels/slack/send", message: "slack sent" }),
);
await channelsLogsCommand({ channel: "slack", lines: "2.5", json: true }, runtime);
expect(runtime.error).toHaveBeenCalledWith(
"Invalid --lines. Use a positive integer, for example --lines 200.",
);
expect(runtime.exit).toHaveBeenCalledWith(1);
expect(runtime.log).not.toHaveBeenCalled();
});
it("falls back to the latest rolling log when the configured rolling file is missing", async () => {
const configuredFile = path.join(tempDir, "openclaw-2026-04-26.log");
const fallbackFile = path.join(tempDir, "openclaw-2026-04-25.log");

@@ -1,6 +1,5 @@
import fs from "node:fs/promises";
import { normalizeChannelId as normalizeBundledChannelId } from "../../channels/registry.js";
-import { formatUnknownChannelMessage } from "../../cli/error-format.js";
import { getResolvedLoggerSettings } from "../../logging.js";
import { resolveLogFile } from "../../logging/log-tail.js";
import { parseLogLine } from "../../logging/parse-log-line.js";
@@ -38,19 +37,7 @@ function parseChannelFilter(raw?: string) {
if (bundled) {
return bundled;
}
-  return listManifestChannelIds().has(trimmed) ? trimmed : null;
-}
-function parseLineLimit(raw: string | number | undefined): number | null {
-  if (raw === undefined) {
-    return DEFAULT_LIMIT;
-  }
-  const value = typeof raw === "string" ? raw.trim() : String(raw);
-  if (!/^\d+$/.test(value)) {
-    return null;
-  }
-  const parsed = Number(value);
-  return Number.isSafeInteger(parsed) && parsed > 0 ? parsed : null;
+  return listManifestChannelIds().has(trimmed) ? trimmed : "all";
}
function matchesChannel(line: NonNullable<LogLine>, channel: string) {
@@ -104,22 +91,11 @@ export async function channelsLogsCommand(
runtime: RuntimeEnv = defaultRuntime,
) {
const channel = parseChannelFilter(opts.channel);
-  if (!channel) {
-    runtime.error(
-      formatUnknownChannelMessage({
-        channel: opts.channel ?? "",
-        purpose: "logs",
-      }),
-    );
-    runtime.exit(1);
-    return;
-  }
-  const limit = parseLineLimit(opts.lines);
-  if (limit === null) {
-    runtime.error("Invalid --lines. Use a positive integer, for example --lines 200.");
-    runtime.exit(1);
-    return;
-  }
+  const limitRaw = typeof opts.lines === "string" ? Number(opts.lines) : opts.lines;
+  const limit =
+    typeof limitRaw === "number" && Number.isFinite(limitRaw) && limitRaw > 0
+      ? Math.floor(limitRaw)
+      : DEFAULT_LIMIT;
const file = await resolveLogFile(getResolvedLoggerSettings().file);
const rawLines = await readTailLines(file, limit * 4);

@@ -1,75 +0,0 @@
import { beforeEach, describe, expect, it, vi } from "vitest";
import { formatCliCommand } from "../cli/command-format.js";
import type { RuntimeEnv } from "../runtime.js";
import { CONFIGURE_WIZARD_SECTIONS } from "./configure.shared.js";
const mocks = vi.hoisted(() => ({
runConfigureWizard: vi.fn(async () => {}),
}));
vi.mock("./configure.wizard.js", () => ({
runConfigureWizard: mocks.runConfigureWizard,
}));
import { configureCommandFromSectionsArg } from "./configure.commands.js";
function makeRuntime(): RuntimeEnv {
return {
log: vi.fn(),
error: vi.fn(),
exit: vi.fn() as unknown as RuntimeEnv["exit"],
};
}
describe("configureCommandFromSectionsArg", () => {
beforeEach(() => {
vi.clearAllMocks();
});
it("runs the full configure wizard when no sections are provided", async () => {
const runtime = makeRuntime();
await configureCommandFromSectionsArg(undefined, runtime);
expect(mocks.runConfigureWizard).toHaveBeenCalledWith({ command: "configure" }, runtime);
expect(runtime.error).not.toHaveBeenCalled();
expect(runtime.exit).not.toHaveBeenCalled();
});
it("runs only the requested valid sections", async () => {
const runtime = makeRuntime();
await configureCommandFromSectionsArg(["gateway", "model"], runtime);
expect(mocks.runConfigureWizard).toHaveBeenCalledWith(
{ command: "configure", sections: ["gateway", "model"] },
runtime,
);
expect(runtime.error).not.toHaveBeenCalled();
expect(runtime.exit).not.toHaveBeenCalled();
});
it("rejects invalid-only section input instead of falling back to the full wizard", async () => {
const runtime = makeRuntime();
await configureCommandFromSectionsArg(["typo"], runtime);
expect(runtime.error).toHaveBeenCalledWith(
`Invalid --section: typo. Expected one of: ${CONFIGURE_WIZARD_SECTIONS.join(", ")}. Run ${formatCliCommand("openclaw configure")} without --section to use the full wizard.`,
);
expect(runtime.exit).toHaveBeenCalledWith(1);
expect(mocks.runConfigureWizard).not.toHaveBeenCalled();
});
it("rejects mixed valid and invalid section input without running a partial wizard", async () => {
const runtime = makeRuntime();
await configureCommandFromSectionsArg(["gateway", "bogus"], runtime);
expect(runtime.error).toHaveBeenCalledWith(
`Invalid --section: bogus. Expected one of: ${CONFIGURE_WIZARD_SECTIONS.join(", ")}. Run ${formatCliCommand("openclaw configure")} without --section to use the full wizard.`,
);
expect(runtime.exit).toHaveBeenCalledWith(1);
expect(mocks.runConfigureWizard).not.toHaveBeenCalled();
});
});

@@ -21,6 +21,11 @@ export async function configureCommandFromSectionsArg(
runtime: RuntimeEnv = defaultRuntime,
): Promise<void> {
const { sections, invalid } = parseConfigureWizardSections(rawSections);
+  if (sections.length === 0) {
+    await configureCommand(runtime);
+    return;
+  }
if (invalid.length > 0) {
runtime.error(
`Invalid --section: ${invalid.join(", ")}. Expected one of: ${CONFIGURE_WIZARD_SECTIONS.join(", ")}. Run ${formatCliCommand("openclaw configure")} without --section to use the full wizard.`,
@@ -29,10 +34,5 @@ export async function configureCommandFromSectionsArg(
return;
}
-  if (sections.length === 0) {
-    await configureCommand(runtime);
-    return;
-  }
await configureCommandWithSections(sections as never, runtime);
}
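
Per the hunk headers, the two hunks above move the `sections.length === 0` check ahead of the `invalid.length > 0` check. The sketch below shows why that order matters. It assumes the parser puts unknown names in `invalid` and leaves them out of `sections`; the section list, `parseSections`, `decideEmptyFirst`, and `decideInvalidFirst` are hypothetical names for illustration.

```typescript
// Hypothetical section names and parser, for illustration only.
const KNOWN_SECTIONS = ["gateway", "model"];

type ParsedSections = { sections: string[]; invalid: string[] };
type Outcome = "full-wizard" | "error" | "sections";

// Assumption: unknown names land in `invalid` and are left out of `sections`.
function parseSections(raw: string[] | undefined): ParsedSections {
  const sections: string[] = [];
  const invalid: string[] = [];
  for (const name of raw ?? []) {
    (KNOWN_SECTIONS.includes(name) ? sections : invalid).push(name);
  }
  return { sections, invalid };
}

// Empty check first: invalid-only input has no valid sections, so the full wizard runs.
function decideEmptyFirst({ sections, invalid }: ParsedSections): Outcome {
  if (sections.length === 0) return "full-wizard";
  if (invalid.length > 0) return "error";
  return "sections";
}

// Invalid check first: any unknown name is reported before anything runs.
function decideInvalidFirst({ sections, invalid }: ParsedSections): Outcome {
  if (invalid.length > 0) return "error";
  if (sections.length === 0) return "full-wizard";
  return "sections";
}

for (const raw of [undefined, ["gateway"], ["typo"], ["gateway", "bogus"]]) {
  const parsed = parseSections(raw);
  console.log(JSON.stringify(raw), decideEmptyFirst(parsed), decideInvalidFirst(parsed));
}
```

Under that assumption the two orders differ only for invalid-only input such as `["typo"]`, the case the deleted test "rejects invalid-only section input instead of falling back to the full wizard" covered.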

@@ -1,6 +1,10 @@
import { beforeEach, describe, expect, it, vi } from "vitest";
import type { OpenClawConfig } from "../config/config.js";
-import { noteChromeMcpBrowserReadiness } from "./doctor-browser.js";
+import {
+  detectLegacyClawdBrowserProfileResidue,
+  maybeArchiveLegacyClawdBrowserProfileResidue,
+  noteChromeMcpBrowserReadiness,
+} from "./doctor-browser.js";
const loadBundledPluginPublicSurfaceModuleSync = vi.hoisted(() => vi.fn());
@@ -44,6 +48,112 @@ describe("doctor browser facade", () => {
expect(noteFn).not.toHaveBeenCalled();
});
it("delegates legacy clawd browser profile detection to the browser facade surface", async () => {
const residue = {
legacyProfileDir: "/tmp/openclaw-home/browser/clawd",
legacyUserDataDir: "/tmp/openclaw-home/browser/clawd/user-data",
canonicalUserDataDir: "/tmp/openclaw-home/browser/openclaw/user-data",
};
const detect = vi.fn().mockReturnValue(residue);
loadBundledPluginPublicSurfaceModuleSync.mockReturnValue({
noteChromeMcpBrowserReadiness: vi.fn(),
detectLegacyClawdBrowserProfileResidue: detect,
});
const cfg: OpenClawConfig = {
browser: {
profiles: {
openclaw: { color: "#FF4500" },
},
},
};
const deps = {
configDir: "/tmp/openclaw-home",
pathExists: (targetPath: string) => targetPath === "/tmp/openclaw-home/browser/clawd",
};
await expect(detectLegacyClawdBrowserProfileResidue(cfg, deps)).resolves.toEqual(residue);
expect(loadBundledPluginPublicSurfaceModuleSync).toHaveBeenCalledWith({
dirName: "browser",
artifactBasename: "browser-doctor.js",
});
expect(detect).toHaveBeenCalledWith(cfg, deps);
});
it("delegates legacy clawd browser profile cleanup to the browser facade surface", async () => {
const cleanup = vi.fn().mockResolvedValue({ changes: ["archived"], warnings: [] });
loadBundledPluginPublicSurfaceModuleSync.mockReturnValue({
noteChromeMcpBrowserReadiness: vi.fn(),
maybeArchiveLegacyClawdBrowserProfileResidue: cleanup,
});
const cfg: OpenClawConfig = {
browser: {
profiles: {
openclaw: { color: "#FF4500" },
},
},
};
const deps = {
configDir: "/tmp/openclaw-home",
pathExists: (targetPath: string) => targetPath === "/tmp/openclaw-home/browser/clawd",
};
await expect(maybeArchiveLegacyClawdBrowserProfileResidue(cfg, deps)).resolves.toEqual({
changes: ["archived"],
warnings: [],
});
expect(loadBundledPluginPublicSurfaceModuleSync).toHaveBeenCalledWith({
dirName: "browser",
artifactBasename: "browser-doctor.js",
});
expect(cleanup).toHaveBeenCalledWith(cfg, deps);
});
it("warns when browser profile cleanup surface is unavailable", async () => {
loadBundledPluginPublicSurfaceModuleSync.mockImplementation(() => {
throw new Error("missing browser doctor facade");
});
await expect(
maybeArchiveLegacyClawdBrowserProfileResidue(
{},
{
configDir: "/tmp/openclaw-home",
pathExists: (targetPath: string) => targetPath === "/tmp/openclaw-home/browser/clawd",
},
),
).resolves.toEqual({
changes: [],
warnings: ["Browser profile cleanup is unavailable: missing browser doctor facade"],
});
});
it("skips loading the browser residue detection surface when legacy residue is absent", async () => {
await expect(
detectLegacyClawdBrowserProfileResidue(
{},
{
configDir: "/tmp/openclaw-home",
pathExists: () => false,
},
),
).resolves.toBeNull();
expect(loadBundledPluginPublicSurfaceModuleSync).not.toHaveBeenCalled();
});
it("skips loading the browser cleanup surface when legacy residue is absent", async () => {
await expect(
maybeArchiveLegacyClawdBrowserProfileResidue(
{},
{
configDir: "/tmp/openclaw-home",
pathExists: () => false,
},
),
).resolves.toEqual({ changes: [], warnings: [] });
expect(loadBundledPluginPublicSurfaceModuleSync).not.toHaveBeenCalled();
});
it("warns and no-ops when the browser doctor surface is unavailable", async () => {
loadBundledPluginPublicSurfaceModuleSync.mockImplementation(() => {
throw new Error("missing browser doctor facade");

@@ -1,6 +1,9 @@
+import fs from "node:fs";
+import path from "node:path";
 import type { OpenClawConfig } from "../config/types.openclaw.js";
 import { loadBundledPluginPublicSurfaceModuleSync } from "../plugin-sdk/facade-loader.js";
 import { note } from "../terminal/note.js";
+import { resolveConfigDir } from "../utils.js";
type BrowserDoctorDeps = {
platform?: NodeJS.Platform;
@@ -13,10 +16,33 @@ type BrowserDoctorDeps = {
) => { path: string } | null;
resolveChromeExecutable?: (platform: NodeJS.Platform) => { path: string } | null;
readVersion?: (executablePath: string) => string | null;
configDir?: string;
pathExists?: (targetPath: string) => boolean;
};
export type BrowserDoctorRepairDeps = {
env?: NodeJS.ProcessEnv;
configDir?: string;
pathExists?: (targetPath: string) => boolean;
movePathToTrash?: (targetPath: string) => Promise<string>;
};
export type LegacyClawdBrowserProfileResidue = {
legacyProfileDir: string;
legacyUserDataDir: string;
canonicalUserDataDir: string;
};
type BrowserDoctorSurface = {
noteChromeMcpBrowserReadiness: (cfg: OpenClawConfig, deps?: BrowserDoctorDeps) => Promise<void>;
detectLegacyClawdBrowserProfileResidue?: (
cfg: OpenClawConfig,
deps?: BrowserDoctorRepairDeps,
) => LegacyClawdBrowserProfileResidue | null;
maybeArchiveLegacyClawdBrowserProfileResidue?: (
cfg: OpenClawConfig,
deps?: BrowserDoctorRepairDeps,
) => Promise<{ changes: string[]; warnings: string[] }>;
};
function loadBrowserDoctorSurface(): BrowserDoctorSurface {
@@ -26,6 +52,18 @@ function loadBrowserDoctorSurface(): BrowserDoctorSurface {
});
}
function mayHaveLegacyClawdBrowserProfileResidue(deps?: BrowserDoctorRepairDeps): boolean {
const configDir = deps?.configDir ?? resolveConfigDir(deps?.env ?? process.env);
const legacyProfileDir = path.join(configDir, "browser", "clawd");
const legacyUserDataDir = path.join(legacyProfileDir, "user-data");
const pathExists = deps?.pathExists ?? fs.existsSync;
try {
return pathExists(legacyProfileDir) || pathExists(legacyUserDataDir);
} catch {
return true;
}
}
export async function noteChromeMcpBrowserReadiness(cfg: OpenClawConfig, deps?: BrowserDoctorDeps) {
try {
await loadBrowserDoctorSurface().noteChromeMcpBrowserReadiness(cfg, deps);
@@ -35,3 +73,39 @@ export async function noteChromeMcpBrowserReadiness(cfg: OpenClawConfig, deps?:
noteFn(`- Browser health check is unavailable: ${message}`, "Browser");
}
}
export async function detectLegacyClawdBrowserProfileResidue(
cfg: OpenClawConfig,
deps?: BrowserDoctorRepairDeps,
): Promise<LegacyClawdBrowserProfileResidue | null> {
if (!mayHaveLegacyClawdBrowserProfileResidue(deps)) {
return null;
}
const detect = loadBrowserDoctorSurface().detectLegacyClawdBrowserProfileResidue;
if (!detect) {
return null;
}
return detect(cfg, deps);
}
export async function maybeArchiveLegacyClawdBrowserProfileResidue(
cfg: OpenClawConfig,
deps?: BrowserDoctorRepairDeps,
): Promise<{ changes: string[]; warnings: string[] }> {
if (!mayHaveLegacyClawdBrowserProfileResidue(deps)) {
return { changes: [], warnings: [] };
}
try {
const repair = loadBrowserDoctorSurface().maybeArchiveLegacyClawdBrowserProfileResidue;
if (!repair) {
return { changes: [], warnings: [] };
}
return await repair(cfg, deps);
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
return {
changes: [],
warnings: [`Browser profile cleanup is unavailable: ${message}`],
};
}
}
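
The facade above runs a cheap filesystem precheck (`mayHaveLegacyClawdBrowserProfileResidue`) before loading the browser doctor surface, and treats a throwing probe as "maybe present". A reduced sketch of that gate follows, with an injectable `pathExists` so it runs without touching disk. `ProbeDeps`, `mayHaveLegacyResidue`, `loadDetector`, `detect`, and the `loads` counter are hypothetical names, and paths are joined with `/` for brevity where the original uses `path.join`.

```typescript
// Hypothetical types and names for illustration; not the repository's API.
type ProbeDeps = {
  configDir: string;
  pathExists: (targetPath: string) => boolean;
};

// Cheap precheck: look for either legacy directory before loading anything heavier.
function mayHaveLegacyResidue(deps: ProbeDeps): boolean {
  const legacyProfileDir = `${deps.configDir}/browser/clawd`;
  const legacyUserDataDir = `${legacyProfileDir}/user-data`;
  try {
    return deps.pathExists(legacyProfileDir) || deps.pathExists(legacyUserDataDir);
  } catch {
    // Fail open: if the probe itself throws, let the full detector decide.
    return true;
  }
}

// Counts how often the (pretend) expensive surface gets loaded.
let loads = 0;
function loadDetector(): (deps: ProbeDeps) => string | null {
  loads += 1;
  return (deps) => `${deps.configDir}/browser/clawd`;
}

// Only pays for loadDetector() when the precheck says residue may exist.
function detect(deps: ProbeDeps): string | null {
  if (!mayHaveLegacyResidue(deps)) {
    return null;
  }
  return loadDetector()(deps);
}

console.log(detect({ configDir: "/tmp/openclaw-home", pathExists: () => false }), loads);
console.log(detect({ configDir: "/tmp/openclaw-home", pathExists: () => true }), loads);
```

Returning `true` from the `catch` trades an occasional unnecessary load for never skipping cleanup just because the existence check failed.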

@@ -65,7 +65,7 @@ describe("runDoctorLintCli", () => {
expect(exitCode).toBe(0);
   expect(String(stdout.mock.calls[0]?.[0])).toBe(
-    "doctor --lint: ran 5 check(s), 0 finding(s)\n",
+    "doctor --lint: ran 6 check(s), 0 finding(s)\n",
);
expect(String(stdout.mock.calls[1]?.[0])).toBe(" no findings\n");
} finally {

@@ -9,6 +9,11 @@ vi.mock("./doctor-bootstrap-size.js", () => ({
}));
vi.mock("./doctor-browser.js", () => ({
+  detectLegacyClawdBrowserProfileResidue: vi.fn().mockResolvedValue(null),
+  maybeArchiveLegacyClawdBrowserProfileResidue: vi.fn().mockResolvedValue({
+    changes: [],
+    warnings: [],
+  }),
noteChromeMcpBrowserReadiness: vi.fn().mockResolvedValue(undefined),
}));

@@ -148,30 +148,4 @@ describe("models scan command", () => {
expect(mocks.scanOpenRouterModels).not.toHaveBeenCalled();
});
it("rejects fractional count options before scanning", async () => {
const runtime = createRuntime();
await expect(modelsScanCommand({ maxCandidates: "1.5" }, runtime)).rejects.toThrow(
"--max-candidates must be a positive integer",
);
await expect(modelsScanCommand({ concurrency: "2.5" }, runtime)).rejects.toThrow(
"--concurrency must be a positive integer",
);
expect(mocks.scanOpenRouterModels).not.toHaveBeenCalled();
});
it("rejects blank count options before scanning", async () => {
const runtime = createRuntime();
await expect(modelsScanCommand({ maxCandidates: "" }, runtime)).rejects.toThrow(
"--max-candidates must be a positive integer",
);
await expect(modelsScanCommand({ concurrency: "" }, runtime)).rejects.toThrow(
"--concurrency must be a positive integer",
);
expect(mocks.scanOpenRouterModels).not.toHaveBeenCalled();
});
});

@@ -153,21 +153,6 @@ function printScanTable(results: ModelScanResult[], runtime: RuntimeEnv) {
}
}
function parsePositiveIntegerOption(
raw: string | undefined,
fallback: number | undefined,
): number | undefined {
if (raw === undefined) {
return fallback;
}
const trimmed = raw.trim();
if (!/^\d+$/.test(trimmed)) {
return undefined;
}
const value = Number(trimmed);
return Number.isInteger(value) && value > 0 ? value : undefined;
}
export async function modelsScanCommand(
opts: {
minParams?: string;
@@ -193,17 +178,17 @@ export async function modelsScanCommand(
if (maxAgeDays !== undefined && (!Number.isFinite(maxAgeDays) || maxAgeDays < 0)) {
throw new Error("--max-age-days must be >= 0");
}
-  const maxCandidates = parsePositiveIntegerOption(opts.maxCandidates, 6);
-  if (maxCandidates === undefined) {
-    throw new Error("--max-candidates must be a positive integer");
+  const maxCandidates = opts.maxCandidates ? Number(opts.maxCandidates) : 6;
+  if (!Number.isFinite(maxCandidates) || maxCandidates <= 0) {
+    throw new Error("--max-candidates must be > 0");
   }
   const timeout = opts.timeout ? Number(opts.timeout) : undefined;
   if (timeout !== undefined && (!Number.isFinite(timeout) || timeout <= 0)) {
     throw new Error("--timeout must be > 0");
   }
-  const concurrency = parsePositiveIntegerOption(opts.concurrency, undefined);
-  if (opts.concurrency !== undefined && concurrency === undefined) {
-    throw new Error("--concurrency must be a positive integer");
+  const concurrency = opts.concurrency ? Number(opts.concurrency) : undefined;
+  if (concurrency !== undefined && (!Number.isFinite(concurrency) || concurrency <= 0)) {
+    throw new Error("--concurrency must be > 0");
}
const requestedProbe = opts.probe ?? true;

Some files were not shown because too many files have changed in this diff.