fix: mark approval gateway calls as runtime clients

fix: wrap Mac menu gateway errors
fix(telegram): repair desktop proof login
2026-06-13 01:31:48 +08:00 · 2026-05-17 21:24:42 -07:00 · 2026-05-18 05:21:19 +01:00 · 2026-05-18 09:49:21 +05:30 · 2026-05-18 05:19:02 +01:00 · 2026-05-18 04:53:40 +01:00
163 changed files with 4624 additions and 1293 deletions
--- a/.agents/skills/autoreview/SKILL.md
+++ b/.agents/skills/autoreview/SKILL.md
@@ -7,6 +7,8 @@ description: "Autoreview closeout: local dirty changes, PR branch vs main, paral

 Run Codex's built-in code review as a closeout check. This is code review (`codex review`), not Guardian `auto_review` approval routing.

+Codex native review mode performs best and is recommended. Non-Codex reviewers are fallback/second-opinion paths that receive a generated diff prompt, not the full Codex review-mode runtime.
+
 Use when:
 - user asks for Codex review / autoreview / second-model review
 - after non-trivial code edits, before final/commit/ship
@@ -21,7 +23,7 @@ Use when:
 - Prefer small fixes at the right ownership boundary; no refactor unless it clearly improves the bug class.
 - Keep going until the selected review path returns no accepted/actionable findings.
 - If a review-triggered fix changes code, rerun focused tests and rerun the review helper.
- Default to Codex review. If Codex is unavailable or exits with an error, the helper may fall back to `claude -p`; `pi -p` and `opencode run` are explicit reviewer/fallback options. The helper runs nested Codex review in yolo/full-access mode by default; use `--no-yolo` only when intentionally testing sandbox behavior.
+- Default to Codex review. If Codex is unavailable or exits with an error, the helper falls back to the first configured CLI from `claude -p`, `pi -p`, `opencode run`, `droid exec`, or `copilot`. Prefer Codex for final closeout because it uses native review mode; non-Codex reviewers use a Codex-inspired generated diff prompt. The helper runs nested Codex review in yolo/full-access mode by default; use `--no-yolo` only when intentionally testing sandbox behavior.
 - Stop as soon as the review command/helper exits 0 with no accepted/actionable findings. Do not run an extra direct `codex review` just to get a nicer "clean" line, a second opinion, or clearer closeout wording.
 - Treat the helper's successful exit plus absence of actionable findings as the clean review result, even if the underlying Codex CLI output is terse.
 - If rejecting a finding as intentional/not worth fixing, add a brief inline code comment only when it explains a real invariant or ownership decision that future reviewers should know.
@@ -107,8 +109,8 @@ The helper:
 - otherwise uses `origin/main` for non-main branches
 - use `--mode commit --commit <ref>` for already-committed work, especially clean `main` after landing
 - should be left in `--mode auto` or forced to `--mode branch` for PR/branch work; do not force `--mode local` after committing
- supports `--reviewer codex|claude|pi|opencode|auto`; `auto` runs Codex first
- supports `--fallback-reviewer claude|pi|opencode|none`; default is `claude`
+- supports `--reviewer codex|claude|pi|opencode|droid|copilot|auto`; `auto` means Codex first
+- supports `--fallback-reviewer auto|claude|pi|opencode|droid|copilot|none`; default is configured CLI fallback
 - falls back only when Codex is unavailable or exits nonzero, not when Codex reports findings
 - writes only to stdout unless `--output` or `AUTOREVIEW_OUTPUT` is set
 - supports `--dry-run`, `--parallel-tests`, and commit refs
--- a/.agents/skills/autoreview/scripts/autoreview
+++ b/.agents/skills/autoreview/scripts/autoreview
@@ -10,14 +10,16 @@ Options:
                              Target selection. Default: auto.
  --base REF                 Base ref for branch review. Default: PR base or origin/main.
  --commit REF               Commit ref for commit review. Default: HEAD.
-  --reviewer codex|claude|pi|opencode|auto
-                              Review engine. Default: auto (Codex, fallback reviewer on error).
-  --fallback-reviewer claude|pi|opencode|none
-                              Fallback when Codex is unavailable or exits nonzero. Default: claude.
+  --reviewer codex|claude|pi|opencode|droid|copilot|auto
+                              Review engine. Default: Codex with configured fallback on error.
+  --fallback-reviewer auto|claude|pi|opencode|droid|copilot|none
+                              Fallback when Codex is unavailable or exits nonzero. Default: auto.
  --codex-bin PATH           Codex binary. Default: codex.
  --claude-bin PATH          Claude binary. Default: claude.
  --pi-bin PATH              Pi binary. Default: pi.
  --opencode-bin PATH        OpenCode binary. Default: opencode.
+  --droid-bin PATH           Droid binary. Default: droid.
+  --copilot-bin PATH         GitHub Copilot binary. Default: copilot.
  --full-access              Keep yolo/full-access mode enabled. Default.
  --no-yolo                  Run nested Codex review with normal sandbox/approval prompts.
  --output FILE              Also save output to file.
@@ -37,11 +39,13 @@ mode=auto
 base_ref=
 commit_ref=HEAD
 reviewer=${AUTOREVIEW_REVIEWER:-${CODEX_REVIEW_REVIEWER:-auto}}
-fallback_reviewer=${AUTOREVIEW_FALLBACK_REVIEWER:-${CODEX_REVIEW_FALLBACK_REVIEWER:-claude}}
+fallback_reviewer=${AUTOREVIEW_FALLBACK_REVIEWER:-${CODEX_REVIEW_FALLBACK_REVIEWER:-auto}}
 codex_bin=${CODEX_BIN:-codex}
 claude_bin=${CLAUDE_BIN:-claude}
 pi_bin=${PI_BIN:-pi}
 opencode_bin=${OPENCODE_BIN:-opencode}
+droid_bin=${DROID_BIN:-droid}
+copilot_bin=${COPILOT_BIN:-copilot}
 codex_args=()
 yolo=${AUTOREVIEW_YOLO:-${CODEX_REVIEW_YOLO:-1}}
 output=${AUTOREVIEW_OUTPUT:-${CODEX_REVIEW_OUTPUT:-}}
@@ -86,6 +90,14 @@ while [[ $# -gt 0 ]]; do
      opencode_bin=${2:-}
      shift 2
      ;;
+    --droid-bin)
+      droid_bin=${2:-}
+      shift 2
+      ;;
+    --copilot-bin)
+      copilot_bin=${2:-}
+      shift 2
+      ;;
    --full-access)
      yolo=1
      shift
@@ -131,7 +143,7 @@ case "$mode" in
 esac

 case "$reviewer" in
-  auto|codex|claude|pi|opencode) ;;
+  auto|codex|claude|pi|opencode|droid|copilot) ;;
  *)
    echo "invalid --reviewer: $reviewer" >&2
    exit 2
@@ -139,7 +151,7 @@ case "$reviewer" in
 esac

 case "$fallback_reviewer" in
-  claude|pi|opencode|none) ;;
+  auto|claude|pi|opencode|droid|copilot|none) ;;
  *)
    echo "invalid --fallback-reviewer: $fallback_reviewer" >&2
    exit 2
@@ -194,10 +206,17 @@ printf 'branch: %s\n' "${current_branch:-detached}"
 if [[ -n "$pr_url" ]]; then
  printf 'pr: %s\n' "$pr_url"
 fi
-printf 'reviewer: %s\n' "$reviewer"
 if [[ "$reviewer" == auto ]]; then
-  printf 'fallback-reviewer: %s\n' "$fallback_reviewer"
+  printf 'reviewer: codex\n'
+else
+  printf 'reviewer: %s\n' "$reviewer"
 fi
+case "$reviewer" in
+  codex|auto) ;;
+  *)
+    printf 'note: Codex native review mode is the recommended and best-supported review path; %s uses a generated diff prompt.\n' "$reviewer"
+    ;;
+esac
 if [[ "$reviewer" == auto || "$reviewer" == codex ]]; then
  printf 'review:'
  printf ' %q' "${review_cmd[@]}"
@@ -284,10 +303,14 @@ Base: ${base_ref:-}
 Commit: ${commit_ref:-}

 Rules:
- Review only the diff below.
+- Review the proposed code change as a closeout reviewer.
+- Focus on the diff below. If your CLI exposes read-only repository tools, inspect surrounding code and tests to verify findings; never modify files.
 - Do not modify files.
- Prioritize correctness bugs, regressions, security issues, and missing tests.
- Ignore speculative edge cases and broad rewrites.
+- Report only discrete, actionable issues introduced by this change.
+- Prioritize correctness, regressions, security, data loss, performance cliffs, and missing tests that would catch a real bug.
+- Do not report pre-existing issues, speculative risks, broad rewrites, style nits, changelog gaps, or findings that depend on unstated assumptions.
+- Identify the concrete scenario where the issue appears, and keep the line reference as small as possible.
+- A finding should overlap changed code or clearly cite changed code as the cause.
 - For each accepted/actionable finding, use exactly this format:
  [P<0-3>] Short title
  File: path:line
@@ -302,8 +325,15 @@ EOF
  } > "$prompt_file" || return
 }

+reviewer_output_has_clean_marker() {
+  local path=$1
+  grep -Eq '^[^[:alnum:]]*autoreview clean: no accepted/actionable findings reported[[:space:]]*$' "$path"
+}
+
 run_prompt_reviewer() {
  local selected=$1
+  local copilot_prompt=
+  local prompt_bytes=0
  local reviewer_output
  local status=0

@@ -343,13 +373,46 @@ run_prompt_reviewer() {
        echo "fallback reviewer unavailable: $opencode_bin" >&2
        status=127
      elif printf 'fallback: opencode run\n' | tee -a "$review_output"; then
-        "$opencode_bin" run --pure --dir "$(dirname "$prompt_file")" --file "$prompt_file" \
-          "Review the attached prompt file. Do not modify files." 2>&1 | tee -a "$review_output" "$reviewer_output"
+        "$opencode_bin" run --pure --dir "$repo_root" \
+          "Review the attached prompt file. Do not modify files." \
+          --file "$prompt_file" 2>&1 | tee -a "$review_output" "$reviewer_output"
        status=$?
      else
        status=$?
      fi
      ;;
+    droid)
+      if ! command -v "$droid_bin" >/dev/null 2>&1; then
+        echo "fallback reviewer unavailable: $droid_bin" >&2
+        status=127
+      elif printf 'fallback: droid exec\n' | tee -a "$review_output"; then
+        "$droid_bin" exec --cwd "$repo_root" -f "$prompt_file" 2>&1 | tee -a "$review_output" "$reviewer_output"
+        status=$?
+      else
+        status=$?
+      fi
+      ;;
+    copilot)
+      if ! command -v "$copilot_bin" >/dev/null 2>&1; then
+        echo "fallback reviewer unavailable: $copilot_bin" >&2
+        status=127
+      elif printf 'fallback: copilot\n' | tee -a "$review_output"; then
+        prompt_bytes=$(wc -c < "$prompt_file" | tr -d '[:space:]')
+        if (( prompt_bytes > 120000 )); then
+          echo "copilot reviewer unavailable: generated prompt is too large for copilot -p; use codex, droid, or another file/stdin-capable reviewer" \
+            2>&1 | tee -a "$review_output" "$reviewer_output"
+          status=1
+        else
+          copilot_prompt=$(< "$prompt_file")
+          "$copilot_bin" -C "$repo_root" --available-tools=none --stream off --output-format text --silent \
+            -p "$copilot_prompt" \
+            2>&1 | tee -a "$review_output" "$reviewer_output"
+          status=$?
+        fi
+      else
+        status=$?
+      fi
+      ;;
    *)
      echo "unsupported prompt reviewer: $selected" >&2
      status=2
@@ -360,7 +423,7 @@ run_prompt_reviewer() {
      status=1
    elif ! grep -q '[^[:space:]]' "$reviewer_output"; then
      status=1
-    elif ! grep -Fxq 'autoreview clean: no accepted/actionable findings reported' "$reviewer_output"; then
+    elif ! reviewer_output_has_clean_marker "$reviewer_output"; then
      status=1
    fi
  fi
@@ -380,7 +443,7 @@ run_selected_review() {
      fi
      run_review
      ;;
-    claude|pi|opencode)
+    claude|pi|opencode|droid|copilot)
      run_prompt_reviewer "$selected"
      ;;
    *)
@@ -390,6 +453,36 @@ run_selected_review() {
  esac
 }

+fallback_reviewer_is_available() {
+  local selected=$1
+  case "$selected" in
+    claude) command -v "$claude_bin" >/dev/null 2>&1 ;;
+    pi) command -v "$pi_bin" >/dev/null 2>&1 ;;
+    opencode) command -v "$opencode_bin" >/dev/null 2>&1 ;;
+    droid) command -v "$droid_bin" >/dev/null 2>&1 ;;
+    copilot) command -v "$copilot_bin" >/dev/null 2>&1 ;;
+    *) return 1 ;;
+  esac
+}
+
+run_auto_fallback_review() {
+  local selected
+  if [[ "$fallback_reviewer" != auto ]]; then
+    run_selected_review "$fallback_reviewer"
+    return $?
+  fi
+
+  for selected in claude pi opencode droid copilot; do
+    if fallback_reviewer_is_available "$selected"; then
+      run_selected_review "$selected"
+      return $?
+    fi
+  done
+
+  echo "fallback reviewer unavailable: no configured fallback CLI found" >&2
+  return 127
+}
+
 run_auto_review() {
  run_selected_review codex
  local status=$?
@@ -405,8 +498,12 @@ run_auto_review() {
  if [[ "$fallback_reviewer" == none ]]; then
    return "$status"
  fi
-  printf 'autoreview warning: codex exited %s; falling back to %s\n' "$status" "$fallback_reviewer" >&2
-  run_selected_review "$fallback_reviewer"
+  if [[ "$fallback_reviewer" == auto ]]; then
+    printf 'autoreview warning: codex exited %s; trying configured fallback reviewers\n' "$status" >&2
+  else
+    printf 'autoreview warning: codex exited %s; falling back to %s\n' "$status" "$fallback_reviewer" >&2
+  fi
+  run_auto_fallback_review
 }

 elapsed_since() {
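
Reviewer note: the clean-marker check and the auto fallback selection added in this file can be tried in isolation. The following is a minimal sketch, not the helper itself. `reviewer_output_has_clean_marker` copies the pattern from the diff. `first_available_reviewer` is a hypothetical simplification that takes command names directly, whereas the real helper maps reviewer names to configurable `*_bin` variables.

```bash
#!/usr/bin/env bash
# Sketch only: mirrors the shape of two helpers from the diff above.

# Returns 0 when some line of the file is the clean marker, allowing
# leading non-alphanumeric decoration (such as "- " or "> ") and
# trailing whitespace. Any other trailing text makes the line not match.
reviewer_output_has_clean_marker() {
  local path=$1
  grep -Eq '^[^[:alnum:]]*autoreview clean: no accepted/actionable findings reported[[:space:]]*$' "$path"
}

# Hypothetical helper: prints the first candidate command found on PATH.
# Returns 127 when none of the candidates is available, matching the
# exit status the diff uses for "no configured fallback CLI found".
first_available_reviewer() {
  local candidate
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 127
}
```

Called as `first_available_reviewer claude pi opencode droid copilot`, the sketch follows the same order as the loop in `run_auto_fallback_review`, so the first installed CLI wins.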
--- a/.github/codex/prompts/mantis-telegram-desktop-proof.md
+++ b/.github/codex/prompts/mantis-telegram-desktop-proof.md
@@ -16,8 +16,11 @@ Hard limits:
 - Do not finish with tiny, cropped-wrong, off-bottom, or sidebar-heavy GIFs.
 - Do not invent a generic proof. The proof must match the PR behavior.
 - Do not force GIFs for internal-only, workflow-only, test-only, docs-only, or
-  otherwise non-visual PRs. A no-visual-proof manifest is a successful outcome
-  when GIFs would be misleading.
+  otherwise non-visual PRs. A no-visual-proof manifest is a successful workflow
+  outcome when GIFs would be misleading, but it is not proof that the PR passed.
+- Keep public-facing manifest summaries short and user-domain. Do not mention
+  harness internals, mock-provider limits, secret/trust boundaries, local paths,
+  transcript seeding, or workflow implementation details in the summary.

 Inputs are provided as environment variables:

@@ -42,9 +45,10 @@ Required workflow:
   before/after. If it does not, write
   `${MANTIS_OUTPUT_DIR}/mantis-evidence.json` with `comparison.pass: true`, no
   artifacts, and a summary that starts with
-   `Mantis did not generate before/after GIFs because`. Include the concrete
-   reason in the summary. Use this manifest shape and do not create worktrees
-   or start Crabbox for this case:
+   `Mantis did not generate before/after GIFs because`. Include a short
+   public reason, such as `the PR changes internal session bookkeeping rather
+than Telegram-visible behavior`. Use this manifest shape and do not create
+   worktrees or start Crabbox for this case:

   ```json
   {
@@ -73,6 +77,14 @@ Required workflow:
   }
   ```

+   If the PR appears visual but proof is blocked by Telegram Desktop session
+   state, authorization, credentials, Crabbox, or another capture-infrastructure
+   issue, do not describe it as a no-visual PR. Write a manifest with
+   `comparison.pass: false`, skipped lanes, no artifacts, and a summary that
+   starts with `Mantis could not capture Telegram Desktop proof because`. The
+   publisher will keep that out of PR comments so the failure stays in the
+   workflow logs and artifacts.
+
 4. Decide what Telegram message, mock model response, command, callback, button,
   media, or sequence best proves the PR. Use `MANTIS_INSTRUCTIONS` as extra
   maintainer guidance, not as a replacement for reading the PR.
@@ -134,4 +146,6 @@ Expected final state:
  `Main` and `This PR`.
 - No-visual-proof manifests contain no artifacts and have `comparison.pass:
 true`.
+- Capture-infrastructure failure manifests contain no artifacts and have
+  `comparison.pass: false`.
 - The worktree can be dirty only under `.artifacts/`.
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -20,6 +20,8 @@ on:
      - "docs/**"
  pull_request:
    types: [opened, reopened, synchronize, ready_for_review, converted_to_draft]
+    paths-ignore:
+      - "CHANGELOG.md"

 permissions:
  contents: read
@@ -641,6 +643,15 @@ jobs:
            echo "${name}-result=${results[$name]}" >> "$GITHUB_OUTPUT"
          done

+          failures=0
+          for name in channels core-support-boundary gateway-watch; do
+            if [ "${results[$name]}" = "failure" ]; then
+              echo "::error title=${name} failed::${name} failed"
+              failures=1
+            fi
+          done
+          exit "$failures"
+
      - name: Upload gateway watch regression artifacts
        if: always() && needs.preflight.outputs.run_check_additional == 'true'
        uses: actions/upload-artifact@v7
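
Reviewer note: the failure aggregation added to this step can be reproduced locally. The following is a minimal sketch under stated assumptions. `report_failures` is a hypothetical function, and it takes `name=result` pairs as arguments instead of reading the workflow's `results` array, so that it runs in any POSIX-style shell.

```bash
#!/usr/bin/env bash
# Sketch only: same comparison and annotation format as the diff,
# with results passed as "name=result" arguments (an assumption).

# Prints one GitHub Actions error annotation per failed name.
# Returns 1 if any result is exactly "failure", otherwise 0.
report_failures() {
  local failures=0 pair name result
  for pair in "$@"; do
    name=${pair%%=*}
    result=${pair#*=}
    if [ "$result" = "failure" ]; then
      echo "::error title=${name} failed::${name} failed"
      failures=1
    fi
  done
  return "$failures"
}
```

As in the workflow loop, only the literal value `failure` trips the exit status in this sketch. A result such as `skipped` or `cancelled` passes through without an annotation.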
@@ -828,28 +839,6 @@ jobs:
          EOF
          OPENCLAW_VITEST_INCLUDE_FILE="$include_file" pnpm test:contracts:plugins

-  checks-fast-plugin-contracts:
-    permissions:
-      contents: read
-    name: checks-fast-contracts-plugins
-    needs: [preflight, checks-fast-plugin-contracts-shard]
-    if: ${{ !cancelled() && always() && needs.preflight.outputs.run_plugin_contracts_shards == 'true' }}
-    runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
-    timeout-minutes: 5
-    steps:
-      - name: Verify plugin contract shards
-        env:
-          SHARD_RESULT: ${{ needs.checks-fast-plugin-contracts-shard.result }}
-        run: |
-          if [ "$SHARD_RESULT" = "cancelled" ]; then
-            echo "Plugin contract shards were cancelled, usually because a newer commit superseded this run." >&2
-            exit 1
-          fi
-          if [ "$SHARD_RESULT" != "success" ]; then
-            echo "Plugin contract shards failed: $SHARD_RESULT" >&2
-            exit 1
-          fi
-
  checks-fast-channel-contracts-shard:
    permissions:
      contents: read
@@ -934,28 +923,6 @@ jobs:
          EOF
          OPENCLAW_VITEST_INCLUDE_FILE="$include_file" pnpm test:contracts:channels

-  checks-fast-channel-contracts:
-    permissions:
-      contents: read
-    name: checks-fast-contracts-channels
-    needs: [preflight, checks-fast-channel-contracts-shard]
-    if: ${{ !cancelled() && always() && needs.preflight.outputs.run_checks_fast == 'true' }}
-    runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
-    timeout-minutes: 5
-    steps:
-      - name: Verify channel contract shards
-        env:
-          SHARD_RESULT: ${{ needs.checks-fast-channel-contracts-shard.result }}
-        run: |
-          if [ "$SHARD_RESULT" = "cancelled" ]; then
-            echo "Channel contract shards were cancelled, usually because a newer commit superseded this run." >&2
-            exit 1
-          fi
-          if [ "$SHARD_RESULT" != "success" ]; then
-            echo "Channel contract shards failed: $SHARD_RESULT" >&2
-            exit 1
-          fi
-
  checks-fast-protocol:
    permissions:
      contents: read
@@ -1021,38 +988,6 @@ jobs:
      - name: Run protocol check
        run: pnpm protocol:check

-  checks:
-    permissions:
-      contents: read
-    name: ${{ matrix.check_name }}
-    needs: [preflight, build-artifacts]
-    if: ${{ !cancelled() && always() && needs.preflight.outputs.run_checks == 'true' && needs.build-artifacts.result == 'success' }}
-    runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
-    timeout-minutes: 5
-    strategy:
-      fail-fast: false
-      matrix: ${{ fromJson(needs.preflight.outputs.checks_matrix) }}
-    steps:
-      - name: Verify ${{ matrix.task }} (${{ matrix.runtime }})
-        env:
-          TASK: ${{ matrix.task }}
-          CHANNELS_RESULT: ${{ needs.build-artifacts.outputs['channels-result'] }}
-        shell: bash
-        run: |
-          set -euo pipefail
-          case "$TASK" in
-            channels)
-              if [ "$CHANNELS_RESULT" != "success" ]; then
-                echo "Channel tests failed in build-artifacts: $CHANNELS_RESULT" >&2
-                exit 1
-              fi
-              ;;
-            *)
-              echo "Unsupported checks task: $TASK" >&2
-              exit 1
-              ;;
-          esac
-
  checks-node-compat:
    permissions:
      contents: read
@@ -1240,63 +1175,6 @@ jobs:
          }
          EOF

-  checks-node-core-test-dist-shard:
-    permissions:
-      contents: read
-    name: ${{ matrix.check_name }}
-    needs: [preflight, build-artifacts]
-    if: ${{ !cancelled() && always() && needs.preflight.outputs.run_checks_node_core_dist == 'true' && needs.build-artifacts.result == 'success' }}
-    runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
-    timeout-minutes: 5
-    strategy:
-      fail-fast: false
-      matrix: ${{ fromJson(needs.preflight.outputs.checks_node_core_dist_matrix) }}
-    steps:
-      - name: Verify Node test shard
-        env:
-          CORE_SUPPORT_BOUNDARY_RESULT: ${{ needs.build-artifacts.outputs['core-support-boundary-result'] }}
-          SHARD_NAME: ${{ matrix.shard_name }}
-        shell: bash
-        run: |
-          set -euo pipefail
-          case "$SHARD_NAME" in
-            core-support-boundary)
-              if [ "$CORE_SUPPORT_BOUNDARY_RESULT" != "success" ]; then
-                echo "Core support boundary shard failed in build-artifacts: $CORE_SUPPORT_BOUNDARY_RESULT" >&2
-                exit 1
-              fi
-              ;;
-            *)
-              echo "Unsupported built-artifact shard: $SHARD_NAME" >&2
-              exit 1
-              ;;
-          esac
-
-  checks-node-core-test:
-    permissions:
-      contents: read
-    name: checks-node-core
-    needs: [preflight, checks-node-core-test-nondist-shard, checks-node-core-test-dist-shard]
-    if: ${{ !cancelled() && always() && needs.preflight.outputs.run_checks == 'true' }}
-    runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
-    timeout-minutes: 5
-    steps:
-      - name: Verify node test shards
-        env:
-          DIST_SHARD_RESULT: ${{ needs.checks-node-core-test-dist-shard.result }}
-          NONDIST_SHARD_RESULT: ${{ needs.checks-node-core-test-nondist-shard.result }}
-          RUN_DIST_SHARDS: ${{ needs.preflight.outputs.run_checks_node_core_dist }}
-          RUN_NONDIST_SHARDS: ${{ needs.preflight.outputs.run_checks_node_core_nondist }}
-        run: |
-          if [ "$RUN_NONDIST_SHARDS" = "true" ] && [ "$NONDIST_SHARD_RESULT" != "success" ]; then
-            echo "Node non-dist test shards failed: $NONDIST_SHARD_RESULT" >&2
-            exit 1
-          fi
-          if [ "$RUN_DIST_SHARDS" = "true" ] && [ "$DIST_SHARD_RESULT" != "success" ]; then
-            echo "Node dist test shards failed: $DIST_SHARD_RESULT" >&2
-            exit 1
-          fi
-
  # Types, lint, and format check shards.
  check-shard:
    permissions:
@@ -1442,24 +1320,6 @@ jobs:
          path: .artifacts/deadcode
          if-no-files-found: ignore

-  check:
-    permissions:
-      contents: read
-    name: "check"
-    needs: [preflight, check-shard]
-    if: ${{ !cancelled() && always() && needs.preflight.outputs.run_check == 'true' }}
-    runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
-    timeout-minutes: 5
-    steps:
-      - name: Verify check shards
-        env:
-          SHARD_RESULT: ${{ needs.check-shard.result }}
-        run: |
-          if [ "$SHARD_RESULT" != "success" ]; then
-            echo "Check shards failed: $SHARD_RESULT" >&2
-            exit 1
-          fi
-
  check-additional-shard:
    permissions:
      contents: read
@@ -1637,52 +1497,6 @@ jobs:

          exit "$failures"

-  check-additional:
-    permissions:
-      contents: read
-    name: "check-additional"
-    needs: [preflight, check-additional-shard, build-artifacts]
-    if: ${{ !cancelled() && always() && needs.preflight.outputs.run_check_additional == 'true' }}
-    runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
-    timeout-minutes: 5
-    steps:
-      - name: Verify additional check shards
-        env:
-          SHARD_RESULT: ${{ needs.check-additional-shard.result }}
-          BUILD_ARTIFACTS_RESULT: ${{ needs.build-artifacts.result }}
-          GATEWAY_RESULT: ${{ needs.build-artifacts.outputs.gateway-watch-result }}
-        run: |
-          if [ "$SHARD_RESULT" != "success" ]; then
-            echo "Additional check shards failed: $SHARD_RESULT" >&2
-            exit 1
-          fi
-          if [ "$BUILD_ARTIFACTS_RESULT" != "success" ]; then
-            echo "Build artifact job failed: $BUILD_ARTIFACTS_RESULT" >&2
-            exit 1
-          fi
-          if [ "$GATEWAY_RESULT" != "success" ]; then
-            echo "Gateway topology check failed: $GATEWAY_RESULT" >&2
-            exit 1
-          fi
-
-  build-smoke:
-    permissions:
-      contents: read
-    name: "build-smoke"
-    needs: [preflight, build-artifacts]
-    if: ${{ !cancelled() && always() && needs.preflight.outputs.run_build_smoke == 'true' && (github.event_name != 'push' || needs.build-artifacts.result == 'success') }}
-    runs-on: ${{ github.event_name == 'workflow_dispatch' && 'ubuntu-24.04' || (github.repository == 'openclaw/openclaw' && 'blacksmith-4vcpu-ubuntu-2404' || 'ubuntu-24.04') }}
-    timeout-minutes: 5
-    steps:
-      - name: Verify build smoke
-        env:
-          BUILD_ARTIFACTS_RESULT: ${{ needs.build-artifacts.result }}
-        run: |
-          if [ "$BUILD_ARTIFACTS_RESULT" != "success" ]; then
-            echo "Build smoke checks failed in build-artifacts: $BUILD_ARTIFACTS_RESULT" >&2
-            exit 1
-          fi
-
  # Validate docs (format, lint, broken links) only when docs files changed.
  check-docs:
    permissions:
--- a/.github/workflows/docs.yml
+++ b/.github/workflows/docs.yml
@@ -6,6 +6,7 @@ on:
    paths:
      - "**/*.md"
      - "docs/**"
+      - "!CHANGELOG.md"

 permissions:
  contents: read
--- a/.github/workflows/openclaw-release-checks.yml
+++ b/.github/workflows/openclaw-release-checks.yml
@@ -955,6 +955,57 @@ jobs:
          retention-days: 14
          if-no-files-found: warn

+  runtime_tool_coverage_release_checks:
+    name: Enforce QA Lab runtime tool coverage
+    needs: [resolve_target, qa_lab_runtime_parity_release_checks]
+    if: always() && contains(fromJSON('["all","qa","qa-parity"]'), needs.resolve_target.outputs.rerun_group)
+    runs-on: ubuntu-24.04
+    timeout-minutes: 15
+    permissions:
+      contents: read
+      actions: read
+    env:
+      OPENCLAW_BUILD_PRIVATE_QA: "1"
+      OPENCLAW_ENABLE_PRIVATE_QA_CLI: "1"
+    steps:
+      - name: Checkout selected ref
+        uses: actions/checkout@v6
+        with:
+          persist-credentials: false
+          ref: ${{ needs.resolve_target.outputs.revision }}
+          fetch-depth: 1
+
+      - name: Setup Node environment
+        uses: ./.github/actions/setup-node-env
+        with:
+          node-version: ${{ env.NODE_VERSION }}
+          pnpm-version: ${{ env.PNPM_VERSION }}
+          install-bun: "true"
+
+      - name: Download runtime parity artifacts
+        uses: actions/download-artifact@v4
+        with:
+          name: release-qa-runtime-parity-${{ needs.resolve_target.outputs.revision }}
+          path: .artifacts/qa-e2e/
+
+      - name: Enforce standard runtime tool coverage
+        run: |
+          set -euo pipefail
+          pnpm openclaw qa coverage \
+            --repo-root . \
+            --tools \
+            --summary .artifacts/qa-e2e/runtime-parity-standard/qa-suite-summary.json \
+            --output .artifacts/qa-e2e/runtime-parity-standard-report/qa-runtime-tool-coverage-report.md
+
+      - name: Upload runtime tool coverage artifacts
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: release-qa-runtime-tool-coverage-${{ needs.resolve_target.outputs.revision }}
+          path: .artifacts/qa-e2e/runtime-parity-standard-report/
+          retention-days: 14
+          if-no-files-found: warn
+
  qa_live_matrix_release_checks:
    name: Run QA Lab live Matrix lane
    needs: [resolve_target]
@@ -1434,6 +1485,7 @@ jobs:
      - qa_lab_parity_lane_release_checks
      - qa_lab_parity_report_release_checks
      - qa_lab_runtime_parity_release_checks
+      - runtime_tool_coverage_release_checks
      - qa_live_matrix_release_checks
      - qa_live_telegram_release_checks
      - qa_live_discord_release_checks
@@ -1465,6 +1517,7 @@ jobs:
            "qa_lab_parity_lane_release_checks=${{ needs.qa_lab_parity_lane_release_checks.result }}" \
            "qa_lab_parity_report_release_checks=${{ needs.qa_lab_parity_report_release_checks.result }}" \
            "qa_lab_runtime_parity_release_checks=${{ needs.qa_lab_runtime_parity_release_checks.result }}" \
+            "runtime_tool_coverage_release_checks=${{ needs.runtime_tool_coverage_release_checks.result }}" \
            "qa_live_matrix_release_checks=${{ needs.qa_live_matrix_release_checks.result }}" \
            "qa_live_telegram_release_checks=${{ needs.qa_live_telegram_release_checks.result }}" \
            "qa_live_discord_release_checks=${{ needs.qa_live_discord_release_checks.result }}" \
--- a/.github/workflows/qa-live-transports-convex.yml
+++ b/.github/workflows/qa-live-transports-convex.yml
@@ -229,6 +229,96 @@ jobs:
          retention-days: 14
          if-no-files-found: warn

+  run_live_runtime_token_efficiency:
+    name: Run live runtime token-efficiency lane
+    needs: [authorize_actor, validate_selected_ref]
+    if: github.event_name == 'schedule'
+    runs-on: blacksmith-8vcpu-ubuntu-2404
+    timeout-minutes: 45
+    environment: qa-live-shared
+    env:
+      QA_PARITY_CONCURRENCY: "1"
+      OPENCLAW_QA_TRANSPORT_READY_TIMEOUT_MS: "180000"
+      OPENCLAW_QA_REDACT_PUBLIC_METADATA: "1"
+    steps:
+      - name: Checkout selected ref
+        uses: actions/checkout@v6
+        with:
+          persist-credentials: false
+          ref: ${{ needs.validate_selected_ref.outputs.selected_revision }}
+          fetch-depth: 1
+
+      - name: Setup Node environment
+        uses: ./.github/actions/setup-node-env
+        with:
+          node-version: ${{ env.NODE_VERSION }}
+          pnpm-version: ${{ env.PNPM_VERSION }}
+          install-bun: "true"
+
+      - name: Validate required QA credential env
+        env:
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+        shell: bash
+        run: |
+          set -euo pipefail
+
+          if [[ -z "${OPENAI_API_KEY:-}" ]]; then
+            echo "Missing required OPENAI_API_KEY." >&2
+            exit 1
+          fi
+
+      - name: Build private QA runtime
+        env:
+          NODE_OPTIONS: --max-old-space-size=8192
+        run: pnpm build
+
+      - name: Run live runtime parity lane
+        id: run_lane
+        shell: bash
+        env:
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+          OPENCLAW_LIVE_OPENAI_KEY: ${{ secrets.OPENAI_API_KEY }}
+        run: |
+          set -euo pipefail
+
+          output_dir=".artifacts/qa-e2e/runtime-token-efficiency-live-${GITHUB_RUN_ID}-${GITHUB_RUN_ATTEMPT}"
+          echo "output_dir=${output_dir}" >> "$GITHUB_OUTPUT"
+
+          pnpm openclaw qa suite \
+            --repo-root . \
+            --provider-mode live-frontier \
+            --runtime-parity-tier standard \
+            --runtime-parity-tier live-only \
+            --concurrency "${QA_PARITY_CONCURRENCY}" \
+            --model "${OPENCLAW_CI_OPENAI_MODEL}" \
+            --alt-model "${OPENCLAW_CI_OPENAI_MODEL}" \
+            --runtime-pair pi,codex \
+            --fast \
+            --allow-failures \
+            --output-dir "${output_dir}/runtime-suite"
+
+      - name: Generate live runtime token-efficiency report
+        if: always() && steps.run_lane.outcome != 'skipped' && steps.run_lane.outcome != 'cancelled'
+        shell: bash
+        run: |
+          set -euo pipefail
+
+          pnpm openclaw qa parity-report \
+            --repo-root . \
+            --runtime-axis \
+            --token-efficiency \
+            --summary "${{ steps.run_lane.outputs.output_dir }}/runtime-suite/qa-suite-summary.json" \
+            --output-dir "${{ steps.run_lane.outputs.output_dir }}/runtime-report"
+
+      - name: Upload live runtime token-efficiency artifacts
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: qa-live-runtime-token-efficiency-${{ github.run_id }}-${{ github.run_attempt }}
+          path: ${{ steps.run_lane.outputs.output_dir }}
+          retention-days: 14
+          if-no-files-found: warn
+
  run_live_matrix:
    name: Run Matrix live QA lane
    needs: [authorize_actor, validate_selected_ref]
--- a/.github/workflows/workflow-sanity.yml
+++ b/.github/workflows/workflow-sanity.yml
@@ -2,8 +2,12 @@ name: Workflow Sanity

 on:
  pull_request:
+    paths-ignore:
+      - "CHANGELOG.md"
  push:
    branches: [main]
+    paths-ignore:
+      - "CHANGELOG.md"
  workflow_dispatch:

 permissions:
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -24,15 +24,26 @@ Docs: https://docs.openclaw.ai
 - QA-Lab: add live-only harness self-health scenarios for plugin hook crashes, manifest contract errors, and WebChat direct-reply self-message routing. (#80323) Thanks @100yenadmin.
 - QA-Lab: add runtime tool fixture scenarios and coverage reporting for Codex-native workspace tools, OpenClaw dynamic tools, and optional plugin-backed tools. Fixes #80173. Thanks @100yenadmin.
 - QA-Lab: expose runtime tool fixture coverage through `openclaw qa coverage --tools`, with optional suite-summary evaluation for parity gate artifacts. Thanks @100yenadmin.
+- QA-Lab: schedule a live-frontier Codex-vs-Pi runtime token-efficiency artifact lane in the all-lanes QA workflow. Fixes #80175. Thanks @100yenadmin.
+- QA-Lab: hard-gate required OpenClaw dynamic runtime-tool drift in the standard Codex-vs-Pi tier with a blocking release-check verifier and publish the tool coverage report artifact. Fixes #80339; refs #80319. Thanks @100yenadmin.
 - QA-Lab: add the personal-agent approval-denial scenario so the benchmark pack verifies denied local reads stop cleanly without tool progress or fixture leaks. (#83150) Thanks @iFiras-Max1.

 ### Fixes

+- Gateway/skills: preflight remote macOS skill-bin refreshes with a WebSocket connectivity check so stale node sessions skip quickly instead of logging slow `system.which` timeout warnings.
+- GitHub Copilot: drop unsafe native Responses reasoning replay items with non-replayable IDs before dispatch, preventing affected Copilot sessions from failing with `invalid_request_body`. Fixes #83220. Thanks @galiniliev.
+- QA-Lab: make runtime tool coverage fail on missing required tool exercise instead of treating pass/pass parity envelope drift as missing coverage.
+- Core/plugins: harden clawpatch-reported edge cases across gateway auth cleanup, Claude session id paths, plugin activation policy, apply-patch hunk handling, diagnostic redaction, and plugin metadata validation.
+- Mac app: prefer explicit private/Tailscale/LAN Gateway endpoints over SSH tunnels, preserve legacy loopback tunnel configs, persist transport choices, and show captured SSH stderr when a tunnel actually fails.
+- Gateway/sessions: keep ACP/acpx and runtime child sessions visible in configured-only session lists when their owner or parent session belongs to a configured agent.
 - Mac app: keep app-level menu commands and Dashboard failure states reachable when the remote Gateway is disconnected, and keep the Settings sidebar toggle in the leading titlebar area.
+- Mac app: allow longer Gateway and Context errors to wrap in the menu instead of truncating the useful failure detail.
 - Gateway/webchat: hide internal runtime-context and other `display: false` transcript messages from Chat history and live message events. Fixes #83216. Thanks @EmpireCreator.
 - CLI/help: keep `gateway`, `doctor`, `status`, and `health` help registration out of action/runtime imports so subcommand `--help` stays lightweight in constrained terminals. Fixes #83228. Thanks @dfguerrerom.
 - Cron/Discord: keep explicit announce runs in message-tool-only source-reply mode so scheduled agent turns post once instead of also echoing through automatic visible replies. Fixes #83261. Thanks @Theralley.
 - Telegram: preserve forum-topic origin targets in inbound, audio-preflight, and skipped-message hook contexts so follow-up delivery stays bound to the originating topic. Fixes #83302. Thanks @M00zyx.
+- Telegram: retry HTTP 421 Misdirected Request send failures on a fresh fallback transport so transient edge-node routing errors no longer drop outbound replies. Fixes #48892. (#48908) Thanks @MarsDoge.
+- Telegram: fail topic sends closed when Telegram reports `message thread not found` instead of retrying without `message_thread_id` into the base chat. Refs #83302.
 - Mac app: align the Sessions settings pane with the standard Settings page gutter and row spacing.
 - OpenAI/Codex: stop rejecting available `openai-codex` GPT-5.1, GPT-5.2, and GPT-5.3 model refs during config validation, while keeping removed Spark aliases suppressed. Fixes #83303.
 - Plugins/xAI: complete OAuth-backed xAI login and sidecar auth fixes, including guarded loopback callback CORS handling, video generation polling/defaults, and native-host User-Agent attribution. (#83322) Thanks @Jaaneek.
@@ -46,6 +57,7 @@ Docs: https://docs.openclaw.ai
 - Agents/subagents: require the initial subagent registry save before reporting spawn accepted, returning a spawn error instead of losing an untracked run when the registry write fails. (#83146) Thanks @yetval.
 - QA-Lab/qa-channel: attach redacted agent tool-start traces to outbound `QaBusMessage` records so scenarios can assert actual tool use instead of relying only on reply text. Fixes #67637. Thanks @100yenadmin.
 - QA-Lab: fail live runtime parity reports when assistant-message usage is missing, preventing `0 vs 0` live token rows from being reported as passing proof. Fixes #80411. Thanks @100yenadmin.
+- QA-Lab: add a runtime token-efficiency sidecar report that classifies Codex savings separately from regressions and fails only positive Codex-over-Pi live token deltas above threshold. Fixes #81093. Thanks @100yenadmin.
 - QA-Lab: fail Codex-backed OpenAI live runtime-pair runs before launching isolated workers when no portable Codex auth is available, while staging API-key fallbacks and configured Codex keys for isolated QA agents. Fixes #80412. Thanks @100yenadmin.
 - QA-Lab: refresh parity gates, mock frontier fixtures, model scenarios, and workflow artifact lanes to compare GPT-5.5 against Claude Opus 4.7. Fixes #74262. Thanks @100yenadmin.
 - QA-Lab: make mock parity dispatch provider-aware for source discovery and subagent scenarios so OpenAI and Anthropic lanes no longer share identical canned plans. Fixes #64879. Thanks @100yenadmin.
@@ -81,12 +93,6 @@ Docs: https://docs.openclaw.ai
 - Agents/OpenAI: stop post-processing GPT-5 final replies with hardcoded brevity caps, preserving full channel responses instead of appending synthetic ellipses, and log when strict-agentic GPT-5 execution activates. Fixes #82910.
 - Mac app: refine the Settings General and Connection panes with cleaner status panels, card rows, and a single native titlebar sidebar toggle.
 - Agents/media: deliver failed async image, music, and video generation completions directly when requester-session completion handoff fails, so channel users see provider errors instead of silent fallback stalls.
- CLI/setup: reject invalid `openclaw configure --section` values before opening the full wizard and show config issue details when non-interactive setup is blocked by invalid config.
- CLI/channels: reject unknown `openclaw channels logs --channel` values and invalid `--lines` values instead of silently showing all/default logs.
- CLI/agent: reject `--timeout` values with junk suffixes or fractions instead of partially parsing them.
- CLI/sessions: reject `--active` values with junk suffixes instead of partially parsing them.
- CLI/models: reject fractional `models scan --max-candidates` and `--concurrency` values before starting a scan.
- Config: label root-level `${VAR}` substitution failures as `<root>` instead of printing a blank config path.
 - Agents/music: steer song, jingle, beat, anthem, and instrumental requests toward `music_generate` audio creation instead of lyric-only replies, and reserve `lyrics` for exact sung words.
 - Codex app-server: record native Codex tool calls and results into trajectory artifacts so debug/trajectory exports capture the full Codex-native tool history, not just OpenClaw-bridged turns. Thanks @vyctorbrzezowski.
 - Codex/app-server: keep bound conversation sessions on the owning agent runtime so native Codex control and follow-up turns do not fall back to the default agent client. Fixes #82954. (#82993)
--- a/apps/macos/Sources/OpenClaw/AppState.swift
+++ b/apps/macos/Sources/OpenClaw/AppState.swift
@@ -363,9 +363,11 @@ final class AppState {
        }

        let configRoot = OpenClawConfigFile.loadDict()
-        let configRemoteUrl = GatewayRemoteConfig.resolveUrlString(root: configRoot)
        let configRemoteToken = GatewayRemoteConfig.resolveTokenValue(root: configRoot)
-        let configRemoteTransport = GatewayRemoteConfig.resolveTransport(root: configRoot)
+        let configRemoteResolution = GatewayRemoteConfig.resolveTransportResolution(root: configRoot)
+        let configRemoteTransport = configRemoteResolution.transport
+        let configRemoteUrl = configRemoteResolution.directURL?.absoluteString
+            ?? GatewayRemoteConfig.resolveUrlString(root: configRoot)
        let resolvedConnectionMode = ConnectionModeResolver.resolve(root: configRoot).mode
        self.remoteTransport = configRemoteTransport
        self.connectionMode = resolvedConnectionMode
@@ -532,7 +534,10 @@ final class AppState {
            }

        case .ssh:
-            changed = Self.updateGatewayString(&remote, key: "transport", value: nil) || changed
+            changed = Self.updateGatewayString(
+                &remote,
+                key: "transport",
+                value: RemoteTransport.ssh.rawValue) || changed

            let sanitizedTarget = Self.sanitizeSSHTarget(draft.remoteTarget)
            let expectedRemoteHost = CommandResolver.parseSSHTarget(sanitizedTarget)?.host ?? draft.remoteHost
@@ -576,7 +581,8 @@ final class AppState {
        let hasRemoteUrl = !(remoteUrl?
            .trimmingCharacters(in: .whitespacesAndNewlines)
            .isEmpty ?? true)
-        let remoteTransport = GatewayRemoteConfig.resolveTransport(root: root)
+        let remoteResolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
+        let remoteTransport = remoteResolution.transport

        let desiredMode: ConnectionMode? = switch modeRaw {
        case "local":
@@ -600,7 +606,7 @@ final class AppState {
        if remoteTransport != self.remoteTransport {
            self.remoteTransport = remoteTransport
        }
-        let remoteUrlText = remoteUrl ?? ""
+        let remoteUrlText = remoteResolution.directURL?.absoluteString ?? remoteUrl ?? ""
        if remoteUrlText != self.remoteUrl {
            self.remoteUrl = remoteUrlText
        }
--- a/apps/macos/Sources/OpenClaw/ContextRootMenuLabelView.swift
+++ b/apps/macos/Sources/OpenClaw/ContextRootMenuLabelView.swift
@@ -23,7 +23,7 @@ struct ContextRootMenuLabelView: View {

                if self.usesStackedLayout {
                    self.subtitleText
-                        .lineLimit(3)
+                        .lineLimit(5)
                        .fixedSize(horizontal: false, vertical: true)
                }
            }
--- a/apps/macos/Sources/OpenClaw/ControlChannel.swift
+++ b/apps/macos/Sources/OpenClaw/ControlChannel.swift
@@ -265,9 +265,10 @@ final class ControlChannel {

    private static func isLikelyLocalNetworkPermissionBlock() -> Bool {
        let root = OpenClawConfigFile.loadDict()
+        let resolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
        guard ConnectionModeResolver.resolve(root: root).mode == .remote,
-              GatewayRemoteConfig.resolveTransport(root: root) == .direct,
-              let url = GatewayRemoteConfig.resolveGatewayUrl(root: root),
+              resolution.transport == .direct,
+              let url = resolution.directURL,
              url.scheme?.lowercased() == "ws",
              let host = url.host,
              GatewayRemoteConfig.isTrustedPlaintextRemoteHost(host),
--- a/apps/macos/Sources/OpenClaw/DashboardManager.swift
+++ b/apps/macos/Sources/OpenClaw/DashboardManager.swift
@@ -115,9 +115,10 @@ final class DashboardManager {

    private func immediateDashboardConfig(mode: AppState.ConnectionMode) -> GatewayConnection.Config? {
        let root = OpenClawConfigFile.loadDict()
+        let resolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
        if mode == .remote,
-           GatewayRemoteConfig.resolveTransport(root: root) == .direct,
-           let url = GatewayRemoteConfig.resolveGatewayUrl(root: root)
+           resolution.transport == .direct,
+           let url = resolution.directURL
        {
            return (
                url,
--- a/apps/macos/Sources/OpenClaw/GatewayDiscoveryHelpers.swift
+++ b/apps/macos/Sources/OpenClaw/GatewayDiscoveryHelpers.swift
@@ -41,21 +41,31 @@ enum GatewayDiscoveryHelpers {
    static func directUrl(for gateway: GatewayDiscoveryModel.DiscoveredGateway) -> String? {
        self.directGatewayUrl(
            serviceHost: gateway.serviceHost,
-            servicePort: gateway.servicePort)
+            servicePort: gateway.servicePort,
+            gatewayTls: gateway.gatewayTls)
    }

    static func directGatewayUrl(
        serviceHost: String?,
-        servicePort: Int?) -> String?
+        servicePort: Int?,
+        gatewayTls: Bool = false) -> String?
    {
        // Security: do not route using unauthenticated TXT hints (tailnetDns/lanHost/gatewayPort).
        // Prefer the resolved service endpoint (SRV + A/AAAA).
        guard let endpoint = self.serviceEndpoint(serviceHost: serviceHost, servicePort: servicePort) else {
            return nil
        }
-        // Security: for non-loopback hosts, force TLS to avoid plaintext credential/session leakage.
-        let scheme = self.isLoopbackHost(endpoint.host) ? "ws" : "wss"
-        let portSuffix = endpoint.port == 443 ? "" : ":\(endpoint.port)"
+        let scheme: String
+        if gatewayTls {
+            scheme = "wss"
+        } else if self.isLoopbackHost(endpoint.host)
+            || GatewayRemoteConfig.isTrustedPlaintextRemoteHost(endpoint.host)
+        {
+            scheme = "ws"
+        } else {
+            return nil
+        }
+        let portSuffix = scheme == "wss" && endpoint.port == 443 ? "" : ":\(endpoint.port)"
        return "\(scheme)://\(endpoint.host)\(portSuffix)"
    }

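Reviewer note: the scheme selection in `directGatewayUrl` above can be read as a three-way decision. The sketch below restates it in isolation. `isLoopbackHost` and `isTrustedPlaintextHost` here are hypothetical stand-ins (the real predicates are not part of this diff), and the `.ts.net` rule is only an assumption chosen to make the example runnable.

```swift
import Foundation

// Hypothetical stand-in for GatewayDiscoveryHelpers.isLoopbackHost.
func isLoopbackHost(_ host: String) -> Bool {
    ["localhost", "127.0.0.1", "::1"].contains(host.lowercased())
}

// Hypothetical stand-in for GatewayRemoteConfig.isTrustedPlaintextRemoteHost.
// Assumption for illustration only: Tailscale MagicDNS names count as trusted.
func isTrustedPlaintextHost(_ host: String) -> Bool {
    host.lowercased().hasSuffix(".ts.net")
}

// Mirrors the decision order in the diff:
// 1. TLS advertised         -> wss
// 2. loopback or trusted    -> ws
// 3. anything else          -> no direct URL at all
func directGatewayUrl(host: String, port: Int, gatewayTls: Bool = false) -> String? {
    let scheme: String
    if gatewayTls {
        scheme = "wss"
    } else if isLoopbackHost(host) || isTrustedPlaintextHost(host) {
        scheme = "ws"
    } else {
        return nil
    }
    // The port is elided only for wss on 443; a ws URL always keeps its port.
    let portSuffix = scheme == "wss" && port == 443 ? "" : ":\(port)"
    return "\(scheme)://\(host)\(portSuffix)"
}
```

The behavioral change worth noting is the last branch: a non-loopback, untrusted host without TLS previously got a forced `wss` URL and now yields `nil`, so callers must handle the absence of a direct URL.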
--- a/apps/macos/Sources/OpenClaw/GatewayDiscoverySelectionSupport.swift
+++ b/apps/macos/Sources/OpenClaw/GatewayDiscoverySelectionSupport.swift
@@ -25,14 +25,14 @@ enum GatewayDiscoverySelectionSupport {
        state.remoteTarget = GatewayDiscoveryHelpers.sshTarget(for: gateway) ?? ""

        if preferredTransport == .direct {
-            if let endpoint = GatewayDiscoveryHelpers.serviceEndpoint(for: gateway) {
-                OpenClawConfigFile.setRemoteGatewayUrl(
-                    host: endpoint.host,
-                    port: endpoint.port)
+            OpenClawConfigFile.setRemoteGatewayTransport(AppState.RemoteTransport.direct.rawValue)
+            if !state.remoteUrl.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty {
+                OpenClawConfigFile.setRemoteGatewayUrlString(state.remoteUrl)
            } else {
                OpenClawConfigFile.clearRemoteGatewayUrl()
            }
        } else {
+            OpenClawConfigFile.setRemoteGatewayTransport(AppState.RemoteTransport.ssh.rawValue)
            OpenClawConfigFile.setRemoteGatewayUrlString(state.remoteUrl)
        }
    }
@@ -65,9 +65,10 @@ enum GatewayDiscoverySelectionSupport {
        for gateway: GatewayDiscoveryModel.DiscoveredGateway) -> Bool
    {
        guard GatewayDiscoveryHelpers.directUrl(for: gateway) != nil else { return false }
-        if gateway.stableID.hasPrefix("tailscale-serve|") {
+        if gateway.gatewayTls || gateway.gatewayDirectReachable {
            return true
        }
+
        guard let host = GatewayDiscoveryHelpers.resolvedServiceHost(for: gateway)?
            .trimmingCharacters(in: .whitespacesAndNewlines)
            .lowercased()
--- a/apps/macos/Sources/OpenClaw/GatewayEndpointStore.swift
+++ b/apps/macos/Sources/OpenClaw/GatewayEndpointStore.swift
@@ -306,8 +306,9 @@ actor GatewayEndpointStore {
                password: password))
        case .remote:
            let root = OpenClawConfigFile.loadDict()
-            if GatewayRemoteConfig.resolveTransport(root: root) == .direct {
-                guard let url = GatewayRemoteConfig.resolveGatewayUrl(root: root) else {
+            let resolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
+            if resolution.transport == .direct {
+                guard let url = resolution.directURL else {
                    self.cancelRemoteEnsure()
                    self.setState(.unavailable(
                        mode: .remote,
@@ -470,8 +471,9 @@ actor GatewayEndpointStore {

    private func resolveDirectRemoteURL() throws -> URL? {
        let root = OpenClawConfigFile.loadDict()
-        guard GatewayRemoteConfig.resolveTransport(root: root) == .direct else { return nil }
-        guard let url = GatewayRemoteConfig.resolveGatewayUrl(root: root) else {
+        let resolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
+        guard resolution.transport == .direct else { return nil }
+        guard let url = resolution.directURL else {
            throw NSError(
                domain: "GatewayEndpoint",
                code: 1,
--- a/apps/macos/Sources/OpenClaw/GatewayRemoteConfig.swift
+++ b/apps/macos/Sources/OpenClaw/GatewayRemoteConfig.swift
@@ -5,6 +5,18 @@ import Darwin
 #endif

 enum GatewayRemoteConfig {
+    enum TransportSource: Equatable {
+        case explicit
+        case inferredRemoteURL
+        case legacySSH
+    }
+
+    struct TransportResolution: Equatable {
+        let transport: AppState.RemoteTransport
+        let source: TransportSource
+        let directURL: URL?
+    }
+
    enum TokenValue: Equatable {
        case missing
        case plaintext(String)
@@ -28,14 +40,49 @@ enum GatewayRemoteConfig {
    }

    static func resolveTransport(root: [String: Any]) -> AppState.RemoteTransport {
+        self.resolveTransportResolution(root: root).transport
+    }
+
+    static func resolveTransportResolution(root: [String: Any]) -> TransportResolution {
+        let explicit = self.resolveExplicitTransport(root: root)
+        switch explicit {
+        case .direct:
+            return TransportResolution(
+                transport: .direct,
+                source: .explicit,
+                directURL: self.resolveGatewayUrl(root: root))
+        case .ssh:
+            return TransportResolution(transport: .ssh, source: .explicit, directURL: nil)
+        case nil:
+            break
+        }
+
+        if let url = self.resolveGatewayUrl(root: root),
+           let host = url.host,
+           !LoopbackHost.isLoopbackHost(host)
+        {
+            return TransportResolution(transport: .direct, source: .inferredRemoteURL, directURL: url)
+        }
+
+        return TransportResolution(transport: .ssh, source: .legacySSH, directURL: nil)
+    }
+
+    private static func resolveExplicitTransport(root: [String: Any]) -> AppState.RemoteTransport? {
        guard let gateway = root["gateway"] as? [String: Any],
              let remote = gateway["remote"] as? [String: Any],
              let raw = remote["transport"] as? String
        else {
-            return .ssh
+            return nil
        }
        let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
-        return trimmed == AppState.RemoteTransport.direct.rawValue ? .direct : .ssh
+        switch trimmed {
+        case AppState.RemoteTransport.direct.rawValue:
+            return .direct
+        case AppState.RemoteTransport.ssh.rawValue:
+            return .ssh
+        default:
+            return .ssh
+        }
    }

    static func resolveUrlString(root: [String: Any]) -> String? {
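Reviewer note: the precedence implemented by `resolveTransportResolution` above is explicit transport first, then a non-loopback remote URL, then the legacy SSH default. The following is a minimal standalone sketch of that ordering, not the app's code. The `Transport`, `Source`, and `Resolution` types, the `isLoopback` helper, and the `gateway.remote.url` key are assumptions made for illustration; only the `gateway.remote.transport` key appears in the diff.

```swift
import Foundation

enum Transport: String { case direct, ssh }
enum Source { case explicit, inferredRemoteURL, legacySSH }

struct Resolution: Equatable {
    let transport: Transport
    let source: Source
    let directURL: URL?
}

// Hypothetical loopback check standing in for LoopbackHost.isLoopbackHost.
func isLoopback(_ host: String) -> Bool {
    ["localhost", "127.0.0.1", "::1"].contains(host.lowercased())
}

func resolve(root: [String: Any]) -> Resolution {
    let gateway = root["gateway"] as? [String: Any]
    let remote = gateway?["remote"] as? [String: Any]
    // Assumption: the remote URL is stored as a string under gateway.remote.url.
    let url = (remote?["url"] as? String).flatMap { URL(string: $0) }

    // 1. An explicit transport always wins; unknown values fall back to ssh.
    if let raw = remote?["transport"] as? String {
        let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
        if trimmed == Transport.direct.rawValue {
            return Resolution(transport: .direct, source: .explicit, directURL: url)
        }
        return Resolution(transport: .ssh, source: .explicit, directURL: nil)
    }

    // 2. No explicit transport: a non-loopback URL implies a direct connection.
    if let url = url, let host = url.host, !isLoopback(host) {
        return Resolution(transport: .direct, source: .inferredRemoteURL, directURL: url)
    }

    // 3. Otherwise keep the legacy behavior (loopback URL or no URL means SSH tunnel).
    return Resolution(transport: .ssh, source: .legacySSH, directURL: nil)
}
```

Under these assumptions, a config with only a loopback URL such as `ws://127.0.0.1:18789` still resolves to SSH, which appears to be how legacy tunnel configs are preserved.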
--- a/apps/macos/Sources/OpenClaw/MenuHeaderCard.swift
+++ b/apps/macos/Sources/OpenClaw/MenuHeaderCard.swift
@@ -38,7 +38,7 @@ struct MenuHeaderCard<Content: View>: View {
                    .font(.caption)
                    .foregroundStyle(.secondary)
                    .multilineTextAlignment(.leading)
-                    .lineLimit(3)
+                    .lineLimit(5)
                    .truncationMode(.tail)
                    .fixedSize(horizontal: false, vertical: true)
            }
--- a/apps/macos/Sources/OpenClaw/OpenClawConfigFile.swift
+++ b/apps/macos/Sources/OpenClaw/OpenClawConfigFile.swift
@@ -301,6 +301,16 @@ enum OpenClawConfigFile {
        }
    }

+    static func setRemoteGatewayTransport(_ value: String) {
+        let trimmed = value.trimmingCharacters(in: .whitespacesAndNewlines)
+        guard !trimmed.isEmpty else { return }
+        self.updateGatewayDict { gateway in
+            var remote = gateway["remote"] as? [String: Any] ?? [:]
+            remote["transport"] = trimmed
+            gateway["remote"] = remote
+        }
+    }
+
    static func clearRemoteGatewayUrl() {
        self.updateGatewayDict { gateway in
            guard var remote = gateway["remote"] as? [String: Any] else { return }
--- a/apps/macos/Sources/OpenClaw/RemotePortTunnel.swift
+++ b/apps/macos/Sources/OpenClaw/RemotePortTunnel.swift
@@ -16,6 +16,32 @@ final class RemotePortTunnel: @unchecked Sendable {
    let localPort: UInt16?
    private let stderrHandle: FileHandle?

+    private final class StderrCapture: @unchecked Sendable {
+        private let lock = NSLock()
+        private var text = ""
+        private let limit = 4096
+
+        func append(_ chunk: String) {
+            let trimmed = chunk.trimmingCharacters(in: .whitespacesAndNewlines)
+            guard !trimmed.isEmpty else { return }
+            self.lock.lock()
+            defer { self.lock.unlock() }
+            if !self.text.isEmpty {
+                self.text += "\n"
+            }
+            self.text += trimmed
+            if self.text.count > self.limit {
+                self.text = String(self.text.suffix(self.limit))
+            }
+        }
+
+        func snapshot() -> String {
+            self.lock.lock()
+            defer { self.lock.unlock() }
+            return self.text.trimmingCharacters(in: .whitespacesAndNewlines)
+        }
+    }
+
    private init(process: Process, localPort: UInt16?, stderrHandle: FileHandle?) {
        self.process = process
        self.localPort = localPort
@@ -93,6 +119,7 @@ final class RemotePortTunnel: @unchecked Sendable {
        let pipe = Pipe()
        process.standardError = pipe
        let stderrHandle = pipe.fileHandleForReading
+        let stderrCapture = StderrCapture()

        // Consume stderr so ssh cannot block if it logs.
        stderrHandle.readabilityHandler = { handle in
@@ -106,6 +133,7 @@ final class RemotePortTunnel: @unchecked Sendable {
                .trimmingCharacters(in: .whitespacesAndNewlines),
                !line.isEmpty
            else { return }
+            stderrCapture.append(line)
            Self.logger.error("ssh tunnel stderr: \(line, privacy: .public)")
        }
        process.terminationHandler = { _ in
@@ -114,7 +142,11 @@ final class RemotePortTunnel: @unchecked Sendable {

        try process.run()

-        try await Self.waitForListener(process: process, localPort: localPort, stderrHandle: stderrHandle)
+        try await Self.waitForListener(
+            process: process,
+            localPort: localPort,
+            stderrHandle: stderrHandle,
+            stderrCapture: stderrCapture)

        // Track tunnel so we can clean up stale listeners on restart.
        Task {
@@ -131,12 +163,13 @@ final class RemotePortTunnel: @unchecked Sendable {
    private static func waitForListener(
        process: Process,
        localPort: UInt16,
-        stderrHandle: FileHandle) async throws
+        stderrHandle: FileHandle,
+        stderrCapture: StderrCapture) async throws
    {
        let deadline = Date().addingTimeInterval(6)
        repeat {
            if !process.isRunning {
-                let stderr = Self.drainStderr(stderrHandle)
+                let stderr = Self.drainStderr(stderrHandle, captured: stderrCapture.snapshot())
                let msg = stderr.isEmpty ? "ssh tunnel exited before listening" : "ssh tunnel failed: \(stderr)"
                throw NSError(domain: "RemotePortTunnel", code: 4, userInfo: [NSLocalizedDescriptionKey: msg])
            }
@@ -152,7 +185,7 @@ final class RemotePortTunnel: @unchecked Sendable {
        } while Date() < deadline

        process.terminate()
-        let stderr = Self.drainStderr(stderrHandle)
+        let stderr = Self.drainStderr(stderrHandle, captured: stderrCapture.snapshot())
        let msg = stderr.isEmpty ? "ssh tunnel did not open local port \(localPort)" : "ssh tunnel failed: \(stderr)"
        throw NSError(domain: "RemotePortTunnel", code: 4, userInfo: [NSLocalizedDescriptionKey: msg])
    }
@@ -311,16 +344,27 @@ final class RemotePortTunnel: @unchecked Sendable {
    }

    private static func drainStderr(_ handle: FileHandle) -> String {
+        self.drainStderr(handle, captured: "")
+    }
+
+    private static func drainStderr(_ handle: FileHandle, captured: String) -> String {
        handle.readabilityHandler = nil
        defer { try? handle.close() }

        do {
            let data = try handle.readToEnd() ?? Data()
-            return String(data: data, encoding: .utf8)?
+            let remaining = String(data: data, encoding: .utf8)?
                .trimmingCharacters(in: .whitespacesAndNewlines) ?? ""
+            if captured.isEmpty {
+                return remaining
+            }
+            if remaining.isEmpty {
+                return captured
+            }
+            return captured + "\n" + remaining
        } catch {
            self.logger.debug("Failed to drain ssh stderr: \(error, privacy: .public)")
-            return ""
+            return captured
        }
    }

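Reviewer note: the `StderrCapture` helper and the two-argument `drainStderr` above combine a bounded, lock-protected text buffer with a merge of already-captured and still-unread output. The sketch below isolates those two pieces so the truncation and merge rules are easy to check. `BoundedTextCapture` and `mergeStderr` are hypothetical names, and the configurable `limit` is an assumption (the diff hardcodes 4096).

```swift
import Foundation

// Lock-protected buffer that keeps only the most recent `limit` characters.
final class BoundedTextCapture: @unchecked Sendable {
    private let lock = NSLock()
    private var text = ""
    private let limit: Int

    init(limit: Int) {
        self.limit = limit
    }

    func append(_ chunk: String) {
        let trimmed = chunk.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !trimmed.isEmpty else { return }
        self.lock.lock()
        defer { self.lock.unlock() }
        if !self.text.isEmpty {
            self.text += "\n"
        }
        self.text += trimmed
        // Drop the oldest characters once the buffer exceeds the limit.
        if self.text.count > self.limit {
            self.text = String(self.text.suffix(self.limit))
        }
    }

    func snapshot() -> String {
        self.lock.lock()
        defer { self.lock.unlock() }
        return self.text.trimmingCharacters(in: .whitespacesAndNewlines)
    }
}

// Same merge rule as the diff: prefer whichever side is non-empty,
// and join with a newline when both have content.
func mergeStderr(captured: String, remaining: String) -> String {
    if captured.isEmpty { return remaining }
    if remaining.isEmpty { return captured }
    return captured + "\n" + remaining
}
```

Keeping the tail rather than the head of the output is a reasonable choice here, since the last lines ssh prints before exiting usually carry the actual failure reason.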
--- a/apps/macos/Sources/OpenClaw/SettingsRootView.swift
+++ b/apps/macos/Sources/OpenClaw/SettingsRootView.swift
@@ -8,6 +8,7 @@ struct SettingsRootView: View {
    @State private var monitoringPermissions = false
    @State private var selectedTab: SettingsTab = .general
    @State private var cachedTabs: Set<SettingsTab>
+    @State private var columnVisibility: NavigationSplitViewVisibility = .all
    @State private var snapshotPaths: (configPath: String?, stateDir: String?) = (nil, nil)
    let updater: UpdaterProviding?
    private let isPreview = ProcessInfo.processInfo.isPreview
@@ -22,7 +23,7 @@ struct SettingsRootView: View {
    }

    var body: some View {
-        NavigationSplitView {
+        NavigationSplitView(columnVisibility: self.$columnVisibility) {
            List(selection: self.$selectedTab) {
                ForEach(self.visibleGroups) { group in
                    Section(group.title) {
@@ -46,19 +47,9 @@ struct SettingsRootView: View {
            .padding(.horizontal, 22)
            .padding(.vertical, 18)
        }
+        .navigationSplitViewStyle(.balanced)
        .frame(width: SettingsTab.windowWidth, height: SettingsTab.windowHeight, alignment: .topLeading)
        .frame(maxWidth: .infinity, maxHeight: .infinity, alignment: .topLeading)
-        .toolbar(removing: .sidebarToggle)
-        .toolbar {
-            ToolbarItem(placement: .navigation) {
-                Button {
-                    NSApp.sendAction(#selector(NSSplitViewController.toggleSidebar(_:)), to: nil, from: nil)
-                } label: {
-                    Image(systemName: "sidebar.left")
-                }
-                .help("Show or hide sidebar")
-            }
-        }
        .background(SettingsWindowChromeConfigurator())
        .onReceive(NotificationCenter.default.publisher(for: .openclawSelectSettingsTab)) { note in
            if let tab = note.object as? SettingsTab {
--- a/apps/macos/Sources/OpenClaw/SettingsSidebarScroll.swift
+++ b/apps/macos/Sources/OpenClaw/SettingsSidebarScroll.swift
@@ -10,5 +10,6 @@ struct SettingsSidebarScroll<Content: View>: View {
                .padding(.horizontal, 10)
        }
        .settingsSidebarCardLayout()
+        .padding(.leading, 16)
    }
 }
--- a/apps/macos/Sources/OpenClawDiscovery/GatewayDiscoveryModel.swift
+++ b/apps/macos/Sources/OpenClawDiscovery/GatewayDiscoveryModel.swift
@@ -30,6 +30,8 @@ public final class GatewayDiscoveryModel {
        public var tailnetDns: String?
        public var sshPort: Int
        public var gatewayPort: Int?
+        public var gatewayTls: Bool
+        public var gatewayDirectReachable: Bool
        public var cliPath: String?
        public var stableID: String
        public var debugID: String
@@ -43,6 +45,8 @@ public final class GatewayDiscoveryModel {
            tailnetDns: String? = nil,
            sshPort: Int,
            gatewayPort: Int? = nil,
+            gatewayTls: Bool = false,
+            gatewayDirectReachable: Bool = false,
            cliPath: String? = nil,
            stableID: String,
            debugID: String,
@@ -55,6 +59,8 @@ public final class GatewayDiscoveryModel {
            self.tailnetDns = tailnetDns
            self.sshPort = sshPort
            self.gatewayPort = gatewayPort
+            self.gatewayTls = gatewayTls
+            self.gatewayDirectReachable = gatewayDirectReachable
            self.cliPath = cliPath
            self.stableID = stableID
            self.debugID = debugID
@@ -184,6 +190,8 @@ public final class GatewayDiscoveryModel {
                tailnetDns: beacon.tailnetDns,
                sshPort: beacon.sshPort ?? 22,
                gatewayPort: beacon.gatewayPort,
+                gatewayTls: beacon.gatewayTls,
+                gatewayDirectReachable: beacon.gatewayDirectReachable,
                cliPath: beacon.cliPath,
                stableID: stableID,
                debugID: "\(beacon.instanceName)@\(beacon.host):\(beacon.port)",
@@ -210,6 +218,8 @@ public final class GatewayDiscoveryModel {
                tailnetDns: beacon.tailnetDns,
                sshPort: 22,
                gatewayPort: beacon.port,
+                gatewayTls: true,
+                gatewayDirectReachable: true,
                cliPath: nil,
                stableID: stableID,
                debugID: "\(beacon.host):\(beacon.port)",
@@ -282,6 +292,8 @@ public final class GatewayDiscoveryModel {
                tailnetDns: parsedTXT.tailnetDns,
                sshPort: parsedTXT.sshPort,
                gatewayPort: parsedTXT.gatewayPort,
+                gatewayTls: parsedTXT.gatewayTls,
+                gatewayDirectReachable: parsedTXT.gatewayDirectReachable,
                cliPath: parsedTXT.cliPath,
                stableID: stableID,
                debugID: GatewayEndpointID.prettyDescription(result.endpoint),
@@ -445,6 +457,8 @@ public final class GatewayDiscoveryModel {
        public var tailnetDns: String?
        public var sshPort: Int
        public var gatewayPort: Int?
+        public var gatewayTls: Bool
+        public var gatewayDirectReachable: Bool
        public var cliPath: String?
    }

@@ -453,6 +467,8 @@ public final class GatewayDiscoveryModel {
        var tailnetDns: String?
        var sshPort = 22
        var gatewayPort: Int?
+        var gatewayTls = false
+        var gatewayDirectReachable = false
        var cliPath: String?

        if let value = txt["lanHost"] {
@@ -475,6 +491,14 @@ public final class GatewayDiscoveryModel {
        {
            gatewayPort = parsed
        }
+        if let value = txt["gatewayTls"] {
+            let normalized = value.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
+            gatewayTls = normalized == "1" || normalized == "true" || normalized == "yes"
+        }
+        if let value = txt["gatewayDirectReachable"] {
+            let normalized = value.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
+            gatewayDirectReachable = normalized == "1" || normalized == "true" || normalized == "yes"
+        }
        if let value = txt["cliPath"] {
            let trimmed = value.trimmingCharacters(in: .whitespacesAndNewlines)
            cliPath = trimmed.isEmpty ? nil : trimmed
@@ -485,6 +509,8 @@ public final class GatewayDiscoveryModel {
            tailnetDns: tailnetDns,
            sshPort: sshPort,
            gatewayPort: gatewayPort,
+            gatewayTls: gatewayTls,
+            gatewayDirectReachable: gatewayDirectReachable,
            cliPath: cliPath)
    }

--- a/apps/macos/Sources/OpenClawDiscovery/WideAreaGatewayDiscovery.swift
+++ b/apps/macos/Sources/OpenClawDiscovery/WideAreaGatewayDiscovery.swift
@@ -9,6 +9,8 @@ struct WideAreaGatewayBeacon: Equatable {
    var lanHost: String?
    var tailnetDns: String?
    var gatewayPort: Int?
+    var gatewayTls: Bool
+    var gatewayDirectReachable: Bool
    var sshPort: Int?
    var cliPath: String?
 }
@@ -83,6 +85,8 @@ enum WideAreaGatewayDiscovery {
                lanHost: txt["lanHost"],
                tailnetDns: txt["tailnetDns"],
                gatewayPort: parseInt(txt["gatewayPort"]),
+                gatewayTls: parseBool(txt["gatewayTls"]),
+                gatewayDirectReachable: parseBool(txt["gatewayDirectReachable"]),
                sshPort: parseInt(txt["sshPort"]),
                cliPath: txt["cliPath"])
            beacons.append(beacon)
@@ -246,6 +250,12 @@ enum WideAreaGatewayDiscovery {
        return Int(trimmed)
    }

+    private static func parseBool(_ value: String?) -> Bool {
+        guard let value else { return false }
+        let normalized = value.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
+        return normalized == "1" || normalized == "true" || normalized == "yes"
+    }
+
    private static func isTailnetIPv4(_ value: String) -> Bool {
        let parts = value.split(separator: ".")
        if parts.count != 4 { return false }
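
Review note: the TXT boolean normalization added above appears twice inline in `GatewayDiscoveryModel` and once as `parseBool` here. A minimal sketch of one shared helper follows, assuming the three call sites are meant to accept the same truthy spellings. The name `parseTXTBool` is hypothetical and is not part of this change.

```swift
import Foundation

/// Hypothetical shared helper (not in the diff): treats "1", "true", and "yes"
/// as true after trimming whitespace and lowercasing; anything else, including
/// a missing key, is false.
func parseTXTBool(_ value: String?) -> Bool {
    guard let value else { return false }
    let normalized = value.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
    return ["1", "true", "yes"].contains(normalized)
}

// Example TXT record values, shaped like the ones in GatewayDiscoveryModelTests.
let txt = ["gatewayTls": " yes ", "gatewayDirectReachable": "0"]
print(parseTXTBool(txt["gatewayTls"]))              // true
print(parseTXTBool(txt["gatewayDirectReachable"]))  // false
print(parseTXTBool(txt["missing"]))                 // false
```

Defaulting to `false` keeps beacons that do not advertise the flags on the conservative path, which matches the `var gatewayTls = false` defaults in the diff.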
--- a/apps/macos/Sources/OpenClawMacCLI/DiscoverCommand.swift
+++ b/apps/macos/Sources/OpenClawMacCLI/DiscoverCommand.swift
@@ -41,6 +41,8 @@ struct DiscoveryOutput: Encodable {
        var tailnetDns: String?
        var sshPort: Int
        var gatewayPort: Int?
+        var gatewayTls: Bool
+        var gatewayDirectReachable: Bool
        var cliPath: String?
        var stableID: String
        var debugID: String
@@ -106,6 +108,8 @@ func runDiscover(_ args: [String]) async {
                    tailnetDns: $0.tailnetDns,
                    sshPort: $0.sshPort,
                    gatewayPort: $0.gatewayPort,
+                    gatewayTls: $0.gatewayTls,
+                    gatewayDirectReachable: $0.gatewayDirectReachable,
                    cliPath: $0.cliPath,
                    stableID: $0.stableID,
                    debugID: $0.debugID,
@@ -139,6 +143,8 @@ func runDiscover(_ args: [String]) async {
        if let port = gateway.gatewayPort {
            print("  gatewayPort: \(port)")
        }
+        print("  gatewayTls: \(gateway.gatewayTls)")
+        print("  gatewayDirectReachable: \(gateway.gatewayDirectReachable)")
        if let cliPath = gateway.cliPath {
            print("  cliPath: \(cliPath)")
        }
--- a/apps/macos/Tests/OpenClawIPCTests/AppStateRemoteConfigTests.swift
+++ b/apps/macos/Tests/OpenClawIPCTests/AppStateRemoteConfigTests.swift
@@ -51,7 +51,7 @@ struct AppStateRemoteConfigTests {
                remoteTokenDirty: false))

        #expect(remote["url"] as? String == "ws://127.0.0.1:18789")
-        #expect((remote["transport"] as? String) == nil)
+        #expect(remote["transport"] as? String == "ssh")
        #expect(remote["sshTarget"] as? String == "alice@gateway.example")
    }

@@ -161,6 +161,29 @@ struct AppStateRemoteConfigTests {
        }
    }

+    @Test
+    func `app state init preserves legacy SSH tunnel config until transport is explicit`() async {
+        let configPath = TestIsolation.tempConfigPath()
+        await TestIsolation.withIsolatedState(
+            env: ["OPENCLAW_CONFIG_PATH": configPath],
+            defaults: [remoteTargetKey: nil])
+        {
+            OpenClawConfigFile.saveDict([
+                "gateway": [
+                    "mode": "remote",
+                    "remote": [
+                        "url": "ws://127.0.0.1:18789",
+                        "sshTarget": "steipete@192.168.0.202",
+                    ],
+                ],
+            ])
+
+            let state = AppState(preview: true)
+            #expect(state.remoteTransport == .ssh)
+            #expect(state.remoteUrl == "ws://127.0.0.1:18789")
+        }
+    }
+
    @Test
    func `synced gateway root preserves object token across mode and transport changes when untouched`() {
        let initialRoot: [String: Any] = [
--- a/apps/macos/Tests/OpenClawIPCTests/GatewayDiscoveryHelpersTests.swift
+++ b/apps/macos/Tests/OpenClawIPCTests/GatewayDiscoveryHelpersTests.swift
@@ -10,7 +10,8 @@ struct GatewayDiscoveryHelpersTests {
        lanHost: String? = "txt-host.local",
        tailnetDns: String? = "txt-host.ts.net",
        sshPort: Int = 22,
-        gatewayPort: Int? = 18789) -> GatewayDiscoveryModel.DiscoveredGateway
+        gatewayPort: Int? = 18789,
+        gatewayTls: Bool = false) -> GatewayDiscoveryModel.DiscoveredGateway
    {
        GatewayDiscoveryModel.DiscoveredGateway(
            displayName: "Gateway",
@@ -20,6 +21,7 @@ struct GatewayDiscoveryHelpersTests {
            tailnetDns: tailnetDns,
            sshPort: sshPort,
            gatewayPort: gatewayPort,
+            gatewayTls: gatewayTls,
            cliPath: "/tmp/openclaw",
            stableID: UUID().uuidString,
            debugID: UUID().uuidString,
@@ -70,13 +72,14 @@ struct GatewayDiscoveryHelpersTests {
    @Test func `direct url uses resolved service endpoint only`() {
        let tlsGateway = self.makeGateway(
            serviceHost: "resolved.example.ts.net",
-            servicePort: 443)
+            servicePort: 443,
+            gatewayTls: true)
        #expect(GatewayDiscoveryHelpers.directUrl(for: tlsGateway) == "wss://resolved.example.ts.net")

        let wsGateway = self.makeGateway(
            serviceHost: "resolved.example.ts.net",
            servicePort: 18789)
-        #expect(GatewayDiscoveryHelpers.directUrl(for: wsGateway) == "wss://resolved.example.ts.net:18789")
+        #expect(GatewayDiscoveryHelpers.directUrl(for: wsGateway) == "ws://resolved.example.ts.net:18789")

        let localGateway = self.makeGateway(
            serviceHost: "127.0.0.1",
@@ -84,6 +87,15 @@ struct GatewayDiscoveryHelpersTests {
        #expect(GatewayDiscoveryHelpers.directUrl(for: localGateway) == "ws://127.0.0.1:18789")
    }

+    @Test func `direct url rejects public plaintext service endpoint`() {
+        let gateway = self.makeGateway(
+            serviceHost: "gateway.example",
+            servicePort: 18789,
+            gatewayTls: false)
+
+        #expect(GatewayDiscoveryHelpers.directUrl(for: gateway) == nil)
+    }
+
    @Test func `direct url rejects txt only fallback`() {
        let gateway = self.makeGateway(
            serviceHost: nil,
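
Review note: the expectations in this test file imply a scheme-selection rule for direct URLs. The sketch below reproduces only what the tests assert; `GatewaySketch`, `isPrivateHost`, and `directURLSketch` are hypothetical names, and the private-host check is a simplified assumption rather than the real `GatewayDiscoveryHelpers` logic, which is not shown in this diff.

```swift
import Foundation

// Hypothetical stand-in for the discovered-gateway fields the tests exercise.
struct GatewaySketch {
    var serviceHost: String?
    var servicePort: Int?
    var gatewayTls = false
}

// Assumption: a host is "private" when it is loopback, an mDNS name (.local),
// or a tailnet name (.ts.net). The real check may cover more cases.
func isPrivateHost(_ host: String) -> Bool {
    let h = host.lowercased()
    return h == "127.0.0.1" || h == "localhost" || h.hasSuffix(".local") || h.hasSuffix(".ts.net")
}

func directURLSketch(for gateway: GatewaySketch) -> String? {
    // Only a resolved service endpoint qualifies; TXT-only fallbacks yield nil.
    guard let host = gateway.serviceHost, let port = gateway.servicePort else { return nil }
    if gateway.gatewayTls {
        // TLS endpoints use wss; the default port 443 is omitted.
        return port == 443 ? "wss://\(host)" : "wss://\(host):\(port)"
    }
    // Plaintext ws is allowed only for private hosts; public hosts are rejected.
    guard isPrivateHost(host) else { return nil }
    return "ws://\(host):\(port)"
}

let publicPlain = GatewaySketch(serviceHost: "gateway.example", servicePort: 18789)
print(directURLSketch(for: publicPlain) ?? "nil")  // nil
```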
--- a/apps/macos/Tests/OpenClawIPCTests/GatewayDiscoveryModelTests.swift
+++ b/apps/macos/Tests/OpenClawIPCTests/GatewayDiscoveryModelTests.swift
@@ -87,12 +87,16 @@ struct GatewayDiscoveryModelTests {
            "tailnetDns": "  peters-mac-studio-1.ts.net  ",
            "sshPort": " 2222 ",
            "gatewayPort": " 18799 ",
+            "gatewayTls": " yes ",
+            "gatewayDirectReachable": " true ",
            "cliPath": " /opt/openclaw ",
        ])
        #expect(parsed.lanHost == "studio.local")
        #expect(parsed.tailnetDns == "peters-mac-studio-1.ts.net")
        #expect(parsed.sshPort == 2222)
        #expect(parsed.gatewayPort == 18799)
+        #expect(parsed.gatewayTls)
+        #expect(parsed.gatewayDirectReachable)
        #expect(parsed.cliPath == "/opt/openclaw")
    }

@@ -107,6 +111,8 @@ struct GatewayDiscoveryModelTests {
        #expect(parsed.tailnetDns == nil)
        #expect(parsed.sshPort == 22)
        #expect(parsed.gatewayPort == nil)
+        #expect(!parsed.gatewayTls)
+        #expect(!parsed.gatewayDirectReachable)
        #expect(parsed.cliPath == nil)
    }

--- a/apps/macos/Tests/OpenClawIPCTests/GatewayDiscoverySelectionSupportTests.swift
+++ b/apps/macos/Tests/OpenClawIPCTests/GatewayDiscoverySelectionSupportTests.swift
@@ -11,6 +11,8 @@ struct GatewayDiscoverySelectionSupportTests {
        servicePort: Int?,
        tailnetDns: String? = nil,
        sshPort: Int = 22,
+        gatewayTls: Bool = false,
+        gatewayDirectReachable: Bool = false,
        stableID: String) -> GatewayDiscoveryModel.DiscoveredGateway
    {
        GatewayDiscoveryModel.DiscoveredGateway(
@@ -21,6 +23,8 @@ struct GatewayDiscoverySelectionSupportTests {
            tailnetDns: tailnetDns,
            sshPort: sshPort,
            gatewayPort: servicePort,
+            gatewayTls: gatewayTls,
+            gatewayDirectReachable: gatewayDirectReachable,
            cliPath: nil,
            stableID: stableID,
            debugID: UUID().uuidString,
@@ -40,6 +44,7 @@ struct GatewayDiscoverySelectionSupportTests {
                    serviceHost: tailnetHost,
                    servicePort: 443,
                    tailnetDns: tailnetHost,
+                    gatewayTls: true,
                    stableID: "tailscale-serve|\(tailnetHost)"),
                state: state)

@@ -61,6 +66,7 @@ struct GatewayDiscoverySelectionSupportTests {
                    serviceHost: tailnetHost,
                    servicePort: 443,
                    tailnetDns: tailnetHost,
+                    gatewayTls: true,
                    stableID: "wide-area|openclaw.internal.|gateway-host"),
                state: state)

@@ -69,12 +75,33 @@ struct GatewayDiscoverySelectionSupportTests {
        }
    }

-    @Test func `selecting nearby lan gateway keeps ssh transport`() async {
+    @Test func `legacy tailnet discovery without reachability flags still switches to direct transport`() async {
+        let tailnetHost = "gateway-host.tailnet-example.ts.net"
+        let configPath = TestIsolation.tempConfigPath()
+        await TestIsolation.withEnvValues(["OPENCLAW_CONFIG_PATH": configPath]) {
+            let state = AppState(preview: true)
+            state.remoteTransport = .ssh
+
+            GatewayDiscoverySelectionSupport.applyRemoteSelection(
+                gateway: self.makeGateway(
+                    serviceHost: tailnetHost,
+                    servicePort: 18789,
+                    tailnetDns: tailnetHost,
+                    stableID: "wide-area|openclaw.internal.|gateway-host"),
+                state: state)
+
+            #expect(state.remoteTransport == .direct)
+            #expect(state.remoteUrl == "ws://\(tailnetHost):18789")
+        }
+    }
+
+    @Test func `selecting nearby lan gateway keeps ssh without direct reachability signal`() async {
        let configPath = TestIsolation.tempConfigPath()
        await TestIsolation.withEnvValues(["OPENCLAW_CONFIG_PATH": configPath]) {
            let state = AppState(preview: true)
            state.remoteTransport = .ssh
            state.remoteTarget = "user@old-host"
+            state.remoteUrl = "ws://localhost:29876"

            GatewayDiscoverySelectionSupport.applyRemoteSelection(
                gateway: self.makeGateway(
@@ -84,16 +111,17 @@ struct GatewayDiscoverySelectionSupportTests {
                state: state)

            #expect(state.remoteTransport == .ssh)
-            #expect(state.remoteUrl == "ws://127.0.0.1:18789")
+            #expect(state.remoteUrl == "ws://127.0.0.1:29876")
            #expect(CommandResolver.parseSSHTarget(state.remoteTarget)?.host == "nearby-gateway.local")

            let configRoot = OpenClawConfigFile.loadDict()
            let remote = ((configRoot["gateway"] as? [String: Any])?["remote"] as? [String: Any]) ?? [:]
-            #expect(remote["url"] as? String == "ws://127.0.0.1:18789")
+            #expect(remote["transport"] as? String == "ssh")
+            #expect(remote["url"] as? String == "ws://127.0.0.1:29876")
        }
    }

-    @Test func `selecting nearby lan gateway preserves existing ssh tunnel port`() async {
+    @Test func `selecting direct reachable lan gateway ignores stale local tunnel port`() async {
        let configPath = TestIsolation.tempConfigPath()
        await TestIsolation.withEnvValues(["OPENCLAW_CONFIG_PATH": configPath]) {
            let state = AppState(preview: true)
@@ -104,15 +132,17 @@ struct GatewayDiscoverySelectionSupportTests {
                gateway: self.makeGateway(
                    serviceHost: "nearby-gateway.local",
                    servicePort: 19999,
+                    gatewayDirectReachable: true,
                    stableID: "bonjour|nearby-gateway-custom"),
                state: state)

-            #expect(state.remoteTransport == .ssh)
-            #expect(state.remoteUrl == "ws://127.0.0.1:29876")
+            #expect(state.remoteTransport == .direct)
+            #expect(state.remoteUrl == "ws://nearby-gateway.local:19999")

            let configRoot = OpenClawConfigFile.loadDict()
            let remote = ((configRoot["gateway"] as? [String: Any])?["remote"] as? [String: Any]) ?? [:]
-            #expect(remote["url"] as? String == "ws://127.0.0.1:29876")
+            #expect(remote["transport"] as? String == "direct")
+            #expect(remote["url"] as? String == "ws://nearby-gateway.local:19999")
        }
    }
 }
--- a/apps/macos/Tests/OpenClawIPCTests/GatewayEndpointStoreTests.swift
+++ b/apps/macos/Tests/OpenClawIPCTests/GatewayEndpointStoreTests.swift
@@ -315,6 +315,54 @@ struct GatewayEndpointStoreTests {
        #expect(url?.absoluteString == "ws://100.123.224.76:18789")
    }

+    @Test func `missing transport infers direct from private remote URL`() {
+        let root: [String: Any] = [
+            "gateway": [
+                "remote": [
+                    "url": "ws://192.168.0.202:18789",
+                ],
+            ],
+        ]
+
+        let resolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
+        #expect(resolution.transport == .direct)
+        #expect(resolution.source == .inferredRemoteURL)
+        #expect(resolution.directURL?.absoluteString == "ws://192.168.0.202:18789")
+    }
+
+    @Test func `legacy loopback URL keeps SSH even with trusted SSH target`() {
+        let root: [String: Any] = [
+            "gateway": [
+                "remote": [
+                    "url": "ws://127.0.0.1:18789",
+                    "sshTarget": "steipete@192.168.0.202",
+                ],
+            ],
+        ]
+
+        let resolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
+        #expect(resolution.transport == .ssh)
+        #expect(resolution.source == .legacySSH)
+        #expect(resolution.directURL == nil)
+    }
+
+    @Test func `explicit ssh keeps legacy tunnel even when target is direct capable`() {
+        let root: [String: Any] = [
+            "gateway": [
+                "remote": [
+                    "transport": "ssh",
+                    "url": "ws://127.0.0.1:18789",
+                    "sshTarget": "steipete@192.168.0.202",
+                ],
+            ],
+        ]
+
+        let resolution = GatewayRemoteConfig.resolveTransportResolution(root: root)
+        #expect(resolution.transport == .ssh)
+        #expect(resolution.source == .explicit)
+        #expect(resolution.directURL == nil)
+    }
+
    @Test func `normalize gateway url rejects public host ws`() {
        let url = GatewayRemoteConfig.normalizeGatewayUrl("ws://gateway.example:18789")
        #expect(url == nil)
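
Review note: the three new tests above pin down a precedence order for transport resolution. The sketch below is one reading of that order, assuming an explicit `transport` always wins, a non-loopback URL without a transport infers direct, and everything else stays on the legacy SSH tunnel. All type and function names are hypothetical, and the sketch skips the public-host rejection that `normalizeGatewayUrl` performs.

```swift
import Foundation

// Hypothetical mirrors of the cases the tests assert; not the real GatewayRemoteConfig types.
enum TransportSketch { case ssh, direct }
enum SourceSketch { case explicit, legacySSH, inferredRemoteURL }

struct ResolutionSketch {
    var transport: TransportSketch
    var source: SourceSketch
    var directURL: URL?
}

func isLoopback(_ host: String) -> Bool {
    host == "127.0.0.1" || host == "localhost" || host == "::1"
}

func resolveTransportSketch(remote: [String: String]) -> ResolutionSketch {
    let url = remote["url"].flatMap { URL(string: $0) }

    // 1. An explicit transport wins, even if the SSH target looks direct-capable.
    //    Assumption: any value other than "direct" is treated as ssh.
    if let raw = remote["transport"] {
        let transport: TransportSketch = raw == "direct" ? .direct : .ssh
        return ResolutionSketch(
            transport: transport,
            source: .explicit,
            directURL: transport == .direct ? url : nil)
    }

    // 2. No transport and a non-loopback URL: infer direct from the URL.
    if let url, let host = url.host, !isLoopback(host) {
        return ResolutionSketch(transport: .direct, source: .inferredRemoteURL, directURL: url)
    }

    // 3. Otherwise keep the legacy SSH tunnel (loopback URL or no URL at all).
    return ResolutionSketch(transport: .ssh, source: .legacySSH, directURL: nil)
}

let legacy = resolveTransportSketch(remote: [
    "url": "ws://127.0.0.1:18789",
    "sshTarget": "steipete@192.168.0.202",
])
print(legacy.transport, legacy.source)  // ssh legacySSH
```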
--- a/docs/ci.md
+++ b/docs/ci.md
@@ -12,39 +12,39 @@ OpenClaw CI runs on every push to `main` and every pull request. The `preflight`

 ## Pipeline overview

-| Job                              | Purpose                                                                                                   | When it runs                       |
-| -------------------------------- | --------------------------------------------------------------------------------------------------------- | ---------------------------------- |
-| `preflight`                      | Detect docs-only changes, changed scopes, changed extensions, and build the CI manifest                   | Always on non-draft pushes and PRs |
-| `security-scm-fast`              | Private key detection and workflow audit via `zizmor`                                                     | Always on non-draft pushes and PRs |
-| `security-dependency-audit`      | Dependency-free production lockfile audit against npm advisories                                          | Always on non-draft pushes and PRs |
-| `security-fast`                  | Required aggregate for the fast security jobs                                                             | Always on non-draft pushes and PRs |
-| `check-dependencies`             | Production Knip dependency-only pass plus the unused-file allowlist guard                                 | Node-relevant changes              |
-| `build-artifacts`                | Build `dist/`, Control UI, built-artifact checks, and reusable downstream artifacts                       | Node-relevant changes              |
-| `checks-fast-core`               | Fast Linux correctness lanes such as bundled/plugin-contract/protocol checks                              | Node-relevant changes              |
-| `checks-fast-contracts-channels` | Sharded channel contract checks with a stable aggregate check result                                      | Node-relevant changes              |
-| `checks-node-core-test`          | Core Node test shards, excluding channel, bundled, contract, and extension lanes                          | Node-relevant changes              |
-| `check`                          | Sharded main local gate equivalent: prod types, lint, guards, test types, and strict smoke                | Node-relevant changes              |
-| `check-additional`               | Architecture, sharded boundary/prompt drift, extension guards, package boundary, and gateway watch        | Node-relevant changes              |
-| `build-smoke`                    | Built-CLI smoke tests and startup-memory smoke                                                            | Node-relevant changes              |
-| `checks`                         | Verifier for built-artifact channel tests                                                                 | Node-relevant changes              |
-| `checks-node-compat-node22`      | Node 22 compatibility build and smoke lane                                                                | Manual CI dispatch for releases    |
-| `check-docs`                     | Docs formatting, lint, and broken-link checks                                                             | Docs changed                       |
-| `skills-python`                  | Ruff + pytest for Python-backed skills                                                                    | Python-skill-relevant changes      |
-| `checks-windows`                 | Windows-specific process/path tests plus shared runtime import specifier regressions                      | Windows-relevant changes           |
-| `macos-node`                     | macOS TypeScript test lane using the shared built artifacts                                               | macOS-relevant changes             |
-| `macos-swift`                    | Swift lint, build, and tests for the macOS app                                                            | macOS-relevant changes             |
-| `android`                        | Android unit tests for both flavors plus one debug APK build                                              | Android-relevant changes           |
-| `test-performance-agent`         | Daily Codex slow-test optimization after trusted activity                                                 | Main CI success or manual dispatch |
-| `openclaw-performance`           | Daily/on-demand Kova runtime performance reports with mock-provider, deep-profile, and GPT 5.5 live lanes | Scheduled and manual dispatch      |
+| Job                                | Purpose                                                                                                   | When it runs                       |
+| ---------------------------------- | --------------------------------------------------------------------------------------------------------- | ---------------------------------- |
+| `preflight`                        | Detect docs-only changes, changed scopes, changed extensions, and build the CI manifest                   | Always on non-draft pushes and PRs |
+| `security-scm-fast`                | Private key detection and workflow audit via `zizmor`                                                     | Always on non-draft pushes and PRs |
+| `security-dependency-audit`        | Dependency-free production lockfile audit against npm advisories                                          | Always on non-draft pushes and PRs |
+| `security-fast`                    | Required aggregate for the fast security jobs                                                             | Always on non-draft pushes and PRs |
+| `check-dependencies`               | Production Knip dependency-only pass plus the unused-file allowlist guard                                 | Node-relevant changes              |
+| `build-artifacts`                  | Build `dist/`, Control UI, built-CLI smoke checks, embedded built-artifact checks, and reusable artifacts | Node-relevant changes              |
+| `checks-fast-core`                 | Fast Linux correctness lanes such as bundled and CI-routing checks                                        | Node-relevant changes              |
+| `checks-fast-protocol`             | Gateway protocol compatibility check                                                                      | Node-relevant changes              |
+| `checks-fast-contracts-plugins-*`  | Two sharded plugin contract checks                                                                        | Node-relevant changes              |
+| `checks-fast-contracts-channels-*` | Two sharded channel contract checks                                                                       | Node-relevant changes              |
+| `checks-node-core-*`               | Core Node test shards, excluding channel, bundled, contract, and extension lanes                          | Node-relevant changes              |
+| `check-*`                          | Sharded main local gate equivalent: prod types, lint, guards, test types, and strict smoke                | Node-relevant changes              |
+| `check-additional-*`               | Architecture, sharded boundary/prompt drift, extension guards, package boundary, and runtime topology     | Node-relevant changes              |
+| `checks-node-compat-node22`        | Node 22 compatibility build and smoke lane                                                                | Manual CI dispatch for releases    |
+| `check-docs`                       | Docs formatting, lint, and broken-link checks                                                             | Docs changed                       |
+| `skills-python`                    | Ruff + pytest for Python-backed skills                                                                    | Python-skill-relevant changes      |
+| `checks-windows`                   | Windows-specific process/path tests plus shared runtime import specifier regressions                      | Windows-relevant changes           |
+| `macos-node`                       | macOS TypeScript test lane using the shared built artifacts                                               | macOS-relevant changes             |
+| `macos-swift`                      | Swift lint, build, and tests for the macOS app                                                            | macOS-relevant changes             |
+| `android`                          | Android unit tests for both flavors plus one debug APK build                                              | Android-relevant changes           |
+| `test-performance-agent`           | Daily Codex slow-test optimization after trusted activity                                                 | Main CI success or manual dispatch |
+| `openclaw-performance`             | Daily/on-demand Kova runtime performance reports with mock-provider, deep-profile, and GPT 5.5 live lanes | Scheduled and manual dispatch      |

 ## Fail-fast order

 1. `preflight` decides which lanes exist at all. The `docs-scope` and `changed-scope` logic are steps inside this job, not standalone jobs.
-2. `security-scm-fast`, `security-dependency-audit`, `security-fast`, `check`, `check-additional`, `check-docs`, and `skills-python` fail quickly without waiting on the heavier artifact and platform matrix jobs.
+2. `security-scm-fast`, `security-dependency-audit`, `security-fast`, `check-*`, `check-additional-*`, `check-docs`, and `skills-python` fail quickly without waiting on the heavier artifact and platform matrix jobs.
 3. `build-artifacts` overlaps with the fast Linux lanes so downstream consumers can start as soon as the shared build is ready.
-4. Heavier platform and runtime lanes fan out after that: `checks-fast-core`, `checks-fast-contracts-channels`, `checks-node-core-test`, `checks`, `checks-windows`, `macos-node`, `macos-swift`, and `android`.
+4. Heavier platform and runtime lanes fan out after that: `checks-fast-core`, `checks-fast-contracts-plugins-*`, `checks-fast-contracts-channels-*`, `checks-node-core-*`, `checks-windows`, `macos-node`, `macos-swift`, and `android`.

-GitHub may mark superseded jobs as `cancelled` when a newer push lands on the same PR or `main` ref. Treat that as CI noise unless the newest run for the same ref is also failing. Aggregate shard checks use `!cancelled() && always()` so they still report normal shard failures but do not queue after the whole workflow has already been superseded. The automatic CI concurrency key is versioned (`CI-v7-*`) so a GitHub-side zombie in an old queue group cannot indefinitely block newer main runs. Manual full-suite runs use `CI-manual-v1-*` and do not cancel in-progress runs.
+GitHub may mark superseded jobs as `cancelled` when a newer push lands on the same PR or `main` ref. Treat that as CI noise unless the newest run for the same ref is also failing. Matrix jobs use `fail-fast: false`, and `build-artifacts` reports embedded channel, core-support-boundary, and gateway-watch failures directly instead of queuing tiny verifier jobs. The automatic CI concurrency key is versioned (`CI-v7-*`) so a GitHub-side zombie in an old queue group cannot indefinitely block newer main runs. Manual full-suite runs use `CI-manual-v1-*` and do not cancel in-progress runs.

 The `ci-timings-summary` job uploads a compact `ci-timings-summary` artifact for each non-draft CI run. It records wall time, queue time, slowest jobs, and failed jobs for the current run, so CI health checks do not need to scrape the full Actions payload repeatedly.

@@ -56,7 +56,7 @@ Scope logic lives in `scripts/ci-changed-scope.mjs` and is covered by unit tests
 - **CI routing-only edits, selected cheap core-test fixture edits, and narrow plugin contract helper/test-routing edits** use a fast Node-only manifest path: `preflight`, security, and a single `checks-fast-core` task. That path skips build artifacts, Node 22 compatibility, channel contracts, full core shards, bundled-plugin shards, and additional guard matrices when the change is limited to the routing or helper surfaces the fast task exercises directly.
 - **Windows Node checks** are scoped to Windows-specific process/path wrappers, npm/pnpm/UI runner helpers, package manager config, and the CI workflow surfaces that execute that lane; unrelated source, plugin, install-smoke, and test-only changes stay on the Linux Node lanes.

-The slowest Node test families are split or balanced so each job stays small without over-reserving runners: channel contracts run as three weighted Blacksmith-backed shards with the standard GitHub runner fallback, core unit fast/support lanes run separately, core runtime infra is split between state, process/config, cron, and shared shards, auto-reply runs as balanced workers (with the reply subtree split into agent-runner, dispatch, and commands/state-routing shards), and agentic gateway/server configs are split across chat/auth/model/http-plugin/runtime/startup lanes instead of waiting on built artifacts. Broad browser, QA, media, and miscellaneous plugin tests use their dedicated Vitest configs instead of the shared plugin catch-all. Include-pattern shards record timing entries using the CI shard name, so `.artifacts/vitest-shard-timings.json` can distinguish a whole config from a filtered shard. `check-additional` keeps package-boundary compile/canary work together and separates runtime topology architecture from gateway watch coverage; the boundary guard list is striped across four matrix shards, each running selected independent guards concurrently and printing per-check timings. The expensive Codex happy-path prompt snapshot drift check runs as its own additional job for manual CI and for prompt-affecting changes only, so normal unrelated Node changes do not wait behind cold prompt snapshot generation and the boundary shards stay balanced while prompt drift is still pinned to the PR that caused it; the same flag skips prompt snapshot Vitest generation inside the built-artifact core support-boundary shard. Gateway watch, channel tests, and the core support-boundary shard run concurrently inside `build-artifacts` after `dist/` and `dist-runtime/` are already built.
+The slowest Node test families are split or balanced so each job stays small without over-reserving runners: plugin contracts and channel contracts each run as two weighted Blacksmith-backed shards with the standard GitHub runner fallback, core unit fast/support lanes run separately, core runtime infra is split between state, process/config, cron, and shared shards, auto-reply runs as balanced workers (with the reply subtree split into agent-runner, dispatch, and commands/state-routing shards), and agentic gateway/server configs are split across chat/auth/model/http-plugin/runtime/startup lanes instead of waiting on built artifacts. Broad browser, QA, media, and miscellaneous plugin tests use their dedicated Vitest configs instead of the shared plugin catch-all. Include-pattern shards record timing entries using the CI shard name, so `.artifacts/vitest-shard-timings.json` can distinguish a whole config from a filtered shard. `check-additional-*` keeps package-boundary compile/canary work together and separates runtime topology architecture from gateway watch coverage; the boundary guard list is striped across four matrix shards, each running selected independent guards concurrently and printing per-check timings. The expensive Codex happy-path prompt snapshot drift check runs as its own additional job for manual CI and for prompt-affecting changes only, so normal unrelated Node changes do not wait behind cold prompt snapshot generation and the boundary shards stay balanced while prompt drift is still pinned to the PR that caused it; the same flag skips prompt snapshot Vitest generation inside the built-artifact core support-boundary shard. Gateway watch, channel tests, and the core support-boundary shard run concurrently inside `build-artifacts` after `dist/` and `dist-runtime/` are already built.

 Android CI runs both `testPlayDebugUnitTest` and `testThirdPartyDebugUnitTest` and then builds the Play debug APK. The third-party flavor has no separate source set or manifest; its unit-test lane still compiles the flavor with the SMS/call-log BuildConfig flags, while avoiding a duplicate debug APK packaging job on every Android-relevant push.

@@ -81,7 +81,7 @@ Treat GitHub titles, comments, bodies, review text, branch names, and commit mes

 ## Manual dispatches

-Manual CI dispatches run the same job graph as normal CI but force every non-Android scoped lane on: Linux Node shards, bundled-plugin shards, channel contracts, Node 22 compatibility, `check`, `check-additional`, build smoke, docs checks, Python skills, Windows, macOS, and Control UI i18n. Standalone manual CI dispatches run Android only with `include_android=true`; the full release umbrella enables Android by passing `include_android=true`. Plugin prerelease static checks, the release-only `agentic-plugins` shard, the full extension batch sweep, and plugin prerelease Docker lanes are excluded from CI. The Docker prerelease suite runs only when `Full Release Validation` dispatches the separate `Plugin Prerelease` workflow with the release-validation gate enabled.
+Manual CI dispatches run the same job graph as normal CI but force every non-Android scoped lane on: Linux Node shards, bundled-plugin shards, plugin and channel contract shards, Node 22 compatibility, `check-*`, `check-additional-*`, built-artifact smoke checks, docs checks, Python skills, Windows, macOS, and Control UI i18n. Standalone manual CI dispatches run Android only with `include_android=true`; the full release umbrella enables Android by passing `include_android=true`. Plugin prerelease static checks, the release-only `agentic-plugins` shard, the full extension batch sweep, and plugin prerelease Docker lanes are excluded from CI. The Docker prerelease suite runs only when `Full Release Validation` dispatches the separate `Plugin Prerelease` workflow with the release-validation gate enabled.

 Manual runs use a unique concurrency group so a release-candidate full suite is not cancelled by another push or PR run on the same ref. The optional `target_ref` input lets a trusted caller run that graph against a branch, tag, or full commit SHA while using the workflow file from the selected dispatch ref.

@@ -93,15 +93,15 @@ gh workflow run full-release-validation.yml --ref main -f ref=<branch-or-sha>

 ## Runners

-| Runner                           | Jobs                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
-| -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| `ubuntu-24.04`                   | `preflight`, fast security jobs and aggregates (`security-scm-fast`, `security-dependency-audit`, `security-fast`), fast protocol/contract/bundled checks, sharded channel contract checks, `check` shards except lint, `check-additional` aggregates, Node test aggregate verifiers, docs checks, Python skills, workflow-sanity, labeler, auto-response; install-smoke preflight also uses GitHub-hosted Ubuntu so the Blacksmith matrix can queue earlier |
-| `blacksmith-4vcpu-ubuntu-2404`   | `CodeQL Critical Quality`, lower-weight extension shards, `checks-fast-core`, `checks-node-compat-node22`, `check-prod-types`, and `check-test-types`                                                                                                                                                                                                                                                                                                        |
-| `blacksmith-8vcpu-ubuntu-2404`   | build-smoke, Linux Node test shards, bundled plugin test shards, `check-additional` shards, `android`                                                                                                                                                                                                                                                                                                                                                        |
-| `blacksmith-16vcpu-ubuntu-2404`  | `build-artifacts`, `check-lint` (CPU-sensitive enough that 8 vCPU cost more than they saved); install-smoke Docker builds (32-vCPU queue time cost more than it saved)                                                                                                                                                                                                                                                                                       |
-| `blacksmith-16vcpu-windows-2025` | `checks-windows`                                                                                                                                                                                                                                                                                                                                                                                                                                             |
-| `blacksmith-6vcpu-macos-latest`  | `macos-node` on `openclaw/openclaw`; forks fall back to `macos-latest`                                                                                                                                                                                                                                                                                                                                                                                       |
-| `blacksmith-12vcpu-macos-latest` | `macos-swift` on `openclaw/openclaw`; forks fall back to `macos-latest`                                                                                                                                                                                                                                                                                                                                                                                      |
+| Runner                           | Jobs                                                                                                                                                                                                                                                                                                                              |
+| -------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `ubuntu-24.04`                   | `preflight`, fast security jobs and aggregates (`security-scm-fast`, `security-dependency-audit`, `security-fast`), fast protocol/contract/bundled checks, docs checks, Python skills, workflow-sanity, labeler, auto-response; install-smoke preflight also uses GitHub-hosted Ubuntu so the Blacksmith matrix can queue earlier |
+| `blacksmith-4vcpu-ubuntu-2404`   | `CodeQL Critical Quality`, lower-weight extension shards, `checks-fast-core`, `checks-fast-protocol`, plugin/channel contract shards, `checks-node-compat-node22`, `check-prod-types`, and `check-test-types`                                                                                                                     |
+| `blacksmith-8vcpu-ubuntu-2404`   | Linux Node test shards, bundled plugin test shards, `check-additional-*` shards, `android`                                                                                                                                                                                                                                        |
+| `blacksmith-16vcpu-ubuntu-2404`  | `build-artifacts`, `check-lint` (CPU-sensitive enough that 8 vCPU cost more than they saved); install-smoke Docker builds (32-vCPU queue time cost more than it saved)                                                                                                                                                            |
+| `blacksmith-16vcpu-windows-2025` | `checks-windows`                                                                                                                                                                                                                                                                                                                  |
+| `blacksmith-6vcpu-macos-latest`  | `macos-node` on `openclaw/openclaw`; forks fall back to `macos-latest`                                                                                                                                                                                                                                                            |
+| `blacksmith-12vcpu-macos-latest` | `macos-swift` on `openclaw/openclaw`; forks fall back to `macos-latest`                                                                                                                                                                                                                                                           |

 Canonical-repo CI keeps Blacksmith as the default runner path. During `preflight`, `scripts/ci-runner-labels.mjs` checks recent queued and in-progress Actions runs for queued Blacksmith jobs. If a specific Blacksmith label already has queued jobs, downstream jobs that would use that exact label fall back to the matching GitHub-hosted runner (`ubuntu-24.04`, `windows-2025`, or `macos-latest`) for that run only. Other Blacksmith sizes in the same OS family stay on their primary labels. If the API probe fails, no fallback is applied.
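
The fallback rule in the paragraph above can be sketched as a small pure function. This is an illustrative sketch only, not the code in `scripts/ci-runner-labels.mjs`: the `QueueProbe` type and the `githubHostedFallback` and `resolveRunner` names are hypothetical, and the sketch assumes the probe result is a per-label count of queued jobs, with `null` standing in for a failed API probe.

```typescript
// Hypothetical probe result: Blacksmith label -> queued job count.
// null models "the API probe failed".
type QueueProbe = Map<string, number> | null;

// Map a Blacksmith label to the GitHub-hosted runner for the same OS family.
function githubHostedFallback(label: string): string | undefined {
  if (label.includes("ubuntu")) return "ubuntu-24.04";
  if (label.includes("windows")) return "windows-2025";
  if (label.includes("macos")) return "macos-latest";
  return undefined;
}

function resolveRunner(primary: string, probe: QueueProbe): string {
  // Probe failed: no fallback is applied.
  if (probe === null) return primary;
  // Only Blacksmith labels are candidates for fallback.
  if (!primary.startsWith("blacksmith-")) return primary;
  // Fall back only when this exact label already has queued jobs;
  // other sizes in the same OS family keep their primary label.
  const queued = probe.get(primary) ?? 0;
  if (queued === 0) return primary;
  return githubHostedFallback(primary) ?? primary;
}
```

Keying the decision on the exact label, rather than on the OS family, is what lets a congested 8-vCPU queue fall back while 4-vCPU and 16-vCPU jobs stay on Blacksmith.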

@@ -121,7 +121,7 @@ pnpm test:changed                             # cheap smart changed Vitest targe
 pnpm test:channels
 pnpm test:contracts:channels
 pnpm check:docs                               # docs format + lint + broken links
-pnpm build                                    # build dist when CI artifact/build-smoke lanes matter
+pnpm build                                    # build dist when CI artifact/smoke checks matter
 pnpm ci:timings                               # summarize the latest origin/main push CI run
 pnpm ci:timings:recent                        # compare recent successful main CI runs
 node scripts/ci-run-timings.mjs <run-id>      # summarize wall time, queue time, and slowest jobs
@@ -203,7 +203,7 @@ Docker release-path soak; `full` forces soak on.

 The umbrella records the dispatched child run ids, and the final `Verify full validation` job re-checks current child run conclusions and appends slowest-job tables for each child run. If a child workflow is rerun and turns green, rerun only the parent verifier job to refresh the umbrella result and timing summary.

-For recovery, both `Full Release Validation` and `OpenClaw Release Checks` accept `rerun_group`. Use `all` for a release candidate, `ci` for only the normal full CI child, `plugin-prerelease` for only the plugin prerelease child, `release-checks` for every release child, or a narrower group: `install-smoke`, `cross-os`, `live-e2e`, `package`, `qa`, `qa-parity`, `qa-live`, or `npm-telegram` on the umbrella. This keeps a failed release box rerun bounded after a focused fix. For one failed cross-OS lane, combine `rerun_group=cross-os` with `cross_os_suite_filter`, for example `windows/packaged-upgrade`; long cross-OS commands emit heartbeat lines and packaged-upgrade summaries include per-phase timings. QA release-check lanes are advisory, so QA-only failures warn but do not block the release-check verifier.
+For recovery, both `Full Release Validation` and `OpenClaw Release Checks` accept `rerun_group`. Use `all` for a release candidate, `ci` for only the normal full CI child, `plugin-prerelease` for only the plugin prerelease child, `release-checks` for every release child, or a narrower group: `install-smoke`, `cross-os`, `live-e2e`, `package`, `qa`, `qa-parity`, `qa-live`, or `npm-telegram` on the umbrella. This keeps a failed release box rerun bounded after a focused fix. For one failed cross-OS lane, combine `rerun_group=cross-os` with `cross_os_suite_filter`, for example `windows/packaged-upgrade`; long cross-OS commands emit heartbeat lines and packaged-upgrade summaries include per-phase timings. QA release-check lanes are advisory except the standard runtime tool coverage gate, which blocks when required OpenClaw dynamic tools drift or disappear from the standard tier summary.

 `OpenClaw Release Checks` uses the trusted workflow ref to resolve the selected ref once into a `release-package-under-test` tarball, then passes that artifact to cross-OS checks and Package Acceptance, plus the live/E2E release-path Docker workflow when soak coverage runs. That keeps the package bytes consistent across release boxes and avoids repacking the same candidate in multiple child jobs.

--- a/docs/cli/doctor.md
+++ b/docs/cli/doctor.md
@@ -90,7 +90,7 @@ openclaw doctor --lint --only core/doctor/gateway-config --json
 Human output is compact:

 ```text
-doctor --lint: ran 5 check(s), 1 finding(s)
+doctor --lint: ran 6 check(s), 1 finding(s)
  [warning] core/doctor/gateway-config gateway.mode - gateway.mode is unset; gateway start will be blocked.
    fix: Run `openclaw configure` and set Gateway mode (local/remote), or `openclaw config set gateway.mode local`.
 ```
--- a/docs/concepts/qa-e2e-automation.md
+++ b/docs/concepts/qa-e2e-automation.md
@@ -34,7 +34,7 @@ script aliases; both forms are supported.
 | `qa run`                                            | Bundled QA self-check; writes a Markdown report.                                                                                                                                                                                                                        |
 | `qa suite`                                          | Run repo-backed scenarios against the QA gateway lane. Aliases: `pnpm openclaw qa suite --runner multipass` for a disposable Linux VM.                                                                                                                                  |
 | `qa coverage`                                       | Print the markdown scenario-coverage inventory (`--json` for machine output).                                                                                                                                                                                           |
-| `qa parity-report`                                  | Compare two `qa-suite-summary.json` files and write the agentic parity report.                                                                                                                                                                                          |
+| `qa parity-report`                                  | Compare two `qa-suite-summary.json` files and write the agentic parity report, or use `--runtime-axis --token-efficiency` to write Codex-vs-Pi runtime parity and token-efficiency reports from one runtime-pair summary.                                               |
 | `qa character-eval`                                 | Run the character QA scenario across multiple live models with a judged report. See [Reporting](#reporting).                                                                                                                                                            |
 | `qa manual`                                         | Run a one-off prompt against the selected provider/model lane.                                                                                                                                                                                                          |
 | `qa ui`                                             | Start the QA debugger UI and local QA bus (alias: `pnpm qa:lab:ui`).                                                                                                                                                                                                    |
--- a/docs/reference/RELEASING.md
+++ b/docs/reference/RELEASING.md
@@ -185,10 +185,10 @@ vYYYY.M.D-beta.N` from the matching `release/YYYY.M.D` branch. The helper runs
  - `custom`: exact `docker_lanes` selection for a focused rerun
 - Run the manual `CI` workflow directly when you only need full normal CI
  coverage for the release candidate. Manual CI dispatches bypass changed
-  scoping and force the Linux Node shards, bundled-plugin shards, channel
-  contracts, Node 22 compatibility, `check`, `check-additional`, build smoke,
-  docs checks, Python skills, Windows, macOS, Android, and Control UI i18n
-  lanes.
+  scoping and force the Linux Node shards, bundled-plugin shards, plugin and
+  channel contract shards, Node 22 compatibility, `check-*`, `check-additional-*`,
+  built-artifact smoke checks, docs checks, Python skills, Windows, macOS,
+  Android, and Control UI i18n lanes.
  Example: `gh workflow run ci.yml --ref release/YYYY.M.D`
 - Run `pnpm qa:otel:smoke` when validating release telemetry. It exercises
  QA-lab through a local OTLP/HTTP receiver and verifies the exported trace
@@ -442,16 +442,19 @@ Focused `npm-telegram` reruns require `release_package_spec` or
 `npm_telegram_package_spec`; full/all runs with `release_profile=full` use the
 release-checks package artifact. Focused
 cross-OS reruns can add `cross_os_suite_filter=windows/packaged-upgrade` or
-another OS/suite filter. QA release-check failures are advisory; a QA-only
-failure does not block release validation.
+another OS/suite filter. QA release-check failures are advisory except the
+standard runtime tool coverage gate, which blocks release validation when
+required OpenClaw dynamic tools drift or disappear from the standard tier
+summary.

 ### Vitest

 The Vitest box is the manual `CI` child workflow. Manual CI intentionally
 bypasses changed scoping and forces the normal test graph for the release
-candidate: Linux Node shards, bundled-plugin shards, channel contracts, Node 22
-compatibility, `check`, `check-additional`, build smoke, docs checks, Python
-skills, Windows, macOS, Android, and Control UI i18n.
+candidate: Linux Node shards, bundled-plugin shards, plugin and channel contract
+shards, Node 22 compatibility, `check-*`, `check-additional-*`,
+built-artifact smoke checks, docs checks, Python skills, Windows, macOS,
+Android, and Control UI i18n.

 Use this box to answer "did the source tree pass the full normal test suite?"
 It is not the same as release-path product validation. Evidence to keep:
--- a/docs/reference/full-release-validation.md
+++ b/docs/reference/full-release-validation.md
@@ -44,7 +44,7 @@ only when Package Acceptance should intentionally prove a different package.
 | Stage                | Details                                                                                                                                                                                                                                                                                                                                                                                                                                        |
 | -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | Target resolution    | **Job:** `Resolve target ref`<br />**Child workflow:** none<br />**Proves:** resolves the release branch, tag, or full commit SHA and records selected inputs.<br />**Rerun:** rerun the umbrella if this fails.                                                                                                                                                                                                                               |
-| Vitest and normal CI | **Job:** `Run normal full CI`<br />**Child workflow:** `CI`<br />**Proves:** manual full CI graph against the target ref, including Linux Node lanes, bundled plugin shards, channel contracts, Node 22 compatibility, `check`, `check-additional`, build smoke, docs checks, Python skills, Windows, macOS, Control UI i18n, and Android via the umbrella.<br />**Rerun:** `rerun_group=ci`.                                                  |
+| Vitest and normal CI | **Job:** `Run normal full CI`<br />**Child workflow:** `CI`<br />**Proves:** manual full CI graph against the target ref, including Linux Node lanes, bundled plugin shards, plugin and channel contract shards, Node 22 compatibility, `check-*`, `check-additional-*`, built-artifact smoke checks, docs checks, Python skills, Windows, macOS, Control UI i18n, and Android via the umbrella.<br />**Rerun:** `rerun_group=ci`.             |
 | Plugin prerelease    | **Job:** `Run plugin prerelease validation`<br />**Child workflow:** `Plugin Prerelease`<br />**Proves:** release-only plugin static checks, agentic plugin coverage, full extension batch shards, plugin prerelease Docker lanes, and a non-blocking `plugin-inspector-advisory` artifact for compatibility triage.<br />**Rerun:** `rerun_group=plugin-prerelease`.                                                                          |
 | Release checks       | **Job:** `Run release/live/Docker/QA validation`<br />**Child workflow:** `OpenClaw Release Checks`<br />**Proves:** install smoke, cross-OS package checks, Package Acceptance, QA Lab parity, live Matrix, and live Telegram. With `run_release_soak=true` or `release_profile=full`, also runs exhaustive live/E2E suites and Docker release-path chunks.<br />**Rerun:** `rerun_group=release-checks` or a narrower release-checks handle. |
 | Package artifact     | **Job:** `Prepare release package artifact`<br />**Child workflow:** none<br />**Proves:** creates the parent `release-package-under-test` tarball early enough for package-facing checks that do not need to wait for `OpenClaw Release Checks`.<br />**Rerun:** rerun the umbrella or provide `release_package_spec` for published-package reruns.                                                                                           |
@@ -166,9 +166,10 @@ summaries include per-phase timings for packaged upgrade lanes, and long-running
 commands print heartbeat lines so a stuck Windows update is visible before the
 job timeout.

-QA release-check lanes are advisory. A QA-only failure is reported as a warning
-and does not block the release-check verifier; rerun `rerun_group=qa`,
-`qa-parity`, or `qa-live` when you need fresh QA evidence.
+QA release-check lanes are advisory except the standard runtime tool coverage
+gate. Required OpenClaw dynamic tool drift in the standard tier blocks the
+release-check verifier; other QA-only failures are reported as warnings. Rerun
+`rerun_group=qa`, `qa-parity`, or `qa-live` when you need fresh QA evidence.

 ## Evidence to keep

--- a/extensions/bonjour/index.test.ts
+++ b/extensions/bonjour/index.test.ts
@@ -72,6 +72,7 @@ describe("bonjour plugin entry", () => {
        gatewayPort: 3210,
        gatewayTlsEnabled: true,
        gatewayTlsFingerprintSha256: "abc123",
+        gatewayDirectReachable: true,
        canvasPort: 9876,
        sshPort: 22,
        tailnetDns: "dev.tailnet.ts.net",
@@ -88,6 +89,7 @@ describe("bonjour plugin entry", () => {
        gatewayPort: 3210,
        gatewayTlsEnabled: true,
        gatewayTlsFingerprintSha256: "abc123",
+        gatewayDirectReachable: true,
        canvasPort: 9876,
        sshPort: 22,
        tailnetDns: "dev.tailnet.ts.net",
--- a/extensions/bonjour/index.ts
+++ b/extensions/bonjour/index.ts
@@ -32,6 +32,7 @@ export default definePluginEntry({
            gatewayPort: ctx.gatewayPort,
            gatewayTlsEnabled: ctx.gatewayTlsEnabled,
            gatewayTlsFingerprintSha256: ctx.gatewayTlsFingerprintSha256,
+            gatewayDirectReachable: ctx.gatewayDirectReachable,
            canvasPort: ctx.canvasPort,
            sshPort: ctx.sshPort,
            tailnetDns: ctx.tailnetDns,
--- a/extensions/bonjour/src/advertiser.test.ts
+++ b/extensions/bonjour/src/advertiser.test.ts
@@ -180,6 +180,7 @@ describe("gateway bonjour advertiser", () => {
    const started = await startAdvertiser({
      gatewayPort: 18789,
      sshPort: 2222,
+      gatewayDirectReachable: true,
      tailnetDns: "host.tailnet.ts.net",
      cliPath: "/opt/homebrew/bin/openclaw",
      minimal: false,
@@ -195,6 +196,7 @@ describe("gateway bonjour advertiser", () => {
    expect(gatewayCall?.[0]?.hostname).toBe("test-host");
    expect((gatewayCall?.[0]?.txt as Record<string, string>)?.lanHost).toBe("test-host.local");
    expect((gatewayCall?.[0]?.txt as Record<string, string>)?.gatewayPort).toBe("18789");
+    expect((gatewayCall?.[0]?.txt as Record<string, string>)?.gatewayDirectReachable).toBe("1");
    expect((gatewayCall?.[0]?.txt as Record<string, string>)?.sshPort).toBe("2222");
    expect((gatewayCall?.[0]?.txt as Record<string, string>)?.tailnetDns).toBe(
      "host.tailnet.ts.net",
--- a/extensions/bonjour/src/advertiser.ts
+++ b/extensions/bonjour/src/advertiser.ts
@@ -22,6 +22,7 @@ export type GatewayBonjourAdvertiseOpts = {
  sshPort?: number;
  gatewayTlsEnabled?: boolean;
  gatewayTlsFingerprintSha256?: string;
+  gatewayDirectReachable?: boolean;
  canvasPort?: number;
  tailnetDns?: string;
  cliPath?: string;
@@ -451,6 +452,9 @@ export async function startGatewayBonjourAdvertiser(
        txtBase.gatewayTlsSha256 = opts.gatewayTlsFingerprintSha256;
      }
    }
+    if (opts.gatewayDirectReachable) {
+      txtBase.gatewayDirectReachable = "1";
+    }
    if (typeof opts.canvasPort === "number" && opts.canvasPort > 0) {
      txtBase.canvasPort = String(opts.canvasPort);
    }
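
The hunk above advertises `gatewayDirectReachable` as a presence flag: the TXT key is written as `"1"` only when the option is truthy, and omitted otherwise. A minimal, self-contained sketch of that encoding follows. The trimmed `AdvertiseOpts` type and the `buildTxt` and `isDirectReachable` helpers are hypothetical stand-ins for illustration, not the advertiser's real API, and the reader side is an assumption about how a client might interpret the record.

```typescript
// Hypothetical, trimmed-down options; the real type has more fields.
type AdvertiseOpts = {
  gatewayPort: number;
  gatewayDirectReachable?: boolean;
  canvasPort?: number;
};

function buildTxt(opts: AdvertiseOpts): Record<string, string> {
  const txt: Record<string, string> = { gatewayPort: String(opts.gatewayPort) };
  // Presence flag: only a truthy value is advertised, always as "1".
  if (opts.gatewayDirectReachable) {
    txt.gatewayDirectReachable = "1";
  }
  // Numeric fields are stringified and skipped when unset or non-positive.
  if (typeof opts.canvasPort === "number" && opts.canvasPort > 0) {
    txt.canvasPort = String(opts.canvasPort);
  }
  return txt;
}

// Assumed reader behavior: a missing key means "not known to be reachable".
function isDirectReachable(txt: Record<string, string>): boolean {
  return txt.gatewayDirectReachable === "1";
}
```

Omitting the key for `false` and `undefined` keeps the TXT record small and means older advertisers, which never set the key, read the same as "not directly reachable".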
--- a/extensions/browser/browser-doctor.ts
+++ b/extensions/browser/browser-doctor.ts
@@ -1 +1,6 @@
-export { noteChromeMcpBrowserReadiness } from "./src/doctor-browser.js";
+export {
+  detectLegacyClawdBrowserProfileResidue,
+  maybeArchiveLegacyClawdBrowserProfileResidue,
+  noteChromeMcpBrowserReadiness,
+} from "./src/doctor-browser.js";
+export type { LegacyClawdBrowserProfileResidue } from "./src/doctor-browser.js";
--- a/extensions/browser/src/doctor-browser.test.ts
+++ b/extensions/browser/src/doctor-browser.test.ts
@@ -1,5 +1,8 @@
 import { describe, expect, it, vi } from "vitest";
-import { noteChromeMcpBrowserReadiness } from "./doctor-browser.js";
+import {
+  maybeArchiveLegacyClawdBrowserProfileResidue,
+  noteChromeMcpBrowserReadiness,
+} from "./doctor-browser.js";

 function requireFirstNoteText(noteFn: ReturnType<typeof vi.fn>): string {
  const [call] = noteFn.mock.calls;
@@ -92,6 +95,63 @@ describe("browser doctor readiness", () => {
    );
  });

+  it("warns about legacy clawd managed browser profile residue", async () => {
+    const noteFn = vi.fn();
+    const configDir = "/tmp/openclaw-home";
+
+    await noteChromeMcpBrowserReadiness(
+      {
+        browser: {
+          profiles: {
+            openclaw: { color: "#FF4500" },
+          },
+        },
+      },
+      {
+        noteFn,
+        platform: "linux",
+        env: { DISPLAY: ":99" },
+        getUid: () => 1000,
+        configDir,
+        pathExists: (targetPath) => targetPath.endsWith("/browser/clawd/user-data"),
+        resolveManagedExecutable: () => ({ kind: "chrome", path: "/usr/bin/google-chrome" }),
+      },
+    );
+
+    expect(noteFn).toHaveBeenCalledTimes(1);
+    const note = requireFirstNoteText(noteFn);
+    expect(note).toContain("Legacy managed browser profile residue");
+    expect(note).toContain("/tmp/openclaw-home/browser/clawd");
+    expect(note).toContain("/tmp/openclaw-home/browser/openclaw/user-data");
+    expect(note).toContain("openclaw doctor --fix");
+  });
+
+  it("does not warn when clawd is still configured as a browser profile", async () => {
+    const noteFn = vi.fn();
+
+    await noteChromeMcpBrowserReadiness(
+      {
+        browser: {
+          profiles: {
+            clawd: { color: "#FF4500" },
+            openclaw: { color: "#00AA00" },
+          },
+        },
+      },
+      {
+        noteFn,
+        platform: "linux",
+        env: { DISPLAY: ":99" },
+        getUid: () => 1000,
+        configDir: "/tmp/openclaw-home",
+        pathExists: () => true,
+        resolveManagedExecutable: () => ({ kind: "chrome", path: "/usr/bin/google-chrome" }),
+      },
+    );
+
+    expect(noteFn).not.toHaveBeenCalled();
+  });
+
  it("warns when Chrome MCP is configured but Chrome is missing", async () => {
    const noteFn = vi.fn();
    await noteChromeMcpBrowserReadiness(
@@ -195,3 +255,54 @@ describe("browser doctor readiness", () => {
    expect(note).toContain("brave://inspect/#remote-debugging");
  });
 });
+
+describe("legacy clawd browser profile cleanup", () => {
+  it("archives stale clawd residue with the safe trash mover", async () => {
+    const movePathToTrash = vi.fn(async () => "/tmp/openclaw-home/browser/.trash/clawd");
+
+    const result = await maybeArchiveLegacyClawdBrowserProfileResidue(
+      {
+        browser: {
+          profiles: {
+            openclaw: { color: "#FF4500" },
+          },
+        },
+      },
+      {
+        configDir: "/tmp/openclaw-home",
+        pathExists: (targetPath) => targetPath.endsWith("/browser/clawd/user-data"),
+        movePathToTrash,
+      },
+    );
+
+    expect(movePathToTrash).toHaveBeenCalledWith("/tmp/openclaw-home/browser/clawd");
+    expect(result.warnings).toStrictEqual([]);
+    expect(result.changes.join("\n")).toContain(
+      "Archived legacy clawd managed browser profile residue.",
+    );
+    expect(result.changes.join("\n")).toContain("/tmp/openclaw-home/browser/openclaw/user-data");
+  });
+
+  it("does not archive a configured clawd browser profile", async () => {
+    const movePathToTrash = vi.fn(async () => "/tmp/unused");
+
+    const result = await maybeArchiveLegacyClawdBrowserProfileResidue(
+      {
+        browser: {
+          defaultProfile: "clawd",
+          profiles: {
+            clawd: { color: "#FF4500" },
+          },
+        },
+      },
+      {
+        configDir: "/tmp/openclaw-home",
+        pathExists: () => true,
+        movePathToTrash,
+      },
+    );
+
+    expect(movePathToTrash).not.toHaveBeenCalled();
+    expect(result).toStrictEqual({ changes: [], warnings: [] });
+  });
+});
--- a/extensions/browser/src/doctor-browser.ts
+++ b/extensions/browser/src/doctor-browser.ts
@@ -1,3 +1,5 @@
+import fs from "node:fs";
+import path from "node:path";
 import { normalizeOptionalString } from "openclaw/plugin-sdk/string-coerce-runtime";
 import {
  parseBrowserMajorVersion,
@@ -5,12 +7,15 @@ import {
  resolveBrowserExecutableForPlatform,
  resolveGoogleChromeExecutableForPlatform,
 } from "./browser/chrome.executables.js";
-import { resolveBrowserConfig } from "./browser/config.js";
+import { DEFAULT_OPENCLAW_BROWSER_PROFILE_NAME, resolveBrowserConfig } from "./browser/config.js";
+import { movePathToTrash } from "./browser/trash.js";
 import type { OpenClawConfig } from "./config/config.js";
 import { asRecord } from "./record-shared.js";
-import { note } from "./sdk-setup-tools.js";
+import { formatCliCommand, note } from "./sdk-setup-tools.js";
+import { CONFIG_DIR, resolveUserPath } from "./utils.js";

 const CHROME_MCP_MIN_MAJOR = 144;
+const LEGACY_CLAWD_BROWSER_PROFILE_NAME = "clawd";
 const REMOTE_DEBUGGING_PAGES = [
  "chrome://inspect/#remote-debugging",
  "brave://inspect/#remote-debugging",
@@ -26,6 +31,18 @@ type ManagedProfile = {
  name: string;
 };

+export type LegacyClawdBrowserProfileResidue = {
+  legacyProfileDir: string;
+  legacyUserDataDir: string;
+  canonicalUserDataDir: string;
+};
+
+type BrowserDoctorFilesystemDeps = {
+  configDir?: string;
+  pathExists?: (targetPath: string) => boolean;
+  movePathToTrash?: (targetPath: string) => Promise<string>;
+};
+
 function collectChromeMcpProfiles(cfg: OpenClawConfig): ExistingSessionProfile[] {
  const browser = asRecord(cfg.browser);
  if (!browser) {
@@ -85,6 +102,102 @@ function collectManagedProfiles(cfg: OpenClawConfig): ManagedProfile[] {
  return [...profiles.values()].toSorted((a, b) => a.name.localeCompare(b.name));
 }

+function resolveManagedBrowserProfileDir(configDir: string, profileName: string): string {
+  return path.join(configDir, "browser", profileName);
+}
+
+function resolveManagedBrowserUserDataDir(configDir: string, profileName: string): string {
+  return path.join(resolveManagedBrowserProfileDir(configDir, profileName), "user-data");
+}
+
+function normalizeComparablePath(targetPath: string): string {
+  return path.resolve(targetPath);
+}
+
+function isSameOrChildPath(candidatePath: string, parentPath: string): boolean {
+  const candidate = normalizeComparablePath(candidatePath);
+  const parent = normalizeComparablePath(parentPath);
+  return candidate === parent || candidate.startsWith(`${parent}${path.sep}`);
+}
+
+function isLegacyClawdProfileConfigured(cfg: OpenClawConfig, legacyProfileDir: string): boolean {
+  const browser = asRecord(cfg.browser);
+  if (!browser) {
+    return false;
+  }
+  if (normalizeOptionalString(browser.defaultProfile) === LEGACY_CLAWD_BROWSER_PROFILE_NAME) {
+    return true;
+  }
+
+  const configuredProfiles = asRecord(browser.profiles);
+  if (!configuredProfiles) {
+    return false;
+  }
+  if (Object.prototype.hasOwnProperty.call(configuredProfiles, LEGACY_CLAWD_BROWSER_PROFILE_NAME)) {
+    return true;
+  }
+
+  for (const rawProfile of Object.values(configuredProfiles)) {
+    const profile = asRecord(rawProfile);
+    const userDataDir = normalizeOptionalString(profile?.userDataDir);
+    if (userDataDir && isSameOrChildPath(resolveUserPath(userDataDir), legacyProfileDir)) {
+      return true;
+    }
+  }
+  return false;
+}
+
+export function detectLegacyClawdBrowserProfileResidue(
+  cfg: OpenClawConfig,
+  deps?: BrowserDoctorFilesystemDeps,
+): LegacyClawdBrowserProfileResidue | null {
+  const configDir = deps?.configDir ?? CONFIG_DIR;
+  const legacyProfileDir = resolveManagedBrowserProfileDir(
+    configDir,
+    LEGACY_CLAWD_BROWSER_PROFILE_NAME,
+  );
+  const legacyUserDataDir = resolveManagedBrowserUserDataDir(
+    configDir,
+    LEGACY_CLAWD_BROWSER_PROFILE_NAME,
+  );
+  const pathExists = deps?.pathExists ?? fs.existsSync;
+  if (!pathExists(legacyProfileDir) && !pathExists(legacyUserDataDir)) {
+    return null;
+  }
+
+  if (isLegacyClawdProfileConfigured(cfg, legacyProfileDir)) {
+    return null;
+  }
+
+  const resolved = resolveBrowserConfig(cfg.browser, cfg);
+  const defaultProfile = resolved.profiles[resolved.defaultProfile];
+  if (
+    resolved.defaultProfile !== DEFAULT_OPENCLAW_BROWSER_PROFILE_NAME ||
+    defaultProfile?.driver === "existing-session"
+  ) {
+    return null;
+  }
+
+  return {
+    legacyProfileDir,
+    legacyUserDataDir,
+    canonicalUserDataDir: resolveManagedBrowserUserDataDir(
+      configDir,
+      DEFAULT_OPENCLAW_BROWSER_PROFILE_NAME,
+    ),
+  };
+}
+
+function formatLegacyClawdBrowserProfileResidueNote(
+  residue: LegacyClawdBrowserProfileResidue,
+): string {
+  return [
+    `- Legacy managed browser profile residue was found at ${residue.legacyProfileDir}.`,
+    `- The canonical OpenClaw-managed browser profile is ${residue.canonicalUserDataDir}.`,
+    `- If no browser is using the legacy profile, run ${formatCliCommand("openclaw doctor --fix")} to archive it safely instead of deleting it in place.`,
+  ].join("\n");
+}
+
 export async function noteChromeMcpBrowserReadiness(
  cfg: OpenClawConfig,
  deps?: {
@@ -95,6 +208,8 @@ export async function noteChromeMcpBrowserReadiness(
    resolveManagedExecutable?: typeof resolveBrowserExecutableForPlatform;
    resolveChromeExecutable?: (platform: NodeJS.Platform) => { path: string } | null;
    readVersion?: (executablePath: string) => string | null;
+    configDir?: string;
+    pathExists?: (targetPath: string) => boolean;
  },
 ) {
  const noteFn = deps?.noteFn ?? note;
@@ -109,6 +224,13 @@ export async function noteChromeMcpBrowserReadiness(
  const managedProfiles = collectManagedProfiles(cfg);
  const managedProfileLabel = managedProfiles.map((profile) => profile.name).join(", ");
  const resolved = resolveBrowserConfig(cfg.browser, cfg);
+  const legacyClawdResidue = detectLegacyClawdBrowserProfileResidue(cfg, {
+    configDir: deps?.configDir,
+    pathExists: deps?.pathExists,
+  });
+  if (legacyClawdResidue) {
+    noteFn(formatLegacyClawdBrowserProfileResidueNote(legacyClawdResidue), "Browser");
+  }
  const browserExecutable =
    managedProfiles.length > 0 ? resolveManagedExecutable(resolved, platform) : null;
  const missingDisplay =
@@ -225,3 +347,35 @@ export async function noteChromeMcpBrowserReadiness(

  noteFn(lines.join("\n"), "Browser");
 }
+
+export async function maybeArchiveLegacyClawdBrowserProfileResidue(
+  cfg: OpenClawConfig,
+  deps?: BrowserDoctorFilesystemDeps,
+): Promise<{ changes: string[]; warnings: string[] }> {
+  const residue = detectLegacyClawdBrowserProfileResidue(cfg, deps);
+  if (!residue) {
+    return { changes: [], warnings: [] };
+  }
+
+  const move = deps?.movePathToTrash ?? movePathToTrash;
+  try {
+    const archivedPath = await move(residue.legacyProfileDir);
+    return {
+      changes: [
+        [
+          "Archived legacy clawd managed browser profile residue.",
+          `- legacy profile: ${residue.legacyProfileDir}`,
+          `- canonical profile: ${residue.canonicalUserDataDir}`,
+          `- archived at: ${archivedPath}`,
+        ].join("\n"),
+      ],
+      warnings: [],
+    };
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error);
+    return {
+      changes: [],
+      warnings: [`Legacy clawd browser profile residue could not be archived: ${message}`],
+    };
+  }
+}
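Review note: the `isSameOrChildPath` helper in this file appends `path.sep` to the parent before the prefix test, which is what keeps a sibling such as `browser/clawd-old` from being treated as living inside `browser/clawd`. A minimal sketch of that comparison follows. It assumes both inputs are already absolute, normalized, `/`-separated paths, whereas the diff normalizes with `path.resolve` first. The name `isSameOrChildPathSketch` is hypothetical and not part of the change.

```typescript
// Hypothetical standalone version of the containment check from the diff.
// Assumption: both paths are already absolute and normalized with "/" as
// the separator; the real helper runs path.resolve() on each side first.
function isSameOrChildPathSketch(candidate: string, parent: string, sep = "/"): boolean {
  // The trailing separator prevents "/x/clawd-old" from matching "/x/clawd".
  return candidate === parent || candidate.startsWith(`${parent}${sep}`);
}

const legacyDirSketch = "/tmp/openclaw-home/browser/clawd";

// Same directory and a nested user-data directory both count as "inside".
const sameDir = isSameOrChildPathSketch(legacyDirSketch, legacyDirSketch);
const nestedDir = isSameOrChildPathSketch(`${legacyDirSketch}/user-data`, legacyDirSketch);

// A sibling that merely shares the prefix does not.
const siblingDir = isSameOrChildPathSketch(`${legacyDirSketch}-old`, legacyDirSketch);

console.log({ sameDir, nestedDir, siblingDir });
```

A bare `startsWith(parent)` without the separator would accept the sibling case, which in this change would wrongly mark the legacy profile as configured and skip the archive.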
--- a/extensions/codex/src/app-server/event-projector.test.ts
+++ b/extensions/codex/src/app-server/event-projector.test.ts
@@ -1406,6 +1406,46 @@ describe("CodexAppServerEventProjector", () => {
    expect(toolResult.result).toEqual({ status: "completed", exitCode: 0, durationMs: 42 });
  });

+  it("uses streamed command output for failed native tool errors", async () => {
+    const projector = await createProjector();
+
+    await projector.handleNotification(
+      forCurrentTurn("item/commandExecution/outputDelta", {
+        itemId: "cmd-streamed-failure",
+        delta: "fatal: missing fixture\n",
+      }),
+    );
+    await projector.handleNotification(
+      turnCompleted([
+        {
+          type: "commandExecution",
+          id: "cmd-streamed-failure",
+          command: "pnpm test extensions/codex",
+          cwd: "/workspace",
+          processId: null,
+          source: "agent",
+          status: "failed",
+          commandActions: [],
+          aggregatedOutput: null,
+          exitCode: 1,
+          durationMs: 42,
+        },
+      ]),
+    );
+
+    expect(projector.buildResult(buildEmptyToolTelemetry()).lastToolError).toEqual({
+      toolName: "bash",
+      meta: "run tests (workspace)",
+      error: "fatal: missing fixture",
+      mutatingAction: true,
+      actionFingerprint: JSON.stringify({
+        type: "commandExecution",
+        command: "pnpm test extensions/codex",
+        cwd: "/workspace",
+      }),
+    });
+  });
+
  it("does not duplicate native tool starts when the snapshot completes a started item", async () => {
    const onAgentEvent = vi.fn();
    const trajectoryRecorder = {
@@ -1609,6 +1649,121 @@ describe("CodexAppServerEventProjector", () => {
        toolCallId: "cmd-declined",
      },
    ]);
+    expect(projector.buildResult(buildEmptyToolTelemetry()).lastToolError).toEqual({
+      toolName: "bash",
+      meta: "run tests (workspace)",
+      error: "codex native tool blocked",
+      mutatingAction: true,
+      actionFingerprint: JSON.stringify({
+        type: "commandExecution",
+        command: "pnpm test extensions/codex",
+        cwd: "/workspace",
+      }),
+    });
+  });
+
+  it("clears a recovered declined native tool error", async () => {
+    const projector = await createProjector();
+
+    await projector.handleNotification(
+      forCurrentTurn("item/completed", {
+        item: {
+          type: "commandExecution",
+          id: "cmd-declined",
+          command: "pnpm test extensions/codex",
+          cwd: "/workspace",
+          processId: null,
+          source: "agent",
+          status: "declined",
+          commandActions: [],
+          aggregatedOutput: null,
+          exitCode: null,
+          durationMs: 1,
+        },
+      }),
+    );
+    expect(projector.buildResult(buildEmptyToolTelemetry()).lastToolError).toEqual({
+      toolName: "bash",
+      meta: "run tests (workspace)",
+      error: "codex native tool blocked",
+      mutatingAction: true,
+      actionFingerprint: JSON.stringify({
+        type: "commandExecution",
+        command: "pnpm test extensions/codex",
+        cwd: "/workspace",
+      }),
+    });
+
+    await projector.handleNotification(
+      forCurrentTurn("item/completed", {
+        item: {
+          type: "commandExecution",
+          id: "cmd-recovered",
+          command: "pnpm test extensions/codex",
+          cwd: "/workspace",
+          processId: null,
+          source: "agent",
+          status: "completed",
+          commandActions: [],
+          aggregatedOutput: "ok",
+          exitCode: 0,
+          durationMs: 42,
+        },
+      }),
+    );
+
+    expect(projector.buildResult(buildEmptyToolTelemetry()).lastToolError).toBeUndefined();
+  });
+
+  it("does not clear a declined native tool error with a different action", async () => {
+    const projector = await createProjector();
+
+    await projector.handleNotification(
+      forCurrentTurn("item/completed", {
+        item: {
+          type: "commandExecution",
+          id: "cmd-declined",
+          command: "pnpm test extensions/codex",
+          cwd: "/workspace",
+          processId: null,
+          source: "agent",
+          status: "declined",
+          commandActions: [],
+          aggregatedOutput: null,
+          exitCode: null,
+          durationMs: 1,
+        },
+      }),
+    );
+    await projector.handleNotification(
+      forCurrentTurn("item/completed", {
+        item: {
+          type: "commandExecution",
+          id: "cmd-unrelated-success",
+          command: "pnpm test src/foo.test.ts",
+          cwd: "/workspace",
+          processId: null,
+          source: "agent",
+          status: "completed",
+          commandActions: [],
+          aggregatedOutput: "ok",
+          exitCode: 0,
+          durationMs: 42,
+        },
+      }),
+    );
+
+    expect(projector.buildResult(buildEmptyToolTelemetry()).lastToolError).toEqual({
+      toolName: "bash",
+      meta: "run tests (workspace)",
+      error: "codex native tool blocked",
+      mutatingAction: true,
+      actionFingerprint: JSON.stringify({
+        type: "commandExecution",
+        command: "pnpm test extensions/codex",
+        cwd: "/workspace",
+      }),
+    });
  });

  it("emits after_tool_call observations for Codex-native tool item completions", async () => {
--- a/extensions/codex/src/app-server/event-projector.ts
+++ b/extensions/codex/src/app-server/event-projector.ts
@@ -152,6 +152,7 @@ export class CodexAppServerEventProjector {
  private readonly toolTranscriptCallIds = new Set<string>();
  private readonly toolTranscriptResultIds = new Set<string>();
  private readonly transcriptToolProgressCallIds = new Set<string>();
+  private lastNativeToolError: EmbeddedRunAttemptResult["lastToolError"];
  private readonly nativeGeneratedMediaUrls = new Set<string>();
  private readonly diagnosticToolStartedAtByItem = new Map<string, number>();
  private readonly afterToolCallObservedItemIds = new Set<string>();
@@ -338,6 +339,7 @@ export class CodexAppServerEventProjector {
      assistantTexts,
      toolMetas: [...this.toolMetas.values()],
      lastAssistant,
+      ...(this.lastNativeToolError ? { lastToolError: this.lastNativeToolError } : {}),
      didSendViaMessagingTool: toolTelemetry.didSendViaMessagingTool,
      messagingToolSentTexts: toolTelemetry.messagingToolSentTexts,
      messagingToolSentMediaUrls: toolTelemetry.messagingToolSentMediaUrls,
@@ -907,15 +909,18 @@ export class CodexAppServerEventProjector {
    }
    const status = params.phase === "result" ? itemStatus(item) : "running";
    const args = itemToolArgs(item);
+    const meta = itemMeta(item, this.toolProgressDetailMode());
    this.recordToolTrajectoryEvent({ phase: params.phase, item, name, args, status });
    this.emitDiagnosticToolExecutionEvent({ phase: params.phase, item, name, status });
+    if (params.phase === "result") {
+      this.recordNativeToolError({ item, name, meta, status });
+    }
    if (!shouldEmitTranscriptToolProgress(name, args)) {
      if (params.phase === "result") {
        this.emitAfterToolCallObservation(item);
      }
      return;
    }
-    const meta = itemMeta(item, this.toolProgressDetailMode());
    this.emitAgentEvent({
      stream: "tool",
      data: {
@@ -939,6 +944,41 @@ export class CodexAppServerEventProjector {
    }
  }

+  private recordNativeToolError(params: {
+    item: CodexThreadItem;
+    name: string;
+    meta?: string;
+    status: ReturnType<typeof itemStatus>;
+  }): void {
+    if (!isNonSuccessItemStatus(params.status)) {
+      if (!this.lastNativeToolError) {
+        return;
+      }
+      if (!this.lastNativeToolError.mutatingAction) {
+        this.lastNativeToolError = undefined;
+        return;
+      }
+      const actionFingerprint = nativeToolActionFingerprint(params.item);
+      if (
+        this.lastNativeToolError.actionFingerprint &&
+        actionFingerprint &&
+        this.lastNativeToolError.actionFingerprint === actionFingerprint
+      ) {
+        this.lastNativeToolError = undefined;
+      }
+      return;
+    }
+    const error = itemToolError(params.item, params.status, this.toolResultOutputTextByItem);
+    const actionFingerprint = nativeToolActionFingerprint(params.item);
+    this.lastNativeToolError = {
+      toolName: params.name,
+      ...(params.meta ? { meta: params.meta } : {}),
+      ...(error ? { error } : {}),
+      ...(isMutatingNativeToolItem(params.item) ? { mutatingAction: true } : {}),
+      ...(actionFingerprint ? { actionFingerprint } : {}),
+    };
+  }
+
  private recordToolTrajectoryEvent(params: {
    phase: "start" | "result";
    item: CodexThreadItem;
@@ -1709,6 +1749,27 @@ function shouldSynthesizeToolProgressForItem(item: CodexThreadItem): boolean {
  }
 }

+function isMutatingNativeToolItem(item: CodexThreadItem): boolean {
+  return item.type === "commandExecution" || item.type === "fileChange";
+}
+
+function nativeToolActionFingerprint(item: CodexThreadItem): string | undefined {
+  if (item.type === "commandExecution" && typeof item.command === "string") {
+    return JSON.stringify({
+      type: item.type,
+      command: item.command,
+      cwd: typeof item.cwd === "string" ? item.cwd : "",
+    });
+  }
+  if (item.type === "fileChange") {
+    return JSON.stringify({
+      type: item.type,
+      changes: itemFileChanges(item),
+    });
+  }
+  return undefined;
+}
+
 function isNativePostToolUseRelayItem(item: CodexThreadItem): boolean {
  switch (item.type) {
    case "commandExecution":
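Review note: the success branch of `recordNativeToolError` above clears a stored error unconditionally when it was non-mutating, but clears a mutating error only when the successful item has the same action fingerprint. The sketch below restates that branch as a pure function to make the three outcomes easier to check. `ToolErrorSketch` and `errorAfterSuccessSketch` are hypothetical names, and the shape is reduced to the fields the branch reads.

```typescript
// Hypothetical reduced shape; the real type is EmbeddedRunAttemptResult["lastToolError"].
type ToolErrorSketch = {
  toolName: string;
  mutatingAction?: boolean;
  actionFingerprint?: string;
};

// Returns the error that should remain after a *successful* tool result.
function errorAfterSuccessSketch(
  last: ToolErrorSketch | undefined,
  successFingerprint: string | undefined,
): ToolErrorSketch | undefined {
  if (!last) {
    return undefined;
  }
  if (!last.mutatingAction) {
    // Any later success clears a non-mutating error.
    return undefined;
  }
  if (
    last.actionFingerprint &&
    successFingerprint &&
    last.actionFingerprint === successFingerprint
  ) {
    // The same mutating action succeeded on retry, so the error is recovered.
    return undefined;
  }
  // A different (or unfingerprinted) success leaves the mutating error in place.
  return last;
}

const declinedSketch: ToolErrorSketch = {
  toolName: "bash",
  mutatingAction: true,
  actionFingerprint: JSON.stringify({ command: "pnpm test extensions/codex", cwd: "/workspace" }),
};

const sameAction = errorAfterSuccessSketch(declinedSketch, declinedSketch.actionFingerprint);
const otherAction = errorAfterSuccessSketch(
  declinedSketch,
  JSON.stringify({ command: "pnpm test src/foo.test.ts", cwd: "/workspace" }),
);

console.log({ sameActionCleared: sameAction === undefined, otherActionKept: otherAction === declinedSketch });
```

This mirrors the two new tests: "clears a recovered declined native tool error" and "does not clear a declined native tool error with a different action".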
--- a/extensions/github-copilot/connection-bound-ids.test.ts
+++ b/extensions/github-copilot/connection-bound-ids.test.ts
@@ -2,6 +2,7 @@ import { describe, expect, it } from "vitest";
 import {
  rewriteCopilotConnectionBoundResponseIds,
  rewriteCopilotResponsePayloadConnectionBoundIds,
+  sanitizeCopilotReplayResponseIds,
 } from "./connection-bound-ids.js";

 describe("github-copilot connection-bound response IDs", () => {
@@ -35,7 +36,7 @@ describe("github-copilot connection-bound response IDs", () => {
    expect(input[4]?.id).toMatch(/^msg_[a-f0-9]{16}$/);
  });

-  it("preserves reasoning IDs regardless of encrypted_content", () => {
+  it("preserves valid reasoning IDs regardless of encrypted_content", () => {
    const withEncrypted = Buffer.from(`reasoning-${"e".repeat(24)}`).toString("base64");
    const withNull = Buffer.from(`reasoning-${"n".repeat(24)}`).toString("base64");
    const withoutField = Buffer.from(`reasoning-${"a".repeat(24)}`).toString("base64");
@@ -51,6 +52,38 @@ describe("github-copilot connection-bound response IDs", () => {
    expect(input[2]?.id).toBe(withoutField);
  });

+  it("preserves valid base64-ish reasoning IDs with and without encrypted content", () => {
+    const withEncrypted = "abcDEF0123+/=";
+    const withoutEncrypted = "reasoning/abc+123=";
+    const input = [
+      { id: withEncrypted, type: "reasoning", encrypted_content: "opaque-encrypted-payload" },
+      { id: withoutEncrypted, type: "reasoning" },
+    ];
+
+    expect(sanitizeCopilotReplayResponseIds(input)).toBe(false);
+    expect(input.map((item) => item.id)).toEqual([withEncrypted, withoutEncrypted]);
+  });
+
+  it("drops unsafe reasoning replay items instead of stripping their IDs", () => {
+    const overlongId = `5PX6gLHXT5wE+Y2tPmUV4gn+${"B".repeat(384)}`;
+    const input = [
+      {
+        id: overlongId,
+        type: "reasoning",
+        encrypted_content: "encrypted-replay-payload",
+        summary: [],
+      },
+      { type: "reasoning", encrypted_content: "missing-id", summary: [] },
+      { id: 123, type: "reasoning", encrypted_content: "non-string-id", summary: [] },
+      { id: "rs_valid", type: "reasoning", encrypted_content: "valid", summary: [] },
+    ];
+
+    expect(sanitizeCopilotReplayResponseIds(input)).toBe(true);
+    expect(input).toEqual([
+      { id: "rs_valid", type: "reasoning", encrypted_content: "valid", summary: [] },
+    ]);
+  });
+
  it("patches response payload input arrays only", () => {
    const messageId = Buffer.from(`message-${"m".repeat(24)}`).toString("base64");
    const payload = { input: [{ id: messageId, type: "message" }] };
--- a/extensions/github-copilot/connection-bound-ids.ts
+++ b/extensions/github-copilot/connection-bound-ids.ts
@@ -2,7 +2,7 @@ import { createHash } from "node:crypto";

 // Copilot's OpenAI-compatible `/responses` endpoint can emit replay item IDs
 // that encode upstream connection state. Those IDs are rejected after the
-// connection changes, so normalize them at the provider boundary before send.
+// connection changes, so sanitize them at the provider boundary before send.

 function looksLikeConnectionBoundId(id: string): boolean {
  if (id.length < 24) {
@@ -25,21 +25,36 @@ function deriveReplacementId(type: string | undefined, originalId: string): stri

 type InputItem = Record<string, unknown> & { id?: unknown; type?: unknown };

-export function rewriteCopilotConnectionBoundResponseIds(input: unknown): boolean {
+function isInputItem(value: unknown): value is InputItem {
+  return !!value && typeof value === "object";
+}
+
+function isValidReasoningReplayId(id: unknown): id is string {
+  return typeof id === "string" && id.length > 0 && id.length <= 64;
+}
+
+export function sanitizeCopilotReplayResponseIds(input: unknown): boolean {
  if (!Array.isArray(input)) {
    return false;
  }
  let rewrote = false;
-  for (const item of input as InputItem[]) {
-    const id = item.id;
-    if (typeof id !== "string" || id.length === 0) {
+  for (let index = input.length - 1; index >= 0; index -= 1) {
+    const item = input[index];
+    if (!isInputItem(item)) {
      continue;
    }
+    const id = item.id;
    // Reasoning items always reference server-side encrypted state bound to the
-    // original item ID. Rewriting the ID — even when encrypted_content is absent
-    // or null — breaks Copilot's server-side lookup and causes a 400 validation
-    // failure regardless of whether the client included encrypted_content.
+    // original item ID. Rewriting or stripping that ID can turn replay into an
+    // invalid or ambiguous server-state lookup, so drop unsafe reasoning items.
    if (item.type === "reasoning") {
+      if (!isValidReasoningReplayId(id)) {
+        input.splice(index, 1);
+        rewrote = true;
+      }
+      continue;
+    }
+    if (typeof id !== "string" || id.length === 0) {
      continue;
    }
    if (looksLikeConnectionBoundId(id)) {
@@ -50,9 +65,17 @@ export function rewriteCopilotConnectionBoundResponseIds(input: unknown): boolea
  return rewrote;
 }

-export function rewriteCopilotResponsePayloadConnectionBoundIds(payload: unknown): boolean {
+export function rewriteCopilotConnectionBoundResponseIds(input: unknown): boolean {
+  return sanitizeCopilotReplayResponseIds(input);
+}
+
+export function sanitizeCopilotReplayResponsePayloadIds(payload: unknown): boolean {
  if (!payload || typeof payload !== "object") {
    return false;
  }
-  return rewriteCopilotConnectionBoundResponseIds((payload as { input?: unknown }).input);
+  return sanitizeCopilotReplayResponseIds((payload as { input?: unknown }).input);
+}
+
+export function rewriteCopilotResponsePayloadConnectionBoundIds(payload: unknown): boolean {
+  return sanitizeCopilotReplayResponsePayloadIds(payload);
 }
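Review note: `sanitizeCopilotReplayResponseIds` now walks the input from the last index down so that `splice` can remove an item without shifting the indices still to be visited. The sketch below isolates that pattern for reasoning items only. `ReplayItemSketch` and `dropInvalidReasoningSketch` are hypothetical names, and the 64-character limit is taken from `isValidReasoningReplayId` in the diff.

```typescript
// Hypothetical reduced item shape for illustration.
type ReplayItemSketch = { id?: unknown; type?: unknown };

// Removes reasoning items whose id is missing, non-string, empty, or too long.
// Mutates the array in place and reports whether anything was dropped.
function dropInvalidReasoningSketch(input: ReplayItemSketch[], maxIdLength = 64): boolean {
  let changed = false;
  // Iterate backwards: splicing at `index` only moves items at higher
  // indices, which have already been visited.
  for (let index = input.length - 1; index >= 0; index -= 1) {
    const item = input[index];
    if (!item || item.type !== "reasoning") {
      continue;
    }
    const id = item.id;
    const valid = typeof id === "string" && id.length > 0 && id.length <= maxIdLength;
    if (!valid) {
      input.splice(index, 1);
      changed = true;
    }
  }
  return changed;
}

const replaySketch: ReplayItemSketch[] = [
  { id: "B".repeat(400), type: "reasoning" }, // overlong id
  { type: "reasoning" }, // missing id
  { id: "rs_valid", type: "reasoning" },
  { id: "msg_1", type: "message" },
];

const droppedSketch = dropInvalidReasoningSketch(replaySketch);
console.log(droppedSketch, replaySketch.map((item) => item.id));
```

A forward loop with `splice` would skip the element that slides into the removed slot, so two adjacent invalid reasoning items (as in the sample above) would leave one behind.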
--- a/extensions/github-copilot/stream.test.ts
+++ b/extensions/github-copilot/stream.test.ts
@@ -118,14 +118,21 @@ describe("wrapCopilotAnthropicStream", () => {
    expect(baseStreamFn.mock.calls).toEqual([[model, context, options]]);
  });

-  it("adds Copilot headers, preserves reasoning IDs, and rewrites message IDs before payload send", () => {
+  it("adds Copilot headers, sanitizes reasoning replay, and rewrites message IDs before payload send", () => {
    const reasoningId = Buffer.from(`reasoning-${"x".repeat(24)}`).toString("base64");
+    const overlongReasoningId = `5PX6gLHXT5wE+Y2tPmUV4gn+${"B".repeat(384)}`;
    const messageId = Buffer.from(`message-${"y".repeat(24)}`).toString("base64");
    const payloads: Array<{ input: Array<Record<string, unknown>> }> = [];
    const baseStreamFn = vi.fn((_model, _context, options) => {
      const payload = {
        input: [
-          { id: reasoningId, type: "reasoning" },
+          { id: reasoningId, type: "reasoning", encrypted_content: "valid-encrypted-payload" },
+          {
+            id: overlongReasoningId,
+            type: "reasoning",
+            encrypted_content: "invalid-encrypted-payload",
+            summary: [],
+          },
          { id: messageId, type: "message" },
        ],
      };
@@ -174,6 +181,7 @@ describe("wrapCopilotAnthropicStream", () => {
      onPayload: options.onPayload,
    });
    expect(payloads[0]?.input[0]?.id).toBe(reasoningId);
+    expect(payloads[0]?.input.map((item) => item.type)).toEqual(["reasoning", "message"]);
    expect(payloads[0]?.input[1]?.id).toMatch(/^msg_[a-f0-9]{16}$/);
  });

--- a/extensions/qa-lab/src/cli.runtime.test.ts
+++ b/extensions/qa-lab/src/cli.runtime.test.ts
@@ -965,6 +965,108 @@ describe("qa cli runtime", () => {
    }
  });

+  it("writes a runtime-axis token-efficiency report when requested", async () => {
+    const repoRoot = await fs.mkdtemp(path.join(os.tmpdir(), "qa-runtime-token-efficiency-"));
+    const priorExitCode = process.exitCode;
+    process.exitCode = undefined;
+
+    try {
+      await fs.writeFile(
+        path.join(repoRoot, "runtime-summary.json"),
+        JSON.stringify({
+          scenarios: [
+            {
+              name: "runtime-tool-fs-read",
+              status: "pass",
+              steps: [],
+              runtimeParity: {
+                scenarioId: "runtime-tool-fs-read",
+                drift: "none",
+                cells: {
+                  pi: {
+                    runtime: "pi",
+                    transcriptBytes: '{"role":"assistant"}\n',
+                    toolCalls: [{ tool: "fs.read", argsHash: "a", resultHash: "r" }],
+                    finalText: "done",
+                    usage: { inputTokens: 72_000, outputTokens: 381, totalTokens: 72_381 },
+                    wallClockMs: 10,
+                    bootStateLines: [],
+                  },
+                  codex: {
+                    runtime: "codex",
+                    transcriptBytes: '{"role":"assistant"}\n',
+                    toolCalls: Array.from({ length: 40 }, (_, index) => ({
+                      tool: "fs.read",
+                      argsHash: `a-${index}`,
+                      resultHash: `r-${index}`,
+                    })),
+                    finalText: "done",
+                    usage: { inputTokens: 118_000, outputTokens: 1_489, totalTokens: 119_489 },
+                    wallClockMs: 10,
+                    bootStateLines: [],
+                  },
+                },
+              },
+            },
+          ],
+          counts: { total: 1, passed: 1, failed: 0 },
+          run: {
+            providerMode: "live-frontier",
+            primaryModel: "openai/gpt-5.5",
+            runtimePair: ["pi", "codex"],
+          },
+        }),
+        "utf8",
+      );
+
+      await runQaParityReportCommand({
+        repoRoot,
+        runtimeAxis: true,
+        summary: "runtime-summary.json",
+        tokenEfficiency: true,
+      });
+
+      expect(process.exitCode).toBe(1);
+      expect(stdoutWrite).toHaveBeenCalledWith(
+        expect.stringContaining("QA runtime parity verdict: pass"),
+      );
+      expect(stdoutWrite).toHaveBeenCalledWith(
+        expect.stringContaining("QA runtime token efficiency report:"),
+      );
+      expect(stdoutWrite).toHaveBeenCalledWith(
+        expect.stringContaining("QA runtime token efficiency verdict: fail"),
+      );
+      const [artifactDir] = await fs.readdir(path.join(repoRoot, ".artifacts", "qa-e2e"));
+      const tokenSummary = JSON.parse(
+        await fs.readFile(
+          path.join(
+            repoRoot,
+            ".artifacts",
+            "qa-e2e",
+            artifactDir ?? "",
+            "qa-runtime-token-efficiency-summary.json",
+          ),
+          "utf8",
+        ),
+      ) as { aggregate?: { flaggedScenarios?: string[] } };
+      expect(tokenSummary.aggregate?.flaggedScenarios).toEqual(["runtime-tool-fs-read"]);
+    } finally {
+      process.exitCode = priorExitCode;
+      await fs.rm(repoRoot, { recursive: true, force: true });
+    }
+  });
+
+  it("rejects token-efficiency without runtime-axis mode", async () => {
+    await expect(
+      runQaParityReportCommand({
+        repoRoot: process.cwd(),
+        candidateSummary: "candidate.json",
+        baselineSummary: "baseline.json",
+        tokenEfficiency: true,
+      }),
+    ).rejects.toThrow("--token-efficiency requires --runtime-axis.");
+  });
+
  it("prints a markdown coverage report from scenario metadata", async () => {
    await runQaCoverageReportCommand({ repoRoot: process.cwd() });

@@ -979,6 +1081,64 @@ describe("qa cli runtime", () => {
    expectWriteContains(stdoutWrite, "codex-native-workspace");
  });

+  it("exits nonzero when tool coverage summary is missing a required runtime tool call", async () => {
+    const priorExitCode = process.exitCode;
+    const repoRoot = await fs.mkdtemp(path.join(os.tmpdir(), "qa-tool-coverage-"));
+    try {
+      await fs.writeFile(
+        path.join(repoRoot, "runtime-summary.json"),
+        JSON.stringify({
+          scenarios: [
+            {
+              name: "runtime-tool-web-search",
+              status: "fail",
+              runtimeParity: {
+                scenarioId: "runtime-tool-web-search",
+                drift: "tool-call-shape",
+                driftDetails: "Codex emitted no web_search call",
+                cells: {
+                  pi: {
+                    runtime: "pi",
+                    transcriptBytes: "",
+                    toolCalls: [{ tool: "web_search", argsHash: "a", resultHash: "r" }],
+                    finalText: "",
+                    usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
+                    wallClockMs: 1,
+                    bootStateLines: [],
+                  },
+                  codex: {
+                    runtime: "codex",
+                    transcriptBytes: "",
+                    toolCalls: [],
+                    finalText: "",
+                    usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
+                    wallClockMs: 1,
+                    bootStateLines: [],
+                  },
+                },
+              },
+            },
+          ],
+          run: { runtimePair: ["pi", "codex"] },
+        }),
+        "utf8",
+      );
+
+      await runQaCoverageReportCommand({
+        repoRoot,
+        tools: true,
+        summary: "runtime-summary.json",
+      });
+
+      expect(process.exitCode).toBe(1);
+      expectWriteContains(stdoutWrite, "- Verdict: fail");
+      expectWriteContains(stdoutWrite, "web-search missing codex tool call web_search");
+    } finally {
+      process.exitCode = priorExitCode;
+      await fs.rm(repoRoot, { recursive: true, force: true });
+    }
+  });
+
  it("resolves character eval paths and passes model refs through", async () => {
    await runQaCharacterEvalCommand({
      repoRoot: "/tmp/openclaw-repo",
--- a/extensions/qa-lab/src/cli.runtime.ts
+++ b/extensions/qa-lab/src/cli.runtime.ts
@@ -50,6 +50,11 @@ import {
 import { resolveQaScenarioPackScenarioIds } from "./scenario-packs.js";
 import { runQaSuiteFromRuntime } from "./suite-launch.runtime.js";
 import { readQaSuiteFailedScenarioCountFromSummary } from "./suite-summary.js";
+import {
+  buildTokenEfficiencyReport,
+  renderTokenEfficiencyMarkdownReport,
+  type TokenEfficiencySuiteSummary,
+} from "./token-efficiency-report.js";
 import {
  buildQaToolCoverageReport,
  renderQaToolCoverageMarkdownReport,
@@ -681,8 +686,12 @@ export async function runQaParityReportCommand(opts: {
  outputDir?: string;
  runtimeAxis?: boolean;
  summary?: string;
+  tokenEfficiency?: boolean;
 }) {
  const repoRoot = path.resolve(opts.repoRoot ?? process.cwd());
+  if (opts.tokenEfficiency === true && opts.runtimeAxis !== true) {
+    throw new Error("--token-efficiency requires --runtime-axis.");
+  }
  const outputDir =
    resolveRepoRelativeOutputDir(repoRoot, opts.outputDir) ??
    path.join(repoRoot, ".artifacts", "qa-e2e", `parity-${Date.now().toString(36)}`);
@@ -706,7 +715,26 @@ export async function runQaParityReportCommand(opts: {
    process.stdout.write(`QA runtime parity report: ${reportPath}\n`);
    process.stdout.write(`QA runtime parity summary: ${runtimeSummaryPath}\n`);
    process.stdout.write(`QA runtime parity verdict: ${reportPayload.pass ? "pass" : "fail"}\n`);
-    if (!reportPayload.pass) {
+
+    let tokenEfficiencyPass = true;
+    if (opts.tokenEfficiency === true) {
+      const tokenPayload = buildTokenEfficiencyReport({
+        summary: summary as TokenEfficiencySuiteSummary,
+      });
+      tokenEfficiencyPass = tokenPayload.pass;
+      const tokenReport = renderTokenEfficiencyMarkdownReport(tokenPayload);
+      const tokenReportPath = path.join(outputDir, "qa-runtime-token-efficiency-report.md");
+      const tokenSummaryPath = path.join(outputDir, "qa-runtime-token-efficiency-summary.json");
+      await fs.writeFile(tokenReportPath, tokenReport, "utf8");
+      await fs.writeFile(tokenSummaryPath, `${JSON.stringify(tokenPayload, null, 2)}\n`, "utf8");
+      process.stdout.write(`QA runtime token efficiency report: ${tokenReportPath}\n`);
+      process.stdout.write(`QA runtime token efficiency summary: ${tokenSummaryPath}\n`);
+      process.stdout.write(
+        `QA runtime token efficiency verdict: ${tokenPayload.status === "skipped" ? "skipped" : tokenPayload.pass ? "pass" : "fail"}\n`,
+      );
+    }
+
+    if (!reportPayload.pass || !tokenEfficiencyPass) {
      process.exitCode = 1;
    }
    return;
@@ -769,6 +797,9 @@ export async function runQaCoverageReportCommand(opts: {
      ? `${JSON.stringify(report, null, 2)}\n`
      : renderQaToolCoverageMarkdownReport(report);
    outputLabel = "QA tool coverage report";
+    if (summary && !report.pass) {
+      process.exitCode = 1;
+    }
  } else {
    if (opts.summary?.trim()) {
      throw new Error("--summary requires --tools.");
--- a/extensions/qa-lab/src/cli.ts
+++ b/extensions/qa-lab/src/cli.ts
@@ -66,6 +66,7 @@ async function runQaParityReport(opts: {
  outputDir?: string;
  runtimeAxis?: boolean;
  summary?: string;
+  tokenEfficiency?: boolean;
 }) {
  const runtime = await loadQaLabCliRuntime();
  await runtime.runQaParityReportCommand(opts);
@@ -353,6 +354,11 @@ export function registerQaLabCli(program: Command) {
    .option("--baseline-summary <path>", "Baseline qa-suite-summary.json path")
    .option("--runtime-axis", "Interpret --summary as a runtime-pair qa-suite-summary.json", false)
    .option("--summary <path>", "Runtime-axis qa-suite-summary.json path")
+    .option(
+      "--token-efficiency",
+      "Also write the runtime token-efficiency report for --runtime-axis summaries",
+      false,
+    )
    .option("--repo-root <path>", "Repository root to target when running from a neutral cwd")
    .option(
      "--candidate-label <label>",
@@ -371,6 +377,7 @@ export function registerQaLabCli(program: Command) {
        outputDir?: string;
        runtimeAxis?: boolean;
        summary?: string;
+        tokenEfficiency?: boolean;
      }) => {
        await runQaParityReport(opts);
      },
--- a/extensions/qa-lab/src/scenario-catalog.test.ts
+++ b/extensions/qa-lab/src/scenario-catalog.test.ts
@@ -120,6 +120,7 @@ describe("qa scenario catalog", () => {
    const applyPatch = readQaScenarioById("runtime-tool-apply-patch");
    const messageTool = readQaScenarioById("runtime-tool-message-tool");
    const tavilySearch = readQaScenarioById("runtime-tool-tavily-search");
+    const webSearch = readQaScenarioById("runtime-tool-web-search");

    expect(applyPatch.runtimeParityTier).toBe("standard");
    expect(messageTool.runtimeParityTier).toBe("optional");
@@ -140,6 +141,16 @@ describe("qa scenario catalog", () => {
        required: false,
      },
    });
+    expect(readQaScenarioExecutionConfig(webSearch.id)).toMatchObject({
+      toolName: "web_search",
+      toolCoverage: {
+        bucket: "openclaw-dynamic-integration",
+        expectedLayer: "openclaw-dynamic",
+        capabilityLayer: "openclaw-dynamic-direct",
+        required: true,
+      },
+    });
+    expect(readQaScenarioExecutionConfig(webSearch.id)).not.toHaveProperty("knownHarnessGap");
  });

  it("loads the Codex Pi-shaped Read vocabulary live parity canary", () => {
--- a/extensions/qa-lab/src/token-efficiency-report.test.ts
+++ b/extensions/qa-lab/src/token-efficiency-report.test.ts
@@ -0,0 +1,191 @@
+import { describe, expect, it } from "vitest";
+import type {
+  RuntimeId,
+  RuntimeParityCell,
+  RuntimeParityResult,
+  RuntimeParityToolCall,
+} from "./runtime-parity.js";
+import {
+  buildTokenEfficiencyReport,
+  renderTokenEfficiencyMarkdownReport,
+  type TokenEfficiencySuiteSummary,
+} from "./token-efficiency-report.js";
+
+function makeToolCall(tool: string): RuntimeParityToolCall {
+  return {
+    tool,
+    argsHash: `${tool}-args`,
+    resultHash: `${tool}-result`,
+  };
+}
+
+function makeCell(
+  runtime: RuntimeId,
+  usage: RuntimeParityCell["usage"],
+  toolCalls: RuntimeParityToolCall[] = [],
+): RuntimeParityCell {
+  return {
+    runtime,
+    transcriptBytes: '{"role":"assistant"}\n',
+    toolCalls,
+    finalText: "done",
+    usage,
+    wallClockMs: 10,
+    bootStateLines: [],
+  };
+}
+
+function makeRuntimeParity(
+  scenarioId: string,
+  pi: RuntimeParityCell,
+  codex: RuntimeParityCell,
+): RuntimeParityResult {
+  return {
+    scenarioId,
+    drift: "none",
+    cells: { pi, codex },
+  };
+}
+
+function makeLiveSummary(runtimeParity: RuntimeParityResult[]): TokenEfficiencySuiteSummary {
+  return {
+    scenarios: runtimeParity.map((result) => ({
+      name: result.scenarioId,
+      status: "pass" as const,
+      runtimeParity: result,
+    })),
+    run: {
+      providerMode: "live-frontier",
+      runtimePair: ["pi", "codex"],
+    },
+  };
+}
+
+describe("token efficiency report", () => {
+  it("does not fail live reports solely because Codex uses fewer tokens", () => {
+    const report = buildTokenEfficiencyReport({
+      generatedAt: "2026-05-10T00:00:00.000Z",
+      summary: makeLiveSummary([
+        makeRuntimeParity(
+          "codex-savings",
+          makeCell("pi", { inputTokens: 120, outputTokens: 80, totalTokens: 200 }),
+          makeCell("codex", { inputTokens: 60, outputTokens: 40, totalTokens: 100 }),
+        ),
+      ]),
+    });
+
+    expect(report.pass).toBe(true);
+    expect(report.aggregate.flaggedScenarios).toEqual([]);
+    expect(report.aggregate.savingsScenarios).toEqual(["codex-savings"]);
+    expect(report.rows[0]).toMatchObject({
+      deltaPercent: -50,
+      classification: "savings",
+      flagged: false,
+    });
+  });
+
+  it("fails live reports on positive Codex token increases over the threshold", () => {
+    const report = buildTokenEfficiencyReport({
+      generatedAt: "2026-05-10T00:00:00.000Z",
+      summary: makeLiveSummary([
+        makeRuntimeParity(
+          "runtime-tool-fs-read",
+          makeCell("pi", { inputTokens: 72_000, outputTokens: 381, totalTokens: 72_381 }, [
+            makeToolCall("fs.read"),
+            makeToolCall("fs.read"),
+          ]),
+          makeCell(
+            "codex",
+            { inputTokens: 118_000, outputTokens: 1_489, totalTokens: 119_489 },
+            Array.from({ length: 40 }, () => makeToolCall("fs.read")),
+          ),
+        ),
+      ]),
+    });
+
+    expect(report.pass).toBe(false);
+    expect(report.aggregate.flaggedScenarios).toEqual(["runtime-tool-fs-read"]);
+    expect(report.rows[0]).toMatchObject({
+      classification: "regression",
+      flagged: true,
+      toolsUsed: ["fs.read"],
+    });
+    expect(report.failures).toEqual([
+      "runtime-tool-fs-read token delta=+65.1% exceeds 15.0% Codex increase threshold",
+    ]);
+  });
+
+  it("keeps live zero-usage rows failing instead of passing as neutral", () => {
+    const report = buildTokenEfficiencyReport({
+      summary: makeLiveSummary([
+        makeRuntimeParity(
+          "missing-live-usage",
+          makeCell("pi", { inputTokens: 0, outputTokens: 0, totalTokens: 0 }),
+          makeCell("codex", { inputTokens: 0, outputTokens: 0, totalTokens: 0 }),
+        ),
+      ]),
+    });
+
+    expect(report.pass).toBe(false);
+    expect(report.failures).toEqual([
+      "missing-live-usage pi live usage totalTokens=0",
+      "missing-live-usage codex live usage totalTokens=0",
+    ]);
+  });
+
+  it("labels mock-estimated Codex increases as regressions without failing the live gate", () => {
+    const report = buildTokenEfficiencyReport({
+      summary: {
+        scenarios: [
+          {
+            name: "mock-regression",
+            status: "pass",
+            runtimeParity: makeRuntimeParity(
+              "mock-regression",
+              makeCell("pi", { inputTokens: 100, outputTokens: 0, totalTokens: 100 }),
+              makeCell("codex", { inputTokens: 130, outputTokens: 0, totalTokens: 130 }),
+            ),
+          },
+        ],
+        run: {
+          providerMode: "mock-openai",
+          runtimePair: ["pi", "codex"],
+        },
+      },
+    });
+
+    expect(report.status).toBe("estimated");
+    expect(report.pass).toBe(true);
+    expect(report.aggregate.flaggedScenarios).toEqual([]);
+    expect(report.rows[0]).toMatchObject({
+      usageSource: "mock-estimate",
+      classification: "regression",
+      flagged: false,
+    });
+  });
+
+  it("renders savings and regression classifications in the markdown report", () => {
+    const report = buildTokenEfficiencyReport({
+      generatedAt: "2026-05-10T00:00:00.000Z",
+      summary: makeLiveSummary([
+        makeRuntimeParity(
+          "codex-savings",
+          makeCell("pi", { inputTokens: 100, outputTokens: 100, totalTokens: 200 }),
+          makeCell("codex", { inputTokens: 50, outputTokens: 50, totalTokens: 100 }),
+        ),
+        makeRuntimeParity(
+          "codex-regression",
+          makeCell("pi", { inputTokens: 100, outputTokens: 0, totalTokens: 100 }),
+          makeCell("codex", { inputTokens: 130, outputTokens: 0, totalTokens: 130 }),
+        ),
+      ]),
+    });
+
+    const markdown = renderTokenEfficiencyMarkdownReport(report);
+    expect(markdown).toContain("p50 per scenario");
+    expect(markdown).toContain("| codex-savings | live-usage |");
+    expect(markdown).toContain("| -50.0% | savings | no |");
+    expect(markdown).toContain("| codex-regression | live-usage |");
+    expect(markdown).toContain("| +30.0% | regression | yes |");
+  });
+});
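
Reviewer note: the expected strings in the tests above follow from plain relative-delta arithmetic. The following is a minimal sketch, assuming the delta is computed as (codex - pi) / pi and rendered with one decimal place and an explicit plus sign for increases. `tokenDeltaPercent` and `formatSignedPercent` are hypothetical names used only for illustration; they are not the module's exports.

```typescript
// Hypothetical helpers that reproduce the arithmetic behind the test expectations.
function tokenDeltaPercent(piTotal: number, codexTotal: number): number {
  if (piTotal === 0) {
    // Assumption: with no Pi baseline, any Codex usage counts as a 100% increase.
    return codexTotal === 0 ? 0 : 100;
  }
  return ((codexTotal - piTotal) / piTotal) * 100;
}

function formatSignedPercent(value: number): string {
  // Negative values already carry their own sign from toFixed.
  const sign = value > 0 ? "+" : "";
  return `${sign}${value.toFixed(1)}%`;
}

// Totals taken from the "runtime-tool-fs-read" and "codex-savings" fixtures.
const regressionLabel = formatSignedPercent(tokenDeltaPercent(72_381, 119_489));
const savingsLabel = formatSignedPercent(tokenDeltaPercent(200, 100));
console.log(regressionLabel, savingsLabel);
```

With the default 15% threshold, the first fixture lands well above the gate and the second is classified as savings, which matches the assertions on `flagged` and `classification`.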
--- a/extensions/qa-lab/src/token-efficiency-report.ts
+++ b/extensions/qa-lab/src/token-efficiency-report.ts
@@ -0,0 +1,306 @@
+import type { RuntimeId, RuntimeParityCell, RuntimeParityResult } from "./runtime-parity.js";
+
+export type TokenEfficiencyRuntimeUsage = {
+  inputTokens: number;
+  outputTokens: number;
+  totalTokens: number;
+  toolCallCount: number;
+};
+
+export type TokenEfficiencyRow = {
+  scenarioId: string;
+  usageSource: "live-usage" | "mock-estimate";
+  pi: TokenEfficiencyRuntimeUsage;
+  codex: TokenEfficiencyRuntimeUsage;
+  deltaPercent: number;
+  classification: "regression" | "savings" | "neutral";
+  flagged: boolean;
+  toolsUsed: string[];
+};
+
+export type TokenEfficiencyReport = {
+  status: "evaluated" | "estimated" | "skipped";
+  runtimePair: [RuntimeId, RuntimeId];
+  generatedAt: string;
+  providerMode?: string;
+  thresholdPercent: number;
+  rows: TokenEfficiencyRow[];
+  aggregate: {
+    pi: { totalTokens: number; p50PerScenario: number; p90PerScenario: number };
+    codex: { totalTokens: number; p50PerScenario: number; p90PerScenario: number };
+    deltaPercent: number;
+    flaggedScenarios: string[];
+    savingsScenarios: string[];
+  };
+  pass: boolean;
+  failures: string[];
+  skipReason?: string;
+  notes: string[];
+};
+
+export type TokenEfficiencySuiteSummary = {
+  scenarios: Array<{
+    name: string;
+    status: "pass" | "fail" | "skip";
+    runtimeParity?: RuntimeParityResult;
+  }>;
+  run?: {
+    providerMode?: string;
+    runtimePair?: [RuntimeId, RuntimeId] | null;
+  };
+};
+
+export type BuildTokenEfficiencyReportParams = {
+  summary: TokenEfficiencySuiteSummary;
+  generatedAt?: string;
+  thresholdPercent?: number;
+};
+
+const DEFAULT_THRESHOLD_PERCENT = 15;
+const ZERO_AGGREGATE: TokenEfficiencyReport["aggregate"] = {
+  pi: { totalTokens: 0, p50PerScenario: 0, p90PerScenario: 0 },
+  codex: { totalTokens: 0, p50PerScenario: 0, p90PerScenario: 0 },
+  deltaPercent: 0,
+  flaggedScenarios: [],
+  savingsScenarios: [],
+};
+
+function normalizeRuntimePair(
+  pair: [RuntimeId, RuntimeId] | null | undefined,
+): [RuntimeId, RuntimeId] {
+  if (pair?.[0] && pair?.[1]) {
+    return pair;
+  }
+  return ["pi", "codex"];
+}
+
+function normalizeTokenCount(value: number): number {
+  return Number.isFinite(value) ? Math.max(0, value) : 0;
+}
+
+function deltaPercent(piTotalTokens: number, codexTotalTokens: number): number {
+  if (piTotalTokens === 0) {
+    return codexTotalTokens === 0 ? 0 : 100;
+  }
+  return ((codexTotalTokens - piTotalTokens) / piTotalTokens) * 100;
+}
+
+function percentile(values: readonly number[], p: number): number {
+  if (values.length === 0) {
+    return 0;
+  }
+  const sorted = values.toSorted((left, right) => left - right);
+  const index = Math.min(sorted.length - 1, Math.max(0, Math.ceil((p / 100) * sorted.length) - 1));
+  return sorted[index] ?? 0;
+}
+
+function isLiveProviderMode(providerMode: string | undefined) {
+  return providerMode?.startsWith("live-") === true;
+}
+
+function formatPercent(value: number) {
+  const sign = value > 0 ? "+" : "";
+  return `${sign}${value.toFixed(1)}%`;
+}
+
+function runtimeUsage(cell: RuntimeParityCell): TokenEfficiencyRuntimeUsage {
+  return {
+    inputTokens: normalizeTokenCount(cell.usage.inputTokens),
+    outputTokens: normalizeTokenCount(cell.usage.outputTokens),
+    totalTokens: normalizeTokenCount(cell.usage.totalTokens),
+    toolCallCount: cell.toolCalls.length,
+  };
+}
+
+function toolNamesForCells(pi: RuntimeParityCell, codex: RuntimeParityCell): string[] {
+  return [...new Set([...pi.toolCalls, ...codex.toolCalls].map((call) => call.tool))].toSorted(
+    (left, right) => left.localeCompare(right),
+  );
+}
+
+function buildRow(params: {
+  result: RuntimeParityResult;
+  thresholdPercent: number;
+  usageSource: TokenEfficiencyRow["usageSource"];
+}): TokenEfficiencyRow {
+  const pi = runtimeUsage(params.result.cells.pi);
+  const codex = runtimeUsage(params.result.cells.codex);
+  const delta = deltaPercent(pi.totalTokens, codex.totalTokens);
+  const flagged = params.usageSource === "live-usage" && delta > params.thresholdPercent;
+  const classification =
+    delta > params.thresholdPercent
+      ? "regression"
+      : delta < -params.thresholdPercent
+        ? "savings"
+        : "neutral";
+  return {
+    scenarioId: params.result.scenarioId,
+    usageSource: params.usageSource,
+    pi,
+    codex,
+    deltaPercent: delta,
+    classification,
+    flagged,
+    toolsUsed: toolNamesForCells(params.result.cells.pi, params.result.cells.codex),
+  };
+}
+
+function buildAggregate(rows: readonly TokenEfficiencyRow[]): TokenEfficiencyReport["aggregate"] {
+  const piTotals = rows.map((row) => row.pi.totalTokens);
+  const codexTotals = rows.map((row) => row.codex.totalTokens);
+  const piTotalTokens = piTotals.reduce((sum, value) => sum + value, 0);
+  const codexTotalTokens = codexTotals.reduce((sum, value) => sum + value, 0);
+  return {
+    pi: {
+      totalTokens: piTotalTokens,
+      p50PerScenario: percentile(piTotals, 50),
+      p90PerScenario: percentile(piTotals, 90),
+    },
+    codex: {
+      totalTokens: codexTotalTokens,
+      p50PerScenario: percentile(codexTotals, 50),
+      p90PerScenario: percentile(codexTotals, 90),
+    },
+    deltaPercent: deltaPercent(piTotalTokens, codexTotalTokens),
+    flaggedScenarios: rows.filter((row) => row.flagged).map((row) => row.scenarioId),
+    savingsScenarios: rows
+      .filter((row) => row.classification === "savings")
+      .map((row) => row.scenarioId),
+  };
+}
+
+function liveEvidenceFailures(row: TokenEfficiencyRow): string[] {
+  const failures: string[] = [];
+  if (row.pi.totalTokens <= 0) {
+    failures.push(`${row.scenarioId} pi live usage totalTokens=${row.pi.totalTokens}`);
+  }
+  if (row.codex.totalTokens <= 0) {
+    failures.push(`${row.scenarioId} codex live usage totalTokens=${row.codex.totalTokens}`);
+  }
+  return failures;
+}
+
+export function buildTokenEfficiencyReport(
+  params: BuildTokenEfficiencyReportParams,
+): TokenEfficiencyReport {
+  const providerMode = params.summary.run?.providerMode;
+  const runtimePair = normalizeRuntimePair(params.summary.run?.runtimePair);
+  const thresholdPercent = params.thresholdPercent ?? DEFAULT_THRESHOLD_PERCENT;
+  const liveUsage = isLiveProviderMode(providerMode);
+  const usageSource: TokenEfficiencyRow["usageSource"] = liveUsage ? "live-usage" : "mock-estimate";
+  const parityResults = params.summary.scenarios
+    .map((scenario) => scenario.runtimeParity)
+    .filter((result): result is RuntimeParityResult => !!result);
+
+  if (parityResults.length === 0) {
+    return {
+      status: "skipped",
+      runtimePair,
+      generatedAt: params.generatedAt ?? new Date().toISOString(),
+      ...(providerMode ? { providerMode } : {}),
+      thresholdPercent,
+      rows: [],
+      aggregate: ZERO_AGGREGATE,
+      pass: true,
+      failures: [],
+      skipReason: "No runtime parity captures were present in the suite summary.",
+      notes: ["Token efficiency requires runtime-pair summaries with RuntimeParityResult cells."],
+    };
+  }
+
+  const rows = parityResults.map((result) =>
+    buildRow({
+      result,
+      thresholdPercent,
+      usageSource,
+    }),
+  );
+  const aggregate = buildAggregate(rows);
+  const failures = rows.flatMap((row) => {
+    const rowFailures = liveUsage ? liveEvidenceFailures(row) : [];
+    if (row.flagged) {
+      rowFailures.push(
+        `${row.scenarioId} token delta=${formatPercent(row.deltaPercent)} exceeds ${thresholdPercent.toFixed(1)}% Codex increase threshold`,
+      );
+    }
+    return rowFailures;
+  });
+
+  return {
+    status: liveUsage ? "evaluated" : "estimated",
+    runtimePair,
+    generatedAt: params.generatedAt ?? new Date().toISOString(),
+    ...(providerMode ? { providerMode } : {}),
+    thresholdPercent,
+    rows,
+    aggregate,
+    pass: failures.length === 0,
+    failures,
+    notes: [
+      "Token totals are read from RuntimeParityCell.usage, which is captured from normalized AssistantMessage.usage.",
+      "Codex savings are reported as savings and do not fail the gate; only live Codex-over-Pi increases above the threshold fail it.",
+      usageSource === "mock-estimate"
+        ? "Mock-provider token totals are labeled as estimates and do not block the token-efficiency gate."
+        : "The report does not inspect provider transport payload token counters.",
+    ],
+  };
+}
+
+export function renderTokenEfficiencyMarkdownReport(report: TokenEfficiencyReport): string {
+  const lines = [
+    `# OpenClaw Runtime Token Efficiency - ${report.runtimePair[0]} vs ${report.runtimePair[1]}`,
+    "",
+    `- Generated at: ${report.generatedAt}`,
+    ...(report.providerMode ? [`- Provider mode: ${report.providerMode}`] : []),
+    `- Verdict: ${report.status === "skipped" ? "skipped" : report.pass ? "pass" : "fail"}`,
+    `- Usage source: ${report.rows[0]?.usageSource ?? "none"}`,
+    `- Threshold: Codex token increase > ${report.thresholdPercent.toFixed(1)}%`,
+    "",
+  ];
+
+  if (report.skipReason) {
+    lines.push(`- Skip reason: ${report.skipReason}`, "");
+  }
+
+  lines.push(
+    "## Aggregate Metrics",
+    "",
+    "| Runtime | Total tokens | p50 per scenario | p90 per scenario |",
+    "| --- | ---: | ---: | ---: |",
+    `| pi | ${report.aggregate.pi.totalTokens} | ${report.aggregate.pi.p50PerScenario} | ${report.aggregate.pi.p90PerScenario} |`,
+    `| codex | ${report.aggregate.codex.totalTokens} | ${report.aggregate.codex.p50PerScenario} | ${report.aggregate.codex.p90PerScenario} |`,
+    `| delta | ${formatPercent(report.aggregate.deltaPercent)} |  |  |`,
+    "",
+  );
+
+  if (report.rows.length > 0) {
+    lines.push(
+      "## Scenario Efficiency",
+      "",
+      "| Scenario | Source | Pi in/out/total/tools | Codex in/out/total/tools | Token delta | Classification | Flagged | Tools used |",
+      "| --- | --- | ---: | ---: | ---: | --- | --- | --- |",
+    );
+    for (const row of report.rows) {
+      lines.push(
+        `| ${row.scenarioId} | ${row.usageSource} | ${row.pi.inputTokens}/${row.pi.outputTokens}/${row.pi.totalTokens}/${row.pi.toolCallCount} | ${row.codex.inputTokens}/${row.codex.outputTokens}/${row.codex.totalTokens}/${row.codex.toolCallCount} | ${formatPercent(row.deltaPercent)} | ${row.classification} | ${row.flagged ? "yes" : "no"} | ${row.toolsUsed.join(", ")} |`,
+      );
+    }
+    lines.push("");
+  }
+
+  if (report.failures.length > 0) {
+    lines.push("## Gate Failures", "");
+    for (const failure of report.failures) {
+      lines.push(`- ${failure}`);
+    }
+    lines.push("");
+  }
+
+  lines.push("## Notes", "");
+  for (const note of report.notes) {
+    lines.push(`- ${note}`);
+  }
+  lines.push("");
+
+  return lines.join("\n");
+}
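
Reviewer note: the p50 and p90 columns in the aggregate table appear to use a nearest-rank percentile (sort ascending, take the value at rank ceil(p/100 * n)). The following is a minimal sketch of that technique under that assumption. `nearestRankPercentile` is a hypothetical name chosen for illustration, and the sample totals are made up.

```typescript
// Nearest-rank percentile: returns an actual observed value, never an interpolation.
function nearestRankPercentile(values: readonly number[], p: number): number {
  if (values.length === 0) {
    return 0; // Assumption: an empty sample reports 0 rather than throwing.
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...values].sort((left, right) => left - right);
  // 1-based rank, then clamped into the valid 0-based index range.
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] ?? 0;
}

// Illustrative per-scenario totals (not taken from a real run).
const perScenarioTotals = [400, 100, 300, 200];
console.log(nearestRankPercentile(perScenarioTotals, 50)); // 200
console.log(nearestRankPercentile(perScenarioTotals, 90)); // 400
```

Nearest-rank keeps the reported numbers traceable to a concrete scenario row, at the cost of coarse steps when only a few scenarios are present.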
--- a/extensions/qa-lab/src/tool-coverage-report.test.ts
+++ b/extensions/qa-lab/src/tool-coverage-report.test.ts
@@ -223,6 +223,192 @@ describe("qa tool coverage report", () => {
    );
  });

+  it("passes required OpenClaw dynamic tool coverage when both runtimes exercise the tool", () => {
+    const report = buildQaToolCoverageReport({
+      scenarios: [
+        makeScenario("tool-web-search", "web-search", {
+          toolName: "web_search",
+          toolCoverage: {
+            bucket: "openclaw-dynamic-integration",
+            expectedLayer: "openclaw-dynamic",
+            capabilityLayer: "openclaw-dynamic-direct",
+            required: true,
+          },
+        }),
+      ],
+      summary: {
+        scenarios: [
+          {
+            name: "tool web_search",
+            status: "pass",
+            runtimeParity: {
+              scenarioId: "tool-web-search",
+              drift: "tool-result-shape",
+              driftDetails: "runtime envelopes differ",
+              cells: {
+                pi: {
+                  runtime: "pi",
+                  transcriptBytes: "",
+                  toolCalls: [{ tool: "web_search", argsHash: "a", resultHash: "r1" }],
+                  finalText: "",
+                  usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
+                  wallClockMs: 1,
+                  bootStateLines: [],
+                },
+                codex: {
+                  runtime: "codex",
+                  transcriptBytes: "",
+                  toolCalls: [{ tool: "web_search", argsHash: "a", resultHash: "r2" }],
+                  finalText: "",
+                  usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
+                  wallClockMs: 1,
+                  bootStateLines: [],
+                },
+              },
+            },
+          },
+        ],
+      },
+      generatedAt: "2026-05-10T00:00:00.000Z",
+    });
+
+    expect(report.pass).toBe(true);
+    expect(report.failures).toEqual([]);
+    expect(report.passingTools).toBe(1);
+  });
+
+  it("fails required OpenClaw dynamic tool coverage when a runtime skips the tool", () => {
+    const report = buildQaToolCoverageReport({
+      scenarios: [
+        makeScenario("tool-web-search", "web-search", {
+          toolName: "web_search",
+          toolCoverage: {
+            bucket: "openclaw-dynamic-integration",
+            expectedLayer: "openclaw-dynamic",
+            capabilityLayer: "openclaw-dynamic-direct",
+            required: true,
+          },
+        }),
+      ],
+      summary: {
+        scenarios: [
+          {
+            name: "tool web_search",
+            status: "fail",
+            runtimeParity: {
+              scenarioId: "tool-web-search",
+              drift: "tool-call-shape",
+              driftDetails: "Codex emitted no web_search call",
+              cells: {
+                pi: {
+                  runtime: "pi",
+                  transcriptBytes: "",
+                  toolCalls: [{ tool: "web_search", argsHash: "a", resultHash: "r" }],
+                  finalText: "",
+                  usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
+                  wallClockMs: 1,
+                  bootStateLines: [],
+                },
+                codex: {
+                  runtime: "codex",
+                  transcriptBytes: "",
+                  toolCalls: [],
+                  finalText: "",
+                  usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
+                  wallClockMs: 1,
+                  bootStateLines: [],
+                },
+              },
+            },
+          },
+        ],
+      },
+      generatedAt: "2026-05-10T00:00:00.000Z",
+    });
+
+    expect(report.pass).toBe(false);
+    expect(report.failures).toEqual([
+      "web-search missing codex tool call web_search",
+    ]);
+  });
+
+  it("fails required OpenClaw dynamic tool coverage when the fixture failure mode is preserved", () => {
+    const report = buildQaToolCoverageReport({
+      scenarios: [
+        makeScenario("tool-web-search", "web-search", {
+          toolName: "web_search",
+          toolCoverage: {
+            bucket: "openclaw-dynamic-integration",
+            expectedLayer: "openclaw-dynamic",
+            capabilityLayer: "openclaw-dynamic-direct",
+            required: true,
+          },
+        }),
+      ],
+      summary: {
+        scenarios: [
+          {
+            name: "tool web_search",
+            status: "fail",
+            runtimeParity: {
+              scenarioId: "tool-web-search",
+              drift: "failure-mode",
+              driftDetails: "at least one runtime failed",
+              cells: {
+                pi: {
+                  runtime: "pi",
+                  transcriptBytes: "",
+                  toolCalls: [{ tool: "web_search", argsHash: "a", resultHash: "r" }],
+                  finalText: "",
+                  usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
+                  wallClockMs: 1,
+                  bootStateLines: [],
+                },
+                codex: {
+                  runtime: "codex",
+                  transcriptBytes: "",
+                  toolCalls: [{ tool: "web_search", argsHash: "a", resultHash: "r" }],
+                  finalText: "",
+                  usage: { inputTokens: 0, outputTokens: 0, totalTokens: 0 },
+                  wallClockMs: 1,
+                  bootStateLines: [],
+                },
+              },
+            },
+          },
+        ],
+      },
+      generatedAt: "2026-05-10T00:00:00.000Z",
+    });
+
+    expect(report.pass).toBe(false);
+    expect(report.failures).toEqual([
+      "web-search drift=failure-mode (at least one runtime failed)",
+    ]);
+  });
+
+  it("fails untracked required tools missing from an evaluated summary", () => {
+    const report = buildQaToolCoverageReport({
+      scenarios: [
+        makeScenario("tool-web-search", "web-search", {
+          toolCoverage: {
+            bucket: "openclaw-dynamic-integration",
+            expectedLayer: "openclaw-dynamic",
+            capabilityLayer: "openclaw-dynamic-direct",
+            required: true,
+          },
+        }),
+      ],
+      summary: {
+        scenarios: [],
+      },
+      generatedAt: "2026-05-10T00:00:00.000Z",
+    });
+
+    expect(report.pass).toBe(false);
+    expect(report.failures).toEqual(["web-search drift=not-run"]);
+  });
+
  it("rejects unknown runtime tool coverage buckets", () => {
    expect(() =>
      buildQaToolCoverageReport({
@@ -301,5 +487,13 @@ describe("qa tool coverage report", () => {
          "#80173 Tavily tools are listed in the phase matrix but are not exposed by the current default tool surface.",
      }),
    );
+    expect(report.rows.find((row) => row.tool === "web-search")).toEqual(
+      expect.objectContaining({
+        bucket: "openclaw-dynamic-integration",
+        capabilityLayer: "openclaw-dynamic-direct",
+        required: true,
+      }),
+    );
+    expect(report.rows.find((row) => row.tool === "web-search")?.tracking).toBeUndefined();
  });
 });
--- a/extensions/qa-lab/src/tool-coverage-report.ts
+++ b/extensions/qa-lab/src/tool-coverage-report.ts
@@ -31,6 +31,7 @@ export type QaToolCoverageBucket = QaRuntimeToolBucket;

 export type QaToolCoverageRow = {
  tool: string;
+  runtimeToolName?: string;
  bucket: QaToolCoverageBucket;
  expectedLayer: QaRuntimeToolExpectedLayer;
  capabilityLayer: QaRuntimeCapabilityLayer;
@@ -41,6 +42,8 @@ export type QaToolCoverageRow = {
  pi: QaToolCoverageStatus;
  codex: QaToolCoverageStatus;
  drift: QaToolCoverageDrift;
+  piToolCalls: number;
+  codexToolCalls: number;
  tracking?: string;
  codexDefaultImpact?: string;
  qaImpact?: string;
@@ -71,7 +74,7 @@ type ToolFixtureGroup = {
  scenarios: QaSeedScenarioWithSource[];
 };

-const PASSING_DRIFTS: ReadonlySet<QaToolCoverageDrift> = new Set(["none", "text-only", "not-run"]);
+const PASSING_DRIFTS: ReadonlySet<QaToolCoverageDrift> = new Set(["none", "text-only"]);

 function isRecord(value: unknown): value is Record<string, unknown> {
  return Boolean(value) && typeof value === "object" && !Array.isArray(value);
@@ -146,6 +149,12 @@ function readScenarioTracking(scenario: QaSeedScenarioWithSource): string | unde
  return issue;
 }

+function readScenarioRuntimeToolName(scenario: QaSeedScenarioWithSource): string | undefined {
+  const config = scenario.execution.config;
+  const toolCoverage = isRecord(config?.toolCoverage) ? config.toolCoverage : undefined;
+  return readString(toolCoverage?.actualTool) ?? readString(config?.toolName);
+}
+
 function summaryByScenarioId(
  summary: QaToolCoverageSuiteSummary | undefined,
 ): Map<string, RuntimeParityResult> {
@@ -173,6 +182,21 @@ function mergeScenarioResults(
  return failingResult;
 }

+function isPassingToolCoverageDrift(drift: QaToolCoverageDrift, evaluated: boolean) {
+  return PASSING_DRIFTS.has(drift) || (!evaluated && drift === "not-run");
+}
+
+function countRuntimeToolCalls(
+  result: RuntimeParityResult | undefined,
+  runtime: RuntimeId,
+  toolName: string | undefined,
+) {
+  if (!result || !toolName) {
+    return 0;
+  }
+  return result.cells[runtime].toolCalls.filter((call) => call.tool === toolName).length;
+}
+
 function buildRow(params: {
  group: ToolFixtureGroup;
  results: ReadonlyMap<string, RuntimeParityResult>;
@@ -184,8 +208,12 @@ function buildRow(params: {
    .find((entry) => entry.required);
  const fallbackMetadata = readScenarioRuntimeToolCoverageMetadata(params.group.scenarios[0]);
  const rowMetadata = metadata ?? fallbackMetadata;
+  const runtimeToolName = params.group.scenarios
+    .map(readScenarioRuntimeToolName)
+    .find(Boolean);
  return {
    tool: params.group.tool,
+    ...(runtimeToolName ? { runtimeToolName } : {}),
    bucket: rowMetadata.bucket,
    expectedLayer: rowMetadata.expectedLayer,
    capabilityLayer: rowMetadata.capabilityLayer,
@@ -196,6 +224,8 @@ function buildRow(params: {
    pi: result ? cellStatus(result.cells.pi) : "not-run",
    codex: result ? cellStatus(result.cells.codex) : "not-run",
    drift: result?.drift ?? "not-run",
+    piToolCalls: countRuntimeToolCalls(result, "pi", runtimeToolName),
+    codexToolCalls: countRuntimeToolCalls(result, "codex", runtimeToolName),
    ...(tracking ? { tracking } : {}),
    ...(rowMetadata.codexDefaultImpact
      ? { codexDefaultImpact: rowMetadata.codexDefaultImpact }
@@ -206,6 +236,28 @@ function buildRow(params: {
  };
 }

+function coverageFailureForRow(row: QaToolCoverageRow): string | undefined {
+  if (!row.required || row.tracking) {
+    return undefined;
+  }
+  if (row.drift === "not-run") {
+    return `${row.tool} drift=not-run`;
+  }
+  if (row.pi !== "pass" || row.codex !== "pass") {
+    return `${row.tool} status pi=${row.pi} codex=${row.codex}`;
+  }
+  if (row.drift === "failure-mode") {
+    return `${row.tool} drift=failure-mode${row.details ? ` (${row.details})` : ""}`;
+  }
+  if (row.runtimeToolName && row.piToolCalls === 0) {
+    return `${row.tool} missing pi tool call ${row.runtimeToolName}`;
+  }
+  if (row.runtimeToolName && row.codexToolCalls === 0) {
+    return `${row.tool} missing codex tool call ${row.runtimeToolName}`;
+  }
+  return undefined;
+}
+
 export function buildQaToolCoverageReport(params: {
  scenarios: readonly QaSeedScenarioWithSource[];
  summary?: QaToolCoverageSuiteSummary;
@@ -221,9 +273,7 @@ export function buildQaToolCoverageReport(params: {
  );
  const evaluated = Boolean(params.summary);
  const failures = evaluated
-    ? rows
-        .filter((row) => row.required && !row.tracking && !PASSING_DRIFTS.has(row.drift))
-        .map((row) => `${row.tool} drift=${row.drift}${row.details ? ` (${row.details})` : ""}`)
+    ? rows.map(coverageFailureForRow).filter((failure): failure is string => Boolean(failure))
    : [];
  return {
    runtimePair: normalizeRuntimePair(params.runtimePair ?? params.summary?.run?.runtimePair),
@@ -237,7 +287,15 @@ export function buildQaToolCoverageReport(params: {
    dynamicIntegrationTools: rows.filter((row) => row.bucket === "openclaw-dynamic-integration")
      .length,
    optionalTools: rows.filter((row) => row.bucket === "optional-profile-or-plugin").length,
-    passingTools: evaluated ? rows.filter((row) => PASSING_DRIFTS.has(row.drift)).length : 0,
+    passingTools: evaluated
+      ? rows.filter(
+          (row) =>
+            !row.tracking &&
+            row.pi === "pass" &&
+            row.codex === "pass" &&
+            (isPassingToolCoverageDrift(row.drift, true) || !coverageFailureForRow(row)),
+        ).length
+      : 0,
    failingTools: failures.length,
    rows,
    pass: failures.length === 0,
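
Review note: the order of checks in `coverageFailureForRow` decides which failure message a row reports. Below is a minimal standalone sketch of that ordering. `SketchRow` and `sketchCoverageFailure` are hypothetical names, the row shape is cut down to the fields the checks read, and the status and drift values are plain strings, which is an assumption rather than the real `QaToolCoverageRow` types.

```typescript
// Hypothetical, reduced row shape; the real QaToolCoverageRow carries more fields.
type SketchRow = {
  tool: string;
  required: boolean;
  tracking?: string;
  pi: string;
  codex: string;
  drift: string;
  runtimeToolName?: string;
  piToolCalls: number;
  codexToolCalls: number;
};

// Returns the first applicable failure message, or undefined when the row is acceptable.
function sketchCoverageFailure(row: SketchRow): string | undefined {
  // Optional or tracked rows never fail the report.
  if (!row.required || row.tracking) {
    return undefined;
  }
  // A required row that never ran is reported before any status check.
  if (row.drift === "not-run") {
    return `${row.tool} drift=not-run`;
  }
  // Both runtimes must pass.
  if (row.pi !== "pass" || row.codex !== "pass") {
    return `${row.tool} status pi=${row.pi} codex=${row.codex}`;
  }
  if (row.drift === "failure-mode") {
    return `${row.tool} drift=failure-mode`;
  }
  // When a runtime tool name is declared, each runtime must have called it at least once.
  if (row.runtimeToolName && row.piToolCalls === 0) {
    return `${row.tool} missing pi tool call ${row.runtimeToolName}`;
  }
  if (row.runtimeToolName && row.codexToolCalls === 0) {
    return `${row.tool} missing codex tool call ${row.runtimeToolName}`;
  }
  return undefined;
}
```

Under these assumptions, a required row where both runtimes pass but one of them never invoked the declared tool is still reported as a failure, which the earlier drift-only filter could not express.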
--- a/extensions/telegram/src/bot.fetch-abort.test.ts
+++ b/extensions/telegram/src/bot.fetch-abort.test.ts
@@ -246,6 +246,44 @@ describe("createTelegramBot fetch abort", () => {
    vi.useRealTimers();
  });

+  it("retries Telegram 421 responses after forcing transport fallback", async () => {
+    const forceFallback = vi.fn(() => true);
+    const fetchSpy = vi
+      .fn()
+      .mockResolvedValueOnce(new Response("Misdirected Request", { status: 421 }))
+      .mockResolvedValueOnce(new Response("{}", { status: 200 }));
+    const { clientFetch } = createWrappedTelegramClientFetchWithTransport({
+      fetch: fetchSpy as typeof fetch,
+      forceFallback,
+    });
+
+    const result = await clientFetch("https://api.telegram.org/bot123456:ABC/sendMessage");
+
+    expect(result).toBeInstanceOf(Response);
+    expect((result as Response).status).toBe(200);
+    expect(forceFallback).toHaveBeenCalledWith("misdirected-request");
+    expect(fetchSpy).toHaveBeenCalledTimes(2);
+  });
+
+  it("retries Telegram 421 fetch errors after forcing transport fallback", async () => {
+    const forceFallback = vi.fn(() => true);
+    const fetchSpy = vi
+      .fn()
+      .mockRejectedValueOnce(Object.assign(new Error("421 Misdirected Request"), { status: 421 }))
+      .mockResolvedValueOnce(new Response("{}", { status: 200 }));
+    const { clientFetch } = createWrappedTelegramClientFetchWithTransport({
+      fetch: fetchSpy as typeof fetch,
+      forceFallback,
+    });
+
+    const result = await clientFetch("https://api.telegram.org/bot123456:ABC/sendMessage");
+
+    expect(result).toBeInstanceOf(Response);
+    expect((result as Response).status).toBe(200);
+    expect(forceFallback).toHaveBeenCalledWith("misdirected-request");
+    expect(fetchSpy).toHaveBeenCalledTimes(2);
+  });
+
  it("preserves the original fetch error when tagging cannot attach metadata", async () => {
    const frozenError = Object.freeze(
      Object.assign(new TypeError("fetch failed"), {
--- a/extensions/telegram/src/client-fetch.ts
+++ b/extensions/telegram/src/client-fetch.ts
@@ -1,7 +1,7 @@
 import type { ApiClientOptions } from "grammy";
 import { normalizeOptionalLowercaseString } from "openclaw/plugin-sdk/string-coerce-runtime";
 import type { TelegramTransport } from "./fetch.js";
-import { tagTelegramNetworkError } from "./network-errors.js";
+import { isTelegramMisdirectedRequestError, tagTelegramNetworkError } from "./network-errors.js";
 import { resolveTelegramRequestTimeoutMs } from "./request-timeouts.js";

 type TelegramFetchInput = Parameters<NonNullable<ApiClientOptions["fetch"]>>[0];
@@ -135,6 +135,11 @@ export function createTelegramClientFetch(params: {
      : undefined;
    const requestSignal = isTelegramAbortSignalLike(init?.signal) ? init.signal : undefined;

+    const canForceTransportFallback = (reason: string) =>
+      !shutdownSignal?.aborted &&
+      !requestSignal?.aborted &&
+      params.transport?.forceFallback?.(reason) === true;
+
    const runFetch = async () => {
      const controller = new AbortController();
      const abortWith = (signal: Pick<TelegramAbortSignalLike, "reason">) =>
@@ -195,14 +200,22 @@ export function createTelegramClientFetch(params: {
    };

    try {
-      return await runFetch();
+      const response = await runFetch();
+      if (response.status === 421 && canForceTransportFallback("misdirected-request")) {
+        return await runFetch();
+      }
+      return response;
    } catch (err) {
      if (
        requestTimeoutMs &&
        shouldRetryTimedOutTelegramControlRequest(method) &&
-        !shutdownSignal?.aborted &&
-        !requestSignal?.aborted &&
-        params.transport?.forceFallback?.("request-timeout")
+        canForceTransportFallback("request-timeout")
+      ) {
+        return await runFetch();
+      }
+      if (
+        isTelegramMisdirectedRequestError(err) &&
+        canForceTransportFallback("misdirected-request")
      ) {
        return await runFetch();
      }
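
Review note: the 421 handling above covers two shapes, a resolved response with status 421 and a rejected fetch whose error carries the status. The sketch below isolates that control flow. `fetchWithMisdirectedRetry`, `StatusLike`, and `hasStatus421` are hypothetical names; the error check is reduced to a `status === 421` test and the abort-signal guards are omitted, so this is an illustration under those assumptions, not the code in `client-fetch.ts`.

```typescript
// Minimal stand-in for a fetch Response; only the status is read here.
type StatusLike = { status: number };

// Hypothetical callback: returns true when the transport switched to its fallback.
type ForceFallback = (reason: string) => boolean;

// Simplified stand-in for isTelegramMisdirectedRequestError (assumption: status field only).
function hasStatus421(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: unknown }).status === 421;
}

async function fetchWithMisdirectedRetry<T extends StatusLike>(
  runFetch: () => Promise<T>,
  forceFallback: ForceFallback,
): Promise<T> {
  try {
    const response = await runFetch();
    // A 421 response is retried only if the transport agrees to fall back.
    if (response.status === 421 && forceFallback("misdirected-request")) {
      return await runFetch();
    }
    return response;
  } catch (err) {
    // A rejected fetch that looks like a 421 gets the same treatment.
    if (hasStatus421(err) && forceFallback("misdirected-request")) {
      return await runFetch();
    }
    throw err;
  }
}
```

If `forceFallback` returns false, the original 421 response or error is handed back to the caller unchanged.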
--- a/extensions/telegram/src/draft-stream.test.ts
+++ b/extensions/telegram/src/draft-stream.test.ts
@@ -145,27 +145,32 @@ describe("createTelegramDraftStream", () => {
    }
  });

-  it("does not retry DM message preview sends without the topic id", async () => {
-    const api = createMockDraftApi();
-    api.sendMessage.mockRejectedValueOnce(new Error("400: Bad Request: message thread not found"));
-    const warn = vi.fn();
-    const stream = createDraftStream(api, {
-      thread: { id: 42, scope: "dm" },
-      warn,
-    });
+  it.each(["forum", "dm"] as const)(
+    "does not retry %s message preview sends without the topic id",
+    async (scope) => {
+      const api = createMockDraftApi();
+      api.sendMessage.mockRejectedValueOnce(
+        new Error("400: Bad Request: message thread not found"),
+      );
+      const warn = vi.fn();
+      const stream = createDraftStream(api, {
+        thread: { id: 42, scope },
+        warn,
+      });

-    stream.update("Hello");
-    await stream.flush();
+      stream.update("Hello");
+      await stream.flush();

-    expect(api.sendMessage).toHaveBeenCalledTimes(1);
-    expect(api.sendMessage).toHaveBeenCalledWith(123, "Hello", { message_thread_id: 42 });
-    expect(warn).toHaveBeenCalledWith(
-      "telegram stream preview failed: 400: Bad Request: message thread not found",
-    );
-    expect(
-      warn.mock.calls.some(([message]) => String(message).includes("retrying without thread")),
-    ).toBe(false);
-  });
+      expect(api.sendMessage).toHaveBeenCalledTimes(1);
+      expect(api.sendMessage).toHaveBeenCalledWith(123, "Hello", { message_thread_id: 42 });
+      expect(warn).toHaveBeenCalledWith(
+        "telegram stream preview failed: 400: Bad Request: message thread not found",
+      );
+      expect(
+        warn.mock.calls.some(([message]) => String(message).includes("retrying without thread")),
+      ).toBe(false);
+    },
+  );

  it("keeps allow_sending_without_reply on message previews that target a reply", async () => {
    const api = createMockDraftApi();
--- a/extensions/telegram/src/draft-stream.ts
+++ b/extensions/telegram/src/draft-stream.ts
@@ -10,19 +10,6 @@ import { normalizeTelegramReplyToMessageId } from "./outbound-params.js";

 const TELEGRAM_STREAM_MAX_CHARS = 4096;
 const DEFAULT_THROTTLE_MS = 1000;
-const THREAD_NOT_FOUND_RE = /400:\s*Bad Request:\s*message thread not found/i;
-
-type TelegramSendMessageParams = Parameters<Bot["api"]["sendMessage"]>[2];
-
-function hasNumericMessageThreadId(
-  params: TelegramSendMessageParams | undefined,
-): params is TelegramSendMessageParams & { message_thread_id: number } {
-  return (
-    typeof params === "object" &&
-    params !== null &&
-    typeof (params as { message_thread_id?: unknown }).message_thread_id === "number"
-  );
-}

 export type TelegramDraftStream = {
  update: (text: string) => void;
@@ -109,7 +96,6 @@ export function createTelegramDraftStream(params: {
  const minInitialChars = params.minInitialChars;
  const chatId = params.chatId;
  const threadParams = buildTelegramThreadParams(params.thread);
-  const allowThreadlessRetry = params.thread?.scope !== "dm";
  const replyToMessageId = normalizeTelegramReplyToMessageId(params.replyToMessageId);
  const replyParams =
    replyToMessageId != null
@@ -136,10 +122,9 @@ export function createTelegramDraftStream(params: {
    renderedParseMode: "HTML" | undefined;
    sendGeneration: number;
  };
-  const sendRenderedMessageWithThreadFallback = async (sendArgs: {
+  const sendRenderedMessage = async (sendArgs: {
    renderedText: string;
    renderedParseMode: "HTML" | undefined;
-    fallbackWarnMessage: string;
  }) => {
    const sendParams = sendArgs.renderedParseMode
      ? {
@@ -147,28 +132,7 @@ export function createTelegramDraftStream(params: {
          parse_mode: sendArgs.renderedParseMode,
        }
      : replyParams;
-    const usedThreadParams = hasNumericMessageThreadId(sendParams);
-    try {
-      return {
-        sent: await params.api.sendMessage(chatId, sendArgs.renderedText, sendParams),
-        usedThreadParams,
-      };
-    } catch (err) {
-      if (!allowThreadlessRetry || !usedThreadParams || !THREAD_NOT_FOUND_RE.test(String(err))) {
-        throw err;
-      }
-      const threadlessParams: TelegramSendMessageParams = { ...sendParams };
-      delete threadlessParams.message_thread_id;
-      params.warn?.(sendArgs.fallbackWarnMessage);
-      return {
-        sent: await params.api.sendMessage(
-          chatId,
-          sendArgs.renderedText,
-          Object.keys(threadlessParams).length > 0 ? threadlessParams : undefined,
-        ),
-        usedThreadParams: false,
-      };
-    }
+    return await params.api.sendMessage(chatId, sendArgs.renderedText, sendParams);
  };
  const sendMessageTransportPreview = async ({
    renderedText,
@@ -187,14 +151,12 @@ export function createTelegramDraftStream(params: {
      return true;
    }
    messageSendAttempted = true;
-    let sent: Awaited<ReturnType<typeof sendRenderedMessageWithThreadFallback>>["sent"];
+    let sent: Awaited<ReturnType<typeof sendRenderedMessage>>;
    try {
-      ({ sent } = await sendRenderedMessageWithThreadFallback({
+      sent = await sendRenderedMessage({
        renderedText,
        renderedParseMode,
-        fallbackWarnMessage:
-          "telegram stream preview send failed with message_thread_id, retrying without thread",
-      }));
+      });
    } catch (err) {
      if (isSafeToRetrySendError(err) || isTelegramClientRejection(err)) {
        messageSendAttempted = false;
--- a/extensions/telegram/src/network-errors.test.ts
+++ b/extensions/telegram/src/network-errors.test.ts
@@ -253,6 +253,36 @@ describe("isSafeToRetrySendError", () => {
    );
    expect(isSafeToRetrySendError(wrapped)).toBe(false);
  });
+
+  it.each([
+    ["status", Object.assign(new Error("Misdirected Request"), { status: 421 })],
+    ["statusCode", Object.assign(new Error("Misdirected Request"), { statusCode: "421" })],
+    ["error_code", errorWithTelegramCode("Misdirected Request", 421)],
+    ["message", new Error("421 Misdirected Request")],
+    [
+      "nested cause",
+      Object.assign(new Error("Network request for 'sendMessage' failed!"), {
+        cause: Object.assign(new Error("Misdirected Request"), { status: 421 }),
+      }),
+    ],
+    [
+      "grammY HttpError",
+      new MockHttpError(
+        "Network request for 'sendMessage' failed!",
+        Object.assign(new Error("Misdirected Request"), { status: 421 }),
+      ),
+    ],
+  ])("treats Telegram 421 Misdirected Request as safe to retry via %s", (_name, err) => {
+    expect(isSafeToRetrySendError(err)).toBe(true);
+  });
+
+  it("does not parse malformed status strings as Telegram 421", () => {
+    expect(
+      isSafeToRetrySendError(
+        Object.assign(new Error("Misdirected Request"), { statusCode: "421abc" }),
+      ),
+    ).toBe(false);
+  });
 });

 describe("isTelegramServerError", () => {
--- a/extensions/telegram/src/network-errors.ts
+++ b/extensions/telegram/src/network-errors.ts
@@ -103,6 +103,40 @@ function getErrorCode(err: unknown): string | undefined {
  return undefined;
 }

+function getNumericHttpStatus(err: unknown): number | undefined {
+  if (!err || typeof err !== "object") {
+    return undefined;
+  }
+  const candidate = err as { error_code?: unknown; status?: unknown; statusCode?: unknown };
+  for (const value of [candidate.error_code, candidate.status, candidate.statusCode]) {
+    if (typeof value === "number" && Number.isFinite(value)) {
+      return value;
+    }
+    if (typeof value === "string") {
+      const trimmed = value.trim();
+      if (/^\d+$/.test(trimmed)) {
+        return Number.parseInt(trimmed, 10);
+      }
+    }
+  }
+  return undefined;
+}
+
+export function isTelegramMisdirectedRequestError(err: unknown): boolean {
+  for (const candidate of collectTelegramErrorCandidates(err)) {
+    const code = normalizeCode(getErrorCode(candidate));
+    if (code === "421" || getNumericHttpStatus(candidate) === 421) {
+      return true;
+    }
+
+    const message = normalizeLowercaseStringOrEmpty(formatErrorMessage(candidate));
+    if (/\b421\b/.test(message) && message.includes("misdirected request")) {
+      return true;
+    }
+  }
+  return false;
+}
+
 export type TelegramNetworkErrorContext = "polling" | "send" | "webhook" | "unknown";
 export type TelegramNetworkErrorOrigin = {
  method?: string | null;
@@ -162,6 +196,9 @@ export function isSafeToRetrySendError(err: unknown): boolean {
  if (!err) {
    return false;
  }
+  if (isTelegramMisdirectedRequestError(err)) {
+    return true;
+  }
  for (const candidate of collectTelegramErrorCandidates(err)) {
    const code = normalizeCode(getErrorCode(candidate));
    if (code && PRE_CONNECT_ERROR_CODES.has(code)) {
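
Review note: `getNumericHttpStatus` checks string values against `/^\d+$/` before parsing because `Number.parseInt` accepts trailing garbage. The sketch below shows the difference on a single value. `parseStrictStatus` is a hypothetical helper name that mirrors only the per-value logic, not the lookup across `error_code`, `status`, and `statusCode`.

```typescript
// Hypothetical helper: accepts finite numbers and digit-only strings, rejects everything else.
function parseStrictStatus(value: unknown): number | undefined {
  if (typeof value === "number" && Number.isFinite(value)) {
    return value;
  }
  if (typeof value === "string") {
    const trimmed = value.trim();
    // Without this guard, Number.parseInt("421abc", 10) would yield 421.
    if (/^\d+$/.test(trimmed)) {
      return Number.parseInt(trimmed, 10);
    }
  }
  return undefined;
}

const strict = parseStrictStatus("421abc"); // undefined
const lenient = Number.parseInt("421abc", 10); // 421
```

This is why the malformed `statusCode: "421abc"` case in the tests is not treated as a Misdirected Request.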
--- a/extensions/telegram/src/reply-parameters.ts
+++ b/extensions/telegram/src/reply-parameters.ts
@@ -27,8 +27,8 @@ export function resolveTelegramSendThreadSpec(params: {
  if (messageThreadId == null) {
    return undefined;
  }
-  // Telegram supports DM topics; keep direct chat thread IDs and rely on
-  // thread-not-found retry fallback when a plain DM rejects them.
+  // Telegram supports DM topics; keep direct chat thread IDs and let invalid
+  // topics fail closed instead of sending to the base chat.
  return {
    id: messageThreadId,
    scope: params.chatType === "direct" ? "dm" : "forum",
--- a/extensions/telegram/src/send.test.ts
+++ b/extensions/telegram/src/send.test.ts
@@ -1828,49 +1828,30 @@ describe("sendMessageTelegram", () => {
    }
  });

-  it("retries sends without message_thread_id on thread-not-found", async () => {
-    const cases = [
-      { name: "forum", chatId: "-100123", text: "hello forum", messageId: 58 },
-    ] as const;
+  it("fails topic sends instead of retrying without message_thread_id", async () => {
+    const cases = [{ name: "forum", chatId: "-100123", text: "hello forum" }] as const;
    const threadErr = new Error("400: Bad Request: message thread not found");

    for (const testCase of cases) {
-      const sendMessage = vi
-        .fn()
-        .mockRejectedValueOnce(threadErr)
-        .mockResolvedValueOnce({
-          message_id: testCase.messageId,
-          chat: { id: testCase.chatId },
-        });
+      const sendMessage = vi.fn().mockRejectedValueOnce(threadErr);
      const api = { sendMessage } as unknown as {
        sendMessage: typeof sendMessage;
      };

-      const res = await sendMessageTelegram(testCase.chatId, testCase.text, {
-        cfg: TELEGRAM_TEST_CFG,
-        token: "tok",
-        api,
-        messageThreadId: 271,
-      });
+      await expect(
+        sendMessageTelegram(testCase.chatId, testCase.text, {
+          cfg: TELEGRAM_TEST_CFG,
+          token: "tok",
+          api,
+          messageThreadId: 271,
+        }),
+      ).rejects.toThrow("message thread not found");

-      expect(sendMessage, testCase.name).toHaveBeenNthCalledWith(
-        1,
-        testCase.chatId,
-        testCase.text,
-        {
-          parse_mode: "HTML",
-          message_thread_id: 271,
-        },
-      );
-      expect(sendMessage, testCase.name).toHaveBeenNthCalledWith(
-        2,
-        testCase.chatId,
-        testCase.text,
-        {
-          parse_mode: "HTML",
-        },
-      );
-      expect(res.messageId, testCase.name).toBe(String(testCase.messageId));
+      expect(sendMessage, testCase.name).toHaveBeenCalledTimes(1);
+      expect(sendMessage, testCase.name).toHaveBeenCalledWith(testCase.chatId, testCase.text, {
+        parse_mode: "HTML",
+        message_thread_id: 271,
+      });
    }
  });

@@ -2052,40 +2033,32 @@ describe("sendMessageTelegram", () => {
    expect(logs).not.toContain(body);
  });

-  it("logs threadless outbound text delivery after missing-thread fallback", async () => {
+  it("does not log outbound success when topic text send fails thread lookup", async () => {
    const logFile = captureInfoLogs();
    const chatId = "-1001234567890";
-    const body = "fallback reply body should stay private";
+    const body = "topic reply body should stay private";
    const threadErr = new Error("400: Bad Request: message thread not found");
-    const sendMessage = vi
-      .fn()
-      .mockRejectedValueOnce(threadErr)
-      .mockResolvedValueOnce({
-        message_id: 322,
-        chat: { id: chatId },
-      });
+    const sendMessage = vi.fn().mockRejectedValueOnce(threadErr);
    const api = { sendMessage } as unknown as {
      sendMessage: typeof sendMessage;
    };

-    await sendMessageTelegram(`telegram:group:${chatId}:topic:271`, body, {
-      cfg: TELEGRAM_TEST_CFG,
-      token: "tok",
-      accountId: "ops",
-      api,
-    });
+    await expect(
+      sendMessageTelegram(`telegram:group:${chatId}:topic:271`, body, {
+        cfg: TELEGRAM_TEST_CFG,
+        token: "tok",
+        accountId: "ops",
+        api,
+      }),
+    ).rejects.toThrow("message thread not found");

-    expect(sendMessage).toHaveBeenNthCalledWith(1, chatId, body, {
+    expect(sendMessage).toHaveBeenCalledTimes(1);
+    expect(sendMessage).toHaveBeenCalledWith(chatId, body, {
      parse_mode: "HTML",
      message_thread_id: 271,
    });
-    expect(sendMessage).toHaveBeenNthCalledWith(2, chatId, body, {
-      parse_mode: "HTML",
-    });
    const logs = capturedLogText(logFile);
-    expect(logs).toContain("outbound send ok");
-    expect(logs).toContain("messageId=322");
-    expect(logs).not.toContain("threadId=271");
+    expect(logs).not.toContain("outbound send ok");
    expect(logs).not.toContain(body);
  });

@@ -2161,17 +2134,11 @@ describe("sendMessageTelegram", () => {
    expect(logs).not.toContain(body);
  });

-  it("retries media sends without message_thread_id when thread is missing", async () => {
+  it("fails media sends instead of retrying without message_thread_id", async () => {
    const logFile = captureInfoLogs();
    const chatId = "-100123";
    const threadErr = new Error("400: Bad Request: message thread not found");
-    const sendPhoto = vi
-      .fn()
-      .mockRejectedValueOnce(threadErr)
-      .mockResolvedValueOnce({
-        message_id: 59,
-        chat: { id: chatId },
-      });
+    const sendPhoto = vi.fn().mockRejectedValueOnce(threadErr);
    const api = { sendPhoto } as unknown as {
      sendPhoto: typeof sendPhoto;
    };
@@ -2182,14 +2149,17 @@ describe("sendMessageTelegram", () => {
      fileName: "photo.jpg",
    });

-    const res = await sendMessageTelegram(chatId, "photo", {
-      cfg: TELEGRAM_TEST_CFG,
-      token: "tok",
-      api,
-      mediaUrl: "https://example.com/photo.jpg",
-      messageThreadId: 271,
-    });
+    await expect(
+      sendMessageTelegram(chatId, "photo", {
+        cfg: TELEGRAM_TEST_CFG,
+        token: "tok",
+        api,
+        mediaUrl: "https://example.com/photo.jpg",
+        messageThreadId: 271,
+      }),
+    ).rejects.toThrow("message thread not found");

+    expect(sendPhoto).toHaveBeenCalledTimes(1);
    expectMediaSendCall(
      firstMockCall(sendPhoto, "first send photo call"),
      "first send photo call",
@@ -2200,20 +2170,8 @@ describe("sendMessageTelegram", () => {
        message_thread_id: 271,
      },
    );
-    expectMediaSendCall(
-      mockCall(sendPhoto, 1, "second send photo call"),
-      "second send photo call",
-      chatId,
-      {
-        caption: "photo",
-        parse_mode: "HTML",
-      },
-    );
-    expect(res.messageId).toBe("59");
    const logs = capturedLogText(logFile);
-    expect(logs).toContain("outbound send ok");
-    expect(logs).toContain("messageId=59");
-    expect(logs).not.toContain("threadId=271");
+    expect(logs).not.toContain("outbound send ok");
  });

  it("defaults outbound media uploads to 100MB", async () => {
@@ -2612,32 +2570,27 @@ describe("sendStickerTelegram", () => {
    }
  });

-  it("retries sticker sends without message_thread_id when thread is missing", async () => {
+  it("fails sticker sends instead of retrying without message_thread_id", async () => {
    const chatId = "-100123";
    const threadErr = new Error("400: Bad Request: message thread not found");
-    const sendSticker = vi
-      .fn()
-      .mockRejectedValueOnce(threadErr)
-      .mockResolvedValueOnce({
-        message_id: 109,
-        chat: { id: chatId },
-      });
+    const sendSticker = vi.fn().mockRejectedValueOnce(threadErr);
    const api = { sendSticker } as unknown as {
      sendSticker: typeof sendSticker;
    };

-    const res = await sendStickerTelegram(chatId, "fileId123", {
-      cfg: TELEGRAM_TEST_CFG,
-      token: "tok",
-      api,
-      messageThreadId: 271,
-    });
+    await expect(
+      sendStickerTelegram(chatId, "fileId123", {
+        cfg: TELEGRAM_TEST_CFG,
+        token: "tok",
+        api,
+        messageThreadId: 271,
+      }),
+    ).rejects.toThrow("message thread not found");

-    expect(sendSticker).toHaveBeenNthCalledWith(1, chatId, "fileId123", {
+    expect(sendSticker).toHaveBeenCalledTimes(1);
+    expect(sendSticker).toHaveBeenCalledWith(chatId, "fileId123", {
      message_thread_id: 271,
    });
-    expect(sendSticker).toHaveBeenNthCalledWith(2, chatId, "fileId123", undefined);
-    expect(res.messageId).toBe("109");
  });

  it("fails when sticker send returns no message_id", async () => {
@@ -3110,40 +3063,31 @@ describe("sendPollTelegram", () => {
    expect(requireRecord(sendPollCall[3], "send poll params").open_period).toBe(60);
  });

-  it("retries without message_thread_id on thread-not-found", async () => {
+  it("fails poll sends instead of retrying without message_thread_id", async () => {
    const api = {
-      sendPoll: vi.fn(
-        async (_chatId: string, _question: string, _options: string[], params: unknown) => {
-          const p = params as { message_thread_id?: unknown } | undefined;
-          if (p?.message_thread_id) {
-            throw new Error("400: Bad Request: message thread not found");
-          }
-          return { message_id: 1, chat: { id: 2 }, poll: { id: "p2" } };
-        },
-      ),
+      sendPoll: vi
+        .fn()
+        .mockRejectedValueOnce(new Error("400: Bad Request: message thread not found")),
    };

-    const res = await sendPollTelegram(
-      "-100123",
-      { question: "Q", options: ["A", "B"] },
-      {
-        cfg: TELEGRAM_TEST_CFG,
-        token: "t",
-        api: api as unknown as Bot["api"],
-        messageThreadId: 99,
-      },
-    );
+    await expect(
+      sendPollTelegram(
+        "-100123",
+        { question: "Q", options: ["A", "B"] },
+        {
+          cfg: TELEGRAM_TEST_CFG,
+          token: "t",
+          api: api as unknown as Bot["api"],
+          messageThreadId: 99,
+        },
+      ),
+    ).rejects.toThrow("message thread not found");

-    expect(res).toEqual({ messageId: "1", chatId: "2", pollId: "p2" });
-    expect(api.sendPoll).toHaveBeenCalledTimes(2);
+    expect(api.sendPoll).toHaveBeenCalledTimes(1);
    expect(
      requireRecord(firstMockCall(api.sendPoll, "send poll call")[3], "send poll params")
        .message_thread_id,
    ).toBe(99);
-    expect(
-      (mockCall(api.sendPoll, 1, "second send poll call")[3] as { message_thread_id?: unknown })
-        .message_thread_id,
-    ).toBeUndefined();
  });

  it("rejects durationHours for Telegram polls", async () => {
--- a/extensions/telegram/src/send.ts
+++ b/extensions/telegram/src/send.ts
@@ -221,7 +221,6 @@ function logTelegramOutboundSendOk(params: TelegramOutboundSuccessLogParams): vo
 }

 const PARSE_ERR_RE = /can't parse entities|parse entities|find end of the entity/i;
-const THREAD_NOT_FOUND_RE = /400:\s*Bad Request:\s*message thread not found/i;
 const MESSAGE_NOT_MODIFIED_RE =
  /400:\s*Bad Request:\s*message is not modified|MESSAGE_NOT_MODIFIED/i;
 const MESSAGE_DELETE_NOOP_RE =
@@ -412,10 +411,6 @@ function normalizeMessageId(raw: string | number): number {
  throw new Error("Message id is required for Telegram actions");
 }

-function isTelegramThreadNotFoundError(err: unknown): boolean {
-  return THREAD_NOT_FOUND_RE.test(formatErrorMessage(err));
-}
-
 function isTelegramMessageNotModifiedError(err: unknown): boolean {
  return MESSAGE_NOT_MODIFIED_RE.test(formatErrorMessage(err));
 }
@@ -424,28 +419,6 @@ function isTelegramMessageDeleteNoopError(err: unknown): boolean {
  return MESSAGE_DELETE_NOOP_RE.test(formatErrorMessage(err));
 }

-function hasMessageThreadIdParam(params?: TelegramThreadScopedParams): boolean {
-  if (!params) {
-    return false;
-  }
-  const value = params.message_thread_id;
-  if (typeof value === "number") {
-    return Number.isFinite(value);
-  }
-  return false;
-}
-
-function removeMessageThreadIdParam<TParams extends TelegramThreadScopedParams | undefined>(
-  params: TParams,
-): TParams {
-  if (!params || !hasMessageThreadIdParam(params)) {
-    return params;
-  }
-  const next = { ...params };
-  delete next.message_thread_id;
-  return (Object.keys(next).length > 0 ? next : undefined) as TParams;
-}
-
 function isTelegramHtmlParseError(err: unknown): boolean {
  return PARSE_ERR_RE.test(formatErrorMessage(err));
 }
@@ -575,41 +548,6 @@ function wrapTelegramChatNotFoundError(err: unknown, params: { chatId: string; i
  );
 }

-async function withTelegramThreadFallback<
-  T,
-  TParams extends TelegramThreadScopedParams | undefined,
->(
-  params: TParams,
-  label: string,
-  verbose: boolean | undefined,
-  allowThreadlessRetry: boolean,
-  attempt: (effectiveParams: TParams, effectiveLabel: string) => Promise<T>,
-): Promise<{ result: T; acceptedParams: TParams }> {
-  try {
-    return { result: await attempt(params, label), acceptedParams: params };
-  } catch (err) {
-    // Do not widen this fallback to cover "chat not found".
-    // chat-not-found is routing/auth/membership/token; stripping thread IDs hides root cause.
-    if (
-      !allowThreadlessRetry ||
-      !hasMessageThreadIdParam(params) ||
-      !isTelegramThreadNotFoundError(err)
-    ) {
-      throw err;
-    }
-    if (verbose) {
-      sendLogger.warn(
-        `telegram ${label} failed with message_thread_id, retrying without thread: ${formatErrorMessage(err)}`,
-      );
-    }
-    const retriedParams = removeMessageThreadIdParam(params);
-    return {
-      result: await attempt(retriedParams, `${label}-threadless`),
-      acceptedParams: retriedParams,
-    };
-  }
-}
-
 function createRequestWithChatNotFound(params: {
  requestWithDiag: TelegramRequestWithDiag;
  chatId: string;
@@ -707,49 +645,40 @@ export async function sendMessageTelegram(
    chunk: TelegramTextChunk,
    params?: TelegramSendMessageParams,
  ) => {
-    return await withTelegramThreadFallback(
-      params,
-      "message",
-      opts.verbose,
-      target.chatType !== "direct",
-      async (effectiveParams, label) => {
-        const baseParams = effectiveParams ? { ...effectiveParams } : {};
-        if (linkPreviewOptions) {
-          baseParams.link_preview_options = linkPreviewOptions;
-        }
-        const plainParams: TelegramSendMessageParams = {
-          ...baseParams,
-          ...(opts.silent === true ? { disable_notification: true } : {}),
-        };
-        const hasPlainParams = Object.keys(plainParams).length > 0;
-        const requestPlain = (retryLabel: string) =>
-          requestWithChatNotFound(
-            () =>
-              hasPlainParams
-                ? api.sendMessage(chatId, chunk.plainText, plainParams)
-                : api.sendMessage(chatId, chunk.plainText),
-            retryLabel,
-          );
-        if (!chunk.htmlText) {
-          return await requestPlain(label);
-        }
-        const htmlText = chunk.htmlText;
-        const htmlParams: TelegramSendMessageParams = {
-          parse_mode: "HTML" as const,
-          ...plainParams,
-        };
-        return await withTelegramHtmlParseFallback({
-          label,
+    const baseParams = params ? { ...params } : {};
+    if (linkPreviewOptions) {
+      baseParams.link_preview_options = linkPreviewOptions;
+    }
+    const plainParams: TelegramSendMessageParams = {
+      ...baseParams,
+      ...(opts.silent === true ? { disable_notification: true } : {}),
+    };
+    const hasPlainParams = Object.keys(plainParams).length > 0;
+    const requestPlain = (label: string) =>
+      requestWithChatNotFound(
+        () =>
+          hasPlainParams
+            ? api.sendMessage(chatId, chunk.plainText, plainParams)
+            : api.sendMessage(chatId, chunk.plainText),
+        label,
+      );
+    const result = !chunk.htmlText
+      ? await requestPlain("message")
+      : await withTelegramHtmlParseFallback({
+          label: "message",
          verbose: opts.verbose,
-          requestHtml: (retryLabel) =>
+          requestHtml: (label) =>
            requestWithChatNotFound(
-              () => api.sendMessage(chatId, htmlText, htmlParams),
-              retryLabel,
+              () =>
+                api.sendMessage(chatId, chunk.htmlText ?? chunk.plainText, {
+                  parse_mode: "HTML" as const,
+                  ...plainParams,
+                }),
+              label,
            ),
          requestPlain,
        });
-      },
-    );
+    return { result, acceptedParams: params };
  };

  const buildTextParams = (isLastChunk: boolean) =>
@@ -927,15 +856,7 @@ export async function sendMessageTelegram(
      sender: (
        effectiveParams: TelegramThreadScopedParams | undefined,
      ) => Promise<TelegramMessageLike>,
-    ) =>
-      await withTelegramThreadFallback(
-        mediaParams,
-        label,
-        opts.verbose,
-        target.chatType !== "direct",
-        async (effectiveParams, retryLabel) =>
-          requestWithChatNotFound(() => sender(effectiveParams), retryLabel),
-      );
+    ) => await requestWithChatNotFound(() => sender(mediaParams), label);

    const mediaSender = (() => {
      if (isGif && deliveryKind !== "document") {
@@ -1023,7 +944,7 @@ export async function sendMessageTelegram(
      };
    })();

-    const { result, acceptedParams } = await sendMedia(mediaSender.label, mediaSender.sender);
+    const result = await sendMedia(mediaSender.label, mediaSender.sender);
    const mediaMessageId = resolveTelegramMessageIdOrThrow(result, "media send");
    const resolvedChatId = String(result?.chat?.id ?? chatId);
    recordSentMessage(chatId, mediaMessageId, cfg);
@@ -1036,7 +957,7 @@ export async function sendMessageTelegram(
        .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
        .join("")}`,
      deliveryKind: mediaSender.label,
-      messageThreadId: acceptedParams?.message_thread_id,
+      messageThreadId: mediaParams.message_thread_id,
      replyToMessageId: opts.replyToMessageId,
      silent: opts.silent,
    });
@@ -1606,13 +1527,9 @@ export async function sendStickerTelegram(

  const stickerParams = hasThreadParams ? threadParams : undefined;

-  const { result } = await withTelegramThreadFallback(
-    stickerParams,
+  const result = await requestWithChatNotFound(
+    () => api.sendSticker(chatId, fileId.trim(), stickerParams),
    "sticker",
-    opts.verbose,
-    target.chatType !== "direct",
-    async (effectiveParams, label) =>
-      requestWithChatNotFound(() => api.sendSticker(chatId, fileId.trim(), effectiveParams), label),
  );

  const messageId = resolveTelegramMessageIdOrThrow(result, "sticker send");
@@ -1714,16 +1631,9 @@ export async function sendPollTelegram(
    ...(opts.silent === true ? { disable_notification: true } : {}),
  };

-  const { result } = await withTelegramThreadFallback(
-    pollParams,
+  const result = await requestWithChatNotFound(
+    () => api.sendPoll(chatId, normalizedPoll.question, pollOptions, pollParams),
    "poll",
-    opts.verbose,
-    target.chatType !== "direct",
-    async (effectiveParams, label) =>
-      requestWithChatNotFound(
-        () => api.sendPoll(chatId, normalizedPoll.question, pollOptions, effectiveParams),
-        label,
-      ),
  );

  const messageId = resolveTelegramMessageIdOrThrow(result, "poll send");
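Review note: the parameter assembly in the `sendMessageTelegram` hunk above can be read in isolation as two small pure steps. The sketch below is only an illustration under stated assumptions: `SendParams`, `buildPlainParams`, and `buildHtmlParams` are hypothetical names that do not exist in the patch, and the parameter type is loosened to a plain record.

```typescript
// Hypothetical helper names; the patch inlines this logic in the chunk sender.
type SendParams = Record<string, unknown>;

function buildPlainParams(
  params: SendParams | undefined,
  linkPreviewOptions: unknown,
  silent: boolean | undefined,
): SendParams {
  // Copy first so the caller's object is never mutated.
  const base: SendParams = params ? { ...params } : {};
  if (linkPreviewOptions) {
    base.link_preview_options = linkPreviewOptions;
  }
  return { ...base, ...(silent === true ? { disable_notification: true } : {}) };
}

function buildHtmlParams(plain: SendParams): SendParams {
  // parse_mode is listed first, so a parse_mode already present in `plain`
  // would win; this matches the spread order used in the diff.
  return { parse_mode: "HTML", ...plain };
}
```

An empty result from `buildPlainParams` corresponds to the `hasPlainParams === false` branch, where the diff calls `api.sendMessage` without a third argument.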
--- a/qa/scenarios/index.md
+++ b/qa/scenarios/index.md
@@ -28,7 +28,10 @@ Coverage tracking:
 Runtime parity tiers:

 - `standard`: required Codex-vs-Pi mock gate coverage for first-hour depth and
-  default runtime-tool fixtures; selected with
+  default runtime-tool fixtures. OpenClaw dynamic integration tools in this
+  tier are hard-gated by `openclaw qa coverage --tools --summary`; Codex-native
+  workspace rows remain separately tracked until native/live behavior is the
+  asserted surface. Selected with
  `openclaw qa suite --runtime-pair pi,codex --runtime-parity-tier standard`
 - `optional`: profile-, plugin-, or external-service-dependent runtime-tool
  fixtures that stay out of the default release gate
--- a/qa/scenarios/runtime/tools/image-generate.md
+++ b/qa/scenarios/runtime/tools/image-generate.md
@@ -13,6 +13,7 @@ successCriteria:
  - Effective tools expose image_generate after QA image-generation config is applied.
  - The mock provider plans exactly one happy-path image_generate call.
  - The mock provider plans one denied-input failure-path image_generate call.
+  - Runtime parity coverage hard-fails call/result drift in the standard direct-loading gate.
 docsRefs:
  - docs/tools/image-generation.md
 codeRefs:
@@ -29,15 +30,12 @@ execution:
      actualTool: image_generate
      bucket: openclaw-dynamic-integration
      expectedLayer: openclaw-dynamic
+      capabilityLayer: openclaw-dynamic-direct
      required: true
-      tracking: "#80319"
      codexDefaultImpact: P4
      qaImpact: P1
-      action: teach fixture/mock planner Codex searchable OpenClaw dynamic tool behavior
-      reason: image_generate is an OpenClaw integration tool; QA mock provider does not yet model Codex searchable/deferred dynamic tool declarations for this fixture.
-    knownHarnessGap:
-      issue: "#80319"
-      reason: QA mock provider does not yet model Codex searchable/deferred OpenClaw dynamic tool declarations for this fixture.
+      action: hard gate in the standard direct-loading tier
+      reason: image_generate is an OpenClaw integration tool and must stay visible and callable under Pi and Codex direct runtime parity.
    promptSnippet: "target=image_generate"
    failurePromptSnippet: "failure target=image_generate"
 ```
--- a/qa/scenarios/runtime/tools/session-status.md
+++ b/qa/scenarios/runtime/tools/session-status.md
@@ -13,6 +13,7 @@ successCriteria:
  - Effective tools expose session_status.
  - The mock provider plans exactly one happy-path session_status call.
  - The mock provider plans one denied-input failure-path session_status call.
+  - Runtime parity coverage hard-fails call/result drift in the standard direct-loading gate.
 docsRefs:
  - qa/scenarios/index.md
 codeRefs:
@@ -28,15 +29,12 @@ execution:
      actualTool: session_status
      bucket: openclaw-dynamic-integration
      expectedLayer: openclaw-dynamic
+      capabilityLayer: openclaw-dynamic-direct
      required: true
-      tracking: "#80319"
      codexDefaultImpact: P4
      qaImpact: P1
-      action: teach fixture/mock planner Codex searchable OpenClaw dynamic tool behavior
-      reason: session_status is an OpenClaw integration tool; QA mock provider does not yet model Codex searchable/deferred dynamic tool declarations for this fixture.
-    knownHarnessGap:
-      issue: "#80319"
-      reason: QA mock provider does not yet model Codex searchable/deferred OpenClaw dynamic tool declarations for this fixture.
+      action: hard gate in the standard direct-loading tier
+      reason: session_status is an OpenClaw integration tool and must stay visible and callable under Pi and Codex direct runtime parity.
    promptSnippet: "target=session_status"
    failurePromptSnippet: "failure target=session_status"
 ```
--- a/qa/scenarios/runtime/tools/sessions-spawn.md
+++ b/qa/scenarios/runtime/tools/sessions-spawn.md
@@ -13,6 +13,7 @@ successCriteria:
  - Effective tools expose sessions_spawn.
  - The mock provider plans exactly one happy-path sessions_spawn call.
  - The mock provider plans one denied-input failure-path sessions_spawn call.
+  - Runtime parity coverage hard-fails call/result drift in the standard direct-loading gate.
 docsRefs:
  - qa/scenarios/index.md
 codeRefs:
@@ -28,15 +29,12 @@ execution:
      actualTool: sessions_spawn
      bucket: openclaw-dynamic-integration
      expectedLayer: openclaw-dynamic
+      capabilityLayer: openclaw-dynamic-direct
      required: true
-      tracking: "#80319"
      codexDefaultImpact: P4
      qaImpact: P1
-      action: teach fixture/mock planner Codex searchable OpenClaw dynamic tool behavior
-      reason: sessions_spawn is an OpenClaw integration tool; QA mock provider does not yet model Codex searchable/deferred dynamic tool declarations for this fixture.
-    knownHarnessGap:
-      issue: "#80319"
-      reason: QA mock provider does not yet model Codex searchable/deferred OpenClaw dynamic tool declarations for this fixture.
+      action: hard gate in the standard direct-loading tier
+      reason: sessions_spawn is an OpenClaw integration tool and must stay visible and callable under Pi and Codex direct runtime parity.
    promptSnippet: "target=sessions_spawn"
    failurePromptSnippet: "failure target=sessions_spawn"
 ```
--- a/qa/scenarios/runtime/tools/web-fetch.md
+++ b/qa/scenarios/runtime/tools/web-fetch.md
@@ -13,6 +13,7 @@ successCriteria:
  - Effective tools expose web_fetch.
  - The mock provider plans exactly one happy-path web_fetch call.
  - The mock provider plans one denied-input failure-path web_fetch call.
+  - Runtime parity coverage hard-fails call/result drift in the standard direct-loading gate.
 docsRefs:
  - qa/scenarios/index.md
 codeRefs:
@@ -28,15 +29,12 @@ execution:
      actualTool: web_fetch
      bucket: openclaw-dynamic-integration
      expectedLayer: openclaw-dynamic
+      capabilityLayer: openclaw-dynamic-direct
      required: true
-      tracking: "#80319"
      codexDefaultImpact: P4
      qaImpact: P1
-      action: teach fixture/mock planner Codex searchable OpenClaw dynamic tool behavior
-      reason: web_fetch is an OpenClaw integration tool; QA mock provider does not yet model Codex searchable/deferred dynamic tool declarations for this fixture.
-    knownHarnessGap:
-      issue: "#80319"
-      reason: QA mock provider does not yet model Codex searchable/deferred OpenClaw dynamic tool declarations for this fixture.
+      action: hard gate in the standard direct-loading tier
+      reason: web_fetch is an OpenClaw integration tool and must stay visible and callable under Pi and Codex direct runtime parity.
    promptSnippet: "target=web_fetch"
    failurePromptSnippet: "failure target=web_fetch"
 ```
--- a/qa/scenarios/runtime/tools/web-search.md
+++ b/qa/scenarios/runtime/tools/web-search.md
@@ -13,6 +13,7 @@ successCriteria:
  - Effective tools expose web_search.
  - The mock provider plans exactly one happy-path web_search call.
  - The mock provider plans one denied-input failure-path web_search call.
+  - Runtime parity coverage hard-fails call/result drift in the standard direct-loading gate.
 docsRefs:
  - qa/scenarios/index.md
 codeRefs:
@@ -28,15 +29,12 @@ execution:
      actualTool: web_search
      bucket: openclaw-dynamic-integration
      expectedLayer: openclaw-dynamic
+      capabilityLayer: openclaw-dynamic-direct
      required: true
-      tracking: "#80319"
      codexDefaultImpact: P4
      qaImpact: P1
-      action: teach fixture/mock planner Codex searchable OpenClaw dynamic tool behavior
-      reason: web_search is an OpenClaw integration tool; QA mock provider does not yet model Codex searchable/deferred dynamic tool declarations for this fixture.
-    knownHarnessGap:
-      issue: "#80319"
-      reason: QA mock provider does not yet model Codex searchable/deferred OpenClaw dynamic tool declarations for this fixture.
+      action: hard gate in the standard direct-loading tier
+      reason: web_search is an OpenClaw integration tool and must stay visible and callable under Pi and Codex direct runtime parity.
    promptSnippet: "target=web_search"
    failurePromptSnippet: "failure target=web_search"
 ```
--- a/scripts/e2e/telegram-user-crabbox-proof.ts
+++ b/scripts/e2e/telegram-user-crabbox-proof.ts
@@ -1049,9 +1049,35 @@ function sshArgs(inspect: CrabboxInspect) {
  };
 }

+function isTransientSshFailure(error: unknown) {
+  const message = error instanceof Error ? error.message : String(error);
+  return /Connection (?:closed|reset)|Operation timed out|Connection timed out/u.test(message);
+}
+
+async function runRemoteCommand(params: {
+  args: string[];
+  command: string;
+  cwd: string;
+  stdio?: "inherit" | "pipe";
+}) {
+  let lastError: unknown;
+  for (let attempt = 1; attempt <= 4; attempt += 1) {
+    try {
+      return await runCommand(params);
+    } catch (error) {
+      lastError = error;
+      if (attempt === 4 || !isTransientSshFailure(error)) {
+        throw error;
+      }
+      await new Promise((resolve) => setTimeout(resolve, attempt * 3000));
+    }
+  }
+  throw lastError;
+}
+
 async function scpToRemote(root: string, inspect: CrabboxInspect, local: string, remote: string) {
  const ssh = sshArgs(inspect);
-  await runCommand({
+  await runRemoteCommand({
    command: "scp",
    args: [...ssh.scpBase, local, `${ssh.target}:${remote}`],
    cwd: root,
@@ -1061,7 +1087,7 @@ async function scpToRemote(root: string, inspect: CrabboxInspect, local: string,

 async function scpFromRemote(root: string, inspect: CrabboxInspect, remote: string, local: string) {
  const ssh = sshArgs(inspect);
-  await runCommand({
+  await runRemoteCommand({
    command: "scp",
    args: [...ssh.scpBase, `${ssh.target}:${remote}`, local],
    cwd: root,
@@ -1071,7 +1097,7 @@ async function scpFromRemote(root: string, inspect: CrabboxInspect, remote: stri

 async function sshRun(root: string, inspect: CrabboxInspect, remoteCommand: string) {
  const ssh = sshArgs(inspect);
-  return await runCommand({
+  return await runRemoteCommand({
    command: "ssh",
    args: [...ssh.base, ssh.target, remoteCommand],
    cwd: root,
@@ -1090,7 +1116,7 @@ tdlib_url=${tdlibUrl}
 mkdir -p "$root"
 tar -xzf "$root/state.tgz" -C "$root"
 sudo apt-get update -y
-sudo DEBIAN_FRONTEND=noninteractive apt-get install -y curl git cmake g++ make zlib1g-dev libssl-dev python3 ffmpeg scrot xz-utils tar wmctrl xdotool x11-utils libopengl0 libxcb-cursor0 libxcb-icccm4 libxcb-image0 libxcb-keysyms1 libxcb-randr0 libxcb-render-util0 libxcb-shape0 libxcb-xfixes0 libxcb-xinerama0 libxkbcommon-x11-0 >/tmp/openclaw-telegram-apt.log
+sudo DEBIAN_FRONTEND=noninteractive apt-get install -y curl git cmake g++ make zlib1g-dev libssl-dev python3 ffmpeg scrot xz-utils tar wmctrl xdotool x11-utils zbar-tools libopengl0 libxcb-cursor0 libxcb-icccm4 libxcb-image0 libxcb-keysyms1 libxcb-randr0 libxcb-render-util0 libxcb-shape0 libxcb-xfixes0 libxcb-xinerama0 libxkbcommon-x11-0 >/tmp/openclaw-telegram-apt.log
 if ! command -v python3 >/dev/null 2>&1; then
  echo "python3 is required" >&2
  exit 127
@@ -1122,6 +1148,7 @@ if ! ldconfig -p | grep -q libtdjson.so; then
  sudo ldconfig
 fi
 TELEGRAM_USER_DRIVER_STATE_DIR="$root/user-driver" python3 "$root/user-driver.py" status --json --timeout-ms 60000 >"$root/status.json"
+TELEGRAM_USER_DRIVER_STATE_DIR="$root/user-driver" python3 "$root/user-driver.py" terminate-desktop-sessions --json --timeout-ms 60000 --output "$root/desktop-sessions-cleanup.json"
 `;
 }

@@ -1131,6 +1158,7 @@ set -euo pipefail
 root=${REMOTE_ROOT}
 export DISPLAY="\${DISPLAY:-:99}"
 pkill -f "$root/Telegram/Telegram" >/dev/null 2>&1 || true
+rm -rf "$root/desktop/tdata"
 nohup "$root/Telegram/Telegram" -workdir "$root/desktop" >"$root/telegram-desktop.log" 2>&1 &
 pid=$!
 sleep 8
@@ -1145,6 +1173,60 @@ fi
 `;
 }

+function renderAuthorizeDesktop() {
+  return `#!/usr/bin/env bash
+set -euo pipefail
+root=${REMOTE_ROOT}
+export DISPLAY="\${DISPLAY:-:99}"
+win="$(wmctrl -l | awk 'tolower($0) ~ /telegram/ {print $1; exit}')"
+test -n "$win"
+xdotool windowactivate "$win"
+sleep 5
+click_window_ratio() {
+  eval "$(xdotool getwindowgeometry --shell "$win")"
+  xdotool windowactivate "$win"
+  sleep 0.2
+  xdotool mousemove "$((X + WIDTH / 2))" "$((Y + HEIGHT * $1 / 100))"
+  sleep 0.2
+  xdotool click 1
+  sleep 1
+}
+read_qr_link() {
+  scrot "$root/telegram-login-qr.png"
+  { zbarimg --raw "$root/telegram-login-qr.png" 2>/dev/null || true; } | awk 'index($0, "tg://login?token=") == 1 {print; exit}'
+}
+wait_for_qr_link() {
+  for _ in $(seq 1 25); do
+    link="$(read_qr_link)"
+    if [ -n "$link" ]; then
+      printf '%s\\n' "$link"
+      return 0
+    fi
+    sleep 1
+  done
+  return 1
+}
+click_window_ratio 69
+sleep 3
+click_window_ratio 80
+link="$(wait_for_qr_link)" || {
+  echo "Telegram Desktop QR login code was not found." >&2
+  exit 1
+}
+export TELEGRAM_USER_DRIVER_STATE_DIR="$root/user-driver"
+python3 "$root/user-driver.py" confirm-qr --link "$link" --json --output "$root/desktop-session.json"
+python3 - "$root/desktop-session.json" <<'PY'
+import json
+import sys
+payload = json.loads(open(sys.argv[1]).read())
+session = payload.get("session") or {}
+if session.get("isPasswordPending"):
+    raise SystemExit("Telegram Desktop QR login requires a 2FA password.")
+PY
+sleep 6
+`;
+}
+
 function renderSelectDesktopChat(params: { chatTitle: string }) {
  return `#!/usr/bin/env bash
 set -euo pipefail
@@ -1414,12 +1496,14 @@ async function writeRemoteSessionScripts(params: {
 }) {
  const setupScript = path.join(params.localRoot, "remote-setup.sh");
  const launchScript = path.join(params.localRoot, "launch-desktop.sh");
+  const authorizeScript = path.join(params.localRoot, "authorize-desktop.sh");
  const selectChatScript = path.join(params.localRoot, "select-desktop-chat.sh");
  await writeExecutable(
    setupScript,
    renderRemoteSetup({ tdlibSha256: params.opts.tdlibSha256, tdlibUrl: params.opts.tdlibUrl }),
  );
  await writeExecutable(launchScript, renderLaunchDesktop());
+  await writeExecutable(authorizeScript, renderAuthorizeDesktop());
  await writeExecutable(
    selectChatScript,
    renderSelectDesktopChat({ chatTitle: params.opts.desktopChatTitle }),
@@ -1429,6 +1513,12 @@ async function writeRemoteSessionScripts(params: {
  await scpToRemote(params.root, params.inspect, params.stateArchive, `${REMOTE_ROOT}/state.tgz`);
  await scpToRemote(params.root, params.inspect, setupScript, `${REMOTE_ROOT}/remote-setup.sh`);
  await scpToRemote(params.root, params.inspect, launchScript, `${REMOTE_ROOT}/launch-desktop.sh`);
+  await scpToRemote(
+    params.root,
+    params.inspect,
+    authorizeScript,
+    `${REMOTE_ROOT}/authorize-desktop.sh`,
+  );
  await scpToRemote(
    params.root,
    params.inspect,
@@ -1437,6 +1527,7 @@ async function writeRemoteSessionScripts(params: {
  );
  await sshRun(params.root, params.inspect, `bash ${REMOTE_ROOT}/remote-setup.sh`);
  await sshRun(params.root, params.inspect, `bash ${REMOTE_ROOT}/launch-desktop.sh`);
+  await sshRun(params.root, params.inspect, `bash ${REMOTE_ROOT}/authorize-desktop.sh`);
  await sshRun(params.root, params.inspect, `bash ${REMOTE_ROOT}/select-desktop-chat.sh`);
  await sshRun(
    params.root,
@@ -1486,6 +1577,30 @@ fi`,
  );
 }

+async function terminateRemoteDesktopSession(root: string, inspect: CrabboxInspect) {
+  await sshRun(
+    root,
+    inspect,
+    `set -euo pipefail
+root=${REMOTE_ROOT}
+if [ ! -s "$root/desktop-session.json" ]; then
+  exit 0
+fi
+session_id="$(python3 - "$root/desktop-session.json" <<'PY'
+import json
+import sys
+payload = json.loads(open(sys.argv[1]).read())
+print((payload.get("session") or {}).get("id") or "")
+PY
+)"
+if [ -z "$session_id" ]; then
+  exit 0
+fi
+export TELEGRAM_USER_DRIVER_STATE_DIR="$root/user-driver"
+python3 "$root/user-driver.py" terminate-session --session-id "$session_id" --json --output "$root/desktop-session-terminated.json"`,
+  );
+}
+
 async function startSession(root: string, opts: Options, outputDir: string) {
  const localRoot = path.join(outputDir, ".session");
  fs.rmSync(localRoot, { force: true, recursive: true });
@@ -1756,6 +1871,16 @@ async function finishSession(root: string, opts: Options, outputDir: string) {
  const statusPath = path.join(session.outputDir, "status.json");
  const ffmpegLogPath = path.join(session.outputDir, "ffmpeg.log");
  const crop = previewCrop(opts);
+  let desktopSessionTerminationAttempted = false;
+  const terminateDesktopSession = async () => {
+    if (opts.keepBox || desktopSessionTerminationAttempted) {
+      return;
+    }
+    desktopSessionTerminationAttempted = true;
+    await terminateRemoteDesktopSession(root, session.crabbox.inspect).catch((error: unknown) => {
+      summary.desktopSessionTerminateError = error instanceof Error ? error.message : String(error);
+    });
+  };
  try {
    await stopRemoteRecording(root, session.crabbox.inspect, session);
    await scpFromRemote(root, session.crabbox.inspect, session.recorder.remoteVideo, videoPath);
@@ -1774,6 +1899,23 @@ async function finishSession(root: string, opts: Options, outputDir: string) {
    await scpFromRemote(root, session.crabbox.inspect, session.recorder.log, ffmpegLogPath).catch(
      () => {},
    );
+    await runCommand({
+      command: opts.crabboxBin,
+      args: [
+        "screenshot",
+        "--provider",
+        session.crabbox.provider,
+        "--target",
+        session.crabbox.target,
+        "--id",
+        session.crabbox.id,
+        "--output",
+        screenshotPath,
+      ],
+      cwd: root,
+      stdio: "inherit",
+    });
+    await terminateDesktopSession();
    summary.mediaPreview = await createMotionPreview({
      motionGifPath,
      motionVideoPath,
@@ -1791,22 +1933,6 @@ async function finishSession(root: string, opts: Options, outputDir: string) {
        videoPath: motionVideoPath,
      });
    }
-    await runCommand({
-      command: opts.crabboxBin,
-      args: [
-        "screenshot",
-        "--provider",
-        session.crabbox.provider,
-        "--target",
-        session.crabbox.target,
-        "--id",
-        session.crabbox.id,
-        "--output",
-        screenshotPath,
-      ],
-      cwd: root,
-      stdio: "inherit",
-    });
    summary.artifacts = {
      desktopLog: path.relative(root, desktopLogPath),
      ffmpegLog: path.relative(root, ffmpegLogPath),
@@ -1826,6 +1952,7 @@ async function finishSession(root: string, opts: Options, outputDir: string) {
  } finally {
    killPidTree(session.localSut.gatewayPid);
    killPidTree(session.localSut.mockPid);
+    await terminateDesktopSession();
    await releaseCredential(root, opts, session.credential.leaseFile).catch((error: unknown) => {
      summary.credentialReleaseError = error instanceof Error ? error.message : String(error);
    });
@@ -2038,6 +2165,7 @@ async function main() {

    const setupScript = path.join(localRoot, "remote-setup.sh");
    const launchScript = path.join(localRoot, "launch-desktop.sh");
+    const authorizeScript = path.join(localRoot, "authorize-desktop.sh");
    const selectChatScript = path.join(localRoot, "select-desktop-chat.sh");
    const probeScript = path.join(localRoot, "remote-probe.sh");
    await writeExecutable(
@@ -2045,6 +2173,7 @@ async function main() {
      renderRemoteSetup({ tdlibSha256: opts.tdlibSha256, tdlibUrl: opts.tdlibUrl }),
    );
    await writeExecutable(launchScript, renderLaunchDesktop());
+    await writeExecutable(authorizeScript, renderAuthorizeDesktop());
    await writeExecutable(
      selectChatScript,
      renderSelectDesktopChat({ chatTitle: opts.desktopChatTitle }),
@@ -2063,6 +2192,7 @@ async function main() {
    await scpToRemote(root, inspect, stateArchive, `${REMOTE_ROOT}/state.tgz`);
    await scpToRemote(root, inspect, setupScript, `${REMOTE_ROOT}/remote-setup.sh`);
    await scpToRemote(root, inspect, launchScript, `${REMOTE_ROOT}/launch-desktop.sh`);
+    await scpToRemote(root, inspect, authorizeScript, `${REMOTE_ROOT}/authorize-desktop.sh`);
    await scpToRemote(root, inspect, selectChatScript, `${REMOTE_ROOT}/select-desktop-chat.sh`);
    await scpToRemote(root, inspect, probeScript, `${REMOTE_ROOT}/remote-probe.sh`);
    await sshRun(root, inspect, `bash ${REMOTE_ROOT}/remote-setup.sh`);
@@ -2086,6 +2216,7 @@ async function main() {
    };

    await sshRun(root, inspect, `bash ${REMOTE_ROOT}/launch-desktop.sh`);
+    await sshRun(root, inspect, `bash ${REMOTE_ROOT}/authorize-desktop.sh`);
    await sshRun(root, inspect, `bash ${REMOTE_ROOT}/select-desktop-chat.sh`);
    const videoPath = path.join(outputDir, "telegram-user-crabbox-proof.mp4");
    const recording = spawn(
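Review note: the retry wrapper this file adds (`runRemoteCommand` plus `isTransientSshFailure`) follows a common pattern: retry only on errors classified as transient, with a linearly growing delay. The standalone sketch below is an illustration, not the script's code. `retryTransient`, `isTransient`, and `backoffMs` are hypothetical names, and the attempt count and delay step are parameters whose defaults mirror the values in the diff (4 attempts, 3000 ms step).

```typescript
// Hypothetical standalone sketch; names are illustrative, not from the patch.
const TRANSIENT_SSH = /Connection (?:closed|reset)|Operation timed out|Connection timed out/;

function isTransient(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error);
  return TRANSIENT_SSH.test(message);
}

// Linear backoff: attempt 1 waits 1 * step, attempt 2 waits 2 * step, and so on.
function backoffMs(attempt: number, stepMs: number): number {
  return attempt * stepMs;
}

async function retryTransient<T>(
  run: () => Promise<T>,
  maxAttempts = 4,
  stepMs = 3000,
): Promise<T> {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await run();
    } catch (error) {
      // Give up on the final attempt or on any non-transient failure.
      if (attempt >= maxAttempts || !isTransient(error)) {
        throw error;
      }
      await new Promise<void>((resolve) => setTimeout(resolve, backoffMs(attempt, stepMs)));
    }
  }
}
```

Because the loop rethrows from inside the `catch`, no trailing `throw` is needed after it. The patch keeps a `lastError` variable and a final `throw`, which is equivalent in behavior.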
--- a/scripts/e2e/telegram-user-driver.py
+++ b/scripts/e2e/telegram-user-driver.py
@@ -611,6 +611,33 @@ def command_confirm_qr(args):
    )


+def command_terminate_session(args):
+    config, bot_config = load_config()
+    driver = UserDriver(config, bot_config)
+    driver.authorize(argparse.Namespace(timeout_ms=args.timeout_ms))
+    driver.client.request({"@type": "terminateSession", "session_id": int(args.session_id)}, timeout=30)
+    print_result({"ok": True, "sessionId": args.session_id}, args.json, getattr(args, "output", ""))
+
+
+def command_terminate_desktop_sessions(args):
+    config, bot_config = load_config()
+    driver = UserDriver(config, bot_config)
+    driver.authorize(argparse.Namespace(timeout_ms=args.timeout_ms))
+    result = driver.client.request({"@type": "getActiveSessions"}, timeout=30)
+    terminated = []
+    for session in result.get("sessions", []):
+        if session.get("is_current"):
+            continue
+        if session.get("application_name") != "Telegram Desktop":
+            continue
+        session_id = session.get("id")
+        if session_id is None:
+            continue
+        driver.client.request({"@type": "terminateSession", "session_id": int(session_id)}, timeout=30)
+        terminated.append({"id": session_id, "applicationName": session.get("application_name")})
+    print_result({"ok": True, "terminated": terminated}, args.json, getattr(args, "output", ""))
+
+
 def public_user(user):
    return {
        "id": user.get("id"),
@@ -784,6 +811,15 @@ def main():
    confirm_qr.add_argument("--link", required=True)
    confirm_qr.set_defaults(func=command_confirm_qr)

+    terminate_session = sub.add_parser("terminate-session")
+    add_common(terminate_session)
+    terminate_session.add_argument("--session-id", required=True)
+    terminate_session.set_defaults(func=command_terminate_session)
+
+    terminate_desktop_sessions = sub.add_parser("terminate-desktop-sessions")
+    add_common(terminate_desktop_sessions)
+    terminate_desktop_sessions.set_defaults(func=command_terminate_desktop_sessions)
+
    send = sub.add_parser("send")
    add_common(send)
    send.add_argument("--chat", default="")
--- a/scripts/lib/channel-contract-test-plan.mjs
+++ b/scripts/lib/channel-contract-test-plan.mjs
@@ -52,7 +52,7 @@ function resolveContractFileWeight(file) {

 export function createChannelContractTestShards() {
  const rootDir = "src/channels/plugins/contracts";
-  const suffixes = ["a", "b", "c"];
+  const suffixes = ["a", "b"];
  const groups = Object.fromEntries(
    suffixes.map((suffix) => [`checks-fast-contracts-channels-${suffix}`, []]),
  );
--- a/scripts/lib/plugin-contract-test-plan.mjs
+++ b/scripts/lib/plugin-contract-test-plan.mjs
@@ -66,7 +66,7 @@ function resolveContractFileWeight(file) {
 }

 export function createPluginContractTestShards() {
-  const suffixes = ["a", "b", "c", "d"];
+  const suffixes = ["a", "b"];
  const groups = Object.fromEntries(
    suffixes.map((suffix) => [`checks-fast-contracts-plugins-${suffix}`, []]),
  );
--- a/scripts/lib/ts-topology/analyze.ts
+++ b/scripts/lib/ts-topology/analyze.ts
@@ -206,9 +206,15 @@ function collectReferenceEvents(
      if (!clause?.namedBindings) {
        continue;
      }
+      if (clause.isTypeOnly) {
+        continue;
+      }

      if (ts.isNamedImports(clause.namedBindings)) {
        for (const element of clause.namedBindings.elements) {
+          if (element.isTypeOnly) {
+            continue;
+          }
          const importedName = element.propertyName?.text ?? element.name.text;
          const record = recordMap.get(importedName);
          if (!record) {
--- a/scripts/lib/ts-topology/reports.ts
+++ b/scripts/lib/ts-topology/reports.ts
@@ -110,5 +110,9 @@ const reportModules: Record<ReportModule["name"], ReportModule> = {
 };

 export function renderTextReport(envelope: TopologyEnvelope, limit: number): string {
-  return reportModules[envelope.report].describe(envelope, limit);
+  const reportModule = reportModules[envelope.report];
+  if (!reportModule) {
+    throw new Error(`Unsupported topology report: ${envelope.report}`);
+  }
+  return reportModule.describe(envelope, limit);
 }
--- a/scripts/mantis/publish-pr-evidence.mjs
+++ b/scripts/mantis/publish-pr-evidence.mjs
@@ -308,6 +308,47 @@ function laneLine(label, lane) {
  return pieces.join("");
 }

+function hasVisibleProofArtifacts(manifest) {
+  return manifest.artifacts.some((artifact) =>
+    ["desktopScreenshot", "fullVideo", "motionClip", "motionPreview", "timeline"].includes(
+      artifact.kind,
+    ),
+  );
+}
+
+function isSkippedNoVisualProof(manifest) {
+  const comparison = manifest.comparison ?? {};
+  return (
+    !hasVisibleProofArtifacts(manifest) &&
+    comparison.baseline?.status === "skipped" &&
+    comparison.candidate?.status === "skipped"
+  );
+}
+
+function publicSummary(manifest) {
+  if (isSkippedNoVisualProof(manifest)) {
+    return "Mantis did not generate before/after GIFs because this PR does not have a clean Telegram-visible before/after proof in the standard Mantis run.";
+  }
+  return manifest.summary ?? "Mantis captured QA evidence for this scenario.";
+}
+
+function overallStatus(manifest) {
+  if (isSkippedNoVisualProof(manifest)) {
+    return "skipped";
+  }
+  const pass = manifest.comparison?.pass;
+  return typeof pass === "boolean" ? String(pass) : "";
+}
+
+export function shouldPublishPrComment(manifest) {
+  if (!isSkippedNoVisualProof(manifest)) {
+    return true;
+  }
+  return !/(authorization[- ]?error|credential infrastructure|logged[- ]out|login screen|welcome screen|bad telegram session)/iu.test(
+    manifest.summary ?? "",
+  );
+}
+
 export function renderEvidenceComment({
  artifactUrl: actionsArtifactUrl,
  manifest,
@@ -333,7 +374,7 @@ export function renderEvidenceComment({
    marker,
    `## ${manifest.title}`,
    "",
-    `Summary: ${manifest.summary ?? "Mantis captured QA evidence for this scenario."}`,
+    `Summary: ${publicSummary(manifest)}`,
    "",
    `- Scenario: \`${manifest.scenario}\``,
  ];
@@ -354,8 +395,9 @@ export function renderEvidenceComment({
  if (candidateLine) {
    lines.push(candidateLine);
  }
-  if (typeof comparison.pass === "boolean") {
-    lines.push(`- Overall: \`${comparison.pass}\``);
+  const overall = overallStatus(manifest);
+  if (overall) {
+    lines.push(`- Overall: \`${overall}\``);
  }
  lines.push("");

@@ -551,6 +593,10 @@ export async function publishEvidence(rawArgs = process.argv.slice(2)) {
    runUrl: args.run_url,
    treeUrl: published.treeUrl,
  });
+  if (!shouldPublishPrComment(manifest)) {
+    console.log("Skipped Mantis QA evidence PR comment because the run did not capture proof.");
+    return;
+  }
  upsertPrComment({
    body,
    marker: args.marker,
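Review note: the comment-suppression rule added to `publish-pr-evidence.mjs` combines two checks: the run must be a skipped run with no visible proof artifacts, and its summary must look like an infrastructure or login failure. The typed sketch below restates that decision for readability. The `Manifest` type is an assumption inferred only from the fields the diff reads, not the real manifest schema, and `shouldPublish` is a hypothetical stand-in for `shouldPublishPrComment`.

```typescript
// Assumed manifest shape, inferred only from the fields the diff reads.
type LaneStatus = { status?: string };
type Manifest = {
  summary?: string;
  artifacts: { kind: string }[];
  comparison?: { pass?: boolean; baseline?: LaneStatus; candidate?: LaneStatus };
};

const VISIBLE_KINDS = ["desktopScreenshot", "fullVideo", "motionClip", "motionPreview", "timeline"];
const INFRA_FAILURE =
  /(authorization[- ]?error|credential infrastructure|logged[- ]out|login screen|welcome screen|bad telegram session)/i;

function isSkippedNoVisualProof(manifest: Manifest): boolean {
  const hasVisible = manifest.artifacts.some((artifact) => VISIBLE_KINDS.includes(artifact.kind));
  return (
    !hasVisible &&
    manifest.comparison?.baseline?.status === "skipped" &&
    manifest.comparison?.candidate?.status === "skipped"
  );
}

function shouldPublish(manifest: Manifest): boolean {
  // Runs that captured proof, or were not skipped on both lanes, always publish.
  if (!isSkippedNoVisualProof(manifest)) {
    return true;
  }
  // Skipped runs publish unless the summary points at a login/credential failure.
  return !INFRA_FAILURE.test(manifest.summary ?? "");
}
```

Under this reading, a skipped run with a neutral summary still gets a PR comment (with `Overall: skipped`), while a skipped run caused by a logged-out Telegram session stays silent.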
--- a/scripts/qa-coverage-report.ts
+++ b/scripts/qa-coverage-report.ts
@@ -4,6 +4,8 @@ type Options = {
  json?: boolean;
  output?: string;
  repoRoot?: string;
+  summary?: string;
+  tools?: boolean;
 };

 function takeValue(args: string[], index: number, flag: string): string {
@@ -27,6 +29,8 @@ Options:
  --json                Print machine-readable JSON
  --output <path>       Write the report to a file
  --repo-root <path>    Repository root to target
+  --summary <path>      Runtime qa-suite-summary.json to overlay on --tools coverage
+  --tools               Print runtime tool fixture coverage instead of scenario coverage
  -h, --help            Display help
 `);
        process.exit(0);
@@ -41,6 +45,13 @@ Options:
        opts.repoRoot = takeValue(args, index, arg);
        index += 1;
        break;
+      case "--summary":
+        opts.summary = takeValue(args, index, arg);
+        index += 1;
+        break;
+      case "--tools":
+        opts.tools = true;
+        break;
      default:
        throw new Error(`Unknown qa coverage option: ${arg}`);
    }
@@ -53,4 +64,6 @@ await runQaCoverageReportCommand({
  ...(opts.json ? { json: true } : {}),
  ...(opts.output ? { output: opts.output } : {}),
  ...(opts.repoRoot ? { repoRoot: opts.repoRoot } : {}),
+  ...(opts.summary ? { summary: opts.summary } : {}),
+  ...(opts.tools ? { tools: true } : {}),
 });
--- a/scripts/qa-parity-report.ts
+++ b/scripts/qa-parity-report.ts
@@ -7,6 +7,9 @@ type Options = {
  candidateSummary?: string;
  outputDir?: string;
  repoRoot?: string;
+  runtimeAxis?: boolean;
+  summary?: string;
+  tokenEfficiency?: boolean;
 };

 function takeValue(args: string[], index: number, flag: string): string {
@@ -31,6 +34,9 @@ Options:
  --baseline-summary <path>   Baseline qa-suite-summary.json path
  --candidate-label <label>   Candidate display label
  --baseline-label <label>    Baseline display label
+  --runtime-axis              Interpret --summary as a runtime-pair summary
+  --summary <path>            Runtime-axis qa-suite-summary.json path
+  --token-efficiency          Also write the runtime token-efficiency report
  --repo-root <path>          Repository root to target
  --output-dir <path>         Artifact directory for the parity report
  -h, --help                  Display help
@@ -60,6 +66,16 @@ Options:
        opts.repoRoot = takeValue(args, index, arg);
        index += 1;
        break;
+      case "--runtime-axis":
+        opts.runtimeAxis = true;
+        break;
+      case "--summary":
+        opts.summary = takeValue(args, index, arg);
+        index += 1;
+        break;
+      case "--token-efficiency":
+        opts.tokenEfficiency = true;
+        break;
      default:
        throw new Error(`Unknown qa parity-report option: ${arg}`);
    }
@@ -68,18 +84,27 @@ Options:
 }

 const opts = parseArgs(process.argv.slice(2));
-if (!opts.candidateSummary) {
-  throw new Error("--candidate-summary is required.");
-}
-if (!opts.baselineSummary) {
-  throw new Error("--baseline-summary is required.");
+if (opts.runtimeAxis) {
+  if (!opts.summary) {
+    throw new Error("--summary is required when --runtime-axis is set.");
+  }
+} else {
+  if (!opts.candidateSummary) {
+    throw new Error("--candidate-summary is required.");
+  }
+  if (!opts.baselineSummary) {
+    throw new Error("--baseline-summary is required.");
+  }
 }

 await runQaParityReportCommand({
-  baselineSummary: opts.baselineSummary,
-  candidateSummary: opts.candidateSummary,
+  ...(opts.baselineSummary ? { baselineSummary: opts.baselineSummary } : {}),
+  ...(opts.candidateSummary ? { candidateSummary: opts.candidateSummary } : {}),
  ...(opts.baselineLabel ? { baselineLabel: opts.baselineLabel } : {}),
  ...(opts.candidateLabel ? { candidateLabel: opts.candidateLabel } : {}),
  ...(opts.outputDir ? { outputDir: opts.outputDir } : {}),
  ...(opts.repoRoot ? { repoRoot: opts.repoRoot } : {}),
+  ...(opts.runtimeAxis ? { runtimeAxis: opts.runtimeAxis } : {}),
+  ...(opts.summary ? { summary: opts.summary } : {}),
+  ...(opts.tokenEfficiency ? { tokenEfficiency: opts.tokenEfficiency } : {}),
 });
--- a/scripts/sync-labels.ts
+++ b/scripts/sync-labels.ts
@@ -9,17 +9,17 @@ type RepoLabel = {
 };

 const COLOR_BY_PREFIX = new Map<string, string>([
-  ["channel", "DDEBFA"],
-  ["app", "EADFF8"],
-  ["extensions", "EDEDED"],
-  ["plugin", "EDEDED"],
-  ["docs", "CFE3F8"],
-  ["cli", "CFE3F8"],
-  ["gateway", "D9CCF5"],
-  ["commands", "CFE3F8"],
-  ["scripts", "D9CCF5"],
-  ["docker", "DDF4E4"],
-  ["size", "E8C4CB"],
+  ["channel", "0969DA"],
+  ["app", "6E7781"],
+  ["extensions", "6E7781"],
+  ["plugin", "6E7781"],
+  ["docs", "0A3069"],
+  ["cli", "0A3069"],
+  ["gateway", "57606A"],
+  ["commands", "0A3069"],
+  ["scripts", "57606A"],
+  ["docker", "D6E3DA"],
+  ["size", "8C959F"],
 ]);

 const EXTRA_LABEL_METADATA = new Map<
--- a/scripts/sync-openclaw-label-colors.mjs
+++ b/scripts/sync-openclaw-label-colors.mjs
@@ -12,23 +12,25 @@ const COLORS = {
  softerAmber: "F9D65C",
  paleYellow: "F7E7A1",
  saturatedGreen: "0E8A16",
-  mutedGreen: "B8E0B0",
-  paleGreen: "DDF4E4",
-  proofGreen: "C2E0C6",
-  mutedProofGreen: "9BD3A0",
-  overrideGreen: "DDECCF",
+  mutedGreen: "8C959F",
+  paleGreen: "D6E3DA",
+  proofGreen: "2DA44E",
+  mutedProofGreen: "1A7F37",
+  overrideGreen: "2DA44E",
  saturatedBlue: "0F2CCE",
-  paleBlue: "CFE3F8",
-  channelBlue: "DDEBFA",
-  dedupeBlue: "BFD4F2",
-  triageBlue: "D8E8F8",
+  paleBlue: "0A3069",
+  channelBlue: "0969DA",
+  dedupeBlue: "57606A",
+  triageBlue: "0969DA",
  saturatedPurple: "7057FF",
-  mutedPurple: "D9CCF5",
-  appPurple: "EADFF8",
-  neutralGray: "EDEDED",
-  duplicateGray: "CFD3D7",
+  mutedPurple: "57606A",
+  taxonomyGray: "6E7781",
+  taxonomySteel: "57606A",
+  appPurple: "6E7781",
+  neutralGray: "E5E7EB",
+  duplicateGray: "D1D5DB",
  darkGray: "8C8C8C",
-  mutedRose: "E8C4CB",
+  mutedRose: "8C959F",
  mutedRed: "E99695",
  black: "000000",
  white: "FFFFFF",
@@ -43,6 +45,13 @@ const EXACT_COLORS = new Map(
    P1: COLORS.saturatedOrangeRed,
    P2: COLORS.saturatedAmber,
    P3: COLORS.mutedGreen,
+    "rating: 🦀 challenger crab": "1F883D",
+    "rating: 🦞 diamond lobster": "0969DA",
+    "rating: 🐚 platinum hermit": "0F766E",
+    "rating: 🦐 gold shrimp": "B7791F",
+    "rating: 🦪 silver shellfish": "7A828E",
+    "rating: 🧂 unranked krab": "8C2F39",
+    "rating: 🌊 off-meta tidepool": "6E7781",
    "impact:data-loss": COLORS.saturatedRed,
    "impact:security": COLORS.saturatedRed,
    "impact:crash-loop": COLORS.saturatedOrangeRed,
@@ -63,13 +72,13 @@ const EXACT_COLORS = new Map(
    "triage:done": COLORS.mutedGreen,
    "triage:needs-review": COLORS.paleBlue,
    "triage:started": COLORS.mutedPurple,
-    agents: COLORS.mutedPurple,
+    agents: COLORS.taxonomySteel,
    docs: COLORS.paleBlue,
    cli: COLORS.paleBlue,
    commands: COLORS.paleBlue,
    scripts: COLORS.mutedPurple,
    gateway: COLORS.mutedPurple,
-    codex: COLORS.neutralGray,
+    codex: COLORS.taxonomySteel,
    docker: COLORS.paleGreen,
    tui: COLORS.paleGreen,
    "extensions: NEW": COLORS.channelBlue,
@@ -158,13 +167,13 @@ const FAMILY_RULES = [
  {
    family: "extension",
    match: (name) => name.startsWith("extensions: "),
-    color: COLORS.neutralGray,
+    color: COLORS.taxonomyGray,
    reason: "plugin implementation taxonomy should not compete with priority",
  },
  {
    family: "plugin",
    match: (name) => name.startsWith("plugin: "),
-    color: COLORS.neutralGray,
+    color: COLORS.taxonomyGray,
    reason: "plugin taxonomy stays neutral unless it becomes an action gate",
  },
  {
@@ -196,6 +205,9 @@ function exactFamily(name) {
  if (/^P[0-3]$/.test(name)) {
    return "priority";
  }
+  if (name.startsWith("rating:")) {
+    return "rating";
+  }
  if (name.startsWith("impact:")) {
    return "impact";
  }
--- a/src/agents/anthropic-payload-log.test.ts
+++ b/src/agents/anthropic-payload-log.test.ts
@@ -64,4 +64,34 @@ describe("createAnthropicPayloadLogger", () => {
    expect(source.sha256).toBe(crypto.createHash("sha256").update("QUJDRA==").digest("hex"));
    expect(event.payloadDigest).toMatch(/^[a-f0-9]{64}$/u);
  });
+
+  it("sanitizes usage and error fields before writing logs", () => {
+    const lines: string[] = [];
+    const logger = createAnthropicPayloadLogger({
+      env: { OPENCLAW_ANTHROPIC_PAYLOAD_LOG: "1" },
+      writer: {
+        filePath: "memory",
+        write: (line) => lines.push(line),
+        flush: async () => undefined,
+      },
+    });
+
+    logger?.recordUsage(
+      [
+        {
+          role: "assistant",
+          content: "",
+          usage: {
+            input: 1,
+            authorization: "Bearer sk-secret", // pragma: allowlist secret
+          },
+        } as never,
+      ],
+      new Error("failed with Bearer sk-secret"), // pragma: allowlist secret
+    );
+
+    const event = JSON.parse(lines[0]?.trim() ?? "{}") as Record<string, unknown>;
+    expect(event.error).toBe("failed with Bearer <redacted>");
+    expect(event.usage).toEqual({ input: 1 });
+  });
 });
--- a/src/agents/anthropic-payload-log.ts
+++ b/src/agents/anthropic-payload-log.ts
@@ -53,16 +53,18 @@ function getWriter(filePath: string): PayloadLogWriter {

 function formatError(error: unknown): string | undefined {
  if (error instanceof Error) {
-    return error.message;
+    const redacted = sanitizeDiagnosticPayload(error.message);
+    return typeof redacted === "string" ? redacted : error.message;
  }
  if (typeof error === "string") {
-    return error;
+    const redacted = sanitizeDiagnosticPayload(error);
+    return typeof redacted === "string" ? redacted : error;
  }
  if (typeof error === "number" || typeof error === "boolean" || typeof error === "bigint") {
    return String(error);
  }
  if (error && typeof error === "object") {
-    return safeJsonStringify(error) ?? "unknown error";
+    return safeJsonStringify(sanitizeDiagnosticPayload(error)) ?? "unknown error";
  }
  return undefined;
 }
@@ -173,7 +175,7 @@ export function createAnthropicPayloadLogger(params: {
      ...base,
      ts: new Date().toISOString(),
      stage: "usage",
-      usage,
+      usage: sanitizeDiagnosticPayload(usage) as Record<string, unknown>,
      error: errorMessage,
    });
    log.info("anthropic usage", {
--- a/src/agents/apply-patch-update.ts
+++ b/src/agents/apply-patch-update.ts
@@ -53,10 +53,13 @@ function computeReplacements(

    if (chunk.oldLines.length === 0) {
      const insertionIndex =
-        originalLines.length > 0 && originalLines[originalLines.length - 1] === ""
-          ? originalLines.length - 1
-          : originalLines.length;
+        chunk.changeContext && !chunk.isEndOfFile
+          ? lineIndex
+          : originalLines.length > 0 && originalLines[originalLines.length - 1] === ""
+            ? originalLines.length - 1
+            : originalLines.length;
      replacements.push([insertionIndex, 0, chunk.newLines]);
+      lineIndex = insertionIndex;
      continue;
    }

--- a/src/agents/apply-patch.test.ts
+++ b/src/agents/apply-patch.test.ts
@@ -131,6 +131,57 @@ describe("applyPatch", () => {
    expect(result.summary.modified).toEqual(["dest.txt"]);
  });

+  it("updates in place when move target resolves to the source file", async () => {
+    const memory = createMemoryPatchSandbox({
+      "source.txt": "foo\nbar\n",
+    });
+    const patch = `*** Begin Patch
+*** Update File: source.txt
+*** Move to: ./source.txt
+@@
+ foo
+-bar
++baz
+*** End Patch`;
+
+    const result = await applyPatch(patch, memory.options);
+
+    expect(memory.files.get("/sandbox/source.txt")).toBe("foo\nbaz\n");
+    expect(result.summary.modified).toEqual(["source.txt"]);
+  });
+
+  it("applies context-only insertions at the requested context", async () => {
+    const memory = createMemoryPatchSandbox({
+      "source.txt": "alpha\nanchor\nomega\n",
+    });
+    const patch = `*** Begin Patch
+*** Update File: source.txt
+@@ anchor
++inserted
+*** End Patch`;
+
+    await applyPatch(patch, memory.options);
+
+    expect(memory.files.get("/sandbox/source.txt")).toBe("alpha\nanchor\ninserted\nomega\n");
+  });
+
+  it("keeps later insertion contexts in original file coordinates", async () => {
+    const memory = createMemoryPatchSandbox({
+      "source.txt": "a\nb\nc\n",
+    });
+    const patch = `*** Begin Patch
+*** Update File: source.txt
+@@ a
++after-a
+@@ b
++after-b
+*** End Patch`;
+
+    await applyPatch(patch, memory.options);
+
+    expect(memory.files.get("/sandbox/source.txt")).toBe("a\nafter-a\nb\nafter-b\nc\n");
+  });
+
  it("supports end-of-file inserts", async () => {
    const memory = createMemoryPatchSandbox({
      "end.txt": "line1\n",
--- a/src/agents/apply-patch.ts
+++ b/src/agents/apply-patch.ts
@@ -175,9 +175,21 @@ export async function applyPatch(
      const moveTarget = await resolvePatchPath(hunk.movePath, options);
      await assertPatchParentPath(hunk.movePath, options);
      await ensureDir(moveTarget.resolved, fileOps);
-      await fileOps.writeFile(moveTarget.resolved, applied);
-      await fileOps.remove(target.resolved);
-      recordSummary(summary, seen, "modified", moveTarget.display);
+      const moveResolvesToSource =
+        path.resolve(moveTarget.resolved) === path.resolve(target.resolved);
+      await fileOps.writeFile(
+        moveResolvesToSource ? target.resolved : moveTarget.resolved,
+        applied,
+      );
+      if (!moveResolvesToSource) {
+        await fileOps.remove(target.resolved);
+      }
+      recordSummary(
+        summary,
+        seen,
+        "modified",
+        moveResolvesToSource ? target.display : moveTarget.display,
+      );
    } else {
      await fileOps.writeFile(target.resolved, applied);
      recordSummary(summary, seen, "modified", target.display);
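
Note on the apply-patch hunks above: the new tests expect a context-only chunk (an `@@ <context>` header followed only by added lines) to insert right after the matched context line, with later contexts still located in original-file coordinates. The following is a minimal TypeScript sketch of that idea, assuming replacements are collected as `[start, deleteCount, newLines]` tuples and applied from the bottom up. `insertAfterContexts` and `applyReplacements` are hypothetical names used for illustration only; they are not the helpers in `apply-patch-update.ts`.

```typescript
// Hypothetical tuple shape: [start index in the ORIGINAL file, lines to delete, lines to insert].
type Replacement = [number, number, string[]];

// Apply replacements bottom-up so that earlier original indices stay valid.
function applyReplacements(lines: string[], replacements: Replacement[]): string[] {
  const result = [...lines];
  const ordered = [...replacements].sort((a, b) => b[0] - a[0]);
  for (const [start, deleteCount, newLines] of ordered) {
    result.splice(start, deleteCount, ...newLines);
  }
  return result;
}

// Insert each block directly after its context line, searching forward from the
// previous match. Every index refers to the original, unmodified line array.
function insertAfterContexts(
  lines: string[],
  inserts: Array<{ context: string; newLines: string[] }>,
): string[] {
  const replacements: Replacement[] = [];
  let lineIndex = 0;
  for (const { context, newLines } of inserts) {
    const found = lines.indexOf(context, lineIndex);
    if (found === -1) {
      throw new Error(`context not found: ${context}`);
    }
    lineIndex = found + 1; // insertion point is the line after the context
    replacements.push([lineIndex, 0, newLines]);
  }
  return applyReplacements(lines, replacements);
}

const patched = insertAfterContexts(
  ["a", "b", "c"],
  [
    { context: "a", newLines: ["after-a"] },
    { context: "b", newLines: ["after-b"] },
  ],
);
console.log(patched.join("\n"));
```

Because both insertion points are recorded against the original lines and applied in descending order, the second insertion is not shifted by the first. The sketch does not handle two insertions at the same index.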
--- a/src/agents/bundle-mcp.test-harness.ts
+++ b/src/agents/bundle-mcp.test-harness.ts
@@ -172,12 +172,17 @@ const transport = new StdioClientTransport({
 });
 const client = new Client({ name: "fake-claude", version: "1.0.0" });
 await client.connect(transport);
-const tools = await client.listTools();
-if (!tools.tools.some((tool) => tool.name === "bundle_probe")) {
-  throw new Error("bundle_probe tool not exposed");
-}
-const result = await client.callTool({ name: "bundle_probe", arguments: {} });
-await transport.close();
+const result = await (async () => {
+  try {
+    const tools = await client.listTools();
+    if (!tools.tools.some((tool) => tool.name === "bundle_probe")) {
+      throw new Error("bundle_probe tool not exposed");
+    }
+    return await client.callTool({ name: "bundle_probe", arguments: {} });
+  } finally {
+    await transport.close();
+  }
+})();

 const text = Array.isArray(result.content)
  ? result.content
--- a/src/agents/pi-embedded-runner/run/payloads.test.ts
+++ b/src/agents/pi-embedded-runner/run/payloads.test.ts
@@ -363,6 +363,23 @@ describe("buildEmbeddedRunPayloads tool-error warnings", () => {
    });
  });

+  it("surfaces declined Codex native command errors for aborted empty turns", () => {
+    const payloads = buildPayloads({
+      assistantTexts: [],
+      lastToolError: {
+        toolName: "bash",
+        error: "codex native tool blocked",
+        mutatingAction: true,
+      },
+      runAborted: true,
+    });
+
+    expectSingleToolErrorPayload(payloads, {
+      title: "Bash",
+      absentDetail: "codex native tool blocked",
+    });
+  });
+
  it("surfaces exec tool errors for cron sessions even when verbose mode is off", () => {
    const payloads = buildPayloads({
      lastToolError: {
--- a/src/commands/agent-via-gateway.test.ts
+++ b/src/commands/agent-via-gateway.test.ts
@@ -171,45 +171,6 @@ describe("agentCliCommand", () => {
    });
  });

-  it("rejects timeout values with junk suffixes", async () => {
-    await withTempStore(async () => {
-      await expect(
-        agentCliCommand({ message: "hi", to: "+1555", timeout: "10wat" }, runtime),
-      ).rejects.toThrow(
-        "Invalid --timeout. Use seconds as a non-negative integer, for example --timeout 600. Use --timeout 0 to disable the timeout.",
-      );
-
-      expect(callGateway).not.toHaveBeenCalled();
-      expect(agentCommand).not.toHaveBeenCalled();
-    });
-  });
-
-  it("rejects fractional timeout values", async () => {
-    await withTempStore(async () => {
-      await expect(
-        agentCliCommand({ message: "hi", to: "+1555", timeout: "1.5" }, runtime),
-      ).rejects.toThrow(
-        "Invalid --timeout. Use seconds as a non-negative integer, for example --timeout 600. Use --timeout 0 to disable the timeout.",
-      );
-
-      expect(callGateway).not.toHaveBeenCalled();
-      expect(agentCommand).not.toHaveBeenCalled();
-    });
-  });
-
-  it("rejects blank timeout values instead of disabling the timeout", async () => {
-    await withTempStore(async () => {
-      await expect(
-        agentCliCommand({ message: "hi", to: "+1555", timeout: " " }, runtime),
-      ).rejects.toThrow(
-        "Invalid --timeout. Use seconds as a non-negative integer, for example --timeout 600. Use --timeout 0 to disable the timeout.",
-      );
-
-      expect(callGateway).not.toHaveBeenCalled();
-      expect(agentCommand).not.toHaveBeenCalled();
-    });
-  });
-
  it("uses gateway by default", async () => {
    await withTempStore(async () => {
      mockGatewaySuccessReply();
--- a/src/commands/agent-via-gateway.ts
+++ b/src/commands/agent-via-gateway.ts
@@ -71,20 +71,16 @@ function protectJsonStdout(opts: Pick<AgentCliOpts, "json">): void {
 }

 function parseTimeoutSeconds(opts: { cfg: OpenClawConfig; timeout?: string }) {
-  const raw = opts.timeout !== undefined ? opts.timeout.trim() : undefined;
-  if (raw !== undefined && !/^\d+$/.test(raw)) {
+  const raw =
+    opts.timeout !== undefined
+      ? Number.parseInt(opts.timeout, 10)
+      : (opts.cfg.agents?.defaults?.timeoutSeconds ?? 600);
+  if (Number.isNaN(raw) || raw < 0) {
    throw new Error(
      `Invalid --timeout. Use seconds as a non-negative integer, for example --timeout 600. Use --timeout 0 to disable the timeout.`,
    );
  }
-  const parsed =
-    raw !== undefined ? Number(raw) : (opts.cfg.agents?.defaults?.timeoutSeconds ?? 600);
-  if (!Number.isInteger(parsed) || parsed < 0) {
-    throw new Error(
-      `Invalid --timeout. Use seconds as a non-negative integer, for example --timeout 600. Use --timeout 0 to disable the timeout.`,
-    );
-  }
-  return parsed;
+  return raw;
 }

 function formatPayloadForLog(payload: {
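
Note on the `parseTimeoutSeconds` hunk above: the `+` side validates with `Number.parseInt`, while the `-` side required the trimmed value to match `/^\d+$/`. The standalone sketch below compares the two on the inputs from the removed tests (`"10wat"`, `"1.5"`, `" "`). `parseLenient` and `parseStrict` are hypothetical names that mirror the two checks; they are not functions in the repository, and `null` stands in for the thrown error.

```typescript
// Mirrors the "+" side: parseInt stops at the first non-digit character.
function parseLenient(raw: string): number | null {
  const value = Number.parseInt(raw, 10);
  return Number.isNaN(value) || value < 0 ? null : value;
}

// Mirrors the "-" side: the whole trimmed string must be digits.
function parseStrict(raw: string): number | null {
  const trimmed = raw.trim();
  return /^\d+$/.test(trimmed) ? Number(trimmed) : null;
}

for (const raw of ["600", "0", "10wat", "1.5", " "]) {
  console.log(JSON.stringify(raw), parseLenient(raw), parseStrict(raw));
}
```

With `Number.parseInt`, `"10wat"` is accepted as 10 and `"1.5"` as 1. A blank value is still rejected because it parses to `NaN`.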
--- a/src/commands/channels.logs.test.ts
+++ b/src/commands/channels.logs.test.ts
@@ -87,54 +87,6 @@ describe("channelsLogsCommand", () => {
    expect(payload.lines.map((line) => line.message)).toEqual(["external sent"]);
  });

-  it("rejects unknown channel filters instead of falling back to all logs", async () => {
-    await fs.writeFile(
-      logPath,
-      [
-        logLine({ module: "gateway/channels/external-chat/send", message: "external sent" }),
-        logLine({ module: "gateway/channels/slack/send", message: "slack sent" }),
-      ].join("\n"),
-    );
-
-    await channelsLogsCommand({ channel: "typo", json: true }, runtime);
-
-    expect(runtime.error).toHaveBeenCalledWith(
-      'Unknown channel "typo" for logs. Run `openclaw channels list --all` to see configured and installable channels.',
-    );
-    expect(runtime.exit).toHaveBeenCalledWith(1);
-    expect(runtime.log).not.toHaveBeenCalled();
-  });
-
-  it("rejects invalid line limits instead of silently using the default", async () => {
-    await fs.writeFile(
-      logPath,
-      logLine({ module: "gateway/channels/slack/send", message: "slack sent" }),
-    );
-
-    await channelsLogsCommand({ channel: "slack", lines: "wat", json: true }, runtime);
-
-    expect(runtime.error).toHaveBeenCalledWith(
-      "Invalid --lines. Use a positive integer, for example --lines 200.",
-    );
-    expect(runtime.exit).toHaveBeenCalledWith(1);
-    expect(runtime.log).not.toHaveBeenCalled();
-  });
-
-  it("rejects fractional line limits instead of truncating", async () => {
-    await fs.writeFile(
-      logPath,
-      logLine({ module: "gateway/channels/slack/send", message: "slack sent" }),
-    );
-
-    await channelsLogsCommand({ channel: "slack", lines: "2.5", json: true }, runtime);
-
-    expect(runtime.error).toHaveBeenCalledWith(
-      "Invalid --lines. Use a positive integer, for example --lines 200.",
-    );
-    expect(runtime.exit).toHaveBeenCalledWith(1);
-    expect(runtime.log).not.toHaveBeenCalled();
-  });
-
  it("falls back to the latest rolling log when the configured rolling file is missing", async () => {
    const configuredFile = path.join(tempDir, "openclaw-2026-04-26.log");
    const fallbackFile = path.join(tempDir, "openclaw-2026-04-25.log");
--- a/src/commands/channels/logs.ts
+++ b/src/commands/channels/logs.ts
@@ -1,6 +1,5 @@
 import fs from "node:fs/promises";
 import { normalizeChannelId as normalizeBundledChannelId } from "../../channels/registry.js";
-import { formatUnknownChannelMessage } from "../../cli/error-format.js";
 import { getResolvedLoggerSettings } from "../../logging.js";
 import { resolveLogFile } from "../../logging/log-tail.js";
 import { parseLogLine } from "../../logging/parse-log-line.js";
@@ -38,19 +37,7 @@ function parseChannelFilter(raw?: string) {
  if (bundled) {
    return bundled;
  }
-  return listManifestChannelIds().has(trimmed) ? trimmed : null;
-}
-
-function parseLineLimit(raw: string | number | undefined): number | null {
-  if (raw === undefined) {
-    return DEFAULT_LIMIT;
-  }
-  const value = typeof raw === "string" ? raw.trim() : String(raw);
-  if (!/^\d+$/.test(value)) {
-    return null;
-  }
-  const parsed = Number(value);
-  return Number.isSafeInteger(parsed) && parsed > 0 ? parsed : null;
+  return listManifestChannelIds().has(trimmed) ? trimmed : "all";
 }

 function matchesChannel(line: NonNullable<LogLine>, channel: string) {
@@ -104,22 +91,11 @@ export async function channelsLogsCommand(
  runtime: RuntimeEnv = defaultRuntime,
 ) {
  const channel = parseChannelFilter(opts.channel);
-  if (!channel) {
-    runtime.error(
-      formatUnknownChannelMessage({
-        channel: opts.channel ?? "",
-        purpose: "logs",
-      }),
-    );
-    runtime.exit(1);
-    return;
-  }
-  const limit = parseLineLimit(opts.lines);
-  if (limit === null) {
-    runtime.error("Invalid --lines. Use a positive integer, for example --lines 200.");
-    runtime.exit(1);
-    return;
-  }
+  const limitRaw = typeof opts.lines === "string" ? Number(opts.lines) : opts.lines;
+  const limit =
+    typeof limitRaw === "number" && Number.isFinite(limitRaw) && limitRaw > 0
+      ? Math.floor(limitRaw)
+      : DEFAULT_LIMIT;

  const file = await resolveLogFile(getResolvedLoggerSettings().file);
  const rawLines = await readTailLines(file, limit * 4);
--- a/src/commands/configure.commands.test.ts
+++ b/src/commands/configure.commands.test.ts
@@ -1,75 +0,0 @@
-import { beforeEach, describe, expect, it, vi } from "vitest";
-import { formatCliCommand } from "../cli/command-format.js";
-import type { RuntimeEnv } from "../runtime.js";
-import { CONFIGURE_WIZARD_SECTIONS } from "./configure.shared.js";
-
-const mocks = vi.hoisted(() => ({
-  runConfigureWizard: vi.fn(async () => {}),
-}));
-
-vi.mock("./configure.wizard.js", () => ({
-  runConfigureWizard: mocks.runConfigureWizard,
-}));
-
-import { configureCommandFromSectionsArg } from "./configure.commands.js";
-
-function makeRuntime(): RuntimeEnv {
-  return {
-    log: vi.fn(),
-    error: vi.fn(),
-    exit: vi.fn() as unknown as RuntimeEnv["exit"],
-  };
-}
-
-describe("configureCommandFromSectionsArg", () => {
-  beforeEach(() => {
-    vi.clearAllMocks();
-  });
-
-  it("runs the full configure wizard when no sections are provided", async () => {
-    const runtime = makeRuntime();
-
-    await configureCommandFromSectionsArg(undefined, runtime);
-
-    expect(mocks.runConfigureWizard).toHaveBeenCalledWith({ command: "configure" }, runtime);
-    expect(runtime.error).not.toHaveBeenCalled();
-    expect(runtime.exit).not.toHaveBeenCalled();
-  });
-
-  it("runs only the requested valid sections", async () => {
-    const runtime = makeRuntime();
-
-    await configureCommandFromSectionsArg(["gateway", "model"], runtime);
-
-    expect(mocks.runConfigureWizard).toHaveBeenCalledWith(
-      { command: "configure", sections: ["gateway", "model"] },
-      runtime,
-    );
-    expect(runtime.error).not.toHaveBeenCalled();
-    expect(runtime.exit).not.toHaveBeenCalled();
-  });
-
-  it("rejects invalid-only section input instead of falling back to the full wizard", async () => {
-    const runtime = makeRuntime();
-
-    await configureCommandFromSectionsArg(["typo"], runtime);
-
-    expect(runtime.error).toHaveBeenCalledWith(
-      `Invalid --section: typo. Expected one of: ${CONFIGURE_WIZARD_SECTIONS.join(", ")}. Run ${formatCliCommand("openclaw configure")} without --section to use the full wizard.`,
-    );
-    expect(runtime.exit).toHaveBeenCalledWith(1);
-    expect(mocks.runConfigureWizard).not.toHaveBeenCalled();
-  });
-
-  it("rejects mixed valid and invalid section input without running a partial wizard", async () => {
-    const runtime = makeRuntime();
-
-    await configureCommandFromSectionsArg(["gateway", "bogus"], runtime);
-
-    expect(runtime.error).toHaveBeenCalledWith(
-      `Invalid --section: bogus. Expected one of: ${CONFIGURE_WIZARD_SECTIONS.join(", ")}. Run ${formatCliCommand("openclaw configure")} without --section to use the full wizard.`,
-    );
-    expect(runtime.exit).toHaveBeenCalledWith(1);
-    expect(mocks.runConfigureWizard).not.toHaveBeenCalled();
-  });
-});
--- a/src/commands/configure.commands.ts
+++ b/src/commands/configure.commands.ts
@@ -21,6 +21,11 @@ export async function configureCommandFromSectionsArg(
  runtime: RuntimeEnv = defaultRuntime,
 ): Promise<void> {
  const { sections, invalid } = parseConfigureWizardSections(rawSections);
+  if (sections.length === 0) {
+    await configureCommand(runtime);
+    return;
+  }
+
  if (invalid.length > 0) {
    runtime.error(
      `Invalid --section: ${invalid.join(", ")}. Expected one of: ${CONFIGURE_WIZARD_SECTIONS.join(", ")}. Run ${formatCliCommand("openclaw configure")} without --section to use the full wizard.`,
@@ -29,10 +34,5 @@ export async function configureCommandFromSectionsArg(
    return;
  }

-  if (sections.length === 0) {
-    await configureCommand(runtime);
-    return;
-  }
-
  await configureCommandWithSections(sections as never, runtime);
 }
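
Note on the `configureCommandFromSectionsArg` hunks above: moving the `sections.length === 0` check ahead of the `invalid.length > 0` check changes what happens when every requested section is invalid. The sketch below models only the ordering of the two checks. `planAfterChange`, `planBeforeChange`, and the `Plan` type are hypothetical, and the inputs stand for the `{ sections, invalid }` result of `parseConfigureWizardSections`, assuming it places unrecognized names in `invalid`.

```typescript
type Plan = "full-wizard" | "error" | "sections";

// Ordering on the "+" side: the empty check runs first.
function planAfterChange(sections: string[], invalid: string[]): Plan {
  if (sections.length === 0) return "full-wizard";
  if (invalid.length > 0) return "error";
  return "sections";
}

// Ordering on the "-" side: the invalid check runs first.
function planBeforeChange(sections: string[], invalid: string[]): Plan {
  if (invalid.length > 0) return "error";
  if (sections.length === 0) return "full-wizard";
  return "sections";
}

// Invalid-only input, e.g. `--section typo`.
console.log(planAfterChange([], ["typo"]), planBeforeChange([], ["typo"]));
// Mixed input, e.g. `--section gateway --section bogus`.
console.log(planAfterChange(["gateway"], ["bogus"]), planBeforeChange(["gateway"], ["bogus"]));
```

Under the new ordering, invalid-only input falls through to the full wizard, which is the case the deleted test "rejects invalid-only section input instead of falling back to the full wizard" covered. Mixed valid and invalid input is rejected under both orderings.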
--- a/src/commands/doctor-browser.facade.test.ts
+++ b/src/commands/doctor-browser.facade.test.ts
@@ -1,6 +1,10 @@
 import { beforeEach, describe, expect, it, vi } from "vitest";
 import type { OpenClawConfig } from "../config/config.js";
-import { noteChromeMcpBrowserReadiness } from "./doctor-browser.js";
+import {
+  detectLegacyClawdBrowserProfileResidue,
+  maybeArchiveLegacyClawdBrowserProfileResidue,
+  noteChromeMcpBrowserReadiness,
+} from "./doctor-browser.js";

 const loadBundledPluginPublicSurfaceModuleSync = vi.hoisted(() => vi.fn());

@@ -44,6 +48,112 @@ describe("doctor browser facade", () => {
    expect(noteFn).not.toHaveBeenCalled();
  });

+  it("delegates legacy clawd browser profile detection to the browser facade surface", async () => {
+    const residue = {
+      legacyProfileDir: "/tmp/openclaw-home/browser/clawd",
+      legacyUserDataDir: "/tmp/openclaw-home/browser/clawd/user-data",
+      canonicalUserDataDir: "/tmp/openclaw-home/browser/openclaw/user-data",
+    };
+    const detect = vi.fn().mockReturnValue(residue);
+    loadBundledPluginPublicSurfaceModuleSync.mockReturnValue({
+      noteChromeMcpBrowserReadiness: vi.fn(),
+      detectLegacyClawdBrowserProfileResidue: detect,
+    });
+    const cfg: OpenClawConfig = {
+      browser: {
+        profiles: {
+          openclaw: { color: "#FF4500" },
+        },
+      },
+    };
+    const deps = {
+      configDir: "/tmp/openclaw-home",
+      pathExists: (targetPath: string) => targetPath === "/tmp/openclaw-home/browser/clawd",
+    };
+
+    await expect(detectLegacyClawdBrowserProfileResidue(cfg, deps)).resolves.toEqual(residue);
+    expect(loadBundledPluginPublicSurfaceModuleSync).toHaveBeenCalledWith({
+      dirName: "browser",
+      artifactBasename: "browser-doctor.js",
+    });
+    expect(detect).toHaveBeenCalledWith(cfg, deps);
+  });
+
+  it("delegates legacy clawd browser profile cleanup to the browser facade surface", async () => {
+    const cleanup = vi.fn().mockResolvedValue({ changes: ["archived"], warnings: [] });
+    loadBundledPluginPublicSurfaceModuleSync.mockReturnValue({
+      noteChromeMcpBrowserReadiness: vi.fn(),
+      maybeArchiveLegacyClawdBrowserProfileResidue: cleanup,
+    });
+
+    const cfg: OpenClawConfig = {
+      browser: {
+        profiles: {
+          openclaw: { color: "#FF4500" },
+        },
+      },
+    };
+    const deps = {
+      configDir: "/tmp/openclaw-home",
+      pathExists: (targetPath: string) => targetPath === "/tmp/openclaw-home/browser/clawd",
+    };
+
+    await expect(maybeArchiveLegacyClawdBrowserProfileResidue(cfg, deps)).resolves.toEqual({
+      changes: ["archived"],
+      warnings: [],
+    });
+    expect(loadBundledPluginPublicSurfaceModuleSync).toHaveBeenCalledWith({
+      dirName: "browser",
+      artifactBasename: "browser-doctor.js",
+    });
+    expect(cleanup).toHaveBeenCalledWith(cfg, deps);
+  });
+
+  it("warns when browser profile cleanup surface is unavailable", async () => {
+    loadBundledPluginPublicSurfaceModuleSync.mockImplementation(() => {
+      throw new Error("missing browser doctor facade");
+    });
+
+    await expect(
+      maybeArchiveLegacyClawdBrowserProfileResidue(
+        {},
+        {
+          configDir: "/tmp/openclaw-home",
+          pathExists: (targetPath: string) => targetPath === "/tmp/openclaw-home/browser/clawd",
+        },
+      ),
+    ).resolves.toEqual({
+      changes: [],
+      warnings: ["Browser profile cleanup is unavailable: missing browser doctor facade"],
+    });
+  });
+
+  it("skips loading the browser residue detection surface when legacy residue is absent", async () => {
+    await expect(
+      detectLegacyClawdBrowserProfileResidue(
+        {},
+        {
+          configDir: "/tmp/openclaw-home",
+          pathExists: () => false,
+        },
+      ),
+    ).resolves.toBeNull();
+    expect(loadBundledPluginPublicSurfaceModuleSync).not.toHaveBeenCalled();
+  });
+
+  it("skips loading the browser cleanup surface when legacy residue is absent", async () => {
+    await expect(
+      maybeArchiveLegacyClawdBrowserProfileResidue(
+        {},
+        {
+          configDir: "/tmp/openclaw-home",
+          pathExists: () => false,
+        },
+      ),
+    ).resolves.toEqual({ changes: [], warnings: [] });
+    expect(loadBundledPluginPublicSurfaceModuleSync).not.toHaveBeenCalled();
+  });
+
  it("warns and no-ops when the browser doctor surface is unavailable", async () => {
    loadBundledPluginPublicSurfaceModuleSync.mockImplementation(() => {
      throw new Error("missing browser doctor facade");
--- a/src/commands/doctor-browser.ts
+++ b/src/commands/doctor-browser.ts
@@ -1,6 +1,9 @@
+import fs from "node:fs";
+import path from "node:path";
 import type { OpenClawConfig } from "../config/types.openclaw.js";
 import { loadBundledPluginPublicSurfaceModuleSync } from "../plugin-sdk/facade-loader.js";
 import { note } from "../terminal/note.js";
+import { resolveConfigDir } from "../utils.js";

 type BrowserDoctorDeps = {
  platform?: NodeJS.Platform;
@@ -13,10 +16,33 @@ type BrowserDoctorDeps = {
  ) => { path: string } | null;
  resolveChromeExecutable?: (platform: NodeJS.Platform) => { path: string } | null;
  readVersion?: (executablePath: string) => string | null;
+  configDir?: string;
+  pathExists?: (targetPath: string) => boolean;
+};
+
+export type BrowserDoctorRepairDeps = {
+  env?: NodeJS.ProcessEnv;
+  configDir?: string;
+  pathExists?: (targetPath: string) => boolean;
+  movePathToTrash?: (targetPath: string) => Promise<string>;
+};
+
+export type LegacyClawdBrowserProfileResidue = {
+  legacyProfileDir: string;
+  legacyUserDataDir: string;
+  canonicalUserDataDir: string;
 };

 type BrowserDoctorSurface = {
  noteChromeMcpBrowserReadiness: (cfg: OpenClawConfig, deps?: BrowserDoctorDeps) => Promise<void>;
+  detectLegacyClawdBrowserProfileResidue?: (
+    cfg: OpenClawConfig,
+    deps?: BrowserDoctorRepairDeps,
+  ) => LegacyClawdBrowserProfileResidue | null;
+  maybeArchiveLegacyClawdBrowserProfileResidue?: (
+    cfg: OpenClawConfig,
+    deps?: BrowserDoctorRepairDeps,
+  ) => Promise<{ changes: string[]; warnings: string[] }>;
 };

 function loadBrowserDoctorSurface(): BrowserDoctorSurface {
@@ -26,6 +52,18 @@ function loadBrowserDoctorSurface(): BrowserDoctorSurface {
  });
 }

+function mayHaveLegacyClawdBrowserProfileResidue(deps?: BrowserDoctorRepairDeps): boolean {
+  const configDir = deps?.configDir ?? resolveConfigDir(deps?.env ?? process.env);
+  const legacyProfileDir = path.join(configDir, "browser", "clawd");
+  const legacyUserDataDir = path.join(legacyProfileDir, "user-data");
+  const pathExists = deps?.pathExists ?? fs.existsSync;
+  try {
+    return pathExists(legacyProfileDir) || pathExists(legacyUserDataDir);
+  } catch {
+    return true;
+  }
+}
+
 export async function noteChromeMcpBrowserReadiness(cfg: OpenClawConfig, deps?: BrowserDoctorDeps) {
  try {
    await loadBrowserDoctorSurface().noteChromeMcpBrowserReadiness(cfg, deps);
@@ -35,3 +73,39 @@ export async function noteChromeMcpBrowserReadiness(cfg: OpenClawConfig, deps?:
    noteFn(`- Browser health check is unavailable: ${message}`, "Browser");
  }
 }
+
+export async function detectLegacyClawdBrowserProfileResidue(
+  cfg: OpenClawConfig,
+  deps?: BrowserDoctorRepairDeps,
+): Promise<LegacyClawdBrowserProfileResidue | null> {
+  if (!mayHaveLegacyClawdBrowserProfileResidue(deps)) {
+    return null;
+  }
+  const detect = loadBrowserDoctorSurface().detectLegacyClawdBrowserProfileResidue;
+  if (!detect) {
+    return null;
+  }
+  return detect(cfg, deps);
+}
+
+export async function maybeArchiveLegacyClawdBrowserProfileResidue(
+  cfg: OpenClawConfig,
+  deps?: BrowserDoctorRepairDeps,
+): Promise<{ changes: string[]; warnings: string[] }> {
+  if (!mayHaveLegacyClawdBrowserProfileResidue(deps)) {
+    return { changes: [], warnings: [] };
+  }
+  try {
+    const repair = loadBrowserDoctorSurface().maybeArchiveLegacyClawdBrowserProfileResidue;
+    if (!repair) {
+      return { changes: [], warnings: [] };
+    }
+    return await repair(cfg, deps);
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error);
+    return {
+      changes: [],
+      warnings: [`Browser profile cleanup is unavailable: ${message}`],
+    };
+  }
+}
--- a/src/commands/doctor-lint.test.ts
+++ b/src/commands/doctor-lint.test.ts
@@ -65,7 +65,7 @@ describe("runDoctorLintCli", () => {

      expect(exitCode).toBe(0);
      expect(String(stdout.mock.calls[0]?.[0])).toBe(
-        "doctor --lint: ran 5 check(s), 0 finding(s)\n",
+        "doctor --lint: ran 6 check(s), 0 finding(s)\n",
      );
      expect(String(stdout.mock.calls[1]?.[0])).toBe("  no findings\n");
    } finally {
--- a/src/commands/doctor.fast-path-mocks.ts
+++ b/src/commands/doctor.fast-path-mocks.ts
@@ -9,6 +9,11 @@ vi.mock("./doctor-bootstrap-size.js", () => ({
 }));

 vi.mock("./doctor-browser.js", () => ({
+  detectLegacyClawdBrowserProfileResidue: vi.fn().mockResolvedValue(null),
+  maybeArchiveLegacyClawdBrowserProfileResidue: vi.fn().mockResolvedValue({
+    changes: [],
+    warnings: [],
+  }),
  noteChromeMcpBrowserReadiness: vi.fn().mockResolvedValue(undefined),
 }));

--- a/src/commands/models/scan.test.ts
+++ b/src/commands/models/scan.test.ts
@@ -148,30 +148,4 @@ describe("models scan command", () => {

    expect(mocks.scanOpenRouterModels).not.toHaveBeenCalled();
  });
-
-  it("rejects fractional count options before scanning", async () => {
-    const runtime = createRuntime();
-
-    await expect(modelsScanCommand({ maxCandidates: "1.5" }, runtime)).rejects.toThrow(
-      "--max-candidates must be a positive integer",
-    );
-    await expect(modelsScanCommand({ concurrency: "2.5" }, runtime)).rejects.toThrow(
-      "--concurrency must be a positive integer",
-    );
-
-    expect(mocks.scanOpenRouterModels).not.toHaveBeenCalled();
-  });
-
-  it("rejects blank count options before scanning", async () => {
-    const runtime = createRuntime();
-
-    await expect(modelsScanCommand({ maxCandidates: "" }, runtime)).rejects.toThrow(
-      "--max-candidates must be a positive integer",
-    );
-    await expect(modelsScanCommand({ concurrency: "" }, runtime)).rejects.toThrow(
-      "--concurrency must be a positive integer",
-    );
-
-    expect(mocks.scanOpenRouterModels).not.toHaveBeenCalled();
-  });
 });
--- a/src/commands/models/scan.ts
+++ b/src/commands/models/scan.ts
@@ -153,21 +153,6 @@ function printScanTable(results: ModelScanResult[], runtime: RuntimeEnv) {
  }
 }

-function parsePositiveIntegerOption(
-  raw: string | undefined,
-  fallback: number | undefined,
-): number | undefined {
-  if (raw === undefined) {
-    return fallback;
-  }
-  const trimmed = raw.trim();
-  if (!/^\d+$/.test(trimmed)) {
-    return undefined;
-  }
-  const value = Number(trimmed);
-  return Number.isInteger(value) && value > 0 ? value : undefined;
-}
-
 export async function modelsScanCommand(
  opts: {
    minParams?: string;
@@ -193,17 +178,17 @@ export async function modelsScanCommand(
  if (maxAgeDays !== undefined && (!Number.isFinite(maxAgeDays) || maxAgeDays < 0)) {
    throw new Error("--max-age-days must be >= 0");
  }
-  const maxCandidates = parsePositiveIntegerOption(opts.maxCandidates, 6);
-  if (maxCandidates === undefined) {
-    throw new Error("--max-candidates must be a positive integer");
+  const maxCandidates = opts.maxCandidates ? Number(opts.maxCandidates) : 6;
+  if (!Number.isFinite(maxCandidates) || maxCandidates <= 0) {
+    throw new Error("--max-candidates must be > 0");
  }
  const timeout = opts.timeout ? Number(opts.timeout) : undefined;
  if (timeout !== undefined && (!Number.isFinite(timeout) || timeout <= 0)) {
    throw new Error("--timeout must be > 0");
  }
-  const concurrency = parsePositiveIntegerOption(opts.concurrency, undefined);
-  if (opts.concurrency !== undefined && concurrency === undefined) {
-    throw new Error("--concurrency must be a positive integer");
+  const concurrency = opts.concurrency ? Number(opts.concurrency) : undefined;
+  if (concurrency !== undefined && (!Number.isFinite(concurrency) || concurrency <= 0)) {
+    throw new Error("--concurrency must be > 0");
  }

  const requestedProbe = opts.probe ?? true;
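Note on the `scan.ts` hunk above: the sketch below compares the removed strict integer parsing with the inline `Number(...)` check that replaces it. The helper names `parseStrict` and `parseLenient` are hypothetical and exist only for this illustration. `parseLenient` returns `undefined` where the command would throw. It is a minimal sketch of the behavioral difference, not the project's implementation.

```typescript
// Hypothetical helper mirroring the removed parsePositiveIntegerOption logic.
function parseStrict(
  raw: string | undefined,
  fallback: number | undefined,
): number | undefined {
  if (raw === undefined) {
    return fallback;
  }
  const trimmed = raw.trim();
  // Only plain decimal digits pass, so "1.5" and "" are rejected.
  if (!/^\d+$/.test(trimmed)) {
    return undefined;
  }
  const value = Number(trimmed);
  return Number.isInteger(value) && value > 0 ? value : undefined;
}

// Hypothetical helper mirroring the replacement inline check.
// undefined stands in for the thrown "--max-candidates must be > 0" error.
function parseLenient(raw: string | undefined, fallback: number): number | undefined {
  // An empty string is falsy, so it silently takes the fallback.
  const value = raw ? Number(raw) : fallback;
  return Number.isFinite(value) && value > 0 ? value : undefined;
}

const samples: Array<string | undefined> = [undefined, "4", "1.5", "", "abc", "0"];
for (const raw of samples) {
  console.log(JSON.stringify(raw), parseStrict(raw, 6), parseLenient(raw, 6));
}
```

Under these assumptions, fractional input such as `"1.5"` and blank input such as `""` are the two cases where the results diverge, which matches the two tests deleted from `scan.test.ts`.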
Author	SHA1	Message	Date
Patrick Erichsen	c4c364cd27	fix: mark approval gateway calls as runtime clients	2026-05-17 21:24:42 -07:00
Peter Steinberger	5980c0d807	fix: wrap Mac menu gateway errors	2026-05-18 05:21:19 +01:00
Ayaan Zaidi	1c778f7afb	fix(telegram): repair desktop proof login	2026-05-18 09:49:21 +05:30
Peter Steinberger	84b34519a8	fix: preflight remote skill bin probes	2026-05-18 05:19:02 +01:00
Peter Steinberger	71ed6526b1	ci: reduce aggregate runner jobs	2026-05-18 04:53:40 +01:00
Peter Steinberger	8483d03375	fix(gateway): preserve spawned sessions in configured lists	2026-05-18 04:38:14 +01:00
Peter Steinberger	696b4863c3	chore: quiet autoreview default fallback	2026-05-18 04:37:19 +01:00
Vincent Koc	a642ca9a89	ci(qa-lab): schedule live token efficiency artifacts	2026-05-18 11:33:13 +08:00
Vincent Koc	1300b22630	fix(qa-lab): classify runtime token efficiency	2026-05-18 11:09:08 +08:00
Peter Steinberger	29653e4106	fix: harden Mac gateway transport selection	2026-05-18 04:06:17 +01:00
Peter Steinberger	1ba3368fa6	fix: clean up Mac settings sidebar controls	2026-05-18 04:06:17 +01:00
Vincent Koc	4dec9679e6	fix(qa-lab): gate missing runtime tool coverage	2026-05-18 11:00:20 +08:00
Ayaan Zaidi	1ab84b4327	docs(changelog): note telegram 421 retry (#48908) (thanks @MarsDoge)	2026-05-18 08:28:27 +05:30
Dongyan Qian	63b728de43	fix(telegram): retry 421 misdirected request responses. Treat Telegram HTTP 421 / Misdirected Request responses as retryable transport failures in both the default channel API retry policy and the strict outbound send retry path. Wire the 421 handling into isSafeToRetrySendError so non-idempotent Telegram send operations can retry this edge-node rejection without enabling broad ambiguous network retries, and add regression coverage for the default retry path plus strict send predicate handling.	2026-05-18 08:28:27 +05:30
Vincent Koc	73ca3cf3c3	test: tolerate optional ACP cron live timeout	2026-05-18 10:55:13 +08:00
Peter Steinberger	11d7499db1	feat: extend autoreview fallback reviewers	2026-05-18 03:49:23 +01:00
Galin Iliev	ad55d486ce	fix(github-copilot): sanitize unsafe reasoning replay ids (#83221). Fixes #83220.	2026-05-17 19:48:27 -07:00
Gio Della-Libera	1b5bc33161	fix(doctor): archive legacy clawd browser profile residue (#83230) * fix(doctor): archive legacy clawd browser profile residue * Avoid browser cleanup load without residue: Doctor --fix now skips loading the browser doctor facade unless the legacy browser/clawd profile path exists, preventing broad config repair tests from paying the plugin load cost when there is nothing to archive. * Use structured health check for browser residue: Register the legacy clawd browser profile residue cleanup through the modern doctor health-check contract so doctor --lint can report it and doctor --fix repairs it through structured effects.	2026-05-17 19:45:03 -07:00
Gio Della-Libera	bcbe8b6299	fix(codex): surface declined native tool replies (#83108)	2026-05-17 19:43:19 -07:00
Galin Iliev	bc4f27c89a	ci: skip changelog-only workflow runs (#83215). Summary. Problem: root CHANGELOG.md updates currently cause broad pull request and push workflow activity, including CI and workflow sanity fanout, even though changelog-only edits do not touch product, runtime, docs site, or workflow logic. Why it matters: the PR workflow (review, prepare, and land) can add or adjust CHANGELOG.md entries while processing otherwise-ready PRs. Those changelog-only updates retrigger gates, delay landing, and create avoidable contention when several PRs are being landed close together. What changed: CI now ignores pull requests whose only changed path is CHANGELOG.md; Workflow Sanity ignores changelog-only pull requests and main-branch pushes; Docs keeps its markdown/docs trigger but excludes root CHANGELOG.md from the push path set. What did NOT change (scope boundary): metadata-only automation such as labelers, auto-response, real behavior proof, or external GitHub apps can still run on PR events because those workflows are event-driven rather than file-scope CI. Other markdown files, docs files, and workflow files still trigger their existing checks.	2026-05-17 19:29:45 -07:00
Ayaan Zaidi	6baa2b38b2	ci(mantis): make telegram proof skips public-safe	2026-05-18 07:54:11 +05:30
Peter Steinberger	48f7db23f0	fix: harden clawpatch-reported edge cases	2026-05-18 03:18:55 +01:00
Tak Hoffman	816fbe0cf0	chore(labels): cool label palette (#83374) * chore(labels): cool label palette * chore(labels): soften taxonomy colors * chore(labels): finalize label palette * chore(labels): harden final palette	2026-05-17 21:12:10 -05:00
Peter Steinberger	69cea57f69	fix(telegram): fail closed on missing topic threads (#83381) * fix(telegram): fail closed on missing topic threads * docs(changelog): reference telegram topic cleanup	2026-05-18 03:07:12 +01:00
Vincent Koc	58e1351863	fix(qa-lab): hard gate runtime tool coverage	2026-05-18 10:05:04 +08:00