refactor(codex): simplify native context ownership

refactor(agents): isolate native hook provider policy
test(codex): type detached delivery fixture
2026-06-18 03:52:42 +08:00 · 2026-06-15 16:16:21 +02:00 · 2026-06-14 10:40:31 -07:00 · 2026-06-14 10:09:08 -07:00 · 2026-06-14 09:56:36 -07:00 · 2026-06-14 09:53:14 -07:00
3173 changed files with 67501 additions and 222524 deletions
--- a/.agents/skills/autoreview/SKILL.md
+++ b/.agents/skills/autoreview/SKILL.md
@@ -24,7 +24,7 @@ Use when:
 - Prefer small fixes at the right ownership boundary; no refactor unless it clearly improves the bug class.
 - When an accepted finding shows a bug class or repeated pattern, inspect the current PR scope for sibling instances before fixing.
 - Fix the scoped bug class at once when practical; stop at touched surfaces, owner boundaries, and clear follow-up territory.
-- Keep going until structured review returns no accepted/actionable findings only while the work remains inside the original task scope.
+- Keep going until structured review returns no accepted/actionable findings.
 - If a review-triggered fix changes code, rerun focused tests and rerun the structured review helper.
 - For security-audit suppression changes, verify accepted findings remain auditable: suppressed findings stay in structured output, active output keeps an unsuppressible suppression notice, and aggregate findings cannot hide unrelated active risk.
 - Never switch or override the requested review engine/model. If the review hits model capacity, retry the same command a few times with the same engine/model.
@@ -43,42 +43,6 @@ Use when:
 - If Gitcrawl reports a portable manifest mismatch, source/runtime DB health error, or stale portable-store checkout, run `gitcrawl doctor --json` and inspect `source_db_health`, `runtime_db_health`, and `portable_store_status` before falling back to live GitHub.
 - Do not push just to review. Push only when the user requested push/ship/PR update.

-## Scope Governor
-
-Autoreview is a closeout gate, not permission to rewrite the task.
-
-Before the first review, freeze a scope baseline: original request or issue, target branch, intended behavior, owner boundary, changed files, and non-test LOC. For inherited or already-bloated branches, use the intended PR diff as the baseline rather than accepting all existing branch drift.
-
-Before patching a finding, classify it:
-
-- **In-scope blocker**: the finding is introduced by the current diff, affects the same owner boundary, and can be fixed without changing the task's contract.
-- **Follow-up**: the finding is real but belongs to an adjacent bug class, sibling surface, cleanup, or broader hardening track.
-- **Stop-and-escalate**: the finding requires a new protocol/config/storage/public API contract, a different owner boundary, a release-process change, or a design choice outside the original request.
-
-Stop patching and report the scope break instead of continuing when:
-
-- a narrow PR turns into an architecture change, protocol change, migration, or release-process change;
-- the diff grows past 2x the original files or non-test LOC without explicit approval to expand scope;
-- two review-triggered patch cycles have not converged; pause and reclassify every remaining finding before another edit;
-- the best fix is "define the canonical contract first" rather than another local inference layer;
-- fixing the accepted finding would make the PR no longer describe the same behavior, issue, or owner boundary.
-
-After the two-cycle pause, continue only when every remaining accepted finding is still an in-scope blocker. Otherwise preserve the useful analysis, identify the smallest safe landed subset if one exists, and open or request a follow-up for the larger fix. Do not keep committing speculative fixes just to satisfy the reviewer.
-
-Do not stack or push review-triggered fix commits while scope classification or focused proof is unresolved. Keep exploratory edits local until the cycle is proven in scope; if scope breaks, remove them from the landing lane instead of preserving them as branch history.
-
-Critical exceptions must be explicit: active data loss, crash, broken install/upgrade, release blocker, or concrete security exposure. If the exception is not one of those, it is not critical enough to blow up scope.
-
-## Release Branches And Release Process
-
-On release, beta, stable, hotfix, signing, notarization, appcast, package-publish, or release-check work, use freeze discipline even when the branch name is not release-like:
-
-- Fix only release blockers, failed release infrastructure, exact backports, install/upgrade breakage, data loss, crashes, or concrete security exposure.
-- Treat non-blocking autoreview findings as follow-ups for `main`, not reasons to broaden the release branch.
-- Do not introduce new product behavior, config surface, protocol shape, migration, plugin ownership, docs narrative, or process policy unless it directly unblocks the release.
-- Keep proof tied to the release target: exact branch/ref, failing check or shipped-risk reason, smallest command/proof, and whether the fix must also forward-port to `main`.
-- If review discovers a real but non-critical design problem during release closeout, stop with a follow-up issue/PR plan; do not use the release branch as the refactor lane.
-
 ## Pick Target

 Dirty local work:
--- a/.agents/skills/autoreview/scripts/autoreview
+++ b/.agents/skills/autoreview/scripts/autoreview
@@ -440,36 +440,8 @@ def load_datasets(args: argparse.Namespace) -> str:
    return "\n\n".join(chunks)


-def review_scope_policy() -> str:
-    return textwrap.dedent(
-        """
-        Review scope discipline:
-        - This helper is a closeout gate. Do not turn a narrow patch into a broad
-          redesign request.
-        - Report a finding only when this diff introduces or exposes a concrete
-          defect that must be fixed before this target can land.
-        - If the best fix requires a new protocol, config, storage, public API,
-          release process, migration, owner-boundary move, or canonical contract,
-          say that directly in the finding and keep the finding tied to the
-          smallest changed line that proves the current patch is not landable.
-        - Do not ask for sibling-surface hardening, cleanup, refactors, or
-          follow-up architecture work unless the current diff is incorrect
-          without that work.
-        - Prefer the smallest correct pre-merge fix. A broader ideal design is
-          not an actionable finding unless the current patch cannot safely land.
-        - If this is release-branch or release-process work, apply freeze
-          discipline. Report only release blockers, exact backport regressions,
-          install/upgrade breakage, crashes, data loss, concrete security
-          exposure, or release-infrastructure failures. Non-blocking design,
-          cleanup, and hardening concerns belong on main as follow-ups.
-        """
-    ).strip()
-
-
 def build_prompt(repo: Path, target: str, target_ref: str | None, bundle: str, extra_prompt: str, datasets: str) -> str:
    target_line = f"{target} {target_ref}" if target_ref else target
-    branch = current_branch(repo)
-    scope_policy = review_scope_policy()
    return textwrap.dedent(
        f"""
        You are a senior code reviewer. Review the provided git change bundle only.
@@ -491,11 +463,8 @@ def build_prompt(repo: Path, target: str, target_ref: str | None, bundle: str, e
        - If there are no actionable findings, return an empty findings array and mark the patch correct.

        Review target: {target_line}
-        Current branch: {branch}
        Repository: {repo}

-        {scope_policy}
-
        {extra_prompt}

        {datasets}
--- a/.agents/skills/autoreview/scripts/test-review-harness.py
+++ b/.agents/skills/autoreview/scripts/test-review-harness.py
@@ -3,7 +3,6 @@ from __future__ import annotations

 import argparse
 import os
-import runpy
 import shutil
 import stat
 import subprocess
@@ -146,23 +145,8 @@ def create_fixture_repo(repo: Path, fixture: str) -> None:
    write_fixture_file(repo, MALICIOUS_CHANGED if fixture == "malicious" else BENIGN_CHANGED)


-def validate_prompt_policy(repo: Path, autoreview: Path) -> None:
-    namespace = runpy.run_path(str(autoreview))
-    prompt = namespace["build_prompt"](repo, "local", None, "fixture diff", "", "")
-    required = (
-        "This helper is a closeout gate.",
-        "Do not turn a narrow patch into a broad",
-        "If this is release-branch or release-process work",
-        "Non-blocking design,",
-    )
-    missing = [needle for needle in required if needle not in prompt]
-    if missing:
-        raise RuntimeError(f"autoreview prompt missing scope policy: {missing}")
-
-
 def run_reviews(repo: Path, script_dir: Path, fixture: str, engines: list[str]) -> None:
    autoreview = script_dir / "autoreview"
-    validate_prompt_policy(repo, autoreview)
    for engine in engines:
        print(f"== {engine} ==", flush=True)
        command = [
--- a/.agents/skills/claw-score/SKILL.md
+++ b/.agents/skills/claw-score/SKILL.md
@@ -1,115 +0,0 @@
----
-name: claw-score
-description: Audit or refresh OpenClaw maturity scorecard docs from root taxonomy, maturity scores, and QA evidence artifacts without using maintainer discrawl data or committed inventory reports.
----
-
-# claw-score
-
-Use this skill when working on the OpenClaw maturity scorecard in this repo.
-This is the openclaw-local version of the maintainer `claw-score` workflow:
-it keeps the taxonomy and scorecard concepts, but excludes discrawl and the old
-committed `inventory/` report tree.
-
-## Authority
-
-This skill owns the operational workflow for:
-
-- `taxonomy.yaml`
-- `docs/maturity-scores.yaml`
-- `docs/maturity-scorecard.md`
-- `docs/taxonomy.md`
-- `docs/taxonomy-outline.md`
-- `scripts/render-maturity-docs.mjs`
-- `.github/workflows/maturity-scorecard.yml`
-
-Keep person-specific, maintainer-private, Discord archive, and discrawl facts
-out of this repo. If a score needs private evidence, use the redacted
-`qa-evidence.json` artifact shape generated by OpenClaw QA workflows.
-
-## Source Model
-
-- `taxonomy.yaml` is the hand-edited source of truth for surfaces, levels,
-  QA profiles, categories, feature coverage IDs, docs refs, LTS overrides, and
-  completeness-instruction paths.
-- `docs/maturity-scores.yaml` is the aggregate score source committed in this
-  repo. It is the only committed score data; do not add generated inventory
-  directories.
-- `docs/maturity-scorecard.md`, `docs/taxonomy.md`, and
-  `docs/taxonomy-outline.md` are deterministic docs generated from the root
-  taxonomy and aggregate score source.
-- `qa-evidence.json` artifacts provide per-run QA scorecard evidence. They can
-  enrich generated artifact docs, but they are not committed as inventory.
-
-## Commands
-
-Run from the openclaw repo root.
-
-Render committed docs:
-
-```bash
-pnpm maturity:render
-```
-
-Check generated docs are current:
-
-```bash
-pnpm maturity:check
-```
-
-Render an evidence-enriched docs artifact from downloaded QA artifacts:
-
-```bash
-pnpm maturity:render -- --evidence-dir .artifacts/maturity-evidence --output-dir .artifacts/maturity-docs
-```
-
-## Scoring Workflow
-
-When asked to score or refresh a surface:
-
-1. Read the surface in `taxonomy.yaml`.
-2. Read the surface completeness rubric under
-   `.agents/skills/claw-score/references/completeness/`.
-3. Gather public repo evidence from docs, source, tests, and QA scenario
-   metadata.
-4. Prefer existing `qa-evidence.json` artifacts for executed proof. Do not use
-   discrawl or unredacted private archives.
-5. Update `docs/maturity-scores.yaml` only when the score change is backed by
-   public or redacted artifact evidence.
-6. Run `pnpm maturity:render`.
-7. Run `pnpm maturity:check`.
-
-For subjective score changes, make the smallest defensible edit and leave the
-evidence path in the PR or task summary. The deterministic renderer owns
-Markdown structure; manual prose tweaks belong in taxonomy, score source, or
-the renderer rather than in generated docs.
-
-## Score Semantics
-
-- Coverage: public or redacted proof that the feature is exercised by docs,
-  tests, QA scenarios, live lanes, or release evidence.
-- Quality: reliability, maintainability, operator safety, and regression
-  confidence for the category.
-- Completeness: how much of the intended operator-visible workflow exists for
-  the category. Use the surface-specific completeness rubric before changing
-  this score.
-- LTS: derived from score thresholds and `human_lts_override`; do not hand-edit
-  generated Markdown to change LTS status.
-
-Bands:
-
-- `Lovable`: 95-100
-- `Stable`: 80-95
-- `Beta`: 70-80
-- `Alpha`: 50-70
-- `Experimental`: 0-50
-
-## GitHub Action
-
-The `Maturity scorecard` workflow verifies committed generated docs on PRs and
-pushes. Manual dispatch can also download QA artifacts from another workflow run
-with `source_run_id` and `artifact_pattern`, render evidence-enriched docs into
-`.artifacts/maturity-docs`, and upload them as a GitHub artifact.
-
-Do not add the maintainer repo's `docs/kevinslin/maturity-scorecard/inventory/`
-tree to openclaw. Those generated reports are intentionally replaced here by
-short-lived artifact docs and the committed aggregate scorecard pages.
--- a/.agents/skills/claw-score/references/completeness/agent-runtime-and-provider-execution.md
+++ b/.agents/skills/claw-score/references/completeness/agent-runtime-and-provider-execution.md
@@ -1,45 +0,0 @@
-# Agent Runtime Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`agent-runtime-and-provider-execution` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Agent Runtime` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Agent Turn Execution: Turn startup and runtime choice, Session and run coordination, Abort and terminal outcomes
-- External Runtimes and Subagents: External harness selection, CLI runtime aliases, Subagent turns, Runtime recovery
-- Hosted Provider Execution: Hosted provider turns, Provider-specific model options, Hosted tool use, Reasoning and cache controls, Hosted streaming and replies
-- Local and Self-hosted Providers: Local provider profiles, Tool-capability flags, Timeouts and context windows, Local smoke checks, Local failure handling
-- Model and Runtime Selection: Model reference selection, Provider and runtime overrides, Thinking and context settings, Invalid route recovery
-- Provider Auth: Login and API-key setup, Auth profile selection, Credential health checks, Auth failover, Provider fallback recovery, Rate-limit and capacity recovery, Missing-key and OAuth guidance, Restart and stale-route recovery, Structured provider diagnostics, Subagent credential propagation
-- Streaming and Progress: Streaming replies, Progress visibility
-- Tool Calls and Response Handling: Tool-call handling, Usage and response reporting, Failure recovery
-- Tool Execution Controls: Tool availability rules, Sandboxed exec behavior, Approval flow, Elevated execution, Tool safety controls, Delegated tool access
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/android-app.md
+++ b/.agents/skills/claw-score/references/completeness/android-app.md
@@ -1,43 +0,0 @@
-# Android app Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`android-app` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Android app` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Media Capture: Camera and media capture
-- Mobile Chat: Chat tab
-- Connection Setup: Gateway discovery
-- Distribution: Public Google Play install path, Manual install path, Release smoke and startup performance
-- Settings: Settings sheet
-- Voice: Voice tab
-- Device Runtime: Background reconnect and presence, Device command availability
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/anthropic-provider-path.md
+++ b/.agents/skills/claw-score/references/completeness/anthropic-provider-path.md
@@ -1,41 +0,0 @@
-# Anthropic provider path Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`anthropic-provider-path` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Anthropic provider path` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Provider Auth and Recovery: API-key onboarding, Claude CLI credential reuse, Setup-token auth, Auth profile health, Model status, Usage windows, Cooldown/profile reporting, Long-context recovery, Fallback guidance
-- Model and Runtime Selection: Bundled Claude catalog, Canonical anthropic refs, Claude CLI compatibility, Model picker availability, Capability metadata, Runtime selection, Session continuity, MCP/tool bridge, Permission-mode mapping, Fallback prelude
-- Request Transport and Turn Semantics: API-key/OAuth transport, Messages payloads, Streaming decode, Usage and stop reasons, Abort/error handling, Tool-use blocks, Tool-result replay, Partial JSON recovery, Native thinking, Signed/redacted thinking replay
-- Prompt Cache and Context: Cache retention, System-prompt cache boundary, 1M context, Fast mode/service tier, Cache diagnostics
-- Media Inputs: Image input, PDF document input, Media model fallback, Image tool results
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/automation-cron-hooks-tasks-polling.md
+++ b/.agents/skills/claw-score/references/completeness/automation-cron-hooks-tasks-polling.md
@@ -1,42 +0,0 @@
-# Automation: cron, hooks, tasks, polling Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`automation-cron-hooks-tasks-polling` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Automation: cron, hooks, tasks, polling` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Cron Jobs: Create/edit/remove jobs, Schedule types, Timezone and stagger, Cron RPCs, Agent cron tool, Manual cron runs, Isolated cron execution, Model/provider preflight, Run history, Timeout and denial diagnostics, Chat announce delivery, Webhook delivery, Failure destinations, Skipped-run alerts, Delivery previews
-- Event Ingress: Telegram long polling, Telegram webhook mode, Zalo polling/webhook mode, Polling stall diagnostics, iMessage watch fallback, Gmail setup wizard, Watcher start/serve, Tailscale/public routing, Push token validation, Gmail event routing, POST /hooks/wake, POST /hooks/agent, Mapped hooks, Hook auth policy, Async dispatch
-- Automation Hooks: HOOK.md authoring, Hook discovery, Hook CLI management, Hook packs, Lifecycle event dispatch, api.on registration, Tool-call policy hooks, Message hooks, Session/lifecycle hooks, Plugin approval requests, cron_changed
-- Background Tasks and Flows: Task list/show/cancel, Task notifications, Task audit and maintenance, Chat task board, Task pressure status, Managed flows, Mirrored flows, openclaw tasks flow, Flow audit and maintenance, Plugin managedFlows
-- Heartbeat: Heartbeat scheduling, Active hours, Wake and cooldown handling, Due-only heartbeat tasks, Commitment check-ins
-- Polling Controls: openclaw message poll, Telegram polls, Teams polls, Poll flags, Channel capability gates, process poll, process log, Background process status, No-progress loop detection, Process input controls
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/browser-automation-and-exec-sandbox-tools.md
+++ b/.agents/skills/claw-score/references/completeness/browser-automation-and-exec-sandbox-tools.md
@@ -1,39 +0,0 @@
-# Browser automation and exec/sandbox tools Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`browser-automation-and-exec-sandbox-tools` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Browser automation and exec/sandbox tools` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Browser Automation: Browser Actions, Snapshots, Artifacts, Browser Plugin Service, Profiles, Browser Security, SSRF, Remote Control
-- Tool Invocation and Execution: Exec Routing, Process Lifecycle, Direct Tool Invoke API, Node System.run, Host Exec Approvals, Elevated Mode
-- Sandbox and Tool Policy: Sandbox Backends, Workspace Isolation, Sandboxed Browser, Codex Dynamic Tools, Tool Policy, Sandbox Tool Gates
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/browser-control-ui-and-webchat.md
+++ b/.agents/skills/claw-score/references/completeness/browser-control-ui-and-webchat.md
@@ -1,43 +0,0 @@
-# Gateway Web App Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`browser-control-ui-and-webchat` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Gateway Web App` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Browser Realtime Talk: Browser Talk start/stop, Provider session selection, Gateway relay audio, Tool-call consults, Steer and cancel
-- Browser Access and Trust: Device pairing, Token/password auth, Tailscale Serve auth, Trusted proxy auth, Allowed origins/gatewayUrl
-- Configuration: Config snapshots, Schema form editing, Raw JSON editing, Base-hash guarded writes, Apply and restart
-- Browser UI: Gateway-hosted UI, Dashboard open/auth bootstrap, Base-path routing, Static asset recovery, Dev gatewayUrl target, PWA install metadata, Service worker updates, VAPID keys, Subscribe/unsubscribe, Test notifications
-- WebChat Conversations: Send and abort, Session and agent picker, Model/thinking controls, Attachments, Markdown/tool/media rendering, chat.history projection, chat.send lifecycle, Abort/partial retention, Injected assistant notes, Reconnect continuity, Hosted embeds, External embed gating, Assistant media tickets, Authenticated avatars, CSP image policy
-- Remote WebChat: macOS WebChat transport, SSH tunnel data plane, Direct ws/wss remote mode, Session continuity, Remote troubleshooting
-- Operator Console: Health/status/models, Live log tail, Update run/status, Activity summaries, RPC timing telemetry, Channels/login, Session manager and history, Cron, Skills/nodes, Exec approvals/agents
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/channel-framework.md
+++ b/.agents/skills/claw-score/references/completeness/channel-framework.md
@@ -1,44 +0,0 @@
-# Channel framework Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`channel-framework` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Channel framework` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Channel Actions Commands and Approvals: Channel-native commands, Native command session target, Message actions, Message tool API discovery, Channel-native approval prompts
-- Channel Setup: Supported channel catalog, Channel status taxonomy in channels list, Setup/onboarding flows, Install-on-demand, Setup wizard metadata
-- Group Thread and Ambient Room Behavior: Group/channel session isolation, Mention-required, Native threads, Broadcast groups, Bot-loop protection
-- Inbound Access and Identity Gates: DM pairing, Group/channel allowlists, Access group expansion, Mention gating, Sanitized inbound identity/route projections
-- Media Attachments and Rich Channel Data: Inbound media normalization, Outbound direct text/media sends, Provider-specific channelData, Media roots
-- Outbound Delivery and Reply Pipeline: Automatic final reply delivery, Durable outbound send orchestration, Reply pipeline transforms, Provider outbound adapter bridge
-- Conversation Routing and Delivery: Inbound conversation routing, Session key construction, Agent binding precedence, Runtime conversation bindings, Thread/parent-child placement, Plugin registry resolution, Channel account startup, Whole-channel lifecycle controls, Config/secrets reload interactions, Auto-restart
-- Status Health and Operator Controls: channels.status, Channel health policy, Operator CLI controls, Status read-model
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/clawhub-and-external-plugin-distribution.md
+++ b/.agents/skills/claw-score/references/completeness/clawhub-and-external-plugin-distribution.md
@@ -1,41 +0,0 @@
-# ClawHub Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`clawhub-and-external-plugin-distribution` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `ClawHub` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Publishing: ClawHub package publishing owner, OpenClaw-owned package release validation for ClawHub, Version bump gates, npm trusted publishing provenance, External code plugin package contract required, Skill package metadata, Skill publishing flow
-- Catalog Discovery: openclaw plugins search as the ClawHub, Search result metadata, Distinction between plugin search, Catalog lookup failure, Skill catalog search
-- Compatibility and Trust: openclaw.compat.pluginApi, ClawHub package compatibility validation, npm compatibility fallback to the newest, Official external plugin catalog behavior, Compatibility docs, Operator trust model for installing, ClawHub archive, npm integrity drift, Built-in dangerous-code scanner, ClawHub publishing review/hidden-release behavior as upstream, Skill archive safety, Skill audit signals
-- Plugin Lifecycle: Source prefixes, Bare package behavior during the launch, Explicit pinned versions, Managed install records that preserve source, Codex, Local, Marketplace list, Supported mapped features, Remote marketplace path safety, Update by plugin id, Reinstall vs update semantics, Downgrade, Uninstall config/index/policy/file cleanup, Gateway restart/reload requirements after, ClawHub skill installs, Skill upload install path, Skill dependency installers
-- Plugin Health: Per-plugin managed npm project, npm-pack local release-candidate installs, Dependency ownership between plugin packages, Peer dependency relinking, Legacy dependency root cleanup, plugins list, Local plugin index, Troubleshooting stale config, Runtime verification after Gateway
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/cli-install-update-onboard-doctor.md
+++ b/.agents/skills/claw-score/references/completeness/cli-install-update-onboard-doctor.md
@@ -1,47 +0,0 @@
-# CLI Surface Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`cli-install-update-onboard-doctor` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully the CLI supports the intended operator journey
-for installation, onboarding, configuration, repair, and upgrade. Score whether
-an operator can complete the end-to-end job for the category across the
-expected environments and recovery branches.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can a normal operator complete the job end to end from the CLI?
-- Are the expected environments represented where they matter for the category,
-  such as local installs, remote gateway use, supervised services, or
-  Windows/WSL2?
-- Are the main lifecycle stages present where relevant: setup, inspection,
-  change, repair, and upgrade?
-- Are common recovery and troubleshooting branches present, or does the
-  workflow dead-end after the happy path?
-- Are major documented operator expectations still unimplemented?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the CLI covers the full operator journey, not
-  only the install or happy path.
-- Lower Completeness when the category lacks meaningful repair, migration,
-  remote, or platform-specific branches that users are expected to rely on.
-- For Windows and WSL2, score against the intended supported experience rather
-  than parity with macOS/Linux internals.
-- Do not use test breadth to lower Completeness; that is Coverage.
-- Do not use fragility or bug history to lower Completeness; that is Quality.
-
-## Suggested Bands
-
-- `Lovable` (95-100): the category covers the full operator journey across the
-  expected environments and recovery paths.
-- `Stable` (80-95): the main workflow set is broadly complete, with only
-  bounded missing paths.
-- `Beta` (70-80): the main journey works, but notable operator branches are
-  still absent.
-- `Alpha` (50-70): only a partial operator workflow is supported.
-- `Experimental` (0-50): the category is fragmentary or heavily caveated.
--- a/.agents/skills/claw-score/references/completeness/discord.md
+++ b/.agents/skills/claw-score/references/completeness/discord.md
@@ -1,42 +0,0 @@
-# Discord Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`discord` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Discord` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Channel Setup and Operations: Application and bot setup, Token and application ID configuration, Setup wizard and account inspection, Status, doctor, and intent checks, Multi-account bot configuration, Account monitor startup, Gateway WebSocket lifecycle, Reconnect and heartbeat handling, Rate limits and gateway metadata, Status, probe, and health-monitor recovery
-- Access and Identity: DM policy modes, Allowlist inheritance, Pairing-code approval, Sender authorization, Access-group authorization, Group DM authorization
-- Conversation Routing and Delivery: Guild and channel admission, Mention gating, Session key isolation, Configured and runtime routing, Inbound context visibility, Forum and media-channel thread posts, Thread actions, Target parsing, Thread context resolution, Thread-bound session routing, ACP agent routing, Routing lifecycle, Discord forum/media channel posts created as, CLI and message-tool thread actions, Discord target parsing for `channel:<id>`, Thread context resolution, Thread-bound session routing for `/focus`, `/unfocus`, `/agents`, `/session idle`, `/session max-age`, `sessions_spawn({ thread, ACP current-conversation bindings and ACP thread, Binding lifecycle behavior, Direct and thread sends, Text chunking and reply mode, Draft and progress edits, Mention and embed rendering, REST retry and final delivery, File uploads, Component file and media-gallery blocks, Video caption follow-up, Voice-message upload, Inbound attachment context
-- Media and Rich Content: Direct and thread sends, Text chunking and reply mode, Draft and progress edits, Mention and embed rendering, REST retry and final delivery, File uploads, Component file and media-gallery blocks, Video caption follow-up, Voice-message upload, Inbound attachment context, Direct and thread sends, Text chunking and reply mode, Draft and progress edits, Mention and embed rendering, REST retry and final delivery, File uploads, Component file and media-gallery blocks, Video caption follow-up, Voice-message upload, Inbound attachment context, Outbound file uploads from URLs and, Component v2 file and media-gallery blocks, Video caption handling and follow-up media-only delivery, Discord voice-message sends with OGG/Opus conversion, Inbound media/attachment-aware debounce behavior, Realtime voice-channel conversations, General text-only delivery
-- Native Controls and Approvals: Native slash command registration, Native slash command execution, Model Picker Commands, Components v2 messages, Callback TTL, Native Discord exec/plugin approvals, Sensitive owner-only command routing for prompts, Discord message actions, Action gates under channels.discord.actions.\*
-- Realtime Voice and Calls: Voice Channel Lifecycle, Auto-join and follow-users, Realtime voice modes, Wake, barge-in, and echo handling, Voice codec and DAVE recovery
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/docker-podman-hosting.md
+++ b/.agents/skills/claw-score/references/completeness/docker-podman-hosting.md
@@ -1,40 +0,0 @@
-# Docker / Podman hosting Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`docker-podman-hosting` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Docker / Podman hosting` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Container Setup: Local Image Setup Script, Docker Compose gateway, First-run onboarding, Docker-only first-run notes, Podman setup scripts and Quadlet template, Rootless Podman image setup
-- Container Operations: Host CLI routing into running Docker/Podman, Container Targeting, Container update/rebuild/restart guidance for Docker, Docker Compose, Gateway token generation, Ownership, Docker Compose, Container health endpoints, Provider/VPS Docker hosting docs, Docker VM persistence/update guidance, Operator-facing update
-- Image Release and Validation: Root Dockerfile build stages, Docker release workflow, Docker E2E package artifact generation, Docker E2E plan/scheduler scripts, Release-path install
-- Agent Sandbox and Tooling: Docker gateway setup, Docker-backed agent sandbox support, Container image dependency baking
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/feishu-qq-bot-wechat-yuanbao-zalo-zalo-personal-regional-channels.md
+++ b/.agents/skills/claw-score/references/completeness/feishu-qq-bot-wechat-yuanbao-zalo-zalo-personal-regional-channels.md
@@ -1,40 +0,0 @@
-# Feishu, QQ Bot, WeChat, Yuanbao, Zalo, Zalo Personal, regional channels Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`feishu-qq-bot-wechat-yuanbao-zalo-zalo-personal-regional-channels` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Feishu, QQ Bot, WeChat, Yuanbao, Zalo, Zalo Personal, regional channels` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Channel Setup and Operations: Docs channel index, Official external channel catalog entries, Core channel-plugin catalog, Channel setup wizard, Missing-plugin, Cross-channel ingress/access/refactor concerns, Feishu/Lark bot channel setup, WebSocket default mode, DM pairing, Message delivery, Feishu document, Multi-account credential handling, QQ Open Platform AppID/AppSecret setup, C2C private chat, Group activation, Rich media messages, Slash commands, Multi-account gateway connections, Tencent Yuanbao external channel, AppKey/AppSecret setup, DMs, Outbound queue strategy, Core-side official external catalog, Zalo Bot Creator / Marketplace bot, Long-polling default mode, Bot token, Group policy schema, Text, Status probes, WeChat/Weixin personal messaging, Plugin install, Direct-message pairing, Core-side catalog metadata, External sidecar/helper process behavior, zalouser channel plugin, QR login, DM pairing, Message send, Doctor/status checks for runtime availability, Explicit unofficial-account risk, QQ Open Platform AppID/AppSecret setup and, C2C private chat, Group activation, Inbound and outbound rich media including, Slash commands, Multi-account gateway connections, Tencent Yuanbao external channel `openclaw-plugin-yuanbao, AppKey/AppSecret setup, DMs, Outbound queue strategy, Core-side official external catalog, Zalo Bot Creator / Marketplace bot, Long-polling default mode and optional HTTPS, Bot token, Group policy schema and fail-closed group, Text, Status probes and troubleshooting for token/config/webhook problems, zalouser` channel plugin for Zalo Personal, QR login, DM pairing, Message send, Doctor/status checks for runtime availability and, Explicit unofficial-account risk and operator safeguards
-- Access and Identity: Feishu/Lark bot channel setup, WebSocket default mode, DM pairing, Message delivery, Feishu document, Multi-account credential handling, QQ Open Platform AppID/AppSecret setup, C2C private chat, Group activation, Rich media messages, Slash commands, Multi-account gateway connections, Tencent Yuanbao external channel, AppKey/AppSecret setup, DMs, Outbound queue strategy, Core-side official external catalog, Zalo Bot Creator / Marketplace bot, Long-polling default mode, Bot token, Group policy schema, Text, Status probes, WeChat/Weixin personal messaging, Plugin install, Direct-message pairing, Core-side catalog metadata, External sidecar/helper process behavior, zalouser channel plugin, QR login, DM pairing, Message send, Doctor/status checks for runtime availability, Explicit unofficial-account risk, QQ Open Platform AppID/AppSecret setup and, C2C private chat, Group activation, Inbound and outbound rich media including, Slash commands, Multi-account gateway connections, Tencent Yuanbao external channel `openclaw-plugin-yuanbao, AppKey/AppSecret setup, DMs, Outbound queue strategy, Core-side official external catalog, zalouser` channel plugin for Zalo Personal, QR login, DM pairing, Message send, Doctor/status checks for runtime availability and, Explicit unofficial-account risk and operator safeguards
-- Conversation Routing and Delivery: Feishu/Lark bot channel setup, WebSocket default mode, DM pairing, Message delivery, Feishu document, Multi-account credential handling, QQ Open Platform AppID/AppSecret setup, C2C private chat, Group activation, Rich media messages, Slash commands, Multi-account gateway connections, Tencent Yuanbao external channel, AppKey/AppSecret setup, DMs, Outbound queue strategy, Core-side official external catalog, Zalo Bot Creator / Marketplace bot, Long-polling default mode, Bot token, Group policy schema, Text, Status probes, WeChat/Weixin personal messaging, Plugin install, Direct-message pairing, Core-side catalog metadata, External sidecar/helper process behavior, zalouser channel plugin, QR login, DM pairing, Message send, Doctor/status checks for runtime availability, Explicit unofficial-account risk, QQ Open Platform AppID/AppSecret setup and, C2C private chat, Group activation, Inbound and outbound rich media including, Slash commands, Multi-account gateway connections, Tencent Yuanbao external channel `openclaw-plugin-yuanbao, AppKey/AppSecret setup, DMs, Outbound queue strategy, Core-side official external catalog, Zalo Bot Creator / Marketplace bot, Long-polling default mode and optional HTTPS, Bot token, Group policy schema and fail-closed group, Text, Status probes and troubleshooting for token/config/webhook problems, zalouser` channel plugin for Zalo Personal, QR login, DM pairing, Message send, Doctor/status checks for runtime availability and, Explicit unofficial-account risk and operator safeguards
-- Media and Rich Content: Feishu/Lark bot channel setup, WebSocket default mode, DM pairing, Message delivery, Feishu document, Multi-account credential handling, QQ Open Platform AppID/AppSecret setup, C2C private chat, Group activation, Rich media messages, Slash commands, Multi-account gateway connections, Tencent Yuanbao external channel, AppKey/AppSecret setup, DMs, Outbound queue strategy, Core-side official external catalog, Zalo Bot Creator / Marketplace bot, Long-polling default mode, Bot token, Group policy schema, Text, Status probes, QQ Open Platform AppID/AppSecret setup and, C2C private chat, Group activation, Inbound and outbound rich media including, Slash commands, Multi-account gateway connections, Zalo Bot Creator / Marketplace bot, Long-polling default mode and optional HTTPS, Bot token, Group policy schema and fail-closed group, Text, Status probes and troubleshooting for token/config/webhook problems
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/gateway-runtime.md
+++ b/.agents/skills/claw-score/references/completeness/gateway-runtime.md
@@ -1,50 +0,0 @@
-# Gateway Runtime Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`gateway-runtime` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended gateway runtime
-capability to operators and connected clients. This is not test coverage and
-not implementation quality. Score whether the category delivers the full
-operator-visible workflow, including the major modes and recovery paths that a
-real deployment expects.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Does the category cover the main happy path an operator or client needs?
-- Are the major deployment modes present where they matter for this category:
-  local, remote, node-mediated, supervised, or browser-facing?
-- Are the main lifecycle stages present where relevant: setup, normal use,
-  status/inspection, and recovery?
-- Are important security or policy branches present where the category implies
-  them?
-- Are obvious operator-visible holes or "not yet supported" branches still
-  missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness only when the category supports the full operator
-  journey, not just a protocol primitive or one transport path.
-- Lower Completeness when only the core path exists but important branches are
-  missing, such as remote versus local differences, supervised lifecycle
-  behavior, approval/policy variants, or recovery/diagnostic paths.
-- Do not lower Completeness just because tests are thin; that is Coverage.
-- Do not lower Completeness just because the implementation is fragile; that is
-  Quality.
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across all expected operator/client modes, with
-  only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only
-  bounded missing branches.
-- `Beta` (70-80): the main workflows exist, but some meaningful branches or
-  recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can do core
-  tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended
-  capability.
--- a/.agents/skills/claw-score/references/completeness/google-chat.md
+++ b/.agents/skills/claw-score/references/completeness/google-chat.md
@@ -1,41 +0,0 @@
-# Google Chat Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`google-chat` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Google Chat` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Channel Setup and Operations: Google Cloud project setup, Chat app configuration, Service account setup, Webhook audience and path, Workspace visibility and app status, Guided channel setup, Account resolution, Service account SecretRefs, Env file and inline credentials, Channel status and probes, Directory and mutable-id diagnostics, NPM and ClawHub install, Plugin docs and catalog routing, Channel aliases and labels, Operator status UI, Install/update metadata, Webhook path handling, Standard Chat token verification, Workspace add-on token verification, Audience and appPrincipal validation, Audience and appPrincipal binding, Shared-path target selection, Auth rejection diagnostics
- Access and Identity: DM pairing approval, Sender allowlists, Google Chat identity matching, Direct session routing, Pairing diagnostics, Space allowlists, Mention gating, Sender access groups, Group session isolation, Bot-loop protection, Space diagnostics
- Conversation Routing and Delivery: DM pairing approval, Sender allowlists, Google Chat identity matching, Direct session routing, Pairing diagnostics, Space allowlists, Mention gating, Sender access groups, Group session isolation, Bot-loop protection, Space diagnostics, Inbound attachments, Outbound media replies, Message upload action, Media source and size controls, Media receipts and thread placement, Text send action, Upload-file action, Reaction actions, Action capability gates, Approval sender matching, Thread-aware replies, Streaming and chunked replies, Typing placeholder lifecycle, Message-tool current-source replies, NO_REPLY cleanup, Markdown/text rendering
- Media and Rich Content: Inbound attachments, Outbound media replies, Message upload action, Media source and size controls, Media receipts and thread placement, Text send action, Upload-file action, Reaction actions, Action capability gates, Approval sender matching, Thread-aware replies, Streaming and chunked replies, Typing placeholder lifecycle, Message-tool current-source replies, NO_REPLY cleanup, Markdown/text rendering
- Native Controls and Approvals: Inbound attachments, Outbound media replies, Message upload action, Media source and size controls, Media receipts and thread placement, Text send action, Upload-file action, Reaction actions, Action capability gates, Approval sender matching, Thread-aware replies, Streaming and chunked replies, Typing placeholder lifecycle, Message-tool current-source replies, NO_REPLY cleanup, Markdown/text rendering
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/google-provider-path.md
+++ b/.agents/skills/claw-score/references/completeness/google-provider-path.md
@@ -1,41 +0,0 @@
-# Google provider path Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`google-provider-path` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Google provider path` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Provider Setup and Credentials: API key onboarding, Auth choice metadata, Gemini CLI OAuth setup, Vertex ADC setup, Daemon and fallback credentials, CLI runtime selection, OAuth login and refresh, Canonical Google model refs, CLI usage normalization, OAuth diagnostics
- Model Routing and Endpoints: Catalog rows and aliases, Dynamic model resolution, Provider routing, Google-native config normalization, Model picker availability, Vertex provider selection, ADC/service-account auth, Project/location endpoints, Custom base URL policy, Compatibility boundaries
- Direct Gemini Runtime: Direct Gemini chat, Multimodal inputs, Tool-call streaming, Usage and stop reasons, Thought-signature replay, Thinking-level mapping, Tool turn ordering, Incomplete-turn recovery, Planning-only turn recovery
- Media, Search, and Realtime: Bundled plugin distribution, Provider auto-enable metadata, Image and media adapters, Speech and realtime adapters, Search and generation tools, Realtime voice sessions, Constrained browser tokens, Audio and transcript events, Live tool calls, Session reconnects
- Prompt Caching: Cache retention config, Managed cachedContents, Manual cachedContent handles, Cache usage accounting, Cache diagnostics and live proof
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/image-video-music-generation-tools.md
+++ b/.agents/skills/claw-score/references/completeness/image-video-music-generation-tools.md
@@ -1,41 +0,0 @@
-# Image/video/music generation tools Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`image-video-music-generation-tools` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Image/video/music generation tools` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Media Routing and Discovery: default media model config, per-call model refs and fallbacks, auth-backed tool discovery, action=list provider inspection
- Task Lifecycle and Delivery: background task creation, task status/list/show/cancel, duplicate guards, progress keepalive, completion/failure wake, no-session inline fallback, local media persistence, MIME/filename inference, hosted URL fallback, message-tool handoff, idempotent missing-media fallback, channel attachment proof
- Image Generation: text-to-image, reference-image editing, output hints, action=status, provider attempt metadata, OpenAI/Codex OAuth, API-key OpenAI, OpenRouter/xAI/fal/LiteLLM/DeepInfra/Google/MiniMax/ComfyUI auth, provider error diagnostics
- Video Generation: text-to-video, image-to-video, video-to-video, reference role validation, audio refs, typed providerOptions, queue-backed jobs, polling/timeout handling, hosted URL download, provider skip explanations, returned asset metadata
- Music Generation: prompt and lyrics input, instrumental mode, duration/format controls, image-reference edit lanes, generated audio outputs, provider fallback
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/imessage-bluebubbles.md
+++ b/.agents/skills/claw-score/references/completeness/imessage-bluebubbles.md
@@ -1,41 +0,0 @@
-# iMessage / BlueBubbles Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`imessage-bluebubbles` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `iMessage / BlueBubbles` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Channel Setup and Operations: Translate legacy config, Cut over safely, Handle migration caveats, Run local imsg, Run through SSH wrapper, Grant macOS permissions, Probe runtime health, Account setup prompts, Account status checks, Doctor repair checks, Account Config
- Access and Identity: Authorize direct senders, Route direct conversations, Bind ACP sessions, Group Policy, Mentions, System Prompts
- Conversation Routing and Delivery: Watch live messages, Coalesce split-send DMs, Replay missed messages, Seed conversation history, Authorize direct senders, Route direct conversations, Bind ACP sessions, Group Policy, Mentions, System Prompts
- Media and Rich Content: Media, Attachments, Remote Fetch, Chunking, Native Actions, Private API, Message Tool
- Native Controls and Approvals: Native Approvals, Reactions, Operator Control, Media, Attachments, Remote Fetch, Chunking, Native Actions, Private API, Message Tool
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/ios-app.md
+++ b/.agents/skills/claw-score/references/completeness/ios-app.md
@@ -1,44 +0,0 @@
-# iOS app Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`ios-app` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `iOS app` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Media and Sharing: Camera list/snap/clip
- Canvas and Screen: Canvas present/hide/navigate/eval/snapshot
- Chat and Sessions: Chat sessions and operator controls
- Gateway Setup and Diagnostics: Bonjour/local, Manual host/port, Gateway connect configuration persistence, TLS fingerprint trust prompt, Pairing approval, Pairing/auth diagnostics for users, Settings tab
- Distribution: Internal preview status
- Device Commands: Location modes, Device command handling
- Notifications and Background: APNs registration and relay delivery
- Voice: Voice wake
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/kubernetes-hosting.md
+++ b/.agents/skills/claw-score/references/completeness/kubernetes-hosting.md
@@ -1,43 +0,0 @@
-# Kubernetes Hosting Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`kubernetes-hosting` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw supports Kubernetes as a cluster
-hosting path for the Gateway. Score whether each category delivers the operator
-workflow for deployment, configuration, secrets, access, exposure, lifecycle,
-security posture, status, and recovery.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can an operator deploy and manage OpenClaw on Kubernetes end to end?
- Are the taxonomy features present as supported manifests, commands, and docs rather than examples only?
- Are setup, normal operation, status or inspection, redeploy, teardown, and secret rotation represented where relevant?
- Are local Kind validation, namespace/image customization, provider secrets, and secure exposure branches covered?
- Do known gaps leave major cluster-hosting capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when a Kubernetes operator can deploy, expose, secure, update, troubleshoot, and remove the Gateway without relying on Docker-only assumptions.
- Lower Completeness when a category only covers happy-path port-forwarding, lacks secret/config rotation, or omits exposed-service security posture.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Deployment Setup: Kustomize packaging, cluster prerequisites, quick deploy, manifest apply, and Kind validation.
- Configuration and Secrets: agent instructions, Gateway config, provider secrets, secret rotation, and image/namespace customization.
- Access and Exposure: port-forward access, service endpoint, ingress exposure, auth/TLS, and localhost posture.
- Cluster Lifecycle: resource layout, state persistence, redeploy, teardown, and security context.
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/linux-companion-app.md
+++ b/.agents/skills/claw-score/references/completeness/linux-companion-app.md
@@ -1,41 +0,0 @@
-# Linux companion app Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`linux-companion-app` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Linux companion app` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- App Distribution: Native app package, Distro package targets, Official release metadata
- Gateway Connectivity: Local Gateway attach and status, Gateway pairing and auth, Remote mode, Local and remote resource boundaries
- Chat and Sessions: Native Linux chat window, Transcript, Gateway chat transport
- Desktop Capabilities: Linux desktop permissions, Secret storage, Sandbox/package posture, Linux native node identity, Host command execution, Desktop tools, Linux native Talk, Microphone capture, Native media permissions
- Status and Diagnostics: Native Linux app readiness, Gateway health/status display, Log/transcript opening, Doctor/repair affordances, Linux tray/status item, Runtime status row, Desktop-environment integration
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/linux-gateway-host.md
+++ b/.agents/skills/claw-score/references/completeness/linux-gateway-host.md
@@ -1,41 +0,0 @@
-# Linux Gateway host Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`linux-gateway-host` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Linux Gateway host` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Host Setup and Updates: Linux CLI install, Node runtime prerequisites, Package-manager policy, Update path
- Gateway Runtime and Service Control: Foreground Gateway Runtime, Process Control, Systemd User Service Lifecycle setup, Systemd User Service Lifecycle operation, Systemd User Service Lifecycle status, Systemd User Service Lifecycle recovery
- Remote Access and Security: Remote Network Exposure, TLS, Tailscale, Gateway exposure safeguards, Gateway authentication modes, Secret Handling
- Diagnostics and Repair: Gateway diagnostic reports, Gateway log tailing, Doctor checks, Operator repair guidance
- Deployment Targets: VPS, Container, Cloud Deployment Guidance
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/local-model-providers-ollama-vllm-sglang-lm-studio.md
+++ b/.agents/skills/claw-score/references/completeness/local-model-providers-ollama-vllm-sglang-lm-studio.md
@@ -1,41 +0,0 @@
-# Local model providers: Ollama, vLLM, SGLang, LM Studio Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`local-model-providers-ollama-vllm-sglang-lm-studio` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Local model providers: Ollama, vLLM, SGLang, LM Studio` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Provider Setup, Lifecycle, and Diagnostics: Provider Selection, Onboarding, localService configuration, Process startup and readiness, Request leases and idle shutdown, Health checks and restart, Provider recipes, Local provider status, Backend reachability probes, Model availability errors, Memory readiness diagnostics, Provider troubleshooting docs
- Native Provider Plugins: Ollama setup and model pulling, Model discovery, Streaming and vision, Ollama embeddings, Web-search support, LM Studio setup, Model discovery and auth, Model preload and JIT loading, Streaming compatibility, LM Studio embeddings
- OpenAI-Compatible Runtime Compatibility: Bundled provider setup, Model Discovery Endpoint, Non-interactive configuration, vLLM thinking controls, OpenAI-compatible chat and tool semantics, SGLang compatibility guidance, Request Stream Compatibility, Tool Calling
- Local Memory and Embeddings: Embedding provider selection, Memory search readiness, memoryFlush model override, Fallback lexical search, Provider mismatch guidance
- Network Safety and Prompt Controls: Safety Network, Prompt Pressure Controls
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/long-tail-hosted-providers.md
+++ b/.agents/skills/claw-score/references/completeness/long-tail-hosted-providers.md
@@ -1,39 +0,0 @@
-# Long-tail hosted providers Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`long-tail-hosted-providers` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Long-tail hosted providers` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Hosted LLM Providers: Bedrock setup, Gateway/proxy routing, Copilot/OpenCode hosted access, Proxy capability diagnostics, Hosted text completion, Tool-call and streaming compatibility, Model catalog resolution, Provider-specific request shaping, Regional provider setup, Region and plan routing, Regional live smoke, Account prerequisite diagnostics
-- Hosted Media Providers: Image generation providers, Video generation providers, Music generation providers, Media mode coverage, Text-to-speech providers, Speech-to-text providers, Realtime transcription providers, Audio format diagnostics
-- Provider Operations: Provider directory, Provider install catalog, Model catalog metadata, Catalog parity checks, Provider setup descriptors, Auth profiles and aliases, Credential health probes, Key rotation and recovery, Direct provider smoke, Gateway live smoke, Models status probes, Fallback trace and repair
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/macos-companion-app.md
+++ b/.agents/skills/claw-score/references/completeness/macos-companion-app.md
@@ -1,43 +0,0 @@
-# macOS companion app Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`macos-companion-app` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `macOS companion app` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Canvas: Canvas panel open/hide/navigate/eval/snapshot, Local custom URL scheme, A2UI host auto-navigation, Canvas enable/disable setting
-- Local Setup: Local mode Gateway attach/start/stop, LaunchAgent install/update/restart/uninstall, Existing-listener detection, Native first-run onboarding flow, CLI discovery, Local workspace selection, Onboarding WebChat session separation
-- Status and Settings: Menu-bar status, Activity state ingestion, Settings navigation, Health polling, Channels settings
-- Native Capabilities: Mac node session connection, system.run, Exec approval policy, Permission requests, TCC persistence
-- Remote Connections: Remote connection mode selection, SSH tunnel, Gateway discovery
-- Voice and Talk: Voice Wake runtime, Push-to-talk, Talk provider playback plan
-- WebChat: Native SwiftUI WebChat window, Gateway chat transport, Local and remote data-plane reuse
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/macos-gateway-host.md
+++ b/.agents/skills/claw-score/references/completeness/macos-gateway-host.md
@@ -1,43 +0,0 @@
-# macOS Gateway host Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`macos-gateway-host` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `macOS Gateway host` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- CLI Setup: Hosted installer, Node 24 recommendation, App-triggered CLI install, Shell PATH and version-manager drift
-- Local Gateway Integration: App local/remote connection mode, App-managed Gateway LaunchAgent install/restart/uninstall, CLI install detection, Attach-to-existing local Gateway compatibility, Gateway endpoint, gateway.mode=local configuration, Loopback bind, Local app endpoint resolution, Bonjour discovery
-- Remote Gateway Mode: macOS app "Remote over SSH", SSH tunnel setup, Tailscale MagicDNS, Remote endpoint token/password/TLS fingerprint, Local node host startup
-- Gateway Service Lifecycle: Per-user Gateway LaunchAgent install, launchctl bootstrap, LaunchAgent labels, Gateway token/env handling, App-managed LaunchAgent handoff, openclaw update package/git handoff, Managed service refresh, Stale updater launchd job detection, openclaw uninstall, Stranded service recovery
-- Diagnostics and Observability: LaunchAgent log paths, openclaw gateway status --deep, Gateway silently stops responding, Stale updater jobs
-- Permissions and Native Capabilities: macOS TCC permission prompts/status, Native node capability exposure, system.run policy, Permission-driven support
-- Profiles and Isolation: Profile-specific LaunchAgent labels, Profile-specific state/config/workspace roots, Derived ports, Rescue bot setup, Extra Gateway process detection
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/matrix.md
+++ b/.agents/skills/claw-score/references/completeness/matrix.md
@@ -1,42 +0,0 @@
-# Matrix Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`matrix` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Matrix` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Channel Setup and Operations: Matrix plugin identity, Setup wizard, Account discovery, Matrix doctor warnings, Matrix probe/status, Shared Matrix client resolution, Monitor startup, Startup maintenance, Matrix doctor warnings, Matrix probe/status, Monitor startup, Startup maintenance
-- Access and Identity: DM policy, Direct-room classification, Inbound route selection across sender-bound DMs, Mention gates, Matrix thread reply routing, Persisted Matrix thread routing managers, ACP/subagent spawn hooks
-- Conversation Routing and Delivery: DM policy, Direct-room classification, Inbound route selection across sender-bound DMs, Mention gates, Matrix thread reply routing, Persisted Matrix thread routing managers, ACP/subagent spawn hooks, Channel action discovery, Message send/read/edit/delete, Profile media loading, Outbound Matrix text, Message presentation metadata, Inbound media failure handling, Message send/read/edit/delete, Profile media loading, Outbound Matrix text, Message presentation metadata, Inbound media failure handling
-- Media and Rich Content: Channel action discovery, Message send/read/edit/delete, Profile media loading, Outbound Matrix text, Message presentation metadata, Inbound media failure handling
-- Native Controls and Approvals: Channel action discovery, Message send/read/edit/delete, Profile media loading, Outbound Matrix text, Message presentation metadata, Inbound media failure handling, Matrix native exec, Origin target resolution from Matrix turn, Approver DM target resolution, Matrix approval metadata, Origin target resolution from Matrix turn, Approver DM target resolution, Matrix approval metadata
-- Encryption and Verification: Encryption setup, Encrypted media upload/download, Legacy state
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/mattermost-line-irc-nextcloud-talk-nostr-twitch-tlon-synology-chat.md
+++ b/.agents/skills/claw-score/references/completeness/mattermost-line-irc-nextcloud-talk-nostr-twitch-tlon-synology-chat.md
@@ -1,40 +0,0 @@
-# Mattermost, LINE, IRC, Nextcloud Talk, Nostr, Twitch, Tlon, Synology Chat Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`mattermost-line-irc-nextcloud-talk-nostr-twitch-tlon-synology-chat` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Mattermost, LINE, IRC, Nextcloud Talk, Nostr, Twitch, Tlon, Synology Chat` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Channel Setup and Operations: Mattermost bot account setup, WebSocket inbound monitoring, Outbound delivery, LINE Messaging API webhook setup, Signed inbound webhook events, Rich LINE payloads, Nextcloud Talk bot installation, Webhook ingress, Outbound markdown/text, Synology Chat incoming/outgoing webhook setup, Webhook token verification, Outbound text, IRC server/nick/TLS/NickServ setup, Raw IRC receive/send, Probe/status, Twitch bot account setup, Twitch IRC monitor/client lifecycle, Message tool send action, Nostr key setup, NIP-04 encrypted DM receive/send, Profile import/publish, Tlon/Urbit ship URL/code setup, Urbit API auth/session, Rich text conversion, Nextcloud Talk bot installation, Webhook ingress, Outbound markdown/text, Synology Chat incoming/outgoing webhook setup, Webhook token verification, Outbound text and URL media delivery, Twitch bot account setup, Twitch IRC monitor/client lifecycle, Message tool send action, Tlon/Urbit ship URL/code setup, Urbit API auth/session, Rich text conversion
-- Access and Identity: Mattermost bot account setup, WebSocket inbound monitoring, Outbound delivery, LINE Messaging API webhook setup, Signed inbound webhook events, Rich LINE payloads, Nextcloud Talk bot installation, Webhook ingress, Outbound markdown/text, Synology Chat incoming/outgoing webhook setup, Webhook token verification, Outbound text, IRC server/nick/TLS/NickServ setup, Raw IRC receive/send, Probe/status, Twitch bot account setup, Twitch IRC monitor/client lifecycle, Message tool send action, Nostr key setup, NIP-04 encrypted DM receive/send, Profile import/publish, Tlon/Urbit ship URL/code setup, Urbit API auth/session, Rich text conversion, Synology Chat incoming/outgoing webhook setup, Webhook token verification, Outbound text and URL media delivery, Tlon/Urbit ship URL/code setup, Urbit API auth/session, Rich text conversion
-- Conversation Routing and Delivery: Mattermost bot account setup, WebSocket inbound monitoring, Outbound delivery, LINE Messaging API webhook setup, Signed inbound webhook events, Rich LINE payloads, Nextcloud Talk bot installation, Webhook ingress, Outbound markdown/text, Synology Chat incoming/outgoing webhook setup, Webhook token verification, Outbound text, IRC server/nick/TLS/NickServ setup, Raw IRC receive/send, Probe/status, Twitch bot account setup, Twitch IRC monitor/client lifecycle, Message tool send action, Nostr key setup, NIP-04 encrypted DM receive/send, Profile import/publish, Tlon/Urbit ship URL/code setup, Urbit API auth/session, Rich text conversion, Nextcloud Talk bot installation, Webhook ingress, Outbound markdown/text, Synology Chat incoming/outgoing webhook setup, Webhook token verification, Outbound text and URL media delivery, Twitch bot account setup, Twitch IRC monitor/client lifecycle, Message tool send action, Tlon/Urbit ship URL/code setup, Urbit API auth/session, Rich text conversion
-- Media and Rich Content: LINE Messaging API webhook setup, Signed inbound webhook events, Rich LINE payloads, Nextcloud Talk bot installation, Webhook ingress, Outbound markdown/text, Synology Chat incoming/outgoing webhook setup, Webhook token verification, Outbound text, Nostr key setup, NIP-04 encrypted DM receive/send, Profile import/publish, Tlon/Urbit ship URL/code setup, Urbit API auth/session, Rich text conversion, Tlon/Urbit ship URL/code setup, Urbit API auth/session, Rich text conversion
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/media-understanding-and-media-generation.md
+++ b/.agents/skills/claw-score/references/completeness/media-understanding-and-media-generation.md
@@ -1,42 +0,0 @@
-# Media understanding and media generation Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`media-understanding-and-media-generation` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Media understanding and media generation` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Media Intake and Access: Local and remote media references, MIME and type detection, Size caps and bounded reads, Safe remote fetch, Local root policy, Inbound media store, PDF/document extraction dispatch, QR and media helper classification
-- Channel Media Handling: Inbound attachment staging, Sandbox media rewrites, Reply media templating, Message-tool attachment delivery, Duplicate delivery suppression
-- Media Configuration: Media capability configuration
-- Text-to-Speech Delivery: TTS, Outbound Voice Audio Delivery
-- Media Understanding: Audio attachment selection, Batch STT provider and CLI fallback, Voice-note mention preflight, Transcript insertion and echo, Audio proxy and limit handling, Inbound image summarization, Active vision model bypass, Text-only model media offload, Vision provider fallback, Image and PDF input routing, Video Understanding, Direct Video Analysis
-- Media Generation: Image generation tool invocation, Provider and model selection, Reference image editing, Generated image task lifecycle, Generated image persistence and delivery, Music generation tool invocation, Provider and model selection, Lyrics, instrumental, duration, and format controls, Reference inputs where supported, Music task lifecycle and duplicate status, Generated audio persistence and delivery, Video generation tool invocation, Mode and provider capability selection, Reference image, video, and audio inputs, Provider option validation, Video task lifecycle and status, Generated video persistence and delivery
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/microsoft-teams.md
+++ b/.agents/skills/claw-score/references/completeness/microsoft-teams.md
@@ -1,41 +0,0 @@
-# Microsoft Teams Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`microsoft-teams` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Microsoft Teams` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Channel Setup and Operations: Teams CLI app creation, Bot registration and manifest upload, Credential configuration, Teams app install verification, Setup status, Probe and scope reporting, Teams app doctor, Webhook and health diagnostics, Operator repair paths, Text formatting and chunking, Adaptive and presentation cards, Progress streaming, Delivery receipts and errors, Queued and proactive replies, Webhook Runtime, SDK Lifecycle, Proactive Cloud Boundary, Setup status, Probe and scope reporting, Teams app doctor, Webhook and health diagnostics, Operator repair paths, Webhook Runtime, SDK Lifecycle, Proactive Cloud Boundary
-- Access and Identity: DM pairing, Stable sender identity, Allowlists and access groups, Invoke and command authorization, Teams-originated config writes, Bot Framework SSO invokes, Delegated token storage, Graph directory lookup, Member profile lookup, Bot Framework SSO invokes, Delegated token storage, Graph directory lookup, Member profile lookup
-- Conversation Routing and Delivery: Team and channel allowlists, Deterministic channel replies, Mention-gated group access, Session routing, Reply and thread context, Text formatting and chunking, Adaptive and presentation cards, Progress streaming, Delivery receipts and errors, Queued and proactive replies, Webhook Runtime, SDK Lifecycle, Proactive Cloud Boundary, Text formatting and chunking, Adaptive and presentation cards, Progress streaming, Delivery receipts and errors, Queued and proactive replies, Webhook Runtime, SDK Lifecycle, Proactive Cloud Boundary
-- Media and Rich Content: Inbound attachments, Graph-hosted media, File consent, SharePoint and OneDrive sharing, Media fetch safety
-- Native Controls and Approvals: Message action discovery, Polls and reactions, Read, edit, delete, and pin, Native approval cards, Feedback and group actions
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/multi-agent-orchestration.md
+++ b/.agents/skills/claw-score/references/completeness/multi-agent-orchestration.md
@@ -1,45 +0,0 @@
-# Multi-Agent Orchestration Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`multi-agent-orchestration` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw supports multiple coordinated agents
-as an operator-facing system. Score whether each category delivers setup,
-isolation, conversation routing, account routing, specialist lanes, delegate
-identity, status, recovery, and safe defaults.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can an operator configure and run the category workflow end to end?
-- Are the taxonomy features present as supported user paths rather than partial config fragments?
-- Are setup, normal operation, status or inspection, recovery, and removal paths represented where relevant?
-- Are channel, account, workspace, auth, task, and delegate variants covered where the category expects them?
-- Do known gaps leave major coordination or isolation branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when multiple agents can be created, isolated, routed, delegated, and inspected without implicit cross-agent leakage.
-- Lower Completeness when a category depends on undocumented config, lacks deterministic routing, or cannot explain who owns state, credentials, and outbound delivery.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Agent Setup: add agents, agent list/delete, identity files, non-interactive setup, and single-agent default.
-- Agent Isolation: workspace separation, state separation, auth separation, session separation, and tool profiles.
-- Conversation Routing: agent selection, route precedence, default fallback, peer overrides, and cross-channel examples.
-- Account Routing: multi-account setup, account selection, default accounts, account credentials, and delivery targets.
-- Specialist Lanes: lane contracts, background handoff, concurrency controls, priority controls, and coordinator handoff.
-- Delegate Identities: named delegates, authority model, delegate tiers, identity delegation, and organizational assistants.
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/native-windows-cli-and-gateway.md
+++ b/.agents/skills/claw-score/references/completeness/native-windows-cli-and-gateway.md
@@ -1,40 +0,0 @@
-# Native Windows CLI and Gateway Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`native-windows-cli-and-gateway` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Native Windows CLI and Gateway` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
-- Can the intended user or operator complete the category workflow end to end?
-- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
-- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
-- Are the important environment, provider, platform, channel, or security branches present for this surface?
-- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- Setup: PowerShell installer, Node and package-manager bootstrap, npm global install, Packaged CLI launcher, Windows command shims, openclaw onboard, Local Gateway config, Daemon install flags, Native-vs-WSL setup boundary
-- Gateway Management: openclaw gateway, Foreground runtime health/readiness, Windows-specific restart/signal, Unmanaged foreground mode, openclaw gateway install, Gateway launcher files, Scheduled Task runtime status, Startup-folder fallback, openclaw status, Windows service inspection, Post-install diagnostics
-- Networking: Native Windows host binding, netsh interface portproxy, Gateway status and probe output, Loopback, LAN, and WSL boundary
-- Updates: openclaw update on native Windows package, Managed Gateway stop/restart, Detached update handoff, Windows package locks
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/native-windows-companion-app.md
+++ b/.agents/skills/claw-score/references/completeness/native-windows-companion-app.md
@@ -1,41 +0,0 @@
-# Native Windows companion app Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`native-windows-companion-app` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Native Windows companion app` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Installation and Updates: Official app download, MSI/MSIX/App Installer/winget-style packaging, Windows architecture handling for x64, App release channel
- Gateway Connection: App-managed local Gateway attach/start, Remote Gateway connection modes, Device/node pairing
- Chat Sessions: Native Windows chat window, Gateway chat transport
- Status and Repair: App health states, App-specific repair, Windows system tray app, Status indicators, App-specific notification permission
- Desktop Tools and Permissions: Windows node identity, Host command execution, Desktop command policy, App approval prompts, Screen and media capture, Canvas host behavior, Windows shell integrations, App secrets, Windows ACL, Command approval
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/nix-install-path.md
+++ b/.agents/skills/claw-score/references/completeness/nix-install-path.md
@@ -1,41 +0,0 @@
-# Nix install path Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`nix-install-path` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Nix install path` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Install Handoff: Nix install overview, nix-openclaw source-of-truth, Install discoverability, Verification handoff
- Plugin Lifecycle: Lifecycle command refusal, Declarative plugin selection, Nix-store plugin loading, Hardlink safety
- Activation and App UX: Environment activation, macOS defaults activation, Runtime Nix-mode detection, Stable Nix defaults, Managed-by-Nix banner, Read-only config controls, Onboarding skip
- Config and State: Immutable config guard, Config writer refusal, Agent-first Nix edits, Explicit config path, Writable state directory, Immutable-store config support, State integrity checks
- Service Runtime and Guards: Nix profile PATH discovery, Profile precedence, Service PATH fallback, Trusted binary boundaries, Setup write refusal, Doctor repair refusal, Update handoff, Service lifecycle handoff
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/openai-codex-provider-path.md
+++ b/.agents/skills/claw-score/references/completeness/openai-codex-provider-path.md
@@ -1,41 +0,0 @@
-# OpenAI / Codex provider path Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`openai-codex-provider-path` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `OpenAI / Codex provider path` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Model and Auth: Canonical OpenAI Model Routing, Catalog, Codex OAuth Profiles, Subscription Usage, Doctor Diagnostics, Operator Repair
- Responses and Tool Compatibility: Codex Responses Transport, Payload Compatibility, Tool Context, Capability Compatibility
- Native Codex Harness: Native Codex App-server Harness, Thread Lifecycle
- Image and Multimodal Input: Image Generation Editing, Multimodal Input
- Voice and Realtime Audio: Realtime Voice Transcription, Speech
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/openclaw-app-sdk.md
+++ b/.agents/skills/claw-score/references/completeness/openclaw-app-sdk.md
@@ -1,45 +0,0 @@
-# OpenClaw App SDK Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`openclaw-app-sdk` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes a supported external App SDK
-for applications built outside the Gateway process. Score whether each category
-delivers an app-developer workflow from connection through agent runs, sessions,
-events, approvals, resources, compatibility, and operational error handling.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can an external app developer complete the category workflow using public SDK APIs?
- Are the taxonomy features represented by stable client contracts rather than protocol-only fragments?
- Are setup, authentication, streaming, result handling, error behavior, and compatibility expectations documented?
- Are browser, Node, React, testing, and custom transport variants covered where the category expects them?
- Do known gaps leave major external-app capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the SDK hides low-level Gateway protocol details behind typed, documented, and reusable client APIs.
- Lower Completeness when a category requires users to manually construct raw Gateway frames or rely on internal package shapes.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Client API: SDK entrypoints, namespace layout, package split, and app/plugin boundary.
- Gateway Access: Gateway connect, URL and token config, auto gateway, custom transport, and scopes/redaction.
- Agent Conversations: agent handles, agent runs, run results, session creation, session send, and session controls.
- Events and Approvals: event stream, event envelope, replay cursors, approval callbacks, and questions.
- Resource Helpers: models, ToolSpace, artifacts, tasks, and environments.
- Compatibility: generated client, ergonomic wrappers, unsupported calls, schema alignment, and public package contract.
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/openrouter-provider-path.md
+++ b/.agents/skills/claw-score/references/completeness/openrouter-provider-path.md
@@ -1,40 +0,0 @@
-# OpenRouter provider path Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`openrouter-provider-path` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `OpenRouter provider path` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Provider Setup and Auth: First-run setup, Default model selection, Provider plugin registration, Model-ref examples, OPENROUTER_API_KEY, Auth profiles and auth order, Status/probe and removal, Provider-entry SecretRef/API-key resolution, Gateway env inheritance, Static catalog rows, Dynamic /models discovery, openrouter/auto and nested refs, Free-model scan/probe, Model list/picker cache
- Chat Runtime and Normalization: Chat completions route, Provider routing params, Per-model route overrides, Reasoning payload policy, Anthropic/Gemini/DeepSeek variants, Streamed content parsing, reasoning_details visible output, Tool-call delta preservation, Family-specific replay policy, Response-model and usage normalization, Attribution headers, Response-cache headers/TTL/clear, Anthropic cache-control markers, Cache usage mapping, Custom proxy exclusions
- Provider Recovery and Diagnostics: Timeout/retry classification, Auth/billing/key-limit classification, Context overflow, Model fallback notices, Guarded fetch/pricing warnings
- Media Generation and Speech: image_generate OpenRouter route, video_generate async jobs/polling/download, music_generate audio route, Text-to-speech, Speech-to-text transcription, Inbound media understanding, Generated artifact delivery
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/plugin-sdk-and-bundled-plugin-architecture.md
+++ b/.agents/skills/claw-score/references/completeness/plugin-sdk-and-bundled-plugin-architecture.md
@@ -1,49 +0,0 @@
-# Plugin Surface Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`plugin-sdk-and-bundled-plugin-architecture` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully a plugin author or operator can complete the
-intended plugin lifecycle for the category: authoring, packaging, installing,
-running, approving, publishing, or testing plugins. Score whether OpenClaw
-supports the full capability set a plugin builder or operator expects, not just
-the underlying SDK or runtime primitives.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended plugin task be completed end to end by an author or
-  operator?
- Are the important plugin variants present for this category, such as channel,
-  provider, tool, bundled, local, npm, or ClawHub flows?
- Are the main lifecycle stages present where relevant: create, configure,
-  validate, run, update, and remove or roll back?
- Are compatibility, approval, or safety branches present when the category
-  implies them?
- Are important author/operator-visible gaps still forcing workarounds or
-  unsupported paths?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full plugin journey,
-  not only one import path, one packaging mode, or one runtime path.
- Lower Completeness when a category works only for bundled plugins or only for
-  selected plugin families while the category implies a broader capability.
- Publishing and testing categories should include the expected lifecycle
-  support, not just raw commands or fixtures.
- Do not use missing tests to lower Completeness; that is Coverage.
- Do not use fragility or regressions to lower Completeness; that is Quality.
-
-## Suggested Bands
-
- `Lovable` (95-100): the category supports the full intended plugin lifecycle
-  across the expected plugin variants.
- `Stable` (80-95): most author/operator workflows exist, with only bounded
-  missing branches.
- `Beta` (70-80): the main workflows exist, but notable lifecycle branches or
-  plugin variants are still missing.
- `Alpha` (50-70): only a partial plugin capability set is available.
- `Experimental` (0-50): the category exposes early or fragmentary support only.
--- a/.agents/skills/claw-score/references/completeness/raspberry-pi-small-linux-devices.md
+++ b/.agents/skills/claw-score/references/completeness/raspberry-pi-small-linux-devices.md
@@ -1,40 +0,0 @@
-# Raspberry Pi / small Linux devices Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`raspberry-pi-small-linux-devices` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Raspberry Pi / small Linux devices` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Setup and Compatibility: Hardware and 64-bit OS requirements, Node runtime setup, OpenClaw install and onboarding, First-run verification, Supported Pi model selection, 64-bit ARM boundary, Unsupported device guidance, Slow-device caveats, npm/pnpm/Bun install modes, Installer architecture detection, Optional ARM binary checks, Fallback/build guidance
- Remote Access and Auth: Headless API-key auth, Gateway shared-secret auth, Device pairing approvals, SecretRef handling, Token drift recovery, SSH tunnel dashboard access, Tailscale Serve/Funnel, Loopback/non-loopback exposure controls, Authenticated Control UI access
- Gateway Runtime: Always-on Gateway process, Cloud model configuration, Channel startup, Gateway health/status, User service install, linger/boot persistence, Service drop-ins, Restart tuning, Status/log inspection, Backup/restore
- Performance and Diagnostics: Swap and low-RAM tuning, USB SSD guidance, Compile cache/no-respawn settings, OOM/performance troubleshooting, Diagnostics bundles
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/security-auth-pairing-and-secrets.md
+++ b/.agents/skills/claw-score/references/completeness/security-auth-pairing-and-secrets.md
@@ -1,42 +0,0 @@
-# Security, auth, pairing, and secrets Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`security-auth-pairing-and-secrets` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Security, auth, pairing, and secrets` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Approval Policy and Tool Safeguards: Approval Policy, Dangerous Tool Safeguards
- Gateway Auth and Remote Access: Shared Gateway token/password auth, Gateway auth mode, Trusted-proxy identity, Tailscale Serve/Funnel, Bind and origin restrictions, WebSocket handshake auth, Operator-facing docs, Browser Control UI, Remote Client Trust
- Channel Access Control: Channel Identity, Allowlists, Sender Pairing
- Device and Node Pairing: Setup codes, Device identity creation, Device-token issuance, Device pairing approvals for operator, Operator scopes that gate pairing, Local Control UI, Auth migration, Operator-facing docs, Node Pairing, Capability Trust, Remote Exec Approvals
- Plugin Trust: Plugin Installation Trust, Security Boundaries
- Credential and Secret Hygiene: Provider Auth Profiles, API Key Health, Secrets Storage, Redaction, Configuration Hygiene
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/session-memory-and-context-engine.md
+++ b/.agents/skills/claw-score/references/completeness/session-memory-and-context-engine.md
@@ -1,46 +0,0 @@
-# Session, memory, and context engine Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`session-memory-and-context-engine` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Session, memory, and context engine` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- CLI Session and Transcript Management: CLI Session, Transcript Management
- Compaction, Pruning, and Token Pressure: Compaction, Pruning, Token Pressure
- Context Engine and Runtime Assembly: Context Engine, Runtime Assembly
- Cross-client History and Session Parity: Cross-client History, Session Parity
- Diagnostics, Maintenance, and Recovery: Diagnostics, Maintenance, Recovery
- Instruction Profile and Context Visibility: Instruction Profile, Context Visibility
- Memory Backend Storage and Embedding Search: Memory Backend Storage, Embedding Search
- Memory Files, Tools, and Active Memory: Memory Files, Tools, Active Memory
- Session Routing and Conversation Binding: Session Routing, Conversation Binding
- Transcript Persistence and Durability: Transcript Persistence, Durability
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/signal.md
+++ b/.agents/skills/claw-score/references/completeness/signal.md
@@ -1,41 +0,0 @@
-# Signal Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`signal` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Signal` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Setup and Account Health: QR link setup, SMS registration, Installer and binary setup, Container account provisioning, Status probes, Setup diagnostics, Account safety guardrails
- Conversation Access and Routing: DM pairing, DM allowlists, Sender identity normalization, Group allowlists, Mention gates, Pending group history
- Message Delivery and Actions: Text delivery targets, Media delivery and limits, Typing and read receipts, Styled/chunked output, Reaction action discovery, Add/remove reactions, Group reaction targeting
- Native Approvals: Native approval routing, Reaction approval responses, Approver targeting
- Transport: Native daemon transport, Container transport, API mode selection, Receive reconnect/readiness
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/slack.md
+++ b/.agents/skills/claw-score/references/completeness/slack.md
@@ -1,41 +0,0 @@
-# Slack Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`slack` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Slack` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Channel Setup and Operations: App Install, Slack app credentials, Manifest, Scopes, Channel status diagnostics, Slack account status, Operator Repair, Socket, HTTP transport, Runtime Lifecycle
- Access and Identity: Channel allowlists, Thread routing, Session Isolation, DM Pairing, Sender Authorization
- Conversation Routing and Delivery: Channel allowlists, Thread routing, Session Isolation, DM Pairing, Sender Authorization, Outbound Delivery, Streaming, Reactions, Media, Attachments, Files, Vision
- Media and Rich Content: Outbound Delivery, Streaming, Reactions, Media, Attachments, Files, Vision
- Native Controls and Approvals: Slash Commands, Native Command Routing, Interactive Replies, App Home, Assistant Events, Native Approvals, Actions, Security-sensitive Ops
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/telegram.md
+++ b/.agents/skills/claw-score/references/completeness/telegram.md
@@ -1,41 +0,0 @@
-# Telegram Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`telegram` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Telegram` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Channel Setup and Operations: BotFather token creation, TELEGRAM_BOT_TOKEN, Setup wizard credential capture, Startup getMe, Doctor/status surfacing, Named account configuration, CLI/message-tool targets, Directory adapters, Channel status, Account-scoped outbound, Long polling runner startup, Webhook listener startup, Reconnect, Restart, Directory adapters and configured peers/groups for
- Access and Identity: dmPolicy modes, Pairing-code approval, Numeric Telegram user ID normalization with telegram, allowFrom, Unauthorized DM, Group allowlists, Supergroup negative chat IDs, Forum topic session keys, ACP topic routing, Session key construction
- Conversation Routing and Delivery: dmPolicy modes, Pairing-code approval, Numeric Telegram user ID normalization with telegram, allowFrom, Unauthorized DM, Group allowlists, Supergroup negative chat IDs, Forum topic session keys, ACP topic routing, Session key construction, Inbound media download, Voice notes, Location, Poll sending, Reactions, Text, Preview streaming, Reply threading tags, Durable outbound message recording
- Media and Rich Content: Inbound media download, Voice notes, Location, Poll sending, Reactions, Text, Preview streaming, Reply threading tags, Durable outbound message recording, Location and venue extraction into channel context
- Native Controls and Approvals: Inline keyboard rendering, Exec approvals in DMs, Message actions, Action capability discovery, Native `setMyCommands` startup sync, Command name/description normalization, Built-in commands such as `/help`, Command authorization in DMs, Model buttons and command UI helpers
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/telemetry-diagnostics-and-observability.md
+++ b/.agents/skills/claw-score/references/completeness/telemetry-diagnostics-and-observability.md
@@ -1,41 +0,0 @@
-# Observability Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`telemetry-diagnostics-and-observability` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Observability` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Health and Repair: Background health-monitor loop, Per-account enable/disable settings, Startup grace, Restart logging, openclaw doctor, Structured health checks, Core doctor checks, Plugin SDK doctor/health contracts, openclaw status, openclaw health, Gateway RPC health, Cached health snapshots
- Logging: Rolling Gateway JSONL file logs, openclaw logs, Gateway RPC logs.tail, Redaction patterns and sinks, Trace correlation fields
- Diagnostic Collection: openclaw gateway diagnostics export, openclaw gateway stability --bundle, Chat /diagnostics, Support zip composition, Bounded in-process stability recorder, openclaw gateway stability, Memory pressure events, Critical memory pressure snapshot option
- Telemetry Export: Diagnostic event types, Async dispatch, W3C trace context creation, Plugin SDK diagnostic runtime exports, Model-call diagnostic events, diagnostics-otel plugin install, OTLP/HTTP traces, Trusted trace context, Model and runtime telemetry, diagnostics-prometheus plugin install, Gateway-authenticated GET /api/diagnostics/prometheus, Prometheus text exposition, Trusted diagnostic event subscription
- Session Diagnostics: session.state, Diagnostic session activity snapshots, Model usage, Export of session signals to stability
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/tui-and-terminal-ux.md
+++ b/.agents/skills/claw-score/references/completeness/tui-and-terminal-ux.md
@@ -1,41 +0,0 @@
-# TUI Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`tui-and-terminal-ux` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `TUI` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Runtime Modes: Gateway TUI launch, Local chat launch, Terminal alias launch, Initial message launch, Launch option validation, Gateway connection, Gateway authentication, History load on attach, Reconnect visibility, Gateway command RPCs, Embedded local chat, Local auth flow, Config repair loop, Gateway-free recovery
- Input and Commands: Message composition, Input history, Keyboard shortcuts, Paste and busy-submit handling, IME and AltGr handling, Slash Commands, Pickers, Settings
- Session Management: Session Lifecycle, History, Resume
- Local Shell Execution: Bang-command routing, Approval prompt, Command output display, Execution environment marker
- Rendering and Output Safety: Streaming Message Rendering, Tool Cards, Terminal Rendering Primitives, Output Safety
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/voice-and-realtime-talk.md
+++ b/.agents/skills/claw-score/references/completeness/voice-and-realtime-talk.md
@@ -1,42 +0,0 @@
-# Voice and realtime talk Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`voice-and-realtime-talk` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Voice and realtime talk` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Talk Providers: OpenAI Realtime voice backend bridge, Google Gemini Live backend bridge, Realtime voice provider SDK contracts, Provider diagnostics, Talk catalog, Talk provider config, Shared native config parsing
- Realtime Talk Sessions: Agent consult handoff, Active Talk agent-run status, Talkback runtime behavior, Forced consult scheduling, Browser Talk start/stop UI, Browser WebRTC sessions, Browser relay mode, Browser tool-call forwarding, Realtime session controls, Gateway relay sessions, Audio-frame limits
- Speech and Transcription: Voice directives, Talk speech playback, Transcription relay sessions, Realtime transcription providers, Native directive parsing
- Native App Talk: macOS native Talk mode, iOS Talk mode, Android Talk mode, Shared Talk config
- Voice Wake and Routing: Wake-word settings, Wake routing, macOS Voice Wake runtime, Mobile wake preferences
- Talk Observability: Talk event logging, Session-log health, Live smoke output, Prometheus diagnostic counters, Operator visibility into setup
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/voice-call-channel.md
+++ b/.agents/skills/claw-score/references/completeness/voice-call-channel.md
@@ -1,41 +0,0 @@
-# Voice Call channel Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`voice-call-channel` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Voice Call channel` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Channel Setup and Operations: Voice Call Channel
- Access and Identity: Voice Call Channel
- Conversation Routing and Delivery: Voice Call Channel
- Media and Rich Content: Voice Call Channel
- Realtime Voice and Calls: Voice Call Channel
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/watchos-companion-surfaces.md
+++ b/.agents/skills/claw-score/references/completeness/watchos-companion-surfaces.md
@@ -1,41 +0,0 @@
-# watchOS companion surfaces Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`watchos-companion-surfaces` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `watchOS companion surfaces` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Delivery and Recovery: APNs relay/direct registration as it affects, Silent push, Pending approval recovery IDs, Gateway-side iOS exec approval, iPhone-side WatchConnectivity transport, Watch-side receiver activation, Delivery fallback among reachable messages
- Exec Approvals: Watch exec approval prompt, Watch approval list/detail UI, iPhone-side prompt caching
- Distribution and Support: Watch app, Signing/profile variables, Public/support status, Changelog, Release metadata, Historical bug/regression themes relevant to scoring
- Notifications and Replies: watch.status, Payload normalization, Mirrored iOS notification fallback when watch, Watch action buttons from generic prompt, Watch-to-iPhone reply payloads, iPhone-side dedupe, Mirrored iOS notification action
- Watch App UI: Watch app entry point, Generic inbox, Persistent watch inbox state
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/web-search-tools.md
+++ b/.agents/skills/claw-score/references/completeness/web-search-tools.md
@@ -1,40 +0,0 @@
-# Web search tools Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`web-search-tools` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Web search tools` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Search Providers: API-backed providers, Keyless and self-hosted providers, Provider comparison and auto-detection, Provider-specific filters and extraction, Result normalization, OpenAI native web_search, Codex native web_search, Gemini grounding, Grok web grounding, Kimi web search, Provider-native citations, Model and filter routing, webSearchProviders, registerWebSearchProvider, webFetchProviders, registerWebFetchProvider, public-artifact loading, runtime resolution, contract tests
- Setup and Diagnostics: Provider credentials, Default provider selection, Credential repair, Status checks, Quota errors, Cache controls, Provider diagnostics, Retry and fallback, Operator repair
- Network Safety: Network Safety, SSRF, Redirects, Untrusted Content
- Tool Availability and Fetch: web_search exposure, web_fetch exposure, x_search exposure, group:web policy, disabled-state diagnostics, provider/model gating, URL fetch, HTML extraction, PDF/text extraction, Safe truncation, Content citation handoff
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/whatsapp.md
+++ b/.agents/skills/claw-score/references/completeness/whatsapp.md
@@ -1,41 +0,0 @@
-# WhatsApp Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`whatsapp` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `WhatsApp` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
- Channel Setup and Operations: Official @openclaw/whatsapp plugin metadata, openclaw plugin install whatsapp, Channel config schema, Baileys socket lifecycle, Operator troubleshooting for reconnect loops
- Access and Identity: QR login, Baileys multi-file auth persistence, DM pairing challenge, Multi-account/default-account resolution, Direct-message `dmPolicy`, Sender identity extraction, Privacy controls for plugin hooks
- Conversation Routing and Delivery: Group allowlists, Group session keys, Outbound text sends, Provider-accepted receipts and durable delivery identifiers
- Media and Rich Content: Inbound media download, Outbound image
- Native Controls and Approvals: Native exec, Approver target resolution
-
-## Suggested Bands
-
- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/claw-score/references/completeness/windows-via-wsl2.md
+++ b/.agents/skills/claw-score/references/completeness/windows-via-wsl2.md
@@ -1,41 +0,0 @@
-# Windows via WSL2 Completeness
-
-Use this rubric when assigning category Completeness scores for the
-`windows-via-wsl2` surface.
-
-## What Completeness Means Here
-
-Completeness measures how fully OpenClaw exposes the intended `Windows via WSL2` capability set to the user, operator, author, or maintainer persona for this surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform/provider/channel variants where they apply.
-
-## Scoring Questions
-
-For each category, ask:
-
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation, status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
-
-## Surface-Specific Guidance
-
-- Favor higher Completeness when the category supports the full operator-visible workflow described by taxonomy and the category note evidence.
-- Lower Completeness when only the happy path exists, when important variants are undocumented or unimplemented, or when recovery/status paths are missing.
-- Do not lower Completeness because tests are thin; that is Coverage.
-- Do not lower Completeness because implementation quality is fragile; that is Quality.
-
-## Category Scope
-
-- WSL Setup and Updates: WSL2 + Ubuntu installation, Node runtime, Linux install flow inside WSL2, WSL2 runtime boundary, WSL2 network-family requirements, Source install and build inside WSL2, openclaw update, npm/pnpm/git package-root, Managed systemd Gateway restart, Service metadata refresh, Package-manager caveats
-- Gateway Service Lifecycle: Onboarded systemd install, Gateway service install, systemd user unit rendering, WSL-aware systemd unavailable hints, Doctor service repair, WSL user-service linger, Systemd availability after Windows boot, Windows startup task for WSL, Verification before Windows sign-in, Clear expectations around PC power
-- Gateway Access and Exposure: Gateway token/password auth, Provider credentials, Gateway auth SecretRefs, Remote URL credential precedence, WSL virtual network, Windows portproxy setup, Windows Firewall rules, Reachable Gateway URLs, Loopback and LAN exposure, WSL2 IPv4 networking, Tailscale remote access
-- Diagnostics and Repair: openclaw doctor, openclaw status, openclaw logs, SecretRef, WSL/systemd unavailable hints, Operator repair guidance after WSL2 service
-- Browser and Control UI: WSL2 Gateway with Windows browser, Windows Control UI URL, Raw remote CDP to Windows Chrome, Host-local Chrome MCP, Browser profile cdpUrl, Layered diagnostics
-
-## Suggested Bands
-
-- `Lovable` (95-100): complete across expected workflows, variants, and recovery branches, with only minor polish gaps.
-- `Stable` (80-95): the expected workflow set is broadly present, with only bounded missing branches.
-- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery paths are still absent.
-- `Alpha` (50-70): only a partial capability set is present; users can complete some core tasks but not the full expected workflow.
-- `Experimental` (0-50): the category exposes only fragments of the intended capability.
--- a/.agents/skills/discord-user-post/SKILL.md
+++ b/.agents/skills/discord-user-post/SKILL.md
@@ -1,51 +0,0 @@
----
-name: discord-user-post
-description: Post an approved message as the logged-in Discord user through the Discord desktop app. Use for release announcements or other direct user-authored Discord posts; not for OpenClaw channel sends, bots, webhooks, relays, agent sessions, or archive search.
----
-
-# Discord User Post
-
-Use `$computer-use` to operate `/Applications/Discord.app` in the user's
-existing logged-in session. This workflow represents the user directly.
-
-## Prepare
-
-1. Draft the complete final message outside Discord.
-2. Confirm the intended server and channel with the user when either is
-   ambiguous.
-3. Open Discord and navigate to the exact destination without entering the
-   message.
-4. Verify the visible server name, channel header, and logged-in account.
-
-Do not infer the target from unrelated Discord content. Stop if Discord is not
-logged in, the account is wrong, or the exact destination cannot be verified.
-
-## Confirm and Post
-
-Posting is representational communication. Follow the `$computer-use`
-confirmation policy even when the user previously asked for an announcement:
-
-1. Show the user the exact final body and verified destination.
-2. Request action-time confirmation before typing into Discord.
-3. After confirmation, enter the approved body unchanged.
-4. Visually inspect the composed message and destination again.
-5. Send once.
-
-If the body or destination changes after confirmation, request confirmation
-again before sending.
-
-## Verify
-
-- Confirm the message appears once, from the user's account, in the intended
-  channel.
-- Report the server, channel, and visible send result.
-- Do not edit, delete, react, or send a follow-up without the corresponding
-  user instruction and confirmation.
-
-## Guardrails
-
-- Never use `openclaw message`, an OpenClaw agent, a Discord bot, webhook, relay,
-  or token for this workflow.
-- Never expose private Discord content or account details in public output.
-- Never send a draft, partial message, duplicate, or unreviewed attachment.
-- For Discord archive/history/search, use `$discrawl` instead.
--- a/.agents/skills/discord-user-post/agents/openai.yaml
+++ b/.agents/skills/discord-user-post/agents/openai.yaml
@@ -1,4 +0,0 @@
-interface:
-  display_name: "Discord User Post"
-  short_description: "Post approved messages through the logged-in Discord app"
-  default_prompt: "Post this approved message as me through the logged-in Discord desktop app."
--- a/.agents/skills/openclaw-changelog-update/SKILL.md
+++ b/.agents/skills/openclaw-changelog-update/SKILL.md
@@ -91,32 +91,6 @@ attribution.
   - if any compatibility `removeAfter` is on/before release date, resolve it
     or explicitly record the blocker before shipping
 10. Validate and ship:
-   - generate and verify the complete contribution ledger before committing:
-     ```bash
-     node .agents/skills/openclaw-changelog-update/scripts/verify-release-notes.mjs \
-       --base <base-tag> \
-       --target <target-ref> \
-       --version <YYYY.M.PATCH> \
-       --write-ledger
-     ```
-   - the command fails when any `#NNN` reference in release history or the
-     rendered release section is absent from the ledger, when reverted work is
-     presented as shipped, or when an eligible PR author, issue reporter, or
-     known co-author is missing from that entry's `Thanks @...` credit
-   - after the GitHub release or prerelease is published, verify every matching
-     release page against the same source section:
-     ```bash
-     node .agents/skills/openclaw-changelog-update/scripts/verify-release-notes.mjs \
-       --base <base-tag> \
-       --target <target-ref> \
-       --version <YYYY.M.PATCH> \
-       --release-tag v<YYYY.M.PATCH> \
-       --check-github
-     ```
-   - add one `--release-tag` for every beta and stable page in the train; a
-     `### Release verification` tail is permitted, but any other body drift
-     fails the check; the GitHub body must begin with the complete
-     `## YYYY.M.PATCH` changelog section, including its heading
   - `git diff --check`
   - for docs/changelog-only changes, no broad tests are required
   - commit with `scripts/committer "docs(changelog): refresh YYYY.M.PATCH notes" CHANGELOG.md`
--- a/.agents/skills/openclaw-changelog-update/scripts/verify-release-notes.mjs
+++ b/.agents/skills/openclaw-changelog-update/scripts/verify-release-notes.mjs
@@ -1,443 +0,0 @@
-#!/usr/bin/env node
-
-import { execFileSync } from "node:child_process";
-import { readFileSync, writeFileSync } from "node:fs";
-
-const repo = "openclaw/openclaw";
-const excludedHandles = new Set(["openclaw", "clawsweeper", "codex", "steipete"]);
-
-function fail(message) {
-  throw new Error(message);
-}
-
-function parseArgs(argv) {
-  const options = {
-    releaseTags: [],
-    checkGithub: false,
-    json: false,
-    writeLedger: false,
-  };
-
-  for (let index = 0; index < argv.length; index += 1) {
-    const arg = argv[index];
-    if (arg === "--check-github" || arg === "--json" || arg === "--write-ledger") {
-      options[
-        arg === "--check-github"
-          ? "checkGithub"
-          : arg === "--write-ledger"
-            ? "writeLedger"
-            : "json"
-      ] = true;
-      continue;
-    }
-    if (arg === "--base" || arg === "--target" || arg === "--version" || arg === "--release-tag") {
-      const value = argv[index + 1];
-      if (!value || value.startsWith("--")) {
-        fail(`missing value for ${arg}`);
-      }
-      if (arg === "--release-tag") {
-        options.releaseTags.push(value);
-      } else {
-        options[arg.slice(2)] = value;
-      }
-      index += 1;
-      continue;
-    }
-    fail(`unknown argument: ${arg}`);
-  }
-
-  for (const name of ["base", "target", "version"]) {
-    if (!options[name]) {
-      fail(`--${name} is required`);
-    }
-  }
-  if (options.checkGithub && options.releaseTags.length === 0) {
-    fail("--check-github requires at least one --release-tag");
-  }
-  return options;
-}
-
-function run(command, args) {
-  return execFileSync(command, args, {
-    encoding: "utf8",
-    env: { ...process.env, NO_COLOR: "1" },
-    stdio: ["ignore", "pipe", "pipe"],
-  });
-}
-
-function git(args) {
-  return run("git", args).trimEnd();
-}
-
-function githubApi(args) {
-  try {
-    return JSON.parse(run("ghx", ["api", ...args]).replace(/\u001B\[[0-?]*[ -/]*[@-~]/g, ""));
-  } catch (error) {
-    if (typeof error.stdout === "string" && error.stdout.trim() !== "") {
-      return JSON.parse(error.stdout.replace(/\u001B\[[0-?]*[ -/]*[@-~]/g, ""));
-    }
-    throw error;
-  }
-}
-
-function escapeRegExp(value) {
-  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
-}
-
-function isEligibleHandle(handle) {
-  return Boolean(handle) && !handle.endsWith("[bot]") && !excludedHandles.has(handle.toLowerCase());
-}
-
-function sectionFor(changelog, version) {
-  const heading = new RegExp(`^## ${escapeRegExp(version)}\\r?$`, "m").exec(changelog);
-  if (!heading || heading.index === undefined) {
-    fail(`CHANGELOG.md does not contain ## ${version}`);
-  }
-  const start = heading.index;
-  const bodyStart = changelog.indexOf("\n", start) + 1;
-  const next = /^## /gm;
-  next.lastIndex = bodyStart;
-  const nextHeading = next.exec(changelog);
-  const end = nextHeading?.index ?? changelog.length;
-  return {
-    start,
-    end,
-    source: changelog.slice(start, end).trimEnd(),
-    body: changelog.slice(bodyStart, end).trim(),
-  };
-}
-
-function referencesIn(text) {
-  return [...text.matchAll(/#(\d+)/g)].map((match) => Number(match[1]));
-}
-
-function appendReferences(references, additions) {
-  const seen = new Set(references);
-  for (const number of additions) {
-    if (!seen.has(number)) {
-      references.push(number);
-      seen.add(number);
-    }
-  }
-}
-
-function sourceCommits(base, target) {
-  const mergeBase = git(["merge-base", base, target]);
-  const output = git([
-    "log",
-    "--first-parent",
-    "--reverse",
-    "--format=%H%x1f%s%x1f%B%x1e",
-    `${mergeBase}..${target}`,
-  ]);
-  const commits = new Map();
-  const revertsByTarget = new Map();
-  for (const record of output.split("\x1e")) {
-    if (!record) {
-      continue;
-    }
-    const [rawHash, subject, ...bodyParts] = record.split("\x1f");
-    const hash = rawHash.trim();
-    const body = bodyParts.join("\x1f");
-    const revertedHash = body.match(/This reverts commit ([0-9a-f]{7,40})\./i)?.[1];
-    const isRevert = subject.startsWith('Revert "') || Boolean(revertedHash);
-    commits.set(hash, { body, hash, isRevert, revertedHash, subject });
-  }
-  for (const commit of commits.values()) {
-    if (!commit.revertedHash) {
-      continue;
-    }
-    const targetHash = [...commits.keys()].find((candidate) => candidate.startsWith(commit.revertedHash));
-    if (targetHash) {
-      const reverts = revertsByTarget.get(targetHash) ?? [];
-      reverts.push(commit.hash);
-      revertsByTarget.set(targetHash, reverts);
-    }
-  }
-  const active = new Map();
-  function isActive(hash) {
-    if (active.has(hash)) {
-      return active.get(hash);
-    }
-    const cancellingReverts = revertsByTarget.get(hash) ?? [];
-    const value = !cancellingReverts.some((revertHash) => isActive(revertHash));
-    active.set(hash, value);
-    return value;
-  }
-
-  const references = [];
-  const revertedReferences = new Set();
-  const coauthorsByReference = new Map();
-  for (const commit of commits.values()) {
-    if (commit.isRevert) {
-      continue;
-    }
-    const uniqueReferences = [...new Set(referencesIn(`${commit.subject}\n${commit.body}`))];
-    if (!isActive(commit.hash)) {
-      for (const number of uniqueReferences) {
-        revertedReferences.add(number);
-      }
-      continue;
-    }
-    appendReferences(references, uniqueReferences);
-    const coauthors = [...commit.body.matchAll(/<(?:(?:\d+)\+)?([^@<>\s]+)@users\.noreply\.github\.com>/gi)]
-      .map((match) => match[1])
-      .filter(isEligibleHandle);
-    for (const number of uniqueReferences) {
-      if (coauthors.length > 0) {
-        const handles = coauthorsByReference.get(number) ?? new Set();
-        for (const handle of coauthors) {
-          handles.add(handle);
-        }
-        coauthorsByReference.set(number, handles);
-      }
-    }
-  }
-
-  return { mergeBase, references, revertedReferences, coauthorsByReference };
-}
-
-function graphql(query) {
-  return githubApi(["graphql", "-f", `query=${query}`]).data;
-}
-
-function resolveReferences(numbers) {
-  const nodes = new Map();
-  for (let index = 0; index < numbers.length; index += 40) {
-    const chunk = numbers.slice(index, index + 40);
-    const fields = chunk
-      .map(
-        (number) => `n${number}: repository(owner: "openclaw", name: "openclaw") {
-          issueOrPullRequest(number: ${number}) {
-            __typename
-            ... on Issue { number title author { __typename login } }
-            ... on PullRequest { number title author { __typename login } }
-          }
-        }`,
-      )
-      .join("\n");
-    const data = graphql(`query { ${fields} }`);
-    for (const number of chunk) {
-      const node = data[`n${number}`]?.issueOrPullRequest;
-      if (node) {
-        nodes.set(number, node);
-      }
-    }
-  }
-  return nodes;
-}
-
-function resolveCoauthors(handles) {
-  const resolved = new Map();
-  const uniqueHandles = [...new Set(handles)];
-  for (let index = 0; index < uniqueHandles.length; index += 80) {
-    const chunk = uniqueHandles.slice(index, index + 80);
-    const fields = chunk
-      .map(
-        (handle, offset) =>
-          `u${index + offset}: user(login: ${JSON.stringify(handle)}) { __typename login }`,
-      )
-      .join("\n");
-    const data = graphql(`query { ${fields} }`);
-    for (let offset = 0; offset < chunk.length; offset += 1) {
-      const user = data[`u${index + offset}`];
-      if (user?.__typename === "User" && isEligibleHandle(user.login)) {
-        resolved.set(chunk[offset].toLowerCase(), user.login);
-      }
-    }
-  }
-  return resolved;
-}
-
-function thanksFor(node, coauthorHandles) {
-  const handles = [];
-  if (node.author?.__typename === "User" && isEligibleHandle(node.author.login)) {
-    handles.push(node.author.login);
-  }
-  for (const handle of coauthorHandles) {
-    if (!handles.some((candidate) => candidate.toLowerCase() === handle.toLowerCase())) {
-      handles.push(handle);
-    }
-  }
-  return handles;
-}
-
-function ledgerFor(base, target, references, nodes, coauthorsByReference, resolvedCoauthors) {
-  const missing = references.filter((number) => !nodes.has(number));
-  if (missing.length > 0) {
-    fail(`GitHub could not resolve source references: ${missing.map((number) => `#${number}`).join(", ")}`);
-  }
-
-  const entries = references.map((number) => {
-    const node = nodes.get(number);
-    const rawCoauthors = coauthorsByReference.get(number) ?? new Set();
-    const coauthors = [...rawCoauthors]
-      .map((handle) => resolvedCoauthors.get(handle.toLowerCase()))
-      .filter(Boolean);
-    return {
-      number,
-      title: node.title.replace(/#(\d+)/g, "issue $1").replace(/\s+/g, " ").trim(),
-      type: node.__typename,
-      thanks: thanksFor(node, coauthors),
-    };
-  });
-
-  const pullRequests = entries.filter((entry) => entry.type === "PullRequest");
-  const issues = entries.filter((entry) => entry.type === "Issue");
-  const renderEntry = (entry, issue = false) => {
-    const attribution = entry.thanks.length > 0 ? ` Thanks ${entry.thanks.map((handle) => `@${handle}`).join(" and ")}.` : "";
-    return `- ${issue ? "Reported: " : ""}${entry.title} (#${entry.number}).${attribution}`;
-  };
-  const ledger = [
-    "### Complete contribution ledger",
-    "",
-    `This audited record covers the complete ${base}..${target} history: ${pullRequests.length} PRs and ${issues.length} linked issues. The grouped notes above prioritize user impact; this ledger preserves every contribution reference and eligible human credit.`,
-    "",
-    "#### Pull requests",
-    "",
-    ...pullRequests.map((entry) => renderEntry(entry)),
-    "",
-    "#### Linked issues",
-    "",
-    ...issues.map((entry) => renderEntry(entry, true)),
-  ].join("\n");
-  return { entries, issues, ledger, pullRequests };
-}
-
-function replaceLedger(changelog, section, ledger) {
-  const beforeLedger = section.source.replace(/\n+### Complete contribution ledger[\s\S]*$/m, "").trimEnd();
-  const replacement = `${beforeLedger}\n\n${ledger}\n`;
-  return `${changelog.slice(0, section.start)}${replacement}${changelog.slice(section.end)}`;
-}
-
-function ledgerChecks(section, entries) {
-  const errors = [];
-  if (!section.source.includes("### Highlights")) {
-    errors.push("missing ### Highlights");
-  }
-  if (!section.source.includes("### Changes")) {
-    errors.push("missing ### Changes");
-  }
-  if (!section.source.includes("### Fixes")) {
-    errors.push("missing ### Fixes");
-  }
-  const ledgerStart = section.source.indexOf("### Complete contribution ledger");
-  if (ledgerStart < 0) {
-    errors.push("missing ### Complete contribution ledger");
-    return errors;
-  }
-  const ledger = section.source.slice(ledgerStart);
-  const entryNumbers = new Set(entries.map((entry) => entry.number));
-  for (const number of new Set(referencesIn(section.source))) {
-    if (!entryNumbers.has(number)) {
-      errors.push(`missing ledger entry for #${number}`);
-    }
-  }
-  for (const entry of entries) {
-    const prefix = entry.type === "Issue" ? "- Reported: " : "- ";
-    const line = ledger
-      .split("\n")
-      .find((candidate) => candidate.startsWith(prefix) && candidate.includes(`(#${entry.number})`));
-    if (!line) {
-      errors.push(`missing ledger entry for #${entry.number}`);
-      continue;
-    }
-    for (const handle of entry.thanks) {
-      if (!line.toLowerCase().includes(`@${handle.toLowerCase()}`)) {
-        errors.push(`missing Thanks @${handle} for #${entry.number}`);
-      }
-    }
-  }
-  return errors;
-}
-
-function releaseChecks(section, releaseTags) {
-  const expected = section.source;
-  const checks = [];
-  for (const tag of releaseTags) {
-    const release = githubApi([`repos/${repo}/releases/tags/${encodeURIComponent(tag)}`]);
-    const suffix = release.body.slice(expected.length).trimStart();
-    const matches =
-      release.body === expected ||
-      (release.body.startsWith(expected) && (suffix === "" || suffix.startsWith("### Release verification")));
-    checks.push({
-      tag,
-      releaseId: release.id,
-      matches,
-      bodyLength: release.body.length,
-    });
-  }
-  return checks;
-}
-
-function main() {
-  const options = parseArgs(process.argv.slice(2));
-  let changelog = readFileSync("CHANGELOG.md", "utf8");
-  let section = sectionFor(changelog, options.version);
-  const source = sourceCommits(options.base, options.target);
-  const preexistingNotes = section.source.replace(/\n+### Complete contribution ledger[\s\S]*$/m, "");
-  const noteReferences = referencesIn(preexistingNotes);
-  const revertedNoteReferences = noteReferences.filter((number) => source.revertedReferences.has(number));
-  if (revertedNoteReferences.length > 0) {
-    fail(
-      `release notes reference reverted work: ${[
-        ...new Set(revertedNoteReferences),
-      ]
-        .map((number) => `#${number}`)
-        .join(", ")}`,
-    );
-  }
-  const references = [...source.references];
-  appendReferences(references, noteReferences);
-  const nodes = resolveReferences(references);
-  const coauthorHandles = [...source.coauthorsByReference.values()].flatMap((handles) => [...handles]);
-  const resolvedCoauthors = resolveCoauthors(coauthorHandles);
-  const ledger = ledgerFor(
-    options.base,
-    options.target,
-    references,
-    nodes,
-    source.coauthorsByReference,
-    resolvedCoauthors,
-  );
-
-  if (options.writeLedger) {
-    changelog = replaceLedger(changelog, section, ledger.ledger);
-    writeFileSync("CHANGELOG.md", changelog);
-    section = sectionFor(changelog, options.version);
-  }
-
-  const errors = ledgerChecks(section, ledger.entries);
-  const github = options.checkGithub ? releaseChecks(section, options.releaseTags) : [];
-  for (const check of github) {
-    if (!check.matches) {
-      errors.push(`GitHub release ${check.tag} does not match the ${options.version} CHANGELOG section`);
-    }
-  }
-
-  const result = {
-    base: options.base,
-    target: options.target,
-    mergeBase: source.mergeBase,
-    version: options.version,
-    source: {
-      references: references.length,
-      pullRequests: ledger.pullRequests.length,
-      issues: ledger.issues.length,
-    },
-    github,
-    errors,
-  };
-  if (options.json) {
-    process.stdout.write(`${JSON.stringify(result, null, 2)}\n`);
-  } else {
-    process.stdout.write(
-      `${options.version}: ${ledger.pullRequests.length} PRs, ${ledger.issues.length} issues, ${errors.length === 0 ? "verified" : `${errors.length} errors`}\n`,
-    );
-  }
-  if (errors.length > 0) {
-    process.exitCode = 1;
-  }
-}
-
-main();
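Note on the removed `verify-release-notes.mjs`: its `sourceCommits` helper counts a commit as shipped only when no still-active revert targets it, so a revert of a revert restores the original work. The block below is a minimal standalone sketch of that rule, not the script's API; `makeIsActive` and the hashes `fix`, `r1`, and `r2` are hypothetical names chosen for illustration.

```javascript
// Hypothetical standalone sketch of the revert-cancellation rule.
// revertsByTarget: Map<commitHash, string[]> listing the commits that revert the key.
function makeIsActive(revertsByTarget) {
  const memo = new Map();
  return function isActive(hash) {
    if (memo.has(hash)) {
      return memo.get(hash);
    }
    const reverts = revertsByTarget.get(hash) ?? [];
    // A commit is active only if every commit that reverts it is itself inactive.
    const value = !reverts.some((revertHash) => isActive(revertHash));
    memo.set(hash, value);
    return value;
  };
}

// Illustrative history: "fix" is reverted by "r1", and "r1" is reverted by "r2".
const isActive = makeIsActive(
  new Map([
    ["fix", ["r1"]],
    ["r1", ["r2"]],
  ]),
);
const states = ["fix", "r1", "r2"].map((hash) => [hash, isActive(hash)]);
console.log(states);
```

In this history `r2` cancels `r1`, so `fix` counts as shipped again and its references would stay eligible for the ledger. The memo map keeps repeated lookups cheap on long histories.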
--- a/.agents/skills/openclaw-pr-maintainer/SKILL.md
+++ b/.agents/skills/openclaw-pr-maintainer/SKILL.md
@@ -284,7 +284,7 @@ gh search issues --repo openclaw/openclaw --match title,body --limit 50 \
 - If bot review conversations exist on your PR, address them and resolve them yourself once fixed.
 - Leave a review conversation unresolved only when reviewer or maintainer judgment is still needed.
 - Before landing any PR with non-trivial code changes, run `$autoreview` until no accepted/actionable findings remain, unless equivalent manual review already covered it, the change is trivial/docs-only, or the user opts out.
-- When an agent is landing or merging a PR targeting `main`, use only the repo-native `scripts/pr` wrapper: run `scripts/pr review-init <PR>`, follow its emitted checkout/guard guidance, initialize and complete review artifacts with `scripts/pr review-artifacts-init <PR>`, validate them with `scripts/pr review-validate-artifacts <PR>`, then run `scripts/pr prepare-run <PR>` and `scripts/pr merge-run <PR>`.
+- When landing or merging any PR, follow the global `/landpr` process.
 - Use `scripts/committer "<msg>" <file...>` for scoped commits instead of manual `git add` and `git commit`.
 - Keep commit messages concise and action-oriented.
 - Group related changes; avoid bundling unrelated refactors.
--- a/.agents/skills/openclaw-qa-testing/SKILL.md
+++ b/.agents/skills/openclaw-qa-testing/SKILL.md
@@ -13,7 +13,7 @@ Use this skill for `qa-lab` / `qa-channel` work. Repo-local QA only.
 - `docs/help/testing.md`
 - `docs/channels/qa-channel.md`
 - `qa/README.md`
-- `qa/scenarios/index.yaml`
+- `qa/scenarios/index.md`
 - `extensions/qa-lab/src/suite.ts`
 - `extensions/qa-lab/src/character-eval.ts`

@@ -198,9 +198,7 @@ pnpm openclaw qa character-eval \
 - Judges default to `openai/gpt-5.4,thinking=xhigh,fast` and `anthropic/claude-opus-4-6,thinking=high`.
 - Report includes judge ranking, run stats, durations, and full transcripts; do not include raw judge replies. Duration is benchmark context, not a grading signal.
 - Candidate and judge concurrency default to 16. Use `--concurrency <n>` and `--judge-concurrency <n>` to override when local gateways or provider limits need a gentler lane.
-- Scenario source is YAML-only under `qa/scenarios/`: use `index.yaml` and
-  per-scenario `*.yaml` files with top-level `title`, `scenario`, and optional
-  `flow`. Never add fenced `qa-scenario` / `qa-flow` Markdown files.
+- Scenario source should stay markdown-driven under `qa/scenarios/`.
 - For isolated character/persona evals, write the persona into `SOUL.md` and blank `IDENTITY.md` in the scenario flow. Use `SOUL.md + IDENTITY.md` only when intentionally testing how the normal OpenClaw identity combines with the character.
 - Keep prompts natural and task-shaped. The candidate model should receive character setup through `SOUL.md`, then normal user turns such as chat, workspace help, and small file tasks; do not ask "how would you react?" or tell the model it is in an eval.
 - Prefer at least one real task, such as creating or editing a tiny workspace artifact, so the transcript captures character under normal tool use instead of pure roleplay.
@@ -236,8 +234,7 @@ pnpm openclaw qa manual \

 ## Repo facts

-- Seed scenarios live in `qa/scenarios/index.yaml` and
-  `qa/scenarios/<theme>/*.yaml`.
+- Seed scenarios live in `qa/`.
 - Main live runner: `extensions/qa-lab/src/suite.ts`
 - QA lab server: `extensions/qa-lab/src/lab-server.ts`
 - Child gateway harness: `extensions/qa-lab/src/gateway-child.ts`
@@ -265,9 +262,8 @@ pnpm openclaw qa manual \

 ## When adding scenarios

-- Add or update scenario YAML under `qa/scenarios/`; do not add `.md` scenario
-  files or fenced YAML blocks.
-- Keep kickoff expectations in `qa/scenarios/index.yaml` aligned
+- Add or update scenario markdown under `qa/scenarios/`
+- Keep kickoff expectations in `qa/scenarios/index.md` aligned
 - Add executable coverage in `extensions/qa-lab/src/suite.ts`
 - Prefer end-to-end assertions over mock-only checks
 - Save outputs under `.artifacts/qa-e2e/`
--- a/.agents/skills/release-openclaw-announcement/SKILL.md
+++ b/.agents/skills/release-openclaw-announcement/SKILL.md
@@ -6,8 +6,7 @@ description: "Draft or post OpenClaw beta/stable Discord release announcements f
 # OpenClaw Release Announcement

 Use with `release-openclaw-maintainer` after a beta or stable release is live.
-Use with `$discord-user-post` when actually posting to Discord as the logged-in
-user.
+Use with `openclaw-discord` when actually posting to Discord.

 ## Evidence First

@@ -81,7 +80,6 @@ Fresh installs still point to `https://openclaw.ai`.

 ## Posting

-When asked to post, use `$discord-user-post` to operate the logged-in Discord
-desktop app as the user. Resolve and visibly verify the exact server/channel,
-inspect the final body, and request action-time confirmation before entering or
-sending it. Never use OpenClaw channel sends, bots, webhooks, relays, or tokens.
+When asked to post, use the configured Discord workflow from
+`openclaw-discord` or the approved OpenClaw relay. Never print tokens.
+For public channels, inspect the final body before sending.
--- a/.agents/skills/release-openclaw-ci/SKILL.md
+++ b/.agents/skills/release-openclaw-ci/SKILL.md
@@ -16,10 +16,6 @@ Use this with `$release-openclaw-maintainer` and `$openclaw-testing` when a rele
 - Watch one parent run plus compact child summaries. Avoid broad `gh run view` polling loops; REST quota is easy to burn.
 - Fetch logs only for failed or currently-blocking jobs. If quota is low, stop polling and wait for reset.
 - Treat live-provider flakes separately from code failures: prove key validity, provider HTTP status, retry evidence, and exact failing lane before editing code.
-- A model-list response proves authentication, not billing or inference
-  entitlement. Mandatory live providers must pass a real completion probe
-  before release dispatch. Fix the credential first; do not add an alternate
-  auth path merely to bypass a failed release credential.
 - Full Release Validation parent monitors fail fast: once a required child job
  fails, the parent cancels the remaining child matrix and prints the failed
  job summary. Inspect that first red job instead of waiting for unrelated
@@ -40,8 +36,6 @@ git rev-parse HEAD
 preflight. Inject those exact targeted keys first, then run the verifier; use
 ambient env only when it was already intentionally injected for this release.
 The script prints only provider status and HTTP class, never tokens.
-The Anthropic check performs a tiny message completion so exhausted or
-non-billable credentials fail before the expensive release matrix.

 ## Dispatch

@@ -71,13 +65,6 @@ gh workflow run openclaw-performance.yml \

 Prefer the trusted workflow on `main`, target the exact release SHA:

-- Keep trusted-workflow checks compatible with frozen release targets. If
-  `main` adds a target-owned guard script or package command after the release
-  branch cut, make the trusted workflow skip only when that target surface is
-  absent. Heal the trusted workflow before rerunning validation; do not port an
-  unrelated runtime refactor or mutate the release candidate just to satisfy a
-  newer `main`-only check.
-
 ```bash
 gh workflow run full-release-validation.yml \
  --repo openclaw/openclaw \
@@ -119,10 +106,7 @@ Stop watchers before ending the turn or switching strategy.
     --jq '.jobs[] | select(.conclusion=="failure" or .conclusion=="timed_out" or .conclusion=="cancelled") | [.databaseId,.name,.conclusion,.url] | @tsv'
   ```
 3. Fetch one failed job log. If rate-limited, note reset time and avoid more REST calls.
-4. For secret-looking failures, validate a real completion from the same secret source before editing code. A successful model-list request is insufficient.
-   Claude CLI subscription credentials are a separate native auth path; prove
-   them in a clean-home CLI probe, never as a substitute for a required
-   Anthropic API-key lane.
+4. For secret-looking failures, validate the provider endpoint from the same secret source before editing code.
 5. For live-cache failures, inspect whether it is missing/invalid key, empty text, provider refusal, timeout, or baseline miss. Do not weaken release gates without clear provider evidence.
 6. Fix narrowly, run local/changed proof, commit, push, rerun the smallest matching group.

--- a/.agents/skills/release-openclaw-ci/scripts/verify-provider-secrets.mjs
+++ b/.agents/skills/release-openclaw-ci/scripts/verify-provider-secrets.mjs
@@ -1,22 +1,17 @@
 #!/usr/bin/env node
 /**
- * Release preflight helper that verifies required provider API keys without
- * printing secret values. Anthropic must complete a prompt because model-list
- * access does not prove billing or inference entitlement.
+ * Release preflight helper that verifies required provider API keys can reach
+ * their model-list endpoints without printing secret values.
 */
 import process from "node:process";

 const args = new Map();
 for (let index = 2; index < process.argv.length; index += 1) {
  const arg = process.argv[index];
-  if (!arg.startsWith("--")) {
-    continue;
-  }
+  if (!arg.startsWith("--")) continue;
  const [key, inlineValue] = arg.slice(2).split("=", 2);
  const value = inlineValue ?? process.argv[index + 1];
-  if (inlineValue === undefined) {
-    index += 1;
-  }
+  if (inlineValue === undefined) index += 1;
  args.set(key, value);
 }

@@ -33,9 +28,7 @@ const timeoutMs = Number(args.get("timeout-ms") ?? 10_000);
 function envFirst(names) {
  for (const name of names) {
    const value = process.env[name]?.trim();
-    if (value) {
-      return { name, value };
-    }
+    if (value) return { name, value };
  }
  return undefined;
 }
@@ -51,19 +44,13 @@ async function checkProvider(id, config) {
  try {
    const headers = config.headers(secret.value);
    const response = await fetch(config.url, {
-      body: config.body,
      headers,
-      method: config.method,
      signal: controller.signal,
    });
-    const responseBody = config.validateResponse
-      ? await response.json().catch(() => undefined)
-      : undefined;
-    const ok = response.ok && (!config.validateResponse || config.validateResponse(responseBody));
    return {
      id,
-      ok,
-      status: response.ok ? (ok ? "ok" : "invalid_response") : `http_${response.status}`,
+      ok: response.ok,
+      status: response.ok ? "ok" : `http_${response.status}`,
      env: secret.name,
    };
  } catch (error) {
@@ -86,21 +73,11 @@ const providers = {
  },
  anthropic: {
    env: ["ANTHROPIC_API_KEY", "ANTHROPIC_API_TOKEN"],
-    url: "https://api.anthropic.com/v1/messages",
-    method: "POST",
-    body: JSON.stringify({
-      max_tokens: 8,
-      messages: [{ role: "user", content: "Reply with OK." }],
-      model: "claude-haiku-4-5",
-    }),
+    url: "https://api.anthropic.com/v1/models",
    headers: (token) => ({
      "anthropic-version": "2023-06-01",
-      "content-type": "application/json",
      "x-api-key": token,
    }),
-    validateResponse: (body) =>
-      Array.isArray(body?.content) &&
-      body.content.some((part) => typeof part?.text === "string" && part.text.trim()),
  },
  fireworks: {
    env: ["FIREWORKS_API_KEY"],
@@ -131,9 +108,7 @@ let failed = false;
 for (const result of results) {
  const requiredLabel = required.has(result.id) ? "required" : "optional";
  console.log(`${result.id}: ${result.status} env=${result.env} ${requiredLabel}`);
-  if (required.has(result.id) && !result.ok) {
-    failed = true;
-  }
+  if (required.has(result.id) && !result.ok) failed = true;
 }

 if (failed) {
--- a/.agents/skills/release-openclaw-maintainer/SKILL.md
+++ b/.agents/skills/release-openclaw-maintainer/SKILL.md
@@ -100,26 +100,6 @@ Use this skill for release and publish-time workflow. Load `$release-private` if
 - `dev`: moving head on `main`
 - When using a beta Git tag, publish npm with the matching beta version suffix so the plain version is not consumed or blocked

-## Close stable releases on main
-
-Stable publication is not complete until `main` carries the actual shipped release state.
-
-1. Start from fresh latest `main`. Audit `release/YYYY.M.PATCH` against it and
-   forward-port real fixes that are absent from `main`. Do not blindly merge
-   release-only compatibility, test, or validation adapters into newer `main`.
-2. Set `main` to the shipped stable version, not a speculative next train. Run
-   `pnpm release:prep` after the root version change, then
-   `pnpm deps:shrinkwrap:generate`.
-3. Make `CHANGELOG.md`'s `## YYYY.M.PATCH` section on `main` exactly match the
-   tagged release branch. Include the stable `appcast.xml` update when the mac
-   release published one.
-4. Do not add `YYYY.M.PATCH+1`, a beta version, or an empty future changelog
-   section to `main` until the operator explicitly starts that release train.
-5. Run `pnpm release:generated:check`, `pnpm deps:shrinkwrap:check`, and
-   `OPENCLAW_TESTBOX=1 pnpm check:changed`. Push, then verify `origin/main`
-   contains the shipped version and changelog before calling the stable release
-   done.
-
 ## Handle versions and release files consistently

 - Version locations include:
@@ -225,11 +205,6 @@ Stable publication is not complete until `main` carries the actual shipped relea
  `CHANGELOG.md` version section, not highlights or an excerpt. When creating
  or editing a release, extract from `## YYYY.M.PATCH` through the line before the
  next level-2 heading and use that complete block as the release notes.
- Before publishing or closing a release, run
-  `$openclaw-changelog-update`'s `verify-release-notes.mjs` with every stable
-  and beta release tag in the train. Do not publish or leave a page live when
-  it is missing a source-history reference, eligible human credit, or the
-  complete matching changelog body.
 - To update an existing GitHub Release body, resolve the numeric release id and
  patch that resource with the notes file as the `body` field:
  `gh api repos/openclaw/openclaw/releases/tags/vYYYY.M.PATCH --jq .id`, then
@@ -346,7 +321,6 @@ Upgrade with the beta channel.
 Before tagging or publishing, run:

 ```bash
-pnpm release:fast-pretag-check
 pnpm check:architecture
 pnpm build
 pnpm ui:build
@@ -355,21 +329,6 @@ pnpm release:check
 pnpm test:install:smoke
 ```

- Treat `pnpm release:fast-pretag-check` as a hard packaging gate. Every
-  publishable plugin must have a non-empty package-root `README.md`, build its
-  package-local runtime, and pass the npm and ClawHub release metadata checks
-  before a tag or publish workflow can start. Do not defer README, entrypoint,
-  or packed-artifact failures to postpublish verification.
- Before tagging, require green CI for the exact release-candidate SHA, not an
-  earlier branch SHA. Heal every related red CI, release-check, packaging, or
-  root-Dockerfile lane on the release branch, forward-port the fix to `main`,
-  and rerun the affected exact-SHA gates. Never waive a red Docker lane because
-  npm preflight passed.
- Root Dockerfile proof is mandatory before every beta and stable tag. Run the
-  release `install-smoke` group or equivalent root Dockerfile build for the
-  exact candidate SHA and require it to pass. The tag-triggered Docker Release
-  workflow is post-tag publishing, not the first valid proof that the root
-  Dockerfile can build.
 - Before tagging, diff publishable plugin package manifests against the last
  reachable stable/beta release tag. For every newly publishable package
  (`openclaw.release.publishToNpm: true` or `publishToClawHub: true`) whose
@@ -577,16 +536,6 @@ node --import tsx scripts/openclaw-npm-postpublish-verify.ts <published-version>
 - `preflight_only=true` on the npm workflow is also the right way to validate an
  existing tag after publish; it should keep running the build checks even when
  the npm version is already published.
- npm registry metadata is eventually consistent immediately after trusted
-  publishing. Keep postpublish `npm view` checks on bounded `--prefer-online`
-  retries, and carry that verified tarball/integrity metadata into later proof
-  steps instead of reading the registry again. If the OpenClaw npm child
-  succeeded but the parent publish workflow failed on an immediate exact-version
-  `E404`, verify the exact version with a cache-bypassed registry read, run the
-  standalone postpublish verifier and the full beta verifier with the original
-  successful child run IDs, then finalize the draft, dependency evidence asset,
-  and release proof manually. Never rerun the publish workflow for that
-  already-published version.
 - npm validation-only preflight may still be dispatched from ordinary branches
  when testing workflow changes before merge. Release checks and real publish
  use only `main` or `release/YYYY.M.PATCH`.
@@ -695,10 +644,9 @@ node --import tsx scripts/openclaw-npm-postpublish-verify.ts <published-version>
   off, live OpenAI off, and regression failure off. Let it run in parallel
   with preflight and validation work.
 10. Run the fast local beta preflight from the release branch before any npm
-    preflight or publish. Require exact-SHA CI and root Dockerfile install-smoke
-    to be green before tagging. Keep the remaining expensive Docker, Parallels,
-    and published-package install/update lanes for after the beta is live unless
-    the operator asks to run them before beta publication.
+    preflight or publish. Keep expensive Docker, Parallels, and published-package
+    install/update lanes for after the beta is live unless the operator asks to
+    run them before beta publication.
 11. For beta releases, skip mac app build/sign/notarize unless beta scope or a
    release blocker specifically requires it. For stable releases, include the
    mac app, signing, notarization, and appcast path.
@@ -755,13 +703,8 @@ node --import tsx scripts/openclaw-npm-postpublish-verify.ts <published-version>
    waited plugin publish or Windows Hub promotion fails after OpenClaw npm
    succeeds, the workflow keeps the release draft with OpenClaw npm evidence
    and exits red; do not undraft until the gap is repaired. The standalone
-    verifier command remains the first recovery probe:
+    verifier command remains the recovery probe:
    `node --import tsx scripts/openclaw-npm-postpublish-verify.ts <published-version>`.
-    For a failed postpublish parent after successful publish children, also run
-    `pnpm release:verify-beta -- <published-version> ... --skip-github-release`
-    with the original child run IDs and an evidence output path before manually
-    recreating the workflow's draft, dependency evidence asset, proof section,
-    and publish step.
 25. Run the post-published beta verification roster. First scan current `main`
    for critical fixes that landed after the release branch cut; backport only
    important low-risk fixes before starting expensive lanes, or increment to
@@ -798,13 +741,13 @@ node --import tsx scripts/openclaw-npm-postpublish-verify.ts <published-version>
    and `.dSYM.zip` artifacts to the existing GitHub release in
    `openclaw/openclaw`.
 32. For stable releases, download `macos-appcast-<tag>` from the successful
-    private mac run, update `appcast.xml` on `main`, verify the feed, then
-    complete the **Close stable releases on main** gate.
+    private mac run, update `appcast.xml` on `main`, and verify the feed. Merge
+    or cherry-pick release branch changes back to `main` after stable succeeds.
 33. For beta releases, publish the mac assets only when intentionally requested;
    expect no shared production
    `appcast.xml` artifact and do not update the shared production feed unless a
    separate beta feed exists.
-34. After stable main closeout, verify npm and the attached release artifacts.
+34. After publish, verify npm and the attached release artifacts.

 ## GHSA advisory work

--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -12,14 +12,9 @@
 /.github/workflows/codeql-android-critical-security.yml @openclaw/openclaw-secops
 /.github/workflows/codeql-critical-quality.yml @openclaw/openclaw-secops
 /.github/workflows/dependency-guard.yml @openclaw/openclaw-secops
-/.github/workflows/security-sensitive-guard.yml @openclaw/openclaw-secops
 /test/scripts/dependency-guard-workflow.test.ts @openclaw/openclaw-secops
 /test/scripts/dependency-guard-script.test.ts @openclaw/openclaw-secops
-/test/scripts/security-sensitive-guard-workflow.test.ts @openclaw/openclaw-secops
-/test/scripts/security-sensitive-guard-script.test.ts @openclaw/openclaw-secops
 /scripts/github/dependency-guard.mjs @openclaw/openclaw-secops
-/scripts/github/security-sensitive-guard.mjs @openclaw/openclaw-secops
-/.gitignore @openclaw/openclaw-secops
 /package-lock.json @openclaw/openclaw-secops
 /npm-shrinkwrap.json @openclaw/openclaw-secops
 /extensions/*/package-lock.json @openclaw/openclaw-secops
--- a/.github/workflows/ci-build-artifacts-testbox.yml
+++ b/.github/workflows/ci-build-artifacts-testbox.yml
@@ -61,7 +61,7 @@ jobs:
            git -C "$workdir" remote add origin "https://github.com/${CHECKOUT_REPO}"
            git -C "$workdir" config gc.auto 0

-            timeout --signal=TERM --kill-after=10s 120s git -C "$workdir" \
+            timeout --signal=TERM --kill-after=10s 30s git -C "$workdir" \
              -c protocol.version=2 \
              -c "http.extraheader=AUTHORIZATION: basic ${auth_header}" \
              fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
@@ -188,7 +188,7 @@ jobs:
        run: |
          set -euo pipefail

-          timeout --signal=TERM --kill-after=10s 120s git \
+          timeout --signal=TERM --kill-after=10s 30s git \
            -c protocol.version=2 \
            fetch --no-tags --prune --no-recurse-submodules --depth=50 origin \
            "+refs/heads/main:refs/remotes/origin/main"
--- a/.github/workflows/ci-check-arm-testbox.yml
+++ b/.github/workflows/ci-check-arm-testbox.yml
@@ -76,7 +76,7 @@ jobs:
            git -C "$workdir" remote add origin "https://github.com/${CHECKOUT_REPO}"
            git -C "$workdir" config gc.auto 0

-            timeout --signal=TERM --kill-after=10s 120s git -C "$workdir" \
+            timeout --signal=TERM --kill-after=10s 30s git -C "$workdir" \
              -c protocol.version=2 \
              -c "http.extraheader=AUTHORIZATION: basic ${auth_header}" \
              fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
@@ -106,7 +106,7 @@ jobs:
        run: |
          set -euo pipefail

-          timeout --signal=TERM --kill-after=10s 120s git \
+          timeout --signal=TERM --kill-after=10s 30s git \
            -c protocol.version=2 \
            fetch --no-tags --prune --no-recurse-submodules --depth=50 origin \
            "+refs/heads/main:refs/remotes/origin/main"
--- a/.github/workflows/ci-check-testbox.yml
+++ b/.github/workflows/ci-check-testbox.yml
@@ -6,10 +6,6 @@ on:
        type: string
        description: "Testbox session ID"
        required: true
-      timeout_minutes:
-        type: number
-        description: "Maximum GitHub job runtime for long Testbox commands"
-        default: 120
  pull_request:
    paths:
      - ".github/workflows/**"
@@ -29,7 +25,7 @@ jobs:
      contents: read
    name: "check"
    runs-on: blacksmith-32vcpu-ubuntu-2404
-    timeout-minutes: ${{ fromJSON(inputs.timeout_minutes || '30') }}
+    timeout-minutes: 30
    steps:
      - name: Begin Testbox
        uses: useblacksmith/begin-testbox@233448af4bfdc6fca509a7f0974411ac6d8a8043
@@ -65,7 +61,7 @@ jobs:
            git -C "$workdir" remote add origin "https://github.com/${CHECKOUT_REPO}"
            git -C "$workdir" config gc.auto 0

-            timeout --signal=TERM --kill-after=10s 120s git -C "$workdir" \
+            timeout --signal=TERM --kill-after=10s 30s git -C "$workdir" \
              -c protocol.version=2 \
              -c "http.extraheader=AUTHORIZATION: basic ${auth_header}" \
              fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
@@ -95,7 +91,7 @@ jobs:
        run: |
          set -euo pipefail

-          timeout --signal=TERM --kill-after=10s 120s git \
+          timeout --signal=TERM --kill-after=10s 30s git \
            -c protocol.version=2 \
            fetch --no-tags --prune --no-recurse-submodules --depth=50 origin \
            "+refs/heads/main:refs/remotes/origin/main"
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -90,7 +90,7 @@ jobs:
            local ref="$1"
            local fetch_status
            for attempt in 1 2 3; do
-              timeout --signal=TERM --kill-after=10s 120s git -C "$GITHUB_WORKSPACE" \
+              timeout --signal=TERM --kill-after=10s 30s git -C "$GITHUB_WORKSPACE" \
                -c protocol.version=2 \
                fetch --no-tags --prune --no-recurse-submodules --depth=2 origin \
                "+${ref}:refs/remotes/origin/checkout" && return 0
@@ -351,7 +351,7 @@ jobs:
            local ref="$1"
            local fetch_status
            for attempt in 1 2 3; do
-              timeout --signal=TERM --kill-after=10s 120s git -C "$GITHUB_WORKSPACE" \
+              timeout --signal=TERM --kill-after=10s 30s git -C "$GITHUB_WORKSPACE" \
                -c protocol.version=2 \
                fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
                "+${ref}:refs/remotes/origin/checkout" && return 0
@@ -499,7 +499,7 @@ jobs:
            git -C "$workdir" remote add origin "https://github.com/${CHECKOUT_REPO}.git"
            git -C "$workdir" config gc.auto 0

-            timeout --signal=TERM --kill-after=10s 120s git -C "$workdir" \
+            timeout --signal=TERM --kill-after=10s 30s git -C "$workdir" \
              -c protocol.version=2 \
              fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
              "+${CHECKOUT_SHA}:refs/remotes/origin/ci-target" || return 1
@@ -564,7 +564,7 @@ jobs:
            git -C "$workdir" remote add origin "https://github.com/${CHECKOUT_REPO}.git"
            git -C "$workdir" config gc.auto 0

-            timeout --signal=TERM --kill-after=10s 120s git -C "$workdir" \
+            timeout --signal=TERM --kill-after=10s 30s git -C "$workdir" \
              -c protocol.version=2 \
              fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
              "+${CHECKOUT_SHA}:refs/remotes/origin/ci-target" || return 1
@@ -810,7 +810,7 @@ jobs:
            git -C "$workdir" remote add origin "https://github.com/${CHECKOUT_REPO}.git"
            git -C "$workdir" config gc.auto 0

-            timeout --signal=TERM --kill-after=10s 120s git -C "$workdir" \
+            timeout --signal=TERM --kill-after=10s 30s git -C "$workdir" \
              -c protocol.version=2 \
              fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
              "+${CHECKOUT_SHA}:refs/remotes/origin/ci-target" || return 1
@@ -850,10 +850,10 @@ jobs:
              ;;
            contracts-plugins-ci-routing)
              pnpm test:contracts:plugins
-              pnpm test src/commands/status.scan-result.test.ts src/scripts/ci-changed-scope.test.ts test/scripts/changed-lanes.test.ts test/scripts/ci-workflow-guards.test.ts test/scripts/run-vitest.test.ts test/scripts/test-projects.test.ts
+              pnpm test src/commands/status.scan-result.test.ts src/scripts/ci-changed-scope.test.ts test/scripts/changed-lanes.test.ts test/scripts/run-vitest.test.ts test/scripts/test-projects.test.ts
              ;;
            ci-routing)
-              pnpm test src/commands/status.scan-result.test.ts src/scripts/ci-changed-scope.test.ts test/scripts/changed-lanes.test.ts test/scripts/ci-workflow-guards.test.ts test/scripts/run-vitest.test.ts test/scripts/test-projects.test.ts
+              pnpm test src/commands/status.scan-result.test.ts src/scripts/ci-changed-scope.test.ts test/scripts/changed-lanes.test.ts test/scripts/run-vitest.test.ts test/scripts/test-projects.test.ts
              ;;
            bun-launcher)
              OPENCLAW_TEST_BUN_LAUNCHER=1 pnpm test test/openclaw-launcher.e2e.test.ts
@@ -899,7 +899,7 @@ jobs:
            git -C "$workdir" remote add origin "https://github.com/${CHECKOUT_REPO}.git"
            git -C "$workdir" config gc.auto 0

-            timeout --signal=TERM --kill-after=10s 120s git -C "$workdir" \
+            timeout --signal=TERM --kill-after=10s 30s git -C "$workdir" \
              -c protocol.version=2 \
              fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
              "+${CHECKOUT_SHA}:refs/remotes/origin/ci-target" || return 1
@@ -979,7 +979,7 @@ jobs:
            git -C "$workdir" remote add origin "https://github.com/${CHECKOUT_REPO}.git"
            git -C "$workdir" config gc.auto 0

-            timeout --signal=TERM --kill-after=10s 120s git -C "$workdir" \
+            timeout --signal=TERM --kill-after=10s 30s git -C "$workdir" \
              -c protocol.version=2 \
              fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
              "+${CHECKOUT_SHA}:refs/remotes/origin/ci-target" || return 1
@@ -1056,7 +1056,7 @@ jobs:
            git -C "$workdir" remote add origin "https://github.com/${CHECKOUT_REPO}.git"
            git -C "$workdir" config gc.auto 0

-            timeout --signal=TERM --kill-after=10s 120s git -C "$workdir" \
+            timeout --signal=TERM --kill-after=10s 30s git -C "$workdir" \
              -c protocol.version=2 \
              fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
              "+${CHECKOUT_SHA}:refs/remotes/origin/ci-target" || return 1
@@ -1131,7 +1131,7 @@ jobs:
            git -C "$workdir" remote add origin "https://github.com/${CHECKOUT_REPO}.git"
            git -C "$workdir" config gc.auto 0

-            timeout --signal=TERM --kill-after=10s 120s git -C "$workdir" \
+            timeout --signal=TERM --kill-after=10s 30s git -C "$workdir" \
              -c protocol.version=2 \
              fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
              "+${CHECKOUT_SHA}:refs/remotes/origin/ci-target" || return 1
@@ -1258,7 +1258,7 @@ jobs:
            git -C "$workdir" remote add origin "https://github.com/${CHECKOUT_REPO}.git"
            git -C "$workdir" config gc.auto 0

-            timeout --signal=TERM --kill-after=10s 120s git -C "$workdir" \
+            timeout --signal=TERM --kill-after=10s 30s git -C "$workdir" \
              -c protocol.version=2 \
              fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
              "+${CHECKOUT_SHA}:refs/remotes/origin/ci-target" || return 1
@@ -1288,7 +1288,6 @@ jobs:
        env:
          OPENCLAW_LOCAL_CHECK: "0"
          TASK: ${{ matrix.task }}
-          PR_BASE_SHA: ${{ github.event_name == 'pull_request' && github.event.pull_request.base.sha || '' }}
        shell: bash
        run: |
          set -euo pipefail
@@ -1298,10 +1297,6 @@ jobs:
              pnpm tool-display:check
              pnpm check:host-env-policy:swift
              pnpm dup:check:coverage
-              if [ -n "$PR_BASE_SHA" ]; then
-                git fetch --no-tags --depth=1 origin "+${PR_BASE_SHA}:refs/remotes/origin/pr-base"
-                node scripts/report-test-temp-creations.mjs --base refs/remotes/origin/pr-base --head HEAD --no-merge-base
-              fi
              pnpm deps:patches:check
              pnpm lint:webhook:no-low-level-body-read
              pnpm lint:auth:no-pairing-store-group
@@ -1365,8 +1360,6 @@ jobs:
            boundary_shard: 2/4,3/4,4/4
          - check_name: check-session-accessor-boundary
            group: session-accessor-boundary
-          - check_name: check-session-transcript-reader-boundary
-            group: session-transcript-reader-boundary
          - check_name: check-additional-extension-channels
            group: extension-channels
          - check_name: check-additional-extension-bundled
@@ -1399,7 +1392,7 @@ jobs:
            git -C "$workdir" remote add origin "https://github.com/${CHECKOUT_REPO}.git"
            git -C "$workdir" config gc.auto 0

-            timeout --signal=TERM --kill-after=10s 120s git -C "$workdir" \
+            timeout --signal=TERM --kill-after=10s 30s git -C "$workdir" \
              -c protocol.version=2 \
              fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
              "+${CHECKOUT_SHA}:refs/remotes/origin/ci-target" || return 1
@@ -1522,15 +1515,6 @@ jobs:
                run_check "lint:tmp:session-accessor-boundary" pnpm run lint:tmp:session-accessor-boundary
              fi
              ;;
-            session-transcript-reader-boundary)
-              if [ ! -f scripts/check-session-transcript-reader-boundary.mjs ]; then
-                echo "[skip] session transcript reader boundary check is not present in this checkout"
-              elif ! node -e 'const pkg = require("./package.json"); process.exit(pkg.scripts?.["lint:tmp:session-transcript-reader-boundary"] ? 0 : 1);'; then
-                echo "[skip] session transcript reader boundary script is not present in package.json"
-              else
-                run_check "lint:tmp:session-transcript-reader-boundary" pnpm run lint:tmp:session-transcript-reader-boundary
-              fi
-              ;;
            extension-channels)
              run_check "lint:extensions:channels" pnpm run lint:extensions:channels
              ;;
@@ -1584,7 +1568,7 @@ jobs:
            git -C "$workdir" remote add origin "https://github.com/${CHECKOUT_REPO}.git"
            git -C "$workdir" config gc.auto 0

-            timeout --signal=TERM --kill-after=10s 120s git -C "$workdir" \
+            timeout --signal=TERM --kill-after=10s 30s git -C "$workdir" \
              -c protocol.version=2 \
              fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
              "+${CHECKOUT_SHA}:refs/remotes/origin/ci-target" || return 1
@@ -1630,7 +1614,7 @@ jobs:
            git -C "$workdir" config gc.auto 0
            git -C "$workdir" remote add origin "https://github.com/openclaw/clawhub.git"

-            timeout --signal=TERM --kill-after=10s 120s git -C "$workdir" \
+            timeout --signal=TERM --kill-after=10s 30s git -C "$workdir" \
              -c protocol.version=2 \
              fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
              "+refs/heads/main:refs/remotes/origin/checkout" || return 1
@@ -1677,7 +1661,7 @@ jobs:
          fetch_checkout_ref() {
            local fetch_status
            for attempt in 1 2 3; do
-              timeout --signal=TERM --kill-after=10s 120s git -C "$GITHUB_WORKSPACE" \
+              timeout --signal=TERM --kill-after=10s 30s git -C "$GITHUB_WORKSPACE" \
                -c protocol.version=2 \
                fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
                "+${CHECKOUT_SHA}:refs/remotes/origin/checkout" && return 0
@@ -2083,7 +2067,7 @@ jobs:
            git -C "$workdir" remote add origin "https://github.com/${CHECKOUT_REPO}.git"
            git -C "$workdir" config gc.auto 0

-            timeout --signal=TERM --kill-after=10s 120s git -C "$workdir" \
+            timeout --signal=TERM --kill-after=10s 30s git -C "$workdir" \
              -c protocol.version=2 \
              fetch --no-tags --prune --no-recurse-submodules --depth=1 origin \
              "+${CHECKOUT_SHA}:refs/remotes/origin/ci-target" || return 1
--- a/.github/workflows/full-release-validation.yml
+++ b/.github/workflows/full-release-validation.yml
@@ -275,7 +275,7 @@ jobs:
            local workflow="$1"
            shift

-            local dispatch_output run_id status conclusion url poll_count
+            local before_json dispatch_output run_id status conclusion url poll_count
            gh_with_retry() {
              local output status attempt
              for attempt in 1 2 3 4 5 6; do
@@ -298,6 +298,8 @@ jobs:
              printf '%s\n' "$output" >&2
              return "$status"
            }
+            before_json="$(gh_with_retry run list --workflow "$workflow" --event workflow_dispatch --limit 100 --json databaseId --jq '[.[].databaseId]')"
+
            dispatch_output="$(gh_with_retry workflow run "$workflow" --ref "$CHILD_WORKFLOW_REF" "$@")"
            printf '%s\n' "$dispatch_output"
            run_id="$(
@@ -307,7 +309,20 @@ jobs:
            )"

            if [[ -z "$run_id" ]]; then
-              echo "::error::gh workflow run ${workflow} did not return an Actions run URL; refusing to guess from recent workflow_dispatch runs." >&2
+              for _ in $(seq 1 60); do
+                run_id="$(
+                  BEFORE_IDS="$before_json" gh_with_retry run list --workflow "$workflow" --event workflow_dispatch --limit 50 --json databaseId,createdAt \
+                    --jq 'map(select(.databaseId as $id | (env.BEFORE_IDS | fromjson | index($id) | not))) | sort_by(.createdAt) | reverse | .[0].databaseId // empty'
+                )"
+                if [[ -n "$run_id" ]]; then
+                  break
+                fi
+                sleep 5
+              done
+            fi
+
+            if [[ -z "${run_id:-}" ]]; then
+              echo "Could not find dispatched run for ${workflow}." >&2
              exit 1
            fi

@@ -408,7 +423,7 @@ jobs:
            local workflow="$1"
            shift

-            local dispatch_output run_id status conclusion url poll_count
+            local before_json dispatch_output run_id status conclusion url poll_count
            gh_with_retry() {
              local output status attempt
              for attempt in 1 2 3 4 5 6; do
@@ -431,6 +446,8 @@ jobs:
              printf '%s\n' "$output" >&2
              return "$status"
            }
+            before_json="$(gh_with_retry run list --workflow "$workflow" --event workflow_dispatch --limit 100 --json databaseId --jq '[.[].databaseId]')"
+
            dispatch_output="$(gh_with_retry workflow run "$workflow" --ref "$CHILD_WORKFLOW_REF" "$@")"
            printf '%s\n' "$dispatch_output"
            run_id="$(
@@ -440,7 +457,20 @@ jobs:
            )"

            if [[ -z "$run_id" ]]; then
-              echo "::error::gh workflow run ${workflow} did not return an Actions run URL; refusing to guess from recent workflow_dispatch runs." >&2
+              for _ in $(seq 1 60); do
+                run_id="$(
+                  BEFORE_IDS="$before_json" gh_with_retry run list --workflow "$workflow" --event workflow_dispatch --limit 50 --json databaseId,createdAt \
+                    --jq 'map(select(.databaseId as $id | (env.BEFORE_IDS | fromjson | index($id) | not))) | sort_by(.createdAt) | reverse | .[0].databaseId // empty'
+                )"
+                if [[ -n "$run_id" ]]; then
+                  break
+                fi
+                sleep 5
+              done
+            fi
+
+            if [[ -z "${run_id:-}" ]]; then
+              echo "Could not find dispatched run for ${workflow}." >&2
              exit 1
            fi

@@ -551,7 +581,7 @@ jobs:
            local workflow="$1"
            shift

-            local dispatch_output run_id status conclusion url poll_count run_json
+            local before_json dispatch_output run_id status conclusion url poll_count run_json
            gh_with_retry() {
              local output status attempt
              for attempt in 1 2 3 4 5 6; do
@@ -574,6 +604,8 @@ jobs:
              printf '%s\n' "$output" >&2
              return "$status"
            }
+            before_json="$(gh_with_retry run list --workflow "$workflow" --event workflow_dispatch --limit 100 --json databaseId --jq '[.[].databaseId]')"
+
            dispatch_output="$(gh_with_retry workflow run "$workflow" --ref "$CHILD_WORKFLOW_REF" "$@")"
            printf '%s\n' "$dispatch_output"
            run_id="$(
@@ -583,7 +615,20 @@ jobs:
            )"

            if [[ -z "$run_id" ]]; then
-              echo "::error::gh workflow run ${workflow} did not return an Actions run URL; refusing to guess from recent workflow_dispatch runs." >&2
+              for _ in $(seq 1 60); do
+                run_id="$(
+                  BEFORE_IDS="$before_json" gh_with_retry run list --workflow "$workflow" --event workflow_dispatch --limit 50 --json databaseId,createdAt \
+                    --jq 'map(select(.databaseId as $id | (env.BEFORE_IDS | fromjson | index($id) | not))) | sort_by(.createdAt) | reverse | .[0].databaseId // empty'
+                )"
+                if [[ -n "$run_id" ]]; then
+                  break
+                fi
+                sleep 5
+              done
+            fi
+
+            if [[ -z "${run_id:-}" ]]; then
+              echo "Could not find dispatched run for ${workflow}." >&2
              exit 1
            fi

@@ -883,6 +928,8 @@ jobs:
            return "$status"
          }

+          before_json="$(gh_with_retry run list --workflow npm-telegram-beta-e2e.yml --event workflow_dispatch --limit 100 --json databaseId --jq '[.[].databaseId]')"
+
          args=(-f package_spec="${PACKAGE_SPEC:-openclaw@beta}" -f harness_ref="$TARGET_SHA" -f provider_mode="$PROVIDER_MODE")
          if [[ -z "${PACKAGE_SPEC// }" ]]; then
            if [[ "$PREPARE_PACKAGE_RESULT" != "success" || -z "${PACKAGE_ARTIFACT_NAME// }" ]]; then
@@ -899,16 +946,22 @@ jobs:
            args+=(-f scenario="$SCENARIO")
          fi

-          dispatch_output="$(gh_with_retry workflow run npm-telegram-beta-e2e.yml --ref "$CHILD_WORKFLOW_REF" "${args[@]}")"
-          printf '%s\n' "$dispatch_output"
-          run_id="$(
-            printf '%s\n' "$dispatch_output" |
-              sed -nE 's#.*actions/runs/([0-9]+).*#\1#p' |
-              tail -n 1
-          )"
+          gh_with_retry workflow run npm-telegram-beta-e2e.yml --ref "$CHILD_WORKFLOW_REF" "${args[@]}"
+
+          run_id=""
+          for _ in $(seq 1 60); do
+            run_id="$(
+              BEFORE_IDS="$before_json" gh_with_retry run list --workflow npm-telegram-beta-e2e.yml --event workflow_dispatch --limit 50 --json databaseId,createdAt \
+                --jq 'map(select(.databaseId as $id | (env.BEFORE_IDS | fromjson | index($id) | not))) | sort_by(.createdAt) | reverse | .[0].databaseId // empty'
+            )"
+            if [[ -n "$run_id" ]]; then
+              break
+            fi
+            sleep 5
+          done

          if [[ -z "$run_id" ]]; then
-            echo "::error::gh workflow run npm-telegram-beta-e2e.yml did not return an Actions run URL; refusing to guess from recent workflow_dispatch runs." >&2
+            echo "Could not find dispatched run for npm-telegram-beta-e2e.yml." >&2
            exit 1
          fi

@@ -1020,23 +1073,31 @@ jobs:
            echo "- Release impact: advisory"
          } >> "$GITHUB_STEP_SUMMARY"

-          dispatch_output="$(gh_with_retry workflow run openclaw-performance.yml \
+          before_json="$(gh_with_retry run list --workflow openclaw-performance.yml --event workflow_dispatch --limit 100 --json databaseId --jq '[.[].databaseId]')"
+
+          gh_with_retry workflow run openclaw-performance.yml \
            --ref "$CHILD_WORKFLOW_REF" \
            -f target_ref="$TARGET_SHA" \
            -f profile=release \
            -f repeat=3 \
            -f deep_profile=false \
            -f live_openai_candidate=false \
-            -f fail_on_regression=false)"
-          printf '%s\n' "$dispatch_output"
-          run_id="$(
-            printf '%s\n' "$dispatch_output" |
-              sed -nE 's#.*actions/runs/([0-9]+).*#\1#p' |
-              tail -n 1
-          )"
+            -f fail_on_regression=false
+
+          run_id=""
+          for _ in $(seq 1 60); do
+            run_id="$(
+              BEFORE_IDS="$before_json" gh_with_retry run list --workflow openclaw-performance.yml --event workflow_dispatch --limit 50 --json databaseId,createdAt \
+                --jq 'map(select(.databaseId as $id | (env.BEFORE_IDS | fromjson | index($id) | not))) | sort_by(.createdAt) | reverse | .[0].databaseId // empty'
+            )"
+            if [[ -n "$run_id" ]]; then
+              break
+            fi
+            sleep 5
+          done

          if [[ -z "$run_id" ]]; then
-            echo "::warning::gh workflow run openclaw-performance.yml did not return an Actions run URL; refusing to guess from recent workflow_dispatch runs."
+            echo "::warning::Could not find dispatched run for openclaw-performance.yml."
            exit 0
          fi

--- a/.github/workflows/install-smoke.yml
+++ b/.github/workflows/install-smoke.yml
@@ -476,21 +476,19 @@ jobs:
      - name: Run Rocky Linux installer smoke
        run: |
          timeout --kill-after=30s 20m docker run --rm \
-            --platform linux/amd64 \
            -e OPENCLAW_NO_ONBOARD=1 \
            -e OPENCLAW_NO_PROMPT=1 \
            -v "$PWD/scripts/install.sh:/tmp/install.sh:ro" \
-            rockylinux:9@sha256:d644d203142cd5b54ad2a83a203e1dee68af2229f8fe32f52a30c6e1d3c3a9e0 \
+            rockylinux:9@sha256:d7be1c094cc5845ee815d4632fe377514ee6ebcf8efaed6892889657e5ddaaa6 \
            bash -lc 'dnf install -y -q ca-certificates tar gzip xz findutils which sudo >/dev/null && bash /tmp/install.sh --install-method npm --version latest --no-onboard --no-prompt --verify && openclaw --version'

      - name: Run Rocky Linux CLI installer smoke
        run: |
          timeout --kill-after=30s 20m docker run --rm \
-            --platform linux/amd64 \
            -e OPENCLAW_NO_ONBOARD=1 \
            -e OPENCLAW_NO_PROMPT=1 \
            -v "$PWD/scripts/install-cli.sh:/tmp/install-cli.sh:ro" \
-            rockylinux:9@sha256:d644d203142cd5b54ad2a83a203e1dee68af2229f8fe32f52a30c6e1d3c3a9e0 \
+            rockylinux:9@sha256:d7be1c094cc5845ee815d4632fe377514ee6ebcf8efaed6892889657e5ddaaa6 \
            bash -lc 'dnf install -y -q ca-certificates tar gzip xz findutils which sudo >/dev/null && bash /tmp/install-cli.sh --prefix /tmp/openclaw-cli --version latest --no-onboard && /tmp/openclaw-cli/bin/openclaw --version'

  bun_global_install_smoke:
--- a/.github/workflows/ios-periphery-comment.yml
+++ b/.github/workflows/ios-periphery-comment.yml
@@ -1,447 +0,0 @@
-name: iOS Periphery Dead Code Comment
-
-on:
-  workflow_run: # zizmor: ignore[dangerous-triggers] trusted PR commenter; job gates repository, source event, workflow name, live open PR, and exact current head before reading artifacts or writing comments
-    workflows: ["iOS Periphery Dead Code"]
-    types: [completed]
-
-env:
-  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"
-
-permissions:
-  actions: read
-  contents: read
-  issues: write
-  pull-requests: read
-
-jobs:
-  comment:
-    name: Comment on PR
-    runs-on: ubuntu-24.04
-    if: >
-      github.repository == 'openclaw/openclaw' &&
-      github.event.workflow_run.event == 'pull_request' &&
-      github.event.workflow_run.name == 'iOS Periphery Dead Code'
-    steps:
-      - name: Upsert Periphery PR comment
-        uses: actions/github-script@v9
-        with:
-          script: |
-            const fs = require("node:fs");
-            const os = require("node:os");
-            const path = require("node:path");
-            const childProcess = require("node:child_process");
-
-            const marker = "<!-- openclaw-ios-periphery-dead-code -->";
-            const run = context.payload.workflow_run;
-            const pr = run.pull_requests?.[0];
-            if (!pr) {
-              core.info("No pull request attached to workflow_run.");
-              return;
-            }
-
-            const { owner, repo } = context.repo;
-            const repository = `${owner}/${repo}`;
-            if (run.repository?.full_name !== repository) {
-              core.info(`Skipping workflow_run from ${run.repository?.full_name ?? "unknown repository"}.`);
-              return;
-            }
-            if (run.event !== "pull_request") {
-              core.info(`Skipping workflow_run for ${run.event ?? "unknown"} event.`);
-              return;
-            }
-            if (run.name !== "iOS Periphery Dead Code") {
-              core.info(`Skipping unexpected workflow ${run.name ?? "unknown"}.`);
-              return;
-            }
-
-            const livePull = await github.rest.pulls.get({
-              owner,
-              repo,
-              pull_number: pr.number,
-            });
-            if (livePull.data.state !== "open") {
-              core.info(`Skipping closed PR #${pr.number}.`);
-              return;
-            }
-            if (livePull.data.base?.repo?.full_name !== repository) {
-              core.info(`Skipping PR #${pr.number} targeting ${livePull.data.base?.repo?.full_name ?? "unknown repository"}.`);
-              return;
-            }
-            if (livePull.data.head?.sha !== run.head_sha) {
-              core.info(`Skipping stale run ${run.id}; PR #${pr.number} is now at ${livePull.data.head?.sha}.`);
-              return;
-            }
-
-            const jobs = await github.paginate(github.rest.actions.listJobsForWorkflowRun, {
-              owner,
-              repo,
-              run_id: run.id,
-              filter: "latest",
-              per_page: 100,
-            });
-            const scopeJob = jobs.find((job) => job.name === "Detect iOS scan scope");
-            const scanJob = jobs.find((job) => job.name === "Scan iOS dead code");
-            const scanSkipped =
-              scopeJob?.conclusion === "success" && scanJob?.conclusion === "skipped";
-            if (scanSkipped) {
-              core.info(`Skipping intentionally omitted Periphery scan for PR #${pr.number}.`);
-            }
-
-            const artifacts = scanSkipped
-              ? []
-              : await github.paginate(github.rest.actions.listWorkflowRunArtifacts, {
-                  owner,
-                  repo,
-                  run_id: run.id,
-                  per_page: 100,
-                });
-
-            const readReport = async () => {
-              if (scanSkipped) {
-                return;
-              }
-              const artifactName = `ios-periphery-dead-code-${run.id}-${run.run_attempt}`;
-              const artifact = artifacts.find((item) => item.name === artifactName);
-              if (!artifact) {
-                core.warning(`No ${artifactName} artifact found.`);
-                return;
-              }
-              if (artifact.expired) {
-                core.warning(`${artifactName} artifact expired.`);
-                return;
-              }
-
-              const maxArchiveBytes = 1024 * 1024;
-              const archiveSize = Number(artifact.size_in_bytes);
-              if (!Number.isSafeInteger(archiveSize) || archiveSize < 0 || archiveSize > maxArchiveBytes) {
-                core.warning(`Skipping ${artifactName}; compressed artifact size ${artifact.size_in_bytes ?? "unknown"} exceeds the ${maxArchiveBytes} byte limit.`);
-                return;
-              }
-
-              const archive = await github.rest.actions.downloadArtifact({
-                owner,
-                repo,
-                artifact_id: artifact.id,
-                archive_format: "zip",
-              });
-
-              const dir = fs.mkdtempSync(path.join(os.tmpdir(), "ios-periphery-"));
-              const archivePath = path.join(dir, "artifact.zip");
-              const archiveBuffer = Buffer.from(archive.data);
-              fs.writeFileSync(archivePath, archiveBuffer);
-
-              const allowedArtifactFiles = new Set([
-                "periphery.json",
-                "periphery.status",
-                "periphery.stderr.log",
-                "periphery.stdout.json",
-                "should-fail.txt",
-              ]);
-              const maxEntries = allowedArtifactFiles.size;
-              const maxEntryBytes = 2 * 1024 * 1024;
-              const maxTotalBytes = 4 * 1024 * 1024;
-
-              const readUInt16 = (offset) => archiveBuffer.readUInt16LE(offset);
-              const readUInt32 = (offset) => archiveBuffer.readUInt32LE(offset);
-              const findEndOfCentralDirectoryOffset = () => {
-                const minimumOffset = Math.max(0, archiveBuffer.length - 0xffff - 22);
-                for (let offset = archiveBuffer.length - 22; offset >= minimumOffset; offset -= 1) {
-                  if (readUInt32(offset) === 0x06054b50) {
-                    return offset;
-                  }
-                }
-                return -1;
-              };
-
-              const endOfCentralDirectoryOffset = findEndOfCentralDirectoryOffset();
-              if (endOfCentralDirectoryOffset < 0) {
-                core.warning(`Skipping ${artifactName}; ZIP end-of-central-directory record was not found.`);
-                return;
-              }
-              const entryCount = readUInt16(endOfCentralDirectoryOffset + 10);
-              const centralDirectorySize = readUInt32(endOfCentralDirectoryOffset + 12);
-              const centralDirectoryOffset = readUInt32(endOfCentralDirectoryOffset + 16);
-              if (entryCount < 1 || entryCount > maxEntries) {
-                core.warning(`Skipping ${artifactName}; artifact has ${entryCount} entries.`);
-                return;
-              }
-              if (
-                centralDirectoryOffset + centralDirectorySize > archiveBuffer.length ||
-                readUInt32(centralDirectoryOffset) !== 0x02014b50
-              ) {
-                core.warning(`Skipping ${artifactName}; invalid ZIP central directory.`);
-                return;
-              }
-
-              const entries = new Map();
-              let totalUncompressedSize = 0;
-              let offset = centralDirectoryOffset;
-              for (let index = 0; index < entryCount; index += 1) {
-                if (offset + 46 > archiveBuffer.length || readUInt32(offset) !== 0x02014b50) {
-                  core.warning(`Skipping ${artifactName}; invalid central directory entry.`);
-                  return;
-                }
-
-                const compressionMethod = readUInt16(offset + 10);
-                const generalPurposeBitFlag = readUInt16(offset + 8);
-                const compressedSize = readUInt32(offset + 20);
-                const uncompressedSize = readUInt32(offset + 24);
-                const fileNameLength = readUInt16(offset + 28);
-                const extraLength = readUInt16(offset + 30);
-                const commentLength = readUInt16(offset + 32);
-                const externalAttributes = readUInt32(offset + 38);
-                const nameStart = offset + 46;
-                const nameEnd = nameStart + fileNameLength;
-                const nextOffset = nameEnd + extraLength + commentLength;
-                if (nextOffset > archiveBuffer.length) {
-                  core.warning(`Skipping ${artifactName}; central directory entry exceeds archive bounds.`);
-                  return;
-                }
-
-                const name = archiveBuffer.toString("utf8", nameStart, nameEnd);
-                const mode = externalAttributes >>> 16;
-                const fileType = mode & 0o170000;
-                const isRegularFile = fileType === 0 || fileType === 0o100000;
-                const invalidName =
-                  !allowedArtifactFiles.has(name) ||
-                  name.includes("/") ||
-                  name.includes("\\") ||
-                  name.includes("..") ||
-                  path.isAbsolute(name);
-                if (invalidName) {
-                  core.warning(`Skipping ${artifactName}; unexpected artifact entry ${name}.`);
-                  return;
-                }
-                if (!isRegularFile || name.endsWith("/")) {
-                  core.warning(`Skipping ${artifactName}; ${name} is not a regular file.`);
-                  return;
-                }
-                if (entries.has(name)) {
-                  core.warning(`Skipping ${artifactName}; duplicate artifact entry ${name}.`);
-                  return;
-                }
-                if (![0, 8].includes(compressionMethod)) {
-                  core.warning(`Skipping ${artifactName}; ${name} uses unsupported ZIP compression method ${compressionMethod}.`);
-                  return;
-                }
-                if ((generalPurposeBitFlag & 0x1) !== 0) {
-                  core.warning(`Skipping ${artifactName}; ${name} is encrypted.`);
-                  return;
-                }
-                if (compressedSize > maxEntryBytes || uncompressedSize > maxEntryBytes) {
-                  core.warning(`Skipping ${artifactName}; ${name} exceeds the per-file size limit.`);
-                  return;
-                }
-
-                totalUncompressedSize += uncompressedSize;
-                if (totalUncompressedSize > maxTotalBytes) {
-                  core.warning(`Skipping ${artifactName}; artifact exceeds the aggregate size limit.`);
-                  return;
-                }
-
-                entries.set(name, { uncompressedSize });
-                offset = nextOffset;
-              }
-
-              const files = new Map();
-              for (const [name, entry] of entries) {
-                const contents = childProcess.execFileSync("unzip", ["-p", archivePath, name], {
-                  encoding: "utf8",
-                  maxBuffer: Math.max(1, entry.uncompressedSize + 1024),
-                  timeout: 5000,
-                });
-                if (Buffer.byteLength(contents, "utf8") > maxEntryBytes) {
-                  core.warning(`Skipping ${artifactName}; ${name} exceeded the per-file size limit while reading.`);
-                  return;
-                }
-                files.set(name, contents);
-              }
-
-              const read = (name) => {
-                return files.get(name) ?? "";
-              };
-
-              const status = Number(read("periphery.status").trim() || "1");
-              let findings = null;
-              for (const name of ["periphery.json", "periphery.stdout.json"]) {
-                try {
-                  const parsed = JSON.parse(read(name));
-                  const validFindings =
-                    Array.isArray(parsed) &&
-                    parsed.every(
-                      (finding) =>
-                        finding !== null &&
-                        typeof finding === "object" &&
-                        !Array.isArray(finding),
-                    );
-                  if (validFindings) {
-                    findings = parsed;
-                    break;
-                  }
-                } catch {}
-              }
-              return { findings, status };
-            };
-            const report = await readReport();
-            const status = report?.status ?? 1;
-            const findings = report?.findings ?? null;
-
-            const sanitizeCell = (value) => {
-              const normalized = String(value ?? "")
-                .replace(/[\u0000-\u001f\u007f-\u009f]/gu, " ")
-                .replace(/[\u200b-\u200f\u202a-\u202e\u2060\u2066-\u2069\ufeff]/gu, "")
-                .replace(/\s+/gu, " ")
-                .trim();
-              const maxEncodedLength = 180;
-              let escaped = "";
-              for (const character of normalized) {
-                const encoded =
-                  character === "`"
-                    ? "'"
-                    : character === "|"
-                      ? "\\|"
-                      : character;
-                if (escaped.length + encoded.length > maxEncodedLength) {
-                  break;
-                }
-                escaped += encoded;
-              }
-              return `\`${escaped || "-"}\``;
-            };
-
-            const rows = (findings ?? []).map((finding) => {
-              const location = String(finding.location ?? "");
-              const [file, line] = location.split(":");
-              return {
-                file: file ? `apps/ios/${file}` : "",
-                line: line || "",
-                kind: String(finding.kind ?? ""),
-                name: String(finding.name ?? ""),
-              };
-            });
-
-            let mode = "failure";
-            let body = `${marker}\n`;
-            if (scanSkipped) {
-              mode = "skipped";
-              body += [
-                "### iOS Periphery",
-                "",
-                "Periphery scan skipped because the pull request is a draft or no longer touches iOS scan scope.",
-              ].join("\n");
-            } else if (findings === null) {
-              body += [
-                "### iOS Periphery",
-                "",
-                "Periphery did not complete or its report could not be safely read. Check the workflow run for details.",
-              ].join("\n");
-            } else if (rows.length === 0 && status === 0) {
-              mode = "success";
-              body += [
-                "### iOS Periphery",
-                "",
-                "No dead Swift code found.",
-              ].join("\n");
-            } else if (rows.length > 0) {
-              const shown = rows.slice(0, 50);
-              body += [
-                "### iOS Periphery",
-                "",
-                `Found ${rows.length} dead Swift code ${rows.length === 1 ? "symbol" : "symbols"}. Remove the code or add a narrow Periphery exemption with a comment explaining why it must stay.`,
-                "",
-                "| File | Line | Kind | Name |",
-                "| --- | ---: | --- | --- |",
-                ...shown.map((row) => `| ${sanitizeCell(row.file)} | ${sanitizeCell(row.line)} | ${sanitizeCell(row.kind)} | ${sanitizeCell(row.name)} |`),
-                rows.length > shown.length ? "" : null,
-                rows.length > shown.length ? `Showing first ${shown.length}; full JSON is in the workflow artifact.` : null,
-              ].filter(Boolean).join("\n");
-            } else {
-              body += [
-                "### iOS Periphery",
-                "",
-                "Periphery exited with a non-zero status before producing findings. Check the workflow artifact for stdout/stderr.",
-              ].join("\n");
-            }
-            body += "\n";
-            const maxCommentChars = 60_000;
-            if (body.length > maxCommentChars) {
-              body = [
-                marker,
-                "### iOS Periphery",
-                "",
-                `Found ${rows.length} dead Swift code ${rows.length === 1 ? "symbol" : "symbols"}. The rendered report exceeded the safe comment limit; use the workflow artifact for details.`,
-                "",
-              ].join("\n");
-            }
-
-            const comments = await github.paginate(github.rest.issues.listComments, {
-              owner,
-              repo,
-              issue_number: livePull.data.number,
-              per_page: 100,
-            });
-            const existing = comments.find(
-              (comment) =>
-                comment.user?.login === "github-actions[bot]" &&
-                comment.body?.includes(marker),
-            );
-
-            if (!existing && ["skipped", "success"].includes(mode)) {
-              core.info(`No existing Periphery comment and scan ${mode}; skipping comment.`);
-              return;
-            }
-
-            const currentPull = await github.rest.pulls.get({
-              owner,
-              repo,
-              pull_number: pr.number,
-            });
-            if (
-              currentPull.data.state !== "open" ||
-              currentPull.data.base?.repo?.full_name !== repository ||
-              currentPull.data.head?.sha !== run.head_sha
-            ) {
-              core.info(`Skipping stale run ${run.id}; PR #${pr.number} changed before comment update.`);
-              return;
-            }
-
-            const workflowRuns = await github.paginate(github.rest.actions.listWorkflowRuns, {
-              owner,
-              repo,
-              workflow_id: run.workflow_id,
-              event: "pull_request",
-              head_sha: run.head_sha,
-              per_page: 100,
-            });
-            const supersedingRun = workflowRuns.find(
-              (candidate) =>
-                (candidate.id === run.id ||
-                  candidate.pull_requests?.some(
-                    (candidatePull) => candidatePull.number === pr.number,
-                  )) &&
-                (candidate.run_number > run.run_number ||
-                  (candidate.run_number === run.run_number &&
-                    candidate.run_attempt > run.run_attempt)),
-            );
-            if (supersedingRun) {
-              core.info(`Skipping superseded run ${run.id} attempt ${run.run_attempt}; run ${supersedingRun.id} attempt ${supersedingRun.run_attempt} is newer.`);
-              return;
-            }
-
-            if (existing) {
-              await github.rest.issues.updateComment({
-                owner,
-                repo,
-                comment_id: existing.id,
-                body,
-              });
-              return;
-            }
-
-            await github.rest.issues.createComment({
-              owner,
-              repo,
-              issue_number: livePull.data.number,
-              body,
-            });
--- a/.github/workflows/ios-periphery.yml
+++ b/.github/workflows/ios-periphery.yml
@@ -1,229 +0,0 @@
-name: iOS Periphery Dead Code
-
-on:
-  pull_request:
-    types: [opened, synchronize, reopened, ready_for_review, converted_to_draft]
-  workflow_dispatch:
-
-concurrency:
-  group: ios-periphery-${{ github.workflow }}-${{ github.event.pull_request.number || github.sha }}
-  cancel-in-progress: true
-
-env:
-  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"
-
-permissions:
-  contents: read
-  pull-requests: read
-
-jobs:
-  scope:
-    name: Detect iOS scan scope
-    runs-on: ubuntu-24.04
-    outputs:
-      should-scan: ${{ steps.scope.outputs.should-scan }}
-    steps:
-      - name: Detect changed paths
-        id: scope
-        uses: actions/github-script@v9
-        with:
-          script: |
-            if (context.eventName === "workflow_dispatch") {
-              core.setOutput("should-scan", "true");
-              return;
-            }
-            if (context.payload.pull_request?.draft) {
-              core.setOutput("should-scan", "false");
-              return;
-            }
-
-            const files = await github.paginate(github.rest.pulls.listFiles, {
-              owner: context.repo.owner,
-              repo: context.repo.repo,
-              pull_number: context.payload.pull_request.number,
-              per_page: 100,
-            });
-            const isScanPath = (filename) =>
-              typeof filename === "string" && (
-                filename.startsWith("apps/ios/") ||
-                filename === ".github/workflows/ios-periphery.yml" ||
-                filename === ".github/workflows/ios-periphery-comment.yml" ||
-                filename === "config/swiftformat" ||
-                filename === "config/swiftlint.yml"
-              );
-            const shouldScan = files.some(
-              ({ filename, previous_filename: previousFilename }) =>
-                isScanPath(filename) || isScanPath(previousFilename)
-            );
-            core.setOutput("should-scan", String(shouldScan));
-
-  scan:
-    name: Scan iOS dead code
-    needs: scope
-    if: ${{ needs.scope.outputs.should-scan == 'true' }}
-    runs-on: ${{ github.event_name == 'workflow_dispatch' && 'macos-26' || (github.repository == 'openclaw/openclaw' && 'blacksmith-12vcpu-macos-26' || 'macos-26') }}
-    timeout-minutes: 45
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v6
-        with:
-          fetch-depth: 1
-          fetch-tags: false
-          persist-credentials: false
-          submodules: false
-
-      - name: Verify Xcode
-        run: |
-          set -euo pipefail
-          for xcode_app in /Applications/Xcode_26.5.app /Applications/Xcode-26.5.0.app; do
-            if [ -d "$xcode_app/Contents/Developer" ]; then
-              sudo xcode-select -s "$xcode_app/Contents/Developer"
-              break
-            fi
-          done
-          xcodebuild -version
-          xcode_version="$(xcodebuild -version | awk 'NR == 1 { print $2 }')"
-          if [[ "$xcode_version" != 26.* ]]; then
-            echo "error: expected Xcode 26.x, got $xcode_version" >&2
-            exit 1
-          fi
-          swift --version
-
-      - name: Setup Node environment
-        uses: ./.github/actions/setup-node-env
-        with:
-          install-bun: "false"
-
-      - name: Install iOS Swift tooling
-        run: brew install xcodegen swiftformat swiftlint periphery
-
-      - name: Generate iOS project
-        run: |
-          set -euo pipefail
-          ./scripts/ios-configure-signing.sh
-          ./scripts/ios-write-version-xcconfig.sh
-          cd apps/ios
-          xcodegen generate
-
-      - name: Run Periphery
-        run: |
-          set -euo pipefail
-          output_dir="$RUNNER_TEMP/ios-periphery"
-          mkdir -p "$output_dir"
-          cd apps/ios
-          set +e
-          periphery scan \
-            --config .periphery.yml \
-            --strict \
-            --format json \
-            --write-results "$output_dir/periphery.json" \
-            >"$output_dir/periphery.stdout.json" \
-            2>"$output_dir/periphery.stderr.log"
-          periphery_status="$?"
-          set -e
-          printf '%s\n' "$periphery_status" >"$output_dir/periphery.status"
-          if [ ! -s "$output_dir/periphery.json" ]; then
-            cp "$output_dir/periphery.stdout.json" "$output_dir/periphery.json"
-          fi
-
-      - name: Build Periphery report
-        run: |
-          set -euo pipefail
-          node <<'NODE'
-          const fs = require("node:fs");
-          const path = require("node:path");
-
-          const outputDir = path.join(process.env.RUNNER_TEMP, "ios-periphery");
-          const read = (name) => {
-            const file = path.join(outputDir, name);
-            return fs.existsSync(file) ? fs.readFileSync(file, "utf8") : "";
-          };
-
-          const status = Number(read("periphery.status").trim() || "1");
-          let findings = null;
-          for (const name of ["periphery.json", "periphery.stdout.json"]) {
-            try {
-              const parsed = JSON.parse(read(name));
-              if (Array.isArray(parsed)) {
-                findings = parsed;
-                break;
-              }
-            } catch {}
-          }
-
-          const escapeCommandData = (value) =>
-            String(value ?? "")
-              .replaceAll("%", "%25")
-              .replaceAll("\r", "%0D")
-              .replaceAll("\n", "%0A");
-          const escapeCommandProperty = (value) =>
-            escapeCommandData(value)
-              .replaceAll(":", "%3A")
-              .replaceAll(",", "%2C");
-
-          const rows = (findings ?? []).map((finding) => {
-            const location = String(finding.location ?? "");
-            const [file, line] = location.split(":");
-            const repoFile = file ? `apps/ios/${file}` : "";
-            return {
-              file: repoFile,
-              line: line || "",
-              kind: String(finding.kind ?? ""),
-              name: String(finding.name ?? ""),
-            };
-          });
-
-          for (const row of rows) {
-            if (!row.file) continue;
-            const line = row.line ? `,line=${escapeCommandProperty(row.line)}` : "";
-            const title = `${row.kind || "Unused code"} ${row.name}`.trim();
-            console.log(`::error file=${escapeCommandProperty(row.file)}${line},title=Dead Swift code::${escapeCommandData(title)}`);
-          }
-
-          let shouldFail = "1";
-          let summary = "";
-
-          if (findings === null) {
-            summary = [
-              "### iOS Periphery",
-              "",
-              "Periphery did not complete. Check the workflow artifact for stdout/stderr.",
-            ].join("\n");
-          } else if (rows.length === 0 && status === 0) {
-            shouldFail = "0";
-            summary = [
-              "### iOS Periphery",
-              "",
-              "No dead Swift code found.",
-            ].join("\n");
-          } else if (rows.length > 0) {
-            summary = [
-              "### iOS Periphery",
-              "",
-              `Found ${rows.length} dead Swift code ${rows.length === 1 ? "symbol" : "symbols"}. See the PR comment or workflow artifact for details.`,
-            ].join("\n");
-          } else {
-            summary = [
-              "### iOS Periphery",
-              "",
-              "Periphery exited with a non-zero status before producing findings. Check the workflow artifact for stdout/stderr.",
-            ].join("\n");
-          }
-
-          fs.writeFileSync(path.join(outputDir, "should-fail.txt"), `${shouldFail}\n`);
-          fs.appendFileSync(process.env.GITHUB_STEP_SUMMARY, `${summary.trim()}\n`);
-          NODE
-
-      - name: Upload Periphery report
-        if: always()
-        uses: actions/upload-artifact@v7
-        with:
-          name: ios-periphery-dead-code-${{ github.run_id }}-${{ github.run_attempt }}
-          path: ${{ runner.temp }}/ios-periphery
-          if-no-files-found: warn
-          retention-days: 14
-
-      - name: Fail on dead code
-        run: |
-          set -euo pipefail
-          test "$(cat "$RUNNER_TEMP/ios-periphery/should-fail.txt")" = "0"
--- a/.github/workflows/maturity-scorecard.yml
+++ b/.github/workflows/maturity-scorecard.yml
@@ -1,113 +0,0 @@
-name: Maturity scorecard
-
-on:
-  workflow_dispatch:
-    inputs:
-      source_run_id:
-        description: Optional workflow run id containing qa-evidence.json artifacts
-        required: false
-        type: string
-      artifact_pattern:
-        description: Artifact name pattern to download from source_run_id
-        required: false
-        default: "*qa*"
-        type: string
-  push:
-    branches: [main]
-    paths:
-      - "taxonomy.yaml"
-      - "docs/maturity-scores.yaml"
-      - "docs/maturity-scorecard.md"
-      - "docs/taxonomy.md"
-      - "docs/taxonomy-outline.md"
-      - "scripts/render-maturity-docs.mjs"
-      - "package.json"
-      - ".github/workflows/maturity-scorecard.yml"
-  pull_request:
-    paths:
-      - "taxonomy.yaml"
-      - "docs/maturity-scores.yaml"
-      - "docs/maturity-scorecard.md"
-      - "docs/taxonomy.md"
-      - "docs/taxonomy-outline.md"
-      - "scripts/render-maturity-docs.mjs"
-      - "package.json"
-      - ".github/workflows/maturity-scorecard.yml"
-
-permissions:
-  actions: read
-  contents: read
-
-concurrency:
-  group: ${{ format('{0}-{1}', github.workflow, github.ref) }}
-  cancel-in-progress: true
-
-env:
-  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"
-  NODE_VERSION: "24.x"
-
-jobs:
-  render:
-    name: Render maturity docs
-    runs-on: ubuntu-24.04
-    timeout-minutes: 20
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v6
-        with:
-          fetch-depth: 1
-          fetch-tags: false
-          persist-credentials: false
-          submodules: false
-
-      - name: Setup Node environment
-        uses: ./.github/actions/setup-node-env
-        with:
-          node-version: ${{ env.NODE_VERSION }}
-          install-bun: "false"
-
-      - name: Check committed maturity docs
-        run: pnpm maturity:check
-
-      - name: Download QA evidence artifacts
-        if: ${{ github.event_name == 'workflow_dispatch' && inputs.source_run_id != '' }}
-        env:
-          GH_TOKEN: ${{ github.token }}
-          SOURCE_RUN_ID: ${{ inputs.source_run_id }}
-          ARTIFACT_PATTERN: ${{ inputs.artifact_pattern }}
-        run: |
-          set -euo pipefail
-          mkdir -p .artifacts/maturity-evidence
-          gh run download "$SOURCE_RUN_ID" \
-            --repo "$GITHUB_REPOSITORY" \
-            --pattern "$ARTIFACT_PATTERN" \
-            --dir .artifacts/maturity-evidence
-          find .artifacts/maturity-evidence -name qa-evidence.json -print
-
-      - name: Render artifact docs
-        run: |
-          set -euo pipefail
-          args=(--output-dir .artifacts/maturity-docs)
-          if find .artifacts/maturity-evidence -name qa-evidence.json -print -quit 2>/dev/null | grep -q .; then
-            args+=(--evidence-dir .artifacts/maturity-evidence)
-          fi
-          pnpm maturity:render -- "${args[@]}"
-          {
-            echo "### Maturity scorecard docs"
-            echo
-            echo "- Committed docs check: passed"
-            echo "- Artifact docs: \`.artifacts/maturity-docs\`"
-            if find .artifacts/maturity-evidence -name qa-evidence.json -print -quit 2>/dev/null | grep -q .; then
-              echo "- QA evidence: included"
-            else
-              echo "- QA evidence: none downloaded"
-            fi
-          } >> "$GITHUB_STEP_SUMMARY"
-
-      - name: Upload maturity docs artifact
-        uses: actions/upload-artifact@v7
-        with:
-          name: maturity-scorecard-docs-${{ github.run_id }}-${{ github.run_attempt }}
-          path: .artifacts/maturity-docs/
-          retention-days: 30
-          if-no-files-found: error
--- a/.github/workflows/openclaw-cross-os-release-checks-reusable.yml
+++ b/.github/workflows/openclaw-cross-os-release-checks-reusable.yml
@@ -407,28 +407,12 @@ jobs:
          const path = require("node:path");

          const packageDir = process.env.PACKAGE_DIR;
-          function resolveTarballFileName(value, label) {
-            const fileName = typeof value === "string" ? value.trim() : "";
-            if (
-              !fileName.endsWith(".tgz") ||
-              fileName.includes("\0") ||
-              fileName !== path.basename(fileName) ||
-              fileName !== path.win32.basename(fileName)
-            ) {
-              throw new Error(`${label} must be a local .tgz filename.`);
-            }
-            return fileName;
-          }
          const requestedFileName = process.env.INPUT_CANDIDATE_FILE_NAME.trim();
          const files = fs.readdirSync(packageDir).filter((file) => file.endsWith(".tgz"));
-          const selectedCandidateFileName = requestedFileName || (files.length === 1 ? files[0] : "");
-          if (!selectedCandidateFileName) {
+          const candidateFileName = requestedFileName || (files.length === 1 ? files[0] : "");
+          if (!candidateFileName) {
            throw new Error(`Expected exactly one candidate .tgz in ${packageDir}; found ${files.length}.`);
          }
-          const candidateFileName = resolveTarballFileName(
-            selectedCandidateFileName,
-            "candidate_file_name",
-          );
          if (!fs.existsSync(path.join(packageDir, candidateFileName))) {
            throw new Error(`Provided candidate artifact does not contain ${candidateFileName}.`);
          }
@@ -490,23 +474,12 @@ jobs:
        run: |
          node <<'NODE' >>"$GITHUB_OUTPUT"
          const fs = require("node:fs");
-          const path = require("node:path");
-          function resolveTarballFileName(value, label) {
-            const fileName = typeof value === "string" ? value.trim() : "";
-            if (
-              !fileName.endsWith(".tgz") ||
-              fileName.includes("\0") ||
-              fileName !== path.basename(fileName) ||
-              fileName !== path.win32.basename(fileName)
-            ) {
-              throw new Error(`${label} must be a local .tgz filename.`);
-            }
-            return fileName;
-          }
          const payload = JSON.parse(fs.readFileSync(process.env.BASELINE_PACK_JSON, "utf8"));
          const entry = Array.isArray(payload) ? payload.at(-1) : null;
-          const fileName = resolveTarballFileName(entry?.filename, "Baseline npm pack filename");
-          process.stdout.write(`file_name=${fileName}\n`);
+          if (!entry?.filename) {
+            throw new Error("Baseline npm pack did not produce a filename.");
+          }
+          process.stdout.write(`file_name=${entry.filename}\n`);
          NODE

      - name: Upload candidate artifact
--- a/.github/workflows/openclaw-live-and-e2e-checks-reusable.yml
+++ b/.github/workflows/openclaw-live-and-e2e-checks-reusable.yml
@@ -2222,11 +2222,7 @@ jobs:
          case "${{ matrix.suite_id }}" in
            live-cli-backend-docker)
              echo "OPENCLAW_LIVE_CLI_BACKEND_MODEL=claude-cli/claude-sonnet-4-6" >> "$GITHUB_ENV"
-              if [[ -n "${OPENCLAW_CLAUDE_CREDENTIALS_JSON:-}" || -n "${CLAUDE_CODE_OAUTH_TOKEN:-}" ]]; then
-                echo "OPENCLAW_LIVE_CLI_BACKEND_AUTH=subscription" >> "$GITHUB_ENV"
-              else
-                echo "OPENCLAW_LIVE_CLI_BACKEND_AUTH=api-key" >> "$GITHUB_ENV"
-              fi
+              echo "OPENCLAW_LIVE_CLI_BACKEND_AUTH=api-key" >> "$GITHUB_ENV"
              echo "OPENCLAW_LIVE_CLI_BACKEND_DEBUG=1" >> "$GITHUB_ENV"
              echo "OPENCLAW_CLI_BACKEND_LOG_OUTPUT=1" >> "$GITHUB_ENV"
              echo "OPENCLAW_TEST_CONSOLE=1" >> "$GITHUB_ENV"
@@ -2451,11 +2447,7 @@ jobs:
          case "${{ matrix.suite_id }}" in
            live-cli-backend-docker)
              echo "OPENCLAW_LIVE_CLI_BACKEND_MODEL=claude-cli/claude-sonnet-4-6" >> "$GITHUB_ENV"
-              if [[ -n "${OPENCLAW_CLAUDE_CREDENTIALS_JSON:-}" || -n "${CLAUDE_CODE_OAUTH_TOKEN:-}" ]]; then
-                echo "OPENCLAW_LIVE_CLI_BACKEND_AUTH=subscription" >> "$GITHUB_ENV"
-              else
-                echo "OPENCLAW_LIVE_CLI_BACKEND_AUTH=api-key" >> "$GITHUB_ENV"
-              fi
+              echo "OPENCLAW_LIVE_CLI_BACKEND_AUTH=api-key" >> "$GITHUB_ENV"
              echo "OPENCLAW_LIVE_CLI_BACKEND_DEBUG=1" >> "$GITHUB_ENV"
              echo "OPENCLAW_CLI_BACKEND_LOG_OUTPUT=1" >> "$GITHUB_ENV"
              echo "OPENCLAW_TEST_CONSOLE=1" >> "$GITHUB_ENV"
--- a/.github/workflows/openclaw-npm-release.yml
+++ b/.github/workflows/openclaw-npm-release.yml
@@ -223,25 +223,10 @@ jobs:
          set -euo pipefail
          PACK_OUTPUT="$RUNNER_TEMP/npm-pack-output.txt"
          npm pack --json 2>&1 | tee "$PACK_OUTPUT"
-          PACK_NAME="$(node - "$PACK_OUTPUT" <<'NODE'
+          PACK_PATH="$(node - "$PACK_OUTPUT" <<'NODE'
          const fs = require("node:fs");
-          const path = require("node:path");
          const input = fs.readFileSync(process.argv[2], "utf8");

-          function resolveTarballFileName(value) {
-            const fileName = typeof value === "string" ? value.trim() : "";
-            if (
-              !fileName.endsWith(".tgz") ||
-              fileName.includes("\0") ||
-              fileName !== path.basename(fileName) ||
-              fileName !== path.win32.basename(fileName)
-            ) {
-              console.error(`npm pack reported unsafe tarball filename ${JSON.stringify(fileName)}.`);
-              process.exit(1);
-            }
-            return fileName;
-          }
-
          function arrayEndFrom(start) {
            let depth = 0;
            let inString = false;
@@ -281,8 +266,8 @@ jobs:
            try {
              const parsed = JSON.parse(input.slice(start, end));
              const first = Array.isArray(parsed) ? parsed[0] : null;
-              if (first && Object.prototype.hasOwnProperty.call(first, "filename")) {
-                process.stdout.write(resolveTarballFileName(first.filename));
+              if (first && typeof first.filename === "string" && first.filename) {
+                process.stdout.write(first.filename);
                process.exit(0);
              }
            } catch {
@@ -294,7 +279,6 @@ jobs:
          process.exit(1);
          NODE
          )"
-          PACK_PATH="$PWD/$PACK_NAME"
          if [[ -z "$PACK_PATH" || ! -f "$PACK_PATH" ]]; then
            echo "npm pack did not produce a tarball file." >&2
            exit 1
@@ -306,7 +290,7 @@ jobs:
          else
            RELEASE_TAG="${RELEASE_REF}"
          fi
-          TARBALL_NAME="$PACK_NAME"
+          TARBALL_NAME="$(basename "$PACK_PATH")"
          TARBALL_SHA256="$(sha256sum "$PACK_PATH" | awk '{print $1}')"
          ARTIFACT_DIR="$RUNNER_TEMP/openclaw-npm-preflight"
          rm -rf "$ARTIFACT_DIR"
--- a/.github/workflows/openclaw-performance.yml
+++ b/.github/workflows/openclaw-performance.yml
@@ -56,7 +56,6 @@ concurrency:
 env:
  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"
  OCM_VERSION: v0.2.15
-  OCM_LINUX_X64_SHA256: b849b8de5d77e97e0df9319703254ae95e29d7f26a7552ea79bf173ff110ea0a
  KOVA_REPOSITORY: openclaw/Kova
  PERFORMANCE_MODEL_ID: gpt-5.5

@@ -188,20 +187,11 @@ jobs:
          set -euo pipefail
          KOVA_SRC="${RUNNER_TEMP}/kova-src"
          echo "KOVA_SRC=$KOVA_SRC" >> "$GITHUB_ENV"
-          mkdir -p "$HOME/.local/bin" "$(dirname "$KOVA_SRC")" "${RUNNER_TEMP}/ocm-install"
-
-          ocm_archive="${RUNNER_TEMP}/ocm-${OCM_VERSION}-x86_64-unknown-linux-gnu.tar.gz"
-          curl -fsSL --proto '=https' --tlsv1.2 --retry 3 --retry-delay 1 --retry-connrefused \
-            -o "$ocm_archive" \
-            "https://github.com/shakkernerd/ocm/releases/download/${OCM_VERSION}/ocm-x86_64-unknown-linux-gnu.tar.gz"
-          echo "${OCM_LINUX_X64_SHA256}  ${ocm_archive}" | sha256sum -c -
-          tar -xzf "$ocm_archive" -C "${RUNNER_TEMP}/ocm-install"
-          install -m 0755 "${RUNNER_TEMP}/ocm-install/ocm" "$HOME/.local/bin/ocm"
-
-          git init -b main "$KOVA_SRC"
-          git -C "$KOVA_SRC" remote add origin "https://github.com/${KOVA_REPOSITORY}.git"
-          git -C "$KOVA_SRC" fetch --filter=blob:none --depth 1 origin "$KOVA_REF"
-          git -C "$KOVA_SRC" checkout --detach FETCH_HEAD
+          mkdir -p "$HOME/.local/bin" "$(dirname "$KOVA_SRC")"
+          curl -fsSL https://raw.githubusercontent.com/shakkernerd/ocm/main/install.sh \
+            | bash -s -- --version "$OCM_VERSION" --prefix "$HOME/.local" --force
+          git clone --filter=blob:none "https://github.com/${KOVA_REPOSITORY}.git" "$KOVA_SRC"
+          git -C "$KOVA_SRC" checkout "$KOVA_REF"
          cat > "$HOME/.local/bin/kova" <<EOF
          #!/usr/bin/env bash
          export KOVA_HOME="${KOVA_HOME}"
--- a/.github/workflows/openclaw-release-publish.yml
+++ b/.github/workflows/openclaw-release-publish.yml
@@ -1112,14 +1112,13 @@ jobs:
          }

          append_release_proof_to_github_release() {
-            local release_version body_file notes_file evidence_path tarball integrity telegram_line clawhub_line clawhub_bootstrap_line clawhub_runtime_state_path windows_line
+            local release_version body_file notes_file tarball integrity telegram_line clawhub_line clawhub_bootstrap_line clawhub_runtime_state_path windows_line

            release_version="${RELEASE_TAG#v}"
            body_file="${RUNNER_TEMP}/release-body.md"
            notes_file="${RUNNER_TEMP}/release-notes-with-proof.md"
-            evidence_path="${POSTPUBLISH_EVIDENCE_DIR}/release-postpublish-evidence.json"
-            tarball="$(jq -er '.openclawNpmTarball | select(type == "string" and length > 0)' "${evidence_path}")"
-            integrity="$(jq -er '.openclawNpmIntegrity | select(type == "string" and length > 0)' "${evidence_path}")"
+            tarball="$(npm view "openclaw@${release_version}" dist.tarball --json | jq -r '.')"
+            integrity="$(npm view "openclaw@${release_version}" dist.integrity --json | jq -r '.')"
            gh release view "${RELEASE_TAG}" --repo "$GITHUB_REPOSITORY" --json body --jq .body > "${body_file}"

            if [[ -n "${NPM_TELEGRAM_RUN_ID// }" ]]; then
--- a/.github/workflows/security-sensitive-guard.yml
+++ b/.github/workflows/security-sensitive-guard.yml
@@ -1,114 +0,0 @@
-name: Security Sensitive Guard
-
-on:
-  pull_request_target: # zizmor: ignore[dangerous-triggers] checks trusted base script only; never checks out PR head
-    types: [opened, reopened, synchronize, ready_for_review]
-
-permissions:
-  contents: read
-  pull-requests: write
-  issues: write
-
-env:
-  # Temporary rollout bridge for PRs opened before this workflow's script landed.
-  # Remove once the pre-rollout PR set has drained.
-  OPENCLAW_SECURITY_SENSITIVE_GUARD_ROLLOUT_SHA: 5d9c010628ea4de3492a12e32f9be5b8c5dfa9ed
-
-concurrency:
-  group: security-sensitive-guard-${{ github.event.pull_request.number }}
-  cancel-in-progress: true
-
-jobs:
-  security-sensitive-guard-detect:
-    if: ${{ !github.event.pull_request.draft }}
-    runs-on: ubuntu-24.04
-    timeout-minutes: 5
-    steps:
-      - name: Check security-sensitive guard rollout eligibility
-        id: rollout
-        env:
-          GH_TOKEN: ${{ github.token }}
-          PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
-        run: |
-          status="$(
-            gh api \
-              "repos/${GITHUB_REPOSITORY}/compare/${OPENCLAW_SECURITY_SENSITIVE_GUARD_ROLLOUT_SHA}...${PR_BASE_SHA}" \
-              --jq '.status'
-          )"
-          case "$status" in
-            ahead|identical)
-              echo "ready=true" >> "$GITHUB_OUTPUT"
-              ;;
-            behind|diverged)
-              echo "ready=false" >> "$GITHUB_OUTPUT"
-              echo "::notice::Skipping security-sensitive guard for a PR base that predates rollout commit ${OPENCLAW_SECURITY_SENSITIVE_GUARD_ROLLOUT_SHA}."
-              ;;
-            *)
-              echo "Unexpected compare status for security-sensitive guard rollout: $status" >&2
-              exit 1
-              ;;
-          esac
-
-      - name: Check out trusted base workflow scripts
-        if: steps.rollout.outputs.ready == 'true'
-        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
-        with:
-          ref: ${{ github.workflow_sha }}
-          persist-credentials: false
-
-      - name: Detect security-sensitive changes
-        if: steps.rollout.outputs.ready == 'true'
-        env:
-          GITHUB_TOKEN: ${{ github.token }}
-          OPENCLAW_SECURITY_APPROVERS: vincentkoc,steipete,joshavant
-          OPENCLAW_SECURITY_SENSITIVE_GUARD_MODE: detect
-          OPENCLAW_SECURITY_TEAM_SLUG: openclaw-secops
-        run: node scripts/github/security-sensitive-guard.mjs
-
-  security-sensitive-guard:
-    if: ${{ !github.event.pull_request.draft && always() }}
-    needs:
-      - security-sensitive-guard-detect
-    runs-on: ubuntu-24.04
-    timeout-minutes: 5
-    steps:
-      - name: Check security-sensitive guard rollout eligibility
-        id: rollout
-        env:
-          GH_TOKEN: ${{ github.token }}
-          PR_BASE_SHA: ${{ github.event.pull_request.base.sha }}
-        run: |
-          status="$(
-            gh api \
-              "repos/${GITHUB_REPOSITORY}/compare/${OPENCLAW_SECURITY_SENSITIVE_GUARD_ROLLOUT_SHA}...${PR_BASE_SHA}" \
-              --jq '.status'
-          )"
-          case "$status" in
-            ahead|identical)
-              echo "ready=true" >> "$GITHUB_OUTPUT"
-              ;;
-            behind|diverged)
-              echo "ready=false" >> "$GITHUB_OUTPUT"
-              echo "::notice::Skipping security-sensitive guard for a PR base that predates rollout commit ${OPENCLAW_SECURITY_SENSITIVE_GUARD_ROLLOUT_SHA}."
-              ;;
-            *)
-              echo "Unexpected compare status for security-sensitive guard rollout: $status" >&2
-              exit 1
-              ;;
-          esac
-
-      - name: Check out trusted base workflow scripts
-        if: steps.rollout.outputs.ready == 'true'
-        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
-        with:
-          ref: ${{ github.workflow_sha }}
-          persist-credentials: false
-
-      - name: Enforce security-sensitive guard
-        if: steps.rollout.outputs.ready == 'true'
-        env:
-          GITHUB_TOKEN: ${{ github.token }}
-          OPENCLAW_SECURITY_APPROVERS: vincentkoc,steipete,joshavant
-          OPENCLAW_SECURITY_SENSITIVE_GUARD_MODE: enforce
-          OPENCLAW_SECURITY_TEAM_SLUG: openclaw-secops
-        run: node scripts/github/security-sensitive-guard.mjs
--- a/.github/workflows/windows-blacksmith-testbox.yml
+++ b/.github/workflows/windows-blacksmith-testbox.yml
@@ -65,9 +65,7 @@ jobs:
          fi
          runner_ssh_port="${BLACKSMITH_SSH_PORT:-22}"

-          hydrating_response="$RUNNER_TEMP/testbox-hydrating.response"
-          hydrating_http_code="$(curl -sS -L --post302 --post303 -o "$hydrating_response" -w '%{http_code}' \
-            -X POST "${api_url}/api/testbox/phone-home" \
+          response="$(curl -s -f -L --post302 --post303 -X POST "${api_url}/api/testbox/phone-home" \
            -H "Content-Type: application/json" \
            -H "Authorization: Bearer ${auth_token}" \
            -d "{
@@ -79,15 +77,7 @@ jobs:
              \"working_directory\": \"${GITHUB_WORKSPACE}\",
              \"adopted_run_id\": \"${GITHUB_RUN_ID}\",
              \"metadata\": {}
-            }" || true)"
-
-          echo "phone_home_hydrating_http=${hydrating_http_code}"
-          if [[ ! "$hydrating_http_code" =~ ^2 ]]; then
-            echo "Blacksmith phone-home hydrating failed; response body:" >&2
-            cat "$hydrating_response" >&2 || true
-            exit 1
-          fi
-          response="$(cat "$hydrating_response")"
+            }" 2>/dev/null || true)"

          echo "$TESTBOX_ID" > "$state/testbox_id"
          echo "$installation_model_id" > "$state/installation_model_id"
@@ -110,14 +100,12 @@ jobs:
          fi

          ssh_public_key="$(cat "$state/ssh_public_key" 2>/dev/null || true)"
-          if [ -z "$ssh_public_key" ]; then
-            echo "Blacksmith phone-home did not return an SSH public key; testbox cannot accept CLI connections." >&2
-            exit 1
+          if [ -n "$ssh_public_key" ]; then
+            mkdir -p ~/.ssh
+            printf '%s\n' "$ssh_public_key" >> ~/.ssh/authorized_keys
+            chmod 700 ~/.ssh
+            chmod 600 ~/.ssh/authorized_keys
          fi
-          mkdir -p ~/.ssh
-          printf '%s\n' "$ssh_public_key" >> ~/.ssh/authorized_keys
-          chmod 700 ~/.ssh
-          chmod 600 ~/.ssh/authorized_keys

      - name: Checkout
        uses: actions/checkout@v6
@@ -173,11 +161,6 @@ jobs:
            -H "Authorization: Bearer ${auth_token}" \
            --data-binary @"$ready_body" || true)"
          echo "phone_home_ready_http=${http_code}"
-          if [[ ! "$http_code" =~ ^2 ]]; then
-            echo "Blacksmith phone-home ready failed; response body:" >&2
-            cat "$RUNNER_TEMP/testbox-ready.response" >&2 || true
-            exit 1
-          fi

          echo "============================================"
          echo "Testbox ready!"
--- a/.github/workflows/windows-testbox-probe.yml
+++ b/.github/workflows/windows-testbox-probe.yml
@@ -133,9 +133,8 @@ jobs:
              $rootfs = "C:\wsl\ubuntu-noble-wsl.rootfs.tar.gz"
              New-Item -ItemType Directory -Force -Path @((Split-Path -Parent $rootfs), $wslRoot) | Out-Null
              Invoke-WebRequest -Uri $env:UBUNTU_WSL_ROOTFS_URL -OutFile $rootfs -UseBasicParsing
-              $import = Invoke-WslText -Arguments @("--import", "UbuntuProbe", $wslRoot, $rootfs, "--version", "2")
-              Write-Host $import.Text
-              Write-Host "wsl_import_exit=$($import.Code)"
+              wsl.exe --import UbuntuProbe $wslRoot $rootfs --version 2
+              Write-Host "wsl_import_exit=$LASTEXITCODE"
              $list = Invoke-WslText -Arguments @("--list", "--verbose")
              Write-Host $list.Text
              Write-Host "wsl_list_after_import_exit=$($list.Code)"
@@ -145,15 +144,14 @@ jobs:
            if ($distros.Count -gt 0) {
              $distro = $distros[0]
              Write-Host "wsl_probe_distro=$distro"
-              $exec = Invoke-WslText -Arguments @("-d", $distro, "--exec", "bash", "-lc", 'set -euo pipefail; uname -a; if [ -f /etc/os-release ]; then sed -n "1,8p" /etc/os-release; fi')
+              wsl.exe -d $distro --exec bash -lc 'set -euo pipefail; uname -a; if [ -f /etc/os-release ]; then sed -n "1,8p" /etc/os-release; fi'
            } else {
-              $exec = Invoke-WslText -Arguments @("--exec", "bash", "-lc", 'set -euo pipefail; uname -a; if [ -f /etc/os-release ]; then sed -n "1,8p" /etc/os-release; fi')
+              wsl.exe --exec bash -lc 'set -euo pipefail; uname -a; if [ -f /etc/os-release ]; then sed -n "1,8p" /etc/os-release; fi'
            }
-            Write-Host $exec.Text
-            if ($exec.Code -eq 0) {
+            if ($LASTEXITCODE -eq 0) {
              $ok = $true
            }
-            Write-Host "wsl_exec_exit=$($exec.Code)"
+            Write-Host "wsl_exec_exit=$LASTEXITCODE"
          }

          if ($ok) {
--- a/.github/workflows/workflow-sanity.yml
+++ b/.github/workflows/workflow-sanity.yml
@@ -251,6 +251,3 @@ jobs:

      - name: Check plugin SDK API baseline drift
        run: pnpm plugin-sdk:api:check
-
-      - name: Check plugin SDK surface budget
-        run: pnpm plugin-sdk:surface:check
--- a/.gitignore
+++ b/.gitignore
@@ -77,19 +77,12 @@ extensions/canvas/src/host/a2ui/*.map

 # fastlane (iOS)
 apps/ios/fastlane/README.md
-apps/android/fastlane/README.md
 apps/ios/fastlane/report.xml
 apps/ios/fastlane/Preview.html
 apps/ios/fastlane/screenshots/
 apps/ios/fastlane/test_output/
 apps/ios/fastlane/logs/
 apps/ios/fastlane/.env
-apps/android/fastlane/report.xml
-apps/android/fastlane/Preview.html
-apps/android/fastlane/test_output/
-apps/android/fastlane/logs/
-apps/android/fastlane/.env
-apps/android/fastlane/metadata/android/**/images/

 # fastlane build artifacts (local)
 apps/ios/*.ipa
@@ -132,12 +125,8 @@ mantis/
 !.agents/skills/crabbox/**
 !.agents/skills/clawdtributor/
 !.agents/skills/clawdtributor/**
-!.agents/skills/claw-score/
-!.agents/skills/claw-score/**
 !.agents/skills/control-ui-e2e/
 !.agents/skills/control-ui-e2e/**
-!.agents/skills/discord-user-post/
-!.agents/skills/discord-user-post/**
 !.agents/skills/gitcrawl/
 !.agents/skills/gitcrawl/**
 !.agents/skills/technical-documentation/
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -172,7 +172,7 @@ Skills own workflows; root owns hard policy and routing.
 - PR artifacts/screenshots: attach to PR/comment/external artifact store. Never push screenshots, videos, proof images, or proof assets to OpenClaw or any product repo branch, including temp artifact branches. Use Crabbox artifact publishing plus the manifest URL. Do not commit `.github/pr-assets`.
 - CI polling: exact SHA, relevant checks only, minimal fields. Skip routine noise (`Auto response`, `Labeler`, docs agents, performance/stale). Logs only after failure/completion or concrete need.
 - OpenClaw write-access maintainers may skip `Real behavior proof` when local tests or Crabbox verified behavior; record proof in PR verification.
-- Agent PR landing to `main`: use only the repo-native `scripts/pr` wrapper: run `scripts/pr review-init <PR>`, follow its emitted checkout/guard guidance, initialize and complete review artifacts with `scripts/pr review-artifacts-init <PR>`, validate them with `scripts/pr review-validate-artifacts <PR>`, then run `scripts/pr prepare-run <PR>` and `scripts/pr merge-run <PR>`; do not idle on `auto-response` or `check-docs`.
+- `/landpr`: use `~/.codex/prompts/landpr.md`; do not idle on `auto-response` or `check-docs`.

 ## Code

@@ -214,7 +214,6 @@ Skills own workflows; root owns hard policy and routing.

 - Vitest. Colocated `*.test.ts`; e2e `*.e2e.test.ts`; example models `sonnet-4.6`, `gpt-5.5`; test GPT with 5.5 preferred, 5.4 ok; no GPT-4.x agent-smoke defaults.
 - Prefer behavior tests over workflow/docs string greps. Put operator policy reminders in AGENTS/docs.
-- QA scenario sources are YAML only: `qa/scenarios/index.yaml` and `qa/scenarios/<theme>/*.yaml`. Do not add fenced `qa-scenario`/`qa-flow` Markdown files under `qa/scenarios/`.
 - Clean timers/env/globals/mocks/sockets/temp dirs/module state; `--isolate=false` safe.
 - Prefer injection and narrow `*.runtime.ts` mocks over broad barrels or `openclaw/plugin-sdk/*`.
 - Do not edit baseline/inventory/ignore/snapshot/expected-failure files to silence checks without explicit approval.
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -110,7 +110,7 @@ For coordinated change sets that genuinely need more than 20 PRs, join the **#cl
 - Keep PRs takeover-ready: open them from a branch maintainers can push to. For fork PRs, leave GitHub's **Allow edits by maintainers** option enabled so maintainers can finish urgent fixes, changelog entries, or merge prep when needed. If GitHub shows **Allow edits and access to secrets by maintainers**, enable it only when that workflow/secrets access is acceptable and say so in the PR.
 - Do not edit `CHANGELOG.md` in contributor PRs. Maintainers or ClawSweeper add the changelog entry when landing user-facing changes.
 - Run tests: `pnpm build && pnpm check && pnpm test`
-- For iterative local commits, `scripts/committer --fast "message" <files...>` skips commit hooks. Only use it when you've already run equivalent targeted validation for the touched surface.
+- For iterative local commits, `scripts/committer --fast "message" <files...>` passes `FAST_COMMIT=1` through to the pre-commit hook so it skips the repo-wide `pnpm check`. Only use it when you've already run equivalent targeted validation for the touched surface.
 - For extension/plugin changes, run the fast local lane first:
  - `pnpm test:extension <extension-name>`
  - `pnpm test:extension --list` to see valid extension ids
--- a/2
+++ b/2
@@ -138,7 +138,7 @@ ARG OPENCLAW_BUNDLED_PLUGIN_DIR
 # BuildKit cache mounts are not part of cached layers; seed tarballs for the
 # installed prod graph in the same step that runs offline prune.
 RUN --mount=type=cache,id=openclaw-pnpm-store,target=/root/.local/share/pnpm/store,sharing=locked \
-    node scripts/list-prod-store-packages.mjs | xargs -r pnpm store add && \
+    pnpm list --prod --depth Infinity --json | node scripts/list-prod-store-packages.mjs | xargs -r pnpm store add && \
    CI=true pnpm prune --prod \
      --config.offline=true \
      --config.supportedArchitectures.os=linux \
--- a/appcast.xml
+++ b/appcast.xml
@@ -2,48 +2,6 @@
 <rss xmlns:sparkle="http://www.andymatuschak.org/xml-namespaces/sparkle" version="2.0">
    <channel>
        <title>OpenClaw</title>
-        <item>
-            <title>2026.6.8</title>
-            <pubDate>Tue, 16 Jun 2026 17:17:20 +0000</pubDate>
-            <link>https://raw.githubusercontent.com/openclaw/openclaw/main/appcast.xml</link>
-            <sparkle:version>2606000890</sparkle:version>
-            <sparkle:shortVersionString>2026.6.8</sparkle:shortVersionString>
-            <sparkle:minimumSystemVersion>15.0</sparkle:minimumSystemVersion>
-            <description><![CDATA[<h2>OpenClaw 2026.6.8</h2>
-<h3>Highlights</h3>
-<ul>
-<li>Telegram and WhatsApp channel delivery are richer and less brittle: Telegram can send structured rich text with tables, lists, expandable blockquotes, preserved intentional line breaks, prompt-preserving CLI backend delivery, retired native draft migration, and safer rich-media boundaries, while WhatsApp now honors configured ACP bindings. (#92679, #93164, #84082, #89421, #92513) Thanks @obviyus, @jzakirov, @spacegeologist, and @TurboTheTurtle.</li>
-<li>Agent and Gateway recovery is sharper across account-scoped DM sends, generated media completions, auto-reply message-tool final replies, reset archive fallback reads, restart shutdown aborts, yielded subagent pauses, trusted subagent thinking override fallback, yielded cron media, heartbeat dedupe, session identity prompts, and unknown OpenAI agent selector rejection. (#92788, #91246, #92879, #91357, #92631, #92412, #92146, #91287, #92468, #92510) Thanks @yetval, @TurboTheTurtle, @masatohoshino, @CadanHu, @ooiuuii, @openperf, @IWhatsskill, @ZengWen-DT, and @zhangguiping-xydt.</li>
-<li>Provider/model handling expands and tightens with GLM-5.2, Claude Haiku 4.5 catalog rows, OpenRouter and Google Vertex provider-prefix normalization, managed SecretRef auth, OAuth image-default routing through Codex, bounded model browse discovery, LM Studio binary thinking-off delivery, storeless OpenAI Responses replay gating, invalid OpenAI reasoning-signature and genericized Anthropic thinking-signature recovery, Claude 4.5 Copilot tool-streaming safety, and OpenAI/Anthropic-family payload quarantine for unreadable or post-hook tool schemas. (#92796, #90116, #92627, #91218, #90686, #92824, #92247, #92002, #90706, #92941, #92201, #92916, #75393, #92908, #92921, #92928) Thanks @arkyu2077, @liuhao1024, @bymle, @rohitjavvadi, @nxmxbbd, @bek91, @samson910022, @mmyzwl, @CarlCapital, @snowzlm, @Kailigithub, and @vincentkoc.</li>
-<li><code>/usage</code> and reply payload hooks now have a native full footer renderer, default template, fixed-decimal formatting, credential-aware limits, better partial-count handling, and warnings for broken templates instead of silent bad output. (#92657, #89835, #89629) Thanks @Marvinthebored.</li>
-<li>UI and mobile flows are steadier: workspace files can collapse and start collapsed, WebChat backscroll survives streaming, the sidebar session picker remains interactive above the desktop workbench, reset soft args survive UI dispatch, stale dashboard session parent lineage is preserved, and iOS reconnects stale foreground gateways. (#92779, #92622, #92705, #91353, #90658, #92552) Thanks @shakkernerd, @TurboTheTurtle, @NianJiuZst, @zhouhe-xydt, @luoyanglang, and @Solvely-Colin.</li>
-<li>Memory, state, and diagnostics recover cleaner: oversized OpenAI embedding batches split before 431s, QMD memory search stays available in transient mode, SQLite avoids WAL on NFS state volumes, stuck-session recovery scheduling no longer resets warning backoff, full memory reindexes preserve rollback/cache recovery, raw Memory Wiki source pages stop looking malformed, and Infinity chunk limits stay genuinely unbounded. (#92650, #92618, #92639, #91247, #92752, #92881, #59137, #92876, #69700, #92735) Thanks @mushuiyu886, @TurboTheTurtle, @849261680, @gnanam1990, @TSHOGX, @arlen8411, and @yhterrance.</li>
-</ul>
-<h3>Changes</h3>
-<ul>
-<li>Providers/models: add GLM-5.2 support and Claude Haiku 4.5 catalog entries while keeping provider-qualified model IDs normalized across OpenRouter and Google Vertex paths. (#92796, #90116, #92627, #91218) Thanks @arkyu2077, @liuhao1024, and @bymle.</li>
-<li>Web search: keep key-free providers such as Parallel Free, DuckDuckGo, Ollama, and Codex Hosted Search as explicit opt-ins instead of selecting them automatically when no API-backed provider is configured. (#93616) Thanks @davemorin and @vincentkoc.</li>
-<li>Channel plugins: ship Telegram rich-message delivery and WhatsApp ACP binding support, including preserved intentional line breaks, rich prompt handoff to CLI backends, and transport fixtures for richer drafts. (#92679, #93164, #92513) Thanks @obviyus and @TurboTheTurtle.</li>
-<li>Agent commands: support <code>/btw</code> in CLI-backed sessions and keep CLI usage-error exits classified as usage failures instead of successful runs. (#92669, #92162) Thanks @joshavant and @Pandah97.</li>
-<li>Usage hooks: add built-in full footer rendering, default footer templates, per-turn usage state, credential-aware limits, and fixed-decimal formatting for usage-bar templates. (#92657, #89835, #89629) Thanks @Marvinthebored.</li>
-<li>Docs and operator guidance: document node config examples, clarify before-install hook scope, correct agent default concurrency comments, refresh ZAI provider docs, and update channel/group docs for current Telegram and WhatsApp behavior. (#92677, #92766, #92695) Thanks @liuhao1024, @sallyom, and @ArielSmoliar.</li>
-</ul>
-<h3>Fixes</h3>
-<ul>
-<li>Channels and delivery: preserve account-scoped DM channel send policy, intentional rich-message line breaks in Telegram and status output, rich Telegram final replies, rich Telegram tables and lists, Telegram thread-create CLI remapping, Feishu dynamic-agent routes after persisted binding reuse, Slack outbound <code>message_sent</code> hooks, contributed message-tool schema optionality, same-channel generated media completions, and channel chunking around surrogate pairs and Infinity limits. (#92788, #93164, #92679, #89421, #89943, #42837, #92814, #91137, #91246, #92735) Thanks @yetval, @obviyus, @spacegeologist, @rishitamrakar, @liuhao1024, @lundog, @TurboTheTurtle, and @yhterrance.</li>
-<li>Discord: give generated auto-thread titles a 60-second timeout and 4,096-token reasoning-model output budget, clamped to the selected model output cap. (#64734) Thanks @hanamizuki.</li>
-<li>Agent, cron, and Gateway runtime: mark active main sessions before restart shutdown aborts, pause yielded subagent runs whose terminal also signals abort, clamp trusted subagent thinking overrides through provider/model fallback, preserve yielded media completions, deliver channel message-tool final replies through auto-reply while hiding internal delivery hints, restore reset archive fallback reads when active async transcripts are missing, de-duplicate main-session heartbeat events, expose session identity in runtime prompts, reject unknown OpenAI agent selectors, keep generated media completions, slash-command block replies, and trajectory export commands in WebChat, and require admin privileges for HTTP session/model override surfaces. (#91357, #92631, #92412, #92146, #92879, #91287, #92468, #92510, #91246, #92651, #92646) Thanks @ooiuuii, @openperf, @IWhatsskill, @masatohoshino, @CadanHu, @ZengWen-DT, @zhangguiping-xydt, and @TurboTheTurtle.</li>
-<li>Providers and model replay: preserve storeless OpenAI Responses replay compatibility, recover invalid OpenAI reasoning signatures and genericized Anthropic thinking-signature replay errors, route OAuth image defaults through Codex for eligible OpenAI profiles, avoid eager tool streaming for Claude 4.5 in Copilot, quarantine unreadable and post-hook OpenAI/Anthropic-family tool schemas without broadening allowed tool choices, deliver explicit thinking-off requests to LM Studio binary-thinking models, honor profile auth for SecretRef model entries, bound model browsing, strip provider prefixes where runtimes need bare IDs, and surface nested embedding fetch failures. (#90706, #92941, #92201, #92916, #92824, #75393, #92908, #92921, #92928, #92002, #90686, #92247, #92627, #91218, #92628) Thanks @snowzlm, @mmyzwl, @CarlCapital, @bek91, @Kailigithub, @vincentkoc, @rohitjavvadi, @samson910022, @nxmxbbd, @liuhao1024, @bymle, and @mushuiyu886.</li>
-<li>Memory, state, diagnostics, and config: split header-too-large embedding batches, keep QMD memory search enabled in transient mode, avoid SQLite WAL on NFS volumes, preserve recovery scheduling outside stuck-session warning backoff, preserve full-reindex rollback/cache recovery, treat raw Memory Wiki source pages as source evidence, and keep shell environment fallbacks contained in config write tests. (#92650, #92618, #92639, #91247, #92752, #92881, #59137, #92876, #69700) Thanks @mushuiyu886, @TurboTheTurtle, @849261680, @gnanam1990, @TSHOGX, and @arlen8411.</li>
-<li>UI/mobile/TUI: preserve dashboard session parent lineage, WebChat backscroll, reset soft command args, sidebar session picker interactivity, collapsed workspace files, resolved <code>/model</code> confirmation refs, stale foreground iOS Gateway reconnects, and paused setup-parent stdin after inherited-stdio child exit. (#90658, #92622, #91353, #92705, #92779, #92773, #92552, #93159) Thanks @luoyanglang, @TurboTheTurtle, @zhouhe-xydt, @NianJiuZst, @shakkernerd, @NarahariRaghava, @Solvely-Colin, and @fuller-stack-dev.</li>
-<li>Plugins and updates: repair missing required platform packages during managed plugin installs and updates, including omitted Codex platform binaries.</li>
-<li>Dependencies: update Hono to 4.12.25 so published OpenClaw and ACPX packages use the patched runtime.</li>
-<li>Release and test reliability: extend slow Gateway/full-suite watchdogs, split local full-suite shards when throttled, stabilize plugin auth marker fixtures, avoid brittle provider-ref error text, fold Telegram RTT sampling into live QA evidence, simplify QA scorecard mappings around canonical coverage IDs, keep QA Lab bootstrap selection assertions aligned with flow-only scenarios, skip QA coverage artifact consumers when runtime parity producer status is not green, keep Feishu lifecycle release checks pointed at the active fixture config, isolate trajectory-export live seed turns from Codex-native shell approvals, preserve release-check child refs while pinning expected SHAs, widen live OpenAI TTS budgets for slower provider responses, and avoid false downgrade prompts for unresolved latest-tag updates. (#92652, #92550, #92558, #92911) Thanks @RomneyDa and @Andy312432.</li>
-</ul>
-<p><a href="https://github.com/openclaw/openclaw/blob/main/CHANGELOG.md">View full changelog</a></p>
-]]></description>
-            <enclosure url="https://github.com/openclaw/openclaw/releases/download/v2026.6.8/OpenClaw-2026.6.8.zip" length="55815364" type="application/octet-stream" sparkle:edSignature="hLJ14xg6+DMFrXViIW3Njs++OPIGO+RWH9h+mPCSzXPAkKyYUGvtOLu1qEKvvfC8rs5FGgW/w4zDLfD2azqiBA=="/>
-        </item>
        <item>
            <title>2026.6.5</title>
            <pubDate>Tue, 09 Jun 2026 19:06:49 +0000</pubDate>
@@ -251,5 +209,69 @@
 ]]></description>
            <enclosure url="https://github.com/openclaw/openclaw/releases/download/v2026.6.1/OpenClaw-2026.6.1.zip" length="55062100" type="application/octet-stream" sparkle:edSignature="PVp8E2HBCvikB/0LCr36lFEyHPAzoFA2ScT6LW27FlzvP+m4r1AEuVN2UrtgWlpkGSsn4Eav0kPJe32u4ObNBw=="/>
        </item>
+        <item>
+            <title>2026.5.28</title>
+            <pubDate>Sat, 30 May 2026 21:21:09 +0000</pubDate>
+            <link>https://raw.githubusercontent.com/openclaw/openclaw/main/appcast.xml</link>
+            <sparkle:version>2026052890</sparkle:version>
+            <sparkle:shortVersionString>2026.5.28</sparkle:shortVersionString>
+            <sparkle:minimumSystemVersion>15.0</sparkle:minimumSystemVersion>
+            <description><![CDATA[<h2>OpenClaw 2026.5.28</h2>
+<h3>Highlights</h3>
+<ul>
+<li>Agent and Codex runtime recovery is steadier: subagents keep cwd/workspace separation, hook context stays prompt-local, session locks release on timeout abort while live OpenClaw locks survive cleanup, stale restart continuations are avoided, and Codex app-server/helper failures no longer tear down shared runtime state. (#87218, #86875, #87409, #87399, #87375, #88129)</li>
+<li>Channel delivery and session identity got safer across outbound plugin hooks, Matrix room ids, iMessage reactions/approvals, Slack final replies, Discord recovered tool warnings, runtime-config message actions, WhatsApp profile auth roots, Telegram polling, and Microsoft Teams service URL trust checks. (#73706, #75670, #87366, #87451, #87334, #84535, #82492, #83304, #87160)</li>
+<li>Mobile and chat surfaces got a broader refresh: the iOS Pro UI, hosted push relay default, realtime Talk tab playback, Gateway chat transport, onboarding, Talk permissions, WebChat reconnect delivery, and session picker behavior now preserve more state across reconnects and empty searches. (#87367, #87531, #87682, #88096, #88105) Thanks @ngutman.</li>
+<li>Browser, channel, and automation inputs are stricter: Browser tool timeouts, viewport/tab indices, Gateway ports, cron retry handling, Discord component ids, schema array refs, Telegram callback pages, and channel progress callbacks now reject malformed values earlier and preserve the intended delivery context. (#82887)</li>
+<li>Provider, media, and document coverage expands with Claude Opus 4.8, Fal Krea image schemas, NVIDIA featured models, MiniMax streaming music responses, encrypted PDF extraction, voice model catalogs, GitHub Copilot agent runtime support, and a Codex Supervisor plugin path for delegated Codex workflows. (#87845, #87890, #80775, #84764, #87751, #87794)</li>
+<li>CLI, auth, doctor, and provider paths fail faster and recover more clearly: malformed numeric/version options are rejected, workspace dotenv provider credentials are ignored, heartbeat defaults, OAuth/token lifetimes, and local service startup requests are bounded, agent auth health labels are clearer, legacy <code>api_key</code> auth profiles migrate to canonical form, and restart guidance is actionable. (#87398, #86281, #87361, #88133, #83655, #87559, #88088, #85924) Thanks @vincentkoc and @giodl73-repo.</li>
+<li>Plugin and Gateway hot paths do less repeated work while preserving cache correctness for install records, config JSON parsing, tool search catalogs, session stores, manifest model rows, auto-enabled plugin config, browser tokens, viewer assets, and release-split external plugin packages. (#86699)</li>
+<li>Release, QA, and E2E validation now bound more log, artifact, harness, and cross-OS waits so failing lanes produce proof instead of hanging or false-greening.</li>
+</ul>
+<h3>Changes</h3>
+<ul>
+<li>Status: show active subagent details in status output.</li>
+<li>Diffs: split the default language pack and expand default Diffs language coverage while keeping the host floor aligned. (#87370, #87372) Thanks @RomneyDa.</li>
+<li>ClawHub: add plugin display names plus skill verification and trust surfaces. (#87354, #86699) Thanks @thewilloftheshadow and @Patrick-Erichsen.</li>
+<li>iOS: refresh the dev app with Pro Command, Chat, Agents, Settings, hosted push relay defaults, and realtime Talk playback wired to gateway sessions, diagnostics, chat, and realtime Talk. (#87367, #88096, #88105) Thanks @Solvely-Colin and @ngutman.</li>
+<li>Docs: clarify Codex computer-use setup, paste-token stdin auth setup, macOS gateway sleep troubleshooting, native Codex hook relay recovery, container model auth, install deployment cards, device-token admin gating, CLI setup flow compatibility, Notte cloud browser CDP setup, and backport targets. (#87313, #63050, #87685) Thanks @bdjben, @liaoandi, and @thewilloftheshadow.</li>
+<li>PDF/tools: use ClawPDF for PDF extraction, support encrypted PDF extraction, and surface MCP structured content in agent tool results. (#87670, #87751)</li>
+<li>Providers: add Claude Opus 4.8 support, Fal Krea image model schemas, NVIDIA featured model catalogs, MiniMax streaming music responses, and provider-backed voice model catalogs. (#87845, #87890, #80775, #84764, #87794) Thanks @eleqtrizit and @vincentkoc.</li>
+<li>Codex/GitHub: add the GitHub Copilot agent runtime and the Codex Supervisor plugin package.</li>
+<li>Plugins: externalize GitHub Copilot and Tokenjuice as official install-on-demand plugins with npm and ClawHub publish metadata.</li>
+<li>Workboard: add agent coordination tools for tracking and handing off active agent work.</li>
+<li>Discord: show commentary in progress drafts so live Discord runs expose useful in-progress context. (#85200)</li>
+<li>Plugin SDK: add a reply payload sending hook for plugins that need to deliver channel-owned replies and flatten package types for SDK declarations. (#82823, #87165) Thanks @RomneyDa.</li>
+<li>Policy: add policy comparison, ingress-channel conformance, and sandbox-posture conformance checks. (#85572, #85744, #86768)</li>
+</ul>
+<h3>Fixes</h3>
+<ul>
+<li>Agents: fall back to local config pruning when the optional <code>agents delete</code> Gateway probe cannot authenticate, so offline installs can still delete agents without removing shared workspaces.</li>
+<li>Tighten phone-control mutation authorization [AI]. (#87150) Thanks @pgondhi987.</li>
+<li>Clarify directive persistence authorization policy [AI]. (#86369) Thanks @pgondhi987.</li>
+<li>Agents/Codex: keep spawned agent cwd/workspace state separated, forward ACP spawn attachments, keep hook context prompt-local, release session locks on timeout abort and runtime teardown without deleting live OpenClaw-owned locks during cleanup, avoid session event queue self-wait, clean up exec abort listeners, stream assistant deltas incrementally, recover raw missing-thread compaction failures, preserve rotated compaction session identity, keep compaction-timeout snapshots continuable, preserve shared app-server state across startup or helper failures, keep native hook relay alive across restarts and prune stale bridge files, close native hook relay replacement races, keep Claude live tool progress visible for watchdog recovery, suppress abandoned requester completion handoff, route workspace memory through tools, resolve Codex runtime models first, report quarantined dynamic tools, format <code>skills</code> command output, bind node auto-review to prepared plans, retry Claude CLI transcript probes, and bound compaction/steering retries. (#87218, #86875, #86123, #88129, #87399, #87375, #72574, #87383, #87400, #83022, #87671, #87738, #87747, #87706, #87546, #87541, #81048) Thanks @mbelinky, @Alix-007, @luoyanglang, @yetval, @sjf, @joshavant, and @benjamin1492.</li>
+<li>Codex Supervisor: keep real-home app-server MCP session listing on the loaded state path, bound stored history scans, and close WebSocket probes cleanly.</li>
+<li>Channels: thread canonical session keys into outbound hooks, preserve Matrix room-id case, keep fallback tool warnings mention-inert, retain delivered Slack final replies during late cleanup, continue iMessage polling after denied reactions, suppress duplicate native exec approvals, resolve Gateway message actions against the active runtime config, preserve Telegram SecretRef prompt config and polling keepalives, preserve WhatsApp profile auth roots, QR display, document filenames, and plugin hook config, suppress Discord recovered tool warnings, preserve the Discord voice outbound helper, cap Discord/Signal/Zalo channel request and container timeouts, and block untrusted Teams service URLs while keeping TeamsSDK patterns aligned. (#73706, #75670, #87366, #87451, #87465, #87334, #84535, #76262, #83304, #82492, #87581, #77114, #86426, #85529, #87160) Thanks @zeroaltitude, @lukeboyett, @xiaotian, @funmerlin, @joshavant, @eleqtrizit, @heyitsaamir, @amittell, @liorb-mountapps, @masatohoshino, @bladin, and @giodl73-repo.</li>
+<li>CLI/auth/doctor/providers: reject malformed numeric/timeout/subcommand-version inputs, ignore workspace dotenv provider credentials, wait for respawn child shutdown, bound heartbeat defaults plus Codex, GitHub Copilot, OpenAI, Anthropic, Google, Feishu, LM Studio, MiniMax, Xiaomi TTS, and local-provider OAuth/token/model requests, harden Codex auth probes, label auth health by agent, preserve explicit agentRuntime pins during Codex model migration, warm provider auth off the main thread, honor Codex response timeouts, stop migrating current Claude Haiku 4.5 profiles to Sonnet, bound local service startup, resolve GPT-5.5 without cached catalog, migrate legacy memory auto-provider config, rewrite non-canonical <code>api_key</code> auth profiles, and make doctor restart follow-ups actionable. (#87398, #86281, #87361, #88133, #83655, #87559, #87719, #88088, #85924, #84362) Thanks @Patrick-Erichsen, @samzong, @giodl73-repo, @alkor2000, @mmaps, @nxmxbbd, and @vincentkoc.</li>
+<li>Gateway/security/session state: expire browser tokens after auth rotation, scope assistant idempotency dedupe, drain probe client closes, avoid stale restart continuation reuse, preserve retry-after fallbacks and stale rate-limit cooldown probes, bound webchat image and artifact transcript scans, include seconds in inbound metadata timestamps, clear completed session active runs, clear stale chat stream buffers, and evict current plugin-state namespaces at row caps. (#87810, #87833, #75089) Thanks @joshavant and @litang9.</li>
+<li>Config/parsing/network: reject partial numeric parsing, parse provider/Discord retry headers and dates strictly, honor IPv6 and bare IPv6 <code>no_proxy</code> entries, preserve empty plugin allowlists, canonicalize secret target array indexes, and reject malformed media content lengths, inspected TCP ports, marketplace content lengths, cron epochs, sandbox stat fields, unsafe duration values, empty config path segments, noncanonical schema array refs, unsafe Telegram callback pages, and invalid Teams attachment-fetch DNS targets. (#87883) Thanks @zhangguiping-xydt.</li>
+<li>Browser/input hardening: reject invalid tab indexes, excessive viewport resizes, explicit zero CDP ports, malformed geolocation options, unsafe screenshot or permission-grant timeouts, loose response-body limits, invalid cookie expiries, and non-finite Browser tool delays/timeouts.</li>
+<li>Cron/automation: retry recurring jobs after transient model rate limits before waiting for the next scheduled slot, and preflight model fallbacks before skipping scheduled work. (#82887)</li>
+<li>Auto-reply/directives: respect provider and relayed channel metadata during directive persistence so channel-originated decisions keep their intended context. (#87683)</li>
+<li>WhatsApp: resolve the auth directory from the active profile so profile-scoped WhatsApp installs do not drift to the wrong credential root. (#82492)</li>
+<li>Gateway/session state: clear completed session active runs, avoid cold-loading providers for MCP inventory, cache single-session child indexes, cap handshake timers, and bound preauth, auth-guard, media, transcript, readiness, and port options.</li>
+<li>Channels/replies: preserve channel-owned progress callbacks when verbose output is off, keep group-room progress suppression intact, prefer external session delivery context, escape Discord component id delimiters, force final TUI chat repaints, show Slack reasoning previews, and normalize Discord/Matrix/Mattermost channel numeric options. (#87476, #87423)</li>
+<li>Agents/tool args: harden smart-quoted argument repair for edit arrays and exact escaped arguments so model-produced tool calls recover without corrupting valid input. (#86611)</li>
+<li>Providers/agents: preserve seeded Anthropic signatures, preserve signed thinking payloads, concatenate signature-delta chunks, preserve DeepSeek <code>reasoning_content</code> replay across tier suffixes, apply OpenRouter strict9 ids to Mistral routes, promote Ollama plain-text tool calls, load NVIDIA featured model catalogs, stream MiniMax music generation responses, and recover empty preflight compaction. (#87593, #87493, #80775, #84764) Thanks @eleqtrizit.</li>
+<li>Media/images: skip CLI image cache refs when resolving generated images, allow trusted generated HTML attachments, and bound generated video downloads so stale refs and slow providers fail cleanly. (#87523, #87982)</li>
+<li>File transfer: handle late tar stdin pipe errors after archive validation or unpacking has already settled.</li>
+<li>Performance: trust install-record caches between reloads, prefer native JSON parsing, reuse unchanged tool-search catalogs, reuse gateway session and plugin metadata paths, skip unchanged store serialization, patch single-entry session writes, add precomputed session patch writers, reduce store clone allocations, cache manifest model catalog rows and auto-enabled plugin config, avoid full session snapshots for entry reads, defer configured Slack full startup, prefer bundled plugin dist entries, and slim current metadata identity caches. (#87760)</li>
+<li>Docker/release/QA: package runtime workspace templates, stream cross-OS served artifacts, preserve sparse Crabbox run artifacts, isolate npm plugin installs per package, reject incompatible package plugin API installs, drop the leftover root Sharp dependency from package manifests after the Rastermill migration, bound OpenClaw instance logs, plugin gauntlet relay logs, MCP channel buffers, kitchen-sink scans, agent-turn assertions, QA-Lab credential broker calls, QA Matrix substrate requests, and release scenario logs, and keep release/google live guards current. (#87647, #87477) Thanks @rohitjavvadi and @vincentkoc.</li>
+<li>Release/CI: bound manual git fetches, ClawHub verifier responses, ClawHub owner metadata, dependency-guard error bodies, Parallels limits, startup/test/memory budget parsing, and diffs viewer build warnings so release lanes fail with useful proof instead of hanging. (#87839)</li>
+</ul>
+<p><a href="https://github.com/openclaw/openclaw/blob/main/CHANGELOG.md">View full changelog</a></p>
+]]></description>
+            <enclosure url="https://github.com/openclaw/openclaw/releases/download/v2026.5.28/OpenClaw-2026.5.28.zip" length="54750142" type="application/octet-stream" sparkle:edSignature="U4O55uMdPU+OqSx9QR1ApUJ8wg65wxTydzD7iyCn1GHtm1MBK9noEeiA/yoUKkqb/bx0hzi1gNhn+ye19RXnCA=="/>
+        </item>
    </channel>
 </rss>
--- a/apps/android/CHANGELOG.md
+++ b/apps/android/CHANGELOG.md
@@ -1,11 +0,0 @@
-# OpenClaw Android Changelog
-
-## Unreleased
-
-Maintenance update for the current OpenClaw Android release.
-
-## 2026.6.2 - 2026-06-02
-
-OpenClaw is now available on Android.
-
-Connect to your OpenClaw Gateway to chat with your assistant, use realtime Talk mode, review approvals, and bring Android device capabilities like camera, location, screen, and notifications into your private automation workflows.
--- a/apps/android/Config/ReleaseSigning.json
+++ b/apps/android/Config/ReleaseSigning.json
@@ -1,14 +0,0 @@
-{
-  "signingRepo": "git@github.com:openclaw/apps-signing.git",
-  "signingBranch": "main",
-  "assetPath": "android/openclaw",
-  "uploadKeystoreEncryptedFile": "upload-keystore.jks.enc",
-  "gradlePropertiesEncryptedFile": "gradle.properties.enc",
-  "materializedRoot": "apps/android/build/release-signing",
-  "gradlePropertyNames": [
-    "OPENCLAW_ANDROID_STORE_FILE",
-    "OPENCLAW_ANDROID_STORE_PASSWORD",
-    "OPENCLAW_ANDROID_KEY_ALIAS",
-    "OPENCLAW_ANDROID_KEY_PASSWORD"
-  ]
-}
--- a/apps/android/Config/Version.properties
+++ b/apps/android/Config/Version.properties
@@ -1,6 +0,0 @@
-# Shared Android version defaults.
-# Source of truth: apps/android/version.json
-# Generated by scripts/android-sync-versioning.ts.
-
-OPENCLAW_ANDROID_VERSION_NAME=2026.6.2
-OPENCLAW_ANDROID_VERSION_CODE=2026060201
--- a/apps/android/README.md
+++ b/apps/android/README.md
@@ -32,7 +32,7 @@ cd apps/android
 ./gradlew :app:installPlayDebug
 ./gradlew :app:testPlayDebugUnitTest
 cd ../..
-pnpm android:release:archive
+bun run android:bundle:release
 ```

 Third-party debug flavor:
@@ -44,39 +44,10 @@ cd apps/android
 ./gradlew :app:testThirdPartyDebugUnitTest
 ```

-Android release archives use the pinned version in `apps/android/version.json`. Update it with:
+`bun run android:bundle:release` auto-bumps Android `versionName`/`versionCode` in `apps/android/app/build.gradle.kts`, then builds two signed release bundles:

-```bash
-pnpm android:version
-pnpm android:version:check
-pnpm android:version:pin -- --from-gateway
-pnpm android:version:pin -- --version 2026.6.5 --version-code 2026060501
-```
-
-Release-owner signing sync:
-
-```bash
-pnpm android:release:signing:plan
-MATCH_PASSWORD=<signing repo password> pnpm android:release:signing:sync:pull
-MATCH_PASSWORD=<signing repo password> pnpm android:release:signing:check
-```
-
-The signing sync pulls encrypted Android upload-key assets from the shared `apps-signing` repo and materializes decrypted files under `apps/android/build/release-signing/`.
-
-Generate raw Google Play screenshots:
-
-```bash
-pnpm android:screenshots
-```
-
-`pnpm android:release:archive` builds signed release artifacts into `apps/android/build/release-artifacts/` and writes `.sha256` checksum files:
-
-- Play build: `openclaw-<version>-play-release.aab`
-- Third-party build: `openclaw-<version>-third-party-release.apk`
-
-`pnpm android:bundle:release` is an alias for the same Fastlane archive lane.
-
-See `apps/android/VERSIONING.md` and `apps/android/fastlane/SETUP.md` for the release workflow.
+- Play build: `apps/android/build/release-bundles/openclaw-<version>-play-release.aab`
+- Third-party build: `apps/android/build/release-bundles/openclaw-<version>-third-party-release.aab`

 Flavor-specific direct Gradle tasks:

--- a/apps/android/VERSIONING.md
+++ b/apps/android/VERSIONING.md
@@ -1,65 +0,0 @@
-# OpenClaw Android Versioning
-
-Android release builds use pinned app metadata instead of auto-bumping `build.gradle.kts`.
-
-## Version model
-
-- `apps/android/version.json` is the source of truth.
-- `version` is the Play `versionName` and uses CalVer: `YYYY.M.D`.
-- `versionCode` uses `YYYYMMDDNN`, where `NN` is a two-digit build number for that pinned app version.
-- `apps/android/Config/Version.properties` is generated from `version.json` and read by Gradle.
-- `apps/android/CHANGELOG.md` is the Android-only changelog and release-note source.
-- `apps/android/fastlane/metadata/android/en-US/release_notes.txt` is generated from the changelog.
-
-Examples:
-
-- `version = 2026.6.2`
-- `versionCode = 2026060201`
-- another upload on the same release train: `versionCode = 2026060202`
-
-## Commands
-
-```bash
-pnpm android:version
-pnpm android:version:check
-pnpm android:version:sync
-pnpm android:version:pin -- --from-gateway
-pnpm android:version:pin -- --version 2026.6.5 --version-code 2026060501
-pnpm android:release:signing:plan
-MATCH_PASSWORD=<signing repo password> pnpm android:release:signing:sync:pull
-pnpm android:release:preflight
-```
-
-## Release-note resolution order
-
-When generating `apps/android/fastlane/metadata/android/en-US/release_notes.txt`, the tooling reads the first available changelog section in this order:
-
-1. exact pinned version, for example `## 2026.6.2`
-2. `## Unreleased`
-
-Recommended workflow:
-
-- while iterating on a Play internal testing train, keep pending notes under `## Unreleased`
-- before the production release, move or copy the final notes under `## <pinned version>` and run sync again
-
-## Release Workflow
-
-1. Pin Android to the intended release version.
-2. Run `pnpm android:version:sync`.
-3. Update `apps/android/CHANGELOG.md`, then run `pnpm android:version:sync` again if needed.
-4. Run `MATCH_PASSWORD=<signing repo password> pnpm android:release:signing:sync:pull` to materialize encrypted Android signing assets from `apps-signing`.
-5. Run `pnpm android:release:preflight` to validate Play auth, signing, synced versioning, and release notes.
-6. Run `pnpm android:screenshots` to refresh raw Google Play screenshots.
-7. Run `pnpm android:release:archive` to produce the signed Play AAB and third-party APK.
-8. Run `pnpm android:release:upload` to upload metadata, screenshots, and the Play AAB to Google Play internal testing.
-9. Promote to production manually in Google Play Console.
-
-The third-party flavor is archived as a signed APK for non-Play distribution. It is not uploaded by the Play release lane.
-
-## Signing model
-
-`apps/android/Config/ReleaseSigning.json` pins the Android signing assets in the shared private `apps-signing` repo. The Android pipeline uses the same `MATCH_PASSWORD` release-owner secret as iOS, but the Android files are managed by `scripts/android-release-signing.mjs` instead of Fastlane `match`.
-
-`sync:pull` decrypts the Play upload keystore and Gradle signing properties into `apps/android/build/release-signing/`. That directory is gitignored, and Fastlane exports the materialized values as Gradle project properties for the current release command.
-
-If `MATCH_PASSWORD` is not set, the existing manual Gradle-property signing path still works: provide `OPENCLAW_ANDROID_STORE_FILE`, `OPENCLAW_ANDROID_STORE_PASSWORD`, `OPENCLAW_ANDROID_KEY_ALIAS`, and `OPENCLAW_ANDROID_KEY_PASSWORD` through your local Gradle user properties before running release tasks.
--- a/apps/android/app/build.gradle.kts
+++ b/apps/android/app/build.gradle.kts
@@ -1,24 +1,6 @@
 import com.android.build.api.variant.impl.VariantOutputImpl
-import java.util.Properties

 val dnsjavaInetAddressResolverService = "META-INF/services/java.net.spi.InetAddressResolverProvider"
-val openClawAndroidVersionFile = rootProject.file("Config/Version.properties")
-val openClawAndroidVersionProperties =
-  Properties().apply {
-    if (!openClawAndroidVersionFile.isFile) {
-      error("Missing Android version properties. Run `pnpm android:version:sync`.")
-    }
-    openClawAndroidVersionFile.inputStream().use(::load)
-  }
-
-fun requireOpenClawAndroidVersionProperty(name: String): String =
-  openClawAndroidVersionProperties.getProperty(name)?.trim()?.takeIf { it.isNotEmpty() }
-    ?: error("Missing $name in Config/Version.properties. Run `pnpm android:version:sync`.")
-
-val openClawAndroidVersionName = requireOpenClawAndroidVersionProperty("OPENCLAW_ANDROID_VERSION_NAME")
-val openClawAndroidVersionCode =
-  requireOpenClawAndroidVersionProperty("OPENCLAW_ANDROID_VERSION_CODE").toIntOrNull()
-    ?: error("OPENCLAW_ANDROID_VERSION_CODE must be an integer in Config/Version.properties.")

 val androidStoreFile = providers.gradleProperty("OPENCLAW_ANDROID_STORE_FILE").orNull?.takeIf { it.isNotBlank() }
 val androidStorePassword = providers.gradleProperty("OPENCLAW_ANDROID_STORE_PASSWORD").orNull?.takeIf { it.isNotBlank() }
@@ -83,8 +65,8 @@ android {
    applicationId = "ai.openclaw.app"
    minSdk = 31
    targetSdk = 36
-    versionCode = openClawAndroidVersionCode
-    versionName = openClawAndroidVersionName
+    versionCode = 2026060201
+    versionName = "2026.6.2"
    ndk {
      // Support all major ABIs — native libs are tiny (~47 KB per ABI)
      abiFilters += listOf("armeabi-v7a", "arm64-v8a", "x86", "x86_64")
--- a/apps/android/app/src/main/java/ai/openclaw/app/AndroidScreenshotMode.kt
+++ b/apps/android/app/src/main/java/ai/openclaw/app/AndroidScreenshotMode.kt
@@ -1,28 +0,0 @@
-package ai.openclaw.app
-
-import android.content.Intent
-
-const val extraAndroidScreenshotMode = "openclaw.screenshotMode"
-const val extraAndroidScreenshotScene = "openclaw.screenshotScene"
-
-enum class AndroidScreenshotScene(
-  val rawValue: String,
-) {
-  Connect("connect"),
-  Chat("chat"),
-  Voice("voice"),
-  Screen("screen"),
-  Settings("settings"),
-  ;
-
-  companion object {
-    fun fromRawValue(raw: String?): AndroidScreenshotScene = entries.firstOrNull { it.rawValue == raw?.trim()?.lowercase() } ?: Connect
-  }
-}
-
-fun parseAndroidScreenshotModeIntent(intent: Intent?): AndroidScreenshotScene? {
-  if (intent?.getBooleanExtra(extraAndroidScreenshotMode, false) != true) {
-    return null
-  }
-  return AndroidScreenshotScene.fromRawValue(intent.getStringExtra(extraAndroidScreenshotScene))
-}
--- a/apps/android/app/src/main/java/ai/openclaw/app/MainActivity.kt
+++ b/apps/android/app/src/main/java/ai/openclaw/app/MainActivity.kt
@@ -1,6 +1,5 @@
 package ai.openclaw.app

-import ai.openclaw.app.ui.AndroidScreenshotModeScreen
 import ai.openclaw.app.ui.OpenClawTheme
 import ai.openclaw.app.ui.RootScreen
 import android.content.Intent
@@ -52,12 +51,6 @@ class MainActivity : ComponentActivity() {
    pendingIntent = intent
    WindowCompat.setDecorFitsSystemWindows(window, false)
    permissionRequester = PermissionRequester(this)
-    if (BuildConfig.DEBUG) {
-      parseAndroidScreenshotModeIntent(intent)?.let { scene ->
-        enterScreenshotMode(scene)
-        return
-      }
-    }

    setContent {
      var activeViewModel by remember { mutableStateOf<MainViewModel?>(null) }
@@ -86,12 +79,6 @@ class MainActivity : ComponentActivity() {
    }
  }

-  private fun enterScreenshotMode(scene: AndroidScreenshotScene) {
-    setContent {
-      AndroidScreenshotModeScreen(scene = scene)
-    }
-  }
-
  override fun onStart() {
    super.onStart()
    foreground = true
--- a/apps/android/app/src/main/java/ai/openclaw/app/MainViewModel.kt
+++ b/apps/android/app/src/main/java/ai/openclaw/app/MainViewModel.kt
@@ -111,8 +111,6 @@ class MainViewModel(

  val isConnected: StateFlow<Boolean> = runtimeState(initial = false) { it.isConnected }
  val isNodeConnected: StateFlow<Boolean> = runtimeState(initial = false) { it.nodeConnected }
-  val nodeCapabilityApprovalState: StateFlow<GatewayNodeApprovalState> =
-    runtimeState(initial = GatewayNodeApprovalState.Loading) { it.nodeCapabilityApprovalState }
  val statusText: StateFlow<String> = runtimeState(initial = "Offline") { it.statusText }
  val gatewayConnectionProblem: StateFlow<GatewayConnectionProblem?> = runtimeState(initial = null) { it.gatewayConnectionProblem }
  val serverName: StateFlow<String?> = runtimeState(initial = null) { it.serverName }
--- a/apps/android/app/src/main/java/ai/openclaw/app/NodeRuntime.kt
+++ b/apps/android/app/src/main/java/ai/openclaw/app/NodeRuntime.kt
@@ -69,7 +69,6 @@ import kotlinx.coroutines.withTimeout
 import kotlinx.serialization.Serializable
 import kotlinx.serialization.json.Json
 import kotlinx.serialization.json.JsonArray
-import kotlinx.serialization.json.JsonElement
 import kotlinx.serialization.json.JsonObject
 import kotlinx.serialization.json.JsonPrimitive
 import kotlinx.serialization.json.buildJsonObject
@@ -302,8 +301,6 @@ class NodeRuntime(
  val isConnected: StateFlow<Boolean> = _isConnected.asStateFlow()
  private val _nodeConnected = MutableStateFlow(false)
  val nodeConnected: StateFlow<Boolean> = _nodeConnected.asStateFlow()
-  private val _nodeCapabilityApprovalState = MutableStateFlow(GatewayNodeApprovalState.Loading)
-  val nodeCapabilityApprovalState: StateFlow<GatewayNodeApprovalState> = _nodeCapabilityApprovalState.asStateFlow()

  private val _statusText = MutableStateFlow("Offline")
  val statusText: StateFlow<String> = _statusText.asStateFlow()
@@ -398,7 +395,6 @@ class NodeRuntime(
  val nodesDevicesRefreshing: StateFlow<Boolean> = _nodesDevicesRefreshing.asStateFlow()
  private val _nodesDevicesErrorText = MutableStateFlow<String?>(null)
  val nodesDevicesErrorText: StateFlow<String?> = _nodesDevicesErrorText.asStateFlow()
-  private val nodeApprovalRefreshGuard = GatewayNodeApprovalRefreshGuard()
  private val _channelsSummary = MutableStateFlow(GatewayChannelsSummary(channels = emptyList()))
  val channelsSummary: StateFlow<GatewayChannelsSummary> = _channelsSummary.asStateFlow()
  private val _channelsRefreshing = MutableStateFlow(false)
@@ -447,7 +443,6 @@ class NodeRuntime(
        updateStatus()
        micCapture.onGatewayConnectionChanged(true)
        scope.launch {
-          subscribeOperatorSessionEvents()
          refreshHomeCanvasOverviewIfConnected()
          if (voiceReplySpeakerLazy.isInitialized()) {
            voiceReplySpeaker.refreshConfig()
@@ -456,7 +451,6 @@ class NodeRuntime(
      },
      onDisconnected = { message ->
        operatorConnected = false
-        invalidateNodeCapabilityApprovalState()
        operatorStatusText = message
        _serverName.value = null
        _remoteAddress.value = null
@@ -491,14 +485,6 @@ class NodeRuntime(
      },
    )

-  private suspend fun subscribeOperatorSessionEvents() {
-    try {
-      operatorSession.request("sessions.subscribe", null)
-    } catch (err: Throwable) {
-      Log.d("OpenClawRuntime", "sessions.subscribe failed: ${err.message ?: err::class.java.simpleName}")
-    }
-  }
-
  private val nodeSession =
    GatewaySession(
      scope = scope,
@@ -517,15 +503,12 @@ class NodeRuntime(
        publishNodePresenceAliveBeacon(NodePresenceAliveBeacon.Trigger.Connect)
        val endpoint = connectedEndpoint
        val auth = activeGatewayAuth
-        if (operatorConnected) {
-          scope.launch { refreshNodesDevicesFromGateway() }
-        } else if (endpoint != null && auth != null) {
+        if (endpoint != null && auth != null) {
          maybeStartOperatorSessionAfterNodeConnect(endpoint, auth)
        }
      },
      onDisconnected = { message ->
        _nodeConnected.value = false
-        invalidateNodeCapabilityApprovalState()
        nodeStatusText = message
        didAutoRequestCanvasRehydrate = false
        _canvasA2uiHydrated.value = false
@@ -2017,42 +2000,21 @@ class NodeRuntime(
  }

  private suspend fun refreshNodesDevicesFromGateway() {
-    val refreshGeneration = nodeApprovalRefreshGuard.begin()
-    val refreshStarted =
-      nodeApprovalRefreshGuard.publishIfCurrent(refreshGeneration) {
-        _nodesDevicesRefreshing.value = true
-        _nodesDevicesErrorText.value = null
-        _nodeCapabilityApprovalState.value = GatewayNodeApprovalState.Loading
-      }
-    if (!refreshStarted) return
+    _nodesDevicesRefreshing.value = true
+    _nodesDevicesErrorText.value = null
    if (!operatorConnected) {
-      nodeApprovalRefreshGuard.publishIfCurrent(refreshGeneration) {
-        _nodesDevicesSummary.value =
-          GatewayNodesDevicesSummary(
-            nodes = emptyList(),
-            pendingDevices = emptyList(),
-            pairedDevices = emptyList(),
-          )
-        _nodesDevicesRefreshing.value = false
-      }
+      _nodesDevicesSummary.value =
+        GatewayNodesDevicesSummary(
+          nodes = emptyList(),
+          pendingDevices = emptyList(),
+          pairedDevices = emptyList(),
+        )
+      _nodesDevicesRefreshing.value = false
      return
    }
    try {
      val nodesRes = operatorSession.request("node.list", "{}")
      val nodesRoot = json.parseToJsonElement(nodesRes).asObjectOrNull()
-      val nodes = parseGatewayNodes(nodesRoot?.get("nodes") as? JsonArray)
-      val approvalState =
-        currentNodeCapabilityApprovalState(
-          nodes = nodes,
-          selfNodeId = identityStore.loadOrCreate().deviceId,
-        )
-      val publishedApproval =
-        nodeApprovalRefreshGuard.publishIfCurrent(refreshGeneration) {
-          _nodeCapabilityApprovalState.value = approvalState
-        }
-      if (!publishedApproval) {
-        return
-      }
      val devicesRoot =
        try {
          val devicesRes = operatorSession.request("device.pair.list", "{}")
@@ -2060,30 +2022,16 @@ class NodeRuntime(
        } catch (_: Throwable) {
          null
        }
-      nodeApprovalRefreshGuard.publishIfCurrent(refreshGeneration) {
-        _nodesDevicesSummary.value =
-          GatewayNodesDevicesSummary(
-            nodes = nodes,
-            pendingDevices = parsePendingDevices(devicesRoot?.get("pending") as? JsonArray),
-            pairedDevices = parsePairedDevices(devicesRoot?.get("paired") as? JsonArray),
-            devicePairingAvailable = devicesRoot != null,
-          )
-      }
+      _nodesDevicesSummary.value =
+        GatewayNodesDevicesSummary(
+          nodes = parseGatewayNodes(nodesRoot?.get("nodes") as? JsonArray),
+          pendingDevices = parsePendingDevices(devicesRoot?.get("pending") as? JsonArray),
+          pairedDevices = parsePairedDevices(devicesRoot?.get("paired") as? JsonArray),
+          devicePairingAvailable = devicesRoot != null,
+        )
    } catch (_: Throwable) {
-      nodeApprovalRefreshGuard.publishIfCurrent(refreshGeneration) {
-        _nodesDevicesErrorText.value = "Could not load nodes and devices."
-      }
+      _nodesDevicesErrorText.value = "Could not load nodes and devices."
    } finally {
-      nodeApprovalRefreshGuard.publishIfCurrent(refreshGeneration) {
-        _nodesDevicesRefreshing.value = false
-      }
-    }
-  }
-
-  private fun invalidateNodeCapabilityApprovalState() {
-    val refreshGeneration = nodeApprovalRefreshGuard.begin()
-    nodeApprovalRefreshGuard.publishIfCurrent(refreshGeneration) {
-      _nodeCapabilityApprovalState.value = GatewayNodeApprovalState.Loading
      _nodesDevicesRefreshing.value = false
    }
  }
@@ -2332,8 +2280,22 @@ class NodeRuntime(

  private fun parseGatewayNodes(nodes: JsonArray?): List<GatewayNodeSummary> =
    nodes
-      ?.mapNotNull(::parseGatewayNodeSummary)
-      .orEmpty()
+      ?.mapNotNull { item ->
+        val obj = item.asObjectOrNull() ?: return@mapNotNull null
+        val id = obj["nodeId"].asStringOrNull()?.trim().orEmpty()
+        if (id.isEmpty()) return@mapNotNull null
+        GatewayNodeSummary(
+          id = id,
+          displayName = obj["displayName"].asStringOrNull()?.trim()?.takeIf { it.isNotEmpty() },
+          remoteIp = obj["remoteIp"].asStringOrNull()?.trim()?.takeIf { it.isNotEmpty() },
+          version = obj["version"].asStringOrNull()?.trim()?.takeIf { it.isNotEmpty() },
+          deviceFamily = obj["deviceFamily"].asStringOrNull()?.trim()?.takeIf { it.isNotEmpty() },
+          paired = obj.boolean("paired"),
+          connected = obj.boolean("connected"),
+          capabilities = parseStringArray(obj["caps"] as? JsonArray),
+          commands = parseStringArray(obj["commands"] as? JsonArray),
+        )
+      }.orEmpty()

  private fun parsePendingDevices(devices: JsonArray?): List<GatewayPendingDeviceSummary> =
    devices
@@ -2861,81 +2823,6 @@ data class GatewayNodesDevicesSummary(
  val devicePairingAvailable: Boolean = true,
 )

-enum class GatewayNodeApprovalState {
-  Loading,
-  Unsupported,
-  Approved,
-  PendingApproval,
-  PendingReapproval,
-  Unapproved,
-}
-
-/** Prevents older node.list responses from overwriting newer approval state. */
-internal class GatewayNodeApprovalRefreshGuard {
-  private val lock = Any()
-  private var generation = 0L
-
-  fun begin(): Long =
-    synchronized(lock) {
-      generation += 1
-      generation
-    }
-
-  fun publishIfCurrent(
-    refreshGeneration: Long,
-    publish: () -> Unit,
-  ): Boolean =
-    synchronized(lock) {
-      if (refreshGeneration != generation) return@synchronized false
-      publish()
-      true
-    }
-}
-
-internal fun parseGatewayNodeApprovalState(raw: String?): GatewayNodeApprovalState =
-  when (raw?.trim()?.lowercase()) {
-    null, "" -> GatewayNodeApprovalState.Loading
-    "approved" -> GatewayNodeApprovalState.Approved
-    "pending-approval" -> GatewayNodeApprovalState.PendingApproval
-    "pending-reapproval" -> GatewayNodeApprovalState.PendingReapproval
-    "unapproved" -> GatewayNodeApprovalState.Unapproved
-    else -> GatewayNodeApprovalState.Loading
-  }
-
-internal fun currentNodeCapabilityApprovalState(
-  nodes: List<GatewayNodeSummary>,
-  selfNodeId: String,
-): GatewayNodeApprovalState =
-  nodes
-    .firstOrNull { it.id == selfNodeId }
-    ?.approvalState
-    ?: GatewayNodeApprovalState.Loading
-
-internal fun parseGatewayNodeSummary(item: JsonElement): GatewayNodeSummary? {
-  val obj = item.asObjectOrNull() ?: return null
-  val id = obj["nodeId"].asStringOrNull()?.trim().orEmpty()
-  if (id.isEmpty()) return null
-  return GatewayNodeSummary(
-    id = id,
-    displayName = obj["displayName"].asStringOrNull()?.trim()?.takeIf { it.isNotEmpty() },
-    remoteIp = obj["remoteIp"].asStringOrNull()?.trim()?.takeIf { it.isNotEmpty() },
-    version = obj["version"].asStringOrNull()?.trim()?.takeIf { it.isNotEmpty() },
-    deviceFamily = obj["deviceFamily"].asStringOrNull()?.trim()?.takeIf { it.isNotEmpty() },
-    paired = obj.boolean("paired"),
-    connected = obj.boolean("connected"),
-    // Only an omitted field identifies a legacy gateway; malformed and future values stay fail-closed.
-    approvalState =
-      if (obj.containsKey("approvalState")) {
-        parseGatewayNodeApprovalState(obj["approvalState"].asStringOrNull())
-      } else {
-        GatewayNodeApprovalState.Unsupported
-      },
-    pendingRequestId = obj["pendingRequestId"].asStringOrNull()?.trim()?.takeIf { it.isNotEmpty() },
-    capabilities = parseGatewayStringArray(obj["caps"] as? JsonArray),
-    commands = parseGatewayStringArray(obj["commands"] as? JsonArray),
-  )
-}
-
 data class GatewayNodeSummary(
  val id: String,
  val displayName: String?,
@@ -2944,8 +2831,6 @@ data class GatewayNodeSummary(
  val deviceFamily: String?,
  val paired: Boolean,
  val connected: Boolean,
-  val approvalState: GatewayNodeApprovalState,
-  val pendingRequestId: String?,
  val capabilities: List<String>,
  val commands: List<String>,
 )
@@ -3068,11 +2953,6 @@ private fun JsonObject?.cronStatus(key: String): String? =
    ?.trim()
    ?.takeIf { it.isNotEmpty() }

-private fun parseGatewayStringArray(items: JsonArray?): List<String> =
-  items
-    ?.mapNotNull { it.asStringOrNull()?.trim()?.takeIf { value -> value.isNotEmpty() } }
-    .orEmpty()
-
 fun providerDisplayName(provider: String): String =
  when (provider.trim().lowercase()) {
    "openai" -> "OpenAI"
Author	SHA1	Message	Date
Peter Steinberger	97059b9697	refactor(codex): simplify native context ownership	2026-06-15 16:16:21 +02:00
Peter Steinberger	f395fca214	refactor(agents): isolate native hook provider policy	2026-06-14 10:40:31 -07:00
Peter Steinberger	68a5d4b5f5	test(codex): type detached delivery fixture	2026-06-14 10:09:08 -07:00
Peter Steinberger	2abcddaa2f	fix(codex): fence stale completion recovery	2026-06-14 09:56:36 -07:00
Peter Steinberger	faffa4b8f7	fix(codex): serialize detached completion delivery	2026-06-14 09:53:14 -07:00
Peter Steinberger	45ccb20d98	fix(codex): preserve clients after terminal turn failures	2026-06-14 09:53:13 -07:00
Peter Steinberger	dbd74318f7	docs(codex): clarify subagent recovery owner	2026-06-14 09:53:13 -07:00
Peter Steinberger	4f21111df9	test(codex): narrow monitor fixture errors	2026-06-14 09:53:13 -07:00
Peter Steinberger	d9efe22cd3	test(codex): update generation reclaim fixture	2026-06-14 09:53:13 -07:00
Peter Steinberger	107462abae	fix(codex): close runtime ownership races	2026-06-14 09:53:13 -07:00
Peter Steinberger	04c30720a0	fix(codex): finalize runtime integration	2026-06-14 09:53:13 -07:00
Peter Steinberger	58601a7f0e	refactor(codex): remove stale binding lease type	2026-06-14 09:53:13 -07:00
Peter Steinberger	9c3d186d7c	fix(codex): resolve diagnostics sessions by agent	2026-06-14 09:53:13 -07:00
Peter Steinberger	990edcfbf5	fix(codex): keep media runtime inside plugin package	2026-06-14 09:53:13 -07:00
Peter Steinberger	1a4e815e37	refactor(codex): unify app-server runtime ownership	2026-06-14 09:53:12 -07:00