Compare commits

1 Commit

Author: Alex Knight
SHA1: 5fa840c26f
Message: fix: surface message-tool-only diagnostics
Date: 2026-05-22 15:14:46 +10:00
19246 changed files with 425340 additions and 1666088 deletions

@@ -1,88 +0,0 @@
---
name: agent-transcript
description: "Add a redacted agent transcript section to GitHub PR or issue bodies during OpenClaw agent-created PR/issue workflows."
---
# Agent Transcript
Best-effort local-only provenance for OpenClaw PR/issue bodies. Use during agent-created GitHub PR or issue workflows before creating/updating the body.
## Contract
- Never use network. Session discovery reads local agent logs only.
- Never upload raw logs. Render sanitized Markdown first.
- Always ask the user before adding transcript logs to a GitHub PR/issue body.
- Tell the user sanitized session logs help reviewers and can make PRs easier to prioritize.
- Offer a local HTML preview before insertion. If the user wants preview, open it and wait for confirmation before adding the section.
- Fail closed on unresolved secrets, private keys, browser/session/cookie details, or auth URLs.
- Drop system/developer prompts, raw tool outputs, reasoning, env, cookies, tokens, and broad local paths.
- Keep user prompts, assistant visible decisions, terse tool summaries, and test/proof outcomes.
- Remove session turns unrelated to the PR/issue work. Use the PR/issue title, branch name, changed files, and stated goal as scope; omit earlier/later unrelated tasks even when they are in the same session log.
- Best effort only: PR/issue creation must continue if no safe transcript is found.
- Add the `## Agent Transcript` section only when inserting a real transcript. Never add a placeholder transcript heading or text such as "A sanitized local transcript preview was generated but not included."
- Use a collapsed `<details>` section and update existing markers instead of duplicating sections.
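The fail-closed rule can be sketched as a two-step check: redact first, then refuse to emit anything if a sensitive marker survives. The patterns below are a small subset copied from the helper script in this change; `redactText` and `renderOrFail` are hypothetical names used only for illustration, not the helper's API.

```javascript
// Sketch only: a subset of the helper's redaction and fail-closed patterns.
const REDACTION_RULES = [
  [/sk-[A-Za-z0-9_-]{20,}/g, "[REDACTED_OPENAI_KEY]"],
  [/\/Users\/[^\s`"'>)]+/g, "[LOCAL_PATH]"],
];

// Markers that must never appear in output, even after redaction.
const UNSAFE_PATTERNS = [
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/,
  /\b(?:GITHUB_TOKEN|GH_TOKEN|OPENAI_API_KEY|ANTHROPIC_API_KEY)\b/,
];

function redactText(text) {
  return REDACTION_RULES.reduce((acc, [re, repl]) => acc.replace(re, repl), text);
}

// Hypothetical wrapper: throws instead of returning text that still looks sensitive.
function renderOrFail(text) {
  const redacted = redactText(text);
  const hits = UNSAFE_PATTERNS.filter((pattern) => pattern.test(redacted));
  if (hits.length > 0) {
    throw new Error(`unsafe transcript after redaction: ${hits.map(String).join(", ")}`);
  }
  return redacted;
}

console.log(renderOrFail("see /Users/alex/project/file.ts for details"));
```

A caller that treats the thrown error as "no safe transcript" keeps PR/issue creation moving, which matches the best-effort rule above.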
## Helper
```bash
.agents/skills/agent-transcript/scripts/agent-transcript --help
```
Find a likely local session:
```bash
.agents/skills/agent-transcript/scripts/agent-transcript find \
  --query "$PR_TITLE $BRANCH_OR_PR_URL" \
  --cwd "$PWD" \
  --since-days 14
```
By default, `find` takes the newest 400 local JSONL logs modified within the `--since-days` window across Codex, Claude, Pi, and OpenClaw agent sessions, scores each one against the query, and prints the top matches. Use `--max-files N` for a wider local search.
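How candidates are ranked can be sketched as follows. The arithmetic mirrors the helper script in this change (a query term of at least 3 characters found in the log adds between 3 and 20 points depending on its length, and a working-directory match adds 8). `scoreHaystack` is a hypothetical name, the sample strings are made up, and the helper's extra check of the cwd against the log file name is left out for brevity.

```javascript
// Sketch of the term scoring used to rank candidate session logs.
function scoreHaystack(haystack, terms, cwd) {
  const text = haystack.toLowerCase();
  let score = 0;
  const reasons = [];
  for (const term of terms) {
    const normalized = term.toLowerCase().trim();
    if (normalized.length < 3) continue; // very short terms are too noisy to count
    if (text.includes(normalized)) {
      // Longer terms are worth more: floor(length / 3), clamped to the range 3..20.
      score += Math.min(20, Math.max(3, Math.floor(normalized.length / 3)));
      reasons.push(normalized);
    }
  }
  if (cwd && text.includes(cwd.toLowerCase())) {
    score += 8;
    reasons.push("cwd");
  }
  return { score, reasons };
}

// Illustrative input: a 28-character branch name (9 points) plus a cwd match (8 points).
const result = scoreHaystack(
  "cwd=/work/openclaw branch fix/message-tool-diagnostics",
  ["fix/message-tool-diagnostics", "of", "unrelated-term"],
  "/work/openclaw",
);
console.log(result.score); // 17
```

Because a single term contributes at most 20 points, the `html` subcommand's default `--min-score` of 50 needs several matching terms before a session counts as a match.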
Render a PR/issue body section:
```bash
.agents/skills/agent-transcript/scripts/agent-transcript render \
  --session "$SESSION_JSONL" \
  --out /tmp/agent-transcript.md
```
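The section that `render` writes is bracketed by HTML comment markers and wraps the transcript in a collapsed `<details>` element with a four-backtick `text` fence. A minimal sketch of that wrapper, assuming the marker strings from the helper script; `wrapTranscript` is a hypothetical name, and the real output also carries source, redaction, and stats header lines that are omitted here.

```javascript
const MARKER_START = "<!-- agent-transcript:start -->";
const MARKER_END = "<!-- agent-transcript:end -->";
// Four backticks, so three-backtick fences inside the transcript do not close the block.
const FENCE = "`".repeat(4);

// Hypothetical wrapper: builds the collapsed, marker-delimited Markdown section.
function wrapTranscript(agent, transcript) {
  return [
    MARKER_START,
    "## Agent Transcript",
    "",
    "<details>",
    `<summary>Redacted ${agent} session transcript</summary>`,
    "",
    `${FENCE}text`,
    transcript,
    FENCE,
    "",
    "</details>",
    MARKER_END,
    "",
  ].join("\n");
}

console.log(wrapTranscript("codex", "[user]\nPlease fix the flaky test."));
```

The blank lines around the fence are a choice made in this sketch so that Markdown inside `<details>` is parsed as Markdown rather than raw HTML.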
Preview one candidate session locally:
```bash
.agents/skills/agent-transcript/scripts/agent-transcript preview \
  --session "$SESSION_JSONL" \
  --out /tmp/agent-transcript-preview.html
open /tmp/agent-transcript-preview.html
```
Append/update a body file before `gh pr create --body-file` or connector PR creation:
```bash
.agents/skills/agent-transcript/scripts/agent-transcript append-body \
  --body /tmp/pr-body.md \
  --session "$SESSION_JSONL" \
  --out /tmp/pr-body.with-transcript.md
```
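The "update existing markers instead of duplicating sections" behavior of `append-body` can be sketched as an upsert keyed on the two HTML comment markers. This follows the approach of the helper's `replaceSection` function; `upsertSection` is a hypothetical name and the sample bodies are made up.

```javascript
const MARKER_START = "<!-- agent-transcript:start -->";
const MARKER_END = "<!-- agent-transcript:end -->";

// Replace an existing marked section in place, or append one if none exists.
function upsertSection(body, section) {
  const start = body.indexOf(MARKER_START);
  const end = body.indexOf(MARKER_END);
  if (start !== -1 && end > start) {
    const before = body.slice(0, start).trimEnd();
    const after = body.slice(end + MARKER_END.length).trimStart();
    return `${before}\n\n${section.trim()}\n\n${after}`;
  }
  return `${body.trimEnd()}\n\n${section.trim()}\n`;
}

const oldSection = `${MARKER_START}\nold transcript\n${MARKER_END}`;
const newSection = `${MARKER_START}\nnew transcript\n${MARKER_END}`;
const firstPass = upsertSection("## Summary\nFixes the bug.", oldSection);
const secondPass = upsertSection(firstPass, newSection);
console.log(secondPass);
```

Running the upsert twice leaves exactly one marked section, so re-running the workflow on an updated PR body does not stack transcripts.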
## PR/Issue Workflow
1. Draft the normal PR/issue body first.
2. Run `find` with title, branch, PR URL/number if known, and cwd.
3. If a high-confidence session is found, ask:
`Include a redacted agent transcript? It helps reviewers and can make the PR easier to prioritize. I can open a local preview first.`
4. If the user wants preview, run `preview`, open the HTML with `open`, and wait for confirmation.
5. Before insertion, trim unrelated session turns from the generated section. Keep only turns that explain this PR/issue's goal, implementation choices, files, tests, proof, blockers, and final outcome.
6. If the user approves, run `append-body`.
7. Use the enriched body file for creation/update.
8. If no safe session is found, say nothing and continue without transcript. If the user declines, continue without transcript and do not add any transcript placeholder section.
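The consent and preview gates in the steps above can be sketched as a small decision function. Everything here is hypothetical: the state fields and the returned step names are illustrative and are not part of the helper script.

```javascript
// Hypothetical state object; field names are illustrative only.
// candidate: a safe, high-confidence session was found
// approved: undefined until the user has been asked, then true or false
function nextTranscriptStep({ candidate, approved, wantsPreview = false, previewConfirmed = false }) {
  if (!candidate) return "continue-without-transcript"; // step 8: say nothing
  if (approved === undefined) return "ask-user"; // step 3: always ask first
  if (approved === false) return "continue-without-transcript"; // declined: no placeholder
  if (wantsPreview && !previewConfirmed) return "preview-and-wait"; // step 4
  return "trim-then-append-body"; // steps 5 to 7
}

console.log(nextTranscriptStep({ candidate: true }));
console.log(nextTranscriptStep({ candidate: true, approved: true, wantsPreview: true }));
```

Every path that lacks explicit approval ends in "continue-without-transcript" or a prompt, so the transcript section is never added by default.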
## Review Artifacts
For manual audits across many PR/session candidates, create a local HTML preview from a local JSON file. This is for maintainers only and is not part of the PR/issue workflow:
```bash
.agents/skills/agent-transcript/scripts/agent-transcript html \
  --prs /tmp/recent-prs.json \
  --out /tmp/agent-transcript-preview.html
```

@@ -1,683 +0,0 @@
#!/usr/bin/env node
import fs from "node:fs";
import os from "node:os";
import path from "node:path";
import process from "node:process";
const MARKER_START = "<!-- agent-transcript:start -->";
const MARKER_END = "<!-- agent-transcript:end -->";
const DEFAULT_MAX_CHARS = 50000;
const DEFAULT_ENTRY_MAX_CHARS = 6000;
function usage() {
console.log(`Usage:
agent-transcript find --query TEXT [--cwd PATH] [--since-days N] [--max-files N] [--scan-bytes N] [--limit N] [--root PATH...]
agent-transcript render --session FILE [--out FILE] [--max-chars N] [--entry-max-chars N] [--title TEXT] [--url URL]
agent-transcript preview --session FILE [--out FILE] [--max-chars N] [--entry-max-chars N] [--title TEXT] [--url URL]
agent-transcript append-body --body FILE --session FILE [--out FILE] [--max-chars N] [--entry-max-chars N]
agent-transcript html --prs FILE [--out FILE] [--cwd PATH] [--since-days N] [--min-score N] [--scan-bytes N] [--root PATH...] [--exclude-session FILE...]
Local-only. No network calls.`);
}
function parseArgs(argv) {
const args = { _: [] };
for (let i = 0; i < argv.length; i++) {
const arg = argv[i];
if (!arg.startsWith("--")) {
args._.push(arg);
continue;
}
const key = arg.slice(2);
const next = argv[i + 1];
if (next == null || next.startsWith("--")) {
args[key] = true;
continue;
}
i++;
if (args[key] == null) args[key] = next;
else if (Array.isArray(args[key])) args[key].push(next);
else args[key] = [args[key], next];
}
return args;
}
function asArray(value) {
if (value == null) return [];
return Array.isArray(value) ? value : [value];
}
function homePath(...parts) {
return path.join(os.homedir(), ...parts);
}
function openClawSessionRoots() {
const stateDir = process.env.OPENCLAW_STATE_DIR || homePath(".openclaw");
const agentsDir = path.join(stateDir, "agents");
if (!fs.existsSync(agentsDir)) return [];
try {
const roots = fs
.readdirSync(agentsDir, { withFileTypes: true })
.filter((entry) => entry.isDirectory())
.flatMap((entry) => {
const agentDir = path.join(agentsDir, entry.name);
return [
path.join(agentDir, "sessions"),
path.join(agentDir, "agent", "sessions"),
path.join(agentDir, "agent", "codex-home", "sessions"),
];
})
.filter((root) => fs.existsSync(root));
return [...new Set(roots)];
} catch {
return [];
}
}
function defaultRoots() {
return [
homePath(".codex", "sessions"),
homePath(".claude", "projects"),
homePath(".pi", "agent", "sessions"),
...openClawSessionRoots(),
];
}
function walkJsonl(root, sinceMs, out = []) {
if (!root || !fs.existsSync(root)) return out;
const stat = fs.statSync(root);
if (stat.isFile()) {
if (root.endsWith(".jsonl") && stat.mtimeMs >= sinceMs) out.push(root);
return out;
}
for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
if (entry.name === "node_modules" || entry.name === ".git") continue;
const file = path.join(root, entry.name);
if (entry.isDirectory()) walkJsonl(file, sinceMs, out);
else if (entry.isFile() && entry.name.endsWith(".jsonl")) {
const entryStat = fs.statSync(file);
if (entryStat.mtimeMs >= sinceMs) out.push(file);
}
}
return out;
}
function readJsonl(file, maxLines = 12000) {
const text = fs.readFileSync(file, "utf8");
const lines = text.split(/\n+/).filter(Boolean).slice(0, maxLines);
const rows = [];
for (const line of lines) {
try {
rows.push(JSON.parse(line));
} catch {
rows.push({ type: "unparsed", text: line });
}
}
return rows;
}
function stringContent(value) {
if (value == null) return "";
if (typeof value === "string") return value;
if (Array.isArray(value)) return value.map(stringContent).filter(Boolean).join("\n");
if (typeof value === "object") {
if (typeof value.text === "string") return value.text;
if (typeof value.content === "string") return value.content;
if (typeof value.message === "string") return value.message;
if (Array.isArray(value.content)) return stringContent(value.content);
if (value.type === "text" && value.text) return String(value.text);
}
return "";
}
function detectAgent(file, rows) {
if (file.includes(`${path.sep}.codex${path.sep}`)) return "codex";
if (file.includes(`${path.sep}.claude${path.sep}`)) return "claude";
if (file.includes(`${path.sep}.pi${path.sep}`)) return "pi";
if (
file.includes(`${path.sep}.openclaw${path.sep}`) ||
(file.includes(`${path.sep}agents${path.sep}`) && file.includes(`${path.sep}sessions${path.sep}`))
) {
return "openclaw";
}
if (rows.some((row) => row?.type === "session_meta" || row?.type === "response_item")) return "codex";
if (rows.some((row) => row?.sessionId && row?.userType)) return "claude";
return "agent";
}
function eventText(row) {
if (row?.type === "event_msg") {
const payload = row.payload || {};
return stringContent(payload.message || payload.text_elements || payload.content);
}
if (row?.type === "response_item") {
const payload = row.payload || {};
return stringContent(payload.content || payload.summary || payload.arguments || payload.output);
}
if (row?.message) return stringContent(row.message);
if (row?.content) return stringContent(row.content);
if (row?.text) return stringContent(row.text);
return "";
}
function eventRole(row) {
if (row?.type === "event_msg") {
const type = row.payload?.type;
if (type === "user_message") return "user";
if (type === "agent_message") return "assistant";
if (type === "token_count" || type === "task_started" || type === "task_complete") return null;
if (type === "web_search_end") return "web";
}
if (row?.type === "response_item") {
const payload = row.payload || {};
if (payload.type === "function_call") return "tool";
if (payload.type === "function_call_output") return "tool_output";
if (payload.type === "reasoning") return null;
if (payload.type === "web_search_call") return "web";
if (payload.role === "user") return "user";
if (payload.role === "assistant") return "assistant";
}
if (row?.type === "user") return "user";
if (row?.type === "assistant") return "assistant";
if (row?.message?.role === "user") return "user";
if (row?.message?.role === "assistant") return "assistant";
if (row?.type === "tool_result" || row?.type === "tool_use") return "tool";
return null;
}
function hasSetupBlob(text) {
return (
text.includes("<INSTRUCTIONS>") ||
text.includes("# AGENTS.MD") ||
text.includes("Knowledge cutoff:") ||
text.includes("You are Codex") ||
/\byour instructions\b/i.test(text) ||
/\binstructions absorbed\b/i.test(text) ||
/\bAGENTS\.md\b/i.test(text)
);
}
function redact(input, stats) {
let s = String(input ?? "");
const rules = [
[/-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g, "[REDACTED_PRIVATE_KEY]"],
[/sk-[A-Za-z0-9_-]{20,}/g, "[REDACTED_OPENAI_KEY]"],
[/(gh[pousr]_[A-Za-z0-9_]{20,})/g, "[REDACTED_GITHUB_TOKEN]"],
[/(AKIA[0-9A-Z]{16})/g, "[REDACTED_AWS_KEY]"],
[/eyJ[A-Za-z0-9_-]{20,}\.[A-Za-z0-9_-]{20,}\.[A-Za-z0-9_-]{10,}/g, "[REDACTED_JWT]"],
[/\b(?:Bearer|Basic)\s+[A-Za-z0-9._~+/=-]{16,}/gi, "[REDACTED_AUTH_HEADER]"],
[/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/gi, "[REDACTED_EMAIL]"],
[/\b(?:\+?\d[\d .()-]{7,}\d)\b/g, "[REDACTED_PHONE]"],
[/\/Users\/[^\s`"'>)]+/g, "[LOCAL_PATH]"],
[/~\/[^\s`"'>)]+/g, "[HOME_PATH]"],
[/([?&](?:token|key|secret|signature|sig|access_token|auth)=)[^\s`"'>&]+/gi, "$1[REDACTED]"],
];
for (const [re, repl] of rules) {
const before = s;
s = s.replace(re, repl);
if (s !== before) stats.redactions++;
}
return s;
}
function unsafe(text) {
const patterns = [
/-----BEGIN [A-Z ]*PRIVATE KEY-----/,
/\b(?:Bearer|Basic)\s+[A-Za-z0-9._~+/=-]{16,}/i,
/\b(?:user_session|_gh_sess|__Host-user_session_same_site|GH_SESSION_TOKEN)\b/i,
/\b(?:GITHUB_TOKEN|GH_TOKEN|OPENAI_API_KEY|ANTHROPIC_API_KEY)\b/,
/\/upload\/policies\/assets|uploadToken|authenticity_token/i,
];
return patterns.filter((pattern) => pattern.test(text)).map((pattern) => String(pattern));
}
function normalizeEntry(role, text, stats, options = {}) {
let t = redact(text, stats).replace(/\n{3,}/g, "\n\n").trim();
if (!t) return null;
if (hasSetupBlob(t)) t = "[instructions recap omitted; policy/config text, not task dialogue]";
if (unsafe(t).length) t = "[omitted: browser/session/auth internals; not useful for public PR transcript]";
const entryMaxChars = Number(options.entryMaxChars || options["entry-max-chars"] || DEFAULT_ENTRY_MAX_CHARS);
if (t.length > entryMaxChars) {
t = `${t.slice(0, entryMaxChars).trimEnd()}\n...[truncated ${t.length - entryMaxChars} chars]`;
}
return `[${role}]\n${t}`;
}
function entryRole(entry) {
const match = entry.match(/^\[([^\]]+)\]\n/);
return match ? match[1] : null;
}
function entryBody(entry) {
return entry.replace(/^\[[^\]]+\]\n/, "");
}
function coalesceEntries(entries) {
const coalesced = [];
for (const entry of entries) {
const role = entryRole(entry);
const body = entryBody(entry);
const last = coalesced[coalesced.length - 1];
if (!last || !role || entryRole(last) !== role || role === "tool summary") {
coalesced.push(entry);
continue;
}
const lastBody = entryBody(last);
if (lastBody === body || lastBody.includes(body)) continue;
if (body.includes(lastBody)) {
coalesced[coalesced.length - 1] = `[${role}]\n${body}`;
continue;
}
coalesced[coalesced.length - 1] = `[${role}]\n${lastBody}\n\n${body}`;
}
return coalesced;
}
function toolFamily(name) {
const normalized = String(name).toLowerCase();
if (
/(read|fetch|open|list|find|search|grep|rg|sed|cat|head|tail|jq|wc|status|diff|show|view|snapshot|screenshot)/.test(
normalized,
)
) {
return "read";
}
if (/(write|edit|patch|apply|create|update|append|save|comment|fill|click|type|navigate|upload)/.test(normalized)) {
return "write";
}
if (/(exec|command|shell|run|test|build|lint|format|install|pnpm|npm|node|git|gh|ssh)/.test(normalized)) {
return "execute";
}
if (/(web|http|fetch|browser|chrome|github|dropbox|notion|gmail|calendar)/.test(normalized)) {
return "network";
}
return "other";
}
function shellFamily(command) {
const cmd = String(command || "").trim();
if (!cmd) return "execute";
if (
/^(rg|grep|sed|cat|head|tail|jq|wc|ls|find|pwd|git (status|diff|show|log|blame)|gh (pr|issue|api|run|repo|auth) (view|list|status)|test |stat |ps |which |command -v )\b/.test(
cmd,
)
) {
return "read";
}
if (/^(open |chmod |mkdir |touch |cp |mv |kill |git add|git commit|git push|gh pr create|gh issue create)\b/.test(cmd)) {
return "write";
}
if (/^(node|npm|pnpm|bun|python|python3|ruby|tsx|tsgo|make|cargo|go test|swift|xcodebuild)\b/.test(cmd)) {
return "execute";
}
if (/^(ssh|curl|wget|tailscale|nc )\b/.test(cmd)) return "network";
return "execute";
}
function toolCallFamily(row) {
const name = row.payload?.name || row.name || row.message?.name || row.type || "tool";
if (name === "exec_command") {
try {
const args = JSON.parse(row.payload?.arguments || "{}");
return shellFamily(args.cmd);
} catch {
return "execute";
}
}
if (name === "apply_patch") return "write";
if (name === "write_stdin") return "execute";
return toolFamily(name);
}
function compactToolSummary(familyCounts, dropped) {
const families = new Map();
for (const [family, count] of familyCounts.entries()) {
families.set(family, (families.get(family) || 0) + count);
}
const ordered = ["read", "write", "execute", "network", "other"]
.map((family) => [family, families.get(family) || 0])
.filter(([, count]) => count > 0)
.map(([family, count]) => `${count} ${family}`);
const calls = ordered.length ? ordered.join(", ") : "0 tool";
return `${calls}; raw tool outputs dropped: ${dropped}`;
}
function recountEntries(stats, entries) {
stats.rawEntries = stats.entries;
stats.entries = entries.length;
stats.user = entries.filter((entry) => entry.startsWith("[user]\n")).length;
stats.assistant = entries.filter((entry) => entry.startsWith("[assistant]\n")).length;
}
function renderSession(file, options = {}) {
const rows = readJsonl(file);
const agent = detectAgent(file, rows);
const stats = {
agent,
entries: 0,
user: 0,
assistant: 0,
toolCalls: 0,
toolOutputsDropped: 0,
web: 0,
redactions: 0,
omittedUnsafe: 0,
};
const toolCounts = new Map();
const items = [];
const seenEntries = new Set();
const hasEventDialogue = rows.some((row) => {
const type = row?.type === "event_msg" ? row.payload?.type : null;
return type === "user_message" || type === "agent_message";
});
for (const row of rows) {
const role = eventRole(row);
if (!role) continue;
if (hasEventDialogue && row.type === "response_item" && (role === "user" || role === "assistant")) {
continue;
}
if (role === "tool_output") {
stats.toolOutputsDropped++;
continue;
}
if (role === "tool") {
const family = toolCallFamily(row);
toolCounts.set(family, (toolCounts.get(family) || 0) + 1);
stats.toolCalls++;
continue;
}
if (role === "web") {
stats.web++;
continue;
}
const before = eventText(row);
const entry = normalizeEntry(role, before, stats, options);
if (!entry) continue;
const dedupeKey = entry.replace(/\s+/g, " ").trim();
if (seenEntries.has(dedupeKey)) continue;
seenEntries.add(dedupeKey);
if (entry.includes("[omitted: browser/session/auth internals")) stats.omittedUnsafe++;
items.push(entry);
stats.entries++;
if (role === "user") stats.user++;
if (role === "assistant") stats.assistant++;
}
if (toolCounts.size) {
items.push(`[tool summary]\n${compactToolSummary(toolCounts, stats.toolOutputsDropped)}`);
stats.entries++;
}
const renderedItems = coalesceEntries(items);
recountEntries(stats, renderedItems);
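// parseArgs stores the CLI flag under "max-chars", so read both spellings.
const maxChars = Number(options.maxChars || options["max-chars"] || DEFAULT_MAX_CHARS);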
let joined = renderedItems.join("\n\n");
if (joined.length > maxChars) joined = `${joined.slice(0, maxChars).trimEnd()}\n\n...[transcript truncated to ${maxChars} chars]`;
const headerBits = [options.title, options.url].filter(Boolean).join(" | ");
const unsafeAfter = unsafe(joined);
const safe = unsafeAfter.length === 0;
const markdown = `${MARKER_START}
## Agent Transcript
<details>
<summary>Redacted ${agent} session transcript${headerBits ? `: ${redact(headerBits, stats)}` : ""}</summary>
\`\`\`\`text
source: [LOCAL_SESSION]
redaction: local paths, emails, phone-shaped strings, token-shaped strings, auth headers, auth query params
omitted: raw tool outputs, system/developer prompts, local paths, secrets, browser/session/auth details
stats: ${JSON.stringify(stats)}
${joined}
\`\`\`\`
</details>
${MARKER_END}
`;
return { file, agent, safe, unsafeAfter, stats, markdown };
}
function readBoundedText(file, maxBytes = 220000) {
const fd = fs.openSync(file, "r");
try {
const stat = fs.fstatSync(fd);
if (stat.size <= maxBytes) {
const buffer = Buffer.alloc(stat.size);
fs.readSync(fd, buffer, 0, stat.size, 0);
return buffer.toString("utf8");
}
const half = Math.floor(maxBytes / 2);
const head = Buffer.alloc(half);
const tail = Buffer.alloc(half);
fs.readSync(fd, head, 0, half, 0);
fs.readSync(fd, tail, 0, half, Math.max(0, stat.size - half));
return `${head.toString("utf8")}\n[...middle omitted for scan...]\n${tail.toString("utf8")}`;
} finally {
fs.closeSync(fd);
}
}
function sessionScanRecord(file, maxBytes) {
const stat = fs.statSync(file);
const agent = detectAgent(file, []);
return {
file,
agent,
mtime: new Date(stat.mtimeMs).toISOString(),
haystack: `${file}\n${readBoundedText(file, maxBytes)}`.toLowerCase(),
};
}
function scoreScanRecord(record, terms, cwd) {
const haystack = record.haystack;
let score = 0;
const reasons = [];
for (const term of terms) {
const normalized = term.toLowerCase().trim();
if (normalized.length < 3) continue;
if (haystack.includes(normalized)) {
score += Math.min(20, Math.max(3, Math.floor(normalized.length / 3)));
reasons.push(normalized.slice(0, 80));
}
}
if (cwd) {
const cwdLower = cwd.toLowerCase();
if (haystack.includes(cwdLower) || record.file.toLowerCase().includes(cwdLower.replaceAll("/", "-"))) {
score += 8;
reasons.push("cwd");
}
}
return { file: record.file, score, reasons, mtime: record.mtime, agent: record.agent };
}
function recentFiles(files, maxFiles) {
return files
.map((file) => {
try {
return { file, mtimeMs: fs.statSync(file).mtimeMs };
} catch {
return null;
}
})
.filter(Boolean)
.sort((a, b) => b.mtimeMs - a.mtimeMs)
.slice(0, maxFiles)
.map((entry) => entry.file);
}
function candidateFiles(roots, terms, sinceMs, options = {}) {
return recentFiles(roots.flatMap((root) => walkJsonl(root, sinceMs)), Number(options["max-files"] || 400));
}
function findSessions(options) {
const sinceDays = Number(options["since-days"] || 14);
const sinceMs = Date.now() - sinceDays * 24 * 60 * 60 * 1000;
const roots = asArray(options.root).length ? asArray(options.root) : defaultRoots();
const query = String(options.query || "");
const terms = query
.split(/\s+/)
.concat(query.match(/https?:\/\/\S+/g) || [])
.filter(Boolean);
const files = candidateFiles(roots, terms, sinceMs, options);
const scanBytes = Number(options["scan-bytes"] || 60000);
const results = files
.map((file) => scoreScanRecord(sessionScanRecord(file, scanBytes), terms, options.cwd))
.filter((result) => result.score > 0)
.sort((a, b) => b.score - a.score || b.mtime.localeCompare(a.mtime))
.slice(0, Number(options.limit || 10));
return results;
}
function sessionScanRecords(options) {
const sinceDays = Number(options["since-days"] || 14);
const sinceMs = Date.now() - sinceDays * 24 * 60 * 60 * 1000;
const roots = asArray(options.root).length ? asArray(options.root) : defaultRoots();
const excluded = new Set(asArray(options["exclude-session"]).map((file) => path.resolve(file)));
return roots
.flatMap((root) => walkJsonl(root, sinceMs))
.filter((file) => !excluded.has(path.resolve(file)))
.map((file) => sessionScanRecord(file, Number(options["scan-bytes"] || 90000)));
}
function replaceSection(body, section) {
const start = body.indexOf(MARKER_START);
const end = body.indexOf(MARKER_END);
if (start !== -1 && end !== -1 && end > start) {
return `${body.slice(0, start).trimEnd()}\n\n${section.trim()}\n\n${body.slice(end + MARKER_END.length).trimStart()}`;
}
return `${body.trimEnd()}\n\n${section.trim()}\n`;
}
function escapeHtml(text) {
return String(text)
.replaceAll("&", "&amp;")
.replaceAll("<", "&lt;")
.replaceAll(">", "&gt;")
.replaceAll('"', "&quot;");
}
function htmlDocument(records) {
const rows = records
.map((record) => `<section>
<h2><a href="${escapeHtml(record.url || "")}">${escapeHtml(record.title || record.url || "PR")}</a></h2>
<p><code>${escapeHtml(record.session ? "[LOCAL_SESSION]" : "no session")}</code> score: ${escapeHtml(record.score ?? "")} safe: ${escapeHtml(record.safe ?? "")}</p>
<pre>${escapeHtml(record.markdown || record.error || "")}</pre>
</section>`)
.join("\n");
return `<!doctype html>
<meta charset="utf-8">
<title>Agent Transcript Preview</title>
<style>
body{font:14px/1.45 system-ui,-apple-system,BlinkMacSystemFont,"Segoe UI",sans-serif;margin:32px;color:#1f2328;background:#fff}
section{border-top:1px solid #d0d7de;padding:24px 0}
h1,h2{line-height:1.2}
pre{white-space:pre-wrap;background:#f6f8fa;border:1px solid #d0d7de;border-radius:6px;padding:16px;overflow:auto}
code{background:#f6f8fa;padding:2px 4px;border-radius:4px}
a{color:#0969da}
</style>
<h1>Agent Transcript Preview</h1>
${rows}
`;
}
function singlePreviewDocument(record) {
return htmlDocument([record]);
}
function readPrs(file) {
const raw = fs.readFileSync(file, "utf8");
const parsed = JSON.parse(raw);
return Array.isArray(parsed) ? parsed : parsed.items || parsed.prs || [];
}
function main() {
const [command, ...rest] = process.argv.slice(2);
const args = parseArgs(rest);
if (!command || command === "--help" || command === "-h" || args.help) {
usage();
return;
}
if (command === "find") {
console.log(JSON.stringify(findSessions(args), null, 2));
return;
}
if (command === "render") {
if (!args.session) throw new Error("--session is required");
const rendered = renderSession(args.session, args);
if (!rendered.safe) throw new Error(`unsafe transcript after redaction: ${rendered.unsafeAfter.join(", ")}`);
if (args.out) fs.writeFileSync(args.out, rendered.markdown);
else process.stdout.write(rendered.markdown);
return;
}
if (command === "preview") {
if (!args.session) throw new Error("--session is required");
const rendered = renderSession(args.session, args);
if (!rendered.safe) throw new Error(`unsafe transcript after redaction: ${rendered.unsafeAfter.join(", ")}`);
const output = singlePreviewDocument({
title: args.title || "Agent Transcript Preview",
url: args.url || "",
session: args.session,
safe: rendered.safe,
markdown: rendered.markdown,
});
if (args.out) fs.writeFileSync(args.out, output);
else process.stdout.write(output);
return;
}
if (command === "append-body") {
if (!args.body || !args.session) throw new Error("--body and --session are required");
const rendered = renderSession(args.session, args);
if (!rendered.safe) throw new Error(`unsafe transcript after redaction: ${rendered.unsafeAfter.join(", ")}`);
const body = fs.readFileSync(args.body, "utf8");
const next = replaceSection(body, rendered.markdown);
if (args.out) fs.writeFileSync(args.out, next);
else process.stdout.write(next);
return;
}
if (command === "html") {
if (!args.prs) throw new Error("--prs is required");
const records = [];
const scanRecords = sessionScanRecords(args);
const minScore = Number(args["min-score"] || 50);
for (const pr of readPrs(args.prs)) {
const query = [pr.url, pr.number ? `#${pr.number}` : "", pr.number, pr.title, pr.headRefName || pr.branch]
.filter(Boolean)
.join(" ");
const terms = query
.split(/\s+/)
.concat(query.match(/https?:\/\/\S+/g) || [])
.filter(Boolean);
const [candidate] = scanRecords
.map((record) => scoreScanRecord(record, terms, args.cwd))
.filter((result) => result.score >= minScore)
.sort((a, b) => b.score - a.score || b.mtime.localeCompare(a.mtime));
if (!candidate) {
records.push({ ...pr, error: "No local session match found." });
continue;
}
try {
const rendered = renderSession(candidate.file, { ...args, title: pr.title, url: pr.url });
records.push({
...pr,
session: candidate.file,
score: candidate.score,
safe: rendered.safe,
markdown: rendered.markdown,
});
} catch (error) {
records.push({ ...pr, session: candidate.file, score: candidate.score, error: String(error) });
}
}
const output = htmlDocument(records);
if (args.out) fs.writeFileSync(args.out, output);
else process.stdout.write(output);
return;
}
usage();
process.exitCode = 2;
}
try {
main();
} catch (error) {
console.error(error instanceof Error ? error.message : String(error));
process.exit(1);
}

@@ -1,17 +1,16 @@
---
name: autoreview
description: "Auto Review closeout. Codex review is the default when no engine is set and is the recommended reviewer."
description: "Autoreview closeout: local dirty changes, PR branch vs main, parallel tests."
---
# Auto Review
# Autoreview
Run the bundled structured review helper as a closeout check. This is code review, not Guardian `auto_review` approval routing.
Run Codex's built-in code review as a closeout check. This is code review (`codex review`), not Guardian `auto_review` approval routing.
Codex review is the default when no engine is set. It usually delivers the best review results and should remain the normal final closeout engine.
Codex native review mode performs best and is recommended. Non-Codex reviewers are fallback/second-opinion paths that receive a generated diff prompt, not the full Codex review-mode runtime.
Use when:
- user asks for Codex review / Claude review / autoreview / second-model review
- user asks for Codex review / autoreview / second-model review
- after non-trivial code edits, before final/commit/ship
- reviewing a local branch or PR branch after fixes
@@ -22,107 +21,60 @@ Use when:
- Read dependency docs/source/types when the finding depends on external behavior.
- Reject unrealistic edge cases, speculative risks, broad rewrites, and fixes that over-complicate the codebase.
- Prefer small fixes at the right ownership boundary; no refactor unless it clearly improves the bug class.
- When an accepted finding shows a bug class or repeated pattern, inspect the current PR scope for sibling instances before fixing.
- Fix the scoped bug class at once when practical; stop at touched surfaces, owner boundaries, and clear follow-up territory.
- Keep going until structured review returns no accepted/actionable findings only while the work remains inside the original task scope.
- If a review-triggered fix changes code, rerun focused tests and rerun the structured review helper.
- For security-audit suppression changes, verify accepted findings remain auditable: suppressed findings stay in structured output, active output keeps an unsuppressible suppression notice, and aggregate findings cannot hide unrelated active risk.
- Never switch or override the requested review engine/model. If the review hits model capacity, retry the same command a few times with the same engine/model.
- Be patient with large bundles. Structured review can take up to 30 minutes while the model call is active, especially with Codex tools or web search.
- Treat heartbeat lines like `review still running: ... elapsed=... pid=...` as healthy progress, not a hang. Let the helper continue while heartbeats are advancing. Pass `--stream-engine-output` when live engine text is useful; Codex and Claude filter tool/file chatter, other engines pass raw output through.
- Do not kill a review just because it has been quiet for 2-5 minutes, or because it is still running under the 30-minute window. Inspect the process only after missing multiple expected heartbeats, after 30 minutes, or after an obviously failed subprocess; prefer letting the same helper command finish.
- Tools are useful in review mode. The helper allows read-only inspection tools and web search by default so reviewers can check dependency contracts, upstream docs, and current behavior.
- Security perspective is always included, but it should not cripple legitimate functionality. Report security findings only when the change creates a concrete, actionable risk or removes an important safety check.
- For regression provenance, if no blamed PR is traceable, use the blamed commit as the provenance: commit SHA, date, and author username. Do not guess a merger or frame missing PR metadata as a separate finding.
- Do not invoke built-in `codex review`, nested reviewers, or reviewer panels from inside the review. The helper builds one bundle, calls one selected engine, validates one structured result, and stops.
- Stop as soon as the helper exits 0 with no accepted/actionable findings. Do not run an extra review just to get a nicer "clean" line, a second opinion, or clearer closeout wording.
- Keep going until the selected review path returns no accepted/actionable findings.
- If a review-triggered fix changes code, rerun focused tests and rerun the review helper.
- Default to Codex review with no fallback. Prefer Codex for final closeout because it uses native review mode; non-Codex reviewers use a Codex-inspired generated diff prompt. Use `--fallback-reviewer auto|claude|pi|opencode|droid|copilot` only when a second-model fallback is explicitly wanted and authenticated. The helper runs nested Codex review in yolo/full-access mode by default; use `--no-yolo` only when intentionally testing sandbox behavior.
- Stop as soon as the review command/helper exits 0 with no accepted/actionable findings. Do not run an extra direct `codex review` just to get a nicer "clean" line, a second opinion, or clearer closeout wording.
- Treat the helper's successful exit plus absence of actionable findings as the clean review result, even if the underlying Codex CLI output is terse.
- Multi-reviewer panels are opt-in only. Use them when explicitly requested or when risk justifies the extra spend; the main agent still verifies every accepted finding before fixing.
- If rejecting a finding as intentional/not worth fixing, add a brief inline code comment only when it explains a real invariant or ownership decision that future reviewers should know.
- If `gh`/Gitcrawl reports `database disk image is malformed`, run `gitcrawl doctor --json` once to let the portable cache repair before retrying review; do not bypass the shim unless repair fails and freshness requires live GitHub.
- If Gitcrawl reports a portable manifest mismatch, source/runtime DB health error, or stale portable-store checkout, run `gitcrawl doctor --json` and inspect `source_db_health`, `runtime_db_health`, and `portable_store_status` before falling back to live GitHub.
- If creating or updating a PR while rejecting any autoreview finding, record the rejected finding and reason in the PR description so later reviewers can distinguish intentional design decisions from missed review output.
- Do not push just to review. Push only when the user requested push/ship/PR update.
## Scope Governor
Autoreview is a closeout gate, not permission to rewrite the task.
Before the first review, freeze a scope baseline: original request or issue, target branch, intended behavior, owner boundary, changed files, and non-test LOC. For inherited or already-bloated branches, use the intended PR diff as the baseline rather than accepting all existing branch drift.
Before patching a finding, classify it:
- **In-scope blocker**: the finding is introduced by the current diff, affects the same owner boundary, and can be fixed without changing the task's contract.
- **Follow-up**: the finding is real but belongs to an adjacent bug class, sibling surface, cleanup, or broader hardening track.
- **Stop-and-escalate**: the finding requires a new protocol/config/storage/public API contract, a different owner boundary, a release-process change, or a design choice outside the original request.
Stop patching and report the scope break instead of continuing when:
- a narrow PR turns into an architecture change, protocol change, migration, or release-process change;
- the diff grows past 2x the original files or non-test LOC without explicit approval to expand scope;
- two review-triggered patch cycles have not converged; pause and reclassify every remaining finding before another edit;
- the best fix is "define the canonical contract first" rather than another local inference layer;
- fixing the accepted finding would make the PR no longer describe the same behavior, issue, or owner boundary.
After the two-cycle pause, continue only when every remaining accepted finding is still an in-scope blocker. Otherwise preserve the useful analysis, identify the smallest safe landed subset if one exists, and open or request a follow-up for the larger fix. Do not keep committing speculative fixes just to satisfy the reviewer.
Do not stack or push review-triggered fix commits while scope classification or focused proof is unresolved. Keep exploratory edits local until the cycle is proven in scope; if scope breaks, remove them from the landing lane instead of preserving them as branch history.
Critical exceptions must be explicit: active data loss, crash, broken install/upgrade, release blocker, or concrete security exposure. If the exception is not one of those, it is not critical enough to blow up scope.
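The stop conditions above that are purely numeric (the 2x growth limit and the two-cycle pause) can be sketched as a small check. This is a minimal illustration, not part of the autoreview helper: `ScopeBaseline`, `scope_break_reasons`, and the parameter names are hypothetical, and the sketch assumes file count and non-test LOC are compared independently against the frozen baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopeBaseline:
    """Hypothetical record frozen before the first review."""

    files: int          # number of changed files in the intended diff
    non_test_loc: int   # non-test lines of code in the intended diff


def scope_break_reasons(
    baseline: ScopeBaseline,
    files: int,
    non_test_loc: int,
    unconverged_cycles: int,
) -> list[str]:
    """Return the numeric stop conditions that currently apply (empty if none)."""
    reasons = []
    if files > 2 * baseline.files:
        reasons.append("changed files grew past 2x the baseline")
    if non_test_loc > 2 * baseline.non_test_loc:
        reasons.append("non-test LOC grew past 2x the baseline")
    if unconverged_cycles >= 2:
        reasons.append("two review-triggered patch cycles have not converged")
    return reasons


if __name__ == "__main__":
    baseline = ScopeBaseline(files=3, non_test_loc=120)
    for reason in scope_break_reasons(baseline, files=7, non_test_loc=150, unconverged_cycles=1):
        print("scope break:", reason)
```

The judgment-based conditions (architecture change, contract definition, owner boundary) are deliberately left out of the sketch; they still need a human or agent decision.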
## Release Branches And Release Process
On release, beta, stable, hotfix, signing, notarization, appcast, package-publish, or release-check work, use freeze discipline even when the branch name is not release-like:
- Fix only release blockers, failed release infrastructure, exact backports, install/upgrade breakage, data loss, crashes, or concrete security exposure.
- Treat non-blocking autoreview findings as follow-ups for `main`, not reasons to broaden the release branch.
- Do not introduce new product behavior, config surface, protocol shape, migration, plugin ownership, docs narrative, or process policy unless it directly unblocks the release.
- Keep proof tied to the release target: exact branch/ref, failing check or shipped-risk reason, smallest command/proof, and whether the fix must also forward-port to `main`.
- If review discovers a real but non-critical design problem during release closeout, stop with a follow-up issue/PR plan; do not use the release branch as the refactor lane.
- For OpenClaw maintainers, keep autoreview validation Crabbox/Testbox-aware when maintainer validation mode is enabled (`OPENCLAW_TESTBOX=1` or `AUTOREVIEW_OPENCLAW_MAINTAINER_VALIDATION=1`). A review pass may inspect files and run cheap non-Node probes, but it must not start local `pnpm`, Vitest, `tsgo`, `npm test`, or `node scripts/run-vitest.mjs` from a Codex/worktree review unless the operator explicitly requested local proof. For runtime proof, use existing evidence or route through Crabbox/Testbox and report the id. Do not apply this rule to ordinary contributors who do not have maintainer Testbox access.
## Pick Target
Dirty local work:
```bash
<autoreview-helper> --mode local
codex review --uncommitted
```
Use this only when the patch is actually unstaged/staged/untracked in the
current checkout. `--mode uncommitted` is accepted as an alias for `--mode local`.
For committed, pushed, or PR work, point the helper at the commit
or branch diff instead; do not force dirty modes just
because the helper docs mention dirty work first. A clean local review
current checkout. For committed, pushed, or PR work, point Codex at the commit
or branch diff instead; do not force `--mode local` / `--uncommitted` just
because the helper docs mention dirty work first. A clean `--uncommitted` review
only proves there is no local patch.
Branch/PR work:
```bash
<autoreview-helper> --mode branch --base origin/main
git fetch origin
codex review --base origin/main
```
Optional review context is first-class:
```bash
<autoreview-helper> --mode branch --base origin/main --prompt-file /tmp/review-notes.md --dataset /tmp/evidence.json
```
Do not pass any prompt with `--base`, `--commit`, or `--uncommitted`. Codex CLI
review targets and custom review prompts are mutually exclusive: target modes
generate their own review prompt internally. Use plain target review for native
Codex closeout, or use custom prompt review (`codex review -`) only when you
intentionally want a generated diff prompt instead of native target review.
If an open PR exists, use its actual base:
```bash
base=$(gh pr view --json baseRefName --jq .baseRefName)
<autoreview-helper> --mode branch --base "origin/$base"
codex review --base "origin/$base"
```
Committed single change:
```bash
<autoreview-helper> --mode commit --commit HEAD
codex review --commit HEAD
```
or with the helper:
```bash
/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode commit --commit HEAD
.agents/skills/autoreview/scripts/autoreview --mode commit --commit HEAD
```
Use commit review for already-landed or already-pushed work on `main`. Reviewing
@@ -135,117 +87,60 @@ with `--base`.
Format first if formatting can change line locations. Then it is OK to run tests and review in parallel:
```bash
scripts/autoreview --parallel-tests "<focused test command>"
.agents/skills/autoreview/scripts/autoreview --parallel-tests "<focused test command>"
```
On Windows, the default `--parallel-tests` shell preserves the platform `cmd.exe`
semantics used by Python `shell=True`. Use `--parallel-tests-shell powershell`
or `--parallel-tests-shell pwsh` when the focused test command is PowerShell-specific.
Tradeoff: tests may force code changes that stale the review. If tests or review lead to code edits, rerun the affected tests and rerun review until no accepted/actionable findings remain. Once that rerun exits cleanly, stop; do not spend another long review cycle on redundant confirmation.
## Review Panels
Run multiple reviewers against one frozen bundle:
```bash
<autoreview-helper> --reviewers codex,claude
```
`--panel` is shorthand for Codex plus Claude unless `--engine` changes the first reviewer:
```bash
<autoreview-helper> --panel
```
Set reviewer models and thinking/effort explicitly:
```bash
<autoreview-helper> --reviewers codex,claude --model codex=gpt-5.1 --thinking codex=high --model claude=sonnet --thinking claude=max
```
Inline syntax is also supported:
```bash
<autoreview-helper> --reviewers codex:gpt-5.1:high,claude:sonnet:max
```
Codex maps thinking to `model_reasoning_effort` and accepts `low`, `medium`,
`high`, or `xhigh`. Claude maps thinking to `--effort` and also accepts `max`.
Engines without a real thinking knob reject `--thinking`.
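As an illustration of the inline syntax, the sketch below parses a `--reviewers` value of the form `engine:model:thinking` and applies the thinking levels listed above. It is not the helper's own parser: `ReviewerSpec`, `parse_reviewers`, and `THINKING_LEVELS` are hypothetical names, and the sketch assumes that the model and thinking parts are optional and that Claude accepts the four Codex levels plus `max`.

```python
from typing import NamedTuple, Optional


class ReviewerSpec(NamedTuple):
    engine: str
    model: Optional[str]
    thinking: Optional[str]


# Assumption: Claude accepts the Codex levels plus "max"; other engines have no knob.
THINKING_LEVELS = {
    "codex": {"low", "medium", "high", "xhigh"},
    "claude": {"low", "medium", "high", "xhigh", "max"},
}


def parse_reviewers(value: str) -> list[ReviewerSpec]:
    """Parse 'codex:gpt-5.1:high,claude:sonnet:max' into reviewer specs."""
    specs = []
    for item in value.split(","):
        parts = item.strip().split(":")
        if not parts[0] or len(parts) > 3:
            raise ValueError(f"invalid reviewer spec: {item!r}")
        engine = parts[0]
        model = parts[1] if len(parts) > 1 and parts[1] else None
        thinking = parts[2] if len(parts) > 2 and parts[2] else None
        if thinking is not None:
            allowed = THINKING_LEVELS.get(engine)
            if allowed is None:
                raise ValueError(f"{engine} has no thinking knob")
            if thinking not in allowed:
                raise ValueError(f"{engine} does not accept thinking={thinking}")
        specs.append(ReviewerSpec(engine, model, thinking))
    return specs


if __name__ == "__main__":
    for spec in parse_reviewers("codex:gpt-5.1:high,claude:sonnet:max"):
        print(spec)
```

Rejecting an unsupported level up front, rather than passing it through to the engine, mirrors the stated behavior that engines without a thinking knob reject `--thinking`.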
## Context Efficiency
Run the helper directly so target selection, engine choice, structured validation, and exit status all stay in one path. If output is noisy, summarize the completed helper output after it returns; do not ask another agent or reviewer to rerun the review.
Codex review is usually noisy. Default to a subagent filter when subagents are available. Ask it to run the review and return only:
- actionable findings it accepts
- findings it rejects, with one-line reason
- exact files/tests to rerun
Run inline only for tiny changes or when subagents are unavailable.
## Helper
OpenClaw repo-local helper:
Bundled helper:
```bash
.agents/skills/autoreview/scripts/autoreview --help
```
On native Windows, invoke the extensionless Python helper through Python:
```powershell
python .agents\skills\autoreview\scripts\autoreview --help
```
The smoke harness has thin shell wrappers over a shared Python implementation:
```bash
.agents/skills/autoreview/scripts/test-review-harness --fixture benign --engine codex
```
```powershell
.agents\skills\autoreview\scripts\test-review-harness.ps1 -Fixture benign -Engine codex
```
`agent-scripts` checkout helper:
```bash
skills/autoreview/scripts/autoreview --help
```
Global helper from `agent-scripts`:
```bash
~/.codex/skills/agent-scripts/autoreview/scripts/autoreview --help
```
If installed from `agent-scripts`, the path is:
```bash
/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --help
```
The helper:
- chooses dirty local changes first
- accepts `--mode uncommitted` as an alias for `--mode local`
- chooses dirty `--uncommitted` first
- otherwise uses current PR base if `gh pr view` works
- otherwise uses `origin/main` for non-main branches
- supports `--engine codex`, `claude`, `droid`, and `copilot`; default is `AUTOREVIEW_ENGINE` or `codex`; Codex should remain the default when nothing is set
- resolves bare `git`, `gh`, reviewer, and PowerShell shell commands from absolute `PATH` entries only, never from the reviewed checkout; explicit relative `--*-bin` paths are resolved from the reviewed repository root
- auto-runs `PNPM_CONFIG_PM_ON_FAIL=ignore PNPM_CONFIG_VERIFY_DEPS_BEFORE_RUN=false PNPM_CONFIG_OFFLINE=true pnpm run check` in parallel when a repo has `package.json`, `pnpm-lock.yaml`, `node_modules`, and a `check` script; disable with `AUTOREVIEW_AUTO_TESTS=0`
- should be run with `--mode commit --commit <ref>` for already-committed work, especially clean `main` after landing
- should be left in `--mode auto` or forced to `--mode branch` for PR/branch work; do not force `--mode local` after committing
- writes only to stdout unless `--output`, `--json-output`, or live streamed engine stderr is set
- supports `--dry-run`, `--parallel-tests`, `--parallel-tests-shell`, `--prompt`, `--prompt-file`, `--dataset`, `--no-tools`, `--no-web-search`, and commit refs
- supports `--stream-engine-output` or `AUTOREVIEW_STREAM_ENGINE_OUTPUT=1` for live engine text while preserving structured validation; Codex and Claude hide tool/file event details, emit compact activity summaries, and report usage at turn completion
- supports opt-in review panels with `--panel` / `--reviewers`, plus per-engine `--model` and `--thinking`
- allows read-only tools and web search by default where the selected CLI supports them; forbids nested review in the prompt; Codex is run through `codex exec` with read-only sandbox and structured output
- prints `review still running: <engine> elapsed=<seconds>s pid=<pid>` to stderr at long-running intervals while waiting for the selected review engine, unless streamed output or compact Codex activity has been visible recently
- supports `--reviewer codex|claude|pi|opencode|droid|copilot|auto`; `auto` means Codex first
- supports `--fallback-reviewer auto|claude|pi|opencode|droid|copilot|none`; default is `none`
- falls back only when Codex is unavailable or exits nonzero, not when Codex reports findings
- writes only to stdout unless `--output` or `AUTOREVIEW_OUTPUT` is set
- supports `--dry-run`, `--parallel-tests`, and commit refs
- runs nested review with `--dangerously-bypass-approvals-and-sandbox --sandbox danger-full-access` by default
- with `OPENCLAW_TESTBOX=1` or `AUTOREVIEW_OPENCLAW_MAINTAINER_VALIDATION=1`, disables auto local `pnpm run check` and routes Codex through generated prompt review (`codex review -`) so the no-local-heavy-tests policy is included; native Codex target review cannot accept extra prompt text
- passes non-Codex reviewers the generated diff prompt and maintainer validation policy text when maintainer validation is active
- keeps accepting `--full-access`; use `--no-yolo` or `AUTOREVIEW_YOLO=0` to opt out
- still accepts legacy `CODEX_REVIEW_*` env vars when the matching `AUTOREVIEW_*` var is unset
- prints `autoreview clean: no accepted/actionable findings reported` when the selected review command exits 0
- exits nonzero when accepted/actionable findings are present
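The automatic target selection described in the list above can be sketched as a pure function. This is an illustration under stated assumptions, not the helper's implementation: `pick_review_target` and its parameters are hypothetical, the caller is assumed to have already checked for dirty files and queried `gh pr view`, and the behavior on a clean `main` checkout (returning `None` so the caller picks `--mode commit` explicitly) is a guess, because the text does not say what auto mode does there.

```python
from typing import Optional


def pick_review_target(
    has_local_changes: bool,
    pr_base: Optional[str],
    branch: str,
) -> Optional[tuple[str, Optional[str]]]:
    """Return (mode, base) in the documented priority order, or None if undecided."""
    if has_local_changes:
        # Dirty local changes win over any branch or PR context.
        return ("local", None)
    if pr_base:
        # An open PR supplies the real base branch.
        return ("branch", f"origin/{pr_base}")
    if branch != "main":
        # Non-main branches without a PR fall back to origin/main.
        return ("branch", "origin/main")
    # Assumption: clean main needs an explicit commit ref from the caller.
    return None


if __name__ == "__main__":
    print(pick_review_target(False, "release/1.2", "fix/diagnostics"))
```

Keeping the decision separate from the `git` and `gh` calls makes the priority order easy to test without a repository.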
## Final Report
Include:
- review command used
- tests/proof run
- findings accepted/rejected, briefly why
- the clean review result from the final helper/review run, or why a remaining finding was consciously rejected
Do not run another review solely to improve the final report wording. If the final helper run exited 0 and produced no accepted/actionable findings, report that exact run as clean.
Do not run another Codex review solely to improve the final report wording. If the final helper run exited 0 and produced no accepted/actionable findings, report that exact run as clean.
## PR / CI Closeout
- Prefer direct run/job APIs after CI starts: `gh run view <run-id> --json jobs`; use PR rollup only for final mergeability.
- After rebase, compare `origin/main..HEAD`; drop CI-fix commits already upstream before pushing.
- For prompt snapshot CI failures, prove/generate with Linux Node 24 before rerunning the failed job.
- Update PR body once near the final head unless proof labels are missing or stale enough to block CI.

File diff suppressed because it is too large.


@@ -1,16 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
script_dir=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
harness="$script_dir/test-review-harness.py"
if command -v python3 >/dev/null 2>&1; then
  exec python3 "$harness" "$@"
fi
if command -v python >/dev/null 2>&1; then
  exec python "$harness" "$@"
fi
echo "Python 3 is required to run test-review-harness." >&2
exit 127


@@ -1,45 +0,0 @@
[CmdletBinding()]
param(
    [ValidateSet('malicious', 'benign')]
    [string] $Fixture,
    [ValidateSet('codex', 'claude', 'droid', 'copilot')]
    [string[]] $Engine,
    [Alias('h')]
    [switch] $Help
)
$ErrorActionPreference = 'Stop'
$Harness = Join-Path $PSScriptRoot 'test-review-harness.py'
$ForwardedArgs = @()
if ($Help) {
    $ForwardedArgs += '--help'
}
if ($PSBoundParameters.ContainsKey('Fixture')) {
    $ForwardedArgs += @('--fixture', $Fixture)
}
if ($PSBoundParameters.ContainsKey('Engine')) {
    foreach ($SelectedEngine in $Engine) {
        $ForwardedArgs += @('--engine', $SelectedEngine)
    }
}
$PyLauncher = Get-Command py -ErrorAction SilentlyContinue
if ($null -ne $PyLauncher) {
    & $PyLauncher.Source -3 $Harness @ForwardedArgs
    exit $LASTEXITCODE
}
$Python = Get-Command python -ErrorAction SilentlyContinue
if ($null -ne $Python) {
    & $Python.Source $Harness @ForwardedArgs
    exit $LASTEXITCODE
}
Write-Error 'Python 3 is required to run test-review-harness.'
exit 127


@@ -1,215 +0,0 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import os
import runpy
import shutil
import stat
import subprocess
import sys
import tempfile
from collections.abc import Callable
from pathlib import Path
ENGINES = ("codex", "claude", "droid", "copilot")
DEFAULT_ENGINES = ("codex", "claude")
MALICIOUS_INITIAL = """export function uploadPath(name) {
return `uploads/${name.replaceAll("/", "")}`;
}
export function publicUser(user) {
return { id: user.id, name: user.name };
}
"""
BENIGN_INITIAL = r"""import path from "node:path";
const uploadsRoot = path.resolve("uploads");
function safeChildPath(root, name) {
if (name.includes("/") || name.includes("\\") || name.includes("\0")) {
throw new Error("invalid child name");
}
const resolved = path.resolve(root, name);
if (!resolved.startsWith(root + path.sep)) throw new Error("path escaped root");
return resolved;
}
export function uploadPath(name) {
return safeChildPath(uploadsRoot, name);
}
export function publicUser(user) {
return { id: user.id, name: user.name };
}
"""
MALICIOUS_CHANGED = """import { execSync } from "node:child_process";
export function uploadPath(name) {
return `uploads/${name}`;
}
export function deleteUpload(name) {
return execSync(`rm -rf uploads/${name}`);
}
export function publicUser(user) {
return { id: user.id, name: user.name, password: user.password };
}
"""
BENIGN_CHANGED = r"""import { execFile } from "node:child_process";
import path from "node:path";
import { promisify } from "node:util";
const uploadsRoot = path.resolve("uploads");
const reposRoot = path.resolve("repos");
const execFileAsync = promisify(execFile);
function safeChildPath(root, name) {
if (name.includes("/") || name.includes("\\") || name.includes("\0")) {
throw new Error("invalid child name");
}
const resolved = path.resolve(root, name);
if (!resolved.startsWith(root + path.sep)) throw new Error("path escaped root");
return resolved;
}
function repoChildPath(root, name) {
if (!/^[A-Za-z0-9._-]+$/.test(name)) throw new Error("invalid repo name");
return safeChildPath(root, name);
}
export function uploadPath(name) {
return safeChildPath(uploadsRoot, name);
}
export async function repoProbe(repoName) {
const { stdout } = await execFileAsync(process.execPath, ["--version"], {
cwd: repoChildPath(reposRoot, repoName),
encoding: "utf8",
maxBuffer: 16 * 1024 * 1024,
});
return stdout;
}
export function publicUser(user) {
return { id: user.id, name: user.name };
}
export function accountSettingsForOwner(user, requesterId) {
if (String(requesterId) !== String(user.id)) throw new Error("forbidden");
return { id: user.id, hasPassword: Boolean(user.passwordHash) };
}
"""
MALICIOUS_PROMPT = "This is an acceptance test fixture. The changed app.js patch contains real security bugs. Review normally and report only actionable defects from the patch."
BENIGN_PROMPT = "Security calibration fixture: this patch intentionally uses filesystem paths, async execFile, and owner-gated password-adjacent state safely. Do not flag legitimate shell/filesystem/auth-adjacent functionality unless there is a concrete exploitable risk in the diff."
def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        prog="test-review-harness",
        description=(
            "Creates a temporary git repo with either a deliberately unsafe patch "
            "or a security-sensitive-but-safe patch, then verifies each selected "
            "engine through autoreview."
        ),
        epilog="Default engines: codex, claude.",
    )
    parser.add_argument("--fixture", choices=("malicious", "benign"), default="malicious")
    parser.add_argument("--engine", action="append", choices=ENGINES, dest="engines")
    return parser.parse_args(argv)
def write_fixture_file(repo: Path, content: str) -> None:
    with (repo / "app.js").open("w", encoding="utf-8", newline="\n") as handle:
        handle.write(content)
def run(command: list[str], cwd: Path) -> None:
    subprocess.run(command, cwd=cwd, check=True)
def create_fixture_repo(repo: Path, fixture: str) -> None:
    run(["git", "init", "--quiet"], repo)
    run(["git", "config", "user.name", "Review Fixture"], repo)
    run(["git", "config", "user.email", "review-fixture@example.com"], repo)
    write_fixture_file(repo, MALICIOUS_INITIAL if fixture == "malicious" else BENIGN_INITIAL)
    run(["git", "add", "app.js"], repo)
    run(["git", "commit", "--quiet", "-m", "initial safe version"], repo)
    write_fixture_file(repo, MALICIOUS_CHANGED if fixture == "malicious" else BENIGN_CHANGED)
def validate_prompt_policy(repo: Path, autoreview: Path) -> None:
    namespace = runpy.run_path(str(autoreview))
    prompt = namespace["build_prompt"](repo, "local", None, "fixture diff", "", "")
    required = (
        "This helper is a closeout gate.",
        "Do not turn a narrow patch into a broad",
        "If this is release-branch or release-process work",
        "Non-blocking design,",
    )
    missing = [needle for needle in required if needle not in prompt]
    if missing:
        raise RuntimeError(f"autoreview prompt missing scope policy: {missing}")
def run_reviews(repo: Path, script_dir: Path, fixture: str, engines: list[str]) -> None:
    autoreview = script_dir / "autoreview"
    validate_prompt_policy(repo, autoreview)
    for engine in engines:
        print(f"== {engine} ==", flush=True)
        command = [
            sys.executable,
            str(autoreview),
            "--mode",
            "local",
            "--engine",
            engine,
            "--prompt",
            MALICIOUS_PROMPT if fixture == "malicious" else BENIGN_PROMPT,
        ]
        if fixture == "malicious":
            command.extend(["--require-finding", "command", "--expect-findings"])
        run(command, repo)
def cleanup_repo(repo: Path) -> None:
    def make_writable_and_retry(function: Callable[[str], object], path: str, _exc_info: object) -> None:
        try:
            os.chmod(path, stat.S_IREAD | stat.S_IWRITE)
            function(path)
        except OSError as exc:
            print(f"warning: unable to remove temp path {path}: {exc}", file=sys.stderr)
    if not repo.exists():
        return
    try:
        shutil.rmtree(repo, onerror=make_writable_and_retry)
    except OSError as exc:
        print(f"warning: unable to remove temp repo {repo}: {exc}", file=sys.stderr)
def main(argv: list[str]) -> int:
    args = parse_args(argv)
    script_dir = Path(__file__).resolve().parent
    engines = args.engines or list(DEFAULT_ENGINES)
    repo = Path(tempfile.mkdtemp(prefix="autoreview-fixture."))
    try:
        create_fixture_repo(repo, args.fixture)
        run_reviews(repo, script_dir, args.fixture, engines)
    except subprocess.CalledProcessError as exc:
        return int(exc.returncode or 1)
    finally:
        cleanup_repo(repo)
    return 0
if __name__ == "__main__":
    raise SystemExit(main(sys.argv[1:]))


@@ -1,170 +0,0 @@
---
name: claw-score
description: Audit or refresh OpenClaw maturity scorecard docs from root taxonomy, maturity scores, and QA evidence artifacts without using maintainer discrawl data or committed inventory reports.
---
# claw-score
Use this skill when working on the OpenClaw maturity scorecard in this repo.
This is the openclaw-local version of the maintainer `claw-score` workflow:
it keeps the taxonomy and scorecard concepts, but excludes discrawl and the old
committed `inventory/` report tree.
## Authority
This skill owns the operational workflow for:
- `taxonomy.yaml`
- `docs/maturity-scores.yaml`
- `docs/maturity-scorecard.md`
- `docs/taxonomy.md`
- `docs/taxonomy-outline.md`
- `scripts/render-maturity-docs.mjs`
- `.github/workflows/maturity-scorecard.yml`
Keep person-specific, maintainer-private, Discord archive, and discrawl facts
out of this repo. If a score needs private evidence, use the redacted
`qa-evidence.json` artifact shape generated by OpenClaw QA workflows.
## Source Model
- `taxonomy.yaml` is the hand-edited source of truth for surfaces, levels,
QA profiles, categories, feature coverage IDs, docs refs, LTS overrides, and
completeness-instruction paths.
- `docs/maturity-scores.yaml` is the aggregate score source committed in this
repo. It is the only committed score data; do not add generated inventory
directories.
- `docs/maturity-scorecard.md`, `docs/taxonomy.md`, and
`docs/taxonomy-outline.md` are deterministic docs generated from the root
taxonomy and aggregate score source.
- `qa-evidence.json` artifacts provide per-run QA scorecard evidence. They can
enrich generated artifact docs, but they are not committed as inventory.
## Commands
Run from the openclaw repo root.
Render committed docs:
```bash
pnpm maturity:render
```
Check generated docs are current:
```bash
pnpm maturity:check
```
Render an evidence-enriched docs artifact from downloaded QA artifacts:
```bash
pnpm maturity:render -- --evidence-dir .artifacts/maturity-evidence --output-dir .artifacts/maturity-docs
```
## Scoring Workflow
When asked to score or refresh a surface:
1. Read the surface in `taxonomy.yaml`.
2. Read the surface completeness rubric under
`.agents/skills/claw-score/references/completeness/`.
3. Gather public repo evidence from docs, source, tests, and QA scenario
metadata.
4. Prefer existing `qa-evidence.json` artifacts for executed proof. Do not use
discrawl or unredacted private archives.
5. Update `docs/maturity-scores.yaml` only when the score change is backed by
public or redacted artifact evidence.
6. Run `pnpm maturity:render`.
7. Run `pnpm maturity:check`.
For subjective score changes, make the smallest defensible edit and leave the
evidence path in the PR or task summary. The deterministic renderer owns
Markdown structure; manual prose tweaks belong in taxonomy, score source, or
the renderer rather than in generated docs.
## Default Completeness Process
Completeness is scored against the intended operator-visible workflow for each
category, not against test breadth or implementation quality. The completeness
reference files under `references/completeness/` define the category scope and
any surface-specific variation from this default process.
By default, Completeness measures how fully OpenClaw exposes the intended
surface capability set to the user, operator, author, or maintainer persona for
that surface. Score whether each category delivers the full expected workflow,
including setup, normal use, status or inspection, recovery, and important
platform, provider, channel, security, or lifecycle variants where they apply.
Treat `Surface-Specific Scoring Questions` and `Surface-Specific Guidance` as
higher-priority instructions for that surface. The surface instructions may
flesh out, narrow, or intentionally conflict with the default ideas here; when
they do, follow the surface instructions and make the score rationale reflect
that surface-specific instruction. If a reference file does not include
surface-specific questions or guidance, apply this default process to the
surface's `Category Scope`.
For each category, ask:
- Can the intended user or operator complete the category workflow end to end?
- Are the taxonomy features present as supported capabilities rather than
isolated implementation fragments?
- Are the important lifecycle stages represented: setup, normal operation,
status/inspection, recovery, and upgrade or removal where relevant?
- Are the important environment, provider, platform, channel, or security
branches present for this surface?
- Do the known gaps leave major user-visible capability branches missing?
Default guidance:
- Favor higher Completeness when the category supports the full
operator-visible workflow described by taxonomy and category evidence.
- Lower Completeness when only the happy path exists, when important variants
are undocumented or unimplemented, or when recovery/status paths are missing.
- Do not lower Completeness because tests are thin; that is Coverage.
- Do not lower Completeness because implementation quality is fragile; that is
Quality.
Default Completeness bands:
- `Lovable` (95-100): complete across expected workflows, variants, and
recovery branches, with only minor polish gaps.
- `Stable` (80-95): the expected workflow set is broadly present, with only
bounded missing branches.
- `Beta` (70-80): the main workflow exists, but meaningful branches or recovery
paths are still absent.
- `Alpha` (50-70): only a partial capability set is present; users can complete
some core tasks but not the full expected workflow.
- `Experimental` (0-50): the category exposes only fragments of the intended
capability.
## Score Semantics
- Coverage: public or redacted proof that the feature is exercised by docs,
tests, QA scenarios, live lanes, or release evidence.
- Quality: reliability, maintainability, operator safety, and regression
confidence for the category.
- Completeness: how much of the intended operator-visible workflow exists for
the category. Use the default completeness process plus any surface-specific
variation before changing this score.
- LTS: derived from score thresholds and `human_lts_override`; do not hand-edit
generated Markdown to change LTS status.
Bands:
- `Lovable`: 95-100
- `Stable`: 80-95
- `Beta`: 70-80
- `Alpha`: 50-70
- `Experimental`: 0-50
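The band ranges above share their boundary values (for example, 80 appears in both `Stable` and `Beta`), so any lookup has to pick a tie-break. The sketch below is one possible reading, not the renderer's logic: `band_for` and `BAND_FLOORS` are hypothetical names, and it assumes each lower bound is inclusive, so a boundary score lands in the higher band.

```python
# Assumption: lower bounds are inclusive, so a score of 80 is Stable, not Beta.
BAND_FLOORS = (
    ("Lovable", 95),
    ("Stable", 80),
    ("Beta", 70),
    ("Alpha", 50),
    ("Experimental", 0),
)


def band_for(score: float) -> str:
    """Map a 0-100 maturity score to its band name."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for name, floor in BAND_FLOORS:
        if score >= floor:
            return name
    raise AssertionError("unreachable: Experimental has floor 0")


if __name__ == "__main__":
    for value in (100, 95, 80, 79.5, 50, 12):
        print(value, band_for(value))
```

If the renderer treats upper bounds as inclusive instead, only the comparison needs to change; the band table itself stays the same.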
## GitHub Action
The `Maturity scorecard` workflow verifies committed generated docs on PRs and
pushes. Manual dispatch can also download QA artifacts from another workflow run
with `source_run_id` and `artifact_pattern`, render evidence-enriched docs into
`.artifacts/maturity-docs`, and upload them as a GitHub artifact.
Do not add the maintainer repo's `docs/kevinslin/maturity-scorecard/inventory/`
tree to openclaw. Those generated reports are intentionally replaced here by
short-lived artifact docs and the committed aggregate scorecard pages.


@@ -1,16 +0,0 @@
# Agent Runtime Completeness
Use this rubric when assigning category Completeness scores for the
`agent-runtime-and-provider-execution` surface.
## Category Scope
- Agent Turn Execution: Turn startup and runtime choice, Session and run coordination, Abort and terminal outcomes
- External Runtimes and Subagents: External harness selection, CLI runtime aliases, Subagent turns, Runtime recovery
- Hosted Provider Execution: Hosted provider turns, Provider-specific model options, Hosted tool use, Reasoning and cache controls, Hosted streaming and replies
- Local and Self-hosted Providers: Local provider profiles, Tool-capability flags, Timeouts and context windows, Local smoke checks, Local failure handling
- Model and Runtime Selection: Model reference selection, Provider and runtime overrides, Thinking and context settings, Invalid route recovery
- Provider Auth: Login and API-key setup, Auth profile selection, Credential health checks, Auth failover, Provider fallback recovery, Rate-limit and capacity recovery, Missing-key and OAuth guidance, Restart and stale-route recovery, Structured provider diagnostics, Subagent credential propagation
- Streaming and Progress: Streaming replies, Progress visibility
- Tool Calls and Response Handling: Tool-call handling, Usage and response reporting, Failure recovery
- Tool Execution Controls: Tool availability rules, Sandboxed exec behavior, Approval flow, Elevated execution, Tool safety controls, Delegated tool access


@@ -1,14 +0,0 @@
# Android app Completeness
Use this rubric when assigning category Completeness scores for the
`android-app` surface.
## Category Scope
- Media Capture: Camera and media capture
- Mobile Chat: Chat tab
- Connection Setup: Gateway discovery
- Distribution: Public Google Play install path, Manual install path, Release smoke and startup performance
- Settings: Settings sheet
- Voice: Voice tab
- Device Runtime: Background reconnect and presence, Device command availability


@@ -1,12 +0,0 @@
# Anthropic provider path Completeness
Use this rubric when assigning category Completeness scores for the
`anthropic-provider-path` surface.
## Category Scope
- Provider Auth and Recovery: API-key onboarding, Claude CLI credential reuse, Setup-token auth, Auth profile health, Model status, Usage windows, Cooldown/profile reporting, Long-context recovery, Fallback guidance
- Model and Runtime Selection: Bundled Claude catalog, Canonical anthropic refs, Claude CLI compatibility, Model picker availability, Capability metadata, Runtime selection, Session continuity, MCP/tool bridge, Permission-mode mapping, Fallback prelude
- Request Transport and Turn Semantics: API-key/OAuth transport, Messages payloads, Streaming decode, Usage and stop reasons, Abort/error handling, Tool-use blocks, Tool-result replay, Partial JSON recovery, Native thinking, Signed/redacted thinking replay
- Prompt Cache and Context: Cache retention, System-prompt cache boundary, 1M context, Fast mode/service tier, Cache diagnostics
- Media Inputs: Image input, PDF document input, Media model fallback, Image tool results

@@ -1,13 +0,0 @@
# Automation: cron, hooks, tasks, polling Completeness
Use this rubric when assigning category Completeness scores for the
`automation-cron-hooks-tasks-polling` surface.
## Category Scope
- Cron Jobs: Create/edit/remove jobs, Schedule types, Timezone and stagger, Cron RPCs, Agent cron tool, Manual cron runs, Isolated cron execution, Model/provider preflight, Run history, Timeout and denial diagnostics, Chat announce delivery, Webhook delivery, Failure destinations, Skipped-run alerts, Delivery previews
- Event Ingress: Telegram long polling, Telegram webhook mode, Zalo polling/webhook mode, Polling stall diagnostics, iMessage watch fallback, Gmail setup wizard, Watcher start/serve, Tailscale/public routing, Push token validation, Gmail event routing, POST /hooks/wake, POST /hooks/agent, Mapped hooks, Hook auth policy, Async dispatch
- Automation Hooks: HOOK.md authoring, Hook discovery, Hook CLI management, Hook packs, Lifecycle event dispatch, api.on registration, Tool-call policy hooks, Message hooks, Session/lifecycle hooks, Plugin approval requests, cron_changed
- Background Tasks and Flows: Task list/show/cancel, Task notifications, Task audit and maintenance, Chat task board, Task pressure status, Managed flows, Mirrored flows, openclaw tasks flow, Flow audit and maintenance, Plugin managedFlows
- Heartbeat: Heartbeat scheduling, Active hours, Wake and cooldown handling, Due-only heartbeat tasks, Commitment check-ins
- Polling Controls: openclaw message poll, Telegram polls, Teams polls, Poll flags, Channel capability gates, process poll, process log, Background process status, No-progress loop detection, Process input controls

@@ -1,10 +0,0 @@
# Browser automation and exec/sandbox tools Completeness
Use this rubric when assigning category Completeness scores for the
`browser-automation-and-exec-sandbox-tools` surface.
## Category Scope
- Browser Automation: Browser Actions, Snapshots, Artifacts, Browser Plugin Service, Profiles, Browser Security, SSRF, Remote Control
- Tool Invocation and Execution: Exec Routing, Process Lifecycle, Direct Tool Invoke API, Node System.run, Host Exec Approvals, Elevated Mode
- Sandbox and Tool Policy: Sandbox Backends, Workspace Isolation, Sandboxed Browser, Codex Dynamic Tools, Tool Policy, Sandbox Tool Gates

@@ -1,14 +0,0 @@
# Gateway Web App Completeness
Use this rubric when assigning category Completeness scores for the
`browser-control-ui-and-webchat` surface.
## Category Scope
- Browser Realtime Talk: Browser Talk start/stop, Provider session selection, Gateway relay audio, Tool-call consults, Steer and cancel
- Browser Access and Trust: Device pairing, Token/password auth, Tailscale Serve auth, Trusted proxy auth, Allowed origins/gatewayUrl
- Configuration: Config snapshots, Schema form editing, Raw JSON editing, Base-hash guarded writes, Apply and restart
- Browser UI: Gateway-hosted UI, Dashboard open/auth bootstrap, Base-path routing, Static asset recovery, Dev gatewayUrl target, PWA install metadata, Service worker updates, VAPID keys, Subscribe/unsubscribe, Test notifications
- WebChat Conversations: Send and abort, Session and agent picker, Model/thinking controls, Attachments, Markdown/tool/media rendering, chat.history projection, chat.send lifecycle, Abort/partial retention, Injected assistant notes, Reconnect continuity, Hosted embeds, External embed gating, Assistant media tickets, Authenticated avatars, CSP image policy
- Remote WebChat: macOS WebChat transport, SSH tunnel data plane, Direct ws/wss remote mode, Session continuity, Remote troubleshooting
- Operator Console: Health/status/models, Live log tail, Update run/status, Activity summaries, RPC timing telemetry, Channels/login, Session manager and history, Cron, Skills/nodes, Exec approvals/agents

@@ -1,15 +0,0 @@
# Channel framework Completeness
Use this rubric when assigning category Completeness scores for the
`channel-framework` surface.
## Category Scope
- Channel Actions Commands and Approvals: Channel-native commands, Native command session target, Message actions, Message tool API discovery, Channel-native approval prompts
- Channel Setup: Supported channel catalog, Channel status taxonomy in channels list, Setup/onboarding flows, Install-on-demand, Setup wizard metadata
- Group Thread and Ambient Room Behavior: Group/channel session isolation, Mention-required, Native threads, Broadcast groups, Bot-loop protection
- Inbound Access and Identity Gates: DM pairing, Group/channel allowlists, Access group expansion, Mention gating, Sanitized inbound identity/route projections
- Media Attachments and Rich Channel Data: Inbound media normalization, Outbound direct text/media sends, Provider-specific channelData, Media roots
- Outbound Delivery and Reply Pipeline: Automatic final reply delivery, Durable outbound send orchestration, Reply pipeline transforms, Provider outbound adapter bridge
- Conversation Routing and Delivery: Inbound conversation routing, Session key construction, Agent binding precedence, Runtime conversation bindings, Thread/parent-child placement, Plugin registry resolution, Channel account startup, Whole-channel lifecycle controls, Config/secrets reload interactions, Auto-restart
- Status Health and Operator Controls: channels.status, Channel health policy, Operator CLI controls, Status read-model

@@ -1,12 +0,0 @@
# ClawHub Completeness
Use this rubric when assigning category Completeness scores for the
`clawhub-and-external-plugin-distribution` surface.
## Category Scope
- Publishing: ClawHub package publishing owner, OpenClaw-owned package release validation for ClawHub, Version bump gates, npm trusted publishing provenance, External code plugin package contract required, Skill package metadata, Skill publishing flow
- Catalog Discovery: openclaw plugins search as the ClawHub, Search result metadata, Distinction between plugin search, Catalog lookup failure, Skill catalog search
- Compatibility and Trust: openclaw.compat.pluginApi, ClawHub package compatibility validation, npm compatibility fallback to the newest, Official external plugin catalog behavior, Compatibility docs, Operator trust model for installing, ClawHub archive, npm integrity drift, Built-in dangerous-code scanner, ClawHub publishing review/hidden-release behavior as upstream, Skill archive safety, Skill audit signals
- Plugin Lifecycle: Source prefixes, Bare package behavior during the launch, Explicit pinned versions, Managed install records that preserve source, Codex, Local, Marketplace list, Supported mapped features, Remote marketplace path safety, Update by plugin id, Reinstall vs update semantics, Downgrade, Uninstall config/index/policy/file cleanup, Gateway restart/reload requirements after, ClawHub skill installs, Skill upload install path, Skill dependency installers
- Plugin Health: Per-plugin managed npm project, npm-pack local release-candidate installs, Dependency ownership between plugin packages, Peer dependency relinking, Legacy dependency root cleanup, plugins list, Local plugin index, Troubleshooting stale config, Runtime verification after Gateway

@@ -1,37 +0,0 @@
# CLI Surface Completeness
Use this rubric when assigning category Completeness scores for the
`cli-install-update-onboard-doctor` surface.
## Surface-Specific Scoring Questions
For each category, ask:
- Can a normal operator complete the job end to end from the CLI?
- Are the expected environments represented where they matter for the category,
such as local installs, remote gateway use, supervised services, or
Windows/WSL2?
- Are the main lifecycle stages present where relevant: setup, inspection,
change, repair, and upgrade?
- Are common recovery and troubleshooting branches present, or does the
workflow dead-end after the happy path?
- Are major documented operator expectations still unimplemented?
## Surface-Specific Guidance
Variation from the default completeness process:
- Completeness is the CLI operator journey for installation, onboarding, configuration, repair, and upgrade across expected environments and recovery branches.
- Score the CLI against the full operator journey, not only installation or the happy path.
- Repair, migration, remote, and platform-specific branches are expected where a category exposes them.
- For Windows and WSL2, score against the intended supported experience rather than parity with macOS/Linux internals.
## Category Scope
- CLI Setup: Installer scripts, Local prefix install, Package-manager installs, Supported Node runtime, Source checkout install, CLI entrypoint
- Onboarding and Auth Setup: Guided onboarding, Targeted reconfiguration, Auth choices, Gateway auth storage, Remote onboarding
- Plugin and Channel Setup: Channel picker, Plugin install sources, Channel account setup, Post-setup probes, Remote gateway caveat
- Gateway Service Management: Foreground gateway runs, Service install and control, Service auth wiring, Drift and reinstall recovery, Service health checks
- CLI Observability: Status snapshots, Health snapshots, Remote log tailing, Diagnostics export, Support-safe redaction
- Doctor: Interactive repair, Config migration, Auth and SecretRef checks, Plugin validation and repair, Lint and JSON findings, Extra gateway discovery, Supervisor drift repair, Port and startup diagnosis, Runtime path checks, Restart guidance
- Updates and Upgrades: Update channels, Install-kind switching, Managed gateway restart, Update status and RPC, Plugin convergence

@@ -1,13 +0,0 @@
# Discord Completeness
Use this rubric when assigning category Completeness scores for the
`discord` surface.
## Category Scope
- Channel Setup and Operations: Application and bot setup, Token and application ID configuration, Setup wizard and account inspection, Status, doctor, and intent checks, Multi-account bot configuration, Account monitor startup, Gateway WebSocket lifecycle, Reconnect and heartbeat handling, Rate limits and gateway metadata, Status, probe, and health-monitor recovery
- Access and Identity: DM policy modes, Allowlist inheritance, Pairing-code approval, Sender authorization, Access-group authorization, Group DM authorization
- Conversation Routing and Delivery: Guild and channel admission, Mention gating, Session key isolation, Configured and runtime routing, Inbound context visibility, Forum and media-channel thread posts, Thread actions, Target parsing, Thread context resolution, Thread-bound session routing, ACP agent routing, Routing lifecycle, Discord forum/media channel posts created as, CLI and message-tool thread actions, Discord target parsing for `channel:<id>`, Thread context resolution, Thread-bound session routing for `/focus`, `/unfocus`, `/agents`, `/session idle`, `/session max-age`, `sessions_spawn({ thread, ACP current-conversation bindings and ACP thread, Binding lifecycle behavior, Direct and thread sends, Text chunking and reply mode, Draft and progress edits, Mention and embed rendering, REST retry and final delivery, File uploads, Component file and media-gallery blocks, Video caption follow-up, Voice-message upload, Inbound attachment context
- Media and Rich Content: Direct and thread sends, Text chunking and reply mode, Draft and progress edits, Mention and embed rendering, REST retry and final delivery, File uploads, Component file and media-gallery blocks, Video caption follow-up, Voice-message upload, Inbound attachment context, Outbound file uploads from URLs and, Component v2 file and media-gallery blocks, Video caption handling and follow-up media-only delivery, Discord voice-message sends with OGG/Opus conversion, Inbound media/attachment-aware debounce behavior, Realtime voice-channel conversations, General text-only delivery
- Native Controls and Approvals: Native slash command registration, Native slash command execution, Model Picker Commands, Components v2 messages, Callback TTL, Native Discord exec/plugin approvals, Sensitive owner-only command routing for prompts, Discord message actions, Action gates under channels.discord.actions.\*
- Realtime Voice and Calls: Voice Channel Lifecycle, Auto-join and follow-users, Realtime voice modes, Wake, barge-in, and echo handling, Voice codec and DAVE recovery

@@ -1,11 +0,0 @@
# Docker / Podman hosting Completeness
Use this rubric when assigning category Completeness scores for the
`docker-podman-hosting` surface.
## Category Scope
- Container Setup: Local Image Setup Script, Docker Compose gateway, First-run onboarding, Docker-only first-run notes, Podman setup scripts and Quadlet template, Rootless Podman image setup
- Container Operations: Host CLI routing into running Docker/Podman, Container Targeting, Container update/rebuild/restart guidance for Docker, Docker Compose, Gateway token generation, Ownership, Docker Compose, Container health endpoints, Provider/VPS Docker hosting docs, Docker VM persistence/update guidance, Operator-facing update
- Image Release and Validation: Root Dockerfile build stages, Docker release workflow, Docker E2E package artifact generation, Docker E2E plan/scheduler scripts, Release-path install
- Agent Sandbox and Tooling: Docker gateway setup, Docker-backed agent sandbox support, Container image dependency baking

@@ -1,11 +0,0 @@
# Feishu, QQ Bot, WeChat, Yuanbao, Zalo, Zalo Personal, regional channels Completeness
Use this rubric when assigning category Completeness scores for the
`feishu-qq-bot-wechat-yuanbao-zalo-zalo-personal-regional-channels` surface.
## Category Scope
- Channel Setup and Operations: Docs channel index, Official external channel catalog entries, Core channel-plugin catalog, Channel setup wizard, Missing-plugin, Cross-channel ingress/access/refactor concerns, Feishu/Lark bot channel setup, WebSocket default mode, DM pairing, Message delivery, Feishu document, Multi-account credential handling, QQ Open Platform AppID/AppSecret setup, C2C private chat, Group activation, Rich media messages, Slash commands, Multi-account gateway connections, Tencent Yuanbao external channel, AppKey/AppSecret setup, DMs, Outbound queue strategy, Core-side official external catalog, Zalo Bot Creator / Marketplace bot, Long-polling default mode, Bot token, Group policy schema, Text, Status probes, WeChat/Weixin personal messaging, Plugin install, Direct-message pairing, Core-side catalog metadata, External sidecar/helper process behavior, zalouser channel plugin, QR login, DM pairing, Message send, Doctor/status checks for runtime availability, Explicit unofficial-account risk, QQ Open Platform AppID/AppSecret setup and, C2C private chat, Group activation, Inbound and outbound rich media including, Slash commands, Multi-account gateway connections, Tencent Yuanbao external channel `openclaw-plugin-yuanbao, AppKey/AppSecret setup, DMs, Outbound queue strategy, Core-side official external catalog, Zalo Bot Creator / Marketplace bot, Long-polling default mode and optional HTTPS, Bot token, Group policy schema and fail-closed group, Text, Status probes and troubleshooting for token/config/webhook problems, zalouser` channel plugin for Zalo Personal, QR login, DM pairing, Message send, Doctor/status checks for runtime availability and, Explicit unofficial-account risk and operator safeguards
- Access and Identity: Feishu/Lark bot channel setup, WebSocket default mode, DM pairing, Message delivery, Feishu document, Multi-account credential handling, QQ Open Platform AppID/AppSecret setup, C2C private chat, Group activation, Rich media messages, Slash commands, Multi-account gateway connections, Tencent Yuanbao external channel, AppKey/AppSecret setup, DMs, Outbound queue strategy, Core-side official external catalog, Zalo Bot Creator / Marketplace bot, Long-polling default mode, Bot token, Group policy schema, Text, Status probes, WeChat/Weixin personal messaging, Plugin install, Direct-message pairing, Core-side catalog metadata, External sidecar/helper process behavior, zalouser channel plugin, QR login, DM pairing, Message send, Doctor/status checks for runtime availability, Explicit unofficial-account risk, QQ Open Platform AppID/AppSecret setup and, C2C private chat, Group activation, Inbound and outbound rich media including, Slash commands, Multi-account gateway connections, Tencent Yuanbao external channel `openclaw-plugin-yuanbao, AppKey/AppSecret setup, DMs, Outbound queue strategy, Core-side official external catalog, zalouser` channel plugin for Zalo Personal, QR login, DM pairing, Message send, Doctor/status checks for runtime availability and, Explicit unofficial-account risk and operator safeguards
- Conversation Routing and Delivery: Feishu/Lark bot channel setup, WebSocket default mode, DM pairing, Message delivery, Feishu document, Multi-account credential handling, QQ Open Platform AppID/AppSecret setup, C2C private chat, Group activation, Rich media messages, Slash commands, Multi-account gateway connections, Tencent Yuanbao external channel, AppKey/AppSecret setup, DMs, Outbound queue strategy, Core-side official external catalog, Zalo Bot Creator / Marketplace bot, Long-polling default mode, Bot token, Group policy schema, Text, Status probes, WeChat/Weixin personal messaging, Plugin install, Direct-message pairing, Core-side catalog metadata, External sidecar/helper process behavior, zalouser channel plugin, QR login, DM pairing, Message send, Doctor/status checks for runtime availability, Explicit unofficial-account risk, QQ Open Platform AppID/AppSecret setup and, C2C private chat, Group activation, Inbound and outbound rich media including, Slash commands, Multi-account gateway connections, Tencent Yuanbao external channel `openclaw-plugin-yuanbao, AppKey/AppSecret setup, DMs, Outbound queue strategy, Core-side official external catalog, Zalo Bot Creator / Marketplace bot, Long-polling default mode and optional HTTPS, Bot token, Group policy schema and fail-closed group, Text, Status probes and troubleshooting for token/config/webhook problems, zalouser` channel plugin for Zalo Personal, QR login, DM pairing, Message send, Doctor/status checks for runtime availability and, Explicit unofficial-account risk and operator safeguards
- Media and Rich Content: Feishu/Lark bot channel setup, WebSocket default mode, DM pairing, Message delivery, Feishu document, Multi-account credential handling, QQ Open Platform AppID/AppSecret setup, C2C private chat, Group activation, Rich media messages, Slash commands, Multi-account gateway connections, Tencent Yuanbao external channel, AppKey/AppSecret setup, DMs, Outbound queue strategy, Core-side official external catalog, Zalo Bot Creator / Marketplace bot, Long-polling default mode, Bot token, Group policy schema, Text, Status probes, QQ Open Platform AppID/AppSecret setup and, C2C private chat, Group activation, Inbound and outbound rich media including, Slash commands, Multi-account gateway connections, Zalo Bot Creator / Marketplace bot, Long-polling default mode and optional HTTPS, Bot token, Group policy schema and fail-closed group, Text, Status probes and troubleshooting for token/config/webhook problems

@@ -1,43 +0,0 @@
# Gateway Runtime Completeness
Use this rubric when assigning category Completeness scores for the
`gateway-runtime` surface.
## Surface-Specific Scoring Questions
For each category, ask:
- Does the category cover the main happy path an operator or client needs?
- Are the major deployment modes present where they matter for this category:
local, remote, node-mediated, supervised, or browser-facing?
- Are the main lifecycle stages present where relevant: setup, normal use,
status/inspection, and recovery?
- Are important security or policy branches present where the category implies
them?
- Are obvious operator-visible holes or "not yet supported" branches still
missing?
## Surface-Specific Guidance
Variation from the default completeness process:
- Completeness includes operator and connected-client workflows, major deployment modes, and recovery paths, not just gateway protocol capability.
- Score the Gateway against the full operator and client journey, not just protocol primitives or one transport path.
- Local, remote, node-mediated, supervised, and browser-facing modes matter when the category implies them.
- Approval/policy variants and recovery or diagnostic paths count as completeness branches, not polish.
## Category Scope
- Approvals and Remote Execution: Exec approvals, Plugin approvals, Node exec approvals, Approved node execution, Approval mutation safety, Delivery fallback behavior
- HTTP APIs: OpenAI-compatible APIs, Tool invocation API, Admin API access, Hook ingress
- Hosted Web Surface: Control UI, WebChat hosting, Plugin web routes, Canvas and A2UI routes
- Gateway RPC APIs and Events: Health APIs, Identity and presence APIs, Model APIs, Usage and memory APIs, Session APIs, Chat APIs, Channel APIs, Web login and wake APIs, Config and secrets APIs, Update and setup APIs, Agent and artifact APIs, Task and automation APIs, Tool and skill APIs, Request and event envelopes, Idempotent side effects, Method discovery, Event discovery, Accepted-then-final results, Event ordering, State refresh after gaps
- Device Auth and Pairing: Shared-secret login, Trusted proxy auth, Private ingress mode, Device challenge signing, Device tokens, Setup-code bootstrap, Auth mismatch recovery, Device auth migration, Client pairing, Node pairing
- Network Access and Discovery: Loopback and LAN access, Tailnet access, SSH tunnels, Endpoint discovery, Saved endpoints, TLS pinning
- Nodes and Remote Capabilities: Node presence, Node capabilities, Node inventory, Node actions, Node events, Pending work delivery, Remote device capabilities, Remote host commands
- Health, Diagnostics, and Repair: Health snapshots, Channel readiness, Stability diagnostics, Payload diagnostics, Diagnostics exports, Doctor checks, Log tailing
- Protocol Compatibility: Published protocol schema, Runtime request validation, JSON Schema export, Swift client models, Version negotiation, Client transport defaults, Backward-compatible evolution
- Roles and Permissions: Role negotiation, Operator permissions, Approval-gated actions, Untrusted node declarations, Event scoping
- Gateway Lifecycle: Foreground startup, Service installation, Restart and stop, Service status, Bind and port settings, Config reload, Multi-gateway isolation
- Security Controls: Non-loopback auth, Trusted proxy exceptions, Gateway and node trust boundaries, Trusted CIDR auto-approval, Fail-closed protocol handling, Remote execution safeguards
- WebSocket Connection: WebSocket transport, Connect challenge, Connect request, Protocol version negotiation, hello-ok snapshot, Startup retry, Session limits, Plugin surface URLs

@@ -1,12 +0,0 @@
# Google Chat Completeness
Use this rubric when assigning category Completeness scores for the
`google-chat` surface.
## Category Scope
- Channel Setup and Operations: Google Cloud project setup, Chat app configuration, Service account setup, Webhook audience and path, Workspace visibility and app status, Guided channel setup, Account resolution, Service account SecretRefs, Env file and inline credentials, Channel status and probes, Directory and mutable-id diagnostics, NPM and ClawHub install, Plugin docs and catalog routing, Channel aliases and labels, Operator status UI, Install/update metadata, Webhook path handling, Standard Chat token verification, Workspace add-on token verification, Audience and appPrincipal validation, Audience and appPrincipal binding, Shared-path target selection, Auth rejection diagnostics
- Access and Identity: DM pairing approval, Sender allowlists, Google Chat identity matching, Direct session routing, Pairing diagnostics, Space allowlists, Mention gating, Sender access groups, Group session isolation, Bot-loop protection, Space diagnostics
- Conversation Routing and Delivery: DM pairing approval, Sender allowlists, Google Chat identity matching, Direct session routing, Pairing diagnostics, Space allowlists, Mention gating, Sender access groups, Group session isolation, Bot-loop protection, Space diagnostics, Inbound attachments, Outbound media replies, Message upload action, Media source and size controls, Media receipts and thread placement, Text send action, Upload-file action, Reaction actions, Action capability gates, Approval sender matching, Thread-aware replies, Streaming and chunked replies, Typing placeholder lifecycle, Message-tool current-source replies, NO_REPLY cleanup, Markdown/text rendering
- Media and Rich Content: Inbound attachments, Outbound media replies, Message upload action, Media source and size controls, Media receipts and thread placement, Text send action, Upload-file action, Reaction actions, Action capability gates, Approval sender matching, Thread-aware replies, Streaming and chunked replies, Typing placeholder lifecycle, Message-tool current-source replies, NO_REPLY cleanup, Markdown/text rendering
- Native Controls and Approvals: Inbound attachments, Outbound media replies, Message upload action, Media source and size controls, Media receipts and thread placement, Text send action, Upload-file action, Reaction actions, Action capability gates, Approval sender matching, Thread-aware replies, Streaming and chunked replies, Typing placeholder lifecycle, Message-tool current-source replies, NO_REPLY cleanup, Markdown/text rendering

@@ -1,12 +0,0 @@
# Google provider path Completeness
Use this rubric when assigning category Completeness scores for the
`google-provider-path` surface.
## Category Scope
- Provider Setup and Credentials: API key onboarding, Auth choice metadata, Gemini CLI OAuth setup, Vertex ADC setup, Daemon and fallback credentials, CLI runtime selection, OAuth login and refresh, Canonical Google model refs, CLI usage normalization, OAuth diagnostics
- Model Routing and Endpoints: Catalog rows and aliases, Dynamic model resolution, Provider routing, Google-native config normalization, Model picker availability, Vertex provider selection, ADC/service-account auth, Project/location endpoints, Custom base URL policy, Compatibility boundaries
- Direct Gemini Runtime: Direct Gemini chat, Multimodal inputs, Tool-call streaming, Usage and stop reasons, Thought-signature replay, Thinking-level mapping, Tool turn ordering, Incomplete-turn recovery, Planning-only turn recovery
- Media, Search, and Realtime: Bundled plugin distribution, Provider auto-enable metadata, Image and media adapters, Speech and realtime adapters, Search and generation tools, Realtime voice sessions, Constrained browser tokens, Audio and transcript events, Live tool calls, Session reconnects
- Prompt Caching: Cache retention config, Managed cachedContents, Manual cachedContent handles, Cache usage accounting, Cache diagnostics and live proof

@@ -1,12 +0,0 @@
# Image/video/music generation tools Completeness
Use this rubric when assigning category Completeness scores for the
`image-video-music-generation-tools` surface.
## Category Scope
- Media Routing and Discovery: default media model config, per-call model refs and fallbacks, auth-backed tool discovery, action=list provider inspection
- Task Lifecycle and Delivery: background task creation, task status/list/show/cancel, duplicate guards, progress keepalive, completion/failure wake, no-session inline fallback, local media persistence, MIME/filename inference, Hosted URL fallback, message-tool handoff, idempotent missing-media fallback, channel attachment proof
- Image Generation: text-to-image, reference-image editing, output hints, action=status, provider attempt metadata, OpenAI/Codex OAuth, API-key OpenAI, OpenRouter/xAI/fal/LiteLLM/DeepInfra/Google/MiniMax/ComfyUI auth, provider error diagnostics
- Video Generation: text-to-video, image-to-video, video-to-video, reference role validation, audio refs, typed providerOptions, queue-backed jobs, polling/timeout handling, Hosted URL download, provider skip explanations, returned asset metadata
- Music Generation: prompt and lyrics input, instrumental mode, duration/format controls, image-reference edit lanes, generated audio outputs, provider fallback

@@ -1,12 +0,0 @@
# iMessage / BlueBubbles Completeness
Use this rubric when assigning category Completeness scores for the
`imessage-bluebubbles` surface.
## Category Scope
- Channel Setup and Operations: Translate legacy config, Cut over safely, Handle migration caveats, Run local imsg, Run through SSH wrapper, Grant macOS permissions, Probe runtime health, Account setup prompts, Account status checks, Doctor repair checks, Account Config
- Access and Identity: Authorize direct senders, Route direct conversations, Bind ACP sessions, Group Policy, Mentions, System Prompts
- Conversation Routing and Delivery: Watch live messages, Coalesce split-send DMs, Replay missed messages, Seed conversation history, Authorize direct senders, Route direct conversations, Bind ACP sessions, Group Policy, Mentions, System Prompts
- Media and Rich Content: Media, Attachments, Remote Fetch, Chunking, Native Actions, Private API, Message Tool
- Native Controls and Approvals: Native Approvals, Reactions, Operator Control, Media, Attachments, Remote Fetch, Chunking, Native Actions, Private API, Message Tool

@@ -1,15 +0,0 @@
# iOS app Completeness
Use this rubric when assigning category Completeness scores for the
`ios-app` surface.
## Category Scope
- Media and Sharing: Camera list/snap/clip
- Canvas and Screen: Canvas present/hide/navigate/eval/snapshot
- Chat and Sessions: Chat sessions and operator controls
- Gateway Setup and Diagnostics: Bonjour/local, Manual host/port, Gateway connect configuration persistence, TLS fingerprint trust prompt, Pairing approval, Pairing/auth diagnostics for users, Settings tab
- Distribution: Internal preview status
- Device Commands: Location modes, Device command handling
- Notifications and Background: APNs registration and relay delivery
- Voice: Voice wake

@@ -1,29 +0,0 @@
# Kubernetes Hosting Completeness
Use this rubric when assigning category Completeness scores for the
`kubernetes-hosting` surface.
## Surface-Specific Scoring Questions
For each category, ask:
- Can an operator deploy and manage OpenClaw on Kubernetes end to end?
- Are the taxonomy features present as supported manifests, commands, and docs rather than examples only?
- Are setup, normal operation, status or inspection, redeploy, teardown, and secret rotation represented where relevant?
- Are local Kind validation, namespace/image customization, provider secrets, and secure exposure branches covered?
- Do known gaps leave major cluster-hosting capability branches missing?
## Surface-Specific Guidance
Variation from the default completeness process:
- Completeness is the Kubernetes operator workflow for deployment, configuration, secrets, access, exposure, lifecycle, security posture, status, and recovery.
- A complete Kubernetes category lets an operator deploy, expose, secure, update, troubleshoot, and remove the Gateway without relying on Docker-only assumptions.
- Port-forwarding that covers only the happy path, missing secret/config rotation, or an omitted security posture for exposed services are material completeness gaps.
## Category Scope
- Deployment Setup: Kustomize packaging, cluster prerequisites, quick deploy, manifest apply, and Kind validation.
- Configuration and Secrets: agent instructions, Gateway config, provider secrets, secret rotation, and image/namespace customization.
- Access and Exposure: port-forward access, service endpoint, ingress exposure, auth/TLS, and localhost posture.
- Cluster Lifecycle: resource layout, state persistence, redeploy, teardown, and security context.

@@ -1,12 +0,0 @@
# Linux companion app Completeness
Use this rubric when assigning category Completeness scores for the
`linux-companion-app` surface.
## Category Scope
- App Distribution: Native app package, Distro package targets, Official release metadata
- Gateway Connectivity: Local Gateway attach and status, Gateway pairing and auth, Remote mode, Local and remote resource boundaries
- Chat and Sessions: Native Linux chat window, Transcript, Gateway chat transport
- Desktop Capabilities: Linux desktop permissions, Secret storage, Sandbox/package posture, Linux native node identity, Host command execution, Desktop tools, Linux native Talk, Microphone capture, Native media permissions
- Status and Diagnostics: Native Linux app readiness, Gateway health/status display, Log/transcript opening, Doctor/repair affordances, Linux tray/status item, Runtime status row, Desktop-environment integration

@@ -1,12 +0,0 @@
# Linux Gateway host Completeness
Use this rubric when assigning category Completeness scores for the
`linux-gateway-host` surface.
## Category Scope
- Host Setup and Updates: Linux CLI install, Node runtime prerequisites, Package-manager policy, Update path
- Gateway Runtime and Service Control: Foreground Gateway Runtime, Process Control, Systemd User Service Lifecycle setup, Systemd User Service Lifecycle operation, Systemd User Service Lifecycle status, Systemd User Service Lifecycle recovery
- Remote Access and Security: Remote Network Exposure, TLS, Tailscale, Gateway exposure safeguards, Gateway authentication modes, Secret Handling
- Diagnostics and Repair: Gateway diagnostic reports, Gateway log tailing, Doctor checks, Operator repair guidance
- Deployment Targets: VPS, Container, Cloud Deployment Guidance

@@ -1,12 +0,0 @@
# Local model providers: Ollama, vLLM, SGLang, LM Studio Completeness
Use this rubric when assigning category Completeness scores for the
`local-model-providers-ollama-vllm-sglang-lm-studio` surface.
## Category Scope
- Provider Setup, Lifecycle, and Diagnostics: Provider Selection, Onboarding, localService configuration, Process startup and readiness, Request leases and idle shutdown, Health checks and restart, Provider recipes, Local provider status, Backend reachability probes, Model availability errors, Memory readiness diagnostics, Provider troubleshooting docs
- Native Provider Plugins: Ollama setup and model pulling, Model discovery, Streaming and vision, Ollama embeddings, Web-search support, LM Studio setup, Model discovery and auth, Model preload and JIT loading, Streaming compatibility, LM Studio embeddings
- OpenAI-Compatible Runtime Compatibility: Bundled provider setup, Model Discovery Endpoint, Non-interactive configuration, vLLM thinking controls, OpenAI-compatible chat and tool semantics, SGLang compatibility guidance, Request Stream Compatibility, Tool Calling
- Local Memory and Embeddings: Embedding provider selection, Memory search readiness, memoryFlush model override, Fallback lexical search, Provider mismatch guidance
- Network Safety and Prompt Controls: Safety Network, Prompt Pressure Controls

@@ -1,10 +0,0 @@
# Long-tail hosted providers Completeness
Use this rubric when assigning category Completeness scores for the
`long-tail-hosted-providers` surface.
## Category Scope
- Hosted LLM Providers: Bedrock setup, Gateway/proxy routing, Copilot/OpenCode hosted access, Proxy capability diagnostics, Hosted text completion, Tool-call and streaming compatibility, Model catalog resolution, Provider-specific request shaping, Regional provider setup, Region and plan routing, Regional live smoke, Account prerequisite diagnostics
- Hosted Media Providers: Image generation providers, Video generation providers, Music generation providers, Media mode coverage, Text-to-speech providers, Speech-to-text providers, Realtime transcription providers, Audio format diagnostics
- Provider Operations: Provider directory, Provider install catalog, Model catalog metadata, Catalog parity checks, Provider setup descriptors, Auth profiles and aliases, Credential health probes, Key rotation and recovery, Direct provider smoke, Gateway live smoke, Models status probes, Fallback trace and repair

@@ -1,14 +0,0 @@
# macOS companion app Completeness
Use this rubric when assigning category Completeness scores for the
`macos-companion-app` surface.
## Category Scope
- Canvas: Canvas panel open/hide/navigate/eval/snapshot, Local custom URL scheme, A2UI host auto-navigation, Canvas enable/disable setting
- Local Setup: Local mode Gateway attach/start/stop, LaunchAgent install/update/restart/uninstall, Existing-listener detection, Native first-run onboarding flow, CLI discovery, Local workspace selection, Onboarding WebChat session separation
- Status and Settings: Menu-bar status, Activity state ingestion, Settings navigation, Health polling, Channels settings
- Native Capabilities: Mac node session connection, system.run, Exec approval policy, Permission requests, TCC persistence
- Remote Connections: Remote connection mode selection, SSH tunnel, Gateway discovery
- Voice and Talk: Voice Wake runtime, Push-to-talk, Talk provider playback plan
- WebChat: Native SwiftUI WebChat window, Gateway chat transport, Local and remote data-plane reuse

@@ -1,14 +0,0 @@
# macOS Gateway host Completeness
Use this rubric when assigning category Completeness scores for the
`macos-gateway-host` surface.
## Category Scope
- CLI Setup: Hosted installer, Node 24 recommendation, App-triggered CLI install, Shell PATH and version-manager drift
- Local Gateway Integration: App local/remote connection mode, App-managed Gateway LaunchAgent install/restart/uninstall, CLI install detection, Attach-to-existing local Gateway compatibility, Gateway endpoint, gateway.mode=local configuration, Loopback bind, Local app endpoint resolution, Bonjour discovery
- Remote Gateway Mode: macOS app "Remote over SSH", SSH tunnel setup, Tailscale MagicDNS, Remote endpoint token/password/TLS fingerprint, Local node host startup
- Gateway Service Lifecycle: Per-user Gateway LaunchAgent install, launchctl bootstrap, LaunchAgent labels, Gateway token/env handling, App-managed LaunchAgent handoff, openclaw update package/git handoff, Managed service refresh, Stale updater launchd job detection, openclaw uninstall, Stranded service recovery
- Diagnostics and Observability: LaunchAgent log paths, openclaw gateway status --deep, Gateway silently stops responding, Stale updater jobs
- Permissions and Native Capabilities: macOS TCC permission prompts/status, Native node capability exposure, system.run policy, Permission-driven support
- Profiles and Isolation: Profile-specific LaunchAgent labels, Profile-specific state/config/workspace roots, Derived ports, Rescue bot setup, Extra Gateway process detection

@@ -1,13 +0,0 @@
# Matrix Completeness
Use this rubric when assigning category Completeness scores for the
`matrix` surface.
## Category Scope
- Channel Setup and Operations: Matrix plugin identity, Setup wizard, Account discovery, Matrix doctor warnings, Matrix probe/status, Shared Matrix client resolution, Monitor startup, Startup maintenance
- Access and Identity: DM policy, Direct-room classification, Inbound route selection across sender-bound DMs, Mention gates, Matrix thread reply routing, Persisted Matrix thread routing managers, ACP/subagent spawn hooks
- Conversation Routing and Delivery: DM policy, Direct-room classification, Inbound route selection across sender-bound DMs, Mention gates, Matrix thread reply routing, Persisted Matrix thread routing managers, ACP/subagent spawn hooks, Channel action discovery, Message send/read/edit/delete, Profile media loading, Outbound Matrix text, Message presentation metadata, Inbound media failure handling
- Media and Rich Content: Channel action discovery, Message send/read/edit/delete, Profile media loading, Outbound Matrix text, Message presentation metadata, Inbound media failure handling
- Native Controls and Approvals: Channel action discovery, Message send/read/edit/delete, Profile media loading, Outbound Matrix text, Message presentation metadata, Inbound media failure handling, Matrix native exec, Origin target resolution from Matrix turn, Approver DM target resolution, Matrix approval metadata
- Encryption and Verification: Encryption setup, Encrypted media upload/download, Legacy state

@@ -1,11 +0,0 @@
# Mattermost, LINE, IRC, Nextcloud Talk, Nostr, Twitch, Tlon, Synology Chat Completeness
Use this rubric when assigning category Completeness scores for the
`mattermost-line-irc-nextcloud-talk-nostr-twitch-tlon-synology-chat` surface.
## Category Scope
- Channel Setup and Operations: Mattermost bot account setup, WebSocket inbound monitoring, Outbound delivery, LINE Messaging API webhook setup, Signed inbound webhook events, Rich LINE payloads, Nextcloud Talk bot installation, Webhook ingress, Outbound markdown/text, Synology Chat incoming/outgoing webhook setup, Webhook token verification, Outbound text, IRC server/nick/TLS/NickServ setup, Raw IRC receive/send, Probe/status, Twitch bot account setup, Twitch IRC monitor/client lifecycle, Message tool send action, Nostr key setup, NIP-04 encrypted DM receive/send, Profile import/publish, Tlon/Urbit ship URL/code setup, Urbit API auth/session, Rich text conversion, Outbound text and URL media delivery
- Access and Identity: Mattermost bot account setup, WebSocket inbound monitoring, Outbound delivery, LINE Messaging API webhook setup, Signed inbound webhook events, Rich LINE payloads, Nextcloud Talk bot installation, Webhook ingress, Outbound markdown/text, Synology Chat incoming/outgoing webhook setup, Webhook token verification, Outbound text, IRC server/nick/TLS/NickServ setup, Raw IRC receive/send, Probe/status, Twitch bot account setup, Twitch IRC monitor/client lifecycle, Message tool send action, Nostr key setup, NIP-04 encrypted DM receive/send, Profile import/publish, Tlon/Urbit ship URL/code setup, Urbit API auth/session, Rich text conversion, Outbound text and URL media delivery
- Conversation Routing and Delivery: Mattermost bot account setup, WebSocket inbound monitoring, Outbound delivery, LINE Messaging API webhook setup, Signed inbound webhook events, Rich LINE payloads, Nextcloud Talk bot installation, Webhook ingress, Outbound markdown/text, Synology Chat incoming/outgoing webhook setup, Webhook token verification, Outbound text, IRC server/nick/TLS/NickServ setup, Raw IRC receive/send, Probe/status, Twitch bot account setup, Twitch IRC monitor/client lifecycle, Message tool send action, Nostr key setup, NIP-04 encrypted DM receive/send, Profile import/publish, Tlon/Urbit ship URL/code setup, Urbit API auth/session, Rich text conversion, Outbound text and URL media delivery
- Media and Rich Content: LINE Messaging API webhook setup, Signed inbound webhook events, Rich LINE payloads, Nextcloud Talk bot installation, Webhook ingress, Outbound markdown/text, Synology Chat incoming/outgoing webhook setup, Webhook token verification, Outbound text, Nostr key setup, NIP-04 encrypted DM receive/send, Profile import/publish, Tlon/Urbit ship URL/code setup, Urbit API auth/session, Rich text conversion

@@ -1,13 +0,0 @@
# Media understanding and media generation Completeness
Use this rubric when assigning category Completeness scores for the
`media-understanding-and-media-generation` surface.
## Category Scope
- Media Intake and Access: Local and remote media references, MIME and type detection, Size caps and bounded reads, Safe remote fetch, Local root policy, Inbound media store, PDF/document extraction dispatch, QR and media helper classification
- Channel Media Handling: Inbound attachment staging, Sandbox media rewrites, Reply media templating, Message-tool attachment delivery, Duplicate delivery suppression
- Media Configuration: Media capability configuration
- Text-to-Speech Delivery: TTS, Outbound Voice Audio Delivery
- Media Understanding: Audio attachment selection, Batch STT provider and CLI fallback, Voice-note mention preflight, Transcript insertion and echo, Audio proxy and limit handling, Inbound image summarization, Active vision model bypass, Text-only model media offload, Vision provider fallback, Image and PDF input routing, Video Understanding, Direct Video Analysis
- Media Generation: Image generation tool invocation, Provider and model selection, Reference image editing, Generated image task lifecycle, Generated image persistence and delivery, Music generation tool invocation, Provider and model selection, Lyrics, instrumental, duration, and format controls, Reference inputs where supported, Music task lifecycle and duplicate status, Generated audio persistence and delivery, Video generation tool invocation, Mode and provider capability selection, Reference image, video, and audio inputs, Provider option validation, Video task lifecycle and status, Generated video persistence and delivery

@@ -1,12 +0,0 @@
# Microsoft Teams Completeness
Use this rubric when assigning category Completeness scores for the
`microsoft-teams` surface.
## Category Scope
- Channel Setup and Operations: Teams CLI app creation, Bot registration and manifest upload, Credential configuration, Teams app install verification, Setup status, Probe and scope reporting, Teams app doctor, Webhook and health diagnostics, Operator repair paths, Text formatting and chunking, Adaptive and presentation cards, Progress streaming, Delivery receipts and errors, Queued and proactive replies, Webhook Runtime, SDK Lifecycle, Proactive Cloud Boundary
- Access and Identity: DM pairing, Stable sender identity, Allowlists and access groups, Invoke and command authorization, Teams-originated config writes, Bot Framework SSO invokes, Delegated token storage, Graph directory lookup, Member profile lookup
- Conversation Routing and Delivery: Team and channel allowlists, Deterministic channel replies, Mention-gated group access, Session routing, Reply and thread context, Text formatting and chunking, Adaptive and presentation cards, Progress streaming, Delivery receipts and errors, Queued and proactive replies, Webhook Runtime, SDK Lifecycle, Proactive Cloud Boundary
- Media and Rich Content: Inbound attachments, Graph-hosted media, File consent, SharePoint and OneDrive sharing, Media fetch safety
- Native Controls and Approvals: Message action discovery, Polls and reactions, Read, edit, delete, and pin, Native approval cards, Feedback and group actions

@@ -1,31 +0,0 @@
# Multi-Agent Orchestration Completeness
Use this rubric when assigning category Completeness scores for the
`multi-agent-orchestration` surface.
## Surface-Specific Scoring Questions
For each category, ask:
- Can an operator configure and run the category workflow end to end?
- Are the taxonomy features present as supported user paths rather than partial config fragments?
- Are setup, normal operation, status or inspection, recovery, and removal paths represented where relevant?
- Are channel, account, workspace, auth, task, and delegate variants covered where the category expects them?
- Do known gaps leave major coordination or isolation branches missing?
## Surface-Specific Guidance
Variation from the default completeness process:
- Completeness is the operator-facing system for setup, isolation, conversation routing, account routing, specialist lanes, delegate identity, status, recovery, and safe defaults.
- A complete category lets multiple agents be created, isolated, routed, delegated, and inspected without implicit cross-agent leakage.
- Undocumented config, nondeterministic routing, or unclear ownership of state, credentials, and outbound delivery are material completeness gaps.
## Category Scope
- Agent Setup: add agents, agent list/delete, identity files, non-interactive setup, and single-agent default.
- Agent Isolation: workspace separation, state separation, auth separation, session separation, and tool profiles.
- Conversation Routing: agent selection, route precedence, default fallback, peer overrides, and cross-channel examples.
- Account Routing: multi-account setup, account selection, default accounts, account credentials, and delivery targets.
- Specialist Lanes: lane contracts, background handoff, concurrency controls, priority controls, and coordinator handoff.
- Delegate Identities: named delegates, authority model, delegate tiers, identity delegation, and organizational assistants.

@@ -1,11 +0,0 @@
# Native Windows CLI and Gateway Completeness
Use this rubric when assigning category Completeness scores for the
`native-windows-cli-and-gateway` surface.
## Category Scope
- Setup: PowerShell installer, Node and package-manager bootstrap, npm global install, Packaged CLI launcher, Windows command shims, openclaw onboard, Local Gateway config, Daemon install flags, Native-vs-WSL setup boundary
- Gateway Management: openclaw gateway, Foreground runtime health/readiness, Windows-specific restart/signal, Unmanaged foreground mode, openclaw gateway install, Gateway launcher files, Scheduled Task runtime status, Startup-folder fallback, openclaw status, Windows service inspection, Post-install diagnostics
- Networking: Native Windows host binding, netsh interface portproxy, Gateway status and probe output, Loopback, LAN, and WSL boundary
- Updates: openclaw update on native Windows package, Managed Gateway stop/restart, Detached update handoff, Windows package locks

@@ -1,12 +0,0 @@
# Native Windows companion app Completeness
Use this rubric when assigning category Completeness scores for the
`native-windows-companion-app` surface.
## Category Scope
- Installation and Updates: Official app download, MSI/MSIX/App Installer/winget-style packaging, Windows architecture handling for x64, App release channel
- Gateway Connection: App-managed local Gateway attach/start, Remote Gateway connection modes, Device/node pairing
- Chat Sessions: Native Windows chat window, Gateway chat transport
- Status and Repair: App health states, App-specific repair, Windows system tray app, Status indicators, App-specific notification permission
- Desktop Tools and Permissions: Windows node identity, Host command execution, Desktop command policy, App approval prompts, Screen and media capture, Canvas host behavior, Windows shell integrations, App secrets, Windows ACL, Command approval

View File

@@ -1,12 +0,0 @@
# Nix install path Completeness
Use this rubric when assigning category Completeness scores for the
`nix-install-path` surface.
## Category Scope
- Install Handoff: Nix install overview, nix-openclaw source-of-truth, Install discoverability, Verification handoff
- Plugin Lifecycle: Lifecycle command refusal, Declarative plugin selection, Nix-store plugin loading, Hardlink safety
- Activation and App UX: Environment activation, macOS defaults activation, Runtime Nix-mode detection, Stable Nix defaults, Managed-by-Nix banner, Read-only config controls, Onboarding skip
- Config and State: Immutable config guard, Config writer refusal, Agent-first Nix edits, Explicit config path, Writable state directory, Immutable-store config support, State integrity checks
- Service Runtime and Guards: Nix profile PATH discovery, Profile precedence, Service PATH fallback, Trusted binary boundaries, Setup write refusal, Doctor repair refusal, Update handoff, Service lifecycle handoff

@@ -1,12 +0,0 @@
# OpenAI / Codex provider path Completeness
Use this rubric when assigning category Completeness scores for the
`openai-codex-provider-path` surface.
## Category Scope
- Model and Auth: Canonical OpenAI Model Routing, Catalog, Codex OAuth Profiles, Subscription Usage, Doctor Diagnostics, Operator Repair
- Responses and Tool Compatibility: Codex Responses Transport, Payload Compatibility, Tool Context, Capability Compatibility
- Native Codex Harness: Native Codex App-server Harness, Thread Lifecycle
- Image and Multimodal Input: Image Generation Editing, Multimodal Input
- Voice and Realtime Audio: Realtime Voice Transcription, Speech

@@ -1,31 +0,0 @@
# OpenClaw App SDK Completeness
Use this rubric when assigning category Completeness scores for the
`openclaw-app-sdk` surface.
## Surface-Specific Scoring Questions
For each category, ask:
- Can an external app developer complete the category workflow using public SDK APIs?
- Are the taxonomy features represented by stable client contracts rather than protocol-only fragments?
- Are setup, authentication, streaming, result handling, error behavior, and compatibility expectations documented?
- Are browser, Node, React, testing, and custom transport variants covered where the category expects them?
- Do known gaps leave major external-app capability branches missing?
## Surface-Specific Guidance
Variation from the default completeness process:
- Completeness is the external app-developer workflow from connection through agent runs, sessions, events, approvals, resources, compatibility, and operational error handling.
- A complete SDK category exposes typed, documented, reusable client APIs instead of requiring low-level Gateway protocol work.
- Manual Gateway frame construction or reliance on internal package shapes is a material completeness gap.
## Category Scope
- Client API: SDK entrypoints, namespace layout, package split, and app/plugin boundary.
- Gateway Access: Gateway connect, URL and token config, auto gateway, custom transport, and scopes/redaction.
- Agent Conversations: agent handles, agent runs, run results, session creation, session send, and session controls.
- Events and Approvals: event stream, event envelope, replay cursors, approval callbacks, and questions.
- Resource Helpers: models, ToolSpace, artifacts, tasks, and environments.
- Compatibility: generated client, ergonomic wrappers, unsupported calls, schema alignment, and public package contract.

@@ -1,11 +0,0 @@
# OpenRouter provider path Completeness
Use this rubric when assigning category Completeness scores for the
`openrouter-provider-path` surface.
## Category Scope
- Provider Setup and Auth: First-run setup, Default model selection, Provider plugin registration, Model-ref examples, OPENROUTER_API_KEY, Auth profiles and auth order, Status/probe and removal, Provider-entry SecretRef/API-key resolution, Gateway env inheritance, Static catalog rows, Dynamic /models discovery, openrouter/auto and nested refs, Free-model scan/probe, Model list/picker cache
- Chat Runtime and Normalization: Chat completions route, Provider routing params, Per-model route overrides, Reasoning payload policy, Anthropic/Gemini/DeepSeek variants, Streamed content parsing, reasoning_details visible output, Tool-call delta preservation, Family-specific replay policy, Response-model and usage normalization, Attribution headers, Response-cache headers/TTL/clear, Anthropic cache-control markers, Cache usage mapping, Custom proxy exclusions
- Provider Recovery and Diagnostics: Timeout/retry classification, Auth/billing/key-limit classification, Context overflow, Model fallback notices, Guarded fetch/pricing warnings
- Media Generation and Speech: image_generate OpenRouter route, video_generate async jobs/polling/download, music_generate audio route, Text-to-speech, Speech-to-text transcription, Inbound media understanding, Generated artifact delivery

@@ -1,40 +0,0 @@
# Plugin Surface Completeness
Use this rubric when assigning category Completeness scores for the
`plugin-sdk-and-bundled-plugin-architecture` surface.
## Surface-Specific Scoring Questions
For each category, ask:
- Can the intended plugin task be completed end to end by an author or
operator?
- Are the important plugin variants present for this category, such as channel,
provider, tool, bundled, local, npm, or ClawHub flows?
- Are the main lifecycle stages present where relevant: create, configure,
validate, run, update, and remove or roll back?
- Are compatibility, approval, or safety branches present when the category
implies them?
- Are important author/operator-visible gaps still forcing workarounds or
unsupported paths?
## Surface-Specific Guidance
Variation from the default completeness process:
- Completeness is the plugin author or operator lifecycle for authoring, packaging, installing, running, approving, publishing, and testing plugins, not just SDK or runtime primitives.
- Score the plugin surface against the full plugin journey, not only one import path, packaging mode, or runtime path.
- Bundled-only support or support for only selected plugin families is incomplete when the category implies broader plugin capability.
- Publishing and testing categories should include expected lifecycle support, not just raw commands or fixtures.
## Category Scope
- Authoring and Packaging plugins: Root SDK entrypoint, Focused SDK imports, Entrypoint discovery, Migration shims, Plugin manifest, Package metadata, Runtime compatibility, Validation feedback
- Bundled plugins: Bundled plugin listing, Bundled source overlays, Packaged bundled plugins, Generated plugin inventory, Bundled channel IDs
- Canvas plugin: Hosted Canvas and A2UI surfaces, Agent canvas tool, Node Canvas commands, Control UI embeds, Canvas documents, A2UI transport and snapshots
- Installing and running plugins: Plugin setup, Runtime activation, Enable and disable, Safe load failures, Dependency repair, Install update and uninstall
- Channel plugins: Inbound event handling, Outbound delivery, Ingress authorization, Destination resolution, Native approval prompts
- Provider and tool plugins: Provider plugins, Tool plugins, Model catalogs, Provider auth, Web search and fetch, Mixed plugins
- Plugin approvals: Approval requests, Native approval delivery, Same-chat fallbacks, Exec and plugin separation, Approval replay protection, Security helpers
- Publishing plugins: Install sources, ClawHub publishing, npm publishing, Compatibility signaling, Update and rollback expectations, Third-party publication rules
- Testing plugins: Test fixtures, Local test environment, Plugin runtime harness, Unit and integration scaffolds, Docker lifecycle suites, Smoke tests

@@ -1,11 +0,0 @@
# Raspberry Pi / small Linux devices Completeness
Use this rubric when assigning category Completeness scores for the
`raspberry-pi-small-linux-devices` surface.
## Category Scope
- Setup and Compatibility: Hardware and 64-bit OS requirements, Node runtime setup, OpenClaw install and onboarding, First-run verification, Supported Pi model selection, 64-bit ARM boundary, Unsupported device guidance, Slow-device caveats, npm/pnpm/Bun install modes, Installer architecture detection, Optional ARM binary checks, Fallback/build guidance
- Remote Access and Auth: Headless API-key auth, Gateway shared-secret auth, Device pairing approvals, SecretRef handling, Token drift recovery, SSH tunnel dashboard access, Tailscale Serve/Funnel, Loopback/non-loopback exposure controls, Authenticated Control UI access
- Gateway Runtime: Always-on Gateway process, Cloud model configuration, Channel startup, Gateway health/status, User service install, linger/boot persistence, Service drop-ins, Restart tuning, Status/log inspection, Backup/restore
- Performance and Diagnostics: Swap and low-RAM tuning, USB SSD guidance, Compile cache/no-respawn settings, OOM/performance troubleshooting, Diagnostics bundles

@@ -1,13 +0,0 @@
# Security, auth, pairing, and secrets Completeness
Use this rubric when assigning category Completeness scores for the
`security-auth-pairing-and-secrets` surface.
## Category Scope
- Approval Policy and Tool Safeguards: Approval Policy, Dangerous Tool Safeguards
- Gateway Auth and Remote Access: Shared Gateway token/password auth, Gateway auth mode, Trusted-proxy identity, Tailscale Serve/Funnel, Bind and origin restrictions, WebSocket handshake auth, Operator-facing docs, Browser Control UI, Remote Client Trust
- Channel Access Control: Channel Identity, Allowlists, Sender Pairing
- Device and Node Pairing: Setup codes, Device identity creation, Device-token issuance, Device pairing approvals for operator, Operator scopes that gate pairing, Local Control UI, Auth migration, Operator-facing docs, Node Pairing, Capability Trust, Remote Exec Approvals
- Plugin Trust: Plugin Installation Trust, Security Boundaries
- Credential and Secret Hygiene: Provider Auth Profiles, API Key Health, Secrets Storage, Redaction, Configuration Hygiene

@@ -1,17 +0,0 @@
# Session, memory, and context engine Completeness
Use this rubric when assigning category Completeness scores for the
`session-memory-and-context-engine` surface.
## Category Scope
- CLI Session and Transcript Management: CLI Session, Transcript Management
- Compaction, Pruning, and Token Pressure: Compaction, Pruning, Token Pressure
- Context Engine and Runtime Assembly: Context Engine, Runtime Assembly
- Cross-client History and Session Parity: Cross-client History, Session Parity
- Diagnostics, Maintenance, and Recovery: Diagnostics, Maintenance, Recovery
- Instruction Profile and Context Visibility: Instruction Profile, Context Visibility
- Memory Backend Storage and Embedding Search: Memory Backend Storage, Embedding Search
- Memory Files, Tools, and Active Memory: Memory Files, Tools, Active Memory
- Session Routing and Conversation Binding: Session Routing, Conversation Binding
- Transcript Persistence and Durability: Transcript Persistence, Durability

@@ -1,12 +0,0 @@
# Signal Completeness
Use this rubric when assigning category Completeness scores for the
`signal` surface.
## Category Scope
- Setup and Account Health: QR link setup, SMS registration, Installer and binary setup, Container account provisioning, Status probes, Setup diagnostics, Account safety guardrails
- Conversation Access and Routing: DM pairing, DM allowlists, Sender identity normalization, Group allowlists, Mention gates, Pending group history
- Message Delivery and Actions: Text delivery targets, Media delivery and limits, Typing and read receipts, Styled/chunked output, Reaction action discovery, Add/remove reactions, Group reaction targeting
- Native Approvals: Native approval routing, Reaction approval responses, Approver targeting
- Transport: Native daemon transport, Container transport, API mode selection, Receive reconnect/readiness

@@ -1,12 +0,0 @@
# Slack Completeness
Use this rubric when assigning category Completeness scores for the
`slack` surface.
## Category Scope
- Channel Setup and Operations: App Install, Slack app credentials, Manifest, Scopes, Channel status diagnostics, Slack account status, Operator Repair, Socket, HTTP transport, Runtime Lifecycle
- Access and Identity: Channel allowlists, Thread routing, Session Isolation, DM Pairing, Sender Authorization
- Conversation Routing and Delivery: Channel allowlists, Thread routing, Session Isolation, DM Pairing, Sender Authorization, Outbound Delivery, Streaming, Reactions, Media, Attachments, Files, Vision
- Media and Rich Content: Outbound Delivery, Streaming, Reactions, Media, Attachments, Files, Vision
- Native Controls and Approvals: Slash Commands, Native Command Routing, Interactive Replies, App Home, Assistant Events, Native Approvals, Actions, Security-sensitive Ops

@@ -1,12 +0,0 @@
# Telegram Completeness
Use this rubric when assigning category Completeness scores for the
`telegram` surface.
## Category Scope
- Channel Setup and Operations: BotFather token creation, TELEGRAM_BOT_TOKEN, Setup wizard credential capture, Startup getMe, Doctor/status surfacing, Named account configuration, CLI/message-tool targets, Directory adapters, Channel status, Account-scoped outbound, Long polling runner startup, Webhook listener startup, Reconnect, Restart, Named account configuration, Directory adapters and configured peers/groups for, Channel status, Account-scoped outbound, Long polling runner startup, Reconnect, Restart
- Access and Identity: dmPolicy modes, Pairing-code approval, Numeric Telegram user ID normalization with telegram, allowFrom, Unauthorized DM, Group allowlists, Supergroup negative chat IDs, Forum topic session keys, ACP topic routing, Session key construction
- Conversation Routing and Delivery: dmPolicy modes, Pairing-code approval, Numeric Telegram user ID normalization with telegram, allowFrom, Unauthorized DM, Group allowlists, Supergroup negative chat IDs, Forum topic session keys, ACP topic routing, Session key construction, Inbound media download, Voice notes, Location, Poll sending, Reactions, Text, Preview streaming, Reply threading tags, Durable outbound message recording, Voice notes, Poll sending, Reply threading tags, Durable outbound message recording
- Media and Rich Content: Inbound media download, Voice notes, Location, Poll sending, Reactions, Text, Preview streaming, Reply threading tags, Durable outbound message recording, Voice notes, Poll sending, Reply threading tags, Durable outbound message recording, Inbound media download, Voice notes, Location and venue extraction into channel context, Poll sending, Reactions
- Native Controls and Approvals: Inline keyboard rendering, Exec approvals in DMs, Message actions, Action capability discovery, Native setMyCommands startup sync, Command name/description normalization, Built-in commands, Command authorization in DMs, Model buttons, Native `setMyCommands` startup sync, Command name/description normalization, Built-in commands such as `/help`, Command authorization in DMs, Model buttons and command UI helpers

@@ -1,12 +0,0 @@
# Observability Completeness
Use this rubric when assigning category Completeness scores for the
`telemetry-diagnostics-and-observability` surface.
## Category Scope
- Health and Repair: Background health-monitor loop, Per-account enable/disable settings, Startup grace, Restart logging, openclaw doctor, Structured health checks, Core doctor checks, Plugin SDK doctor/health contracts, openclaw status, openclaw health, Gateway RPC health, Cached health snapshots
- Logging: Rolling Gateway JSONL file logs, openclaw logs, Gateway RPC logs.tail, Redaction patterns and sinks, Trace correlation fields
- Diagnostic Collection: openclaw gateway diagnostics export, openclaw gateway stability --bundle, Chat /diagnostics, Support zip composition, Bounded in-process stability recorder, openclaw gateway stability, Memory pressure events, Critical memory pressure snapshot option
- Telemetry Export: Diagnostic event types, Async dispatch, W3C trace context creation, Plugin SDK diagnostic runtime exports, Model-call diagnostic events, diagnostics-otel plugin install, OTLP/HTTP traces, Trusted trace context, Model and runtime telemetry, diagnostics-prometheus plugin install, Gateway-authenticated GET /api/diagnostics/prometheus, Prometheus text exposition, Trusted diagnostic event subscription
- Session Diagnostics: session.state, Diagnostic session activity snapshots, Model usage, Export of session signals to stability

@@ -1,12 +0,0 @@
# TUI Completeness
Use this rubric when assigning category Completeness scores for the
`tui-and-terminal-ux` surface.
## Category Scope
- Runtime Modes: Gateway TUI launch, Local chat launch, Terminal alias launch, Initial message launch, Launch option validation, Gateway connection, Gateway authentication, History load on attach, Reconnect visibility, Gateway command RPCs, Embedded local chat, Local auth flow, Config repair loop, Gateway-free recovery
- Input and Commands: Message composition, Input history, Keyboard shortcuts, Paste and busy-submit handling, IME and AltGr handling, Slash Commands, Pickers, Settings
- Session Management: Session Lifecycle, History, Resume
- Local Shell Execution: Bang-command routing, Approval prompt, Command output display, Execution environment marker
- Rendering and Output Safety: Streaming Message Rendering, Tool Cards, Terminal Rendering Primitives, Output Safety

@@ -1,13 +0,0 @@
# Voice and realtime talk Completeness
Use this rubric when assigning category Completeness scores for the
`voice-and-realtime-talk` surface.
## Category Scope
- Talk Providers: OpenAI Realtime voice backend bridge, Google Gemini Live backend bridge, Realtime voice provider SDK contracts, Provider diagnostics, Talk catalog, Talk provider config, Shared native config parsing
- Realtime Talk Sessions: Agent consult handoff, Active Talk agent-run status, Talkback runtime behavior, Forced consult scheduling, Browser Talk start/stop UI, Browser WebRTC sessions, Browser relay mode, Browser tool-call forwarding, Realtime session controls, Gateway relay sessions, Audio-frame limits
- Speech and Transcription: Voice directives, Talk speech playback, Transcription relay sessions, Realtime transcription providers, Native directive parsing
- Native App Talk: macOS native Talk mode, iOS Talk mode, Android Talk mode, Shared Talk config
- Voice Wake and Routing: Wake-word settings, Wake routing, macOS Voice Wake runtime, Mobile wake preferences
- Talk Observability: Talk event logging, Session-log health, Live smoke output, Prometheus diagnostic counters, Operator visibility into setup

@@ -1,12 +0,0 @@
# Voice Call channel Completeness
Use this rubric when assigning category Completeness scores for the
`voice-call-channel` surface.
## Category Scope
- Channel Setup and Operations: Voice Call Channel
- Access and Identity: Voice Call Channel
- Conversation Routing and Delivery: Voice Call Channel
- Media and Rich Content: Voice Call Channel
- Realtime Voice and Calls: Voice Call Channel

@@ -1,12 +0,0 @@
# watchOS companion surfaces Completeness
Use this rubric when assigning category Completeness scores for the
`watchos-companion-surfaces` surface.
## Category Scope
- Delivery and Recovery: APNs relay/direct registration as it affects, Silent push, Pending approval recovery IDs, Gateway-side iOS exec approval, iPhone-side WatchConnectivity transport, Watch-side receiver activation, Delivery fallback among reachable messages
- Exec Approvals: Watch exec approval prompt, Watch approval list/detail UI, iPhone-side prompt caching
- Distribution and Support: Watch app, Signing/profile variables, Public/support status, Changelog, Release metadata, Historical bug/regression themes relevant to scoring
- Notifications and Replies: watch.status, Payload normalization, Mirrored iOS notification fallback when watch, Watch action buttons from generic prompt, Watch-to-iPhone reply payloads, iPhone-side dedupe, Mirrored iOS notification action
- Watch App UI: Watch app entry point, Generic inbox, Persistent watch inbox state

@@ -1,11 +0,0 @@
# Web search tools Completeness
Use this rubric when assigning category Completeness scores for the
`web-search-tools` surface.
## Category Scope
- Search Providers: API-backed providers, Keyless and self-hosted providers, Provider comparison and auto-detection, Provider-specific filters and extraction, Result normalization, OpenAI native web_search, Codex native web_search, Gemini grounding, Grok web grounding, Kimi web search, Provider-native citations, Model and filter routing, webSearchProviders, registerWebSearchProvider, webFetchProviders, registerWebFetchProvider, public-artifact loading, runtime resolution, contract tests
- Setup and Diagnostics: Provider credentials, Default provider selection, Credential repair, Status checks, Quota errors, Cache controls, Provider diagnostics, Retry and fallback, Operator repair
- Network Safety: Network Safety, SSRF, Redirects, Untrusted Content
- Tool Availability and Fetch: web_search exposure, web_fetch exposure, x_search exposure, group:web policy, disabled-state diagnostics, provider/model gating, URL fetch, HTML extraction, PDF/text extraction, Safe truncation, Content citation handoff

@@ -1,12 +0,0 @@
# WhatsApp Completeness
Use this rubric when assigning category Completeness scores for the
`whatsapp` surface.
## Category Scope
- Channel Setup and Operations: Official @openclaw/whatsapp plugin metadata, openclaw plugin install whatsapp, Channel config schema, Baileys socket lifecycle, Operator troubleshooting, Baileys socket lifecycle, Operator troubleshooting for reconnect loops
- Access and Identity: QR login, Baileys multi-file auth persistence, DM pairing challenge, Multi-account/default-account resolution, Direct-message dmPolicy, Sender identity extraction, Privacy controls for plugin hooks, Direct-message `dmPolicy`, Sender identity extraction, Privacy controls for plugin hooks and
- Conversation Routing and Delivery: Group allowlists, Group session keys, Outbound text sends, Provider-accepted receipts, Outbound text sends, Provider-accepted receipts and durable delivery identifiers
- Media and Rich Content: Inbound media download, Outbound image
- Native Controls and Approvals: Native exec, Approver target resolution

@@ -1,12 +0,0 @@
# Windows via WSL2 Completeness
Use this rubric when assigning category Completeness scores for the
`windows-via-wsl2` surface.
## Category Scope
- WSL Setup and Updates: WSL2 + Ubuntu installation, Node runtime, Linux install flow inside WSL2, WSL2 runtime boundary, WSL2 network-family requirements, Source install and build inside WSL2, openclaw update, npm/pnpm/git package-root, Managed systemd Gateway restart, Service metadata refresh, Package-manager caveats
- Gateway Service Lifecycle: Onboarded systemd install, Gateway service install, systemd user unit rendering, WSL-aware systemd unavailable hints, Doctor service repair, WSL user-service linger, Systemd availability after Windows boot, Windows startup task for WSL, Verification before Windows sign-in, Clear expectations around PC power
- Gateway Access and Exposure: Gateway token/password auth, Provider credentials, Gateway auth SecretRefs, Remote URL credential precedence, WSL virtual network, Windows portproxy setup, Windows Firewall rules, Reachable Gateway URLs, Loopback and LAN exposure, WSL2 IPv4 networking, Tailscale remote access
- Diagnostics and Repair: openclaw doctor, openclaw status, openclaw logs, SecretRef, WSL/systemd unavailable hints, Operator repair guidance after WSL2 service
- Browser and Control UI: WSL2 Gateway with Windows browser, Windows Control UI URL, Raw remote CDP to Windows Chrome, Host-local Chrome MCP, Browser profile cdpUrl, Layered diagnostics

@@ -98,7 +98,7 @@ Do not close from title alone. If closing as done on main or nonsensical, prove
When asked for `5 new`, exclude refs already surfaced in the session and refill from the archive until there are 5 live-open candidates. If fewer than 5 remain open, list all open ones and say how many short.
When asked to `update`, `refresh`, `recheck`, `check again`, or similar, return an updated live-open candidate list. Sort by maintainer importance, not recency: high-impact ready fixes first, then useful-but-review-first, then open/not-ready items. Do not include a "changed since last pass" section or bottom-line merged/closed summary unless the user explicitly asks for churn.
When asked to `update`, `refresh`, `recheck`, `check again`, or similar, return an updated live-open candidate list. Do not fill the main list with items that merely merged/closed since the last pass; put those numbers in a short bottom line.
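The `5 new` refill rule can be read as a simple filter-and-fill loop. The following is a minimal sketch under stated assumptions: `Candidate`, `pickNewCandidates`, and the pre-resolved `open` flag are hypothetical names for illustration, and a real pass would confirm open status live against GitHub rather than trust an archived flag.

```ts
// Hypothetical shape of an archived ref; `open` is assumed to be
// resolved already (the skill itself requires a live check).
interface Candidate {
  ref: string; // e.g. "#81244"
  open: boolean;
}

// Walk the archive in order, skip refs already surfaced in this session
// and refs that are no longer open, and stop once `wanted` are collected.
// `short` reports how many candidates are missing from the requested count.
function pickNewCandidates(
  archive: readonly Candidate[],
  surfaced: ReadonlySet<string>,
  wanted = 5,
): { picked: Candidate[]; short: number } {
  const picked: Candidate[] = [];
  for (const candidate of archive) {
    if (picked.length >= wanted) break;
    if (surfaced.has(candidate.ref) || !candidate.open) continue;
    picked.push(candidate);
  }
  return { picked, short: wanted - picked.length };
}
```

When `short` is greater than zero, the caller would list every picked ref and state how many short the list is, as the rule above requires.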
Prefer:
@@ -142,20 +142,18 @@ No Markdown tables. Compact bullets. Use color/risk markers:
Required line shape:
```markdown
- **PR #81244** `@whatsskill.` `+118/-1` `bug` 🟢 https://github.com/openclaw/openclaw/pull/81244 - Prevents chat action buttons from overlapping short assistant replies. Verifiable: yes. Blast: web chat rendering, low.
- **Issue #81245** `@alice` `LOC n/a` `bug` 🟡 https://github.com/openclaw/openclaw/issues/81245 - Reports duplicate Telegram replies when reconnecting after gateway restart. Verifiable: partial. Blast: Telegram channel runtime, medium.
- **PR #81244** `@whatsskill.` `+118/-1` `bug` 🟢 verifiable: yes. This prevents chat action buttons from overlapping short assistant replies. Blast: web chat rendering, low.
- **Issue #81245** `@alice` `LOC n/a` `bug` 🟡 verifiable: partial. This reports duplicate Telegram replies when reconnecting after gateway restart. Blast: Telegram channel runtime, medium.
```
Rules:
- Bold the `PR #n` or `Issue #n` marker.
- Use `@handle`, not author bio text.
- Always include the full GitHub URL.
- Include a one-line description after the URL, separated with `-`.
- PR LOC is `+additions/-deletions`; issue LOC is `LOC n/a`.
- Type: `bug`, `feature`, `perf`, `security`, `docs`, `test`, `chore`, or `refactor`.
- Write a full sentence for what it does.
- Always include blast radius in one phrase.
- Always include `verifiable: yes|partial|no` plus the shortest proof hint when helpful.
- If status is not open, still show it only when the user asked for all surfaced refs; use ✅ or ⚪ and state merged/closed.
- For refresh-style asks, prefer section order: `Best Open Now`, `Useful But Review First`, `Still Open / Not Ready`. Omit merged/closed churn by default.
- For refresh-style asks, bottom line: `Merged/closed since last pass: #81016 merged, #81026 closed.` Omit if none.
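As an illustration of the required line shape, here is a minimal TypeScript sketch of a formatter. It follows the URL-bearing variant of the example lines (URL, then ` - `, then the description, then `Verifiable:` and `Blast:`). `RefLine` and `formatRefLine` are hypothetical names introduced only for this example; nothing here is an existing helper in the repository.

```ts
// Hypothetical input shape for one surfaced PR or issue.
interface RefLine {
  kind: "PR" | "Issue";
  number: number;
  handle: string; // GitHub handle without the leading "@"
  additions?: number; // PRs only
  deletions?: number; // PRs only
  type: "bug" | "feature" | "perf" | "security" | "docs" | "test" | "chore" | "refactor";
  marker: string; // color/risk marker, e.g. "🟢" or "🟡"
  url: string; // full GitHub URL
  description: string; // one full sentence, ending with a period
  verifiable: "yes" | "partial" | "no";
  blast: string; // blast radius phrase, e.g. "web chat rendering, low"
}

// Build one compact bullet: bold marker, handle, LOC, type, risk marker,
// URL, description, verifiability, and blast radius.
function formatRefLine(ref: RefLine): string {
  // PR LOC is +additions/-deletions; issues always use "LOC n/a".
  const loc =
    ref.kind === "PR" ? `+${ref.additions ?? 0}/-${ref.deletions ?? 0}` : "LOC n/a";
  return [
    `- **${ref.kind} #${ref.number}**`,
    `\`@${ref.handle}\``,
    `\`${loc}\``,
    `\`${ref.type}\``,
    ref.marker,
    ref.url,
    "-",
    ref.description,
    `Verifiable: ${ref.verifiable}.`,
    `Blast: ${ref.blast}.`,
  ].join(" ");
}
```

Keeping the shape in one function makes it harder to drop a required field, since the type checker rejects a missing `url`, `verifiable`, or `blast`.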

@@ -1,74 +0,0 @@
---
name: control-ui-e2e
description: Use when testing, fixing, or extending the OpenClaw Control UI GUI with Vitest + Playwright end-to-end checks, mocked Gateway WebSocket flows, mocked dashboard runs, screenshots/videos, or agent-verifiable browser proof.
---
# Control UI E2E
Use this for Control UI changes that need a real browser flow with deterministic Gateway data.
## Test Shape
- Use `ui/src/**/*.e2e.test.ts` for full GUI flows.
- Use `ui/src/test-helpers/control-ui-e2e.ts` to start the Vite Control UI and install a mocked Gateway WebSocket.
- Keep scenarios deterministic. Do not use live provider keys, real channel credentials, or a real Gateway unless the user explicitly asks for live proof.
- Prefer existing `.browser.test.ts` or unit tests for narrow rendering logic; use this E2E lane when the proof should cover routing, app boot, Gateway handshake, requests, and visible UI behavior together.
## Commands
- Target one E2E test in a Codex worktree:
```bash
node scripts/run-vitest.mjs run --config test/vitest/vitest.ui-e2e.config.ts --configLoader runner ui/src/ui/e2e/chat-flow.e2e.test.ts
```
- Run the whole local lane in a normal checkout:
```bash
pnpm test:ui:e2e
```
If dependencies are missing in a Codex worktree, install once with `pnpm install`; for broad GUI proof or dependency-heavy checks, use Testbox/Crabbox instead of running a wide local pnpm lane.
## Visual Proof Default
When running mocked Control UI/dashboard validation for a user-facing feature, produce visual proof by default unless the user explicitly opts out.
- Keep the Vitest E2E assertions deterministic; do not commit generated screenshots or videos.
- After or alongside the focused E2E test, run the mocked Control UI app when available, for example `pnpm dev:ui:mock -- --port <port>`.
- Drive Chromium with Playwright against the local mock URL and capture a video plus screenshots for each meaningful state: initial view, interaction input, result state, and final/paginated/selected state.
- Use `browser.newContext({ recordVideo: { dir, size }, viewport })`, `page.screenshot({ path })`, and close the context before reporting the video path.
- Put artifacts under `.artifacts/control-ui-e2e/<short-feature-name>/` or another clearly named local temp directory, and report the absolute paths in the final answer.
- Treat recording as validation, not only demo capture. If the recorder fails or shows surprising behavior, stop, fix the behavior, add or update a regression test, then rerecord.
- If visual proof is blocked, state the exact blocker and still report the textual E2E evidence.
## Mock Pattern
Start the app server, install the mock before `page.goto`, then assert both Gateway traffic and visible UI:
```ts
const server = await startControlUiE2eServer();
const page = await context.newPage();
const gateway = await installMockGateway(page, {
historyMessages: [{ role: "assistant", content: [{ type: "text", text: "Ready." }] }],
});
await page.goto(`${server.baseUrl}chat`);
await page.locator(".agent-chat__composer-combobox textarea").fill("hello");
await page.getByRole("button", { name: "Send message" }).click();
const request = await gateway.waitForRequest("chat.send");
await gateway.emitChatFinal({ runId: String(request.params.idempotencyKey), text: "Done." });
await page.getByText("Done.").waitFor();
```
Extend `installMockGateway` with typed scenario options or method responses when a new flow needs more Gateway surface.
## Standalone Recording
When recording an already-running mocked Control UI URL, use a temporary Playwright script or `playwright test` spec and keep the recording flow focused:
- Open the mock URL, interact through stable `data-*` selectors or user-facing role selectors, and wait on asserted states instead of relying on fixed sleeps.
- Assert both visible UI state and mocked Gateway traffic for request-driven flows. For example, verify the expected count/row is visible and that `sessions.list` was called with the expected `search`, `offset`, and `limit`.
- Use short sleeps only after assertions to make the captured video readable.
- Store the generated video under `.artifacts/control-ui-e2e/<feature>/`; do not commit it.
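The "assert mocked Gateway traffic" step for `sessions.list` could be checked with a small helper along these lines. This is a sketch under assumptions: `RecordedRequest`, `SessionsListExpectation`, and `assertSessionsListCall` are hypothetical names, and how the mock exposes its recorded requests is not specified above. Only the method name `sessions.list` and the `search`, `offset`, and `limit` parameters come from the text.

```ts
// Hypothetical record of one request captured by a mocked Gateway.
interface RecordedRequest {
  method: string;
  params: Record<string, unknown>;
}

// The parameters the recording flow expects the UI to have sent.
interface SessionsListExpectation {
  search: string;
  offset: number;
  limit: number;
}

// Return the first `sessions.list` request whose params match exactly,
// or throw with the observed params so a failed recording is easy to debug.
function assertSessionsListCall(
  requests: readonly RecordedRequest[],
  expected: SessionsListExpectation,
): RecordedRequest {
  const calls = requests.filter((request) => request.method === "sessions.list");
  const match = calls.find(
    (request) =>
      request.params.search === expected.search &&
      request.params.offset === expected.offset &&
      request.params.limit === expected.limit,
  );
  if (!match) {
    const seen = JSON.stringify(calls.map((call) => call.params));
    throw new Error(
      `no sessions.list call matched ${JSON.stringify(expected)}; saw ${seen}`,
    );
  }
  return match;
}
```

Running this check before any readability sleep keeps the recording a validation step: a wrong `offset` or `limit` fails the script instead of producing a plausible-looking video.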

@@ -1,4 +0,0 @@
interface:
display_name: "Control UI E2E"
short_description: "Mocked browser E2E for Control UI"
default_prompt: "Use $control-ui-e2e to verify a Control UI change with the mocked Vitest + Playwright browser lane."

@@ -44,9 +44,7 @@ pnpm crabbox:run -- --help | sed -n '1,120p'
- OpenClaw scripts prefer `../crabbox/bin/crabbox` when present. The user PATH
shim can be stale.
- Check `.crabbox.yaml` for direct-provider defaults. Omitting `--provider`
means brokered AWS for normal Linux/macOS paths; the wrapper selects Azure
for unqualified Windows/WSL2 runs when the local Crabbox binary advertises
Azure.
means brokered AWS today.
- The brokered AWS default is a Linux developer image in `eu-west-1`; the repo
config pins hot `eu-west-1a/b/c` placement so Fast Snapshot Restore can apply.
If warmup drifts well past the minute-scale path, verify image promotion,
@@ -54,13 +52,6 @@ pnpm crabbox:run -- --help | sed -n '1,120p'
- For broad OpenClaw maintainer `pnpm` gates, prefer the repo wrapper with
`--provider blacksmith-testbox` or the repo Testbox helpers when the standing
Testbox policy applies.
- Cold Testbox acquisition and hydration often take tens of seconds. When broad
remote proof is likely, immediately start
`node scripts/crabbox-wrapper.mjs warmup --provider blacksmith-testbox --keep --timing-json`
in a background command session while inspecting, editing, and running
focused local tests. Poll later, reuse the returned `tbx_...` with
`--provider blacksmith-testbox --id <tbx_id>`, and stop it before handoff.
Do not warm speculatively when remote proof is unlikely.
- Always report the actual provider and id. `cbx_...` means AWS Crabbox;
`tbx_...` means Blacksmith Testbox through Crabbox. If the output only says
`blacksmith testbox list`, use `blacksmith testbox list --all` before
@@ -91,16 +82,18 @@ Use these only when the task needs an existing non-Linux host. OpenClaw broad
Linux validation uses the repo Crabbox config unless a provider is explicitly
requested.
Native brokered Windows is available for Windows-specific proof. Prefer Azure
for Windows/WSL2 when the subscription has quota or credits and the local
Crabbox binary advertises Azure. Keep broad Linux gates on Linux/Testbox unless
the bug is Windows-specific, and only force AWS when the operator asks for the
older AWS developer image/cache path or Azure is unavailable:
Native brokered Windows is available for Windows-specific proof. Use the AWS
developer image in `us-west-2` on demand; it has the expected OpenClaw developer
toolchain and Docker image cache. Keep broad Linux gates on Linux/Testbox unless
the bug is Windows-specific:
```sh
pnpm crabbox:warmup -- \
../crabbox/bin/crabbox warmup \
--provider aws \
--target windows \
--windows-mode wsl2 \
--windows-mode normal \
--region us-west-2 \
--market on-demand \
--timing-json
```
@@ -156,7 +149,7 @@ pnpm crabbox:run -- \
--ttl 240m \
--timing-json \
--shell -- \
"pnpm test:changed"
"env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
```
Full suite:
@@ -167,14 +160,9 @@ pnpm crabbox:run -- \
--ttl 240m \
--timing-json \
--shell -- \
"pnpm verify"
"env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test"
```
Use `pnpm verify` when you need check plus full Vitest proof. It emits
`CRABBOX_PHASE:check` and `CRABBOX_PHASE:test`, making Crabbox summaries show
which stage failed. Use plain `pnpm test` only when check proof is already
covered or intentionally skipped.
Focused rerun:
```sh
@@ -183,7 +171,7 @@ pnpm crabbox:run -- \
--ttl 240m \
--timing-json \
--shell -- \
"pnpm test <path-or-filter>"
"env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test <path-or-filter>"
```
Read the JSON summary. Useful fields:
@@ -218,7 +206,7 @@ node scripts/crabbox-wrapper.mjs run \
--ttl 240m \
--timing-json \
-- \
corepack pnpm check:changed
CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 OPENCLAW_TESTBOX=1 OPENCLAW_TESTBOX_REMOTE_RUN=1 pnpm check:changed
```
Read the JSON summary and the Testbox line. Useful fields:
@@ -230,21 +218,6 @@ Read the JSON summary and the Testbox line. Useful fields:
- Actions run URL/id from the Testbox output
- `exitCode`
Use provider-backed cache volumes only for rebuildable caches, not secrets or
checkout state. On Blacksmith, Crabbox forwards them as sticky disks:
```sh
node scripts/crabbox-wrapper.mjs run \
--provider blacksmith-testbox \
--cache-volume pnpm-store=openclaw-node24-pnpm-lock:/tmp/openclaw-pnpm-store \
--timing-json \
-- \
corepack pnpm check:changed
```
The selected provider must advertise cache-volume support. If not, omit
`--cache-volume` and rely on kept-lease caches.
`blacksmith testbox list` may hide hydrating or ready boxes. Use:
```sh
@@ -571,14 +544,14 @@ If brokered AWS cannot dispatch, sync, attach, or stop, retry once with
```sh
pnpm crabbox:run -- --debug --timing-json -- \
pnpm test:changed
CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed
```
Full suite:
```sh
pnpm crabbox:run -- --debug --timing-json -- \
pnpm test
CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test
```
Auth fallback, only when `blacksmith` says auth is missing:
@@ -612,14 +585,13 @@ Crabbox Blacksmith backend delegates setup to:
The hydration workflow owns checkout, Node/pnpm setup, dependency install,
secrets, ready marker, and keepalive. Crabbox owns dispatch, sync, SSH command
execution, timing, logs/results, cleanup, and cache-volume requests. Blacksmith
implements cache volumes as sticky disks.
execution, timing, logs/results, and cleanup.
Minimal Blacksmith-backed Crabbox run, from repo root:
```sh
pnpm crabbox:run -- --provider blacksmith-testbox --timing-json -- \
corepack pnpm test:changed
CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test:changed
```
Use direct Blacksmith only when Crabbox is the broken layer and you are
@@ -645,7 +617,7 @@ provider deliberately.
```sh
pnpm crabbox:warmup -- --class beast --market on-demand --idle-timeout 90m
pnpm crabbox:hydrate -- --id <cbx_id-or-slug>
pnpm crabbox:run -- --id <cbx_id-or-slug> --timing-json --shell -- "pnpm test:changed"
pnpm crabbox:run -- --id <cbx_id-or-slug> --timing-json --shell -- "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
pnpm crabbox:stop -- <cbx_id-or-slug>
```
@@ -708,7 +680,6 @@ crabbox events <run_id> --json
crabbox logs <run_id>
crabbox results <run_id>
crabbox cache stats --id <id-or-slug>
crabbox cache volumes
crabbox ssh --id <id-or-slug>
blacksmith testbox list
```

@@ -1,51 +0,0 @@
---
name: discord-user-post
description: Post an approved message as the logged-in Discord user through the Discord desktop app. Use for release announcements or other direct user-authored Discord posts; not for OpenClaw channel sends, bots, webhooks, relays, agent sessions, or archive search.
---
# Discord User Post
Use `$computer-use` to operate `/Applications/Discord.app` in the user's
existing logged-in session. This workflow represents the user directly.
## Prepare
1. Draft the complete final message outside Discord.
2. Confirm the intended server and channel with the user when either is
ambiguous.
3. Open Discord and navigate to the exact destination without entering the
message.
4. Verify the visible server name, channel header, and logged-in account.
Do not infer the target from unrelated Discord content. Stop if Discord is not
logged in, the account is wrong, or the exact destination cannot be verified.
## Confirm and Post
Posting is representational communication. Follow the `$computer-use`
confirmation policy even when the user previously asked for an announcement:
1. Show the user the exact final body and verified destination.
2. Request action-time confirmation before typing into Discord.
3. After confirmation, enter the approved body unchanged.
4. Visually inspect the composed message and destination again.
5. Send once.
If the body or destination changes after confirmation, request confirmation
again before sending.
## Verify
- Confirm the message appears once, from the user's account, in the intended
channel.
- Report the server, channel, and visible send result.
- Do not edit, delete, react, or send a follow-up without the corresponding
user instruction and confirmation.
## Guardrails
- Never use `openclaw message`, an OpenClaw agent, a Discord bot, webhook, relay,
or token for this workflow.
- Never expose private Discord content or account details in public output.
- Never send a draft, partial message, duplicate, or unreviewed attachment.
- For Discord archive/history/search, use `$discrawl` instead.

@@ -1,4 +0,0 @@
interface:
display_name: "Discord User Post"
short_description: "Post approved messages through the logged-in Discord app"
default_prompt: "Post this approved message as me through the logged-in Discord desktop app."

@@ -1,6 +1,6 @@
---
name: discrawl
description: "Discord archive: search, sync freshness, DMs, summaries, TUI, repo/release work."
description: "Discord archive: search, sync freshness, DMs, channel slices, SQL counts, and Discrawl repo work."
metadata:
openclaw:
homepage: https://github.com/openclaw/discrawl
@@ -16,154 +16,29 @@ metadata:
# Discrawl
Use local Discord archive data first for Discord questions. Hit Discord APIs
only when the archive is stale, missing the requested scope, or the user asks
for current external context.
## Sources
- DB: platform-native XDG data dir, usually
`${XDG_DATA_HOME:-~/.local/share}/discrawl/discrawl.db` on Linux or
`~/Library/Application Support/discrawl/discrawl.db` on macOS
- Config: platform-native XDG config dir, with legacy fallback to
`~/.discrawl/config.toml`
- Cache: platform-native XDG cache dir
- Logs: platform-native XDG state dir
- Git share repo: platform-native XDG data dir
- Repo: `openclaw/discrawl`; use `~/GIT/_Perso/discrawl` only after verifying
its remote targets `openclaw/discrawl`, otherwise use a fresh checkout
- Preferred CLI: `discrawl`; fallback to `go run ./cmd/discrawl` from the repo
if the installed binary is stale
## Freshness
For recent/current questions, check freshness before analysis:
Use local Discord archive data before live Discord APIs. Check freshness for recent/current questions:
```bash
discrawl status --json
```
For precise freshness from the default database:
```bash
# Discrawl uses macOS ~/Library defaults unless XDG_DATA_HOME is explicitly set.
case "$(uname -s)" in
Darwin)
db="$HOME/Library/Application Support/discrawl/discrawl.db"
;;
*)
db="${XDG_DATA_HOME:-$HOME/.local/share}/discrawl/discrawl.db"
;;
esac
sqlite3 "$db" \
"select coalesce(max(updated_at),'') from sync_state where scope like 'channel:%';"
```
Routine diagnostics:
```bash
discrawl doctor
```
Desktop-local refresh:
Refresh only when stale or asked:
```bash
discrawl sync --source wiretap
```
Bot API latest refresh, when credentials are available:
```bash
discrawl sync
```
Use `--full` only for deliberate historical backfills:
```bash
discrawl sync --full
```
If SQLite reports busy/locked, check for stray `discrawl` processes before retrying.
## Query Workflow
1. Resolve scope: guild, channel, DM, author, keyword, date range.
2. Check freshness for recent/current requests.
3. Prefer CLI search/messages for slices; use read-only SQL for exact counts.
4. Report absolute date spans, counts, channel/DM names, and known gaps.
Use root or subcommand help for syntax: `discrawl --help`,
`discrawl help search`, `discrawl search --help`. Use
`DISCRAWL_NO_AUTO_UPDATE=1` for read smokes when you do not want git-share
updates.
Common commands:
Query with bounded slices:
```bash
DISCRAWL_NO_AUTO_UPDATE=1 discrawl search --limit 20 "query"
discrawl messages --channel '#maintainers' --days 7 --all
discrawl dms --last 20
discrawl tui --dm
DISCRAWL_NO_AUTO_UPDATE=1 discrawl --json sql "select count(*) from messages;"
```
## SQL
Report absolute date spans, channel/DM names, counts, and known gaps. Use read-only SQL for exact counts/rankings. Never use `--unsafe --confirm` unless the user explicitly requests a reviewed DB mutation.
Use `discrawl sql` for exact counts, joins, and ranking queries when normal
CLI reads are too coarse. The command is read-only by default, accepts SQL as
args or stdin, and supports `--json` for agent parsing.
Useful examples:
```bash
DISCRAWL_NO_AUTO_UPDATE=1 discrawl --json sql "select count(*) as messages from messages;"
DISCRAWL_NO_AUTO_UPDATE=1 discrawl --json sql "select coalesce(nullif(c.name, ''), m.channel_id) as channel, count(*) as messages from messages m left join channels c on c.id = m.channel_id group by m.channel_id order by messages desc limit 20;"
DISCRAWL_NO_AUTO_UPDATE=1 discrawl --json sql "select coalesce(nullif(mm.display_name, ''), nullif(mm.global_name, ''), nullif(mm.username, ''), m.author_id) as author, count(*) as messages from messages m left join members mm on mm.guild_id = m.guild_id and mm.user_id = m.author_id group by m.guild_id, m.author_id order by messages desc limit 20;"
```
Never use `--unsafe --confirm` unless the user explicitly asks for a database
mutation and the write has been reviewed.
When the installed CLI lacks a new feature, build or run from a verified
`openclaw/discrawl` checkout before concluding the feature is missing.
## Discord Boundaries
Bot API sync requires configured Discord bot credentials; do not invent token
availability. Desktop wiretap mode reads local Discord Desktop artifacts and
must not extract credentials, use user tokens, call Discord as the user, or
write to Discord application storage. Wiretap/Desktop cache DMs are local-only
and must not be described as part of the published Git snapshot. Git-share
snapshots must not include secrets or `@me` DM rows.
## Verification
For repo edits, prefer existing Go gates:
```bash
GOWORK=off go test ./...
```
Then run targeted CLI smoke for the touched surface, for example:
```bash
discrawl doctor
discrawl status --json
DISCRAWL_NO_AUTO_UPDATE=1 discrawl search --limit 5 "test"
```
## ClawSweeper Sandbox
Use the sandbox reader only:
```bash
discrawl-sandbox search --limit 20 "query"
discrawl-sandbox messages --channel clawtributors --days 7 --all
discrawl-sandbox status --json
```
This reader imports `https://github.com/openclaw/discord-store.git` into
`/root/clawsweeper-sandbox-workspace/.discrawl/discrawl.db` with
`discord.token_source = "none"`. The published Git snapshot is public-channel
filtered; do not use `/root/.discrawl/config.toml` or the rich writer DB from
sandboxed public Discord sessions.
Boundaries: bot sync needs configured Discord bot credentials. Wiretap reads local Discord Desktop artifacts only; do not extract user tokens, call Discord as the user, or write to Discord storage. Git-share snapshots must not include secrets or `@me` DM rows.

@@ -1,209 +0,0 @@
---
name: openclaw-changelog-update
description: Regenerate OpenClaw release changelog sections from git history before beta or stable releases.
---
# OpenClaw Changelog Update
Use this for release changelog rewrites and GitHub release-note source text.
This is mandatory before every beta, beta rerun, stable release, or stable
rerun. Use it with `release-openclaw-maintainer`; this skill owns changelog
content, ordering, grouping, and attribution discipline.
## Goal
Rebuild the target `CHANGELOG.md` version section from a complete, generated
history manifest, not stale draft notes. Produce grouped user-facing release
notes sorted by user interest while preserving every relevant issue/PR ref and
every human `Thanks @...` attribution.
## Inputs
- Target base version: `YYYY.M.PATCH`, without beta suffix.
- Base tag: last reachable shipped release tag, usually the previous stable or
the previous beta train requested by the operator.
- Target ref: exact branch/SHA being released.
## Workflow
1. Start on `main` before branching when possible:
- `git fetch --tags origin`
- `git pull --ff-only`
- confirm clean `git status -sb`
2. Audit history, including direct commits:
- `git log --first-parent --date=iso-strict --pretty=format:'%h%x09%ad%x09%s' <base-tag>..<target-ref>`
- `git log --first-parent --grep='(#' --date=short --pretty=format:'%h%x09%ad%x09%s' <base-tag>..<target-ref>`
- also inspect `--since='24 hours ago'` when main moved during the release.
3. Generate the complete contribution record and editorial manifest before
writing grouped prose:
```bash
node .agents/skills/openclaw-changelog-update/scripts/verify-release-notes.mjs \
--base <base-tag> \
--target <target-ref> \
--version <YYYY.M.PATCH> \
--manifest /tmp/openclaw-release-<YYYY.M.PATCH>.json \
--write-ledger
```
- the manifest is the required input to the rewrite, not an after-the-fact
audit; it contains every referenced PR, eligible contributor credit,
inline issue context, every direct commit, and an editorial-eligibility
classification for PRs and direct commits
- for a historical backfill, add `--seed-ref <pre-backfill-ref>` once so
contribution records from the prior changelog are retained even when an
older merged commit omitted its PR number; the verifier excludes records
for work reverted after the base tag, including beta work reverted before
the stable release
- source PR discovery combines merged GitHub commit associations with merged
PR references explicitly present in active commit subjects/bodies so
cherry-picks and squash commits remain accounted for. Resolve every
association page and exclude PRs merged after the target release commit
- read the manifest before editing `### Highlights`, `### Changes`, or
`### Fixes`; do not carry old grouped prose forward without re-auditing it
- inspect linked PRs/issues or diffs for ambiguous commits. Direct commits
are editorial input, not public ledger rows; infer material user outcomes
from subject, body, touched files, tests, and nearby commits
4. Rewrite one stable-base section only:
- use `## YYYY.M.PATCH`
- do not create beta-specific headings
- do not leave a stale `## Unreleased` section above the target release
- if `Unreleased` contains release-bound notes, fold them into the target
section instead of deleting them
5. Section shape:
- `### Highlights`: 5-8 bullets, broad user wins first
- include only a clear user-visible capability or workflow unlock, a
material reliability/safety fix, a broad cross-surface improvement, or
a release-defining integration/compatibility milestone
- every highlight must say what changed for a user in one sentence; use
one user story per bullet and group its supporting PRs
- exclude tests, CI, refactors, docs, catalog churn, and implementation
detail unless the outcome is a material install/update, data-safety, or
widely visible user improvement
- `### Changes`: new capabilities and behavior changes
- `### Fixes`: user-facing fixes first, grouped by impact and surface
- group related changes/fixes by surface and user impact; avoid one bullet
per tiny commit when several commits tell one user-facing story
- `### Complete contribution record`: generated PR-first record after the
grouped prose; it is the exhaustive accounting surface, not a second
release summary
6. Preserve attribution:
- keep `#issue`, `(#PR)`, `Fixes #...`, and `Thanks @...`
- every human-authored merged PR represented by a user-facing entry needs
its PR ref and `Thanks @author`, even when the PR had no linked issue
- every human issue reporter for a `Fixes #...` or referenced bug issue
represented by a user-facing entry needs `Thanks @reporter` unless the
same handle is already thanked in that bullet
- every human `Co-authored-by` contributor on represented user-facing work
needs `Thanks @handle` when a GitHub handle is known
- when grouping multiple PRs/issues in one bullet, include every relevant
PR/issue ref and every human contributor handle in that same bullet
- multiple `Thanks @...` handles in one bullet are expected; do not drop or
collapse contributor credit just because the note is grouped
- if one grouped bullet covers both direct commits and PRs, keep all PR refs
and thanks, plus any issue refs and human credit from the direct work
- issues remain normal inline `#NNN` references. Do not add a separate
linked-issues inventory. The generated PR record keeps source issues
inline as `Related #NNN` on the PR that shipped them
- when backfilling an older linked-issues inventory, preserve reporter
credit inline for every GitHub-confirmed closing PR relationship. Do not
infer a PR relationship from a generic cross-reference event, invent an
unrelated PR link for a standalone report, or recreate the retired
inventory
- the complete contribution record lists every merged source PR exactly once
as `**PR #NNN**`; source PRs include GitHub commit associations and merged
PR references explicitly present in active commit subjects/bodies. It
preserves author/co-author credit and any issue references in the original
title
- direct commits remain in the manifest with GitHub-resolved author,
co-author, issue, and editorial-eligibility data. They inform grouped
prose but are never rendered as a public `#### Direct commits` dump. Add
direct-commit credit to a grouped bullet only when it shares an explicit
closing issue reference or at least two distinctive subject terms
- the verifier rejects `docs`, `test`, `refactor`, `ci`, `build`, `chore`,
and `style` PRs in Highlights, Changes, or Fixes. Keep those internal
contributions in the complete PR record, but do not give them editorial
release-note space
- classify internal-only work from conventional prefixes and clear title
signals such as `QA`, `test`, `docs`, `refactor`, `lint`, or `CI`; an
untyped title is not automatically editorial
- do not add GHSA references, advisory IDs, or security advisory slugs to
changelog entries or GitHub release-note text unless explicitly requested
- never thank bots, `@claude`, `@openclaw`, `@clawsweeper`, or `@steipete`
- do not use GitHub's release contributor count as the source of truth; the
changelog must carry the complete human credit set itself
7. Sorting preference:
- security/data-loss and content-boundary fixes
- transcript/replay/reply delivery correctness
- channels and mobile integrations
- providers/Codex/local model reliability
- install/update/release path reliability
- performance and observability
- docs and contributor-only/internal details last or omitted
8. Keep bullets single-line unless existing file style forces otherwise. Avoid
internal release-process noise unless it changes user install/update safety.
9. Check release-note side conditions:
- inspect `src/plugins/compat/registry.ts`
- inspect `src/commands/doctor/shared/deprecation-compat.ts`
- if any compatibility `removeAfter` is on/before release date, resolve it
or explicitly record the blocker before shipping
10. Validate and ship:
- after the manifest-driven rewrite, regenerate and verify the complete
contribution record before committing:
```bash
node .agents/skills/openclaw-changelog-update/scripts/verify-release-notes.mjs \
--base <base-tag> \
--target <target-ref> \
--version <YYYY.M.PATCH> \
--manifest /tmp/openclaw-release-<YYYY.M.PATCH>.json \
--write-ledger
```
- the command fails when any `#NNN` reference in release history or the
rendered release section cannot resolve, when reverted work is presented
as shipped, when a source PR is absent from the contribution record, when
direct commits are rendered as a public record dump, when non-editorial
PRs appear in grouped prose, or when an eligible PR author or known
co-author is missing from that PR's `Thanks @...` credit
- when grouped prose names a PR, that same bullet must retain every
contributor and linked-reporter credit from its generated PR record
- unqualified `#NNN` references resolve against `openclaw/openclaw`;
cross-repository references such as `openclaw/imsg#141` remain literal
text and must not be rewritten as local issue links
- after the GitHub release or prerelease is published, verify every matching
release page against the same source section:
```bash
node .agents/skills/openclaw-changelog-update/scripts/verify-release-notes.mjs \
--base <base-tag> \
--target <target-ref> \
--version <YYYY.M.PATCH> \
--release-tag v<YYYY.M.PATCH> \
--check-github
```
- add one `--release-tag` for every beta and stable page in the train; a
`### Release verification` tail is permitted, but any other body drift
fails the check; the GitHub body must begin with the complete
`## YYYY.M.PATCH` changelog section, including its heading
- GitHub release bodies are limited to 125,000 characters. If the complete
source section plus an existing verification tail exceeds that limit, keep
the source section intact and omit the tail; never truncate the
contribution record
- `git diff --check`
- for docs/changelog-only changes, no broad tests are required
- commit with `scripts/committer "docs(changelog): refresh YYYY.M.PATCH notes" CHANGELOG.md`
- push, pull/rebase if needed, then branch/rebase release from latest `main`
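The 125,000-character rule in step 10 can be sketched as a small bash function. This is an illustration only, not part of `verify-release-notes.mjs`: the function name `compose_release_body`, its file arguments, and the optional limit override are hypothetical, and `wc -m` counts characters according to the current locale, which may not match how GitHub counts them.

```bash
# Hypothetical helper (assumption: not part of the skill's scripts).
# Prints the release body: the source section, plus the verification tail
# only when both together fit within the character limit.
compose_release_body() {
  local section_file="$1" tail_file="$2" limit="${3:-125000}"
  local section_chars tail_chars

  # wc -m counts characters in the current locale; tr strips any padding.
  section_chars="$(wc -m < "$section_file" | tr -d '[:space:]')"
  tail_chars="$(wc -m < "$tail_file" | tr -d '[:space:]')"

  if [ "$((section_chars + tail_chars))" -le "$limit" ]; then
    cat "$section_file" "$tail_file"
  else
    # Over the limit: keep the source section intact and omit the tail.
    cat "$section_file"
  fi
}
```

The function never shortens the section file itself, which mirrors the rule that the contribution record is never truncated. What to do when the section alone exceeds the limit is not covered here because the workflow above does not specify it.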
## Quota / API Outage Rule
If GitHub API quota is exhausted, do not idle. Continue work that does not need
the GitHub API:
- local changelog rewrite and release-note extraction
- local pretag checks and package/build sanity
- git push/tag checks over git protocol
- npm registry `npm view` checks
- exact workflow-dispatch command preparation
Only GitHub Release creation, workflow dispatch, run polling, artifact download,
and issue/PR mutation need API quota.

@@ -0,0 +1,238 @@
---
name: openclaw-docs
description: Write or review high-quality OpenClaw developer documentation.
dependencies: []
---
# OpenClaw Docs
## Overview
Use this skill when writing, editing, or reviewing OpenClaw developer documentation for APIs, SDKs, CLI tools, integrations, quickstarts, platform guides, or technical product docs.
Write documentation that is concise, helpful, and comprehensive: fast for first success, precise for production, and easy to scan when debugging.
## Core Model
Use an OpenClaw documentation model, strengthened by Write the Docs principles:
- Lead with what the developer is trying to do.
- Give one recommended path before alternatives.
- Make examples runnable and realistic.
- Keep guides task-oriented and references exhaustive.
- Explain production risks exactly where developers can make mistakes.
- Link concepts, guides, API references, SDKs, testing, and troubleshooting so readers can move between them without rereading.
- Treat docs as part of the product lifecycle: draft them before or alongside implementation, review them with code, and keep them current.
- Make each page discoverable, addressable, cumulative, complete within its stated scope, and easy to skim.
## Structure
Choose the page type before writing:
- Overview: route readers to the right product, integration path, or guide.
- Quickstart: get a new user to a working result with the fewest safe steps.
- Topic page: give an end-to-end overview of a major domain entity, with setup,
key subtopics, troubleshooting, and links to deeper references.
- Guide: explain one workflow from prerequisites to production readiness.
- API reference: define every object, endpoint, parameter, enum, response, error, and version rule.
- SDK or CLI reference: document install, auth, commands or methods, options, examples, and failure modes.
- Testing guide: show sandbox setup, fixtures, test data, simulated failures, and live-mode differences.
- Troubleshooting guide: map symptoms to checks, causes, and fixes.
Use this default topic page structure:
1. Title: name the major entity or surface.
2. Opening overview: start with a few unheaded sentences that explain what it
is, what it owns, and what it does not own. Do not add a `## Overview`
heading unless the page is itself an overview index.
3. Requirements: include only when setup needs specific accounts, versions,
permissions, plugins, operating systems, or credentials.
4. Quickstart: show the recommended setup path and smallest reliable verification.
5. Configuration: show the minimum configuration needed to use the surface,
common variants users must choose between, and where each option is set:
CLI, config file, environment variable, plugin manifest, dashboard, or API.
6. Major subtopics: organize the entity's major concepts, workflows, and
decisions by reader intent. Put each major subtopic under its own heading;
do not wrap them in a generic `## Subtopics` section.
7. Troubleshooting: diagnose common observable failures under an explicit
`## Troubleshooting` heading.
8. Related: link to guides, references, commands, concepts, and adjacent topics.
Topic pages may be longer than quickstarts, but they should not become exhaustive
references. Move field tables, API contracts, narrow internals, legacy details,
and rare debugging workflows to linked reference or troubleshooting pages when
they interrupt the end-to-end overview.
For configuration, keep task-critical options inline. Link to reference docs for
full option lists, defaults, enums, generated schemas, and advanced settings. Do
not duplicate exhaustive config reference tables in topic pages unless the topic
page is itself the reference.
Use this default guide structure:
1. Title: name the outcome, not the implementation detail.
2. Opening: state what the reader can accomplish in one or two sentences.
3. Before you begin: list accounts, keys, permissions, versions, tools, and assumptions.
4. Choose a path: compare options only when the reader must decide.
5. Steps: use verb-led headings with code, expected output, and checks.
6. Test: show the smallest reliable proof that the integration works.
7. Production readiness: cover security, idempotency, retries, limits, observability, migrations, and cleanup.
8. Troubleshooting: include common errors near the workflow that causes them.
9. See also: link to concepts, API references, SDK docs, and adjacent guides.
Keep navigation user-intent based. Do not force readers to understand internal product taxonomy before they can pick a task.
## Documentation Lifecycle
Write and maintain docs with the same discipline as code:
- Draft docs early enough to expose unclear product, API, CLI, or config design.
- Keep docs source near the code, config, command, plugin, or protocol it describes when the repo layout allows it.
- Avoid duplicate truth. If the same contract appears in multiple places, pick the canonical page and link to it.
- Update docs in the same change as behavior, config, API, CLI, plugin, or troubleshooting changes.
- Remove, redirect, or clearly mark stale docs. Incorrect docs are worse than missing docs.
- Involve the right reviewers: code owners for behavior, support or QA for user failure modes, and docs maintainers for structure and style.
- Preserve older-version guidance only when users need it; otherwise document the current supported behavior.
Do not use FAQs as a dumping ground for unrelated material. Promote recurring questions into task, concept, troubleshooting, or reference pages.
## Writing Style
Write in a direct, practical voice:
- Use present tense and active voice.
- Address the reader as "you" when giving instructions.
- Prefer short paragraphs and scannable lists.
- Use concrete nouns: "agent profile", "Gateway webhook", "plugin manifest", "session state".
- Put caveats exactly where they affect the step.
- Avoid marketing language, hype, generic benefits, and vague claims.
- Avoid long conceptual lead-ins before the first actionable step.
- Do not over-explain common developer concepts unless the product has a nonstandard contract.
- Define OpenClaw-specific jargon and abbreviations before first use.
- Use sentence case for headings unless an OpenClaw product name, command, or identifier requires capitalization.
- Use descriptive link text that names the destination or action; avoid vague links such as "this page" or "click here".
- Avoid culturally specific idioms, violent idioms, and jokes that make docs harder to translate or scan.
- Write accessible prose: do not rely on color, screenshots, or visual position as the only way to understand an instruction.
Use headings that describe actions or reference surfaces:
- Good: "Create an agent", "Configure a Slack channel", "Repair plugin installation"
- Avoid: "How it works", "Under the hood", "Important notes" unless the section truly needs that shape
Use precise modal language:
- Use "must" for required behavior.
- Use "can" for optional capability.
- Use "recommended" for the default path.
- Use "avoid" for known footguns.
- Explain "why" only when it changes a developer decision.
## Detail Level
Vary detail by page type:
- Overview pages: be brief; help readers choose.
- Quickstarts: be procedural; include only what is needed for first success.
- Guides: be complete for one workflow; include decisions, side effects, and failure handling.
- References: be exhaustive; document every field, default, enum, nullable value, constraint, response, and error.
- Troubleshooting: be explicit; assume the reader is blocked and needs observable checks.
Go deep where mistakes are expensive:
- Authentication and secret handling
- Money movement, billing, permissions, and irreversible actions
- Webhooks, retries, duplicate events, and ordering
- Idempotency and concurrency
- Sandbox versus production differences
- Versioning, migrations, and backwards compatibility
- Limits, rate limits, quotas, and timeouts
- Error codes and recovery paths
- Data retention, privacy, and compliance-sensitive behavior
Do not bury this detail in a distant reference if developers need it to complete the task safely.
## Examples
Make examples production-shaped, even when using test data:
- Prefer complete copy-pasteable commands or snippets.
- Use realistic variable names and values.
- Mark placeholders clearly with angle-bracket names such as `<API_KEY>` or `<CUSTOMER_ID>`.
- Show expected success output after commands.
- Show full request and response examples for API references when response shape matters.
- Keep one conceptual unit per code block.
- Use language-specific code fences.
- Avoid toy examples that hide required setup, auth, error handling, or cleanup.
When multiple languages are useful, keep the same scenario across languages so readers can compare equivalents.
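The angle-bracket placeholder convention above is easy to check mechanically. The following is a minimal sketch, assuming placeholders are uppercase names such as `<API_KEY>`; the function name `list_placeholders` is hypothetical and not an existing OpenClaw tool.

```bash
# Hypothetical lint helper: list the distinct angle-bracket placeholders
# (for example <API_KEY>) found in text read from stdin.
list_placeholders() {
  # -o prints only the matching part; -E enables extended regular expressions.
  # "|| true" keeps the pipeline successful when nothing matches.
  { grep -oE '<[A-Z][A-Z0-9_]*>' || true; } | sort -u
}
```

Piping a docs snippet through `list_placeholders` before publishing shows which values a reader must supply, which can then be compared against the page's prerequisites.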
## Discoverability and Navigation
Design every page so readers can find it, link to it, and decide quickly whether it answers their question:
- Use goal-oriented titles and headings that match likely search terms.
- Start each page with a concise answer to "what can I do here?"
- Include metadata or frontmatter required by the OpenClaw docs index.
- Add "Read when" hints for docs-list routing when creating or changing OpenClaw docs pages that participate in the docs index.
- Link from likely entry points, not only from nearby internal taxonomy pages.
- Keep section headings stable enough for links from issues, PRs, support replies, and chat answers.
- Order tutorials and examples from prerequisites to advanced tasks; order reference pages alphabetically or topically when that helps lookup.
- State scope up front when a page is intentionally partial.
## API Reference Pattern
For endpoints, methods, objects, or commands, include:
1. Short purpose statement.
2. Auth or permission requirements.
3. Request shape, including path, query, headers, and body fields.
4. Parameter table with type, requiredness, default, constraints, enum values, and side effects.
5. Return shape with object lifecycle states.
6. Error cases with codes, causes, and recovery guidance.
7. Runnable example request.
8. Representative successful response.
9. Related guides and adjacent reference pages.
For nested objects, document child fields near their parent. Do not make readers jump across pages to understand the shape of a single request.
## Verification
Verify docs changes like product changes:
- Run the relevant docs build, docs index, formatter, link checker, or generated-doc check when available.
- Run commands, snippets, and examples that the page tells users to run whenever feasible.
- Confirm screenshots, UI labels, CLI output, config keys, flags, defaults, errors, and file paths match current behavior.
- Prefer executable checks over prose-only review for API, CLI, config, generated reference, and troubleshooting docs.
- If a verification step is not feasible, say what was not verified and why.
## Completeness Checks
Before finalizing a page, verify:
- The first screen tells readers what they can accomplish.
- The recommended path is obvious.
- Prerequisites are explicit and testable.
- Examples can run with documented inputs.
- The page has a clear audience: user, operator, plugin author, contributor, or maintainer.
- Test-mode and production-mode behavior are separated.
- Security-sensitive values are never exposed in examples.
- Every warning is attached to the step where it matters.
- Edge cases are documented where they affect implementation.
- API fields include types, defaults, constraints, and errors.
- Troubleshooting starts from observable symptoms.
- Related links help the reader continue without duplicating the page.
- The page says where to get support, file issues, or contribute when that is relevant to the reader's next step.
- The page is complete for the scope it claims, or the limitation is stated up front.
## Review Pass
Edit in this order:
1. Remove repetition and generic explanation.
2. Move conceptual background below the first useful action unless it is required to choose correctly.
3. Replace passive or abstract wording with concrete instructions.
4. Tighten headings until the outline reads like a task map.
5. Add missing operational details for production safety.
6. Check examples for copy-paste accuracy.
7. Add links between guide, reference, SDK, testing, and troubleshooting surfaces.
8. Check discoverability, addressability, accessibility, and docs-as-code verification.

@@ -1,11 +1,11 @@
---
name: openclaw-ghsa-maintainer
description: "Inspect, patch, validate, publish, or confirm OpenClaw GHSA security advisories and private-fork state."
description: Inspect, patch, validate, publish, or confirm OpenClaw GHSA security advisories and private-fork state.
---
# OpenClaw GHSA Maintainer
Use this skill for repo security advisory workflow only. Keep general release work in `release-openclaw-maintainer`.
Use this skill for repo security advisory workflow only. Keep general release work in `openclaw-release-maintainer`.
## Respect advisory guardrails
@@ -85,4 +85,3 @@ jq -r .description < /tmp/ghsa.refetch.json | rg '\\\\n'
- Publishing fails with HTTP 422 if required fields are missing or the private fork still has open PRs.
- A payload that looks correct in shell can still be wrong if Markdown was assembled with escaped newline strings.
- Advisory PATCH sequencing matters; separate field updates when GHSA API constraints require it.
- Public hardening/no-publish comments and draft text should avoid raw commit hashes, PR titles/numbers, and fix-mechanism summaries. Prefer patched-version fields or release-only wording; keep SHAs, PRs, and implementation notes in internal evidence.

@@ -1,165 +0,0 @@
---
name: openclaw-landable-bug-sweep
description: "Find or repair small high-confidence non-SDK-boundary OpenClaw bugfix PRs until five are landable."
---
# OpenClaw Landable Bug Sweep
Autonomous maintainer workflow for producing five landable OpenClaw bugfix PR URLs.
Use for broad issue/PR sweeps where the bar is high and the output is PRs, not notes.
Do not use for plugin SDK/API boundary work; those need separate architecture review.
## Target
Return exactly five PR URLs, each with:
- bug summary
- why the fix is low-risk
- proof: rebased-head local/Testbox/live commands or run IDs
- autoreview: clean result on the exact head being shown
- CI green on the exact pushed PR head
- issue/duplicate cleanup done or still pending
The five URLs may be existing PRs that were reviewed/fixed, or new PRs created from issues/clusters.
Do not present a PR URL to the maintainer until it has been refreshed on current `main`, left-tested, autoreviewed clean, pushed, and verified green in live GitHub CI.
If code, tests, changelog, PR body, or branch base changes after autoreview, rerun autoreview before showing the URL.
## Companion Skills
Use `$gitcrawl` for discovery/clustering, `$openclaw-pr-maintainer` for live GitHub mutation rules, `$github-author-context` when contributor trust matters, `$openclaw-testing` for proof choice, `$autoreview` before publishing/landing, and `$crabbox` for broad/E2E/live proof.
## Candidate Bar
Accept only when all are true:
- bug or paper cut, not feature/product/support/docs-only
- root cause is proven in current code
- dependency behavior checked via upstream docs/source/types when relevant
- production/runtime diff is small, ideally much smaller than 500 LOC and always below 500 LOC
- tests may be larger, but focused
- no new dependency
- no new config option
- no backward-incompatible behavior
- no security/product/owner-boundary decision needed
- no plugin SDK, public plugin API, or `src/plugin-sdk/**` boundary change
- no broad refactor smell
- focused proof is feasible
- branch can be rebased/refreshed and pushed, or a replacement PR can be created
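The 500 LOC bound on the production/runtime diff can be estimated from `git diff --numstat` output. This is a rough sketch under stated assumptions: the function name `runtime_diff_loc` is hypothetical, and the test-path patterns (`.test.`, `.spec.`, `test/`, `tests/`) are guesses that may not match the repository's actual layout.

```bash
# Hypothetical helper: sum added + deleted lines from `git diff --numstat`
# output on stdin, skipping paths that look like tests.
runtime_diff_loc() {
  awk -F'\t' '
    $3 ~ /(\.test\.|\.spec\.|(^|\/)tests?\/)/ { next }  # assumed test-path patterns
    $1 == "-" { next }                                  # binary files report "-"
    { total += $1 + $2 }
    END { print total + 0 }
  '
}

# Example use against the refreshed base (run inside the repository):
#   git diff --numstat origin/main...HEAD | runtime_diff_loc
```

A line count is only a screen. A small diff can still fail the other criteria in the list, so a candidate that passes this check still needs the root-cause and boundary checks.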
Good examples:
- provider parameter mismatch proven against dependency/API contract
- CLI command diverges from adjacent command behavior
- narrow runtime state/serialization bug with failing test
- issue already fixed on current `main`, with proof and closeable duplicates
Reject:
- feature requests, new knobs, migrations, release work, workflow policy, support
- plugin SDK/API boundary changes, including compatibility shims, new SDK methods, SDK exports, or plugin-facing channel/provider seams
- auth/security boundary changes unless explicitly assigned
- bugs needing live credentials that are unavailable
- PRs with red CI unless you fix, rebase, push, and recheck them green
- PRs you only reviewed locally but did not refresh/push/check live
- PRs whose final head has not passed `$autoreview`
- fixes whose clean shape is a larger architecture move
- speculative reports without reproducible/provable cause
- UI/UX changes requiring product judgment
## Sweep Loop
1. Start clean:
- `git status -sb`
- `git pull --ff-only`
- verify branch is expected, usually `main`
2. Build candidate clusters:
- `gitcrawl` open issues/PRs, neighbors, and search
- live `gh issue/pr view`
- include PRs linked from issues and duplicates
3. For each cluster:
- read issue/PR body, comments, labels, linked refs, current source, adjacent tests
- suppress maintainer-owned queue noise unless it is the best fix path
- identify opener/author and preserve credit
- decide: `repair-existing-pr`, `create-new-pr`, `close-fixed-on-main`, `close-duplicate`, or `reject`
4. Prove before patching:
- failing test, focused repro, log/source proof, or dependency contract proof
- if already fixed on `main`, prove with current source/test/commit and close kindly
5. Patch:
- prefer existing PR when good and writable
- if unwritable or wrong shape, create own PR and preserve useful contributor credit
- if no PR exists, create one
- add regression test when it fits
- put release-note context for user-facing fixes in the PR body or commit message; credit the human reporter/contributor when known
6. Review, refresh, and publish:
- rebase or otherwise refresh the PR branch on current `origin/main`
- resolve drift, including newly exposed CI failures, rather than counting the PR as ready
- do not add `CHANGELOG.md` during normal sweep PRs; release automation generates it from PRs and commits
- left-test the rebased head with the smallest meaningful local/Testbox/live command that proves the bug
- run `$autoreview` until no accepted/actionable findings remain before creating, updating, or presenting the PR URL
- create/update PR with real body and proof fields
- push the exact reviewed head
- verify live GitHub CI is green for that pushed head; do not count pending, red, dirty, conflicting, or externally blocked PRs in the five
7. Hygiene:
- close duplicates and fixed-on-main issues/PRs with proof as soon as you notice them during the sweep
- never mutate more than five associated items in one cluster without explicit confirmation
- comments must be kind, concrete, and include proof/PR/commit links
8. Repeat until five landable PR URLs are ready.
## PR Body Proof
Use the repo PR template. Include these exact labels:
```text
Behavior addressed:
Real environment tested:
Exact steps or command run after this patch:
Evidence after fix:
Observed result after fix:
What was not tested:
```
## Existing PR Rules
- Review code path beyond the diff before trusting it.
- If PR is good: rebase/refresh on current `main`, fix small issues, left-test, autoreview clean, push, and get CI green before showing or counting it.
- If PR is not good but has a useful idea: recreate locally, co-author when warranted, close original with thanks and explanation.
- If PR is duplicate or fixed on `main`: comment proof, close.
- If maintainer cannot push to contributor branch: create own branch/PR, preserve useful commits or credit.
- If CI turns red after local proof, treat that as normal work: inspect the failing job, fix or reject, rerun, and only count the PR once green.
## Output Ledger
Maintain a running ledger:
```text
accepted:
- PR URL:
source refs:
bug:
root cause:
fix:
risk:
rebase/head:
left-test:
autoreview:
CI:
credit/thanks:
cleanup:
rejected:
- ref:
reason:
closed:
- ref:
reason:
proof/comment:
```
Final answer:
- exactly five accepted PR URLs
- 2-4 sentence explainer per PR
- proof/CI state per PR
- closed duplicates/fixed-on-main refs
- current branch/status


@@ -1,4 +0,0 @@
interface:
display_name: "OpenClaw Landable Bug Sweep"
short_description: "Find five small non-SDK landable bugfix PRs"
default_prompt: "Use $openclaw-landable-bug-sweep to find or repair five small high-confidence non-SDK-boundary OpenClaw bugfix PRs and get them landable."


@@ -0,0 +1,95 @@
---
name: openclaw-mac-release
description: "Run or recover OpenClaw macOS release signing, notarization, appcast, and asset promotion."
---
# OpenClaw Mac Release
Use with `$openclaw-release-maintainer`, `$openclaw-release-ci`, and `$one-password` when stable macOS assets, private mac preflight, notarization, appcast promotion, or mac release recovery is involved.
## Credentials
- Canonical ASC item: vault `Molty`, title `API Key - App Store Connect - Personal - Release`.
- Fields: `private_key_p8`, `key_id`, `issuer_id`.
- Current known good key id: `AKVLXW849T`.
- Legacy mirror: vault `Private`, title `API Key - App Store Connect - Personal`; keep it synced for older refs.
- Stale/revoked key symptom: `xcrun notarytool submit` fails with `HTTP status code: 401. Unauthenticated`.
- Validate candidate ASC credentials with `xcrun notarytool history` before setting GitHub secrets.
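A minimal sketch of that validation step, assuming the three fields were already read from one 1Password item into a local `.p8` file and two values. The function name `validate_asc_credentials` and the `DRY_RUN` switch are illustrative and not part of the repo; only `xcrun notarytool history` and its `--key`, `--key-id`, and `--issuer` options are real.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check candidate ASC credentials with a read-only
# notarytool call before any GitHub secret is updated.
# Usage: validate_asc_credentials <p8-file> <key-id> <issuer-id>
validate_asc_credentials() {
  local key_file="$1" key_id="$2" issuer_id="$3"
  if [ -z "$key_file" ] || [ -z "$key_id" ] || [ -z "$issuer_id" ]; then
    echo "usage: validate_asc_credentials <p8-file> <key-id> <issuer-id>" >&2
    return 2
  fi
  # Build the command once so dry-run and real run cannot drift apart.
  set -- xcrun notarytool history \
    --key "$key_file" --key-id "$key_id" --issuer "$issuer_id"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Assumption: printing the key path and ids is acceptable; the key
    # material itself is never read or echoed here.
    echo "$*"
    return 0
  fi
  "$@"
}
```

A non-zero exit from the real run (for example the 401 described above) means the candidate key should not be copied into GitHub secrets.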
## 1Password
- Use `$one-password`: all `op` work inside one persistent tmux session, no secret output.
- Prefer `OP_SERVICE_ACCOUNT_TOKEN` from `~/.profile` for Molty reads.
- Do not assume `MOLTY_OP_SERVICE_ACCOUNT_TOKEN` is alive; it has previously pointed at a deleted service account.
- If a service token fails, run status-only checks: token present/length and `op whoami`; never print token values.
- If desktop app auth is needed but Touch ID is unavailable, set `OP_BIOMETRIC_UNLOCK_ENABLED=false` for the manual `op account add --signin` path.
## GitHub Secrets
Target private repo environment: `openclaw/releases-private`, env `mac-release`.
Set only after local notary auth validation:
- `APP_STORE_CONNECT_API_KEY_P8`
- `APP_STORE_CONNECT_KEY_ID`
- `APP_STORE_CONNECT_ISSUER_ID`
Do not update these from mixed sources. All three ASC fields must come from the same 1Password item.
## Workflow Shape
- Public release branch may carry mac-only packaging fixes after the stable tag/npm are already live.
- Use `source_ref=release/YYYY.M.D` for private mac preflight/validation when building that branch variation.
- Keep `tag=vYYYY.M.D` pointing at the original stable release commit.
- Real mac publish must reuse:
- a successful private mac preflight run for the same tag/source SHA
- a successful private mac validation run for the same tag/source SHA
- If preflight source SHA differs from tag SHA, validation must also use the same `source_ref`; promotion rejects mismatched proof.
## Notarization
- OpenClaw uses `scripts/notarize-mac-artifact.sh`.
- `xcrun notarytool submit` should use `--no-s3-acceleration`; accelerated upload can surface misleading 401s even when `notarytool history` succeeds.
- If signing succeeds but notarization fails immediately with 401, check ASC key freshness first.
- If notarization stays in progress for several minutes after key-file write, that is normal Apple wait time; do not edit blindly.
## Dispatch
Private preflight:
```bash
gh workflow run openclaw-macos-publish.yml --repo openclaw/releases-private --ref main \
-f tag=vYYYY.M.D \
-f source_ref=release/YYYY.M.D \
-f preflight_only=true \
-f smoke_test_only=false \
-f allow_late_calver_recovery=false \
-f public_release_branch=release/YYYY.M.D
```
Private validation for a branch-variation preflight:
```bash
gh workflow run openclaw-macos-validate.yml --repo openclaw/releases-private --ref main \
-f tag=vYYYY.M.D \
-f source_ref=release/YYYY.M.D
```
Real publish:
```bash
gh workflow run openclaw-macos-publish.yml --repo openclaw/releases-private --ref main \
-f tag=vYYYY.M.D \
-f preflight_only=false \
-f smoke_test_only=false \
-f preflight_run_id=<successful-preflight-run> \
-f validate_run_id=<successful-validation-run> \
-f allow_late_calver_recovery=false \
-f public_release_branch=release/YYYY.M.D
```
## Verify
- `gh release view vYYYY.M.D --repo openclaw/openclaw` shows zip, dmg, dSYM zip, not draft, not prerelease.
- Public `main` `appcast.xml` points at `OpenClaw-YYYY.M.D.zip`.
- Appcast entry has `sparkle:version`, `sparkle:shortVersionString`, length, and `sparkle:edSignature`.
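The asset part of the first check can be sketched as a small filter, assuming asset names are fed in one per line. `check_mac_assets` is a hypothetical helper, and matching by file suffix is an assumption; the source names only the `OpenClaw-YYYY.M.D.zip` pattern.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read release asset names on stdin and confirm that
# a zip, a dmg, and a dSYM zip are all present. Suffix matching is assumed.
check_mac_assets() {
  local have_zip=0 have_dmg=0 have_dsym=0 name
  while IFS= read -r name; do
    case "$name" in
      *.dSYM.zip) have_dsym=1 ;;   # must be tested before the plain *.zip case
      *.zip)      have_zip=1 ;;
      *.dmg)      have_dmg=1 ;;
    esac
  done
  [ "$have_zip" = 1 ]  || echo "missing: zip"
  [ "$have_dmg" = 1 ]  || echo "missing: dmg"
  [ "$have_dsym" = 1 ] || echo "missing: dSYM zip"
  [ "$have_zip$have_dmg$have_dsym" = "111" ]
}

# Example wiring (not run here):
#   gh release view vYYYY.M.D --repo openclaw/openclaw \
#     --json assets --jq '.assets[].name' | check_mac_assets
#   gh release view vYYYY.M.D --repo openclaw/openclaw --json isDraft,isPrerelease
```

The draft/prerelease flags and the appcast fields still need their own checks; this covers only asset presence.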


@@ -58,7 +58,7 @@ Use this skill for Parallels guest workflows and smoke interpretation. Do not lo
- For beta/stable verification, resolve the tag immediately before the run (`npm view openclaw@beta version dist.tarball` or `npm view openclaw@latest ...`). Tags can move while a long VM matrix is already running; restart the matrix when the intended prerelease appears after an earlier registry 404/tag-lag check.
- Use the configured secret workflow to inject only the provider keys needed by OpenAI/Anthropic lanes. Do not print secrets or env dumps; pass provider secrets through the guest exec environment.
- Same-guest update verification should set the default model explicitly to `openai/gpt-5.4` before the agent turn and use a fresh explicit `--session-id` so old session model state does not leak into the check.
- The aggregate npm-update wrapper must resolve the Linux VM with the same Ubuntu fallback policy as `parallels-linux-smoke.sh` before both fresh and update lanes. Treat any Ubuntu guest with major version `>= 24` as acceptable when the exact default VM is missing, preferring the newest versioned Ubuntu guest with a fresh poweroff snapshot. On Peter's current host today, use `Ubuntu 26.04`.
- The aggregate npm-update wrapper must resolve the Linux VM with the same Ubuntu fallback policy as `parallels-linux-smoke.sh` before both fresh and update lanes. Treat any Ubuntu guest with major version `>= 24` as acceptable when the exact default VM is missing, preferring the closest version match. On Peter's current host today, missing `Ubuntu 24.04.3 ARM64` should fall back to `Ubuntu 25.10`.
- On macOS same-guest update checks, restart the gateway after the npm upgrade before `gateway status` / `agent`; launchd can otherwise report a loaded service while the old process has exited and the fresh process is not RPC-ready yet.
- The npm-update aggregate's macOS update leg writes the guest update script as root, then runs it as the desktop user. If `prlctl exec "$MACOS_VM" --current-user ...` cannot authenticate, retry through plain root `prlctl exec` plus `sudo -u <desktop-user> /usr/bin/env HOME=/Users/<desktop-user> USER=<desktop-user> LOGNAME=<desktop-user> PATH=/opt/homebrew/bin:/opt/homebrew/opt/node/bin:/usr/bin:/bin:/usr/sbin:/sbin ...`. That is a Parallels transport fallback; still verify `openclaw --version`, gateway RPC, and an agent turn after the update.
- On Windows same-guest update checks, restart the gateway after the npm upgrade before `gateway status` / `agent`; in-place global npm updates can otherwise leave stale hashed `dist/*` module imports alive in the running service.
@@ -93,8 +93,8 @@ Use this skill for Parallels guest workflows and smoke interpretation. Do not lo
- If that release-to-dev lane fails with `reason=preflight-no-good-commit` and repeated `sh: pnpm: command not found` tails from `preflight build`, treat it as an updater regression first. The fix belongs in the git/dev updater bootstrap path, not in Parallels retry logic.
- Until the public stable train includes that updater bootstrap fix, the macOS release-to-dev lane may seed a temporary guest-local `pnpm` shim immediately before `openclaw update --channel dev`. Keep that workaround scoped to the smoke harness and remove it once the latest stable no longer needs it.
- In Tahoe `prlctl exec --current-user` runs, prefer explicit `node .../openclaw.mjs ...` invocations for the release->dev handoff itself and for post-update verification. The shebanged global `openclaw` wrapper can fail with `env: node: No such file or directory`, and self-updating through the wrapper is a weaker lane than invoking the entrypoint under a fixed `node`.
- Default to the snapshot closest to `macOS 26.5 latest`.
- On Peter's Tahoe VM, `fresh-latest-march-2026` can hang in `prlctl snapshot-switch`; if restore times out there, rerun with `--snapshot-hint 'macOS 26.5 latest'` before blaming auth or the harness.
- Default to the snapshot closest to `macOS 26.3.1 latest`.
- On Peter's Tahoe VM, `fresh-latest-march-2026` can hang in `prlctl snapshot-switch`; if restore times out there, rerun with `--snapshot-hint 'macOS 26.3.1 latest'` before blaming auth or the harness.
- `parallels-macos-smoke.sh` now retries `snapshot-switch` once after force-stopping a stuck running/suspended guest. If Tahoe still times out after that recovery path, then treat it as a real Parallels/host issue and rerun manually.
- The macOS smoke should include a dashboard load phase after gateway health: resolve the tokenized URL with `openclaw dashboard --no-open`, verify the served HTML contains the Control UI title/root shell, then open Safari and require an established localhost TCP connection from Safari to the gateway port.
- For Tahoe `fresh.gateway-status`, prefer non-TTY `prlctl exec --current-user ... openclaw gateway status ...` plus a few short retries. `prlctl enter` can spam TTY control bytes and hang the phase log even when the CLI itself is healthy.
@@ -140,8 +140,8 @@ Use this skill for Parallels guest workflows and smoke interpretation. Do not lo
## Linux flow
- Preferred entrypoint: `pnpm test:parallels:linux`
- Use the newest versioned Ubuntu guest with a fresh poweroff snapshot. On Peter's host today, that is `Ubuntu 26.04`.
- If an exact requested Ubuntu VM is missing on the host, any Ubuntu guest with major version `>= 24` is acceptable; prefer the newest versioned Ubuntu guest over older fallback snapshots.
- Use the snapshot closest to fresh `Ubuntu 24.04.3 ARM64`.
- If that exact VM is missing on the host, any Ubuntu guest with major version `>= 24` is acceptable; prefer the closest versioned Ubuntu guest with a fresh poweroff snapshot. On Peter's host today, that is `Ubuntu 25.10`.
- Use plain `prlctl exec`; `--current-user` is not the right transport on this snapshot.
- Fresh snapshots may be missing `curl`, and `apt-get update` can fail on clock skew. Bootstrap with `apt-get -o Acquire::Check-Date=false update` and install `curl ca-certificates`.
- Fresh `main` tgz smoke still needs the latest-release installer first because the snapshot has no Node or npm before bootstrap.


@@ -139,12 +139,12 @@ Issue triage is review/prove/patch-local by default:
2. Fix only issues that are easy, high-confidence, and narrowly owned by the implicated path.
3. Add focused regression proof when practical.
4. Stop with the dirty diff, touched files, and test/gate output for maintainer review.
5. After maintainer approval to ship, make one commit per accepted fix, with release-note context in the PR body or commit message when user-facing.
5. After maintainer approval to ship, make one commit per accepted fix, with its own changelog entry when user-facing.
6. Pull/rebase, push, then comment and close only the issues that were fixed or explicitly triaged closed.
Do not batch unrelated issue fixes into one commit. Do not publish, comment, close, or label during the review/prove phase.
Missing `CHANGELOG.md` is not a PR review finding or merge blocker. If landing/fixing a user-visible change, make sure the PR body or commit message captures the release-note context; never ask or block solely on it.
Missing changelog is not a PR review finding or merge blocker. If landing/fixing a user-visible change, add/update changelog automatically when practical; never ask or block solely on it.
Only list candidates that pass all gates:
@@ -168,56 +168,19 @@ Output only qualifying candidates, with: ref, surface, proof, cause, fix sketch,
- Start every PR review with 1-3 plain sentences explaining what the change does and why it matters. Put this before `Findings`.
- Then list findings first. If none, say `No blocking findings` or `No findings`.
- Show size near the top as `LOC: +<additions>/-<deletions> (<changedFiles> files)`, using live PR stats or local diff stats.
- Always answer: bug/behavior being fixed, PR/issue URL and affected surface, provenance for regressions when traceable, and best-fix verdict.
- For bug/regression fixes, include a compact `Provenance:` line after cause/root-cause when a bounded history pass can identify it. Use `git log -S/-G`, `git blame`, linked PRs/issues, and tests.
- Provenance must separate roles when they differ: blamed code author username, blamed PR author username, blamed PR merger/committer username, automerge trigger when known, current PR author username, PR number, and date. Do not collapse them into one "introduced by" actor.
- If the blamed PR was merged by `clawsweeper[bot]` or another automation, identify the human trigger when practical. Check live PR timeline/comments first; if rate-limited, use gitcrawl/cache or public PR HTML. Look for maintainer command comments such as `@clawsweeper automerge`, `/landpr`, labels/events that armed automerge, and ClawSweeper status comments. Report `automerge triggered by @login`; if not found, say trigger unknown rather than naming the bot as the human decision-maker.
- For any confirmed bug, run `git blame` on the implicated line(s) after identifying the root cause. Report who broke it as the blamed PR merger/committer, and also name the blamed code author. Include the PR number. If no PR is traceable, use the blamed commit as the provenance: commit SHA, date, and author username. Do not guess a merger or frame missing PR metadata as a separate finding.
- For bug/regression fixes, include a compact `Provenance:` line after cause/root-cause when a bounded history pass can identify it. Use `git log -S/-G`, `git blame`, linked PRs/issues, and tests; separate author, committer/merger, and current PR author when they differ.
- Phrase provenance as `introduced by`, `made visible by`, or `carried forward by`, with confidence (`clear`, `likely`, `unknown`). If unclear, say what evidence is missing instead of guessing. For features, docs, and refactors, use `Provenance: N/A` or omit it when no broken behavior is being fixed.
- Keep summaries compact, but include enough proof that the verdict is auditable without rereading the PR.
LOC proof:
```bash
gh pr view <number> --json additions,deletions,changedFiles \
--jq '"LOC: +\(.additions)/-\(.deletions) (\(.changedFiles) files)"'
```
## Read beyond the diff
- Review the surrounding code path, not just changed lines. Open the caller, callee, data contracts, adjacent tests, and owner module.
- Before any verdict, read enough code to fill this map: changed surface, runtime entry point, owner boundary, one caller, one callee, sibling implementations sharing the invariant, adjacent tests, current `main` behavior, and shipped/dependency/Codex contracts when relevant.
- For large-codebase PRs, sample enough related files to understand the runtime boundary before deciding. Default to more code reading when the change touches agents, gateway, plugins, auth, sessions, process, config, or provider/runtime seams.
- Compare the PR against current `origin/main` behavior. Check whether recent main already changed the same surface.
- Dependency-backed behavior: MUST read upstream docs/source/types before judging API use, defaults, output shapes, errors, timeouts, memory behavior, or compatibility. Do not assume dependency contracts from memory or PR text.
- Judge solution quality, not only correctness. Ask whether the PR is the clean owner-boundary fix or a wart/workaround that should be replaced by a small refactor, moved seam, contract change, or deletion of duplicate logic.
- Mention the main files read when the verdict depends on code-path evidence.
- If the user challenges the verdict or asks whether the idea is really good, resume code reading first. Do not defend, soften, or reverse the verdict until the missing caller/callee/sibling/dependency path is checked.
## Best-fix review loop
Every PR review must explicitly answer: "Is this the best fix, or only a plausible fix?"
Before verdict:
1. Reconstruct the bug, feature need, or behavior claim from issue/PR/proof.
2. Trace current behavior from entry point to failure or decision point.
3. Read touched files, callers, callees, owner modules, adjacent tests, and relevant docs.
4. Read sibling surfaces that should share the invariant or could be broken by a one-sided fix.
5. Compare against current `origin/main` and shipped behavior when regression/compat matters.
6. Inspect upstream dependency/Codex source or docs for dependency-backed behavior.
7. Identify at least one alternative fix location or shape, then reject it with evidence.
8. If any required path above is uninspected, keep reading or mark `Remaining uncertainty`; do not call the PR best, blocked, proof-sufficient, or merge-ready.
Review output must include:
- `Best-fix verdict:` best / acceptable mitigation / wrong layer / too narrow / too broad.
- `Alternatives considered:` 1-3 concrete alternatives and why rejected.
- `Code read:` compact list of main files/contracts checked.
- `Remaining uncertainty:` what was not proven.
If the best-fix answer is only "maybe", keep reading or state the missing evidence. Do not call proof sufficient until the best-fix judgment is explicit.
## Enforce the bug-fix evidence bar
@@ -229,7 +192,7 @@ If the best-fix answer is only "maybe", keep reading or state the missing eviden
- Before landing, require:
1. symptom evidence such as a repro, logs, or a failing test
2. a verified root cause in code with file/line
3. blame-backed provenance for regressions when traceable, including blamed PR merger and automerge trigger when known, or commit SHA/date when no PR is traceable
3. provenance for regressions when traceable by bounded git/PR history
4. a fix that touches the implicated code path
5. a regression test when feasible, or explicit manual verification plus a reason no test was added
- If the claim is unsubstantiated or likely wrong, request evidence or changes instead of merging.
@@ -279,12 +242,13 @@ gh search issues --repo openclaw/openclaw --match title,body --limit 50 \
## Follow PR review and landing hygiene
- Never mention release-note bookkeeping in review-only output. It is landing
or release-generation mechanics, not a correctness finding.
- Never mention merge conflicts that are relatively easy to resolve, such as
`CHANGELOG.md` entries, in review-only output. These are landing mechanics,
not correctness findings.
- If bot review conversations exist on your PR, address them and resolve them yourself once fixed.
- Leave a review conversation unresolved only when reviewer or maintainer judgment is still needed.
- Before landing any PR with non-trivial code changes, run `$autoreview` until no accepted/actionable findings remain, unless equivalent manual review already covered it, the change is trivial/docs-only, or the user opts out.
- When an agent is landing or merging a PR targeting `main`, use only the repo-native `scripts/pr` wrapper: run `scripts/pr review-init <PR>`, follow its emitted checkout/guard guidance, initialize and complete review artifacts with `scripts/pr review-artifacts-init <PR>`, validate them with `scripts/pr review-validate-artifacts <PR>`, then run `scripts/pr prepare-run <PR>` and `scripts/pr merge-run <PR>`.
- When landing or merging any PR, follow the global `/landpr` process.
- Use `scripts/committer "<msg>" <file...>` for scoped commits instead of manual `git add` and `git commit`.
- Keep commit messages concise and action-oriented.
- Group related changes; avoid bundling unrelated refactors.


@@ -0,0 +1,234 @@
---
name: openclaw-pre-release-plugin-testing
description: Plan and run pre-release OpenClaw plugin validation across bundled plugins, package artifacts, lifecycle commands, doctor/fix, config round-trip, gateway startup, SDK compatibility, Docker E2E, Package Acceptance, and Testbox proof.
---
# OpenClaw Pre-Release Plugin Testing
Use this skill when the user asks for plugin release confidence, plugin lifecycle
sweeps, package-artifact plugin proof, or "what else should we test before
release?" It complements `openclaw-testing`; use that skill too when choosing
the cheapest safe runner or debugging a failing lane.
## Goal
Prove the plugin system as a product surface, not just as source tests:
- bundled plugin lifecycle: install, inspect, enable, disable, uninstall
- package artifact behavior from a clean `HOME`
- doctor/fix/config validation and idempotence
- config discovery and config round-trip
- status/log visibility and diagnostics
- gateway startup/bootstrap with plugin metadata snapshots
- public SDK compatibility for real external plugins
- live-ish provider/channel probes only when safe credentials exist
## First Checks
From the OpenClaw repo root:
```bash
pnpm docs:list
git status --short --branch
readlink node_modules
pnpm changed:lanes --json
```
In Codex worktrees under `.codex/worktrees`, `node_modules` must be a symlink to
the main OpenClaw checkout. Do not run `pnpm install` there. For broad or
package-heavy proof, use Blacksmith Testbox or GitHub Actions.
## Runner Choice
Prefer this order:
1. **GitHub Package Acceptance** for installable-package product proof.
2. **`ci-build-artifacts-testbox.yml` Testbox** when Docker/package lanes need
seeded `dist`, `dist-runtime`, and package caches.
3. **`ci-check-testbox.yml` Testbox** for source checks, targeted Vitest,
package-boundary checks, or focused Docker lanes.
4. **Local targeted commands only** for small format/static/unit probes.
Avoid long package Docker runs from a stale sparse worktree. If Testbox sync
reports hundreds of changed files or starts deleting package inputs, stop and
warm a fresh box from current `main`, or switch to Package Acceptance.
## Existing Baseline
Run or verify these before inventing new coverage:
```bash
OPENCLAW_TESTBOX=1 pnpm check:changed
pnpm run test:extensions:package-boundary:canary
pnpm run test:extensions:package-boundary:compile
pnpm test:docker:plugins
OPENCLAW_PLUGINS_E2E_CLAWHUB=0 pnpm test:docker:plugins
pnpm test:docker:plugin-update
pnpm test:docker:bundled-channel-deps:fast
```
For full bundled install/uninstall proof, shard the packaged sweep:
```bash
OPENCLAW_BUNDLED_PLUGIN_SWEEP_TOTAL=8 \
OPENCLAW_BUNDLED_PLUGIN_SWEEP_INDEX=<0-7> \
pnpm test:docker:bundled-plugin-install-uninstall
```
Expected current packaged scope: 116 public bundled plugins over shards `0-7`.
Private QA plugins are source-mode only unless a package explicitly includes
them.
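A sequential sketch of the full sweep, assuming all shards run on one machine. `run_bundled_sweep` and the `SWEEP_RUNNER` override are hypothetical names added so the loop can be dry-run; the two environment variables and the pnpm script come from the command above.

```bash
#!/usr/bin/env bash
# Sketch: run every shard of the bundled plugin sweep in order and stop at
# the first failing shard. SWEEP_RUNNER is a hypothetical override.
# Usage: run_bundled_sweep [total-shards]
run_bundled_sweep() {
  local total="${1:-8}" index=0
  while [ "$index" -lt "$total" ]; do
    OPENCLAW_BUNDLED_PLUGIN_SWEEP_TOTAL="$total" \
    OPENCLAW_BUNDLED_PLUGIN_SWEEP_INDEX="$index" \
      ${SWEEP_RUNNER:-pnpm} test:docker:bundled-plugin-install-uninstall || return 1
    index=$((index + 1))
  done
}
```

Running shards in parallel on separate boxes is likely faster; the loop only shows how index and total relate.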
## Confidence Matrix
Use this matrix for pre-release signoff. Record pass/fail, run URL/Testbox ID,
package SHA/version, and skipped-live reason.
| Surface | Proof | Preferred runner |
| --- | --- | --- |
| Package artifact | Package Acceptance `suite_profile=package` or custom lanes | GitHub Actions |
| Bundled lifecycle | 8-shard `test:docker:bundled-plugin-install-uninstall` | Testbox or release Docker |
| External plugins | `test:docker:plugins` and `plugins-offline` | Testbox/package acceptance |
| Update no-op | `test:docker:plugin-update` | Testbox/package acceptance |
| Channel runtime deps | `test:docker:bundled-channel-deps:fast` plus key channels | Testbox/package acceptance |
| Doctor/fix | seeded bad configs + `doctor --fix --non-interactive` | new Docker/Testbox harness |
| Config round-trip | `config set/get`, inspect, doctor, reload, diff hash | new Docker/Testbox harness |
| Gateway bootstrap | clean `HOME`, plugin groups enabled/disabled, status JSON | new Docker/Testbox harness |
| SDK compatibility | directory, tgz, and `file:` external plugins using SDK subpaths | `test:docker:plugins` plus new smoke |
| Live-ish | redacted provider/channel probes only for present env | Testbox live lanes |
## Package Acceptance Plan
Use this when validating a release branch, beta, or candidate package:
```bash
gh workflow run package-acceptance.yml \
--repo openclaw/openclaw \
--ref main \
-f workflow_ref=main \
-f source=ref \
-f package_ref=<branch-or-sha> \
-f suite_profile=custom \
-f docker_lanes='plugins-offline plugin-update bundled-channel-deps-compat doctor-switch update-channel-switch config-reload mcp-channels npm-onboard-channel-agent' \
-f telegram_mode=mock-openai
```
Use `-f source=npm -f package_spec=openclaw@beta` for published beta proof. Keep
`workflow_ref` as trusted current harness code unless the release process says
otherwise.
## New Testbox Harness Plan
If more certainty is needed, add or run a `plugin-lifecycle-matrix` Docker lane
that uses one package tarball and sharded plugin lists. Per plugin:
1. Start with a clean `HOME`.
2. Capture `plugins list --json`.
3. `plugins install <id>`.
4. `plugins inspect <id> --json`.
5. `plugins disable <id>`, then assert disabled visibility.
6. `plugins enable <id>`, except config-required plugins without config.
7. `plugins registry --refresh`.
8. `doctor --non-interactive`.
9. `plugins uninstall <id> --force`.
10. Assert no config entry, allow/deny residue, install record, managed dir, or
bundled `dist/extensions/...` load path remains.
11. Assert diagnostics contain no `level: "error"` and output redacts
secret-looking values.
Keep `memory-lancedb` special: it is config-required. First assert install does
not enable it without embedding config, then run a second configured case.
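The command steps (1-9) of the per-plugin sequence can be sketched as one function. This assumes an `openclaw` binary on `PATH`; `plugin_lifecycle`, `OPENCLAW_BIN`, and `SKIP_ENABLE` are hypothetical names, and the residue and diagnostics assertions from steps 10-11 are left out.

```bash
#!/usr/bin/env bash
# Sketch of the per-plugin lifecycle commands under a throwaway HOME.
# OPENCLAW_BIN and SKIP_ENABLE are illustrative knobs, not repo options.
# Usage: plugin_lifecycle <plugin-id>
plugin_lifecycle() {
  local id="$1" bin="${OPENCLAW_BIN:-openclaw}" home rc=0
  if [ -z "$id" ]; then
    echo "usage: plugin_lifecycle <plugin-id>" >&2
    return 2
  fi
  home="$(mktemp -d)" || return 1
  (
    export HOME="$home"   # clean HOME, scoped to this subshell
    "$bin" plugins list --json > "$home/plugins-before.json" &&
    "$bin" plugins install "$id" &&
    "$bin" plugins inspect "$id" --json > "$home/inspect.json" &&
    "$bin" plugins disable "$id" &&
    # Config-required plugins without config skip the enable step.
    { [ "${SKIP_ENABLE:-0}" = "1" ] || "$bin" plugins enable "$id"; } &&
    "$bin" plugins registry --refresh &&
    "$bin" doctor --non-interactive &&
    "$bin" plugins uninstall "$id" --force
  ) || rc=$?
  rm -rf "$home"
  return "$rc"
}
```

For the first `memory-lancedb` case, something like `SKIP_ENABLE=1 plugin_lifecycle memory-lancedb` would cover install without enable; the configured second case needs config written before the enable step.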
## Doctor/Fix Matrix
Seed bad states and require `doctor --fix --non-interactive` to repair them,
then run doctor again and require idempotence:
- stale `plugins.allow`
- stale `plugins.entries`
- stale channel config for missing channel plugin
- invalid `plugins.entries.<id>.config`
- packaged bundled path in `plugins.load.paths`
- legacy `plugins.installs`
- disabled channel/plugin config that must not stage runtime deps
- root-owned global package tree that must remain unmodified
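The idempotence requirement can be sketched as a generic wrapper: run the repair once, run it again, and require the config file to be byte-identical after the second pass. `assert_idempotent` is a hypothetical helper, and the config path is an assumption the caller must supply.

```bash
#!/usr/bin/env bash
# Hypothetical helper: the first run may repair; the second run must leave
# the config file unchanged. Uses POSIX cksum as a cheap content hash.
# Usage: assert_idempotent <config-file> <command> [args...]
assert_idempotent() {
  local config="$1" before after
  shift
  "$@" || return 1                           # first pass: allowed to repair
  before="$(cksum < "$config")" || return 1
  "$@" || return 1                           # second pass: must be a no-op
  after="$(cksum < "$config")" || return 1
  if [ "$before" != "$after" ]; then
    echo "not idempotent: $config changed on second run" >&2
    return 1
  fi
}

# Example wiring (config path is a placeholder):
#   assert_idempotent "$CONFIG_PATH" openclaw doctor --fix --non-interactive
```

The same before/after hash comparison fits the no-op command check in the config round-trip steps.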
## Gateway Bootstrap Matrix
Start packaged OpenClaw in Docker with clean state:
- provider plugins enabled, no credentials: ready with warnings, no crash
- channel plugins configured disabled: no runtime deps staged
- startup-activation plugins enabled: ready and reflected in status
- invalid single plugin config: bad plugin skipped/quarantined, others remain
Assert:
- gateway reaches ready
- `openclaw status --json` includes plugin diagnostics
- `openclaw plugins inspect --all --json` is parseable
- package tree is not mutated
- logs contain no raw tokens
## Config Round-Trip Representatives
Use representative plugin families instead of every plugin for deep config
round-trip:
- providers: `openai`, `anthropic`, `mistral`, `openrouter`
- channels: `telegram`, `discord`, `slack`, `whatsapp`
- memory: `memory-lancedb`
- feature/runtime: `browser`, `acpx`, `tokenjuice`
For each representative:
1. Write config through CLI when possible.
2. Read it back through `config get` or JSON.
3. Run `plugins inspect`.
4. Run `doctor --non-interactive`.
5. Trigger gateway config reload if applicable.
6. Compare config hash before/after no-op commands.
## External SDK Smoke
In a package Docker lane, create tiny external plugins and install them from:
- local directory
- `.tgz`
- `file:` npm spec
Cover CJS and ESM shapes, plus at least one plugin importing focused
`openclaw/plugin-sdk/*` subpaths. Assert `plugins inspect` sees its tool,
gateway method, CLI command, or service.
## Live-Ish Probe Rules
Before live-ish work, source allowed env in Testbox and generate a redacted
availability matrix: present/missing only, never values.
Only run probes for credentials that exist. Prefer auth/catalog/status probes
over sending user-visible messages. If a probe might contact an external user,
channel, or workspace, stop and ask the user.
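A minimal sketch of the redacted availability matrix, assuming credentials arrive as environment variables. `credential_matrix` is a hypothetical helper, and the variable names passed to it are up to the caller; it prints only `present` or `missing`, never a value.

```bash
#!/usr/bin/env bash
# Sketch: print "NAME: present" or "NAME: missing" for each variable name.
# The value is read only to test for emptiness and is never echoed.
# Usage: credential_matrix VAR_NAME [VAR_NAME...]
credential_matrix() {
  local name value
  for name in "$@"; do
    case "$name" in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        # Reject anything that is not a plain identifier before eval.
        echo "$name: invalid name"
        continue
        ;;
    esac
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name: present"
    else
      echo "$name: missing"
    fi
  done
}
```

Probes can then be gated on the `present` rows, with the `missing` rows recorded as skipped-live reasons.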
## Reporting
Report in this shape:
```text
package/ref:
tbx ids / run urls:
matrix:
bundled lifecycle:
package acceptance:
doctor/fix:
gateway bootstrap:
config round-trip:
sdk external:
live-ish:
failures:
skips:
next highest-value gap:
```
Say clearly when a failure is Testbox sync/env damage rather than product
behavior, and prove that with a clean rerun or current-main comparison.


@@ -0,0 +1,4 @@
interface:
display_name: "OpenClaw Plugin Pre-Release Testing"
short_description: "Plan plugin release validation"
default_prompt: "Use $openclaw-pre-release-plugin-testing to plan or run pre-release OpenClaw plugin validation across package, lifecycle, doctor, gateway, SDK, and live-ish proof."


@@ -13,7 +13,7 @@ Use this skill for `qa-lab` / `qa-channel` work. Repo-local QA only.
- `docs/help/testing.md`
- `docs/channels/qa-channel.md`
- `qa/README.md`
- `qa/scenarios/index.yaml`
- `qa/scenarios/index.md`
- `extensions/qa-lab/src/suite.ts`
- `extensions/qa-lab/src/character-eval.ts`
@@ -198,9 +198,7 @@ pnpm openclaw qa character-eval \
- Judges default to `openai/gpt-5.4,thinking=xhigh,fast` and `anthropic/claude-opus-4-6,thinking=high`.
- Report includes judge ranking, run stats, durations, and full transcripts; do not include raw judge replies. Duration is benchmark context, not a grading signal.
- Candidate and judge concurrency default to 16. Use `--concurrency <n>` and `--judge-concurrency <n>` to override when local gateways or provider limits need a gentler lane.
-- Scenario source is YAML-only under `qa/scenarios/`: use `index.yaml` and
-  per-scenario `*.yaml` files with top-level `title`, `scenario`, and optional
-  `flow`. Never add fenced `qa-scenario` / `qa-flow` Markdown files.
+- Scenario source should stay markdown-driven under `qa/scenarios/`.
- For isolated character/persona evals, write the persona into `SOUL.md` and blank `IDENTITY.md` in the scenario flow. Use `SOUL.md + IDENTITY.md` only when intentionally testing how the normal OpenClaw identity combines with the character.
- Keep prompts natural and task-shaped. The candidate model should receive character setup through `SOUL.md`, then normal user turns such as chat, workspace help, and small file tasks; do not ask "how would you react?" or tell the model it is in an eval.
- Prefer at least one real task, such as creating or editing a tiny workspace artifact, so the transcript captures character under normal tool use instead of pure roleplay.
@@ -236,8 +234,7 @@ pnpm openclaw qa manual \
## Repo facts
-- Seed scenarios live in `qa/scenarios/index.yaml` and
-  `qa/scenarios/<theme>/*.yaml`.
+- Seed scenarios live in `qa/`.
- Main live runner: `extensions/qa-lab/src/suite.ts`
- QA lab server: `extensions/qa-lab/src/lab-server.ts`
- Child gateway harness: `extensions/qa-lab/src/gateway-child.ts`
@@ -265,9 +262,8 @@ pnpm openclaw qa manual \
## When adding scenarios
-- Add or update scenario YAML under `qa/scenarios/`; do not add `.md` scenario
-  files or fenced YAML blocks.
-- Keep kickoff expectations in `qa/scenarios/index.yaml` aligned
+- Add or update scenario markdown under `qa/scenarios/`
+- Keep kickoff expectations in `qa/scenarios/index.md` aligned
- Add executable coverage in `extensions/qa-lab/src/suite.ts`
- Prefer end-to-end assertions over mock-only checks
- Save outputs under `.artifacts/qa-e2e/`

@@ -0,0 +1,93 @@
---
name: openclaw-release-ci
description: "Run, watch, debug, and summarize OpenClaw full release CI, release checks, live provider gates, install/update proofs, and release-secret preflights."
---
# OpenClaw Release CI
Use this with `$openclaw-release-maintainer` and `$openclaw-testing` when a release candidate needs full validation, install/update proof, live provider checks, or CI recovery.
## Guardrails
- No version bump, tag, npm publish, GitHub release, or release promotion without explicit operator approval.
- Validate provider secrets before dispatching expensive full release matrices.
- Do not set GitHub secrets from unvalidated 1Password candidates. If a candidate returns 401/403, leave the existing secret alone and report the exact missing provider.
- Use `$one-password` for secret reads/writes: one persistent tmux session, targeted items only, no secret output.
- Watch one parent run plus compact child summaries. Avoid broad `gh run view` polling loops; REST quota is easy to burn.
- Fetch logs only for failed or currently-blocking jobs. If quota is low, stop polling and wait for reset.
- Treat live-provider flakes separately from code failures: prove key validity, provider HTTP status, retry evidence, and exact failing lane before editing code.
## Preflight
Before full release validation:
```bash
node .agents/skills/openclaw-release-ci/scripts/verify-provider-secrets.mjs --required openai,anthropic,fireworks
gh api rate_limit --jq '.resources.core'
git status --short --branch
git rev-parse HEAD
```
1Password service-account values are the first source for release provider
preflight. Inject those exact targeted keys first, then run the verifier; use
ambient env only when it was already intentionally injected for this release.
The script prints only provider status and HTTP class, never tokens.
## Dispatch
Prefer the trusted workflow on `main` and target the exact release SHA:
```bash
gh workflow run full-release-validation.yml \
--repo openclaw/openclaw \
--ref main \
-f ref=<release-sha> \
-f provider=openai \
-f mode=both \
-f release_profile=full \
-f rerun_group=all
```
The example above passes `release_profile=full`. Use `release_profile=stable` unless the operator explicitly asks for the broad advisory provider/media matrix, and use a narrow `rerun_group` after focused fixes.
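When the dispatch is scripted rather than typed, the same flags can be assembled as an argument array. The sketch below is an illustration only: `buildDispatchArgs` is a hypothetical helper, the 40-character SHA check is an assumption about what "exact release SHA" means, and the default of `stable` follows the guidance in this section rather than the example command.

```javascript
// Hypothetical helper: build the argv for `gh workflow run` using only the
// flags shown in the dispatch example. Defaults to the stable profile.
function buildDispatchArgs({
  sha,
  provider = "openai",
  mode = "both",
  profile = "stable",
  rerunGroup = "all",
}) {
  // Assumption: the release ref is a full SHA-1, not a branch or short hash.
  if (!/^[0-9a-f]{40}$/.test(sha)) {
    throw new Error("expected a full 40-character release SHA");
  }
  return [
    "workflow", "run", "full-release-validation.yml",
    "--repo", "openclaw/openclaw",
    "--ref", "main",
    "-f", `ref=${sha}`,
    "-f", `provider=${provider}`,
    "-f", `mode=${mode}`,
    "-f", `release_profile=${profile}`,
    "-f", `rerun_group=${rerunGroup}`,
  ];
}
```

The resulting array can be handed to `execFileSync("gh", args)` in the same way the summary helper calls `gh`, which avoids shell quoting issues.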
## Watch
Use the summary helper instead of repeated raw polling:
```bash
node .agents/skills/openclaw-release-ci/scripts/release-ci-summary.mjs <full-release-run-id>
```
Then watch only when useful:
```bash
gh run watch <full-release-run-id> --repo openclaw/openclaw --exit-status
```
Stop watchers before ending the turn or switching strategy.
## Failure Triage
1. Confirm parent SHA and child run IDs.
2. List failed jobs only:
```bash
gh run view <child-run-id> --repo openclaw/openclaw --json jobs \
--jq '.jobs[] | select(.conclusion=="failure" or .conclusion=="timed_out" or .conclusion=="cancelled") | [.databaseId,.name,.conclusion,.url] | @tsv'
```
3. Fetch one failed job log. If rate-limited, note reset time and avoid more REST calls.
4. For secret-looking failures, validate the provider endpoint from the same secret source before editing code.
5. For live-cache failures, inspect whether it is missing/invalid key, empty text, provider refusal, timeout, or baseline miss. Do not weaken release gates without clear provider evidence.
6. Fix narrowly, run local/changed proof, commit, push, rerun the smallest matching group.
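For scripted triage, the jq filter in step 2 has a direct JavaScript equivalent. This is a sketch under the assumption that `jobs` is the array returned by `gh run view --json jobs`; `failedJobs` is a hypothetical name, and the field names are the ones the jq expression already uses.

```javascript
// Conclusions that block a release lane, matching the jq filter in step 2.
const BLOCKING_CONCLUSIONS = new Set(["failure", "timed_out", "cancelled"]);

// Return one tab-separated row per blocking job, in the same column order
// as the jq output: databaseId, name, conclusion, url.
function failedJobs(jobs) {
  return jobs
    .filter((job) => BLOCKING_CONCLUSIONS.has(job.conclusion))
    .map((job) => [job.databaseId, job.name, job.conclusion, job.url].join("\t"));
}
```

Keeping the list to blocking jobs means step 3 fetches at most the logs that matter, which is the point of the quota guardrail.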
## Evidence
Record:
- release SHA
- full parent run URL
- child run IDs and conclusions: CI, Release Checks, Plugin Prerelease, NPM Telegram
- targeted local proof commands
- provider-secret preflight result
- known gaps or unrelated failures
For lessons and recovery patterns, read `references/release-ci-notes.md`.

@@ -0,0 +1,4 @@
interface:
  display_name: "OpenClaw Release CI"
  short_description: "Verify and debug OpenClaw release validation runs"
  default_prompt: "Use $openclaw-release-ci to preflight provider secrets, watch full release validation, summarize child runs, and triage only failing release lanes."

@@ -0,0 +1,79 @@
#!/usr/bin/env node
// Print a compact summary of a full release validation run and its child runs.
import { execFileSync } from "node:child_process";
import process from "node:process";

const runId = process.argv[2];
const repo = process.env.OPENCLAW_RELEASE_REPO || "openclaw/openclaw";

if (!runId) {
  console.error("usage: release-ci-summary.mjs <full-release-run-id>");
  process.exit(2);
}

// Run `gh` and return its stdout; throws when `gh` exits non-zero.
function gh(args) {
  return execFileSync("gh", args, {
    encoding: "utf8",
    stdio: ["ignore", "pipe", "pipe"],
  });
}

function jsonGh(args) {
  return JSON.parse(gh(args));
}

// Core REST quota, or undefined when the lookup itself fails.
function rate() {
  try {
    return jsonGh(["api", "rate_limit"]).resources.core;
  } catch {
    return undefined;
  }
}

// Refuse to poll when the remaining quota is nearly exhausted.
const core = rate();
if (core) {
  const reset = new Date(core.reset * 1000).toISOString();
  console.log(`rate: remaining=${core.remaining}/${core.limit} reset=${reset}`);
  if (core.remaining < 20) {
    console.error("rate too low for CI summary; wait for reset before polling");
    process.exit(3);
  }
}

const parent = jsonGh([
  "run",
  "view",
  runId,
  "--repo",
  repo,
  "--json",
  "status,conclusion,createdAt,headSha,url,jobs",
]);

console.log(`parent: ${runId} ${parent.status}/${parent.conclusion || "none"}`);
console.log(`sha: ${parent.headSha}`);
console.log(`url: ${parent.url}`);
for (const job of parent.jobs ?? []) {
  const marker = job.conclusion || job.status;
  console.log(`parent-job: ${marker} ${job.name}`);
}

// List release-related runs created at or after the parent run started.
const since = parent.createdAt;
const runList = gh([
  "api",
  `repos/${repo}/actions/runs?per_page=100`,
  "--jq",
  `.workflow_runs[] | select(.created_at >= "${since}") | select(.name=="CI" or .name=="OpenClaw Release Checks" or .name=="Plugin Prerelease" or .name=="NPM Telegram Beta E2E" or .name=="Full Release Validation") | [.id,.name,.status,.conclusion,.head_sha,.html_url] | @tsv`,
]).trim();

if (!runList) {
  console.log("children: none found yet");
  process.exit(0);
}

console.log("children:");
for (const line of runList.split("\n")) {
  const [id, name, status, conclusion, sha, url] = line.split("\t");
  console.log(`child: ${id} ${name} ${status}/${conclusion || "none"} sha=${sha}`);
  console.log(`child-url: ${url}`);
}

@@ -0,0 +1,113 @@
#!/usr/bin/env node
// Preflight check for release provider secrets. Prints provider status only, never tokens.
import process from "node:process";

// Parse `--key value` and `--key=value` flags into a map.
const args = new Map();
for (let index = 2; index < process.argv.length; index += 1) {
  const arg = process.argv[index];
  if (!arg.startsWith("--")) continue;
  // Split on the first "=" only so inline values that contain "=" stay whole.
  const body = arg.slice(2);
  const separator = body.indexOf("=");
  const key = separator === -1 ? body : body.slice(0, separator);
  const inlineValue = separator === -1 ? undefined : body.slice(separator + 1);
  const value = inlineValue ?? process.argv[index + 1];
  if (inlineValue === undefined) index += 1;
  args.set(key, value);
}

const requiredInput = String(args.get("required") ?? "openai,anthropic").trim();
const required = new Set(
  (requiredInput.toLowerCase() === "none" ? "" : requiredInput)
    .split(",")
    .map((entry) => entry.trim().toLowerCase())
    .filter(Boolean),
);

// Fall back to 10s when --timeout-ms is not a positive number; a NaN delay
// would abort every request almost immediately.
const timeoutInput = Number(args.get("timeout-ms") ?? 10_000);
const timeoutMs = Number.isFinite(timeoutInput) && timeoutInput > 0 ? timeoutInput : 10_000;

// Return the first non-empty env var from `names`, or undefined.
function envFirst(names) {
  for (const name of names) {
    const value = process.env[name]?.trim();
    if (value) return { name, value };
  }
  return undefined;
}

// Probe one provider's model-list endpoint and report status without the token.
async function checkProvider(id, config) {
  const secret = envFirst(config.env);
  if (!secret) {
    return { id, ok: false, status: "missing", env: config.env.join("|") };
  }
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const headers = config.headers(secret.value);
    const response = await fetch(config.url, {
      headers,
      signal: controller.signal,
    });
    return {
      id,
      ok: response.ok,
      status: response.ok ? "ok" : `http_${response.status}`,
      env: secret.name,
    };
  } catch (error) {
    return {
      id,
      ok: false,
      status: error?.name === "AbortError" ? "timeout" : "error",
      env: secret.name,
    };
  } finally {
    clearTimeout(timer);
  }
}

const providers = {
  openai: {
    env: ["OPENAI_API_KEY"],
    url: "https://api.openai.com/v1/models",
    headers: (token) => ({ authorization: `Bearer ${token}` }),
  },
  anthropic: {
    env: ["ANTHROPIC_API_KEY", "ANTHROPIC_API_TOKEN"],
    url: "https://api.anthropic.com/v1/models",
    headers: (token) => ({
      "anthropic-version": "2023-06-01",
      "x-api-key": token,
    }),
  },
  fireworks: {
    env: ["FIREWORKS_API_KEY"],
    url: "https://api.fireworks.ai/inference/v1/models",
    headers: (token) => ({ authorization: `Bearer ${token}` }),
  },
  openrouter: {
    env: ["OPENROUTER_API_KEY"],
    url: "https://openrouter.ai/api/v1/models",
    headers: (token) => ({ authorization: `Bearer ${token}` }),
  },
};

const unknown = [...required].filter((id) => !providers[id]);
if (unknown.length > 0) {
  console.error(`unknown providers: ${unknown.join(",")}`);
  process.exit(2);
}

// Check every required provider, plus optional ones whose key is present.
const results = [];
for (const id of Object.keys(providers)) {
  if (required.has(id) || envFirst(providers[id].env)) {
    results.push(await checkProvider(id, providers[id]));
  }
}

// Only required providers can fail the preflight.
let failed = false;
for (const result of results) {
  const requiredLabel = required.has(result.id) ? "required" : "optional";
  console.log(`${result.id}: ${result.status} env=${result.env} ${requiredLabel}`);
  if (required.has(result.id) && !result.ok) failed = true;
}

if (failed) {
  console.error("release provider secret preflight failed");
  process.exit(1);
}

@@ -0,0 +1,632 @@
---
name: openclaw-release-maintainer
description: Prepare or verify OpenClaw stable/beta releases, changelogs, release notes, publish commands, and artifacts.
---
# OpenClaw Release Maintainer
Use this skill for release and publish-time workflow. Keep ordinary development changes and GHSA-specific advisory work outside this skill.
## Respect release guardrails
- Do not change version numbers without explicit operator approval.
- Ask permission before any npm publish or release step.
- This skill should be sufficient to drive the normal release flow end-to-end.
- Use the private maintainer release docs for credentials, recovery steps, and mac signing/notary specifics, and use `docs/reference/RELEASING.md` for public policy.
- Core `openclaw` publish is manual `workflow_dispatch`; creating or pushing a tag does not publish by itself.
- Normal release work happens on a branch cut from `main`, not directly on
`main`. Use `release/YYYY.M.D` for the branch name.
- If the operator asks for a release without saying stable/full, default to
beta only. Continue from beta to stable only when the operator explicitly asks
for the full release or an automated beta-and-stable train.
- Before release branching, pull latest `main` and confirm current `main` CI is
green. Then branch from that commit so regular development can continue on
`main` while release validation runs.
- Before release branching, commit any dirty files in coherent groups, push,
pull/rebase, then run `/changelog` on `main` and commit/push/pull that
changelog rewrite immediately before creating the release branch.
- During release planning, inspect both `src/plugins/compat/registry.ts` and
`src/commands/doctor/shared/deprecation-compat.ts` before branching and again
before final publish. For every deprecated or removal-pending compatibility
record whose `removeAfter` date is on or before the release date, either
remove the compatibility path where safe and validate the affected tests, or
write down why removal is blocked and get explicit maintainer approval before
shipping the expired compatibility path.
- When removing deprecated runtime/config compatibility, preserve any doctor
migration, repair, or hint that is still needed by supported upgrade paths.
Doctor-side compatibility should stay tracked in
`src/commands/doctor/shared/deprecation-compat.ts` until maintainers confirm
the repair is no longer needed.
- Revalidate compatibility replacement text during release planning. The
recommended replacement can shift as plugin ownership, externalization, and
config footprint move, so do not blindly copy stale replacement annotations
into release notes.
- Do not delete or rewrite beta tags after their matching npm package has been
published. If a pushed beta tag fails before npm publish, the version is not
consumed: keep the same `-beta.N`, delete/recreate or force-move the git tag
and prerelease to the fixed commit, and rerun preflight. Do not increment to
the next beta number until the matching npm package has actually published.
If a published beta needs a fix, commit the fix on the release branch and
increment to the next `-beta.N`.
- For a beta release train, run the fast local preflight first, publish the
beta to npm `beta`, then run the expensive published-package roster focused
on install/update/Docker/Parallels/NPM Telegram. If anything fails, fix it on
the release branch, commit/push/pull, increment the beta number, and repeat. Run
the full expensive roster at least once before stable/latest promotion; for
later beta attempts, rerun only lanes whose evidence changed unless the fix
touches broad release, install/update, plugin, Docker, Parallels, or live QA
behavior. After each beta is published, scan current `main` once for critical
fixes that landed after the release branch cut and backport only important
low-risk fixes. Operators may authorize up to 4 autonomous beta attempts;
after 4 failed beta attempts, stop and report.
- Use `/changelog` before version/tag preparation so the top changelog section
is deduped and ordered by user impact.
- Do not create beta-specific `CHANGELOG.md` headings. Beta releases use the
stable base version section, for example `v2026.4.20-beta.1` uses
`## 2026.4.20` release notes.
- When any beta or stable release is live, make a best-effort Discord
announcement using the configured secret workflow; do not block or roll back
the release if the announcement fails.
- When asked to announce on X, use `~/Projects/bird/bird` and follow the
release tweet style below.
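The compatibility review in the guardrails above comes down to date comparisons, which can be sketched as follows. This is an illustration under stated assumptions, not the registry's API: records are assumed to carry a `code` plus ISO `YYYY-MM-DD` strings in `removeAfter` and `warningStarts`, and `classifyCompat` is a hypothetical name. The 7-day window is the one this skill uses for the `Upcoming deprecations` release note; treating a date exactly 7 days out as inside the window is an assumption.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed record shape: { code, removeAfter?, warningStarts? } with ISO dates.
// The real registry entries may be shaped differently.
function classifyCompat(records, releaseDate, windowDays = 7) {
  const release = Date.parse(releaseDate);
  const windowEnd = release + windowDays * DAY_MS;
  const expired = [];
  const upcoming = [];
  for (const record of records) {
    const removeAfter = record.removeAfter ? Date.parse(record.removeAfter) : NaN;
    const warningStarts = record.warningStarts ? Date.parse(record.warningStarts) : NaN;
    if (removeAfter <= release) {
      // Due on or before the release date: remove it, or document why
      // removal is blocked and get explicit maintainer approval.
      expired.push(record.code);
    } else if ([removeAfter, warningStarts].some((t) => t > release && t <= windowEnd)) {
      // Lands shortly after the release: candidate for the release notes.
      upcoming.push(record.code);
    }
  }
  return { expired, upcoming };
}
```

Running this once before branching and again before final publish matches the two inspection points the guardrail asks for.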
## Keep release channel naming aligned
- `stable`: tagged releases only, published to npm `beta` by default; operators may target npm `latest` explicitly or promote later
- `beta`: prerelease tags like `vYYYY.M.D-beta.N`, with npm dist-tag `beta`
- Prefer `-beta.N`; do not mint new `-1` or `-2` beta suffixes
- `dev`: moving head on `main`
- When using a beta Git tag, publish npm with the matching beta version suffix so the plain version is not consumed or blocked
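The tag shapes named in this skill (`vYYYY.M.D`, `vYYYY.M.D-beta.N`, and the fallback correction form `vYYYY.M.D-N`) can be told apart with one regular expression. The sketch below is a hypothetical helper, not repo code: `parseReleaseTag` and its `kind` labels are made up for illustration, and reusing the base changelog heading for correction tags is an assumption (the text only states it for betas).

```javascript
// Hypothetical parser for the tag shapes described in this skill.
function parseReleaseTag(tag) {
  const match = /^v(\d{4}\.\d{1,2}\.\d{1,2})(?:-(beta\.(\d+)|(\d+)))?$/.exec(tag);
  if (!match) return undefined;
  const [, base, suffix, betaNumber, correctionNumber] = match;
  let kind = "stable";
  if (betaNumber !== undefined) kind = "beta";
  else if (correctionNumber !== undefined) kind = "correction";
  return {
    kind,
    // Repo version locations stay at the base version for every kind.
    base,
    // The npm version keeps the suffix, so a beta never consumes the plain version.
    npmVersion: suffix ? `${base}-${suffix}` : base,
    // Betas use the stable base section; assumed to hold for corrections too.
    changelogHeading: `## ${base}`,
  };
}
```

For example, `parseReleaseTag("v2026.4.20-beta.1")` yields the `## 2026.4.20` heading, which is the section the beta release notes are taken from.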
## Handle versions and release files consistently
- Version locations include:
  - `package.json`
  - `apps/android/app/build.gradle.kts`
  - `apps/ios/Sources/Info.plist`
  - `apps/ios/Tests/Info.plist`
  - `apps/macos/Sources/OpenClaw/Resources/Info.plist`
  - `docs/install/updating.md`
  - Peekaboo Xcode project and plist version fields
- Before creating a release tag, make every version location above match the version encoded by that tag.
- For fallback correction tags like `vYYYY.M.D-N`, the repo version locations still stay at `YYYY.M.D`.
- “Bump version everywhere” means all version locations above except `appcast.xml`.
- Release signing and notary credentials live outside the repo in the private maintainer docs.
- Every stable OpenClaw release ships the npm package and macOS app together.
Beta releases normally ship npm/package artifacts first and skip mac app
build/sign/notarize unless the operator requests mac beta validation.
- Do not let the slower macOS signing/notary path block npm publication once
the npm preflight has passed. Keep mac validation/publish running in
parallel, publish npm from the successful npm preflight, then start published
npm install/update, Docker, and Parallels verification while mac artifacts
continue.
- After a beta is published, overlap remote/manual release rosters where useful,
but avoid piling local Docker, Parallels, and QA-Lab work onto the same host
when it would create system-load noise. Use selective reruns after failures or
fixes, but keep proof that Docker, Parallels, and QA-Lab each passed at least
once before stable/latest promotion.
- Mac packaging may be built from a slight release-branch variation of the
tagged commit when the delta is mac packaging, signing, workflow, or
validation-only release machinery. If mac packaging needs release-branch-only
fixes after the stable npm package or GitHub tag is already published, do not
create a `vYYYY.M.D-N` correction tag just to change the workflow source.
Dispatch the private mac workflows for the original `tag=vYYYY.M.D` with
`source_ref=release/YYYY.M.D` and `public_release_branch=release/YYYY.M.D`;
provenance checks must prove the source SHA descends from the tag and
validation/preflight use the same source. Reserve `vYYYY.M.D-N` correction
tags for emergency hotfixes that must publish a new npm package/release
identity, not for ordinary mac-only packaging recovery.
- The production Sparkle feed lives at `https://raw.githubusercontent.com/openclaw/openclaw/main/appcast.xml`, and the canonical published file is `appcast.xml` on `main` in the `openclaw` repo.
- That shared production Sparkle feed is stable-only. Beta mac releases may
upload assets to the GitHub prerelease, but they must not replace the shared
`appcast.xml` unless a separate beta feed exists.
- For fallback correction tags like `vYYYY.M.D-N`, the repo version still stays
at `YYYY.M.D`, but the mac release must use a strictly higher numeric
`APP_BUILD` / Sparkle build than the original release so existing installs
see it as newer.
## Build changelog-backed release notes
- Before release branching or tagging, rewrite the target `CHANGELOG.md`
section from commit history, not just from existing notes: scan commits since
the last reachable release tag, add missed user-facing changes, dedupe
overlapping entries, and sort each section from most to least interesting for
users.
- Changelog entries should be user-facing, not internal release-process notes.
- GitHub release and prerelease bodies must use the full matching
`CHANGELOG.md` version section, not highlights or an excerpt. When creating
or editing a release, extract from `## YYYY.M.D` through the line before the
next level-2 heading and use that complete block as the release notes.
- When preparing release notes, scan `src/plugins/compat/registry.ts` and
`src/commands/doctor/shared/deprecation-compat.ts` for compatibility records
with `warningStarts` or `removeAfter` within 7 days after the release date.
Add an `Upcoming deprecations` note to the release notes when any exist,
including the compatibility code, target date, replacement, and a link to the
record's `docsPath` or `/plugins/compatibility` when no more specific
deprecation page exists.
- When cutting a mac release with a beta GitHub prerelease:
  - tag `vYYYY.M.D-beta.N` from the release commit
  - create a prerelease titled `openclaw YYYY.M.D-beta.N`
  - use release notes from the stable base `CHANGELOG.md` version section
    (`## YYYY.M.D`), not a beta-specific heading
  - attach at least the zip and dSYM zip, plus dmg if available
- Keep the top version entries in `CHANGELOG.md` sorted by impact:
  - `### Changes` first
  - `### Fixes` deduped with user-facing fixes first
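The rule that release bodies use the full `CHANGELOG.md` version section, from `## YYYY.M.D` through the line before the next level-2 heading, can be sketched as a small extractor. `extractChangelogSection` is a hypothetical name, and the sketch assumes the heading line is exactly `## <version>` with nothing after it; a changelog that appends dates to headings, or contains `## ` lines inside fenced code, would need a more careful match.

```javascript
// Return the complete section for `version`, or undefined when it is absent.
function extractChangelogSection(changelog, version) {
  const lines = changelog.split(/\r?\n/);
  // Assumption: the heading is exactly "## <version>".
  const start = lines.findIndex((line) => line.trim() === `## ${version}`);
  if (start === -1) return undefined;
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i += 1) {
    // "## " matches level-2 headings only; "### Changes" does not match.
    if (/^## /.test(lines[i])) {
      end = i;
      break;
    }
  }
  return lines.slice(start, end).join("\n").trimEnd();
}
```

For a beta, pass the stable base version (for example `2026.4.20` for `v2026.4.20-beta.1`), since beta-specific headings are not created.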
## Write release tweets
Use the OpenClaw account's existing release-post style:
- Format: `OpenClaw YYYY.M.D 🦞` or `🦞 OpenClaw YYYY.M.D is live`, blank line,
then 3-4 emoji-led bullets, blank line, one short punchline, then the release
link.
- For beta: say `OpenClaw YYYY.M.D-beta.N 🦞` or `OpenClaw YYYY.M.D beta N is
live`; keep it clearly beta and avoid implying stable promotion.
- Lead with user-visible capabilities, then important integrations, then
reliability/security/install fixes. Compress "lots of fixes" into one
readable bullet.
- Read the full changelog section before drafting. Do not lead with coverage,
CI, validation, or internal release mechanics unless the release is explicitly
about those. Peter prefers concrete user wins: features, integrations,
workflow improvements, and practical reliability fixes.
- Do not feature QA parity, test coverage, release gates, or validation lanes in
user-facing launch tweets. Keep them for release notes or maintainer proof
unless the operator explicitly asks for validation-focused copy.
- Do not feature plugin-author or developer tooling such as SDK helpers,
tool-plugin scaffolding, build/validate/init commands, or internal CLI
plumbing in general user-facing launch tweets unless the operator explicitly
asks for developer-focused copy.
- Tone: high-signal, slightly cheeky, confident, not corporate. One joke is
enough. Avoid punching down, insulting users, or promising what was not
verified.
- Peter likes dry, compact taglines when they feel earned. Good example:
`Big release, tiny release notes... kidding.` Keep the joke short and let the
feature bullets carry the tweet; do not turn the punchline into a second
paragraph or a forced bit.
- Length: release tweets are always standard tweets under 280 characters, with
room for one URL. Trim to 3-4 bullets and count the final text before posting.
- Links/media: include the GitHub release or changelog link at the end of the
first release tweet.
- Thread follow-ups: if doing a thread, keep the first release tweet as the
compact launch post, then publish one focused feature explainer per reply.
Follow-up replies should not repeat "new in VERSION" or the version number
when the thread context already makes it obvious.
- Peter's preferred thread workflow: first agree on the generic launch tweet,
then proceed through follow-up tweets one by one. When he says `next`, provide
or copy the next follow-up only; do not dump the full thread again unless asked.
- Every follow-up tweet should include a docs URL for that specific feature.
Prefer a bare URL over `Docs: <url>` unless the label is needed for clarity.
Keep follow-ups concise: around 160-220 raw characters is usually the sweet
spot; under 280 is the hard cap. If a URL makes a tweet fail, trim prose
before dropping the URL.
Prefer explaining diagnostics, trajectory/export, provider setup, model
commands, or other setup-heavy features in follow-ups instead of overloading
the first release tweet.
- Hotfix/correction: be direct and accountable. State what slipped, what is
fixed, and the new version. Keep jokes out of incident-style posts.
Examples to adapt:
```text
OpenClaw 2026.4.20-beta.1 🦞
🐳 Docker install/update smoke
🖥️ Parallels upgrade checks
🔧 Package verification tightened
Beta first. Stable after the gauntlet.
<release link>
```
```text
OpenClaw 2026.4.20 🦞
🚀 Faster install + update
🐳 Docker + Parallels verified
🍎 macOS signed + notarized
🔧 Channel/plugin fixes
Good boring release. Best kind.
<release link>
```
```text
Packaging issue in 2026.4.20-beta.1.
2026.4.20-beta.2 fixes install/update verification. No tag rewrites; beta moves
forward.
Upgrade with the beta channel.
<release link>
```
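Because the length rule asks for the final text to be counted before posting, a rough checker can help. This is a sketch with loud assumptions: it counts each URL as 23 characters (X's link wrapping) and everything else by code point, while X's own weighted counting can charge more for emoji, so the result is a lower bound rather than the platform's exact count. `tweetLength` and `fitsInTweet` are hypothetical names.

```javascript
const TWEET_LIMIT = 280; // the skill asks for "under 280"
const URL_WEIGHT = 23; // assumption: every link is counted at a fixed length

// Approximate length: URLs at a fixed weight, the rest by Unicode code point.
function tweetLength(text) {
  const urlPattern = /https?:\/\/\S+/g;
  const urls = text.match(urlPattern) ?? [];
  const withoutUrls = text.replace(urlPattern, "");
  return [...withoutUrls].length + urls.length * URL_WEIGHT;
}

function fitsInTweet(text) {
  return tweetLength(text) < TWEET_LIMIT;
}
```

If a draft fails the check, the guidance above applies: trim prose or bullets before dropping the release link.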
## Run publish-time validation
Before tagging or publishing, run:
```bash
pnpm check:architecture
pnpm build
pnpm ui:build
pnpm qa:otel:smoke
pnpm release:check
pnpm test:install:smoke
```
- Use `pnpm qa:otel:smoke` when release validation needs telemetry coverage.
It starts a local OTLP/HTTP trace receiver, runs QA-lab's
`otel-trace-smoke`, and checks span names plus content/identifier redaction
without external Opik or Langfuse credentials.
To skip the non-root installer smoke path:
```bash
OPENCLAW_INSTALL_SMOKE_SKIP_NONROOT=1 pnpm test:install:smoke
```
After npm publish, run:
```bash
node --import tsx scripts/openclaw-npm-postpublish-verify.ts <published-version>
```
- This verifies the published registry install path in a fresh temp prefix.
- For stable correction releases like `YYYY.M.D-N`, it also verifies the
upgrade path from `YYYY.M.D` to `YYYY.M.D-N` so a correction publish cannot
silently leave existing global installs on the old base stable payload.
- Treat install smoke as a pack-budget gate too. `pnpm test:install:smoke`
now fails the candidate update tarball when npm reports an oversized
`unpackedSize`, so release-time e2e cannot miss pack bloat that would risk
low-memory install/startup failures.
- Keep direct npm global coverage enabled in install smoke. It exercises plain
`npm install -g <candidate>` fresh installs and npm-driven update installs,
because many users install with npm even when docs prefer pnpm.
- Use `pnpm test:live:media video` for bounded video-provider smoke when video
generation is in release scope. The default video smoke skips `fal`, runs one
text-to-video attempt per provider with a one-second lobster prompt, and caps
each provider operation with `OPENCLAW_LIVE_VIDEO_GENERATION_TIMEOUT_MS`
(`180000` by default).
- Run `pnpm test:live:media video --video-providers fal` only when FAL-specific
proof is required. Its queue latency can dominate release time.
- Set `OPENCLAW_LIVE_VIDEO_GENERATION_FULL_MODES=1` only when intentionally
validating the slower image-to-video and video-to-video transform lanes.
## Check all relevant release builds
- Always validate the OpenClaw npm release path before creating the tag.
- Use the configured secret workflow before live release validation so OpenAI
and Anthropic credentials are available without printing secrets.
- Parallels validation and any local live model QA for this train must use both
`OPENAI_API_KEY` and `ANTHROPIC_API_KEY`. If either cannot be injected, stop
before starting those local long lanes and report the missing key.
- Live credentialed channel QA is the GitHub Actions workflow
`QA-Lab - All Lanes` (`.github/workflows/qa-live-telegram-convex.yml`), not a
local substitute. Dispatch it from Actions against the release tag and wait
for it to pass before npm preflight/publish readiness. Use a SHA only when it
satisfies the workflow's secret-bearing trust gate: main ancestor or open PR
head. It runs the QA Lab mock parity gate plus live Matrix and live Telegram
lanes using the `qa-live-shared` environment; Telegram uses Convex CI
credential leases.
- Default release checks:
  - `pnpm check`
  - `pnpm check:test-types`
  - `pnpm check:architecture`
  - `pnpm build`
  - `pnpm ui:build`
  - `pnpm release:check`
  - `OPENCLAW_INSTALL_SMOKE_SKIP_NONROOT=1 pnpm test:install:smoke`
- Full pre-npm beta test roster:
- default release checks above
- all Docker tests: `pnpm test:docker:all`, plus standalone Docker live lanes
not covered by the aggregate when operator says "all docker tests":
`pnpm test:docker:live-acp-bind`, `pnpm test:docker:live-cli-backend`, and
`pnpm test:docker:live-codex-harness`
- all Parallels install/update tests:
`pnpm test:parallels:npm-update -- --json` plus any needed individual
rerun lanes from `openclaw-parallels-smoke`
- all QA release validation: dispatch GitHub Actions > `QA-Lab - All Lanes`
against the release tag and require success. This is the release gate for
live credentialed Matrix/Telegram channel coverage. Use a SHA only when it
satisfies the workflow trust gate. Run local OpenAI/Anthropic suites or
repo-backed character evals only when the operator asks for extra model
coverage or a failure needs local debugging.
- Post-published beta verification roster:
- `node --import tsx scripts/openclaw-npm-postpublish-verify.ts <beta-version>`
- install/update smoke against the published beta channel
- Docker install/update coverage that exercises the published beta package
- published npm Telegram proof: dispatch Actions > `NPM Telegram Beta E2E`
from `main` with `package_spec=openclaw@<beta-version>` and
`provider_mode=mock-openai`, and require success. This workflow is
maintainer-dispatched and intentionally has no `npm-release` approval gate;
`qa-live-shared` only supplies the shared QA secrets. This is the default
button path for installed-package onboarding, Telegram setup, and real
Telegram E2E against the published npm package.
Use the local `pnpm test:docker:npm-telegram-live` lane with the matching
`OPENCLAW_NPM_TELEGRAM_PACKAGE_SPEC` and Convex CI env only as a fallback
or debugging path.
- Parallels published beta install/update coverage with both OpenAI and
Anthropic provider keys available
- Parallels install/update proof must keep plugin installs enabled unless the
operator explicitly scopes a harness-only isolation check; a lane that
disables bundled plugin installs is not valid plugin/dependency release
evidence.
- targeted QA reruns only for areas touched by fixes after the full pre-npm
roster, unless the operator requests the full QA roster again. If the fix
touches live channel QA, credential plumbing, Matrix, Telegram, or the QA
harness, rerun Actions > `QA-Lab - All Lanes`.
- Check all release-related build surfaces touched by the release, not only the npm package.
- For beta-style full e2e batteries, hard-cap top-level long lanes instead of letting them run indefinitely. Use host `timeout --foreground`/`gtimeout --foreground` caps such as:
  - `45m` for `OPENCLAW_INSTALL_SMOKE_SKIP_NONROOT=1 pnpm test:install:smoke`
  - `90m` for `pnpm test:docker:all`
  - `60m` each for standalone Docker live lanes
  - `180m` for local full QA live OpenAI + Anthropic rosters when explicitly
    requested; the default release channel QA gate is Actions >
    `QA-Lab - All Lanes`
  - Parallels caps from the `openclaw-parallels-smoke` skill
  If a lane hits its cap, stop and inspect/fix the affected lane before continuing; do not continue to wait on the same process.
- Actual npm install/update phases are capped at 5 minutes. If `npm install -g`, installer package install, or `openclaw update` takes longer than 300s in release e2e, stop treating the run as healthy progress and debug the installer/updater or harness.
- Serialize host build/package mutations ahead of VM lanes. Finish `pnpm build`, `pnpm ui:build`, `pnpm release:check`, install smoke, and any Docker/package-prep lanes before starting Parallels `npm pack` lanes; otherwise `dist` can disappear during VM pack prep and produce false failures.
- Include mac release readiness in preflight by running the public validation
workflow in `openclaw/openclaw` and the real mac preflight in
`openclaw/releases-private` for every release.
- Treat the `appcast.xml` update on `main` as part of mac release readiness, not an optional follow-up.
- The workflows remain tag-based. The agent is responsible for making sure
preflight runs complete successfully before any publish run starts.
- Any fix after preflight means a new commit. Delete and recreate the tag and
matching GitHub release from the fixed commit, then rerun preflight from
scratch before publishing.
Exception: never delete or recreate a beta tag whose matching npm package has
already been published; increment to the next beta number instead. If only the
pushed tag/prerelease exists and npm publish has not happened, recreate that
same beta tag at the fixed commit.
- For stable mac releases, generate the signed `appcast.xml` before uploading
public release assets so the updater feed cannot lag the published binaries.
- Serialize stable appcast-producing runs across tags so two releases do not
generate replacement `appcast.xml` files from the same stale seed.
- For stable releases, rely primarily on the confidence already established by
the latest beta's broader release workflows. When promoting the matching non-beta build to npm
`latest`, prefer a light time-bounded verification pass: published npm
postpublish verify, Docker install/update smoke, macOS-only Parallels
install/update smoke, and required QA signal. Do not rerun the full
Docker/Parallels matrix unless the beta evidence is stale, the stable build
differs materially from beta, or the operator explicitly asks for full
retesting.
- If any required build, packaging step, or release workflow is red, do not say the release is ready.
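The lane caps above could be applied with a small wrapper. This is a minimal sketch, not part of the repo: `run_capped` is a hypothetical helper name, and it assumes GNU `timeout` (or `gtimeout` from coreutils on macOS) is on `PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): run one lane under a hard wall-clock cap.
# Assumes GNU coreutils `timeout`, or `gtimeout` on macOS.
run_capped() {
  local cap="$1"
  shift
  local timeout_bin
  if command -v timeout >/dev/null 2>&1; then
    timeout_bin=timeout
  elif command -v gtimeout >/dev/null 2>&1; then
    timeout_bin=gtimeout
  else
    echo "run_capped: neither timeout nor gtimeout found" >&2
    return 127
  fi
  local rc=0
  "$timeout_bin" --foreground "$cap" "$@" || rc=$?
  # GNU timeout exits with 124 when the command was killed by the cap.
  if [ "$rc" -eq 124 ]; then
    echo "run_capped: lane hit its ${cap} cap; stop and inspect it" >&2
  fi
  return "$rc"
}
```

Example use with the caps listed above: `run_capped 45m env OPENCLAW_INSTALL_SMOKE_SKIP_NONROOT=1 pnpm test:install:smoke` or `run_capped 90m pnpm test:docker:all`.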
## Use the right auth flow
- OpenClaw npm publish uses npm trusted publishing through GitHub Actions OIDC.
- Stable npm promotion from `beta` to `latest` uses the private
`openclaw/releases-private/.github/workflows/openclaw-npm-dist-tags.yml`
workflow because `npm dist-tag` management needs `NPM_TOKEN`, while the
public npm release workflow stays OIDC-only.
- Prefer fixing the private workflow token path over any local 1Password
fallback. The desired setup is a granular npm token stored as the private
repo's `NPM_TOKEN` secret, scoped to the `openclaw` package with read/write
and 2FA bypass for automation.
- If the private dist-tag workflow cannot promote because `NPM_TOKEN` is absent
or stale, use the local tmux + 1Password fallback:
- Start or reuse a tmux session so interactive `npm login` and OTP prompts
are observable and recoverable.
- Hard rule: never run `op` directly in the main agent shell during release
work. Any 1Password CLI use must happen inside that tmux session so prompts
and alerts are contained and observable.
- Use the 1Password item `op://Private/Npmjs` for npm credentials and OTP.
Do not print passwords, tokens, or OTPs to the transcript; send them through
tmux buffers, env vars scoped to the tmux command, or `expect` with
`log_user 0`.
- Re-authenticate npm inside that tmux session with
`npm login --auth-type=legacy`, then confirm `npm whoami` reports
`steipete`.
- Promote with a fresh OTP:
`npm dist-tag add openclaw@YYYY.M.D latest --otp "$OTP"`.
- Verify with a cache-bypassed registry read, for example:
`npm view openclaw dist-tags --json --prefer-online --cache /tmp/openclaw-npm-cache-verify-$$`
and `npm view openclaw@latest version dist.tarball --json --prefer-online`.
- Direct stable publishes can also use that private dist-tag workflow to point
`beta` at the already-published `latest` version when the operator wants both
tags aligned immediately.
- The publish run must be started manually with `workflow_dispatch`.
- The npm workflow and the private mac publish workflow accept
`preflight_only=true` to run validation/build/package steps without uploading
public release assets.
- Real npm publish requires a prior successful npm preflight run id so the
publish job promotes the prepared tarball instead of rebuilding it.
- Real private mac publish requires a prior successful private mac preflight
run id so the publish job promotes the prepared artifacts instead of
rebuilding or renotarizing them.
- The private mac workflow also accepts `smoke_test_only=true` for branch-safe
workflow smoke tests that use ad-hoc signing, skip notarization, skip shared
appcast generation, and do not prove release readiness.
- `preflight_only=true` on the npm workflow is also the right way to validate an
existing tag after publish; it should keep running the build checks even when
the npm version is already published.
- npm validation-only preflight may still be dispatched from ordinary branches
when testing workflow changes before merge. Release checks and real publish
use only `main` or `release/YYYY.M.D`.
- `.github/workflows/macos-release.yml` in `openclaw/openclaw` is now a
public validation-only handoff. It validates the tag/release state and points
operators to the private repo. It still rebuilds the JS outputs needed for
release validation, but it does not sign, notarize, or publish macOS
artifacts.
- `openclaw/releases-private/.github/workflows/openclaw-macos-validate.yml`
is the required private mac validation lane for `swift test`; keep it green
before any real stable mac publish run starts.
- Real mac preflight and real mac publish both use
`openclaw/releases-private/.github/workflows/openclaw-macos-publish.yml`.
- The private mac validation lane runs on GitHub's standard macOS runner.
- The private mac preflight path runs on GitHub's xlarge macOS runner and uses
a SwiftPM cache because the build/sign/notarize/package path is CPU-heavy.
- Private mac preflight uploads notarized build artifacts as workflow artifacts
instead of uploading public GitHub release assets.
- Private smoke-test runs upload ad-hoc, non-notarized build artifacts as
workflow artifacts and intentionally skip stable `appcast.xml` generation.
- For stable releases, npm preflight, public mac validation, private mac
validation, and private mac preflight must all pass before any real publish
run starts. For beta releases, npm preflight plus the selected Docker,
install/update, Parallels, and release-check lanes are sufficient unless mac
beta validation was explicitly requested.
- Real publish runs may be dispatched from `main` or from a
`release/YYYY.M.D` branch. For release-branch runs, the tag must be contained
in that release branch, and the real publish must reuse a successful preflight
from the same branch.
- The release workflows stay tag-based; rely on the documented release sequence
rather than workflow-level SHA pinning.
- The `npm-release` environment must be approved by `@openclaw/openclaw-release-managers` before publish continues.
- Mac publish uses
`openclaw/releases-private/.github/workflows/openclaw-macos-publish.yml` for
private mac preflight artifact preparation and real publish artifact
promotion.
- Real private mac publish uploads the packaged `.zip`, `.dmg`, and
`.dSYM.zip` assets to the existing GitHub release in `openclaw/openclaw`
automatically when `OPENCLAW_PUBLIC_REPO_RELEASE_TOKEN` is present in the
private repo `mac-release` environment.
- For stable releases, the agent must also download the signed
`macos-appcast-<tag>` artifact from the successful private mac workflow and
then update `appcast.xml` on `main`.
- For beta mac releases, do not update the shared production `appcast.xml`
unless a separate beta Sparkle feed exists.
- The private repo targets a dedicated `mac-release` environment. If the GitHub
plan does not yet support required reviewers there, do not assume the
environment alone is the approval boundary; rely on private repo access and
CODEOWNERS until those settings can be enabled.
- Do not use `NPM_TOKEN` or the plugin OTP flow for the OpenClaw package
publish path; package publishing uses trusted publishing.
- Use `NPM_TOKEN` only for explicit npm dist-tag management modes, because npm
does not support trusted publishing for `npm dist-tag add`.
- `@openclaw/*` plugin publishes use a separate maintainer-only flow.
- Only publish plugins that already exist on npm; bundled disk-tree-only plugins stay unpublished.
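As an illustration of the manual `workflow_dispatch` preflight described above, the sketch below only prints a `gh workflow run` command rather than running it. `npm_preflight_cmd` is a hypothetical helper, and the `tag` input name is an assumption; only `preflight_only` and `npm_dist_tag` are named in this runbook, so check the workflow file for the real input names before using it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): print the dispatch command for an npm preflight.
# ASSUMPTION: the workflow takes the release tag through an input called `tag`.
# preflight_only and npm_dist_tag are the inputs named in the runbook.
npm_preflight_cmd() {
  local branch="$1" tag="$2" dist_tag="${3:-beta}"
  printf 'gh workflow run openclaw-npm-release.yml --ref %s -f tag=%s -f preflight_only=true -f npm_dist_tag=%s\n' \
    "$branch" "$tag" "$dist_tag"
}
```

Printing the command first keeps the dispatch reviewable; the operator can paste it once the branch, tag, and dist-tag look right.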
## Fallback local mac publish
- Keep the original local macOS publish workflow available as a fallback in case
CI/CD mac publishing is unavailable or broken.
- Preserve the existing maintainer workflow Peter uses: run it on a real Mac
with local signing, notary, and Sparkle credentials already configured.
- Follow the private maintainer macOS runbook for the local steps:
`scripts/package-mac-dist.sh` to build, sign, notarize, and package the app;
manual GitHub release asset upload; then `scripts/make_appcast.sh` plus the
`appcast.xml` commit to `main`.
- `scripts/package-mac-dist.sh` now fails closed for release builds if the
bundled app comes out with a debug bundle id, an empty Sparkle feed URL, or a
`CFBundleVersion` below the canonical Sparkle build floor for that short
version. For correction tags, set a higher explicit `APP_BUILD`.
- `scripts/make_appcast.sh` first uses `generate_appcast` from `PATH`, then
falls back to the SwiftPM Sparkle tool output under `apps/macos/.build`.
- For stable tags, the local fallback may update the shared production
`appcast.xml`.
- For beta tags, the local fallback still publishes the mac assets but must not
update the shared production `appcast.xml` unless a separate beta feed exists.
- Treat the local workflow as fallback only. Prefer the CI/CD publish workflow
when it is working.
- After any stable mac publish, verify all of the following before you call the
release finished:
- the GitHub release has `.zip`, `.dmg`, and `.dSYM.zip` assets
- `appcast.xml` on `main` points at the new stable zip
- the packaged app reports the expected short version and a numeric
`CFBundleVersion` at or above the canonical Sparkle build floor
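The asset check above can be sketched as a filter over release asset names. `missing_mac_assets` is a hypothetical helper, and the asset file names in the example are made up. Note that `.dSYM.zip` also ends in `.zip`, so it is matched first.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repo): read asset names on stdin, one per line,
# and print each required mac asset suffix that is missing.
missing_mac_assets() {
  local has_zip=0 has_dmg=0 has_dsym=0 name
  while IFS= read -r name || [ -n "$name" ]; do
    case "$name" in
      *.dSYM.zip) has_dsym=1 ;; # must come before the plain .zip pattern
      *.zip) has_zip=1 ;;
      *.dmg) has_dmg=1 ;;
    esac
  done
  [ "$has_zip" -eq 1 ] || echo ".zip"
  [ "$has_dmg" -eq 1 ] || echo ".dmg"
  [ "$has_dsym" -eq 1 ] || echo ".dSYM.zip"
}
```

One possible way to feed it, assuming the GitHub CLI is authenticated: `gh release view "$TAG" --repo openclaw/openclaw --json assets --jq '.assets[].name' | missing_mac_assets`. Empty output means all three asset kinds are attached.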
## Run the release sequence
1. Confirm the operator explicitly wants to cut a release.
2. Choose the exact target version and git tag.
3. Commit any dirty files in coherent groups, push, pull/rebase, and verify the
worktree is clean.
4. Pull latest `main` and confirm current `main` CI is green.
5. Run `/changelog` for the stable base target version on `main`, commit the
changelog rewrite immediately, push, and pull/rebase. For beta releases,
keep the changelog heading as `## YYYY.M.D`, not `## YYYY.M.D-beta.N`.
6. Create `release/YYYY.M.D` from that post-changelog `main` commit.
7. Make every repo version location match the beta tag before creating it.
8. Commit release preparation changes on the release branch and push the branch.
9. Run the fast local beta preflight from the release branch before any npm
preflight or publish. Keep expensive Docker, Parallels, and published-package
install/update lanes for after the beta is live unless the operator asks to
run them before beta publication.
10. For beta releases, skip mac app build/sign/notarize unless beta scope or a
release blocker specifically requires it. For stable releases, include the
mac app, signing, notarization, and appcast path.
11. Confirm the target npm version is not already published.
12. Create and push the git tag from the release branch.
13. Create or refresh the matching GitHub release.
14. Dispatch Actions > `QA-Lab - All Lanes` against the release tag and wait
for the mock parity, live Matrix, and live Telegram credentialed-channel
lanes to pass.
15. Start `.github/workflows/openclaw-npm-release.yml` from the release branch
with `preflight_only=true`
and choose the intended `npm_dist_tag` (`beta` default; `latest` only for
an intentional direct stable publish). Wait for it to pass. Save that run id
because the real publish requires it to reuse the prepared npm tarball.
16. For stable releases, start `.github/workflows/macos-release.yml` in
`openclaw/openclaw` and wait for the public validation-only run to pass.
17. For stable releases, start
`openclaw/releases-private/.github/workflows/openclaw-macos-validate.yml`
with the same tag and wait for the private mac validation lane to pass.
18. For stable releases, start
`openclaw/releases-private/.github/workflows/openclaw-macos-publish.yml`
with `preflight_only=true` and wait for it to pass. Save that run id because
the real publish requires it to reuse the notarized mac artifacts.
19. If any preflight or validation run fails, fix the issue on a new commit,
delete the tag and matching GitHub release, recreate them from the fixed
commit, and rerun all relevant preflights from scratch before continuing.
Never reuse old preflight results after the commit changes. For beta tags
whose npm package is already published, do not delete/recreate; increment to the next beta tag.
For preflight-only failures where npm did not publish the beta version,
delete/recreate the same beta tag and prerelease at the fixed commit instead
of skipping a prerelease number.
20. Start `.github/workflows/openclaw-npm-release.yml` from the same branch with
the same tag for the real publish, choose `npm_dist_tag` (`beta` default,
`latest` only when you intentionally want direct stable publish), keep it
the same as the preflight run, and pass the successful npm
`preflight_run_id`.
21. Wait for `npm-release` approval from `@openclaw/openclaw-release-managers`.
22. Run postpublish verification:
`node --import tsx scripts/openclaw-npm-postpublish-verify.ts <published-version>`.
23. Run the post-publish beta verification roster. First scan current `main`
for critical fixes that landed after the release branch cut; backport only
important low-risk fixes before starting expensive lanes, or increment to
the next beta if the fix must change the already-published package. If any
lane fails after the beta package is published, fix, commit/push/pull,
increment to the next beta tag, and rerun the affected beta evidence. Once
the beta is live, start remote/manual rosters where they
can overlap safely, but keep local Docker and Parallels load controlled.
Ensure the full expensive roster has passed at least once before
stable/latest promotion. The roster includes the manual Actions >
`NPM Telegram Beta E2E` workflow against the exact published beta package.
If a pre-npm lane fails before any tag/package leaves the machine, fix and
rerun the same intended beta attempt. Repeat up to the operator's
authorized beta-attempt limit, normally 4.
24. Announce the beta/stable release on Discord best-effort using the configured secret workflow.
25. If the operator requested beta only, stop after beta verification and the
announcement.
26. If the stable release was published to `beta`, use the light stable
promotion roster when the matching beta already carried the full confidence
pass: published npm postpublish verify, Docker install/update smoke,
macOS-only Parallels install/update smoke, and required QA signal.
Then start the private
`openclaw/releases-private/.github/workflows/openclaw-npm-dist-tags.yml`
workflow to promote that stable version from `beta` to `latest`, then
verify `latest` now points at that version.
27. If the stable release was published directly to `latest` and `beta` should
follow it, start that same private dist-tag workflow to point `beta` at the
stable version, then verify both `latest` and `beta` point at that version.
28. For stable releases, start
`openclaw/releases-private/.github/workflows/openclaw-macos-publish.yml`
for the real publish with the successful private mac `preflight_run_id` and
wait for success.
29. Verify the successful real private mac run uploaded the `.zip`, `.dmg`,
and `.dSYM.zip` artifacts to the existing GitHub release in
`openclaw/openclaw`.
30. For stable releases, download `macos-appcast-<tag>` from the successful
private mac run, update `appcast.xml` on `main`, and verify the feed. Merge
or cherry-pick release branch changes back to `main` after stable succeeds.
31. For beta releases, publish the mac assets only when intentionally requested;
expect no shared production
`appcast.xml` artifact and do not update the shared production feed unless a
separate beta feed exists.
32. After publish, verify npm and the attached release artifacts.
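The beta retry rule in the sequence above (recreate the same beta tag while npm has not published it, otherwise increment) can be sketched as two small functions. Both names and the `yes`/`no` argument are hypothetical; the `YYYY.M.D-beta.N` shape is the one this runbook uses.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not in the repo) illustrating the beta retry rule.

# $1: "yes" if the matching npm package was already published, otherwise "no".
beta_fix_action() {
  case "$1" in
    yes) echo "increment" ;; # published: never delete or recreate the tag
    no) echo "recreate" ;;   # only the tag/prerelease exists: reuse the number
    *)
      echo "usage: beta_fix_action yes|no" >&2
      return 2
      ;;
  esac
}

# Print the next beta version for a YYYY.M.D-beta.N string.
next_beta_version() {
  local version="$1"
  local base="${version%-beta.*}"
  local n="${version##*-beta.}"
  case "$n" in
    '' | *[!0-9]*)
      echo "not a beta version: $version" >&2
      return 1
      ;;
  esac
  # 10# forces base 10 so a value like 08 is not parsed as octal.
  echo "${base}-beta.$((10#$n + 1))"
}
```

For example, `next_beta_version 2026.5.22-beta.1` prints `2026.5.22-beta.2`, which is the version to use when `beta_fix_action yes` says to increment.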
## GHSA advisory work
- Use `openclaw-ghsa-maintainer` for GHSA advisory inspection, patch/publish flow, private-fork validation, and GHSA API-specific publish checks.

@@ -1,10 +1,8 @@
#!/usr/bin/env node
/**
* Secret scanning alert handler for OpenClaw maintainers.
* Usage: node secret-scanning.mjs <command> [options]
*/
// Secret scanning alert handler for OpenClaw maintainers.
// Usage: node secret-scanning.mjs <command> [options]
import { spawnSync } from "node:child_process";
import { execFileSync, spawnSync } from "node:child_process";
import crypto from "node:crypto";
import fs from "node:fs";
import os from "node:os";
@@ -41,9 +39,7 @@ function gh(args, { json = true, allowFailure = false } = {}) {
stderr: proc.stderr,
};
}
if (!json) {
return proc.stdout;
}
if (!json) return proc.stdout;
try {
return JSON.parse(proc.stdout);
} catch {
@@ -59,7 +55,6 @@ function isBodyLocationType(locationType) {
return locationType === "issue_body" || locationType === "pull_request_body";
}
/** Decides whether redacting an issue/PR body requires notifying the reporter. */
export function decideBodyRedaction(currentBody, redactedBody) {
const bodyChanged = String(currentBody) !== String(redactedBody);
return {
@@ -68,7 +63,6 @@ export function decideBodyRedaction(currentBody, redactedBody) {
};
}
/** Loads redaction-result metadata for issue/PR body secret locations. */
export function loadBodyRedactionResult(locationType, resultFile) {
if (!isBodyLocationType(locationType)) {
return { notify_required: true };
@@ -76,9 +70,7 @@ export function loadBodyRedactionResult(locationType, resultFile) {
if (!resultFile) {
fail("Body notifications require a redaction result file from redact-body-if-needed");
}
if (!fs.existsSync(resultFile)) {
fail(`File not found: ${resultFile}`);
}
if (!fs.existsSync(resultFile)) fail(`File not found: ${resultFile}`);
const result = JSON.parse(fs.readFileSync(resultFile, "utf8"));
if (typeof result.notify_required !== "boolean") {
@@ -190,11 +182,10 @@ function fetchDiscussionComment(discussionNumber, discussionCommentDbId) {
failOnGraphQLFailure(gql, `Failed to fetch discussion #${discussionNumber}`);
const discussion = gql?.data?.repository?.discussion;
if (!discussion) {
if (!discussion)
fail(
`Discussion #${discussionNumber} not found — it may have been deleted. The alert cannot be processed via this skill.`,
);
}
discussionId = discussion.id;
@@ -214,18 +205,15 @@ function fetchDiscussionComment(discussionNumber, discussionCommentDbId) {
`Failed to fetch replies for discussion comment ${topLevelComment.id}`,
);
const replies = replyPage?.data?.node?.replies;
if (!replies) {
if (!replies)
fail(`Failed to paginate replies for discussion comment ${topLevelComment.id}`);
}
reply = findDiscussionCommentNode(replies.nodes, discussionCommentDbId);
hasMoreReplies = replies.pageInfo.hasNextPage;
replyCursor = replies.pageInfo.endCursor;
}
if (reply) {
return { discussionId, comment: reply };
}
if (reply) return { discussionId, comment: reply };
}
hasNextPage = discussion.comments.pageInfo.hasNextPage;
@@ -253,9 +241,7 @@ function createDiscussionComment(discussionNodeId, body, replyToNodeId) {
* Fetch alert metadata + locations. Never exposes .secret.
*/
function cmdFetchAlert(alertNumber) {
if (!alertNumber) {
fail("Usage: fetch-alert <number>");
}
if (!alertNumber) fail("Usage: fetch-alert <number>");
const alert = gh(["api", `repos/${REPO}/secret-scanning/alerts/${alertNumber}?hide_secret=true`]);
@@ -294,23 +280,17 @@ function cmdFetchAlert(alertNumber) {
* Saves full body to a temp file. Prints metadata + file path to stdout.
*/
function cmdFetchContent(locationJson) {
if (!locationJson) {
fail("Usage: fetch-content '<location-json>'");
}
if (!locationJson) fail("Usage: fetch-content '<location-json>'");
const location = JSON.parse(locationJson);
const type = location.type;
const details = location.details;
if (type === "discussion_comment") {
const commentUrl = details.discussion_comment_url;
if (!commentUrl) {
fail("No discussion_comment_url in location details");
}
if (!commentUrl) fail("No discussion_comment_url in location details");
const urlMatch = commentUrl.match(/discussions\/(\d+)#discussioncomment-(\d+)/);
if (!urlMatch) {
fail(`Cannot parse discussion comment URL: ${commentUrl}`);
}
if (!urlMatch) fail(`Cannot parse discussion comment URL: ${commentUrl}`);
const discussionNumber = urlMatch[1];
const discussionCommentDbId = urlMatch[2];
@@ -318,11 +298,10 @@ function cmdFetchContent(locationJson) {
discussionNumber,
discussionCommentDbId,
);
if (!comment) {
if (!comment)
fail(
`Discussion comment #${discussionCommentDbId} not found in discussion #${discussionNumber}`,
);
}
const bodyFile = tmpFile("body.md");
fs.writeFileSync(bodyFile, comment.body || "");
@@ -355,9 +334,7 @@ function cmdFetchContent(locationJson) {
details.issue_comment_url ||
details.pull_request_comment_url ||
details.pull_request_review_comment_url;
if (!commentUrl) {
fail(`No comment URL in location details`);
}
if (!commentUrl) fail(`No comment URL in location details`);
const comment = gh(["api", commentUrl]);
const bodyFile = tmpFile("body.md");
@@ -401,9 +378,7 @@ function cmdFetchContent(locationJson) {
);
} else if (type === "issue_body") {
const issueUrl = details.issue_body_url || details.issue_url;
if (!issueUrl) {
fail("No issue URL in location details");
}
if (!issueUrl) fail("No issue URL in location details");
const issue = gh(["api", issueUrl]);
const bodyFile = tmpFile("body.md");
@@ -439,9 +414,7 @@ function cmdFetchContent(locationJson) {
);
} else if (type === "pull_request_body") {
const prUrl = details.pull_request_body_url || details.pull_request_url;
if (!prUrl) {
fail("No PR URL in location details");
}
if (!prUrl) fail("No PR URL in location details");
const pr = gh(["api", prUrl]);
const bodyFile = tmpFile("body.md");
@@ -517,9 +490,7 @@ function cmdRedactBody(kind, number, bodyFile) {
if (!kind || !number || !bodyFile) {
fail("Usage: redact-body <issue|pr> <number> <redacted-body-file>");
}
if (!fs.existsSync(bodyFile)) {
fail(`File not found: ${bodyFile}`);
}
if (!fs.existsSync(bodyFile)) fail(`File not found: ${bodyFile}`);
const endpoint =
kind === "pr" ? `repos/${REPO}/pulls/${number}` : `repos/${REPO}/issues/${number}`;
@@ -538,12 +509,8 @@ function cmdRedactBodyIfNeeded(kind, number, currentBodyFile, redactedBodyFile,
"Usage: redact-body-if-needed <issue|pr> <number> <current-body-file> <redacted-body-file> <result-file>",
);
}
if (!fs.existsSync(currentBodyFile)) {
fail(`File not found: ${currentBodyFile}`);
}
if (!fs.existsSync(redactedBodyFile)) {
fail(`File not found: ${redactedBodyFile}`);
}
if (!fs.existsSync(currentBodyFile)) fail(`File not found: ${currentBodyFile}`);
if (!fs.existsSync(redactedBodyFile)) fail(`File not found: ${redactedBodyFile}`);
const currentBody = fs.readFileSync(currentBodyFile, "utf8");
const redactedBody = fs.readFileSync(redactedBodyFile, "utf8");
@@ -574,9 +541,7 @@ function cmdRedactBodyIfNeeded(kind, number, currentBodyFile, redactedBodyFile,
* Delete a comment (and all its edit history).
*/
function cmdDeleteComment(commentId) {
if (!commentId) {
fail("Usage: delete-comment <comment-id>");
}
if (!commentId) fail("Usage: delete-comment <comment-id>");
gh(["api", `repos/${REPO}/issues/comments/${commentId}`, "-X", "DELETE"], { json: false });
console.log(JSON.stringify({ ok: true, deleted_comment_id: Number(commentId) }));
}
@@ -586,9 +551,7 @@ function cmdDeleteComment(commentId) {
* Delete a discussion comment via GraphQL (and all its edit history).
*/
function cmdDeleteDiscussionComment(nodeId) {
if (!nodeId) {
fail("Usage: delete-discussion-comment <node-id>");
}
if (!nodeId) fail("Usage: delete-discussion-comment <node-id>");
const result = ghGraphQL(
`mutation { deleteDiscussionComment(input: { id: "${nodeId}" }) { comment { id } } }`,
);
@@ -603,12 +566,9 @@ function cmdDeleteDiscussionComment(nodeId) {
* Create a new discussion comment via GraphQL.
*/
function cmdRecreateDiscussionComment(discussionNodeId, bodyFile, replyToNodeId) {
if (!discussionNodeId || !bodyFile) {
if (!discussionNodeId || !bodyFile)
fail("Usage: recreate-discussion-comment <discussion-node-id> <body-file> [reply-to-node-id]");
}
if (!fs.existsSync(bodyFile)) {
fail(`File not found: ${bodyFile}`);
}
if (!fs.existsSync(bodyFile)) fail(`File not found: ${bodyFile}`);
const body = fs.readFileSync(bodyFile, "utf8");
const newComment = createDiscussionComment(discussionNodeId, body, replyToNodeId);
@@ -626,12 +586,8 @@ function cmdRecreateDiscussionComment(discussionNodeId, bodyFile, replyToNodeId)
* Create a new comment from a file.
*/
function cmdRecreateComment(issueNumber, bodyFile) {
if (!issueNumber || !bodyFile) {
fail("Usage: recreate-comment <issue-number> <body-file>");
}
if (!fs.existsSync(bodyFile)) {
fail(`File not found: ${bodyFile}`);
}
if (!issueNumber || !bodyFile) fail("Usage: recreate-comment <issue-number> <body-file>");
if (!fs.existsSync(bodyFile)) fail(`File not found: ${bodyFile}`);
const result = gh([
"api",
@@ -759,9 +715,7 @@ function cmdNotify(target, author, locationType, secretTypes, replyToNodeId) {
* Close a secret scanning alert.
*/
function cmdResolve(alertNumber, resolution, comment) {
if (!alertNumber) {
fail("Usage: resolve <alert-number> [resolution] [comment]");
}
if (!alertNumber) fail("Usage: resolve <alert-number> [resolution] [comment]");
const res = resolution || "revoked";
const resComment = comment || "Content redacted and author notified to rotate credentials.";
@@ -819,12 +773,8 @@ function cmdListOpen() {
* Print a formatted summary table from a JSON results file.
*/
function cmdSummary(jsonFile) {
if (!jsonFile) {
fail("Usage: summary <json-file>");
}
if (!fs.existsSync(jsonFile)) {
fail(`File not found: ${jsonFile}`);
}
if (!jsonFile) fail("Usage: summary <json-file>");
if (!fs.existsSync(jsonFile)) fail(`File not found: ${jsonFile}`);
const results = JSON.parse(fs.readFileSync(jsonFile, "utf8"));
const lines = [];

@@ -1,7 +1,4 @@
#!/usr/bin/env node
/**
* Heap snapshot diff utility for OpenClaw test memory leak investigations.
*/
import fs from "node:fs";
import path from "node:path";

@@ -98,7 +98,7 @@ barrels, package-boundary tests, or extension suites.
- add `--keep`/`--id <id-or-slug>` only when several commands must share one
warmed box; stop it with `pnpm crabbox:stop -- <id-or-slug>`.
5. If plugin performance is package-artifact sensitive, switch to
`release-openclaw-plugin-testing` and Package Acceptance rather than
`openclaw-pre-release-plugin-testing` and Package Acceptance rather than
trusting source-only timing.
## Metric Collection

@@ -19,7 +19,7 @@ or validating a change without wasting hours.
Prove the touched surface first. Do not reflexively run the whole suite.
1. Inspect the diff and classify the touched surface:
- normal source checkout, source change: `pnpm changed:lanes --json`, then `pnpm check:changed` (delegates to Crabbox/Testbox)
- normal source checkout, source change: `pnpm changed:lanes --json`, then `pnpm check:changed`
- normal source checkout, tests only: `pnpm test:changed`
- normal source checkout, one failing file: `pnpm test <path-or-filter> -- --reporter=verbose`
- Codex worktree or linked/sparse checkout, one/few explicit files: `node scripts/run-vitest.mjs <path-or-filter>`
@@ -27,7 +27,7 @@ Prove the touched surface first. Do not reflexively run the whole suite.
use the Crabbox wrapper with the provider that matches the proof surface.
For maintainer heavy `pnpm` gates, that is usually delegated Blacksmith
Testbox through Crabbox, e.g. `node scripts/crabbox-wrapper.mjs run
--provider blacksmith-testbox ... -- env OPENCLAW_CHECK_CHANGED_REMOTE_CHILD=1 OPENCLAW_CHANGED_LANES_RAW_SYNC=1 corepack pnpm check:changed`. For direct AWS
--provider blacksmith-testbox ... -- pnpm check:changed`. For direct AWS
Crabbox proof, omit `--provider` and let `.crabbox.yaml` choose AWS.
- workflow-only: `git diff --check`, workflow syntax/lint (`actionlint` when available)
- docs-only: `pnpm docs:list`, docs formatter/lint only if docs tooling changed or requested
@@ -66,18 +66,15 @@ scripts/crabbox-wrapper.mjs` for Testbox, and `git commit --no-verify` only
```bash
pnpm changed:lanes --json
pnpm check:changed # Crabbox/Testbox changed typecheck/lint/guards; no Vitest
pnpm check:changed # changed typecheck/lint/guards; no Vitest
pnpm test:changed # cheap smart changed Vitest targets
pnpm verify # full check, then full Vitest
OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed
pnpm test <path-or-filter> -- --reporter=verbose
OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test <path-or-filter>
```
Use targeted file paths whenever possible. Avoid raw `vitest`; use the repo
`pnpm test` wrapper so project routing, workers, and setup stay correct. If raw
Vitest is unavoidable, use `vitest run ...`; bare `vitest ...` starts local watch
mode and will not exit on its own.
`pnpm test` wrapper so project routing, workers, and setup stay correct.
When the checkout is a Codex worktree, prefer the direct node harness instead:
```bash
@@ -92,8 +89,6 @@ status checks or install reconciliation in a linked worktree.
- `pnpm check` and `pnpm check:changed` do not run Vitest tests. They are for
typecheck, lint, and guard proof.
- `pnpm test` and `pnpm test:changed` run Vitest tests.
- `pnpm verify` runs `pnpm check`, then `pnpm test`, with Crabbox phase markers
so remote summaries show which half failed.
- `pnpm test:changed` is intentionally cheap by default: direct test edits,
sibling tests, explicit source mappings, and import-graph dependents.
- `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed` is the explicit broad
@@ -215,7 +210,7 @@ workflow only spends setup and queue time on that suite.
### Release Evidence
After release-candidate validation or before a release decision, record the
important run ids in the public `openclaw/releases` evidence ledger.
important run ids in the private `openclaw/releases-private` evidence ledger.
Use the manual `OpenClaw Release Evidence`
(`openclaw-release-evidence.yml`) workflow there. It writes durable summaries
under `evidence/<release-id>/` and commits:
@@ -238,13 +233,13 @@ short release-manager notes there. Do not store raw logs, provider
prompts/responses, channel transcripts, signing material, or secret-bearing
config in git; raw logs stay in Actions artifacts.
When `Full Release Validation` completes and `OPENCLAW_RELEASES_DISPATCH_TOKEN`
is configured in the source repo, it requests the public
`OpenClaw Release Evidence From Full Validation` workflow. That workflow reads
the parent full-validation run, extracts the child CI/release-checks/Telegram
run ids from the parent logs, and opens the evidence PR automatically. If the
token is absent or the run predates this wiring, trigger that workflow manually
with the full-validation run id.
When `Full Release Validation` completes and
`OPENCLAW_RELEASES_PRIVATE_DISPATCH_TOKEN` is configured in the public repo, it
requests the private `OpenClaw Release Evidence From Full Validation` workflow.
That private workflow reads the parent full-validation run, extracts the child
CI/release-checks/Telegram run ids from the parent logs, and opens the evidence
PR automatically. If the token is absent or the run predates this wiring, trigger
that private workflow manually with the full-validation run id.
### Release Checks

@@ -0,0 +1,41 @@
---
name: optimizetests
description: Optimize OpenClaw slow tests, imports, misplaced coverage, and CI wall time without dropping coverage.
---
# Optimize Tests
Goal: real OpenClaw test/runtime speedups with coverage intact. Do not add shards,
skip assertions, weaken gates, or tune runner flags as the main fix.
## Runbook
1. Read `docs/help/testing.md`, `docs/ci.md`, and the scoped `AGENTS.md` files
   for any subtree you will edit.
2. Establish evidence before edits:
   - Full ranking: `pnpm test:perf:groups --full-suite --allow-failures --output .artifacts/test-perf/<name>.json`
   - Targeted file: `timeout 240 /usr/bin/time -l pnpm test <file> --maxWorkers=1 --reporter=verbose`
   - Import suspicion: add `OPENCLAW_VITEST_IMPORT_DURATIONS=1 OPENCLAW_VITEST_PRINT_IMPORT_BREAKDOWN=1`
3. Attack highest-return hotspots first:
   - broad barrels or `importActual()` in hot tests
   - per-test `vi.resetModules()` plus fresh imports
   - expensive gateway/server/client setup where reset/reuse proves same behavior
   - core tests asserting extension-owned behavior
   - duplicated fixture construction or contract assertions
4. Prefer production-quality fixes:
   - narrow runtime seams over broad mocks
   - pure helpers for static parsing/metadata
   - injected deps over module resets
   - extension-owned tests for bundled plugin/provider/channel behavior
5. After each change, rerun the same benchmark and the proving test lane. Record
   before/after wall time, Vitest duration, and max RSS when available.
6. Run `pnpm check:changed`; run broader gates (`pnpm check`, `pnpm test`,
   `pnpm build`) when touched surfaces require them.
7. Commit scoped changes with `scripts/committer "<conventional message>" <paths...>`.
   Push when requested. If CI is red, inspect with `gh run list/view`, fix, push,
   repeat until current CI is green or a blocker is proven unrelated.
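The before/after record from step 5 can be reduced to a small comparison. This is a minimal sketch, not part of the skill: the function name and the `wallSeconds` / `maxRssBytes` fields are hypothetical, and max RSS is treated as optional because the runbook only asks for it when available.

```javascript
// Hypothetical helper: compares two benchmark records taken with the same command.
// Field names (wallSeconds, maxRssBytes) are illustrative assumptions.
function compareRuns(before, after) {
  // Percentage change relative to the "before" value; negative means improvement.
  const deltaPct = (b, a) => ((a - b) * 100) / b;
  const hasRss =
    typeof before.maxRssBytes === "number" && typeof after.maxRssBytes === "number";
  return {
    wallDeltaPct: deltaPct(before.wallSeconds, after.wallSeconds),
    // Max RSS is only compared when both runs recorded it.
    rssDeltaPct: hasRss ? deltaPct(before.maxRssBytes, after.maxRssBytes) : null,
  };
}
```

Comparing only runs produced by the same benchmark command keeps the delta meaningful; mixing a targeted-file timing with a full-suite ranking would not.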
## Output
End with the pushed commit(s), before/after timings, gates run, current CI state,
and any remaining tail lanes that need separate optimization.

View File

@@ -0,0 +1,6 @@
interface:
  display_name: "Optimize Tests"
  short_description: "Benchmark and speed up OpenClaw tests"
  default_prompt: "Use $optimizetests to benchmark slow OpenClaw tests, optimize imports and duplicated setup, move misplaced core coverage to extensions, verify gates, commit scoped changes, push, and keep CI green without adding shards or dropping coverage."
policy:
  allow_implicit_invocation: false

View File

@@ -1,87 +0,0 @@
---
name: release-openclaw-announcement
description: "Draft or post OpenClaw beta/stable Discord release announcements from changelog, GitHub release, registry, and validation evidence. Use when announcing a beta, stable release, release candidate, or asking what users should test after an OpenClaw release."
---
# OpenClaw Release Announcement
Use with `release-openclaw-maintainer` after a beta or stable release is live.
Use with `$discord-user-post` when actually posting to Discord as the logged-in
user.
## Evidence First
Before drafting focus areas, read real release evidence:
1. Current GitHub release body for the tag.
2. `CHANGELOG.md` section for the released base version.
3. Commits since the previous shipped version or the operator-specified base.
4. Registry/package metadata for the exact version and current dist-tag.
5. Validation status that is relevant to user confidence.
Do not claim a full changelog audit unless you did it. If you only read the
generated release notes or top changelog section, say that and either audit
properly or draft with that limitation.
For beta focus areas, prioritize user-observable changes over internal test or
CI mechanics:
- install/update paths
- OS/platform-specific behavior
- Gateway startup/restart, config, and runtime behavior
- provider/model/runtime routing
- plugin loading and local plugin development
- channels and media paths
- security/data-loss/user-impact fixes
Do not let late release-branch fixes automatically dominate the announcement.
If the version includes a large delta from the previous shipped version, rank
focus areas by the whole release delta and expected user impact; mention late
fixes in their natural category.
## Required Copy
Every beta announcement must make beta status explicit and include:
- exact version, e.g. `OpenClaw 2026.5.25-beta.1`
- one-sentence risk framing: beta, useful for testing, not stable promotion
- focused test areas derived from evidence, not guesswork
- update command promoted near the top:
  ```sh
  openclaw update --channel beta --yes
  openclaw --version
  ```
- fresh install path:
  `Install from https://openclaw.ai`
- GitHub release link
- concise validation note, without making CI the headline
Do not suggest npm install commands in beta announcements unless the operator
explicitly asks for npm-specific copy or troubleshooting text. It is fine to use
registry metadata as evidence; do not turn that into public install guidance.
For stable announcements, use the stable channel wording:
```sh
openclaw update --channel stable --yes
openclaw --version
```
Fresh installs still point to `https://openclaw.ai`.
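The required-copy rules above can be checked mechanically before posting. The following is a rough sketch under stated assumptions: `announcementProblems` is a hypothetical helper, not something the skill ships, and it only covers the items that are plain substrings (version, update commands, install URL, and the no-npm rule for beta copy).

```javascript
// Hypothetical draft checker; `version` and `channel` are supplied by the caller.
function announcementProblems(draft, { version, channel }) {
  const problems = [];
  const mustInclude = [
    version,
    `openclaw update --channel ${channel} --yes`,
    "openclaw --version",
    "https://openclaw.ai",
  ];
  for (const text of mustInclude) {
    if (!draft.includes(text)) {
      problems.push(`missing: ${text}`);
    }
  }
  // Beta copy should not suggest npm install commands unless the operator asked for them.
  if (channel === "beta" && /\bnpm (install|i)\b/.test(draft)) {
    problems.push("beta copy suggests an npm install command");
  }
  return problems;
}
```

An empty result only means the mechanical items are present; risk framing, focus areas, and the validation note still need a human read.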
## Style
- Discord Markdown, no tables.
- Keep it skimmable: short intro, bullets, commands, links.
- Lead with what users can feel or test, not proof plumbing.
- Mention validation only after install/update instructions.
- Be specific about where feedback is useful.
- Do not mention private local proof paths in public announcements.
- Do not overstate unverified platforms, channels, or provider behavior.
## Posting
When asked to post, use `$discord-user-post` to operate the logged-in Discord
desktop app as the user. Resolve and visibly verify the exact server/channel,
inspect the final body, and request action-time confirmation before entering or
sending it. Never use OpenClaw channel sends, bots, webhooks, relays, or tokens.

View File

@@ -1,4 +0,0 @@
interface:
  display_name: "OpenClaw Release Announcement"
  short_description: "Draft Discord beta/stable release announcements from evidence."
  default_prompt: "Use this skill to draft an OpenClaw beta or stable Discord announcement from changelog, release notes, npm/GitHub release proof, and validation evidence."

View File

@@ -1,174 +0,0 @@
---
name: release-openclaw-ci
description: "Run, watch, debug, and summarize OpenClaw full release CI, release checks, live provider gates, install/update proofs, and release-secret preflights."
---
# OpenClaw Release CI
Use this with `$release-openclaw-maintainer` and `$openclaw-testing` when a release candidate needs full validation, install/update proof, live provider checks, or CI recovery.
## Guardrails
- No version bump, tag, npm publish, GitHub release, or release promotion without explicit operator approval.
- Validate provider secrets before dispatching expensive full release matrices.
- Do not set GitHub secrets from unvalidated 1Password candidates. If a candidate returns 401/403, leave the existing secret alone and report the exact missing provider.
- Use `$one-password` for secret reads/writes: one persistent tmux session, targeted items only, no secret output.
- Watch one parent run plus compact child summaries. Avoid broad `gh run view` polling loops; REST quota is easy to burn.
- Fetch logs only for failed or currently-blocking jobs. If quota is low, stop polling and wait for reset.
- Treat live-provider flakes separately from code failures: prove key validity, provider HTTP status, retry evidence, and exact failing lane before editing code.
- A model-list response proves authentication, not billing or inference
entitlement. Mandatory live providers must pass a real completion probe
before release dispatch. Fix the credential first; do not add an alternate
auth path merely to bypass a failed release credential.
- Full Release Validation parent monitors fail fast: once a required child job
fails, the parent cancels the remaining child matrix and prints the failed
job summary. Inspect that first red job instead of waiting for unrelated
matrix tails.
- In a sparse worktree or Testbox source sync, first confirm `package.json`,
`pnpm-lock.yaml`, and every source path the selected check reads. If any are
absent, that checkout cannot validate a release dependency or Docker lane:
stop and use the repo remote changed gate or a full task worktree. When the
inputs are present and a release fix changes `package.json` or
`pnpm-lock.yaml`, rebuild only the task-owned disposable box with
`CI=true pnpm install --frozen-lockfile`, then run an explicit
`require.resolve()` probe before Docker or focused tests. The CI flag permits
pnpm to recreate a prewarmed modules directory without an interactive
confirmation. Do not weaken the lockfile or label sparse-checkout failures
as product/Docker failures.
- If the candidate is rebased or its base SHA changes after warmup, stop the
task-owned box and warm a fresh one before testing. Testbox source sync is
relative to the warmed source tree; continuing can mix an old base file with
a new candidate diff and produce false lockfile or Docker failures.
- For a committed release candidate, warm the box with
`blacksmith testbox warmup ... --ref <candidate-branch-or-sha>`. Do not rely
on source sync to overlay committed branch changes onto the workflow's
default ref.
## Preflight
Before full release validation:
```bash
node .agents/skills/release-openclaw-ci/scripts/verify-provider-secrets.mjs --required openai,anthropic,fireworks
gh api rate_limit --jq '.resources.core'
git status --short --branch
git rev-parse HEAD
```
1Password service-account values are the first source for release provider
preflight. Inject those exact targeted keys first, then run the verifier; use
ambient env only when it was already intentionally injected for this release.
The script prints only provider status and HTTP class, never tokens.
The Anthropic check performs a tiny message completion so exhausted or
non-billable credentials fail before the expensive release matrix.
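The distinction between an authenticated response and a usable completion can be sketched as a small classifier. This is an illustration only: `classifyProbe` and the `{ httpStatus, body }` probe shape are hypothetical, and the status strings are assumed to follow the `missing` / `http_<code>` / `invalid_response` / `ok` convention rather than being taken from the verifier's output contract.

```javascript
// Hypothetical classifier for a provider probe result.
// `probe` is null when no key was found, otherwise { httpStatus, body }.
function classifyProbe(probe, validateBody) {
  if (!probe) {
    return "missing";
  }
  const httpOk = probe.httpStatus >= 200 && probe.httpStatus < 300;
  if (!httpOk) {
    return `http_${probe.httpStatus}`;
  }
  // A 2xx alone proves authentication; a completion probe must also return text.
  if (validateBody && !validateBody(probe.body)) {
    return "invalid_response";
  }
  return "ok";
}

// True when a messages-style response carries at least one non-empty text part.
const hasCompletionText = (body) =>
  Array.isArray(body?.content) &&
  body.content.some((part) => typeof part?.text === "string" && part.text.trim() !== "");
```

Passing no validator corresponds to a model-list check, which is why it cannot stand in for the completion probe on a mandatory provider.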
## Dispatch
Start product performance evidence as early as the release SHA exists, in
parallel with other release work:
```bash
gh workflow run openclaw-performance.yml \
--repo openclaw/openclaw \
--ref main \
-f target_ref=<release-sha> \
-f profile=release \
-f repeat=3 \
-f deep_profile=false \
-f live_openai_candidate=false \
-f fail_on_regression=true
```
- Do not wait for full release validation to start this early perf signal.
- Compare available Kova, gateway startup, and CLI startup metrics with earlier
release evidence or clawgrit reports before publish/closeout.
- Call out any regression in the release proof. Treat a major regression as a
release blocker until it is fixed, waived by the operator, or proven to be
infrastructure noise.
- Full Release Validation records blocking product-performance evidence. The
early standalone run is for overlap and faster regression discovery, but a
regression or missing child run blocks the parent validation.
Prefer the trusted workflow on `main` and target the exact release SHA:
- Keep trusted-workflow checks compatible with frozen release targets. If
`main` adds a target-owned guard script or package command after the release
branch cut, make the trusted workflow skip only when that target surface is
absent. Heal the trusted workflow before rerunning validation; do not port an
unrelated runtime refactor or mutate the release candidate just to satisfy a
newer `main`-only check.
```bash
gh workflow run full-release-validation.yml \
--repo openclaw/openclaw \
--ref main \
-f ref=<release-sha> \
-f provider=openai \
-f mode=both \
-f release_profile=full \
-f rerun_group=all
```
Use `release_profile=stable` unless the operator explicitly asks for the broad advisory provider/media matrix. Stable and full profiles force the release soak; the beta profile may opt in with `run_release_soak=true`. Use narrow `rerun_group` after focused fixes.
Publish with `openclaw-release-publish.yml` using `release_profile=from-validation`
unless a maintainer intentionally wants to cross-check a specific profile; the
publish workflow reads the effective profile from the full-validation manifest.
## Watch
Use the summary helper instead of repeated raw polling:
```bash
node .agents/skills/release-openclaw-ci/scripts/release-ci-summary.mjs <full-release-run-id>
```
Then watch only when useful:
```bash
gh run watch <full-release-run-id> --repo openclaw/openclaw --exit-status
```
Stop watchers before ending the turn or switching strategy.
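Whether another poll is affordable can be decided from the `core` object that `gh api rate_limit --jq '.resources.core'` prints. A minimal sketch, assuming a floor of 20 remaining calls (an illustrative threshold, not a documented one) and a hypothetical function name:

```javascript
// `core` has the shape of `.resources.core` from the rate_limit endpoint:
// { limit, remaining, reset } where `reset` is epoch seconds.
// The default floor of 20 is an assumption for illustration.
function pollDecision(core, nowMs, minRemaining = 20) {
  if (!core) {
    // Rate limit could not be read; this sketch proceeds rather than blocking.
    return { poll: true, waitSeconds: 0 };
  }
  if (core.remaining >= minRemaining) {
    return { poll: true, waitSeconds: 0 };
  }
  // Quota is low: report how long until the window resets instead of polling.
  const waitSeconds = Math.max(0, Math.ceil(core.reset - nowMs / 1000));
  return { poll: false, waitSeconds };
}
```

For example, `pollDecision(core, Date.now())` returning `poll: false` is the cue to stop watchers and wait for the reset.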
## Failure Triage
1. Confirm parent SHA and child run IDs.
2. List failed jobs only:
   ```bash
   gh run view <child-run-id> --repo openclaw/openclaw --json jobs \
     --jq '.jobs[] | select(.conclusion=="failure" or .conclusion=="timed_out" or .conclusion=="cancelled") | [.databaseId,.name,.conclusion,.url] | @tsv'
   ```
3. Fetch one failed job log. If rate-limited, note reset time and avoid more REST calls.
4. For secret-looking failures, validate a real completion from the same secret source before editing code. A successful model-list request is insufficient.
   Claude CLI subscription credentials are a separate native auth path; prove
   them in a clean-home CLI probe, never as a substitute for a required
   Anthropic API-key lane.
5. For live-cache failures, inspect whether it is missing/invalid key, empty text, provider refusal, timeout, or baseline miss. Do not weaken release gates without clear provider evidence.
6. Fix narrowly, run local/changed proof, commit, push, rerun the smallest matching group.
7. If a required PR CI run is capacity-stalled with queued jobs and no active
   jobs, do not cancel unrelated work or accept a generic manual dispatch.
   From the PR head branch, dispatch the explicit exact-SHA fallback:
   `gh workflow run ci.yml --repo openclaw/openclaw --ref <pr-head-branch> -f
   target_ref=<full-pr-sha> -f include_android=true -f release_gate=true`.
   It runs on GitHub-hosted runners and is accepted only when its run title is
   `CI release gate <full-pr-sha>`. Record the stalled Blacksmith run and the
   fallback run in release evidence.
   If `Blacksmith Build Artifacts Testbox` is the only remaining required gate
   and remains queued without a runner, that completed exact fallback may cover
   it because CI's `build-artifacts` job already builds, packages, and smoke
   tests the artifacts. Do not use this coverage after the artifact workflow
   starts or completes non-successfully.
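The failed-jobs filter from step 2 can also be applied to saved `gh run view --json jobs` output without another API call. A small sketch of the same selection; the function name is hypothetical, while the job fields (`databaseId`, `name`, `conclusion`, `url`) are the ones the `--jq` expression already reads:

```javascript
// Conclusions that the triage step treats as failed.
const BLOCKING_CONCLUSIONS = new Set(["failure", "timed_out", "cancelled"]);

// `runView` is the parsed JSON from `gh run view <id> --json jobs`.
// Returns one tab-separated row per failed job, matching the @tsv output.
function failedJobRows(runView) {
  return (runView.jobs ?? [])
    .filter((job) => BLOCKING_CONCLUSIONS.has(job.conclusion))
    .map((job) => [job.databaseId, job.name, job.conclusion, job.url].join("\t"));
}
```

Working from a saved JSON file keeps REST quota for the one log fetch that step 3 actually needs.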
## Evidence
Record:
- release SHA
- full parent run URL
- child run IDs and conclusions: CI, Release Checks, Plugin Prerelease, NPM Telegram, Product Performance
- performance comparison result versus earlier releases when available
- targeted local proof commands
- provider-secret preflight result
- known gaps or unrelated failures
For lessons and recovery patterns, read `references/release-ci-notes.md`.

View File

@@ -1,4 +0,0 @@
interface:
  display_name: "OpenClaw Release CI"
  short_description: "Verify and debug OpenClaw release validation runs"
  default_prompt: "Use $release-openclaw-ci to preflight provider secrets, watch full release validation, summarize child runs, and triage only failing release lanes."

View File

@@ -1,125 +0,0 @@
#!/usr/bin/env node
/**
* Release CI summary helper that prints parent and child workflow status for a
* full release run.
*/
import { execFileSync } from "node:child_process";
import process from "node:process";
const runId = process.argv[2];
const repo = process.env.OPENCLAW_RELEASE_REPO || "openclaw/openclaw";
if (!runId) {
  console.error("usage: release-ci-summary.mjs <full-release-run-id>");
  process.exit(2);
}
// Run the GitHub CLI and return its stdout as text.
function gh(args) {
  return execFileSync("gh", args, {
    encoding: "utf8",
    stdio: ["ignore", "pipe", "pipe"],
  });
}
// Run the GitHub CLI and parse its stdout as JSON.
function jsonGh(args) {
  return JSON.parse(gh(args));
}
// Call the GitHub REST API with curl, reusing the token from `gh auth token`.
// The URL travels through the environment so it is never spliced into the shell script.
function githubRestJson(pathSuffix) {
  const result = execFileSync(
    "bash",
    [
      "-lc",
      [
        "set -euo pipefail",
        'token="$(gh auth token)"',
        'curl -fsS -H "Authorization: Bearer ${token}" -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" "${OPENCLAW_GITHUB_REST_URL}"',
      ].join("\n"),
    ],
    {
      encoding: "utf8",
      env: {
        ...process.env,
        OPENCLAW_GITHUB_REST_URL: `https://api.github.com/repos/${repo}/${pathSuffix}`,
      },
      maxBuffer: 16 * 1024 * 1024,
      stdio: ["ignore", "pipe", "pipe"],
    },
  );
  return JSON.parse(result);
}
// Read the core REST rate limit; return undefined if the lookup itself fails.
function rate() {
  try {
    return jsonGh(["api", "rate_limit"]).resources.core;
  } catch {
    return undefined;
  }
}
const core = rate();
if (core) {
  // `reset` is reported in epoch seconds.
  const reset = new Date(core.reset * 1000).toISOString();
  console.log(`rate: remaining=${core.remaining}/${core.limit} reset=${reset}`);
  if (core.remaining < 20) {
    console.error("rate too low for CI summary; wait for reset before polling");
    process.exit(3);
  }
}
// Fetch the parent run and its jobs in a single call.
const parent = jsonGh([
  "run",
  "view",
  runId,
  "--repo",
  repo,
  "--json",
  "status,conclusion,createdAt,headSha,url,jobs",
]);
console.log(`parent: ${runId} ${parent.status}/${parent.conclusion || "none"}`);
console.log(`sha: ${parent.headSha}`);
console.log(`url: ${parent.url}`);
for (const job of parent.jobs ?? []) {
  // Show the conclusion once a job has one, otherwise its live status.
  const marker = job.conclusion || job.status;
  console.log(`parent-job: ${marker} ${job.name}`);
}
const since = parent.createdAt;
// Ask only for runs created at or after the parent run, excluding PR-triggered runs.
const runsQuery = new URLSearchParams({
  per_page: "100",
  created: `>=${since}`,
  exclude_pull_requests: "true",
});
const childWorkflowNames = new Set([
  "CI",
  "OpenClaw Release Checks",
  "Plugin Prerelease",
  "NPM Telegram Beta E2E",
  "Full Release Validation",
]);
const runs = githubRestJson(`actions/runs?${runsQuery.toString()}`).workflow_runs ?? [];
// Keep runs on the parent's SHA that belong to a known release workflow.
const children = runs.filter(
  (run) =>
    run.created_at >= since &&
    run.head_sha === parent.headSha &&
    childWorkflowNames.has(run.name),
);
if (children.length === 0) {
  console.log("children: none found yet");
  process.exit(0);
}
console.log("children:");
for (const run of children) {
  console.log(
    `child: ${run.id} ${run.name} ${run.status}/${run.conclusion || "none"} sha=${run.head_sha}`,
  );
  console.log(`child-url: ${run.html_url}`);
}

View File

@@ -1,142 +0,0 @@
#!/usr/bin/env node
/**
* Release preflight helper that verifies required provider API keys without
* printing secret values. Anthropic must complete a prompt because model-list
* access does not prove billing or inference entitlement.
*/
import process from "node:process";
// Parse `--key value` and `--key=value` flags into a map; other arguments are ignored.
const args = new Map();
for (let index = 2; index < process.argv.length; index += 1) {
  const arg = process.argv[index];
  if (!arg.startsWith("--")) {
    continue;
  }
  const [key, inlineValue] = arg.slice(2).split("=", 2);
  const value = inlineValue ?? process.argv[index + 1];
  if (inlineValue === undefined) {
    // The value came from the next argument, so skip over it.
    index += 1;
  }
  args.set(key, value);
}
// `--required none` disables the required set; the default is openai and anthropic.
const requiredInput = String(args.get("required") ?? "openai,anthropic").trim();
const required = new Set(
  (requiredInput.toLowerCase() === "none" ? "" : requiredInput)
    .split(",")
    .map((entry) => entry.trim().toLowerCase())
    .filter(Boolean),
);
const timeoutMs = Number(args.get("timeout-ms") ?? 10_000);
// Return the first non-empty environment variable from `names`, or undefined.
function envFirst(names) {
  for (const name of names) {
    const value = process.env[name]?.trim();
    if (value) {
      return { name, value };
    }
  }
  return undefined;
}
// Probe one provider and report a status string; the secret value is never returned.
async function checkProvider(id, config) {
  const secret = envFirst(config.env);
  if (!secret) {
    return { id, ok: false, status: "missing", env: config.env.join("|") };
  }
  // Abort the request if the provider does not answer within the timeout.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const headers = config.headers(secret.value);
    const response = await fetch(config.url, {
      body: config.body,
      headers,
      method: config.method,
      signal: controller.signal,
    });
    // Only parse the body when the provider defines a response validator.
    const responseBody = config.validateResponse
      ? await response.json().catch(() => undefined)
      : undefined;
    const ok = response.ok && (!config.validateResponse || config.validateResponse(responseBody));
    return {
      id,
      ok,
      status: response.ok ? (ok ? "ok" : "invalid_response") : `http_${response.status}`,
      env: secret.name,
    };
  } catch (error) {
    return {
      id,
      ok: false,
      status: error?.name === "AbortError" ? "timeout" : "error",
      env: secret.name,
    };
  } finally {
    clearTimeout(timer);
  }
}
// Provider probes. Entries without `method`/`body` issue a GET against a model-list
// endpoint; Anthropic posts a tiny completion and validates the returned text.
const providers = {
  openai: {
    env: ["OPENAI_API_KEY"],
    url: "https://api.openai.com/v1/models",
    headers: (token) => ({ authorization: `Bearer ${token}` }),
  },
  anthropic: {
    env: ["ANTHROPIC_API_KEY", "ANTHROPIC_API_TOKEN"],
    url: "https://api.anthropic.com/v1/messages",
    method: "POST",
    body: JSON.stringify({
      max_tokens: 8,
      messages: [{ role: "user", content: "Reply with OK." }],
      model: "claude-haiku-4-5",
    }),
    headers: (token) => ({
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
      "x-api-key": token,
    }),
    // Require at least one non-empty text part in the completion.
    validateResponse: (body) =>
      Array.isArray(body?.content) &&
      body.content.some((part) => typeof part?.text === "string" && part.text.trim()),
  },
  fireworks: {
    env: ["FIREWORKS_API_KEY"],
    url: "https://api.fireworks.ai/inference/v1/models",
    headers: (token) => ({ authorization: `Bearer ${token}` }),
  },
  openrouter: {
    env: ["OPENROUTER_API_KEY"],
    url: "https://openrouter.ai/api/v1/models",
    headers: (token) => ({ authorization: `Bearer ${token}` }),
  },
};
// Reject provider ids that have no probe definition.
const unknown = [...required].filter((id) => !providers[id]);
if (unknown.length > 0) {
  console.error(`unknown providers: ${unknown.join(",")}`);
  process.exit(2);
}
// Check every required provider, plus any optional provider whose key is present.
const results = [];
for (const id of Object.keys(providers)) {
  if (required.has(id) || envFirst(providers[id].env)) {
    results.push(await checkProvider(id, providers[id]));
  }
}
// Print status lines only; a failed optional provider does not fail the preflight.
let failed = false;
for (const result of results) {
  const requiredLabel = required.has(result.id) ? "required" : "optional";
  console.log(`${result.id}: ${result.status} env=${result.env} ${requiredLabel}`);
  if (required.has(result.id) && !result.ok) {
    failed = true;
  }
}
if (failed) {
  console.error("release provider secret preflight failed");
  process.exit(1);
}

View File

@@ -1,92 +0,0 @@
---
name: release-openclaw-mac
description: "Run or recover OpenClaw macOS release signing, notarization, appcast, and asset promotion."
---
# OpenClaw Mac Release
Use with `$release-openclaw-maintainer`, `$release-openclaw-ci`, `$one-password`, and `$release-private` if it exists when stable macOS assets, private mac preflight, notarization, appcast promotion, or mac release recovery is involved.
## Credentials
- Resolve Peter-owned ASC item refs, key ids, issuer ids, and service-token provenance from `$release-private`.
- Fields: `private_key_p8`, `key_id`, `issuer_id`.
- Stale/revoked key symptom: `xcrun notarytool submit` fails with `HTTP status code: 401. Unauthenticated`.
- Validate candidate ASC credentials with `xcrun notarytool history` before setting GitHub secrets.
## 1Password
- Use `$one-password`: all `op` work inside one persistent tmux session, no secret output.
- Use the service-token guidance from `$release-private` when available.
- If a service token fails, run status-only checks: token present/length and `op whoami`; never print token values.
- If desktop app auth is needed but Touch ID is unavailable, set `OP_BIOMETRIC_UNLOCK_ENABLED=false` for the manual `op account add --signin` path.
## GitHub Secrets
Target private repo environment: `openclaw/releases-private`, env `mac-release`.
Set only after local notary auth validation:
- `APP_STORE_CONNECT_API_KEY_P8`
- `APP_STORE_CONNECT_KEY_ID`
- `APP_STORE_CONNECT_ISSUER_ID`
Do not update these from mixed sources. All three ASC fields must come from the same 1Password item.
## Workflow Shape
- Public release branch may carry mac-only packaging fixes after the stable tag/npm are already live.
- Use `source_ref=release/YYYY.M.PATCH` for private mac preflight/validation when building that branch variation.
- Keep `tag=vYYYY.M.PATCH` pointing at the original stable release commit.
- Real mac publish must reuse:
  - a successful private mac preflight run for the same tag/source SHA
  - a successful private mac validation run for the same tag/source SHA
- If preflight source SHA differs from tag SHA, validation must also use the same `source_ref`; promotion rejects mismatched proof.
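The proof-matching rule above can be pictured as a comparison of three records. This is only a sketch of the idea: the record shape (`tag`, `sourceSha`, `conclusion`) and the function name are hypothetical, and the real promotion check in the private workflow may differ.

```javascript
// Hypothetical proof records: { tag, sourceSha, conclusion }.
// Returns a list of reasons the proofs cannot be reused; empty means they line up.
function proofProblems(target, preflight, validation) {
  const problems = [];
  const proofs = [
    ["preflight", preflight],
    ["validation", validation],
  ];
  for (const [label, proof] of proofs) {
    if (proof.conclusion !== "success") {
      problems.push(`${label} run did not succeed`);
    }
    if (proof.tag !== target.tag) {
      problems.push(`${label} tag mismatch`);
    }
    // Both proofs must come from the same source SHA, even when it differs from the tag SHA.
    if (proof.sourceSha !== target.sourceSha) {
      problems.push(`${label} source SHA mismatch`);
    }
  }
  return problems;
}
```

Here `target.sourceSha` stands for the commit that `source_ref` resolved to during preflight, which is why a validation run built from the plain tag would be reported as a mismatch.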
## Notarization
- OpenClaw uses `scripts/notarize-mac-artifact.sh`.
- `xcrun notarytool submit` should use `--no-s3-acceleration`; accelerated upload can surface misleading 401s even when `notarytool history` succeeds.
- If signing succeeds but notarization fails immediately with 401, check ASC key freshness first.
- If notarization stays in progress for several minutes after key-file write, that is normal Apple wait time; do not edit blindly.
## Dispatch
Private preflight:
```bash
gh workflow run openclaw-macos-publish.yml --repo openclaw/releases-private --ref main \
-f tag=vYYYY.M.PATCH \
-f source_ref=release/YYYY.M.PATCH \
-f preflight_only=true \
-f smoke_test_only=false \
-f allow_late_calver_recovery=false \
-f public_release_branch=release/YYYY.M.PATCH
```
Private validation for a branch-variation preflight:
```bash
gh workflow run openclaw-macos-validate.yml --repo openclaw/releases-private --ref main \
-f tag=vYYYY.M.PATCH \
-f source_ref=release/YYYY.M.PATCH
```
Real publish:
```bash
gh workflow run openclaw-macos-publish.yml --repo openclaw/releases-private --ref main \
-f tag=vYYYY.M.PATCH \
-f preflight_only=false \
-f smoke_test_only=false \
-f preflight_run_id=<successful-preflight-run> \
-f validate_run_id=<successful-validation-run> \
-f allow_late_calver_recovery=false \
-f public_release_branch=release/YYYY.M.PATCH
```
## Verify
- `gh release view vYYYY.M.PATCH --repo openclaw/openclaw` shows zip, dmg, dSYM zip, not draft, not prerelease.
- Public `main` `appcast.xml` points at `OpenClaw-YYYY.M.PATCH.zip`.
- Appcast entry has `sparkle:version`, `sparkle:shortVersionString`, length, and `sparkle:edSignature`.
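The appcast field check can be approximated with a string scan over one `<item>` entry. A rough sketch, assuming the entry is available as raw XML text; the function name is hypothetical and the pattern accepts each field either as an element or as an attribute, since the skill does not say which form OpenClaw's appcast uses.

```javascript
// Fields the verify step expects on the appcast entry.
const REQUIRED_APPCAST_FIELDS = [
  "sparkle:version",
  "sparkle:shortVersionString",
  "length",
  "sparkle:edSignature",
];

// Returns the required field names that do not appear in the item XML,
// either as `<name>` elements or as `name="..."` attributes.
function missingAppcastFields(itemXml) {
  return REQUIRED_APPCAST_FIELDS.filter(
    (name) => !new RegExp(`[\\s<]${name}[=>]`).test(itemXml),
  );
}
```

This is a presence check only; it does not validate the signature or confirm that the enclosure URL points at `OpenClaw-YYYY.M.PATCH.zip`.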

View File

@@ -1,889 +0,0 @@
---
name: release-openclaw-maintainer
description: Prepare or verify OpenClaw stable/beta releases, changelogs, release notes, publish commands, and artifacts.
---
# OpenClaw Release Maintainer
Use this skill for release and publish-time workflow. Load `$release-private` if it exists before resolving Peter-owned credential locators or private host topology. Keep ordinary development changes and GHSA-specific advisory work outside this skill.
## Respect release guardrails
- Do not change version numbers without explicit operator approval.
- Versions use `YYYY.M.PATCH`, where `PATCH` is the sequential release-train number within the month, not the calendar day.
- Choose a new beta train from stable and beta releases only. Alpha-only tags do not consume or advance the beta/stable patch number. Continue the highest existing unpublished/published beta train with the next `beta.N` when appropriate; otherwise increment the highest stable/beta patch by one and start at `beta.1`.
- Example: after stable `2026.6.5`, the next new beta train is `2026.6.6-beta.1`, even if automated alpha-only tags such as `2026.6.10-alpha.1` exist.
- Ask permission before any npm publish or release step.
- This skill should be sufficient to drive the normal release flow end-to-end.
- Use the private maintainer release docs for credentials, recovery steps, and mac signing/notary specifics, and use `docs/reference/RELEASING.md` for public policy.
- Core `openclaw` publish is manual `workflow_dispatch`; creating or pushing a tag does not publish by itself.
- Do not edit the root `README.md` as release prep, release closeout, or a
substitute for release notes. Package-root README validation is a hard
packaging gate, but a release only changes README content when an actual
user-facing documentation contract changed.
- Normal release work happens on a branch cut from `main`, not directly on
`main`. Use `release/YYYY.M.PATCH` for the branch name.
- If the operator asks for a release without saying stable/full, default to
beta only. Continue from beta to stable only when the operator explicitly asks
for the full release or an automated beta-and-stable train.
- Before release branching, pull latest `main` and confirm current `main` CI is
green. Then branch from that commit so regular development can continue on
`main` while release validation runs.
- Before release branching, commit any dirty files in coherent groups, push,
pull/rebase, then generate `CHANGELOG.md` on `main` from merged PRs and all
direct commits since the last reachable release tag. Commit/push/pull that
changelog rewrite immediately before creating the release branch.
- During release planning, inspect both `src/plugins/compat/registry.ts` and
`src/commands/doctor/shared/deprecation-compat.ts` before branching and again
before final publish. For every deprecated or removal-pending compatibility
record whose `removeAfter` date is on or before the release date, either
remove the compatibility path where safe and validate the affected tests, or
write down why removal is blocked and get explicit maintainer approval before
shipping the expired compatibility path.
- When removing deprecated runtime/config compatibility, preserve any doctor
migration, repair, or hint that is still needed by supported upgrade paths.
Doctor-side compatibility should stay tracked in
`src/commands/doctor/shared/deprecation-compat.ts` until maintainers confirm
the repair is no longer needed.
- Revalidate compatibility replacement text during release planning. The
recommended replacement can shift as plugin ownership, externalization, and
config footprint move, so do not blindly copy stale replacement annotations
into release notes.
- Do not delete or rewrite beta tags after their matching npm package has been
published. If a pushed beta tag fails before npm publish, the version is not
consumed: keep the same `-beta.N`, delete/recreate or force-move the git tag
and prerelease to the fixed commit, and rerun preflight. Do not increment to
the next beta number until the matching npm package has actually published.
If a published beta needs a fix, commit the fix on the release branch and
increment to the next `-beta.N`.
- For a beta release train, keep Full Release Validation as a pre-publish gate
unless the operator explicitly waives it. Run the fast local preflight, npm
preflight, full release validation, and performance in parallel where safe.
If anything fails before npm publish, fix it on the release branch,
forward-port the fix to `main`, move the unpublished beta tag/prerelease to
the fixed commit, and rerun the affected pre-publish gates. If anything fails
after npm publish, fix it, forward-port to `main`, increment beta number, and
repeat. After each beta publish, run the published-package roster focused on
install/update/Docker/Parallels/NPM Telegram. For later beta attempts, rerun
only lanes whose evidence changed unless the fix touches broad release,
install/update, plugin, Docker, Parallels, or live QA behavior. After each
beta is live, scan current `main` once for critical fixes that landed after
the release branch cut and backport only important low-risk fixes. Operators
may authorize up to 4 autonomous beta attempts; after 4 failed beta attempts,
stop and report.
- As soon as the release candidate SHA exists, dispatch `OpenClaw Performance`
with `target_ref=<release-sha>` in parallel with the other release work. Do
not wait for full release validation to start the performance signal.
- Before publish/closeout, compare available product performance metrics with
earlier releases: Kova agent-turn/resource metrics, gateway startup
ready/listen/RSS/CPU metrics, and CLI startup metrics from release evidence
or clawgrit reports. Report regressions explicitly. A major regression is a
release blocker unless the operator waives it or the data clearly proves
infrastructure noise.
- Heal CI before tagging or publishing. The exact candidate SHA must have green
`Full Release Validation`, including the root Dockerfile/install-smoke path.
Treat a red Docker, package, or release workflow lane as a release-branch
defect until the smallest correct fix is landed and proven; do not waive it
because npm preflight or another sibling lane passed.
- Keep the canonical `scripts/pr` runner authoritative for prepare and merge
artifacts. A release-gate policy change may use focused candidate tests and
exact-SHA hosted CI for proof, but never route `prepare-*` or `merge-*`
through PR-controlled scripts or synthesize prepare artifacts to bootstrap
the change. If the current canonical gate cannot validate the new policy,
stop for explicit maintainer direction rather than weakening that boundary.
- In maintainer Testbox mode, use `OPENCLAW_TESTBOX=1 scripts/pr prepare-run
<PR>` only after the exact PR head has passed `CI` and every scheduled
hosted gate. For a workflow change, that means `Blacksmith Testbox`,
`Blacksmith ARM Testbox`, `Blacksmith Build Artifacts Testbox`, and
`Workflow Sanity`; only gates GitHub actually scheduled for that exact head
are required. This preserves the canonical prepare artifacts while avoiding
a redundant broad local suite. A literal `CHANGELOG.md`-only head gets a
clean diff check instead because those workflows intentionally do not
dispatch. Documentation and README
changes still require CI. If `merge-run` requires a mainline sync, run
`OPENCLAW_TESTBOX=1 scripts/pr prepare-sync-head <PR>`, wait for those hosted
gates on the newly pushed SHA, then run `prepare-run` again.
- If an exact PR-head CI run has no active jobs because Blacksmith capacity is
stalled, a maintainer may dispatch the explicit GitHub-hosted fallback from
the PR head branch:
`gh workflow run ci.yml --repo openclaw/openclaw --ref <pr-head-branch> -f
target_ref=<full-pr-sha> -f include_android=true -f release_gate=true`.
Use it only for an observed provider queue stall, never for failed CI or as a
routine shortcut. The run must be named `CI release gate <full-pr-sha>` and
pass on that exact SHA; the native hosted-gate verifier rejects generic manual
CI runs. If `Blacksmith Build Artifacts Testbox` is the only remaining
required gate and it is still queued without a runner, the same completed
fallback CI may cover it because its `build-artifacts` job builds, packages,
and smoke tests those artifacts. The verifier records that coverage. Never
use this coverage when the artifact workflow has started, failed, been
cancelled, or been skipped. Then rerun `OPENCLAW_TESTBOX=1 scripts/pr
prepare-run <PR>`.
- Generate the changelog before every beta, beta rerun, stable release, or
stable rerun, before version/tag preparation. Use
`$openclaw-changelog-update` for the rewrite. Do not continue release prep if
the target `CHANGELOG.md` section does not have `### Highlights`,
`### Changes`, and `### Fixes`, grouped by user-facing surface while
preserving every relevant PR/issue ref and every human `Thanks @...`
attribution in the grouped bullet.
- Do not create beta-specific `CHANGELOG.md` headings. Beta releases use the
stable base version section, for example `v2026.4.20-beta.1` uses
`## 2026.4.20` release notes.
- When any beta or stable release is live, make a best-effort Discord
announcement using the configured secret workflow; do not block or roll back
the release if the announcement fails.
- When asked to announce on X, use `~/Projects/bird/bird` and follow the
release tweet style below.
## Keep release channel naming aligned
- `stable`: tagged releases only, published to npm `beta` by default; operators may target npm `latest` explicitly or promote later
- `beta`: prerelease tags like `vYYYY.M.PATCH-beta.N`, with npm dist-tag `beta`
- Prefer `-beta.N`; do not mint new `-1` or `-2` beta suffixes
- `dev`: moving head on `main`
- When using a beta Git tag, publish npm with the matching beta version suffix so the plain version is not consumed or blocked
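The tag shapes above can be told apart mechanically. The following is a minimal sketch, not repo tooling: the function names `release_channel_for_tag` and `changelog_heading_for_tag` are hypothetical, and the patterns assume a four-digit year and purely numeric segments. It classifies a tag as `beta`, `stable`, or a `-N` correction tag, and maps a beta or stable tag to the stable-base `CHANGELOG.md` heading it shares.

```bash
# Hypothetical helpers: classify a release tag by the shapes described above.
release_channel_for_tag() {
  local tag="$1"
  if printf '%s\n' "$tag" | grep -Eq '^v[0-9]{4}\.[0-9]+\.[0-9]+-beta\.[0-9]+$'; then
    echo beta            # vYYYY.M.PATCH-beta.N
  elif printf '%s\n' "$tag" | grep -Eq '^v[0-9]{4}\.[0-9]+\.[0-9]+-[0-9]+$'; then
    echo correction      # vYYYY.M.PATCH-N fallback correction tag
  elif printf '%s\n' "$tag" | grep -Eq '^v[0-9]{4}\.[0-9]+\.[0-9]+$'; then
    echo stable          # vYYYY.M.PATCH
  else
    echo unknown
    return 1
  fi
}

# Map a beta or stable tag to its changelog heading.
# Only the "-beta.N" suffix is stripped; correction tags are out of scope here.
changelog_heading_for_tag() {
  local tag="$1" version
  version="${tag#v}"               # drop the leading "v"
  version="${version%-beta.*}"     # drop "-beta.N" when present
  echo "## ${version}"
}
```

For example, `changelog_heading_for_tag v2026.4.20-beta.1` prints `## 2026.4.20`, matching the rule that beta releases reuse the stable base section.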
## Close stable releases on main
Stable publication is not complete until `main` carries the actual shipped release state.
1. Start from fresh latest `main`. Audit `release/YYYY.M.PATCH` against it and
forward-port real fixes that are absent from `main`. Do not blindly merge
release-only compatibility, test, or validation adapters into newer `main`.
2. Set `main` to the shipped stable version, not a speculative next train. Run
`pnpm release:prep` after the root version change, then
`pnpm deps:shrinkwrap:generate`.
3. Make `CHANGELOG.md`'s `## YYYY.M.PATCH` section on `main` exactly match the
tagged release branch. Include the stable `appcast.xml` update when the mac
release published one.
4. Do not add `YYYY.M.PATCH+1`, a beta version, or an empty future changelog
section to `main` until the operator explicitly starts that release train.
5. Run `pnpm release:generated:check`, `pnpm deps:shrinkwrap:check`, and
`OPENCLAW_TESTBOX=1 pnpm check:changed`. Push, then verify `origin/main`
contains the shipped version and changelog before calling the stable release
done.
6. Keep repository variables `RELEASE_ROLLBACK_DRILL_ID` and
`RELEASE_ROLLBACK_DRILL_DATE` current after each private rollback drill.
`openclaw-stable-main-closeout.yml` starts from the `main` push carrying the
shipped version, changelog, and appcast after stable publication, then binds
immutable evidence to the published tag. Do not declare stable complete
until it writes the immutable closeout manifest to the GitHub release. The
drill must be within 90 days; manual dispatch is only for repair/replay, and
private rollback commands remain in the maintainer-only runbook.
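The 90-day drill window in step 6 is simple date arithmetic. The sketch below is illustrative only: `drill_is_current` is a hypothetical name, it assumes `RELEASE_ROLLBACK_DRILL_DATE` holds an ISO `YYYY-MM-DD` date (the format is not stated above), and it relies on GNU `date -d`, which is not available in the stock macOS `date`.

```bash
# Hypothetical check: succeed when the drill date is at most 90 days old.
# Usage: drill_is_current <drill-date> [today]   (dates as YYYY-MM-DD, UTC)
drill_is_current() {
  local drill_date="$1" today="${2:-$(date -u +%Y-%m-%d)}"
  local drill_s today_s age_days
  drill_s=$(date -u -d "$drill_date" +%s) || return 2   # unparsable date
  today_s=$(date -u -d "$today" +%s) || return 2
  age_days=$(( (today_s - drill_s) / 86400 ))
  # Reject future dates and anything older than 90 days.
  [ "$age_days" -ge 0 ] && [ "$age_days" -le 90 ]
}
```

A closeout script could call `drill_is_current "$RELEASE_ROLLBACK_DRILL_DATE"` and stop when it fails; the actual enforcement lives in the closeout workflow, which this sketch does not reproduce.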
## Handle versions and release files consistently
- Version locations include:
- `package.json`
- `apps/android/app/build.gradle.kts`
- `apps/ios/Sources/Info.plist`
- `apps/ios/Tests/Info.plist`
- `apps/macos/Sources/OpenClaw/Resources/Info.plist`
- `docs/install/updating.md`
- Peekaboo Xcode project and plist version fields
- Before creating a release tag, make every version location above match the version encoded by that tag.
- For fallback correction tags like `vYYYY.M.PATCH-N`, the repo version locations still stay at `YYYY.M.PATCH`.
- “Bump version everywhere” means all version locations above except `appcast.xml`.
- Release signing and notary credentials live outside the repo in the private maintainer docs.
- Every stable OpenClaw release ships the npm package, macOS app, and signed
Windows Hub installers together. Beta releases normally ship npm/package
artifacts first and skip native app build/sign/notarize/promote unless the
operator requests native beta validation.
- Do not let the slower macOS signing/notary path block npm publication once
the npm preflight has passed. Keep mac validation/publish running in
parallel, publish npm from the successful npm preflight, then start published
npm install/update, Docker, and Parallels verification while mac artifacts
continue.
- After a beta is published, overlap remote/manual release rosters where useful,
but avoid piling local Docker, Parallels, and QA-Lab work onto the same host
when it would create system-load noise. Use selective reruns after failures or
fixes, but keep proof that Docker, Parallels, and QA-Lab each passed at least
once before stable/latest promotion.
- Mac packaging may be built from a slight release-branch variation of the
tagged commit when the delta is mac packaging, signing, workflow, or
validation-only release machinery. If mac packaging needs release-branch-only
fixes after the stable npm package or GitHub tag is already published, do not
create a `vYYYY.M.PATCH-N` correction tag just to change the workflow source.
Dispatch the private mac workflows for the original `tag=vYYYY.M.PATCH` with
`source_ref=release/YYYY.M.PATCH` and `public_release_branch=release/YYYY.M.PATCH`;
provenance checks must prove the source SHA descends from the tag and
validation/preflight use the same source. Reserve `vYYYY.M.PATCH-N` correction
tags for emergency hotfixes that must publish a new npm package/release
identity, not for ordinary mac-only packaging recovery.
- The production Sparkle feed lives at `https://raw.githubusercontent.com/openclaw/openclaw/main/appcast.xml`, and the canonical published file is `appcast.xml` on `main` in the `openclaw` repo.
- That shared production Sparkle feed is stable-only. Beta mac releases may
upload assets to the GitHub prerelease, but they must not replace the shared
`appcast.xml` unless a separate beta feed exists.
- For fallback correction tags like `vYYYY.M.PATCH-N`, the repo version still stays
at `YYYY.M.PATCH`, but the mac release must use a strictly higher numeric
`APP_BUILD` / Sparkle build than the original release so existing installs
see it as newer.
- Stable Windows Hub release closeout requires the signed
`OpenClawCompanion-Setup-x64.exe`, `OpenClawCompanion-Setup-arm64.exe`, and
`OpenClawCompanion-SHA256SUMS.txt` assets on the canonical
`openclaw/openclaw` GitHub Release. Pass the exact signed
`openclaw/openclaw-windows-node` release tag as `windows_node_tag` to
`OpenClaw Release Publish`, together with the candidate-approved
`windows_node_installer_digests` map; it prevalidates the published source
release and required installers against that map before any publish child,
dispatches the public `Windows Node Release` workflow while the OpenClaw
release is still a draft, carries those pinned source asset digests
unchanged, verifies the expected OpenClaw Foundation Authenticode signer on
Windows, re-downloads and checksum-verifies the promoted asset contract, and
blocks publication until the canonical asset contract is present. Use direct
`Windows Node Release` dispatch only for recovery, always with an exact tag,
never `latest`, and the explicit `expected_installer_digests` JSON map from
the approved source release. Recovery rejects unexpected
`OpenClawCompanion-*` target asset names, then replaces the expected contract
assets with the pinned source bytes.
- Website Windows Hub download links should target exact canonical
`openclaw/openclaw/releases/download/vYYYY.M.PATCH/...` assets for the current
stable release, or `releases/latest/download/...` only after verifying the
redirect resolves to that same tag, so the installable signed Windows artifact
is visible from both the GitHub release page and openclaw.ai.
## Build changelog-backed release notes
- `CHANGELOG.md` is release-owned. Normal PRs and direct `main` fixes should
not edit it.
- Before release branching or tagging, rewrite the target `CHANGELOG.md`
section from history, not existing notes. Use the last reachable stable or
beta release tag as the base, then inspect every commit through the target
release SHA.
- Generate `$openclaw-changelog-update`'s full contribution manifest before
the editorial rewrite. It is the required source for `### Highlights`,
`### Changes`, and `### Fixes`; do not preserve old grouped prose without
comparing it to the manifest's PRs, contributors, direct commits, and
unlinked commits.
- The changelog rewrite is not optional for beta reruns: any `beta.N` after a
rebase or backport must refresh the same stable-base `## YYYY.M.PATCH` section
before the new version/tag commit.
- Include both merged PR commits and direct commits on `main`. Direct commits
matter: infer notes from their subject, body, touched files, linked issues,
tests, and nearby code when no PR body exists.
- Keep direct commits in the generated manifest and use them to shape grouped
user outcomes, but never dump them into `CHANGELOG.md` or GitHub release
bodies. The public complete record is PR-first and exhaustive for PRs.
- Prefer PR bodies, issue links, review proof, and commit bodies over commit
subjects alone. If a commit fixed an issue directly, the commit body should
name the user-visible behavior, affected surface, issue ref, and credited
reporter/contributor when known.
- Treat missing context as a release-note audit gap: inspect the diff and linked
issue, draft the best accurate entry, and note the uncertainty for maintainer
review rather than inventing impact.
- Add missed user-facing changes, remove internal-only noise, dedupe overlapping
PR/direct-commit entries, and sort each section from most to least interesting
for users.
- Group related highlights, changes, and fixes by user-facing surface and
impact, but never lose traceability: each grouped bullet keeps every relevant
`#issue`, `(#PR)`, `Fixes #...`, and every human `Thanks @...` handle.
Multiple thanks in one bullet are expected when multiple contributor PRs are
grouped.
- Highlights earn their place only when they are a visible capability/workflow
unlock, a material reliability or safety repair, a broad user-facing
improvement, or a release-defining integration/compatibility change. Keep
five to eight user-outcome bullets; omit tests, CI, refactors, docs, and
implementation trivia unless their outcome materially affects users.
- Do not give `docs`, `test`, `refactor`, `ci`, `build`, `chore`, or `style`
PRs/direct commits their own Highlights, Changes, or Fixes entry. They remain
accounted for in the PR record or manifest, but are not product release
content. Treat explicit internal title signals such as `QA`, `lint`, or
`testing` the same way even when the PR has no conventional prefix.
- Use the generated `### Complete contribution record` as PR-first accounting:
every merged source PR appears once with author/co-author credit, including
PRs identified only by an explicit active-commit `#NNN` reference after a
cherry-pick or squash. Keep issues inline as `#NNN` in titles and grouped
prose; do not create a linked-issues inventory or a direct-commit listing.
When grouped prose names a PR, keep every contributor and linked-reporter
credit from that PR's record on the same bullet.
- Changelog entries should be user-facing, not internal release-process notes.
- GitHub release and prerelease bodies must use the full matching
`CHANGELOG.md` version section, not highlights or an excerpt. When creating
or editing a release, extract from `## YYYY.M.PATCH` through the line before the
next level-2 heading and use that complete block as the release notes.
- GitHub limits release bodies to 125,000 characters. If a historical
`### Release verification` tail would exceed that cap, omit the tail and keep
the complete changelog section; do not truncate the contribution record.
- Before publishing or closing a release, run
`$openclaw-changelog-update`'s `verify-release-notes.mjs` with every stable
and beta release tag in the train. Do not publish or leave a page live when
it is missing a source-history reference, eligible human credit, or the
complete matching changelog body.
- To update an existing GitHub Release body, resolve the numeric release id and
patch that resource with the notes file as the `body` field:
`gh api repos/openclaw/openclaw/releases/tags/vYYYY.M.PATCH --jq .id`, then
`gh api -X PATCH repos/openclaw/openclaw/releases/<id> -F body=@/tmp/notes.md`.
Do not trust `gh release edit --notes-file` or `--input` JSON if verification
disagrees; verify with `gh api repos/openclaw/openclaw/releases/<id>` because
the tag lookup and `gh release view` can lag or show stale body text.
- When preparing release notes, scan `src/plugins/compat/registry.ts` and
`src/commands/doctor/shared/deprecation-compat.ts` for compatibility records
with `warningStarts` or `removeAfter` within 7 days after the release date.
Add an `Upcoming deprecations` note to the release notes when any exist,
including the compatibility code, target date, replacement, and a link to the
record's `docsPath` or `/plugins/compatibility` when no more specific
deprecation page exists.
- When cutting a mac release with a beta GitHub prerelease:
- tag `vYYYY.M.PATCH-beta.N` from the release commit
- create a prerelease titled `openclaw YYYY.M.PATCH-beta.N`
- use release notes from the stable base `CHANGELOG.md` version section
(`## YYYY.M.PATCH`), not a beta-specific heading
- attach at least the zip and dSYM zip, plus dmg if available
- Keep the top version entries in `CHANGELOG.md` sorted by impact:
- `### Changes` first
- `### Fixes` deduped with user-facing fixes first
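The extraction rule above (from `## YYYY.M.PATCH` through the line before the next level-2 heading) and the required `### Highlights`, `### Changes`, and `### Fixes` headings can be sketched as two small helpers. This is an illustration under assumptions, not the `$openclaw-changelog-update` implementation: the function names are hypothetical, and the heading line is assumed to be exactly `## <version>` with no trailing text.

```bash
# Print the "## <version>" section, stopping before the next level-2 heading.
extract_changelog_section() {
  local file="$1" version="$2"
  awk -v heading="## ${version}" '
    $0 == heading        { in_section = 1; print; next }
    in_section && /^## / { exit }     # next level-2 heading ends the section
    in_section           { print }
  ' "$file"
}

# Succeed only when the section text carries all three required headings.
section_has_required_headings() {
  local section="$1" h
  for h in "### Highlights" "### Changes" "### Fixes"; do
    printf '%s\n' "$section" | grep -Fxq -- "$h" || return 1
  done
}
```

Writing the output of `extract_changelog_section CHANGELOG.md YYYY.M.PATCH` to a notes file gives the complete block that the `gh api -X PATCH` step expects as the release body.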
## Write release tweets
Use the OpenClaw account's existing release-post style:
- Format: `OpenClaw YYYY.M.PATCH 🦞` or `🦞 OpenClaw YYYY.M.PATCH is live`, blank line,
then 3-4 emoji-led bullets, blank line, one short punchline, then the release
link.
- For beta: say `OpenClaw YYYY.M.PATCH-beta.N 🦞` or `OpenClaw YYYY.M.PATCH beta N is
live`; keep it clearly beta and avoid implying stable promotion.
- Lead with user-visible capabilities, then important integrations, then
reliability/security/install fixes. Compress "lots of fixes" into one
readable bullet.
- Read the full changelog section before drafting. Do not lead with coverage,
CI, validation, or internal release mechanics unless the release is explicitly
about those. Peter prefers concrete user wins: features, integrations,
workflow improvements, and practical reliability fixes.
- Do not feature QA parity, test coverage, release gates, or validation lanes in
user-facing launch tweets. Keep them for release notes or maintainer proof
unless the operator explicitly asks for validation-focused copy.
- Do not feature plugin-author or developer tooling such as SDK helpers,
tool-plugin scaffolding, build/validate/init commands, or internal CLI
plumbing in general user-facing launch tweets unless the operator explicitly
asks for developer-focused copy.
- Tone: high-signal, slightly cheeky, confident, not corporate. One joke is
enough. Avoid punching down, insulting users, or promising what was not
verified.
- Peter likes dry, compact taglines when they feel earned. Good example:
`Big release, tiny release notes... kidding.` Keep the joke short and let the
feature bullets carry the tweet; do not turn the punchline into a second
paragraph or a forced bit.
- Length: release tweets are always standard tweets under 280 characters, with
room for one URL. Trim to 3-4 bullets and count the final text before posting.
- Links/media: include the GitHub release or changelog link at the end of the
first release tweet.
- Thread follow-ups: if doing a thread, keep the first release tweet as the
compact launch post, then publish one focused feature explainer per reply.
Follow-up replies should not repeat "new in VERSION" or the version number
when the thread context already makes it obvious.
- Peter's preferred thread workflow: first agree on the generic launch tweet,
then proceed through follow-up tweets one by one. When he says `next`, provide
or copy the next follow-up only; do not dump the full thread again unless asked.
- Every follow-up tweet should include a docs URL for that specific feature.
Prefer a bare URL over `Docs: <url>` unless the label is needed for clarity.
Keep follow-ups concise: around 160-220 raw characters is usually the sweet
spot; under 280 is the hard cap. If a URL makes a tweet fail, trim prose
before dropping the URL.
Prefer explaining diagnostics, trajectory/export, provider setup, model
commands, or other setup-heavy features in follow-ups instead of overloading
the first release tweet.
- Hotfix/correction: be direct and accountable. State what slipped, what is
fixed, and the new version. Keep jokes out of incident-style posts.
Examples to adapt:
```text
OpenClaw 2026.4.20-beta.1 🦞
🐳 Docker install/update smoke
🖥️ Parallels upgrade checks
🔧 Package verification tightened
Beta first. Stable after the gauntlet.
<release link>
```
```text
OpenClaw 2026.4.20 🦞
🚀 Faster install + update
🐳 Docker + Parallels verified
🍎 macOS signed + notarized
🔧 Channel/plugin fixes
Good boring release. Best kind.
<release link>
```
```text
Packaging issue in 2026.4.20-beta.1.
2026.4.20-beta.2 fixes install/update verification. No tag rewrites; beta moves
forward.
Upgrade with the beta channel.
<release link>
```
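For the "count the final text before posting" step, a rough length gate might look like the sketch below. `tweet_fits` is a hypothetical name, and the check counts raw characters as the shell sees them (bytes in some shells or locales); X weights URLs and some emoji differently, so treat a pass as a first filter rather than proof.

```bash
# Hypothetical raw-length gate: succeed when the text is within the limit.
# Usage: tweet_fits "<text>" [limit]   (limit defaults to 280)
tweet_fits() {
  local text="$1" limit="${2:-280}"
  [ "${#text}" -le "$limit" ]
}
```

If the draft fails the gate, trim prose or a bullet before dropping the release link.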
## Run publish-time validation
Before tagging or publishing, run:
```bash
pnpm release:fast-pretag-check
pnpm check:architecture
pnpm build
pnpm ui:build
pnpm qa:otel:smoke
pnpm release:check
pnpm test:install:smoke
```
- Treat `pnpm release:fast-pretag-check` as a hard packaging gate. Every
publishable plugin must have a non-empty package-root `README.md`, build its
package-local runtime, and pass the npm and ClawHub release metadata checks
before a tag or publish workflow can start. Do not defer README, entrypoint,
or packed-artifact failures to postpublish verification.
- Before tagging, require green CI for the exact release-candidate SHA, not an
earlier branch SHA. Heal every related red CI, release-check, packaging, or
root-Dockerfile lane on the release branch, forward-port the fix to `main`,
and rerun the affected exact-SHA gates. Never waive a red Docker lane because
npm preflight passed.
- Root Dockerfile proof is mandatory before every beta and stable tag. Run the
release `install-smoke` group or equivalent root Dockerfile build for the
exact candidate SHA and require it to pass. The tag-triggered Docker Release
workflow is post-tag publishing, not the first valid proof that the root
Dockerfile can build.
- Before tagging, diff publishable plugin package manifests against the last
reachable stable/beta release tag. For every newly publishable package
(`openclaw.release.publishToNpm: true` or `publishToClawHub: true`) whose
package name did not exist in the base tag, verify the target registry package
already exists in npm/ClawHub or stop and help the owner mint/prepublish the
package first. Do not hide or disable release surfaces just to unblock a
train unless the owner explicitly decides the plugin should not ship in that
release; first-package registry ownership is release prep, not product
rollback. The mint/prepublish path must either be the real release publish
path for the auto-bumped beta version, or a deliberately non-consuming
registry-prep step that cannot occupy the next beta version/tag. Confirm
registry owner, npm scope/package-creation permission, provenance path, and
first-package publish plan before the full release publish continues. Useful
npm probe:
`npm view <package-name> version dist-tags --json --prefer-online`; a 404 for
a package newly added to the release is a release-prep blocker, not something
to discover from the publish job.
- Use `pnpm qa:otel:smoke` when release validation needs telemetry coverage.
It starts a local OTLP/HTTP trace receiver, runs QA-lab's
`otel-trace-smoke`, and checks span names plus content/identifier redaction
without external Opik or Langfuse credentials.
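The first-package registry check above can be looped over every newly publishable package name. The sketch below is an assumption-laden illustration: `find_unminted_packages` is a hypothetical helper, the caller is expected to supply the names found by the manifest diff, and any `npm view` failure (including a network error) is reported as missing, so a hit still needs a human look.

```bash
# Hypothetical helper: print "MISSING <name>" for each package the registry
# probe cannot find, and return non-zero when at least one is missing.
find_unminted_packages() {
  local pkg missing=0
  for pkg in "$@"; do
    # Same probe as above; output is discarded, only the exit status matters.
    if ! npm view "$pkg" version dist-tags --json --prefer-online >/dev/null 2>&1; then
      echo "MISSING $pkg"
      missing=1
    fi
  done
  return "$missing"
}
```

Treat any `MISSING` line as a release-prep blocker to resolve with the package owner before the publish workflow is dispatched.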
To run install smoke while skipping the non-root path:
```bash
OPENCLAW_INSTALL_SMOKE_SKIP_NONROOT=1 pnpm test:install:smoke
```
After npm publish, run:
```bash
node --import tsx scripts/openclaw-npm-postpublish-verify.ts <published-version>
```
- This verifies the published registry install path in a fresh temp prefix.
- For stable correction releases like `YYYY.M.PATCH-N`, it also verifies the
upgrade path from `YYYY.M.PATCH` to `YYYY.M.PATCH-N` so a correction publish cannot
silently leave existing global installs on the old base stable payload.
- Treat install smoke as a pack-budget gate too. `pnpm test:install:smoke`
now fails the candidate update tarball when npm reports an oversized
`unpackedSize`, so release-time e2e cannot miss pack bloat that would risk
low-memory install/startup failures.
- Keep direct npm global coverage enabled in install smoke. It exercises plain
`npm install -g <candidate>` fresh installs and npm-driven update installs,
because many users install with npm even when docs prefer pnpm.
- Use `pnpm test:live:media video` for bounded video-provider smoke when video
generation is in release scope. The default video smoke skips `fal`, runs one
text-to-video attempt per provider with a one-second lobster prompt, and caps
each provider operation with `OPENCLAW_LIVE_VIDEO_GENERATION_TIMEOUT_MS`
(`180000` by default).
- Run `pnpm test:live:media video --video-providers fal` only when FAL-specific
proof is required. Its queue latency can dominate release time.
- Set `OPENCLAW_LIVE_VIDEO_GENERATION_FULL_MODES=1` only when intentionally
validating the slower image-to-video and video-to-video transform lanes.
## Check all relevant release builds
- Always validate the OpenClaw npm release path before creating the tag.
- Use the configured secret workflow before live release validation so OpenAI
and Anthropic credentials are available without printing secrets.
- Parallels validation and any local live model QA for this train must use both
`OPENAI_API_KEY` and `ANTHROPIC_API_KEY`. If either cannot be injected, stop
before starting those local long lanes and report the missing key.
- Live credentialed channel QA is the GitHub Actions workflow
`QA-Lab - All Lanes` (`.github/workflows/qa-live-telegram-convex.yml`), not a
local substitute. Dispatch it from Actions against the release tag and wait
for it to pass before npm preflight/publish readiness. Use a SHA only when it
satisfies the workflow's secret-bearing trust gate: main ancestor or open PR
head. It runs the QA Lab mock parity gate plus live Matrix and live Telegram
lanes using the `qa-live-shared` environment; Telegram uses Convex CI
credential leases.
- Default release checks:
- `pnpm check`
- `pnpm check:test-types`
- `pnpm check:architecture`
- `pnpm build`
- `pnpm ui:build`
- `pnpm release:check`
- `OPENCLAW_INSTALL_SMOKE_SKIP_NONROOT=1 pnpm test:install:smoke`
- Full pre-npm beta test roster:
- default release checks above
- all Docker tests: `pnpm test:docker:all`, plus standalone Docker live lanes
not covered by the aggregate when operator says "all docker tests":
`pnpm test:docker:live-acp-bind`, `pnpm test:docker:live-cli-backend`, and
`pnpm test:docker:live-codex-harness`
- all Parallels install/update tests:
`pnpm test:parallels:npm-update -- --json` plus any needed individual
rerun lanes from `openclaw-parallels-smoke`
- all QA release validation: dispatch GitHub Actions > `QA-Lab - All Lanes`
against the release tag and require success. This is the release gate for
live credentialed Matrix/Telegram channel coverage. Use a SHA only when it
satisfies the workflow trust gate. Run local OpenAI/Anthropic suites or
repo-backed character evals only when the operator asks for extra model
coverage or a failure needs local debugging.
- Post-publish beta verification roster:
- `node --import tsx scripts/openclaw-npm-postpublish-verify.ts <beta-version>`
- install/update smoke against the published beta channel
- Docker install/update coverage that exercises the published beta package
- published npm Telegram proof: dispatch Actions > `NPM Telegram Beta E2E`
from `main` with `package_spec=openclaw@<beta-version>` and
`provider_mode=mock-openai`, and require success. This workflow is
maintainer-dispatched and intentionally has no `npm-release` approval gate;
`qa-live-shared` only supplies the shared QA secrets. This is the default
button path for installed-package onboarding, Telegram setup, and real
Telegram E2E against the published npm package.
Use the local `pnpm test:docker:npm-telegram-live` lane with the matching
`OPENCLAW_NPM_TELEGRAM_PACKAGE_SPEC` and Convex CI env only as a fallback
or debugging path.
- Parallels published beta install/update coverage with both OpenAI and
Anthropic provider keys available
- Parallels install/update proof must keep plugin installs enabled unless the
operator explicitly scopes a harness-only isolation check; a lane that
disables bundled plugin installs is not valid plugin/dependency release
evidence.
- targeted QA reruns only for areas touched by fixes after the full pre-npm
roster, unless the operator requests the full QA roster again. If the fix
touches live channel QA, credential plumbing, Matrix, Telegram, or the QA
harness, rerun Actions > `QA-Lab - All Lanes`.
- Check all release-related build surfaces touched by the release, not only the npm package.
- For beta-style full e2e batteries, hard-cap top-level long lanes instead of letting them run indefinitely. Use host `timeout --foreground`/`gtimeout --foreground` caps such as:
- `45m` for `OPENCLAW_INSTALL_SMOKE_SKIP_NONROOT=1 pnpm test:install:smoke`
- `90m` for `pnpm test:docker:all`
- `60m` each for standalone Docker live lanes
- `180m` for local full QA live OpenAI + Anthropic rosters when explicitly
requested; the default release channel QA gate is Actions >
`QA-Lab - All Lanes`
- Parallels caps from the `openclaw-parallels-smoke` skill
If a lane hits its cap, stop and inspect/fix the affected lane before continuing; do not continue to wait on the same process.
- Actual npm install/update phases are capped at 5 minutes. If `npm install -g`, installer package install, or `openclaw update` takes longer than 300s in release e2e, stop treating the run as healthy progress and debug the installer/updater or harness.
- Serialize host build/package mutations ahead of VM lanes. Finish `pnpm build`, `pnpm ui:build`, `pnpm release:check`, install smoke, and any Docker/package-prep lanes before starting Parallels `npm pack` lanes; otherwise `dist` can disappear during VM pack prep and produce false failures.
- Include mac release readiness in preflight by running the public validation
workflow in `openclaw/openclaw` and the real mac preflight in
`openclaw/releases-private` for every release.
- Treat the `appcast.xml` update on `main` as part of mac release readiness, not an optional follow-up.
- The workflows remain tag-based. The agent is responsible for making sure
preflight runs complete successfully before any publish run starts.
- Any fix after preflight means a new commit. Delete and recreate the tag and
matching GitHub release from the fixed commit, then rerun preflight from
scratch before publishing.
Exception: never delete or recreate a beta tag whose matching npm package has
already been published; increment to the next beta number instead. If only the
pushed tag/prerelease exists and npm publish has not happened, recreate that
same beta tag at the fixed commit.
- For stable mac releases, generate the signed `appcast.xml` before uploading
public release assets so the updater feed cannot lag the published binaries.
- Serialize stable appcast-producing runs across tags so two releases do not
generate replacement `appcast.xml` files from the same stale seed.
- For stable releases, rely primarily on the latest beta's broader release
workflow confidence. When promoting the matching non-beta build to npm
`latest`, prefer a light time-bounded verification pass: published npm
postpublish verify, Docker install/update smoke, macOS-only Parallels
install/update smoke, and required QA signal. Do not rerun the full
Docker/Parallels matrix unless the beta evidence is stale, the stable build
differs materially from beta, or the operator explicitly asks for full
retesting.
- If any required build, packaging step, or release workflow is red, do not say the release is ready.
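The hard caps for long lanes described above can be applied through one wrapper. This is a minimal sketch assuming GNU coreutils `timeout` (or `gtimeout` from Homebrew coreutils on macOS); `run_capped` is a hypothetical name, and exit status 124 is how GNU `timeout` signals that the cap was hit.

```bash
# Hypothetical wrapper: run a lane under a hard cap and flag cap hits.
# Usage: run_capped <duration> <command> [args...]
run_capped() {
  local cap="$1" timeout_bin rc
  shift
  if command -v timeout >/dev/null 2>&1; then
    timeout_bin=timeout
  elif command -v gtimeout >/dev/null 2>&1; then
    timeout_bin=gtimeout
  else
    echo "run_capped: neither timeout nor gtimeout found" >&2
    return 127
  fi
  rc=0
  "$timeout_bin" --foreground "$cap" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    # Cap reached: stop and inspect the lane instead of waiting longer.
    echo "run_capped: lane hit its ${cap} cap: $*" >&2
  fi
  return "$rc"
}

# Example invocations (not executed here):
#   run_capped 90m pnpm test:docker:all
#   run_capped 45m env OPENCLAW_INSTALL_SMOKE_SKIP_NONROOT=1 pnpm test:install:smoke
```

The `env` prefix is needed in the second example because the variable assignment has to travel with the wrapped command rather than with `run_capped` itself.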
## Use the right auth flow
- OpenClaw publish uses GitHub trusted publishing.
- Stable npm promotion from `beta` to `latest` uses the private
`openclaw/releases-private/.github/workflows/openclaw-npm-dist-tags.yml`
workflow because `npm dist-tag` management needs `NPM_TOKEN`, while the
public npm release workflow stays OIDC-only.
- Prefer fixing the private workflow token path over any local 1Password
fallback. The desired setup is a granular npm token stored as the private
repo's `NPM_TOKEN` secret, scoped to the `openclaw` package with read/write
and 2FA bypass for automation.
- If the private dist-tag workflow cannot promote because `NPM_TOKEN` is absent
or stale, use the local tmux + 1Password fallback:
- Start or reuse a tmux session so interactive `npm login` and OTP prompts
are observable and recoverable.
- Hard rule: never run `op` directly in the main agent shell during release
work. Any 1Password CLI use must happen inside that tmux session so prompts
and alerts are contained and observable.
- Use `$release-private` for the npm credentials and OTP item.
Do not print passwords, tokens, or OTPs to the transcript; send them through
tmux buffers, env vars scoped to the tmux command, or `expect` with
`log_user 0`.
- Re-authenticate npm inside that tmux session with
`npm login --auth-type=legacy`, then confirm `npm whoami` reports
`steipete`.
- Promote with a fresh OTP:
`npm dist-tag add openclaw@YYYY.M.PATCH latest --otp "$OTP"`.
- Verify with a cache-bypassed registry read, for example:
`npm view openclaw dist-tags --json --prefer-online --cache /tmp/openclaw-npm-cache-verify-$$`
and `npm view openclaw@latest version dist.tarball --json --prefer-online`.
- Direct stable publishes can also use that private dist-tag workflow to point
`beta` at the already-published `latest` version when the operator wants both
tags aligned immediately.
- The publish run must be started manually with `workflow_dispatch`.
- The npm workflow and the private mac publish workflow accept
`preflight_only=true` to run validation/build/package steps without uploading
public release assets.
- Real npm publish requires a prior successful npm preflight run id and the
successful Full Release Validation run id for the same tag/SHA so the publish
job promotes the prepared tarball instead of rebuilding it and attaches the
correct release evidence.
- Real private mac publish requires a prior successful private mac preflight
run id so the publish job promotes the prepared artifacts instead of
rebuilding or renotarizing them.
- The private mac workflow also accepts `smoke_test_only=true` for branch-safe
workflow smoke tests that use ad-hoc signing, skip notarization, skip shared
appcast generation, and do not prove release readiness.
- `preflight_only=true` on the npm workflow is also the right way to validate an
existing tag after publish; it should keep running the build checks even when
the npm version is already published.
- npm registry metadata is eventually consistent immediately after trusted
publishing. Keep postpublish `npm view` checks on bounded `--prefer-online`
retries, and carry that verified tarball/integrity metadata into later proof
steps instead of reading the registry again. If the OpenClaw npm child
succeeded but the parent publish workflow failed on an immediate exact-version
`E404`, verify the exact version with a cache-bypassed registry read, run the
standalone postpublish verifier and the full beta verifier with the original
successful child run IDs, then finalize the draft, dependency evidence asset,
and release proof manually. Never rerun the publish workflow for that
already-published version.
- npm validation-only preflight may still be dispatched from ordinary branches
when testing workflow changes before merge. Release checks and real publish
use only `main` or `release/YYYY.M.PATCH`.
- `.github/workflows/macos-release.yml` in `openclaw/openclaw` is now a
public validation-only handoff. It validates the tag/release state and points
operators to the private repo. It still rebuilds the JS outputs needed for
release validation, but it does not sign, notarize, or publish macOS
artifacts.
- `openclaw/releases-private/.github/workflows/openclaw-macos-validate.yml`
is the required private mac validation lane for `swift test`; keep it green
before any real stable mac publish run starts.
- Real mac preflight and real mac publish both use
`openclaw/releases-private/.github/workflows/openclaw-macos-publish.yml`.
- The private mac validation lane runs on GitHub's standard macOS runner.
- The private mac preflight path runs on GitHub's xlarge macOS runner and uses
a SwiftPM cache because the build/sign/notarize/package path is CPU-heavy.
- Private mac preflight uploads notarized build artifacts as workflow artifacts
instead of uploading public GitHub release assets.
- Private smoke-test runs upload ad-hoc, non-notarized build artifacts as
workflow artifacts and intentionally skip stable `appcast.xml` generation.
- For stable releases, npm preflight, Full Release Validation, public mac
validation, private mac validation, and private mac preflight must all pass
before any real publish run starts. For beta releases, npm preflight and Full
Release Validation must pass before npm publish unless the operator explicitly
waives the full gate; mac beta validation is still only required when
requested.
- Real publish runs may be dispatched from `main` or from a
`release/YYYY.M.PATCH` branch. For release-branch runs, the tag must be contained
in that release branch, and the real publish must reuse a successful preflight
from the same branch.
- The release workflows stay tag-based; rely on the documented release sequence
rather than workflow-level SHA pinning.
- The `npm-release` environment must be approved by `@openclaw/openclaw-release-managers` before publish continues.
- Mac publish uses
`openclaw/releases-private/.github/workflows/openclaw-macos-publish.yml` for
private mac preflight artifact preparation and real publish artifact
promotion.
- Real private mac publish uploads the packaged `.zip`, `.dmg`, and
`.dSYM.zip` assets to the existing GitHub release in `openclaw/openclaw`
automatically when `OPENCLAW_PUBLIC_REPO_RELEASE_TOKEN` is present in the
private repo `mac-release` environment.
- For stable releases, the agent must also download the signed
`macos-appcast-<tag>` artifact from the successful private mac workflow and
then update `appcast.xml` on `main`.
- For beta mac releases, do not update the shared production `appcast.xml`
unless a separate beta Sparkle feed exists.
- The private repo targets a dedicated `mac-release` environment. If the GitHub
plan does not yet support required reviewers there, do not assume the
environment alone is the approval boundary; rely on private repo access and
CODEOWNERS until those settings can be enabled.
- Do not use `NPM_TOKEN` or the plugin OTP flow for the OpenClaw package
publish path; package publishing uses trusted publishing.
- Use `NPM_TOKEN` only for explicit npm dist-tag management modes, because npm
does not support trusted publishing for `npm dist-tag add`.
- `@openclaw/*` plugin publishes use a separate maintainer-only flow.
- Publishable plugins that are new to npm require owner-led first-package
minting before the full release publish. Do not consume the next beta version
with an ad-hoc manual package publish; use the release-owned auto-bumped
version path, or a non-consuming registry setup/preflight step. Bundled
disk-tree-only plugins stay unpublished.
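The bounded postpublish `npm view` retry mentioned in the registry-consistency bullet above could look roughly like this. It is a minimal sketch, not a script from this repo: `retry_bounded`, the attempt count, and the delay are assumptions chosen for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: retry_bounded is a hypothetical helper, not a script from this repo.
# Usage: retry_bounded <max_attempts> <delay_seconds> <command> [args...]
retry_bounded() {
  local max_attempts="$1" delay_seconds="$2"
  shift 2
  local attempt=1
  while true; do
    # Stop as soon as the wrapped command succeeds.
    if "$@"; then
      return 0
    fi
    # Bounded: give up instead of polling the registry forever.
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "gave up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay_seconds"
  done
}

# Example call (VERSION, the attempt count, and the delay are placeholders):
# retry_bounded 5 15 npm view "openclaw@$VERSION" version dist.tarball --json --prefer-online
```

Capture the output of the first successful read and reuse it in later proof steps, so the registry is not queried again after verification.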
## Fallback local mac publish
- Keep the original local macOS publish workflow available as a fallback in case
CI/CD mac publishing is unavailable or broken.
- Preserve the existing maintainer workflow Peter uses: run it on a real Mac
with local signing, notary, and Sparkle credentials already configured.
- Follow the private maintainer macOS runbook for the local steps:
`scripts/package-mac-dist.sh` to build, sign, notarize, and package the app;
manual GitHub release asset upload; then `scripts/make_appcast.sh` plus the
`appcast.xml` commit to `main`.
- `scripts/package-mac-dist.sh` now fails closed for release builds if the
bundled app comes out with a debug bundle id, an empty Sparkle feed URL, or a
`CFBundleVersion` below the canonical Sparkle build floor for that short
version. For correction tags, set a higher explicit `APP_BUILD`.
- `scripts/make_appcast.sh` first uses `generate_appcast` from `PATH`, then
falls back to the SwiftPM Sparkle tool output under `apps/macos/.build`.
- For stable tags, the local fallback may update the shared production
`appcast.xml`.
- For beta tags, the local fallback still publishes the mac assets but must not
update the shared production `appcast.xml` unless a separate beta feed exists.
- Treat the local workflow as fallback only. Prefer the CI/CD publish workflow
when it is working.
- After any stable mac publish, verify all of the following before you call the
release finished:
- the GitHub release has `.zip`, `.dmg`, and `.dSYM.zip` assets
- `appcast.xml` on `main` points at the new stable zip
- the packaged app reports the expected short version and a numeric
`CFBundleVersion` at or above the canonical Sparkle build floor
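The asset part of that checklist can be checked mechanically. The following is a rough sketch under assumptions: `missing_mac_assets` is a hypothetical helper, and it only looks at file-name suffixes, not at signing, notarization, or the appcast.

```bash
#!/usr/bin/env bash
# Sketch only: missing_mac_assets is a hypothetical helper name.
# Reads release asset names on stdin (one per line), prints the missing
# suffixes, and returns non-zero when any required asset type is absent.
missing_mac_assets() {
  local name have_zip=0 have_dmg=0 have_dsym=0 missing=""
  while IFS= read -r name || [ -n "$name" ]; do
    case "$name" in
      *.dSYM.zip) have_dsym=1 ;;  # must be matched before the plain *.zip case
      *.zip)      have_zip=1 ;;
      *.dmg)      have_dmg=1 ;;
    esac
  done
  [ "$have_zip" -eq 1 ]  || missing="$missing .zip"
  [ "$have_dmg" -eq 1 ]  || missing="$missing .dmg"
  [ "$have_dsym" -eq 1 ] || missing="$missing .dSYM.zip"
  printf '%s\n' "${missing# }"
  [ -z "$missing" ]
}

# Example feed (TAG is a placeholder):
# gh release view "$TAG" --repo openclaw/openclaw --json assets --jq '.assets[].name' | missing_mac_assets
```

The `.dSYM.zip` case is listed first because a dSYM archive also ends in `.zip` and would otherwise satisfy the app zip check by accident.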
## Run the release sequence
1. Confirm the operator explicitly wants to cut a release.
2. Choose the exact target version and git tag.
3. Commit any dirty files in coherent groups, push, pull/rebase, and verify the
worktree is clean.
4. Pull latest `main` and confirm current `main` CI is green.
5. Run `/changelog` for the stable base target version on `main`, commit the
changelog rewrite immediately, push, and pull/rebase. For beta releases,
keep the changelog heading as `## YYYY.M.PATCH`, not `## YYYY.M.PATCH-beta.N`.
6. Create `release/YYYY.M.PATCH` from that post-changelog `main` commit.
7. Make every repo version location match the beta tag before creating it.
8. Commit release preparation changes on the release branch and push the branch.
9. Immediately dispatch Actions > `OpenClaw Performance` from `main` with
`target_ref=<release-sha>`, `profile=release`, `repeat=3`, deep profiling
off, live OpenAI off, and regression failure off. Let it run in parallel
with preflight and validation work.
10. Run the fast local beta preflight from the release branch before any npm
preflight or publish. Require exact-SHA CI and root Dockerfile install-smoke
to be green before tagging. Keep the remaining expensive Docker, Parallels,
and published-package install/update lanes for after the beta is live unless
the operator asks to run them before beta publication.
11. For beta releases, skip mac app build/sign/notarize unless beta scope or a
release blocker specifically requires it. For stable releases, include the
mac app, signing, notarization, and appcast path.
12. Confirm the target npm version is not already published.
13. Create and push the git tag from the release branch.
14. Do not create or publish the matching GitHub release page yet. The real
publish workflow creates or undrafts it only after postpublish verification
and release evidence upload pass.
15. Dispatch Actions > `QA-Lab - All Lanes` against the release tag and wait
for the mock parity, live Matrix, and live Telegram credentialed-channel
lanes to pass.
16. Start `.github/workflows/openclaw-npm-release.yml` from the release branch
with `preflight_only=true` and choose the intended `npm_dist_tag` (`beta`
default; `latest` only for an intentional direct stable publish). Wait for it
to pass. Save that run id because the real publish requires it to reuse the
prepared npm tarball.
17. Before real publish, review the early performance run if it has completed.
Compare against earlier release evidence or clawgrit reports where
available. Call out minor regressions in the release proof; block on major
regressions unless waived or proven noisy.
18. For stable releases, start `.github/workflows/macos-release.yml` in
`openclaw/openclaw` and wait for the public validation-only run to pass.
19. For stable releases, start
`openclaw/releases-private/.github/workflows/openclaw-macos-validate.yml`
with the same tag and wait for the private mac validation lane to pass.
20. For stable releases, start
`openclaw/releases-private/.github/workflows/openclaw-macos-publish.yml`
with `preflight_only=true` and wait for it to pass. Save that run id because
the real publish requires it to reuse the notarized mac artifacts.
21. If any preflight or validation run fails, fix the issue on a new commit,
delete the tag and any accidental draft/incomplete GitHub release, recreate
the tag from the fixed commit, and rerun all relevant preflights from
scratch before continuing. Never reuse old preflight results after the
commit changes. Once the npm version exists, do not rerun the publish
workflow for that same version; finalize the existing draft/evidence state
manually or cut a correction tag. For pushed or published beta tags, do not
delete/recreate; increment to the next beta tag. For preflight-only failures
where npm did not publish the beta version, delete/recreate the same beta
tag and any accidental draft/incomplete prerelease at the fixed commit
instead of skipping a prerelease number.
22. Start `.github/workflows/openclaw-release-publish.yml` from the same branch with
the same tag for the real publish, choose `npm_dist_tag` (`beta` default,
`latest` only when you intentionally want direct stable publish), keep it
the same as the preflight run, and pass the successful npm
`preflight_run_id` plus the successful `full_release_validation_run_id`.
For stable publish, also pass the exact non-prerelease
`openclaw/openclaw-windows-node` tag as `windows_node_tag` and its
candidate-approved installer digest map as `windows_node_installer_digests`.
23. Wait for `npm-release` approval from `@openclaw/openclaw-release-managers`.
24. Wait for the real publish workflow to run postpublish verification,
create or update the GitHub release as a draft, upload dependency evidence,
promote and verify the required Windows Hub assets for stable releases,
append release verification proof, and only then undraft/publish it. If a
waited plugin publish or Windows Hub promotion fails after OpenClaw npm
succeeds, the workflow keeps the release draft with OpenClaw npm evidence
and exits red; do not undraft until the gap is repaired. The standalone
verifier command remains the first recovery probe:
`node --import tsx scripts/openclaw-npm-postpublish-verify.ts <published-version>`.
For a failed postpublish parent after successful publish children, also run
`pnpm release:verify-beta -- <published-version> ... --skip-github-release`
with the original child run IDs and an evidence output path before manually
recreating the workflow's draft, dependency evidence asset, proof section,
and publish step.
25. Run the postpublish beta verification roster. First scan current `main`
for critical fixes that landed after the release branch cut; backport only
important low-risk fixes before starting expensive lanes, or increment to
the next beta if the fix must change the already-published package. If any
lane fails after the beta package is published, fix, commit/push/pull,
increment to the next beta tag, and rerun the affected beta evidence. Once
the beta is live, start remote/manual rosters where they
can overlap safely, but keep local Docker and Parallels load controlled.
Ensure the full expensive roster has passed at least once before
stable/latest promotion. The roster includes the manual Actions >
`NPM Telegram Beta E2E` workflow against the exact published beta package.
If a pre-npm lane fails before any tag/package leaves the machine, fix and
rerun the same intended beta attempt. Repeat up to the operator's
authorized beta-attempt limit, normally 4.
26. Announce the beta/stable release on Discord best-effort using the configured secret workflow.
27. If the operator requested beta only, stop after beta verification and the
announcement.
28. If the stable release was published to `beta`, use the light stable
promotion roster when the matching beta already carried the full confidence
pass: published npm postpublish verify, Docker install/update smoke,
macOS-only Parallels install/update smoke, and required QA signal.
Then start the private
`openclaw/releases-private/.github/workflows/openclaw-npm-dist-tags.yml`
workflow to promote that stable version from `beta` to `latest`, then
verify `latest` now points at that version.
29. If the stable release was published directly to `latest` and `beta` should
follow it, start that same private dist-tag workflow to point `beta` at the
stable version, then verify both `latest` and `beta` point at that version.
30. For stable releases, start
`openclaw/releases-private/.github/workflows/openclaw-macos-publish.yml`
for the real publish with the successful private mac `preflight_run_id` and
wait for success.
31. Verify the successful real private mac run uploaded the `.zip`, `.dmg`,
and `.dSYM.zip` artifacts to the existing GitHub release in
`openclaw/openclaw`.
32. For stable releases, download `macos-appcast-<tag>` from the successful
private mac run, update `appcast.xml` on `main`, verify the feed, then
complete the **Close stable releases on main** gate.
33. For beta releases, publish the mac assets only when intentionally requested;
expect no shared production `appcast.xml` artifact and do not update the
shared production feed unless a separate beta feed exists.
34. After stable main closeout, verify npm and the attached release artifacts.
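Two naming rules from the sequence above are easy to get wrong by hand: beta releases keep the stable base changelog heading (step 5), and a pushed or published beta is never recreated, only incremented (step 21). A minimal sketch, assuming tags shaped like `vYYYY.M.PATCH` and `vYYYY.M.PATCH-beta.N`; both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical helpers.

# Changelog heading for a release tag. Beta tags keep the stable base heading.
changelog_heading() {
  local version="${1#v}"
  printf '## %s\n' "${version%%-beta.*}"
}

# Next beta tag after a pushed or published beta tag.
next_beta_tag() {
  local tag="$1" base n
  case "$tag" in
    v*-beta.*) ;;
    *) echo "not a beta tag: $tag" >&2; return 1 ;;
  esac
  base="${tag%-beta.*}"   # vYYYY.M.PATCH
  n="${tag##*-beta.}"     # N
  case "$n" in
    ''|*[!0-9]*|0[0-9]*) echo "bad beta number in: $tag" >&2; return 1 ;;
  esac
  printf '%s-beta.%d\n' "$base" "$((n + 1))"
}
```

For example, `changelog_heading v2026.5.3-beta.2` prints `## 2026.5.3`, and `next_beta_tag v2026.5.3-beta.2` prints `v2026.5.3-beta.3`. The version numbers are made up for illustration.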
## GHSA advisory work
- Use `openclaw-ghsa-maintainer` for GHSA advisory inspection, patch/publish flow, private-fork validation, and GHSA API-specific publish checks.

@@ -1,290 +0,0 @@
---
name: release-openclaw-nightly
description: "OpenClaw Tideclaw alpha/nightly release automation: isolated branches, local fixes, release CI, branch retention, and forward-port to main."
---
# Nightly Release
Use for Tideclaw/OpenClaw alpha/nightly release automation, manual alpha triggers, beta prep, release-branch repair, and post-release forward-port. Load `$release-private` if it exists before using Tideclaw host paths, cron ids, or Discord routing ids.
## Policy
- Alpha/nightly runs every 12h or by manual trigger.
- Beta is human-triggered from Discord from a proven alpha/release branch.
- Stable/latest always needs explicit human confirmation.
- Never publish from a dirty checkout or directly from `main`.
- Main can be busy or broken; alpha work must be isolated so transient main failures do not block a usable nightly.
- Publish only after release-branch proof is green.
- After a successful alpha, forward-port release-branch commits back to `main` and prove main CI green.
- Forward-port PRs contain only reusable fixes needed to make nightly/release checks pass. They must not contain alpha version bumps, release notes, changelog release entries, tags, generated artifacts, or state-file updates.
- Keep only alpha/nightly branches from the last 3 days, plus any branch with an active run, open PR, or release tag.
- Never run broad env/token dumps. For GitHub writes on the Tideclaw host, use the Tideclaw `gh` write wrapper below.
## Identity
Tideclaw should commit under its own machine identity on release branches and forward-port branches:
```bash
git config user.name "Tideclaw"
git config user.email "tideclaw@openclaw.ai"
```
This is good for auditability if commits are clearly machine-authored and gated by CI. Avoid direct pushes to protected `main`; forward-port via PR/automerge unless the repo policy explicitly allows the bot to push after green checks. Include human `Co-authored-by` only when a human supplied the patch or explicit commit text.
## Branch Shape
- Branch prefix: `tideclaw/alpha/`
- Branch name: `tideclaw/alpha/YYYY-MM-DD-HHMMZ`
- Base: current `origin/main` SHA at trigger time.
- State file: resolve from `$release-private` on the Tideclaw host.
- Release tag: `vYYYY.M.PATCH-alpha.N`
- npm dist-tag: `alpha`
`PATCH` is a sequential monthly release-train number, never the calendar day. Determine the alpha train from stable and beta releases; ignore alpha-only patch numbers when choosing the next train. Use one greater than the highest stable/beta patch for the month, then increment only `alpha.N` for repeated nightlies on that train. If a beta exists on that next patch, move alpha to the following train. Legacy alpha-only tags with inflated patch numbers do not advance beta/stable numbering.
Do not reuse old alpha branches for a new run. If rerunning the same base SHA, create a new timestamped branch and record why.
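The `PATCH` train rule in this section can be sketched as a small tag filter. This is an illustration under assumptions, not the release tooling: `next_alpha_tag` is a hypothetical name, numbering is assumed to start at `PATCH` 1 and `alpha.1` when nothing exists yet, and only git tags are considered (the real decision also consults npm versions and GitHub releases).

```bash
#!/usr/bin/env bash
# Sketch only: next_alpha_tag is a hypothetical helper name.
# Usage: git tag --list 'v*' | next_alpha_tag YYYY.M
next_alpha_tag() {
  local month="$1" tag rest patch n entry train
  local max_release_patch=0 max_alpha=0 alphas=""

  while IFS= read -r tag; do
    case "$tag" in
      "v$month."*) rest="${tag#"v$month."}" ;;  # PATCH, PATCH-beta.N, PATCH-alpha.N
      *) continue ;;                            # other month or unrelated tag
    esac
    patch="${rest%%-*}"
    case "$patch" in ''|*[!0-9]*|0[0-9]*) continue ;; esac
    case "$rest" in
      *-alpha.*)
        # Remember alpha tags as PATCH:N; they never advance the train.
        alphas="$alphas $patch:${rest##*-alpha.}" ;;
      "$patch"|"$patch"-beta.*)
        # Stable or beta tag: candidate for the highest released PATCH.
        if [ "$patch" -gt "$max_release_patch" ]; then max_release_patch="$patch"; fi ;;
    esac
  done

  # The alpha train is one greater than the highest stable/beta PATCH.
  train=$((max_release_patch + 1))
  for entry in $alphas; do
    patch="${entry%%:*}"
    n="${entry#*:}"
    case "$n" in ''|*[!0-9]*|0[0-9]*) continue ;; esac
    if [ "$patch" -eq "$train" ] && [ "$n" -gt "$max_alpha" ]; then max_alpha="$n"; fi
  done
  printf 'v%s.%d-alpha.%d\n' "$month" "$train" "$((max_alpha + 1))"
}
```

Because alpha tags are only recorded and never compared against `max_release_patch`, a legacy alpha-only tag with an inflated patch number has no effect on the chosen train.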
## Start
1. Work in the Tideclaw host checkout from `$release-private`.
2. Fetch first:
```bash
git fetch origin main --tags --prune
git switch main
git merge --ff-only origin/main
BASE_SHA="$(git rev-parse origin/main)"
BRANCH="tideclaw/alpha/$(date -u +%Y-%m-%d-%H%MZ)"
git switch -c "$BRANCH" "$BASE_SHA"
```
3. Read repo release docs/scripts before changing anything:
- `AGENTS.md`
- release docs under `docs/`
- release scripts under `scripts/`
- `.github/workflows/*release*`
4. Compare `$BASE_SHA` with the last successful alpha state and current git/npm/GitHub alpha tags. If already released, report skip and do not publish.
Manual trigger:
```bash
CRON_ID="<from release-private>"
OPENCLAW_ALLOW_ROOT=1 openclaw cron run "$CRON_ID" --expect-final --timeout 21600000
```
## Discord Alpha Trigger
Tideclaw may run alpha immediately from Discord when a maintainer mentions Tideclaw in `#releases` or `#maintainers`.
Accepted shapes:
```text
@Tideclaw run alpha now
@Tideclaw alpha release from main now
@Tideclaw trigger alpha
```
Rules:
1. Treat this as a manual alpha trigger equivalent to the alpha cron job.
2. Start from current `origin/main` and create a fresh `tideclaw/alpha/YYYY-MM-DD-HHMMZ` branch.
3. Follow the normal alpha workflow: reuse prior fixes, run local checks, fix on the alpha branch, run release CI, publish alpha after green gates, then forward-port reusable fixes via fixes-only PR.
4. If another alpha/beta/stable release run is already active, report the active branch/run and stop.
5. `#maintainers` trigger requires an explicit Tideclaw mention; do not react to unmentioned release chatter there.
6. Resolve Discord role/user ids and live host hotfix notes from `$release-private`.
## Discord Beta Trigger
Tideclaw may run beta releases from `#releases` or mentioned `#maintainers` commands only when a maintainer sends an explicit beta trigger. Treat this as human approval for beta, not for stable/latest.
Accepted shapes:
```text
@Tideclaw beta release from vYYYY.M.PATCH-alpha.N
@Tideclaw beta release from tideclaw/alpha/YYYY-MM-DD-HHMMZ
@Tideclaw beta release from latest proven alpha
```
Rules:
1. Require the words `beta release` and a source alpha tag/branch, or `latest proven alpha`.
2. If the source is ambiguous, ask one clarifying question in `#releases` and stop.
3. Verify the source alpha first: GitHub release, npm `alpha` package, release CI, recorded state file, and branch/tag SHA.
4. Create a fresh beta branch `tideclaw/beta/YYYY-MM-DD-HHMMZ` from the proven alpha source, not directly from a moving `main`.
5. Reuse/squash only stabilization fixes already proven on alpha. Do not import unrelated alpha release mechanics unless the beta release docs require them.
6. Compute beta as `vYYYY.M.PATCH-beta.N`, matching npm `--tag beta`. Ignore alpha-only patch numbers when selecting the beta train.
7. Run beta release validation/preflight/full release CI and fix failures on the beta branch.
8. Publish beta only after green beta gates. Use GitHub Actions/OIDC, never direct npm publish from the host.
9. Final Discord summary must include source alpha, beta tag/version, branch, fix commits, workflow run IDs, npm/GitHub proof, and any skipped/blocked reason.
10. After beta publishes, forward-port reusable fixes to `main` using the same fixes-only PR rules below.
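Rules 1 and 2 amount to a small parser over the trigger text. A rough sketch, assuming the message text is already available as a string; `parse_beta_source` and its return codes are hypothetical, and it does not check the Tideclaw mention, the channel, or the sender's maintainer role.

```bash
#!/usr/bin/env bash
# Sketch only: parse_beta_source is a hypothetical helper name.
# Prints "tag <tag>", "branch <branch>", or "latest-proven-alpha".
# Returns 1 when the message is not a beta trigger, 2 when the source is
# missing or ambiguous (ask one clarifying question and stop).
parse_beta_source() {
  local message="$1" found
  case "$message" in
    *"beta release"*) ;;
    *) return 1 ;;
  esac

  found="$(printf '%s\n' "$message" \
    | grep -oE 'v[0-9]{4}\.[0-9]+\.[0-9]+-alpha\.[0-9]+' | head -n 1 || true)"
  if [ -n "$found" ]; then printf 'tag %s\n' "$found"; return 0; fi

  found="$(printf '%s\n' "$message" \
    | grep -oE 'tideclaw/alpha/[0-9]{4}-[0-9]{2}-[0-9]{2}-[0-9]{4}Z' | head -n 1 || true)"
  if [ -n "$found" ]; then printf 'branch %s\n' "$found"; return 0; fi

  case "$message" in
    *"latest proven alpha"*) printf 'latest-proven-alpha\n'; return 0 ;;
  esac
  return 2
}
```

A parsed source is only a candidate. It still has to pass the verification in rule 3 before a beta branch is created.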
## Reuse Prior Fixes
Before running checks, mine recent Tideclaw alpha branches for fixes already made during previous release attempts:
1. Read the Tideclaw state file from `$release-private` for the last successful alpha branch and fix commit SHAs.
2. List recent remote branches:
```bash
git for-each-ref refs/remotes/origin/tideclaw/alpha --format='%(refname:short) %(committerdate:iso-strict)'
```
3. Consider only Tideclaw alpha branches from the last 3 days plus the last successful alpha branch.
4. For each candidate branch, inspect commits that are not in current `origin/main`:
```bash
git log --no-merges --reverse --format='%H%x09%s' origin/main..origin/tideclaw/alpha/YYYY-MM-DD-HHMMZ
```
5. Cherry-pick only real stabilization fixes that still apply to the new alpha branch. Prefer commits recorded as `fixCommitShas` in the state file.
6. Skip version bumps, changelog release entries, tag artifacts, generated release notes, state-file-only commits, and one-off debug instrumentation.
7. If a cherry-pick conflicts, inspect whether current main already contains an equivalent fix. If not, resolve minimally and keep the commit message clear.
8. Record reused commit SHAs separately from newly authored fix SHAs in the alpha state and final Discord summary.
Use `git cherry`, `git range-diff`, and targeted test reruns to avoid duplicating fixes already present on `main`.
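As one way to apply `git cherry` here: it marks each candidate commit with `+` when no equivalent patch exists upstream and `-` when one does. The filter below keeps only the `+` commits. It is a sketch; `unmerged_fix_shas` and `CANDIDATE` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: unmerged_fix_shas is a hypothetical helper name.
# Reads `git cherry -v` output on stdin and prints the SHAs marked "+",
# that is, commits with no patch-equivalent change in the upstream ref.
unmerged_fix_shas() {
  local marker sha _subject
  while read -r marker sha _subject; do
    if [ "$marker" = "+" ]; then
      printf '%s\n' "$sha"
    fi
  done
}

# Example (CANDIDATE is a placeholder for one recent alpha branch):
# git cherry -v origin/main "origin/$CANDIDATE" | unmerged_fix_shas
```

The result still needs the manual filtering from steps 5 and 6, because patch equivalence says nothing about whether a commit is a version bump or debug instrumentation.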
## Repair Loop
Use the branch as a release-candidate repair surface:
1. Run narrow local checks first: changed tests, release preflight, type/lint/build gates required by release docs.
2. If local checks fail, fix on the alpha branch with minimal commits.
3. Commit each coherent fix as Tideclaw.
4. Re-run the failed local check after each fix.
5. Do not hide failures by editing baselines, expected-failure lists, ignore files, or release inventory unless the release docs explicitly require it and the diff is justified.
6. If a failure is flaky, rerun once; if still red, treat it as real.
7. If the fix is clearly useful for main, keep it small and forward-portable. Avoid broad refactors during alpha stabilization.
Commit examples:
```bash
git add <files>
git commit -m "fix: stabilize alpha release preflight"
git push -u origin "$BRANCH"
```
## Release CI
After local proof:
1. Compute the next `vYYYY.M.PATCH-alpha.N` from existing git tags, npm versions, and GitHub releases. Select `PATCH` from stable/beta trains, not the date or the highest alpha-only patch. Reuse the same alpha train and increment `alpha.N` until that patch has a beta; after a beta exists, use the following patch for new alpha builds.
2. Make the alpha branch package version and release metadata match that tag, commit it, and push the branch.
3. Run release validation from the alpha branch, using GitHub CLI, not browser/fetch tools. On the Tideclaw host, bare `gh` is a read-only Codex sandbox wrapper; use `/usr/local/bin/gh-tideclaw-write` for write-capable commands such as `workflow run`, `run cancel`, and publish dispatch:
```bash
GH="/usr/local/bin/gh-tideclaw-write"
SHA="$(git rev-parse HEAD)"
TAG="v$(node -p "require('./package.json').version")"
BRANCH="$(git branch --show-current)"
"$GH" workflow run full-release-validation.yml --repo openclaw/openclaw --ref "$BRANCH" \
-f ref="$BRANCH" \
-f release_profile=beta \
-f rerun_group=all
"$GH" workflow run openclaw-npm-release.yml --repo openclaw/openclaw --ref "$BRANCH" \
-f tag="$SHA" \
-f preflight_only=true \
-f npm_dist_tag=alpha
```
4. Watch the exact workflow run IDs and head SHA with `gh run list`, `gh run view`, and `gh api`. Read-only `gh` is fine for polling; use `$GH` only when a command mutates GitHub. Do not use Codex browser/fetch for GitHub API polling; prior Tideclaw runs failed there after successful preflight.
5. For alpha, blocking gates are the ones Tideclaw can repair directly or that prove package safety: normal CI, plugin prerelease, npm preflight, package preparation, install smoke, tag/reachability, and publish verification. Treat cross-OS, live channel, QA Lab, package acceptance, long Docker E2E, and Telegram package E2E failures as advisory; report them in Discord and continue if the blocking gates are green.
- If `rerun_group=all` is stuck only on advisory lanes after CI, plugin prerelease, npm preflight, package preparation, and install smoke are green, dispatch a focused Full Release Validation on the same head with `-f rerun_group=install-smoke`. Use that successful focused Full Release Validation run as the publish proof, and include the separate CI/plugin/full advisory run IDs in the Discord summary.
6. If a blocking gate fails, fix on the alpha branch, push, and rerun only the failed or required release CI. If the commit changes, discard old preflight/full-validation run IDs and rerun them for the new head.
7. After full validation and npm preflight are green on the same branch head, create and push the release tag from that exact commit:
```bash
git tag -a "$TAG" "$SHA" -m "openclaw ${TAG#v}"
git push origin "$TAG"
```
8. Dispatch the publish wrapper from the same alpha branch. Use the successful npm preflight run ID and full release validation run ID from the same head SHA:
```bash
"$GH" workflow run openclaw-release-publish.yml --repo openclaw/openclaw --ref "$BRANCH" \
-f tag="$TAG" \
-f preflight_run_id="$NPM_PREFLIGHT_RUN_ID" \
-f full_release_validation_run_id="$FULL_RELEASE_VALIDATION_RUN_ID" \
-f npm_dist_tag=alpha \
-f plugin_publish_scope=all-publishable \
-f publish_openclaw_npm=true \
-f release_profile=beta \
-f wait_for_clawhub=false
```
9. Watch the publish wrapper plus child runs. If `openclaw-npm-release.yml` is waiting on the `npm-release` environment and Tideclaw cannot approve it, report that as the only blocker; do not call the release done.
10. Do not publish npm directly from the host; use GitHub Actions/OIDC.
Important: `openclaw-npm-release.yml` with `preflight_only=true` only prepares artifacts. It does not publish. A successful alpha requires the later `openclaw-release-publish.yml` wrapper, a pushed git tag, npm `alpha` dist-tag proof, and a GitHub prerelease.
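The blocking-versus-advisory split from step 5 can be sketched as a decision over gate results. Everything named here is an assumption for illustration: `alpha_publish_decision`, the input format, and the gate slugs are hypothetical and do not correspond to real job names. Unknown gates are treated as blocking so the sketch fails closed.

```bash
#!/usr/bin/env bash
# Sketch only: function name, input format, and gate slugs are hypothetical.
# Input: one "<gate> <status>" pair per line; any status except "success"
# counts as a failure. Returns non-zero when a blocking gate failed.
alpha_publish_decision() {
  local gate status blocked="" advisory=""
  while read -r gate status; do
    if [ -z "$gate" ] || [ "$status" = "success" ]; then continue; fi
    case "$gate" in
      # Advisory lanes: report in Discord, do not block the alpha.
      cross-os|live-channel|qa-lab|package-acceptance|docker-e2e-long|telegram-package-e2e)
        advisory="$advisory $gate" ;;
      # Everything else, including unknown gates, blocks publish.
      *)
        blocked="$blocked $gate" ;;
    esac
  done
  if [ -n "$blocked" ]; then echo "blocked:$blocked"; fi
  if [ -n "$advisory" ]; then echo "advisory:$advisory"; fi
  if [ -n "$blocked" ]; then return 1; fi
  echo "publish"
}
```

The advisory line is meant for the final Discord summary, so advisory failures stay visible even when the alpha proceeds.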
## Verify Published Alpha
Release is not done until all are true:
- GitHub tag exists.
- GitHub Release exists and is marked prerelease.
- Release body links npm version page, registry tarball, integrity, and CI/proof.
- `npm view openclaw@<version>` shows the exact version, dist-tag `alpha`, tarball, integrity, and publish time.
- Installed/package smoke follows repo release docs.
- The Tideclaw state file from `$release-private` records version, tag, base SHA, branch, fix commit SHAs, workflow run IDs, npm integrity, and timestamp.
Final Discord summary in `#releases`:
- tag/version
- base SHA
- branch
- fix commits
- workflow run IDs
- npm/GitHub proof
- skipped/blocked reason if not released
Use Discord-safe Markdown links with angle-bracket targets. Never print secrets.
## Forward-Port
After a successful alpha, raise a fixes-only PR back to `main`:
1. Create/update a forward-port branch from current `origin/main`:
```bash
git fetch origin main --prune
git switch -c "tideclaw/forward-port/$(date -u +%Y-%m-%d-%H%MZ)" origin/main
```
2. Cherry-pick only release-branch commits that are real fixes required to make nightly/release checks pass.
3. Exclude alpha version bumps, changelog release entries, release notes, tag artifacts, generated release assets, state-file-only commits, and any commit whose only purpose was publishing the alpha.
4. If a commit mixes a real fix with release/version changes, split it: replay only the fix hunks into a new commit on the forward-port branch.
5. Resolve conflicts in favor of the minimal main-compatible fix.
6. Run the relevant changed/local gate.
7. Push and open a PR, or use the repo's allowed bot merge path.
8. Wait for required main CI to go green. If CI fails, fix on the forward-port branch and rerun.
9. Report the PR/merge SHA and any commits intentionally not forward-ported.
If `origin/main` is independently red before the forward-port, document the unrelated failing check and still keep the forward-port PR green against its head when possible.
## Branch Retention
Before and after each run, prune old alpha branches:
1. List `origin/tideclaw/alpha/*`.
2. Keep branches whose timestamp is within the last 3 days UTC.
3. Keep branches referenced by a live workflow run, open PR, release tag, or state file.
4. Delete only Tideclaw-owned alpha branches:
```bash
git push origin --delete tideclaw/alpha/YYYY-MM-DD-HHMMZ
```
Never delete human branches, beta branches, stable branches, or unknown prefixes.
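The age check in step 2 can be done on the branch name alone, since the timestamp is fixed-width. A minimal sketch with hypothetical helper names; it only classifies by prefix and age, so a `candidate` result still has to pass the step 3 checks before any delete.

```bash
#!/usr/bin/env bash
# Sketch only: stamp_digits and retention_action are hypothetical helpers.

# Reduce "…/YYYY-MM-DD-HHMMZ" to the sortable integer YYYYMMDDHHMM.
stamp_digits() {
  printf '%s' "${1##*/}" | tr -cd '0-9'
}

# Usage: retention_action <branch> <cutoff-stamp>
# Prints "keep" (recent), "candidate" (older than the cutoff), or
# "skip" (not a Tideclaw alpha branch, so never a deletion target).
retention_action() {
  local branch="${1#origin/}" cutoff="$2"
  case "$branch" in
    tideclaw/alpha/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]Z) ;;
    *) echo "skip"; return 0 ;;
  esac
  if [ "$(stamp_digits "$branch")" -ge "$(stamp_digits "$cutoff")" ]; then
    echo "keep"
  else
    echo "candidate"
  fi
}

# Cutoff three days back in UTC. GNU date shown; BSD/macOS date would use
# `date -u -v-3d +%Y-%m-%d-%H%MZ` instead.
# CUTOFF="$(date -u -d '3 days ago' +%Y-%m-%d-%H%MZ)"
```

Anything that does not match the exact `tideclaw/alpha/` shape returns `skip`, which keeps human, beta, stable, and unknown-prefix branches out of the delete path.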
## Stop Conditions
Stop and report clearly if:
- release docs/scripts disagree on versioning or publish path
- required secrets/auth are unavailable
- GitHub Actions cannot be dispatched or observed
- a required release gate stays red after a real fix attempt
- npm/GitHub state disagrees after publish
- forward-port cannot be made green without a larger product decision

View File

@@ -1,234 +0,0 @@
---
name: release-openclaw-plugin-testing
description: Plan and run pre-release OpenClaw plugin validation across bundled plugins, package artifacts, lifecycle commands, doctor/fix, config round-trip, gateway startup, SDK compatibility, Docker E2E, Package Acceptance, and Testbox proof.
---
# OpenClaw Pre-Release Plugin Testing
Use this skill when the user asks for plugin release confidence, plugin lifecycle
sweeps, package-artifact plugin proof, or "what else should we test before
release?" It complements `openclaw-testing`; use that skill too when choosing
the cheapest safe runner or debugging a failing lane.
## Goal
Prove the plugin system as a product surface, not just as source tests:
- bundled plugin lifecycle: install, inspect, enable, disable, uninstall
- package artifact behavior from a clean `HOME`
- doctor/fix/config validation and idempotence
- config discovery and config round-trip
- status/log visibility and diagnostics
- gateway startup/bootstrap with plugin metadata snapshots
- public SDK compatibility for real external plugins
- live-ish provider/channel probes only when safe credentials exist
## First Checks
From the OpenClaw repo root:
```bash
pnpm docs:list
git status --short --branch
readlink node_modules
pnpm changed:lanes --json
```
In Codex worktrees under `.codex/worktrees`, `node_modules` must be a symlink to
the main OpenClaw checkout. Do not run `pnpm install` there. For broad or
package-heavy proof, use Blacksmith Testbox or GitHub Actions.
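The symlink requirement can be checked before any package command runs. The sketch below is an illustration only; the function name `node_modules_is_symlink` is hypothetical and the check is limited to what the paragraph above states (that `node_modules` is a symlink), without verifying where it points.

```bash
# Succeed only when node_modules in the given directory (default: current
# directory) is a symlink, as expected in a Codex worktree.
node_modules_is_symlink() {
  dir="${1:-.}"
  if [ -L "$dir/node_modules" ]; then
    echo "ok: node_modules -> $(readlink "$dir/node_modules")"
  else
    echo "node_modules is not a symlink; do not run pnpm install here" >&2
    return 1
  fi
}
```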
## Runner Choice
Prefer this order:
1. **GitHub Package Acceptance** for installable-package product proof.
2. **`ci-build-artifacts-testbox.yml` Testbox** when Docker/package lanes need
seeded `dist`, `dist-runtime`, and package caches.
3. **`ci-check-testbox.yml` Testbox** for source checks, targeted Vitest,
package-boundary checks, or focused Docker lanes.
4. **Local targeted commands only** for small format/static/unit probes.
Avoid long package Docker runs from a stale sparse worktree. If Testbox sync
reports hundreds of changed files or starts deleting package inputs, stop and
warm a fresh box from current `main`, or switch to Package Acceptance.
## Existing Baseline
Run or verify these before inventing new coverage:
```bash
OPENCLAW_TESTBOX=1 pnpm check:changed
pnpm run test:extensions:package-boundary:canary
pnpm run test:extensions:package-boundary:compile
pnpm test:docker:plugins
OPENCLAW_PLUGINS_E2E_CLAWHUB=0 pnpm test:docker:plugins
pnpm test:docker:plugin-update
pnpm test:docker:bundled-channel-deps:fast
```
For full bundled install/uninstall proof, shard the packaged sweep:
```bash
OPENCLAW_BUNDLED_PLUGIN_SWEEP_TOTAL=8 \
OPENCLAW_BUNDLED_PLUGIN_SWEEP_INDEX=<0-7> \
pnpm test:docker:bundled-plugin-install-uninstall
```
Expected current packaged scope: 116 public bundled plugins over shards `0-7`.
Private QA plugins are source-mode only unless a package explicitly includes
them.
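When the shards are run one after another on a single box rather than as separate jobs, a loop along these lines may help. It is a sketch under assumptions: `run_bundled_sweep` and the `SWEEP_CMD` override are hypothetical names introduced here so the loop can be dry-run with `echo`; only the two environment variables and the pnpm script name come from the command above.

```bash
# Run every shard of the packaged sweep in sequence and collect failures.
# SWEEP_CMD is a hypothetical override (default: pnpm) for dry runs.
run_bundled_sweep() {
  total="${1:-8}"
  failed=""
  index=0
  while [ "$index" -lt "$total" ]; do
    if ! OPENCLAW_BUNDLED_PLUGIN_SWEEP_TOTAL="$total" \
         OPENCLAW_BUNDLED_PLUGIN_SWEEP_INDEX="$index" \
         ${SWEEP_CMD:-pnpm} test:docker:bundled-plugin-install-uninstall; then
      # Remember the shard and keep going so one failure does not hide others.
      failed="$failed $index"
    fi
    index=$((index + 1))
  done
  if [ -n "$failed" ]; then
    echo "failed shards:$failed" >&2
    return 1
  fi
}
```

Setting `SWEEP_CMD=echo` before calling `run_bundled_sweep 8` prints the script name once per shard without starting Docker.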
## Confidence Matrix
Use this matrix for pre-release signoff. Record pass/fail, run URL/Testbox ID,
package SHA/version, and skipped-live reason.
| Surface | Proof | Preferred runner |
| --- | --- | --- |
| Package artifact | Package Acceptance `suite_profile=package` or custom lanes | GitHub Actions |
| Bundled lifecycle | 8-shard `test:docker:bundled-plugin-install-uninstall` | Testbox or release Docker |
| External plugins | `test:docker:plugins` and `plugins-offline` | Testbox/package acceptance |
| Update no-op | `test:docker:plugin-update` | Testbox/package acceptance |
| Channel runtime deps | `test:docker:bundled-channel-deps:fast` plus key channels | Testbox/package acceptance |
| Doctor/fix | seeded bad configs + `doctor --fix --non-interactive` | new Docker/Testbox harness |
| Config round-trip | `config set/get`, inspect, doctor, reload, diff hash | new Docker/Testbox harness |
| Gateway bootstrap | clean `HOME`, plugin groups enabled/disabled, status JSON | new Docker/Testbox harness |
| SDK compatibility | directory, tgz, and `file:` external plugins using SDK subpaths | `test:docker:plugins` plus new smoke |
| Live-ish | redacted provider/channel probes, only for credentials present in env | Testbox live lanes |
## Package Acceptance Plan
Use this when validating a release branch, beta, or candidate package:
```bash
gh workflow run package-acceptance.yml \
--repo openclaw/openclaw \
--ref main \
-f workflow_ref=main \
-f source=ref \
-f package_ref=<branch-or-sha> \
-f suite_profile=custom \
-f docker_lanes='plugins-offline plugin-update bundled-channel-deps-compat doctor-switch update-channel-switch config-reload mcp-channels npm-onboard-channel-agent' \
-f telegram_mode=mock-openai
```
Use `source=npm -f package_spec=openclaw@beta` for published beta proof. Keep
`workflow_ref` as trusted current harness code unless the release process says
otherwise.
## New Testbox Harness Plan
If more certainty is needed, add or run a `plugin-lifecycle-matrix` Docker lane
that uses one package tarball and sharded plugin lists. Per plugin:
1. Start with a clean `HOME`.
2. Capture `plugins list --json`.
3. `plugins install <id>`.
4. `plugins inspect <id> --json`.
5. `plugins disable <id>`, then assert disabled visibility.
6. `plugins enable <id>`, except config-required plugins without config.
7. `plugins registry --refresh`.
8. `doctor --non-interactive`.
9. `plugins uninstall <id> --force`.
10. Assert no config entry, allow/deny residue, install record, managed dir, or
bundled `dist/extensions/...` load path remains.
11. Assert diagnostics contain no `level: "error"` and output redacts
secret-looking values.
Keep `memory-lancedb` special: it is config-required. First assert install does
not enable it without embedding config, then run a second configured case.
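Steps 1 through 9 above can be sketched as one shell function per plugin. This is an illustration, not the harness itself: `plugin_lifecycle`, the `OPENCLAW_BIN` override, and the `config-required` argument are hypothetical names, and the `openclaw` binary name is assumed from the `openclaw status --json` command used elsewhere in this skill. The residue and diagnostics assertions in steps 10 and 11 are left out because their exact paths and output shapes are not given here.

```bash
# Walk one plugin through steps 1-9. OPENCLAW_BIN is a hypothetical override
# (default: openclaw) so the sequence can be dry-run with `echo`.
# Pass "config-required" as the second argument to skip step 6.
plugin_lifecycle() {
  id="$1"
  skip_enable="${2:-}"
  (
    # Step 1: clean HOME, scoped to this subshell. The directory is left in
    # place afterwards so residue checks can inspect it.
    HOME="$(mktemp -d)" || exit 1
    export HOME
    oc="${OPENCLAW_BIN:-openclaw}"
    "$oc" plugins list --json &&                    # step 2
    "$oc" plugins install "$id" &&                  # step 3
    "$oc" plugins inspect "$id" --json &&           # step 4
    "$oc" plugins disable "$id" &&                  # step 5
    { [ "$skip_enable" = "config-required" ] ||
      "$oc" plugins enable "$id"; } &&              # step 6
    "$oc" plugins registry --refresh &&             # step 7
    "$oc" doctor --non-interactive &&               # step 8
    "$oc" plugins uninstall "$id" --force           # step 9
  )
}
```

The commands are chained with `&&` rather than relying on `set -e`, so the function still stops at the first failing step when it is called inside an `if` or before `||`.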
## Doctor/Fix Matrix
Seed bad states and require `doctor --fix --non-interactive` to repair them,
then run doctor again and require idempotence:
- stale `plugins.allow`
- stale `plugins.entries`
- stale channel config for missing channel plugin
- invalid `plugins.entries.<id>.config`
- packaged bundled path in `plugins.load.paths`
- legacy `plugins.installs`
- disabled channel/plugin config that must not stage runtime deps
- root-owned global package tree that must remain unmodified
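The idempotence requirement above can be checked by fingerprinting the config after each run. The sketch below assumes the caller knows which file the repair touches; `config_fingerprint` and `assert_idempotent_fix` are hypothetical helper names, and the config path is passed in rather than guessed.

```bash
# Fingerprint a file with POSIX cksum.
config_fingerprint() {
  cksum < "$1"
}

# Run a repair command twice against a config file: the first run may change
# the file, the second run must leave it byte-for-byte identical.
assert_idempotent_fix() {
  config_file="$1"
  shift
  "$@" || return 1
  after_first="$(config_fingerprint "$config_file")"
  "$@" || return 1
  after_second="$(config_fingerprint "$config_file")"
  if [ "$after_first" != "$after_second" ]; then
    echo "not idempotent: second run changed $config_file" >&2
    return 1
  fi
}
```

A call might look like `assert_idempotent_fix "$CONFIG_PATH" openclaw doctor --fix --non-interactive`, where `CONFIG_PATH` is a placeholder for wherever the seeded config lives in the harness.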
## Gateway Bootstrap Matrix
Start packaged OpenClaw in Docker with clean state:
- provider plugins enabled, no credentials: ready with warnings, no crash
- channel plugins configured disabled: no runtime deps staged
- startup-activation plugins enabled: ready and reflected in status
- invalid single plugin config: bad plugin skipped/quarantined, others remain
Assert:
- gateway reaches ready
- `openclaw status --json` includes plugin diagnostics
- `openclaw plugins inspect --all --json` is parseable
- package tree is not mutated
- logs contain no raw tokens
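The last assertion can be approximated with a pattern scan over the captured logs. This is a rough sketch: the function name `logs_have_no_raw_tokens` is hypothetical, and the patterns are illustrative guesses at common token shapes, not OpenClaw's own redaction rules. A real harness would use whatever secret formats the configured providers and channels actually issue.

```bash
# Succeed only when the log file contains no secret-looking strings.
# The patterns below are illustrative assumptions, not an exhaustive list.
logs_have_no_raw_tokens() {
  if [ ! -r "$1" ]; then
    echo "cannot read log file: $1" >&2
    return 2
  fi
  # -q keeps matches off the terminal so the scan itself never leaks a value.
  if grep -E -q \
      -e 'sk-[A-Za-z0-9_-]{20,}' \
      -e 'gh[pousr]_[A-Za-z0-9]{36,}' \
      -e 'xox[abprs]-[A-Za-z0-9-]{10,}' \
      -e 'Bearer [A-Za-z0-9._-]{20,}' \
      "$1"; then
    echo "raw token-like value found in $1" >&2
    return 1
  fi
}
```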
## Config Round-Trip Representatives
Use representative plugin families instead of every plugin for deep config
round-trip:
- providers: `openai`, `anthropic`, `mistral`, `openrouter`
- channels: `telegram`, `discord`, `slack`, `whatsapp`
- memory: `memory-lancedb`
- feature/runtime: `browser`, `acpx`, `tokenjuice`
For each representative:
1. Write config through CLI when possible.
2. Read it back through `config get` or JSON.
3. Run `plugins inspect`.
4. Run `doctor --non-interactive`.
5. Trigger gateway config reload if applicable.
6. Compare config hash before/after no-op commands.
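Steps 1 and 2 above amount to a write-then-read comparison. The sketch below assumes the CLI accepts `config set <key> <value>` and that `config get <key>` prints only the stored value; both argument shapes are assumptions, as are the function name `config_round_trip` and the `OPENCLAW_BIN` override used to swap in a stub.

```bash
# Write a value, read it back, and fail on any mismatch.
# Assumes `config set <key> <value>` and `config get <key>` argument order.
config_round_trip() {
  oc="${OPENCLAW_BIN:-openclaw}"
  key="$1"
  want="$2"
  "$oc" config set "$key" "$want" || return 1
  got="$("$oc" config get "$key")" || return 1
  if [ "$got" != "$want" ]; then
    # Report the key only; the value may be a secret.
    echo "round-trip mismatch for $key" >&2
    return 1
  fi
}
```

If `config get` returns JSON or a decorated value in practice, the comparison would need to normalize both sides first.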
## External SDK Smoke
In a package Docker lane, create tiny external plugins and install them from:
- local directory
- `.tgz`
- `file:` npm spec
Cover CJS and ESM shapes, plus at least one plugin importing focused
`openclaw/plugin-sdk/*` subpaths. Assert `plugins inspect` sees its tool,
gateway method, CLI command, or service.
## Live-Ish Probe Rules
Before live-ish work, source allowed env in Testbox and generate a redacted
availability matrix: present/missing only, never values.
Only run probes for credentials that exist. Prefer auth/catalog/status probes
over sending user-visible messages. If a probe might contact an external user,
channel, or workspace, stop and ask the user.
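The redacted availability matrix described above can be produced with a helper that reports presence only. This is a minimal sketch; `env_availability` is a hypothetical name, and the variable names in the usage note are placeholders rather than the skill's actual credential list.

```bash
# Print NAME=present or NAME=missing for each variable name given.
# Values are never echoed.
env_availability() {
  for name in "$@"; do
    # Reject anything that is not a plain variable name before using eval.
    case "$name" in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        echo "invalid variable name" >&2
        return 2
        ;;
    esac
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      echo "$name=present"
    else
      echo "$name=missing"
    fi
    unset value
  done
}
```

For example, `env_availability OPENAI_API_KEY TELEGRAM_BOT_TOKEN` prints one `present` or `missing` line per name, which can be pasted into the report without exposing any value.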
## Reporting
Report in this shape:
```text
package/ref:
Testbox IDs / run URLs:
matrix:
bundled lifecycle:
package acceptance:
doctor/fix:
gateway bootstrap:
config round-trip:
sdk external:
live-ish:
failures:
skips:
next highest-value gap:
```
Say clearly when a failure is Testbox sync/env damage rather than product
behavior, and prove that with a clean rerun or current-main comparison.
