Compare commits

..

2 Commits

Author SHA1 Message Date
Peter Steinberger
172b6e3ba2 fix: harden memory plugin config (#1149) (thanks @radek-paclt) 2026-01-18 07:09:49 +00:00
Radek Paclt
e0b1561e16 feat(memory): add lifecycle hooks and vector memory plugin
Add plugin lifecycle hooks infrastructure:
- before_agent_start: inject context before agent loop
- agent_end: analyze conversation after completion
- 13 hook types total (message, tool, session, gateway hooks)

Memory plugin implementation:
- LanceDB vector storage with OpenAI embeddings
- kind: "memory" to integrate with upstream slot system
- Auto-recall: injects <relevant-memories> when context found
- Auto-capture: stores preferences, decisions, entities
- Rule-based capture filtering with 0.95 similarity dedup
- Tools: memory_recall, memory_store, memory_forget
- CLI: clawdbot ltm list|search|stats

Plugin infrastructure:
- api.on() method for hook registration
- Global hook runner singleton for cross-module access
- Priority ordering and error catching

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-18 06:59:03 +00:00
21041 changed files with 423368 additions and 4562066 deletions

View File

@@ -1,37 +0,0 @@
# Telegram Maintainer Decisions
Use this page during Telegram PR review. These are intentional maintainer decisions, not incidental implementation details.
Verified against Telegram Bot API 10.0, May 8 2026.
## Streaming
- Do not reintroduce `sendMessageDraft` for answer streaming. Telegram drafts are ephemeral 30-second previews in private chats; final delivery still requires a separate `sendMessage`. OpenClaw uses `sendMessage` plus `editMessageText`, then finalizes in place so the user sees one persistent answer.
- Streaming owns one visible preview message. Edit it forward. Do not send an extra final bubble unless the final edit genuinely failed.
- Keep the first-preview debounce. If a provider sends token-sized deltas, coalesce them into cumulative preview text instead of removing the debounce.
- Respect Telegram limits in the Telegram layer. Text over 4096 chars chains into continuation messages. Polls keep the current Bot API 12-option cap.
## Telegram API Ownership
- Prefer grammY primitives and Telegram-native helpers when they model the behavior directly. Avoid custom Bot API wrappers for behavior grammY already owns.
- Throttling is bot-token scoped. All Telegram API clients for the same token share one grammY `apiThrottler()` instance.
- Do not silently retry failed topic sends without topic metadata. A wrong-surface success is worse than a loud Telegram error.
- DM topics and forum topics are distinct. `direct_messages_topic_id` and `message_thread_id` are not interchangeable.
## Context And Authorization
- Reply context comes from OpenClaw-observed messages. Bot API updates expose `reply_to_message`, but there is no arbitrary `getMessage(chat, id)` hydration path later.
- Current local chat context must outrank stale reply ancestry in the prompt. Old replied-to messages should not look like the active conversation.
- Pairing is DM-only. Group and topic authorization need explicit config allowlists.
- Telegram allowlists use numeric sender IDs. Usernames are optional, mutable, and not a reliable arbitrary-user lookup key in the Bot API.
- Group and channel visible replies are policy-controlled. Normal room replies stay private unless `messages.groupChat.visibleReplies: "automatic"` is set or the agent explicitly calls `message.send`.
## Interactive Surfaces
- Native callbacks stay structured. Approval, native command, plugin, select, and multiselect callbacks must not fall through as raw callback text.
- Preserve callback values exactly, including delimiters such as `env|prod`.
- Native slash commands should remain fast-pathable before full workspace and agent-turn setup.
## Review Standard
Telegram behavior PRs need real Telegram proof when they touch transport, streaming, topics, callbacks, authorization, or reply context. Prefer the bot-to-bot QA lane or an equivalent live Telegram probe over synthetic-only validation.

View File

@@ -1,88 +0,0 @@
---
name: agent-transcript
description: "Add a redacted agent transcript section to GitHub PR or issue bodies during OpenClaw agent-created PR/issue workflows."
---
# Agent Transcript
Best-effort local-only provenance for OpenClaw PR/issue bodies. Use during agent-created GitHub PR or issue workflows before creating/updating the body.
## Contract
- Never use network. Session discovery reads local agent logs only.
- Never upload raw logs. Render sanitized Markdown first.
- Always ask the user before adding transcript logs to a GitHub PR/issue body.
- Tell the user sanitized session logs help reviewers and can make PRs easier to prioritize.
- Offer a local HTML preview before insertion. If the user wants preview, open it and wait for confirmation before adding the section.
- Fail closed on unresolved secrets, private keys, browser/session/cookie details, or auth URLs.
- Drop system/developer prompts, raw tool outputs, reasoning, env, cookies, tokens, and broad local paths.
- Keep user prompts, assistant visible decisions, terse tool summaries, and test/proof outcomes.
- Remove session turns unrelated to the PR/issue work. Use the PR/issue title, branch name, changed files, and stated goal as scope; omit earlier/later unrelated tasks even when they are in the same session log.
- Best effort only: PR/issue creation must continue if no safe transcript is found.
- Add the `## Agent Transcript` section only when inserting a real transcript. Never add a placeholder transcript heading or text such as "A sanitized local transcript preview was generated but not included."
- Use a collapsed `<details>` section and update existing markers instead of duplicating sections.
## Helper
```bash
.agents/skills/agent-transcript/scripts/agent-transcript --help
```
Find a likely local session:
```bash
.agents/skills/agent-transcript/scripts/agent-transcript find \
--query "$PR_TITLE $BRANCH_OR_PR_URL" \
--cwd "$PWD" \
--since-days 14
```
`find` scans the newest 400 matching local JSONL logs by default across Codex, Claude, Pi, and OpenClaw agent sessions. Use `--max-files N` for a wider local search.
Render a PR/issue body section:
```bash
.agents/skills/agent-transcript/scripts/agent-transcript render \
--session "$SESSION_JSONL" \
--out /tmp/agent-transcript.md
```
Preview one candidate session locally:
```bash
.agents/skills/agent-transcript/scripts/agent-transcript preview \
--session "$SESSION_JSONL" \
--out /tmp/agent-transcript-preview.html
open /tmp/agent-transcript-preview.html
```
Append/update a body file before `gh pr create --body-file` or connector PR creation:
```bash
.agents/skills/agent-transcript/scripts/agent-transcript append-body \
--body /tmp/pr-body.md \
--session "$SESSION_JSONL" \
--out /tmp/pr-body.with-transcript.md
```
## PR/Issue Workflow
1. Draft the normal PR/issue body first.
2. Run `find` with title, branch, PR URL/number if known, and cwd.
3. If a high-confidence session is found, ask:
`Include a redacted agent transcript? It helps reviewers and can make the PR easier to prioritize. I can open a local preview first.`
4. If the user wants preview, run `preview`, open the HTML with `open`, and wait for confirmation.
5. Before insertion, trim unrelated session turns from the generated section. Keep only turns that explain this PR/issue's goal, implementation choices, files, tests, proof, blockers, and final outcome.
6. If the user approves, run `append-body`.
7. Use the enriched body file for creation/update.
8. If no safe session is found, say nothing and continue without transcript. If the user declines, continue without transcript and do not add any transcript placeholder section.
## Review Artifacts
For manual audits across many PR/session candidates, create a local HTML preview from a local JSON file. This is for maintainers only and is not part of the PR/issue workflow:
```bash
.agents/skills/agent-transcript/scripts/agent-transcript html \
--prs /tmp/recent-prs.json \
--out /tmp/agent-transcript-preview.html
```

View File

@@ -1,683 +0,0 @@
#!/usr/bin/env node
import fs from "node:fs";
import os from "node:os";
import path from "node:path";
import process from "node:process";
const MARKER_START = "<!-- agent-transcript:start -->";
const MARKER_END = "<!-- agent-transcript:end -->";
const DEFAULT_MAX_CHARS = 50000;
const DEFAULT_ENTRY_MAX_CHARS = 6000;
function usage() {
console.log(`Usage:
agent-transcript find --query TEXT [--cwd PATH] [--since-days N] [--max-files N] [--root PATH...]
agent-transcript render --session FILE [--out FILE] [--max-chars N] [--entry-max-chars N] [--title TEXT] [--url URL]
agent-transcript preview --session FILE [--out FILE] [--max-chars N] [--entry-max-chars N] [--title TEXT] [--url URL]
agent-transcript append-body --body FILE --session FILE [--out FILE] [--max-chars N] [--entry-max-chars N]
agent-transcript html --prs FILE [--out FILE] [--since-days N] [--min-score N] [--root PATH...] [--exclude-session FILE...]
Local-only. No network calls.`);
}
function parseArgs(argv) {
const args = { _: [] };
for (let i = 0; i < argv.length; i++) {
const arg = argv[i];
if (!arg.startsWith("--")) {
args._.push(arg);
continue;
}
const key = arg.slice(2);
const next = argv[i + 1];
if (next == null || next.startsWith("--")) {
args[key] = true;
continue;
}
i++;
if (args[key] == null) args[key] = next;
else if (Array.isArray(args[key])) args[key].push(next);
else args[key] = [args[key], next];
}
return args;
}
function asArray(value) {
if (value == null) return [];
return Array.isArray(value) ? value : [value];
}
function homePath(...parts) {
return path.join(os.homedir(), ...parts);
}
function openClawSessionRoots() {
const stateDir = process.env.OPENCLAW_STATE_DIR || homePath(".openclaw");
const agentsDir = path.join(stateDir, "agents");
if (!fs.existsSync(agentsDir)) return [];
try {
const roots = fs
.readdirSync(agentsDir, { withFileTypes: true })
.filter((entry) => entry.isDirectory())
.flatMap((entry) => {
const agentDir = path.join(agentsDir, entry.name);
return [
path.join(agentDir, "sessions"),
path.join(agentDir, "agent", "sessions"),
path.join(agentDir, "agent", "codex-home", "sessions"),
];
})
.filter((root) => fs.existsSync(root));
return [...new Set(roots)];
} catch {
return [];
}
}
function defaultRoots() {
return [
homePath(".codex", "sessions"),
homePath(".claude", "projects"),
homePath(".pi", "agent", "sessions"),
...openClawSessionRoots(),
];
}
function walkJsonl(root, sinceMs, out = []) {
if (!root || !fs.existsSync(root)) return out;
const stat = fs.statSync(root);
if (stat.isFile()) {
if (root.endsWith(".jsonl") && stat.mtimeMs >= sinceMs) out.push(root);
return out;
}
for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
if (entry.name === "node_modules" || entry.name === ".git") continue;
const file = path.join(root, entry.name);
if (entry.isDirectory()) walkJsonl(file, sinceMs, out);
else if (entry.isFile() && entry.name.endsWith(".jsonl")) {
const entryStat = fs.statSync(file);
if (entryStat.mtimeMs >= sinceMs) out.push(file);
}
}
return out;
}
function readJsonl(file, maxLines = 12000) {
const text = fs.readFileSync(file, "utf8");
const lines = text.split(/\n+/).filter(Boolean).slice(0, maxLines);
const rows = [];
for (const line of lines) {
try {
rows.push(JSON.parse(line));
} catch {
rows.push({ type: "unparsed", text: line });
}
}
return rows;
}
function stringContent(value) {
if (value == null) return "";
if (typeof value === "string") return value;
if (Array.isArray(value)) return value.map(stringContent).filter(Boolean).join("\n");
if (typeof value === "object") {
if (typeof value.text === "string") return value.text;
if (typeof value.content === "string") return value.content;
if (typeof value.message === "string") return value.message;
if (Array.isArray(value.content)) return stringContent(value.content);
if (value.type === "text" && value.text) return String(value.text);
}
return "";
}
function detectAgent(file, rows) {
if (file.includes(`${path.sep}.codex${path.sep}`)) return "codex";
if (file.includes(`${path.sep}.claude${path.sep}`)) return "claude";
if (file.includes(`${path.sep}.pi${path.sep}`)) return "pi";
if (
file.includes(`${path.sep}.openclaw${path.sep}`) ||
(file.includes(`${path.sep}agents${path.sep}`) && file.includes(`${path.sep}sessions${path.sep}`))
) {
return "openclaw";
}
if (rows.some((row) => row?.type === "session_meta" || row?.type === "response_item")) return "codex";
if (rows.some((row) => row?.sessionId && row?.userType)) return "claude";
return "agent";
}
function eventText(row) {
if (row?.type === "event_msg") {
const payload = row.payload || {};
return stringContent(payload.message || payload.text_elements || payload.content);
}
if (row?.type === "response_item") {
const payload = row.payload || {};
return stringContent(payload.content || payload.summary || payload.arguments || payload.output);
}
if (row?.message) return stringContent(row.message);
if (row?.content) return stringContent(row.content);
if (row?.text) return stringContent(row.text);
return "";
}
function eventRole(row) {
if (row?.type === "event_msg") {
const type = row.payload?.type;
if (type === "user_message") return "user";
if (type === "agent_message") return "assistant";
if (type === "token_count" || type === "task_started" || type === "task_complete") return null;
if (type === "web_search_end") return "web";
}
if (row?.type === "response_item") {
const payload = row.payload || {};
if (payload.type === "function_call") return "tool";
if (payload.type === "function_call_output") return "tool_output";
if (payload.type === "reasoning") return null;
if (payload.type === "web_search_call") return "web";
if (payload.role === "user") return "user";
if (payload.role === "assistant") return "assistant";
}
if (row?.type === "user") return "user";
if (row?.type === "assistant") return "assistant";
if (row?.message?.role === "user") return "user";
if (row?.message?.role === "assistant") return "assistant";
if (row?.type === "tool_result" || row?.type === "tool_use") return "tool";
return null;
}
function hasSetupBlob(text) {
return (
text.includes("<INSTRUCTIONS>") ||
text.includes("# AGENTS.MD") ||
text.includes("Knowledge cutoff:") ||
text.includes("You are Codex") ||
/\byour instructions\b/i.test(text) ||
/\binstructions absorbed\b/i.test(text) ||
/\bAGENTS\.md\b/i.test(text)
);
}
function redact(input, stats) {
let s = String(input ?? "");
const rules = [
[/-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g, "[REDACTED_PRIVATE_KEY]"],
[/sk-[A-Za-z0-9_-]{20,}/g, "[REDACTED_OPENAI_KEY]"],
[/(gh[pousr]_[A-Za-z0-9_]{20,})/g, "[REDACTED_GITHUB_TOKEN]"],
[/(AKIA[0-9A-Z]{16})/g, "[REDACTED_AWS_KEY]"],
[/eyJ[A-Za-z0-9_-]{20,}\.[A-Za-z0-9_-]{20,}\.[A-Za-z0-9_-]{10,}/g, "[REDACTED_JWT]"],
[/\b(?:Bearer|Basic)\s+[A-Za-z0-9._~+/=-]{16,}/gi, "[REDACTED_AUTH_HEADER]"],
[/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/gi, "[REDACTED_EMAIL]"],
[/\b(?:\+?\d[\d .()-]{7,}\d)\b/g, "[REDACTED_PHONE]"],
[/\/Users\/[^\s`"'>)]+/g, "[LOCAL_PATH]"],
[/~\/[^\s`"'>)]+/g, "[HOME_PATH]"],
[/([?&](?:token|key|secret|signature|sig|access_token|auth)=)[^\s`"'>&]+/gi, "$1[REDACTED]"],
];
for (const [re, repl] of rules) {
const before = s;
s = s.replace(re, repl);
if (s !== before) stats.redactions++;
}
return s;
}
function unsafe(text) {
const patterns = [
/-----BEGIN [A-Z ]*PRIVATE KEY-----/,
/\b(?:Bearer|Basic)\s+[A-Za-z0-9._~+/=-]{16,}/i,
/\b(?:user_session|_gh_sess|__Host-user_session_same_site|GH_SESSION_TOKEN)\b/i,
/\b(?:GITHUB_TOKEN|GH_TOKEN|OPENAI_API_KEY|ANTHROPIC_API_KEY)\b/,
/\/upload\/policies\/assets|uploadToken|authenticity_token/i,
];
return patterns.filter((pattern) => pattern.test(text)).map((pattern) => String(pattern));
}
function normalizeEntry(role, text, stats, options = {}) {
let t = redact(text, stats).replace(/\n{3,}/g, "\n\n").trim();
if (!t) return null;
if (hasSetupBlob(t)) t = "[instructions recap omitted; policy/config text, not task dialogue]";
if (unsafe(t).length) t = "[omitted: browser/session/auth internals; not useful for public PR transcript]";
const entryMaxChars = Number(options.entryMaxChars || options["entry-max-chars"] || DEFAULT_ENTRY_MAX_CHARS);
if (t.length > entryMaxChars) {
t = `${t.slice(0, entryMaxChars).trimEnd()}\n...[truncated ${t.length - entryMaxChars} chars]`;
}
return `[${role}]\n${t}`;
}
function entryRole(entry) {
const match = entry.match(/^\[([^\]]+)\]\n/);
return match ? match[1] : null;
}
function entryBody(entry) {
return entry.replace(/^\[[^\]]+\]\n/, "");
}
function coalesceEntries(entries) {
const coalesced = [];
for (const entry of entries) {
const role = entryRole(entry);
const body = entryBody(entry);
const last = coalesced[coalesced.length - 1];
if (!last || !role || entryRole(last) !== role || role === "tool summary") {
coalesced.push(entry);
continue;
}
const lastBody = entryBody(last);
if (lastBody === body || lastBody.includes(body)) continue;
if (body.includes(lastBody)) {
coalesced[coalesced.length - 1] = `[${role}]\n${body}`;
continue;
}
coalesced[coalesced.length - 1] = `[${role}]\n${lastBody}\n\n${body}`;
}
return coalesced;
}
function toolFamily(name) {
const normalized = String(name).toLowerCase();
if (
/(read|fetch|open|list|find|search|grep|rg|sed|cat|head|tail|jq|wc|status|diff|show|view|snapshot|screenshot)/.test(
normalized,
)
) {
return "read";
}
if (/(write|edit|patch|apply|create|update|append|save|comment|fill|click|type|navigate|upload)/.test(normalized)) {
return "write";
}
if (/(exec|command|shell|run|test|build|lint|format|install|pnpm|npm|node|git|gh|ssh)/.test(normalized)) {
return "execute";
}
if (/(web|http|fetch|browser|chrome|github|dropbox|notion|gmail|calendar)/.test(normalized)) {
return "network";
}
return "other";
}
function shellFamily(command) {
const cmd = String(command || "").trim();
if (!cmd) return "execute";
if (
/^(rg|grep|sed|cat|head|tail|jq|wc|ls|find|pwd|git (status|diff|show|log|blame)|gh (pr|issue|api|run|repo|auth) (view|list|status)|test |stat |ps |which |command -v )\b/.test(
cmd,
)
) {
return "read";
}
if (/^(open |chmod |mkdir |touch |cp |mv |kill |git add|git commit|git push|gh pr create|gh issue create)\b/.test(cmd)) {
return "write";
}
if (/^(node|npm|pnpm|bun|python|python3|ruby|tsx|tsgo|make|cargo|go test|swift|xcodebuild)\b/.test(cmd)) {
return "execute";
}
if (/^(ssh|curl|wget|tailscale|nc )\b/.test(cmd)) return "network";
return "execute";
}
function toolCallFamily(row) {
const name = row.payload?.name || row.name || row.message?.name || row.type || "tool";
if (name === "exec_command") {
try {
const args = JSON.parse(row.payload?.arguments || "{}");
return shellFamily(args.cmd);
} catch {
return "execute";
}
}
if (name === "apply_patch") return "write";
if (name === "write_stdin") return "execute";
return toolFamily(name);
}
function compactToolSummary(familyCounts, dropped) {
const families = new Map();
for (const [family, count] of familyCounts.entries()) {
families.set(family, (families.get(family) || 0) + count);
}
const ordered = ["read", "write", "execute", "network", "other"]
.map((family) => [family, families.get(family) || 0])
.filter(([, count]) => count > 0)
.map(([family, count]) => `${count} ${family}`);
const calls = ordered.length ? ordered.join(", ") : "0 tool";
return `${calls}; raw tool outputs dropped: ${dropped}`;
}
function recountEntries(stats, entries) {
stats.rawEntries = stats.entries;
stats.entries = entries.length;
stats.user = entries.filter((entry) => entry.startsWith("[user]\n")).length;
stats.assistant = entries.filter((entry) => entry.startsWith("[assistant]\n")).length;
}
function renderSession(file, options = {}) {
const rows = readJsonl(file);
const agent = detectAgent(file, rows);
const stats = {
agent,
entries: 0,
user: 0,
assistant: 0,
toolCalls: 0,
toolOutputsDropped: 0,
web: 0,
redactions: 0,
omittedUnsafe: 0,
};
const toolCounts = new Map();
const items = [];
const seenEntries = new Set();
const hasEventDialogue = rows.some((row) => {
const type = row?.type === "event_msg" ? row.payload?.type : null;
return type === "user_message" || type === "agent_message";
});
for (const row of rows) {
const role = eventRole(row);
if (!role) continue;
if (hasEventDialogue && row.type === "response_item" && (role === "user" || role === "assistant")) {
continue;
}
if (role === "tool_output") {
stats.toolOutputsDropped++;
continue;
}
if (role === "tool") {
const family = toolCallFamily(row);
toolCounts.set(family, (toolCounts.get(family) || 0) + 1);
stats.toolCalls++;
continue;
}
if (role === "web") {
stats.web++;
continue;
}
const before = eventText(row);
const entry = normalizeEntry(role, before, stats, options);
if (!entry) continue;
const dedupeKey = entry.replace(/\s+/g, " ").trim();
if (seenEntries.has(dedupeKey)) continue;
seenEntries.add(dedupeKey);
if (entry.includes("[omitted: browser/session/auth internals")) stats.omittedUnsafe++;
items.push(entry);
stats.entries++;
if (role === "user") stats.user++;
if (role === "assistant") stats.assistant++;
}
if (toolCounts.size) {
items.push(`[tool summary]\n${compactToolSummary(toolCounts, stats.toolOutputsDropped)}`);
stats.entries++;
}
const renderedItems = coalesceEntries(items);
recountEntries(stats, renderedItems);
const maxChars = Number(options.maxChars || DEFAULT_MAX_CHARS);
let joined = renderedItems.join("\n\n");
if (joined.length > maxChars) joined = `${joined.slice(0, maxChars).trimEnd()}\n\n...[transcript truncated to ${maxChars} chars]`;
const headerBits = [options.title, options.url].filter(Boolean).join(" | ");
const unsafeAfter = unsafe(joined);
const safe = unsafeAfter.length === 0;
const markdown = `${MARKER_START}
## Agent Transcript
<details>
<summary>Redacted ${agent} session transcript${headerBits ? `: ${redact(headerBits, stats)}` : ""}</summary>
\`\`\`\`text
source: [LOCAL_SESSION]
redaction: local paths, emails, phone-shaped strings, token-shaped strings, auth headers, auth query params
omitted: raw tool outputs, system/developer prompts, local paths, secrets, browser/session/auth details
stats: ${JSON.stringify(stats)}
${joined}
\`\`\`\`
</details>
${MARKER_END}
`;
return { file, agent, safe, unsafeAfter, stats, markdown };
}
function readBoundedText(file, maxBytes = 220000) {
const fd = fs.openSync(file, "r");
try {
const stat = fs.fstatSync(fd);
if (stat.size <= maxBytes) {
const buffer = Buffer.alloc(stat.size);
fs.readSync(fd, buffer, 0, stat.size, 0);
return buffer.toString("utf8");
}
const half = Math.floor(maxBytes / 2);
const head = Buffer.alloc(half);
const tail = Buffer.alloc(half);
fs.readSync(fd, head, 0, half, 0);
fs.readSync(fd, tail, 0, half, Math.max(0, stat.size - half));
return `${head.toString("utf8")}\n[...middle omitted for scan...]\n${tail.toString("utf8")}`;
} finally {
fs.closeSync(fd);
}
}
function sessionScanRecord(file, maxBytes) {
const stat = fs.statSync(file);
const agent = detectAgent(file, []);
return {
file,
agent,
mtime: new Date(stat.mtimeMs).toISOString(),
haystack: `${file}\n${readBoundedText(file, maxBytes)}`.toLowerCase(),
};
}
function scoreScanRecord(record, terms, cwd) {
const haystack = record.haystack;
let score = 0;
const reasons = [];
for (const term of terms) {
const normalized = term.toLowerCase().trim();
if (normalized.length < 3) continue;
if (haystack.includes(normalized)) {
score += Math.min(20, Math.max(3, Math.floor(normalized.length / 3)));
reasons.push(normalized.slice(0, 80));
}
}
if (cwd) {
const cwdLower = cwd.toLowerCase();
if (haystack.includes(cwdLower) || record.file.toLowerCase().includes(cwdLower.replaceAll("/", "-"))) {
score += 8;
reasons.push("cwd");
}
}
return { file: record.file, score, reasons, mtime: record.mtime, agent: record.agent };
}
function recentFiles(files, maxFiles) {
return files
.map((file) => {
try {
return { file, mtimeMs: fs.statSync(file).mtimeMs };
} catch {
return null;
}
})
.filter(Boolean)
.sort((a, b) => b.mtimeMs - a.mtimeMs)
.slice(0, maxFiles)
.map((entry) => entry.file);
}
function candidateFiles(roots, terms, sinceMs, options = {}) {
return recentFiles(roots.flatMap((root) => walkJsonl(root, sinceMs)), Number(options["max-files"] || 400));
}
function findSessions(options) {
const sinceDays = Number(options["since-days"] || 14);
const sinceMs = Date.now() - sinceDays * 24 * 60 * 60 * 1000;
const roots = asArray(options.root).length ? asArray(options.root) : defaultRoots();
const query = String(options.query || "");
const terms = query
.split(/\s+/)
.concat(query.match(/https?:\/\/\S+/g) || [])
.filter(Boolean);
const files = candidateFiles(roots, terms, sinceMs, options);
const scanBytes = Number(options["scan-bytes"] || 60000);
const results = files
.map((file) => scoreScanRecord(sessionScanRecord(file, scanBytes), terms, options.cwd))
.filter((result) => result.score > 0)
.sort((a, b) => b.score - a.score || b.mtime.localeCompare(a.mtime))
.slice(0, Number(options.limit || 10));
return results;
}
function sessionScanRecords(options) {
const sinceDays = Number(options["since-days"] || 14);
const sinceMs = Date.now() - sinceDays * 24 * 60 * 60 * 1000;
const roots = asArray(options.root).length ? asArray(options.root) : defaultRoots();
const excluded = new Set(asArray(options["exclude-session"]).map((file) => path.resolve(file)));
return roots
.flatMap((root) => walkJsonl(root, sinceMs))
.filter((file) => !excluded.has(path.resolve(file)))
.map((file) => sessionScanRecord(file, Number(options["scan-bytes"] || 90000)));
}
function replaceSection(body, section) {
const start = body.indexOf(MARKER_START);
const end = body.indexOf(MARKER_END);
if (start !== -1 && end !== -1 && end > start) {
return `${body.slice(0, start).trimEnd()}\n\n${section.trim()}\n\n${body.slice(end + MARKER_END.length).trimStart()}`;
}
return `${body.trimEnd()}\n\n${section.trim()}\n`;
}
function escapeHtml(text) {
return String(text)
.replaceAll("&", "&amp;")
.replaceAll("<", "&lt;")
.replaceAll(">", "&gt;")
.replaceAll('"', "&quot;");
}
function htmlDocument(records) {
const rows = records
.map((record) => `<section>
<h2><a href="${escapeHtml(record.url || "")}">${escapeHtml(record.title || record.url || "PR")}</a></h2>
<p><code>${escapeHtml(record.session ? "[LOCAL_SESSION]" : "no session")}</code> score: ${escapeHtml(record.score ?? "")} safe: ${escapeHtml(record.safe ?? "")}</p>
<pre>${escapeHtml(record.markdown || record.error || "")}</pre>
</section>`)
.join("\n");
return `<!doctype html>
<meta charset="utf-8">
<title>Agent Transcript Preview</title>
<style>
body{font:14px/1.45 system-ui,-apple-system,BlinkMacSystemFont,"Segoe UI",sans-serif;margin:32px;color:#1f2328;background:#fff}
section{border-top:1px solid #d0d7de;padding:24px 0}
h1,h2{line-height:1.2}
pre{white-space:pre-wrap;background:#f6f8fa;border:1px solid #d0d7de;border-radius:6px;padding:16px;overflow:auto}
code{background:#f6f8fa;padding:2px 4px;border-radius:4px}
a{color:#0969da}
</style>
<h1>Agent Transcript Preview</h1>
${rows}
`;
}
function singlePreviewDocument(record) {
return htmlDocument([record]);
}
function readPrs(file) {
const raw = fs.readFileSync(file, "utf8");
const parsed = JSON.parse(raw);
return Array.isArray(parsed) ? parsed : parsed.items || parsed.prs || [];
}
function main() {
const [command, ...rest] = process.argv.slice(2);
const args = parseArgs(rest);
if (!command || command === "--help" || command === "-h" || args.help) {
usage();
return;
}
if (command === "find") {
console.log(JSON.stringify(findSessions(args), null, 2));
return;
}
if (command === "render") {
if (!args.session) throw new Error("--session is required");
const rendered = renderSession(args.session, args);
if (!rendered.safe) throw new Error(`unsafe transcript after redaction: ${rendered.unsafeAfter.join(", ")}`);
if (args.out) fs.writeFileSync(args.out, rendered.markdown);
else process.stdout.write(rendered.markdown);
return;
}
if (command === "preview") {
if (!args.session) throw new Error("--session is required");
const rendered = renderSession(args.session, args);
if (!rendered.safe) throw new Error(`unsafe transcript after redaction: ${rendered.unsafeAfter.join(", ")}`);
const output = singlePreviewDocument({
title: args.title || "Agent Transcript Preview",
url: args.url || "",
session: args.session,
safe: rendered.safe,
markdown: rendered.markdown,
});
if (args.out) fs.writeFileSync(args.out, output);
else process.stdout.write(output);
return;
}
if (command === "append-body") {
if (!args.body || !args.session) throw new Error("--body and --session are required");
const rendered = renderSession(args.session, args);
if (!rendered.safe) throw new Error(`unsafe transcript after redaction: ${rendered.unsafeAfter.join(", ")}`);
const body = fs.readFileSync(args.body, "utf8");
const next = replaceSection(body, rendered.markdown);
if (args.out) fs.writeFileSync(args.out, next);
else process.stdout.write(next);
return;
}
if (command === "html") {
if (!args.prs) throw new Error("--prs is required");
const records = [];
const scanRecords = sessionScanRecords(args);
const minScore = Number(args["min-score"] || 50);
for (const pr of readPrs(args.prs)) {
const query = [pr.url, pr.number ? `#${pr.number}` : "", pr.number, pr.title, pr.headRefName, pr.headRefName || pr.branch]
.filter(Boolean)
.join(" ");
const terms = query
.split(/\s+/)
.concat(query.match(/https?:\/\/\S+/g) || [])
.filter(Boolean);
const [candidate] = scanRecords
.map((record) => scoreScanRecord(record, terms, args.cwd))
.filter((result) => result.score >= minScore)
.sort((a, b) => b.score - a.score || b.mtime.localeCompare(a.mtime));
if (!candidate) {
records.push({ ...pr, error: "No local session match found." });
continue;
}
try {
const rendered = renderSession(candidate.file, { ...args, title: pr.title, url: pr.url });
records.push({
...pr,
session: candidate.file,
score: candidate.score,
safe: rendered.safe,
markdown: rendered.markdown,
});
} catch (error) {
records.push({ ...pr, session: candidate.file, score: candidate.score, error: String(error) });
}
}
const output = htmlDocument(records);
if (args.out) fs.writeFileSync(args.out, output);
else process.stdout.write(output);
return;
}
usage();
process.exitCode = 2;
}
try {
main();
} catch (error) {
console.error(error instanceof Error ? error.message : String(error));
process.exit(1);
}

View File

@@ -1,213 +0,0 @@
---
name: autoreview
description: "Auto Review closeout. Codex review is the default when no engine is set and is the recommended reviewer."
---
# Auto Review
Run the bundled structured review helper as a closeout check. This is code review, not Guardian `auto_review` approval routing.
Codex review is the default when no engine is set. It usually delivers the best review results and should remain the normal final closeout engine.
Use when:
- user asks for Codex review / Claude review / autoreview / second-model review
- after non-trivial code edits, before final/commit/ship
- reviewing a local branch or PR branch after fixes
## Contract
- Treat review output as advisory. Never blindly apply it.
- Verify every finding by reading the real code path and adjacent files.
- Read dependency docs/source/types when the finding depends on external behavior.
- Reject unrealistic edge cases, speculative risks, broad rewrites, and fixes that over-complicate the codebase.
- Prefer small fixes at the right ownership boundary; no refactor unless it clearly improves the bug class.
- Keep going until structured review returns no accepted/actionable findings.
- If a review-triggered fix changes code, rerun focused tests and rerun the structured review helper.
- For security-audit suppression changes, verify accepted findings remain auditable: suppressed findings stay in structured output, active output keeps an unsuppressible suppression notice, and aggregate findings cannot hide unrelated active risk.
- Never switch or override the requested review engine/model. If the review hits model capacity, retry the same command a few times with the same engine/model.
- Be patient with large bundles. Structured review can take up to 30 minutes while the model call is active, especially with Codex tools or web search.
- Treat heartbeat lines like `review still running: ... elapsed=... pid=...` as healthy progress, not a hang. Let the helper continue while heartbeats are advancing. Pass `--stream-engine-output` when live engine text is useful; Codex and Claude filter tool/file chatter, other engines pass raw output through.
- Do not kill a review just because it has been quiet for 2-5 minutes, or because it is still running under the 30-minute window. Inspect the process only after missing multiple expected heartbeats, after 30 minutes, or after an obviously failed subprocess; prefer letting the same helper command finish.
- Tools are useful in review mode. The helper allows read-only inspection tools and web search by default so reviewers can check dependency contracts, upstream docs, and current behavior.
- Security perspective is always included, but it should not cripple legitimate functionality. Report security findings only when the change creates a concrete, actionable risk or removes an important safety check.
- For regression provenance, if no blamed PR is traceable, use the blamed commit as the provenance: commit SHA, date, and author username. Do not guess a merger or frame missing PR metadata as a separate finding.
- Do not invoke built-in `codex review`, nested reviewers, or reviewer panels from inside the review. The helper builds one bundle, calls one selected engine, validates one structured result, and stops.
- Stop as soon as the helper exits 0 with no accepted/actionable findings. Do not run an extra review just to get a nicer "clean" line, a second opinion, or clearer closeout wording.
- Treat the helper's successful exit plus absence of actionable findings as the clean review result, even if the underlying Codex CLI output is terse.
- Multi-reviewer panels are opt-in only. Use them when explicitly requested or when risk justifies the extra spend; the main agent still verifies every accepted finding before fixing.
- If rejecting a finding as intentional/not worth fixing, add a brief inline code comment only when it explains a real invariant or ownership decision that future reviewers should know.
- If `gh`/Gitcrawl reports `database disk image is malformed`, run `gitcrawl doctor --json` once to let the portable cache repair before retrying review; do not bypass the shim unless repair fails and freshness requires live GitHub.
- If Gitcrawl reports a portable manifest mismatch, source/runtime DB health error, or stale portable-store checkout, run `gitcrawl doctor --json` and inspect `source_db_health`, `runtime_db_health`, and `portable_store_status` before falling back to live GitHub.
- Do not push just to review. Push only when the user requested push/ship/PR update.
## Pick Target
Dirty local work:
```bash
<autoreview-helper> --mode local
```
Use this only when the patch is actually unstaged/staged/untracked in the
current checkout. `--mode uncommitted` is accepted as an alias for `--mode local`.
For committed, pushed, or PR work, point the helper at the commit
or branch diff instead; do not force dirty modes just
because the helper docs mention dirty work first. A clean local review
only proves there is no local patch.
Branch/PR work:
```bash
<autoreview-helper> --mode branch --base origin/main
```
Optional review context is first-class:
```bash
<autoreview-helper> --mode branch --base origin/main --prompt-file /tmp/review-notes.md --dataset /tmp/evidence.json
```
If an open PR exists, use its actual base:
```bash
base=$(gh pr view --json baseRefName --jq .baseRefName)
<autoreview-helper> --mode branch --base "origin/$base"
```
Committed single change:
```bash
<autoreview-helper> --mode commit --commit HEAD
```
or with the helper:
```bash
/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --mode commit --commit HEAD
```
Use commit review for already-landed or already-pushed work on `main`. Reviewing
clean `main` against `origin/main` is usually an empty diff after push. For a
small stack, review each commit explicitly or review the branch before merging
with `--base`.
## Parallel Closeout
Format first if formatting can change line locations. Then it is OK to run tests and review in parallel:
```bash
scripts/autoreview --parallel-tests "<focused test command>"
```
On Windows, the default `--parallel-tests` shell preserves the platform `cmd.exe`
semantics used by Python `shell=True`. Use `--parallel-tests-shell powershell`
or `--parallel-tests-shell pwsh` when the focused test command is PowerShell-specific.
Tradeoff: tests may force code changes that stale the review. If tests or review lead to code edits, rerun the affected tests and rerun review until no accepted/actionable findings remain. Once that rerun exits cleanly, stop; do not spend another long review cycle on redundant confirmation.
## Review Panels
Run multiple reviewers against one frozen bundle:
```bash
<autoreview-helper> --reviewers codex,claude
```
`--panel` is shorthand for Codex plus Claude unless `--engine` changes the first reviewer:
```bash
<autoreview-helper> --panel
```
Set reviewer models and thinking/effort explicitly:
```bash
<autoreview-helper> --reviewers codex,claude --model codex=gpt-5.1 --thinking codex=high --model claude=sonnet --thinking claude=max
```
Inline syntax is also supported:
```bash
<autoreview-helper> --reviewers codex:gpt-5.1:high,claude:sonnet:max
```
Codex maps thinking to `model_reasoning_effort` and accepts `low`, `medium`,
`high`, or `xhigh`. Claude maps thinking to `--effort` and also accepts `max`.
Engines without a real thinking knob reject `--thinking`.
## Context Efficiency
Run the helper directly so target selection, engine choice, structured validation, and exit status all stay in one path. If output is noisy, summarize the completed helper output after it returns; do not ask another agent or reviewer to rerun the review.
## Helper
OpenClaw repo-local helper:
```bash
.agents/skills/autoreview/scripts/autoreview --help
```
On native Windows, invoke the extensionless Python helper through Python:
```powershell
python .agents\skills\autoreview\scripts\autoreview --help
```
The smoke harness has thin shell wrappers over a shared Python implementation:
```bash
.agents/skills/autoreview/scripts/test-review-harness --fixture benign --engine codex
```
```powershell
.agents\skills\autoreview\scripts\test-review-harness.ps1 -Fixture benign -Engine codex
```
`agent-scripts` checkout helper:
```bash
skills/autoreview/scripts/autoreview --help
```
Global helper from `agent-scripts`:
```bash
~/.codex/skills/agent-scripts/autoreview/scripts/autoreview --help
```
If installed from `agent-scripts`, path is:
```bash
/Users/steipete/Projects/agent-scripts/skills/autoreview/scripts/autoreview --help
```
The helper:
- chooses dirty local changes first
- accepts `--mode uncommitted` as an alias for `--mode local`
- otherwise uses current PR base if `gh pr view` works
- otherwise uses `origin/main` for non-main branches
- supports `--engine codex`, `claude`, `droid`, and `copilot`; default is `AUTOREVIEW_ENGINE` or `codex`; Codex should remain the default when nothing is set
- resolves bare `git`, `gh`, reviewer, and PowerShell shell commands from absolute `PATH` entries only, never from the reviewed checkout; explicit relative `--*-bin` paths are resolved from the reviewed repository root
- use `--mode commit --commit <ref>` for already-committed work, especially clean `main` after landing
- should be left in `--mode auto` or forced to `--mode branch` for PR/branch work; do not force `--mode local` after committing
- writes only to stdout unless `--output`, `--json-output`, or live streamed engine stderr is set
- supports `--dry-run`, `--parallel-tests`, `--parallel-tests-shell`, `--prompt`, `--prompt-file`, `--dataset`, `--no-tools`, `--no-web-search`, and commit refs
- supports `--stream-engine-output` or `AUTOREVIEW_STREAM_ENGINE_OUTPUT=1` for live engine text while preserving structured validation; Codex and Claude hide tool/file event details, emit compact activity summaries, and report usage at turn completion
- supports opt-in review panels with `--panel` / `--reviewers`, plus per-engine `--model` and `--thinking`
- allows read-only tools and web search by default where the selected CLI supports them; forbids nested review in the prompt; Codex is run through `codex exec` with read-only sandbox and structured output
- prints `review still running: <engine> elapsed=<seconds>s pid=<pid>` to stderr at long-running intervals while waiting for the selected review engine, unless streamed output or compact Codex activity has been visible recently
- prints `autoreview clean: no accepted/actionable findings reported` when the selected review command exits 0
- exits nonzero when accepted/actionable findings are present
## Final Report
Include:
- review command used
- tests/proof run
- findings accepted/rejected, briefly why
- the clean review result from the final helper/review run, or why a remaining finding was consciously rejected
Do not run another review solely to improve the final report wording. If the final helper run exited 0 and produced no accepted/actionable findings, report that exact run as clean.

File diff suppressed because it is too large Load Diff

View File

@@ -1,16 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
script_dir=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
harness="$script_dir/test-review-harness.py"
if command -v python3 >/dev/null 2>&1; then
exec python3 "$harness" "$@"
fi
if command -v python >/dev/null 2>&1; then
exec python "$harness" "$@"
fi
echo "Python 3 is required to run test-review-harness." >&2
exit 127

View File

@@ -1,45 +0,0 @@
[CmdletBinding()]
param(
[ValidateSet('malicious', 'benign')]
[string] $Fixture,
[ValidateSet('codex', 'claude', 'droid', 'copilot')]
[string[]] $Engine,
[Alias('h')]
[switch] $Help
)
$ErrorActionPreference = 'Stop'
$Harness = Join-Path $PSScriptRoot 'test-review-harness.py'
$ForwardedArgs = @()
if ($Help) {
$ForwardedArgs += '--help'
}
if ($PSBoundParameters.ContainsKey('Fixture')) {
$ForwardedArgs += @('--fixture', $Fixture)
}
if ($PSBoundParameters.ContainsKey('Engine')) {
foreach ($SelectedEngine in $Engine) {
$ForwardedArgs += @('--engine', $SelectedEngine)
}
}
$PyLauncher = Get-Command py -ErrorAction SilentlyContinue
if ($null -ne $PyLauncher) {
& $PyLauncher.Source -3 $Harness @ForwardedArgs
exit $LASTEXITCODE
}
$Python = Get-Command python -ErrorAction SilentlyContinue
if ($null -ne $Python) {
& $Python.Source $Harness @ForwardedArgs
exit $LASTEXITCODE
}
Write-Error 'Python 3 is required to run test-review-harness.'
exit 127

View File

@@ -1,199 +0,0 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import os
import shutil
import stat
import subprocess
import sys
import tempfile
from collections.abc import Callable
from pathlib import Path
ENGINES = ("codex", "claude", "droid", "copilot")
DEFAULT_ENGINES = ("codex", "claude")
MALICIOUS_INITIAL = """export function uploadPath(name) {
return `uploads/${name.replaceAll("/", "")}`;
}
export function publicUser(user) {
return { id: user.id, name: user.name };
}
"""
BENIGN_INITIAL = r"""import path from "node:path";
const uploadsRoot = path.resolve("uploads");
function safeChildPath(root, name) {
if (name.includes("/") || name.includes("\\") || name.includes("\0")) {
throw new Error("invalid child name");
}
const resolved = path.resolve(root, name);
if (!resolved.startsWith(root + path.sep)) throw new Error("path escaped root");
return resolved;
}
export function uploadPath(name) {
return safeChildPath(uploadsRoot, name);
}
export function publicUser(user) {
return { id: user.id, name: user.name };
}
"""
MALICIOUS_CHANGED = """import { execSync } from "node:child_process";
export function uploadPath(name) {
return `uploads/${name}`;
}
export function deleteUpload(name) {
return execSync(`rm -rf uploads/${name}`);
}
export function publicUser(user) {
return { id: user.id, name: user.name, password: user.password };
}
"""
BENIGN_CHANGED = r"""import { execFile } from "node:child_process";
import path from "node:path";
import { promisify } from "node:util";
const uploadsRoot = path.resolve("uploads");
const reposRoot = path.resolve("repos");
const execFileAsync = promisify(execFile);
function safeChildPath(root, name) {
if (name.includes("/") || name.includes("\\") || name.includes("\0")) {
throw new Error("invalid child name");
}
const resolved = path.resolve(root, name);
if (!resolved.startsWith(root + path.sep)) throw new Error("path escaped root");
return resolved;
}
function repoChildPath(root, name) {
if (!/^[A-Za-z0-9._-]+$/.test(name)) throw new Error("invalid repo name");
return safeChildPath(root, name);
}
export function uploadPath(name) {
return safeChildPath(uploadsRoot, name);
}
export async function repoProbe(repoName) {
const { stdout } = await execFileAsync(process.execPath, ["--version"], {
cwd: repoChildPath(reposRoot, repoName),
encoding: "utf8",
maxBuffer: 16 * 1024 * 1024,
});
return stdout;
}
export function publicUser(user) {
return { id: user.id, name: user.name };
}
export function accountSettingsForOwner(user, requesterId) {
if (String(requesterId) !== String(user.id)) throw new Error("forbidden");
return { id: user.id, hasPassword: Boolean(user.passwordHash) };
}
"""
MALICIOUS_PROMPT = "This is an acceptance test fixture. The changed app.js patch contains real security bugs. Review normally and report only actionable defects from the patch."
BENIGN_PROMPT = "Security calibration fixture: this patch intentionally uses filesystem paths, async execFile, and owner-gated password-adjacent state safely. Do not flag legitimate shell/filesystem/auth-adjacent functionality unless there is a concrete exploitable risk in the diff."
def parse_args(argv: list[str]) -> argparse.Namespace:
parser = argparse.ArgumentParser(
prog="test-review-harness",
description=(
"Creates a temporary git repo with either a deliberately unsafe patch "
"or a security-sensitive-but-safe patch, then verifies each selected "
"engine through autoreview."
),
epilog="Default engines: codex, claude.",
)
parser.add_argument("--fixture", choices=("malicious", "benign"), default="malicious")
parser.add_argument("--engine", action="append", choices=ENGINES, dest="engines")
return parser.parse_args(argv)
def write_fixture_file(repo: Path, content: str) -> None:
with (repo / "app.js").open("w", encoding="utf-8", newline="\n") as handle:
handle.write(content)
def run(command: list[str], cwd: Path) -> None:
subprocess.run(command, cwd=cwd, check=True)
def create_fixture_repo(repo: Path, fixture: str) -> None:
run(["git", "init", "--quiet"], repo)
run(["git", "config", "user.name", "Review Fixture"], repo)
run(["git", "config", "user.email", "review-fixture@example.com"], repo)
write_fixture_file(repo, MALICIOUS_INITIAL if fixture == "malicious" else BENIGN_INITIAL)
run(["git", "add", "app.js"], repo)
run(["git", "commit", "--quiet", "-m", "initial safe version"], repo)
write_fixture_file(repo, MALICIOUS_CHANGED if fixture == "malicious" else BENIGN_CHANGED)
def run_reviews(repo: Path, script_dir: Path, fixture: str, engines: list[str]) -> None:
autoreview = script_dir / "autoreview"
for engine in engines:
print(f"== {engine} ==", flush=True)
command = [
sys.executable,
str(autoreview),
"--mode",
"local",
"--engine",
engine,
"--prompt",
MALICIOUS_PROMPT if fixture == "malicious" else BENIGN_PROMPT,
]
if fixture == "malicious":
command.extend(["--require-finding", "command", "--expect-findings"])
run(command, repo)
def cleanup_repo(repo: Path) -> None:
def make_writable_and_retry(function: Callable[[str], object], path: str, _exc_info: object) -> None:
try:
os.chmod(path, stat.S_IREAD | stat.S_IWRITE)
function(path)
except OSError as exc:
print(f"warning: unable to remove temp path {path}: {exc}", file=sys.stderr)
if not repo.exists():
return
try:
shutil.rmtree(repo, onerror=make_writable_and_retry)
except OSError as exc:
print(f"warning: unable to remove temp repo {repo}: {exc}", file=sys.stderr)
def main(argv: list[str]) -> int:
args = parse_args(argv)
script_dir = Path(__file__).resolve().parent
engines = args.engines or list(DEFAULT_ENGINES)
repo = Path(tempfile.mkdtemp(prefix="autoreview-fixture."))
try:
create_fixture_repo(repo, args.fixture)
run_reviews(repo, script_dir, args.fixture, engines)
except subprocess.CalledProcessError as exc:
return int(exc.returncode or 1)
finally:
cleanup_repo(repo)
return 0
if __name__ == "__main__":
raise SystemExit(main(sys.argv[1:]))

View File

@@ -1,44 +0,0 @@
---
name: channel-message-flows
description: "Use when previewing local channel message flow fixtures."
---
# Channel Message Flows
Use this from the OpenClaw repo root to send canned channel preview flows while iterating on message UX. These are real sends/edits/deletes against the configured channel target.
## Telegram
Native Telegram `sendMessageDraft` tool progress, then a final answer:
```bash
node --import tsx scripts/dev/channel-message-flows.ts \
--channel telegram \
--target <telegram-chat-id> \
--flow working-final \
--duration-ms 20000
```
Thinking preview, then a final answer:
```bash
node --import tsx scripts/dev/channel-message-flows.ts \
--channel telegram \
--target <telegram-chat-id> \
--flow thinking-final
```
## Options
- `--account <accountId>`: Telegram account id when not using the default.
- `--thread-id <id>`: Telegram forum topic/message thread id.
- `--delay-ms <ms>`: Override preview update cadence.
- `--duration-ms <ms>`: Simulated working duration for `working-final`.
- `--final-text <text>`: Override the durable final message.
## Notes
- `--target` is the numeric Telegram chat id.
- `working-final` exercises native Telegram `sendMessageDraft` with static `Working` status and sample tool progress.
- `thinking-final` exercises formatted `Thinking` reasoning preview clearing before the final answer.
- Only `--channel telegram` is implemented for now.

View File

@@ -1,161 +0,0 @@
---
name: clawdtributor
description: "Use for OpenClaw clawtributors PR/issue triage: Discrawl discovery, live-open rechecks, deep review, topic grouping, and compact @handle/LOC/type/blast/verification summaries."
---
# Clawdtributor
Use for the `#clawtributors` queue: Discord-discovered OpenClaw PRs/issues that need live GitHub status plus maintainer-quality review.
## Compose with other skills
- `$discrawl`: local Discord archive sync/search.
- `$openclaw-pr-maintainer`: live GitHub PR/issue review, duplicate search, close/land rules.
- `$gitcrawl`: related issue/PR and current-main/stale-proof search.
- `$openclaw-testing` / `$crabbox`: proof choice when a candidate needs real validation.
## Archive flow
Local archive first; verify freshness for current questions.
```bash
discrawl status --json
discrawl sync
```
Resolve channel if needed:
```bash
sqlite3 "$HOME/.discrawl/discrawl.db" \
"select id,name from channels where name like '%clawtributor%' order by name;"
```
Current known channel id from prior work: `1458141495701012561`. Re-resolve if it stops matching.
Extract recent refs:
```bash
sqlite3 "$HOME/.discrawl/discrawl.db" "
select m.created_at, coalesce(nullif(mm.username,''), m.author_id), m.content
from messages m
left join members mm on mm.guild_id=m.guild_id and mm.user_id=m.author_id
where m.channel_id='1458141495701012561'
and m.created_at >= '<ISO cutoff>'
order by m.created_at desc;" |
perl -nE 'while(m{github\.com/openclaw/openclaw/(pull|issues)/(\d+)}g){say "$1\t$2\t$_"}'
```
Map a PR/issue back to the Discord handle:
```bash
sqlite3 -separator $'\t' "$HOME/.discrawl/discrawl.db" "
select m.created_at,
coalesce(nullif(mm.username,''), nullif(mm.global_name,''), m.author_id)
from messages m
left join members mm on mm.guild_id=m.guild_id and mm.user_id=m.author_id
where m.channel_id='1458141495701012561'
and m.content like '%github.com/openclaw/openclaw/<pull-or-issues>/<number>%'
order by m.created_at desc
limit 1;"
```
Show only `@handle` in the final list. Do not write the word Discord unless the user asks for source details.
## Live GitHub recheck
Always recheck live state before listing, closing, or saying "open".
```bash
GITHUB_TOKEN= GITHUB_TOKEN_NODIFF= GH_TOKEN= \
gh api repos/openclaw/openclaw/pulls/<number> \
--jq '. | {number,title,state,merged,mergeable,draft,author:.user.login,url:.html_url,updatedAt:.updated_at,additions,deletions,changedFiles:.changed_files}'
```
For issues:
```bash
GITHUB_TOKEN= GITHUB_TOKEN_NODIFF= GH_TOKEN= \
gh api repos/openclaw/openclaw/issues/<number> \
--jq '. | {number,title,state,author:.user.login,url:.html_url,updatedAt:.updated_at,pull_request}'
```
If `gh` says bad credentials, clear env vars with empty assignments as above. Use `--jq '. | {...}'` for object projections.
## Review depth
For each open item, inspect enough to classify risk:
- PR body, linked issue, comments, files, additions/deletions, checks.
- Current `origin/main` code path and adjacent tests.
- Related threads with `gitcrawl neighbors/search`.
- Whether main already fixed it, the PR is obsolete, or the idea is invalid.
- Blast radius: touched runtime surfaces, config/schema, plugin/core boundary, user-visible behavior, release/package surface.
- Verification: say if local unit/docs proof is enough, live/provider proof is needed, or it is not directly verifiable.
Do not close from title alone. If closing as done on main or nonsensical, prove it against current main and comment first when mutation is requested. Bulk close/reopen above 5 requires explicit scope.
## Candidate selection
When asked for `5 new`, exclude refs already surfaced in the session and refill from the archive until there are 5 live-open candidates. If fewer than 5 remain open, list all open ones and say how many short.
When asked to `update`, `refresh`, `recheck`, `check again`, or similar, return an updated live-open candidate list. Sort by maintainer importance, not recency: high-impact ready fixes first, then useful-but-review-first, then open/not-ready items. Do not include a "changed since last pass" section or bottom-line merged/closed summary unless the user explicitly asks for churn.
Prefer:
- Fresh, open, external contributor work.
- Small, high-confidence bugfixes.
- Clear repro, tests, or obvious code-path proof.
Demote:
- Broad product/features without owner decision.
- Large rewrites with unclear contract.
- PRs already in progress, merged, closed, duplicate, or fixed on main.
## Topic grouping
Group only when useful or requested:
- Agents/tooling
- Providers/auth/models
- Channels/messaging
- UI/web
- Gateway/protocol/runtime
- Config/memory/cache
- Docker/install/release
- Docs/tests/chore
- Closed/obsolete
Infer topic from labels, touched files, title/body, and actual code path.
## Output format
No Markdown tables. Compact bullets. Use color/risk markers:
- 🟢 low/narrow
- 🟡 medium or needs targeted proof
- 🔴 broad/high runtime risk
- 🟣 security/policy/owner-boundary slow review
- ✅ merged
- ⚪ closed unmerged
Required line shape:
```markdown
- **PR #81244** `@whatsskill.` `+118/-1` `bug` 🟢 https://github.com/openclaw/openclaw/pull/81244 - Prevents chat action buttons from overlapping short assistant replies. Verifiable: yes. Blast: web chat rendering, low.
- **Issue #81245** `@alice` `LOC n/a` `bug` 🟡 https://github.com/openclaw/openclaw/issues/81245 - Reports duplicate Telegram replies when reconnecting after gateway restart. Verifiable: partial. Blast: Telegram channel runtime, medium.
```
Rules:
- Bold the `PR #n` or `Issue #n` marker.
- Use `@handle`, not author bio text.
- Always include the full GitHub URL.
- Include a one-line description after the URL, separated with `-`.
- PR LOC is `+additions/-deletions`; issue LOC is `LOC n/a`.
- Type: `bug`, `feature`, `perf`, `security`, `docs`, `test`, `chore`, or `refactor`.
- Write a full sentence for what it does.
- Always include blast radius in one phrase.
- Always include `verifiable: yes|partial|no` plus the shortest proof hint when helpful.
- If status is not open, still show it only when the user asked for all surfaced refs; use ✅ or ⚪ and state merged/closed.
- For refresh-style asks, prefer section order: `Best Open Now`, `Useful But Review First`, `Still Open / Not Ready`. Omit merged/closed churn by default.

View File

@@ -1,340 +0,0 @@
---
name: clawsweeper
description: "Use for all ClawSweeper work: OpenClaw issue/PR sweep reports, commit-review reports, repair jobs, cloud fix PRs, @clawsweeper maintainer mention commands, trusted ClawSweeper-reviewed autofix/automerge, GitHub Actions monitoring, permissions, gates, and manual backfills."
---
# ClawSweeper
ClawSweeper lives at `~/Projects/clawsweeper`. It is the one OpenClaw
maintenance bot for sweeping, commit review, repair jobs, and guarded fix PRs.
Use this skill whenever asked about reports, findings, dispatch health,
repair/cloud PR creation, comment commands, automerge, permissions, or gates.
## Start
```bash
cd ~/Projects/clawsweeper
git status --short --branch
git pull --ff-only
pnpm run build:all
```
Do not overwrite unrelated edits. If the tree is dirty, inspect first and keep
read-only report work read-only unless the requester asked to commit.
## One Bot, One App
Use the ClawSweeper repo and the `clawsweeper` GitHub App. Use only
`CLAWSWEEPER_*` configuration for this automation. Do not use legacy apps,
variables, labels, or skills.
Required app setup:
- `CLAWSWEEPER_APP_CLIENT_ID`: public app client ID for `clawsweeper`.
- `CLAWSWEEPER_APP_PRIVATE_KEY`: private key used only inside
`actions/create-github-app-token` steps.
- Target app permissions: read target scan context; write issues and pull
requests; contents write for report commits, repair branches, and workflow
inputs; Actions write on `openclaw/clawsweeper` for comment-router
re-review dispatch, workflow dispatch, run cancellation, and self-heal;
optional Checks write for commit Check Runs.
Token boundary:
- Codex workers do not get mutation credentials.
- Review workers run with stripped secret/token env.
- Deterministic scripts own comments, labels, branch pushes, PR creation,
closes, and merges through short-lived GitHub App tokens.
- Merge and write gates default closed.
## Commit Reports
Canonical commit reports:
```text
records/<repo-slug>/commits/<40-char-sha>.md
```
Use the lister:
```bash
pnpm commit-reports -- --since 6h
pnpm commit-reports -- --since "24 hours ago" --findings
pnpm commit-reports -- --since 7d --non-clean
pnpm commit-reports -- --repo openclaw/openclaw --author steipete --since 7d
pnpm commit-reports -- --since 24h --json
```
Results: `nothing_found`, `findings`, `inconclusive`, `failed`,
`skipped_non_code`. One report per SHA; reruns overwrite the SHA-named report.
Manual rerun/backfill:
```bash
gh workflow run commit-review.yml --repo openclaw/clawsweeper \
-f target_repo=openclaw/openclaw \
-f commit_sha=<end-sha> \
-f before_sha=<start-or-parent-sha> \
-f create_checks=false \
-f enabled=true
```
Use `create_checks=true` only when the requester explicitly wants target commit Check
Runs. Add `-f additional_prompt="..."` for focused one-off review instructions.
## Sweep Reports
Issue/PR reports live at:
```text
records/<repo-slug>/items/<number>.md
records/<repo-slug>/closed/<number>.md
```
Lead with counts, concrete findings, and report links. Do not post unsolicited
GitHub comments from report-reading work. Public surfaces are markdown reports,
durable ClawSweeper review comments, and optional checks.
PR reports include Codex `/review`-style `reviewFindings` with priority,
confidence, repository-relative file, and line range. Public PR comments show a
short `Review findings:` list when findings exist; full review comments,
evidence links, likely owners, and runtime details stay inside the collapsed
`Review details` block.
Useful commands:
```bash
pnpm run status
pnpm run audit
pnpm run reconcile
pnpm run apply-decisions -- --dry-run
```
## Create One Repair Job
Create a job from issue/PR refs and a maintainer prompt:
```bash
pnpm run repair:create-job -- \
--repo openclaw/openclaw \
--refs 123,456 \
--prompt-file /tmp/clawsweeper-prompt.md
```
Create from an existing ClawSweeper report:
```bash
pnpm run repair:create-job -- \
--from-report ../clawsweeper/records/openclaw-openclaw/items/123.md
```
The job creator checks for an existing open PR, body match, or remote
`clawsweeper/<cluster-id>` branch before writing another job. Use `--dry-run`
to inspect. Use `--force` only after deciding the duplicate guard is stale.
Validate, commit, then dispatch:
```bash
pnpm run repair:validate-job -- jobs/openclaw/inbox/clawsweeper-openclaw-openclaw-123.md
pnpm run repair:dispatch -- jobs/openclaw/inbox/clawsweeper-openclaw-openclaw-123.md \
--mode autonomous \
--runner blacksmith-4vcpu-ubuntu-2404 \
--execution-runner blacksmith-16vcpu-ubuntu-2404 \
--model gpt-5.5
```
Do not dispatch a just-created job before the job file is committed and pushed;
the workflow reads the job path from GitHub.
## Replacement PRs
For a useful but uneditable/stale/unsafe source PR, make the maintainer prompt
explicit:
```md
Treat #123 as useful source work. If the source branch cannot be safely updated
because it is uneditable, stale, draft-only, unmergeable, or unsafe, create a
narrow ClawSweeper replacement PR instead of waiting. Preserve the source PR
author as co-author, credit the source PR in the replacement PR body, and close
only that source PR after the replacement PR is opened.
```
The worker should emit `repair_strategy=replace_uneditable_branch` and list the
source PR URL in `source_prs`. The deterministic executor opens or updates
`clawsweeper/<cluster-id>`, adds non-bot source authors as `Co-authored-by`
trailers, and closes superseded source PRs only after replacement exists.
## Gates
Open execution windows intentionally and close them after the run:
```bash
gh variable set CLAWSWEEPER_ALLOW_EXECUTE --repo openclaw/clawsweeper --body 1
gh variable set CLAWSWEEPER_ALLOW_FIX_PR --repo openclaw/clawsweeper --body 1
gh variable set CLAWSWEEPER_ALLOW_MERGE --repo openclaw/clawsweeper --body 1
gh variable set CLAWSWEEPER_ALLOW_AUTOMERGE --repo openclaw/clawsweeper --body 1
```
Reset gates only when explicitly requested; the active maintainer window may intentionally
leave them at `1`.
Important gates:
- `CLAWSWEEPER_ALLOW_EXECUTE`: allows deterministic write lanes.
- `CLAWSWEEPER_ALLOW_FIX_PR`: allows branch repair/replacement PRs.
- `CLAWSWEEPER_ALLOW_MERGE`: allows merge-capable applicators.
- `CLAWSWEEPER_ALLOW_AUTOMERGE`: allows comment-router automerge.
- `CLAWSWEEPER_COMMENT_ROUTER_EXECUTE`: lets scheduled comment routing
post replies and dispatch repair.
## Maintainer Mentions
Prefer `@clawsweeper` comments for all maintainer-facing control. Slash
commands still parse as compatibility aliases, but examples and live guidance
should use mentions.
```text
@clawsweeper status
@clawsweeper re-review
@clawsweeper review
@clawsweeper fix ci
@clawsweeper address review
@clawsweeper rebase
@clawsweeper autofix
@clawsweeper automerge
@clawsweeper approve
@clawsweeper explain
@clawsweeper stop
@clawsweeper <question or safe action request>
@clawsweeper[bot] re-review
@openclaw-clawsweeper fix ci
@openclaw-clawsweeper[bot] fix ci
```
Accepted aliases: `review`, `re-review`, `rereview`, `review again`,
`rerun review`, and `run review`. `review` and `re-review` dispatch a fresh
ClawSweeper issue/PR review without starting repair. `fix ci`,
`address review`, and `rebase` dispatch the
repair worker only for ClawSweeper PRs or PRs opted into
`clawsweeper:autofix` or `clawsweeper:automerge`. `autofix` runs the bounded
review/fix loop without merging. `automerge` runs the bounded review/fix/merge
loop, but draft PRs stay fix-only until GitHub marks them ready for review.
Freeform maintainer mentions such as `@clawsweeper why did automerge stop?`
or `@clawsweeper: can you explain this failure?` dispatch a read-only assist
review with the mention text as one-off instructions. The answer lands in the
next public ClawSweeper review comment. Action-looking prose does not directly
mutate GitHub; it must map to existing structured recommendations and pass the
normal deterministic gates.
Default accepted maintainers: `OWNER`, `MEMBER`, `COLLABORATOR`; fallback
repository permission accepts `admin`, `maintain`, or `write`. Contributor
comments are ignored without a reply.
Run router manually:
```bash
pnpm run repair:comment-router -- --repo openclaw/openclaw --lookback-minutes 180
pnpm run repair:comment-router -- --repo openclaw/openclaw --execute --wait-for-capacity
```
Scheduled routing stays dry unless
`CLAWSWEEPER_COMMENT_ROUTER_EXECUTE=1`.
## Trusted Autofix And Automerge
`@clawsweeper autofix` opts an existing PR into the bounded review/fix loop.
`@clawsweeper automerge` opts an existing PR into the bounded review/fix/merge
loop. The router:
- verifies maintainer authorization;
- labels the PR `clawsweeper:autofix` or `clawsweeper:automerge`;
- dispatches ClawSweeper review for the current head SHA;
- creates or reuses a durable adopted job;
- repairs at most the configured caps;
- never merges autofix PRs or draft PRs;
- merges automerge PRs only when ClawSweeper passed the exact current head,
checks are green, GitHub says mergeable, no human-review label is present,
the PR is not draft, and both merge gates are open.
Missing changelog is not a review finding or merge blocker. If repairing a user-facing change, add/update changelog automatically when practical; never ask or block solely on it.
If ClawSweeper passes while merge gates are closed, it labels
`clawsweeper:merge-ready` and comments instead of merging. `@clawsweeper stop`
adds `clawsweeper:human-review`.
When asked to create a PR and enable ClawSweeper automerge, do not
leave the local OpenClaw checkout on the PR branch. After the PR is created,
pushed, and the `@clawsweeper automerge` request is posted or otherwise
confirmed, return the local checkout to `main` and fast-forward it when the
working tree is clean:
```bash
git switch main
git pull --ff-only
```
If unrelated local edits or an in-progress rebase prevent switching, report the
blocker instead of stashing, deleting, or overwriting work.
Repair caps:
```bash
CLAWSWEEPER_MAX_REPAIRS_PER_PR=10
CLAWSWEEPER_MAX_REPAIRS_PER_HEAD=1
```
## Security Boundary
Do not stage unapproved security-sensitive work for ClawSweeper Repair. Route
vulnerability reports, CVE/GHSA/advisory work, leaked secrets/tokens/keys,
plaintext secret storage, SSRF, XSS, CSRF, RCE, auth bypass, privilege
escalation, and sensitive data exposure to central OpenClaw security handling.
For PRs explicitly opted into `clawsweeper:autofix` or
`clawsweeper:automerge`, security-sensitive review findings may dispatch
bounded repair, but merge remains blocked until a later exact-head review is
clean and the normal merge gates pass. Trust deterministic ClawSweeper security
markers, labels, and job frontmatter; do not infer security handling from vague
prose.
## Monitoring
Receiver workflows:
```bash
gh run list --repo openclaw/clawsweeper --workflow "ClawSweeper Commit Review" \
--limit 12 --json databaseId,displayTitle,event,status,conclusion,createdAt,updatedAt,url
gh run list --repo openclaw/clawsweeper --workflow "repair cluster worker" \
--limit 12 --json databaseId,displayTitle,event,status,conclusion,createdAt,updatedAt,url
gh run list --repo openclaw/clawsweeper --workflow "repair comment router" \
--limit 12 --json databaseId,displayTitle,event,status,conclusion,createdAt,updatedAt,url
```
Target dispatcher:
```bash
gh run list --repo openclaw/openclaw --workflow "ClawSweeper Dispatch" \
--event push --limit 8 --json databaseId,displayTitle,event,status,conclusion,headSha,url
```
Target commit check:
```bash
gh api "repos/openclaw/openclaw/commits/<sha>/check-runs?per_page=100" \
--jq '.check_runs[] | select(.name=="ClawSweeper Commit Review") | [.status,.conclusion,.details_url] | @tsv'
```
## Reading Output
For findings or failures, summarize:
- target repo, item/PR/commit, run, report path
- result, confidence, severity, and exact blocker
- affected files or cluster refs
- validation commands and whether they passed
- whether mutation gates were open or closed
- next deterministic action
Keep the broom small: one cluster, one branch, one PR, narrow proof, clear
owner-visible evidence.

View File

@@ -1,4 +0,0 @@
interface:
display_name: "ClawSweeper"
short_description: "Inspect ClawSweeper commit review reports and Actions runs."
default_prompt: "Review recent ClawSweeper commit reports and summarize findings."

View File

@@ -1,74 +0,0 @@
---
name: control-ui-e2e
description: Use when testing, fixing, or extending the OpenClaw Control UI GUI with Vitest + Playwright end-to-end checks, mocked Gateway WebSocket flows, mocked dashboard runs, screenshots/videos, or agent-verifiable browser proof.
---
# Control UI E2E
Use this for Control UI changes that need a real browser flow with deterministic Gateway data.
## Test Shape
- Use `ui/src/**/*.e2e.test.ts` for full GUI flows.
- Use `ui/src/test-helpers/control-ui-e2e.ts` to start the Vite Control UI and install a mocked Gateway WebSocket.
- Keep scenarios deterministic. Do not use live provider keys, real channel credentials, or a real Gateway unless the user explicitly asks for live proof.
- Prefer existing `.browser.test.ts` or unit tests for narrow rendering logic; use this E2E lane when the proof should cover routing, app boot, Gateway handshake, requests, and visible UI behavior together.
## Commands
- Target one E2E test in a Codex worktree:
```bash
node scripts/run-vitest.mjs run --config test/vitest/vitest.ui-e2e.config.ts --configLoader runner ui/src/ui/e2e/chat-flow.e2e.test.ts
```
- Run the whole local lane in a normal checkout:
```bash
pnpm test:ui:e2e
```
If dependencies are missing in a Codex worktree, install once with `pnpm install`; for broad GUI proof or dependency-heavy checks, use Testbox/Crabbox instead of running a wide local pnpm lane.
## Visual Proof Default
When running mocked Control UI/dashboard validation for a user-facing feature, produce visual proof by default unless the user explicitly opts out.
- Keep the Vitest E2E assertions deterministic; do not commit generated screenshots or videos.
- After or alongside the focused E2E test, run the mocked Control UI app when available, for example `pnpm dev:ui:mock -- --port <port>`.
- Drive Chromium with Playwright against the local mock URL and capture a video plus screenshots for each meaningful state: initial view, interaction input, result state, and final/paginated/selected state.
- Use `browser.newContext({ recordVideo: { dir, size }, viewport })`, `page.screenshot({ path })`, and close the context before reporting the video path.
- Put artifacts under `.artifacts/control-ui-e2e/<short-feature-name>/` or another clearly named local temp directory, and report the absolute paths in the final answer.
- Treat recording as validation, not only demo capture. If the recorder fails or shows surprising behavior, stop, fix the behavior, add or update a regression test, then rerecord.
- If visual proof is blocked, state the exact blocker and still report the textual E2E evidence.
## Mock Pattern
Start the app server, install the mock before `page.goto`, then assert both Gateway traffic and visible UI:
```ts
const server = await startControlUiE2eServer();
const page = await context.newPage();
const gateway = await installMockGateway(page, {
historyMessages: [{ role: "assistant", content: [{ type: "text", text: "Ready." }] }],
});
await page.goto(`${server.baseUrl}chat`);
await page.locator(".agent-chat__composer-combobox textarea").fill("hello");
await page.getByRole("button", { name: "Send message" }).click();
const request = await gateway.waitForRequest("chat.send");
await gateway.emitChatFinal({ runId: String(request.params.idempotencyKey), text: "Done." });
await page.getByText("Done.").waitFor();
```
Extend `installMockGateway` with typed scenario options or method responses when a new flow needs more Gateway surface.
## Standalone Recording
When recording an already-running mocked Control UI URL, use a temporary Playwright script or `playwright test` spec and keep the recording flow focused:
- Open the mock URL, interact through stable `data-*` selectors or user-facing role selectors, and wait on asserted states instead of relying on fixed sleeps.
- Assert both visible UI state and mocked Gateway traffic for request-driven flows. For example, verify the expected count/row is visible and that `sessions.list` was called with the expected `search`, `offset`, and `limit`.
- Use short sleeps only after assertions to make the captured video readable.
- Store the generated video under `.artifacts/control-ui-e2e/<feature>/`; do not commit it.

View File

@@ -1,4 +0,0 @@
interface:
display_name: "Control UI E2E"
short_description: "Mocked browser E2E for Control UI"
default_prompt: "Use $control-ui-e2e to verify a Control UI change with the mocked Vitest + Playwright browser lane."

View File

@@ -1,732 +0,0 @@
---
name: crabbox
description: Use the Crabbox wrapper for OpenClaw remote validation across Linux, macOS, Windows, and WSL2, including delegated Blacksmith Testbox proof. Report the actual provider and id.
---
# Crabbox
Use the Crabbox wrapper when OpenClaw needs remote Linux proof for broad tests,
CI-parity checks, secrets, hosted services, Docker/E2E/package lanes, warmed
reusable boxes, sync timing, logs/results, cache inspection, or lease cleanup.
Crabbox is the transport/orchestration surface. The actual backend can be:
- brokered AWS Crabbox: direct provider, `provider=aws`, lease ids like
`cbx_...`, `syncDelegated=false`
- Blacksmith Testbox through Crabbox: delegated provider,
`provider=blacksmith-testbox`, ids like `tbx_...`, `syncDelegated=true`
For OpenClaw maintainer broad `pnpm` gates, Blacksmith Testbox through the
Crabbox wrapper is acceptable and often preferred when the standing Testbox
rules apply. Do not describe those runs as "AWS Crabbox"; report them as
Testbox-through-Crabbox with the `tbx_...` id and Actions run.
Use the repo `.crabbox.yaml` brokered AWS path when the task specifically needs
direct AWS Crabbox behavior, persistent direct-provider leases, `--fresh-pr`,
`--full-resync`, environment forwarding, capture/download support, or provider
comparison. Use `--provider blacksmith-testbox` when the task needs OpenClaw
maintainer Testbox proof, prepared CI environment, broad/heavy pnpm gates, or
the user asks for Testbox/Blacksmith.
## First Checks
- Run from the repo root. Crabbox sync mirrors the current checkout.
- Check the wrapper and providers before remote work:
```sh
command -v crabbox
../crabbox/bin/crabbox --version
pnpm crabbox:run -- --help | sed -n '1,120p'
../crabbox/bin/crabbox desktop launch --help
../crabbox/bin/crabbox webvnc --help
```
- OpenClaw scripts prefer `../crabbox/bin/crabbox` when present. The user PATH
shim can be stale.
- Check `.crabbox.yaml` for direct-provider defaults. Omitting `--provider`
means brokered AWS for normal Linux/macOS paths; the wrapper selects Azure
for unqualified Windows/WSL2 runs when the local Crabbox binary advertises
Azure.
- The brokered AWS default is a Linux developer image in `eu-west-1`; the repo
config pins hot `eu-west-1a/b/c` placement so Fast Snapshot Restore can apply.
If warmup drifts well past the minute-scale path, verify image promotion,
region/AZ placement, and FSR state before blaming OpenClaw.
- For broad OpenClaw maintainer `pnpm` gates, prefer the repo wrapper with
`--provider blacksmith-testbox` or the repo Testbox helpers when the standing
Testbox policy applies.
- Always report the actual provider and id. `cbx_...` means AWS Crabbox;
`tbx_...` means Blacksmith Testbox through Crabbox. If the output only says
`blacksmith testbox list`, use `blacksmith testbox list --all` before
concluding no box exists.
- If a warm direct-provider lease smells stale, retry with `--full-resync`
(alias `--fresh-sync`) before replacing the lease. This resets the remote
workdir, skips the fingerprint fast path, reseeds Git when possible, and
uploads the checkout from scratch.
- For live/provider bugs, use the configured secret workflow before downgrading
to mocks. Copy only the exact needed key into the remote process environment
for that one command. Do not print it, do not sync it as a repo file, and do
not leave it in remote shell history or logs. If no secret-safe injection path
is available, say true live provider auth is blocked instead of silently using
a fake key.
- Prefer local targeted tests for tight edit loops. Broad gates belong remote.
- Do not treat inherited shell env as operator intent. In particular,
`OPENCLAW_LOCAL_CHECK_MODE=throttled` from the local shell is not permission
to move broad `pnpm check:changed`, `pnpm test:changed`, full `pnpm test`, or
lint/typecheck fan-out onto the laptop.
- Only use `OPENCLAW_LOCAL_CHECK_MODE=throttled|full` when the user explicitly
asks for local proof in the current task. If Testbox is queued or capacity is
constrained, report the blocker and keep only targeted local edit-loop checks
running.
## macOS And Windows Targets
Use these only when the task needs an existing non-Linux host. OpenClaw broad
Linux validation uses the repo Crabbox config unless a provider is explicitly
requested.
Native brokered Windows is available for Windows-specific proof. Prefer Azure
for Windows/WSL2 when the subscription has quota or credits and the local
Crabbox binary advertises Azure. Keep broad Linux gates on Linux/Testbox unless
the bug is Windows-specific, and only force AWS when the operator asks for the
older AWS developer image/cache path or Azure is unavailable:
```sh
pnpm crabbox:warmup -- \
--target windows \
--windows-mode wsl2 \
--timing-json
```
The hydrate workflow assumes Docker should already be baked into Linux images
and only installs it as a fallback. Do not add per-run Docker installs to proof
commands unless the image probe shows Docker is actually missing.
When the user explicitly asks for brokered macOS runners, use Crabbox AWS
macOS only after confirming the deployed coordinator supports EC2 Mac host
lifecycle/image routes and the operator has AWS EC2 Mac Dedicated Host quota
and IAM. Prefer `CRABBOX_HOST_ID` for a known Crabbox-managed Dedicated Host,
or run the no-spend preflight first:
```sh
crabbox admin hosts quota --provider aws --target macos --region eu-west-1 --type mac2.metal --json
crabbox admin hosts allocate --provider aws --target macos --region eu-west-1 --type mac2.metal --dry-run --json
CRABBOX_MACOS_TYPES=all scripts/macos-host-region-preflight.sh
```
Do not silently substitute AWS macOS for normal OpenClaw Linux proof. Report
paid-host blockers as quota, IAM, coordinator deployment, or host availability
instead of falling back to local macOS.
Crabbox supports static SSH targets:
```sh
../crabbox/bin/crabbox run --provider ssh --target macos --static-host mac-studio.local -- xcodebuild test
../crabbox/bin/crabbox run --provider ssh --target windows --windows-mode normal --static-host win-dev.local -- pwsh -NoProfile -Command "dotnet test"
../crabbox/bin/crabbox run --provider ssh --target windows --windows-mode wsl2 --static-host win-dev.local -- pnpm test
```
- `target=macos` and `target=windows --windows-mode wsl2` use the POSIX SSH,
bash, Git, rsync, and tar contract.
- Native Windows uses OpenSSH, PowerShell, Git, and tar; sync is manifest tar
archive transfer into `static.workRoot`. Direct native Windows runs support
`--script*`, `--env-from-profile`, `--preflight`, and PowerShell `--shell`.
- `crabbox actions hydrate/register` are Linux-only today; use plain
`crabbox run` loops for static macOS and Windows hosts.
- Live proof needs a reachable, operator-managed SSH host. Without one, verify
with `../crabbox/bin/crabbox run --help`, config/flag tests, and the Crabbox
Go test suite.
## Direct Brokered AWS Backend
Use this when the task needs direct AWS Crabbox semantics rather than the
prepared Blacksmith Testbox CI environment.
Changed gate:
```sh
pnpm crabbox:run -- \
--idle-timeout 90m \
--ttl 240m \
--timing-json \
--shell -- \
"pnpm test:changed"
```
Full suite:
```sh
pnpm crabbox:run -- \
--idle-timeout 90m \
--ttl 240m \
--timing-json \
--shell -- \
"pnpm verify"
```
Use `pnpm verify` when you need check plus full Vitest proof. It emits
`CRABBOX_PHASE:check` and `CRABBOX_PHASE:test`, making Crabbox summaries show
which stage failed. Use plain `pnpm test` only when check proof is already
covered or intentionally skipped.
Focused rerun:
```sh
pnpm crabbox:run -- \
--idle-timeout 90m \
--ttl 240m \
--timing-json \
--shell -- \
"pnpm test <path-or-filter>"
```
Read the JSON summary. Useful fields:
- `provider`: `aws`
- `leaseId`: `cbx_...`
- `syncDelegated`: `false`
- `commandPhases`: populated when the command prints `CRABBOX_PHASE:<name>`
- `commandMs` / `totalMs`
- `exitCode`
Crabbox should stop one-shot AWS leases automatically after the run. Verify
cleanup when a run fails, is interrupted, or the command output is unclear:
```sh
../crabbox/bin/crabbox list --provider aws
```
## Blacksmith Testbox Through Crabbox
Use this for OpenClaw maintainer broad/heavy `pnpm` gates when the prepared CI
environment is the right proof surface:
```sh
node scripts/crabbox-wrapper.mjs run \
--provider blacksmith-testbox \
--blacksmith-org openclaw \
--blacksmith-workflow .github/workflows/ci-check-testbox.yml \
--blacksmith-job check \
--blacksmith-ref main \
--idle-timeout 90m \
--ttl 240m \
--timing-json \
-- \
corepack pnpm check:changed
```
Read the JSON summary and the Testbox line. Useful fields:
- `provider`: `blacksmith-testbox`
- `leaseId`: `tbx_...`
- `syncDelegated`: `true`
- `syncPhases`: delegated/skipped because Blacksmith owns checkout/sync
- Actions run URL/id from the Testbox output
- `exitCode`
Use provider-backed cache volumes only for rebuildable caches, not secrets or
checkout state. On Blacksmith, Crabbox forwards them as sticky disks:
```sh
node scripts/crabbox-wrapper.mjs run \
--provider blacksmith-testbox \
--cache-volume pnpm-store=openclaw-node24-pnpm-lock:/tmp/openclaw-pnpm-store \
--timing-json \
-- \
corepack pnpm check:changed
```
The selected provider must advertise cache-volume support. If not, omit
`--cache-volume` and rely on kept-lease caches.
`blacksmith testbox list` may hide hydrating or ready boxes. Use:
```sh
blacksmith testbox list --all
blacksmith testbox status <tbx_id>
```
## Observability Flags
Use these on debugging runs before inventing ad hoc logging:
- `--preflight`: prints run context, workspace mode, SSH target, remote user/cwd,
and target-specific tool probes. Defaults cover `git`, `tar`, `node`, `npm`,
`corepack`, `pnpm`, `yarn`, `bun`, `docker`, plus POSIX
`sudo`/`apt`/`bubblewrap` and native Windows
`powershell`/`execution_policy`/`longpaths`/`temp`/`pwsh`. Add
`--preflight-tools node,bun,docker`, `CRABBOX_PREFLIGHT_TOOLS`, or repo
`run.preflightTools` to replace the list. `default` expands built-ins; `none`
prints only the workspace summary. Preflight is diagnostic only; install
toolchains through Actions hydration, images, devcontainer/Nix/mise/asdf, or
the run script. On `blacksmith-testbox`, this prints a delegated-unsupported
note because the workflow owns setup.
- `CRABBOX_ENV_ALLOW=NAME,...`: forwards only listed local env vars for direct
providers and prints `set len=N secret=true` style summaries. On
`blacksmith-testbox`, env forwarding is unsupported; put secrets in the
Testbox workflow instead.
- `--env-from-profile <file>` plus `--allow-env NAME`: loads simple
`export NAME=value` / `NAME=value` lines from a local profile without
executing it, then forwards only allowlisted names. `--allow-env` is
repeatable and comma-separated. Profile values override ambient allowlisted
env values for that run. Direct POSIX, WSL2, and native Windows runs are
supported; delegated providers are not. Crabbox probes the uploaded profile
remotely and prints redacted presence/length metadata before the command.
- `--env-helper <name>`: with `--env-from-profile` on POSIX SSH targets,
persists `.crabbox/env/<name>` and `.crabbox/env/<name>.env` so follow-up
commands on the same lease can run through `./.crabbox/env/<name> <command>`.
Use only on leases you control; the profile stays until cleanup, lease reset,
or `--full-resync`.
- `--script <file>` / `--script-stdin`: upload a local script into
`.crabbox/scripts/` and execute it on the remote box. Shebang scripts execute
directly on POSIX; scripts without a shebang run through `bash`. Native
Windows uploads run through Windows PowerShell, and Crabbox appends `.ps1`
when needed. Arguments after `--` become script args.
- `--fresh-pr owner/repo#123|URL|number`: skip dirty local sync and create a
fresh remote checkout of the GitHub PR. Bare numbers use the current repo's
GitHub origin. Add `--apply-local-patch` only when the current local
`git diff --binary HEAD` should be applied on top of that PR checkout.
- `--full-resync` / `--fresh-sync`: reset a stale direct-provider workdir
before syncing. Use after sync fingerprints look wrong, SSH times out before
sync, or rsync watchdog output suggests it. It is redundant with
`--fresh-pr`, incompatible with `--no-sync`, and unsupported by delegated
providers.
- `--capture-stdout <path>` / `--capture-stderr <path>`: write remote streams to
local files and keep binary/noisy output out of retained logs. Parent
directories must already exist. These are direct-provider only.
- `--capture-on-fail`: on non-zero direct-provider exits, downloads
`.crabbox/captures/*.tar.gz` with `test-results`, `playwright-report`,
`coverage`, JUnit XML, and nearby logs. Treat as secret-bearing until reviewed.
- `--keep-on-failure`: leave a failed one-shot lease alive for live debugging
until idle/TTL expiry. Useful on direct providers and delegated one-shots.
- `--timing-json`: final machine-readable timing. Add
`echo CRABBOX_PHASE:install`, `CRABBOX_PHASE:test`, etc. in long shell
commands; direct providers and Blacksmith Testbox both report them as
`commandPhases`.
Live-provider debug template for direct AWS/Hetzner leases:
```sh
mkdir -p .crabbox/logs
CRABBOX_ENV_ALLOW=OPENAI_API_KEY,OPENAI_BASE_URL \
pnpm crabbox:run -- --provider aws \
--preflight \
--allow-env OPENAI_API_KEY,OPENAI_BASE_URL \
--timing-json \
--capture-stdout .crabbox/logs/live-provider.stdout.log \
--capture-stderr .crabbox/logs/live-provider.stderr.log \
--capture-on-fail \
--shell -- \
"echo CRABBOX_PHASE:install; pnpm install --frozen-lockfile; echo CRABBOX_PHASE:test; pnpm test:live"
```
Do not pass `--capture-*`, `--download`, `--checksum`, `--force-sync-large`, or
`--sync-only` to delegated providers. Crabbox rejects them because the provider
owns sync or command transport.
## Efficient Bug E2E Verification
Use the smallest Crabbox lane that proves the reported user path, not just the
touched code. Aim for one after-fix E2E proof before commenting, closing, or
opening a PR for a user-visible bug.
When the user says "test in Crabbox", do not simply copy tests to the remote
box and run them there. Crabbox is for remote real-scenario proof: copy or
install OpenClaw as the user would, run the same setup/update/CLI/Gateway/API
call that failed, and capture behavior from that entrypoint. For regressions or
bug reports, prove the broken state first when feasible, then run the same
scenario after the fix.
Pick the lane by symptom:
- Docker/setup/install bug: build a package tarball and run the matching
`scripts/e2e/*-docker.sh` or package script. This proves npm packaging,
install paths, runtime deps, config writes, and container behavior.
- Provider/model/auth bug: prefer true live E2E. Use the configured secret
workflow, then inject the single needed key into Crabbox if needed. Scrub
unrelated provider env vars in the child command so interactive defaults do
not drift to another provider. If only a dummy key is used, label the proof
narrowly, e.g. "UI/install path only; live provider auth not exercised."
- Channel delivery bug: use the channel Docker/live lane when available; include
setup, config, gateway start, send/receive or agent-turn proof, and redacted
logs.
- Gateway/session/tool bug: prefer an end-to-end CLI or Gateway RPC command that
creates real state and inspects the resulting files/API output.
- Pure parser/config bug: targeted tests may be enough, but still run a
Crabbox command when OS, package, Docker, secrets, or service lifecycle could
change behavior.
Efficient flow:
1. Reproduce or prove the pre-fix symptom from the real user-facing entrypoint
when feasible. If the issue cannot be reproduced, capture the exact command
and observed behavior instead.
2. Patch locally and run narrow local tests for edit speed.
3. Run one Crabbox E2E command that starts from the user-facing entrypoint:
package install, Docker setup, onboarding, channel add, gateway start, or
agent turn as appropriate.
4. Record proof as: Testbox id, command, environment shape, redacted secret
source, and copied success/failure output.
5. If the issue says "cannot reproduce", ask for the missing config/log fields
that would distinguish the tested path from the reporter's path.
Keep it efficient:
- Reuse existing E2E scripts and helper assertions before writing ad hoc shell.
- Use `--script <file>` or `--script-stdin` for multi-line E2E commands instead
of quote-heavy `--shell` strings on direct SSH providers.
- Use `--fresh-pr <pr>` when validating an upstream PR in isolation from the
local dirty tree. Add `--apply-local-patch` only when testing a local fixup on
top of that PR.
- Use `--full-resync` before replacing a warmed direct-provider lease when the
remote workdir or sync fingerprint appears stale.
- Use one-shot Crabbox for a single proof; use a reusable Testbox only when
several commands must share built images, installed packages, or live state.
- Prefer `OPENCLAW_CURRENT_PACKAGE_TGZ` with Docker/package lanes when testing a
candidate tarball; prefer the repo's package helper instead of direct source
execution when the bug might be packaging/install related.
- Keep secrets redacted. It is fine to report key presence, source, and length;
never print secret values.
- Include `--timing-json` on broad or flaky runs when command duration or sync
behavior matters.
Before/after PR proof on delegated Testbox:
- For PRs that should prove "broken before, fixed after", compare base and PR
on the same Testbox when practical. Fetch both refs, create detached temp
worktrees under `/tmp`, install in each, then run the same harness twice.
- Do not checkout base/PR refs in the synced repo root. Delegated Testbox sync
may leave the root dirty with local files; `git checkout` can abort or mix
proof state.
- Temp harness files under `/tmp` do not resolve repo packages by default. Put
the harness inside the worktree, or in ESM use
`createRequire(path.join(process.cwd(), "package.json"))` before requiring
workspace deps such as `@lydell/node-pty`.
- For full-screen TUI/CLI bugs, a PTY harness is stronger than helper-only
assertions. Use a real PTY, wait for visible lifecycle markers, send input,
then send control keys and assert process exit/stuck behavior.
- When validating a rebased local branch before push, remember delegated sync
usually validates synced file content on a detached dirty checkout, not a
remote commit object. Record the local head SHA, changed files, Testbox id,
and final success markers; after pushing, ensure the pushed SHA has the same
file content.
- If GitHub CI is still queued but the exact changed content passed Testbox
`pnpm check:changed`, `pnpm check:test-types`, and the real E2E proof, it is
reasonable to merge once required checks allow it. Note any still-running
unrelated shards in the proof comment instead of waiting forever.
Interactive CLI/onboarding:
- For full-screen or prompt-heavy CLI flows, run the target command inside tmux
on the Crabbox and drive it with `tmux send-keys`; capture proof with
`tmux capture-pane`, redacted through `sed`.
- Prefer deterministic arrow navigation over search typing for Clack-style
searchable selects. Raw `send-keys -l openai` may not trigger filtering in a
tmux pane; inspect option order locally or on-box and send exact Down/Enter
sequences.
- Isolate mutable state with `OPENCLAW_STATE_DIR=$(mktemp -d)`. Plugin npm
installs live under that state dir (`npm/node_modules/...`), not under
`OPENCLAW_CONFIG_DIR`. Verify downloads by checking the state dir, package
lock, and installed package metadata.
- To test automatic setup installs against local package artifacts, use
`OPENCLAW_ALLOW_PLUGIN_INSTALL_OVERRIDES=1` plus
`OPENCLAW_PLUGIN_INSTALL_OVERRIDES='{"plugin-id":"npm-pack:/tmp/plugin.tgz"}'`.
Pack with `npm pack`, set an isolated `OPENCLAW_STATE_DIR`, and verify the
package under `npm/node_modules`. Overrides are test-only and must not be
treated as official/trusted-source installs.
- For OpenAI/Codex onboarding proof, the useful markers are the UI line
`Installed Codex plugin`, `npm/node_modules/@openclaw/codex`, and the
package-lock entry showing the bundled `@openai/codex` dependency. A dummy
OpenAI-shaped key can prove only UI/install behavior; it is not live auth.
## Reuse And Keepalive
For most Crabbox calls, one-shot is enough. Use reuse only when you need
multiple manual commands on the same hydrated box.
If Crabbox returns a reusable id or you intentionally keep a lease:
```sh
pnpm crabbox:run -- --id <cbx_id-or-slug> --no-sync --timing-json --shell -- "pnpm test <path>"
```
Stop boxes you created before handoff:
```sh
pnpm crabbox:stop -- <id-or-slug>
blacksmith testbox stop --id <tbx_id>
```
## Interactive Desktop And WebVNC
Prefer WebVNC for human inspection because the browser portal can preload the
lease VNC password and avoids a native VNC client's copy/paste/password dance.
Use native `crabbox vnc` only when WebVNC is unavailable, the browser portal is
broken, or the user explicitly wants a local VNC client.
Common desktop flow:
```sh
../crabbox/bin/crabbox warmup --provider hetzner --desktop --browser --class standard --idle-timeout 60m --ttl 240m
../crabbox/bin/crabbox desktop launch --provider hetzner --id <cbx_id-or-slug> --browser --url https://example.com --webvnc --open --take-control
```
Useful WebVNC commands:
```sh
../crabbox/bin/crabbox webvnc --provider hetzner --id <cbx_id-or-slug> --open --take-control
../crabbox/bin/crabbox webvnc daemon start --provider hetzner --id <cbx_id-or-slug> --open --take-control
../crabbox/bin/crabbox webvnc daemon status --provider hetzner --id <cbx_id-or-slug>
../crabbox/bin/crabbox webvnc daemon stop --provider hetzner --id <cbx_id-or-slug>
../crabbox/bin/crabbox webvnc status --provider hetzner --id <cbx_id-or-slug>
../crabbox/bin/crabbox webvnc reset --provider hetzner --id <cbx_id-or-slug> --open --take-control
../crabbox/bin/crabbox desktop doctor --provider hetzner --id <cbx_id-or-slug>
../crabbox/bin/crabbox desktop click --provider hetzner --id <cbx_id-or-slug> --x 640 --y 420
../crabbox/bin/crabbox desktop paste --provider hetzner --id <cbx_id-or-slug> --text "user@example.com"
../crabbox/bin/crabbox desktop key --provider hetzner --id <cbx_id-or-slug> ctrl+l
../crabbox/bin/crabbox artifacts collect --id <cbx_id-or-slug> --all --output artifacts/<slug>
../crabbox/bin/crabbox artifacts publish --dir artifacts/<slug> --pr <number>
```
`desktop launch --webvnc --open` is usually the nicest one-shot: it starts the
browser/app inside the visible session, bridges the lease into the authenticated
WebVNC portal, and opens the portal. Keep browsers windowed for human QA; use
`--fullscreen` only for capture/video workflows.
For human handoff, include `--take-control` so the opened portal viewer gets
keyboard/mouse control automatically instead of landing as an observer.
Human handoff preflight:
- Do not assume a visible desktop or launched browser means the repo CLI/app is
installed, built, or on the interactive terminal's `PATH`.
- Before handing WebVNC to a human tester, prove the expected command from the
same kept lease and from a neutral directory such as `~`.
- If the handoff needs repo-local code, sync/build/link it explicitly on that
lease. Source-tree CLIs often need build output before a symlink works.
- Prefer a real `command -v <expected-command> && <expected-command> --version`
check over a repo-root-only `pnpm ...` command.
Generic handoff repair pattern:
```sh
../crabbox/bin/crabbox run --id <cbx_id-or-slug> --full-resync --shell -- \
"set -euo pipefail
pnpm install --frozen-lockfile
pnpm build
sudo ln -sf \"\$PWD/<cli-entry>\" /usr/local/bin/<expected-command>
cd ~
command -v <expected-command>
<expected-command> --version"
```
## If Crabbox Fails
Keep the fallback narrow. First decide whether the failure is Crabbox itself,
the brokered AWS lease, Blacksmith/Testbox, repo hydration, sync, or the test
command.
Fast checks:
```sh
command -v crabbox
../crabbox/bin/crabbox --version
pnpm crabbox:run -- --help | sed -n '1,140p'
../crabbox/bin/crabbox doctor
command -v blacksmith
blacksmith --version
blacksmith testbox list
```
Common Crabbox-only failures:
- Provider missing or old CLI: use `../crabbox/bin/crabbox` from the sibling
repo, or update/install Crabbox before retrying.
- Bad local config: inspect `.crabbox.yaml`, `crabbox config show`, and
`crabbox whoami`; normal OpenClaw proof should use brokered AWS without
asking for cloud keys.
- Slug/claim confusion: use the raw `cbx_...` / `tbx_...` id, or run one-shot
without `--id`.
- Sync/timing bug: add `--debug --timing-json`; capture the final JSON and the
printed Actions URL. Large sync warnings now include top source directories
by file count and a hint to update `.crabboxignore` / `sync.exclude`; inspect
those before reaching for `--force-sync-large`. Quiet rsync watchdogs and SSH
timeouts now print `next_action=` hints; follow them, usually `--full-resync`
first and a fresh lease second.
- Cleanup uncertainty: run `crabbox list --provider aws`; for explicit
Blacksmith runs, use `blacksmith testbox list` and stop only boxes you
created.
- Testbox queued/capacity pressure: do not retry Blacksmith repeatedly. Rerun
once without `--provider` so `.crabbox.yaml` routes to brokered AWS, or report
the Blacksmith blocker if Testbox itself is the requested proof.
If brokered AWS cannot dispatch, sync, attach, or stop, retry once with
`--debug` and `--timing-json`:
```sh
pnpm crabbox:run -- --debug --timing-json -- \
pnpm test:changed
```
Full suite:
```sh
pnpm crabbox:run -- --debug --timing-json -- \
pnpm test
```
Auth fallback, only when `blacksmith` says auth is missing:
```sh
blacksmith auth login --non-interactive --organization openclaw
```
Raw Blacksmith footguns:
- Run from repo root. The CLI syncs the current directory.
- Save the returned `tbx_...` id in the session.
- Reuse that id for focused reruns; stop it before handoff.
- Raw commit SHAs are not reliable `warmup --ref` refs; use a branch or tag.
- Treat `blacksmith testbox list` as cleanup diagnostics, not a shared reusable
queue.
Use Blacksmith only when the task is specifically about Testbox, brokered AWS
is unavailable, or an explicit comparison is needed. If Blacksmith is down or
quota-limited, do not keep probing it; stay on brokered AWS and note the
delegated-provider outage.
## Blacksmith Backend Notes
Crabbox Blacksmith backend delegates setup to:
- org: `openclaw`
- workflow: `.github/workflows/ci-check-testbox.yml`
- job: `check`
- ref: `main` unless testing a branch/tag intentionally
The hydration workflow owns checkout, Node/pnpm setup, dependency install,
secrets, ready marker, and keepalive. Crabbox owns dispatch, sync, SSH command
execution, timing, logs/results, cleanup, and cache-volume requests. Blacksmith
implements cache volumes as sticky disks.
Minimal Blacksmith-backed Crabbox run, from repo root:
```sh
pnpm crabbox:run -- --provider blacksmith-testbox --timing-json -- \
corepack pnpm test:changed
```
Use direct Blacksmith only when Crabbox is the broken layer and you are
isolating a Crabbox bug. Prefer direct `blacksmith testbox list` for cleanup
diagnostics, not as a reusable work queue.
Important Blacksmith footguns:
- Always run from repo root. The CLI syncs the current directory.
- Raw commit SHAs are not reliable `warmup --ref` refs; use a branch or tag.
- If auth is missing and browser auth is acceptable:
```sh
blacksmith auth login --non-interactive --organization openclaw
```
## Brokered AWS
Use AWS for normal OpenClaw remote proof. The repo `.crabbox.yaml` already
selects brokered AWS, so omit `--provider` unless you are testing a different
provider deliberately.
```sh
pnpm crabbox:warmup -- --class beast --market on-demand --idle-timeout 90m
pnpm crabbox:hydrate -- --id <cbx_id-or-slug>
pnpm crabbox:run -- --id <cbx_id-or-slug> --timing-json --shell -- "pnpm test:changed"
pnpm crabbox:stop -- <cbx_id-or-slug>
```
Install/auth for owned Crabbox if needed:
```sh
brew install openclaw/tap/crabbox
crabbox login --url https://crabbox.openclaw.ai --provider aws
```
New users should self-resolve broker auth before anyone asks for AWS keys:
```sh
crabbox config show
crabbox doctor
crabbox whoami
```
- If broker auth is missing, run `crabbox login --url https://crabbox.openclaw.ai --provider aws`.
- If the CLI asks for `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, or AWS
profile setup during normal OpenClaw validation, assume the agent selected
the wrong path. Use brokered `crabbox login` or an existing brokered lease
before asking the user for cloud credentials.
- Ask for AWS keys only for explicit direct-provider/account administration,
not for normal brokered OpenClaw proof.
- Trusted automation may still use
`printf '%s' "$CRABBOX_COORDINATOR_TOKEN" | crabbox login --url https://crabbox.openclaw.ai --provider aws --token-stdin`.
macOS config lives at:
```text
~/Library/Application Support/crabbox/config.yaml
```
It should include `broker.url`, `broker.token`, and usually `provider: aws`
for OpenClaw lanes. Let that config drive normal validation.
### Interactive Desktop / WebVNC
For human desktop demos, prefer `webvnc` over native `vnc` and keep the remote
desktop visible/windowed. Do not fullscreen the remote browser or hide the XFCE
panel/window chrome unless the explicit goal is video/capture output. After
launch, verify a screenshot shows the desktop panel plus browser title bar. If
Chrome is fullscreen, toggle it back with:
```sh
crabbox run --id <lease> --shell -- 'DISPLAY=:99 xdotool search --onlyvisible --class google-chrome windowactivate key F11'
```
## Diagnostics
```sh
crabbox status --id <id-or-slug> --wait
crabbox inspect --id <id-or-slug> --json
crabbox sync-plan
crabbox history --limit 20
crabbox history --lease <id-or-slug>
crabbox attach <run_id>
crabbox events <run_id> --json
crabbox logs <run_id>
crabbox results <run_id>
crabbox cache stats --id <id-or-slug>
crabbox cache volumes
crabbox ssh --id <id-or-slug>
blacksmith testbox list
```
Use `--debug` on `run` when measuring sync timing.
Use `--timing-json` on warmup, hydrate, and run when comparing backends.
Use `--market spot|on-demand` only on AWS warmup/one-shot runs.
## Failure Triage
- Crabbox cannot find provider: verify `../crabbox/bin/crabbox --help` lists
the provider selected by `.crabbox.yaml`; update Crabbox before falling back.
- Hydration stuck or failed: open the printed GitHub Actions run URL and inspect
the hydration step.
- Sync failed: rerun with `--debug`; check changed-file count and whether the
checkout is dirty.
- Command failed: rerun only the failing shard/file first. Do not rerun a full
suite until the focused failure is understood.
- Cleanup uncertain: `crabbox list --provider aws`; for explicit Blacksmith
runs, use `blacksmith testbox list` and stop owned `tbx_...` leases you
created.
- Crabbox broken but Blacksmith works: use the direct Blacksmith fallback above,
then file/fix the Crabbox issue.
## Boundary
Do not add OpenClaw-specific setup to Crabbox itself. Put repo setup in the
hydration workflow and keep Crabbox generic around lease, sync, command
execution, logs/results, timing, and cleanup.

View File

@@ -1,37 +0,0 @@
---
name: discord-clawd
description: Use to talk to the Discord-backed OpenClaw agent/session; not for archive search.
---
# Discord Clawd
Use this when the task is to talk with the Discord-backed agent/session, ask it a question, or post through that route.
For Discord archive/history/search, use `$discrawl` instead.
## Transport
Use the OpenClaw relay helper:
```bash
cd ~/Projects/agent-scripts
python3 skills/openclaw-relay/scripts/openclaw_relay.py targets
python3 skills/openclaw-relay/scripts/openclaw_relay.py resolve --target maintainers
```
If the target alias exists, prefer a private ask first:
```bash
python3 skills/openclaw-relay/scripts/openclaw_relay.py ask \
--target maintainers \
--message "Reply with exactly OK."
```
Use `publish` when the session should decide whether to post. Use `force-send` only when the user explicitly wants a message posted.
## Guardrails
- Resolve the target before sending real content.
- Report the target and delivery mode used.
- Do not use this for local Discord archive queries.
- Do not expose gateway tokens or session secrets.

View File

@@ -1,4 +0,0 @@
interface:
display_name: "Discord Clawd"
short_description: "Talk to the Discord-backed OpenClaw agent"
default_prompt: "Use $discord-clawd to route a private ask or explicit post through the Discord-backed OpenClaw agent/session."

View File

@@ -1,169 +0,0 @@
---
name: discrawl
description: "Discord archive: search, sync freshness, DMs, summaries, TUI, repo/release work."
metadata:
openclaw:
homepage: https://github.com/openclaw/discrawl
requires:
bins:
- discrawl
install:
- kind: go
module: github.com/openclaw/discrawl/cmd/discrawl@latest
bins:
- discrawl
---
# Discrawl
Use local Discord archive data first for Discord questions. Hit Discord APIs
only when the archive is stale, missing the requested scope, or the user asks
for current external context.
## Sources
- DB: platform-native XDG data dir, usually
`${XDG_DATA_HOME:-~/.local/share}/discrawl/discrawl.db` on Linux or
`~/Library/Application Support/discrawl/discrawl.db` on macOS
- Config: platform-native XDG config dir, with legacy fallback to
`~/.discrawl/config.toml`
- Cache: platform-native XDG cache dir
- Logs: platform-native XDG state dir
- Git share repo: platform-native XDG data dir
- Repo: `openclaw/discrawl`; use `~/GIT/_Perso/discrawl` only after verifying
its remote targets `openclaw/discrawl`, otherwise use a fresh checkout
- Preferred CLI: `discrawl`; fallback to `go run ./cmd/discrawl` from the repo
if the installed binary is stale
## Freshness
For recent/current questions, check freshness before analysis:
```bash
discrawl status --json
```
For precise freshness from the default database:
```bash
# Discrawl uses macOS ~/Library defaults unless XDG_DATA_HOME is explicitly set.
case "$(uname -s)" in
Darwin)
db="$HOME/Library/Application Support/discrawl/discrawl.db"
;;
*)
db="${XDG_DATA_HOME:-$HOME/.local/share}/discrawl/discrawl.db"
;;
esac
sqlite3 "$db" \
"select coalesce(max(updated_at),'') from sync_state where scope like 'channel:%';"
```
Routine diagnostics:
```bash
discrawl doctor
```
Desktop-local refresh:
```bash
discrawl sync --source wiretap
```
Bot API latest refresh, when credentials are available:
```bash
discrawl sync
```
Use `--full` only for deliberate historical backfills:
```bash
discrawl sync --full
```
If SQLite reports busy/locked, check for stray `discrawl` processes before retrying.
## Query Workflow
1. Resolve scope: guild, channel, DM, author, keyword, date range.
2. Check freshness for recent/current requests.
3. Prefer CLI search/messages for slices; use read-only SQL for exact counts.
4. Report absolute date spans, counts, channel/DM names, and known gaps.
Use root or subcommand help for syntax: `discrawl --help`,
`discrawl help search`, `discrawl search --help`. Use
`DISCRAWL_NO_AUTO_UPDATE=1` for read smokes when you do not want git-share
updates.
Common commands:
```bash
DISCRAWL_NO_AUTO_UPDATE=1 discrawl search --limit 20 "query"
discrawl messages --channel '#maintainers' --days 7 --all
discrawl dms --last 20
discrawl tui --dm
DISCRAWL_NO_AUTO_UPDATE=1 discrawl --json sql "select count(*) from messages;"
```
## SQL
Use `discrawl sql` for exact counts, joins, and ranking queries when normal
CLI reads are too coarse. The command is read-only by default, accepts SQL as
args or stdin, and supports `--json` for agent parsing.
Useful examples:
```bash
DISCRAWL_NO_AUTO_UPDATE=1 discrawl --json sql "select count(*) as messages from messages;"
DISCRAWL_NO_AUTO_UPDATE=1 discrawl --json sql "select coalesce(nullif(c.name, ''), m.channel_id) as channel, count(*) as messages from messages m left join channels c on c.id = m.channel_id group by m.channel_id order by messages desc limit 20;"
DISCRAWL_NO_AUTO_UPDATE=1 discrawl --json sql "select coalesce(nullif(mm.display_name, ''), nullif(mm.global_name, ''), nullif(mm.username, ''), m.author_id) as author, count(*) as messages from messages m left join members mm on mm.guild_id = m.guild_id and mm.user_id = m.author_id group by m.guild_id, m.author_id order by messages desc limit 20;"
```
Never use `--unsafe --confirm` unless the user explicitly asks for a database
mutation and the write has been reviewed.
When the installed CLI lacks a new feature, build or run from a verified
`openclaw/discrawl` checkout before concluding the feature is missing.
## Discord Boundaries
Bot API sync requires configured Discord bot credentials; do not invent token
availability. Desktop wiretap mode reads local Discord Desktop artifacts and
must not extract credentials, use user tokens, call Discord as the user, or
write to Discord application storage. Wiretap/Desktop cache DMs are local-only
and must not be described as part of the published Git snapshot. Git-share
snapshots must not include secrets or `@me` DM rows.
## Verification
For repo edits, prefer existing Go gates:
```bash
GOWORK=off go test ./...
```
Then run targeted CLI smoke for the touched surface, for example:
```bash
discrawl doctor
discrawl status --json
DISCRAWL_NO_AUTO_UPDATE=1 discrawl search --limit 5 "test"
```
## ClawSweeper Sandbox
Use the sandbox reader only:
```bash
discrawl-sandbox search --limit 20 "query"
discrawl-sandbox messages --channel clawtributors --days 7 --all
discrawl-sandbox status --json
```
This reader imports `https://github.com/openclaw/discord-store.git` into
`/root/clawsweeper-sandbox-workspace/.discrawl/discrawl.db` with
`discord.token_source = "none"`. The published Git snapshot is public-channel
filtered; do not use `/root/.discrawl/config.toml` or the rich writer DB from
sandboxed public Discord sessions.

View File

@@ -1,4 +0,0 @@
interface:
display_name: "Discrawl"
short_description: "Search local Discord archives and freshness"
default_prompt: "Use $discrawl to search local Discord archives, check freshness, inspect DMs or channel slices, and report exact date spans and source gaps."

View File

@@ -1,50 +0,0 @@
---
name: gitcrawl
description: "GitHub archive: issue/PR search, sync freshness, duplicate clusters, gh-shim PR status, and Gitcrawl repo work."
metadata:
openclaw:
homepage: https://github.com/openclaw/gitcrawl
requires:
bins:
- gitcrawl
install:
- kind: go
module: github.com/openclaw/gitcrawl/cmd/gitcrawl@latest
bins:
- gitcrawl
---
# Gitcrawl
Use local GitHub issue/PR archives before live GitHub search. Check freshness first:
```bash
gitcrawl doctor --json
```
Find candidates:
```bash
gitcrawl threads openclaw/openclaw --numbers <issue-or-pr-number> --include-closed --json
gitcrawl neighbors openclaw/openclaw --number <issue-or-pr-number> --limit 12 --json
gitcrawl search issues "query" -R openclaw/openclaw --state open --json number,title,url
gitcrawl clusters openclaw/openclaw --sort size --min-size 5
gitcrawl cluster-detail openclaw/openclaw --id <cluster-id>
```
For PR triage, start cached and go live only before mutation/merge decisions:
```bash
gitcrawl gh pr status <number-or-url> -R openclaw/openclaw --compact
gitcrawl gh pr view <number-or-url> -R openclaw/openclaw --json number,title,state,url,isDraft,headRef,headSha
gitcrawl gh --live pr status <number-or-url> -R openclaw/openclaw --compact
```
Use live `gh` plus checkout proof before commenting, labeling, closing, reopening, merging, or filing a PR review:
```bash
gh pr view <number> --json number,title,state,mergedAt,body,files,comments,reviews,statusCheckRollup
gh issue view <number> --json number,title,state,body,comments,closedAt
```
Report absolute dates, repo names, issue/PR numbers, cluster ids, and source gaps. Do not close/label from similarity alone; require matching intent plus live verification.

View File

@@ -1,4 +0,0 @@
interface:
display_name: "Gitcrawl"
short_description: "Search local OpenClaw issue and PR history before live GitHub triage"
default_prompt: "Use $gitcrawl to inspect OpenClaw issue and PR history, find related threads and duplicate candidates, then verify actionable decisions with live GitHub."

View File

@@ -1,44 +0,0 @@
---
name: graincrawl
description: "Granola archive: search, sync freshness, notes, transcripts, panels, SQL counts, and Graincrawl repo work."
metadata:
openclaw:
homepage: https://github.com/openclaw/graincrawl
requires:
bins:
- graincrawl
install:
- kind: go
module: github.com/vincentkoc/graincrawl/cmd/graincrawl@latest
bins:
- graincrawl
---
# Graincrawl
Use local Granola archive data first. Check freshness for recent/current questions:
```bash
graincrawl doctor --json
graincrawl status --json
```
Refresh only when stale or asked:
```bash
graincrawl sync --source private-api
graincrawl sync --source desktop-cache
```
Query with bounded reads:
```bash
graincrawl search "query"
graincrawl notes --json
graincrawl note get <id>
graincrawl transcripts get <id>
graincrawl panels get <id>
graincrawl --json sql "select count(*) as notes from notes;"
```
Report absolute date spans, note titles, source gaps, and transcript/panel availability. Use read-only SQL for exact counts/rankings. Before encrypted source debugging, run explicit unlock/secrets checks; do not surprise-prompt Keychain.

View File

@@ -1,4 +0,0 @@
interface:
display_name: "Graincrawl"
short_description: "Search local Granola notes and transcripts"
default_prompt: "Use $graincrawl to search local Granola notes, transcripts, and panels, check freshness, and report exact date spans and source gaps."

View File

@@ -1,202 +0,0 @@
---
name: kysely-database-access
description: Use when adding, reviewing, or refactoring OpenClaw Kysely database access, native node:sqlite stores, generated DB types, SQLite schemas, migrations, raw SQL, transactions, or database access best practices.
---
# Kysely Database Access
Use this skill for OpenClaw database code that touches Kysely, `node:sqlite`,
generated DB types, SQLite schemas, migrations, or store/query design.
## Read First
- `docs/concepts/kysely.md` for the repo's Kysely rules and examples.
- The owning subtree `AGENTS.md`, if present.
- Relevant local Kysely source/types under `node_modules/kysely/dist/esm/...`
before assuming dialect behavior, result types, transactions, plugins, or raw
SQL semantics.
- For codegen behavior, inspect `scripts/generate-kysely-types.mjs` and
`kysely-codegen --help` from the repo package manager.
## Official Docs Cross-Check
When the behavior matters, verify against current Kysely docs/source before
patching:
- Generating types: production apps should keep schema types aligned with the
database through code generation.
- Data types: TypeScript types do not affect runtime values; the driver decides
runtime values, and Kysely returns what the driver returns unless a plugin
transforms results.
- Raw SQL: the `sql` tag can execute full raw SQL and embed snippets into
builders. Prefer typed builders/helpers when they express the same thing.
- Reusable helpers: take `Expression<T>` or an `ExpressionBuilder` when wrapping
SQL expressions; alias helper expressions explicitly in `select`. Extract a
helper only when it quarantines raw SQL, removes meaningful duplication, or
preserves a tricky inferred type.
- Split build/execute only at deliberate boundaries. Compiled-query execution
is useful for native sync adapters, but keep plugin/result-transform behavior
in mind.
- Migrations: Kysely migration files run without a schema type. In OpenClaw,
prefer the committed SQL-source-of-truth path unless a new owner explicitly
needs Kysely-managed migrations.
- Plugins: plugins can transform queries and results. Any sync shortcut that
bypasses Kysely's async executor needs a documented invariant or tests.
## Default Workflow
1. Identify the owner boundary:
- Core state DB: `src/state/*`
- Per-agent DB: `src/state/openclaw-agent-*`
- Feature store: owning `*.sqlite.ts` module
- Plugin-owned state: plugin/module owner, not generic core
2. Inspect the schema source first:
- `*.sql` is the source of truth when generated schema/types exist.
- Generated `*.generated.*` files are outputs, not hand-edit targets.
3. Prefer Kysely builders for normal CRUD:
- `selectFrom`, `insertInto`, `updateTable`, `deleteFrom`
- `executeTakeFirst`, `executeTakeFirstOrThrow`, `execute`
- `eb.fn.countAll`, `eb.fn.count`, `eb.fn.coalesce` for common functions
- Keep compile-time Kysely reference literals such as `"host"` and
`"flow_id as flowId"` when they are clearer than constants; they are
type-checked by Kysely.
- Let Kysely infer selected row shapes. Do not pass broad row generics to
sync helpers for normal builder queries.
- Treat `executeSqliteQuerySync<Row>(db, builder)` and
`executeSqliteQueryTakeFirstSync<Row>(db, builder)` as a smell: the generic
can lie about selected columns. Use no generic for builders; use an exact
raw boundary helper for raw SQL.
- For finite public query presets, use a preset-to-row type map plus a union
boundary type instead of `Record<string, ...>`.
- After touching Kysely/native SQLite code, run `pnpm lint:kysely`. The AST
guard rejects raw identifier helpers, unreviewed typed `sql<T>` snippets,
`db.dynamic`, explicit sync-helper row generics for builders, and new raw
`node:sqlite` runtime access outside owner allowlists. It also rejects
persisted enum-like casts in SQLite stores; keep row fields as `string` and
parse through closed validators.
4. Keep raw SQL deliberate:
- Good: pragmas, virtual tables, FTS, SQLite JSON functions, migrations,
`sqlite_master`, compact repeated expressions.
- Bad: raw `COUNT(*)` or dynamic SQL where Kysely has a typed builder shape.
- Use `${value}` parameters; use `sql.ref` / `sql.table` only for validated,
closed-set identifiers.
- Do not feed unconstrained runtime `string` values into table/column/group/
order/identifier positions. Narrow them to local unions or generated table
keys first.
- Prefer `eb.fn`, `eb.lit`, `eb.ref`, and expression callbacks for scalar
SQL such as `count`, `coalesce`, `max`, `exists`, and constant selections.
5. Align TypeScript with real driver values:
- Kysely does not coerce runtime values.
- Native `node:sqlite` returns BLOB columns as `Uint8Array`; convert with
`Buffer.from(...)` only at API boundaries that need Buffer helpers.
- Keep JSON/text/timestamp parsing at module boundaries.
- Keep persisted enum-like strings as `string` in row types, then parse them
through closed validator helpers such as `parseTaskStatus(value)`. Do not
cast corrupt persisted data into exported unions.
6. Decide migration need from shipped state:
- Unshipped schema/type cleanup: no SQLite migration.
- Shipped canonical schema change: add the appropriate migration or
doctor/fix repair path with tests.
- Legacy config repair belongs in doctor/fix paths, not startup surprises.
## Codegen
For committed SQL-backed generated types:
```bash
pnpm db:kysely:gen
pnpm db:kysely:check
```
The repo maps SQLite `blob` to `Uint8Array` through `kysely-codegen`
`--type-mapping`. Do not post-process generated files by hand; change the
generator or SQL source and regenerate.
## Native SQLite Guardrails
- Use `getNodeSqliteKysely(db)` and sync helpers from `src/infra/kysely-sync.ts`
for `DatabaseSync` stores.
- New direct `db.prepare(...)` / `db.exec(...)` runtime access should be rare.
Prefer Kysely or add an explicit `scripts/check-kysely-guardrails.mjs`
allowlist entry with a clear owner reason.
- If raw SQLite is repeated or cast-heavy, extract a narrow boundary helper
such as `assertSqliteIntegrityOk(db, message)` and allowlist that helper
instead of each caller.
- Keep sync helper result types derived from `CompiledQuery<Row>` / Kysely
builders. Explicit helper generics are for raw SQL or external boundaries,
not for widening a typed builder result into a generic record.
- Keep the native dialect in `src/infra/kysely-node-sqlite.ts` aligned with
Kysely's SQLite driver structure: single connection, mutex, SQLite adapter,
SQLite query compiler, SQLite introspector.
- Use `StatementSync.columns().length` behavior for row-returning statements;
do not parse SQL verbs.
- Return `insertId` only for changed Kysely insert nodes. Raw insert SQL and
ignored inserts must not expose stale `lastInsertRowid`.
- Remember that sync execution compiles through Kysely but bypasses async
`executeQuery` result plugins/logging. If plugins enter this path, add tests
or a documented invariant.
## Tests
Pick the smallest proof that covers the touched surface:
```bash
pnpm db:kysely:check
pnpm lint:kysely
pnpm test src/infra/kysely-node-sqlite.test.ts
pnpm test <owning-store>.test.ts
pnpm tsgo:core
```
Add or update focused tests for:
- generated type/runtime mismatches
- native dialect metadata (`insertId`, `numAffectedRows`, row-returning SQL)
- transactions/savepoints
- BLOB and JSON boundary conversions
- schema/codegen drift
- type inference contracts for sync helpers and public query result maps
- negative type contracts with `@ts-expect-error` for important column/preset
mistakes
- corruption-path tests that mutate SQLite directly and assert the public load
or read method rejects invalid persisted strings
- public store behavior, not just private SQL shape
## Helper Extraction
Good helpers:
- `readSqliteNumberPragma(db, pragma)` style helpers with a closed union for
PRAGMA names.
- Raw-expression helpers that accept Kysely expressions/refs instead of raw
column strings.
- Public query preset maps that preserve exact row types at the API boundary.
Avoid helpers that:
- Wrap obvious Kysely literals just to avoid strings.
- Take generic `string` table/column/order names.
- Return heavily generic query builders that are harder to type than the query
they hide.
## Performance
- Benchmark prepare/compile overhead before adding statement caches or compiled
query caches. Include the real public store method work: SQLite execution,
JSON/BLOB conversion, and result mapping.
- Keep caches local, close/dispose them with the owning store, and test invalid
or stale behavior. Clear builders are the default until numbers prove a hot
path.
## Avoid
- Do not introduce ORM/repository layers or hidden relation loading.
- Do not make root dependencies for plugin-only database needs.
- Do not migrate everything to raw SQL or everything to builders for purity.
- Do not hand-edit generated DB types.
- Do not hide finite query result shapes behind `Record<string, ...>` just to
make JSON output convenient; use exact row unions or map at the boundary.
- Do not replace every Kysely string literal with constants for aesthetics; fix
dynamic identifiers, raw SQL assertions, and public result boundaries instead.
- Do not add broad cache layers to hide repeated query/discovery work; carry the
known runtime fact earlier when possible.

View File

@@ -1,42 +0,0 @@
---
name: notcrawl
description: "Notion archive: search, sync freshness, pages/databases, Markdown exports, SQL counts, and Notcrawl repo work."
metadata:
openclaw:
homepage: https://github.com/openclaw/notcrawl
requires:
bins:
- notcrawl
install:
- kind: go
module: github.com/vincentkoc/notcrawl/cmd/notcrawl@latest
bins:
- notcrawl
---
# Notcrawl
Use local Notion archive data before browsing or live Notion API calls. Check freshness for recent/current questions:
```bash
notcrawl doctor
notcrawl status --json
```
Refresh only when stale or asked:
```bash
notcrawl sync --source desktop
notcrawl sync --source api
```
Query with bounded reads:
```bash
notcrawl search "query"
notcrawl databases
notcrawl report
notcrawl sql "select count(*) from pages;"
```
Report workspace/teamspace, page/database titles, absolute date spans, counts, and known gaps. Use read-only SQL only; never mutate the archive. API mode requires `NOTION_TOKEN`; do not assume token availability.

View File

@@ -1,4 +0,0 @@
interface:
display_name: "Notcrawl"
short_description: "Search local Notion archives and freshness"
default_prompt: "Use $notcrawl to search local Notion pages and databases, check freshness, inspect exports, and report exact date spans and source gaps."

View File

@@ -1,111 +0,0 @@
---
name: openclaw-changelog-update
description: Regenerate OpenClaw release changelog sections from git history before beta or stable releases.
---
# OpenClaw Changelog Update
Use this for release changelog rewrites and GitHub release-note source text.
This is mandatory before every beta, beta rerun, stable release, or stable
rerun. Use it with `release-openclaw-maintainer`; this skill owns changelog
content, ordering, grouping, and attribution discipline.
## Goal
Rewrite the target `CHANGELOG.md` version section from history, not from stale
draft notes. Produce grouped user-facing release notes sorted by user interest
while preserving every relevant issue/PR ref and every human `Thanks @...`
attribution.
## Inputs
- Target base version: `YYYY.M.D`, without beta suffix.
- Base tag: last reachable shipped release tag, usually the previous stable or
the previous beta train requested by the operator.
- Target ref: exact branch/SHA being released.
## Workflow
1. Start on `main` before branching when possible:
- `git fetch --tags origin`
- `git pull --ff-only`
- confirm clean `git status -sb`
2. Audit history, including direct commits:
- `git log --first-parent --date=iso-strict --pretty=format:'%h%x09%ad%x09%s' <base-tag>..<target-ref>`
- `git log --first-parent --grep='(#' --date=short --pretty=format:'%h%x09%ad%x09%s' <base-tag>..<target-ref>`
- also inspect `--since='24 hours ago'` when main moved during the release.
3. Read linked PRs/issues or diffs for ambiguous commits. Direct commits matter;
infer notes from subject, body, touched files, tests, and nearby commits.
4. Rewrite one stable-base section only:
- use `## YYYY.M.D`
- do not create beta-specific headings
- do not leave a stale `## Unreleased` section above the target release
- if `Unreleased` contains release-bound notes, fold them into the target
section instead of deleting them
5. Section shape:
- `### Highlights`: 5-8 bullets, broad user wins first
- `### Changes`: new capabilities and behavior changes
- `### Fixes`: user-facing fixes first, grouped by impact and surface
- group related changes/fixes by surface and user impact; avoid one bullet
per tiny commit when several commits tell one user-facing story
6. Preserve attribution:
- keep `#issue`, `(#PR)`, `Fixes #...`, and `Thanks @...`
- every human-authored merged PR represented by a user-facing entry needs
its PR ref and `Thanks @author`, even when the PR had no linked issue
- every human issue reporter for a `Fixes #...` or referenced bug issue
represented by a user-facing entry needs `Thanks @reporter` unless the
same handle is already thanked in that bullet
- every human `Co-authored-by` contributor on represented user-facing work
needs `Thanks @handle` when a GitHub handle is known
- when grouping multiple PRs/issues in one bullet, include every relevant
PR/issue ref and every human contributor handle in that same bullet
- multiple `Thanks @...` handles in one bullet are expected; do not drop or
collapse contributor credit just because the note is grouped
- if one grouped bullet covers both direct commits and PRs, keep all PR refs
and thanks, plus any issue refs from the direct commits
- before finalizing, audit the final release-note body:
- extract all `#NNN` refs from the notes
- resolve which refs are PRs and collect human PR authors
- resolve issue refs used as bug/report refs and collect human reporters
- scan represented commits for `Co-authored-by`
- compare those handles to the final `Thanks @...` set
- fix every missing human credit or explicitly record why it is omitted
- do not add GHSA references, advisory IDs, or security advisory slugs to
changelog entries or GitHub release-note text unless explicitly requested
- never thank bots, `@openclaw`, `@clawsweeper`, or `@steipete`
- do not use GitHub's release contributor count as the source of truth; the
changelog must carry the complete human credit set itself
7. Sorting preference:
- security/data-loss and content-boundary fixes
- transcript/replay/reply delivery correctness
- channels and mobile integrations
- providers/Codex/local model reliability
- install/update/release path reliability
- performance and observability
- docs and contributor-only/internal details last or omitted
8. Keep bullets single-line unless existing file style forces otherwise. Avoid
internal release-process noise unless it changes user install/update safety.
9. Check release-note side conditions:
- inspect `src/plugins/compat/registry.ts`
- inspect `src/commands/doctor/shared/deprecation-compat.ts`
- if any compatibility `removeAfter` is on/before release date, resolve it
or explicitly record the blocker before shipping
10. Validate and ship:
- `git diff --check`
- for docs/changelog-only changes, no broad tests are required
- commit with `scripts/committer "docs(changelog): refresh YYYY.M.D notes" CHANGELOG.md`
- push, pull/rebase if needed, then branch/rebase release from latest `main`
## Quota / API Outage Rule
If GitHub API quota is exhausted, do not idle. Continue work that does not need
GitHub API:
- local changelog rewrite and release-note extraction
- local pretag checks and package/build sanity
- git push/tag checks over git protocol
- npm registry `npm view` checks
- exact workflow-dispatch command preparation
Only GitHub Release creation, workflow dispatch, run polling, artifact download,
and issue/PR mutation need API quota.

View File

@@ -1,114 +0,0 @@
---
name: openclaw-debugging
description: Debug OpenClaw model, provider, tool-surface, code-mode, streaming, and live/Crabbox behavior by choosing the right logs, probes, and proof path before changing code.
---
# OpenClaw Debugging
Use this skill when OpenClaw behavior differs between local tests, live models,
providers, code mode, Tool Search, Crabbox, or CI, and the next move should be a
debug signal rather than a guess.
## Read First
- `docs/logging.md` for log files, `openclaw logs`, and targeted debug flags.
- `docs/reference/test.md` for local test commands.
- `docs/reference/code-mode.md` for code-mode exec/wait and tool catalog rules.
- Use `$openclaw-testing` for choosing test lanes.
- Use `$crabbox` for broad, Docker, package, Linux, live-key, or CI-parity proof.
## Default Loop
1. State the suspected boundary: config, tool construction, provider payload,
fetch, stream/SSE, transcript replay, worker/runtime, package/dist, or CI.
2. Add or enable the narrowest signal that proves that boundary.
3. Reproduce with the same provider/model/config. Do not randomly switch models
unless the model itself is the variable being tested.
4. Compare configured state with actual run activation.
5. Patch the root cause.
6. Rerun the exact failing probe, then broaden only if the contract requires it.
## Model Transport Logs
Use targeted env flags instead of global debug when the model request shape or
stream timing matters:
```bash
OPENCLAW_DEBUG_MODEL_TRANSPORT=1 openclaw gateway
OPENCLAW_DEBUG_MODEL_PAYLOAD=tools OPENCLAW_DEBUG_SSE=events openclaw gateway
OPENCLAW_DEBUG_MODEL_PAYLOAD=full-redacted OPENCLAW_DEBUG_SSE=peek openclaw gateway
```
Useful flags:
- `OPENCLAW_DEBUG_MODEL_TRANSPORT=1`: request start, fetch response, SDK
headers, first SSE event, stream done, and transport errors at `info`.
- `OPENCLAW_DEBUG_MODEL_PAYLOAD=summary`: bounded payload summary.
- `OPENCLAW_DEBUG_MODEL_PAYLOAD=tools`: all model-facing tool names.
- `OPENCLAW_DEBUG_MODEL_PAYLOAD=full-redacted`: capped, redacted JSON payload.
Use only while debugging; prompts/message text may still appear.
- `OPENCLAW_DEBUG_SSE=events`: first-event and stream-completion timing.
- `OPENCLAW_DEBUG_SSE=peek`: first five redacted SSE events.
- `OPENCLAW_DEBUG_CODE_MODE=1`: code-mode tool-surface diagnostics.
Watch logs with:
```bash
openclaw logs --follow
```
## Common Boundaries
- **Config vs activation:** config can be enabled while the run disables tools,
is raw, has an empty allowlist, or lacks model tool support. Check the actual
visible tools before enforcing provider payload invariants.
- **Tool surface:** inspect final model-visible tool names, not only the tool
registry or config. Code mode means exactly `exec` and `wait` only after it
actually activates.
- **Provider payload:** log fields, model id, service tier, reasoning, input
size, metadata keys, prompt-cache key presence, and tool names before SDK
call.
- **Fetch vs SSE:** fetch response proves HTTP headers arrived; first SSE event
proves provider body progress. A gap here is a stream/body/provider issue, not
tool execution.
- **Worker/dist:** run `pnpm build` when touching workers, dynamic imports,
package exports, lazy runtime boundaries, or published paths.
- **Live keys:** use the configured secret workflow for missing provider keys
before saying live proof is blocked. Env checks are presence-only; never print
secrets.
## Code Pointers
- Model payload + Responses stream:
`src/agents/openai-transport-stream.ts`
- Guarded fetch/timing:
`src/agents/provider-transport-fetch.ts`
- OpenAI/Codex provider wrappers:
`src/agents/pi-embedded-runner/openai-stream-wrappers.ts`
- Tool construction, Tool Search, code-mode activation:
`src/agents/pi-embedded-runner/run/attempt.ts`
- Code-mode runtime and worker:
`src/agents/code-mode.ts`
`src/agents/code-mode.worker.ts`
- Tool Search catalog:
`src/agents/tool-search.ts`
## Proof Choice
- Single helper/payload bug: local targeted Vitest.
- Docs/logging-only: `pnpm check:docs` and `git diff --check`.
- Worker/dist/lazy import/package surface: targeted tests plus `pnpm build`.
- Live provider/model behavior: same provider/model with debug flags and a real
key if available.
- Docker/package/Linux/CI-parity: `$crabbox`.
- CI failure: exact SHA, relevant job only, logs only after failure/completion.
## Output Habit
Report:
- boundary tested
- exact command/env shape, redacted
- observed signal, such as tool names or first SSE event timing
- fix location
- narrow proof and any remaining risk

View File

@@ -1,4 +0,0 @@
interface:
display_name: "OpenClaw Debugging"
short_description: "Debug model, tool, stream, and live behavior"
default_prompt: "Use $openclaw-debugging to identify the right OpenClaw debug boundary, turn on targeted logs, and choose the narrowest local or Crabbox proof."

View File

@@ -1,64 +0,0 @@
---
name: openclaw-docker-e2e-authoring
description: "Author OpenClaw Docker E2E and live provider Docker lanes."
---
# OpenClaw Docker E2E Authoring
Use this when adding or changing Docker E2E lanes, release-path Docker tests,
or live-provider Docker proof.
## Lane Choice
- Deterministic Docker: fake the dependency/server and assert the exact runtime
contract crossing the boundary.
- Live Docker: use real provider credentials/model only when user-visible
behavior needs the real service.
- Prefer both when they prove different risks: deterministic for byte/payload
routing, live for actual provider behavior.
## Authoring Rules
- Test-only helpers live in `test/helpers` or `scripts/e2e/lib/<lane>/`, not
`src/**`, unless production imports them.
- Package-installed app runs from `/app`; mount only explicit harness/helper
paths read-only.
- Fake servers should log boundary requests as JSONL and clients should assert
the real dependency payload, not just process success.
- Add the package script and `scripts/lib/docker-e2e-scenarios.mjs` lane in the
same change.
- If a lane installs a plugin from npm, default the spec via env so published
and local override paths are both testable.
## Media And Vision
- Expected answer must exist only in pixels or provider output being tested.
- Use neutral filenames, neutral prompts, and no metadata leaks.
- Random bitmap/OCR tokens reuse the repo OCR-safe alphabet `24567ACEF` unless
the test owns a stronger glyph set.
- Make the expected answer unique per run when proving real image
understanding.
## `chat.send` E2E
- Require `chat.send` to return `status: "started"` and a string `runId`.
- Wait for completion with `agent.wait`.
- Assert final user-visible text via `chat.history` when event ordering is not
the behavior under test.
- Keep originating channel/account metadata only when the bug path needs queued
inbound/channel context.
## Verification
Run the smallest proof that covers the touched lane:
```bash
pnpm exec oxfmt --write <changed files>
node --check <new .mjs files>
bash -n <new .sh files>
node scripts/run-vitest.mjs test/scripts/docker-e2e-plan.test.ts
OPENCLAW_SKIP_DOCKER_BUILD=1 pnpm test:docker:<lane>
```
For real-provider lanes, run the matching live Docker script after deterministic
Docker is green. Finish with `$autoreview` before commit/PR.

View File

@@ -1,88 +0,0 @@
---
name: openclaw-ghsa-maintainer
description: "Inspect, patch, validate, publish, or confirm OpenClaw GHSA security advisories and private-fork state."
---
# OpenClaw GHSA Maintainer
Use this skill for repo security advisory workflow only. Keep general release work in `release-openclaw-maintainer`.
## Respect advisory guardrails
- Before reviewing or publishing a repo advisory, read `SECURITY.md`.
- Ask permission before any publish action.
- Treat this skill as GHSA-only. Do not use it for stable or beta release work.
## Fetch and inspect advisory state
Fetch the current advisory and the latest published npm version:
```bash
gh api /repos/openclaw/openclaw/security-advisories/<GHSA>
npm view openclaw version --userconfig "$(mktemp)"
```
Use the fetch output to confirm the advisory state, linked private fork, and vulnerability payload shape before patching.
## Verify private fork PRs are closed
Before publishing, verify that the advisory's private fork has no open PRs:
```bash
fork=$(gh api /repos/openclaw/openclaw/security-advisories/<GHSA> | jq -r .private_fork.full_name)
gh pr list -R "$fork" --state open
```
The PR list must be empty before publish.
## Prepare advisory Markdown and JSON safely
- Write advisory Markdown via heredoc to a temp file. Do not use escaped `\n` strings.
- Build PATCH payload JSON with `jq`, not hand-escaped shell JSON.
Example pattern:
```bash
cat > /tmp/ghsa.desc.md <<'EOF'
<markdown description>
EOF
jq -n --rawfile desc /tmp/ghsa.desc.md \
'{summary,severity,description:$desc,vulnerabilities:[...]}' \
> /tmp/ghsa.patch.json
```
## Apply PATCH calls in the correct sequence
- Do not set `severity` and `cvss_vector_string` in the same PATCH call.
- Use separate calls when the advisory requires both fields.
- Publish by PATCHing the advisory and setting `"state":"published"`. There is no separate `/publish` endpoint.
Example shape:
```bash
gh api -X PATCH /repos/openclaw/openclaw/security-advisories/<GHSA> \
--input /tmp/ghsa.patch.json
```
## Publish and verify success
After publish, re-fetch the advisory and confirm:
- `state=published`
- `published_at` is set
- the description does not contain literal escaped `\\n`
Verification pattern:
```bash
gh api /repos/openclaw/openclaw/security-advisories/<GHSA>
jq -r .description < /tmp/ghsa.refetch.json | rg '\\\\n'
```
## Common GHSA footguns
- Publishing fails with HTTP 422 if required fields are missing or the private fork still has open PRs.
- A payload that looks correct in shell can still be wrong if Markdown was assembled with escaped newline strings.
- Advisory PATCH sequencing matters; separate field updates when GHSA API constraints require it.
- Public hardening/no-publish comments and draft text should avoid raw commit hashes, PR titles/numbers, and fix-mechanism summaries. Prefer patched-version fields or release-only wording; keep SHAs, PRs, and implementation notes in internal evidence.

View File

@@ -1,165 +0,0 @@
---
name: openclaw-landable-bug-sweep
description: "Find or repair small high-confidence non-SDK-boundary OpenClaw bugfix PRs until five are landable."
---
# OpenClaw Landable Bug Sweep
Autonomous maintainer workflow for producing five landable OpenClaw bugfix PR URLs.
Use for broad issue/PR sweeps where the bar is high and the output is PRs, not notes.
Do not use for plugin SDK/API boundary work; those need separate architecture review.
## Target
Return exactly five PR URLs, each with:
- bug summary
- why the fix is low-risk
- proof: rebased-head local/Testbox/live commands or run IDs
- autoreview: clean result on the exact head being shown
- CI green on the exact pushed PR head
- issue/duplicate cleanup done or still pending
The five URLs may be existing PRs that were reviewed/fixed, or new PRs created from issues/clusters.
Do not present a PR URL to the maintainer until it has been refreshed on current `main`, left-tested, autoreviewed clean, pushed, and verified green in live GitHub CI.
If code, tests, changelog, PR body, or branch base changes after autoreview, rerun autoreview before showing the URL.
## Companion Skills
Use `$gitcrawl` for discovery/clustering, `$openclaw-pr-maintainer` for live GitHub mutation rules, `$github-author-context` when contributor trust matters, `$openclaw-testing` for proof choice, `$autoreview` before publishing/landing, and `$crabbox` for broad/E2E/live proof.
## Candidate Bar
Accept only when all are true:
- bug or paper cut, not feature/product/support/docs-only
- root cause is proven in current code
- dependency behavior checked via upstream docs/source/types when relevant
- production/runtime diff is small, ideally much smaller than 500 LOC and always below 500 LOC
- tests may be larger, but focused
- no new dependency
- no new config option
- no backward-incompatible behavior
- no security/product/owner-boundary decision needed
- no plugin SDK, public plugin API, or `src/plugin-sdk/**` boundary change
- no broad refactor smell
- focused proof is feasible
- branch can be rebased/refreshed and pushed, or a replacement PR can be created
Good examples:
- provider parameter mismatch proven against dependency/API contract
- CLI command diverges from adjacent command behavior
- narrow runtime state/serialization bug with failing test
- issue already fixed on current `main`, with proof and closeable duplicates
Reject:
- feature requests, new knobs, migrations, release work, workflow policy, support
- plugin SDK/API boundary changes, including compatibility shims, new SDK methods, SDK exports, or plugin-facing channel/provider seams
- auth/security boundary changes unless explicitly assigned
- bugs needing live credentials that are unavailable
- PRs with red CI unless you fix, rebase, push, and recheck them green
- PRs you only reviewed locally but did not refresh/push/check live
- PRs whose final head has not passed `$autoreview`
- fixes whose clean shape is a larger architecture move
- speculative reports without reproducible/provable cause
- UI/UX changes requiring product judgment
## Sweep Loop
1. Start clean:
- `git status -sb`
- `git pull --ff-only`
- verify branch is expected, usually `main`
2. Build candidate clusters:
- `gitcrawl` open issues/PRs, neighbors, and search
- live `gh issue/pr view`
- include PRs linked from issues and duplicates
3. For each cluster:
- read issue/PR body, comments, labels, linked refs, current source, adjacent tests
- suppress maintainer-owned queue noise unless it is the best fix path
- identify opener/author and preserve credit
- decide: `repair-existing-pr`, `create-new-pr`, `close-fixed-on-main`, `close-duplicate`, or `reject`
4. Prove before patching:
- failing test, focused repro, log/source proof, or dependency contract proof
- if already fixed on `main`, prove with current source/test/commit and close kindly
5. Patch:
- prefer existing PR when good and writable
- if unwritable or wrong shape, create own PR and preserve useful contributor credit
- if no PR exists, create one
- add regression test when it fits
- release-note context for user-facing fixes in PR body or commit message; credit human reporter/contributor when known
6. Review, refresh, and publish:
- rebase or otherwise refresh the PR branch on current `origin/main`
- resolve drift, including newly exposed CI failures, rather than counting the PR as ready
- do not add `CHANGELOG.md` during normal sweep PRs; release automation generates it from PRs and commits
- left-test the rebased head with the smallest meaningful local/Testbox/live command that proves the bug
- run `$autoreview` until no accepted/actionable findings remain before creating, updating, or presenting the PR URL
- create/update PR with real body and proof fields
- push the exact reviewed head
- verify live GitHub CI is green for that pushed head; do not count pending, red, dirty, conflicting, or externally blocked PRs in the five
7. Hygiene:
- close duplicates and fixed-on-main issues/PRs with proof as soon as you notice them during the sweep
- never mutate more than five associated items in one cluster without explicit confirmation
- comments must be kind, concrete, and include proof/PR/commit links
8. Repeat until five landable PR URLs are ready.
## PR Body Proof
Use the repo PR template. Include these exact labels:
```text
Behavior addressed:
Real environment tested:
Exact steps or command run after this patch:
Evidence after fix:
Observed result after fix:
What was not tested:
```
## Existing PR Rules
- Review code path beyond the diff before trusting it.
- If PR is good: rebase/refresh on current `main`, fix small issues, left-test, autoreview clean, push, and get CI green before showing or counting it.
- If PR is not good but has a useful idea: recreate locally, co-author when warranted, close original with thanks and explanation.
- If PR is duplicate or fixed on `main`: comment proof, close.
- If maintainer cannot push to contributor branch: create own branch/PR, preserve useful commits or credit.
- If CI turns red after local proof, treat that as normal work: inspect the failing job, fix or reject, rerun, and only count the PR once green.
## Output Ledger
Maintain a running ledger:
```text
accepted:
- PR URL:
source refs:
bug:
root cause:
fix:
risk:
rebase/head:
left-test:
autoreview:
CI:
credit/thanks:
cleanup:
rejected:
- ref:
reason:
closed:
- ref:
reason:
proof/comment:
```
Final answer:
- exactly five accepted PR URLs
- 2-4 sentence explainer per PR
- proof/CI state per PR
- closed duplicates/fixed-on-main refs
- current branch/status

View File

@@ -1,4 +0,0 @@
interface:
display_name: "OpenClaw Landable Bug Sweep"
short_description: "Find five small non-SDK landable bugfix PRs"
default_prompt: "Use $openclaw-landable-bug-sweep to find or repair five small high-confidence non-SDK-boundary OpenClaw bugfix PRs and get them landable."

View File

@@ -1,164 +0,0 @@
---
name: openclaw-parallels-smoke
description: Run, rerun, debug, or interpret OpenClaw Parallels install, onboarding, gateway smoke, and upgrade checks.
---
# OpenClaw Parallels Smoke
Use this skill for Parallels guest workflows and smoke interpretation. Do not load it for normal repo work.
## Global rules
- Use the snapshot most closely matching the requested fresh baseline.
- Gateway verification in smoke runs should use `openclaw gateway status --deep --require-rpc` unless the stable version being checked does not support it yet.
- Stable `2026.3.12` pre-upgrade diagnostics may require a plain `gateway status --deep` fallback.
- Treat `precheck=latest-ref-fail` on that stable pre-upgrade lane as baseline, not automatically a regression.
- Pass `--json` for machine-readable summaries.
- Per-phase logs land under `.artifacts/parallels/openclaw-parallels-*` by default. Override with `OPENCLAW_PARALLELS_ARTIFACT_ROOT` when a run needs another artifact volume.
- Do not run local and gateway agent turns in parallel on the same fresh workspace or session.
- Hard-cap every top-level Parallels lane with host `timeout --foreground` (or `gtimeout --foreground` if that is the available binary) so a stalled install, snapshot switch, or `prlctl exec` transport cannot consume the rest of the testing window. Defaults:
- macOS: `75m`
- Linux: `75m`
- Windows: `90m`
- aggregate npm-update wrapper: `150m`
If a lane hits the cap, stop there, inspect the newest `/tmp/openclaw-parallels-*` run directory and phase log, then fix or rerun the smallest affected lane. Do not keep waiting on a capped lane.
- Actual OpenClaw npm install/update phases are a stricter signal than whole-lane caps: install phases should normally finish within 7 minutes, and update phases should normally show meaningful progress within 5 minutes. If a phase named `install-main`, `install-latest`, `install-baseline`, or `install-baseline-package` exceeds 420s, or a phase named `update-dev` / same-guest `openclaw update` exceeds 300s without new markers, start diagnosis from that phase log and guest process state. Current Windows update phases can still pass after roughly 10-15 minutes because `doctor --fix` may install bundled plugin runtime deps; keep the script hard cap near 20 minutes unless the log is truly stale.
- For a full OS matrix, prefer running independent guest-family lanes in parallel when host capacity allows:
- `timeout --foreground 75m pnpm test:parallels:macos -- --json`
- `timeout --foreground 90m pnpm test:parallels:windows -- --json`
- `timeout --foreground 75m pnpm test:parallels:linux -- --json`
Keep each lane in its own shell/session and track the run directory for each one. Before starting the matrix, run any required host build/package gate to completion. When current-main tgz packaging is needed, the smoke scripts hold a shared package lock through `pnpm build`, inventory/staging, and `npm pack`; if that lock is missing or broken, serialize the matrix instead of accepting concurrent `dist` mutation.
- Do not run multiple smoke lanes against the same guest family at once. Tahoe lanes share the host HTTP port, and Windows/Linux lanes can collide on snapshot restore/start state if two jobs touch the same VM concurrently.
- Do not run the aggregate `pnpm test:parallels:npm-update` wrapper in parallel with individual macOS/Windows/Linux smoke lanes; it touches the same guest families and snapshots.
- Do not start Parallels lanes while any unrelated host command may rebuild, clean, or restage `dist` (`pnpm build`, `pnpm ui:build`, `pnpm release:check`, `pnpm test:install:smoke`, npm pack/install smoke, or Docker lanes that run package/build prep). Run unrelated build/package gates first, let them finish, then start the VM matrix. Concurrent `dist` mutation can make host `npm pack` fail with missing files and wastes a full VM cycle.
- While running or optimizing the matrix, record wall-clock duration per lane and the slowest phase from `/tmp/openclaw-parallels-*` logs. Use that timing before changing smoke order, timeouts, or helper behavior.
- If a host build changes tracked generated files such as `src/canvas-host/a2ui/.bundle.hash`, stop before spending VM time. Commit the generated artifact separately or fix the generator drift, then rerun the smallest affected lane.
- If `main` is moving under active multi-agent work, prefer a detached worktree pinned to one commit for long Parallels suites. The smoke scripts now verify the packed tgz commit instead of live `git rev-parse HEAD`, but a pinned worktree still avoids noisy rebuild/version drift during reruns.
- For `openclaw update --channel dev` lanes, remember the guest clones GitHub `main`, not your local worktree. If a local fix exists but the rerun still fails inside the cloned dev checkout, do not treat that as disproof of the fix until the branch has been pushed.
- For `prlctl exec`, pass the VM name before `--current-user` (`prlctl exec "$VM" --current-user ...`), not the other way around.
- If the workflow installs OpenClaw from a repo checkout instead of the site installer/npm release, finish by installing a real guest CLI shim and verifying it in a fresh guest shell. `pnpm openclaw ...` inside the repo is not enough for handoff parity.
- On macOS guests, prefer a user-global install plus a stable PATH-visible shim:
- install with `NPM_CONFIG_PREFIX="$HOME/.npm-global" npm install -g .`
- make sure `~/.local/bin/openclaw` exists or `~/.npm-global/bin` is on PATH
- verify from a brand-new guest shell with `which openclaw` and `openclaw --version`
## npm install then update
- Preferred entrypoint: `pnpm test:parallels:npm-update`
- For a macOS-only published release update check, use:
- `timeout --foreground 75m pnpm test:parallels:npm-update -- --platform macos --package-spec openclaw@<old-version> --update-target <target-version-or-tag> --json`
This keeps the same-guest `openclaw update --tag ...` coverage and uses the shared macOS current-user/sudo fallback without starting Windows/Linux lanes.
- Required coverage: every release/update regression run must include both lanes:
- fresh snapshot -> install requested package/baseline -> smoke
- same guest baseline -> run the guest's installed `openclaw update ...` command -> smoke again
- The update lane must exercise OpenClaw's internal updater. Do not count a direct `npm install -g <tgz-or-spec>` or harness-side package swap as update-flow coverage; those are install smokes only.
- For published targets, install the old baseline package first (for example `openclaw@2026.4.9`), then run the installed guest CLI with the intended channel/tag (for example `openclaw update --channel beta --yes --json`) and verify `openclaw --version`, `openclaw update status --json`, gateway RPC, and an agent turn after the command.
- For unpublished targets, pack the candidate on the host, serve the `.tgz` over the harness HTTP server, and point the guest updater at that served package. Prefer `openclaw update --tag http://<host-ip>:<port>/openclaw-<version>.tgz --yes --json`; when channel persistence also matters, pass `--channel <stable|beta>` and set `OPENCLAW_UPDATE_PACKAGE_SPEC` to the same served URL in the guest update environment. The command under test must still be `openclaw update`, not direct npm.
- For unpublished local-fix validation, remember the old baseline updater code still controls the first hop. A fix that lives only in the new updater code cannot change that already-running old process; the served candidate must either keep package/plugin metadata compatible with the baseline host or the baseline itself must include the updater fix.
- For beta/stable verification, resolve the tag immediately before the run (`npm view openclaw@beta version dist.tarball` or `npm view openclaw@latest ...`). Tags can move while a long VM matrix is already running; restart the matrix when the intended prerelease appears after an earlier registry 404/tag-lag check.
- Use the configured secret workflow to inject only the provider keys needed by OpenAI/Anthropic lanes. Do not print secrets or env dumps; pass provider secrets through the guest exec environment.
- Same-guest update verification should set the default model explicitly to `openai/gpt-5.4` before the agent turn and use a fresh explicit `--session-id` so old session model state does not leak into the check.
- The aggregate npm-update wrapper must resolve the Linux VM with the same Ubuntu fallback policy as `parallels-linux-smoke.sh` before both fresh and update lanes. Treat any Ubuntu guest with major version `>= 24` as acceptable when the exact default VM is missing, preferring the newest versioned Ubuntu guest with a fresh poweroff snapshot. On Peter's current host today, use `Ubuntu 26.04`.
- On macOS same-guest update checks, restart the gateway after the npm upgrade before `gateway status` / `agent`; launchd can otherwise report a loaded service while the old process has exited and the fresh process is not RPC-ready yet.
- The npm-update aggregate's macOS update leg writes the guest update script as root, then runs it as the desktop user. If `prlctl exec "$MACOS_VM" --current-user ...` cannot authenticate, retry through plain root `prlctl exec` plus `sudo -u <desktop-user> /usr/bin/env HOME=/Users/<desktop-user> USER=<desktop-user> LOGNAME=<desktop-user> PATH=/opt/homebrew/bin:/opt/homebrew/opt/node/bin:/usr/bin:/bin:/usr/sbin:/sbin ...`. That is a Parallels transport fallback; still verify `openclaw --version`, gateway RPC, and an agent turn after the update.
- On Windows same-guest update checks, restart the gateway after the npm upgrade before `gateway status` / `agent`; in-place global npm updates can otherwise leave stale hashed `dist/*` module imports alive in the running service.
- In those Windows same-guest update checks, do not treat one nonzero `openclaw gateway restart` as definitive failure. Current login-item restarts can report failure before the background service becomes observable again; follow with a longer RPC-ready wait and use `gateway start` only as a recovery step if readiness still never returns.
- After that Windows restart, do not trust one `gateway status --deep --require-rpc` call after a fixed sleep. Retry the RPC-ready probe for roughly 30 seconds and log each attempt; current guests can keep port `18789` bound while the fresh RPC endpoint is still coming up.
- For Windows same-guest update checks, prefer the done-file/log-drain PowerShell runner pattern over one long-lived `prlctl exec ... powershell -EncodedCommand ...` transport. The guest can finish successfully while the outer `prlctl exec` still hangs.
- The Windows same-guest update helper should write stage markers to its log before long steps like tgz download and `npm install -g` so the outer progress monitor does not sit on `waiting for first log line` during healthy but quiet installs.
- Linux same-guest update verification should also export `HOME=/root`, pass `OPENAI_API_KEY` via `prlctl exec ... /usr/bin/env`, and use `openclaw agent --local`; the fresh Linux baseline does not rely on persisted gateway credentials.
- The npm-update wrapper now prints per-lane progress from the nested log files. If a lane still looks stuck, inspect the nested logs in `runDir` first (`macos-fresh.log`, `windows-fresh.log`, `linux-fresh.log`, `macos-update.log`, `windows-update.log`, `linux-update.log`) instead of assuming the outer wrapper hung.
- Each run writes both `summary.json` and `summary.md`; read the markdown first for quick human triage, then the JSON/timings for automation.
- For full beta validation after a tag is published, prefer one command:
- `timeout --foreground 150m pnpm test:parallels:npm-update -- --beta-validation beta3 --json`
This resolves `beta3` to the latest `*-beta.3` version, runs latest->that-version same-guest update coverage, and then runs fresh install smoke for that exact published target on the same selected OS matrix. Use `--platform macos|windows|linux` to narrow reruns.
- For beta 4 npm validation with agent turns, the known-good shape is:
- `gtimeout --foreground 150m pnpm test:parallels:npm-update -- --beta-validation beta4 --model openai/gpt-5.4 --json`
Prefer the explicit `beta4` alias over `openclaw@beta` when validating a specific prerelease number; npm tags can move.
- If the wrapper fails a lane, read the auto-dumped tail first, then the full nested lane log under `.artifacts/parallels/openclaw-parallels-npm-update.*`.
- Current known macOS update-lane transport signature when the fallback is missing or bypassed: `Unable to authenticate the user. Make sure that the specified credentials are correct and try again.` Treat that as Parallels current-user authentication before blaming npm or OpenClaw.
- A macOS packaged fresh install with global package directories or bundled files mode `0777` usually means the harness used the root `prlctl exec` fallback under a permissive umask. The POSIX guest transports should prepend `umask 022`; verify the phase preflight line before blaming npm.
## CLI invocation footgun
- The Parallels smoke shell scripts should tolerate a literal bare `--` arg so `pnpm test:parallels:* -- --json` and similar forwarded invocations work without needing to call `bash scripts/e2e/...` directly.
## macOS flow
- Preferred entrypoint: `pnpm test:parallels:macos`
- `parallels-macos-smoke.sh --mode fresh --target-package-spec openclaw@<version>` is an install smoke only. For published old-version -> new-version update coverage on macOS, prefer the npm-update wrapper with `--platform macos`; `parallels-macos-smoke.sh --mode upgrade --target-package-spec ...` installs the target package and does not exercise the baseline CLI's updater.
- Default upgrade coverage on macOS should now include: fresh snapshot -> site installer pinned to the latest stable tag -> `openclaw update --channel dev` on the guest. Treat this as part of the default Tahoe regression plan, not an optional side quest.
- `parallels-macos-smoke.sh --mode upgrade` should run that release-to-dev lane by default. Keep the older host-tgz upgrade path only when the caller explicitly passes `--target-package-spec`.
- Because the default upgrade lane no longer needs a host tgz, skip `npm pack` + host HTTP server startup for `--mode upgrade` unless `--target-package-spec` is set. Keep the pack/server path for `fresh` and `both`.
- If that release-to-dev lane fails with `reason=preflight-no-good-commit` and repeated `sh: pnpm: command not found` tails from `preflight build`, treat it as an updater regression first. The fix belongs in the git/dev updater bootstrap path, not in Parallels retry logic.
- Until the public stable train includes that updater bootstrap fix, the macOS release-to-dev lane may seed a temporary guest-local `pnpm` shim immediately before `openclaw update --channel dev`. Keep that workaround scoped to the smoke harness and remove it once the latest stable no longer needs it.
- In Tahoe `prlctl exec --current-user` runs, prefer explicit `node .../openclaw.mjs ...` invocations for the release->dev handoff itself and for post-update verification. The shebanged global `openclaw` wrapper can fail with `env: node: No such file or directory`, and self-updating through the wrapper is a weaker lane than invoking the entrypoint under a fixed `node`.
- Default to the snapshot closest to `macOS 26.5 latest`.
- On Peter's Tahoe VM, `fresh-latest-march-2026` can hang in `prlctl snapshot-switch`; if restore times out there, rerun with `--snapshot-hint 'macOS 26.5 latest'` before blaming auth or the harness.
- `parallels-macos-smoke.sh` now retries `snapshot-switch` once after force-stopping a stuck running/suspended guest. If Tahoe still times out after that recovery path, then treat it as a real Parallels/host issue and rerun manually.
- The macOS smoke should include a dashboard load phase after gateway health: resolve the tokenized URL with `openclaw dashboard --no-open`, verify the served HTML contains the Control UI title/root shell, then open Safari and require an established localhost TCP connection from Safari to the gateway port.
- For Tahoe `fresh.gateway-status`, prefer non-TTY `prlctl exec --current-user ... openclaw gateway status ...` plus a few short retries. `prlctl enter` can spam TTY control bytes and hang the phase log even when the CLI itself is healthy.
- If a Tahoe lane times out in `fresh.first-agent-turn` and the phase log stops right after `__OPENCLAW_RC__:0` from `models set`, suspect the `prlctl enter` / `expect` wrapper before blaming auth or the model lane. That pattern means the first guest command finished but the transport never released for the next `guest_current_user_cli` call.
- If a packaged install regresses with `500` on `/`, `/healthz`, or `__openclaw/control-ui-config.json` after `fresh.install-main` or `upgrade.install-main`, suspect bundled plugin runtime deps resolving from the package root `node_modules` rather than `dist/extensions/*/node_modules`. Repro quickly with a real `npm pack`/global install lane before blaming dashboard auth or Safari.
- `prlctl exec` is fine for deterministic repo commands, but use the guest Terminal or `prlctl enter` when installer parity or shell-sensitive behavior matters.
- Multi-word `openclaw agent --message ...` checks should go through a guest shell wrapper (`guest_current_user_sh` / `guest_current_user_cli` or `/bin/sh -lc ...`), not raw `prlctl exec ... node openclaw.mjs ...`, or the message can be split into extra argv tokens and Commander reports `too many arguments for 'agent'`.
- The same wrapper rule applies when bypassing `--current-user`: write a tiny `/tmp/*.sh` on the guest and execute `/bin/bash /tmp/*.sh` through the sudo desktop-user environment. Do not pass `openclaw agent --message '...'` directly as one raw `prlctl exec` command.
- When ref-mode onboarding stores `OPENAI_API_KEY` as an env secret ref, the post-onboard agent verification should also export `OPENAI_API_KEY` for the guest command. The gateway can still reject with pairing-required and fall back to embedded execution, and that fallback needs the env-backed credential available in the shell.
- On the fresh Tahoe snapshot, `brew` exists but `node` may be missing from PATH in noninteractive exec. Use `/opt/homebrew/bin/node` when needed.
- Fresh host-served tgz installs should install as guest root with `HOME=/var/root`, then run onboarding as the desktop user via `prlctl exec --current-user`.
- Root-installed tgz smoke can log plugin blocks for world-writable `extensions/*`; do not treat that as an onboarding or gateway failure unless plugin loading is the task.
## Windows flow
- Preferred entrypoint: `pnpm test:parallels:windows`
- Use the snapshot closest to `pre-openclaw-native-e2e-2026-03-12`.
- Default upgrade coverage on Windows should now include: fresh snapshot -> site installer pinned to the requested stable tag -> `openclaw update --channel dev` on the guest. Keep the older host-tgz upgrade path only when the caller explicitly passes `--target-package-spec`.
- Optional exact npm-tag baseline on Windows: `bash scripts/e2e/parallels-windows-smoke.sh --mode upgrade --target-package-spec openclaw@<tag> --json`. That lane installs the published npm tarball as baseline, then runs `openclaw update --channel dev`.
- Optional forward-fix Windows validation: `bash scripts/e2e/parallels-windows-smoke.sh --mode upgrade --upgrade-from-packed-main --json`. That lane installs the packed current-main npm tgz as baseline, then runs `openclaw update --channel dev`.
- Always use `prlctl exec --current-user`; plain `prlctl exec` lands in `NT AUTHORITY\\SYSTEM`.
- Prefer explicit `npm.cmd` and `openclaw.cmd`.
- Use PowerShell only as the transport with `-ExecutionPolicy Bypass`, then call the `.cmd` shims from inside it.
- Current Windows Node installs expose `corepack` as a `.cmd` shim. If a release-to-dev lane sees `corepack` on PATH but `openclaw update --channel dev` still behaves as if corepack is missing, treat that as an exec-shim regression first.
- If an exact published-tag Windows lane fails during preflight with `npm run build` and `'pnpm' is not recognized`, remember that the guest is still executing the old published updater. Validate the fix with `--upgrade-from-packed-main`, then wait for the next tagged npm release before expecting the historical tag lane to pass.
- Multi-word `openclaw agent --message ...` checks should call `& $openclaw ...` inside PowerShell, not `Start-Process ... -ArgumentList` against `openclaw.cmd`, or Commander can see split argv and throw `too many arguments for 'agent'`.
- Windows installer/tgz phases now retry once after guest-ready recheck; keep new Windows smoke steps idempotent so a transport-flake retry is safe.
- If a Windows retry sees the VM become `suspended` or `stopped`, resume/start it before the next `prlctl exec`; otherwise the second attempt just repeats the same `rc=255`.
- Windows global `npm install -g` phases can stay quiet for a minute or more even when healthy; inspect the phase log before calling it hung, and only treat it as a regression once the retry wrapper or timeout trips.
- When those Windows global installs stay quiet, the useful progress often lives in the guest npm debug log, not the helper phase log. The smoke script now streams incremental `npm-cache/_logs/*-debug-0.log` deltas into the phase log during long baseline/package installs; read those lines before assuming the lane is stalled.
- The Windows baseline-package helpers now auto-dump the latest guest `npm-cache/_logs/*-debug-0.log` tail on timeout or nonzero completion. Read that tail in the phase log before opening a second guest shell.
- The same incremental npm-debug streaming also applies to `--upgrade-from-packed-main` / packaged-install baseline phases. A phase log that still says only `install.start`, `install.download-tgz`, `install.install-tgz` can still be healthy if the streamed npm-debug section shows registry fetches or bundled-plugin postinstall work.
- Fresh Windows tgz install phases should also use the background PowerShell runner plus done-file/log-drain pattern; do not rely on one long-lived `prlctl exec ... powershell ... npm install -g` transport for package installs.
- Windows release-to-dev helpers should log `where pnpm` before and after the update and require `where pnpm` to succeed post-update. That proves the updater installed or enabled `pnpm` itself instead of depending on a smoke-only bootstrap.
- Fresh Windows ref-mode onboard should use the same background PowerShell runner plus done-file/log-drain pattern as the npm-update helper, including startup materialization checks, host-side timeouts on short poll `prlctl exec` calls, and retry-on-poll-failure behavior for transient transport flakes.
- Fresh Windows daemon-health reachability should use `openclaw gateway probe --json` with a longer timeout and treat `ok: true` as success; full `gateway status --require-rpc` checks are too eager during initial startup on current main.
- Fresh Windows ref-mode agent verification should set `OPENAI_API_KEY` in the PowerShell environment before invoking `openclaw.cmd agent`, for the same pairing-required fallback reason as macOS.
- The standalone Windows upgrade smoke lane should stop the managed gateway after `upgrade.install-main` and before `upgrade.onboard-ref`. Restarting before onboard can leave the old process alive on the pre-onboard token while onboard rewrites `~/.openclaw/openclaw.json`, which then fails `gateway-health` with `unauthorized: gateway token mismatch`.
- If standalone Windows upgrade fails with a gateway token mismatch but `pnpm test:parallels:npm-update` passes, trust the mismatch as a standalone ref-onboard ordering bug first; the npm-update helper does not re-run ref-mode onboard on the same guest.
- Keep onboarding and status output ASCII-clean in logs; fancy punctuation becomes mojibake in current capture paths.
- If you hit an older run with `rc=255` plus an empty `fresh.install-main.log` or `upgrade.install-main.log`, treat it as a likely `prlctl exec` transport drop after guest start-up, not immediate proof of an npm/package failure.
## Linux flow
- Preferred entrypoint: `pnpm test:parallels:linux`
- Use the newest versioned Ubuntu guest with a fresh poweroff snapshot. On Peter's host today, that is `Ubuntu 26.04`.
- If an exact requested Ubuntu VM is missing on the host, any Ubuntu guest with major version `>= 24` is acceptable; prefer the newest versioned Ubuntu guest over older fallback snapshots.
- Use plain `prlctl exec`; `--current-user` is not the right transport on this snapshot.
- Fresh snapshots may be missing `curl`, and `apt-get update` can fail on clock skew. Bootstrap with `apt-get -o Acquire::Check-Date=false update` and install `curl ca-certificates`.
- Fresh `main` tgz smoke still needs the latest-release installer first because the snapshot has no Node or npm before bootstrap.
- This snapshot does not have a usable `systemd --user` session; managed daemon install is unsupported.
- The Linux smoke now falls back to a manual `setsid openclaw gateway run --bind loopback --port 18789 --force` launch with `HOME=/root` and the provider secret exported, then verifies `gateway status --deep --require-rpc` when available.
- The Linux manual gateway launch should wait for `gateway status --deep --require-rpc` inside the `gateway-start` phase; otherwise the first status probe can race the background bind and fail a healthy lane.
- If Linux gateway bring-up fails, inspect `/tmp/openclaw-parallels-linux-gateway.log` in the guest phase logs first; the common failure mode is a missing provider secret in the launched gateway environment.
## Discord roundtrip
- Discord roundtrip is optional and should be enabled with:
- `--discord-token-env`
- `--discord-guild-id`
- `--discord-channel-id`
- After a successful Discord smoke/roundtrip, shut down the guest VM before handoff (`prlctl stop "$VM_NAME"` or the concrete VM name). The macOS smoke harness should do this automatically after successful Discord proof; still stop the VM manually after ad-hoc Discord checks. Do not leave the Discord-configured guest running; it can keep reading/posting in `#maintainer` and spam Discord after the proof is complete.
- Keep the Discord token only in a host env var.
- Use installed `openclaw message send/read`, not `node openclaw.mjs message ...`.
- Set `channels.discord.guilds` as one JSON object, not dotted config paths with snowflakes.
- Avoid long `prlctl enter` or expect-driven Discord config scripts; prefer `prlctl exec --current-user /bin/sh -lc ...` with short commands.
- For a narrower macOS-only Discord proof run, the existing `parallels-discord-roundtrip` skill is the deep-dive companion.

View File

@@ -1,271 +0,0 @@
---
name: openclaw-pr-maintainer
description: Use immediately for any pasted OpenClaw GitHub issue or PR URL/number, and for OpenClaw issue/PR review, triage, duplicate search, opener identity/who wrote it, author account age/activity, comments, labels, close, land, or maintainer evidence checks.
---
# OpenClaw PR Maintainer
Use this skill for maintainer-facing GitHub workflow, not for ordinary code changes.
## Start issue and PR triage with gitcrawl
- Use `$gitcrawl` first anytime you inspect OpenClaw issues or PRs.
- Check local `gitcrawl` data first for related threads, duplicate attempts, and already-landed fixes.
- Use `gitcrawl` for candidate discovery and clustering; use `gh`, `gh api`, and the current checkout to verify live state before commenting, labeling, closing, or landing.
- If `gitcrawl` is missing, stale, lacks the target thread, or has no embeddings for neighbor/search commands, fall back to the GitHub search workflow below.
- Do not run expensive/update commands such as `gitcrawl sync --include-comments`, future enrichment commands, or broad reclustering unless the user asked to update the local store or stale data is blocking the decision.
Common read-only path:
```bash
gitcrawl threads openclaw/openclaw --numbers <issue-or-pr-number> --include-closed --json
gitcrawl neighbors openclaw/openclaw --number <issue-or-pr-number> --limit 12 --json
gitcrawl search openclaw/openclaw --query "<scope or title keywords>" --mode hybrid --json
gitcrawl cluster-detail openclaw/openclaw --id <cluster-id> --member-limit 20 --body-chars 280 --json
```
## Claim specific review targets
When a maintainer asks Codex to review, triage, fix, or land a specific OpenClaw issue/PR, check assignment before deep work.
- Identify the requesting maintainer's GitHub login. In this environment, default Peter to `steipete`; if another maintainer is clearly the requester, use that maintainer's bare login.
- Read current assignees with live `gh issue view` / `gh pr view`; `gitcrawl` is not enough for assignment state.
- If unassigned, assign the requester before deep review. This is allowed for specific requested targets; do not auto-assign broad discovery candidates or shortlists.
- If assigned to someone else, say so clearly before analysis and include assignment age:
- fresh: assigned within 6h; treat as actively owned unless user explicitly asks to continue or reassign
- stale: assigned 6h+ ago; treat as ownership hint, not a hard block; continue only with that caveat
- If assigned to requester plus others, mention co-assignees and continue.
- If assignment event time is unavailable, say `assigned, time unknown`; treat as assigned, not stale.
- Never remove or replace assignees unless explicitly asked.
Assignment time proof:
```bash
gh api "repos/openclaw/openclaw/issues/<number>/timeline" --paginate \
-H "Accept: application/vnd.github+json" \
--jq '[.[] | select(.event=="assigned") | {assignee:.assignee.login, assigner:.assigner.login, actor:.actor.login, created_at}]'
```
Use the newest `assigned` event for each current assignee. Issue timeline events expose `created_at`; GitHub GraphQL `AssignedEvent.createdAt` is also valid when REST pagination is awkward.
Claim command for issues or PRs:
```bash
gh api -X POST "repos/openclaw/openclaw/issues/<number>/assignees" -f 'assignees[]=<login>' >/dev/null
```
## Surface opener identity
- For every reviewed, triaged, closed, or landed issue/PR, show the opener's human name when available, GitHub login, and account age.
- Get the login from `gh issue view` / `gh pr view` (`author.login`), then fetch profile metadata once with `gh api users/<login> --jq '{login,name,created_at,type}'`.
- Report opener identity as one compact line:
`By: Jane Doe (@jane, acct 2021-04-03) | OpenClaw: 4 PRs, 2 issues, 11 commits/12mo | GitHub: 9 repos, 86 commits, 9 PRs, 3 issues, 12 reviews`
- Always show recent activity in two lanes: OpenClaw-local PRs, issues, and commits in the last 12 months; and general public GitHub activity over the same window. For linked issue-fixing PRs, include both the PR author and issue opener when they differ.
- Prefer the bundled helper for activity lookups:
```bash
.agents/skills/openclaw-pr-maintainer/scripts/github-activity.sh <login> [other-login...]
.agents/skills/openclaw-pr-maintainer/scripts/github-activity.sh --global <login>
```
- The helper reports repo-local activity first and can fetch public GitHub contribution totals for the same window with `--global`; run the global form by default for review/triage identity summaries.
- If the global contribution graph reports zero or looks inconsistent with visible public activity, sanity-check with `gh api users/<login>`, `gh api 'users/<login>/events/public?per_page=100'`, and recent public repo commits before calling the account inactive.
- The helper is intentionally cache-friendly for gitcrawl-backed `gh`: it rounds repo-local windows to the UTC day, rounds global contribution windows to the UTC hour, and counts PRs/issues from one paginated issues response before fetching commits separately. Prefer reusing the helper instead of hand-rolling several `gh api` loops.
- If the contribution graph is misleading or zero but public events/repos show activity, keep it one line, for example:
`By: pickaxe (@ProspectOre, acct 2019-08-24) | OpenClaw: 5 PRs, 0 issues, 5 commits/12mo | GitHub: 5 repos, 29 recent events, 100 public own-repo commits; graph=0`
- If `name` is empty, use the login only. If profile lookup is rate-limited or unavailable, say `account age unknown` rather than omitting the opener.
- Use identity and activity as triage signal, not proof by itself: new, low-activity, or bot-like accounts can raise review caution, but code, repro, and CI evidence still decide.
## Suppress top-maintainer items in issue triage
When asked for issue triage, hot issues, pressing bugs, Discord-correlated issues, or "what is still open", do not surface issues or PRs authored by top maintainers by default. Prefer external/user-reported hot issues and external PRs, not maintainer-owned work queues.
Suppress by default when the opener/author is one of:
- `@vincentkoc`
- `@Takhoffman`
- `@gumadeiras`
- `@obviyus`
- `@shakkernerd`
- `@mbelinky`
- `@joshavant`
- `@ngutman`
- `@vignesh07`
- `@huntharo`
Also suppress lower-priority maintainer-owned noise from the broader keep/top-maintainer group unless it is directly relevant:
- `@thewilloftheshadow`
- `@onutc` / `@osolmaz`
- `@jacobtomlinson`
- `@tyler6204`
- `@velvet-shark`
- `@jalehman`
- `@frankekn`
- `@ImLukeF`
- `@mcaxtr`
Exceptions:
- Show maintainer-authored items when the requester explicitly asks for maintainer PRs/issues, PR landing candidates, release-blocking maintainer work, or a specific PR/issue number.
- Show a maintainer-authored item when it is the canonical fix for an external hot issue, but frame it as the fix path rather than as a user-facing issue candidate.
- Do not close, label, or deprioritize solely because an item is maintainer-authored; this section only controls what appears in triage shortlists.
## Apply close and triage labels correctly
- If an issue or PR matches an auto-close reason, apply the label and let `.github/workflows/auto-response.yml` handle the comment/close/lock flow.
- Do not manually close plus manually comment for these reasons.
- If an issue/PR is already fixed on current `main` or solved by a new release, comment with proof plus the canonical commit/PR/release, then close it.
- `r:*` labels can be used on both issues and PRs.
- Current reasons:
- `r: skill`
- `r: support`
- `r: no-ci-pr`
- `r: too-many-prs`
- `r: testflight`
- `r: third-party-extension`
- `r: moltbook`
- `r: spam`
- `invalid`
- `dirty` for PRs only
## Select small high-confidence triage candidates
When asked for `X` issues or PRs to triage, `X` means qualified candidates, not sampled threads.
Issue triage is review/prove/patch-local by default:
1. Review the issue body, comments, related threads, current code, and adjacent tests.
2. Fix only issues that are easy, high-confidence, and narrowly owned by the implicated path.
3. Add focused regression proof when practical.
4. Stop with the dirty diff, touched files, and test/gate output for maintainer review.
5. After maintainer approval to ship, make one commit per accepted fix, with release-note context in the PR body or commit message when user-facing.
6. Pull/rebase, push, then comment and close only the issues that were fixed or explicitly triaged closed.
Do not batch unrelated issue fixes into one commit. Do not publish, comment, close, or label during the review/prove phase.
Missing `CHANGELOG.md` is not a PR review finding or merge blocker. If landing/fixing a user-visible change, make sure the PR body or commit message captures the release-note context; never ask or block solely on it.
Only list candidates that pass all gates:
- small owner/surface, with a likely narrow fix and focused regression test
- symptom is reproducible or provable with logs, failing test, live command, dependency contract, or current-main behavior
- root cause is traceable to code with file/line and the proposed fix touches that path
- no strong smell that a broader refactor, ownership rethink, migration, or product decision is the better fix
- dependency-backed behavior checked against upstream docs/source/types; live or web proof used when local proof is insufficient
Loop:
1. Use `gitcrawl` / `gh` to gather candidate clusters.
2. Read issue/PR body, comments, current code, adjacent tests, and dependency contracts.
3. Try focused repro or proof.
4. Reject unclear, stale, speculative, broad-refactor, or owner-ambiguous items.
5. Continue until `X` qualified candidates or the bounded search is exhausted.
Output only qualifying candidates, with: ref, surface, proof, cause, fix sketch, why small, expected test/gate. If none qualify, say so; do not pad.
## Structure PR review output
- Start every PR review with 1-3 plain sentences explaining what the change does and why it matters. Put this before `Findings`.
- Then list findings first. If none, say `No blocking findings` or `No findings`.
- Show size near the top as `LOC: +<additions>/-<deletions> (<changedFiles> files)`, using live PR stats or local diff stats.
- Always answer: bug/behavior being fixed, PR/issue URL and affected surface, provenance for regressions when traceable, and best-fix verdict.
- For bug/regression fixes, include a compact `Provenance:` line after cause/root-cause when a bounded history pass can identify it. Use `git log -S/-G`, `git blame`, linked PRs/issues, and tests.
- Provenance must separate roles when they differ: blamed code author username, blamed PR author username, blamed PR merger/committer username, automerge trigger when known, current PR author username, PR number, and date. Do not collapse them into one "introduced by" actor.
- If the blamed PR was merged by `clawsweeper[bot]` or another automation, identify the human trigger when practical. Check live PR timeline/comments first; if rate-limited, use gitcrawl/cache or public PR HTML. Look for maintainer command comments such as `@clawsweeper automerge`, `/landpr`, labels/events that armed automerge, and ClawSweeper status comments. Report `automerge triggered by @login`; if not found, say trigger unknown rather than naming the bot as the human decision-maker.
- For any confirmed bug, run `git blame` on the implicated line(s) after identifying the root cause. Report who broke it as the blamed PR merger/committer, and also name the blamed code author. Include the PR number. If no PR is traceable, use the blamed commit as the provenance: commit SHA, date, and author username. Do not guess a merger or frame missing PR metadata as a separate finding.
- Phrase provenance as `introduced by`, `made visible by`, or `carried forward by`, with confidence (`clear`, `likely`, `unknown`). If unclear, say what evidence is missing instead of guessing. For features, docs, and refactors, use `Provenance: N/A` or omit it when no broken behavior is being fixed.
- Keep summaries compact, but include enough proof that the verdict is auditable without rereading the PR.
LOC proof:
```bash
gh pr view <number> --json additions,deletions,changedFiles \
--jq '"LOC: +\(.additions)/-\(.deletions) (\(.changedFiles) files)"'
```
## Read beyond the diff
- Review the surrounding code path, not just changed lines. Open the caller, callee, data contracts, adjacent tests, and owner module.
- For large-codebase PRs, sample enough related files to understand the runtime boundary before deciding. Default to more code reading when the change touches agents, gateway, plugins, auth, sessions, process, config, or provider/runtime seams.
- Compare the PR against current `origin/main` behavior. Check whether recent main already changed the same surface.
- Dependency-backed behavior: MUST read upstream docs/source/types before judging API use, defaults, output shapes, errors, timeouts, memory behavior, or compatibility. Do not assume dependency contracts from memory or PR text.
- Judge solution quality, not only correctness. Ask whether the PR is the clean owner-boundary fix or a wart/workaround that should be replaced by a small refactor, moved seam, contract change, or deletion of duplicate logic.
- Mention the main files read when the verdict depends on code-path evidence.
## Enforce the bug-fix evidence bar
- Never merge a bug-fix PR based only on issue text, PR text, or AI rationale.
- Whenever feasible, use Crabbox (`$crabbox`) for end-to-end verification before
commenting that a bug is unreproducible, closing an issue, or opening/landing
a fix PR. Prefer a real packaged/Docker/live lane that exercises the reported
user flow over unit-only proof.
- Before landing, require:
1. symptom evidence such as a repro, logs, or a failing test
2. a verified root cause in code with file/line
3. blame-backed provenance for regressions when traceable, including blamed PR merger and automerge trigger when known, or commit SHA/date when no PR is traceable
4. a fix that touches the implicated code path
5. a regression test when feasible, or explicit manual verification plus a reason no test was added
- If the claim is unsubstantiated or likely wrong, request evidence or changes instead of merging.
- If the linked issue appears outdated or incorrect, correct triage first. Do not merge a speculative fix.
- If Crabbox/E2E proof is blocked, say exactly why and use the closest available
local, Docker, mocked, or targeted proof. Do not present unit tests as real
behavior proof.
## Close low-signal manual PRs carefully
- Do not close for red CI alone. Require a clear low-signal category plus stale or failed validation.
- Good manual-close categories:
- blank or mostly untouched PR template with no concrete OpenClaw problem/fix
- random docs-only churn such as root README translations, generic wording tweaks, or community-plugin discoverability docs that should go through ClawHub
- test-only coverage without a linked bug, owner request, or behavior change
- refactor-only cleanup, variable renames, formatting, or generated/baseline churn without maintainer request
- third-party channel/provider/tool/skill/plugin work that belongs on ClawHub instead of core
- risky ops/infra drive-bys such as new external CI services, release workflows, host upgrade scripts, Docker base migrations, or apt retry/fix-missing tweaks without owner request and green validation
- dirty branches where a narrow stated change includes unrelated docs/generated/runtime/extension files
- repeated bot-review spam or copied bot output without author-owned fixes
- Keep or escalate plausible focused bug fixes, green PRs, active maintainer discussions, assigned work, recent author follow-up, and unique reproduction details.
- For third-party capabilities, prefer the `r: third-party-extension` auto-response label when it applies; it points contributors to publish on ClawHub.
## Handle GitHub text safely
- For issue comments and PR comments, use literal multiline strings or `-F - <<'EOF'` for real newlines. Never embed `\n`.
- Do not use `gh issue/pr comment -b "..."` when the body contains backticks or shell characters. Prefer a single-quoted heredoc.
- Do not wrap issue or PR refs like `#24643` in backticks when you want auto-linking.
- PR landing comments should include clickable full commit links for landed and source SHAs when present.
## Search broadly before deciding
- Prefer `gitcrawl` first. Then use targeted GitHub keyword search to verify gaps, live status, comments, and candidates not present in the local store.
- Use `--repo openclaw/openclaw` with `--match title,body` first when using `gh search`.
- Add `--match comments` when triaging follow-up discussion or closed-as-duplicate chains.
- Do not stop at the first 500 results when the task requires a full search.
Examples:
```bash
gh search prs --repo openclaw/openclaw --match title,body --limit 50 -- "auto-update"
gh search issues --repo openclaw/openclaw --match title,body --limit 50 -- "auto-update"
gh search issues --repo openclaw/openclaw --match title,body --limit 50 \
--json number,title,state,url,updatedAt -- "auto update" \
--jq '.[] | "\(.number) | \(.state) | \(.title) | \(.url)"'
```
## Follow PR review and landing hygiene
- Never mention release-note bookkeeping in review-only output. It is landing
or release-generation mechanics, not a correctness finding.
- If bot review conversations exist on your PR, address them and resolve them yourself once fixed.
- Leave a review conversation unresolved only when reviewer or maintainer judgment is still needed.
- Before landing any PR with non-trivial code changes, run `$autoreview` until no accepted/actionable findings remain, unless equivalent manual review already covered it, the change is trivial/docs-only, or the user opts out.
- When landing or merging any PR, follow the global `/landpr` process.
- Use `scripts/committer "<msg>" <file...>` for scoped commits instead of manual `git add` and `git commit`.
- Keep commit messages concise and action-oriented.
- Group related changes; avoid bundling unrelated refactors.
- Use `.github/pull_request_template.md` for PR submissions and `.github/ISSUE_TEMPLATE/` for issues.
- Do not commit PR-only artifacts such as screenshots under `.github/pr-assets`; attach them to the PR/comment or use an external artifact store instead.
## Extra safety
- If a close or reopen action would affect more than 5 PRs, ask for explicit confirmation with the exact count and target query first.
- `sync` means: if the tree is dirty, commit all changes with a sensible Conventional Commit message, then `git pull --rebase`, then `git push`. Stop if rebase conflicts cannot be resolved safely.

View File

@@ -1,178 +0,0 @@
#!/usr/bin/env bash
set -euo pipefail
repo="openclaw/openclaw"
months="12"
include_global="0"
usage() {
printf 'Usage: %s [--repo owner/repo] [--months N] [--global] <github-login> [login...]\n' "$0"
}
die() {
printf 'error: %s\n' "$*" >&2
exit 1
}
need() {
command -v "$1" >/dev/null 2>&1 || die "missing required command: $1"
}
date_utc_relative_months() {
local count="$1"
if date -u -v-"${count}"m +%Y-%m-%dT00:00:00Z >/dev/null 2>&1; then
date -u -v-"${count}"m +%Y-%m-%dT00:00:00Z
return
fi
date -u -d "${count} months ago" +%Y-%m-%dT00:00:00Z
}
date_to_epoch() {
local value="$1"
if date -u -j -f '%Y-%m-%dT%H:%M:%SZ' "$value" +%s >/dev/null 2>&1; then
date -u -j -f '%Y-%m-%dT%H:%M:%SZ' "$value" +%s
return
fi
date -u -d "$value" +%s
}
rough_age() {
local created_at="$1"
local now_s created_s days
now_s=$(date -u +%s)
created_s=$(date_to_epoch "$created_at")
days=$(( (now_s - created_s) / 86400 ))
if (( days < 120 )); then
printf '~%dd old' "$days"
return
fi
awk -v days="$days" 'BEGIN { printf "~%.1fy old", days / 365.2425 }'
}
thread_kinds() {
local login="$1"
local since_ts="$2"
gh api --paginate "repos/${repo}/issues?state=all&creator=${login}&since=${since_ts}&per_page=100" \
--jq ".[] | select(.created_at >= \"${since_ts}\") | if has(\"pull_request\") then \"pr\" else \"issue\" end"
}
count_kind_lines() {
local kind="$1"
local lines="$2"
grep -cx "$kind" <<<"$lines" 2>/dev/null || true
}
count_commits() {
local login="$1"
local since_ts="$2"
gh api --paginate "repos/${repo}/commits?author=${login}&since=${since_ts}&per_page=100" \
--jq '.[].sha' | wc -l | tr -d '[:space:]'
}
global_activity() {
local login="$1"
local since_ts="$2"
local now_ts="$3"
# shellcheck disable=SC2016
gh api graphql \
-f login="$login" \
-f from="$since_ts" \
-f to="$now_ts" \
-f query='
query($login: String!, $from: DateTime!, $to: DateTime!) {
user(login: $login) {
contributionsCollection(from: $from, to: $to) {
totalCommitContributions
totalIssueContributions
totalPullRequestContributions
totalPullRequestReviewContributions
}
}
}' \
--jq '.data.user.contributionsCollection // empty'
}
while [[ $# -gt 0 ]]; do
case "$1" in
--repo)
[[ $# -ge 2 ]] || die "--repo requires owner/repo"
repo="$2"
shift 2
;;
--months)
[[ $# -ge 2 ]] || die "--months requires a positive integer"
months="$2"
[[ "$months" =~ ^[0-9]+$ && "$months" != "0" ]] || die "--months must be a positive integer"
shift 2
;;
--global)
include_global="1"
shift
;;
-h|--help)
usage
exit 0
;;
--)
shift
break
;;
-*)
die "unknown option: $1"
;;
*)
break
;;
esac
done
[[ $# -gt 0 ]] || {
usage >&2
exit 2
}
need gh
need jq
since_ts=$(date_utc_relative_months "$months")
now_ts=$(date -u +%Y-%m-%dT%H:00:00Z)
for login in "$@"; do
profile=$(gh api "users/${login}" --jq '{login,name,created_at,type}')
display_login=$(jq -r '.login' <<<"$profile")
name=$(jq -r '.name // empty' <<<"$profile")
created_at=$(jq -r '.created_at' <<<"$profile")
type=$(jq -r '.type' <<<"$profile")
created_day=${created_at%%T*}
kinds=$(thread_kinds "$display_login" "$since_ts")
prs=$(count_kind_lines pr "$kinds")
issues=$(count_kind_lines issue "$kinds")
commits=$(count_commits "$display_login" "$since_ts")
if [[ -n "$name" ]]; then
printf '%s (@%s, %s, account created %s, %s)\n' \
"$name" "$display_login" "$type" "$created_day" "$(rough_age "$created_at")"
else
printf '@%s (%s, account created %s, %s)\n' \
"$display_login" "$type" "$created_day" "$(rough_age "$created_at")"
fi
printf '%s last %smo: %s PRs, %s issues, %s commits\n' "$repo" "$months" "$prs" "$issues" "$commits"
if [[ "$include_global" == "1" ]]; then
if global_json=$(global_activity "$display_login" "$since_ts" "$now_ts" 2>/dev/null); then
if [[ -n "$global_json" ]]; then
global_commits=$(jq -r '.totalCommitContributions' <<<"$global_json")
global_issues=$(jq -r '.totalIssueContributions' <<<"$global_json")
global_prs=$(jq -r '.totalPullRequestContributions' <<<"$global_json")
global_reviews=$(jq -r '.totalPullRequestReviewContributions' <<<"$global_json")
printf 'GitHub public last %smo: %s commits, %s PRs, %s issues, %s reviews\n' \
"$months" "$global_commits" "$global_prs" "$global_issues" "$global_reviews"
else
printf 'GitHub public last %smo: unavailable\n' "$months"
fi
else
printf 'GitHub public last %smo: unavailable\n' "$months"
fi
fi
done

View File

@@ -1,269 +0,0 @@
---
name: openclaw-qa-testing
description: Run, watch, debug, extend, or explain OpenClaw qa-lab and qa-channel scenarios, artifacts, and live lanes.
---
# OpenClaw QA Testing
Use this skill for `qa-lab` / `qa-channel` work. Repo-local QA only.
## Read first
- `docs/concepts/qa-e2e-automation.md`
- `docs/help/testing.md`
- `docs/channels/qa-channel.md`
- `qa/README.md`
- `qa/scenarios/index.md`
- `extensions/qa-lab/src/suite.ts`
- `extensions/qa-lab/src/character-eval.ts`
## Model policy
- Live OpenAI lane: `openai/gpt-5.4`
- Fast mode: on
- Do not use:
- `openai/gpt-5.4-pro`
- `openai/gpt-5.4-mini`
- Only change model policy if the user explicitly asks.
## Default workflow
1. Read the scenario pack and current suite implementation.
2. Decide lane:
- mock/dev: `mock-openai`
- real validation: `live-frontier`
3. For live OpenAI, use:
```bash
OPENCLAW_LIVE_OPENAI_KEY="${OPENAI_API_KEY}" \
pnpm openclaw qa suite \
--provider-mode live-frontier \
--model openai/gpt-5.4 \
--alt-model openai/gpt-5.4 \
--output-dir .artifacts/qa-e2e/run-all-live-frontier-<tag>
```
4. Watch outputs:
- summary: `.artifacts/qa-e2e/run-all-live-frontier-<tag>/qa-suite-summary.json`
- report: `.artifacts/qa-e2e/run-all-live-frontier-<tag>/qa-suite-report.md`
5. If the user wants to watch the live UI, find the current `openclaw-qa` listen port and report `http://127.0.0.1:<port>`.
6. If a scenario fails, fix the product or harness root cause, then rerun the full lane.
## OTEL smoke
For local QA-lab OpenTelemetry validation, use:
```bash
pnpm qa:otel:smoke
```
This starts a local OTLP/HTTP trace receiver, runs the `otel-trace-smoke`
scenario through qa-channel, decodes the emitted protobuf spans, and verifies
the exported trace names and privacy contract. It does not require Opik,
Langfuse, or external collector credentials.
## Matrix live profiles
`pnpm openclaw qa matrix` defaults to the full `all` profile. Use explicit
profiles for faster CI/release proof:
```bash
OPENCLAW_QA_MATRIX_NO_REPLY_WINDOW_MS=3000 \
pnpm openclaw qa matrix --profile fast --fail-fast
```
- `fast`: release-critical transport contract, excluding generated image and
deep E2EE recovery inventory.
- `transport`, `media`, `e2ee-smoke`, `e2ee-deep`, `e2ee-cli`: sharded full
Matrix coverage.
- `QA-Lab - All Lanes` uses explicit `fast` Matrix on scheduled runs. Manual
dispatch keeps `matrix_profile=all` as the default and always shards that full
Matrix selection.
## QA credentials and 1Password
- Use `op` only inside `tmux` for QA secret lookup in this repo.
- Quick auth check inside tmux:
```bash
op account list
```
- Direct Telegram npm live test secrets currently live in 1Password item:
- vault: `OpenClaw`
- item: `Telegram E2E`
- That item is the first place to look for:
- `OPENCLAW_QA_TELEGRAM_DRIVER_BOT_TOKEN`
- `OPENCLAW_QA_TELEGRAM_SUT_BOT_TOKEN`
- `OPENCLAW_QA_PROVIDER_MODE`
- `OPENCLAW_NPM_TELEGRAM_PACKAGE_SPEC`
- Convex QA secrets currently live in 1Password items:
- vault: `OpenClaw`
- item: `OPENCLAW_QA_CONVEX_SITE_URL`
- item: `OPENCLAW_QA_CONVEX_SECRET_MAINTAINER`
- item: `OPENCLAW_QA_CONVEX_SECRET_CI`
- Additional related notes/login items seen during QA credential work:
- vault: `Private`
- items: `OPENCLAW QA`, `Convex`, `Telegram`
- If a required value is missing from those notes:
- do not guess
- ask the maintainer/operator for the current value or the current 1Password item name
- for Telegram direct runs, `OPENCLAW_QA_TELEGRAM_GROUP_ID` may be stored separately from `Telegram E2E`
- for Convex runs, the leased Telegram credential should provide the Telegram group id and bot tokens together; do not require a separate `OPENCLAW_QA_TELEGRAM_GROUP_ID`
- for Convex runs, prefer `OpenClaw/OPENCLAW_QA_CONVEX_SITE_URL`; if that is stale or unclear, ask for the active pool URL before running
- Prefer direct Telegram envs for the npm Telegram Docker lane when available:
```bash
OPENCLAW_QA_TELEGRAM_GROUP_ID="..." \
OPENCLAW_QA_TELEGRAM_DRIVER_BOT_TOKEN="..." \
OPENCLAW_QA_TELEGRAM_SUT_BOT_TOKEN="..." \
OPENCLAW_QA_PROVIDER_MODE="mock-openai" \
OPENCLAW_NPM_TELEGRAM_PACKAGE_SPEC="openclaw@beta" \
pnpm test:docker:npm-telegram-live
```
- Prefer Convex mode when the goal is stable shared QA infra:
- round-robin credential leasing
- thinner wrapper for channel-specific setup
- CLI/admin flows around the pooled credentials
- Live npm Telegram Docker lane note:
- `scripts/e2e/npm-telegram-live-runner.ts` reads `OPENCLAW_NPM_TELEGRAM_PROVIDER_MODE`
- do not assume `OPENCLAW_QA_PROVIDER_MODE` is consumed by that wrapper
- if a 1Password note only gives `OPENCLAW_QA_PROVIDER_MODE`, map it explicitly to `OPENCLAW_NPM_TELEGRAM_PROVIDER_MODE` before running the Docker lane
- Verified live shape:
- Convex mode can pass the real Docker lane without direct Telegram env vars
- leased Telegram payload includes the group id coupled to the driver/SUT tokens
- a real run of `pnpm test:docker:npm-telegram-live` passed with:
- `OPENCLAW_QA_CREDENTIAL_SOURCE=convex`
- `OPENCLAW_QA_CREDENTIAL_ROLE=maintainer`
- `OPENCLAW_QA_CONVEX_SITE_URL`
- `OPENCLAW_QA_CONVEX_SECRET_MAINTAINER`
- `OPENCLAW_NPM_TELEGRAM_PROVIDER_MODE=mock-openai`
- If direct Telegram env is missing locally and `op signin` blocks, prefer dispatching the manual GitHub lane because the `qa-live-shared` environment already has Convex CI credentials:
```bash
gh workflow run "NPM Telegram Beta E2E" --repo openclaw/openclaw --ref main \
-f package_spec=openclaw@YYYY.M.D-beta.N \
-f package_label=openclaw@YYYY.M.D-beta.N \
-f provider_mode=mock-openai
```
- Poll the exact run id from the dispatch URL. `gh run view --json artifacts` is not supported; list artifacts with:
```bash
gh api repos/openclaw/openclaw/actions/runs/<run-id>/artifacts
```
## WhatsApp live credentials
Use this when setting up or replacing Convex `kind=whatsapp` credentials.
- Treat WhatsApp QA credentials as operator-owned live accounts, not generated fixtures.
- Use two dedicated WhatsApp-capable test numbers: one driver account and one SUT account. Do not use personal numbers or personal OpenClaw WhatsApp accounts in the shared pool.
- Register and link each account manually with WhatsApp or WhatsApp Business, storing Web auth only in isolated local auth dirs outside the repo.
- For group coverage, create a dedicated test group that includes both QA accounts and store its JID as `groupJid`; otherwise the group mention-gating scenario should be skipped by default and fail when explicitly requested.
- Package the two Baileys auth dirs into base64 `.tgz` payload fields and add a new active Convex credential row. Prefer adding a fresh row and disabling stale/broken rows over overwriting credentials in place.
- Expected payload fields: `driverPhoneE164`, `sutPhoneE164`, `driverAuthArchiveBase64`, `sutAuthArchiveBase64`, and optional `groupJid`.
- Keep credential material out of the repo, logs, PRs, and screenshots. Redact phone numbers unless the operator explicitly asks for local debugging.
- Validate with `pnpm openclaw qa whatsapp --credential-source convex --credential-role maintainer --provider-mode mock-openai` and preserve artifact paths plus redacted pass/fail summaries.
- If WhatsApp expires or invalidates a linked Web session, relink locally, package fresh auth archives, add a new Convex row, then disable the stale row.
## Character evals
Use `qa character-eval` for style/persona/vibe checks across multiple live models.
```bash
pnpm openclaw qa character-eval \
--model openai/gpt-5.4,thinking=xhigh \
--model openai/gpt-5.2,thinking=xhigh \
--model openai/gpt-5,thinking=xhigh \
--model anthropic/claude-opus-4-6,thinking=high \
--model anthropic/claude-sonnet-4-6,thinking=high \
--model zai/glm-5.1,thinking=high \
--model moonshot/kimi-k2.5,thinking=high \
--model google/gemini-3.1-pro-preview,thinking=high \
--judge-model openai/gpt-5.4,thinking=xhigh,fast \
--judge-model anthropic/claude-opus-4-6,thinking=high \
--concurrency 16 \
--judge-concurrency 16 \
--output-dir .artifacts/qa-e2e/character-eval-<tag>
```
- Runs local QA gateway child processes, not Docker.
- Preferred model spec syntax is `provider/model,thinking=<level>[,fast|,no-fast|,fast=<bool>]` for both `--model` and `--judge-model`.
- Do not add new examples with separate `--model-thinking`; keep that flag as legacy compatibility only.
- Defaults to candidate models `openai/gpt-5.4`, `openai/gpt-5.2`, `openai/gpt-5`, `anthropic/claude-opus-4-6`, `anthropic/claude-sonnet-4-6`, `zai/glm-5.1`, `moonshot/kimi-k2.5`, and `google/gemini-3.1-pro-preview` when no `--model` is passed.
- Candidate thinking defaults to `high`, with `xhigh` for OpenAI models that support it. Prefer inline `--model provider/model,thinking=<level>`; `--thinking <level>` and `--model-thinking <provider/model=level>` remain compatibility shims.
- OpenAI candidate refs default to fast mode so priority processing is used where supported. Use inline `,fast`, `,no-fast`, or `,fast=false` for one model; use `--fast` only to force fast mode for every candidate.
- Judges default to `openai/gpt-5.4,thinking=xhigh,fast` and `anthropic/claude-opus-4-6,thinking=high`.
- Report includes judge ranking, run stats, durations, and full transcripts; do not include raw judge replies. Duration is benchmark context, not a grading signal.
- Candidate and judge concurrency default to 16. Use `--concurrency <n>` and `--judge-concurrency <n>` to override when local gateways or provider limits need a gentler lane.
- Scenario source should stay markdown-driven under `qa/scenarios/`.
- For isolated character/persona evals, write the persona into `SOUL.md` and blank `IDENTITY.md` in the scenario flow. Use `SOUL.md + IDENTITY.md` only when intentionally testing how the normal OpenClaw identity combines with the character.
- Keep prompts natural and task-shaped. The candidate model should receive character setup through `SOUL.md`, then normal user turns such as chat, workspace help, and small file tasks; do not ask "how would you react?" or tell the model it is in an eval.
- Prefer at least one real task, such as creating or editing a tiny workspace artifact, so the transcript captures character under normal tool use instead of pure roleplay.
## Codex CLI model lane
Use model refs shaped like `codex-cli/<codex-model>` whenever QA should exercise Codex as a model backend.
Examples:
```bash
pnpm openclaw qa suite \
--provider-mode live-frontier \
--model codex-cli/<codex-model> \
--alt-model codex-cli/<codex-model> \
--scenario <scenario-id> \
--output-dir .artifacts/qa-e2e/codex-<tag>
```
```bash
pnpm openclaw qa manual \
--model codex-cli/<codex-model> \
--message "Reply exactly: CODEX_OK"
```
- Treat the concrete Codex model name as user/config input; do not hardcode it in source, docs examples, or scenarios.
- Live QA preserves `CODEX_HOME` so Codex CLI auth/config works while keeping `HOME` and `OPENCLAW_HOME` sandboxed.
- Mock QA should scrub `CODEX_HOME`.
- If Codex returns fallback/auth text every turn, first check `CODEX_HOME`,
relevant secret-backed auth, and gateway child logs before changing
scenario assertions.
- For model comparison, include `codex-cli/<codex-model>` as another candidate in `qa character-eval`; the report should label it as an opaque model name.
## Repo facts
- Seed scenarios live in `qa/`.
- Main live runner: `extensions/qa-lab/src/suite.ts`
- QA lab server: `extensions/qa-lab/src/lab-server.ts`
- Child gateway harness: `extensions/qa-lab/src/gateway-child.ts`
- Synthetic channel: `extensions/qa-channel/`
## What “done” looks like
- Full suite green for the requested lane.
- User gets:
- watch URL if applicable
- pass/fail counts
- artifact paths
- concise note on what was fixed
## Common failure patterns
- Live timeout too short:
- widen live waits in `extensions/qa-lab/src/suite.ts`
- Discovery cannot find repo files:
- point prompts at `repo/...` inside seeded workspace
- Subagent proof too brittle:
- prefer stable final reply evidence over transient child-session listing
- Harness “rebuild” delay:
- dirty tree can trigger a pre-run build; expect that before ports appear
## When adding scenarios
- Add or update scenario markdown under `qa/scenarios/`
- Keep kickoff expectations in `qa/scenarios/index.md` aligned
- Add executable coverage in `extensions/qa-lab/src/suite.ts`
- Prefer end-to-end assertions over mock-only checks
- Save outputs under `.artifacts/qa-e2e/`

View File

@@ -1,4 +0,0 @@
interface:
display_name: "QA Test OpenClaw"
short_description: "Run and debug qa-lab and qa-channel scenarios"
default_prompt: "Use $openclaw-qa-testing to run or extend the OpenClaw QA suite with qa-lab and qa-channel, using regular openai/gpt-5.4 in fast mode for live OpenAI runs."

View File

@@ -1,196 +0,0 @@
---
name: openclaw-refactor-docs
description: Refactor an existing OpenClaw docs page with source-audited preservation, restructuring, and verification.
---
# OpenClaw Refactor Docs
## Overview
Use this skill when the user gives a target OpenClaw docs page and asks to
rewrite, refactor, reorganize, split, shorten, or improve it.
This skill builds on `openclaw-docs`: use that skill for style, page types,
structure, examples, discoverability, and verification. This skill adds the
rewrite workflow needed to avoid losing accurate behavior during a major docs
refactor.
## Inputs
Required:
- A target docs page path, such as `docs/plugins/codex-harness.md`.
Optional:
- Desired page type, such as topic page, guide, reference, or troubleshooting.
- Specific goals, such as shorter main page, move details to reference pages, or
align with current CLI behavior.
- Related source files, schemas, commands, tests, specs, or PRs.
If the target page is missing or ambiguous, ask one concise question before
editing. Otherwise, proceed.
## Working Contract
Refactor the target page to be more useful, concise, and comprehensive within
its stated scope.
Do not treat a rewrite as permission to discard behavior facts. Preserve,
verify, move, or explicitly retire existing material. Incorrect docs are worse
than verbose docs.
Prefer this split:
- Topic or guide pages cover the 80/20 path, decisions readers must make, safe
setup, smallest reliable verification, common failures, and links onward.
- Reference pages cover exhaustive fields, defaults, enums, limits, precedence
rules, API contracts, narrow internals, and rare debugging details.
- Troubleshooting pages start from observable symptoms and map to checks,
causes, and fixes.
## Workflow
### 1. Load the doc standard
Read `../openclaw-docs/SKILL.md` first. Apply its page-type, style,
examples, navigation, and verification guidance throughout the refactor.
Run `pnpm docs:list` when available, then read only the target page and the
likely entry points, references, or related pages needed for the refactor.
### 2. Classify the page
Before editing, decide the intended page type from `openclaw-docs`.
If the current page mixes page types, choose the main page type and plan where
the other material belongs:
- Move exhaustive contracts to an existing or new reference page.
- Move symptom-driven material to an existing or new troubleshooting page.
- Move narrow setup workflows to a guide when they interrupt the main path.
- Keep concise routing, decision, and safety details in the main page when
readers need them to complete the workflow.
### 3. Preserve and audit existing facts
Create a working inventory from the old page before rewriting. Include:
- Config fields, flags, commands, slash commands, env vars, defaults, enums,
nullable values, and constraints.
- Precedence rules, fallback behavior, caps, limits, rate limits, timeouts,
lifecycle states, queueing behavior, and compatibility rules.
- Auth, permission, approval, sandbox, safety, privacy, and destructive-action
behavior.
- Setup requirements, supported versions, dependencies, operating systems,
credentials, and account requirements.
- Error messages, troubleshooting symptoms, diagnostics, and recovery steps.
- Examples, expected output, command routing tables, and cross-links.
For each fact, choose one outcome:
- Keep it in the refactored target page.
- Move it to a specific existing page.
- Move it to a specific new page.
- Delete it because current source proves it is obsolete or out of scope.
Do not infer defaults, permissions, policy, timeout behavior, or safety posture
from names or intent. Verify them.
### 4. Find source of truth
Use the nearest authoritative source for each behavior-sensitive claim:
- Public schema, plugin manifest, generated config docs, or exported types for
config fields.
- CLI implementation, slash-command handlers, help text, and command tests for
commands and flags.
- Runtime source and tests for lifecycle, queueing, permission, fallback,
timeout, and provider behavior.
- Protocol docs, SDK facades, and contract tests for APIs and plugin surfaces.
- Existing docs only as secondary evidence unless the target is purely
conceptual.
If a page promises a reference, compare its tables against the schema,
manifest, CLI help, generated docs, or exported types. Missing public fields,
defaults, precedence rules, caps, or side effects are correctness bugs.
### 5. Plan moved material
When moving detail out of the target page, record the destination before
editing:
- Existing page: name the page and section.
- New page: choose the page type, slug, title, frontmatter summary,
`doc-schema-version: 1`, and `read_when` hints.
- Target page: keep a short summary and link from the point where readers need
the deeper detail.
Avoid duplicate truth. If the same contract appears in multiple places, choose
one canonical page and link to it.
### 6. Rewrite
Rewrite in this order:
1. Make the first screen answer what the reader can do and why this page exists.
2. Put the recommended path before alternatives.
3. Keep only decision-making and common operational detail in the main flow.
4. Move exhaustive tables and rare details to the planned reference pages.
5. Preserve concise routing tables when they help readers choose commands,
config paths, harnesses, plugins, providers, or references.
6. Add troubleshooting from observable symptoms, not internal guesses.
7. Link related concepts, guides, references, diagnostics, and adjacent tools.
Add `doc-schema-version: 1` to the YAML frontmatter of every docs page that the
refactor migrates, creates, or materially rewrites. Apply it only to docs page
files, not `docs.json`, glossary JSON, or other non-page metadata. If a
migrated page is generated, update the generator so regeneration preserves the
marker instead of hand-editing generated output.
Do not leave placeholders such as "TODO", "TBD", or "see docs" unless the user
explicitly asks for a draft.
### 7. Compare old and new
After editing, compare the old and new page:
- Confirm all behavior-sensitive facts were kept, moved, or intentionally
deleted with source-backed reason.
- Check that the main page still covers the 80/20 scenario end to end.
- Check that reference pages remain exhaustive for the scope they claim.
- Check that links from the target page reach moved details.
- Check that headings are stable, searchable, and action-oriented.
If the refactor deliberately removes relevant material, say where it went or why
it was removed in the final report.
### 8. Verify
Run the smallest reliable docs checks for the touched surface:
- `pnpm docs:list`
- `git diff --check -- <touched-files>`
- Targeted `pnpm exec oxfmt --check --threads=1 <touched-files>`
- `pnpm docs:check-mdx`
- `pnpm docs:check-links`
- `pnpm docs:check-i18n-glossary` when link text, navigation, labels, or glossary
surfaces changed
- Generated-doc checks when schemas, generated config docs, API docs, or
generated baselines are touched
Run commands and examples from the page whenever feasible. If you cannot verify
a behavior-sensitive claim, either remove the claim, mark the uncertainty in the
work-in-progress report, or ask for the missing source.
## Final Report
Report:
- What changed in the target page.
- What details moved and their destination pages.
- What source-of-truth checks backed behavior-sensitive claims.
- What validation ran and what failed for unrelated reasons.
Do not include a long rewrite diary. Lead with remaining risks only if there are
any.

View File

@@ -1,236 +0,0 @@
---
name: openclaw-secret-scanning-maintainer
description: Triage, redact, clean up, and resolve OpenClaw GitHub Secret Scanning alerts in issues or PRs.
---
# OpenClaw Secret Scanning Maintainer
**Maintainer-only.** This skill requires repo admin / maintainer permissions to edit or delete other users' comments and resolve secret scanning alerts.
Use this skill when processing alerts from `https://github.com/openclaw/openclaw/security/secret-scanning`.
**Language rule:** All notification comments and replacement comments MUST be written in English.
## Script
All mechanical operations (API calls, temp file management, security enforcements) are handled by:
```
$REPO_ROOT/.agents/skills/openclaw-secret-scanning-maintainer/scripts/secret-scanning.mjs
```
The script enforces:
- `hide_secret=true` on all alert fetches (no plaintext secrets in stdout)
- `mktemp` with random UUIDs for all temp files
- `-F body=@file` for all body uploads (no inline shell quoting)
- Notification templates branched by location type
- Never prints `.secret` or `.body` to stdout
## Overall Flow
Supports single or multiple alerts. For multiple alerts, process in ascending order.
For each alert:
1. **Identify**`fetch-alert` + `fetch-content` to get metadata and body
2. **Decide** — Agent reads the body file, identifies whether plaintext secrets remain, and produces a redacted version only when needed
3. **Redact**`redact-body-if-needed` for issue/PR body; skip for comments (delete directly)
4. **Purge**`delete-comment` + `recreate-comment` for comments; cannot purge body history
5. **Notify**`notify` posts the right template per location type, unless the current issue/PR body is already redacted
6. **Resolve**`resolve` closes the alert
7. **Summary**`summary` prints formatted results
## Step 1: Identify
```bash
# List all open alerts
node secret-scanning.mjs list-open
# Fetch specific alert metadata + locations
node secret-scanning.mjs fetch-alert <NUMBER>
# Fetch content for each location (saves body to temp file)
node secret-scanning.mjs fetch-content '<location-json>'
```
The `fetch-content` output includes:
- `body_file`: path to temp file with full body content
- `author`: who posted it
- `issue_number` / `pr_number`: where it is
- `edit_history_count`: number of existing edits
- `type`: location type for routing
- For `discussion_comment`, it also includes `comment_node_id`, `discussion_node_id`, and `reply_to_node_id` when the original comment was a reply.
### Location type routing
| type | Flow |
| ----------------------------- | --------------------------------------------- |
| `issue_comment` | Comment: delete+recreate |
| `pull_request_comment` | Comment: delete+recreate |
| `pull_request_review_comment` | Comment: delete+recreate |
| `discussion_comment` | Discussion comment: delete+recreate (GraphQL) |
| `issue_body` | Body: redact in place |
| `pull_request_body` | Body: redact in place |
| `commit` | Notify only |
| _other_ | Skip and report |
## Step 2: Decide (Agent)
The agent reads the body file from `fetch-content` output and:
1. Identifies ALL secrets in the content (there may be more than the alert flagged)
2. Determines whether any plaintext credential remains in the current body
3. Replaces each remaining secret with `[REDACTED <secret_type>]`**no partial values, no prefix/suffix**
4. Saves the redacted content to a new temp file
This is the only step that requires semantic understanding. Everything else is mechanical.
For `issue_body` and `pull_request_body`: if the current body has already been redacted by the author and no plaintext credential remains, **do not post a public notification comment**. Resolve the alert with a maintainer-only resolution comment such as:
```bash
node secret-scanning.mjs resolve <ALERT_NUMBER> revoked "Current issue/PR body is already redacted; no public notification posted."
```
This avoids creating a fresh public pointer to historical sensitive content.
## Step 3: Redact
### For comments (issue_comment / PR comments)
**Do NOT redact.** Skip directly to Step 4 (delete + recreate). PATCHing before DELETE creates an unnecessary edit history revision.
### For issue_body / pull_request_body
```bash
node secret-scanning.mjs redact-body-if-needed <issue|pr> <NUMBER> <current-body-file> <redacted-body-file> <result-file>
```
Use the `body_file` from `fetch-content` as `<current-body-file>`. The command writes `notify_required` to `<result-file>` and only PATCHes the body when the redacted file differs from the current body.
## Step 4: Purge Edit History
### Comments — Delete and Recreate
For issue/PR comments:
```bash
# Delete original (all edit history gone)
node secret-scanning.mjs delete-comment <COMMENT_ID>
# Recreate with redacted content
node secret-scanning.mjs recreate-comment <ISSUE_NUMBER> <body-file>
```
For discussion comments (uses GraphQL):
```bash
# Delete original
node secret-scanning.mjs delete-discussion-comment <COMMENT_NODE_ID>
# Recreate with redacted content
node secret-scanning.mjs recreate-discussion-comment <DISCUSSION_NODE_ID> <body-file> [REPLY_TO_NODE_ID]
```
The `fetch-content` output for `discussion_comment` includes `comment_node_id` and `discussion_node_id` for these commands. When the original discussion comment was a reply, it also includes `reply_to_node_id`; pass that optional third argument so the redacted replacement stays in the original thread.
The recreated comment should follow this format:
```
> **Note:** The original comment by @<AUTHOR> has been removed due to secret leakage. Below is the redacted version of the original content.
---
<redacted original content>
```
### issue_body / pull_request_body — Cannot Purge Edit History
Editing creates an edit history revision with the pre-edit plaintext. This cannot be cleared via API.
Do not advise authors publicly to delete/recreate issues or close/reopen PRs. That can draw attention to historical content. Keep purge guidance maintainer-only.
**Output to maintainer terminal only (never in public comments):**
```
⚠️ Issue/PR body edit history still contains plaintext secrets.
Contact GitHub Support to purge: https://support.github.com/contact
Request purge of issue/PR #{NUMBER} userContentEdits.
```
> **CRITICAL:** Do NOT mention edit history or the "edited" button in any public comment or resolution_comment.
### Commits
Cannot clean. Notify author to delete branch or force-push (for unmerged PRs).
## Step 5: Notify
```bash
node secret-scanning.mjs notify <TARGET> <AUTHOR> <LOCATION_TYPE> <SECRET_TYPES> [REPLY_TO_NODE_ID|BODY_REDACTION_RESULT_FILE]
```
- For non-discussion types, `<TARGET>` is the issue/PR number.
- For `discussion_comment`, `<TARGET>` is the `discussion_node_id` returned by `fetch-content`.
- For reply-style `discussion_comment` locations, pass the optional `reply_to_node_id` from `fetch-content` so the notification stays in the same thread.
- For `issue_body` and `pull_request_body`, pass the `<result-file>` from `redact-body-if-needed`. The script skips notification when `notify_required` is `false` and refuses body notifications without this file.
Secret types are comma-separated: `"Discord Bot Token,Feishu App Secret"`
The script picks the right template:
- **comment types**: "your comment … removed and replaced"
- **body types**: "your issue/PR description … redacted in place"
- **commit**: "code you committed"
For `issue_body` and `pull_request_body`, only notify when the current body still contained plaintext and maintainers redacted it. If the user already redacted the current body, skip this step and resolve silently.
## Step 6: Resolve
```bash
node secret-scanning.mjs resolve <ALERT_NUMBER>
# or with custom resolution:
node secret-scanning.mjs resolve <ALERT_NUMBER> revoked "Custom comment"
```
Resolution is `revoked` by default. As maintainers we cannot control whether users rotate — our responsibility is to remove current plaintext exposure and notify only when public notification is useful. The `revoked` means "this secret should be considered leaked", not "I confirmed it was revoked".
## Step 7: Summary
After processing, create a JSON results file and pass it to the summary command:
```bash
node secret-scanning.mjs summary /tmp/results.json
```
The script outputs a block delimited by `---BEGIN SUMMARY---` and `---END SUMMARY---`. **You MUST output the content between these markers verbatim to the user. Do NOT rephrase, reformat, abbreviate, or create your own summary.** The script already includes full URLs for every alert and location.
The JSON format:
```json
[
{
"number": 72,
"secret_type": "Discord Bot Token",
"location_label": "Issue #63101 comment",
"location_url": "https://github.com/openclaw/openclaw/issues/63101#issuecomment-xxx",
"actions": "Deleted+Recreated+Notified",
"history_cleared": true
}
]
```
For unsupported types, add `"skipped": true, "unsupported_type": "<type>"`.
## Safety Rules
- **Agent reads content, identifies secrets, produces redaction.** Script handles all API calls.
- **Never include any portion of a secret** in public comments, redaction markers, or terminal output.
- **Never include alert URLs or numbers** in public comments.
- **For comments, skip PATCH — go directly to DELETE + recreate.**
- **Never mention edit history, "edited" button, or commit SHAs** in any public content.
- **Ask for confirmation** before deleting any comment.
- **One alert at a time** unless user requests batch.
- **All public comments in English.**
- **Skip unsupported location types** and report in summary.

View File

@@ -1,888 +0,0 @@
#!/usr/bin/env node
// Secret scanning alert handler for OpenClaw maintainers.
// Usage: node secret-scanning.mjs <command> [options]
import { execFileSync, spawnSync } from "node:child_process";
import crypto from "node:crypto";
import fs from "node:fs";
import os from "node:os";
import path from "node:path";
import { pathToFileURL } from "node:url";
const REPO = "openclaw/openclaw";
const REPO_URL = `https://github.com/${REPO}`;
// ─── Helpers ────────────────────────────────────────────────────────────────
function fail(message) {
console.error(`error: ${message}`);
process.exit(1);
}
function tmpFile(purpose) {
const filePath = path.join(os.tmpdir(), `secretscan-${purpose}-${crypto.randomUUID()}`);
// 预创建文件,限制权限为 owner-only
fs.writeFileSync(filePath, "", { mode: 0o600 });
return filePath;
}
function gh(args, { json = true, allowFailure = false } = {}) {
const proc = spawnSync("gh", args, { encoding: "utf8", maxBuffer: 10 * 1024 * 1024 });
if (proc.status !== 0 && !allowFailure) {
fail(`gh ${args.slice(0, 3).join(" ")} failed:\n${(proc.stderr || proc.stdout || "").trim()}`);
}
if (proc.status !== 0) {
return {
gh_failed: true,
status: proc.status,
stdout: proc.stdout,
stderr: proc.stderr,
};
}
if (!json) return proc.stdout;
try {
return JSON.parse(proc.stdout);
} catch {
return proc.stdout;
}
}
function ghGraphQL(query, options = {}) {
return gh(["api", "graphql", "-f", `query=${query}`], options);
}
function isBodyLocationType(locationType) {
return locationType === "issue_body" || locationType === "pull_request_body";
}
export function decideBodyRedaction(currentBody, redactedBody) {
const bodyChanged = String(currentBody) !== String(redactedBody);
return {
body_changed: bodyChanged,
notify_required: bodyChanged,
};
}
export function loadBodyRedactionResult(locationType, resultFile) {
if (!isBodyLocationType(locationType)) {
return { notify_required: true };
}
if (!resultFile) {
fail("Body notifications require a redaction result file from redact-body-if-needed");
}
if (!fs.existsSync(resultFile)) fail(`File not found: ${resultFile}`);
const result = JSON.parse(fs.readFileSync(resultFile, "utf8"));
if (typeof result.notify_required !== "boolean") {
fail(`Invalid redaction result file: missing boolean notify_required in ${resultFile}`);
}
return result;
}
function failOnGraphQLFailure(result, message) {
if (result?.gh_failed) {
const details = (
result.stderr ||
result.stdout ||
`gh exited with status ${result.status}`
).trim();
fail(`${message}: ${details}`);
}
if (Array.isArray(result?.errors) && result.errors.length > 0) {
fail(`${message}: ${JSON.stringify(result.errors)}`);
}
}
function escapeGraphQLString(value) {
return String(value)
.replace(/\\/g, "\\\\")
.replace(/"/g, '\\"')
.replace(/\r/g, "\\r")
.replace(/\n/g, "\\n");
}
function formatGraphQLAfterClause(cursor) {
return cursor ? `, after: "${escapeGraphQLString(cursor)}"` : "";
}
function findDiscussionCommentNode(nodes, discussionCommentDbId) {
return nodes.find((node) => String(node.databaseId) === String(discussionCommentDbId)) || null;
}
function fetchDiscussionReplyPage(commentNodeId, cursor) {
const afterClause = formatGraphQLAfterClause(cursor);
return ghGraphQL(`{
node(id: "${escapeGraphQLString(commentNodeId)}") {
... on DiscussionComment {
replies(first: 100${afterClause}) {
pageInfo { hasNextPage endCursor }
nodes {
id
databaseId
author { login }
body
url
replyTo { id }
userContentEdits(first: 50) {
totalCount
}
}
}
}
}
}}`);
}
function fetchDiscussionComment(discussionNumber, discussionCommentDbId) {
const [owner, name] = REPO.split("/");
let discussionId = null;
let cursor = null;
let hasNextPage = true;
while (hasNextPage) {
const afterClause = formatGraphQLAfterClause(cursor);
const gql = ghGraphQL(
`{
repository(owner: "${owner}", name: "${name}") {
discussion(number: ${discussionNumber}) {
id
comments(first: 50${afterClause}) {
pageInfo { hasNextPage endCursor }
nodes {
id
databaseId
author { login }
body
url
replyTo { id }
userContentEdits(first: 50) {
totalCount
}
replies(first: 100) {
pageInfo { hasNextPage endCursor }
nodes {
id
databaseId
author { login }
body
url
replyTo { id }
userContentEdits(first: 50) {
totalCount
}
}
}
}
}
}
}
}`,
{ allowFailure: true },
);
failOnGraphQLFailure(gql, `Failed to fetch discussion #${discussionNumber}`);
const discussion = gql?.data?.repository?.discussion;
if (!discussion)
fail(
`Discussion #${discussionNumber} not found — it may have been deleted. The alert cannot be processed via this skill.`,
);
discussionId = discussion.id;
for (const topLevelComment of discussion.comments.nodes) {
if (String(topLevelComment.databaseId) === String(discussionCommentDbId)) {
return { discussionId, comment: topLevelComment };
}
let reply = findDiscussionCommentNode(topLevelComment.replies.nodes, discussionCommentDbId);
let replyCursor = topLevelComment.replies.pageInfo.endCursor;
let hasMoreReplies = topLevelComment.replies.pageInfo.hasNextPage;
while (!reply && hasMoreReplies) {
const replyPage = fetchDiscussionReplyPage(topLevelComment.id, replyCursor);
failOnGraphQLFailure(
replyPage,
`Failed to fetch replies for discussion comment ${topLevelComment.id}`,
);
const replies = replyPage?.data?.node?.replies;
if (!replies)
fail(`Failed to paginate replies for discussion comment ${topLevelComment.id}`);
reply = findDiscussionCommentNode(replies.nodes, discussionCommentDbId);
hasMoreReplies = replies.pageInfo.hasNextPage;
replyCursor = replies.pageInfo.endCursor;
}
if (reply) return { discussionId, comment: reply };
}
hasNextPage = discussion.comments.pageInfo.hasNextPage;
cursor = discussion.comments.pageInfo.endCursor;
}
return { discussionId, comment: null };
}
function createDiscussionComment(discussionNodeId, body, replyToNodeId) {
const replyToClause = replyToNodeId ? `, replyToId: "${escapeGraphQLString(replyToNodeId)}"` : "";
const result = ghGraphQL(
`mutation { addDiscussionComment(input: { discussionId: "${escapeGraphQLString(discussionNodeId)}"${replyToClause}, body: "${escapeGraphQLString(body)}" }) { comment { id url } } }`,
);
if (result?.errors) {
fail(`Failed to create discussion comment: ${JSON.stringify(result.errors)}`);
}
return result?.data?.addDiscussionComment?.comment;
}
// ─── Commands ───────────────────────────────────────────────────────────────
/**
* fetch-alert <number>
* Fetch alert metadata + locations. Never exposes .secret.
*/
function cmdFetchAlert(alertNumber) {
if (!alertNumber) fail("Usage: fetch-alert <number>");
const alert = gh(["api", `repos/${REPO}/secret-scanning/alerts/${alertNumber}?hide_secret=true`]);
const locations = gh([
"api",
`repos/${REPO}/secret-scanning/alerts/${alertNumber}/locations`,
"--paginate",
"--slurp",
]);
// --paginate + --slurp 确保多页结果合并为一个 JSON 数组
const flatLocations = Array.isArray(locations?.[0])
? locations.flat()
: Array.isArray(locations)
? locations
: [];
const result = {
number: alert.number,
state: alert.state,
secret_type: alert.secret_type,
secret_type_display_name: alert.secret_type_display_name,
validity: alert.validity,
html_url: alert.html_url,
locations: flatLocations.map((loc) => ({
type: loc.type,
details: loc.details,
})),
};
console.log(JSON.stringify(result, null, 2));
}
/**
* fetch-content <location-json>
* Fetch the content and metadata for a specific location.
* Saves full body to a temp file. Prints metadata + file path to stdout.
*/
function cmdFetchContent(locationJson) {
if (!locationJson) fail("Usage: fetch-content '<location-json>'");
const location = JSON.parse(locationJson);
const type = location.type;
const details = location.details;
if (type === "discussion_comment") {
const commentUrl = details.discussion_comment_url;
if (!commentUrl) fail("No discussion_comment_url in location details");
const urlMatch = commentUrl.match(/discussions\/(\d+)#discussioncomment-(\d+)/);
if (!urlMatch) fail(`Cannot parse discussion comment URL: ${commentUrl}`);
const discussionNumber = urlMatch[1];
const discussionCommentDbId = urlMatch[2];
const { discussionId, comment } = fetchDiscussionComment(
discussionNumber,
discussionCommentDbId,
);
if (!comment)
fail(
`Discussion comment #${discussionCommentDbId} not found in discussion #${discussionNumber}`,
);
const bodyFile = tmpFile("body.md");
fs.writeFileSync(bodyFile, comment.body || "");
console.log(
JSON.stringify(
{
type,
comment_node_id: comment.id,
discussion_node_id: discussionId,
reply_to_node_id: comment.replyTo?.id ?? null,
discussion_number: Number(discussionNumber),
discussion_comment_db_id: Number(discussionCommentDbId),
author: comment.author?.login,
html_url: comment.url || commentUrl,
edit_history_count: comment.userContentEdits?.totalCount ?? 0,
body_file: bodyFile,
},
null,
2,
),
);
} else if (
type === "issue_comment" ||
type === "pull_request_comment" ||
type === "pull_request_review_comment"
) {
// Extract comment ID from URL
const commentUrl =
details.issue_comment_url ||
details.pull_request_comment_url ||
details.pull_request_review_comment_url;
if (!commentUrl) fail(`No comment URL in location details`);
const comment = gh(["api", commentUrl]);
const bodyFile = tmpFile("body.md");
fs.writeFileSync(bodyFile, comment.body || "");
// Fetch edit history
const nodeId = comment.node_id;
const typeName =
type === "pull_request_review_comment" ? "PullRequestReviewComment" : "IssueComment";
const gql = ghGraphQL(`{
node(id: "${nodeId}") {
... on ${typeName} {
userContentEdits(first: 50) {
totalCount
}
}
}
}`);
const editCount = gql?.data?.node?.userContentEdits?.totalCount ?? 0;
// Extract issue number from html_url
const htmlUrl = comment.html_url || details.html_url || "";
const issueMatch = htmlUrl.match(/\/(issues|pull)\/(\d+)/);
const issueNumber = issueMatch ? issueMatch[2] : null;
console.log(
JSON.stringify(
{
type,
comment_id: comment.id,
node_id: nodeId,
author: comment.user?.login,
issue_number: issueNumber,
html_url: htmlUrl,
edit_history_count: editCount,
body_file: bodyFile,
},
null,
2,
),
);
} else if (type === "issue_body") {
const issueUrl = details.issue_body_url || details.issue_url;
if (!issueUrl) fail("No issue URL in location details");
const issue = gh(["api", issueUrl]);
const bodyFile = tmpFile("body.md");
fs.writeFileSync(bodyFile, issue.body || "");
const nodeId = issue.node_id;
const number = issue.number;
const gql = ghGraphQL(`{
node(id: "${nodeId}") {
... on Issue {
userContentEdits(first: 50) {
totalCount
}
}
}
}`);
const editCount = gql?.data?.node?.userContentEdits?.totalCount ?? 0;
console.log(
JSON.stringify(
{
type,
issue_number: number,
node_id: nodeId,
author: issue.user?.login,
html_url: issue.html_url,
edit_history_count: editCount,
body_file: bodyFile,
},
null,
2,
),
);
} else if (type === "pull_request_body") {
const prUrl = details.pull_request_body_url || details.pull_request_url;
if (!prUrl) fail("No PR URL in location details");
const pr = gh(["api", prUrl]);
const bodyFile = tmpFile("body.md");
fs.writeFileSync(bodyFile, pr.body || "");
const nodeId = pr.node_id;
const number = pr.number;
const gql = ghGraphQL(`{
node(id: "${nodeId}") {
... on PullRequest {
userContentEdits(first: 50) {
totalCount
}
}
}
}`);
const editCount = gql?.data?.node?.userContentEdits?.totalCount ?? 0;
console.log(
JSON.stringify(
{
type,
pr_number: number,
node_id: nodeId,
author: pr.user?.login,
merged: pr.merged,
state: pr.state,
html_url: pr.html_url,
edit_history_count: editCount,
body_file: bodyFile,
},
null,
2,
),
);
} else if (type === "commit") {
console.log(
JSON.stringify(
{
type,
commit_sha: details.commit_sha,
path: details.path,
start_line: details.start_line,
end_line: details.end_line,
html_url: details.html_url || details.commit_url || details.blob_url || null,
// No body file for commits
body_file: null,
},
null,
2,
),
);
} else {
console.log(
JSON.stringify(
{
type,
unsupported: true,
details,
},
null,
2,
),
);
}
}
/**
* redact-body <issue|pr> <number> <redacted-body-file>
* PATCH the issue or PR body with redacted content from a file.
*/
function cmdRedactBody(kind, number, bodyFile) {
if (!kind || !number || !bodyFile) {
fail("Usage: redact-body <issue|pr> <number> <redacted-body-file>");
}
if (!fs.existsSync(bodyFile)) fail(`File not found: ${bodyFile}`);
const endpoint =
kind === "pr" ? `repos/${REPO}/pulls/${number}` : `repos/${REPO}/issues/${number}`;
gh(["api", endpoint, "-X", "PATCH", "-F", `body=@${bodyFile}`]);
console.log(JSON.stringify({ ok: true, kind, number: Number(number) }));
}
/**
* redact-body-if-needed <issue|pr> <number> <current-body-file> <redacted-body-file> <result-file>
* PATCH only when the agent-produced redacted body differs from the current body.
*/
function cmdRedactBodyIfNeeded(kind, number, currentBodyFile, redactedBodyFile, resultFile) {
if (!kind || !number || !currentBodyFile || !redactedBodyFile || !resultFile) {
fail(
"Usage: redact-body-if-needed <issue|pr> <number> <current-body-file> <redacted-body-file> <result-file>",
);
}
if (!fs.existsSync(currentBodyFile)) fail(`File not found: ${currentBodyFile}`);
if (!fs.existsSync(redactedBodyFile)) fail(`File not found: ${redactedBodyFile}`);
const currentBody = fs.readFileSync(currentBodyFile, "utf8");
const redactedBody = fs.readFileSync(redactedBodyFile, "utf8");
const decision = decideBodyRedaction(currentBody, redactedBody);
const result = {
ok: true,
kind,
number: Number(number),
...decision,
};
if (decision.body_changed) {
const endpoint =
kind === "pr" ? `repos/${REPO}/pulls/${number}` : `repos/${REPO}/issues/${number}`;
gh(["api", endpoint, "-X", "PATCH", "-F", `body=@${redactedBodyFile}`]);
result.redacted = true;
} else {
result.redacted = false;
result.reason = "current_body_already_redacted";
}
fs.writeFileSync(resultFile, `${JSON.stringify(result, null, 2)}\n`, { mode: 0o600 });
console.log(JSON.stringify(result));
}
/**
* delete-comment <comment-id>
* Delete a comment (and all its edit history).
*/
function cmdDeleteComment(commentId) {
if (!commentId) fail("Usage: delete-comment <comment-id>");
gh(["api", `repos/${REPO}/issues/comments/${commentId}`, "-X", "DELETE"], { json: false });
console.log(JSON.stringify({ ok: true, deleted_comment_id: Number(commentId) }));
}
/**
* delete-discussion-comment <node-id>
* Delete a discussion comment via GraphQL (and all its edit history).
*/
function cmdDeleteDiscussionComment(nodeId) {
if (!nodeId) fail("Usage: delete-discussion-comment <node-id>");
const result = ghGraphQL(
`mutation { deleteDiscussionComment(input: { id: "${nodeId}" }) { comment { id } } }`,
);
if (result?.errors) {
fail(`Failed to delete discussion comment: ${JSON.stringify(result.errors)}`);
}
console.log(JSON.stringify({ ok: true, deleted_node_id: nodeId }));
}
/**
* recreate-discussion-comment <discussion-node-id> <body-file> [reply-to-node-id]
* Create a new discussion comment via GraphQL.
*/
function cmdRecreateDiscussionComment(discussionNodeId, bodyFile, replyToNodeId) {
if (!discussionNodeId || !bodyFile)
fail("Usage: recreate-discussion-comment <discussion-node-id> <body-file> [reply-to-node-id]");
if (!fs.existsSync(bodyFile)) fail(`File not found: ${bodyFile}`);
const body = fs.readFileSync(bodyFile, "utf8");
const newComment = createDiscussionComment(discussionNodeId, body, replyToNodeId);
console.log(
JSON.stringify({
ok: true,
node_id: newComment?.id,
html_url: newComment?.url,
}),
);
}
/**
* recreate-comment <issue-number> <body-file>
* Create a new comment from a file.
*/
function cmdRecreateComment(issueNumber, bodyFile) {
if (!issueNumber || !bodyFile) fail("Usage: recreate-comment <issue-number> <body-file>");
if (!fs.existsSync(bodyFile)) fail(`File not found: ${bodyFile}`);
const result = gh([
"api",
`repos/${REPO}/issues/${issueNumber}/comments`,
"-X",
"POST",
"-F",
`body=@${bodyFile}`,
]);
console.log(
JSON.stringify({
ok: true,
comment_id: result.id,
html_url: result.html_url,
}),
);
}
/**
* notify <target> <author> <location-type> <secret-types> [reply-to-node-id]
* Post a notification comment with the correct template for the location type.
* target = issue/PR number for non-discussion types, discussion node ID for discussion_comment.
*/
function cmdNotify(target, author, locationType, secretTypes, replyToNodeId) {
if (!target || !author || !locationType || !secretTypes) {
fail(
"Usage: notify <target> <author> <location-type> <secret-types-comma-sep> [reply-to-node-id]",
);
}
const types = secretTypes.split(",").map((s) => s.trim());
const typeList = types.map((t, i) => `${i + 1}. **${t}**`).join("\n");
const redactionResult = loadBodyRedactionResult(locationType, replyToNodeId);
if (isBodyLocationType(locationType) && !redactionResult.notify_required) {
console.log(
JSON.stringify({
ok: true,
skipped: true,
reason: "current_body_already_redacted",
}),
);
return;
}
let locationDesc;
let actionDesc;
if (
locationType === "issue_comment" ||
locationType === "pull_request_comment" ||
locationType === "pull_request_review_comment" ||
locationType === "discussion_comment"
) {
locationDesc = "your comment";
actionDesc = "The affected comment has been removed and replaced with a redacted version.";
} else if (locationType === "issue_body") {
locationDesc = "your issue description";
actionDesc = "The affected content has been redacted in place.";
} else if (locationType === "pull_request_body") {
locationDesc = "your pull request description";
actionDesc = "The affected content has been redacted in place.";
} else if (locationType === "commit") {
locationDesc = "code you committed";
actionDesc = "";
} else {
locationDesc = "your content";
actionDesc = "";
}
const body = [
`> **Note:** This is an automated message sent by the OpenClaw maintainer team. **NO_REPLY.**`,
"",
`@${author} :warning: **Security Notice: Secret Leakage Detected**`,
"",
`GitHub Secret Scanning detected the following exposed secret types in ${locationDesc}:`,
"",
typeList,
"",
actionDesc,
"",
"**Please rotate these credentials immediately.**",
"",
"These secrets were publicly exposed and should be considered compromised.",
]
.filter((line) => line !== undefined)
.join("\n");
// Discussion comments must be notified via GraphQL
if (locationType === "discussion_comment") {
const newComment = createDiscussionComment(target, body, replyToNodeId);
console.log(
JSON.stringify({
ok: true,
node_id: newComment?.id,
html_url: newComment?.url,
}),
);
return;
}
// Issue/PR comments via REST
const bodyFile = tmpFile("notify.md");
fs.writeFileSync(bodyFile, body);
const result = gh([
"api",
`repos/${REPO}/issues/${target}/comments`,
"-X",
"POST",
"-F",
`body=@${bodyFile}`,
]);
console.log(
JSON.stringify({
ok: true,
comment_id: result.id,
html_url: result.html_url,
}),
);
}
/**
* resolve <alert-number> [resolution] [comment]
* Close a secret scanning alert.
*/
function cmdResolve(alertNumber, resolution, comment) {
if (!alertNumber) fail("Usage: resolve <alert-number> [resolution] [comment]");
const res = resolution || "revoked";
const resComment = comment || "Content redacted and author notified to rotate credentials.";
const result = gh([
"api",
`repos/${REPO}/secret-scanning/alerts/${alertNumber}`,
"-X",
"PATCH",
"-f",
`state=resolved`,
"-f",
`resolution=${res}`,
"-f",
`resolution_comment=${resComment}`,
]);
console.log(
JSON.stringify({
ok: true,
number: result.number,
state: result.state,
resolution: result.resolution,
resolved_at: result.resolved_at,
}),
);
}
/**
* list-open
* List all open secret scanning alerts.
*/
function cmdListOpen() {
const alerts = gh([
"api",
`repos/${REPO}/secret-scanning/alerts?hide_secret=true&state=open`,
"--paginate",
"--slurp",
]);
// --slurp 将分页结果合并为 [[page1], [page2], ...] 需要 flat
const flat = Array.isArray(alerts?.[0]) ? alerts.flat() : Array.isArray(alerts) ? alerts : [];
const rows = flat.map((a) => ({
number: a.number,
secret_type_display_name: a.secret_type_display_name,
html_url: a.html_url,
first_location_html_url: a.first_location_detected?.html_url || null,
}));
console.log(JSON.stringify(rows, null, 2));
}
/**
* summary <json-file>
* Print a formatted summary table from a JSON results file.
*/
function cmdSummary(jsonFile) {
if (!jsonFile) fail("Usage: summary <json-file>");
if (!fs.existsSync(jsonFile)) fail(`File not found: ${jsonFile}`);
const results = JSON.parse(fs.readFileSync(jsonFile, "utf8"));
const lines = [];
lines.push("---BEGIN SUMMARY---");
lines.push("");
lines.push("## Secret Scanning Results");
lines.push("");
lines.push("| Alert | Type | Location | Actions | Edit History |");
lines.push("|-------|------|----------|---------|--------------|");
const needsPurge = [];
for (const r of results) {
const alertLink = `#${r.number} ${REPO_URL}/security/secret-scanning/${r.number}`;
const locationLink = r.location_url
? `${r.location_label} ${r.location_url}`
: r.location_label;
const history = r.history_cleared ? "Cleared" : "⚠️ History remains";
lines.push(`| ${alertLink} | ${r.secret_type} | ${locationLink} | ${r.actions} | ${history} |`);
if (!r.history_cleared && r.location_url) {
needsPurge.push(r);
}
}
if (needsPurge.length > 0) {
lines.push("");
lines.push("Issues requiring GitHub Support to purge edit history:");
for (const r of needsPurge) {
lines.push(`- ${r.location_label} ${r.location_url}${r.secret_type}`);
}
lines.push(
`Contact: https://support.github.com/contact — request purge of userContentEdits for the above issues.`,
);
}
const skipped = results.filter((r) => r.skipped);
if (skipped.length > 0) {
lines.push("");
lines.push(
"⚠️ The following alerts were skipped because their location type is not supported:",
);
for (const r of skipped) {
lines.push(
`- Alert #${r.number}: unsupported type "${r.unsupported_type}" — ${REPO_URL}/security/secret-scanning/${r.number}`,
);
}
lines.push("Please update the skill to define handling for these types.");
}
lines.push("");
lines.push("---END SUMMARY---");
console.log(lines.join("\n"));
}
// ─── Dispatch ───────────────────────────────────────────────────────────────
const args = [];
export const commands = {
"fetch-alert": () => cmdFetchAlert(args[0]),
"fetch-content": () => cmdFetchContent(args[0]),
"redact-body": () => cmdRedactBody(args[0], args[1], args[2]),
"redact-body-if-needed": () => cmdRedactBodyIfNeeded(args[0], args[1], args[2], args[3], args[4]),
"delete-comment": () => cmdDeleteComment(args[0]),
"delete-discussion-comment": () => cmdDeleteDiscussionComment(args[0]),
"recreate-comment": () => cmdRecreateComment(args[0], args[1]),
"recreate-discussion-comment": () => cmdRecreateDiscussionComment(args[0], args[1], args[2]),
notify: () => cmdNotify(args[0], args[1], args[2], args[3], args[4]),
resolve: () => cmdResolve(args[0], args[1], args[2]),
"list-open": () => cmdListOpen(),
summary: () => cmdSummary(args[0]),
};
function main(argv = process.argv.slice(2)) {
const [command, ...commandArgs] = argv;
args.length = 0;
args.push(...commandArgs);
if (!command || !commands[command]) {
console.error(
[
"Usage: node secret-scanning.mjs <command> [args]",
"",
"Commands:",
" fetch-alert <number> Fetch alert metadata + locations",
" fetch-content '<location-json>' Fetch content for a location",
" redact-body <issue|pr> <n> <file> PATCH body with redacted file",
" redact-body-if-needed <issue|pr> <n> <current-file> <redacted-file> <result-file> PATCH body only if redaction changed it",
" delete-comment <comment-id> Delete a comment",
" delete-discussion-comment <node-id> Delete a discussion comment (GraphQL)",
" recreate-comment <issue-n> <file> Create replacement comment",
" recreate-discussion-comment <disc-node-id> <file> [reply-to-node-id] Create discussion comment (GraphQL)",
" notify <target> <author> <type> <types> [reply-to-node-id|body-result-file] Post notification",
" resolve <n> [resolution] [comment] Close alert",
" list-open List open alerts",
" summary <json-file> Print formatted summary",
].join("\n"),
);
process.exit(1);
}
commands[command]();
}
if (process.argv[1] && import.meta.url === pathToFileURL(process.argv[1]).href) {
main();
}

View File

@@ -1,77 +0,0 @@
---
name: openclaw-small-bugfix-sweep
description: Fix only small, high-certainty OpenClaw bugs from a pasted issue/PR list after deep code review.
---
# OpenClaw Small Bugfix Sweep
Batch workflow for pasted OpenClaw issue/PR refs.
Execute, do not summarize.
Triage reviews, proves, and patches local fixes first; publishing waits for Peter's manual review.
## Peter Review Gate
Peter always wants to review code before commits.
Default flow:
1. Review each issue deeply enough to prove current behavior and root cause.
2. Fix only easy, high-confidence bugs with narrow ownership and focused proof.
3. Stop with the dirty diff summary, touched files, and test/gate output for Peter's manual review.
4. After Peter approves shipping, make one commit per accepted fix, with a changelog entry for each user-facing fix.
5. Pull/rebase, push, then comment and close only the fixed or explicitly triaged-closed issues.
Do not batch unrelated issue fixes into one commit. Do not push, create PRs, comment, close, label, land, merge, or otherwise publish during the review/prove phase.
## Companion Skills
Use `$gitcrawl` first, `$openclaw-pr-maintainer` for live GitHub hygiene, `$github-deep-review` posture for source tracing, and `$openclaw-testing` for proof.
## Loop
For each ref:
1. Read live target with `gh`.
2. Check `gitcrawl` for related, duplicate, closed, or already-fixed threads.
3. Read body, comments, linked refs, changed files, current code, adjacent tests, and dependency contracts when relevant.
4. Trace the real runtime path.
5. For issues: fix locally only if this is a bug, current code proves root cause, the implicated path is clear, and a narrow patch is cleaner than refactor.
6. For PRs: decide `ready-to-merge`, `needs-fixup`, or `skip`; do not alter PR branches unless explicitly asked.
7. Add focused regression proof when practical for local issue fixes or PR readiness checks.
8. Run the smallest meaningful gate.
9. Continue until every pasted ref is fixed or classified.
No subagents unless explicitly requested.
## Skip If
- not a bug
- config/docs/workflow/release/support/dependency/product work
- repro or root cause is uncertain
- larger refactor or owner-boundary change is cleaner
- already fixed on current `main`
- dependency behavior is guessed
- no focused proof is feasible
Skip with terse reason. Do not pad with low-confidence fixes.
## Fix Rules
- owner module first; generic seam only when required
- existing patterns/helpers/types
- no drive-by refactors
- tests near failing surface
- docs only for changed public behavior
- no commit during the review/prove phase
- after Peter approves shipping, one commit plus changelog per accepted user-facing fix
- no push/create PR/comment/close/label/land/merge until Peter approves shipping after review
## PR Rules
- `ready-to-merge`: code is good, current head checked, required proof is green or clearly pending only external CI; list for maintainer merge or `@clawsweeper automerge`
- `needs-fixup`: small bug is clear, but PR branch needs changes; list exact files/tests and wait for explicit fix/push/automerge instruction
- `skip`: broad, stale, speculative, config/product/security/release, owner-boundary, or refactor-sized
- if source PR is untrusted/uneditable, do not create a replacement PR during sweep
## Output Shape
Ledger: `fixed-local`, `ready-to-merge`, `needs-fixup`, `skipped`, `needs-human`.
Final: issue files left on disk, PRs ready for merge/automerge, tests/gates, skip reasons.

View File

@@ -1,109 +0,0 @@
---
name: openclaw-test-heap-leaks
description: Investigate OpenClaw pnpm test memory growth, Vitest OOMs, RSS spikes, and heap snapshot deltas.
---
# OpenClaw Test Heap Leaks
Use this skill for test-memory investigations. Do not guess from RSS alone when heap snapshots are available. Treat snapshot-name deltas as triage evidence, not proof, until retainers or dominators support the call.
For **runtime fixes** (e.g., closure leaks in long-running services like the gateway), see [Validating runtime fixes](#validating-runtime-fixes-not-test-memory) below — that uses a dedicated harness, not the test-parallel snapshot machinery.
## Workflow
1. Reproduce the failing shape first.
- Match the real entrypoint if possible. For Linux CI-style unit failures, start with:
- `pnpm canvas:a2ui:bundle && OPENCLAW_TEST_MEMORY_TRACE=1 OPENCLAW_TEST_HEAPSNAPSHOT_INTERVAL_MS=60000 OPENCLAW_TEST_HEAPSNAPSHOT_DIR=.tmp/heapsnap OPENCLAW_TEST_WORKERS=2 OPENCLAW_TEST_MAX_OLD_SPACE_SIZE_MB=6144 pnpm test`
- Keep `OPENCLAW_TEST_MEMORY_TRACE=1` enabled so the wrapper prints per-file RSS summaries alongside the snapshots.
- If the report is about a specific shard or worker budget, preserve that shape.
- Before you analyze snapshots, identify the real lane names from `[test-parallel] start ...` lines or `pnpm test --plan`. Do not assume a single `unit-fast` lane; local plans often split into `unit-fast-batch-*`.
2. Wait for repeated snapshots before concluding anything.
- Take at least two intervals from the same lane.
- Compare snapshots from the same PID inside the real lane directory such as `.tmp/heapsnap/unit-fast-batch-2/`.
- Use `.agents/skills/openclaw-test-heap-leaks/scripts/heapsnapshot-delta.mjs` to compare either two files directly or the earliest/latest pair per PID in one lane directory.
- If the helper suggests transformed-module retention, confirm the top entries in DevTools retainers/dominators before calling it solved.
3. Classify the growth before choosing a fix.
- If growth is dominated by Vite/Vitest transformed source strings, `Module`, `system / Context`, bytecode, descriptor arrays, or property maps, treat it as likely retained module graph growth in long-lived workers.
- If growth is dominated by app objects, caches, buffers, server handles, timers, mock state, sqlite state, or similar runtime objects, treat it as a likely cleanup or lifecycle leak.
- If the names are ambiguous, stop short of a confident label and inspect retainers/dominators in DevTools for the top deltas.
4. Fix the right layer.
- For likely retained transformed-module growth in shared workers:
- Prefer timing and hotspot-driven scheduling fixes first. Check whether the file is already represented in `test/fixtures/test-timings.unit.json` and whether `scripts/test-update-memory-hotspots.mjs` should refresh the measured hotspot manifest before hand-editing behavior overrides.
- Move hotspot files out of the real shared lane by updating `test/fixtures/test-parallel.behavior.json` only when timing-driven peeling is insufficient.
- Prefer `singletonIsolated` for files that are safe alone but inflate shared worker heaps.
- If the file should already have been peeled out by timings but is absent from `test/fixtures/test-timings.unit.json`, call that out explicitly. Missing timings are a scheduling blind spot.
- For real leaks:
- Patch the implicated test or runtime cleanup path.
- Look for missing `afterEach`/`afterAll`, module-reset gaps, retained global state, unreleased DB handles, or listeners/timers that survive the file.
5. Verify with the most direct proof.
- Re-run the targeted lane or file with heap snapshots enabled if the suite still finishes in reasonable time.
- If snapshot overhead pushes tests over Vitest timeouts, fall back to the same lane without snapshots and confirm the RSS trend or OOM is reduced.
- For wrapper-only changes, at minimum verify the expected lanes start and the snapshot files are written.
## Heuristics
- Do not call everything a leak. In this repo, large `unit-fast` or `unit-fast-batch-*` growth can be a worker-lifetime problem rather than an application object leak.
- `scripts/test-parallel.mjs` and `scripts/test-parallel-memory.mjs` are the primary control points for wrapper diagnostics.
- The lane names printed by `[test-parallel] start ...` and `[test-parallel][mem] summary ...` tell you where to focus.
- When one or two files account for most of the delta and they are missing from timings, reducing impact by isolating them is usually the first pragmatic fix.
- When the same retained object families grow across multiple intervals in the same worker PID, trust the snapshots over intuition, then confirm ambiguous calls with retainer evidence.
## Snapshot Comparison
- Direct comparison:
- `node .agents/skills/openclaw-test-heap-leaks/scripts/heapsnapshot-delta.mjs before.heapsnapshot after.heapsnapshot`
- Auto-select earliest/latest snapshots per PID within one lane:
- `node .agents/skills/openclaw-test-heap-leaks/scripts/heapsnapshot-delta.mjs --lane-dir .tmp/heapsnap/unit-fast-batch-2`
- Useful flags:
- `--top 40`
- `--min-kb 32`
- `--pid 16133`
Read the top positive deltas first. Large positive growth in module-transform artifacts suggests lane isolation; large positive growth in runtime objects suggests a real leak. If the names alone do not settle it, open the same snapshot pair in DevTools and inspect retainers/dominators for the top rows before declaring root cause.
## Validating runtime fixes (not test-memory)
The workflow above is for diagnosing Vitest worker memory growth. For
validating that a runtime/closure fix actually releases captured state, use the
dedicated harness:
- `pnpm leak:embedded-run` — runs `scripts/embedded-run-abort-leak.ts`. Loops N
aborted runs in a function-shaped scope mimicking `runEmbeddedAttempt`,
writes heap snapshots, and reports a PASS/FAIL verdict on retention growth
using `FinalizationRegistry` for tracked-instance counting plus RSS delta.
Modes:
- `closure-extracted` (default) — production fix shape (helper at module scope).
- `closure-inline` — pre-fix shape (closure inside the runner scope). Use as a
sensitivity check: if it passes you've broken the harness, not fixed a bug.
- `synthetic-leak` — deliberately retains via a module-level bucket. Use to
confirm the harness can detect leaks before trusting a PASS on a real fix.
Snapshots land in `.tmp/embedded-run-abort-leak/`. Diff with the same script
as above:
```
node .agents/skills/openclaw-test-heap-leaks/scripts/heapsnapshot-delta.mjs \
.tmp/embedded-run-abort-leak/baseline-*.heapsnapshot \
.tmp/embedded-run-abort-leak/batch-N-*.heapsnapshot --top 30
```
When fixing a different runtime leak, add a new harness alongside this one
rather than retrofitting it. The fixture function should mimic the lexical
scope of the function where the leak lives, not be a generic abort-loop.
## Output Expectations
When using this skill, report:
- The exact reproduce command.
- Which lane and PID were compared.
- The dominant retained object families from the snapshot delta.
- Whether the issue is a likely real leak or likely shared-worker retained module growth, plus whether retainers/dominators confirmed it.
- The concrete fix or impact-reduction patch.
- What you verified, and what snapshot overhead prevented you from verifying.

View File

@@ -1,4 +0,0 @@
interface:
display_name: "Test Heap Leaks"
short_description: "Investigate test OOMs with heap snapshots"
default_prompt: "Use $openclaw-test-heap-leaks to investigate test memory growth with heap snapshots and reduce its impact."

View File

@@ -1,553 +0,0 @@
#!/usr/bin/env node
import fs from "node:fs";
import path from "node:path";
function printUsage() {
console.error(
"Usage: node heapsnapshot-delta.mjs <before.heapsnapshot> <after.heapsnapshot> [--top N] [--min-kb N]",
);
console.error(
" or: node heapsnapshot-delta.mjs --lane-dir <dir> [--pid PID] [--top N] [--min-kb N]",
);
}
function fail(message) {
console.error(message);
process.exit(1);
}
function parseArgs(argv) {
const options = {
top: 30,
minKb: 64,
laneDir: null,
pid: null,
files: [],
};
for (let index = 0; index < argv.length; index += 1) {
const arg = argv[index];
if (arg === "--top") {
options.top = Number.parseInt(argv[index + 1] ?? "", 10);
index += 1;
continue;
}
if (arg === "--min-kb") {
options.minKb = Number.parseInt(argv[index + 1] ?? "", 10);
index += 1;
continue;
}
if (arg === "--lane-dir") {
options.laneDir = argv[index + 1] ?? null;
index += 1;
continue;
}
if (arg === "--pid") {
options.pid = Number.parseInt(argv[index + 1] ?? "", 10);
index += 1;
continue;
}
options.files.push(arg);
}
if (!Number.isFinite(options.top) || options.top <= 0) {
fail("--top must be a positive integer");
}
if (!Number.isFinite(options.minKb) || options.minKb < 0) {
fail("--min-kb must be a non-negative integer");
}
if (options.pid !== null && (!Number.isInteger(options.pid) || options.pid <= 0)) {
fail("--pid must be a positive integer");
}
return options;
}
class JsonStreamScanner {
constructor(filePath) {
this.stream = fs.createReadStream(filePath, {
encoding: "utf8",
highWaterMark: 1024 * 1024,
});
this.iterator = this.stream[Symbol.asyncIterator]();
this.buffer = "";
this.offset = 0;
this.done = false;
}
compactBuffer() {
if (this.offset > 65536) {
this.buffer = this.buffer.slice(this.offset);
this.offset = 0;
}
}
async ensureAvailable(count = 1) {
while (!this.done && this.buffer.length - this.offset < count) {
const next = await this.iterator.next();
if (next.done) {
this.done = true;
break;
}
this.buffer += next.value;
}
}
async peek() {
await this.ensureAvailable(1);
return this.buffer[this.offset] ?? null;
}
async next() {
await this.ensureAvailable(1);
if (this.offset >= this.buffer.length) {
return null;
}
const char = this.buffer[this.offset];
this.offset += 1;
this.compactBuffer();
return char;
}
async skipWhitespace() {
while (true) {
const char = await this.peek();
if (char === null || !/\s/u.test(char)) {
return;
}
await this.next();
}
}
async expectChar(expected) {
const char = await this.next();
if (char !== expected) {
fail(`Expected ${expected} but found ${char ?? "<eof>"}`);
}
}
async find(sequence) {
let matched = 0;
while (true) {
const char = await this.next();
if (char === null) {
fail(`Could not find ${sequence}`);
}
if (char === sequence[matched]) {
matched += 1;
if (matched === sequence.length) {
return;
}
continue;
}
matched = char === sequence[0] ? 1 : 0;
if (matched === sequence.length) {
return;
}
}
}
async readBalancedObject() {
const start = await this.next();
if (start !== "{") {
fail(`Expected { but found ${start ?? "<eof>"}`);
}
let text = "{";
let depth = 1;
let inString = false;
let escaped = false;
while (depth > 0) {
const char = await this.next();
if (char === null) {
fail("Unexpected EOF while reading JSON object");
}
text += char;
if (inString) {
if (escaped) {
escaped = false;
} else if (char === "\\") {
escaped = true;
} else if (char === '"') {
inString = false;
}
continue;
}
if (char === '"') {
inString = true;
} else if (char === "{") {
depth += 1;
} else if (char === "}") {
depth -= 1;
}
}
return text;
}
async parseNumberArray(onValue) {
await this.skipWhitespace();
await this.expectChar("[");
await this.skipWhitespace();
if ((await this.peek()) === "]") {
await this.next();
return;
}
let token = "";
let index = 0;
const flush = () => {
if (token.length === 0) {
fail("Unexpected empty number token");
}
const value = Number.parseInt(token, 10);
if (!Number.isFinite(value)) {
fail(`Invalid numeric token: ${token}`);
}
onValue(value, index);
index += 1;
token = "";
};
while (true) {
const char = await this.next();
if (char === null) {
fail("Unexpected EOF while reading number array");
}
if (char === "]") {
flush();
return;
}
if (char === ",") {
flush();
continue;
}
if (/\s/u.test(char)) {
continue;
}
token += char;
}
}
async readJsonString() {
await this.expectChar('"');
let value = "";
while (true) {
const char = await this.next();
if (char === null) {
fail("Unexpected EOF while reading JSON string");
}
if (char === '"') {
return value;
}
if (char !== "\\") {
value += char;
continue;
}
const escaped = await this.next();
if (escaped === null) {
fail("Unexpected EOF while reading JSON string escape");
}
if (escaped === "u") {
let hex = "";
for (let index = 0; index < 4; index += 1) {
const hexChar = await this.next();
if (hexChar === null) {
fail("Unexpected EOF while reading JSON unicode escape");
}
hex += hexChar;
}
value += String.fromCharCode(Number.parseInt(hex, 16));
continue;
}
value +=
escaped === "b"
? "\b"
: escaped === "f"
? "\f"
: escaped === "n"
? "\n"
: escaped === "r"
? "\r"
: escaped === "t"
? "\t"
: escaped;
}
}
async parseStringArray(onValue) {
await this.skipWhitespace();
await this.expectChar("[");
await this.skipWhitespace();
if ((await this.peek()) === "]") {
await this.next();
return;
}
let index = 0;
while (true) {
const value = await this.readJsonString();
onValue(value, index);
index += 1;
await this.skipWhitespace();
const separator = await this.next();
if (separator === "]") {
return;
}
if (separator !== ",") {
fail(`Expected , or ] but found ${separator ?? "<eof>"}`);
}
await this.skipWhitespace();
}
}
}
function parseHeapFilename(filePath) {
const base = path.basename(filePath);
const match = base.match(
/^Heap\.(?<stamp>\d{8}\.\d{6})\.(?<pid>\d+)\.0\.(?<seq>\d+)\.heapsnapshot$/u,
);
if (!match?.groups) {
return null;
}
return {
filePath,
pid: Number.parseInt(match.groups.pid, 10),
stamp: match.groups.stamp,
sequence: Number.parseInt(match.groups.seq, 10),
};
}
function resolvePair(options) {
if (options.laneDir) {
const entries = fs
.readdirSync(options.laneDir)
.map((name) => parseHeapFilename(path.join(options.laneDir, name)))
.filter((entry) => entry !== null)
.filter((entry) => options.pid === null || entry.pid === options.pid)
.toSorted((left, right) => {
if (left.pid !== right.pid) {
return left.pid - right.pid;
}
if (left.stamp !== right.stamp) {
return left.stamp.localeCompare(right.stamp);
}
return left.sequence - right.sequence;
});
if (entries.length === 0) {
fail(`No matching heap snapshots found in ${options.laneDir}`);
}
const groups = new Map();
for (const entry of entries) {
const group = groups.get(entry.pid) ?? [];
group.push(entry);
groups.set(entry.pid, group);
}
const candidates = Array.from(groups.values())
.map((group) => ({
pid: group[0].pid,
before: group[0],
after: group.at(-1),
count: group.length,
}))
.filter((entry) => entry.count >= 2);
if (candidates.length === 0) {
fail(`Need at least two snapshots for one PID in ${options.laneDir}`);
}
const chosen =
options.pid !== null
? (candidates.find((entry) => entry.pid === options.pid) ?? null)
: candidates.toSorted((left, right) => right.count - left.count || left.pid - right.pid)[0];
if (!chosen) {
fail(`No PID with at least two snapshots matched in ${options.laneDir}`);
}
return {
before: chosen.before.filePath,
after: chosen.after.filePath,
pid: chosen.pid,
snapshotCount: chosen.count,
};
}
if (options.files.length !== 2) {
printUsage();
process.exit(1);
}
return {
before: options.files[0],
after: options.files[1],
pid: null,
snapshotCount: 2,
};
}
async function parseSnapshotMeta(scanner) {
await scanner.find('"snapshot":');
await scanner.skipWhitespace();
const metaObjectText = await scanner.readBalancedObject();
const parsed = JSON.parse(metaObjectText);
return parsed?.meta ?? null;
}
async function buildSummary(filePath) {
const scanner = new JsonStreamScanner(filePath);
const meta = await parseSnapshotMeta(scanner);
if (!meta) {
fail(`Invalid heap snapshot: ${filePath}`);
}
const nodeFieldCount = meta.node_fields.length;
const typeNames = meta.node_types[0];
const typeIndex = meta.node_fields.indexOf("type");
const nameIndex = meta.node_fields.indexOf("name");
const selfSizeIndex = meta.node_fields.indexOf("self_size");
if (typeIndex === -1 || nameIndex === -1 || selfSizeIndex === -1) {
fail(`Unsupported heap snapshot schema: ${filePath}`);
}
const summaryByIndex = new Map();
let nodeCount = 0;
let currentTypeId = 0;
let currentNameId = 0;
let currentSelfSize = 0;
await scanner.find('"nodes":');
await scanner.parseNumberArray((value, index) => {
const fieldIndex = index % nodeFieldCount;
if (fieldIndex === typeIndex) {
currentTypeId = value;
return;
}
if (fieldIndex === nameIndex) {
currentNameId = value;
return;
}
if (fieldIndex === selfSizeIndex) {
currentSelfSize = value;
}
if (fieldIndex !== nodeFieldCount - 1) {
return;
}
const key = `${currentTypeId}\t${currentNameId}`;
const current = summaryByIndex.get(key) ?? {
typeId: currentTypeId,
nameId: currentNameId,
selfSize: 0,
count: 0,
};
current.selfSize += currentSelfSize;
current.count += 1;
summaryByIndex.set(key, current);
nodeCount += 1;
});
const requiredNameIds = new Set(
Array.from(summaryByIndex.values(), (entry) => entry.nameId).filter((value) => value >= 0),
);
const nameStrings = new Map();
await scanner.find('"strings":');
await scanner.parseStringArray((value, index) => {
if (requiredNameIds.has(index)) {
nameStrings.set(index, value);
}
});
const summary = new Map();
for (const entry of summaryByIndex.values()) {
const key = `${typeNames[entry.typeId] ?? "unknown"}\t${nameStrings.get(entry.nameId) ?? ""}`;
summary.set(key, {
type: typeNames[entry.typeId] ?? "unknown",
name: nameStrings.get(entry.nameId) ?? "",
selfSize: entry.selfSize,
count: entry.count,
});
}
return {
nodeCount,
summary,
};
}
function formatBytes(bytes) {
if (Math.abs(bytes) >= 1024 ** 2) {
return `${(bytes / 1024 ** 2).toFixed(2)} MiB`;
}
if (Math.abs(bytes) >= 1024) {
return `${(bytes / 1024).toFixed(1)} KiB`;
}
return `${bytes} B`;
}
function formatDelta(bytes) {
return `${bytes >= 0 ? "+" : "-"}${formatBytes(Math.abs(bytes))}`;
}
function truncate(text, maxLength) {
return text.length <= maxLength ? text : `${text.slice(0, maxLength - 1)}`;
}
async function main() {
const options = parseArgs(process.argv.slice(2));
const pair = resolvePair(options);
const before = await buildSummary(pair.before);
const after = await buildSummary(pair.after);
const minBytes = options.minKb * 1024;
const rows = [];
for (const [key, next] of after.summary) {
const previous = before.summary.get(key) ?? { selfSize: 0, count: 0 };
const sizeDelta = next.selfSize - previous.selfSize;
const countDelta = next.count - previous.count;
if (sizeDelta < minBytes) {
continue;
}
rows.push({
type: next.type,
name: next.name,
sizeDelta,
countDelta,
afterSize: next.selfSize,
afterCount: next.count,
});
}
rows.sort(
(left, right) => right.sizeDelta - left.sizeDelta || right.countDelta - left.countDelta,
);
console.log(`before: ${pair.before}`);
console.log(`after: ${pair.after}`);
if (pair.pid !== null) {
console.log(`pid: ${pair.pid} (${pair.snapshotCount} snapshots found)`);
}
console.log(
`nodes: ${before.nodeCount} -> ${after.nodeCount} (${after.nodeCount - before.nodeCount >= 0 ? "+" : ""}${after.nodeCount - before.nodeCount})`,
);
console.log(`filter: top=${options.top} min=${options.minKb} KiB`);
console.log("");
if (rows.length === 0) {
console.log("No entries exceeded the minimum delta.");
return;
}
for (const row of rows.slice(0, options.top)) {
console.log(
[
formatDelta(row.sizeDelta).padStart(11),
`count ${row.countDelta >= 0 ? "+" : ""}${row.countDelta}`.padStart(10),
row.type.padEnd(16),
truncate(row.name || "(empty)", 96),
].join(" "),
);
}
}
await main();

View File

@@ -1,266 +0,0 @@
---
name: openclaw-test-performance
description: Benchmark, diagnose, and optimize OpenClaw test and plugin-suite runtime, import hotspots, CPU/RSS, heap growth, and slow coverage paths.
---
# OpenClaw Test Performance
Use evidence first. The goal is real `pnpm test`, plugin-suite, and
plugin-inspector speed/RSS improvement with coverage intact, not runner tuning by
guesswork.
## Workflow
1. Read the relevant local `AGENTS.md` files before editing:
- `src/agents/AGENTS.md` for agent/import hotspots.
- `src/channels/AGENTS.md` and `src/plugins/AGENTS.md` for plugin/channel
laziness.
- `src/gateway/AGENTS.md` for server lifecycle tests.
- `test/helpers/AGENTS.md` and `test/helpers/channels/AGENTS.md` for shared
contract helpers.
- `src/infra/outbound/AGENTS.md` for outbound/media/action tests.
2. Establish a baseline before changing code:
- Prefer `pnpm test:perf:groups --full-suite --allow-failures --output <file>`
for full-suite ranking.
- For bundled plugin breadth, run the smallest relevant `pnpm
test:extensions:batch <plugin[,plugin...]>` or plugin-inspector command
before jumping to the full extension sweep.
- For a scoped hotspot use:
`/usr/bin/time -l pnpm test <file-or-files> --maxWorkers=1 --reporter=verbose`
- For import-heavy suspicion add:
`OPENCLAW_VITEST_IMPORT_DURATIONS=1 OPENCLAW_VITEST_PRINT_IMPORT_BREAKDOWN=1`.
3. Separate wall/runner noise from real file cost:
- Compare Vitest duration, test body timing, import breakdown, wall time, and
max RSS.
- Re-run single files when grouped/full-suite numbers look stale or noisy.
- If a full-suite grouped run reports a lane failure but JSON says tests
passed, capture that as harness/noise and verify the suspect file directly.
4. Pick the next attack by return and risk:
- High return: one file/test dominates seconds or RSS and has a clear root.
- High leverage: one plugin or SDK barrel causes every plugin-inspector or
extension-batch run to load broad runtime.
- Lower risk: static descriptors, target parsing, routing, auth bypass,
setup hints, registry fixtures, or test server lifecycle.
- Higher risk: real memory/runtime behavior, live providers, protocol
contracts, or broad production refactors.
5. Fix the root cause, not the symptom:
- Move static metadata/parsing into narrow helpers or lightweight artifacts
reused by full runtime and fast paths.
- Prefer dependency injection, loaded-plugin-only lookup, explicit fixtures,
and pure helpers over broad mocks.
- Reuse suite-level servers/clients when a fresh handshake is irrelevant.
- Keep schedulers/background loops off unless the test proves scheduling.
- In plugin paths, move static metadata into manifest/lightweight artifacts
and keep runtime plugin loads behind explicit execution boundaries.
6. Preserve coverage shape:
- Do not delete a slow integration proof unless the exact production
composition is extracted into a named helper and tested.
- Keep one cheap integration smoke when cross-component wiring matters.
- State explicitly what incidental coverage was removed, if any.
7. Re-benchmark the same command after the change and compute seconds plus
percent gain.
8. Update the running report when requested or when this thread is tracking one.
Include before/after commands, artifacts, coverage notes, verification, and
next attack order.
9. Commit with `scripts/committer "<message>" <paths...>` and push when the
user asked for commits/pushes. Stage only files touched for this attack.
## Plugin-Suite Workflow
Use this section when perf work involves bundled plugins, plugin-inspector, SDK
barrels, package-boundary tests, or extension suites.
1. Map the suite shape first:
- source tests: `pnpm test extensions/<id>` or `pnpm test:extensions:batch <id>`
- package boundaries: `pnpm run test:extensions:package-boundary:canary` and
`pnpm run test:extensions:package-boundary:compile`
- all bundled source tests: `pnpm test:extensions`
- plugin import memory: `pnpm test:extensions:memory -- --json .artifacts/test-perf/extensions-memory.json`
- plugin-inspector/report work: keep report primitives in `plugin-inspector`;
keep wrappers thin and collect peak RSS when the command supports it.
2. Start narrow, then widen:
- one plugin changed: run that plugin's tests and plugin-inspector slice.
- SDK/public barrel changed: add representative provider, channel, memory,
and feature plugins.
- loader/runtime mirror changed: add package-boundary checks and build/package
proof as needed.
- unknown shared plugin behavior: run `test:extensions:batch` groups before
`pnpm test:extensions`.
3. Treat plugin-inspector failures as product signals:
- JSON must parse.
- warnings/errors must be classified, not hidden.
- runtime capture should be quiet and config-tolerant.
- command output should include wall time, exit code, and peak RSS when
available.
4. For broad or package-heavy plugin proof, use Crabbox-backed Blacksmith
Testbox by default on maintainer machines:
- `pnpm crabbox:run -- --provider blacksmith-testbox --timing-json -- OPENCLAW_TESTBOX=1 pnpm test:extensions:batch <ids>`
- add `--keep`/`--id <id-or-slug>` only when several commands must share one
warmed box; stop it with `pnpm crabbox:stop -- <id-or-slug>`.
5. If plugin performance is package-artifact sensitive, switch to
`release-openclaw-plugin-testing` and Package Acceptance rather than
trusting source-only timing.
## Metric Collection
Collect at least one stable metric before and after. Prefer the same machine and
same command. For Testbox comparisons, use the same `tbx_...` id when possible.
| Metric | Use for | Preferred source |
| --------------- | ---------------------------------- | --------------------------------------------------------------------------- |
| wall time | user-visible suite cost | `/usr/bin/time -l`, test wrapper duration, Testbox run time |
| Vitest duration | test body/import cost | Vitest output per file/shard |
| import duration | broad barrel/runtime loads | `OPENCLAW_VITEST_IMPORT_DURATIONS=1` |
| max RSS | memory pressure and OOM risk | `/usr/bin/time -l`, `pnpm test:extensions:memory`, wrapper memory summaries |
| CPU/user/sys | CPU-bound vs wait-bound split | `/usr/bin/time -l` locally, Testbox job timing when local CPU is noisy |
| heap snapshots | real leak vs retained module graph | `openclaw-test-heap-leaks` workflow |
Local scoped command with CPU/RSS:
```bash
timeout 240 /usr/bin/time -l pnpm test <file> --maxWorkers=1 --reporter=verbose
```
Plugin import memory profile:
```bash
pnpm build
pnpm test:extensions:memory -- --top 20 --json .artifacts/test-perf/extensions-memory.json
```
Targeted plugin import memory:
```bash
pnpm test:extensions:memory -- --extension discord --extension telegram --skip-combined
```
Heap/RSS escalation:
```bash
OPENCLAW_TEST_MEMORY_TRACE=1 \
OPENCLAW_TEST_HEAPSNAPSHOT_INTERVAL_MS=60000 \
OPENCLAW_TEST_HEAPSNAPSHOT_DIR=.tmp/heapsnap \
OPENCLAW_TEST_WORKERS=2 \
OPENCLAW_TEST_MAX_OLD_SPACE_SIZE_MB=6144 \
pnpm test
```
Use `openclaw-test-heap-leaks` when RSS keeps growing across intervals, workers
OOM, or the suspect command has app-object retention. Do not call RSS growth a
leak until snapshots or retainers support it.
## Common Root Causes
- Full bundled channel/plugin runtime loaded for static data.
- `getChannelPlugin()` fallback used when an already-loaded fixture or pure
parser would suffice.
- Broad `api.ts`, `runtime-api.ts`, `test-api.ts`, or plugin-sdk barrels pulled
into hot tests.
- SDK root aliases or package barrels pulling focused subpaths back into a broad
plugin graph.
- Plugin-inspector loading runtime code just to render metadata, reports, or CI
policy scores.
- Bundled plugin capture reusing real config/home state instead of synthetic,
redacted, isolated state.
- Partial-real mocks using `importActual()` around broad modules.
- `vi.resetModules()` plus fresh imports in per-test loops.
- Test plugin registry seeded in `beforeAll` while runtime state resets in
`afterEach`.
- Per-test gateway/server/client startup when state reset would suffice.
- Runtime/default model/auth selection paid by idle snapshots or fixtures.
- Plugin-owned media/action discovery triggered before checking whether args
contain plugin-owned fields.
- Timings missing from `test/fixtures/test-timings.unit.json`, causing hotspot
files to stay in shared workers.
- Parallel Vitest runs sharing `node_modules/.experimental-vitest-cache` without
distinct `OPENCLAW_VITEST_FS_MODULE_CACHE_PATH` values.
## Benchmark Commands
Scoped file:
```bash
timeout 240 /usr/bin/time -l pnpm test <file> --maxWorkers=1 --reporter=verbose
```
Scoped file with import breakdown:
```bash
timeout 240 /usr/bin/time -l env \
OPENCLAW_VITEST_IMPORT_DURATIONS=1 \
OPENCLAW_VITEST_PRINT_IMPORT_BREAKDOWN=1 \
pnpm test <file> --maxWorkers=1 --reporter=verbose
```
Grouped suite:
```bash
pnpm test:perf:groups --full-suite --allow-failures \
--output .artifacts/test-perf/<name>.json
```
Extension batch:
```bash
pnpm test:extensions:batch <plugin[,plugin...]> -- --reporter=verbose
```
All extension tests:
```bash
pnpm test:extensions
```
Package-boundary plugin checks:
```bash
pnpm run test:extensions:package-boundary:canary
pnpm run test:extensions:package-boundary:compile
```
Reuse an existing Vitest JSON report:
```bash
pnpm test:perf:groups --report <vitest-json> \
--output .artifacts/test-perf/<name>.json
```
## Verification
- Always run the targeted test surface that proves the change.
- For source changes, run `pnpm check:changed` before push; in maintainer
Testbox mode run it in the warmed Testbox.
- For test-only changes, run `pnpm test:changed` or the exact edited tests.
- Run `pnpm build` when touching lazy-loading, bundled artifacts, package
boundaries, dynamic imports, build output, or public surfaces.
- For plugin SDK/barrel/runtime changes, add `pnpm plugin-sdk:api:check` or
`pnpm plugin-sdk:api:gen` when the API surface may drift.
- For plugin-suite perf fixes, verify at least one representative plugin batch
plus the changed gate; use Package Acceptance if the bug only exists in a
packed artifact.
- If deps are missing/stale, run `pnpm install` and retry the exact failed
command once.
- Use the report format:
```markdown
| Metric | Before | After | Gain |
| -------------- | -----: | -----: | ------------: |
| File wall time | `Xs` | `Ys` | `-Zs` (`P%`) |
| Max RSS | `XMB` | `YMB` | `-ZMB` (`P%`) |
| CPU user/sys | `X/Ys` | `A/Bs` | explain |
```
## Handoff
Keep the final concise:
- Root cause.
- Suite/plugin scope.
- Files changed.
- Before/after wall, Vitest/import, CPU, and RSS numbers where available.
- Leak classification if memory was involved: real leak, retained module graph,
or inconclusive.
- Coverage retained.
- Verification commands.
- Testbox ID or workflow URL for remote proof.
- Commit hash and push status.

View File

@@ -1,6 +0,0 @@
interface:
display_name: "OpenClaw Test Performance"
short_description: "Benchmark tests, plugin suites, CPU, RSS, and heap growth"
default_prompt: "Use $openclaw-test-performance to reassess OpenClaw test and plugin-suite performance, collect wall/import/CPU/RSS metrics, investigate memory growth when needed, fix the next real hotspot without losing coverage, update the report, and commit scoped changes."
policy:
allow_implicit_invocation: false

View File

@@ -1,679 +0,0 @@
---
name: openclaw-testing
description: Choose, run, rerun, or debug OpenClaw tests, CI checks, Docker E2E lanes, release validation, and the cheapest safe verification path.
---
# OpenClaw Testing
Use this skill when deciding what to test, debugging failures, rerunning CI,
or validating a change without wasting hours.
## Read First
- `docs/reference/test.md` for local test commands.
- `docs/ci.md` for CI scope, release checks, Docker chunks, and runner behavior.
- Scoped `AGENTS.md` files before editing code under a subtree.
## Default Rule
Prove the touched surface first. Do not reflexively run the whole suite.
1. Inspect the diff and classify the touched surface:
- normal source checkout, source change: `pnpm changed:lanes --json`, then `pnpm check:changed`
- normal source checkout, tests only: `pnpm test:changed`
- normal source checkout, one failing file: `pnpm test <path-or-filter> -- --reporter=verbose`
- Codex worktree or linked/sparse checkout, one/few explicit files: `node scripts/run-vitest.mjs <path-or-filter>`
- Codex worktree or linked/sparse checkout, changed gates or anything broad:
use the Crabbox wrapper with the provider that matches the proof surface.
For maintainer heavy `pnpm` gates, that is usually delegated Blacksmith
Testbox through Crabbox, e.g. `node scripts/crabbox-wrapper.mjs run
--provider blacksmith-testbox ... -- pnpm check:changed`. For direct AWS
Crabbox proof, omit `--provider` and let `.crabbox.yaml` choose AWS.
- workflow-only: `git diff --check`, workflow syntax/lint (`actionlint` when available)
- docs-only: `pnpm docs:list`, docs formatter/lint only if docs tooling changed or requested
2. Reproduce narrowly before fixing.
3. Fix root cause.
4. Rerun the same narrow proof.
5. Broaden only when the touched contract demands it.
## Guardrails
- Do not kill unrelated processes or tests. If something is running elsewhere, treat it as owned by the user or another agent.
- Do not run expensive local Docker, full release checks, full `pnpm test`, or full `pnpm check` unless the user asks or the change genuinely requires it.
- Prefer GitHub Actions for release/Docker proof when the workflow already has the prepared image and secrets.
- Use `scripts/committer "<msg>" <paths...>` when committing; stage only your files.
- If deps are missing, run `pnpm install`, retry once, then report the first actionable error.
- In a Codex worktree or linked/sparse checkout, do not run direct local
`pnpm test*`, `pnpm check*`, `pnpm crabbox:run`, or `scripts/committer` until
you have verified pnpm will not reconcile or reinstall dependencies. Use
`node scripts/run-vitest.mjs` for tiny local proof, `node
scripts/crabbox-wrapper.mjs` for Testbox, and `git commit --no-verify` only
after the relevant remote or node-wrapper proof is already clean.
- For remote proof, use the Crabbox wrapper first, but name the actual backend.
Direct AWS Crabbox uses `provider=aws` and `cbx_...` ids. Delegated
Blacksmith Testbox through Crabbox uses `provider=blacksmith-testbox`,
`syncDelegated=true`, and `tbx_...` ids. Both satisfy "remote proof" when the
requested proof surface allows either.
- Do not infer "no Testbox is running" from plain `blacksmith testbox list`.
Use `blacksmith testbox list --all` or `blacksmith testbox status <tbx_id>`
before reporting cloud state.
- Reuse only an id/slug created in this operator session unless explicitly
coordinating with another lane. If Testbox queues, fails capacity, or cannot
allocate, report the blocker or switch to direct AWS Crabbox only when that
still proves the requested surface.
## Local Test Shortcuts
```bash
pnpm changed:lanes --json
pnpm check:changed # changed typecheck/lint/guards; no Vitest
pnpm test:changed # cheap smart changed Vitest targets
pnpm verify # full check, then full Vitest
OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed
pnpm test <path-or-filter> -- --reporter=verbose
OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test <path-or-filter>
```
Use targeted file paths whenever possible. Avoid raw `vitest`; use the repo
`pnpm test` wrapper so project routing, workers, and setup stay correct. If raw
Vitest is unavoidable, use `vitest run ...`; bare `vitest ...` starts local watch
mode and will not exit on its own.
When the checkout is a Codex worktree, prefer the direct node harness instead:
```bash
node scripts/run-vitest.mjs <path-or-filter>
```
That keeps the test scoped without giving pnpm a chance to run dependency
status checks or install reconciliation in a linked worktree.
## Command Semantics
- `pnpm check` and `pnpm check:changed` do not run Vitest tests. They are for
typecheck, lint, and guard proof.
- `pnpm test` and `pnpm test:changed` run Vitest tests.
- `pnpm verify` runs `pnpm check`, then `pnpm test`, with Crabbox phase markers
so remote summaries show which half failed.
- `pnpm test:changed` is intentionally cheap by default: direct test edits,
sibling tests, explicit source mappings, and import-graph dependents.
- `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed` is the explicit broad
fallback for harness/config/package edits that genuinely need it.
- Do not run extension sweeps just because core changed. If a core edit is for a
specific plugin bug, run that plugin's tests explicitly. If a public SDK or
contract change needs consumer proof, choose the smallest representative
plugin/contract tests first, then broaden only when the risk justifies it.
- The test wrapper prints a short `[test] passed|failed|skipped ... in ...`
line. Vitest's own duration is still the per-shard detail.
## Routing Model
- `pnpm changed:lanes --json` answers "which check lanes does this diff touch?"
It is used by `pnpm check:changed` for typecheck/lint/guard selection.
- `pnpm test:changed` answers "which Vitest targets are worth running now?" It
uses the same changed path list, but applies a cheaper test-target resolver.
- Direct test edits run themselves. Source edits prefer explicit mappings,
sibling `*.test.ts`, then import-graph dependents. Shared harness/config/root
edits are skipped by default unless they have precise mapped tests.
- Shared group-room delivery config and source-reply prompt edits are precise
mapped tests: they run the core auto-reply regressions plus Discord and Slack
delivery tests so cross-channel default changes fail before a PR push.
- Public SDK or contract edits do not automatically run every plugin test.
`check:changed` proves extension type contracts; the agent chooses the
smallest plugin/contract Vitest proof that matches the actual risk.
- Use `OPENCLAW_TEST_CHANGED_BROAD=1 pnpm test:changed` only when a harness,
config, package, or unknown-root edit really needs the broad Vitest fallback.
## CI Debugging
Start with current run state, not logs for everything:
```bash
gh run list --branch main --limit 10
gh run view <run-id> --json status,conclusion,headSha,url,jobs
gh run view <run-id> --job <job-id> --log
```
- Check exact SHA. Ignore newer unrelated `main` unless asked.
- For cancelled same-branch runs, confirm whether a newer run superseded it.
- Fetch full logs only for failed or relevant jobs.
- Prefer `gh run view <run-id> --json jobs` over PR rollup while debugging; rollup can be stale/noisy.
- For `prompt:snapshots:check` failures, treat Linux Node 24 as CI truth. If macOS passes but CI drifts, reproduce in a Linux Node 24 container or Testbox, commit that generated output, then rerun.
## GitHub Release Workflows
Use the smallest workflow that proves the current risk. The full umbrella is
available, but it is usually the last step after narrower proof, not the first
rerun after a focused patch.
### Full Release Validation
`Full Release Validation` (`.github/workflows/full-release-validation.yml`) is
the manual "everything before release" umbrella. It resolves a target ref, then
dispatches:
- manual `CI` for the full normal CI graph, with Android enabled via
`include_android=true`
- `Plugin Prerelease` for release-only plugin static checks, extension shards,
the release-only `agentic-plugins` shard, and plugin product Docker lanes
- `OpenClaw Release Checks` for install smoke, cross-OS release checks, live and
E2E checks, Docker release-path suites, OpenWebUI, QA Lab, fast Matrix, and
Telegram release lanes
- optional post-publish Telegram E2E when a package spec is supplied
Run it only when validating an actual release candidate, after broad shared CI
or release orchestration changes, or when explicitly asked:
```bash
gh workflow run full-release-validation.yml \
--repo openclaw/openclaw \
--ref main \
-f ref=<branch-or-sha> \
-f provider=openai \
-f mode=both \
-f release_profile=stable
```
Run the workflow itself from the trusted current ref, normally `--ref main`;
child workflows are dispatched from that same ref even when `ref` points at an
older release branch or tag. Full Release Validation has no separate child
workflow ref input; choose the trusted harness by choosing the workflow run ref.
Use `release_profile=minimum|stable|full` to control live/provider breadth:
`minimum` keeps the fastest OpenAI/core release-critical set, `stable` adds the
stable provider/backend set, and `full` adds the broad advisory provider/media
matrix. Do not make `full` faster by silently dropping suites; optimize setup,
artifact reuse, and sharding instead. The parent verifier job appends a child
overview plus slowest-job tables for child runs; rerun only that verifier after
a child rerun turns green.
Standalone manual `CI` dispatches do not run the plugin prerelease suite, the
extension batch sweep, or the release-only `agentic-plugins` Vitest shard. Those
lanes are intentionally reserved for the separate `Plugin Prerelease` child so
PRs, main pushes, and ad hoc broad CI checks do not spend Docker/package time or
all-plugin runtime time on release-only product coverage.
If a full run is already active on a newer `origin/main`, prefer watching that
run over dispatching a duplicate. Do not cancel release, release-check, or child
workflow runs unless Peter explicitly asks for cancellation.
The child-dispatch jobs record the child run ids. The final
`Verify full validation` job re-queries those child runs and is the canonical
parent gate. If a child workflow failed but was later rerun successfully, rerun
only the failed parent verifier job; do not dispatch a new full umbrella unless
the release evidence is stale.
For bounded recovery after a focused fix, pass `-f rerun_group=<group>`.
Supported umbrella groups are `all`, `ci`, `plugin-prerelease`,
`release-checks`, `install-smoke`, `cross-os`, `live-e2e`, `package`, `qa`,
`qa-parity`, `qa-live`, and `npm-telegram`. Use the narrowest group that covers
the failed box. After a targeted release-check fix, do not restart the full
umbrella by habit: dispatch the matching `rerun_group` and rerun only the parent
verifier/evidence step after the child is green unless the release evidence is
stale. For a single failed live/E2E shard, use
`-f rerun_group=live-e2e -f live_suite_filter=<suite_id>` so the Blacksmith
workflow only spends setup and queue time on that suite.
### Release Evidence
After release-candidate validation or before a release decision, record the
important run ids in the public `openclaw/releases` evidence ledger.
Use the manual `OpenClaw Release Evidence`
(`openclaw-release-evidence.yml`) workflow there. It writes durable summaries
under `evidence/<release-id>/` and commits:
- `release-evidence.md`
- `release-evidence.json`
- `index.json`
- `runs/<label>.json`
Use one run per line:
```text
full-release-validation openclaw/openclaw <run-id> blocking
package-acceptance openclaw/openclaw <run-id> blocking
release-checks openclaw/openclaw <run-id> blocking
```
Store summaries, run URLs, artifact metadata, timings, pass/fail state, and
short release-manager notes there. Do not store raw logs, provider
prompts/responses, channel transcripts, signing material, or secret-bearing
config in git; raw logs stay in Actions artifacts.
When `Full Release Validation` completes and `OPENCLAW_RELEASES_DISPATCH_TOKEN`
is configured in the source repo, it requests the public
`OpenClaw Release Evidence From Full Validation` workflow. That workflow reads
the parent full-validation run, extracts the child CI/release-checks/Telegram
run ids from the parent logs, and opens the evidence PR automatically. If the
token is absent or the run predates this wiring, trigger that workflow manually
with the full-validation run id.
### Release Checks
`OpenClaw Release Checks` (`openclaw-release-checks.yml`) is the release child
workflow. It is broader than normal CI but narrower than the umbrella because it
does not dispatch the separate full normal CI child. It runs Package Acceptance
with artifact-native delta lanes and `telegram_mode=mock-openai`, so the release
package tarball also goes through offline plugin proof, bundled-channel compat,
and Telegram package QA. The Docker release-path chunks cover the overlapping
package/update/plugin lanes. Use it when release-path validation is needed
without rerunning the entire umbrella.
```bash
gh workflow run openclaw-release-checks.yml \
--repo openclaw/openclaw \
--ref main \
-f ref=<branch-or-sha> \
-f provider=openai \
-f mode=both \
-f release_profile=stable \
-f rerun_group=all
```
Release-check rerun groups are `all`, `install-smoke`, `cross-os`, `live-e2e`,
`package`, `qa`, `qa-parity`, and `qa-live`.
`OpenClaw Release Checks` uses the trusted workflow ref to resolve the selected
ref once as `release-package-under-test` and passes that artifact into cross-OS
release checks, release-path Docker live/E2E checks, and Package Acceptance.
When `Full Release Validation` dispatches release checks, it passes the requested
branch/tag plus an `expected_sha` so branch/tag refs resolve through the fast
remote-ref path while the package and QA jobs still validate the exact SHA.
The full install-smoke child is split on purpose: one job prepares or reuses the
target-SHA GHCR root Dockerfile smoke image, QR package install runs in its own
job, root Dockerfile/gateway smokes pull the prepared image, and installer/Bun
smokes pull the same image while building only their small installer images.
If install-smoke gets slow again, first check whether the root image was reused
or rebuilt before adding/removing coverage.
The full-profile native live media shards use the prebuilt
`ghcr.io/openclaw/openclaw-live-media-runner:ubuntu-24.04` container so
`ffmpeg`/`ffprobe` are already present. If those jobs suddenly spend minutes in
dependency setup again, first check the `Live Media Runner Image` workflow and
the `Verify preinstalled live media dependencies` step before assuming the media
tests themselves slowed down.
The release Docker path intentionally shards the plugin/runtime tail. The
workflow uses `plugins-runtime-plugins`, `plugins-runtime-services`, and
`plugins-runtime-install-a` through `plugins-runtime-install-d`; aggregate
aliases such as `plugins-runtime-core`, `plugins-runtime`, and
`plugins-integrations` remain for manual reruns.
The release QA parity box is internally split into candidate and baseline lane
jobs, followed by a report job that downloads both artifacts and runs
`pnpm openclaw qa parity-report`. For parity failures, inspect the failed lane
first; inspect the report job when both lane summaries exist but the comparison
fails.
### QA Lab Matrix Profiles
`pnpm openclaw qa matrix` defaults to `--profile all`. Do not assume the CLI
default is the fast release path. Use explicit profiles:
- `--profile fast`: release-critical Matrix transport contract; add
`--fail-fast` only when the target CLI supports it
- `--profile transport|media|e2ee-smoke|e2ee-deep|e2ee-cli`: sharded full
Matrix proof
- `OPENCLAW_QA_MATRIX_NO_REPLY_WINDOW_MS=3000`: CI-friendly no-reply quiet
window when paired with fast or sharded gates
`QA-Lab - All Lanes` uses explicit fast Matrix on scheduled runs; manual
dispatch keeps `matrix_profile=all` as the default and always shards that full
Matrix selection. `OpenClaw Release Checks` uses explicit fast Matrix; run the
all-lanes workflow when release investigation needs full Matrix media/E2EE
inventory.
### Reusable Live/E2E Checks
`OpenClaw Live And E2E Checks (Reusable)`
(`openclaw-live-and-e2e-checks-reusable.yml`) is the preferred entry point for
targeted live, Docker, model, and E2E proof. Inputs let you turn off unrelated
lanes:
```bash
gh workflow run openclaw-live-and-e2e-checks-reusable.yml \
--repo openclaw/openclaw \
--ref main \
-f ref=<sha> \
-f include_repo_e2e=false \
-f include_release_path_suites=false \
-f include_openwebui=false \
-f include_live_suites=true \
-f live_models_only=true \
-f live_model_providers=fireworks
```
Useful knobs:
- `docker_lanes='<lane[,lane]>'`: run selected Docker scheduler lanes against
prepared artifacts instead of the release chunk matrix. Multiple selected
lanes fan out as parallel targeted Docker jobs after one shared package/image
preparation step.
- `include_live_suites=false`: skip live/provider suites when testing Docker
scheduler or release packaging only.
- `live_models_only=true`: run only Docker live model coverage.
- `live_model_providers=fireworks` (or comma/space separated providers): run one
targeted Docker live model job instead of the full provider matrix.
- blank `live_model_providers`: run the full live-model provider matrix.
Release-path Docker chunks are currently `core`, `package-update-openai`,
`package-update-anthropic`, `package-update-core`,
`plugins-runtime-plugins`, `plugins-runtime-services`,
`plugins-runtime-install-a`, `plugins-runtime-install-b`,
`plugins-runtime-install-c`, `plugins-runtime-install-d`,
`bundled-channels-core`, `bundled-channels-update-a`,
`bundled-channels-update-b`, and `bundled-channels-contracts`. The aggregate
`bundled-channels`, `plugins-runtime-core`, `plugins-runtime`, and
`plugins-integrations` chunks remain valid for manual one-shot reruns, but
release checks use the split chunks.
When live suites are enabled, the workflow shards broad native `pnpm test:live`
coverage through `scripts/test-live-shard.mjs` instead of one serial `live-all`
job:
- `native-live-src-agents`
- `native-live-src-gateway-core`
- `native-live-src-gateway-profiles` (release CI runs this with provider
filters such as `OPENCLAW_LIVE_GATEWAY_PROVIDERS=anthropic`)
- `native-live-src-gateway-backends`
- `native-live-test`
- `native-live-extensions-a-k`
- `native-live-extensions-l-n`
- `native-live-extensions-openai`
- `native-live-extensions-o-z`
- `native-live-extensions-o-z-other`
- `native-live-extensions-xai`
- `native-live-extensions-media`
- `native-live-extensions-media-audio`
- `native-live-extensions-media-music`
- `native-live-extensions-media-music-google`
- `native-live-extensions-media-music-minimax`
- `native-live-extensions-media-video`
Use `node scripts/test-live-shard.mjs <shard> --list` to see the exact files
before rerunning a failed native live shard. The aggregate `o-z` and `media`
shards remain useful locally; release CI uses the smaller provider/media shards
so one live-provider flake does not force a broad native live rerun.
For model-list or provider-selection fixes, use `live_models_only=true` plus the
specific `live_model_providers` allowlist. Confirm logs show the expected
`OPENCLAW_LIVE_PROVIDERS` and selected model ids before declaring proof.
## Docker
Docker is expensive. First inspect the scheduler without running Docker:
```bash
OPENCLAW_DOCKER_ALL_DRY_RUN=1 pnpm test:docker:all
OPENCLAW_DOCKER_ALL_DRY_RUN=1 OPENCLAW_DOCKER_ALL_LANES=install-e2e pnpm test:docker:all
OPENCLAW_DOCKER_ALL_LANES=install-e2e node scripts/test-docker-all.mjs --plan-json
```
Run one failed lane locally only when explicitly asked or when GitHub is not
usable:
```bash
OPENCLAW_DOCKER_ALL_LANES=<lane> \
OPENCLAW_DOCKER_ALL_BUILD=0 \
OPENCLAW_DOCKER_ALL_PREFLIGHT=0 \
OPENCLAW_SKIP_DOCKER_BUILD=1 \
OPENCLAW_DOCKER_E2E_BARE_IMAGE='<prepared-bare-image>' \
OPENCLAW_DOCKER_E2E_FUNCTIONAL_IMAGE='<prepared-functional-image>' \
pnpm test:docker:all
```
For release validation, prefer the reusable GitHub workflow input:
```yaml
docker_lanes: install-e2e
```
Multiple lanes are allowed:
```yaml
docker_lanes: install-e2e bundled-channel-update-acpx
```
That skips the release chunk matrix and runs one targeted Docker job against the
prepared GHCR images and the selected package artifact. Rerun commands
generated inside GitHub artifacts include `package_artifact_run_id`,
`package_artifact_name`, `docker_e2e_bare_image`, and
`docker_e2e_functional_image` when available, so failed lanes can reuse the
exact tarball and prepared images from the failed run. When the fix changes
package contents, omit those reuse inputs so the workflow packs a new tarball.
Live-only targeted reruns skip the E2E images and build only the live-test
image. Release-path normal mode fans out into smaller Docker chunk jobs:
- `core`
- `package-update-openai`
- `package-update-anthropic`
- `package-update-core`
- `plugins-runtime-plugins`
- `plugins-runtime-services`
- `plugins-runtime-install-a`
- `plugins-runtime-install-b`
- `plugins-runtime-install-c`
- `plugins-runtime-install-d`
- `bundled-channels`
OpenWebUI is folded into `plugins-runtime-services` for full release-path
coverage and keeps a standalone `openwebui` chunk only for OpenWebUI-only
dispatches. The legacy `package-update`, `plugins-runtime-core`,
`plugins-runtime`, and `plugins-integrations` chunks still work as aggregate
aliases for manual reruns, but the release workflow uses the split chunks so
provider installer checks, plugin runtime checks, bundled plugin
install/uninstall shards, and bundled-channel checks can run on separate
machines. The bundled-channel runtime-dependency coverage
inside `bundled-channels`
uses the split `bundled-channel-*` and `bundled-channel-update-*` lanes rather
than the serial `bundled-channel-deps` lane, so failures produce cheap targeted
reruns for the exact channel/update scenario. The bundled plugin
install/uninstall sweep is also split into
`bundled-plugin-install-uninstall-0` through
`bundled-plugin-install-uninstall-7`; selecting the legacy
`bundled-plugin-install-uninstall` lane expands to all eight shards.
## Package Acceptance
Use the manual `Package Acceptance` workflow when the question is "does this
installable package work as a product?" rather than "does this source diff pass
Vitest?"
In release validation, treat Package Acceptance as the package-candidate shard
inside the larger release umbrella, not as a competing full-test path. Full
Release Validation and private release gauntlets should call Package Acceptance
for tarball resolution, Docker product/package proof, and optional Telegram QA
against the same resolved `package-under-test` artifact; keep orchestration,
secret policy, blocking/advisory status, and evidence rollup in the caller.
Good defaults:
```bash
gh workflow run package-acceptance.yml --ref main \
-f source=npm \
-f workflow_ref=main \
-f package_spec=openclaw@beta \
-f suite_profile=product \
-f telegram_mode=mock-openai
```
Npm candidate selection:
- Resolve the registry immediately before dispatch:
`npm view openclaw dist-tags --json --prefer-online --cache /tmp/openclaw-npm-cache-verify-$$`
and `npm view openclaw@beta version dist.tarball dist.integrity --json --prefer-online --cache /tmp/openclaw-npm-cache-verify-$$`.
- If Peter asks for "latest beta", use `source=npm` with
`package_spec=openclaw@beta`, then record the resolved version from `npm view`
or the workflow summary.
- For reruns, release proof, or comparing one known package, prefer the exact
immutable spec: `package_spec=openclaw@YYYY.M.D-beta.N` or
`package_spec=openclaw@YYYY.M.D`.
- For stable package proof, use `package_spec=openclaw@latest` only when the
question is explicitly the current stable dist-tag; otherwise pin the exact
version.
- `source=npm` only accepts registry specs for `openclaw@beta`,
`openclaw@latest`, or exact OpenClaw release versions. Do not pass semver
ranges, git refs, file paths, tarball URLs, or plugin package names there.
- If the candidate is a tarball URL, use `source=url` with `package_sha256`. If
it is an Actions tarball artifact, use `source=artifact`. If it is an
unpublished source candidate, use `source=ref` with a trusted ref or SHA.
- Package acceptance tests exactly the selected package candidate. Do not apply
`openclaw update --channel beta` fallback semantics here; if `beta` is absent,
stale, older than `latest`, or points at a broken tarball, report that tag
state instead of silently testing `latest`.
Profiles:
- `smoke`: quick confidence that the tarball installs, can onboard a channel,
can run an agent turn, and basic gateway/config lanes work.
- `package`: release-package contract. Adds installer/update, doctor install
switching, bundled plugin runtime deps, plugin install/update, and package
repair lanes. This is the default native replacement for most Parallels
package/update coverage.
- `product`: package profile plus broader product surfaces: MCP channels,
cron/subagent cleanup, OpenAI web search, and OpenWebUI.
- `full`: split Docker release-path chunks with OpenWebUI.
- `custom`: exact `docker_lanes` list for a focused rerun.
Candidate sources:
- `source=npm`: `openclaw@beta`, `openclaw@latest`, or an exact release version.
- `source=ref`: pack `package_ref` using the trusted `workflow_ref` harness.
This intentionally separates old package commits from new workflow/test code.
- `source=url`: HTTPS `.tgz` plus required `package_sha256`.
- `source=artifact`: download one `.tgz` from `artifact_run_id`/`artifact_name`.
Ref model:
- `gh workflow run ... --ref <workflow-ref>` selects the workflow file revision
GitHub executes.
- `workflow_ref` is the trusted harness/script ref passed to reusable Docker
E2E.
- `package_ref` is the source ref to build when `source=ref`. It can be an
older branch/tag/SHA as long as it is reachable from an OpenClaw branch or
release tag.
Example: run latest package acceptance harness against an older trusted commit:
```bash
gh workflow run package-acceptance.yml --ref main \
-f workflow_ref=main \
-f source=ref \
-f package_ref=<branch-or-sha> \
-f suite_profile=package \
-f telegram_mode=mock-openai
```
Use `telegram_mode=mock-openai` or `telegram_mode=live-frontier` when the same
resolved `package-under-test` tarball should also run through the Telegram QA
workflow in the `qa-live-shared` environment. The standalone Telegram workflow
still accepts a published npm spec for post-publish checks, but Package
Acceptance passes the resolved artifact for `source=npm`, `ref`, `url`, and
`artifact`. Use `telegram_mode=none` only when intentionally skipping Telegram
credentialed package proof for a focused rerun.
Docker E2E images never copy repo sources as the app under test: the bare image
is a Node/Git runner, and the functional image installs the same prebuilt npm
tarball that bare lanes mount. `scripts/package-openclaw-for-docker.mjs` is the
single packer for local scripts and CI and validates the tarball inventory
before Docker consumes it. `scripts/test-docker-all.mjs --plan-json` is the
scheduler-owned CI plan for image kind, package, live image, lane, and
credential needs. Docker lane definitions live in the single scenario catalog
`scripts/lib/docker-e2e-scenarios.mjs`; planner logic lives in
`scripts/lib/docker-e2e-plan.mjs`. `scripts/docker-e2e.mjs` converts plan and
summary JSON into GitHub outputs and step summaries. Every scheduler run writes
`.artifacts/docker-tests/**/summary.json` plus `failures.json`. Read those
before rerunning. Lane entries include `command`, `rerunCommand`, status,
timing, timeout state, image kind, and log file path. The summary also includes
top-level phase timings for preflight, image build, package prep, lane pools,
and cleanup. Use `pnpm test:docker:timings <summary.json>` to rank slow lanes
and phases before deciding whether a broader rerun is justified.
Skill install proof: use `pnpm test:docker:skill-install` or targeted
`docker_lanes=skill-install` for live ClawHub skill-install validation. The
lane installs the package tarball in a bare runner, keeps
`skills.install.allowUploadedArchives=false`, resolves the current live slug
from `openclaw skills search`, installs it, and verifies `.clawhub` origin/lock
metadata. Prefer this checked-in script over inline heredoc Testbox recipes.
## Cheap Docker Reruns
First derive the smallest rerun command from artifacts:
```bash
pnpm test:docker:rerun <github-run-id>
pnpm test:docker:rerun .artifacts/docker-tests/<run>/failures.json
```
The script downloads Docker E2E artifacts for a GitHub run, reads
`summary.json`/`failures.json`, and prints a combined targeted workflow command
plus per-lane commands. Prefer the combined targeted command when several lanes
failed for the same patch:
```bash
gh workflow run openclaw-live-and-e2e-checks-reusable.yml \
-f ref=<sha> \
-f include_repo_e2e=false \
-f include_release_path_suites=false \
-f include_openwebui=false \
-f docker_lanes='install-e2e bundled-channel-update-acpx' \
-f include_live_suites=false \
-f live_models_only=false
```
That path still runs the prepare job, so it creates a new tarball for `<sha>`.
If the SHA-tagged GHCR bare/functional image already exists, CI skips rebuilding
that image and only uploads the fresh package artifact before the targeted lane
job. Do not rerun the full release path unless the failed lane list
or touched surface really requires it.
## Docker Expected Timings
Treat these as ballpark. Blacksmith queue time, GHCR pull speed, provider
latency, npm cache state, and Docker daemon health can dominate.
Current local timing artifact (`.artifacts/docker-tests/lane-timings.json`) has
these rough bands:
- Tiny lanes, seconds to under 1 minute:
`agents-delete-shared-workspace` ~3s, `plugin-update` ~7s,
`config-reload` ~14s, `pi-bundle-mcp-tools` ~15s, `onboard` ~18s,
`session-runtime-context` ~20s, `gateway-network` ~34s, `qr` ~44s.
- Medium deterministic lanes, ~1-5 minutes:
`npm-onboard-channel-agent` ~96s, `openai-image-auth` ~99s,
bundled channel/update lanes usually ~90-300s when split, `openwebui` ~225s,
`mcp-channels` ~274s.
- Heavy deterministic lanes, ~6-10 minutes:
`bundled-channel-root-owned` ~429s,
`bundled-channel-setup-entry` ~420s,
`bundled-channel-load-failure` ~383s,
`cron-mcp-cleanup` ~567s.
- Live provider lanes, often ~15-20 minutes:
`live-gateway` ~958s, `live-models` ~1054s.
- Installer/release lanes:
`install-e2e` and package-update paths can vary widely with npm, provider,
and package registry behavior. Budget tens of minutes; prefer GitHub targeted
reruns over local repeats.
Default fallback lane timeout is 120 minutes. A timeout usually means debug the
lane log/artifacts first, not “run the whole thing again.”
## Failure Workflow
1. Identify exact failing job, SHA, lane, and artifact path.
2. Read `failures.json`, `summary.json`, and the failed lane log tail.
3. Use `pnpm test:docker:rerun <run-id|failures.json>` to generate targeted
GitHub rerun commands.
4. If the lane has `rerunCommand`, use that only as a local starting point.
5. For Docker release failures, dispatch targeted `docker_lanes=<failed-lane>`
on GitHub before considering local Docker.
6. Patch narrowly, then rerun the failed file/lane only.
7. Broaden to `pnpm check:changed` or CI only after the isolated proof passes.
## When To Escalate
- Public SDK/plugin contract changes: run changed gate plus relevant extension
validation.
- Build output, lazy imports, package boundaries, or published surfaces:
include `pnpm build`.
- Workflow edits: run `pnpm check:workflows`.
- Release branch or tag validation: use release docs and GitHub workflows; avoid
local Docker unless Peter explicitly asks.

View File

@@ -1,4 +0,0 @@
interface:
display_name: "OpenClaw Testing"
short_description: "Choose cheap, targeted OpenClaw validation"
default_prompt: "Use $openclaw-testing to choose the cheapest safe test or CI verification path, inspect failures, and rerun only the relevant OpenClaw lane."

View File

@@ -1,63 +0,0 @@
---
name: parallels-discord-roundtrip
description: Run macOS Parallels smoke with Discord send, host verification, host reply, and guest readback proof.
---
# Parallels Discord Roundtrip
Use when macOS Parallels smoke must prove Discord two-way delivery end to end.
## Goal
Cover:
- install on fresh macOS snapshot
- onboard + gateway health
- guest `message send` to Discord
- host sees that message on Discord
- host posts a new Discord message
- guest `message read` sees that new message
## Inputs
- host env var with Discord bot token
- Discord guild ID
- Discord channel ID
- `OPENAI_API_KEY`
## Preferred run
```bash
export OPENCLAW_PARALLELS_DISCORD_TOKEN="$(
ssh peters-mac-studio-1 'jq -r ".channels.discord.token" ~/.openclaw/openclaw.json' | tr -d '\n'
)"
pnpm test:parallels:macos \
--discord-token-env OPENCLAW_PARALLELS_DISCORD_TOKEN \
--discord-guild-id 1456350064065904867 \
--discord-channel-id 1456744319972282449 \
--json
```
## Notes
- Snapshot target: closest to `macOS 26.3.1 fresh`.
- Snapshot resolver now prefers matching `*-poweroff*` clones when the base hint also matches. That lets the harness reuse disk-only recovery snapshots without passing a longer hint.
- If Windows/Linux snapshot restore logs show `PET_QUESTION_SNAPSHOT_STATE_INCOMPATIBLE_CPU`, drop the suspended state once, create a `*-poweroff*` replacement snapshot, and rerun. The smoke scripts now auto-start restored power-off snapshots.
- Harness configures Discord inside the guest; no checked-in token/config.
- Use the `openclaw` wrapper for guest `message send/read`; `node openclaw.mjs message ...` does not expose the lazy message subcommands the same way.
- Write `channels.discord.guilds` in one JSON object (`--strict-json`), not dotted `config set channels.discord.guilds.<snowflake>...` paths; numeric snowflakes get treated like array indexes.
- Avoid `prlctl enter` / expect for long Discord setup scripts; it line-wraps/corrupts long commands. Use `prlctl exec --current-user /bin/sh -lc ...` for the Discord config phase.
- Full 3-OS sweeps: the shared build lock is safe in parallel, but snapshot restore is still a Parallels bottleneck. Prefer serialized Windows/Linux restore-heavy reruns if the host is already under load.
- Harness cleanup deletes the temporary Discord smoke messages at exit.
- After a successful Discord roundtrip, shut down the macOS guest before handoff (`prlctl stop "macOS Tahoe"`). The macOS smoke harness should do this automatically after successful Discord proof; still stop the VM manually after ad-hoc Discord checks. Do not leave the Discord-configured VM running; it can keep reading/posting in `#maintainer` and spam Discord after the proof is complete.
- Per-phase logs: `/tmp/openclaw-parallels-smoke.*`
- Machine summary: pass `--json`
- If roundtrip flakes, inspect `fresh.discord-roundtrip.log` and `discord-last-readback.json` in the run dir first.
## Pass criteria
- fresh lane or upgrade lane requested passes
- summary reports `discord=pass` for that lane
- guest outbound nonce appears in channel history
- host inbound nonce appears in `openclaw message read` output

View File

@@ -1,85 +0,0 @@
---
name: release-openclaw-announcement
description: "Draft or post OpenClaw beta/stable Discord release announcements from changelog, GitHub release, registry, and validation evidence. Use when announcing a beta, stable release, release candidate, or asking what users should test after an OpenClaw release."
---
# OpenClaw Release Announcement
Use with `release-openclaw-maintainer` after a beta or stable release is live.
Use with `openclaw-discord` when actually posting to Discord.
## Evidence First
Before drafting focus areas, read real release evidence:
1. Current GitHub release body for the tag.
2. `CHANGELOG.md` section for the released base version.
3. Commits since the previous shipped version or the operator-specified base.
4. Registry/package metadata for the exact version and current dist-tag.
5. Validation status that is relevant to user confidence.
Do not claim a full changelog audit unless you did it. If you only read the
generated release notes or top changelog section, say that and either audit
properly or draft with that limitation.
For beta focus areas, prioritize user-observable changes over internal test or
CI mechanics:
- install/update paths
- OS/platform-specific behavior
- Gateway startup/restart, config, and runtime behavior
- provider/model/runtime routing
- plugin loading and local plugin development
- channels and media paths
- security/data-loss/user-impact fixes
Do not let late release-branch fixes automatically dominate the announcement.
If the version includes a large delta from the previous shipped version, rank
focus areas by the whole release delta and expected user impact; mention late
fixes in their natural category.
## Required Copy
Every beta announcement must make beta status explicit and include:
- exact version, e.g. `OpenClaw 2026.5.25-beta.1`
- one-sentence risk framing: beta, useful for testing, not stable promotion
- focused test areas derived from evidence, not guesswork
- update command promoted near the top:
```sh
openclaw update --channel beta --yes
openclaw --version
```
- fresh install path:
`Install from https://openclaw.ai`
- GitHub release link
- concise validation note, without making CI the headline
Do not suggest npm install commands in beta announcements unless the operator
explicitly asks for npm-specific copy or troubleshooting text. It is fine to use
registry metadata as evidence; do not turn that into public install guidance.
For stable announcements, use the stable channel wording:
```sh
openclaw update --channel stable --yes
openclaw --version
```
Fresh installs still point to `https://openclaw.ai`.
## Style
- Discord Markdown, no tables.
- Keep it skimmable: short intro, bullets, commands, links.
- Lead with what users can feel or test, not proof plumbing.
- Mention validation only after install/update instructions.
- Be specific about where feedback is useful.
- Do not mention private local proof paths in public announcements.
- Do not overstate unverified platforms, channels, or provider behavior.
## Posting
When asked to post, use the configured Discord workflow from
`openclaw-discord` or the approved OpenClaw relay. Never print tokens.
For public channels, inspect the final body before sending.

View File

@@ -1,4 +0,0 @@
interface:
display_name: "OpenClaw Release Announcement"
short_description: "Draft Discord beta/stable release announcements from evidence."
default_prompt: "Use this skill to draft an OpenClaw beta or stable Discord announcement from changelog, release notes, npm/GitHub release proof, and validation evidence."

View File

@@ -1,118 +0,0 @@
---
name: release-openclaw-ci
description: "Run, watch, debug, and summarize OpenClaw full release CI, release checks, live provider gates, install/update proofs, and release-secret preflights."
---
# OpenClaw Release CI
Use this with `$release-openclaw-maintainer` and `$openclaw-testing` when a release candidate needs full validation, install/update proof, live provider checks, or CI recovery.
## Guardrails
- No version bump, tag, npm publish, GitHub release, or release promotion without explicit operator approval.
- Validate provider secrets before dispatching expensive full release matrices.
- Do not set GitHub secrets from unvalidated 1Password candidates. If a candidate returns 401/403, leave the existing secret alone and report the exact missing provider.
- Use `$one-password` for secret reads/writes: one persistent tmux session, targeted items only, no secret output.
- Watch one parent run plus compact child summaries. Avoid broad `gh run view` polling loops; REST quota is easy to burn.
- Fetch logs only for failed or currently-blocking jobs. If quota is low, stop polling and wait for reset.
- Treat live-provider flakes separately from code failures: prove key validity, provider HTTP status, retry evidence, and exact failing lane before editing code.
## Preflight
Before full release validation:
```bash
node .agents/skills/release-openclaw-ci/scripts/verify-provider-secrets.mjs --required openai,anthropic,fireworks
gh api rate_limit --jq '.resources.core'
git status --short --branch
git rev-parse HEAD
```
1Password service-account values are the first source for release provider
preflight. Inject those exact targeted keys first, then run the verifier; use
ambient env only when it was already intentionally injected for this release.
The script prints only provider status and HTTP class, never tokens.
## Dispatch
Start product performance evidence as early as the release SHA exists, in
parallel with other release work:
```bash
gh workflow run openclaw-performance.yml \
--repo openclaw/openclaw \
--ref main \
-f target_ref=<release-sha> \
-f profile=release \
-f repeat=3 \
-f deep_profile=false \
-f live_openai_candidate=false \
-f fail_on_regression=false
```
- Do not wait for full release validation to start this early perf signal.
- Compare available Kova, gateway startup, and CLI startup metrics with earlier
release evidence or clawgrit reports before publish/closeout.
- Call out any regression in the release proof. Treat a major regression as a
release blocker until it is fixed, waived by the operator, or proven to be
infrastructure noise.
- Full Release Validation also records advisory product-performance evidence;
the early standalone run is for overlap and faster regression discovery.
Prefer the trusted workflow on `main`, target the exact release SHA:
```bash
gh workflow run full-release-validation.yml \
--repo openclaw/openclaw \
--ref main \
-f ref=<release-sha> \
-f provider=openai \
-f mode=both \
-f release_profile=full \
-f rerun_group=all
```
Use `release_profile=stable` unless the operator explicitly asks for the broad advisory provider/media matrix. Use narrow `rerun_group` after focused fixes.
## Watch
Use the summary helper instead of repeated raw polling:
```bash
node .agents/skills/release-openclaw-ci/scripts/release-ci-summary.mjs <full-release-run-id>
```
Then watch only when useful:
```bash
gh run watch <full-release-run-id> --repo openclaw/openclaw --exit-status
```
Stop watchers before ending the turn or switching strategy.
## Failure Triage
1. Confirm parent SHA and child run IDs.
2. List failed jobs only:
```bash
gh run view <child-run-id> --repo openclaw/openclaw --json jobs \
--jq '.jobs[] | select(.conclusion=="failure" or .conclusion=="timed_out" or .conclusion=="cancelled") | [.databaseId,.name,.conclusion,.url] | @tsv'
```
3. Fetch one failed job log. If rate-limited, note reset time and avoid more REST calls.
4. For secret-looking failures, validate the provider endpoint from the same secret source before editing code.
5. For live-cache failures, inspect whether it is missing/invalid key, empty text, provider refusal, timeout, or baseline miss. Do not weaken release gates without clear provider evidence.
6. Fix narrowly, run local/changed proof, commit, push, rerun the smallest matching group.
## Evidence
Record:
- release SHA
- full parent run URL
- child run IDs and conclusions: CI, Release Checks, Plugin Prerelease, NPM Telegram, Product Performance
- performance comparison result versus earlier releases when available
- targeted local proof commands
- provider-secret preflight result
- known gaps or unrelated failures
For lessons and recovery patterns, read `references/release-ci-notes.md`.

View File

@@ -1,4 +0,0 @@
interface:
display_name: "OpenClaw Release CI"
short_description: "Verify and debug OpenClaw release validation runs"
default_prompt: "Use $release-openclaw-ci to preflight provider secrets, watch full release validation, summarize child runs, and triage only failing release lanes."

View File

@@ -1,41 +0,0 @@
# Release CI Notes
## What Went Wrong
- Full validation was started before all provider keys were proven valid.
- GitHub secret presence was confused with key validity.
- Repeated `gh run view` and log fetches exhausted REST quota.
- Parent run state was less useful than child run evidence.
- Live-cache failures needed structured classification: invalid key, empty provider output, timeout, or real cache regression.
- Background watchers accumulated and made interruption recovery harder.
## Better Defaults
- Run provider-secret preflight first. Require real `/models` or equivalent endpoint checks for release-blocking providers.
- Keep one watcher open. Use child summaries every few minutes, not every few seconds.
- Fetch failed-job logs only after a job reaches a terminal failing state.
- Prefer narrow `rerun_group` recovery after a focused fix.
- Leave bad secrets unset. A 401 candidate from 1Password should not overwrite GitHub.
- Make the final release evidence note durable: parent URL, child run URLs, SHA, command proof, and gaps.
## Secret Handling Pattern
- Use `$one-password`; never run broad env dumps.
- Search exact item titles or known ids.
- Validate candidates without printing values.
- Set GitHub secrets only after endpoint validation succeeds.
- After setting, verify metadata with `gh secret list`, not value output.
## Live Cache Pattern
- Empty text with token usage is a provider/output issue until proven otherwise.
- Retry lane-level mismatches once with a fresh session id.
- Keep cache baselines strict, but log enough structured usage to distinguish cache miss from response mismatch.
- If a provider key validates locally but fails in Actions, inspect whether the workflow reads the expected secret name.
## Quota-Safe GitHub Pattern
- Check `gh api rate_limit --jq '.resources.core'` before log-heavy work.
- Use one child-run listing call, then inspect failed jobs only.
- If remaining quota is low, pause until reset; do not keep polling.
- Prefer GraphQL only for metadata when REST is exhausted; logs still need REST.

View File

@@ -1,121 +0,0 @@
#!/usr/bin/env node
import { execFileSync } from "node:child_process";
import process from "node:process";
const runId = process.argv[2];
const repo = process.env.OPENCLAW_RELEASE_REPO || "openclaw/openclaw";
if (!runId) {
console.error("usage: release-ci-summary.mjs <full-release-run-id>");
process.exit(2);
}
function gh(args) {
return execFileSync("gh", args, {
encoding: "utf8",
stdio: ["ignore", "pipe", "pipe"],
});
}
function jsonGh(args) {
return JSON.parse(gh(args));
}
function githubRestJson(pathSuffix) {
const result = execFileSync(
"bash",
[
"-lc",
[
"set -euo pipefail",
'token="$(gh auth token)"',
'curl -fsS -H "Authorization: Bearer ${token}" -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" "${OPENCLAW_GITHUB_REST_URL}"',
].join("\n"),
],
{
encoding: "utf8",
env: {
...process.env,
OPENCLAW_GITHUB_REST_URL: `https://api.github.com/repos/${repo}/${pathSuffix}`,
},
maxBuffer: 16 * 1024 * 1024,
stdio: ["ignore", "pipe", "pipe"],
},
);
return JSON.parse(result);
}
function rate() {
try {
return jsonGh(["api", "rate_limit"]).resources.core;
} catch {
return undefined;
}
}
const core = rate();
if (core) {
const reset = new Date(core.reset * 1000).toISOString();
console.log(`rate: remaining=${core.remaining}/${core.limit} reset=${reset}`);
if (core.remaining < 20) {
console.error("rate too low for CI summary; wait for reset before polling");
process.exit(3);
}
}
const parent = jsonGh([
"run",
"view",
runId,
"--repo",
repo,
"--json",
"status,conclusion,createdAt,headSha,url,jobs",
]);
console.log(`parent: ${runId} ${parent.status}/${parent.conclusion || "none"}`);
console.log(`sha: ${parent.headSha}`);
console.log(`url: ${parent.url}`);
for (const job of parent.jobs ?? []) {
const marker = job.conclusion || job.status;
console.log(`parent-job: ${marker} ${job.name}`);
}
const since = parent.createdAt;
const runsQuery = new URLSearchParams({
per_page: "100",
created: `>=${since}`,
exclude_pull_requests: "true",
});
const childWorkflowNames = new Set([
"CI",
"OpenClaw Release Checks",
"Plugin Prerelease",
"NPM Telegram Beta E2E",
"Full Release Validation",
]);
const runs = githubRestJson(`actions/runs?${runsQuery.toString()}`).workflow_runs ?? [];
const runList = runs
.filter(
(run) =>
run.created_at >= since &&
run.head_sha === parent.headSha &&
childWorkflowNames.has(run.name),
)
.map((run) =>
[run.id, run.name, run.status, run.conclusion ?? "", run.head_sha, run.html_url].join("\t"),
)
.join("\n");
if (!runList) {
console.log("children: none found yet");
process.exit(0);
}
console.log("children:");
for (const line of runList.split("\n")) {
const [id, name, status, conclusion, sha, url] = line.split("\t");
console.log(`child: ${id} ${name} ${status}/${conclusion || "none"} sha=${sha}`);
console.log(`child-url: ${url}`);
}

View File

@@ -1,113 +0,0 @@
#!/usr/bin/env node
import process from "node:process";
const args = new Map();
for (let index = 2; index < process.argv.length; index += 1) {
const arg = process.argv[index];
if (!arg.startsWith("--")) continue;
const [key, inlineValue] = arg.slice(2).split("=", 2);
const value = inlineValue ?? process.argv[index + 1];
if (inlineValue === undefined) index += 1;
args.set(key, value);
}
const requiredInput = String(args.get("required") ?? "openai,anthropic").trim();
const required = new Set(
(requiredInput.toLowerCase() === "none" ? "" : requiredInput)
.split(",")
.map((entry) => entry.trim().toLowerCase())
.filter(Boolean),
);
const timeoutMs = Number(args.get("timeout-ms") ?? 10_000);
function envFirst(names) {
for (const name of names) {
const value = process.env[name]?.trim();
if (value) return { name, value };
}
return undefined;
}
async function checkProvider(id, config) {
const secret = envFirst(config.env);
if (!secret) {
return { id, ok: false, status: "missing", env: config.env.join("|") };
}
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), timeoutMs);
try {
const headers = config.headers(secret.value);
const response = await fetch(config.url, {
headers,
signal: controller.signal,
});
return {
id,
ok: response.ok,
status: response.ok ? "ok" : `http_${response.status}`,
env: secret.name,
};
} catch (error) {
return {
id,
ok: false,
status: error?.name === "AbortError" ? "timeout" : "error",
env: secret.name,
};
} finally {
clearTimeout(timer);
}
}
const providers = {
openai: {
env: ["OPENAI_API_KEY"],
url: "https://api.openai.com/v1/models",
headers: (token) => ({ authorization: `Bearer ${token}` }),
},
anthropic: {
env: ["ANTHROPIC_API_KEY", "ANTHROPIC_API_TOKEN"],
url: "https://api.anthropic.com/v1/models",
headers: (token) => ({
"anthropic-version": "2023-06-01",
"x-api-key": token,
}),
},
fireworks: {
env: ["FIREWORKS_API_KEY"],
url: "https://api.fireworks.ai/inference/v1/models",
headers: (token) => ({ authorization: `Bearer ${token}` }),
},
openrouter: {
env: ["OPENROUTER_API_KEY"],
url: "https://openrouter.ai/api/v1/models",
headers: (token) => ({ authorization: `Bearer ${token}` }),
},
};
const unknown = [...required].filter((id) => !providers[id]);
if (unknown.length > 0) {
console.error(`unknown providers: ${unknown.join(",")}`);
process.exit(2);
}
const results = [];
for (const id of Object.keys(providers)) {
if (required.has(id) || envFirst(providers[id].env)) {
results.push(await checkProvider(id, providers[id]));
}
}
let failed = false;
for (const result of results) {
const requiredLabel = required.has(result.id) ? "required" : "optional";
console.log(`${result.id}: ${result.status} env=${result.env} ${requiredLabel}`);
if (required.has(result.id) && !result.ok) failed = true;
}
if (failed) {
console.error("release provider secret preflight failed");
process.exit(1);
}

View File

@@ -1,92 +0,0 @@
---
name: release-openclaw-mac
description: "Run or recover OpenClaw macOS release signing, notarization, appcast, and asset promotion."
---
# OpenClaw Mac Release
Use with `$release-openclaw-maintainer`, `$release-openclaw-ci`, `$one-password`, and `$release-private` if it exists when stable macOS assets, private mac preflight, notarization, appcast promotion, or mac release recovery is involved.
## Credentials
- Resolve Peter-owned ASC item refs, key ids, issuer ids, and service-token provenance from `$release-private`.
- Fields: `private_key_p8`, `key_id`, `issuer_id`.
- Stale/revoked key symptom: `xcrun notarytool submit` fails with `HTTP status code: 401. Unauthenticated`.
- Validate candidate ASC credentials with `xcrun notarytool history` before setting GitHub secrets.
## 1Password
- Use `$one-password`: all `op` work inside one persistent tmux session, no secret output.
- Use the service-token guidance from `$release-private` when available.
- If a service token fails, run status-only checks: token present/length and `op whoami`; never print token values.
- If desktop app auth is needed but Touch ID is unavailable, set `OP_BIOMETRIC_UNLOCK_ENABLED=false` for the manual `op account add --signin` path.
## GitHub Secrets
Target private repo environment: `openclaw/releases-private`, env `mac-release`.
Set only after local notary auth validation:
- `APP_STORE_CONNECT_API_KEY_P8`
- `APP_STORE_CONNECT_KEY_ID`
- `APP_STORE_CONNECT_ISSUER_ID`
Do not update these from mixed sources. All three ASC fields must come from the same 1Password item.
## Workflow Shape
- Public release branch may carry mac-only packaging fixes after the stable tag/npm are already live.
- Use `source_ref=release/YYYY.M.D` for private mac preflight/validation when building that branch variation.
- Keep `tag=vYYYY.M.D` pointing at the original stable release commit.
- Real mac publish must reuse:
- a successful private mac preflight run for the same tag/source SHA
- a successful private mac validation run for the same tag/source SHA
- If preflight source SHA differs from tag SHA, validation must also use the same `source_ref`; promotion rejects mismatched proof.
## Notarization
- OpenClaw uses `scripts/notarize-mac-artifact.sh`.
- `xcrun notarytool submit` should use `--no-s3-acceleration`; accelerated upload can surface misleading 401s even when `notarytool history` succeeds.
- If signing succeeds but notarization fails immediately with 401, check ASC key freshness first.
- If notarization stays in progress for several minutes after key-file write, that is normal Apple wait time; do not edit blindly.
## Dispatch
Private preflight:
```bash
gh workflow run openclaw-macos-publish.yml --repo openclaw/releases-private --ref main \
-f tag=vYYYY.M.D \
-f source_ref=release/YYYY.M.D \
-f preflight_only=true \
-f smoke_test_only=false \
-f allow_late_calver_recovery=false \
-f public_release_branch=release/YYYY.M.D
```
Private validation for a branch-variation preflight:
```bash
gh workflow run openclaw-macos-validate.yml --repo openclaw/releases-private --ref main \
-f tag=vYYYY.M.D \
-f source_ref=release/YYYY.M.D
```
Real publish:
```bash
gh workflow run openclaw-macos-publish.yml --repo openclaw/releases-private --ref main \
-f tag=vYYYY.M.D \
-f preflight_only=false \
-f smoke_test_only=false \
-f preflight_run_id=<successful-preflight-run> \
-f validate_run_id=<successful-validation-run> \
-f allow_late_calver_recovery=false \
-f public_release_branch=release/YYYY.M.D
```
## Verify
- `gh release view vYYYY.M.D --repo openclaw/openclaw` shows zip, dmg, dSYM zip, not draft, not prerelease.
- Public `main` `appcast.xml` points at `OpenClaw-YYYY.M.D.zip`.
- Appcast entry has `sparkle:version`, `sparkle:shortVersionString`, length, and `sparkle:edSignature`.

View File

@@ -1,677 +0,0 @@
---
name: release-openclaw-maintainer
description: Prepare or verify OpenClaw stable/beta releases, changelogs, release notes, publish commands, and artifacts.
---
# OpenClaw Release Maintainer
Use this skill for release and publish-time workflow. Load `$release-private` if it exists before resolving Peter-owned credential locators or private host topology. Keep ordinary development changes and GHSA-specific advisory work outside this skill.
## Respect release guardrails
- Do not change version numbers without explicit operator approval.
- Ask permission before any npm publish or release step.
- This skill should be sufficient to drive the normal release flow end-to-end.
- Use the private maintainer release docs for credentials, recovery steps, and mac signing/notary specifics, and use `docs/reference/RELEASING.md` for public policy.
- Core `openclaw` publish is manual `workflow_dispatch`; creating or pushing a tag does not publish by itself.
- Normal release work happens on a branch cut from `main`, not directly on
`main`. Use `release/YYYY.M.D` for the branch name.
- If the operator asks for a release without saying stable/full, default to
beta only. Continue from beta to stable only when the operator explicitly asks
for the full release or an automated beta-and-stable train.
- Before release branching, pull latest `main` and confirm current `main` CI is
green. Then branch from that commit so regular development can continue on
`main` while release validation runs.
- Before release branching, commit any dirty files in coherent groups, push,
pull/rebase, then generate `CHANGELOG.md` on `main` from merged PRs and all
direct commits since the last reachable release tag. Commit/push/pull that
changelog rewrite immediately before creating the release branch.
- During release planning, inspect both `src/plugins/compat/registry.ts` and
`src/commands/doctor/shared/deprecation-compat.ts` before branching and again
before final publish. For every deprecated or removal-pending compatibility
record whose `removeAfter` date is on or before the release date, either
remove the compatibility path where safe and validate the affected tests, or
write down why removal is blocked and get explicit maintainer approval before
shipping the expired compatibility path.
- When removing deprecated runtime/config compatibility, preserve any doctor
migration, repair, or hint that is still needed by supported upgrade paths.
Doctor-side compatibility should stay tracked in
`src/commands/doctor/shared/deprecation-compat.ts` until maintainers confirm
the repair is no longer needed.
- Revalidate compatibility replacement text during release planning. The
recommended replacement can shift as plugin ownership, externalization, and
config footprint move, so do not blindly copy stale replacement annotations
into release notes.
- Do not delete or rewrite beta tags after their matching npm package has been
published. If a pushed beta tag fails before npm publish, the version is not
consumed: keep the same `-beta.N`, delete/recreate or force-move the git tag
and prerelease to the fixed commit, and rerun preflight. Do not increment to
the next beta number until the matching npm package has actually published.
If a published beta needs a fix, commit the fix on the release branch and
increment to the next `-beta.N`.
- For a beta release train, run the fast local preflight first, publish the
beta to npm `beta`, then run the expensive published-package roster focused
on install/update/Docker/Parallels/NPM Telegram. If anything fails, fix it on
the release branch, commit/push/pull, increment beta number, and repeat. Run
the full expensive roster at least once before stable/latest promotion; for
later beta attempts, rerun only lanes whose evidence changed unless the fix
touches broad release, install/update, plugin, Docker, Parallels, or live QA
behavior. After each beta is published, scan current `main` once for critical
fixes that landed after the release branch cut and backport only important
low-risk fixes. Operators may authorize up to 4 autonomous beta attempts;
after 4 failed beta attempts, stop and report.
- As soon as the release candidate SHA exists, dispatch `OpenClaw Performance`
with `target_ref=<release-sha>` in parallel with the other release work. Do
not wait for full release validation to start the performance signal.
- Before publish/closeout, compare available product performance metrics with
earlier releases: Kova agent-turn/resource metrics, gateway startup
ready/listen/RSS/CPU metrics, and CLI startup metrics from release evidence
or clawgrit reports. Report regressions explicitly. A major regression is a
release blocker unless the operator waives it or the data clearly proves
infrastructure noise.
- Generate the changelog before every beta, beta rerun, stable release, or
stable rerun, before version/tag preparation. Use
`$openclaw-changelog-update` for the rewrite. Do not continue release prep if
the target `CHANGELOG.md` section does not have `### Highlights`,
`### Changes`, and `### Fixes`, grouped by user-facing surface while
preserving every relevant PR/issue ref and every human `Thanks @...`
attribution in the grouped bullet.
- Do not create beta-specific `CHANGELOG.md` headings. Beta releases use the
stable base version section, for example `v2026.4.20-beta.1` uses
`## 2026.4.20` release notes.
- When any beta or stable release is live, make a best-effort Discord
announcement using the configured secret workflow; do not block or roll back
the release if the announcement fails.
- When asked to announce on X, use `~/Projects/bird/bird` and follow the
release tweet style below.
## Keep release channel naming aligned
- `stable`: tagged releases only, published to npm `beta` by default; operators may target npm `latest` explicitly or promote later
- `beta`: prerelease tags like `vYYYY.M.D-beta.N`, with npm dist-tag `beta`
- Prefer `-beta.N`; do not mint new `-1` or `-2` beta suffixes
- `dev`: moving head on `main`
- When using a beta Git tag, publish npm with the matching beta version suffix so the plain version is not consumed or blocked
## Handle versions and release files consistently
- Version locations include:
- `package.json`
- `apps/android/app/build.gradle.kts`
- `apps/ios/Sources/Info.plist`
- `apps/ios/Tests/Info.plist`
- `apps/macos/Sources/OpenClaw/Resources/Info.plist`
- `docs/install/updating.md`
- Peekaboo Xcode project and plist version fields
- Before creating a release tag, make every version location above match the version encoded by that tag.
- For fallback correction tags like `vYYYY.M.D-N`, the repo version locations still stay at `YYYY.M.D`.
- “Bump version everywhere” means all version locations above except `appcast.xml`.
- Release signing and notary credentials live outside the repo in the private maintainer docs.
- Every stable OpenClaw release ships the npm package and macOS app together.
Beta releases normally ship npm/package artifacts first and skip mac app
build/sign/notarize unless the operator requests mac beta validation.
- Do not let the slower macOS signing/notary path block npm publication once
the npm preflight has passed. Keep mac validation/publish running in
parallel, publish npm from the successful npm preflight, then start published
npm install/update, Docker, and Parallels verification while mac artifacts
continue.
- After a beta is published, overlap remote/manual release rosters where useful,
but avoid piling local Docker, Parallels, and QA-Lab work onto the same host
when it would create system-load noise. Use selective reruns after failures or
fixes, but keep proof that Docker, Parallels, and QA-Lab each passed at least
once before stable/latest promotion.
- Mac packaging may be built from a slight release-branch variation of the
tagged commit when the delta is mac packaging, signing, workflow, or
validation-only release machinery. If mac packaging needs release-branch-only
fixes after the stable npm package or GitHub tag is already published, do not
create a `vYYYY.M.D-N` correction tag just to change the workflow source.
Dispatch the private mac workflows for the original `tag=vYYYY.M.D` with
`source_ref=release/YYYY.M.D` and `public_release_branch=release/YYYY.M.D`;
provenance checks must prove the source SHA descends from the tag and
validation/preflight use the same source. Reserve `vYYYY.M.D-N` correction
tags for emergency hotfixes that must publish a new npm package/release
identity, not for ordinary mac-only packaging recovery.
- The production Sparkle feed lives at `https://raw.githubusercontent.com/openclaw/openclaw/main/appcast.xml`, and the canonical published file is `appcast.xml` on `main` in the `openclaw` repo.
- That shared production Sparkle feed is stable-only. Beta mac releases may
upload assets to the GitHub prerelease, but they must not replace the shared
`appcast.xml` unless a separate beta feed exists.
- For fallback correction tags like `vYYYY.M.D-N`, the repo version still stays
at `YYYY.M.D`, but the mac release must use a strictly higher numeric
`APP_BUILD` / Sparkle build than the original release so existing installs
see it as newer.
## Build changelog-backed release notes
- `CHANGELOG.md` is release-owned. Normal PRs and direct `main` fixes should
not edit it.
- Before release branching or tagging, rewrite the target `CHANGELOG.md`
section from history, not existing notes. Use the last reachable stable or
beta release tag as the base, then inspect every commit through the target
release SHA.
- The changelog rewrite is not optional for beta reruns: any `beta.N` after a
rebase or backport must refresh the same stable-base `## YYYY.M.D` section
before the new version/tag commit.
- Include both merged PR commits and direct commits on `main`. Direct commits
matter: infer notes from their subject, body, touched files, linked issues,
tests, and nearby code when no PR body exists.
- Prefer PR bodies, issue links, review proof, and commit bodies over commit
subjects alone. If a commit fixed an issue directly, the commit body should
name the user-visible behavior, affected surface, issue ref, and credited
reporter/contributor when known.
- Treat missing context as a release-note audit gap: inspect the diff and linked
issue, draft the best accurate entry, and note the uncertainty for maintainer
review rather than inventing impact.
- Add missed user-facing changes, remove internal-only noise, dedupe overlapping
PR/direct-commit entries, and sort each section from most to least interesting
for users.
- Group related highlights, changes, and fixes by user-facing surface and
impact, but never lose traceability: each grouped bullet keeps every relevant
`#issue`, `(#PR)`, `Fixes #...`, and every human `Thanks @...` handle.
Multiple thanks in one bullet are expected when multiple contributor PRs are
grouped.
- Changelog entries should be user-facing, not internal release-process notes.
- GitHub release and prerelease bodies must use the full matching
`CHANGELOG.md` version section, not highlights or an excerpt. When creating
or editing a release, extract from `## YYYY.M.D` through the line before the
next level-2 heading and use that complete block as the release notes.
- When preparing release notes, scan `src/plugins/compat/registry.ts` and
`src/commands/doctor/shared/deprecation-compat.ts` for compatibility records
with `warningStarts` or `removeAfter` within 7 days after the release date.
Add an `Upcoming deprecations` note to the release notes when any exist,
including the compatibility code, target date, replacement, and a link to the
record's `docsPath` or `/plugins/compatibility` when no more specific
deprecation page exists.
- When cutting a mac release with a beta GitHub prerelease:
- tag `vYYYY.M.D-beta.N` from the release commit
- create a prerelease titled `openclaw YYYY.M.D-beta.N`
- use release notes from the stable base `CHANGELOG.md` version section
(`## YYYY.M.D`), not a beta-specific heading
- attach at least the zip and dSYM zip, plus dmg if available
- Keep the top version entries in `CHANGELOG.md` sorted by impact:
- `### Changes` first
- `### Fixes` deduped with user-facing fixes first
## Write release tweets
Use the OpenClaw account's existing release-post style:
- Format: `OpenClaw YYYY.M.D 🦞` or `🦞 OpenClaw YYYY.M.D is live`, blank line,
then 3-4 emoji-led bullets, blank line, one short punchline, then the release
link.
- For beta: say `OpenClaw YYYY.M.D-beta.N 🦞` or `OpenClaw YYYY.M.D beta N is
live`; keep it clearly beta and avoid implying stable promotion.
- Lead with user-visible capabilities, then important integrations, then
reliability/security/install fixes. Compress "lots of fixes" into one
readable bullet.
- Read the full changelog section before drafting. Do not lead with coverage,
CI, validation, or internal release mechanics unless the release is explicitly
about those. Peter prefers concrete user wins: features, integrations,
workflow improvements, and practical reliability fixes.
- Do not feature QA parity, test coverage, release gates, or validation lanes in
user-facing launch tweets. Keep them for release notes or maintainer proof
unless the operator explicitly asks for validation-focused copy.
- Do not feature plugin-author or developer tooling such as SDK helpers,
tool-plugin scaffolding, build/validate/init commands, or internal CLI
plumbing in general user-facing launch tweets unless the operator explicitly
asks for developer-focused copy.
- Tone: high-signal, slightly cheeky, confident, not corporate. One joke is
enough. Avoid punching down, insulting users, or promising what was not
verified.
- Peter likes dry, compact taglines when they feel earned. Good example:
`Big release, tiny release notes... kidding.` Keep the joke short and let the
feature bullets carry the tweet; do not turn the punchline into a second
paragraph or a forced bit.
- Length: release tweets are always standard tweets under 280 characters, with
room for one URL. Trim to 3-4 bullets and count the final text before posting.
- Links/media: include the GitHub release or changelog link at the end of the
first release tweet.
- Thread follow-ups: if doing a thread, keep the first release tweet as the
compact launch post, then publish one focused feature explainer per reply.
Follow-up replies should not repeat "new in VERSION" or the version number
when the thread context already makes it obvious.
- Peter's preferred thread workflow: first agree on the generic launch tweet,
then proceed through follow-up tweets one by one. When he says `next`, provide
or copy the next follow-up only; do not dump the full thread again unless asked.
- Every follow-up tweet should include a docs URL for that specific feature.
Prefer a bare URL over `Docs: <url>` unless the label is needed for clarity.
Keep follow-ups concise: around 160-220 raw characters is usually the sweet
spot; under 280 is the hard cap. If a URL makes a tweet fail, trim prose
before dropping the URL.
Prefer explaining diagnostics, trajectory/export, provider setup, model
commands, or other setup-heavy features in follow-ups instead of overloading
the first release tweet.
- Hotfix/correction: be direct and accountable. State what slipped, what is
fixed, and the new version. Keep jokes out of incident-style posts.
Examples to adapt:
```text
OpenClaw 2026.4.20-beta.1 🦞
🐳 Docker install/update smoke
🖥️ Parallels upgrade checks
🔧 Package verification tightened
Beta first. Stable after the gauntlet.
<release link>
```
```text
OpenClaw 2026.4.20 🦞
🚀 Faster install + update
🐳 Docker + Parallels verified
🍎 macOS signed + notarized
🔧 Channel/plugin fixes
Good boring release. Best kind.
<release link>
```
```text
Packaging issue in 2026.4.20-beta.1.
2026.4.20-beta.2 fixes install/update verification. No tag rewrites; beta moves
forward.
Upgrade with the beta channel.
<release link>
```
## Run publish-time validation
Before tagging or publishing, run:
```bash
pnpm check:architecture
pnpm build
pnpm ui:build
pnpm qa:otel:smoke
pnpm release:check
pnpm test:install:smoke
```
- Use `pnpm qa:otel:smoke` when release validation needs telemetry coverage.
It starts a local OTLP/HTTP trace receiver, runs QA-lab's
`otel-trace-smoke`, and checks span names plus content/identifier redaction
without external Opik or Langfuse credentials.
For a non-root smoke path:
```bash
OPENCLAW_INSTALL_SMOKE_SKIP_NONROOT=1 pnpm test:install:smoke
```
After npm publish, run:
```bash
node --import tsx scripts/openclaw-npm-postpublish-verify.ts <published-version>
```
- This verifies the published registry install path in a fresh temp prefix.
- For stable correction releases like `YYYY.M.D-N`, it also verifies the
upgrade path from `YYYY.M.D` to `YYYY.M.D-N` so a correction publish cannot
silently leave existing global installs on the old base stable payload.
- Treat install smoke as a pack-budget gate too. `pnpm test:install:smoke`
now fails the candidate update tarball when npm reports an oversized
`unpackedSize`, so release-time e2e cannot miss pack bloat that would risk
low-memory install/startup failures.
- Keep direct npm global coverage enabled in install smoke. It exercises plain
`npm install -g <candidate>` fresh installs and npm-driven update installs,
because many users install with npm even when docs prefer pnpm.
- Use `pnpm test:live:media video` for bounded video-provider smoke when video
generation is in release scope. The default video smoke skips `fal`, runs one
text-to-video attempt per provider with a one-second lobster prompt, and caps
each provider operation with `OPENCLAW_LIVE_VIDEO_GENERATION_TIMEOUT_MS`
(`180000` by default).
- Run `pnpm test:live:media video --video-providers fal` only when FAL-specific
proof is required. Its queue latency can dominate release time.
- Set `OPENCLAW_LIVE_VIDEO_GENERATION_FULL_MODES=1` only when intentionally
validating the slower image-to-video and video-to-video transform lanes.
## Check all relevant release builds
- Always validate the OpenClaw npm release path before creating the tag.
- Use the configured secret workflow before live release validation so OpenAI
and Anthropic credentials are available without printing secrets.
- Parallels validation and any local live model QA for this train must use both
`OPENAI_API_KEY` and `ANTHROPIC_API_KEY`. If either cannot be injected, stop
before starting those local long lanes and report the missing key.
- Live credentialed channel QA is the GitHub Actions workflow
`QA-Lab - All Lanes` (`.github/workflows/qa-live-telegram-convex.yml`), not a
local substitute. Dispatch it from Actions against the release tag and wait
for it to pass before npm preflight/publish readiness. Use a SHA only when it
satisfies the workflow's secret-bearing trust gate: main ancestor or open PR
head. It runs the QA Lab mock parity gate plus live Matrix and live Telegram
lanes using the `qa-live-shared` environment; Telegram uses Convex CI
credential leases.
- Default release checks:
- `pnpm check`
- `pnpm check:test-types`
- `pnpm check:architecture`
- `pnpm build`
- `pnpm ui:build`
- `pnpm release:check`
- `OPENCLAW_INSTALL_SMOKE_SKIP_NONROOT=1 pnpm test:install:smoke`
- Full pre-npm beta test roster:
- default release checks above
- all Docker tests: `pnpm test:docker:all`, plus standalone Docker live lanes
not covered by the aggregate when operator says "all docker tests":
`pnpm test:docker:live-acp-bind`, `pnpm test:docker:live-cli-backend`, and
`pnpm test:docker:live-codex-harness`
- all Parallels install/update tests:
`pnpm test:parallels:npm-update -- --json` plus any needed individual
rerun lanes from `openclaw-parallels-smoke`
- all QA release validation: dispatch GitHub Actions > `QA-Lab - All Lanes`
against the release tag and require success. This is the release gate for
live credentialed Matrix/Telegram channel coverage. Use a SHA only when it
satisfies the workflow trust gate. Run local OpenAI/Anthropic suites or
repo-backed character evals only when the operator asks for extra model
coverage or a failure needs local debugging.
- Post-published beta verification roster:
- `node --import tsx scripts/openclaw-npm-postpublish-verify.ts <beta-version>`
- install/update smoke against the published beta channel
- Docker install/update coverage that exercises the published beta package
- published npm Telegram proof: dispatch Actions > `NPM Telegram Beta E2E`
from `main` with `package_spec=openclaw@<beta-version>` and
`provider_mode=mock-openai`, and require success. This workflow is
maintainer-dispatched and intentionally has no `npm-release` approval gate;
`qa-live-shared` only supplies the shared QA secrets. This is the default
button path for installed-package onboarding, Telegram setup, and real
Telegram E2E against the published npm package.
Use the local `pnpm test:docker:npm-telegram-live` lane with the matching
`OPENCLAW_NPM_TELEGRAM_PACKAGE_SPEC` and Convex CI env only as a fallback
or debugging path.
- Parallels published beta install/update coverage with both OpenAI and
Anthropic provider keys available
- Parallels install/update proof must keep plugin installs enabled unless the
operator explicitly scopes a harness-only isolation check; a lane that
disables bundled plugin installs is not valid plugin/dependency release
evidence.
- targeted QA reruns only for areas touched by fixes after the full pre-npm
roster, unless the operator requests the full QA roster again. If the fix
touches live channel QA, credential plumbing, Matrix, Telegram, or the QA
harness, rerun Actions > `QA-Lab - All Lanes`.
- Check all release-related build surfaces touched by the release, not only the npm package.
- For beta-style full e2e batteries, hard-cap top-level long lanes instead of letting them run indefinitely. Use host `timeout --foreground`/`gtimeout --foreground` caps such as:
- `45m` for `OPENCLAW_INSTALL_SMOKE_SKIP_NONROOT=1 pnpm test:install:smoke`
- `90m` for `pnpm test:docker:all`
- `60m` each for standalone Docker live lanes
- `180m` for local full QA live OpenAI + Anthropic rosters when explicitly
requested; the default release channel QA gate is Actions >
`QA-Lab - All Lanes`
- Parallels caps from the `openclaw-parallels-smoke` skill
If a lane hits its cap, stop and inspect/fix the affected lane before continuing; do not continue to wait on the same process.
- Actual npm install/update phases are capped at 5 minutes. If `npm install -g`, installer package install, or `openclaw update` takes longer than 300s in release e2e, stop treating the run as healthy progress and debug the installer/updater or harness.
- Serialize host build/package mutations ahead of VM lanes. Finish `pnpm build`, `pnpm ui:build`, `pnpm release:check`, install smoke, and any Docker/package-prep lanes before starting Parallels `npm pack` lanes; otherwise `dist` can disappear during VM pack prep and produce false failures.
- Include mac release readiness in preflight by running the public validation
workflow in `openclaw/openclaw` and the real mac preflight in
`openclaw/releases-private` for every release.
- Treat the `appcast.xml` update on `main` as part of mac release readiness, not an optional follow-up.
- The workflows remain tag-based. The agent is responsible for making sure
preflight runs complete successfully before any publish run starts.
- Any fix after preflight means a new commit. Delete and recreate the tag and
matching GitHub release from the fixed commit, then rerun preflight from
scratch before publishing.
Exception: never delete or recreate a beta tag whose matching npm package has
already been published; increment to the next beta number instead. If only the
pushed tag/prerelease exists and npm publish has not happened, recreate that
same beta tag at the fixed commit.
- For stable mac releases, generate the signed `appcast.xml` before uploading
public release assets so the updater feed cannot lag the published binaries.
- Serialize stable appcast-producing runs across tags so two releases do not
generate replacement `appcast.xml` files from the same stale seed.
- For stable releases, rely primarily on the latest beta's broader release
workflow confidence. When promoting the matching non-beta build to npm
`latest`, prefer a light time-bounded verification pass: published npm
postpublish verify, Docker install/update smoke, macOS-only Parallels
install/update smoke, and required QA signal. Do not rerun the full
Docker/Parallels matrix unless the beta evidence is stale, the stable build
differs materially from beta, or the operator explicitly asks for full
retesting.
- If any required build, packaging step, or release workflow is red, do not say the release is ready.
## Use the right auth flow
- OpenClaw publish uses GitHub trusted publishing.
- Stable npm promotion from `beta` to `latest` uses the private
`openclaw/releases-private/.github/workflows/openclaw-npm-dist-tags.yml`
workflow because `npm dist-tag` management needs `NPM_TOKEN`, while the
public npm release workflow stays OIDC-only.
- Prefer fixing the private workflow token path over any local 1Password
fallback. The desired setup is a granular npm token stored as the private
repo's `NPM_TOKEN` secret, scoped to the `openclaw` package with read/write
and 2FA bypass for automation.
- If the private dist-tag workflow cannot promote because `NPM_TOKEN` is absent
or stale, use the local tmux + 1Password fallback:
- Start or reuse a tmux session so interactive `npm login` and OTP prompts
are observable and recoverable.
- Hard rule: never run `op` directly in the main agent shell during release
work. Any 1Password CLI use must happen inside that tmux session so prompts
and alerts are contained and observable.
- Use `$release-private` for the npm credentials and OTP item.
Do not print passwords, tokens, or OTPs to the transcript; send them through
tmux buffers, env vars scoped to the tmux command, or `expect` with
`log_user 0`.
- Re-authenticate npm inside that tmux session with
`npm login --auth-type=legacy`, then confirm `npm whoami` reports
`steipete`.
- Promote with a fresh OTP:
`npm dist-tag add openclaw@YYYY.M.D latest --otp "$OTP"`.
- Verify with a cache-bypassed registry read, for example:
`npm view openclaw dist-tags --json --prefer-online --cache /tmp/openclaw-npm-cache-verify-$$`
and `npm view openclaw@latest version dist.tarball --json --prefer-online`.
- Direct stable publishes can also use that private dist-tag workflow to point
`beta` at the already-published `latest` version when the operator wants both
tags aligned immediately.
- The publish run must be started manually with `workflow_dispatch`.
- The npm workflow and the private mac publish workflow accept
`preflight_only=true` to run validation/build/package steps without uploading
public release assets.
- Real npm publish requires a prior successful npm preflight run id so the
publish job promotes the prepared tarball instead of rebuilding it.
- Real private mac publish requires a prior successful private mac preflight
run id so the publish job promotes the prepared artifacts instead of
rebuilding or renotarizing them again.
- The private mac workflow also accepts `smoke_test_only=true` for branch-safe
workflow smoke tests that use ad-hoc signing, skip notarization, skip shared
appcast generation, and do not prove release readiness.
- `preflight_only=true` on the npm workflow is also the right way to validate an
existing tag after publish; it should keep running the build checks even when
the npm version is already published.
- npm validation-only preflight may still be dispatched from ordinary branches
when testing workflow changes before merge. Release checks and real publish
use only `main` or `release/YYYY.M.D`.
- `.github/workflows/macos-release.yml` in `openclaw/openclaw` is now a
public validation-only handoff. It validates the tag/release state and points
operators to the private repo. It still rebuilds the JS outputs needed for
release validation, but it does not sign, notarize, or publish macOS
artifacts.
- `openclaw/releases-private/.github/workflows/openclaw-macos-validate.yml`
is the required private mac validation lane for `swift test`; keep it green
before any real stable mac publish run starts.
- Real mac preflight and real mac publish both use
`openclaw/releases-private/.github/workflows/openclaw-macos-publish.yml`.
- The private mac validation lane runs on GitHub's standard macOS runner.
- The private mac preflight path runs on GitHub's xlarge macOS runner and uses
a SwiftPM cache because the build/sign/notarize/package path is CPU-heavy.
- Private mac preflight uploads notarized build artifacts as workflow artifacts
instead of uploading public GitHub release assets.
- Private smoke-test runs upload ad-hoc, non-notarized build artifacts as
workflow artifacts and intentionally skip stable `appcast.xml` generation.
- For stable releases, npm preflight, public mac validation, private mac
validation, and private mac preflight must all pass before any real publish
run starts. For beta releases, npm preflight plus the selected Docker,
install/update, Parallels, and release-check lanes are sufficient unless mac
beta validation was explicitly requested.
- Real publish runs may be dispatched from `main` or from a
`release/YYYY.M.D` branch. For release-branch runs, the tag must be contained
in that release branch, and the real publish must reuse a successful preflight
from the same branch.
- The release workflows stay tag-based; rely on the documented release sequence
rather than workflow-level SHA pinning.
- The `npm-release` environment must be approved by `@openclaw/openclaw-release-managers` before publish continues.
- Mac publish uses
`openclaw/releases-private/.github/workflows/openclaw-macos-publish.yml` for
private mac preflight artifact preparation and real publish artifact
promotion.
- Real private mac publish uploads the packaged `.zip`, `.dmg`, and
`.dSYM.zip` assets to the existing GitHub release in `openclaw/openclaw`
automatically when `OPENCLAW_PUBLIC_REPO_RELEASE_TOKEN` is present in the
private repo `mac-release` environment.
- For stable releases, the agent must also download the signed
`macos-appcast-<tag>` artifact from the successful private mac workflow and
then update `appcast.xml` on `main`.
- For beta mac releases, do not update the shared production `appcast.xml`
unless a separate beta Sparkle feed exists.
- The private repo targets a dedicated `mac-release` environment. If the GitHub
plan does not yet support required reviewers there, do not assume the
environment alone is the approval boundary; rely on private repo access and
CODEOWNERS until those settings can be enabled.
- Do not use `NPM_TOKEN` or the plugin OTP flow for the OpenClaw package
publish path; package publishing uses trusted publishing.
- Use `NPM_TOKEN` only for explicit npm dist-tag management modes, because npm
does not support trusted publishing for `npm dist-tag add`.
- `@openclaw/*` plugin publishes use a separate maintainer-only flow.
- Only publish plugins that already exist on npm; bundled disk-tree-only plugins stay unpublished.
## Fallback local mac publish
- Keep the original local macOS publish workflow available as a fallback in case
CI/CD mac publishing is unavailable or broken.
- Preserve the existing maintainer workflow Peter uses: run it on a real Mac
with local signing, notary, and Sparkle credentials already configured.
- Follow the private maintainer macOS runbook for the local steps:
`scripts/package-mac-dist.sh` to build, sign, notarize, and package the app;
manual GitHub release asset upload; then `scripts/make_appcast.sh` plus the
`appcast.xml` commit to `main`.
- `scripts/package-mac-dist.sh` now fails closed for release builds if the
bundled app comes out with a debug bundle id, an empty Sparkle feed URL, or a
`CFBundleVersion` below the canonical Sparkle build floor for that short
version. For correction tags, set a higher explicit `APP_BUILD`.
- `scripts/make_appcast.sh` first uses `generate_appcast` from `PATH`, then
falls back to the SwiftPM Sparkle tool output under `apps/macos/.build`.
- For stable tags, the local fallback may update the shared production
`appcast.xml`.
- For beta tags, the local fallback still publishes the mac assets but must not
update the shared production `appcast.xml` unless a separate beta feed exists.
- Treat the local workflow as fallback only. Prefer the CI/CD publish workflow
when it is working.
- After any stable mac publish, verify all of the following before you call the
release finished:
- the GitHub release has `.zip`, `.dmg`, and `.dSYM.zip` assets
- `appcast.xml` on `main` points at the new stable zip
- the packaged app reports the expected short version and a numeric
`CFBundleVersion` at or above the canonical Sparkle build floor
## Run the release sequence
1. Confirm the operator explicitly wants to cut a release.
2. Choose the exact target version and git tag.
3. Commit any dirty files in coherent groups, push, pull/rebase, and verify the
worktree is clean.
4. Pull latest `main` and confirm current `main` CI is green.
5. Run `/changelog` for the stable base target version on `main`, commit the
changelog rewrite immediately, push, and pull/rebase. For beta releases,
keep the changelog heading as `## YYYY.M.D`, not `## YYYY.M.D-beta.N`.
6. Create `release/YYYY.M.D` from that post-changelog `main` commit.
7. Make every repo version location match the beta tag before creating it.
8. Commit release preparation changes on the release branch and push the branch.
9. Immediately dispatch Actions > `OpenClaw Performance` from `main` with
`target_ref=<release-sha>`, `profile=release`, `repeat=3`, deep profiling
off, live OpenAI off, and regression failure off. Let it run in parallel
with preflight and validation work.
10. Run the fast local beta preflight from the release branch before any npm
preflight or publish. Keep expensive Docker, Parallels, and published-package
install/update lanes for after the beta is live unless the operator asks to
run them before beta publication.
11. For beta releases, skip mac app build/sign/notarize unless beta scope or a
release blocker specifically requires it. For stable releases, include the
mac app, signing, notarization, and appcast path.
12. Confirm the target npm version is not already published.
13. Create and push the git tag from the release branch.
14. Create or refresh the matching GitHub release.
15. Dispatch Actions > `QA-Lab - All Lanes` against the release tag and wait
for the mock parity, live Matrix, and live Telegram credentialed-channel
lanes to pass.
16. Start `.github/workflows/openclaw-npm-release.yml` from the release branch
with `preflight_only=true`
and choose the intended `npm_dist_tag` (`beta` default; `latest` only for
an intentional direct stable publish). Wait for it to pass. Save that run id
because the real publish requires it to reuse the prepared npm tarball.
17. Before real publish, review the early performance run if it has completed.
Compare against earlier release evidence or clawgrit reports where
available. Call out minor regressions in the release proof; block on major
regressions unless waived or proven noisy.
18. For stable releases, start `.github/workflows/macos-release.yml` in
`openclaw/openclaw` and wait for the public validation-only run to pass.
19. For stable releases, start
`openclaw/releases-private/.github/workflows/openclaw-macos-validate.yml`
with the same tag and wait for the private mac validation lane to pass.
20. For stable releases, start
`openclaw/releases-private/.github/workflows/openclaw-macos-publish.yml`
with `preflight_only=true` and wait for it to pass. Save that run id because
the real publish requires it to reuse the notarized mac artifacts.
21. If any preflight or validation run fails, fix the issue on a new commit,
delete the tag and matching GitHub release, recreate them from the fixed
commit, and rerun all relevant preflights from scratch before continuing.
Never reuse old preflight results after the commit changes. For pushed or
published beta tags, do not delete/recreate; increment to the next beta tag.
For preflight-only failures where npm did not publish the beta version,
delete/recreate the same beta tag and prerelease at the fixed commit instead
of skipping a prerelease number.
22. Start `.github/workflows/openclaw-npm-release.yml` from the same branch with
the same tag for the real publish, choose `npm_dist_tag` (`beta` default,
`latest` only when you intentionally want direct stable publish), keep it
the same as the preflight run, and pass the successful npm
`preflight_run_id`.
23. Wait for `npm-release` approval from `@openclaw/openclaw-release-managers`.
24. Run postpublish verification:
`node --import tsx scripts/openclaw-npm-postpublish-verify.ts <published-version>`.
25. Run the post-published beta verification roster. First scan current `main`
for critical fixes that landed after the release branch cut; backport only
important low-risk fixes before starting expensive lanes, or increment to
the next beta if the fix must change the already-published package. If any
lane fails after the beta package is published, fix, commit/push/pull,
increment to the next beta tag, and rerun the affected beta evidence. Once
the beta is live, start remote/manual rosters where they
can overlap safely, but keep local Docker and Parallels load controlled.
Ensure the full expensive roster has passed at least once before
stable/latest promotion. The roster includes the manual Actions >
`NPM Telegram Beta E2E` workflow against the exact published beta package.
If a pre-npm lane fails before any tag/package leaves the machine, fix and
rerun the same intended beta attempt. Repeat up to the operator's
authorized beta-attempt limit, normally 4.
26. Announce the beta/stable release on Discord best-effort using the configured secret workflow.
27. If the operator requested beta only, stop after beta verification and the
announcement.
28. If the stable release was published to `beta`, use the light stable
promotion roster when the matching beta already carried the full confidence
pass: published npm postpublish verify, Docker install/update smoke,
macOS-only Parallels install/update smoke, and required QA signal.
Then start the private
`openclaw/releases-private/.github/workflows/openclaw-npm-dist-tags.yml`
workflow to promote that stable version from `beta` to `latest`, then
verify `latest` now points at that version.
29. If the stable release was published directly to `latest` and `beta` should
follow it, start that same private dist-tag workflow to point `beta` at the
stable version, then verify both `latest` and `beta` point at that version.
30. For stable releases, start
`openclaw/releases-private/.github/workflows/openclaw-macos-publish.yml`
for the real publish with the successful private mac `preflight_run_id` and
wait for success.
31. Verify the successful real private mac run uploaded the `.zip`, `.dmg`,
and `.dSYM.zip` artifacts to the existing GitHub release in
`openclaw/openclaw`.
32. For stable releases, download `macos-appcast-<tag>` from the successful
private mac run, update `appcast.xml` on `main`, and verify the feed. Merge
or cherry-pick release branch changes back to `main` after stable succeeds.
33. For beta releases, publish the mac assets only when intentionally requested;
expect no shared production
`appcast.xml` artifact and do not update the shared production feed unless a
separate beta feed exists.
34. After publish, verify npm and the attached release artifacts.
## GHSA advisory work
- Use `openclaw-ghsa-maintainer` for GHSA advisory inspection, patch/publish flow, private-fork validation, and GHSA API-specific publish checks.

View File

@@ -1,288 +0,0 @@
---
name: release-openclaw-nightly
description: "OpenClaw Tideclaw alpha/nightly release automation: isolated branches, local fixes, release CI, branch retention, and forward-port to main."
---
# Nightly Release
Use for Tideclaw/OpenClaw alpha/nightly release automation, manual alpha triggers, beta prep, release-branch repair, and post-release forward-port. Load `$release-private` if it exists before using Tideclaw host paths, cron ids, or Discord routing ids.
## Policy
- Alpha/nightly runs every 12h or by manual trigger.
- Beta is human-triggered from Discord from a proven alpha/release branch.
- Stable/latest always needs explicit human confirmation.
- Never publish from a dirty checkout or directly from `main`.
- Main can be busy or broken; alpha work must be isolated so transient main failures do not block a usable nightly.
- Publish only after release-branch proof is green.
- After a successful alpha, forward-port release-branch commits back to `main` and prove main CI green.
- Forward-port PRs contain only reusable fixes needed to make nightly/release checks pass. They must not contain alpha version bumps, release notes, changelog release entries, tags, generated artifacts, or state-file updates.
- Keep only alpha/nightly branches from the last 3 days, plus any branch with an active run, open PR, or release tag.
- Never run broad env/token dumps. For GitHub writes on the Tideclaw host, use the Tideclaw `gh` write wrapper below.
## Identity
Tideclaw should commit under its own machine identity on release branches and forward-port branches:
```bash
git config user.name "Tideclaw"
git config user.email "tideclaw@openclaw.ai"
```
This is good for auditability if commits are clearly machine-authored and gated by CI. Avoid direct pushes to protected `main`; forward-port via PR/automerge unless the repo policy explicitly allows the bot to push after green checks. Include human `Co-authored-by` only when a human supplied the patch or explicit commit text.
## Branch Shape
- Branch prefix: `tideclaw/alpha/`
- Branch name: `tideclaw/alpha/YYYY-MM-DD-HHMMZ`
- Base: current `origin/main` SHA at trigger time.
- State file: resolve from `$release-private` on the Tideclaw host.
- Release tag: `vYYYY.M.D-alpha.N`
- npm dist-tag: `alpha`
Do not reuse old alpha branches for a new run. If rerunning the same base SHA, create a new timestamped branch and record why.
## Start
1. Work in the Tideclaw host checkout from `$release-private`.
2. Fetch first:
```bash
git fetch origin main --tags --prune
git switch main
git merge --ff-only origin/main
BASE_SHA="$(git rev-parse origin/main)"
BRANCH="tideclaw/alpha/$(date -u +%Y-%m-%d-%H%MZ)"
git switch -c "$BRANCH" "$BASE_SHA"
```
3. Read repo release docs/scripts before changing anything:
- `AGENTS.md`
- release docs under `docs/`
- release scripts under `scripts/`
- `.github/workflows/*release*`
4. Compare `$BASE_SHA` with the last successful alpha state and current git/npm/GitHub alpha tags. If already released, report skip and do not publish.
Manual trigger:
```bash
CRON_ID="<from release-private>"
OPENCLAW_ALLOW_ROOT=1 openclaw cron run "$CRON_ID" --expect-final --timeout 21600000
```
## Discord Alpha Trigger
Tideclaw may run alpha immediately from Discord when a maintainer mentions Tideclaw in `#releases` or `#maintainers`.
Accepted shapes:
```text
@Tideclaw run alpha now
@Tideclaw alpha release from main now
@Tideclaw trigger alpha
```
Rules:
1. Treat this as a manual alpha trigger equivalent to the alpha cron job.
2. Start from current `origin/main` and create a fresh `tideclaw/alpha/YYYY-MM-DD-HHMMZ` branch.
3. Follow the normal alpha workflow: reuse prior fixes, run local checks, fix on the alpha branch, run release CI, publish alpha after green gates, then forward-port reusable fixes via fixes-only PR.
4. If another alpha/beta/stable release run is already active, report the active branch/run and stop.
5. `#maintainers` trigger requires an explicit Tideclaw mention; do not react to unmentioned release chatter there.
6. Resolve Discord role/user ids and live host hotfix notes from `$release-private`.
## Discord Beta Trigger
Tideclaw may run beta releases from `#releases` or mentioned `#maintainers` commands only when a maintainer sends an explicit beta trigger. Treat this as human approval for beta, not for stable/latest.
Accepted shapes:
```text
@Tideclaw beta release from vYYYY.M.D-alpha.N
@Tideclaw beta release from tideclaw/alpha/YYYY-MM-DD-HHMMZ
@Tideclaw beta release from latest proven alpha
```
Rules:
1. Require the words `beta release` and a source alpha tag/branch, or `latest proven alpha`.
2. If the source is ambiguous, ask one clarifying question in `#releases` and stop.
3. Verify the source alpha first: GitHub release, npm `alpha` package, release CI, recorded state file, and branch/tag SHA.
4. Create a fresh beta branch `tideclaw/beta/YYYY-MM-DD-HHMMZ` from the proven alpha source, not directly from a moving `main`.
5. Reuse/squash only stabilization fixes already proven on alpha. Do not import unrelated alpha release mechanics unless the beta release docs require them.
6. Compute beta as `vYYYY.M.D-beta.N`, matching npm `--tag beta`.
7. Run beta release validation/preflight/full release CI and fix failures on the beta branch.
8. Publish beta only after green beta gates. Use GitHub Actions/OIDC, never direct npm publish from the host.
9. Final Discord summary must include source alpha, beta tag/version, branch, fix commits, workflow run IDs, npm/GitHub proof, and any skipped/blocked reason.
10. After beta publishes, forward-port reusable fixes to `main` using the same fixes-only PR rules below.
## Reuse Prior Fixes
Before running checks, mine recent Tideclaw alpha branches for fixes already made during previous release attempts:
1. Read the Tideclaw state file from `$release-private` for the last successful alpha branch and fix commit SHAs.
2. List recent remote branches:
```bash
git for-each-ref refs/remotes/origin/tideclaw/alpha --format='%(refname:short) %(committerdate:iso-strict)'
```
3. Consider only Tideclaw alpha branches from the last 3 days plus the last successful alpha branch.
4. For each candidate branch, inspect commits that are not in current `origin/main`:
```bash
git log --no-merges --reverse --format='%H%x09%s' origin/main..origin/tideclaw/alpha/YYYY-MM-DD-HHMMZ
```
5. Cherry-pick only real stabilization fixes that still apply to the new alpha branch. Prefer commits recorded as `fixCommitShas` in the state file.
6. Skip version bumps, changelog release entries, tag artifacts, generated release notes, state-file-only commits, and one-off debug instrumentation.
7. If a cherry-pick conflicts, inspect whether current main already contains an equivalent fix. If not, resolve minimally and keep the commit message clear.
8. Record reused commit SHAs separately from newly authored fix SHAs in the alpha state and final Discord summary.
Use `git cherry`, `git range-diff`, and targeted test reruns to avoid duplicating fixes already present on `main`.
## Repair Loop
Use the branch as a release-candidate repair surface:
1. Run narrow local checks first: changed tests, release preflight, type/lint/build gates required by release docs.
2. If local checks fail, fix on the alpha branch with minimal commits.
3. Commit each coherent fix as Tideclaw.
4. Re-run the failed local check after each fix.
5. Do not hide failures by editing baselines, expected-failure lists, ignore files, or release inventory unless the release docs explicitly require it and the diff is justified.
6. If a failure is flaky, rerun once; if still red, treat it as real.
7. If the fix is clearly useful for main, keep it small and forward-portable. Avoid broad refactors during alpha stabilization.
Commit examples:
```bash
git add <files>
git commit -m "fix: stabilize alpha release preflight"
git push -u origin "$BRANCH"
```
## Release CI
After local proof:
1. Compute the next `vYYYY.M.D-alpha.N` from existing git tags, npm versions, and GitHub releases.
2. Make the alpha branch package version and release metadata match that tag, commit it, and push the branch.
3. Run release validation from the alpha branch, using GitHub CLI, not browser/fetch tools. On the Tideclaw host, bare `gh` is a read-only Codex sandbox wrapper; use `/usr/local/bin/gh-tideclaw-write` for write-capable commands such as `workflow run`, `run cancel`, and publish dispatch:
```bash
GH="/usr/local/bin/gh-tideclaw-write"
SHA="$(git rev-parse HEAD)"
TAG="v$(node -p "require('./package.json').version")"
BRANCH="$(git branch --show-current)"
"$GH" workflow run full-release-validation.yml --repo openclaw/openclaw --ref "$BRANCH" \
-f ref="$BRANCH" \
-f release_profile=beta \
-f rerun_group=all
"$GH" workflow run openclaw-npm-release.yml --repo openclaw/openclaw --ref "$BRANCH" \
-f tag="$SHA" \
-f preflight_only=true \
-f npm_dist_tag=alpha
```
4. Watch the exact workflow run IDs and head SHA with `gh run list`, `gh run view`, and `gh api`. Read-only `gh` is fine for polling; use `$GH` only when a command mutates GitHub. Do not use Codex browser/fetch for GitHub API polling; prior Tideclaw runs failed there after successful preflight.
5. For alpha, blocking gates are the ones Tideclaw can repair directly or that prove package safety: normal CI, plugin prerelease, npm preflight, package preparation, install smoke, tag/reachability, and publish verification. Treat cross-OS, live channel, QA Lab, package acceptance, long Docker E2E, and Telegram package E2E failures as advisory; report them in Discord and continue if the blocking gates are green.
- If `rerun_group=all` is stuck only on advisory lanes after CI, plugin prerelease, npm preflight, package preparation, and install smoke are green, dispatch a focused Full Release Validation on the same head with `-f rerun_group=install-smoke`. Use that successful focused Full Release Validation run as the publish proof, and include the separate CI/plugin/full advisory run IDs in the Discord summary.
6. If a blocking gate fails, fix on the alpha branch, push, and rerun only the failed or required release CI. If the commit changes, discard old preflight/full-validation run IDs and rerun them for the new head.
7. After full validation and npm preflight are green on the same branch head, create and push the release tag from that exact commit:
```bash
git tag -a "$TAG" "$SHA" -m "openclaw ${TAG#v}"
git push origin "$TAG"
```
8. Dispatch the publish wrapper from the same alpha branch. Use the successful npm preflight run ID and full release validation run ID from the same head SHA:
```bash
"$GH" workflow run openclaw-release-publish.yml --repo openclaw/openclaw --ref "$BRANCH" \
-f tag="$TAG" \
-f preflight_run_id="$NPM_PREFLIGHT_RUN_ID" \
-f full_release_validation_run_id="$FULL_RELEASE_VALIDATION_RUN_ID" \
-f npm_dist_tag=alpha \
-f plugin_publish_scope=all-publishable \
-f publish_openclaw_npm=true \
-f release_profile=beta \
-f wait_for_clawhub=false
```
9. Watch the publish wrapper plus child runs. If `openclaw-npm-release.yml` is waiting on the `npm-release` environment and Tideclaw cannot approve it, report that as the only blocker; do not call the release done.
10. Do not publish npm directly from the host; use GitHub Actions/OIDC.
Important: `openclaw-npm-release.yml` with `preflight_only=true` only prepares artifacts. It does not publish. A successful alpha requires the later `openclaw-release-publish.yml` wrapper, a pushed git tag, npm `alpha` dist-tag proof, and a GitHub prerelease.
## Verify Published Alpha
Release is not done until all are true:
- GitHub tag exists.
- GitHub Release exists and is marked prerelease.
- Release body links npm version page, registry tarball, integrity, and CI/proof.
- `npm view openclaw@<version>` shows the exact version, dist-tag `alpha`, tarball, integrity, and publish time.
- Installed/package smoke follows repo release docs.
- The Tideclaw state file from `$release-private` records version, tag, base SHA, branch, fix commit SHAs, workflow run IDs, npm integrity, and timestamp.
Final Discord summary in `#releases`:
- tag/version
- base SHA
- branch
- fix commits
- workflow run IDs
- npm/GitHub proof
- skipped/blocked reason if not released
Use Discord-safe Markdown links with angle-bracket targets. Never print secrets.
## Forward-Port
After a successful alpha, raise a fixes-only PR back to `main`:
1. Create/update a forward-port branch from current `origin/main`:
```bash
git fetch origin main --prune
git switch -c "tideclaw/forward-port/$(date -u +%Y-%m-%d-%H%MZ)" origin/main
```
2. Cherry-pick only release-branch commits that are real fixes required to make nightly/release checks pass.
3. Exclude alpha version bumps, changelog release entries, release notes, tag artifacts, generated release assets, state-file-only commits, and any commit whose only purpose was publishing the alpha.
4. If a commit mixes a real fix with release/version changes, split it: replay only the fix hunks into a new commit on the forward-port branch.
5. Resolve conflicts in favor of the minimal main-compatible fix.
6. Run the relevant changed/local gate.
7. Push and open a PR, or use the repos allowed bot merge path.
8. Wait for required main CI to go green. If CI fails, fix on the forward-port branch and rerun.
9. Report the PR/merge SHA and any commits intentionally not forward-ported.
If `origin/main` is independently red before the forward-port, document the unrelated failing check and still keep the forward-port PR green against its head when possible.
## Branch Retention
Before and after each run, prune old alpha branches:
1. List `origin/tideclaw/alpha/*`.
2. Keep branches whose timestamp is within the last 3 days UTC.
3. Keep branches referenced by a live workflow run, open PR, release tag, or state file.
4. Delete only Tideclaw-owned alpha branches:
```bash
git push origin --delete tideclaw/alpha/YYYY-MM-DD-HHMMZ
```
Never delete human branches, beta branches, stable branches, or unknown prefixes.
## Stop Conditions
Stop and report clearly if:
- release docs/scripts disagree on versioning or publish path
- required secrets/auth are unavailable
- GitHub Actions cannot be dispatched or observed
- a required release gate stays red after a real fix attempt
- npm/GitHub state disagrees after publish
- forward-port cannot be made green without a larger product decision

View File

@@ -1,234 +0,0 @@
---
name: release-openclaw-plugin-testing
description: Plan and run pre-release OpenClaw plugin validation across bundled plugins, package artifacts, lifecycle commands, doctor/fix, config round-trip, gateway startup, SDK compatibility, Docker E2E, Package Acceptance, and Testbox proof.
---
# OpenClaw Pre-Release Plugin Testing
Use this skill when the user asks for plugin release confidence, plugin lifecycle
sweeps, package-artifact plugin proof, or "what else should we test before
release?" It complements `openclaw-testing`; use that skill too when choosing
the cheapest safe runner or debugging a failing lane.
## Goal
Prove the plugin system as a product surface, not just as source tests:
- bundled plugin lifecycle: install, inspect, enable, disable, uninstall
- package artifact behavior from a clean `HOME`
- doctor/fix/config validation and idempotence
- config discovery and config round-trip
- status/log visibility and diagnostics
- gateway startup/bootstrap with plugin metadata snapshots
- public SDK compatibility for real external plugins
- live-ish provider/channel probes only when safe credentials exist
## First Checks
From the OpenClaw repo root:
```bash
pnpm docs:list
git status --short --branch
readlink node_modules
pnpm changed:lanes --json
```
In Codex worktrees under `.codex/worktrees`, `node_modules` must be a symlink to
the main OpenClaw checkout. Do not run `pnpm install` there. For broad or
package-heavy proof, use Blacksmith Testbox or GitHub Actions.
## Runner Choice
Prefer this order:
1. **GitHub Package Acceptance** for installable-package product proof.
2. **`ci-build-artifacts-testbox.yml` Testbox** when Docker/package lanes need
seeded `dist`, `dist-runtime`, and package caches.
3. **`ci-check-testbox.yml` Testbox** for source checks, targeted Vitest,
package-boundary checks, or focused Docker lanes.
4. **Local targeted commands only** for small format/static/unit probes.
Avoid long package Docker runs from a stale sparse worktree. If Testbox sync
reports hundreds of changed files or starts deleting package inputs, stop and
warm a fresh box from current `main`, or switch to Package Acceptance.
## Existing Baseline
Run or verify these before inventing new coverage:
```bash
OPENCLAW_TESTBOX=1 pnpm check:changed
pnpm run test:extensions:package-boundary:canary
pnpm run test:extensions:package-boundary:compile
pnpm test:docker:plugins
OPENCLAW_PLUGINS_E2E_CLAWHUB=0 pnpm test:docker:plugins
pnpm test:docker:plugin-update
pnpm test:docker:bundled-channel-deps:fast
```
For full bundled install/uninstall proof, shard the packaged sweep:
```bash
OPENCLAW_BUNDLED_PLUGIN_SWEEP_TOTAL=8 \
OPENCLAW_BUNDLED_PLUGIN_SWEEP_INDEX=<0-7> \
pnpm test:docker:bundled-plugin-install-uninstall
```
Expected current packaged scope: 116 public bundled plugins over shards `0-7`.
Private QA plugins are source-mode only unless a package explicitly includes
them.
## Confidence Matrix
Use this matrix for pre-release signoff. Record pass/fail, run URL/Testbox ID,
package SHA/version, and skipped-live reason.
| Surface | Proof | Preferred runner |
| --- | --- | --- |
| Package artifact | Package Acceptance `suite_profile=package` or custom lanes | GitHub Actions |
| Bundled lifecycle | 8-shard `test:docker:bundled-plugin-install-uninstall` | Testbox or release Docker |
| External plugins | `test:docker:plugins` and `plugins-offline` | Testbox/package acceptance |
| Update no-op | `test:docker:plugin-update` | Testbox/package acceptance |
| Channel runtime deps | `test:docker:bundled-channel-deps:fast` plus key channels | Testbox/package acceptance |
| Doctor/fix | seeded bad configs + `doctor --fix --non-interactive` | new Docker/Testbox harness |
| Config round-trip | `config set/get`, inspect, doctor, reload, diff hash | new Docker/Testbox harness |
| Gateway bootstrap | clean `HOME`, plugin groups enabled/disabled, status JSON | new Docker/Testbox harness |
| SDK compatibility | directory, tgz, and `file:` external plugins using SDK subpaths | `test:docker:plugins` plus new smoke |
| Live-ish | redacted provider/channel probes only for present env | Testbox live lanes |
## Package Acceptance Plan
Use this when validating a release branch, beta, or candidate package:
```bash
gh workflow run package-acceptance.yml \
--repo openclaw/openclaw \
--ref main \
-f workflow_ref=main \
-f source=ref \
-f package_ref=<branch-or-sha> \
-f suite_profile=custom \
-f docker_lanes='plugins-offline plugin-update bundled-channel-deps-compat doctor-switch update-channel-switch config-reload mcp-channels npm-onboard-channel-agent' \
-f telegram_mode=mock-openai
```
Use `source=npm -f package_spec=openclaw@beta` for published beta proof. Keep
`workflow_ref` as trusted current harness code unless the release process says
otherwise.
## New Testbox Harness Plan
If more certainty is needed, add or run a `plugin-lifecycle-matrix` Docker lane
that uses one package tarball and sharded plugin lists. Per plugin:
1. Start with a clean `HOME`.
2. Capture `plugins list --json`.
3. `plugins install <id>`.
4. `plugins inspect <id> --json`.
5. `plugins disable <id>`, then assert disabled visibility.
6. `plugins enable <id>`, except config-required plugins without config.
7. `plugins registry --refresh`.
8. `doctor --non-interactive`.
9. `plugins uninstall <id> --force`.
10. Assert no config entry, allow/deny residue, install record, managed dir, or
bundled `dist/extensions/...` load path remains.
11. Assert diagnostics contain no `level: "error"` and output redacts
secret-looking values.
Keep `memory-lancedb` special: it is config-required. First assert install does
not enable it without embedding config, then run a second configured case.
## Doctor/Fix Matrix
Seed bad states and require `doctor --fix --non-interactive` to repair them,
then run doctor again and require idempotence:
- stale `plugins.allow`
- stale `plugins.entries`
- stale channel config for missing channel plugin
- invalid `plugins.entries.<id>.config`
- packaged bundled path in `plugins.load.paths`
- legacy `plugins.installs`
- disabled channel/plugin config that must not stage runtime deps
- root-owned global package tree that must remain unmodified
## Gateway Bootstrap Matrix
Start packaged OpenClaw in Docker with clean state:
- provider plugins enabled, no credentials: ready with warnings, no crash
- channel plugins configured disabled: no runtime deps staged
- startup-activation plugins enabled: ready and reflected in status
- invalid single plugin config: bad plugin skipped/quarantined, others remain
Assert:
- gateway reaches ready
- `openclaw status --json` includes plugin diagnostics
- `openclaw plugins inspect --all --json` is parseable
- package tree is not mutated
- logs contain no raw tokens
## Config Round-Trip Representatives
Use representative plugin families instead of every plugin for deep config
round-trip:
- providers: `openai`, `anthropic`, `mistral`, `openrouter`
- channels: `telegram`, `discord`, `slack`, `whatsapp`
- memory: `memory-lancedb`
- feature/runtime: `browser`, `acpx`, `tokenjuice`
For each representative:
1. Write config through CLI when possible.
2. Read it back through `config get` or JSON.
3. Run `plugins inspect`.
4. Run `doctor --non-interactive`.
5. Trigger gateway config reload if applicable.
6. Compare config hash before/after no-op commands.
## External SDK Smoke
In a package Docker lane, create tiny external plugins and install them from:
- local directory
- `.tgz`
- `file:` npm spec
Cover CJS and ESM shapes, plus at least one plugin importing focused
`openclaw/plugin-sdk/*` subpaths. Assert `plugins inspect` sees its tool,
gateway method, CLI command, or service.
## Live-Ish Probe Rules
Before live-ish work, source allowed env in Testbox and generate a redacted
availability matrix: present/missing only, never values.
Only run probes for credentials that exist. Prefer auth/catalog/status probes
over sending user-visible messages. If a probe might contact an external user,
channel, or workspace, stop and ask the user.
## Reporting
Report in this shape:
```text
package/ref:
tbx ids / run urls:
matrix:
bundled lifecycle:
package acceptance:
doctor/fix:
gateway bootstrap:
config round-trip:
sdk external:
live-ish:
failures:
skips:
next highest-value gap:
```
Say clearly when a failure is Testbox sync/env damage rather than product
behavior, and prove that with a clean rerun or current-main comparison.

View File

@@ -1,4 +0,0 @@
interface:
display_name: "OpenClaw Plugin Pre-Release Testing"
short_description: "Plan plugin release validation"
default_prompt: "Use $release-openclaw-plugin-testing to plan or run pre-release OpenClaw plugin validation across package, lifecycle, doctor, gateway, SDK, and live-ish proof."

View File

@@ -1,148 +0,0 @@
---
name: security-triage
description: "Triage OpenClaw security advisories, drafts, and GHSA reports with shipped-tag and trust-model proof."
---
# Security Triage
Use when reviewing OpenClaw security advisories, drafts, or GHSA reports.
Goal: high-confidence maintainers' triage without over-closing real issues or shipping unnecessary regressions.
## Close Bar
Close only if one of these is true:
- duplicate of an existing advisory or fixed issue
- invalid against shipped behavior
- out of scope under `SECURITY.md`
- fixed before any affected release/tag
Do not close only because `main` is fixed. If latest shipped tag or npm release is affected, keep it open until released or published with the right status.
## Required Reads
Before answering:
1. Read `SECURITY.md`.
2. Read the GHSA body with `gh api /repos/openclaw/openclaw/security-advisories/<GHSA>`.
3. Inspect the exact implicated code paths.
4. Verify shipped state:
- `git tag --sort=-creatordate | head`
- `npm view openclaw version --userconfig "$(mktemp)"`
- `git tag --contains <fix-commit>`
- if needed: `git show <tag>:path/to/file`
5. Search for canonical overlap:
- existing published GHSAs
- older fixed bugs
- same trust-model class already covered in `SECURITY.md`
## Review Method
For each advisory, decide:
- `close`
- `keep open`
- `keep open but narrow`
Default to one advisory at a time when comments/closures are involved:
1. Review exactly one GHSA.
2. Print the GHSA URL first.
3. Summarize the decision and evidence for discussion.
4. Draft one maintainer-ready comment.
5. Copy only that one comment to the clipboard.
6. Stop and wait for Peter to post/discuss before moving to the next GHSA.
Do not batch multiple close comments unless Peter explicitly asks for a batch.
Check in this order:
1. Trust model
- Is the prerequisite already inside trusted host/local/plugin/operator state?
- Does `SECURITY.md` explicitly call this class out as out of scope or hardening-only?
2. Shipped behavior
- Is the bug present in the latest shipped tag or npm release?
- Was it fixed before release?
3. Exploit path
- Does the report show a real boundary bypass, not just prompt injection, local same-user control, or helper-level semantics?
- If data only moves between trusted workspace-memory files called out in `SECURITY.md`, do not treat "injection markers" alone as a security bug.
- In that case, frame sanitization as optional hardening only if it preserves expected memory workflows.
4. Functional tradeoff
- If a hardening change would reduce intended user functionality, call that out before proposing it.
- Prefer fixes that preserve user workflows over deny-by-default regressions unless the boundary demands it.
5. Hardening follow-up
- Even when the GHSA should close, ask whether a narrow hardening change would reduce footguns without changing the documented trust boundary.
- Separate hardening from vulnerability status. Phrase it as "not required for GHSA closure, but worth considering".
- Bring up hardening only if it is concrete, low-risk, and preserves intended maintainer/operator workflows.
- If hardening would require a product/security model change, say that explicitly and do not imply it is a required fix for closure.
## Response Format
When preparing a maintainer-ready close reply:
1. Print the GHSA URL first.
2. Then draft a detailed response the maintainer can post.
3. Include:
- exact reason for close
- exact code refs
- exact shipped tag / release facts
- fix provenance or canonical duplicate GHSA when applicable
- optional hardening note only if worthwhile and functionality-preserving
Keep tone firm, specific, non-defensive.
## Public Wording Hygiene
- Keep raw commit hashes, PR titles/numbers, and fix-mechanism summaries out of public advisory text. Use the patched release/version field only.
- Keep exact commit SHAs, PRs, and implementation notes in internal notes and verification files.
- For hardening/no-publish outcomes, do not add exploit-heavy details, "Fixed by" text, or a "Fix Commit(s)" section. Thank reporters, preserve credit, state the `SECURITY.md` boundary, and say clearly that the GHSA will close without publication.
- For published CVE/GHSA text, prefer `### Patched Versions` with the fixed release. Do not explain how the patch works unless Peter explicitly asks for that public detail.
- Keep GHSA ids out of changelog and release-note wording unless Peter explicitly asks.
## Discussion Mode
When Peter is manually posting GHSA comments, use this flow:
1. Show the URL.
2. Give a terse verdict (`close`, `keep open`, or `keep open but narrow`).
3. List the strongest evidence bullets.
4. State any optional hardening follow-up separately from the close reason.
5. Copy the proposed comment body with `pbcopy`.
6. End the reply after the one advisory. Do not continue to the next advisory until Peter says to continue.
If the GitHub API cannot post comments for private advisories, say so once and keep using clipboard/UI paste.
## Clipboard Step
After drafting the final post body for the current advisory, copy it:
```bash
pbcopy <<'EOF'
<final response>
EOF
```
Tell the user that the clipboard now contains the proposed response for that advisory.
## Useful Commands
```bash
gh api /repos/openclaw/openclaw/security-advisories/<GHSA>
gh api /repos/openclaw/openclaw/security-advisories --paginate
git tag --sort=-creatordate | head -n 20
npm view openclaw version --userconfig "$(mktemp)"
git tag --contains <commit>
git show <tag>:<path>
gh search issues --repo openclaw/openclaw --match title,body,comments -- "<terms>"
gh search prs --repo openclaw/openclaw --match title,body,comments -- "<terms>"
```
## Decision Notes
- “fixed on main, unreleased” is usually not a close.
- “needs attacker-controlled trusted local state first” is usually out of scope.
- “same-host same-user process can already read/write local state” is usually out of scope.
- “trusted workspace memory promotes/reindexes trusted workspace memory” is usually out of scope unless it crosses a documented boundary.
- “helper function behaves differently than documented config semantics” is usually invalid.
- If only the severity is wrong but the bug is real, keep it open and narrow the impact in the reply.

View File

@@ -1,41 +0,0 @@
---
name: slacrawl
description: "Slack archive: search, sync freshness, threads/DMs, SQL counts, and Slacrawl repo work."
metadata:
openclaw:
homepage: https://github.com/openclaw/slacrawl
requires:
bins:
- slacrawl
install:
- kind: go
module: github.com/vincentkoc/slacrawl/cmd/slacrawl@latest
bins:
- slacrawl
---
# Slacrawl
Use local Slack archive data first. Check freshness for recent/current questions:
```bash
slacrawl doctor
slacrawl status --json
```
Refresh only when stale or asked:
```bash
slacrawl sync --source desktop
slacrawl sync --source api --latest-only
```
Query with bounded slices:
```bash
slacrawl search --limit 20 "query"
slacrawl messages --since 7d --limit 50
slacrawl sql "select count(*) from messages;"
```
Report workspace/channel names, absolute date spans, counts, and token/source limits. Use read-only SQL for exact counts/rankings. API sync and full thread/DM hydration require Slack tokens; do not assume they exist.

View File

@@ -1,4 +0,0 @@
interface:
display_name: "Slacrawl"
short_description: "Search local Slack archives and freshness"
default_prompt: "Use $slacrawl to search local Slack archives, check freshness, inspect channel or DM slices, and report exact date spans and token/source limits."

View File

@@ -1,439 +0,0 @@
---
name: tag-duplicate-prs-issues
description: Use gitcrawl to search duplicate OpenClaw PRs/issues, group related work in prtags, and sync duplicate state to GitHub.
---
# Tag Duplicate PRs and Issues
Use this skill when a maintainer needs to decide whether a pull request or issue is a duplicate of existing work.
This skill is for maintainer triage and grouping.
It is not for reviewing the implementation quality of a PR.
## Required Setup
Do not write duplicate groups or annotations until this setup is complete.
Read-only discovery can still proceed with `gitcrawl` and live `gh`.
### Companion Skills
Use `$gitcrawl` first for local candidate discovery.
Use the `prtags` skill from the `prtags` repo at `skills/prtags/SKILL.md` when it is available.
### Install the CLIs
Install `prtags` from its latest GitHub release.
Do not rely on an old local build unless the maintainer explicitly wants to test unreleased behavior.
`prtags` CLI install path:
```bash
curl -fsSL https://raw.githubusercontent.com/dutifuldev/prtags/main/scripts/install-prtags.sh | bash -s -- --bin-dir "$HOME/.local/bin"
```
### Authenticate prtags
`prtags` should be logged in with the maintainer's own GitHub account through OAuth device flow.
Do not use a shared maintainer token for interactive triage.
```bash
prtags auth login
prtags auth status
```
The expected outcome is that `prtags` stores the logged-in maintainer identity locally and uses that account for authenticated writes.
## Missing-Setup Rule
Do not require an up-front preflight before starting the workflow.
Proceed with the normal steps until you actually need a tool or account state.
As soon as you discover that `prtags` is missing or not logged in at the write step, stop immediately.
Do not continue in a partial write mode after that point.
If `prtags` is missing, ask the user to run:
```bash
curl -fsSL https://raw.githubusercontent.com/dutifuldev/prtags/main/scripts/install-prtags.sh | bash -s -- --bin-dir "$HOME/.local/bin"
```
If `prtags auth status` shows that the user is not logged in, ask the user to run:
```bash
prtags auth login
```
Resume only after the missing tool or login state has been fixed.
## Read-Path Default
For candidate discovery in this workflow, use `gitcrawl` first.
Treat it as the local history and clustering layer for related issues, duplicate attempts, and closed threads.
Use live `gh` or `gh api` for the target thread and for any candidate before making an actionable judgment.
Use live GitHub when `gitcrawl` is missing or stale for a concrete reason, such as:
- the target or candidate is not present yet
- the local data is clearly stale or incomplete for the decision you need to make
- `gitcrawl` errors, times out, or lacks the needed neighbor/search data
When you fall back to live GitHub search, note that you did so and why.
If a later `prtags` target-level write fails because its own mirror has not caught up, stop and report that the curation backend is missing the target object instead of forcing a fallback write.
## Goal
For each target PR or issue:
1. gather duplicate evidence
2. decide whether it is a real duplicate
3. create or reuse one `prtags` group for that duplicate cluster
4. save the maintainer judgment in `prtags`
5. rely on normal `prtags` group writes to drive GitHub comment sync when that integration is configured
## Tool Roles
Use the tools with these boundaries:
- `gitcrawl` is candidate generation and historical context
- use it first for local title/body search, neighbors, clusters, and closed-thread discovery
- treat every candidate as a lead until live GitHub confirms it
- `gh` is live GitHub truth
- use it for target state, body, comments, reviews, files, linked issues, and current open/closed/merged status
- use `gh search` only when `gitcrawl` is stale, missing data, or cannot express the needed query
- `prtags` is the maintainer curation layer
- use it to create or reuse one duplicate group
- use it to save the duplicate status, confidence, rationale, and group summary
- use it as the source of truth for the GitHub-facing group comment
## Working Rules
- Do not call something a duplicate only because the titles are similar.
- Do not call something a duplicate only because the same files changed.
- A duplicate cluster should be based on the same user-facing problem, the same intent, and substantially overlapping implementation or investigation context.
## One-Group Rule
Treat duplicate groups as exclusive.
A PR or issue should belong to at most one duplicate group at a time.
That means:
- before creating a new group, search for an existing group that already represents the same duplicate story
- if the target already appears to belong to a different duplicate group, stop and resolve that conflict first
- do not create a second group for the same target just because the wording is slightly different
- if two plausible existing groups overlap and you cannot safely merge the judgment, stop and ask the maintainer
This rule matters more than speed.
The skill should keep one coherent duplicate cluster per problem, not many near-duplicate clusters.
## What A Good Duplicate Group Represents
A duplicate group should describe the underlying problem and the intended fix direction.
Do not group items only because they share a keyword.
Good group shape:
- same user-facing bug or same maintainer-facing task
- same subsystem or code surface
- same intended change direction
- same likely duplicate-resolution path
Bad group shape:
- “all PRs that touch Slack”
- “all issues mentioning retry”
- “all auth-related items”
The group title should name the real problem.
The group description should summarize the intent and the code surface.
Examples:
- `gateway: startup regression from channel status bootstrap`
- `whatsapp: QR preflight timeout handling`
- `release: cross-OS validation handoff gaps`
## Evidence Checklist
Before declaring a duplicate, gather evidence from at least two categories.
`gitcrawl` neighbors, search hits, and cluster membership count as candidate generation, not as enough proof by themselves.
For PRs:
- same or nearly same problem statement
- same changed files or overlapping file ranges
- same fix direction
- same subsystem and failure mode
- same linked issue or same user-visible symptom
For issues:
- same user-visible problem
- same reproduction story or same failure mode
- same likely fix area
- same PRs already linked or discussed
- same maintainers already steering toward the same duplicate grouping
If you only have wording similarity, that is not enough.
## Step 1: Read The Target
Start by reading the target itself.
Use live GitHub for current target state.
For a PR:
```bash
gh pr view <number> --json number,title,state,mergedAt,body,closingIssuesReferences,files,comments,reviews,statusCheckRollup
```
For an issue:
```bash
gh issue view <number> --json number,title,state,body,comments,closedAt
```
Record:
- target type and number
- title
- problem statement
- proposed intent
- subsystem
- whether it is open, closed, or merged
- whether there is already a likely duplicate thread mentioned by humans
## Step 2: Search Broadly With Gitcrawl
Use `gitcrawl` first because it is the local OpenClaw history and clustering source.
Do not switch to broad live GitHub search unless `gitcrawl` is missing data, stale, or failing.
Start with the target and nearby threads:
```bash
gitcrawl threads openclaw/openclaw --numbers <issue-or-pr-number> --include-closed --json
gitcrawl neighbors openclaw/openclaw --number <issue-or-pr-number> --limit 20 --json
```
Then search key phrases and subsystem terms:
```bash
gitcrawl search openclaw/openclaw --query "<key phrase from title or body>" --mode hybrid --limit 20 --json
gitcrawl search openclaw/openclaw --query "<subsystem or error phrase>" --mode hybrid --limit 20 --json
```
Inspect likely clusters:
```bash
gitcrawl cluster-detail openclaw/openclaw --id <cluster-id> --member-limit 20 --body-chars 280 --json
```
For PRs, verify likely code overlap with live file data:
```bash
gh pr view <candidate-pr> --json number,title,state,mergedAt,files,body,comments,reviews
```
For issues, verify likely duplicate issue state and comments live:
```bash
gh issue view <candidate-issue> --json number,title,state,body,comments,closedAt
```
## Step 3: Use Live GitHub Search For Gaps
Use targeted live GitHub search after `gitcrawl` when:
- the target is too new for the local store
- comments or reviews matter and the local store lacks them
- the exact phrase did not appear in local results but the issue/PR is current enough that GitHub should know it
```bash
gh search prs --repo openclaw/openclaw --match title,body --limit 50 -- "<key phrase>"
gh search issues --repo openclaw/openclaw --match title,body --limit 50 -- "<key phrase>"
gh search issues --repo openclaw/openclaw --match comments --limit 50 -- "<error or maintainer phrase>"
```
## Step 4: Decide The Outcome
Choose one of these outcomes:
- `not_duplicate`
- `duplicate_needs_judgment`
- `duplicate_confirmed`
Use `duplicate_confirmed` only when the evidence is strong enough that the maintainer could safely close or retag the duplicate item.
Use `duplicate_needs_judgment` when:
- the problem looks the same but the implementation goal differs
- the code overlap is weak
- the issue wording is ambiguous
- there may be two valid duplicate group interpretations
- the target appears to intersect two existing duplicate groups
## Step 5: Reuse Or Create One prtags Group
Before creating a group, search `prtags` for an existing one.
Start with text search over groups:
```bash
prtags search text -R openclaw/openclaw "<problem phrase>" --types group --limit 10
prtags search similar -R openclaw/openclaw "<problem summary>" --types group --limit 10
prtags group list -R openclaw/openclaw
```
Inspect likely groups:
```bash
prtags group get <group-id>
prtags group get <group-id> --include-metadata
```
Reuse an existing group when:
- it represents the same problem
- it already contains clearly related members
- adding the target would keep the group coherent
Do not widen an existing group just because `gitcrawl` placed several PRs or issues near each other.
Confirm that the actual implementation path and maintainer intent still match before adding the new member.
Create a new group only when no existing group clearly fits.
Create the group with a problem-based title and an intent-based description:
```bash
prtags group create -R openclaw/openclaw \
--kind mixed \
--title "<problem-centered title>" \
--description "<same intent, subsystem, and duplicate-resolution path>" \
--status open
```
Then attach the target and any known duplicate members:
```bash
prtags group add-pr <group-id> <pr-number>
prtags group add-issue <group-id> <issue-number>
```
If a target appears to already belong to another duplicate group and you cannot safely reuse that group, stop.
Do not create a second group.
## Step 6: Ensure The Annotation Fields Exist
Use `field ensure` so the skill is idempotent.
Recommended target-level fields:
```bash
prtags field ensure -R openclaw/openclaw --name duplicate_status --scope pull_request --type enum --enum-values not_duplicate,candidate,confirmed --filterable
prtags field ensure -R openclaw/openclaw --name duplicate_status --scope issue --type enum --enum-values not_duplicate,candidate,confirmed --filterable
prtags field ensure -R openclaw/openclaw --name duplicate_confidence --scope pull_request --type enum --enum-values low,medium,high --filterable
prtags field ensure -R openclaw/openclaw --name duplicate_confidence --scope issue --type enum --enum-values low,medium,high --filterable
prtags field ensure -R openclaw/openclaw --name duplicate_rationale --scope pull_request --type text --searchable
prtags field ensure -R openclaw/openclaw --name duplicate_rationale --scope issue --type text --searchable
```
Recommended group-level fields:
```bash
prtags field ensure -R openclaw/openclaw --name duplicate_confidence --scope group --type enum --enum-values low,medium,high --filterable
prtags field ensure -R openclaw/openclaw --name duplicate_rationale --scope group --type text --searchable
prtags field ensure -R openclaw/openclaw --name cluster_summary --scope group --type text --searchable
```
## Step 7: Save The Maintainer Judgment In prtags
For a PR:
```bash
prtags annotation pr set -R openclaw/openclaw <pr-number> \
duplicate_status=confirmed \
duplicate_confidence=high \
duplicate_rationale="<same problem, same fix direction, overlapping files and comments>"
```
For an issue:
```bash
prtags annotation issue set -R openclaw/openclaw <issue-number> \
duplicate_status=confirmed \
duplicate_confidence=high \
duplicate_rationale="<same user-visible problem and same intended fix path>"
```
For the group:
```bash
prtags annotation group set <group-id> \
duplicate_confidence=high \
cluster_summary="<one-sentence problem summary>" \
duplicate_rationale="<why these items belong in one duplicate cluster>"
```
When the evidence is incomplete, set `duplicate_status=candidate` and lower the confidence.
If a per-PR or per-issue annotation write fails because `prtags` cannot resolve the target, do not force a fallback write path.
Keep the group state you were able to write, report that the curation backend is still missing the target object, and defer the target-level annotation until `prtags` catches up.
## Step 8: Let prtags Sync The Group Comment
Do not tell the agent to create a GitHub comment directly.
`prtags` owns the outbound GitHub comment as a derived projection of group state.
In the normal case, do not manually trigger comment sync.
When comment sync is configured, group writes already enqueue the derived comment projection automatically.
Use manual sync only as a repair or retry path:
```bash
prtags group sync-comments <group-id>
```
If the maintainer needs to see which groups still need attention, use:
```bash
prtags group list-comment-sync-targets -R openclaw/openclaw
```
The skill should treat the GitHub comment as a consequence of correct `prtags` group state.
It should not treat manual comment authoring as part of the normal duplicate workflow.
It should also not treat `sync-comments` as a required step for every duplicate decision.
## Output Format
Return a short maintainer report with these sections:
```text
Decision: duplicate_confirmed | duplicate_needs_judgment | not_duplicate
Target: PR #<n> | Issue #<n>
Confidence: high | medium | low
Evidence:
- ...
- ...
- ...
prtags actions:
- reused group <group-id> | created group <group-id>
- added members: ...
- annotations written: ...
- comment sync: automatic if configured | manual repair triggered for <group-id>
```
## Stop Conditions
Stop and escalate instead of forcing a duplicate decision when:
- the target appears to belong to two different duplicate groups
- the duplicate grouping is unclear
- the wording matches but the implementation goals differ
- two PRs touch the same files for different reasons
- two issues describe similar symptoms but likely different root causes
The maintainer should get one clean duplicate judgment or an explicit “needs judgment” result.
Do not blur the line.

View File

@@ -1,4 +0,0 @@
interface:
display_name: "Tag Duplicate PRs and Issues"
short_description: "Find duplicate PRs and issues with gitcrawl, group them in prtags, and let prtags sync the GitHub comment"
default_prompt: "Use $tag-duplicate-prs-issues to decide whether an OpenClaw PR or issue is a duplicate, gather candidates with gitcrawl, verify live state with GitHub, group related items in prtags, and save the duplicate judgment."

View File

@@ -1,79 +0,0 @@
---
name: technical-documentation
description: Build and review high-quality technical docs as well as agent instruction files in your repository.
license: MIT
metadata:
source: "https://github.com/vincentkoc/dotskills"
---
# Technical Documentation
## Purpose
Produce and review technical documentation that is clear, actionable, and maintainable for both humans and agents, including contributor-governance files and agent instruction files.
## When to use
- Creating or overhauling docs in an existing product/codebase (brownfield).
- Building evergreen docs meant to stay accurate and reusable over time.
- Reviewing doc diffs for structure, clarity, and operational correctness.
- Running full-repo documentation audits that must include both governance files and product docs surfaces (`docs/`, `README*`, `.md/.mdx/.mdc`, Fern/Sphinx/Mintlify-style sources).
- Updating or reviewing AGENTS.md and/or CONTRIBUTING.md to keep agent and contributor workflows aligned with current repo practices.
- Improving repository onboarding/docs that include contribution instructions, issue templates, PR flow, and review gates.
- Designing governance documentation strategy for repos with alias instruction files (for example `CLAUDE.md`, `AGENT.md`, `.cursorrules`, `.cursor/rules/*`, `.agent/`, `.agents/`, `.pi/`) where `AGENTS.md` is treated as canonical when present and aliases should be kept as compatibility surfaces.
- Diagnosing agent-file drift where teams had to prompt iteratively to surface missing files, broken commands, or policy conflicts.
- Applying repository-specific documentation overlays, including OpenClaw page-type, docs IA, preservation, and validation rules when present.
## Workflow
1. Classify task: `build` or `review`; context: `brownfield` or `evergreen`.
2. Inventory full documentation scope early (governance + product docs): AGENTS/CONTRIBUTING/aliases plus docs directories, framework sources, and root/module READMEs.
3. Detect multilingual scope (README/docs in multiple languages) and define required parity level.
4. Read `references/agent-and-contributing.md` for agent instruction and `CONTRIBUTING.md` workflow rules (inventory, canonical/alias mapping, dual-mode balance, deliverable standards, and precedence/conflict handling).
5. Read `references/principles.md` for the governing ruleset (Matt Palmer & OpenAI).
6. For OpenClaw docs work, read `references/openclaw.md` before the build/review playbook.
7. For build tasks, follow `references/build.md`.
8. For review tasks, follow `references/review.md` and proactively detect issues without waiting for repeated prompts.
9. For complex or high-risk tasks (build or review), it is acceptable to run longer, deeper, and more exhaustive investigations when needed for confidence.
10. When available, use sub-agents for bounded parallel discovery/review work, then merge outputs into one coherent final deliverable.
11. Use `references/tooling.md` when platform/tooling choices affect recommendations.
12. Run a proactive issue sweep for both governance and docs-content surfaces, and fix high-confidence defects in the same pass unless explicitly asked for report-only mode.
13. In brownfield mode, prioritize compatibility with current docs IA, tooling, and release state.
14. In evergreen mode, prioritize timeless wording, update strategy, and durable structure.
15. Return deliverables plus validation notes, parity status, and remaining gaps.
## Sub-agent orchestration guidance
Prefer sub-agents when the repo is large or the requested change set is broad; use them by default for repo-wide, multi-framework, or high-conflict work.
- `inventory-agent` -> `agents/inventory-agent.md` (`fast` / Claude `haiku`): file/config discovery, coverage map, and missing-path checks.
- `governance-agent` -> `agents/governance-agent.md` (`thinking` / Claude `sonnet`): AGENTS/CONTRIBUTING/alias precedence, conflicts, and policy drift.
- `docs-framework-agent` -> `agents/docs-framework-agent.md` (`thinking` / Claude `sonnet`): framework config, relative path base, and file-path vs URL-path mapping checks.
- `synthesis-agent` -> `agents/synthesis-agent.md` (`long` / Claude `opus`): merge sub-agent outputs into one prioritized fix plan and unified precedence model.
## Inputs
- Doc type (tutorial, how-to, reference, explanation) and audience.
- File scope or diff scope.
- Docs framework/tooling constraints (Fern, Mintlify, Sphinx, etc.).
- Build/review mode and brownfield/evergreen intent.
- Target agent and human compatibility intent.
- Docs framework surfaces in scope (for example Fern, Sphinx, Mintlify, Markdown/MDX/MDC/RST/RSC files).
- Desired investigation depth/time budget (quick pass vs exhaustive review).
- Execution mode (`single-agent` or `sub-agent-assisted` when available).
- Remediation mode (`apply-fixes` by default, or `report-only` when requested).
- Multilingual scope: source-of-truth language, target locales, and parity expectations.
- Repository-specific overlay constraints, if any.
## Outputs
- Updated draft or review findings with clear next actions.
- Validation notes (what was checked, what remains).
- Navigation/maintenance recommendations for long-term quality.
- Governance-doc alignment summary when AGENTS/CONTRIBUTING were touched.
- Agent instruction-surface map (primary file, alias files, Codex/Claude/Cursor handling plan).
- Documentation-surface coverage map (what was reviewed under `/docs`, README hierarchy, and framework-specific source trees).
- Autodetected issue list with applied fixes (or explicit report-only findings).
- Delegation notes when sub-agents were used (scope delegated and how findings were merged).
- Multilingual parity note (in-sync, partial with rationale, or intentionally divergent).
- Repository-specific overlay notes when one was used.

View File

@@ -1,32 +0,0 @@
---
name: docs-framework-agent
description: Thinking-focused docs framework checker for config-relative paths and route/file mapping consistency.
model: sonnet
tools:
- Read
- Glob
- Grep
permissionMode: default
maxTurns: 10
---
You are the docs-framework sub-agent for technical documentation.
Goals:
- validate framework config-driven docs behavior
- prevent path-mapping drift between source files and published routes
Tasks:
- detect and read framework config first (Fern/Sphinx/Mintlify/custom)
- resolve paths relative to the declaring file/config
- validate both maps:
- config -> file exists
- config/nav/routing -> URL path is valid and consistent
Return:
- config files reviewed
- path assumptions made
- mismatches (`missing file`, `stale route`, `wrong base path`)

View File

@@ -1,30 +0,0 @@
---
name: governance-agent
description: Thinking-focused governance reviewer for AGENTS/CONTRIBUTING/alias precedence, conflict detection, and policy drift analysis.
model: sonnet
tools:
- Read
- Glob
- Grep
permissionMode: default
maxTurns: 10
---
You are the governance sub-agent for technical documentation.
Goals:
- validate AGENTS/CONTRIBUTING/alias alignment and precedence
- identify policy drift and conflicting instructions
Tasks:
- determine canonical instruction source and alias compatibility mapping
- detect conflicts across nested scope files and tool-specific rule consumers
- validate command examples against stated governance expectations
Return:
- precedence model
- conflict list with severity
- recommended low-risk remediations

View File

@@ -1,31 +0,0 @@
---
name: inventory-agent
description: Fast repo-surface discovery for technical documentation audits. Use for coverage mapping and missing-path detection before deeper review.
model: haiku
tools:
- Read
- Glob
- Grep
- LS
permissionMode: default
maxTurns: 6
---
You are the inventory sub-agent for technical documentation.
Goals:
- enumerate governance and docs-content surfaces in scope
- detect missing files, broken references, and obvious command/path failures
Tasks:
- map `AGENTS.md`/`CONTRIBUTING.md`/aliases and docs surfaces (`docs/**`, README hierarchy, `.md/.mdx/.mdc/.rst/.rsc`)
- list framework config files discovered (Fern/Sphinx/Mintlify or equivalent)
- report hard failures only, with exact file paths
Return:
- coverage map
- missing/broken path list
- unresolved blockers

View File

@@ -1,10 +0,0 @@
interface:
display_name: "Technical Documentation"
short_description: "Build and review technical documentation for brownfield and evergreen systems."
icon_small: "./assets/icon.jpg"
icon_large: "./assets/icon.jpg"
brand_color: "#111827"
default_prompt: "Build or review technical documentation with a clear, maintainable, and production-ready workflow."
policy:
allow_implicit_invocation: true

View File

@@ -1,28 +0,0 @@
---
name: synthesis-agent
description: Long-context synthesis agent that merges sub-agent outputs into one prioritized and deduplicated documentation action plan.
model: opus
tools:
- Read
permissionMode: default
maxTurns: 12
---
You are the synthesis sub-agent for technical documentation.
Goal:
- merge sub-agent outputs into one coherent, non-duplicated action plan
Tasks:
- prioritize blockers first, then non-blocking improvements
- normalize to one precedence model for governance decisions
- remove duplicated recommendations and contradictory fixes
- keep final output concise and execution-ready
Return:
- prioritized fix plan
- validation summary (done vs pending)
- explicit remaining gaps/blockers

Binary file not shown.

Before

Width:  |  Height:  |  Size: 37 KiB

View File

@@ -1,145 +0,0 @@
# AGENT and CONTRIBUTING Principles
This reference consolidates the core rules for agent-policy and contributor-governance docs.
You must:
1. Discover repo-level and nested instruction files with:
`rg --files -g 'AGENTS.md' -g 'CONTRIBUTING.md' -g 'CLAUDE.md' -g 'AGENT.md' -g '.cursor/rules/*' -g '.cursorrules' -g '.agent/**' -g '.agents/**' -g '.pi/**' -g 'AGENTS.*.md'`
2. Read the root and nearest-scope `AGENTS.md`/`CONTRIBUTING.md` pair before editing.
3. If alias files exist, normalize to one canonical source (`AGENTS.md` preferred when present; otherwise nearest alias), plus compatibility pointers or explicit symlink notes.
4. Document conflicting instructions and precedence decisions.
## GitHub + AGENTS baseline
Source: https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/setting-guidelines-for-repository-contributors
Source: https://agents.md/
Source: https://github.blog/ai-and-ml/github-copilot/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories/
Source: https://cobusgreyling.substack.com/p/what-is-agentsmd
Source: https://www.infoq.com/news/2025/08/agents-md/
Use these as default operating principles:
1. Keep `CONTRIBUTING.md` discoverable and actionable (`.github`, root, or `docs`).
2. Keep agent instructions concrete: real commands, real paths, clear boundaries.
3. Use explicit behavior boundaries for agents: `Always`, `Ask first`, `Never`.
4. Keep contributor and agent rules aligned with actual repository workflows.
5. Ensure clear guidance is provided to agents on if, when and how to raise issues and pull requests.
## Canonical and alias policy
Source: https://agents.md/
Source: https://github.blog/ai-and-ml/github-copilot/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories/
1. Treat `AGENTS.md` as canonical when present.
2. If `AGENTS.md` is absent, treat the nearest alias file as canonical.
3. Keep compatibility surfaces explicit: `AGENTS.md`, `AGENT.md`, `.cursorrules`, `.cursor/rules/*`, `.agent/`, `.agents/`, `.pi/`.
4. If aliases are used, document how they map back to canonical policy (or symlink when supported).
5. When repos use `.agents/` as canonical rule storage, keep `.cursor` as a compatibility symlink to `.agents` for Cursor rule auto-loading.
6. Keep policy DRY: store one shared policy core and expose it via aliases/symlinks instead of duplicating rule text.
## Context-awareness by agent platform
Source: https://github.com/vercel-labs/agent-skills/blob/main/AGENTS.md
Source: https://github.com/openai/codex/blob/main/AGENTS.md
1. For Cursor and Claude-style glob consumers, keep rule files narrow and bounded.
2. Avoid over-referencing large path sets that inflate context for glob-based agents.
3. For Codex-style workflows, prefer explicit file references and deterministic commands.
4. Keep long runbooks outside top-level policy files; link to scoped docs.
5. Ensure all agents have a happy path regardless so ensuring everything works across Codex, Claude and other coding agents.
## Symlink and compatibility operations
1. Preferred layout for multi-agent compatibility:
- canonical rule directory: `.agents/`
- Cursor compatibility path: `.cursor -> .agents` symlink
- canonical policy doc: `AGENTS.md` pointing to `.agents` paths where relevant
2. Validate symlink state before finalizing changes:
- if `.agents/` exists and `.cursor` is missing, create `.cursor` symlink to `.agents`
- if `.cursor` is a symlink to another target, fix target or document why it must differ
- if `.cursor` is a real directory/file, treat as migration conflict and ask before replacement
3. Validate rule payload through the canonical directory:
- rules: `.agents/rules/*.mdc` with valid frontmatter (`description`, `globs`, `alwaysApply` as needed)
- commands: `.agents/commands/*.md` when command routing is used
- MCP config: `.agents/mcp.json` when MCP is in scope
4. Keep Codex behavior explicit:
- `AGENTS.md` is primary for Codex repository instructions
- `.cursor` compatibility is for Cursor auto-loading and does not replace canonical AGENTS policy
5. Record applied symlink fixes and unresolved compatibility gaps in validation notes.
## Dual-mode and deliverable standards
Source: https://github.blog/ai-and-ml/github-copilot/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories/
Source: https://agents.md/
Source: https://github.com/openai/codex/blob/main/AGENTS.md
Source: https://github.com/vercel-labs/agent-skills/blob/main/AGENTS.md
1. Author one shared policy core (same commands, boundaries, and precedence) for all agents.
2. For Cursor/Claude-style agents, expose that core through glob-driven and bounded files (small `AGENTS.md`/rule surface).
3. For Codex, expose that same core through explicit file references with precise scope.
4. Where styles diverge, prefer the smallest common structure that satisfies both and avoid duplicating policy text.
5. Treat AGENTS/CONTRIBUTING as first-class deliverables when in scope.
6. Preserve required structure, constraints, and examples from existing files.
7. Align wording and commands with active repository instructions.
## Proactive issue discovery and remediation
Source: https://github.blog/ai-and-ml/github-copilot/how-to-write-a-great-agents-md-lessons-from-over-2500-repositories/
Source: https://github.com/openai/codex/blob/main/AGENTS.md
Source: https://github.com/vercel-labs/agent-skills/blob/main/AGENTS.md
1. Run a conflict matrix review across AGENTS/aliases/CONTRIBUTING and related command/rule docs before finalizing.
2. Treat the following as high-priority defects: missing referenced files, non-existent setup commands, command scope mismatches, and branch/commit policy conflicts.
3. Do not stop at caveat-only notes when a low-risk fix is clear; apply the fix in the same pass.
4. If a canonical entry file is missing (for example a directory `README.md` that docs depend on), create a minimal actionable file and update references.
5. Long-running investigations are acceptable when needed to uncover cross-file drift, especially in agent-instruction ecosystems.
## Discovery
1. Agents prefer simple terminal commands so having a well defined `make *` or `npm run *` is ideal
2. Agents can discover terminal commands through shell completion so providing shell completion helps
## CONTRIBUTING size and scope control
Source: https://contributing.md/how-to-build-contributing-md/
Source: https://blog.codacy.com/best-practices-to-manage-an-open-source-project
Source: https://mozillascience.github.io/working-open-workshop/contributing/
Source: https://github.com/openclaw/openclaw/blob/main/CONTRIBUTING.md
1. Keep root `CONTRIBUTING.md` focused on setup, issue flow, PR flow, testing, and review gates.
2. Use issue/PR template links instead of embedding every process detail inline.
3. When the file grows too large, split by domain and link from root.
4. Move any large content into docs if avalible (for example Mintlify/Fern/Sphinx workflows) to avoid large contributor guide.
5. Optimize for agent/machine readability as well as humans.
## Example repos to emulate
Source: https://github.com/openclaw/openclaw/blob/main/AGENTS.md
Source: https://github.com/openclaw/openclaw/blob/main/CONTRIBUTING.md
Source: https://github.com/openclaw/openclaw/blob/main/VISION.md
Source: https://github.com/openai/codex/blob/main/AGENTS.md
Source: https://github.com/processing/p5.js/blob/main/AGENTS.md
Source: https://github.com/vercel-labs/agent-skills/blob/main/AGENTS.md
Source: https://github.com/agentsmd/agents.md/blob/main/AGENTS.md
Source: https://github.com/rails/rails/blob/main/CONTRIBUTING.md
Source: https://github.com/kubernetes/kubernetes/blob/master/CONTRIBUTING.md
Source: https://github.com/atom/atom/blob/master/CONTRIBUTING.md
Source: https://github.com/github/docs/blob/main/CONTRIBUTING.md
Source: https://github.com/facebook/react/blob/main/CONTRIBUTING.md
1. OpenClaw: strong real-world alias policy and AGENTS/CONTRIBUTING/VISION cohesion.
2. OpenAI Codex: strict command discipline and explicit scope control.
3. p5.js: explicit AI-policy guardrails in agent instructions.
4. Vercel + agentsmd spec: compact, context-efficient AGENTS patterns.
5. Rails/Kubernetes/Atom/GitHub Docs/React: contributor guidance patterns at different project scales.
## Practical merge policy
When these rules conflict:
1. Preserve contributor and reader task success first.
2. Preserve instruction clarity and unambiguous boundaries second.
3. Preserve long-term maintainability and context-efficiency third.
4. Add extra agent optimization only if it does not reduce human clarity or there is explict need.
5. Use your judgement as the expert.

View File

@@ -1,116 +0,0 @@
# Build Docs Playbook
Read `principles.md` first, then follow this execution flow.
## 1. Detect and align agent instruction and governance instructions
- Use `references/agent-and-contributing.md` as the source of truth for inventory, canonical/alias mapping, and precedence/conflict handling.
- Apply the symlink compatibility policy when in scope (`.agents` canonical directory with `.cursor` compatibility symlink when required by tooling).
- Long-running and extensive build investigations are acceptable when needed to resolve ambiguous or conflicting documentation sources.
- When available, use sub-agents for bounded parallel inventory/cross-check tasks and merge results into one canonical decision set.
- Capture required constraints before writing:
- nested-agent rules, command/test requirements, PR workflow, and style checks.
- Use the same command and validation expectations in proposed snippets and examples.
## 2. Inventory product documentation surfaces (not governance only)
- For repo-wide builds, include docs content surfaces in addition to AGENTS/CONTRIBUTING.
- Inventory docs files and frameworks in scope (examples): `README*.md`, `docs/**`, `**/*.md`, `**/*.mdx`, `**/*.mdc`, `**/*.rst`, `**/*.rsc`, Fern/Mintlify config, Sphinx `conf.py`.
- Build a coverage map before drafting so governance and product docs are both represented.
- If scope is ambiguous, default to broader docs discovery first, then narrow intentionally.
## 3. Framework config and path mapping rules
- Detect framework/config first (for example Fern config, Sphinx `conf.py`, Mintlify config, or equivalent).
- Resolve every referenced path relative to the file/config that declares it, not assumed repo root.
- Treat filesystem paths and published URL routes as separate mappings; do not infer one from the other without config evidence.
- Validate both layers:
- config -> file exists on disk
- config/nav/routing -> URL path is consistent and reachable
- Record path-mapping assumptions and mismatches in handoff (`missing file`, `stale route`, `wrong base path`).
## 4. Define intent and success
- Audience, prerequisites, and job-to-be-done.
- Expected reader outcome immediately after completion.
- Doc type: tutorial, how-to, reference, explanation.
- Success criteria: what must be true after publish.
## 5. Build structure before prose
- Follow the funnel: what/why, quickstart, next steps.
- Keep headings informative and scannable.
- Open each section with the takeaway sentence.
- Add decision points with concrete branch guidance.
- For OpenClaw docs work, choose a page type from `references/openclaw.md` before drafting.
- Keep task-critical OpenClaw configuration inline; link exhaustive defaults, enums, schemas, generated references, and rare debugging workflows.
## 6. Build AGENTS.md and CONTRIBUTING.md intentionally
- Keep AGENTS.md structure consistent with `agents.md` ecosystem patterns:
- include YAML frontmatter when present in repo style (`name`, `description`).
- state persona scope and explicit instruction boundaries: `Always`, `Ask first`, `Never`.
- include concrete commands and representative code examples.
- For CONTRIBUTING.md, prioritize issue triage flow, PR expectations, setup/test commands, and review gates.
- Add `Code of Conduct`, `Testing`, `Local checks`, and `PR expectations` sections when missing but required by the repo.
- If CONTRIBUTING.md is becoming too large, split by scope into linked docs (for example, framework/tool-specific setup and release workflows) and keep the root file as a concise entry point.
- Keep cross-file consistency: links from CONTRIBUTING.md to AGENTS.md (and vice versa) should be accurate and non-circular.
- If multiple AGENTS.md files exist, document the directory-level scope and avoid conflicting advice.
- If a required canonical entry file is missing (for example referenced `README.md` under a major directory), create the file in the same pass instead of adding a caveat-only note.
- For new entry files, keep them minimal and actionable: purpose, prerequisites, concrete run commands, and pointers to deeper docs.
## 7. Keep agent context tight
- Author once, expose twice:
- keep one shared policy core and avoid duplicating guidance in separate agent-specific files.
- publish that core through bounded glob-friendly files for Cursor/Claude plus explicit path references for Codex.
- For Cursor and Claude-style agents, avoid broad references. Use minimal globbing and narrow rule files that each serve one concern (for example, repo-wide setup, test rules, security checks).
- Keep AGENTS and alias files short-to-medium; move detailed runbooks to linked docs.
- For Codex, prefer explicit file references and concrete paths for exact reuse.
- Avoid adding unrelated historical or process details to avoid token/context drift during future tool reads.
## 8. Brownfield build mode
- Match existing terminology, navigation, and component patterns.
- Preserve existing IA unless there is a documented migration plan.
- For rewrites, include a migration note from old to new paths.
- Prefer smallest safe change set that improves utility.
## 9. Evergreen build mode
- Prefer stable concepts over release-tied narrative.
- Isolate volatile details under clearly marked version sections.
- Include maintenance signals: owners, refresh triggers, stale criteria.
- Include lifecycle notes: deprecation and replacement paths.
## 10. Writing constraints
- Use precise language and short, imperative instructions.
- Keep code examples copy-ready and self-contained.
- Include common failure modes and safe defaults.
- Avoid placeholder guidance that cannot be executed.
## 11. Agent and automation readiness
- Keep key facts in text (not image-only).
- Prefer structured lists/tables when choices matter.
- Add links and anchors that allow deterministic navigation.
- Document what can be checked automatically in CI.
## 12. Build validation
- Validate commands and snippets where possible.
- Verify links and references in changed sections.
- Run a reference existence sweep for every path/command you introduced.
- Verify docs-framework consistency when in scope (for example Sphinx/Fern config and referenced doc paths).
- For OpenClaw docs work, apply the validation checklist in `references/openclaw.md`.
## 13. Multilingual parity mode (when applicable)
- Pick one source-of-truth language for technical accuracy and release timing.
- Define parity target: full parity, staged parity, or intentional divergence per section.
- Keep structure aligned across locales (headings, anchors, section order) when possible.
- Preserve command/code correctness first; localize explanatory text second.
- If parity is not feasible, add a visible note with missing scope and expected sync window.
- Run a locale parity check for changed sections (added/removed steps, warnings, prerequisites).
- Record unresolved checks explicitly in handoff.

View File

@@ -1,128 +0,0 @@
# OpenClaw Documentation Overlay
Use this reference only for OpenClaw docs work. It layers OpenClaw-specific page
types, navigation, preservation, and validation rules on top of the general
technical-documentation skill.
## Reader Model
- Lead with the task the reader is trying to complete.
- Give one recommended path before alternatives.
- Keep main docs focused on the common path; move dense contracts and rare
debugging detail to linked reference or troubleshooting pages.
- Explain production risks exactly where the reader can make the mistake.
- Link concepts, guides, references, CLI pages, SDK docs, testing, and
troubleshooting so readers can continue without rereading.
## Page Types
Choose the page type before writing or reviewing:
- Overview: route readers to the right product area, integration path, or guide.
- Quickstart: get a new user to a working result with the fewest safe steps.
- Topic page: explain a major OpenClaw entity or surface end to end.
- Guide: walk through one workflow from prerequisites to production readiness.
- API/SDK/CLI reference: define every object, method, command, option, response,
error, enum, default, and version rule in scope.
- Testing guide: show sandbox setup, fixtures, simulated failures, and live-mode
differences.
- Troubleshooting guide: map observable symptoms to checks, causes, and fixes.
- Governance file: keep agent/contributor policy concrete, scoped, and aligned
with current OpenClaw repo behavior.
## Topic Pages
Use this shape for major-entity pages:
1. Title naming the entity or surface.
2. Unheaded opening that says what it is, what it owns, and what it does not own.
3. Requirements, only when setup needs accounts, versions, permissions, plugins,
operating systems, or credentials.
4. Quickstart with the recommended path and smallest reliable verification.
5. Configuration with task-critical options inline and exhaustive details linked
to reference docs.
6. Major subtopics organized by reader intent, not under a generic "Subtopics"
heading.
7. Troubleshooting with observable failures and concrete checks.
8. Related links to guides, references, commands, concepts, and adjacent topics.
## Guides
Use this shape for workflow pages:
1. Title naming the outcome, not the implementation detail.
2. Opening that states what the reader can accomplish.
3. Before you begin: accounts, keys, permissions, versions, tools, and
assumptions.
4. Choose a path, only when the reader must decide.
5. Steps with verb-led headings, commands, expected output, and checks.
6. Test with the smallest reliable proof that the workflow works.
7. Production readiness: security, retries, limits, observability, migrations,
and cleanup.
8. Troubleshooting near the workflow that causes the failures.
9. See also links to concepts, references, SDK docs, and adjacent guides.
## Docs IA And Navigation
- Read `docs/docs.json` before navigation changes.
- Keep topic pages and common workflows on the main reader path.
- Put exhaustive contracts, generated references, maintainer-only detail, and
support material under `Reference` or another clearly scoped support page.
- Keep generated `plugins/reference/*` children and redirect-only pages out of
visible navigation unless explicitly required.
- For moved pages, include a keep/drop/move/destination matrix in the handoff.
- Add "Read when" hints for docs-list routing when creating or changing pages
that participate in the docs index.
## Source-Backed Content
- CLI docs must match current flags, output, errors, and examples.
- API/SDK docs must include fields, defaults, enum values, constraints, nullable
behavior, lifecycle states, errors, and recovery guidance.
- Config docs must align exported types, schema/help output, metadata, baselines,
and current docs.
- Dependency-backed behavior must be verified from upstream docs, source, or
types before documenting defaults, timing, errors, or API behavior.
- Separate current behavior, shipped behavior, planned behavior, and maintainer
intent.
## Examples
- Prefer complete copy-pasteable commands and snippets.
- Use realistic variable names and values.
- Mark placeholders with angle-bracket names such as `<API_KEY>`.
- Show expected success output when it helps verification.
- Keep one conceptual unit per code block and use language-specific fences.
- Avoid examples that hide setup, auth, error handling, or cleanup.
- Never expose real secrets, live config, phone numbers, private videos, or
credentials.
## Preservation Reviews
For rewrites or splits:
- Identify source units before rewriting: headings, paragraphs, tables, examples,
CLI/API contracts, warnings, and troubleshooting facts.
- Map each retained unit to a destination page or section.
- Do not treat a broad "covered" row as proof for dense source material; use
line- or claim-level evidence when the source unit is dense.
- For dropped content, state whether it is obsolete, duplicated elsewhere,
unsupported, or moved to a reference/support page.
- When a docs-audit artifact is used, verify it is mapped audit data with
non-empty `mappings[]`, not only inventory or reindexed JSON.
## Validation
Choose the narrowest proof that covers the touched surface:
- `pnpm docs:list`
- `pnpm docs:check-mdx`
- `pnpm docs:check-links`
- `pnpm docs:check-i18n-glossary`
- `pnpm format:docs:check` or `pnpm lint:docs`
- `git diff --check`
- generated-doc or inventory checks when generated references, plugin catalogs,
labeler, or docs scripts changed
- behavior tests or command probes when docs claim runtime behavior
If proof is blocked, say exactly which command was not run and why.

View File

@@ -1,54 +0,0 @@
# Documentation Principles
This reference consolidates the core rules used by this skill.
## Matt Palmer: 8 rules for better docs
Source: https://mattpalmer.io/posts/2025/10/8-rules-for-better-docs/
Use these as default operating principles:
1. Write for humans, optimize for agents.
2. Start with a funnel: what/why, quickstart, next steps.
3. Use Diataxis to scaffold content.
4. Write with AI, but structure for agents.
5. Offload routine docs operations to background agents.
6. Automate quality with CI.
7. Automate scaffolding and repetitive workflow tasks.
8. Make contribution easy and visible.
## OpenAI cookbook: what makes documentation good
Source: https://cookbook.openai.com/articles/what_makes_documentation_good
Key quality constraints:
- Prefer specific and accurate terminology over niche jargon.
- Keep examples self-contained and minimize dependencies.
- Prioritize high-value topics over edge-case depth.
- Do not teach unsafe patterns (for example, exposed secrets).
- Open with context that helps readers orient quickly.
- Apply empathy and override rigid rules when it clearly improves outcomes.
## Practical merge policy
When these rules conflict:
1. Preserve reader task success first.
2. Preserve structural clarity second.
3. Preserve long-term maintainability third.
4. Add agent optimization only if it does not reduce human clarity.
For agent-instructions and contributor-governance specifics (AGENTS/aliases/CONTRIBUTING), use `references/agent-and-contributing.md` as the detailed additional source of truth.
When the target repo or request is OpenClaw-specific, layer `references/openclaw.md` on top of these general rules. Otherwise ignore that repo-specific overlay.
## Execution policy for this skill
- Long-running and extensive investigations are allowed for both build and review work when needed to resolve ambiguity or cross-file drift.
- Use sub-agents when available for bounded parallel discovery, verification, or cross-source comparison.
- Keep one merged outcome: sub-agent outputs must be normalized into a single consistent recommendation/fix set.
## Multilingual parity rule
When docs exist in multiple languages, target cross-locale parity for task-critical content (steps, warnings, prerequisites, and limits). If full parity is not possible, publish explicit parity status and sync intent.

View File

@@ -1,121 +0,0 @@
# Review Docs Playbook
Read `principles.md` first, then apply this checklist.
## 1. Scope and classification
- Identify doc type and target audience.
- Confirm brownfield vs evergreen intent.
- Confirm expected outcome for the reader.
- For full-repo reviews, explicitly include both governance surfaces and product-doc surfaces (`docs/`, README trees, `.md/.mdx/.mdc`, `.rst/.rsc`, framework docs configs).
- For OpenClaw docs reviews, apply `references/openclaw.md` for page type, docs IA, preservation, examples, and validation checks.
## 2. Investigation behavior
- Proactively find issues and risks without waiting for repeated prompts.
- If there are signals of deeper problems, continue investigation beyond the first pass.
- Long-running and extensive investigations are acceptable when needed for confidence and correctness.
- When available, use sub-agents for bounded parallel discovery (for example file-inventory, command validation, or cross-doc consistency checks), then merge to one final issue set.
- When no issues are found, state that explicitly and call out residual risks or validation gaps.
- Default to `apply-fixes` for high-confidence documentation defects unless the user explicitly requests `report-only`.
- Do not stop at AGENTS/CONTRIBUTING checks when the task is documentation-wide; continue into docs-content and docs-framework surfaces.
## 3. Governance surface review
- Use `references/agent-and-contributing.md` as the source of truth for inventory, canonical/alias mapping, and precedence/conflict handling.
For AGENTS.md:
- confirm persona intent, scope, and command/tool boundaries are explicit.
- check frontmatter style matches repo conventions when present.
- ensure `Always`, `Ask first`, and `Never` boundaries are present when expected.
- require concrete command examples and repo-specific paths to avoid ambiguity.
For CONTRIBUTING.md:
- verify issue/PR workflow is complete and actionable.
- ensure local setup, lint/test commands, and review criteria are accurate.
- ensure governance does not conflict with nested AGENTS instructions.
- flag oversized files that should be split into linked section docs (for example tool-specific setup and release docs).
For agent-platform awareness:
- confirm references are minimal and scoped for Cursor/Claude glob behavior.
- confirm Codex-facing guidance uses explicit file references.
- confirm both surfaces represent the same shared policy core (commands, boundaries, and precedence), not divergent guidance.
- audit `.agents`/`.cursor` compatibility behavior:
- verify canonical rule directory and symlink state match repo policy
- verify symlink target integrity and platform/tooling expectations
- verify AGENTS policy references remain canonical for Codex even when `.cursor` compatibility exists
- check for context bloat from duplicated policy statements across agent and contributor files.
- check for conflicting rules, skills and agent instructions
- check for conflicting information in agent instructions vs codebase
- check for broken or missing referenced files (for example README/index files named as canonical entry points).
- check for setup/command drift (for example non-existent install commands, root-level commands that should be module-scoped).
## 4. Product documentation surface review
- Verify docs IA coverage across root/module `README*` files and `docs/**` trees.
- Review framework-native docs sources in scope (for example Fern, Mintlify, Sphinx, MkDocs) and ensure guidance matches actual source-of-truth files.
- Check `.md/.mdx/.mdc/.rst/.rsc` for stale commands, missing prerequisites, and broken cross-links.
- Confirm referenced doc paths and anchors exist.
- Flag docs that should be split/merged to improve discoverability and maintenance.
- For OpenClaw docs, check `docs/docs.json`, docs-list routing hints, main path versus `Reference` placement, and generated-reference visibility.
- For OpenClaw rewrites or page splits, require source-backed keep/drop/move/destination coverage for important claims, warnings, examples, commands, fields, and troubleshooting facts.
## 5. Framework config and path mapping checks
- Detect and read framework config first (for example Fern config, Sphinx `conf.py`, Mintlify config, or equivalent).
- Resolve path references relative to the declaring file/config.
- Treat filesystem paths and published URL routes as separate maps; verify both.
- Flag path-map drift explicitly (`missing file`, `stale route`, `wrong base path`).
## 6. Structural review
- Funnel check: what/why, quickstart, next steps.
- Validate heading flow and navigation discoverability.
- Flag critical content trapped in images or buried sections.
- Check Diataxis alignment and split mixed-purpose sections.
- For OpenClaw docs, confirm the content matches an explicit page type from `references/openclaw.md`.
## 7. Writing quality review
- Check for concise, scannable paragraphs.
- Remove ambiguous pronouns and undefined terms.
- Verify examples are executable and scoped correctly.
- Verify tone is directive, technical, and non-hand-wavy.
## 8. Brownfield review mode
- Verify compatibility with existing docs IA and conventions.
- Verify anchors, redirects, and cross-doc links remain valid.
- Flag regressions in onboarding and task completion paths.
- Ensure changed terminology is intentionally propagated.
## 9. Evergreen review mode
- Flag date-stamped or brittle wording without version scope.
- Check ownership and refresh signals are present.
- Ensure recommendations remain valid after routine product evolution.
- Flag missing deprecation/migration guidance.
## 10. Tooling and platform review
Read `tooling.md` if platform fit is uncertain.
- Check whether content uses platform primitives effectively.
- Flag structure that fights the chosen docs platform.
- Recommend targeted platform-aware improvements.
## 11. Multilingual parity review (when applicable)
- Confirm declared source-of-truth language and expected parity policy.
- Compare changed sections across locales for step/order/warning drift.
- Flag missing updates to prerequisites, version notes, limits, and safety guidance.
- Allow intentional divergence only when rationale is explicit and user-impact is low.
- Require a reader-visible status note when locale parity is partial.
## 12. Output format
1. Blocking issues (file + required fix)
2. Non-blocking improvements
3. Validation notes (done vs pending)

View File

@@ -1,32 +0,0 @@
# Documentation Tooling Guide
Source: https://www.mintlify.com/blog/top-7-api-documentation-tools-of-2025
Use this file when deciding build/review expectations for doc platforms.
## Tool-selection checkpoints
- Existing stack lock-in: do not force migration for minor gains.
- API workflow depth: generated references, OpenAPI support, testability.
- Collaboration model: docs-as-code, review workflow, versioning.
- Runtime quality: search, navigation, and copy-ready code snippets.
- AI readiness: structured content, stable URLs, machine-friendly layout yet human readable.
- Human readiness: reading complexity, reading UX, navigation depth, minimize jargon.
## Apply in brownfield mode
- Prioritize compatibility with the current platform.
- Use available components and style conventions before introducing new patterns.
- Propose migration only when current constraints block critical outcomes.
## Apply in evergreen mode
- Favor platforms and templates that make routine updates low-friction.
- Standardize section templates to reduce drift.
- Capture ownership, update cadence, and stale-content detection rules.
## Review implications
- Check whether content uses platform primitives correctly (tabs, callouts, endpoint blocks).
- Flag docs that are technically correct but hard to scan in the chosen platform.
- Recommend platform-specific improvements only when they reduce cognitive load.

View File

@@ -1,206 +0,0 @@
---
name: telegram-crabbox-e2e-proof
description: Use when reviewing, reproducing, or proving OpenClaw Telegram behavior with a real Telegram user on Crabbox, including PR review workflows that need an agent-controlled Telegram Desktop recording, TDLib user-driver commands, Convex-leased credentials, WebVNC observation, and motion-trimmed artifacts.
---
# Telegram Crabbox E2E Proof
Use this for Telegram PR review or bug reproduction when bot-to-bot proof is
not enough. The goal is to let the agent keep a real Telegram user session open
until it is satisfied, then attach visual proof.
Do not use personal accounts. Do not add credentials to the repo, prompt, or
artifact bundle. The runner leases the shared burner account from Convex.
## Start
Run from the OpenClaw repo and branch under test:
```bash
proof_cmd="${OPENCLAW_TELEGRAM_USER_PROOF_CMD:-openclaw-telegram-user-crabbox-proof}"
"$proof_cmd" start \
--tdlib-url http://artifacts.openclaw.ai/tdlib-v1.8.0-linux-x64.tgz \
--output-dir .artifacts/qa-e2e/telegram-user-crabbox/pr-review
```
This starts one held session:
- leases the exclusive `telegram-user` Convex credential
- restores TDLib and Telegram Desktop with the same user account
- starts a mock OpenClaw Telegram SUT from the current checkout
- selects the configured Telegram chat in the visible Linux desktop
- starts a 24fps desktop recording
- writes `.artifacts/qa-e2e/telegram-user-crabbox/pr-review/session.json`
Keep the session alive while investigating. It is valid for the agent to test
for minutes, run several commands, use WebVNC, inspect transcripts, and only
finish once the behavior is understood.
For deterministic visual repros, put the exact mock-model reply in a file and
pass it to `start`:
```bash
proof_cmd="${OPENCLAW_TELEGRAM_USER_PROOF_CMD:-openclaw-telegram-user-crabbox-proof}"
"$proof_cmd" start \
--tdlib-url http://artifacts.openclaw.ai/tdlib-v1.8.0-linux-x64.tgz \
--mock-response-file .artifacts/qa-e2e/telegram-user-crabbox/reply.txt \
--output-dir .artifacts/qa-e2e/telegram-user-crabbox/pr-review
```
The runner defaults to `--class standard`, `--record-fps 24`,
`--preview-fps 24`, and `--preview-width 1920`. Keep those defaults unless the
proof needs something else.
## While Testing
For visual proof, first send or identify a bottom marker message, then open the
group/topic directly by message id:
```bash
proof_cmd="${OPENCLAW_TELEGRAM_USER_PROOF_CMD:-openclaw-telegram-user-crabbox-proof}"
"$proof_cmd" view \
--session .artifacts/qa-e2e/telegram-user-crabbox/pr-review/session.json \
--message-id <message-id>
```
This uses Telegram Desktop directly with `tg://privatepost`, not `xdg-open`.
It also resizes Telegram to `650x1000` at the tested desktop position so
the crop can isolate the chat pane even if Telegram keeps a split/sidebar
layout. Do not press Escape after this; Escape can close the selected chat.
Bottom behavior matters:
- deep-linking to the newest message keeps Telegram pinned to the bottom, so
later messages appear live in the recording
- deep-linking to an older message does not auto-scroll to new arrivals; link
again to the newest/final marker instead of clicking the down-arrow
- the cropped GIF intentionally uses the chat pane, not the whole desktop or
whole Telegram window
Send as the real Telegram user:
```bash
proof_cmd="${OPENCLAW_TELEGRAM_USER_PROOF_CMD:-openclaw-telegram-user-crabbox-proof}"
"$proof_cmd" send \
--session .artifacts/qa-e2e/telegram-user-crabbox/pr-review/session.json \
--text /status
```
For slash commands, omit the bot username; the runner targets the SUT bot.
Run arbitrary commands on the Crabbox:
```bash
proof_cmd="${OPENCLAW_TELEGRAM_USER_PROOF_CMD:-openclaw-telegram-user-crabbox-proof}"
"$proof_cmd" run \
--session .artifacts/qa-e2e/telegram-user-crabbox/pr-review/session.json \
-- bash -lc 'source /tmp/openclaw-telegram-user-crabbox/env.sh && python3 /tmp/openclaw-telegram-user-crabbox/user-driver.py transcript --limit 20 --json'
```
Useful remote user-driver commands:
```bash
source /tmp/openclaw-telegram-user-crabbox/env.sh
python3 /tmp/openclaw-telegram-user-crabbox/user-driver.py status --json
python3 /tmp/openclaw-telegram-user-crabbox/user-driver.py chats --json
python3 /tmp/openclaw-telegram-user-crabbox/user-driver.py transcript --limit 20 --json
python3 /tmp/openclaw-telegram-user-crabbox/user-driver.py send --text '/status@{sut}'
python3 /tmp/openclaw-telegram-user-crabbox/user-driver.py probe --text '@{sut} Reply exactly: USER-E2E-{run}' --expect USER-E2E-
```
Capture the current desktop without ending the session:
```bash
proof_cmd="${OPENCLAW_TELEGRAM_USER_PROOF_CMD:-openclaw-telegram-user-crabbox-proof}"
"$proof_cmd" screenshot \
--session .artifacts/qa-e2e/telegram-user-crabbox/pr-review/session.json
```
Check lease state and get the WebVNC command:
```bash
proof_cmd="${OPENCLAW_TELEGRAM_USER_PROOF_CMD:-openclaw-telegram-user-crabbox-proof}"
"$proof_cmd" status \
--session .artifacts/qa-e2e/telegram-user-crabbox/pr-review/session.json
```
## Finish
Always finish or explicitly keep the box:
```bash
proof_cmd="${OPENCLAW_TELEGRAM_USER_PROOF_CMD:-openclaw-telegram-user-crabbox-proof}"
"$proof_cmd" finish \
--session .artifacts/qa-e2e/telegram-user-crabbox/pr-review/session.json \
--preview-crop telegram-window
```
`finish` stops recording, creates motion-trimmed MP4/GIF artifacts, captures a
final screenshot and logs, releases the Convex credential, stops the local SUT,
and stops the Crabbox lease. `--preview-crop telegram-window` also creates a
fixed-geometry GIF from the tested Telegram proof window for clean side-by-side
PR tables; the full desktop video/GIF remains in the artifact directory. Pass
`--keep-box` only when a human needs to continue VNC debugging after the
credential is released.
After any failure or interruption, verify cleanup:
```bash
crabbox list --provider aws
```
If a session file exists and the credential may still be leased, run `finish`
with that session file before retrying.
## Attach Proof
Attach only the useful visual artifact to the PR unless logs are needed. The
runner is GIF-only by default:
```bash
proof_cmd="${OPENCLAW_TELEGRAM_USER_PROOF_CMD:-openclaw-telegram-user-crabbox-proof}"
"$proof_cmd" publish \
--session .artifacts/qa-e2e/telegram-user-crabbox/pr-review/session.json \
--pr <pr-number> \
--summary 'Telegram real-user Crabbox session motion GIF'
```
This copies only the useful GIF into a temporary publish bundle and comments
that GIF. If `finish --preview-crop telegram-window` produced a cropped GIF,
publish uses that; otherwise it uses `telegram-user-crabbox-session-motion.gif`.
Use `--full-artifacts` only when the PR needs logs or JSON output. Never publish
credential payloads, local env files, TDLib databases, Telegram Desktop
profiles, or raw session archives.
For before/after proof, run one session on `main` and one on the PR head, then
publish only the intended GIFs from a clean bundle:
```bash
mkdir -p .artifacts/qa-e2e/telegram-user-crabbox/pr-123/comparison
cp <main-output>/telegram-user-crabbox-session-motion-telegram-window.gif \
.artifacts/qa-e2e/telegram-user-crabbox/pr-123/comparison/main-before.gif
cp <pr-output>/telegram-user-crabbox-session-motion-telegram-window.gif \
.artifacts/qa-e2e/telegram-user-crabbox/pr-123/comparison/pr-after.gif
crabbox artifacts publish \
--repo openclaw/openclaw \
--pr 123 \
--dir .artifacts/qa-e2e/telegram-user-crabbox/pr-123/comparison \
--summary 'Telegram before/after proof' \
--no-comment
```
Then post a concise markdown table with those two URLs. Do not publish working
directories that contain screenshots, raw videos, logs, session JSON, or crop
experiments unless those artifacts are explicitly needed.
## Quick Smoke
For a fast one-shot check, use:
```bash
proof_cmd="${OPENCLAW_TELEGRAM_USER_PROOF_CMD:-openclaw-telegram-user-crabbox-proof}"
"$proof_cmd" --text /status
```
This is a start/send/finish shortcut. Prefer the held session for PR review,
issue reproduction, or any task where the agent may need several attempts.

View File

@@ -1,87 +0,0 @@
---
name: verify-release
description: "Verify an OpenClaw release is fully published across GitHub, npm, plugins, ClawHub, package smoke, and live Gateway agent turns."
---
# Verify Release
Use this when asked whether an OpenClaw release is fully released, published,
promoted, smoke-tested, or live-verified. This is a verification skill, not a
publish skill; use `$release-openclaw-maintainer` before changing release state.
## Rules
- Resolve short suffixes like `.27` to the concrete CalVer version from the
current date/context, then say the resolved version.
- Verify live state. Do not trust local checkout state, release notes, or old
memory as current truth.
- If the checkout is dirty or divergent, use it only for scripts/reference.
For version metadata, fetch from GitHub release/tag or unpack the tag tarball
under `/tmp`.
- Never print secrets. Use inherited live keys only for scoped smoke commands.
- Keep the final terse: `yes/no`, evidence bullets, caveats, cleanup.
## Core Checks
1. GitHub release:
- `gh release view v<VERSION> --repo openclaw/openclaw --json tagName,name,publishedAt,isDraft,isPrerelease,targetCommitish,url,body,assets`
- Confirm stable releases are not draft/prerelease.
- Confirm release body has npm, CI, plugin npm, ClawHub, mac/appcast evidence
links when expected.
- Confirm assets expected for stable mac releases are uploaded: zip, dmg,
dSYM, dependency evidence when present.
2. Root npm:
- `npm view openclaw@<VERSION> version dist-tags.latest dist.tarball dist.integrity time.<VERSION> --json`
- `latest` must equal `<VERSION>` for stable.
- Record tarball, integrity, publish time.
3. Plugin publish set:
- Get exact tag metadata from GitHub, not the local checkout when dirty:
download `https://api.github.com/repos/openclaw/openclaw/tarball/v<VERSION>`
into `/tmp/openclaw-v<VERSION>-src`.
- Count `extensions/*/package.json` with
`openclaw.release.publishToNpm === true` and
`openclaw.release.publishToClawHub === true`.
- Compare expected counts to workflow job counts:
`gh api repos/openclaw/openclaw/actions/runs/<RUN>/jobs --paginate`.
- Each expected npm plugin must have version `<VERSION>` and
`dist-tags.latest === <VERSION>`.
4. ClawHub:
- Check the Plugin ClawHub Release workflow conclusion and publish job count.
- Use OpenClaw itself for live registry proof:
`openclaw plugins search <known-plugin> --json`.
- Install one official plugin from ClawHub in an isolated HOME:
`openclaw plugins install clawhub:@openclaw/matrix --pin`.
Prefer `matrix` unless that plugin is not in the expected set.
5. Release workflows:
- Verify conclusions for release notes evidence links:
Full Release Validation, OpenClaw Release Checks, OpenClaw NPM Release,
Plugin NPM Release, Plugin ClawHub Release, mac preflight/validation/publish
when stable mac assets are expected.
- Summarize only relevant successful/failed jobs; ignore routine skipped
optional lanes unless the release body promised them.
6. Published package smoke:
- In `/tmp`, isolated HOME:
`npm exec --yes --package openclaw@<VERSION> -- openclaw --version`.
- Run at least one harmless command that touches the published CLI surface,
for example `plugins --help` or `gateway --help`.
7. Dev Gateway live model smoke:
- Use temp HOME/workspace, not the user's normal state:
`HOME=/tmp/openclaw-release-smoke/home OPENCLAW_WORKSPACE=/tmp/openclaw-release-smoke/work pnpm openclaw --dev gateway run --auth none --force --verbose`.
- Health check via CLI: `openclaw --dev gateway health --json`.
- Run one Gateway-backed agent turn with inherited `OPENAI_API_KEY`, short
prompt, explicit session key, JSON output, and a known-available model.
- If the configured default model fails as unavailable, record that caveat
and retry with the newest known-good OpenAI model instead of declaring the
release failed.
- Stop the gateway and verify the port is not listening.
## Caveats To Report
- Dist-tag caveat: stable `latest` is release truth; if optional `beta` mirrors
still point at a beta version, report it as a caveat, not a stable-release
blocker, unless the user asked to verify beta promotion.
- Divergent checkout caveat: say when local source SHA differs from release tag
or origin and which live sources were used instead.
- Smoke caveat: distinguish Gateway-backed agent success from local embedded
fallback. A valid Gateway smoke has health OK plus gateway log/run id for the
agent call.

View File

@@ -1,55 +0,0 @@
profile: openclaw-check
# Default OpenClaw runner spend to the Azure-backed Crabbox account.
# Use `--provider aws` only for AWS-specific runner proof.
provider: azure
class: standard
capacity:
market: spot
strategy: most-available
# Fail closed instead of silently falling back to on-demand while the
# Azure-backed billing account is the default runner path.
fallback: spot-only
hints: true
actions:
workflow: .github/workflows/crabbox-hydrate.yml
# Default AWS hydration uses local Actions replay. Use
# `crabbox actions hydrate --github-runner --job hydrate-github` when the
# hydrate job needs GitHub secrets, or `--github-runner --job
# hydrate-windows-daemon` for focused native Windows daemon proof.
job: hydrate
ref: main
runnerLabels:
- crabbox
- openclaw
runnerVersion: latest
ephemeral: true
blacksmith:
org: openclaw
workflow: .github/workflows/ci-check-testbox.yml
job: check
ref: main
aws:
# AWS-specific overrides still pin direct `--provider aws` runs without
# leaking AWS region names into the Azure default capacity fallback list.
region: eu-west-1
rootGB: 400
sync:
delete: true
checksum: false
gitSeed: true
fingerprint: true
baseRef: main
exclude:
- .artifacts
- .codex
- .DS_Store
- playwright-report
- test-results
env:
allow:
- CI
- NODE_OPTIONS
- OPENCLAW_*
ssh:
user: crabbox
port: "2222"

26
.detect-secrets.cfg Normal file
View File

@@ -0,0 +1,26 @@
# detect-secrets exclusion patterns (regex)
#
# Note: detect-secrets does not read this file by default. If you want these
# applied, wire them into your scan command (e.g. translate to --exclude-files
# / --exclude-lines) or into a baseline's filters_used.
[exclude-files]
# pnpm lockfiles contain lots of high-entropy package integrity blobs.
pattern = (^|/)pnpm-lock\.yaml$
[exclude-lines]
# Fastlane checks for private key marker; not a real key.
pattern = key_content\.include\?\("BEGIN PRIVATE KEY"\)
# UI label string for Anthropic auth mode.
pattern = case \.apiKeyEnv: "API key \(env var\)"
# CodingKeys mapping uses apiKey literal.
pattern = case apikey = "apiKey"
# Schema labels referencing password fields (not actual secrets).
pattern = "gateway\.remote\.password"
pattern = "gateway\.auth\.password"
# Schema label for talk API key (label text only).
pattern = "talk\.apiKey"
# checking for typeof is not something we care about.
pattern = === "string"
# specific optional-chaining password check that didn't match the line above.
pattern = typeof remote\?\.password === "string"

View File

@@ -1,21 +1,7 @@
.git
.worktrees
# Sensitive files scripts/docker/setup.sh writes .env with OPENCLAW_GATEWAY_TOKEN
# into the project root; keep it out of the build context.
.env
.env.*
.bun-cache
.bun
.artifacts
**/.artifacts
.local
**/.local
.pi
**/.pi
__openclaw_vitest__
**/__openclaw_vitest__
.tmp
**/.tmp
.DS_Store
@@ -41,14 +27,9 @@ node_modules
**/.next
coverage
**/coverage
docs/.generated
**/.generated
*.log
tmp
**/tmp
dist-runtime
**/dist-runtime
openclaw-path-alias-*
# build artifacts
dist
@@ -59,20 +40,9 @@ apps/ios/build
# large app trees not needed for CLI build
apps/
assets/
Peekaboo/
Swabble/
Core/
Users/
vendor/
# Needed for building the Canvas A2UI bundle during Docker image builds.
# Keep the rest of apps/ and vendor/ excluded to avoid a large build context.
!apps/shared/
!apps/shared/OpenClawKit/
!apps/shared/OpenClawKit/Sources/
!apps/shared/OpenClawKit/Sources/OpenClawKit/
!apps/shared/OpenClawKit/Sources/OpenClawKit/Resources/
!apps/shared/OpenClawKit/Sources/OpenClawKit/Resources/tool-display.json
!apps/shared/OpenClawKit/Tools/
!apps/shared/OpenClawKit/Tools/CanvasA2UI/
!apps/shared/OpenClawKit/Tools/CanvasA2UI/**
!vendor/a2ui/
!vendor/a2ui/renderers/
!vendor/a2ui/renderers/lit/
!vendor/a2ui/renderers/lit/**

View File

@@ -1,95 +1,5 @@
# OpenClaw .env example
#
# Quick start:
# 1) Copy this file to `.env` (for local runs from this repo), OR to `~/.openclaw/.env` (for launchd/systemd daemons).
# 2) Fill only the values you use.
# 3) Keep real secrets out of git.
#
# Env-source precedence for environment variables (highest -> lowest):
# process env, ./.env, ~/.openclaw/.env, then openclaw.json `env` block.
# Existing non-empty process env vars are not overridden by dotenv/config env loading.
# Note: direct config keys (for example `gateway.auth.token` or channel tokens in openclaw.json)
# are resolved separately from env loading and often take precedence over env fallbacks.
# -----------------------------------------------------------------------------
# Gateway auth + paths
# -----------------------------------------------------------------------------
# Required if the gateway binds beyond loopback. Leave blank to have OpenClaw
# auto-generate a token on first start, or provide your own using
# `openssl rand -hex 32`. The gateway will refuse to start if this is set to
# the documented example placeholder, so never copy-paste an example value
# from docs or tutorials into this file verbatim.
OPENCLAW_GATEWAY_TOKEN=
# Optional alternative auth mode (use token OR password).
# OPENCLAW_GATEWAY_PASSWORD=
# Optional path overrides (defaults shown for reference).
# OPENCLAW_STATE_DIR=~/.openclaw
# OPENCLAW_CONFIG_PATH=~/.openclaw/openclaw.json
# OPENCLAW_HOME=~
# Docker setup stores auth profile encryption key material outside the mounted
# OpenClaw state dir and mounts this host directory into the container.
# OPENCLAW_AUTH_PROFILE_SECRET_DIR=/absolute/path/to/.openclaw-auth-profile-secrets
# Allowlist of extra directories that `$include` directives in openclaw.json may
# resolve files from. Path-list separated (':' on POSIX, ';' on Windows). Each
# entry is tilde-expanded. Without this, `$include` is confined to the directory
# containing openclaw.json.
# OPENCLAW_INCLUDE_ROOTS=/etc/openclaw/shared:~/.openclaw/shared
# Optional: import missing keys from your login shell profile.
# OPENCLAW_LOAD_SHELL_ENV=1
# OPENCLAW_SHELL_ENV_TIMEOUT_MS=15000
# -----------------------------------------------------------------------------
# Model provider API keys (set at least one)
# -----------------------------------------------------------------------------
# OPENAI_API_KEY=sk-...
# ANTHROPIC_API_KEY=sk-ant-...
# GEMINI_API_KEY=...
# OPENROUTER_API_KEY=sk-or-...
# OPENCLAW_LIVE_OPENAI_KEY=sk-...
# OPENCLAW_LIVE_ANTHROPIC_KEY=sk-ant-...
# OPENCLAW_LIVE_GEMINI_KEY=...
# OPENAI_API_KEY_1=...
# ANTHROPIC_API_KEY_1=...
# GEMINI_API_KEY_1=...
# GOOGLE_API_KEY=...
# OPENAI_API_KEYS=sk-1,sk-2
# ANTHROPIC_API_KEYS=sk-ant-1,sk-ant-2
# GEMINI_API_KEYS=key-1,key-2
# Optional additional providers
# ZAI_API_KEY=...
# AI_GATEWAY_API_KEY=...
# TOKENHUB_API_KEY=...
# LKEAP_API_KEY=...
# MINIMAX_API_KEY=...
# SYNTHETIC_API_KEY=...
# -----------------------------------------------------------------------------
# Channels (only set what you enable)
# -----------------------------------------------------------------------------
# TELEGRAM_BOT_TOKEN=123456:ABCDEF...
# DISCORD_BOT_TOKEN=...
# SLACK_BOT_TOKEN=xoxb-...
# SLACK_APP_TOKEN=xapp-...
# Optional channel env fallbacks
# MATTERMOST_BOT_TOKEN=...
# MATTERMOST_URL=https://chat.example.com
# ZALO_BOT_TOKEN=...
# OPENCLAW_TWITCH_ACCESS_TOKEN=oauth:...
# -----------------------------------------------------------------------------
# Tools + voice/media (optional)
# -----------------------------------------------------------------------------
# BRAVE_API_KEY=...
# PERPLEXITY_API_KEY=pplx-...
# FIRECRAWL_API_KEY=...
# ELEVENLABS_API_KEY=...
# XI_API_KEY=... # alias for ElevenLabs
# INWORLD_API_KEY=...
# DEEPGRAM_API_KEY=...
# Copy to .env and fill with your Twilio credentials
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_AUTH_TOKEN=your_auth_token_here
# Must be a WhatsApp-enabled Twilio number, prefixed with whatsapp:
TWILIO_WHATSAPP_FROM=whatsapp:+17343367101

5
.gitattributes vendored
View File

@@ -1,6 +1 @@
* text=auto eol=lf
CLAUDE.md -text
src/gateway/server-methods/CLAUDE.md -text
ui/src/i18n/.i18n/* linguist-generated
ui/src/i18n/locales/*.ts linguist-generated
ui/src/i18n/locales/en.ts -linguist-generated

66
.github/CODEOWNERS vendored
View File

@@ -1,66 +0,0 @@
# Protect the ownership rules themselves.
/.github/CODEOWNERS @steipete
# WARNING: GitHub CODEOWNERS uses last-match-wins semantics.
# If you add overlapping rules below the secops block, include @openclaw/openclaw-secops
# on those entries too or you can silently remove required secops review.
# Security-sensitive code, config, and docs require secops review.
/SECURITY.md @openclaw/openclaw-secops
/.github/dependabot.yml @openclaw/openclaw-secops
/.github/codeql/ @openclaw/openclaw-secops
/.github/workflows/codeql.yml @openclaw/openclaw-secops
/.github/workflows/codeql-android-critical-security.yml @openclaw/openclaw-secops
/.github/workflows/codeql-critical-quality.yml @openclaw/openclaw-secops
/.github/workflows/dependency-guard.yml @openclaw/openclaw-secops
/test/scripts/dependency-guard-workflow.test.ts @openclaw/openclaw-secops
/test/scripts/dependency-guard-script.test.ts @openclaw/openclaw-secops
/scripts/github/dependency-guard.mjs @openclaw/openclaw-secops
/package-lock.json @openclaw/openclaw-secops
/npm-shrinkwrap.json @openclaw/openclaw-secops
/extensions/*/package-lock.json @openclaw/openclaw-secops
/extensions/*/npm-shrinkwrap.json @openclaw/openclaw-secops
/pnpm-lock.yaml @openclaw/openclaw-secops
/scripts/generate-npm-shrinkwrap.mjs @openclaw/openclaw-secops
/src/security/ @openclaw/openclaw-secops
/src/secrets/ @openclaw/openclaw-secops
/src/config/*secret*.ts @openclaw/openclaw-secops
/src/config/**/*secret*.ts @openclaw/openclaw-secops
/src/gateway/*auth*.ts @openclaw/openclaw-secops
/src/gateway/**/*auth*.ts @openclaw/openclaw-secops
/src/gateway/*secret*.ts @openclaw/openclaw-secops
/src/gateway/**/*secret*.ts @openclaw/openclaw-secops
/src/gateway/security-path*.ts @openclaw/openclaw-secops
/src/gateway/resolve-configured-secret-input-string*.ts @openclaw/openclaw-secops
/packages/gateway-protocol/src/**/*secret*.ts @openclaw/openclaw-secops
/src/gateway/server-methods/secrets*.ts @openclaw/openclaw-secops
/src/agents/*auth*.ts @openclaw/openclaw-secops
/src/agents/**/*auth*.ts @openclaw/openclaw-secops
/src/agents/auth-profiles*.ts @openclaw/openclaw-secops
/src/agents/auth-health*.ts @openclaw/openclaw-secops
/src/agents/auth-profiles/ @openclaw/openclaw-secops
/src/agents/sandbox.ts @openclaw/openclaw-secops
/src/agents/sandbox-*.ts @openclaw/openclaw-secops
/src/agents/sandbox/ @openclaw/openclaw-secops
/src/infra/secret-file*.ts @openclaw/openclaw-secops
/src/cron/stagger.ts @openclaw/openclaw-secops
/src/cron/service/jobs.ts @openclaw/openclaw-secops
/docs/security/ @openclaw/openclaw-secops
/docs/gateway/authentication.md @openclaw/openclaw-secops
/docs/gateway/sandbox-vs-tool-policy-vs-elevated.md @openclaw/openclaw-secops
/docs/gateway/sandboxing.md @openclaw/openclaw-secops
/docs/gateway/secrets-plan-contract.md @openclaw/openclaw-secops
/docs/gateway/secrets.md @openclaw/openclaw-secops
/docs/gateway/security/ @openclaw/openclaw-secops
/docs/cli/approvals.md @openclaw/openclaw-secops
/docs/cli/sandbox.md @openclaw/openclaw-secops
/docs/cli/security.md @openclaw/openclaw-secops
/docs/cli/secrets.md @openclaw/openclaw-secops
/docs/reference/secretref-credential-surface.md @openclaw/openclaw-secops
/docs/reference/secretref-user-supplied-credentials-matrix.json @openclaw/openclaw-secops
# Release workflow and its supporting release-path checks.
/.github/workflows/openclaw-npm-release.yml @openclaw/openclaw-release-managers
/docs/reference/RELEASING.md @openclaw/openclaw-release-managers
/scripts/openclaw-npm-publish.sh @openclaw/openclaw-release-managers
/scripts/openclaw-npm-release-check.ts @openclaw/openclaw-release-managers
/scripts/release-check.ts @openclaw/openclaw-release-managers

28
.github/ISSUE_TEMPLATE/bug_report.md vendored Normal file
View File

@@ -0,0 +1,28 @@
---
name: Bug report
about: Report a problem or unexpected behavior in Clawdbot.
title: "[Bug]: "
labels: bug
---
## Summary
What went wrong?
## Steps to reproduce
1.
2.
3.
## Expected behavior
What did you expect to happen?
## Actual behavior
What actually happened?
## Environment
- Clawdbot version:
- OS:
- Install method (pnpm/npx/docker/etc):
## Logs or screenshots
Paste relevant logs or add screenshots (redact secrets).

View File

@@ -1,150 +0,0 @@
name: Bug report
description: Report defects, including regressions, crashes, and behavior bugs.
title: "[Bug]: "
labels:
- bug
body:
- type: markdown
attributes:
value: |
Thanks for filing this report. Keep every answer concise, reproducible, and grounded in observed evidence.
Do not speculate or infer beyond the evidence. If a narrative section cannot be answered from the available evidence, respond with exactly `NOT_ENOUGH_INFO`.
If this is a plugin beta-release blocker, rename the issue title to `Beta blocker: <plugin-name> - <summary>` and apply the `beta-blocker` label after filing.
Please only report one issue per submission. Break multiple issues up into separate submissions.
- type: dropdown
id: bug_type
attributes:
label: Bug type
description: Choose the category that best matches this report.
options:
- Regression (worked before, now fails)
- Crash (process/app exits or hangs)
- Behavior bug (incorrect output/state without crash)
validations:
required: true
- type: dropdown
id: beta_blocker
attributes:
label: Beta release blocker
description: >
Choose `Yes` only if this blocks plugin compatibility during the current beta release window.
Selecting `Yes` does not apply the label automatically. You must also rename the issue title
to `Beta blocker: <plugin-name> - <summary>` for the automation to apply the `beta-blocker` label.
options:
- "No"
- "Yes"
validations:
required: true
- type: textarea
id: summary
attributes:
label: Summary
description: One-sentence statement of what is broken, based only on observed evidence. If the evidence is insufficient, respond with exactly `NOT_ENOUGH_INFO`.
placeholder: After upgrading from 2026.2.10 to 2026.2.17, Telegram thread replies stopped posting; reproduced twice and confirmed by gateway logs.
validations:
required: true
- type: textarea
id: repro
attributes:
label: Steps to reproduce
description: Provide the shortest deterministic repro path supported by direct observation. If the repro path cannot be grounded from the evidence, respond with exactly `NOT_ENOUGH_INFO`.
placeholder: |
1. Start OpenClaw 2026.2.17 with the attached config.
2. Send a Telegram thread reply in the affected chat.
3. Observe no reply and confirm the attached `reply target not found` log line.
validations:
required: true
- type: textarea
id: expected
attributes:
label: Expected behavior
description: State the expected result using a concrete reference such as prior observed behavior, attached docs, or a known-good version. If no grounded reference exists, respond with exactly `NOT_ENOUGH_INFO`.
placeholder: In 2026.2.10, the agent posted replies in the same Telegram thread under the same workflow.
validations:
required: true
- type: textarea
id: actual
attributes:
label: Actual behavior
description: Describe only the observed result, including user-visible errors and cited evidence. If the observed result cannot be grounded from the evidence, respond with exactly `NOT_ENOUGH_INFO`.
placeholder: No reply is posted in the thread; the attached gateway log shows `reply target not found` at 14:23:08 UTC.
validations:
required: true
- type: input
id: version
attributes:
label: OpenClaw version
description: Exact version/build tested.
placeholder: <version such as 2026.2.17>
validations:
required: true
- type: input
id: os
attributes:
label: Operating system
description: OS and version where this occurs.
placeholder: macOS 15.4 / Ubuntu 24.04 / Windows 11
validations:
required: true
- type: input
id: install_method
attributes:
label: Install method
description: How OpenClaw was installed or launched.
placeholder: npm global / pnpm dev / docker / mac app
- type: input
id: model
attributes:
label: Model
description: Effective model under test.
placeholder: minimax/text-01 / openrouter/anthropic/claude-opus-4.1 / anthropic/claude-sonnet-4.5
validations:
required: true
- type: input
id: provider_chain
attributes:
label: Provider / routing chain
description: Effective request path through gateways, proxies, providers, or model routers.
placeholder: openclaw -> cloudflare-ai-gateway -> minimax
validations:
required: true
- type: textarea
id: provider_setup_details
attributes:
label: Additional provider/model setup details
description: Optional. Include redacted routing details, per-agent overrides, auth-profile interactions, env/config context, or anything else needed to explain the effective provider/model setup. Do not include API keys, tokens, or passwords.
placeholder: |
Default route is openclaw -> cloudflare-ai-gateway -> minimax.
Previous setup was openclaw -> cloudflare-ai-gateway -> openrouter -> minimax.
Relevant config lives in ~/.openclaw/openclaw.json under models.providers.minimax and models.providers.cloudflare-ai-gateway.
- type: textarea
id: logs
attributes:
label: Logs, screenshots, and evidence
description: Include the redacted logs, screenshots, recordings, docs, or version comparisons that support the grounded answers above.
render: shell
- type: textarea
id: impact
attributes:
label: Impact and severity
description: |
Explain who is affected, how severe it is, how often it happens, and the practical consequence using only observed evidence.
If any part cannot be grounded from the evidence, respond with exactly `NOT_ENOUGH_INFO`.
Include:
- Affected users/systems/channels
- Severity (annoying, blocks workflow, data risk, etc.)
- Frequency (always/intermittent/edge case)
- Consequence (missed messages, failed onboarding, extra cost, etc.)
placeholder: |
Affected: Telegram group users on 2026.2.17
Severity: High (blocks thread replies)
Frequency: 4/4 observed attempts
Consequence: Agents do not respond in the affected threads
- type: textarea
id: additional_information
attributes:
label: Additional information
description: Add any remaining grounded context that helps triage but does not fit above. If this is a regression, include the last known good and first known bad versions when observed. If there is not enough evidence, respond with exactly `NOT_ENOUGH_INFO`.
placeholder: Last known good version 2026.2.10, first known bad version 2026.2.17, temporary workaround is sending a top-level message instead of a thread reply.

View File

@@ -1,8 +1,8 @@
blank_issues_enabled: false
blank_issues_enabled: true
contact_links:
- name: Onboarding
url: https://discord.gg/clawd
about: "New to OpenClaw? Join Discord for setup guidance in #help."
about: New to Clawdbot? Join Discord for setup guidance from Krill in #help.
- name: Support
url: https://discord.gg/clawd
about: "Get help from the OpenClaw community on Discord in #help."
about: Get help from Krill and the community on Discord in #help.

View File

@@ -0,0 +1,18 @@
---
name: Feature request
about: Suggest an idea or improvement for Clawdbot.
title: "[Feature]: "
labels: enhancement
---
## Summary
Describe the problem you are trying to solve or the opportunity you see.
## Proposed solution
What would you like Clawdbot to do?
## Alternatives considered
Any other approaches you have considered?
## Additional context
Links, screenshots, or related issues.

View File

@@ -1,70 +0,0 @@
name: Feature request
description: Propose a new capability or product improvement.
title: "[Feature]: "
labels:
- enhancement
body:
- type: markdown
attributes:
value: |
Help us evaluate this request with concrete use cases and tradeoffs.
- type: textarea
id: summary
attributes:
label: Summary
description: One-line statement of the requested capability.
placeholder: Add per-channel default response prefix.
validations:
required: true
- type: textarea
id: problem
attributes:
label: Problem to solve
description: What user pain this solves and why current behavior is insufficient.
placeholder: Agents cannot distinguish persona context in mixed channels, causing misrouted follow-ups.
validations:
required: true
- type: textarea
id: proposed_solution
attributes:
label: Proposed solution
description: Desired behavior/API/UX with as much specificity as possible.
placeholder: Support channels.<channel>.responsePrefix with default fallback and account-level override.
validations:
required: true
- type: textarea
id: alternatives
attributes:
label: Alternatives considered
description: Other approaches considered and why they are weaker.
placeholder: Manual prefixing in prompts is inconsistent and hard to enforce.
- type: textarea
id: impact
attributes:
label: Impact
description: |
Explain who is affected, severity/urgency, how often this pain occurs, and practical consequences.
Include:
- Affected users/systems/channels
- Severity (annoying, blocks workflow, etc.)
- Frequency (always/intermittent/edge case)
- Consequence (delays, errors, extra manual work, etc.)
placeholder: |
Affected: Multi-team shared channels
Severity: Medium
Frequency: Daily
Consequence: +20 minutes/day/operator and delayed alerts
validations:
required: true
- type: textarea
id: evidence
attributes:
label: Evidence/examples
description: Prior art, links, screenshots, logs, or metrics.
placeholder: Comparable behavior in X, sample config, and screenshot of current limitation.
- type: textarea
id: additional_information
attributes:
label: Additional information
description: Extra context, constraints, or references not covered above.
placeholder: Must remain backward-compatible with existing config keys.

View File

@@ -1,31 +0,0 @@
# actionlint configuration
# https://github.com/rhysd/actionlint/blob/main/docs/config.md
self-hosted-runner:
labels:
# Blacksmith CI runners
- blacksmith-4vcpu-ubuntu-2404
- blacksmith-8vcpu-ubuntu-2404
- blacksmith-8vcpu-windows-2025
- blacksmith-16vcpu-ubuntu-2404
- blacksmith-32vcpu-ubuntu-2404
- blacksmith-16vcpu-windows-2025
- blacksmith-32vcpu-windows-2025
- blacksmith-16vcpu-ubuntu-2404-arm
- blacksmith-6vcpu-macos-latest
- blacksmith-12vcpu-macos-latest
- blacksmith-6vcpu-macos-15
- blacksmith-12vcpu-macos-15
- blacksmith-6vcpu-macos-26
- blacksmith-12vcpu-macos-26
# Ignore patterns for known issues
paths:
.github/workflows/**/*.yml:
ignore:
# Ignore shellcheck warnings (we run shellcheck separately)
- "shellcheck reported issue.+"
# Ignore intentional if: false for disabled jobs
- 'constant expression "false" in condition'
# actionlint's built-in runner label allowlist lags Blacksmith additions.
- 'label "blacksmith-16vcpu-[^"]+" is unknown\.'

View File

@@ -1,65 +0,0 @@
name: Detect docs-only changes
description: >
Outputs docs_only=true when all changed files are under docs/ or are
markdown (.md/.mdx). Fail-safe: if detection fails, outputs false (run
everything). Uses git diff — no API calls, no extra permissions needed.
outputs:
docs_only:
description: "'true' if all changes are docs/markdown, 'false' otherwise"
value: ${{ steps.check.outputs.docs_only }}
docs_changed:
description: "'true' if any changed file is under docs/ or is markdown"
value: ${{ steps.check.outputs.docs_changed }}
runs:
using: composite
steps:
- name: Detect docs-only changes
id: check
shell: bash
run: |
if [ "${{ github.event_name }}" = "push" ]; then
BASE="${{ github.event.before }}"
else
# Use the exact base SHA from the event payload — stable regardless
# of base branch movement (avoids origin/<ref> drift).
BASE="${{ github.event.pull_request.base.sha }}"
fi
# Fail-safe: if we can't diff, assume non-docs (run everything)
CHANGED=$(git diff --name-only "$BASE" HEAD 2>/dev/null || echo "UNKNOWN")
if [ "$CHANGED" = "UNKNOWN" ] || [ -z "$CHANGED" ]; then
echo "docs_only=false" >> "$GITHUB_OUTPUT"
echo "docs_changed=false" >> "$GITHUB_OUTPUT"
exit 0
fi
docs_changed=false
non_docs=false
while IFS= read -r changed_path; do
case "$changed_path" in
test/fixtures/*)
non_docs=true
;;
docs/* | *.md | *.mdx)
docs_changed=true
;;
*)
non_docs=true
;;
esac
done <<< "$CHANGED"
if [ "$docs_changed" = "true" ]; then
echo "docs_changed=true" >> "$GITHUB_OUTPUT"
else
echo "docs_changed=false" >> "$GITHUB_OUTPUT"
fi
if [ "$non_docs" = "false" ]; then
echo "docs_only=true" >> "$GITHUB_OUTPUT"
echo "Docs-only change detected — skipping heavy jobs"
else
echo "docs_only=false" >> "$GITHUB_OUTPUT"
fi

View File

@@ -1,172 +0,0 @@
name: Docker E2E plan and hydrate
description: >
Create a Docker E2E lane plan, expose GitHub outputs, and optionally hydrate
the prebuilt package artifact plus shared Docker images needed by the plan.
inputs:
mode:
description: prepare, chunk, or targeted.
required: true
chunk:
description: Release-path chunk for mode=chunk.
required: false
default: ""
lanes:
description: Comma/space separated lane names for targeted or prepare mode.
required: false
default: ""
include-openwebui:
description: Whether Open WebUI is included when planning release/prepare coverage.
required: false
default: "true"
include-release-path-suites:
description: Whether prepare mode should plan all release-path suites.
required: false
default: "false"
hydrate-artifacts:
description: Whether to download/pull artifacts required by the plan.
required: false
default: "true"
package-artifact-name:
description: Workflow artifact name containing openclaw-current.tgz.
required: false
default: docker-e2e-package
outputs:
credentials:
description: Comma-separated credential groups required by selected lanes.
value: ${{ steps.plan.outputs.credentials }}
needs_bare_image:
description: "1 when selected lanes require the bare Docker E2E image."
value: ${{ steps.plan.outputs.needs_bare_image }}
needs_e2e_image:
description: "1 when selected lanes require any Docker E2E image."
value: ${{ steps.plan.outputs.needs_e2e_image }}
needs_functional_image:
description: "1 when selected lanes require the functional Docker E2E image."
value: ${{ steps.plan.outputs.needs_functional_image }}
needs_live_image:
description: "1 when selected lanes require building the live Docker image."
value: ${{ steps.plan.outputs.needs_live_image }}
needs_package:
description: "1 when selected lanes require the OpenClaw package tarball."
value: ${{ steps.plan.outputs.needs_package }}
plan_json:
description: Path to the generated plan JSON.
value: ${{ steps.plan.outputs.plan_json }}
runs:
using: composite
steps:
- name: Plan Docker E2E lanes
id: plan
shell: bash
env:
MODE: ${{ inputs.mode }}
CHUNK: ${{ inputs.chunk }}
LANES: ${{ inputs.lanes }}
INCLUDE_OPENWEBUI: ${{ inputs.include-openwebui }}
INCLUDE_RELEASE_PATH_SUITES: ${{ inputs.include-release-path-suites }}
run: |
set -euo pipefail
mkdir -p .artifacts/docker-tests
case "$MODE" in
prepare)
plan_path=".artifacts/docker-tests/plan.json"
if [[ "$INCLUDE_RELEASE_PATH_SUITES" == "true" ]]; then
export OPENCLAW_DOCKER_ALL_PROFILE=release-path
export OPENCLAW_DOCKER_ALL_PLAN_RELEASE_ALL=1
elif [[ -n "$LANES" ]]; then
export OPENCLAW_DOCKER_ALL_LANES="$LANES"
elif [[ "$INCLUDE_OPENWEBUI" == "true" ]]; then
export OPENCLAW_DOCKER_ALL_LANES=openwebui
fi
;;
chunk)
if [[ -z "$CHUNK" ]]; then
echo "chunk input is required for Docker E2E chunk planning." >&2
exit 1
fi
export OPENCLAW_DOCKER_ALL_PROFILE=release-path
export OPENCLAW_DOCKER_ALL_CHUNK="$CHUNK"
plan_path=".artifacts/docker-tests/release-${CHUNK}-plan.json"
;;
targeted)
if [[ -z "$LANES" ]]; then
echo "lanes input is required for Docker E2E targeted planning." >&2
exit 1
fi
if [[ "$INCLUDE_RELEASE_PATH_SUITES" == "true" ]]; then
export OPENCLAW_DOCKER_ALL_PROFILE=release-path
fi
export OPENCLAW_DOCKER_ALL_LANES="$LANES"
plan_path=".artifacts/docker-tests/targeted-plan.json"
;;
*)
echo "mode must be prepare, chunk, or targeted. Got: $MODE" >&2
exit 1
;;
esac
export OPENCLAW_DOCKER_ALL_INCLUDE_OPENWEBUI="$INCLUDE_OPENWEBUI"
node scripts/test-docker-all.mjs --plan-json > "$plan_path"
node scripts/docker-e2e.mjs github-outputs "$plan_path" >> "$GITHUB_OUTPUT"
echo "plan_json=$plan_path" >> "$GITHUB_OUTPUT"
- name: Download OpenClaw Docker E2E package
if: inputs.hydrate-artifacts == 'true' && steps.plan.outputs.needs_package == '1'
uses: actions/download-artifact@v8
with:
name: ${{ inputs.package-artifact-name }}
path: .artifacts/docker-e2e-package
- name: Pull shared bare Docker E2E image
if: inputs.hydrate-artifacts == 'true' && steps.plan.outputs.needs_bare_image == '1'
shell: bash
run: |
set -euo pipefail
bash scripts/ci-docker-pull-retry.sh "${OPENCLAW_DOCKER_E2E_BARE_IMAGE}"
- name: Pull shared functional Docker E2E image
if: inputs.hydrate-artifacts == 'true' && steps.plan.outputs.needs_functional_image == '1'
shell: bash
run: |
set -euo pipefail
bash scripts/ci-docker-pull-retry.sh "${OPENCLAW_DOCKER_E2E_FUNCTIONAL_IMAGE}"
- name: Validate Docker E2E credentials
if: inputs.hydrate-artifacts == 'true'
shell: bash
env:
CREDENTIALS: ${{ steps.plan.outputs.credentials }}
run: |
set -euo pipefail
credentials=",$CREDENTIALS,"
require_any() {
local label="$1"
shift
local key
for key in "$@"; do
if [[ -n "${!key:-}" ]]; then
return 0
fi
done
echo "Missing credential for ${label}: expected one of $*" >&2
exit 1
}
if [[ "$credentials" == *",openai,"* ]]; then
require_any OpenAI OPENAI_API_KEY
fi
if [[ "$credentials" == *",codex,"* ]]; then
require_any Codex OPENCLAW_CODEX_AUTH_JSON
fi
if [[ "$credentials" == *",anthropic,"* ]]; then
require_any Anthropic ANTHROPIC_API_TOKEN ANTHROPIC_API_KEY OPENCLAW_CLAUDE_CREDENTIALS_JSON OPENCLAW_CLAUDE_JSON
fi
if [[ "$credentials" == *",factory,"* ]]; then
require_any Factory FACTORY_API_KEY
fi
if [[ "$credentials" == *",gemini,"* ]]; then
require_any Gemini GEMINI_API_KEY GOOGLE_API_KEY OPENCLAW_GEMINI_SETTINGS_JSON
fi
if [[ "$credentials" == *",opencode,"* ]]; then
require_any OpenCode OPENCODE_API_KEY OPENCODE_ZEN_API_KEY
fi

View File

@@ -1,67 +0,0 @@
name: Ensure base commit
description: Ensure a shallow checkout has enough history to diff against a base SHA.
inputs:
base-sha:
description: Base commit SHA to diff against.
required: true
fetch-ref:
description: Branch or ref to deepen/fetch from origin when base-sha is missing.
required: true
runs:
using: composite
steps:
- name: Ensure base commit is available
shell: bash
env:
BASE_SHA: ${{ inputs.base-sha }}
FETCH_REF: ${{ inputs.fetch-ref }}
run: |
set -euo pipefail
if [ -z "$BASE_SHA" ] || [[ "$BASE_SHA" =~ ^0+$ ]]; then
echo "No concrete base SHA available; skipping targeted fetch."
exit 0
fi
if ! [[ "$BASE_SHA" =~ ^[0-9a-fA-F]{7,40}$ ]]; then
echo "::error title=ensure-base-commit invalid base sha::Refusing invalid base SHA: $BASE_SHA"
exit 2
fi
if ! git check-ref-format --branch "$FETCH_REF" >/dev/null 2>&1; then
echo "::error title=ensure-base-commit invalid fetch ref::Refusing invalid fetch ref: $FETCH_REF"
exit 2
fi
if git rev-parse --verify "$BASE_SHA^{commit}" >/dev/null 2>&1; then
echo "Base commit already present: $BASE_SHA"
exit 0
fi
fetch_base_ref() {
timeout --signal=TERM --kill-after=10s 30s git \
-c protocol.version=2 \
fetch "$@"
}
for deepen_by in 25 100 300; do
echo "Base commit missing; deepening $FETCH_REF by $deepen_by."
if ! fetch_base_ref --no-tags --deepen="$deepen_by" origin -- "$FETCH_REF"; then
echo "::warning title=ensure-base-commit fetch failed::Failed to deepen $FETCH_REF by $deepen_by while looking for $BASE_SHA"
fi
if git rev-parse --verify "$BASE_SHA^{commit}" >/dev/null 2>&1; then
echo "Resolved base commit after deepening: $BASE_SHA"
exit 0
fi
done
echo "Base commit still missing; fetching full history for $FETCH_REF."
if ! fetch_base_ref --no-tags origin -- "$FETCH_REF"; then
echo "::warning title=ensure-base-commit fetch failed::Failed to fetch full history for $FETCH_REF while looking for $BASE_SHA"
fi
if git rev-parse --verify "$BASE_SHA^{commit}" >/dev/null 2>&1; then
echo "Resolved base commit after full ref fetch: $BASE_SHA"
exit 0
fi
echo "Base commit still unavailable after fetch attempts: $BASE_SHA"

View File

@@ -1,144 +0,0 @@
name: Setup Node environment
description: >
Install Node 24 by default, pnpm, optionally Bun, and optionally run pnpm
install. Requires actions/checkout to run first.
inputs:
node-version:
description: Node.js version to install.
required: false
default: "24.x"
install-bun:
description: Whether to install Bun alongside Node.
required: false
default: "true"
install-deps:
description: Whether to run pnpm install after environment setup.
required: false
default: "true"
frozen-lockfile:
description: Whether to use --frozen-lockfile for install.
required: false
default: "true"
use-actions-cache:
description: Whether to restore the pnpm store with actions/cache.
required: false
default: "true"
save-actions-cache:
description: Whether to save the pnpm store with actions/cache after install when no exact cache restored.
required: false
default: "false"
runs:
using: composite
steps:
- name: Normalize container toolcache
shell: bash
run: |
set -euo pipefail
if [[ -d /__t && ! -e /opt/hostedtoolcache ]]; then
mkdir -p /opt
ln -s /__t /opt/hostedtoolcache
fi
- name: Setup Node.js
shell: bash
env:
REQUESTED_NODE_VERSION: ${{ inputs.node-version }}
run: |
set -euo pipefail
source "$GITHUB_ACTION_PATH/../setup-pnpm-store-cache/ensure-node.sh"
openclaw_ensure_node "$REQUESTED_NODE_VERSION"
- name: Setup pnpm
id: setup-pnpm
uses: ./.github/actions/setup-pnpm-store-cache
with:
node-version: ${{ inputs.node-version }}
use-actions-cache: ${{ inputs.use-actions-cache }}
- name: Setup Bun
if: inputs.install-bun == 'true'
shell: bash
run: |
set -euo pipefail
npm install -g bun@1.3.14
- name: Runtime versions
shell: bash
run: |
node -v
npm -v
pnpm -v
if command -v bun &>/dev/null; then bun -v; fi
- name: Capture node path
shell: bash
run: |
node_bin="$(dirname "$(node -p 'process.execPath')")"
if command -v cygpath >/dev/null 2>&1; then
node_bin="$(cygpath -u "$node_bin")"
fi
# zizmor: ignore[github-env] node_bin comes from trusted actions/setup-node output in this composite action.
echo "NODE_BIN=$node_bin" >> "$GITHUB_ENV"
echo "$node_bin" >> "$GITHUB_PATH"
- name: Install dependencies
if: inputs.install-deps == 'true'
shell: bash
env:
CI: "true"
FROZEN_LOCKFILE: ${{ inputs.frozen-lockfile }}
run: |
set -euo pipefail
export PATH="$NODE_BIN:$PATH"
which node
node -v
pnpm -v
case "$FROZEN_LOCKFILE" in
true) LOCKFILE_FLAG="--frozen-lockfile" ;;
false) LOCKFILE_FLAG="" ;;
*)
echo "::error::Invalid frozen-lockfile input: '$FROZEN_LOCKFILE' (expected true or false)"
exit 2
;;
esac
install_args=(
install
--prefer-offline
--ignore-scripts=false
--config.engine-strict=false
--config.enable-pre-post-scripts=true
--config.side-effects-cache=true
)
if [ -n "$LOCKFILE_FLAG" ]; then
install_args+=("$LOCKFILE_FLAG")
fi
append_pnpm_option_arg() {
local env_name="$1"
local option_name="$2"
local value="${!env_name-}"
if [ -n "$value" ]; then
install_args+=("--${option_name}=${value}")
fi
}
append_pnpm_option_arg PNPM_CONFIG_CHILD_CONCURRENCY child-concurrency
append_pnpm_option_arg PNPM_CONFIG_MODULES_DIR modules-dir
append_pnpm_option_arg PNPM_CONFIG_NETWORK_CONCURRENCY network-concurrency
append_pnpm_option_arg PNPM_CONFIG_VIRTUAL_STORE_DIR virtual-store-dir
if [ -n "${PNPM_CONFIG_MODULES_DIR:-}" ]; then
mkdir -p "$PNPM_CONFIG_MODULES_DIR"
ln -sfn . "$PNPM_CONFIG_MODULES_DIR/node_modules"
fi
pnpm "${install_args[@]}" || pnpm "${install_args[@]}"
if [ -n "${PNPM_CONFIG_MODULES_DIR:-}" ]; then
rm -rf node_modules
ln -sfn "$PNPM_CONFIG_MODULES_DIR" node_modules
ln -sfn . "$PNPM_CONFIG_MODULES_DIR/node_modules"
fi
- name: Save pnpm store cache
if: ${{ inputs.install-deps == 'true' && inputs.use-actions-cache == 'true' && inputs.save-actions-cache == 'true' && runner.os != 'Windows' && steps.setup-pnpm.outputs.store-cache-hit != 'true' }}
uses: actions/cache/save@v5
with:
path: ${{ steps.setup-pnpm.outputs.store-path }}
key: ${{ steps.setup-pnpm.outputs.store-cache-primary-key }}

View File

@@ -1,108 +0,0 @@
name: Setup pnpm
description: Prepare pnpm from the repository packageManager and restore its store cache.
inputs:
package-manager-file:
description: package.json file that owns the packageManager pnpm pin.
required: false
default: "package.json"
lockfile-path:
description: pnpm lockfile used to key the store cache.
required: false
default: "pnpm-lock.yaml"
node-version:
description: Expected Node.js version already installed by actions/setup-node.
required: false
default: ""
use-actions-cache:
description: Whether actions/cache should restore the pnpm store.
required: false
default: "true"
outputs:
pnpm-version:
description: Resolved pnpm version activated by the setup action.
value: ${{ steps.pnpm-version.outputs.pnpm-version }}
project-dir:
description: Directory containing the packageManager file used for pnpm resolution.
value: ${{ steps.setup-pnpm.outputs.project-dir }}
store-cache-hit:
description: Whether the pnpm store cache restored an exact key.
value: ${{ steps.pnpm-store-cache.outputs.cache-hit }}
store-cache-primary-key:
description: Exact pnpm store cache key used for restore/save.
value: ${{ steps.pnpm-store-cache.outputs.cache-primary-key }}
store-path:
description: Resolved pnpm store path.
value: ${{ steps.pnpm-store.outputs.path }}
runs:
using: composite
steps:
- name: Validate pnpm setup inputs
id: setup-pnpm
shell: bash
env:
PACKAGE_MANAGER_FILE: ${{ inputs.package-manager-file }}
REQUESTED_NODE_VERSION: ${{ inputs.node-version }}
run: |
set -euo pipefail
project_dir="$(dirname "$PACKAGE_MANAGER_FILE")"
if [[ ! -f "$PACKAGE_MANAGER_FILE" ]]; then
echo "::error::package manager file not found: $PACKAGE_MANAGER_FILE"
exit 1
fi
echo "project-dir=$project_dir" >> "$GITHUB_OUTPUT"
requested_node="${REQUESTED_NODE_VERSION:-${NODE_VERSION:-}}"
source "$GITHUB_ACTION_PATH/ensure-node.sh"
openclaw_ensure_node "$requested_node"
- name: Setup pnpm from packageManager
shell: bash
env:
COREPACK_ENABLE_DOWNLOAD_PROMPT: "0"
PACKAGE_MANAGER_FILE: ${{ inputs.package-manager-file }}
run: |
set -euo pipefail
package_manager="$(node -e "const fs = require('node:fs'); const path = require('node:path'); const pkg = JSON.parse(fs.readFileSync(path.resolve(process.argv[1]), 'utf8')); process.stdout.write(pkg.packageManager || '')" "$PACKAGE_MANAGER_FILE")"
case "$package_manager" in
pnpm@*) ;;
*)
echo "::error::Expected packageManager to pin pnpm, got '${package_manager:-<empty>}'"
exit 1
;;
esac
corepack enable
for attempt in 1 2 3; do
if corepack prepare "$package_manager" --activate; then
exit 0
fi
sleep $((attempt * 5))
done
corepack prepare "$package_manager" --activate
- name: Resolve pnpm store path
id: pnpm-store
if: ${{ inputs.use-actions-cache == 'true' && runner.os != 'Windows' }}
shell: bash
run: |
set -euo pipefail
store_path="$(pnpm store path --silent)"
node -e "require('node:fs').mkdirSync(process.argv[1], { recursive: true })" "$store_path"
echo "path=$store_path" >> "$GITHUB_OUTPUT"
- name: Restore pnpm store cache
id: pnpm-store-cache
if: ${{ inputs.use-actions-cache == 'true' && runner.os != 'Windows' }}
uses: actions/cache/restore@v5
with:
path: ${{ steps.pnpm-store.outputs.path }}
key: pnpm-store-${{ runner.os }}-${{ runner.arch }}-${{ inputs.node-version }}-${{ hashFiles(inputs.package-manager-file) }}-${{ hashFiles(inputs.lockfile-path) }}
restore-keys: |
pnpm-store-${{ runner.os }}-${{ runner.arch }}-${{ inputs.node-version }}-${{ hashFiles(inputs.package-manager-file) }}-
pnpm-store-${{ runner.os }}-${{ runner.arch }}-${{ inputs.node-version }}-
- name: Record pnpm version
id: pnpm-version
shell: bash
env:
PROJECT_DIR: ${{ steps.setup-pnpm.outputs.project-dir }}
run: echo "pnpm-version=$(cd "$PROJECT_DIR" && pnpm -v)" >> "$GITHUB_OUTPUT"

View File

@@ -1,223 +0,0 @@
#!/usr/bin/env bash
openclaw_node_version_matches() {
local actual="$1"
local requested="$2"
if [[ -z "$requested" ]]; then
return 0
fi
case "$requested" in
*x)
[[ "${actual%%.*}" == "${requested%%.*}" ]] || return 1
if [[ "${requested%%.*}" == "22" ]]; then
openclaw_node_version_at_least "$actual" "22.19.0"
fi
;;
*.*.*)
[[ "$actual" == "$requested" ]]
;;
*.*)
[[ "$actual" == "$requested".* ]]
;;
*)
[[ "${actual%%.*}" == "$requested" ]]
;;
esac
}
openclaw_node_version_at_least() {
local actual="$1"
local minimum="$2"
local actual_major actual_minor actual_patch minimum_major minimum_minor minimum_patch
IFS=. read -r actual_major actual_minor actual_patch <<< "$actual"
IFS=. read -r minimum_major minimum_minor minimum_patch <<< "$minimum"
actual_minor="${actual_minor:-0}"
actual_patch="${actual_patch:-0}"
minimum_minor="${minimum_minor:-0}"
minimum_patch="${minimum_patch:-0}"
if (( actual_major != minimum_major )); then
(( actual_major > minimum_major ))
return
fi
if (( actual_minor != minimum_minor )); then
(( actual_minor > minimum_minor ))
return
fi
(( actual_patch >= minimum_patch ))
}
openclaw_active_node_version() {
node -p 'process.versions.node' 2>/dev/null || true
}
openclaw_prepend_node_bin() {
local node_bin_dir="$1"
local github_path_dir="${2:-$node_bin_dir}"
local shell_node_bin_dir="$node_bin_dir"
if command -v cygpath >/dev/null 2>&1; then
shell_node_bin_dir="$(cygpath -u "$node_bin_dir" 2>/dev/null || printf '%s' "$node_bin_dir")"
fi
export PATH="$shell_node_bin_dir:$PATH"
if [[ -n "${GITHUB_PATH:-}" ]]; then
local github_node_bin_dir="$github_path_dir"
if [[ $# -lt 2 ]] && command -v cygpath >/dev/null 2>&1; then
github_node_bin_dir="$shell_node_bin_dir"
github_node_bin_dir="$(cygpath -w "$shell_node_bin_dir" 2>/dev/null || printf '%s' "$shell_node_bin_dir")"
fi
echo "$github_node_bin_dir" >> "$GITHUB_PATH"
fi
hash -r
}
openclaw_find_toolcache_node() {
local requested_node="$1"
local roots=()
local root
for root in \
"${RUNNER_TOOL_CACHE:-}" \
"${AGENT_TOOLSDIRECTORY:-}" \
"${ACTIONS_RUNNER_TOOL_CACHE:-}" \
"${OPENCLAW_CONTAINER_TOOL_CACHE:-/__t}" \
"/opt/hostedtoolcache" \
"/home/runner/_work/_tool" \
"/Users/runner/hostedtoolcache" \
"/c/hostedtoolcache/windows"
do
if [[ ! -d "$root" && "$root" == *\\* ]] && command -v cygpath >/dev/null 2>&1; then
root="$(cygpath -u "$root" 2>/dev/null || printf '%s' "$root")"
fi
if [[ -d "$root/node" ]]; then
roots+=("$root/node")
elif [[ "$(basename "$root")" == "node" && -d "$root" ]]; then
roots+=("$root")
fi
done
local node_root candidate candidate_version
for node_root in ${roots[@]+"${roots[@]}"}; do
while IFS= read -r candidate; do
candidate_version="$("$candidate" -p 'process.versions.node' 2>/dev/null || true)"
if openclaw_node_version_matches "$candidate_version" "$requested_node"; then
printf '%s\n' "$candidate"
return 0
fi
done < <(find "$node_root" \( -name node -o -name node.exe \) -type f 2>/dev/null | sort -r)
done
return 1
}
openclaw_resolve_node_download_version() {
local requested_node="$1"
if [[ "$requested_node" =~ ^v?[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
[[ "$requested_node" == v* ]] && printf '%s\n' "$requested_node" || printf 'v%s\n' "$requested_node"
return 0
fi
local prefix="${requested_node#v}"
prefix="${prefix%%[xX]*}"
prefix="v${prefix}"
[[ "$prefix" == *. ]] || prefix="${prefix}."
curl -fsSL https://nodejs.org/dist/index.json |
OPENCLAW_NODE_PREFIX="$prefix" python3 -c 'import json, os, sys
prefix = os.environ["OPENCLAW_NODE_PREFIX"]
for item in json.load(sys.stdin):
version = item.get("version", "")
if version.startswith(prefix):
print(version)
break
'
}
openclaw_node_download_platform() {
local os_name arch_name
os_name="$(uname -s)"
arch_name="$(uname -m)"
case "$os_name:$arch_name" in
Linux:x86_64) printf 'linux-x64\n' ;;
Linux:aarch64 | Linux:arm64) printf 'linux-arm64\n' ;;
Darwin:x86_64) printf 'darwin-x64\n' ;;
Darwin:arm64) printf 'darwin-arm64\n' ;;
MINGW*:x86_64 | MSYS*:x86_64 | CYGWIN*:x86_64 | MINGW*:AMD64 | MSYS*:AMD64 | CYGWIN*:AMD64)
printf 'win-x64\n'
;;
MINGW*:aarch64 | MINGW*:arm64 | MSYS*:aarch64 | MSYS*:arm64 | CYGWIN*:aarch64 | CYGWIN*:arm64) printf 'win-arm64\n' ;;
*)
return 1
;;
esac
}
openclaw_download_node() {
local requested_node="$1"
local version platform archive_url install_root temp_root
version="$(openclaw_resolve_node_download_version "$requested_node")"
platform="$(openclaw_node_download_platform)" || return 1
temp_root="${RUNNER_TEMP:-/tmp}"
if command -v cygpath >/dev/null 2>&1; then
temp_root="$(cygpath -u "$temp_root" 2>/dev/null || printf '%s\n' "$temp_root")"
fi
install_root="${temp_root}/openclaw-node-${version}-${platform}"
if [[ "$platform" == win-* ]]; then
local archive_path ps_archive_path ps_install_root ps_bin_dir node_bin_dir
archive_path="${temp_root}/node-${version}-${platform}.zip"
archive_url="https://nodejs.org/dist/${version}/node-${version}-${platform}.zip"
rm -rf "$install_root"
mkdir -p "$install_root"
echo "Downloading Node ${version} from ${archive_url}"
curl -fsSL -o "$archive_path" "$archive_url"
ps_archive_path="$archive_path"
ps_install_root="$install_root"
if command -v cygpath >/dev/null 2>&1; then
ps_archive_path="$(cygpath -w "$archive_path")"
ps_install_root="$(cygpath -w "$install_root")"
fi
ps_bin_dir="$ps_install_root\\node-${version}-${platform}"
node_bin_dir="$install_root/node-${version}-${platform}"
if command -v pwsh >/dev/null 2>&1; then
pwsh -NoLogo -NoProfile -Command "Expand-Archive -LiteralPath '${ps_archive_path}' -DestinationPath '${ps_install_root}' -Force"
openclaw_prepend_node_bin "$node_bin_dir" "$ps_bin_dir"
elif command -v powershell.exe >/dev/null 2>&1; then
powershell.exe -NoLogo -NoProfile -Command "Expand-Archive -LiteralPath '${ps_archive_path}' -DestinationPath '${ps_install_root}' -Force"
openclaw_prepend_node_bin "$node_bin_dir" "$ps_bin_dir"
else
unzip -q "$archive_path" -d "$install_root"
openclaw_prepend_node_bin "$node_bin_dir"
fi
else
archive_url="https://nodejs.org/dist/${version}/node-${version}-${platform}.tar.xz"
mkdir -p "$install_root"
echo "Downloading Node ${version} from ${archive_url}"
curl -fsSL "$archive_url" | tar -xJ -C "$install_root" --strip-components=1
openclaw_prepend_node_bin "$install_root/bin"
fi
}
openclaw_ensure_node() {
local requested_node="${1:-}"
requested_node="${requested_node#v}"
if [[ -z "$requested_node" ]]; then
return 0
fi
local active_node_version node_bin
active_node_version="$(openclaw_active_node_version)"
if openclaw_node_version_matches "$active_node_version" "$requested_node"; then
echo "Using active Node ${active_node_version} at $(command -v node)"
return 0
fi
node_bin="$(openclaw_find_toolcache_node "$requested_node" || true)"
if [[ -n "$node_bin" ]]; then
echo "Using Node $("$node_bin" -p 'process.versions.node') from $node_bin"
openclaw_prepend_node_bin "$(dirname "$node_bin")"
else
openclaw_download_node "$requested_node" || true
fi
active_node_version="$(openclaw_active_node_version)"
if ! openclaw_node_version_matches "$active_node_version" "$requested_node"; then
echo "::error::Expected Node '${requested_node}', but active node is '${active_node_version:-missing}' at $(command -v node || true)"
return 1
fi
}

Some files were not shown because too many files have changed in this diff Show More