fix(heartbeat): multi-agent cadence — parallel broadcast, per-agent busy check, prompt assembly, connect-timeout, doctor warning (#80470)

* fix(heartbeat): unblock beads cadence — parallel broadcast, agent-scoped busy check, full HEARTBEAT.md prompt, connect-timeout, doctor warning

* docs(changelog): note heartbeat cadence fixes

* fix(heartbeat): address review feedback

* fix(heartbeat): append HEARTBEAT.md directives to commitment-only task dispatch (review feedback)

* docs(changelog): extend heartbeat fix entry — commitment-only task dispatch path (review feedback)

* fix(heartbeat): clear connect timer on synchronous baseFn throw (review feedback)

When the provider stream function passed to streamWithIdleTimeout throws

synchronously during setup, the connect watchdog timer was left armed

and could fire onIdleTimeout later with a stale error, keeping the

process open past the real failure. Wrap the synchronous baseFn(...)

invocation in a try/catch that clears the connect timer before

rethrowing, and add a regression test that asserts onIdleTimeout is

not invoked after the synchronous throw.

* docs(changelog): note round-4 heartbeat fix (review feedback)

Bump the heartbeat fixes list from six to seven and document the

synchronous-baseFn-throw connect-timer cleanup added in the prior

commit.

* fix(heartbeat): honor omitted doctor target (review feedback)

* fix(heartbeat): merge doctor heartbeat defaults (review feedback)

Teach the heartbeat session-target doctor warning to enumerate the same agents as the runtime heartbeat runner and merge agents.defaults.heartbeat with per-agent overrides before checking pinned sessions.

Add regression coverage for default-only heartbeat.session pins and explicit agent heartbeat entries that inherit the default session.

Validation:
- pnpm test src/commands/doctor-heartbeat-session-target.test.ts
- pnpm tsgo:core
- pnpm tsgo:core:test
- pnpm config:schema:check
- pnpm exec oxlint src/commands/doctor-heartbeat-session-target.ts src/commands/doctor-heartbeat-session-target.test.ts
- pnpm exec oxfmt --check src/commands/doctor-heartbeat-session-target.ts src/commands/doctor-heartbeat-session-target.test.ts
- git diff --check

Beads: openclaw-8zp

* test(heartbeat): avoid redundant doctor assertion (review feedback)

The CI lint shard flags the non-null assertion in the heartbeat doctor regression test. Keep the same test setup while using an explicit guard so the test still narrows the fixture before mutating the heartbeat entry.

Validation:
- pnpm exec oxlint src/commands/doctor-heartbeat-session-target.test.ts
- pnpm test src/commands/doctor-heartbeat-session-target.test.ts
- pnpm tsgo:core:test
- git diff --check

Beads: openclaw-8zp

* docs(config): refresh baseline after heartbeat branch update

* fix(heartbeat): narrow doctor session warnings (review feedback)
This commit is contained in:
Edward Abrams
2026-05-12 14:36:25 -07:00
committed by GitHub
parent aef80a2940
commit b247b1432f
14 changed files with 589 additions and 56 deletions

View File

@@ -114,6 +114,7 @@ Docs: https://docs.openclaw.ai
### Fixes
- Agents/heartbeat: fix seven layered issues that broke multi-agent heartbeat cadence — (1) fan out the scheduler broadcast wake across agents in parallel via `Promise.all` instead of awaiting each `runOnce` sequentially, so one agent doing real work no longer starves every later agent in iteration order; (2) scope `skipWhenBusy` to lanes attributable to the firing agent via session-key parsing of `session:agent:<id>:…` / `nested:agent:<id>:…` lane names, instead of consulting the global `subagent` lane, so a single stuck subagent on one agent no longer silently disables every other agent's heartbeat; (3) always append workspace `HEARTBEAT.md` directives (everything outside an optional `tasks:` block) to the dispatch prompt, so prose-runbook `HEARTBEAT.md` files reach the model directly instead of being silently dropped unless periodic tasks are declared; (4) race the initial stream-establishment promise inside `streamWithIdleTimeout` against the same watchdog timer that previously only guarded inter-token gaps, so SDK requests stuck at TCP/TLS handshake or before the first response byte no longer hang indefinitely (the stalled-session diagnostic's `recovery=none` case); (5) emit an `openclaw doctor` warning when `heartbeat.session` pins a session key that has no entry in the agent's session store, so silently-dropped heartbeat deliveries surface at config-validation time; (6) also route the commitment-only task dispatch path (tasks configured, none due) through `appendHeartbeatFileDirectives` so prose directives outside the `tasks:` block reach the model on this path as well; (7) wrap the synchronous `baseFn(...)` invocation inside `streamWithIdleTimeout` in a try/catch that clears the connect watchdog timer before rethrowing, so a provider stream function that throws during setup no longer leaves a live timer that can fire `onIdleTimeout` later with a stale error and keep the process open past the real failure. Thanks @zeroaltitude.
- Matrix: stop running `npm install`/`pnpm install` at runtime from a parent-derived plugin path; missing Matrix runtime dependencies now fail with repair guidance instead of mutating the wrong `node_modules` tree. Fixes #80758. (#80876) Thanks @kinjitakabe.
- CLI/media: render terminal QR codes with full-block characters by default so the bundled `qrcode` terminal renderer does not emit a pathologically dense ANSI final row in compact half-block mode that breaks scanning in some terminals. Fixes #77820. Thanks @KrasimirKralev.
- Agents/compaction: read post-compaction AGENTS.md refresh context from the queued run workspace instead of the runner process cwd, so CLI-backed follow-up turns re-inject the correct workspace startup rules after compaction. Fixes #70541. (#75532) Thanks @vyctorbrzezowski.

View File

@@ -1,2 +1,2 @@
f26833e053032e3da94025c8a5a8cb62dcddd275797b527440a19be5886a4783 plugin-sdk-api-baseline.json
429fe1d6d119379b914bf84b15705233dc8d2d9e1a8131bb28ea19b19afbe6a0 plugin-sdk-api-baseline.jsonl
981f125194293842b7a45b1de0ae2ec134f037f63a6cc672ee2a28648251b4c9 plugin-sdk-api-baseline.json
4c56ce2cb5bfae526557479a6cc19f8b0042d14f6c717996f8f86da5d5b159df plugin-sdk-api-baseline.jsonl

View File

@@ -109,7 +109,7 @@ See [Hooks](/automation/hooks).
### Heartbeat
Heartbeat is a periodic main-session turn (default every 30 minutes). It batches multiple checks (inbox, calendar, notifications) in one agent turn with full session context. Heartbeat turns do not create task records and do not extend daily/idle session reset freshness. Use `HEARTBEAT.md` for a small checklist, or a `tasks:` block when you want due-only periodic checks inside heartbeat itself. Empty heartbeat files skip as `empty-heartbeat-file`; due-only task mode skips as `no-tasks-due`. Heartbeats defer while cron work is active or queued, and `heartbeat.skipWhenBusy` can also defer them while subagent or nested lanes are busy.
Heartbeat is a periodic main-session turn (default every 30 minutes). It batches multiple checks (inbox, calendar, notifications) in one agent turn with full session context. Heartbeat turns do not create task records and do not extend daily/idle session reset freshness. Use `HEARTBEAT.md` for a small checklist, or a `tasks:` block when you want due-only periodic checks inside heartbeat itself. Empty heartbeat files skip as `empty-heartbeat-file`; due-only task mode skips as `no-tasks-due`. Heartbeats defer while cron work is active or queued, and `heartbeat.skipWhenBusy` can also defer an agent while that same agent's session-keyed subagent or nested lanes are busy.
See [Heartbeat](/gateway/heartbeat).

View File

@@ -542,7 +542,7 @@ Periodic heartbeat runs.
includeSystemPromptSection: true, // default: true; false omits the Heartbeat section from the system prompt
lightContext: false, // default: false; true keeps only HEARTBEAT.md from workspace bootstrap files
isolatedSession: false, // default: false; true runs each heartbeat in a fresh session (no conversation history)
skipWhenBusy: false, // default: false; true also waits for subagent/nested lanes
skipWhenBusy: false, // default: false; true also waits for this agent's subagent/nested lanes
session: "main",
to: "+15555550123",
directPolicy: "allow", // allow (default) | block
@@ -564,7 +564,7 @@ Periodic heartbeat runs.
- `directPolicy`: direct/DM delivery policy. `allow` (default) permits direct-target delivery. `block` suppresses direct-target delivery and emits `reason=dm-blocked`.
- `lightContext`: when true, heartbeat runs use lightweight bootstrap context and keep only `HEARTBEAT.md` from workspace bootstrap files.
- `isolatedSession`: when true, each heartbeat runs in a fresh session with no prior conversation history. Same isolation pattern as cron `sessionTarget: "isolated"`. Reduces per-heartbeat token cost from ~100K to ~2-5K tokens.
- `skipWhenBusy`: when true, heartbeat runs defer on extra busy lanes: subagent or nested command work. Cron lanes always defer heartbeats, even without this flag.
- `skipWhenBusy`: when true, heartbeat runs defer on that agent's extra busy lanes: its own session-keyed subagent or nested command work. Cron lanes always defer heartbeats, even without this flag.
- Per-agent: set `agents.list[].heartbeat`. When any agent defines `heartbeat`, **only those agents** run heartbeats.
- Heartbeats run full agent turns — shorter intervals burn more tokens.

View File

@@ -50,7 +50,7 @@ Example config:
directPolicy: "allow", // default: allow direct/DM targets; set "block" to suppress
lightContext: true, // optional: only inject HEARTBEAT.md from bootstrap files
isolatedSession: true, // optional: fresh session each run (no conversation history)
skipWhenBusy: true, // optional: also defer when subagent or nested lanes are busy
skipWhenBusy: true, // optional: also defer when this agent's subagent or nested lanes are busy
// activeHours: { start: "08:00", end: "24:00" },
// includeReasoning: true, // optional: send separate `Reasoning:` message too
},
@@ -66,7 +66,7 @@ Example config:
- The heartbeat prompt is sent **verbatim** as the user message. The system prompt includes a "Heartbeat" section only when heartbeats are enabled for the default agent, and the run is flagged internally.
- When heartbeats are disabled with `0m`, normal runs also omit `HEARTBEAT.md` from bootstrap context so the model does not see heartbeat-only instructions.
- Active hours (`heartbeat.activeHours`) are checked in the configured timezone. Outside the window, heartbeats are skipped until the next tick inside the window.
- Heartbeats automatically defer while cron work is active or queued. Set `heartbeat.skipWhenBusy: true` to defer on extra busy lanes (subagent or nested command work) as well; this is useful for local Ollama and other constrained single-runtime hosts.
- Heartbeats automatically defer while cron work is active or queued. Set `heartbeat.skipWhenBusy: true` to also defer an agent on its own session-keyed subagent or nested command lanes; sibling agents no longer pause just because another agent has subagent work in flight.
## What the heartbeat prompt is for
@@ -101,7 +101,7 @@ Outside heartbeats, stray `HEARTBEAT_OK` at the start/end of a message is stripp
includeReasoning: false, // default: false (deliver separate Reasoning: message when available)
lightContext: false, // default: false; true keeps only HEARTBEAT.md from workspace bootstrap files
isolatedSession: false, // default: false; true runs each heartbeat in a fresh session (no conversation history)
skipWhenBusy: false, // default: false; true also waits for subagent/nested lanes
skipWhenBusy: false, // default: false; true also waits for this agent's subagent/nested lanes
target: "last", // default: none | options: last | none | <channel id> (core or plugin, e.g. "imessage")
to: "+15551234567", // optional channel-specific override
accountId: "ops-bot", // optional multi-account channel id
@@ -235,7 +235,7 @@ Use `accountId` to target a specific account on multi-account channels like Tele
When true, each heartbeat runs in a fresh session with no prior conversation history. Uses the same isolation pattern as cron `sessionTarget: "isolated"`. Dramatically reduces per-heartbeat token cost. Combine with `lightContext: true` for maximum savings. Delivery routing still uses the main session context.
</ParamField>
<ParamField path="skipWhenBusy" type="boolean" default="false">
When true, heartbeat runs defer on extra busy lanes: subagent or nested command work. Cron lanes always defer heartbeats, even without this flag, so local-model hosts do not run cron and heartbeat prompts at the same time.
When true, heartbeat runs defer on that agent's extra busy lanes: its own session-keyed subagent or nested command work. Cron lanes always defer heartbeats, even without this flag, so local-model hosts do not run cron and heartbeat prompts at the same time.
</ParamField>
<ParamField path="session" type="string">
Optional session key for heartbeat runs.
@@ -295,7 +295,7 @@ Use `accountId` to target a specific account on multi-account channels like Tele
- To deliver to a specific channel/recipient, set `target` + `to`. With `target: "last"`, delivery uses the last external channel for that session.
- Heartbeat deliveries allow direct/DM targets by default. Set `directPolicy: "block"` to suppress direct-target sends while still running the heartbeat turn.
- If the main queue, target session lane, cron lane, or an active cron job is busy, the heartbeat is skipped and retried later.
- If `skipWhenBusy: true`, subagent and nested lanes also defer heartbeat runs.
- If `skipWhenBusy: true`, this agent's session-keyed subagent and nested lanes also defer heartbeat runs. Other agents' busy lanes do not defer this agent.
- If `target` resolves to no external destination, the run still happens but no outbound message is sent.
</Accordion>

View File

@@ -323,6 +323,24 @@ describe("streamWithIdleTimeout", () => {
await next;
});
it("clears the connection timer when stream setup rejects", async () => {
vi.useFakeTimers();
const setupError = new Error("provider setup failed");
const baseFn = vi.fn().mockRejectedValue(setupError);
const onIdleTimeout = vi.fn();
const wrapped = streamWithIdleTimeout(baseFn, 50, onIdleTimeout);
const model = {} as Parameters<typeof baseFn>[0];
const context = {} as Parameters<typeof baseFn>[1];
const options = {} as Parameters<typeof baseFn>[2];
await expect(wrapped(model, context, options)).rejects.toThrow("provider setup failed");
await vi.advanceTimersByTimeAsync(50);
expect(onIdleTimeout).not.toHaveBeenCalled();
});
it("throws when a promise stream never resolves", async () => {
vi.useFakeTimers();
let streamSignal: AbortSignal | undefined;
@@ -349,6 +367,25 @@ describe("streamWithIdleTimeout", () => {
expect(streamSignal?.aborted).toBe(true);
});
it("clears setup state when baseFn throws synchronously", async () => {
vi.useFakeTimers();
const setupError = new Error("sync provider setup failed");
const baseFn = vi.fn(() => {
throw setupError;
}) as unknown as Parameters<typeof streamWithIdleTimeout>[0];
const onIdleTimeout = vi.fn();
const wrapped = streamWithIdleTimeout(baseFn, 50, onIdleTimeout);
const model = {} as Parameters<typeof baseFn>[0];
const context = {} as Parameters<typeof baseFn>[1];
const options = {} as Parameters<typeof baseFn>[2];
expect(() => wrapped(model, context, options)).toThrow("sync provider setup failed");
await vi.advanceTimersByTimeAsync(500);
expect(onIdleTimeout).not.toHaveBeenCalled();
});
it("resets timer on each chunk", async () => {
const chunks = [{ text: "a" }, { text: "b" }, { text: "c" }];
const mockStream = createMockAsyncIterable(chunks);

View File

@@ -0,0 +1,147 @@
import fs from "node:fs";
import os from "node:os";
import path from "node:path";
import { afterEach, beforeEach, describe, expect, it } from "vitest";
import { resolveStorePath } from "../config/sessions/paths.js";
import type { OpenClawConfig } from "../config/types.openclaw.js";
import { describeHeartbeatSessionTargetIssues } from "./doctor-heartbeat-session-target.js";
describe("describeHeartbeatSessionTargetIssues", () => {
let tmpDir: string;
beforeEach(() => {
tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "openclaw-heartbeat-doctor-"));
});
afterEach(() => {
fs.rmSync(tmpDir, { recursive: true, force: true });
});
function cfgWithSession(session: string, target: string | null = "slack"): OpenClawConfig {
const heartbeat = target === null ? { session } : { session, target };
return {
session: {
mainKey: "work",
store: path.join(tmpDir, "agents", "{agentId}", "sessions", "sessions.json"),
},
agents: {
list: [
{
id: "ops",
heartbeat,
},
],
},
} as OpenClawConfig;
}
function cfgWithDefaultHeartbeat(
session: string,
target: string | null = "slack",
): OpenClawConfig {
const heartbeat = target === null ? { session } : { session, target };
return {
session: {
mainKey: "work",
store: path.join(tmpDir, "agents", "{agentId}", "sessions", "sessions.json"),
},
agents: {
defaults: {
heartbeat,
},
list: [
{
id: "ops",
},
],
},
} as OpenClawConfig;
}
function writeStore(cfg: OpenClawConfig, entries: Record<string, unknown>) {
const storePath = resolveStorePath(cfg.session?.store, { agentId: "ops" });
fs.mkdirSync(path.dirname(storePath), { recursive: true });
fs.writeFileSync(storePath, JSON.stringify(entries, null, 2));
}
it("uses runtime session canonicalization before warning", () => {
const cfg = cfgWithSession("agent:ops:main");
writeStore(cfg, {
"agent:ops:work": {
sessionId: "agent:ops:work",
updatedAt: Date.now(),
},
});
expect(describeHeartbeatSessionTargetIssues(cfg)).toEqual([]);
});
it("warns when the resolved heartbeat session is missing", () => {
const cfg = cfgWithSession("slack:channel:c123");
writeStore(cfg, {});
const warnings = describeHeartbeatSessionTargetIssues(cfg);
expect(warnings).toHaveLength(1);
expect(warnings[0]).toContain("resolved to agent:ops:slack:channel:c123");
expect(warnings[0]).toContain('reason="no-target"');
});
it("does not warn when an explicit heartbeat recipient does not need session history", () => {
const cfg = cfgWithSession("slack:channel:c123");
const agent = cfg.agents?.list?.[0];
if (!agent?.heartbeat) {
throw new Error("expected test config to include heartbeat config");
}
agent.heartbeat.target = "telegram";
agent.heartbeat.to = "-100123";
writeStore(cfg, {});
expect(describeHeartbeatSessionTargetIssues(cfg)).toEqual([]);
});
it("does not warn when the heartbeat cadence is disabled", () => {
const cfg = cfgWithSession("slack:channel:c123");
const agent = cfg.agents?.list?.[0];
if (!agent?.heartbeat) {
throw new Error("expected test config to include heartbeat config");
}
agent.heartbeat.every = "0m";
writeStore(cfg, {});
expect(describeHeartbeatSessionTargetIssues(cfg)).toEqual([]);
});
it("warns when a default-only heartbeat session is missing", () => {
const cfg = cfgWithDefaultHeartbeat("slack:channel:c123");
writeStore(cfg, {});
const warnings = describeHeartbeatSessionTargetIssues(cfg);
expect(warnings).toHaveLength(1);
expect(warnings[0]).toContain("Agent ops heartbeat.session pins slack:channel:c123");
expect(warnings[0]).toContain("resolved to agent:ops:slack:channel:c123");
});
it("warns when an explicit heartbeat inherits a default session", () => {
const cfg = cfgWithDefaultHeartbeat("slack:channel:c123");
const agent = cfg.agents?.list?.[0];
if (!agent) {
throw new Error("expected test config to include an agent");
}
agent.heartbeat = {};
writeStore(cfg, {});
const warnings = describeHeartbeatSessionTargetIssues(cfg);
expect(warnings).toHaveLength(1);
expect(warnings[0]).toContain("resolved to agent:ops:slack:channel:c123");
});
it("does not warn when target is omitted because runtime defaults to none", () => {
const cfg = cfgWithSession("slack:channel:c123", null);
writeStore(cfg, {});
expect(describeHeartbeatSessionTargetIssues(cfg)).toEqual([]);
});
});

View File

@@ -0,0 +1,137 @@
import { listAgentEntries, listAgentIds, resolveAgentConfig } from "../agents/agent-scope.js";
import { canonicalizeMainSessionAlias } from "../config/sessions/main-session.js";
import { resolveStorePath } from "../config/sessions/paths.js";
import { loadSessionStore } from "../config/sessions/store-load.js";
import type { AgentDefaultsConfig } from "../config/types.agent-defaults.js";
import type { OpenClawConfig } from "../config/types.openclaw.js";
import { resolveHeartbeatIntervalMs } from "../infra/heartbeat-summary.js";
import { resolveHeartbeatDeliveryTarget } from "../infra/outbound/targets.js";
import {
normalizeAgentId,
resolveAgentIdFromSessionKey,
toAgentStoreSessionKey,
} from "../routing/session-key.js";
import { isSubagentSessionKey } from "../sessions/session-key-utils.js";
import { normalizeOptionalString } from "../shared/string-coerce.js";
type HeartbeatConfig = AgentDefaultsConfig["heartbeat"];
function hasExplicitHeartbeatAgents(cfg: OpenClawConfig) {
return listAgentEntries(cfg).some((entry) => Boolean(entry?.heartbeat));
}
function resolveHeartbeatConfig(cfg: OpenClawConfig, agentId: string): HeartbeatConfig | undefined {
const defaults = cfg.agents?.defaults?.heartbeat;
const overrides = resolveAgentConfig(cfg, agentId)?.heartbeat;
if (!defaults && !overrides) {
return overrides;
}
return { ...defaults, ...overrides };
}
function listHeartbeatDoctorAgents(cfg: OpenClawConfig) {
if (hasExplicitHeartbeatAgents(cfg)) {
return listAgentEntries(cfg)
.filter((entry) => entry?.heartbeat)
.map((entry) => normalizeAgentId(entry.id))
.filter((agentId) => agentId);
}
if (cfg.agents?.defaults?.heartbeat) {
return listAgentIds(cfg);
}
return [];
}
/**
* Detect heartbeat configs that pin a non-existent session. The runtime
* resolves `heartbeat.session` to a sessionKey via `resolveHeartbeatSession`;
* if the entry is missing, `resolveHeartbeatDeliveryTarget` falls back to
* `{channel:"none", reason:"no-target"}` and the heartbeat fires a model
* call whose reply has nowhere to land. Common cause: the configured Slack
* channel ID does not match any channel the agent has ever joined (e.g.,
* heartbeat pins channel `c0b2eddpw95` but the agent only has sessions in
* `c0ag7jag35g`, or the agent has no Slack bot at all).
*
* Warning only — repair would mean rewriting the config, which is the
* operator's intent to express.
*/
export function describeHeartbeatSessionTargetIssues(cfg: OpenClawConfig): string[] {
const warnings: string[] = [];
const sessionScope = cfg.session?.scope ?? "per-sender";
for (const agentId of listHeartbeatDoctorAgents(cfg)) {
if (!agentId) {
continue;
}
const resolvedAgentId = normalizeAgentId(agentId);
const heartbeatConfig = resolveHeartbeatConfig(cfg, resolvedAgentId);
if (!heartbeatConfig) {
continue;
}
if (!resolveHeartbeatIntervalMs(cfg, undefined, heartbeatConfig)) {
continue;
}
const configuredSession = normalizeOptionalString(heartbeatConfig.session);
if (!configuredSession) {
continue;
}
const normalizedSession = configuredSession.toLowerCase();
// `main` / `global` resolve to the agent main session via
// `resolveHeartbeatSession`; missing entries fall back to the same key
// and are repaired elsewhere — don't double-warn here.
if (normalizedSession === "main" || normalizedSession === "global") {
continue;
}
if (isSubagentSessionKey(configuredSession)) {
continue;
}
if (sessionScope === "global") {
continue;
}
const target = normalizeOptionalString(heartbeatConfig.target);
if (!target || target === "none") {
continue;
}
const deliveryWithoutSession = resolveHeartbeatDeliveryTarget({
cfg,
heartbeat: heartbeatConfig,
});
if (deliveryWithoutSession.channel !== "none" && deliveryWithoutSession.to) {
continue;
}
const candidateSession = toAgentStoreSessionKey({
agentId: resolvedAgentId,
requestKey: configuredSession,
mainKey: cfg.session?.mainKey,
});
if (isSubagentSessionKey(candidateSession)) {
continue;
}
const canonicalSession = canonicalizeMainSessionAlias({
cfg,
agentId: resolvedAgentId,
sessionKey: candidateSession,
});
if (
canonicalSession === "global" ||
isSubagentSessionKey(canonicalSession) ||
resolveAgentIdFromSessionKey(canonicalSession) !== resolvedAgentId
) {
continue;
}
const storeAgentId = resolvedAgentId;
const storePath = resolveStorePath(cfg.session?.store, { agentId: storeAgentId });
const store = loadSessionStore(storePath);
const entry = store[canonicalSession];
if (entry) {
continue;
}
warnings.push(
[
`- Agent ${agentId} heartbeat.session pins ${configuredSession} (resolved to ${canonicalSession}) but that session has no entry in ${storePath}.`,
` Heartbeats will run but resolve delivery to channel="none"/reason="no-target", so replies are dropped silently.`,
` Fix: point heartbeat.session at a session the agent actually owns, set heartbeat.target="none" to suppress delivery, or remove the heartbeat.session field to fall back to the agent main session.`,
].join("\n"),
);
}
return warnings;
}

View File

@@ -34,6 +34,7 @@ import { normalizeOptionalLowercaseString } from "../shared/string-coerce.js";
import { note } from "../terminal/note.js";
import { shortenHomePath } from "../utils.js";
import { repairHeartbeatPoisonedMainSession } from "./doctor-heartbeat-main-session-repair.js";
import { describeHeartbeatSessionTargetIssues } from "./doctor-heartbeat-session-target.js";
import { runPluginSessionStateDoctorRepairs } from "./doctor-session-state-providers.js";
type DoctorPrompterLike = {
@@ -955,6 +956,10 @@ export async function noteStateIntegrity(
changes,
});
for (const warning of describeHeartbeatSessionTargetIssues(cfg)) {
warnings.push(warning);
}
const mainKey = resolveMainSessionKey(cfg);
const mainEntry = store[mainKey];
if (mainEntry?.sessionId) {

View File

@@ -290,9 +290,9 @@ export const FIELD_HELP: Record<string, string> = {
"agents.list[].heartbeat.timeoutSeconds":
"Per-agent maximum time in seconds allowed for a heartbeat agent turn before it is aborted. Leave unset to inherit the merged heartbeat/default agent timeout.",
"agents.defaults.heartbeat.skipWhenBusy":
"When true, defer heartbeat turns on extra busy lanes: subagent or nested command work. Cron lanes always defer heartbeat turns.",
"When true, defer heartbeat turns on this agent's extra busy lanes: its own session-keyed subagent or nested command work. Cron lanes always defer heartbeat turns.",
"agents.list[].heartbeat.skipWhenBusy":
"Per-agent override that defers heartbeat turns on extra busy lanes: subagent or nested command work. Cron lanes always defer heartbeat turns.",
"Per-agent override that defers heartbeat turns on that agent's extra busy lanes: its own session-keyed subagent or nested command work. Cron lanes always defer heartbeat turns.",
browser:
"Browser runtime controls for local or remote CDP attachment, profile routing, and screenshot/snapshot behavior. Keep defaults unless your automation workflow requires custom browser transport settings.",
"browser.enabled":
@@ -1823,7 +1823,7 @@ export const FIELD_HELP: Record<string, string> = {
"agents.list.*.heartbeat.directPolicy":
'Per-agent override for heartbeat direct/DM delivery policy; use "block" for agents that should only send heartbeat alerts to non-DM destinations.',
"agents.list.*.heartbeat.skipWhenBusy":
"Per-agent override that defers heartbeat turns on extra busy lanes: subagent or nested command work. Cron lanes always defer heartbeat turns.",
"Per-agent override that defers heartbeat turns on that agent's extra busy lanes: its own session-keyed subagent or nested command work. Cron lanes always defer heartbeat turns.",
"channels.mattermost.configWrites":
"Allow Mattermost to write config in response to channel events/commands (default: true).",
"channels.modelByChannel":

View File

@@ -420,7 +420,7 @@ export type AgentDefaultsConfig = {
*/
isolatedSession?: boolean;
/**
* If true, defer heartbeat runs while subagent or nested command lanes are busy.
* If true, defer heartbeat runs while this agent's session-keyed subagent or nested command lanes are busy.
* Cron lanes are always treated as busy for heartbeat deferral.
*/
skipWhenBusy?: boolean;

View File

@@ -442,4 +442,101 @@ describe("runHeartbeatOnce commitments", () => {
sentAtMs: nowMs,
});
});
it("appends HEARTBEAT.md directives to commitment prompt when tasks are configured but none are due", async () => {
const { result, sendTelegram, store } = await withTempHeartbeatSandbox(
async ({ tmpDir, storePath, replySpy }) => {
vi.stubEnv("OPENCLAW_STATE_DIR", tmpDir);
const sessionKey = "agent:main:telegram:user-155462274";
const cfg: OpenClawConfig = {
agents: {
defaults: {
workspace: tmpDir,
heartbeat: {
every: "5m",
target: "last",
},
},
},
channels: { telegram: { allowFrom: ["*"] } },
session: { store: storePath },
commitments: { enabled: true },
};
// HEARTBEAT.md has a tasks block (task ran recently — NOT due) plus extra prose directives.
await fs.writeFile(
path.join(tmpDir, "HEARTBEAT.md"),
`Do not contact the user unless critical.
tasks:
- name: check-deployment
interval: 5m
prompt: Check deployment status
`,
"utf-8",
);
// Seed heartbeatTaskState so the task ran at nowMs (well within 5m interval — not due).
await fs.writeFile(
storePath,
JSON.stringify({
[sessionKey]: {
sessionId: "sid",
updatedAt: nowMs,
lastChannel: "telegram",
lastProvider: "telegram",
lastTo: "155462274",
heartbeatTaskState: { "check-deployment": nowMs },
},
}),
);
await saveCommitmentStore(undefined, {
version: 1,
commitments: [buildCommitment({ id: "cm_interview", sessionKey, to: "155462274" })],
});
const sendTelegram = vi.fn().mockResolvedValue({
messageId: "m1",
chatId: "155462274",
});
replySpy.mockImplementation(
async (ctx: { Body?: string }, _opts?: { disableTools?: boolean }) => {
// Must contain commitment text
expect(ctx.Body).toContain("Due inferred follow-up commitments");
expect(ctx.Body).toContain("How did the interview go?");
// Must also contain HEARTBEAT.md directives outside the tasks block
expect(ctx.Body).toContain("Do not contact the user unless critical.");
// Must NOT contain the task prompt (task is not due)
expect(ctx.Body).not.toContain("Check deployment status");
return { text: "How did the interview go?" };
},
);
const result = await runHeartbeatOnce({
cfg,
agentId: "main",
sessionKey,
deps: {
getReplyFromConfig: replySpy,
telegram: sendTelegram,
getQueueSize: () => 0,
nowMs: () => nowMs,
},
});
return {
result,
sendTelegram,
store: await loadCommitmentStore(),
};
},
);
expect(result.status).toBe("ran");
expect(sendTelegram).toHaveBeenCalled();
expect(store.commitments[0]).toMatchObject({
id: "cm_interview",
status: "sent",
attempts: 1,
sentAtMs: nowMs,
});
});
});

View File

@@ -122,7 +122,11 @@ describe("heartbeat runner skips when target session lane is busy", () => {
});
});
it("returns lanes-busy for opt-in broader busy-lane checks", async () => {
it("does not return lanes-busy for global subagent-lane work alone", async () => {
// The global Subagent lane has no agent identity in its name — a stalled
// subagent on any one agent must not silently disable every other
// agent's heartbeat. Per-agent attribution comes from the session-keyed
// lane variants exercised below.
await withTempHeartbeatSandbox(async ({ storePath, replySpy }) => {
const cfg = createHeartbeatTelegramConfig();
cfg.agents!.defaults!.heartbeat = { every: "30m", skipWhenBusy: true };
@@ -132,6 +136,28 @@ describe("heartbeat runner skips when target session lane is busy", () => {
cfg,
deps: {
getQueueSize: vi.fn((lane?: string) => (lane === CommandLane.Subagent ? 1 : 0)),
getCommandLaneSnapshots: vi.fn(() => []),
nowMs: () => Date.now(),
getReplyFromConfig: replySpy,
} as HeartbeatDeps,
});
expect(result.status).not.toBe("skipped");
});
});
it("returns lanes-busy for opt-in work in this agent's nested session lane", async () => {
await withTempHeartbeatSandbox(async ({ storePath, replySpy }) => {
const cfg = createHeartbeatTelegramConfig();
cfg.agents!.defaults!.heartbeat = { every: "30m", skipWhenBusy: true };
await seedHeartbeatTelegramSession(storePath, cfg);
const nestedSessionLane = resolveNestedAgentLaneForSession("agent:main:telegram:123");
const result = await runHeartbeatOnce({
cfg,
deps: {
getQueueSize: vi.fn((_lane?: string) => 0),
getCommandLaneSnapshots: vi.fn(() => [createBusyLaneSnapshot(nestedSessionLane)]),
nowMs: () => Date.now(),
getReplyFromConfig: replySpy,
} as HeartbeatDeps,
@@ -142,7 +168,9 @@ describe("heartbeat runner skips when target session lane is busy", () => {
});
});
it("returns lanes-busy for opt-in work in any session-scoped nested lane", async () => {
it("does not return lanes-busy for another agent's session-scoped nested lane", async () => {
// Per-agent scoping: a zombie subagent or nested run belonging to a
// different agent must not block this agent's heartbeat.
await withTempHeartbeatSandbox(async ({ storePath, replySpy }) => {
const cfg = createHeartbeatTelegramConfig();
cfg.agents!.defaults!.heartbeat = { every: "30m", skipWhenBusy: true };
@@ -159,8 +187,7 @@ describe("heartbeat runner skips when target session lane is busy", () => {
} as HeartbeatDeps,
});
expect(result).toEqual({ status: "skipped", reason: HEARTBEAT_SKIP_LANES_BUSY });
expect(replySpy).not.toHaveBeenCalled();
expect(result.status).not.toBe("skipped");
});
});

View File

@@ -13,7 +13,6 @@ import {
resolveDefaultAgentId,
} from "../agents/agent-scope.js";
import { appendCronStyleCurrentTimeLine } from "../agents/current-time.js";
import { isNestedAgentLane } from "../agents/lanes.js";
import { resolveModelRefFromString, type ModelRef } from "../agents/model-selection.js";
import { resolveEmbeddedSessionLane } from "../agents/pi-embedded-runner/lanes.js";
import { formatReasoningMessage } from "../agents/pi-embedded-utils.js";
@@ -156,7 +155,6 @@ function loadHeartbeatRunnerRuntime() {
}
const HEARTBEAT_ALWAYS_BUSY_LANES = [CommandLane.Cron, CommandLane.CronNested] as const;
const HEARTBEAT_OPT_IN_BUSY_LANES = [CommandLane.Subagent, CommandLane.Nested] as const;
function hasQueuedWorkInLanes(
lanes: readonly string[],
@@ -174,14 +172,47 @@ function hasQueuedWorkInLaneSnapshots(
);
}
function hasOptInBusyLaneWork(
getSize: (lane?: string) => number,
/**
* Return true when `lane` carries a session-key suffix that parses to
* `agentId`. Lane name shapes covered:
*
* - `session:agent:<agentId>:...` — embedded-runner per-session lanes
* (subagent runs, compaction, context maintenance).
* - `nested:agent:<agentId>:...` — per-session nested-agent lanes.
*
* The generic `subagent` and `nested` global lanes carry no agent identity,
* so they cannot be scoped here; rely on the session-keyed variants and the
* per-session `session-lane-busy` skip at the heartbeat dispatch site.
*/
function laneBelongsToAgent(lane: string, agentId: string): boolean {
let suffix: string | undefined;
if (lane.startsWith("session:")) {
suffix = lane.slice("session:".length);
} else if (lane.startsWith("nested:")) {
suffix = lane.slice("nested:".length);
}
if (!suffix) {
return false;
}
const parsed = parseAgentSessionKey(suffix);
if (!parsed) {
return false;
}
return normalizeAgentId(parsed.agentId) === normalizeAgentId(agentId);
}
/**
* Per-agent variant of the opt-in busy check. Previously the runner consulted
* a global `subagent` lane size, which meant a zombie subagent on any one
* agent silently disabled every other agent's heartbeat. Restrict the check
* to lanes attributable to `agentId` via session-key parsing so a stuck
* subagent on `main` no longer starves `tank`, `narcissus`, or `shiva`.
*/
function hasAgentOptInBusyLaneWork(
agentId: string,
getSnapshots: () => readonly CommandLaneSnapshot[],
): boolean {
return (
hasQueuedWorkInLanes(HEARTBEAT_OPT_IN_BUSY_LANES, getSize) ||
hasQueuedWorkInLaneSnapshots(getSnapshots(), isNestedAgentLane)
);
return hasQueuedWorkInLaneSnapshots(getSnapshots(), (lane) => laneBelongsToAgent(lane, agentId));
}
function resolveHeartbeatChannelPlugin(channel: string): ChannelPlugin | undefined {
@@ -1058,6 +1089,27 @@ function stripHeartbeatTasksBlock(content: string): string {
return kept.join("\n");
}
/**
* Append the workspace HEARTBEAT.md directives (everything outside the
* `tasks:` block) to the prompt. Runs on every heartbeat path that actually
* dispatches a model call, so prose-style runbooks (the common case in
* production setups) reach the model — not only files that happen to declare
* periodic tasks.
*/
function appendHeartbeatFileDirectives(prompt: string, heartbeatFileContent?: string): string {
if (!heartbeatFileContent) {
return prompt;
}
const directives = stripHeartbeatTasksBlock(heartbeatFileContent).trim();
if (!directives) {
return prompt;
}
if (prompt.includes(directives)) {
return prompt;
}
return `${prompt}\n\nAdditional context from HEARTBEAT.md:\n${directives}`;
}
function resolveHeartbeatRunPrompt(params: {
cfg: OpenClawConfig;
heartbeat?: HeartbeatConfig;
@@ -1100,18 +1152,12 @@ function resolveHeartbeatRunPrompt(params: {
const completionInstruction = params.useHeartbeatResponseTool
? "After completing all due tasks, use heartbeat_respond to report the outcome. Set notify=false when nothing needs the user's attention."
: "After completing all due tasks, reply HEARTBEAT_OK.";
let prompt = `Run the following periodic tasks (only those due based on their intervals):
const taskListPrompt = `Run the following periodic tasks (only those due based on their intervals):
${taskList}
${completionInstruction}`;
if (params.heartbeatFileContent) {
const directives = stripHeartbeatTasksBlock(params.heartbeatFileContent).trim();
if (directives) {
prompt += `\n\nAdditional context from HEARTBEAT.md:\n${directives}`;
}
}
const prompt = appendHeartbeatFileDirectives(taskListPrompt, params.heartbeatFileContent);
return {
prompt,
hasExecCompletion: false,
@@ -1123,7 +1169,7 @@ ${completionInstruction}`;
}
if (commitmentPrompt) {
return {
prompt: commitmentPrompt,
prompt: appendHeartbeatFileDirectives(commitmentPrompt, params.heartbeatFileContent),
hasExecCompletion: false,
hasRelayableExecCompletion: false,
hasCronEvents: false,
@@ -1155,9 +1201,14 @@ ${completionInstruction}`;
: baseUsesHeartbeatResponseTool
? resolveHeartbeatResponseToolPrompt(params.cfg, params.heartbeat)
: resolveHeartbeatPrompt(params.cfg, params.heartbeat);
const basePromptWithHint = appendHeartbeatWorkspacePathHint(basePrompt, params.workspaceDir);
const basePromptWithDirectives = appendHeartbeatFileDirectives(
basePromptWithHint,
params.heartbeatFileContent,
);
const prompt = commitmentPrompt
? `${appendHeartbeatWorkspacePathHint(basePrompt, params.workspaceDir)}\n\n${commitmentPrompt}`
: appendHeartbeatWorkspacePathHint(basePrompt, params.workspaceDir);
? `${basePromptWithDirectives}\n\n${commitmentPrompt}`
: basePromptWithDirectives;
return {
prompt,
@@ -1245,7 +1296,7 @@ export async function runHeartbeatOnce(opts: {
return { status: "skipped", reason: HEARTBEAT_SKIP_CRON_IN_PROGRESS };
}
if (heartbeat?.skipWhenBusy === true && hasOptInBusyLaneWork(getSize, getSnapshots)) {
if (heartbeat?.skipWhenBusy === true && hasAgentOptInBusyLaneWork(agentId, getSnapshots)) {
emitHeartbeatEvent({
status: "skipped",
reason: HEARTBEAT_SKIP_LANES_BUSY,
@@ -2283,10 +2334,20 @@ export function startHeartbeatRunner(opts: {
}
}
for (const agent of state.agents.values()) {
// Run each agent's wake concurrently. Heartbeat work is per-agent —
// separate session stores, lanes, and delivery targets — so awaiting
// one slow agent (e.g. one whose heartbeat spawns a multi-minute
// subagent) must not starve the others. Bookkeeping mutations only
// touch the owning agent's `HeartbeatAgentState`, so the per-agent
// closures are safe to fan out under `Promise.all`.
type AgentWakeOutcome = {
ran: boolean;
retryableBusySkip?: HeartbeatRunResult;
};
const runOneAgent = async (agent: HeartbeatAgentState): Promise<AgentWakeOutcome> => {
const deferral = evaluateWakeDeferral(agent, now, reason, intent);
if (deferral.defer) {
continue;
return { ran: false };
}
let res: HeartbeatRunResult;
@@ -2302,29 +2363,28 @@ export function startHeartbeatRunner(opts: {
});
} catch (err) {
const errMsg = formatErrorMessage(err);
log.error(`heartbeat runner: runOnce threw unexpectedly: ${errMsg}`, { error: errMsg });
// Throw counts as a non-retryable terminal attempt — see comment in
// targeted branch above.
log.error(`heartbeat runner: runOnce threw unexpectedly: ${errMsg}`, {
error: errMsg,
agentId: agent.agentId,
});
// Throw counts as a non-retryable terminal attempt for cooldown
// purposes — record bookkeeping so the wake layer doesn't tight-loop
// on the same reason.
recordRunBookkeeping(agent, now);
advanceAgentSchedule(agent, now, reason);
continue;
return { ran: false };
}
if (res.status === "skipped" && isRetryableHeartbeatBusySkipReason(res.reason)) {
// Do not advance the schedule or record run bookkeeping — the main
// lane is busy and the wake layer will retry the same reason shortly
// (DEFAULT_RETRY_MS = 1 s). Recording here would convert the retry
// into a false `not-due`/`min-spacing` defer.
retryableBusySkip = true;
return res;
// Do not advance the schedule or record run bookkeeping for this
// agent — its target runtime is busy and the wake layer retries.
return { ran: false, retryableBusySkip: res };
}
// Non-retryable outcome — record bookkeeping for cooldown gates.
recordRunBookkeeping(agent, now);
if (res.status !== "skipped" || res.reason !== "disabled") {
advanceAgentSchedule(agent, now, reason);
}
if (res.status === "ran") {
ran = true;
}
let agentRan = res.status === "ran";
const defaultSessionKey = resolveHeartbeatSession(
state.cfg,
@@ -2357,6 +2417,7 @@ export function startHeartbeatRunner(opts: {
const errMsg = formatErrorMessage(err);
log.error(`heartbeat runner: commitment runOnce threw unexpectedly: ${errMsg}`, {
error: errMsg,
agentId: agent.agentId,
});
continue;
}
@@ -2364,13 +2425,34 @@ export function startHeartbeatRunner(opts: {
commitmentRes.status === "skipped" &&
isRetryableHeartbeatBusySkipReason(commitmentRes.reason)
) {
retryableBusySkip = true;
return commitmentRes;
return { ran: agentRan, retryableBusySkip: commitmentRes };
}
if (commitmentRes.status === "ran") {
ran = true;
agentRan = true;
}
}
return { ran: agentRan };
};
const agentOutcomes = await Promise.all(
Array.from(state.agents.values()).map((agent) => runOneAgent(agent)),
);
let firstRetryableBusy: HeartbeatRunResult | undefined;
for (const outcome of agentOutcomes) {
if (outcome.ran) {
ran = true;
}
if (outcome.retryableBusySkip && !firstRetryableBusy) {
firstRetryableBusy = outcome.retryableBusySkip;
}
}
if (firstRetryableBusy) {
// At least one agent's runtime was busy. The wake layer schedules a
// retry; on retry, agents that already advanced their schedule will
// defer via cooldown, so only the still-busy agent actually re-runs.
retryableBusySkip = true;
return firstRetryableBusy;
}
if (ran) {