Compare commits

1 commit

Author | SHA1 | Message | Date
Tak Hoffman | 91625aa9f3 | feat capability CLI on latest main | 2026-04-06 17:30:59 -05:00
58 changed files with 5875 additions and 50 deletions

@@ -6,12 +6,14 @@ Docs: https://docs.openclaw.ai
### Changes
- CLI/capabilities: add a first-class `openclaw capability ...` hub for provider-backed inference workflows across model, media, web, and embedding tasks, with capability inspection, provider discovery, and consistent JSON output. Thanks @Takhoffman.
- Providers/Anthropic: restore Claude CLI as the preferred local Anthropic path in onboarding, model-auth guidance, and doctor flows again, and keep the Docker Claude CLI live lane aligned with the restored guidance.
- Plugins/webhooks: add a bundled webhook ingress plugin so external automation can create and drive bound TaskFlows through per-route shared-secret endpoints. (#61892) Thanks @mbelinky.
- Tools/media: document per-provider music and video generation capabilities, and add shared live video-to-video sweep coverage for providers that support local reference clips.
### Fixes
- CLI/capabilities: keep provider-backed capability behavior aligned with actual runtime execution by fixing explicit TTS override handling, profile-aware gateway TTS prefs resolution, per-request transcription `prompt`/`language` overrides, image output MIME/extension mismatches, configured web-search fallback behavior, and agent-vs-CLI web-search execution drift.
- Channels/secrets: keep bundled channel artifact and secret-contract loading stable under lazy loading so bundled channel secrets continue to appear in `openclaw secret`, status, and security-audit surfaces.
- Providers/xAI: recognize `api.grok.x.ai` as an xAI-native endpoint again so native xAI web-search attribution keeps working on Grok-hosted base URLs. (#61377) Thanks @jjjojoj.
- Providers/Anthropic/cache: preserve thinking blocks for Claude Opus 4.5+, Sonnet 4.5+, and newer Claude 4-family models so Anthropic prompt-cache prefixes keep matching after thinking turns. (#61793)

@@ -361,14 +361,6 @@
}
}
},
"update_plan": {
"emoji": "🗺️",
"title": "Update Plan",
"detailKeys": [
"explanation",
"plan.0.step"
]
},
"gateway": {
"emoji": "🔌",
"title": "Gateway",

docs/cli/capability.md (new file, 116 lines added)

@@ -0,0 +1,116 @@
---
summary: "Capability-first CLI for provider-backed model, media, web, and embedding workflows"
read_when:
- Adding or modifying `openclaw capability` commands
- Designing stable headless capability automation
title: "Capability CLI"
---
# Capability CLI
`openclaw capability` is the canonical headless surface for provider-backed capabilities.
It intentionally exposes capability families rather than raw gateway RPC names or raw agent tool IDs.
## Command tree
```text
openclaw capability
  list
  inspect
  model
    run
    list
    inspect
    providers
    auth login
    auth logout
    auth status
  media
    image
      generate
      edit
      describe
      describe-many
      providers
    audio
      transcribe
      providers
    tts
      convert
      voices
      providers
      status
      enable
      disable
      set-provider
    video
      generate
      describe
      providers
  web
    search
    fetch
    providers
  memory
    embedding
      create
      providers
```
## Transport
Supported transport flags:
- `--local`
- `--gateway`
When neither flag is passed, the transport is chosen automatically per command family:
- Stateless execution commands default to local.
- Gateway-managed state commands default to gateway.
Examples:
```bash
openclaw capability model run --prompt "hello" --json
openclaw capability media image generate --prompt "friendly lobster" --json
openclaw capability media tts status --json
openclaw capability embedding create --text "hello world" --json
```
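
The default-transport rule in this section can be pictured as a small resolver. The following is a minimal sketch, not the CLI's actual implementation: `resolveTransport`, `CommandKind`, and `TransportFlags` are hypothetical names, and rejecting both flags at once is an assumption the page does not state.

```typescript
// Hypothetical sketch of the documented defaults; none of these names come from OpenClaw.
type Transport = "local" | "gateway";
type CommandKind = "stateless-execution" | "gateway-managed-state";

interface TransportFlags {
  local?: boolean; // corresponds to --local
  gateway?: boolean; // corresponds to --gateway
}

function resolveTransport(kind: CommandKind, flags: TransportFlags = {}): Transport {
  if (flags.local && flags.gateway) {
    // Assumption: passing both flags is a usage error.
    throw new Error("--local and --gateway are mutually exclusive");
  }
  // An explicit flag always wins over the family default.
  if (flags.local) return "local";
  if (flags.gateway) return "gateway";
  // Implicit auto: gateway-managed state goes to the gateway, everything else runs locally.
  return kind === "gateway-managed-state" ? "gateway" : "local";
}

console.log(resolveTransport("stateless-execution")); // local
console.log(resolveTransport("gateway-managed-state")); // gateway
console.log(resolveTransport("gateway-managed-state", { local: true })); // local
```

Under this reading, `media tts status` would fall in the gateway-managed family, while `model run` and `media image generate` would be stateless execution commands.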
## JSON output
Capability commands normalize JSON output under a shared envelope:
```json
{
"ok": true,
"capability": "media.image.generate",
"transport": "local",
"provider": "openai",
"model": "gpt-image-1",
"attempts": [],
"outputs": []
}
```
Top-level fields are stable:
- `ok`
- `capability`
- `transport`
- `provider`
- `model`
- `attempts`
- `outputs`
- `error`
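
For automation that consumes this envelope, a typed parser can make the stable fields explicit. The sketch below is illustrative only: `CapabilityEnvelope` and `parseCapabilityEnvelope` are hypothetical names, and the optionality and element types (for example, that `error` appears only on failure, or that `attempts` may be absent) are assumptions, since this page lists field names but not their types.

```typescript
// Hypothetical consumer-side type; field names come from the list above, types are assumed.
interface CapabilityEnvelope {
  ok: boolean;
  capability: string;
  transport?: string;
  provider?: string;
  model?: string;
  attempts: unknown[];
  outputs: unknown[];
  error?: unknown;
}

function optionalString(value: unknown): string | undefined {
  return typeof value === "string" ? value : undefined;
}

function parseCapabilityEnvelope(raw: string): CapabilityEnvelope {
  const value: unknown = JSON.parse(raw);
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("capability output is not a JSON object");
  }
  const record = value as Record<string, unknown>;
  const ok = record["ok"];
  const capability = record["capability"];
  const attempts = record["attempts"];
  const outputs = record["outputs"];
  if (typeof ok !== "boolean") throw new Error('envelope is missing boolean "ok"');
  if (typeof capability !== "string") throw new Error('envelope is missing string "capability"');
  return {
    ok,
    capability,
    transport: optionalString(record["transport"]),
    provider: optionalString(record["provider"]),
    model: optionalString(record["model"]),
    // Assumption: a missing list is treated as empty rather than as an error.
    attempts: Array.isArray(attempts) ? attempts : [],
    outputs: Array.isArray(outputs) ? outputs : [],
    error: record["error"],
  };
}

const sample = parseCapabilityEnvelope(
  '{"ok":true,"capability":"media.image.generate","transport":"local",' +
    '"provider":"openai","model":"gpt-image-1","attempts":[],"outputs":[]}',
);
console.log(sample.capability, sample.ok); // media.image.generate true
```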
## Notes
- `model run` reuses the agent runtime so provider/model overrides behave like normal agent execution.
- `media tts status` defaults to gateway because it reflects gateway-managed TTS state.

@@ -35,6 +35,7 @@ This page describes the current CLI behavior. If commands change, update this do
- [`logs`](/cli/logs)
- [`system`](/cli/system)
- [`models`](/cli/models)
- [`capability`](/cli/capability)
- [`memory`](/cli/memory)
- [`directory`](/cli/directory)
- [`nodes`](/cli/nodes)
@@ -248,6 +249,16 @@ openclaw [--dev] [--profile <name>] <command>
fallbacks list|add|remove|clear
image-fallbacks list|add|remove|clear
scan
capability
list
inspect
model run|list|inspect|providers|auth login|logout|status
media image generate|edit|describe|describe-many|providers
media audio transcribe|providers
media tts convert|voices|providers|status|enable|disable|set-provider
media video generate|describe|providers
web search|fetch|providers
embedding create|providers
auth add|login|login-github-copilot|setup-token|paste-token
auth order get|set|clear
sandbox

@@ -1,5 +1,6 @@
import { definePluginEntry } from "openclaw/plugin-sdk/plugin-entry";
import { buildMicrosoftFoundryProvider } from "./provider.js";
import { buildMicrosoftFoundryRealtimeTranscriptionProvider } from "./realtime-transcription-provider.js";
export default definePluginEntry({
id: "microsoft-foundry",
@@ -7,5 +8,6 @@ export default definePluginEntry({
description: "Microsoft Foundry provider with Entra ID and API key auth",
register(api) {
api.registerProvider(buildMicrosoftFoundryProvider());
api.registerRealtimeTranscriptionProvider(buildMicrosoftFoundryRealtimeTranscriptionProvider());
},
});

@@ -0,0 +1,58 @@
import { describe, expect, it } from "vitest";
import { buildMicrosoftFoundryRealtimeTranscriptionProvider } from "./realtime-transcription-provider.js";
describe("buildMicrosoftFoundryRealtimeTranscriptionProvider", () => {
it("normalizes foundry config from the voice provider block", () => {
const provider = buildMicrosoftFoundryRealtimeTranscriptionProvider();
const resolved = provider.resolveConfig?.({
cfg: {} as never,
rawConfig: {
providers: {
"microsoft-foundry": {
apiKey: "azure-test-key",
baseUrl: "https://example.services.ai.azure.com/openai/v1",
deployment: "gpt-realtime",
apiVersion: "2025-04-01-preview",
},
},
},
});
expect(resolved).toEqual({
apiKey: "azure-test-key",
baseUrl: "https://example.services.ai.azure.com/openai/v1",
deployment: "gpt-realtime",
apiVersion: "2025-04-01-preview",
});
});
it("accepts model-provider style config with api-key headers", () => {
const provider = buildMicrosoftFoundryRealtimeTranscriptionProvider();
const resolved = provider.resolveConfig?.({
cfg: {} as never,
rawConfig: {
providers: {
"microsoft-foundry": {
baseUrl: "https://example.services.ai.azure.com/openai/v1",
headers: {
"api-key": "azure-test-key",
},
model: "gpt-realtime",
},
},
},
});
expect(resolved).toEqual({
apiKey: "azure-test-key",
baseUrl: "https://example.services.ai.azure.com/openai/v1",
deployment: "gpt-realtime",
model: "gpt-realtime",
});
});
it("registers foundry aliases for voice provider selection", () => {
const provider = buildMicrosoftFoundryRealtimeTranscriptionProvider();
expect(provider.aliases).toContain("azure-foundry");
});
});
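
The second test above depends on the API key being recoverable from request headers when no `apiKey` field is set. The sketch below isolates that precedence chain as it appears in `extractFoundryProviderConfig` in this change; the wrapper name `resolveApiKey` is hypothetical (the real code inlines the expression), and the two helpers are copied so the example runs on its own.

```typescript
function trimToUndefined(value: unknown): string | undefined {
  return typeof value === "string" && value.trim() ? value.trim() : undefined;
}

function asObject(value: unknown): Record<string, unknown> | undefined {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    ? (value as Record<string, unknown>)
    : undefined;
}

// Hypothetical wrapper: explicit apiKey first, then the "api-key" header,
// then an Authorization header with any "Bearer " prefix removed.
function resolveApiKey(raw: Record<string, unknown>): string | undefined {
  const headers = asObject(raw["headers"]);
  return (
    trimToUndefined(raw["apiKey"]) ??
    trimToUndefined(headers?.["api-key"]) ??
    trimToUndefined(headers?.["Authorization"])?.replace(/^Bearer\s+/i, "")
  );
}

console.log(resolveApiKey({ apiKey: "  azure-test-key  " })); // azure-test-key
console.log(resolveApiKey({ headers: { "api-key": "from-header" } })); // from-header
console.log(resolveApiKey({ headers: { Authorization: "Bearer token-123" } })); // token-123
console.log(resolveApiKey({})); // undefined
```

One consequence of this ordering is that an explicit `apiKey` silently wins over any header value, so stale header credentials in a copied model-provider block would not take effect.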

@@ -0,0 +1,313 @@
import type {
RealtimeTranscriptionProviderConfig,
RealtimeTranscriptionProviderPlugin,
RealtimeTranscriptionSession,
RealtimeTranscriptionSessionCreateRequest,
} from "openclaw/plugin-sdk/realtime-transcription";
import WebSocket from "ws";
import { normalizeFoundryEndpoint, PROVIDER_ID } from "./shared.js";
type FoundryRealtimeTranscriptionProviderConfig = {
apiKey?: string;
baseUrl?: string;
endpoint?: string;
deployment?: string;
model?: string;
apiVersion?: string;
silenceDurationMs?: number;
vadThreshold?: number;
};
type FoundryRealtimeTranscriptionSessionConfig = RealtimeTranscriptionSessionCreateRequest & {
apiKey: string;
baseUrl: string;
deployment: string;
apiVersion: string;
silenceDurationMs: number;
vadThreshold: number;
};
type RealtimeEvent = {
type: string;
delta?: string;
transcript?: string;
error?: unknown;
item?: { transcript?: string } | null;
};
function trimToUndefined(value: unknown): string | undefined {
return typeof value === "string" && value.trim() ? value.trim() : undefined;
}
function asNumber(value: unknown): number | undefined {
return typeof value === "number" && Number.isFinite(value) ? value : undefined;
}
function asObject(value: unknown): Record<string, unknown> | undefined {
return typeof value === "object" && value !== null && !Array.isArray(value)
? (value as Record<string, unknown>)
: undefined;
}
function extractFoundryProviderConfig(
rawConfig: RealtimeTranscriptionProviderConfig,
): FoundryRealtimeTranscriptionProviderConfig {
const providers = asObject(rawConfig.providers);
const raw =
asObject(providers?.[PROVIDER_ID]) ??
asObject(rawConfig[PROVIDER_ID]) ??
asObject(rawConfig.microsoftFoundry) ??
asObject(rawConfig);
const providerBaseUrl = trimToUndefined(raw?.baseUrl);
const endpoint = trimToUndefined(raw?.endpoint);
return {
apiKey:
trimToUndefined(raw?.apiKey) ??
trimToUndefined(asObject(raw?.headers)?.["api-key"]) ??
trimToUndefined(asObject(raw?.headers)?.Authorization)?.replace(/^Bearer\s+/i, ""),
baseUrl: providerBaseUrl,
endpoint,
deployment:
trimToUndefined(raw?.deployment) ??
trimToUndefined(raw?.model) ??
trimToUndefined(raw?.deploymentName),
model: trimToUndefined(raw?.transcriptionModel) ?? trimToUndefined(raw?.model),
apiVersion: trimToUndefined(raw?.apiVersion),
silenceDurationMs: asNumber(raw?.silenceDurationMs),
vadThreshold: asNumber(raw?.vadThreshold),
};
}
function resolveFoundryRealtimeBaseUrl(
config: FoundryRealtimeTranscriptionProviderConfig,
): string | undefined {
if (config.endpoint) {
return normalizeFoundryEndpoint(config.endpoint);
}
if (!config.baseUrl) {
return undefined;
}
return normalizeFoundryEndpoint(config.baseUrl);
}
class FoundryRealtimeTranscriptionSession implements RealtimeTranscriptionSession {
private static readonly MAX_RECONNECT_ATTEMPTS = 5;
private static readonly RECONNECT_DELAY_MS = 1000;
private static readonly CONNECT_TIMEOUT_MS = 10_000;
private ws: WebSocket | null = null;
private connected = false;
private closed = false;
private reconnectAttempts = 0;
private pendingTranscript = "";
constructor(private readonly config: FoundryRealtimeTranscriptionSessionConfig) {}
async connect(): Promise<void> {
this.closed = false;
this.reconnectAttempts = 0;
await this.doConnect();
}
sendAudio(audio: Buffer): void {
if (this.ws?.readyState !== WebSocket.OPEN) {
return;
}
this.sendEvent({
type: "input_audio_buffer.append",
audio: audio.toString("base64"),
});
}
close(): void {
this.closed = true;
this.connected = false;
if (this.ws) {
this.ws.close(1000, "Transcription session closed");
this.ws = null;
}
}
isConnected(): boolean {
return this.connected;
}
private async doConnect(): Promise<void> {
await new Promise<void>((resolve, reject) => {
const wsUrl = this.buildWebSocketUrl();
this.ws = new WebSocket(wsUrl, {
headers: {
"api-key": this.config.apiKey,
},
});
const connectTimeout = setTimeout(() => {
reject(new Error("Microsoft Foundry realtime transcription connection timeout"));
}, FoundryRealtimeTranscriptionSession.CONNECT_TIMEOUT_MS);
this.ws.on("open", () => {
clearTimeout(connectTimeout);
this.connected = true;
this.reconnectAttempts = 0;
this.sendEvent({
type: "session.update",
session: {
input_audio_format: "pcm16",
input_audio_transcription: {
model: this.config.deployment,
},
turn_detection: {
type: "server_vad",
threshold: this.config.vadThreshold,
prefix_padding_ms: 300,
silence_duration_ms: this.config.silenceDurationMs,
},
},
});
resolve();
});
this.ws.on("message", (data: Buffer) => {
try {
this.handleEvent(JSON.parse(data.toString()) as RealtimeEvent);
} catch (error) {
this.config.onError?.(error instanceof Error ? error : new Error(String(error)));
}
});
this.ws.on("error", (error) => {
if (!this.connected) {
clearTimeout(connectTimeout);
reject(error);
return;
}
this.config.onError?.(error instanceof Error ? error : new Error(String(error)));
});
this.ws.on("close", () => {
this.connected = false;
if (this.closed) {
return;
}
void this.attemptReconnect();
});
});
}
private buildWebSocketUrl(): string {
const httpBaseUrl = this.config.baseUrl.replace(/\/+$/, "");
const wsBaseUrl = httpBaseUrl.replace(/^http:/i, "ws:").replace(/^https:/i, "wss:");
const url = new URL(`${wsBaseUrl}/openai/realtime`);
url.searchParams.set("api-version", this.config.apiVersion);
url.searchParams.set("deployment", this.config.deployment);
return url.toString();
}
private async attemptReconnect(): Promise<void> {
if (this.closed) {
return;
}
if (this.reconnectAttempts >= FoundryRealtimeTranscriptionSession.MAX_RECONNECT_ATTEMPTS) {
this.config.onError?.(
new Error("Microsoft Foundry realtime transcription reconnect limit reached"),
);
return;
}
this.reconnectAttempts += 1;
const delay =
FoundryRealtimeTranscriptionSession.RECONNECT_DELAY_MS * 2 ** (this.reconnectAttempts - 1);
await new Promise((resolve) => setTimeout(resolve, delay));
if (this.closed) {
return;
}
try {
await this.doConnect();
} catch (error) {
this.config.onError?.(error instanceof Error ? error : new Error(String(error)));
await this.attemptReconnect();
}
}
private handleEvent(event: RealtimeEvent): void {
switch (event.type) {
case "conversation.item.input_audio_transcription.delta":
case "conversation.item.audio_transcription.delta":
if (event.delta) {
this.pendingTranscript += event.delta;
this.config.onPartial?.(this.pendingTranscript);
}
return;
case "conversation.item.input_audio_transcription.completed":
case "conversation.item.audio_transcription.completed": {
const transcript = event.transcript ?? event.item?.transcript;
if (transcript) {
this.config.onTranscript?.(transcript);
}
this.pendingTranscript = "";
return;
}
case "input_audio_buffer.speech_started":
this.pendingTranscript = "";
this.config.onSpeechStart?.();
return;
case "error": {
const detail =
event.error && typeof event.error === "object" && "message" in event.error
? String((event.error as { message?: unknown }).message ?? "Unknown error")
: event.error
? String(event.error)
: "Unknown error";
this.config.onError?.(new Error(detail));
return;
}
default:
return;
}
}
private sendEvent(event: unknown): void {
if (this.ws?.readyState === WebSocket.OPEN) {
this.ws.send(JSON.stringify(event));
}
}
}
export function buildMicrosoftFoundryRealtimeTranscriptionProvider(): RealtimeTranscriptionProviderPlugin {
return {
id: PROVIDER_ID,
label: "Microsoft Foundry Realtime Transcription",
aliases: ["azure-foundry", "azure-openai-foundry"],
autoSelectOrder: 20,
resolveConfig: ({ rawConfig }) => extractFoundryProviderConfig(rawConfig),
isConfigured: ({ providerConfig }) => {
const config = extractFoundryProviderConfig(providerConfig);
return Boolean(config.apiKey && resolveFoundryRealtimeBaseUrl(config) && config.deployment);
},
createSession: (req) => {
const config = extractFoundryProviderConfig(req.providerConfig);
const baseUrl = resolveFoundryRealtimeBaseUrl(config);
if (!config.apiKey) {
throw new Error("Microsoft Foundry realtime transcription API key missing");
}
if (!baseUrl) {
throw new Error("Microsoft Foundry realtime transcription endpoint missing");
}
if (!config.deployment) {
throw new Error("Microsoft Foundry realtime transcription deployment missing");
}
return new FoundryRealtimeTranscriptionSession({
...req,
apiKey: config.apiKey,
baseUrl,
deployment: config.deployment,
apiVersion: config.apiVersion ?? "2025-04-01-preview",
silenceDurationMs: config.silenceDurationMs ?? 800,
vadThreshold: config.vadThreshold ?? 0.5,
});
},
};
}
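
The reconnect logic in `attemptReconnect` doubles its delay on every attempt. As a rough illustration, the standalone sketch below reproduces that arithmetic using the two constants from the class; `reconnectDelayMs` and `reconnectSchedule` are hypothetical helper names, not part of the change.

```typescript
// Constants mirror the static fields on FoundryRealtimeTranscriptionSession.
const MAX_RECONNECT_ATTEMPTS = 5;
const RECONNECT_DELAY_MS = 1000;

// attempt is 1-based, matching reconnectAttempts after it is incremented.
function reconnectDelayMs(attempt: number): number {
  return RECONNECT_DELAY_MS * 2 ** (attempt - 1);
}

// Hypothetical helper: the full list of waits before the reconnect limit is reached.
function reconnectSchedule(): number[] {
  return Array.from({ length: MAX_RECONNECT_ATTEMPTS }, (_, index) => reconnectDelayMs(index + 1));
}

const schedule = reconnectSchedule();
console.log(schedule); // [ 1000, 2000, 4000, 8000, 16000 ]
console.log(schedule.reduce((sum, delay) => sum + delay, 0)); // 31000
```

Taken on its own, this schedule means a session would wait about 31 seconds in total across five attempts before reporting that the reconnect limit was reached, not counting the time spent on each connection attempt.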

@@ -18,6 +18,7 @@ type OpenAIRealtimeTranscriptionProviderConfig = {
model?: string;
silenceDurationMs?: number;
vadThreshold?: number;
inputAudioFormat?: string;
};
type OpenAIRealtimeTranscriptionSessionConfig = RealtimeTranscriptionSessionCreateRequest & {
@@ -25,6 +26,7 @@ type OpenAIRealtimeTranscriptionSessionConfig = RealtimeTranscriptionSessionCrea
model: string;
silenceDurationMs: number;
vadThreshold: number;
inputAudioFormat: string;
};
type RealtimeEvent = {
@@ -51,6 +53,7 @@ function normalizeProviderConfig(
model: trimToUndefined(raw?.model) ?? trimToUndefined(raw?.sttModel),
silenceDurationMs: asFiniteNumber(raw?.silenceDurationMs),
vadThreshold: asFiniteNumber(raw?.vadThreshold),
inputAudioFormat: trimToUndefined(raw?.inputAudioFormat),
};
}
@@ -116,7 +119,7 @@ class OpenAIRealtimeTranscriptionSession implements RealtimeTranscriptionSession
this.sendEvent({
type: "transcription_session.update",
session: {
input_audio_format: "g711_ulaw",
input_audio_format: this.config.inputAudioFormat,
input_audio_transcription: {
model: this.config.model,
},
@@ -241,6 +244,7 @@ export function buildOpenAIRealtimeTranscriptionProvider(): RealtimeTranscriptio
model: config.model ?? "gpt-4o-transcribe",
silenceDurationMs: config.silenceDurationMs ?? 800,
vadThreshold: config.vadThreshold ?? 0.5,
inputAudioFormat: config.inputAudioFormat ?? "g711_ulaw",
});
},
};

@@ -9,6 +9,7 @@ export {
isTtsProviderConfigured,
listSpeechVoices,
maybeApplyTtsToPayload,
resolveExplicitTtsOverrides,
resolveTtsAutoMode,
resolveTtsConfig,
resolveTtsPrefsPath,

@@ -23,7 +23,7 @@ import { resolveSendableOutboundReplyParts } from "openclaw/plugin-sdk/reply-pay
import type { ReplyPayload } from "openclaw/plugin-sdk/reply-runtime";
import { isVerbose, logVerbose } from "openclaw/plugin-sdk/runtime-env";
import { resolvePreferredOpenClawTmpDir } from "openclaw/plugin-sdk/sandbox";
import { CONFIG_DIR, resolveUserPath, stripMarkdown } from "openclaw/plugin-sdk/text-runtime";
import { resolveConfigDir, resolveUserPath, stripMarkdown } from "openclaw/plugin-sdk/text-runtime";
import {
canonicalizeSpeechProviderId,
getSpeechProvider,
@@ -35,6 +35,7 @@ import {
summarizeText,
type SpeechModelOverridePolicy,
type SpeechProviderConfig,
type SpeechProviderOverrides,
type SpeechVoiceOption,
type TtsDirectiveOverrides,
type TtsDirectiveParseResult,
@@ -167,7 +168,7 @@ function resolveTtsPrefsPathValue(prefsPath: string | undefined): string {
if (envPath) {
return resolveUserPath(envPath);
}
return path.join(CONFIG_DIR, "settings", "tts.json");
return path.join(resolveConfigDir(process.env), "settings", "tts.json");
}
function resolveModelOverridePolicy(
@@ -494,6 +495,66 @@ export function setTtsProvider(prefsPath: string, provider: TtsProvider): void {
});
}
export function resolveExplicitTtsOverrides(params: {
cfg: OpenClawConfig;
prefsPath?: string;
provider?: string;
modelId?: string;
voiceId?: string;
}): TtsDirectiveOverrides {
const providerInput = params.provider?.trim();
const modelId = params.modelId?.trim();
const voiceId = params.voiceId?.trim();
const config = resolveTtsConfig(params.cfg);
const prefsPath = params.prefsPath ?? resolveTtsPrefsPath(config);
const selectedProvider =
canonicalizeSpeechProviderId(providerInput, params.cfg) ??
(modelId || voiceId ? getTtsProvider(config, prefsPath) : undefined);
if (providerInput && !selectedProvider) {
throw new Error(`Unknown TTS provider "${providerInput}".`);
}
if (!modelId && !voiceId) {
return selectedProvider ? { provider: selectedProvider } : {};
}
if (!selectedProvider) {
throw new Error("TTS model or voice overrides require a resolved provider.");
}
const provider = getSpeechProvider(selectedProvider, params.cfg);
if (!provider) {
throw new Error(`speech provider ${selectedProvider} is not registered`);
}
if (!provider.resolveTalkOverrides) {
throw new Error(
`TTS provider "${selectedProvider}" does not support model or voice overrides.`,
);
}
const providerOverrides = provider.resolveTalkOverrides({
talkProviderConfig: {},
params: {
...(voiceId ? { voiceId } : {}),
...(modelId ? { modelId } : {}),
},
});
if ((voiceId || modelId) && (!providerOverrides || Object.keys(providerOverrides).length === 0)) {
throw new Error(
`TTS provider "${selectedProvider}" ignored the requested model or voice overrides.`,
);
}
const overridesRecord = providerOverrides as SpeechProviderOverrides;
return {
provider: selectedProvider,
providerOverrides: {
[provider.id]: overridesRecord,
},
};
}
export function getTtsMaxLength(prefsPath: string): number {
const prefs = readPrefs(prefsPath);
return prefs.tts?.maxLength ?? DEFAULT_TTS_MAX_LENGTH;

@@ -131,9 +131,6 @@ function normalizeResolvedModel(params: {
const normalizedInputModel = {
...params.model,
input: resolveProviderModelInput({
provider: params.provider,
modelId: params.model.id,
modelName: params.model.name,
input: params.model.input,
}),
} as Model<Api>;
@@ -233,7 +230,6 @@ function findInlineModelMatch(params: {
}
export { buildModelAliasLines };
export { buildInlineProviderModels };
function resolveConfiguredProviderConfig(
cfg: OpenClawConfig | undefined,
@@ -250,6 +246,17 @@ function resolveConfiguredProviderConfig(
return findNormalizedProviderValue(configuredProviders, provider);
}
function resolveProviderModelInput(params: {
input?: unknown;
fallbackInput?: unknown;
}): Array<"text" | "image"> {
const resolvedInput = Array.isArray(params.input) ? params.input : params.fallbackInput;
const normalizedInput = Array.isArray(resolvedInput)
? resolvedInput.filter((item): item is "text" | "image" => item === "text" || item === "image")
: [];
return normalizedInput.length > 0 ? normalizedInput : ["text"];
}
function applyConfiguredProviderOverrides(params: {
provider: string;
discoveredModel: ProviderRuntimeModel;
@@ -290,9 +297,6 @@ function applyConfiguredProviderOverrides(params: {
};
}
const normalizedInput = resolveProviderModelInput({
provider: params.provider,
modelId,
modelName: configuredModel?.name ?? discoveredModel.name,
input: configuredModel?.input,
fallbackInput: discoveredModel.input,
});
@@ -337,6 +341,54 @@ function applyConfiguredProviderOverrides(params: {
);
}
export function buildInlineProviderModels(
providers: Record<string, InlineProviderConfig>,
): InlineModelEntry[] {
return Object.entries(providers).flatMap(([providerId, entry]) => {
const trimmed = providerId.trim();
if (!trimmed) {
return [];
}
const providerHeaders = sanitizeModelHeaders(entry?.headers, {
stripSecretRefMarkers: true,
});
const providerRequest = sanitizeConfiguredModelProviderRequest(entry?.request);
return (entry?.models ?? []).map((model) => {
const transport = resolveProviderTransport({
provider: trimmed,
api: model.api ?? entry?.api,
baseUrl: entry?.baseUrl,
});
const modelHeaders = sanitizeModelHeaders((model as InlineModelEntry).headers, {
stripSecretRefMarkers: true,
});
const requestConfig = resolveProviderRequestConfig({
provider: trimmed,
api: transport.api ?? model.api,
baseUrl: transport.baseUrl,
providerHeaders,
modelHeaders,
authHeader: entry?.authHeader,
request: providerRequest,
capability: "llm",
transport: "stream",
});
return attachModelProviderRequestTransport(
{
...model,
input: resolveProviderModelInput({
input: model.input,
}),
provider: trimmed,
baseUrl: requestConfig.baseUrl ?? transport.baseUrl,
api: requestConfig.api ?? model.api,
headers: requestConfig.headers,
},
providerRequest,
);
});
});
}
function resolveExplicitModelWithRegistry(params: {
provider: string;
modelId: string;
@@ -505,9 +557,6 @@ function resolveConfiguredFallbackModel(params: {
baseUrl: requestConfig.baseUrl,
reasoning: configuredModel?.reasoning ?? false,
input: resolveProviderModelInput({
provider,
modelId,
modelName: configuredModel?.name ?? modelId,
input: configuredModel?.input,
}),
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
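
The local `resolveProviderModelInput` added in this file is a pure function, so its behavior is easy to check in isolation. The example below copies the function body from the hunk above and runs it on a few inputs; the sample values are illustrative and not taken from the test suite.

```typescript
function resolveProviderModelInput(params: {
  input?: unknown;
  fallbackInput?: unknown;
}): Array<"text" | "image"> {
  // Use `input` when it is an array (even an empty one); otherwise try `fallbackInput`.
  const resolvedInput = Array.isArray(params.input) ? params.input : params.fallbackInput;
  // Keep only the two modalities the model type allows.
  const normalizedInput = Array.isArray(resolvedInput)
    ? resolvedInput.filter((item): item is "text" | "image" => item === "text" || item === "image")
    : [];
  // Never return an empty list: text is the default modality.
  return normalizedInput.length > 0 ? normalizedInput : ["text"];
}

console.log(resolveProviderModelInput({ input: ["text", "image", "audio"] })); // [ 'text', 'image' ]
console.log(resolveProviderModelInput({ fallbackInput: ["image"] })); // [ 'image' ]
console.log(resolveProviderModelInput({ input: "image" })); // [ 'text' ]
console.log(resolveProviderModelInput({ input: [], fallbackInput: ["image"] })); // [ 'text' ]
```

The last call shows an edge worth noting: an empty configured `input` array is still an array, so the discovered fallback is not consulted and the result defaults to text only.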

@@ -249,11 +249,6 @@ export const TOOL_DISPLAY_CONFIG: ToolDisplayConfig = {
},
},
},
update_plan: {
emoji: "🗺️",
title: "Update Plan",
detailKeys: ["explanation", "plan.0.step"],
},
gateway: {
emoji: "🔌",
title: "Gateway",

@@ -4,6 +4,7 @@ import type { RuntimeWebSearchMetadata } from "../../secrets/runtime-web-tools.t
import {
resolveWebSearchDefinition,
resolveWebSearchProviderId,
runWebSearch,
} from "../../web-search/runtime.js";
import type { AnyAgentTool } from "./common.js";
import { jsonResult } from "./common.js";
@@ -16,16 +17,17 @@ export function createWebSearchTool(options?: {
}): AnyAgentTool | null {
const runtimeProviderId =
options?.runtimeWebSearch?.selectedProvider ?? options?.runtimeWebSearch?.providerConfigured;
const preferRuntimeProviders =
Boolean(runtimeProviderId) &&
!resolveManifestContractOwnerPluginId({
contract: "webSearchProviders",
value: runtimeProviderId,
origin: "bundled",
config: options?.config,
});
const resolved = resolveWebSearchDefinition({
...options,
preferRuntimeProviders:
Boolean(runtimeProviderId) &&
!resolveManifestContractOwnerPluginId({
contract: "webSearchProviders",
value: runtimeProviderId,
origin: "bundled",
config: options?.config,
}),
preferRuntimeProviders,
});
if (!resolved) {
return null;
@@ -36,7 +38,19 @@ export function createWebSearchTool(options?: {
name: "web_search",
description: resolved.definition.description,
parameters: resolved.definition.parameters,
execute: async (_toolCallId, args) => jsonResult(await resolved.definition.execute(args)),
execute: async (_toolCallId, args) => {
const result = await runWebSearch({
config: options?.config,
sandboxed: options?.sandboxed,
runtimeWebSearch: options?.runtimeWebSearch,
preferRuntimeProviders,
args,
});
return jsonResult({
...result.result,
provider: result.provider,
});
},
};
}

@@ -0,0 +1,703 @@
import fs from "node:fs/promises";
import os from "node:os";
import path from "node:path";
import { Command } from "commander";
import { beforeEach, describe, expect, it, vi } from "vitest";
import { runRegisteredCli } from "../test-utils/command-runner.js";
import { registerCapabilityCli } from "./capability-cli.js";
const mocks = vi.hoisted(() => ({
runtime: {
log: vi.fn(),
error: vi.fn(),
exit: vi.fn((code: number) => {
throw new Error(`exit ${code}`);
}),
writeJson: vi.fn(),
writeStdout: vi.fn(),
},
loadConfig: vi.fn(() => ({})),
loadAuthProfileStoreForRuntime: vi.fn(() => ({ profiles: {}, order: {} })),
listProfilesForProvider: vi.fn(() => []),
resolveMemorySearchConfig: vi.fn(() => null),
loadModelCatalog: vi.fn(async () => []),
agentCommand: vi.fn(async () => ({
payloads: [{ text: "local reply" }],
meta: { agentMeta: { provider: "openai", model: "gpt-5.4" } },
})),
callGateway: vi.fn(async ({ method }: { method: string }) => {
if (method === "tts.status") {
return { enabled: true, provider: "openai" };
}
if (method === "agent") {
return {
result: {
payloads: [{ text: "gateway reply" }],
meta: { agentMeta: { provider: "anthropic", model: "claude-sonnet-4-6" } },
},
};
}
return {};
}),
describeImageFile: vi.fn(async () => ({
text: "friendly lobster",
provider: "openai",
model: "gpt-4.1-mini",
})),
generateImage: vi.fn(),
transcribeAudioFile: vi.fn(async () => ({ text: "meeting notes" })),
textToSpeech: vi.fn(async () => ({
success: true,
audioPath: "/tmp/tts-source.mp3",
provider: "openai",
outputFormat: "mp3",
voiceCompatible: false,
attempts: [],
})),
setTtsProvider: vi.fn(),
resolveExplicitTtsOverrides: vi.fn(
({
provider,
modelId,
voiceId,
}: {
provider?: string;
modelId?: string;
voiceId?: string;
}) => ({
...(provider ? { provider } : {}),
...(modelId || voiceId
? {
providerOverrides: {
[provider ?? "openai"]: {
...(modelId ? { modelId } : {}),
...(voiceId ? { voiceId } : {}),
},
},
}
: {}),
}),
),
createEmbeddingProvider: vi.fn(async () => ({
provider: {
id: "openai",
model: "text-embedding-3-small",
embedQuery: async () => [0.1, 0.2],
embedBatch: async (texts: string[]) => texts.map(() => [0.1, 0.2]),
},
})),
registerMemoryEmbeddingProvider: vi.fn(),
listMemoryEmbeddingProviders: vi.fn(() => [
{ id: "openai", defaultModel: "text-embedding-3-small", transport: "remote" },
]),
registerBuiltInMemoryEmbeddingProviders: vi.fn(),
isWebSearchProviderConfigured: vi.fn(() => false),
isWebFetchProviderConfigured: vi.fn(() => false),
modelsStatusCommand: vi.fn(
async (_opts: unknown, runtime: { log: (...args: unknown[]) => void }) => {
runtime.log(JSON.stringify({ ok: true, providers: [{ id: "openai" }] }));
},
),
}));
vi.mock("../runtime.js", () => ({
defaultRuntime: mocks.runtime,
writeRuntimeJson: (runtime: { writeJson: (value: unknown) => void }, value: unknown) =>
runtime.writeJson(value),
}));
vi.mock("../config/config.js", () => ({
loadConfig: (...args: unknown[]) => mocks.loadConfig(...args),
}));
vi.mock("../agents/agent-command.js", () => ({
agentCommand: (...args: unknown[]) => mocks.agentCommand(...args),
}));
vi.mock("../agents/agent-scope.js", () => ({
resolveDefaultAgentId: () => "main",
resolveAgentDir: () => "/tmp/agent",
}));
vi.mock("../agents/model-catalog.js", () => ({
loadModelCatalog: (...args: unknown[]) => mocks.loadModelCatalog(...args),
}));
vi.mock("../agents/auth-profiles.js", () => ({
loadAuthProfileStoreForRuntime: (...args: unknown[]) =>
mocks.loadAuthProfileStoreForRuntime(...args),
listProfilesForProvider: (...args: unknown[]) => mocks.listProfilesForProvider(...args),
}));
vi.mock("../agents/memory-search.js", () => ({
resolveMemorySearchConfig: (...args: unknown[]) => mocks.resolveMemorySearchConfig(...args),
}));
vi.mock("../commands/models.js", () => ({
modelsAuthLoginCommand: vi.fn(),
modelsStatusCommand: (...args: unknown[]) => mocks.modelsStatusCommand(...args),
}));
vi.mock("../gateway/call.js", () => ({
callGateway: (...args: unknown[]) => mocks.callGateway(...args),
randomIdempotencyKey: () => "run-1",
}));
vi.mock("../gateway/connection-details.js", () => ({
buildGatewayConnectionDetailsWithResolvers: vi.fn(() => ({
url: "ws://127.0.0.1:18789",
urlSource: "local loopback",
message: "Gateway target: ws://127.0.0.1:18789",
})),
}));
vi.mock("../media-understanding/runtime.js", () => ({
describeImageFile: (...args: unknown[]) => mocks.describeImageFile(...args),
describeVideoFile: vi.fn(),
transcribeAudioFile: (...args: unknown[]) => mocks.transcribeAudioFile(...args),
}));
vi.mock("../../extensions/memory-core/src/memory/embeddings.js", () => ({
createEmbeddingProvider: (...args: unknown[]) => mocks.createEmbeddingProvider(...args),
}));
vi.mock("../plugins/memory-embedding-providers.js", () => ({
listMemoryEmbeddingProviders: (...args: unknown[]) => mocks.listMemoryEmbeddingProviders(...args),
registerMemoryEmbeddingProvider: (...args: unknown[]) =>
mocks.registerMemoryEmbeddingProvider(...args),
}));
vi.mock("../../extensions/memory-core/src/memory/provider-adapters.js", () => ({
registerBuiltInMemoryEmbeddingProviders: (...args: unknown[]) =>
mocks.registerBuiltInMemoryEmbeddingProviders(...args),
}));
vi.mock("../image-generation/runtime.js", () => ({
generateImage: (...args: unknown[]) => mocks.generateImage(...args),
listRuntimeImageGenerationProviders: vi.fn(() => []),
}));
vi.mock("../video-generation/runtime.js", () => ({
generateVideo: vi.fn(),
listRuntimeVideoGenerationProviders: vi.fn(() => []),
}));
vi.mock("../tts/tts.js", () => ({
getTtsProvider: vi.fn(() => "openai"),
listSpeechVoices: vi.fn(async () => []),
resolveTtsConfig: vi.fn(() => ({})),
resolveTtsPrefsPath: vi.fn(() => "/tmp/tts.json"),
setTtsEnabled: vi.fn(),
setTtsProvider: (...args: unknown[]) => mocks.setTtsProvider(...args),
resolveExplicitTtsOverrides: (...args: unknown[]) => mocks.resolveExplicitTtsOverrides(...args),
textToSpeech: (...args: unknown[]) => mocks.textToSpeech(...args),
}));
vi.mock("../tts/provider-registry.js", () => ({
canonicalizeSpeechProviderId: vi.fn((provider: string) => provider),
listSpeechProviders: vi.fn(() => []),
}));
vi.mock("../web-search/runtime.js", () => ({
listWebSearchProviders: vi.fn(() => []),
isWebSearchProviderConfigured: (...args: unknown[]) =>
mocks.isWebSearchProviderConfigured(...args),
runWebSearch: vi.fn(),
}));
vi.mock("../web-fetch/runtime.js", () => ({
listWebFetchProviders: vi.fn(() => []),
isWebFetchProviderConfigured: (...args: unknown[]) => mocks.isWebFetchProviderConfigured(...args),
resolveWebFetchDefinition: vi.fn(),
}));
describe("capability cli", () => {
beforeEach(() => {
mocks.runtime.log.mockClear();
mocks.runtime.error.mockClear();
mocks.runtime.writeJson.mockClear();
mocks.loadModelCatalog
.mockReset()
.mockResolvedValue([{ id: "gpt-5.4", provider: "openai", name: "GPT-5.4" }]);
mocks.loadAuthProfileStoreForRuntime.mockReset().mockReturnValue({ profiles: {}, order: {} });
mocks.listProfilesForProvider.mockReset().mockReturnValue([]);
mocks.resolveMemorySearchConfig.mockReset().mockReturnValue(null);
mocks.agentCommand.mockClear();
mocks.callGateway.mockClear();
mocks.describeImageFile.mockClear();
mocks.generateImage.mockReset();
mocks.transcribeAudioFile.mockClear();
mocks.textToSpeech.mockClear();
mocks.setTtsProvider.mockClear();
mocks.resolveExplicitTtsOverrides.mockClear();
mocks.createEmbeddingProvider.mockClear();
mocks.registerMemoryEmbeddingProvider.mockClear();
mocks.registerBuiltInMemoryEmbeddingProviders.mockClear();
mocks.isWebSearchProviderConfigured.mockReset().mockReturnValue(false);
mocks.isWebFetchProviderConfigured.mockReset().mockReturnValue(false);
mocks.modelsStatusCommand.mockClear();
mocks.callGateway.mockImplementation(async ({ method }: { method: string }) => {
if (method === "tts.status") {
return { enabled: true, provider: "openai" };
}
if (method === "tts.convert") {
return {
audioPath: "/tmp/gateway-tts.mp3",
provider: "openai",
outputFormat: "mp3",
voiceCompatible: false,
};
}
if (method === "agent") {
return {
result: {
payloads: [{ text: "gateway reply" }],
meta: { agentMeta: { provider: "anthropic", model: "claude-sonnet-4-6" } },
},
};
}
return {};
});
});
it("lists canonical capabilities", async () => {
await runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: ["capability", "list", "--json"],
});
const payload = mocks.runtime.writeJson.mock.calls[0]?.[0] as Array<{ id: string }>;
expect(payload.some((entry) => entry.id === "model.run")).toBe(true);
expect(payload.some((entry) => entry.id === "media.image.describe")).toBe(true);
});
it("defaults model run to local transport", async () => {
await runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: ["capability", "model", "run", "--prompt", "hello", "--json"],
});
expect(mocks.agentCommand).toHaveBeenCalledTimes(1);
expect(mocks.callGateway).not.toHaveBeenCalled();
expect(mocks.runtime.writeJson).toHaveBeenCalledWith(
expect.objectContaining({
capability: "model.run",
transport: "local",
}),
);
});
it("defaults tts status to gateway transport", async () => {
await runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: ["capability", "media", "tts", "status", "--json"],
});
expect(mocks.callGateway).toHaveBeenCalledWith(
expect.objectContaining({ method: "tts.status" }),
);
expect(mocks.runtime.writeJson).toHaveBeenCalledWith(
expect.objectContaining({ transport: "gateway" }),
);
});
it("routes image describe through media understanding, not generation", async () => {
await runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: ["capability", "media", "image", "describe", "--file", "photo.jpg", "--json"],
});
expect(mocks.describeImageFile).toHaveBeenCalledWith(
expect.objectContaining({ filePath: expect.stringMatching(/photo\.jpg$/) }),
);
expect(mocks.runtime.writeJson).toHaveBeenCalledWith(
expect.objectContaining({
capability: "media.image.describe",
outputs: [expect.objectContaining({ kind: "image.description" })],
}),
);
});
it("fails image describe when no description text is returned", async () => {
mocks.describeImageFile.mockResolvedValueOnce({
text: undefined,
provider: undefined,
model: undefined,
});
await expect(
runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: ["capability", "media", "image", "describe", "--file", "photo.jpg", "--json"],
}),
).rejects.toThrow("exit 1");
expect(mocks.runtime.error).toHaveBeenCalledWith(
expect.stringMatching(/No description returned for image/),
);
});
it("rewrites mismatched explicit image output extensions to the detected file type", async () => {
const jpegBase64 =
"/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxAQEBUQEBAVFRUVFRUVFRUVFRUVFRUVFRUXFhUVFRUYHSggGBolHRUVITEhJSkrLi4uFx8zODMsNygtLisBCgoKDg0OGhAQGi0fHyUtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLf/AABEIAAEAAQMBIgACEQEDEQH/xAAXAAEBAQEAAAAAAAAAAAAAAAAAAQID/8QAFhEBAQEAAAAAAAAAAAAAAAAAAAER/9oADAMBAAIQAxAAAAH2AP/EABgQAQEAAwAAAAAAAAAAAAAAAAEAEQIS/9oACAEBAAEFAk1o7//EABYRAQEBAAAAAAAAAAAAAAAAAAABEf/aAAgBAwEBPwGn/8QAFhEBAQEAAAAAAAAAAAAAAAAAABEB/9oACAECAQE/AYf/xAAaEAACAgMAAAAAAAAAAAAAAAABEQAhMUFh/9oACAEBAAY/AjK9cY2f/8QAGhABAQACAwAAAAAAAAAAAAAAAAERITFBUf/aAAgBAQABPyGQk7W5jVYkA//Z";
mocks.generateImage.mockResolvedValue({
provider: "openai",
model: "gpt-image-1",
attempts: [],
images: [
{
buffer: Buffer.from(jpegBase64, "base64"),
mimeType: "image/png",
fileName: "provider-output.png",
},
],
});
const tempOutput = path.join(os.tmpdir(), `openclaw-image-mismatch-${Date.now()}.png`);
await fs.rm(tempOutput, { force: true });
await fs.rm(tempOutput.replace(/\.png$/, ".jpg"), { force: true });
await runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: [
"capability",
"media",
"image",
"generate",
"--prompt",
"friendly lobster",
"--output",
tempOutput,
"--json",
],
});
expect(mocks.runtime.writeJson).toHaveBeenCalledWith(
expect.objectContaining({
outputs: [
expect.objectContaining({
path: tempOutput.replace(/\.png$/, ".jpg"),
mimeType: "image/jpeg",
}),
],
}),
);
});
it("routes audio transcribe through transcription, not realtime", async () => {
await runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: ["capability", "media", "audio", "transcribe", "--file", "memo.m4a", "--json"],
});
expect(mocks.transcribeAudioFile).toHaveBeenCalledWith(
expect.objectContaining({ filePath: expect.stringMatching(/memo\.m4a$/) }),
);
expect(mocks.runtime.writeJson).toHaveBeenCalledWith(
expect.objectContaining({
capability: "media.audio.transcribe",
outputs: [expect.objectContaining({ kind: "audio.transcription" })],
}),
);
});
it("fails audio transcribe when no transcript text is returned", async () => {
mocks.transcribeAudioFile.mockResolvedValueOnce({ text: undefined });
await expect(
runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: ["capability", "media", "audio", "transcribe", "--file", "memo.m4a", "--json"],
}),
).rejects.toThrow("exit 1");
expect(mocks.runtime.error).toHaveBeenCalledWith(
expect.stringMatching(/No transcript returned for audio/),
);
});
it("forwards transcription prompt and language hints", async () => {
await runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: [
"capability",
"media",
"audio",
"transcribe",
"--file",
"memo.m4a",
"--language",
"en",
"--prompt",
"Focus on names",
"--json",
],
});
expect(mocks.transcribeAudioFile).toHaveBeenCalledWith(
expect.objectContaining({
filePath: expect.stringMatching(/memo\.m4a$/),
language: "en",
prompt: "Focus on names",
}),
);
});
it("uses request-scoped TTS overrides without mutating prefs", async () => {
await runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: [
"capability",
"media",
"tts",
"convert",
"--text",
"hello",
"--model",
"openai/gpt-4o-mini-tts",
"--voice",
"alloy",
"--json",
],
});
expect(mocks.textToSpeech).toHaveBeenCalledWith(
expect.objectContaining({
overrides: expect.objectContaining({
provider: "openai",
providerOverrides: expect.objectContaining({
openai: expect.objectContaining({
modelId: "gpt-4o-mini-tts",
voiceId: "alloy",
}),
}),
}),
}),
);
expect(mocks.setTtsProvider).not.toHaveBeenCalled();
});
it("disables TTS fallback when explicit provider or voice/model selection is requested", async () => {
await runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: [
"capability",
"media",
"tts",
"convert",
"--text",
"hello",
"--model",
"openai/gpt-4o-mini-tts",
"--voice",
"alloy",
"--json",
],
});
expect(mocks.textToSpeech).toHaveBeenCalledWith(
expect.objectContaining({
disableFallback: true,
}),
);
});
it("does not infer and forward a local provider guess for gateway TTS overrides", async () => {
await runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: [
"capability",
"media",
"tts",
"convert",
"--gateway",
"--text",
"hello",
"--voice",
"alloy",
"--json",
],
});
expect(mocks.callGateway).toHaveBeenCalledWith(
expect.objectContaining({
method: "tts.convert",
params: expect.objectContaining({
provider: undefined,
voiceId: "alloy",
}),
}),
);
});
it("fails clearly when gateway TTS output is requested against a remote gateway", async () => {
const gatewayConnection = await import("../gateway/connection-details.js");
vi.mocked(gatewayConnection.buildGatewayConnectionDetailsWithResolvers).mockReturnValueOnce({
url: "wss://gateway.example.com",
urlSource: "config gateway.remote.url",
message: "Gateway target: wss://gateway.example.com",
});
await expect(
runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: [
"capability",
"media",
"tts",
"convert",
"--gateway",
"--text",
"hello",
"--output",
"hello.mp3",
"--json",
],
}),
).rejects.toThrow("exit 1");
expect(mocks.runtime.error).toHaveBeenCalledWith(
expect.stringContaining("--output is not supported for remote gateway TTS yet"),
);
});
it("uses only embedding providers for embedding creation", async () => {
await runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: ["capability", "embedding", "create", "--text", "hello", "--json"],
});
expect(mocks.createEmbeddingProvider).toHaveBeenCalledWith(
expect.objectContaining({
provider: "auto",
fallback: "none",
}),
);
expect(mocks.runtime.writeJson).toHaveBeenCalledWith(
expect.objectContaining({
capability: "embedding.create",
provider: "openai",
model: "text-embedding-3-small",
}),
);
});
it("bootstraps built-in embedding providers when the registry is empty", async () => {
mocks.listMemoryEmbeddingProviders.mockReturnValueOnce([]);
await runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: ["capability", "embedding", "providers", "--json"],
});
expect(mocks.registerBuiltInMemoryEmbeddingProviders).toHaveBeenCalledWith(
expect.objectContaining({
registerMemoryEmbeddingProvider: expect.any(Function),
}),
);
});
it("surfaces available, configured, and selected for web providers", async () => {
mocks.loadConfig.mockReturnValue({
tools: {
web: {
search: { provider: "gemini" },
fetch: { provider: "firecrawl" },
},
},
});
const webSearchRuntime = await import("../web-search/runtime.js");
const webFetchRuntime = await import("../web-fetch/runtime.js");
vi.mocked(webSearchRuntime.listWebSearchProviders).mockReturnValue([
{ id: "brave", envVars: ["BRAVE_API_KEY"] } as never,
{ id: "gemini", envVars: ["GEMINI_API_KEY"] } as never,
]);
vi.mocked(webFetchRuntime.listWebFetchProviders).mockReturnValue([
{ id: "firecrawl", envVars: ["FIRECRAWL_API_KEY"] } as never,
]);
mocks.isWebSearchProviderConfigured.mockReturnValueOnce(false).mockReturnValueOnce(true);
mocks.isWebFetchProviderConfigured.mockReturnValueOnce(true);
await runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: ["capability", "web", "providers", "--json"],
});
expect(mocks.runtime.writeJson).toHaveBeenCalledWith({
search: [
{
available: true,
configured: false,
selected: false,
id: "brave",
envVars: ["BRAVE_API_KEY"],
},
{
available: true,
configured: true,
selected: true,
id: "gemini",
envVars: ["GEMINI_API_KEY"],
},
],
fetch: [
{
available: true,
configured: true,
selected: true,
id: "firecrawl",
envVars: ["FIRECRAWL_API_KEY"],
},
],
});
});
it("surfaces selected and configured embedding provider state", async () => {
mocks.loadConfig.mockReturnValue({});
mocks.resolveMemorySearchConfig.mockReturnValue({
provider: "gemini",
model: "gemini-embedding-001",
});
mocks.listMemoryEmbeddingProviders.mockReturnValue([
{ id: "openai", defaultModel: "text-embedding-3-small", transport: "remote" },
{ id: "gemini", defaultModel: "gemini-embedding-001", transport: "remote" },
]);
await runRegisteredCli({
register: registerCapabilityCli as (program: Command) => void,
argv: ["capability", "embedding", "providers", "--json"],
});
expect(mocks.runtime.writeJson).toHaveBeenCalledWith([
{
available: true,
configured: false,
selected: false,
id: "openai",
defaultModel: "text-embedding-3-small",
transport: "remote",
autoSelectPriority: undefined,
},
{
available: true,
configured: true,
selected: true,
id: "gemini",
defaultModel: "gemini-embedding-001",
transport: "remote",
autoSelectPriority: undefined,
},
]);
});
});
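
The "rewrites mismatched explicit image output extensions" test above feeds JPEG bytes labelled as PNG and expects a `.jpg` path with `image/jpeg`. The following is a minimal sketch of how such a rewrite could work, assuming detection by leading magic bytes. The helper names `sniffImageType` and `resolveImageOutput` are hypothetical; the actual logic lives in `capability-cli.ts`, whose diff is suppressed here.

```typescript
// Sketch only: helper names are hypothetical and not taken from capability-cli.ts.
type DetectedImage = { mimeType: string; extension: string; aliases: string[] };

// Identify JPEG or PNG data from its leading magic bytes.
function sniffImageType(buffer: Uint8Array): DetectedImage | null {
  // JPEG streams start with FF D8 FF.
  if (buffer.length >= 3 && buffer[0] === 0xff && buffer[1] === 0xd8 && buffer[2] === 0xff) {
    return { mimeType: "image/jpeg", extension: ".jpg", aliases: [".jpg", ".jpeg"] };
  }
  // PNG streams start with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A.
  const png = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (buffer.length >= png.length && png.every((byte, index) => buffer[index] === byte)) {
    return { mimeType: "image/png", extension: ".png", aliases: [".png"] };
  }
  return null;
}

// Return the path to write to, swapping the extension when it contradicts the bytes.
function resolveImageOutput(
  requestedPath: string,
  buffer: Uint8Array,
): { path: string; mimeType?: string } {
  const detected = sniffImageType(buffer);
  if (!detected) {
    return { path: requestedPath }; // Unknown format: leave the caller's path alone.
  }
  const match = /\.[A-Za-z0-9]+$/.exec(requestedPath);
  if (!match) {
    // No extension at all: append the detected one.
    return { path: requestedPath + detected.extension, mimeType: detected.mimeType };
  }
  if (detected.aliases.includes(match[0].toLowerCase())) {
    // Extension already agrees with the detected type.
    return { path: requestedPath, mimeType: detected.mimeType };
  }
  const stem = requestedPath.slice(0, match.index);
  return { path: stem + detected.extension, mimeType: detected.mimeType };
}

const jpegHeader = new Uint8Array([0xff, 0xd8, 0xff, 0xe0]);
console.log(resolveImageOutput("/tmp/lobster.png", jpegHeader));
```

Trusting the bytes over the provider-reported MIME type keeps the written file openable even when the provider mislabels its output, which is the situation the test simulates.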

1805
src/cli/capability-cli.ts Normal file

File diff suppressed because it is too large

@@ -74,6 +74,15 @@ const entrySpecs: readonly CommandGroupDescriptorSpec<SubCliRegistrar>[] = [
loadModule: () => import("../models-cli.js"),
exportName: "registerModelsCli",
},
{
name: "capability",
description: "Run provider-backed capability commands",
hasSubcommands: true,
register: async (program) => {
const mod = await import("../capability-cli.js");
mod.registerCapabilityCli(program);
},
},
{
commandNames: ["approvals"],
loadModule: () => import("../exec-approvals-cli.js"),


@@ -22,6 +22,11 @@ const subCliCommandCatalog = defineCommandDescriptorCatalog([
description: "Discover, scan, and configure models",
hasSubcommands: true,
},
{
name: "capability",
description: "Run provider-backed capability commands",
hasSubcommands: true,
},
{
name: "approvals",
description: "Manage exec approvals (gateway or node host)",


@@ -75,6 +75,16 @@ export const FIELD_HELP: Record<string, string> = {
"Control UI hosting settings including enablement, pathing, and browser-origin/auth hardening behavior. Keep UI exposure minimal and pair with strong auth controls before internet-facing deployments.",
"gateway.controlUi.enabled":
"Enables serving the gateway Control UI from the gateway HTTP process when true. Keep enabled for local administration, and disable when an external control surface replaces it.",
"gateway.controlUi.voice":
"Browser voice settings for the Control UI chat, including realtime transcription provider selection and optional assistant speech playback.",
"gateway.controlUi.voice.enabled":
"Enables realtime browser voice sessions for the Control UI chat when a transcription provider is configured.",
"gateway.controlUi.voice.transcriptionProvider":
"Registered realtime transcription provider id used for browser mic input. Keep this explicit so browser voice fails closed when no provider is configured.",
"gateway.controlUi.voice.providers":
"Provider-owned realtime transcription config keyed by provider id for browser voice sessions.",
"gateway.controlUi.voice.playbackEnabled":
"Enables browser speech-synthesis playback for finalized assistant replies during a voice session.",
"gateway.auth":
"Authentication policy for gateway HTTP/WebSocket access including mode, credentials, trusted-proxy behavior, and rate limiting. Keep auth enabled for every non-loopback deployment.",
"gateway.auth.mode":


@@ -100,6 +100,17 @@ export type GatewayControlUiConfig = {
allowInsecureAuth?: boolean;
/** DANGEROUS: Disable device identity checks for the Control UI (default: false). */
dangerouslyDisableDeviceAuth?: boolean;
/** Realtime voice settings for the browser chat UI. */
voice?: {
/** Enable browser voice sessions for the Control UI chat. */
enabled?: boolean;
/** Registered realtime transcription provider id to use for browser voice. */
transcriptionProvider?: string;
/** Provider-owned realtime transcription config keyed by provider id. */
providers?: Record<string, Record<string, unknown>>;
/** Enable browser speech synthesis playback for assistant replies. */
playbackEnabled?: boolean;
};
};
export type GatewayAuthMode = "none" | "token" | "password" | "trusted-proxy";
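
The help text for `gateway.controlUi.voice.transcriptionProvider` says browser voice should fail closed when no provider is configured. As a rough illustration of that intent, the sketch below copies the `voice` shape from this hunk and adds a hypothetical resolver, `resolveVoiceProviderId`, which is not part of the diff. The example values, including the `exampleOption` key under `providers`, are placeholders rather than documented settings.

```typescript
// Shape copied from the GatewayControlUiConfig["voice"] addition in this diff.
type ControlUiVoiceConfig = {
  enabled?: boolean;
  transcriptionProvider?: string;
  providers?: Record<string, Record<string, unknown>>;
  playbackEnabled?: boolean;
};

// Illustrative values only; "exampleOption" is a placeholder, not a real provider setting.
const exampleVoice: ControlUiVoiceConfig = {
  enabled: true,
  transcriptionProvider: "openai",
  providers: { openai: { exampleOption: "value" } },
  playbackEnabled: false,
};

// Hypothetical helper: fail closed unless voice is enabled and a provider id is named.
function resolveVoiceProviderId(voice: ControlUiVoiceConfig | undefined): string | null {
  if (!voice?.enabled) {
    return null;
  }
  const id = voice.transcriptionProvider?.trim();
  return id ? id : null;
}

console.log(resolveVoiceProviderId(exampleVoice));
console.log(resolveVoiceProviderId({ enabled: true }));
```

Returning `null` rather than guessing a default provider is one way to express the "keep this explicit" guidance from the field help; the gateway's real resolution path may differ.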


@@ -676,6 +676,15 @@ export const OpenClawSchema = z
dangerouslyAllowHostHeaderOriginFallback: z.boolean().optional(),
allowInsecureAuth: z.boolean().optional(),
dangerouslyDisableDeviceAuth: z.boolean().optional(),
voice: z
.object({
enabled: z.boolean().optional(),
transcriptionProvider: z.string().min(1).optional(),
providers: z.record(z.string(), z.record(z.string(), z.unknown())).optional(),
playbackEnabled: z.boolean().optional(),
})
.strict()
.optional(),
})
.strict()
.optional(),


@@ -0,0 +1,106 @@
import type { RealtimeTranscriptionSession } from "../realtime-transcription/provider-types.js";
export type ChatVoiceEventPayload = {
sessionKey: string;
state:
| "ready"
| "speech_start"
| "partial_transcript"
| "final_transcript"
| "assistant_started"
| "assistant_completed"
| "playback_clear"
| "interrupted"
| "error"
| "closed";
transcript?: string;
runId?: string;
errorMessage?: string;
playbackEnabled?: boolean;
};
export type ChatVoiceSessionEntry = {
sessionKey: string;
connId: string;
providerId: string;
playbackEnabled: boolean;
sttSession: RealtimeTranscriptionSession;
transcriptPartial: string;
transcriptFinal: string;
activeRunId: string | null;
};
const sessionsByKey = new Map<string, ChatVoiceSessionEntry>();
const sessionKeyByRunId = new Map<string, string>();
export function getChatVoiceSession(sessionKey: string): ChatVoiceSessionEntry | undefined {
return sessionsByKey.get(sessionKey);
}
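/** Registers a session under its key; a different entry already stored there is closed and unindexed first. */
export function setChatVoiceSession(entry: ChatVoiceSessionEntry) {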
const existing = sessionsByKey.get(entry.sessionKey);
if (existing && existing !== entry) {
try {
existing.sttSession.close();
} catch {
// ignore replacement cleanup errors
}
if (existing.activeRunId) {
sessionKeyByRunId.delete(existing.activeRunId);
}
}
sessionsByKey.set(entry.sessionKey, entry);
}
export function deleteChatVoiceSession(sessionKey: string): ChatVoiceSessionEntry | undefined {
const entry = sessionsByKey.get(sessionKey);
if (!entry) {
return undefined;
}
sessionsByKey.delete(sessionKey);
if (entry.activeRunId) {
sessionKeyByRunId.delete(entry.activeRunId);
}
return entry;
}
/** Sets or clears (with null) the session's active run and keeps the runId lookup in sync. */
export function setChatVoiceRunId(sessionKey: string, runId: string | null) {
const entry = sessionsByKey.get(sessionKey);
if (!entry) {
return;
}
if (entry.activeRunId) {
sessionKeyByRunId.delete(entry.activeRunId);
}
entry.activeRunId = runId;
if (runId) {
sessionKeyByRunId.set(runId, sessionKey);
}
}
export function getChatVoiceSessionByRunId(runId: string): ChatVoiceSessionEntry | undefined {
const sessionKey = sessionKeyByRunId.get(runId);
return sessionKey ? sessionsByKey.get(sessionKey) : undefined;
}
/** Closes every voice session owned by connId and emits a "closed" event for each one. */
export function closeChatVoiceSessionsForConn(
connId: string,
emit: (connId: string, payload: ChatVoiceEventPayload) => void,
) {
for (const entry of sessionsByKey.values()) {
if (entry.connId !== connId) {
continue;
}
try {
entry.sttSession.close();
} catch {
// ignore cleanup errors on disconnect
}
deleteChatVoiceSession(entry.sessionKey);
emit(connId, {
sessionKey: entry.sessionKey,
state: "closed",
playbackEnabled: entry.playbackEnabled,
});
}
}
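
`closeChatVoiceSessionsForConn` deletes entries from `sessionsByKey` while iterating over it. That is safe for a JavaScript `Map`: entries removed during iteration are simply not visited again, and the remaining ones are still reached. The sketch below shows the same pattern on a stand-in registry; `Entry`, `sessions`, and `closeForConn` are illustrative names, not part of the module.

```typescript
// Sketch only: a stand-in registry, not the chat-voice session module itself.
type Entry = { connId: string };

const sessions = new Map<string, Entry>([
  ["s1", { connId: "conn-a" }],
  ["s2", { connId: "conn-b" }],
  ["s3", { connId: "conn-a" }],
]);

// Remove every entry owned by connId while iterating the same Map.
function closeForConn(connId: string): string[] {
  const closed: string[] = [];
  for (const [key, entry] of sessions) {
    if (entry.connId !== connId) {
      continue;
    }
    sessions.delete(key); // Safe: the iterator moves on to the next live entry.
    closed.push(key);
  }
  return closed;
}

const closedKeys = closeForConn("conn-a");
console.log(closedKeys, sessions.size);
```

Because of this guarantee, the disconnect path does not need to snapshot the values into an array before deleting.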


@@ -27,6 +27,7 @@ describe("method scope resolution", () => {
it.each([
["sessions.resolve", ["operator.read"]],
["config.schema.lookup", ["operator.read"]],
["chat.voice.start", ["operator.write"]],
["sessions.create", ["operator.write"]],
["sessions.send", ["operator.write"]],
["sessions.abort", ["operator.write"]],
@@ -85,6 +86,10 @@ describe("operator scope authorization", () => {
allowed: false,
missingScope: "operator.write",
});
expect(authorizeOperatorScopesForMethod("chat.voice.start", ["operator.read"])).toEqual({
allowed: false,
missingScope: "operator.write",
});
});
it("requires pairing scope for node pairing approvals", () => {


@@ -117,14 +117,23 @@ const METHOD_SCOPE_GROUPS: Record<OperatorScope, readonly string[]> = {
"wake",
"talk.mode",
"talk.speak",
"chat.voice.start",
"tts.enable",
"tts.disable",
"tts.convert",
"tts.setProvider",
"realtimeTranscription.start",
"realtimeTranscription.pushAudio",
"realtimeTranscription.pull",
"realtimeTranscription.finish",
"voicewake.set",
"node.invoke",
"chat.send",
"chat.abort",
"chat.voice.audio",
"chat.voice.commit",
"chat.voice.interrupt",
"chat.voice.stop",
"sessions.create",
"sessions.send",
"sessions.steer",
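
The scope tests earlier in this diff expect `chat.voice.start` to require `operator.write` and to report it as `missingScope` when only `operator.read` is granted. A minimal sketch of a lookup that produces that result follows. The table is a trimmed subset, and `requiredScopeFor` and `authorize` are hypothetical names. It assumes exact scope matching and denies unclassified methods, neither of which is confirmed by the lines shown here.

```typescript
// Sketch only: a trimmed scope table and a hypothetical authorizer, not the gateway's real code.
type OperatorScope = "operator.read" | "operator.write";

const METHOD_SCOPE_GROUPS: Record<OperatorScope, readonly string[]> = {
  "operator.read": ["sessions.resolve", "config.schema.lookup"],
  "operator.write": [
    "chat.voice.start",
    "chat.voice.audio",
    "chat.voice.commit",
    "chat.voice.interrupt",
    "chat.voice.stop",
    "sessions.create",
  ],
};

type AuthResult = { allowed: true } | { allowed: false; missingScope?: OperatorScope };

// Find the scope group that lists the method, if any.
function requiredScopeFor(method: string): OperatorScope | undefined {
  const scopes = Object.keys(METHOD_SCOPE_GROUPS) as OperatorScope[];
  return scopes.find((scope) => METHOD_SCOPE_GROUPS[scope].includes(method));
}

// Assumption: exact scope match only, and unclassified methods are denied.
function authorize(method: string, granted: readonly string[]): AuthResult {
  const required = requiredScopeFor(method);
  if (!required) {
    return { allowed: false };
  }
  return granted.includes(required)
    ? { allowed: true }
    : { allowed: false, missingScope: required };
}

console.log(authorize("chat.voice.start", ["operator.read"]));
```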


@@ -63,6 +63,18 @@ import {
ChatHistoryParamsSchema,
type ChatInjectParams,
ChatInjectParamsSchema,
type ChatVoiceAudioParams,
ChatVoiceAudioParamsSchema,
type ChatVoiceCommitParams,
ChatVoiceCommitParamsSchema,
type ChatVoiceEvent,
ChatVoiceEventSchema,
type ChatVoiceInterruptParams,
ChatVoiceInterruptParamsSchema,
type ChatVoiceStartParams,
ChatVoiceStartParamsSchema,
type ChatVoiceStopParams,
ChatVoiceStopParamsSchema,
ChatSendParamsSchema,
type ConfigApplyParams,
ConfigApplyParamsSchema,
@@ -474,6 +486,21 @@ export const validateChatSendParams = ajv.compile(ChatSendParamsSchema);
export const validateChatAbortParams = ajv.compile<ChatAbortParams>(ChatAbortParamsSchema);
export const validateChatInjectParams = ajv.compile<ChatInjectParams>(ChatInjectParamsSchema);
export const validateChatEvent = ajv.compile(ChatEventSchema);
export const validateChatVoiceStartParams = ajv.compile<ChatVoiceStartParams>(
ChatVoiceStartParamsSchema,
);
export const validateChatVoiceAudioParams = ajv.compile<ChatVoiceAudioParams>(
ChatVoiceAudioParamsSchema,
);
export const validateChatVoiceCommitParams = ajv.compile<ChatVoiceCommitParams>(
ChatVoiceCommitParamsSchema,
);
export const validateChatVoiceInterruptParams = ajv.compile<ChatVoiceInterruptParams>(
ChatVoiceInterruptParamsSchema,
);
export const validateChatVoiceStopParams =
ajv.compile<ChatVoiceStopParams>(ChatVoiceStopParamsSchema);
export const validateChatVoiceEvent = ajv.compile<ChatVoiceEvent>(ChatVoiceEventSchema);
export const validateUpdateRunParams = ajv.compile<UpdateRunParams>(UpdateRunParamsSchema);
export const validateWebLoginStartParams =
ajv.compile<WebLoginStartParams>(WebLoginStartParamsSchema);


@@ -68,6 +68,68 @@ export const ChatInjectParamsSchema = Type.Object(
{ additionalProperties: false },
);
export const ChatVoiceStartParamsSchema = Type.Object(
{
sessionKey: NonEmptyString,
},
{ additionalProperties: false },
);
export const ChatVoiceAudioParamsSchema = Type.Object(
{
sessionKey: NonEmptyString,
audio: NonEmptyString,
format: Type.Optional(Type.String()),
sampleRate: Type.Optional(Type.Integer({ minimum: 1 })),
},
{ additionalProperties: false },
);
export const ChatVoiceCommitParamsSchema = Type.Object(
{
sessionKey: NonEmptyString,
transcript: Type.Optional(Type.String()),
},
{ additionalProperties: false },
);
export const ChatVoiceInterruptParamsSchema = Type.Object(
{
sessionKey: NonEmptyString,
},
{ additionalProperties: false },
);
export const ChatVoiceStopParamsSchema = Type.Object(
{
sessionKey: NonEmptyString,
},
{ additionalProperties: false },
);
export const ChatVoiceEventSchema = Type.Object(
{
sessionKey: NonEmptyString,
state: Type.Union([
Type.Literal("ready"),
Type.Literal("speech_start"),
Type.Literal("partial_transcript"),
Type.Literal("final_transcript"),
Type.Literal("assistant_started"),
Type.Literal("assistant_completed"),
Type.Literal("playback_clear"),
Type.Literal("interrupted"),
Type.Literal("error"),
Type.Literal("closed"),
]),
transcript: Type.Optional(Type.String()),
runId: Type.Optional(Type.String()),
errorMessage: Type.Optional(Type.String()),
playbackEnabled: Type.Optional(Type.Boolean()),
},
{ additionalProperties: false },
);
export const ChatEventSchema = Type.Object(
{
runId: NonEmptyString,
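
To make the constraints in `ChatVoiceAudioParamsSchema` concrete, here is a dependency-free sketch of the checks it implies: required non-empty `sessionKey` and `audio`, optional string `format`, optional integer `sampleRate` of at least 1, and no undeclared keys. It is hand-written for illustration and does not use TypeBox or Ajv. It also assumes that `NonEmptyString` means a string with at least one character, since that definition is not part of this hunk.

```typescript
// Sketch only: a dependency-free check mirroring ChatVoiceAudioParamsSchema from this diff.
// Assumption: NonEmptyString means a string with at least one character.
type ChatVoiceAudioParams = {
  sessionKey: string;
  audio: string;
  format?: string;
  sampleRate?: number;
};

const ALLOWED_KEYS = new Set(["sessionKey", "audio", "format", "sampleRate"]);

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

function isChatVoiceAudioParams(value: unknown): value is ChatVoiceAudioParams {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const record = value as Record<string, unknown>;
  // additionalProperties: false -> reject any key the schema does not declare.
  if (Object.keys(record).some((key) => !ALLOWED_KEYS.has(key))) {
    return false;
  }
  if (!isNonEmptyString(record.sessionKey) || !isNonEmptyString(record.audio)) {
    return false;
  }
  if (record.format !== undefined && typeof record.format !== "string") {
    return false;
  }
  if (record.sampleRate !== undefined) {
    const rate = record.sampleRate;
    // Mirrors Type.Integer({ minimum: 1 }).
    if (typeof rate !== "number" || !Number.isInteger(rate) || rate < 1) {
      return false;
    }
  }
  return true;
}

console.log(isChatVoiceAudioParams({ sessionKey: "main", audio: "AAAA", sampleRate: 16000 }));
```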


@@ -118,6 +118,12 @@ import {
ChatEventSchema,
ChatHistoryParamsSchema,
ChatInjectParamsSchema,
ChatVoiceAudioParamsSchema,
ChatVoiceCommitParamsSchema,
ChatVoiceEventSchema,
ChatVoiceInterruptParamsSchema,
ChatVoiceStartParamsSchema,
ChatVoiceStopParamsSchema,
ChatSendParamsSchema,
LogsTailParamsSchema,
LogsTailResultSchema,
@@ -330,7 +336,13 @@ export const ProtocolSchemas = {
ChatSendParams: ChatSendParamsSchema,
ChatAbortParams: ChatAbortParamsSchema,
ChatInjectParams: ChatInjectParamsSchema,
ChatVoiceStartParams: ChatVoiceStartParamsSchema,
ChatVoiceAudioParams: ChatVoiceAudioParamsSchema,
ChatVoiceCommitParams: ChatVoiceCommitParamsSchema,
ChatVoiceInterruptParams: ChatVoiceInterruptParamsSchema,
ChatVoiceStopParams: ChatVoiceStopParamsSchema,
ChatEvent: ChatEventSchema,
ChatVoiceEvent: ChatVoiceEventSchema,
UpdateRunParams: UpdateRunParamsSchema,
TickEvent: TickEventSchema,
ShutdownEvent: ShutdownEventSchema,


@@ -144,6 +144,12 @@ export type DeviceTokenRevokeParams = SchemaType<"DeviceTokenRevokeParams">;
export type ChatAbortParams = SchemaType<"ChatAbortParams">;
export type ChatInjectParams = SchemaType<"ChatInjectParams">;
export type ChatEvent = SchemaType<"ChatEvent">;
export type ChatVoiceStartParams = SchemaType<"ChatVoiceStartParams">;
export type ChatVoiceAudioParams = SchemaType<"ChatVoiceAudioParams">;
export type ChatVoiceCommitParams = SchemaType<"ChatVoiceCommitParams">;
export type ChatVoiceInterruptParams = SchemaType<"ChatVoiceInterruptParams">;
export type ChatVoiceStopParams = SchemaType<"ChatVoiceStopParams">;
export type ChatVoiceEvent = SchemaType<"ChatVoiceEvent">;
export type UpdateRunParams = SchemaType<"UpdateRunParams">;
export type TickEvent = SchemaType<"TickEvent">;
export type ShutdownEvent = SchemaType<"ShutdownEvent">;


@@ -0,0 +1,154 @@
import { describe, expect, it, vi } from "vitest";
import type { OpenClawConfig } from "../config/config.js";
import type { RealtimeTranscriptionProviderPlugin } from "../plugins/types.js";
import { RealtimeTranscriptionSessionManager } from "./realtime-transcription-session-manager.js";
function createProvider(params?: {
id?: string;
configured?: boolean;
onCreate?: (callbacks: Record<string, unknown>) => void;
}): RealtimeTranscriptionProviderPlugin {
return {
id: params?.id ?? "openai",
label: "Test",
autoSelectOrder: 1,
resolveConfig: ({ rawConfig }) => rawConfig,
isConfigured: () => params?.configured ?? true,
createSession: (req) => {
params?.onCreate?.(req as unknown as Record<string, unknown>);
return {
connect: async () => {},
sendAudio: vi.fn(),
close: vi.fn(),
isConnected: () => true,
};
},
};
}
describe("RealtimeTranscriptionSessionManager", () => {
it("starts a session, auto-selects the first configured provider, and queues events", async () => {
let callbacks: Record<string, unknown> | undefined;
const provider = createProvider({
onCreate: (req) => {
callbacks = req;
},
});
const manager = new RealtimeTranscriptionSessionManager({
loadConfig: () => ({}) as OpenClawConfig,
listProviders: () => [provider],
getProvider: () => provider,
now: () => 123,
createId: () => "session-1",
});
const started = await manager.startSession({
format: "s16le",
sampleRate: 16000,
channels: 1,
});
expect(started).toEqual({
sessionId: "session-1",
provider: "openai",
format: "s16le",
sampleRate: 16000,
channels: 1,
});
(callbacks?.onPartial as ((value: string) => void) | undefined)?.("hello");
(callbacks?.onTranscript as ((value: string) => void) | undefined)?.("hello world");
const pulled = manager.pullEvents({ sessionId: "session-1" });
expect(pulled.events).toEqual([
{ type: "session.started", provider: "openai", transport: "gateway", timestamp: 123 },
{ type: "partial", text: "hello", timestamp: 123 },
{ type: "final", text: "hello world", timestamp: 123 },
]);
});
it("rejects unsupported audio shapes", async () => {
const provider = createProvider();
const manager = new RealtimeTranscriptionSessionManager({
loadConfig: () => ({}) as OpenClawConfig,
listProviders: () => [provider],
getProvider: () => provider,
now: () => 123,
createId: () => "session-1",
});
await expect(
manager.startSession({
format: "s16le",
sampleRate: 16000,
channels: 2,
}),
).rejects.toThrow(/mono audio/);
});
it("returns pending terminal events on finish and removes the session", async () => {
let callbacks: Record<string, unknown> | undefined;
const close = vi.fn();
const provider = createProvider();
// Replace createSession so the test can capture callbacks and observe close().
provider.createSession = (req) => {
callbacks = req as unknown as Record<string, unknown>;
return {
connect: async () => {},
sendAudio: vi.fn(),
close,
isConnected: () => false,
};
};
const manager = new RealtimeTranscriptionSessionManager({
loadConfig: () => ({}) as OpenClawConfig,
listProviders: () => [provider],
getProvider: () => provider,
now: () => 123,
createId: () => "session-1",
});
await manager.startSession({
format: "s16le",
sampleRate: 16000,
channels: 1,
});
(callbacks?.onPartial as ((value: string) => void) | undefined)?.("hello");
expect(manager.finishSession({ sessionId: "session-1" })).toEqual({
sessionId: "session-1",
provider: "openai",
closed: true,
events: [
{ type: "session.started", provider: "openai", transport: "gateway", timestamp: 123 },
{ type: "partial", text: "hello", timestamp: 123 },
{ type: "session.ended", reason: "client_finish", timestamp: 123 },
],
});
expect(close).toHaveBeenCalledTimes(1);
expect(() => manager.pullEvents({ sessionId: "session-1" })).toThrow(
/Unknown realtime transcription session/,
);
});
it("fails when no configured provider is available", async () => {
const provider = createProvider({ configured: false });
const manager = new RealtimeTranscriptionSessionManager({
loadConfig: () => ({}) as OpenClawConfig,
listProviders: () => [provider],
getProvider: () => provider,
now: () => 123,
createId: () => "session-1",
});
await expect(
manager.startSession({
format: "s16le",
sampleRate: 16000,
channels: 1,
}),
).rejects.toThrow(/No configured realtime transcription provider/);
});
});
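
The tests above exercise two sides of provider auto-selection: the first configured provider is used, and the call fails when none is configured. The sketch below shows one way that selection could be written, reusing the ordering rule from `sortProviders` in the manager source (ascending `autoSelectOrder`, missing values last, ties broken by `id`). `ProviderLike` and `pickProvider` are simplified, hypothetical stand-ins, and the manager's real `startSession` body is not fully visible in this diff.

```typescript
// Sketch only: ProviderLike is a simplified stand-in for RealtimeTranscriptionProviderPlugin.
type ProviderLike = { id: string; autoSelectOrder?: number; isConfigured: () => boolean };

// Order by autoSelectOrder (missing values last), then by id.
function sortByAutoSelect(providers: readonly ProviderLike[]): ProviderLike[] {
  return [...providers].sort((left, right) => {
    const leftOrder = left.autoSelectOrder ?? Number.MAX_SAFE_INTEGER;
    const rightOrder = right.autoSelectOrder ?? Number.MAX_SAFE_INTEGER;
    return leftOrder !== rightOrder ? leftOrder - rightOrder : left.id.localeCompare(right.id);
  });
}

// Assumption: auto-selection takes the first configured provider in that order.
function pickProvider(providers: readonly ProviderLike[]): ProviderLike {
  const chosen = sortByAutoSelect(providers).find((provider) => provider.isConfigured());
  if (!chosen) {
    throw new Error("No configured realtime transcription provider is available.");
  }
  return chosen;
}

const candidates: ProviderLike[] = [
  { id: "beta", autoSelectOrder: 2, isConfigured: () => true },
  { id: "alpha", autoSelectOrder: 1, isConfigured: () => false },
  { id: "gamma", isConfigured: () => true },
];
console.log(pickProvider(candidates).id);
```

Here `alpha` sorts first but is skipped because it is not configured, so `beta` is chosen ahead of `gamma`, which has no explicit order.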


@@ -0,0 +1,297 @@
import { randomUUID } from "node:crypto";
import type { OpenClawConfig } from "../config/config.js";
import { loadConfig } from "../config/config.js";
import type { RealtimeTranscriptionProviderPlugin } from "../plugins/types.js";
import {
getRealtimeTranscriptionProvider,
listRealtimeTranscriptionProviders,
} from "../realtime-transcription/provider-registry.js";
import type {
RealtimeTranscriptionProviderConfig,
RealtimeTranscriptionSession,
} from "../realtime-transcription/provider-types.js";
type AudioFormat = "s16le" | "pcm16" | "g711_ulaw";
export type RealtimeTranscriptionSessionEvent =
| { type: "session.started"; provider: string; transport: "gateway"; timestamp: number }
| { type: "partial"; text: string; timestamp: number }
| { type: "final"; text: string; timestamp: number }
| { type: "warning"; message: string; timestamp: number }
| { type: "error"; message: string; timestamp: number }
| { type: "session.ended"; reason: string; timestamp: number };
type ManagedSession = {
id: string;
provider: string;
format: AudioFormat;
sampleRate: number;
channels: number;
session: RealtimeTranscriptionSession;
events: RealtimeTranscriptionSessionEvent[];
closed: boolean;
};
type SessionStartParams = {
provider?: string;
providerConfig?: RealtimeTranscriptionProviderConfig;
format: AudioFormat;
sampleRate: number;
channels: number;
};
type ManagerDeps = {
loadConfig: () => OpenClawConfig;
listProviders: (cfg?: OpenClawConfig) => RealtimeTranscriptionProviderPlugin[];
getProvider: (
providerId: string | undefined,
cfg?: OpenClawConfig,
) => RealtimeTranscriptionProviderPlugin | undefined;
now: () => number;
createId: () => string;
};
const defaultDeps: ManagerDeps = {
loadConfig,
listProviders: listRealtimeTranscriptionProviders,
getProvider: getRealtimeTranscriptionProvider,
now: () => Date.now(),
createId: () => randomUUID(),
};
function normalizeAudioFormat(raw: string | undefined): AudioFormat | null {
const value = raw?.trim().toLowerCase();
if (!value) {
return null;
}
if (value === "s16le" || value === "pcm16" || value === "g711_ulaw") {
return value;
}
return null;
}
function validateSessionShape(params: {
format: AudioFormat;
sampleRate: number;
channels: number;
}) {
if (!Number.isFinite(params.sampleRate) || params.sampleRate <= 0) {
throw new Error("sampleRate must be a positive number.");
}
if (!Number.isFinite(params.channels) || params.channels <= 0) {
throw new Error("channels must be a positive number.");
}
if (params.channels !== 1) {
throw new Error("realtime transcription currently requires mono audio (channels=1).");
}
if (params.format === "g711_ulaw" && params.sampleRate !== 8000) {
throw new Error("g711_ulaw realtime transcription requires sampleRate=8000.");
}
}
function sortProviders(providers: RealtimeTranscriptionProviderPlugin[]) {
return providers.toSorted((left, right) => {
const leftOrder = left.autoSelectOrder ?? Number.MAX_SAFE_INTEGER;
const rightOrder = right.autoSelectOrder ?? Number.MAX_SAFE_INTEGER;
if (leftOrder !== rightOrder) {
return leftOrder - rightOrder;
}
return left.id.localeCompare(right.id);
});
}
function buildProviderConfig(params: {
provider: RealtimeTranscriptionProviderPlugin;
cfg: OpenClawConfig;
providerConfig?: RealtimeTranscriptionProviderConfig;
format: AudioFormat;
}): RealtimeTranscriptionProviderConfig {
const rawConfig = {
...params.providerConfig,
...(params.format === "s16le" || params.format === "pcm16"
? { inputAudioFormat: "pcm16" }
: params.format === "g711_ulaw"
? { inputAudioFormat: "g711_ulaw" }
: {}),
};
return params.provider.resolveConfig?.({ cfg: params.cfg, rawConfig }) ?? rawConfig;
}
export class RealtimeTranscriptionSessionManager {
private readonly sessions = new Map<string, ManagedSession>();
constructor(private readonly deps: ManagerDeps = defaultDeps) {}
async startSession(params: SessionStartParams) {
validateSessionShape({
format: params.format,
sampleRate: params.sampleRate,
channels: params.channels,
});
const cfg = this.deps.loadConfig();
const provider = this.resolveProvider(params.provider, cfg, params);
const providerConfig = buildProviderConfig({
provider,
cfg,
providerConfig: params.providerConfig,
format: params.format,
});
const sessionId = this.deps.createId();
const events: RealtimeTranscriptionSessionEvent[] = [];
const queueEvent = (event: RealtimeTranscriptionSessionEvent) => {
events.push(event);
};
const session = provider.createSession({
providerConfig,
onPartial: (partial) => {
if (partial.trim()) {
queueEvent({ type: "partial", text: partial, timestamp: this.deps.now() });
}
},
onTranscript: (transcript) => {
if (transcript.trim()) {
queueEvent({ type: "final", text: transcript, timestamp: this.deps.now() });
}
},
onError: (error) => {
queueEvent({
type: "error",
message: error.message || String(error),
timestamp: this.deps.now(),
});
},
});
await session.connect();
queueEvent({
type: "session.started",
provider: provider.id,
transport: "gateway",
timestamp: this.deps.now(),
});
this.sessions.set(sessionId, {
id: sessionId,
provider: provider.id,
format: params.format,
sampleRate: params.sampleRate,
channels: params.channels,
session,
events,
closed: false,
});
return {
sessionId,
provider: provider.id,
format: params.format,
sampleRate: params.sampleRate,
channels: params.channels,
};
}
pushAudio(params: { sessionId: string; audio: Buffer }) {
const managed = this.getOpenSession(params.sessionId);
managed.session.sendAudio(params.audio);
return {
sessionId: managed.id,
acceptedBytes: params.audio.byteLength,
connected: managed.session.isConnected(),
};
}
pullEvents(params: { sessionId: string; limit?: number }) {
const managed = this.getSession(params.sessionId);
// Without an explicit limit, drain everything queued; limits below 1 are clamped to 1.
const requested = params.limit ?? (managed.events.length || 100);
const count = Math.max(1, Math.floor(requested));
const events = managed.events.splice(0, count);
return {
sessionId: managed.id,
provider: managed.provider,
connected: managed.session.isConnected(),
closed: managed.closed,
events,
};
}
finishSession(params: { sessionId: string; reason?: string }) {
const managed = this.getSession(params.sessionId);
if (!managed.closed) {
managed.closed = true;
managed.session.close();
managed.events.push({
type: "session.ended",
reason: params.reason?.trim() || "client_finish",
timestamp: this.deps.now(),
});
}
const events = managed.events.splice(0, managed.events.length);
this.sessions.delete(params.sessionId);
return {
sessionId: managed.id,
provider: managed.provider,
closed: true,
events,
};
}
private resolveProvider(
providerId: string | undefined,
cfg: OpenClawConfig,
params: SessionStartParams,
): RealtimeTranscriptionProviderPlugin {
if (providerId?.trim()) {
const provider = this.deps.getProvider(providerId, cfg);
if (!provider) {
throw new Error(`Unknown realtime transcription provider: ${providerId}`);
}
const providerConfig = buildProviderConfig({
provider,
cfg,
providerConfig: params.providerConfig,
format: params.format,
});
if (!provider.isConfigured({ cfg, providerConfig })) {
throw new Error(`Realtime transcription provider "${provider.id}" is not configured.`);
}
return provider;
}
const provider = sortProviders(this.deps.listProviders(cfg)).find((candidate) => {
const providerConfig = buildProviderConfig({
provider: candidate,
cfg,
providerConfig: params.providerConfig,
format: params.format,
});
return candidate.isConfigured({ cfg, providerConfig });
});
if (!provider) {
throw new Error("No configured realtime transcription provider is available.");
}
return provider;
}
private getSession(sessionId: string): ManagedSession {
const managed = this.sessions.get(sessionId);
if (!managed) {
throw new Error(`Unknown realtime transcription session: ${sessionId}`);
}
return managed;
}
private getOpenSession(sessionId: string): ManagedSession {
const managed = this.getSession(sessionId);
if (managed.closed) {
throw new Error(`Realtime transcription session is already closed: ${sessionId}`);
}
return managed;
}
}
const sharedManager = new RealtimeTranscriptionSessionManager();
export function getRealtimeTranscriptionSessionManager() {
return sharedManager;
}
export const __testing = {
normalizeAudioFormat,
};
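The pull-based event delivery in the manager above can be read as a drain queue: callbacks append events, and each pull removes what it returns. The sketch below isolates that behaviour under simplifying assumptions. `EventSketch` and `DrainQueueSketch` are hypothetical names, and the real manager also tracks provider, connection, and closed state.

```typescript
// Hypothetical, reduced event shape for illustration only.
type EventSketch = { type: string; text?: string };

class DrainQueueSketch {
  private readonly events: EventSketch[] = [];

  push(event: EventSketch): void {
    this.events.push(event);
  }

  // Remove and return up to `limit` events from the front of the queue.
  pull(limit?: number): EventSketch[] {
    // No limit: take everything queued (the 100 fallback only applies when empty).
    const requested = limit ?? (this.events.length || 100);
    // Fractions are floored and anything below 1 is clamped to 1.
    const count = Math.max(1, Math.floor(requested));
    return this.events.splice(0, count);
  }
}

const queue = new DrainQueueSketch();
queue.push({ type: "partial", text: "hel" });
queue.push({ type: "partial", text: "hello" });
queue.push({ type: "final", text: "hello" });

console.log(queue.pull(2).length); // 2
console.log(queue.pull().length); // 1
console.log(queue.pull().length); // 0
```

Because a pull is destructive, a client that loses a response cannot re-read those events; that trade-off seems acceptable for a polling transport but is worth keeping in mind.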

View File

@@ -21,6 +21,7 @@ const EVENT_SCOPE_GUARDS: Record<string, string[]> = {
"sessions.changed": [READ_SCOPE],
"session.message": [READ_SCOPE],
"session.tool": [READ_SCOPE],
"chat.voice.event": [READ_SCOPE],
};
export type GatewayBroadcastStateVersion = {

View File

@@ -5,6 +5,7 @@ import { loadConfig } from "../config/config.js";
import { type AgentEventPayload, getAgentRunContext } from "../infra/agent-events.js";
import { resolveHeartbeatVisibility } from "../infra/heartbeat-visibility.js";
import { stripInlineDirectiveTagsForDisplay } from "../utils/directive-tags.js";
import { getChatVoiceSessionByRunId, setChatVoiceRunId } from "./chat-voice-sessions.js";
import { loadGatewaySessionRow } from "./server-chat.load-gateway-session-row.runtime.js";
import { persistGatewaySessionLifecycleEvent } from "./server-chat.persist-session-lifecycle.runtime.js";
import { deriveGatewaySessionLifecycleSnapshot } from "./session-lifecycle-state.js";
@@ -948,6 +949,72 @@ export function createAgentEventHandler({
}
if (!isAborted && evt.stream === "assistant" && typeof evt.data?.text === "string") {
emitChatDelta(sessionKey, clientRunId, evt.runId, evt.seq, evt.data.text, evt.data.delta);
} else if (!isAborted && (lifecyclePhase === "end" || lifecyclePhase === "error")) {
const evtStopReason =
typeof evt.data?.stopReason === "string" ? evt.data.stopReason : undefined;
if (chatLink) {
const finished = chatRunState.registry.shift(evt.runId);
if (!finished) {
clearAgentRunContext(evt.runId);
return;
}
emitChatFinal(
finished.sessionKey,
finished.clientRunId,
evt.runId,
evt.seq,
lifecyclePhase === "error" ? "error" : "done",
evt.data?.error,
evtStopReason,
);
} else {
emitChatFinal(
sessionKey,
eventRunId,
evt.runId,
evt.seq,
lifecyclePhase === "error" ? "error" : "done",
evt.data?.error,
evtStopReason,
);
}
const voiceSession = getChatVoiceSessionByRunId(clientRunId);
if (voiceSession) {
setChatVoiceRunId(voiceSession.sessionKey, null);
broadcastToConnIds(
"chat.voice.event",
{
sessionKey: voiceSession.sessionKey,
state: "assistant_completed",
runId: clientRunId,
playbackEnabled: voiceSession.playbackEnabled,
},
new Set([voiceSession.connId]),
);
}
} else if (isAborted && (lifecyclePhase === "end" || lifecyclePhase === "error")) {
chatRunState.abortedRuns.delete(clientRunId);
chatRunState.abortedRuns.delete(evt.runId);
chatRunState.buffers.delete(clientRunId);
chatRunState.deltaSentAt.delete(clientRunId);
if (chatLink) {
chatRunState.registry.remove(evt.runId, clientRunId, sessionKey);
}
const voiceSession = getChatVoiceSessionByRunId(clientRunId);
if (voiceSession) {
setChatVoiceRunId(voiceSession.sessionKey, null);
broadcastToConnIds(
"chat.voice.event",
{
sessionKey: voiceSession.sessionKey,
state: "interrupted",
runId: clientRunId,
playbackEnabled: voiceSession.playbackEnabled,
},
new Set([voiceSession.connId]),
{ dropIfSlow: true },
);
}
}
}

View File

@@ -17,6 +17,10 @@ const BASE_METHODS = [
"tts.disable",
"tts.convert",
"tts.setProvider",
"realtimeTranscription.start",
"realtimeTranscription.pushAudio",
"realtimeTranscription.pull",
"realtimeTranscription.finish",
"config.get",
"config.set",
"config.apply",
@@ -118,6 +122,11 @@ const BASE_METHODS = [
"chat.history",
"chat.abort",
"chat.send",
"chat.voice.start",
"chat.voice.audio",
"chat.voice.commit",
"chat.voice.interrupt",
"chat.voice.stop",
];
export function listGatewayMethods(): string[] {
@@ -129,6 +138,7 @@ export const GATEWAY_EVENTS = [
"connect.challenge",
"agent",
"chat",
"chat.voice.event",
"session.message",
"session.tool",
"sessions.changed",

View File

@@ -20,6 +20,7 @@ import { modelsHandlers } from "./server-methods/models.js";
import { nodePendingHandlers } from "./server-methods/nodes-pending.js";
import { nodeHandlers } from "./server-methods/nodes.js";
import { pushHandlers } from "./server-methods/push.js";
import { realtimeTranscriptionHandlers } from "./server-methods/realtime-transcription.js";
import { sendHandlers } from "./server-methods/send.js";
import { sessionsHandlers } from "./server-methods/sessions.js";
import { skillsHandlers } from "./server-methods/skills.js";
@@ -84,6 +85,7 @@ export const coreGatewayHandlers: GatewayRequestHandlers = {
...toolsCatalogHandlers,
...toolsEffectiveHandlers,
...ttsHandlers,
...realtimeTranscriptionHandlers,
...skillsHandlers,
...sessionsHandlers,
...systemHandlers,

View File

@@ -1,3 +1,4 @@
import { randomUUID } from "node:crypto";
import fs from "node:fs";
import path from "node:path";
import { CURRENT_SESSION_VERSION, SessionManager } from "@mariozechner/pi-coding-agent";
@@ -19,6 +20,8 @@ import { jsonUtf8Bytes } from "../../infra/json-utf8-bytes.js";
import type { PromptImageOrderEntry } from "../../media/prompt-image-order.js";
import { type SavedMedia, saveMediaBuffer } from "../../media/store.js";
import { createChannelReplyPipeline } from "../../plugin-sdk/channel-reply-pipeline.js";
import { getRealtimeTranscriptionProvider } from "../../plugin-sdk/realtime-transcription.js";
import type { RealtimeTranscriptionSession } from "../../realtime-transcription/provider-types.js";
import { normalizeInputProvenance, type InputProvenance } from "../../sessions/input-provenance.js";
import { resolveSendPolicy } from "../../sessions/send-policy.js";
import { parseAgentSessionKey } from "../../sessions/session-key-utils.js";
@@ -48,6 +51,13 @@ import {
parseMessageWithAttachments,
} from "../chat-attachments.js";
import { stripEnvelopeFromMessage, stripEnvelopeFromMessages } from "../chat-sanitize.js";
import {
deleteChatVoiceSession,
getChatVoiceSession,
setChatVoiceRunId,
setChatVoiceSession,
type ChatVoiceEventPayload,
} from "../chat-voice-sessions.js";
import { augmentChatHistoryWithCliSessionImports } from "../cli-session-history.js";
import { ADMIN_SCOPE } from "../method-scopes.js";
import {
@@ -57,6 +67,11 @@ import {
hasGatewayClientCap,
} from "../protocol/client-info.js";
import {
validateChatVoiceAudioParams,
validateChatVoiceCommitParams,
validateChatVoiceInterruptParams,
validateChatVoiceStartParams,
validateChatVoiceStopParams,
ErrorCodes,
errorShape,
formatValidationErrors,
@@ -1011,6 +1026,88 @@ function normalizeOptionalText(value?: string | null): string | undefined {
return trimmed || undefined;
}
function getActiveChatVoiceCallbackSession(params: {
sessionKey: string;
connId: string;
sttSession: RealtimeTranscriptionSession;
}) {
const active = getChatVoiceSession(params.sessionKey);
if (!active || active.connId !== params.connId || active.sttSession !== params.sttSession) {
return undefined;
}
return active;
}
function isStrictBase64(value: string): boolean {
const normalized = value.replace(/\s+/g, "");
if (!normalized || normalized.length % 4 !== 0) {
return false;
}
if (!/^[A-Za-z0-9+/]+={0,2}$/.test(normalized)) {
return false;
}
// Buffer.from does not throw on malformed base64, so require a lossless round trip.
const decoded = Buffer.from(normalized, "base64");
return decoded.length > 0 && decoded.toString("base64") === normalized;
}
function parseStrictBase64AudioBuffer(value: unknown): Buffer {
const audio = typeof value === "string" ? value.trim() : "";
if (!audio) {
throw new Error("audio is required.");
}
if (!isStrictBase64(audio)) {
throw new Error("audio must be base64 encoded.");
}
return Buffer.from(audio, "base64");
}
function resolveControlUiVoiceConfig(cfg: ReturnType<typeof loadSessionEntry>["cfg"]) {
return cfg.gateway?.controlUi?.voice;
}
function emitChatVoiceEvent(
context: GatewayRequestContext,
connId: string,
payload: ChatVoiceEventPayload,
) {
context.broadcastToConnIds("chat.voice.event", payload, new Set([connId]));
}
async function closeChatVoiceSession(params: {
context: GatewayRequestContext;
sessionKey: string;
connId: string;
emitClosed?: boolean;
errorMessage?: string;
}) {
const entry = deleteChatVoiceSession(params.sessionKey);
if (!entry) {
return;
}
try {
entry.sttSession.close();
} catch (err) {
params.context.logGateway.debug(
`chat.voice session close cleanup failed: ${formatForLog(err)}`,
);
}
if (params.errorMessage) {
emitChatVoiceEvent(params.context, params.connId, {
sessionKey: params.sessionKey,
state: "error",
errorMessage: params.errorMessage,
playbackEnabled: entry.playbackEnabled,
});
}
if (params.emitClosed !== false) {
emitChatVoiceEvent(params.context, params.connId, {
sessionKey: params.sessionKey,
state: "closed",
playbackEnabled: entry.playbackEnabled,
});
}
}
function normalizeExplicitChatSendOrigin(
params: ChatSendExplicitOrigin,
): { ok: true; value?: ChatSendExplicitOrigin } | { ok: false; error: string } {
@@ -1954,6 +2051,425 @@ export const chatHandlers: GatewayRequestHandlers = {
});
}
},
"chat.voice.start": async ({ params, respond, context, client }) => {
if (!validateChatVoiceStartParams(params)) {
respond(
false,
undefined,
errorShape(
ErrorCodes.INVALID_REQUEST,
`invalid chat.voice.start params: ${formatValidationErrors(validateChatVoiceStartParams.errors)}`,
),
);
return;
}
const connId = normalizeOptionalText(client?.connId);
if (!connId) {
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, "voice requires connId"));
return;
}
const { sessionKey: rawSessionKey } = params as { sessionKey: string };
const { cfg, canonicalKey: sessionKey } = loadSessionEntry(rawSessionKey);
const voiceConfig = resolveControlUiVoiceConfig(cfg);
if (voiceConfig?.enabled !== true) {
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, "web voice is disabled"));
return;
}
const providerId = normalizeOptionalText(voiceConfig.transcriptionProvider);
if (!providerId) {
respond(
false,
undefined,
errorShape(ErrorCodes.INVALID_REQUEST, "voice transcription provider is not configured"),
);
return;
}
const provider = getRealtimeTranscriptionProvider(providerId, cfg);
if (!provider) {
respond(
false,
undefined,
errorShape(
ErrorCodes.INVALID_REQUEST,
`voice transcription provider not found: ${providerId}`,
),
);
return;
}
// Both branches of the earlier ternary resolved to the same lookup, so index by provider id directly.
const modelProviderConfig = cfg.models?.providers?.[provider.id];
const providerConfig = {
providers: {
[provider.id]: {
...modelProviderConfig,
...voiceConfig.providers?.[provider.id],
inputAudioFormat: "pcm16",
},
},
};
if (!provider.isConfigured({ cfg, providerConfig })) {
respond(
false,
undefined,
errorShape(
ErrorCodes.INVALID_REQUEST,
`voice transcription provider is not configured: ${provider.id}`,
),
);
return;
}
const existing = getChatVoiceSession(sessionKey);
if (existing?.connId === connId) {
await closeChatVoiceSession({
context,
sessionKey,
connId,
emitClosed: false,
});
}
const playbackEnabled = voiceConfig.playbackEnabled !== false;
try {
let sttSession: RealtimeTranscriptionSession;
sttSession = provider.createSession({
providerConfig,
onSpeechStart: () => {
const active = getActiveChatVoiceCallbackSession({ sessionKey, connId, sttSession });
if (!active) {
return;
}
active.transcriptPartial = "";
emitChatVoiceEvent(context, connId, {
sessionKey,
state: "speech_start",
playbackEnabled: active.playbackEnabled,
});
},
onPartial: (partial) => {
const active = getActiveChatVoiceCallbackSession({ sessionKey, connId, sttSession });
if (!active) {
return;
}
active.transcriptPartial = partial;
emitChatVoiceEvent(context, connId, {
sessionKey,
state: "partial_transcript",
transcript: partial,
playbackEnabled: active.playbackEnabled,
});
},
onTranscript: (transcript) => {
const active = getActiveChatVoiceCallbackSession({ sessionKey, connId, sttSession });
if (!active) {
return;
}
active.transcriptFinal = transcript;
active.transcriptPartial = "";
emitChatVoiceEvent(context, connId, {
sessionKey,
state: "final_transcript",
transcript,
playbackEnabled: active.playbackEnabled,
});
},
onError: (error) => {
const active = getActiveChatVoiceCallbackSession({ sessionKey, connId, sttSession });
if (!active) {
return;
}
void closeChatVoiceSession({
context,
sessionKey,
connId,
errorMessage: error.message || String(error),
});
},
});
await sttSession.connect();
setChatVoiceSession({
sessionKey,
connId,
providerId: provider.id,
playbackEnabled,
sttSession,
transcriptPartial: "",
transcriptFinal: "",
activeRunId: null,
});
respond(true, {
ok: true,
providerId: provider.id,
playbackEnabled,
});
emitChatVoiceEvent(context, connId, {
sessionKey,
state: "ready",
playbackEnabled,
});
} catch (err) {
respond(false, undefined, errorShape(ErrorCodes.UNAVAILABLE, String(err)));
context.logGateway.warn(`chat.voice.start failed: ${formatForLog(err)}`);
}
},
"chat.voice.audio": ({ params, respond, client }) => {
if (!validateChatVoiceAudioParams(params)) {
respond(
false,
undefined,
errorShape(
ErrorCodes.INVALID_REQUEST,
`invalid chat.voice.audio params: ${formatValidationErrors(validateChatVoiceAudioParams.errors)}`,
),
);
return;
}
const connId = normalizeOptionalText(client?.connId);
if (!connId) {
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, "voice requires connId"));
return;
}
const {
sessionKey: rawSessionKey,
audio,
format,
} = params as {
sessionKey: string;
audio: string;
format?: string;
};
const { canonicalKey: sessionKey } = loadSessionEntry(rawSessionKey);
const entry = getChatVoiceSession(sessionKey);
if (!entry || entry.connId !== connId) {
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, "voice session not found"));
return;
}
if (format && format.toLowerCase() !== "pcm16") {
respond(
false,
undefined,
errorShape(ErrorCodes.INVALID_REQUEST, `unsupported voice audio format: ${format}`),
);
return;
}
let audioBuffer: Buffer;
try {
audioBuffer = parseStrictBase64AudioBuffer(audio);
} catch (err) {
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, String(err)));
return;
}
try {
entry.sttSession.sendAudio(audioBuffer);
respond(true, { ok: true });
} catch (err) {
respond(false, undefined, errorShape(ErrorCodes.UNAVAILABLE, String(err)));
}
},
"chat.voice.commit": async ({ params, req, respond, context, client }) => {
if (!validateChatVoiceCommitParams(params)) {
respond(
false,
undefined,
errorShape(
ErrorCodes.INVALID_REQUEST,
`invalid chat.voice.commit params: ${formatValidationErrors(validateChatVoiceCommitParams.errors)}`,
),
);
return;
}
const connId = normalizeOptionalText(client?.connId);
if (!connId) {
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, "voice requires connId"));
return;
}
const { sessionKey: rawSessionKey, transcript: transcriptOverride } = params as {
sessionKey: string;
transcript?: string;
};
const { canonicalKey: sessionKey } = loadSessionEntry(rawSessionKey);
const entry = getChatVoiceSession(sessionKey);
if (!entry || entry.connId !== connId) {
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, "voice session not found"));
return;
}
if (entry.activeRunId) {
respond(true, { ok: false, status: "in_flight", runId: entry.activeRunId });
return;
}
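// transcriptFinal starts as "", which `??` never skips, so fall back on blank strings as well.
const transcript =
normalizeOptionalText(transcriptOverride) ??
normalizeOptionalText(entry.transcriptFinal) ??
normalizeOptionalText(entry.transcriptPartial) ??
"";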
if (!transcript) {
respond(
false,
undefined,
errorShape(ErrorCodes.INVALID_REQUEST, "voice transcript is empty"),
);
return;
}
const runId = randomUUID();
const voiceSendResult = await new Promise<{
ok: boolean;
payload?: unknown;
error?: ReturnType<typeof errorShape>;
}>((resolve) => {
void chatHandlers["chat.send"]({
req,
params: {
sessionKey,
message: transcript,
deliver: false,
idempotencyKey: runId,
},
client,
isWebchatConnect: () => false,
context,
respond: (ok, payload, error) => resolve({ ok, payload, error }),
});
});
if (!voiceSendResult.ok) {
respond(false, voiceSendResult.payload, voiceSendResult.error);
return;
}
entry.transcriptFinal = "";
entry.transcriptPartial = "";
setChatVoiceRunId(sessionKey, runId);
emitChatVoiceEvent(context, connId, {
sessionKey,
state: "assistant_started",
runId,
playbackEnabled: entry.playbackEnabled,
});
respond(true, {
ok: true,
runId,
transcript,
playbackEnabled: entry.playbackEnabled,
result: voiceSendResult.payload,
});
},
"chat.voice.interrupt": ({ params, req, respond, context, client }) => {
if (!validateChatVoiceInterruptParams(params)) {
respond(
false,
undefined,
errorShape(
ErrorCodes.INVALID_REQUEST,
`invalid chat.voice.interrupt params: ${formatValidationErrors(validateChatVoiceInterruptParams.errors)}`,
),
);
return;
}
const connId = normalizeOptionalText(client?.connId);
if (!connId) {
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, "voice requires connId"));
return;
}
const { sessionKey: rawSessionKey } = params as { sessionKey: string };
const { canonicalKey: sessionKey } = loadSessionEntry(rawSessionKey);
const entry = getChatVoiceSession(sessionKey);
if (!entry || entry.connId !== connId) {
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, "voice session not found"));
return;
}
emitChatVoiceEvent(context, connId, {
sessionKey,
state: "playback_clear",
playbackEnabled: entry.playbackEnabled,
});
const runId = entry.activeRunId;
if (!runId) {
emitChatVoiceEvent(context, connId, {
sessionKey,
state: "interrupted",
playbackEnabled: entry.playbackEnabled,
});
respond(true, { ok: true, aborted: false });
return;
}
void chatHandlers["chat.abort"]({
req,
params: {
sessionKey,
runId,
},
client,
isWebchatConnect: () => false,
context,
respond: () => undefined,
});
setChatVoiceRunId(sessionKey, null);
emitChatVoiceEvent(context, connId, {
sessionKey,
state: "interrupted",
runId,
playbackEnabled: entry.playbackEnabled,
});
respond(true, { ok: true, aborted: true, runId });
},
"chat.voice.stop": async ({ params, req, respond, context, client }) => {
if (!validateChatVoiceStopParams(params)) {
respond(
false,
undefined,
errorShape(
ErrorCodes.INVALID_REQUEST,
`invalid chat.voice.stop params: ${formatValidationErrors(validateChatVoiceStopParams.errors)}`,
),
);
return;
}
const connId = normalizeOptionalText(client?.connId);
if (!connId) {
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, "voice requires connId"));
return;
}
const { sessionKey: rawSessionKey } = params as { sessionKey: string };
const { canonicalKey: sessionKey } = loadSessionEntry(rawSessionKey);
const entry = getChatVoiceSession(sessionKey);
if (!entry || entry.connId !== connId) {
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, "voice session not found"));
return;
}
emitChatVoiceEvent(context, connId, {
sessionKey,
state: "playback_clear",
playbackEnabled: entry.playbackEnabled,
});
if (entry.activeRunId) {
void chatHandlers["chat.abort"]({
req,
params: {
sessionKey,
runId: entry.activeRunId,
},
client,
isWebchatConnect: () => false,
context,
respond: () => undefined,
});
}
setChatVoiceRunId(sessionKey, null);
await closeChatVoiceSession({
context,
sessionKey,
connId,
});
respond(true, { ok: true });
},
"chat.inject": async ({ params, respond, context }) => {
if (!validateChatInjectParams(params)) {
respond(

View File

@@ -0,0 +1,202 @@
import { afterEach, describe, expect, it, vi } from "vitest";
import {
deleteChatVoiceSession,
getChatVoiceSession,
setChatVoiceSession,
} from "../chat-voice-sessions.js";
import { ErrorCodes } from "../protocol/index.js";
const mockState = vi.hoisted(() => ({
cfg: {
gateway: {
controlUi: {
voice: {
enabled: true,
transcriptionProvider: "mock-stt",
playbackEnabled: true,
},
},
},
models: {
providers: {
"mock-stt": {},
},
},
} as Record<string, unknown>,
provider: null as {
id: string;
isConfigured: ReturnType<typeof vi.fn>;
createSession: ReturnType<typeof vi.fn>;
} | null,
}));
vi.mock("../session-utils.js", async () => {
const original =
await vi.importActual<typeof import("../session-utils.js")>("../session-utils.js");
return {
...original,
loadSessionEntry: (rawKey: string) => ({
cfg: mockState.cfg,
storePath: "/tmp/sessions.json",
entry: {
sessionId: "sess-voice-1",
sessionFile: "/tmp/sess-voice-1.jsonl",
},
canonicalKey: rawKey || "main",
}),
};
});
vi.mock("../../plugin-sdk/realtime-transcription.js", () => ({
getRealtimeTranscriptionProvider: vi.fn(() => mockState.provider),
}));
const { chatHandlers } = await import("./chat.js");
function createContext() {
return {
broadcastToConnIds: vi.fn(),
logGateway: {
warn: vi.fn(),
debug: vi.fn(),
},
};
}
function createClient(connId = "conn-1") {
return { connId } as const;
}
afterEach(() => {
vi.restoreAllMocks();
deleteChatVoiceSession("main");
mockState.provider = null;
});
describe("chat voice handlers", () => {
it("ignores stale onError callbacks from replaced voice sessions", async () => {
const callbacks: Array<{
onError?: (error: Error) => void;
}> = [];
const sessions = [
{
connect: vi.fn(async () => undefined),
sendAudio: vi.fn(),
close: vi.fn(),
isConnected: vi.fn(() => true),
},
{
connect: vi.fn(async () => undefined),
sendAudio: vi.fn(),
close: vi.fn(),
isConnected: vi.fn(() => true),
},
];
mockState.provider = {
id: "mock-stt",
isConfigured: vi.fn(() => true),
createSession: vi.fn((params) => {
callbacks.push(params);
return sessions[callbacks.length - 1];
}),
};
const context = createContext();
const respond = vi.fn();
await chatHandlers["chat.voice.start"]({
params: { sessionKey: "main" },
respond,
context: context as never,
client: createClient(),
} as never);
await chatHandlers["chat.voice.start"]({
params: { sessionKey: "main" },
respond,
context: context as never,
client: createClient(),
} as never);
expect(getChatVoiceSession("main")?.sttSession).toBe(sessions[1]);
callbacks[0].onError?.(new Error("late"));
expect(getChatVoiceSession("main")?.sttSession).toBe(sessions[1]);
});
it("rejects malformed base64 audio before forwarding to the session", async () => {
const sendAudio = vi.fn();
setChatVoiceSession({
sessionKey: "main",
connId: "conn-1",
providerId: "mock-stt",
playbackEnabled: true,
sttSession: {
connect: vi.fn(async () => undefined),
sendAudio,
close: vi.fn(),
isConnected: vi.fn(() => true),
},
transcriptPartial: "",
transcriptFinal: "",
activeRunId: null,
});
const respond = vi.fn();
await chatHandlers["chat.voice.audio"]({
params: { sessionKey: "main", audio: "not@base64", format: "pcm16" },
respond,
client: createClient(),
} as never);
expect(sendAudio).not.toHaveBeenCalled();
expect(respond).toHaveBeenCalledWith(
false,
undefined,
expect.objectContaining({
code: ErrorCodes.INVALID_REQUEST,
message: expect.stringContaining("base64"),
}),
);
});
it("preserves buffered transcript when commit send fails", async () => {
const sttSession = {
connect: vi.fn(async () => undefined),
sendAudio: vi.fn(),
close: vi.fn(),
isConnected: vi.fn(() => true),
};
setChatVoiceSession({
sessionKey: "main",
connId: "conn-1",
providerId: "mock-stt",
playbackEnabled: true,
sttSession,
transcriptPartial: "draft tail",
transcriptFinal: "hello from voice",
activeRunId: null,
});
vi.spyOn(chatHandlers, "chat.send").mockImplementation(async ({ respond }) => {
respond(false, undefined, { code: ErrorCodes.UNAVAILABLE, message: "send failed" } as never);
});
const respond = vi.fn();
await chatHandlers["chat.voice.commit"]({
params: { sessionKey: "main" },
req: {} as never,
respond,
context: createContext() as never,
client: createClient(),
} as never);
expect(getChatVoiceSession("main")).toMatchObject({
transcriptFinal: "hello from voice",
transcriptPartial: "draft tail",
});
expect(respond).toHaveBeenCalledWith(
false,
undefined,
expect.objectContaining({ code: ErrorCodes.UNAVAILABLE }),
);
});
});
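The malformed-audio test above relies on a strict base64 check, since Node's `Buffer.from(value, "base64")` does not throw on invalid input. A standalone sketch of that check, mirroring the helper added in this change, is shown here; the sample inputs are illustrative only.

```typescript
// Accept only canonical, padded base64 that survives a decode/encode round trip.
function isStrictBase64(value: string): boolean {
  // Whitespace is tolerated and stripped before validation.
  const normalized = value.replace(/\s+/g, "");
  if (!normalized || normalized.length % 4 !== 0) {
    return false;
  }
  if (!/^[A-Za-z0-9+/]+={0,2}$/.test(normalized)) {
    return false;
  }
  // Re-encoding catches non-canonical trailing bits that the regex lets through.
  const decoded = Buffer.from(normalized, "base64");
  return decoded.length > 0 && decoded.toString("base64") === normalized;
}

console.log(isStrictBase64(Buffer.from("test").toString("base64"))); // true
console.log(isStrictBase64("not@base64")); // false: length is not a multiple of 4
console.log(isStrictBase64("%%%not-base64%%%")); // false: characters outside the alphabet
console.log(isStrictBase64("QR==")); // false: re-encodes as "QQ=="
```

The round-trip comparison is the stricter part of the check: it rejects strings that look well-formed but carry stray bits in the final character before padding.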

View File

@@ -0,0 +1,140 @@
import { beforeEach, describe, expect, it, vi } from "vitest";
const mocks = vi.hoisted(() => ({
manager: {
startSession: vi.fn(),
pushAudio: vi.fn(),
pullEvents: vi.fn(),
finishSession: vi.fn(),
},
}));
vi.mock("../realtime-transcription-session-manager.js", () => ({
getRealtimeTranscriptionSessionManager: () => mocks.manager,
__testing: {
normalizeAudioFormat: (value: string | undefined) =>
value === "s16le" || value === "pcm16" || value === "g711_ulaw" ? value : null,
},
}));
import { realtimeTranscriptionHandlers } from "./realtime-transcription.js";
describe("realtimeTranscriptionHandlers", () => {
beforeEach(() => {
mocks.manager.startSession.mockReset();
mocks.manager.pushAudio.mockReset();
mocks.manager.pullEvents.mockReset();
mocks.manager.finishSession.mockReset();
});
it("starts a session with validated audio metadata", async () => {
mocks.manager.startSession.mockResolvedValue({ sessionId: "s1", provider: "openai" });
const respond = vi.fn();
await realtimeTranscriptionHandlers["realtimeTranscription.start"]({
req: { method: "realtimeTranscription.start", id: "1" } as never,
params: { format: "s16le", sampleRate: 16000, channels: 1 },
client: null,
isWebchatConnect: () => false,
respond,
context: {} as never,
});
expect(mocks.manager.startSession).toHaveBeenCalledWith({
provider: undefined,
providerConfig: undefined,
format: "s16le",
sampleRate: 16000,
channels: 1,
});
expect(respond).toHaveBeenCalledWith(true, { sessionId: "s1", provider: "openai" });
});
it("rejects invalid start formats", async () => {
const respond = vi.fn();
await realtimeTranscriptionHandlers["realtimeTranscription.start"]({
req: { method: "realtimeTranscription.start", id: "1" } as never,
params: { format: "wav", sampleRate: 16000, channels: 1 },
client: null,
isWebchatConnect: () => false,
respond,
context: {} as never,
});
expect(mocks.manager.startSession).not.toHaveBeenCalled();
expect(respond.mock.calls[0]?.[0]).toBe(false);
});
it("pushes audio chunks to an existing session", async () => {
mocks.manager.pushAudio.mockReturnValue({ sessionId: "s1", acceptedBytes: 4, connected: true });
const respond = vi.fn();
await realtimeTranscriptionHandlers["realtimeTranscription.pushAudio"]({
req: { method: "realtimeTranscription.pushAudio", id: "2" } as never,
params: { sessionId: "s1", audio: Buffer.from("test").toString("base64") },
client: null,
isWebchatConnect: () => false,
respond,
context: {} as never,
});
expect(mocks.manager.pushAudio).toHaveBeenCalledWith({
sessionId: "s1",
audio: expect.any(Buffer),
});
expect(respond).toHaveBeenCalledWith(
true,
expect.objectContaining({ sessionId: "s1", acceptedBytes: 4 }),
);
});
it("rejects malformed base64 audio payloads before forwarding to the manager", async () => {
const respond = vi.fn();
await realtimeTranscriptionHandlers["realtimeTranscription.pushAudio"]({
req: { method: "realtimeTranscription.pushAudio", id: "2b" } as never,
params: { sessionId: "s1", audio: "%%%not-base64%%%" },
client: null,
isWebchatConnect: () => false,
respond,
context: {} as never,
});
expect(mocks.manager.pushAudio).not.toHaveBeenCalled();
expect(respond.mock.calls[0]?.[0]).toBe(false);
expect(JSON.stringify(respond.mock.calls[0]?.[2] ?? {})).toContain("audio must be base64 encoded");
});
it("returns final events from finish and lets the manager clean up immediately", async () => {
mocks.manager.finishSession.mockReturnValue({
sessionId: "s1",
provider: "openai",
closed: true,
events: [{ type: "session.ended", reason: "client_finish", timestamp: 123 }],
});
const respond = vi.fn();
await realtimeTranscriptionHandlers["realtimeTranscription.finish"]({
req: { method: "realtimeTranscription.finish", id: "3" } as never,
params: { sessionId: "s1" },
client: null,
isWebchatConnect: () => false,
respond,
context: {} as never,
});
expect(mocks.manager.finishSession).toHaveBeenCalledWith({
sessionId: "s1",
reason: undefined,
});
expect(respond).toHaveBeenCalledWith(
true,
expect.objectContaining({
sessionId: "s1",
closed: true,
events: [{ type: "session.ended", reason: "client_finish", timestamp: 123 }],
}),
);
});
});

View File

@@ -0,0 +1,118 @@
import { ErrorCodes, errorShape } from "../protocol/index.js";
import {
getRealtimeTranscriptionSessionManager,
__testing as managerTesting,
} from "../realtime-transcription-session-manager.js";
import { formatForLog } from "../ws-log.js";
import type { GatewayRequestHandlers } from "./types.js";
function parsePositiveNumber(value: unknown, name: string): number {
const number =
typeof value === "number"
? value
: typeof value === "string" && value.trim()
? Number(value)
: Number.NaN;
if (!Number.isFinite(number) || number <= 0) {
throw new Error(`${name} must be a positive number.`);
}
return number;
}
function parseSessionId(value: unknown): string {
const sessionId = typeof value === "string" ? value.trim() : "";
if (!sessionId) {
throw new Error("sessionId is required.");
}
return sessionId;
}
function parseAudioBuffer(value: unknown): Buffer {
const audio = typeof value === "string" ? value.trim() : "";
if (!audio) {
throw new Error("audio is required.");
}
if (!isStrictBase64(audio)) {
throw new Error("audio must be base64 encoded.");
}
return Buffer.from(audio, "base64");
}
function isStrictBase64(value: string): boolean {
const normalized = value.replace(/\s+/g, "");
if (!normalized || normalized.length % 4 !== 0) {
return false;
}
if (!/^[A-Za-z0-9+/]+={0,2}$/.test(normalized)) {
return false;
}
const decoded = Buffer.from(normalized, "base64");
return decoded.length > 0 && decoded.toString("base64") === normalized;
}
export const realtimeTranscriptionHandlers: GatewayRequestHandlers = {
"realtimeTranscription.start": async ({ params, respond }) => {
try {
const format = managerTesting.normalizeAudioFormat(
typeof params.format === "string" ? params.format : undefined,
);
if (!format) {
respond(
false,
undefined,
errorShape(
ErrorCodes.INVALID_REQUEST,
"format is required and must be one of: s16le, pcm16, g711_ulaw",
),
);
return;
}
const result = await getRealtimeTranscriptionSessionManager().startSession({
provider: typeof params.provider === "string" ? params.provider.trim() : undefined,
providerConfig:
params.providerConfig && typeof params.providerConfig === "object"
? (params.providerConfig as Record<string, unknown>)
: undefined,
format,
sampleRate: parsePositiveNumber(params.sampleRate, "sampleRate"),
channels: parsePositiveNumber(params.channels, "channels"),
});
respond(true, result);
} catch (err) {
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, formatForLog(err)));
}
},
"realtimeTranscription.pushAudio": async ({ params, respond }) => {
try {
const result = getRealtimeTranscriptionSessionManager().pushAudio({
sessionId: parseSessionId(params.sessionId),
audio: parseAudioBuffer(params.audio),
});
respond(true, result);
} catch (err) {
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, formatForLog(err)));
}
},
"realtimeTranscription.pull": async ({ params, respond }) => {
try {
const result = getRealtimeTranscriptionSessionManager().pullEvents({
sessionId: parseSessionId(params.sessionId),
limit: params.limit === undefined ? undefined : parsePositiveNumber(params.limit, "limit"),
});
respond(true, result);
} catch (err) {
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, formatForLog(err)));
}
},
"realtimeTranscription.finish": async ({ params, respond }) => {
try {
const result = getRealtimeTranscriptionSessionManager().finishSession({
sessionId: parseSessionId(params.sessionId),
reason: typeof params.reason === "string" ? params.reason : undefined,
});
respond(true, result);
} catch (err) {
respond(false, undefined, errorShape(ErrorCodes.INVALID_REQUEST, formatForLog(err)));
}
},
};
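
The strict base64 check used by `parseAudioBuffer` above can be illustrated in isolation. The following is a minimal sketch, not the gateway's implementation: `isStrictBase64Sketch` is a hypothetical name, and it assumes the global `atob`/`btoa` functions in place of the `Buffer` round-trip used in the handler file. The idea is the same: reject anything whose length, alphabet, or padding is off, then confirm the payload re-encodes to exactly the same string.

```typescript
// Hypothetical helper; mirrors the shape of isStrictBase64 but uses atob/btoa
// (an assumption of this sketch) instead of Node's Buffer.
function isStrictBase64Sketch(value: string): boolean {
  // Whitespace is tolerated, matching the handler's normalization step.
  const normalized = value.replace(/\s+/g, "");
  if (!normalized || normalized.length % 4 !== 0) {
    return false;
  }
  // Only the standard alphabet, with at most two trailing "=" characters.
  if (!/^[A-Za-z0-9+/]+={0,2}$/.test(normalized)) {
    return false;
  }
  try {
    // Round-trip: a canonical encoding re-encodes to itself.
    return btoa(atob(normalized)) === normalized;
  } catch {
    // Some decoders throw on malformed input; treat that as "not base64".
    return false;
  }
}
```

The round-trip step is what rejects non-canonical inputs (for example, a final character carrying stray bits) that pass the regular expression but would decode to different bytes than the sender intended.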

View File

@@ -9,6 +9,7 @@ import {
getTtsProvider,
isTtsEnabled,
isTtsProviderConfigured,
resolveExplicitTtsOverrides,
resolveTtsAutoMode,
resolveTtsConfig,
resolveTtsPrefsPath,
@@ -89,7 +90,22 @@ export const ttsHandlers: GatewayRequestHandlers = {
try {
const cfg = loadConfig();
const channel = typeof params.channel === "string" ? params.channel.trim() : undefined;
-const result = await textToSpeech({ text, cfg, channel });
+const providerRaw = typeof params.provider === "string" ? params.provider.trim() : undefined;
+const modelId = typeof params.modelId === "string" ? params.modelId.trim() : undefined;
+const voiceId = typeof params.voiceId === "string" ? params.voiceId.trim() : undefined;
+const overrides = resolveExplicitTtsOverrides({
+cfg,
+provider: providerRaw,
+modelId,
+voiceId,
+});
+const result = await textToSpeech({
+text,
+cfg,
+channel,
+overrides,
+disableFallback: Boolean(overrides.provider || modelId || voiceId),
+});
if (result.success && result.audioPath) {
respond(true, {
audioPath: result.audioPath,

View File

@@ -8,6 +8,7 @@ import { truncateUtf16Safe } from "../../utils.js";
import { isWebchatClient } from "../../utils/message-channel.js";
import type { AuthRateLimiter } from "../auth-rate-limit.js";
import type { ResolvedGatewayAuth } from "../auth.js";
import { closeChatVoiceSessionsForConn } from "../chat-voice-sessions.js";
import { getPreauthHandshakeTimeoutMsFromEnv } from "../handshake-timeouts.js";
import { isLoopbackAddress } from "../net.js";
import type { GatewayRequestContext, GatewayRequestHandlers } from "../server-methods/types.js";
@@ -270,6 +271,9 @@ export function attachGatewayWsConnectionHandler(params: AttachGatewayWsConnecti
}
const context = buildRequestContext();
context.unsubscribeAllSessionEvents(connId);
closeChatVoiceSessionsForConn(connId, (targetConnId, payload) => {
context.broadcastToConnIds("chat.voice.event", payload, new Set([targetConnId]));
});
if (client?.connect?.role === "node") {
const nodeId = context.nodeRegistry.unregister(connId);
if (nodeId) {

View File

@@ -121,6 +121,43 @@ describe("runCapability auto audio entries", () => {
expect(seenModel).toBe("whisper-1");
});
it("lets per-request transcription hints override configured model-entry hints", async () => {
let seenLanguage: string | undefined;
let seenPrompt: string | undefined;
const result = await runAutoAudioCase({
transcribeAudio: async (req) => {
seenLanguage = req.language;
seenPrompt = req.prompt;
return { text: "ok", model: req.model ?? "unknown" };
},
cfgExtra: {
tools: {
media: {
audio: {
enabled: true,
prompt: "configured prompt",
language: "fr",
_requestPromptOverride: "Focus on names",
_requestLanguageOverride: "en",
models: [
{
provider: "openai",
model: "whisper-1",
prompt: "entry prompt",
language: "de",
},
],
},
},
},
} as Partial<OpenClawConfig>,
});
expect(result.outputs[0]?.text).toBe("ok");
expect(seenLanguage).toBe("en");
expect(seenPrompt).toBe("Focus on names");
});
it("uses mistral when only mistral key is configured", async () => {
const isolatedAgentDir = await fs.mkdtemp(path.join(os.tmpdir(), "openclaw-audio-agent-"));
let runResult: Awaited<ReturnType<typeof runCapability>> | undefined;

View File

@@ -0,0 +1,67 @@
import { afterEach, beforeAll, beforeEach, describe, expect, it, vi } from "vitest";
import type { OpenClawConfig } from "../config/config.js";
import { withAudioFixture } from "./runner.test-utils.js";
const runExecMock = vi.hoisted(() => vi.fn());
vi.mock("../process/exec.js", () => ({
runExec: (...args: unknown[]) => runExecMock(...args),
}));
let runCliEntry: typeof import("./runner.entries.js").runCliEntry;
describe("media-understanding CLI audio entry", () => {
beforeAll(async () => {
({ runCliEntry } = await import("./runner.entries.js"));
});
beforeEach(() => {
runExecMock.mockReset().mockResolvedValue({ stdout: "cli transcript" });
});
afterEach(() => {
vi.clearAllMocks();
});
it("applies per-request prompt and language overrides to CLI transcription templating", async () => {
await withAudioFixture("openclaw-cli-audio", async ({ ctx, cache }) => {
await runCliEntry({
capability: "audio",
entry: {
type: "cli",
command: "mock-transcriber",
args: ["--prompt", "{{Prompt}}", "--language", "{{Language}}", "--file", "{{MediaPath}}"],
prompt: "entry prompt",
language: "de",
},
cfg: {
tools: {
media: {
audio: {
prompt: "configured prompt",
language: "fr",
_requestPromptOverride: "Focus on names",
_requestLanguageOverride: "en",
},
},
},
} as OpenClawConfig,
ctx,
attachmentIndex: 0,
cache,
config: {
prompt: "configured prompt",
language: "fr",
_requestPromptOverride: "Focus on names",
_requestLanguageOverride: "en",
} as never,
});
});
expect(runExecMock).toHaveBeenCalledWith(
"mock-transcriber",
expect.arrayContaining(["--prompt", "Focus on names", "--language", "en"]),
expect.any(Object),
);
});
});

View File

@@ -372,6 +372,20 @@ function resolveEntryRunOptions(params: {
return { maxBytes, maxChars, timeoutMs, prompt };
}
function resolveAudioRequestOverrides(config: MediaUnderstandingConfig | undefined): {
prompt?: string;
language?: string;
} {
const overrides = (config ?? {}) as MediaUnderstandingConfig & {
_requestPromptOverride?: string;
_requestLanguageOverride?: string;
};
return {
prompt: overrides._requestPromptOverride,
language: overrides._requestLanguageOverride,
};
}
async function resolveProviderExecutionAuth(params: {
providerId: string;
cfg: OpenClawConfig;
@@ -530,6 +544,7 @@ export async function runProviderEntry(params: {
throw new Error(`Audio transcription provider "${providerId}" not available.`);
}
const transcribeAudio = provider.transcribeAudio;
const requestOverrides = resolveAudioRequestOverrides(params.config);
const media = await params.cache.getBuffer({
attachmentIndex: params.attachmentIndex,
maxBytes,
@@ -569,8 +584,12 @@ export async function runProviderEntry(params: {
headers,
request,
model,
-language: entry.language ?? params.config?.language ?? cfg.tools?.media?.audio?.language,
-prompt,
+language:
+requestOverrides.language ??
+entry.language ??
+params.config?.language ??
+cfg.tools?.media?.audio?.language,
+prompt: requestOverrides.prompt ?? prompt,
query: providerQuery,
timeoutMs,
fetchFn,
@@ -651,6 +670,7 @@ export async function runCliEntry(params: {
if (!command) {
throw new Error(`CLI entry missing command for ${capability}`);
}
const requestOverrides = resolveAudioRequestOverrides(params.config);
const { maxBytes, maxChars, timeoutMs, prompt } = resolveEntryRunOptions({
capability,
entry,
@@ -683,7 +703,8 @@ export async function runCliEntry(params: {
MediaDir: path.dirname(mediaPath),
OutputDir: outputDir,
OutputBase: outputBase,
-Prompt: prompt,
+Prompt: requestOverrides.prompt ?? prompt,
+...(requestOverrides.language ? { Language: requestOverrides.language } : {}),
MaxChars: maxChars,
};
const argv = [command, ...args].map((part, index) =>

View File

@@ -150,7 +150,28 @@ export async function transcribeAudioFile(params: {
agentDir?: string;
mime?: string;
activeModel?: ActiveMediaModel;
+language?: string;
+prompt?: string;
}): Promise<{ text: string | undefined }> {
-const result = await runMediaUnderstandingFile({ ...params, capability: "audio" });
+const cfg =
+params.language || params.prompt
+? {
+...params.cfg,
+tools: {
+...params.cfg.tools,
+media: {
+...params.cfg.tools?.media,
+audio: {
+...params.cfg.tools?.media?.audio,
+...(params.language ? { _requestLanguageOverride: params.language } : {}),
+...(params.prompt ? { _requestPromptOverride: params.prompt } : {}),
+...(params.language ? { language: params.language } : {}),
+...(params.prompt ? { prompt: params.prompt } : {}),
+},
+},
+},
+}
+: params.cfg;
+const result = await runMediaUnderstandingFile({ ...params, cfg, capability: "audio" });
return { text: result.text };
}
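
The override flow in this change (attach per-request hints to a copy of the audio config, then let the runner prefer them over entry-level and shared settings) can be sketched on its own. This is a simplified illustration under stated assumptions: `AudioConfigSketch`, `EntryHints`, `withRequestHints`, and `resolveLanguage` are hypothetical names, and the real runner consults one more config layer than this sketch does.

```typescript
// Hypothetical, simplified types; the real config types live elsewhere in the repo.
type AudioConfigSketch = {
  prompt?: string;
  language?: string;
  _requestPromptOverride?: string;
  _requestLanguageOverride?: string;
};
type EntryHints = { prompt?: string; language?: string };

// Returns a copy of the audio config with per-request hints attached.
// The input object is left untouched.
function withRequestHints(audio: AudioConfigSketch, hints: EntryHints): AudioConfigSketch {
  return {
    ...audio,
    ...(hints.language ? { _requestLanguageOverride: hints.language } : {}),
    ...(hints.prompt ? { _requestPromptOverride: hints.prompt } : {}),
  };
}

// Precedence in this sketch: request override, then model entry, then shared config.
function resolveLanguage(audio: AudioConfigSketch, entry: EntryHints): string | undefined {
  return audio._requestLanguageOverride ?? entry.language ?? audio.language;
}

const shared: AudioConfigSketch = { language: "fr" };
const perRequest = withRequestHints(shared, { language: "en" });
console.log(resolveLanguage(perRequest, { language: "de" })); // request hint wins
console.log(resolveLanguage(shared, { language: "de" })); // entry hint wins
```

Using `??` rather than `||` means only `undefined` or `null` falls through to the next layer, which matches the chained lookup shown in `runProviderEntry`.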

View File

@@ -1,6 +1,6 @@
import fs from "node:fs";
import path from "node:path";
-import { describe, expect, it } from "vitest";
+import { describe, expect, it, vi } from "vitest";
import { withTempHome } from "../../test/helpers/temp-home.js";
import type { OpenClawConfig } from "../config/config.js";
import { resolveStatusTtsSnapshot } from "./status-config.js";
@@ -61,4 +61,44 @@ describe("resolveStatusTtsSnapshot", () => {
});
});
});
it("derives the default prefs path from OPENCLAW_CONFIG_PATH when set", async () => {
await withTempHome(
async (home) => {
const stateDir = path.join(home, ".openclaw-dev");
const prefsPath = path.join(stateDir, "settings", "tts.json");
fs.mkdirSync(path.dirname(prefsPath), { recursive: true });
fs.writeFileSync(
prefsPath,
JSON.stringify({
tts: {
auto: "always",
provider: "openai",
},
}),
);
vi.stubEnv("OPENCLAW_CONFIG_PATH", path.join(stateDir, "openclaw.json"));
try {
expect(
resolveStatusTtsSnapshot({
cfg: {
messages: {
tts: {},
},
} as OpenClawConfig,
}),
).toEqual({
autoMode: "always",
provider: "openai",
maxLength: 1500,
summarize: true,
});
} finally {
vi.unstubAllEnvs();
}
},
{ env: { OPENCLAW_STATE_DIR: undefined } },
);
});
});

View File

@@ -2,7 +2,7 @@ import fs from "node:fs";
import path from "node:path";
import type { OpenClawConfig } from "../config/config.js";
import type { TtsAutoMode, TtsConfig, TtsProvider } from "../config/types.tts.js";
-import { CONFIG_DIR, resolveUserPath } from "../utils.js";
+import { resolveConfigDir, resolveUserPath } from "../utils.js";
import { normalizeTtsAutoMode } from "./tts-auto-mode.js";
const DEFAULT_TTS_MAX_LENGTH = 1500;
@@ -47,7 +47,7 @@ function resolveTtsPrefsPathValue(prefsPath: string | undefined): string {
if (envPath) {
return resolveUserPath(envPath);
}
-return path.join(CONFIG_DIR, "settings", "tts.json");
+return path.join(resolveConfigDir(process.env), "settings", "tts.json");
}
function readPrefs(prefsPath: string): TtsUserPrefs {

View File

@@ -10,6 +10,7 @@ export {
isTtsProviderConfigured,
listSpeechVoices,
maybeApplyTtsToPayload,
resolveExplicitTtsOverrides,
resolveTtsAutoMode,
resolveTtsConfig,
resolveTtsPrefsPath,

View File

@@ -50,6 +50,15 @@ describe("resolveConfigDir", () => {
expect(resolveConfigDir(env)).toBe(path.resolve("/tmp/openclaw-home", "state"));
});
it("falls back to the config file directory when only OPENCLAW_CONFIG_PATH is set", () => {
const env = {
HOME: "/tmp/openclaw-home",
OPENCLAW_CONFIG_PATH: "~/profiles/dev/openclaw.json",
} as NodeJS.ProcessEnv;
expect(resolveConfigDir(env)).toBe(path.resolve("/tmp/openclaw-home", "profiles", "dev"));
});
});
describe("resolveHomeDir", () => {

View File

@@ -141,6 +141,10 @@ export function resolveConfigDir(
if (override) {
return resolveUserPath(override, env, homedir);
}
const configPath = env.OPENCLAW_CONFIG_PATH?.trim();
if (configPath) {
return path.dirname(resolveUserPath(configPath, env, homedir));
}
const newDir = path.join(resolveRequiredHomeDir(env, homedir), ".openclaw");
try {
const hasNew = fs.existsSync(newDir);

View File

@@ -64,6 +64,16 @@ function hasEntryCredential(
});
}
export function isWebFetchProviderConfigured(params: {
provider: Pick<
PluginWebFetchProviderEntry,
"envVars" | "getConfiguredCredentialValue" | "getCredentialValue" | "requiresCredential"
>;
config?: OpenClawConfig;
}): boolean {
return hasEntryCredential(params.provider, params.config, resolveFetchConfig(params.config));
}
export function listWebFetchProviders(params?: {
config?: OpenClawConfig;
}): PluginWebFetchProviderEntry[] {

View File

@@ -289,4 +289,162 @@ describe("web search runtime", () => {
result: { query: "runtime", provider: "beta", runtimeSelectedProvider: "beta" },
});
});
it("falls back to another provider when auto-selected search execution fails", async () => {
resolveRuntimeWebSearchProvidersMock.mockReturnValue([
createProvider({
pluginId: "google",
id: "google",
credentialPath: "tools.web.search.google.apiKey",
autoDetectOrder: 1,
getCredentialValue: () => "configured",
createTool: () => ({
description: "google",
parameters: {},
execute: async () => {
throw new Error("google aborted");
},
}),
}),
createProvider({
pluginId: "duckduckgo",
id: "duckduckgo",
credentialPath: "",
autoDetectOrder: 100,
requiresCredential: false,
createTool: () => ({
description: "duckduckgo",
parameters: {},
execute: async (args) => ({ ...args, provider: "duckduckgo" }),
}),
}),
]);
await expect(
runWebSearch({
config: {},
args: { query: "fallback" },
}),
).resolves.toEqual({
provider: "duckduckgo",
result: { query: "fallback", provider: "duckduckgo" },
});
});
it("does not prebuild fallback provider tools before attempting the selected provider", async () => {
resolveRuntimeWebSearchProvidersMock.mockReturnValue([
createProvider({
pluginId: "google",
id: "google",
credentialPath: "tools.web.search.google.apiKey",
autoDetectOrder: 1,
getCredentialValue: () => "configured",
createTool: () => ({
description: "google",
parameters: {},
execute: async (args) => ({ ...args, provider: "google" }),
}),
}),
createProvider({
pluginId: "broken-fallback",
id: "broken-fallback",
credentialPath: "",
autoDetectOrder: 100,
requiresCredential: false,
createTool: () => {
throw new Error("fallback createTool exploded");
},
}),
]);
await expect(
runWebSearch({
config: {},
args: { query: "selected-first" },
}),
).resolves.toEqual({
provider: "google",
result: { query: "selected-first", provider: "google" },
});
});
it("does not fall back when the provider came from explicit config selection", async () => {
resolveRuntimeWebSearchProvidersMock.mockReturnValue([
createProvider({
pluginId: "google",
id: "google",
credentialPath: "tools.web.search.google.apiKey",
autoDetectOrder: 1,
getCredentialValue: () => "configured",
createTool: () => ({
description: "google",
parameters: {},
execute: async () => {
throw new Error("google aborted");
},
}),
}),
createProvider({
pluginId: "duckduckgo",
id: "duckduckgo",
credentialPath: "",
autoDetectOrder: 100,
requiresCredential: false,
createTool: () => ({
description: "duckduckgo",
parameters: {},
execute: async (args) => ({ ...args, provider: "duckduckgo" }),
}),
}),
]);
await expect(
runWebSearch({
config: {
tools: {
web: {
search: {
provider: "google",
},
},
},
},
args: { query: "configured" },
}),
).rejects.toThrow("google aborted");
});
it("does not fall back when the caller explicitly selects a provider", async () => {
resolveRuntimeWebSearchProvidersMock.mockReturnValue([
createProvider({
pluginId: "google",
id: "google",
credentialPath: "tools.web.search.google.apiKey",
autoDetectOrder: 1,
getCredentialValue: () => "configured",
createTool: () => ({
description: "google",
parameters: {},
execute: async () => {
throw new Error("google aborted");
},
}),
}),
createProvider({
pluginId: "duckduckgo",
id: "duckduckgo",
credentialPath: "",
autoDetectOrder: 100,
requiresCredential: false,
}),
]);
await expect(
runWebSearch({
config: {},
providerId: "google",
args: { query: "explicit" },
}),
).rejects.toThrow("google aborted");
});
});

View File

@@ -78,6 +78,21 @@ function hasEntryCredential(
});
}
export function isWebSearchProviderConfigured(params: {
provider: Pick<
PluginWebSearchProviderEntry,
| "credentialPath"
| "id"
| "envVars"
| "getConfiguredCredentialValue"
| "getCredentialValue"
| "requiresCredential"
>;
config?: OpenClawConfig;
}): boolean {
return hasEntryCredential(params.provider, params.config, resolveSearchConfig(params.config));
}
export function listWebSearchProviders(params?: {
config?: OpenClawConfig;
}): PluginWebSearchProviderEntry[] {
@@ -197,21 +212,117 @@ export function resolveWebSearchDefinition(
});
}
+function resolveWebSearchCandidates(
+options?: ResolveWebSearchDefinitionParams,
+): PluginWebSearchProviderEntry[] {
+const search = resolveSearchConfig(options?.config);
+const runtimeWebSearch = options?.runtimeWebSearch ?? getActiveRuntimeWebToolsMetadata()?.search;
+if (!resolveWebSearchEnabled({ search, sandboxed: options?.sandboxed })) {
+return [];
+}
+const providers = sortWebSearchProvidersForAutoDetect(
+options?.preferRuntimeProviders
+? resolveRuntimeWebSearchProviders({
+config: options?.config,
+bundledAllowlistCompat: true,
+})
+: resolvePluginWebSearchProviders({
+config: options?.config,
+bundledAllowlistCompat: true,
+origin: "bundled",
+}),
+).filter(Boolean);
+if (providers.length === 0) {
+return [];
+}
+const preferredIds = [
+options?.providerId,
+runtimeWebSearch?.selectedProvider,
+runtimeWebSearch?.providerConfigured,
+resolveWebSearchProviderId({ config: options?.config, search, providers }),
+].filter(
+(value, index, array): value is string => Boolean(value) && array.indexOf(value) === index,
+);
+const orderedProviders = [
+...preferredIds
+.map((id) => providers.find((entry) => entry.id === id))
+.filter((entry): entry is PluginWebSearchProviderEntry => Boolean(entry)),
+...providers.filter((entry) => !preferredIds.includes(entry.id)),
+];
+return orderedProviders;
+}
+function hasExplicitWebSearchSelection(params: {
+search?: WebSearchConfig;
+runtimeWebSearch?: RuntimeWebSearchMetadata;
+providerId?: string;
+}): boolean {
+if (params.providerId?.trim()) {
+return true;
+}
+if (
+params.search &&
+"provider" in params.search &&
+typeof params.search.provider === "string" &&
+params.search.provider.trim()
+) {
+return true;
+}
+return params.runtimeWebSearch?.providerSource === "configured";
+}
export async function runWebSearch(
params: RunWebSearchParams,
): Promise<{ provider: string; result: Record<string, unknown> }> {
-const resolved = resolveWebSearchDefinition({ ...params, preferRuntimeProviders: true });
-if (!resolved) {
+const search = resolveSearchConfig(params.config);
+const runtimeWebSearch = params.runtimeWebSearch ?? getActiveRuntimeWebToolsMetadata()?.search;
+const candidates = resolveWebSearchCandidates({
+...params,
+runtimeWebSearch,
+preferRuntimeProviders: true,
+});
+if (candidates.length === 0) {
throw new Error("web_search is disabled or no provider is available.");
}
-return {
-provider: resolved.provider.id,
-result: await resolved.definition.execute(params.args),
-};
+const allowFallback = !hasExplicitWebSearchSelection({
+search,
+runtimeWebSearch,
+providerId: params.providerId,
+});
+let lastError: unknown;
+for (const candidate of candidates) {
+try {
+const definition = candidate.createTool({
+config: params.config,
+searchConfig: search as Record<string, unknown> | undefined,
+runtimeMetadata: runtimeWebSearch,
+});
+if (!definition) {
+continue;
+}
+return {
+provider: candidate.id,
+result: await definition.execute(params.args),
+};
+} catch (error) {
+lastError = error;
+if (!allowFallback) {
+throw error;
+}
+}
+}
+throw lastError instanceof Error ? lastError : new Error(String(lastError));
}
export const __testing = {
resolveSearchConfig,
resolveSearchProvider: resolveWebSearchProviderId,
resolveWebSearchProviderId,
+resolveWebSearchCandidates,
+hasExplicitWebSearchSelection,
};
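
The fallback behavior exercised by the runtime tests (try candidates in order, build each tool lazily, and stop at the first failure when the provider was chosen explicitly) can be sketched independently of the plugin types. This is an illustrative sketch only: `CandidateSketch`, `RunnerSketch`, and `runFirstAvailable` are hypothetical names, the sketch is synchronous for brevity where `runWebSearch` is async, and it seeds `lastError` with a generic error, which is an assumption of the sketch rather than what the diff does.

```typescript
// Hypothetical shapes; the real providers expose createTool() and an async execute().
type RunnerSketch = (query: string) => string;
type CandidateSketch = { id: string; createRunner: () => RunnerSketch | undefined };

function runFirstAvailable(
  candidates: CandidateSketch[],
  query: string,
  allowFallback: boolean,
): { provider: string; result: string } {
  // Assumption of this sketch: a readable default if nothing ever runs.
  let lastError: unknown = new Error("no provider is available");
  for (const candidate of candidates) {
    try {
      // The runner is built only when its turn comes, so a broken
      // fallback cannot affect a healthy preferred provider.
      const runner = candidate.createRunner();
      if (!runner) {
        continue;
      }
      return { provider: candidate.id, result: runner(query) };
    } catch (error) {
      lastError = error;
      if (!allowFallback) {
        // Explicit selection: surface the failure instead of switching providers.
        throw error;
      }
    }
  }
  throw lastError instanceof Error ? lastError : new Error(String(lastError));
}
```

Passing `allowFallback` as a plain boolean keeps the policy decision (was the provider chosen explicitly?) separate from the retry loop, which is the same split the diff makes between `hasExplicitWebSearchSelection` and `runWebSearch`.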

View File

@@ -752,6 +752,13 @@ export function renderApp(state: AppViewState) {
onSettingsChange: (next) => state.applySettings(next),
onPasswordChange: (next) => (state.password = next),
onSessionKeyChange: (next) => {
if (state.client && state.connected && state.chatVoiceActive) {
void state.client
.request("chat.voice.stop", { sessionKey: state.sessionKey })
.catch(() => {
// ignore best-effort voice cleanup errors during navigation
});
}
state.sessionKey = next;
state.chatMessage = "";
state.resetToolStream();
@@ -1532,12 +1539,24 @@ export function renderApp(state: AppViewState) {
? renderChat({
sessionKey: state.sessionKey,
onSessionKeyChange: (next) => {
if (state.client && state.connected && state.chatVoiceActive) {
void state.client
.request("chat.voice.stop", { sessionKey: state.sessionKey })
.catch(() => {
// ignore best-effort voice cleanup errors during navigation
});
}
state.sessionKey = next;
state.chatMessage = "";
state.chatAttachments = [];
state.chatStream = null;
state.chatStreamStartedAt = null;
state.chatRunId = null;
state.chatVoiceActive = false;
state.chatVoiceState = "idle";
state.chatVoiceTranscript = "";
state.chatVoiceRunId = null;
state.chatVoiceError = null;
state.chatQueue = [];
state.resetToolStream();
state.resetChatScroll();
@@ -1569,6 +1588,11 @@ export function renderApp(state: AppViewState) {
canSend: state.connected,
disabledReason: chatDisabledReason,
error: state.lastError,
voiceActive: state.chatVoiceActive,
voiceState: state.chatVoiceState,
voiceTranscript: state.chatVoiceTranscript,
voiceError: state.chatVoiceError,
voicePlaybackEnabled: state.chatVoicePlaybackEnabled,
sessions: state.sessionsResult,
focusMode: chatFocus,
onRefresh: () => {
@@ -1591,6 +1615,69 @@ export function renderApp(state: AppViewState) {
attachments: state.chatAttachments,
onAttachmentsChange: (next) => (state.chatAttachments = next),
onSend: () => state.handleSendChat(),
onVoiceStart: async () => {
if (!state.client || !state.connected) {
return false;
}
state.chatVoiceActive = false;
state.chatVoiceState = "connecting";
state.chatVoiceTranscript = "";
state.chatVoiceRunId = null;
state.chatVoiceError = null;
try {
const res = (await state.client.request("chat.voice.start", {
sessionKey: state.sessionKey,
})) as { playbackEnabled?: boolean } | undefined;
state.chatVoiceActive = true;
state.chatVoiceState = "listening";
state.chatVoicePlaybackEnabled = res?.playbackEnabled !== false;
return true;
} catch (error) {
state.chatVoiceActive = false;
state.chatVoiceState = "error";
state.chatVoiceError = String(error);
return false;
}
},
onVoiceAudioChunk: async (chunkBase64) => {
if (!state.client || !state.connected || !state.chatVoiceActive) {
return;
}
try {
await state.client.request("chat.voice.audio", {
sessionKey: state.sessionKey,
audio: chunkBase64,
format: "pcm16",
sampleRate: 16000,
});
} catch (error) {
state.chatVoiceState = "error";
state.chatVoiceError = String(error);
}
},
onVoiceStop: async () => {
state.chatVoiceActive = false;
state.chatVoiceRunId = null;
state.chatVoiceTranscript = "";
if (!state.client || !state.connected) {
state.chatVoiceState = "idle";
return;
}
try {
await state.client.request("chat.voice.stop", { sessionKey: state.sessionKey });
} catch (error) {
state.chatVoiceState = "error";
state.chatVoiceError = String(error);
}
},
onVoiceInterrupt: async () => {
if (!state.client || !state.connected) {
return;
}
await state.client.request("chat.voice.interrupt", {
sessionKey: state.sessionKey,
});
},
canAbort: Boolean(state.chatRunId),
onAbort: () => void state.handleAbortChat(),
onQueueRemove: (id) => state.removeQueuedMessage(id),

View File

@@ -72,6 +72,19 @@ export type AppViewState = {
chatStream: string | null;
chatStreamStartedAt: number | null;
chatRunId: string | null;
chatVoiceActive: boolean;
chatVoiceState:
| "idle"
| "connecting"
| "listening"
| "processing"
| "speaking"
| "interrupted"
| "error";
chatVoiceTranscript: string;
chatVoiceRunId: string | null;
chatVoicePlaybackEnabled: boolean;
chatVoiceError: string | null;
compactionStatus: CompactionStatus | null;
fallbackStatus: FallbackStatus | null;
chatAvatarUrl: string | null;

View File

@@ -165,6 +165,19 @@ export class OpenClawApp extends LitElement {
@state() chatStream: string | null = null;
@state() chatStreamStartedAt: number | null = null;
@state() chatRunId: string | null = null;
@state() chatVoiceActive = false;
@state() chatVoiceState:
| "idle"
| "connecting"
| "listening"
| "processing"
| "speaking"
| "interrupted"
| "error" = "idle";
@state() chatVoiceTranscript = "";
@state() chatVoiceRunId: string | null = null;
@state() chatVoicePlaybackEnabled = true;
@state() chatVoiceError: string | null = null;
@state() compactionStatus: CompactionStatus | null = null;
@state() fallbackStatus: FallbackStatus | null = null;
@state() chatAvatarUrl: string | null = null;

View File

@@ -125,6 +125,195 @@ export function isSttActive(): boolean {
return activeRecognition !== null;
}
// ─── Realtime Voice Capture ───
type RealtimeVoiceCallbacks = {
onChunk: (chunkBase64: string) => void;
onStart?: () => void;
onStop?: () => void;
onError?: (error: string) => void;
};
type RealtimeVoiceCapture = {
stop: () => void;
};
const REALTIME_VOICE_TARGET_SAMPLE_RATE = 16_000;
const REALTIME_VOICE_CHUNK_MS = 250;
let activeRealtimeVoiceCapture: RealtimeVoiceCapture | null = null;
export function isRealtimeVoiceSupported(): boolean {
const hasGetUserMedia =
typeof navigator !== "undefined" && typeof navigator.mediaDevices?.getUserMedia === "function";
return (
typeof window !== "undefined" &&
Boolean(window.isSecureContext) &&
hasGetUserMedia &&
typeof AudioContext !== "undefined"
);
}
export async function startRealtimeVoiceCapture(
callbacks: RealtimeVoiceCallbacks,
): Promise<boolean> {
if (!isRealtimeVoiceSupported()) {
callbacks.onError?.("Realtime voice requires a secure context with microphone access");
return false;
}
stopRealtimeVoiceCapture();
let stream: MediaStream;
try {
stream = await navigator.mediaDevices.getUserMedia({
audio: {
channelCount: 1,
echoCancellation: true,
noiseSuppression: true,
autoGainControl: true,
},
});
} catch (error) {
callbacks.onError?.(error instanceof Error ? error.message : String(error));
return false;
}
const audioContext = new AudioContext();
try {
if (audioContext.state !== "running") {
await audioContext.resume();
}
} catch (error) {
stream.getTracks().forEach((track) => track.stop());
callbacks.onError?.(
error instanceof Error ? error.message : "Failed to start realtime voice capture",
);
void audioContext.close();
return false;
}
const source = audioContext.createMediaStreamSource(stream);
const processor = audioContext.createScriptProcessor(4096, 1, 1);
const samplesPerChunk = Math.max(
1,
Math.round((REALTIME_VOICE_TARGET_SAMPLE_RATE * REALTIME_VOICE_CHUNK_MS) / 1000),
);
let pcmBuffer = new Int16Array(0);
let stopped = false;
const flushChunk = () => {
if (pcmBuffer.length < samplesPerChunk) {
return;
}
const chunk = pcmBuffer.slice(0, samplesPerChunk);
pcmBuffer = pcmBuffer.slice(samplesPerChunk);
callbacks.onChunk(encodePcm16Chunk(chunk));
};
processor.onaudioprocess = (event) => {
if (stopped) {
return;
}
const input = event.inputBuffer.getChannelData(0);
const downsampled = downsampleFloat32Buffer(
input,
audioContext.sampleRate,
REALTIME_VOICE_TARGET_SAMPLE_RATE,
);
if (downsampled.length === 0) {
return;
}
const next = new Int16Array(pcmBuffer.length + downsampled.length);
next.set(pcmBuffer, 0);
next.set(downsampled, pcmBuffer.length);
pcmBuffer = next;
flushChunk();
};
source.connect(processor);
processor.connect(audioContext.destination);
const stop = () => {
if (stopped) {
return;
}
stopped = true;
activeRealtimeVoiceCapture = null;
if (pcmBuffer.length > 0) {
callbacks.onChunk(encodePcm16Chunk(pcmBuffer));
pcmBuffer = new Int16Array(0);
}
processor.disconnect();
source.disconnect();
stream.getTracks().forEach((track) => track.stop());
void audioContext.close();
callbacks.onStop?.();
};
activeRealtimeVoiceCapture = { stop };
callbacks.onStart?.();
return true;
}
export function stopRealtimeVoiceCapture(): void {
activeRealtimeVoiceCapture?.stop();
}
function downsampleFloat32Buffer(
buffer: Float32Array,
inputSampleRate: number,
outputSampleRate: number,
): Int16Array {
if (outputSampleRate >= inputSampleRate) {
return float32ToPcm16(buffer);
}
const ratio = inputSampleRate / outputSampleRate;
const outputLength = Math.max(1, Math.round(buffer.length / ratio));
const output = new Int16Array(outputLength);
let offsetBuffer = 0;
for (let i = 0; i < outputLength; i += 1) {
const nextOffsetBuffer = Math.min(buffer.length, Math.round((i + 1) * ratio));
let sum = 0;
let count = 0;
for (let j = offsetBuffer; j < nextOffsetBuffer; j += 1) {
sum += buffer[j];
count += 1;
}
const sample = count > 0 ? sum / count : 0;
output[i] = float32SampleToPcm16(sample);
offsetBuffer = nextOffsetBuffer;
}
return output;
}
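Review note: the loop above downsamples by averaging each window of input samples that maps to one output sample. A minimal sketch of the same windowing follows, assuming a hypothetical helper named `blockAverage` that returns the raw averages instead of PCM16 values, so the arithmetic is easy to check by hand. It is an illustration only, not part of this change.

```typescript
// Hypothetical helper for illustration; mirrors the windowing in
// downsampleFloat32Buffer but skips the PCM16 conversion.
function blockAverage(
  buffer: Float32Array,
  inputSampleRate: number,
  outputSampleRate: number,
): number[] {
  if (outputSampleRate >= inputSampleRate) {
    // No downsampling needed; pass the samples through unchanged.
    return Array.from(buffer);
  }
  const ratio = inputSampleRate / outputSampleRate;
  const outputLength = Math.max(1, Math.round(buffer.length / ratio));
  const output: number[] = [];
  let start = 0;
  for (let i = 0; i < outputLength; i += 1) {
    // Window i covers input samples [start, end).
    const end = Math.min(buffer.length, Math.round((i + 1) * ratio));
    let sum = 0;
    for (let j = start; j < end; j += 1) {
      sum += buffer[j];
    }
    output.push(end > start ? sum / (end - start) : 0);
    start = end;
  }
  return output;
}

// 48 kHz to 24 kHz gives a ratio of 2, so each output is the mean of two inputs.
const halved = blockAverage(new Float32Array([0, 0.5, 1, 0.5, 0, -0.5]), 48000, 24000);
// halved is [0.25, 0.75, -0.25]
```

With a non-integer ratio the windows alternate in size because the boundaries are rounded, which keeps the output length close to the ideal but is not a true low-pass filter.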
function float32ToPcm16(buffer: Float32Array): Int16Array {
const output = new Int16Array(buffer.length);
for (let i = 0; i < buffer.length; i += 1) {
output[i] = float32SampleToPcm16(buffer[i]);
}
return output;
}
function float32SampleToPcm16(sample: number): number {
const clamped = Math.max(-1, Math.min(1, sample));
return clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
}
function encodePcm16Chunk(chunk: Int16Array): string {
const bytes = new Uint8Array(chunk.length * 2);
for (let i = 0; i < chunk.length; i += 1) {
const value = chunk[i];
bytes[i * 2] = value & 0xff;
bytes[i * 2 + 1] = (value >> 8) & 0xff;
}
let binary = "";
for (const byte of bytes) {
binary += String.fromCharCode(byte);
}
return btoa(binary);
}
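Review note: `encodePcm16Chunk` writes each sample as two little-endian bytes and base64-encodes the result. The sketch below pairs a restated encoder with a hypothetical inverse, `decodePcm16Chunk`, which is not part of this change; a receiver or a unit test could use something like it to confirm the byte order. It assumes `atob` and `btoa` are available globally, as they are in browsers and recent Node releases.

```typescript
// Restated from the diff so the sketch is self-contained.
function encodePcm16Chunk(chunk: Int16Array): string {
  const bytes = new Uint8Array(chunk.length * 2);
  for (let i = 0; i < chunk.length; i += 1) {
    const value = chunk[i];
    bytes[i * 2] = value & 0xff; // low byte first (little-endian)
    bytes[i * 2 + 1] = (value >> 8) & 0xff;
  }
  let binary = "";
  for (let i = 0; i < bytes.length; i += 1) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Hypothetical inverse for illustration only.
function decodePcm16Chunk(chunkBase64: string): Int16Array {
  const binary = atob(chunkBase64);
  const samples = new Int16Array(binary.length >> 1);
  for (let i = 0; i < samples.length; i += 1) {
    const lo = binary.charCodeAt(i * 2);
    const hi = binary.charCodeAt(i * 2 + 1);
    // Storing into an Int16Array wraps the 0..65535 value to a signed 16-bit sample.
    samples[i] = (hi << 8) | lo;
  }
  return samples;
}

const original = new Int16Array([0, 1, -1, 32767, -32768, 12345]);
const roundTripped = decodePcm16Chunk(encodePcm16Chunk(original));
```

A round trip like this should reproduce every sample exactly, including the extremes, since the encoding is lossless.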
// ─── TTS (Text-to-Speech) ───
export function isTtsSupported(): boolean {

@@ -44,6 +44,11 @@ function createProps(overrides: Partial<ChatProps> = {}): ChatProps {
canSend: true,
disabledReason: null,
error: null,
voiceActive: false,
voiceState: "idle",
voiceTranscript: "",
voiceError: null,
voicePlaybackEnabled: true,
sessions: {
ts: 0,
path: "",

@@ -220,6 +220,11 @@ function createProps(overrides: Partial<ChatProps> = {}): ChatProps {
canSend: true,
disabledReason: null,
error: null,
voiceActive: false,
voiceState: "idle",
voiceTranscript: "",
voiceError: null,
voicePlaybackEnabled: true,
sessions: createSessions(),
focusMode: false,
assistantName: "OpenClaw",

@@ -29,7 +29,14 @@ import {
type SlashCommandCategory,
type SlashCommandDef,
} from "../chat/slash-commands.ts";
import { isSttSupported, startStt, stopStt } from "../chat/speech.ts";
import {
isRealtimeVoiceSupported,
isSttSupported,
startRealtimeVoiceCapture,
startStt,
stopRealtimeVoiceCapture,
stopStt,
} from "../chat/speech.ts";
import { icons } from "../icons.ts";
import { detectTextDirection } from "../text-direction.ts";
import type { GatewaySessionRow, SessionsListResult } from "../types.ts";
@@ -62,6 +69,18 @@ export type ChatProps = {
canSend: boolean;
disabledReason: string | null;
error: string | null;
voiceActive: boolean;
voiceState:
| "idle"
| "connecting"
| "listening"
| "processing"
| "speaking"
| "interrupted"
| "error";
voiceTranscript: string;
voiceError: string | null;
voicePlaybackEnabled: boolean;
sessions: SessionsListResult | null;
focusMode: boolean;
sidebarOpen?: boolean;
@@ -80,6 +99,10 @@ export type ChatProps = {
onDraftChange: (next: string) => void;
onRequestUpdate?: () => void;
onSend: () => void;
onVoiceStart?: () => Promise<boolean> | boolean;
onVoiceAudioChunk?: (chunkBase64: string) => Promise<void> | void;
onVoiceStop?: () => Promise<void> | void;
onVoiceInterrupt?: () => Promise<void> | void;
onAbort?: () => void;
onQueueRemove: (id: string) => void;
onNewSession: () => void;
@@ -130,6 +153,7 @@ function getDeletedMessages(sessionKey: string): DeletedMessages {
interface ChatEphemeralState {
sttRecording: boolean;
sttInterimText: string;
voiceRecording: boolean;
slashMenuOpen: boolean;
slashMenuItems: SlashCommandDef[];
slashMenuIndex: number;
@@ -145,6 +169,7 @@ function createChatEphemeralState(): ChatEphemeralState {
return {
sttRecording: false,
sttInterimText: "",
voiceRecording: false,
slashMenuOpen: false,
slashMenuItems: [],
slashMenuIndex: 0,
@@ -167,6 +192,9 @@ export function resetChatViewState() {
if (vs.sttRecording) {
stopStt();
}
if (vs.voiceRecording) {
stopRealtimeVoiceCapture();
}
Object.assign(vs, createChatEphemeralState());
}
@@ -254,6 +282,32 @@ function renderFallbackIndicator(status: FallbackIndicatorStatus | null | undefi
`;
}
function renderVoiceStatus(props: ChatProps) {
if (!props.voiceActive && !props.voiceError) {
return nothing;
}
const label =
props.voiceState === "connecting"
? "Connecting voice..."
: props.voiceState === "listening"
? "Listening..."
: props.voiceState === "processing"
? "Processing..."
: props.voiceState === "speaking"
? "Speaking..."
: props.voiceState === "interrupted"
? "Interrupted"
: props.voiceState === "error"
? "Voice error"
: "Voice ready";
const detail = props.voiceError || props.voiceTranscript;
return html`
<div class="agent-chat__stt-interim">
<strong>${label}</strong>${detail ? html` ${detail}` : nothing}
</div>
`;
}
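Review note: the nested ternary in `renderVoiceStatus` maps each voice state to a label, with "Voice ready" as the fallback for `idle`. One possible alternative, sketched here with hypothetical names (`VoiceState`, `VOICE_STATE_LABELS`, `voiceStateLabel`) that do not appear in this change, is a lookup table typed with `Record`, so the compiler flags any state that lacks a label.

```typescript
// Hypothetical alias for the union declared inline on ChatProps.voiceState.
type VoiceState =
  | "idle"
  | "connecting"
  | "listening"
  | "processing"
  | "speaking"
  | "interrupted"
  | "error";

// Record<VoiceState, string> makes a missing key a compile-time error.
const VOICE_STATE_LABELS: Record<VoiceState, string> = {
  idle: "Voice ready",
  connecting: "Connecting voice...",
  listening: "Listening...",
  processing: "Processing...",
  speaking: "Speaking...",
  interrupted: "Interrupted",
  error: "Voice error",
};

function voiceStateLabel(state: VoiceState): string {
  return VOICE_STATE_LABELS[state];
}
```

The labels are copied from the ternary, so behavior would be unchanged; whether the extra type alias is worth it is a style call for the author.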
/**
* Compact notice when context usage reaches 85%+.
* Progressively shifts from amber (85%) to red (90%+).
@@ -913,6 +967,11 @@ export function renderChat(props: ChatProps) {
const requestUpdate = props.onRequestUpdate ?? (() => {});
const getDraft = props.getDraft ?? (() => props.draft);
if (!props.voiceActive && vs.voiceRecording) {
stopRealtimeVoiceCapture();
vs.voiceRecording = false;
}
const splitRatio = props.splitRatio ?? 0.6;
const sidebarOpen = Boolean(props.sidebarOpen && props.onCloseSidebar);
@@ -1262,6 +1321,7 @@ export function renderChat(props: ChatProps) {
${vs.sttRecording && vs.sttInterimText
? html`<div class="agent-chat__stt-interim">${vs.sttInterimText}</div>`
: nothing}
${renderVoiceStatus(props)}
<textarea
${ref((el) => el && adjustTextareaHeight(el as HTMLTextAreaElement))}
@@ -1342,6 +1402,56 @@ export function renderChat(props: ChatProps) {
</button>
`
: nothing}
${isRealtimeVoiceSupported() && props.onVoiceStart && props.onVoiceStop
? html`
<button
class="agent-chat__input-btn ${props.voiceActive
? "agent-chat__input-btn--recording"
: ""}"
@click=${async () => {
if (props.voiceActive) {
stopRealtimeVoiceCapture();
vs.voiceRecording = false;
await props.onVoiceStop?.();
requestUpdate();
return;
}
const started = await props.onVoiceStart?.();
if (!started) {
requestUpdate();
return;
}
const captureStarted = await startRealtimeVoiceCapture({
onChunk: (chunkBase64) => {
void props.onVoiceAudioChunk?.(chunkBase64);
},
onStart: () => {
vs.voiceRecording = true;
requestUpdate();
},
onStop: () => {
vs.voiceRecording = false;
requestUpdate();
},
onError: async () => {
vs.voiceRecording = false;
await props.onVoiceStop?.();
requestUpdate();
},
});
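if (!captureStarted) {
// onError above may already have called onVoiceStop for this failure,
// so onVoiceStop should tolerate being called twice.
await props.onVoiceStop?.();
requestUpdate();
}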
}}
title=${props.voiceActive ? "Stop live voice" : "Start live voice"}
aria-label=${props.voiceActive ? "Stop live voice" : "Start live voice"}
?disabled=${!props.connected || props.voiceState === "connecting"}
>
${props.voiceActive ? icons.volume2 : icons.radio}
</button>
`
: nothing}
${tokens ? html`<span class="agent-chat__token-count">${tokens}</span>` : nothing}
</div>