feat(agent): make the assistant agentic - visible tools, LLM voice, full toolset

The agent felt like an artificial idiot because the LLM almost never spoke for itself: 14+ Go paths injected fmt.Sprintf canned replies, the frontend filtered out tool-progress events so users saw three dots for 10-20s, the main prompt told the LLM "be a trading partner" AND "answer only what's asked", and the planner sliced the toolset by inferred domain so a "BTC dropped, how much am I losing?" question couldn't see positions and market at the same time.

- agent/central_brain.go: shouldTrustDeterministicSkillReply now always returns false. Successful mutations (trader/strategy/model/exchange create/update/start/stop/delete) flow through reviewTaskCompletion so the LLM sees the real outcome JSON and writes the user-facing prose. The trade-confirmation regex path (handleTradeConfirmation) was already outside this code path and is unaffected.

- agent/agent.go: rewrite the Behavior section of the main system prompt. Replace the contradictory "answer only what's asked / don't upsell" with "lead with the direct answer, then optionally one relevant follow-up only when (a) open risk, (b) missing config, or (c) the next step is obvious, e.g. created, want me to start it?". Explicitly authorize chaining ("if the user says create and start, do both this turn") and ban "please wait / I'll get back to you" language because there is no background job to come back from.

- agent/tools.go: plannerToolsForText always returns the full 22-tool set (new __all__ domain). The old per-domain trimming hid manage_trader from market questions and execute_trade from anything that didn't look like an explicit trade, so cross-domain reasoning was structurally blocked. The compact-vs-full strategy schema switch is preserved so mutation intents still see the full config schema.

- web/src/components/agent/{AgentStepPanel,ChatMessages}.tsx: stop filtering tool: steps. Map raw tool names to friendly labels with emoji ("get_positions" → "📊 检查持仓") in zh/en/id. Users now see what the agent is doing in real time instead of silence. central_brain routing chatter still gets dropped.

- agent/planner_tools_test.go: tests updated to assert the new full-toolset behavior and the compact-vs-full strategy schema switch.
2026-07-21 19:27:39 +08:00 · 2026-05-29 22:13:05 +08:00
parent fcb73cc195
commit 1851508353
6 changed files with 166 additions and 57 deletions
--- a/agent/agent.go
+++ b/agent/agent.go
@@ -620,14 +620,14 @@ func (a *Agent) buildSystemPromptForStoreUser(lang, storeUserID string) string {
 - 查股票行情 ≠ 用户持有该股票。不要混淆"查价格"和"有持仓"

 ## 行为准则（最高优先级）
- **用户问什么就只答什么** — 问余额只说余额，问持仓只说持仓，问价格只说价格。不要把 System Context 里的其他数据也一起输出。
- **System Context 是参考资料，不是输出模板** — 里面有很多实时数据，但你只用跟用户问题直接相关的那部分。
- **回复要短** — 能一句话说清就不要写一段。不要用表格、分隔线、标题，除非数据需要对比。
- **不要主动推销** — 不要列"下一步建议"、"需要我帮你做什么"，除非用户主动问。数据为空就一句话说明原因。
+- **先直接答, 再可选追加一条相关提醒** — 第一句永远是用户问的那个具体答案。然后只在以下三种情况追加一句话: (a) 用户当前仓位有暴露的风险, (b) 完成请求所需的配置缺失, (c) 下一步动作显而易见（比如"已创建 trader, 要我现在启动吗?"）。一次只追加一句, 不要列清单。
+- **System Context 是参考资料, 不是输出模板** — 只用跟用户问题直接相关的那部分, 不要复述整个状态。
+- **回复要短** — 能一句话说清就不要写一段。不要用表格、分隔线、标题, 除非数据真的需要对比。
+- **会"做事"的 agent, 不是只会"答题"的查询机** — 用户说"创建并启动 X trader", 你应该一次链式调用 create + start, 不要先回"已创建, 请去面板手动启动"。用户已经表达意图, 你就去做。
+- **遇到工具错误**: 用一句人话说出原因, 然后给一个最可能的修复建议或一个聚焦的追问。不要默默重试。不要说"稍等一下我去办" — 你没有后台任务。
 - **不要重复自我介绍** — 除非用户首次问"你是谁/你能做什么"。
- 把用户当交易小白，语言简单直接。
- 先说结论，再说原因。
- **诚实是第一原则** — 不确定就说不确定，没数据就说没数据。绝不编造。
+- 把用户当交易小白, 语言简单直接。先结论, 再原因。
+- **诚实是第一原则** — 不确定就说不确定, 没数据就说没数据。绝不编造。
 - 用中文回复。

 当前时间: %s`, traderInfo, watchlist, skillCatalog, time.Now().Format("2006-01-02 15:04:05"))
@@ -708,14 +708,14 @@ You can call these tools to take action:
 - Checking a stock price ≠ user owns that stock. Never confuse "quote lookup" with "holding"

 ## Behavior (HIGHEST PRIORITY)
- **Answer ONLY what the user asked** — if they ask balance, only say balance. If they ask positions, only say positions. Do not dump other System Context data.
- **System Context is reference material, not output template** — it has lots of real-time data, but only use what is directly relevant to the user's question.
- **Keep it short** — if you can say it in one sentence, don't write a paragraph. No tables, dividers, or headers unless data needs comparison.
- **Don't upsell** — don't list "next step suggestions" or "want me to help?" unless the user explicitly asks. If data is empty, one sentence explaining why.
+- **Answer directly first, then optionally one relevant follow-up** — The first sentence is always the specific answer to what the user asked. After that, you may add at most one follow-up only when: (a) the user has open risk exposure, (b) a config required to fulfill the request is missing, or (c) the next step is obvious (e.g. "Trader created — want me to start it?"). One follow-up max, no checklists.
+- **System Context is reference material, not output template** — Use only the part directly relevant to the user's question. Don't recap the whole state.
+- **Keep it short** — One sentence beats a paragraph. No tables, dividers, or headers unless data really needs comparison.
+- **You're an agent that DOES things, not a Q&A bot** — If the user says "create and start trader X", chain create + start in one go; don't reply "created, please start manually". They already expressed intent; execute it.
+- **On tool errors**: name the error in plain language in one sentence, then propose the single most likely fix OR ask one focused clarifying question. Never silently retry. Never say "I'll get back to you" / "please wait" — you have no background job.
 - **Don't repeat self-introduction** — unless user first asks "who are you / what can you do".
- Treat the user like a trading beginner. Use plain language.
- Lead with the conclusion, then the reason.
- **Honesty is rule #1** — uncertain = say uncertain, no data = say no data.
+- Treat the user like a trading beginner. Use plain language. Conclusion first, reason after.
+- **Honesty is rule #1** — uncertain = say uncertain, no data = say no data. Never fabricate.

 Current time: %s`, traderInfo, watchlist, skillCatalog, time.Now().Format("2006-01-02 15:04:05"))
 }
--- a/agent/central_brain.go
+++ b/agent/central_brain.go
@@ -937,17 +937,22 @@ func (a *Agent) executeActiveSkillSession(storeUserID string, userID int64, lang
 	return outcome, ActiveSkillSession{}, false, true
 }

-func shouldTrustDeterministicSkillReply(outcome skillOutcome) bool {
-	if outcome.Status != skillOutcomeSuccess || !outcome.GoalAchieved {
-		return false
-	}
-	switch outcome.Skill {
-	case "strategy_management", "trader_management", "model_management", "exchange_management":
-		switch outcome.Action {
-		case "create", "update", "update_name", "update_bindings", "configure_strategy", "configure_exchange", "configure_model", "update_status", "update_endpoint", "update_config", "update_prompt", "delete", "start", "stop", "activate", "duplicate":
-			return true
-		}
-	}
+// shouldTrustDeterministicSkillReply controls whether a Go-generated UserMessage
+// is shown verbatim to the user (true) or whether the LLM gets to review the
+// tool outcome and write a natural reply (false).
+//
+// Historically this returned true for every successful mutation on trader /
+// strategy / model / exchange — which meant the user always saw the same
+// canned `fmt.Sprintf` lines (e.g. "已创建 Trader: X. 我没有自动启动..."), and
+// the agent felt mechanical / "non-agentic". It now always returns false so the
+// LLM owns the voice. The cost is one extra LLM call per mutation; the upside
+// is that the agent can chain ("trader created — want me to start it now?"),
+// apologize on errors in plain language, respect the user's language and
+// tone, and behave like an actual agent instead of a settings panel.
+//
+// The trade-confirmation flow (execute_trade -> "确认 trade_xxx") is unaffected:
+// it runs through handleTradeConfirmation in trade.go before this code path.
+func shouldTrustDeterministicSkillReply(_ skillOutcome) bool {
 	return false
 }

--- a/agent/planner_tools_test.go
+++ b/agent/planner_tools_test.go
@@ -7,34 +7,53 @@ import (
 	"nofx/mcp"
 )

-func TestPlannerToolsForMarketIntentAreTrimmed(t *testing.T) {
+// plannerToolsForText now always returns the FULL toolset (no per-domain
+// trimming) so the LLM can cross-domain reason. The old "if market intent,
+// hide manage_trader" filter was making cross-domain questions like "BTC
+// dropped, how much am I losing?" impossible to answer because the agent
+// couldn't see both market AND position tools in the same turn.
+//
+// We still trim the giant strategy schema for non-mutation intents because
+// that one is genuinely huge and uninformative for read-only use.
+
+func TestPlannerToolsExposeFullSetForMarketIntent(t *testing.T) {
 	tools := plannerToolsForText("看一下 BTCUSDT 行情和 K线")
 	names := toolNamesForTest(tools)

+	// Market tools must be present.
 	for _, expected := range []string{"get_market_snapshot", "get_market_price", "get_kline"} {
 		if !containsString(names, expected) {
 			t.Fatalf("expected market tool %q in %v", expected, names)
 		}
 	}
-	for _, unexpected := range []string{"manage_strategy", "manage_trader", "manage_exchange_config", "manage_model_config"} {
-		if containsString(names, unexpected) {
-			t.Fatalf("did not expect management tool %q in market tools %v", unexpected, names)
+	// Cross-domain tools (positions, balance, trader management) must ALSO be
+	// present so the agent can answer "how much am I losing" follow-ups
+	// without losing the market context.
+	for _, expected := range []string{"get_positions", "get_balance", "manage_trader"} {
+		if !containsString(names, expected) {
+			t.Fatalf("expected cross-domain tool %q in market context %v", expected, names)
 		}
 	}
 }

-func TestPlannerToolsForExchangeIntentAreTrimmed(t *testing.T) {
+func TestPlannerToolsExposeFullSetForExchangeIntent(t *testing.T) {
 	tools := plannerToolsForText("帮我添加 okx 交易所 API key")
 	names := toolNamesForTest(tools)

-	if len(names) != 2 {
-		t.Fatalf("expected two exchange tools, got %v", names)
-	}
+	// At least the exchange management tools must show up.
 	for _, expected := range []string{"get_exchange_configs", "manage_exchange_config"} {
 		if !containsString(names, expected) {
 			t.Fatalf("expected exchange tool %q in %v", expected, names)
 		}
 	}
+	// And the agent still has the broader surface available — adding an
+	// exchange often leads to "now create a trader" so trader/strategy tools
+	// must be reachable in the same turn.
+	for _, expected := range []string{"manage_trader", "get_strategies"} {
+		if !containsString(names, expected) {
+			t.Fatalf("expected adjacent tool %q in exchange context %v", expected, names)
+		}
+	}
 }

 func TestPlannerToolsUseCompactManageStrategyForReadIntent(t *testing.T) {
--- a/agent/tools.go
+++ b/agent/tools.go
@@ -43,10 +43,22 @@ var (
 // agentTools returns the tools available to the LLM for autonomous action.
 func agentTools() []mcp.Tool { return cachedTools }

+// plannerToolsForText returns the tools the LLM can call on this turn.
+//
+// Historically this filtered tools to a "domain" inferred from the user's
+// text (asking about "market" hid trader tools, etc.). The intent was to
+// keep prompts small for older models, but it made cross-domain reasoning
+// structurally impossible — e.g. "BTC dropped, how much am I losing?" needs
+// BOTH market AND position tools. Modern LLMs handle 22-tool surfaces fine,
+// and the agent-feels-blind-and-useless symptom is worse than any prompt
+// bloat. We now always expose the full toolset.
+//
+// `compactStrategy` still trims the giant strategy schema for non-mutation
+// intents (it's a 117-line nested schema; only worth showing in full when
+// the user is actually editing strategy config).
 func plannerToolsForText(text string) []mcp.Tool {
-	domain := plannerToolDomainForText(text)
 	compactStrategy := !looksLikeStrategyMutationIntent(text)
-	names := plannerToolNamesForDomain(domain)
+	names := plannerToolNamesForDomain("__all__")
 	return toolsByName(names, compactStrategy)
 }

@@ -80,7 +92,28 @@ func plannerToolDomainForText(text string) string {
 }

 func plannerToolNamesForDomain(domain string) []string {
+	// Full toolset — exposed in every turn so the LLM can cross-domain reason.
+	// The `__all__` sentinel is the canonical "give me everything" entry; older
+	// domain switches are kept for callers that explicitly request a subset.
+	all := []string{
+		// Account / lifecycle state
+		"get_preferences", "manage_preferences",
+		"get_decisions", "get_backend_logs",
+		"get_exchange_configs", "manage_exchange_config",
+		"get_model_configs", "manage_model_config",
+		"get_strategies", "manage_strategy",
+		"manage_trader",
+		"get_balance", "get_positions", "get_trade_history",
+		"get_candidate_coins",
+		"get_watchlist", "manage_watchlist",
+		// Trade execution
+		"execute_trade",
+		// Market data
+		"get_market_snapshot", "get_market_price", "get_kline", "search_stock",
+	}
 	switch domain {
+	case "__all__", "":
+		return all
 	case "market":
 		return []string{"get_market_snapshot", "get_market_price", "get_kline", "search_stock"}
 	case "account":
@@ -96,16 +129,7 @@ func plannerToolNamesForDomain(domain string) []string {
 	case "diagnosis":
 		return []string{"get_decisions", "get_backend_logs", "get_model_configs", "get_exchange_configs", "get_strategies", "manage_trader"}
 	default:
-		return []string{
-			"get_preferences", "manage_preferences",
-			"get_decisions", "get_backend_logs",
-			"get_exchange_configs", "manage_exchange_config",
-			"get_model_configs", "manage_model_config",
-			"get_strategies", "manage_strategy",
-			"manage_trader",
-			"get_balance", "get_positions", "get_trade_history",
-			"get_market_snapshot", "get_market_price", "get_kline", "search_stock",
-		}
+		return all
 	}
 }

--- a/web/src/components/agent/AgentStepPanel.tsx
+++ b/web/src/components/agent/AgentStepPanel.tsx
@@ -1,4 +1,5 @@
 import type { AgentStep } from '../../types/agent'
+import { useLanguage } from '../../contexts/LanguageContext'

 interface AgentStepPanelProps {
  steps?: AgentStep[]
@@ -13,21 +14,80 @@ const statusStyles: Record<AgentStep['status'], { dot: string; text: string }> =
  replanned: { dot: '#38bdf8', text: '#9bdcf7' },
 }

+// Map raw backend tool names to friendly user-facing labels.
+// Backend emits `step.label` like `tool:get_positions` and we render that as
+// "📊 Checking your positions" instead of hiding it from the user.
+const toolLabels: Record<string, { zh: string; en: string; id: string }> = {
+  // Read-only state
+  get_positions: { zh: '📊 检查持仓', en: '📊 Checking positions', id: '📊 Memeriksa posisi' },
+  get_balance: { zh: '💰 查余额', en: '💰 Reading balance', id: '💰 Membaca saldo' },
+  get_trade_history: { zh: '📜 查交易历史', en: '📜 Reading trade history', id: '📜 Membaca riwayat' },
+  get_decisions: { zh: '🤖 查 AI 决策记录', en: '🤖 Reading AI decisions', id: '🤖 Membaca keputusan AI' },
+  get_strategies: { zh: '📋 查策略列表', en: '📋 Listing strategies', id: '📋 Daftar strategi' },
+  get_candidate_coins: { zh: '🎯 查标的池', en: '🎯 Reading candidate pool', id: '🎯 Kandidat' },
+  get_exchange_configs: { zh: '🔌 查交易所配置', en: '🔌 Reading exchanges', id: '🔌 Bursa' },
+  get_model_configs: { zh: '🧠 查 AI 模型', en: '🧠 Reading AI models', id: '🧠 Model AI' },
+  get_preferences: { zh: '⚙️ 查偏好', en: '⚙️ Reading preferences', id: '⚙️ Preferensi' },
+  get_backend_logs: { zh: '🪵 查后台日志', en: '🪵 Reading logs', id: '🪵 Membaca log' },
+  get_watchlist: { zh: '👁 查关注列表', en: '👁 Reading watchlist', id: '👁 Membaca watchlist' },
+
+  // Market data
+  search_stock: { zh: '🔍 搜索股票', en: '🔍 Searching stocks', id: '🔍 Mencari saham' },
+  get_market_price: { zh: '📈 查实时价格', en: '📈 Fetching price', id: '📈 Mengambil harga' },
+  get_market_snapshot: { zh: '📈 查市场快照', en: '📈 Reading market snapshot', id: '📈 Snapshot pasar' },
+  get_kline: { zh: '📈 查 K 线', en: '📈 Reading candlesticks', id: '📈 Membaca candlestick' },
+
+  // Mutating
+  manage_trader: { zh: '🤖 管理 Trader', en: '🤖 Managing trader', id: '🤖 Mengelola trader' },
+  manage_strategy: { zh: '📋 管理策略', en: '📋 Managing strategy', id: '📋 Mengelola strategi' },
+  manage_exchange_config: { zh: '🔌 管理交易所', en: '🔌 Managing exchange', id: '🔌 Mengelola bursa' },
+  manage_model_config: { zh: '🧠 管理 AI 模型', en: '🧠 Managing AI model', id: '🧠 Mengelola model' },
+  manage_preferences: { zh: '⚙️ 更新偏好', en: '⚙️ Updating preferences', id: '⚙️ Memperbarui preferensi' },
+  manage_watchlist: { zh: '👁 更新关注列表', en: '👁 Updating watchlist', id: '👁 Memperbarui watchlist' },
+  execute_trade: { zh: '⚡ 准备下单', en: '⚡ Preparing trade', id: '⚡ Menyiapkan order' },
+}
+
+function friendlyStepLabel(rawLabel: string, lang: 'zh' | 'en' | 'id'): string {
+  const trimmed = rawLabel.trim()
+  if (trimmed.toLowerCase().startsWith('tool:')) {
+    const toolName = trimmed.slice(5).trim().toLowerCase()
+    const entry = toolLabels[toolName]
+    if (entry) return entry[lang]
+    // Unknown tool — surface a generic but still informative label
+    const generic = {
+      zh: `🔧 调用 ${toolName}`,
+      en: `🔧 Calling ${toolName}`,
+      id: `🔧 Memanggil ${toolName}`,
+    }
+    return generic[lang]
+  }
+  return rawLabel
+}
+
 export function AgentStepPanel({ steps, visible }: AgentStepPanelProps) {
+  const { language } = useLanguage()
+  const lang = (language === 'zh' || language === 'id' ? language : 'en') as
+    | 'zh'
+    | 'en'
+    | 'id'
+
  if (!visible || !steps || steps.length === 0) {
    return null
  }

-  const sanitizedSteps = steps.filter((step) => {
-    const label = step.label.trim().toLowerCase()
+  // Drop only the internal-routing chatter (central_brain); keep tool steps —
+  // they are exactly what the user wants to see ("agent is actually doing something").
+  const visibleSteps = steps.filter((step) => {
    const detail = (step.detail || '').trim().toLowerCase()
-    return !(label.startsWith('tool:') || detail === 'central_brain')
+    return detail !== 'central_brain'
  })

-  if (sanitizedSteps.length === 0) {
+  if (visibleSteps.length === 0) {
    return null
  }

+  const liveRunHeading = lang === 'zh' ? 'AGENT 实时动作' : lang === 'id' ? 'AKSI AGENT' : 'LIVE RUN'
+
  return (
    <div
      style={{
@@ -48,11 +108,12 @@ export function AgentStepPanel({ steps, visible }: AgentStepPanelProps) {
          marginBottom: 10,
        }}
      >
-        Live Run
+        {liveRunHeading}
      </div>
      <div style={{ display: 'flex', flexDirection: 'column', gap: 8 }}>
-        {sanitizedSteps.map((step) => {
+        {visibleSteps.map((step) => {
          const style = statusStyles[step.status]
+          const label = friendlyStepLabel(step.label, lang)
          return (
            <div
              key={step.id}
@@ -85,9 +146,9 @@ export function AgentStepPanel({ steps, visible }: AgentStepPanelProps) {
                    fontWeight: step.status === 'running' ? 600 : 500,
                  }}
                >
-                  {step.label}
+                  {label}
                </div>
-                {step.detail && (
+                {step.detail && step.detail.trim().toLowerCase() !== 'central_brain' && (
                  <div
                    style={{
                      fontSize: 11.5,
--- a/web/src/components/agent/ChatMessages.tsx
+++ b/web/src/components/agent/ChatMessages.tsx
@@ -10,12 +10,12 @@ interface ChatMessagesProps {

 function hasMeaningfulExecutionSteps(steps?: AgentStep[]) {
  if (!steps || steps.length === 0) return false
+  // Tool steps (label "tool:get_positions" etc.) ARE meaningful — they're the
+  // visible signal that the agent is actually doing something. Only drop the
+  // internal routing chatter (central_brain) and pure-planning placeholders.
  return steps.some((step) => {
-    const label = step.label.trim().toLowerCase()
    const detail = (step.detail || '').trim().toLowerCase()
-    if (label.startsWith('tool:') || detail === 'central_brain') {
-      return false
-    }
+    if (detail === 'central_brain') return false
    return step.status !== 'planning'
  })
 }
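
The `__all__` fall-through in agent/tools.go can be illustrated with a minimal, standalone sketch. This is not the repository's code: `toolNamesForDomain`, `fullToolset`, and `contains` are hypothetical stand-ins, and only the "market" subset is reproduced from the diff. It shows that `__all__`, the empty string, and any unrecognized domain resolve to the same 22 names, while an explicitly requested subset stays trimmed.

```go
package main

import "fmt"

// fullToolset copies the 22 names listed under `all` in the diff above.
var fullToolset = []string{
	"get_preferences", "manage_preferences",
	"get_decisions", "get_backend_logs",
	"get_exchange_configs", "manage_exchange_config",
	"get_model_configs", "manage_model_config",
	"get_strategies", "manage_strategy",
	"manage_trader",
	"get_balance", "get_positions", "get_trade_history",
	"get_candidate_coins",
	"get_watchlist", "manage_watchlist",
	"execute_trade",
	"get_market_snapshot", "get_market_price", "get_kline", "search_stock",
}

// toolNamesForDomain is a hypothetical, simplified stand-in for
// plannerToolNamesForDomain. Only the "market" subset is reproduced here.
func toolNamesForDomain(domain string) []string {
	switch domain {
	case "__all__", "":
		return fullToolset
	case "market":
		return []string{"get_market_snapshot", "get_market_price", "get_kline", "search_stock"}
	default:
		// Unrecognized domains also receive the full set.
		return fullToolset
	}
}

// contains reports whether want appears in names.
func contains(names []string, want string) bool {
	for _, n := range names {
		if n == want {
			return true
		}
	}
	return false
}

func main() {
	names := toolNamesForDomain("__all__")
	fmt.Println("tools exposed:", len(names))
	// A cross-domain question needs a market tool and a position tool together.
	for _, tool := range []string{"get_market_price", "get_positions"} {
		fmt.Printf("%s visible: %v\n", tool, contains(names, tool))
	}
}
```

Because the planner now always passes `__all__`, the trimmed subsets only matter to callers that request one explicitly, which is why the sketch keeps a "market" case alongside the default.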