agentic-act — AI Intent Execution Engine

Core Concept

Sense → Think → Act

tools 是输入扩展 — AI 调外部能力来获取信息。

actions 是输出扩展 — AI 选外部通道来输出信息。

act 是 AI 意志的投射，不是能力的延伸。输入一个意图，act 从注册的行动选项中选择最合适的一个，填好参数返回。执行是你的事。

所有 AI 产品的交互方式都是固定的 — ChatGPT 永远是文字框，Siri 永远是语音。但现实中人与人的沟通方式是动态选择的：看到你在忙就发微信不打电话。act 让 AI 也能做这个判断。

📝

多模态输入

text / image / audio / sense / 任意字段

↓

🧠

LLM 决策

理解意图 + 感知环境 + 匹配 actions

↓

⚡

结构化输出

action id + params + reason + confidence

Quick Start

Three lines to decide

const act = new AgenticAct({
  provider: 'openai',
  apiKey: 'sk-...',

  // 路由准则 — 告诉 AI 怎么思考、怎么选
  prompt: `他工作时极度讨厌被打断，除非紧急一律静默排队。
    做饭时喜欢听语音播报。晚上 11 点后不要用语音。`,

  actions: [
    {
      id: 'voice',
      name: '语音播报',
      description: '通过音箱语音告知。适合用户没在看屏幕的场景',
      schema: { content: 'string', volume: 'low|normal' },
      handler: async (params) => speaker.say(params.content)
    }
  ]
})

// 只决策
const decision = await act.decide({ text: '外卖到了', image: cameraFrame })
// → { action: 'voice', params: { content: '外卖到楼下了', volume: 'low' } }

// 决策 + 执行
const result = await act.run({ text: '会议要开始了' })
// → { action: 'voice', params: {...}, executed: true }

Design

Built for intent, not implementation

🎯

意图驱动

输入的不是"用音箱播语音"，而是"让他知道外卖到了"。怎么做，act 自己决定。

📡

多模态输入

text / image / audio / sensor data — 全部可选，至少一个。任意字段自动归一化。

🔌

Actions 注册表

声明式定义可选行动。id + name + description + schema + handler。consumer 教 act 决策。

🧠

decide() / run()

decide 只决策不执行。run 决策后自动调用 handler。两种模式，一个接口。

🔗

agentic-core 驱动

内部使用 agentic-core 调用 LLM。OpenAI / Anthropic 双 provider，vision 原生支持。

📦

零依赖

单文件 ~5KB。浏览器 script 标签或 Node.js require 直接用。

Philosophy

Why act exists

Tools vs Actions
输入 ↔ 输出
Tools 是 AI 调外部能力来获取信息 — 搜索、查询、计算。Actions 是 AI 选择外部通道来输出信息 — 说话、通知、亮灯。方向相反，抽象对称。

Framework shapes thinking

框架决定思维

如果把 act 当 tool 实现，developer 会想"给 AI 加个发通知的工具"。用 act，developer 会想"AI 该怎么跟人说话"。前者是能力视角，后者是意图视角。

Ecosystem

Part of the agentic family

🧠

agentic-core

LLM + vision. The brain.

👁

agentic-sense

Perception engine. The eyes.

⚡

agentic-act

Intent → action. The will.

🎨

agentic-render

Dynamic UI. The expression.

🗣️

agentic-voice

TTS + STT. The voice.

💭

agentic-memory

Context + retrieval. The memory.

📦

agentic-store

SQLite persistence. Long-term storage.

🦀

agentic-claw

Runtime + skills. The body.

Agentic Act