Agentic Voice

Speech for AI apps.
TTS + STT in one library. Zero dependencies.
OpenAI-compatible. Push-to-talk built in.
GitHub β†’ API Reference
Quick Start
Three lines to speak
Complete voice setup
const voice = AgenticVoice.createVoice({
  tts: { baseUrl: 'https://api.openai.com', apiKey: 'sk-...', voice: 'alloy' },
  stt: { mode: 'browser' }, // or 'whisper'
})

// Speak
await voice.speak('Hello world')

// Listen
voice.startListening()
voice.on('transcript', text => console.log(text))
Two engines
Speak and listen
πŸ”Š

Text-to-Speech

OpenAI-compatible TTS API. AudioContext playback with Audio element fallback.

  • Generation tracking β€” no stale audio
  • Retry with exponential backoff
  • CORS proxy support
  • Auto-stops when user starts recording
  • AudioContext unlock helper
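The retry behavior can be sketched like this. A minimal illustration only: the helper name, retry count, and delay values are assumptions, not the library's internals.

```javascript
// Retry a failing async call with exponential backoff:
// delays double each attempt (200ms, 400ms, 800ms, ...).
async function retryWithBackoff(fn, retries = 3, baseMs = 200) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err) {
      if (attempt >= retries) throw err // give up after the last retry
      await new Promise(r => setTimeout(r, baseMs * 2 ** attempt))
    }
  }
}
```

Backoff keeps transient TTS endpoint hiccups from surfacing as errors while avoiding a hot retry loop.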
🎀

Speech-to-Text

Two modes: Web Speech API (free, no key) or Whisper API (accurate, any language).

  • Push-to-talk with minimum hold time
  • webm β†’ wav conversion for Whisper
  • MediaRecorder + Web Speech dual path
  • Automatic language detection
  • Direct blob transcription API
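The core of a webm β†’ wav conversion is writing a RIFF/WAVE header in front of the decoded PCM samples. A sketch of that step, assuming 16-bit mono output; the function name and layout choices are illustrative, not the library's actual converter.

```javascript
// Encode Float32 PCM samples into a 16-bit mono WAV buffer.
function pcmToWav(samples, sampleRate) {
  const buf = new ArrayBuffer(44 + samples.length * 2)
  const view = new DataView(buf)
  const writeStr = (off, s) =>
    [...s].forEach((c, i) => view.setUint8(off + i, c.charCodeAt(0)))

  writeStr(0, 'RIFF')
  view.setUint32(4, 36 + samples.length * 2, true)  // file size - 8
  writeStr(8, 'WAVE')
  writeStr(12, 'fmt ')
  view.setUint32(16, 16, true)                      // fmt chunk size
  view.setUint16(20, 1, true)                       // PCM format
  view.setUint16(22, 1, true)                       // mono
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, sampleRate * 2, true)          // byte rate
  view.setUint16(32, 2, true)                       // block align
  view.setUint16(34, 16, true)                      // bits per sample
  writeStr(36, 'data')
  view.setUint32(40, samples.length * 2, true)

  // Clamp floats to [-1, 1] and scale to signed 16-bit integers.
  samples.forEach((s, i) =>
    view.setInt16(44 + i * 2, Math.max(-1, Math.min(1, s)) * 0x7fff, true))
  return buf
}
```

Whisper endpoints accept wav universally, which is why recording in webm and converting before upload is the safe path.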
Standalone
Use what you need
TTS only
const tts = AgenticVoice.createTTS({ apiKey: '...', voice: 'nova' })
await tts.speak('δΈ€ε₯θ―ε°±ε€ŸδΊ†') // "One sentence is enough"
tts.stop()
STT only
const stt = AgenticVoice.createSTT({ mode: 'whisper', apiKey: '...' })
stt.startListening(
  text => console.log('You said:', text),
  err => console.error(err)
)
stt.stopListening()
API
Voice instance
Method              Description
speak(text, opts?)  Speak text aloud. Async; resolves when done.
stop()              Stop speaking immediately.
startListening()    Start recording. Emits 'transcript' when done.
stopListening()     Stop recording and trigger transcription.
transcribe(blob)    Transcribe an audio Blob directly via the Whisper API.
unlock()            Unlock the AudioContext. Call on the first user gesture.
on(event, fn)       Listen for 'transcript', 'speaking', 'listening', 'error'.
isSpeaking          Boolean: currently playing audio?
isListening         Boolean: currently recording?
destroy()           Release the AudioContext and stop everything.
Design
Why one library
πŸ”„
Mutual Exclusion
TTS auto-stops when recording starts. No audio collision. No echo loops.
⏱
Generation Tracking
Each speak() call gets a generation ID. Stale responses never play over fresh ones.
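The idea behind generation tracking can be sketched in a few lines. This is an illustration of the pattern, not the library's implementation; all names here are hypothetical.

```javascript
// Each speak() claims a fresh generation; a fetched response only
// plays if no newer speak() has superseded it in the meantime.
class GenerationGate {
  constructor() { this.current = 0 }
  next() { return ++this.current }
  isStale(gen) { return gen !== this.current }
}

async function speakOnce(gate, text, fetchAudio, play) {
  const gen = gate.next()              // claim a fresh generation
  const audio = await fetchAudio(text) // network round-trip
  if (gate.isStale(gen)) return false  // a newer call won; drop this audio
  play(audio)
  return true
}
```

With this gate, a slow response to an old speak() silently discards itself instead of playing over the newer one.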
πŸ”Œ
OpenAI-Compatible
Works with any OpenAI-compatible TTS/Whisper endpoint. Local, cloud, self-hosted.
🎯
Push-to-Talk
Hold to record, release to transcribe. Minimum hold time prevents accidental triggers.
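The minimum-hold-time guard amounts to timestamping the press and discarding releases that come too soon. A sketch, assuming a 300ms threshold; the helper and threshold are illustrative, not the library's API.

```javascript
// Treat a too-short press-and-release as accidental and discard it.
// `now` is injectable so the logic is testable without real time.
function createHoldGuard(minHoldMs = 300, now = Date.now) {
  let pressedAt = null
  return {
    press() { pressedAt = now() },
    // Returns true only when the hold was long enough to count.
    release() {
      if (pressedAt === null) return false
      const held = now() - pressedAt
      pressedAt = null
      return held >= minHoldMs
    },
  }
}
```

Wired to pointer events, `press()` maps to pointerdown/startListening and a truthy `release()` decides whether stopListening should trigger transcription.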
πŸ”€
Dual STT Modes
Web Speech API for zero-cost, Whisper API for accuracy. Switch with one config change.
πŸ“¦
Zero Dependencies
One file. UMD format. Works with script tags, CommonJS, and ESM. No build step.
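The script-tag / CommonJS / module triple comes from the standard UMD wrapper pattern, which a single file can use to detect its environment at load time. A generic sketch of that pattern; the exported placeholder is hypothetical, not the actual build output.

```javascript
// Minimal UMD wrapper: one file that serves CommonJS, AMD loaders,
// and a browser global, depending on what the environment provides.
;(function (root, factory) {
  if (typeof module === 'object' && module.exports) {
    module.exports = factory()           // CommonJS / Node
  } else if (typeof define === 'function' && define.amd) {
    define(factory)                      // AMD loaders
  } else {
    root.AgenticVoice = factory()        // browser global via <script>
  }
})(typeof self !== 'undefined' ? self : globalThis, function () {
  // Placeholder export for illustration only.
  return { version: '0.0.0-sketch' }
})
```

The same detection order is why no build step is needed: bundlers see `module.exports`, browsers fall through to the global.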
With Claw
Voice-enabled agent in 10 lines
agentic-claw + agentic-voice
const claw = AgenticClaw.createClaw({ apiKey, skills: ['calculate'] })
const voice = AgenticVoice.createVoice({
  tts: { apiKey: ttsKey, voice: 'nova' },
  stt: { mode: 'browser' },
})

voice.on('transcript', async text => {
  const { answer } = await claw.chat(text)
  await voice.speak(answer)
})
Ecosystem
Part of the agentic family
🧠
agentic-core
LLM + vision. The brain.
πŸ‘
agentic-sense
Perception engine. The eyes.
⚑
agentic-act
Intent β†’ action. The will.
🎨
agentic-render
Dynamic UI. The expression.
πŸ—£οΈ
agentic-voice
TTS + STT. The voice.
πŸ’­
agentic-memory
Context + retrieval. The memory.
πŸ“¦
agentic-store
SQLite persistence. Long-term storage.
πŸ¦€
agentic-claw
Runtime + skills. The body.