Agentic Voice

Speech for AI apps.
TTS + STT in one library. Zero dependencies.
OpenAI-compatible. Push-to-talk built in.
GitHub β†’ API Reference
Quick Start
Three lines to speak
Complete voice setup
const voice = AgenticVoice.createVoice({
  tts: { baseUrl: 'https://api.openai.com', apiKey: 'sk-...', voice: 'alloy' },
  stt: { mode: 'browser' }, // or 'whisper'
})

// Speak
await voice.speak('Hello world')

// Listen
voice.startListening()
voice.on('transcript', text => console.log(text))
Two engines
Speak and listen
πŸ”Š

Text-to-Speech

OpenAI-compatible TTS API. AudioContext playback with Audio element fallback.

  • Generation tracking β€” no stale audio
  • Retry with exponential backoff
  • CORS proxy support
  • Auto-stops when user starts recording
  • AudioContext unlock helper
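The retry behavior can be sketched like this. A minimal illustration only: the helper name, retry count, and delay values are assumptions, not the library's internals.

```javascript
// Retry a failing async call with exponential backoff:
// delays double each attempt (200ms, 400ms, 800ms, ...).
async function retryWithBackoff(fn, retries = 3, baseMs = 200) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err) {
      if (attempt >= retries) throw err // give up after the last retry
      await new Promise(r => setTimeout(r, baseMs * 2 ** attempt))
    }
  }
}
```

Backoff keeps transient TTS endpoint hiccups from surfacing as errors while avoiding a hot retry loop.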
🎀

Speech-to-Text

Two modes: Web Speech API (free, no key) or Whisper API (accurate, any language).

  • Push-to-talk with minimum hold time
  • webm β†’ wav conversion for Whisper
  • MediaRecorder + Web Speech dual path
  • Automatic language detection
  • Direct blob transcription API
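The core of a webm β†’ wav conversion is writing a RIFF/WAVE header in front of the decoded PCM samples. A sketch of that step, assuming 16-bit mono output; the function name and layout choices are illustrative, not the library's actual converter.

```javascript
// Encode Float32 PCM samples into a 16-bit mono WAV buffer.
function pcmToWav(samples, sampleRate) {
  const buf = new ArrayBuffer(44 + samples.length * 2)
  const view = new DataView(buf)
  const writeStr = (off, s) =>
    [...s].forEach((c, i) => view.setUint8(off + i, c.charCodeAt(0)))

  writeStr(0, 'RIFF')
  view.setUint32(4, 36 + samples.length * 2, true)  // file size - 8
  writeStr(8, 'WAVE')
  writeStr(12, 'fmt ')
  view.setUint32(16, 16, true)                      // fmt chunk size
  view.setUint16(20, 1, true)                       // PCM format
  view.setUint16(22, 1, true)                       // mono
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, sampleRate * 2, true)          // byte rate
  view.setUint16(32, 2, true)                       // block align
  view.setUint16(34, 16, true)                      // bits per sample
  writeStr(36, 'data')
  view.setUint32(40, samples.length * 2, true)

  // Clamp floats to [-1, 1] and scale to signed 16-bit integers.
  samples.forEach((s, i) =>
    view.setInt16(44 + i * 2, Math.max(-1, Math.min(1, s)) * 0x7fff, true))
  return buf
}
```

Whisper endpoints accept wav universally, which is why recording in webm and converting before upload is the safe path.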
Standalone
Use what you need
TTS only
const tts = AgenticVoice.createTTS({ apiKey: '...', voice: 'nova' })
await tts.speak('δΈ€ε₯θ―ε°±ε€ŸδΊ†') // "One sentence is enough"
tts.stop()
STT only
const stt = AgenticVoice.createSTT({ mode: 'whisper', apiKey: '...' })
stt.startListening(
  text => console.log('You said:', text),
  err => console.error(err)
)
stt.stopListening()
API
Voice instance
Method              Description
speak(text, opts?)  Speak text aloud. Async; resolves when done.
stop()              Stop speaking immediately.
startListening()    Start recording. Emits 'transcript' when done.
stopListening()     Stop recording and trigger transcription.
transcribe(blob)    Transcribe an audio Blob directly via the Whisper API.
unlock()            Unlock the AudioContext. Call on the first user gesture.
on(event, fn)       Listen for 'transcript', 'speaking', 'listening', 'error'.
isSpeaking          Boolean: currently playing audio?
isListening         Boolean: currently recording?
destroy()           Release the AudioContext and stop everything.
Design
Why one library
πŸ”„
Mutual Exclusion
TTS auto-stops when recording starts. No audio collision. No echo loops.
⏱
Generation Tracking
Each speak() call gets a generation ID. Stale responses never play over fresh ones.
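The idea behind generation tracking can be sketched in a few lines. This is an illustration of the pattern, not the library's implementation; all names here are hypothetical.

```javascript
// Each speak() claims a fresh generation; a fetched response only
// plays if no newer speak() has superseded it in the meantime.
class GenerationGate {
  constructor() { this.current = 0 }
  next() { return ++this.current }
  isStale(gen) { return gen !== this.current }
}

async function speakOnce(gate, text, fetchAudio, play) {
  const gen = gate.next()              // claim a fresh generation
  const audio = await fetchAudio(text) // network round-trip
  if (gate.isStale(gen)) return false  // a newer call won; drop this audio
  play(audio)
  return true
}
```

With this gate, a slow response to an old speak() silently discards itself instead of playing over the newer one.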
πŸ”Œ
OpenAI-Compatible
Works with any OpenAI-compatible TTS/Whisper endpoint. Local, cloud, self-hosted.
🎯
Push-to-Talk
Hold to record, release to transcribe. Minimum hold time prevents accidental triggers.
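The minimum-hold-time guard amounts to timestamping the press and discarding releases that come too soon. A sketch, assuming a 300ms threshold; the helper and threshold are illustrative, not the library's API.

```javascript
// Treat a too-short press-and-release as accidental and discard it.
// `now` is injectable so the logic is testable without real time.
function createHoldGuard(minHoldMs = 300, now = Date.now) {
  let pressedAt = null
  return {
    press() { pressedAt = now() },
    // Returns true only when the hold was long enough to count.
    release() {
      if (pressedAt === null) return false
      const held = now() - pressedAt
      pressedAt = null
      return held >= minHoldMs
    },
  }
}
```

Wired to pointer events, `press()` maps to pointerdown/startListening and a truthy `release()` decides whether stopListening should trigger transcription.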
πŸ”€
Dual STT Modes
Web Speech API for zero-cost, Whisper API for accuracy. Switch with one config change.
πŸ“¦
Zero Dependencies
One file. UMD format. Works with script tags, CommonJS, and ESM. No build step.
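The script-tag / CommonJS / module triple comes from the standard UMD wrapper pattern, which a single file can use to detect its environment at load time. A generic sketch of that pattern; the exported placeholder is hypothetical, not the actual build output.

```javascript
// Minimal UMD wrapper: one file that serves CommonJS, AMD loaders,
// and a browser global, depending on what the environment provides.
;(function (root, factory) {
  if (typeof module === 'object' && module.exports) {
    module.exports = factory()           // CommonJS / Node
  } else if (typeof define === 'function' && define.amd) {
    define(factory)                      // AMD loaders
  } else {
    root.AgenticVoice = factory()        // browser global via <script>
  }
})(typeof self !== 'undefined' ? self : globalThis, function () {
  // Placeholder export for illustration only.
  return { version: '0.0.0-sketch' }
})
```

The same detection order is why no build step is needed: bundlers see `module.exports`, browsers fall through to the global.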
With Claw
Voice-enabled agent in 10 lines
agentic-claw + agentic-voice
const claw = AgenticClaw.createClaw({ apiKey, skills: ['calculate'] })
const voice = AgenticVoice.createVoice({
  tts: { apiKey: ttsKey, voice: 'nova' },
  stt: { mode: 'browser' },
})

voice.on('transcript', async text => {
  const { answer } = await claw.chat(text)
  await voice.speak(answer)
})
Ecosystem
Part of the agentic family
🧠
agentic-core
LLM + vision. The brain.
πŸ‘
agentic-sense
Perception engine. The eyes.
⚑
agentic-act
Intent β†’ action. The will.
🎨
agentic-render
Dynamic UI. The expression.
πŸ—£οΈ
agentic-voice
TTS + STT. The voice.
πŸ’­
agentic-memory
Context + retrieval. The memory.
πŸ“¦
agentic-store
SQLite persistence. Long-term storage.
πŸ¦€
agentic-claw
Runtime + skills. The body.