JellyfishBot/ Docs

User Guide · 10

Voice Interaction

10.1 Voice Input (Chat Box)

Click the microphone button to start recording
Click it again to stop; the recording is transcribed automatically into the input field
Press Esc during recording to cancel (nothing is transcribed or sent)
There is no global keyboard shortcut, which avoids interfering with accessibility focus navigation
Transcription requires STT_API_KEY (the default provider is OpenAI Whisper)
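
The click and Esc behavior above amounts to a two-state machine. The sketch below models it in Python purely to make the transitions explicit; it is not the frontend code. The class and method names are hypothetical, the `transcript` argument stands in for the speech-to-text result, and appending to the input field (rather than replacing its contents) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"


@dataclass
class VoiceInput:
    """Toy model of the chat-box microphone button (hypothetical names)."""

    state: State = State.IDLE
    input_field: str = ""

    def click_mic(self, transcript: str = "") -> None:
        # First click starts recording; the next click stops and transcribes.
        if self.state is State.IDLE:
            self.state = State.RECORDING
        else:
            self.state = State.IDLE
            # Assumption: the transcript is appended to the input field.
            self.input_field += transcript

    def press_esc(self) -> None:
        # Esc only matters while recording: drop the audio, keep the field as-is.
        if self.state is State.RECORDING:
            self.state = State.IDLE


box = VoiceInput()
box.click_mic()                # start recording
box.click_mic("hello world")   # stop: the text lands in the input field
box.click_mic()                # start a second recording
box.press_esc()                # cancel: the input field is unchanged
print(box.input_field)
```

Note that in both paths the message itself is never submitted; the user still reviews the input field and sends it manually.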

10.2 Realtime Voice Conversation (S2S WebSocket)

Connects directly to the OpenAI Realtime API over WebSocket
Supports bidirectional streaming voice (you speak → the AI listens → the AI streams its reply)
Transparent tool calls, with tool execution on the backend
Override the API key and base URL with S2S_API_KEY and S2S_BASE_URL
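
One way to read the override rule is "use the S2S_* value when it is set, otherwise fall back to a general default". The sketch below illustrates that reading only. The fallback variable names OPENAI_API_KEY and OPENAI_BASE_URL and the function name `resolve_s2s_config` are assumptions for illustration, not documented JellyfishBot behavior.

```python
import os
from typing import Dict, Mapping

# OpenAI's public API base URL, used here as the last-resort default.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_s2s_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Pick the key and base URL for realtime voice (hypothetical helper)."""
    # S2S_* values win when present and non-empty.
    # Assumption: OPENAI_API_KEY / OPENAI_BASE_URL are the values being overridden.
    api_key = env.get("S2S_API_KEY") or env.get("OPENAI_API_KEY")
    base_url = (
        env.get("S2S_BASE_URL")
        or env.get("OPENAI_BASE_URL")
        or DEFAULT_BASE_URL
    )
    if not api_key:
        raise RuntimeError("no API key configured for realtime voice")
    return {"api_key": api_key, "base_url": base_url.rstrip("/")}


# Example with an explicit mapping instead of the real environment.
print(resolve_s2s_config({"OPENAI_API_KEY": "general", "S2S_API_KEY": "voice-only"}))
```

Passing the environment as a mapping keeps the lookup easy to check without touching real credentials.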

⚠️ The realtime voice UI in the current React frontend is still under active development. In the meantime, you can test the feature by connecting a third-party WebSocket client to the /api/voice/ws endpoint.
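
As a rough starting point for such a test, the sketch below derives the WebSocket URL from the server's HTTP address and opens a connection with the third-party `websockets` package. The address `http://localhost:8000` and both function names are hypothetical, the message format is not specified here so the probe only reports what arrives, and any authentication the endpoint may require is not handled.

```python
import asyncio
from urllib.parse import urlsplit, urlunsplit


def voice_ws_url(base_url: str) -> str:
    """Map an http(s) server address to the ws(s) URL of /api/voice/ws."""
    parts = urlsplit(base_url)
    scheme = "wss" if parts.scheme == "https" else "ws"
    # Any path on base_url is dropped; only the host and port are reused.
    return urlunsplit((scheme, parts.netloc, "/api/voice/ws", "", ""))


async def probe(base_url: str, seconds: float = 5.0) -> None:
    """Connect and print the type and size of each message until idle."""
    import websockets  # third-party: pip install websockets

    url = voice_ws_url(base_url)
    async with websockets.connect(url) as ws:
        print("connected to", url)
        try:
            while True:
                message = await asyncio.wait_for(ws.recv(), timeout=seconds)
                kind = "binary" if isinstance(message, bytes) else "text"
                print(kind, len(message))
        except asyncio.TimeoutError:
            print(f"no message for {seconds} seconds, closing")


print(voice_ws_url("http://localhost:8000"))
# To run the probe against a live server (hypothetical address):
# asyncio.run(probe("http://localhost:8000"))
```

A successful connection confirms that the endpoint is reachable; exercising a full voice round trip additionally requires sending audio in whatever format the backend expects.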
