User Guide · 10
Voice Interaction
10.1 Voice Input (Chat Box)
- Click the microphone button to start recording
- Click it again to stop; the recording is transcribed automatically into the input field
- Press Esc while recording to cancel without sending
- There is no global keyboard shortcut for voice input, so it does not interfere with accessibility focus navigation
- Transcription requires STT_API_KEY (the speech-to-text provider defaults to OpenAI Whisper)
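The recording flow above amounts to a two-state machine: idle and recording. The sketch below is a hypothetical model of that behavior, not the frontend's actual code. The names VoiceInput, RecState and the injected transcribe callback are assumptions. The callback stands in for the STT call that uses STT_API_KEY, so the flow can be exercised without a network connection.

```python
from enum import Enum, auto
from typing import Callable


class RecState(Enum):
    IDLE = auto()
    RECORDING = auto()


class VoiceInput:
    """Hypothetical model of the chat-box microphone button.

    `transcribe` is an assumed stand-in for the speech-to-text call
    (OpenAI Whisper by default, authenticated with STT_API_KEY).
    """

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self.transcribe = transcribe
        self.state = RecState.IDLE
        self.buffer = bytearray()
        self.input_field = ""

    def on_audio(self, chunk: bytes) -> None:
        # Audio is captured only while recording; chunks arriving while idle are ignored.
        if self.state is RecState.RECORDING:
            self.buffer.extend(chunk)

    def click_mic(self) -> None:
        if self.state is RecState.IDLE:
            # First click: start a fresh recording.
            self.buffer.clear()
            self.state = RecState.RECORDING
        else:
            # Second click: stop and transcribe into the input field.
            # This sketch replaces the field's text; the real UI may append instead.
            self.state = RecState.IDLE
            self.input_field = self.transcribe(bytes(self.buffer))

    def press_esc(self) -> None:
        # Esc cancels: buffered audio is dropped and transcribe is never called.
        if self.state is RecState.RECORDING:
            self.buffer.clear()
            self.state = RecState.IDLE
```

Keeping the transcriber injectable means the cancel path can be checked directly: after Esc, the callback has not been invoked and the input field is unchanged.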
10.2 Realtime Voice Conversation (S2S WebSocket)
- Connects directly to the OpenAI Realtime API over WebSocket
- Bidirectional streaming voice: you speak, the AI listens, and the AI streams its reply back
- Transparent tool calls, with the tools executed on the backend
- The API key and base URL can be overridden with S2S_API_KEY and S2S_BASE_URL
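Backend tool execution can be pictured as a small dispatcher that sits on the Realtime event stream. The sketch below assumes the function-calling events described in OpenAI's public Realtime API docs: the server emits response.function_call_arguments.done, and the client replies with a function_call_output item followed by response.create. The tools registry, the {"ok": ..., "result"/"error": ...} output shape, and the name handle_server_event are hypothetical. This guide does not document how the backend actually structures this.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical registry of backend tools: tool name -> Python callable.
ToolFn = Callable[..., Any]


def handle_server_event(event: Dict[str, Any], tools: Dict[str, ToolFn]) -> List[Dict[str, Any]]:
    """Return the client events to send back for one Realtime server event.

    Only completed function calls are handled here; any other event
    (audio, transcripts, ...) produces no reply from this function.
    """
    if event.get("type") != "response.function_call_arguments.done":
        return []

    name = event["name"]
    try:
        # The model sends tool arguments as a JSON-encoded string.
        args = json.loads(event.get("arguments") or "{}")
        if name not in tools:
            raise ValueError(f"unknown tool: {name}")
        output = json.dumps({"ok": True, "result": tools[name](**args)})
    except Exception as exc:
        # Report failures back to the model instead of dropping the session.
        output = json.dumps({"ok": False, "error": str(exc)})

    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],
                "output": output,
            },
        },
        # Ask the model to continue its spoken reply now that it has the tool output.
        {"type": "response.create"},
    ]
```

Because the function only maps one incoming event to outgoing events, it can be unit-tested without opening a WebSocket. The caller is responsible for serializing the returned dicts and sending them on the socket.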
⚠️ The realtime voice UI in the React frontend is still being iterated on. In the meantime, you can test the /api/voice/ws WebSocket endpoint with a third-party WebSocket client.