User Guide · 10

Voice Interaction

10.1 Voice Input (Chat Box)

  • Click the microphone button to start recording
  • Click it again to stop; the recording is transcribed automatically and the text is placed in the input field
  • Press Esc while recording to cancel (nothing is sent)
  • There is no global keyboard shortcut, so that accessibility focus navigation is not disrupted
  • Transcription requires STT_API_KEY (the default provider is OpenAI Whisper)
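
The click and Esc behavior described above can be read as a two-state toggle. The following is a minimal sketch of that logic, not the app's actual implementation: the class name, state names, and the injected transcribe callable are all hypothetical, and whether the real input field is replaced or appended to is not specified in this guide (the sketch replaces it).

```python
from enum import Enum
from typing import Callable


class MicState(Enum):
    IDLE = "idle"
    RECORDING = "recording"


class VoiceInput:
    """Hypothetical model of the chat box microphone button."""

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self._transcribe = transcribe  # stand-in for the speech-to-text call
        self.state = MicState.IDLE
        self.input_field = ""          # text shown in the chat input
        self._audio = b""

    def click_mic(self) -> None:
        if self.state is MicState.IDLE:
            # First click: start a fresh recording.
            self._audio = b""
            self.state = MicState.RECORDING
        else:
            # Second click: stop and put the transcript in the input field.
            # Assumption: the transcript replaces the field's contents.
            self.state = MicState.IDLE
            self.input_field = self._transcribe(self._audio)

    def feed_audio(self, chunk: bytes) -> None:
        # Audio is only buffered while a recording is in progress.
        if self.state is MicState.RECORDING:
            self._audio += chunk

    def press_esc(self) -> None:
        if self.state is MicState.RECORDING:
            # Cancel: discard the audio, skip transcription, leave the field alone.
            self._audio = b""
            self.state = MicState.IDLE
```

Note that Esc is handled only while recording, which matches the absence of a global shortcut: outside a recording, the key press is left for normal focus navigation.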

10.2 Realtime Voice Conversation (S2S WebSocket)

  • Connects directly to the OpenAI Realtime API over WebSocket
  • Supports bidirectional streaming voice (you speak → the AI listens → the AI streams its reply)
  • Tool calls are handled transparently, with the tools executed on the backend
  • Override the API key and endpoint with S2S_API_KEY / S2S_BASE_URL
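
As an illustration of how the S2S_API_KEY / S2S_BASE_URL override might be resolved, here is a hedged sketch. Only the two S2S_* variable names come from this guide. The fallback variable OPENAI_API_KEY, the default URL, and the function and type names are assumptions made for the example and may not match the backend.

```python
import os
from typing import Mapping, NamedTuple, Optional

# Assumed defaults for illustration; not taken from this guide.
DEFAULT_S2S_BASE_URL = "wss://api.openai.com/v1/realtime"
FALLBACK_KEY_VAR = "OPENAI_API_KEY"  # hypothetical fallback variable


class S2SConfig(NamedTuple):
    api_key: Optional[str]
    base_url: str


def resolve_s2s_config(env: Mapping[str, str] = os.environ) -> S2SConfig:
    """Prefer the S2S_* overrides, then fall back to the assumed defaults."""
    # Empty strings are treated the same as unset variables.
    api_key = env.get("S2S_API_KEY") or env.get(FALLBACK_KEY_VAR) or None
    base_url = env.get("S2S_BASE_URL") or DEFAULT_S2S_BASE_URL
    # Strip a trailing slash so paths can be appended consistently.
    return S2SConfig(api_key, base_url.rstrip("/"))
```

Passing the environment as a mapping keeps the function easy to check with a plain dict, for example resolve_s2s_config({"S2S_API_KEY": "sk-test"}).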

⚠️ The realtime voice UI in the current React frontend is still under active development. In the meantime, you can test the feature by connecting a third-party WebSocket client to the /api/voice/ws endpoint.
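
A connectivity check against /api/voice/ws could look like the sketch below. It assumes the third-party websockets package (pip install websockets) and a server at http://localhost:8000; both are assumptions, as are the helper names. The message format used by the endpoint and any authentication it requires are not documented here, so the probe only confirms that the connection is accepted and prints the first frame, if one arrives.

```python
import asyncio
from urllib.parse import urlsplit, urlunsplit


def voice_ws_url(server: str) -> str:
    """Turn an HTTP(S) server address into the /api/voice/ws WebSocket URL."""
    parts = urlsplit(server)
    scheme = "wss" if parts.scheme == "https" else "ws"
    # Any path on the server address is dropped in favor of the endpoint path.
    return urlunsplit((scheme, parts.netloc, "/api/voice/ws", "", ""))


async def probe(server: str, timeout: float = 5.0) -> None:
    # Third-party dependency, imported lazily so the URL helper
    # remains usable without it.
    import websockets

    async with websockets.connect(voice_ws_url(server)) as ws:
        print("connected")
        try:
            message = await asyncio.wait_for(ws.recv(), timeout)
            print("first message:", message)
        except asyncio.TimeoutError:
            print("no message within", timeout, "seconds")


# To try it against a local server (assumed address):
#   asyncio.run(probe("http://localhost:8000"))
```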