Voice API

Introduction

The Voice API allows you to integrate Exei’s real-time AI Voice Agent directly into your own applications using APIs and WebSockets.

This channel is designed for developers who want full control over the voice experience outside of Exei’s default UI, for example when building custom voice assistants, call flows, kiosks, mobile apps, or internal tools.

What the Voice API Enables

Using the Voice API, you can:

Connect to Exei’s AI Voice Agent programmatically
Send real-time audio input (speech)
Receive real-time audio output (AI responses)
Handle speech-to-text (STT) and text-to-speech (TTS)
Manage interruptions and transcripts
Build custom voice experiences on top of Exei

All conversations created through the Voice API are tracked inside Exei.

Where to Find Voice API Configuration

To access the Voice API details:

Open your AI agent from My Agents
Go to Channels
Select Voice API

This page provides everything required to integrate the Voice API.

Voice API Credentials

The Voice API configuration page provides:

API Endpoint

This is the base endpoint used to connect to Exei’s Voice API.

Client ID

A unique identifier used to authenticate your Voice API requests.

Both values are required to establish a connection and should be kept secure.

How the Voice API Works

The Voice API uses WebSocket-based real-time communication.

A typical flow looks like this:

Establish a WebSocket connection using the API endpoint
Authenticate using the Client ID
Generate a session ID for the conversation
Send audio data (user speech) to Exei
Receive audio responses (AI speech) in real time
Handle transcripts and interruption events

This enables natural, low-latency voice conversations.

Step-by-Step Voice API Flow

1. Connect to the Voice API

Create a WebSocket connection using the provided API endpoint.
This connection is used for sending and receiving real-time audio data.
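
As an illustration of this step, here is a minimal Python sketch. It assumes the API endpoint shown on the configuration page is a ws:// or wss:// URL, and it uses the third-party websockets package as one possible client. The helper names check_endpoint and open_connection are hypothetical and not part of Exei.

```python
from urllib.parse import urlparse


def check_endpoint(endpoint: str) -> str:
    """Return the endpoint unchanged if it looks like a WebSocket URL."""
    parsed = urlparse(endpoint)
    # Assumption: the endpoint is given with a ws:// or wss:// scheme.
    if parsed.scheme not in ("ws", "wss") or not parsed.netloc:
        raise ValueError(f"not a WebSocket URL: {endpoint!r}")
    return endpoint


async def open_connection(endpoint: str):
    """Open the socket; the caller is responsible for closing it."""
    # Imported lazily so check_endpoint works without the dependency.
    import websockets  # third-party: pip install websockets

    return await websockets.connect(check_endpoint(endpoint))
```

Validating the URL before connecting surfaces a mistyped endpoint as a clear error rather than a failed handshake.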

2. Generate a Session ID

Each voice conversation requires a unique session ID.

The session ID:

Identifies the conversation
Keeps audio streams and transcripts in sync
Is required for tracking the session in Exei
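
One simple way to get a unique value per conversation is a random UUID. The sketch below assumes Exei accepts any unique string as a session ID; the required format is not stated here, so check the configuration page before relying on it.

```python
import uuid


def new_session_id() -> str:
    """Return a fresh random identifier for one voice conversation."""
    # Assumption: any unique string is accepted; UUID4 is a common choice.
    return str(uuid.uuid4())
```

Call new_session_id() once at the start of each conversation and keep the value for the lifetime of that conversation only.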

3. Initialize the Voice Session

Once connected, initialize the session by sending:

Session ID
Client ID
Any required configuration parameters

This tells Exei to start a new voice interaction.
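
The exact message format is not described on this page, so the following is only a sketch of what an initialization payload could look like if it is sent as JSON. The field names type, session_id, client_id, and config are assumptions made for illustration.

```python
import json
from typing import Optional


def build_init_message(
    session_id: str, client_id: str, config: Optional[dict] = None
) -> str:
    """Serialize a session-initialization message as JSON text."""
    # Hypothetical field names; replace them with the ones Exei documents.
    payload = {"type": "init", "session_id": session_id, "client_id": client_id}
    if config:
        payload["config"] = config
    return json.dumps(payload)
```

The resulting string would be sent as the first message on the open WebSocket connection.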

4. Send Audio for Speech-to-Text (STT)

Capture microphone audio from the user and send it to the Voice API.

The API:

Converts speech to text
Uses the text to generate an AI response
Supports real-time streaming audio
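
Streaming usually means sending audio in small frames rather than one large buffer. The sketch below splits raw PCM audio into fixed-duration frames. The sample rate, sample width, and frame length are assumptions chosen for illustration; use the audio format that Exei specifies.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumed sample rate
BYTES_PER_SAMPLE = 2     # assumed 16-bit mono PCM
FRAME_MS = 20            # assumed frame duration


def pcm_frames(pcm: bytes, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Yield consecutive frames of raw PCM; the last frame may be shorter."""
    frame_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]
```

Each frame yielded by pcm_frames would be sent over the WebSocket as it becomes available, which keeps latency low.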

5. Receive Audio Responses (TTS)

The Voice API streams back:

AI-generated audio responses
Partial or complete speech output

You can play this audio directly in your application.
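
Because audio arrives in pieces, an application typically queues incoming chunks and plays them in order. The sketch below assumes audio is delivered as binary WebSocket messages and everything else as text; that split is an assumption, and PlaybackBuffer is a hypothetical helper.

```python
from collections import deque
from typing import Deque, Union


class PlaybackBuffer:
    """FIFO of audio chunks waiting to be played."""

    def __init__(self) -> None:
        self._chunks: Deque[bytes] = deque()

    def push(self, message: Union[bytes, str]) -> bool:
        """Queue binary messages as audio; return False for anything else."""
        if isinstance(message, (bytes, bytearray)):
            self._chunks.append(bytes(message))
            return True
        return False

    def next_chunk(self) -> bytes:
        """Return the oldest queued chunk, or b"" when nothing is waiting."""
        return self._chunks.popleft() if self._chunks else b""
```

A playback loop would call next_chunk() repeatedly and hand each chunk to the audio output device.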

6. Handle Interrupts and Transcripts

The Voice API supports interruption handling.

If a user speaks while the AI is responding:

The API sends an interrupt event
Current audio playback can be stopped
The new input is processed immediately

Transcript events are also sent, allowing you to:

Display live text
Store conversation logs
Debug voice interactions
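
The handling described above can be sketched as a small event dispatcher. The event names interrupt and transcript and the text field are assumptions, since the actual event schema is not shown on this page.

```python
import json
from typing import List


class VoiceEventHandler:
    """Track queued audio and transcript lines for one session."""

    def __init__(self) -> None:
        self.pending_audio: List[bytes] = []
        self.transcript: List[str] = []

    def handle(self, raw_event: str) -> None:
        """Apply one JSON event; unknown event types are ignored."""
        event = json.loads(raw_event)
        kind = event.get("type")
        if kind == "interrupt":
            # The user spoke over the AI: drop audio not yet played.
            self.pending_audio.clear()
        elif kind == "transcript":
            self.transcript.append(event.get("text", ""))
```

Clearing the queued audio on an interrupt is what makes playback stop promptly; the transcript list can then be displayed live or written to a log.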

Voice Settings Used by the API

The Voice API respects the agent’s voice configuration.

Voice behavior is controlled from Channels → Voice in Exei, including:

Language
Accent
Speech style
Voice model

Any changes made there automatically apply to Voice API interactions.

Conversations & Tracking

All conversations created via the Voice API:

Appear in the Conversations section
Include full transcripts
Support feedback and Instant Retrain
Can be reviewed for analytics and debugging
Are included in Insights (based on plan availability)

Voice API conversations are treated the same as conversations from other voice channels.

Security Best Practices

Keep API endpoints and Client IDs private
Do not expose credentials in client-side code
Use secure server-side handling where possible
Rotate credentials if compromised
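
One common way to keep credentials out of source code is to read them from environment variables on the server. In the sketch below, the variable names EXEI_VOICE_ENDPOINT and EXEI_VOICE_CLIENT_ID are hypothetical; choose whatever names fit your deployment.

```python
import os
from typing import Mapping, Tuple


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (endpoint, client_id) read from the given environment."""
    try:
        # Hypothetical variable names, set on the server and never in client code.
        return env["EXEI_VOICE_ENDPOINT"], env["EXEI_VOICE_CLIENT_ID"]
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable {missing}") from None
```

Failing fast when a variable is missing avoids connecting with an empty or partial credential.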

Best Practices for Voice API Integration

Generate a new session ID for each conversation
Handle interruptions gracefully
Monitor transcripts for accuracy
Test with different accents and languages
Log errors and fallback events

Common Mistakes to Avoid

Reusing session IDs across conversations
Not handling interrupt events
Sending unsupported audio formats
Hardcoding credentials in public clients

When to Use the Voice API

Use the Voice API when:

You are building a custom voice application
You need full control over the UI and flow
You are integrating Exei into external systems
Default website or VoIP voice channels are not sufficient

What’s Next?

Once the Voice API is integrated, you can enhance and optimize the experience.

Recommended next guides:
