Voice AI

Voice Agent

Voice Agent is a production-grade conversational AI proof of concept that demonstrates a sub-1.2-second round-trip from end-of-utterance to first audio byte — the latency budget that separates a natural-feeling agent from one that feels broken. The platform combines LiveKit Cloud (WebRTC SFU + Python agent worker), a LangGraph state machine for conversation flow, FastMCP for tool exposure, Sarvam for Indian-language speech, Groq for English speech, and OpenAI for language understanding. PostgreSQL persists session memory; Redis holds live state. The architecture follows the latency budget, silence-handling, and conversation-design patterns from the Building Intelligent Voice Agents guide.

v1.0 · 2026-05-08

Detail-page tour · LiveKit + LangGraph + multi-language voice stack

<1.2s

Round-trip latency

<1.1s

Max silence

Multi-lang

EN + Indian

WebRTC

Audio transport

What you can do

Core capabilities

Hands-on features available when you launch the live demo.

Sub-1.2s round-trip latency

The end-to-end pipeline is tuned to keep the time from the end of the caller's utterance to the first audio byte of the agent's response under the 1.2-second budget that separates natural conversation from a broken-feeling agent.
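One way to reason about the 1.2-second budget is to split it across pipeline stages and check that the allocations still fit. The sketch below is illustrative only: the stage names and the per-stage millisecond figures are assumptions chosen for the example, not measurements from this system.

```python
from dataclasses import dataclass

BUDGET_MS = 1200  # end of utterance -> first audio byte, per the stated budget


@dataclass(frozen=True)
class Stage:
    name: str
    ms: int  # hypothetical allocation, not a measured value


# Hypothetical per-stage allocations, for illustration only.
STAGES = [
    Stage("endpointing (detect end of utterance)", 200),
    Stage("speech-to-text final transcript", 200),
    Stage("language model first token", 400),
    Stage("text-to-speech first audio chunk", 250),
    Stage("network transport", 100),
]


def headroom_ms(stages: list[Stage], budget_ms: int = BUDGET_MS) -> int:
    """Return the remaining budget in ms (negative means over budget)."""
    return budget_ms - sum(s.ms for s in stages)


if __name__ == "__main__":
    for s in STAGES:
        print(f"{s.name:<42}{s.ms:>5} ms")
    print(f"headroom: {headroom_ms(STAGES)} ms")
```

Keeping the allocations in one table like this makes it easy to see which stage has to give when another one grows.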

Filler-phrase handler against silence

When the language model is slow, a brief phrase like 'let me check on that for you' fills the gap without breaking flow. The agent never goes silent for more than 1.1 seconds — silence in a phone call is an error signal, not a neutral state.
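The filler behaviour can be sketched with a timeout that does not cancel the pending model call. This is a minimal sketch under stated assumptions: `reply_coro` and `speak` are hypothetical stand-ins for the language-model call and the text-to-speech output, and the 0.8-second trigger is an assumed value chosen to sit below the 1.1-second silence ceiling.

```python
import asyncio

MAX_SILENCE_S = 1.1  # silence ceiling stated in the text
FILLER = "Let me check on that for you."


async def respond_with_filler(reply_coro, speak, filler_after_s=0.8):
    """Speak the reply; if it is not ready in time, speak a filler first.

    `reply_coro` and `speak` are hypothetical stand-ins, and
    `filler_after_s` is an assumed trigger below MAX_SILENCE_S.
    """
    task = asyncio.create_task(reply_coro)
    # asyncio.wait does not cancel on timeout, so the model call keeps running.
    done, _pending = await asyncio.wait({task}, timeout=filler_after_s)
    if not done:
        await speak(FILLER)
    await speak(await task)
```

A single filler only covers one gap. If the reply can take much longer, a real handler would presumably need to repeat the check or escalate, which this sketch does not attempt.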

LangGraph conversation state machine

Multi-turn dialog managed as an explicit DAG with named nodes (greet, gather, confirm, fulfill, recover) instead of free-form prompts — predictable, testable, and easier to evolve as flows grow.
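To make the idea of named nodes concrete, here is a library-free sketch of the five nodes as plain functions that each return the name of the next node. It does not use the LangGraph API. The state flags (`slots_filled`, `confirmed`) and the branching rules are assumptions for illustration, including the choice to have `recover` end the flow so the graph stays acyclic.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Node = Callable[[State], str]  # each node returns the name of the next node


def greet(state: State) -> str:
    return "gather"


def gather(state: State) -> str:
    # Assumed flag: were the required details collected?
    return "confirm" if state.get("slots_filled") else "recover"


def confirm(state: State) -> str:
    # Assumed flag: did the caller confirm the details?
    return "fulfill" if state.get("confirmed") else "recover"


def fulfill(state: State) -> str:
    return "end"


def recover(state: State) -> str:
    # Assumed behaviour: mark for hand-off and stop, rather than loop back.
    state["escalated"] = True
    return "end"


NODES: Dict[str, Node] = {
    "greet": greet,
    "gather": gather,
    "confirm": confirm,
    "fulfill": fulfill,
    "recover": recover,
}


def run(state: State, start: str = "greet") -> List[str]:
    """Walk the graph from `start` and return the visited node names."""
    path: List[str] = []
    current = start
    while current != "end":
        path.append(current)
        current = NODES[current](state)
    return path
```

Because every transition is an ordinary function, each branch can be unit-tested by passing in a state dictionary and checking the returned path.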

FastMCP tool exposure

External tools (lookups, bookings, escalations) are surfaced through FastMCP so the agent can call them mid-conversation with bounded latency and structured arguments.
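The two properties named here, structured arguments and bounded latency, can be sketched without any tool framework. The example below does not use the FastMCP API: `BookingArgs`, `ToolResult`, `book_slot`, and the 0.5-second deadline are all hypothetical names and values introduced for illustration.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class BookingArgs:
    """Hypothetical structured arguments for a booking tool."""
    customer_id: str
    slot_iso: str


@dataclass(frozen=True)
class ToolResult:
    ok: bool
    detail: str


async def book_slot(args: BookingArgs) -> ToolResult:
    """Hypothetical booking tool; a real one would call a backend service."""
    await asyncio.sleep(0.01)  # simulated backend latency
    return ToolResult(True, f"booked {args.slot_iso} for {args.customer_id}")


async def call_tool(tool, args, timeout_s: float = 0.5) -> ToolResult:
    """Run a tool under a hard deadline; on timeout return a structured failure."""
    try:
        return await asyncio.wait_for(tool(args), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the tool call, so the turn is never blocked past the deadline.
        return ToolResult(False, "tool timed out")
```

Returning a `ToolResult` on timeout instead of raising lets the conversation layer decide how to phrase the failure to the caller.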

Multi-language speech (Sarvam + Groq)

Sarvam handles Indian-language speech (Hindi, Telugu, Tamil, etc.); Groq handles English speech-to-text with sub-200ms first-token latency. Voice routing is transparent to the language model.
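The routing decision for speech-to-text can be sketched as a lookup on the language code. The function name, the provider labels, and the use of ISO 639-1 codes are assumptions for illustration. Only the three Indian languages named in the text are listed, and how the real system detects the language is not covered here.

```python
# Languages named in the text: Hindi, Telugu, Tamil. Extend as needed.
INDIAN_LANGS = {"hi", "te", "ta"}


def pick_stt_provider(lang_code: str) -> str:
    """Return a provider label for a language tag such as "hi-IN" or "en"."""
    base = lang_code.lower().split("-")[0]  # drop any region suffix
    if base == "en":
        return "groq"
    if base in INDIAN_LANGS:
        return "sarvam"
    raise ValueError(f"no speech provider configured for {lang_code!r}")
```

Since the language model only receives text, a function like this can change without touching prompts or conversation logic.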

OpenAI for language understanding

Conversation reasoning, intent extraction, and tool-argument synthesis run through OpenAI; the model sees only text after STT, so the language layer is decoupled from the audio layer.

Persistent + live session memory

PostgreSQL stores cross-call memory (preferences, history); Redis holds live in-session state (current turn, pending tool calls, partial transcripts) so reconnections after a network blip resume cleanly.
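Resuming after a reconnect can be sketched as saving the live state under a session key with an expiry, then reading it back. This is a minimal sketch under stated assumptions: `InMemoryKV` is a stand-in for a Redis client, and the key layout and the 300-second expiry are hypothetical choices, not details from this system.

```python
import json
import time
from typing import Optional


class InMemoryKV:
    """Stand-in for a Redis client: string values with an optional expiry."""

    def __init__(self) -> None:
        self._data = {}

    def set(self, key: str, value: str, ttl_s: Optional[float] = None) -> None:
        expires = time.monotonic() + ttl_s if ttl_s is not None else None
        self._data[key] = (value, expires)

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and time.monotonic() >= expires:
            del self._data[key]  # expired entries behave as missing
            return None
        return value


def live_key(session_id: str) -> str:
    return f"session:{session_id}:live"  # hypothetical key layout


def save_live_state(kv: InMemoryKV, session_id: str, state: dict, ttl_s: float = 300.0) -> None:
    kv.set(live_key(session_id), json.dumps(state), ttl_s=ttl_s)


def resume_live_state(kv: InMemoryKV, session_id: str) -> Optional[dict]:
    """Return the saved live state, or None if it is missing or expired."""
    raw = kv.get(live_key(session_id))
    return json.loads(raw) if raw is not None else None
```

The expiry keeps abandoned sessions from accumulating, while anything that must outlive the call belongs in the persistent store instead.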

LiveKit Cloud audio transport

WebRTC SFU handles bidirectional low-latency audio. The Python agent worker auto-joins each room created by the front-end token-issuer; no audio ever transits our application server.

Under the hood

Technology stack

Every layer of the stack, from database to front-end.

Technology	Role & contribution
LiveKit Cloud (Agents SDK + SFU)	WebRTC audio transport + Python agent worker that auto-joins rooms
LangGraph	Stateful conversation DAG (greet → gather → confirm → fulfill → recover)
FastMCP	Tool exposure surface — lookup, booking, escalation calls during a turn
Sarvam	Indian-language speech-to-text + text-to-speech (Hindi, Telugu, Tamil, etc.)
Groq	English speech-to-text with sub-200ms first-token latency
OpenAI	Language understanding, intent extraction, tool-argument synthesis
PostgreSQL	Persistent cross-call memory (preferences, history, audit log)
Redis	Live in-session state (current turn, pending tool calls, partial transcripts)
Next.js 15 + LiveKit React SDK	Front-end UI + server-side LiveKit access-token issuer

Ready to try it

Launch the live Voice Agent demo

Runs in your browser.


