Build a real-time Voice RAG Agent using Gemma 3 & Voice Cloning
I just created a Real-time Voice RAG Agent!
(and cloned my voice from just a 5-second clip)
Here's an overview of what this agent does:
1. Listens to real-time audio
2. Transcribes it via AssemblyAI
3. Uses your docs (via LlamaIndex) to craft an answer
4. Speaks that answer back with Cartesia
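The four steps above form a simple listen → transcribe → answer → speak loop. Below is a minimal sketch of that flow, assuming each service can be wrapped as a plain function. The names `VoiceRAGPipeline`, `stt`, `answer` and `tts` are hypothetical placeholders, not the repo's code or any vendor's API. In the real project, AssemblyAI, LlamaIndex and Cartesia fill these roles and LiveKit streams the audio.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical function types standing in for the real services:
# speech-to-text (AssemblyAI), RAG answering (LlamaIndex), text-to-speech (Cartesia).
SpeechToText = Callable[[bytes], str]
AnswerQuestion = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoiceRAGPipeline:
    stt: SpeechToText
    answer: AnswerQuestion
    tts: TextToSpeech

    def run(self, audio_stream: Iterable[bytes]) -> List[bytes]:
        replies: List[bytes] = []
        for chunk in audio_stream:            # 1. listen to incoming audio
            text = self.stt(chunk)            # 2. transcribe it
            if not text.strip():              # skip silence / empty transcripts
                continue
            reply = self.answer(text)         # 3. answer from your docs
            replies.append(self.tts(reply))   # 4. speak the answer back
        return replies


# Usage with fake services (bytes stand in for audio):
pipeline = VoiceRAGPipeline(
    stt=lambda audio: audio.decode("utf-8"),
    answer=lambda q: f"Answer to: {q}",
    tts=lambda text: text.encode("utf-8"),
)
print(pipeline.run([b"What is RAG?"]))  # → [b'Answer to: What is RAG?']
```

Keeping each stage behind a plain function makes it easy to swap providers or test the loop without any network calls.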
Tech stack:
- Cartesia for SOTA text-to-speech
- AssemblyAI for speech-to-text
- LlamaIndex to power RAG
- LiveKit for orchestration
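LlamaIndex handles the retrieval part of the stack: it finds the document chunks most relevant to the transcribed question and feeds them to the LLM (Gemma 3, per the title). To illustrate the idea only, here is a toy retriever that swaps LlamaIndex's embedding-based vector search for plain bag-of-words cosine similarity. `retrieve` and `build_prompt` are hypothetical helpers, not LlamaIndex APIs.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric words only.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str], top_k: int = 1) -> List[Tuple[float, str]]:
    # Score every document against the query and return the top_k best matches.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(query: str, docs: List[str], top_k: int = 2) -> str:
    # Stuff the retrieved context into a prompt for the LLM.
    context = "\n".join(doc for _, doc in retrieve(query, docs, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "Cartesia converts text to speech.",
    "AssemblyAI transcribes speech to text.",
    "LiveKit handles real-time audio transport.",
]
print(retrieve("Which service transcribes speech?", docs)[0][1])
# → AssemblyAI transcribes speech to text.
```

A real setup would use dense embeddings and a vector index instead of word counts, but the retrieve-then-generate shape is the same.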
Why Cartesia?
Cartesia enables you to generate seamless speech, power voice applications, and fine-tune your own voice models on what it bills as the fastest real-time AI platform.
I used it to clone my own voice with just a 5-second clip and powered the agent with it.
You can find all the code and everything you need in this GitHub repo: https://github.com/patchy631/ai-engineering-hub/tree/main/rag-voice-agent
It's fairly easy to follow along, and in case you're stuck, feel free to raise an issue.
I'll be happy to help! :)
#ai #llm #agent #genai
Translated At: 2025-04-06T08:54:29Z