Building a Real-Time Voice RAG Agent with Gemma 3 & Voice Cloning

Author: Akshay Pachaar
Published: 2025-03-20T00:00:00
Length: 09:31

I just created a Real-time Voice RAG Agent!

(also cloned my voice in just 5 seconds)

Here's an overview of what this agent does:

1. Listens to real-time audio

2. Transcribes it via AssemblyAI

3. Uses your docs (via LlamaIndex) to craft an answer

4. Speaks that answer back with Cartesia
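
As a rough sketch of the four steps above (this is not the code from the repo), the loop can be pictured as three pluggable stages chained together. Everything named here is a hypothetical placeholder: `VoiceRagPipeline`, `make_keyword_answerer`, `fake_transcribe`, and `fake_synthesize` are illustrative stand-ins, and none of them call the real AssemblyAI, LlamaIndex, Cartesia, or LiveKit APIs. The retrieval step is a toy word-overlap lookup standing in for a real RAG query.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stage signatures. In the real agent these roles are filled by
# AssemblyAI (speech-to-text), LlamaIndex (RAG), and Cartesia (text-to-speech).
Transcriber = Callable[[bytes], str]
Answerer = Callable[[str], str]
Synthesizer = Callable[[str], bytes]


@dataclass
class VoiceRagPipeline:
    """Chains the stages: audio in -> text -> answer -> audio out."""
    transcribe: Transcriber
    answer: Answerer
    synthesize: Synthesizer

    def handle_utterance(self, audio: bytes) -> bytes:
        question = self.transcribe(audio)   # step 2: speech to text
        reply = self.answer(question)       # step 3: answer from your docs
        return self.synthesize(reply)       # step 4: text back to speech


def make_keyword_answerer(docs: List[str]) -> Answerer:
    """Toy retrieval (assumption, not LlamaIndex): return the document that
    shares the most words with the question. `docs` must be non-empty."""
    def answer(question: str) -> str:
        q_words = set(question.lower().split())
        return max(docs, key=lambda d: len(q_words & set(d.lower().split())))
    return answer


# Fake speech stages so the sketch runs without any audio hardware or API keys:
# the "audio" is just UTF-8 encoded text.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    docs = [
        "cartesia handles text-to-speech",
        "livekit handles orchestration",
    ]
    pipeline = VoiceRagPipeline(
        fake_transcribe, make_keyword_answerer(docs), fake_synthesize
    )
    print(pipeline.handle_utterance(b"what handles orchestration"))
```

Keeping each stage behind a plain callable means a real speech-to-text, RAG, or text-to-speech client could be swapped in without touching the loop itself. The orchestration layer (step 1, listening to live audio) is left out of this sketch.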

Tech stack:

- Cartesia for SOTA text-to-speech

- AssemblyAI for speech-to-text

- LlamaIndex to power RAG

- LiveKit for orchestration

Why Cartesia?

Cartesia enables you to generate seamless speech, power voice applications, and fine-tune your own voice models on the fastest real-time AI platform.

I used it to clone my own voice with just a 5-second clip and powered the agent with it.

You can find all the code and everything you need in this GitHub repo: https://github.com/patchy631/ai-engineering-hub/tree/main/rag-voice-agent

It's fairly easy to follow along, and in case you get stuck, feel free to raise an issue.

I'll be happy to help! :)

#ai #llm #agent #genai

Translated at: 2025-04-06T08:54:29Z
