Qwen 2.5 Omni - The Most Multi-modal
Summary
Description
📜 Get repo access at Trelis.com/ADVANCED-transcription
📧 Get the Trelis AI Newsletter: https://trelis.substack.com
❗️If you subscribed here, click the bell to be notified of new vids
🤝 Work for Trelis: https://trelis.com/jobs/
💡 Need Technical or Market Assistance?
Book a Consult Here: https://forms.gle/wJXVZXwioKMktjyVA
💸 Starting a New Project/Venture?
Apply for a Trelis Grant: https://trelis.com/trelis-ai-grants/
Video Links:
- HF Repo: https://huggingface.co/Qwen/Qwen2.5-Omni-7B/tree/main
- Qwen2.5 Omni Paper: https://arxiv.org/pdf/2503.20215
- Llama 3 Paper: https://arxiv.org/pdf/2407.21783
- Moshi: https://arxiv.org/pdf/2410.00037
TIMESTAMPS:
0:00 Qwen 2.5 Omni - Video, Text and Audio Inputs, Text and Audio Outputs.
0:24 Qwen2.5 Architecture, incl. TMRoPE
6:29 Qwen Omni vs Llama 3.
7:43 Qwen Omni vs Moshi.
9:32 Comparison with GPT-4o and Gemini 2.5 Pro.
13:09 How to run Qwen 2.5 Omni on a GPU?
18:19 Inference with Audio Inputs and Audio + Text Outputs.
22:48 Inference with Video Input and Audio Output + Text Output.
27:22 Qwen 2.5 Model Architecture Print-out
29:20 When should you use Qwen 2.5 Omni?
Translated At: 2025-04-08T12:47:06Z