Mixture of Nested Experts by Google: Efficient Alternative To MoE?
Description
In this video, we dive into a recent research paper by Google, titled "Mixture of Nested Experts: Adaptive Processing of Visual Tokens". Standard Mixture of Experts (MoE) is successfully applied in LLMs, and also in computer vision, to increase model capacity without a proportional increase in computational cost, but it comes with a large memory footprint. The Mixture of Nested Experts (MoNE), which we review in this video, tackles that drawback. Mixture of Nested Experts is built on top of the Vision Transformer (ViT) architecture and offers a dramatic improvement in efficiency by leveraging the fact that images naturally contain a large amount of redundant information. So, while ViT (also with MoE) allocates its full compute power to every token, Mixture of Nested Experts (MoNE) learns to allocate compute to tokens based on their importance.
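To make the core idea concrete, below is a minimal, hypothetical PyTorch sketch, not the authors' code. The NestedMLP, MoNEBlockSketch, router, nested widths, and greedy argmax routing are all illustrative assumptions; the actual paper nests full ViT sub-models and uses a capacity-constrained Expert Preferred Routing scheme rather than a simple argmax.

import torch
import torch.nn as nn
import torch.nn.functional as F

class NestedMLP(nn.Module):
    # One shared MLP; nested "experts" reuse prefixes of its weight matrices,
    # so expert k only touches the first nested_dims[k] feature dimensions.
    def __init__(self, dim=768, hidden=3072, nested_dims=(192, 384, 768)):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.dim, self.hidden, self.nested_dims = dim, hidden, nested_dims

    def forward_expert(self, x, k):
        d = self.nested_dims[k]
        h = self.hidden * d // self.dim  # shrink the hidden layer proportionally
        z = F.relu(F.linear(x[..., :d], self.fc1.weight[:h, :d], self.fc1.bias[:h]))
        out = F.linear(z, self.fc2.weight[:d, :h], self.fc2.bias[:d])
        return F.pad(out, (0, self.dim - d))  # pad back to full width for mixing

class MoNEBlockSketch(nn.Module):
    # Routes each visual token to one nested expert based on a learned score.
    def __init__(self, dim=768, num_experts=3):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.mlp = NestedMLP(dim=dim)

    def forward(self, tokens):                    # tokens: (batch, seq, dim)
        choice = self.router(tokens).argmax(-1)   # greedy routing, for illustration only
        out = torch.zeros_like(tokens)
        for k in range(len(self.mlp.nested_dims)):
            mask = choice == k
            if mask.any():
                out[mask] = self.mlp.forward_expert(tokens[mask], k)
        return out

# Usage: important tokens can be routed to the full-width expert,
# redundant background tokens to a cheaper nested one.
block = MoNEBlockSketch()
x = torch.randn(2, 196, 768)                      # 14x14 patch tokens from a ViT
y = block(x)                                      # same shape, lower average compute
print(y.shape)                                    # torch.Size([2, 196, 768])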
Watch the video to learn more.
Paper page - https://arxiv.org/abs/2407.19985
Mixture of Experts (MoE) Video - https://youtu.be/kb6eH0zCnl8
Post - https://aipapersacademy.com/mixture-of-nested-experts/
Original Mixture-of-Experts paper review - https://aipapersacademy.com/mixture-of-experts/
-----------------------------------------------------------------------------------------------
✉️ Join the newsletter - https://aipapersacademy.com/newsletter/
👍 Please like & subscribe if you enjoy this content
-----------------------------------------------------------------------------------------------
Chapters:
0:00 Introduction
1:20 MoNE Illustration
4:36 MoNE Diagram
5:47 Results