Mixture of Nested Experts by Google: Efficient Alternative To MoE?

Author: AI Papers Academy
Published At: 2024-08-11T00:00:00
Length: 07:37

Description

In this video, we dive into a recent research paper by Google, titled "Mixture of Nested Experts: Adaptive Processing of Visual Tokens". Standard Mixture of Experts (MoE) is successfully applied in LLMs, and also in computer vision, to increase model capacity without a proportional increase in computational cost, but it comes with a large memory footprint. Mixture of Nested Experts (MoNE), which we review in this video, tackles that drawback. MoNE is built on top of the Vision Transformer (ViT) architecture and offers a dramatic efficiency improvement by leveraging the fact that images naturally contain a large amount of redundant information. So, while ViT (with or without MoE) allocates its full compute to every token, MoNE learns to allocate compute to each token based on its importance.
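
To make the idea concrete, here is a minimal sketch of the core concept, not the paper's implementation: "nested" experts are slices of one shared set of weights, and a simple router sends each token to a slice whose width matches its importance. All class and function names below are hypothetical, and the greedy argmax routing stands in for the paper's actual routing scheme.

```python
# Illustrative sketch only: nested experts as weight slices of a single FFN,
# so smaller experts add no extra parameters (hence the smaller memory footprint).
import torch
import torch.nn as nn

class NestedFFN(nn.Module):
    """One FFN whose smaller experts are prefixes (slices) of the full weights."""
    def __init__(self, dim=256, hidden=1024, fractions=(0.25, 0.5, 1.0)):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        # Each "expert" uses only the first k hidden units of the shared weights.
        self.widths = [int(hidden * f) for f in fractions]

    def forward_expert(self, x, expert_idx):
        k = self.widths[expert_idx]
        h = torch.relu(x @ self.fc1.weight[:k].T + self.fc1.bias[:k])
        return h @ self.fc2.weight[:, :k].T + self.fc2.bias

class TokenRouter(nn.Module):
    """Scores each token and assigns it to one nested expert (hypothetical router)."""
    def __init__(self, dim=256, num_experts=3):
        super().__init__()
        self.score = nn.Linear(dim, num_experts)

    def forward(self, tokens):                 # tokens: (batch, seq, dim)
        return self.score(tokens).argmax(-1)   # (batch, seq) expert index per token

# Usage: important tokens get the full-width expert, redundant ones get cheaper
# slices, so compute per token varies while all parameters stay shared.
ffn, router = NestedFFN(), TokenRouter()
tokens = torch.randn(2, 196, 256)              # e.g. 14x14 ViT patch tokens
assignment = router(tokens)
out = torch.empty_like(tokens)
for e in range(len(ffn.widths)):
    mask = assignment == e
    if mask.any():
        out[mask] = ffn.forward_expert(tokens[mask], e)
```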

Watch the video to learn more.

Paper page - https://arxiv.org/abs/2407.19985

Mixture of Experts (MoE) Video - https://youtu.be/kb6eH0zCnl8

Post - https://aipapersacademy.com/mixture-of-nested-experts/

Original Mixture-of-Experts paper review - https://aipapersacademy.com/mixture-of-experts/

-----------------------------------------------------------------------------------------------

✉️ Join the newsletter - https://aipapersacademy.com/newsletter/

👍 Please like & subscribe if you enjoy this content

-----------------------------------------------------------------------------------------------

Chapters:

0:00 Introduction

1:20 MoNE Illustration

4:36 MoNE Diagram

5:47 Results
