V-JEPA by Meta AI - A Human-Like, Video-Based Computer Vision Model
In this video we dive into V-JEPA, a new collection of vision models created by Meta AI. V-JEPA stands for Video Joint-Embedding Predictive Architecture, and it is part of Meta AI's implementation of Yann LeCun's vision for a more human-like AI. We go deep into the research paper that presented V-JEPA, titled "Revisiting Feature Prediction for Learning Visual Representations from Video". Along the way, we recap key points from I-JEPA, an earlier image-based JEPA model from Meta AI, which helps in understanding how JEPA works for videos as well.
We start with a short background on visual representations, also known as visual features or semantic embeddings. V-JEPA is trained with unsupervised learning via feature prediction, so we also briefly explain what feature prediction means and how it differs from pixel prediction. With that in place, we cover the JEPA framework, starting with the main idea and then going into the details for images with I-JEPA and for videos with V-JEPA.
Both I-JEPA and V-JEPA are based on Vision Transformers, and this video assumes you are already familiar with them. We covered the details of Vision Transformers in the following video - https://youtu.be/NetSJM590Lo
We also have an earlier video dedicated solely to I-JEPA, which goes into details of the I-JEPA paper that we do not cover here - https://youtu.be/6bJIkfi8H-E
-----------------------------------------------------------------------------------------------
Meta AI's V-JEPA blog post - https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/
Code - https://github.com/facebookresearch/jepa
Blog post - https://aipapersacademy.com/v-jepa/
-----------------------------------------------------------------------------------------------
✉️ Join the newsletter - https://aipapersacademy.com/newsletter/
👍 Please like & subscribe if you enjoy this content
Become a patron - https://www.patreon.com/aipapersacademy
We use VideoScribe to edit our videos - https://tidd.ly/44TZEiX
-----------------------------------------------------------------------------------------------
Chapters:
0:00 Introduction
1:01 Visual Representations
2:42 Feature Prediction
4:12 JEPA Framework
5:55 I-JEPA Details
8:56 V-JEPA Details
10:52 V-JEPA Results
Translated at: 2025-03-02T03:59:26Z