Sapiens by Meta AI: Foundation for Human Vision Models
In this video we dive into Sapiens, a new family of models for four fundamental human-centric tasks, presented by Meta AI in a recent research paper titled "Sapiens: Foundation for Human Vision Models".
The model's architecture is based on the Vision Transformer (ViT), trained for the first time natively at 1K image resolution, roughly five times larger than DINOv2's input image size!
We cover the model's training process, including a self-supervised pretraining step based on the masked autoencoder (MAE) approach, which we also explain here.
For a quick background on Vision Transformers, watch the following short video - Introduction to Vision Transformers (ViT) | An Image is Worth 16x16 Words
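To make the MAE idea above concrete, here is a minimal sketch (not Meta's actual code) of the random patch masking at its core: the image is split into patches, a large fraction is hidden, and the model learns to reconstruct the hidden patches from the visible ones. The 75% mask ratio and 16x16 patch size are typical MAE defaults, assumed here for illustration.

```python
import numpy as np

def random_mask(num_patches: int, mask_ratio: float = 0.75, seed: int = 0):
    """Return a boolean mask: True = patch is hidden from the encoder."""
    rng = np.random.default_rng(seed)
    num_masked = int(num_patches * mask_ratio)
    perm = rng.permutation(num_patches)  # random order of patch indices
    mask = np.zeros(num_patches, dtype=bool)
    mask[perm[:num_masked]] = True  # hide the first num_masked patches
    return mask

# A 1024x1024 image with 16x16 patches yields (1024 // 16) ** 2 = 4096 patches.
mask = random_mask(num_patches=4096, mask_ratio=0.75)
print(mask.sum())   # 3072 patches masked, 1024 left visible
```

During pretraining, only the visible patches are fed to the encoder, which is what makes the approach efficient even at high resolution.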
Code - https://github.com/facebookresearch/sapiens
Paper page - https://arxiv.org/abs/2408.12569
-----------------------------------------------------------------------------------------------
✉️ Join the newsletter - https://aipapersacademy.com/newsletter/
Become a patron - https://www.patreon.com/aipapersacademy
👍 Please like & subscribe if you enjoy this content
-----------------------------------------------------------------------------------------------
Chapters:
0:00 Introduction
1:05 Humans-300M Dataset
1:54 Self-Supervised Pretraining
3:54 Task-specific Models