Xiaomi Auto World Model: A Joint World Model Integrating Reconstruction and Generation for Autonomous Driving Paper • 2605.18137 • Published 21 days ago • 1
Article Compressing Time: A Comparative Study of Video VAEs in Diffusers Bekhouche • 20 days ago • 2
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images Paper • 2412.08802 • Published Dec 11, 2024 • 7
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 222
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published Mar 17 • 140
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20, 2025 • 166
Article SigLIP 2: A better multilingual vision language encoder ariG23498, merve, qubvel-hf, +1 • Feb 21, 2025 • 217
Real-time Vision Models Collection A collection of real-time detectors. • 20 items • Updated Feb 18 • 24
Post 2682 We have published an excellent paper on the Arabic CLIP model. Paper link: https://aclanthology.org/2024.arabicnlp-1.9/ More information is on this website: https://arabic-clip.github.io/Arabic-CLIP/ All datasets, models, and the demo are published on Hugging Face: Arabic-Clip. The code is published on GitHub: https://github.com/Arabic-Clip/Arabic-CLIP
3D Gaussian Splatting for Real-Time Radiance Field Rendering Paper • 2308.04079 • Published Aug 8, 2023 • 204
pain/dinov3-smallplus-mask2former-v1.0-3000_samples-12-classes-enhanced-diff-lr Updated Oct 23, 2025 • 1