Xiaomi Auto World Model: A Joint World Model Integrating Reconstruction and Generation for Autonomous Driving Paper • 2605.18137 • Published 28 days ago • 1
Article Compressing Time: A Comparative Study of Video VAEs in Diffusers Bekhouche • 27 days ago • 2
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images Paper • 2412.08802 • Published Dec 11, 2024 • 7
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 225
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published Mar 17 • 141
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20, 2025 • 166
Article SigLIP 2: A better multilingual vision language encoder ariG23498, merve, qubvel-hf • Feb 21, 2025 • 217
Real-time Vision Models Collection A collection of real-time detectors. • 21 items • Updated 7 days ago • 24
3D Gaussian Splatting for Real-Time Radiance Field Rendering Paper • 2308.04079 • Published Aug 8, 2023 • 204
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face abidlabs, znation, nouamanetazi, sasha, qgallouedec • Jul 29, 2025 • 225
💫StarVector Models Collection StarVector is a multimodal LLM for Scalable Vector Graphics (SVG) generation, producing structured SVG code directly from images and text. • 2 items • Updated Mar 20, 2025 • 99
Article Open R1: How to use OlympicCoder locally for coding burtenshaw, reach-vb, lewtun, edbeeching, yagilb • Mar 20, 2025 • 63
Article PyTorchModelHubMixin: Bridging the Gap for Custom AI Models on Hugging Face not-lain • Nov 11, 2024 • 20
CIDAR: Culturally Relevant Instruction Dataset For Arabic Paper • 2402.03177 • Published Feb 5, 2024 • 8