NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation Paper • 2601.02204 • Published 3 days ago
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields Paper • 2601.03252 • Published 1 day ago
VINO: A Unified Visual Generator with Interleaved OmniModal Context Paper • 2601.02358 • Published 2 days ago
NitroGen: An Open Foundation Model for Generalist Gaming Agents Paper • 2601.02427 • Published 4 days ago
VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation Paper • 2601.02256 • Published 3 days ago
InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams Paper • 2601.02281 • Published 2 days ago
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos Paper • 2601.00393 • Published 6 days ago
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation Paper • 2601.00664 • Published 6 days ago
Nested Browser-Use Learning for Agentic Information Seeking Paper • 2512.23647 • Published 9 days ago
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation Paper • 2512.23705 • Published 9 days ago
GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models Paper • 2512.15560 • Published 22 days ago
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone Paper • 2512.22615 • Published 12 days ago
TimeBill: Time-Budgeted Inference for Large Language Models Paper • 2512.21859 • Published 13 days ago
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning Paper • 2512.20605 • Published 15 days ago