"Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries Paper • 2508.15752 • Published 3 days ago
When and What: Diffusion-Grounded VideoLLM with Entity Aware Segmentation for Long Video Understanding Paper • 2508.15641 • Published 3 days ago
ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling Paper • 2508.15767 • Published 3 days ago
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model Paper • 2508.14444 • Published 5 days ago
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization Paper • 2508.14460 • Published 5 days ago
S^2-Guidance: Stochastic Self Guidance for Training-Free Enhancement of Diffusion Models Paper • 2508.12880 • Published 6 days ago
Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models Paper • 2508.12945 • Published 6 days ago
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model Paper • 2508.13009 • Published 6 days ago
Hidden in plain sight: VLMs overlook their visual representations Paper • 2506.08008 • Published Jun 9
B-score: Detecting biases in large language models using response history Paper • 2505.18545 • Published May 24
VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance Paper • 2505.15952 • Published May 21