Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models Paper • 2508.00819 • Published 23 days ago • 62
ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing Paper • 2506.19848 • Published Jun 24 • 26
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning Paper • 2505.22019 • Published May 28 • 11
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning Paper • 2505.22019 • Published May 28 • 11
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning Paper • 2505.22019 • Published May 28 • 11 • 3
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Paper • 2504.07956 • Published Apr 10 • 48
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Paper • 2504.07956 • Published Apr 10 • 48
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning Paper • 2504.07956 • Published Apr 10 • 48 • 2