oguzhanercan's Collections: Vision Reasoning
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
Paper
• 2505.15966
• Published
• 53
GRIT: Teaching MLLMs to Think with Images
Paper
• 2505.15879
• Published
• 13
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
Paper
• 2505.16854
• Published
• 11
VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
Paper
• 2505.16192
• Published
• 12
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
Paper
• 2505.20256
• Published
• 19
InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning
Paper
• 2505.18291
• Published
• 2
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
Paper
• 2506.13654
• Published
• 43
VGR: Visual Grounded Reasoning
Paper
• 2506.11991
• Published
• 20
GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning
Paper
• 2506.16141
• Published
• 27
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Paper
• 2506.22434
• Published
• 10
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper
• 2507.01006
• Published
• 251
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers
Paper
• 2506.23918
• Published
• 90
Scaling RL to Long Videos
Paper
• 2507.07966
• Published
• 160
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
Paper
• 2507.07999
• Published
• 50
RoboBrain 2.0 Technical Report
Paper
• 2507.02029
• Published
• 35
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Paper
• 2507.05255
• Published
• 75
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
Paper
• 2507.13348
• Published
• 79
Reinforcement Learning in Vision: A Survey
Paper
• 2508.08189
• Published
• 30
STEP3-VL-10B Technical Report
Paper
• 2601.09668
• Published
• 193
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
Paper
• 2601.14724
• Published
• 74