Image-Text-to-Text
Safetensors
English
qwen2_5_vl
agent
conversational
Logo

Dynamic Tool Orchestration for Iterative Visual Reasoning

Paper Docs Data & Model Homepage Demo Video

📋 Model Description

AdaReasoner-7B is a vision-language model trained with dynamic tool orchestration capabilities for iterative visual reasoning. This model is AdaReasoner-7B-Randomized.

We provide three variants of AdaReasoner-7B, each optimized for different use cases:

Model Description Hugging Face
AdaReasoner-7B-Randomized Trained with the adaptive learning method, enabling strong generalization to unseen tools and tasks. Designed for open-ended and evolving tool environments where adaptability is required. 🤗 Link
AdaReasoner-7B-Non-Randomized Trained without adaptive learning, providing more stable and reliable performance on known tools and tasks, but limited generalization to unseen tools or task settings. 🤗 Link
AdaReasoner-VSP-7B Task-specialized model trained exclusively on the Visual Spatial Planning (VSP) task, achieving strong performance on VSP benchmarks but not intended for cross-task generalization. 🤗 Link

Key Differences:

  • Randomized: Trained with adaptive learning method, enabling zero-shot generalization to novel tools and task configurations
  • Non-Randomized: Trained without adaptive learning, offering more predictable behavior on familiar tools but lacking generalization
  • VSP-7B: Task-specific model fine-tuned exclusively on Visual Spatial Planning (VSP) benchmarks for optimal performance on navigation tasks

🚀 Quick Start

AdaReasoner-7B can be deployed for single-turn inference using standard inference frameworks such as vLLM. However, AdaReasoner is a tool-planning model whose full capabilities require interaction with an external tool environment. To fully evaluate or utilize its tool-planning behavior, we recommend using AdaEval provided in our repository for batch inference and evaluation, or trying the Demo interface for interactive, single-instance GUI-based reasoning.

🎯 Capabilities

The model supports a diverse set of visual reasoning tasks, covering both structured reasoning and open-ended visual understanding: - Visual Spatial Planning Navigation and verification tasks based on grid-world environments (VSPO and VSP), evaluating fine-grained spatial perception, multi-step path planning, and safety verification under out-of-distribution map configurations. - Compositional Visual Reasoning (Jigsaw) Image reconstruction from shuffled patches (Jigsaw-COCO and BLINK-J), testing local–global consistency, part–whole reasoning, and visual compositional understanding. - GUI Question Answering (GUIQA) Fine-grained reasoning over GUI screenshots, including interactive webpage understanding (GUIChat) and agent-centric UI reasoning from WebMMU (Agentic Action subset), emphasizing element grounding, action planning, and multi-step inference. - General Visual Question Answering (General VQA) Open-ended visual reasoning beyond structured settings, evaluated on V* and HRBench, focusing on fine-grained visual search, attribute recognition, spatial relationship reasoning, and robustness to high-resolution, complex real-world scenes.

🛠️ Tool Integration

For full tool-augmented inference capabilities, please refer to the AdaReasoner repository which includes:

  • Tool Server deployment
  • AdaEval evaluation framework
  • Complete inference pipeline

📊 Performance

Please refer to our paper for detailed benchmark results across multiple visual reasoning tasks.

🔧 Technical Details

  • Base Architecture: Qwen 2.5 VL 7B Instruct
  • Training Method: Tool Cold Start (SFT) + Tool GRPO (RL) + Adaptive Learning
  • Context Length: Support for extended context with multiple tool interactions
  • Modalities: Text + Vision

📚 Citation

If you use this model in your research, please cite:

@article{adareasoner2024,
  title={Dynamic Tool Orchestration for Iterative Visual Reasoning},
  author={AdaReasoner Team},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2024}
}

📄 License

Apache 2.0

🤝 Acknowledgments

This model is part of the AdaReasoner project. For more information, visit our GitHub repository.

📧 Contact

For questions and feedback, please open an issue in our GitHub repository.

Downloads last month
26
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for AdaReasoner/AdaReasoner-7B-Randomized

Finetuned
(959)
this model
Quantizations
3 models

Datasets used to train AdaReasoner/AdaReasoner-7B-Randomized