---
base_model:
- Qwen/Qwen3-4B
pipeline_tag: text-generation
library_name: transformers
license: apache-2.0
---

# II-Search-4B
> A 4B-parameter language model specialized in information seeking, multi-hop reasoning, and web-integrated search, achieving state-of-the-art performance among models of similar size.


## Model Description
II-Search-4B is a 4B-parameter language model based on Qwen3-4B, fine-tuned specifically for information-seeking tasks and web-integrated reasoning. It excels at complex multi-hop information retrieval, fact verification, and comprehensive report generation.
### Key Features
- Enhanced tool usage for web search and webpage visits
- Multi-hop reasoning capabilities with sophisticated planning
- Verified information retrieval with cross-checking
- Strong performance on factual QA benchmarks
- Comprehensive report generation for research queries
## Training Methodology
Our training process consisted of four phases:
### Phase 1: Tool Call Ability Stimulation
We used a distillation approach from a larger model (Qwen3-235B) to generate reasoning paths with function calls on multi-hop datasets. This established the base capability for tool use.
### Phase 2: Reasoning Improvement
We addressed initial limitations by:
- Creating synthetic problems that require more reasoning turns, inspired by the random-walk algorithm (see the sketch after this list)
- Refining reasoning thought patterns into more efficient, cleaner reasoning paths
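
To illustrate the idea, here is a minimal sketch of random-walk question synthesis. The toy knowledge graph, relation names, and question template are hypothetical and not our actual data pipeline; only the walk-then-compose pattern reflects the technique:

```python
import random

# Hypothetical toy knowledge graph: (subject, relation) -> object.
KG = {
    ("Ada Lovelace", "collaborator"): "Charles Babbage",
    ("Charles Babbage", "most famous design"): "Analytical Engine",
    ("Analytical Engine", "first description"): "Sketch of the Analytical Engine",
}

def random_walk_question(start: str, hops: int, seed: int = 0) -> tuple[str, str]:
    """Walk `hops` random edges from `start`; the composed question's answer
    is the final entity, and each extra hop forces one more search/visit
    turn at inference time."""
    rng = random.Random(seed)
    entity, relations = start, []
    for _ in range(hops):
        out_edges = [(r, o) for (s, r), o in KG.items() if s == entity]
        if not out_edges:
            break
        relation, entity = rng.choice(out_edges)
        relations.append(relation)
    # Nest relations right-to-left: "X of the Y of <start>".
    chain = " of the ".join(reversed(relations))
    return f"What is the {chain} of {start}?", entity

question, answer = random_walk_question("Ada Lovelace", hops=2)
# -> "What is the most famous design of the collaborator of Ada Lovelace?"
# -> "Analytical Engine"
```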
### Phase 3: Rejection Sampling & Report Generation
We applied:
- Filtering to keep only high-quality reasoning traces (correct answers reached through sound reasoning); a sketch of the filter appears after this list
- STORM-inspired techniques to enhance comprehensive report generation
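
A minimal sketch of the rejection-sampling filter follows. The trace format and field names are hypothetical; only the keep/drop logic reflects the idea:

```python
def normalize(text: str) -> str:
    """Case- and whitespace-insensitive answer comparison."""
    return " ".join(text.lower().split())

def keep_trace(trace: dict) -> bool:
    """Keep a sampled trace only if its final answer matches the gold
    answer and every tool call in it succeeded."""
    answer_ok = normalize(trace["predicted_answer"]) == normalize(trace["gold_answer"])
    tools_ok = all(call["status"] == "ok" for call in trace["tool_calls"])
    return answer_ok and tools_ok

traces = [
    {"predicted_answer": "Analytical Engine", "gold_answer": "analytical engine",
     "tool_calls": [{"status": "ok"}, {"status": "ok"}]},
    {"predicted_answer": "Difference Engine", "gold_answer": "analytical engine",
     "tool_calls": [{"status": "ok"}]},
]
filtered = [t for t in traces if keep_trace(t)]  # keeps only the first trace
```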
### Phase 4: Reinforcement Learning
We then trained the model with reinforcement learning:
- Used the [dgslibisey/MuSiQue](https://huggingface.co/datasets/dgslibisey/MuSiQue) dataset
- Incorporated our in-house search database (containing Wiki, FineWeb, and arXiv data)
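
The exact reward is not detailed here; as an illustration, a common outcome-style reward for this kind of setup scores a rollout by whether its boxed final answer matches the gold answer. The function below is our assumption, not the in-house implementation:

```python
import re

def outcome_reward(completion: str, gold_answer: str) -> float:
    """Assumed outcome reward: 1.0 if the \\boxed{...} answer matches the
    gold answer (case- and whitespace-insensitive), else 0.0."""
    match = re.search(r"\\boxed\{([^{}]*)\}", completion)
    if match is None:
        return 0.0  # no final answer produced
    predicted = " ".join(match.group(1).lower().split())
    return 1.0 if predicted == " ".join(gold_answer.lower().split()) else 0.0

print(outcome_reward(r"... so the answer is \boxed{Analytical Engine}", "analytical engine"))  # 1.0
```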
## Performance
| **Benchmark** | **Qwen3-4B** | **Jan-4B** | **WebSailor-3B** | **II-Search-4B** |
| --- | --- | --- | --- | --- |
| OpenAI/SimpleQA | 76.8 | 80.1 | 81.8 | 91.8 |
| Google/Frames | 30.7 | 24.8 | 34.0 | 67.5 |
| Seal_0 | 6.31 | 2.7 | 1.8 | 22.5 |
### Tool Usage Comparison
**SimpleQA (SerpDev): tool calls per query**
| | **Qwen3-4B** | **Jan-4B** | **WebSailor-3B** | **II-Search-4B** |
| --- | --- | --- | --- | --- |
| Searches | 1.0 | 0.9 | 2.1 | 2.2 |
| Page visits | 0.1 | 1.9 | 6.4 | 3.5 |
| Total tool calls | 1.1 | 2.8 | 8.5 | 5.7 |
All benchmark traces from these models are available at [Intelligent-Internet/II-Search-Benchmark-Details](https://huggingface.co/datasets/Intelligent-Internet/II-Search-Benchmark-Details).
## Intended Use
II-Search-4B is designed for:
- Information seeking and factual question answering
- Research assistance and comprehensive report generation
- Fact verification and evidence-based reasoning
- Educational and research applications requiring factual accuracy
## Usage
To deploy and interact with II-Search-4B, follow these steps:
1. **Serve the model with vLLM or SGLang.**

   Use the following command to serve the model with vLLM (adjust the parameters for your hardware):

   ```bash
   vllm serve Intelligent-Internet/II-Search-4B --served-model-name II-Search-4B --tensor-parallel-size 8 --enable-reasoning --reasoning-parser deepseek_r1 --rope-scaling '{"rope_type":"yarn","factor":1.5,"original_max_position_embeddings":98304}' --max-model-len 131072
   ```

   This configuration enables tensor parallelism across 8 GPUs, reasoning-output parsing, YaRN RoPE scaling for extended context, and a maximum context length of 131,072 tokens.
2. **Integrate the `web_search` and `web_visit` tools.**

   Equip the served model with `web_search` and `web_visit` tools to enable internet-aware functionality, as sketched below. Alternatively, use middleware such as MCP for tool integration; see this example repository: https://github.com/hoanganhpham1006/mcp-server-template.
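
As a concrete starting point, the sketch below registers both tools with an OpenAI-compatible client pointed at the vLLM server from step 1. The tool schemas and the example question are illustrative assumptions; adapt the descriptions and the execution loop to your actual search backend.

```python
from openai import OpenAI

# Client against the local vLLM OpenAI-compatible endpoint from step 1.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical schemas for the two tools the model is trained to call.
tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web and return result snippets.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "web_visit",
            "description": "Fetch and return the readable content of a web page.",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
        },
    },
]

response = client.chat.completions.create(
    model="II-Search-4B",
    messages=[{"role": "user", "content": "Who designed the Analytical Engine?"}],
    tools=tools,
)
print(response.choices[0].message)
```

When the reply contains `tool_calls`, execute each call against your search backend, append the results as `role="tool"` messages, and call the API again; repeat until the model emits a final answer.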
### Host on macOS with MLX for local use
As an alternative for Apple Silicon users, host the quantized [II-Search-4B-MLX](https://huggingface.co/Intelligent-Internet/II-Search-4B-MLX) version on your Mac, then interact with it through user-friendly interfaces such as LM Studio or Ollama Desktop.
## Recommended Generation Parameters
```python
generate_cfg = {
    'top_k': 20,
    'top_p': 0.95,
    'temperature': 0.6,
    'repetition_penalty': 1.1,
    'max_tokens': 2048,
}
```
- For queries that need a short, accurate answer, append the following phrase to the prompt: `\n\nPlease reason step-by-step and put the final answer within \\boxed{}.`
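
For example, here is a minimal sketch of such a query against the vLLM endpoint from the Usage section. The question is a placeholder, and `top_k` / `repetition_penalty` are passed through vLLM's `extra_body` because the OpenAI schema does not expose them directly:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

query = "In what year was the Analytical Engine first described?"  # placeholder question
prompt = query + "\n\nPlease reason step-by-step and put the final answer within \\boxed{}."

response = client.chat.completions.create(
    model="II-Search-4B",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.6,
    top_p=0.95,
    max_tokens=2048,
    extra_body={"top_k": 20, "repetition_penalty": 1.1},  # vLLM-specific sampling params
)
print(response.choices[0].message.content)
```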
## Citation
```
@misc{II-Search-4B,
  author       = {Intelligent Internet},
  title        = {II-Search-4B: Information Seeking and Web-Integrated Reasoning LLM},
  year         = {2025},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/II-Vietnam/II-Search-4B}},
}
```