Fascinating read! Staying bullish on search with RL; it might just help us rein in hallucinations for good. I really like their approach:

1) <think>reason over the prompt/context and what the model already knows</think>
2) <search>self-search when it doesn't know (iteratively), with no external tool</search>
3) <information>cite sources to support the claim(s)</information>
4) <answer>final answer</answer>

Their RL training was done cost-efficiently too, see code: https://github.com/TsinghuaC3I/SSRL
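The four tags above can be sketched as a small rollout parser. This is a minimal illustration, not code from the SSRL repo: the helper name and the example rollout string are mine, only the tag names come from the post.

```python
import re

def parse_rollout(text: str) -> dict:
    """Extract tagged spans from a self-search rollout string.

    Tags follow the post's format: <think>, <search>, <information>, <answer>.
    <think>/<search> may repeat across iterations, so every tag collects a list.
    """
    spans = {}
    for tag in ("think", "search", "information", "answer"):
        matches = re.findall(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        spans[tag] = [m.strip() for m in matches]
    return spans

# Illustrative rollout (made up for this example, not from the paper):
rollout = (
    "<think>The question asks for the capital of Australia.</think>"
    "<search>capital of Australia</search>"
    "<information>Canberra is the capital of Australia.</information>"
    "<answer>Canberra</answer>"
)

parsed = parse_rollout(rollout)
print(parsed["answer"])  # → ['Canberra']
```

In the actual training loop you'd typically score the <answer> span for the RL reward and check the other tags for format compliance; see the linked repo for how they do it.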