Fascinating read!
Staying bullish on search with RL: it could go a long way toward reducing hallucinations. I really like their approach:
1) <think>about the prompt/context and what you already know</think>
2) self-<search>when you don't know</search> (iteratively), with no external tool
3) <information>cite sources to support the claim(s)</information>
4) <answer>final answer</answer>
Their RL training was done cost-efficiently too; see the code: https://github.com/TsinghuaC3I/SSRL
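The tagged rollout format above can be sketched with a minimal parser. This is an illustrative sketch based only on the tag names mentioned in the post (`think`, `search`, `information`, `answer`), not the SSRL repo's actual code; the example trace is hypothetical:

```python
import re

def parse_rollout(trace: str) -> dict:
    """Extract the tagged spans from a self-search rollout trace.

    Tag set assumed from the post's description. <search> and
    <information> may repeat (iterative self-search), so every
    tag maps to a list of matched spans.
    """
    sections = {}
    for tag in ("think", "search", "information", "answer"):
        sections[tag] = re.findall(
            rf"<{tag}>(.*?)</{tag}>", trace, flags=re.DOTALL
        )
    return sections

# A hypothetical single-turn trace in the format the post describes.
trace = (
    "<think>The question asks for the capital of Australia.</think>"
    "<search>capital of Australia</search>"
    "<information>Canberra is the capital city of Australia.</information>"
    "<answer>Canberra</answer>"
)
parsed = parse_rollout(trace)
# parsed["answer"] -> ["Canberra"]
```

Since the model performs the "search" itself (no external tool), the <search>/<information> pairs are generated tokens like any other, which is what makes the RL training cheap.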