Fascinating read!
Staying bullish on search with RL: it could go a long way toward reducing hallucinations. I really like their approach:
1) <think>about the prompt/context and what you already know</think>
2) self-<search>when you don't know</search> (iteratively), with no external tool
3) <information>cite sources to support the claim(s)</information>
4) <answer>final answer</answer>
Their RL training was done cost-efficiently too; see the code: https://github.com/TsinghuaC3I/SSRL
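The tagged rollout format above can be sketched with a minimal parser. This is an illustrative sketch based only on the tag names mentioned in the post (`think`, `search`, `information`, `answer`), not the SSRL repo's actual code; the example trace is hypothetical:

```python
import re

def parse_rollout(trace: str) -> dict:
    """Extract the tagged spans from a self-search rollout trace.

    Tag set assumed from the post's description. <search> and
    <information> may repeat (iterative self-search), so every
    tag maps to a list of matched spans.
    """
    sections = {}
    for tag in ("think", "search", "information", "answer"):
        sections[tag] = re.findall(
            rf"<{tag}>(.*?)</{tag}>", trace, flags=re.DOTALL
        )
    return sections

# A hypothetical single-turn trace in the format the post describes.
trace = (
    "<think>The question asks for the capital of Australia.</think>"
    "<search>capital of Australia</search>"
    "<information>Canberra is the capital city of Australia.</information>"
    "<answer>Canberra</answer>"
)
parsed = parse_rollout(trace)
# parsed["answer"] -> ["Canberra"]
```

Since the model performs the "search" itself (no external tool), the <search>/<information> pairs are generated tokens like any other, which is what makes the RL training cheap.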