MirrorBench: An Extensible Framework to Evaluate User-Proxy Agents for Human-Likeness • Paper • 2601.08118 • Published 16 days ago
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky • Paper • 2507.03336 • Published Jul 4, 2025