LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries Paper • 2508.15760 • Published 3 days ago • 33
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers Paper • 2508.14704 • Published 4 days ago • 29
Article Old Maps, New Terrain: Updating Labour Taxonomies for the AI Era By frimelle and 1 other • 4 days ago • 12
τ^2-Bench: Evaluating Conversational Agents in a Dual-Control Environment Paper • 2506.07982 • Published Jun 9 • 6
Article From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels By drbh and 1 other • 7 days ago • 35
Article Announcing the Synthetic Online Conversations Dataset (SOC) By marcodsn • 12 days ago • 11
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published 16 days ago • 156
Article The GPT-OSS models are here… and they’re energy-efficient! By sasha • 17 days ago • 19
Article Accelerate ND-Parallel: A Guide to Efficient Multi-GPU Training By siro1 and 4 others • 17 days ago • 51
Article Illustrating Reinforcement Learning from Human Feedback (RLHF) By natolambert and 3 others • Dec 9, 2022 • 323
Article Welcome GPT OSS, the new open-source model family from OpenAI! By reach-vb and 11 others • 20 days ago • 472
gpt-oss Collection Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated 17 days ago • 316
Article Introducing Command A Vision: Multimodal AI built for Business By CohereLabs and 3 others • 24 days ago • 63
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face By abidlabs and 4 others • 27 days ago • 159