Collections
Discover the best community collections!
Collections including paper arxiv:2507.14683
-
Magistral
Paper • 2506.10910 • Published • 63 -
Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute
Paper • 2506.15882 • Published • 2 -
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
Paper • 2507.14683 • Published • 126 -
The Invisible Leash: Why RLVR May Not Escape Its Origin
Paper • 2507.14843 • Published • 84
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 5.71k • 1.15k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 14 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 1.02k • 15 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 61
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 25 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 39 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 100 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 32 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 27 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 105
-
WorldVLA: Towards Autoregressive Action World Model
Paper • 2506.21539 • Published • 39 -
Fast and Simplex: 2-Simplicial Attention in Triton
Paper • 2507.02754 • Published • 25 -
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction
Paper • 2507.02025 • Published • 35 -
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
Paper • 2507.00951 • Published • 22
-
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Paper • 2503.14734 • Published • 4 -
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Paper • 2401.02117 • Published • 34 -
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Paper • 2506.01844 • Published • 128 -
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Paper • 2506.16035 • Published • 87
-
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Paper • 2503.12937 • Published • 30 -
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 256 -
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
Paper • 2507.07996 • Published • 32 -
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
Paper • 2507.13158 • Published • 24
-
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Paper • 2310.17796 • Published • 18 -
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
Paper • 2311.08263 • Published • 16 -
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper • 2501.12599 • Published • 123 -
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
Paper • 2502.04689 • Published • 7
-
MLLM-as-a-Judge for Image Safety without Human Labeling
Paper • 2501.00192 • Published • 32 -
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Paper • 2501.00958 • Published • 107 -
Xmodel-2 Technical Report
Paper • 2412.19638 • Published • 27 -
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper • 2412.18925 • Published • 105
-
WorldVLA: Towards Autoregressive Action World Model
Paper • 2506.21539 • Published • 39 -
Fast and Simplex: 2-Simplicial Attention in Triton
Paper • 2507.02754 • Published • 25 -
IntFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction
Paper • 2507.02025 • Published • 35 -
Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
Paper • 2507.00951 • Published • 22
-
Magistral
Paper • 2506.10910 • Published • 63 -
Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute
Paper • 2506.15882 • Published • 2 -
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
Paper • 2507.14683 • Published • 126 -
The Invisible Leash: Why RLVR May Not Escape Its Origin
Paper • 2507.14843 • Published • 84
-
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Paper • 2503.14734 • Published • 4 -
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Paper • 2401.02117 • Published • 34 -
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Paper • 2506.01844 • Published • 128 -
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Paper • 2506.16035 • Published • 87
-
microsoft/bitnet-b1.58-2B-4T
Text Generation • 0.8B • Updated • 5.71k • 1.15k -
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Paper • 2504.10449 • Published • 14 -
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
Text Generation • 8B • Updated • 1.02k • 15 -
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Paper • 2504.11536 • Published • 61
-
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Paper • 2503.12937 • Published • 30 -
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 256 -
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs
Paper • 2507.07996 • Published • 32 -
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities
Paper • 2507.13158 • Published • 24
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 25 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 39 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 100 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75
-
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Paper • 2310.17796 • Published • 18 -
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
Paper • 2311.08263 • Published • 16 -
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Paper • 2501.12599 • Published • 123 -
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
Paper • 2502.04689 • Published • 7