Instructions to use MiniMaxAI/MiniMax-M1-80k with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use MiniMaxAI/MiniMax-M1-80k with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="MiniMaxAI/MiniMax-M1-80k", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("MiniMaxAI/MiniMax-M1-80k", trust_remote_code=True, dtype="auto")
```

- Inference
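Tying the two snippets above together, a minimal end-to-end chat helper could look like the sketch below. It assumes `transformers` is installed and that you have hardware large enough for this checkpoint; the `chat` helper itself and the `max_new_tokens` value are illustrative, not part of the official instructions.

```python
# Sketch of direct chat inference with the directly-loaded model.
# The `chat` wrapper is a hypothetical convenience, not an official API.
def chat(prompt: str, model_id: str = "MiniMaxAI/MiniMax-M1-80k") -> str:
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, dtype="auto"
    )
    # apply_chat_template wraps the message in the model's chat markup
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(inputs, max_new_tokens=256)
    # decode only the newly generated tokens, skipping the echoed prompt
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# chat("Who are you?")  # downloads the full (very large) checkpoint on first call
```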
- HuggingChat
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use MiniMaxAI/MiniMax-M1-80k with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "MiniMaxAI/MiniMax-M1-80k"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "MiniMaxAI/MiniMax-M1-80k",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker
```shell
docker model run hf.co/MiniMaxAI/MiniMax-M1-80k
```
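The curl call above can also be made from Python using nothing beyond the standard library. This is a sketch assuming the vLLM server started above is listening on `localhost:8000`; the helper names (`build_chat_request`, `query_server`) are mine, not part of vLLM.

```python
# Build and send an OpenAI-style chat-completions request to a local vLLM server.
import json
import urllib.request

def build_chat_request(model: str, user_message: str) -> dict:
    """Construct the JSON payload expected by the OpenAI-compatible endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def query_server(payload: dict, url: str = "http://localhost:8000/v1/chat/completions") -> dict:
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request("MiniMaxAI/MiniMax-M1-80k", "What is the capital of France?")
# reply = query_server(payload)  # requires the vLLM server to be running
# print(reply["choices"][0]["message"]["content"])
```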
- SGLang
How to use MiniMaxAI/MiniMax-M1-80k with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "MiniMaxAI/MiniMax-M1-80k" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "MiniMaxAI/MiniMax-M1-80k",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "MiniMaxAI/MiniMax-M1-80k" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "MiniMaxAI/MiniMax-M1-80k",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```

- Docker Model Runner
How to use MiniMaxAI/MiniMax-M1-80k with Docker Model Runner:
```shell
docker model run hf.co/MiniMaxAI/MiniMax-M1-80k
```
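The vLLM and SGLang servers above both return OpenAI-style chat-completions JSON, so the assistant's reply can be extracted the same way in either case. A small sketch with a made-up sample response; `extract_answer` is a hypothetical helper, not part of either server.

```python
# Pull the assistant's text out of an OpenAI-style chat-completions response.
def extract_answer(response: dict) -> str:
    """The reply lives at choices[0].message.content in the OpenAI schema."""
    return response["choices"][0]["message"]["content"]

sample_response = {  # shape of a chat-completions reply; values are made up
    "model": "MiniMaxAI/MiniMax-M1-80k",
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Paris."}}
    ],
}
print(extract_answer(sample_response))  # prints "Paris."
```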
Was the 7.5T Token Continual Pre-Training Performed on the Instruction-Tuned Model or the Base PLM?
Hello, and thank you for your impressive work on the MiniMax-M1 project!
In the paper, it is mentioned that a 7.5T token continual pre-training (CPT) was performed based on the MiniMax-Text-01 model.
To clarify, was this CPT applied to the instruction-tuned model (MiniMax-Text-01 Instruction), or was it conducted on the base pretrained language model (MiniMax-Text-01 PLM) before any instruction tuning?
(Here, “MiniMax-Text-01 Instruction” and “MiniMax-Text-01 PLM” are just placeholder terms for clarity.)
Understanding this detail would help clarify how CPT fits into the overall training pipeline and what its intended role is.
Thank you in advance!
Thank you for the clarification! @sriting
I have a follow-up question regarding the implementation details.
Is it correct that there are no special tokens such as `<think>` and `</think>` to explicitly mark reasoning paths (i.e., Chain-of-Thought segments) in the model output?
I ask this because some recent models (e.g., Qwen3, DeepSeek-MoE) adopt such tags to separate reasoning and final answers. I’m curious whether MiniMax-M1 internally uses similar markers during SFT or RL training, or if reasoning is handled purely implicitly through instruction and response formatting.
Thanks again for your time and support!
Bro it is not working now