Qwen3-4B Physics RLSD
This repository contains Qwen3-4B checkpoints fine-tuned on physics in a local SciKnowEval-style generalization setup.
- Root checkpoint: final `global_step_100`, merged to Hugging Face safetensors.
- `best_avg16/`: checkpoint with the highest validation `avg@16` during training, merged to Hugging Face safetensors.
Checkpoints
| Checkpoint | Source step | Validation avg@16 | best@16 / pass@16 | maj@16 |
|---|---|---|---|---|
| Root final | 100 | 0.700000 | 0.768713 | 0.718600 |
| `best_avg16/` | 60 | 0.731250 | 0.816338 | 0.740938 |
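The card does not spell out how the @16 columns are computed. As a rough illustration only, assume each validation prompt gets 16 sampled multiple-choice answers that are compared with a gold label. The function names and sample data below are hypothetical, and the reported values may come from a different estimator than these simple per-prompt counts.

```python
from collections import Counter


def avg_at_k(answers, gold):
    """Fraction of the k sampled answers that match the gold label."""
    return sum(a == gold for a in answers) / len(answers)


def maj_at_k(answers, gold):
    """1.0 if the most frequent sampled answer matches the gold label, else 0.0."""
    most_common_answer, _ = Counter(answers).most_common(1)[0]
    return float(most_common_answer == gold)


def pass_at_k(answers, gold):
    """1.0 if at least one of the k sampled answers matches the gold label."""
    return float(any(a == gold for a in answers))


# Hypothetical samples for a single prompt (k = 16), not real model output.
samples = ["B"] * 10 + ["C"] * 4 + ["A"] * 2

print(avg_at_k(samples, "B"))   # 10 of 16 correct -> 0.625
print(maj_at_k(samples, "B"))   # "B" is the majority answer -> 1.0
print(pass_at_k(samples, "C"))  # "C" appears at least once -> 1.0
```

Averaging each per-prompt value over the 80 validation prompts would give a dataset-level score under these assumptions.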
Training Run
qwen3gen-physics-RLSD-Qwen-Qwen3-4B-mbs8-decay0-ema0.05-train256-rollout8-lr1e-6-vllm0.8
Base Model
- Base model: `Qwen/Qwen3-4B`
- Fine-tuning type: full-parameter FSDP RL training
- Dataset: `datasets/sciknoweval/physics`
- Train split: 720 examples
- Validation split: 80 examples
Method
- Method: RLSD
- Config: `rlsd`
- Policy loss mode: `rlsd`
- Reward: local SciKnowEval multiple-choice reward checker
- Rollout correction: token-level importance sampling, threshold 2.0
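One common reading of "token-level importance sampling, threshold 2.0" is truncated importance sampling: for each token, the ratio between the training policy's probability and the rollout engine's probability is capped at the threshold before it weights the loss. The sketch below assumes that reading. The function name and log-probabilities are hypothetical, and the actual training code may apply the threshold differently.

```python
import math


def truncated_is_weights(train_logprobs, rollout_logprobs, threshold=2.0):
    """Per-token importance weights, capped at `threshold` (assumed truncation)."""
    weights = []
    for lp_train, lp_rollout in zip(train_logprobs, rollout_logprobs):
        ratio = math.exp(lp_train - lp_rollout)  # pi_train(token) / pi_rollout(token)
        weights.append(min(ratio, threshold))    # cap large ratios at the threshold
    return weights


# Hypothetical per-token log-probabilities for one short response.
train_lp = [-1.0, -0.5, -0.2]
rollout_lp = [-1.0, -1.5, -0.1]

weights = truncated_is_weights(train_lp, rollout_lp)
print(weights)  # second token's ratio exp(1.0) exceeds 2.0 and is capped
```

Capping the ratio bounds the variance introduced when the rollout engine and the training engine assign slightly different probabilities to the same tokens.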
Hyperparameters
| Field | Value |
|---|---|
| Base model | Qwen/Qwen3-4B |
| Training steps | 100 |
| Train batch size | 256 |
| Rollouts per prompt | 8 |
| Generations per step | 2048 |
| PPO mini batch size | 8 |
| Learning rate | 1e-6 |
| LR warmup steps | 10 |
| Weight decay | 0.01 |
| Grad clip | 1.0 |
| Max prompt length | 2048 |
| Max response length | 8192 |
| Max model length | 10240 |
| Train temperature | 1.0 |
| Train top_p | 1.0 |
| Validation generations | 16 |
| Validation temperature | 0.6 |
| Validation top_p | 0.95 |
| vLLM GPU memory utilization | 0.8 |
| GPUs | 8 x NVIDIA H200 |
| Save frequency | every 10 steps |
| Validation frequency | every 10 steps |
| Token reweight lambda | 0.5 |
| Token reweight eps_w | 0.2 |
| Token reweight decay steps | 0 |
| Teacher update rate | 0.05 |
| Max reprompt length | 10240 |
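Two rows in the table follow from others, which allows a quick consistency check: generations per step equals train batch size times rollouts per prompt, and max model length equals max prompt length plus max response length. The dictionary keys below are illustrative names, not the training framework's config fields.

```python
# Values copied from the hyperparameter table; key names are illustrative.
hparams = {
    "train_batch_size": 256,
    "rollouts_per_prompt": 8,
    "generations_per_step": 2048,
    "max_prompt_length": 2048,
    "max_response_length": 8192,
    "max_model_length": 10240,
}

# Each of the 256 prompts in a batch is sampled 8 times.
generations = hparams["train_batch_size"] * hparams["rollouts_per_prompt"]

# The context window must hold the longest prompt plus the longest response.
context = hparams["max_prompt_length"] + hparams["max_response_length"]

print(generations == hparams["generations_per_step"])  # True
print(context == hparams["max_model_length"])          # True
```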
Metrics
| Metric | Value |
|---|---|
| Final training step | 100 |
| Final `critic/score/mean` | 0.890137 |
| Final `critic/rewards/mean` | 0.890137 |
| Final validation avg@16 | 0.700000 |
| Peak validation avg@16 | 0.731250 |
| Peak validation step | 60 |
Loading
Root final checkpoint:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("SeongryongJung/Qwen3-4B-Physics-RLSD")
tokenizer = AutoTokenizer.from_pretrained("SeongryongJung/Qwen3-4B-Physics-RLSD")
```
Best avg@16 checkpoint:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("SeongryongJung/Qwen3-4B-Physics-RLSD", subfolder="best_avg16")
tokenizer = AutoTokenizer.from_pretrained("SeongryongJung/Qwen3-4B-Physics-RLSD", subfolder="best_avg16")
```
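Once a checkpoint is loaded, generation might look like the sketch below. It reuses the validation sampling settings from the hyperparameter table (temperature 0.6, top_p 0.95). The helper names, the `max_new_tokens` default, and the example question are assumptions made for illustration, not settings documented by the card.

```python
REPO_ID = "SeongryongJung/Qwen3-4B-Physics-RLSD"


def build_generation_kwargs(max_new_tokens=512):
    """Sampling settings matching the card's validation temperature and top_p."""
    return {
        "do_sample": True,
        "temperature": 0.6,
        "top_p": 0.95,
        "max_new_tokens": max_new_tokens,  # assumed default, not from the card
    }


def generate_answer(question, subfolder="", max_new_tokens=512):
    """Load a checkpoint (root by default) and sample one answer to `question`."""
    # Imported lazily so the helper above can be used without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID, subfolder=subfolder)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, subfolder=subfolder)

    messages = [{"role": "user", "content": question}]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, **build_generation_kwargs(max_new_tokens))
    # Drop the prompt tokens and decode only the newly generated part.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(generate_answer("Which quantity is conserved in an elastic collision?",
#                       subfolder="best_avg16"))
```

Passing `subfolder="best_avg16"` selects the peak-validation checkpoint, while the default empty string loads the root final checkpoint.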
Intended Use
This model is intended for research on RL fine-tuning and self-distillation behavior on science/generalization tasks. It has not been broadly safety evaluated for production use.
Limitations
The reported scores are training-time and validation-time metrics from the local experimental setup. They should not be interpreted as broad benchmark results without independent evaluation.