Training in progress, step 250

Files changed (4)

README.md CHANGED

@@ -27,25 +27,25 @@ print(output["generated_text"])
 ## Training procedure
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/mlebar-university-of-chicago/huggingface/runs/3zie9zvw)
 This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
 ### Framework versions
-- TRL: 0.21.0
-- Transformers: 4.55.2
 - Pytorch: 2.8.0.dev20250319+cu128
 - Datasets: 4.0.0
-- Tokenizers: 0.21.4
 ## Citations
 Cite GRPO as:
 ```bibtex
-@article{zhihong2024deepseekmath,
     title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
     author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
     year         = 2024,

 ## Training procedure
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/mlebar-university-of-chicago/huggingface/runs/81xgnibd)
 This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
 ### Framework versions
+- TRL: 0.22.1
+- Transformers: 4.56.0
 - Pytorch: 2.8.0.dev20250319+cu128
 - Datasets: 4.0.0
+- Tokenizers: 0.22.0
 ## Citations
 Cite GRPO as:
 ```bibtex
+@article{shao2024deepseekmath,
     title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
     author       = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
     year         = 2024,

adapter_config.json CHANGED

@@ -25,8 +25,8 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "o_proj",
     "v_proj",
     "k_proj",
     "q_proj"
   ],

   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "v_proj",
+    "o_proj",
     "k_proj",
     "q_proj"
   ],
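The hunk above moves `"o_proj"` within `target_modules` without adding or removing any entry. A minimal sketch of an order-insensitive comparison follows; the two lists are copied from the hunk, while the helper name `same_modules` is hypothetical. One plausible cause of the reorder is that PEFT keeps `target_modules` as a set, so the serialized order is not stable between saves. That is an assumption, not something the diff states.

```python
# Module lists copied from the old and new sides of the hunk above.
old_targets = ["o_proj", "v_proj", "k_proj", "q_proj"]
new_targets = ["v_proj", "o_proj", "k_proj", "q_proj"]


def same_modules(a, b):
    """Return True when both lists name the same modules, ignoring order.

    `same_modules` is a hypothetical helper for illustration only.
    """
    return len(a) == len(b) and set(a) == set(b)


# True here means the change is a reorder only, with no module added or removed.
order_only = same_modules(old_targets, new_targets)
print(order_only)
```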

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fcd3c7f772fc675324673de02a8a30013df18a86fde68f5491c6988172241372
 size 54560368

 version https://git-lfs.github.com/spec/v1
+oid sha256:f0f6bba6875039fe3c20497843a92b4bf8c11aeeea27b60fa5fb54a1a053d784
 size 54560368

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:db73dc0d431d7f1a9ef9e5523050776d01882c73d5f3e7fbedabb0ebd86cca38
-size 6993

 version https://git-lfs.github.com/spec/v1
+oid sha256:3c2707f3bed6e825b1ab329ee8fd67eda78289534601eaeed56112c8ee2b8341
+size 7057
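
The two binary files above are stored as Git LFS pointers: a `version` line, an `oid sha256:<digest>` line, and a `size` line in bytes. The sketch below parses such a pointer and checks a downloaded file against it. The pointer text is the new `training_args.bin` pointer from this commit; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the sketch assumes the file is available locally.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer):
    """Return True when the file at `path` has the pointer's size and SHA-256."""
    if pointer["algo"] != "sha256":
        raise ValueError("only sha256 oids are handled in this sketch")
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# New-side pointer for training_args.bin, copied from the diff above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:3c2707f3bed6e825b1ab329ee8fd67eda78289534601eaeed56112c8ee2b8341\n"
    "size 7057\n"
)
```

With a local copy of the file, `matches_pointer("training_args.bin", pointer)` would report whether the download corresponds to this commit; the path is a placeholder.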