Update README.md
README.md
CHANGED
@@ -17,31 +17,29 @@ language:
|
|
17 |
- en
|
18 |
---
|
19 |
|
20 |
-
#
|
21 |
|
22 |
-
|
23 |
|
24 |
-
|
25 |
|
26 |
-
-
|
27 |
-
- This model was created using GRPO and Unsloth. It was trained to reason over Connect Four and learn to play it strategically.
|
28 |
-
- It's made for a specific project task.
|
29 |
|
30 |
-
|
31 |
-
|
32 |
-
|
33 |
-
- **License:** *TBD*
|
34 |
-
- **Finetuned from model:** [unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit)
|
35 |
-
- **Trained Using:** [TRL](https://github.com/huggingface/trl)'s GRPO.
|
36 |
|
37 |
-
|
|
|
|
|
|
|
|
|
|
|
38 |
|
39 |
-
|
40 |
-

|
41 |
|
42 |
-
|
43 |
|
44 |
-
* Solution #1:
|
45 |
```python
|
46 |
from transformers import pipeline
|
47 |
|
@@ -56,7 +54,7 @@ Board:
|
|
56 |
Strategy:
|
57 |
1. Identify taken positions, and empty positions.
|
58 |
2. Find and execute winning moves.
|
59 |
-
3. If There isn't a winning move, then block your opponent
|
60 |
4. Control the center and set up future moves.
|
61 |
|
62 |
Respond in XML:
|
@@ -77,69 +75,72 @@ board = {
|
|
77 |
generator = pipeline("text-generation", model="Lyte/QuadConnect2.5-0.5B-v0.0.9b", device="cuda")
|
78 |
|
79 |
# use 'empty', 'one_move' or 'four_moves' in board['']
|
80 |
-
output = generator([
|
|
|
|
|
|
|
|
|
81 |
print(output["generated_text"])
|
82 |
```
|
83 |
-
* Solution #2:
|
84 |
-
[GGUF Q8](https://hf.co/Lyte/QuadConnect2.5-0.5B-v0.0.9b/blob/main/quadconnect.Q8_0.gguf): Download the Quantized GGUF in any of your favorite GGUF inference engine(e.g. LMStudio)
|
85 |
-
|
86 |
-
* Solution #3:
|
87 |
-
[Huggingface Space](http://hf.co/spaces/Lyte/QuadConnect)): You can duplicate the space or download the code from the space and use it locally.
|
88 |
|
89 |
-
|
90 |
|
91 |
-
|
92 |
|
93 |
-
|
94 |
|
95 |
-
|
96 |
-
- The final dataset is Lyte/ConnectFour-T10
|
97 |
|
98 |
-
|
99 |
|
100 |
-
|
101 |
|
102 |
-
|
103 |
-
* temperature=0.6, top_p=0.95, max_tokens=1024
|
104 |
|
105 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
106 |
|
107 |
-
|
108 |
-
|-----------------------|--------------------------------|--------------------------------|--------------------------------|--------------------------------|
|
109 |
-
| Total games evaluated | 5082 | 5082 | 5082 | 5082 |
|
110 |
-
| Correct predictions | 518 | 394 | 516 | **713** |
|
111 |
-
| Accuracy | 10.19% | 7.75% | 10.15% | **14.03%** |
|
112 |
-
| Most common move | d (41.14%) | d (67.61%) | a (38.72%) | **a (31.01%)** |
|
113 |
-
| Middle column usage | 75.05% | 99.53% | 29.08% | **35.43%** |
|
114 |
|
115 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
116 |
|
117 |
-
|
118 |
|
119 |
-
|
120 |
-
|
121 |
-
|
122 |
-
|
123 |
-
|
124 |
-
|
125 |
-
| e | 290 (9.15%) | 360 (13.96%) | 338 (9.04%) | 562 (11.27%) |
|
126 |
-
| f | 50 (1.58%) | 3 (0.12%) | 310 (8.30%) | 408 (8.18%) |
|
127 |
-
| g | 27 (0.85%) | 2 (0.08%) | 249 (6.66%) | 342 (6.86%) |
|
128 |
|
|
|
|
|
|
|
|
|
129 |
|
130 |
-
|
131 |
-
### Framework versions
|
132 |
-
|
133 |
- TRL: 0.15.1
|
134 |
- Transformers: 4.49.0
|
135 |
-
-
|
136 |
- Datasets: 3.2.0
|
137 |
- Tokenizers: 0.21.0
|
138 |
|
139 |
-
## Citations
|
140 |
-
|
141 |
-
Cite GRPO as:
|
142 |
|
|
|
143 |
```bibtex
|
144 |
@article{zhihong2024deepseekmath,
|
145 |
title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
|
@@ -147,18 +148,16 @@ Cite GRPO as:
|
|
147 |
year = 2024,
|
148 |
eprint = {arXiv:2402.03300},
|
149 |
}
|
150 |
-
|
151 |
```
|
152 |
|
153 |
-
|
154 |
-
|
155 |
```bibtex
|
156 |
@misc{vonwerra2022trl,
|
157 |
-
|
158 |
-
|
159 |
-
|
160 |
-
|
161 |
-
|
162 |
-
|
163 |
}
|
164 |
```
|
|
|
17 |
- en
|
18 |
---
|
19 |
|
20 |
+
# QuadConnect2.5-0.5B-v0.0.9b - A Strategic Connect Four AI
|
21 |
|
22 |
+

|
23 |
|
24 |
+
## 🎮 Overview
|
25 |
|
26 |
+
QuadConnect2.5-0.5B is a specialized language model trained to play Connect Four. Built on Qwen 2.5 (0.5B-parameter base), it is fine-tuned with GRPO (Group Relative Policy Optimization) to reason over board states and choose strategic moves.
|
|
|
|
|
27 |
|
28 |
+
**Status**: Early training experiments (v0.0.9b) - Reward functions still evolving
|
29 |
+
|
30 |
+
## 🔍 Model Details
|
|
|
|
|
|
|
31 |
|
32 |
+
- **Developed by:** [Lyte](https://hf.co/Lyte)
|
33 |
+
- **Model type:** Small Language Model (SLM)
|
34 |
+
- **Language:** English
|
35 |
+
- **Base model:** [unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit)
|
36 |
+
- **Training method:** [TRL](https://github.com/huggingface/trl)'s GRPO
|
37 |
+
- **Training data:** [Lyte/ConnectFour-T10](https://huggingface.co/datasets/Lyte/ConnectFour-T10)
|
38 |
|
39 |
+
## 🚀 Quick Start
|
|
|
40 |
|
41 |
+
### Option 1: Using Transformers
|
42 |
|
|
|
43 |
```python
|
44 |
from transformers import pipeline
|
45 |
|
|
|
54 |
Strategy:
|
55 |
1. Identify taken positions, and empty positions.
|
56 |
2. Find and execute winning moves.
|
57 |
+
3. If There isn't a winning move, then block your opponent's potential wins.
|
58 |
4. Control the center and set up future moves.
|
59 |
|
60 |
Respond in XML:
|
|
|
75 |
generator = pipeline("text-generation", model="Lyte/QuadConnect2.5-0.5B-v0.0.9b", device="cuda")
|
76 |
|
77 |
# use 'empty', 'one_move' or 'four_moves' in board['']
|
78 |
+
output = generator([
|
79 |
+
{"role": "system", "content": SYSTEM_PROMPT},
|
80 |
+
{"role": "user", "content": board['empty']}
|
81 |
+
], max_new_tokens=1024, return_full_text=False)[0]
|
82 |
+
|
83 |
print(output["generated_text"])
|
84 |
```
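
The strategy listed in the system prompt (take a winning move if one exists, otherwise block the opponent, otherwise prefer the center) can also be written as a plain rule-based baseline, which may be useful for sanity-checking the model's moves. This is a minimal sketch and not part of the model or its training code: the board representation (a dict from column letter to a bottom-up list of pieces), the piece symbols `X`/`O`, and the helper names `empty_board`, `wins_after`, and `choose_move` are all assumptions made for illustration.

```python
COLUMNS = "abcdefg"  # column labels as used in this model card
ROWS = 6             # assumption: standard 7x6 Connect Four board


def empty_board():
    """Each column is a bottom-up list of pieces ('X' or 'O')."""
    return {col: [] for col in COLUMNS}


def piece_at(board, col_idx, row):
    """Return the piece at (column index, row), or None if empty or off-board."""
    if not (0 <= col_idx < len(COLUMNS)) or row < 0:
        return None
    column = board[COLUMNS[col_idx]]
    return column[row] if row < len(column) else None


def wins_after(board, col, player):
    """True if `player` dropping a piece in `col` completes four in a row."""
    col_idx, row = COLUMNS.index(col), len(board[col])
    if row >= ROWS:
        return False  # column is full, so the move is illegal
    # Horizontal, vertical, and the two diagonals.
    for d_col, d_row in ((1, 0), (0, 1), (1, 1), (1, -1)):
        count = 1  # the piece being dropped
        for sign in (1, -1):
            c, r = col_idx + sign * d_col, row + sign * d_row
            while piece_at(board, c, r) == player:
                count += 1
                c, r = c + sign * d_col, r + sign * d_row
        if count >= 4:
            return True
    return False


def choose_move(board, player, opponent):
    """Win if possible, else block, else take the most central legal column."""
    legal = [c for c in COLUMNS if len(board[c]) < ROWS]
    if not legal:
        return None  # board is full
    for who in (player, opponent):
        for col in legal:
            if wins_after(board, col, who):
                return col
    return min(legal, key=lambda c: abs(COLUMNS.index(c) - 3))
```

On an empty board, `choose_move(empty_board(), "X", "O")` falls through to the center rule and returns `"d"`.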
|
|
|
|
|
|
|
|
|
|
|
85 |
|
86 |
+
### Option 2: Using GGUF
|
87 |
|
88 |
+
Download the [quantized GGUF (Q8_0)](https://huggingface.co/Lyte/QuadConnect2.5-0.5B-v0.0.9b/blob/main/unsloth.Q8_0.gguf) and load it in any GGUF inference engine (e.g., LM Studio).
|
89 |
|
90 |
+
### Option 3: Using Hugging Face Space
|
91 |
|
92 |
+
Visit the [QuadConnect Space](https://huggingface.co/spaces/Lyte/QuadConnect) to interact with the model directly. You can also duplicate the space or download its code for local use.
|
|
|
93 |
|
94 |
+
## 📊 Evaluation Results
|
95 |
|
96 |
+
Model performance was evaluated on the [Lyte/ConnectFour-T10](https://huggingface.co/datasets/Lyte/ConnectFour-T10) validation split at sampling temperatures of 0.6, 0.8, and 1.0.
|
97 |
|
98 |
+
### Summary Metrics Comparison
|
|
|
99 |
|
100 |
+
| Metric | v0.0.6b (Temp 0.6) | v0.0.8b (Temp 0.6) | v0.0.9b (Temp 0.6) | v0.0.9b (Temp 0.8) | v0.0.9b (Temp 1.0) |
|--------|-------------------|-------------------|-------------------|-------------------|-------------------|
| Total games evaluated | 5082 | 5082 | 5082 | 5082 | 5082 |
| Correct predictions | 518 | 394 | 516 | **713** | 677 |
| Accuracy | 10.19% | 7.75% | 10.15% | **14.03%** | 13.32% |
| Most common move | d (41.14%) | d (67.61%) | a (38.72%) | a (31.01%) | a (26.99%) |
| Middle column usage | 75.05% | 99.53% | 29.08% | 35.43% | 39.49% |
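
The accuracy row is simply correct predictions divided by the 5082 evaluated games. As a quick check, the short sketch below recomputes it from the counts in the table; the dictionary layout and the helper name `accuracy_pct` are illustrative choices, not part of the original evaluation script.

```python
TOTAL_GAMES = 5082

# Correct predictions per run, copied from the summary table.
correct = {
    "v0.0.6b (Temp 0.6)": 518,
    "v0.0.8b (Temp 0.6)": 394,
    "v0.0.9b (Temp 0.6)": 516,
    "v0.0.9b (Temp 0.8)": 713,
    "v0.0.9b (Temp 1.0)": 677,
}


def accuracy_pct(n_correct, n_total=TOTAL_GAMES):
    """Accuracy as a percentage, rounded to two decimals like the table."""
    return round(100 * n_correct / n_total, 2)


accuracies = {run: accuracy_pct(n) for run, n in correct.items()}
best_run = max(accuracies, key=accuracies.get)

for run, acc in accuracies.items():
    print(f"{run}: {acc:.2f}%")
print("best:", best_run)
```

The recomputed values match the table, with v0.0.9b at temperature 0.8 scoring highest.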
|
107 |
|
108 |
+
### Move Distribution by Column
|
|
|
|
|
|
|
|
|
|
|
|
|
109 |
|
110 |
+
| Column | v0.0.6b (Temp 0.6) | v0.0.8b (Temp 0.6) | v0.0.9b (Temp 0.6) | v0.0.9b (Temp 0.8) | v0.0.9b (Temp 1.0) |
|--------|-------------------|-------------------|-------------------|-------------------|-------------------|
| a | 603 (19.02%) | 3 (0.12%) | 1447 (38.72%) | 1547 (31.01%) | 1351 (26.99%) |
| b | 111 (3.50%) | 4 (0.16%) | 644 (17.23%) | 924 (18.52%) | 997 (19.92%) |
| c | 785 (24.76%) | 463 (17.96%) | 648 (17.34%) | 1003 (20.11%) | 985 (19.68%) |
| d | 1304 (41.14%) | 1743 (67.61%) | 101 (2.70%) | 202 (4.05%) | 306 (6.11%) |
| e | 290 (9.15%) | 360 (13.96%) | 338 (9.04%) | 562 (11.27%) | 686 (13.70%) |
| f | 50 (1.58%) | 3 (0.12%) | 310 (8.30%) | 408 (8.18%) | 354 (7.07%) |
| g | 27 (0.85%) | 2 (0.08%) | 249 (6.66%) | 342 (6.86%) | 327 (6.53%) |
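
Two details of this table are easier to see in code. First, the percentages appear to be normalized by the sum of the column counts for each run (for example 3170 for v0.0.6b) rather than by the 5082 evaluated games; the likely reason, which is an assumption here, is that some responses did not contain a parseable move. Second, treating columns c, d, and e as the "middle" reproduces the middle column usage figures to within about 0.01 percentage points, so that definition is also assumed below. The helper names are illustrative.

```python
COLUMNS = "abcdefg"
MIDDLE = "cde"  # assumption: "middle column usage" means columns c, d and e

# Move counts per column (a..g), copied from the table above.
moves = {
    "v0.0.6b (Temp 0.6)": [603, 111, 785, 1304, 290, 50, 27],
    "v0.0.8b (Temp 0.6)": [3, 4, 463, 1743, 360, 3, 2],
    "v0.0.9b (Temp 0.6)": [1447, 644, 648, 101, 338, 310, 249],
    "v0.0.9b (Temp 0.8)": [1547, 924, 1003, 202, 562, 408, 342],
    "v0.0.9b (Temp 1.0)": [1351, 997, 985, 306, 686, 354, 327],
}


def distribution(counts):
    """Per-column share of all counted moves, as rounded percentages."""
    total = sum(counts)
    return {col: round(100 * n / total, 2) for col, n in zip(COLUMNS, counts)}


def middle_share(counts):
    """Percentage of counted moves that landed in the middle columns."""
    middle = sum(n for col, n in zip(COLUMNS, counts) if col in MIDDLE)
    return 100 * middle / sum(counts)


for run, counts in moves.items():
    print(run, sum(counts), distribution(counts), f"{middle_share(counts):.2f}%")
```

The per-run totals printed here also show how many of the 5082 games produced a counted move, which rises from 3170 for v0.0.6b to 5006 for v0.0.9b at temperature 1.0.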
|
119 |
|
120 |
+
## 🔧 Training Details
|
121 |
|
122 |
+
### Data Preparation
|
123 |
+
1. Started with [Leon-LLM/Connect-Four-Datasets-Collection](https://huggingface.co/datasets/Leon-LLM/Connect-Four-Datasets-Collection)
|
124 |
+
2. Filtered for clean, complete entries
|
125 |
+
3. Further filtered to include only games with 10 or fewer turns
|
126 |
+
4. Split into train and validation sets
|
127 |
+
5. Final dataset: [Lyte/ConnectFour-T10](https://huggingface.co/datasets/Lyte/ConnectFour-T10)
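
The filtering and splitting steps above can be sketched in plain Python. The actual preprocessing script is not shown in this README, so everything specific here is an assumption: the record layout (a dict with a hypothetical `moves` field holding column letters), the reading of "turn" as a single move, the cleanliness check, and the 10% validation fraction.

```python
import random


def prepare(games, max_turns=10, val_fraction=0.1, seed=0):
    """Filter and split games; field names and split size are assumptions."""
    # Step 2: keep clean, complete entries (here: non-empty, valid letters only).
    clean = [
        g for g in games
        if g.get("moves") and all(m in "abcdefg" for m in g["moves"])
    ]
    # Step 3: keep only games with `max_turns` or fewer turns.
    short = [g for g in clean if len(g["moves"]) <= max_turns]
    # Step 4: shuffle deterministically, then split into train and validation.
    rng = random.Random(seed)
    rng.shuffle(short)
    n_val = int(len(short) * val_fraction)
    return short[n_val:], short[:n_val]


# Tiny hypothetical input: two usable games, one too long, one malformed.
games = [
    {"moves": list("dcdcdcd")},
    {"moves": list("abab")},
    {"moves": list("abcdefgabcdefg")},
    {"moves": None},
]
train, val = prepare(games, val_fraction=0.5)
print(len(train), len(val))
```

With the Hugging Face `datasets` library the same logic would typically be expressed with `Dataset.filter` and `Dataset.train_test_split`, but the plain-Python version keeps the sketch dependency-free.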
|
|
|
|
|
|
|
128 |
|
129 |
+
### Evaluation Parameters
|
130 |
+
- Temperature: 0.6, 0.8, 1.0 (compared)
|
131 |
+
- Top-p: 0.95
|
132 |
+
- Max tokens: 1024
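
For reference, the parameters above map onto standard Transformers generation arguments as sketched below. This is one possible reading rather than the evaluation script itself: the README does not say which inference engine was used for evaluation, and interpreting "max tokens" as `max_new_tokens` is an assumption. The helper name `generation_kwargs` is illustrative.

```python
EVAL_TEMPERATURES = (0.6, 0.8, 1.0)


def generation_kwargs(temperature):
    """Sampling settings mirroring the evaluation parameters listed above."""
    return {
        "do_sample": True,        # sampling must be enabled for temperature/top_p to apply
        "temperature": temperature,
        "top_p": 0.95,
        "max_new_tokens": 1024,   # assumption: "max tokens" counts generated tokens
        "return_full_text": False,
    }


configs = [generation_kwargs(t) for t in EVAL_TEMPERATURES]
for cfg in configs:
    print(cfg)
```

With the `generator` pipeline from the Quick Start section, a call would then look like `generator(messages, **generation_kwargs(0.8))`.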
|
133 |
|
134 |
+
### Framework Versions
|
|
|
|
|
135 |
- TRL: 0.15.1
|
136 |
- Transformers: 4.49.0
|
137 |
+
- PyTorch: 2.5.1+cu121
|
138 |
- Datasets: 3.2.0
|
139 |
- Tokenizers: 0.21.0
|
140 |
|
141 |
+
## 📚 Citations
|
|
|
|
|
142 |
|
143 |
+
For GRPO:
|
144 |
```bibtex
|
145 |
@article{zhihong2024deepseekmath,
|
146 |
title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
|
|
|
148 |
year = 2024,
|
149 |
eprint = {arXiv:2402.03300},
|
150 |
}
|
|
|
151 |
```
|
152 |
|
153 |
+
For TRL:
|
|
|
154 |
```bibtex
|
155 |
@misc{vonwerra2022trl,
|
156 |
+
title = {{TRL: Transformer Reinforcement Learning}},
|
157 |
+
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
|
158 |
+
year = 2020,
|
159 |
+
journal = {GitHub repository},
|
160 |
+
publisher = {GitHub},
|
161 |
+
howpublished = {\url{https://github.com/huggingface/trl}}
|
162 |
}
|
163 |
```
|