Update README.md
README.md
CHANGED
@@ -2,43 +2,136 @@
|
|
2 |
base_model: unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit
|
3 |
library_name: transformers
|
4 |
model_name: QuadConnect2.5-0.5B-v0.0.9b
|
|
|
5 |
tags:
|
6 |
-
- generated_from_trainer
|
7 |
- unsloth
|
8 |
- trl
|
9 |
- grpo
|
10 |
licence: license
|
11 |
---
|
12 |
|
13 |
# Model Card for QuadConnect2.5-0.5B-v0.0.9b
|
14 |
|
15 |
-
|
16 |
-
|
17 |
|
18 |
## Quick start
|
19 |
|
|
|
20 |
```python
|
21 |
from transformers import pipeline
|
22 |
|
23 |
-
|
24 |
generator = pipeline("text-generation", model="Lyte/QuadConnect2.5-0.5B-v0.0.9b", device="cuda")
|
25 |
-
|
26 |
print(output["generated_text"])
|
27 |
```
|
28 |
|
29 |
## Training procedure
|
30 |
|
31 |
-
|
32 |
|
33 |
|
34 |
-
This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
|
35 |
|
36 |
### Framework versions
|
37 |
|
38 |
-
- TRL: 0.15.
|
39 |
- Transformers: 4.49.0
|
40 |
- Pytorch: 2.5.1+cu121
|
41 |
-
- Datasets: 3.
|
42 |
- Tokenizers: 0.21.0
|
43 |
|
44 |
## Citations
|
|
|
2 |
base_model: unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit
|
3 |
library_name: transformers
|
4 |
model_name: QuadConnect2.5-0.5B-v0.0.9b
|
5 |
+
pipeline_tag: text-generation
|
6 |
tags:
|
|
|
7 |
- unsloth
|
8 |
- trl
|
9 |
- grpo
|
10 |
+
- connect4
|
11 |
+
- qwen
|
12 |
+
- RL
|
13 |
licence: license
|
14 |
+
datasets:
|
15 |
+
- Lyte/ConnectFour-T10
|
16 |
+
language:
|
17 |
+
- en
|
18 |
---
|
19 |
|
20 |
# Model Card for QuadConnect2.5-0.5B-v0.0.9b
|
21 |
|
22 |
+
## Model Details
|
23 |
+
|
24 |
+
### Model Description
|
25 |
+
|
26 |
+
- These are still very early training experiments; the reward functions are still changing.
|
27 |
+
- This model was created using GRPO and Unsloth. It was trained to reason over Connect Four and learn to play it strategically.
|
28 |
+
- It's made for a specific project task.
|
29 |
+
|
30 |
+
- **Developed by:** [Lyte](https://hf.co/Lyte)
|
31 |
+
- **Model type:** *Small Language Model*
|
32 |
+
- **Language(s) (NLP):** *English*
|
33 |
+
- **License:** *TBD*
|
34 |
+
- **Finetuned from model:** [unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit)
|
35 |
+
- **Trained Using:** [TRL](https://github.com/huggingface/trl)'s GRPO.
|
36 |
+
|
37 |
+
## Demo
|
38 |
+
|
39 |
+
- Example from the Hugging Face Space (version 0.0.6b):
|
40 |
+

|
41 |
|
42 |
## Quick start
|
43 |
|
44 |
+
* Solution #1:
|
45 |
```python
|
46 |
from transformers import pipeline
|
47 |
|
48 |
+
SYSTEM_PROMPT = """You are a master Connect Four strategist whose goal is to win while preventing your opponent from winning. The game is played on a 6x7 grid (columns a–g, rows 1–6 with 1 at the bottom) where pieces drop to the lowest available spot.
|
49 |
+
|
50 |
+
Board:
|
51 |
+
- Represented as a list of occupied cells in the format: <column><row>(<piece>), e.g., 'a1(O)'.
|
52 |
+
- For example: 'a1(O), a2(X), b1(O)' indicates that cell a1 has an O, a2 has an X, and b1 has an O.
|
53 |
+
- An empty board is shown as 'Empty Board'.
|
54 |
+
- Win by connecting 4 pieces in any direction (horizontal, vertical, or diagonal).
|
55 |
+
|
56 |
+
Strategy:
|
57 |
+
1. Identify taken positions, and empty positions.
|
58 |
+
2. Find and execute winning moves.
|
59 |
+
3. If There isn't a winning move, then block your opponent’s potential wins.
|
60 |
+
4. Control the center and set up future moves.
|
61 |
+
|
62 |
+
Respond in XML:
|
63 |
+
<reasoning>
|
64 |
+
Explain your thought process, focusing on your winning move, how you block your opponent, and your strategic plans.
|
65 |
+
</reasoning>
|
66 |
+
<move>
|
67 |
+
Specify the column letter (a–g) for your next move.
|
68 |
+
</move>
|
69 |
+
"""
|
70 |
+
|
71 |
+
board = {
|
72 |
+
"empty": "Game State:\n- You are playing as: X\n- Your previous moves: \n- Opponent's moves: \n- Current board state: Empty Board\n- Next available position per column: \nColumn a: a1, a2, a3, a4, a5, a6 \nColumn b: b1, b2, b3, b4, b5, b6 \nColumn c: c1, c2, c3, c4, c5, c6 \nColumn d: d1, d2, d3, d4, d5, d6 \nColumn e: e1, e2, e3, e4, e5, e6 \nColumn f: f1, f2, f3, f4, f5, f6 \nColumn g: g1, g2, g3, g4, g5, g6\n\nMake your move.",
|
73 |
+
"one_move": "Game State:\n- You are playing as: X\n- Your previous moves: \n- Opponent's moves: b1\n- Current board state: b1(O)\n- Next available position per column: \nColumn a: a1, a2, a3, a4, a5, a6 \nColumn b: b2, b3, b4, b5, b6 \nColumn c: c1, c2, c3, c4, c5, c6 \nColumn d: d1, d2, d3, d4, d5, d6 \nColumn e: e1, e2, e3, e4, e5, e6 \nColumn f: f1, f2, f3, f4, f5, f6 \nColumn g: g1, g2, g3, g4, g5, g6\n\nMake your move.",
|
74 |
+
"four_moves": "Game State:\n- You are playing as: X\n- Your previous moves: a1, a2\n- Opponent's moves: d1, a3\n- Current board state: a1(X), d1(O), a2(X), a3(O)\n- Next available position per column: \nColumn a: a4, a5, a6 \nColumn b: b1, b2, b3, b4, b5, b6 \nColumn c: c1, c2, c3, c4, c5, c6 \nColumn d: d2, d3, d4, d5, d6 \nColumn e: e1, e2, e3, e4, e5, e6 \nColumn f: f1, f2, f3, f4, f5, f6 \nColumn g: g1, g2, g3, g4, g5, g6\n\nMake your move.",
|
75 |
+
}
|
76 |
+
|
77 |
generator = pipeline("text-generation", model="Lyte/QuadConnect2.5-0.5B-v0.0.9b", device="cuda")
|
78 |
+
|
79 |
+
# Index board with one of its keys: 'empty', 'one_move', or 'four_moves'
|
80 |
+
output = generator([{"role": "system", "content": SYSTEM_PROMPT}, {"role": "user", "content": board['empty']}], max_new_tokens=10245, return_full_text=False)[0]
|
81 |
print(output["generated_text"])
|
82 |
```
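
The `board` strings in the snippet above follow a fixed layout, so they can be generated from a move list instead of being typed by hand. The following is a minimal sketch under stated assumptions: `build_game_state` is a hypothetical helper that is not part of the model repository, moves are passed in play order as `(cell, piece)` pairs, and the rendering of a completely full column is a guess because the examples above never show one.

```python
COLUMNS = "abcdefg"
ROWS = 6


def build_game_state(moves, me="X"):
    """Build a user prompt in the same layout as the `board` examples.

    moves: list of (cell, piece) tuples in play order,
           e.g. [("a1", "X"), ("d1", "O")].
    me:    the piece the model plays as.
    """
    mine = [cell for cell, piece in moves if piece == me]
    theirs = [cell for cell, piece in moves if piece != me]
    board = ", ".join(f"{cell}({piece})" for cell, piece in moves) or "Empty Board"

    taken = {cell for cell, _ in moves}
    column_lines = []
    for col in COLUMNS:
        free = [f"{col}{row}" for row in range(1, ROWS + 1) if f"{col}{row}" not in taken]
        # Assumption: a full column is rendered with an empty list of cells.
        column_lines.append(f"Column {col}: {', '.join(free)}")

    return (
        "Game State:\n"
        f"- You are playing as: {me}\n"
        f"- Your previous moves: {', '.join(mine)}\n"
        f"- Opponent's moves: {', '.join(theirs)}\n"
        f"- Current board state: {board}\n"
        "- Next available position per column: \n"
        + " \n".join(column_lines)
        + "\n\nMake your move."
    )


# Usage: reproduce the 'one_move' example, where O opened in column b.
print(build_game_state([("b1", "O")]))
```

The returned string can be passed as the `content` of the user message in place of `board['empty']`.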
|
83 |
+
* Solution #2:
|
84 |
+
[GGUF Q8](https://hf.co/Lyte/QuadConnect2.5-0.5B-v0.0.9b/blob/main/quadconnect.Q8_0.gguf): Download the quantized GGUF and run it in your favorite GGUF inference engine (e.g., LM Studio).
|
85 |
+
|
86 |
+
* Solution #3:
|
87 |
+
[Hugging Face Space](http://hf.co/spaces/Lyte/QuadConnect): You can duplicate the Space, or download its code and run it locally.
|
88 |
|
89 |
## Training procedure
|
90 |
|
91 |
+
This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
|
92 |
+
|
93 |
+
### Preprocessing
|
94 |
+
|
95 |
+
- I searched for Connect Four datasets, found three candidates, and selected [Leon-LLM/Connect-Four-Datasets-Collection](https://huggingface.co/datasets/Leon-LLM/Connect-Four-Datasets-Collection). I removed empty or broken entries and uploaded the result as Lyte/ConnectFour-clean. I then removed games that last more than 10 turns and split the remainder into train and validation sets (the validation set wasn't used for training).
|
96 |
+
- The final dataset is Lyte/ConnectFour-T10
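
The preprocessing steps above can be sketched in plain Python. This is an illustration only, not the author's script: the `moves` field name, the reading of "turn" as one move, the 10% validation fraction, and the fixed seed are all assumptions.

```python
import random

MAX_TURNS = 10  # from the description above; one turn is assumed to be one move


def clean_and_filter(games, max_turns=MAX_TURNS):
    """Drop empty or broken entries, then drop games longer than max_turns."""
    kept = []
    for game in games:
        moves = game.get("moves")  # hypothetical field name
        if not moves:
            continue  # empty or broken entry
        if len(moves) > max_turns:
            continue  # game goes on for too long
        kept.append(game)
    return kept


def split_train_validation(games, validation_fraction=0.1, seed=0):
    """Shuffle deterministically and split into (train, validation)."""
    shuffled = list(games)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Usage with toy records
raw = [{"moves": ["d1", "d2", "c1"]}, {"moves": []}, {"moves": ["a1"] * 12}]
train, validation = split_train_validation(clean_and_filter(raw))
```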
|
97 |
+
|
98 |
+
### Evaluation
|
99 |
+
|
100 |
+
* Evaluations were conducted on the validation split of the [Lyte/ConnectFour-T10](https://hf.co/datasets/Lyte/ConnectFour-T10) dataset. To test whether the model learns to win, each example presents a board on which only the winning move remains to be played.
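
Scoring such an evaluation amounts to extracting the `<move>` tag from each response and comparing it with the winning column. The sketch below is an assumption about how this could be done, not the evaluation script actually used: `extract_move` and `accuracy` are hypothetical helpers, and accepting a trailing row digit (e.g. `d4`) is a guess at lenient parsing.

```python
import re

# Matches <move>d</move>, tolerating whitespace, case, and an optional row digit.
MOVE_PATTERN = re.compile(r"<move>\s*([a-g])[1-6]?\s*</move>", re.IGNORECASE)


def extract_move(response):
    """Return the column letter from the model's XML response, or None."""
    match = MOVE_PATTERN.search(response)
    return match.group(1).lower() if match else None


def accuracy(responses, winning_columns):
    """Return (correct, fraction_correct) over paired responses and targets."""
    if not winning_columns or len(responses) != len(winning_columns):
        raise ValueError("responses and winning_columns must be non-empty and equal in length")
    correct = sum(extract_move(r) == w for r, w in zip(responses, winning_columns))
    return correct, correct / len(winning_columns)


# Usage with toy responses
responses = ["<reasoning>Block then win.</reasoning>\n<move>d</move>", "no tags here"]
correct, fraction = accuracy(responses, ["d", "c"])
```

Responses without a parseable `<move>` tag count as incorrect in this sketch.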
|
101 |
+
|
102 |
+
|
103 |
+
#### Summary Metrics Comparison
|
104 |
+
|
105 |
+
| Metric | Lyte/QuadConnect2.5-0.5B-v0.0.6b | Lyte/QuadConnect2.5-0.5B-v0.0.8b | Lyte/QuadConnect2.5-0.5B-v0.0.9b (this model) |
|
106 |
+
|-----------------------|--------------------------------|--------------------------------|--------------------------------|
|
107 |
+
| Total games evaluated | 5082 | 5082 | 5082 |
|
108 |
+
| Correct predictions | 518 | 394 | 516 |
|
109 |
+
| Accuracy | 10.19% | 7.75% | 10.15% |
|
110 |
+
| Most common move | d (41.14%) | d (67.61%) | a (38.72%) |
|
111 |
+
| Middle column usage | 75.05% | 99.53% | 29.08% |
|
112 |
+
|
113 |
+
*(Middle column usage for v0.0.9b = c (17.34%) + d (2.70%) + e (9.04%) = 29.08%)*
|
114 |
+
|
115 |
+
#### Move Distribution Comparison
|
116 |
+
|
117 |
+
| Column | Lyte/QuadConnect2.5-0.5B-v0.0.6b (Count, %) | Lyte/QuadConnect2.5-0.5B-v0.0.8b (Count, %) | Lyte/QuadConnect2.5-0.5B-v0.0.9b (Count, %) |
|
118 |
+
|--------|-----------------------------------|-----------------------------------|------------------------------|
|
119 |
+
| a | 603 (19.02%) | 3 (0.12%) | 1447 (38.72%) |
|
120 |
+
| b | 111 (3.50%) | 4 (0.16%) | 644 (17.23%) |
|
121 |
+
| c | 785 (24.76%) | 463 (17.96%) | 648 (17.34%) |
|
122 |
+
| d | 1304 (41.14%) | 1743 (67.61%) | 101 (2.70%) |
|
123 |
+
| e | 290 (9.15%) | 360 (13.96%) | 338 (9.04%) |
|
124 |
+
| f | 50 (1.58%) | 3 (0.12%) | 310 (8.30%) |
|
125 |
+
| g | 27 (0.85%) | 2 (0.08%) | 249 (6.66%) |
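
The per-column counts and the middle column usage can be derived from a list of predicted moves. The sketch below is an assumption about the computation, not the author's code. It takes percentages over parsed moves only, which appears consistent with the table, since each column's counts sum to fewer than the 5082 games evaluated. The example data is the v0.0.6b column from the table.

```python
from collections import Counter

COLUMNS = "abcdefg"
MIDDLE_COLUMNS = ("c", "d", "e")


def move_distribution(moves):
    """Map each column to (count, percent of parsed moves); None entries are skipped."""
    counts = Counter(move for move in moves if move is not None)
    total = sum(counts[col] for col in COLUMNS)
    if total == 0:
        raise ValueError("no parsed moves to summarize")
    return {col: (counts[col], 100 * counts[col] / total) for col in COLUMNS}


def middle_column_usage(distribution):
    """Sum of the percentages for columns c, d, and e."""
    return sum(distribution[col][1] for col in MIDDLE_COLUMNS)


# Usage: rebuild the v0.0.6b column of the table from its counts.
v006_counts = {"a": 603, "b": 111, "c": 785, "d": 1304, "e": 290, "f": 50, "g": 27}
moves = [col for col, n in v006_counts.items() for _ in range(n)]
distribution = move_distribution(moves)
for col, (count, percent) in distribution.items():
    print(f"{col}: {count} ({percent:.2f}%)")
print(f"middle: {middle_column_usage(distribution):.2f}%")
```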
|
126 |
|
127 |
|
|
|
128 |
|
129 |
### Framework versions
|
130 |
|
131 |
+
- TRL: 0.15.1
|
132 |
- Transformers: 4.49.0
|
133 |
- Pytorch: 2.5.1+cu121
|
134 |
+
- Datasets: 3.2.0
|
135 |
- Tokenizers: 0.21.0
|
136 |
|
137 |
## Citations
|