Lyte committed
Commit ec40c84 · verified · 1 parent: 582fbc9

Update README.md

README.md +102 -9

---
base_model: unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit
library_name: transformers
model_name: QuadConnect2.5-0.5B-v0.0.9b
pipeline_tag: text-generation
tags:
- unsloth
- trl
- grpo
- connect4
- qwen
- RL
licence: license
datasets:
- Lyte/ConnectFour-T10
language:
- en
---

# Model Card for QuadConnect2.5-0.5B-v0.0.9b

## Model Details

### Model Description

- These are still very early training experiments; the reward functions are still changing.
- This model was created using GRPO and Unsloth. It was trained to reason about Connect Four and learn to play it strategically.
- It was made for a specific project task.

- **Developed by:** [Lyte](https://hf.co/Lyte)
- **Model type:** *Small Language Model*
- **Language(s) (NLP):** *English*
- **License:** *TBD*
- **Finetuned from model:** [unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit)
- **Trained using:** [TRL](https://github.com/huggingface/trl)'s GRPO.

## Demo

- Example from the Hugging Face Space (version 0.0.6b):
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f847d692950415b63c6011/cV87vnDwFAPhOOZIT2tPp.png)

## Quick start

* Solution #1:
```python
from transformers import pipeline

SYSTEM_PROMPT = """You are a master Connect Four strategist whose goal is to win while preventing your opponent from winning. The game is played on a 6x7 grid (columns a–g, rows 1–6 with 1 at the bottom) where pieces drop to the lowest available spot.

Board:
- Represented as a list of occupied cells in the format: <column><row>(<piece>), e.g., 'a1(O)'.
- For example: 'a1(O), a2(X), b1(O)' indicates that cell a1 has an O, a2 has an X, and b1 has an O.
- An empty board is shown as 'Empty Board'.
- Win by connecting 4 pieces in any direction (horizontal, vertical, or diagonal).

Strategy:
1. Identify taken positions, and empty positions.
2. Find and execute winning moves.
3. If There isn't a winning move, then block your opponent’s potential wins.
4. Control the center and set up future moves.

Respond in XML:
<reasoning>
Explain your thought process, focusing on your winning move, how you block your opponent, and your strategic plans.
</reasoning>
<move>
Specify the column letter (a–g) for your next move.
</move>
"""

# Example game states, written in the same "Game State" format as the user prompt
board = {
    "empty": "Game State:\n- You are playing as: X\n- Your previous moves: \n- Opponent's moves: \n- Current board state: Empty Board\n- Next available position per column: \nColumn a: a1, a2, a3, a4, a5, a6 \nColumn b: b1, b2, b3, b4, b5, b6 \nColumn c: c1, c2, c3, c4, c5, c6 \nColumn d: d1, d2, d3, d4, d5, d6 \nColumn e: e1, e2, e3, e4, e5, e6 \nColumn f: f1, f2, f3, f4, f5, f6 \nColumn g: g1, g2, g3, g4, g5, g6\n\nMake your move.",
    "one_move": "Game State:\n- You are playing as: X\n- Your previous moves: \n- Opponent's moves: b1\n- Current board state: b1(O)\n- Next available position per column: \nColumn a: a1, a2, a3, a4, a5, a6 \nColumn b: b2, b3, b4, b5, b6 \nColumn c: c1, c2, c3, c4, c5, c6 \nColumn d: d1, d2, d3, d4, d5, d6 \nColumn e: e1, e2, e3, e4, e5, e6 \nColumn f: f1, f2, f3, f4, f5, f6 \nColumn g: g1, g2, g3, g4, g5, g6\n\nMake your move.",
    "four_moves": "Game State:\n- You are playing as: X\n- Your previous moves: a1, a2\n- Opponent's moves: d1, a3\n- Current board state: a1(X), d1(O), a2(X), a3(O)\n- Next available position per column: \nColumn a: a4, a5, a6 \nColumn b: b1, b2, b3, b4, b5, b6 \nColumn c: c1, c2, c3, c4, c5, c6 \nColumn d: d2, d3, d4, d5, d6 \nColumn e: e1, e2, e3, e4, e5, e6 \nColumn f: f1, f2, f3, f4, f5, f6 \nColumn g: g1, g2, g3, g4, g5, g6\n\nMake your move.",
}

generator = pipeline("text-generation", model="Lyte/QuadConnect2.5-0.5B-v0.0.9b", device="cuda")

# Pick one of the example states: board["empty"], board["one_move"] or board["four_moves"]
output = generator(
    [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": board["empty"]},
    ],
    max_new_tokens=10245,
    return_full_text=False,
)[0]
print(output["generated_text"])
```
* Solution #2:
[GGUF Q8](https://hf.co/Lyte/QuadConnect2.5-0.5B-v0.0.9b/blob/main/quadconnect.Q8_0.gguf): Download the quantized GGUF and run it in your favorite GGUF inference engine (e.g. LM Studio).

* Solution #3:
[Hugging Face Space](http://hf.co/spaces/Lyte/QuadConnect): Duplicate the Space, or download its code and run it locally.
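
The `board` strings in Solution #1 follow a fixed layout, so for a real game the prompt can be generated from the move history instead of written by hand. A minimal sketch, assuming moves are tracked as a chronological list of `(column, piece)` pairs; `build_game_state` and `parse_move` are hypothetical helpers, not part of the model repository, and the way a full column is rendered is a guess because none of the examples show one.

```python
import re

COLUMNS = "abcdefg"  # column letters, left to right
ROWS = 6             # rows 1-6, row 1 at the bottom


def build_game_state(history, player="X"):
    """Render a 'Game State' prompt from a chronological list of (column, piece) moves.

    Hypothetical helper that mirrors the layout of the Quick start example strings.
    """
    heights = {c: 0 for c in COLUMNS}
    cells = []  # (cell, piece) in the order the pieces were played
    for column, piece in history:
        if heights[column] >= ROWS:
            raise ValueError(f"column {column} is already full")
        heights[column] += 1  # the piece drops to the lowest free row
        cells.append((f"{column}{heights[column]}", piece))

    mine = ", ".join(cell for cell, piece in cells if piece == player)
    theirs = ", ".join(cell for cell, piece in cells if piece != player)
    board = ", ".join(f"{cell}({piece})" for cell, piece in cells) or "Empty Board"

    # One line per column listing its free cells; a full column is left blank (assumption)
    column_lines = [
        f"Column {c}: " + ", ".join(f"{c}{r}" for r in range(heights[c] + 1, ROWS + 1))
        for c in COLUMNS
    ]
    return (
        "Game State:\n"
        f"- You are playing as: {player}\n"
        f"- Your previous moves: {mine}\n"
        f"- Opponent's moves: {theirs}\n"
        f"- Current board state: {board}\n"
        "- Next available position per column: \n"
        + " \n".join(column_lines)
        + "\n\nMake your move."
    )


def parse_move(text):
    """Return the column letter inside <move>...</move>, or None if it can't be read."""
    match = re.search(r"<move>\s*([a-g])\s*</move>", text, re.IGNORECASE)
    return match.group(1).lower() if match else None
```

For example, `build_game_state([("b", "O")])` produces the same text as `board["one_move"]`, and `parse_move(output["generated_text"])` pulls the chosen column out of a reply. Replies that write anything other than a bare letter inside `<move>` are returned as `None` rather than guessed.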
88
 
89
  ## Training procedure
90
 
91
+ This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
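
In GRPO, several completions are sampled for the same prompt and each one is scored relative to the rest of its group, which replaces the learned value baseline used in PPO. The sketch below shows only that group normalization step. It is not TRL's `GRPOTrainer` code, the `eps` term and the use of the sample standard deviation are assumptions, and the rewards in the example are made up because this card does not describe its reward functions.

```python
from statistics import mean, stdev


def group_relative_advantages(rewards, eps=1e-4):
    """Advantage of each completion relative to its group (all sampled for one prompt).

    Sketch of the GRPO outcome-reward normalization: (r_i - mean(r)) / std(r).
    """
    if len(rewards) < 2:
        return [0.0] * len(rewards)  # nothing to compare against
    mu = mean(rewards)
    sigma = stdev(rewards)  # sample (n - 1) standard deviation, an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four replies to one board: two picked the winning column.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Replies that beat their siblings get a positive advantage and the rest a negative one; when every reply in a group earns the same reward, all advantages are zero and that group contributes no learning signal.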

### Preprocessing

- I first searched for Connect Four datasets, found three candidates, and selected [Leon-LLM/Connect-Four-Datasets-Collection](https://huggingface.co/datasets/Leon-LLM/Connect-Four-Datasets-Collection). I filtered out empty or broken entries and uploaded the result as Lyte/ConnectFour-clean, then removed games that last more than 10 turns and split the remainder into train and validation sets (the validation set wasn't used for training).
- The final dataset is [Lyte/ConnectFour-T10](https://huggingface.co/datasets/Lyte/ConnectFour-T10).
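
The filtering steps above can be sketched in plain Python. The schema is an assumption: each game is taken to be a list of column letters in play order, a "turn" is read as a single move, "broken" is read as an empty game, an unknown column, or an overfilled column, and the 10% validation fraction and seed are placeholders. The real columns of the source dataset may differ, and `preprocess` is a hypothetical helper.

```python
import random

COLUMNS = set("abcdefg")


def preprocess(games, max_turns=10, val_fraction=0.1, seed=0):
    """Clean, length-filter and split a list of games; returns (train, validation).

    Each game is assumed to be a list of column letters in play order (hypothetical schema).
    """
    def is_valid(game):
        # Reject empty games, unknown columns, and columns holding more than 6 pieces
        return (
            len(game) > 0
            and all(move in COLUMNS for move in game)
            and all(game.count(c) <= 6 for c in COLUMNS)
        )

    clean = [g for g in games if is_valid(g)]
    short = [g for g in clean if len(g) <= max_turns]  # one move counted as one turn

    shuffled = short[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```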

### Evaluation

* Evaluations were run on the validation split of the [Lyte/ConnectFour-T10](https://hf.co/datasets/Lyte/ConnectFour-T10) dataset. To test whether the model has learned to win, each game presents it with a board where only the winning move remains to be played.
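
Because every evaluation board leaves a winning move available, the expected answer can be computed from the board itself. A sketch, assuming the `a1(X), d1(O)` cell format from the system prompt and a gravity-consistent board; `parse_board` and `winning_columns` are hypothetical helpers, not necessarily how the ConnectFour-T10 validation targets were built.

```python
COLUMNS = "abcdefg"
ROWS = 6
DIRECTIONS = ((1, 0), (0, 1), (1, 1), (1, -1))  # horizontal, vertical, two diagonals


def parse_board(state):
    """Turn 'a1(X), d1(O)' (or 'Empty Board') into {'a1': 'X', 'd1': 'O'}."""
    if state.strip() == "Empty Board":
        return {}
    cells = {}
    for token in state.split(","):
        cell, piece = token.strip().rstrip(")").split("(")
        cells[cell] = piece
    return cells


def winning_columns(cells, piece):
    """Columns where dropping `piece` would complete four in a row."""
    def owner(col_idx, row):
        if 0 <= col_idx < len(COLUMNS) and 1 <= row <= ROWS:
            return cells.get(f"{COLUMNS[col_idx]}{row}")
        return None  # off the board

    wins = []
    for ci, col in enumerate(COLUMNS):
        # The dropped piece lands in the lowest empty row of the column
        row = next((r for r in range(1, ROWS + 1) if f"{col}{r}" not in cells), None)
        if row is None:
            continue  # column is full
        for dc, dr in DIRECTIONS:
            run = 1  # count the dropped piece itself
            for sign in (1, -1):  # walk both ways along the line
                step = 1
                while owner(ci + sign * dc * step, row + sign * dr * step) == piece:
                    run += 1
                    step += 1
            if run >= 4:
                wins.append(col)
                break
    return wins
```

A prediction would then count as correct when the parsed `<move>` letter is in `winning_columns(parse_board(state), piece)`.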

#### Summary Metrics Comparison

| Metric | Lyte/QuadConnect2.5-0.5B-v0.0.6b | Lyte/QuadConnect2.5-0.5B-v0.0.8b | Lyte/QuadConnect2.5-0.5B-v0.0.9b (this model) |
|-----------------------|--------------------------------|--------------------------------|--------------------------------|
| Total games evaluated | 5082 | 5082 | 5082 |
| Correct predictions | 518 | 394 | 516 |
| Accuracy | 10.19% | 7.75% | 10.15% |
| Most common move | d (41.14%) | d (67.61%) | a (38.72%) |
| Middle column usage | 75.05% | 99.53% | 29.09% |

*(Middle column usage = share of moves in columns c, d and e; for v0.0.9b: (648 + 101 + 338) / 3737 = 29.09%. Move percentages in both tables are computed over the moves counted per model (3170, 2578 and 3737 for v0.0.6b, v0.0.8b and v0.0.9b), not over all 5082 games.)*

#### Move Distribution Comparison

| Column | Lyte/QuadConnect2.5-0.5B-v0.0.6b (Count, %) | Lyte/QuadConnect2.5-0.5B-v0.0.8b (Count, %) | Lyte/QuadConnect2.5-0.5B-v0.0.9b (Count, %) |
|--------|-----------------------------------|-----------------------------------|------------------------------|
| a | 603 (19.02%) | 3 (0.12%) | 1447 (38.72%) |
| b | 111 (3.50%) | 4 (0.16%) | 644 (17.23%) |
| c | 785 (24.76%) | 463 (17.96%) | 648 (17.34%) |
| d | 1304 (41.14%) | 1743 (67.61%) | 101 (2.70%) |
| e | 290 (9.15%) | 360 (13.96%) | 338 (9.04%) |
| f | 50 (1.58%) | 3 (0.12%) | 310 (8.30%) |
| g | 27 (0.85%) | 2 (0.08%) | 249 (6.66%) |
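
The metrics in both tables can be recomputed from per-game predictions. A minimal sketch, assuming each model reply has already been reduced to a column letter (or `None` when no move could be read); `summarize` is a hypothetical helper, not the evaluation script behind these numbers. Middle columns are taken to be c, d and e, as in the footnote under the summary table.

```python
from collections import Counter

MIDDLE_COLUMNS = ("c", "d", "e")


def summarize(predicted, expected):
    """predicted: column letters or None per game; expected: the winning column per game."""
    total = len(expected)
    correct = sum(p == e for p, e in zip(predicted, expected))

    # The move distribution only counts replies that yielded a column
    counts = Counter(p for p in predicted if p is not None)
    n_moves = sum(counts.values())
    distribution = {c: counts.get(c, 0) / n_moves for c in "abcdefg"} if n_moves else {}
    most_common = counts.most_common(1)[0][0] if counts else None

    return {
        "total": total,
        "correct": correct,
        "accuracy": correct / total if total else 0.0,
        "most_common": most_common,
        "distribution": distribution,
        "middle_usage": sum(distribution.get(c, 0.0) for c in MIDDLE_COLUMNS),
    }
```

Feeding it the v0.0.9b column counts from the table above, padded with unreadable replies up to 5082 games, reproduces that model's percentages.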

### Framework versions

- TRL: 0.15.1
- Transformers: 4.49.0
- Pytorch: 2.5.1+cu121
- Datasets: 3.2.0
- Tokenizers: 0.21.0

## Citations