Lyte committed on
Commit 1f26120 · verified · 1 Parent(s): 32f34f5

Update README.md

Files changed (1)
  1. README.md +67 -68
README.md CHANGED
@@ -17,31 +17,29 @@ language:
  - en
  ---
 
- # Model Card for QuadConnect2.5-0.5B-v0.0.9b
 
- ## Model Details
 
- ### Model Description
 
- - Still very early training experiments, the reward functions are still changing.
- - This model was created using GRPO and Unsloth. It was trained to reason over Connect Four and learn to play it strategically.
- - It's made for a specific project task.
 
- - **Developed by:** [Lyte](https://hf.co/Lyte)
- - **Model type:** *Small Language Model*
- - **Language(s) (NLP):** *English*
- - **License:** *TBD*
- - **Finetuned from model:** [unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit)
- - **Trained Using:** [TRL](https://github.com/huggingface/trl)'s GRPO.
 
- # Demo:
 
- - Example from the hf space(version: 0.0.6b):
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f847d692950415b63c6011/cV87vnDwFAPhOOZIT2tPp.png)
 
- ## Quick start
 
- * Solution #1:
  ```python
  from transformers import pipeline
 
@@ -56,7 +54,7 @@ Board:
  Strategy:
  1. Identify taken positions, and empty positions.
  2. Find and execute winning moves.
- 3. If There isn't a winning move, then block your opponents potential wins.
  4. Control the center and set up future moves.
 
  Respond in XML:
@@ -77,69 +75,72 @@ board = {
  generator = pipeline("text-generation", model="Lyte/QuadConnect2.5-0.5B-v0.0.9b", device="cuda")
 
  # use 'empty', 'one_move' or 'four_moves' in board['']
- output = generator([{"role": "system", "content": SYSTEM_PROMPT}, {"role": "user", "content": board['empty']}], max_new_tokens=10245, return_full_text=False)[0]
  print(output["generated_text"])
  ```
- * Solution #2:
- [GGUF Q8](https://hf.co/Lyte/QuadConnect2.5-0.5B-v0.0.9b/blob/main/quadconnect.Q8_0.gguf): Download the Quantized GGUF in any of your favorite GGUF inference engine(e.g. LMStudio)
-
- * Solution #3:
- [Huggingface Space](http://hf.co/spaces/Lyte/QuadConnect)): You can duplicate the space or download the code from the space and use it locally.
 
- ## Training procedure
 
- This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
 
- #### Preprocessing
 
- - First I searched for datasets of the game Connect Four and found 3 potential datasets and ended up selecting this dataset [Leon-LLM/Connect-Four-Datasets-Collection](https://huggingface.co/datasets/Leon-LLM/Connect-Four-Datasets-Collection), I took the dataset filtered it for any empty or broken entries and uploaded it as Lyte/ConnectFour-clean and finally filtered to remove games that go for more than 10 turns, I then split it into train and validation(which wasn't used).
- - The final dataset is Lyte/ConnectFour-T10
 
- ### Evaluation
 
- * Evaluations were conducted on the [Lyte/ConnectFour-T10](hf.co/datasets/Lyte/ConnectFour-T10) dataset's validation split to test whether the model learns to win by presenting it with a board showing only the winning position left.
 
- * evals sampling parameters are as follows:
- * temperature=0.6, top_p=0.95, max_tokens=1024
 
- #### Summary Metrics Comparison
 
- | Metric | Lyte/QuadConnect2.5-0.5B-v0.0.6b (Temp 0.6) | Lyte/QuadConnect2.5-0.5B-v0.0.8b (Temp 0.6) | Lyte/QuadConnect2.5-0.0.9b (Temp 0.6) | Lyte/QuadConnect2.5-0.0.9b (Temp 0.8) |
- |-----------------------|--------------------------------|--------------------------------|--------------------------------|--------------------------------|
- | Total games evaluated | 5082 | 5082 | 5082 | 5082 |
- | Correct predictions | 518 | 394 | 516 | **713** |
- | Accuracy | 10.19% | 7.75% | 10.15% | **14.03%** |
- | Most common move | d (41.14%) | d (67.61%) | a (38.72%) | **a (31.01%)** |
- | Middle column usage | 75.05% | 99.53% | 29.08% | **35.43%** |
 
- *(Middle column usage = c + d + e 20.11% + 4.05% + 11.27% = 35.43%)*
 
- #### Move Distribution Comparison
 
- | Column | Lyte/QuadConnect2.5-0.5B-v0.0.6b (Temp 0.6) (Count, %) | Lyte/QuadConnect2.5-0.5B-v0.0.8b (Temp 0.6) (Count, %) | Lyte/QuadConnect2.5-0.0.9b (Temp 0.6) (Count, %) | Lyte/QuadConnect2.5-0.0.9b (Temp 0.8) (Count, %) |
- |--------|-----------------------------------|-----------------------------------|------------------------------|------------------------------|
- | a | 603 (19.02%) | 3 (0.12%) | 1447 (38.72%) | 1547 (31.01%) |
- | b | 111 (3.50%) | 4 (0.16%) | 644 (17.23%) | 924 (18.52%) |
- | c | 785 (24.76%) | 463 (17.96%) | 648 (17.34%) | 1003 (20.11%) |
- | d | 1304 (41.14%) | 1743 (67.61%) | 101 (2.70%) | 202 (4.05%) |
- | e | 290 (9.15%) | 360 (13.96%) | 338 (9.04%) | 562 (11.27%) |
- | f | 50 (1.58%) | 3 (0.12%) | 310 (8.30%) | 408 (8.18%) |
- | g | 27 (0.85%) | 2 (0.08%) | 249 (6.66%) | 342 (6.86%) |
 
-
- ### Framework versions
-
  - TRL: 0.15.1
  - Transformers: 4.49.0
- - Pytorch: 2.5.1+cu121
  - Datasets: 3.2.0
  - Tokenizers: 0.21.0
 
- ## Citations
-
- Cite GRPO as:
 
  ```bibtex
  @article{zhihong2024deepseekmath,
  title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
@@ -147,18 +148,16 @@ Cite GRPO as:
  year = 2024,
  eprint = {arXiv:2402.03300},
  }
-
  ```
 
- Cite TRL as:
-
  ```bibtex
  @misc{vonwerra2022trl,
- title = {{TRL: Transformer Reinforcement Learning}},
- author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
- year = 2020,
- journal = {GitHub repository},
- publisher = {GitHub},
- howpublished = {\url{https://github.com/huggingface/trl}}
  }
  ```
 
@@ -17,31 +17,29 @@ language:
  - en
  ---
 
+ # QuadConnect2.5-0.5B-v0.0.9b - A Strategic Connect Four AI
 
+ ![Connect Four Demo](https://cdn-uploads.huggingface.co/production/uploads/62f847d692950415b63c6011/QiDstnBXlVVz6dGrx3uus.png)
 
+ ## 🎮 Overview
 
+ QuadConnect2.5-0.5B is a specialized language model trained to master the game of Connect Four. Built on Qwen 2.5 (0.5B parameter base), this model uses GRPO (Group Relative Policy Optimization) to learn the strategic intricacies of Connect Four gameplay.
 
+ **Status**: Early training experiments (v0.0.9b) - reward functions still evolving
+
+ ## 🔍 Model Details
 
+ - **Developed by:** [Lyte](https://hf.co/Lyte)
+ - **Model type:** Small Language Model (SLM)
+ - **Language:** English
+ - **Base model:** [unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit)
+ - **Training method:** [TRL](https://github.com/huggingface/trl)'s GRPO (see the training sketch below)
+ - **Training data:** [Lyte/ConnectFour-T10](https://huggingface.co/datasets/Lyte/ConnectFour-T10)
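
Below is a minimal, illustrative sketch of how a GRPO run with TRL's `GRPOTrainer` is typically wired up for this kind of task. The reward function, the assumed `prompt` column, and all hyperparameters are placeholders for illustration only; they are not the exact recipe used for this model (the real reward functions are still evolving, as noted above).

```python
# Illustrative GRPO setup with TRL; rewards and hyperparameters are placeholders.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Assumes the dataset exposes a plain-text "prompt" column.
train_dataset = load_dataset("Lyte/ConnectFour-T10", split="train")

def xml_format_reward(completions, **kwargs):
    """Hypothetical reward: favor completions that contain an XML <move> tag."""
    return [1.0 if "<move>" in completion else 0.0 for completion in completions]

args = GRPOConfig(
    output_dir="quadconnect-grpo",
    num_generations=8,           # completions sampled per prompt for the group baseline
    max_completion_length=512,
    learning_rate=5e-6,
)

trainer = GRPOTrainer(
    model="unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit",
    reward_funcs=xml_format_reward,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```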
 
+ ## 🚀 Quick Start
 
+ ### Option 1: Using Transformers
 
  ```python
  from transformers import pipeline
 
@@ -56,7 +54,7 @@ Board:
  Strategy:
  1. Identify taken positions, and empty positions.
  2. Find and execute winning moves.
+ 3. If there isn't a winning move, then block your opponent's potential wins.
  4. Control the center and set up future moves.
 
  Respond in XML:
@@ -77,69 +75,72 @@ board = {
  generator = pipeline("text-generation", model="Lyte/QuadConnect2.5-0.5B-v0.0.9b", device="cuda")
 
  # use 'empty', 'one_move' or 'four_moves' in board['']
+ output = generator([
+     {"role": "system", "content": SYSTEM_PROMPT},
+     {"role": "user", "content": board['empty']}
+ ], max_new_tokens=10245, return_full_text=False)[0]
+
  print(output["generated_text"])
  ```
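
The system prompt asks the model to respond in XML, so in practice you will want to parse the chosen column out of the generated text. A minimal sketch is below; the `<move>` tag is a hypothetical placeholder, so adapt the pattern to whatever schema your SYSTEM_PROMPT actually defines.

```python
import re

def extract_move(generated_text: str) -> str | None:
    """Pull the chosen column (a-g) out of a hypothetical <move> tag."""
    match = re.search(r"<move>\s*([a-g])\s*</move>", generated_text, re.IGNORECASE)
    return match.group(1).lower() if match else None

# Example with a made-up model reply:
print(extract_move("<reasoning>d creates a center threat</reasoning><move>d</move>"))  # -> d
```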
 
+ ### Option 2: Using GGUF
 
+ Download the [Quantized GGUF (Q8_0)](https://huggingface.co/Lyte/QuadConnect2.5-0.5B-v0.0.9b/blob/main/unsloth.Q8_0.gguf) and use it in your favorite GGUF inference engine (e.g., LMStudio).
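
If you prefer a scripted setup over a GUI engine, a rough sketch with the `llama-cpp-python` bindings is shown below (an assumption; any GGUF-capable runtime works). The repo id and filename come from the link above, while the prompt contents and sampling values are illustrative placeholders.

```python
# Rough sketch using llama-cpp-python; any GGUF-capable engine works similarly.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Lyte/QuadConnect2.5-0.5B-v0.0.9b",
    filename="unsloth.Q8_0.gguf",
    n_ctx=2048,
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a Connect Four strategist."},  # replace with the full SYSTEM_PROMPT
        {"role": "user", "content": "Board: empty. Your move."},              # replace with a real board description
    ],
    temperature=0.6,
    max_tokens=1024,
)
print(response["choices"][0]["message"]["content"])
```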
 
+ ### Option 3: Using Hugging Face Space
 
+ Visit the [QuadConnect Space](https://huggingface.co/spaces/Lyte/QuadConnect) to interact with the model directly. You can also duplicate the space or download its code for local use.
 
+ ## 📊 Evaluation Results
 
+ Model performance was evaluated on the [Lyte/ConnectFour-T10](https://huggingface.co/datasets/Lyte/ConnectFour-T10) validation split with various temperature settings.
 
+ ### Summary Metrics Comparison
 
+ | Metric | v0.0.6b (Temp 0.6) | v0.0.8b (Temp 0.6) | v0.0.9b (Temp 0.6) | v0.0.9b (Temp 0.8) | v0.0.9b (Temp 1.0) |
+ |--------|-------------------|-------------------|-------------------|-------------------|-------------------|
+ | Total games evaluated | 5082 | 5082 | 5082 | 5082 | 5082 |
+ | Correct predictions | 518 | 394 | 516 | **713** | 677 |
+ | Accuracy | 10.19% | 7.75% | 10.15% | **14.03%** | 13.32% |
+ | Most common move | d (41.14%) | d (67.61%) | a (38.72%) | a (31.01%) | a (26.99%) |
+ | Middle column usage | 75.05% | 99.53% | 29.08% | 35.43% | 39.49% |
 
+ ### Move Distribution by Column
 
+ | Column | v0.0.6b (Temp 0.6) | v0.0.8b (Temp 0.6) | v0.0.9b (Temp 0.6) | v0.0.9b (Temp 0.8) | v0.0.9b (Temp 1.0) |
+ |--------|-------------------|-------------------|-------------------|-------------------|-------------------|
+ | a | 603 (19.02%) | 3 (0.12%) | 1447 (38.72%) | 1547 (31.01%) | 1351 (26.99%) |
+ | b | 111 (3.50%) | 4 (0.16%) | 644 (17.23%) | 924 (18.52%) | 997 (19.92%) |
+ | c | 785 (24.76%) | 463 (17.96%) | 648 (17.34%) | 1003 (20.11%) | 985 (19.68%) |
+ | d | 1304 (41.14%) | 1743 (67.61%) | 101 (2.70%) | 202 (4.05%) | 306 (6.11%) |
+ | e | 290 (9.15%) | 360 (13.96%) | 338 (9.04%) | 562 (11.27%) | 686 (13.70%) |
+ | f | 50 (1.58%) | 3 (0.12%) | 310 (8.30%) | 408 (8.18%) | 354 (7.07%) |
+ | g | 27 (0.85%) | 2 (0.08%) | 249 (6.66%) | 342 (6.86%) | 327 (6.53%) |
 
+ ## 🔧 Training Details
 
+ ### Data Preparation
+ 1. Started with [Leon-LLM/Connect-Four-Datasets-Collection](https://huggingface.co/datasets/Leon-LLM/Connect-Four-Datasets-Collection)
+ 2. Filtered for clean, complete entries
+ 3. Further filtered to include only games with 10 or fewer turns (see the sketch below)
+ 4. Split into train and validation sets
+ 5. Final dataset: [Lyte/ConnectFour-T10](https://huggingface.co/datasets/Lyte/ConnectFour-T10)
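
For reference, the steps above could be reproduced roughly as follows with the `datasets` library. The `game` column name and the move-counting logic are hypothetical, since the exact schema of the source collection isn't described in this card.

```python
# Illustrative reconstruction of the data preparation steps; column names are assumed.
from datasets import load_dataset

raw = load_dataset("Leon-LLM/Connect-Four-Datasets-Collection", split="train")

# 1) Drop empty or broken entries (assumes a "game" column holding the move sequence).
clean = raw.filter(lambda row: row["game"] is not None and len(row["game"].strip()) > 0)

# 2) Keep only games decided in 10 or fewer turns (assumes whitespace-separated moves).
short_games = clean.filter(lambda row: len(row["game"].split()) <= 10)

# 3) Split into train and validation sets.
splits = short_games.train_test_split(test_size=0.1, seed=42)
train_ds, val_ds = splits["train"], splits["test"]
print(len(train_ds), len(val_ds))
```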
 
+ ### Evaluation Parameters
+ - Temperature: 0.6, 0.8, 1.0 (compared)
+ - Top-p: 0.95
+ - Max tokens: 1024
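
To make the accuracy numbers above concrete, here is a condensed, hypothetical sketch of such an evaluation loop using the sampling values listed above; the split name, column names, and the `<move>` tag are assumptions, not the actual evaluation harness.

```python
# Hypothetical evaluation loop: fraction of boards where the generated move matches the reference.
import re
from datasets import load_dataset
from transformers import pipeline

generator = pipeline("text-generation", model="Lyte/QuadConnect2.5-0.5B-v0.0.9b", device="cuda")
val_ds = load_dataset("Lyte/ConnectFour-T10", split="validation")  # assumed split name

correct = 0
for row in val_ds:
    out = generator(
        [{"role": "user", "content": row["prompt"]}],  # assumed column name
        max_new_tokens=1024,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
        return_full_text=False,
    )[0]["generated_text"]
    match = re.search(r"<move>\s*([a-g])\s*</move>", out, re.IGNORECASE)  # hypothetical tag
    if match and match.group(1).lower() == row["winning_move"]:  # assumed column name
        correct += 1

print(f"Accuracy: {correct / len(val_ds):.2%}")
```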
 
+ ### Framework Versions
  - TRL: 0.15.1
  - Transformers: 4.49.0
+ - PyTorch: 2.5.1+cu121
  - Datasets: 3.2.0
  - Tokenizers: 0.21.0
 
+ ## 📚 Citations
 
+ For GRPO:
  ```bibtex
  @article{zhihong2024deepseekmath,
  title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
@@ -147,18 +148,16 @@ Cite GRPO as:
  year = 2024,
  eprint = {arXiv:2402.03300},
  }
  ```
 
+ For TRL:
  ```bibtex
  @misc{vonwerra2022trl,
+ title = {{TRL: Transformer Reinforcement Learning}},
+ author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+ year = 2020,
+ journal = {GitHub repository},
+ publisher = {GitHub},
+ howpublished = {\url{https://github.com/huggingface/trl}}
  }
  ```