Update README.md

README.md +102 -9

@@ -2,43 +2,136 @@
 base_model: unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit
 library_name: transformers
 model_name: QuadConnect2.5-0.5B-v0.0.9b
 tags:
-- generated_from_trainer
 - unsloth
 - trl
 - grpo
 licence: license
 ---
 # Model Card for QuadConnect2.5-0.5B-v0.0.9b
-This model is a fine-tuned version of [unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit).
-It has been trained using [TRL](https://github.com/huggingface/trl).
 ## Quick start
 ```python
 from transformers import pipeline
-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
 generator = pipeline("text-generation", model="Lyte/QuadConnect2.5-0.5B-v0.0.9b", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
 ## Training procedure
-This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
 ### Framework versions
-- TRL: 0.15.2
 - Transformers: 4.49.0
 - Pytorch: 2.5.1+cu121
-- Datasets: 3.3.1
 - Tokenizers: 0.21.0
 ## Citations

 base_model: unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit
 library_name: transformers
 model_name: QuadConnect2.5-0.5B-v0.0.9b
+pipeline_tag: text-generation
 tags:
 - unsloth
 - trl
 - grpo
+- connect4
+- qwen
+- RL
 licence: license
+datasets:
+- Lyte/ConnectFour-T10
+language:
+- en
 ---
 # Model Card for QuadConnect2.5-0.5B-v0.0.9b
+## Model Details
+### Model Description
+- This is an early training experiment; the reward functions are still changing.
+- This model was created using GRPO and Unsloth. It was trained to reason over Connect Four and learn to play it strategically.
+- It's made for a specific project task.
+- **Developed by:** [Lyte](https://hf.co/Lyte)
+- **Model type:** *Small Language Model*
+- **Language(s) (NLP):** *English*
+- **License:** *TBD*
+- **Finetuned from model:** [unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit)
+- **Trained Using:** [TRL](https://github.com/huggingface/trl)'s GRPO.
+## Demo
+- Example from the Hugging Face Space (version 0.0.6b):
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f847d692950415b63c6011/cV87vnDwFAPhOOZIT2tPp.png)
 ## Quick start
+* Solution #1:
 ```python
 from transformers import pipeline
+SYSTEM_PROMPT = """You are a master Connect Four strategist whose goal is to win while preventing your opponent from winning. The game is played on a 6x7 grid (columns a–g, rows 1–6 with 1 at the bottom) where pieces drop to the lowest available spot.
+Board:
+- Represented as a list of occupied cells in the format: <column><row>(<piece>), e.g., 'a1(O)'.
+- For example: 'a1(O), a2(X), b1(O)' indicates that cell a1 has an O, a2 has an X, and b1 has an O.
+- An empty board is shown as 'Empty Board'.
+- Win by connecting 4 pieces in any direction (horizontal, vertical, or diagonal).
+Strategy:
+1. Identify taken positions, and empty positions.
+2. Find and execute winning moves.
+3. If There isn't a winning move, then block your opponent’s potential wins.
+4. Control the center and set up future moves.
+Respond in XML:
+<reasoning>
+Explain your thought process, focusing on your winning move, how you block your opponent, and your strategic plans.
+</reasoning>
+<move>
+Specify the column letter (a–g) for your next move.
+</move>
+"""
+board = {
+    "empty": "Game State:\n- You are playing as: X\n- Your previous moves: \n- Opponent's moves: \n- Current board state: Empty Board\n- Next available position per column:  \nColumn a: a1, a2, a3, a4, a5, a6  \nColumn b: b1, b2, b3, b4, b5, b6  \nColumn c: c1, c2, c3, c4, c5, c6  \nColumn d: d1, d2, d3, d4, d5, d6  \nColumn e: e1, e2, e3, e4, e5, e6  \nColumn f: f1, f2, f3, f4, f5, f6  \nColumn g: g1, g2, g3, g4, g5, g6\n\nMake your move.",
+    "one_move": "Game State:\n- You are playing as: X\n- Your previous moves: \n- Opponent's moves: b1\n- Current board state: b1(O)\n- Next available position per column:  \nColumn a: a1, a2, a3, a4, a5, a6  \nColumn b: b2, b3, b4, b5, b6  \nColumn c: c1, c2, c3, c4, c5, c6  \nColumn d: d1, d2, d3, d4, d5, d6  \nColumn e: e1, e2, e3, e4, e5, e6  \nColumn f: f1, f2, f3, f4, f5, f6  \nColumn g: g1, g2, g3, g4, g5, g6\n\nMake your move.",
+    "four_moves": "Game State:\n- You are playing as: X\n- Your previous moves: a1, a2\n- Opponent's moves: d1, a3\n- Current board state: a1(X), d1(O), a2(X), a3(O)\n- Next available position per column:  \nColumn a: a4, a5, a6  \nColumn b: b1, b2, b3, b4, b5, b6  \nColumn c: c1, c2, c3, c4, c5, c6  \nColumn d: d2, d3, d4, d5, d6  \nColumn e: e1, e2, e3, e4, e5, e6  \nColumn f: f1, f2, f3, f4, f5, f6  \nColumn g: g1, g2, g3, g4, g5, g6\n\nMake your move.",
+}
 generator = pipeline("text-generation", model="Lyte/QuadConnect2.5-0.5B-v0.0.9b", device="cuda")
+# Pick a board state: board['empty'], board['one_move'], or board['four_moves']
+output = generator([{"role": "system", "content": SYSTEM_PROMPT}, {"role": "user", "content": board['empty']}], max_new_tokens=10245, return_full_text=False)[0]
 print(output["generated_text"])
 ```
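The `board` strings above follow a regular layout, so they can be generated from a move list instead of being typed by hand. The sketch below is one possible way to do that. `build_game_state` and its argument format are hypothetical helpers, not part of the model's repository, and the output for a full column (nothing after the colon) is a guess, since the examples never show one.

```python
COLUMNS = "abcdefg"
ROWS = 6


def build_game_state(moves, player="X"):
    """Build a user prompt in the layout of the `board` examples.

    `moves` is a list of (cell, piece) tuples in play order,
    e.g. [("b1", "O")]. This argument format is an assumption.
    """
    mine = [cell for cell, piece in moves if piece == player]
    theirs = [cell for cell, piece in moves if piece != player]
    # Occupied cells are listed in play order; an empty board has a fixed label.
    state = ", ".join(f"{cell}({piece})" for cell, piece in moves) or "Empty Board"

    taken = {cell for cell, _ in moves}
    column_lines = []
    for col in COLUMNS:
        free = [f"{col}{row}" for row in range(1, ROWS + 1) if f"{col}{row}" not in taken]
        # Assumption: a full column is printed with nothing after the colon.
        column_lines.append(f"Column {col}: {', '.join(free)}")

    return (
        "Game State:\n"
        f"- You are playing as: {player}\n"
        f"- Your previous moves: {', '.join(mine)}\n"
        f"- Opponent's moves: {', '.join(theirs)}\n"
        f"- Current board state: {state}\n"
        "- Next available position per column:  \n"
        + "  \n".join(column_lines)
        + "\n\nMake your move."
    )
```

For example, `build_game_state([("b1", "O")])` should reproduce the `one_move` string, and the result can be passed as the user message in place of `board['one_move']`.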
+* Solution #2:
+[GGUF Q8](https://hf.co/Lyte/QuadConnect2.5-0.5B-v0.0.9b/blob/main/quadconnect.Q8_0.gguf): Download the quantized GGUF and load it in your preferred GGUF inference engine (e.g. LM Studio).
+* Solution #3:
+[Hugging Face Space](http://hf.co/spaces/Lyte/QuadConnect): You can duplicate the Space, or download its code and run it locally.
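Whichever route is used, the system prompt asks the model to reply with `<reasoning>` and `<move>` tags, so the reply has to be parsed before the move can be played. A minimal sketch of such a parser follows. `parse_response` is a hypothetical helper, and it is deliberately strict: it assumes the `<move>` tag holds a single column letter and returns `None` for anything else (for example `d4`), leaving the caller to decide how to handle malformed output.

```python
import re

# Matches a single column letter (a-g) inside <move>...</move>, ignoring whitespace.
MOVE_RE = re.compile(r"<move>\s*([a-gA-G])\s*</move>")
# Captures everything between the reasoning tags, across newlines.
REASONING_RE = re.compile(r"<reasoning>(.*?)</reasoning>", re.DOTALL)


def parse_response(text):
    """Return (move, reasoning); each is None when its tag is missing or malformed."""
    move_match = MOVE_RE.search(text)
    reasoning_match = REASONING_RE.search(text)
    move = move_match.group(1).lower() if move_match else None
    reasoning = reasoning_match.group(1).strip() if reasoning_match else None
    return move, reasoning
```

With the pipeline example above, this would be called as `parse_response(output["generated_text"])`.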
 ## Training procedure
+This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
+### Preprocessing
+- I searched for Connect Four datasets, found three candidates, and selected [Leon-LLM/Connect-Four-Datasets-Collection](https://huggingface.co/datasets/Leon-LLM/Connect-Four-Datasets-Collection). I filtered out empty or broken entries and uploaded the result as Lyte/ConnectFour-clean. I then removed games that last more than 10 turns and split the data into train and validation sets (the validation set was not used for training).
+- The final dataset is Lyte/ConnectFour-T10.
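The filtering steps described above can be sketched in plain Python. This is an illustration only, not the script that produced the dataset. It assumes each game is a list of column letters, that one turn equals one move, and that 10% of games go to validation; none of these details are stated in this card, and `is_clean` and `prepare` are hypothetical names.

```python
import random

VALID_COLUMNS = set("abcdefg")


def is_clean(game):
    """Assumed schema: a game is a non-empty list of column letters a-g."""
    return bool(game) and all(move in VALID_COLUMNS for move in game)


def prepare(games, max_turns=10, val_fraction=0.1, seed=0):
    """Drop empty/broken games, keep games of at most `max_turns`, then split."""
    cleaned = [g for g in games if is_clean(g)]
    short = [g for g in cleaned if len(g) <= max_turns]
    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(short)
    n_val = int(len(short) * val_fraction)
    return short[n_val:], short[:n_val]  # (train, validation)
```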
+### Evaluation
+* Evaluations were conducted on the validation split of the [Lyte/ConnectFour-T10](https://hf.co/datasets/Lyte/ConnectFour-T10) dataset. Each test presents the model with a board where only the winning move remains, to check whether it has learned to win.
+#### Summary Metrics Comparison
+| Metric                | Lyte/QuadConnect2.5-0.5B-v0.0.6b | Lyte/QuadConnect2.5-0.5B-v0.0.8b | Lyte/QuadConnect2.5-0.5B-v0.0.9b (this model) |
+|-----------------------|--------------------------------|--------------------------------|--------------------------------|
+| Total games evaluated | 5082                           | 5082                           | 5082                           |
+| Correct predictions   | 518                            | 394                            | 516                            |
+| Accuracy             | 10.19%                         | 7.75%                          | 10.15%                         |
+| Most common move     | d (41.14%)                     | d (67.61%)                     | a (38.72%)                     |
+| Middle column usage  | 75.05%                         | 99.53%                         | 29.08%                         |
+*(Middle column usage for v0.0.9b = c (17.34%) + d (2.70%) + e (9.04%) = 29.08%)*
+#### Move Distribution Comparison
+| Column | Lyte/QuadConnect2.5-0.5B-v0.0.6b (Count, %) | Lyte/QuadConnect2.5-0.5B-v0.0.8b (Count, %) | Lyte/QuadConnect2.5-0.5B-v0.0.9b (Count, %) |
+|--------|-----------------------------------|-----------------------------------|------------------------------|
+| a      | 603 (19.02%)                      | 3 (0.12%)                        | 1447 (38.72%)               |
+| b      | 111 (3.50%)                        | 4 (0.16%)                        | 644 (17.23%)                |
+| c      | 785 (24.76%)                      | 463 (17.96%)                      | 648 (17.34%)                |
+| d      | 1304 (41.14%)                     | 1743 (67.61%)                     | 101 (2.70%)                 |
+| e      | 290 (9.15%)                       | 360 (13.96%)                      | 338 (9.04%)                 |
+| f      | 50 (1.58%)                        | 3 (0.12%)                         | 310 (8.30%)                 |
+| g      | 27 (0.85%)                        | 2 (0.08%)                         | 249 (6.66%)                 |
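The percentages in the move distribution table can be re-derived from the counts, which is a quick way to check the summary figures. The sketch below uses only the counts listed in the table. The helper names are hypothetical, and "middle columns" is taken to mean c, d, and e, as in the note under the summary table. The per-version totals come out below the 5082 games evaluated, so the shares appear to be taken over the moves counted in the table rather than over all games; that reading is an inference from the numbers.

```python
# Move counts per column, copied from the table above.
COUNTS = {
    "v0.0.6b": {"a": 603, "b": 111, "c": 785, "d": 1304, "e": 290, "f": 50, "g": 27},
    "v0.0.8b": {"a": 3, "b": 4, "c": 463, "d": 1743, "e": 360, "f": 3, "g": 2},
    "v0.0.9b": {"a": 1447, "b": 644, "c": 648, "d": 101, "e": 338, "f": 310, "g": 249},
}


def move_shares(counts):
    """Percentage of counted moves that went to each column."""
    total = sum(counts.values())
    return {col: 100 * n / total for col, n in counts.items()}


def middle_usage(counts, middle="cde"):
    """Combined percentage for the middle columns, computed from raw counts."""
    total = sum(counts.values())
    return 100 * sum(counts[col] for col in middle) / total


for version, counts in COUNTS.items():
    # Computing from raw counts can differ in the last digit from
    # a sum of already-rounded per-column percentages.
    print(version, sum(counts.values()), f"{middle_usage(counts):.2f}%")
```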
 ### Framework versions
+- TRL: 0.15.1
 - Transformers: 4.49.0
 - Pytorch: 2.5.1+cu121
+- Datasets: 3.2.0
 - Tokenizers: 0.21.0
 ## Citations