Update README.md

README.md +67 -68

@@ -17,31 +17,29 @@ language:
 - en
 ---
-# Model Card for QuadConnect2.5-0.5B-v0.0.9b
-## Model Details
-### Model Description
-- Still very early training experiments, the reward functions are still changing.
-- This model was created using GRPO and Unsloth. It was trained to reason over Connect Four and learn to play it strategically.
-- It's made for a specific project task.
-- **Developed by:** [Lyte](https://hf.co/Lyte)
-- **Model type:** *Small Language Model*
-- **Language(s) (NLP):** *English*
-- **License:** *TBD*
-- **Finetuned from model:** [unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit)
-- **Trained Using:** [TRL](https://github.com/huggingface/trl)'s GRPO.
-# Demo:
-- Example from the hf space(version: 0.0.6b):
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/62f847d692950415b63c6011/cV87vnDwFAPhOOZIT2tPp.png)
-## Quick start
-* Solution #1:
 ```python
 from transformers import pipeline
@@ -56,7 +54,7 @@ Board:
 Strategy:
 1. Identify taken positions, and empty positions.
 2. Find and execute winning moves.
-3. If There isn't a winning move, then block your opponent’s potential wins.
 4. Control the center and set up future moves.
 Respond in XML:
@@ -77,69 +75,72 @@ board = {
 generator = pipeline("text-generation", model="Lyte/QuadConnect2.5-0.5B-v0.0.9b", device="cuda")
 # use 'empty', 'one_move' or 'four_moves' in board['']
-output = generator([{"role": "system", "content": SYSTEM_PROMPT}, {"role": "user", "content": board['empty']}], max_new_tokens=10245, return_full_text=False)[0]
 print(output["generated_text"])
 ```
-* Solution #2:
-[GGUF Q8](https://hf.co/Lyte/QuadConnect2.5-0.5B-v0.0.9b/blob/main/quadconnect.Q8_0.gguf): Download the Quantized GGUF in any of your favorite GGUF inference engine(e.g. LMStudio)
-* Solution #3:
-[Huggingface Space](http://hf.co/spaces/Lyte/QuadConnect)): You can duplicate the space or download the code from the space and use it locally.
-## Training procedure
-This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
-#### Preprocessing
-- First I searched for datasets of the game Connect Four and found 3 potential datasets and ended up selecting this dataset [Leon-LLM/Connect-Four-Datasets-Collection](https://huggingface.co/datasets/Leon-LLM/Connect-Four-Datasets-Collection), I took the dataset filtered it for any empty or broken entries and uploaded it as Lyte/ConnectFour-clean and finally filtered to remove games that go for more than 10 turns, I then split it into train and validation(which wasn't used).
-- The final dataset is Lyte/ConnectFour-T10
-### Evaluation
-* Evaluations were conducted on the [Lyte/ConnectFour-T10](hf.co/datasets/Lyte/ConnectFour-T10) dataset's validation split to test whether the model learns to win by presenting it with a board showing only the winning position left.
-* evals sampling parameters are as follows:
-* temperature=0.6, top_p=0.95, max_tokens=1024
-#### Summary Metrics Comparison
-| Metric                | Lyte/QuadConnect2.5-0.5B-v0.0.6b (Temp 0.6) | Lyte/QuadConnect2.5-0.5B-v0.0.8b (Temp 0.6) | Lyte/QuadConnect2.5-0.0.9b (Temp 0.6) | Lyte/QuadConnect2.5-0.0.9b (Temp 0.8) |
-|-----------------------|--------------------------------|--------------------------------|--------------------------------|--------------------------------|
-| Total games evaluated | 5082                           | 5082                           | 5082                           | 5082                           |
-| Correct predictions   | 518                            | 394                            | 516                            | **713**                            |
-| Accuracy             | 10.19%                         | 7.75%                          | 10.15%                         | **14.03%**                         |
-| Most common move     | d (41.14%)                     | d (67.61%)                     | a (38.72%)                     | **a (31.01%)**                     |
-| Middle column usage  | 75.05%                         | 99.53%                         | 29.08%                         | **35.43%**                         |
-*(Middle column usage = c + d + e → 20.11% + 4.05% + 11.27% = 35.43%)*
-#### Move Distribution Comparison
-| Column | Lyte/QuadConnect2.5-0.5B-v0.0.6b (Temp 0.6) (Count, %) | Lyte/QuadConnect2.5-0.5B-v0.0.8b (Temp 0.6) (Count, %) | Lyte/QuadConnect2.5-0.0.9b (Temp 0.6) (Count, %) | Lyte/QuadConnect2.5-0.0.9b (Temp 0.8) (Count, %) |
-|--------|-----------------------------------|-----------------------------------|------------------------------|------------------------------|
-| a      | 603 (19.02%)                      | 3 (0.12%)                        | 1447 (38.72%)               | 1547 (31.01%)               |
-| b      | 111 (3.50%)                        | 4 (0.16%)                        | 644 (17.23%)                | 924 (18.52%)                |
-| c      | 785 (24.76%)                      | 463 (17.96%)                      | 648 (17.34%)                | 1003 (20.11%)               |
-| d      | 1304 (41.14%)                     | 1743 (67.61%)                     | 101 (2.70%)                 | 202 (4.05%)                 |
-| e      | 290 (9.15%)                       | 360 (13.96%)                      | 338 (9.04%)                 | 562 (11.27%)                |
-| f      | 50 (1.58%)                        | 3 (0.12%)                         | 310 (8.30%)                 | 408 (8.18%)                 |
-| g      | 27 (0.85%)                        | 2 (0.08%)                         | 249 (6.66%)                 | 342 (6.86%)                 |
-### Framework versions
 - TRL: 0.15.1
 - Transformers: 4.49.0
-- Pytorch: 2.5.1+cu121
 - Datasets: 3.2.0
 - Tokenizers: 0.21.0
-## Citations
-Cite GRPO as:
 ```bibtex
 @article{zhihong2024deepseekmath,
     title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
@@ -147,18 +148,16 @@ Cite GRPO as:
     year         = 2024,
     eprint       = {arXiv:2402.03300},
 }
 ```
-Cite TRL as:
 ```bibtex
 @misc{vonwerra2022trl,
-	title        = {{TRL: Transformer Reinforcement Learning}},
-	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
-	year         = 2020,
-	journal      = {GitHub repository},
-	publisher    = {GitHub},
-	howpublished = {\url{https://github.com/huggingface/trl}}
 }
 ```

 - en
 ---
+# QuadConnect2.5-0.5B-v0.0.9b - A Strategic Connect Four AI
+![Connect Four Demo](https://cdn-uploads.huggingface.co/production/uploads/62f847d692950415b63c6011/QiDstnBXlVVz6dGrx3uus.png)
+## 🎮 Overview
+QuadConnect2.5-0.5B is a specialized language model trained to play Connect Four. Built on Qwen 2.5 (0.5B-parameter instruct base), it is fine-tuned with GRPO (Group Relative Policy Optimization) to reason over board positions and choose moves strategically.
+**Status**: Early training experiments (v0.0.9b) - Reward functions still evolving
+## 🔍 Model Details
+- **Developed by:** [Lyte](https://hf.co/Lyte)
+- **Model type:** Small Language Model (SLM)
+- **Language:** English
+- **Base model:** [unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit)
+- **Training method:** [TRL](https://github.com/huggingface/trl)'s GRPO
+- **Training data:** [Lyte/ConnectFour-T10](https://huggingface.co/datasets/Lyte/ConnectFour-T10)
+## 🚀 Quick Start
+### Option 1: Using Transformers
 ```python
 from transformers import pipeline
 Strategy:
 1. Identify taken positions, and empty positions.
 2. Find and execute winning moves.
+3. If There isn't a winning move, then block your opponent's potential wins.
 4. Control the center and set up future moves.
 Respond in XML:
 generator = pipeline("text-generation", model="Lyte/QuadConnect2.5-0.5B-v0.0.9b", device="cuda")
 # use 'empty', 'one_move' or 'four_moves' in board['']
+output = generator([
+    {"role": "system", "content": SYSTEM_PROMPT},
+    {"role": "user", "content": board['empty']}
+], max_new_tokens=10245, return_full_text=False)[0]
 print(output["generated_text"])
 ```
+### Option 2: Using GGUF
+Download the [quantized GGUF (Q8_0)](https://huggingface.co/Lyte/QuadConnect2.5-0.5B-v0.0.9b/blob/main/unsloth.Q8_0.gguf) and load it in your favorite GGUF inference engine (e.g., LM Studio).
+### Option 3: Using Hugging Face Space
+Visit the [QuadConnect Space](https://huggingface.co/spaces/Lyte/QuadConnect) to interact with the model directly. You can also duplicate the space or download its code for local use.
+## 📊 Evaluation Results
+Model performance was evaluated on the [Lyte/ConnectFour-T10](https://huggingface.co/datasets/Lyte/ConnectFour-T10) validation split. v0.0.9b was sampled at temperatures 0.6, 0.8, and 1.0; the earlier versions were sampled at 0.6 only.
+### Summary Metrics Comparison
+| Metric | v0.0.6b (Temp 0.6) | v0.0.8b (Temp 0.6) | v0.0.9b (Temp 0.6) | v0.0.9b (Temp 0.8) | v0.0.9b (Temp 1.0) |
+|--------|-------------------|-------------------|-------------------|-------------------|-------------------|
+| Total games evaluated | 5082 | 5082 | 5082 | 5082 | 5082 |
+| Correct predictions | 518 | 394 | 516 | **713** | 677 |
+| Accuracy | 10.19% | 7.75% | 10.15% | **14.03%** | 13.32% |
+| Most common move | d (41.14%) | d (67.61%) | a (38.72%) | a (31.01%) | a (26.99%) |
+| Middle column usage | 75.05% | 99.53% | 29.08% | 35.43% | 39.49% |
+### Move Distribution by Column
+| Column | v0.0.6b (Temp 0.6) | v0.0.8b (Temp 0.6) | v0.0.9b (Temp 0.6) | v0.0.9b (Temp 0.8) | v0.0.9b (Temp 1.0) |
+|--------|-------------------|-------------------|-------------------|-------------------|-------------------|
+| a | 603 (19.02%) | 3 (0.12%) | 1447 (38.72%) | 1547 (31.01%) | 1351 (26.99%) |
+| b | 111 (3.50%) | 4 (0.16%) | 644 (17.23%) | 924 (18.52%) | 997 (19.92%) |
+| c | 785 (24.76%) | 463 (17.96%) | 648 (17.34%) | 1003 (20.11%) | 985 (19.68%) |
+| d | 1304 (41.14%) | 1743 (67.61%) | 101 (2.70%) | 202 (4.05%) | 306 (6.11%) |
+| e | 290 (9.15%) | 360 (13.96%) | 338 (9.04%) | 562 (11.27%) | 686 (13.70%) |
+| f | 50 (1.58%) | 3 (0.12%) | 310 (8.30%) | 408 (8.18%) | 354 (7.07%) |
+| g | 27 (0.85%) | 2 (0.08%) | 249 (6.66%) | 342 (6.86%) | 327 (6.53%) |
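
The summary rows can be re-derived from the per-column counts. The sketch below does this for the v0.0.9b (Temp 0.8) column. It assumes the percentages are taken over the moves that mapped to a column (the counts sum to 4988, not 5082) and that "middle column usage" is the combined share of columns c, d, and e, as the previous README revision stated. The helper names `share` and `summarize` are illustrative, not part of the project.

```python
# Column move counts for v0.0.9b at temperature 0.8, copied from the table above.
counts = {"a": 1547, "b": 924, "c": 1003, "d": 202, "e": 562, "f": 408, "g": 342}

TOTAL_GAMES = 5082  # "Total games evaluated"
CORRECT = 713       # "Correct predictions" at temperature 0.8


def share(part: int, whole: int) -> float:
    """Return part / whole as a percentage."""
    return 100.0 * part / whole


def summarize(counts: dict, correct: int, total_games: int) -> dict:
    """Recompute the summary metrics from raw counts.

    Assumption: move shares use the number of parsed moves as the
    denominator, while accuracy uses the total number of games.
    """
    parsed = sum(counts.values())
    top = max(counts, key=counts.get)
    return {
        "parsed_moves": parsed,
        "accuracy": share(correct, total_games),
        "most_common": top,
        "most_common_share": share(counts[top], parsed),
        # Middle columns are c, d and e.
        "middle_usage": share(sum(counts[col] for col in "cde"), parsed),
    }


summary = summarize(counts, CORRECT, TOTAL_GAMES)
for name, value in summary.items():
    print(name, value)
```

Swapping in the counts from another column of the table gives the corresponding summary figures, up to rounding in the last digit.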
+## 🔧 Training Details
+### Data Preparation
+1. Started with [Leon-LLM/Connect-Four-Datasets-Collection](https://huggingface.co/datasets/Leon-LLM/Connect-Four-Datasets-Collection)
+2. Filtered for clean, complete entries
+3. Further filtered to include only games with 10 or fewer turns
+4. Split into train and validation sets
+5. Final dataset: [Lyte/ConnectFour-T10](https://huggingface.co/datasets/Lyte/ConnectFour-T10)
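
The filtering and splitting steps above can be sketched in plain Python. Everything about the record format here is an assumption made for illustration: each game is represented as a string of column letters (a to g), one letter per turn, and the validation fraction is arbitrary. The actual dataset schema and split ratio are not stated in this README.

```python
import random

COLUMNS = set("abcdefg")  # Connect Four columns, labelled as in the evaluation tables
MAX_TURNS = 10            # keep games with 10 or fewer turns


def is_clean(moves: str) -> bool:
    """A game is clean if it is non-empty and uses only valid column letters."""
    return bool(moves) and set(moves) <= COLUMNS


def prepare(games, val_fraction=0.1, seed=0):
    """Filter games, then split into (train, validation) lists.

    Hypothetical helper: val_fraction and seed are illustrative defaults.
    """
    kept = [g for g in games if is_clean(g) and len(g) <= MAX_TURNS]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(kept)
    n_val = int(len(kept) * val_fraction)
    return kept[n_val:], kept[:n_val]


# Toy input: one empty entry, one broken entry, one game that is too long.
raw = ["dcdcdcd", "", "d?d", "abcdefgabcdefg", "aabbccd", "gfgfgfg", "ebebebe"]
train, val = prepare(raw, val_fraction=0.25)
print(len(train), len(val))
```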
+### Evaluation Parameters
+- Temperature: 0.6, 0.8, 1.0 (compared)
+- Top-p: 0.95
+- Max tokens: 1024
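
As a rough guide, these settings could be passed to the Transformers pipeline from the Quick Start as generation keyword arguments. This is a sketch, not the evaluation script: mapping "Max tokens" to `max_new_tokens` is an assumption, and `sampling_kwargs` is a hypothetical helper.

```python
# Temperatures compared in the evaluation tables.
TEMPERATURES = (0.6, 0.8, 1.0)


def sampling_kwargs(temperature: float) -> dict:
    """Build generation kwargs for one evaluation run.

    The keys are standard Hugging Face text-generation arguments;
    treating "Max tokens: 1024" as max_new_tokens is an assumption.
    """
    return {
        "do_sample": True,          # sampling must be on for temperature/top_p to apply
        "temperature": temperature,
        "top_p": 0.95,
        "max_new_tokens": 1024,
        "return_full_text": False,  # return only the model's reply
    }


configs = [sampling_kwargs(t) for t in TEMPERATURES]
for cfg in configs:
    print(cfg)
```

With the `generator` and message list from the Quick Start, one run would look like `generator(messages, **configs[0])`.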
+### Framework Versions
 - TRL: 0.15.1
 - Transformers: 4.49.0
+- PyTorch: 2.5.1+cu121
 - Datasets: 3.2.0
 - Tokenizers: 0.21.0
+## 📚 Citations
+For GRPO:
 ```bibtex
 @article{zhihong2024deepseekmath,
     title        = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
     year         = 2024,
     eprint       = {arXiv:2402.03300},
 }
 ```
+For TRL:
 ```bibtex
 @misc{vonwerra2022trl,
+    title        = {{TRL: Transformer Reinforcement Learning}},
+    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+    year         = 2020,
+    journal      = {GitHub repository},
+    publisher    = {GitHub},
+    howpublished = {\url{https://github.com/huggingface/trl}}
 }
 ```