|
--- |
|
base_model: unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit |
|
library_name: transformers |
|
model_name: QuadConnect2.5-0.5B-v0.0.9b |
|
pipeline_tag: text-generation |
|
tags: |
|
- unsloth |
|
- trl |
|
- grpo |
|
- connect4 |
|
- qwen |
|
- RL |
|
licence: license |
|
datasets: |
|
- Lyte/ConnectFour-T10 |
|
language: |
|
- en |
|
--- |
|
|
|
# Model Card for QuadConnect2.5-0.5B-v0.0.9b |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
- These are still early training experiments; the reward functions are still evolving.

- This model was created using GRPO and Unsloth, and trained to reason about Connect Four and play it strategically.

- It was made for a specific project task.
|
|
|
- **Developed by:** [Lyte](https://hf.co/Lyte) |
|
- **Model type:** *Small Language Model* |
|
- **Language(s) (NLP):** *English* |
|
- **License:** *TBD* |
|
- **Finetuned from model:** [unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-0.5b-instruct-unsloth-bnb-4bit) |
|
- **Trained Using:** [TRL](https://github.com/huggingface/trl)'s GRPO. |
|
|
|
## Demo
|
|
|
- Example from the Hugging Face Space (version 0.0.6b):
|
 |
|
|
|
## Quick start |
|
|
|
* Solution #1: |
|
```python |
|
from transformers import pipeline |
|
|
|
SYSTEM_PROMPT = """You are a master Connect Four strategist whose goal is to win while preventing your opponent from winning. The game is played on a 6x7 grid (columns a–g, rows 1–6 with 1 at the bottom) where pieces drop to the lowest available spot. |
|
|
|
Board: |
|
- Represented as a list of occupied cells in the format: <column><row>(<piece>), e.g., 'a1(O)'. |
|
- For example: 'a1(O), a2(X), b1(O)' indicates that cell a1 has an O, a2 has an X, and b1 has an O. |
|
- An empty board is shown as 'Empty Board'. |
|
- Win by connecting 4 pieces in any direction (horizontal, vertical, or diagonal). |
|
|
|
Strategy: |
|
1. Identify occupied and empty positions.
|
2. Find and execute winning moves. |
|
3. If there isn't a winning move, block your opponent's potential wins.
|
4. Control the center and set up future moves. |
|
|
|
Respond in XML: |
|
<reasoning> |
|
Explain your thought process, focusing on your winning move, how you block your opponent, and your strategic plans. |
|
</reasoning> |
|
<move> |
|
Specify the column letter (a–g) for your next move. |
|
</move> |
|
""" |
|
|
|
board = { |
|
"empty": "Game State:\n- You are playing as: X\n- Your previous moves: \n- Opponent's moves: \n- Current board state: Empty Board\n- Next available position per column: \nColumn a: a1, a2, a3, a4, a5, a6 \nColumn b: b1, b2, b3, b4, b5, b6 \nColumn c: c1, c2, c3, c4, c5, c6 \nColumn d: d1, d2, d3, d4, d5, d6 \nColumn e: e1, e2, e3, e4, e5, e6 \nColumn f: f1, f2, f3, f4, f5, f6 \nColumn g: g1, g2, g3, g4, g5, g6\n\nMake your move.", |
|
"one_move": "Game State:\n- You are playing as: X\n- Your previous moves: \n- Opponent's moves: b1\n- Current board state: b1(O)\n- Next available position per column: \nColumn a: a1, a2, a3, a4, a5, a6 \nColumn b: b2, b3, b4, b5, b6 \nColumn c: c1, c2, c3, c4, c5, c6 \nColumn d: d1, d2, d3, d4, d5, d6 \nColumn e: e1, e2, e3, e4, e5, e6 \nColumn f: f1, f2, f3, f4, f5, f6 \nColumn g: g1, g2, g3, g4, g5, g6\n\nMake your move.", |
|
"four_moves": "Game State:\n- You are playing as: X\n- Your previous moves: a1, a2\n- Opponent's moves: d1, a3\n- Current board state: a1(X), d1(O), a2(X), a3(O)\n- Next available position per column: \nColumn a: a4, a5, a6 \nColumn b: b1, b2, b3, b4, b5, b6 \nColumn c: c1, c2, c3, c4, c5, c6 \nColumn d: d2, d3, d4, d5, d6 \nColumn e: e1, e2, e3, e4, e5, e6 \nColumn f: f1, f2, f3, f4, f5, f6 \nColumn g: g1, g2, g3, g4, g5, g6\n\nMake your move.", |
|
} |
|
|
|
generator = pipeline("text-generation", model="Lyte/QuadConnect2.5-0.5B-v0.0.9b", device="cuda") |
|
|
|
# Choose a scenario key: 'empty', 'one_move', or 'four_moves'

output = generator([{"role": "system", "content": SYSTEM_PROMPT}, {"role": "user", "content": board['empty']}], max_new_tokens=1024, return_full_text=False)[0]

print(output["generated_text"])
|
``` |
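The model answers in the XML format requested by the system prompt. A minimal sketch for pulling the chosen column out of such a reply (the sample reply string below is hypothetical, for illustration only):

```python
import re

def extract_move(reply: str):
    """Return the column letter (a-g) from a <move>...</move> tag, or None if absent."""
    match = re.search(r"<move>\s*([a-g])\s*</move>", reply, re.IGNORECASE)
    return match.group(1).lower() if match else None

# Hypothetical reply, shaped like the format the system prompt requests:
reply = "<reasoning>Center control maximizes winning lines.</reasoning>\n<move>\nd\n</move>"
print(extract_move(reply))  # → d
```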
|
* Solution #2: |
|
[GGUF Q8](https://hf.co/Lyte/QuadConnect2.5-0.5B-v0.0.9b/blob/main/quadconnect.Q8_0.gguf): Download the quantized GGUF and load it in your favorite GGUF inference engine (e.g., LM Studio).
|
|
|
* Solution #3: |
|
[Hugging Face Space](https://hf.co/spaces/Lyte/QuadConnect): Duplicate the Space, or download its code and run it locally.
|
|
|
## Training procedure |
|
|
|
This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300). |
|
|
|
#### Preprocessing |
|
|
|
- I searched for Connect Four datasets, found three candidates, and selected [Leon-LLM/Connect-Four-Datasets-Collection](https://huggingface.co/datasets/Leon-LLM/Connect-Four-Datasets-Collection). I filtered out empty or broken entries and uploaded the result as Lyte/ConnectFour-clean, then removed games longer than 10 turns and split the remainder into train and validation sets (the validation split was not used for training).

- The final dataset is [Lyte/ConnectFour-T10](https://huggingface.co/datasets/Lyte/ConnectFour-T10).
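The turn-count filter can be sketched as below; the `moves` field name and the move encoding are assumptions for illustration, not the dataset's actual schema:

```python
# Sketch of the cleaning/filtering step, assuming each game row carries a
# whitespace-separated "moves" string (hypothetical schema).
def keep_game(row):
    moves = (row.get("moves") or "").split()
    # Drop empty/broken entries and games longer than 10 turns.
    return 0 < len(moves) <= 10

games = [
    {"moves": "d1 d2 c1 e1 b1"},      # 5 turns: kept
    {"moves": ""},                     # broken entry: dropped
    {"moves": " ".join(["d1"] * 15)},  # 15 turns: dropped
]
filtered = [g for g in games if keep_game(g)]
print(len(filtered))  # → 1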
|
|
|
### Evaluation |
|
|
|
* Evaluations were conducted on the validation split of [Lyte/ConnectFour-T10](https://huggingface.co/datasets/Lyte/ConnectFour-T10) to test whether the model learns to win: each board is presented with only the winning move remaining.
|
|
|
* Evaluation sampling parameters:

  * temperature=0.6, top_p=0.95, max_tokens=1024
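Accuracy below is simply correct predictions over total games, with a prediction counted as correct when the column extracted from the model's `<move>` tag matches the winning column. A toy sketch of that comparison (the regex and example data are illustrative, not the actual eval harness):

```python
import re

def predicted_move(reply):
    m = re.search(r"<move>\s*([a-g])\s*</move>", reply, re.IGNORECASE)
    return m.group(1).lower() if m else None

# Toy (model_reply, winning_column) pairs standing in for the validation split.
results = [("<move>d</move>", "d"), ("<move>a</move>", "c"), ("no tag", "b")]
correct = sum(predicted_move(reply) == gold for reply, gold in results)
print(f"Accuracy: {correct / len(results):.2%}")  # → Accuracy: 33.33%
```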
|
|
|
#### Summary Metrics Comparison |
|
|
|
|
| Metric | Lyte/QuadConnect2.5-0.5B-v0.0.6b | Lyte/QuadConnect2.5-0.5B-v0.0.8b | Lyte/QuadConnect2.5-0.5B-v0.0.9b (Temp 0.6) | Lyte/QuadConnect2.5-0.5B-v0.0.9b (Temp 0.8) |
|
|-----------------------|--------------------------------|--------------------------------|--------------------------------|--------------------------------| |
|
| Total games evaluated | 5082 | 5082 | 5082 | 5082 | |
|
| Correct predictions | 518 | 394 | 516 | **713** | |
|
| Accuracy | 10.19% | 7.75% | 10.15% | **14.03%** | |
|
| Most common move | d (41.14%) | d (67.61%) | a (38.72%) | **a (31.01%)** | |
|
| Middle column usage | 75.05% | 99.53% | 29.08% | **35.43%** | |
|
|
|
*(Middle column usage = c + d + e; for v0.0.9b at Temp 0.8: 20.11% + 4.05% + 11.27% = 35.43%)*
|
|
|
#### Move Distribution Comparison |
|
|
|
| Column | Lyte/QuadConnect2.5-0.5B-v0.0.6b (Count, %) | Lyte/QuadConnect2.5-0.5B-v0.0.8b (Count, %) | Lyte/QuadConnect2.5-0.5B-v0.0.9b (Temp 0.6) (Count, %) | Lyte/QuadConnect2.5-0.5B-v0.0.9b (Temp 0.8) (Count, %) |
|
|--------|-----------------------------------|-----------------------------------|------------------------------|------------------------------| |
|
| a | 603 (19.02%) | 3 (0.12%) | 1447 (38.72%) | 1547 (31.01%) | |
|
| b | 111 (3.50%) | 4 (0.16%) | 644 (17.23%) | 924 (18.52%) | |
|
| c | 785 (24.76%) | 463 (17.96%) | 648 (17.34%) | 1003 (20.11%) | |
|
| d | 1304 (41.14%) | 1743 (67.61%) | 101 (2.70%) | 202 (4.05%) | |
|
| e | 290 (9.15%) | 360 (13.96%) | 338 (9.04%) | 562 (11.27%) | |
|
| f | 50 (1.58%) | 3 (0.12%) | 310 (8.30%) | 408 (8.18%) | |
|
| g | 27 (0.85%) | 2 (0.08%) | 249 (6.66%) | 342 (6.86%) | |
|
|
|
|
|
|
|
### Framework versions |
|
|
|
- TRL: 0.15.1 |
|
- Transformers: 4.49.0 |
|
- Pytorch: 2.5.1+cu121 |
|
- Datasets: 3.2.0 |
|
- Tokenizers: 0.21.0 |
|
|
|
## Citations |
|
|
|
Cite GRPO as: |
|
|
|
```bibtex |
|
@article{zhihong2024deepseekmath, |
|
title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}}, |
|
author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo}, |
|
year = 2024, |
|
eprint = {arXiv:2402.03300}, |
|
} |
|
|
|
``` |
|
|
|
Cite TRL as: |
|
|
|
```bibtex |
|
@misc{vonwerra2022trl, |
|
title = {{TRL: Transformer Reinforcement Learning}}, |
|
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec}, |
|
year = 2020, |
|
journal = {GitHub repository}, |
|
publisher = {GitHub}, |
|
howpublished = {\url{https://github.com/huggingface/trl}} |
|
} |
|
``` |