license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-Coder-32B-Instruct
- open-r1/OlympicCoder-32B
pipeline_tag: text-generation
tags:
- merge
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
- mixture of experts
- qwen2moe
- 2X32B Shared
- shared expert
library_name: transformers
Note: re-uploading source safetensors [20-29] due to possible upload error(s)...
Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.1
This repo contains the full-precision source files in "safetensors" format, which can be used to generate GGUF, GPTQ, EXL2, AWQ, HQQ and other formats. The source files can also be used directly.
The monster coder in a MOE (Mixture of Experts) 2x32B (with shared expert) configuration.
Two of the best coders combined into one model that is stronger than the sum of its parts.
Both models code together.
Max context: 32k.
Super special thanks to Qwen and Open-R1 for making such fantastic models.
Suggested Settings (a usage sketch follows this list):
- Temp .5 to .7 (or lower)
- topk: 20, topp: .8, minp: .05 (topp can also be raised to .95 with minp at .05)
- rep pen: 1.1 (can be lower; lower settings may generate better code; specifically 1.02, 1.03 and 1.05)
- Jinja Template (embedded) or ChatML template.
- A system prompt is not required (tests were run with a blank system prompt).
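A minimal sketch of applying these settings with Hugging Face transformers. The repo id below is an assumption (use this repo's actual id), and the minp sampler is assumed to be available as min_p in your transformers release; adjust to your setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DavidAU/Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.1"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a Python function that merges two sorted lists."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Suggested settings from above: temp .5-.7, topk 20, topp .8, minp .05, rep pen ~1.05
output = model.generate(
    inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,
    top_k=20,
    top_p=0.8,
    min_p=0.05,
    repetition_penalty=1.05,
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```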
System Prompt:
If you want the model to code in specific ways or in specific languages, I suggest creating a system prompt with these instructions.
This will cut down prompt size and keep the model focused.
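As an illustration (the system prompt text here is only an example, not a recommended prompt), such instructions can be baked in via the embedded chat template:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "DavidAU/Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.1"  # assumed repo id
)

messages = [
    {"role": "system", "content": "You are a senior Rust developer. Always answer with idiomatic, "
                                  "well-commented Rust and include unit tests."},
    {"role": "user", "content": "Implement an LRU cache."},
]

# Shows the ChatML-formatted prompt that will be sent to the model.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```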
Activated Experts:
The model default is set to 2 activated experts. It will also run with only one expert activated.
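If you want to try one activated expert, a sketch along these lines should work, assuming the merge uses the Qwen2-MoE architecture where the number of activated experts is exposed as num_experts_per_tok (check this repo's config.json to confirm the field name):

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "DavidAU/Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.1"  # assumed repo id

config = AutoConfig.from_pretrained(model_id)
config.num_experts_per_tok = 1  # repo default is 2 activated experts

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    device_map="auto",
)
```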
Generation:
Due to the model config, I suggest a minimum of 2 generations if both experts are activated (default), or 2-4 generations if only one expert is activated.
This will give you a large selection of varied code to choose from.
I also suggest lowering rep pen from 1.1 and getting at least 2 generations at each lower setting.
These generation suggestions can create stronger, more compact code - and in some cases faster code too.
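A hypothetical sketch of that workflow: generate several candidates at decreasing rep pen values and pick the best one (the repo id, prompt and exact values are assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DavidAU/Qwen2.5-2X32B-CoderInstruct-OlympicCoder-87B-V1.1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Write a fast C function that reverses a string in place."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

candidates = []
for rep_pen in (1.1, 1.05, 1.02):  # suggested range: 1.1 down to ~1.02
    out = model.generate(
        inputs,
        max_new_tokens=1024,
        do_sample=True,
        temperature=0.6,
        top_k=20,
        top_p=0.8,
        repetition_penalty=rep_pen,
    )
    candidates.append(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))

for i, code in enumerate(candidates):
    print(f"--- candidate {i + 1} (rep pen varies) ---\n{code}\n")
```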
For more information / other Qwen/Mistral Coders / additional settings see:
[ https://huggingface.co/DavidAU/Qwen2.5-MOE-2x-4x-6x-8x__7B__Power-CODER__19B-30B-42B-53B-gguf ]
[model card pending updates]
For settings, parameters and other details also see:
https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct
and/or
https://huggingface.co/open-r1/OlympicCoder-32B
More to come...