Improve model card: Add pipeline tag, library name, and relevant tags; update paper link in content
This PR improves the model card by:
- Adding the `pipeline_tag: text-generation` to ensure the model appears in relevant searches on the Hugging Face Hub.
- Adding `library_name: transformers` to enable the "Use in Transformers" widget, so users can copy code snippets directly.
- Adding `llm-as-judge` and `qwen2` to the `tags` list for enhanced discoverability.
- Updating the paper link in the "Evaluation" section to point to the Hugging Face paper page (`https://huggingface.co/papers/2507.09104`) for better integration with the Hugging Face ecosystem.
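
As a quick sanity check on the metadata changes listed above, the sketch below parses the new front matter and confirms the added fields. It is a minimal illustration only: `parse_front_matter` is a hypothetical helper written for this example (it handles plain scalars and flat lists, not full YAML), and `CARD` is a shortened stand-in for the real README.

```python
def parse_front_matter(readme_text):
    """Parse simple YAML front matter: `key: value` scalars and flat `- item` lists.

    Hypothetical helper for illustration; not a full YAML parser.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    metadata = {}
    current_list_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the block
            break
        if not stripped:
            continue
        if stripped.startswith("- ") and current_list_key is not None:
            # list item belonging to the most recent `key:` with no inline value
            metadata[current_list_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            metadata[key] = value
            current_list_key = None
        else:
            metadata[key] = []
            current_list_key = key
    return metadata


# Shortened stand-in for the updated README.md (front matter as added in this PR).
CARD = """---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- llm-as-judge
- qwen2
---

# CompassJudger-2
"""

metadata = parse_front_matter(CARD)
print(metadata)
```

For real model cards, a proper YAML parser would be the safer choice; the hand-rolled parser here only keeps the example dependency-free.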
README.md
CHANGED
@@ -1,6 +1,10 @@
 ---
 license: apache-2.0
-
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- llm-as-judge
+- qwen2
 ---
 
 # CompassJudger-2
@@ -30,10 +34,10 @@ We introduce **CompassJudger-2**, a novel series of generalist judge models desi
 
 Key contributions of our work include:
 
--
--
--
--
+- **Advanced Data Strategy:** We employ a task-driven, multi-domain data curation and synthesis strategy to enhance the model's robustness and domain adaptability.
+- **Verifiable Reward-Guided Training:** We supervise judgment tasks with verifiable rewards, guiding the model's intrinsic reasoning through chain-of-thought (CoT) and rejection sampling. A refined margin policy gradient loss further enhances performance.
+- **Superior Performance:** CompassJudger-2 achieves state-of-the-art results across multiple judge and reward benchmarks. Our 7B model demonstrates competitive accuracy with models that are significantly larger.
+- **JudgerBenchV2:** We introduce a new, comprehensive benchmark with 10,000 questions across 10 scenarios, using a Mixture-of-Judgers (MoJ) consensus for more reliable ground truth.
 
 This repository contains the **CompassJudger-2** series of models, fine-tuned on the Qwen2.5-Instruct series.
 
@@ -120,14 +124,14 @@ CompassJudger-2 sets a new state-of-the-art for judge models, outperforming gene
 | CompassJudger-1-32B-Instruct | 60.33 | 62.29 | 77.63 | 86.17 | 71.61 |
 | Skywork-Critic-Llama-3.1-70B | 52.41 | 50.65 | 65.50 | 93.30 | 65.47 |
 | RISE-Judge-Qwen2.5-32B | 56.42 | 63.87 | 73.70 | 92.70 | 71.67 |
-| **CompassJudger-2-32B-Instruct** | **62.21**
+| **CompassJudger-2-32B-Instruct** | **62.21** | **65.48** | 72.98 | **92.62** | **73.32** |
 | **General Models (for reference)** | | | | | |
 | Qwen2.5-32B-Instruct | 62.97 | 59.84 | 74.99 | 85.61 | 70.85 |
 | DeepSeek-V3-0324 | 64.43 | 59.68 | 78.16 | 85.17 | 71.86 |
 | Qwen3-235B-A22B | 61.40 | 65.97 | 75.59 | 84.68 | 71.91 |
 
 
-For detailed benchmark performance and methodology, please refer to our [📑 Paper](https://
+For detailed benchmark performance and methodology, please refer to our [📑 Paper](https://huggingface.co/papers/2507.09104).
 
 ## License
 
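
The benchmark rows touched by this change can be cross-checked with a few lines of arithmetic. The table header is not visible in the diff, so the sketch below assumes that the last column is the plain mean of the four preceding score columns, rounded half-up to two decimals. Both the column interpretation and the rounding mode are assumptions, not something the model card states here.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rows copied from the diff: four benchmark scores, then the reported last column.
ROWS = {
    "CompassJudger-1-32B-Instruct": ("60.33", "62.29", "77.63", "86.17", "71.61"),
    "Skywork-Critic-Llama-3.1-70B": ("52.41", "50.65", "65.50", "93.30", "65.47"),
    "RISE-Judge-Qwen2.5-32B": ("56.42", "63.87", "73.70", "92.70", "71.67"),
    "CompassJudger-2-32B-Instruct": ("62.21", "65.48", "72.98", "92.62", "73.32"),
    "Qwen2.5-32B-Instruct": ("62.97", "59.84", "74.99", "85.61", "70.85"),
    "DeepSeek-V3-0324": ("64.43", "59.68", "78.16", "85.17", "71.86"),
    "Qwen3-235B-A22B": ("61.40", "65.97", "75.59", "84.68", "71.91"),
}


def mean_of_scores(scores):
    """Mean of the given score strings, rounded half-up to two decimals (assumed rounding)."""
    total = sum(Decimal(s) for s in scores)
    return (total / len(scores)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Collect rows whose reported last column differs from the recomputed mean.
mismatches = {
    name: (str(mean_of_scores(values[:4])), values[4])
    for name, values in ROWS.items()
    if mean_of_scores(values[:4]) != Decimal(values[4])
}
print(mismatches)  # → {}
```

Under these assumptions every row is consistent, including the restored CompassJudger-2-32B-Instruct row. `Decimal` is used instead of floats so that ties such as 71.605 round predictably.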