Improve model card: Add pipeline tag, library name, and relevant tags; update paper link in content

This PR improves the model card by:

- Adding `pipeline_tag: text-generation` so the model appears in text-generation searches on the Hugging Face Hub.
- Adding `library_name: transformers` to enable the "Use in Transformers" widget, which provides copy-pasteable code snippets.
- Adding `llm-as-judge` and `qwen2` to the `tags` list for enhanced discoverability.
- Updating the paper link in the "Evaluation" section from the arXiv abstract page to the Hugging Face paper page (`https://huggingface.co/papers/2507.09104`) for better integration with the Hugging Face ecosystem.
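
For anyone who wants to reproduce these edits on a local copy of the README, the changes listed above can be sketched as a small script. This is only an illustration, not how the PR was produced: the function names are hypothetical, and it assumes the README starts with a `---` front matter block that does not already contain the added keys.

```python
import re

# Metadata lines added by this PR; the helper names below are hypothetical.
NEW_METADATA_LINES = [
    "pipeline_tag: text-generation",
    "library_name: transformers",
    "tags:",
    "  - llm-as-judge",
    "  - qwen2",
]

# Matches arXiv abstract URLs such as https://arxiv.org/abs/2507.09104
ARXIV_LINK = re.compile(r"https://arxiv\.org/abs/(\d{4}\.\d{4,5})")


def add_front_matter_lines(readme: str, new_lines: list[str]) -> str:
    """Insert new_lines just before the closing '---' of the front matter.

    Assumption: none of the new keys already exist, so no merging is done.
    """
    lines = readme.split("\n")
    if not lines or lines[0].strip() != "---":
        raise ValueError("README does not start with YAML front matter")
    # Find the closing '---' of the front matter block.
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            closing = i
            break
    else:
        raise ValueError("front matter is not closed")
    return "\n".join(lines[:closing] + new_lines + lines[closing:])


def use_hf_paper_links(readme: str) -> str:
    """Rewrite arXiv abstract URLs to Hugging Face paper page URLs."""
    return ARXIV_LINK.sub(r"https://huggingface.co/papers/\1", readme)


def improve_model_card(readme: str) -> str:
    """Apply both the metadata additions and the paper link update."""
    return use_hf_paper_links(add_front_matter_lines(readme, NEW_METADATA_LINES))
```

Calling `improve_model_card(open("README.md").read())` returns the updated text without writing anything to disk. The bullet re-indentation and table whitespace changes in the diff are not covered by this sketch.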

Files changed (1)

README.md +11 -7

@@ -1,6 +1,10 @@
 ---
 license: apache-2.0
 ---
 # CompassJudger-2
@@ -30,10 +34,10 @@ We introduce **CompassJudger-2**, a novel series of generalist judge models desi
 Key contributions of our work include:
-- **Advanced Data Strategy:** We employ a task-driven, multi-domain data curation and synthesis strategy to enhance the model's robustness and domain adaptability.
-- **Verifiable Reward-Guided Training:** We supervise judgment tasks with verifiable rewards, guiding the model's intrinsic reasoning through chain-of-thought (CoT) and rejection sampling. A refined margin policy gradient loss further enhances performance.
-- **Superior Performance:** CompassJudger-2 achieves state-of-the-art results across multiple judge and reward benchmarks. Our 7B model demonstrates competitive accuracy with models that are significantly larger.
-- **JudgerBenchV2:** We introduce a new, comprehensive benchmark with 10,000 questions across 10 scenarios, using a Mixture-of-Judgers (MoJ) consensus for more reliable ground truth.
 This repository contains the **CompassJudger-2** series of models, fine-tuned on the Qwen2.5-Instruct series.
@@ -120,14 +124,14 @@ CompassJudger-2 sets a new state-of-the-art for judge models, outperforming gene
 | CompassJudger-1-32B-Instruct       |     60.33      |   62.29    |   77.63   |    86.17    |   71.61   |
 | Skywork-Critic-Llama-3.1-70B       |     52.41      |   50.65    |   65.50   |    93.30    |   65.47   |
 | RISE-Judge-Qwen2.5-32B             |     56.42      |   63.87    |   73.70   |    92.70    |   71.67   |
-| **CompassJudger-2-32B-Instruct**   |   **62.21**    | **65.48**  |   72.98   |  **92.62**  | **73.32** |
 | **General Models (for reference)** |                |            |           |             |           |
 | Qwen2.5-32B-Instruct               |     62.97      |   59.84    |   74.99   |    85.61    |   70.85   |
 | DeepSeek-V3-0324                   |     64.43      |   59.68    |   78.16   |    85.17    |   71.86   |
 | Qwen3-235B-A22B                    |     61.40      |   65.97    |   75.59   |    84.68    |   71.91   |
-For detailed benchmark performance and methodology, please refer to our [📑 Paper](https://arxiv.org/abs/2507.09104).
 ## License

 ---
 license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+  - llm-as-judge
+  - qwen2
 ---
 # CompassJudger-2
 Key contributions of our work include:
+-   **Advanced Data Strategy:** We employ a task-driven, multi-domain data curation and synthesis strategy to enhance the model's robustness and domain adaptability.
+-   **Verifiable Reward-Guided Training:** We supervise judgment tasks with verifiable rewards, guiding the model's intrinsic reasoning through chain-of-thought (CoT) and rejection sampling. A refined margin policy gradient loss further enhances performance.
+-   **Superior Performance:** CompassJudger-2 achieves state-of-the-art results across multiple judge and reward benchmarks. Our 7B model demonstrates competitive accuracy with models that are significantly larger.
+-   **JudgerBenchV2:** We introduce a new, comprehensive benchmark with 10,000 questions across 10 scenarios, using a Mixture-of-Judgers (MoJ) consensus for more reliable ground truth.
 This repository contains the **CompassJudger-2** series of models, fine-tuned on the Qwen2.5-Instruct series.
 | CompassJudger-1-32B-Instruct       |     60.33      |   62.29    |   77.63   |    86.17    |   71.61   |
 | Skywork-Critic-Llama-3.1-70B       |     52.41      |   50.65    |   65.50   |    93.30    |   65.47   |
 | RISE-Judge-Qwen2.5-32B             |     56.42      |   63.87    |   73.70   |    92.70    |   71.67   |
+| **CompassJudger-2-32B-Instruct**   |   **62.21**   | **65.48**  |   72.98   |  **92.62**  | **73.32** |
 | **General Models (for reference)** |                |            |           |             |           |
 | Qwen2.5-32B-Instruct               |     62.97      |   59.84    |   74.99   |    85.61    |   70.85   |
 | DeepSeek-V3-0324                   |     64.43      |   59.68    |   78.16   |    85.17    |   71.86   |
 | Qwen3-235B-A22B                    |     61.40      |   65.97    |   75.59   |    84.68    |   71.91   |
+For detailed benchmark performance and methodology, please refer to our [📑 Paper](https://huggingface.co/papers/2507.09104).
 ## License