Update README.md (#1)
- Update README.md (309c6a8092d8534bb1629622736e3d5ed866555a)
README.md
CHANGED
@@ -74,7 +74,7 @@ As of 29 May 2025, Qwen-3-Nemotron-32B-Reward has comparable scores on [JudgeBen

 ## Use Case

-Qwen-3-Nemotron-32B-Reward assigns a reward score to
+Qwen-3-Nemotron-32B-Reward assigns a reward score to an LLM-generated response in a user–assistant dialogue.

 ---

@@ -146,7 +146,7 @@ If you find this model useful, please cite the following work:

 ```bibtex
 @misc{wang2025helpsteer3preferenceopenhumanannotatedpreference,
-title={
+title={Help{S}teer3-{P}reference: Open Human-Annotated Preference Data across Diverse Tasks and Languages},
 author={Zhilin Wang and Jiaqi Zeng and Olivier Delalleau and Hoo-Chang Shin and Felipe Soares and Alexander Bukharin and Ellie Evans and Yi Dong and Oleksii Kuchaiev},
 year={2025},
 eprint={2505.11475},