Update README.md
README.md
CHANGED
@@ -14,11 +14,8 @@ tags:
 
 ## Model Overview
 
-
-
-Llama-3.1-Nemotron-Ultra-253B-v1 is a model which offers a great tradeoff between model accuracy and efficiency. Efficiency (throughput) directly translates to savings. Using a novel Neural Architecture Search (NAS) approach, we greatly reduce the model’s memory footprint, enabling larger workloads, as well as reducing the number of GPUs required to run the model in a data center environment. This NAS approach enables the selection of a desired point in the accuracy-efficiency tradeoff. Furthermore, by using a novel method to vertically compress the model (see details [here](https://arxiv.org/abs/2503.18908)), it also offers a significant improvement in latency.
-
-The model underwent a multi-phase post-training process to enhance both its reasoning and non-reasoning capabilities. This includes a supervised fine-tuning stage for Math, Code, Reasoning, Chat, and Tool Calling as well as multiple reinforcement learning (RL) stages using Group Relative Policy Optimization (GRPO) algorithms for reasoning, chat, and instruction-following.
+OpenCodeReasoning-Distill-Qwen-7B-Instruct is a large language model (LLM) which is a derivative of [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) (AKA the *reference model*).
+It is a reasoning model that is post trained for reasoning while code generation. The model supports a context length of 32K tokens.
 
 This model is ready for commercial use.
 