ibm-ai-platform/Bamba-9B-v1

@@ -8,9 +8,9 @@ license: apache-2.0
 # Model Card for Bamba 9B
 We introduce Bamba-9B, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture and designed to handle a wide range of text generation tasks. It is trained from scratch using a two-stage training approach. In the first stage, the model is trained on 2 trillion tokens from the Dolma v1.7 dataset. In the second stage, it undergoes additional training on 200 billion tokens, leveraging a carefully curated blend of high-quality data to further refine its performance and enhance output quality.
-| Model            | Params       | # Layers | Hidden Dim. | Attention Heads | GQA | KV Heads | Context Length |  Tied Embeddings |
-|-------------------|--------------|----------|-------------|-----------------|-----|----------|----------------|------------------|
-| Bamba  | 9B (9.78B)   | 32       | 4096        | 32              | Yes | 8        | 4096           | False |
 The current release includes the following models:
@@ -69,168 +69,134 @@ contributed [HF-version of Mamba2-Hybrid]() (TODO: add link once live).
 ### Base pretrained models
 <table>
-  <tr>
-   <td><strong>Category</strong>
-   </td>
-   <td><strong>Benchmark</strong>
-   </td>
-   <td><strong>Setting</strong></td>
-   <td><strong>Metric</strong></td>
-   <td><strong>Bamba 9B (2.2T)</strong>
-   </td>
-  </tr>
-  <tr>
-   <td rowspan="8" >General
-   </td>
-   <td>MMLU
-   </td>
-   <td>5-shot</td>
-   <td>Accuracy</td>
-   <td>60.77
-   </td>
-  </tr>
-  <tr>
-   <td>ARC-C
-   </td>
-  <td>25-shot</td>
-   <td>Accuracy normalized</td>
-   <td>63.23
-   </td>
-  </tr>
-  <tr>
-   <td>GSM8K
-   </td>
-    <td>5-shot</td>
-   <td>exact match</td>
-   <td>36.77
-   </td>
-  </tr>
-  <tr>
-   <td>Hellaswag
-   </td>
-    <td>10-shot</td>
-   <td>Accuracy normalized</td>
-   <td>81.8
-   </td>
-  </tr>
-  <tr>
-   <td>OpenbookQA
-   </td>
-    <td>5-shot</td>
-   <td>Accuracy normalized</td>
-   <td>47.6
-   </td>
-  </tr>
-  <tr>
-   <td>Piqa
-   </td>
-    <td>5-shot</td>
-   <td>Accuracy normalized</td>
-   <td>82.26
-   </td>
-  </tr>
-  <tr>
-   <td>TruthfulQA
-   </td>
-     <td>0-shot</td>
-   <td>Accuracy</td>
-   <td>49.21
-   </td>
-  </tr>
-  <tr>
-   <td>Winogrande
-   </td>
-     <td>5-shot</td>
-   <td>Accuracy</td>
-   <td>76.87
-   </td>
-  </tr>
-  <tr>
-   <td rowspan="6" >HF LLM- V2
-   </td>
-   <td>MMLU-PRO
-   </td>
-     <td>5-shot</td>
-   <td>Accuracy</td>
-   <td>17.53
-   </td>
-  </tr>
-  <tr>
-   <td>BBH
-   </td>
-     <td>3-shot</td>
-   <td>Accuracy normalized</td>
-   <td>17.4
-   </td>
-  </tr>
-  <tr>
-   <td>GPQA
-   </td>
-     <td>0-shot</td>
-   <td>Accuracy normalized</td>
-   <td>4.14
-   </td>
-  </tr>
-  <tr>
-   <td>IFEval
-   </td>
-     <td>0-shot</td>
-   <td>inst_level_strict_acc + prompt_level_strict_acc</td>
-   <td>15.16
-   </td>
-  </tr>
-  <tr>
-   <td>MATH Lvl 5
-   </td>
-    <td>4-shot</td>
-   <td>Exact match</td>
-   <td>1.66
-   </td>
-  </tr>
-  <tr>
-   <td>MuSR
-   </td>
-    <td>0-shot</td>
-   <td>Accuracy normalized</td>
-   <td>9.59
-   </td>
-  </tr>
-  <tr>
-   <td rowspan="4" >Safety Tasks
-   </td>
-   <td>PopQA
-   </td>
-     <td>5-shot, generation</td>
-   <td>Accuracy</td>
-   <td>20.5
-   </td>
-  </tr>
-  <tr>
-   <td>Toxigen
-   </td>
-    <td>5-shot, logits</td>
-   <td>Accuracy</td>
-   <td>57.4
-   </td>
-  </tr>
-  <tr>
-   <td>BBQ
-   </td>
-     <td>5-shot, generation</td>
-   <td>Accuracy</td>
-   <td>44.2
-   </td>
-    </tr>
-  <tr>
-   <td>Crows-pairs_english
-   </td>
-    <td>5-shot, generation</td>
-   <td>pct_stereotype (lower is better)</td>
-   <td>70.78
-   </td>
-  </tr>
 </table>
 ## Fine-tuning
@@ -247,15 +213,13 @@ python -m fms_mo.run_quant \
     --output_dir <"path_to_save_new_model">
 ```
 Model size comparison before and after FP8:
-||original|quantized |
-|:----:|----:|----:|
-|memory (total)|39.12 GB|10.83 GB|
-|memory (break-down)|`torch.float32` 39.12 GB|`torch.bfloat16` 2.10 GB<br>`torch.float8_e4m3fn`    8.73 GB|
 More details about `fms-model-optimizer` can be found [here](https://github.com/foundation-model-stack/fms-model-optimizer/tree/main/examples/FP8_QUANT#quickstart).
-## Evaluation
 ## Llama.cpp
 There is preliminary work to enable running Bamba architecture models using [llama.cpp](https://github.com/ggerganov/llama.cpp). This is a work in progress, so it should only be used as a guide for the adventurous!

 # Model Card for Bamba 9B
 We introduce Bamba-9B, a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture and designed to handle a wide range of text generation tasks. It is trained from scratch using a two-stage training approach. In the first stage, the model is trained on 2 trillion tokens from the Dolma v1.7 dataset. In the second stage, it undergoes additional training on 200 billion tokens, leveraging a carefully curated blend of high-quality data to further refine its performance and enhance output quality.
+| Model | Params     | # Layers | Hidden Dim. | Attention Heads | GQA  | KV Heads | Context Length | Tied Embeddings |
+| ----- | ---------- | -------- | ----------- | --------------- | ---- | -------- | -------------- | --------------- |
+| Bamba | 9B (9.78B) | 32       | 4096        | 32              | Yes  | 8        | 4096           | False           |
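As a rough illustration of what the GQA and KV Heads columns imply, the sketch below derives the query-to-KV head ratio from the table values. The head dimension is an assumption (hidden dimension divided by attention heads, a common convention that the model card does not state), and the per-token count applies only to attention layers, not to the Mamba-2 layers.

```python
# Values taken from the architecture table above.
hidden_dim = 4096
attention_heads = 32
kv_heads = 8

# Assumption: head_dim = hidden_dim / attention_heads (not stated in the card).
head_dim = hidden_dim // attention_heads

# With grouped-query attention, several query heads share one key/value head.
queries_per_kv_head = attention_heads // kv_heads

# Assumption: per token and per attention layer, the KV cache stores one key
# and one value vector of size head_dim for every KV head.
kv_values_per_token_gqa = 2 * kv_heads * head_dim
kv_values_per_token_mha = 2 * attention_heads * head_dim

print("head_dim:", head_dim)
print("query heads per KV head:", queries_per_kv_head)
print("KV-cache reduction vs. full multi-head attention:",
      kv_values_per_token_mha // kv_values_per_token_gqa)
```

Under these assumptions, each KV head serves four query heads, so an attention layer caches a quarter of the values that full multi-head attention would.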
 The current release includes the following models:
 ### Base pretrained models
 <table>
+ <tr>
+<td><strong>Category</strong>
+</td>
+<td><strong>Benchmark</strong>
+</td>
+<td><strong>Bamba 9B (2.2T)</strong>
+</td>
+</tr>
+<tr>
+<td rowspan="8" >General
+</td>
+<td>MMLU (5-shot)
+</td>
+<td>60.77
+</td>
+</tr>
+<tr>
+<td>ARC-C (25-shot)
+</td>
+<td>63.23
+</td>
+</tr>
+<tr>
+<td>GSM8K (5-shot)
+</td>
+<td>36.77
+</td>
+</tr>
+<tr>
+<td>Hellaswag (10-shot)
+</td>
+<td>81.8
+</td>
+</tr>
+<tr>
+<td>OpenbookQA (5-shot)
+</td>
+<td>47.6
+</td>
+</tr>
+<tr>
+<td>Piqa (5-shot)
+</td>
+<td>82.26
+</td>
+</tr>
+<tr>
+<td>TruthfulQA (0-shot)
+</td>
+<td>49.21
+</td>
+</tr>
+<tr>
+<td>Winogrande (5-shot)
+</td>
+<td>76.87
+</td>
+</tr>
+<tr>
+<td rowspan="6" >HF OpenLLM-V2*
+</td>
+<td>MMLU-PRO (5-shot)
+</td>
+<td>17.53
+</td>
+</tr>
+<tr>
+<td>BBH (3-shot)
+</td>
+<td>17.4
+</td>
+</tr>
+<tr>
+<td>GPQA (0-shot)
+</td>
+<td>4.14
+</td>
+</tr>
+<tr>
+<td>IFEval (0-shot)
+</td>
+<td>15.16
+</td>
+</tr>
+<tr>
+<td>MATH Lvl 5 (4-shot)
+</td>
+<td>1.66
+</td>
+</tr>
+<tr>
+<td>MuSR (0-shot)
+</td>
+<td>9.59
+</td>
+</tr>
+<tr>
+<td rowspan="4" >Safety Tasks
+</td>
+<td>PopQA (5-shot)
+</td>
+<td>20.5
+</td>
+</tr>
+<tr>
+<td>Toxigen (5-shot)
+</td>
+<td>57.4
+</td>
+</tr>
+<tr>
+<td>BBQ (5-shot)
+</td>
+<td>44.2
+</td>
+</tr>
+<tr>
+<td>Crows-pairs english (5-shot)
+</td>
+<td>70.78
+</td>
+</tr>
 </table>
+*For the v2 leaderboard results, we perform [normalization](https://huggingface.co/docs/leaderboards/open_llm_leaderboard/normalization) and report the normalized results.
+Further details on our evaluation and normalization, along with run and analysis scripts, can be found [here](https://github.com/foundation-model-stack/bamba/blob/main/evaluation/README.md).
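For readers unfamiliar with the leaderboard normalization, here is a minimal sketch of the general idea: a raw score is rescaled so that the random-guess baseline maps to 0 and a perfect score maps to 100, and anything below the baseline is clipped to 0. The function name, the baseline of 0.25 for a four-choice task, and the raw score are illustrative assumptions, not values from this model card; the linked documentation and evaluation README are the authoritative description of what was actually run.

```python
def normalize_within_range(value: float, lower_bound: float,
                           higher_bound: float = 1.0) -> float:
    """Map lower_bound to 0 and higher_bound to 100; clip below-baseline scores to 0."""
    if value < lower_bound:
        return 0.0
    return (value - lower_bound) / (higher_bound - lower_bound) * 100


# Hypothetical example: a four-choice task has a random-guess baseline of 0.25.
hypothetical_raw_accuracy = 0.40
normalized = normalize_within_range(hypothetical_raw_accuracy, lower_bound=0.25)
print(f"normalized score: {normalized:.2f}")
```

This is why normalized scores on the v2 tasks can look much lower than raw accuracies: a model that is only slightly better than chance lands close to 0.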
 ## Fine-tuning
     --output_dir <"path_to_save_new_model">
 ```
 Model size comparison before and after FP8:
+|                     |                 original |                                                    quantized |
+| :-----------------: | -----------------------: | -----------------------------------------------------------: |
+|   memory (total)    |                 39.12 GB |                                                     10.83 GB |
+| memory (break-down) | `torch.float32` 39.12 GB | `torch.bfloat16` 2.10 GB<br>`torch.float8_e4m3fn`    8.73 GB |
 More details about `fms-model-optimizer` can be found [here](https://github.com/foundation-model-stack/fms-model-optimizer/tree/main/examples/FP8_QUANT#quickstart).
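The figures in the size comparison table can be sanity-checked with simple arithmetic. The sketch below assumes that the reported memory counts only weights and that 1 GB means 10^9 bytes; neither is stated in the card. The bytes-per-parameter values follow from the dtype widths.

```python
# Bytes per parameter for each dtype named in the table.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float8_e4m3fn": 1}

total_params = 9.78e9  # parameter count from the architecture table

# Original checkpoint: every parameter stored in float32.
original_gb = total_params * BYTES_PER_PARAM["float32"] / 1e9

# Quantized checkpoint: break-down reported in the table.
bf16_gb, fp8_gb = 2.10, 8.73
quantized_gb = bf16_gb + fp8_gb

# Parameter counts implied by the break-down (assumes weights only).
bf16_params = bf16_gb * 1e9 / BYTES_PER_PARAM["bfloat16"]
fp8_params = fp8_gb * 1e9 / BYTES_PER_PARAM["float8_e4m3fn"]
implied_params = bf16_params + fp8_params

compression_ratio = original_gb / quantized_gb

print(f"original: {original_gb:.2f} GB, quantized: {quantized_gb:.2f} GB")
print(f"implied parameters: {implied_params / 1e9:.2f}B")
print(f"compression ratio: {compression_ratio:.2f}x")
```

Under these assumptions the break-down accounts for all 9.78B parameters, with roughly 1.05B kept in `torch.bfloat16` and 8.73B stored in `torch.float8_e4m3fn`.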
 ## Llama.cpp
 There is preliminary work to enable running Bamba architecture models using [llama.cpp](https://github.com/ggerganov/llama.cpp). This is a work in progress, so it should only be used as a guide for the adventurous!