Update README.md
README.md CHANGED
@@ -13,7 +13,15 @@ tags:
 license: llama3.3
 ---
 
-
+[original model]: https://huggingface.co/Tarek07/Progenitor-V5-Final-LLaMa-70B
+
+### Update Feb 16, 2025 morning PST:
+
+The author of the [original model] mentioned that this model gave very different outputs. See the ongoing discussion [here](https://huggingface.co/Tarek07/Progenitor-V5-Final-LLaMa-70B/discussions/1#67b21fd3ba726eda5c98e812).
+
+# Overview
+
+The [original model] had an invalid tensor shape (`[1, 8192]`) for some weights, raising the following error when loading with `transformers`:
 ```
 ValueError: Trying to set a tensor of shape torch.Size([1, 8192]) in "weight" (which has shape torch.Size([8192])), this looks incorrect.
 ```
@@ -60,7 +68,7 @@ if __name__ == "__main__":
     main()
 ```
 
-Original README.md from here:
+# Original README.md from here:
 
 This marks the culmination of my experiments with the Progenitor series. I fixed the typo I had earlier where it wasn't computing in float32, but 6 models in computed in float32 is a bit taxing on resources and time and so I left it for the configuration I thought was the best (it's not something I can afford to do with every model I make, just the worthwhile ones). This one also uses the Sicari's tokenizer which I find the best.
 # merge
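
For context, here is a minimal sketch of the kind of repair the fix script referenced above performs, assuming the offending weights simply carry a spurious leading dimension of size 1 in the model's `.safetensors` shards. The shard glob, the `endswith("weight")` filter, and the squeeze-based fix are illustrative assumptions, not the author's exact code:

```python
# Illustrative sketch (not the repo's actual fix script): rewrite weights
# saved as [1, 8192] down to the [8192] shape transformers expects.
import glob

from safetensors.torch import load_file, save_file

for shard in sorted(glob.glob("*.safetensors")):
    tensors = load_file(shard)
    fixed = {}
    for name, tensor in tensors.items():
        # Assumption: the bad entries are 2-D "weight" tensors with a
        # spurious leading dimension of size 1, e.g. [1, 8192] -> [8192].
        if name.endswith("weight") and tensor.ndim == 2 and tensor.shape[0] == 1:
            tensor = tensor.squeeze(0)
        fixed[name] = tensor
    save_file(fixed, shard, metadata={"format": "pt"})
```

After the shards are rewritten this way, loading with `AutoModelForCausalLM.from_pretrained` should no longer raise the shape-mismatch `ValueError` quoted above.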