Update README.md
README.md CHANGED
@@ -13,7 +13,15 @@ tags:
 license: llama3.3
 ---
 
-
+[original model]: https://huggingface.co/Tarek07/Progenitor-V5-Final-LLaMa-70B
+
+### Update Feb 16, 2025 morning PST:
+
+The author of the [original model] mentioned that this model gave very different outputs. See the ongoing discussion [here](https://huggingface.co/Tarek07/Progenitor-V5-Final-LLaMa-70B/discussions/1#67b21fd3ba726eda5c98e812).
+
+# Overview
+
+The [original model] had an invalid tensor shape (`[1, 8192]`) for some weights, raising the following error when loading with `transformers`:
 ```
 ValueError: Trying to set a tensor of shape torch.Size([1, 8192]) in "weight" (which has shape torch.Size([8192])), this looks incorrect.
 ```
@@ -60,7 +68,7 @@ if __name__ == "__main__":
     main()
 ```
 
-Original README.md from here:
+# Original README.md from here:
 
 This marks the culmination of my experiments with the Progenitor series. I fixed the typo I had earlier where it wasn't computing in float32, but 6 models in computed in float32 is a bit taxing on resources and time and so I left it for the configuration I thought was the best (it's not something I can afford to do with every model I make, just the worthwhile ones). This one also uses the Sicari's tokenizer which I find the best.
 # merge
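
For context, here is a minimal sketch of the kind of repair the fix script referenced above performs, assuming the offending weights simply carry a spurious leading dimension of size 1 in the model's `.safetensors` shards. The shard glob, the `endswith("weight")` filter, and the squeeze-based fix are illustrative assumptions, not the author's exact code:

```python
# Illustrative sketch (not the repo's actual fix script): rewrite weights
# saved as [1, 8192] down to the [8192] shape transformers expects.
import glob

from safetensors.torch import load_file, save_file

for shard in sorted(glob.glob("*.safetensors")):
    tensors = load_file(shard)
    fixed = {}
    for name, tensor in tensors.items():
        # Assumption: the bad entries are 2-D "weight" tensors with a
        # spurious leading dimension of size 1, e.g. [1, 8192] -> [8192].
        if name.endswith("weight") and tensor.ndim == 2 and tensor.shape[0] == 1:
            tensor = tensor.squeeze(0)
        fixed[name] = tensor
    save_file(fixed, shard, metadata={"format": "pt"})
```

After the shards are rewritten this way, loading with `AutoModelForCausalLM.from_pretrained` should no longer raise the shape-mismatch `ValueError` quoted above.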