Update README.md
README.md (CHANGED)
@@ -17,7 +17,7 @@ pipeline_tag: text-generation
# HRM-Text1-41M

-**HRM-Text1** is an experimental text generation model based on the **Hierarchical Reasoning Model (HRM)** architecture. I added positional embeddings for each token and tweaked the training code a bit from the original implementation so that text generation would work well. It was trained from scratch on the `roneneldan/TinyStories` dataset,
+**HRM-Text1** is an experimental text generation model based on the **Hierarchical Reasoning Model (HRM)** architecture. I added positional embeddings for each token and tweaked the training code a bit from the original implementation so that text generation would work well. It was trained from scratch on the `roneneldan/TinyStories` dataset, and it can produce, let's say, semi-coherent sentences ;)

*Note: This repo corresponds to the 41M-parameter model, which is the first iteration. Also note that although it has 'reasoning' in the name, this model does not do chain-of-thought reasoning. The 'reasoning' just helps the model on a per-token basis.*

@@ -66,7 +66,7 @@ This model is intended for creative and research purposes, specifically for gene
The model was trained on the `train` split of the `roneneldan/TinyStories` dataset. The text was tokenized using the `google-t5/t5-small` tokenizer.

### Training Procedure

-The model was trained for 1 epoch using PyTorch. This took around 4.5 hours.
+The model was trained for 1 epoch using PyTorch. This took around 4.5 hours. The final training loss after one epoch was around 0.8.

#### Hyperparameters
<table>
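For context on the "positional embeddings for each token" mentioned in the updated description, below is a minimal PyTorch sketch of the general technique (learned positional embeddings added to token embeddings). The module names, dimensions, and structure are illustrative assumptions, not the actual HRM-Text1 code.

```python
import torch
import torch.nn as nn

class TokenAndPositionEmbedding(nn.Module):
    """Illustrative sketch: learned per-token positional embeddings summed with
    token embeddings. All names and sizes are assumptions, not HRM-Text1's code."""
    def __init__(self, vocab_size: int, max_len: int, d_model: int):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)   # token embedding table
        self.pos = nn.Embedding(max_len, d_model)      # learned positional table

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # input_ids: (batch, seq_len)
        seq_len = input_ids.size(1)
        positions = torch.arange(seq_len, device=input_ids.device)  # (seq_len,)
        # Broadcast positional embeddings over the batch dimension.
        return self.tok(input_ids) + self.pos(positions)[None, :, :]
```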
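Similarly, the data preparation described in the second hunk (tokenizing the `train` split of `roneneldan/TinyStories` with the `google-t5/t5-small` tokenizer) would look roughly like the sketch below using Hugging Face `datasets` and `transformers`. The text column name and `max_length` are assumptions for illustration.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Tokenizer and dataset named in the README; "text" column and max_length=512
# are assumptions, not taken from the actual training script.
tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
dataset = load_dataset("roneneldan/TinyStories", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
print(tokenized[0]["input_ids"][:10])  # peek at the first tokenized story
```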