Update README.md
README.md (CHANGED)
@@ -17,7 +17,7 @@ pipeline_tag: text-generation
# HRM-Text1-41M

-**HRM-Text1** is an experimental text generation model based on the **Hierarchical Reasoning Model (HRM)** architecture. I added positional embeddings for each token and tweaked the training code a bit from the original implementation so that text generation would work well. It was trained from scratch on the `roneneldan/TinyStories` dataset,
+**HRM-Text1** is an experimental text generation model based on the **Hierarchical Reasoning Model (HRM)** architecture. I added positional embeddings for each token and tweaked the training code a bit from the original implementation so that text generation would work well. It was trained from scratch on the `roneneldan/TinyStories` dataset, and it can produce, let's say, semi-coherent sentences ;)

*Note: This repo corresponds to the 41M-parameter model, which is the first iteration. Also note that although it has 'reasoning' in the name, this model does not do chain-of-thought reasoning. The 'reasoning' just helps the model on a per-token basis.*

@@ -66,7 +66,7 @@ This model is intended for creative and research purposes, specifically for gene
The model was trained on the `train` split of the `roneneldan/TinyStories` dataset. The text was tokenized using the `google-t5/t5-small` tokenizer.

### Training Procedure

-The model was trained for 1 epoch using PyTorch. This took around 4.5 hours.
+The model was trained for 1 epoch using PyTorch. This took around 4.5 hours. The final training loss after one epoch was around 0.8.

#### Hyperparameters
<table>
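For context on the "positional embeddings for each token" mentioned in the updated description, below is a minimal PyTorch sketch of the general technique (learned positional embeddings added to token embeddings). The module names, dimensions, and structure are illustrative assumptions, not the actual HRM-Text1 code.

```python
import torch
import torch.nn as nn

class TokenAndPositionEmbedding(nn.Module):
    """Illustrative sketch: learned per-token positional embeddings summed with
    token embeddings. All names and sizes are assumptions, not HRM-Text1's code."""
    def __init__(self, vocab_size: int, max_len: int, d_model: int):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)   # token embedding table
        self.pos = nn.Embedding(max_len, d_model)      # learned positional table

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # input_ids: (batch, seq_len)
        seq_len = input_ids.size(1)
        positions = torch.arange(seq_len, device=input_ids.device)  # (seq_len,)
        # Broadcast positional embeddings over the batch dimension.
        return self.tok(input_ids) + self.pos(positions)[None, :, :]
```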
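Similarly, the data preparation described in the second hunk (tokenizing the `train` split of `roneneldan/TinyStories` with the `google-t5/t5-small` tokenizer) would look roughly like the sketch below using Hugging Face `datasets` and `transformers`. The text column name and `max_length` are assumptions for illustration.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Tokenizer and dataset named in the README; "text" column and max_length=512
# are assumptions, not taken from the actual training script.
tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")
dataset = load_dataset("roneneldan/TinyStories", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
print(tokenized[0]["input_ids"][:10])  # peek at the first tokenized story
```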