qingy2024 committed
Commit c91eab8 · verified · 1 Parent(s): 23a2b2e

Update README.md

Files changed (1):
  1. README.md (+2, -2)
README.md CHANGED
@@ -17,7 +17,7 @@ pipeline_tag: text-generation
 
 # HRM-Text1-41M
 
-**HRM-Text1** is an experimental text generation architecture based on the **Hierarchical Reasoning Model (HRM)** architecture. I added positional embeddings to the model for each token and tweaked the training code a bit from their implementation so that text generation would work well. It was trained from scratch on the `roneneldan/TinyStories` dataset, designed to produce simple, coherent, and child-appropriate stories.
+**HRM-Text1** is an experimental text generation architecture based on the **Hierarchical Reasoning Model (HRM)** architecture. I added positional embeddings to the model for each token and tweaked the training code a bit from their implementation so that text generation would work well. It was trained from scratch on the `roneneldan/TinyStories` dataset, and it can kind of produce... let's say semi-coherent sentences ;)
 
 *Note: This repo corresponds to the 41M parameter model, which is the first iteration. Also note that although it has 'reasoning' in the name, this model does not do chain-of-thought reasoning. The 'reasoning' just helps the model on a per-token basis.*
 
@@ -66,7 +66,7 @@ This model is intended for creative and research purposes, specifically for gene
 The model was trained on the `train` split of the `roneneldan/TinyStories` dataset. The text was tokenized using the `google-t5/t5-small` tokenizer.
 
 ### Training Procedure
-The model was trained for 1 epoch using PyTorch. This took around 4.5 hours.
+The model was trained for 1 epoch using PyTorch. This took around 4.5 hours. Final training loss after an epoch was around 0.8.
 
 #### Hyperparameters
 <table>
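
For context on the training data and tokenizer named in the diff above, here is a minimal illustrative sketch of loading the `roneneldan/TinyStories` train split and the `google-t5/t5-small` tokenizer with the standard `datasets`/`transformers` APIs. This is not the author's training script, and the truncation length is an assumed placeholder.

```python
# Illustrative sketch only (not the repo's training code): load the dataset
# and tokenizer referenced in the README diff above.
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("roneneldan/TinyStories", split="train")
tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")

# Tokenize one story; max_length=512 is an assumed value, not a documented setting.
sample = dataset[0]["text"]
encoded = tokenizer(sample, truncation=True, max_length=512)
print(len(encoded["input_ids"]), encoded["input_ids"][:10])
```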