# HRM-Text1-41M
HRM-Text1 is an experimental text-generation model based on the Hierarchical Reasoning Model (HRM) architecture. I added per-token positional embeddings and tweaked the training code from the original implementation so that text generation would work well. It was trained from scratch on the roneneldan/TinyStories dataset, and it can produce... let's say semi-coherent sentences ;)
Note: This repo corresponds to the 41M-parameter model, which is the first iteration. Also note that although it has 'reasoning' in the name, this model does not do chain-of-thought reasoning; the 'reasoning' simply helps the model on a per-token basis.
The model utilizes the HRM structure, consisting of a "Specialist" module for low-level processing and a "Manager" module for high-level abstraction and planning. This architecture aims to handle long-range dependencies more effectively by summarizing information at different temporal scales.
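For intuition, here is a loose PyTorch sketch of a two-timescale block in that spirit. The class name, the use of standard Transformer encoder layers, the stride-based downsampling, and all dimensions are assumptions for illustration only and do not reproduce the actual HRM update rule (see the paper and the linked repo for the real implementation).

```python
import torch.nn as nn

# Loose, hypothetical sketch of a two-timescale block in the spirit of the
# Specialist/Manager split described above. Standard encoder layers and the
# stride-based downsampling are illustrative stand-ins, not the real HRM rule.
class TwoTimescaleBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, manager_period=4):
        super().__init__()
        # "Specialist": low-level module that processes every token.
        self.specialist = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        # "Manager": high-level module that works on a temporally downsampled summary.
        self.manager = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.manager_period = manager_period  # low-level steps per high-level step

    def forward(self, x):
        # Low-level pass over all tokens.
        h = self.specialist(x)
        # High-level pass over one summary state per `manager_period` tokens.
        summary = h[:, ::self.manager_period, :]
        plan = self.manager(summary)
        # Broadcast the high-level "plan" back to the token timescale and mix it in.
        plan = plan.repeat_interleave(self.manager_period, dim=1)[:, : h.size(1), :]
        return h + plan
```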
## Model Description
- Architecture: Hierarchical Reasoning Model (HRM)
- Training Data: roneneldan/TinyStories
- Original Paper: Hierarchical Reasoning Model
- Source/Model Code: qingy1337/HRM-Text
- Tokenizer: T5 Tokenizer
- Objective: Causal Language Modeling
## Example Outputs
Here are some example outputs from the model (it turns dark pretty quickly, no idea why haha).
- Prompt: "Once upon"
Model: Once upon a time, there was a little girl named Lily. She loved to play with her toys and her friends. One day, Lily and her friends went to the park. They played on the swings and the slide. They had a lot of fun. But then, Lily saw a boy with a shiny toy that looked like a real dinosaur. She wanted to play with him, but she was shy. She said, "Hi, I'm Lily. What's your name?" The boy looked at her and said, "I'm Max. I'm shy. What's your name?" Lily said, "I'm Max. Do you want to play with me?" Max said, "No, I'm Lily. I'm shy. I'm just new here." Lily and Max did not talk. They thought Max was rude. They felt sad and confused. They wanted to play with Max and Max. So, they walked to the boy and said, "Hi, I'm Lily. I like to play with you. What's your name?" Lily said, "Hi, I'm Max. What's your name?" Max said, "My name is Ben. I'm Ben. Do you want to play with me?" Lily and Max said, "Yes, please. I will play with you. But first, let's play tag. We can run after him and have fun." Lily and Max ran after him, but they were too fast. They ran faster and faster, and did not see the big dog that was sleeping under a tree. The dog was sleeping and did not hear them. He was too busy playing. He didn't notice the dog that was sleeping under the tree. The dog saw the dog and jumped on him. He bit the dog's hand and tore it apart. Lily and Max were very scared and sad. They didn't know what to do. They wanted to run away, but they knew they had to go home. They cried, "Help! Help! There's a dog!" But no one heard them. They were too late. The dog caught up with them and bit the dog. He ate the dog and bit him. Lily and Max cried, but it was too late. They wished they
- Prompt: "Once upon a time, there was a boy named Jammer"
Model: Once upon a time, there was a boy named Jammer who loved to play with his toy engine. He would push it around the floor and make it go around the house. One day, he went to the park with his mom and dad and saw a big slide. He wanted to go on it, but he did not want to go on the slide. He asked his mom and dad, but they said no. He told him to stay on the swing and wait for his turn. He was bored and sad. He waited until his mom and dad were busy in the park, not looking, and then he sneaked to the swing and pushed too hard. Then, he landed on the swing and slid down fast. He felt the wind on his face and the sun on his face. He smiled and laughed. He felt happy and free. He swung back and forth and followed his dad. He swung back and forth and laughed and laughed. He felt like he was flying. But then, the wind blew harder and harder. He pushed and pulled and pushed harder, until finally, the swing swung off the ground. He fell off his toy engine and started to cry. His mom and dad ran to him and hugged him. They checked him and kissed him and gave him a kiss. She felt better and happy again. She said, "Don't cry, my love. We'll go on another slide today. We can go on the slide together." They went to the slide and climbed up the ladder. He said, "This is fun! Let's go!" He used his toy engine to push and push, and soon he was on the other side. He looked down and smiled. He felt the wind in his hair and the sun on his face. He felt the wind in his hair and the sun on his face. He was scared and confused. He did not know what to do. He said, "Mom, help me!" His mom and dad looked at him and saw him in pain. They said, "Don't worry, my love. We will get you a band-aid and a band-aid for you." They took him to the bench and took him to the bench. They sat on the bench and waited for him to come back. The end.
(Code to run the model can be found here)
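The linked code contains the actual inference script. As a rough, hypothetical illustration of how such a model might be sampled from greedily, assuming `model` is the loaded HRM-Text1 module and that calling it returns next-token logits:

```python
import torch
from transformers import AutoTokenizer

# Illustrative greedy decoding loop. `model` is assumed to be the HRM-Text1
# module from the linked repo, already loaded with trained weights, returning
# logits of shape (batch, seq_len, vocab_size).
tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")

@torch.no_grad()
def generate(model, prompt, max_new_tokens=200, device="cuda"):
    # add_special_tokens=False avoids T5's trailing </s> on the prompt.
    ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).input_ids.to(device)
    for _ in range(max_new_tokens):
        logits = model(ids)                                   # (1, seq_len, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)

# print(generate(model, "Once upon a time"))
```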
## Intended Uses & Limitations
### Intended Use
This model is intended for creative and research purposes, specifically for generating short, simple stories in English, similar in style and complexity to those found in the TinyStories dataset. It can be used as a base for research into hierarchical models and long-context understanding.
### Limitations and Bias
- Domain Specificity: The model is trained exclusively on TinyStories and will not generalize well to other domains such as technical writing, news articles, or formal language.
- Low Context Length: The model only supports a context length of 512 tokens, which is far less than most practical use cases require.
- Undertraining: The model has been trained for only a single epoch. While performance on in-domain data may be reasonable, it is likely undertrained (below the Chinchilla-optimal ~20 tokens per parameter; see the rough budget calculation after this list) and may lack robustness.
- Experimental Architecture: The HRM architecture is not a standard Transformer. Its behavior, performance, and failure modes may differ from well-established models.
- Inherited Bias: The model will (of course) reflect any biases present in the roneneldan/TinyStories dataset.
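As a rough back-of-the-envelope check on the undertraining point above, the 20-tokens-per-parameter rule of thumb implies a token budget of roughly:

```python
# Rough Chinchilla-style token budget for a 41M-parameter model.
params = 41_000_000
tokens_per_param = 20
print(f"~{params * tokens_per_param / 1e9:.2f}B tokens")  # ~0.82B tokens
```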
## Training Details
### Training Data
The model was trained on the train split of the roneneldan/TinyStories dataset. The text was tokenized using the google-t5/t5-small tokenizer.
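For reference, the same tokenizer can be loaded and applied with the transformers library:

```python
from transformers import AutoTokenizer

# Load the T5 tokenizer used to preprocess the training data.
tokenizer = AutoTokenizer.from_pretrained("google-t5/t5-small")

ids = tokenizer("Once upon a time, there was a little girl.", return_tensors="pt").input_ids
print(ids.shape)  # (1, sequence_length)
```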
### Training Procedure
The model was trained for 1 epoch using PyTorch, which took around 4.5 hours. The final training loss after the epoch was around 0.8.
### Hyperparameters

| Parameter | Value |
|---|---|
| optimizer | AdamW |
| learning_rate | 2e-4 |
| lr_scheduler | CosineAnnealingLR (scheduled over 2 epochs, but stopped at 1 to avoid overfitting) |
| batch_size | 32 |
| epochs | 1 |
| gradient_clipping | 1.0 |
| device | NVIDIA H100 80GB GPU |
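The training script itself lives in the linked repo; a minimal sketch of a training step consistent with the hyperparameters above might look like the following, where `model`, `train_loader`, and `total_steps` are placeholders:

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Illustrative causal-LM training step matching the hyperparameters above.
# `model`, `train_loader` (batch_size=32, T5-tokenized TinyStories), and
# `total_steps` are placeholders; see the linked repo for the real script.
optimizer = AdamW(model.parameters(), lr=2e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=total_steps * 2)  # scheduled over 2 epochs, stopped after 1
loss_fn = torch.nn.CrossEntropyLoss()

for batch in train_loader:
    input_ids = batch["input_ids"].cuda()                  # (32, seq_len)
    logits = model(input_ids[:, :-1])                      # predict the next token at each position
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), input_ids[:, 1:].reshape(-1))

    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping at 1.0
    optimizer.step()
    scheduler.step()
```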