trojblue/gpt2-prompt-upscaler-v1

@@ -11,54 +11,82 @@ tags:
 - gpt2
 ---
->  date trained: March, 2024
-A lightweight prompt-generating model for generating danbooru tag-based prompts from few input tags.
-The model was trained to serve as a supplementary **lightweight** model for prompt refinement:
-- **small**: with only **335M** parameters, the model offers a lightweight solution for prompt filling compared to much larger models such as phi-3 or llama 8b.
-- **fast**: the model generates quickly, so it interferes as little as possible with the main text generation.
-- **low vram requirement**: the model takes less vram, so it saves more vram for the main image generation model.
-The model is capable of:
-- **character refinement**: filling in more details about characters from a danbooru character tag (e.g. `hatsune miku`)
-- **filling small details**: adding creative small details to a fairly well-thought-out scene
-- **creative inspirations**: adding randomness to a short prompt for inspirations
-- **prompt-to-prompt**: refine prompt elements to steer toward a better generation
-## Training details
-The model is finetuned on GPT2-medium, using 10M prompts from a refined full pixiv dataset, in the format of:
-- **rating**: [safe | nsfw]
-- **chara**: [danbooru character tags]
-- **date**: [2020s | 2010s | 2000s]
-- **quality**: [normal | good | excellent]  (by image aesthetic ratings)
-- **tags**: [rest of the danbooru general tags]
-- **output**: model output
-Some example training entries would look like this:
-```
-'<input rating="safe" chara="" date="2020s" quality="excellent" tags="1girl, long hair, white hair"><output>'
-'<input rating="safe" chara="" date="2020s" quality="excellent" tags="1girl, purple hair, white hair"><output>'
-'<input rating="safe" chara="" date="2020s" quality="excellent" tags="gothic lolita"><output>'
-'<input rating="safe" chara="hatsune miku" date="2020s" quality="excellent" tags=""><output>'
 ```
-The full dataset will be available soon.

+# GPT2-Prompt-Upscaler-v1
+> **Date Trained:** March 2024
+A lightweight model for generating **Danbooru tag-based prompts** from just a few input tags. It’s simple, fast, and (hopefully) useful for enhancing your text-to-image generations. An NSFW mode is included, too.
+------
+## Model Description
+The **GPT2-Prompt-Upscaler-v1** is designed to extend and refine prompts, aligning them with the tag distribution you’d expect from Danbooru images. Think of it as a friendly helper that fills in the gaps when you’re stuck or want more details for your image generation.
+### Why Use This Model?
+- **Compact:** With just **335M parameters**, it’s much lighter than bigger models such as the [Phi-3](https://huggingface.co/KBlueLeaf/DanTagGen-beta?not-for-all-audiences=true)-based ones, and on par in size with [TIPO](https://huggingface.co/KBlueLeaf/TIPO-500M) (but trained well before it!). It won’t hog your VRAM.
+- **Fast:** It runs in under 1s on a modern GPU, so it adds little extra time to your t2i generations.
+- **Efficient:** Saves resources for your main image generation process.
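A rough back-of-the-envelope estimate of the weight memory, assuming fp16 weights (2 bytes per parameter) and ignoring activations, the KV cache, and framework overhead:

$$
335 \times 10^{6}\ \text{params} \times 2\ \frac{\text{bytes}}{\text{param}} = 6.7 \times 10^{8}\ \text{bytes} \approx 0.67\ \text{GB}
$$

Loading in fp32 (4 bytes per parameter) doubles this to roughly 1.34 GB.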
+### What Can It Do?
+- **Character Refinement:** Turn a simple character tag (like `hatsune miku`) into a fully detailed prompt with hairstyles, outfits, and accessories.
+- **Adding Details:** Sprinkle in creative details to enhance a scene or concept.
+- **Inspiration:** Want random but interesting variations? Just toss in a short idea and let it play around.
+- **Prompt Polishing:** Clean up and refine elements for better generation outputs.
+------
+## Training Details
+The model is fine-tuned from **GPT2-medium** on **10M prompts** extracted from a refined Pixiv dataset, trained for 5 epochs with about 2B tokens seen per epoch.
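Taking these approximate figures at face value, the overall training budget and the average length of one formatted example work out roughly as follows (derived here, not reported separately):

$$
5\ \text{epochs} \times 2 \times 10^{9}\ \frac{\text{tokens}}{\text{epoch}} \approx 10^{10}\ \text{tokens},
\qquad
\frac{2 \times 10^{9}\ \text{tokens}}{10^{7}\ \text{prompts}} \approx 200\ \frac{\text{tokens}}{\text{prompt}}
$$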
+The training format looks something like this:
+- **Rating:** [safe | nsfw]
+- **Chara:** [Danbooru character tags]
+- **Date:** [2020s | 2010s | 2000s]
+- **Quality:** [normal | good | excellent] (based on aesthetics)
+- **Tags:** [General Danbooru tags]
+- **Output:** Model-generated continuation.
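These fields are packed into a single tagged string. The following is a minimal Python sketch that assembles it, assuming the attribute order and quoting shown in the example prompts under *How to Use It*. The `UpscalerInput` class and its validation are illustrative only, not part of the released model or any library:

```python
from dataclasses import dataclass

# Allowed values, taken from the training-format list.
RATINGS = {"safe", "nsfw"}
DATES = {"2020s", "2010s", "2000s"}
QUALITIES = {"normal", "good", "excellent"}


@dataclass
class UpscalerInput:
    """One structured input for the prompt upscaler (hypothetical helper)."""
    tags: str = ""
    chara: str = ""
    rating: str = "safe"
    date: str = "2020s"
    quality: str = "excellent"

    def to_prompt(self) -> str:
        # Reject values outside the sets listed in the training format.
        if self.rating not in RATINGS:
            raise ValueError(f"rating must be one of {sorted(RATINGS)}")
        if self.date not in DATES:
            raise ValueError(f"date must be one of {sorted(DATES)}")
        if self.quality not in QUALITIES:
            raise ValueError(f"quality must be one of {sorted(QUALITIES)}")
        # Attribute order matches the example prompts; generation starts
        # right after the trailing <output> tag.
        return (
            f'<input rating="{self.rating}" chara="{self.chara}" '
            f'date="{self.date}" quality="{self.quality}" tags="{self.tags}"><output>'
        )
```

For instance, `UpscalerInput(chara="hatsune miku").to_prompt()` reproduces the last example prompt. This sketch does not handle tags that themselves contain a double quote.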
+------
+## How to Use It
+It’s a causal GPT2 model: you write your input in a structured format, and the model keeps generating until it emits `</output>`. Here’s what an input might look like:
+### Example Prompts
+```text
+<input rating="safe" chara="" date="2020s" quality="excellent" tags="1girl, long hair, white hair"><output>
+<input rating="safe" chara="" date="2020s" quality="excellent" tags="1girl, purple hair, white hair"><output>
+<input rating="safe" chara="" date="2020s" quality="excellent" tags="gothic lolita"><output>
+<input rating="safe" chara="hatsune miku" date="2020s" quality="excellent" tags=""><output>
 ```
+The model generates extensions, filling in gaps with tags that feel natural and fitting.
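Putting it together, here is a hedged sketch using the Hugging Face `transformers` text-generation pipeline. It assumes the model is published under the repo id `trojblue/gpt2-prompt-upscaler-v1` (taken from the page header) and that generated tags are comma-separated like the input tags; `upscale` and `extract_tags` are hypothetical helper names, and the 128-token budget is an arbitrary choice. By default generation stops only at the end-of-sequence token or the token limit, so the sketch cuts the text at the first `</output>` itself:

```python
from functools import lru_cache

# Assumption: repo id taken from the page header of this model card.
MODEL_ID = "trojblue/gpt2-prompt-upscaler-v1"
END_TAG = "</output>"


def extract_tags(generated: str) -> list[str]:
    """Keep the text before the first </output> and split it into tags.

    Assumes the model emits comma-separated tags, like the input format.
    """
    body = generated.split(END_TAG, 1)[0]
    return [tag.strip() for tag in body.split(",") if tag.strip()]


@lru_cache(maxsize=1)
def _generator():
    # Imported lazily so extract_tags() works without transformers installed;
    # the pipeline is built once and reused across calls.
    from transformers import pipeline

    return pipeline("text-generation", model=MODEL_ID)


def upscale(prompt: str, num_variations: int = 1, max_new_tokens: int = 128) -> list[list[str]]:
    """Generate tag continuations for a structured <input ...><output> prompt."""
    outputs = _generator()(
        prompt,
        max_new_tokens=max_new_tokens,  # arbitrary budget for this sketch
        do_sample=True,  # sampling gives different variations on each call
        num_return_sequences=num_variations,
        return_full_text=False,  # return only the newly generated text
    )
    # Anything the model writes after the closing tag is discarded here.
    return [extract_tags(out["generated_text"]) for out in outputs]
```

For example, `upscale('<input rating="safe" chara="hatsune miku" date="2020s" quality="excellent" tags=""><output>', num_variations=3)` should return three sampled tag lists, which suits the "Inspiration" use case.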
+------
+## Limitations
+Okay, here’s the deal: this model isn’t perfect. It’s relatively small, and (confession time) it’s technically my 5th training attempt, though the *first* one I thought was worth keeping. As a result, it has a few quirks:
+- **Over-Focusing on Scenery:** When you set the quality to "excellent," the model sometimes gets overly enthusiastic about beautiful backgrounds and makes characters too small.
+- **Alphabetical Tagging:** Occasionally, it gets into a habit of generating tags in alphabetical order, which can lead to repetitive color tags at the end.
+- **Needs More Data:** It might benefit from a retrain with updated Danbooru tags to iron out some of these issues.
+So, yeah, this isn’t a “final form” model, but I think it’s still pretty handy. I might update it in the future, so stay tuned!
+------
+## What’s Next?
+I’ll be sharing:
+- **The Dataset:** The Pixiv 2023 prompts corpus used for training.
+- **A Demo:** A simple interface to try out the model.
+This model is for anyone who wants quick and lightweight prompt refinement without the heavy lifting of larger models. Play around with it, and let me know what you think!