Update README.md
README.md
CHANGED
@@ -11,54 +11,82 @@ tags:
|
|
11 |
- gpt2
|
12 |
---
|
13 |
|
14 |
-
|
15 |
|
16 |
-
|
17 |
|
|
|
18 |
|
|
|
19 |
|
20 |
-
|
21 |
|
22 |
-
|
23 |
-
- **fast**: the model generates fairly quickly to not interfere with the main text generation as possible
|
24 |
-
- **low vram requirement**: the model takes less vram, so it saves more vram for the main image generation model.
|
25 |
|
|
|
26 |
|
|
|
|
|
|
|
27 |
|
28 |
-
|
29 |
|
30 |
-
- **
|
31 |
-
- **
|
32 |
-
- **
|
33 |
-
- **
|
34 |
|
|
|
35 |
|
|
|
36 |
|
37 |
-
|
38 |
|
|
|
39 |
|
|
|
|
|
|
|
|
|
|
|
|
|
40 |
|
41 |
-
|
42 |
|
43 |
-
|
44 |
-
- **chara**: [danbooru character tags]
|
45 |
-
- **date**: [2020s | 2010s | 2000s]
|
46 |
-
- **quality**: [normal | good | excellent] (by image aesthetic ratings)
|
47 |
-
- **tags**: [rest of the danbooru general tags]
|
48 |
-
- **output**: model output
|
49 |
|
|
|
50 |
|
|
|
51 |
|
52 |
-
|
53 |
-
|
54 |
-
|
55 |
-
|
56 |
-
|
57 |
-
'<input rating="safe" chara="" date="2020s" quality="excellent" tags="gothic lolita"><output>'
|
58 |
-
'<input rating="safe" chara="hatsune miku" date="2020s" quality="excellent" tags=""><output>'
|
59 |
```
|
60 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
61 |
|
|
|
62 |
|
63 |
-
|
|
|
64 |
|
|
|
|
11 |
- gpt2
|
12 |
---
|
13 |
|
14 |
+
# GPT2-Prompt-Upscaler-v1
|
15 |
|
16 |
+
> **Date Trained:** March 2024
|
17 |
|
18 |
+
A lightweight model for generating **Danbooru tag-based prompts** from just a few input tags. It’s simple, fast, and (hopefully) useful for enhancing your text-to-image generations. An NSFW mode is also included.
|
19 |
|
20 |
+
------
|
21 |
|
22 |
+
## Model Description
|
23 |
|
24 |
+
The **GPT2-Prompt-Upscaler-v1** is designed to extend and refine prompts, aligning them with the tag distribution you’d expect from Danbooru images. Think of it as a friendly helper that fills in the gaps when you’re stuck or want more details for your image generation.
|
|
|
|
|
25 |
|
26 |
+
### Why Use This Model?
|
27 |
|
28 |
+
- **Compact:** With just **335M parameters**, it’s much lighter than bigger models such as the [Phi-3](https://huggingface.co/KBlueLeaf/DanTagGen-beta?not-for-all-audiences=true)-based ones, and on par with [TIPO](https://huggingface.co/KBlueLeaf/TIPO-500M) (though it was trained much earlier than that!). It won’t hog your VRAM.
|
29 |
+
- **Fast:** It runs in under 1 s on a modern GPU, so it adds very little time to your text-to-image (t2i) generations.
|
30 |
+
- **Efficient:** Saves resources for your main image generation process.
|
31 |
|
32 |
+
### What Can It Do?
|
33 |
|
34 |
+
- **Character Refinement:** Turn a simple character tag (like `hatsune miku`) into a fully detailed prompt with hairstyles, outfits, and accessories.
|
35 |
+
- **Adding Details:** Sprinkle in creative details to enhance a scene or concept.
|
36 |
+
- **Inspiration:** Want random but interesting variations? Just toss in a short idea and let it play around.
|
37 |
+
- **Prompt Polishing:** Clean up and refine elements for better generation outputs.
|
38 |
|
39 |
+
------
|
40 |
|
41 |
+
## Training Details
|
42 |
|
43 |
+
The model is fine-tuned from **GPT2-medium** on **10M prompts** extracted from a refined Pixiv dataset for 5 epochs, with roughly 2B tokens seen per epoch.
|
44 |
|
45 |
+
The training format looks something like this:
|
46 |
|
47 |
+
- **Rating:** [safe | nsfw]
|
48 |
+
- **Chara:** [Danbooru character tags]
|
49 |
+
- **Date:** [2020s | 2010s | 2000s]
|
50 |
+
- **Quality:** [normal | good | excellent] (based on aesthetics)
|
51 |
+
- **Tags:** [General Danbooru tags]
|
52 |
+
- **Output:** Model-generated continuation.
|
53 |
|
54 |
+
------
|
55 |
|
56 |
+
## How to Use It
|
|
|
|
|
|
|
|
|
|
|
57 |
|
58 |
+
It’s a causal GPT2 model, so you prepare your input in a structured format and let it generate until it emits `</output>`. Here’s what an input might look like:
|
59 |
|
60 |
+
### Example Prompts
|
61 |
|
62 |
+
```text
|
63 |
+
<input rating="safe" chara="" date="2020s" quality="excellent" tags="1girl, long hair, white hair"><output>
|
64 |
+
<input rating="safe" chara="" date="2020s" quality="excellent" tags="1girl, purple hair, white hair"><output>
|
65 |
+
<input rating="safe" chara="" date="2020s" quality="excellent" tags="gothic lolita"><output>
|
66 |
+
<input rating="safe" chara="hatsune miku" date="2020s" quality="excellent" tags=""><output>
|
|
|
|
|
67 |
```
|
68 |
|
69 |
+
The model generates extensions, filling in gaps with tags that feel natural and fitting.
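
If you want to script this, here is a minimal Python sketch of the prompt format shown above. The helper names (`build_prompt`, `extract_tags`, `upscale`) are made up for illustration and are not part of the model; it also assumes the output is a comma-separated tag list, like the `tags` input. `model_id` is whatever repo id or local path you load the model from, and the sampling settings are placeholders, not recommended values.

```python
def build_prompt(tags="", chara="", rating="safe", date="2020s", quality="excellent"):
    """Format the fields into the tagged input string the model continues from."""
    return (
        f'<input rating="{rating}" chara="{chara}" date="{date}" '
        f'quality="{quality}" tags="{tags}"><output>'
    )


def extract_tags(generated):
    """Return the text between `<output>` and `</output>` as a list of tags.

    Assumes the model writes comma-separated tags (an assumption, based on
    how the `tags` input is written).
    """
    continuation = generated.split("<output>", 1)[-1]
    continuation = continuation.split("</output>", 1)[0]
    return [tag.strip() for tag in continuation.split(",") if tag.strip()]


def upscale(model_id, tags="", chara="", max_new_tokens=128, **fields):
    """Generate an extended tag list with the transformers text-generation pipeline."""
    # Imported lazily so the two helpers above work without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_id)
    prompt = build_prompt(tags=tags, chara=chara, **fields)
    result = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)
    # The pipeline returns prompt + continuation; anything after `</output>` is cut here.
    return extract_tags(result[0]["generated_text"])
```

Calling `build_prompt(tags="gothic lolita")` reproduces the third example prompt above. The sketch does not stop generation at `</output>`; it simply discards whatever comes after it.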
|
70 |
+
|
71 |
+
------
|
72 |
+
|
73 |
+
## Limitations
|
74 |
+
|
75 |
+
Okay, so here’s the deal: this model isn’t perfect. Since it’s relatively small and (confession time) it’s technically my 5th attempt at training but the *first* one I thought was worth keeping, it has a few quirks:
|
76 |
+
|
77 |
+
- **Over-Focusing on Scenery:** When you set the quality to "excellent," the model sometimes gets overly enthusiastic about beautiful backgrounds and makes characters too small.
|
78 |
+
- **Alphabetical Tagging:** Occasionally, it gets into a habit of generating tags in alphabetical order, which can lead to repetitive color tags at the end.
|
79 |
+
- **Needs More Data:** It might benefit from a retrain with updated Danbooru tags to iron out some of these issues.
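
If the repeated tags bother you, one possible workaround (just a sketch, not something built into the model) is to clean the tag list after generation. The helper below is hypothetical and only removes exact repeats, ignoring case and surrounding spaces; it won’t catch near-duplicates such as several different color tags.

```python
def dedupe_tags(tags):
    """Drop exact repeats (case-insensitive) while keeping first-seen order."""
    seen = set()
    unique = []
    for tag in tags:
        cleaned = tag.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique
```

Join the result with `", ".join(...)` before passing it to your image generation model.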
|
80 |
+
|
81 |
+
So, yeah, this isn’t a “final form” model, but I think it’s still pretty handy. I might update it in the future, so stay tuned!
|
82 |
+
|
83 |
+
------
|
84 |
+
|
85 |
+
## What’s Next?
|
86 |
|
87 |
+
I’ll be sharing:
|
88 |
|
89 |
+
- **The Dataset:** The Pixiv 2023 prompts corpus used for training.
|
90 |
+
- **A Demo:** A simple interface to try out the model.
|
91 |
|
92 |
+
This model is for anyone who wants quick and lightweight prompt refinement without the heavy lifting of larger models. Play around with it, and let me know what you think!
|