trojblue committed on
Commit 093df5a · verified · 1 Parent(s): 4b871fe

Update README.md

Files changed (1): README.md (+55, −27)
 
11
  - gpt2
12
  ---
13
 
# GPT2-Prompt-Upscaler-v1

> **Date Trained:** March 2024

A lightweight model for generating **Danbooru tag-based prompts** from just a few input tags. It’s simple, fast, and (hopefully) useful for enhancing your text-to-image generations. An NSFW mode is included, too.

------

## Model Description

**GPT2-Prompt-Upscaler-v1** is designed to extend and refine prompts, aligning them with the tag distribution you’d expect from Danbooru images. Think of it as a friendly helper that fills in the gaps when you’re stuck or want more details for your image generation.

### Why Use This Model?

- **Compact:** With just **335M parameters**, it’s much lighter than bigger models like the [Phi-3](https://huggingface.co/KBlueLeaf/DanTagGen-beta?not-for-all-audiences=true)-based one, and on par with [TIPO](https://huggingface.co/KBlueLeaf/TIPO-500M) (though it was trained much earlier!). It won’t hog your VRAM.
- **Fast:** It generates in under a second on a modern GPU, so it adds almost no extra time to your t2i generations.
- **Efficient:** It leaves resources free for your main image generation model.
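As a rough sanity check on the VRAM claim, 335M parameters stored in half precision come out to well under a gigabyte of weights. This is back-of-envelope arithmetic only (actual usage adds activations and the KV cache):

```python
# Back-of-envelope estimate of VRAM needed for the model weights alone.
params = 335_000_000   # parameter count stated in this card
bytes_per_param = 2    # fp16 / bf16 storage

weights_gb = params * bytes_per_param / 1024**3
print(f"~{weights_gb:.2f} GB of weights")
```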
### What Can It Do?

- **Character Refinement:** Turn a simple character tag (like `hatsune miku`) into a fully detailed prompt with hairstyles, outfits, and accessories.
- **Adding Details:** Sprinkle in creative details to enhance a scene or concept.
- **Inspiration:** Want random but interesting variations? Just toss in a short idea and let it play around.
- **Prompt Polishing:** Clean up and refine elements for better generation outputs.

------

## Training Details

The model is finetuned from **GPT2-medium** on **10M prompts** extracted from a refined Pixiv dataset, for 5 epochs with roughly 2B tokens seen per epoch.
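Those stats imply an average length of about 200 tokens per training entry (a derived figure, not one stated in the card):

```python
# Implied average tokens per prompt from the stated training stats.
prompts_per_epoch = 10_000_000    # 10M prompts
tokens_per_epoch = 2_000_000_000  # ~2B tokens seen per epoch

avg_tokens_per_prompt = tokens_per_epoch / prompts_per_epoch
print(avg_tokens_per_prompt)  # 200.0
```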
The training format looks something like this:

- **Rating:** [safe | nsfw]
- **Chara:** [Danbooru character tags]
- **Date:** [2020s | 2010s | 2000s]
- **Quality:** [normal | good | excellent] (based on aesthetic ratings)
- **Tags:** [general Danbooru tags]
- **Output:** the model-generated continuation
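The fields above slot into a single flat input string. A minimal sketch of a builder (the helper name and defaults are my own, not part of the model's tooling):

```python
def build_input(rating="safe", chara="", date="2020s",
                quality="excellent", tags=""):
    """Assemble one input string from the fields listed above."""
    return (
        f'<input rating="{rating}" chara="{chara}" date="{date}" '
        f'quality="{quality}" tags="{tags}"><output>'
    )

print(build_input(tags="1girl, long hair, white hair"))
```

Feeding the returned string to the model prompts it to continue with additional tags.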
------

## How to Use It

It’s a causal GPT2 model: you prepare your input in a structured format, and it keeps generating until it emits `</output>`. Here’s what an input might look like:

### Example Prompts

```xml
<input rating="safe" chara="" date="2020s" quality="excellent" tags="1girl, long hair, white hair"><output>
<input rating="safe" chara="" date="2020s" quality="excellent" tags="1girl, purple hair, white hair"><output>
<input rating="safe" chara="" date="2020s" quality="excellent" tags="gothic lolita"><output>
<input rating="safe" chara="hatsune miku" date="2020s" quality="excellent" tags=""><output>
```

The model generates extensions, filling in gaps with tags that feel natural and fitting.
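Since generation only stops cleanly at `</output>`, it is worth truncating the decoded text yourself. A small post-processing sketch (the function name is illustrative, and the sample completion below is made up for the example, not real model output):

```python
def extract_output(generated: str) -> str:
    """Return the tag string between <output> and </output>,
    tolerating a missing closing tag (e.g. when max length is hit)."""
    start = generated.find("<output>")
    if start == -1:
        return ""
    start += len("<output>")
    end = generated.find("</output>", start)
    return (generated[start:end] if end != -1 else generated[start:]).strip()

sample = ('<input rating="safe" chara="" date="2020s" quality="excellent" '
          'tags="gothic lolita"><output>1girl, solo, black dress, frills, '
          'bonnet</output>')
print(extract_output(sample))  # 1girl, solo, black dress, frills, bonnet
```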
------

## Limitations

Okay, so here’s the deal: this model isn’t perfect. Since it’s relatively small and (confession time) it’s technically my 5th training attempt but the *first* one I thought was worth keeping, it has a few quirks:

- **Over-Focusing on Scenery:** When you set the quality to "excellent," the model sometimes gets overly enthusiastic about beautiful backgrounds and makes characters too small.
- **Alphabetical Tagging:** Occasionally, it gets into the habit of generating tags in alphabetical order, which can lead to repetitive color tags at the end.
- **Needs More Data:** It might benefit from a retrain with updated Danbooru tags to iron out some of these issues.

So, yeah, this isn’t a “final form” model, but I think it’s still pretty handy. I might update it in the future, so stay tuned!

------

## What’s Next?

I’ll be sharing:

- **The Dataset:** the Pixiv 2023 prompts corpus used for training.
- **A Demo:** a simple interface to try out the model.

This model is for anyone who wants quick, lightweight prompt refinement without the heavy lifting of larger models. Play around with it, and let me know what you think!