|
---
license: mit
language:
- en
base_model:
- openai-community/gpt2-medium
pipeline_tag: text-generation
tags:
- transformers
- pytorch
- gpt2
---
|
|
|
# GPT2-Prompt-Upscaler-v1
|
|
|
> **Date Trained:** March 2024
|
|
|
A lightweight model for generating **Danbooru tag-based prompts** from just a few input tags. It’s simple, fast, and (hopefully) useful for enhancing your text-to-image generations. An NSFW mode is included, too.
|
|
|
------
|
|
|
## Model Description
|
|
|
The **GPT2-Prompt-Upscaler-v1** is designed to extend and refine prompts, aligning them with the tag distribution you’d expect from Danbooru images. Think of it as a friendly helper that fills in the gaps when you’re stuck or want more details for your image generation.
|
|
|
### Why Use This Model?
|
|
|
- **Compact:** With just **355M parameters** (GPT2-medium), it’s much lighter than bigger models like the [Phi-3](https://huggingface.co/KBlueLeaf/DanTagGen-beta?not-for-all-audiences=true)-based ones, and on par with [TIPO](https://huggingface.co/KBlueLeaf/TIPO-500M) (but trained much earlier than that!). It won’t hog your VRAM.
- **Fast:** Generation takes under a second on a modern GPU, so it adds practically no extra time to your t2i generations.
- **Efficient:** Saves resources for your main image generation process.
|
|
|
### What Can It Do?
|
|
|
- **Character Refinement:** Turn a simple character tag (like `hatsune miku`) into a fully detailed prompt with hairstyles, outfits, and accessories.
- **Adding Details:** Sprinkle in creative details to enhance a scene or concept.
- **Inspiration:** Want random but interesting variations? Just toss in a short idea and let it play around.
- **Prompt Polishing:** Clean up and refine elements for better generation outputs.
|
|
|
------
|
|
|
## Training Details
|
|
|
The model is finetuned from **GPT2-medium** on **10M prompts** extracted from a refined Pixiv dataset for 5 epochs, with roughly 2B tokens seen per epoch (about 10B tokens in total).
|
|
|
Training took about 30 hours on an 8xH100 node.
|
|
|
The training format looks something like this:
|
|
|
- **Rating:** [safe | nsfw]
- **Chara:** [Danbooru character tags]
- **Date:** [2020s | 2010s | 2000s]
- **Quality:** [normal | good | excellent] (based on aesthetics)
- **Tags:** [General Danbooru tags]
- **Output:** Model-generated continuation.
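
To make the format concrete, here’s a minimal sketch of how one training record might be serialized into a flat string. The helper name and the exact field escaping are assumptions, inferred from the example prompts in “How to Use It” below.

```python
# Hypothetical serializer for a single training record. Field names mirror
# the list above; the exact string layout is an assumption based on the
# example prompts shown later in this card.
def serialize_example(rating: str, chara: str, date: str,
                      quality: str, tags: str, output: str) -> str:
    return (
        f'<input rating="{rating}" chara="{chara}" date="{date}" '
        f'quality="{quality}" tags="{tags}">'
        f'<output>{output}</output>'
    )

print(serialize_example(
    rating="safe",
    chara="hatsune miku",
    date="2020s",
    quality="excellent",
    tags="1girl",
    output="1girl, hatsune miku, aqua hair, twintails, smile",  # illustrative tags
))
```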
|
|
|
------
|
|
|
## How to Use It
|
|
|
It’s a causal GPT2 model, so you prepare your input in a structured format, and it keeps generating until it sees `</output>`. Here’s what an input might look like:
|
|
|
### Example Prompts
|
|
|
```
<input rating="safe" chara="" date="2020s" quality="excellent" tags="1girl, long hair, white hair"><output>
<input rating="safe" chara="" date="2020s" quality="excellent" tags="1girl, purple hair, white hair"><output>
<input rating="safe" chara="" date="2020s" quality="excellent" tags="gothic lolita"><output>
<input rating="safe" chara="hatsune miku" date="2020s" quality="excellent" tags=""><output>
```
|
|
|
The model generates extensions, filling in gaps with tags that feel natural and fitting.
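
Concretely, here’s a minimal sketch of running the model with the standard `transformers` causal-LM API. The repo id, the sampling settings, and the substring-based stopping criterion are placeholders and assumptions, not a pinned-down official recipe.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

MODEL_ID = "your-username/GPT2-Prompt-Upscaler-v1"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID).to("cuda").eval()


class StopOnSubstring(StoppingCriteria):
    """Stop generating once the decoded continuation contains a substring."""

    def __init__(self, tokenizer, stop_string, prompt_len):
        self.tokenizer = tokenizer
        self.stop_string = stop_string
        self.prompt_len = prompt_len

    def __call__(self, input_ids, scores, **kwargs):
        # Decode only the freshly generated part and check for the terminator.
        text = self.tokenizer.decode(input_ids[0, self.prompt_len:])
        return self.stop_string in text


prompt = (
    '<input rating="safe" chara="" date="2020s" '
    'quality="excellent" tags="1girl, long hair, white hair"><output>'
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
prompt_len = inputs["input_ids"].shape[1]

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        top_p=0.95,
        temperature=0.9,
        pad_token_id=tokenizer.eos_token_id,
        stopping_criteria=StoppingCriteriaList(
            [StopOnSubstring(tokenizer, "</output>", prompt_len)]
        ),
    )

# Decode only the continuation and trim at the terminator.
completion = tokenizer.decode(out[0, prompt_len:], skip_special_tokens=True)
print(completion.split("</output>")[0].strip())
```

Recent `transformers` versions also accept `stop_strings=["</output>"]` (together with `tokenizer=tokenizer`) directly in `generate`, which can replace the custom criterion. Lower the temperature for more conservative tag lists, or raise it for more adventurous expansions.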
|
|
|
------
|
|
|
## Limitations
|
|
|
Okay, so here’s the deal: this model isn’t perfect. Since it’s relatively small and (confession time) technically my 5th training attempt but the *first* one I thought was worth keeping, it has a few quirks:
|
|
|
- **Over-Focusing on Scenery:** When you set the quality to "excellent," the model sometimes gets overly enthusiastic about beautiful backgrounds and makes characters too small.
- **Alphabetical Tagging:** Occasionally, it gets into a habit of generating tags in alphabetical order, which can lead to repetitive color tags at the end.
- **Needs More Data:** It might benefit from a retrain with updated Danbooru tags to iron out some of these issues.
|
|
|
So, yeah, this isn’t a “final form” model, but I think it’s still pretty handy. I might update it in the future, so stay tuned!
|
|
|
------
|
|
|
## What’s Next?
|
|
|
I’ll be sharing:
|
|
|
- **The Dataset:** The Pixiv 2023 prompts corpus used for training.
- **A Demo:** A simple interface to try out the model.
|
|
|
This model is for anyone who wants quick and lightweight prompt refinement without the heavy lifting of larger models. Play around with it, and let me know what you think!
|
|