---
base_model:
- stabilityai/stable-diffusion-3.5-medium
tags:
- art
---

# Bokeh 3.5 Medium

<div align="center">
<img src="ad2.jpg" alt="00205_" width="620"/>
</div>

Bokeh 3.5 Medium is a **continued-training** model built upon the **Stable Diffusion 3.5 Medium** foundation, further refined on an open-source dataset of **5 million high-resolution images** with rigorous **aesthetic curation**. This yields outstanding image quality, fine detail preservation, and enhanced controllability.

This model is released under the Stability Community License.

For more details, visit [Tensor.Art](https://tensor.art) or [TusiArt](https://tusiart.com) to explore additional resources and useful information.

## Overview

- **Continued training on SD3.5M**, leveraging a large-scale dataset of **5 million high-resolution images**, carefully curated for aesthetic quality.
- **Supports hybrid short/long caption training** for enhanced natural language understanding (see the example after this list).
  - **Short Captions:** Focus on core image features.
  - **Long Captions:** Provide broader scene context and atmospheric details.
- **Recommended Resolutions:**
  `1920x1024`, `1728x1152`, `1152x1728`, `1280x1664`, `1440x1440`
- **Best Quality Training Resolution:** `1440x1440`
- **Supports LoRA fine-tuning.**
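
For illustration, a hypothetical short/long caption pair for the same image (composed from the prompt examples later in this card) might look like:

```
Short: Close-up of a macaw
Long:  Close-up of a macaw with vivid feathers and a sharp beak, perched in a
       dimly lit environment with soft warm lighting and a cinematic mood
```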

## Advantages

### 🖼️ High-Quality Image Generation
- **State-of-the-art visual fidelity** with improved detail rendering and **aesthetic consistency**.
- **Enhanced resolution support** up to about **2 million pixels**, ensuring highly detailed image outputs.
- **Carefully curated dataset** ensures better composition, lighting, and overall artistic appeal.

### 🎯 Powerful Custom Fine-Tuning
- **Exceptional LoRA training support**, making it highly effective for:
  - Photography
  - 3D Rendering
  - Illustration
  - Concept Art

### ⚡ Efficient Inference & Training
- **Low hardware requirements for inference:**
  - **Medium model:** 9GB VRAM (without the T5 text encoder; see the sketch after this list)
  - **Full-weights inference:** 16GB VRAM (suitable for local deployment)
- **LoRA fine-tuning VRAM requirement:** 12GB - 32GB
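
The lower VRAM figure corresponds to dropping the T5-XXL text encoder at load time. Below is a minimal sketch using diffusers; the model path is a placeholder, so substitute your local copy of the Bokeh 3.5 Medium weights:

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Drop the T5-XXL text encoder to reduce VRAM use; prompts are then encoded by
# the two CLIP text encoders only, which slightly reduces prompt understanding.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "path/to/bokeh_3.5_medium",  # placeholder: local path or repo id of the weights
    text_encoder_3=None,
    tokenizer_3=None,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "Close-up of a macaw, dimly lit environment",
    num_inference_steps=28,
    guidance_scale=4,
).images[0]
image.save("macaw_no_t5.jpg")
```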

## Known Issues

- **Potential human anatomy inconsistencies.**
- **Limited ability to generate photorealistic images.**
- **Some concepts may suffer from aesthetic quality issues.**

## Prompting Guide

### Use a structured prompt combining:
- **Main subject** (e.g., `"Close-up of a macaw"`)
- **Detailed features** (e.g., `"vivid feathers, sharp beak"`)
- **Background environment** (e.g., `"dimly lit environment"`)
- **Atmospheric description** (e.g., `"soft warm lighting, cinematic mood"`)
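
Put together, these components form a single prompt such as: `Close-up of a macaw, vivid feathers, sharp beak, dimly lit environment, soft warm lighting, cinematic mood`.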

### Best Practices:
- **Avoid overly complex prompts**, as the model already has strong text encoding. Overloading details can cause **T5 hallucination artifacts**, reducing image quality.
- **Do not use excessively short prompts** (e.g., single words or 2-3 tokens) unless combined with **LoRA or Image2Image (i2i)** techniques.
- **Avoid mixing too many unrelated concepts**, as this can lead to visual distortions and unwanted artifacts.
- **Optimal token length:** **30-70 tokens**.
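
As a rough way to check the 30-70 token target, you can count T5 tokens with the pipeline's own tokenizer. A minimal sketch, assuming `pipe` is the `StableDiffusion3Pipeline` loaded as in the Example Output section below:

```python
prompt = (
    "Close-up of a macaw, vivid feathers, sharp beak, "
    "dimly lit environment, soft warm lighting, cinematic mood"
)
# tokenizer_3 is the T5 tokenizer used by the SD3.5 pipeline; its token count is
# a reasonable proxy for how long the prompt is from the model's point of view.
num_tokens = len(pipe.tokenizer_3(prompt).input_ids)
print(num_tokens)  # aim for roughly 30-70
```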

### Negative Prompting
- **Negative prompts strongly influence image quality.**
- Ensure they **do not contradict the main subject** to avoid degrading the output.
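
In diffusers, negative prompts are passed through the `negative_prompt` argument. A minimal sketch with the pipeline from the Example Output section below; the negative prompt text here is only an illustrative assumption:

```python
image = pipe(
    prompt="Close-up of a macaw, vivid feathers, sharp beak, soft warm lighting",
    # Keep the negative prompt generic so it does not contradict the main subject.
    negative_prompt="blurry, low quality, distorted anatomy",
    num_inference_steps=28,
    guidance_scale=4,
).images[0]
image.save("macaw_negative.jpg")
```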

## Example Output

Using diffusers:

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Path to the Bokeh 3.5 Medium weights; point this at your local download
# or the model's repository id.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "/mnt/share/pcm_outputs/bokeh_3.5_medium", torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")

image = pipe(
    "Close-up of a macaw, dimly lit environment",
    num_inference_steps=28,
    guidance_scale=4,
    height=1920,
    width=1024,
).images[0]
image.save("macaw.jpg")
```
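
If VRAM is limited, calling `pipe.enable_model_cpu_offload()` (a standard diffusers feature that requires the `accelerate` package) in place of `pipe.to("cuda")` reduces peak GPU memory at some cost in speed.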

Using ComfyUI:

To use this workflow in **ComfyUI**, download the JSON file and load it:

[Download Workflow](bk_workflow.json)

## Recommended Training Configuration

For **LoRA fine-tuning**, the following tools and settings are recommended:

### 🔧 Training Tools
- **Kohya_ss:** [GitHub Repository](https://github.com/bmaltais/kohya_ss.git)
- **SimpleTuner:** [GitHub Repository](https://github.com/bghira/SimpleTuner)

### ⚙️ Suggested Training Settings
```bash
--resolution 1440x1440
--t5xxl_max_token_length 154
--optimizer_type AdamW8bit
--mmdit_lr 1e-4
--text_encoder_lr 5e-5
```

## Contact

* Website: https://tensor.art / https://tusiart.com
* Developed by: TensorArt