---
base_model:
- stabilityai/stable-diffusion-3.5-medium
tags:
- art
license: other
license_name: stabilityai-ai-community
license_link: LICENSE
---
# Bokeh 3.5 Medium
<div align="center">
<img src="ad2.jpg" alt="00205_" width="620"/>
</div> 

Bokeh 3.5 Medium is a **continued-training** model built on the **Stable Diffusion 3.5 Medium** foundation, further refined on a **5-million-image high-resolution open-source dataset** with rigorous **aesthetic curation**. The result is outstanding image quality, fine detail preservation, and enhanced controllability.

This model is released under the Stability Community License.
For more details, visit [Tensor.Art](https://tensor.art) or [TusiArt](https://tusiart.com) to explore additional resources and useful information.

## Overview

- **Continued training on SD3.5M**, leveraging a large-scale **5-million-image high-resolution dataset** carefully curated for aesthetic quality.
- **Supports hybrid short/long caption training** for enhanced natural language understanding.
  - **Short Captions:** Focus on core image features.
  - **Long Captions:** Provide broader scene context and atmospheric details.
- **Recommended Resolutions:**  
  `1920x1024`, `1728x1152`, `1152x1728`, `1280x1664`, `1440x1440`
- **Best Quality Training Resolution:** `1440x1440`
- **Supports LoRA fine-tuning.**

## Advantages

### 🖼️ High-Quality Image Generation
- **State-of-the-art visual fidelity** with improved detail extraction and **aesthetic consistency**.
- **Enhanced resolution support** up to **2 million pixels**, ensuring highly detailed image outputs.
- **Carefully curated dataset** ensures better composition, lighting, and overall artistic appeal.

### 🎯 Powerful Custom Fine-Tuning
- **Exceptional LoRA training support**, making it highly effective for:
  - Photography
  - 3D Rendering
  - Illustration
  - Concept Art

### ⚡ Efficient Inference & Training
- **Low hardware requirements for inference:**
  - **Medium model:** 9 GB VRAM (without the T5 text encoder; see the sketch below)
  - **Full-weight inference:** 16 GB VRAM (suitable for local deployment)
- **LoRA fine-tuning VRAM requirement:** 12-32 GB
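
For the lower-memory path, diffusers can load SD3.5-family pipelines without the T5 text encoder. The snippet below is a minimal sketch of that setup; the checkpoint path is a placeholder, and `enable_model_cpu_offload()` is an optional extra saving.

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Minimal sketch: T5-free loading to reduce VRAM (the path below is a placeholder).
pipe = StableDiffusion3Pipeline.from_pretrained(
    "path/to/bokeh_3.5_medium",   # placeholder: repo ID or local checkpoint directory
    text_encoder_3=None,          # drop the T5 text encoder
    tokenizer_3=None,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()   # optional: offload idle components to CPU to cut peak VRAM
```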

## Known Issues

- **Potential human anatomy inconsistencies.**
- **Limited ability to generate photorealistic images.**
- **Some concepts may suffer from aesthetic quality issues.**


## Prompting Guide

### Use a structured prompt combining:
- **Main subject** (e.g., `"Close-up of a macaw"`)  
- **Detailed features** (e.g., `"vivid feathers, sharp beak"`)  
- **Background environment** (e.g., `"dimly lit environment"`)  
- **Atmospheric description** (e.g., `"soft warm lighting, cinematic mood"`)  
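
Assembled in that order, such a prompt might look like the sketch below (the wording is illustrative only):

```python
# Illustrative only: build one prompt string from the four parts above.
subject    = "Close-up of a macaw"
features   = "vivid feathers, sharp beak"
background = "dimly lit environment"
atmosphere = "soft warm lighting, cinematic mood"

prompt = ", ".join([subject, features, background, atmosphere])
# -> "Close-up of a macaw, vivid feathers, sharp beak, dimly lit environment, soft warm lighting, cinematic mood"
```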

### Best Practices:
- **Avoid overly complex prompts**, as the model already has strong text encoding. Overloading details can cause **T5 hallucination artifacts**, reducing image quality.  
- **Do not use excessively short prompts** (e.g., single words or 2-3 tokens) unless combined with **LoRA or Image2Image (i2i)** techniques.  
- **Avoid mixing too many unrelated concepts**, as this can lead to visual distortions and unwanted artifacts.  
- **Optimal token length:** **30-70 tokens**.  
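
A quick way to check the 30-70 token guideline is to reuse the pipeline's own tokenizer. The sketch below assumes a pipeline `pipe` loaded with the T5 text encoder, as in the diffusers example further down.

```python
# Rough token count using the pipeline's T5 tokenizer (exposed as `tokenizer_3`).
prompt = ("Close-up of a macaw, vivid feathers, sharp beak, "
          "dimly lit environment, soft warm lighting, cinematic mood")
n_tokens = len(pipe.tokenizer_3(prompt).input_ids)
print(n_tokens)  # aim for roughly 30-70 tokens
```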

### Negative Prompting
- **Negative prompts strongly influence image quality.**  
- Ensure they **do not contradict the main subject** to avoid degrading the output.  
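
With diffusers, the negative prompt is passed through the `negative_prompt` argument. The sketch below assumes the pipeline from the Example Output section; the negative prompt text is only an example:

```python
# Minimal sketch: add a negative prompt that does not contradict the main subject.
image = pipe(
    "Close-up of a macaw, vivid feathers, dimly lit environment",
    negative_prompt="blurry, low quality, deformed anatomy",  # example wording only
    num_inference_steps=28,
    guidance_scale=4,
).images[0]
```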



## Example Output
Using diffusers:
```python
import torch
from diffusers import StableDiffusion3Pipeline

# Replace this path with the model repo ID or your local checkpoint directory.
pipe = StableDiffusion3Pipeline.from_pretrained("/mnt/share/pcm_outputs/bokeh_3.5_medium", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    "Close-up of a macaw, dimly lit environment",
    num_inference_steps=28,
    guidance_scale=4,
    height=1920,
    width=1024,
).images[0]
image.save("macaw.jpg")
```
Using ComfyUI:
Download the workflow JSON file below and load it in **ComfyUI**:

[Download Workflow](bk_workflow.json)

## Recommended Training Configuration

For **LoRA fine-tuning**, the following tools and settings are recommended:

### 🔧 Training Tools
- **Kohya_ss:** [GitHub Repository](https://github.com/bmaltais/kohya_ss.git)
- **Simple Tuner:** [GitHub Repository](https://github.com/bghira/SimpleTuner)

### ⚙️ Suggested Training Settings
```bash
--resolution 1440x1440
--t5xxl_max_token_length 154
--optimizer_type AdamW8bit
--mmdit_lr 1e-4
--text_encoder_lr 5e-5
```
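
Once a LoRA has been trained, it can be loaded for inference with diffusers. A minimal sketch, assuming the pipeline from the Example Output section and a placeholder file path:

```python
# Minimal sketch: attach a trained LoRA to the pipeline before generating.
pipe.load_lora_weights("path/to/your_bokeh_lora.safetensors")  # placeholder path
image = pipe(
    "Close-up of a macaw, vivid feathers, dimly lit environment",
    num_inference_steps=28,
    guidance_scale=4,
).images[0]
```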

## Contact
* Website: https://tensor.art / https://tusiart.com
* Developed by: TensorArt