---
base_model:
- stabilityai/stable-diffusion-3.5-medium
tags:
- art
---

# Bokeh 3.5 Medium

<div align="center">
<img src="ad2.jpg" alt="00205_" width="620"/>
</div>

Bokeh 3.5 Medium is a **continued-training** model built upon the **Stable Diffusion 3.5 Medium** foundation, further refined on an open-source dataset of **5 million high-resolution images** with rigorous **aesthetic curation**. This yields outstanding image quality, fine detail preservation, and enhanced controllability.

This model is released under the Stability Community License.

For more details, visit [Tensor.Art](https://tensor.art) or [TusiArt](https://tusiart.com) to explore additional resources and useful information.

## Overview

- **Continued training on SD3.5M**, leveraging a large-scale dataset of **5 million high-resolution images**, carefully curated for aesthetic quality.
- **Supports hybrid short/long caption training** for enhanced natural language understanding (see the example after this list).
  - **Short Captions:** Focus on core image features.
  - **Long Captions:** Provide broader scene context and atmospheric details.
- **Recommended Resolutions:**
  `1920x1024`, `1728x1152`, `1152x1728`, `1280x1664`, `1440x1440`
- **Best Quality Training Resolution:** `1440x1440`
- **Supports LoRA fine-tuning.**
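
For illustration, a hypothetical short/long caption pair for the same image (composed from the prompt examples later in this card) might look like:

```
Short: Close-up of a macaw
Long:  Close-up of a macaw with vivid feathers and a sharp beak, perched in a
       dimly lit environment with soft warm lighting and a cinematic mood
```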

## Advantages

### 🖼️ High-Quality Image Generation
- **State-of-the-art visual fidelity** with improved detail rendering and **aesthetic consistency**.
- **Enhanced resolution support** up to about **2 million pixels**, ensuring highly detailed image outputs.
- **Carefully curated dataset** ensures better composition, lighting, and overall artistic appeal.

### 🎯 Powerful Custom Fine-Tuning
- **Exceptional LoRA training support**, making it highly effective for:
  - Photography
  - 3D Rendering
  - Illustration
  - Concept Art

### ⚡ Efficient Inference & Training
- **Low hardware requirements for inference:**
  - **Medium model:** 9GB VRAM (without the T5 text encoder; see the sketch after this list)
  - **Full-weights inference:** 16GB VRAM (suitable for local deployment)
- **LoRA fine-tuning VRAM requirement:** 12GB - 32GB
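
The lower VRAM figure corresponds to dropping the T5-XXL text encoder at load time. Below is a minimal sketch using diffusers; the model path is a placeholder, so substitute your local copy of the Bokeh 3.5 Medium weights:

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Drop the T5-XXL text encoder to reduce VRAM use; prompts are then encoded by
# the two CLIP text encoders only, which slightly reduces prompt understanding.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "path/to/bokeh_3.5_medium",  # placeholder: local path or repo id of the weights
    text_encoder_3=None,
    tokenizer_3=None,
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    "Close-up of a macaw, dimly lit environment",
    num_inference_steps=28,
    guidance_scale=4,
).images[0]
image.save("macaw_no_t5.jpg")
```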

## Known Issues

- **Potential human anatomy inconsistencies.**
- **Limited ability to generate photorealistic images.**
- **Some concepts may suffer from aesthetic quality issues.**

## Prompting Guide

### Use a structured prompt combining:
- **Main subject** (e.g., `"Close-up of a macaw"`)
- **Detailed features** (e.g., `"vivid feathers, sharp beak"`)
- **Background environment** (e.g., `"dimly lit environment"`)
- **Atmospheric description** (e.g., `"soft warm lighting, cinematic mood"`)
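
Put together, these components form a single prompt such as: `Close-up of a macaw, vivid feathers, sharp beak, dimly lit environment, soft warm lighting, cinematic mood`.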

### Best Practices:
- **Avoid overly complex prompts**, as the model already has strong text encoding. Overloading details can cause **T5 hallucination artifacts**, reducing image quality.
- **Do not use excessively short prompts** (e.g., single words or 2-3 tokens) unless combined with **LoRA or Image2Image (i2i)** techniques.
- **Avoid mixing too many unrelated concepts**, as this can lead to visual distortions and unwanted artifacts.
- **Optimal token length:** **30-70 tokens**.
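
As a rough way to check the 30-70 token target, you can count T5 tokens with the pipeline's own tokenizer. A minimal sketch, assuming `pipe` is the `StableDiffusion3Pipeline` loaded as in the Example Output section below:

```python
prompt = (
    "Close-up of a macaw, vivid feathers, sharp beak, "
    "dimly lit environment, soft warm lighting, cinematic mood"
)
# tokenizer_3 is the T5 tokenizer used by the SD3.5 pipeline; its token count is
# a reasonable proxy for how long the prompt is from the model's point of view.
num_tokens = len(pipe.tokenizer_3(prompt).input_ids)
print(num_tokens)  # aim for roughly 30-70
```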

### Negative Prompting
- **Negative prompts strongly influence image quality.**
- Ensure they **do not contradict the main subject** to avoid degrading the output.
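
In diffusers, negative prompts are passed through the `negative_prompt` argument. A minimal sketch with the pipeline from the Example Output section below; the negative prompt text here is only an illustrative assumption:

```python
image = pipe(
    prompt="Close-up of a macaw, vivid feathers, sharp beak, soft warm lighting",
    # Keep the negative prompt generic so it does not contradict the main subject.
    negative_prompt="blurry, low quality, distorted anatomy",
    num_inference_steps=28,
    guidance_scale=4,
).images[0]
image.save("macaw_negative.jpg")
```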

## Example Output

Using diffusers:

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Path to the Bokeh 3.5 Medium weights; point this at your local download
# or the model's repository id.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "/mnt/share/pcm_outputs/bokeh_3.5_medium", torch_dtype=torch.bfloat16
)
pipe = pipe.to("cuda")

image = pipe(
    "Close-up of a macaw, dimly lit environment",
    num_inference_steps=28,
    guidance_scale=4,
    height=1920,
    width=1024,
).images[0]
image.save("macaw.jpg")
```
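
If VRAM is limited, calling `pipe.enable_model_cpu_offload()` (a standard diffusers feature that requires the `accelerate` package) in place of `pipe.to("cuda")` reduces peak GPU memory at some cost in speed.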

Using ComfyUI:

To use this workflow in **ComfyUI**, download the JSON file and load it:

[Download Workflow](bk_workflow.json)

## Recommended Training Configuration

For **LoRA fine-tuning**, the following tools and settings are recommended:

### 🔧 Training Tools
- **Kohya_ss:** [GitHub Repository](https://github.com/bmaltais/kohya_ss.git)
- **SimpleTuner:** [GitHub Repository](https://github.com/bghira/SimpleTuner)

### ⚙️ Suggested Training Settings
```bash
--resolution 1440x1440
--t5xxl_max_token_length 154
--optimizer_type AdamW8bit
--mmdit_lr 1e-4
--text_encoder_lr 5e-5
```

## Contact

* Website: https://tensor.art / https://tusiart.com
* Developed by: TensorArt