ConorWang committed
Commit 0a5d253 · verified · 1 Parent(s): bfba007

Upload SDXL Refiner snapshot (preserve folder structure)

README.md CHANGED
@@ -1,53 +1,123 @@
- Veltraxor_1_image_refiner
- Veltraxor_1_image_refiner provides the indexed weights and configuration for the Stable Diffusion XL 1.0 Refiner model. It is designed to enhance the latent representations generated by the Base model, performing high-fidelity denoising and detail refinement to produce final high-quality images.
-
- Key Features
- Detail Enhancement: Removes residual noise and enhances textures from the Base stage outputs.
- Two-Stage Workflow: Optimized for seamless integration with Veltraxor_1_image_base, forming a complete two-stage image generation pipeline.
- Modular Deployment: Can be loaded independently without modifying or redownloading the Base model.
-
- Repository Structure
- /
- ├── unet/
- │   └── diffusion_pytorch_model.safetensors   Core Refiner weights
- ├── scheduler/
- │   └── scheduler_config.json                 Noise scheduler configuration
- ├── text_encoder/
- │   └── model.safetensors                     CLIP text encoder weights
- ├── tokenizer/
- │   ├── merges.txt
- │   ├── vocab.json
- │   └── tokenizer_config.json                 Tokenizer files
- └── README.md
-
- Integration Guide
- Install Dependencies
- pip install diffusers transformers accelerate safetensors
-
- Load Refiner in Python
- from diffusers import StableDiffusionXLPipeline
  import torch
-
- pipe = StableDiffusionXLPipeline.from_pretrained(
-     "Veltraxor/Veltraxor_1_image_refiner",
-     torch_dtype=torch.float16
  )
-
- Example Usage
- base_pipe = StableDiffusionXLPipeline.from_pretrained(
-     "Veltraxor/Veltraxor_1_image_base", torch_dtype=torch.float16
- )
- latents = base_pipe.text_encoder("A serene mountain lake at sunrise").latent_dist
- refined = pipe.unet(latents).sample
-
- License
- This repository indexes and mirrors the original weights and configuration of Stable Diffusion XL 1.0 Refiner, which is distributed under the CreativeML Open RAIL++-M License (see the original LICENSE: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0).
-
- Proprietary Components
- All fine-tuning scripts, testing frameworks, and derivative improvements developed by Libo Wang (Veltraxor AI) are proprietary and not publicly released. Users may freely download and use the original SDXL Refiner model under its license terms.
+ ---
+ license: openrail++
+ tags:
+ - stable-diffusion
+ - image-to-image
+ ---
+ # SD-XL 1.0-refiner Model Card
+ ![row01](01.png)
+
+ ## Model
+
+ ![pipeline](pipeline.png)
+
+ [SDXL](https://arxiv.org/abs/2307.01952) consists of an [ensemble of experts](https://arxiv.org/abs/2211.01324) pipeline for latent diffusion:
+ In a first step, the base model (available here: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) is used to generate (noisy) latents,
+ which are then further processed with a refinement model specialized for the final denoising steps.
+ Note that the base model can be used as a standalone module.
+
+ Alternatively, we can use a two-stage pipeline as follows:
+ First, the base model is used to generate latents of the desired output size.
+ In the second step, we use a specialized high-resolution model and apply a technique called SDEdit (https://arxiv.org/abs/2108.01073, also known as "img2img")
+ to the latents generated in the first step, using the same prompt. This technique is slightly slower than the first one, as it requires more function evaluations.
+
+ Source code is available at https://github.com/Stability-AI/generative-models.
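+
+ As an illustration of the ensemble-of-experts handoff described above, here is a minimal sketch in 🧨 Diffusers, assuming a release >= 0.19 that exposes the `denoising_end`/`denoising_start` arguments; the 0.8 split is illustrative, not prescribed by this card:
+
+ ```py
+ import torch
+ from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
+
+ # Stage 1: the base model runs the first 80% of the denoising schedule
+ # and returns latents rather than a decoded image.
+ base = StableDiffusionXLPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-xl-base-1.0",
+     torch_dtype=torch.float16, variant="fp16", use_safetensors=True
+ ).to("cuda")
+
+ # Stage 2: the refiner completes the remaining 20% of the steps,
+ # sharing the second text encoder and VAE with the base to save memory.
+ refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-xl-refiner-1.0",
+     text_encoder_2=base.text_encoder_2, vae=base.vae,
+     torch_dtype=torch.float16, variant="fp16", use_safetensors=True
+ ).to("cuda")
+
+ prompt = "a photo of an astronaut riding a horse on mars"
+ latents = base(prompt=prompt, denoising_end=0.8, output_type="latent").images
+ image = refiner(prompt=prompt, denoising_start=0.8, image=latents).images[0]
+ ```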
 
+ ### Model Description
+
+ - **Developed by:** Stability AI
+ - **Model type:** Diffusion-based text-to-image generative model
+ - **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/blob/main/LICENSE.md)
+ - **Model Description:** This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses two fixed, pretrained text encoders ([OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip) and [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main)).
+ - **Resources for more information:** Check out our [GitHub Repository](https://github.com/Stability-AI/generative-models) and the [SDXL report on arXiv](https://arxiv.org/abs/2307.01952).
+
+ ### Model Sources
+
+ For research purposes, we recommend our `generative-models` GitHub repository (https://github.com/Stability-AI/generative-models), which implements the most popular diffusion frameworks (both training and inference) and to which new functionalities like distillation will be added over time.
+ [Clipdrop](https://clipdrop.co/stable-diffusion) provides free SDXL inference.
+
+ - **Repository:** https://github.com/Stability-AI/generative-models
+ - **Demo:** https://clipdrop.co/stable-diffusion
+
+ ## Evaluation
+ ![comparison](comparison.png)
+ The chart above evaluates user preference for SDXL (with and without refinement) over SDXL 0.9, Stable Diffusion 1.5, and Stable Diffusion 2.1.
+ The SDXL base model performs significantly better than the previous variants, and the model combined with the refinement module achieves the best overall performance.
+
+ ### 🧨 Diffusers
+
+ Make sure to upgrade diffusers to >= 0.18.0:
+ ```
+ pip install diffusers --upgrade
+ ```
+
+ In addition, make sure to install `transformers`, `safetensors`, and `accelerate`, as well as the invisible watermark:
+ ```
+ pip install invisible_watermark transformers accelerate safetensors
+ ```
+
+ You can then use the refiner to improve images.
+
+ ```py
  import torch
+ from diffusers import StableDiffusionXLImg2ImgPipeline
+ from diffusers.utils import load_image
+
+ pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
+     "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
  )
+ pipe = pipe.to("cuda")
+ url = "https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/aa_xl/000000009.png"
+
+ init_image = load_image(url).convert("RGB")
+ prompt = "a photo of an astronaut riding a horse on mars"
+ image = pipe(prompt, image=init_image).images[0]
+ ```
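+
+ As a usage note, the img2img refiner call also accepts a `strength` argument (a standard diffusers parameter, not documented in this card) that sets how much noise SDEdit adds back before refining; lower values stay closer to the input image. For example:
+
+ ```py
+ # Lower strength = gentler refinement, closer to the original image.
+ image = pipe(prompt, image=init_image, strength=0.3).images[0]
+ ```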
+
+ When using `torch >= 2.0`, you can improve the inference speed by 20-30% with `torch.compile`. Simply wrap the UNet with `torch.compile` before running the pipeline:
+ ```py
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+ ```
+
+ If you are limited by GPU VRAM, you can enable *CPU offloading* by calling `pipe.enable_model_cpu_offload()`
+ instead of `.to("cuda")`:
+
+ ```diff
+ - pipe.to("cuda")
+ + pipe.enable_model_cpu_offload()
+ ```
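+
+ If VRAM is still tight, a further option (again a standard diffusers API, not specific to this card) is sequential offloading, which swaps individual submodules on and off the GPU at a larger speed cost:
+
+ ```py
+ # Slowest but most memory-frugal offloading mode.
+ pipe.enable_sequential_cpu_offload()
+ ```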
+
+ For more advanced use cases, please have a look at [the docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_xl).
+
+ ## Uses
+
+ ### Direct Use
+
+ The model is intended for research purposes only. Possible research areas and tasks include
+
+ - Generation of artworks and use in design and other artistic processes.
+ - Applications in educational or creative tools.
+ - Research on generative models.
+ - Safe deployment of models which have the potential to generate harmful content.
+ - Probing and understanding the limitations and biases of generative models.
+
+ Excluded uses are described below.
+
+ ### Out-of-Scope Use
+
+ The model was not trained to produce factual or true representations of people or events; using it to generate such content is therefore beyond its abilities and out of scope.
+
+ ## Limitations and Bias
+
+ ### Limitations
+
+ - The model does not achieve perfect photorealism.
+ - The model cannot render legible text.
+ - The model struggles with more difficult compositional tasks, such as rendering an image corresponding to “A red cube on top of a blue sphere”.
+ - Faces and people in general may not be generated properly.
+ - The autoencoding part of the model is lossy.
+
+ ### Bias
+ While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
text_encoder_2/model.safetensors CHANGED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3a6032f63d37ae02bbc74ccd6a27440578cd71701f96532229d0154f55a8d3ff
+ size 2778702264
unet/diffusion_pytorch_model.fp16.safetensors CHANGED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3ea0376dcf065eaefd27806394a90e310001b1a71d4f1cf1f655e86c0e566ffe
+ size 4519210760