Commit · 03f4f17
1 Parent(s): 6539942

Update README.md

README.md CHANGED

@@ -153,20 +153,14 @@ Stable Diffusion v1-4 is a latent diffusion model which combines an autoencoder
 - The non-pooled output of the text encoder is fed into the UNet backbone of the latent diffusion model via cross-attention.
 - The loss is a reconstruction objective between the noise that was added to the latent and the prediction made by the UNet.
 
-We currently provide four checkpoints,
-- [`stable-diffusion-v1-1`](https://huggingface.co/CompVis/stable-diffusion-v1-1),
-- [`stable-diffusion-v1-2`](https://huggingface.co/CompVis/stable-diffusion-v1-2),
-- [`stable-diffusion-v1-3`](https://huggingface.co/CompVis/stable-diffusion-v1-3), and
-- [`stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4).
-
-The checkpoints were trained as follows:
-- `stable-diffusion-v1-1`: 237,000 steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
+We currently provide four checkpoints, which were trained as follows.
+- [`stable-diffusion-v1-1`](https://huggingface.co/CompVis/stable-diffusion-v1-1): 237,000 steps at resolution `256x256` on [laion2B-en](https://huggingface.co/datasets/laion/laion2B-en).
   194,000 steps at resolution `512x512` on [laion-high-resolution](https://huggingface.co/datasets/laion/laion-high-resolution) (170M examples from LAION-5B with resolution `>= 1024x1024`).
-- `stable-diffusion-v1-2
+- [`stable-diffusion-v1-2`](https://huggingface.co/CompVis/stable-diffusion-v1-2): Resumed from `stable-diffusion-v1-1`.
   515,000 steps at resolution `512x512` on "laion-improved-aesthetics" (a subset of laion2B-en,
 filtered to images with an original size `>= 512x512`, estimated aesthetics score `> 5.0`, and an estimated watermark probability `< 0.5`. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using an [improved aesthetics estimator](https://github.com/christophschuhmann/improved-aesthetic-predictor)).
-- `stable-diffusion-v1-3
-- 
+- [`stable-diffusion-v1-3`](https://huggingface.co/CompVis/stable-diffusion-v1-3): Resumed from `stable-diffusion-v1-2`. 195,000 steps at resolution `512x512` on "laion-improved-aesthetics" and 10 % dropping of the text-conditioning to improve [classifier-free guidance sampling](https://arxiv.org/abs/2207.12598)
+- [**`stable-diffusion-v1-4`**](https://huggingface.co/CompVis/stable-diffusion-v1-4) *To-fill-here*
 
 - **Hardware:** 32 x 8 x A100 GPUs
 - **Optimizer:** AdamW
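
The `stable-diffusion-v1-3` entry in this change mentions "10 % dropping of the text-conditioning" to improve classifier-free guidance sampling. The snippet below is a minimal sketch of what such dropping could look like in a training data pipeline. It is not taken from the repository: the function name `drop_text_conditioning`, its parameters, the choice of an empty string as the "unconditional" caption, and the per-example independent draw are all assumptions made for illustration.

```python
import random


def drop_text_conditioning(captions, drop_prob=0.1, rng=None):
    """Return a copy of `captions` where each entry is independently
    replaced by an empty string with probability `drop_prob`.

    Hypothetical helper: the README only states that 10 % of the
    text-conditioning is dropped, not how the dropping is implemented.
    """
    rng = rng or random.Random()
    # rng.random() is uniform on [0.0, 1.0), so the comparison below is
    # true for roughly a `drop_prob` fraction of the captions.
    return ["" if rng.random() < drop_prob else caption for caption in captions]


if __name__ == "__main__":
    batch = ["a photo of a cat", "a painting of a house", "a red car"]
    # Fixed seed only so that repeated runs drop the same captions.
    print(drop_text_conditioning(batch, drop_prob=0.1, rng=random.Random(0)))
```

Training on a mix of captioned and caption-free examples is what lets a single model produce both the conditional and the unconditional noise predictions that classifier-free guidance combines at sampling time.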

