The training captions are like `Yellow blob emoji with smiling face with smiling eyes. The background is gray.`
  - Blob emoji face drives a red sport car along a curved road on a cliff overlooking the sea. The sea is dotted with whitecaps. The sky is blue, and cumulonimbus clouds float on the horizon. --w 1664 --h 928 --s 50 --d 12345678
  ![sample2](yellow_blob_2.png)

### Dataset Creation Procedure

The dataset was created following these steps:

- The SVG files from [C1710/blobmoji](https://github.com/C1710/blobmoji) (licensed under Apache License 2.0) were used. Specifically, 118 different yellow blob emojis were selected from the SVG files.
- `cairosvg` was used to convert these SVGs into 512x512 pixel transparent PNGs.
- A script was then used to pad the images to 640x640 pixels and generate four versions of each image with different background colors (white, light gray, gray, and black), for a total of 472 images. A sketch of this preprocessing is shown after this list.
- The captions were generated from the official Unicode names of the emojis, adding the prefix `Yellow blob emoji with ` and the suffix `. The background is <color>.` to each name.
  - For example: `Yellow blob emoji with smiling face with smiling eyes. The background is gray.`
  - Note: For some emojis (e.g., devil, zombie), the word `Yellow` was omitted from the prefix.
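
For illustration, here is a minimal Python sketch of this preprocessing. It is not the author's actual script: the `names` mapping from file stem to Unicode emoji name, the exact RGB values for the gray backgrounds, and the output file naming are all assumptions.

```python
# Hypothetical preprocessing sketch: SVG -> 512x512 transparent PNG -> four
# 640x640 padded variants per emoji, plus a caption .txt next to each image.
import io
from pathlib import Path

import cairosvg
from PIL import Image

# Assumed RGB values; the original background colors are not specified exactly.
BACKGROUNDS = {
    "white": (255, 255, 255),
    "light gray": (211, 211, 211),
    "gray": (128, 128, 128),
    "black": (0, 0, 0),
}

def build_dataset(svg_dir: Path, out_dir: Path, names: dict[str, str]) -> None:
    """`names` maps an SVG file stem to its official Unicode emoji name."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for svg_path in sorted(svg_dir.glob("*.svg")):
        # Render the SVG to a 512x512 transparent PNG in memory.
        png_bytes = cairosvg.svg2png(url=str(svg_path),
                                     output_width=512, output_height=512)
        emoji = Image.open(io.BytesIO(png_bytes)).convert("RGBA")
        # Unicode names are upper-case; the captions use lower-case.
        name = names[svg_path.stem].lower()
        for color, rgb in BACKGROUNDS.items():
            # Pad to 640x640 by centering on a solid background: (640-512)/2 = 64.
            canvas = Image.new("RGBA", (640, 640), rgb + (255,))
            canvas.paste(emoji, (64, 64), mask=emoji)
            stem = f"{svg_path.stem}_{color.replace(' ', '_')}"
            canvas.convert("RGB").save(out_dir / f"{stem}.png")
            # Caption = prefix + Unicode name + background suffix.
            caption = f"Yellow blob emoji with {name}. The background is {color}."
            (out_dir / f"{stem}.txt").write_text(caption, encoding="utf-8")
```

With 118 emojis and four backgrounds, this yields the 472 image/caption pairs mentioned above.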

### Dataset Definition

```
# general configurations
[general]
resolution = [640, 640]
batch_size = 16
enable_bucket = true
bucket_no_upscale = false
caption_extension = ".txt"

[[datasets]]
image_directory = "path/to/images_and_captions_dir"
cache_directory = "path/to/cache_dir"
```
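
Before launching training, the config can be sanity-checked; a convenience sketch (not part of Musubi Tuner), assuming Python 3.11+ for the stdlib `tomllib` and the config file name used in the training command below:

```python
# Verify the dataset config parses and every image has a matching caption file.
import tomllib
from pathlib import Path

with open("blob_emoji_v1_640_bs16.toml", "rb") as f:
    config = tomllib.load(f)

assert config["general"]["resolution"] == [640, 640]
ext = config["general"]["caption_extension"]
for ds in config["datasets"]:
    image_dir = Path(ds["image_directory"])
    if not image_dir.is_dir():
        raise FileNotFoundError(f"image_directory not found: {image_dir}")
    for png in image_dir.glob("*.png"):
        if not png.with_suffix(ext).exists():
            print(f"missing caption: {png.name}")
```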

### Training Command

```
accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 --rdzv_backend=c10d \
  src/musubi_tuner/qwen_image_train_network.py \
  --dit path/to/dit.safetensors --vae path/to/vae.safetensors \
  --text_encoder path/to/vlm.safetensors \
  --dataset_config path/to/blob_emoji_v1_640_bs16.toml \
  --output_dir path/to/output_dir \
  --learning_rate 2e-4 \
  --timestep_sampling shift --weighting_scheme none --discrete_flow_shift 2.0 \
  --max_train_epochs 16 --mixed_precision bf16 --seed 42 --gradient_checkpointing \
  --network_module=networks.lora_qwen_image \
  --network_dim=4 --network_args loraplus_lr_ratio=4 \
  --save_every_n_epochs=1 --max_data_loader_n_workers 2 \
  --persistent_data_loader_workers \
  --logging_dir ./logs --log_prefix qwenimage-blob4-2e4- \
  --output_name qwenimage-blob4-2e4 \
  --optimizer_type adamw8bit --flash_attn --split_attn \
  --log_with tensorboard \
  --sample_every_n_epochs 1 --sample_prompts path/to/prompts_qwen_blob_emoji.txt \
  --fp8_base --fp8_scaled
```
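
The file passed to `--sample_prompts` holds one prompt per line, with generation options appended in the same style as the sample prompts shown at the top of this page (`--w`/`--h` for resolution, `--s` for steps, `--d` for seed). A hypothetical example (the actual file contents are not included here):

```
Yellow blob emoji with smiling face with smiling eyes. The background is gray. --w 1024 --h 1024 --s 50 --d 42
Blob emoji face drives a red sport car along a curved road on a cliff overlooking the sea. --w 1664 --h 928 --s 50 --d 12345678
```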

### Training Details

- Training was conducted on a Windows machine with a multi-GPU setup (2x RTX A6000).
- If you are not using a Windows environment or not performing multi-GPU training, remove the `--rdzv_backend=c10d` argument.
- Note that due to the 2-GPU setup, the effective batch size is 32 (2 GPUs x `batch_size = 16`). To reproduce the same results with limited VRAM, reduce the per-device batch size and increase the gradient accumulation steps to compensate; training should also succeed at a smaller effective batch size if you adjust the learning rate.
- The model was trained for 6 epochs (90 steps: with 472 images and an effective batch size of 32, each epoch is ceil(472 / 32) = 15 steps), which took approximately 1 hour with the GPU power limit set to 60%.
- Finally, the weights from all 6 epochs were merged using the LoRA Post-Hoc EMA script from Musubi Tuner with `sigma_rel=0.2`; a conceptual sketch follows this list.
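
For reference, here is a conceptual sketch of what Post-Hoc EMA merging does; it is not Musubi Tuner's actual script (which should be preferred) and may differ in details. It blends the per-epoch checkpoints with the power-function EMA profile of Karras et al. (arXiv:2312.02696), where `sigma_rel` sets the relative width of the averaging profile. The checkpoint file names in the usage comment are assumptions.

```python
# Conceptual Post-Hoc EMA over saved LoRA checkpoints (power-function profile).
import numpy as np
from safetensors.torch import load_file

def sigma_rel_to_gamma(sigma_rel: float) -> float:
    # gamma is the largest real root of g^3 + 7g^2 + (16 - t)g + (12 - t) = 0,
    # with t = sigma_rel^-2 (Karras et al., arXiv:2312.02696).
    t = sigma_rel ** -2
    return float(np.roots([1.0, 7.0, 16.0 - t, 12.0 - t]).real.max())

def post_hoc_ema(paths: list[str], sigma_rel: float = 0.2) -> dict:
    """Blend checkpoints sequentially with beta_i = (1 - 1/i)^(gamma + 1)."""
    gamma = sigma_rel_to_gamma(sigma_rel)
    ema = None
    for i, path in enumerate(paths, start=1):
        state = load_file(path)
        if ema is None:
            ema = {k: v.float().clone() for k, v in state.items()}
        else:
            beta = (1.0 - 1.0 / i) ** (gamma + 1.0)
            for k, v in state.items():
                ema[k].mul_(beta).add_(v.float(), alpha=1.0 - beta)
    return ema

# Example (per-epoch checkpoint names assumed from --output_name):
# merged = post_hoc_ema(
#     [f"qwenimage-blob4-2e4-{e:06d}.safetensors" for e in range(1, 7)])
```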

## fp-1f-kisekae-1024-v4-2-PfPHEMA.safetensors

Post-Hoc EMA version (Power function, `sigma_rel=0.2`) of the following LoRA. The usage is the same.