|
## qwenimage-blob_emoji-4-s020-6.safetensors |
|
|
|
Blob emoji LoRA. |
|
|
|
The training captions follow the pattern `Yellow blob emoji with smiling face with smiling eyes. The background is gray.`, so phrases such as `blob emoji` or `blob emoji with face ...` act as trigger words.
|
|
|
- Blob emoji with face holds a sign says "Blob Emoji" in front of Japanese Shrine. --w 1024 --h 1024 --s 50 --d 1001 |
|
 |
|
|
|
- Blob emoji face drives a red sport car along a curved road on a cliff overlooking the sea. The sea is dotted with whitecaps. The sky is blue, and cumulonimbus clouds float on the horizon. --w 1664 --h 928 --s 50 --d 12345678 |
|
 |
|
|
|
### Dataset Creation Procedure |
|
|
|
The dataset was created following these steps: |
|
|
|
- The SVG files from [C1710/blobmoji](https://github.com/C1710/blobmoji) (licensed under Apache License 2.0) were used; specifically, 118 different yellow blob emojis were selected.
|
- `cairosvg` was used to convert these SVGs into 512x512 pixel transparent PNGs. |
|
- A script was then used to pad the images to 640x640 pixels and generate four versions of each image with different background colors (white, light gray, gray, and black), for a total of 472 images; see the sketch after this list.
|
- The captions were generated based on the official Unicode names of the emojis. The prefix `Yellow blob emoji with ` and the suffix `. The background is <color>.` were added to each name. |
|
- For example: `Yellow blob emoji with smiling face with smiling eyes. The background is gray.` |
|
- Note: For some emojis (e.g., devil, zombie), the word `Yellow` was omitted from the prefix. |
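A minimal sketch of the conversion, padding, and captioning steps is below, assuming `cairosvg` and Pillow are installed. The directory layout, RGB values, and the file-name-to-Unicode-name lookup are assumptions, not the script actually used.

```python
# Sketch of the SVG -> padded PNG -> caption pipeline described above.
# Directory names, background RGB values, and the emoji-name lookup
# are assumptions for illustration.
import io
from pathlib import Path

import cairosvg
from PIL import Image

BACKGROUNDS = {
    "white": (255, 255, 255),
    "light gray": (211, 211, 211),
    "gray": (128, 128, 128),
    "black": (0, 0, 0),
}

out_dir = Path("dataset")
out_dir.mkdir(exist_ok=True)

for svg_path in Path("svg").glob("*.svg"):
    # 1) Rasterize the SVG to a 512x512 transparent PNG.
    png_bytes = cairosvg.svg2png(url=str(svg_path), output_width=512, output_height=512)
    emoji = Image.open(io.BytesIO(png_bytes)).convert("RGBA")

    # Hypothetical lookup from file name to the official Unicode name.
    unicode_name = svg_path.stem.replace("_", " ")

    for color_name, rgb in BACKGROUNDS.items():
        # 2) Pad to 640x640 by centering the emoji on a solid background.
        canvas = Image.new("RGBA", (640, 640), rgb + (255,))
        canvas.paste(emoji, ((640 - 512) // 2, (640 - 512) // 2), emoji)

        stem = f"{svg_path.stem}_{color_name.replace(' ', '')}"
        canvas.convert("RGB").save(out_dir / f"{stem}.png")

        # 3) Caption = fixed prefix + Unicode name + background suffix.
        caption = f"Yellow blob emoji with {unicode_name}. The background is {color_name}."
        (out_dir / f"{stem}.txt").write_text(caption)
```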
|
|
|
### Dataset Definition |
|
|
|
```
# general configurations
[general]
resolution = [640, 640]
batch_size = 16
enable_bucket = true
bucket_no_upscale = false
caption_extension = ".txt"

[[datasets]]
image_directory = "path/to/images_and_captions_dir"
cache_directory = "path/to/cache_dir"
```
|
|
|
### Training Command |
|
|
|
```
accelerate launch --num_cpu_threads_per_process 1 --mixed_precision bf16 --rdzv_backend=c10d \
src/musubi_tuner/qwen_image_train_network.py \
--dit path/to/dit.safetensors --vae path/to/vae.safetensors \
--text_encoder path/to/vlm.safetensors \
--dataset_config path/to/blob_emoji_v1_640_bs16.toml \
--output_dir path/to/output_dir \
--learning_rate 2e-4 \
--timestep_sampling shift --weighting_scheme none --discrete_flow_shift 2.0 \
--max_train_epochs 16 --mixed_precision bf16 --seed 42 --gradient_checkpointing \
--network_module=networks.lora_qwen_image \
--network_dim=4 --network_args loraplus_lr_ratio=4 \
--save_every_n_epochs=1 --max_data_loader_n_workers 2 \
--persistent_data_loader_workers \
--logging_dir ./logs --log_prefix qwenimage-blob4-2e4- \
--output_name qwenimage-blob4-2e4 \
--optimizer_type adamw8bit --flash_attn --split_attn \
--log_with tensorboard \
--sample_every_n_epochs 1 --sample_prompts path/to/prompts_qwen_blob_emoji.txt \
--fp8_base --fp8_scaled
```
|
|
|
### Training Details |
|
|
|
- Training was conducted on a Windows machine with a multi-GPU setup (2x RTX A6000). |
|
- If you are not using a Windows environment or not performing multi-GPU training, please remove the `--rdzv_backend=c10d` argument. |
|
- Please note that with the 2-GPU setup, the effective batch size is 32 (16 per GPU × 2 GPUs). To reproduce the same results with limited VRAM, increase the gradient accumulation steps; alternatively, a lower batch size should also train successfully if you adjust the learning rate.
|
- The model was trained for 6 epochs (90 steps; 472 images / effective batch size 32 ≈ 15 steps per epoch), which took approximately 1 hour with the GPU power limit set to 60%.
|
- Finally, the weights from all 6 epochs were merged using the LoRA Post-Hoc EMA script from Musubi Tuner with `sigma_rel=0.2`; a simplified sketch of this merge follows.
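Below is a simplified sketch of power-function Post-Hoc EMA merging in the spirit of Karras et al. (2024). It is not the actual Musubi Tuner script: it applies the power-function profile directly as per-checkpoint merge weights rather than the full least-squares profile synthesis, and the checkpoint file names are assumptions.

```python
# Simplified Post-Hoc EMA sketch: weight each epoch checkpoint by the
# power-function profile p(t) ∝ t^gamma and average. The real method
# reconstructs the EMA profile by least squares; file names are assumptions.
import numpy as np
import torch
from safetensors.torch import load_file, save_file

def sigma_rel_to_gamma(sigma_rel: float) -> float:
    # Cubic from the EDM2 post-hoc EMA reference code relating sigma_rel to gamma.
    t = sigma_rel ** -2
    return float(np.roots([1, 7, 16 - t, 12 - t]).real.max())

checkpoints = [f"qwenimage-blob4-2e4-epoch{e}.safetensors" for e in range(1, 7)]  # assumed names
gamma = sigma_rel_to_gamma(0.2)

t = np.arange(1, len(checkpoints) + 1, dtype=np.float64)
weights = (t ** gamma) / (t ** gamma).sum()  # later epochs get the largest weights

merged = {}
for w, path in zip(weights, checkpoints):
    for key, value in load_file(path).items():
        merged[key] = merged.get(key, 0.0) + float(w) * value.float()

save_file({k: v.to(torch.bfloat16).contiguous() for k, v in merged.items()},
          "merged-phema.safetensors")
```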
|
|
|
## fp-1f-kisekae-1024-v4-2-PfPHEMA.safetensors |
|
|
|
Post-Hoc EMA version (power function, `sigma_rel=0.2`) of the following LoRA. Usage is the same.
|
|
|
## fp-1f-kisekae-1024-v4-2.safetensors |
|
|
|
Experimental LoRA for FramePack One Frame kisekaeichi (outfit swap). The target index is 5. The prompt is as follows:
|
```
The girl stays in the same pose, but her outfit changes into a <costume description>, then she changes into another girl wearing the same outfit.
```
|
|
|
`<costume description>` is something like `school uniform`. A more detailed description may improve the results, for example "T-shirt with writing on it" or "Girl with long hair".
|
|
|
This model is trained at 1024x1024 resolution. Please use it at roughly the same resolution.
|
|
|
## fp-1f-chibi-1024.safetensors |
|
|
|
Experimental LoRA for FramePack One Frame Inference. The target index is 9. The prompt is as follows: |
|
```
An anime character transforms: her head grows larger, her body becomes shorter and smaller, eyes become bigger and cuter. She turns into a chibi (super-deformed) version, with cartoonishly cute proportions. The transformation is quick and playful.
```
|
|
|
This model is trained at 1024x1024 resolution. Please use it at roughly the same resolution. If the effect is too strong, lower the multiplier (strength) to 0.8 or less.
|
|
|
## FramePack-dance-lora-d8.safetensors |
|
Experimental LoRA for FramePack. This is for testing purposes and the effect is weak. Please set the prompt to something like `A woman is spinning on her tiptoes`.
|
|
|
|
|
|
## flux-hasui-lora-d4-sigmoid-raw-gs1.0.safetensors |
|
Experimental LoRA for FLUX.1 dev. |
|
|
|
Trained with the `sd-scripts` `sd3` branch (as of Aug. 11). __NOTE:__ These settings require > 26GB VRAM. Add `--fp8_base` to enable fp8 training and reduce VRAM usage.
|
|
|
```
accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 flux_train_network.py \
--pretrained_model_name_or_path flux1/flux1-dev.sft --clip_l sd3/clip_l.safetensors \
--t5xxl sd3/t5xxl_fp16.safetensors --ae flux1/ae_dev.sft \
--cache_latents_to_disk --save_model_as safetensors --sdpa \
--persistent_data_loader_workers --max_data_loader_n_workers 2 --seed 42 \
--gradient_checkpointing --mixed_precision bf16 --save_precision bf16 \
--network_module networks.lora_flux --network_dim 4 \
--optimizer_type adamw8bit --learning_rate 1e-3 --network_train_unet_only \
--cache_text_encoder_outputs --cache_text_encoder_outputs_to_disk --highvram \
--max_train_epochs 4 --save_every_n_epochs 1 --dataset_config hasui_1024_bs1.toml \
--output_dir flux/lora --output_name lora-name \
--timestep_sampling sigmoid --model_prediction_type raw --guidance_scale 1.0
```
|
|
|
The dataset `.toml` is below.
|
```toml
[general]
flip_aug = true
color_aug = false

[[datasets]]
enable_bucket = true
resolution = [1024,1024]
bucket_reso_steps = 64
max_bucket_reso = 2048
min_bucket_reso = 128
bucket_no_upscale = false
batch_size = 1
random_crop = false
shuffle_caption = false

[[datasets.subsets]]
image_dir = "path/to/train/images"
num_repeats = 1
caption_extension = ".txt"
```
|
|
|
|
|
## sdxl-negprompt8-v1m.safetensors |
|
Negative embedding for SDXL. Num vectors per token = 8.
|
|
|
## stable-cascade-c-lora-hasui-v02.safetensors |
|
Sample of LoRA for Stable Cascade Stage C. |
|
|
|
Feb 22, 2024 Update: Fixed a bug where LoRA was not applied to some modules (to_q/k/v and to_out) in Attention.
|
|
|
__This is an experimental model, so the format of the weights may change in the future.__ |
|
|
|
- a painting of an anthropomorphic penguin sitting in a cafe reading a book and having a coffee --w 1024 --h 1024 --d 1 |
|
 |
|
|
|
- a painting of japanese shrine in winter with snowfall --w 832 --h 1152 --d 1234 |
|
 |
|
|
|
This model is trained on 169 images with captions. U-Net only, dim=4, conv_dim=4, alpha=1, lr=1e-3, 4 epochs, mixed precision bf16, 8bit AdamW, batch size 8, resolution 1024x1024 with aspect ratio bucketing. VRAM usage is approximately 22 GB. A note on how dim and alpha scale the LoRA update follows.
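As a note on the hyperparameters: in kohya-style LoRA the learned update is scaled by alpha/dim, so dim=4 with alpha=1 applies the delta at strength 0.25. A minimal sketch, with illustrative shapes only:

```python
# How dim and alpha enter the effective weight in kohya-style LoRA:
# W' = W + (alpha / dim) * (up @ down), so dim=4, alpha=1 gives scale 0.25.
import torch

dim, alpha = 4, 1.0
out_features, in_features = 1280, 1280  # illustrative shapes, not the model's

W = torch.randn(out_features, in_features)   # frozen base weight
down = torch.randn(dim, in_features) * 0.01  # LoRA "down" (A), random init
up = torch.zeros(out_features, dim)          # LoRA "up" (B), zero init

W_merged = W + (alpha / dim) * (up @ down)
```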