will there be a pose reference lora?

#3
by jasoncow - opened

the depth control is a little bit strong, isn't it?

hey jasoncow, I've planned more LoRAs, including poses, but it takes some time.

Hello thedeoxen, I would like to ask some questions about this LoRA training.

  1. Did you train it on the aitoolkit platform?
  2. Is the training data left-right concatenated data pairs? For example, AB -> AC, where A refers to the original image, B refers to the reference depth map, and C refers to the target image.
  3. If it was trained using aitoolkit, is the inference effect in ComfyUI the same as the sample effect during training? Can they be aligned?
    Thank you.
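For what it's worth, a left-right concatenated pair like the AB -> AC layout described above is usually assembled by scaling the right-hand image to the left image's height and pasting both onto one canvas. Here is a minimal sketch of that layout math in pure Python (no image library; the function name is my own illustration, not anything from aitoolkit):

```python
def ab_concat_layout(a_size, b_size):
    """Compute canvas size and paste positions for a left-right
    concatenated training pair (A on the left, B or C on the right).

    The right-hand image is scaled to match A's height so the two
    halves line up on one canvas. Sizes are (width, height) tuples.
    Returns (canvas_size, a_paste_pos, b_paste_pos)."""
    (aw, ah), (bw, bh) = a_size, b_size
    scale = ah / bh                    # scale factor to match A's height
    bw_scaled = round(bw * scale)      # B's width after scaling
    canvas = (aw + bw_scaled, ah)      # side-by-side canvas
    a_pos = (0, 0)                     # A goes at the left edge
    b_pos = (aw, 0)                    # scaled B goes immediately right of A
    return canvas, a_pos, b_pos
```

The same layout would be used for both the control pair (A|B) and the target pair (A|C), so the model learns to keep the left half fixed and edit the right half.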

Hello thedeoxen,
In addition to the second question from cmxdg, could you please tell us what your dataset size is and provide more details about the training process?
Thank you!

Thank you for your interest! As I mentioned before, I’m currently preparing a few more LoRAs, so unfortunately I can’t share all the details until that’s done.
Once I release them, I also plan to provide a workflow for dataset preparation and possibly the dataset itself.
Regarding your questions: I trained using aitoolkit, and the inference results are similar to what you see in ComfyUI.
Sorry I can’t share more details right now, but I’ll make everything available later.

Thanks for your answer!
Do you have any timeline for releasing the rest?

Thank you, thedeoxen. Is the model you trained with aitoolkit based on kontext-dev fp16? When the trained model is loaded into ComfyUI, will there be any distortion in the proportions of the characters in the generated images? I am also training a LoRA that uses pose images as references to change poses. My data is like this:
control image is:
varle301921c925_1656621308446_2-0._QL90_UX1128_ (1)_varle301921c925_1656621306972_2-0._QL90_UX1128_.jpg
target image is :
varle301921c925_1656621308446_2-0._QL90_UX1128_ (1)_varle301921c925_1656621306972_2-0._QL90_UX1128_.jpg
The sample results during training looked very good; however, when I import the model into ComfyUI for inference, no matter how I set the image ratio, the proportions of the generated characters are abnormal: they look squashed, which is completely different from the samples during training. Have you encountered this problem, and how did you solve it?

Hey cmxdg, glad to see someone else working in this direction. I think aitoolkit and ComfyUI somehow handle images differently when the input and output have different sizes/aspect ratios, but I don't know exactly how. If I run into a similar problem and manage to resolve it, I'll let you know. For now I can only wish you luck :)
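One generic way to avoid squashed outputs when the inference canvas doesn't match the training aspect ratio is to letterbox the input: scale it to fit while preserving the aspect ratio, then pad the remainder instead of stretching. This is only a sketch of that general idea, not a confirmed fix for the aitoolkit/ComfyUI mismatch described above:

```python
def letterbox(src_w, src_h, dst_w, dst_h):
    """Fit a (src_w x src_h) image inside a (dst_w x dst_h) canvas
    without distortion.

    Returns the resized dimensions plus the left/top padding needed
    to center the resized image on the destination canvas."""
    scale = min(dst_w / src_w, dst_h / src_h)   # largest distortion-free scale
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_left = (dst_w - new_w) // 2             # horizontal centering
    pad_top = (dst_h - new_h) // 2              # vertical centering
    return new_w, new_h, pad_left, pad_top
```

For example, a 1024x768 input letterboxed into a 512x512 canvas is resized to 512x384 with 64 px of padding above and below, so the characters keep their original proportions.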

Thank you, and I look forward to your sharing of new models and datasets.

Hi cmxdg! I hope you don't mind me reaching out. I saw your comments and would love to chat with you about your pose LoRA. What's the best way to connect with you?
thanks!

MoSalama98 hi, about the timeline: I'm planning to release the other version next week.

You can contact me via my email [email protected]

Perhaps Qwen Image Edit is more worthy of in-depth exploration. The Qwen2.5-VL model seems to understand images better than CLIP-L. Most notably, it can understand the up-down and left-right relationships of combined reference images.

hey jasoncow. Yeah, I really like the results I get from Qwen Image Edit, but I haven't touched training for it yet. Planning to do that next.
BTW, I released a LoRA for poses; hope it will be helpful.
https://huggingface.co/thedeoxen/refcontrol-flux-kontext-reference-pose-lora
