Anima Control — Pose (Preview-2)
⚠️ Preview-2 — still experimental. Better than Preview-1, but not finished. It will still miss poses and produce deformed bodies, fused hands, and similar artifacts. Treat it as a work-in-progress preview, not a production tool. Non-commercial use only (inherits the Anima base model license). Behaviour and weights may still change.
A native pose control adapter for the Anima v1.0 image model: condition generation on a skeleton pose map so the subject follows a target pose. Preview-2 is the same idea as Preview-1, trained at higher resolutions on a larger corpus, and shipped with a friendlier ComfyUI node.
What changed since Preview-1
- Multi-resolution: 512, 768 and 1024. Preview-1 was 512-only; this runs at all three, and the bodies hold together much better at the larger sizes. 1024 looks best.
- Larger training corpus than Preview-1's ~3,900 examples, which shows up as cleaner anatomy and steadier poses.
- Better pose-following at 768 and 1024; at 512 it roughly matches Preview-1 (~0.59 body-PCK@0.1).
- New ComfyUI node, "Anima Pose Control." Drop in a reference photo; it detects the pose for you and renders the skeleton, and you can pick how the skeleton is drawn (thin, thick, puppet, heatmap, or with hands/face stripped). The plain thin skeleton is still the best default — it's the only one the model trained on — but a different render occasionally lands a stubborn pose when the seed alone won't.
Method
Unchanged from Preview-1: a channel-concat control-LoRA on the frozen Anima DiT.
Conditioning. The base VAE encodes the skeleton pose map into a control latent, the same latent space as the noisy image, so the control stays spatially aligned with the generation.
Fusion (ControlEmbedder + ControlInitialLayer). A zero-initialized ControlEmbedder produces
control tokens that are added to the frozen base patch-embed output. Zero-init means training
starts as an exact no-op (output == base), and the control contribution grows only as it reduces
the loss. Because the control tokens are purely additive, strength = 0 also gives exactly the base model.
Trainable parameters. The ControlEmbedder plus a rank-16 low-rank adapter on the transformer
blocks. The base transformer, text encoder, and VAE stay frozen.
skeleton ─▶ VAE ─▶ control latent ─┐
                                  ▼ (+ zero-init ControlEmbedder)
noisy latent ─▶ patch-embed ─▶ [ControlInitialLayer] ─▶ Block×N (+ rank-16 LoRA) ─▶ output
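The fusion step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the adapter's actual code: `ControlEmbedderSketch`, `fuse`, the per-token linear projection, the toy shapes, and the way `strength` scales the control tokens are all hypothetical stand-ins. Only the zero-initialization and the additive fusion come from the description above.

```python
# Illustrative sketch only: names, shapes, and the linear projection are assumptions.

class ControlEmbedderSketch:
    """Projects control-latent tokens to the patch-embed width; starts at zero."""

    def __init__(self, in_dim, out_dim):
        # Zero-initialized weight and bias: every control token maps to zeros at step 0.
        self.weight = [[0.0] * in_dim for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, control_tokens):
        # Apply y = W x + b to each token independently.
        return [
            [sum(w * x for w, x in zip(row, tok)) + b
             for row, b in zip(self.weight, self.bias)]
            for tok in control_tokens
        ]


def fuse(base_tokens, control_tokens, embedder, strength=1.0):
    """Add strength-scaled control tokens to the (frozen) patch-embed output."""
    ctrl = embedder(control_tokens)
    return [
        [b + strength * c for b, c in zip(base_tok, ctrl_tok)]
        for base_tok, ctrl_tok in zip(base_tokens, ctrl)
    ]


base = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]  # stand-in for patch-embed output
control = [[1.0, 2.0], [3.0, 4.0]]             # stand-in for control-latent tokens
embedder = ControlEmbedderSketch(in_dim=2, out_dim=3)

untrained = fuse(base, control, embedder)      # zero-init: identical to base

embedder.weight[0][0] = 0.1                    # pretend training moved one weight
trained_on = fuse(base, control, embedder, strength=1.0)   # now differs from base
trained_off = fuse(base, control, embedder, strength=0.0)  # base again
```

In this toy version, both the untrained embedder and `strength=0.0` leave the base tokens untouched, which is the no-op behaviour the zero-init design is meant to guarantee.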
Training
Data. (image, skeleton, caption) triples generated by Anima from a broad prompt distribution; skeletons rendered from each image's detected keypoints (DWPose, COCO-WholeBody, black background). Preview-2 uses a substantially larger corpus than Preview-1.
| Setting | Value |
|---|---|
| Resolution | 512 + 768 + 1024, aspect-ratio bucketed |
| Adapter rank | 16 |
| Learning rate | 1e-4 |
| Epochs | 8 |
| Control dropout | 0.1 |
| Precision | bf16 |
Final training loss ≈ 0.11 (denoising MSE, mean over the final 400 steps).
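The "Control dropout | 0.1" row can be read as: for roughly one training example in ten, the control signal is withheld so the model keeps learning the uncontrolled case. A minimal sketch, assuming dropped examples receive an all-zero control latent (the card does not say what replaces the control) and using a hypothetical helper `maybe_drop_control`:

```python
import random


def maybe_drop_control(control_latent, p_drop=0.1, rng=random):
    """Return the control latent, or zeros of the same length with probability p_drop.

    Replacing the control with zeros is an assumption made for illustration.
    """
    if rng.random() < p_drop:
        return [0.0] * len(control_latent)
    return list(control_latent)


rng = random.Random(0)          # seeded so the run is repeatable
latent = [0.3, -0.7, 1.2]       # toy control latent
trials = 10_000
dropped = sum(
    maybe_drop_control(latent, p_drop=0.1, rng=rng) == [0.0, 0.0, 0.0]
    for _ in range(trials)
)
drop_rate = dropped / trials    # close to 0.1 over many draws
```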
Results
Measured on held-out full-body poses (fresh generations, not seen in training). Pose agreement is body-PCK@0.1: re-detect keypoints on each output, compare to the target skeleton.
Each grid: the BASE column shows the reference and the no-control generation (same prompt, different seed — it ignores the pose); the remaining columns show the skeleton, drawn in each style, over the pose-controlled output. Control follows the pose; no-control doesn't.
| Resolution | body-PCK@0.1, control off | body-PCK@0.1, control on |
|---|---|---|
| 512 | ~0.33 | ~0.59 |
| 768 | ~0.38 | ~0.67 |
| 1024 | ~0.37 | ~0.83 |
Preview-1 reached ~0.59 at 512. Preview-2 matches that at 512 and pulls clearly ahead at 768 and 1024 — the gains grow with resolution.
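For readers unfamiliar with the metric, here is a rough sketch of how a PCK@0.1 score can be computed from two sets of keypoints. It is not the evaluation script used for the table: the function name `body_pck`, the choice of the target skeleton's longest bounding-box side as the normalizing scale, and the rule that undetected keypoints count as misses are all assumptions.

```python
import math


def body_pck(pred, target, alpha=0.1):
    """Fraction of target keypoints whose re-detected match lies within alpha * scale.

    pred, target: equal-length lists of (x, y) tuples, or None for a missing keypoint.
    scale: longest side of the target skeleton's bounding box (an assumption).
    """
    visible = [t for t in target if t is not None]
    if not visible:
        return 0.0
    xs = [x for x, _ in visible]
    ys = [y for _, y in visible]
    scale = max(max(xs) - min(xs), max(ys) - min(ys))
    threshold = alpha * scale

    hits = 0
    total = 0
    for p, t in zip(pred, target):
        if t is None:
            continue  # no target keypoint to compare against
        total += 1
        # A keypoint that was not re-detected (p is None) counts as a miss.
        if p is not None and math.dist(p, t) <= threshold:
            hits += 1
    return hits / total


target = [(0, 0), (100, 0), (0, 200), (100, 200)]
detected = [(5, 5), (100, 30), (0, 200), None]
score = body_pck(detected, target)  # threshold = 0.1 * 200 = 20 px; 2 of 4 within it
```

In this toy case two of the four keypoints land within 20 px of their targets, so the score is 0.5. A table value such as ~0.83 would mean that, on average, about 83% of body keypoints fall inside the tolerance.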
What's still off (honest):
- It doesn't always follow the skeleton, even a clean, correct one.
- A thin stick figure on black isn't how anime is drawn, so the model only half-reads it; the worst artifacts (fused hands, mush) cluster where the skeleton is busiest.
- Dynamic poses — running, jumping, sitting — are the least reliable.
- On short or vague prompts the style flattens toward a samey default; a richer prompt fixes it.
Usage (ComfyUI)
Preview-2 uses two small custom nodes: Anima Control Apply (AnimaControlApply) applies the
adapter, and Anima Pose Control (AnimaPoseControl) detects the pose from a photo and renders
the skeleton for you.
Install
- Download `anima_pose_preview2.safetensors` into `ComfyUI/models/loras/`.
- Copy both folders from `comfyui/` in this repo into `ComfyUI/custom_nodes/`: `anima_control_lora/` and `ComfyUI-anima-pose-control/`. Restart ComfyUI. ComfyUI-Manager installs the second node's `requirements.txt` automatically; otherwise: `pip install -r ComfyUI-anima-pose-control/requirements.txt` (rtmlib, opencv-python, onnxruntime, numpy; torch and Pillow come with ComfyUI).
- Load a workflow from the menu. `pose_control_demo.json` is the easiest (control vs no-control side by side). Also included: `pose_control.json` (simple), `pose_control_edit.json` (single node, pick the skeleton style), `pose_control_compare.json` (one pose across every style at once).
anima_pose_preview2.safetensors holds both the low-rank adapter (lora.* keys) and the control
embedder (control_embedder.* keys); LoraLoaderModelOnly reads the first set and Anima Control
Apply reads the second, both pointing at the same file.
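To make the two-readers-one-file arrangement concrete, the sketch below groups tensor names by the two prefixes mentioned above. The helper `split_adapter_keys` and the example key names are hypothetical; real key names would come from the checkpoint itself (for example by listing them with the `safetensors` library's `safe_open`), and the actual nodes may select their tensors differently.

```python
def split_adapter_keys(keys):
    """Group tensor names by the prefix each loader is described as reading."""
    groups = {"lora": [], "control_embedder": [], "other": []}
    for key in keys:
        if key.startswith("lora."):
            groups["lora"].append(key)               # low-rank adapter weights
        elif key.startswith("control_embedder."):
            groups["control_embedder"].append(key)   # control embedder weights
        else:
            groups["other"].append(key)
    return groups


# Hypothetical key names, for illustration only.
example_keys = [
    "lora.blocks.0.attn.q_proj.lora_A.weight",
    "lora.blocks.0.attn.q_proj.lora_B.weight",
    "control_embedder.proj.weight",
    "control_embedder.proj.bias",
]
groups = split_adapter_keys(example_keys)
```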
Strength. 0.0 is the base model with no control; 1.0 follows the skeleton (range 0–2). Higher
tracks the pose more closely but can cost some image quality.
Pose detector (first run)
The Anima Pose Control node needs a pose detector (rtmlib). On first use it downloads two ONNX
files (316 MB) from this repo's `detector/` folder and caches them under
`~/.cache/rtmlib/hub/checkpoints/`; after that it runs offline. If you see `urllib ... getaddrinfo failed`, your machine couldn't reach the download host. In that case, download `detector/yolox_m_8xb8-300e_humanart-c2c7a14a.onnx` and `detector/rtmw-dw-x-l_simcc-cocktail14_270e-256x192_20231122.onnx` from the Files tab by hand and
drop both into that cache folder (create it if missing), then restart ComfyUI.
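After a manual download, a quick check like the one below can confirm that both detector files are where the node expects them. It is a small convenience sketch, not part of the node: `missing_detector_files` is a hypothetical helper, the two file names are taken from the paragraph above, and the cache folder is passed in rather than hard-coded because its location may differ on your machine.

```python
from pathlib import Path

# File names as listed in the manual-download instructions above.
DETECTOR_FILES = (
    "yolox_m_8xb8-300e_humanart-c2c7a14a.onnx",
    "rtmw-dw-x-l_simcc-cocktail14_270e-256x192_20231122.onnx",
)


def missing_detector_files(cache_dir):
    """Return the detector file names that are not present in cache_dir."""
    cache = Path(cache_dir).expanduser()  # allows paths that start with "~"
    return [name for name in DETECTOR_FILES if not (cache / name).is_file()]
```

Call it with the cache folder named above; an empty list means both files are in place, and any names it returns still need to be copied in before restarting ComfyUI.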
Roadmap
Preview-3 targets the headline weakness: the model only half-reads the skeleton. The thin lines-on-black signal seems foreign to it, so a representation bake-off is testing other renders (thicker, puppet/segmentation, depth-mannequin) to find what Anima follows best, plus the smaller fixes (default strength, dropping noisy hand points, more varied captions). The other half is the detector: the one used here was built for photos, not anime, so on art it produces noisy skeletons. Preview-3 will most likely wait on a purpose-built anime pose detector ("DWPose for anime"), then retrain on the winning representation with a lot more dynamic-pose data.
Version 1.0 comes after Preview-3, if it lands clean — the first non-preview release, trained on a good deal more data again.
Pose is the first component on a shared control harness for Anima; planned siblings are image-prompt (IP-Adapter) and face-identity conditioning.
License
These weights are a derivative of the Anima base model
(circlestone-labs/Anima) and inherit its terms:
the CircleStone Labs Non-Commercial License, and — because Anima is itself a derivative of
Cosmos-Predict2 — the NVIDIA Open Model License.
The model weights are for non-commercial use only. Generated images (outputs) are not restricted
by these terms and may be used commercially. See the bundled LICENSE for the full text.
Support
Building these models means mining and labeling a lot of images and renting GPUs to train on them. If they're useful to you and you want to chip in, it's appreciated and never expected: https://ko-fi.com/claquasse
Citation
@misc{anima_control_pose_preview2,
title = {Anima Control --- Pose (Preview-2)},
author = {Claquasse},
year = {2026},
note = {Preview-2 multi-resolution pose control adapter for Anima v1.0},
howpublished = {\url{https://huggingface.co/Claquasse/Anima-Control-Pose}}
}
Built on Anima (CircleStone Labs), the Cosmos-Predict2 transformer architecture, and the diffusion-pipe training framework.