Anima Control — Pose (Preview-2)

⚠️ Preview-2 — still experimental. Better than Preview-1, but not finished. It will still miss poses and produce deformed bodies, fused hands, and similar artifacts. Treat it as a work-in-progress preview, not a production tool. Non-commercial use only (inherits the Anima base model license). Behaviour and weights may still change.

A native pose control adapter for the Anima v1.0 image model: condition generation on a skeleton pose map so the subject follows a target pose. Preview-2 is the same idea as Preview-1, trained at higher resolutions on a larger corpus, and shipped with a friendlier ComfyUI node.

What changed since Preview-1

Multi-resolution: 512, 768 and 1024. Preview-1 was 512-only; this runs at all three, and the bodies hold together much better at the larger sizes. 1024 looks best.
Larger training corpus than Preview-1's ~3,900 examples, which shows up as cleaner anatomy and steadier poses.
Pose-following on par with Preview-1 at 512 (~0.59 body-PCK@0.1) and stronger at 768 and 1024, sizes Preview-1 could not run at.
New ComfyUI node, "Anima Pose Control." Drop in a reference photo; it detects the pose for you and renders the skeleton, and you can pick how the skeleton is drawn (thin, thick, puppet, heatmap, or with hands/face stripped). The plain thin skeleton is still the best default — it's the only one the model trained on — but a different render occasionally lands a stubborn pose when the seed alone won't.

Method

Unchanged from Preview-1: a channel-concat control-LoRA on the frozen Anima DiT.

Conditioning. The base VAE encodes the skeleton pose map into a control latent in the same latent space as the noisy image, so the control stays spatially aligned with the generation.

Fusion (ControlEmbedder + ControlInitialLayer). A zero-initialized ControlEmbedder produces control tokens that are added to the frozen base patch-embed output. Zero-init means training starts as an exact no-op (output == base), and the control contribution grows only as far as it lowers the loss. At strength = 0 the adapter is exactly the base model.
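The zero-init behaviour can be illustrated with a small dependency-free sketch. The class name ToyControlEmbedder, the single linear projection, and the way strength scales the control tokens are all assumptions made for illustration; the real ControlEmbedder and ControlInitialLayer may be structured differently.

```python
import random


class ToyControlEmbedder:
    """Hypothetical stand-in for a zero-initialized control projection."""

    def __init__(self, d_control, d_model):
        # Zero-initialized weights and bias: the projection outputs zeros
        # until training moves the parameters away from zero.
        self.weight = [[0.0] * d_control for _ in range(d_model)]
        self.bias = [0.0] * d_model

    def __call__(self, control_token):
        return [
            sum(w * c for w, c in zip(row, control_token)) + b
            for row, b in zip(self.weight, self.bias)
        ]


def fuse(patch_tokens, control_tokens, embedder, strength=1.0):
    """Add strength-scaled control tokens to the base patch-embed output."""
    fused = []
    for patch, control in zip(patch_tokens, control_tokens):
        delta = embedder(control)
        fused.append([p + strength * d for p, d in zip(patch, delta)])
    return fused


rng = random.Random(0)
patch_tokens = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
control_tokens = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(3)]

embedder = ToyControlEmbedder(d_control=2, d_model=4)
untrained = fuse(patch_tokens, control_tokens, embedder, strength=1.0)

# Pretend training has moved one weight away from zero.
embedder.weight[0][0] = 0.5
trained_on = fuse(patch_tokens, control_tokens, embedder, strength=1.0)
trained_off = fuse(patch_tokens, control_tokens, embedder, strength=0.0)
```

In this toy setup, untrained and trained_off both equal patch_tokens, while trained_on differs: the control path contributes nothing until its weights are non-zero and the strength is above zero.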

Trainable parameters. The ControlEmbedder plus a rank-16 low-rank adapter on the transformer blocks. The base transformer, text encoder, and VAE stay frozen.

skeleton ─▶ VAE ─▶ control latent ─┐
                                   ▼  (+ zero-init ControlEmbedder)
noisy latent ─▶ patch-embed ─▶ [ControlInitialLayer] ─▶ Block×N (+ rank-16 LoRA) ─▶ output
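To give a feel for how small a rank-16 adapter is next to the frozen weights, the arithmetic for a single adapted linear layer can be sketched as follows. The hidden size of 2048 is an illustrative assumption, not a documented Anima value, and the source does not say which layers inside each block carry the adapter.

```python
def lora_param_count(d_in, d_out, rank=16):
    """Trainable parameters for one LoRA pair: A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank


def full_param_count(d_in, d_out):
    """Parameters in the frozen dense weight the adapter sits beside."""
    return d_in * d_out


d_model = 2048  # assumed hidden size, for illustration only
lora = lora_param_count(d_model, d_model)
full = full_param_count(d_model, d_model)
print(f"LoRA: {lora:,} params, frozen weight: {full:,} params, ratio {lora / full:.4f}")
```

Under that assumption, the adapter adds about 1.6% of the parameters of the square layer it modifies, which is why the base transformer can stay frozen while the adapter trains.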

Training

Data. (image, skeleton, caption) triples generated by Anima from a broad prompt distribution; skeletons rendered from each image's detected keypoints (DWPose, COCO-WholeBody, black background). Preview-2 uses a substantially larger corpus than Preview-1.

Setting	Value
Resolution	512 + 768 + 1024, aspect-ratio bucketed
Adapter rank	16
Learning rate	1e-4
Epochs	8
Control dropout	0.1
Precision	bf16

Final training loss ≈ 0.11 (denoising MSE, mean over the final 400 steps).
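The table does not spell out how control dropout is applied. A common reading, assumed here, is that each training sample has its control latent replaced by zeros with probability 0.1, so the model also keeps seeing unconditioned examples. The function name and the flat-list latent below are hypothetical.

```python
import random


def apply_control_dropout(control_latent, p_drop, rng):
    """Return the control latent, or zeros of the same length with probability p_drop."""
    if rng.random() < p_drop:
        return [0.0] * len(control_latent)
    return list(control_latent)


rng = random.Random(0)
latent = [0.3, -1.2, 0.8]
n_steps = 10_000
dropped = sum(
    apply_control_dropout(latent, p_drop=0.1, rng=rng) == [0.0, 0.0, 0.0]
    for _ in range(n_steps)
)
drop_rate = dropped / n_steps  # should land close to 0.1
```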

Results

Measured on held-out full-body poses (fresh generations, not seen in training). Pose agreement is body-PCK@0.1: re-detect keypoints on each output, compare to the target skeleton.
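As a rough guide to what the metric measures, PCK@0.1 counts a keypoint as correct when it falls within 0.1 of a reference length from its target. The sketch below normalizes by the diagonal of the target keypoints' bounding box; that choice, the function name, and the sample coordinates are assumptions, since the exact normalizer and keypoint subset used for these results are not stated.

```python
import math


def body_pck(pred, target, alpha=0.1):
    """Fraction of keypoints within alpha * scale of their targets.

    pred and target are equal-length lists of (x, y) pixel coordinates.
    The scale is the diagonal of the target keypoints' bounding box,
    which is an assumption; the evaluation may normalize differently.
    """
    xs = [x for x, _ in target]
    ys = [y for _, y in target]
    scale = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    threshold = alpha * scale
    hits = sum(
        math.hypot(px - tx, py - ty) <= threshold
        for (px, py), (tx, ty) in zip(pred, target)
    )
    return hits / len(target)


# Made-up keypoints: three land near their targets, two are far off.
target = [(100, 50), (100, 150), (60, 150), (140, 150), (100, 350)]
pred = [(102, 52), (101, 149), (60, 200), (139, 152), (180, 350)]
score = body_pck(pred, target)
```

Here three of the five predicted points fall inside the threshold, so the score is 0.6.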

Each grid: the BASE column shows the reference and the no-control generation (same prompt, different seed — it ignores the pose); the remaining columns show the skeleton, drawn in each style, over the pose-controlled output. Control follows the pose; no-control doesn't.

Resolution	body-PCK@0.1 (control off)	body-PCK@0.1 (control on)
512	~0.33	~0.59
768	~0.38	~0.67
1024	~0.37	~0.83

Preview-1 reached ~0.59 at 512. Preview-2 matches that at 512 and scores higher at 768 (~0.67) and 1024 (~0.83), sizes Preview-1 did not support, so pose agreement improves with resolution.

What's still off (honest):

It doesn't always follow the skeleton, even a clean, correct one.
A thin stick figure on black isn't how anime is drawn, so the model only half-reads it; the worst artifacts (fused hands, mush) cluster where the skeleton is busiest.
Dynamic poses — running, jumping, sitting — are the least reliable.
On short or vague prompts the style flattens toward a samey default; a richer prompt fixes it.

Usage (ComfyUI)

Preview-2 uses two small custom nodes: Anima Control Apply (AnimaControlApply) applies the adapter, and Anima Pose Control (AnimaPoseControl) detects the pose from a photo and renders the skeleton for you.

Install

Download anima_pose_preview2.safetensors into ComfyUI/models/loras/.
Copy both folders from comfyui/ in this repo into ComfyUI/custom_nodes/: anima_control_lora/ and ComfyUI-anima-pose-control/. Restart ComfyUI. ComfyUI-Manager installs the second node's requirements.txt automatically; otherwise: pip install -r ComfyUI-anima-pose-control/requirements.txt (rtmlib, opencv-python, onnxruntime, numpy; torch and Pillow come with ComfyUI).
Load a workflow from the menu. pose_control_demo.json is the easiest — control vs no-control side by side. Also included: pose_control.json (simple), pose_control_edit.json (single node, pick the skeleton style), pose_control_compare.json (one pose across every style at once).

anima_pose_preview2.safetensors holds both the low-rank adapter (lora.* keys) and the control embedder (control_embedder.* keys); LoraLoaderModelOnly reads the first set and Anima Control Apply reads the second, both pointing at the same file.
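If you want to confirm that a downloaded file contains both groups of tensors, a short check along these lines can help. It is a sketch: the helper names and the example key names are made up, and only the lora. and control_embedder. prefixes come from the description above. Reading the file uses safe_open from the safetensors package, with framework="pt" relying on the torch install that ComfyUI already has.

```python
def split_adapter_keys(keys):
    """Group tensor names by the two prefixes described above."""
    groups = {"lora": [], "control_embedder": [], "other": []}
    for key in keys:
        if key.startswith("lora."):
            groups["lora"].append(key)
        elif key.startswith("control_embedder."):
            groups["control_embedder"].append(key)
        else:
            groups["other"].append(key)
    return groups


def read_key_names(path):
    """List tensor names in a .safetensors file without loading the weights."""
    from safetensors import safe_open  # requires the safetensors package

    with safe_open(path, framework="pt") as f:
        return list(f.keys())


# Made-up key names, for illustration only; real names will differ.
example_keys = [
    "lora.blocks.0.attn.q.lora_A.weight",
    "lora.blocks.0.attn.q.lora_B.weight",
    "control_embedder.proj.weight",
]
groups = split_adapter_keys(example_keys)
```

On a real file, split_adapter_keys(read_key_names("ComfyUI/models/loras/anima_pose_preview2.safetensors")) should show non-empty lora and control_embedder groups if the download is complete.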

Strength. 0.0 is the base model with no control; 1.0 follows the skeleton (range 0–2). Higher values track the pose more closely but can cost some image quality.

Pose detector (first run)

The Anima Pose Control node needs a pose detector (rtmlib). On first use it downloads two ONNX files (~316 MB) from this repo's detector/ folder and caches them under ~/.cache/rtmlib/hub/checkpoints/; after that it runs offline. If you see urllib ... getaddrinfo failed, your machine couldn't reach the download host. In that case, download detector/yolox_m_8xb8-300e_humanart-c2c7a14a.onnx and detector/rtmw-dw-x-l_simcc-cocktail14_270e-256x192_20231122.onnx from the Files tab by hand and drop both into that cache folder (create it if missing), then restart ComfyUI.
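After a manual download, a quick check that both detector files sit in the cache folder can save a restart cycle. This is a minimal sketch using only the standard library; the helper name is hypothetical, and it assumes the cached files keep the same names as the downloads, as the instructions above imply.

```python
from pathlib import Path

DETECTOR_FILES = [
    "yolox_m_8xb8-300e_humanart-c2c7a14a.onnx",
    "rtmw-dw-x-l_simcc-cocktail14_270e-256x192_20231122.onnx",
]


def missing_detector_files(cache_dir):
    """Return the detector files that are not present in cache_dir."""
    cache_dir = Path(cache_dir).expanduser()
    return [name for name in DETECTOR_FILES if not (cache_dir / name).is_file()]
```

Calling missing_detector_files("~/.cache/rtmlib/hub/checkpoints") returns an empty list once both files are in place; any names it returns still need to be copied in.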

Roadmap

Preview-3 targets the headline weakness: the model only half-reads the skeleton. The thin lines-on-black signal seems foreign to it, so a representation bake-off is testing other renders (thicker, puppet/segmentation, depth-mannequin) to find what Anima follows best, plus the smaller fixes (default strength, dropping noisy hand points, more varied captions). The other half is the detector: the one used here was built for photos, not anime, so on art it produces noisy skeletons. Preview-3 will most likely wait on a purpose-built anime pose detector ("DWPose for anime"), then retrain on the winning representation with a lot more dynamic-pose data.

Version 1.0 comes after Preview-3, if it lands clean — the first non-preview release, trained on a good deal more data again.

Pose is the first component on a shared control harness for Anima; planned siblings are image-prompt (IP-Adapter) and face-identity conditioning.

License

These weights are a derivative of the Anima base model (circlestone-labs/Anima) and inherit its terms: the CircleStone Labs Non-Commercial License, and — because Anima is itself a derivative of Cosmos-Predict2 — the NVIDIA Open Model License.

The model weights are for non-commercial use only. Generated images (outputs) are not restricted by these terms and may be used commercially. See the bundled LICENSE for the full text.

Support

Building these models means mining and labeling a lot of images and renting GPUs to train on them. If they're useful to you and you want to chip in, it's appreciated and never expected: https://ko-fi.com/claquasse

Citation

@misc{anima_control_pose_preview2,
  title  = {Anima Control --- Pose (Preview-2)},
  author = {Claquasse},
  year   = {2026},
  note   = {Preview-2 multi-resolution pose control adapter for Anima v1.0},
  howpublished = {\url{https://huggingface.co/Claquasse/Anima-Control-Pose}}
}

Built on Anima (CircleStone Labs), the Cosmos-Predict2 transformer architecture, and the diffusion-pipe training framework.
