Video-to-Video

SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing

🤗 Available Models

| Model | Status | Link |
| --- | --- | --- |
| SAMA-5B | Coming soon | Coming soon |
| SAMA-14B | Available | syxbb/SAMA-14B |

🚀 Quick Start

This repository contains the weights of SAMA-14B. For detailed usage instructions, please refer to the official GitHub repository.

Installation

Recommended environment:

  • Linux
  • NVIDIA GPU
  • CUDA 12.1 or a compatible environment
  • Python 3.10

```bash
git clone https://github.com/Cynthiazxy123/SAMA
cd SAMA

conda create -n sama python=3.10 -y
conda activate sama

pip install --upgrade pip
pip install -r requirements.txt
```

Inference

Prepare:

  1. The base Wan2.1-T2V-14B model directory.
  2. A SAMA checkpoint from Hugging Face.
  3. A source video and an edit instruction.

The inference script is `infer_sh/run_sama.sh`.

Edit the variables at the top of that script before running:

  • `MODEL_ROOT`
  • `STATE_DICT`
  • `SRC_VIDEO`
  • `PROMPT`
  • `OUTPUT_DIR`
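
The exact values depend on your setup; a hedged sketch of what the edited variables might look like (every path and the checkpoint filename below are illustrative placeholders, not values from the repository):

```bash
# Illustrative values only — substitute your own paths and instruction.
MODEL_ROOT="models/Wan2.1-T2V-14B"                 # base Wan2.1-T2V-14B directory
STATE_DICT="models/SAMA-14B/sama_14b.safetensors"  # hypothetical checkpoint filename
SRC_VIDEO="examples/input.mp4"                     # source video to edit
PROMPT="Turn the cat into a tiger"                 # edit instruction
OUTPUT_DIR="outputs"                               # destination directory
```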

Then run:

```bash
bash infer_sh/run_sama.sh
```

The generated result will be saved to:

```
outputs/seed_1/<input_video_filename>
```
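
That is, the output path is a fixed seed subdirectory plus the source video's filename; a minimal sketch of this convention (the input path is an illustrative assumption):

```bash
# Reconstruct the output path from the source video name,
# mirroring the outputs/seed_1/<input_video_filename> convention above.
SRC_VIDEO="examples/input.mp4"   # illustrative input path
OUT_PATH="outputs/seed_1/$(basename "$SRC_VIDEO")"
echo "$OUT_PATH"                 # prints outputs/seed_1/input.mp4
```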

A recommended local model layout is:

```
models/
├── Wan2.1-T2V-14B/
│   ├── diffusion_pytorch_model-00001-of-00006.safetensors
│   ├── diffusion_pytorch_model-00002-of-00006.safetensors
│   ├── diffusion_pytorch_model-00003-of-00006.safetensors
│   ├── diffusion_pytorch_model-00004-of-00006.safetensors
│   ├── diffusion_pytorch_model-00005-of-00006.safetensors
│   ├── diffusion_pytorch_model-00006-of-00006.safetensors
│   ├── models_t5_umt5-xxl-enc-bf16.pth
│   ├── Wan2.1_VAE.pth
│   └── google/
└── SAMA-14B/
    └── <downloaded_checkpoint>.safetensors
```
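
Before launching inference, it can help to confirm the base model directory matches this layout. A minimal sanity check, assuming the filenames shown above (this is an illustrative sketch, not the repository's own validation code):

```bash
# Verify that a Wan2.1-T2V-14B directory contains the files the layout
# above expects; print what is missing and return non-zero if incomplete.
check_layout() {
  local root="$1" missing=0
  for f in "$root/models_t5_umt5-xxl-enc-bf16.pth" \
           "$root/Wan2.1_VAE.pth" \
           "$root/google"; do
    [ -e "$f" ] || { echo "missing: $f" >&2; missing=1; }
  done
  # At least one diffusion shard should be present.
  ls "$root"/diffusion_pytorch_model-*-of-00006.safetensors >/dev/null 2>&1 \
    || { echo "missing: diffusion model shards" >&2; missing=1; }
  return "$missing"
}

# Usage: check_layout models/Wan2.1-T2V-14B && echo "layout OK"
```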

Notes

  • Input frames are automatically padded to satisfy the 4k+1 frame-count requirement of Wan video inference.
  • The output video uses the source video's FPS when available; otherwise it falls back to `--fps`.
  • If `--model-root` is incomplete, the script stops and reports the missing files or directories.
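
For reference, the 4k+1 padding in the first note amounts to rounding the frame count up to the next value congruent to 1 mod 4 (1, 5, 9, …, 81, …). A standalone sketch of that arithmetic, not the repository's actual padding code:

```bash
# Round a frame count up to the next value of the form 4k+1
# (e.g. 16 -> 17, 81 stays 81, since 81 = 4*20 + 1).
pad_to_4k_plus_1() {
  local n="$1"
  local r=$(( (n - 1) % 4 ))
  if [ "$r" -eq 0 ]; then
    echo "$n"
  else
    echo $(( n + 4 - r ))
  fi
}
```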

📚 Citation

```bibtex
@misc{zhang2026samafactorizedsemanticanchoring,
      title={SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing},
      author={Xinyao Zhang and Wenkai Dong and Yuxin Song and Bo Fang and Qi Zhang and Jing Wang and Fan Chen and Hui Zhang and Haocheng Feng and Yu Lu and Hang Zhou and Chun Yuan and Jingdong Wang},
      year={2026},
      eprint={2603.19228},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.19228},
}
```