Video-to-Video

SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing

🤗 Available Models

| Model | Status | Link |
| --- | --- | --- |
| SAMA-5B | Coming soon | Coming soon |
| SAMA-14B | Available | syxbb/SAMA-14B |

🚀 Quick Start

This repository contains the weights of SAMA-14B. For detailed usage instructions, please refer to the official GitHub repository.

Installation

Recommended environment:

  • Linux
  • NVIDIA GPU
  • CUDA 12.1 or a compatible environment
  • Python 3.10

```bash
git clone https://github.com/Cynthiazxy123/SAMA
cd SAMA

conda create -n sama python=3.10 -y
conda activate sama

pip install --upgrade pip
pip install -r requirements.txt
```

Inference

Prepare:

  1. The base Wan2.1-T2V-14B model directory.
  2. A SAMA checkpoint from Hugging Face.
  3. A source video and an edit instruction.

The inference script is `infer_sh/run_sama.sh`.

Edit the variables at the top of that script before running:

  • `MODEL_ROOT`
  • `STATE_DICT`
  • `SRC_VIDEO`
  • `PROMPT`
  • `OUTPUT_DIR`
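
The exact values depend on your setup; a hedged sketch of what the edited variables might look like (every path and the checkpoint filename below are illustrative placeholders, not values from the repository):

```bash
# Illustrative values only — substitute your own paths and instruction.
MODEL_ROOT="models/Wan2.1-T2V-14B"                 # base Wan2.1-T2V-14B directory
STATE_DICT="models/SAMA-14B/sama_14b.safetensors"  # hypothetical checkpoint filename
SRC_VIDEO="examples/input.mp4"                     # source video to edit
PROMPT="Turn the cat into a tiger"                 # edit instruction
OUTPUT_DIR="outputs"                               # destination directory
```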

Then run:

```bash
bash infer_sh/run_sama.sh
```

The generated result will be saved to:

```
outputs/seed_1/<input_video_filename>
```
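
That is, the output path is a fixed seed subdirectory plus the source video's filename; a minimal sketch of this convention (the input path is an illustrative assumption):

```bash
# Reconstruct the output path from the source video name,
# mirroring the outputs/seed_1/<input_video_filename> convention above.
SRC_VIDEO="examples/input.mp4"   # illustrative input path
OUT_PATH="outputs/seed_1/$(basename "$SRC_VIDEO")"
echo "$OUT_PATH"                 # prints outputs/seed_1/input.mp4
```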

A recommended local model layout is:

```
models/
├── Wan2.1-T2V-14B/
│   ├── diffusion_pytorch_model-00001-of-00006.safetensors
│   ├── diffusion_pytorch_model-00002-of-00006.safetensors
│   ├── diffusion_pytorch_model-00003-of-00006.safetensors
│   ├── diffusion_pytorch_model-00004-of-00006.safetensors
│   ├── diffusion_pytorch_model-00005-of-00006.safetensors
│   ├── diffusion_pytorch_model-00006-of-00006.safetensors
│   ├── models_t5_umt5-xxl-enc-bf16.pth
│   ├── Wan2.1_VAE.pth
│   └── google/
└── SAMA-14B/
    └── <downloaded_checkpoint>.safetensors
```
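
Before launching inference, it can help to confirm the base model directory matches this layout. A minimal sanity check, assuming the filenames shown above (this is an illustrative sketch, not the repository's own validation code):

```bash
# Verify that a Wan2.1-T2V-14B directory contains the files the layout
# above expects; print what is missing and return non-zero if incomplete.
check_layout() {
  local root="$1" missing=0
  for f in "$root/models_t5_umt5-xxl-enc-bf16.pth" \
           "$root/Wan2.1_VAE.pth" \
           "$root/google"; do
    [ -e "$f" ] || { echo "missing: $f" >&2; missing=1; }
  done
  # At least one diffusion shard should be present.
  ls "$root"/diffusion_pytorch_model-*-of-00006.safetensors >/dev/null 2>&1 \
    || { echo "missing: diffusion model shards" >&2; missing=1; }
  return "$missing"
}

# Usage: check_layout models/Wan2.1-T2V-14B && echo "layout OK"
```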

Notes

  • Input frames are automatically padded to satisfy the 4k+1 frame-count requirement of Wan video inference.
  • The output video uses the source video's FPS when available; otherwise it falls back to `--fps`.
  • If `--model-root` is incomplete, the script stops and reports the missing files or directories.
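
For reference, the 4k+1 padding in the first note amounts to rounding the frame count up to the next value congruent to 1 mod 4 (1, 5, 9, …, 81, …). A standalone sketch of that arithmetic, not the repository's actual padding code:

```bash
# Round a frame count up to the next value of the form 4k+1
# (e.g. 16 -> 17, 81 stays 81, since 81 = 4*20 + 1).
pad_to_4k_plus_1() {
  local n="$1"
  local r=$(( (n - 1) % 4 ))
  if [ "$r" -eq 0 ]; then
    echo "$n"
  else
    echo $(( n + 4 - r ))
  fi
}
```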

📚 Citation

```bibtex
@misc{zhang2026samafactorizedsemanticanchoring,
      title={SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing},
      author={Xinyao Zhang and Wenkai Dong and Yuxin Song and Bo Fang and Qi Zhang and Jing Wang and Fan Chen and Hui Zhang and Haocheng Feng and Yu Lu and Hang Zhou and Chun Yuan and Jingdong Wang},
      year={2026},
      eprint={2603.19228},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.19228},
}
```