Unifying Visual Understanding and Generation via Text-Aligned Representations

Jiaming Han, Hao Chen, Yang Zhao, Hanyu Wang, Qi Zhao, Ziyan Yang, Hao He, Xiangyu Yue, Lu Jiang

Project Lead   Corresponding Authors

Project Page Tar Paper on arXiv Huggingface Model Huggingface Space Huggingface Space

Citation

@article{han2025tar,
  title={Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations}, 
  author={Han, Jiaming and Chen, Hao and Zhao, Yang and Wang, Hanyu and Zhao, Qi and Yang, Ziyan and He, Hao and Yue, Xiangyu and Jiang, Lu},
  journal={arXiv preprint arXiv:2506.18898},
  year={2025},
}

License

This project is licensed under the Apache 2.0 License.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ByteDance-Seed/Tar-TA-Tok

Finetuned
(14)
this model

Spaces using ByteDance-Seed/Tar-TA-Tok 3

Collection including ByteDance-Seed/Tar-TA-Tok