tttoaster committed
Commit bbc9a99 · verified · 1 Parent(s): 15e4fb8

Update README.md

Files changed (1): README.md (+7, -9)
README.md CHANGED
@@ -1,7 +1,6 @@
 # ARC-Hunyuan-Video-7B
 
-<!-- [![arXiv](https://img.shields.io/badge/arXiv-2404.14396-b31b1b.svg)](https://arxiv.org/abs/2404.14396)-->
-
+[![arXiv](https://img.shields.io/badge/arXiv-2507.20939-b31b1b.svg)](https://arxiv.org/abs/2507.20939)
 [![Demo](https://img.shields.io/badge/ARC-Demo-blue)](https://arc.tencent.com/en/ai-demos/multimodal)
 [![Code](https://img.shields.io/badge/Github-Code-orange)](https://github.com/TencentARC/ARC-Hunyuan-Video-7B)
 [![Static Badge](https://img.shields.io/badge/Model-Huggingface-yellow)](https://huggingface.co/TencentARC/ARC-Hunyuan-Video-7B)
@@ -126,18 +125,17 @@ Due to video file size limitations imposed by the deployment API, we compressed
 
 We observe that incorporating generic video datasets during training may inadvertently compromise the model's capacity for real-world video understanding, potentially due to domain shift or noise introduced by non-real-world samples. To address this limitation, we plan to develop a dedicated model trained exclusively on rigorously curated real-world video data.
 
-<!-- ## Citation
+## Citation
 
 If you find the work helpful, please consider citing:
 
 ```bash
-@article{ge2024seed,
-title={SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation},
-author={Ge, Yuying and Zhao, Sijie and Zhu, Jinguo and Ge, Yixiao and Yi, Kun and Song, Lin and Li, Chen and Ding, Xiaohan and Shan, Ying},
-journal={arXiv preprint arXiv:2404.14396},
-year={2024}
+@article{ge2025seed,
+title={ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts},
+author={Ge, Yuying and Ge, Yixiao and Li, Chen and Wang, Teng and Pu, Junfu and Li, Yizhuo and Qiu, Lu and Ma, Jin and Duan, Lisheng and Zuo, Xinyu and Luo, Jinwen and Gu, Weibo and Li, Zexuan and Zhang, Xiaojing and Tao, Yangyu and Hu, Han and Wang, Di and Shan Ying},
+journal={arXiv preprint arXiv:2507.20939},
+year={2025}
 }
 ```
--->
 
 