tttoaster committed on
Commit
40230fd
·
verified ·
1 Parent(s): 5945b45

Update README.md

  1. README.md +127 -3
README.md CHANGED
@@ -63,10 +63,134 @@ Specifically, ARC-Hunyuan-Video-7B is built on top of the Hunyuan-7B vision-lang
63
  <img src="https://github.com/TencentARC/ARC-Hunyuan-Video-7B/blob/master/figures/method.jpg?raw=true" width="95%"/>
64
  <p>
65
 
66
- ## News
67
 
68
- - 2025.07.25: We release the [model checkpoint](https://huggingface.co/TencentARC/ARC-Hunyuan-Video-7B) and inference code of ARC-Hunyuan-Video-7B, including a [vLLM](https://github.com/vllm-project/vllm) version.
69
- - 2025.07.25: We release the [API service](https://arc.tencent.com/zh/document/ARC-Hunyuan-Video-7B) of ARC-Hunyuan-Video-7B, which is supported by [vLLM](https://github.com/vllm-project/vllm). We release two versions: one is V0, which only supports video description and summarization in Chinese; the other is the version consistent with the model checkpoint and the one described in the paper.
70
 
71
  ## Usage
72
  ### Dependencies
 
63
  <img src="https://github.com/TencentARC/ARC-Hunyuan-Video-7B/blob/master/figures/method.jpg?raw=true" width="95%"/>
64
  <p>
65
 
66
+ ## ARC-Qwen-Video-7B
67
+ In this version, we switched the base model from the Hunyuan VLM to [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) and introduce [ARC-Qwen-Video-7B](https://huggingface.co/TencentARC/ARC-Qwen-Video-7B). The training data and training stages are unchanged. Please refer to the `arc-qwen-video` branch for details.
68
+
69
+ We are also introducing a new model, [ARC-Qwen-Video-7B-Narrator](https://huggingface.co/TencentARC/ARC-Qwen-Video-7B-Narrator). It can output **timestamped video descriptions, speaker identities, and the specific ASR (Automatic Speech Recognition) content**. By processing its output with an external LLM, you can obtain more comprehensive structured information, as in the following example (click the thumbnail to watch the video):
70
+
71
+ [<img src="https://img.youtube.com/vi/Bz1T4wCuWc8/maxresdefault.jpg" alt="Video" width="300">](https://www.youtube.com/watch?v=Bz1T4wCuWc8)
72
+
73
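+ > ### Video Overview
74
+ >
75
+ > This is a short comedy sketch about a husband whose secret stash of money, hidden in a padded coat, is accidentally discovered by his wife, who mistakes it for a “surprise” gift he prepared for her. Through a single phone call between the couple, the video vividly shows the husband going from relaxed contentment, to shock and disbelief, to helpless despair, with plenty of dramatic reversals and humor.
76
+ >
77
+ > ### Plot Breakdown
78
+ >
79
+ > The plot unfolds around a single phone call. The detailed timeline, scenes, speakers, and dialogue are as follows: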
80
+ >
81
+ > <table>
82
+ > <thead>
83
+ > <tr>
84
+ > <th>Timestamp</th>
85
+ > <th>Scene Description</th>
86
+ > <th>Speaker</th>
87
+ > <th>Dialogue (ASR)</th>
88
+ > </tr>
89
+ > </thead>
90
+ > <tbody>
91
+ > <tr>
92
+ > <td>0:00 - 0:05</td>
93
+ > <td>The husband, wearing a shower cap and wrapped in a bath towel, leisurely takes a selfie beside an indoor pool.</td>
94
+ > <td>None</td>
95
+ > <td>(No dialogue)</td>
96
+ > </tr>
97
+ > <tr>
98
+ > <td>0:05 - 0:10</td>
99
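+ > <td><b>Cut</b>: The wife, in a clothing store, calls her husband with a look of pure happiness.</td>
100
+ > <td>Wife</td>
101
+ > <td>“Hey, honey, honey, I love you, I love you, I love you to death, mwah mwah mwah.”</td>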
102
+ > </tr>
103
+ > <tr>
104
+ > <td rowspan="2" style="vertical-align: top;">0:10 - 0:18</td>
105
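+ > <td rowspan="2" style="vertical-align: top;">The husband answers the phone, curious about his wife's enthusiasm, and the wife excitedly reveals the “surprise”.</td>
105
+ > <td>Husband</td>
106
+ > <td>“Hey, what's going on with you? Why so happy?”</td>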
108
+ > </tr>
109
+ > <tr>
110
+ > <td>Wife</td>
110
+ > <td>“Today I found the surprise you left for me in my padded coat pocket: ten thousand yuan!”</td>
112
+ > </tr>
113
+ > <tr>
114
+ > <td>0:18 - 0:27</td>
115
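+ > <td>On hearing “ten thousand yuan”, the husband's expression freezes instantly, shifting from confusion to shock and regret, though he still forces himself to appear calm.</td>
115
+ > <td>Husband</td>
116
+ > <td>“Huh? Great, as long as you, you, you, you're happy, that's all that matters.”</td>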
118
+ > </tr>
119
+ > <tr>
120
+ > <td>0:27 - 0:34</td>
121
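+ > <td>The wife happily explains what she did with the money; the husband's expression goes completely rigid as his shock deepens.</td>
121
+ > <td>Wife</td>
122
+ > <td>“Of course I'm happy! I used it to buy a new outfit. I'll wear it for you when I get home tonight.”</td>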
124
+ > </tr>
125
+ > <tr>
126
+ > <td rowspan="3" style="vertical-align: top;">0:34 - 0:46</td>
127
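+ > <td rowspan="3" style="vertical-align: top;">The husband confirms that the money has been spent and breaks down. The wife believes he gave his approval, and the husband cannot help cursing.</td>
127
+ > <td>Husband</td>
128
+ > <td>“You've already spent it on clothes?”</td>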
130
+ > </tr>
131
+ > <tr>
132
+ > <td>Wife</td>
132
+ > <td>“Of course! Didn't you say so yourself? You told me to buy something I like. Honey, you're just the best.”</td>
134
+ > </tr>
135
+ > <tr>
136
+ > <td>Husband</td>
136
+ > <td>“You really are a spendthrift of a wife.”</td>
138
+ > </tr>
139
+ > <tr>
140
+ > <td rowspan="4" style="vertical-align: top;">0:46 - 0:59</td>
141
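+ > <td rowspan="4" style="vertical-align: top;">The wife senses something off in the husband's tone; the husband immediately changes his words to cover it up and urges the wife to come home early.</td>
141
+ > <td>Wife</td>
142
+ > <td>“What, honey? What did you say?”</td>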
144
+ > </tr>
145
+ > <tr>
146
+ > <td>Husband</td>
146
+ > <td>“Huh? I said great, when you look pretty, I'm happy.”</td>
148
+ > </tr>
149
+ > <tr>
150
+ > <td>Wife</td>
150
+ > <td>“You said it, honey. Be sure to come home early today, I'll be waiting for you.”</td>
152
+ > </tr>
153
+ > <tr>
154
+ > <td>Husband</td>
154
+ > <td>“Okay, okay, okay, okay, okay.”</td>
156
+ > </tr>
157
+ > </tbody>
158
+ > </table>
159
+ >
160
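+ > ### Characters and Core Conflict
161
+ >
162
+ > #### 1. Character Analysis
163
+ >
164
+ > Husband:
165
+ > Behavior: Hides a secret stash of money and, once it is discovered, does his best to conceal his true feelings (heartache, regret).
166
+ > Emotional arc: relaxed -> confused -> shocked -> devastated -> resigned acceptance.
167
+ > Traits: Cares about saving face and feels both affection and helplessness toward his wife; a classic “henpecked husband” figure.
168
+ >
169
+ > Wife:
170
+ > Behavior: After finding the money, takes it as an expression of her husband's love and quickly spends it.
171
+ > Emotional arc: Remains in the happiness and joy of having found a “surprise” throughout.
172
+ > Traits: Naive, decisive in spending, full of trust and love for her husband.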
173
+ >
174
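+ > #### 2. Core Conflict
175
+ >
176
+ > The core conflict of the video lies in the dramatic misunderstanding caused by a “severe information asymmetry”:
177
+ >
178
+ > * Husband's perspective: His hard-saved secret stash of $10,000$ yuan was unexpectedly found and spent; it is a “shock”.
179
+ > * Wife's perspective: A romantic fund of $10,000$ yuan carefully prepared by her husband; it is a huge “surprise”.
180
+ >
181
+ > This misunderstanding drives the whole story. The husband's “swallowing his broken teeth” (suffering in silence) and the wife's “taken-for-granted happiness” create a strong comedic contrast and a dense stream of laughs.
182
+ >
183
+ > ### Summary
184
+ >
185
+ > Through a familiar household scenario about a “secret stash”, the video cleverly builds a story full of reversals and humor. It uses dramatic irony (the audience and the husband know the truth, while the wife is kept in the dark) to capture precisely the husband's complex state of mind in a sudden crisis. The whole sequence is not only full of laughs but also subtly touches on communication, trust, and attitudes toward money between spouses, which readily resonates with viewers and invites discussion.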
186
+
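The four fields in the sample table (timestamp, scene description, speaker, ASR dialogue) map naturally onto a small record type. The following is a minimal sketch only: this README does not document the Narrator's raw output format, so `NarrationRow`, `to_seconds`, and `to_markdown` are hypothetical names for post-processing code you would write yourself, and the rows are filled in by hand rather than parsed from model output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NarrationRow:
    """One table row (hypothetical schema, not the model's actual format)."""
    start: str    # segment start as "M:SS" or "H:MM:SS"
    end: str      # segment end in the same form
    scene: str    # scene description
    speaker: str  # speaker identity
    speech: str   # transcribed speech (ASR)


def to_seconds(timestamp: str) -> int:
    """Convert "M:SS" or "H:MM:SS" into a number of seconds."""
    seconds = 0
    for part in timestamp.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def to_markdown(rows: List[NarrationRow]) -> str:
    """Render rows as a Markdown table, ordered by start time."""
    lines = [
        "| Timestamp | Scene | Speaker | Dialogue (ASR) |",
        "| --- | --- | --- | --- |",
    ]
    for row in sorted(rows, key=lambda r: to_seconds(r.start)):
        lines.append(
            f"| {row.start} - {row.end} | {row.scene} | {row.speaker} | {row.speech} |"
        )
    return "\n".join(lines)


# Hand-written rows, deliberately out of order to show the sorting step.
rows = [
    NarrationRow("0:05", "0:10", "Wife calls from a clothing store.", "Wife", "Hey, honey, I love you."),
    NarrationRow("0:00", "0:05", "Husband takes a selfie by the pool.", "None", "(no dialogue)"),
]
print(to_markdown(rows))
```

Keeping timestamps as strings and converting only for sorting preserves the display form used in the table, while still allowing rows to be ordered or filtered by time.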
187
 
188
+ ## News
189
+ - 2025.09.19: We release [ARC-Qwen-Video-7B](https://huggingface.co/TencentARC/ARC-Qwen-Video-7B), which switches the base model from the Hunyuan VLM to [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct). We also release [ARC-Qwen-Video-7B-Narrator](https://huggingface.co/TencentARC/ARC-Qwen-Video-7B-Narrator), which can output timestamped video descriptions, speaker identities, and the specific ASR (Automatic Speech Recognition) content. Please refer to the `arc-qwen-video` branch for details.
190
+ - 2025.08.05: We release [ShortVid-Bench](https://huggingface.co/datasets/TencentARC/ShortVid-Bench), a specialized, human-annotated benchmark with multiple-choice questions for evaluating short-video understanding.
191
+ - 2025.07.29: We release the training code for instruction tuning.
192
+ - 2025.07.25: We release the [model checkpoint](https://huggingface.co/TencentARC/ARC-Hunyuan-Video-7B) and inference code of ARC-Hunyuan-Video-7B, including a [vLLM](https://github.com/vllm-project/vllm) version.
193
+ - 2025.07.25: We release the [API service](https://arc.tencent.com/zh/document/ARC-Hunyuan-Video-7B) of ARC-Hunyuan-Video-7B, which is supported by [vLLM](https://github.com/vllm-project/vllm). We release two versions: one is V0, which only supports video description and summarization in Chinese; the other is the version consistent with the model checkpoint and the one described in the paper.
194
 
195
  ## Usage
196
  ### Dependencies