video-SALMONN 2 is an audio-visual large language model (LLM) that generates high-quality captions for videos from both their audio and visual content.
AI & ML interests
https://www.ee.tsinghua.edu.cn/en/
Organization Card
Department of Electronic Engineering, Tsinghua University
models 16
tsinghua-ee/video_SALMONN2plus_3B_audioAlign • 5B • 8
tsinghua-ee/D-ORCA-8B-0210 • 10B • 22 • 1
tsinghua-ee/WAVE-7B • 22 • 1
tsinghua-ee/video_SALMONN2_7B_audioAlign • 21
tsinghua-ee/video_SALMONN2plus_72B_audioAlign • 3
tsinghua-ee/video_SALMONN2plus_7B_audioAlign • 9B • 548
tsinghua-ee/SALMONN • Automatic Speech Recognition • 50
tsinghua-ee/video-SALMONN-2_plus_72B • 6 • 2
tsinghua-ee/video-SALMONN-2_plus_3B • 1.53k • 3
tsinghua-ee/video-SALMONN-2_plus_7B • 905 • 6
datasets 8
tsinghua-ee/ELViM • Viewer • 211 • 16
tsinghua-ee/SACRED-Bench • Viewer • 2.48k • 55
tsinghua-ee/F-16-NBA • Preview • 43
tsinghua-ee/AVUTBenchmark • Viewer • 3.28k • 4.86k • 1
tsinghua-ee/video-SALMONN_2_testset • Preview • 121
tsinghua-ee/QualiSpeech • Viewer • 14.6k • 570 • 21
tsinghua-ee/RivaBench • Viewer • 542 • 414 • 2
tsinghua-ee/SAVEBench • Preview • 64 • 3