## 📈 Performance

- Video Benchmarks

| Model | MVBench | LongVideoBench | VideoMME (w/o sub) |
| --- | --- | --- | --- |
| InternVideo2.5 | 75.7 | 60.6 | 65.1 |

- Inference Speed

We measured the average inference speed (tokens/s) of generating 1024 new tokens and 5194 (8192 - 2998) new tokens with the context of a video (which takes 2998 tokens) under BF16 precision.

| Quantization | Speed (4022 total tokens) | Speed (8192 total tokens) |
| --- | --- | --- |
| BF16 | 33.40 | 31.91 |
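
Figures like these come from a simple wall-clock measurement around `generate`. The sketch below illustrates the general methodology and is an assumption, not the exact benchmark script; `model` and `inputs` stand for any BF16 causal LM and a pre-tokenized video-plus-prompt context.

```python
import time

import torch


def measure_tokens_per_second(model, inputs, max_new_tokens):
    """Rough tokens/s measurement around `generate` (a sketch, not the
    official benchmark script). `inputs` holds a pre-tokenized context,
    e.g. video tokens plus the text prompt."""
    torch.cuda.synchronize()  # make sure prior GPU work is done before timing
    start = time.perf_counter()
    with torch.inference_mode():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            min_new_tokens=max_new_tokens,  # force a fixed generation length
            do_sample=False,
        )
    torch.cuda.synchronize()  # wait for generation to actually finish
    elapsed = time.perf_counter() - start
    generated = output.shape[1] - inputs["input_ids"].shape[1]
    return generated / elapsed


# The two table settings: 1024 and 5194 new tokens on a 2998-token context.
# speed_1k = measure_tokens_per_second(model, inputs, 1024)
# speed_5k = measure_tokens_per_second(model, inputs, 5194)
```

Pinning `min_new_tokens` to `max_new_tokens` keeps the generated length fixed, so the 1024-token and 5194-token runs are directly comparable.
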
## 🚀 How to use the model
First, you need to install [flash attention2](https://github.com/Dao-AILab/flash-attention) and some other modules. We provide a simple installation example below:
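
A minimal sketch of such an installation, assuming a pip-based environment; apart from the `flash-attn` command, which follows that project's own documentation, the companion modules listed here are typical video-inference dependencies and are assumptions rather than the repository's pinned list:

```bash
# Flash attention2 (install command from the flash-attention repository)
pip install flash-attn --no-build-isolation

# Typical companion modules for video inference (illustrative; consult
# the repository for the exact packages and pinned versions)
pip install transformers torch torchvision decord imageio opencv-python
```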