nvidia/audio-flamingo-2-SoundCoT

Audio-Text-to-Text

ZhifengKong committed 24 days ago

Commit 4a08e7e · 1 Parent(s): 41d6708

update readme

Files changed (1)

README.md +11 -0

@@ -20,6 +20,17 @@ This repo contains the PyTorch implementation of [Audio Flamingo Sound-CoT Techn
 - Audio Flamingo 2 Sound-CoT shows strong reasoning abilities on several sound reasoning benchmarks, despite being small (3B) and trained exclusively on public datasets.
+## Usage
+The inference script is almost identical to that of [Audio Flamingo 2](https://github.com/NVIDIA/audio-flamingo/tree/audio_flamingo_2/inference_HF_pretrained). The only difference is that a special prompt (```Output the answer with <SUMMARY>, <CAPTION>, <REASONING>, and <CONCLUSION> tags.```) is appended after the input question. For instance, in Audio Flamingo 2, the input is
+```
+Based on the given audio, identify the source of the church bells. Choose the correct option from the following options:\n(A) Church\n(B) School\n(C) Clock Tower\n(D) Fire Station.
+```
+In Audio Flamingo 2 Sound-CoT, the input is
+```
+Based on the given audio, identify the source of the church bells. Choose the correct option from the following options:\n(A) Church\n(B) School\n(C) Clock Tower\n(D) Fire Station. Output the answer with <SUMMARY>, <CAPTION>, <REASONING>, and <CONCLUSION> tags.
+```
 ## License
 - The code in this repo is under MIT license.
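
The prompt change described in the added Usage section can be sketched in a few lines of Python. This is only an illustration, not code from the repository: `build_sound_cot_prompt` and `parse_tagged_output` are hypothetical helper names, and the parser additionally assumes that the model wraps each part of its answer in paired tags such as `<SUMMARY>...</SUMMARY>`, which the README does not state.

```python
import re

# Suffix quoted in the README; it is appended after the input question.
SOUND_COT_SUFFIX = (
    "Output the answer with <SUMMARY>, <CAPTION>, <REASONING>, "
    "and <CONCLUSION> tags."
)


def build_sound_cot_prompt(question: str) -> str:
    """Hypothetical helper: turn an Audio Flamingo 2 question into a Sound-CoT prompt."""
    question = question.rstrip()
    # Avoid appending the suffix twice if the caller already added it.
    if question.endswith(SOUND_COT_SUFFIX):
        return question
    return f"{question} {SOUND_COT_SUFFIX}"


def parse_tagged_output(text: str) -> dict:
    """Hypothetical helper: split a response into its tagged sections.

    Assumption: each section is wrapped as <TAG>...</TAG>. Sections that
    are missing or unclosed are simply left out of the result.
    """
    sections = {}
    for tag in ("SUMMARY", "CAPTION", "REASONING", "CONCLUSION"):
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        if match:
            sections[tag] = match.group(1).strip()
    return sections


question = (
    "Based on the given audio, identify the source of the church bells. "
    "Choose the correct option from the following options:\n"
    "(A) Church\n(B) School\n(C) Clock Tower\n(D) Fire Station."
)
prompt = build_sound_cot_prompt(question)
```

The resulting `prompt` string would then be passed to the Audio Flamingo 2 inference script in place of the plain question; how the response is actually formatted should be checked against real model output before relying on the parser.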