---
license: cdla-permissive-2.0
---

## Model Summary

[DAC auto-encoder models](https://github.com/descriptinc/descript-audio-codec) provide a compact discrete tokenization of speech and audio signals that facilitates both signal generation by cascaded generative AI models (e.g. multi-modal generative AI models) and high-quality reconstruction of the original signals. [The current models](https://www.isca-archive.org/interspeech_2024/shechtman24_interspeech.pdf) improve upon the [original DAC models](https://github.com/descriptinc/descript-audio-codec) by allowing a more compact representation for speech-only signals while retaining high-quality signal reconstruction.

## Usage

1. Follow the [DAC](https://github.com/descriptinc/descript-audio-codec) installation instructions.
2. Download the model weights from the current repo (e.g., *weights_24khz_1.5kbps_v1.0*).

### Compress audio

```
python3 -m dac encode /path/to/input --output /path/to/output/codes --weights_path /path/to/weights_24khz_1.5kbps_v1.0
```

This command creates `.dac` files with the same names as the input files. It also preserves the directory structure relative to the input root and re-creates it in the output directory. Run `python3 -m dac encode --help` for more options.

### Reconstruct audio from compressed codes

```
python3 -m dac decode /path/to/output/codes --output /path/to/reconstructed_input --weights_path /path/to/weights_24khz_1.5kbps_v1.0
```

This command creates `.wav` files with the same names as the input files. It also preserves the directory structure relative to the input root and re-creates it in the output directory. Run `python3 -m dac decode --help` for more options.
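To illustrate what "preserve the directory structure relative to input root" means for the two commands above, here is a minimal sketch of the path mapping. It is not the `dac` implementation: the helper `mirrored_output_path` and the example paths are hypothetical, and only the standard library is used.

```python
from pathlib import PurePosixPath


def mirrored_output_path(input_file, input_root, output_root, suffix):
    """Place input_file at the same relative location under output_root,
    swapping its extension for `suffix` (hypothetical helper)."""
    relative = PurePosixPath(input_file).relative_to(input_root)
    return PurePosixPath(output_root) / relative.with_suffix(suffix)


# Hypothetical paths, for illustration only.
encoded = mirrored_output_path(
    "/data/in/speaker1/utt01.wav", "/data/in", "/data/codes", ".dac"
)
decoded = mirrored_output_path(encoded, "/data/codes", "/data/recon", ".wav")

print(encoded)  # /data/codes/speaker1/utt01.dac
print(decoded)  # /data/recon/speaker1/utt01.wav
```

Under this reading, a nested input tree can be encoded and decoded in bulk without file name collisions, since each file keeps its relative sub-directory.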
### Programmatic Usage

```py
import dac
from audiotools import AudioSignal

# Load the model from the downloaded weights
model_path = "/path/to/weights_24khz_1.5kbps_v1.0"
model = dac.DAC.load(model_path)
model.to('cuda')

# Load audio signal file
signal = AudioSignal('input.wav')

# Encode audio signal as one long file
# (may run out of GPU memory on long files)
signal.to(model.device)
x = model.preprocess(signal.audio_data, signal.sample_rate)
z, codes, latents, _, _ = model.encode(x)

# Decode audio signal
y = model.decode(z)

# Alternatively, use the `compress` and `decompress` functions
# to compress long files.
signal = signal.cpu()
x = model.compress(signal)

# Save and load to and from disk
x.save("compressed.dac")
x = dac.DACFile.load("compressed.dac")

# Decompress it back to an AudioSignal
y = model.decompress(x)

# Write to file
y.write('output.wav')
```
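To give a feel for how compact the representation is, the following back-of-the-envelope sketch compares the nominal codec bitrate with uncompressed audio. It assumes that the weight name `weights_24khz_1.5kbps_v1.0` denotes a 24 kHz sample rate and a nominal 1.5 kbps bitrate, and it assumes a 16-bit mono PCM reference. The size of `.dac` files on disk may differ from the nominal payload because of the container format and metadata.

```python
SAMPLE_RATE_HZ = 24_000     # assumed from the weight name ("24khz")
CODEC_BITRATE_BPS = 1_500   # assumed from the weight name ("1.5kbps")
PCM_BITS_PER_SAMPLE = 16    # assumption: 16-bit mono PCM as the reference


def payload_bytes(bitrate_bps, duration_s):
    """Bytes needed to carry `duration_s` seconds at `bitrate_bps` bits per second."""
    return bitrate_bps * duration_s / 8


pcm_bitrate_bps = SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE  # 384000 bits/s
compression_ratio = pcm_bitrate_bps / CODEC_BITRATE_BPS

print(f"PCM bitrate: {pcm_bitrate_bps} bps")                             # 384000 bps
print(f"Nominal compression ratio: {compression_ratio:.0f}x")            # 256x
print(f"1 min of codes: {payload_bytes(CODEC_BITRATE_BPS, 60):.0f} B")   # 11250 B
print(f"1 min of PCM:   {payload_bytes(pcm_bitrate_bps, 60):.0f} B")     # 2880000 B
```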