Create README.md
README.md
ADDED
@@ -0,0 +1,111 @@
---
license: cc-by-nc-4.0
language:
- en
- zh
pipeline_tag: audio-classification
tags:
- music
---

# MuQ & MuQ-MuLan

<div>
<a href='#'><img alt="Static Badge" src="https://img.shields.io/badge/Python-3.8%2B-blue?logo=python&logoColor=white"></a>
<a href='https://arxiv.org/abs/2501.01108'><img alt="Static Badge" src="https://img.shields.io/badge/arXiv-2501.01108-%23b31b1b?logo=arxiv&link=https%3A%2F%2Farxiv.org%2F"></a>
<a href='https://huggingface.co/OpenMuQ'><img alt="Static Badge" src="https://img.shields.io/badge/huggingface-OpenMuQ-%23FFD21E?logo=huggingface&link=https%3A%2F%2Fhuggingface.co%2FOpenMuQ"></a>
<a href='https://pytorch.org/'><img alt="Static Badge" src="https://img.shields.io/badge/framework-PyTorch-%23EE4C2C?logo=pytorch"></a>
<a href='https://pypi.org/project/muq'><img alt="Static Badge" src="https://img.shields.io/badge/pip%20install-muq-green?logo=PyPI&logoColor=white&link=https%3A%2F%2Fpypi.org%2Fproject%2Fmuq"></a>
</div>

This is the official repository for the paper *"**MuQ**: Self-Supervised **Mu**sic Representation Learning with Mel Residual Vector **Q**uantization"*. For more details, see https://github.com/tencent-ailab/MuQ and the [paper](https://arxiv.org/abs/2501.01108).

The following models are released in this repo:

- **MuQ** (see [this link](https://huggingface.co/OpenMuQ/MuQ-large-msd-iter)): A large music foundation model pre-trained with self-supervised learning (SSL), achieving state-of-the-art (SOTA) results on various music information retrieval (MIR) tasks.
- **MuQ-MuLan** (see [this link](https://huggingface.co/OpenMuQ/MuQ-MuLan-large)): A music-text joint embedding model trained with contrastive learning, supporting both English and Chinese text.

## Usage

First, install the official `muq` library with pip. Python 3.8 or later is required:
```bash
pip3 install muq
```
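
Because `muq` requires Python 3.8 or later, a script can check the interpreter version before importing the library. The following is only a minimal sketch; `python_is_supported` is a hypothetical helper name and not part of `muq`:

```python
import sys

def python_is_supported(version=sys.version_info, minimum=(3, 8)) -> bool:
    """Return True if the (major, minor) version meets the minimum requirement."""
    return tuple(version[:2]) >= tuple(minimum)

# Report whether the running interpreter satisfies the stated requirement.
if python_is_supported():
    print("Python version is new enough for muq")
else:
    print("Python 3.8 or later is required")
```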

To extract music and text embeddings with **MuQ-MuLan** and calculate their similarity:
```python
import torch, librosa
from muq import MuQMuLan

# Use the GPU when available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# This will automatically fetch the checkpoints from Hugging Face
mulan = MuQMuLan.from_pretrained("OpenMuQ/MuQ-MuLan-large")
mulan = mulan.to(device).eval()

# Extract music embeddings (the audio is resampled to 24 kHz on load)
wav, sr = librosa.load("path/to/music_audio.wav", sr=24000)
wavs = torch.tensor(wav).unsqueeze(0).to(device)
with torch.no_grad():
    audio_embeds = mulan(wavs=wavs)

# Extract text embeddings (texts can be in English or Chinese)
# The second text reads: "a cheerful violin piece suited to seaside scenery"
texts = ["classical genres, hopeful mood, piano.", "一首适合海边风景的小提琴曲,节奏欢快"]
with torch.no_grad():
    text_embeds = mulan(texts=texts)

# Calculate dot product similarity
sim = mulan.calc_similarity(audio_embeds, text_embeds)
print(sim)
```
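
The similarity scores in `sim` can be used to pick the best-matching description for a clip. The sketch below is illustrative only. It assumes `sim` has one row per audio clip and one column per text (this layout is an assumption, not confirmed here), it uses made-up numbers instead of model output, and `rank_texts` is a hypothetical helper that is not part of `muq`:

```python
from typing import List, Tuple

def rank_texts(sim_row: List[float], texts: List[str]) -> List[Tuple[str, float]]:
    """Sort candidate texts by similarity to one audio clip, best match first."""
    if len(sim_row) != len(texts):
        raise ValueError("need exactly one similarity score per text")
    return sorted(zip(texts, sim_row), key=lambda pair: pair[1], reverse=True)

# Made-up scores for one clip against three candidate descriptions.
# With real output, a row might be obtained as sim[0].tolist()
# (assuming the [n_audio, n_text] layout described above).
texts = ["classical piano", "heavy metal", "upbeat violin"]
scores = [0.31, -0.05, 0.12]

ranking = rank_texts(scores, texts)
for text, score in ranking:
    print(f"{score:+.2f}  {text}")
```

Sorting the full row, rather than taking only the maximum, keeps the runner-up descriptions available for top-k tagging.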

To extract music audio features with **MuQ**:
```python
import torch, librosa
from muq import MuQ

# Use the GPU when available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
wav, sr = librosa.load("path/to/music_audio.wav", sr=24000)
wavs = torch.tensor(wav).unsqueeze(0).to(device)

# This will automatically fetch the checkpoint from Hugging Face
muq = MuQ.from_pretrained("OpenMuQ/MuQ-large-msd-iter")
muq = muq.to(device).eval()

with torch.no_grad():
    output = muq(wavs, output_hidden_states=True)

print('Total number of layers: ', len(output.hidden_states))
print('Feature shape: ', output.last_hidden_state.shape)
```
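
Downstream tasks often need one vector per clip rather than one per frame. A common approach, shown here only as a sketch, is to average the frame-level features over time. The example assumes `output.last_hidden_state` is laid out as `[batch, time, dim]` (an assumption), and it works on plain nested lists so that it runs without the model; `mean_pool` is a hypothetical helper, not part of `muq`:

```python
from typing import List

def mean_pool(frames: List[List[float]]) -> List[float]:
    """Average a [time, dim] list of frame features into a single [dim] vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimension")
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]

# Toy input: 3 time frames with 2 feature dimensions each.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
clip_vector = mean_pool(frames)
print(clip_vector)

# On a real tensor with the assumed layout, the equivalent PyTorch call
# would be output.last_hidden_state.mean(dim=1).
```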

## Model Checkpoints

| Model Name | Parameters | Data | HuggingFace🤗 |
| ----------- | --- | --- | ----------- |
| MuQ | ~300M | MSD dataset | [OpenMuQ/MuQ-large-msd-iter](https://huggingface.co/OpenMuQ/MuQ-large-msd-iter) |
| MuQ-MuLan | ~700M | music-text pairs | [OpenMuQ/MuQ-MuLan-large](https://huggingface.co/OpenMuQ/MuQ-MuLan-large) |

**Note**: The open-sourced MuQ was trained on the Million Song Dataset (MSD). Due to differences in dataset size, it may not reach the level of performance reported in the paper.

## License

The code is released under the MIT license.

The model weights (MuQ-large-msd-iter, MuQ-MuLan-large) are released under the CC-BY-NC 4.0 license.

## Citation

```
@article{zhu2025muq,
  title={MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization},
  author={Haina Zhu and Yizhi Zhou and Hangting Chen and Jianwei Yu and Ziyang Ma and Rongzhi Gu and Yi Luo and Wei Tan and Xie Chen},
  journal={arXiv preprint arXiv:2501.01108},
  year={2025}
}
```