Juhayna committed on
Commit 07766ba · verified · 1 Parent(s): 8a081db

Create README.md

  1. README.md +111 -0
README.md ADDED
@@ -0,0 +1,111 @@
+ ---
+ license: cc-by-nc-4.0
+ language:
+ - en
+ - zh
+ pipeline_tag: audio-classification
+ tags:
+ - music
+ ---
+
+ # MuQ & MuQ-MuLan
+
+ <div>
+ <a href='#'><img alt="Static Badge" src="https://img.shields.io/badge/Python-3.8%2B-blue?logo=python&logoColor=white"></a>
+ <a href='https://arxiv.org/abs/2501.01108'><img alt="Static Badge" src="https://img.shields.io/badge/arXiv-2501.01108-%23b31b1b?logo=arxiv&link=https%3A%2F%2Farxiv.org%2F"></a>
+ <a href='https://huggingface.co/OpenMuQ'><img alt="Static Badge" src="https://img.shields.io/badge/huggingface-OpenMuQ-%23FFD21E?logo=huggingface&link=https%3A%2F%2Fhuggingface.co%2FOpenMuQ"></a>
+ <a href='https://pytorch.org/'><img alt="Static Badge" src="https://img.shields.io/badge/framework-PyTorch-%23EE4C2C?logo=pytorch"></a>
+ <a href='https://pypi.org/project/muq'><img alt="Static Badge" src="https://img.shields.io/badge/pip%20install-muq-green?logo=PyPI&logoColor=white&link=https%3A%2F%2Fpypi.org%2Fproject%2Fmuq"></a>
+ </div>
+
+
+ This is the official repository for the paper *"**MuQ**: Self-Supervised **Mu**sic Representation Learning
+ with Mel Residual Vector **Q**uantization"*. For more details, see https://github.com/tencent-ailab/MuQ and the [paper](https://arxiv.org/abs/2501.01108).
+
+ The following models are released in this repo:
+
+ - **MuQ** (see [this link](https://huggingface.co/OpenMuQ/MuQ-large-msd-iter)): A large music foundation model pre-trained via Self-Supervised Learning (SSL), reported in the paper to achieve state-of-the-art results on various music information retrieval (MIR) tasks.
+ - **MuQ-MuLan** (see [this link](https://huggingface.co/OpenMuQ/MuQ-MuLan-large)): A music-text joint embedding model trained via contrastive learning, supporting both English and Chinese text.
+
+
+ ## Usage
+
+ To begin, install the official `muq` library with pip (requires Python 3.8 or later):
+ ```bash
+ pip3 install muq
+ ```
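
The Python 3.8 requirement above can be checked before installing. The following is a minimal sketch using only the standard library; the helper name `python_is_supported` and the constant `MIN_PYTHON` are hypothetical and not part of the `muq` package.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the README

def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum.

    Hypothetical helper for illustration; not part of the muq library.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)

if not python_is_supported():
    # Warn instead of failing so the check can be reused in setup scripts.
    print("Python %d.%d or later is needed before running `pip3 install muq`" % MIN_PYTHON)
```

Passing an explicit tuple, for example `python_is_supported((3, 7))`, makes the check easy to exercise without switching interpreters.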
+
+
+ Use **MuQ-MuLan** to extract music and text embeddings and compute their similarity:
+ ```python
+ import torch, librosa
+ from muq import MuQMuLan
+
+ # This will automatically fetch checkpoints from huggingface
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'  # fall back to CPU when no GPU is available
+ mulan = MuQMuLan.from_pretrained("OpenMuQ/MuQ-MuLan-large")
+ mulan = mulan.to(device).eval()
+
+ # Extract music embeddings
+ wav, sr = librosa.load("path/to/music_audio.wav", sr = 24000)
+ wavs = torch.tensor(wav).unsqueeze(0).to(device)
+ with torch.no_grad():
+     audio_embeds = mulan(wavs = wavs)
+
+ # Extract text embeddings (English or Chinese; the second prompt means "a violin piece suited to seaside scenery, with a cheerful rhythm")
+ texts = ["classical genres, hopeful mood, piano.", "一首适合海边风景的小提琴曲,节奏欢快"]
+ with torch.no_grad():
+     text_embeds = mulan(texts = texts)
+
+ # Calculate dot product similarity
+ sim = mulan.calc_similarity(audio_embeds, text_embeds)
+ print(sim)
+ ```
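
To make the dot-product similarity step concrete, here is a rough sketch in plain Python that runs without the model or a GPU. It is not the implementation of `calc_similarity`: the L2 normalization, the helper names, and the toy 3-dimensional vectors are all assumptions chosen for illustration.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def similarity_matrix(audio_embeds, text_embeds):
    """Dot product between every audio row and every text row.

    Assumption: embeddings are L2-normalized first, so each entry is a
    cosine similarity. Whether the library does this is not stated here.
    """
    audio = [l2_normalize(a) for a in audio_embeds]
    text = [l2_normalize(t) for t in text_embeds]
    return [[sum(x * y for x, y in zip(a, t)) for t in text] for a in audio]

# Toy stand-ins for one audio clip and two text prompts (not real embeddings).
audio_embeds = [[1.0, 0.0, 1.0]]
text_embeds = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

sim = similarity_matrix(audio_embeds, text_embeds)
# For each audio clip, pick the index of the best-matching text prompt.
best_text = [max(range(len(row)), key=row.__getitem__) for row in sim]
print(sim, best_text)
```

Each row of the result corresponds to one audio clip and each column to one text prompt, so taking the largest entry per row gives a simple text-retrieval ranking.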
+
+
+ To extract music audio features using **MuQ**:
+ ```python
+ import torch, librosa
+ from muq import MuQ
+
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'  # fall back to CPU when no GPU is available
+ wav, sr = librosa.load("path/to/music_audio.wav", sr = 24000)
+ wavs = torch.tensor(wav).unsqueeze(0).to(device)
+
+ # This will automatically fetch the checkpoint from huggingface
+ muq = MuQ.from_pretrained("OpenMuQ/MuQ-large-msd-iter")
+ muq = muq.to(device).eval()
+
+ with torch.no_grad():
+     output = muq(wavs, output_hidden_states=True)
+
+ print('Total number of layers: ', len(output.hidden_states))
+ print('Feature shape: ', output.last_hidden_state.shape)
+
+ ```
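
Frame-level features like these are often averaged over time to obtain a single vector per clip. The sketch below shows that pooling step in plain Python on toy data. It assumes the features for one clip are laid out as (time, dim), which the README does not state, and `mean_pool` is a hypothetical helper rather than part of `muq`.

```python
def mean_pool(frames):
    """Average a (time, dim) list of frame features into one dim-sized vector.

    Hypothetical helper for illustration; not part of the muq library.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n_frames = len(frames)
    # zip(*frames) iterates over feature dimensions (columns).
    return [sum(col) / n_frames for col in zip(*frames)]

# Toy stand-in for one clip: 4 time frames, 3 feature dimensions.
frames = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [0.0, 0.0, 0.0],
    [4.0, 4.0, 4.0],
]
clip_vector = mean_pool(frames)
print(clip_vector)
```

With real tensors the same idea would typically be a mean over the time axis; which layer of `hidden_states` works best for a given task is left open here.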
+
+ ## Model Checkpoints
+
+ | Model Name | Parameters | Data | HuggingFace🤗 |
+ | ----------- | --- | --- | ----------- |
+ | MuQ | ~300M | MSD dataset | [OpenMuQ/MuQ-large-msd-iter](https://huggingface.co/OpenMuQ/MuQ-large-msd-iter) |
+ | MuQ-MuLan | ~700M | music-text pairs | [OpenMuQ/MuQ-MuLan-large](https://huggingface.co/OpenMuQ/MuQ-MuLan-large) |
+
+ **Note**: The open-sourced MuQ was trained on the Million Song Dataset (MSD). Due to differences in dataset size, it may not achieve the same level of performance as reported in the paper.
+
+ ## License
+
+ The code is released under the MIT license.
+
+ The model weights (MuQ-large-msd-iter, MuQ-MuLan-large) are released under the CC-BY-NC 4.0 license.
+
+ ## Citation
+
+ ```
+ @article{zhu2025muq,
+   title={MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization},
+   author={Haina Zhu and Yizhi Zhou and Hangting Chen and Jianwei Yu and Ziyang Ma and Rongzhi Gu and Yi Luo and Wei Tan and Xie Chen},
+   journal={arXiv preprint arXiv:2501.01108},
+   year={2025}
+ }
+ ```