Update README.md
README.md
CHANGED
@@ -6,16 +6,63 @@ tags:
|
|
6 |
- unsloth
|
7 |
- whisper
|
8 |
- trl
|
9 |
license: apache-2.0
|
10 |
language:
|
11 |
- en
|
12 |
---
|
13 |
|
14 |
-
# Uploaded model
|
15 |
|
16 |
-
|
17 |
-
|
18 |
-
|
19 |
|
20 |
This Whisper model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
|
21 |
|
|
|
6 |
- unsloth
|
7 |
- whisper
|
8 |
- trl
|
9 |
+
- audio
|
10 |
+
- audio-classification
|
11 |
+
- speech-processing
|
12 |
+
- noise-detection
|
13 |
license: apache-2.0
|
14 |
language:
|
15 |
- en
|
16 |
---
|
17 |
|
|
|
18 |
|
19 |
+
# Speech Quality and Environmental Noise Classifier
|
20 |
+
|
21 |
+
This binary audio classification model determines whether a speech recording is **clean** or degraded by **environmental noise**.
|
22 |
+
|
23 |
+
It is trained to distinguish clean speech from speech with actual background noise (such as cars, music, or other people talking).
|
24 |
+
|
25 |
+
- **LABEL_0: `clean`**: The audio contains speech with no significant environmental noise. This includes high-quality recordings as well as recordings with source artifacts like hiss, clipping, or "bad microphone" quality.
|
26 |
+
- **LABEL_1: `noisy`**: The audio contains speech that is obscured by external, environmental background noise.
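
Depending on how the model's config stores its label names, a prediction may come back with the raw `LABEL_0`/`LABEL_1` ids or with the readable `clean`/`noisy` names. The following is a minimal sketch for normalizing either form, assuming the mapping listed above; `ID_TO_NAME` and `readable_label` are hypothetical names used only for illustration.

```python
# Mapping taken from the label list above; the helper names are illustrative.
ID_TO_NAME = {"LABEL_0": "clean", "LABEL_1": "noisy"}


def readable_label(label: str) -> str:
    """Return 'clean' or 'noisy' for either the raw id or the readable name."""
    name = ID_TO_NAME.get(label, label)
    if name not in ID_TO_NAME.values():
        raise ValueError(f"Unexpected label: {label!r}")
    return name
```

Raising on an unknown label, rather than passing it through, makes a mismatched model or config fail early instead of silently mislabeling clips.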
|
27 |
+
|
28 |
+
## Intended Uses & Limitations
|
29 |
+
|
30 |
+
This model is ideal for:
|
31 |
+
- Pre-processing a large audio dataset to filter for clean samples.
|
32 |
+
- Automatically tagging audio clips for quality control.
|
33 |
+
- Gating ASR (Automatic Speech Recognition) systems that perform better on clean audio.
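
The dataset-filtering use case can be sketched as follows. This is only an illustration, assuming the classifier returns a list of `{'score', 'label'}` dictionaries per file, as in the usage example in this README. The helper names `clean_score` and `filter_clean` and the `0.9` threshold are assumptions, not part of the model.

```python
from typing import Callable, Dict, Iterable, List


def clean_score(prediction: List[Dict], clean_label: str = "clean") -> float:
    """Return the score assigned to the clean label, or 0.0 if it is absent."""
    for entry in prediction:
        if entry["label"] == clean_label:
            return float(entry["score"])
    return 0.0


def filter_clean(
    paths: Iterable[str],
    classify: Callable[[str], List[Dict]],
    threshold: float = 0.9,  # hypothetical cut-off; tune it on your own data
) -> List[str]:
    """Keep only the paths whose clean score reaches the threshold."""
    return [p for p in paths if clean_score(classify(p)) >= threshold]
```

Any callable that maps a file path to a prediction list can be passed as `classify`, including an `audio-classification` pipeline object. The same check can serve as an ASR gate by transcribing only the files that `filter_clean` keeps.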
|
34 |
+
|
35 |
+
**Limitations:**
|
36 |
+
- This model is a **classifier**, not a noise-reduction tool. It only tells you *if* environmental noise is present.
|
37 |
+
- Its definition of "noisy" covers environmental sounds only. Audio whose only defects are source artifacts (such as microphone hum or pure static) is classified as `clean`.
|
38 |
+
|
39 |
+
## How to Use
|
40 |
+
|
41 |
+
The easiest way to use this model is with a `pipeline`.
|
42 |
+
|
43 |
+
```bash
|
44 |
+
pip install transformers torch
|
45 |
+
```
|
46 |
+
|
47 |
+
```python
|
48 |
+
from transformers import pipeline
|
49 |
+
|
50 |
+
# Replace "Etherll/NoisySpeechDetection-v0.2" with the actual model path
|
51 |
+
classifier = pipeline("audio-classification", model="Etherll/NoisySpeechDetection-v0.2")
|
52 |
+
|
53 |
+
# Classify a local audio file (must be a WAV or other supported format)
|
54 |
+
# The pipeline automatically handles resampling to 16kHz.
|
55 |
+
results = classifier("path/to/your_audio_file.wav")
|
56 |
+
|
57 |
+
# The result is a list of dictionaries
|
58 |
+
# [{'score': 0.9979726672172546, 'label': 'clean'},
|
59 |
+
# {'score': 0.002027299487963319, 'label': 'noisy'}]
|
60 |
+
print(results)
|
61 |
+
```
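
The printed list holds one entry per label. A minimal sketch for reducing it to a single decision, assuming the output shape shown in the comments above; `top_prediction` is a hypothetical helper, not part of `transformers`.

```python
from typing import Dict, List, Tuple


def top_prediction(results: List[Dict]) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    best = max(results, key=lambda entry: entry["score"])
    return best["label"], float(best["score"])


# Example values copied from the sample output above.
sample = [
    {"score": 0.9979726672172546, "label": "clean"},
    {"score": 0.002027299487963319, "label": "noisy"},
]
label, score = top_prediction(sample)
print(f"{label} ({score:.3f})")  # prints "clean (0.998)"
```

Taking the maximum explicitly avoids relying on the order of the returned entries.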
|
62 |
+
|
63 |
+
## Training Data
|
64 |
+
|
65 |
+
This model was trained on a custom-built dataset of ~55,000 audio clips, designed to teach the difference between environmental noise and source artifacts.
|
66 |
|
67 |
This Whisper model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
|
68 |
|