Etherll committed · Commit a9dd5c0 · verified · 1 Parent(s): afa0ffd

Update README.md

  1. README.md +51 -4
README.md CHANGED
@@ -6,16 +6,63 @@ tags:
6
  - unsloth
7
  - whisper
8
  - trl
9
  license: apache-2.0
10
  language:
11
  - en
12
  ---
13
 
14
- # Uploaded model
15
 
16
- - **Developed by:** Etherll
17
- - **License:** apache-2.0
18
- - **Finetuned from model :** unsloth/whisper-small
19
 
20
  This Whisper model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
21
 
 
6
  - unsloth
7
  - whisper
8
  - trl
9
+ - audio
10
+ - audio-classification
11
+ - speech-processing
12
+ - noise-detection
13
  license: apache-2.0
14
  language:
15
  - en
16
  ---
17
 
 
18
 
19
+ # Speech Quality and Environmental Noise Classifier
20
+
21
+ This is a binary audio classification model that determines whether a speech recording is **clean** or degraded by **environmental noise**.
22
+
23
+ It is trained to distinguish clean audio from audio with actual background noise, such as cars, music, or other people talking.
24
+
25
+ - **LABEL_0: `clean`**: The audio contains speech with no significant environmental noise. This includes high-quality recordings as well as recordings with source artifacts like hiss, clipping, or "bad microphone" quality.
26
+ - **LABEL_1: `noisy`**: The audio contains speech that is obscured by external, environmental background noise.
27
+
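The sample output under How to Use shows named labels, but a checkpoint whose config does not carry label names would typically report the generic ids instead. The following is a minimal sketch of normalizing either form, assuming pipeline-style output (a list of dictionaries with `score` and `label` keys). `LABEL_NAMES`, `normalize_label`, and `top_prediction` are hypothetical helpers introduced here for illustration; they are not part of this model or of Transformers.

```python
# Hypothetical mapping, taken from the label list above.
LABEL_NAMES = {"LABEL_0": "clean", "LABEL_1": "noisy"}


def normalize_label(raw_label):
    """Map a generic id such as 'LABEL_0' to its name; pass other labels through unchanged."""
    return LABEL_NAMES.get(raw_label, raw_label)


def top_prediction(results):
    """Return (label_name, score) for the highest-scoring entry in pipeline-style results."""
    best = max(results, key=lambda item: item["score"])
    return normalize_label(best["label"]), best["score"]
```

Taking the maximum explicitly means the helper does not rely on the order of the entries in the result list.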
28
+ ## Intended Uses & Limitations
29
+
30
+ This model is suited to:
31
+ - Pre-processing a large audio dataset to filter for clean samples.
32
+ - Automatically tagging audio clips for quality control.
33
+ - Gating ASR (automatic speech recognition) systems that perform better on clean audio.
34
+
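The filtering and gating uses listed above can be sketched as a small helper that keeps only the clips whose `clean` score passes a threshold. This is an illustration under stated assumptions, not the author's workflow: `filter_clean` and the `min_clean_score` default of 0.9 are hypothetical, and `classify` stands for any callable that returns pipeline-style output (a list of dictionaries with `score` and `label` keys, using the label names `clean` and `noisy`).

```python
def filter_clean(paths, classify, min_clean_score=0.9):
    """Return the paths whose 'clean' score is at least min_clean_score.

    classify: callable taking one path and returning pipeline-style results.
    min_clean_score: hypothetical threshold; tune it on your own data.
    """
    kept = []
    for path in paths:
        # Collapse the list of {'score', 'label'} dicts into a label -> score lookup.
        scores = {item["label"]: item["score"] for item in classify(path)}
        if scores.get("clean", 0.0) >= min_clean_score:
            kept.append(path)
    return kept
```

With the pipeline object from the How to Use section, `filter_clean(paths, classifier)` would apply the same rule to real files. A higher threshold trades recall for a cleaner retained set.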
35
+ **Limitations:**
36
+ - This model is a **classifier**, not a noise-reduction tool. It only tells you *if* environmental noise is present.
37
+ - Its definition of "noisy" covers environmental sounds only. Audio whose only defects are source artifacts (like microphone hum or pure static) is classified as `clean`.
38
+
39
+ ## How to Use
40
+
41
+ The easiest way to use this model is with the Transformers `pipeline` API.
42
+
43
+ ```bash
44
+ pip install transformers torch
45
+ ```
46
+
47
+ ```python
48
+ from transformers import pipeline
49
+
50
+ # Replace "Etherll/NoisySpeechDetection-v0.2" with the actual model path
51
+ classifier = pipeline("audio-classification", model="Etherll/NoisySpeechDetection-v0.2")
52
+
53
+ # Classify a local audio file (must be a WAV or other supported format)
54
+ # The pipeline automatically handles resampling to 16 kHz.
55
+ results = classifier("path/to/your_audio_file.wav")
56
+
57
+ # The result is a list of dictionaries
58
+ # [{'score': 0.9979726672172546, 'label': 'clean'},
59
+ # {'score': 0.002027299487963319, 'label': 'noisy'}]
60
+ print(results)
61
+ ```
62
+
63
+ ## Training Data
64
+
65
+ This model was trained on a custom-built dataset of ~55,000 audio clips, designed to teach the distinction between clean speech and speech degraded by environmental noise.
66
 
67
  This Whisper model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
68