Etherll/NoisySpeechDetection-v0.2

Audio Classification

text-generation-inference

speech-processing

noise-detection

Etherll committed on Jun 11

Commit a9dd5c0 · verified · 1 Parent(s): afa0ffd

Update README.md

Files changed (1)

README.md +51 -4

@@ -6,16 +6,63 @@ tags:
 - unsloth
 - whisper
 - trl
 license: apache-2.0
 language:
 - en
 ---
-# Uploaded  model
-- **Developed by:** Etherll
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/whisper-small
 This whisper model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

 - unsloth
 - whisper
 - trl
+- audio
+- audio-classification
+- speech-processing
+- noise-detection
 license: apache-2.0
 language:
 - en
 ---
+# Speech Quality and Environmental Noise Classifier
+This is a binary audio classification model that determines whether a speech recording is **clean** or degraded by **environmental noise**.
+It is trained to robustly distinguish clean audio from audio that contains actual background noise (such as cars, music, or other people talking).
+- **LABEL_0: `clean`**: The audio contains speech with no significant environmental noise. This includes high-quality recordings as well as recordings with source artifacts like hiss, clipping, or "bad microphone" quality.
+- **LABEL_1: `noisy`**: The audio contains speech that is obscured by external, environmental background noise.
+## Intended Uses & Limitations
+This model is ideal for:
+- Pre-processing a large audio dataset to filter for clean samples.
+- Automatically tagging audio clips for quality control.
+- Gating ASR (Automatic Speech Recognition) systems that perform better on clean audio.
+**Limitations:**
+- This model is a **classifier**, not a noise-reduction tool. It only tells you *if* environmental noise is present.
+- Its definition of "noisy" is based on environmental sounds. It is trained to classify audio with only source artifacts (like microphone hum or pure static) as `clean`.
+## How to Use
+The easiest way to use this model is with the Transformers `pipeline` API.
+```bash
+pip install transformers torch
+```
+```python
+from transformers import pipeline
+# Load the model from the Hugging Face Hub (substitute a local path if you have downloaded the model)
+classifier = pipeline("audio-classification", model="Etherll/NoisySpeechDetection-v0.2")
+# Classify a local audio file (must be a WAV or other supported format)
+# The pipeline automatically handles resampling to 16kHz.
+results = classifier("path/to/your_audio_file.wav")
+# The result is a list of dictionaries
+# [{'score': 0.9979726672172546, 'label': 'clean'},
+# {'score': 0.002027299487963319, 'label': 'noisy'}]
+print(results)
+```
+## Training Data
+This model was trained on a custom-built dataset of ~55,000 audio clips, designed to teach the nuances of audio quality.
 This whisper model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
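
The dataset-filtering and ASR-gating uses listed in the new README could be sketched roughly as follows. This is a minimal illustration, not part of the commit: the helper names `clean_score` and `filter_clean` and the `threshold` parameter are hypothetical, and the label string `"clean"` is assumed from the example output shown in the README. The `classify` argument is any callable that returns a list of `{"score", "label"}` dictionaries for a path, which is the shape the README shows for the `audio-classification` pipeline.

```python
from typing import Any, Callable, Dict, Iterable, List

# Label name assumed from the README's example output.
CLEAN_LABEL = "clean"


def clean_score(predictions: List[Dict[str, Any]]) -> float:
    """Return the score assigned to the 'clean' label, or 0.0 if it is absent."""
    for prediction in predictions:
        if prediction["label"] == CLEAN_LABEL:
            return float(prediction["score"])
    return 0.0


def filter_clean(
    paths: Iterable[str],
    classify: Callable[[str], List[Dict[str, Any]]],
    threshold: float = 0.5,  # hypothetical cut-off; tune it for your data
) -> List[str]:
    """Keep only the paths whose 'clean' score meets the threshold."""
    kept = []
    for path in paths:
        if clean_score(classify(path)) >= threshold:
            kept.append(path)
    return kept
```

To use it with the real model, pass the `classifier` object built in the README's example as the `classify` argument, for instance `filter_clean(["path/to/your_audio_file.wav"], classifier, threshold=0.9)`. A higher threshold keeps fewer clips but makes it less likely that noisy audio reaches a downstream ASR system.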