---
library_name: transformers
license: mit
datasets:
- ai4bharat/IndicVoices
language:
- hi
- gu
- mr
base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
---

# Open-Sarika

This is a speech recognition and translation model for Indian languages (Hindi, Gujarati, and Marathi). It can transcribe speech in these languages and translate between them. It is an open-source implementation inspired by Sarvam AI's Sarika model.

## Model Details

### Model Description

- **Model type:** Speech recognition and translation (based on the Whisper architecture)
- **Language(s):** Hindi (hi), Gujarati (gu), Marathi (mr)
- **License:** MIT
- **Base Model:** openai/whisper-large-v3

## Uses

### Direct Use

The model can be used for:

1. Transcribing speech in Hindi, Gujarati, and Marathi
2. Translating speech between these languages

Here's a simple example to get started:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa

model_id = "theharshithh/open-sarika-v1"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load model and processor
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id).to(device)
# Do not pin generation to a fixed language/task prompt
model.config.forced_decoder_ids = None

# Load the audio and resample it to the 16 kHz rate Whisper expects
audio_path = "your_audio.wav"
audio, _ = librosa.load(audio_path, sr=16000)

# Extract log-Mel features (clips longer than 30 s are truncated) and generate
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs)

transcription = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
print(transcription)
```
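
The example above lets Whisper detect the language on its own and only sees the first 30 seconds of audio. The sketch below is one possible extension, assuming the checkpoint keeps whisper-large-v3's generation config (language tokens such as `hi`, `gu`, `mr`): it forces the decoding language with `generate(language=..., task="transcribe")` and splits longer recordings into consecutive 30-second windows. `chunk_audio` and `transcribe` are hypothetical helper names, not part of the repository.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 30    # Whisper's encoder sees at most 30 s per input


def chunk_audio(audio: np.ndarray, sr: int = SAMPLE_RATE, chunk_s: int = CHUNK_SECONDS) -> list:
    """Split a 1-D waveform into consecutive, non-overlapping windows of chunk_s seconds.

    The last window may be shorter; the feature extractor pads it to 30 s.
    """
    step = sr * chunk_s
    return [audio[i:i + step] for i in range(0, len(audio), step)]


def transcribe(audio_path: str, language: str = "hi", model_id: str = "theharshithh/open-sarika-v1") -> str:
    """Hypothetical helper: transcribe an audio file of any length in a fixed language.

    `language` is a Whisper language code such as "hi", "gu" or "mr".
    Imports are local so chunk_audio can be used without these dependencies.
    """
    import librosa
    import torch
    from transformers import WhisperForConditionalGeneration, WhisperProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    processor = WhisperProcessor.from_pretrained(model_id)
    model = WhisperForConditionalGeneration.from_pretrained(model_id).to(device)

    audio, _ = librosa.load(audio_path, sr=SAMPLE_RATE)
    pieces = []
    for chunk in chunk_audio(audio):
        inputs = processor(chunk, sampling_rate=SAMPLE_RATE, return_tensors="pt").to(device)
        with torch.no_grad():
            # Assumes the checkpoint's generation config still maps language codes to tokens
            ids = model.generate(**inputs, language=language, task="transcribe")
        pieces.append(processor.batch_decode(ids, skip_special_tokens=True)[0].strip())
    return " ".join(pieces)
```

For example, `transcribe("your_audio.wav", language="gu")` would transcribe a Gujarati recording. Hard cuts every 30 s can split a word across two windows; the transformers `pipeline("automatic-speech-recognition", model=model_id, chunk_length_s=30)` uses overlapping strides instead and may give cleaner boundaries.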

## Training Data

The model was trained on several datasets, including:

- Project Vaani: a large-scale Indian-language speech collection project by the Indian Institute of Science (IISc) in collaboration with ARTPARK, funded by Google
- High-quality speech recordings in Hindi, Gujarati, and Marathi from AI4Bharat (IndicVoices)
- Real-world speech data from various sources

## Hardware Requirements

- Minimum RAM: 8 GB
- GPU: recommended for faster inference
- Storage: the model has about 1.55B parameters (the same as whisper-large-v3), so the weights take roughly 3 GB in fp16 or 6 GB in fp32
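
Weight storage can be estimated as parameter count times bytes per parameter. The sketch below assumes the fine-tuned checkpoint keeps whisper-large-v3's size of roughly 1.55B parameters; the actual download size depends on the dtype the weights were saved in, and inference needs additional memory for activations on top of the weights. `weight_memory_gb` is a hypothetical helper for illustration.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed to hold the model weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: the fine-tune keeps whisper-large-v3's ~1.55 billion parameters
WHISPER_LARGE_V3_PARAMS = 1.55e9

# Bytes per parameter for common weight dtypes
for dtype, nbytes in {"float32": 4, "float16": 2}.items():
    print(f"{dtype}: ~{weight_memory_gb(WHISPER_LARGE_V3_PARAMS, nbytes):.1f} GB")
```

Loading with `WhisperForConditionalGeneration.from_pretrained(model_id, torch_dtype=torch.float16)` roughly halves the weight footprint on a GPU compared with the default fp32.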

## Model Card Contact

For issues and feedback, please open an issue on the model's repository: https://huggingface.co/theharshithh/open-sarika-v1

## GitHub

GitHub repository: https://github.com/theharshithh/open-sarika