---
library_name: transformers
license: mit
datasets:
- ai4bharat/IndicVoices
language:
- hi
- gu
- mr
base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
---
# Open-Sarika
Open-Sarika is a speech recognition and translation model for Indian languages (Hindi, Gujarati, and Marathi). It can transcribe speech in these languages and translate between them. It is an open-source implementation inspired by Sarvam AI's Sarika model.
## Model Details
### Model Description
- **Model type:** Speech Recognition and Translation (based on Whisper architecture)
- **Language(s):** Hindi (hi), Gujarati (gu), Marathi (mr)
- **License:** MIT
- **Base Model:** openai/whisper-large-v3
## Uses
### Direct Use
The model can be used for:
1. Transcribing speech in Hindi, Gujarati, and Marathi
2. Translating speech between these languages
Here's a simple example to get started:
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa
model_id = "theharshithh/open-sarika-v1"
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load model and processor
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id).to(device)
model.config.forced_decoder_ids = None
# Load and resample audio to 16 kHz mono, as expected by Whisper
audio_path = "your_audio.wav"
audio, _ = librosa.load(audio_path, sr=16000)

# Generate transcription
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs)
transcription = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
print(transcription)
```
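Whisper selects the source language and the task (transcribe vs. translate) through decoder prompt tokens rather than separate model heads. A minimal sketch of building those tokens with `WhisperProcessor.get_decoder_prompt_ids`, assuming this repository ships the standard Whisper tokenizer files (note that in stock Whisper the `translate` task targets English; behaviour for other target languages depends on how this fine-tune was trained):

```python
from transformers import WhisperProcessor

model_id = "theharshithh/open-sarika-v1"
processor = WhisperProcessor.from_pretrained(model_id)

# Decoder prompt tokens select the source language and the task.
# task="translate" asks stock Whisper to emit English;
# task="transcribe" keeps the source language.
forced_ids = processor.get_decoder_prompt_ids(language="hindi", task="translate")
print(forced_ids)  # list of (position, token_id) pairs

# Pass them to generate() alongside the audio features, e.g.:
# output_ids = model.generate(**inputs, forced_decoder_ids=forced_ids)
```

The same call with `language="gujarati"` or `language="marathi"` switches the source language.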
### Training Data
The model was trained on a variety of datasets, including:
- Project Vaani dataset: A large-scale Indian language collection project by the Indian Institute of Science (IISc) in collaboration with ARTPARK, funded by Google
- AI4Bharat's IndicVoices dataset: high-quality speech recordings in Hindi, Gujarati, and Marathi
- Real-world speech data from various sources
### Hardware Requirements
- Minimum RAM: 8GB
- GPU: Recommended for faster inference
- Storage: Model size is approximately 1.5GB
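The memory figures above can be sanity-checked from the size of the weights alone. A rough back-of-envelope sketch (the ~1.55B parameter count is an assumption taken from the openai/whisper-large-v3 base model, not measured from this checkpoint; actual on-disk size depends on the stored dtype):

```python
PARAMS = 1.55e9  # approximate parameter count of whisper-large-v3 (assumption)

def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Rough size of the model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30

print(f"fp32: {weights_gib(PARAMS, 4):.1f} GiB")  # fp32: 5.8 GiB
print(f"fp16: {weights_gib(PARAMS, 2):.1f} GiB")  # fp16: 2.9 GiB
```

Inference needs additional headroom beyond the weights for activations and the decoding cache, which is why more RAM than the raw weight size is recommended.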
## Model Card Contact
For issues and feedback, please create an issue on the model's repository: https://huggingface.co/theharshithh/open-sarika-v1
## GitHub
GitHub repo: https://github.com/theharshithh/open-sarika