---
library_name: transformers
license: mit
datasets:
- ai4bharat/IndicVoices
language:
- hi
- gu
- mr
base_model:
- openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
---
# Open-Sarika
Open-Sarika is a speech recognition and translation model for three Indian languages: Hindi, Gujarati, and Marathi. It transcribes speech in these languages and translates between them. It is an open-source implementation inspired by Sarvam AI's Sarika model.
## Model Details
### Model Description
- **Model type:** Speech Recognition and Translation (based on Whisper architecture)
- **Language(s):** Hindi (hi), Gujarati (gu), Marathi (mr)
- **License:** MIT
- **Base Model:** openai/whisper-large-v3
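The language codes listed above can be passed to Whisper's `generate()` through its `language` and `task` arguments, which pins the spoken language instead of relying on automatic detection. The following is a minimal sketch, not part of the model repository: the helper name `transcription_kwargs` and the `SUPPORTED_LANGUAGES` table are hypothetical, and it assumes the checkpoint's generation config still carries Whisper's standard language tokens.

```python
# Language codes listed in this model card, keyed by lowercase English name.
SUPPORTED_LANGUAGES = {"hindi": "hi", "gujarati": "gu", "marathi": "mr"}


def transcription_kwargs(language: str) -> dict:
    """Return keyword arguments for generate() that pin the spoken language.

    `language` may be an English name ("Hindi") or a code ("hi").
    Hypothetical helper for illustration only.
    """
    key = language.strip().lower()
    # Accept either a name from the table or a code that is already valid.
    code = SUPPORTED_LANGUAGES.get(key, key)
    if code not in SUPPORTED_LANGUAGES.values():
        raise ValueError(f"Unsupported language: {language!r}")
    return {"language": code, "task": "transcribe"}
```

With a loaded model and processed inputs, the result would be unpacked into the call, for example `model.generate(**inputs, **transcription_kwargs("Marathi"))`.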
## Uses
### Direct Use
The model can be used for:
1. Transcribing speech in Hindi, Gujarati, and Marathi
2. Translating speech between these languages
Here's a simple example to get started:
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa
model_id = "theharshithh/open-sarika-v1"
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load model and processor
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id).to(device)
model.config.forced_decoder_ids = None
# Load and process audio
audio_path = "your_audio.wav"
audio, rate = librosa.load(audio_path, sr=16000)
# Generate transcription
inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs)
transcription = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
```
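By default, Whisper's feature extractor pads or truncates each input to a 30-second window, so the example above would only cover the start of a longer recording. One simple workaround is to split the waveform into fixed-length chunks and transcribe each one in turn. The sketch below shows only the splitting step; `chunk_audio` is a hypothetical helper, and cutting at fixed boundaries is an assumption made for simplicity (it can split a word in half).

```python
from typing import Iterator, Sequence


def chunk_audio(
    samples: Sequence,
    sample_rate: int = 16000,
    chunk_seconds: float = 30.0,
) -> Iterator[Sequence]:
    """Yield consecutive slices of `samples`, each at most `chunk_seconds` long.

    Works on any sliceable sequence, including the NumPy array
    returned by librosa.load. The last chunk may be shorter.
    """
    chunk_len = int(sample_rate * chunk_seconds)
    if chunk_len < 1:
        raise ValueError("sample_rate * chunk_seconds must be at least 1 sample")
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]
```

Each chunk can then be passed through `processor(...)` and `model.generate(...)` as in the example above, and the decoded strings joined. For overlap-aware chunking, the `transformers` ASR pipeline with its `chunk_length_s` argument may be a better fit.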
### Training Data
The model was trained on a variety of datasets, including:
- Project Vaani dataset: a large-scale Indian-language speech data collection project run by the Indian Institute of Science (IISc) in collaboration with ARTPARK and funded by Google
- High-quality speech recordings in Hindi, Gujarati, and Marathi from AI4Bharat (the `ai4bharat/IndicVoices` dataset listed in this card's metadata)
- Real-world speech data from various sources
### Hardware Requirements
- Minimum RAM: 8 GB
- GPU: Recommended for faster inference
- Storage: Approximately 1.5 GB for the model
## Model Card Contact
For issues and feedback, please create an issue on the model's repository: https://huggingface.co/theharshithh/open-sarika-v1
## GitHub
GitHub repository: https://github.com/theharshithh/open-sarika