
	---
	library_name: transformers
	license: mit
	datasets:
	- ai4bharat/IndicVoices
	language:
	- hi
	- gu
	- mr
	base_model:
	- openai/whisper-large-v3
	pipeline_tag: automatic-speech-recognition
	---

	# Open-Sarika

	Open-Sarika is a speech recognition and translation model for three Indian languages: Hindi, Gujarati, and Marathi. It transcribes speech in these languages and translates between them. It is an open-source implementation inspired by Sarvam AI's Saarika model.

	## Model Details

	### Model Description

	- Model type: Speech Recognition and Translation (based on Whisper architecture)
	- Language(s): Hindi (hi), Gujarati (gu), Marathi (mr)
	- License: MIT
	- Base Model: openai/whisper-large-v3

	## Uses

	### Direct Use

	The model can be used for:
	1. Transcribing speech in Hindi, Gujarati, and Marathi
	2. Translating speech between these languages

	Here's a simple example to get started:

	```python
	from transformers import WhisperProcessor, WhisperForConditionalGeneration
	import torch
	import librosa

	model_id = "theharshithh/open-sarika-v1"
	device = "cuda" if torch.cuda.is_available() else "cpu"

	# Load model and processor
	processor = WhisperProcessor.from_pretrained(model_id)
	model = WhisperForConditionalGeneration.from_pretrained(model_id).to(device)
	# Clear any language/task tokens forced by the config so generate() is not pinned to them
	model.config.forced_decoder_ids = None

	# Load and process audio
	audio_path = "your_audio.wav"
	audio, rate = librosa.load(audio_path, sr=16000)

	# Generate transcription
	inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to(device)
	with torch.no_grad():
	    output_ids = model.generate(**inputs)
	transcription = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
	```
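
The example above does not pin the language or task. When the spoken language is known in advance, Whisper's `generate` in recent `transformers` releases accepts `language` and `task` keyword arguments. The helper below is a minimal sketch of how those arguments could be validated before the call; `build_generate_kwargs`, `SUPPORTED_LANGUAGES`, and `VALID_TASKS` are hypothetical names, not part of this repository. The card does not state how this checkpoint responds to `task="translate"`; in stock Whisper that task produces English text.

```python
# Language codes listed in this model card's metadata.
SUPPORTED_LANGUAGES = {"hi": "Hindi", "gu": "Gujarati", "mr": "Marathi"}
# Task names understood by Whisper's generate().
VALID_TASKS = ("transcribe", "translate")


def build_generate_kwargs(language: str, task: str = "transcribe") -> dict:
    """Validate a language/task pair and return kwargs for model.generate()."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported language {language!r}; expected one of {sorted(SUPPORTED_LANGUAGES)}"
        )
    if task not in VALID_TASKS:
        raise ValueError(f"Unsupported task {task!r}; expected one of {VALID_TASKS}")
    return {"language": language, "task": task}


# Usage with the objects from the example above (not executed here):
# output_ids = model.generate(**inputs, **build_generate_kwargs("hi"))
```

Rejecting unknown codes up front gives a clear error instead of a silent fallback to another language.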

	### Training Data

	The model was trained on a variety of datasets, including:
	- Project Vaani dataset: A large-scale Indian language data collection project by the Indian Institute of Science (IISc) in collaboration with ARTPARK, funded by Google
	- High-quality speech recordings in Hindi, Gujarati, and Marathi from AI4Bharat
	- Real-world speech data from various sources

	### Hardware Requirements

	- Minimum RAM: 8GB
	- GPU: Recommended for faster inference
	- Storage: Model size is approximately 1.5GB
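
Since a GPU is recommended but not required, one common pattern is to choose the device and numeric precision at load time. The sketch below is an illustration under assumptions, not the author's setup: `choose_runtime` is a hypothetical helper, and half precision on GPU is a general memory-saving practice whose effect on this model's accuracy has not been measured here.

```python
def choose_runtime(cuda_available: bool) -> dict:
    """Return a device string and a torch dtype name for loading the model."""
    if cuda_available:
        # Half precision roughly halves GPU memory use compared with float32.
        return {"device": "cuda", "torch_dtype": "float16"}
    # Stay in full precision on CPU.
    return {"device": "cpu", "torch_dtype": "float32"}


# Usage sketch (requires torch and transformers; not executed here):
# import torch
# cfg = choose_runtime(torch.cuda.is_available())
# dtype = getattr(torch, cfg["torch_dtype"])
# model = WhisperForConditionalGeneration.from_pretrained(
#     model_id, torch_dtype=dtype
# ).to(cfg["device"])
# inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to(cfg["device"], dtype)
```

If the model is loaded in half precision, the input features need the same dtype, which is why the last line casts `inputs` as well.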

	## Model Card Contact

	For issues and feedback, please create an issue on the model's repository: https://huggingface.co/theharshithh/open-sarika-v1

	## GitHub

	GitHub repository: https://github.com/theharshithh/open-sarika