A comprehensive dataset collection for Indic language information retrieval.
AI4Bharat
non-profit
Verified
Collection of Parler-TTS models adapted to Indian languages.
ELAICHI: Enhancing Low-resource TTS by Addressing Infrequent and Low-frequency Character Bigrams.
A collection of ASR models for 22 scheduled languages of India
- ai4bharat/indicconformer_stt_as_hybrid_ctc_rnnt_large: Automatic Speech Recognition • 43 • 4
- ai4bharat/indicconformer_stt_bn_hybrid_ctc_rnnt_large: Automatic Speech Recognition • 72 • 1
- ai4bharat/indicconformer_stt_brx_hybrid_ctc_rnnt_large: Automatic Speech Recognition • 2
- ai4bharat/indicconformer_stt_doi_hybrid_ctc_rnnt_large: Automatic Speech Recognition • 2
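The ASR model IDs listed above appear to share one naming pattern, with a short language code between `stt_` and `_hybrid`. The following is a minimal sketch of how that code could be extracted, assuming the pattern inferred from these four IDs holds; the helper name `parse_indicconformer_id` and the `LANGUAGE_NAMES` table are hypothetical and cover only the codes shown here.

```python
import re

# ISO 639 codes that appear in the four IDs listed above.
LANGUAGE_NAMES = {
    "as": "Assamese",
    "bn": "Bengali",
    "brx": "Bodo",
    "doi": "Dogri",
}

# Pattern inferred from the listed IDs only (an assumption);
# other repositories in the collection may not follow it.
_ID_PATTERN = re.compile(
    r"^ai4bharat/indicconformer_stt_(?P<lang>[a-z]{2,3})_hybrid_ctc_rnnt_large$"
)


def parse_indicconformer_id(repo_id: str) -> dict:
    """Split an IndicConformer repo ID into its language code and name."""
    match = _ID_PATTERN.match(repo_id)
    if match is None:
        raise ValueError(f"ID does not match the assumed pattern: {repo_id!r}")
    code = match.group("lang")
    return {
        "repo_id": repo_id,
        "lang_code": code,
        # Codes outside the small table above are reported as "unknown".
        "language": LANGUAGE_NAMES.get(code, "unknown"),
    }


if __name__ == "__main__":
    info = parse_indicconformer_id(
        "ai4bharat/indicconformer_stt_brx_hybrid_ctc_rnnt_large"
    )
    print(info["lang_code"], info["language"])
```

The table would need one entry per language to cover the full collection; codes that are not listed fall back to "unknown" rather than raising.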
A collection of benchmarks used to evaluate Airavata, a Hindi instruction-tuned model built on top of Sarvam's OpenHathi base model.
IndicXTREME is a human-supervised benchmark of 9 diverse NLU tasks across 20 languages, featuring 105 evaluation sets in total.
IndicNLG Benchmark is a dataset collection designed for benchmarking Natural Language Generation (NLG) across 11 Indic languages.
Romansetu is a collection of models that address the challenge of extending Large Language Models (LLMs) to non-English languages that use non-Latin scripts.
A Speech Translation Dataset for 13 Indian Languages
Hercule series of Evaluation models
The largest collection of pretraining and instruction fine-tuning datasets for 22 Indic languages.
Models (En-Indic, Indic-En, Indic-Indic) in 2 variants (base and dist) and benchmarks (IN22-Gen and IN22-Conv) released as part of IndicTrans2.
- ai4bharat/indictrans2-en-indic-1B: Translation • 1B • 39.3k • 34
- ai4bharat/indictrans2-en-indic-dist-200M: Translation • 0.3B • 76.2k • 17
- ai4bharat/indictrans2-indic-en-1B: Translation • 1B • 64.1k • 21
- ai4bharat/indictrans2-indic-en-dist-200M: Translation • 0.2B • 5.09k • 5
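The four IndicTrans2 checkpoints listed above differ by translation direction and by variant. A small lookup such as the one below could be used to pick a repo ID. This is only a sketch: the function name `pick_checkpoint` is hypothetical, mapping the `1B` IDs to the "base" variant is an assumption, and Indic-Indic is omitted because no such ID appears in this listing.

```python
# (direction, variant) -> (repo ID, size as shown in the listing above).
# Treating the "-1B" repositories as the "base" variant is an assumption.
INDICTRANS2_CHECKPOINTS = {
    ("en-indic", "base"): ("ai4bharat/indictrans2-en-indic-1B", "1B"),
    ("en-indic", "dist"): ("ai4bharat/indictrans2-en-indic-dist-200M", "0.3B"),
    ("indic-en", "base"): ("ai4bharat/indictrans2-indic-en-1B", "1B"),
    ("indic-en", "dist"): ("ai4bharat/indictrans2-indic-en-dist-200M", "0.2B"),
}


def pick_checkpoint(direction: str, variant: str = "dist") -> str:
    """Return the repo ID for a translation direction and model variant."""
    key = (direction.lower(), variant.lower())
    if key not in INDICTRANS2_CHECKPOINTS:
        known = sorted(INDICTRANS2_CHECKPOINTS)
        raise KeyError(f"No listed checkpoint for {key}; known keys: {known}")
    repo_id, _size = INDICTRANS2_CHECKPOINTS[key]
    return repo_id


if __name__ == "__main__":
    print(pick_checkpoint("en-indic"))
    print(pick_checkpoint("Indic-En", "base"))
```

The sizes are copied from the listing as shown, including the 0.3B figure reported for the repository whose name ends in `200M`. Loading the returned repo ID is outside the scope of this sketch.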
IndicBERT v2 is a multilingual BERT model pretrained on IndicCorp v2, an Indic monolingual corpus of 20.9 billion tokens, covering 24 constitutionally