---
language: hi
license: mit
tags:
- hindi
- embeddings
- sentence-embeddings
- semantic-search
- text-similarity
datasets:
- custom
pipeline_tag: sentence-similarity
library_name: transformers
---
					
						
# Hindi Sentence Embeddings Model

This is a custom sentence embedding model trained specifically for Hindi text. It uses a transformer architecture with multiple pooling strategies to produce semantic vector representations of Hindi sentences.
					
						
## Features

- Specialized for Hindi-language text
- Transformer architecture with pre-layer normalization and relative positional encoding
- Multiple pooling strategies (weighted, mean, attention-based)
- Produces L2-normalized vectors suitable for cosine similarity
- Supports semantic search and text similarity applications
					
						
## Usage

### Installation

```bash
pip install torch sentencepiece scikit-learn matplotlib
git lfs install
git clone https://huggingface.co/DeepMostInnovations/hindi-embedding-foundational-model
cd hindi-embedding-foundational-model
```
					
						
### Quick Start

```python
from hindi_embeddings import HindiEmbedder

# Initialize the embedder
model = HindiEmbedder("path/to/hindi-embedding-foundational-model")

# Encode sentences to embeddings
sentences = [
    "मुझे हिंदी भाषा बहुत पसंद है।",
    "मैं हिंदी भाषा सीख रहा हूँ।"
]
embeddings = model.encode(sentences)
print(f"Embedding shape: {embeddings.shape}")

# Compute similarity between sentences
similarity = model.compute_similarity(sentences[0], sentences[1])
print(f"Similarity: {similarity:.4f}")

# Perform semantic search
query = "भारत की राजधानी"
documents = [
    "दिल्ली भारत की राजधानी है।",
    "मुंबई भारत का सबसे बड़ा शहर है।",
    "हिमालय पर्वत भारत के उत्तर में स्थित है।"
]
results = model.search(query, documents)
for i, result in enumerate(results):
    print(f"{i+1}. Score: {result['score']:.4f}")
    print(f"   Document: {result['document']}")

# Visualize embeddings
example_sentences = [
    "मुझे हिंदी में पढ़ना बहुत पसंद है।",
    "आज मौसम बहुत अच्छा है।",
    "भारत एक विशाल देश है।"
]
model.visualize_embeddings(example_sentences)
```
					
						
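Because this model card states that the embeddings are L2-normalized, the cosine similarity between two embeddings reduces to a plain dot product. The sketch below illustrates that scoring and ranking step with NumPy on small stand-in vectors. `l2_normalize` and `rank_documents` are hypothetical helpers written for this example, not part of `hindi_embeddings`, and the repository's own `compute_similarity` and `search` may be implemented differently.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (guarding against zero vectors)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def rank_documents(query_vec: np.ndarray, doc_vecs: np.ndarray, top_k: int = 3):
    """Return (index, score) pairs sorted by descending cosine similarity.

    Hypothetical helper: assumes both inputs are already L2-normalized,
    so the dot product equals the cosine similarity.
    """
    scores = doc_vecs @ query_vec
    order = np.argsort(-scores)[:top_k]
    return [(int(i), float(scores[i])) for i in order]


# Stand-in 4-dimensional vectors; real embeddings from this model have 768 dimensions.
docs = l2_normalize(np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]))
query = l2_normalize(np.array([[1.0, 0.1, 0.0, 0.0]]))[0]

ranked = rank_documents(query, docs)
for idx, score in ranked:
    print(f"doc {idx}: {score:.4f}")
```

In practice, the rows of `docs` would come from `model.encode(documents)` and `query` from encoding the query string, assuming `encode` returns an array-like of shape `(n, 768)`.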
					
						
## Model Details

This model uses a transformer-based architecture with the following enhancements:

- Pre-layer normalization for stable training
- Specialized attention mechanism with relative positional encoding
- Multiple pooling strategies (weighted, mean, attention-based)
- L2-normalized vectors for cosine similarity
					
						
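To make the pooling and normalization steps more concrete, here is a minimal NumPy sketch of masked mean pooling followed by L2 normalization. It covers only the mean strategy. The weighted and attention-based variants, and how the model combines them, are not described in this card, so `mean_pool`, `l2_normalize`, and the toy shapes are assumptions made for illustration.

```python
import numpy as np


def mean_pool(token_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sentence, ignoring padded positions.

    token_states:   (batch, seq_len, hidden) per-token outputs.
    attention_mask: (batch, seq_len), 1 for real tokens and 0 for padding.
    """
    mask = attention_mask[:, :, None].astype(token_states.dtype)
    summed = (token_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


# Toy batch: 2 sentences, 3 token positions, hidden size 4 (the model uses 768).
token_states = np.array([
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]],
    [[2.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]],
])
# The last position of the first sentence is padding and must not affect the result.
attention_mask = np.array([[1, 1, 0], [1, 1, 1]])

sentence_embeddings = l2_normalize(mean_pool(token_states, attention_mask))
print(sentence_embeddings.shape)
```

Masking before averaging matters: without it, the padded position in the first sentence would dominate the pooled vector.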
					
						
Technical specifications:
- Embedding dimension: 768
- Hidden dimension: 768
- Layers: 12
- Attention heads: 12
- Vocabulary size: 50,000
- Context length: 128 tokens
					
						
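The specifications above imply some rough sizing numbers that are useful when planning a search index. The short calculation below is a sketch under two stated assumptions: that the token embedding table has shape vocabulary size × embedding dimension, and that sentence vectors are stored as 32-bit floats. Neither is confirmed by this card, and `index_size_bytes` is a hypothetical helper.

```python
# Values taken from the technical specifications above.
EMBEDDING_DIM = 768
VOCAB_SIZE = 50_000

# Assumption: vectors are stored as 32-bit floats (4 bytes per value).
BYTES_PER_FLOAT32 = 4


def index_size_bytes(num_sentences: int) -> int:
    """Raw storage needed for `num_sentences` dense sentence vectors."""
    return num_sentences * EMBEDDING_DIM * BYTES_PER_FLOAT32


# Assumption: one embedding row per vocabulary entry.
token_embedding_params = VOCAB_SIZE * EMBEDDING_DIM

print(f"Token embedding table: {token_embedding_params:,} parameters")
print(f"One sentence vector: {index_size_bytes(1):,} bytes")
print(f"1,000,000 sentence vectors: {index_size_bytes(1_000_000) / 1e9:.2f} GB")
```

These figures exclude index overhead and any compression or quantization, so they are best read as a baseline for uncompressed float32 storage.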
					
						
## Applications

- Semantic search and information retrieval
- Text clustering and categorization
- Recommendation systems
- Question answering
- Document similarity comparison
- Content-based filtering
					
						
## License

This model is released under the MIT License.
					
						
## Citation

If you use this model in your research or application, please cite us:

```bibtex
@misc{DeepMostInnovations2025hindi,
  author = {DeepMost Innovations},
  title = {Hindi Sentence Embeddings Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/DeepMostInnovations/hindi-embedding-foundational-model}}
}
```