---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
license: apache-2.0
tags:
- embeddings
- semantic-search
- sentence-transformers
- presentation-templates
- information-retrieval
---
# Field-adaptive-bi-encoder

## Model Details

### Model Description

A fine-tuned sentence-transformers bi-encoder for semantic similarity and information retrieval. The model is trained specifically to match user queries to relevant presentation templates using their descriptions and metadata (industries, categories, tags).

- Developed by: Mudasir Syed (mudasir13cs)
- Model type: SentenceTransformer (bi-encoder)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: microsoft/MiniLM-L12-H384-uncased

### Model Sources

- Repository: https://github.com/mudasir13cs/hybrid-search

## Uses

### Direct Use

This model is designed for semantic search and information retrieval tasks, specifically for finding relevant presentation templates based on natural language queries.
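
As a minimal sketch of this use case (the template descriptions below are invented placeholders, not training data), the model can rank a small corpus of template descriptions against a query with `util.semantic_search`:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("mudasir13cs/Field-adaptive-bi-encoder")

# Invented template descriptions standing in for a real corpus
templates = [
    "Pitch deck template for startup fundraising",
    "Quarterly sales review slides for enterprise teams",
    "Minimalist portfolio presentation for designers",
]

# Embed the corpus once, then embed each incoming query
corpus_embeddings = model.encode(templates, convert_to_tensor=True)
query_embedding = model.encode("investor presentation for a new startup", convert_to_tensor=True)

# Rank templates by cosine similarity and keep the top 2
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.4f}  {templates[hit['corpus_id']]}")
```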

### Downstream Use

- Presentation template recommendation systems
- Content discovery platforms
- Semantic search engines
- Information retrieval systems

### Out-of-Scope Use

- Text generation
- Question answering
- Machine translation
- Any task not related to semantic similarity

## Bias, Risks, and Limitations

- The model is trained on presentation template data and may not generalize well to other domains
- Performance may vary based on the quality and diversity of training data
- The model inherits biases present in the base model and training data

## How to Get Started with the Model

```python
from sentence_transformers import SentenceTransformer, util

# Load the model from the Hugging Face Hub
model = SentenceTransformer("mudasir13cs/Field-adaptive-bi-encoder")

# Encode text for similarity search
queries = ["business presentation template", "marketing slides for startups"]
embeddings = model.encode(queries)

# Compute cosine similarity between the two query embeddings
cosine_scores = util.cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {cosine_scores.item():.4f}")
```

## Training Details

### Training Data

- Dataset: Custom dataset of presentation templates with descriptions, queries, and metadata
- Source: Curated presentation template collection

### Training Procedure

- Architecture: SentenceTransformer with triplet loss
- Loss Function: Triplet loss with hard negative mining
- Optimizer: AdamW
- Learning Rate: 2e-5
- Batch Size: 16
- Epochs: 3

#### Training Hyperparameters

- Training regime: Supervised learning with triplet loss
- Hardware: GPU (NVIDIA)
- Training time: ~2 hours
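
The exact training script and dataset are not published; as a rough, non-authoritative sketch only, a sentence-transformers loop with `TripletLoss` and the hyperparameters listed above (the triplet texts below are invented placeholders) could look like:

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Base model named above
model = SentenceTransformer("microsoft/MiniLM-L12-H384-uncased")

# Invented (anchor, positive, negative) triplet; the real data pairs user
# queries with relevant and hard-negative template descriptions
train_examples = [
    InputExample(texts=[
        "startup pitch deck",                           # anchor query
        "Pitch deck template for startup fundraising",  # positive description
        "Holiday party invitation slides",              # hard negative
    ]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.TripletLoss(model=model)

# AdamW (the sentence-transformers default) at lr 2e-5 for 3 epochs, per above
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
    optimizer_params={"lr": 2e-5},
)
```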

## Evaluation

### Testing Data, Factors & Metrics

- Testing Data: Validation split from presentation template dataset
- Factors: Query-description similarity, template relevance
- Metrics:
  - MAP@K (Mean Average Precision at K)
  - MRR@K (Mean Reciprocal Rank at K)
  - Cosine similarity scores
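
For instance, MAP@10 and MRR@10 can be computed with the built-in `InformationRetrievalEvaluator`; the toy queries, corpus, and relevance judgments below are invented stand-ins for the held-out split:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("mudasir13cs/Field-adaptive-bi-encoder")

# Toy stand-ins: query id -> text, doc id -> template description,
# query id -> set of relevant doc ids
queries = {"q1": "startup pitch deck"}
corpus = {
    "d1": "Pitch deck template for startup fundraising",
    "d2": "Holiday party invitation slides",
}
relevant_docs = {"q1": {"d1"}}

# Reports MRR@10 and MAP@10 (among other metrics) at the given cutoffs
evaluator = InformationRetrievalEvaluator(
    queries, corpus, relevant_docs,
    mrr_at_k=[10], map_at_k=[10],
)
evaluator(model)
```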

### Results

- MAP@10: ~0.85
- MRR@10: ~0.90
- Scope: Scores reflect in-domain presentation-template retrieval; performance on other domains has not been measured

## Environmental Impact

- Hardware Type: NVIDIA GPU
- Hours used: ~2 hours
- Cloud Provider: Local/Cloud
- Carbon Emitted: Minimal (local training)

## Technical Specifications

### Model Architecture and Objective

- Architecture: Transformer-based bi-encoder
- Objective: Learn semantic representations for similarity search
- Input: Text sequences (queries and descriptions)
- Output: 384-dimensional embeddings
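
A quick way to confirm the 384-dimensional output:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mudasir13cs/Field-adaptive-bi-encoder")

# Both checks should report 384 for this MiniLM-L12-H384-based model
print(model.get_sentence_embedding_dimension())              # 384
print(model.encode("business presentation template").shape)  # (384,)
```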

### Compute Infrastructure

- Hardware: NVIDIA GPU
- Software: PyTorch, SentenceTransformers, Transformers

## Citation

**BibTeX:**

```bibtex
@misc{field-adaptive-bi-encoder,
  title={Field-adaptive Bi-encoder for Presentation Template Search},
  author={Syed, Mudasir},
  year={2024},
  url={https://huggingface.co/mudasir13cs/Field-adaptive-bi-encoder}
}
```

**APA:** Syed, M. (2024). *Field-adaptive Bi-encoder for Presentation Template Search*. Hugging Face. https://huggingface.co/mudasir13cs/Field-adaptive-bi-encoder

## Model Card Authors

Mudasir Syed (mudasir13cs)

## Model Card Contact

- GitHub: https://github.com/mudasir13cs
- Hugging Face: https://huggingface.co/mudasir13cs

## Framework versions

- SentenceTransformers: 2.2.2
- Transformers: 4.35.0
- PyTorch: 2.0.0