---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
license: apache-2.0
tags:
- embeddings
- semantic-search
- sentence-transformers
- presentation-templates
- information-retrieval
---

# Field-adaptive-bi-encoder

## Model Details

### Model Description

A SentenceTransformers bi-encoder fine-tuned for semantic similarity and information retrieval. It is trained specifically to find presentation templates relevant to a user query, using template descriptions and metadata (industries, categories, and tags).
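
To index your own templates, the description and metadata usually need to be combined into one string per template before encoding. This card does not document the text format used during training, so the following is only a sketch under assumptions: the record fields (`title`, `description`, `industries`, `categories`, `tags`), the `template_to_text` helper, and the `Field: value` layout are all hypothetical.

```python
def template_to_text(template: dict) -> str:
    """Flatten a template record into one string for model.encode().

    Hypothetical layout: the field names and the "Field: value" format are
    assumptions, not the format used to train this model.
    """
    parts = []
    # Free-text fields are kept as-is.
    for key in ("title", "description"):
        value = template.get(key)
        if value:
            parts.append(str(value).strip())
    # List-valued metadata fields become "Field: a, b, c".
    for key in ("industries", "categories", "tags"):
        values = template.get(key) or []
        if values:
            parts.append(f"{key.capitalize()}: {', '.join(values)}")
    return ". ".join(parts)


example = {
    "title": "Startup Pitch Deck",
    "description": "Clean slides for investor pitches",
    "industries": ["Technology", "Finance"],
    "tags": ["pitch", "startup"],
}
print(template_to_text(example))
# Startup Pitch Deck. Clean slides for investor pitches. Industries: Technology, Finance. Tags: pitch, startup
```

The resulting strings can be passed to `model.encode` as the template (corpus) side of a search; if the repository defines its own format, prefer that.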

**Developed by:** Mudasir Syed (mudasir13cs)

**Model type:** SentenceTransformer (Bi-encoder)

**Language(s) (NLP):** English

**License:** Apache 2.0

**Fine-tuned from model:** microsoft/MiniLM-L12-H384-uncased

### Model Sources

**Repository:** https://github.com/mudasir13cs/hybrid-search

## Uses

### Direct Use

This model is designed for semantic search and information retrieval, specifically for finding presentation templates that match natural-language queries.
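
Retrieval with a bi-encoder comes down to encoding the query and each template independently, then ranking templates by cosine similarity. The sketch below shows only the ranking step in plain Python so it runs without downloading the model; the helpers `cosine` and `top_k` are illustrative names, and the toy 3-dimensional vectors stand in for the model's 384-dimensional `model.encode` output.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query_emb, template_embs, k: int = 3) -> List[Tuple[int, float]]:
    """Return (template index, score) pairs for the k most similar templates."""
    scores = [(i, cosine(query_emb, emb)) for i, emb in enumerate(template_embs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy vectors (assumption): placeholders for real 384-dimensional embeddings.
query = [1.0, 0.0, 1.0]
templates = [
    [0.9, 0.1, 0.8],  # close to the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 0.0, 0.0],  # partially similar
]
print(top_k(query, templates, k=2))  # ranks template 0 first, then template 2
```

For real workloads, `sentence_transformers.util.semantic_search(query_embeddings, corpus_embeddings, top_k=...)` performs the same cosine ranking on tensors and is better suited to large template collections.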

### Downstream Use

- Presentation template recommendation systems
- Content discovery platforms
- Semantic search engines
- Information retrieval systems

### Out-of-Scope Use

- Text generation
- Question answering
- Machine translation
- Any task not related to semantic similarity

## Bias, Risks, and Limitations

- The model is trained on presentation template data and may not generalize well to other domains
- Performance may vary based on the quality and diversity of training data
- The model inherits biases present in the base model and training data

## How to Get Started with the Model

```python
from sentence_transformers import SentenceTransformer, util

# Load the model from the Hugging Face Hub
model = SentenceTransformer("mudasir13cs/Field-adaptive-bi-encoder")

# Encode two queries into 384-dimensional embeddings
queries = ["business presentation template", "marketing slides for startups"]
embeddings = model.encode(queries)

# Compute the cosine similarity between the two embeddings
cosine_scores = util.cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {cosine_scores.item():.4f}")
```

## Training Details

### Training Data

- **Dataset:** Presentation template dataset with descriptions and queries
- **Size:** Custom dataset of presentation templates with metadata
- **Source:** Curated presentation template collection

### Training Procedure

- **Architecture:** SentenceTransformer bi-encoder
- **Loss Function:** Triplet loss with hard negative mining
- **Optimizer:** AdamW
- **Learning Rate:** 2e-5
- **Batch Size:** 16
- **Epochs:** 3
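
The card states that training used triplet loss with hard negative mining but does not give the distance function, margin, or mining strategy. The sketch below illustrates the general technique under assumed choices: Euclidean distance, an illustrative margin of 1.0, and mining the candidate negative closest to the anchor. The helper names are hypothetical and this is not the training code from the repository.

```python
import math
from typing import List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hardest_negative(anchor: Sequence[float], negatives: List[Sequence[float]]) -> Sequence[float]:
    """Hard negative mining (assumed strategy): pick the non-relevant
    embedding that lies closest to the anchor."""
    return min(negatives, key=lambda neg: euclidean(anchor, neg))


def triplet_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """max(d(a, p) - d(a, n) + margin, 0): zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


# Toy 2-D embeddings: anchor = query, positive = matching template description.
anchor = [0.0, 0.0]
positive = [0.5, 0.0]
candidates = [[3.0, 0.0], [0.0, 0.75], [2.0, 2.0]]

negative = hardest_negative(anchor, candidates)
print(negative)                                     # [0.0, 0.75]
print(triplet_loss(anchor, positive, negative))     # 0.75
print(triplet_loss(anchor, positive, [3.0, 0.0]))   # 0.0 (easy negative)
```

With these toy values the distant negative `[3.0, 0.0]` already gives zero loss, while the mined negative still produces a training signal, which is the motivation for hard negative mining.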

### Training Hyperparameters

- **Training regime:** Supervised learning with triplet loss
- **Hardware:** GPU (NVIDIA)
- **Training time:** ~2 hours

## Evaluation

### Testing Data, Factors & Metrics

- **Testing Data:** Validation split from the presentation template dataset
- **Factors:** Query-description similarity, template relevance
- **Metrics:**
  - MAP@K (Mean Average Precision at K)
  - MRR@K (Mean Reciprocal Rank at K)
  - Cosine similarity scores
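
The card reports MAP@10 and MRR@10 without giving exact formulas, and MAP@K in particular has several normalization conventions. The following plain-Python sketch shows one common definition (AP@K divided by min(|relevant|, K)); the function names and toy template IDs are illustrative, so the reported numbers may not be reproducible with it if the original evaluation used another convention.

```python
from typing import Dict, List, Sequence, Set, Tuple


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """AP@K, normalized by min(|relevant|, k) (one common convention)."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(len(relevant), k)


def reciprocal_rank_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1 / rank of the first relevant result within the top k, else 0."""
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def map_and_mrr_at_k(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10) -> Tuple[float, float]:
    """Average AP@K and RR@K over all queries in `runs`."""
    if not runs:
        return 0.0, 0.0
    aps = [average_precision_at_k(runs[q], qrels.get(q, set()), k) for q in runs]
    rrs = [reciprocal_rank_at_k(runs[q], qrels.get(q, set()), k) for q in runs]
    return sum(aps) / len(aps), sum(rrs) / len(rrs)


# Toy example with hypothetical template IDs.
runs = {
    "q1": ["t3", "t1", "t7"],  # relevant t1 at rank 2
    "q2": ["t2", "t5", "t9"],  # relevant t2 at rank 1, t9 at rank 3
}
qrels = {"q1": {"t1"}, "q2": {"t2", "t9"}}
print(map_and_mrr_at_k(runs, qrels, k=10))  # MAP@10 about 0.667, MRR@10 = 0.75
```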

### Results

- **MAP@10:** ~0.85
- **MRR@10:** ~0.90
- **Performance:** Optimized for presentation template retrieval

## Environmental Impact

- **Hardware Type:** NVIDIA GPU
- **Hours used:** ~2 hours
- **Cloud Provider:** Local/Cloud
- **Carbon Emitted:** Minimal (local training)

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** Transformer-based bi-encoder
- **Objective:** Learn semantic representations for similarity search
- **Input:** Text sequences (queries and descriptions)
- **Output:** 384-dimensional embeddings

### Compute Infrastructure

- **Hardware:** NVIDIA GPU
- **Software:** PyTorch, SentenceTransformers, Transformers

## Citation

**BibTeX:**

```bibtex
@misc{field-adaptive-bi-encoder,
  title={Field-adaptive Bi-encoder for Presentation Template Search},
  author={Mudasir Syed},
  year={2024},
  url={https://huggingface.co/mudasir13cs/Field-adaptive-bi-encoder}
}
```

**APA:**

Syed, M. (2024). *Field-adaptive Bi-encoder for Presentation Template Search*. Hugging Face. https://huggingface.co/mudasir13cs/Field-adaptive-bi-encoder

## Model Card Authors

Mudasir Syed (mudasir13cs)

## Model Card Contact

- **GitHub:** https://github.com/mudasir13cs
- **Hugging Face:** https://huggingface.co/mudasir13cs

## Framework Versions

- SentenceTransformers: 2.2.2
- Transformers: 4.35.0
- PyTorch: 2.0.0