---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
license: apache-2.0
tags:
- embeddings
- semantic-search
- sentence-transformers
- presentation-templates
- information-retrieval
---
# Field-adaptive-bi-encoder

## Model Details

### Model Description

A fine-tuned sentence-transformers bi-encoder for semantic similarity and information retrieval. The model is trained specifically to match user queries to relevant presentation templates using their descriptions and metadata (industries, categories, tags).

- Developed by: Mudasir Syed (mudasir13cs)
- Model type: SentenceTransformer (bi-encoder)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: microsoft/MiniLM-L12-H384-uncased

### Model Sources

- Repository: https://github.com/mudasir13cs/hybrid-search

## Uses

### Direct Use

This model is designed for semantic search and information retrieval tasks, specifically for finding relevant presentation templates based on natural language queries.
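
As a minimal sketch of this use case (the template descriptions below are invented placeholders, not training data), the model can rank a small corpus of template descriptions against a query with `util.semantic_search`:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("mudasir13cs/Field-adaptive-bi-encoder")

# Invented template descriptions standing in for a real corpus
templates = [
    "Pitch deck template for startup fundraising",
    "Quarterly sales review slides for enterprise teams",
    "Minimalist portfolio presentation for designers",
]

# Embed the corpus once, then embed each incoming query
corpus_embeddings = model.encode(templates, convert_to_tensor=True)
query_embedding = model.encode("investor presentation for a new startup", convert_to_tensor=True)

# Rank templates by cosine similarity and keep the top 2
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.4f}  {templates[hit['corpus_id']]}")
```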

### Downstream Use

- Presentation template recommendation systems
- Content discovery platforms
- Semantic search engines
- Information retrieval systems

### Out-of-Scope Use

- Text generation
- Question answering
- Machine translation
- Any task not related to semantic similarity

## Bias, Risks, and Limitations

- The model is trained on presentation template data and may not generalize well to other domains
- Performance may vary based on the quality and diversity of training data
- The model inherits biases present in the base model and training data

## How to Get Started with the Model

```python
from sentence_transformers import SentenceTransformer, util

# Load the model from the Hugging Face Hub
model = SentenceTransformer("mudasir13cs/Field-adaptive-bi-encoder")

# Encode text for similarity search
queries = ["business presentation template", "marketing slides for startups"]
embeddings = model.encode(queries)

# Compute cosine similarity between the two query embeddings
cosine_scores = util.cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {cosine_scores.item():.4f}")
```

## Training Details

### Training Data

- Dataset: Custom dataset of presentation templates with descriptions, queries, and metadata
- Source: Curated presentation template collection

### Training Procedure

- Architecture: SentenceTransformer with triplet loss
- Loss Function: Triplet loss with hard negative mining
- Optimizer: AdamW
- Learning Rate: 2e-5
- Batch Size: 16
- Epochs: 3

#### Training Hyperparameters

- Training regime: Supervised learning with triplet loss
- Hardware: GPU (NVIDIA)
- Training time: ~2 hours
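
The exact training script and dataset are not published; as a rough, non-authoritative sketch only, a sentence-transformers loop with `TripletLoss` and the hyperparameters listed above (the triplet texts below are invented placeholders) could look like:

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Base model named above
model = SentenceTransformer("microsoft/MiniLM-L12-H384-uncased")

# Invented (anchor, positive, negative) triplet; the real data pairs user
# queries with relevant and hard-negative template descriptions
train_examples = [
    InputExample(texts=[
        "startup pitch deck",                           # anchor query
        "Pitch deck template for startup fundraising",  # positive description
        "Holiday party invitation slides",              # hard negative
    ]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.TripletLoss(model=model)

# AdamW (the sentence-transformers default) at lr 2e-5 for 3 epochs, per above
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
    optimizer_params={"lr": 2e-5},
)
```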

## Evaluation

### Testing Data, Factors & Metrics

- Testing Data: Validation split from presentation template dataset
- Factors: Query-description similarity, template relevance
- Metrics:
  - MAP@K (Mean Average Precision at K)
  - MRR@K (Mean Reciprocal Rank at K)
  - Cosine similarity scores
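
For instance, MAP@10 and MRR@10 can be computed with the built-in `InformationRetrievalEvaluator`; the toy queries, corpus, and relevance judgments below are invented stand-ins for the held-out split:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("mudasir13cs/Field-adaptive-bi-encoder")

# Toy stand-ins: query id -> text, doc id -> template description,
# query id -> set of relevant doc ids
queries = {"q1": "startup pitch deck"}
corpus = {
    "d1": "Pitch deck template for startup fundraising",
    "d2": "Holiday party invitation slides",
}
relevant_docs = {"q1": {"d1"}}

# Reports MRR@10 and MAP@10 (among other metrics) at the given cutoffs
evaluator = InformationRetrievalEvaluator(
    queries, corpus, relevant_docs,
    mrr_at_k=[10], map_at_k=[10],
)
evaluator(model)
```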

### Results

- MAP@10: ~0.85
- MRR@10: ~0.90
- Scope: Scores reflect in-domain presentation-template retrieval; performance on other domains has not been measured

## Environmental Impact

- Hardware Type: NVIDIA GPU
- Hours used: ~2 hours
- Cloud Provider: Local/Cloud
- Carbon Emitted: Minimal (local training)

## Technical Specifications

### Model Architecture and Objective

- Architecture: Transformer-based bi-encoder
- Objective: Learn semantic representations for similarity search
- Input: Text sequences (queries and descriptions)
- Output: 384-dimensional embeddings
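
A quick way to confirm the 384-dimensional output:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mudasir13cs/Field-adaptive-bi-encoder")

# Both checks should report 384 for this MiniLM-L12-H384-based model
print(model.get_sentence_embedding_dimension())              # 384
print(model.encode("business presentation template").shape)  # (384,)
```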

### Compute Infrastructure

- Hardware: NVIDIA GPU
- Software: PyTorch, SentenceTransformers, Transformers

## Citation

**BibTeX:**

```bibtex
@misc{field-adaptive-bi-encoder,
  title={Field-adaptive Bi-encoder for Presentation Template Search},
  author={Syed, Mudasir},
  year={2024},
  url={https://huggingface.co/mudasir13cs/Field-adaptive-bi-encoder}
}
```

**APA:** Syed, M. (2024). *Field-adaptive Bi-encoder for Presentation Template Search*. Hugging Face. https://huggingface.co/mudasir13cs/Field-adaptive-bi-encoder

## Model Card Authors

Mudasir Syed (mudasir13cs)

## Model Card Contact

- GitHub: https://github.com/mudasir13cs
- Hugging Face: https://huggingface.co/mudasir13cs

## Framework versions

- SentenceTransformers: 2.2.2
- Transformers: 4.35.0
- PyTorch: 2.0.0