metadata
library_name: sentence-transformers
pipeline_tag: sentence-similarity
license: apache-2.0
tags:
  - embeddings
  - semantic-search
  - sentence-transformers
  - presentation-templates
  - information-retrieval

Field-adaptive-bi-encoder

Model Details

Model Description

A SentenceTransformers bi-encoder fine-tuned for semantic similarity and information retrieval. It is trained to match user queries against presentation templates using their descriptions and metadata (industries, categories, tags).

Developed by: Mudasir Syed (mudasir13cs)

Model type: SentenceTransformer (Bi-encoder)

Language(s) (NLP): English

License: Apache 2.0

Finetuned from model: microsoft/MiniLM-L12-H384-uncased

Model Sources

Repository: https://github.com/mudasir13cs/hybrid-search

Uses

Direct Use

This model is designed for semantic search and information retrieval tasks, specifically for finding relevant presentation templates based on natural language queries.
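
The retrieval step described above can be sketched as ranking template embeddings by cosine similarity to a query embedding. This is a minimal illustration, not the repository's implementation: the toy 3-dimensional vectors stand in for the model's 384-dimensional embeddings (which would come from `model.encode`), and the template ids and the `rank_templates` helper are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_templates(query_vec, template_vecs, top_k=3):
    """Return (template_id, score) pairs sorted by descending cosine similarity."""
    scored = [(tid, cosine(query_vec, vec)) for tid, vec in template_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for real embeddings; ids are hypothetical.
template_vecs = {
    "pitch-deck": [0.9, 0.1, 0.0],
    "quarterly-report": [0.2, 0.9, 0.1],
    "wedding-invite": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.3, 0.0]

ranking = rank_templates(query_vec, template_vecs, top_k=2)
for template_id, score in ranking:
    print(f"{template_id}: {score:.3f}")
```

Because a bi-encoder embeds queries and templates independently, the template embeddings can be computed once and reused for every incoming query.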

Downstream Use

  • Presentation template recommendation systems
  • Content discovery platforms
  • Semantic search engines
  • Information retrieval systems

Out-of-Scope Use

  • Text generation
  • Question answering
  • Machine translation
  • Any task not related to semantic similarity

Bias, Risks, and Limitations

  • The model is trained on presentation template data and may not generalize well to other domains
  • Performance may vary based on the quality and diversity of training data
  • The model inherits biases present in the base model and training data

How to Get Started with the Model

from sentence_transformers import SentenceTransformer, util

# Load the fine-tuned bi-encoder from the Hugging Face Hub
model = SentenceTransformer("mudasir13cs/Field-adaptive-bi-encoder")

# Encode two example queries into dense embeddings
queries = ["business presentation template", "marketing slides for startups"]
embeddings = model.encode(queries)

# Compute the cosine similarity between the two embeddings
cosine_scores = util.cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {cosine_scores.item():.4f}")

Training Details

Training Data

  • Dataset: Presentation template dataset with descriptions and queries
  • Size: Custom dataset of presentation templates with metadata
  • Source: Curated presentation template collection

Training Procedure

  • Architecture: SentenceTransformer bi-encoder
  • Loss Function: Triplet loss with hard negative mining
  • Optimizer: AdamW
  • Learning Rate: 2e-5
  • Batch Size: 16
  • Epochs: 3
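
For readers unfamiliar with the loss named above, the following is a minimal sketch of a triplet loss combined with a simple form of hard negative mining. The card does not state the distance metric, the margin, or the mining strategy, so the Euclidean distance, the margin of 1.0, and the "closest negative in a candidate pool" rule are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hardest_negative(anchor, negatives):
    """Pick the negative closest to the anchor, i.e. the hardest to separate."""
    return min(negatives, key=lambda neg: euclidean(anchor, neg))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    return max(0.0, gap)


# Toy 2-dimensional embeddings (assumed values, not model outputs).
anchor = [0.0, 0.0]                                  # query embedding
positive = [0.0, 1.0]                                # relevant template
negatives = [[3.0, 4.0], [1.0, 1.0], [0.0, 5.0]]     # irrelevant templates

hard = hardest_negative(anchor, negatives)
loss = triplet_loss(anchor, positive, hard, margin=1.0)
print(f"hardest negative: {hard}, loss: {loss:.4f}")
```

Negatives that are already far from the anchor contribute zero loss, which is why mining the closest ones tends to give a more useful training signal.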

Training Hyperparameters

  • Training regime: Supervised learning with triplet loss
  • Hardware: GPU (NVIDIA)
  • Training time: ~2 hours

Evaluation

Testing Data, Factors & Metrics

  • Testing Data: Validation split from presentation template dataset
  • Factors: Query-description similarity, template relevance
  • Metrics:
    • MAP@K (Mean Average Precision at K)
    • MRR@K (Mean Reciprocal Rank at K)
    • Cosine similarity scores
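
The two ranking metrics listed above can be computed as in the sketch below. Normalization conventions for average precision at K differ between libraries; this version divides by min(K, number of relevant items), which is an assumption and not necessarily the convention used to produce the numbers reported in this card. The template ids are hypothetical.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k):
    """Average of precision values at each rank (<= k) where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    denom = min(len(relevant_ids), k)
    return precision_sum / denom if denom else 0.0


def reciprocal_rank_at_k(ranked_ids, relevant_ids, k):
    """1 / rank of the first relevant item within the top k, or 0 if none."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_over_queries(metric, runs, k):
    """Average a per-query metric over (ranked_ids, relevant_ids) pairs."""
    return sum(metric(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Two toy queries: each pairs a ranked result list with its set of relevant ids.
runs = [
    (["t1", "t2", "t3"], {"t1", "t3"}),
    (["t4", "t5", "t6"], {"t5"}),
]

map_at_3 = mean_over_queries(average_precision_at_k, runs, k=3)
mrr_at_3 = mean_over_queries(reciprocal_rank_at_k, runs, k=3)
print(f"MAP@3: {map_at_3:.4f}  MRR@3: {mrr_at_3:.4f}")
```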

Results

  • MAP@10: ~0.85
  • MRR@10: ~0.90
  • Performance: Optimized for presentation template retrieval

Environmental Impact

  • Hardware Type: NVIDIA GPU
  • Hours used: ~2
  • Cloud Provider: Local/Cloud
  • Carbon Emitted: Minimal (local training)

Technical Specifications

Model Architecture and Objective

  • Architecture: Transformer-based bi-encoder
  • Objective: Learn semantic representations for similarity search
  • Input: Text sequences (queries and descriptions)
  • Output: 384-dimensional embeddings

Compute Infrastructure

  • Hardware: NVIDIA GPU
  • Software: PyTorch, SentenceTransformers, Transformers

Citation

BibTeX:

@misc{field-adaptive-bi-encoder,
  title={Field-adaptive Bi-encoder for Presentation Template Search},
  author={Mudasir Syed},
  year={2024},
  url={https://huggingface.co/mudasir13cs/Field-adaptive-bi-encoder}
}

APA: Syed, M. (2024). Field-adaptive Bi-encoder for Presentation Template Search. Hugging Face. https://huggingface.co/mudasir13cs/Field-adaptive-bi-encoder

Model Card Authors

Mudasir Syed (mudasir13cs)

Model Card Contact

Framework versions

  • SentenceTransformers: 2.2.2
  • Transformers: 4.35.0
  • PyTorch: 2.0.0