---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
license: apache-2.0
tags:
- embeddings
- semantic-search
- sentence-transformers
- presentation-templates
- information-retrieval
---

# Field-adaptive-bi-encoder

## Model Details

### Model Description
A fine-tuned SentenceTransformers bi-encoder for semantic similarity and information retrieval. The model is trained specifically to find relevant presentation templates from user queries, template descriptions, and metadata (industries, categories, tags).

**Developed by:** Mudasir Syed (mudasir13cs)

**Model type:** SentenceTransformer (Bi-encoder)

**Language(s) (NLP):** English

**License:** Apache 2.0

**Finetuned from model:** microsoft/MiniLM-L12-H384-uncased

### Model Sources
**Repository:** https://github.com/mudasir13cs/hybrid-search

## Uses

### Direct Use
This model is designed for semantic search and information retrieval tasks, specifically for finding relevant presentation templates based on natural language queries.

### Downstream Use
- Presentation template recommendation systems
- Content discovery platforms
- Semantic search engines
- Information retrieval systems

### Out-of-Scope Use
- Text generation
- Question answering
- Machine translation
- Any task not related to semantic similarity

## Bias, Risks, and Limitations
- The model is trained on presentation template data and may not generalize well to other domains
- Performance may vary based on the quality and diversity of training data
- The model inherits biases present in the base model and training data

## How to Get Started with the Model

```python
from sentence_transformers import SentenceTransformer, util

# Load the model
model = SentenceTransformer("mudasir13cs/Field-adaptive-bi-encoder")

# Encode text for similarity search
queries = ["business presentation template", "marketing slides for startups"]
embeddings = model.encode(queries)

# Compute cosine similarity between the two encoded queries
cosine_scores = util.cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {cosine_scores.item():.4f}")
```

## Training Details

### Training Data
- **Dataset:** Presentation template dataset with descriptions and queries
- **Size:** Custom dataset of presentation templates with metadata
- **Source:** Curated presentation template collection

### Training Procedure
- **Architecture:** SentenceTransformer with triplet loss
- **Loss Function:** Triplet loss with hard negative mining
- **Optimizer:** AdamW
- **Learning Rate:** 2e-5
- **Batch Size:** 16
- **Epochs:** 3

### Training Hyperparameters
- **Training regime:** Supervised learning with triplet loss
- **Hardware:** GPU (NVIDIA)
- **Training time:** ~2 hours

## Evaluation

### Testing Data, Factors & Metrics
- **Testing Data:** Validation split from presentation template dataset
- **Factors:** Query-description similarity, template relevance
- **Metrics:** 
  - MAP@K (Mean Average Precision at K)
  - MRR@K (Mean Reciprocal Rank at K)
  - Cosine similarity scores
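
For reference, MRR@K rewards placing the first relevant template high in the ranking. A small self-contained sketch of the metric (the relevance flags below are illustrative):

```python
def mrr_at_k(ranked_relevance, k=10):
    """Mean Reciprocal Rank at K.

    ranked_relevance: one list per query of 0/1 relevance flags,
    ordered by the model's ranking (best hit first).
    """
    total = 0.0
    for flags in ranked_relevance:
        for rank, rel in enumerate(flags[:k], start=1):
            if rel:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_relevance)

# Query 1 finds a relevant template at rank 1, query 2 at rank 2
print(mrr_at_k([[1, 0, 0], [0, 1, 0]]))  # (1.0 + 0.5) / 2 = 0.75
```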

### Results
- **MAP@10:** ~0.85
- **MRR@10:** ~0.90
- **Performance:** Optimized for presentation template retrieval

## Environmental Impact
- **Hardware Type:** NVIDIA GPU
- **Hours used:** ~2 hours
- **Cloud Provider:** Local/Cloud
- **Carbon Emitted:** Minimal (local training)

## Technical Specifications

### Model Architecture and Objective
- **Architecture:** Transformer-based bi-encoder
- **Objective:** Learn semantic representations for similarity search
- **Input:** Text sequences (queries and descriptions)
- **Output:** 384-dimensional embeddings

### Compute Infrastructure
- **Hardware:** NVIDIA GPU
- **Software:** PyTorch, SentenceTransformers, Transformers

## Citation

**BibTeX:**
```bibtex
@misc{field-adaptive-bi-encoder,
  title={Field-adaptive Bi-encoder for Presentation Template Search},
  author={Mudasir Syed},
  year={2024},
  url={https://huggingface.co/mudasir13cs/Field-adaptive-bi-encoder}
}
```

**APA:**
Syed, M. (2024). Field-adaptive Bi-encoder for Presentation Template Search. Hugging Face. https://huggingface.co/mudasir13cs/Field-adaptive-bi-encoder

## Model Card Authors
Mudasir Syed (mudasir13cs)

## Model Card Contact
- **GitHub:** https://github.com/mudasir13cs
- **Hugging Face:** https://huggingface.co/mudasir13cs

## Framework versions
- SentenceTransformers: 2.2.2
- Transformers: 4.35.0
- PyTorch: 2.0.0