Upload folder using huggingface_hub
- README.md +23 -1
- SETUP.md +1 -1
- USAGE_EXAMPLES.md +1 -1
README.md
CHANGED
|
@@ -125,6 +125,26 @@ model-index:
|
|
| 125 |
|
| 126 |
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nomic-ai/nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) specifically for **Indonesian language** text embedding tasks. It maps Indonesian sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
|
| 127 |
|
| 128 |
## 🇮🇩 **Specialized for Indonesian Language**
|
| 129 |
|
| 130 |
This model is optimized for Indonesian text understanding across multiple domains including:
|
|
@@ -175,12 +195,14 @@ First install the Sentence Transformers library:
|
|
| 175 |
pip install -U sentence-transformers
|
| 176 |
```
|
| 177 |
|
| 178 |
Then you can load this model and run inference.
|
| 179 |
```python
|
| 180 |
from sentence_transformers import SentenceTransformer
|
| 181 |
|
| 182 |
# Download from the 🤗 Hub
|
| 183 |
-
model = SentenceTransformer("asmud/nomic-embed-indonesian")
|
| 184 |
# Run inference with Indonesian text
|
| 185 |
sentences = [
|
| 186 |
'search_query: Apa itu kecerdasan buatan?',
|
|
|
|
| 125 |
|
| 126 |
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nomic-ai/nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) specifically for **Indonesian language** text embedding tasks. It maps Indonesian sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
|
| 127 |
|
| 128 |
+
## 🚀 Quick Start
|
| 129 |
+
|
| 130 |
+
```python
|
| 131 |
+
from sentence_transformers import SentenceTransformer
|
| 132 |
+
|
| 133 |
+
# Load the model (requires trust_remote_code=True)
|
| 134 |
+
model = SentenceTransformer("asmud/nomic-embed-indonesian", trust_remote_code=True)
|
| 135 |
+
|
| 136 |
+
# Indonesian text examples
|
| 137 |
+
texts = [
|
| 138 |
+
"search_query: Apa itu kecerdasan buatan?",
|
| 139 |
+
"search_document: Kecerdasan buatan adalah teknologi yang memungkinkan mesin belajar",
|
| 140 |
+
"classification: Produk ini sangat berkualitas (sentimen: positif)"
|
| 141 |
+
]
|
| 142 |
+
|
| 143 |
+
# Generate embeddings
|
| 144 |
+
embeddings = model.encode(texts)
|
| 145 |
+
print(f"Embedding shape: {embeddings.shape}") # (3, 768)
|
| 146 |
+
```
|
| 147 |
+
|
| 148 |
## 🇮🇩 **Specialized for Indonesian Language**
|
| 149 |
|
| 150 |
This model is optimized for Indonesian text understanding across multiple domains including:
|
|
|
|
| 195 |
pip install -U sentence-transformers
|
| 196 |
```
|
| 197 |
|
| 198 |
+
⚠️ **Important**: This model requires `trust_remote_code=True` because it uses a custom model architecture.
|
| 199 |
+
|
| 200 |
Then you can load this model and run inference.
|
| 201 |
```python
|
| 202 |
from sentence_transformers import SentenceTransformer
|
| 203 |
|
| 204 |
# Download from the 🤗 Hub
|
| 205 |
+
model = SentenceTransformer("asmud/nomic-embed-indonesian", trust_remote_code=True)
|
| 206 |
# Run inference with Indonesian text
|
| 207 |
sentences = [
|
| 208 |
'search_query: Apa itu kecerdasan buatan?',
|
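The README examples in this diff tag every input with a task prefix (`search_query:`, `search_document:`, `classification:`). A minimal sketch of a helper that applies those prefixes consistently might look like the following. `TASK_PREFIXES` and `with_prefix` are hypothetical names introduced here for illustration, and only the three prefixes that appear in the diff are listed.

```python
# Hypothetical helper, not part of the repository: it only formats strings
# and does not load the model.
TASK_PREFIXES = ("search_query", "search_document", "classification")


def with_prefix(task: str, text: str) -> str:
    """Return `text` tagged with a task prefix, e.g. 'search_query: ...'."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task prefix: {task!r}")
    return f"{task}: {text.strip()}"


texts = [
    with_prefix("search_query", "Apa itu kecerdasan buatan?"),
    with_prefix(
        "search_document",
        "Kecerdasan buatan adalah teknologi yang memungkinkan mesin belajar",
    ),
]
print(texts[0])
```

The resulting list could then be passed to `model.encode(texts)` in the same way as the Quick Start snippet.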
SETUP.md
CHANGED
|
@@ -74,7 +74,7 @@ After uploading, verify the model works:
|
|
| 74 |
from sentence_transformers import SentenceTransformer
|
| 75 |
|
| 76 |
# Load the uploaded model
|
| 77 |
-
model = SentenceTransformer("asmud/nomic-embed-indonesian")
|
| 78 |
|
| 79 |
# Test Indonesian text
|
| 80 |
texts = [
|
|
|
|
| 74 |
from sentence_transformers import SentenceTransformer
|
| 75 |
|
| 76 |
# Load the uploaded model
|
| 77 |
+
model = SentenceTransformer("asmud/nomic-embed-indonesian", trust_remote_code=True)
|
| 78 |
|
| 79 |
# Test Indonesian text
|
| 80 |
texts = [
|
USAGE_EXAMPLES.md
CHANGED
|
@@ -7,7 +7,7 @@ from sentence_transformers import SentenceTransformer
|
|
| 7 |
from sklearn.metrics.pairwise import cosine_similarity
|
| 8 |
import numpy as np
|
| 9 |
|
| 10 |
-
model = SentenceTransformer("asmud/nomic-embed-indonesian")
|
| 11 |
|
| 12 |
# Indonesian search example
|
| 13 |
query = "search_query: Bagaimana cara memasak rendang?"
|
|
|
|
| 7 |
from sklearn.metrics.pairwise import cosine_similarity
|
| 8 |
import numpy as np
|
| 9 |
|
| 10 |
+
model = SentenceTransformer("asmud/nomic-embed-indonesian", trust_remote_code=True)
|
| 11 |
|
| 12 |
# Indonesian search example
|
| 13 |
query = "search_query: Bagaimana cara memasak rendang?"
|
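The USAGE_EXAMPLES.md hunk imports `cosine_similarity` and defines a search query, but the diff is cut off before any ranking step. As a rough sketch of how ranking by cosine similarity works, the block below uses small made-up vectors in place of real embeddings and plain Python in place of scikit-learn, so it runs without downloading the model. `cosine` and `rank` are hypothetical names, not functions from the repository.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up 3-dimensional stand-ins (assumption for illustration only);
# embeddings from this model are 768-dimensional.
query_vec = [1.0, 0.0, 1.0]
doc_vecs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.9], [0.5, 0.5, 0.5]]

ranking = rank(query_vec, doc_vecs)
print([i for i, _ in ranking])
```

With real data, `query_vec` and `doc_vecs` would come from `model.encode(...)` on prefixed query and document strings.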