Omartificial-Intelligence-Space/Semantic-Ar-Qwen-Embed-0.6B

@@ -7,13 +7,13 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:2280319
 - loss:MatryoshkaLoss
 - loss:MultipleNegativesRankingLoss
 base_model: Qwen/Qwen3-Embedding-0.6B
 widget:
 - source_sentence: >-
-    أقترح أن تجد بنكًا في بلدك المحلي، وأن تفكر في فتح حساب مصرفي مقوم باليورو لديهم.
   sentences:
   - يمكنك مزج هذه الأمور، ولكن من تجربتي، سيكون الأمر صعبًا جدًا في البداية.
   - المرأة تضع ظلال العيون بقلم.
@@ -24,8 +24,8 @@ widget:
   - امرأة تركب فيلًا.
   - طائر أصفر وبرتقالي متمسك بجانب قفص.
 - source_sentence: >-
-    إذا تمكنت من تجاوز "عامل الاشمئزاز"، فسيكون لديك مصدر سهل الاستخدام من السماد
-    العضوي النيتروجيني.
   sentences:
   - أرقام NPK على السماد تمثل النسبة المئوية، بالوزن، للنيتروجين وP2O5 وK2O.
   - تجميع ويكيبيديا لقواعد السفر عبر الزمن هو مصدر جيد لفهم هذا الموضوع.
@@ -34,11 +34,15 @@ widget:
   sentences:
   - رجل يرقص.
   - أسد الجبل يطارد دبًا.
-  - لأغراض الشمول، يحتوي برنامج Pages من Apple على العديد من قوالب الملصقات الجيدة.
 - source_sentence: الجانب الأيسر من محرك قطار فضي.
   sentences:
   - قرد يركب حافلة.
-  - إحدى الأفكار التي كانت تُطرح منذ الثمانينات هي أنه يمكنك التمييز بين "الحركات" و"الثبات".
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 license: apache-2.0
@@ -58,7 +62,26 @@ It maps sentences & paragraphs to a 1024-dimensional dense vector space and can
 - **Output Dimensionality:** 1024 dimensions
 - **Similarity Function:** Cosine Similarity
 - **Language:** ar
-<!-- - **License:** Unknown -->
 ### Full Model Architecture
@@ -83,22 +106,37 @@ Then you can load this model and run inference.
 ```python
 from sentence_transformers import SentenceTransformer
-# Download from the 🤗 Hub
 model = SentenceTransformer("Omartificial-Intelligence-Space/Semantic-Ar-Qwen-Embed-V0.1")
-# Run inference
 sentences = [
     'Left side of a silver train engine.',
     'A close-up of a black train engine.',
     "One idea that's been going around at least since the 80s is that you can distinguish between Holds and Moves.",
 ]
 embeddings = model.encode(sentences)
-print(embeddings.shape)
-# [3, 1024]
-# Get the similarity scores for the embeddings
 similarities = model.similarity(embeddings, embeddings)
-print(similarities.shape)
-# [3, 3]
 ```
 ## Citation
@@ -140,4 +178,4 @@ print(similarities.shape)
     archivePrefix={arXiv},
     primaryClass={cs.CL}
 }
-```

 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
 - loss:MatryoshkaLoss
 - loss:MultipleNegativesRankingLoss
 base_model: Qwen/Qwen3-Embedding-0.6B
 widget:
 - source_sentence: >-
+    أقترح أن تجد بنكًا في بلدك المحلي، وأن تفكر في فتح حساب مصرفي مقوم باليورو
+    لديهم.
   sentences:
   - يمكنك مزج هذه الأمور، ولكن من تجربتي، سيكون الأمر صعبًا جدًا في البداية.
   - المرأة تضع ظلال العيون بقلم.
   - امرأة تركب فيلًا.
   - طائر أصفر وبرتقالي متمسك بجانب قفص.
 - source_sentence: >-
+    إذا تمكنت من تجاوز "عامل الاشمئزاز"، فسيكون لديك مصدر سهل الاستخدام من
+    السماد العضوي النيتروجيني.
   sentences:
   - أرقام NPK على السماد تمثل النسبة المئوية، بالوزن، للنيتروجين وP2O5 وK2O.
   - تجميع ويكيبيديا لقواعد السفر عبر الزمن هو مصدر جيد لفهم هذا الموضوع.
   sentences:
   - رجل يرقص.
   - أسد الجبل يطارد دبًا.
+  - >-
+    لأغراض الشمول، يحتوي برنامج Pages من Apple على العديد من قوالب الملصقات
+    الجيدة.
 - source_sentence: الجانب الأيسر من محرك قطار فضي.
   sentences:
   - قرد يركب حافلة.
+  - >-
+    إحدى الأفكار التي كانت تُطرح منذ الثمانينات هي أنه يمكنك التمييز بين
+    "الحركات" و"الثبات".
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 license: apache-2.0
 - **Output Dimensionality:** 1024 dimensions
 - **Similarity Function:** Cosine Similarity
 - **Language:** ar
+### 📊 Performance Evaluation
+This model has been evaluated on Arabic semantic similarity benchmarks using the [MTEB](https://github.com/embeddings-benchmark/mteb) framework. The results below reflect **Spearman correlation scores** on two key tasks: **STS17** and **STS22.v2**.
+| **Model**                         | **STS17 (Spearman)** | **STS22.v2 (Spearman)** |
+|----------------------------------|----------------------|--------------------------|
+| Qwen3 Embeddings 0.6B            | 0.7505               | 0.6520                   |
+| Qwen3 Embeddings 4B              | 0.7912               | **0.6669**               |
+| Semantic-Ar-Qwen-Embed-V0.1 🏆   | **0.8300**           | 0.6130                   |
+> ✅ **STS17**: Classic sentence similarity
+> 🧪 **STS22.v2**: Diverse and challenging sentence pairs
+### 📌 Highlights
+- **Semantic-Ar-Qwen-Embed-V0.1** achieves the **highest score on STS17** (0.8300, versus 0.7912 for Qwen3 Embeddings 4B and 0.7505 for Qwen3 Embeddings 0.6B).
+- **Qwen3 Embeddings 4B** performs best on **STS22.v2** (0.6669, versus 0.6520 for Qwen3 Embeddings 0.6B and 0.6130 for Semantic-Ar-Qwen-Embed-V0.1).
+- The **0.6B model** remains competitive despite its smaller size.
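
The scores in the table above are Spearman correlations between the model's similarity scores and human similarity ratings. As a rough illustration of that metric only (not of the MTEB pipeline itself), the sketch below computes Spearman's rho in plain Python. The `gold` and `predicted` values are made-up toy numbers, and the helper names `average_ranks`, `pearson`, and `spearman` are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (assumed for illustration): human ratings vs. cosine similarities.
gold = [5.0, 3.2, 0.5, 4.1, 1.0]
predicted = [0.91, 0.72, 0.10, 0.70, 0.35]

score = spearman(gold, predicted)
print(round(score, 4))
```

Because only the rank order matters, the metric is insensitive to the scale of the similarity scores. In the toy data, two neighbouring pairs are swapped in the predicted ordering, which is enough to pull the score below 1.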
 ### Full Model Architecture
 ```python
 from sentence_transformers import SentenceTransformer
+# Load model from Hugging Face Hub
 model = SentenceTransformer("Omartificial-Intelligence-Space/Semantic-Ar-Qwen-Embed-V0.1")
+# Sentences for embedding (English + Arabic)
 sentences = [
     'Left side of a silver train engine.',
     'A close-up of a black train engine.',
     "One idea that's been going around at least since the 80s is that you can distinguish between Holds and Moves.",
+    "الجانب الأيسر من محرك قطار فضي.",
+    "صورة مقربة لمحرك قطار أسود.",
+    "إحدى الأفكار المتداولة منذ الثمانينات هي إمكانية التمييز بين الثبات والحركة.",
 ]
+# Generate embeddings
 embeddings = model.encode(sentences)
+print("Embedding shape:", embeddings.shape)
+# Output: (6, 1024)
+# Compute similarity matrix
 similarities = model.similarity(embeddings, embeddings)
+print("Similarity shape:", similarities.shape)
+# Output: torch.Size([6, 6])
+# Optionally print similarity scores
+import numpy as np
+import pandas as pd
+df = pd.DataFrame(np.round(similarities.cpu().numpy(), 3), index=sentences, columns=sentences)
+print("\nSimilarity matrix:\n")
+print(df)
 ```
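
The card lists cosine similarity as the model's similarity function. As a minimal sketch of what a pairwise cosine similarity matrix contains, the plain-Python example below uses three made-up 3-dimensional vectors in place of real 1024-dimensional embeddings. The names `cosine_similarity`, `similarity_matrix`, and `toy_embeddings` are hypothetical and are not part of the sentence-transformers API.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities: entry [i][j] compares vectors i and j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for sentence embeddings (assumed values, for illustration only).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]

matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(value, 3) for value in row])
```

The matrix is symmetric with ones on the diagonal, and vector length does not affect the result: the third toy vector has norm 2 but still has similarity 1 with itself.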
 ## Citation
     archivePrefix={arXiv},
     primaryClass={cs.CL}
 }
+```